Kindred · The Behavior-Change Layer

01

Define success before code

Evaluation

Success defined in numbers before anything is built, then graded automatically on every change. Without it, every other layer is guesswork.

Our version asks more than "is it correct?" It asks whether the response fits the user's emotional state and trajectory, and whether it moved them toward the target behavior.

Breaks down into

Success metrics in numbers, per behavior-change flow
Behavioral / clinical benchmark set, built from real interactions and extended with every real failure
Deterministic checks: format, PII, length
Semantic judging: correctness, groundedness, safety
Behavioral checks: right tools, right order, escalation, scope
Automated grading pipeline and live scoreboard
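The deterministic layer and the per-flow scoreboard can be sketched without any model in the loop. This is a minimal Python sketch under stated assumptions: the "format" check is reduced to a non-empty test, the regexes are illustrative stand-ins for a real PII detector, the 600-character limit and the flow names and targets are hypothetical, and `deterministic_checks`, `pass_rate`, and `gate` are names invented here rather than part of any existing pipeline.

```python
import re
from dataclasses import dataclass

# Illustrative PII patterns only; a real deployment would use a vetted PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass
class CheckResult:
    name: str
    passed: bool


def deterministic_checks(response: str, max_chars: int = 600) -> list[CheckResult]:
    """Cheap, repeatable checks that need no model: format, PII, length."""
    return [
        CheckResult("non_empty", bool(response.strip())),
        CheckResult("no_email", EMAIL.search(response) is None),
        CheckResult("no_phone", US_PHONE.search(response) is None),
        CheckResult("max_length", len(response) <= max_chars),
    ]


def pass_rate(responses: list[str], max_chars: int = 600) -> float:
    """Fraction of responses that pass every deterministic check."""
    if not responses:
        return 0.0
    passed = sum(
        all(c.passed for c in deterministic_checks(r, max_chars)) for r in responses
    )
    return passed / len(responses)


def gate(scores: dict[str, float], targets: dict[str, float]) -> dict[str, bool]:
    """Compare each flow's score with the numeric target set before it was built.

    A flow with no recorded score counts as failing.
    """
    return {flow: scores.get(flow, 0.0) >= target for flow, target in targets.items()}
```

For example, `pass_rate` over a well-formed nudge, a reply that leaks an email address, and an empty string returns 1/3, and `gate({"sleep_routine": 0.92}, {"sleep_routine": 0.90, "med_adherence": 0.95})` marks `sleep_routine` as passing and `med_adherence` (no score yet) as failing. Semantic and behavioral checks would feed the same scoreboard but need a judge model or trace inspection, which this sketch leaves out.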

Expertise to build

AI evaluation engineering
Behavioral science / clinical psychology
Psychometrics & measurement validity
Data annotation & labeling ops

02

See everything, always

Observability

Every decision traced, any failed interaction replayable in minutes, live monitoring with automatic fallback. You cannot debug, or defend, what you cannot see.

Our trace records the emotional state the system inferred, its confidence, and why a memory, recommendation, or action changed because of it. The trace doubles as the regulatory evidence trail.

Breaks down into

Decision tracing across every step
Affect-aware trace schema: inferred state, confidence, reasoning
Sub-five-minute replay of any session
Live monitoring with retry caps and human handoff
Audit-grade evidence trail for regulators
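One way to picture the affect-aware trace schema is as a flat record per decision, written as JSON Lines so that any session can be filtered and replayed in order. This is a hypothetical sketch: the `TraceEvent` fields mirror the schema named above (inferred state, confidence, reasoning) plus an assumed `changed` field for what the state altered, and a production system would more likely emit these records as spans through a distributed-tracing stack than build strings by hand.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    session_id: str
    step: int                 # order of the decision within the session
    inferred_state: str       # emotional state the system inferred, e.g. "overwhelmed"
    confidence: float         # 0.0-1.0 confidence in that inference
    reasoning: str            # why the inferred state changed (or did not change) the decision
    changed: dict = field(default_factory=dict)  # memory, recommendation, or action affected


def to_jsonl(events: list[TraceEvent]) -> str:
    """Serialize events as JSON Lines, one decision per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def replay(jsonl: str, session_id: str) -> list[TraceEvent]:
    """Rebuild one session's decisions, in step order, from the stored trace."""
    events = [TraceEvent(**json.loads(line)) for line in jsonl.splitlines() if line.strip()]
    return sorted((e for e in events if e.session_id == session_id), key=lambda e: e.step)
```

Because each line carries the inferred state and the reasoning next to what changed, the same log that supports replay can also be exported as the evidence trail.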

Expertise to build

Distributed tracing / platform engineering
Streaming & data pipelines
Compliance-aware systems design

03

Signals, memory & tracking

Data Foundation

The majority of the real work, and more than two data sets. The system answers from live data, sees itself through tracking data, remembers through affective memory, and reads how people actually use the host product. Most teams build the first and skip the rest.

Memory is only one signal. Usage and behavioral telemetry from the host product (what people do, when they drop off, what they ignore) are first-class inputs too. Part of every deployment is sitting with the product team to co-define which data is worth collecting, so we read the right signals.

Breaks down into

Question data: live APIs, versioned knowledge base, user history
Tracking data: traces, prompt-to-output mapping, session replay
Affective memory: emotional encoding, affect-aware retrieval, trajectory over time
Usage & behavioral signals: engagement cadence, feature use, drop-off, host-product telemetry
Signal discovery: co-define with the product team what data is worth collecting
Schema & data governance across products
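Affect-aware retrieval can be illustrated as a ranking that blends topical similarity with how closely a memory's stored emotional tone matches the user's current one, plus a recency decay. Everything here is an assumption for illustration: the `Memory` shape, the single valence score standing in for emotional encoding, the weights, and the 30-day half-life are hypothetical, and embeddings are plain lists where a real system would use a vector store.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]
    valence: float    # emotional encoding at write time: -1 (negative) to +1 (positive)
    age_days: float


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def affect_score(
    m: Memory,
    query: list[float],
    current_valence: float,
    w_sem: float = 0.6,
    w_aff: float = 0.3,
    w_rec: float = 0.1,
    half_life_days: float = 30.0,
) -> float:
    semantic = cosine(m.embedding, query)
    # 1.0 when the stored tone equals the current tone, 0.0 at opposite extremes.
    affect_match = 1.0 - abs(m.valence - current_valence) / 2.0
    # Halves every `half_life_days`, so older memories fade rather than vanish.
    recency = 0.5 ** (m.age_days / half_life_days)
    return w_sem * semantic + w_aff * affect_match + w_rec * recency


def retrieve(memories: list[Memory], query: list[float], current_valence: float, k: int = 3) -> list[Memory]:
    """Return the top-k memories by blended semantic, affective, and recency score."""
    return sorted(
        memories,
        key=lambda m: affect_score(m, query, current_valence),
        reverse=True,
    )[:k]
```

With two memories that match a query equally well, a user currently at valence -0.8 gets the memory encoded at -0.7 ranked first. Whether retrieval should favor matching or contrasting affect is itself a policy decision, one the evaluation layer would need to measure.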

Expertise to build

Data platform / data engineering
Retrieval & memory ML (RAG, vectors)
Product & usage analytics
Affective-computing research
Data architecture & governance lead

04

Patterns that scale

Orchestration

One agent is simple; several agents bring exponential complexity. Coordination patterns are chosen deliberately, and the relationship layer routes across whatever surfaces the user already lives in.

Paige is the relationship layer, invisible where appropriate. Human-in-the-loop is mandatory and state-gated: the same threshold that escalates on low system confidence also escalates when the user appears overwhelmed, and escalation routes to a qualified human.

Breaks down into

Coordination patterns: orchestrator-worker, choreography, human-in-the-loop
Relationship-layer routing across host surfaces
State-gated escalation to a qualified human
State management & fault tolerance: saga, compensation, circuit breaker
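The state-gated handoff can be sketched as a single routing rule: escalate when the system's confidence is too low or the user's inferred overwhelm is too high, and escalate unconditionally once a retry cap trips. The thresholds, the 0-1 overwhelm score, and the names `route`, `RetryBreaker`, and `next_route` are assumptions made for this sketch; the breaker is a simplified consecutive-failure cap without the half-open, cool-down state a full circuit breaker has.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AGENT = "agent"
    HUMAN = "qualified_human"


@dataclass
class TurnState:
    model_confidence: float  # 0-1: how sure the system is of its next response
    overwhelm: float         # 0-1: inferred user overwhelm


def route(state: TurnState, min_confidence: float = 0.7, max_overwhelm: float = 0.6) -> Route:
    """One gate, two triggers: low model confidence or high user overwhelm."""
    if state.model_confidence < min_confidence or state.overwhelm > max_overwhelm:
        return Route.HUMAN
    return Route.AGENT


class RetryBreaker:
    """Trips after `max_failures` consecutive failures; any success resets it."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def tripped(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


def next_route(state: TurnState, breaker: RetryBreaker) -> Route:
    # A tripped breaker hands off to a human regardless of the current state.
    return Route.HUMAN if breaker.tripped else route(state)
```

Keeping both triggers in one function makes the gate easy to audit: every handoff traces back to exactly one comparison or to the breaker.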

Expertise to build

Agentic systems / orchestration engineering
Event-driven / workflow systems
Surface integration (chat, calendar, HRIS)
Conversation & relationship design

05

What keeps you in production

Governance

Audit trails, validation before the model, change management for prompts and models, and a clear answer to "who is accountable when this fails?" Missing governance is what pulls a system back out of production.

This is where emotionally calibrated guardrails live: the policy that decides which actions are permitted given the user's current state. Defer a low-value nudge when sentiment is low; step up verification on detected distress.

Breaks down into

Audit trails for every decision
PII pre-validation before input reaches the model
Emotionally calibrated guardrails and an affect-aware action policy
Prompt versioning as change management: log the intent behind every change
Model change management: re-run the suite on every upgrade
Regulatory mapping: IL HB 1806, Colorado AI Act, CA SB 243, EU AI Act
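The two example rules from the guardrail paragraph can be expressed as a small, auditable policy function that runs before any action reaches the user. This is a hypothetical sketch: the sentiment scale, the -0.3 cutoff, the action labels, the `Decision` values, and the choice to check deferral before verification are all assumptions, and the returned reason string is there so each decision can be written to the audit trail.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DEFER = "defer"
    VERIFY = "step_up_verification"


@dataclass
class ProposedAction:
    kind: str    # e.g. "nudge" or "plan_change" (hypothetical labels)
    value: str   # expected benefit: "low" or "high"


@dataclass
class UserState:
    sentiment: float  # -1 (very negative) to +1 (very positive)
    distress: bool    # set by an upstream distress detector


def decide(action: ProposedAction, state: UserState, low_sentiment: float = -0.3) -> tuple[Decision, str]:
    """Return a decision plus a reason string suitable for the audit trail."""
    if action.kind == "nudge" and action.value == "low" and state.sentiment < low_sentiment:
        return Decision.DEFER, f"low-value nudge deferred: sentiment {state.sentiment} < {low_sentiment}"
    if state.distress:
        return Decision.VERIFY, "distress detected: verification stepped up before acting"
    return Decision.ALLOW, "no affect-based restriction applied"
```

A high-value nudge to a user with low sentiment still goes through under this sketch, while any action for a user flagged as distressed is held for stepped-up verification.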

Expertise to build

AI safety / guardrails engineering
Security & PII / application security
Healthcare-AI regulatory & compliance
Clinical governance & licensed oversight
MLOps for model change control

The layer that turns a model into behavioral change.

Five layers, their parts, and the expertise behind them.

Five layers. Each one specialized for behavior change.

Where the layers land, and how they orchestrate.

Clinical and fine-tuning research, led by us.

The ask is alignment, not execution.

[Diagram: The goal · Orchestration · Signals in · The behavior-change layer · Actions on surface]