Kindred · The Behavior-Change Layer · Internal working document · Jun 2026
Where we sit · what we build · who we need

The layer that turns a model into behavioral change.

Kindred sits between a foundation model and the application it powers. We are not a model, and we are not just another app: we are the intelligence layer that takes raw capability and uses it to assess, understand, guide, measure, and change human behavior over time. This page maps that layer into five parts, what each part is made of, and the expertise we need to build it.

Client application · what the user is trying to achieve · surface
Kindred, the behavior-change layer · assess, understand, guide, measure, improve · where we sit
Foundation model · raw capability (we will never be one) · substrate
How to read this

Five layers, their parts, and the expertise behind them.

Each layer below breaks into concrete components and the disciplines needed to build them. The expertise tags carry a working read on our team today, so we can move on the clear calls and hold where scope is still open. This is a starting hypothesis for discussion, not a finished org chart.

In-house strength: We have this and extend it. Hire to scale, not to start.
Priority gap: Net-new, specialized, and central to the moat. Clear line of sight to begin hiring.
Hold, scope-dependent: Real need, but its shape depends on decisions still open. Hold until we choose.
The framework

Five layers. Each one specialized for behavior change.

The five layers on their own are generic to any serious production-AI system. What makes them ours is that every layer is tuned for behavioral change and emotional intelligence: affective memory, emotionally calibrated guardrails, and state-aware action. That specialization is the differentiation, and it is far cheaper to build in from the start than to retrofit later.

01
Define success before code

Evaluation

Success defined in numbers before anything is built, then graded automatically on every change. Without it, every other layer is guesswork.

Our version asks more than "is it correct?" It asks whether the response fits the user's emotional state and trajectory, and whether it moved them toward the target behavior.

Breaks down into

  • Success metrics in numbers, per behavior-change flow
  • Behavioral / clinical benchmark set, built from real interactions and grown by real failures
  • Deterministic checks: format, PII, length
  • Semantic judging: correctness, groundedness, safety
  • Behavioral checks: right tools, right order, escalation, scope
  • Automated grading pipeline and live scoreboard
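
A minimal sketch of the deterministic tier, assuming each check is a pure function over the reply text. The names (`CheckResult`, `run_deterministic_checks`), the 600-character limit, and the regex patterns are hypothetical placeholders rather than our pipeline; a real PII check would use a vetted detector, not two regexes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# Hypothetical limits and patterns; real values belong to each flow's success metrics.
MAX_CHARS = 600
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
PLACEHOLDER_RE = re.compile(r"\{\{.*?\}\}")


def check_format(text: str) -> CheckResult:
    # Unrendered template slots like "{{name}}" are a common formatting failure.
    leftovers = PLACEHOLDER_RE.findall(text)
    return CheckResult("format", not leftovers, ", ".join(leftovers))


def check_pii(text: str) -> CheckResult:
    # Crude email / US-style phone patterns, for illustration only.
    hits = EMAIL_RE.findall(text) + PHONE_RE.findall(text)
    return CheckResult("pii", not hits, f"{len(hits)} match(es)")


def check_length(text: str) -> CheckResult:
    # Empty replies fail as well as overlong ones.
    n = len(text)
    return CheckResult("length", 0 < n <= MAX_CHARS, f"{n} chars")


def run_deterministic_checks(text: str) -> list[CheckResult]:
    """Run every cheap, rule-based check; semantic and behavioral judging come after."""
    return [check(text) for check in (check_format, check_pii, check_length)]


if __name__ == "__main__":
    reply = "Nice work today, {{name}}. Email me at coach@example.com."
    for result in run_deterministic_checks(reply):
        print(result.name, "PASS" if result.passed else "FAIL", result.detail)
```

Keeping this tier cheap and rule-based means it can run on every change before the slower semantic and behavioral judges.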

Expertise to build

AI evaluation engineering · Behavioral science / clinical psychology · Psychometrics & measurement validity · Data annotation & labeling ops
02
See everything, always

Observability

Every decision traced, any failed interaction replayable in minutes, live monitoring with automatic fallback. You cannot debug, or defend, what you cannot see.

Our trace records the emotional state the system inferred, its confidence, and why a memory, recommendation, or action changed because of it. The trace doubles as the regulatory evidence trail.

Breaks down into

  • Decision tracing across every step
  • Affect-aware trace schema: inferred state, confidence, reasoning
  • Sub-five-minute replay of any session
  • Live monitoring with retry caps and human handoff
  • Audit-grade evidence trail for regulators
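
One way the affect-aware trace schema could look, sketched as a frozen dataclass written to an append-only JSON-lines log. Every field name here (`inferred_state`, `confidence`, `reasoning`, `changed`) and the `replay` helper are hypothetical illustrations of the bullets above, not an agreed format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEvent:
    """One traced step. Field names are illustrative, not an agreed schema."""
    session_id: str
    step: int
    action: str                    # e.g. "retrieve_memory", "recommend", "send_nudge"
    inferred_state: str            # affect label the system inferred, e.g. "overwhelmed"
    confidence: float              # confidence in inferred_state, 0.0 to 1.0
    reasoning: str                 # why the inferred state mattered for this step
    changed: Optional[str] = None  # what the state changed (memory, recommendation, action), if anything

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def to_json(self) -> str:
        # One JSON object per line keeps the log append-only and easy to replay.
        return json.dumps(asdict(self), sort_keys=True)


def replay(log_lines: list[str], session_id: str) -> list[TraceEvent]:
    """Rebuild a single session, in step order, from a JSON-lines trace log."""
    events = (TraceEvent(**json.loads(line)) for line in log_lines if line.strip())
    return sorted((e for e in events if e.session_id == session_id), key=lambda e: e.step)


if __name__ == "__main__":
    log = [
        TraceEvent("s1", 2, "send_nudge", "overwhelmed", 0.82,
                   "three low-energy check-ins in a row", "nudge deferred").to_json(),
        TraceEvent("s2", 1, "recommend", "neutral", 0.64, "no strong signal").to_json(),
        TraceEvent("s1", 1, "retrieve_memory", "overwhelmed", 0.82,
                   "recent mentions of work stress", "stress memories ranked higher").to_json(),
    ]
    for event in replay(log, "s1"):
        print(event.step, event.action, event.inferred_state, event.changed)
```

Because each line is self-describing, replaying a session reduces to a filter and a sort, and the same records can be retained as the evidence trail.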

Expertise to build

Distributed tracing / platform engineering · Streaming & data pipelines · Compliance-aware systems design
03
Signals, memory & tracking

Data Foundation

This is the majority of the real work, and it spans more than two data sets. The system answers from live data, sees itself through tracking data, remembers through affective memory, and reads how people actually use the host product. Most teams build the first and skip the rest.

Memory is only one signal. Usage and behavioral telemetry from the host product (what people do, when they drop off, what they ignore) are first-class inputs too. Part of every deployment is sitting with the product team to co-define which data is worth collecting, so we read the right signals.

Breaks down into

  • Question data: live APIs, versioned knowledge base, user history
  • Tracking data: traces, prompt-to-output mapping, session replay
  • Affective memory: emotional encoding, affect-aware retrieval, trajectory over time
  • Usage & behavioral signals: engagement cadence, feature use, drop-off, host-product telemetry
  • Signal discovery: co-define with the product team what data is worth collecting
  • Schema & data governance across products
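
Affect-aware retrieval could be as simple as blending content similarity with affective similarity. The sketch below assumes each memory carries a content embedding plus a valence/arousal encoding; that encoding, the `affect_weight` of 0.3, and the names `Memory` and `retrieve` are assumptions for illustration, not a chosen design.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Memory:
    text: str
    embedding: tuple[float, ...]  # content embedding from some encoder (assumed)
    valence: float                # assumed encoding: -1.0 (negative) to 1.0 (positive)
    arousal: float                # assumed encoding: 0.0 (calm) to 1.0 (activated)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affect_similarity(m: Memory, valence: float, arousal: float) -> float:
    # Distance in valence-arousal space, rescaled to 0..1.
    # Largest possible distance is sqrt(2**2 + 1**2) = sqrt(5) given the ranges above.
    distance = math.hypot(m.valence - valence, m.arousal - arousal)
    return 1.0 - distance / math.sqrt(5)


def retrieve(memories: Sequence[Memory], query_embedding: Sequence[float],
             valence: float, arousal: float, k: int = 3,
             affect_weight: float = 0.3) -> list[Memory]:
    """Rank memories by a blend of content match and match to the user's current affect."""
    scored = [
        ((1 - affect_weight) * cosine(m.embedding, query_embedding)
         + affect_weight * affect_similarity(m, valence, arousal), m)
        for m in memories
    ]
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


if __name__ == "__main__":
    bad_day = Memory("slept badly, skipped the gym", (1.0, 0.0), -0.7, 0.6)
    good_day = Memory("great run this morning", (1.0, 0.0), 0.8, 0.5)
    for m in retrieve([good_day, bad_day], (1.0, 0.0), valence=-0.6, arousal=0.6):
        print(m.text)
```

With `affect_weight=0` this reduces to ordinary cosine retrieval, so the weight is a single knob the evaluation suite could tune.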

Expertise to build

Data platform / data engineering · Retrieval & memory ML (RAG, vectors) · Product & usage analytics · Affective-computing research · Data architecture & governance lead
04
Patterns that scale

Orchestration

One agent is simple; several agents bring exponential complexity. Coordination patterns are chosen deliberately, and the relationship layer routes across whatever surfaces the user already lives in.

Paige is the relationship layer, invisible where appropriate. Human-in-the-loop is mandatory and state-gated: the confidence threshold is also an overwhelm threshold, and escalation routes to a qualified human.

Breaks down into

  • Coordination patterns: orchestrator-worker, choreography, human-in-the-loop
  • Relationship-layer routing across host surfaces
  • State-gated escalation to a qualified human
  • State management & fault tolerance: saga, compensation, circuit breaker
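
A minimal sketch of the state-gated escalation check, under one reading of "the confidence threshold is also an overwhelm threshold": a turn goes to a qualified human if either the system's confidence is too low or the user's inferred overwhelm is too high, with a retry cap as a backstop. The thresholds and the names `GateConfig` and `route_turn` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATED = "automated"
    HUMAN = "qualified_human"


@dataclass(frozen=True)
class GateConfig:
    # Hypothetical defaults; real thresholds would be set per flow and validated by evaluation.
    min_confidence: float = 0.7   # below this, the system is not sure enough to act alone
    max_overwhelm: float = 0.6    # at or above this, the user is treated as overwhelmed
    max_retries: int = 2          # automated attempts allowed before handing off


def route_turn(confidence: float, overwhelm: float, retries: int,
               cfg: GateConfig = GateConfig()) -> tuple[Route, str]:
    """Return where this turn goes and why; the reason string can go straight into the trace."""
    if overwhelm >= cfg.max_overwhelm:
        return Route.HUMAN, "user appears overwhelmed"
    if confidence < cfg.min_confidence:
        return Route.HUMAN, "system confidence below threshold"
    if retries > cfg.max_retries:
        return Route.HUMAN, "retry cap exceeded"
    return Route.AUTOMATED, "within thresholds"


if __name__ == "__main__":
    print(route_turn(confidence=0.9, overwhelm=0.2, retries=0))
    print(route_turn(confidence=0.9, overwhelm=0.75, retries=0))
```

Checking overwhelm first means a confident system still hands off when the user is struggling, which is the point of gating on state rather than on confidence alone.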

Expertise to build

Agentic systems / orchestration engineering · Event-driven / workflow systems · Surface integration (chat, calendar, HRIS) · Conversation & relationship design
05
What keeps you in production

Governance

Audit trails, validation before the model, change management for prompts and models, and a clear answer to "who is accountable when this fails?" Missing governance is what pulls a system back out of production.

This is where emotionally calibrated guardrails live: the policy that decides which actions are permitted given the user's current state. For example, defer a low-value nudge when sentiment is low, and step up verification when distress is detected.

Breaks down into

  • Audit trails for every decision
  • PII pre-validation before input reaches the model
  • Emotionally calibrated guardrails and affect-aware action policy
  • Prompt versioning as change management (log the intent)
  • Model change management: re-run the suite on every upgrade
  • Regulatory mapping: IL HB 1806, Colorado AI Act, CA SB 243, EU AI Act
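
The two example rules above can be expressed as a small, auditable policy function. This is a sketch only: the sentiment scale, the -0.3 cut-off, the `distress` flag, and the names `decide`, `UserState`, and `ProposedAction` are assumptions, and a real policy would be reviewed under clinical governance.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DEFER = "defer"
    STEP_UP = "step_up_verification"


@dataclass(frozen=True)
class UserState:
    sentiment: float   # assumed scale: -1.0 (very negative) to 1.0 (very positive)
    distress: bool     # assumed output of a separate distress detector


@dataclass(frozen=True)
class ProposedAction:
    kind: str          # e.g. "nudge", "reminder", "check_in"
    value: str         # assumed priority label: "low" or "high"


LOW_SENTIMENT = -0.3   # hypothetical cut-off for "sentiment is low"


def decide(action: ProposedAction, state: UserState) -> tuple[Decision, str]:
    """Decide whether an action is permitted given the user's current state."""
    # Distress outranks every other rule: verify before doing anything else.
    if state.distress:
        return Decision.STEP_UP, "distress detected"
    # Low-value nudges wait until the user is in a better place.
    if action.kind == "nudge" and action.value == "low" and state.sentiment < LOW_SENTIMENT:
        return Decision.DEFER, "low-value nudge while sentiment is low"
    return Decision.ALLOW, "permitted in current state"


if __name__ == "__main__":
    nudge = ProposedAction(kind="nudge", value="low")
    print(decide(nudge, UserState(sentiment=-0.6, distress=False)))
    print(decide(nudge, UserState(sentiment=0.4, distress=False)))
```

Returning a reason alongside each decision lets the same call feed the audit trail.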

Expertise to build

AI safety / guardrails engineering · Security & PII / application security · Healthcare-AI regulatory & compliance · Clinical governance & licensed oversight · MLOps for model change control
Use cases

Where the layers land, and how they orchestrate.

Three deployments: one internal product, one partner overlay, and one healthcare network we are working to open. Each shows the signals we read, how the five layers orchestrate a response, and where a human stays in the loop.

Cross-cutting capability

Clinical and fine-tuning research, led by us.

The five layers run on top of every product we operate. Together they generate the one thing no one can buy: longitudinal, affect-labeled, behavior-change data. Our current leaning is that Kindred leads the fine-tuning and clinical research as a core part of the moat, working with partners but doing most of it ourselves, since partners are not positioned to do this work. This capability draws from every layer above.

Evaluation → benchmark sets · Observability → labeled traces · Data Foundation → affective memory · Governance → safe-use boundaries
Applied research scientists (LLM fine-tuning) · Clinical research & study design · Behavioral data science
What this is for

The ask is alignment, not execution.

This document maps the shape of the behavior-change layer so we can agree it is the direction worth committing to. Scope, sequencing, and ownership follow that agreement. The expertise reads are a working hypothesis to sharpen together: where we have full line of sight, we can start hiring now; where scope is still open, we hold. Much of the ownership likely sits with the C-suite and engineering, while some areas may not be wholly ours. The one fixed principle: every shared, living system needs a named owner before it ships, or it silently rots.

Candidates for the first shared-Core surface
  1. Evaluation harness + behavioral benchmark, built once, reused across products.
  2. Affect-aware trace schema, one format emitted by every product.
  3. Guardrail + escalation policy service, deciding what is permitted given user state.