Compatibility Policies

Overview

When you update your agent — change a tool, rewrite a prompt, swap a model — DecimalAI automatically detects the change and classifies every existing trace into one of five actions. This classification determines which traces stay in your training datasets, which need repair, which should be re-run, which need a human look, and which are no longer valid. The compatibility policy controls how changes at each severity level map to actions. You can use built-in presets or create custom rules per agent.

Severity and action are two different things. Severity answers “how much did this surface change?” (minor / moderate / major). Action answers “what do we do with the trace?” (keep / repair / flag / replay / drop). The compatibility policy is the lookup table from one to the other — and it’s the only place the mapping lives. The same major tool change becomes replay under the default preset but drop under strict.
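One way to picture the split between severity and action is as a plain lookup table. The sketch below is illustrative Python, not DecimalAI code: `POLICY` and `action_for` are hypothetical names, and the entries are copied from the tool rows of the preset comparison on this page.

```python
# Illustrative model only: the policy as a lookup from severity to action.
# Values mirror the tool rows for the default and strict presets.
POLICY = {
    "default": {"minor": "keep", "moderate": "repair", "major": "replay"},
    "strict": {"minor": "keep", "moderate": "drop", "major": "drop"},
}


def action_for(preset: str, severity: str) -> str:
    """Resolve a tool change severity to an action under a preset."""
    return POLICY[preset][severity]


print(action_for("default", "major"))  # replay
print(action_for("strict", "major"))   # drop
```

The severity is the same in both calls; only the preset changes the outcome.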

The Five Actions

Action	What it means	What happens to the trace	Cost
Keep ✅	Trace is fully compatible with the new agent version	Stays in training datasets as-is	Free
Repair 🔧	Trace can be fixed with a deterministic data migration	Tool calls are rewritten to match new schema (e.g., rename param)	Free (no LLM)
Flag 🚩	Trace might be fine, but the change is ambiguous enough to want a human look	Stays in datasets, but is marked for review in the Impact Report — no automatic action is taken	Free
Replay 🔄	Trace input is still valuable, but output is stale	Original prompt is re-run through the new agent to get fresh output	LLM cost
Drop ✕	Trace is incompatible — both input and output are invalid	Excluded from datasets entirely	Free

flag is a real action, not a label. It’s the “don’t decide automatically” verdict: the trace stays usable, but it’s surfaced in the Impact Report so a person can choose keep, replay, or drop for it. The permissive preset leans on flag heavily — it never auto-drops, it only flags major changes for review. When two actions tie on a trace, priority runs drop > replay > flag > repair > keep.
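The priority order can be expressed in a few lines. This is a minimal sketch that assumes a trace collects one candidate action per changed surface and the highest-priority one wins; `PRIORITY` and `resolve` are hypothetical names, not part of any DecimalAI SDK.

```python
# Highest-priority action wins when one trace collects several candidates.
PRIORITY = ["drop", "replay", "flag", "repair", "keep"]


def resolve(actions: list[str]) -> str:
    """Return the winning action; an empty list means nothing changed."""
    if not actions:
        return "keep"
    return min(actions, key=PRIORITY.index)


print(resolve(["repair", "flag"]))          # flag
print(resolve(["keep", "replay", "flag"]))  # replay
```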

Compatibility policy configuration showing rules per component type

The key question is: is the trace’s input still meaningful?

If yes → Replay (re-run it through the new agent)
If no → Drop (discard it entirely)

When Each Action Applies

The tables below show the raw severity the diff engine assigns to each kind of change, plus the action the default preset resolves it to. Switching presets or overriding a surface changes the Default Action column (see The Three Presets), but the Severity column stays the same.

Tool Changes

Tools are the most common source of agent changes. Here’s how different types of tool changes are classified (default preset):

Scenario	Example	Severity	Default Action	Why
Tool unchanged	`search_docs` is identical in v1 and v2	none	Keep	Nothing changed — trace is still valid
Optional parameter added	`check_inventory` gains optional `region` param	minor	Keep	Old calls still work as-is; nothing to rewrite
Parameter renamed	`product_id` → `item_id`	moderate	Repair	Deterministic rewrite — find-and-replace in trace records
Parameter removed	`check_inventory` no longer accepts `verbose` flag	moderate	Repair	Strip the removed param from stored tool calls
Description changed	Tool description updated but schema identical	minor	Keep	No structural impact on stored calls
Required parameter added	`refund_order` now requires `reason` field	major	Replay	Old calls are missing a required field — re-run to capture it correctly
Type changed	`quantity` changed from `string` to `integer`	moderate	Repair	Cast the stored value to the new type during the data migration
Tool renamed	`check_inventory` → `lookup_stock` (new name, same capability)	major	Replay ¹	Capability still exists under a new name; re-run captures the new tool usage
Tool removed entirely	`get_pricing` deleted from agent	major	Replay	Default replays `major` tool changes; override to `drop` if the capability is truly gone

¹ Tool renames appear as a removal + addition to the compatibility engine — both land in the tool_registry surface as a major change. Under the default preset that resolves to Replay (re-running picks up the new tool name). Under the strict preset a major tool change resolves to Drop instead; switch presets or override individual traces in the Impact Report.

These are tool_registry surface actions under the default preset: minor → keep, moderate → repair, major → replay. The per-scenario rows above just show which severity bucket each kind of change lands in.
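To make the Repair rows concrete, here is a rough sketch of what a deterministic migration could look like. It combines three of the table's examples (rename, removal, type cast) into one hypothetical stored call; the `{"tool", "arguments"}` record shape and the `repair_call` helper are assumptions for illustration, not DecimalAI's storage format.

```python
import copy


def repair_call(call: dict) -> dict:
    """Apply three repairable changes from the table to one stored tool call."""
    fixed = copy.deepcopy(call)  # never mutate the original trace record
    args = fixed["arguments"]
    if "product_id" in args:     # parameter renamed: product_id -> item_id
        args["item_id"] = args.pop("product_id")
    args.pop("verbose", None)    # parameter removed: strip it if present
    if isinstance(args.get("quantity"), str):
        args["quantity"] = int(args["quantity"])  # type changed: string -> integer
    return fixed


old = {"tool": "check_inventory",
       "arguments": {"product_id": "SKU-1234", "quantity": "3", "verbose": True}}
print(repair_call(old))
```

No model call is involved, which is why the table lists Repair as free.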

Prompt Changes

Prompt changes are classified by the text diff percentage between old and new versions (prompt_stack surface, default preset: minor → keep, moderate → flag, major → replay):

Scenario	Diff Threshold	Severity	Default Action	Why
Typo fix	≤5% diff	minor	Keep	Negligible impact on agent behavior
Paragraph added	5–30% diff	moderate	Flag	Moderate behavioral shift: a human should review, but the data is likely still valid
Complete rewrite	>30% diff	major	Replay	Agent behavior fundamentally changed — old outputs don’t reflect new instructions
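The thresholds above can be sketched as a small bucketing function. This page does not say how DecimalAI measures the diff percentage, so the example uses Python's `difflib` as a stand-in; `diff_percent` and `prompt_severity` are hypothetical helpers.

```python
import difflib


def diff_percent(old: str, new: str) -> float:
    """Percent of text that differs, using difflib's similarity ratio (an assumption)."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return (1.0 - ratio) * 100.0


def prompt_severity(pct: float) -> str:
    """Bucket a diff percentage using the thresholds in the table."""
    if pct <= 5:
        return "minor"
    if pct <= 30:
        return "moderate"
    return "major"


# A one-character typo fix lands in the minor bucket.
pct = diff_percent("You are a helpfull assistant.", "You are a helpful assistant.")
print(prompt_severity(pct))
```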

Model Changes

model_runtime surface, default preset: minor → keep, moderate → flag, major → drop.

Scenario	Example	Severity	Default Action	Why
Config tweak	Temperature 0.7 → 0.8	minor	Keep	Same model, minor sampling change
Version bump (same family)	`gpt-4o-2024-05` → `gpt-4o-2024-08`	moderate	Flag	Same family, likely similar — but worth a human look
Different model (same provider)	`gpt-4` → `gpt-4o`	major	Drop	Different model architecture — output distributions differ significantly
Provider change	`gpt-4o` → `claude-sonnet-4-6`	major	Drop	Response distributions are fundamentally different — outputs from one model shouldn’t train another

Drop vs Replay — The Decision Guide

The distinction between Drop and Replay is the most important decision in the policy. Here’s a simple framework:

Question	If yes →	If no →
Is the trace’s input (user question) still meaningful for the new agent?	Replay	Drop
Could the new agent handle this request successfully?	Replay	Drop
Does the capability still exist in some form (renamed, moved, or replaced)?	Replay	Drop
Was the tool/feature permanently removed with no replacement?	Drop	Replay

Drop means the question is worthless: not just the output, but the entire conversation has no value for training the new agent.

Replay means the question is still good, but the answer is stale: re-running it through the new agent will produce a fresh, valid output.

Examples

Replay: You renamed check_inventory to lookup_stock. A trace asking “Is SKU-1234 in stock?” is still a perfectly valid customer question, and the new agent can answer it using lookup_stock. Re-run the trace to capture the new tool usage.

Drop: You removed the get_pricing tool because pricing is now handled by a separate microservice your agent doesn’t access. A trace asking “What’s the price of SKU-1234?” can’t be answered by the new agent because the capability is gone. Drop the trace.

The Three Presets

DecimalAI ships with three policy presets. Each preset configures how severity levels map to actions for every surface:

Preset Comparison

Surface	Severity	Strict	Default	Permissive
Tools	Minor (optional param, description)	Keep	Keep	Keep
Tools	Moderate (schema change, repairable)	Drop	Repair	Keep
Tools	Major (tool removed, required param added)	Drop	Replay	Flag
Prompts	Minor (≤5% diff)	Flag	Keep	Keep
Prompts	Moderate (5–30% diff)	Drop	Flag	Keep
Prompts	Major (>30% diff)	Drop	Replay	Flag
Model	Minor (config tweak)	Keep	Keep	Keep
Model	Moderate (version bump)	Drop	Flag	Keep
Model	Major (different model/provider)	Drop	Drop	Flag
Output Contract	Minor	Flag	Keep	Keep
Output Contract	Moderate	Drop	Repair	Keep
Output Contract	Major	Drop	Drop	Flag
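For a quick feel of how aggressive each preset is, the comparison table can be loaded as data and tallied. This is only an illustrative sketch: `PRESETS`, `NAMES`, and `tally` are hypothetical names, the surface labels are shortened from the table, and the counts say nothing about how many real traces fall into each row.

```python
from collections import Counter

# (surface, severity) -> (strict, default, permissive), copied from the table.
PRESETS = {
    ("tools", "minor"): ("keep", "keep", "keep"),
    ("tools", "moderate"): ("drop", "repair", "keep"),
    ("tools", "major"): ("drop", "replay", "flag"),
    ("prompts", "minor"): ("flag", "keep", "keep"),
    ("prompts", "moderate"): ("drop", "flag", "keep"),
    ("prompts", "major"): ("drop", "replay", "flag"),
    ("model", "minor"): ("keep", "keep", "keep"),
    ("model", "moderate"): ("drop", "flag", "keep"),
    ("model", "major"): ("drop", "drop", "flag"),
    ("output_contract", "minor"): ("flag", "keep", "keep"),
    ("output_contract", "moderate"): ("drop", "repair", "keep"),
    ("output_contract", "major"): ("drop", "drop", "flag"),
}
NAMES = ("strict", "default", "permissive")


def tally(preset: str) -> Counter:
    """Count how often each action appears in one preset's column."""
    col = NAMES.index(preset)
    return Counter(row[col] for row in PRESETS.values())


for name in NAMES:
    print(name, dict(tally(name)))
```

Strict resolves most cells to drop, while permissive never drops and relies on keep and flag.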

When to Use Each Preset

Strict

Use when: Training data purity is critical. You’re fine-tuning for production deployment and can’t afford any stale data.

Behavior: Aggressively drops traces when changes are detected. Only keeps traces that are fully compatible. No automatic repair.

Trade-off: Maximum data quality, minimum dataset size.

Why the Defaults Are Set This Way

Tools: major → replay (not drop)

Most tool “removals” are actually renames or replacements. The user’s question is still valid — re-running it captures how the new agent handles the same request. That’s why the default preset maps tool_registry on_major to replay rather than drop. Teams that want stricter behavior can switch to the strict preset, where a major tool change drops instead.

Prompts: major → replay

A prompt rewrite changes agent behavior, not the domain. “How do I reset my password?” is still a valid question even if the agent’s personality, tone, and instructions changed completely. Re-running produces fresh training data aligned with the new instructions, so prompt_stack on_major defaults to replay.

Model: major → drop

Switching model providers (OpenAI → Anthropic) produces fundamentally different output distributions. Training model B on model A’s outputs is generally counterproductive — the response styles, reasoning patterns, and formatting differ too much. This is the one case where the old output is genuinely harmful to keep, so model_runtime on_major defaults to drop. (Distillation and synthetic traces are the exception — see the source-type overrides, which let those skip the model check.)

Output contract: major → drop

If the expected output format changed entirely (e.g., from plain text to structured JSON), old outputs can’t be used for training the new format. The data is structurally incompatible, so output_contract on_major defaults to drop. A moderate change (a field type shift) defaults to repair instead — those are often mechanically fixable.

Customizing Policies

The fastest way to set a policy is the dashboard. For automation, the same configuration is available over the REST API. (There is no decimalai policy CLI command and no Python SDK helper — policies are configured through the dashboard or the API below.)

Using Presets (Dashboard)

Navigate to your agent’s Policy tab to configure rules visually:

Select a preset as your starting point (strict, default, or permissive)
Adjust individual surface rules using the dropdown matrix
Preview the impact on your existing traces before saving
Save to apply the policy

Using Presets (API)

Create a policy for a specific agent by POSTing a preset name:

curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-strict",
    "preset": "strict"
  }'

The built-in presets are available at GET /api/v1/manifests/policies/presets, and the policy currently active for an agent is at GET /api/v1/manifests/policies/active?agent_name=support-agent.
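If you script policy creation from Python, the same request can be built with the standard library alone. This is a hedged sketch, not an SDK helper: `build_preset_request` is a hypothetical function, the endpoint and body fields are taken from the curl example above, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.decimal.ai/api/v1/manifests/policies"


def build_preset_request(api_key: str, project_id: str, agent_name: str,
                         name: str, preset: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"project_id": project_id, "agent_name": agent_name,
            "name": name, "preset": preset}
    return urllib.request.Request(
        API_BASE,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_preset_request("YOUR_API_KEY", "proj_...", "support-agent",
                           "support-strict", "strict")
# Send with urllib.request.urlopen(req) once real credentials are in place.
```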

Custom Per-Surface Rules

Override individual surfaces by passing rules_json instead of (or alongside) a preset. The keys are the manifest surface names; each maps severity (on_minor / on_moderate / on_major) to an action:

curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-custom",
    "preset": "custom",
    "rules_json": {
      "tool_registry":  {"on_minor": "keep", "on_moderate": "repair", "on_major": "drop"},
      "prompt_stack":   {"on_minor": "keep", "on_moderate": "keep",   "on_major": "replay"},
      "model_runtime":  {"on_minor": "keep", "on_moderate": "keep",   "on_major": "drop"}
    }
  }'

`rules_json` field reference

rules_json (object)

The full per-surface policy. Keys are surface names; omit a surface to inherit the named preset. One reserved key, source_type_overrides, is not a surface (see below).

surface key (one of the 10 compatibility surfaces)

Valid keys: prompt_stack, model_runtime, tool_registry, skill_registry, workflow, subagents, output_contract, guardrails, context_config, environment. An unknown surface key is logged and treated as flag rather than silently kept.

on_minor (enum, default: "keep")

Action when the diff engine classifies this surface’s change as minor (cosmetic, such as a typo fix or a temperature tweak). One of keep · repair · flag · replay · drop.

on_moderate (enum, default: "keep")

Action for a moderate (significant, possibly repairable) change, such as a renamed parameter or a moderately rewritten prompt. One of keep · repair · flag · replay · drop.

on_major (enum, default: "keep")

Action for a major (breaking) change, such as a removed tool, a provider swap, or a full prompt rewrite. One of keep · repair · flag · replay · drop.

source_type_overrides (reserved key, not a surface)

Per-source_type overrides that take precedence over the base surface rules for matching traces. The shape is { <source_type>: { <surface>: { on_minor, on_moderate, on_major } } }.

<source_type> (object)

A trace source type (production, distillation, synthetic, manual, test, …). Maps to a partial set of surface rules that override the base rules for traces of that source type.
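Because an unknown surface key is treated as flag rather than rejected, it can help to check a payload locally before sending it. The validator below is a minimal client-side sketch built only from the field reference above; `check_rules` is a hypothetical helper and does not reflect how the server validates.

```python
SURFACES = {"prompt_stack", "model_runtime", "tool_registry", "skill_registry",
            "workflow", "subagents", "output_contract", "guardrails",
            "context_config", "environment"}
ACTIONS = {"keep", "repair", "flag", "replay", "drop"}
SEVERITY_KEYS = {"on_minor", "on_moderate", "on_major"}


def check_rules(rules_json: dict) -> list[str]:
    """Return a list of problems found in a rules_json payload."""
    problems = []
    for key, rule in rules_json.items():
        if key == "source_type_overrides":
            # Each source type maps to its own partial set of surface rules.
            for source_type, overrides in rule.items():
                problems += [f"{source_type}: {p}" for p in check_rules(overrides)]
            continue
        if key not in SURFACES:
            problems.append(f"unknown surface {key!r}")
            continue
        for sev, action in rule.items():
            if sev not in SEVERITY_KEYS:
                problems.append(f"{key}: unknown severity key {sev!r}")
            elif action not in ACTIONS:
                problems.append(f"{key}.{sev}: unknown action {action!r}")
    return problems


print(check_rules({"tools": {"on_major": "drop"}}))
```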

Source-Type Overrides

Distillation and synthetic traces can skip model compatibility checks, since they were generated by a teacher model — the student model’s identity doesn’t matter. Add a source_type_overrides block to rules_json:

curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-distill",
    "preset": "default",
    "rules_json": {
      "source_type_overrides": {
        "distillation": {
          "model_runtime": {"on_minor": "keep", "on_moderate": "keep", "on_major": "keep"}
        }
      }
    }
  }'

All three built-in presets already ship distillation, synthetic, and manual source-type overrides for model_runtime (each set to keep on every severity) — pass your own block only when you need to widen or narrow them. To update an existing policy, PUT /api/v1/manifests/policies/{policy_id} with the same body shape.
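The precedence described above can be modeled as a two-step lookup. This sketch assumes an override applies per severity key and falls back to the base surface rule when the key is absent; `effective_action` is a hypothetical helper, and the base rule values are the default-preset model rules from this page.

```python
def effective_action(rules: dict, surface: str, severity: str,
                     source_type: str) -> str:
    """Look up an action, letting a source-type override win over the base rule."""
    key = f"on_{severity}"
    override = (rules.get("source_type_overrides", {})
                     .get(source_type, {})
                     .get(surface, {}))
    if key in override:
        return override[key]
    return rules[surface][key]


rules = {
    "model_runtime": {"on_minor": "keep", "on_moderate": "flag", "on_major": "drop"},
    "source_type_overrides": {
        "distillation": {
            "model_runtime": {"on_minor": "keep", "on_moderate": "keep", "on_major": "keep"}
        }
    },
}
print(effective_action(rules, "model_runtime", "major", "production"))    # drop
print(effective_action(rules, "model_runtime", "major", "distillation"))  # keep
```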

How It Works End-to-End

Detect — The SDK automatically captures your agent’s manifest (tools, prompts, model) on each run
Diff — When a new manifest is detected, DecimalAI computes a field-level diff against the previous version
Classify — Each trace is individually classified based on which components it actually used
Review — The Impact Report shows the distribution and lets you override individual verdicts
Act — Repair traces, replay stale ones, build clean datasets, or export replay prompts
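The Classify and Review steps can be pictured with a toy example. The sketch below assumes a hypothetical diff result for two tools and three traces with made-up IDs; it only illustrates the idea that a trace is judged by the components it actually used, and none of the names come from the DecimalAI SDK.

```python
from collections import Counter

# Hypothetical diff result: changed tool name -> severity.
changed_tools = {"refund_order": "major", "check_inventory": "moderate"}
# Default-preset tool rules from this page.
rules = {"minor": "keep", "moderate": "repair", "major": "replay"}
PRIORITY = ["drop", "replay", "flag", "repair", "keep"]


def classify(trace_tools: set[str]) -> str:
    """Verdict for one trace, based only on the tools it actually called."""
    actions = [rules[changed_tools[t]] for t in trace_tools if t in changed_tools]
    return min(actions, key=PRIORITY.index) if actions else "keep"


traces = {
    "t1": {"search_docs"},                      # untouched tool
    "t2": {"check_inventory"},                  # moderate change only
    "t3": {"check_inventory", "refund_order"},  # moderate and major
}
report = Counter(classify(tools) for tools in traces.values())
print(dict(report))
```

A trace that never called a changed tool is kept even though the agent's manifest changed.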

Next Steps

Manifests

How manifests get captured and what’s in the diff.

Versioning concepts

Component types, severity, verdicts.

Regression Check

Get the analysis as a PR comment.