
Overview

When you update your agent — change a tool, rewrite a prompt, swap a model — DecimalAI automatically detects the change and classifies every existing trace into one of five actions. This classification determines which traces stay in your training datasets, which need repair, which should be re-run, which need a human look, and which are no longer valid. The compatibility policy controls how changes at each severity level map to actions. You can use built-in presets or create custom rules per agent.
Severity and action are two different things. Severity answers “how much did this surface change?” (minor / moderate / major). Action answers “what do we do with the trace?” (keep / repair / flag / replay / drop). The compatibility policy is the lookup table from one to the other — and it’s the only place the mapping lives. The same major tool change becomes replay under the default preset but drop under strict.

The Five Actions

| Action | What it means | What happens to the trace | Cost |
|---|---|---|---|
| Keep | Trace is fully compatible with the new agent version | Stays in training datasets as-is | Free |
| Repair 🔧 | Trace can be fixed with a deterministic data migration | Tool calls are rewritten to match new schema (e.g., rename param) | Free (no LLM) |
| Flag 🚩 | Trace might be fine, but the change is ambiguous enough to want a human look | Stays in datasets, but is marked for review in the Impact Report; no automatic action is taken | Free |
| Replay 🔄 | Trace input is still valuable, but output is stale | Original prompt is re-run through the new agent to get fresh output | LLM cost |
| Drop | Trace is incompatible: both input and output are invalid | Excluded from datasets entirely | Free |
flag is a real action, not a label. It’s the “don’t decide automatically” verdict: the trace stays usable, but it’s surfaced in the Impact Report so a person can choose keep, replay, or drop for it. The permissive preset leans on flag heavily: it never auto-drops, it only flags major changes for review. When more than one action applies to the same trace, the highest-priority one wins: drop > replay > flag > repair > keep.
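The priority order can be sketched as a small function. This is an illustration only: `PRIORITY` and `resolve_action` are hypothetical names rather than DecimalAI APIs, and the sketch assumes a trace can collect several candidate actions (for example, one per changed surface).

```python
# Hypothetical sketch: pick the winning action when a trace collects
# several candidate actions (for example, one per changed surface).
PRIORITY = ["drop", "replay", "flag", "repair", "keep"]  # highest first


def resolve_action(candidates):
    """Return the highest-priority action; default to keep when nothing changed."""
    if not candidates:
        return "keep"
    unknown = set(candidates) - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return min(candidates, key=PRIORITY.index)


print(resolve_action(["repair", "flag"]))          # flag
print(resolve_action(["keep", "replay", "drop"]))  # drop
```

Using list position as the sort key keeps the ordering in one place, so a change to the priority list is the only edit needed.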
[Figure: Compatibility policy configuration showing rules per component type]
The key question is: is the trace’s input still meaningful?
  • If yes → Replay (re-run it through the new agent)
  • If no → Drop (discard it entirely)

When Each Action Applies

The tables below show the raw severity the diff engine assigns to each kind of change, plus the action the default preset resolves it to. These are the before-policy defaults — switching presets or overriding a surface changes the action column (see The Three Presets), but the severity classification in the “Scenario” rows stays the same.

Tool Changes

Tools are the most common source of agent changes. Here’s how different types of tool changes are classified (default preset):
| Scenario | Example | Severity | Default Action | Why |
|---|---|---|---|---|
| Tool unchanged | search_docs is identical in v1 and v2 | none | Keep | Nothing changed, so the trace is still valid |
| Optional parameter added | check_inventory gains optional region param | minor | Keep | Old calls still work as-is; nothing to rewrite |
| Parameter renamed | product_id → item_id | moderate | Repair | Deterministic rewrite: find-and-replace in trace records |
| Parameter removed | check_inventory no longer accepts verbose flag | moderate | Repair | Strip the removed param from stored tool calls |
| Description changed | Tool description updated but schema identical | minor | Keep | No structural impact on stored calls |
| Required parameter added | refund_order now requires reason field | major | Replay | Old calls are missing a required field; re-run to capture it correctly |
| Type changed | quantity changed from string to integer | moderate | Repair | Cast the stored value to the new type during the data migration |
| Tool renamed | check_inventory → lookup_stock (new name, same capability) | major | Replay ¹ | Capability still exists under a new name; re-run captures the new tool usage |
| Tool removed entirely | get_pricing deleted from agent | major | Replay | Default replays major tool changes; override to drop if the capability is truly gone |
¹ Tool renames appear as a removal + addition to the compatibility engine — both land in the tool_registry surface as a major change. Under the default preset that resolves to Replay (re-running picks up the new tool name). Under the strict preset a major tool change resolves to Drop instead; switch presets or override individual traces in the Impact Report.
These are tool_registry surface actions under the default preset: minor → keep, moderate → repair, major → replay. The per-scenario rows above just show which severity bucket each kind of change lands in.
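To make the Repair rows concrete, here is a minimal sketch of the three moderate changes (rename, remove, type change) applied as deterministic rewrites. The stored tool-call shape (`name` plus `arguments`) and the `repair_tool_call` helper are assumptions for illustration, not DecimalAI's actual storage format or migration code.

```python
# Hypothetical shape for a stored tool call: {"name": str, "arguments": dict}.
# The three moderate changes from the table, applied as deterministic rewrites.

def repair_tool_call(call, renames=None, removed=(), casts=None):
    """Return a repaired copy of the call; the original is left untouched."""
    renames = renames or {}
    casts = casts or {}
    args = {}
    for key, value in call["arguments"].items():
        if key in removed:
            continue                      # parameter removed: strip it
        key = renames.get(key, key)       # parameter renamed: rewrite the key
        if key in casts:
            value = casts[key](value)     # type changed: cast the stored value
        args[key] = value
    return {"name": call["name"], "arguments": args}


old_call = {
    "name": "check_inventory",
    "arguments": {"product_id": "SKU-1234", "quantity": "3", "verbose": True},
}
new_call = repair_tool_call(
    old_call,
    renames={"product_id": "item_id"},
    removed={"verbose"},
    casts={"quantity": int},
)
print(new_call)
# {'name': 'check_inventory', 'arguments': {'item_id': 'SKU-1234', 'quantity': 3}}
```

No LLM call is involved, which is why Repair is listed as free. In this sketch a cast that raises (for example `int("three")`) would simply propagate the error; how such a trace is handled in practice is not covered here.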

Prompt Changes

Prompt changes are classified by the text diff percentage between old and new versions (prompt_stack surface, default preset: minor → keep, moderate → flag, major → replay):
| Scenario | Diff Threshold | Severity | Default Action | Why |
|---|---|---|---|---|
| Typo fix | ≤5% diff | minor | Keep | Negligible impact on agent behavior |
| Paragraph added | 5–30% diff | moderate | Flag | Minor behavioral shift; a human should review, but data is likely still valid |
| Complete rewrite | >30% diff | major | Replay | Agent behavior fundamentally changed; old outputs don’t reflect new instructions |
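The thresholds above can be sketched as a two-step function. How DecimalAI actually computes the diff percentage is not stated on this page, so `diff_percent` below uses Python's `difflib.SequenceMatcher` purely as a stand-in metric; both function names are hypothetical.

```python
import difflib


def diff_percent(old, new):
    """Approximate text diff as a percentage (stand-in metric, an assumption)."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return (1.0 - ratio) * 100.0


def prompt_severity(percent):
    """Map a diff percentage to the severity buckets from the table."""
    if percent <= 5:
        return "minor"
    if percent <= 30:
        return "moderate"
    return "major"


old_prompt = "You are a helpful support agent."
new_prompt = "You are a helpful support agent!"
print(prompt_severity(diff_percent(old_prompt, new_prompt)))
```

A different similarity metric (token-level, line-level) would move individual prompts between buckets, so treat the output as indicative rather than as what the diff engine reports.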

Model Changes

model_runtime surface, default preset: minor → keep, moderate → flag, major → drop.
| Scenario | Example | Severity | Default Action | Why |
|---|---|---|---|---|
| Config tweak | Temperature 0.7 → 0.8 | minor | Keep | Same model, minor sampling change |
| Version bump (same family) | gpt-4o-2024-05 → gpt-4o-2024-08 | moderate | Flag | Same family, likely similar, but worth a human look |
| Different model (same provider) | gpt-4 → gpt-4o | major | Drop | Different model architecture; output distributions differ significantly |
| Provider change | gpt-4o → claude-sonnet-4-6 | major | Drop | Response distributions are fundamentally different; outputs from one model shouldn’t train another |

Drop vs Replay — The Decision Guide

The distinction between Drop and Replay is the most important decision in the policy. Here’s a simple framework:
| Question | If yes → | If no → |
|---|---|---|
| Is the trace’s input (user question) still meaningful for the new agent? | Replay | Drop |
| Could the new agent handle this request successfully? | Replay | Drop |
| Does the capability still exist in some form (renamed, moved, or replaced)? | Replay | Drop |
| Was the tool/feature permanently removed with no replacement? | Drop | Replay |
Drop means the question is worthless: not just the output, but the entire conversation has no value for training the new agent.
Replay means the question is still good, but the answer is stale: re-running it through the new agent will produce a fresh, valid output.

Examples

Replay: You renamed check_inventory to lookup_stock. A trace asking “Is SKU-1234 in stock?” is still a perfectly valid customer question, and the new agent can answer it using lookup_stock. Re-run the trace to capture the new tool usage.
Drop: You removed the get_pricing tool because pricing is now handled by a separate microservice your agent doesn’t access. A trace asking “What’s the price of SKU-1234?” can’t be answered by the new agent because the capability is gone. Drop the trace.

The Three Presets

DecimalAI ships with three policy presets. Each preset configures how severity levels map to actions for every surface:

Preset Comparison

| Surface | Severity | Strict | Default | Permissive |
|---|---|---|---|---|
| Tools | Minor (optional param, description) | Keep | Keep | Keep |
| Tools | Moderate (schema change, repairable) | Drop | Repair | Keep |
| Tools | Major (tool removed, required param added) | Drop | Replay | Flag |
| Prompts | Minor (≤5% diff) | Flag | Keep | Keep |
| Prompts | Moderate (5–30% diff) | Drop | Flag | Keep |
| Prompts | Major (>30% diff) | Drop | Replay | Flag |
| Model | Minor (config tweak) | Keep | Keep | Keep |
| Model | Moderate (version bump) | Drop | Flag | Keep |
| Model | Major (different model/provider) | Drop | Drop | Flag |
| Output Contract | Minor | Flag | Keep | Keep |
| Output Contract | Moderate | Drop | Repair | Keep |
| Output Contract | Major | Drop | Drop | Flag |
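Because the policy is a pure lookup from (surface, severity) to action, the comparison table can be written down as nested dictionaries. The values below are transcribed from the table above; the dictionary layout and the `lookup` helper are illustrative only and do not represent the API's response format.

```python
# The preset comparison table as a lookup: preset -> surface -> severity -> action.
# Values are transcribed from the table; the layout itself is illustrative.
PRESETS = {
    "strict": {
        "tool_registry":   {"minor": "keep", "moderate": "drop", "major": "drop"},
        "prompt_stack":    {"minor": "flag", "moderate": "drop", "major": "drop"},
        "model_runtime":   {"minor": "keep", "moderate": "drop", "major": "drop"},
        "output_contract": {"minor": "flag", "moderate": "drop", "major": "drop"},
    },
    "default": {
        "tool_registry":   {"minor": "keep", "moderate": "repair", "major": "replay"},
        "prompt_stack":    {"minor": "keep", "moderate": "flag",   "major": "replay"},
        "model_runtime":   {"minor": "keep", "moderate": "flag",   "major": "drop"},
        "output_contract": {"minor": "keep", "moderate": "repair", "major": "drop"},
    },
    "permissive": {
        "tool_registry":   {"minor": "keep", "moderate": "keep", "major": "flag"},
        "prompt_stack":    {"minor": "keep", "moderate": "keep", "major": "flag"},
        "model_runtime":   {"minor": "keep", "moderate": "keep", "major": "flag"},
        "output_contract": {"minor": "keep", "moderate": "keep", "major": "flag"},
    },
}


def lookup(preset, surface, severity):
    """Resolve one (surface, severity) pair to an action under a preset."""
    return PRESETS[preset][surface][severity]


# The same major tool change resolves differently under each preset.
for preset in PRESETS:
    print(preset, lookup(preset, "tool_registry", "major"))
# strict drop
# default replay
# permissive flag
```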

When to Use Each Preset

Strict
  • Use when: Training data purity is critical. You’re fine-tuning for production deployment and can’t afford any stale data.
  • Behavior: Aggressively drops traces when changes are detected. Only keeps traces that are fully compatible. No automatic repair.
  • Trade-off: Maximum data quality, minimum dataset size.

Why the Defaults Are Set This Way

Most tool “removals” are actually renames or replacements. The user’s question is still valid — re-running it captures how the new agent handles the same request. That’s why the default preset maps tool_registry on_major to replay rather than drop. Teams that want stricter behavior can switch to the strict preset, where a major tool change drops instead.
A prompt rewrite changes agent behavior, not the domain. “How do I reset my password?” is still a valid question even if the agent’s personality, tone, and instructions changed completely. Re-running produces fresh training data aligned with the new instructions, so prompt_stack on_major defaults to replay.
Switching model providers (OpenAI → Anthropic) produces fundamentally different output distributions. Training model B on model A’s outputs is generally counterproductive — the response styles, reasoning patterns, and formatting differ too much. This is the one case where the old output is genuinely harmful to keep, so model_runtime on_major defaults to drop. (Distillation and synthetic traces are the exception — see the source-type overrides, which let those skip the model check.)
If the expected output format changed entirely (e.g., from plain text to structured JSON), old outputs can’t be used for training the new format. The data is structurally incompatible, so output_contract on_major defaults to drop. A moderate change (a field type shift) defaults to repair instead — those are often mechanically fixable.

Customizing Policies

The fastest way to set a policy is the dashboard. For automation, the same configuration is available over the REST API. (There is no decimalai policy CLI command and no Python SDK helper — policies are configured through the dashboard or the API below.)

Using Presets (Dashboard)

Navigate to your agent’s Policy tab to configure rules visually:
  1. Select a preset as your starting point (strict, default, or permissive)
  2. Adjust individual surface rules using the dropdown matrix
  3. Preview the impact on your existing traces before saving
  4. Save to apply the policy

Using Presets (API)

Create a policy for a specific agent by POSTing a preset name:
curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-strict",
    "preset": "strict"
  }'
The built-in presets are available at GET /api/v1/manifests/policies/presets, and the policy currently active for an agent is at GET /api/v1/manifests/policies/active?agent_name=support-agent.
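For scripts that prefer Python over curl, the two GET endpoints can be called with the standard library alone. This is a sketch rather than an SDK: `build_get` and `fetch_json` are hypothetical helpers, the key is assumed to live in an `API_KEY` environment variable as in the curl example, and the response is assumed to be JSON (its shape is not documented here).

```python
import json
import os
import urllib.parse
import urllib.request

BASE_URL = "https://api.decimal.ai/api/v1/manifests/policies"


def build_get(path, params=None, api_key=None):
    """Build (but do not send) an authenticated GET request."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    key = api_key or os.environ.get("API_KEY", "")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {key}"}, method="GET"
    )


def fetch_json(request):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


presets_req = build_get("/presets")
active_req = build_get("/active", {"agent_name": "support-agent"})
print(active_req.full_url)
# https://api.decimal.ai/api/v1/manifests/policies/active?agent_name=support-agent
```

Calling `fetch_json(active_req)` performs the actual network request; it is left out of the example so the snippet runs without credentials.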

Custom Per-Surface Rules

Override individual surfaces by passing rules_json instead of (or alongside) a preset. The keys are the manifest surface names; each maps severity (on_minor / on_moderate / on_major) to an action:
curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-custom",
    "preset": "custom",
    "rules_json": {
      "tool_registry":  {"on_minor": "keep", "on_moderate": "repair", "on_major": "drop"},
      "prompt_stack":   {"on_minor": "keep", "on_moderate": "keep",   "on_major": "replay"},
      "model_runtime":  {"on_minor": "keep", "on_moderate": "keep",   "on_major": "drop"}
    }
  }'
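Before POSTing a custom policy, it can help to check the rules locally for misspelled keys or actions. The validator below is a hypothetical client-side sketch, not something the API provides; it only knows the four surface names, three severity keys, and five actions mentioned on this page, and the server may accept more.

```python
# Names taken from this page; the API may accept additional surfaces.
SURFACES = {"tool_registry", "prompt_stack", "model_runtime", "output_contract"}
SEVERITY_KEYS = {"on_minor", "on_moderate", "on_major"}
ACTIONS = {"keep", "repair", "flag", "replay", "drop"}


def validate_rules(rules_json):
    """Return a list of problems found in a rules_json dict (empty if none)."""
    problems = []
    for surface, rule in rules_json.items():
        if surface == "source_type_overrides":
            continue  # reserved key, not a surface
        if surface not in SURFACES:
            problems.append(f"unknown surface: {surface}")
            continue
        for key, action in rule.items():
            if key not in SEVERITY_KEYS:
                problems.append(f"{surface}: unknown severity key {key}")
            elif action not in ACTIONS:
                problems.append(f"{surface}.{key}: unknown action {action}")
    return problems


rules = {
    "tool_registry": {"on_minor": "keep", "on_moderate": "repair", "on_major": "drop"},
    "prompt_stack":  {"on_minor": "keep", "on_moderate": "keep",   "on_major": "replay"},
    "model_runtime": {"on_minor": "keep", "on_moderate": "keep",   "on_major": "drop"},
}
print(validate_rules(rules))                                     # []
print(validate_rules({"tool_registry": {"on_major": "delete"}}))
# ['tool_registry.on_major: unknown action delete']
```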

rules_json field reference

rules_json (object): The full per-surface policy. Keys are surface names; omit a surface to inherit the named preset. One reserved key, source_type_overrides, is not a surface (see below).

Source-Type Overrides

Distillation and synthetic traces can skip model compatibility checks, since they were generated by a teacher model — the student model’s identity doesn’t matter. Add a source_type_overrides block to rules_json:
curl -X POST https://api.decimal.ai/api/v1/manifests/policies \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_...",
    "agent_name": "support-agent",
    "name": "support-distill",
    "preset": "default",
    "rules_json": {
      "source_type_overrides": {
        "distillation": {
          "model_runtime": {"on_minor": "keep", "on_moderate": "keep", "on_major": "keep"}
        }
      }
    }
  }'
All three built-in presets already ship distillation, synthetic, and manual source-type overrides for model_runtime (each set to keep on every severity) — pass your own block only when you need to widen or narrow them. To update an existing policy, PUT /api/v1/manifests/policies/{policy_id} with the same body shape.
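One way to picture how an override interacts with the base rule is a lookup that checks the trace's source type first. This sketch assumes an override replaces the whole surface rule for matching traces; the actual resolution logic is not described here, and `effective_rule` and the `"production"` source type are hypothetical.

```python
def effective_rule(rules_json, surface, source_type):
    """Return the rule for a surface, preferring a source-type override if present."""
    overrides = rules_json.get("source_type_overrides", {})
    override = overrides.get(source_type, {}).get(surface)
    if override is not None:
        return override
    return rules_json[surface]


rules_json = {
    "model_runtime": {"on_minor": "keep", "on_moderate": "flag", "on_major": "drop"},
    "source_type_overrides": {
        "distillation": {
            "model_runtime": {"on_minor": "keep", "on_moderate": "keep", "on_major": "keep"}
        }
    },
}

# A distillation trace skips the model check; any other source type does not.
print(effective_rule(rules_json, "model_runtime", "distillation")["on_major"])  # keep
print(effective_rule(rules_json, "model_runtime", "production")["on_major"])    # drop
```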

How It Works End-to-End

  1. Detect — The SDK automatically captures your agent’s manifest (tools, prompts, model) on each run
  2. Diff — When a new manifest is detected, DecimalAI computes a field-level diff against the previous version
  3. Classify — Each trace is individually classified based on which components it actually used
  4. Review — The Impact Report shows the distribution and lets you override individual verdicts
  5. Act — Repair traces, replay stale ones, build clean datasets, or export replay prompts
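The Classify step is the one most easily misread: a change only affects traces that used the changed component. The sketch below illustrates that idea for tools under the default preset. The trace and diff shapes, and the `classify_trace` helper, are assumptions made for the example and not the real data model.

```python
from collections import Counter

PRIORITY = ["drop", "replay", "flag", "repair", "keep"]  # highest first
# tool_registry rule under the default preset, as listed on this page.
TOOL_RULE = {"minor": "keep", "moderate": "repair", "major": "replay"}


def classify_trace(tools_used, tool_changes):
    """Classify one trace using only the changes to tools it actually called."""
    actions = [TOOL_RULE[tool_changes[name]]
               for name in tools_used if name in tool_changes]
    return min(actions, key=PRIORITY.index) if actions else "keep"


# Hypothetical diff result: changed tool -> severity.
tool_changes = {"check_inventory": "moderate", "get_pricing": "major"}
# Hypothetical traces: trace id -> tools the trace called.
traces = {
    "t1": ["search_docs"],
    "t2": ["check_inventory"],
    "t3": ["check_inventory", "get_pricing"],
}

verdicts = {tid: classify_trace(tools, tool_changes) for tid, tools in traces.items()}
print(verdicts)  # {'t1': 'keep', 't2': 'repair', 't3': 'replay'}
print(Counter(verdicts.values()))  # distribution, as an Impact Report might summarize it
```

Trace t1 never called a changed tool, so it is kept even though the agent version changed.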

Next Steps

  • Manifests: How manifests get captured and what’s in the diff.
  • Versioning concepts: Component types, severity, verdicts.
  • Regression Check: Get the analysis as a PR comment.