Replay API - DecimalAI

The Replay API runs historical traces back through your agent under a different manifest version. Use it to confirm empirically whether a change breaks the behavior the regression check flagged as risky, or to recover traces that the pre-deploy check marked medium_risk, where structural reasoning can only say “might differ”.

When to use replay

Confirm a structural prediction

The regression check returned medium_risk (for example, after a model swap or a prompt rewrite). Replay runs the affected traces through the new manifest so you can compare the new outputs side by side with the originals.
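As a rough illustration of the side-by-side comparison, the sketch below diffs an original output against its replayed counterpart using Python's standard `difflib`. The function name `output_diff` is hypothetical, and the sketch assumes you have already obtained both outputs as plain strings; it is not part of the Replay API.

```python
import difflib


def output_diff(original: str, replayed: str) -> str:
    """Return a unified diff between an original output and its replay.

    An empty string means the two outputs are identical line for line.
    """
    diff_lines = difflib.unified_diff(
        original.splitlines(),
        replayed.splitlines(),
        fromfile="original",
        tofile="replayed",
        lineterm="",
    )
    return "\n".join(diff_lines)
```

An empty diff is a cheap first signal that the new manifest did not change a given trace; a non-empty one still needs a human or an evaluator to judge whether the difference matters.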

Reproduce a flaky bug

A trace failed in production. Replay it against the same manifest to see if the failure is deterministic, then against your fix branch to verify it’s resolved.
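One way to reason about "is the failure deterministic" is to replay the same trace several times and classify the pattern of outcomes. The sketch below is only that classification step. The function name and the three labels are hypothetical, and how you collect a per-run pass/fail flag depends on your own tooling.

```python
def classify_failure(failed_runs: list[bool]) -> str:
    """Classify repeated replays of one trace against the same manifest.

    failed_runs holds one flag per replay: True if that run failed.
    """
    if not failed_runs:
        raise ValueError("need at least one replay outcome")
    failures = sum(failed_runs)
    if failures == len(failed_runs):
        return "deterministic"   # fails every time: a reproducible bug
    if failures == 0:
        return "not_reproduced"  # never fails: likely environment-specific
    return "flaky"               # fails only sometimes
```

A "flaky" result suggests that a single passing replay against the fix branch is weak evidence, so several runs against the fix are worth doing as well.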

Build training data from drift

Replay flagged-for-repair traces against a known-good manifest, then export the results as JSONL for supervised fine-tuning (SFT). This is the bridge between trace history and the Datasets API.
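To make the JSONL step concrete, here is a minimal sketch that parses an exported JSONL body and reshapes each record into a prompt/completion pair. The field names `prompt` and `output` are assumptions for illustration only; check the actual export schema before relying on them.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse a JSONL body into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def to_sft_pairs(records, prompt_key="prompt", output_key="output"):
    """Keep only the two fields an SFT dataset needs.

    prompt_key and output_key are hypothetical field names.
    """
    return [
        {"prompt": rec[prompt_key], "completion": rec[output_key]}
        for rec in records
    ]


def dump_jsonl(records) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)
```

Usage would look like `dump_jsonl(to_sft_pairs(parse_jsonl(body)))`, where `body` is the text returned by the export endpoint.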

Re-score with a new evaluator

Add a new @eval function. Replay traces under the same manifest to re-score them without re-running the agent.

Lifecycle

Endpoints at a glance

Method	Path	Purpose
`POST`	`/api/v1/replay/batches`	Create a new replay batch (selects which traces to run)
`GET`	`/api/v1/replay/batches/{batch_id}`	Track progress + retrieve aggregate results
`GET`	`/api/v1/replay/export`	Export trace prompts or replay results as JSONL
`POST`	`/api/v1/replay/tasks/{task_id}/submit`	Submit a single task’s result (called by replay workers)
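The four paths above can be wrapped in small URL helpers so that identifiers are always escaped consistently. This is a sketch, not an official client: the helper names are hypothetical, and the base URL is taken from the quick start on this page.

```python
from urllib.parse import quote

BASE_URL = "https://api.decimal.ai"  # host used in the quick start


def batches_url() -> str:
    """POST here to create a replay batch."""
    return f"{BASE_URL}/api/v1/replay/batches"


def batch_url(batch_id: str) -> str:
    """GET here to track progress and read aggregate results."""
    return f"{batches_url()}/{quote(batch_id, safe='')}"


def export_url() -> str:
    """GET here to export trace prompts or replay results as JSONL."""
    return f"{BASE_URL}/api/v1/replay/export"


def task_submit_url(task_id: str) -> str:
    """POST here to submit a single task's result (replay workers)."""
    return f"{BASE_URL}/api/v1/replay/tasks/{quote(task_id, safe='')}/submit"
```

Percent-encoding the identifiers with `safe=''` keeps an unexpected `/` in an ID from silently changing which path is requested.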

Quick start

import time

import httpx

BASE_URL = "https://api.decimal.ai/api/v1/replay"
HEADERS = {"Authorization": "Bearer dai_sk_..."}

# 1. Create a replay batch: every "drop"-verdict trace from the last 7 days,
#    re-run against manifest v4.
resp = httpx.post(
    f"{BASE_URL}/batches",
    headers=HEADERS,
    json={
        "agent_name": "support-agent",
        "target_manifest_id": "mfst_v4_abc",
        "trace_filter": {"eval_verdict": "drop", "since_days": 7},
    },
)
resp.raise_for_status()  # surface auth or validation errors immediately
batch_id = resp.json()["batch_id"]

# 2. Poll every 5 seconds until the batch reports "completed"
while True:
    poll = httpx.get(f"{BASE_URL}/batches/{batch_id}", headers=HEADERS)
    poll.raise_for_status()
    status = poll.json()
    if status["status"] == "completed":
        break
    time.sleep(5)

# 3. Print the aggregate results
print(f"{status['kept']} kept · {status['repaired']} repaired · {status['still_failing']} still failing")
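The polling loop in the quick start waits indefinitely if a batch never reaches "completed". A hedged variant with a deadline is sketched below. The function name `wait_for_batch` and its parameters are hypothetical, and the status fetch is passed in as a callable so that the sketch makes no assumptions about the HTTP client.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the batch is completed or time runs out.

    fetch_status should return the parsed JSON body of
    GET /api/v1/replay/batches/{batch_id}.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch not completed after {timeout_s} seconds")
        sleep(interval_s)
```

With `httpx`, `fetch_status` could be a small lambda that performs the GET request and returns `.json()`. Injecting `sleep` and `clock` also makes the loop easy to exercise in unit tests without real waiting.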

Related

Replay Guide: when to replay vs. when to repair
Regression Check: the pre-deploy companion that flags candidates for replay
Datasets API: export replay results as training data
