Replay re-runs historical traces against a new manifest: pick a set of inputs, create one batch against a target manifest, and aggregate the raw outputs. The four functions below make up the Python SDK’s replay surface. There is no experiment() function in the Python SDK, and the generic /api/v1/experiments endpoints have been removed. To compare two variants on the same inputs, use the skill Benchmark (with-skill vs without-skill) and the Versions diff surfaces; see skillevaluation. See the Replay guide for when to use which.

decimalai.get_replay_prompts()

Download prompts that need to be re-run after a manifest change.
import decimalai

result = decimalai.get_replay_prompts("my-agent")

print(f"Total stale prompts: {result['total']}")
for prompt in result["prompts"]:
    print(f"  Trace {prompt['trace_id']}: {prompt['user_input'][:80]}...")
agent_name (str, required): Agent name to get replay prompts for.
verdict (str, default: "replay + drop"): Filter by compatibility verdict, that is, what to do with the trace: "keep", "repair", "replay", or "drop". (This is the keep/repair/replay/drop axis, not the eval pass/fail axis.)
limit (int, default: 500): Maximum number of prompts to return (at most 5000).
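
The prompts returned here are the natural input to create_replay_batch(). A minimal sketch of that hand-off follows, assuming the result has the shape printed in the example above (a "prompts" list whose items carry a "trace_id"). The helper name batch_args and the sample data are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, List


def trace_ids_from_prompts(result: Dict[str, Any]) -> List[str]:
    """Collect unique trace IDs from a get_replay_prompts()-style result, keeping order."""
    seen = set()
    ids: List[str] = []
    for prompt in result.get("prompts", []):
        trace_id = prompt["trace_id"]
        if trace_id not in seen:
            seen.add(trace_id)
            ids.append(trace_id)
    return ids


def batch_args(result: Dict[str, Any], source_manifest_id: str,
               target_manifest_id: str) -> Dict[str, Any]:
    """Build keyword arguments matching the create_replay_batch() example."""
    return {
        "source_manifest_id": source_manifest_id,
        "target_manifest_id": target_manifest_id,
        "trace_ids": trace_ids_from_prompts(result),
    }


# Hypothetical sample data shaped like the result used in the example above.
sample = {
    "total": 3,
    "prompts": [
        {"trace_id": "trace_1", "user_input": "Summarize the Q3 report"},
        {"trace_id": "trace_2", "user_input": "Translate this email"},
        {"trace_id": "trace_1", "user_input": "Summarize the Q3 report"},
    ],
}

args = batch_args(sample, "mfst_old", "mfst_new")
print(args["trace_ids"])
```

With the SDK available, the resulting dictionary could be passed on as decimalai.create_replay_batch(**args). Whether the service tolerates duplicate trace IDs is not stated here, so the sketch removes them defensively.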

decimalai.create_replay_batch()

Create a batch of replay tasks.
batch = decimalai.create_replay_batch(
    source_manifest_id="mfst_old",
    target_manifest_id="mfst_new",
    trace_ids=["trace_1", "trace_2", "trace_3"],
)
print(f"Batch {batch['batch_id']}: {batch['total_tasks']} tasks")

decimalai.get_replay_batch()

Check replay batch progress.
batch = decimalai.get_replay_batch("batch_abc")
print(f"Status: {batch['batch_status']}")
print(f"Progress: {batch['completed']}/{batch['total']}")
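
A batch is usually checked more than once, so a small polling helper can be handy. The sketch below is one way to do it, assuming a batch is finished once "completed" reaches "total"; the possible values of "batch_status" are not listed on this page, so the helper does not rely on them. The function wait_for_batch and the stand-in fake_fetch are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_batch(
    fetch: Callable[[str], Dict[str, Any]],
    batch_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Dict[str, Any]:
    """Poll fetch(batch_id) until completed >= total, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        batch = fetch(batch_id)
        # Assumption: every task being counted as completed means the batch is done.
        if batch["completed"] >= batch["total"]:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} still at {batch['completed']}/{batch['total']}"
            )
        time.sleep(poll_seconds)


# Stand-in for decimalai.get_replay_batch so the sketch runs offline.
_responses = iter([
    {"completed": 1, "total": 3},
    {"completed": 3, "total": 3},
])


def fake_fetch(batch_id: str) -> Dict[str, Any]:
    return next(_responses)


final = wait_for_batch(fake_fetch, "batch_abc", poll_seconds=0.0)
print(f"Progress: {final['completed']}/{final['total']}")
```

In real use the first argument would be decimalai.get_replay_batch itself, for example wait_for_batch(decimalai.get_replay_batch, "batch_abc").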

decimalai.submit_replay_result()

Submit the result of a replayed trace.
decimalai.submit_replay_result(
    task_id="task_xyz",
    replayed_trace_id="new_trace_id",
    eval_score=0.95,
    eval_verdict="pass",  # eval pass/fail — NOT the keep/repair/replay/drop verdict
)
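
How a score becomes an eval verdict is left to the caller. The sketch below shows one possible approach: it derives "pass" or "fail" from a threshold, builds the keyword arguments used in the example above, and aggregates a list of such results. The 0.8 threshold, the "fail" label (inferred from the pass/fail wording on this page), and the helper names are assumptions rather than SDK behaviour.

```python
from statistics import mean
from typing import Any, Dict, List

PASS_THRESHOLD = 0.8  # hypothetical cut-off; pick one that matches your evaluator


def result_kwargs(task_id: str, replayed_trace_id: str, eval_score: float,
                  threshold: float = PASS_THRESHOLD) -> Dict[str, Any]:
    """Build keyword arguments shaped like the submit_replay_result() example."""
    return {
        "task_id": task_id,
        "replayed_trace_id": replayed_trace_id,
        "eval_score": eval_score,
        # Eval pass/fail axis only; unrelated to keep/repair/replay/drop.
        "eval_verdict": "pass" if eval_score >= threshold else "fail",
    }


def summarize(results: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate raw replay outcomes into a count, pass rate, and mean score."""
    if not results:
        return {"count": 0, "pass_rate": 0.0, "mean_score": 0.0}
    passed = sum(1 for r in results if r["eval_verdict"] == "pass")
    return {
        "count": len(results),
        "pass_rate": passed / len(results),
        "mean_score": mean(r["eval_score"] for r in results),
    }


# Hypothetical task and trace IDs for illustration.
results = [
    result_kwargs("task_1", "new_trace_1", 0.95),
    result_kwargs("task_2", "new_trace_2", 0.50),
]
summary = summarize(results)
print(summary)
```

Each dictionary could then be submitted with decimalai.submit_replay_result(**kwargs), while the summary stays local for reporting.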

A/B comparison (not in the SDK)

There is no decimalai.experiment() function, and the generic /api/v1/experiments endpoints have been removed. To compare two variants on the same inputs, use the skill Benchmark (with-skill vs without-skill) and the Versions diff surfaces — see skillevaluation and the Replay guide.

What’s next

Datasets

Export replay results as JSONL or push to HuggingFace.

Replay guide

When to replay vs. when to repair — conceptual model.