Replay & Experiments

Replay re-runs historical traces against a new manifest: pick a set of inputs, create one batch against a target manifest, and aggregate the raw outputs. The four functions below make up the Python SDK’s replay surface. Comparing two variants on the same inputs is not part of this surface. See A/B comparison (not in the SDK) below, and the Replay guide for when to use which.

`decimalai.get_replay_prompts()`

Download prompts that need to be re-run after a manifest change.

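```python
import decimalai

result = decimalai.get_replay_prompts("my-agent")

print(f"Total stale prompts: {result['total']}")
# Show each stale trace with the first 80 characters of its original input.
for prompt in result["prompts"]:
    print(f"  Trace {prompt['trace_id']}: {prompt['user_input'][:80]}...")
```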

`agent_name` (str, required): Agent name to get replay prompts for.

`verdict` (str, default: "replay + drop"): Filter by compatibility verdict, i.e. what to do with the trace: "keep", "repair", "replay", or "drop". This is the keep/repair/replay/drop axis, not the eval pass/fail axis.

`limit` (int, default: 500): Maximum number of prompts to return (at most 5000).
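Because `limit` caps how many prompts come back, it can help to check whether a result was truncated before building a batch from it. The sketch below is a hypothetical helper and is not part of the SDK. It works on the dictionary shape used in the example above: `total`, plus `prompts` entries that carry a `trace_id`. It assumes that `total` counts every matching prompt even when `prompts` is capped by `limit`.

```python
from typing import Any


def stale_trace_ids(result: dict[str, Any]) -> tuple[list[str], bool]:
    """Return unique trace IDs from a get_replay_prompts() result, in order,
    plus a flag that is True when fewer prompts came back than `total`.

    Hypothetical helper (not in the SDK). It assumes `total` counts every
    matching prompt while `prompts` is capped by the `limit` argument.
    """
    seen: set[str] = set()
    trace_ids: list[str] = []
    for prompt in result["prompts"]:
        trace_id = prompt["trace_id"]
        if trace_id not in seen:  # keep only the first occurrence of each trace
            seen.add(trace_id)
            trace_ids.append(trace_id)
    truncated = result["total"] > len(result["prompts"])
    return trace_ids, truncated
```

If `truncated` is True, `limit` capped the result. Raising `limit` (up to 5000) returns more prompts in a single call.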

`decimalai.create_replay_batch()`

Create a batch of replay tasks that re-run the given traces (`trace_ids`) against the target manifest.

```python
import decimalai

# Replay three traces from the old manifest against the new one.
batch = decimalai.create_replay_batch(
    source_manifest_id="mfst_old",
    target_manifest_id="mfst_new",
    trace_ids=["trace_1", "trace_2", "trace_3"],
)
print(f"Batch {batch['batch_id']}: {batch['total_tasks']} tasks")
```
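Chaining the two calls covers the flow described at the top of this page: fetch an agent's stale prompts, then create one batch for their traces. The sketch takes the SDK functions as parameters so it can run without network access. In real use you would pass `decimalai.get_replay_prompts` and `decimalai.create_replay_batch`. The helper name `replay_stale_traces` is hypothetical, and the sketch assumes the response keys shown in the examples above.

```python
from typing import Any, Callable, Optional


def replay_stale_traces(
    get_prompts: Callable[..., dict[str, Any]],
    create_batch: Callable[..., dict[str, Any]],
    agent_name: str,
    source_manifest_id: str,
    target_manifest_id: str,
    limit: int = 500,
) -> Optional[dict[str, Any]]:
    """Create one replay batch covering an agent's stale traces.

    Hypothetical helper: pass decimalai.get_replay_prompts and
    decimalai.create_replay_batch as get_prompts / create_batch.
    Returns None when there is nothing to replay.
    """
    result = get_prompts(agent_name, limit=limit)
    # De-duplicate trace IDs while preserving their original order.
    trace_ids = list(dict.fromkeys(p["trace_id"] for p in result["prompts"]))
    if not trace_ids:
        return None  # avoid creating an empty batch
    return create_batch(
        source_manifest_id=source_manifest_id,
        target_manifest_id=target_manifest_id,
        trace_ids=trace_ids,
    )
```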

`decimalai.get_replay_batch()`

Check replay batch progress.

```python
import decimalai

batch = decimalai.get_replay_batch("batch_abc")
print(f"Status: {batch['batch_status']}")
print(f"Progress: {batch['completed']}/{batch['total']}")
```
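A batch is usually checked repeatedly until it finishes. The sketch below is a hypothetical polling helper; in real use you would pass `decimalai.get_replay_batch` as `get_batch`. It uses only the `batch_status`, `completed`, and `total` keys shown above. It treats `completed >= total` as finished because this page does not list the terminal `batch_status` values.

```python
import time
from typing import Any, Callable


def wait_for_batch(
    get_batch: Callable[[str], dict[str, Any]],
    batch_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a replay batch until every task is completed or the timeout expires.

    Hypothetical helper: pass decimalai.get_replay_batch as get_batch.
    Raises TimeoutError if the batch is still incomplete after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        batch = get_batch(batch_id)
        print(f"{batch['batch_status']}: {batch['completed']}/{batch['total']}")
        if batch["completed"] >= batch["total"]:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} still incomplete after {timeout}s")
        sleep(poll_interval)
```

Injecting `sleep` keeps the loop testable. Comparing `completed` with `total`, rather than matching a specific status string, avoids guessing at values this page does not document.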

`decimalai.submit_replay_result()`

Submit the result of a replayed trace.

```python
import decimalai

decimalai.submit_replay_result(
    task_id="task_xyz",
    replayed_trace_id="new_trace_id",
    eval_score=0.95,
    eval_verdict="pass",  # eval pass/fail, NOT the keep/repair/replay/drop verdict
)
```
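`eval_score` and `eval_verdict` are submitted together, so a replay worker will often derive the verdict from the score. The sketch below assumes a pass threshold of 0.8 and the verdict strings "pass" and "fail". Both are illustrative choices, not SDK defaults, and `report_replay` is a hypothetical helper. In real use you would pass `decimalai.submit_replay_result` as `submit`.

```python
from typing import Any, Callable

PASS_THRESHOLD = 0.8  # illustrative assumption, not an SDK default


def report_replay(
    submit: Callable[..., Any],
    task_id: str,
    replayed_trace_id: str,
    eval_score: float,
    threshold: float = PASS_THRESHOLD,
) -> str:
    """Submit a replay result, deriving the eval verdict from the score.

    Hypothetical helper: pass decimalai.submit_replay_result as submit.
    Returns the verdict that was submitted ("pass" or "fail").
    """
    # Eval pass/fail axis, not the keep/repair/replay/drop compatibility verdict.
    verdict = "pass" if eval_score >= threshold else "fail"
    submit(
        task_id=task_id,
        replayed_trace_id=replayed_trace_id,
        eval_score=eval_score,
        eval_verdict=verdict,
    )
    return verdict
```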

A/B comparison (not in the SDK)

There is no decimalai.experiment() function, and the generic /api/v1/experiments endpoints have been removed. To compare two variants on the same inputs, use the skill Benchmark (with-skill vs without-skill) and the Versions diff surfaces. See skillevaluation and the Replay guide.

What’s next

Datasets

Replay guide