experiment() function in the Python SDK, and the generic /api/v1/experiments endpoints have been removed. To compare two variants on the same inputs, use the skill Benchmark (with-skill vs without-skill) and the Versions diff surfaces; see skill evaluation.
See the Replay guide for when to use which.
decimalai.get_replay_prompts()
Download prompts that need to be re-run after a manifest change.
Agent name to get replay prompts for.
Filter by compatibility verdict, i.e. what to do with the trace: "keep", "repair", "replay", or "drop". (This is the keep/repair/replay/drop axis, not the eval pass/fail axis.)
Maximum number of prompts (max 5000).
decimalai.create_replay_batch()
Create a batch of replay tasks.
decimalai.get_replay_batch()
Check replay batch progress.
decimalai.submit_replay_result()
Submit the result of a replayed trace.
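One way to tie these calls together is a loop that re-runs each downloaded prompt and submits its result. The sketch below is an assumption about how such a loop might look, not the SDK's own workflow: `replay_all`, `run_agent`, and `submit` are hypothetical names, and the SDK calls are passed in as plain callables because their actual signatures are not shown in this section. In practice, `submit` would wrap `decimalai.submit_replay_result()`.

```python
from typing import Any, Callable, Iterable


def replay_all(
    prompts: Iterable[Any],
    run_agent: Callable[[Any], Any],
    submit: Callable[[Any, Any], None],
) -> dict:
    """Re-run each prompt and submit its output.

    A prompt whose re-run raises is recorded and skipped, so one bad
    trace does not abort the rest of the replay.
    """
    submitted = 0
    failed = []
    for prompt in prompts:
        try:
            output = run_agent(prompt)
        except Exception as exc:
            # Keep the prompt and the error text for later inspection.
            failed.append((prompt, repr(exc)))
            continue
        submit(prompt, output)
        submitted += 1
    return {"submitted": submitted, "failed": failed}
```

Passing the SDK calls in as arguments keeps the loop testable with stand-in functions and avoids hard-coding argument names that may differ from the real API.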
A/B comparison (not in the SDK)
There is no decimalai.experiment() function, and the generic /api/v1/experiments endpoints have been removed. To compare two variants on the same inputs, use the skill Benchmark (with-skill vs without-skill) and the Versions diff surfaces; see skill evaluation and the Replay guide.
What’s next
Datasets
Export replay results as JSONL or push to HuggingFace.
Replay guide
When to replay vs. when to repair — conceptual model.