How Executions Nest
Trace
A trace is a single, complete agent execution: everything from input to output. It is the atomic unit of the platform. Every feature (evaluation, compatibility scoring, dataset building) operates on traces. A trace captures:
- LLM calls: model, prompt, completion, tokens, latency, cost
- Tool calls: function name, arguments, results
- Active skills: which skills were loaded for this run
- Metadata: agent name, status, timing, tags
Each trace belongs to one agent (`agent_name`) and optionally links to one manifest version (`manifest_id`).
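As a rough mental model, the contents of a trace can be sketched as a small set of Python dataclasses. This is an illustrative sketch only, not the platform's schema: `agent_name` and `manifest_id` come from the text above, while every other class and field name (`LLMCall`, `ToolCall`, `latency_ms`, `cost_usd`, and so on) is a hypothetical name chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LLMCall:
    # Hypothetical field names mirroring "model, prompt, completion,
    # tokens, latency, cost" from the list above.
    model: str
    prompt: list[dict[str, str]]  # rendered chat messages
    completion: str
    tokens: int
    latency_ms: float
    cost_usd: float


@dataclass
class ToolCall:
    # Hypothetical field names mirroring "function name, arguments, results".
    name: str
    arguments: dict[str, Any]
    result: Any


@dataclass
class Trace:
    agent_name: str                    # agent the trace is recorded under
    status: str                        # assumed free-form status string
    manifest_id: Optional[str] = None  # optional link to a manifest version
    llm_calls: list[LLMCall] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    active_skills: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)

    def total_cost(self) -> float:
        """Sum the cost of every LLM call recorded in this trace."""
        return sum(call.cost_usd for call in self.llm_calls)
```

Keeping `manifest_id` optional reflects the wording above: a trace may exist without being tied to a manifest version.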
Trace vs Episode: The compatibility engine sometimes calls traces “episodes” (as in “classify this episode”). They mean the same thing: a `RunTrace` database row. If you see “episode” in the API or logs, think “trace.”
Trace vs Trajectory (ML context): In reinforcement learning, a “trajectory” is a full sequence of (state, action, reward) tuples. A DecimalAI trace is similar in that it captures a sequence of decisions, but it doesn’t carry an explicit reward signal. Instead, eval scores serve as the quality signal, and traces are used for SFT/DPO training rather than RLHF.
Span
A span is a timed segment within a trace representing a discrete operation. Spans nest via `parent_span_id` to form a tree.
| Span Type | What It Captures |
|---|---|
| `llm` | A model invocation: prompt, completion, tokens |
| `tool` | A tool/function call: name, arguments, result |
| `retriever` | A RAG retrieval step: query, documents returned |
| `other` | Custom application logic |
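To make the nesting concrete, the following sketch rebuilds a span tree from a flat list of spans by following `parent_span_id`. Only `parent_span_id` and the four span types come from this page; the `Span` class, the `span_id`, `span_type`, and `name` fields, and the assumption that root spans have a parent of `None` are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str                   # hypothetical identifier field
    parent_span_id: Optional[str]  # None is assumed to mark a root span
    span_type: str                 # "llm", "tool", "retriever", or "other"
    name: str                      # hypothetical display name


def render_tree(spans: list[Span]) -> list[str]:
    """Return one indented line per span, with children under their parent."""
    # Index spans by their parent so each level can be looked up directly.
    children = defaultdict(list)
    for span in spans:
        children[span.parent_span_id].append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f"[{span.span_type}] {span.name}")
            walk(span.span_id, depth + 1)

    walk(None, 0)  # start from the root spans
    return lines


spans = [
    Span("s1", None, "other", "handle_request"),
    Span("s2", "s1", "retriever", "search_docs"),
    Span("s3", "s1", "llm", "draft_answer"),
    Span("s4", "s3", "tool", "lookup_order"),
]
print("\n".join(render_tree(spans)))
```

Siblings appear in the order they occur in the input list; a real implementation would more likely sort them by start time.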
LLM Call
An LLM call is a single model invocation stored at full fidelity. It contains the rendered prompt messages, model output, token counts, latency, cost, and any tool calls the model requested. LLM calls are the most important artifact for fine-tuning: they become the input→output pairs in SFT datasets.
Session
A session groups related traces into a multi-turn conversation. Traces within a session share a `session_id` and are ordered by `turn_index`.
Sessions enable:
- History-aware replay (re-running a full conversation, not just one message)
- Multi-turn evaluation (judging coherence across turns)
- Conversation-level analytics
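The grouping and ordering described above can be sketched in a few lines of Python. Traces are represented here as plain dictionaries; `session_id` and `turn_index` are the fields named on this page, while the helper names and the `input` key are hypothetical. The sketch also assumes that traces outside any session carry no `session_id`.

```python
from collections import defaultdict


def group_by_session(traces):
    """Group traces by session_id and order each group by turn_index."""
    sessions = defaultdict(list)
    for trace in traces:
        session_id = trace.get("session_id")
        if session_id is not None:  # skip traces that are not part of a session
            sessions[session_id].append(trace)
    return {
        session_id: sorted(turns, key=lambda t: t["turn_index"])
        for session_id, turns in sessions.items()
    }


def history_before(session_turns, turn_index):
    """Return the earlier turns needed to replay a given turn with history."""
    return [t for t in session_turns if t["turn_index"] < turn_index]


traces = [
    {"session_id": "s-1", "turn_index": 1, "input": "And in euros?"},
    {"session_id": "s-1", "turn_index": 0, "input": "What does the plan cost?"},
    {"session_id": None, "turn_index": 0, "input": "Standalone request"},
]
sessions = group_by_session(traces)
earlier = history_before(sessions["s-1"], 1)
```

Replaying turn 1 with `earlier` as context is what distinguishes history-aware replay from re-running a single message.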
Source Type
Every trace carries a `source_type` indicating where it came from:
| Source | Meaning |
|---|---|
| `production` | Real user traffic (default) |
| `sandbox` | Manual testing via the Playground |
| `test` | Automated test suite |
| `eval_replay` | Re-execution of a historical trace during evaluation |
Other accepted values are `evaluation`, `sdk`, `manual`, `synthetic`, `development`, `sample`, and `demo`. Ingest rejects any value outside this allowlist with a 422.
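Validating `source_type` on the client before sending a trace avoids a round trip that would end in a 422. The sketch below is a client-side approximation, not the ingest implementation: the allowlist is assembled only from the values named on this page, and the exception class and function name are hypothetical.

```python
# Values named on this page; the server-side allowlist is the source of truth.
ALLOWED_SOURCE_TYPES = frozenset({
    "production", "sandbox", "test", "eval_replay",
    "evaluation", "sdk", "manual", "synthetic",
    "development", "sample", "demo",
})
DEFAULT_SOURCE_TYPE = "production"  # documented default


class InvalidSourceType(ValueError):
    """Hypothetical client-side error mirroring the 422 returned by ingest."""
    status_code = 422


def normalize_source_type(value=None):
    """Return a valid source_type, applying the default when none is given."""
    if value is None:
        return DEFAULT_SOURCE_TYPE
    if value not in ALLOWED_SOURCE_TYPES:
        raise InvalidSourceType(f"unsupported source_type: {value!r}")
    return value
```

Matching is exact and case-sensitive in this sketch; whether ingest normalizes case is not stated here, so the sketch does not assume it.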
Next
Versioning & Compatibility
How agent versions are tracked and what happens when they change.
Tracing Guide
How to instrument your agent across frameworks.