Supported Frameworks
DecimalAI provides first-class integration with the most popular agent frameworks. Pick yours and add two lines of code:

| Framework | Flag | How it works |
|---|---|---|
| LangChain / LangGraph | langchain=True | Native callback handler |
| OpenAI Agents SDK | openai_agents=True | Native tracing hook |
| LlamaIndex | llamaindex=True | Native SpanHandler on LlamaIndex’s dispatcher |
| CrewAI | crewai=True | OpenTelemetry (CrewAI emits standard GenAI spans) |
| AutoGen / AG2 | autogen=True | OpenTelemetry (AutoGen emits standard GenAI spans) |
| Any OTEL framework | otel=True | Generic OpenTelemetry span exporter |
| Any Python function | @decimalai.trace() | Decorator-based manual tracing |

CrewAI and AutoGen use the OpenTelemetry GenAI semantic conventions under the hood. The crewai=True and autogen=True flags are the same as otel=True; they’re convenience aliases so you don’t need to know what “OTEL” means.

Quick Setup
Zero-Code Setup (Environment Variables)
You can instrument without changing any code by setting environment variables.

What Gets Captured
Each trace contains:

- User input: the query or task that triggered the agent
- Final output: the agent’s response
- LLM calls: model name, provider, prompt/completion tokens, latency, temperature
- Tool calls: function name, arguments, results
- Spans: hierarchical execution steps (retrieval, chain, agent, etc.)
- Status: success or error, with error messages
- Timestamps: start/end times for precise latency measurement
- Active skills: which skills were engaged during execution (auto-detected)

Trace Schema
Every trace is represented as a RunTrace with these core fields:

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique trace identifier |
| agent_name | string | The agent that produced this trace |
| status | success / error | Overall execution status |
| user_input_preview | string | Truncated user input |
| final_output_preview | string | Truncated agent output |
| llm_calls | LlmCallRecord[] | All LLM API calls made |
| spans | TraceSpan[] | Hierarchical execution spans |
| started_at / ended_at | datetime | Trace timing |
| source_type | string | production, replay, experiment |
| manifest_id | UUID | The manifest version active when this trace was recorded |
| active_skills | string[] | Skills detected as active in this trace |

LlmCallRecord
Each LLM call within a trace records:

| Field | Type | Description |
|---|---|---|
| model_name | string | e.g., gpt-4o, claude-sonnet-4-6, gemini-2.5-pro |
| provider | string | openai, anthropic, google, etc. |
| input_tokens | int | Prompt tokens |
| output_tokens | int | Completion tokens |
| latency_ms | int | Wall-clock time for this call |
| tool_calls_json | list | Tool calls made by the model |
| finish_reason | string | stop, tool_calls, length, etc. |

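To make the two schemas above concrete, here is a minimal sketch that mirrors the documented fields as Python dataclasses. It is an illustration, not the SDK's actual classes: the `TraceSpan` shape (`name`, `children`), the default values, and the `duration()` and `total_tokens()` helpers are assumptions added for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class LlmCallRecord:
    model_name: str
    provider: str
    input_tokens: int   # prompt tokens
    output_tokens: int  # completion tokens
    latency_ms: int     # wall-clock time for this call
    tool_calls_json: List[Dict[str, Any]] = field(default_factory=list)
    finish_reason: str = "stop"


@dataclass
class TraceSpan:
    # Assumed shape: the docs only state that spans are hierarchical.
    name: str
    children: List["TraceSpan"] = field(default_factory=list)


@dataclass
class RunTrace:
    id: UUID
    agent_name: str
    status: str  # "success" or "error"
    user_input_preview: str
    final_output_preview: str
    started_at: datetime
    ended_at: datetime
    source_type: str = "production"  # or "replay", "experiment"
    manifest_id: Optional[UUID] = None
    llm_calls: List[LlmCallRecord] = field(default_factory=list)
    spans: List[TraceSpan] = field(default_factory=list)
    active_skills: List[str] = field(default_factory=list)

    def duration(self) -> timedelta:
        """Hypothetical helper: end-to-end trace latency."""
        return self.ended_at - self.started_at

    def total_tokens(self) -> int:
        """Hypothetical helper: prompt plus completion tokens over all calls."""
        return sum(c.input_tokens + c.output_tokens for c in self.llm_calls)


trace = RunTrace(
    id=uuid4(),
    agent_name="support-agent",
    status="success",
    user_input_preview="Where is my order?",
    final_output_preview="It ships tomorrow.",
    started_at=datetime(2025, 1, 1, 12, 0, 0),
    ended_at=datetime(2025, 1, 1, 12, 0, 2),
    llm_calls=[LlmCallRecord("gpt-4o", "openai", 120, 30, 850)],
)
print(trace.duration().total_seconds())  # 2.0
print(trace.total_tokens())              # 150
```
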
Manual Tracing
For custom frameworks or non-LLM workflows, use the @decimalai.trace() decorator.
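As a rough illustration of what decorator-based tracing does, the sketch below defines a local stand-in named `trace_sketch`. It is not the DecimalAI decorator and does not reflect its real signature or behavior; the `RECORDED` list and the record fields are hypothetical. It only shows the general pattern: wrap the function, time it, and record status and output.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink; a real SDK would send records to a backend.
RECORDED: List[Dict[str, Any]] = []


def trace_sketch(name: Optional[str] = None) -> Callable:
    """Stand-in decorator factory (assumption, not the DecimalAI API)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "name": name or func.__name__,
                "status": "success",
                "error": None,
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["output"] = repr(result)
                return result
            except Exception as exc:
                # Mark the failure, then re-raise so callers still see it.
                record["status"] = "error"
                record["error"] = str(exc)
                raise
            finally:
                # Runs on both success and failure.
                record["latency_ms"] = int((time.perf_counter() - start) * 1000)
                RECORDED.append(record)

        return wrapper

    return decorator


@trace_sketch()
def summarize(text: str) -> str:
    return text[:10]


summarize("hello world, this is long")
print(RECORDED[-1]["name"], RECORDED[-1]["status"])  # summarize success
```

The wrapper re-raises exceptions after recording them, so tracing does not change the function's behavior for its callers.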
Viewing Traces
Once traces are flowing, view them in the dashboard:

- Traces page (/traces): all traces across all agents
- Agent dashboard → Traces tab: filtered to a specific agent
- Trace detail: click any trace to see the full execution tree with spans, LLM calls, and tool invocations

Search & Filtering
The Traces page supports filtering by:

| Filter | Example |
|---|---|
| Agent | support-agent, research-bot |
| Status | success, error |
| Date range | Last 24h, last 7 days, custom |
| Manifest version | v1, v2, v3 |
| Eval verdict | pass, fail, review |
Cost Tracking
DecimalAI estimates the cost of each trace based on the LLM calls made. Costs are calculated from model pricing tables and displayed per-trace and aggregated on the dashboard. This helps you:

- Identify expensive queries
- Compare cost across agent versions
- Set budgets for evaluation and auto-scoring
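
The arithmetic behind a per-trace estimate can be sketched as follows. This is an assumed formula (tokens times a per-million-token price, summed over LLM calls), not DecimalAI's documented calculation. The model names and prices in `PRICING` are placeholders rather than real vendor pricing, and both function names are hypothetical.

```python
from typing import Dict, List

# Placeholder prices in USD per 1M tokens. These are NOT real vendor prices.
PRICING: Dict[str, Dict[str, float]] = {
    "example-model-small": {"input": 1.00, "output": 2.00},
    "example-model-large": {"input": 10.00, "output": 30.00},
}


def estimate_call_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call; models missing from the table count as 0.0."""
    price = PRICING.get(model_name)
    if price is None:
        return 0.0
    return (
        input_tokens / 1_000_000 * price["input"]
        + output_tokens / 1_000_000 * price["output"]
    )


def estimate_trace_cost(llm_calls: List[dict]) -> float:
    """Sum the estimated cost over every LLM call in a trace."""
    return sum(
        estimate_call_cost(c["model_name"], c["input_tokens"], c["output_tokens"])
        for c in llm_calls
    )


calls = [
    {"model_name": "example-model-small", "input_tokens": 1_000_000, "output_tokens": 500_000},
    {"model_name": "example-model-large", "input_tokens": 100_000, "output_tokens": 10_000},
]
print(round(estimate_trace_cost(calls), 2))  # 3.3
```

Treating unknown models as zero cost keeps the sum from failing, but it understates the total; a stricter variant could raise an error instead.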
Playground
From any trace detail page, click the “Open in Playground” button on an LLM call to debug and iterate on prompts using real production data. The Playground opens with:

- System message pre-populated from the original call (editable)
- User message pre-populated (editable)
- Model and temperature pre-filled from the original call
- Original output shown on the left for comparison

Use the Playground to:

- Test prompt variations without re-running the full agent
- Compare outputs across different model parameters
- Debug specific LLM calls in isolation
Next Steps
Evaluations
Auto-score traces with built-in checks or LLM-as-judge.
Manifests
Detect agent configuration changes automatically.
Datasets
Build training data from your best traces.
Troubleshooting
Traces not showing up, or missing a verdict? Common fixes.