Troubleshooting - DecimalAI

Symptom	Likely cause	Section
Traces aren’t appearing in the dashboard	Missing key, wrong base URL, no framework flag, unflushed buffer, wrong workspace	Traces aren’t appearing
`401 Unauthorized` on every request	Key missing, malformed, or expired	401 Unauthorized
`402 Payment Required`	Plan quota exhausted	402 Payment Required
`409 Conflict` on skill sync or registry install	Resource already exists (idempotent) — safe to ignore	409 Conflict
`429 Too Many Requests`	Per-plan rate limit	429 Too Many Requests
Traces appear but no evaluation / verdict	No eval policy, silent check failure, or worker lag	No evaluation / verdict
regression-check says “no manifest”	Manifest not registered for this agent + ref	Manifest not registered
SDK is blocking my agent / requests are slow	Client re-created in hot path, or stale SDK	SDK is blocking my agent
Multi-agent traces aren’t linking	`parent_trace_id` not propagated	Multi-agent traces
Webhooks aren’t firing	Bad URL, event disabled, slow handler	Webhooks aren’t firing

My traces aren't appearing in the dashboard

Most often one of:

DECIMAL_API_KEY is missing or wrong. Confirm with echo $DECIMAL_API_KEY — it should start with dai_sk_. If you’re using .env files, make sure they’re loaded before decimalai.init().
Wrong base URL. If you’re self-hosting or pointing at staging, set DECIMAL_BASE_URL explicitly:
```
export DECIMAL_BASE_URL=https://api.decimal.ai
```
Framework flag missing. decimalai.init(api_key="...") alone doesn’t capture traces — you also need a framework flag:
```
decimalai.init(api_key="...", langchain=True)   # or openai_agents, llamaindex, crewai, autogen, otel
```
See Tracing for the full matrix.
Process exited before flush. The SDK buffers traces and flushes on a background timer. An atexit handler drains the buffer on normal interpreter shutdown, so most short-lived scripts are covered automatically. The gap is processes that are hard-killed (SIGKILL, os._exit(), a crash) before atexit runs. No in-process code, including a finally block, runs after a hard kill, so for those, flush explicitly as soon as the work finishes to shrink the window in which buffered traces can be lost:
```
import decimalai

try:
    run_agent()
finally:
    decimalai.flush()  # block until the buffer drains
```
Wrong workspace. SDK traces are routed by the API key you send them with — and, if set, the project you pass to init() (sent as the X-Decimal-Project header). There is no team= parameter or DECIMAL_TEAM env var. Confirm the dashboard’s active workspace matches the key, and that any decimalai.init(project="...") value is the one you expect. See Teams & Workspaces for how routing resolves.
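
The first two checks above can be automated before decimalai.init() runs. The following is a minimal sketch, not part of the SDK: the preflight() helper is a hypothetical name, and it only verifies what this page states (the dai_sk_ prefix and an explicit base URL, if one is set).

```python
import os


def preflight(env=None):
    """Return a list of problems; an empty list means the basics look right."""
    env = os.environ if env is None else env
    problems = []

    # The key must be present and carry the documented dai_sk_ prefix.
    key = env.get("DECIMAL_API_KEY", "")
    if not key:
        problems.append("DECIMAL_API_KEY is not set")
    elif not key.startswith("dai_sk_"):
        problems.append("DECIMAL_API_KEY does not start with dai_sk_")

    # DECIMAL_BASE_URL is optional; if set, it should at least be an http(s) URL.
    base_url = env.get("DECIMAL_BASE_URL")
    if base_url is not None and not base_url.startswith(("http://", "https://")):
        problems.append("DECIMAL_BASE_URL is not an http(s) URL")

    return problems
```

Calling preflight() at startup and logging anything it returns makes a missing .env load visible before any trace is silently dropped.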

401 Unauthorized

The API returns four status codes you’ll see most often. This table is the at-a-glance summary; the per-code detail follows in the sections below.

Code	Meaning	First thing to try
`401`	Key missing, malformed, or expired	Re-issue the key; confirm the `Bearer dai_sk_…` header
`402`	Plan quota exhausted	Read `detail` for the metric; wait for reset or upgrade
`409`	Resource already exists (idempotent)	Safe to ignore — no action needed
`429`	Rate limit hit	Honor `Retry-After`; batch your requests

401 detail. Your API key is missing, malformed, or expired.

# Confirm header format
curl https://api.decimal.ai/api/v1/agents \
  -H "Authorization: Bearer dai_sk_YOUR_KEY"

Re-issue the key in Settings → API Keys. Old keys are revoked when you create a new one with the same label.

If you’re in Clerk dashboard mode and getting 401 on export endpoints specifically, this is a known issue (getApiKey() legacy fallback). Workaround: use a workspace API key explicitly.
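
The same header check as the curl command can be scripted. This is a rough sketch using only the Python standard library; build_auth_request() and key_is_accepted() are hypothetical helper names, and the only endpoint used is the /api/v1/agents URL shown in the curl example.

```python
import urllib.error
import urllib.request


def build_auth_request(api_key, base_url="https://api.decimal.ai"):
    """Build a GET request carrying the key as a Bearer token."""
    url = base_url.rstrip("/") + "/api/v1/agents"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def key_is_accepted(api_key, base_url="https://api.decimal.ai"):
    """Return False on 401, True on any 2xx; re-raise every other HTTP error."""
    request = build_auth_request(api_key, base_url)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

Other errors are re-raised rather than reported as a bad key, so a 402 or 429 is not mistaken for an authentication problem.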

402 Payment Required

You’ve hit your plan’s quota, usually traces ingested per month or SFT rows generated. The detail field names the exhausted metric:

{ "detail": "Plan limit reached: traces_ingested (5000 / 5000 for plan=free)" }

Either:

Wait for the next billing period (resets at month boundary).
Upgrade in Settings → Billing.

The dashboard banner shows your current usage. If usage feels too high, look for runaway tests or accidental dev traffic hitting prod keys.
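
If you handle 402 responses in code, the detail string can be parsed to log which metric ran out. This is a sketch that assumes the detail text always follows the shape of the single example above; that format is inferred from the example, not a documented contract, so the parser returns None on anything it does not recognize.

```python
import re

# Pattern inferred from the example detail string on this page (an assumption).
_DETAIL = re.compile(
    r"Plan limit reached: (?P<metric>\w+) "
    r"\((?P<used>\d+) / (?P<limit>\d+) for plan=(?P<plan>[\w-]+)\)"
)


def parse_quota_detail(detail):
    """Extract metric, usage, limit, and plan from a 402 detail string."""
    match = _DETAIL.search(detail)
    if match is None:
        return None
    return {
        "metric": match["metric"],
        "used": int(match["used"]),
        "limit": int(match["limit"]),
        "plan": match["plan"],
    }
```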

409 Conflict on skill sync or registry install

This is safe to ignore — it means the resource already exists in an equivalent state.

Skill sync 409: the body hash already matches an existing version. No new version was created.
Registry install 409: you’ve already installed this skill in this org.

Both endpoints are idempotent by design.
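
In a CI script that calls these endpoints directly, that means a 409 should not fail the job. A minimal sketch of the idea, with interpret_sync_status() as a hypothetical helper and the assumption that any 2xx status means a new version or install was created:

```python
def interpret_sync_status(status):
    """Map an HTTP status from skill sync or registry install to an outcome."""
    if 200 <= status < 300:
        return "created"
    if status == 409:
        # Already exists in an equivalent state: treat as success.
        return "unchanged"
    raise RuntimeError(f"unexpected status {status}")
```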

429 Too Many Requests

You’re hitting the per-plan rate limit. Responses include Retry-After:

HTTP/1.1 429 Too Many Requests
Retry-After: 12

The SDK respects this automatically. If you’re calling the API directly, sleep for the indicated seconds.

To reduce request count, send traces in batches instead of one-at-a-time. The SDK automatically batches when buffer thresholds are hit. For direct API use, hit POST /api/v1/traces/batch with up to 100 traces per call.

See Errors for the full rate limit table.
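
For direct API callers, the two pieces of advice above (honor Retry-After, batch up to 100 traces) can be sketched as follows. The send callable is a hypothetical stand-in for your own HTTP call and is assumed to return a (status, headers) pair; Retry-After is assumed to be a number of seconds, as in the example response.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Wait for the server-indicated delay; default to 1s if absent.
            sleep(float(headers.get("Retry-After", 1)))
    return 429


def batches(traces, size=100):
    """Yield consecutive slices of at most `size` traces."""
    for start in range(0, len(traces), size):
        yield traces[start:start + size]
```

Each slice produced by batches() would be the payload of one POST /api/v1/traces/batch call, wrapped in send_with_retry().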

I see traces but no evaluation / verdict

Evaluations run asynchronously by default. After ingest:

Background eval worker scores each trace against the active policy.
Decision engine computes a unified verdict (pass / fail / review).

Common causes of missing verdicts:

No evaluators configured for the agent. Attach one from the Evaluate dashboard’s Auto-Scoring panel, or register evaluators via /api/v1/evaluators.
Custom eval check failed silently. Check the trace detail page → “Eval Errors” section.
Background worker hasn’t caught up. New traces typically score within 30s. Refresh the dashboard.
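
Because scoring is asynchronous, a script that needs the verdict has to poll for it rather than read it straight after ingest. The sketch below shows only the polling loop; fetch_verdict is a hypothetical callable you would supply (this page does not document a verdict endpoint), assumed to return None until a verdict exists.

```python
import time


def wait_for_verdict(fetch_verdict, timeout=60.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_verdict() until it returns a value or the timeout passes."""
    deadline = clock() + timeout
    while True:
        verdict = fetch_verdict()
        if verdict is not None:
            return verdict
        if clock() >= deadline:
            return None  # still unscored: check evaluators and eval errors
        sleep(interval)
```

A 60-second timeout leaves headroom over the typical 30s scoring delay; a None result after that points to one of the first two causes rather than worker lag.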

Manifest not registered — regression-check action says "no manifest"

The GitHub Action looks up the manifest by agent_name + the git ref it’s running against. If the manifest hasn’t been registered, the action skips the check.

Fix: run scripts/init_for_decimal.py (or your equivalent) with DECIMALAI_MODE=manifest_only as a step before decimal-labs/regression-check@v1:

- name: Register manifest
  env:
    DECIMALAI_MODE: manifest_only
    DECIMAL_API_KEY: ${{ secrets.DECIMAL_API_KEY }}
  run: python scripts/init_for_decimal.py

- uses: decimal-labs/regression-check@v1
  with:
    api-key: ${{ secrets.DECIMAL_API_KEY }}
    agent-name: support-agent

manifest_only mode runs the manifest-extraction code path without actually invoking your agent.
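
If you write your own equivalent of that script, the mode switch might look like the sketch below. Everything here is illustrative: register_manifest and invoke_agent are hypothetical callables standing in for your own code, and only the DECIMALAI_MODE variable and its manifest_only value come from this page.

```python
import os


def run(register_manifest, invoke_agent, env=None):
    """Register the manifest, then invoke the agent unless in manifest_only mode."""
    env = os.environ if env is None else env
    register_manifest()
    if env.get("DECIMALAI_MODE") == "manifest_only":
        # CI path: the manifest is registered, the agent is never called.
        return "manifest_only"
    invoke_agent()
    return "full"
```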

SDK is blocking my agent / requests are slow

The SDK ingests traces in a background thread by default — the request path should never block on network I/O. If you’re seeing latency added to your agent:

Make sure you call decimalai.init() once at startup and reuse it — don’t construct new clients in hot paths.
Confirm you’re on a current SDK (pip install -U decimalai); background flush has been the default for a long time.
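
One way to guarantee a single initialization, even when several request handlers race at startup, is a small guard around the init call. This is a generic sketch, not SDK code: ensure_initialized() is a hypothetical helper that takes the actual init call as a callable.

```python
import threading

_init_lock = threading.Lock()
_initialized = False


def ensure_initialized(init_fn):
    """Run init_fn exactly once per process; return True only on the first call."""
    global _initialized
    if _initialized:
        return False
    with _init_lock:
        # Re-check inside the lock in case another thread won the race.
        if _initialized:
            return False
        init_fn()
        _initialized = True
        return True
```

In an application this would be called as ensure_initialized(lambda: decimalai.init(api_key="...", langchain=True)) from startup code, so hot paths never construct a client.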

Multi-agent traces aren't linking together

For parent-child agent calls to show up as a tree, the parent must propagate its trace ID to the child. If you’re using LangGraph, CrewAI, or the OpenAI Agents SDK, parent-child linkage happens automatically; just verify the framework flag is set on decimalai.init(). For custom orchestrators, capture the parent’s trace ID and pass it to the child’s parent_trace_id:

import decimalai

# Parent (orchestrator) — capture its trace ID, then hand it to the child
with decimalai.start_trace(agent_name="orchestrator") as parent:
    parent.log_llm_call(model="gpt-4o", input=msgs, output=resp)
    parent_id = parent.get_trace_id()

    # Child (sub-agent) — link it by passing parent_trace_id
    with decimalai.start_trace(
        agent_name="researcher",
        parent_trace_id=parent_id,
    ) as child:
        child.log_llm_call(model="gpt-4o", input=sub_msgs, output=sub_resp)

The platform displays the tree as long as parent_trace_id is set on the child trace.

Webhooks aren't firing

Confirm the URL. curl -X POST <your-url> from your terminal — does the endpoint accept the request?
Confirm the event is enabled. Settings → Notifications → Enabled events.
5-second per-attempt timeout. A handler that takes longer than 5 seconds counts as a failed attempt. Acknowledge fast (return 200), then process asynchronously.
Failed deliveries are retried with backoff, so a transient outage self-heals — but make your handler idempotent (deduplicate on the X-Decimal-Event-Id header) so a redelivered event isn’t processed twice. See Webhooks for signing and retry detail.
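
The "acknowledge fast, deduplicate, process later" advice can be sketched independently of any web framework. In this illustration handle_delivery() is a hypothetical function your HTTP handler would call; the in-memory set and queue stand in for whatever durable store and job queue you actually use, and returning 400 for a missing event ID is an assumption, not documented behavior.

```python
import queue


def handle_delivery(headers, body, seen_ids, work_queue):
    """Return the HTTP status to send back; real work is deferred to the queue."""
    event_id = headers.get("X-Decimal-Event-Id")
    if event_id is None:
        return 400  # assumption: reject deliveries that carry no event ID
    if event_id in seen_ids:
        return 200  # redelivery: acknowledge again, do not reprocess
    seen_ids.add(event_id)
    work_queue.put((event_id, body))  # a worker consumes this outside the request
    return 200
```

Because the handler only records the ID and enqueues the body, it returns well inside the 5-second limit regardless of how slow the downstream processing is. Signature verification is omitted here; see Webhooks for signing detail.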

Related reading

Tracing

How auto-detection picks up your framework, how spans are stitched together, and what you can override.

Manifests

The capability matrix per framework, useful when “my tools/prompts aren’t being captured” is the actual problem.

Errors

Full list of error codes the API can return, and what each one means.

Webhooks

Delivery semantics, HMAC signing, and retry. Read this first if webhooks aren’t firing.