Evaluations API - DecimalAI

The Evaluations API attaches quality scores to traces and aggregates them into a per-trace verdict (keep / repair / replay / drop). It accepts scores from multiple sources: built-in deterministic checks, your @eval functions, LLM judges, and scores pushed from external tools such as DeepEval, LangSmith, or your CI pipeline. Together with the Compatibility Policies guide, this is how DecimalAI decides which traces are training-data-grade.

When to use this API

Push scores from an external eval pipeline

Already running DeepEval or a homegrown harness? POST results to /traces/{id}/eval-scores and they show up in the same dashboard view as your built-ins, tagged by source.
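
For a homegrown harness, results need to be mapped into the payload shape used in the Quick start (a `source` plus a list of `name` / `score` / `passed` entries). The sketch below shows one way to do that. The `HarnessResult` type, its `threshold` field, and the pass rule (`score >= threshold`) are assumptions for illustration, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class HarnessResult:
    """Hypothetical result record from a homegrown eval harness."""
    metric: str
    value: float      # assumed to already be on a 0.0 to 1.0 scale
    threshold: float  # assumed per-metric pass threshold


def to_eval_scores_payload(results, source="external"):
    """Build a JSON body for POST /traces/{id}/eval-scores."""
    return {
        "source": source,
        "scores": [
            {"name": r.metric, "score": r.value, "passed": r.value >= r.threshold}
            for r in results
        ],
    }


payload = to_eval_scores_payload(
    [
        HarnessResult("faithfulness", 0.92, 0.8),
        HarnessResult("toxicity_free", 0.40, 0.9),
    ]
)
```

The resulting `payload` can be passed as the `json=` argument of the `httpx.post` call shown in the Quick start.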

Re-score a trace on demand

Call /traces/{id}/evaluate to re-run the configured eval policy (built-ins + LLM judges + your @eval functions) without re-running the agent itself.

Override a verdict manually

When a human reviewer inspects a trace and disagrees with the auto-verdict, /traces/{id}/decision writes the override; the original auto-verdict is preserved for audit.
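
A minimal sketch of building an override request follows. This page does not document the request body, so the `verdict` and `reason` field names are assumptions; only the path and the four verdict values come from this page. The helper returns keyword arguments rather than sending anything, so it can be inspected before use.

```python
# The four verdicts named on this page.
VERDICTS = {"keep", "repair", "replay", "drop"}


def decision_request(trace_id, verdict, reason, api_key):
    """Return keyword arguments suitable for httpx.post(**kwargs).

    The "verdict" and "reason" body fields are assumed names,
    not confirmed by this page.
    """
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return {
        "url": f"https://api.decimal.ai/api/v1/traces/{trace_id}/decision",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"verdict": verdict, "reason": reason},
    }


kwargs = decision_request("trc_abc123", "drop", "PII in tool output", "dai_sk_...")
# httpx.post(**kwargs) would send the override.
```

Validating the verdict client-side catches typos before a request is made.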

Bulk classify after a policy change

Tightened your eval policy and want every existing trace re-classified? /traces/batch-decision runs the new policy against existing scores without re-running anything.
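
To see why re-classification needs no re-run, consider a toy policy that maps a stored average score to a verdict. The thresholds, the trace IDs, and the rule itself are made up for illustration and are not DecimalAI's actual aggregation logic (which the Compatibility Policies guide covers); the toy also ignores the `replay` verdict.

```python
def classify(quality_avg, keep_at, repair_at):
    """Toy policy: map a stored average score to a verdict."""
    if quality_avg >= keep_at:
        return "keep"
    if quality_avg >= repair_at:
        return "repair"
    return "drop"


# Made-up stored averages; no agent or evaluator is re-run below.
stored = {"trc_a": 0.91, "trc_b": 0.78, "trc_c": 0.42}

before = {t: classify(q, keep_at=0.75, repair_at=0.5) for t, q in stored.items()}
after = {t: classify(q, keep_at=0.85, repair_at=0.5) for t, q in stored.items()}

# Traces whose verdict changed under the tighter policy.
changed = [t for t in stored if before[t] != after[t]]
```

Only the mapping from scores to verdicts changes; the scores themselves are reused as stored.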

Score sources (the `source` field)

Every score row carries a source so the dashboard can show it as a tagged badge:

Source	Where it comes from
`built_in`	Server-side deterministic checks (completion / has_output / tool_compliance / latency / token_efficiency). Always attached automatically at ingest.
`sdk` / `custom`	Your Python `@eval`-decorated functions, computed SDK-side before upload.
`llm_judge`	Server-side LLM-as-judge against a rubric you configured.
`deepeval` / `langsmith` / `external`	Pushed via `POST /api/v1/traces/{id}/eval-scores` from an external pipeline.
`compat_engine`	Computed by the manifest impact engine (relative to a target manifest).
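
A client that handles score rows may want to check the `source` value against the table above before grouping rows into badges. The sketch below does that. The row shape (a dict with a `source` key) is an assumption, and this page does not say whether the server rejects other values, so the check is only a client-side convenience.

```python
from collections import Counter

# Source values listed in the table above.
KNOWN_SOURCES = frozenset({
    "built_in", "sdk", "custom", "llm_judge",
    "deepeval", "langsmith", "external", "compat_engine",
})


def count_by_source(score_rows):
    """Count score rows per source; raise on a source the table does not list."""
    counts = Counter()
    for row in score_rows:
        source = row["source"]
        if source not in KNOWN_SOURCES:
            raise ValueError(f"unrecognized source: {source!r}")
        counts[source] += 1
    return dict(counts)


# Made-up rows for illustration.
rows = [
    {"source": "built_in", "name": "completion"},
    {"source": "built_in", "name": "latency"},
    {"source": "deepeval", "name": "faithfulness"},
]
badges = count_by_source(rows)
```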

Endpoints at a glance

Method	Path	Purpose
`POST`	`/api/v1/traces/{id}/eval-scores`	Push N quality scores from any source onto a trace
`GET`	`/api/v1/traces/{id}/eval-scores`	Read every score for a trace, grouped by source
`GET`	`/api/v1/traces/{id}/eval-breakdown`	Score view with provenance + decision-engine reasons
`GET`	`/api/v1/traces/eval/stats`	Aggregate eval stats across a workspace
`POST`	`/api/v1/traces/{id}/evaluate`	Re-run the trace’s configured evaluators
`POST`	`/api/v1/traces/{id}/decision`	Override the auto-verdict manually
`POST`	`/api/v1/traces/batch-decision`	Recompute verdicts for many traces under a new policy
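
The table can be captured as a small lookup so that call sites do not hand-assemble URLs. This is a sketch: the short names (`push_scores`, `breakdown`, and so on) and the `endpoint` helper are hypothetical; the methods and paths are copied from the table, and the base URL is the one used in the Quick start.

```python
from urllib.parse import quote

BASE_URL = "https://api.decimal.ai"

# Short name -> (HTTP method, path template), copied from the table above.
ENDPOINTS = {
    "push_scores": ("POST", "/api/v1/traces/{id}/eval-scores"),
    "read_scores": ("GET", "/api/v1/traces/{id}/eval-scores"),
    "breakdown": ("GET", "/api/v1/traces/{id}/eval-breakdown"),
    "stats": ("GET", "/api/v1/traces/eval/stats"),
    "evaluate": ("POST", "/api/v1/traces/{id}/evaluate"),
    "decision": ("POST", "/api/v1/traces/{id}/decision"),
    "batch_decision": ("POST", "/api/v1/traces/batch-decision"),
}


def endpoint(name, trace_id=None):
    """Return (method, full URL) for a named endpoint."""
    method, template = ENDPOINTS[name]
    if "{id}" in template:
        if trace_id is None:
            raise ValueError(f"{name} requires a trace_id")
        # Percent-encode the ID so it cannot alter the path.
        template = template.replace("{id}", quote(trace_id, safe=""))
    return method, BASE_URL + template


method, url = endpoint("evaluate", "trc_abc123")
```

With httpx, the pair can be passed to `httpx.request(method, url, headers=...)`.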

Quick start

import httpx

# 1. Push DeepEval scores onto a trace
httpx.post(
    "https://api.decimal.ai/api/v1/traces/trc_abc123/eval-scores",
    headers={"Authorization": "Bearer dai_sk_..."},
    json={
        "source": "deepeval",
        "scores": [
            {"name": "faithfulness", "score": 0.92, "passed": True},
            {"name": "answer_relevance", "score": 0.88, "passed": True},
            {"name": "context_precision", "score": 0.71, "passed": True},
        ],
    },
)

# 2. Read the full breakdown
breakdown = httpx.get(
    "https://api.decimal.ai/api/v1/traces/trc_abc123/eval-breakdown",
    headers={"Authorization": "Bearer dai_sk_..."},
).json()
print(breakdown["eval_verdict"])  # keep / repair / replay / drop
print(breakdown["quality_avg"])   # 0.0 to 1.0
for group in breakdown["source_groups"]:  # a list, one entry per source
    print(f"  {group['source']}: {len(group['scores'])} scores, avg {group['source_avg']:.2f}")
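
Once a breakdown is in hand, a common next step is finding what pulled the verdict down. The sketch below works on a made-up sample with the fields used above (`eval_verdict`, `quality_avg`, `source_groups`, `source`, `source_avg`, `scores`). It assumes each score entry in the breakdown carries `name`, `score`, and `passed`, mirroring the push payload; that per-score shape is not confirmed by this page.

```python
# Made-up sample shaped like the breakdown read in the Quick start.
sample_breakdown = {
    "eval_verdict": "repair",
    "quality_avg": 0.74,
    "source_groups": [
        {
            "source": "built_in",
            "source_avg": 0.95,
            "scores": [{"name": "completion", "score": 0.95, "passed": True}],
        },
        {
            "source": "deepeval",
            "source_avg": 0.60,
            "scores": [
                {"name": "faithfulness", "score": 0.80, "passed": True},
                {"name": "context_precision", "score": 0.40, "passed": False},
            ],
        },
    ],
}


def failed_scores(breakdown):
    """List (source, name, score) for every score marked as not passed."""
    return [
        (group["source"], s["name"], s["score"])
        for group in breakdown["source_groups"]
        for s in group["scores"]
        if not s.get("passed", True)
    ]


def weakest_source(breakdown):
    """Return the source whose group has the lowest source_avg."""
    return min(breakdown["source_groups"], key=lambda g: g["source_avg"])["source"]


failures = failed_scores(sample_breakdown)
weakest = weakest_source(sample_breakdown)
```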

Related

Evaluations Guide: the conceptual model and the @eval decorator
Compatibility Policies: how scores aggregate into verdicts
Datasets API: datasets filter on verdict and score
Replay API: replay flagged-for-repair traces to recover them