The evaluations API has three layers: push (eval, score, batch_eval, external helpers), read (get_eval_breakdown), and define (@eval decorator for client-side evaluators). See the Evaluations guide for the conceptual model.

decimalai.eval()

Push a single eval score to a trace.
decimalai.eval(
    trace_id="abc-123",
    name="factual_accuracy",
    score=0.85,
    reason="4/5 facts verified against source docs",
    source="custom",
)
trace_id (str, required): The trace to attach the score to.
name (str, required): Metric name (e.g., "factual_accuracy", "relevance").
score (float, required): Score value between 0.0 and 1.0.
source (str, default "custom"): Eval source identifier. Scores are grouped by source in the dashboard.
source_label (str, optional): Human-readable display name for the source (e.g., "My RAG Eval").
passed (bool, default score >= 0.5): Binary pass/fail override. If omitted, the score passes when it is 0.5 or higher.
reason (str, optional): Human-readable explanation of the score.
category (str, default "quality"): Either "quality" or "compatibility".
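The `score >= 0.5` default for `passed` can be made concrete with a small local sketch. `build_eval_payload` and `EvalPayload` are hypothetical names, not part of decimalai, and the range and category checks are assumptions drawn from the parameter descriptions; only the 0.5 pass threshold comes from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the two documented categories for decimalai.eval().
VALID_CATEGORIES = {"quality", "compatibility"}


@dataclass
class EvalPayload:
    """Hypothetical local record mirroring the decimalai.eval() parameters."""
    trace_id: str
    name: str
    score: float
    passed: bool
    source: str = "custom"
    source_label: Optional[str] = None
    reason: Optional[str] = None
    category: str = "quality"


def build_eval_payload(
    trace_id: str,
    name: str,
    score: float,
    *,
    source: str = "custom",
    source_label: Optional[str] = None,
    passed: Optional[bool] = None,
    reason: Optional[str] = None,
    category: str = "quality",
) -> EvalPayload:
    """Hypothetical helper: validate inputs and apply the documented pass/fail default."""
    # Assumption: scores outside [0.0, 1.0] are rejected before pushing.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if category not in VALID_CATEGORIES:
        raise ValueError(f"category must be one of {sorted(VALID_CATEGORIES)}, got {category!r}")
    # Documented default: without an explicit override, a score passes at >= 0.5.
    if passed is None:
        passed = score >= 0.5
    return EvalPayload(
        trace_id=trace_id,
        name=name,
        score=score,
        passed=passed,
        source=source,
        source_label=source_label,
        reason=reason,
        category=category,
    )


payload = build_eval_payload("abc-123", "factual_accuracy", 0.85, reason="4/5 facts verified")
print(payload.passed)  # True
```

Passing `passed=True` or `passed=False` explicitly overrides the threshold, which is useful when a metric's pass condition is not a simple cutoff on the score.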

decimalai.score()

Shorthand for decimalai.eval(): pass trace_id, name, and score positionally; optional parameters such as reason go in as keyword arguments.
decimalai.score("abc-123", "coherence", 0.92)
decimalai.score("abc-123", "factual", 0.75, reason="One hallucination detected")

decimalai.get_eval_breakdown()

Get the full eval breakdown for a trace, grouped by source.
bd = decimalai.get_eval_breakdown("abc-123")

print(f"Verdict: {bd['eval_verdict']}")       # "keep", "drop", "replay"
print(f"Quality avg: {bd['quality_avg']}")     # 0.85
print(f"Compat avg: {bd['compat_avg']}")       # 1.0

for group in bd["source_groups"]:
    print(f"  {group['source']}: {group['scores']}")
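To act on a breakdown programmatically, for example to list the metrics that failed before deciding whether to replay a trace, a helper can walk `source_groups`. The exact shape of each entry in `group["scores"]` is not documented on this page: the sketch below assumes each entry is a dict with `name`, `score`, and `passed` keys, and `failing_metrics` is a hypothetical helper, not a decimalai function.

```python
from typing import Any, Dict, List, Tuple


def failing_metrics(breakdown: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Return (source, metric name, score) for every failed score in a breakdown.

    Assumes breakdown["source_groups"] is a list of {"source": str, "scores": [...]}
    and that each score entry is a dict with "name", "score", and "passed" keys.
    """
    failed = []
    for group in breakdown.get("source_groups", []):
        for entry in group.get("scores", []):
            # Fall back to the documented 0.5 threshold if "passed" is absent.
            passed = entry.get("passed", entry["score"] >= 0.5)
            if not passed:
                failed.append((group["source"], entry["name"], entry["score"]))
    return failed


# Hand-built example shaped like the assumed response (not real API output).
example = {
    "eval_verdict": "replay",
    "source_groups": [
        {"source": "custom", "scores": [
            {"name": "factual_accuracy", "score": 0.85, "passed": True},
            {"name": "factual", "score": 0.4, "passed": False},
        ]},
        {"source": "my_rag_eval", "scores": [
            {"name": "relevance", "score": 0.9, "passed": True},
        ]},
    ],
}
print(failing_metrics(example))  # [('custom', 'factual', 0.4)]
```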

decimalai.batch_eval()

Apply an evaluator function across multiple traces of an agent and push each result as an eval score under eval_name.
from decimalai import batch_eval

def my_evaluator(trace_data):
    return 1.0 if "source" in trace_data["output"] else 0.0

results = batch_eval(
    agent_name="my-agent",
    eval_fn=my_evaluator,
    eval_name="has_citation",
    limit=100,
)
print(f"Evaluated {results['evaluated']} traces")
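Because batch_eval applies `eval_fn` to many traces at once, it can help to dry-run an evaluator on a few hand-built trace dicts first. The sketch below is a local stand-in, not decimalai code: `local_batch_eval` is hypothetical, it assumes each trace dict carries a `trace_id` key (only `output` appears in the example above), and it records evaluator exceptions and out-of-range scores instead of stopping. Whether batch_eval itself behaves that way is not stated here.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable


def local_batch_eval(
    traces: Iterable[Dict[str, Any]],
    eval_fn: Callable[[Dict[str, Any]], float],
    eval_name: str,
    limit: int = 100,
) -> Dict[str, Any]:
    """Hypothetical offline stand-in for batch_eval: score up to `limit` traces locally."""
    scores: Dict[str, float] = {}
    errors: Dict[str, str] = {}
    for trace in islice(traces, limit):
        trace_id = trace["trace_id"]
        try:
            value = float(eval_fn(trace))
        except Exception as exc:  # keep going; report the failure per trace
            errors[trace_id] = repr(exc)
            continue
        if not 0.0 <= value <= 1.0:
            errors[trace_id] = f"score {value} outside [0.0, 1.0]"
            continue
        scores[trace_id] = value
    return {"eval_name": eval_name, "evaluated": len(scores), "scores": scores, "errors": errors}


def my_evaluator(trace_data):
    return 1.0 if "source" in trace_data["output"] else 0.0


sample = [
    {"trace_id": "t1", "output": "Paris [source: atlas]"},
    {"trace_id": "t2", "output": "Paris"},
    {"trace_id": "t3"},  # missing "output": the KeyError is recorded, not raised
]
results = local_batch_eval(sample, my_evaluator, "has_citation", limit=100)
print(f"Evaluated {results['evaluated']} traces")  # Evaluated 2 traces
```

Once the evaluator handles malformed traces the way you want, the same function can be passed as `eval_fn` to batch_eval.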

External Integration Helpers

Import scores from third-party evaluation frameworks:
from decimalai import push_deepeval_results

# After running DeepEval evaluations
push_deepeval_results(
    trace_id="abc-123",
    results=deepeval_results,  # DeepEval TestResults object
)
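For a framework without a dedicated helper, one option is to flatten its output into plain records and push each one with decimalai.eval(). The sketch below assumes records shaped like `{"metric": ..., "score": ..., "reason": ...}`; `push_metric_records` is a hypothetical name, and the push function is passed in so the adapter can be exercised without network access. It forwards only keyword arguments documented for decimalai.eval() above.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def push_metric_records(
    trace_id: str,
    records: Iterable[Dict[str, Any]],
    push: Callable[..., Any],
    source: str = "custom",
    source_label: Optional[str] = None,
) -> int:
    """Hypothetical adapter: push each flattened metric record as one eval score."""
    pushed = 0
    for record in records:
        kwargs: Dict[str, Any] = {
            "trace_id": trace_id,
            "name": record["metric"],
            "score": float(record["score"]),
            "source": source,
        }
        # Only forward optional fields when they are present.
        if record.get("reason"):
            kwargs["reason"] = record["reason"]
        if source_label is not None:
            kwargs["source_label"] = source_label
        push(**kwargs)
        pushed += 1
    return pushed


# Test double standing in for decimalai.eval: it just records each call.
calls = []
count = push_metric_records(
    "abc-123",
    [{"metric": "faithfulness", "score": 0.9, "reason": "All claims grounded"},
     {"metric": "toxicity", "score": 0.0}],
    push=lambda **kw: calls.append(kw),
    source="my_framework",
    source_label="My Framework",
)
```

With the SDK installed, the same call would pass `push=decimalai.eval`, so the scores land grouped under `source="my_framework"` in the dashboard.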

Custom Evaluators (@eval)

Define evaluators that run client-side before trace upload:
# Note: importing eval this way shadows Python's built-in eval() in this module.
from decimalai.evals import eval, TraceData, EvalResult

@eval(name="has_citation", category="quality")
def check_citation(trace: TraceData) -> bool:
    """Returns True if the output contains a citation."""
    return "[source:" in trace.output

@eval(name="response_length", category="quality")
def check_length(trace: TraceData) -> EvalResult:
    """Check response length is reasonable."""
    length = len(trace.output)
    return EvalResult(
        score=min(length / 500, 1.0),
        passed=50 < length < 2000,
        reason=f"Length: {length} chars",
    )

# Register evals with your framework.
# The evals= kwarg lives on the LangChain adapter's install():
from decimalai.langchain import install
install(agent_name="my-agent", evals=[check_citation, check_length])
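The two evaluators above return different types: a bare bool or an EvalResult. How decimalai converts a bool into a numeric score is not stated on this page. The sketch below assumes True maps to 1.0 and False to 0.0, and uses a hypothetical local `LocalEvalResult` class instead of the real EvalResult so it runs without the SDK.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LocalEvalResult:
    """Hypothetical stand-in for decimalai.evals.EvalResult."""
    score: float
    passed: bool
    reason: Optional[str] = None


def normalize_result(result: Union[bool, LocalEvalResult]) -> LocalEvalResult:
    """Coerce an evaluator's return value into one result shape (assumed mapping)."""
    if isinstance(result, bool):
        # Assumption: a bare bool becomes a score of 1.0 (pass) or 0.0 (fail).
        return LocalEvalResult(score=1.0 if result else 0.0, passed=result)
    if isinstance(result, LocalEvalResult):
        return result
    raise TypeError(f"evaluator returned unsupported type {type(result).__name__}")


print(normalize_result(True))  # LocalEvalResult(score=1.0, passed=True, reason=None)
```

Returning an EvalResult is the better choice when a metric has a graded score or a reason worth showing in the dashboard; a bool only carries pass/fail.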
The @eval decorator also accepts category="llm_judge" (for judge-style scorers), sampling_rate (a 0.0–1.0 fraction of traces to evaluate), and version (a string tag for the evaluator definition). See the Evaluations guide for built-in evaluators and sampling configuration.
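sampling_rate is described only as a fraction of traces to evaluate. One common way to implement such a fraction is deterministic hashing, so a given trace is always either in or out of the sample for a given evaluator; whether decimalai samples this way or randomly is not stated here. `should_sample` below is a hypothetical sketch of the hashing approach.

```python
import hashlib


def should_sample(trace_id: str, eval_name: str, sampling_rate: float) -> bool:
    """Hypothetical deterministic sampler: same (eval_name, trace_id) always gives the same answer."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError(f"sampling_rate must be between 0.0 and 1.0, got {sampling_rate}")
    if sampling_rate == 1.0:
        return True
    if sampling_rate == 0.0:
        return False
    # Map the hash to a uniform bucket in [0, 1) and keep traces below the rate.
    digest = hashlib.sha256(f"{eval_name}:{trace_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sampling_rate


# Roughly a quarter of traces should be selected at sampling_rate=0.25.
fraction = sum(should_sample(f"trace-{i}", "has_citation", 0.25) for i in range(10_000)) / 10_000
print(f"sampled fraction: {fraction:.3f}")
```

Including the evaluator name in the hash key means different evaluators sample different subsets of traces, while re-running the same evaluator on the same trace gives a stable decision.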

What’s next

Replay

Re-run traces after a manifest change and re-score them.

Datasets

Datasets filter on the eval verdict: traces with a "keep" verdict become training data.