Evaluations - DecimalAI

The evaluations API has three layers: push (eval, score, batch_eval, external helpers), read (get_eval_breakdown), and define (@eval decorator for client-side evaluators). See the Evaluations guide for the conceptual model.

`decimalai.eval()`

Push a single eval score to a trace.

decimalai.eval(
    trace_id="abc-123",
    name="factual_accuracy",
    score=0.85,
    reason="4/5 facts verified against source docs",
    source="custom",
)

trace_id (str, required): The trace to attach the score to.

name (str, required): Metric name (e.g., "factual_accuracy", "relevance").

score (float, required): Score value between 0.0 and 1.0.

source (str, default: "custom"): Eval source identifier. Scores are grouped by source in the dashboard.

source_label (str): Human-readable display name (e.g., "My RAG Eval").

passed (bool, default: score >= 0.5): Binary pass/fail override. When omitted, the score passes if it is at least 0.5.

reason (str): Human-readable explanation of the score.
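The parameter rules above (a score between 0.0 and 1.0, `passed` defaulting to `score >= 0.5`) can be checked before a score is pushed. The following is a minimal sketch, not part of the SDK: `build_eval_kwargs` is a hypothetical helper that only assembles the keyword arguments documented above, and the actual `decimalai.eval()` call is left commented out.

```python
from typing import Any, Dict, Optional


def build_eval_kwargs(
    trace_id: str,
    name: str,
    score: float,
    source: str = "custom",
    passed: Optional[bool] = None,
    reason: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and build kwargs for decimalai.eval()."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    kwargs: Dict[str, Any] = {
        "trace_id": trace_id,
        "name": name,
        "score": score,
        "source": source,
        # Mirror the documented default: pass when score >= 0.5 unless overridden.
        "passed": score >= 0.5 if passed is None else passed,
    }
    if reason is not None:
        kwargs["reason"] = reason
    return kwargs


kwargs = build_eval_kwargs("abc-123", "factual_accuracy", 0.85)
# decimalai.eval(**kwargs)  # run where the SDK is installed and configured
```

Validating locally keeps an out-of-range score from reaching the dashboard, and setting `passed` explicitly makes the pass/fail rule visible at the call site.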

`decimalai.score()`

Shorthand for decimalai.eval(): pass trace_id, name, and score positionally, with optional keyword arguments such as reason.

decimalai.score("abc-123", "coherence", 0.92)
decimalai.score("abc-123", "factual", 0.75, reason="One hallucination detected")

`decimalai.get_eval_breakdown()`

Get the full eval breakdown for a trace, grouped by source.

bd = decimalai.get_eval_breakdown("abc-123")

print(f"Verdict: {bd['eval_verdict']}")       # one of "keep", "drop", "replay"
print(f"Quality avg: {bd['quality_avg']}")     # e.g. 0.85
print(f"Compat avg: {bd['compat_avg']}")       # e.g. 1.0

for group in bd["source_groups"]:
    print(f"  {group['source']}: {group['scores']}")
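A breakdown is often consumed by a gating script rather than printed. The sketch below reduces a breakdown dict to a few fields using only the keys shown above. `summarize_breakdown` is a hypothetical helper, and the sample dict is hand-written: in particular, the contents of each group's `scores` entry are an assumption made for illustration.

```python
from typing import Any, Dict


def summarize_breakdown(bd: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a breakdown dict to the fields a gating script typically needs."""
    return {
        "keep": bd["eval_verdict"] == "keep",
        "quality_avg": bd["quality_avg"],
        "compat_avg": bd["compat_avg"],
        "sources": [group["source"] for group in bd["source_groups"]],
    }


# Hand-written sample in the shape shown above; the layout of "scores"
# is assumed for illustration and may differ from the real payload.
sample_bd = {
    "eval_verdict": "keep",
    "quality_avg": 0.85,
    "compat_avg": 1.0,
    "source_groups": [
        {"source": "custom", "scores": [{"name": "factual_accuracy", "score": 0.85}]},
    ],
}

summary = summarize_breakdown(sample_bd)
print(summary["keep"], summary["sources"])
```

In real use, `sample_bd` would be replaced by the return value of `decimalai.get_eval_breakdown("abc-123")`.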

`decimalai.batch_eval()`

Apply an evaluator function across multiple traces.

from decimalai import batch_eval

def my_evaluator(trace_data):
    return 1.0 if "source" in trace_data["output"] else 0.0

results = batch_eval(
    agent_name="my-agent",
    eval_fn=my_evaluator,
    eval_name="has_citation",
    limit=100,
)
print(f"Evaluated {results['evaluated']} traces")
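Because the evaluator runs once per trace, it is worth exercising on a few hand-made inputs before passing it to `batch_eval`. This is a sketch under one assumption taken from the example above: `trace_data` is a dict with an `"output"` key. The handling of missing or non-string output is an added precaution, not documented behavior.

```python
from typing import Any, Dict


def has_citation(trace_data: Dict[str, Any]) -> float:
    """Return 1.0 if the trace output mentions "source", else 0.0."""
    output = trace_data.get("output")
    # Assumption: output may be missing or non-string on some traces.
    if not isinstance(output, str):
        return 0.0
    return 1.0 if "source" in output else 0.0


# Hypothetical sample traces for a quick local check.
samples = [
    {"output": "See [source: docs/setup.md] for details."},
    {"output": "No reference given."},
    {"output": None},
]
local_scores = [has_citation(sample) for sample in samples]
print(local_scores)
```

Once the local results look right, the same function can be passed as `eval_fn` to `batch_eval`.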

External Integration Helpers

Import scores from third-party evaluation frameworks:

DeepEval

from decimalai import push_deepeval_results

# After running DeepEval evaluations
push_deepeval_results(
    trace_id="abc-123",
    results=deepeval_results,  # DeepEval TestResults object
)

LangSmith

from decimalai import push_langsmith_scores

push_langsmith_scores(
    trace_id="abc-123",
    scores=langsmith_feedback,  # LangSmith feedback list
)

Custom Scores

from decimalai import push_custom_scores

push_custom_scores(
    trace_id="abc-123",
    source="my-eval-pipeline",
    scores=[
        {"name": "relevance", "score": 0.9},
        {"name": "safety", "score": 1.0, "passed": True},
    ],
)
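Pipelines that produce a plain mapping of metric names to scores need a small conversion step before calling `push_custom_scores`. The sketch below builds the list-of-dicts shape used in the Custom Scores example. `to_custom_scores` and its `pass_threshold` parameter are hypothetical, and the 0.0 to 1.0 range check assumes the same score range documented for `decimalai.eval()`.

```python
from typing import Dict, List, Union

ScoreEntry = Dict[str, Union[str, float, bool]]


def to_custom_scores(
    metrics: Dict[str, float], pass_threshold: float = 0.5
) -> List[ScoreEntry]:
    """Convert {metric_name: score} into a list of score dicts."""
    entries: List[ScoreEntry] = []
    # Sort by name so the pushed order is stable across runs.
    for name, score in sorted(metrics.items()):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name}: score {score} is outside 0.0-1.0")
        entries.append(
            {"name": name, "score": score, "passed": score >= pass_threshold}
        )
    return entries


scores = to_custom_scores({"safety": 1.0, "relevance": 0.9})
# push_custom_scores(trace_id="abc-123", source="my-eval-pipeline", scores=scores)
```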

Custom Evaluators (`@eval`)

Define evaluators that run client-side before trace upload:

from decimalai.evals import eval, TraceData, EvalResult

@eval(name="has_citation", category="quality")
def check_citation(trace: TraceData) -> bool:
    """Returns True if the output contains a citation."""
    return "[source:" in trace.output

@eval(name="response_length", category="quality")
def check_length(trace: TraceData) -> EvalResult:
    """Check response length is reasonable."""
    length = len(trace.output)
    return EvalResult(
        score=min(length / 500, 1.0),
        passed=50 < length < 2000,
        reason=f"Length: {length} chars",
    )

# Register evals with your framework.
# The evals= kwarg lives on the LangChain adapter's install():
from decimalai.langchain import install
install(agent_name="my-agent", evals=[check_citation, check_length])

The @eval decorator also accepts category="llm_judge" (for judge-style scorers), sampling_rate (a 0.0–1.0 fraction of traces to evaluate), and version (a string tag for the evaluator definition). See the Evaluations guide for built-in evaluators and sampling configuration.
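To build intuition for what a fractional `sampling_rate` means, the sketch below selects roughly that fraction of trace IDs using a hash. This is an illustration only: it is not DecimalAI's sampling mechanism, and `is_sampled` is a hypothetical function.

```python
import hashlib


def is_sampled(trace_id: str, sampling_rate: float) -> bool:
    """Deterministically select roughly sampling_rate of trace ids."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in the unit interval.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sampling_rate


trace_ids = [f"trace-{i}" for i in range(10_000)]
fraction = sum(is_sampled(t, 0.25) for t in trace_ids) / len(trace_ids)
print(f"Sampled fraction: {fraction:.3f}")
```

With a rate of 0.25, about one trace in four is evaluated, which is the usual way to limit the cost of expensive scorers such as judge-style evaluators.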

What’s next

Replay

Re-run traces after a manifest change and re-score them.

Datasets

Datasets filter on the eval verdict: traces with a keep verdict become training data.