Catch agent regressions before they ship

Every agent team ships changes weekly. Every change risks breaking something. Most teams find out only after their users do.

DecimalAI tracks your agent’s structural identity (tools, prompts, models, skills) as a versioned manifest, then uses your production traffic as the test set when you propose a change. You don’t write eval cases. You don’t hand over your LLM API keys. We don’t run your agent.

Here’s what the check posts on your next PR: structural impact, computed before the change deploys.

🔍 Decimal Manifest Impact — support-agent

  🔴 HIGH IMPACT — 247 traces will break (called the removed `compare_competitors` tool)
  🟡 MEDIUM IMPACT — 501 traces may behave differently
  🟢 LOW IMPACT — 1,254 traces unaffected

  Verdict: Review before merging.

See it in action: Follow the 2-Minute Demo.

Run this in Colab

Live notebook, no setup — just paste your API key.

Try either demo in 2 minutes

Two of the three capabilities ship with a one-command demo. Each seeds realistic data into your workspace, so the payoff lands before you instrument your own agent.

pip install decimalai
export DECIMAL_API_KEY="dai_sk_..."   # from app.decimal.ai/settings

For engineers — catch regressions

decimalai demo regression

Links straight to the impact report: which production traces your next change would break.

For prompt engineers — find skills that work

decimalai demo skills

Links to the registry ranked by real production effectiveness — or browse it now, no signup.

What you can do with DecimalAI

Three capabilities, one foundation. Regression checks, the skills registry, and training-data validation each stand alone — but they compound on one thing: manifest-aware versioning, the structural fingerprint of your agent.
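To make the idea of a structural fingerprint concrete, here is a minimal sketch of hashing a manifest. The manifest shape (a dict with `tools`, `prompts`, `models`, `skills`) and the `fingerprint` helper are assumptions for illustration only; they are not DecimalAI's actual schema or SDK.

```python
import hashlib
import json


def fingerprint(manifest: dict) -> str:
    """Return a stable SHA-256 digest of a (hypothetical) manifest dict.

    Lists are sorted and keys are ordered so that two manifests with the
    same content hash to the same value regardless of ordering.
    """
    canonical = {
        "tools": sorted(manifest.get("tools", [])),
        "prompts": dict(sorted(manifest.get("prompts", {}).items())),
        "models": sorted(manifest.get("models", [])),
        "skills": sorted(manifest.get("skills", [])),
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative manifests; every name below is made up.
v1 = {
    "tools": ["search_docs", "compare_competitors"],
    "prompts": {"system": "You are a support agent."},
    "models": ["model-a"],
    "skills": [],
}
v1_reordered = {**v1, "tools": ["compare_competitors", "search_docs"]}
v2 = {**v1, "tools": ["search_docs"]}  # one tool removed

print(fingerprint(v1) == fingerprint(v1_reordered))  # same content, same digest
print(fingerprint(v1) == fingerprint(v2))            # structural change, new digest
```

Canonicalizing before hashing is the design choice that matters here: a fingerprint should change only when the agent's structure changes, not when a list happens to be reordered.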

Catch regressions before you ship

Open a PR with an agent change; get a structural impact report — which production traces will break, may differ, or are unaffected — before it deploys.

Discover & share proven skills

Install skills from the registry ranked by real production-effectiveness data, then measure how they perform on your own traffic.

Keep training data valid

As your agent evolves, traces are auto-classified keep / repair / replay / drop against the manifest diff — so your training set stays clean.
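The keep / repair / replay / drop split can be pictured with a small sketch. The rules below are one plausible reading, assumed for illustration; the source does not spell out DecimalAI's actual classification logic, and the `triage` function and diff shape are hypothetical.

```python
def triage(tools_called: set, prompt_id: str, diff: dict) -> str:
    """Return keep / repair / replay / drop for one trace (assumed rules)."""
    if tools_called & diff["removed_tools"]:
        return "drop"      # references a tool the agent can no longer call
    if tools_called & set(diff["renamed_tools"]):
        return "repair"    # old tool names could be rewritten to the new ones
    if prompt_id in diff["changed_prompts"]:
        return "replay"    # output would need regenerating under the new prompt
    return "keep"          # untouched by the manifest diff


# Hypothetical manifest diff; all names are illustrative.
diff = {
    "removed_tools": {"compare_competitors"},
    "renamed_tools": {"lookup_order": "get_order"},
    "changed_prompts": {"system_v3"},
}

labels = [
    triage({"compare_competitors"}, "billing_v1", diff),
    triage({"lookup_order"}, "billing_v1", diff),
    triage({"search_docs"}, "system_v3", diff),
    triage({"search_docs"}, "billing_v1", diff),
]
print(labels)
```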

Why “manifest-aware” matters

Other observability platforms (LangSmith, Braintrust, Langfuse, Weave) detect regressions by running your eval suite on the new version. That works only if you’ve written eval cases. Most teams haven’t, and the cases that do exist are usually stale.

DecimalAI works differently. Your production traces are tagged with the manifest they ran under. When you propose a manifest change, we identify which traces depended on what’s changing and tell you the structural blast radius, with no eval suite required.

| | Eval-driven (other tools) | Manifest-aware (DecimalAI) |
| --- | --- | --- |
| Requires writing eval cases | ✅ Yes | ❌ No: production traffic IS the test set |
| Requires running your agent in CI | ✅ Yes | ❌ No: pure trace-store query |
| Knows the blast radius of a change | ❌ No | ✅ Yes: identifies exactly which traces touched the changed surface |
| Catches removed-tool regressions | ⚠️ Only if eval coverage exists | ✅ Yes, structurally |
| Cost per check | $$ (full eval suite, possibly LLM-graded) | <$0.001 (database query) |
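The blast-radius idea can be sketched as a pure lookup over stored traces, with no agent execution. The `Trace` record, its field names, and the bucketing rules below are assumptions chosen to mirror the HIGH / MEDIUM / LOW report shown earlier; they are not DecimalAI's real data model or query.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace record; field names are illustrative only."""
    trace_id: str
    tools_called: frozenset
    prompt_id: str
    model: str


def impact(trace: Trace, removed_tools: set, changed_prompts: set,
           changed_models: set) -> str:
    """Bucket one trace by how it overlaps the proposed manifest change."""
    if trace.tools_called & removed_tools:
        return "high"      # the trace called a tool that no longer exists
    if trace.prompt_id in changed_prompts or trace.model in changed_models:
        return "medium"    # the trace ran under a prompt or model that changed
    return "low"           # the trace never touched the changed surface


traces = [
    Trace("t1", frozenset({"compare_competitors", "search_docs"}), "system_v3", "model-a"),
    Trace("t2", frozenset({"search_docs"}), "system_v3", "model-a"),
    Trace("t3", frozenset({"search_docs"}), "billing_v1", "model-a"),
]

# Proposed change: remove one tool and edit one prompt.
report = Counter(
    impact(t, removed_tools={"compare_competitors"},
           changed_prompts={"system_v3"}, changed_models=set())
    for t in traces
)
print(dict(report))
```

Because each trace is checked against set membership only, the cost scales with the number of stored traces rather than with model calls, which is the property the cost row in the table relies on.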

What DecimalAI does NOT do (yet)

We’re honest about the boundaries:

| Boundary | What we do today | Deferred / how to cover |
| --- | --- | --- |
| Running your full agent in CI | Pre-deploy regression detection is structural impact analysis: we tell you what’s at risk based on the manifest diff, without executing your agent. | Full end-to-end agent replay is deferred. For changes where structure can only say “everything may be affected” (large prompt rewrites), use a careful canary deploy and our post-deploy bisect view. |
| Behavioral verification of model swaps | Model-swap call-replay ships now (preview): for each affected trace we re-issue one recorded model call against the candidate model and diff the outputs. Defaults to mock (no spend); `mode=real` does same-provider swaps. | Cross-provider swaps and replaying the full agent (not just a single recorded call) are deferred. |
| Holding your LLM API keys | Pre-deploy structural analysis runs entirely against our trace store; no keys required. | `mode=real` call-replay needs a key for the same provider; mock replay needs none. |
| Replacing your observability tool | DecimalAI runs alongside LangSmith, Braintrust, Langfuse, etc. Pipe your traces in via the SDK; we add the manifest layer underneath. | Not a goal; keep your existing tracing in place. |
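The call-replay row above describes re-issuing one recorded model call and diffing the outputs. A rough sketch of that shape, using a mock candidate so nothing is spent, might look like the following. The `mock_model` and `replay_and_diff` names are hypothetical and do not come from the DecimalAI SDK; only the standard-library `difflib` call is real.

```python
import difflib
from typing import Callable, List


def mock_model(prompt: str) -> str:
    """Stand-in for the candidate model: canned reply, no network call."""
    return "Our plan costs $20 per month."


def replay_and_diff(recorded_prompt: str, recorded_output: str,
                    candidate: Callable[[str], str]) -> List[str]:
    """Re-issue one recorded call against a candidate and diff the outputs."""
    new_output = candidate(recorded_prompt)
    return list(difflib.unified_diff(
        recorded_output.splitlines(),
        new_output.splitlines(),
        fromfile="recorded",
        tofile="candidate",
        lineterm="",
    ))


changed = replay_and_diff(
    "How much is the plan?", "Our plan costs $25 per month.", mock_model
)
unchanged = replay_and_diff(
    "How much is the plan?", "Our plan costs $20 per month.", mock_model
)
print("\n".join(changed))
```

An empty diff means the candidate reproduced the recorded output for that single call; it says nothing about the rest of the agent run, which matches the deferred scope noted in the table.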

Supported Frameworks

| Framework | Integration | Manifest Fidelity |
| --- | --- | --- |
| LangChain / LangGraph | `decimalai.langchain` | Full (tools, models, prompts) |
| OpenAI Agents SDK | `decimalai.openai_agents` | Full (with Agent introspection) |
| LlamaIndex | `decimalai.llamaindex` | Full (span-based) |
| AutoGen / AG2 | `decimalai.autogen` | Via OTEL |
| CrewAI, Haystack, ADK | `decimalai.otel` | Model + tool names |
| Any Python agent | `@decimalai.trace` | Configurable |

Next Steps

Start here

Quickstart

Install the SDK, get your first trace, and add the GitHub Action in under 10 minutes.

2-Minute Demo

One command each: a live impact report and the ranked skills registry, on seeded data.

By role

Engineering teams

Catch agent regressions on every PR with manifest-aware impact analysis.

Prompt engineers

Track skills, browse the registry, and measure effectiveness with production data.

ML teams

Build versioned SFT datasets that stay valid as your agent evolves.

Reference

Concepts

Every term and system explained, with diagrams.

Manifests Guide

How automatic version tracking and compatibility scoring work.

Why DecimalAI?

Detailed comparison with LangSmith, Braintrust, Langfuse.

API Reference

Every REST endpoint with examples and schemas.
