Every agent team ships changes weekly. Every change risks breaking something. Most teams find out only after their users do.

DecimalAI tracks your agent’s structural identity (tools, prompts, models, skills) as a versioned manifest, then uses your production traffic as the test set when you propose a change. You don’t write eval cases. You don’t hand over your API keys. We don’t run your agent.

Here’s what the check posts on your next PR, with structural impact computed before the change deploys:
🔍 Decimal Manifest Impact — support-agent

  🔴 HIGH IMPACT — 247 traces will break (called the removed `compare_competitors` tool)
  🟡 MEDIUM IMPACT — 501 traces may behave differently
  🟢 LOW IMPACT — 1,254 traces unaffected

  Verdict: Review before merging.
See it in action: Follow the 2-Minute Demo.

Run this in Colab

Live notebook, no setup — just paste your API key.

Try either demo in 2 minutes

Two of the three capabilities ship with a one-command demo. Each demo seeds realistic data in your workspace, so you see the payoff before you instrument your own agent.
pip install decimalai
export DECIMAL_API_KEY="dai_sk_..."   # from app.decimal.ai/settings

For engineers — catch regressions

decimalai demo regression
Links straight to the impact report: which production traces your next change would break.

For prompt engineers — find skills that work

decimalai demo skills
Links to the registry ranked by real production effectiveness — or browse it now, no signup.

What you can do with DecimalAI

Three capabilities, one foundation. Regression checks, the skills registry, and training-data validation each stand alone — but they compound on one thing: manifest-aware versioning, the structural fingerprint of your agent.
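To make "structural fingerprint" concrete, here is a minimal sketch of one way to derive a stable version identifier from a manifest. The manifest shape, the `manifest_fingerprint` helper, and the choice of SHA-256 over canonical JSON are assumptions for illustration; DecimalAI's actual manifest format and hashing are not described on this page.

```python
import hashlib
import json


def manifest_fingerprint(manifest: dict) -> str:
    """Return a stable SHA-256 hex digest for a manifest dict (hypothetical helper)."""
    # sort_keys plus fixed separators give a canonical serialization,
    # so the order of dict keys in the input does not change the digest.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest: the four fields mirror tools, prompts, models, skills.
v1 = {
    "tools": ["compare_competitors", "lookup_order"],
    "prompts": {"system": "You are a support agent."},
    "models": ["model-a"],
    "skills": [],
}

# Same content with a different key order: the fingerprint is unchanged.
v1_reordered = {
    "skills": [],
    "models": ["model-a"],
    "prompts": {"system": "You are a support agent."},
    "tools": ["compare_competitors", "lookup_order"],
}

# Removing a tool is a structural change: the fingerprint differs.
v2 = dict(v1, tools=["lookup_order"])

same = manifest_fingerprint(v1) == manifest_fingerprint(v1_reordered)
changed = manifest_fingerprint(v1) != manifest_fingerprint(v2)
```

Tagging each trace with a digest like this is what would let a later query group production traffic by the exact agent structure it ran under.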

Catch regressions before you ship

Open a PR with an agent change; get a structural impact report — which production traces will break, may differ, or are unaffected — before it deploys.

Discover & share proven skills

Install skills from the registry ranked by real production-effectiveness data, then measure how they perform on your own traffic.

Keep training data valid

As your agent evolves, traces are auto-classified keep / repair / replay / drop against the manifest diff — so your training set stays clean.

Why “manifest-aware” matters

Other observability platforms (LangSmith, Braintrust, Langfuse, Weave) detect regressions by running your eval suite on the new version. That works only if you’ve written eval cases. Most teams haven’t, and the cases that do exist are usually stale.

DecimalAI works differently. Your production traces are tagged with the manifest they ran under. When you propose a manifest change, we identify which traces depended on what’s changing and tell you the structural blast radius, with no eval suite required.
| | Eval-driven (other tools) | Manifest-aware (DecimalAI) |
|---|---|---|
| Requires writing eval cases | ✅ Yes | ❌ No: production traffic IS the test set |
| Requires running your agent in CI | ✅ Yes | ❌ No: pure trace-store query |
| Knows the blast radius of a change | ❌ No | ✅ Yes: identifies exactly which traces touched the changed surface |
| Catches removed-tool regressions | ⚠️ Only if eval coverage exists | ✅ Yes, structurally |
| Cost per check | $$ (full eval suite, possibly LLM-graded) | <$0.001 (database query) |
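As a rough illustration of the manifest-aware approach, the sketch below diffs two manifests and buckets stored traces by whether they touched the changed surface. The manifest and trace shapes, the `classify` function, and the HIGH / MEDIUM / LOW rules are assumptions made for this example, not DecimalAI's actual schema or query.

```python
from collections import Counter


def classify(trace: dict, old: dict, new: dict) -> str:
    """Bucket one trace by how the old -> new manifest diff touches it."""
    # Tools present in the old manifest but missing from the new one.
    removed_tools = set(old["tools"]) - set(new["tools"])
    # Prompts whose content digest changed (or that were removed).
    changed_prompts = {
        name
        for name, digest in old["prompts"].items()
        if new["prompts"].get(name) != digest
    }
    model_changed = old["model"] != new["model"]

    # A trace that called a removed tool cannot run the same way again.
    if trace["tools_called"] & removed_tools:
        return "HIGH"
    # A changed prompt or model may alter behavior without breaking it.
    if (trace["prompts_used"] & changed_prompts) or model_changed:
        return "MEDIUM"
    return "LOW"


# Hypothetical manifests: the proposed change removes one tool and edits one prompt.
old = {
    "tools": ["compare_competitors", "lookup_order"],
    "prompts": {"system": "a1", "refund": "b1"},
    "model": "model-a",
}
new = {
    "tools": ["lookup_order"],
    "prompts": {"system": "a1", "refund": "b2"},
    "model": "model-a",
}

# Hypothetical production traces recorded under the old manifest.
traces = [
    {"id": "t1", "tools_called": {"compare_competitors"}, "prompts_used": {"system"}},
    {"id": "t2", "tools_called": {"lookup_order"}, "prompts_used": {"system", "refund"}},
    {"id": "t3", "tools_called": {"lookup_order"}, "prompts_used": {"system"}},
]

counts = Counter(classify(t, old, new) for t in traces)
```

Nothing here executes the agent or calls a model. The check is a set intersection per trace, which is why this style of analysis can be answered from a trace store alone.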

What DecimalAI does NOT do (yet)

We’re honest about the boundaries:
| Boundary | What we do today | Deferred / how to cover |
|---|---|---|
| Running your full agent in CI | Pre-deploy regression detection is structural impact analysis: we tell you what’s at risk based on the manifest diff, without executing your agent. | Full end-to-end agent replay is deferred. For changes where structure can only say “everything may be affected” (large prompt rewrites), use a careful canary deploy and our post-deploy bisect view. |
| Behavioral verification of model swaps | Model-swap call-replay ships now (preview): for each affected trace we re-issue one recorded model call against the candidate model and diff the outputs. Defaults to mock (no spend); `mode=real` does same-provider swaps. | Cross-provider swaps and replaying the full agent (not just a single recorded call) are deferred. |
| Holding your LLM API keys | Pre-deploy structural analysis runs entirely against our trace store, no keys required. | `mode=real` call-replay needs a key for the same provider; mock replay needs none. |
| Replacing your observability tool | DecimalAI runs alongside LangSmith, Braintrust, Langfuse, etc. Pipe your traces in via the SDK; we add the manifest layer underneath. | Not a goal; keep your existing tracing in place. |
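The call-replay row above can be pictured with a small sketch: take one recorded model call, re-issue its input against a candidate, and diff the two outputs. Everything here is a stand-in for illustration, including the `replay_call` function, the recorded-call shape, and the `mock_candidate` model. It is not the DecimalAI implementation, and it makes no network calls.

```python
import difflib
from typing import Callable


def replay_call(recorded: dict, candidate: Callable[[str], str]) -> dict:
    """Re-issue one recorded call against a candidate and diff the outputs."""
    new_output = candidate(recorded["input"])
    # unified_diff yields nothing when the two outputs are identical.
    diff = list(
        difflib.unified_diff(
            recorded["output"].splitlines(),
            new_output.splitlines(),
            fromfile="recorded",
            tofile="candidate",
            lineterm="",
        )
    )
    return {"changed": bool(diff), "diff": diff}


def mock_candidate(prompt: str) -> str:
    """Deterministic stand-in for a candidate model: no keys, no spend."""
    canned = {"Where is my order?": "Your order shipped yesterday."}
    return canned.get(prompt, "I'm not sure.")


# Hypothetical recorded calls pulled from two affected traces.
stable = {"input": "Where is my order?", "output": "Your order shipped yesterday."}
drifted = {"input": "Compare us to Acme.", "output": "Here is a comparison."}

stable_result = replay_call(stable, mock_candidate)
drifted_result = replay_call(drifted, mock_candidate)
```

Swapping `mock_candidate` for a function that calls a real provider would correspond to a real-mode replay, which is the point at which an API key becomes necessary.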

Supported Frameworks

| Framework | Integration | Manifest Fidelity |
|---|---|---|
| LangChain / LangGraph | `decimalai.langchain` | Full (tools, models, prompts) |
| OpenAI Agents SDK | `decimalai.openai_agents` | Full (with Agent introspection) |
| LlamaIndex | `decimalai.llamaindex` | Full (span-based) |
| AutoGen / AG2 | `decimalai.autogen` | Via OTEL |
| CrewAI, Haystack, ADK | `decimalai.otel` | Model + tool names |
| Any Python agent | `@decimalai.trace` | Configurable |

Next Steps

Start here

Quickstart

Install the SDK, get your first trace, and add the GitHub Action in under 10 minutes.

2-Minute Demo

One command each: a live impact report and the ranked skills registry, on seeded data.

By role

Engineering teams

Catch agent regressions on every PR with manifest-aware impact analysis.

Prompt engineers

Track skills, browse the registry, and measure effectiveness with production data.

ML teams

Build versioned SFT datasets that stay valid as your agent evolves.

Reference

Concepts

Every term and system explained, with diagrams.

Manifests Guide

How automatic version tracking and compatibility scoring work.

Why DecimalAI?

Detailed comparison with LangSmith, Braintrust, Langfuse.

API Reference

Every REST endpoint with examples and schemas.