Run this in Colab
Live notebook, no setup — just paste your API key.
Try either demo in 2 minutes
Two of the three capabilities ship with a one-command demo that seeds realistic data into your workspace, so the payoff lands before you instrument your own agent.
For engineers: catch regressions
For prompt engineers: find skills that work
What you can do with DecimalAI
Three capabilities, one foundation. Regression checks, the skills registry, and training-data validation each stand alone, but they compound on one shared layer: manifest-aware versioning, the structural fingerprint of your agent.
Catch regressions before you ship
Open a PR with an agent change; get a structural impact report — which production traces will break, may differ, or are unaffected — before it deploys.
Discover & share proven skills
Install skills from the registry ranked by real production-effectiveness data, then measure how they perform on your own traffic.
Keep training data valid
As your agent evolves, traces are auto-classified as keep / repair / replay / drop against the manifest diff, so your training set stays clean.
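
The keep / repair / replay / drop buckets can be pictured as rules over a manifest diff. The sketch below is a minimal illustration under assumed rules. DecimalAI's actual classification criteria aren't spelled out on this page, and `ManifestDiff`, `TrainingTrace`, and `classify` are hypothetical names, not SDK APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    REPAIR = "repair"
    REPLAY = "replay"
    DROP = "drop"


@dataclass
class ManifestDiff:
    # Hypothetical summary of what changed between the old and new manifest.
    removed_tools: set[str] = field(default_factory=set)
    renamed_tools: dict[str, str] = field(default_factory=dict)  # old name -> new name
    changed_prompts: set[str] = field(default_factory=set)
    model_changed: bool = False


@dataclass
class TrainingTrace:
    trace_id: str
    tools_used: set[str]
    prompts_used: set[str]


def classify(trace: TrainingTrace, diff: ManifestDiff) -> Action:
    # A call to a tool that no longer exists cannot be salvaged.
    if not trace.tools_used.isdisjoint(diff.removed_tools):
        return Action.DROP
    # Outputs produced under a changed model or prompt must be regenerated.
    if diff.model_changed or not trace.prompts_used.isdisjoint(diff.changed_prompts):
        return Action.REPLAY
    # A renamed tool can be rewritten in place in the stored trace.
    if not trace.tools_used.isdisjoint(diff.renamed_tools):
        return Action.REPAIR
    return Action.KEEP


diff = ManifestDiff(
    removed_tools={"search_v1"},
    renamed_tools={"fetch": "fetch_url"},
    changed_prompts={"system"},
)
traces = [
    TrainingTrace("t1", {"search_v1"}, {"planner"}),
    TrainingTrace("t2", {"calculator"}, {"system"}),
    TrainingTrace("t3", {"fetch"}, {"planner"}),
    TrainingTrace("t4", {"calculator"}, {"planner"}),
]
for t in traces:
    print(t.trace_id, classify(t, diff).value)  # t1 drop, t2 replay, t3 repair, t4 keep
```

Rule order is a design choice in this sketch. A removed tool makes a trace unusable, so `drop` is checked before `replay`. The cheap, mechanical `repair` applies only when nothing forces regeneration.
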
Why “manifest-aware” matters
Other observability platforms (LangSmith, Braintrust, Langfuse, Weave) detect regressions by running your eval suite on the new version. That works only if you’ve written eval cases. Most teams haven’t, and the cases they do have are usually stale. DecimalAI works differently. Your production traces are tagged with the manifest they ran under. When you propose a manifest change, we identify which traces depended on what’s changing and report the structural blast radius. No eval suite required.

| | Eval-driven (other tools) | Manifest-aware (DecimalAI) |
|---|---|---|
| Requires writing eval cases | ✅ Yes | ❌ No — production traffic IS the test set |
| Requires running your agent in CI | ✅ Yes | ❌ No — pure trace-store query |
| Knows the blast radius of a change | ❌ No | ✅ Yes — identifies exactly which traces touched the changed surface |
| Catches removed-tool regressions | ⚠️ Only if eval coverage exists | ✅ Yes, structurally |
| Cost per check | $$ (full eval suite, possibly LLM-graded) | <$0.001 (database query) |
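
A structural check can be a plain query because each trace already records which components it touched. The sketch below assumes a toy SQLite schema: `trace_surface`, `manifest_diff`, and the `tool:` / `prompt:` / `model:` naming are all hypothetical and do not describe DecimalAI's trace store. It sorts traces into the same three outcomes the impact report uses: will break, may differ, and unaffected.

```python
from __future__ import annotations

import sqlite3

# Toy schema (an assumption): one row per component a trace touched.
SCHEMA = """
CREATE TABLE trace_surface (trace_id TEXT, manifest_id TEXT, component TEXT);
CREATE TABLE manifest_diff (component TEXT, change_kind TEXT);  -- 'removed' or 'modified'
"""

# LEFT JOIN keeps untouched components; their change_kind is NULL and MAX ignores it.
IMPACT_QUERY = """
SELECT s.trace_id,
       CASE
         WHEN MAX(d.change_kind = 'removed') = 1 THEN 'will_break'
         WHEN MAX(d.change_kind = 'modified') = 1 THEN 'may_differ'
         ELSE 'unaffected'
       END AS verdict
FROM trace_surface AS s
LEFT JOIN manifest_diff AS d ON d.component = s.component
WHERE s.manifest_id = ?
GROUP BY s.trace_id
ORDER BY s.trace_id
"""


def impact_report(conn: sqlite3.Connection, manifest_id: str) -> dict[str, str]:
    # No agent execution: each trace's verdict comes from one aggregate query.
    return dict(conn.execute(IMPACT_QUERY, (manifest_id,)).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO trace_surface VALUES (?, ?, ?)",
    [
        ("t1", "m-1", "tool:search_v1"), ("t1", "m-1", "model:model-a"),
        ("t2", "m-1", "prompt:planner"), ("t2", "m-1", "model:model-a"),
        ("t3", "m-1", "tool:calculator"), ("t3", "m-1", "model:model-a"),
        ("t9", "m-0", "tool:search_v1"),  # ran under an older manifest
    ],
)
# Proposed change: remove search_v1 and edit the planner prompt.
conn.executemany(
    "INSERT INTO manifest_diff VALUES (?, ?)",
    [("tool:search_v1", "removed"), ("prompt:planner", "modified")],
)
print(impact_report(conn, "m-1"))
# {'t1': 'will_break', 't2': 'may_differ', 't3': 'unaffected'}
```

Nothing here runs the agent or calls a model, so a check like this costs a database query rather than an eval run. The sketch assumes that any removed component counts as a break and any modified component counts as a possible difference.
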
What DecimalAI does NOT do (yet)
We’re honest about the boundaries:

| Boundary | What we do today | Deferred / how to cover |
|---|---|---|
| Running your full agent in CI | Pre-deploy regression detection is structural impact analysis — we tell you what’s at risk based on the manifest diff, without executing your agent. | Full end-to-end agent replay is deferred. For changes where structure can only say “everything may be affected” (large prompt rewrites), use a careful canary deploy and our post-deploy bisect view. |
| Behavioral verification of model swaps | Model-swap call-replay ships now (preview): for each affected trace we re-issue one recorded model call against the candidate model and diff the outputs. Defaults to mock (no spend); mode=real does same-provider swaps. | Cross-provider swaps and replaying the full agent (not just a single recorded call) are deferred. |
| Holding your LLM API keys | Pre-deploy structural analysis runs entirely against our trace store — no keys required. | mode=real call-replay needs a key for the same provider; mock replay needs none. |
| Replacing your observability tool | DecimalAI runs alongside LangSmith, Braintrust, Langfuse, etc. Pipe your traces in via the SDK; we add the manifest layer underneath. | Not a goal — keep your existing tracing in place. |
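
Model-swap call-replay re-issues one recorded model call per affected trace and diffs the outputs. Below is a minimal sketch of that per-trace step. It assumes a hypothetical `RecordedCall` record and a caller-supplied `complete(model, prompt)` function in place of a real provider client. This page doesn't document what DecimalAI's mock mode actually returns, so this version simply echoes the recorded output.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecordedCall:
    trace_id: str
    provider: str
    model: str
    prompt: str
    output: str


def replay_call(
    call: RecordedCall,
    candidate_model: str,
    candidate_provider: str,
    mode: str = "mock",
    complete: Optional[Callable[[str, str], str]] = None,
) -> List[str]:
    """Re-issue one recorded call against a candidate model; return a unified diff."""
    if mode == "mock":
        # No provider call and no spend: echo the recorded output to exercise the plumbing.
        new_output = call.output
    elif mode == "real":
        if candidate_provider != call.provider:
            raise ValueError("cross-provider swaps are out of scope for this sketch")
        if complete is None:
            raise ValueError("mode='real' needs a completion function for the provider")
        new_output = complete(candidate_model, call.prompt)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return list(
        difflib.unified_diff(
            call.output.splitlines(),
            new_output.splitlines(),
            fromfile=call.model,
            tofile=candidate_model,
            lineterm="",
        )
    )


call = RecordedCall("t1", "provider-x", "model-a", "Capital of France?", "The capital is Paris.")
print(replay_call(call, "model-b", "provider-x"))  # prints [] (mock mode echoes the recording)


def fake_complete(model: str, prompt: str) -> str:
    # Stand-in for a real provider client; returns a canned answer.
    return "Paris."


for line in replay_call(call, "model-b", "provider-x", mode="real", complete=fake_complete):
    print(line)
```

Injecting the provider client as a function lets mock mode run with no key and no spend. The same-provider check mirrors the boundary in the table above, where cross-provider swaps are deferred.
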
Supported Frameworks
| Framework | Integration | Manifest Fidelity |
|---|---|---|
| LangChain / LangGraph | decimalai.langchain | Full (tools, models, prompts) |
| OpenAI Agents SDK | decimalai.openai_agents | Full (with Agent introspection) |
| LlamaIndex | decimalai.llamaindex | Full (span-based) |
| AutoGen / AG2 | decimalai.autogen | Via OTEL |
| CrewAI, Haystack, ADK | decimalai.otel | Model + tool names |
| Any Python agent | @decimalai.trace | Configurable |
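
The Manifest Fidelity column lists which parts of an agent (models, tools, prompts) end up in its manifest. For a plain Python agent, one way to picture tagging each trace with its manifest is to hash a canonical JSON encoding of those parts and attach the hash to every recorded call. `manifest_fingerprint`, `traced`, and `TRACE_STORE` are hypothetical names. This is not how `@decimalai.trace` is implemented, and its real signature may differ.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict, List


def manifest_fingerprint(model: str, tools: List[str], prompts: Dict[str, str]) -> str:
    """Hash the agent's structure (model, tool names, prompt texts) into a short, stable id."""
    canonical = json.dumps(
        {"model": model, "tools": sorted(tools), "prompts": prompts},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


TRACE_STORE: List[Dict[str, Any]] = []  # in-memory stand-in for a trace backend


def traced(model: str, tools: List[str], prompts: Dict[str, str]) -> Callable:
    """Hypothetical decorator: tag every call with the manifest it ran under."""
    fingerprint = manifest_fingerprint(model, tools, prompts)

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            TRACE_STORE.append({"fn": fn.__name__, "manifest": fingerprint, "output": result})
            return result

        return wrapper

    return decorator


@traced(model="model-a", tools=["search", "calculator"], prompts={"system": "Be concise."})
def answer(question: str) -> str:
    return f"answer to: {question}"


answer("What is 2 + 2?")
print(TRACE_STORE[0]["manifest"])
# Tool order doesn't matter; any prompt edit changes the fingerprint.
same = manifest_fingerprint("model-a", ["calculator", "search"], {"system": "Be concise."})
edited = manifest_fingerprint("model-a", ["search", "calculator"], {"system": "Be brief."})
print(same == TRACE_STORE[0]["manifest"], edited == TRACE_STORE[0]["manifest"])  # True False
```

Sorting tool names and serializing with `sort_keys=True` makes the fingerprint independent of declaration order. As a result, only genuine structural changes produce a new manifest id.
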
Next Steps
Start here
Quickstart
Install the SDK, get your first trace, and add the GitHub Action in under 10 minutes.
2-Minute Demo
One command each: a live impact report and the ranked skills registry, on seeded data.
By role
Engineering teams
Catch agent regressions on every PR with manifest-aware impact analysis.
Prompt engineers
Track skills, browse the registry, and measure effectiveness with production data.
ML teams
Build versioned SFT datasets that stay valid as your agent evolves.
Reference
Concepts
Every term and system explained, with diagrams.
Manifests Guide
How automatic version tracking and compatibility scoring work.
Why DecimalAI?
Detailed comparison with LangSmith, Braintrust, Langfuse.
API Reference
Every REST endpoint with examples and schemas.