The 2-Minute Demo - DecimalAI

The fastest way to understand DecimalAI is to see its two headline outputs on real-looking data. Both ship as one-command sandboxes that seed your workspace and link you straight to the result:

decimalai demo regression — what a risky agent change does to production traffic (the impact report)
decimalai demo skills — the skills registry, ranked by measured effectiveness

Everything seeded is prefixed [Demo] and fully removable. No agent code, no framework setup, no LLM API keys.

Prerequisites

Python 3.10+ (check with python --version; on older versions, pip silently installs an outdated SDK)
A DecimalAI API key from app.decimal.ai/settings

pip install decimalai
export DECIMAL_API_KEY="dai_sk_..."
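Before running either demo, you can sanity-check both prerequisites from Python. This is a minimal sketch under stated assumptions, not part of the decimalai package. The helper name check_prerequisites is hypothetical. It only checks the interpreter version and that DECIMAL_API_KEY is set with the dai_sk_ prefix shown in the export line above.

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence


def check_prerequisites(
    version: Optional[Sequence[int]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means both prerequisites look fine.

    Hypothetical helper for illustration only; it is not part of the decimalai package.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    env = os.environ if env is None else env
    problems = []

    # On interpreters older than 3.10, pip may silently resolve an outdated SDK.
    if version < (3, 10):
        problems.append("Python %d.%d found; 3.10+ is required" % version)

    key = env.get("DECIMAL_API_KEY", "")
    if not key:
        problems.append("DECIMAL_API_KEY is not set")
    elif not key.startswith("dai_sk_"):
        # Assumption: keys share the dai_sk_ prefix shown in the export example.
        problems.append("DECIMAL_API_KEY does not start with dai_sk_")
    return problems
```

Calling check_prerequisites() with no arguments inspects the running interpreter and environment. The optional arguments exist only so the check can be exercised without changing your real setup.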

Run the regression demo

decimalai demo regression

In ~30 seconds this seeds a demo support agent with two manifest versions and a production trace corpus, then runs the regression check:

Demo A — "Your agent changed"
Seeding the demo agent (v1 → v2 + traces)…
  agent: [Demo] support-agent  ·  traces: 10  ·  v1 c7664af1 → v2 1d163437
Running the regression check (v2 vs auto-resolved v1)…

Verdict: high_risk — 2 traces will break. Review before merging.
Traces analyzed: 110 (high 2 / med 108 / low 0)

Open the impact report (keep / repair / replay / drop fan-out):
  https://app.decimal.ai/agents/.../impact-reports/<id>

The v1→v2 diff contains the three change types you’ll actually ship: a model swap, a tool rename + removal, and a prompt rewrite.

The numbers in traces: 10 and Traces analyzed: 110 mean different things. 10 is the seeded corpus you’ll see in the dashboard, while ~110 is what the engine actually analyzes: it pulls in roughly 100 additional history rows so the check runs against a realistic volume.

Read the impact report

Open the printed link. The report answers the question every reviewer has — what does this change do to traffic we’ve already served?

🔴 HIGH IMPACT — traces that called the removed tool. They will break.
🟡 MEDIUM IMPACT — traces touched by the model swap or prompt rewrite. Outputs may differ, but structural analysis can’t predict whether they get better or worse.
🟢 LOW IMPACT — traces that never touched a changed surface.
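One way to picture this banding is as a purely structural comparison between what each recorded trace touched and what changed between manifest versions. The sketch below is a hypothetical model, not DecimalAI’s engine. The Trace and ManifestDiff types, their fields, and the rule order are assumptions made for illustration. It encodes only the three rules listed above and does not model tool renames separately; in this sketch, a rename would have to be expressed as removing the old tool name.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Trace:
    # Hypothetical trace shape: an id, the tools it called, and whether it hit the LLM.
    trace_id: str
    tools_called: FrozenSet[str]
    calls_llm: bool = True


@dataclass(frozen=True)
class ManifestDiff:
    # Hypothetical summary of what changed between two manifest versions.
    removed_tools: FrozenSet[str] = frozenset()
    model_changed: bool = False
    prompt_changed: bool = False


def severity(trace: Trace, diff: ManifestDiff) -> str:
    """Assign a band by comparing recorded trace data with the diff; nothing is re-executed."""
    if trace.tools_called & diff.removed_tools:
        return "high"    # the trace depends on a tool that no longer exists
    if trace.calls_llm and (diff.model_changed or diff.prompt_changed):
        return "medium"  # output may differ, in an unknown direction
    return "low"         # no changed surface was touched


def band_counts(traces: Iterable[Trace], diff: ManifestDiff) -> Dict[str, int]:
    """Tally traces per band, like the high / med / low summary line."""
    counts = {"high": 0, "medium": 0, "low": 0}
    for trace in traces:
        counts[severity(trace, diff)] += 1
    return counts
```

Because the function only intersects sets and checks flags, it never runs the agent or calls a model. That property is what keeps a check of this kind fast and free.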

Below the severity bands, each affected trace gets a keep / repair / replay / drop verdict. This is the same classification that keeps training datasets valid as your agent evolves.

This is exactly what the GitHub Action posts on every PR, computed against your production traces instead of seeded ones.

Run the skills demo

decimalai demo skills

This seeds three public skills with deliberately varied effectiveness, then runs the stats recompute:

Demo B — "Find skills that work"
Seeding 3 public skills + stats + traces (runs the recompute)…
  skills: [Demo] code-review, [Demo] sql-optimizer, [Demo] deploy-checklist

Open the Skills Registry (ranked by SkillScore — measured quality, not installs):
  https://app.decimal.ai/skills

The three seeded skills land at different SkillScores by design:

Skill	Effectiveness	Rank — why
`code-review`	High	#1 — verified, high eval pass rate and strong AI-judge quality
`sql-optimizer`	Medium	#2 — solid pass rate, fewer activations
`deploy-checklist`	Low	#3 — low pass rate; the kind of skill you’d retire or rewrite

The high performer is verified and tops the ranking; the weak one sits at the bottom — because SkillScore is computed from live eval pass rates and AI-judge quality, not install counts.

Clean up (optional)

decimalai demo reset

This removes all [Demo]-prefixed agents, manifests, traces, and skills. Matching is on the exact prefix, so anything you created yourself is left untouched. Re-running either demo also resets first by default, so you always land in a clean state.
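To make exact-prefix matching concrete: a resource counts as demo data only if its name begins with the literal string [Demo]. The snippet below is a hypothetical illustration of that rule, not the CLI’s implementation. The function name demo_names is ours, and the assumption that the check is a plain case-sensitive startswith is ours too.

```python
from typing import Iterable, List

DEMO_PREFIX = "[Demo]"  # the literal prefix used on every seeded resource


def demo_names(names: Iterable[str]) -> List[str]:
    """Return only names that begin with the exact, case-sensitive demo prefix."""
    return [name for name in names if name.startswith(DEMO_PREFIX)]
```

Under this rule, a name like "My [Demo] agent" or "[demo] support-agent" is not selected, because the prefix must appear verbatim at the very start.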

What just happened

No agent was executed and no LLM was called — the regression check is a structural query against the trace store, which is why it runs in seconds and costs nothing. The same mechanism, pointed at skills, produces the registry’s effectiveness ranking.

Do it with your own agent

Quickstart

Instrument your agent and get your first real trace in ~5 minutes.

Regression Check on every PR

Wire the GitHub Action so this report appears on your next pull request.

Skills guide

Auto-discover your SKILL.md files and measure them on your traffic.

Run the manual loop in Colab

Prefer code? Build the v1→v2 loop yourself in a live notebook.
