- `decimalai demo regression`: what a risky agent change does to production traffic (the impact report)
- `decimalai demo skills`: the skills registry, ranked by measured effectiveness
[Demo] and fully removable. No agent code, no framework setup, no LLM API keys.
Prerequisites
- Python 3.10+ (check with `python --version`; older Pythons silently install an outdated SDK)
- A DecimalAI API key from app.decimal.ai/settings
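If you would rather have the version check fail loudly than let an older interpreter quietly pull an outdated SDK, a small guard script can run first. This is a minimal sketch, assuming you run it with the same `python` you will install the SDK into. `MIN_PYTHON` and `python_is_supported` are illustrative names, not part of the DecimalAI tooling.

```python
import sys
from typing import Optional, Tuple

# Minimum version from the prerequisites above.
MIN_PYTHON = (3, 10)


def python_is_supported(version: Optional[Tuple[int, ...]] = None) -> bool:
    """True if `version` (default: the running interpreter) is at least MIN_PYTHON."""
    major_minor = tuple(version[:2]) if version is not None else tuple(sys.version_info[:2])
    return major_minor >= MIN_PYTHON


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {found}: OK for the demo")
    else:
        # Non-zero exit so shell scripts and CI stop before installing anything.
        sys.exit(f"Python {found} is older than 3.10; upgrade before installing the SDK.")
```

The sketch uses `typing.Optional` rather than the `X | None` syntax so that the script itself still runs on the older interpreters it is meant to catch.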
Run the regression demo
`traces: 10` and `Traces analyzed: 110` mean different things. 10 is the seeded corpus you’ll see in the dashboard; ~110 is what the engine actually analyzes, because it pulls in roughly 100 additional history rows so the check runs against a realistic volume.
Read the impact report
Open the printed link. The report answers the question every reviewer has — what does this change do to traffic we’ve already served?
- 🔴 HIGH IMPACT — traces that called the removed tool. They will break.
- 🟡 MEDIUM IMPACT — traces touched by the model swap / prompt rewrite. Outputs may differ; structural analysis can’t predict direction.
- 🟢 LOW IMPACT — traces that never touched a changed surface.
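The three tiers can be read as a per-trace rule where the most severe match wins. Below is a minimal Python sketch that assumes each trace is reduced to the set of tools it called and the set of surfaces (model, prompt) it touched. `Trace`, `Change`, `classify_impact`, and the surface strings are hypothetical names chosen for illustration, not part of the DecimalAI SDK, and the real engine may weigh signals differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Impact(Enum):
    HIGH = "high"      # called a tool the change removes: will break
    MEDIUM = "medium"  # touched a swapped model or rewritten prompt: output may differ
    LOW = "low"        # touched no changed surface


@dataclass
class Trace:
    # Hypothetical, simplified view of one served trace.
    trace_id: str
    tools_called: set[str] = field(default_factory=set)
    surfaces: set[str] = field(default_factory=set)  # e.g. {"model:v1", "prompt:support"}


@dataclass
class Change:
    # Hypothetical summary of what an agent change alters.
    removed_tools: set[str] = field(default_factory=set)
    changed_surfaces: set[str] = field(default_factory=set)


def classify_impact(trace: Trace, change: Change) -> Impact:
    """Return the most severe tier that applies; a removed tool outranks everything."""
    if trace.tools_called & change.removed_tools:
        return Impact.HIGH
    if trace.surfaces & change.changed_surfaces:
        return Impact.MEDIUM
    return Impact.LOW


# Example: a change that removes `refund_order` and rewrites the support prompt.
change = Change(removed_tools={"refund_order"}, changed_surfaces={"prompt:support"})
history = [
    Trace("t1", tools_called={"refund_order"}, surfaces={"prompt:support"}),
    Trace("t2", tools_called={"lookup_order"}, surfaces={"prompt:support"}),
    Trace("t3", tools_called={"lookup_order"}, surfaces={"prompt:billing"}),
]
for t in history:
    print(t.trace_id, classify_impact(t, change).name)  # prints t1 HIGH, t2 MEDIUM, t3 LOW (one per line)
```

Checking removed tools first means a trace that both called a removed tool and hit a rewritten prompt is reported as HIGH, the tier a reviewer most needs to see.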
Run the skills demo
| Skill | Effectiveness | Rank and reason |
|---|---|---|
| `code-review` | High | #1: verified, high eval pass rate and strong AI-judge quality |
| `sql-optimizer` | Medium | #2: solid pass rate, fewer activations |
| `deploy-checklist` | Low | #3: low pass rate; the kind of skill you’d retire or rewrite |
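A ranking like this can be pictured as a sort over per-skill metrics. The sketch below shows one possible ordering, not DecimalAI's formula. It assumes each skill has an eval pass rate, a mean AI-judge score, and an activation count. It sorts by pass rate, then judge score, then activations, and buckets pass rate into High / Medium / Low using thresholds (0.8 and 0.5) chosen purely for illustration. `SkillStats`, `rank_skills`, and `effectiveness_label` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillStats:
    # Hypothetical per-skill metrics; field names are illustrative only.
    name: str
    eval_pass_rate: float  # fraction of eval cases passed, 0.0 to 1.0
    judge_score: float     # mean AI-judge quality score, 0.0 to 1.0
    activations: int       # how often the skill fired in observed traffic


def effectiveness_label(pass_rate: float) -> str:
    """Bucket a pass rate into High / Medium / Low (thresholds are assumptions)."""
    if pass_rate >= 0.8:
        return "High"
    if pass_rate >= 0.5:
        return "Medium"
    return "Low"


def rank_skills(skills: list[SkillStats]) -> list[tuple[int, SkillStats, str]]:
    """Order skills best-first by pass rate, then judge score, then activations."""
    ordered = sorted(
        skills,
        key=lambda s: (s.eval_pass_rate, s.judge_score, s.activations),
        reverse=True,
    )
    return [
        (rank, s, effectiveness_label(s.eval_pass_rate))
        for rank, s in enumerate(ordered, start=1)
    ]
```

Pass `rank_skills` a list of `SkillStats` and it returns `(rank, stats, label)` tuples, best first. A lexicographic key keeps the sketch simple; a production ranking would more likely discount skills with few activations rather than only use them as a tie-breaker.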
What just happened
No agent was executed and no LLM was called: the regression check is a structural query against the trace store, which is why it runs in seconds and costs nothing. The same mechanism, pointed at skills, produces the registry’s effectiveness ranking.
Do it with your own agent
Quickstart
Instrument your agent and get your first real trace in ~5 minutes.
Regression Check on every PR
Wire the GitHub Action so this report appears on your next pull request.
Skills guide
Auto-discover your SKILL.md files and measure them on your traffic.
Run the manual loop in Colab
Prefer code? Build the v1→v2 loop yourself in a live notebook.