decimal-labs/regression-check GitHub Action. Dates use ISO 8601.
The platform follows a rolling release model — changes ship to api.decimal.ai continuously. The SDK and Action follow Semantic Versioning and are tagged on GitHub.
For breaking-change notices, subscribe to the GitHub release feeds:
- SDK: https://github.com/decimal-labs/decimalai-python/releases
- Action: https://github.com/decimal-labs/regression-check/releases
SDK (decimalai-python 0.6.0)
- 🗑️ Removed the experiments API. The agent/dataset experiment runner (experiment(), run_experiment(), compare_experiments(), the offline Eval() helper) and the matching client methods were backed by /api/v1/experiments, which was never shipped and always returned 404. The endpoint has been formally retired. Use regression-check for pre-deploy A/B (POST /api/v1/regression-check), the regression timeline for post-deploy comparison, and skill version compare (/api/v1/skills/analytics/compare) for skill diffs.
Platform
- ✨ HMAC webhook signing — outbound webhooks now carry an X-Decimal-Signature header for verifiable delivery.
- ✨ Webhook retry + delivery log — failed deliveries retry with exponential backoff, with a per-webhook delivery log in the UI.
- ✨ Stripe billing end-to-end — checkout + customer portal wired through for self-serve plan upgrades.
- ✨ Anthropic in Playground — the Claude provider now sits alongside OpenAI and Gemini in the prompt-testing playground.
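
On the receiving side, checking the X-Decimal-Signature header could look roughly like the sketch below. The release note does not state the signing scheme, so a hex-encoded HMAC-SHA256 over the raw request body is an assumption here, and the secret and event payload are made up for illustration.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC of the raw request body.

    Assumption: the header carries a hex-encoded HMAC-SHA256 of the body.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode(), header_value.encode())


# Hypothetical secret and payload, for illustration only.
secret = b"whsec_example"
body = b'{"event": "example"}'
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

Verify against the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.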
SDK (decimalai-python)
- ✨ atexit flush handler — buffered traces flush on script exit, so short-lived scripts no longer lose traces silently.
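
The general pattern behind an exit-time flush can be sketched with the standard library's atexit module. The TraceBuffer class and its send callback below are hypothetical stand-ins, not the SDK's actual internals.

```python
import atexit


class TraceBuffer:
    """Hypothetical in-memory buffer; not the SDK's real class."""

    def __init__(self, send):
        self._send = send      # callable that ships a batch of traces
        self._pending = []

    def record(self, trace):
        self._pending.append(trace)

    def flush(self):
        # Swap the list out first so a second flush finds nothing to send.
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


sent = []  # stands in for the network call
buffer = TraceBuffer(send=sent.extend)

# Registered handlers run at normal interpreter exit.
atexit.register(buffer.flush)

buffer.record({"name": "demo-call"})
```

Handlers registered with atexit do not run if the process is killed by an unhandled signal or exits through os._exit(), so long-running workers may still want an explicit flush.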
SDK 0.4.0 (pip install decimalai, requires Python 3.10+)
- ✨ One-command demo sandbox — see both demos on seeded data in ~2 minutes, before instrumenting anything:
  - decimalai demo regression — seeds a v1→v2 agent change + trace corpus, runs the regression check, links straight to the impact report.
  - decimalai demo skills — seeds three skills with varied effectiveness, links to the ranked registry.
  - decimalai demo reset — removes all [Demo]-prefixed data; your own agents and skills are never touched.
- ✨ decimalai init now surfaces the demo commands in its next-steps output.
- ✨ SkillScore v2 — the registry score is now a quality-only composite (0–100): live eval pass rate + AI-judge quality, gated on sample size. Popularity and maintenance no longer affect the score. Skills under 10 activations/30d are relegated below scored skills in the default sort instead of hidden.
- ✨ Leaderboard axes: Highest SkillScore (default) · Biggest Improvement (measured lift vs no-skill baseline) · Most Efficient (token savings) · Top live rating.
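
The relegation rule in the default sort can be illustrated with a two-part sort key. This is a sketch only: the Skill record is hypothetical, and ordering low-sample skills among themselves by score is an assumption the notes do not confirm.

```python
from dataclasses import dataclass

MIN_ACTIVATIONS_30D = 10  # threshold stated in the release note


@dataclass
class Skill:
    """Hypothetical record; field names are illustrative."""
    name: str
    skill_score: float      # 0-100 composite
    activations_30d: int


def default_sort(skills):
    # False sorts before True, so skills at or above the threshold come
    # first; within each group, higher SkillScore comes first.
    return sorted(
        skills,
        key=lambda s: (s.activations_30d < MIN_ACTIVATIONS_30D, -s.skill_score),
    )


skills = [
    Skill("new-skill", 95.0, 3),
    Skill("sql-review", 82.0, 140),
    Skill("api-design", 88.0, 60),
]
print([s.name for s in default_sort(skills)])
# ['api-design', 'sql-review', 'new-skill']
```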
Action (decimal-labs/regression-check)
- ✨ Honest behavioral nudge — when a PR’s diff contains a model change and behavioral-check is off, the impact comment now shows how many recorded calls can be verified and how (behavioral-check: real or post-deploy bisect). No fabricated counts.
- 🔧 behavioral-check: mock no longer renders a meaningless equivalent/changed split (the mock stub always read ~100% changed); it now reports the eligible-call count and points at real.
The skill registry layer that knows what works — registry, router, and observability shipped as one product.
Registry
- ✨ Per-model effectiveness on every registry skill — see the pass rate a skill gets on GPT-5 vs Claude Opus vs Gemini Flash, computed from production traces. “Best with” badge marks the highest-passing model.
- ✨ Real “Most Effective” sort ranks by SkillScore (with a minimum-activations gate so cold-start skills don’t dominate). New separate sort=popular for raw activation count.
- ✨ Activation sparkline on every public skill page — 30-day daily trend, server-rendered SVG, zero JS.
- ✨ Version diff viewer lets unauthenticated visitors compare any two published versions side-by-side.
- ✨ Popular forks surfaced on detail pages so consumers can find community-iterated variants.
- ✨ Integration snippets (Python SDK · pull · curl · agent-runtime paths) on every detail page with copy-to-clipboard.
- ✨ 25 new flagship official skills authored — code review, API design, data/SQL, prompt engineering, agent design, ops, docs, security. All Apache-2.0.
- 🔧 Default browse view hides bulk-imported skills with under 10 activations so the registry feels curated. Use the Imported tab or search to see all 3,000+ imports.
- ✨ The SkillRouter is now a first-class product surface with its own page in the API reference. Documents the three strategies (full menu / smart route / on-demand body), response shape, telemetry, policy controls, and smart-routing internals.
- ✨ Weekly skill degradation digest — opt-in email per workspace when a skill’s pass rate drops ≥15% week-over-week with at least 20 baseline activations. Thresholds tunable via env vars.
- ✨ /skills/<slug> is the new canonical public URL for a registry skill. Legacy /skills/<slug> continues to work; both share the same OG image.
- ✨ OpenGraph cards dynamically rendered per skill — name, SkillScore, per-model row, activation count. Twitter, LinkedIn, and Slack unfurls show the effectiveness data on every share.
- ✨ Embed widget at /embed/skills/<slug> — drop a 380×180px iframe into a README or blog post showing live effectiveness. Light + dark theme via ?theme=.
- ✨ decimalai skills pull <slug> — pull any public registry skill to disk with no signup. Writes ./<slug>/SKILL.md. Read-only (no fork, no telemetry); signup is only required to install + activate tracking.
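
The weekly degradation digest rule in the list above (a pass-rate drop of at least 15% week-over-week, with at least 20 baseline activations) can be written as a small predicate. This sketch reads "15%" as percentage points, which is an assumption; the function and parameter names are hypothetical, and the notes do not name the env vars that tune the thresholds.

```python
def should_alert(
    baseline_pass: int,
    baseline_total: int,
    current_pass: int,
    current_total: int,
    min_drop: float = 0.15,     # assumed to mean percentage points
    min_baseline: int = 20,     # minimum baseline activations
) -> bool:
    """Return True when a skill's weekly pass rate dropped enough to report."""
    # Too little baseline data, or nothing to compare against this week.
    if baseline_total < min_baseline or current_total == 0:
        return False
    baseline_rate = baseline_pass / baseline_total
    current_rate = current_pass / current_total
    return baseline_rate - current_rate >= min_drop


print(should_alert(45, 50, 28, 40))  # 0.90 -> 0.70: True
print(should_alert(45, 50, 42, 50))  # 0.90 -> 0.84: False
print(should_alert(18, 19, 5, 19))   # baseline under 20 activations: False
```

Under a relative-drop reading the comparison would be (baseline_rate - current_rate) / baseline_rate instead, which triggers more easily for skills with a low baseline.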
- GET /api/v1/registry/skills/{id}/activations — daily activation series for the sparkline.
- GET /api/v1/registry/skills/{id}/versions/{version_number} — body markdown for any published version (powers the public diff viewer).
- GET /api/v1/registry/skills/{id}/lineage already existed; now surfaced on the public detail page as “Popular community forks”.
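
A small helper for building these registry URLs might look like the sketch below. The host comes from the rolling-release note at the top of this page; the https scheme, the helper name, and the skill id "sk_123" are assumptions, and no response shape is implied.

```python
from urllib.parse import quote

BASE_URL = "https://api.decimal.ai"  # host from these notes; scheme assumed


def registry_url(skill_id: str, *parts: str) -> str:
    """Build a /api/v1/registry/skills/{id}/... URL with escaped segments."""
    segments = [skill_id, *parts]
    path = "/".join(quote(str(p), safe="") for p in segments)
    return f"{BASE_URL}/api/v1/registry/skills/{path}"


# "sk_123" is a made-up id for illustration.
print(registry_url("sk_123", "activations"))
print(registry_url("sk_123", "versions", "2"))
print(registry_url("sk_123", "lineage"))
```

Escaping each segment separately keeps an id that contains a slash from silently changing the path.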
Platform
- ✨ Skills lifecycle is generally available: create, version, fork, subscribe, publish to registry, analytics.
- ✨ Public skills registry (/skills) with SkillScore effectiveness ranking (Quality / Popularity / Maintenance).
- ✨ Prompt Testing playground promoted from internal tool to first-class feature (/playground), with BYOK support for OpenAI and Gemini.
- ✨ Multi-agent topology graph + per-sub-agent compatibility dashboard.
- ✨ Workspace CRUD + RBAC role model in place (enforcement landing in next release; see rollout plan D-4).
- 🔧 Manifest registration is idempotent by hash — repeated POST /manifests returns existing IDs.
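
Hash-based idempotency of this kind is commonly implemented by keying stored records on a digest of a canonical serialization. The sketch below shows the idea with an in-memory store; the class, the SHA-256 choice, and the ID format are all assumptions rather than the platform's implementation.

```python
import hashlib
import json


class ManifestStore:
    """Hypothetical in-memory store illustrating idempotent registration."""

    def __init__(self):
        self._ids_by_hash = {}

    def register(self, manifest: dict) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that key order
        # does not change the digest.
        canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest not in self._ids_by_hash:
            # Made-up ID scheme for illustration.
            self._ids_by_hash[digest] = f"manifest-{len(self._ids_by_hash) + 1}"
        return self._ids_by_hash[digest]


store = ManifestStore()
first = store.register({"model": "model-a", "tools": ["search"]})
again = store.register({"tools": ["search"], "model": "model-a"})
other = store.register({"model": "model-b", "tools": ["search"]})
print(first == again, first == other)  # True False
```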
SDK (decimalai-python)
- ✨ decimalai.init(langchain=True | openai_agents=True | llamaindex=True | crewai=True | autogen=True | otel=True) covers 6+ frameworks.
- ✨ Skill auto-discovery from .claude/skills/, .agents/skills/.
- ✨ Bidirectional skill sync (POST /skills/sync + SkillRouter.pull_missing()).
- ✨ @decimalai.trace() decorator for any Python function.
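
To show what a function-level tracing decorator generally does, here is a generic stand-in written with the standard library only. It is not the SDK's @decimalai.trace() implementation; the span fields and the recorded_spans list are hypothetical.

```python
import functools
import time

recorded_spans = []  # stand-in for wherever a real SDK would buffer traces


def trace(name=None):
    """Generic tracing decorator sketch; not the decimalai implementation."""

    def decorator(func):
        span_name = name or func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise  # tracing must not swallow the caller's exception
            finally:
                recorded_spans.append({
                    "name": span_name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })

        return wrapper

    return decorator


@trace()
def add(a, b):
    return a + b


print(add(2, 3))                  # 5
print(recorded_spans[-1]["name"])  # add
```

Recording in the finally block means a span is captured for failing calls as well as successful ones.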
Action (decimal-labs/regression-check)
- ✨ Initial release. Computes structural diff between PR manifest and production manifest; posts impact report as a PR comment.
- ✨ manifest_only SDK mode for CI: runs manifest extraction without invoking the agent.
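
A structural diff between two manifests can be pictured as a key-level comparison of two mappings. The sketch below assumes a flat dict manifest with made-up fields; the Action's real manifest schema and diff algorithm are not described in these notes.

```python
def structural_diff(production: dict, pull_request: dict) -> dict:
    """Compare two flat manifests and list added, removed, and changed keys."""
    prod_keys, pr_keys = set(production), set(pull_request)
    return {
        "added": sorted(pr_keys - prod_keys),
        "removed": sorted(prod_keys - pr_keys),
        "changed": sorted(
            key for key in prod_keys & pr_keys
            if production[key] != pull_request[key]
        ),
    }


# Hypothetical manifests for illustration.
production = {"model": "model-a", "tools": ["search"], "temperature": 0.2}
pull_request = {"model": "model-b", "tools": ["search"], "max_tokens": 512}

print(structural_diff(production, pull_request))
# {'added': ['max_tokens'], 'removed': ['temperature'], 'changed': ['model']}
```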
Platform
- ✨ Hero workflow: manifest change → batch compatibility re-score → Impact Report banner → Auto-Repair + Build Dataset stepper → JSONL export.
- ✨ Training Data Health dashboard at / (health ring, category bars).
- ✨ Drift detection toast + sidebar compat badges.
- ✨ First public version. Manifest capture, trace ingest, framework adapters.