How fast?
~250–350ms warm. Cache hit (multi-turn) is free. Adds ~10–25% to total turn latency.
What if it fails?
Every method degrades to an empty fragment + a warning log. Your agent never breaks on a Router hiccup.
What's the payoff?
Per-skill, per-model effectiveness scoring in the dashboard. The data is the moat.
Before / after
Integrate in 3 steps
Install the framework adapter
Pick your framework. Each adapter is a one-line install with the same flag:
- LangChain
- OpenAI Agents
- Pydantic AI
- Anthropic SDK
- No framework
Skills are injected into every BaseChatModel.invoke() / ainvoke() as a SystemMessage. Works with LangChain agents, LangGraph nodes, and bare LLM calls.
Run your agent: telemetry flows automatically
Make a normal call. Open the DecimalAI dashboard. You’ll see:
- Which skills the Router offered for each user turn
- Which ones the LLM activated
- Per-skill pass rate, per-(skill, model) effectiveness, trend over time
A routing_id is attached to every trace so the join can close server-side.
How it works (high level)
The single load-bearing detail: the same routing_id propagates from the Router through every LLM call in a turn to the trace. That join is what produces per-skill effectiveness scoring.
Routing strategies
| Strategy | Method | When to use |
|---|---|---|
| Full menu | router.get_menu() | Registry under ~20 skills. Load all names + descriptions; let the LLM choose. |
| Smart route | router.smart_route(query=...) | Larger registry. Server picks the top-K by hybrid retrieval and historical effectiveness. |
| On-demand body | router.get_skill_body(name) | Two-pass agents. Surface menu first; load body only after the LLM commits. |
Response shape
Both get_menu() and smart_route() return the same envelope:
prompt_fragment is a ready-to-splice markdown block. routing_id is the join key you (or your adapter) attach to the resulting trace.
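A minimal sketch of consuming the envelope, assuming only the two fields named above. The `RoutingEnvelope` class and the `splice` and `trace_metadata` helpers are hypothetical names for illustration, not part of the SDK; the real envelope may carry more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingEnvelope:
    """Hypothetical stand-in; only these two fields are named in the text."""
    prompt_fragment: str        # ready-to-splice markdown block
    routing_id: Optional[str]   # join key for the resulting trace


def splice(base_instructions: str, envelope: RoutingEnvelope) -> str:
    """Append the fragment to the system prompt; an empty fragment is a no-op."""
    if not envelope.prompt_fragment:
        return base_instructions
    return f"{base_instructions}\n\n{envelope.prompt_fragment}"


def trace_metadata(envelope: RoutingEnvelope) -> dict:
    """Metadata to attach to the trace so the offered-vs-activated join can close."""
    return {"routing_id": envelope.routing_id} if envelope.routing_id else {}
```

Framework adapters are described as doing this for you; the sketch only matters if you wire the Router in by hand.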
Policy controls
| Knob | Effect |
|---|---|
| strategy="auto" | Smart route when query present, full menu otherwise |
| max_menu_size | Hard cap on returned skills, regardless of registry size |
| category | Restrict to a single registry category |
| include_attachments=False | Strip bundled scripts to shrink the payload |
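The knobs above can be modelled as a small pure function. This is a sketch of the described behavior, not the server's code: the strategy labels `"smart_route"` and `"full_menu"`, the skill dict shape, and the order of operations (category filter first, then the cap) are all assumptions.

```python
from typing import Optional


def resolve_strategy(strategy: str, query: Optional[str]) -> str:
    """Model of strategy="auto": smart route when a query is present, else full menu."""
    if strategy != "auto":
        return strategy
    return "smart_route" if query else "full_menu"


def apply_policy(skills, max_menu_size=None, category=None):
    """Restrict to one category (if given), then enforce the hard cap (if given)."""
    if category is not None:
        skills = [s for s in skills if s["category"] == category]
    if max_menu_size is not None:
        skills = skills[:max_menu_size]
    return skills
```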
Plan limits
The Router itself is Free. What scales with your plan is skill count and a few publishing / analytics capabilities:
| | Free | Core | Pro | Enterprise |
|---|---|---|---|---|
| Skills in your org | 10 | 50 | 250 | Unlimited |
| smart_route | ✓ | ✓ | ✓ | ✓ |
| Publish to registry | — | ✓ | ✓ | ✓ |
| analytics/compare + leaderboard | — | ✓ | ✓ | ✓ |
| Per-agent bundle templates | — | — | ✓ | ✓ |
| Private registry / BYO embedder | — | — | — | ✓ |
smart_route is Free because its value scales with skill count — a user with 8 skills doesn’t benefit; a user with 80 does, and at 80 they’ve already left the Free tier.
Smart-route internals: how skills are picked
The server runs a hybrid retrieval pipeline against your registry:
- Embed the query with the production embedding model (currently text-embedding-004).
- Hybrid retrieve: combine semantic similarity (vector cosine on description_embedding) with lexical match (PostgreSQL full-text search on search_doc). Fuse the two rankings with Reciprocal Rank Fusion (RRF), which has no per-retriever weights to tune.
- Re-rank by effectiveness: blend retrieval score with each skill’s historical pass rate. Skills with proven track records bubble up; new skills aren’t penalised until they have enough activations to be judged.
- Persist a routing_decision row so the offered-vs-activated join can close.
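The fusion step can be sketched as follows. This is a generic Reciprocal Rank Fusion implementation, not the server's code: the function name, the skill names, and the smoothing constant `k=60` (the value conventionally used for RRF) are assumptions, and the value the server uses is not stated here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank), ranks start at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, skill in enumerate(ranking, start=1):
            scores[skill] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


semantic = ["sql-review", "pdf-extract", "unit-tests"]  # vector cosine order
lexical = ["pdf-extract", "changelog", "sql-review"]    # full-text search order
fused = rrf_fuse([semantic, lexical])
```

Because RRF uses only ranks, the cosine and full-text scores never need to be put on a common scale, which is why no per-retriever weight is involved.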
Caching: why multi-step agent loops still produce one routing decision
The Router caches build_prompt_fragment() results in-process for 30 seconds by default. This matters because a single user turn often produces multiple LLM calls (tool-using agent loops, retries, sub-steps). Without caching, each call would re-route the same query, producing duplicate routing_decision rows and wasted embedding calls.
With caching, each turn:
- Hits the platform once (on the first LLM call)
- Reuses the cached fragment and the same routing_id for all subsequent calls
- Produces exactly one routing_decision row for the entire turn
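The caching behavior can be illustrated with a small in-process TTL cache. This is a sketch, not the SDK's implementation: the `FragmentCache` class, keying on the raw query string, and the injectable `fetch` and `clock` arguments are assumptions made so the example is self-contained.

```python
import time
import uuid


class FragmentCache:
    """In-process TTL cache keyed by query (hypothetical; not the SDK's class)."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch      # callable(query) -> (fragment, routing_id)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # query -> (expires_at, fragment, routing_id)

    def build_prompt_fragment(self, query):
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and hit[0] > now:
            return hit[1], hit[2]  # same fragment, same routing_id
        fragment, routing_id = self._fetch(query)
        self._entries[query] = (now + self._ttl, fragment, routing_id)
        return fragment, routing_id


calls = []


def fake_platform(query):
    """Stand-in for the network call; records how often it is hit."""
    calls.append(query)
    return f"## Skills for: {query}", str(uuid.uuid4())


cache = FragmentCache(fake_platform)
first = cache.build_prompt_fragment("summarise this PDF")
second = cache.build_prompt_fragment("summarise this PDF")
```

Both calls return the same `(fragment, routing_id)` pair and the stand-in platform is hit once, which is the property that keeps a multi-step turn down to a single routing decision.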
Routing telemetry — what gets logged
Every routing call writes a record. Joined with traces, it produces:
| Metric | Where it surfaces |
|---|---|
| Activations per skill | Registry detail page |
| Effectiveness per skill | Registry detail page |
| Per-(skill, model) effectiveness | Registry detail page |
| Offered-but-not-activated rate | Skill detail page |
| Trend (improving / stable / degrading) | Registry browse + alerts |
Per-model effectiveness is the single hardest signal for competitors to replicate: they don’t have your routing telemetry, so they can’t compute it.
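To make the join concrete, here is a sketch that computes two of these metrics from toy data. The row shapes (`offered`, `activated`, `passed`, `model`) and the one-trace-per-turn simplification are assumptions for illustration; the real schema is not shown on this page.

```python
from collections import defaultdict

# Hypothetical rows; field names are illustrative only.
routing_decisions = [
    {"routing_id": "r1", "offered": ["sql-review", "pdf-extract"]},
    {"routing_id": "r2", "offered": ["sql-review"]},
]
traces = [
    {"routing_id": "r1", "model": "model-a", "activated": ["sql-review"], "passed": True},
    {"routing_id": "r2", "model": "model-b", "activated": ["sql-review"], "passed": False},
]


def effectiveness(decisions, traces):
    """Join on routing_id; return (pass rate per (skill, model), unused rate per skill)."""
    by_id = {t["routing_id"]: t for t in traces}
    passes = defaultdict(lambda: [0, 0])  # (skill, model) -> [passed, activations]
    offers = defaultdict(lambda: [0, 0])  # skill -> [not_activated, offered]
    for d in decisions:
        trace = by_id.get(d["routing_id"])
        if trace is None:
            continue  # no matching trace: the join cannot close
        for skill in d["offered"]:
            offers[skill][1] += 1
            if skill in trace["activated"]:
                key = (skill, trace["model"])
                passes[key][1] += 1
                passes[key][0] += int(trace["passed"])
            else:
                offers[skill][0] += 1
    pass_rate = {k: p / n for k, (p, n) in passes.items()}
    unused_rate = {s: u / n for s, (u, n) in offers.items()}
    return pass_rate, unused_rate


pass_rate, unused_rate = effectiveness(routing_decisions, traces)
```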
Graceful degradation — what happens when something fails
| Failure | Behavior |
|---|---|
| GEMINI_API_KEY unset on server | Falls back to full menu: every active skill, name + description only |
| Hybrid SQL errors (e.g., right after schema migration) | Same fallback; warns once |
| Platform unreachable from SDK | Returns ("", None) from build_prompt_fragment; agent runs with base instructions only |
| Persisted routing_decision write fails | Synthetic routing_id returned; downstream join just won’t find a row |
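The client-side row ("Platform unreachable from SDK") follows a common pattern: catch the failure, log a warning, and return an empty result. The sketch below shows that pattern under stated assumptions; `safe_build_prompt_fragment`, the `fetch` callable, and the logger name are hypothetical, and only the `("", None)` return shape comes from the table.

```python
import logging

logger = logging.getLogger("router_sketch")


def safe_build_prompt_fragment(fetch, query):
    """Return (fragment, routing_id); on any failure, warn and return ("", None)."""
    try:
        return fetch(query)
    except Exception as exc:  # deliberately broad: routing must never break the agent
        logger.warning("skill routing failed, continuing without skills: %s", exc)
        return "", None


def unreachable(query):
    """Stand-in for a platform call that cannot connect."""
    raise ConnectionError("platform unreachable")


fragment, routing_id = safe_build_prompt_fragment(unreachable, "summarise this PDF")
# With an empty fragment the agent runs on its base instructions only.
system_prompt = "You are a helpful assistant." + (f"\n\n{fragment}" if fragment else "")
```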
Inspecting a routing decision directly
What the Router does NOT do
- Execute skills. Skills are markdown — they shape the LLM’s behavior. For code execution, see how skills differ from tools.
- Decide which model to use. Model routing is separate. The Router operates within your chosen model.
- Cache bodies across processes. The in-process cache is per-process. For shared caching, put a service in front.
Router vs disk auto-loading (pick one)
Some agent runtimes (Claude Code, Cursor) auto-discover SKILL.md
files from .claude/skills/ and inject them into the system prompt
themselves. The Router also injects skills into the system prompt — from
the platform. Running both at once means the same skill ends up in
the prompt twice, which confuses the LLM and inflates token cost.
The fix is to pick one source of skill injection per agent process:
| Your setup | What to use | Why |
|---|---|---|
| Python app (LangChain / OpenAI Agents / Pydantic AI) — no IDE runtime | Router with enable_skill_loader=True | The framework doesn’t auto-load from disk; Router is the only injector. No duplication. |
| Claude Code / Cursor — IDE-managed agent | Disk auto-loading, no Router | The runtime injects from .claude/skills/; adding the Router would duplicate. |
| Python app wrapping Anthropic SDK that also runs inside Claude Code | Pick one: either skip enable_skill_loader=True OR pass disk_sync=False and remove local SKILL.md files | Both runtimes are active; only one should inject. |
| Authoring + editing locally, deploying as a Python service | Edit SKILL.md files locally; SDK syncs them to platform; Router loads from platform at runtime | Disk is the editing surface, platform is the runtime surface, Router is the bridge. |
The SDK detects these runtimes from environment variables (CLAUDECODE, CLAUDE_CODE_ENTRYPOINT, CURSOR_AGENT) and logs a
one-shot warning when enable_skill_loader=True fires inside one of
them. Silence the warning intentionally with
DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING=1 if you’ve consciously
chosen the setup (e.g., a benchmark).
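A sketch of what such a check could look like, assuming the variables are treated as set whenever they are non-empty. The function name, the warning text, and the module-level once-only flag are hypothetical; only the four variable names come from the text above.

```python
import logging
import os

logger = logging.getLogger("router_sketch")
DISK_RUNTIME_VARS = ("CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT", "CURSOR_AGENT")
_warned = False  # makes the warning one-shot per process


def warn_if_disk_runtime(env=None):
    """Return the detected variable names; warn at most once unless suppressed."""
    global _warned
    env = os.environ if env is None else env
    detected = [name for name in DISK_RUNTIME_VARS if env.get(name)]
    suppressed = env.get("DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING") == "1"
    if detected and not suppressed and not _warned:
        _warned = True
        logger.warning(
            "enable_skill_loader=True inside a disk-loading runtime (%s); "
            "skills may be injected twice",
            ", ".join(detected),
        )
    return detected
```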
disk_sync=False — Router is the only source
When you’re running a Python stack that should rely solely on the
platform for skill content, pass disk_sync=False to the framework
adapter:
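A sketch of the call for the LangChain adapter. Both keyword names appear on this page, but passing them together in a single install() call is an assumption about the signature, and the wrapper function is hypothetical.

```python
def install_router_only():
    """Install the LangChain adapter with the Router as the sole skill source."""
    # Imported lazily so this sketch loads even where decimalai is not installed.
    import decimalai.langchain

    # Assumption: install() accepts both flags as keyword arguments.
    decimalai.langchain.install(enable_skill_loader=True, disk_sync=False)
```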
Setting disk_sync=False disables:
- Local SKILL.md auto-discovery (no disk read)
- Push of local skills to the platform (no sync_skills)
- Pull of platform-only skills to disk (no pull_missing)
The Router still calls /api/v1/skills/route per turn;
that’s the runtime injector. Everything happens over the network, with
nothing on disk.
Available on decimalai.openai_agents.install() and
decimalai.langchain.install(). The pydantic_ai and anthropic
adapters never touched disk, so they don’t need the flag.
Related
- Skills API endpoints: raw REST surface
- SkillRouter Python class: full SDK reference with CRUD, sync, export-to-disk
- Skills & Data Pipeline: conceptual model of skills vs tools vs prompts
- Registry API: browse and install public skills
- Skills Observability tutorial: measure skill impact with experiments