Skill Router - DecimalAI

Most agents ship with a hardcoded list of tools or skills. As your registry grows past a handful, the prompt bloats, the LLM gets confused, and you have no signal about which skills actually work. The Skill Router fixes both. It picks which skills to surface on every user turn based on the query, and it logs every decision so DecimalAI can compute which skills are pulling their weight — automatically.

How fast?

~250–350 ms warm. A cache hit (repeat LLM calls within the same turn) is free. Adds ~10–25% to total turn latency.

What if it fails?

Every method degrades to an empty fragment + a warning log. Your agent never breaks on a Router hiccup.

What's the payoff?

Per-skill, per-model effectiveness scoring in the dashboard. The data is the moat.

Before / after

from langchain.agents import create_react_agent

# You decide upfront which tools the LLM sees.
# Every turn includes all of them, even when irrelevant.
# llm, system_prompt and the tool objects are defined elsewhere.
agent = create_react_agent(
    llm,
    tools=[code_review, security_audit, python_linter,
           refund_handler],  # ...20 more...
    prompt=system_prompt,
)

The shift: you stop choosing skills at agent-creation time and start choosing them at prompt-assembly time, based on what the user actually asked.

Integrate in 3 steps

Initialize the SDK

import decimalai
decimalai.init()  # reads DECIMAL_API_KEY from env

Install the framework adapter

Pick your framework. Each adapter is a one-line install with the same flag. If you don't use a framework, call the Router directly instead.

LangChain

from decimalai.langchain import install
install(enable_skill_loader=True)

Skills are injected into every BaseChatModel.invoke() / ainvoke() call as a SystemMessage. This works with LangChain agents, LangGraph nodes, and bare LLM calls.

OpenAI Agents

from decimalai.openai_agents import install
install(enable_skill_loader=True)

Agent.__init__ is wrapped so that any string passed as instructions becomes dynamic: the Router runs on every Runner.run() call.

Pydantic AI

from decimalai.pydantic_ai import install
install(enable_skill_loader=True)

A @agent.system_prompt function is auto-registered on every new Agent.

Anthropic SDK

from decimalai.anthropic import install
install(enable_skill_loader=True)

client.messages.create(system=...) is augmented on every call. Your own system content is preserved and placed after the skill fragment.

No framework

from decimalai.skill_router import SkillRouter

router = SkillRouter(api_key="dai_sk_...")

# base_prompt and user_message come from your application.
# fragment is "" if the platform is unreachable.
fragment, routing_id = router.build_prompt_fragment(
    query=user_message,
    agent_name="my-agent",
)
system_prompt = f"{base_prompt}\n\n{fragment}"

The routing_id is what closes the offered-vs-activated loop. If you build your own trace ingest, stamp it on the trace.

Run your agent — telemetry flows automatically

Make a normal call. Open the DecimalAI dashboard. You’ll see:

Which skills the Router offered for each user turn
Which ones the LLM activated
Per-skill pass rate, per-(skill, model) effectiveness, trend over time

No additional instrumentation required. The framework adapter you installed in step 2 stamps the routing_id on every trace so the join can close server-side.

How it works (high level)

The single load-bearing detail: the same routing_id propagates from the Router through every LLM call in a turn to the trace. That join is what produces per-skill effectiveness scoring.
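The following minimal Python sketch shows that join in concrete terms. The record shapes (RoutingDecision, Trace) and their fields are hypothetical and do not come from the DecimalAI schema. The sketch also assumes one trace per user turn. It only illustrates why a shared routing_id is enough to compute the offered-but-not-activated rate and per-(skill, model) pass rates.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record shapes, for illustration only.
@dataclass
class RoutingDecision:
    routing_id: str
    offered: list[str]      # skills the Router put in the prompt


@dataclass
class Trace:
    routing_id: str         # stamped by the adapter (or by you)
    model: str
    activated: list[str]    # skills the LLM actually used
    passed: bool            # outcome of the turn


def skill_stats(decisions: list[RoutingDecision], traces: list[Trace]):
    """Join traces to routing decisions on routing_id and aggregate per skill.

    Assumes one trace per user turn (one routing_id per trace).
    """
    by_id = {d.routing_id: d for d in decisions}
    offered = defaultdict(int)
    activated = defaultdict(int)
    per_model = defaultdict(lambda: [0, 0])  # (skill, model) -> [passes, activations]

    for d in decisions:
        for skill in set(d.offered):
            offered[skill] += 1

    for t in traces:
        decision = by_id.get(t.routing_id)
        if decision is None:
            continue  # no routing row for this id: the join cannot close
        # Only count activations of skills the Router actually offered.
        for skill in set(t.activated) & set(decision.offered):
            activated[skill] += 1
            per_model[(skill, t.model)][0] += int(t.passed)
            per_model[(skill, t.model)][1] += 1

    not_activated_rate = {s: 1 - activated[s] / n for s, n in offered.items()}
    pass_rate = {key: p / n for key, (p, n) in per_model.items()}
    return not_activated_rate, pass_rate
```

A trace whose routing_id has no matching decision row is simply skipped. This is why a failed routing_decision write only loses that turn's data.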

Routing strategies

Strategy	Method	When to use
Full menu	`router.get_menu()`	Registry under ~20 skills. Load all names + descriptions; let the LLM choose.
Smart route	`router.smart_route(query=...)`	Larger registry. Server picks the top-K by hybrid retrieval and historical effectiveness.
On-demand body	`router.get_skill_body(name)`	Two-pass agents. Surface menu first; load body only after the LLM commits.

The framework adapters default to smart route when a query is detectable in context, full menu otherwise.
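This sketch shows the two-pass pattern from the last row of the table. It makes two assumptions. First, get_menu() returns the envelope described under Response shape. Second, get_skill_body(name) returns the skill's markdown as a string. FakeRouter and choose_skill are hypothetical stand-ins: FakeRouter replaces the real SkillRouter, and choose_skill replaces the LLM's first pass.

```python
from typing import Callable, Optional, Protocol


class MenuRouter(Protocol):
    # Method names come from the table; the return types are assumptions.
    def get_menu(self) -> dict: ...
    def get_skill_body(self, name: str) -> str: ...


def two_pass_prompt(
    router: MenuRouter,
    base_prompt: str,
    choose_skill: Callable[[str], Optional[str]],
) -> str:
    """Pass 1 shows the menu; pass 2 loads only the body the LLM committed to."""
    menu = router.get_menu()
    # choose_skill stands in for the first LLM call, which reads the menu fragment.
    picked = choose_skill(menu["prompt_fragment"])
    offered = {s["name"] for s in menu["skills"]}
    if picked is None or picked not in offered:
        return base_prompt  # no valid commitment: load no skill body
    return f"{base_prompt}\n\n{router.get_skill_body(picked)}"


class FakeRouter:
    """Hypothetical in-memory stand-in for SkillRouter, for local testing."""

    def get_menu(self) -> dict:
        return {
            "skills": [{"name": "refund-policy",
                        "description": "Process refunds for settled payments."}],
            "prompt_fragment": "## Recommended Skills\n| refund-policy | ... |",
        }

    def get_skill_body(self, name: str) -> str:
        return f"# {name} body"
```

The design point is that bodies, which are the expensive part of the prompt, are fetched only for the skill the LLM actually names.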

Response shape

Both get_menu() and smart_route() return the same envelope:

{
  "skills": [
    {
      "name": "refund-policy",
      "description": "Process refunds for settled payments.",
      "category": "support",
      "score": 0.91,
      "relevance": 0.84,
      "performance": 0.87,
      "version": 4
    }
  ],
  "prompt_fragment": "## Recommended Skills\n| refund-policy | ... |",
  "strategy": "smart_routing",
  "routing_id": "rt_a1b2c3d4..."
}

prompt_fragment is a ready-to-splice markdown block. routing_id is the join key you (or your adapter) attach to the resulting trace.
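Here is a minimal sketch of consuming that envelope. It assumes the envelope has already been parsed into a Python dict. apply_envelope is a hypothetical helper and is not part of the SDK. It splices prompt_fragment into the system prompt and keeps routing_id for the trace. When no fragment comes back, it skips the splice.

```python
from typing import Optional

# The example envelope from above; values are illustrative.
envelope = {
    "skills": [
        {"name": "refund-policy",
         "description": "Process refunds for settled payments.",
         "category": "support", "score": 0.91, "relevance": 0.84,
         "performance": 0.87, "version": 4},
    ],
    "prompt_fragment": "## Recommended Skills\n| refund-policy | ... |",
    "strategy": "smart_routing",
    "routing_id": "rt_a1b2c3d4...",
}


def apply_envelope(envelope: dict, base_prompt: str) -> tuple[str, Optional[str]]:
    """Splice the ready-made fragment into the prompt and return the join key."""
    fragment = envelope.get("prompt_fragment", "")
    prompt = f"{base_prompt}\n\n{fragment}" if fragment else base_prompt
    return prompt, envelope.get("routing_id")


prompt, routing_id = apply_envelope(envelope, "You are a support agent.")
offered = [s["name"] for s in envelope["skills"]]
```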

Policy controls

router = SkillRouter(
    api_key="dai_sk_...",
    agent_name="support-agent",
    strategy="auto",       # "auto" | "menu" | "semantic"
    max_menu_size=20,
)

routed = router.smart_route(
    query="...",
    top_k=5,
    category="support",     # restrict to a category
    include_attachments=True,
)

Knob	Effect
`strategy="auto"`	Smart route when query present, full menu otherwise
`max_menu_size`	Hard cap on returned skills, regardless of registry size
`category`	Restrict to a single registry category
`include_attachments=False`	Strip bundled scripts to shrink the payload
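The rules in the table can be modelled client-side as a small sketch. resolve_strategy and apply_caps are hypothetical helpers written for illustration, and the server's actual logic is not documented here. What "semantic" does when there is no query is also undocumented, so this sketch simply passes it through.

```python
from typing import Optional


def resolve_strategy(strategy: str, query: Optional[str]) -> str:
    """Model of strategy="auto": smart route only when a query is present."""
    if strategy in ("menu", "semantic"):
        return strategy
    if strategy == "auto":
        return "semantic" if query and query.strip() else "menu"
    raise ValueError(f"unknown strategy: {strategy!r}")


def apply_caps(skills: list[dict], max_menu_size: int,
               category: Optional[str] = None) -> list[dict]:
    """Restrict to one category (if given), then hard-cap the list length."""
    if category is not None:
        skills = [s for s in skills if s.get("category") == category]
    return skills[:max_menu_size]
```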

Plan limits

The Router itself is Free. What scales with your plan is skill count and a few publishing / analytics capabilities:

	Free	Core	Pro	Enterprise
Skills in your org	10	50	250	Unlimited
`smart_route`	✓	✓	✓	✓
Publish to registry	—	✓	✓	✓
`analytics/compare` + `leaderboard`	—	✓	✓	✓
Per-agent bundle templates	—	—	✓	✓
Private registry / BYO embedder	—	—	—	✓

smart_route is Free because its value scales with skill count — a user with 8 skills doesn’t benefit; a user with 80 does, and at 80 they’ve already left the Free tier.

Smart-route internals: how skills are picked

The server runs a hybrid retrieval pipeline against your registry:

Embed the query with the production embedding model (currently text-embedding-004).
Hybrid retrieve: combine semantic similarity (vector cosine on description_embedding) with lexical match (PostgreSQL full-text search on search_doc), then fuse the two rankings with Reciprocal Rank Fusion (RRF). RRF works on ranks rather than raw scores, so there is no relative weight between the two retrievers to tune.
Re-rank by effectiveness: blend the retrieval score with each skill’s historical pass rate. Skills with proven track records bubble up; new skills aren’t penalised until they have enough activations to be judged.
Persist a routing_decision row so the offered-vs-activated join can close.

The hybrid step is why a query like “snake-language formatter” finds Python skills (semantic match) and a query like “python” finds them too (lexical match). Either retriever on its own tends to miss one of the two cases.
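The fuse and re-rank steps are sketched below for illustration only. rrf() is standard Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank), and k = 60 is a commonly used default. rerank() uses a hypothetical linear blend with an assumed activation threshold and a neutral prior for unproven skills. DecimalAI's actual blend and thresholds are not documented here.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] += 1.0 / (k + rank)
    return dict(scores)


def rerank(
    fused: dict[str, float],
    pass_rate: dict[str, float],
    activations: dict[str, int],
    alpha: float = 0.7,         # assumed weight on retrieval
    min_activations: int = 20,  # assumed threshold before pass rate counts
    prior: float = 0.5,         # assumed neutral score for unproven skills
) -> list[str]:
    """Blend normalised retrieval score with historical pass rate.

    Skills below min_activations get the neutral prior instead of their
    noisy pass rate, so new skills are not penalised early.
    """
    top = max(fused.values(), default=0.0) or 1.0

    def blended(name: str) -> float:
        proven = activations.get(name, 0) >= min_activations
        perf = pass_rate.get(name, prior) if proven else prior
        return alpha * (fused[name] / top) + (1 - alpha) * perf

    return sorted(fused, key=blended, reverse=True)


semantic = ["python-linter", "code-review", "security-audit"]  # vector ranking
lexical = ["python-linter", "refund-policy"]                    # full-text ranking
fused = rrf([semantic, lexical])
ranked = rerank(fused,
                pass_rate={"code-review": 0.95, "refund-policy": 0.2},
                activations={"code-review": 100, "refund-policy": 100})
```

In this toy run, python-linter wins because both retrievers rank it first. security-audit, which has no history, gets the neutral prior and lands ahead of refund-policy, which has a proven low pass rate.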

Caching: why multi-step agent loops still produce one routing decision

The Router caches build_prompt_fragment() results in-process for 30 seconds by default. This matters because a single user turn often produces multiple LLM calls (tool-using agent loops, retries, sub-steps). Without caching, each call would re-route the same query, producing duplicate routing_decision rows and wasted embedding calls. With caching, the LLM calls within a turn:

Hit the platform only once (on the first call)
Reuse the cached fragment and the same routing_id for every subsequent call
Produce exactly one routing_decision row for the entire turn

router = SkillRouter(
    api_key="dai_sk_...",
    fragment_cache_ttl=30.0,     # shorten for very dynamic registries
    fragment_cache_size=64,
)

# Bypass when you want a fresh routing decision (replay / regression):
fragment, routing_id = router.build_prompt_fragment(
    query="...",
    bypass_cache=True,
)
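The sketch below shows why every call in a turn gets the same routing_id, using an in-process cache with a TTL and a size cap. FragmentCache is a hypothetical class, not the SDK's implementation. It assumes the cache key is (agent_name, query) and that the route callable returns the (fragment, routing_id) pair from the platform. Its ttl and max_size play the role that fragment_cache_ttl and fragment_cache_size appear to play.

```python
import time
from collections import OrderedDict
from typing import Callable


class FragmentCache:
    """Hypothetical TTL + LRU cache for (fragment, routing_id) pairs."""

    def __init__(self, ttl: float = 30.0, max_size: int = 64,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.max_size = max_size
        self.clock = clock
        # (agent_name, query) -> (created_at, fragment, routing_id)
        self._entries: OrderedDict = OrderedDict()

    def get_or_route(self, agent_name: str, query: str,
                     route: Callable[[str], tuple[str, str]],
                     bypass: bool = False) -> tuple[str, str]:
        key = (agent_name, query)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and not bypass and now - hit[0] < self.ttl:
            self._entries.move_to_end(key)  # mark as most recently used
            return hit[1], hit[2]
        fragment, routing_id = route(query)  # only a miss reaches the platform
        self._entries[key] = (now, fragment, routing_id)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return fragment, routing_id
```

Within one turn, the first call misses and every later call returns the cached pair. That is what collapses a multi-step loop into a single routing decision. Passing bypass=True forces a fresh route, matching the bypass_cache use case above.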

Routing telemetry — what gets logged

Every routing call writes a record. Joined with traces, it produces:

Metric	Where it surfaces
Activations per skill	Registry detail page
Effectiveness per skill	Registry detail page
Per-(skill, model) effectiveness	Registry detail page
Offered-but-not-activated rate	Skill detail page
Trend (improving / stable / degrading)	Registry browse + alerts

Per-model effectiveness is the single hardest signal for competitors to replicate — they don’t have your routing telemetry, so they can’t compute it.

Graceful degradation — what happens when something fails

Failure	Behavior
`GEMINI_API_KEY` unset on server	Falls back to full menu — every active skill, name+description only
Hybrid SQL errors (e.g., right after schema migration)	Same fallback; warns once
Platform unreachable from SDK	Returns `("", None)` from `build_prompt_fragment` — agent runs with base instructions only
Persisted `routing_decision` write fails	Synthetic `routing_id` returned; downstream join just won’t find a row

Net effect: a Router failure never breaks your agent run. Worst case, the LLM falls back to whatever instructions it had pre-Router.
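If you assemble prompts yourself (the No framework path), a small wrapper keeps this guarantee visible at the call site. safe_system_prompt is a hypothetical helper. The SDK already returns ("", None) when the platform is unreachable, so the try/except here is only a defensive extra in case your own routing callable raises.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("skill-router")


def safe_system_prompt(
    base_prompt: str,
    build_fragment: Callable[[], tuple[str, Optional[str]]],
) -> tuple[str, Optional[str]]:
    """Return (system_prompt, routing_id); never raise because of the Router."""
    try:
        fragment, routing_id = build_fragment()
    except Exception:  # defensive only: the SDK is documented to degrade, not raise
        log.warning("skill routing failed; using base instructions only", exc_info=True)
        fragment, routing_id = "", None
    if not fragment:
        return base_prompt, routing_id  # run with the pre-Router instructions
    return f"{base_prompt}\n\n{fragment}", routing_id
```

Call it as safe_system_prompt(base_prompt, lambda: router.build_prompt_fragment(query=user_message, agent_name="my-agent")).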

Inspecting a routing decision directly

curl -X POST https://api.decimal.ai/api/v1/skills/route \
  -H "Authorization: Bearer dai_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "user wants to refund a settled payment",
    "agent_name": "support-agent",
    "top_k": 5
  }'

Or in Python:

preview = router.smart_route("user wants to refund...", top_k=5)
print([s["name"] for s in preview["skills"]])
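If you'd rather not pull in the SDK for a one-off check, you can make the same request as the curl example with Python's standard library. This sketch assumes the endpoint responds with the JSON envelope described under Response shape. build_route_request and route are hypothetical helper names.

```python
import json
import urllib.request

ROUTE_URL = "https://api.decimal.ai/api/v1/skills/route"


def build_route_request(api_key: str, query: str, agent_name: str,
                        top_k: int = 5) -> urllib.request.Request:
    """Build the same POST as the curl example."""
    body = json.dumps({"query": query, "agent_name": agent_name,
                       "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        ROUTE_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def route(api_key: str, query: str, agent_name: str, top_k: int = 5) -> dict:
    """Send the request and parse the JSON response (network call)."""
    req = build_route_request(api_key, query, agent_name, top_k)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```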

What the Router does NOT do

Execute skills. Skills are markdown — they shape the LLM’s behavior. For code execution, see how skills differ from tools.
Decide which model to use. Model routing is separate. The Router operates within your chosen model.
Cache bodies across processes. The cache lives inside each process; for caching shared across processes, put a service in front.

Router vs disk auto-loading (pick one)

Some agent runtimes (Claude Code, Cursor) auto-discover SKILL.md files from .claude/skills/ and inject them into the system prompt themselves. The Router also injects skills into the system prompt — from the platform. Running both at once means the same skill ends up in the prompt twice, which confuses the LLM and inflates token cost. The fix is to pick one source of skill injection per agent process:

Your setup	What to use	Why
Python app (LangChain / OpenAI Agents / Pydantic AI) — no IDE runtime	Router with `enable_skill_loader=True`	The framework doesn’t auto-load from disk; Router is the only injector. No duplication.
Claude Code / Cursor — IDE-managed agent	Disk auto-loading, no Router	The runtime injects from `.claude/skills/`; adding the Router would duplicate.
Python app wrapping Anthropic SDK that also runs inside Claude Code	Pick one: either skip `enable_skill_loader=True` OR pass `disk_sync=False` and remove local `SKILL.md` files	Both runtimes are active; only one should inject.
Authoring + editing locally, deploying as a Python service	Edit `SKILL.md` files locally; SDK syncs them to platform; Router loads from platform at runtime	Disk is the editing surface, platform is the runtime surface, Router is the bridge.

The SDK auto-detects known disk runtimes via environment variables (CLAUDECODE, CLAUDE_CODE_ENTRYPOINT, CURSOR_AGENT) and logs a one-shot warning when enable_skill_loader=True fires inside one of them. Silence the warning intentionally with DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING=1 if you’ve consciously chosen the setup (e.g., a benchmark).
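The detection rule can be sketched as follows. The variable names are the ones listed above. Three parts of the sketch are simplifications, not documented SDK behavior: it treats any non-empty value as "set", it requires the suppression variable to equal "1", and it omits the once-per-process bookkeeping.

```python
import os
from typing import Mapping, Optional

DISK_RUNTIME_VARS = ("CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT", "CURSOR_AGENT")
SUPPRESS_VAR = "DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING"


def detect_disk_runtime(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first known disk-runtime variable that is set, if any."""
    for var in DISK_RUNTIME_VARS:
        if env.get(var):
            return var
    return None


def should_warn(enable_skill_loader: bool,
                env: Mapping[str, str] = os.environ) -> bool:
    """Warn only if the Router will inject inside a disk runtime and isn't silenced."""
    if not enable_skill_loader or env.get(SUPPRESS_VAR) == "1":
        return False
    return detect_disk_runtime(env) is not None
```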

`disk_sync=False` — Router is the only source

When you’re running a Python stack that should rely solely on the platform for skill content, pass disk_sync=False to the framework adapter:

from decimalai.openai_agents import install
install(enable_skill_loader=True, disk_sync=False)

This skips:

Local SKILL.md auto-discovery (no disk read)
Push of local skills to the platform (no sync_skills)
Pull of platform-only skills to disk (no pull_missing)

The Router still calls the platform’s /api/v1/skills/route endpoint on every turn; that call is the runtime injector. Everything happens over the network and nothing touches disk. The flag is available on decimalai.openai_agents.install() and decimalai.langchain.install(). The pydantic_ai and anthropic adapters never read from or write to disk, so they don’t need it.

Related

Skills API endpoints — raw REST surface
SkillRouter Python class — full SDK reference with CRUD, sync, export-to-disk
Skills & Data Pipeline — conceptual model: skills vs tools vs prompts
Registry API — browse and install public skills
Skills Observability tutorial — measure skill impact with experiments
