Most agents ship with a hardcoded list of tools or skills. As your registry grows past a handful, the prompt bloats, the LLM gets confused, and you have no signal about which skills actually work. The Skill Router addresses all three. On every user turn it picks which skills to surface based on the query, and it logs every decision so DecimalAI can automatically compute which skills are pulling their weight.

How fast?

~250–350 ms warm. Cache hits (repeat calls for the same query within the cache window) add no Router latency. Overall, the Router adds ~10–25% to total turn latency.

What if it fails?

Every method degrades to an empty fragment + a warning log. Your agent never breaks on a Router hiccup.

What's the payoff?

Per-skill, per-model effectiveness scoring in the dashboard. The data is the moat.

Before / after

from langchain.agents import create_react_agent

# You decide upfront which tools the LLM sees.
# Every turn includes all of them, even when irrelevant.
agent = create_react_agent(
    llm,
    tools=[code_review, security_audit, python_linter,
           refund_handler],  # ...plus 20 more
    prompt=system_prompt,
)
The shift: you stop choosing skills at agent-creation time and start choosing them at prompt-assembly time, based on what the user actually asked.

Integrate in 3 steps

1

Initialize the SDK

import decimalai
decimalai.init()  # reads DECIMAL_API_KEY from env
2

Install the framework adapter

Pick your framework. Each adapter is a one-line install with the same flag:
from decimalai.langchain import install
install(enable_skill_loader=True)
Skills are injected into every BaseChatModel.invoke() / ainvoke() as a SystemMessage. Works with LangChain agents, LangGraph nodes, and bare LLM calls.
3

Run your agent — telemetry flows automatically

Make a normal call. Open the DecimalAI dashboard. You’ll see:
  • Which skills the Router offered for each user turn
  • Which ones the LLM activated
  • Per-skill pass rate, per-(skill, model) effectiveness, trend over time
No additional instrumentation required. The framework adapter you installed in step 2 stamps the routing_id on every trace so the join can close server-side.

How it works (high level)

The single load-bearing detail: the same routing_id propagates from the Router through every LLM call in a turn to the trace. That join is what produces per-skill effectiveness scoring.
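One way to picture this propagation is a context variable that is set once per turn and read by every LLM call when it writes its trace. The sketch below is illustrative only and is not the SDK's implementation; `start_turn`, `call_llm`, and the in-memory `TRACES` list are hypothetical names.

```python
import contextvars
import uuid

# Hypothetical: holds the routing_id for the turn currently being processed.
_routing_id = contextvars.ContextVar("routing_id", default=None)

# Hypothetical in-memory stand-in for a trace backend.
TRACES = []


def start_turn():
    """Simulate one routing decision: mint a single routing_id for the turn."""
    rid = "rt_" + uuid.uuid4().hex[:8]
    _routing_id.set(rid)
    return rid


def call_llm(prompt):
    """Stand-in for an LLM call; stamps the current routing_id on its trace."""
    TRACES.append({"prompt": prompt, "routing_id": _routing_id.get()})
    return "ok"


rid = start_turn()
call_llm("plan the refund")
call_llm("confirm with the user")

# Every trace from this turn carries the same join key.
same_key = all(t["routing_id"] == rid for t in TRACES)
```

Because both traces share one key, a server-side join against the routing decision can attribute the whole turn to the skills that were offered.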

Routing strategies

| Strategy | Method | When to use |
| --- | --- | --- |
| Full menu | router.get_menu() | Registry under ~20 skills. Load all names + descriptions; let the LLM choose. |
| Smart route | router.smart_route(query=...) | Larger registry. Server picks the top-K by hybrid retrieval and historical effectiveness. |
| On-demand body | router.get_skill_body(name) | Two-pass agents. Surface menu first; load body only after the LLM commits. |
The framework adapters default to smart route when a query is detectable in context, full menu otherwise.

Response shape

Both get_menu() and smart_route() return the same envelope:
{
  "skills": [
    {
      "name": "refund-policy",
      "description": "Process refunds for settled payments.",
      "category": "support",
      "score": 0.91,
      "relevance": 0.84,
      "performance": 0.87,
      "version": 4
    }
  ],
  "prompt_fragment": "## Recommended Skills\n| refund-policy | ... |",
  "strategy": "smart_routing",
  "routing_id": "rt_a1b2c3d4..."
}
prompt_fragment is a ready-to-splice markdown block. routing_id is the join key you (or your adapter) attach to the resulting trace.
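If you are splicing the fragment yourself rather than relying on an adapter, the envelope can be typed and consumed roughly as follows. The field types are inferred from the example response above, and `splice` is a hypothetical helper, not an SDK function.

```python
from typing import List, Optional, Tuple, TypedDict


class Skill(TypedDict):
    name: str
    description: str
    category: str
    score: float
    relevance: float
    performance: float
    version: int


class RouteEnvelope(TypedDict):
    skills: List[Skill]
    prompt_fragment: str
    strategy: str
    routing_id: str


def splice(base_prompt: str, envelope: RouteEnvelope) -> Tuple[str, Optional[str]]:
    """Append the fragment to the base prompt; return (prompt, routing_id)."""
    fragment = envelope.get("prompt_fragment", "")
    routing_id = envelope.get("routing_id")
    if not fragment:
        # Nothing to add: keep the base instructions unchanged.
        return base_prompt, routing_id
    return base_prompt + "\n\n" + fragment, routing_id


# Example envelope copied from the response shape above.
envelope: RouteEnvelope = {
    "skills": [{
        "name": "refund-policy",
        "description": "Process refunds for settled payments.",
        "category": "support",
        "score": 0.91,
        "relevance": 0.84,
        "performance": 0.87,
        "version": 4,
    }],
    "prompt_fragment": "## Recommended Skills\n| refund-policy | ... |",
    "strategy": "smart_routing",
    "routing_id": "rt_a1b2c3d4...",
}

prompt, routing_id = splice("You are a support agent.", envelope)
```

Attach the returned `routing_id` to whatever trace record your own instrumentation writes for the turn.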

Policy controls

router = SkillRouter(
    api_key="dai_sk_...",
    agent_name="support-agent",
    strategy="auto",       # "auto" | "menu" | "semantic"
    max_menu_size=20,
)

routed = router.smart_route(
    query="...",
    top_k=5,
    category="support",     # restrict to a category
    include_attachments=True,
)
| Knob | Effect |
| --- | --- |
| strategy="auto" | Smart route when query present, full menu otherwise |
| max_menu_size | Hard cap on returned skills, regardless of registry size |
| category | Restrict to a single registry category |
| include_attachments=False | Strip bundled scripts to shrink the payload |
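The `strategy="auto"` and `max_menu_size` rules can be pictured as two small functions. This is a sketch of the described behavior under simple assumptions (a blank query counts as absent, the cap keeps the first N skills); `choose_strategy` and `cap_menu` are hypothetical names and the Router's real logic may differ.

```python
from typing import List, Optional


def choose_strategy(strategy: str, query: Optional[str]) -> str:
    """Resolve "auto" to a concrete strategy based on whether a query exists."""
    if strategy != "auto":
        return strategy
    # Assumption: a missing or whitespace-only query means "no query".
    return "semantic" if query and query.strip() else "menu"


def cap_menu(skills: List[str], max_menu_size: int) -> List[str]:
    """Apply the hard cap regardless of how many skills the registry holds."""
    return skills[:max(0, max_menu_size)]


with_query = choose_strategy("auto", "refund a settled payment")
without_query = choose_strategy("auto", None)
capped = cap_menu([f"skill-{i}" for i in range(25)], max_menu_size=20)
```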

Plan limits

The Router itself is Free. What scales with your plan is skill count and a few publishing / analytics capabilities:
| | Free | Core | Pro | Enterprise |
| --- | --- | --- | --- | --- |
| Skills in your org | 10 | 50 | 250 | Unlimited |
smart_route
Publish to registry
analytics/compare + leaderboard
Per-agent bundle templates
Private registry / BYO embedder
smart_route is Free because its value scales with skill count — a user with 8 skills doesn’t benefit; a user with 80 does, and at 80 they’ve already left the Free tier.
The server runs a hybrid retrieval pipeline against your registry:
  1. Embed the query with the production embedding model (currently text-embedding-004).
  2. Hybrid retrieve — combine semantic similarity (vector cosine on description_embedding) with lexical match (PostgreSQL full-text search on search_doc). Fuse with Reciprocal Rank Fusion (RRF) — parameter-free, no weight to tune.
  3. Re-rank by effectiveness — blend retrieval score with each skill’s historical pass rate. Skills with proven track records bubble up; new skills aren’t penalised until they have enough activations to be judged.
  4. Persist a routing_decision row so the offered-vs-activated join can close.
The hybrid step is why queries like “snake-language formatter” find Python skills (semantic) and queries like “python” find them too (lexical). Either retrieval mode on its own misses one or the other.
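Steps 2 and 3 above can be sketched in a few lines. The fusion function is standard Reciprocal Rank Fusion, where `k = 60` is the constant conventionally used with RRF and is an assumption here. The re-rank step is one plausible blend, not the server's actual formula: it scales the fused score by pass rate and leaves skills with too few activations unscaled. `min_activations` and the sample numbers are hypothetical.

```python
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranked list."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    return scores


def rerank(fused: Dict[str, float],
           pass_rate: Dict[str, float],
           activations: Dict[str, int],
           min_activations: int = 20) -> List[str]:
    """Scale fused scores by pass rate; unjudged skills keep their raw score."""
    final = {}
    for name, score in fused.items():
        judged = activations.get(name, 0) >= min_activations
        final[name] = score * pass_rate[name] if judged else score
    return sorted(final, key=final.get, reverse=True)


semantic = ["python-linter", "code-review", "refund-policy"]
lexical = ["python-linter", "security-audit", "code-review"]
fused = rrf([semantic, lexical])

ranked = rerank(
    fused,
    pass_rate={"python-linter": 0.5, "code-review": 0.9},
    activations={"python-linter": 100, "code-review": 100, "security-audit": 3},
)
```

In this example `python-linter` tops both retrieval lists, but its weak pass rate lets `code-review` overtake it after re-ranking, while `security-audit` is left unscaled because it has only three activations.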
The Router caches build_prompt_fragment() results in-process for 30 seconds by default. This matters because a single user turn often produces multiple LLM calls (tool-using agent loops, retries, sub-steps). Without caching, each call would re-route the same query, producing duplicate routing_decision rows and wasted embedding calls.
With caching, the LLM calls within a turn collectively:
  • Hit the platform once (on the first call)
  • Reuse the cached fragment and the same routing_id for all subsequent calls
  • Produce exactly one routing_decision row for the entire turn
router = SkillRouter(
    api_key="dai_sk_...",
    fragment_cache_ttl=30.0,     # shorten for very dynamic registries
    fragment_cache_size=64,
)

# Bypass when you want a fresh routing decision (replay / regression):
fragment, routing_id = router.build_prompt_fragment(
    query="...",
    bypass_cache=True,
)
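To make the caching behavior concrete, here is a minimal TTL cache with a size cap and a bypass flag. It is a sketch under stated assumptions, not the SDK's cache: entries are keyed by query only, the oldest entry is evicted when the cap is exceeded, and a bypassed call does not overwrite the cached entry. `FragmentCache` and `fake_route` are hypothetical names.

```python
import time
from collections import OrderedDict


class FragmentCache:
    """In-process TTL cache for (fragment, routing_id) pairs."""

    def __init__(self, fetch, ttl=30.0, max_size=64, clock=time.monotonic):
        self._fetch = fetch          # callable: query -> (fragment, routing_id)
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock          # injectable so tests can control time
        self._entries = OrderedDict()  # query -> (expires_at, value)

    def get(self, query, bypass_cache=False):
        if bypass_cache:
            # Fresh routing decision; assumption: the cache is left untouched.
            return self._fetch(query)
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and hit[0] > now:
            self._entries.move_to_end(query)
            return hit[1]
        value = self._fetch(query)
        self._entries[query] = (now + self._ttl, value)
        self._entries.move_to_end(query)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest entry
        return value


calls = []


def fake_route(query):
    """Stand-in for the platform call; mints a new routing_id each time."""
    calls.append(query)
    return ("## Recommended Skills", f"rt_{len(calls)}")


now = [0.0]
cache = FragmentCache(fake_route, ttl=30.0, clock=lambda: now[0])
first = cache.get("refund a payment")
second = cache.get("refund a payment")   # within TTL: served from cache
now[0] = 31.0
third = cache.get("refund a payment")    # TTL expired: routes again
```

The first two calls share one `routing_id`, which is what keeps a multi-call turn down to a single routing decision.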
Every routing call writes a record. Joined with traces, it produces:
| Metric | Where it surfaces |
| --- | --- |
| Activations per skill | Registry detail page |
| Effectiveness per skill | Registry detail page |
| Per-(skill, model) effectiveness | Registry detail page |
| Offered-but-not-activated rate | Skill detail page |
| Trend (improving / stable / degrading) | Registry browse + alerts |
Per-model effectiveness is the single hardest signal for competitors to replicate — they don’t have your routing telemetry, so they can’t compute it.
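The join behind these metrics can be illustrated with in-memory rows. The row shapes below are hypothetical, and the sketch assumes one trace per routing_id for simplicity; the platform's actual schema and aggregation are not documented here.

```python
from collections import defaultdict

# Hypothetical rows: what the Router offered on each turn.
routing_decisions = [
    {"routing_id": "rt_1", "offered": ["refund-policy", "code-review"]},
    {"routing_id": "rt_2", "offered": ["refund-policy"]},
]

# Hypothetical rows: what the LLM activated and whether the turn passed.
traces = [
    {"routing_id": "rt_1", "model": "model-a",
     "activated": ["refund-policy"], "passed": True},
    {"routing_id": "rt_2", "model": "model-b",
     "activated": ["refund-policy"], "passed": False},
]


def skill_metrics(decisions, traces):
    """Join decisions to traces on routing_id and aggregate per skill."""
    by_id = {t["routing_id"]: t for t in traces}
    offered = defaultdict(int)
    activated = defaultdict(int)
    outcomes = defaultdict(list)  # (skill, model) -> list of pass/fail
    for decision in decisions:
        trace = by_id.get(decision["routing_id"])
        for skill in decision["offered"]:
            offered[skill] += 1
            if trace and skill in trace["activated"]:
                activated[skill] += 1
                outcomes[(skill, trace["model"])].append(trace["passed"])
    not_activated_rate = {s: 1 - activated[s] / offered[s] for s in offered}
    effectiveness = {key: sum(v) / len(v) for key, v in outcomes.items()}
    return not_activated_rate, effectiveness


not_activated_rate, effectiveness = skill_metrics(routing_decisions, traces)
```

Here `code-review` was offered once and never activated, so its offered-but-not-activated rate is 1.0, while `refund-policy` shows different effectiveness on the two models.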
| Failure | Behavior |
| --- | --- |
| GEMINI_API_KEY unset on server | Falls back to full menu: every active skill, name + description only |
| Hybrid SQL errors (e.g., right after schema migration) | Same fallback; warns once |
| Platform unreachable from SDK | Returns ("", None) from build_prompt_fragment; agent runs with base instructions only |
| Persisted routing_decision write fails | Synthetic routing_id returned; downstream join just won’t find a row |
Net effect: a Router failure never breaks your agent run. Worst case, the LLM falls back to whatever instructions it had pre-Router.
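If you call a fragment builder from your own code, the same degrade-to-empty contract is easy to mirror on the caller side. The wrapper below is a defensive sketch, not part of the SDK (which is described above as already returning `("", None)`); `safe_fragment`, `system_prompt`, and `unreachable` are hypothetical names.

```python
import logging

log = logging.getLogger("agent")


def safe_fragment(build, query):
    """Call a fragment builder; degrade to ("", None) on any error."""
    try:
        fragment, routing_id = build(query)
    except Exception as exc:
        log.warning("skill routing failed, using base prompt: %s", exc)
        return "", None
    return fragment or "", routing_id


def system_prompt(base, fragment):
    """An empty fragment means the agent runs on its base instructions."""
    return base if not fragment else base + "\n\n" + fragment


def unreachable(query):
    """Stand-in builder that simulates an unreachable platform."""
    raise ConnectionError("platform unreachable")


fragment, routing_id = safe_fragment(unreachable, "refund a payment")
prompt = system_prompt("You are a support agent.", fragment)
```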
curl -X POST https://api.decimal.ai/api/v1/skills/route \
  -H "Authorization: Bearer dai_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "user wants to refund a settled payment",
    "agent_name": "support-agent",
    "top_k": 5
  }'
Or in Python:
preview = router.smart_route("user wants to refund...", top_k=5)
print([s["name"] for s in preview["skills"]])
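If you prefer not to depend on the SDK for a quick check, the curl call above can be reproduced with the standard library. This sketch only builds the request; the URL, headers, and body fields are copied from the curl example, and `build_route_request` is a hypothetical helper.

```python
import json
import urllib.request

ROUTE_URL = "https://api.decimal.ai/api/v1/skills/route"


def build_route_request(api_key, query, agent_name, top_k=5):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "query": query,
        "agent_name": agent_name,
        "top_k": top_k,
    }).encode("utf-8")
    return urllib.request.Request(
        ROUTE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_route_request(
    "dai_sk_...",  # placeholder key, as in the curl example
    "user wants to refund a settled payment",
    "support-agent",
)
# To send it, pass req to urllib.request.urlopen(req, timeout=5)
# and parse the response body with json.load().
```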
The Router does not:
  • Execute skills. Skills are markdown; they shape the LLM’s behavior. For code execution, see how skills differ from tools.
  • Decide which model to use. Model routing is separate. The Router operates within your chosen model.
  • Cache bodies across processes. The in-process cache is per-process. For shared caching, put a service in front.

Router vs disk auto-loading (pick one)

Some agent runtimes (Claude Code, Cursor) auto-discover SKILL.md files from .claude/skills/ and inject them into the system prompt themselves. The Router also injects skills into the system prompt — from the platform. Running both at once means the same skill ends up in the prompt twice, which confuses the LLM and inflates token cost. The fix is to pick one source of skill injection per agent process:
| Your setup | What to use | Why |
| --- | --- | --- |
| Python app (LangChain / OpenAI Agents / Pydantic AI), no IDE runtime | Router with enable_skill_loader=True | The framework doesn’t auto-load from disk; Router is the only injector. No duplication. |
| Claude Code / Cursor, IDE-managed agent | Disk auto-loading, no Router | The runtime injects from .claude/skills/; adding the Router would duplicate. |
| Python app wrapping Anthropic SDK that also runs inside Claude Code | Pick one: either skip enable_skill_loader=True OR pass disk_sync=False and remove local SKILL.md files | Both runtimes are active; only one should inject. |
| Authoring + editing locally, deploying as a Python service | Edit SKILL.md files locally; SDK syncs them to platform; Router loads from platform at runtime | Disk is the editing surface, platform is the runtime surface, Router is the bridge. |
The SDK auto-detects known disk runtimes via environment variables (CLAUDECODE, CLAUDE_CODE_ENTRYPOINT, CURSOR_AGENT) and logs a one-shot warning when enable_skill_loader=True fires inside one of them. Silence the warning intentionally with DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING=1 if you’ve consciously chosen the setup (e.g., a benchmark).
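The detection described above amounts to a check over a handful of environment variables. The sketch below uses the variable names from this page but is not the SDK's code. Treating any non-empty value as "set", and only the value `1` as suppressing the warning, are assumptions. `detect_disk_runtime` and `should_warn` are hypothetical names.

```python
import os
from typing import Mapping, Optional

DISK_RUNTIME_VARS = ("CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT", "CURSOR_AGENT")
SUPPRESS_VAR = "DECIMALAI_SUPPRESS_DISK_RUNTIME_WARNING"


def detect_disk_runtime(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first disk-runtime variable that is set, else None."""
    for name in DISK_RUNTIME_VARS:
        if env.get(name):
            return name
    return None


def should_warn(env: Mapping[str, str] = os.environ) -> bool:
    """Warn only inside a known disk runtime and when not suppressed."""
    return detect_disk_runtime(env) is not None and env.get(SUPPRESS_VAR) != "1"


inside_claude = should_warn({"CLAUDECODE": "1"})
suppressed = should_warn({"CLAUDECODE": "1", SUPPRESS_VAR: "1"})
plain_python = should_warn({})
```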

disk_sync=False — Router is the only source

When you’re running a Python stack that should rely solely on the platform for skill content, pass disk_sync=False to the framework adapter:
from decimalai.openai_agents import install
install(enable_skill_loader=True, disk_sync=False)
This skips:
  • Local SKILL.md auto-discovery (no disk read)
  • Push of local skills to the platform (no sync_skills)
  • Pull of platform-only skills to disk (no pull_missing)
The Router still calls the platform’s /api/v1/skills/route per turn; that’s the runtime injector. Everything happens over the network — nothing on disk. Available on decimalai.openai_agents.install() and decimalai.langchain.install(). The pydantic_ai and anthropic adapters never touched disk, so they don’t need the flag.