Skills
Skills are a concept unique to DecimalAI: reusable instruction files that augment agent behavior.
Skill
A skill is a structured instruction file (typically SKILL.md) that an agent loads on demand to modify its behavior. Skills combine prompt instructions with configuration metadata.
Skill vs Tool: A skill is not a tool. A skill is markdown that shapes how the agent thinks: it is loaded into the prompt and never executes. A tool is an executable function that the LLM calls during a run, and it extends what the agent can do. A skill can reference tools (e.g., “use the search_docs tool to find examples”), but it cannot do anything itself.
| | Skill | Tool |
|---|---|---|
| What it is | Structured instruction file (SKILL.md) | Executable function with JSON Schema interface |
| How it works | Loaded into the prompt at runtime | Called by the LLM during execution |
| Versioned by | Content hash of instruction text | JSON Schema of parameters + return type |
| Example | “When reviewing code, check for security vulnerabilities and style” | search_docs(query: str, limit: int) → List[Doc] |
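The distinction in the table can be sketched in a few lines of Python. This is an illustrative sketch only, not DecimalAI's implementation: the `Doc` class, the in-memory corpus, `build_prompt`, `skill_version`, and the choice of SHA-256 as the content hash are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    title: str
    text: str


# Hypothetical in-memory corpus standing in for a real document index.
_DOCS = [
    Doc("Auth guide", "How to rotate API keys safely."),
    Doc("Style guide", "Naming conventions for Python modules."),
]


def search_docs(query: str, limit: int) -> List[Doc]:
    """A tool: executable code the LLM can call during a run."""
    hits = [d for d in _DOCS if query.lower() in d.text.lower()]
    return hits[:limit]


# A skill: plain instruction text. It is never executed, only added to the prompt.
SKILL_TEXT = (
    "When reviewing code, check for security vulnerabilities and style. "
    "Use the search_docs tool to find examples."
)


def skill_version(instruction_text: str) -> str:
    """Version a skill by hashing its instruction text (SHA-256 is an assumption)."""
    return hashlib.sha256(instruction_text.encode("utf-8")).hexdigest()


def build_prompt(base_prompt: str, skills: List[str]) -> str:
    """Load skills into the prompt by appending their text to the base prompt."""
    return "\n\n".join([base_prompt, *skills])
```

Any change to the instruction text, even a single character, produces a different hash, which is what makes a content hash usable as a version identifier for a skill.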
Skill Activation
A skill activation is a record of which skills were active during a specific trace. The SDK reports it via the active_skills field. It is used to measure per-skill effectiveness, that is, which skills correlate with higher-quality outputs.
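As a rough illustration of per-skill effectiveness, the sketch below averages an eval score over the traces in which each skill was active. Only the active_skills field name comes from the text above; the trace shape, the `score` key, and the plain mean are assumptions, and this is not the SkillScore formula.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical trace records. The "score" key is an assumed per-trace
# eval score in [0, 1]; real traces carry many more fields.
traces = [
    {"active_skills": ["code-review", "docs-search"], "score": 0.9},
    {"active_skills": ["code-review"], "score": 0.7},
    {"active_skills": [], "score": 0.4},
]


def mean_score_by_skill(traces: List[dict]) -> Dict[str, float]:
    """Average the eval score over every trace in which a skill was active."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for trace in traces:
        for skill in trace["active_skills"]:
            totals[skill] += trace["score"]
            counts[skill] += 1
    return {skill: totals[skill] / counts[skill] for skill in totals}


per_skill = mean_score_by_skill(traces)
```

A mean like this shows correlation only: a skill that happens to be active on easier inputs will look better than it is.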
Effectiveness rolls up into a single quality measure (SkillScore), and skills are published, forked, and discovered through the registry. Two pointers to go deeper:
- SkillScore — how per-skill effectiveness is measured and scored.
- Registry — how skills are published, forked, and discovered.
Data Pipeline
The final stage of the lifecycle: turning evaluated, compatible traces into training data.
Dataset
A dataset is a curated collection of training examples built from filtered production traces. The key insight: by combining manifest compatibility with eval scores, DecimalAI ensures training data is both current (recorded against the latest agent config) and high-quality (passed evaluation).
→ See Datasets & Training for the full guide.
Export Formats
| Format | Full Name | How It Works |
|---|---|---|
| SFT | Supervised Fine-Tuning | Each row is an input→output pair from an LLM call. Trains the model to replicate the agent’s best behavior. |
| DPO | Direct Preference Optimization | Each row has a “chosen” (good) and “rejected” (bad) response for the same input. Trains the model to prefer better outputs. |
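To make the two row shapes concrete, here is a minimal sketch of one SFT row and one DPO row serialized as JSON lines. The key names (`input`, `output`, `chosen`, `rejected`) and the helper functions are assumptions for illustration; the actual export schema may differ.

```python
import json


def sft_row(llm_input: str, llm_output: str) -> dict:
    """One SFT example: an input -> output pair taken from a single LLM call."""
    return {"input": llm_input, "output": llm_output}


def dpo_row(llm_input: str, chosen: str, rejected: str) -> dict:
    """One DPO example: a preferred and a dispreferred response to the same input."""
    return {"input": llm_input, "chosen": chosen, "rejected": rejected}


rows = [
    sft_row("Summarize the ticket.", "Customer cannot log in after a password reset."),
    dpo_row(
        "Summarize the ticket.",
        chosen="Customer cannot log in after a password reset.",
        rejected="There is a problem.",
    ),
]

# One JSON object per line (JSONL), a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(row) for row in rows)
```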
Replay
Replay re-runs a historical trace’s input against the current version of your agent. The original output and the replayed output are then compared, often by a pairwise LLM judge, to measure whether the agent improved or regressed. Replayed traces have source_type="replay" and generate DPO preference pairs (original = rejected, new = chosen, or vice versa if the new version regressed).
→ See Replay for the full guide.
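The pairing rule for replayed traces can be sketched as follows. The function name, the `replay_won` flag (standing in for the judge's verdict), and the row keys other than source_type are assumptions for illustration, not the platform's API.

```python
def replay_to_dpo_pair(
    trace_input: str,
    original_output: str,
    replayed_output: str,
    replay_won: bool,
) -> dict:
    """Build a DPO preference pair from a replay comparison.

    replay_won is the (assumed) verdict of the pairwise judge: True if the
    replayed output was judged better than the original.
    """
    if replay_won:
        chosen, rejected = replayed_output, original_output
    else:
        # The new version regressed, so the original output is preferred.
        chosen, rejected = original_output, replayed_output
    return {
        "input": trace_input,
        "chosen": chosen,
        "rejected": rejected,
        "source_type": "replay",
    }


improved = replay_to_dpo_pair("Fix the bug.", "old answer", "new answer", replay_won=True)
regressed = replay_to_dpo_pair("Fix the bug.", "old answer", "new answer", replay_won=False)
```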
Repair
Repair mechanically fixes a trace to be compatible with a new manifest version. Examples: renaming a tool parameter, removing references to a deleted field. Repairs are deterministic (zero LLM cost) and fully auditable.
→ See Manifests & Versioning for repair details.
Next
Training Pipeline Tutorial
End-to-end: trace → evaluate → fine-tune.
Glossary
Quick A-Z reference for any term.