The two final pieces: how skills augment agent behavior, and how DecimalAI turns evaluated, compatible traces into training data.

Skills

Skills are a DecimalAI-specific concept: reusable instruction files that augment agent behavior.

Skill

A skill is a structured instruction file (typically SKILL.md) that an agent loads on demand to modify its behavior. Skills combine prompt instructions with configuration metadata. A skill is markdown that is loaded into the prompt: it shapes how the agent thinks and never executes.
Skill vs Tool: A skill is a structured instruction that modifies how the agent thinks. A tool is an executable function, called by the LLM during a run, that modifies what the agent can do. A skill can reference tools (e.g., “use the search_docs tool to find examples”), but a skill is not a tool itself.
| | Skill | Tool |
| --- | --- | --- |
| What it is | Structured instruction file (SKILL.md) | Executable function with JSON Schema interface |
| How it works | Loaded into the prompt at runtime | Called by the LLM during execution |
| Versioned by | Content hash of instruction text | JSON Schema of parameters + return type |
| Example | “When reviewing code, check for security vulnerabilities and style” | search_docs(query: str, limit: int) → List[Doc] |
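The distinction can be sketched in a few lines of Python. This is an illustration only, not DecimalAI's SDK: the helper names (`skill_version`, `tool_version`, `build_prompt`), the choice of SHA-256, the schema layout, and the toy `search_docs` body (which returns plain strings rather than `Doc` objects) are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical skill text; a real skill would typically live in a SKILL.md file.
SKILL_TEXT = "When reviewing code, check for security vulnerabilities and style."


def skill_version(instruction_text: str) -> str:
    """Version a skill by hashing its instruction text (SHA-256 is an assumption)."""
    return hashlib.sha256(instruction_text.encode("utf-8")).hexdigest()


def build_prompt(base_prompt: str, skills: list[str]) -> str:
    """A skill is only text: it is appended to the prompt and never executed."""
    return "\n\n".join([base_prompt, *skills])


# A tool, by contrast, is an executable function described by a JSON Schema.
SEARCH_DOCS_SCHEMA = {
    "name": "search_docs",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query", "limit"],
    },
}


def tool_version(schema: dict) -> str:
    """Version a tool by hashing its schema (canonicalized with sorted keys)."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def search_docs(query: str, limit: int) -> list[str]:
    """Toy tool body: case-insensitive substring search over a fixed corpus."""
    corpus = [
        "Skills are instruction files.",
        "Tools are executable functions.",
        "Skill activation is reported per trace.",
    ]
    return [doc for doc in corpus if query.lower() in doc.lower()][:limit]
```

Editing the skill text changes `skill_version`, while the tool's version changes only when its schema does; the tool's implementation can change without affecting it.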

Skill Activation

A skill activation is a record of which skills were active during a specific trace. The SDK reports it via the active_skills field. Activations are used to measure per-skill effectiveness, that is, which skills correlate with higher-quality outputs. Effectiveness rolls up into a single quality measure (SkillScore), and skills are published, forked, and discovered through the registry. Two pointers to go deeper:
  • SkillScore — how per-skill effectiveness is measured and scored.
  • Registry — how skills are published, forked, and discovered.
→ See Skills for the full guide.
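To make "per-skill effectiveness" concrete, the sketch below groups traces by their active skills and averages an evaluation score for each skill. It is not the SkillScore formula, which this page does not define: the trace dictionaries, the `eval_score` field, and the plain mean are assumptions chosen for illustration. Only the `active_skills` field name comes from the text above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace records; only `active_skills` is named in the docs.
traces = [
    {"trace_id": "t1", "active_skills": ["code-review"], "eval_score": 0.9},
    {"trace_id": "t2", "active_skills": ["code-review", "summarize"], "eval_score": 0.7},
    {"trace_id": "t3", "active_skills": [], "eval_score": 0.4},
]


def mean_score_by_skill(traces: list[dict]) -> dict[str, float]:
    """Average the eval score of every trace in which each skill was active."""
    scores = defaultdict(list)
    for trace in traces:
        for skill in trace["active_skills"]:
            scores[skill].append(trace["eval_score"])
    return {skill: mean(values) for skill, values in scores.items()}


per_skill = mean_score_by_skill(traces)
```

A plain average only shows correlation. Traces with no active skills (such as `t3`) contribute nothing here, so a real measure would likely also need a baseline to compare against.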

Data Pipeline

The final stage of the lifecycle: turning evaluated, compatible traces into training data.

Dataset

A dataset is a curated collection of training examples built from filtered production traces. The key insight: by combining manifest compatibility + eval scores, DecimalAI ensures training data is both current (recorded against the latest agent config) and high-quality (passed evaluation). → See Datasets & Training for the full guide.
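The two filters can be sketched as a single pass over traces. The `Trace` fields, the `min_score` threshold, and the simplification of "manifest compatibility" to an exact version match are assumptions for this example; the actual compatibility rules are covered in the manifest documentation.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    manifest_version: str  # hypothetical field: manifest the trace was recorded against
    eval_score: float      # hypothetical field: score in [0, 1] from evaluation


def build_dataset(traces: list[Trace], current_manifest: str,
                  min_score: float = 0.8) -> list[Trace]:
    """Keep traces that are both current and high-quality."""
    return [
        t for t in traces
        if t.manifest_version == current_manifest  # current: matches the latest config
        and t.eval_score >= min_score              # high-quality: passed evaluation
    ]


traces = [
    Trace("t1", "v2", 0.95),  # kept
    Trace("t2", "v1", 0.99),  # dropped: recorded against an older manifest
    Trace("t3", "v2", 0.40),  # dropped: failed evaluation
]
dataset = build_dataset(traces, current_manifest="v2")
```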

Export Formats

| Format | Full Name | How It Works |
| --- | --- | --- |
| SFT | Supervised Fine-Tuning | Each row is an input→output pair from an LLM call. Trains the model to replicate the agent’s best behavior. |
| DPO | Direct Preference Optimization | Each row has a “chosen” (good) and “rejected” (bad) response for the same input. Trains the model to prefer better outputs. |
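As a rough picture of what one row in each format might look like, the sketch below builds an SFT row and a DPO row and serializes them as JSON Lines. The key names (`input`, `output`, `chosen`, `rejected`) and the JSONL encoding are assumptions for illustration; the actual export schema may differ.

```python
import json


def sft_row(llm_input: str, llm_output: str) -> dict:
    """One supervised example: an input and the output to imitate."""
    return {"input": llm_input, "output": llm_output}


def dpo_row(llm_input: str, chosen: str, rejected: str) -> dict:
    """One preference example: two responses to the same input."""
    return {"input": llm_input, "chosen": chosen, "rejected": rejected}


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


question = "Summarize the ticket."
sft_export = to_jsonl([sft_row(question, "Customer cannot log in after reset.")])
dpo_export = to_jsonl([
    dpo_row(question,
            chosen="Customer cannot log in after reset.",
            rejected="There is a ticket."),
])
```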

Replay

Replay re-runs a historical trace’s input against the current version of your agent. The original output and the replayed output are then compared — often by a pairwise LLM judge — to measure whether the agent improved or regressed. Replayed traces have source_type="replay" and generate DPO preference pairs (original = rejected, new = chosen — or vice versa if the new version regressed). → See Replay for the full guide.
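The mapping from a judged replay to a preference pair can be sketched as follows. The function names, the row layout, and the judge's return values are assumptions for this example. `toy_judge` is a trivial stand-in that prefers the longer answer, where a real setup would call a pairwise LLM judge.

```python
from typing import Callable

# A pairwise judge takes (input, original_output, replayed_output) and
# returns "original" or "replay" (hypothetical interface).
Judge = Callable[[str, str, str], str]


def replay_to_dpo_pair(trace_input: str, original_output: str,
                       replayed_output: str, judge: Judge) -> dict:
    """Build a preference pair; the judged winner becomes `chosen`."""
    winner = judge(trace_input, original_output, replayed_output)
    if winner == "replay":
        # The current agent improved: new = chosen, original = rejected.
        chosen, rejected = replayed_output, original_output
    else:
        # The current agent regressed: the roles are swapped.
        chosen, rejected = original_output, replayed_output
    return {"input": trace_input, "chosen": chosen, "rejected": rejected}


def toy_judge(trace_input: str, original: str, replayed: str) -> str:
    """Stand-in judge for the example only: prefers the longer answer."""
    return "replay" if len(replayed) > len(original) else "original"


improved = replay_to_dpo_pair(
    "What is a skill?",
    original_output="A file.",
    replayed_output="A structured instruction file loaded into the prompt.",
    judge=toy_judge,
)
```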

Repair

Repair mechanically fixes a trace to be compatible with a new manifest version. Examples: renaming a tool parameter, removing references to a deleted field. Repairs are deterministic (zero LLM cost) and fully auditable. → See Manifests & Versioning for repair details.
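A parameter rename, the first example above, can be sketched as a pure function over a trace. The trace layout (`tool_calls`, `name`, `arguments`) and the audit-entry format are hypothetical; the point is that the transformation involves no model call, gives the same result on every run, and records what it changed.

```python
import copy


def rename_tool_param(trace: dict, tool: str, old: str, new: str) -> tuple[dict, list[dict]]:
    """Return a repaired copy of the trace plus an audit log of each change."""
    repaired = copy.deepcopy(trace)  # leave the original trace untouched
    audit = []
    for index, call in enumerate(repaired["tool_calls"]):
        if call["name"] == tool and old in call["arguments"]:
            # Move the value from the old parameter name to the new one.
            call["arguments"][new] = call["arguments"].pop(old)
            audit.append({"call_index": index, "tool": tool,
                          "renamed": {"from": old, "to": new}})
    return repaired, audit


trace = {
    "trace_id": "t1",
    "tool_calls": [
        {"name": "search_docs", "arguments": {"q": "skills", "limit": 3}},
        {"name": "other_tool", "arguments": {"q": "unchanged"}},
    ],
}
repaired, audit = rename_tool_param(trace, tool="search_docs", old="q", new="query")
```

Because the function copies its input and depends only on its arguments, running it again on the same trace yields an identical repaired trace and audit log.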

Next

Training Pipeline Tutorial

End-to-end: trace → evaluate → fine-tune.

Glossary

Quick A-Z reference for any term.