Skills & Data Pipeline

The two final pieces: how skills augment agent behavior, and how DecimalAI turns evaluated, compatible traces into training data.

Skills

Skills are a DecimalAI concept: reusable instruction files that augment agent behavior.

Skill

A skill is a structured instruction file (typically SKILL.md) that an agent loads on demand to modify its behavior. Skills combine prompt instructions with configuration metadata.

Skill vs Tool: A skill is markdown that shapes how the agent thinks; it is loaded into the prompt and never executes. A tool is an executable function that the LLM calls during a run, so it changes what the agent can do. A skill can reference tools (e.g., “use the search_docs tool to find examples”), but it is not a tool and can’t do anything by itself.

| | Skill | Tool |
|---|---|---|
| What it is | Structured instruction file (SKILL.md) | Executable function with JSON Schema interface |
| How it works | Loaded into the prompt at runtime | Called by the LLM during execution |
| Versioned by | Content hash of instruction text | JSON Schema of parameters + return type |
| Example | “When reviewing code, check for security vulnerabilities and style” | `search_docs(query: str, limit: int) → List[Doc]` |
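To make the distinction concrete, here is a minimal Python sketch. It does not use the DecimalAI SDK: `build_system_prompt`, `SEARCH_DOCS_SCHEMA`, the toy `search_docs` body, and the choice of SHA-256 for the content hash are all assumptions made for illustration.

```python
import hashlib

# Hypothetical skill text; in practice this would be read from a SKILL.md file.
SKILL_TEXT = "When reviewing code, check for security vulnerabilities and style."

def search_docs(query: str, limit: int) -> list[str]:
    """A stand-in tool: an executable function the LLM can call."""
    corpus = ["security checklist", "style guide", "release notes"]
    return [doc for doc in corpus if query.lower() in doc][:limit]

# A tool is described to the model by a JSON Schema of its parameters.
# The exact layout of this dict is an assumption, not a DecimalAI format.
SEARCH_DOCS_SCHEMA = {
    "name": "search_docs",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query", "limit"],
    },
}

def build_system_prompt(base: str, skills: list[str]) -> str:
    """Skills are only text: they are joined into the prompt and never run."""
    return "\n\n".join([base, *skills])

# The skill changes the prompt; the tool is something that actually executes.
prompt = build_system_prompt("You are a code review agent.", [SKILL_TEXT])
results = search_docs("style", limit=5)

# One possible content hash for versioning the skill (SHA-256 is an assumption).
skill_version = hashlib.sha256(SKILL_TEXT.encode("utf-8")).hexdigest()
```

Editing the instruction text changes `skill_version`, while changing a tool's parameters changes its schema, which mirrors the "Versioned by" row above.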

Skill Activation

A skill activation is a record of which skills were active during a specific trace. The SDK reports it via the active_skills field. Activations are used to measure per-skill effectiveness, that is, which skills correlate with higher-quality outputs. Effectiveness rolls up into a single quality measure (SkillScore), and skills are published, forked, and discovered through the registry. Two pointers to go deeper:

SkillScore — how per-skill effectiveness is measured and scored.
Registry — how skills are published, forked, and discovered.

→ See Skills for the full guide.
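As a rough illustration of per-skill effectiveness, the sketch below averages an evaluation score over the traces in which each skill was active. The trace dictionaries, the `eval_score` key, and `mean_score_by_skill` are hypothetical; only the `active_skills` field name comes from the text above, and this plain average is not the SkillScore formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical traces; only the active_skills field name is taken from the docs.
traces = [
    {"trace_id": "t1", "active_skills": ["code-review"], "eval_score": 0.9},
    {"trace_id": "t2", "active_skills": ["code-review", "summarize"], "eval_score": 0.7},
    {"trace_id": "t3", "active_skills": [], "eval_score": 0.4},
    {"trace_id": "t4", "active_skills": ["summarize"], "eval_score": 0.5},
]

def mean_score_by_skill(traces):
    """Average eval score over the traces where each skill was active."""
    scores = defaultdict(list)
    for trace in traces:
        for skill in trace["active_skills"]:
            scores[skill].append(trace["eval_score"])
    return {skill: mean(values) for skill, values in scores.items()}

by_skill = mean_score_by_skill(traces)
for skill, score in sorted(by_skill.items()):
    print(f"{skill}: {score:.2f}")
```

An average like this shows correlation only: a skill that tends to be activated on easier inputs will look better than it is.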

Data Pipeline

The final stage of the lifecycle: turning evaluated, compatible traces into training data.

Dataset

A dataset is a curated collection of training examples built from filtered production traces. The key insight: by filtering on both manifest compatibility and eval scores, DecimalAI ensures training data is both current (recorded against the latest agent config) and high-quality (passed evaluation). → See Datasets & Training for the full guide.
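The two-part filter can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Trace` fields, the `select_for_dataset` helper, the score threshold, and the simplification that "compatible" means "same manifest version" (a real compatibility check may be looser).

```python
from dataclasses import dataclass

@dataclass
class Trace:
    trace_id: str
    manifest_version: str  # hypothetical field: agent config the trace ran against
    eval_score: float      # hypothetical field: evaluation result in [0, 1]

def select_for_dataset(traces, current_manifest, min_score):
    """Keep traces that are both compatible (current) and high-quality."""
    return [
        t for t in traces
        if t.manifest_version == current_manifest and t.eval_score >= min_score
    ]

traces = [
    Trace("t1", "v2", 0.92),  # current and high-scoring: kept
    Trace("t2", "v1", 0.95),  # high-scoring but recorded against an old config
    Trace("t3", "v2", 0.40),  # current but failed evaluation
]

selected = select_for_dataset(traces, current_manifest="v2", min_score=0.8)
print([t.trace_id for t in selected])
```

Either condition alone is insufficient: `t2` would teach outdated behavior and `t3` would teach poor behavior.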

Export Formats

| Format | Full Name | How It Works |
|---|---|---|
| SFT | Supervised Fine-Tuning | Each row is an input→output pair from an LLM call. Trains the model to replicate the agent’s best behavior. |
| DPO | Direct Preference Optimization | Each row has a “chosen” (good) and “rejected” (bad) response for the same input. Trains the model to prefer better outputs. |
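The row shapes in the table can be illustrated as JSON Lines. The key names (`input`, `output`, `prompt`, `chosen`, `rejected`) and the helper functions are assumptions; the actual export schema may use different field names.

```python
import json

def to_sft_row(llm_call):
    """SFT: one input -> output pair per LLM call."""
    return {"input": llm_call["input"], "output": llm_call["output"]}

def to_dpo_row(prompt, chosen, rejected):
    """DPO: a preferred and a dispreferred response for the same input."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

def to_jsonl(rows):
    """Serialize rows as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)

# Hypothetical LLM calls taken from high-scoring traces.
calls = [
    {"input": "Summarize the release notes.", "output": "Two bug fixes, one new flag."},
    {"input": "Review this diff.", "output": "Looks safe; rename the variable."},
]

sft_rows = [to_sft_row(call) for call in calls]
dpo_row = to_dpo_row(
    prompt="Review this diff.",
    chosen="Looks safe; rename the variable.",
    rejected="LGTM",
)

sft_jsonl = to_jsonl(sft_rows)
dpo_jsonl = to_jsonl([dpo_row])
```

SFT needs only good examples, while DPO needs two responses to the same input, which is why DPO data usually comes from comparisons such as replay.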

Replay

Replay re-runs a historical trace’s input against the current version of your agent. The original output and the replayed output are then compared — often by a pairwise LLM judge — to measure whether the agent improved or regressed. Replayed traces have source_type="replay" and generate DPO preference pairs (original = rejected, new = chosen — or vice versa if the new version regressed). → See Replay for the full guide.
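The mapping from a replay comparison to a preference pair might look like the following sketch. The `build_preference_pair` function, the verdict strings, the tie handling, and the toy `prefer_shorter` judge (a deterministic stand-in for a pairwise LLM judge) are all hypothetical.

```python
def prefer_shorter(trace_input, original, replayed):
    """Toy pairwise judge: prefers the shorter answer. Stands in for an LLM judge."""
    if len(replayed) < len(original):
        return "replayed"
    if len(original) < len(replayed):
        return "original"
    return "tie"

def build_preference_pair(trace_input, original, replayed, judge):
    """Turn a replay comparison into a DPO pair, or None when there is no winner."""
    verdict = judge(trace_input, original, replayed)
    if verdict == "replayed":    # the new version improved
        chosen, rejected = replayed, original
    elif verdict == "original":  # the new version regressed
        chosen, rejected = original, replayed
    else:                        # assumption: ties produce no training signal
        return None
    return {"prompt": trace_input, "chosen": chosen, "rejected": rejected}

improved = build_preference_pair(
    "What is 2 + 2?", "The answer to your question is four.", "4", prefer_shorter
)
regressed = build_preference_pair(
    "What is 2 + 2?", "4", "I believe it might be four.", prefer_shorter
)
```

In both cases the better output ends up as `chosen`, so a regression still yields a usable pair rather than being discarded.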

Repair

Repair mechanically fixes a trace to be compatible with a new manifest version. Examples: renaming a tool parameter, removing references to a deleted field. Repairs are deterministic (zero LLM cost) and fully auditable. → See Manifests & Versioning for repair details.
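A parameter rename, the first example above, can be sketched as a pure function that returns the repaired trace together with an audit log. The trace layout (`tool_calls`, `name`, `arguments`) and the `rename_tool_param` helper are assumptions for illustration, not the DecimalAI repair API.

```python
import copy

def rename_tool_param(trace, tool, old, new):
    """Deterministically rename one tool parameter; returns (repaired_trace, audit_log)."""
    repaired = copy.deepcopy(trace)  # never mutate the recorded trace
    audit = []
    for index, call in enumerate(repaired["tool_calls"]):
        if call["name"] == tool and old in call["arguments"]:
            call["arguments"][new] = call["arguments"].pop(old)
            audit.append({"call_index": index, "tool": tool, "renamed": [old, new]})
    return repaired, audit

# Hypothetical trace recorded before `max_results` was renamed to `limit`.
trace = {
    "trace_id": "t1",
    "tool_calls": [
        {"name": "search_docs", "arguments": {"query": "auth", "max_results": 5}},
        {"name": "send_email", "arguments": {"to": "dev@example.com"}},
    ],
}

repaired, audit = rename_tool_param(trace, "search_docs", "max_results", "limit")
```

Because the transformation is a plain function of the trace and the rename rule, running it twice gives the same result, and the audit log records exactly which calls were touched.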

Training Pipeline Tutorial

End-to-end: trace → evaluate → fine-tune.

Glossary

Quick A-Z reference for any term.
