Prerequisites
- DecimalAI SDK installed (
pip install decimalai[evals]) - An API key (
DECIMAL_API_KEY) - An agent producing traces (see Quickstart)
- An OpenAI or Together.AI API key for fine-tuning
Instrument Your Agent
Make sure your agent is instrumented and sending traces:After running your agent on production traffic for a period, you’ll have traces in the dashboard.
Evaluate Traces
Attach evaluators to score output quality:Traces are now scored automatically. Check the Evaluate page in the dashboard to see pass rates and trends.
Review the Eval Dashboard
In the dashboard, navigate to Evaluate:
- Pass Rate: What percentage of traces are passing your evaluators
- Score Distribution: Histogram of scores across all traces
- Evaluator Breakdown: Which evaluators catch the most failures
Build a Dataset
Navigate to Datasets in the sidebar, then click “Build Dataset”:Each multi-turn conversation becomes one training example. Tool calls and results are preserved so the fine-tuned model learns when and how to use tools.
- Select agent: Choose
support-agent - Filter by manifest: Use the latest manifest version (ensures current agent config)
- Filter by eval verdict: Select only
passverdicts - Choose format: SFT (supervised fine-tuning)
- Click Build
Launch Fine-Tuning
From the dataset detail page, click “Train”:
- Select provider: OpenAI, Together.AI, or Gemini
- Enter credentials: API key for the training provider
- Configure: Choose base model, training epochs
- Launch
| Provider | Models |
|---|---|
| OpenAI | GPT-4o, GPT-4.1-mini, GPT-4.1-nano |
| Together.AI | Llama 4, Llama 3.3/3.1, Qwen 3, DeepSeek R1/V3, Mistral |
| Gemini | Gemini 2.5 Flash, Gemini 2.5 Pro |
Alternative: Pull Data for External Training
Prefer to train locally or with your own infrastructure? Pull the dataset via SDK or CLI:Push to HuggingFace Hub for use with Axolotl, Unsloth, or TRL:Now use the data in any training framework:
- Axolotl
- Unsloth
- TRL
- In-Memory (No File)
Deploy and Iterate
Update your agent to use the fine-tuned model. DecimalAI will:
- Detect the manifest change (model changed) and register a new version
- Generate a compatibility report for existing traces
- Continue evaluating new traces from the fine-tuned model
- Build the next dataset from improved outputs
You’ve done it
Instrumented an agent and collected production traces
Scored traces with evaluators and a manifest compatibility check
Built a versioned SFT dataset filtered to keep + pass traces
Exported to HuggingFace and fine-tuned a model
Closed the loop — deployed the fine-tuned model, which produces traces for the next iteration
What Makes This Unique
Most platforms stop at evaluation. DecimalAI connects:- Manifest compatibility ensures training data matches your current agent config
- Eval scoring ensures only high-quality outputs enter training data
- Automatic format conversion handles the complex multi-turn, tool-using conversation structure
- HuggingFace Hub integration means one-click compatibility with every open-source trainer
- The loop repeats — each fine-tuned model feeds the next iteration
Next Steps
Datasets Guide
Filter strategies, version pinning, and export formats in depth.
Replay Guide
Regenerate training data by replaying historical inputs against the new model.
Evaluations
Configure quality gates so only high-signal traces enter datasets.
Manifests
How compatibility is computed when you change the agent.