Building Datasets
From the Dashboard
Filtering
| Filter | Purpose |
|---|---|
| Agent | Which agent’s traces to include |
| Manifest version | Only traces from a specific config version |
| Eval verdict | Only traces that passed quality checks |
| Compatibility | Only keep or repair traces (exclude stale data) |
Compatibility verdicts tell you what to do with each trace for training: keep — use as-is; repair — patch a stale field, then use; replay — re-run the input to regenerate output; drop — too stale to use. These are orthogonal to a trace’s HIGH/MEDIUM/LOW IMPACT severity.
SFT Format
DecimalAI converts multi-turn agent traces into the chat completion format expected by fine-tuning APIs. This handles the complexity of tool-using agents:Why This Matters
A ReAct agent calls the LLM multiple times per user request. Each call, the LLM sees all prior messages and generates only the next assistant turn. Naive SFT (single input → output) doesn’t capture this multi-turn structure. DecimalAI’s format preserves:- System prompts — the instructions the model should follow
- Tool calls — when and how the model should use tools
- Tool results — what the model learns from tool output
- Multi-turn reasoning — the full chain of thought
Multi-Agent Traces
For multi-agent architectures (supervisor + workers), DecimalAI can build separate datasets per agent role, ensuring each sub-agent trains on its own traces.DPO Format
DPO (Direct Preference Optimization) pairs are generated from replay results:Dataset Versioning
Each dataset supports multiple versions:- Adding traces creates a new version
- Version comparison shows added/removed/unchanged rows
- Quality review workflow: pending → approved → rejected
Row Preview
View dataset contents inline with expandable row detail:- Role-colored messages (system, user, assistant, tool)
- Tool call arguments and results
- Raw JSON toggle
- Quality stats: score distribution, message length, split breakdown
Fine-Tuning
Supported Providers
| Provider | Models | Setup |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4.1-mini, GPT-4.1-nano | OpenAI API key |
| Together.AI | Llama 4, Llama 3.3/3.1, Qwen 3/2.5, DeepSeek R1/V3, Mistral | Together.AI API key |
| Gemini | Gemini 2.5 Flash, Gemini 2.5 Pro | Google Cloud API key + project |
| Generic | Any model | Webhook URL (optional) |
Launching a Job
From the dataset detail page:
The platform submits the job and polls for completion. Training metrics (loss, validation) are stored for review.
Export
You can also export datasets for training elsewhere:- JSONL: Standard format for OpenAI fine-tuning
- Parquet: Efficient columnar format for large datasets
Pull & Export
The fastest way to get training data onto disk:version parameter accepts:
| Value | Behavior |
|---|---|
None or "latest" | Most recent version (default) |
"v3" or "3" | Specific version by number |
| Full UUID | Exact version ID |
HuggingFace Hub Integration
Push datasets directly to HuggingFace Hub, making them instantly loadable by Axolotl, Unsloth, TRL, and any tool that supportsload_dataset().
Push to Hub
Load as HuggingFace Dataset (In-Memory)
Skip the file entirely — load a DecimalAI dataset directly as adatasets.Dataset object:
Requirements:
pip install huggingface_hub datasets. These are optional dependencies — the core SDK works without them.Next Steps
Training Pipeline tutorial
End-to-end: trace → evaluate → fine-tune.
Datasets API
REST reference for build, export, version comparison.
Skills & Data Pipeline
SFT vs DPO, repair vs replay.
Replay
Regenerate training data by replaying historical inputs.