DecimalAI builds training datasets from your production traces — filtered by agent version, eval scores, and compatibility verdicts — and launches fine-tuning jobs directly from the platform.

Building Datasets

From the Dashboard

1. Open Build Dataset: Navigate to Datasets → Build Dataset.
2. Select an agent: Choose which agent’s traces to draw from.
3. Filter by manifest version: Pin a manifest version so that only traces produced under that config version are included.
4. Filter by eval verdict: Including only traces with a pass verdict is recommended.
5. Choose a format: SFT (supervised fine-tuning) or DPO (preference pairs).
6. Build: Click Build.

Filtering

| Filter | Purpose |
| --- | --- |
| Agent | Which agent’s traces to include |
| Manifest version | Only traces from a specific config version |
| Eval verdict | Only traces that passed quality checks |
| Compatibility | Only keep or repair traces (exclude stale data) |

Compatibility verdicts tell you what to do with each trace for training:
  • keep: use as-is
  • repair: patch a stale field, then use
  • replay: re-run the input to regenerate the output
  • drop: too stale to use
These verdicts are orthogonal to a trace’s HIGH/MEDIUM/LOW IMPACT severity.
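As a rough illustration of how the eval and compatibility filters combine, the sketch below applies them to plain dictionaries. The field names (`eval_verdict`, `compat_verdict`) and the `select_for_training` helper are hypothetical and are not part of the DecimalAI SDK; the platform applies these filters for you when you build a dataset.

```python
# Hypothetical trace records; the field names are assumptions for illustration.
traces = [
    {"id": "t1", "eval_verdict": "pass", "compat_verdict": "keep"},
    {"id": "t2", "eval_verdict": "pass", "compat_verdict": "repair"},
    {"id": "t3", "eval_verdict": "pass", "compat_verdict": "replay"},
    {"id": "t4", "eval_verdict": "fail", "compat_verdict": "keep"},
    {"id": "t5", "eval_verdict": "pass", "compat_verdict": "drop"},
]

# Verdicts that leave a trace usable without re-running its input.
USABLE_COMPAT = {"keep", "repair"}


def select_for_training(records, allowed_compat=USABLE_COMPAT):
    """Keep traces that passed evaluation and have an allowed compatibility verdict."""
    return [
        r for r in records
        if r["eval_verdict"] == "pass" and r["compat_verdict"] in allowed_compat
    ]


selected = select_for_training(traces)
print([r["id"] for r in selected])  # ['t1', 't2']
```

Traces marked replay are excluded here because they need to be re-run before their output can be trusted.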

SFT Format

DecimalAI converts multi-turn agent traces into the chat completion format expected by fine-tuning APIs. This handles the complexity of tool-using agents:
{
  "messages": [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
    ]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."}
  ]
}
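Before training on exported rows, it can help to sanity-check their shape. The following is a minimal sketch of such a check, assuming only the structure shown in the example above. `validate_row` is a hypothetical helper, not an SDK function, and the rule that a row should end on an assistant turn is an assumption about typical SFT data.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_row(row):
    """Return a list of problems found in one SFT row (an empty list means OK)."""
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["row has no 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        for call in msg.get("tool_calls") or []:
            args = call.get("function", {}).get("arguments", "")
            try:
                json.loads(args)  # arguments are stored as a JSON-encoded string
            except (TypeError, ValueError):
                problems.append(f"message {i}: tool call arguments are not valid JSON")
        if role == "assistant" and msg.get("content") is None and not msg.get("tool_calls"):
            problems.append(f"message {i}: assistant turn has neither content nor tool calls")
    if messages[-1].get("role") != "assistant":
        problems.append("last message is not an assistant turn")
    return problems


row = {
    "messages": [
        {"role": "system", "content": "You are a support agent..."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
        ]},
        {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
        {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
    ]
}
print(validate_row(row))  # []
```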

Why This Matters

A ReAct agent calls the LLM multiple times per user request. Each call, the LLM sees all prior messages and generates only the next assistant turn. Naive SFT (single input → output) doesn’t capture this multi-turn structure. DecimalAI’s format preserves:
  • System prompts — the instructions the model should follow
  • Tool calls — when and how the model should use tools
  • Tool results — what the model learns from tool output
  • Multi-turn reasoning — the full chain of thought
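To make the multi-turn point concrete, here is a small sketch of how one trace corresponds to several LLM calls: each assistant turn is a prediction target, and everything before it is the context the model saw. The `split_into_llm_calls` helper is hypothetical and only illustrates the idea; it does not describe how DecimalAI builds rows internally.

```python
def split_into_llm_calls(messages):
    """Yield (context, target) pairs, one per assistant turn in the conversation."""
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            yield messages[:i], msg


trace = [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
    ]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
]

calls = list(split_into_llm_calls(trace))
for context, target in calls:
    kind = "tool call" if target.get("tool_calls") else "final answer"
    print(f"{len(context)} context messages -> {kind}")
```

A single input → output pair would keep only the last of these two calls and lose the decision to call `search_docs`.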

Multi-Agent Traces

For multi-agent architectures (supervisor + workers), DecimalAI can build separate datasets per agent role, ensuring each sub-agent trains on its own traces.
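Conceptually, a per-role split amounts to grouping rows by the agent that produced them. The sketch below shows that grouping on in-memory data; the `agent_role` field and the `split_by_role` helper are assumptions made for illustration, not documented DecimalAI names.

```python
from collections import defaultdict

# Hypothetical multi-agent traces; "agent_role" is an assumed field name.
traces = [
    {"agent_role": "supervisor", "messages": [
        {"role": "user", "content": "Refund order 42"},
        {"role": "assistant", "content": "Routing to the billing worker."},
    ]},
    {"agent_role": "billing_worker", "messages": [
        {"role": "user", "content": "Process refund for order 42"},
        {"role": "assistant", "content": "Refund issued."},
    ]},
    {"agent_role": "billing_worker", "messages": [
        {"role": "user", "content": "Check invoice 7"},
        {"role": "assistant", "content": "Invoice 7 is paid."},
    ]},
]


def split_by_role(records):
    """Group SFT rows by the agent role that produced them."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["agent_role"]].append({"messages": record["messages"]})
    return dict(grouped)


per_role = split_by_role(traces)
for role, rows in per_role.items():
    print(role, len(rows))
```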

DPO Format

DPO (Direct Preference Optimization) pairs are generated from replay results:
{
  "prompt": "How do I reset my password?",
  "chosen": "To reset your password, go to Settings > Security...",
  "rejected": "I'm not sure, maybe check the FAQ?"
}
The “chosen” response comes from the current agent (v2), and the “rejected” from the older agent (v1) or a failed trace.
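The pairing logic can be sketched as follows, assuming you have the older and newer agents' responses keyed by prompt. `build_dpo_pairs` and the input dictionaries are hypothetical; the platform derives pairs from replay results for you.

```python
def build_dpo_pairs(old_outputs, new_outputs):
    """Pair responses by prompt: the newer agent's is chosen, the older agent's is rejected."""
    pairs = []
    for prompt, chosen in new_outputs.items():
        rejected = old_outputs.get(prompt)
        # Skip prompts with no older response, or where both versions agree,
        # since identical responses carry no preference signal.
        if rejected is None or rejected == chosen:
            continue
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Hypothetical outputs from agent v1 (older) and agent v2 (current).
v1_outputs = {
    "How do I reset my password?": "I'm not sure, maybe check the FAQ?",
    "Where is my invoice?": "See Billing > Invoices.",
}
v2_outputs = {
    "How do I reset my password?": "To reset your password, go to Settings > Security...",
    "Where is my invoice?": "See Billing > Invoices.",
}

pairs = build_dpo_pairs(v1_outputs, v2_outputs)
print(len(pairs))  # 1
```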

Dataset Versioning

Each dataset supports multiple versions:
  • Adding traces creates a new version
  • Version comparison shows added/removed/unchanged rows
  • Quality review workflow: pending → approved → rejected
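One simple way to think about version comparison is as a set difference over row fingerprints. The sketch below is an illustration under that assumption; `row_key` and `compare_versions` are hypothetical helpers, and DecimalAI's actual comparison may identify rows differently.

```python
import hashlib
import json


def row_key(row):
    """Stable fingerprint of a row, independent of dictionary key order."""
    encoded = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def compare_versions(old_rows, new_rows):
    """Count rows added, removed, and unchanged between two dataset versions."""
    old_keys = {row_key(r) for r in old_rows}
    new_keys = {row_key(r) for r in new_rows}
    return {
        "added": len(new_keys - old_keys),
        "removed": len(old_keys - new_keys),
        "unchanged": len(old_keys & new_keys),
    }


v1 = [{"prompt": "a"}, {"prompt": "b"}]
v2 = [{"prompt": "b"}, {"prompt": "c"}, {"prompt": "d"}]
print(compare_versions(v1, v2))  # {'added': 2, 'removed': 1, 'unchanged': 1}
```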

Row Preview

View dataset contents inline with expandable row detail:
  • Role-colored messages (system, user, assistant, tool)
  • Tool call arguments and results
  • Raw JSON toggle
  • Quality stats: score distribution, message length, split breakdown
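If you want similar statistics outside the dashboard, they are straightforward to compute from exported rows. This sketch assumes each row carries hypothetical `score` and `split` fields alongside `messages`, and the 0.8 threshold is arbitrary; the exact fields in your export may differ.

```python
from collections import Counter
from statistics import mean

# Hypothetical exported rows; "score" and "split" are assumed field names.
rows = [
    {"score": 0.9, "split": "train", "messages": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]},
    {"score": 0.5, "split": "train", "messages": [
        {"role": "user", "content": "Reset?"},
        {"role": "assistant", "content": "Done"},
    ]},
    {"score": 0.85, "split": "validation", "messages": [
        {"role": "user", "content": "Thanks"},
        {"role": "assistant", "content": "Cheers"},
    ]},
]


def quality_stats(rows, high_threshold=0.8):
    """Summarize score buckets, average characters per row, and split sizes."""
    chars_per_row = [
        sum(len(m.get("content") or "") for m in r["messages"]) for r in rows
    ]
    return {
        "score_buckets": dict(
            Counter("high" if r["score"] >= high_threshold else "low" for r in rows)
        ),
        "avg_chars_per_row": mean(chars_per_row),
        "splits": dict(Counter(r["split"] for r in rows)),
    }


stats = quality_stats(rows)
print(stats)
```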

Fine-Tuning

Supported Providers

| Provider | Models | Setup |
| --- | --- | --- |
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4.1-mini, GPT-4.1-nano | OpenAI API key |
| Together.AI | Llama 4, Llama 3.3/3.1, Qwen 3/2.5, DeepSeek R1/V3, Mistral | Together.AI API key |
| Gemini | Gemini 2.5 Flash, Gemini 2.5 Pro | Google Cloud API key + project |
| Generic | Any model | Webhook URL (optional) |

Launching a Job

From the dataset detail page:
1. Train: Click “Train”.
2. Select provider and base model: Pick the training provider and the base model to fine-tune.
3. Enter your API key: Provide the API key for your training provider.
4. Configure parameters: Set epochs and other training parameters.
5. Launch: Click Launch.

The platform submits the job and polls for completion. Training metrics (loss, validation) are stored for review.
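The submit-then-poll pattern looks roughly like the sketch below. The status names, the `poll_until_done` helper, and the `get_status` callable are all assumptions for illustration; they do not reflect DecimalAI's internal job states or any provider's API.

```python
import time

# Assumed terminal status names; real providers use their own vocabulary.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_done(get_status, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal state or time runs out."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Simulated job that finishes on the third check; no real waiting happens here.
fake_statuses = iter(["queued", "running", "succeeded"])
final = poll_until_done(lambda: next(fake_statuses), interval_s=0, sleep=lambda s: None)
print(final)  # succeeded
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.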

Export

You can also export datasets for training elsewhere:
  • JSONL: Standard format for OpenAI fine-tuning
  • Parquet: Efficient columnar format for large datasets
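For reference, JSONL is simply one JSON object per line. The round trip below uses only the standard library to show what an exported file contains; `write_jsonl` and `read_jsonl` are illustrative helpers, not SDK functions.

```python
import json
import os
import tempfile


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


rows = [
    {"messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
    ]},
    {"messages": [
        {"role": "user", "content": "Where is my invoice?"},
        {"role": "assistant", "content": "See Billing > Invoices."},
    ]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "training_data.jsonl")
    write_jsonl(rows, path)
    loaded = read_jsonl(path)

print(len(loaded))  # 2
```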

Pull & Export

The fastest way to get training data onto disk:
import decimalai
decimalai.init()

# Pull the latest version
result = decimalai.pull_dataset("ds_abc123", "./training_data.jsonl")
print(f"Wrote {result['row_count']} rows to {result['file_path']}")

# Pull a specific version
result = decimalai.pull_dataset(
    "ds_abc123",
    "./data.jsonl",
    version="v2",
)

# Pull as Parquet
result = decimalai.pull_dataset(
    "ds_abc123",
    "./data.parquet",
)
The version parameter accepts:

| Value | Behavior |
| --- | --- |
| None or "latest" | Most recent version (default) |
| "v3" or "3" | Specific version by number |
| Full UUID | Exact version ID |
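The accepted forms can be told apart with a few checks. The sketch below shows one way to classify a version value on the client side; `parse_version` is a hypothetical helper and not how the SDK necessarily resolves versions.

```python
import re
import uuid


def parse_version(value):
    """Classify a version argument as ('latest', None), ('number', n), or ('id', uuid_str)."""
    if value is None or value == "latest":
        return ("latest", None)
    text = str(value).strip()
    match = re.fullmatch(r"v?(\d+)", text)
    if match:
        return ("number", int(match.group(1)))
    try:
        return ("id", str(uuid.UUID(text)))
    except ValueError:
        raise ValueError(f"unrecognized version: {value!r}") from None


print(parse_version(None))   # ('latest', None)
print(parse_version("v3"))   # ('number', 3)
print(parse_version("3"))    # ('number', 3)
print(parse_version("123e4567-e89b-12d3-a456-426614174000"))
```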

HuggingFace Hub Integration

Push datasets directly to HuggingFace Hub, making them instantly loadable by Axolotl, Unsloth, TRL, and any tool that supports load_dataset().

Push to Hub

import decimalai
decimalai.init()

result = decimalai.push_to_hub(
    "ds_abc123",
    "my-org/support-agent-sft",
)
print(f"Pushed to {result['repo_url']}")
Now the dataset is usable across the entire open-source training stack:
# Unsloth / TRL
from datasets import load_dataset
ds = load_dataset("my-org/support-agent-sft")

Load as HuggingFace Dataset (In-Memory)

Skip the file entirely — load a DecimalAI dataset directly as a datasets.Dataset object:
import decimalai
decimalai.init()

ds = decimalai.load_hf_dataset("ds_abc123")
# Dataset({features: ['messages'], num_rows: 500})

# Use directly with TRL
from trl import SFTTrainer
# `model` is a model (or model name) you supply; pass any further trainer arguments as needed
trainer = SFTTrainer(model=model, train_dataset=ds)
Requirements: pip install huggingface_hub datasets. These are optional dependencies — the core SDK works without them.

Next Steps

  • Training Pipeline tutorial. End-to-end: trace → evaluate → fine-tune.
  • Datasets API. REST reference for build, export, version comparison.
  • Skills & Data Pipeline. SFT vs DPO, repair vs replay.
  • Replay. Regenerate training data by replaying historical inputs.