Datasets & Training

DecimalAI builds training datasets from your production traces — filtered by agent version, eval scores, and compatibility verdicts — and launches fine-tuning jobs directly from the platform.

Building Datasets

From the Dashboard

Open Build Dataset

Navigate to Datasets → Build Dataset.

Select an agent

Choose which agent’s traces to draw from.

Filter by manifest version

Pin the dataset to a single manifest version so that every trace reflects the same agent configuration.

Filter by eval verdict

Selecting `pass` only is recommended, so that the dataset contains only traces that passed evaluation.

Choose a format

SFT (supervised fine-tuning) or DPO (preference pairs).

Build

Click Build.

Filtering

Filter	Purpose
Agent	Which agent’s traces to include
Manifest version	Only traces from a specific config version
Eval verdict	Only traces that passed quality checks
Compatibility	Only `keep` or `repair` traces (exclude stale data)

Compatibility verdicts tell you what to do with each trace for training:

keep: use as-is
repair: patch a stale field, then use
replay: re-run the input to regenerate the output
drop: too stale to use

These verdicts are orthogonal to a trace’s HIGH/MEDIUM/LOW IMPACT severity.
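To make the filter semantics concrete, here is a minimal sketch of the same selection logic applied to in-memory records. The record layout (`manifest_version`, `eval_verdict`, `compatibility`) and the helper `select_trainable` are assumptions made for illustration; they are not the platform's actual schema or API.

```python
# Hypothetical trace records; field names are illustrative assumptions.
TRAINABLE_VERDICTS = {"keep", "repair"}


def select_trainable(traces, manifest_version=None, require_pass=True):
    """Return the traces that satisfy all active filters."""
    selected = []
    for trace in traces:
        # Manifest filter: only applied when a version is pinned.
        if manifest_version is not None and trace["manifest_version"] != manifest_version:
            continue
        # Eval filter: keep only traces that passed quality checks.
        if require_pass and trace["eval_verdict"] != "pass":
            continue
        # Compatibility filter: exclude replay/drop traces (stale data).
        if trace["compatibility"] not in TRAINABLE_VERDICTS:
            continue
        selected.append(trace)
    return selected


traces = [
    {"id": "t1", "manifest_version": "v2", "eval_verdict": "pass", "compatibility": "keep"},
    {"id": "t2", "manifest_version": "v2", "eval_verdict": "fail", "compatibility": "keep"},
    {"id": "t3", "manifest_version": "v2", "eval_verdict": "pass", "compatibility": "drop"},
    {"id": "t4", "manifest_version": "v1", "eval_verdict": "pass", "compatibility": "repair"},
]

print([t["id"] for t in select_trainable(traces, manifest_version="v2")])  # ['t1']
```

The filters combine with AND: a trace must satisfy every active filter to be included.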

SFT Format

DecimalAI converts multi-turn agent traces into the chat completion format expected by fine-tuning APIs, including the tool calls and tool results produced by tool-using agents:

{
  "messages": [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
    ]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."}
  ]
}

Why This Matters

A ReAct agent calls the LLM multiple times per user request. On each call, the LLM sees all prior messages and generates only the next assistant turn. Naive SFT (single input → output) doesn’t capture this multi-turn structure. DecimalAI’s format preserves:

System prompts — the instructions the model should follow
Tool calls — when and how the model should use tools
Tool results — what the model learns from tool output
Multi-turn reasoning — the full chain of thought
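The multi-call structure described above can be sketched as follows: each assistant turn in a conversation corresponds to one LLM call, whose input is every message that came before it. The helper `expand_to_llm_calls` is hypothetical and only illustrates the structure; it is not how the platform builds rows, and many fine-tuning APIs accept the whole conversation and handle assistant turns themselves.

```python
def expand_to_llm_calls(messages):
    """Return one (context, target) pair per assistant turn.

    context: every message the model saw before that turn
    target:  the assistant message it produced next
    """
    pairs = []
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            pairs.append((messages[:index], message))
    return pairs


conversation = [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
    ]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
]

for context, target in expand_to_llm_calls(conversation):
    kind = "tool call" if target.get("tool_calls") else "final answer"
    print(f"{len(context)} context messages -> {kind}")
# 2 context messages -> tool call
# 4 context messages -> final answer
```

A single input → output pair would keep only the final answer and lose the first call, where the model decides to use `search_docs`.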

Multi-Agent Traces

For multi-agent architectures (supervisor + workers), DecimalAI can build separate datasets per agent role, ensuring each sub-agent trains on its own traces.
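As a rough illustration of per-role separation, the sketch below groups trace records by the role of the agent that produced them. The `agent_role` field and the `split_by_agent_role` helper are assumptions for this example, not part of the documented schema or SDK.

```python
from collections import defaultdict


def split_by_agent_role(traces):
    """Group traces so each agent role gets its own training set."""
    by_role = defaultdict(list)
    for trace in traces:
        by_role[trace["agent_role"]].append(trace)
    return dict(by_role)


# Hypothetical traces from a supervisor + workers system.
traces = [
    {"id": "t1", "agent_role": "supervisor"},
    {"id": "t2", "agent_role": "billing_worker"},
    {"id": "t3", "agent_role": "supervisor"},
    {"id": "t4", "agent_role": "search_worker"},
]

for role, group in split_by_agent_role(traces).items():
    print(role, [t["id"] for t in group])
```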

DPO Format

DPO (Direct Preference Optimization) pairs are generated from replay results:

{
  "prompt": "How do I reset my password?",
  "chosen": "To reset your password, go to Settings > Security...",
  "rejected": "I'm not sure, maybe check the FAQ?"
}

The “chosen” response comes from the current agent (v2), and the “rejected” from the older agent (v1) or a failed trace.
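A minimal sketch of how such pairs could be assembled from replay results, assuming each replay record carries the original input, the older agent's output, and the current agent's output. The field names (`input`, `old_output`, `new_output`) and the helper `build_dpo_pairs` are hypothetical.

```python
def build_dpo_pairs(replays):
    """Turn replay records into prompt/chosen/rejected rows."""
    pairs = []
    for replay in replays:
        chosen = replay["new_output"]    # current agent (v2)
        rejected = replay["old_output"]  # older agent (v1) or failed trace
        if chosen == rejected:
            # Identical responses carry no preference signal; skip them.
            continue
        pairs.append({
            "prompt": replay["input"],
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs


replays = [
    {
        "input": "How do I reset my password?",
        "old_output": "I'm not sure, maybe check the FAQ?",
        "new_output": "To reset your password, go to Settings > Security...",
    },
    {
        "input": "What are your support hours?",
        "old_output": "We are available 24/7.",
        "new_output": "We are available 24/7.",
    },
]

print(len(build_dpo_pairs(replays)))  # 1
```

Dropping identical pairs is a design choice of this sketch, made because a pair with equal responses gives the optimizer nothing to prefer.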

Dataset Versioning

Each dataset supports multiple versions:

Adding traces creates a new version
Version comparison shows added/removed/unchanged rows
Quality review workflow: pending → approved or rejected
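One way to picture version comparison is as a set difference over row fingerprints. The sketch below hashes each row's canonical JSON and counts added, removed, and unchanged rows. This is an assumption about how such a comparison can work, not a description of the platform's implementation; `row_key` and `compare_versions` are hypothetical names.

```python
import hashlib
import json


def row_key(row):
    """Fingerprint a row by hashing its canonical JSON form."""
    canonical = json.dumps(row, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_versions(old_rows, new_rows):
    """Count rows added, removed, and unchanged between two versions."""
    old_keys = {row_key(row) for row in old_rows}
    new_keys = {row_key(row) for row in new_rows}
    return {
        "added": len(new_keys - old_keys),
        "removed": len(old_keys - new_keys),
        "unchanged": len(old_keys & new_keys),
    }


row_a = {"messages": [{"role": "user", "content": "A"}]}
row_b = {"messages": [{"role": "user", "content": "B"}]}
row_c = {"messages": [{"role": "user", "content": "C"}]}

print(compare_versions([row_a, row_b], [row_b, row_c]))
# {'added': 1, 'removed': 1, 'unchanged': 1}
```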

Row Preview

View dataset contents inline with expandable row detail:

Role-colored messages (system, user, assistant, tool)
Tool call arguments and results
Raw JSON toggle
Quality stats: score distribution, message length, split breakdown

Fine-Tuning

Supported Providers

Provider	Models	Setup
OpenAI	GPT-4o, GPT-4o-mini, GPT-4.1-mini, GPT-4.1-nano	OpenAI API key
Together.AI	Llama 4, Llama 3.3/3.1, Qwen 3/2.5, DeepSeek R1/V3, Mistral	Together.AI API key
Gemini	Gemini 2.5 Flash, Gemini 2.5 Pro	Google Cloud API key + project
Generic	Any model	Webhook URL (optional)

Launching a Job

From the dataset detail page:

Train

Click Train.

Select provider and base model

Pick the training provider and the base model to fine-tune.

Enter your API key

Provide your training provider API key.

Configure parameters

Set epochs and other training parameters.

Launch

Click Launch.

The platform submits the job and polls for completion. Training metrics (loss, validation) are stored for review.
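The submit-then-poll pattern can be sketched generically as below. This is not the platform's code: `wait_for_job`, the status strings, and the `get_status` callable are all assumptions. The sleep function is injectable so the demo runs without waiting.

```python
import time

# Assumed terminal states; real providers use their own status names.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, poll_interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state or time runs out."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


# Demo with a fake job that finishes on the fourth poll.
statuses = iter(["queued", "running", "running", "succeeded"])
final = wait_for_job(lambda: next(statuses), poll_interval=5.0, sleep=lambda seconds: None)
print(final)  # succeeded
```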

Export

You can also export datasets for training elsewhere:

JSONL: Standard format for OpenAI fine-tuning
Parquet: Efficient columnar format for large datasets
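For reference, JSONL is simply one JSON object per line. The following sketch writes and reads such a file with the standard library only. The helper names `write_jsonl` and `read_jsonl` are illustrative and not part of the SDK.

```python
import json
import os
import tempfile


def write_jsonl(rows, path):
    """Write one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


rows = [
    {"messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
    ]},
]

path = os.path.join(tempfile.mkdtemp(), "training_data.jsonl")
write_jsonl(rows, path)
print(read_jsonl(path) == rows)  # True
```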

Pull & Export

The fastest way to get training data onto disk:

import decimalai
decimalai.init()

# Pull the latest version
result = decimalai.pull_dataset("ds_abc123", "./training_data.jsonl")
print(f"Wrote {result['row_count']} rows to {result['file_path']}")

# Pull a specific version
result = decimalai.pull_dataset(
    "ds_abc123",
    "./data.jsonl",
    version="v2",
)

# Pull as Parquet (note the .parquet output path)
result = decimalai.pull_dataset(
    "ds_abc123",
    "./data.parquet",
)

The version parameter accepts:

Value	Behavior
`None` or `"latest"`	Most recent version (default)
`"v3"` or `"3"`	Specific version by number
Full UUID	Exact version ID
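To illustrate the accepted forms, the sketch below classifies a version argument into one of the three cases in the table. It is only a reading of the table, not the SDK's actual parsing code; `normalize_version` and its return shape are hypothetical.

```python
import re
import uuid


def normalize_version(version):
    """Classify a version argument as latest, a version number, or an exact ID."""
    if version is None or version == "latest":
        return ("latest", None)
    match = re.fullmatch(r"v?(\d+)", version)
    if match:
        # "v3" and "3" both refer to version number 3.
        return ("number", int(match.group(1)))
    try:
        return ("id", str(uuid.UUID(version)))
    except ValueError:
        raise ValueError(f"unrecognized version: {version!r}") from None


print(normalize_version(None))   # ('latest', None)
print(normalize_version("v3"))   # ('number', 3)
print(normalize_version("3"))    # ('number', 3)
print(normalize_version("123e4567-e89b-12d3-a456-426614174000"))
# ('id', '123e4567-e89b-12d3-a456-426614174000')
```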

HuggingFace Hub Integration

Push datasets directly to HuggingFace Hub, making them instantly loadable by Axolotl, Unsloth, TRL, and any tool that supports load_dataset().

Push to Hub

import decimalai
decimalai.init()

result = decimalai.push_to_hub(
    "ds_abc123",
    "my-org/support-agent-sft",
)
print(f"Pushed to {result['repo_url']}")

Once pushed, the dataset can be loaded by any tool built on the Hugging Face datasets library:

# Unsloth / TRL
from datasets import load_dataset
ds = load_dataset("my-org/support-agent-sft")

Load as HuggingFace Dataset (In-Memory)

Skip the file entirely — load a DecimalAI dataset directly as a datasets.Dataset object:

import decimalai
decimalai.init()

ds = decimalai.load_hf_dataset("ds_abc123")
# Dataset({features: ['messages'], num_rows: 500})

# Use directly with TRL
from trl import SFTTrainer
trainer = SFTTrainer(model=model, train_dataset=ds, ...)

Requirements: pip install huggingface_hub datasets. These are optional dependencies — the core SDK works without them.

Next Steps

Training Pipeline tutorial

End-to-end: trace → evaluate → fine-tune.

Datasets API

REST reference for build, export, version comparison.

Skills & Data Pipeline

SFT vs DPO, repair vs replay.

Replay

Regenerate training data by replaying historical inputs.