End-to-End Training Pipeline

This tutorial walks through DecimalAI’s end-to-end training workflow: trace → evaluate → build dataset → fine-tune. By the end, you’ll have a fine-tuned model trained on your agent’s best production outputs. This isn’t a one-shot pipeline; it’s a flywheel. Each fine-tuned model you deploy produces new traces, which feed the next round of evaluation and training.

Prerequisites

DecimalAI SDK installed (pip install decimalai[evals])
An API key (DECIMAL_API_KEY)
An agent producing traces (see Quickstart)
An API key for your fine-tuning provider (OpenAI, Together.AI, or Gemini)

Instrument Your Agent

Make sure your agent is instrumented and sending traces:

import decimalai
decimalai.init(api_key="dai_sk_...", agent_name="support-agent")

from decimalai.langchain import install
install()

# Your agent runs as normal — traces are captured automatically

After running your agent on production traffic for a period, you’ll have traces in the dashboard.

Evaluate Traces

Attach evaluators to score output quality:

import decimalai
decimalai.init(api_key="dai_sk_...", agent_name="support-agent")

from decimalai.langchain import install
from decimalai.evals import Relevance, Toxicity, eval, TraceData

# Pre-built evaluators
evals = [Relevance(), Toxicity()]

# Custom evaluator
@eval(name="answered_question")
def check_answered(trace: TraceData) -> bool:
    return len(trace.output) > 50 and "?" not in trace.output[-20:]

# Register with auto-scoring
install(evals=[*evals, check_answered])

Traces are now scored automatically. Check the Evaluate page in the dashboard to see pass rates and trends.
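To see what the custom evaluator above accepts and rejects, the same heuristic can be run on plain strings outside the SDK. This is a minimal sketch: `looks_answered` is a hypothetical stand-in that takes the output string directly instead of a `TraceData`, and the sample outputs are invented for illustration.

```python
def looks_answered(output: str) -> bool:
    """Same heuristic as check_answered, applied to a raw output string.

    Hypothetical helper for local experimentation; not part of the SDK.
    """
    long_enough = len(output) > 50
    ends_with_question = "?" in output[-20:]
    return long_enough and not ends_with_question


# Invented sample outputs, one per case the heuristic distinguishes.
samples = {
    "detailed": "To reset your password, go to Settings > Security and click Reset.",
    "too_short": "Go to Settings.",
    "asks_back": "I can help with that, but first I need a few details. Which account is it?",
}

results = {name: looks_answered(text) for name, text in samples.items()}
print(results)
```

The heuristic is deliberately crude: a good answer that ends with a legitimate follow-up question is also rejected, so it is worth spot-checking failing traces before relying on it as a dataset filter.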

Review the Eval Dashboard

In the dashboard, navigate to Evaluate:

Pass Rate: What percentage of traces are passing your evaluators
Score Distribution: Histogram of scores across all traces
Evaluator Breakdown: Which evaluators catch the most failures

Use the verdict filter to isolate failing traces and understand what’s going wrong.
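The dashboard metrics above can be approximated from raw verdicts. The sketch below assumes a hypothetical export with one row per (trace, evaluator) pair and treats a trace as passing only when every evaluator passes; the dashboard's exact definition may differ.

```python
from collections import Counter

# Hypothetical eval results: one row per (trace, evaluator) pair.
# Field names are illustrative, not the platform's schema.
rows = [
    {"trace_id": "t1", "evaluator": "relevance", "passed": True},
    {"trace_id": "t1", "evaluator": "toxicity", "passed": True},
    {"trace_id": "t2", "evaluator": "relevance", "passed": False},
    {"trace_id": "t2", "evaluator": "toxicity", "passed": True},
    {"trace_id": "t3", "evaluator": "relevance", "passed": False},
    {"trace_id": "t3", "evaluator": "toxicity", "passed": False},
    {"trace_id": "t4", "evaluator": "relevance", "passed": True},
    {"trace_id": "t4", "evaluator": "toxicity", "passed": True},
]


def pass_rate(rows):
    """Fraction of traces for which every evaluator passed (an assumption)."""
    by_trace = {}
    for row in rows:
        previous = by_trace.get(row["trace_id"], True)
        by_trace[row["trace_id"]] = previous and row["passed"]
    return sum(by_trace.values()) / len(by_trace)


def failures_by_evaluator(rows):
    """Count failing verdicts per evaluator."""
    return Counter(row["evaluator"] for row in rows if not row["passed"])


rate = pass_rate(rows)
breakdown = failures_by_evaluator(rows)
print(f"pass rate: {rate:.0%}", dict(breakdown))
```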

Build a Dataset

Navigate to Datasets in the sidebar, then click “Build Dataset”:

Select agent: Choose support-agent
Filter by manifest: Use the latest manifest version (ensures current agent config)
Filter by eval verdict: Select only pass verdicts
Choose format: SFT (supervised fine-tuning)
Click Build
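The filters in these steps amount to a simple selection over traces. The following sketch shows that logic on hypothetical trace records; the field names (`agent`, `manifest_version`, `verdict`) are assumptions for illustration, not the platform's schema.

```python
# Hypothetical trace records; field names are illustrative only.
traces = [
    {"id": "t1", "agent": "support-agent", "manifest_version": 3, "verdict": "pass"},
    {"id": "t2", "agent": "support-agent", "manifest_version": 3, "verdict": "fail"},
    {"id": "t3", "agent": "support-agent", "manifest_version": 2, "verdict": "pass"},
    {"id": "t4", "agent": "billing-agent", "manifest_version": 5, "verdict": "pass"},
]


def select_for_sft(traces, agent):
    """Keep passing traces from the agent's latest manifest version."""
    own = [t for t in traces if t["agent"] == agent]
    if not own:
        return []
    latest = max(t["manifest_version"] for t in own)
    return [
        t for t in own
        if t["manifest_version"] == latest and t["verdict"] == "pass"
    ]


selected = select_for_sft(traces, "support-agent")
print([t["id"] for t in selected])
```

Filtering on the latest manifest version drops `t3` even though it passed, because it was produced under an older agent configuration.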

DecimalAI converts multi-turn agent traces into the chat completion format:

{
  "messages": [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": null, "tool_calls": [{"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."}
  ]
}

Each multi-turn conversation becomes one training example. Tool calls and results are preserved so the fine-tuned model learns when and how to use tools.
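Before training on exported data, it can help to sanity-check that each example follows this structure. The validator below is a minimal sketch using only the standard library; the specific rules (tool results must follow an assistant tool call, the last message should come from the assistant) are common SFT conventions assumed here, not checks documented by DecimalAI.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(example: dict) -> list:
    """Return a list of problems found in one training example (empty if none)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    pending_tool_calls = 0  # tool calls still waiting for a tool result
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
            continue
        if role == "assistant":
            calls = msg.get("tool_calls") or []
            if msg.get("content") is None and not calls:
                problems.append(f"message {i}: assistant has no content and no tool_calls")
            pending_tool_calls = len(calls)
        elif role == "tool":
            if pending_tool_calls == 0:
                problems.append(f"message {i}: tool result without a preceding tool call")
            else:
                pending_tool_calls -= 1
        else:  # system or user messages end any open tool-call sequence
            pending_tool_calls = 0

    if messages[-1].get("role") != "assistant":
        problems.append("last message should be from the assistant")
    return problems


# The example from above, expressed as a Python dict.
good = {
    "messages": [
        {"role": "system", "content": "You are a support agent..."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
        ]},
        {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
        {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
    ]
}
bad = {"messages": [{"role": "user", "content": "hi"}, {"role": "tool", "content": "{}"}]}

good_problems = validate_example(good)
bad_problems = validate_example(bad)
print(good_problems, bad_problems)
```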

Launch Fine-Tuning

From the dataset detail page, click “Train”:

Select provider: OpenAI, Together.AI, or Gemini
Enter credentials: API key for the training provider
Configure: Choose base model, training epochs
Launch

The platform submits the job and polls for completion. Training metrics (loss, validation) are stored for review.

Supported Providers:

Provider	Models
OpenAI	GPT-4o, GPT-4.1-mini, GPT-4.1-nano
Together.AI	Llama 4, Llama 3.3/3.1, Qwen 3, DeepSeek R1/V3, Mistral
Gemini	Gemini 2.5 Flash, Gemini 2.5 Pro
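The platform handles job polling for you, but the pattern is worth understanding if you ever script around it. The sketch below is a generic polling loop, not DecimalAI's implementation: `get_status` is a hypothetical callable you would supply, and the state names are assumptions.

```python
import time

# Assumed terminal states; real providers use their own names.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, interval_s=30.0, timeout_s=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Simulated job that finishes on the fourth status check.
statuses = iter(["queued", "running", "running", "succeeded"])
final = wait_for_job(lambda: next(statuses), interval_s=0, sleep=lambda _: None)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.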

Alternative: Pull Data for External Training

Prefer to train locally or with your own infrastructure? Pull the dataset via SDK or CLI:

import decimalai
decimalai.init()

# Pull to a local file
result = decimalai.pull_dataset("ds_abc123", "./training_data.jsonl", version="latest")
print(f"Wrote {result['row_count']} rows")

# CLI equivalent
decimalai datasets pull ds_abc123 -o ./training_data.jsonl
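Once the JSONL file is on disk, a common next step for external training is holding out a validation split. The sketch below uses only the standard library; `split_jsonl` is a hypothetical helper, and the demo writes a small invented file to a temporary directory so it runs on its own. In practice you would point it at `./training_data.jsonl`.

```python
import json
import random
import tempfile
from pathlib import Path


def split_jsonl(path, val_fraction=0.1, seed=0):
    """Read a JSONL file and return (train_rows, val_rows) after a seeded shuffle."""
    text = Path(path).read_text(encoding="utf-8")
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    random.Random(seed).shuffle(rows)
    # Hold out at least one row when there is more than one to split.
    n_val = max(1, int(len(rows) * val_fraction)) if len(rows) > 1 else 0
    return rows[n_val:], rows[:n_val]


# Demo on an invented 10-row file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "training_data.jsonl"
    lines = [
        json.dumps({"messages": [{"role": "user", "content": f"question {i}"}]})
        for i in range(10)
    ]
    sample.write_text("\n".join(lines) + "\n", encoding="utf-8")
    train_rows, val_rows = split_jsonl(sample, val_fraction=0.2)

print(len(train_rows), len(val_rows))
```

A fixed seed keeps the split reproducible across runs, which matters when comparing fine-tunes built from the same dataset version.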

Push to HuggingFace Hub for use with Axolotl, Unsloth, or TRL:

# Push to HF Hub — instantly usable by open-source trainers
decimalai.push_to_hub("ds_abc123", "my-org/support-agent-sft")

# CLI equivalent
decimalai datasets push-to-hub ds_abc123 my-org/support-agent-sft

Now use the data in any training framework:

Axolotl

datasets:
  - path: my-org/support-agent-sft
    type: chat_template

Unsloth

from datasets import load_dataset
ds = load_dataset("my-org/support-agent-sft")
# Use with FastLanguageModel...

TRL

from datasets import load_dataset
from trl import SFTTrainer

# Without split=..., load_dataset returns a DatasetDict; SFTTrainer needs a single split.
# Assumes the pushed dataset has a "train" split (the Hub default).
ds = load_dataset("my-org/support-agent-sft", split="train")
# `model` is your base model; add further trainer arguments as needed
trainer = SFTTrainer(model=model, train_dataset=ds)

In-Memory (No File)

# Skip the file entirely: load directly as an HF Dataset
ds = decimalai.load_hf_dataset("ds_abc123")
# → Dataset({features: ['messages'], num_rows: 500})

Deploy and Iterate

Update your agent to use the fine-tuned model. DecimalAI will:

Detect the manifest change (model changed) and register a new version
Generate a compatibility report for existing traces
Continue evaluating new traces from the fine-tuned model
Build the next dataset from improved outputs

This creates a continuous improvement loop: better model → better traces → better training data → even better model.
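One way to picture manifest change detection is as a fingerprint over the agent's configuration. The sketch below illustrates that idea only; it is not DecimalAI's implementation, and the config fields and fine-tuned model ID are hypothetical.

```python
import hashlib
import json


def manifest_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form of the config so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical agent configs, before and after swapping in a fine-tuned model.
old = {
    "model": "gpt-4.1-mini",
    "tools": ["search_docs"],
    "system_prompt": "You are a support agent...",
}
new = {**old, "model": "ft:gpt-4.1-mini:my-org::abc123"}  # hypothetical model ID
reordered = {
    "tools": ["search_docs"],
    "system_prompt": "You are a support agent...",
    "model": "gpt-4.1-mini",
}

changed = manifest_fingerprint(old) != manifest_fingerprint(new)
same = manifest_fingerprint(old) == manifest_fingerprint(reordered)
print(changed, same)
```

Swapping the model changes the fingerprint, while reordering keys does not, so only real configuration changes would register as a new version under this scheme.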

You’ve done it

Instrumented an agent and collected production traces

Scored traces with evaluators and a manifest compatibility check

Built a versioned SFT dataset filtered to keep + pass traces

Exported to HuggingFace and fine-tuned a model

Closed the loop — deployed the fine-tuned model, which produces traces for the next iteration

What Makes This Unique

Most platforms stop at evaluation. DecimalAI connects evaluation through to training:

Manifest compatibility ensures training data matches your current agent config
Eval scoring ensures only high-quality outputs enter training data
Automatic format conversion handles the complex multi-turn, tool-using conversation structure
HuggingFace Hub integration means one-click compatibility with every open-source trainer
The loop repeats — each fine-tuned model feeds the next iteration

Next Steps

Datasets Guide

Filter strategies, version pinning, and export formats in depth.

Replay Guide

Regenerate training data by replaying historical inputs against the new model.

Evaluations

Configure quality gates so only high-signal traces enter datasets.

Manifests

How compatibility is computed when you change the agent.
