This tutorial walks through DecimalAI’s end-to-end training workflow: trace → evaluate → build dataset → fine-tune. By the end, you’ll have a fine-tuned model trained on your agent’s best production outputs. This isn’t a one-shot pipeline; it’s a flywheel. Each fine-tuned model you deploy produces new traces, which feed the next round of evaluation and training.

Prerequisites

  • DecimalAI SDK installed (pip install decimalai[evals])
  • An API key (DECIMAL_API_KEY)
  • An agent producing traces (see Quickstart)
  • An API key for your fine-tuning provider (OpenAI, Together.AI, or Gemini)

1

Instrument Your Agent

Make sure your agent is instrumented and sending traces:
import decimalai
decimalai.init(api_key="dai_sk_...", agent_name="support-agent")

from decimalai.langchain import install
install()

# Your agent runs as normal — traces are captured automatically
After running your agent on production traffic for a period, you’ll have traces in the dashboard.
2

Evaluate Traces

Attach evaluators to score output quality:
import decimalai
decimalai.init(api_key="dai_sk_...", agent_name="support-agent")

from decimalai.langchain import install
from decimalai.evals import Relevance, Toxicity, eval, TraceData

# Pre-built evaluators
evals = [Relevance(), Toxicity()]

# Custom evaluator
@eval(name="answered_question")
def check_answered(trace: TraceData) -> bool:
    return len(trace.output) > 50 and "?" not in trace.output[-20:]

# Register with auto-scoring
install(evals=[*evals, check_answered])
Traces are now scored automatically. Check the Evaluate page in the dashboard to see pass rates and trends.
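Before registering a custom evaluator, it helps to try its logic on a few sample outputs. The sketch below reuses the heuristic from check_answered without the SDK. FakeTrace is a hypothetical stand-in for TraceData, and it assumes trace.output is a plain string.

```python
from dataclasses import dataclass


@dataclass
class FakeTrace:
    """Hypothetical stand-in for TraceData; only the `output` field is modeled."""
    output: str


def check_answered(trace: FakeTrace) -> bool:
    # Same heuristic as the tutorial: the reply is long enough and
    # does not end by asking the user a question.
    return len(trace.output) > 50 and "?" not in trace.output[-20:]


good = FakeTrace("To reset your password, go to Settings > Security and click Reset.")
short = FakeTrace("Try Settings.")
asks_back = FakeTrace("I can help with that, but first, which account are you trying to access?")

results = [check_answered(t) for t in (good, short, asks_back)]
print(results)
```

Only the first sample passes: the second is too short, and the third ends with a question. A length-and-punctuation check like this is a rough proxy, so expect to refine it once you see which traces it misjudges.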
3

Review the Eval Dashboard

In the dashboard, navigate to Evaluate:
  • Pass Rate: What percentage of traces are passing your evaluators
  • Score Distribution: Histogram of scores across all traces
  • Evaluator Breakdown: Which evaluators catch the most failures
Use the verdict filter to isolate failing traces and understand what’s going wrong.
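To make the dashboard metrics concrete, here is a minimal sketch of how a pass rate and an evaluator breakdown could be computed from raw results. The record layout is hypothetical, and the rule that a trace passes only when every evaluator passes is an assumption; the dashboard may define it differently.

```python
from collections import Counter

# Hypothetical eval results: (trace_id, evaluator_name, passed)
records = [
    ("t1", "relevance", True),
    ("t1", "toxicity", True),
    ("t2", "relevance", False),
    ("t2", "toxicity", True),
    ("t3", "relevance", True),
    ("t3", "answered_question", False),
]


def pass_rate(records):
    """Fraction of traces for which every evaluator passed (assumed rule)."""
    by_trace = {}
    for trace_id, _, passed in records:
        by_trace[trace_id] = by_trace.get(trace_id, True) and passed
    if not by_trace:
        return 0.0
    return sum(by_trace.values()) / len(by_trace)


def failures_by_evaluator(records):
    """Count failed checks per evaluator name."""
    return Counter(name for _, name, passed in records if not passed)


print(f"pass rate: {pass_rate(records):.0%}")
print(failures_by_evaluator(records).most_common())
```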
4

Build a Dataset

Navigate to Datasets in the sidebar, then click “Build Dataset”:
  1. Select agent: Choose support-agent
  2. Filter by manifest: Use the latest manifest version (ensures current agent config)
  3. Filter by eval verdict: Select only pass verdicts
  4. Choose format: SFT (supervised fine-tuning)
  5. Click Build
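The two filters in the steps above amount to a simple selection rule. The sketch below illustrates it on in-memory records; the field names (manifest_version, verdict) are hypothetical and are not DecimalAI’s schema.

```python
# Hypothetical trace records; field names are illustrative only.
traces = [
    {"id": "t1", "manifest_version": 2, "verdict": "pass"},
    {"id": "t2", "manifest_version": 3, "verdict": "pass"},
    {"id": "t3", "manifest_version": 3, "verdict": "fail"},
    {"id": "t4", "manifest_version": 3, "verdict": "pass"},
]


def select_for_sft(traces):
    """Keep passing traces recorded under the newest manifest version."""
    if not traces:
        return []
    latest = max(t["manifest_version"] for t in traces)
    return [
        t for t in traces
        if t["manifest_version"] == latest and t["verdict"] == "pass"
    ]


selected_ids = [t["id"] for t in select_for_sft(traces)]
print(selected_ids)
```

Here t1 is dropped because it comes from an older manifest version, and t3 is dropped because it failed evaluation.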
DecimalAI converts multi-turn agent traces into the chat completion format:
{
  "messages": [
    {"role": "system", "content": "You are a support agent..."},
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": null, "tool_calls": [{"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}]},
    {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
    {"role": "assistant", "content": "To reset your password, go to Settings > Security..."}
  ]
}
Each multi-turn conversation becomes one training example. Tool calls and results are preserved so the fine-tuned model learns when and how to use tools.
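If you plan to inspect or post-process examples yourself, a small structural check can catch malformed rows early. This is a sketch under assumptions: the rules below (known roles, tool results preceded by a tool call, final message from the assistant) are inferred from the example above, not taken from a DecimalAI specification.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_example(example: dict) -> list:
    """Return a list of problems found in one chat-format training example."""
    messages = example.get("messages", [])
    if not messages:
        return ["no messages"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if role == "tool":
            # A tool result should follow a tool call or another tool result.
            prev = messages[i - 1] if i > 0 else {}
            if not prev.get("tool_calls") and prev.get("role") != "tool":
                problems.append(f"message {i}: tool result without a preceding tool call")
        if role == "assistant" and msg.get("content") is None and not msg.get("tool_calls"):
            problems.append(f"message {i}: assistant message has neither content nor tool_calls")
    if messages[-1].get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


example = {
    "messages": [
        {"role": "system", "content": "You are a support agent..."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"function": {"name": "search_docs", "arguments": "{\"query\": \"password reset\"}"}}
        ]},
        {"role": "tool", "content": "{\"results\": [\"Go to Settings > Security...\"]}"},
        {"role": "assistant", "content": "To reset your password, go to Settings > Security..."},
    ]
}
print(validate_example(example))
```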
5

Launch Fine-Tuning

From the dataset detail page, click “Train”:
  1. Select provider: OpenAI, Together.AI, or Gemini
  2. Enter credentials: API key for the training provider
  3. Configure: Choose base model, training epochs
  4. Launch
The platform submits the job and polls for completion. Training metrics (loss, validation) are stored for review.

Supported Providers:

| Provider | Models |
| --- | --- |
| OpenAI | GPT-4o, GPT-4.1-mini, GPT-4.1-nano |
| Together.AI | Llama 4, Llama 3.3/3.1, Qwen 3, DeepSeek R1/V3, Mistral |
| Gemini | Gemini 2.5 Flash, Gemini 2.5 Pro |
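The platform handles polling for you, but if you script around training jobs it can be useful to see what a poll-until-done loop looks like. The sketch below is generic: get_status is a hypothetical callable you would supply, and the status names are assumptions rather than values returned by DecimalAI or any provider.

```python
import time
from typing import Callable

# Assumed terminal states; real providers use their own status names.
TERMINAL = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 5.0,
    max_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until a terminal state, doubling the delay each time."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)


# Simulated job: three non-terminal polls, then success. No real waiting
# happens because `sleep` is replaced by a function that records the delay.
statuses = iter(["queued", "running", "running", "succeeded"])
delays = []
final = wait_for_job(lambda: next(statuses), sleep=delays.append)
print(final, delays)
```

Backing off between polls keeps request volume low for long-running jobs, and the injectable sleep makes the loop easy to test.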
6

Alternative: Pull Data for External Training

Prefer to train locally or with your own infrastructure? Pull the dataset via SDK or CLI:
import decimalai
decimalai.init()

# Pull to a local file
result = decimalai.pull_dataset("ds_abc123", "./training_data.jsonl", version="latest")
print(f"Wrote {result['row_count']} rows")
# CLI equivalent
decimalai datasets pull ds_abc123 -o ./training_data.jsonl
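Once the file is on disk, a quick sanity check before training is worthwhile. The sketch below uses only the standard library and assumes each line of the pulled file is one JSON object with a "messages" list, as in the SFT example from step 4; summarize_jsonl is a hypothetical helper, not part of the SDK.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count examples and messages per role in a chat-format JSONL file."""
    rows = 0
    roles = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            example = json.loads(line)
            rows += 1
            roles.update(m["role"] for m in example["messages"])
    return rows, roles
```

After the pull, calling summarize_jsonl("./training_data.jsonl") returns the number of examples and a per-role message count. You can compare the first value with the row_count reported by pull_dataset.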
Push to HuggingFace Hub for use with Axolotl, Unsloth, or TRL:
# Push to HF Hub — instantly usable by open-source trainers
decimalai.push_to_hub("ds_abc123", "my-org/support-agent-sft")
# CLI equivalent
decimalai datasets push-to-hub ds_abc123 my-org/support-agent-sft
Now use the data in your training framework. For example, in an Axolotl config:
datasets:
  - path: my-org/support-agent-sft
    type: chat_template
7

Deploy and Iterate

Update your agent to use the fine-tuned model. DecimalAI will:
  1. Detect the manifest change (model changed) and register a new version
  2. Generate a compatibility report for existing traces
  3. Continue evaluating new traces from the fine-tuned model
  4. Build the next dataset from improved outputs
This creates a continuous improvement loop: better model → better traces → better training data → even better model.
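To build intuition for the manifest-change detection in item 1, here is one way a configuration change could be detected: hash a canonical form of the agent config and compare fingerprints. This is an illustrative sketch only. How DecimalAI computes manifests is not described here, and the config fields and model names below are hypothetical.

```python
import hashlib
import json


def manifest_fingerprint(config: dict) -> str:
    """Stable hash of an agent config; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_fields(old: dict, new: dict) -> list:
    """Top-level keys whose values differ between two configs."""
    keys = sorted(set(old) | set(new))
    return [k for k in keys if old.get(k) != new.get(k)]


# Hypothetical configs: only the model differs after fine-tuning.
base = {
    "model": "base-model",
    "tools": ["search_docs"],
    "system_prompt": "You are a support agent...",
}
tuned = {**base, "model": "my-fine-tuned-model"}

if manifest_fingerprint(base) != manifest_fingerprint(tuned):
    print("manifest changed:", changed_fields(base, tuned))
```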

You’ve done it

Instrumented an agent and collected production traces
Scored traces with evaluators and a manifest compatibility check
Built a versioned SFT dataset filtered to keep + pass traces
Exported to HuggingFace and fine-tuned a model
Closed the loop — deployed the fine-tuned model, which produces traces for the next iteration

What Makes This Unique

Most platforms stop at evaluation. DecimalAI connects evaluation to training:
  • Manifest compatibility ensures training data matches your current agent config
  • Eval scoring ensures only high-quality outputs enter training data
  • Automatic format conversion handles the complex multi-turn, tool-using conversation structure
  • HuggingFace Hub integration makes datasets directly usable by open-source trainers such as Axolotl, Unsloth, and TRL
  • The loop repeats — each fine-tuned model feeds the next iteration

Next Steps

Datasets Guide

Filter strategies, version pinning, and export formats in depth.

Replay Guide

Regenerate training data by replaying historical inputs against the new model.

Evaluations

Configure quality gates so only high-signal traces enter datasets.

Manifests

How compatibility is computed when you change the agent.