Playground - DecimalAI

The Playground page lets you experiment with your agent’s prompts in a safe sandbox. Re-run real production traces with modified instructions, compare outputs side-by-side, and save successful edits as new skill versions.

Playground uses your own LLM API keys (BYOK — Bring Your Own Key). DecimalAI doesn’t subsidize LLM calls — you pay your provider directly. Configure keys in Settings → Credentials.

Playground vs. skillevaluation. The Playground is for interactive, single-trace exploration — re-run one trace, eyeball the side-by-side, iterate by hand. To measure whether a skill actually helps across a batch of cases (a with-skill vs. without-skill A/B that produces a SkillScore), use the skillevaluation benchmark instead — see Skills. Use the Playground to form a hypothesis; use a benchmark to prove it.

Getting Started

Navigate to the Playground page from the sidebar, or open it contextually:

From a trace: Click “Open in Playground” on any trace detail page
From a skill: Click “Test in Playground” on any skill detail page
Direct URL: /playground or /playground?skill=code-review

Three Modes

The Playground runs in one of three modes — pick the one that matches what you’re iterating on.

Import from Trace
Skill Testing
Scratch Pad

Re-run a production trace with modifications:

Select an agent from the dropdown
Select a trace — the system prompt and user message auto-populate
The original output appears on the right for comparison
Edit the system prompt or user message
Choose a model and temperature
Click Run (or press ⌘+Enter)
Compare the new output against the original side-by-side

This is the default mode — ideal for debugging unexpected outputs or testing prompt changes against real conversations.

Test skill body edits against production traces:

Navigate from a skill page via “Test in Playground”, or use /playground?skill=my-skill-name
The skill body editor shows the editable portion of the skill
A trace selector shows recent traces where this skill was activated
Edit the skill body (e.g., add instructions, refine examples)
Click Run — the platform reconstructs the full system prompt with your edited skill body swapped in
Review the side-by-side comparison
If satisfied, click “Save as New Version” — this creates a new SkillVersion with your edited body

The skill editor highlights only the skill’s body within the larger system prompt. Your edits are surgically inserted at the correct position, preserving all other prompt context.

Model Selection

The Playground page supports multiple LLM providers:

Provider	Models	API Key Env Var
OpenAI	gpt-4o, gpt-4o-mini, gpt-4-turbo, o1-preview, o3-mini	`OPENAI_API_KEY`
Google	gemini-3.5-flash, gemini-2.5-pro	`GEMINI_API_KEY`
Anthropic	claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5	`ANTHROPIC_API_KEY`

Select a provider and model from the dropdowns. If no API key is configured for the selected provider, you’ll see a friendly error with a link to Settings.

All three providers run directly in the Playground with your own keys (BYOK) — set each provider’s key in Settings. gemini-3.5-flash is the default model.

Temperature Control

Adjust the temperature slider (0.0–2.0) to control output randomness:

Temperature	Behavior
0.0	Deterministic — same output every time
0.3–0.5	Focused but varied — good for code and structured outputs
0.7	Default — balanced creativity and precision
1.0–2.0	More creative — good for brainstorming and open-ended tasks

Comparing Outputs

When importing from a trace or testing a skill, the page shows a side-by-side comparison:

Left Panel	Right Panel
Original output — what the agent produced in production	New output — what the agent produces with your modified prompt

This makes it easy to spot differences and judge whether your changes improved the output.

Saving Skill Changes

In Skill Testing mode, after running a modified skill body:

If the output improves, click “Save as New Version”
This creates a new SkillVersion with your edited body
The version is automatically tracked in the skill’s version history
All future activations of this skill use the updated body

Saving overwrites the skill’s current body. The previous version is preserved in the version history — you can always revert from the Skills page.

Keyboard Shortcuts

Shortcut	Action
`⌘+Enter` (Mac) / `Ctrl+Enter` (Win)	Run the prompt

Workflow Examples

Debugging a Bad Output

Find the trace

Find a trace with a poor output in the Traces page.

Open in Playground

Click “Open in Playground” on the trace detail page.

Refine the prompt

Edit the system prompt to add more specific instructions.

Iterate

Run → compare → iterate until the output improves.

Apply the fix

Apply the improved prompt to your agent’s configuration.

Hand-Tuning a Skill Body

Open the skill

Open a skill → “Test in Playground”.

Pick a few traces

Select 3–5 recent traces from the trace dropdown to sanity-check your edit against real conversations.

Edit and compare

For each trace: edit the skill body → run → compare side-by-side.

Save a new version

When satisfied, click “Save as New Version”.

Confirm it helps

Monitor the new version’s effectiveness on the Skills dashboard. To confirm the edit actually helps across a batch — not just on the handful you eyeballed — run a skillevaluation benchmark.

Testing Different Models

Set up the prompt

Import a trace or write a scratch prompt.

Run the first model

Run with gpt-4o → note the output.

Run a second model

Switch to claude-sonnet-4-6 → run again.

Choose the best fit

Compare outputs to choose the best model for your use case.

Next Steps

Tracing

Open any production trace in the playground to iterate on it.

Skills

Test skill changes in the playground before saving a new version.

Evaluations

Score playground runs with the same evaluators used in production.

​Getting Started

​Three Modes

​Model Selection

​Temperature Control

​Comparing Outputs

​Saving Skill Changes

​Keyboard Shortcuts

​Workflow Examples

​Debugging a Bad Output

​Hand-Tuning a Skill Body

​Testing Different Models

​Next Steps

Tracing

Skills

Evaluations

Getting Started

Three Modes

Model Selection

Temperature Control

Comparing Outputs

Saving Skill Changes

Keyboard Shortcuts

Workflow Examples

Debugging a Bad Output

Hand-Tuning a Skill Body

Testing Different Models

Next Steps