pull_dataset: writes a local file (JSONL or Parquet). Works with any tool.
push_to_hub: pushes to HuggingFace Hub. Instantly loadable by Axolotl, Unsloth, TRL, etc.
load_hf_dataset: returns a datasets.Dataset directly, no file needed.
decimalai.pull_dataset()
Download a versioned dataset to a local file.
The dataset ID.
Local file path. Format is inferred from the extension (.jsonl or .parquet).
Version specifier: None/"latest", "v3"/"3", or a full UUID.
Override format: "jsonl" or "parquet". Defaults to auto-detect from file extension.
{"row_count": 500, "file_path": "./data.jsonl", "bytes_written": 12345, "format": "jsonl"}
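The format rule described above (an explicit override wins, otherwise the file extension decides) can be sketched in a few lines. This is only an illustration of the documented behavior, not DecimalAI's implementation; the helper name resolve_format and its parameter names are hypothetical.

```python
import os

# The two formats named in this reference.
SUPPORTED_FORMATS = {"jsonl", "parquet"}

def resolve_format(file_path, fmt=None):
    """Hypothetical helper: explicit override first, else the file extension."""
    if fmt is not None:
        chosen = fmt.lower()
    else:
        # os.path.splitext("./data.jsonl") -> ("./data", ".jsonl")
        chosen = os.path.splitext(file_path)[1].lstrip(".").lower()
    if chosen not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {chosen!r} for {file_path!r}")
    return chosen

print(resolve_format("./data.jsonl"))             # jsonl
print(resolve_format("./export.bin", "parquet"))  # parquet
```

Rejecting unknown extensions up front is a design choice of this sketch; the source does not say what the library does in that case.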
decimalai.push_to_hub()
Push a dataset to HuggingFace Hub. Makes the dataset instantly loadable by Axolotl, Unsloth, TRL, and any tool supporting load_dataset().
The DecimalAI dataset ID.
HuggingFace repo in "org/dataset-name" format.
Version specifier: None/"latest", "v3"/"3", or UUID.
HuggingFace API token. Falls back to HF_TOKEN env var or cached login.
Create a private repo.
Dataset split name.
{"repo_url": "...", "repo_id": "...", "row_count": 500, "version_id": "...", "split": "train"}
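Two of the inputs above follow simple rules that are easy to check before pushing: the repo must look like "org/dataset-name", and the token falls back to the HF_TOKEN environment variable. The sketch below illustrates both under stated assumptions; split_repo_id and resolve_token are hypothetical helpers, not part of decimalai or huggingface_hub, and returning None here simply stands in for "rely on the cached login".

```python
import os

def split_repo_id(repo_id):
    """Hypothetical check: split "org/dataset-name" into (org, name)."""
    parts = repo_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f'expected "org/dataset-name", got {repo_id!r}')
    return parts[0], parts[1]

def resolve_token(token=None, env=None):
    """Explicit token wins, then HF_TOKEN; None means no token was found."""
    env = os.environ if env is None else env
    return token or env.get("HF_TOKEN") or None

print(split_repo_id("org/dataset-name"))               # ('org', 'dataset-name')
print(resolve_token(None, {"HF_TOKEN": "hf_example"}))  # hf_example
```

Passing env explicitly keeps the fallback testable without touching the real environment.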
Requires pip install huggingface_hub datasets. These are optional dependencies.
decimalai.load_hf_dataset()
Load a dataset directly as a HuggingFace Dataset object — no intermediate file needed.
The DecimalAI dataset ID.
Version specifier.
datasets.Dataset object.
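The version specifier forms listed in this reference (None or "latest", "v3" or "3", and a UUID) can be told apart with a small classifier. The sketch below is an assumption about how such strings might be interpreted, not a description of how DecimalAI resolves them; parse_version is a hypothetical name.

```python
import uuid

def parse_version(spec=None):
    """Classify a specifier as ("latest", None), ("number", n) or ("uuid", s)."""
    if spec is None or spec == "latest":
        return ("latest", None)
    text = str(spec)
    # Accept both "v3" and "3" by dropping one leading "v".
    digits = text[1:] if text[:1] in ("v", "V") else text
    if digits.isdecimal():
        return ("number", int(digits))
    try:
        # uuid.UUID raises ValueError on malformed input.
        return ("uuid", str(uuid.UUID(text)))
    except ValueError:
        raise ValueError(f"unrecognised version specifier: {spec!r}") from None

print(parse_version())      # ('latest', None)
print(parse_version("v3"))  # ('number', 3)
```

Treating "v3" and "3" as the same version number follows the way the reference pairs them; anything else that is not a valid UUID is rejected in this sketch.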
What’s next
Datasets guide
Verdict filtering, versioning, and how to build training-ready splits.
Training tutorial
End-to-end SFT recipe using a DecimalAI dataset.