
This guide shows you how to:
  • Upload training (and optional validation) datasets
  • Create a fine-tuning job via Python or cURL
  • Monitor job status and events
  • Download the resulting model checkpoints
  • Understand the fine-tuning job API shape

Prerequisites

  1. Select a supported base model for fine-tuning.
  2. Create a dataset for training.
    Optionally, create a validation dataset as well.
    A typical split (see the sketch after this list) is:
    • 80–90% of examples → training
    • 10–20% of examples → validation
  3. Create an API key.
  4. Export the API key as an environment variable:
    export NEBIUS_API_KEY=<YOUR_API_KEY>
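
A common way to produce this split is to shuffle a single JSONL file and write two outputs. Below is a minimal sketch, assuming one example per line and a hypothetical source filename (dataset.jsonl):
# Sketch: split one JSONL file into 90% training / 10% validation
# "dataset.jsonl" is a hypothetical source filename; adjust to your data
import random

with open("dataset.jsonl") as f:
    lines = f.readlines()

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(lines)

split = int(0.9 * len(lines))
with open("training.jsonl", "w") as f:
    f.writelines(lines[:split])
with open("validation.jsonl", "w") as f:
    f.writelines(lines[split:])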
    

How to fine-tune a model

1. Install and import the client

  1. Install the openai Python SDK (Nebius exposes an OpenAI-compatible API):
    pip3 install --upgrade openai
    
  2. Import libraries:
    import os
    import time
    from openai import OpenAI
    
  3. Initialize the Nebius client:
    client = OpenAI(
        base_url="https://api.tokenfactory.nebius.com/v1/",
        api_key=os.environ["NEBIUS_API_KEY"],
    )
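
  4. Optionally, run a quick sanity check. Assuming the endpoint exposes the standard OpenAI-style model listing (an assumption, not required for fine-tuning), an authenticated call should succeed:
    # Optional sanity check: should print model IDs rather than raise an auth error
    print([m.id for m in client.models.list().data][:5])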
    

2. Upload training (and optional validation) datasets

If you already uploaded datasets via the UI or API, you can skip this and reuse their IDs.
# Upload a training dataset
training_file = client.files.create(
    file=open("training.jsonl", "rb"),
    purpose="fine-tune",
)

print("Training file ID:", training_file.id)

# Optional: upload a validation dataset
validation_file = client.files.create(
    file=open("validation.jsonl", "rb"),
    purpose="fine-tune",
)

print("Validation file ID:", validation_file.id)
You only need the id fields from these responses to create a fine-tuning job.
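If the files were uploaded earlier, a minimal sketch for recovering their IDs (using the SDK's standard files listing) looks like:
# Sketch: list previously uploaded files and reuse their IDs
for f in client.files.list().data:
    if f.purpose == "fine-tune":
        print(f.id, f.filename)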

3. Configure fine-tuning parameters

For a full list of allowed fields and defaults, see API specification for a fine-tuning job.
job_request = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "training_file": training_file.id,
    # Optional: only include this if you actually uploaded a validation file
    "validation_file": validation_file.id,
    "suffix": "my-domain-adapter",  # Optional, helps you identify this run
    "hyperparameters": {
        "batch_size": 8,
        "learning_rate": 1e-5,
        "n_epochs": 3,
        "warmup_ratio": 0.0,
        "weight_decay": 0.0,
        "lora": True,
        "lora_r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "packing": True,
        "max_grad_norm": 1.0,
        # context_length is in tokens (e.g. default: 8192)
        "context_length": 8192,
    },
    "seed": 42,
    "integrations": [
        {
            "type": "wandb",
            "wandb": {
                "project": "my-finetunes",
                "name": "llama-8b-customer-support",
                "entity": "my-team",
                "tags": ["finetune", "llama-3.1", "support-bot"],
            },
        },
        {
            "type": "hf",
            "hf": {
                "output_repo_name": "<repo>",  # e.g. "org/llama-8b-support-ft"
                "api_token": "<token-value>",  # HF PAT with write access
            },
        },
    ],
}
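Only model and training_file are required; if you are happy with the defaults, a minimal request can be as small as:
# Minimal sketch: only the required fields, everything else falls back to defaults
minimal_job_request = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "training_file": training_file.id,
}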

4. Create and run the fine-tuning job

job = client.fine_tuning.jobs.create(**job_request)
print("Created job:", job.id, "status:", job.status)

5. Poll job status

Fine-tuning takes time. Poll the job until it reaches a terminal status:
TERMINAL_STATUSES = ["succeeded", "failed", "cancelled"]
POLL_INTERVAL_SECONDS = 15

while job.status not in TERMINAL_STATUSES:
    time.sleep(POLL_INTERVAL_SECONDS)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print("Current status:", job.status)

print("Final status:", job.status)
print("Job ID:", job.id)

if job.status == "failed":
    print("Job failed with error:", job.error)
  • If job.status == "succeeded", training finished successfully.
  • If job.status == "failed", inspect job.error for code, message, and param. For transient 5xx errors, you can safely retry.
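As a rough sketch of such a retry (assuming the error code surfaces an HTTP-style 5xx value, which may differ in your runs):
# Sketch: re-submit the same request once after a server-side failure
# Treating codes that start with "5" as transient is an assumption
if job.status == "failed" and job.error and str(job.error.code).startswith("5"):
    job = client.fine_tuning.jobs.create(**job_request)
    print("Retried as job:", job.id)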

6. Review job events

Events help you understand the lifecycle (file validation, dataset processing, training progress).
if job.status == "succeeded":
    events = client.fine_tuning.jobs.list_events(job.id)
    for event in events.data:
        print(event.created_at, event.level, "-", event.message)
You can consider training finished when you see messages like:
  • Dataset processed successfully
  • Training completed successfully
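You do not have to wait for a terminal status to read events. A small sketch that tails new events from inside the polling loop of step 5 (tracking IDs already printed):
# Sketch: print only events that have not been seen yet
seen_event_ids = set()

def print_new_events(job_id):
    for event in client.fine_tuning.jobs.list_events(job_id).data:
        if event.id not in seen_event_ids:
            seen_event_ids.add(event.id)
            print(event.created_at, event.level, "-", event.message)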

7. Download checkpoints and model files

Each checkpoint represents the model after a certain number of training steps (often per epoch).
if job.status == "succeeded":
    checkpoints = client.fine_tuning.jobs.checkpoints.list(job.id).data

    for checkpoint in checkpoints:
        print("Checkpoint:", checkpoint.id, "step:", checkpoint.step_number)
        os.makedirs(checkpoint.id, exist_ok=True)

        for file_id in checkpoint.result_files:
            # Get file metadata
            file_obj = client.files.retrieve(file_id)
            filename = file_obj.filename  # e.g. "<checkpoint_ID>/adapter_config.json"

            # Download file contents
            file_content = client.files.content(file_id)

            # Save to disk with the same filename
            output_path = os.path.join(checkpoint.id, os.path.basename(filename))
            file_content.write_to_file(output_path)
            print("Saved:", output_path)
You now have:
  • Intermediate checkpoints (per step / epoch)
  • Final checkpoint (usually the last one in the list)
Use the latest checkpoint for deployment unless you have a specific reason to pick an earlier one. You can now deploy your fine-tuned model and serve it via Nebius Token Factory.
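For example, to pick the checkpoint with the highest step number from the list above (assuming the job succeeded and checkpoints is non-empty):
# Sketch: select the most recent checkpoint for deployment
latest = max(checkpoints, key=lambda c: c.step_number)
print("Deploy checkpoint:", latest.id, "from step", latest.step_number)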

API specification for a fine-tuning job

This section describes the request payload when creating a fine-tuning job.
{
  "model": "<string>",
  "suffix": "<string>",
  "training_file": "<file_ID>",
  "validation_file": "<file_ID>",
  "hyperparameters": {
    "batch_size": 8,
    "learning_rate": 0.00001,
    "n_epochs": 3,
    "warmup_ratio": 0,
    "weight_decay": 0,
    "lora": false,
    "lora_r": 8,
    "lora_alpha": 8,
    "lora_dropout": 0,
    "packing": true,
    "max_grad_norm": 1,
    "context_length": 8192
  },
  "seed": 42,
   "integrations": [
    {
      "type": "wandb",
      "wandb": {
        "project": "<string>",
        "name": "<string>",
        "entity": "<string>",
        "tags": ["<string>"]
      }
    },
    {
      "type": "hf",
      "hf": {
        "output_repo_name": "<string>", 
        "api_token": "<string>"
      }
    }
  ]
}

Top-level fields

  • model (string, required) Base model to fine-tune.
  • suffix (string, optional) Human-readable suffix appended to the model name. Use this to distinguish multiple runs, e.g., customer-support-v1.
  • training_file (string, required) ID of the file with the training dataset (purpose = "fine-tune").
  • validation_file (string, optional) ID of the file with the validation dataset. Same format and requirements as the training dataset.
  • hyperparameters (object, optional) Fine-tuning configuration. Omitted fields fall back to defaults.
  • seed (integer, optional) Random seed used during training. Using the same seed and the same data/hyperparameters improves reproducibility between runs.
  • integrations (array, optional) Third-party integrations configured for this job. Each entry contains:
    • type (string, required) Currently supported: "wandb" and "hf".
    • wandb (object, required when type = "wandb") Settings for exporting metrics to Weights & Biases:
      • project (string, required): W&B project name.
      • name (string, optional): Run name.
      • entity (string, optional): W&B entity (user or team).
      • tags (array of strings, optional): Tags to attach to the run.
    • hf (object, required when type = "hf") Settings for the Hugging Face integration:
      • output_repo_name (string, required): Target Hugging Face repo name, e.g. "org/llama-8b-support-ft" or "username/my-finetune".
      • api_token (string, required): Hugging Face access token (PAT) with write access to output_repo_name.

Hyperparameters

All hyperparameters are nested under hyperparameters.
  • batch_size (integer, optional) Number of examples per training batch. Larger batch sizes are more efficient but require more VRAM.
    • Typical range: 8–32
    • Default: 8
  • context_length (integer, optional) Maximum sequence length in tokens used during fine-tuning. Inputs longer than this limit will cause errors.
    • Units: tokens (e.g., 8192)
    • Supported values depend on the base model; see the models page.
    • Default: 8192
    We recommend (see the token-length sketch after this list):
    • Analyze the token length distribution of your dataset.
    • Choose the smallest context length that covers your P95–P99 examples.
    • If packing = false, choosing a context length much larger than your examples leads to heavy padding and wasted compute.
    Larger context lengths significantly increase VRAM usage and FLOPs due to attention scaling.
  • learning_rate (float, optional) Step size for gradient descent.
    • Must be >= 0
    • Typical values: 1e-6–5e-5
    • Default: 0.00001
  • n_epochs (integer, optional) Number of passes over the entire dataset.
    • Range: 1–20
    • Default: 3
    More epochs increase task specialization but also overfitting risk.
  • warmup_ratio (float, optional) Fraction of total training steps used for linear warmup of the learning rate from 0 to the target value.
    • Range: 0–1
    • Default: 0
  • weight_decay (float, optional) L2 regularization factor applied to weights. Helps prevent overfitting and preserve generalization.
    • Must be >= 0
    • Default: 0
  • lora (boolean, optional) Whether to use LoRA (Low-Rank Adaptation) instead of full-parameter fine-tuning.
    • true: only LoRA adapter weights are trained; base model weights stay frozen.
    • false: full fine-tuning is applied.
    • Default: false
  • lora_r (integer, optional) Rank of LoRA matrices. Higher values increase capacity but also overfitting and cost.
    • Range: 8–128
    • Default: 8
  • lora_alpha (integer, optional) Scaling factor for LoRA updates. Higher values increase the impact of LoRA adapters.
    • Must be >= 8
    • Default: 8
  • lora_dropout (float, optional) Dropout applied to LoRA layers. Helps prevent overfitting, especially on small datasets.
    • Range: 0–1
    • Default: 0
  • packing (boolean, optional) If true, multiple shorter samples can be packed into a single sequence to better utilize the context window and improve efficiency.
    • Default: true
  • max_grad_norm (float, optional) Gradient clipping threshold (L2 norm). Avoids unstable updates:
    • Too high → effectively no clipping → risk of exploding gradients.
    • Too low → overly aggressive clipping → risk of under-training.
    • Must be >= 0
    • Default: 1
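The token-length analysis recommended for context_length can be done locally before submitting the job. A minimal sketch, assuming a chat-formatted JSONL dataset and that you can load the base model's tokenizer with the transformers library (not part of this API):
# Sketch: measure token lengths to pick a context_length that covers P95-P99
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lengths = []
with open("training.jsonl") as f:
    for line in f:
        example = json.loads(line)
        tokens = tokenizer.apply_chat_template(example["messages"], tokenize=True)
        lengths.append(len(tokens))

lengths.sort()
p95 = lengths[int(0.95 * (len(lengths) - 1))]
p99 = lengths[int(0.99 * (len(lengths) - 1))]
print("P95:", p95, "P99:", p99, "max:", lengths[-1])
# Choose the smallest supported context_length that covers the P95-P99 examples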

Fine-tuning job object (response shape)

When you query a job or list jobs, you get objects shaped like this:
{
  "data": [
    {
      "id": "<string>",
      "created_at": 123,
      "hyperparameters": {
        "batch_size": 8,
        "learning_rate": 0.00001,
        "n_epochs": 3,
        "warmup_ratio": 0,
        "weight_decay": 0,
        "lora": false,
        "lora_r": 8,
        "lora_alpha": 8,
        "lora_dropout": 0,
        "packing": true,
        "max_grad_norm": 1,
        "context_length": 8192
      },
      "model": "<string>",
      "status": "validating_files",
      "training_file": "<string>",
      "error": {
        "code": "<string>",
        "message": "<string>",
        "param": "<string>"
      },
      "finished_at": 123,
     "integrations": [
        {
          "wandb": {
            "project": "<string>",
            "name": "<string>",
            "entity": "<string>",
            "tags": ["<string>"]
          },
          "type": "wandb"
        },
        {
          "hf": {
            "output_repo_name": "<string>",
            "api_token": "<string>"
          },
          "type": "hf"
        }
      ],
      "object": "fine_tuning.job",
      "organization_id": "",
      "result_files": [],
      "seed": 0,
      "suffix": "<string>",
      "trained_tokens": 123,
      "validation_file": "<string>",
      "estimated_finish": 123,
      "trained_steps": 123,
      "total_steps": 123
    }
  ],
  "has_more": true,
  "object": "list"
}
Key fields to watch during a run:
  • status: validating_files → queued → running → succeeded / failed
  • trained_tokens: how many tokens have been processed so far
  • trained_steps / total_steps: progress of the training loop
  • error: structured error info when status = "failed"
  • result_files: IDs of produced artifacts (also available via checkpoints API)
Use these fields plus job events to drive your own monitoring, dashboards, or CI/CD automation around fine-tuning.
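For example, a simple progress line can be derived from the step counters (a sketch that assumes total_steps is populated once training starts):
# Sketch: report training progress from the job object's step counters
job = client.fine_tuning.jobs.retrieve(job.id)
if job.total_steps:
    pct = 100 * job.trained_steps / job.total_steps
    print(f"Progress: {job.trained_steps}/{job.total_steps} steps ({pct:.1f}%)")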