
This guide shows you how to:
  • Upload training (and optional validation) datasets
  • Create a fine-tuning job via Python or cURL
  • Monitor job status and events
  • Download the resulting model checkpoints
  • Understand the fine-tuning job API shape

Prerequisites

  1. Select a supported base model for fine-tuning.
  2. Create a dataset for training.
    Optionally, create a validation dataset as well.
    A typical split (see the sketch after this list) is:
    • 80–90% of examples → training
    • 10–20% of examples → validation
  3. Create an API key.
  4. Export the API key as an environment variable:
    export NEBIUS_API_KEY=<YOUR_API_KEY>
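
A common way to produce this split is to shuffle a single JSONL file and write two outputs. Below is a minimal sketch, assuming one example per line and a hypothetical source filename (dataset.jsonl):
# Sketch: split one JSONL file into 90% training / 10% validation
# "dataset.jsonl" is a hypothetical source filename; adjust to your data
import random

with open("dataset.jsonl") as f:
    lines = f.readlines()

random.seed(42)  # fixed seed so the split is reproducible
random.shuffle(lines)

split = int(0.9 * len(lines))
with open("training.jsonl", "w") as f:
    f.writelines(lines[:split])
with open("validation.jsonl", "w") as f:
    f.writelines(lines[split:])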
    

How to fine-tune a model

1. Install and import the client

  1. Install the openai Python SDK (Nebius exposes an OpenAI-compatible API):
    pip3 install --upgrade openai
    
  2. Import libraries:
    import os
    import time
    from openai import OpenAI
    
  3. Initialize the Nebius client:
    client = OpenAI(
        base_url="https://api.tokenfactory.nebius.com/v1/",
        api_key=os.environ["NEBIUS_API_KEY"],
    )
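
  4. Optionally, run a quick sanity check. Assuming the endpoint exposes the standard OpenAI-style model listing (an assumption, not required for fine-tuning), an authenticated call should succeed:
    # Optional sanity check: should print model IDs rather than raise an auth error
    print([m.id for m in client.models.list().data][:5])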
    

2. Upload training (and optional validation) datasets

If you already uploaded datasets via the UI or API, you can skip this and reuse their IDs.
# Upload a training dataset
training_file = client.files.create(
    file=open("training.jsonl", "rb"),
    purpose="fine-tune",
)

print("Training file ID:", training_file.id)

# Optional: upload a validation dataset
validation_file = client.files.create(
    file=open("validation.jsonl", "rb"),
    purpose="fine-tune",
)

print("Validation file ID:", validation_file.id)
You only need the id fields from these responses to create a fine-tuning job.
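If the files were uploaded earlier, a minimal sketch for recovering their IDs (using the SDK's standard files listing) looks like:
# Sketch: list previously uploaded files and reuse their IDs
for f in client.files.list().data:
    if f.purpose == "fine-tune":
        print(f.id, f.filename)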

3. Configure fine-tuning parameters

For a full list of allowed fields and defaults, see API specification for a fine-tuning job.
job_request = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "training_file": training_file.id,
    # Optional: only include this if you actually uploaded a validation file
    "validation_file": validation_file.id,
    "suffix": "my-domain-adapter",  # Optional, helps you identify this run
    "hyperparameters": {
        "batch_size": 8,
        "learning_rate": 1e-5,
        "n_epochs": 3,
        "warmup_ratio": 0.0,
        "weight_decay": 0.0,
        "lora": True,
        "lora_r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "packing": True,
        "max_grad_norm": 1.0,
        # context_length is in tokens (e.g. default: 8192)
        "context_length": 8192,
    },
    "seed": 42,
    "integrations": [
        {
            "type": "wandb",
            "wandb": {
                "project": "my-finetunes",
                "name": "llama-8b-customer-support",
                "entity": "my-team",
                "tags": ["finetune", "llama-3.1", "support-bot"],
            },
        },
        {
            "type": "hf",
            "hf": {
                "output_repo_name": "<repo>",  # e.g. "org/llama-8b-support-ft"
                "api_token": "<token-value>",  # HF PAT with write access
            },
        },
    ],
}
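Only model and training_file are required; if you are happy with the defaults, a minimal request can be as small as:
# Minimal sketch: only the required fields, everything else falls back to defaults
minimal_job_request = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "training_file": training_file.id,
}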

4. Create and run the fine-tuning job

job = client.fine_tuning.jobs.create(**job_request)
print("Created job:", job.id, "status:", job.status)

5. Poll job status

Fine-tuning takes time. Poll the job until it reaches a terminal status:
TERMINAL_STATUSES = ["succeeded", "failed", "cancelled"]
POLL_INTERVAL_SECONDS = 15

while job.status not in TERMINAL_STATUSES:
    time.sleep(POLL_INTERVAL_SECONDS)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print("Current status:", job.status)

print("Final status:", job.status)
print("Job ID:", job.id)

if job.status == "failed":
    print("Job failed with error:", job.error)
  • If job.status == "succeeded", training finished successfully.
  • If job.status == "failed", inspect job.error for code, message, and param. For transient 5xx errors, you can safely retry.
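As a rough sketch of such a retry (assuming the error code surfaces an HTTP-style 5xx value, which may differ in your runs):
# Sketch: re-submit the same request once after a server-side failure
# Treating codes that start with "5" as transient is an assumption
if job.status == "failed" and job.error and str(job.error.code).startswith("5"):
    job = client.fine_tuning.jobs.create(**job_request)
    print("Retried as job:", job.id)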

6. Review job events

Events help you understand the lifecycle (file validation, dataset processing, training progress).
if job.status == "succeeded":
    events = client.fine_tuning.jobs.list_events(job.id)
    for event in events.data:
        print(event.created_at, event.level, "-", event.message)
You can consider training finished when you see messages like:
  • Dataset processed successfully
  • Training completed successfully
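You do not have to wait for a terminal status to read events. A small sketch that tails new events from inside the polling loop of step 5 (tracking IDs already printed):
# Sketch: print only events that have not been seen yet
seen_event_ids = set()

def print_new_events(job_id):
    for event in client.fine_tuning.jobs.list_events(job_id).data:
        if event.id not in seen_event_ids:
            seen_event_ids.add(event.id)
            print(event.created_at, event.level, "-", event.message)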

7. Download checkpoints and model files

Each checkpoint represents the model after a certain number of training steps (often per epoch).
if job.status == "succeeded":
    checkpoints = client.fine_tuning.jobs.checkpoints.list(job.id).data

    for checkpoint in checkpoints:
        print("Checkpoint:", checkpoint.id, "step:", checkpoint.step_number)
        os.makedirs(checkpoint.id, exist_ok=True)

        for file_id in checkpoint.result_files:
            # Get file metadata
            file_obj = client.files.retrieve(file_id)
            filename = file_obj.filename  # e.g. "<checkpoint_ID>/adapter_config.json"

            # Download file contents
            file_content = client.files.content(file_id)

            # Save to disk with the same filename
            output_path = os.path.join(checkpoint.id, os.path.basename(filename))
            file_content.write_to_file(output_path)
            print("Saved:", output_path)
You now have:
  • Intermediate checkpoints (per step / epoch)
  • Final checkpoint (usually the last one in the list)
Use the latest checkpoint for deployment unless you have a specific reason to pick an earlier one. You can now deploy your fine-tuned model and serve it via Nebius Token Factory.
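For example, to pick the checkpoint with the highest step number from the list above (assuming the job succeeded and checkpoints is non-empty):
# Sketch: select the most recent checkpoint for deployment
latest = max(checkpoints, key=lambda c: c.step_number)
print("Deploy checkpoint:", latest.id, "from step", latest.step_number)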

API specification for a fine-tuning job

This section describes the request payload when creating a fine-tuning job.
{
  "model": "<string>",
  "suffix": "<string>",
  "training_file": "<file_ID>",
  "validation_file": "<file_ID>",
  "hyperparameters": {
    "batch_size": 8,
    "learning_rate": 0.00001,
    "n_epochs": 3,
    "warmup_ratio": 0,
    "weight_decay": 0,
    "lora": false,
    "lora_r": 8,
    "lora_alpha": 8,
    "lora_dropout": 0,
    "packing": true,
    "max_grad_norm": 1,
    "context_length": 8192
  },
  "seed": 42,
   "integrations": [
    {
      "type": "wandb",
      "wandb": {
        "project": "<string>",
        "name": "<string>",
        "entity": "<string>",
        "tags": ["<string>"]
      }
    },
    {
      "type": "hf",
      "hf": {
        "output_repo_name": "<string>", 
        "api_token": "<string>"
      }
    }
  ]
}

Top-level fields

  • model (string, required) Base model to fine-tune.
  • suffix (string, optional) Human-readable suffix appended to the model name. Use this to distinguish multiple runs, e.g., customer-support-v1.
  • training_file (string, required) ID of the file with the training dataset (purpose = "fine-tune").
  • validation_file (string, optional) ID of the file with the validation dataset. Same format and requirements as the training dataset.
  • hyperparameters (object, optional) Fine-tuning configuration. Omitted fields fall back to defaults.
  • seed (integer, optional) Random seed used during training. Using the same seed and the same data/hyperparameters improves reproducibility between runs.
  • integrations (array, optional) Third-party integrations configured for this job. Each entry contains:
    • type (string, required) Currently supported: "wandb" and "hf".
    • wandb (object, required when type = "wandb") Settings for exporting metrics to Weights & Biases:
      • project (string, required): W&B project name.
      • name (string, optional): Run name.
      • entity (string, optional): W&B entity (user or team).
      • tags (array of strings, optional): Tags to attach to the run.
    • hf (object, required when type = "hf") Settings for the Hugging Face integration:
      • output_repo_name (string, required): Target Hugging Face repo name, e.g. "org/llama-8b-support-ft" or "username/my-finetune".
      • api_token (string, required): Hugging Face access token (PAT) with write access to output_repo_name.

Hyperparameters

All hyperparameters are nested under hyperparameters.
  • batch_size (integer, optional) Number of examples per training batch. Larger batch sizes are more efficient but require more VRAM.
    • Typical range: 8–32
    • Default: 8
  • context_length (integer, optional) Maximum sequence length in tokens used during fine-tuning. Inputs longer than this limit will cause errors.
    • Units: tokens (e.g., 8192)
    • Supported values depend on the base model; see the models page.
    • Default: 8192
    We recommend (see the token-length sketch after this list):
    • Analyze the token length distribution of your dataset.
    • Choose the smallest context length that covers your P95–P99 examples.
    • If packing = false, choosing a context length much larger than your examples leads to heavy padding and wasted compute.
    Larger context lengths significantly increase VRAM usage and FLOPs due to attention scaling.
  • learning_rate (float, optional) Step size for gradient descent.
    • Must be >= 0
    • Typical values: 1e-6–5e-5
    • Default: 0.00001
  • n_epochs (integer, optional) Number of passes over the entire dataset.
    • Range: 1–20
    • Default: 3
    More epochs increase task specialization but also overfitting risk.
  • warmup_ratio (float, optional) Fraction of total training steps used for linear warmup of the learning rate from 0 to the target value.
    • Range: 0–1
    • Default: 0
  • weight_decay (float, optional) L2 regularization factor applied to weights. Helps prevent overfitting and preserve generalization.
    • Must be >= 0
    • Default: 0
  • lora (boolean, optional) Whether to use LoRA (Low-Rank Adaptation) instead of full-parameter fine-tuning.
    • true: only LoRA adapter weights are trained; base model weights stay frozen.
    • false: full fine-tuning is applied.
    • Default: false
  • lora_r (integer, optional) Rank of LoRA matrices. Higher values increase capacity but also overfitting and cost.
    • Range: 8–128
    • Default: 8
  • lora_alpha (integer, optional) Scaling factor for LoRA updates. Higher values increase the impact of LoRA adapters.
    • Must be >= 8
    • Default: 8
  • lora_dropout (float, optional) Dropout applied to LoRA layers. Helps prevent overfitting, especially on small datasets.
    • Range: 0–1
    • Default: 0
  • packing (boolean, optional) If true, multiple shorter samples can be packed into a single sequence to better utilize the context window and improve efficiency.
    • Default: true
  • max_grad_norm (float, optional) Gradient clipping threshold (L2 norm). Avoids unstable updates:
    • Too high → effectively no clipping → risk of exploding gradients.
    • Too low → overly aggressive clipping → risk of under-training.
    • Must be >= 0
    • Default: 1
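The token-length analysis recommended for context_length can be done locally before submitting the job. A minimal sketch, assuming a chat-formatted JSONL dataset and that you can load the base model's tokenizer with the transformers library (not part of this API):
# Sketch: measure token lengths to pick a context_length that covers P95-P99
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lengths = []
with open("training.jsonl") as f:
    for line in f:
        example = json.loads(line)
        tokens = tokenizer.apply_chat_template(example["messages"], tokenize=True)
        lengths.append(len(tokens))

lengths.sort()
p95 = lengths[int(0.95 * (len(lengths) - 1))]
p99 = lengths[int(0.99 * (len(lengths) - 1))]
print("P95:", p95, "P99:", p99, "max:", lengths[-1])
# Choose the smallest supported context_length that covers the P95-P99 examples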

Fine-tuning job object (response shape)

When you query a job or list jobs, you get objects shaped like this:
{
  "data": [
    {
      "id": "<string>",
      "created_at": 123,
      "hyperparameters": {
        "batch_size": 8,
        "learning_rate": 0.00001,
        "n_epochs": 3,
        "warmup_ratio": 0,
        "weight_decay": 0,
        "lora": false,
        "lora_r": 8,
        "lora_alpha": 8,
        "lora_dropout": 0,
        "packing": true,
        "max_grad_norm": 1,
        "context_length": 8192
      },
      "model": "<string>",
      "status": "validating_files",
      "training_file": "<string>",
      "error": {
        "code": "<string>",
        "message": "<string>",
        "param": "<string>"
      },
      "finished_at": 123,
     "integrations": [
        {
          "wandb": {
            "project": "<string>",
            "name": "<string>",
            "entity": "<string>",
            "tags": ["<string>"]
          },
          "type": "wandb"
        },
        {
          "hf": {
            "output_repo_name": "<string>",
            "api_token": "<string>"
          },
          "type": "hf"
        }
      ],
      "object": "fine_tuning.job",
      "organization_id": "",
      "result_files": [],
      "seed": 0,
      "suffix": "<string>",
      "trained_tokens": 123,
      "validation_file": "<string>",
      "estimated_finish": 123,
      "trained_steps": 123,
      "total_steps": 123
    }
  ],
  "has_more": true,
  "object": "list"
}
Key fields to watch during a run:
  • status: validating_files → queued → running → succeeded / failed
  • trained_tokens: how many tokens have been processed so far
  • trained_steps / total_steps: progress of the training loop
  • error: structured error info when status = "failed"
  • result_files: IDs of produced artifacts (also available via checkpoints API)
Use these fields plus job events to drive your own monitoring, dashboards, or CI/CD automation around fine-tuning.
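For example, a simple progress line can be derived from the step counters (a sketch that assumes total_steps is populated once training starts):
# Sketch: report training progress from the job object's step counters
job = client.fine_tuning.jobs.retrieve(job.id)
if job.total_steps:
    pct = 100 * job.trained_steps / job.total_steps
    print(f"Progress: {job.trained_steps}/{job.total_steps} steps ({pct:.1f}%)")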