You can fine-tune a generic model to adapt it to domain-specific tasks. Fine-tuning in Nebius Token Factory can also save costs: training on a dataset is cheaper than repeatedly sending numerous example prompts.
Choose one of the models supported for fine-tuning.
Create a dataset for training. Optionally, create an additional dataset for validation. Split the data between the two datasets: 80–90% for training and 10–20% for validation. Requirements for validation datasets are the same as for training datasets.
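As an illustration, the split can be scripted. The chat-message record shape below is only a sketch; follow the dataset requirements for the exact schema:

```python
import json
import random

# Hypothetical records; check the dataset requirements for the exact schema
samples = [
    {"messages": [{"role": "user", "content": f"Question {i}"},
                  {"role": "assistant", "content": f"Answer {i}"}]}
    for i in range(100)
]

random.seed(0)
random.shuffle(samples)
split = int(len(samples) * 0.9)  # 90% for training, 10% for validation

with open("training.jsonl", "w") as f:
    for record in samples[:split]:
        f.write(json.dumps(record) + "\n")

with open("validation.jsonl", "w") as f:
    for record in samples[split:]:
        f.write(json.dumps(record) + "\n")
```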
Upload a training and a validation dataset. The validation dataset is optional.
```python
# Upload a training dataset
training_dataset = client.files.create(
    file=open("<dataset_name>.jsonl", "rb"),  # Specify the dataset name
    purpose="fine-tune"
)

# Upload a validation dataset
validation_dataset = client.files.create(
    file=open("<dataset_name>.jsonl", "rb"),  # Specify the dataset name
    purpose="fine-tune"
)
```
Configure the fine-tuning parameters. For more information about the tuning job parameters, see the specification of the fine-tuning job object.
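For example, a job request might look like the following sketch. The field names follow the OpenAI-compatible convention, and all placeholders and values are illustrative, not defaults; check the fine-tuning job object specification for the exact schema:

```python
# Illustrative sketch: placeholders and values are examples only
job_request = {
    "model": "<model_name>",
    "training_file": "<training_file_id>",      # ID returned when uploading the dataset
    "validation_file": "<validation_file_id>",  # optional
    "hyperparameters": {
        "n_epochs": 3,
        "lora": True,
    },
    "suffix": "my-experiment",
}
```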
```python
# Create and run the fine-tuning job
job = client.fine_tuning.jobs.create(**job_request)
```
Check the job status.
```python
import time

# Check the job status
active_statuses = ["validating_files", "queued", "running"]
while job.status in active_statuses:
    time.sleep(15)
    job = client.fine_tuning.jobs.retrieve(job.id)
    print("current status is", job.status)
print("Job ID:", job.id)
```
The status of a freshly started job is running. The script polls the status periodically until it changes to succeeded. The minimum time window between subsequent polls is 15 seconds.

If the status is failed, examine the output: it describes the error and how to fix it. If the error code is 500, resubmit the job.

Check that the training has been successful. To do this, check the job events; they are created when the job status changes. You can consider the training finished if the response contains either the Dataset processed successfully or Training completed successfully message.
Retrieve the contents of the files with the fine-tuned model.
```python
import os

if job.status == "succeeded":
    # Check the job events
    events = client.fine_tuning.jobs.list_events(job.id)
    print(events)

    for checkpoint in client.fine_tuning.jobs.checkpoints.list(job.id).data:
        print("Checkpoint ID:", checkpoint.id)
        # Create a directory for every checkpoint
        os.makedirs(checkpoint.id, exist_ok=True)
        for model_file_id in checkpoint.result_files:
            # Get the name of a model file
            filename = client.files.retrieve(model_file_id).filename
            # Retrieve the contents of the file
            file_content = client.files.content(model_file_id)
            # Save the contents into a file inside the checkpoint directory
            file_content.write_to_file(os.path.join(checkpoint.id, filename))
```
You get the files for every fine-tuning checkpoint. A checkpoint is created after every epoch of training a model, so you get intermediate results of the training. If you need final results, use the files from the last checkpoint.

Save the contents to files. The script creates a directory per checkpoint and saves the files into these directories.
Save the file ID; it is required to create a fine-tuning job.
Optionally, to upload a validation dataset, use the same request as for the training dataset and save the file ID from the response.
Create a fine-tuning job by using the Nebius Token Factory API:
For more information about the fine-tuning job parameters, see the API specification of the fine-tuning job object below.
Make sure that the job status is succeeded. To do this, request information about this job:
Specify the job ID in the endpoint.

The status of a freshly started job is running. Poll the status periodically to make sure that it has changed to succeeded. Do not send requests more often than once every 15 seconds.

If the status is failed, examine the output: it describes the error and how to fix it. If the error code is 500, resubmit the job.
limit (integer, optional): Number of events to return.
after (string, optional): Pagination ID. Points to the event from which the response should continue.
You can consider the training as finished if the response contains either the Dataset processed successfully or Training completed successfully message.
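As a sketch, this check can be expressed as a small helper. The success messages come from above; the function and the plain-string event shape are assumptions for illustration:

```python
SUCCESS_MESSAGES = (
    "Dataset processed successfully",
    "Training completed successfully",
)

def training_finished(event_messages):
    """Return True if any event message signals successful training."""
    return any(
        success in message
        for message in event_messages
        for success in SUCCESS_MESSAGES
    )

print(training_finished(["Job queued", "Training completed successfully"]))  # True
```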
To get the files with the fine-tuned model, first get a list of checkpoints. This list contains the IDs of the required files.

A checkpoint is created after every epoch of training a model, so you can get intermediate results of the training. If you need final results, use the files from the last checkpoint.

To get the checkpoint list, send the following request:
Use the filename field to save the files properly. For example, if file A has adapter_config.json as its filename, save that file's contents as the adapter_config.json file.
Retrieve the contents of the files with the fine-tuned model:
Send this request for every file of the checkpoint that you need.
Copy the file contents from the response, and then save the files by using a proper name and extension (see the filename field in the output of the previous request).
Now, you can use these files to host the fine-tuned model and work with it.
suffix (string, optional): Suffix added to the model name (for example, my-model or my-experiment). It helps you differentiate between fine-tuned models in their list.
training_file (string, required): ID of the file with the training dataset. For more information about how to prepare and upload datasets and how to get their IDs, see the following instructions:
batch_size (integer, optional): Number of training examples used in a batch for fine-tuning. A bigger batch size works better with bigger datasets. From 8 to 32. Default: 8.
learning_rate (float, optional): Learning rate for training. If you train a model in a domain in which the model has not been trained before, you may need a higher learning rate. Greater than or equal to 0. Default: 0.00001.
n_epochs (integer, optional): Number of epochs to train on the dataset. An epoch is a cycle of going through the whole dataset for training. For example, if the number of epochs is 10, the model is trained on a given dataset 10 times. From 1 to 20. Default: 3.
warmup_ratio (float, optional): Fraction of training during which the learning rate ramps up from the beginning of training. From 0 to 1. Default: 0.
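Conceptually, a linear warmup ramps the learning rate from zero to its peak over the first warmup_ratio share of training steps. The sketch below illustrates the idea; it is not the service's exact schedule:

```python
def lr_at_step(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup: ramp the learning rate from 0 to peak_lr
    over the first warmup_ratio share of training steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr
```

With warmup_ratio=0.1 and 100 total steps, the learning rate reaches its peak at step 10 and stays there for the rest of training.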
weight_decay (float, optional): Weight decay value. Weight decay is a regularization technique that adds a penalty to the loss function and keeps fine-tuning weights small. This approach prevents overfitting and preserves generalization, so it is better suited for larger models or more complex tasks. Greater than or equal to 0. Default: 0.
lora (boolean, optional): Whether to enable LoRA (Low-Rank Adaptation) for training. The LoRA method inserts low-rank matrices into a pre-trained model; these matrices capture task-specific data during training. As a result, you only train these matrices; you do not need to retrain the whole model or modify any preset fine-tuning parameters. If false, full fine-tuning is performed. Default: false.
lora_r (integer, optional): Rank for weights of LoRA adapters. A larger rank captures more of the pre-existing model weights for training. Eventually, the model is trained better, especially if it is trained for a task for which it has not been trained before. However, a rank that is too high can cause overfitting. From 8 to 128. Default: 8.
lora_alpha (integer, optional): Alpha value for training LoRA adapters. This parameter balances the influence of low-rank LoRA matrices on pre-existing model weights. If only a slight adjustment of a model is required, use a lower value. Greater than or equal to 8. Default: 8.
lora_dropout (float, optional): LoRA dropout rate. LoRA dropout is a regularization technique that randomly omits a fraction of the model’s LoRA parameters during training. As a result, this technique helps avoid overfitting on the dataset, especially in cases when the dataset is small and the model should suit more general tasks. From 0 to 1. Default: 0.
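To illustrate the idea behind these LoRA parameters (this is a conceptual sketch, not the service's implementation), a LoRA update adds a low-rank correction scaled by lora_alpha / lora_r to the frozen weights; the dimensions and values below are made up:

```python
d, lora_r, lora_alpha = 3, 2, 8  # illustrative dimensions and alpha

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # frozen pre-trained weights (d x d)
A = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0]]   # trained low-rank matrix (lora_r x d)
B = [[0.1, 0.0],
     [0.0, 0.1],
     [0.0, 0.0]]        # trained low-rank matrix (d x lora_r)

# Only A and B are trained; the effective weights are W + (alpha / r) * B @ A
scale = lora_alpha / lora_r
W_adapted = [
    [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(lora_r))
     for j in range(d)]
    for i in range(d)
]
```

Because A and B together have far fewer entries than W, only a small fraction of parameters is trained.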
packing (boolean, optional): Whether to use packing for training. With packing enabled, you can combine multiple small samples in a batch instead of having one sample per batch. This increases training efficiency. Default: true.
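A minimal sketch of the packing idea (greedy, by sample length; the service's actual implementation is not specified here):

```python
def pack_samples(sample_lengths, max_tokens):
    """Greedily pack consecutive samples into batches of at most max_tokens."""
    batches, current, used = [], [], 0
    for length in sample_lengths:
        if current and used + length > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(length)
        used += length
    if current:
        batches.append(current)
    return batches

print(pack_samples([100, 200, 300, 150, 250], 512))  # [[100, 200], [300, 150], [250]]
```

Instead of five batches with one short sample each, the samples fit into three well-filled batches.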
max_grad_norm (float, optional): Maximum gradient norm value used for gradient clipping. Make sure that the value is neither too large nor too small:
A value that is too small causes the vanishing gradient problem: aggressive clipping makes weight gradients too small during backpropagation. As a result, the model cannot learn quickly enough.
A value that is too large fails to prevent the exploding gradient problem, which is the opposite of the vanishing gradient problem. Explosion happens when weight gradients get large. As a result, it leads to unstable, poorly optimized training.
Greater than or equal to 0. Default: 1.
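Conceptually, gradient clipping rescales the gradient whenever its norm exceeds max_grad_norm; a minimal sketch:

```python
import math

def clip_gradient(grad, max_grad_norm):
    """Scale the gradient down if its L2 norm exceeds max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_grad_norm:
        scale = max_grad_norm / norm
        return [g * scale for g in grad]
    return grad
```

For example, clip_gradient([3.0, 4.0], 1.0) rescales a gradient of norm 5 down to norm 1, while gradients below the threshold pass through unchanged.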
seed (integer, optional): Controls reproducibility of the output. If you pass the same seed in different requests, you get approximately the same results. If you use the same seed but different values of other parameters, the results of your requests might differ.
integrations (array, optional): Integrations that Nebius Token Factory supports for fine-tuning:
type (string, optional): Integration type. The possible values are the following:
You can export the model training metrics to a project in Weights & Biases. Nebius Token Factory exports the metrics after you create a fine-tuning job. The service does not export system metrics or logs.
wandb (object, optional): Settings for the export to a project in Weights & Biases:
api_key (string, optional): API key from Weights & Biases. The key should be 40 characters long.
project (string, optional): Name of the project in Weights & Biases.
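Putting the fields above together, an integrations entry for Weights & Biases export might look like the following sketch (the API key and project name are placeholders; check the API specification for the exact schema):

```python
# Placeholders only; values are illustrative
integrations = [
    {
        "type": "wandb",
        "wandb": {
            "api_key": "<your_40_character_wandb_api_key>",
            "project": "<project_name>",
        },
    }
]
```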