Fine-tune a base model on your own data using Nebius Token Factory. This guide walks you through creating a job, monitoring it, and retrieving checkpoints via API and Python.
Events help you understand the lifecycle (file validation, dataset processing, training progress).
if job.status == "succeeded":
    events = client.fine_tuning.jobs.list_events(job.id)
    for event in events.data:
        print(event.created_at, event.level, "-", event.message)
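Events and checkpoints are only complete once the job has stopped running, so it can help to wait for a terminal status first. The following is a minimal polling sketch, not the documented workflow: the helper name wait_for_job, the polling interval, and the set of terminal statuses ("succeeded", "failed", "cancelled") are assumptions based on the OpenAI-compatible client used in this guide.

```python
import time

# Assumed terminal states, following the OpenAI-compatible job lifecycle.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=15.0, timeout_seconds=3600.0):
    """Poll the job until it reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} is still in status {job.status!r}")
        time.sleep(poll_seconds)
```

A call such as job = wait_for_job(client, job.id) returns the final job object, after which the status check above can be applied.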
You can consider training finished when you see messages like:
Each checkpoint represents the model after a certain number of training steps (often per epoch).
import os

if job.status == "succeeded":
    checkpoints = client.fine_tuning.jobs.checkpoints.list(job.id).data
    for checkpoint in checkpoints:
        print("Checkpoint:", checkpoint.id, "step:", checkpoint.step_number)
        # One local directory per checkpoint
        os.makedirs(checkpoint.id, exist_ok=True)
        for file_id in checkpoint.result_files:
            # Get file metadata
            file_obj = client.files.retrieve(file_id)
            filename = file_obj.filename  # e.g. "<checkpoint_ID>/adapter_config.json"
            # Download file contents
            file_content = client.files.content(file_id)
            # Save to disk with the same filename
            output_path = os.path.join(checkpoint.id, os.path.basename(filename))
            file_content.write_to_file(output_path)
            print("Saved:", output_path)
You now have:
Intermediate checkpoints (per step / epoch)
Final checkpoint (usually the last one in the list)
Use the latest checkpoint for deployment unless you have a specific reason to pick an earlier one.
You can now deploy your fine-tuned model and serve it via Nebius Token Factory.
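Because the final checkpoint is only "usually" the last one in the list, selecting by step number is a slightly safer way to find it. A minimal sketch, assuming each checkpoint object exposes the step_number attribute used in the download code; the helper name pick_latest_checkpoint is hypothetical.

```python
def pick_latest_checkpoint(checkpoints):
    """Return the checkpoint with the highest step_number."""
    if not checkpoints:
        raise ValueError("no checkpoints available")
    # Compare by training step rather than relying on list order.
    return max(checkpoints, key=lambda checkpoint: checkpoint.step_number)
```

For example, pick_latest_checkpoint(client.fine_tuning.jobs.checkpoints.list(job.id).data) would return the checkpoint to deploy.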
Copy the content from the response and save it locally with the name from the filename field you retrieved in the previous step.
Once you download the required checkpoint files, you can host the fine-tuned model and serve it via Nebius Token Factory.
validation_file (string, optional) ID of the file with the validation dataset. Same format and requirements as the training dataset.
hyperparameters (object, optional) Fine-tuning configuration. Omitted fields fall back to defaults.
seed (integer, optional) Random seed used during training. Using the same seed and the same data/hyperparameters improves reproducibility between runs.
integrations (array, optional) Third-party integrations configured for this job.
type (string, required) Currently supported: "wandb".
wandb (object, required when type = "wandb") Settings for exporting metrics to
All hyperparameters are nested under hyperparameters.
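To illustrate the nesting, here is a hedged sketch of a helper that assembles a request body. The top-level model and training_file fields, the helper name build_job_request, and the placeholder IDs are assumptions for illustration; only validation_file, seed, and the hyperparameter names come from this reference. Only overridden hyperparameters are sent, since omitted fields fall back to defaults.

```python
# Hyperparameter names documented in this reference.
KNOWN_HYPERPARAMETERS = {
    "batch_size", "context_length", "learning_rate", "n_epochs",
    "warmup_ratio", "weight_decay", "lora", "lora_r", "lora_alpha",
    "lora_dropout", "packing",
}


def build_job_request(model, training_file, *, validation_file=None,
                      seed=None, **hyperparameters):
    """Assemble a job request body with hyperparameters nested in one object."""
    unknown = set(hyperparameters) - KNOWN_HYPERPARAMETERS
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    # "model" and "training_file" are assumed top-level fields.
    body = {"model": model, "training_file": training_file}
    if validation_file is not None:
        body["validation_file"] = validation_file
    if seed is not None:
        body["seed"] = seed
    if hyperparameters:
        # All tuning knobs live under the "hyperparameters" key.
        body["hyperparameters"] = dict(hyperparameters)
    return body
```

Calling build_job_request("base-model-id", "file-train-id", seed=42, n_epochs=5, lora=True) yields a body whose "hyperparameters" object contains only n_epochs and lora.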
batch_size (integer, optional) Number of examples per training batch. Larger batch sizes are more efficient but require more VRAM.
Typical range: 8–32
Default: 8
context_length (integer, optional) Maximum sequence length in tokens used during fine-tuning. Inputs longer than this limit will cause errors.
Units: tokens (e.g., 8192)
Supported values depend on the base model; see the models page.
Default: 8192
We recommend:
Analyze the token length distribution of your dataset.
Choose the smallest context length that covers your P95–P99 examples.
If packing = false, choosing a context length much larger than your examples leads to heavy padding and wasted compute.
Larger context lengths significantly increase VRAM usage and FLOPs due to attention scaling.
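The recommendation above can be sketched as a small helper. This is an illustration under stated assumptions: token lengths are expected to come from the base model's own tokenizer, the nearest-rank percentile is one of several percentile definitions, and the list of supported context lengths is hypothetical (check the models page for real values).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def choose_context_length(token_lengths, supported_lengths, pct=99):
    """Pick the smallest supported context length covering the given percentile."""
    target = percentile(token_lengths, pct)
    for candidate in sorted(supported_lengths):
        if candidate >= target:
            return candidate
    raise ValueError(f"no supported context length covers {target} tokens")


# Hypothetical dataset: mostly short examples with one long outlier.
lengths = [100] * 90 + [3000] * 9 + [20000]
print(choose_context_length(lengths, [2048, 4096, 8192], pct=99))
```

With this data the P99 length is 3000 tokens, so 4096 is chosen; the single 20000-token outlier would still exceed the limit and needs to be truncated or removed before training.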
learning_rate (float, optional) Step size for gradient descent.
Must be >= 0
Typical values: 1e-6–5e-5
Default: 0.00001
n_epochs (integer, optional) Number of passes over the entire dataset.
Range: 1–20
Default: 3
More epochs increase task specialization but also overfitting risk.
warmup_ratio (float, optional) Fraction of total training steps used for linear warmup of the learning rate from 0 to the target value.
Range: 0–1
Default: 0
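To make the warmup fraction concrete, the sketch below computes the learning rate at a given step. It assumes one optimizer step per batch (no gradient accumulation) and a constant learning rate after warmup; the post-warmup schedule is not described in this reference, so treat that part as an assumption. Both function names are hypothetical.

```python
import math


def total_training_steps(n_examples, batch_size, n_epochs):
    """Steps per epoch (rounded up) times the number of epochs."""
    return math.ceil(n_examples / batch_size) * n_epochs


def warmup_learning_rate(step, total_steps, learning_rate, warmup_ratio):
    """Learning rate at a 0-indexed step with linear warmup from 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps == 0 or step >= warmup_steps:
        # Assumed: constant target rate once warmup is over.
        return learning_rate
    return learning_rate * step / warmup_steps
```

For 1000 examples with batch_size = 8 and n_epochs = 3, total_training_steps gives 375 steps, so warmup_ratio = 0.1 would spend the first 37 steps ramping up.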
weight_decay (float, optional) L2 regularization factor applied to weights. Helps prevent overfitting and preserve generalization.
Must be >= 0
Default: 0
lora (boolean, optional) Whether to use LoRA (Low-Rank Adaptation) instead of full-parameter fine-tuning.
true: only LoRA adapter weights are trained; base model weights stay frozen.
false: full fine-tuning is applied.
Default: false
lora_r (integer, optional) Rank of LoRA matrices. Higher values increase capacity but also overfitting and cost.
Range: 8–128
Default: 8
lora_alpha (integer, optional) Scaling factor for LoRA updates. Higher values increase the impact of LoRA adapters.
Must be >= 8
Default: 8
lora_dropout (float, optional) Dropout applied to LoRA layers. Helps prevent overfitting, especially on small datasets.
Range: 0–1
Default: 0
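As a rough guide to how lora_r and lora_alpha interact, the sketch below uses the standard LoRA formulation, in which a weight matrix of shape d_out x d_in receives an update scaled by lora_alpha / lora_r and built from two matrices of rank lora_r. Whether this service applies exactly this scaling is an assumption; the layer dimensions are hypothetical.

```python
def lora_trainable_params(d_in, d_out, lora_r):
    """Adapter parameters for one layer: A is (r x d_in), B is (d_out x r)."""
    return lora_r * (d_in + d_out)


def lora_scale(lora_alpha, lora_r):
    """Multiplier applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


# Hypothetical 4096 x 4096 projection layer with the default rank.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, lora_r=8)
print(f"adapter params: {adapter} ({adapter / full:.2%} of the full matrix)")
print("scale with defaults:", lora_scale(lora_alpha=8, lora_r=8))
```

Under this formulation, raising lora_r without raising lora_alpha shrinks the scale, which is one reason the two values are often tuned together.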
packing (boolean, optional) If true, multiple shorter samples can be packed into a single sequence to better utilize the context window and improve efficiency.
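The effect of packing can be illustrated with a simple greedy first-fit strategy. This is only a sketch of the general idea: the packing algorithm actually used by the service is not described here, and the function name and sample lengths are hypothetical.

```python
def pack_first_fit(lengths, context_length):
    """Place each sample in the first sequence that still has room for it."""
    sequences = []  # each entry is a list of sample lengths sharing one sequence
    for length in lengths:
        if length > context_length:
            raise ValueError("sample is longer than context_length")
        for sequence in sequences:
            if sum(sequence) + length <= context_length:
                sequence.append(length)
                break
        else:
            # No existing sequence has enough room; start a new one.
            sequences.append([length])
    return sequences


# Five hypothetical samples, measured in tokens.
packed = pack_first_fit([3000, 5000, 2000, 6000, 1000], context_length=8192)
print(len(packed), "sequences:", packed)
```

Here five samples fit into three sequences, whereas without packing each sample would occupy its own padded sequence.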