This page lists:
- Which base models you can fine-tune
- Which context lengths they support
- Which fine-tuning types are available (LoRA vs full fine-tuning)
Model List
For each model listed below, Nebius Token Factory supports the following context_length values: 8192, 16384, 32768, 65536, 131072.
Unless you override it via the context_length hyperparameter, the default context length for fine-tuning is 8192 tokens.
See the hyperparameters section for model-specific details on context_length.
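The context length rules above can be checked on the client side before a job is submitted. The following is a minimal sketch, not part of any Nebius SDK: the function names `resolve_context_length` and `smallest_fitting_context_length` are hypothetical helpers, and only the list of values and the default of 8192 come from this page.

```python
# Values taken from this page; the helper names below are hypothetical.
SUPPORTED_CONTEXT_LENGTHS = (8192, 16384, 32768, 65536, 131072)
DEFAULT_CONTEXT_LENGTH = 8192


def resolve_context_length(requested=None):
    """Return the context length to use, falling back to the default."""
    if requested is None:
        return DEFAULT_CONTEXT_LENGTH
    if requested not in SUPPORTED_CONTEXT_LENGTHS:
        raise ValueError(
            f"context_length must be one of {SUPPORTED_CONTEXT_LENGTHS}, "
            f"got {requested}"
        )
    return requested


def smallest_fitting_context_length(longest_example_tokens):
    """Pick the smallest supported value that fits the longest example."""
    for length in SUPPORTED_CONTEXT_LENGTHS:
        if longest_example_tokens <= length:
            return length
    raise ValueError(
        f"{longest_example_tokens} tokens exceeds the largest supported "
        f"context length ({SUPPORTED_CONTEXT_LENGTHS[-1]})"
    )
```

If the longest example in a dataset is, say, 9000 tokens, the second helper returns 16384, which is the value you would then pass as the context_length hyperparameter.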
OpenAI / Unsloth GPT-OSS
These models are OpenAI GPT-OSS weights (bf16) packaged by Unsloth. They are Apache 2.0–licensed and suitable for both research and commercial use (subject to the license). To convert the weights to MXFP4, follow the instructions here.
| Name | Training type | Model card / license |
|---|---|---|
| unsloth/gpt-oss-20b-BF16 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| unsloth/gpt-oss-120b-BF16 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
To merge MoE LoRA adapter weights for deployment on Dedicated endpoints, follow the guide here.
Qwen
Nebius Token Factory supports dense, MoE, and coder variants across the Qwen3 and Qwen2.5 families. All Qwen models below use the Apache 2.0 license (see each model card for details).
Qwen3 MoE
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen3-235B-A22B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-30B-A3B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-30B-A3B-Thinking-2507 (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
Qwen3 coder
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen3-Coder-30B-A3B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
Qwen3 dense + base
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen3-32B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-14B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-14B-Base (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-8B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-8B-Base (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-4B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-4B-Base (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-1.7B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-1.7B-Base (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-0.6B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen3-0.6B-Base (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
Qwen3.5
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen3.5-27B (Model card) | Full Parameter fine-tuning | Apache 2.0 |
Qwen3.6
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen3.6-27B (Model card) | Full Parameter fine-tuning | Apache 2.0 |
Qwen2.5 dense + coder
| Name | Training type | Model card / license |
|---|---|---|
| Qwen/Qwen2.5-0.5B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-0.5B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-7B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-7B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-14B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-14B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-32B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-32B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-72B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-72B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-Coder-32B (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
| Qwen/Qwen2.5-Coder-32B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Apache 2.0 |
DeepSeek
Nebius Token Factory integrates DeepSeek V3 models for high-capacity reasoning workloads. DeepSeek V3 and its variants are released under the MIT License (see model cards for details).
| Name | Supported fine-tuning type | Model card / license |
|---|---|---|
| deepseek-ai/DeepSeek-V3-0324 (Model card) | Full fine-tuning | MIT License |
| deepseek-ai/DeepSeek-V3.1 (Model card) | Full fine-tuning | MIT License |
Meta (Llama 3.1 / 3.2 / 3.3)
Nebius Token Factory and the Meta models hosted in the service are built on the Llama 3.1, Llama 3.2, and Llama 3.3 families. For acceptable use, see Meta’s policies.
| Name | Training type | Model card / license |
|---|---|---|
| meta-llama/Meta-Llama-3.1-8B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.1 Community License Agreement |
| meta-llama/Meta-Llama-3.1-8B (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.1 Community License Agreement |
| meta-llama/Llama-3.1-70B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.1 Community License Agreement |
| meta-llama/Llama-3.1-70B (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.1 Community License Agreement |
| meta-llama/Llama-3.2-1B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.2 Community License Agreement |
| meta-llama/Llama-3.2-1B (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.2 Community License Agreement |
| meta-llama/Llama-3.2-3B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.2 Community License Agreement |
| meta-llama/Llama-3.2-3B (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.2 Community License Agreement |
| meta-llama/Llama-3.3-70B-Instruct (Model card) | LoRA and Full Parameter fine-tuning | Llama 3.3 Community License Agreement |
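Most models on this page support both LoRA and full-parameter fine-tuning, while a few are listed as full fine-tuning only. The sketch below encodes that distinction so a script can catch an unsupported combination early. It is an illustration only: the `"lora"` and `"full"` labels and the function name are hypothetical, not API values, and the function assumes the model name passed in is one of the models listed on this page.

```python
# Models listed on this page with full fine-tuning as the only training type.
FULL_ONLY_MODELS = frozenset({
    "Qwen/Qwen3.5-27B",
    "Qwen/Qwen3.6-27B",
    "deepseek-ai/DeepSeek-V3-0324",
    "deepseek-ai/DeepSeek-V3.1",
})


def supported_training_types(model_name):
    """Return the training types listed on this page for a listed model.

    Assumes model_name appears in one of the tables above; the labels
    "lora" and "full" are illustrative, not official parameter values.
    """
    if model_name in FULL_ONLY_MODELS:
        return ("full",)
    return ("lora", "full")


def check_training_type(model_name, training_type):
    """Raise ValueError if the page does not list this combination."""
    allowed = supported_training_types(model_name)
    if training_type not in allowed:
        raise ValueError(
            f"{model_name} is listed with {allowed}, not {training_type!r}"
        )
```

For example, `check_training_type("deepseek-ai/DeepSeek-V3.1", "lora")` raises a `ValueError`, while the same call with `"Qwen/Qwen3-8B"` passes silently.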