
# Models for fine-tuning in Nebius Token Factory

> Supported base models for fine-tuning in Nebius Token Factory, with available context lengths and fine-tuning types (LoRA and full fine-tuning), grouped by provider.

Nebius Token Factory supports fine-tuning on multiple open-weight model families.<br />This page lists:

* Which **base models** you can fine-tune
* Which **context lengths** they support
* Which **fine-tuning types** are available (LoRA vs full fine-tuning)

***

## Model List

For each model listed below, Nebius Token Factory supports the following `context_length` values: `8192`, `16384`, `32768`, `65536`, and `131072`.

Unless you override it via the `context_length` hyperparameter, the default context length for fine-tuning is **8192 tokens**.

See the hyperparameters section for model-specific details on `context_length`.
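As a minimal sketch of the rule above, the following Python helper applies the 8192-token default and rejects values outside the listed set. The helper name `build_hyperparameters` and the extra `n_epochs` key are hypothetical and used only for illustration; only the `context_length` name, its supported values, and its default come from this page. How the resulting dictionary is passed to a fine-tuning job is not shown here.

```python
# Supported values and default as listed on this page.
SUPPORTED_CONTEXT_LENGTHS = (8192, 16384, 32768, 65536, 131072)
DEFAULT_CONTEXT_LENGTH = 8192


def build_hyperparameters(context_length=None, **extra):
    """Return a hyperparameters dict with a validated context_length.

    Hypothetical helper: falls back to the default when no value is given
    and raises ValueError for values that are not in the supported set.
    """
    if context_length is None:
        context_length = DEFAULT_CONTEXT_LENGTH
    if context_length not in SUPPORTED_CONTEXT_LENGTHS:
        raise ValueError(
            f"context_length must be one of {SUPPORTED_CONTEXT_LENGTHS}, "
            f"got {context_length}"
        )
    # Any additional keys (for example a hypothetical n_epochs) pass through unchanged.
    return {"context_length": context_length, **extra}


if __name__ == "__main__":
    print(build_hyperparameters())        # {'context_length': 8192}
    print(build_hyperparameters(32768))   # {'context_length': 32768}
```

Validating the value on the client side before submitting a job gives an immediate error message for unsupported lengths such as `10000`.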

***

## DeepSeek

Nebius Token Factory integrates DeepSeek V3 and V4 models for high-capacity reasoning workloads.<br />DeepSeek and its variants are released under the **MIT License** (see model cards for details).

| Name                                                                                                    | Supported fine-tuning type | License                                                                               |
| :------------------------------------------------------------------------------------------------------ | :------------------------- | :------------------------------------------------------------------------------------ |
| deepseek-ai/DeepSeek-V3-0324<br />([Model card](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324))   | Full fine-tuning           | [MIT License](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/blob/main/LICENSE)  |
| deepseek-ai/DeepSeek-V4-Flash<br />([Model card](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash)) | Full fine-tuning           | [MIT License](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/LICENSE) |

***

## Google

Nebius Token Factory supports full fine-tuning for select Google Gemma 4 instruction-tuned models.

These models are available under the Apache 2.0 License. Review the corresponding model card for full license terms and usage details.

| Name                                                                                    | Supported fine-tuning type | License                                                       |
| :-------------------------------------------------------------------------------------- | :------------------------- | :------------------------------------------------------------ |
| google/gemma-4-E2B-it<br />([Model card](https://huggingface.co/google/gemma-4-E2B-it)) | Full fine-tuning           | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| google/gemma-4-E4B-it<br />([Model card](https://huggingface.co/google/gemma-4-E4B-it)) | Full fine-tuning           | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| google/gemma-4-31B-it<br />([Model card](https://huggingface.co/google/gemma-4-31B-it)) | Full fine-tuning           | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

***

## OpenAI / Unsloth GPT-OSS

These models are OpenAI GPT-OSS weights (bf16) packaged by Unsloth.<br />They are Apache 2.0–licensed and suitable for both research and commercial use (subject to the license).

To convert the weights to MXFP4, follow the [weights conversion instructions](https://docs.tokenfactory.nebius.com/post-training/weights-conversion).

| Name                                                                                            | Training type                       | License                                                       |
| :---------------------------------------------------------------------------------------------- | :---------------------------------- | :------------------------------------------------------------ |
| unsloth/gpt-oss-20b-BF16<br />([Model card](https://huggingface.co/unsloth/gpt-oss-20b-BF16))   | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| unsloth/gpt-oss-120b-BF16<br />([Model card](https://huggingface.co/unsloth/gpt-oss-120b-BF16)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

<Note>
  To merge MoE LoRA adapter weights for deployment on Dedicated endpoints, follow the [MoE LoRA weights merging guide](https://docs.tokenfactory.nebius.com/post-training/merge-moe-lora-weights).
</Note>

***

## Qwen

Nebius Token Factory supports **dense**, **MoE**, and **coder** variants across the Qwen3, Qwen3.5, Qwen3.6, and Qwen2.5 families.<br />All Qwen models below use the **Apache 2.0** license (see each model card for details).

### Qwen3 MoE

| Name                                                                                                              | Training type                       | License                                                                                   |
| :---------------------------------------------------------------------------------------------------------------- | :---------------------------------- | :---------------------------------------------------------------------------------------- |
| Qwen/Qwen3-235B-A22B<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B))                             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE)               |
| Qwen/Qwen3-235B-A22B-Instruct-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE) |
| Qwen/Qwen3-235B-A22B-Thinking-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507/blob/main/LICENSE) |
| Qwen/Qwen3-30B-A3B<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B))                                 | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B/blob/main/LICENSE)                 |
| Qwen/Qwen3-30B-A3B-Instruct-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/blob/main/LICENSE)   |
| Qwen/Qwen3-30B-A3B-Thinking-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507/blob/main/LICENSE)   |

### Qwen3 coder

| Name                                                                                                                | Training type                       | License                                                                                    |
| :------------------------------------------------------------------------------------------------------------------ | :---------------------------------- | :----------------------------------------------------------------------------------------- |
| Qwen/Qwen3-Coder-30B-A3B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE)   |
| Qwen/Qwen3-Coder-480B-A35B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/blob/main/LICENSE) |

### Qwen3 dense + base

| Name                                                                                  | Training type                       | License                                                       |
| :------------------------------------------------------------------------------------ | :---------------------------------- | :------------------------------------------------------------ |
| Qwen/Qwen3-32B<br />([Model card](https://huggingface.co/Qwen/Qwen3-32B))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-14B<br />([Model card](https://huggingface.co/Qwen/Qwen3-14B))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-14B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-14B-Base))   | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-8B<br />([Model card](https://huggingface.co/Qwen/Qwen3-8B))               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-8B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-8B-Base))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-4B<br />([Model card](https://huggingface.co/Qwen/Qwen3-4B))               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-4B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-4B-Base))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-1.7B<br />([Model card](https://huggingface.co/Qwen/Qwen3-1.7B))           | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-1.7B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-1.7B-Base)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-0.6B<br />([Model card](https://huggingface.co/Qwen/Qwen3-0.6B))           | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen3-0.6B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-0.6B-Base)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

### Qwen3.5

| Name                                                                          | Training type              | License                                                       |
| :---------------------------------------------------------------------------- | :------------------------- | :------------------------------------------------------------ |
| Qwen/Qwen3.5-27B<br />([Model card](https://huggingface.co/Qwen/Qwen3.5-27B)) | Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

### Qwen3.6

| Name                                                                          | Training type              | License                                                       |
| :---------------------------------------------------------------------------- | :------------------------- | :------------------------------------------------------------ |
| Qwen/Qwen3.6-27B<br />([Model card](https://huggingface.co/Qwen/Qwen3.6-27B)) | Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

### Qwen2.5 dense + coder

| Name                                                                                                        | Training type                       | License                                                       |
| :---------------------------------------------------------------------------------------------------------- | :---------------------------------- | :------------------------------------------------------------ |
| Qwen/Qwen2.5-0.5B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-0.5B))                             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-0.5B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct))           | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-7B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-7B))                                 | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-7B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct))               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-14B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-14B))                               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-14B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-32B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-32B))                               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-32B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-72B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-72B))                               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-72B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-Coder-32B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))                   | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |
| Qwen/Qwen2.5-Coder-32B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |

***

## Meta (Llama 3.1 / 3.2 / 3.3)

The Meta models hosted in Nebius Token Factory belong to the **Llama 3.1**, **Llama 3.2**, and **Llama 3.3** families.

For acceptable use, see Meta’s policies:

* [Llama 3.1 Acceptable Use Policy](https://llama.meta.com/llama3_1/use-policy/)
* [Llama 3.2 Acceptable Use Policy](https://www.llama.com/llama3_2/use-policy/)
* [Llama 3.3 Acceptable Use Policy](https://www.llama.com/llama3_3/use-policy/)

| Name                                                                                                                    | Training type                       | License                                                                                                               |
| :---------------------------------------------------------------------------------------------------------------------- | :---------------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| meta-llama/Meta-Llama-3.1-8B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)) | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Meta-Llama-3.1-8B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-8B))                        | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.1-70B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct))         | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.1-70B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-70B))                           | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.2-1B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct))           | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-1B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-1B))                             | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-3B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct))           | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-3B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-3B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-3B))                             | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-3B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.3-70B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct))         | LoRA and Full Parameter fine-tuning | [Llama 3.3 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE) |

Currently, the only deployment option is Dedicated endpoints.
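The tables on this page can be mirrored in a small lookup to check, before configuring a job, whether a base model supports a given fine-tuning type. The sketch below covers only a few of the models listed above. The labels `"lora"` and `"full"` and the function name `supports` are hypothetical shorthand for this example, not values taken from the Nebius Token Factory API.

```python
# Hypothetical shorthand labels for the two fine-tuning types on this page.
LORA = "lora"
FULL = "full"

# A subset of the tables above: model name -> supported fine-tuning types.
SUPPORTED_TYPES = {
    "deepseek-ai/DeepSeek-V3-0324": {FULL},
    "google/gemma-4-31B-it": {FULL},
    "unsloth/gpt-oss-20b-BF16": {LORA, FULL},
    "Qwen/Qwen3-8B": {LORA, FULL},
    "Qwen/Qwen3.5-27B": {FULL},
    "meta-llama/Llama-3.3-70B-Instruct": {LORA, FULL},
}


def supports(model, tuning_type):
    """Return True if the model is in the lookup and lists the given type."""
    return tuning_type in SUPPORTED_TYPES.get(model, set())


if __name__ == "__main__":
    print(supports("unsloth/gpt-oss-20b-BF16", LORA))      # True
    print(supports("deepseek-ai/DeepSeek-V3-0324", LORA))  # False
```

Models that are missing from the lookup return `False` for every type, so an unknown or misspelled model name is not treated as supported.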