
# Models for fine-tuning in Nebius Token Factory

> Supported base models for fine-tuning in Nebius Token Factory, with available context lengths and fine-tuning types (LoRA and full fine-tuning), grouped by provider.

Nebius Token Factory supports fine-tuning on multiple open-weight model families.\
This page lists:

* Which **base models** you can fine-tune
* Which **context lengths** they support
* Which **fine-tuning types** are available (LoRA vs full fine-tuning)

<Note>
  **Deployment note**\
  Not all models that can be fine-tuned can be deployed as serverless endpoints in Nebius Token Factory.\
  \
  For serving options, see [Deploy custom model](https://docs.tokenfactory.nebius.com/post-training/deploy-custom-model) and the list of available deployment models.
</Note>

***

## Model List

For each model listed below, Nebius Token Factory supports the following `context_length` values: `8192`, `16384`, `32768`, `65536`, and `131072`.

Unless you override it via the `context_length` hyperparameter, the default context length for fine-tuning is **8192 tokens**.

For model-specific details about `context_length`, see the hyperparameters section.
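
As a minimal sketch of the rule above, the following helper validates a `context_length` value and falls back to the 8192-token default when no override is given. The function name `build_hyperparameters` and the shape of the returned dictionary are assumptions made for illustration; they are not the service's actual request schema.

```python
# Supported values and default are taken from this page;
# everything else in this block is illustrative.
SUPPORTED_CONTEXT_LENGTHS = (8192, 16384, 32768, 65536, 131072)
DEFAULT_CONTEXT_LENGTH = 8192


def build_hyperparameters(context_length=None):
    """Return a dict with a validated context_length.

    Hypothetical helper: the dict shape is an assumption,
    not the documented API payload.
    """
    if context_length is None:
        # No override given, so the documented default applies.
        context_length = DEFAULT_CONTEXT_LENGTH
    if context_length not in SUPPORTED_CONTEXT_LENGTHS:
        raise ValueError(
            f"context_length must be one of {SUPPORTED_CONTEXT_LENGTHS}, "
            f"got {context_length}"
        )
    return {"context_length": context_length}
```

Validating the value locally like this can catch an unsupported length (for example, `4096`) before a job is submitted.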

***

## OpenAI / Unsloth GPT-OSS

These models are OpenAI GPT-OSS weights (bf16) packaged by Unsloth.\
They are Apache 2.0–licensed and suitable for both research and commercial use (subject to the license).

To convert the weights to MXFP4, follow the [weights conversion instructions](https://docs.tokenfactory.nebius.com/post-training/weights-conversion).

| Name                                                                                            | Training type                       | Model card / license                                                             |
| :---------------------------------------------------------------------------------------------- | :---------------------------------- | :------------------------------------------------------------------------------- |
| unsloth/gpt-oss-20b-BF16<br />([Model card](https://huggingface.co/unsloth/gpt-oss-20b-BF16))   | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/unsloth/gpt-oss-20b-BF16/blob/main/LICENSE)  |
| unsloth/gpt-oss-120b-BF16<br />([Model card](https://huggingface.co/unsloth/gpt-oss-120b-BF16)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/unsloth/gpt-oss-120b-BF16/blob/main/LICENSE) |

<Note>
  To merge MoE LoRA adapter weights for deployment on Dedicated Endpoints, follow the [merge guide](https://docs.tokenfactory.nebius.com/post-training/merge-moe-lora-weights).
</Note>

***

## Qwen

Nebius Token Factory supports **dense**, **MoE**, and **coder** variants across the Qwen3 and Qwen2.5 families.\
All Qwen models below use the **Apache 2.0** license (see each model card for details).

### Qwen3 MoE

| Name                                                                                                              | Training type                       | Model card / license                                                                      |
| :---------------------------------------------------------------------------------------------------------------- | :---------------------------------- | :---------------------------------------------------------------------------------------- |
| Qwen/Qwen3-235B-A22B<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B))                             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE)               |
| Qwen/Qwen3-235B-A22B-Instruct-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507/blob/main/LICENSE) |
| Qwen/Qwen3-235B-A22B-Thinking-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507/blob/main/LICENSE) |
| Qwen/Qwen3-30B-A3B<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B))                                 | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B/blob/main/LICENSE)                 |
| Qwen/Qwen3-30B-A3B-Instruct-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/blob/main/LICENSE)   |
| Qwen/Qwen3-30B-A3B-Thinking-2507<br />([Model card](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507/blob/main/LICENSE)   |

### Qwen3 coder

| Name                                                                                                                | Training type                       | Model card / license                                                                       |
| :------------------------------------------------------------------------------------------------------------------ | :---------------------------------- | :----------------------------------------------------------------------------------------- |
| Qwen/Qwen3-Coder-30B-A3B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE)   |
| Qwen/Qwen3-Coder-480B-A35B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/blob/main/LICENSE) |

### Qwen3 dense + base

| Name                                                                                  | Training type                       | Model card / license                                                        |
| :------------------------------------------------------------------------------------ | :---------------------------------- | :-------------------------------------------------------------------------- |
| Qwen/Qwen3-32B<br />([Model card](https://huggingface.co/Qwen/Qwen3-32B))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-32B/blob/main/LICENSE)       |
| Qwen/Qwen3-14B<br />([Model card](https://huggingface.co/Qwen/Qwen3-14B))             | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE)       |
| Qwen/Qwen3-14B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-14B-Base))   | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-14B-Base/blob/main/LICENSE)  |
| Qwen/Qwen3-8B<br />([Model card](https://huggingface.co/Qwen/Qwen3-8B))               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE)        |
| Qwen/Qwen3-8B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-8B-Base))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-8B-Base/blob/main/LICENSE)   |
| Qwen/Qwen3-4B<br />([Model card](https://huggingface.co/Qwen/Qwen3-4B))               | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-4B/blob/main/LICENSE)        |
| Qwen/Qwen3-4B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-4B-Base))     | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-4B-Base/blob/main/LICENSE)   |
| Qwen/Qwen3-1.7B<br />([Model card](https://huggingface.co/Qwen/Qwen3-1.7B))           | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE)      |
| Qwen/Qwen3-1.7B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-1.7B-Base)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-1.7B-Base/blob/main/LICENSE) |
| Qwen/Qwen3-0.6B<br />([Model card](https://huggingface.co/Qwen/Qwen3-0.6B))           | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/LICENSE)      |
| Qwen/Qwen3-0.6B-Base<br />([Model card](https://huggingface.co/Qwen/Qwen3-0.6B-Base)) | LoRA and Full Parameter fine-tuning | [Apache 2.0](https://huggingface.co/Qwen/Qwen3-0.6B-Base/blob/main/LICENSE) |

### Qwen2.5 dense + coder

| Name                                                                                                        | Model card / license                                                                   | Training type                       |
| :---------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------- | :---------------------------------- |
| Qwen/Qwen2.5-0.5B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-0.5B))                             | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-0.5B/blob/main/LICENSE)               | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-0.5B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct))           | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/blob/main/LICENSE)      | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-7B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-7B))                                 | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-7B/blob/main/LICENSE)                 | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-7B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct))               | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE)        | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-14B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-14B))                               | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE)                | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-14B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct))             | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE)       | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-32B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-32B))                               | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE)                | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-32B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct))             | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE)       | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-72B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-72B))                               | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-72B/blob/main/LICENSE)                | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-72B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct))             | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE)       | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-Coder-32B<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B))                   | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B/blob/main/LICENSE)          | LoRA and Full Parameter fine-tuning |
| Qwen/Qwen2.5-Coder-32B-Instruct<br />([Model card](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)) | [Apache 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct/blob/main/LICENSE) | LoRA and Full Parameter fine-tuning |

***

## DeepSeek

Nebius Token Factory integrates DeepSeek V3 models for high-capacity reasoning workloads.\
DeepSeek V3 and its variants are released under the **MIT License** (see model cards for details).

| Name                                                                                                  | Supported fine-tuning type | Model card / license                                                                 |
| :---------------------------------------------------------------------------------------------------- | :------------------------- | :----------------------------------------------------------------------------------- |
| deepseek-ai/DeepSeek-V3-0324<br />([Model card](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)) | Full fine-tuning           | [MIT License](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/blob/main/LICENSE) |
| deepseek-ai/DeepSeek-V3.1<br />([Model card](https://huggingface.co/deepseek-ai/DeepSeek-V3.1))       | Full fine-tuning           | [MIT License](https://huggingface.co/deepseek-ai/DeepSeek-V3.1/blob/main/LICENSE)    |

<Note>
  DeepSeek V3 and DeepSeek V3.1 are currently **only available in Nebius US data centers**.
</Note>
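
Unlike the other families on this page, the DeepSeek models list only full fine-tuning. The sketch below shows one way to check a requested training type against the tables before creating a job. The mapping is a hand-copied excerpt of this page, and the labels `"lora"` and `"full"` and the function name are hypothetical, not API values.

```python
# Training types copied from the tables on this page (excerpt only).
# The short labels "lora" and "full" are illustrative, not API values.
TRAINING_TYPES = {
    "unsloth/gpt-oss-20b-BF16": ("lora", "full"),
    "Qwen/Qwen3-32B": ("lora", "full"),
    "meta-llama/Llama-3.3-70B-Instruct": ("lora", "full"),
    "deepseek-ai/DeepSeek-V3-0324": ("full",),
    "deepseek-ai/DeepSeek-V3.1": ("full",),
}


def supports_training_type(model, training_type):
    """Return True if the excerpt lists `training_type` for `model`."""
    if model not in TRAINING_TYPES:
        # Unknown models are rejected rather than guessed at.
        raise KeyError(f"{model!r} is not in this excerpt of the model tables")
    return training_type in TRAINING_TYPES[model]
```

For example, `supports_training_type("deepseek-ai/DeepSeek-V3.1", "lora")` evaluates to `False`, while the same call with `"full"` evaluates to `True`.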

***

## Meta (Llama 3.1 / 3.2 / 3.3)

The Meta models available for fine-tuning in Nebius Token Factory belong to the **Llama 3.1**, **Llama 3.2**, and **Llama 3.3** families.

For acceptable use, see Meta’s policies:

* [Llama 3.1 Acceptable Use Policy](https://llama.meta.com/llama3_1/use-policy/)
* [Llama 3.2 Acceptable Use Policy](https://www.llama.com/llama3_2/use-policy/)
* [Llama 3.3 Acceptable Use Policy](https://www.llama.com/llama3_3/use-policy/)

| Name                                                                                                                    | Training type                       | Model card / license                                                                                                  |
| :---------------------------------------------------------------------------------------------------------------------- | :---------------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| meta-llama/Meta-Llama-3.1-8B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)) | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Meta-Llama-3.1-8B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-8B))                        | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.1-70B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct))         | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.1-70B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.1-70B))                           | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.2-1B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct))           | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-1B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-1B))                             | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-3B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct))           | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-3B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.2-3B<br />([Model card](https://huggingface.co/meta-llama/Llama-3.2-3B))                             | LoRA and Full Parameter fine-tuning | [Llama 3.2 Community License Agreement](https://huggingface.co/meta-llama/Llama-3.2-3B/blob/main/LICENSE.txt)         |
| meta-llama/Llama-3.3-70B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct))         | LoRA and Full Parameter fine-tuning | [Llama 3.3 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE) |

***

## Base LoRA adapter models available for deployment

You can [deploy serverless LoRA adapter models](https://docs.tokenfactory.nebius.com/fine-tuning/deploy-custom-model) in Nebius Token Factory with **per-token billing**.\
To deploy a LoRA-adapted model, first fine-tune an adapter on one of the base models below:

| Name                                                                                                                    | Training type                       | License                                                                                                               |
| :---------------------------------------------------------------------------------------------------------------------- | :---------------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| meta-llama/Meta-Llama-3.1-8B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)) | LoRA and Full Parameter fine-tuning | [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) |
| meta-llama/Llama-3.3-70B-Instruct<br />([Model card](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct))         | LoRA and Full Parameter fine-tuning | [Llama 3.3 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE) |

***

For other models listed on this page, fine-tuning is supported, but deployment options may differ (for example, only via custom hosting or Dedicated Endpoints).
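
As a rough sketch, a pre-flight check for serverless LoRA deployment could compare the adapter's base model against the two base models listed in the deployment table above. The set below is copied from that table; the function name is hypothetical and this is not an official client API.

```python
# Base models listed on this page as deployable for serverless LoRA adapters.
SERVERLESS_LORA_BASE_MODELS = frozenset({
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "meta-llama/Llama-3.3-70B-Instruct",
})


def can_deploy_serverless_lora(base_model):
    """Return True if `base_model` is listed for serverless LoRA deployment."""
    return base_model in SERVERLESS_LORA_BASE_MODELS
```

A `False` result only means the model is not in this list; other deployment options, such as Dedicated Endpoints, may still apply.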
