The following functionality will no longer be available in Nebius Token Factory:
- Some text models
- LoRA per-token model deployment and inference
- Text-to-Image models
These models and features will no longer be supported in the UI Playground or the API.
We strongly encourage you to migrate to newer, actively supported models to ensure continued stability and performance.
Deprecation timeline
On Apr 13, 2026, the API and UI access for all affected models will be disabled.
Models affected
Text-to-Text
| Model id |
|---|
| zai_org/glm_4.7_fp8 |
| minimaxai/minimax_m2.1 |
| deepseek_ai/deepseek_r1_0528 |
| deepseek_ai/deepseek_v3_0324 |
| meta_llama/llama_3.3_70b_instruct_fast |
| qwen/qwen3_coder_480b_a35b_instruct |
| moonshotai/kimi_k2_instruct |
| moonshotai/kimi_k2_thinking |
| deepseek_ai/deepseek_r1_0528_fast |
| deepseek_ai/deepseek_v3_0324_fast |
| openai/gpt_oss_20b |
| zai_org/glm_4.5 |
| qwen/qwen3_32b_fast |
| qwen/qwen3_235b_a22b_thinking_2507 |
| zai_org/glm_4.5_air |
| qwen/qwen3_30b_a3b_thinking_2507 |
| qwen/qwen3_coder_30b_a3b_instruct |
| meta_llama/meta_llama_3.1_8b_instruct |
| meta_llama/meta_llama_3.1_8b_instruct_fast |
| google/gemma_2_9b_it_fast |
| qwen/qwen2.5_coder_7b_fast |
| baai/bge_en_icl |
| nvidia/nemotron_nano_v2_12b |
| google/gemma_3_27b_it_fast |
| meta_llama/llama_guard_3_8b |
| baai/bge_multilingual_gemma2 |
| intfloat/e5_mistral_7b_instruct |
| google/gemma_2_2b_it |
Text-to-Image
We are also deprecating all Text-to-Image models as we streamline the supported modalities and focus our infrastructure on delivering excellent text-based workloads.
We may support image generation again in the future through more robust and scalable alternatives. **For now, neither UI nor API access will be available.**
| Model id |
|---|
| black_forest_labs/flux_schnell |
| black_forest_labs/flux_dev |
LoRA per-token serverless endpoints
We are deprecating LoRA per-token deployments as part of a shift toward a more scalable, production-ready deployment option: Dedicated Endpoints.
If you are currently using LoRA-based setups, we recommend transitioning to:
- Standard public model deployments
- Dedicated Endpoints for controlled and predictable performance
| Model id |
|---|
| meta_llama/meta_llama_3.1_8b_instruct_lora |
| gemma-2-2b-it-lora |
| llama-3.3-70b-lora |
What you should do
- Review your current usage for any dependencies on deprecated models
- Migrate to supported models available in the platform
- Reach out to our Sales team to discuss Dedicated Endpoints if the option you’re looking for is not available on the platform.
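To help with the first step above, the sketch below checks a list of model IDs your application uses against the deprecated IDs from the tables on this page. It assumes that the model names you pass to the API differ from the IDs in these tables only by letter case and by `-` versus `_` (for example, `deepseek-ai/DeepSeek-R1-0528` versus `deepseek_ai/deepseek_r1_0528`). This is an assumption, not a documented mapping, so verify any match against the model list on the platform. The `normalize` and `find_deprecated` helpers and the example model name `some-org/some-current-model` are hypothetical.

```python
# Deprecated model IDs, copied from the tables on this page.
DEPRECATED_MODELS = {
    # Text-to-Text
    "zai_org/glm_4.7_fp8", "minimaxai/minimax_m2.1",
    "deepseek_ai/deepseek_r1_0528", "deepseek_ai/deepseek_v3_0324",
    "meta_llama/llama_3.3_70b_instruct_fast", "qwen/qwen3_coder_480b_a35b_instruct",
    "moonshotai/kimi_k2_instruct", "moonshotai/kimi_k2_thinking",
    "deepseek_ai/deepseek_r1_0528_fast", "deepseek_ai/deepseek_v3_0324_fast",
    "openai/gpt_oss_20b", "zai_org/glm_4.5", "qwen/qwen3_32b_fast",
    "qwen/qwen3_235b_a22b_thinking_2507", "zai_org/glm_4.5_air",
    "qwen/qwen3_30b_a3b_thinking_2507", "qwen/qwen3_coder_30b_a3b_instruct",
    "meta_llama/meta_llama_3.1_8b_instruct", "meta_llama/meta_llama_3.1_8b_instruct_fast",
    "google/gemma_2_9b_it_fast", "qwen/qwen2.5_coder_7b_fast", "baai/bge_en_icl",
    "nvidia/nemotron_nano_v2_12b", "google/gemma_3_27b_it_fast",
    "meta_llama/llama_guard_3_8b", "baai/bge_multilingual_gemma2",
    "intfloat/e5_mistral_7b_instruct", "google/gemma_2_2b_it",
    # Text-to-Image
    "black_forest_labs/flux_schnell", "black_forest_labs/flux_dev",
    # LoRA per-token serverless endpoints
    "meta_llama/meta_llama_3.1_8b_instruct_lora", "gemma-2-2b-it-lora", "llama-3.3-70b-lora",
}


def normalize(model_id: str) -> str:
    """Hypothetical helper: lowercase and map '-' to '_'.

    ASSUMPTION: API model names differ from the IDs above only by case
    and by '-' versus '_'. Verify matches against the platform's model list.
    """
    return model_id.strip().lower().replace("-", "_")


# Normalize the deprecated IDs once so lookups are simple set membership.
_DEPRECATED_NORMALIZED = {normalize(m) for m in DEPRECATED_MODELS}


def find_deprecated(model_ids):
    """Return the sorted, de-duplicated model IDs that match a deprecated entry."""
    return sorted({m for m in model_ids if normalize(m) in _DEPRECATED_NORMALIZED})


if __name__ == "__main__":
    # Hypothetical list of model names collected from your code or config.
    in_use = [
        "deepseek-ai/DeepSeek-R1-0528",
        "some-org/some-current-model",
        "black_forest_labs/flux_dev",
    ]
    print(find_deprecated(in_use))
```

Because the matching is an exact comparison after normalization, a still-supported variant (for example, a non-`_fast` version of a listed `_fast` model) is not flagged.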
For production workloads, higher stability requirements, or custom configurations, we recommend using Dedicated Endpoints, which provide:
- Full control over model versions
- Predictable performance and scaling
- Enterprise-grade reliability and isolation
Need help?
If you’re impacted or want to ensure a smooth transition, our team is ready to help.
👉 Contact our Sales team: Link
👉 Contact our Support team: tokenfactory-support@nebius.com
We can support you with:
- Migration planning
- Model selection
- Dedicated endpoint setup