Troubleshooting common errors
| Status | Typical cause | Fix |
|---|---|---|
| 401 Unauthorized | Missing/invalid token | Ensure `Authorization: Bearer ...` is set and the token is active |
| 403 Forbidden | Token lacks permission | Use a token with dedicated endpoint permissions |
| 404 Not Found | Endpoint not ready, wrong inference domain, or wrong `routing_key` | Wait for readiness; use the correct inference base URL for the endpoint region; pass `routing_key` exactly as returned |
| 409 Conflict | Capacity or config conflict | Choose a supported GPU/region combo from templates or reduce max replicas |
| 422 Unprocessable Entity | Invalid payload values | Validate fields against templates (GPU types, flavors, regions, counts) |
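The table above can be turned into a small client-side helper. The following is a minimal sketch, not part of any official SDK: `call_with_retry`, `REMEDIATION`, and `RETRYABLE` are hypothetical names, and treating only 404 as retryable (because the endpoint may still be starting) is an assumption. The `send` callable stands in for whatever code issues the request and returns the HTTP status code.

```python
import time
from typing import Callable

# Remediation hints taken from the troubleshooting table, keyed by HTTP status.
REMEDIATION = {
    401: "Set the Authorization: Bearer header and check that the token is active.",
    403: "Use a token with dedicated endpoint permissions.",
    404: "Wait for readiness; check the inference base URL and routing_key.",
    409: "Choose a supported GPU/region combo or reduce max replicas.",
    422: "Validate payload fields against the templates.",
}

# Assumption: only 404 is worth retrying, since the endpoint may not be ready yet.
RETRYABLE = {404}


def call_with_retry(
    send: Callable[[], int],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = send()
    for _ in range(attempts - 1):
        if status not in RETRYABLE:
            break
        sleep(delay_s)  # give the endpoint time to become ready
        status = send()
    if status in REMEDIATION:
        # Surface the table's fix alongside the status code.
        raise RuntimeError(f"HTTP {status}: {REMEDIATION[status]}")
    return status
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays; in production the default `time.sleep` applies.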
I fine-tuned a model and want to deploy it via a Dedicated Endpoint
Working with custom model weights is currently in beta and available on request. If you’d like to deploy or work with custom fine-tuned model weights, please contact our Support team to enable access and guide you through the current setup process. Availability, supported configurations, and onboarding steps may vary during the beta period.
Base model is not available in the list
We aim to support the most popular models, balancing variety and performance. If the model you’re looking for is not in the list, please contact our Support team and include:
- The exact model you’re looking for, with a link to its Hugging Face weights
- Your project and traffic details
Region is not available in the list
We support several datacenters worldwide, and deployment options depend on available capacity. Sometimes the region and GPU type combination you’re looking for may not be available. You can still contact our Sales team to plan a capacity reservation in a specific region.
Can I bring down my Dedicated Endpoint when I don’t have traffic so I don’t overpay?
Yes, use the `enabled` parameter in the request to enable or disable the endpoint. Read more on this in the Operations section.
Note that stopping the endpoint releases its capacity, and you may not be able to start it again if that capacity has been taken by another user and no free capacity is available at the moment.
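As an illustration of toggling the endpoint, here is a minimal sketch that builds (but does not send) an update request carrying the `enabled` field. Only the `enabled` parameter and the `Authorization: Bearer` header come from this page. The URL, the `PATCH` method, and the helper name `build_toggle_request` are assumptions made for the example; take the real values from the Operations section.

```python
import json
import urllib.request

# Hypothetical placeholder URL: replace with the update URL from the Operations section.
ENDPOINT_URL = "https://api.example.com/dedicated_endpoints/ENDPOINT_ID"


def build_toggle_request(
    enabled: bool, token: str, url: str = ENDPOINT_URL
) -> urllib.request.Request:
    """Build a request that sets the endpoint's `enabled` flag."""
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",  # assumption: the actual HTTP method may differ
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Disable the endpoint while there is no traffic.
req = build_toggle_request(False, token="YOUR_TOKEN")
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

Because re-enabling can fail when capacity has been taken in the meantime, it is worth checking the response status after sending a request with `enabled` set back to true.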