Troubleshooting common errors

| Status | Typical cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Missing/invalid token | Ensure Authorization: Bearer ... is set and the token is active |
| 403 Forbidden | Token lacks permission | Use a token with dedicated endpoint permissions |
| 404 Not Found | Endpoint not ready, wrong inference domain, or wrong routing_key | Wait for readiness; use the correct inference base URL for the endpoint region; pass routing_key exactly as returned |
| 409 Conflict | Capacity or config conflict | Choose a supported GPU/region combo from templates or reduce max replicas |
| 422 Unprocessable Entity | Invalid payload values | Validate fields against templates (GPU types, flavors, regions, counts) |
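The table above can be turned into a small client-side lookup so that a failed request is logged with the suggested fix. The following is a minimal sketch, not part of any official SDK: the names TROUBLESHOOTING and explain_status are hypothetical, and the hint text is transcribed from the table.

```python
# Hints transcribed from the troubleshooting table.
# TROUBLESHOOTING and explain_status are illustrative names, not an official API.
TROUBLESHOOTING = {
    401: ("Unauthorized",
          "Ensure Authorization: Bearer ... is set and the token is active"),
    403: ("Forbidden",
          "Use a token with dedicated endpoint permissions"),
    404: ("Not Found",
          "Wait for readiness; use the correct inference base URL for the "
          "endpoint region; pass routing_key exactly as returned"),
    409: ("Conflict",
          "Choose a supported GPU/region combo from templates or reduce max replicas"),
    422: ("Unprocessable Entity",
          "Validate fields against templates (GPU types, flavors, regions, counts)"),
}


def explain_status(status_code: int) -> str:
    """Return a one-line hint for an HTTP status code listed in the table."""
    entry = TROUBLESHOOTING.get(status_code)
    if entry is None:
        # Codes outside the table get a neutral message instead of a guess.
        return f"{status_code}: not covered by the troubleshooting table"
    name, fix = entry
    return f"{status_code} {name}: {fix}"


if __name__ == "__main__":
    print(explain_status(404))
```

Calling explain_status with the status code of a failed response gives a message that can go straight into logs or an error report.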

I fine-tuned a model and want to deploy it via a Dedicated Endpoint

Working with custom model weights is currently in beta and available on request. If you’d like to deploy or work with custom fine-tuned model weights, please contact our Support team to enable access and guide you through the current setup process. Availability, supported configurations, and onboarding steps may vary during the beta period.

Base model is not available in the list

We aim to support the most popular models, balancing variety and performance. If the model you’re looking for is not in the list, please contact our Support team and include:
  • The exact model you’re looking for, with a link to its Hugging Face weights
  • Your project and traffic details
Our team will review your case and decide whether to support the model.

Region is not available in the list

We support several datacenters worldwide, and deployment options depend on the capacity available. The region and GPU type combination you’re looking for may sometimes be unavailable. In that case, you can contact our Sales team to plan a capacity reservation in a specific region.

Can I bring down my Dedicated Endpoint when I don’t have traffic so I don’t overpay?

Yes. Use the enabled parameter in the request to enable or disable the endpoint. Read more about this in the Operations section.
Note that stopping the endpoint releases its capacity, so you may not be able to start it again if that capacity has been taken by another user and no free capacity is available at the moment.
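As a rough illustration of toggling the endpoint, the sketch below builds (but does not send) a request that sets enabled. Only the enabled parameter and the Authorization: Bearer header come from this page. The base URL, the /endpoints/{id} path, the PATCH method, and the function name build_toggle_request are assumptions made for the example; check the Operations section for the actual request shape.

```python
import json
import urllib.request


def build_toggle_request(base_url: str, endpoint_id: str, token: str,
                         enabled: bool) -> urllib.request.Request:
    """Build a request that sets the `enabled` parameter of an endpoint.

    ASSUMPTIONS: the URL layout and the PATCH method are hypothetical;
    only `enabled` and Bearer authentication are taken from the docs.
    """
    url = f"{base_url.rstrip('/')}/endpoints/{endpoint_id}"  # hypothetical path
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",  # assumed HTTP method
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_toggle_request("https://api.example.com", "my-endpoint",
                               "YOUR_TOKEN", enabled=False)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To send the request, pass it to urllib.request.urlopen. Because disabling releases capacity, a later call with enabled=True may fail, so callers should handle an error response on re-enable rather than assume the endpoint comes back.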