Documentation Index

Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt

Use this file to discover all available pages before exploring further.

You can create a dedicated endpoint from either of these UI locations:
  1. The Explore page: https://tokenfactory.nebius.com/
  2. Inference → Model Endpoints: https://tokenfactory.nebius.com/models
From either location, select a supported model template and complete the deployment configuration, including region, GPU configuration, and autoscaling settings.

Walkthrough on UI deployment

Prefer automation?
For API-based deployment, see the Deploy in API section.
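An API-based deployment typically specifies the same parameters as the UI form. The sketch below only builds such a request payload with the Python standard library; the field names (`model`, `region`, `gpus_per_replica`, `autoscaling`) and values are illustrative assumptions, not the documented API schema — see the Deploy in API section for the actual contract.

```python
import json

# Hypothetical deployment payload. Field names are illustrative
# assumptions; consult the Deploy in API section for the real schema.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",   # assumed model id
    "region": "eu-north1",                          # assumed region name
    "gpus_per_replica": 8,
    "autoscaling": {"min_replicas": 1, "max_replicas": 4},
}

# Serialize the payload as it would be sent in a request body.
body = json.dumps(payload, indent=2)
print(body)
```

The same parameters appear later on the endpoint card (GPUs per replica, minimum and maximum replicas), so keeping them in a config file or script makes redeployments reproducible.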

Using the endpoint

Go to Inference → Model Endpoints and open your private endpoint card.
There, you can view key deployment details, including:
  • Endpoint ID
  • Routing key
  • Model
  • GPUs per replica
  • Minimum and maximum replicas
  • Deployment status
  • Ready-to-use code snippets
To update configuration, click Edit Endpoint.
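The ready-to-use snippets on the endpoint card follow the OpenAI-compatible chat completions format. Below is a minimal sketch of assembling such a request with the standard library only; the base URL, the routing-key header name, and the model id are assumptions for illustration — copy the exact values from your endpoint card's snippets.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenfactory.nebius.com/v1"  # assumed base URL; use the one from your endpoint card
API_KEY = "YOUR_API_KEY"
ROUTING_KEY = "your-routing-key"  # shown on the endpoint card

# OpenAI-compatible chat completions request body.
request_body = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(request_body).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        # Hypothetical header carrying the endpoint's routing key:
        "X-Routing-Key": ROUTING_KEY,
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
print(req.full_url)
```

Because the API is OpenAI-compatible, any OpenAI SDK can be pointed at the same base URL instead of hand-building requests.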
To check observability metrics, you can either:
  1. Open the dedicated endpoint's model card and click the Observability button below it
  2. Go to the Observability section and filter by your endpoint: https://tokenfactory.nebius.com/observability
Read more in the Observability section.

Walkthrough on Operating Dedicated Endpoints

See Walkthrough on Operating Dedicated Endpoints for deployment management, scaling, and operational best practices.