Documentation Index

Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt

Use this file to discover all available pages before exploring further.

You can create a dedicated endpoint from either of these UI locations:
  1. The Explore page: https://tokenfactory.nebius.com/
  2. Inference → Model Endpoints: https://tokenfactory.nebius.com/models
From either location, select a supported model template and complete the deployment configuration, including region, GPU configuration, and autoscaling settings.

Walkthrough on UI deployment

Prefer automation?
For API-based deployment, see the Deploy in API section.
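An API-based deployment typically specifies the same parameters as the UI form. The sketch below only builds such a request payload with the Python standard library; the field names (`model`, `region`, `gpus_per_replica`, `autoscaling`) and values are illustrative assumptions, not the documented API schema — see the Deploy in API section for the actual contract.

```python
import json

# Hypothetical deployment payload. Field names are illustrative
# assumptions; consult the Deploy in API section for the real schema.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",   # assumed model id
    "region": "eu-north1",                          # assumed region name
    "gpus_per_replica": 8,
    "autoscaling": {"min_replicas": 1, "max_replicas": 4},
}

# Serialize the payload as it would be sent in a request body.
body = json.dumps(payload, indent=2)
print(body)
```

The same parameters appear later on the endpoint card (GPUs per replica, minimum and maximum replicas), so keeping them in a config file or script makes redeployments reproducible.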

Using the endpoint

Go to Inference → Model Endpoints and open your private endpoint card.
There, you can view key deployment details, including:
  • Endpoint ID
  • Routing key
  • Model
  • GPUs per replica
  • Minimum and maximum replicas
  • Deployment status
  • Ready-to-use code snippets
To update configuration, click Edit Endpoint.
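The ready-to-use snippets on the endpoint card follow the OpenAI-compatible chat completions format. Below is a minimal sketch of assembling such a request with the standard library only; the base URL, the routing-key header name, and the model id are assumptions for illustration — copy the exact values from your endpoint card's snippets.

```python
import json
import urllib.request

BASE_URL = "https://api.tokenfactory.nebius.com/v1"  # assumed base URL; use the one from your endpoint card
API_KEY = "YOUR_API_KEY"
ROUTING_KEY = "your-routing-key"  # shown on the endpoint card

# OpenAI-compatible chat completions request body.
request_body = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(request_body).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        # Hypothetical header carrying the endpoint's routing key:
        "X-Routing-Key": ROUTING_KEY,
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
print(req.full_url)
```

Because the API is OpenAI-compatible, any OpenAI SDK can be pointed at the same base URL instead of hand-building requests.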
To check observability metrics, you can either:
  1. Open the dedicated endpoint's model card and click the Observability button below it
  2. Go to the Observability section and filter by your endpoint: https://tokenfactory.nebius.com/observability
Read more in the Observability section.

Walkthrough on Operating Dedicated Endpoints

See Walkthrough on Operating Dedicated Endpoints for deployment management, scaling, and operational best practices.