Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt

Use this file to discover all available pages before exploring further.

Terminology

TermDescription
TemplateA deployable performance “blueprint” for a model. Templates define which flavor_name, gpu_type, and regions are supported.
FlavorA template’s sub-option (e.g. base, fast) with different performance/throughput/costs characteristics.
EndpointDedicated deployment with API access
endpoint_idIdentifier used for update/delete operations.
routing_keyThe model identifier you pass to inference calls. Returned when you create an endpoint.
Control planeSets up the configuration and settings of your endpoints. Has common base URL API.
Data planeProcesses model inference requests. Has regional base URL API.

Control plane

Dedicated Endpoints Control Plane is a managedment layer for all configurations operations:
  • Creating & updating endpoints
  • Uploading models / weights
  • Scaling configs (min/max replicas)
  • Monitoring setup
Use common base URL API for all dedicated endpoint management operations:
https://api.tokenfactory.nebius.com

Data plane

Dedicated Ednpoints Data Plane processes model inference requests. It has regional base URL API. Region impacts latency, data locality, and regulatory compliance. Use a region-appropriate base URL for your inference calls:
Endpoint regionInference base URL
eu-north1https://api.tokenfactory.nebius.com
eu-west1https://api.tokenfactory.eu-west1.nebius.com
us-central1https://api.tokenfactory.us-central1.nebius.com
Using the respective inference base URL avoids unnecessary global routing and reduces round-trip latency.

Endpoint Observability

Observability

Check out Observability Documentation section here