Documentation Index
Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt
Use this file to discover all available pages before exploring further.
Terminology
| Term | Description |
|---|---|
| Template | A deployable performance “blueprint” for a model. Templates define which flavor_name, gpu_type, and regions are supported. |
| Flavor | A template’s sub-option (e.g. base, fast) with different performance/throughput/costs characteristics. |
| Endpoint | Dedicated deployment with API access |
| endpoint_id | Identifier used for update/delete operations. |
| routing_key | The model identifier you pass to inference calls. Returned when you create an endpoint. |
| Control plane | Sets up the configuration and settings of your endpoints. Has common base URL API. |
| Data plane | Processes model inference requests. Has regional base URL API. |
Control plane
Dedicated Endpoints Control Plane is a managedment layer for all configurations operations:- Creating & updating endpoints
- Uploading models / weights
- Scaling configs (min/max replicas)
- Monitoring setup
Data plane
Dedicated Ednpoints Data Plane processes model inference requests. It has regional base URL API. Region impacts latency, data locality, and regulatory compliance. Use a region-appropriate base URL for your inference calls:| Endpoint region | Inference base URL |
|---|---|
eu-north1 | https://api.tokenfactory.nebius.com |
eu-west1 | https://api.tokenfactory.eu-west1.nebius.com |
us-central1 | https://api.tokenfactory.us-central1.nebius.com |
Using the respective inference base URL avoids unnecessary global routing and reduces round-trip latency.
Endpoint Observability
Observability
Check out Observability Documentation section here