Documentation Index
Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt
Use this file to discover all available pages before exploring further.
Parameters Description
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Display name for the endpoint |
| description | string | no | Optional description |
| model_name | string | yes | Template model name (e.g. openai/gpt-oss-120b) |
| flavor_name | string | yes | Template flavor (e.g. base, fast) |
| gpu_type | string | yes | GPU type supported by the chosen template + flavor |
| gpu_count | integer | yes | Number of GPUs per replica. Total maximum GPUs = gpu_count × scaling.max_replicas |
| region | string | yes | One of eu-north1, eu-west1, us-central1 |
| scaling.min_replicas | integer | yes | Minimum number of replicas |
| scaling.max_replicas | integer | yes | Maximum number of replicas |
Update Endpoint Configuration
PATCH updates only the provided fields.
PATCH /v0/dedicated_endpoints/{endpoint_id}
The endpoint’s region cannot be changed after the dedicated endpoint is created.
```python
import os, json, requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]
endpoint_id = "<endpoint_id>"

# Only the fields included in the payload are updated.
payload = {
    "name": "GPT-20B Endpoint (updated)",
    "scaling": {"min_replicas": 2, "max_replicas": 4},
}

r = requests.patch(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
r.raise_for_status()
print(json.dumps(r.json(), indent=2))
```
Scaling changes may trigger provisioning of additional replicas. Plan for a short warm-up period where new replicas come online.
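Because new replicas come online asynchronously, callers often poll before routing traffic to the new capacity. A minimal poll-with-timeout helper is sketched below; the actual readiness check depends on fields of the endpoint object that this page does not document, so it is left as a caller-supplied function:

```python
import time

def poll_until(check, timeout_s=300.0, interval_s=5.0):
    """Call `check` repeatedly until it returns True or `timeout_s` elapses.

    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False
```

`check` could, for instance, fetch the endpoint and compare the number of live replicas to the target; those field names would be an assumption, not something documented here.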
List dedicated endpoints
GET /v0/dedicated_endpoints
```python
import os, json, requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]

r = requests.get(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
r.raise_for_status()
print(json.dumps(r.json(), indent=2))
```
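When working with the list response, a small lookup helper saves repeated scans. This sketch assumes the body contains a list of endpoint objects with `name` and `id` fields; the exact response schema is not shown on this page:

```python
def find_endpoint_id(endpoints, name):
    """Return the id of the first endpoint whose name matches, else None.

    `endpoints` is assumed to be a list of dicts with "name" and "id" keys.
    """
    for ep in endpoints:
        if ep.get("name") == name:
            return ep.get("id")
    return None
```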
Delete endpoint
Deletes an endpoint permanently.
DELETE /v0/dedicated_endpoints/{endpoint_id}
This action is permanent. The GPUs reserved for the endpoint’s min_replicas are released for other users.
```python
import os, requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]
endpoint_id = "<endpoint_id>"

r = requests.delete(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
print(r.status_code)  # the response may have no JSON body, so print the status code
r.raise_for_status()
```