Operating Dedicated Endpoint

Parameters Description

Field	Type	Required	Description
`name`	string	yes	Display name for the endpoint
`description`	string	no	Optional description
`model_name`	string	yes	Template model name (e.g. `openai/gpt-oss-120b`)
`flavor_name`	string	yes	Template flavor (e.g. `base`, `fast`)
`gpu_type`	string	yes	GPU type supported by the chosen template + flavor
`gpu_count`	integer	yes	Number of GPUs per replica. Total maximum GPUs = `gpu_count × scaling.max_replicas`
`region`	string	yes	Region for the endpoint: `eu-north1`, `eu-west1`, `us-central1`
`scaling.min_replicas`	integer	yes	Minimum replicas
`scaling.max_replicas`	integer	yes	Maximum replicas
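
The table above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the API: the helper names `missing_required_fields` and `max_total_gpus` are hypothetical, the required-field list is copied from the table, and the `gpu_type` value is left as a placeholder.

```python
# Required fields as listed in the parameters table.
REQUIRED_FIELDS = ("name", "model_name", "flavor_name", "gpu_type", "gpu_count", "region")
REQUIRED_SCALING_FIELDS = ("min_replicas", "max_replicas")


def missing_required_fields(payload: dict) -> list:
    """Return the names of required fields that are absent from the payload."""
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    scaling = payload.get("scaling") or {}
    missing += [f"scaling.{f}" for f in REQUIRED_SCALING_FIELDS if f not in scaling]
    return missing


def max_total_gpus(payload: dict) -> int:
    """Total maximum GPUs = gpu_count x scaling.max_replicas."""
    return payload["gpu_count"] * payload["scaling"]["max_replicas"]


example = {
    "name": "my-endpoint",
    "model_name": "openai/gpt-oss-120b",
    "flavor_name": "base",
    "gpu_type": "<gpu_type>",  # placeholder: use a type supported by the template + flavor
    "gpu_count": 2,
    "region": "eu-north1",
    "scaling": {"min_replicas": 1, "max_replicas": 4},
}

print(missing_required_fields(example))  # []
print(max_total_gpus(example))           # 8
```

With 2 GPUs per replica and up to 4 replicas, the endpoint can use at most 8 GPUs.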

Update Endpoint Configuration

PATCH updates only the provided fields.

PATCH /v0/dedicated_endpoints/{endpoint_id}

The endpoint's region cannot be changed after the dedicated endpoint is created.

import json
import os

import requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]  # API token read from the environment
endpoint_id = "<endpoint_id>"  # ID of the endpoint to update

# Only the fields included here are changed; omitted fields are left as they are.
payload = {
    "name": "GPT-20B Endpoint (updated)",
    "scaling": {"min_replicas": 2, "max_replicas": 4},
}

r = requests.patch(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
r.raise_for_status()  # raise on 4xx/5xx responses
print(json.dumps(r.json(), indent=2))

Scaling changes may trigger provisioning of additional replicas. Plan for a short warm-up period while new replicas come online.
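
Because PATCH sends only the fields you provide and `region` cannot be changed after creation, it can help to build the payload through a small guard. This is a minimal sketch under those two stated rules; `build_patch_payload` and `IMMUTABLE_FIELDS` are hypothetical names, not part of the API.

```python
# Fields the documentation says cannot be updated after creation.
IMMUTABLE_FIELDS = frozenset({"region"})


def build_patch_payload(changes: dict) -> dict:
    """Return a PATCH payload containing only the requested changes.

    Raises ValueError if the changes are empty or touch an immutable field.
    """
    blocked = sorted(IMMUTABLE_FIELDS.intersection(changes))
    if blocked:
        raise ValueError(f"cannot update after creation: {', '.join(blocked)}")
    if not changes:
        raise ValueError("no fields to update")
    return dict(changes)


payload = build_patch_payload({"scaling": {"min_replicas": 2, "max_replicas": 4}})
print(payload)
```

The returned dictionary can be passed as the `json=` argument of the PATCH request shown above.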

List dedicated endpoints

GET /v0/dedicated_endpoints

import json
import os

import requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]  # API token read from the environment

r = requests.get(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
r.raise_for_status()  # raise on 4xx/5xx responses
print(json.dumps(r.json(), indent=2))

Delete endpoint

Deletes an endpoint permanently.

DELETE /v0/dedicated_endpoints/{endpoint_id}

This is a permanent action. The GPUs associated with `min_replicas` are released for other users.

import os

import requests

CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
API_TOKEN = os.environ["API_TOKEN"]  # API token read from the environment
endpoint_id = "<endpoint_id>"  # ID of the endpoint to delete

r = requests.delete(
    f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
print(r.status_code)  # print the status first so it is visible even on failure
r.raise_for_status()  # raise on 4xx/5xx responses
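
Since deletion is permanent, a script may want an explicit confirmation step before sending the request. The sketch below is one possible guard, not a feature of the API: `delete_endpoint` and its `confirm_id` parameter are hypothetical, and `session` is assumed to be a `requests.Session` (or any object with a compatible `delete` method) that already carries the `Authorization` header.

```python
def delete_endpoint(session, base_url: str, endpoint_id: str, confirm_id: str) -> int:
    """Delete an endpoint only if the caller repeats its ID as confirmation.

    Returns the HTTP status code of the DELETE response.
    """
    if confirm_id != endpoint_id:
        # Refuse before any request is sent.
        raise ValueError("confirmation does not match endpoint_id; nothing deleted")
    response = session.delete(
        f"{base_url}/v0/dedicated_endpoints/{endpoint_id}",
        timeout=30,  # assumed value; tune for your environment
    )
    response.raise_for_status()
    return response.status_code
```

A caller would create a `requests.Session`, set `session.headers["Authorization"] = f"Bearer {API_TOKEN}"`, and pass the same ID twice, for example `delete_endpoint(session, CONTROL_PLANE_BASE_URL, endpoint_id, confirm_id=endpoint_id)`.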
