# Operating Dedicated Endpoints

## Parameter Descriptions

| Field                  | Type    | Required | Description                                                                                         |
| :--------------------- | :------ | :------- | :-------------------------------------------------------------------------------------------------- |
| `name`                 | string  | yes      | Display name for the endpoint                                                                       |
| `description`          | string  | no       | Optional description                                                                                |
| `model_name`           | string  | yes      | Template model name (e.g. `openai/gpt-oss-120b`)                                                    |
| `flavor_name`          | string  | yes      | Template flavor (e.g. `base`, `fast`)                                                               |
| `gpu_type`             | string  | yes      | GPU type supported by the chosen template + flavor                                                  |
| `gpu_count`            | integer | yes      | Number of GPUs per replica. Total maximum GPUs = `gpu_count × scaling.max_replicas`                 |
| `region`               | string  | yes      | One of `eu-north1`, `eu-west1`, or `us-central1`                                                    |
| `scaling.min_replicas` | integer | yes      | Minimum replicas                                                                                    |
| `scaling.max_replicas` | integer | yes      | Maximum replicas                                                                                    |
| `enabled`              | boolean | no       | Enables or disables the endpoint. Disabling stops it and frees its replicas; enabling starts it.    |
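
The `gpu_count` row implies a simple capacity calculation. As a minimal sketch (the helper name `max_total_gpus` is illustrative and not part of the API), the upper bound on GPUs an endpoint can use is:

```python
def max_total_gpus(gpu_count: int, max_replicas: int) -> int:
    """Upper bound on GPUs: gpu_count per replica times scaling.max_replicas."""
    return gpu_count * max_replicas


# Example values chosen for illustration: 8 GPUs per replica, up to 4 replicas.
print(max_total_gpus(gpu_count=8, max_replicas=4))  # 32
```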

## Update Endpoint Configuration

`PATCH` updates only the provided fields.

```http theme={null}
PATCH /v0/dedicated_endpoints/{endpoint_id}
```

<Note>
  The `region` field cannot be changed after the dedicated endpoint is created.
</Note>
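
Since `PATCH` changes only the fields you send, it can help to build the request body from just the values you intend to change. The sketch below shows one way to do that. `build_patch_payload` is a hypothetical helper, not part of the API, and treating `None` as "leave unchanged" is an assumption of this example.

```python
import json

# Fields that cannot be changed after creation, per the note above.
IMMUTABLE_FIELDS = {"region"}


def build_patch_payload(**changes):
    """Return a PATCH body containing only the fields that should change.

    Hypothetical helper: a value of None means "leave this field unchanged".
    """
    blocked = IMMUTABLE_FIELDS.intersection(changes)
    if blocked:
        raise ValueError(f"cannot update immutable field(s): {sorted(blocked)}")
    return {key: value for key, value in changes.items() if value is not None}


payload = build_patch_payload(
    name=None,  # dropped from the body, so the current name is kept
    scaling={"min_replicas": 2, "max_replicas": 4},
)
print(json.dumps(payload))
```

The resulting dictionary can be passed as the `json=` argument of the request shown in the examples that follow.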

<CodeGroup>
  ```python Python theme={null}
  import os, json, requests

  CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
  API_TOKEN = os.environ["API_TOKEN"]
  endpoint_id = "<endpoint_id>"

  payload = {
      "name": "GPT-20B Endpoint (updated)",
      "scaling": {"min_replicas": 2, "max_replicas": 4},
      "enabled": True,
  }

  r = requests.patch(
      f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
      headers={"Authorization": f"Bearer {API_TOKEN}"},
      json=payload,
  )
  r.raise_for_status()
  print(json.dumps(r.json(), indent=2))
  ```

  ```shellscript cURL theme={null}
  curl -sS -X PATCH \
    "https://api.tokenfactory.nebius.com/v0/dedicated_endpoints/<endpoint_id>" \
    -H "Authorization: Bearer $API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "scaling": { "min_replicas": 2, "max_replicas": 4 }
    }'
  ```
</CodeGroup>

<Tip>
  Scaling changes may trigger provisioning of additional replicas. Plan for a short warm-up period while the new replicas come online.
</Tip>
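
One way to handle the warm-up period is to poll until the endpoint reports the state you expect. The sketch below is deliberately generic: `wait_until`, `fetch`, and `is_ready` are hypothetical names, and the readiness check is left to the caller because this page does not describe the response fields.

```python
import time


def wait_until(fetch, is_ready, timeout_s=600.0, interval_s=10.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_ready(result) is true or the timeout passes.

    fetch and is_ready are supplied by the caller; nothing here assumes a
    particular response schema.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not become ready in time")
        sleep(interval_s)
```

In practice, `fetch` could wrap the `GET /v0/dedicated_endpoints` request documented on this page, and `is_ready` would inspect whichever response field reflects replica state in your deployment.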

## List Dedicated Endpoints

```http theme={null}
GET /v0/dedicated_endpoints
```

<CodeGroup>
  ```python Python theme={null}
  import os, json, requests

  CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
  API_TOKEN = os.environ["API_TOKEN"]

  r = requests.get(
      f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints",
      headers={"Authorization": f"Bearer {API_TOKEN}"},
  )
  r.raise_for_status()
  print(json.dumps(r.json(), indent=2))
  ```

  ```shellscript cURL theme={null}
  curl -sS \
    -H "Authorization: Bearer $API_TOKEN" \
    "https://api.tokenfactory.nebius.com/v0/dedicated_endpoints"
  ```
</CodeGroup>

## Delete Endpoint

Deletes an endpoint permanently.

```http theme={null}
DELETE /v0/dedicated_endpoints/{endpoint_id}
```

<Warning>
  This action is permanent. The GPUs associated with `min_replicas` are released and become available to other users.
</Warning>

<CodeGroup>
  ```python Python theme={null}
  import os, requests

  CONTROL_PLANE_BASE_URL = "https://api.tokenfactory.nebius.com"
  API_TOKEN = os.environ["API_TOKEN"]
  endpoint_id = "<endpoint_id>"

  r = requests.delete(
      f"{CONTROL_PLANE_BASE_URL}/v0/dedicated_endpoints/{endpoint_id}",
      headers={"Authorization": f"Bearer {API_TOKEN}"},
  )
  print(r.status_code)
  r.raise_for_status()
  ```

  ```shellscript cURL theme={null}
  curl -sS -X DELETE \
    -H "Authorization: Bearer $API_TOKEN" \
    "https://api.tokenfactory.nebius.com/v0/dedicated_endpoints/<endpoint_id>"
  ```
</CodeGroup>
