> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

**Dedicated endpoints provide isolated, configurable deployments** of supported models and their performance templates. Use the control plane to create and manage deployments, and the data plane to run inference through OpenAI-compatible APIs.

With dedicated endpoints, you control:

<Card title="Region" icon="globe" horizontal>
  Choose where your deployment runs to optimize latency and meet data residency requirements.
</Card>

<Card title="GPU configuration" icon="microchip" horizontal>
  Select GPU type and GPUs per replica to match your performance and throughput needs.
</Card>

<Card title="Autoscaling" icon="arrow-down-arrow-up" horizontal>
  Set minimum and maximum replicas to automatically scale capacity with traffic.
</Card>

<Card title="Lifecycle management" icon="circle" horizontal>
  Create, update, stop, and delete deployments as your workloads evolve.
</Card>

## Key use cases:

* predictable capacity
* finetuned base model with custom weights
* compliance / private infra
* bigger control over deployment

## Dedicated vs Public Endpoints Comparison

| Feature                | Dedicated Endpoints                                                   | Public Serverless Endpoints                   |
| :--------------------- | :-------------------------------------------------------------------- | :-------------------------------------------- |
| Capacity               | Isolated capacity reserved for your organization                      | Shared multi-tenant capacity                  |
| Rate limits            | No standard rate limits; throughput depends on your deployed capacity | Dynamic rate limits apply                     |
| Data residency         | Deployment region is fixed and user-selected                          | Region may change based on available capacity |
| Autoscaling            | You control minimum and maximum replicas                              | Platform-managed with predefined limits       |
| Custom weights support | Supported for eligible models                                         | Base models only                              |
| Pricing                | Per GPU/hour, billed with per-minute granularity                      | Per token                                     |

## Learn more

<Card title="Deploy via API" icon="code" horizontal href="/ai-models-inference/dedicated-endpoints/deploy-api " />

<Card title="Deploy via UI" icon="arrow-pointer" horizontal href="/deploy-in-ui-1" />

<Card title="FAQ" icon="question" horizontal href="/ai-models-inference/troubleshooting" />
