> ## Documentation Index > Fetch the complete documentation index at: https://docs.tokenfactory.nebius.com/llms.txt > Use this file to discover all available pages before exploring further. # Overview **Dedicated endpoints provide isolated, configurable deployments** of supported models and their performance templates. Use the control plane to create and manage deployments, and the data plane to run inference through OpenAI-compatible APIs. With dedicated endpoints, you control: Choose where your deployment runs to optimize latency and meet data residency requirements. Select GPU type and GPUs per replica to match your performance and throughput needs. Set minimum and maximum replicas to automatically scale capacity with traffic. Create, update, stop, and delete deployments as your workloads evolve. ## Key use cases: * predictable capacity * finetuned base model with custom weights * compliance / private infra * bigger control over deployment ## Dedicated vs Public Endpoints Comparison | Feature | Dedicated Endpoints | Public Serverless Endpoints | | :--------------------- | :-------------------------------------------------------------------- | :-------------------------------------------- | | Capacity | Isolated capacity reserved for your organization | Shared multi-tenant capacity | | Rate limits | No standard rate limits; throughput depends on your deployed capacity | Dynamic rate limits apply | | Data residency | Deployment region is fixed and user-selected | Region may change based on available capacity | | Autoscaling | You control minimum and maximum replicas | Platform-managed with predefined limits | | Custom weights support | Supported for eligible models | Base models only | | Pricing | Per GPU/hour, billed with per-minute granularity | Per token | ## Learn more