Dedicated endpoints provide isolated, configurable deployments of supported models and their performance templates. Use the control plane to create and manage deployments, and the data plane to run inference through OpenAI-compatible APIs. With dedicated endpoints, you control:
Region
Choose where your deployment runs to optimize latency and meet data residency requirements.
GPU configuration
Select GPU type and GPUs per replica to match your performance and throughput needs.
Autoscaling
Set minimum and maximum replicas to automatically scale capacity with traffic.
Lifecycle management
Create, update, stop, and delete deployments as your workloads evolve.
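Putting those controls together, a deployment created through the control plane could be described by a spec along these lines. This is an illustrative sketch only: the field names, region, and model identifier below are hypothetical placeholders, not the actual API schema.

```json
{
  "name": "my-llama-deployment",
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "region": "eu-north1",
  "gpu_type": "H100",
  "gpus_per_replica": 8,
  "autoscaling": {
    "min_replicas": 1,
    "max_replicas": 4
  }
}
```

With a spec like this, traffic spikes scale the deployment up to `max_replicas` replicas, and quiet periods scale it back down to `min_replicas`, so capacity tracks demand within the bounds you set.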
Key use cases:
- Predictable capacity for steady or latency-sensitive workloads
- Serving fine-tuned base models with custom weights
- Compliance or data-residency requirements that call for private, isolated infrastructure
- Greater control over deployment configuration and lifecycle
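Once a deployment is running, inference goes through the data plane's OpenAI-compatible API, so a request looks like a standard chat-completions call. Below is a minimal sketch using only the Python standard library; the base URL, API key, and model name are placeholders you would replace with your deployment's real values.

```python
import json
import urllib.request

# Placeholder values: substitute your endpoint's actual base URL,
# API key, and deployed model identifier.
BASE_URL = "https://example.dedicated-endpoint.invalid/v1"
API_KEY = "YOUR_API_KEY"
MODEL = "my-deployed-model"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST /chat/completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
```

Because the API shape is OpenAI-compatible, existing OpenAI client libraries should also work by pointing their base URL at the dedicated endpoint.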
Dedicated vs Public Endpoints Comparison
| Feature | Dedicated Endpoints | Public Serverless Endpoints |
|---|---|---|
| Capacity | Isolated capacity reserved for your organization | Shared multi-tenant capacity |
| Rate limits | No standard rate limits; throughput depends on your deployed capacity | Dynamic rate limits apply |
| Data residency | Deployment region is fixed and user-selected | Region may change based on available capacity |
| Autoscaling | You control minimum and maximum replicas | Platform-managed with predefined limits |
| Custom weights support | Supported for eligible models | Base models only |
| Pricing | Per GPU/hour, billed with per-minute granularity | Per token |
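Because dedicated endpoints are billed per GPU/hour with per-minute granularity, cost scales with replica size and uptime rather than token volume. A quick sketch of that arithmetic; the $3.50/GPU-hour rate here is a made-up placeholder, not an actual price.

```python
def deployment_cost(gpus: int, minutes: int, rate_per_gpu_hour: float) -> float:
    """Cost of running `gpus` GPUs for `minutes` minutes at an hourly
    per-GPU rate, billed with per-minute granularity."""
    return gpus * (minutes / 60) * rate_per_gpu_hour

# Hypothetical example: one 8-GPU replica running for 90 minutes
# at a placeholder rate of $3.50 per GPU-hour:
# 8 GPUs * 1.5 hours * $3.50 = $42.00
cost = deployment_cost(gpus=8, minutes=90, rate_per_gpu_hour=3.50)
```

Per-minute granularity means a deployment stopped after 90 minutes is charged for exactly 1.5 GPU-hours per GPU, not rounded up to a full second hour.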