Capacity, availability, and service guarantees

Capacity availability

Dedicated endpoint deployment depends on the capacity currently available in each region. Not every listed GPU type is available in every region at all times.

Minimum replicas guarantee

Minimum replicas are reserved and non-preemptible and remain allocated to your deployment for as long as the endpoint is active.

Maximum replicas scaling

Scaling above the minimum replica count depends on available burst capacity and is not guaranteed. Additional replicas may be reclaimed after scale-down, and they can be allocated again only if sufficient capacity is available at that time.
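
Because only the minimum replicas are reserved, one way to reason about sizing is to let the minimum cover the traffic that must always be served and to treat everything above it as best-effort. The following is a minimal sketch under stated assumptions, not a platform API: `plan_replicas`, `baseline_rps`, `peak_rps`, and `per_replica_rps` are hypothetical names, and the per-replica throughput is a number you would have to measure for your own model and GPU type.

```python
import math


def plan_replicas(baseline_rps: float, peak_rps: float, per_replica_rps: float) -> tuple[int, int]:
    """Return (min_replicas, max_replicas) for a hypothetical workload.

    baseline_rps:    traffic that must always be served (covered by reserved replicas)
    peak_rps:        highest expected traffic (covered only if burst capacity exists)
    per_replica_rps: measured throughput of a single replica (an assumption you supply)
    """
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    if peak_rps < baseline_rps:
        raise ValueError("peak_rps must be at least baseline_rps")

    # Reserved, non-preemptible replicas: size these for the guaranteed load.
    min_replicas = max(1, math.ceil(baseline_rps / per_replica_rps))
    # Burst ceiling: best-effort, never lower than the reserved count.
    max_replicas = max(min_replicas, math.ceil(peak_rps / per_replica_rps))
    return min_replicas, max_replicas


# Example: 45 req/s must always be served, peaks reach 120 req/s,
# and one replica is assumed to sustain 10 req/s.
print(plan_replicas(45, 120, 10))
```

With these illustrative numbers, only the first value in the returned pair is backed by reserved capacity; the gap between the two values is traffic that may be throttled or queued when burst capacity is unavailable.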

SLA

Self-service dedicated endpoints do not include a formal SLA unless one is covered by contract. The historical average monthly request success rate has been approximately 99.9%.
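
To put the 99.9% figure in concrete terms, the short calculation below estimates how many requests would fail in a month at a given success rate. It is only an arithmetic illustration: `expected_failures` is a hypothetical helper, the request volume is made up, and the historical average is not a guarantee for any particular month.

```python
def expected_failures(total_requests: int, success_rate: float = 0.999) -> int:
    """Estimate failed requests for a given volume and success rate.

    success_rate defaults to the historical average quoted above;
    it is an observed figure, not a contractual commitment.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # Round to the nearest whole request to absorb floating-point error.
    return round(total_requests * (1.0 - success_rate))


# Illustrative volume: one million requests in a month.
print(expected_failures(1_000_000))
```

Client-side retries for failed requests are a reasonable complement at this failure rate.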

If you encounter capacity errors or cannot scale beyond the minimum number of replicas, contact our Sales team to reserve dedicated GPU capacity.
