# Capacity, availability, and service guarantees

### Capacity availability

Dedicated endpoint deployment depends on the capacity currently available in each region, so not every listed GPU type is guaranteed to be available at all times.

### Minimum replicas guarantee

Minimum replicas are reserved and non-preemptible: they remain allocated to your deployment for as long as the endpoint is active.

### Maximum replicas scaling

Scaling above the minimum number of replicas depends on available burst capacity, so these additional replicas are not guaranteed indefinitely. After a scale-down, additional replicas may be reclaimed, and they may not be available again until sufficient capacity frees up.
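To make the difference between reserved and burst replicas concrete, here is a minimal Python sketch that models the rules in this section. The `plan_replicas` function, the `ReplicaPlan` class, and the `free_burst_capacity` input are hypothetical illustrations, not part of the Token Factory API, and the real scheduler may behave differently.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPlan:
    """Hypothetical breakdown of the replicas an endpoint ends up with."""
    reserved: int  # minimum replicas: reserved and non-preemptible
    burst: int     # replicas above the minimum, granted from burst capacity

    @property
    def total(self) -> int:
        return self.reserved + self.burst


def plan_replicas(min_replicas: int, max_replicas: int,
                  desired: int, free_burst_capacity: int) -> ReplicaPlan:
    """Illustrative model of the scaling rules (an assumption, not the real scheduler).

    - The minimum is always held, regardless of current load.
    - Replicas above the minimum are capped by max_replicas and by
      whatever burst capacity happens to be free in the region.
    """
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")
    # Demand above the minimum, limited by the configured maximum.
    wanted_burst = max(0, min(desired, max_replicas) - min_replicas)
    # Burst replicas are granted only if capacity is free; none are guaranteed.
    granted_burst = min(wanted_burst, max(0, free_burst_capacity))
    return ReplicaPlan(reserved=min_replicas, burst=granted_burst)


print(plan_replicas(2, 8, desired=6, free_burst_capacity=10).total)  # 6
print(plan_replicas(2, 8, desired=6, free_burst_capacity=1).total)   # 3
print(plan_replicas(2, 8, desired=1, free_burst_capacity=10).total)  # 2
```

In this model, low burst capacity never pushes an endpoint below `min_replicas`; it only limits how far above the minimum the endpoint can grow.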

### SLA

Self-service dedicated endpoints do not include a formal SLA unless one is covered by contract. The historical average monthly request success rate has been approximately 99.9%.
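As a rough illustration of what a 99.9% success rate means in request terms, assume a hypothetical endpoint that serves 10,000,000 requests in a month. At that historical rate, roughly one request in a thousand fails; this is an observed average, not a commitment:

$$
N_{\text{failed}} \approx N \times (1 - 0.999) = 10{,}000{,}000 \times 0.001 = 10{,}000
$$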

<Note>
  If you encounter capacity errors or cannot scale beyond the minimum number of replicas, [contact our Sales team to reserve dedicated GPU capacity](https://nebius.com/services/token-factory/enterprise-grade-inference#token-factory-enterprise-sales-form).
</Note>
