Capacity availability

Deploying a dedicated endpoint depends on the capacity currently available in each region, so not every listed GPU type may be available at all times.

Minimum replicas guarantee

Minimum replicas are reserved and non-preemptible. They remain allocated to your deployment for as long as the endpoint is active.

Maximum replicas scaling

Scaling above the minimum number of replicas depends on available burst capacity and is not guaranteed. Additional replicas may be reclaimed after scale-down, and they may not be available again unless sufficient capacity exists at that time.
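The difference between reserved and burst replicas can be illustrated with a small model. This is a sketch for reasoning about capacity planning, not the platform's actual scheduler: the function `effective_replicas` and its parameters (`desired`, `burst_available`) are hypothetical names introduced here, and the real allocation logic may differ.

```python
def effective_replicas(
    min_replicas: int,
    max_replicas: int,
    desired: int,
    burst_available: int,
) -> int:
    """Replicas a deployment would receive under this simplified model.

    Assumption (illustrative only): the reserved minimum is always granted,
    and anything above it is limited by the burst capacity that happens to
    be free at that moment.
    """
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")
    if burst_available < 0:
        raise ValueError("burst_available must be non-negative")

    # Clamp the requested replica count into the configured range.
    target = max(min_replicas, min(desired, max_replicas))

    # Replicas above the minimum are best-effort.
    burst_granted = min(target - min_replicas, burst_available)
    return min_replicas + burst_granted


# Plenty of burst capacity: the full target is reached.
print(effective_replicas(min_replicas=2, max_replicas=8, desired=6, burst_available=10))

# Only one burst replica is free: the deployment stops at min + 1.
print(effective_replicas(min_replicas=2, max_replicas=8, desired=6, burst_available=1))

# No burst capacity at all: the reserved minimum is still served.
print(effective_replicas(min_replicas=2, max_replicas=8, desired=6, burst_available=0))
```

Under this model, size the minimum for the load you must always serve, and treat everything between the minimum and the maximum as best-effort headroom.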

SLA

Self-service dedicated endpoints do not include a formal SLA unless one is covered by your contract. Historically, the average monthly request success rate has been approximately 99.9%.
If you encounter capacity errors or are unable to scale beyond the minimum number of replicas, contact our Sales team to reserve dedicated GPU capacity.
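To put the 99.9% figure in perspective, the arithmetic below estimates how many requests would fail at a given volume. It is a rough sketch: the historical rate is an average rather than a guarantee, the helper names are hypothetical, and the retry estimate assumes failures are independent, which may not hold during a capacity incident.

```python
def expected_failures(requests: int, success_rate: float) -> float:
    """Expected number of failed requests at a given success rate."""
    return requests * (1.0 - success_rate)


def all_attempts_fail_probability(success_rate: float, attempts: int) -> float:
    """Probability that every attempt fails, assuming independent failures."""
    return (1.0 - success_rate) ** attempts


monthly_requests = 1_000_000   # illustrative volume, not a real figure
historical_rate = 0.999        # approximate historical average from this page

# About 1 in 1,000 requests would be expected to fail.
print(round(expected_failures(monthly_requests, historical_rate)))

# With up to 3 attempts per request (an assumed client-side retry policy).
print(all_attempts_fail_probability(historical_rate, attempts=3))
```

At this rate, a workload of one million requests per month should plan for roughly a thousand failed requests, so client-side retries or fallbacks are worth considering when no contractual SLA applies.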