Billing while active

  • A dedicated endpoint is considered active, accessible, and billable when at least one replica is running.
  • While it is active, the endpoint can serve traffic and billing charges apply.
  • When zero replicas are running, the endpoint is not accessible and billing charges do not apply.
  • Charges may vary depending on your custom contract or work order.
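As a rough illustration of the rule above, the sketch below counts billable hours from a timeline of replica counts. The `Interval` record and the timeline values are hypothetical and not part of any Nebius API; the code only encodes the rule that an endpoint is billable while at least one replica is running and not billable at zero replicas.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical record: a period during which the replica count was constant."""
    hours: float
    running_replicas: int


def is_billable(running_replicas: int) -> bool:
    # Active, accessible, and billable when at least one replica is running.
    return running_replicas >= 1


def billable_hours(timeline: list[Interval]) -> float:
    # Sum only the periods in which the endpoint was active; zero-replica periods are free.
    return sum(i.hours for i in timeline if is_billable(i.running_replicas))


# Hypothetical day: 8 h with 2 replicas, 10 h scaled to zero, 6 h with 1 replica.
timeline = [Interval(8.0, 2), Interval(10.0, 0), Interval(6.0, 1)]
print(billable_hours(timeline))  # 14.0
```

The 10 hours at zero replicas do not count, because the endpoint is inaccessible during that period.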

Autoscaling costs

When autoscaling moves the number of running replicas above the configured minimum, or back down again, billing adjusts dynamically on a pay-as-you-go (PAYG) basis.
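To estimate how autoscaling affects cost, here is a minimal sketch that assumes charges accrue per running replica per hour at a flat rate. That assumption, the `RATE_PER_REPLICA_HOUR` value, and the `Interval` record are hypothetical, not published pricing or a Nebius API; actual rates depend on your plan, contract, or work order.

```python
from dataclasses import dataclass

# HYPOTHETICAL flat rate per replica-hour, for illustration only (not a real price).
RATE_PER_REPLICA_HOUR = 2.50


@dataclass
class Interval:
    """Hypothetical record: a period during which the replica count was constant."""
    hours: float
    running_replicas: int


def replica_hours(timeline: list[Interval]) -> float:
    # Total replica-hours: each period contributes hours x running replicas.
    return sum(i.hours * i.running_replicas for i in timeline)


def burst_replica_hours(timeline: list[Interval], min_replicas: int) -> float:
    # Replica-hours added by autoscaling above the configured minimum.
    return sum(i.hours * max(0, i.running_replicas - min_replicas) for i in timeline)


def estimated_cost(timeline: list[Interval], rate: float = RATE_PER_REPLICA_HOUR) -> float:
    # Assumes cost scales linearly with replica-hours (a PAYG assumption).
    return replica_hours(timeline) * rate


# min_replicas = 1; autoscaling adds two replicas for a 4-hour peak, then scales back.
timeline = [Interval(10.0, 1), Interval(4.0, 3), Interval(10.0, 1)]
print(replica_hours(timeline))           # 32.0
print(burst_replica_hours(timeline, 1))  # 8.0
print(estimated_cost(timeline))          # 80.0
```

In this example, 8 of the 32 replica-hours come from scaling above the minimum; under the flat-rate assumption, that share of the estimated charge disappears once the endpoint scales back down.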

Capacity retention

As long as your endpoint remains active, the capacity for your minimum number of replicas stays allocated to you. Once that capacity is released, it may be reassigned to other users and may not be immediately available again.