Create completion

POST /v1/completions

Example request:
curl --request POST \
  --url https://api.tokenfactory.nebius.com/v1/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "prompt": "Say this is a test",
  "stream": false,
  "stream_options": null,
  "max_tokens": 100,
  "temperature": 1,
  "top_p": 1,
  "n": 1,
  "logprobs": null,
  "echo": false,
  "stop": "<string>",
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "<string>",
  "extra_body": null,
  "service_tier": "auto"
}
'
Example response:

{
  "id": "cmpl-bd18c4194f544c189578cfcb273a2f74",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "text": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?"
    }
  ],
  "created": 1717516032,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 26,
    "prompt_tokens": 13,
    "total_tokens": 39
  }
}
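
For orientation, here is a minimal Python sketch of the same request using the openai SDK (the endpoint is OpenAI-compatible); the NEBIUS_API_KEY environment variable name is an assumption for this example.

import os
from openai import OpenAI

# Point the OpenAI-compatible client at the Token Factory endpoint.
client = OpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1",
    api_key=os.environ["NEBIUS_API_KEY"],  # assumed env var name
)

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=100,
    temperature=1,
)

print(completion.choices[0].text)
print(completion.usage.total_tokens)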

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

ai_project_id
string | null

Current project ID.

Body

application/json
model
string
required

ID of the model to use.

Example:

"meta-llama/Meta-Llama-3.1-70B-Instruct"

prompt
string | string[] | integer[] | integer[][]
required

The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.

Example:

"Say this is a test"

stream
boolean | null
default:false

Enable response streaming.

stream_options
Stream Options · object

If set to {"include_usage": true}, usage statistics are sent with the last chunk of data.

Example:

null
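
As a hedged sketch (reusing the client above, and assuming your openai SDK version accepts stream_options on this endpoint), the final chunk carries the usage object and an empty choices list:

# Stream tokens as they are generated; ask for usage on the last chunk.
stream = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=100,
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].text, end="", flush=True)
    if chunk.usage is not None:  # populated only on the final chunk
        print("\ntotal tokens:", chunk.usage.total_tokens)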

max_tokens
integer | null

The maximum number of tokens to generate in the completion.

Example:

100

temperature
number | null
default:1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

top_p
number | null
default:1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

n
integer | null
default:1

How many completions to generate for each prompt.

logprobs
integer | null

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response.

Example:

null
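
A minimal sketch of reading the returned log probabilities, assuming the legacy OpenAI logprobs shape (parallel tokens / token_logprobs / top_logprobs lists) and the client from above:

# Request the 5 most likely alternatives for each sampled token.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=5,
    logprobs=5,
)
lp = completion.choices[0].logprobs
for token, token_lp, top in zip(lp.tokens, lp.token_logprobs, lp.top_logprobs):
    # sampled token, its logprob, and the top-5 {token: logprob} dict
    print(repr(token), token_lp, top)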

echo
boolean | null
default:false

Echo back the prompt in addition to the completion.

stop
string | string[] | null

Up to 4 sequences where the API will stop generating further tokens.
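
A quick sketch of the array form, reusing the client above; the stop sequences shown are placeholders:

# Stop generation at the first newline or at "END" (placeholder sequences).
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    stop=["\n", "END"],
)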

presence_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

frequency_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logit_bias
Logit Bias · object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
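
For illustration, a hedged sketch of the bias scale; the token IDs below are placeholders, not real IDs for any particular tokenizer:

# Keys are token IDs (as strings) from the model's tokenizer.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    logit_bias={
        "12345": -100,  # placeholder ID: effectively ban this token
        "67890": 5,     # placeholder ID: make this token more likely
    },
)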

user
string | null

A unique identifier representing your end-user, which can help to monitor and detect abuse.

extra_body
Extra Body · object

Extra parameters to include in the request body.

Example:

null
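
A hedged sketch: the openai Python SDK merges an extra_body mapping into the request JSON, which is one way to send this field; the top_k parameter inside it is an assumption, not a documented option.

# Forward non-standard parameters through the request body.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    extra_body={"top_k": 50},  # assumed/example extra parameter
)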

service_tier
enum<string> | null
default:auto

The service tier to use for the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit
Example:

"auto"

Response

OK

id
string
required

A unique identifier for the completion.

object
string
required

The object type, which is always text_completion.

created
integer
required

The Unix timestamp (in seconds) of when the completion was created.

model
string
required

The model used for the completion.

choices
CompletionChoice · object[]
required

A list of completion choices.

usage
Usage · object
required

Usage statistics for the completion request.

service_tier
enum<string>
required

The service tier used for the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit