Create completion

POST /v1/completions

Example request:
curl --request POST \
  --url https://api.tokenfactory.nebius.com/v1/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "prompt": "Say this is a test",
  "stream": false,
  "stream_options": null,
  "max_tokens": 100,
  "temperature": 1,
  "top_p": 1,
  "n": 1,
  "logprobs": null,
  "echo": false,
  "stop": "<string>",
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "logit_bias": {},
  "user": "<string>",
  "extra_body": null,
  "service_tier": "auto"
}
'
Example response:

{
  "id": "cmpl-bd18c4194f544c189578cfcb273a2f74",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "text": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?"
    }
  ],
  "created": 1717516032,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 26,
    "prompt_tokens": 13,
    "total_tokens": 39
  }
}
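
For orientation, here is a minimal Python sketch of the same request using the openai SDK (the endpoint is OpenAI-compatible); the NEBIUS_API_KEY environment variable name is an assumption for this example.

import os
from openai import OpenAI

# Point the OpenAI-compatible client at the Token Factory endpoint.
client = OpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1",
    api_key=os.environ["NEBIUS_API_KEY"],  # assumed env var name
)

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=100,
    temperature=1,
)

print(completion.choices[0].text)
print(completion.usage.total_tokens)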

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

ai_project_id
string | null

Current project ID.

Body

application/json
model
string
required

ID of the model to use.

Example:

"meta-llama/Meta-Llama-3.1-70B-Instruct"

prompt
string | string[] | integer[] | integer[][]
required

The prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays.

Example:

"Say this is a test"

stream
boolean | null
default:false

Enable response streaming.

stream_options
Stream Options · object

If set to {"include_usage": true}, usage statistics are sent with the last chunk of data.

Example:

null
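
As a hedged sketch (reusing the client above, and assuming your openai SDK version accepts stream_options on this endpoint), the final chunk carries the usage object and an empty choices list:

# Stream tokens as they are generated; ask for usage on the last chunk.
stream = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=100,
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].text, end="", flush=True)
    if chunk.usage is not None:  # populated only on the final chunk
        print("\ntotal tokens:", chunk.usage.total_tokens)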

max_tokens
integer | null

The maximum number of tokens to generate in the completion.

Example:

100

temperature
number | null
default:1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

top_p
number | null
default:1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

n
integer | null
default:1

How many completions to generate for each prompt.

logprobs
integer | null

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. So for example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response.

Example:

null
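
A minimal sketch of reading the returned log probabilities, assuming the legacy OpenAI logprobs shape (parallel tokens / token_logprobs / top_logprobs lists) and the client from above:

# Request the 5 most likely alternatives for each sampled token.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    max_tokens=5,
    logprobs=5,
)
lp = completion.choices[0].logprobs
for token, token_lp, top in zip(lp.tokens, lp.token_logprobs, lp.top_logprobs):
    # sampled token, its logprob, and the top-5 {token: logprob} dict
    print(repr(token), token_lp, top)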

echo
boolean | null
default:false

Echo back the prompt in addition to the completion.

stop
string | string[] | null

Up to 4 sequences where the API will stop generating further tokens.
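
A quick sketch of the array form, reusing the client above; the stop sequences shown are placeholders:

# Stop generation at the first newline or at "END" (placeholder sequences).
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    stop=["\n", "END"],
)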

presence_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

frequency_penalty
number | null
default:0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

logit_bias
Logit Bias · object

Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
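
For illustration, a hedged sketch of the bias scale; the token IDs below are placeholders, not real IDs for any particular tokenizer:

# Keys are token IDs (as strings) from the model's tokenizer.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    logit_bias={
        "12345": -100,  # placeholder ID: effectively ban this token
        "67890": 5,     # placeholder ID: make this token more likely
    },
)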

user
string | null

A unique identifier representing your end-user, which can help to monitor and detect abuse.

extra_body
Extra Body · object

Extra parameters to include in the request body.

Example:

null
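
A hedged sketch: the openai Python SDK merges an extra_body mapping into the request JSON, which is one way to send this field; the top_k parameter inside it is an assumption, not a documented option.

# Forward non-standard parameters through the request body.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    prompt="Say this is a test",
    extra_body={"top_k": 50},  # assumed/example extra parameter
)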

service_tier
enum<string> | null
default:auto

The service tier to use for the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit
Example:

"auto"

Response

OK

id
string
required

A unique identifier for the completion.

object
string
required

The object type, which is always text_completion.

created
integer
required

The Unix timestamp (in seconds) of when the completion was created.

model
string
required

The model used for the completion.

choices
CompletionChoice · object[]
required

A list of completion choices.

usage
Usage · object
required

Usage statistics for the completion request.

service_tier
enum<string>
required

The service tier used for the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit