POST /v1/responses
Create a response
curl --request POST \
  --url https://api.tokenfactory.nebius.com/v1/responses \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "<string>",
  "model": "<string>",
  "background": false,
  "include": [
    "code_interpreter_call.outputs"
  ],
  "instructions": "<string>",
  "max_output_tokens": 123,
  "max_tool_calls": 123,
  "metadata": {},
  "parallel_tool_calls": true,
  "previous_response_id": "<string>",
  "prompt": {
    "id": "<string>",
    "variables": {},
    "version": "<string>"
  },
  "reasoning": {
    "effort": "minimal",
    "generate_summary": "auto",
    "summary": "auto"
  },
  "service_tier": "auto",
  "store": true,
  "stream": true,
  "temperature": 1,
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "low"
  },
  "tool_choice": "auto",
  "tools": [
    {
      "name": "<string>",
      "type": "function",
      "parameters": {},
      "strict": true,
      "description": "<string>"
    }
  ],
  "top_logprobs": 1,
  "top_p": 0.5,
  "truncation": "disabled",
  "user": "<string>",
  "prompt_cache_key": "<string>"
}
'
{
  "id": "<string>",
  "created_at": 123,
  "model": "<string>",
  "output": [
    {
      "id": "<string>",
      "content": [
        {
          "annotations": [
            {
              "file_id": "<string>",
              "filename": "<string>",
              "index": 123,
              "type": "file_citation"
            }
          ],
          "text": "<string>",
          "type": "output_text",
          "logprobs": [
            {
              "token": "<string>",
              "bytes": [
                123
              ],
              "logprob": 123,
              "top_logprobs": [
                {
                  "token": "<string>",
                  "bytes": [
                    123
                  ],
                  "logprob": 123
                }
              ]
            }
          ]
        }
      ],
      "role": "assistant",
      "status": "in_progress",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 123,
  "tool_choice": "none",
  "tools": [
    {
      "name": "<string>",
      "type": "function",
      "parameters": {},
      "strict": true,
      "description": "<string>"
    }
  ],
  "top_p": 123,
  "background": true,
  "max_output_tokens": 123,
  "service_tier": "auto",
  "status": "completed",
  "truncation": "auto",
  "error": {
    "code": "server_error",
    "message": "<string>"
  },
  "incomplete_details": {
    "reason": "max_output_tokens"
  },
  "instructions": "<string>",
  "metadata": {},
  "object": "response",
  "max_tool_calls": 123,
  "previous_response_id": "<string>",
  "prompt": {
    "id": "<string>",
    "variables": {},
    "version": "<string>"
  },
  "reasoning": {
    "effort": "minimal",
    "generate_summary": "auto",
    "summary": "auto"
  },
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "low"
  },
  "top_logprobs": 123,
  "usage": {
    "input_tokens": 123,
    "input_tokens_details": {
      "cached_tokens": 123,
      "input_tokens_per_turn": [
        123
      ],
      "cached_tokens_per_turn": [
        123
      ]
    },
    "output_tokens": 123,
    "output_tokens_details": {
      "reasoning_tokens": 0,
      "tool_output_tokens": 0,
      "output_tokens_per_turn": [
        123
      ],
      "tool_output_tokens_per_turn": [
        123
      ]
    },
    "total_tokens": 123
  },
  "user": "<string>"
}
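The curl request above can also be assembled from Python. A minimal sketch of the two required pieces, the headers and the JSON body (only `model` and `input` are required; everything else is optional); the model ID and token are placeholders, not real values:

```python
import json

API_URL = "https://api.tokenfactory.nebius.com/v1/responses"

def build_headers(token: str) -> dict:
    """Bearer-auth and content-type headers, as in the curl example."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def build_request(model: str, user_input: str, **overrides) -> dict:
    """Assemble a minimal /v1/responses request body.

    Only `model` and `input` are required; optional fields such as
    temperature or max_output_tokens can be passed via `overrides`.
    """
    body = {"model": model, "input": user_input}
    body.update(overrides)
    return body

# To actually send the request (assumes the third-party `requests`
# library is installed and NEBIUS_API_KEY holds a valid token):
#   import os, requests
#   r = requests.post(API_URL,
#                     headers=build_headers(os.environ["NEBIUS_API_KEY"]),
#                     data=json.dumps(build_request("<model>", "Hello")),
#                     timeout=60)
#   print(r.json()["output"])
```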

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

ai_project_id
string | null

Current project ID.

Body

application/json
input
required

Text, image, or file inputs to the model, used to generate a response.

model
string
required

The model used to generate the response.

background
boolean | null

Whether to run the model response in the background.

Example:

false

include
enum<string>[] | null

Specify additional output data to include in the model response.

Available options:
code_interpreter_call.outputs,
computer_call_output.output.image_url,
file_search_call.results,
message.input_image.image_url,
message.output_text.logprobs,
reasoning.encrypted_content
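Because `include` is a closed enum, it can be worth validating values client-side before sending a request. A small sketch using the option list above (the helper name is illustrative):

```python
# The closed set of `include` values accepted by /v1/responses,
# taken from the option list in this reference.
ALLOWED_INCLUDE = {
    "code_interpreter_call.outputs",
    "computer_call_output.output.image_url",
    "file_search_call.results",
    "message.input_image.image_url",
    "message.output_text.logprobs",
    "reasoning.encrypted_content",
}

def validate_include(values):
    """Return `values` as a list, or raise on unsupported entries."""
    bad = [v for v in values if v not in ALLOWED_INCLUDE]
    if bad:
        raise ValueError(f"unsupported include values: {bad}")
    return list(values)
```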
instructions
string | null

A system (or developer) message inserted into the model's context.

max_output_tokens
integer | null

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

max_tool_calls
integer | null

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

metadata
Metadata · object

Set of 16 key-value pairs that can be attached to an object.

parallel_tool_calls
boolean | null

Whether to allow the model to run tool calls in parallel.

previous_response_id
string | null

The unique ID of the previous response to the model.

prompt
ResponsePrompt · object

Reference to a prompt template and its variables.
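Judging from the schema in the example request, the `prompt` object carries an `id` plus optional `variables` and `version`. A small builder sketch (field shapes taken from the schema above; the function name is illustrative):

```python
def prompt_reference(prompt_id, variables=None, version=None):
    """Build the `prompt` object: a reference to a stored prompt
    template, with optional substitution variables and a pinned version.
    Optional fields are omitted rather than sent as null."""
    ref = {"id": prompt_id}
    if variables is not None:
        ref["variables"] = variables
    if version is not None:
        ref["version"] = version
    return ref
```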

reasoning
Reasoning · object

Configuration options for reasoning models.

service_tier
enum<string>
default:auto

The service tier to use for the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit
Example:

"auto"

store
boolean | null

Whether to store the generated model response for later retrieval via API.

stream
boolean | null

If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
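With `stream: true`, events arrive as server-sent events, i.e. text lines of the form `data: <payload>`. A simplified parser sketch over an iterable of decoded lines; it handles only single-line `data:` fields, and the `[DONE]` sentinel check is an assumption carried over from other OpenAI-style streams, not confirmed by this reference:

```python
def iter_sse_events(lines):
    """Yield the `data:` payloads of a server-sent-event stream.

    `lines` is any iterable of decoded text lines (e.g. an HTTP
    response body read line by line). Non-data fields (event names,
    comments, blank keep-alive lines) are skipped.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                yield payload
```

Each yielded payload would then typically be parsed with `json.loads` before inspecting the event contents.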

temperature
number | null
default:1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Required range: 0 <= x <= 2
text
ResponseTextConfig · object

Configuration options for a text response from the model.

tool_choice
default:auto

How the model should select which tool (or tools) to use when generating a response.

Available options:
none,
auto,
required
tools
(FunctionTool · object | FileSearchTool · object | ComputerTool · object | WebSearchTool · object | Mcp · object | CodeInterpreter · object | ImageGeneration · object | LocalShell · object | CustomTool · object | WebSearchPreviewTool · object)[]

An array of tools the model may call while generating a response.
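The most common entry is a FunctionTool, whose shape appears in the example request above (`type`, `name`, `description`, `parameters`, `strict`). A builder sketch, assuming `parameters` is a JSON Schema object describing the function's arguments:

```python
def function_tool(name, description, parameters, strict=True):
    """Build one FunctionTool entry for the `tools` array.

    `parameters` is a JSON Schema object; `strict=True` asks the model
    to follow that schema exactly when producing arguments.
    """
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
        "strict": strict,
    }
```

A request would then pass e.g. `tools=[function_tool("get_weather", "Look up the weather", schema)]` alongside `tool_choice`.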

top_logprobs
integer | null

A non-negative integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

Required range: x >= 0
top_p
number | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
truncation
enum<string>
default:disabled

The truncation strategy to use for the model response.

Available options:
auto,
disabled
user
string | null

A unique identifier representing your end-user, which can help us monitor and detect abuse.

prompt_cache_key
string | null

Used to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.

Response

Successful Response

id
string
required
created_at
integer
required
model
string
required
output
(ResponseOutputMessage · object | ResponseFileSearchToolCall · object | ResponseFunctionToolCall · object | ResponseFunctionWebSearch · object | ResponseComputerToolCall · object | ResponseReasoningItem · object | ImageGenerationCall · object | ResponseCodeInterpreterToolCall · object | LocalShellCall · object | McpCall · object | McpListTools · object | McpApprovalRequest · object | ResponseCustomToolCall · object)[]
required
parallel_tool_calls
boolean
required
temperature
number
required
tool_choice
required
Available options:
none,
auto,
required
tools
(FunctionTool · object | FileSearchTool · object | ComputerTool · object | WebSearchTool · object | Mcp · object | CodeInterpreter · object | ImageGeneration · object | LocalShell · object | CustomTool · object | WebSearchPreviewTool · object)[]
required
top_p
number
required
background
boolean
required
max_output_tokens
integer
required
service_tier
enum<string>
required

Represents the service tier for requests.

Attributes:

Auto: Automatically choose the best available tier for the request (Default or OverLimit). Inspect the response to determine which tier was used.
Default: Return 429 errors on hitting the rate limit; do not spill over to the OverLimit tier.
OverLimit: Indicates that the request went over the user limit. This tier cannot be set by the user in the request; it is only used in responses when tier=Auto.
Flex: Does not consume rate-limit credits, but runs with lower priority. May still result in 429 errors if there are no resources to process the request.

Available options:
auto,
default,
over-limit,
flex,
no-limit
status
enum<string>
required
Available options:
completed,
failed,
in_progress,
cancelled,
queued,
incomplete
truncation
enum<string>
required
Available options:
auto,
disabled
error
ResponseError · object
incomplete_details
IncompleteDetails · object
instructions
string | null
metadata
Metadata · object
object
enum<string>
default:response
Available options:
response
Allowed value: "response"
max_tool_calls
integer | null
previous_response_id
string | null
prompt
ResponsePrompt · object
reasoning
Reasoning · object
text
ResponseTextConfig · object
top_logprobs
integer | null
usage
ResponseUsage · object
user
string | null
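The `usage` object in the response (see the example above) breaks token counts into input, output, cached, and reasoning components. A small sketch that condenses it into a one-line accounting string; the field names follow the example response, and the helper itself is illustrative:

```python
def summarize_usage(usage):
    """Condense a /v1/responses `usage` object into a short summary.

    Missing fields default to 0, and total falls back to in + out.
    """
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    total = usage.get("total_tokens", inp + out)
    return (f"{inp} in ({cached} cached) + {out} out "
            f"({reasoning} reasoning) = {total} total")
```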