Creates a model response for a given input.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
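As a minimal sketch, the authentication header described above could be assembled like this in Python (the token value is a placeholder, not a real credential):

```python
# Minimal sketch of building the Bearer authentication header.
# The token below is a placeholder; use your real auth token.
token = "YOUR_AUTH_TOKEN"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
```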
Current project ID.
Text, image, or file inputs to the model, used to generate a response.
The model used to generate the response.
Whether to run the model response in the background.
false
Specify additional output data to include in the model response.
code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, message.output_text.logprobs, reasoning.encrypted_content
A system (or developer) message inserted into the model's context.
An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
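An illustrative request body combining the two limits described above. The field names (max_output_tokens, max_tool_calls) are assumptions based on the parameter descriptions here; check the schema for the exact names:

```python
# Illustrative request body combining the token and tool-call limits.
# Field names are assumptions inferred from the parameter descriptions.
payload = {
    "model": "example-model",           # placeholder model name
    "input": "Summarize this document.",
    "max_output_tokens": 1024,          # bound on visible + reasoning tokens
    "max_tool_calls": 5,                # total across all built-in tools
}
```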
Set of up to 16 key-value pairs that can be attached to an object.
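The 16-pair limit can be enforced client-side before sending a request; a small hedged sketch (the helper name is illustrative, not part of the API):

```python
# Sketch: enforcing the metadata limit of 16 key-value pairs client-side.
def validate_metadata(metadata: dict) -> dict:
    """Raise if metadata exceeds 16 key-value pairs."""
    if len(metadata) > 16:
        raise ValueError("metadata supports at most 16 key-value pairs")
    return metadata

ok = validate_metadata({"ticket": "ABC-123"})
```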
Whether to allow the model to run tool calls in parallel.
The unique ID of the previous response to the model.
Reference to a prompt template and its variables.
Configuration options for reasoning models.
The service tier to use for the request.
auto, default, over-limit, flex, no-limit
"auto"
Whether to store the generated model response for later retrieval via API.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
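When streaming is enabled, events arrive as server-sent events (SSE). A minimal sketch of decoding `data:` lines into JSON payloads; the event shapes shown are illustrative, not the API's actual schema:

```python
# Sketch: parsing server-sent event (SSE) lines from a streamed response.
# The payload shapes below are illustrative examples only.
import json

def parse_sse_lines(lines):
    """Yield decoded JSON payloads from 'data: ...' SSE lines."""
    for line in lines:
        if line.startswith("data: ") and line != "data: [DONE]":
            yield json.loads(line[len("data: "):])

sample = ['data: {"delta": "Hel"}', 'data: {"delta": "lo"}', "data: [DONE]"]
chunks = list(parse_sse_lines(sample))
```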
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
0 <= x <= 2
Configuration options for a text response from the model.
How the model should select which tool (or tools) to use when generating a response.
none, auto, required
An array of tools the model may call while generating a response.
A non-negative integer specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
x >= 0
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
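The guidance to set temperature or top_p, but not both, together with their documented ranges, can be captured in a small client-side helper (a sketch; the helper is illustrative and not part of the API):

```python
# Sketch: build sampling settings, rejecting simultaneous use of both knobs,
# and validating the documented ranges (0..2 for temperature, 0..1 for top_p).
def sampling_params(temperature=None, top_p=None):
    if temperature is not None and top_p is not None:
        raise ValueError("set either temperature or top_p, not both")
    params = {}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        params["top_p"] = top_p
    return params

focused = sampling_params(temperature=0.2)
```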
0 <= x <= 1
The truncation strategy to use for the model response.
auto, disabled
A unique identifier representing your end-user, which can help us monitor and detect abuse.
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates. Replaces the user field.
Successful Response
none, auto, required
Represents the service tier for requests.
Attributes:
Auto: Automatically choose the best available tier for the request (Default or OverLimit). Inspect the response to determine which tier was used.
Default: Return 429 errors on hitting the rate limit; do not overflow into the OverLimit tier.
OverLimit: Indicates that the request exceeded the user limit. This tier cannot be set by the user in a request, but is used in responses when tier=Auto.
Flex: Does not consume rate-limit credits, but runs with lower priority. May still result in 429 errors if there are no resources available to process the request.
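Since a request with tier Auto only reveals the tier actually used in the response, a client may want to check for the over-limit case. A hedged sketch, assuming the response carries a service_tier field with the string values listed in this document:

```python
# Sketch: detect whether an auto-tier request was served over the user limit.
# The response shape (a dict with a "service_tier" key) is an assumption.
def served_over_limit(response: dict) -> bool:
    """True if the response reports the over-limit service tier."""
    return response.get("service_tier") == "over-limit"
```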
auto, default, over-limit, flex, no-limit
completed, failed, in_progress, cancelled, queued, incomplete
auto, disabled
response "response"