Completions
API Reference
Completions
Send a list of messages and receive a model-generated completion. Supports streaming, web search, research mode, and external toolkits.
POST
Completions
Body
Array of message objects representing the conversation history.
One or more models to use for the completion, in
provider/model format. At least one must be provided.Optional list of fallback models to try if the primary models fail, in
provider/model format.When
true, the response is streamed as server-sent events (SSE).Sampling temperature between
0 and 1. Higher values produce more random output. Defaults to the model’s preset.Maximum number of tokens to generate in the response.
Number of times to retry a failed model request before falling back or erroring.
Enable real-time web search to ground the response in current information.
Associates this request with an existing conversation for context tracking.
Examples
Response
The provider that handled the request. E.g.
"openai", "anthropic".The model that generated the response. E.g.
"gpt-4o".Unique identifier for the generated assistant message.
Type of the response chunk. Currently
"text". Present on streaming chunks.The generated text fragment. Present on streaming chunks.
The full assistant message object.
ID of the conversation this completion belongs to.
Auto-generated title for the conversation derived from the first message.
Token usage and cost breakdown for the request.
Total request latency in milliseconds from request received to response complete.
Time to first token in milliseconds.