POST /api/v1/completions
Completions
curl --request POST \
  --url https://www.chat.ironlabs.ai/api/v1/completions \
  --header 'Authorization: Bearer <IAI_API_Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ]
    }
  ],
  "models": ["<string>"],
  "fallback_models": ["<string>"],
  "stream": true,
  "temperature": 0.5,
  "maxTokens": 123,
  "maxRetries": 123,
  "search": true,
  "conversationId": "<string>"
}
'
{
  "provider": "openai",
  "model": "gpt-4o",
  "responseMessageId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "conversationId": "conv_xyz789abc",
  "conversationTitle": "Explain LLM routing",
  "message": {
    "role": "assistant",
    "type": "text",
    "content": [
      {
        "type": "text",
        "text": "LLM routing is the process of automatically selecting the best language model for a given request based on factors like cost, speed, and capability."
      }
    ]
  },
  "usage": {
    "promptTokens": 120,
    "completionTokens": 80,
    "totalTokens": 200,
    "reasoningTokens": 0,
    "costs": {
      "input_tokens_cost": 0.0000600,
      "output_tokens_cost": 0.0003200,
      "total_tokens_cost": 0.0003800
    }
  },
  "latency": 1240,
  "ttft": 320
}

Body

messages
array
required
Array of message objects representing the conversation history.
models
array of strings
required
One or more models to use for the completion, in provider/model format. At least one must be provided.
["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"]
fallback_models
array of strings
Optional list of fallback models to try if the primary models fail, in provider/model format.
["google/gemini-2.0-flash"]
stream
boolean
default: false
When true, the response is streamed as server-sent events (SSE).
temperature
number
Sampling temperature between 0 and 1. Higher values produce more random output. Defaults to the model’s preset.
maxTokens
integer
Maximum number of tokens to generate in the response.
maxRetries
integer
Number of times to retry a failed model request before falling back or erroring.
search
boolean
When true, enables real-time web search to ground the response in current information.
conversationId
string (UUID)
Associates this request with an existing conversation for context tracking.
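
The body fields above can be assembled and checked client-side before sending. The following is a minimal sketch, not an official client: the helper names `build_completion_payload` and `text_message` are hypothetical, and treating an empty `messages` list as an error is an assumption. The checks cover only the constraints stated in the field list (at least one model, provider/model format, temperature between 0 and 1).

```python
from typing import Any, Optional


def text_message(role: str, text: str) -> dict[str, Any]:
    """Build one message whose content is a single text part."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def build_completion_payload(
    messages: list[dict[str, Any]],
    models: list[str],
    fallback_models: Optional[list[str]] = None,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    max_retries: Optional[int] = None,
    search: Optional[bool] = None,
    conversation_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, enforcing the documented field constraints."""
    if not messages:
        # Assumption: an empty history is treated the same as a missing one.
        raise ValueError("messages is required and must not be empty")
    if not models:
        raise ValueError("at least one model must be provided")
    for name in list(models) + list(fallback_models or []):
        provider, sep, model = name.partition("/")
        if not (provider and sep and model):
            raise ValueError(f"expected provider/model format, got {name!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload: dict[str, Any] = {
        "messages": messages,
        "models": models,
        "stream": stream,
    }
    optional = {
        "fallback_models": fallback_models,
        "temperature": temperature,
        "maxTokens": max_tokens,
        "maxRetries": max_retries,
        "search": search,
        "conversationId": conversation_id,
    }
    # Leave unset fields out so the server applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_completion_payload(
    messages=[text_message("user", "Explain how LLM routing works.")],
    models=["openai/gpt-4o"],
    fallback_models=["google/gemini-2.0-flash"],
    temperature=0.7,
    max_tokens=1024,
)
```

Omitting unset optional fields, rather than sending null, keeps the request close to the hand-written curl example and avoids guessing how the server treats null values.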

Examples

curl -X POST https://www.chat.ironlabs.ai/api/v1/completions \
     -H "Authorization: Bearer <IAI_API_Key>" \
     -H "Content-Type: application/json" \
     -d '{
  "messages": [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain how LLM routing works."}]}
  ],
  "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
  "stream": false,
  "temperature": 0.7,
  "maxTokens": 1024
}'
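
The same request can be issued from Python with only the standard library. This is a sketch under assumptions: `make_request` and `send` are illustrative names, the URL and headers are copied from the curl example, and error handling (HTTP errors, timeouts) is left to the caller.

```python
import json
import urllib.request

API_URL = "https://www.chat.ironlabs.ai/api/v1/completions"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request mirroring the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of a non-streaming response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": "Explain how LLM routing works."}]},
    ],
    "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
    "stream": False,
    "temperature": 0.7,
    "maxTokens": 1024,
}

# Replace the placeholder with a real key, then call send(request).
request = make_request("<IAI_API_Key>", payload)
```

Building the request separately from sending it makes the headers and body easy to inspect or log before any network call is made.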

Response

provider
string
The provider that handled the request. E.g. "openai", "anthropic".
model
string
The model that generated the response. E.g. "gpt-4o".
responseMessageId
string (UUID)
Unique identifier for the generated assistant message.
type
string
Type of the response chunk. Currently "text". Present on streaming chunks.
text
string
The generated text fragment. Present on streaming chunks.
message
object
The full assistant message object.
conversationId
string
ID of the conversation this completion belongs to.
conversationTitle
string
Auto-generated title for the conversation derived from the first message.
usage
object
Token usage and cost breakdown for the request.
latency
integer
Total request latency in milliseconds from request received to response complete.
ttft
integer
Time to first token in milliseconds.
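
When stream is true, the type and text fields arrive on individual chunks. The exact wire format is not specified here, so the sketch below assumes each server-sent event carries one JSON object on a `data:` line with those two fields. The function name `iter_text_fragments` and the sample lines are hypothetical and exist only to illustrate the parsing.

```python
import json
from typing import Iterable, Iterator


def iter_text_fragments(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each "text" chunk from a stream of SSE lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # non-JSON payloads are ignored in this sketch
        if isinstance(chunk, dict) and chunk.get("type") == "text":
            yield chunk.get("text", "")


# Hypothetical stream contents, assuming one JSON object per data line.
sample_lines = [
    'data: {"type": "text", "text": "LLM routing "}\n',
    "\n",
    'data: {"type": "text", "text": "selects a model."}\n',
    "\n",
]
assembled = "".join(iter_text_fragments(sample_lines))
```

Against a live response, the same generator could be fed decoded lines from the HTTP body as they arrive, so text can be displayed before the stream ends.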
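
A non-streaming response can be reduced to its text and a few derived figures. This is a sketch using the values from the sample response on this page. The `summarize` helper is hypothetical, and two points are assumptions rather than documented behavior: that totalTokens equals promptTokens plus completionTokens, and that the time between ttft and latency is spent generating output.

```python
import json
import math

# Trimmed copy of the sample response (message text shortened).
response = json.loads("""
{
  "provider": "openai",
  "model": "gpt-4o",
  "message": {
    "role": "assistant",
    "type": "text",
    "content": [{"type": "text", "text": "LLM routing is the process of ..."}]
  },
  "usage": {
    "promptTokens": 120,
    "completionTokens": 80,
    "totalTokens": 200,
    "reasoningTokens": 0,
    "costs": {
      "input_tokens_cost": 0.0000600,
      "output_tokens_cost": 0.0003200,
      "total_tokens_cost": 0.0003800
    }
  },
  "latency": 1240,
  "ttft": 320
}
""")


def summarize(resp: dict) -> dict:
    """Extract the text and check that the usage figures are self-consistent."""
    usage = resp["usage"]
    costs = usage["costs"]
    # Assumption: time after the first token is output generation time.
    generation_ms = resp["latency"] - resp["ttft"]
    return {
        "text": "".join(
            part["text"]
            for part in resp["message"]["content"]
            if part["type"] == "text"
        ),
        "tokens_add_up": usage["promptTokens"] + usage["completionTokens"]
        == usage["totalTokens"],
        "costs_add_up": math.isclose(
            costs["input_tokens_cost"] + costs["output_tokens_cost"],
            costs["total_tokens_cost"],
            rel_tol=1e-9,
        ),
        "generation_ms": generation_ms,
        "tokens_per_second": (
            usage["completionTokens"] / (generation_ms / 1000)
            if generation_ms > 0
            else None
        ),
    }


summary = summarize(response)
```

The costs are compared with `math.isclose` rather than `==` because they are small floating-point values whose sum may differ from the reported total by rounding error.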