Completions

curl --request POST \
  --url https://www.chat.ironlabs.ai/api/v1/completions \
  --header 'Authorization: Bearer <IAI_API_Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "messages": [
    {
      "role": "<string>",
      "content": [
        {
          "type": "<string>",
          "text": "<string>"
        }
      ]
    }
  ],
  "models": ["<string>"],
  "fallback_models": ["<string>"],
  "stream": false,
  "temperature": 0.7,
  "maxTokens": 1024,
  "maxRetries": 2,
  "search": true,
  "conversationId": "<string>"
}
'

{
  "provider": "openai",
  "model": "gpt-4o",
  "responseMessageId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "conversationId": "conv_xyz789abc",
  "conversationTitle": "Explain LLM routing",
  "message": {
    "role": "assistant",
    "type": "text",
    "content": [
      {
        "type": "text",
        "text": "LLM routing is the process of automatically selecting the best language model for a given request based on factors like cost, speed, and capability."
      }
    ]
  },
  "usage": {
    "promptTokens": 120,
    "completionTokens": 80,
    "totalTokens": 200,
    "reasoningTokens": 0,
    "costs": {
      "input_tokens_cost": 0.0000600,
      "output_tokens_cost": 0.0003200,
      "total_tokens_cost": 0.0003800
    }
  },
  "latency": 1240,
  "ttft": 320
}

POST /api/v1/completions

Body

messages

array

required

Array of message objects representing the conversation history.

Message object fields:

role

string

required

Role of the message sender. One of "system", "user", or "assistant".

content

array

required

Array of content blocks for the message.

Content block fields:

type

string

required

Type of the content block. Currently only "text" is supported.

text

string

required

The text content of the block.
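A minimal Python sketch of building the messages array with the structure above: each message carries a role and a list of content blocks, here a single "text" block. The helper name text_message is hypothetical and not part of any Iron Labs SDK.

```python
VALID_ROLES = {"system", "user", "assistant"}


def text_message(role: str, text: str) -> dict:
    """Return one message object holding a single "text" content block.

    Hypothetical helper; the field names follow the Body schema above.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}, got {role!r}")
    return {"role": role, "content": [{"type": "text", "text": text}]}


# Conversation history in chronological order, as the messages field expects.
messages = [
    text_message("system", "You are a helpful assistant."),
    text_message("user", "Explain how LLM routing works."),
]
```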

models

array of strings

required

Models to use for the completion, each in provider/model format. At least one model must be provided.

["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"]

fallback_models

array of strings

Optional list of fallback models to try if the primary models fail, in provider/model format.

["google/gemini-2.0-flash"]
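Before sending a request, a client may want to catch malformed model IDs early. The sketch below assumes a valid ID is a non-empty provider, a slash, and a non-empty model name (everything after the first slash); the server's own validation rules are not documented here, so treat this check as an assumption. split_model_id and validate_models are hypothetical helper names.

```python
from __future__ import annotations

from typing import Optional


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "provider/model" on the first slash and reject empty parts."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider/model format, got {model_id!r}")
    return provider, model


def validate_models(models: list[str], fallback_models: Optional[list[str]] = None) -> None:
    """Raise ValueError unless models is non-empty and every ID is provider/model."""
    if not models:
        raise ValueError("models must contain at least one entry")
    for model_id in models + (fallback_models or []):
        split_model_id(model_id)
```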

stream

boolean

default: false

When true, the response is streamed as server-sent events (SSE).

temperature

number

Sampling temperature between 0 and 1. Higher values produce more random output. If omitted, the model’s default is used.

maxTokens

integer

Maximum number of tokens to generate in the response.

maxRetries

integer

Number of times to retry a failed model request before falling back or erroring.
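The server applies retries and fallbacks itself. The sketch below only illustrates one plausible reading of how maxRetries and fallback_models interact: each model gets 1 + maxRetries attempts, primary models are tried in order before fallback models, and an error is returned only after every attempt fails. That ordering is an assumption, not confirmed by this page, and call stands in for a hypothetical function that sends one request to one model.

```python
from __future__ import annotations

from typing import Callable, Optional


def complete_with_fallback(
    call: Callable[[str], str],
    models: list[str],
    fallback_models: Optional[list[str]] = None,
    max_retries: int = 0,
) -> tuple[str, str]:
    """Return (model_id, result) from the first successful attempt.

    Assumed policy: each model gets 1 + max_retries attempts, primary models
    are tried in order before fallback models, and an error is raised only
    when every attempt has failed. `call` is hypothetical and raises on failure.
    """
    if not models:
        raise ValueError("at least one model is required")
    errors: list[Exception] = []
    for model_id in models + (fallback_models or []):
        for _ in range(1 + max_retries):
            try:
                return model_id, call(model_id)
            except Exception as exc:  # record the failure and try again
                errors.append(exc)
    raise RuntimeError(f"all {len(errors)} attempts failed") from errors[-1]
```

Under this reading, the worst case is (len(models) + len(fallback_models)) × (1 + maxRetries) attempts, which is worth keeping in mind when choosing maxRetries for latency-sensitive calls.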

search

boolean

Enable real-time web search to ground the response in current information.

conversationId

string (UUID)

Associates this request with an existing conversation for context tracking.
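Putting the body fields together, a minimal sketch of a payload builder. The snake_case keyword arguments are hypothetical Python names mapped onto the camelCase JSON fields documented above; unset optional fields are left out so the server applies its own defaults, and temperature is checked against the documented 0 to 1 range.

```python
from __future__ import annotations

from typing import Optional


def build_completion_body(
    messages: list[dict],
    models: list[str],
    *,
    fallback_models: Optional[list[str]] = None,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    max_retries: Optional[int] = None,
    search: Optional[bool] = None,
    conversation_id: Optional[str] = None,
) -> dict:
    """Assemble a /completions request body (hypothetical helper)."""
    if not messages:
        raise ValueError("messages must not be empty")
    if not models:
        raise ValueError("models must contain at least one entry")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    body = {"messages": messages, "models": models, "stream": stream}
    optional = {
        "fallback_models": fallback_models,
        "temperature": temperature,
        "maxTokens": max_tokens,
        "maxRetries": max_retries,
        "search": search,
        "conversationId": conversation_id,
    }
    # Omit unset fields (None) but keep explicit values such as search=False.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```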

Examples

curl -X POST https://www.chat.ironlabs.ai/api/v1/completions \
     -H "Authorization: Bearer <IAI_API_Key>" \
     -H "Content-Type: application/json" \
     -d '{
  "messages": [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain how LLM routing works."}]}
  ],
  "models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
  "stream": false,
  "temperature": 0.7,
  "maxTokens": 1024
}'
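The same request can be sent from Python with only the standard library. This sketch builds the request separately from sending it so the payload and headers can be inspected first; make_request and send are hypothetical helper names, and the API key is a placeholder you supply.

```python
import json
import urllib.request

API_URL = "https://www.chat.ironlabs.ai/api/v1/completions"


def make_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With a real key, send(make_request(api_key, body)) should return an object shaped like the response described below. The helper assumes "stream": false; a streamed reply would have to be read line by line instead of decoded in one call.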

Response

provider

string

The provider that handled the request. For example, "openai" or "anthropic".

model

string

The model that generated the response. For example, "gpt-4o".

responseMessageId

string (UUID)

Unique identifier for the generated assistant message.

type

string

Type of the response chunk. Currently always "text". Present only on streaming chunks (when stream is true).

text

string

The generated text fragment. Present on streaming chunks.
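When stream is true, the client has to reassemble the reply from these fragments. The sketch below assumes each server-sent event arrives as a single data: line whose JSON payload carries the type and text fields described above. This page does not document the exact event framing, any final usage event, or an end-of-stream marker, so the [DONE] check is only a defensive guess borrowed from OpenAI-style streams.

```python
import json
from typing import Iterable


def collect_streamed_text(lines: Iterable[str]) -> str:
    """Concatenate the text fragments carried by SSE `data:` lines.

    Assumes one JSON event per `data:` line; other events are ignored.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, `:` comments, and `event:` lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed sentinel; harmless if the API never sends it
        event = json.loads(payload)
        if event.get("type") == "text" and "text" in event:
            parts.append(event["text"])
    return "".join(parts)
```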

message

object

The full assistant message object.

Message object fields:

role

string

Always "assistant".

type

string

Always "text".

content

array

Array of content blocks.

Content block fields:

type

string

Content block type. Currently "text".

text

string

The generated text.
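For a non-streamed response, the reply text lives in the content blocks of message. A minimal sketch that joins every "text" block; message_text is a hypothetical helper, and joining with an empty string is an assumption, since the page does not say how multiple blocks relate.

```python
def message_text(response: dict) -> str:
    """Concatenate the text of all "text" blocks in response["message"]["content"]."""
    blocks = response.get("message", {}).get("content", [])
    return "".join(block.get("text", "") for block in blocks if block.get("type") == "text")
```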

conversationId

string

ID of the conversation this completion belongs to.
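To continue a conversation, the conversationId from one response can be passed back in the next request. The page does not say whether the full history must be resent when conversationId is set, so this sketch resends it (previous messages, the assistant reply, then the new user turn) to be safe. follow_up_body is a hypothetical helper name.

```python
import copy


def follow_up_body(previous_body: dict, previous_response: dict, user_text: str) -> dict:
    """Build the next request in the same conversation without mutating the inputs."""
    body = copy.deepcopy(previous_body)
    body["messages"].append(
        {"role": "assistant", "content": copy.deepcopy(previous_response["message"]["content"])}
    )
    body["messages"].append(
        {"role": "user", "content": [{"type": "text", "text": user_text}]}
    )
    body["conversationId"] = previous_response["conversationId"]
    return body
```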

conversationTitle

string

Auto-generated title for the conversation derived from the first message.

usage

object

Token usage and cost breakdown for the request.

Usage object fields:

promptTokens

integer

Number of tokens in the input messages.

completionTokens

integer

Number of tokens in the generated response.

totalTokens

integer

Total tokens used (promptTokens + completionTokens).

reasoningTokens

integer

Tokens used for internal reasoning by reasoning ("thinking") models; 0 for standard models.

costs

object

Cost breakdown in USD.

Costs object fields:

input_tokens_cost

number

Cost for the prompt tokens.

output_tokens_cost

number

Cost for the completion tokens.

total_tokens_cost

number

Total cost for the request.
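The usage and costs fields support a quick consistency check and an effective per-token rate. Applied to the example response on this page, 120 prompt tokens costing $0.00006 work out to $0.50 per million input tokens, and 80 completion tokens costing $0.00032 work out to $4.00 per million output tokens. These rates come from the example values only, not from published pricing. Whether reasoningTokens are billed inside output_tokens_cost is not stated, so the sketch ignores them; per_million and summarize_usage are hypothetical helper names.

```python
import math
from typing import Optional


def per_million(cost_usd: float, tokens: int) -> Optional[float]:
    """Effective USD price per 1M tokens, or None when no tokens were billed."""
    return cost_usd / tokens * 1_000_000 if tokens else None


def summarize_usage(usage: dict) -> dict:
    """Check the documented totals and derive effective per-million rates."""
    costs = usage["costs"]
    return {
        "tokens_consistent": usage["totalTokens"]
        == usage["promptTokens"] + usage["completionTokens"],
        # Compare floats with a tolerance rather than exact equality.
        "costs_consistent": math.isclose(
            costs["total_tokens_cost"],
            costs["input_tokens_cost"] + costs["output_tokens_cost"],
            rel_tol=1e-9,
        ),
        "input_usd_per_million": per_million(costs["input_tokens_cost"], usage["promptTokens"]),
        "output_usd_per_million": per_million(costs["output_tokens_cost"], usage["completionTokens"]),
    }
```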

latency

integer

Total request latency in milliseconds, measured from when the request is received until the response is complete.

ttft

integer

Time to first token in milliseconds.
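Assuming latency and ttft are measured from the same starting point (the page says latency starts when the request is received), their difference approximates the time spent generating after the first token. For the example response that is 1240 - 320 = 920 ms, or roughly 80 / 0.92 ≈ 87 completion tokens per second. generation_stats is a hypothetical helper, and the throughput figure is approximate because the first token is counted in both intervals.

```python
from typing import Optional


def generation_stats(response: dict) -> dict:
    """Rough post-first-token timing derived from latency, ttft, and usage."""
    generation_ms = response["latency"] - response["ttft"]
    tokens = response["usage"]["completionTokens"]
    # Approximate: counts all completion tokens, including the first one.
    tokens_per_second: Optional[float] = (
        tokens / (generation_ms / 1000) if generation_ms > 0 else None
    )
    return {"generation_ms": generation_ms, "tokens_per_second": tokens_per_second}
```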
