State-by-state
Received
Trigger: Your SDK or HTTP client sends a POST /completions (or /model-select) call. The API gateway authenticates the bearer token, validates the payload schema, and assigns a responseMessageId (UUID).
Observable: No content yet — only the request ID. If auth fails or schema is invalid, the lifecycle terminates here with a 4xx and never reaches Scored.
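The three gateway steps above (authenticate, validate, assign an ID) can be sketched roughly as follows. This is an illustration, not IronLabs's gateway code: the function name receive, the VALID_TOKENS set, the required-field list, and the specific 401 and 422 codes are all assumptions. The text above only says that a failure ends the lifecycle with a 4xx.

```python
import uuid

# Hypothetical token store and required payload fields, for illustration only.
VALID_TOKENS = {"sk-demo"}
REQUIRED_FIELDS = ("messages", "models")

def receive(headers: dict, payload: dict) -> dict:
    """Mimic the Received state: authenticate, validate, then assign an ID."""
    auth = headers.get("Authorization", "")
    prefix = "Bearer "
    token = auth[len(prefix):].strip() if auth.startswith(prefix) else ""
    if token not in VALID_TOKENS:
        # Auth failure: the lifecycle ends here (status code is an assumption).
        return {"status": 401, "state": "terminated"}
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        # Schema failure: also a 4xx, and the request never reaches Scored.
        return {"status": 422, "state": "terminated", "missing": missing}
    # Success: only the request ID is observable at this point.
    return {"state": "Received", "responseMessageId": str(uuid.uuid4())}
```

A caller would check the returned state before expecting any content; a terminated result carries no responseMessageId in this sketch.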
Scored
Trigger: The router embeds the user message and runs the classifier (either a pre-trained router for general use or your Custom Router if routerId is set). Each candidate model in models gets a score that blends three signals:
| Signal | Source |
|---|---|
| Capability fit | classifier output — “is this model likely to answer correctly?” |
| Cost | normalized per-million-token price |
| Latency | rolling p50 time-to-first-token from the last hour |
The tradeoff setting ("latency", "cost", "quality") controls the weighting.
Observable: Internal — surfaced in the Studio activity log as the decision trace.
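One way to picture the blend is a weighted sum in which the tradeoff setting picks the weights. The sketch below is an assumption-laden illustration: the weight values, the Candidate class, and the score and rank helpers are hypothetical, and the actual formula is not documented here. Cost and latency are assumed to be normalized to [0, 1] and are inverted so that cheaper and faster models score higher.

```python
from dataclasses import dataclass

# Hypothetical weight presets as (capability, cost, latency).
# The real weights are not documented; these only show the idea.
WEIGHTS = {
    "quality": (0.7, 0.15, 0.15),
    "cost": (0.3, 0.6, 0.1),
    "latency": (0.3, 0.1, 0.6),
}

@dataclass
class Candidate:
    model: str
    capability: float  # classifier output in [0, 1]; higher is better
    cost: float        # normalized per-million-token price in [0, 1]; lower is better
    latency: float     # normalized p50 time-to-first-token in [0, 1]; lower is better

def score(c: Candidate, tradeoff: str) -> float:
    """Blend the three signals using the weights for the chosen tradeoff."""
    w_cap, w_cost, w_lat = WEIGHTS[tradeoff]
    # Invert cost and latency so that a lower raw value raises the score.
    return w_cap * c.capability + w_cost * (1 - c.cost) + w_lat * (1 - c.latency)

def rank(candidates, tradeoff: str):
    """Return candidates ordered from best to worst score."""
    return sorted(candidates, key=lambda c: score(c, tradeoff), reverse=True)
```

With these illustrative weights, a capable but expensive model ranks first under "quality", while a cheaper, faster model overtakes it under "cost" or "latency".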
Routed
Trigger: The top-k ranking is reduced to a single primary plus an ordered fallback list. The system reserves capacity at the chosen provider.
Observable: Streaming requests start seeing the chosen provider/model echoed in the first SSE chunk.
Executed
Trigger: IronLabs forwards the request to the provider with normalized parameters. Streaming responses are pass-through SSE; non-streaming responses are buffered.
Observable: Either the full assistant message (non-streaming) or a stream of text-delta events (streaming). The usage block, {prompt_tokens, completion_tokens, total_tokens}, is attached at the close of execution.
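On the client side, a streaming response can be reassembled by concatenating the text-delta events and picking up the usage block at the end. The following is a minimal sketch under stated assumptions: the JSON event shape (a type field, a delta field for text, and a usage field on the closing event) is hypothetical and is not taken from the IronLabs API reference. Only the "data:" line framing comes from the SSE format itself.

```python
import json

def collect_stream(sse_lines):
    """Assemble the assistant text and usage block from an iterable of SSE lines."""
    text_parts = []
    usage = None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        body = line[len("data:"):].strip()
        if not body:
            continue
        event = json.loads(body)
        # Assumed event shape: {"type": "text-delta", "delta": "..."}
        if event.get("type") == "text-delta":
            text_parts.append(event.get("delta", ""))
        # Assumed: the closing event carries the usage block.
        if "usage" in event:
            usage = event["usage"]
    return "".join(text_parts), usage
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP client's line iterator or from a recorded transcript when debugging.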
Fallback (optional)
Trigger: The provider returned a 5xx, timed out, exceeded the request’s maxTokens, hit a content-policy block, or violated your maxRetries budget. IronLabs walks to the next entry in the fallback list and re-enters Executed.
Observable: The final response includes an errorTrace array listing each attempt that failed. Inspect this when you see a slower-than-expected response — most often, a primary timed out and a fallback served.
If every entry in the fallback list fails, the request returns a 502 and an errorTrace describing every attempt.
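The fallback walk can be sketched as a loop over the primary and then each fallback in order. This is an illustration under assumptions: ProviderError, execute_with_fallback, the call_provider callback, and the shape of each errorTrace entry are hypothetical names chosen for the example. Only the behavior (try the next entry, record failed attempts, end in a 502 when all fail) comes from the description above.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails (5xx, timeout, etc.)."""

def execute_with_fallback(primary, fallbacks, call_provider):
    """Try the primary, then each fallback in order, recording failed attempts."""
    error_trace = []
    for model in [primary, *fallbacks]:
        try:
            message = call_provider(model)
        except ProviderError as exc:
            # Record the failed attempt and walk to the next entry.
            error_trace.append({"model": model, "error": str(exc)})
            continue
        # First success wins; the trace still lists earlier failures.
        return {"status": 200, "model": model, "message": message,
                "errorTrace": error_trace}
    # Every attempt failed: surface a 502 with the full trace.
    return {"status": 502, "errorTrace": error_trace}
```

In this sketch a non-empty errorTrace on a successful response is the signal that a fallback served the request, which matches the slower-than-expected case described above.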
Logged
Trigger: Whether the request succeeded or failed, IronLabs writes a single row to ModelSelectUsage with: userId, customRouterId (if any), provider, model, cost, latency_ms, errorTrace, responseMessageId.
Observable: Visible in the Studio → Activity tab within ~10 seconds. Cost rolls up into your Balance via DebitTransaction.
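For working with exported activity data, the row can be modeled as a small record. The field names below come from the list above; the Python types, the defaults, and the two helper functions are assumptions made for illustration, not the actual ModelSelectUsage schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelSelectUsageRow:
    """One logged request. Field types are assumed, not taken from the schema."""
    userId: str
    provider: str
    model: str
    cost: float
    latency_ms: int
    responseMessageId: str
    customRouterId: Optional[str] = None            # set only when a Custom Router was used
    errorTrace: list = field(default_factory=list)  # failed attempts, if any

def total_cost(rows) -> float:
    """Sum the cost of the given rows."""
    return sum(row.cost for row in rows)

def rows_with_failures(rows):
    """Return rows where at least one attempt failed."""
    return [row for row in rows if row.errorTrace]
```

Filtering on a non-empty errorTrace is a quick way to find requests where an attempt failed before the final outcome.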
Edge cases
- previous_session for context continuity. When set, the router prefers the same model that handled the prior turn; Scored is short-circuited to maintain conversation coherence. The trace records routingReason: "session_continuity".
- stream: false waits for full execution before returning. Network round-trip is faster but you lose progressive output.
- No fallback configured. Fallback → Executed is skipped, so a single provider failure surfaces as 502 immediately. Set fallback_models for production traffic.
- Custom Router cold start. First request after 5 minutes idle loads the XGBoost classifier from Modal Volume; expect +200–500ms on the cold call. Subsequent calls are warm.
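The session-continuity short-circuit can be pictured as a check that runs before scoring. The sketch below is hypothetical throughout: choose_model, the session_models lookup table, the score_candidates callback, and the "scored" reason string are invented for illustration. Only previous_session and routingReason: "session_continuity" appear in the description above.

```python
def choose_model(request: dict, score_candidates, session_models: dict) -> dict:
    """Pick a model, skipping scoring when a prior session pins one."""
    prev = request.get("previous_session")
    if prev is not None and prev in session_models:
        # Short-circuit: reuse the model that handled the prior turn.
        return {"model": session_models[prev],
                "routingReason": "session_continuity"}
    # Otherwise fall through to normal scoring; take the top-ranked model.
    ranked = score_candidates(request["models"])
    return {"model": ranked[0], "routingReason": "scored"}  # "scored" is a placeholder
```

Here an unknown previous_session falls back to normal scoring, which is one plausible design; the actual behavior in that case is not stated in this document.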
Related
Fallback Configuration
How to set primaries vs fallbacks.
Custom Router
Replace the pre-trained scorer with your own.
Completions API
Every parameter that affects routing.
Sandbox Isolation
How tenant data stays separated across requests.