IronLabs Custom Router lets you train a personalized model selection system on your own data. The router learns from your examples to pick the most appropriate AI model for each task, so simple requests can go to cheaper models while harder ones go to more capable models.

Python SDK

pip install ironlabs

Node.js SDK

npm install ironlabs

When to use Custom Router

  • Cost optimization — route simple tasks to cheaper models, reserve powerful ones for complex requests
  • Domain specialization — match prompts to models that excel in your domain (code, legal, creative writing, etc.)
  • Multi-model pipelines — let the router decide which model handles each stage instead of hardcoding choices
  • Replace guesswork — use a data-driven system trained on your own examples instead of manual model selection

Prerequisites

Before you start, make sure you have:
  • An IronLabs API key from the Settings page
  • A training data file hosted at a publicly accessible URL (GitHub, S3, CDN, etc.)

Installation

Install the Python SDK, which the examples in this guide use:
pip install ironlabs

Initialize the client

Set your API key as an environment variable:
export IRONLABS_API_KEY="your_api_key_here"
Then initialize the trainer in your code:
from ironlabs import RouterTrainer

trainer = RouterTrainer()
The client automatically picks up IRONLABS_API_KEY from your environment — no need to pass it explicitly.
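Because the key is read from the environment, a script can fail early with a clear message when the variable is missing. The following is a minimal sketch that relies only on the variable name IRONLABS_API_KEY shown above; the helper name require_api_key is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env=None, name="IRONLABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; it is not part of the ironlabs SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the trainer")
    return key
```

Calling require_api_key() before RouterTrainer() turns a missing key into an immediate, readable error instead of a failure on the first API call.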

Training a Custom Router

1. Prepare training data

Each training example maps a prompt (problem) to the models that handle it best (correct_models). Supported formats are JSON and CSV. Host your file at any publicly accessible URL (GitHub, S3, CDN, etc.).
[
  {
    "problem": "Write a simple hello world function",
    "correct_models": ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"]
  },
  {
    "problem": "Explain quantum computing in detail",
    "correct_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"]
  }
]
You can pass multiple file URLs to combine datasets from different sources.
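If you build the training file programmatically, it can help to check each record before uploading it. The sketch below uses only the problem and correct_models fields shown above. The helper name validate_examples is hypothetical, and the rule that correct_models must be a non-empty list of strings is an assumption, not a documented requirement.

```python
import json


def validate_examples(examples):
    """Check each record's shape before serializing (assumed rules, see above)."""
    for index, example in enumerate(examples):
        problem = example.get("problem")
        models = example.get("correct_models")
        if not isinstance(problem, str) or not problem.strip():
            raise ValueError(f"example {index}: 'problem' must be a non-empty string")
        if not isinstance(models, list) or not models:
            raise ValueError(f"example {index}: 'correct_models' must be a non-empty list")
        if not all(isinstance(model, str) and model for model in models):
            raise ValueError(f"example {index}: model names must be non-empty strings")
    return examples


examples = [
    {
        "problem": "Write a simple hello world function",
        "correct_models": ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"],
    },
    {
        "problem": "Explain quantum computing in detail",
        "correct_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20241022"],
    },
]

# Serialize to the JSON array layout shown above; write this string to a file
# and host that file at a publicly accessible URL.
payload = json.dumps(validate_examples(examples), indent=2)
```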
2. Start training

Pass one or more data URLs to kick off a training job.
import time
from ironlabs import RouterTrainer

trainer = RouterTrainer()

data_urls = ["https://example.com/path/to/your/training_data.json"]

training_info = trainer.fit(data_urls)
job_id = training_info.get("training_job_id")
print(f"Training job started. Job ID: {job_id}")
Response:
{
  "training_job_id": "abc123-def456-ghi789",
  "status": "queued",
  "version": "v1.0"
}
3. Check training status

Poll until the job reaches completed or failed. Training typically takes a few minutes.
while True:
    status_info = trainer.get_status()
    status = status_info.get("status")
    print(f"Current status: {status}")

    if status == "completed":
        model_id = status_info.get("model_id")
        print(f"Training completed! Model ID: {model_id}")
        break
    elif status == "failed":
        print("Training failed.")
        break

    time.sleep(10)
Response:
{
  "training_job_id": "abc123-def456-ghi789",
  "status": "completed",
  "model_id": "model-xyz-top1_0.8542",
  "started_at": "2026-02-04T10:30:00Z",
  "completed_at": "2026-02-04T10:35:00Z",
  "training_config": {
    "timing": {
      "training_time_seconds": 45.2,
      "embedding_time_seconds": 120.5,
      "total_time_seconds": 300.0
    }
  }
}
Status      Description
queued      Job is waiting to start
running     Training is in progress
completed   Training finished successfully
failed      Training encountered an error
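The polling loop above runs until the job reports completed or failed. In unattended scripts you may also want an upper bound on the wait. The sketch below wraps the same get_status() call in a hypothetical helper, wait_for_training; the 30-minute default timeout and the handling of unexpected status values are assumptions, not SDK behavior.

```python
import time


def wait_for_training(trainer, poll_interval=10.0, timeout=1800.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll trainer.get_status() until the job finishes or the timeout passes.

    Hypothetical helper; `trainer` is any object with the get_status() method
    used in this guide. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status_info = trainer.get_status()
        status = status_info.get("status")
        if status == "completed":
            return status_info
        if status == "failed":
            raise RuntimeError("Training failed.")
        if status not in ("queued", "running"):
            # Only the four statuses in the table above are expected.
            raise RuntimeError(f"Unexpected training status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Training did not finish within {timeout} seconds")
        sleep(poll_interval)
```

With this helper, status_info = wait_for_training(trainer) returns the final status payload, and status_info.get("model_id") gives the model ID as in the loop above.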
4. Get model details

Retrieve metadata and performance metrics for your trained model.
details = trainer.get_model_details()
print(f"Model details: {details}")
Response:
{
  "model_id": "model-xyz-top1_0.8542",
  "version": "v1.0",
  "status": "active",
  "embedding_model": "Qwen/Qwen3-Embedding-4B",
  "num_classes": 8,
  "created_at": "2026-02-04T10:35:00Z",
  "metrics": {
    "top1_hit_rate": 0.8542,
    "top3_hit_rate": 0.9521,
    "accuracy": 0.8542
  }
}
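One way to use these metrics is as a gate before sending real traffic to the router. The sketch below reads the top1_hit_rate and top3_hit_rate fields from the response above. The helper name meets_quality_bar and both threshold values are illustrative assumptions; choose thresholds that fit your own workload.

```python
def meets_quality_bar(details, min_top1=0.80, min_top3=0.90):
    """Return True if the reported hit rates reach the given thresholds.

    Hypothetical helper; the thresholds are illustrative, not recommendations.
    """
    metrics = details.get("metrics") or {}
    top1 = metrics.get("top1_hit_rate")
    top3 = metrics.get("top3_hit_rate")
    if top1 is None or top3 is None:
        # Treat missing metrics as "not good enough" rather than guessing.
        return False
    return top1 >= min_top1 and top3 >= min_top3
```

For the sample response above, meets_quality_bar(details) passes with the default thresholds, and fails if min_top1 is raised to 0.90.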
5. Run inference

Use your trained router to select the best model for new prompts.
inputs = [
    "Write a Python function to calculate fibonacci numbers",
    "How to implement a binary search tree in JavaScript?",
    "What is the time complexity of quicksort?"
]

predictions = trainer.predict(inputs)

for i, pred in enumerate(predictions.get("predictions", [])):
    print(f"Input: {inputs[i]}")
    print(f"Recommended Model: {pred.get('top_model')}")
    print(f"Confidence: {pred.get('top_prob'):.0%}")
    print("-" * 40)
Response:
{
  "predictions": [
    {
      "top_model": "openai/gpt-4o-mini",
      "top_prob": 0.92,
      "models": [
        { "model": "openai/gpt-4o-mini", "confidence": 0.92, "rank": 1 },
        { "model": "anthropic/claude-3-5-haiku-20241022", "confidence": 0.06, "rank": 2 }
      ]
    }
  ],
  "version": "v1.0"
}
Each prediction includes:
  • top_model — the recommended model for this input
  • top_prob — confidence score (0–1)
  • models — full ranked list with individual confidence scores
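A common pattern is to follow the router's recommendation only when its confidence is high enough, and otherwise fall back to a default model. The sketch below uses the top_model and top_prob fields described above. The helper name choose_model, the 0.6 threshold, and the fallback model are all assumptions chosen for illustration.

```python
def choose_model(prediction, min_confidence=0.6, fallback_model="openai/gpt-4o"):
    """Return the recommended model, or a fallback when confidence is low.

    Hypothetical helper; the threshold and fallback are illustrative defaults.
    """
    top_model = prediction.get("top_model")
    top_prob = prediction.get("top_prob")
    if top_model is None or top_prob is None or top_prob < min_confidence:
        return fallback_model
    return top_model
```

Falling back to a more capable model trades some cost savings for safety on prompts the router is unsure about.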

Complete example

import time
from ironlabs import RouterTrainer

def main():
    trainer = RouterTrainer()

    # 1. Start training
    data_urls = ["https://example.com/path/to/your/training_data.json"]
    training_info = trainer.fit(data_urls)
    job_id = training_info.get("training_job_id")
    print(f"Training job started. Job ID: {job_id}")

    # 2. Poll until complete
    while True:
        status_info = trainer.get_status()
        status = status_info.get("status")
        print(f"Current status: {status}")

        if status == "completed":
            model_id = status_info.get("model_id")
            print(f"Training completed! Model ID: {model_id}")
            break
        elif status == "failed":
            print("Training failed.")
            return

        time.sleep(10)

    # 3. Inspect model
    details = trainer.get_model_details()
    print(f"Model details: {details}")

    # 4. Run inference
    test_inputs = [
        "Write a Python function to calculate fibonacci numbers",
        "How to implement a binary search tree in JavaScript?",
        "What is the time complexity of quicksort?"
    ]

    predictions = trainer.predict(test_inputs)

    for i, pred in enumerate(predictions.get("predictions", [])):
        print(f"Input: {test_inputs[i]}")
        print(f"Recommended Model: {pred.get('top_model')}")
        print(f"Confidence: {pred.get('top_prob'):.0%}")
        print("-" * 40)

if __name__ == "__main__":
    main()

Loading an existing model

Reuse a previously trained model without going through training again.
from ironlabs import RouterTrainer

trainer = RouterTrainer()
trainer.model_id = "model-xyz-top1_0.8542"

predictions = trainer.predict(["What is the time complexity of quicksort?"])
print(f"Recommended model: {predictions['predictions'][0]['top_model']}")

Batch processing

The predict endpoint supports up to 500 inputs per request. For larger datasets, split into chunks.
from ironlabs import RouterTrainer

trainer = RouterTrainer()
trainer.model_id = "model-xyz-top1_0.8542"

large_dataset = [f"Test query {i}" for i in range(1200)]
BATCH_SIZE = 500

all_predictions = []
for i in range(0, len(large_dataset), BATCH_SIZE):
    batch = large_dataset[i:i + BATCH_SIZE]
    result = trainer.predict(batch)
    all_predictions.extend(result.get("predictions", []))
    print(f"Processed {min(i + BATCH_SIZE, len(large_dataset))}/{len(large_dataset)} inputs")

print(f"Total predictions: {len(all_predictions)}")
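The chunking loop above can also be packaged as a reusable function that keeps each input next to its prediction. This sketch assumes that predictions come back in the same order as the inputs, as the earlier examples imply, and that each batch returns one prediction per input. The helper name predict_in_batches is hypothetical.

```python
def predict_in_batches(trainer, inputs, batch_size=500):
    """Return a list of (input, prediction) pairs, calling predict() per chunk.

    Hypothetical helper; `trainer` is any object with the predict() method
    used in this guide. Assumes one prediction per input, in input order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    paired = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        predictions = trainer.predict(batch).get("predictions", [])
        if len(predictions) != len(batch):
            raise RuntimeError(
                f"expected {len(batch)} predictions, got {len(predictions)}"
            )
        paired.extend(zip(batch, predictions))
    return paired
```

The length check surfaces a partial response immediately instead of silently misaligning inputs and predictions in later batches.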

Supported data formats

Format          Description
JSON            Array of objects with problem and correct_models fields
CSV             Columns for problem and correct_models (comma-separated model names in a single cell)
Multiple files  Pass multiple URLs in data_urls to combine datasets
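For the CSV format, the model names share a single cell, so that cell has to be quoted because it contains commas. Python's csv module handles the quoting automatically. The sketch below assumes a header row with the column names problem and correct_models and a plain comma with no space as the separator inside the cell; neither detail is confirmed above, and the helper name examples_to_csv is hypothetical.

```python
import csv
import io


def examples_to_csv(examples):
    """Render training examples as CSV text with one row per prompt.

    Hypothetical helper; the header row and in-cell separator are assumptions.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["problem", "correct_models"])
    for example in examples:
        # Join model names into one cell; csv.writer quotes it because of the commas.
        writer.writerow([example["problem"], ",".join(example["correct_models"])])
    return buffer.getvalue()


csv_text = examples_to_csv([
    {
        "problem": "Write a simple hello world function",
        "correct_models": ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"],
    },
])
```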

Model lifecycle

State     Description
Active    Available for inference
Archived  Automatically archived after 1 year of inactivity
Removed   Inactive models are cleaned up weekly to optimize storage