IronLabs Prompt Optimization uses two prompt optimizers, GEPA and MIPROv2, to refine your prompts automatically. It tests prompt variations against your dataset and selects the best one according to your chosen metric.

Python SDK

pip install ironlabs

Node.js SDK

npm install ironlabs

When to use Prompt Optimization

  • Quality improvement — automatically refine prompts to get better responses from any model
  • Metric optimization — optimize prompts against specific evaluation metrics (exact match, BLEU, ROUGE, etc.)
  • Multi-model enhancement — find the best prompt formulation for each target model independently
  • Prompt engineering at scale — replace manual trial-and-error with a data-driven optimization system

Prerequisites

Before you start, make sure you have:
  • An IronLabs API key from the Settings page
  • A prompt file hosted at a publicly accessible URL
  • A dataset file (JSON) with input/output pairs hosted at a publicly accessible URL

Installation

Install the SDK for your language:
pip install ironlabs

Initialize the client

Set your API key as an environment variable:
export IRONLABS_API_KEY="your_api_key_here"
Then initialize the optimizer in your code:
from ironlabs import PromptOptimizer

optimizer = PromptOptimizer()
The client automatically picks up IRONLABS_API_KEY from your environment — no need to pass it explicitly.
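If the variable is missing, the problem may only surface on the first API call. A minimal pre-flight check is sketched below. It relies only on the IRONLABS_API_KEY variable named above, and the helper `require_api_key` is hypothetical, not part of the SDK.

```python
import os


def require_api_key(env=None, name="IRONLABS_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper, not part of the ironlabs SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

Calling `require_api_key()` before `PromptOptimizer()` turns a missing key into an immediate, readable error.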

Running an Optimization

Step 1: Prepare your files

You need two files hosted at publicly accessible URLs:
1. Prompt file (prompt.txt), your starting prompt:
Translate the following text to French.
2. Dataset file (dataset.json) — input/output pairs for evaluation:
[
  {
    "input": "Hello, how are you?",
    "output": "Bonjour, comment allez-vous?"
  },
  {
    "input": "Good morning",
    "output": "Bonjour"
  }
]
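One way to produce both files locally before uploading them is sketched below. The function name `write_files` and the output directory are illustrative assumptions, and hosting the files at a public URL is left to you.

```python
import json
from pathlib import Path

PROMPT = "Translate the following text to French."
DATASET = [
    {"input": "Hello, how are you?", "output": "Bonjour, comment allez-vous?"},
    {"input": "Good morning", "output": "Bonjour"},
]


def write_files(directory="."):
    """Write prompt.txt and dataset.json into `directory` and return their paths."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    prompt_path = out / "prompt.txt"
    dataset_path = out / "dataset.json"
    prompt_path.write_text(PROMPT, encoding="utf-8")
    # ensure_ascii=False keeps accented characters readable in the file.
    dataset_path.write_text(
        json.dumps(DATASET, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return prompt_path, dataset_path
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults, which matters for non-ASCII text such as French.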
Step 2: Start optimization

Pass your prompt URL, dataset URL, evaluation metric, and target models to kick off an optimization job.
optimization_info = optimizer.fit(
    prompt_url="https://example.com/prompt.txt",
    dataset_url="https://example.com/dataset.json",
    metric="exact_match",
    target_models=["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"]
)

job_id = optimization_info.get("job_id")
print(f"Optimization job started. Job ID: {job_id}")
Parameters:
| Parameter | Required | Description |
|---|---|---|
| prompt_url | Yes | URL to your original prompt text file |
| dataset_url | Yes | URL to your evaluation dataset (JSON) |
| metric | No | Evaluation metric to optimize for (default: exact_match) |
| target_models | No | List of models to optimize for (max 10) |
| reflection_model | No | A stronger model to guide the GEPA optimizer |
Response:
{
  "job_id": "opt-abc123-def456"
}
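The constraints in the parameter table (two required URLs, at most 10 target models) can be checked locally before a job is submitted. The sketch below assumes those limits apply exactly as listed. `build_fit_kwargs` is a hypothetical helper, not part of the SDK, and how the server validates requests is not documented here.

```python
MAX_TARGET_MODELS = 10  # limit stated in the parameter table


def build_fit_kwargs(prompt_url, dataset_url, metric="exact_match",
                     target_models=None, reflection_model=None):
    """Validate arguments locally and return keyword arguments for optimizer.fit().

    Hypothetical helper, not part of the ironlabs SDK.
    """
    for name, url in (("prompt_url", prompt_url), ("dataset_url", dataset_url)):
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be an http(s) URL, got {url!r}")
    if target_models is not None and len(target_models) > MAX_TARGET_MODELS:
        raise ValueError(f"at most {MAX_TARGET_MODELS} target models are allowed")

    kwargs = {"prompt_url": prompt_url, "dataset_url": dataset_url, "metric": metric}
    # Optional parameters are included only when set, so SDK defaults still apply.
    if target_models is not None:
        kwargs["target_models"] = list(target_models)
    if reflection_model is not None:
        kwargs["reflection_model"] = reflection_model
    return kwargs
```

The result can be passed straight through, for example `optimizer.fit(**build_fit_kwargs(prompt_url, dataset_url))`.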
Step 3: Check optimization status

Poll until the job reaches completed or failed. Optimization can take several minutes depending on dataset size.
import time

while True:
    status_info = optimizer.get_status()
    status = status_info.get("status")
    print(f"Current status: {status}")

    if status == "completed":
        print("Optimization completed!")
        break
    elif status == "failed":
        print("Optimization failed.")
        break

    time.sleep(15)
Response:
{
  "status": "completed"
}
| Status | Description |
|---|---|
| in_progress | Optimization is running |
| running | Optimization is running |
| completed | Optimization finished successfully |
| failed | Optimization encountered an error |
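The polling loop above runs until the job reaches a terminal status. A variant with an upper time bound is sketched here. `wait_for_job` is a hypothetical helper, the 30-minute default timeout is an arbitrary assumption, and the status source is passed in as a callable so the logic does not depend on the SDK.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_status, interval=15.0, timeout=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `get_status()` until a terminal status, or raise after `timeout` seconds.

    `get_status` is any zero-argument callable returning a dict with a
    "status" key, for example `optimizer.get_status`.
    """
    deadline = clock() + timeout
    while True:
        status = get_status().get("status")
        if status in TERMINAL_STATUSES:
            return status
        # "in_progress" and "running" both mean the job is still going.
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the client from earlier, `wait_for_job(optimizer.get_status)` returns either "completed" or "failed". The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.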
Step 4: Get optimization results

Retrieve the optimized prompts and performance metrics.
results = optimizer.get_results()

for result in results.get("results", []):
    models = result.get('model', [])
    metrics = result.get('metrics', {})
    print(f"Model: {', '.join(models)}")
    print(f"Optimizer: {result.get('optimizer')}")
    print(f"Avg Score: {metrics.get('avg_score')}")
    print(f"Original Prompt: {result.get('original_prompt')}")
    print(f"Optimized Prompt: {result.get('optimized_prompt')}")
    print("-" * 40)
Response:
{
  "job_id": "opt-abc123-def456",
  "status": "completed",
  "results": [
    {
      "model": ["openai/gpt-4o-mini"],
      "optimizer": "GEPA",
      "original_prompt": "Translate the following text to French.",
      "optimized_prompt": "You are a professional French translator. Translate the following English text to French, maintaining the tone and context.\n\n### Examples:\nInput: Hello\nOutput: Bonjour\n\nInput: Thank you\nOutput: Merci",
      "metrics": {
        "metric_name": "exact_match",
        "avg_score": 0.92,
        "eval_samples": 20,
        "train_samples": 40,
        "dev_samples": 10,
        "winner": "GEPA"
      }
    }
  ]
}
Each result includes:
  • model — the target model(s) this prompt was optimized for
  • optimizer — which technique won (GEPA or MIPROv2)
  • original_prompt — your starting prompt
  • optimized_prompt — the improved prompt
  • metrics — performance scores and evaluation details
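When several target models are optimized, it can help to index the results by model. The sketch below works on a plain dict shaped like the example response. `best_prompt_per_model` is a hypothetical helper, and whether one response can contain several results for the same model is an assumption, which is why the highest score is kept.

```python
def best_prompt_per_model(response):
    """Map each target model to its highest-scoring result.

    `response` is a dict shaped like the example response above.
    """
    best = {}
    for result in response.get("results", []):
        score = result.get("metrics", {}).get("avg_score")
        if score is None:
            continue  # skip results that carry no score
        for model in result.get("model", []):
            if model not in best or score > best[model]["avg_score"]:
                best[model] = {
                    "optimizer": result.get("optimizer"),
                    "avg_score": score,
                    "optimized_prompt": result.get("optimized_prompt"),
                }
    return best
```

For example, `best_prompt_per_model(results)["openai/gpt-4o-mini"]["optimized_prompt"]` gives the prompt to deploy for that model.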

Complete example

import time
from ironlabs import PromptOptimizer

def main():
    optimizer = PromptOptimizer()

    # 1. Start optimization
    print("Starting prompt optimization job...")
    optimization_info = optimizer.fit(
        prompt_url="https://example.com/prompt.txt",
        dataset_url="https://example.com/dataset.json",
        metric="exact_match",
        target_models=["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku-20241022"]
    )
    job_id = optimization_info.get("job_id")
    print(f"Optimization job started. Job ID: {job_id}")

    # 2. Poll until complete
    print("Waiting for optimization to complete...")
    while True:
        status_info = optimizer.get_status()
        status = status_info.get("status")
        print(f"Current status: {status}")

        if status == "completed":
            print("Optimization completed!")
            break
        elif status == "failed":
            print("Optimization failed.")
            return

        time.sleep(15)

    # 3. Get results
    results = optimizer.get_results()
    print("\nOptimization Results:")
    for result in results.get("results", []):
        models = result.get('model', [])
        metrics = result.get('metrics', {})
        print(f"Model: {', '.join(models)}")
        print(f"Optimizer: {result.get('optimizer')}")
        print(f"Avg Score: {metrics.get('avg_score')}")
        print(f"Original Prompt: {result.get('original_prompt')}")
        print(f"Optimized Prompt: {result.get('optimized_prompt')}")
        print("-" * 40)

if __name__ == "__main__":
    main()

Optimization techniques

IronLabs runs two optimization methods in parallel and selects the best result:

GEPA (Genetic-Pareto)

  • Uses evolutionary algorithms to refine prompts
  • Adds few-shot examples automatically
  • Best for complex reasoning tasks
  • Uses a reflection model for guidance (configurable through the optional reflection_model parameter)

MIPROv2 (Multiprompt Instruction Proposal Optimizer, version 2)

  • Generates and tests multiple prompt variations
  • Optimizes both instructions and examples
  • Best for structured tasks
  • Works well without a reflection model
The system automatically selects the technique that achieves the highest score on your metric.
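The selection rule amounts to taking the maximum over per-technique scores. A minimal sketch follows. `select_winner` is a hypothetical function, the scores are illustrative, and how IronLabs breaks ties is not documented here, so this version simply keeps the first entry.

```python
def select_winner(scores):
    """Return the (technique, score) pair with the highest score.

    `scores` maps a technique name to its average score on the metric.
    On a tie, the technique listed first is returned.
    """
    if not scores:
        raise ValueError("no scores to compare")
    return max(scores.items(), key=lambda item: item[1])


# Illustrative numbers only.
winner, score = select_winner({"GEPA": 0.92, "MIPROv2": 0.88})
print(f"{winner} wins with {score}")
```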

Evaluation metrics

| Metric | Description | Best for |
|---|---|---|
| exact_match | Perfect string matching between output and expected answer | Classification, structured outputs, exact translations |
| bleu | Measures n-gram overlap | Translation, text generation with reference outputs |
| rouge | Measures recall-oriented overlap | Summarization, content extraction |
| meteor | Considers synonyms and stemming (more flexible than BLEU) | Translation, paraphrasing, semantic similarity |
| json_match | Validates JSON structure and content matching | Structured data extraction, API responses |
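To make the two simplest metrics concrete, here is a rough local approximation of exact_match and json_match, averaged over a dataset. This is a sketch, not IronLabs' scoring code. Trimming surrounding whitespace and comparing parsed JSON values are assumptions, and all function names are illustrative.

```python
import json


def exact_match(prediction, reference):
    """1.0 if the strings are identical after trimming whitespace, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def json_match(prediction, reference):
    """1.0 if both strings parse as JSON and the parsed values are equal."""
    try:
        return 1.0 if json.loads(prediction) == json.loads(reference) else 0.0
    except json.JSONDecodeError:
        return 0.0


def avg_score(metric, predictions, references):
    """Mean of `metric` over aligned prediction/reference pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    scores = [metric(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Note how strict exact_match is: "hola" and "Hola" score 0.0 here, which is why it suits classification labels better than free-form text.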

Dataset format

Your dataset should be a JSON array with input and output pairs:
[
  {
    "input": "What is the capital of France?",
    "output": "Paris"
  },
  {
    "input": "Translate 'hello' to Spanish",
    "output": "hola"
  }
]
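A malformed dataset is easier to catch before upload than after a job fails. The check below is a sketch of the format described above. `validate_dataset` is a hypothetical helper, and requiring both fields to be strings is an assumption, since the service may accept other value types.

```python
import json


def validate_dataset(text):
    """Parse `text` as JSON and check it is a non-empty list of input/output pairs.

    Returns the parsed list, or raises ValueError describing the first problem.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list) or not data:
        raise ValueError("dataset must be a non-empty JSON array")
    for i, row in enumerate(data):
        if not isinstance(row, dict):
            raise ValueError(f"item {i} is not an object")
        for key in ("input", "output"):
            if not isinstance(row.get(key), str):
                raise ValueError(f"item {i} needs a string {key!r} field")
    return data
```

Reading the file with `Path("dataset.json").read_text(encoding="utf-8")` and passing the text to `validate_dataset` is enough to run the check locally.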