Python SDK
pip install ironlabs
Node.js SDK
npm install ironlabs
When to use Prompt Optimization
- Quality improvement — automatically refine prompts to get better responses from any model
- Metric optimization — optimize prompts against specific evaluation metrics (exact match, BLEU, ROUGE, etc.)
- Multi-model enhancement — find the best prompt formulation for each target model independently
- Prompt engineering at scale — replace manual trial-and-error with a data-driven optimization system
Prerequisites
Before you start, make sure you have:
- An IronLabs API key from the Settings page
- A prompt file hosted at a publicly accessible URL
- A dataset file (JSON) with input/output pairs hosted at a publicly accessible URL
Installation
Install the SDK for your language with the pip or npm command listed above.
Initialize the client
Set your API key as an environment variable.
Running an Optimization
Prepare your files
You need two files hosted at publicly accessible URLs:
1. Prompt file (prompt.txt): your starting prompt
2. Dataset file (dataset.json): input/output pairs for evaluation
Start optimization
Pass your prompt URL, dataset URL, evaluation metric, and target models to kick off an optimization job.
Parameters:
| Parameter | Required | Description |
|---|---|---|
| `prompt_url` | Yes | URL to your original prompt text file |
| `dataset_url` | Yes | URL to your evaluation dataset (JSON) |
| `metric` | No | Evaluation metric to optimize for (default: `exact_match`) |
| `target_models` | No | List of models to optimize for (max 10) |
| `reflection_model` | No | A stronger model to guide the GEPA optimizer |
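The parameter rules in the table can be checked on the client side before a job is submitted. The following is a minimal sketch, not the SDK's own API: `build_optimization_request` is a hypothetical helper, the URLs and model names are placeholders, and only the field names, the `exact_match` default, and the 10-model limit come from the table.

```python
from __future__ import annotations

from typing import Any, Optional

MAX_TARGET_MODELS = 10  # limit stated in the parameter table


def build_optimization_request(
    prompt_url: str,
    dataset_url: str,
    metric: str = "exact_match",
    target_models: Optional[list[str]] = None,
    reflection_model: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble and validate the documented parameters."""
    for name, url in (("prompt_url", prompt_url), ("dataset_url", dataset_url)):
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be an http(s) URL, got {url!r}")
    if target_models is not None and len(target_models) > MAX_TARGET_MODELS:
        raise ValueError(f"target_models accepts at most {MAX_TARGET_MODELS} models")

    payload: dict[str, Any] = {
        "prompt_url": prompt_url,
        "dataset_url": dataset_url,
        "metric": metric,
    }
    # Optional fields are left out of the payload when not provided.
    if target_models:
        payload["target_models"] = list(target_models)
    if reflection_model:
        payload["reflection_model"] = reflection_model
    return payload


request = build_optimization_request(
    "https://example.com/prompt.txt",    # placeholder URL
    "https://example.com/dataset.json",  # placeholder URL
    target_models=["model-a", "model-b"],  # placeholder model names
)
```

Validating locally catches an oversized `target_models` list or a malformed URL before a job is created.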
Check optimization status
Poll until the job reaches `completed` or `failed`. Optimization can take several minutes depending on dataset size.
| Status | Description |
|---|---|
| `in_progress` | Optimization is running |
| `running` | Optimization is running |
| `completed` | Optimization finished successfully |
| `failed` | Optimization encountered an error |
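A polling loop over these statuses might look like the sketch below. `fetch_status` is a hypothetical callable standing in for whatever SDK call returns the job's status string; the interval and timeout defaults are arbitrary assumptions, and only the four status values come from the table.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}
ACTIVE_STATUSES = {"in_progress", "running"}


def wait_for_optimization(
    fetch_status: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status not in ACTIVE_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Simulated job that finishes on the third poll (stand-in for a real status call).
_responses = iter(["in_progress", "running", "completed"])
final_status = wait_for_optimization(lambda: next(_responses), sleep=lambda _: None)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Both `in_progress` and `running` are treated as "keep waiting".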
Get optimization results
Retrieve the optimized prompts and performance metrics.
Each result includes:
- `model`: the target model(s) this prompt was optimized for
- `optimizer`: which technique won (GEPA or MIPROv2)
- `original_prompt`: your starting prompt
- `optimized_prompt`: the improved prompt
- `metrics`: performance scores and evaluation details
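Once results are retrieved, a common next step is to map each target model to its optimized prompt. The sketch below assumes each result arrives as a dictionary carrying the five fields listed above; the `OptimizationResult` class, the `prompts_by_model` helper, and the sample data are illustrative, and the internal layout of `metrics` is left opaque because it is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

RESULT_FIELDS = ("model", "optimizer", "original_prompt", "optimized_prompt", "metrics")


@dataclass
class OptimizationResult:
    model: str
    optimizer: str
    original_prompt: str
    optimized_prompt: str
    metrics: dict[str, Any] = field(default_factory=dict)


def prompts_by_model(results: list[dict[str, Any]]) -> dict[str, str]:
    """Return {model: optimized_prompt}, ignoring any undocumented keys."""
    parsed = [
        OptimizationResult(**{k: r[k] for k in RESULT_FIELDS if k in r})
        for r in results
    ]
    return {r.model: r.optimized_prompt for r in parsed}


# Made-up sample data for illustration only.
sample_results = [
    {
        "model": "model-a",
        "optimizer": "GEPA",
        "original_prompt": "Classify the sentiment.",
        "optimized_prompt": "Classify the sentiment as positive or negative.",
        "metrics": {},
    },
    {
        "model": "model-b",
        "optimizer": "MIPROv2",
        "original_prompt": "Classify the sentiment.",
        "optimized_prompt": "Label the review's sentiment with one word.",
        "metrics": {},
    },
]
best_prompts = prompts_by_model(sample_results)
```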
Complete example
View full end-to-end example
Optimization techniques
IronLabs runs two optimization methods in parallel and selects the best result:
GEPA (Genetic-Pareto)
- Uses evolutionary algorithms to refine prompts
- Adds few-shot examples automatically
- Best for complex reasoning tasks
- Requires a reflection model for guidance
MIPROv2 (Multiprompt Instruction Proposal Optimizer, version 2)
- Generates and tests multiple prompt variations
- Optimizes both instructions and examples
- Best for structured tasks
- Works well without a reflection model
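The "run both, keep the better one" pattern described above can be sketched as follows. This is not how IronLabs implements it: `toy_gepa` and `toy_mipro` are placeholder functions with made-up scores standing in for the real optimizers, and only the selection logic is the point.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Candidate(NamedTuple):
    optimizer: str
    prompt: str
    score: float


def select_best(
    optimizers: dict[str, Callable[[str], tuple[str, float]]],
    prompt: str,
) -> Candidate:
    """Run every optimizer concurrently and return the highest-scoring candidate."""
    with ThreadPoolExecutor(max_workers=len(optimizers)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in optimizers.items()}
        candidates = [Candidate(name, *f.result()) for name, f in futures.items()]
    return max(candidates, key=lambda c: c.score)


# Placeholder optimizers: each returns (rewritten prompt, made-up score).
def toy_gepa(prompt: str) -> tuple[str, float]:
    return prompt + " Think step by step.", 0.82


def toy_mipro(prompt: str) -> tuple[str, float]:
    return prompt + " Answer with one word.", 0.78


winner = select_best({"GEPA": toy_gepa, "MIPROv2": toy_mipro}, "Classify the sentiment.")
```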
Evaluation metrics
| Metric | Description | Best for |
|---|---|---|
| `exact_match` | Perfect string matching between output and expected answer | Classification, structured outputs, exact translations |
| `bleu` | Measures n-gram overlap | Translation, text generation with reference outputs |
| `rouge` | Measures recall-oriented overlap | Summarization, content extraction |
| `meteor` | Considers synonyms and stemming (more flexible than BLEU) | Translation, paraphrasing, semantic similarity |
| `json_match` | Validates JSON structure and content matching | Structured data extraction, API responses |
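To make the difference between the two strictest metrics concrete, here is a simplified sketch of `exact_match` and `json_match` scoring. These functions are illustrative and are not IronLabs's implementations; in particular, trimming surrounding whitespace and scoring unparseable JSON as 0 are assumptions.

```python
import json
from typing import Callable, Iterable


def exact_match(prediction: str, expected: str) -> float:
    """1.0 if the strings are identical after trimming whitespace (assumption)."""
    return 1.0 if prediction.strip() == expected.strip() else 0.0


def json_match(prediction: str, expected: str) -> float:
    """1.0 if both strings parse to equal JSON values; key order is irrelevant."""
    try:
        return 1.0 if json.loads(prediction) == json.loads(expected) else 0.0
    except json.JSONDecodeError:
        return 0.0


def average_score(
    metric: Callable[[str, str], float],
    pairs: Iterable[tuple[str, str]],
) -> float:
    """Mean metric value over (prediction, expected) pairs."""
    scores = [metric(p, e) for p, e in pairs]
    return sum(scores) / len(scores)


# The same two JSON documents, written with different key order and spacing.
a = '{"label": "positive", "score": 1}'
b = '{"score":1,"label":"positive"}'
strict = exact_match(a, b)   # strings differ
lenient = json_match(a, b)   # parsed values are equal
```

For structured outputs, `json_match` tolerates formatting differences that would make `exact_match` fail, which is why the table recommends it for data extraction.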
Dataset format
Your dataset should be a JSON array of `input` and `output` pairs.
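A dataset in this shape can be generated and sanity-checked before uploading. The sketch below assumes each array element is an object with the keys `input` and `output`; the sample rows and the `validate_dataset` helper are illustrative and not part of the SDK.

```python
import json
from typing import Any


def validate_dataset(raw: str) -> list[dict[str, Any]]:
    """Parse a dataset and check that every row has 'input' and 'output' keys."""
    data = json.loads(raw)
    if not isinstance(data, list) or not data:
        raise ValueError("dataset must be a non-empty JSON array")
    for i, row in enumerate(data):
        if not isinstance(row, dict) or not {"input", "output"}.issubset(row):
            raise ValueError(f"row {i} must be an object with 'input' and 'output' keys")
    return data


# Made-up sample rows for a sentiment classification task.
examples = [
    {"input": "I loved this movie.", "output": "positive"},
    {"input": "The service was terrible.", "output": "negative"},
]
dataset_json = json.dumps(examples, indent=2)  # contents for dataset.json
rows = validate_dataset(dataset_json)
```

Writing `dataset_json` to `dataset.json` and hosting it at a public URL gives you the value for `dataset_url`.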