At a glance
| Question | GEPA | MIPROv2 |
|---|---|---|
| What does it tune? | Instruction wording and few-shot demonstrations | In-context examples (BootstrapFewShot) |
| Best when… | Your prompt is fundamentally wrong or vague | Your prompt is right but examples could be better |
| Typical cost | Higher (more reflection loops per iteration) | Lower (bootstrap is cheaper than reflection) |
| Surprising behavior | Can drastically shorten prompts that seemed essential | Can pick one very specific example that lifts the whole eval |
| Underlying paper / library | DSPy GEPA (Genetic-Pareto) | DSPy MIPROv2 (Multiprompt Instruction PRoposal Optimizer, version 2) |
| Typical iterations | 8–16 | 4–8 |
When to prefer each
Pick GEPA when:
- Your prompt was written quickly and you’re not sure the wording is right.
- The task requires multi-step reasoning that the model isn’t currently doing.
- You want the optimizer to explore structural changes (re-ordering instructions, adding a chain-of-thought scaffold, removing redundant constraints).
Pick MIPROv2 when:
- Your prompt’s wording is solid and you’re trying to lift the last 5–10% of accuracy.
- The task has a clear input/output schema and the failure mode is “almost right but format-wrong”.
- You have a small, high-quality bootstrap set and want the optimizer to leverage in-context examples rather than rewrite instructions.
How IronLabs picks the winner
Both optimizers run against the same dataset with the same metric (json_match, exact_match, bleu, rouge, meteor, or facility). At convergence:
- Each optimizer reports its best candidate prompt and the eval score on the held-out portion of your dataset.
- IronLabs compares the two scores using a tie threshold of 0.005. If the scores are within 0.005 of each other, MIPROv2 wins, because it is the lower-cost optimizer.
- The winner’s candidate prompt and score are written to the job’s result. The runner-up’s score is still recorded for reference.
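The selection rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not IronLabs' implementation: the `Candidate` class, the `pick_winner` function, and the field names are hypothetical, and treating a difference of exactly 0.005 as a tie is an assumption. Only the 0.005 threshold and the MIPROv2 tiebreaker come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple

TIE_THRESHOLD = 0.005  # tie threshold stated in the text above


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one optimizer's best result."""
    optimizer: str  # "gepa" or "miprov2"
    prompt: str     # best candidate prompt found by the optimizer
    score: float    # eval score on the held-out portion of the dataset


def pick_winner(
    gepa: Candidate,
    mipro: Candidate,
    tie_threshold: float = TIE_THRESHOLD,
) -> Tuple[Candidate, Candidate]:
    """Return (winner, runner_up) under the tie-threshold rule."""
    # Scores within the threshold count as a tie; MIPROv2 wins ties
    # because it is the lower-cost optimizer. Inclusive comparison is
    # an assumption of this sketch.
    if abs(gepa.score - mipro.score) <= tie_threshold:
        return mipro, gepa
    # Otherwise the higher held-out score wins.
    if gepa.score > mipro.score:
        return gepa, mipro
    return mipro, gepa


# Example: GEPA scores slightly higher, but the gap is inside the
# threshold, so MIPROv2 is selected and GEPA is kept as the runner-up.
winner, runner_up = pick_winner(
    Candidate("gepa", "prompt A", 0.815),
    Candidate("miprov2", "prompt B", 0.812),
)
```

Returning the runner-up alongside the winner mirrors the last step above, where the losing optimizer's score is still recorded for reference.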
Disabling one optimizer
If you’ve measured that one optimizer always wins for your domain and you want to save the cost of running the other, pass optimizers:
Cost model
Token spend per job is roughly:
Related
Prompt Optimization Quickstart
Run your first optimization job.
AgentOpt
Optimize entire agents, not just prompts.
Routing Lifecycle
See how an optimized prompt is then served.
Security & Isolation
How your OpenRouter key is handled during optimization.