GEPA vs MIPROv2 - IronLabs Docs

IronLabs Prompt Optimization ships with two optimization algorithms — GEPA and MIPROv2 — and runs them in parallel by default. This page explains what each does, when to prefer one, and how the winner is selected.

At a glance

Question	GEPA	MIPROv2
What does it tune?	Instruction wording and structure	In-context examples (`BootstrapFewShot`)
Best when…	Your prompt is fundamentally wrong or vague	Your prompt is right but examples could be better
Typical cost	Higher (more reflection loops per iteration)	Lower (bootstrap is cheaper than reflection)
Surprising behavior	Can drastically shorten prompts that seemed essential	Can pick one very specific example that lifts the whole eval
Underlying paper / library	DSPy GEPA (Genetic-Pareto reflective prompt evolution)	DSPy MIPROv2 (Multiprompt Instruction PRoposal Optimizer, version 2)
Typical iterations	8–16	4–8

When to prefer each

Pick GEPA when:

Your prompt was written quickly and you’re not sure the wording is right.
The task requires multi-step reasoning that the model isn’t currently doing.
You want the optimizer to explore structural changes (re-ordering instructions, adding a chain-of-thought scaffold, removing redundant constraints).

Pick MIPROv2 when:

Your prompt’s wording is solid and you’re trying to lift the last 5–10% of accuracy.
The task has a clear input/output schema and the failure mode is “almost right but format-wrong”.
You have a small, high-quality bootstrap set and want the optimizer to leverage in-context examples rather than rewrite instructions.

Pick both (the default) when you don’t know which applies. The parallel run costs about 1.5× a single run rather than 2×, because the two optimizers share embedding and dataset prep, and it lets the data decide.

How IronLabs picks the winner

Both optimizers run against the same dataset with the same metric (json_match, exact_match, bleu, rouge, meteor, or facility). At convergence:

Each optimizer reports its best candidate prompt and the eval score on the held-out portion of your dataset.
IronLabs compares scores with a tie threshold of 0.005. If both are within the threshold, MIPROv2 wins (lower-cost tiebreaker).
The winner’s candidate prompt and score are written to the job’s result. The runner-up’s score is still recorded for reference.
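The selection rule above can be sketched in a few lines of Python. This is an illustration only, not IronLabs' implementation: the `Candidate` class and `pick_winner` function are hypothetical names, and treating a difference exactly equal to the threshold as a tie is an assumption.

```python
from dataclasses import dataclass

TIE_THRESHOLD = 0.005  # tie threshold stated in the text above


@dataclass
class Candidate:
    """Hypothetical container for one optimizer's best result."""
    optimizer: str  # "gepa" or "miprov2"
    score: float    # eval score on the held-out portion of the dataset
    prompt: str


def pick_winner(gepa: Candidate, miprov2: Candidate,
                tie_threshold: float = TIE_THRESHOLD) -> Candidate:
    """Return the higher-scoring candidate; MIPROv2 wins ties (lower cost)."""
    # Assumption: a gap exactly equal to the threshold counts as a tie.
    if abs(gepa.score - miprov2.score) <= tie_threshold:
        return miprov2
    return gepa if gepa.score > miprov2.score else miprov2
```

With scores of 0.847 (GEPA) and 0.812 (MIPROv2), the gap of 0.035 exceeds the threshold, so GEPA is returned. With 0.814 and 0.812, the gap of 0.002 is inside the threshold, so MIPROv2 is returned even though its score is lower.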

You’ll see this in the optimization job result:

{
  "job_id": "opt-abc123-def456",
  "status": "completed",
  "results": [
    {
      "model": "openai/gpt-4o-mini",
      "optimizer": "gepa",
      "score": 0.847,
      "is_winner": true,
      "optimized_prompt": "..."
    },
    {
      "model": "openai/gpt-4o-mini",
      "optimizer": "miprov2",
      "score": 0.812,
      "is_winner": false,
      "optimized_prompt": "..."
    }
  ]
}
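A small sketch of reading that result in Python, using only the fields shown above. The helper name `winners_by_model` is hypothetical, and the idea that a job with several target models carries one winner per model is an assumption, not documented behavior.

```python
import json


def winners_by_model(job: dict) -> dict:
    """Map each target model to its winning result entry.

    Assumption: each model has at most one entry flagged is_winner.
    """
    winners = {}
    for entry in job["results"]:
        if entry["is_winner"]:
            if entry["model"] in winners:
                raise ValueError(f"multiple winners for {entry['model']}")
            winners[entry["model"]] = entry
    return winners


# The example payload from above, abbreviated to the fields used here.
job = json.loads("""
{
  "job_id": "opt-abc123-def456",
  "status": "completed",
  "results": [
    {"model": "openai/gpt-4o-mini", "optimizer": "gepa",
     "score": 0.847, "is_winner": true, "optimized_prompt": "..."},
    {"model": "openai/gpt-4o-mini", "optimizer": "miprov2",
     "score": 0.812, "is_winner": false, "optimized_prompt": "..."}
  ]
}
""")

best = winners_by_model(job)["openai/gpt-4o-mini"]
print(best["optimizer"], best["score"])
```

The runner-up entry stays in `results`, so the same loop can be adapted to log both scores for comparison.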

Disabling one optimizer

If you’ve measured that one optimizer always wins for your domain and you want to save the cost, pass optimizers:

optimizer.fit(
    prompt_url="https://example.com/prompt.txt",
    dataset_url="https://example.com/dataset.json",
    metric="exact_match",
    target_models=["openai/gpt-4o-mini"],
    optimizers=["gepa"],  # default is ["gepa", "miprov2"]
)
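One way to check whether a single optimizer "always wins" is to tally `is_winner` across past job results before narrowing `optimizers`. The sketch below assumes job results shaped like the example JSON on this page. The function names and the `min_jobs` cutoff are hypothetical choices, not part of the IronLabs API.

```python
from collections import Counter

DEFAULT_OPTIMIZERS = ["gepa", "miprov2"]


def win_counts(jobs: list) -> Counter:
    """Count how often each optimizer was flagged as the winner."""
    counts = Counter()
    for job in jobs:
        for entry in job["results"]:
            if entry["is_winner"]:
                counts[entry["optimizer"]] += 1
    return counts


def optimizers_to_run(jobs: list, min_jobs: int = 5) -> list:
    """Suggest a value for the optimizers argument.

    Narrow to one optimizer only if it won every recorded comparison
    and there are at least min_jobs wins (hypothetical cutoff).
    """
    counts = win_counts(jobs)
    total = sum(counts.values())
    if total >= min_jobs and len(counts) == 1:
        return list(counts)
    return list(DEFAULT_OPTIMIZERS)
```

The returned list can be passed as `optimizers=` in the call above. Falling back to both optimizers whenever the history is short or mixed keeps the default behavior until there is evidence to narrow it.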

Cost model

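Token spend per job is roughly the following. Treat it as an upper bound when both optimizers run, because embedding and dataset prep are shared rather than paid twice: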

total_tokens ≈ (dataset_size × iterations × tokens_per_call) × num_optimizers

A typical run with 100 examples, 12 iterations, GPT-4o-mini target, and both optimizers enabled lands in the $0.40–$1.20 range. Reflection-heavy GEPA contributes ~70% of that. Disabling GEPA roughly halves cost; disabling MIPROv2 saves ~25%.
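The formula can be turned into a quick estimator. This is a back-of-the-envelope sketch: `tokens_per_call` and `usd_per_million_tokens` are placeholders to fill in from your own prompt lengths and your provider's current pricing, and the values in the usage lines are hypothetical, not measured figures.

```python
def estimate_tokens(dataset_size: int, iterations: int,
                    tokens_per_call: int, num_optimizers: int = 2) -> int:
    """total_tokens ~ (dataset_size x iterations x tokens_per_call) x num_optimizers."""
    return dataset_size * iterations * tokens_per_call * num_optimizers


def estimate_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a blended per-million-token price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: 100 examples, 12 iterations, 500 tokens per call.
tokens = estimate_tokens(dataset_size=100, iterations=12, tokens_per_call=500)
# Hypothetical blended price; substitute the real rate for your target model.
cost = estimate_cost_usd(tokens, usd_per_million_tokens=0.50)
print(tokens, round(cost, 2))
```

Because the formula multiplies by `num_optimizers`, it does not account for shared dataset prep or for the optimizers using different iteration counts, so it is better suited to setting a budget ceiling than to predicting an exact bill.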

Related

Prompt Optimization Quickstart: Run your first optimization job.
AgentOpt: Optimize entire agents, not just prompts.
Routing Lifecycle: See how an optimized prompt is then served.
Security & Isolation: How your OpenRouter key is handled during optimization.