AgentOpt Skill
End-to-end workflow: user files → ZIP → upload → optimize → poll → results.
Trigger
Activate when the user says things like:
- “optimize my agent / prompt”
- “run agentopt on my files”
- “improve my system prompt using the optimizer”
- `/AgentOpt`
Prerequisites
Ask for these if not already provided:

| Item | Where to get |
|---|---|
| `IRONA_API_KEY` | User’s bearer token (env var or prompted) |
| `agent.py` | User’s agent file (path or content) |
| `eval.py` | Scoring function (path or content) |
| `dataset.json` | Array of `{input, answer}` objects (path or content) |
| `target_model` | OpenRouter model string, default `openai/gpt-4o-mini` |
| `n_iterations` | 1–50, default 5 for quick test / 15 for full run |
Step 1 — Validate & convert files
agent.py requirements
Must have EDITABLE and FIXED boundary markers, and export `run_batch`:
- If the user’s code has no `EDITABLE SECTION START` marker → wrap their `SYSTEM_PROMPT` + `run_batch` in the EDITABLE section and append the FIXED boundary block verbatim.
- If `run_batch` doesn’t exist → wrap their inference logic inside the template above.
- `MODEL` must be set to the target model string.
- The FIXED boundary block must not be modified.
Examples: `examples/gaia/agent.py`, `examples/finance_agent/agent.py`
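The checks above can be partly automated before anything is zipped. The following is a minimal sketch, not the optimizer's own validation: `check_agent_source` is a hypothetical helper, it assumes `MODEL` is assigned as a plain string literal at module level, and it does not check the FIXED boundary block because that block's exact text is not reproduced here.

```python
import re


def check_agent_source(source: str, target_model: str) -> list:
    """Return a list of problems found in agent.py source; empty means OK."""
    problems = []

    # The EDITABLE marker text is taken from the requirements above.
    if "EDITABLE SECTION START" not in source:
        problems.append("missing EDITABLE SECTION START marker")

    # run_batch must be defined (sync or async).
    if not re.search(r"^\s*(async\s+)?def\s+run_batch\s*\(", source, re.MULTILINE):
        problems.append("run_batch is not defined")

    # Assumption: MODEL is a module-level string literal.
    match = re.search(r"""^MODEL\s*=\s*["']([^"']+)["']""", source, re.MULTILINE)
    if match is None:
        problems.append("MODEL is not set to a string literal")
    elif match.group(1) != target_model:
        problems.append(f"MODEL is {match.group(1)!r}, expected {target_model!r}")

    return problems
```

An empty list means the file passed these three checks only; the agent should still be run locally before submitting.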
eval.py requirements
Must define `score(expected: str, predicted: str) -> float` returning 0.0–1.0.
- If the user has a different metric (e.g. F1, BLEU, exact_match) → wrap it in a `score(expected, predicted)` function that returns a float in 0–1.
- If the user has no eval → use the exact/partial match template above, noting that they should customize it for their task.
Examples: `examples/gaia/eval.py`, `examples/finance_agent/eval.py`, `examples/trail/eval.py`
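As an illustration of wrapping an existing metric, here is a sketch that exposes a token-level F1 through the required `score` signature. `token_f1` is a stand-in for whatever metric the user already has; only the `score(expected, predicted) -> float` contract comes from the requirements above.

```python
from collections import Counter


def token_f1(expected: str, predicted: str) -> float:
    """Whitespace-token F1 between two strings (illustrative metric)."""
    exp_tokens = expected.lower().split()
    pred_tokens = predicted.lower().split()
    if not exp_tokens or not pred_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(exp_tokens == pred_tokens)
    overlap = sum((Counter(exp_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def score(expected: str, predicted: str) -> float:
    """Required entry point: always returns a float clamped to [0.0, 1.0]."""
    value = float(token_f1(expected, predicted))
    return max(0.0, min(1.0, value))
```

The clamp is redundant for F1 but guards metrics that can drift outside the range, such as BLEU reported on a 0–100 scale (which would need rescaling rather than clamping).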
dataset.json requirements
JSON array of `{"input": str, "answer": str}` objects. Minimum 10 rows.
- CSV with `input`/`answer` columns → `python3 -c "import csv,json,sys; rows=list(csv.DictReader(open('data.csv'))); json.dump([{'input':r['input'],'answer':r['answer']} for r in rows],sys.stdout,indent=2)"`
- Different column names → remap to `input`/`answer`.
- JSONL → `python3 -c "import json,sys; data=[json.loads(l) for l in open('data.jsonl') if l.strip()]; json.dump(data,sys.stdout,indent=2)"`
- Fewer than 10 rows → warn the user; the optimizer requires a minimum of 10.
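For the remapping and row-count cases, a small script is easier to audit than a one-liner. This is a sketch under stated assumptions: `normalize_rows` and `dataset_warnings` are hypothetical helper names, and coercing values to strings is a choice made here to match the `str` fields described above.

```python
import json

MIN_ROWS = 10  # minimum stated in the dataset.json requirements


def normalize_rows(rows, input_key="input", answer_key="answer"):
    """Remap arbitrary column names to the required input/answer keys."""
    normalized = []
    for i, row in enumerate(rows):
        if input_key not in row or answer_key not in row:
            raise KeyError(f"row {i} lacks {input_key!r} or {answer_key!r}")
        normalized.append(
            {"input": str(row[input_key]), "answer": str(row[answer_key])}
        )
    return normalized


def dataset_warnings(rows):
    """Return warnings to show the user before uploading."""
    warnings = []
    if len(rows) < MIN_ROWS:
        warnings.append(
            f"only {len(rows)} rows; the optimizer requires at least {MIN_ROWS}"
        )
    return warnings


if __name__ == "__main__":
    raw = [{"question": "2+2?", "gold": 4}]
    rows = normalize_rows(raw, input_key="question", answer_key="gold")
    print(json.dumps(rows, indent=2))
    for warning in dataset_warnings(rows):
        print("WARNING:", warning)
```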
Step 2 — Build ZIP
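One way to build the archive is with Python's standard `zipfile` module. This is a minimal sketch: it assumes the three validated files sit in one directory and that the optimizer expects them at the archive root, which this section does not confirm. `build_zip` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

REQUIRED_FILES = ("agent.py", "eval.py", "dataset.json")


def build_zip(source_dir: str, zip_path: str) -> str:
    """Zip the three required files; raise if any of them is missing."""
    src = Path(source_dir)
    missing = [name for name in REQUIRED_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {', '.join(missing)}")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED_FILES:
            # Assumption: files go at the archive root, not in a subfolder.
            zf.write(src / name, arcname=name)
    return zip_path
```

Listing the archive afterwards (`zipfile.ZipFile(path).namelist()`) is a quick way to confirm nothing extra was included.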
Step 3 — Upload ZIP to Cloudflare R2
Requires R2 credentials in the environment:

| Variable | Value |
|---|---|
| `CF_ENDPOINT_URL` | `https://<account_id>.r2.cloudflarestorage.com` |
| `CF_ACCESS_KEY_ID` | R2 API token (access key) |
| `CF_SECRET_ACCESS_KEY` | R2 API token (secret) |
| `CF_ACCOUNT_ID` | Cloudflare account ID |
| `CF_BUCKET_NAME` | R2 bucket name |
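Because R2 exposes an S3-compatible API, one option is to upload with `boto3`. The sketch below is an assumption about tooling, not the skill's own upload command: `r2_settings` and `upload_zip` are hypothetical helpers, and `boto3` is imported lazily so the credential check works without it installed. The public download URL depends on how the bucket's public access is configured, so it is deliberately not derived here.

```python
import os

REQUIRED_VARS = (
    "CF_ENDPOINT_URL",
    "CF_ACCESS_KEY_ID",
    "CF_SECRET_ACCESS_KEY",
    "CF_ACCOUNT_ID",
    "CF_BUCKET_NAME",
)


def r2_settings(env=os.environ) -> dict:
    """Collect R2 settings from the environment, failing fast if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing R2 environment variables: {', '.join(missing)}")
    return {
        "bucket": env["CF_BUCKET_NAME"],
        "client_kwargs": {
            "endpoint_url": env["CF_ENDPOINT_URL"],
            "aws_access_key_id": env["CF_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["CF_SECRET_ACCESS_KEY"],
            "region_name": "auto",  # R2 uses the "auto" region
        },
    }


def upload_zip(zip_path: str, key: str, env=os.environ) -> None:
    """Upload the ZIP to the configured bucket under the given object key."""
    import boto3  # third-party dependency, imported only when uploading

    settings = r2_settings(env)
    client = boto3.client("s3", **settings["client_kwargs"])
    client.upload_file(zip_path, settings["bucket"], key)
```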
Step 4 — Submit optimization job
| Field | Type | Default | Notes |
|---|---|---|---|
| `optimizer` | string | — | must be `"agentopt"` |
| `input_url` | string | — | public ZIP download URL |
| `target_models` | string[] | — | exactly one model, e.g. `["openai/gpt-4o-mini"]` |
| `n_iterations` | int | 15 | 1–50 |
| `overall_timeout_seconds` | int | 3600 | 300–7200 |
| `llm_call_timeout_seconds` | int | 300 | 30–600 |
| `sandbox_timeout_seconds` | int | 600 | 60–1800 |
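Validating the request body locally avoids a round trip that ends in a 422. The sketch below builds the JSON payload from the fields and ranges in the table; `build_job_request` is a hypothetical helper, and the endpoint URL and request headers are not shown in this section, so the sketch stops at the payload.

```python
def build_job_request(
    input_url: str,
    target_model: str = "openai/gpt-4o-mini",
    n_iterations: int = 15,
    overall_timeout_seconds: int = 3600,
    llm_call_timeout_seconds: int = 300,
    sandbox_timeout_seconds: int = 600,
) -> dict:
    """Return a validated request body for an AgentOpt job."""
    if not input_url:
        raise ValueError("input_url is required")
    if not target_model:
        raise ValueError("target_models needs exactly one model string")

    # (value, minimum, maximum) per field, taken from the table above.
    limits = {
        "n_iterations": (n_iterations, 1, 50),
        "overall_timeout_seconds": (overall_timeout_seconds, 300, 7200),
        "llm_call_timeout_seconds": (llm_call_timeout_seconds, 30, 600),
        "sandbox_timeout_seconds": (sandbox_timeout_seconds, 60, 1800),
    }
    for name, (value, low, high) in limits.items():
        if not low <= value <= high:
            raise ValueError(f"{name} must be {low}-{high}, got {value}")

    payload = {
        "optimizer": "agentopt",
        "input_url": input_url,
        "target_models": [target_model],
    }
    payload.update({name: value for name, (value, _, _) in limits.items()})
    return payload
```

Serializing the result with `json.dumps` gives the body to send with the user's bearer token.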
Step 5 — Poll status until complete
Poll every 30 seconds. Show live progress each iteration.

Status flow: `queued → running → completed | interrupted | failed`
AgentOpt-specific status fields:
| Field | Description |
|---|---|
| `current_iteration` | Latest iteration completed (0 = baseline) |
| `best_score` | Best score seen so far (0.0–1.0) |
| `baseline_score` | Score before any optimization |
| `n_iterations` | Total iterations requested |
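The polling loop can be sketched independently of the HTTP client. Here `fetch_status` is a caller-supplied function that returns the status response as a dict; its name, and the assumption that the job state lives under a `status` key, are illustrative. The 30-second interval, the terminal states, and the progress fields come from this section.

```python
import time

TERMINAL_STATES = {"completed", "interrupted", "failed"}


def _fmt(value) -> str:
    """Format a score, tolerating fields that are absent early in the job."""
    return f"{value:.3f}" if isinstance(value, (int, float)) else "n/a"


def format_progress(status: dict) -> str:
    """Render one progress line from the AgentOpt status fields."""
    return (
        f"[{status.get('status', 'unknown')}] "
        f"iteration {status.get('current_iteration', '?')}"
        f"/{status.get('n_iterations', '?')} "
        f"best={_fmt(status.get('best_score'))} "
        f"baseline={_fmt(status.get('baseline_score'))}"
    )


def poll_until_done(fetch_status, interval_seconds=30, sleep=time.sleep, report=print):
    """Poll until the job reaches a terminal state; return the final status."""
    while True:
        status = fetch_status()
        report(format_progress(status))
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(interval_seconds)
```

Passing `sleep` and `report` as parameters keeps the loop testable without waiting or printing. A production version would likely also bound the total wait, for example by `overall_timeout_seconds`.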
Step 6 — Fetch and display results
| Field | Description |
|---|---|
| `optimized_prompt` | Best system prompt found |
| `original_prompt` | System prompt from original agent.py |
| `train_score` | Score on 70% train split |
| `test_score` | Score on 30% held-out test split |
| `iterations_run` | Total iterations executed |
| `iterations_kept` | Iterations where score improved |
| `agent_code_url` | Public URL to download best agent.py |
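A compact way to display these fields is a single text summary. The sketch below assumes the results response has already been parsed into a dict with exactly the keys in the table; `summarize_results` is a hypothetical helper and the layout is only one reasonable choice.

```python
def summarize_results(result: dict) -> str:
    """Build a plain-text summary from the AgentOpt result fields."""
    lines = []
    if result["optimized_prompt"] == result["original_prompt"]:
        lines.append("Note: the optimized prompt is identical to the original.")
    lines += [
        f"train_score: {result['train_score']:.3f} (70% train split)",
        f"test_score:  {result['test_score']:.3f} (30% held-out test split)",
        f"iterations kept: {result['iterations_kept']}/{result['iterations_run']}",
        f"best agent.py: {result['agent_code_url']}",
        "",
        "Optimized prompt:",
        result["optimized_prompt"],
    ]
    return "\n".join(lines)
```

Showing train and test scores side by side makes a large gap between them easy to spot, which can indicate a prompt overfitted to the train split.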
Complete end-to-end script
Save to `/tmp/run_agentopt.sh` and run:
Reference examples
Pre-built working examples in `examples/`:
| Benchmark | agent.py | eval.py | dataset.json |
|---|---|---|---|
| GAIA (general knowledge + tool use) | examples/gaia/agent.py | examples/gaia/eval.py | examples/gaia/dataset.json |
| Finance Q&A | examples/finance_agent/agent.py | examples/finance_agent/eval.py | examples/finance_agent/dataset.json |
| TRAIL (LLM trace classification) | examples/trail/agent.py | examples/trail/eval.py | examples/trail/dataset.json |
Error handling
| Error | Cause | Fix |
|---|---|---|
| `422 agentopt requires input_url` | Missing `input_url` in request | Ensure the ZIP is uploaded and the URL is set |
| `422 agentopt requires target_models` | Empty `target_models` list | Add at least one model string |
| `422 overall_timeout_seconds must be 300–7200` | Timeout out of range | Use a value in range |
| `422 n_iterations must be 1–50` | Iterations out of range | Use a value in range |
| `400 R2 upload failed` | `input_url` not reachable or R2 key missing | Verify R2 credentials and bucket public-read access |
| `status: interrupted` | Job timed out or cancelled | Check the `error_message` field in the status response; retry with a higher `overall_timeout_seconds` |
| `status: failed` | Internal error | Check `error_message`; ensure agent.py runs without error locally |
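For the `interrupted` case, the retry needs a higher `overall_timeout_seconds` that still respects the 7200-second ceiling. A minimal sketch follows; doubling is an arbitrary policy chosen here for illustration, and `next_overall_timeout` is a hypothetical helper.

```python
from typing import Optional

MAX_OVERALL_TIMEOUT = 7200  # upper bound for overall_timeout_seconds


def next_overall_timeout(current: int, factor: float = 2.0) -> Optional[int]:
    """Return a larger timeout for a retry, or None if already at the maximum."""
    if current >= MAX_OVERALL_TIMEOUT:
        return None
    return min(MAX_OVERALL_TIMEOUT, int(current * factor))
```

A `None` result means a longer timeout is not possible, so the remaining options are fewer iterations or a faster agent.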