AgentOpt Skill

End-to-end workflow: user files → ZIP → upload → optimize → poll → results.

Trigger

Activate when user says things like:

“optimize my agent / prompt”
“run agentopt on my files”
“improve my system prompt using the optimizer”
/AgentOpt

Prerequisites

Ask for these if not already provided:

Item	Where to get
`IRONA_API_KEY`	User’s bearer token (env var or prompted)
`agent.py`	User’s agent file (path or content)
`eval.py`	Scoring function (path or content)
`dataset.json`	Array of `{input, answer}` objects (path or content)
`target_model`	OpenRouter model string, default `openai/gpt-4o-mini`
`n_iterations`	1–50, default `5` for quick test / `15` for full run

Step 1 — Validate & convert files

agent.py requirements

agent.py must contain both the EDITABLE SECTION START and FIXED BOUNDARY marker lines, and must define run_batch:

# ===== EDITABLE SECTION START =====
import asyncio
import os

MODEL = "openai/gpt-4o-mini"  # Set by benchmark config — do not change
MAX_STEPS = 1
CONCURRENT_REQUESTS = 5
DEPENDENCIES = []

SYSTEM_PROMPT = "Your system prompt here"

async def run_batch(inputs: list[str], api_key: str) -> tuple[list[str], dict]:
    """
    Process a batch of inputs. Return (predictions, usage).
    usage = {"tokens": int, "cost": float}
    """
    if not api_key:
        return ["unknown"] * len(inputs), {"tokens": 0, "cost": 0.0}

    from openai import AsyncOpenAI
    client = AsyncOpenAI(api_key=api_key, base_url="https://openrouter.ai/api/v1")
    semaphore = asyncio.Semaphore(CONCURRENT_REQUESTS)
    total_tokens = 0
    total_cost = 0.0

    async def _call(inp: str) -> str:
        nonlocal total_tokens, total_cost
        async with semaphore:
            response = await client.chat.completions.create(
                model=MODEL,
                temperature=0,
                messages=[
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": inp},
                ],
            )
            if response.usage:
                total_tokens += response.usage.total_tokens
                extra = getattr(response.usage, "model_extra", None) or {}
                total_cost += float(extra.get("cost", 0.0))
            return (response.choices[0].message.content or "").strip()

    predictions = list(await asyncio.gather(*(_call(inp) for inp in inputs)))
    return predictions, {"tokens": total_tokens, "cost": total_cost}

# ===== FIXED BOUNDARY - DO NOT MODIFY BELOW =====
if __name__ == "__main__":
    import json
    import sys

    inputs_file = sys.argv[1]
    with open(inputs_file) as f:
        inputs = json.load(f)

    api_key = os.environ.get("OPENROUTER_API_KEY", "")
    predictions, usage = asyncio.run(run_batch(inputs, api_key))

    print(json.dumps(predictions))
    print(json.dumps(usage), file=sys.stderr)

Conversion rules:

If user’s code has no EDITABLE SECTION START marker → wrap their SYSTEM_PROMPT + run_batch in the EDITABLE section, append the FIXED boundary block verbatim.
If run_batch doesn’t exist → wrap their inference logic inside the template above.
MODEL must be set to the target model string.
The FIXED boundary block must not be modified.

Reference examples: examples/gaia/agent.py, examples/finance_agent/agent.py
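
The requirements above can be checked before upload with a short script. This is a minimal sketch: the function name check_agent_source is hypothetical, the marker strings are copied from the template above, and whether the service performs the same checks is an assumption.

```python
import ast

EDITABLE_MARKER = "# ===== EDITABLE SECTION START ====="
FIXED_MARKER = "# ===== FIXED BOUNDARY - DO NOT MODIFY BELOW ====="


def check_agent_source(source: str) -> list[str]:
    """Return a list of problems found in an agent.py source string (empty if none)."""
    problems = []
    if EDITABLE_MARKER not in source:
        problems.append("missing EDITABLE SECTION START marker")
    if FIXED_MARKER not in source:
        problems.append("missing FIXED BOUNDARY marker")
    elif EDITABLE_MARKER in source and source.index(FIXED_MARKER) < source.index(EDITABLE_MARKER):
        problems.append("FIXED BOUNDARY marker appears before the EDITABLE section")

    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        problems.append(f"syntax error: {exc.msg} (line {exc.lineno})")
        return problems

    # run_batch must be a top-level coroutine function, as in the template.
    has_run_batch = any(
        isinstance(node, ast.AsyncFunctionDef) and node.name == "run_batch"
        for node in tree.body
    )
    if not has_run_batch:
        problems.append("no top-level 'async def run_batch' found")
    return problems
```

An empty list means the file has both markers in the right order and a top-level async run_batch; anything else points at which conversion rule still needs to be applied.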

eval.py requirements

Must define score(expected: str, predicted: str) -> float returning 0.0–1.0.

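# Minimal eval: exact match scores 1.0, substring match scores 0.5
import re

def _normalize(s: str) -> str:
    # Lowercase, trim whitespace, and drop one trailing punctuation character.
    return re.sub(r"[.,;:!?\-]$", "", s.strip().lower())

def score(expected: str, predicted: str) -> float:
    exp = _normalize(expected)
    pred = _normalize(predicted)
    if exp == pred:
        return 1.0
    # Guard against an empty expected string, which is a substring of everything.
    if exp and exp in pred:
        return 0.5
    return 0.0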

Conversion rules:

If user has a different metric (e.g. F1, BLEU, exact_match) → wrap it in a score(expected, predicted) function that returns float 0–1.
If user has no eval → use the exact/partial match template above, noting that they should customize it for their task.
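
As an illustration of the first rule, an existing metric can be wrapped as follows. This is a sketch, not a reference implementation: token_f1 is a hypothetical whitespace-token F1 standing in for whatever metric the user already has, and only the score(expected, predicted) signature comes from this document.

```python
from collections import Counter


def token_f1(expected: str, predicted: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized, lowercased strings."""
    exp_tokens = expected.lower().split()
    pred_tokens = predicted.lower().split()
    if not exp_tokens or not pred_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(exp_tokens == pred_tokens)
    overlap = sum((Counter(exp_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def score(expected: str, predicted: str) -> float:
    """Adapter with the required signature, clamped to the 0.0-1.0 range."""
    return max(0.0, min(1.0, token_f1(expected, predicted)))
```

The clamp is redundant for F1 but keeps the adapter safe if the wrapped metric is later swapped for one with a different range.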

Reference examples: examples/gaia/eval.py, examples/finance_agent/eval.py, examples/trail/eval.py

dataset.json requirements

JSON array of {"input": str, "answer": str} objects. Minimum 10 rows.

[
  {"input": "What is the capital of France?", "answer": "Paris"},
  {"input": "2 + 2", "answer": "4"}
]

Conversion rules:

CSV with input/answer columns → python3 -c "import csv,json,sys; rows=list(csv.DictReader(open('data.csv', newline=''))); json.dump([{'input':r['input'],'answer':r['answer']} for r in rows],sys.stdout,indent=2)" > dataset.json
Different column names → remap to input/answer.
JSONL → python3 -c "import json,sys; data=[json.loads(l) for l in open('data.jsonl') if l.strip()]; json.dump(data,sys.stdout,indent=2)" > dataset.json
Fewer than 10 rows → warn the user; optimizer requires minimum 10.
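
The remap and minimum-row rules can be sketched in a few lines. The helper names normalize_rows and check_dataset are hypothetical, and the empty-input check is an extra assumption that this document does not state as a requirement.

```python
MIN_ROWS = 10  # minimum stated in this section


def normalize_rows(rows, input_key="input", answer_key="answer"):
    """Remap arbitrary column names to the {input, answer} shape, as strings."""
    out = []
    for i, row in enumerate(rows):
        if input_key not in row or answer_key not in row:
            raise ValueError(f"row {i} is missing {input_key!r} or {answer_key!r}")
        out.append({"input": str(row[input_key]), "answer": str(row[answer_key])})
    return out


def check_dataset(rows):
    """Return a list of warnings for an already normalized dataset."""
    warnings = []
    if len(rows) < MIN_ROWS:
        warnings.append(f"only {len(rows)} rows; optimizer requires at least {MIN_ROWS}")
    empty = [i for i, r in enumerate(rows) if not r["input"].strip()]
    if empty:
        warnings.append(f"empty input in rows {empty}")
    return warnings
```

For example, normalize_rows(rows, "question", "label") converts rows keyed by question/label; the result can then be written with json.dump to dataset.json.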

Step 2 — Build ZIP

cd /tmp && mkdir -p agentopt_job
cp <agent.py path>  agentopt_job/agent.py
cp <eval.py path>   agentopt_job/eval.py
cp <dataset.json path> agentopt_job/dataset.json
cd agentopt_job && zip -q ../agentopt_input.zip agent.py eval.py dataset.json
echo "ZIP size: $(du -sh ../agentopt_input.zip | cut -f1)"
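
Where the zip command is not available, the same archive can be produced with Python's standard library. This is a sketch under the assumption that the service only needs the three files at the archive root under these fixed names; build_zip is a hypothetical helper.

```python
import zipfile
from pathlib import Path

REQUIRED = ("agent.py", "eval.py", "dataset.json")


def build_zip(agent: str, eval_file: str, dataset: str, out_path: str) -> Path:
    """Write the three inputs into a ZIP under their required fixed names."""
    sources = dict(zip(REQUIRED, (agent, eval_file, dataset)))
    out = Path(out_path)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, src in sources.items():
            # arcname stores the file at the archive root, whatever its source path.
            zf.write(src, arcname=arcname)
    return out
```

Calling build_zip("my_agent.py", "my_eval.py", "data.json", "/tmp/agentopt_input.zip") renames the files inside the archive, so the copy step is not needed.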

Step 3 — Upload ZIP to Cloudflare R2

Requires R2 credentials in environment:

Variable	Value
`CF_ENDPOINT_URL`	`https://<account_id>.r2.cloudflarestorage.com`
`CF_ACCESS_KEY_ID`	R2 API token (access key)
`CF_SECRET_ACCESS_KEY`	R2 API token (secret)
`CF_ACCOUNT_ID`	Cloudflare account ID
`CF_BUCKET_NAME`	R2 bucket name
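
A quick pre-flight check can report which of these variables are unset before the upload is attempted. A minimal sketch; missing_r2_vars is a hypothetical helper and treats empty values as missing.

```python
import os

R2_VARS = (
    "CF_ENDPOINT_URL",
    "CF_ACCESS_KEY_ID",
    "CF_SECRET_ACCESS_KEY",
    "CF_ACCOUNT_ID",
    "CF_BUCKET_NAME",
)


def missing_r2_vars(env=None):
    """Return the names of required R2 variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in R2_VARS if not env.get(name)]
```

If missing_r2_vars() returns a non-empty list, ask the user for those values instead of letting the upload fail with a KeyError.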

R2_KEY="router-training/$(python3 -c 'import uuid; print(uuid.uuid4().hex)')/agentopt_input.zip"

# Upload once and capture the printed public URL.
# R2_KEY is passed inline so the Python process can read it from os.environ.
INPUT_URL=$(R2_KEY="$R2_KEY" python3 - <<'EOF'
import boto3, os
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["CF_ENDPOINT_URL"],
    aws_access_key_id=os.environ["CF_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["CF_SECRET_ACCESS_KEY"],
    config=Config(signature_version="s3v4"),
    region_name="auto",
)
bucket = os.environ["CF_BUCKET_NAME"]
key = os.environ["R2_KEY"]
s3.upload_file("/tmp/agentopt_input.zip", bucket, key)
print(f"https://{bucket}.{os.environ['CF_ACCOUNT_ID']}.r2.dev/{key}")
EOF
)
echo "Input URL: $INPUT_URL"

Step 4 — Submit optimization job

API_KEY="${IRONA_API_KEY:-<paste your key>}"
BASE_URL="https://irona-ai--optimize-dev.modal.run"
TARGET_MODEL="${TARGET_MODEL:-openai/gpt-4o-mini}"
N_ITERATIONS="${N_ITERATIONS:-5}"

# INPUT_URL is the public ZIP URL captured in Step 3.
JOB_RESPONSE=$(curl -s -X POST "$BASE_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d "{
    \"optimizer\": \"agentopt\",
    \"input_url\": \"$INPUT_URL\",
    \"target_models\": [\"$TARGET_MODEL\"],
    \"n_iterations\": $N_ITERATIONS
  }")

echo "$JOB_RESPONSE" | python3 -m json.tool
JOB_ID=$(echo "$JOB_RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin)['job_id'])")
echo "Job ID: $JOB_ID"

Full request body:

Field	Type	Default	Notes
`optimizer`	string	—	must be `"agentopt"`
`input_url`	string	—	public ZIP download URL
`target_models`	string[]	—	exactly one model, e.g. `["openai/gpt-4o-mini"]`
`n_iterations`	int	15	1–50
`overall_timeout_seconds`	int	3600	300–7200
`llm_call_timeout_seconds`	int	300	30–600
`sandbox_timeout_seconds`	int	600	60–1800
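
The ranges in this table can be enforced client-side before the request is sent. The following is a sketch, assuming the limits are exactly as tabulated; build_payload and LIMITS are hypothetical names, not part of the service.

```python
LIMITS = {
    "n_iterations": (1, 50),
    "overall_timeout_seconds": (300, 7200),
    "llm_call_timeout_seconds": (30, 600),
    "sandbox_timeout_seconds": (60, 1800),
}


def build_payload(input_url: str, target_model: str, **options: int) -> dict:
    """Build the request body, rejecting values outside the documented ranges."""
    if not input_url:
        raise ValueError("input_url is required")
    if not target_model:
        raise ValueError("target_models needs exactly one model string")
    payload = {
        "optimizer": "agentopt",
        "input_url": input_url,
        "target_models": [target_model],  # the API expects a one-element list
    }
    for name, value in options.items():
        if name not in LIMITS:
            raise ValueError(f"unknown option: {name}")
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be {low}-{high}, got {value}")
        payload[name] = value
    return payload
```

Serializing the result with json.dumps also escapes quotes in the URL or model string, which the shell string interpolation does not.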

Response:

{"job_id": "uuid", "status": "queued", "version": "x.x.x"}

Step 5 — Poll status until complete

Poll every 30 seconds and print the current progress on each poll.

STATUS_URL="https://irona-ai--optimize-status-dev.modal.run"

while true; do
  STATUS=$(curl -s "$STATUS_URL?job_id=$JOB_ID" \
    -H "Authorization: Bearer $API_KEY")
  
  STATE=$(echo "$STATUS" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('status',''))")
  ITER=$(echo "$STATUS"  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('current_iteration','?'))")
  BEST=$(echo "$STATUS"  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('best_score','?'))")
  
  echo "[$(date +%H:%M:%S)] status=$STATE  iteration=$ITER  best_score=$BEST"
  
  case "$STATE" in
    completed|interrupted|failed) break ;;
  esac
  sleep 30
done

Status values: queued → running → completed | interrupted | failed

AgentOpt-specific status fields:

Field	Description
`current_iteration`	Latest iteration completed (0 = baseline)
`best_score`	Best score seen so far (0.0–1.0)
`baseline_score`	Score before any optimization
`n_iterations`	Total iterations requested
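
The polling logic and these fields can be combined into a small loop. This sketch keeps the HTTP call out of scope: fetch_status is a hypothetical callable that returns the parsed status JSON, and max_polls is an added safety limit that the service does not require.

```python
import time

TERMINAL = {"completed", "interrupted", "failed"}


def format_progress(status: dict) -> str:
    """Render one progress line from a status response."""
    best = status.get("best_score")
    base = status.get("baseline_score")
    line = (
        f"status={status.get('status', '?')} "
        f"iteration={status.get('current_iteration', '?')}/{status.get('n_iterations', '?')} "
        f"best={best} baseline={base}"
    )
    # Show the improvement over baseline once both scores are available.
    if isinstance(best, (int, float)) and isinstance(base, (int, float)):
        line += f" gain={best - base:+.3f}"
    return line


def poll(fetch_status, interval=30.0, max_polls=None, sleep=time.sleep):
    """Call fetch_status() until a terminal state; return the final status dict."""
    polls = 0
    while True:
        status = fetch_status()
        print(format_progress(status))
        if status.get("status") in TERMINAL:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"no terminal status after {polls} polls")
        sleep(interval)
```

Passing sleep as a parameter makes the loop easy to test without waiting 30 seconds per poll.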

Step 6 — Fetch and display results

RESULT_URL="https://irona-ai--optimize-result-dev.modal.run"

RESULT=$(curl -s "$RESULT_URL?job_id=$JOB_ID" \
  -H "Authorization: Bearer $API_KEY")

echo "$RESULT" | python3 -c "
import json, sys
data = json.load(sys.stdin)
results = data.get('results', [])
if not results:
    print('No results yet. Status:', data.get('status'))
    sys.exit(0)
for r in results:
    print('=== AgentOpt Result ===')
    print(f'Model:              {r[\"model\"]}')
    print(f'Train score:        {r.get(\"train_score\")}')
    print(f'Test score:         {r.get(\"test_score\")}')
    print(f'Iterations run:     {r.get(\"iterations_run\")}')
    print(f'Iterations kept:    {r.get(\"iterations_kept\")}')
    print()
    print('--- Original prompt ---')
    print(r['original_prompt'])
    print()
    print('--- Optimized prompt ---')
    print(r['optimized_prompt'])
    print()
    print(f'Best agent code:    {r.get(\"agent_code_url\")}')
"

Result fields (agentopt):

Field	Description
`optimized_prompt`	Best system prompt found
`original_prompt`	System prompt from original agent.py
`train_score`	Score on 70% train split
`test_score`	Score on 30% held-out test split
`iterations_run`	Total iterations executed
`iterations_kept`	Iterations where score improved
`agent_code_url`	Public URL to download best agent.py
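
These fields lend themselves to a few derived checks when reporting back to the user. The sketch below is illustrative only: summarize_result is a hypothetical helper, and reading a large positive train/test gap as a sign of overfitting to the train split is a rule of thumb, not something the service reports.

```python
def summarize_result(result: dict) -> dict:
    """Derive a few quick indicators from one entry of the results list."""
    train = result.get("train_score")
    test = result.get("test_score")
    run = result.get("iterations_run") or 0
    kept = result.get("iterations_kept") or 0
    summary = {
        "prompt_changed": result.get("optimized_prompt") != result.get("original_prompt"),
        "keep_rate": kept / run if run else None,  # share of iterations that improved
        "train_test_gap": None,
    }
    if isinstance(train, (int, float)) and isinstance(test, (int, float)):
        summary["train_test_gap"] = round(train - test, 4)
    return summary
```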

Complete end-to-end script

Save to /tmp/run_agentopt.sh and run:

#!/usr/bin/env bash
set -euo pipefail

# ── Config ──────────────────────────────────────────────────────────────────
API_KEY="${IRONA_API_KEY:?Set IRONA_API_KEY}"
AGENT_FILE="${1:?Usage: $0 <agent.py> <eval.py> <dataset.json> [model] [n_iter]}"
EVAL_FILE="${2:?}"
DATASET_FILE="${3:?}"
TARGET_MODEL="${4:-openai/gpt-4o-mini}"
N_ITERATIONS="${5:-5}"

BASE_URL="https://irona-ai--optimize-dev.modal.run"
STATUS_URL="https://irona-ai--optimize-status-dev.modal.run"
RESULT_URL="https://irona-ai--optimize-result-dev.modal.run"

# ── Build ZIP ────────────────────────────────────────────────────────────────
TMP=$(mktemp -d)
cp "$AGENT_FILE" "$TMP/agent.py"
cp "$EVAL_FILE"  "$TMP/eval.py"
cp "$DATASET_FILE" "$TMP/dataset.json"
ZIP="$TMP/input.zip"
(cd "$TMP" && zip -q "$ZIP" agent.py eval.py dataset.json)
echo "ZIP: $(du -sh "$ZIP" | cut -f1)"

# ── Upload to Cloudflare R2 ──────────────────────────────────────────────────
echo "Uploading to R2..."
R2_KEY="router-training/$(python3 -c 'import uuid; print(uuid.uuid4().hex)')/agentopt_input.zip"
INPUT_URL=$(R2_KEY="$R2_KEY" ZIP="$ZIP" python3 - <<'PYEOF'
import boto3, os
from botocore.config import Config
s3 = boto3.client("s3", endpoint_url=os.environ["CF_ENDPOINT_URL"],
    aws_access_key_id=os.environ["CF_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["CF_SECRET_ACCESS_KEY"],
    config=Config(signature_version="s3v4"), region_name="auto")
bucket = os.environ["CF_BUCKET_NAME"]
key = os.environ["R2_KEY"]
s3.upload_file(os.environ["ZIP"], bucket, key)
print(f"https://{bucket}.{os.environ['CF_ACCOUNT_ID']}.r2.dev/{key}")
PYEOF
)
echo "URL: $INPUT_URL"

# ── Submit ──────────────────────────────────────────────────────────────────
echo "Submitting job (model=$TARGET_MODEL, n_iterations=$N_ITERATIONS)..."
JOB=$(curl -s -X POST "$BASE_URL" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d "{
    \"optimizer\": \"agentopt\",
    \"input_url\": \"$INPUT_URL\",
    \"target_models\": [\"$TARGET_MODEL\"],
    \"n_iterations\": $N_ITERATIONS
  }")
echo "$JOB" | python3 -m json.tool
JOB_ID=$(echo "$JOB" | python3 -c "import json,sys; print(json.load(sys.stdin)['job_id'])")
echo "Job ID: $JOB_ID"

# ── Poll ─────────────────────────────────────────────────────────────────────
echo "Polling..."
while true; do
  S=$(curl -s "$STATUS_URL?job_id=$JOB_ID" -H "Authorization: Bearer $API_KEY")
  STATE=$(echo "$S" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('status',''))")
  ITER=$(echo "$S"  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('current_iteration','?'))")
  BEST=$(echo "$S"  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('best_score','?'))")
  BASE=$(echo "$S"  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('baseline_score','?'))")
  echo "[$(date +%H:%M:%S)] $STATE  iter=$ITER  best=$BEST  baseline=$BASE"
  case "$STATE" in completed|interrupted|failed) break ;; esac
  sleep 30
done

# ── Results ──────────────────────────────────────────────────────────────────
echo ""
echo "Fetching results..."
curl -s "$RESULT_URL?job_id=$JOB_ID" \
  -H "Authorization: Bearer $API_KEY" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in (data.get('results') or []):
    print('=== RESULT ===')
    print(f'Train: {r.get(\"train_score\")}  Test: {r.get(\"test_score\")}')
    print(f'Iterations: {r.get(\"iterations_run\")} run / {r.get(\"iterations_kept\")} kept')
    print()
    print('ORIGINAL PROMPT:')
    print(r['original_prompt'])
    print()
    print('OPTIMIZED PROMPT:')
    print(r['optimized_prompt'])
    print()
    print('Best agent.py:', r.get('agent_code_url'))
"

rm -rf "$TMP"

Usage:

IRONA_API_KEY=sk_... bash /tmp/run_agentopt.sh \
  agent.py eval.py dataset.json \
  openai/gpt-4o-mini 5

Reference examples

Pre-built working examples in examples/:

Benchmark	agent.py	eval.py	dataset.json
GAIA (general knowledge + tool use)	`examples/gaia/agent.py`	`examples/gaia/eval.py`	`examples/gaia/dataset.json`
Finance Q&A	`examples/finance_agent/agent.py`	`examples/finance_agent/eval.py`	`examples/finance_agent/dataset.json`
TRAIL (LLM trace classification)	`examples/trail/agent.py`	`examples/trail/eval.py`	`examples/trail/dataset.json`

Quick test with a pre-built example:

IRONA_API_KEY=sk_... bash /tmp/run_agentopt.sh \
  examples/gaia/agent.py \
  examples/gaia/eval.py \
  examples/gaia/dataset.json \
  openai/gpt-4o-mini 2

Error handling

Error	Cause	Fix
`422 agentopt requires input_url`	Missing `input_url` in request	Ensure ZIP uploaded and URL is set
`422 agentopt requires target_models`	Empty `target_models` list	Add at least one model string
`422 overall_timeout_seconds must be 300–7200`	Timeout out of range	Use value in range
`422 n_iterations must be 1–50`	Iterations out of range	Use value in range
`400 R2 upload failed`	`input_url` not reachable or R2 key missing	Verify R2 credentials and bucket public-read access
`status: interrupted`	Job timed out or cancelled	Check `error_message` field in status response; retry with higher `overall_timeout_seconds`
`status: failed`	Internal error	Check `error_message`; ensure agent.py runs without error locally

Validating agent.py locally before submitting

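# Smoke test without an API key: an empty OPENROUTER_API_KEY makes run_batch return placeholders.
# The <(...) process substitution requires bash or zsh.
OPENROUTER_API_KEY= python3 <agent.py path> <(echo '["test question"]')
# Should print ["unknown"] to stdout and {"tokens": 0, "cost": 0.0} to stderr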
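
The same smoke test can be scripted so that the output is parsed and checked rather than read by eye. This is a sketch, assuming the agent follows the FIXED boundary block above (predictions as JSON on stdout, usage as JSON on the last line of stderr); smoke_test is a hypothetical helper.

```python
import json
import os
import subprocess
import sys
import tempfile


def smoke_test(agent_path: str, inputs: list[str]) -> tuple[list, dict]:
    """Run agent.py without an API key and parse its stdout/stderr as JSON."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(inputs, f)
        inputs_path = f.name
    env = dict(os.environ, OPENROUTER_API_KEY="")  # force the no-key branch
    try:
        proc = subprocess.run(
            [sys.executable, agent_path, inputs_path],
            capture_output=True, text=True, env=env, timeout=60, check=True,
        )
    finally:
        os.unlink(inputs_path)
    predictions = json.loads(proc.stdout)
    # Usage is expected on the last stderr line; earlier lines may be warnings.
    usage = json.loads(proc.stderr.strip().splitlines()[-1])
    if len(predictions) != len(inputs):
        raise ValueError("agent returned a different number of predictions than inputs")
    return predictions, usage
```

A non-zero exit raises subprocess.CalledProcessError, which surfaces import or syntax errors in agent.py before a job is submitted.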
