When to use client-side load balancing
Use client-side load balancing when you want to:- Distribute requests across multiple models to optimize for reliability, latency, or cost.
- Implement failover mechanisms to switch to alternative models if one fails.
- Handle retries with exponential backoff for robust request processing.
- Customize model-specific behavior with additional messages or timeouts.
Configuring weighted load balancing
Weighted load balancing allows you to assign weights to models, determining the probability of selecting each model for a request. You can also specify retries, timeouts, and model-specific messages.Example: Weighted load balancing
- Requests are distributed with a 70% chance to
openai/gpt-4oand 30% toanthropic/claude-3-5-haiku-20241022. - Each model has specific retry limits, timeouts, and backoff settings for exponential retry delays.
- Model-specific messages are appended to ensure concise responses.
Configuring ordered failover
Ordered failover allows you to specify a chain of models to try in sequence if a model fails. Each model in the chain can have its own retries, timeouts, and additional messages.Example: Ordered failover
- The client first tries
openai/gpt-4owith up to 2 retries. - If it fails, it falls back to
anthropic/claude-3-5-haiku-20241022with 1 retry. - Each model has its own timeout, backoff, and additional messages for tailored behavior.