Every model inference API is rate limited these days, so any practical use of `AzureOpenAIChatClient.create_agent` needs to deal with retry logic, leading to boilerplate code.
I would like Agent Framework to solve this common pattern so the developer can focus on the rest. A simple implementation using the `tenacity` library could look as follows (streaming would need more delicate handling):
```python
import logging

from agent_framework.azure import AzureOpenAIChatClient
from openai import RateLimitError
from tenacity import (
    RetryCallState,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)
from typing_extensions import override

logger = logging.getLogger(__name__)


class AzureOpenAIChatClientWithRetry(AzureOpenAIChatClient):
    """Azure OpenAI Chat Client with built-in retry logic for handling rate limits."""

    retry_attempts = 3
    """Number of retry attempts for rate limit errors."""

    @staticmethod
    def _before_sleep_log(retry_state: RetryCallState) -> None:
        """Log when rate limiting is reached and retry is about to sleep."""
        attempt_number = retry_state.attempt_number
        wait_time = retry_state.next_action.sleep if retry_state.next_action else 0
        logger.warning(
            "Rate limiting reached. Attempt %d failed. Retrying in %.2f seconds...",
            attempt_number,
            wait_time,
        )

    @override
    @retry(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=_before_sleep_log,
    )
    def get_response(self, *args, **kwargs):
        """Get response with retry on rate limit errors (429 status code only)."""
        return super().get_response(*args, **kwargs)
```
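For context, the retry behaviour the decorator above configures can be exercised without Azure credentials. Below is a minimal, self-contained sketch of roughly what `stop_after_attempt` + `wait_exponential` + `reraise=True` amount to, using a stand-in exception and a hand-rolled loop (the exact backoff formula is tenacity's, so treat the `wait` line here as an approximation, not its implementation):

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for openai.RateLimitError so the sketch runs offline."""


def call_with_retry(fn, attempts=3, min_wait=4.0, max_wait=10.0, multiplier=1.0, sleep=time.sleep):
    """Hand-rolled approximation of tenacity's stop_after_attempt + wait_exponential."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except FakeRateLimitError:
            if attempt == attempts:
                raise  # reraise=True: surface the original error after the last attempt
            # Exponential backoff clamped to [min_wait, max_wait] (approximates tenacity)
            wait = min(max(multiplier * (2 ** attempt), min_wait), max_wait)
            sleep(wait)


state = {"count": 0}


def flaky():
    """Fails with a fake 429 twice, then succeeds."""
    state["count"] += 1
    if state["count"] < 3:
        raise FakeRateLimitError("429 Too Many Requests")
    return "ok"


waits = []
print(call_with_retry(flaky, sleep=waits.append))  # -> ok
print(state["count"])  # -> 3
print(waits)  # -> [4.0, 4.0] (both backoffs clamped up to min_wait)
```

Injecting `sleep` as a parameter keeps the sketch fast and testable; the real subclass would of course rely on tenacity rather than this loop.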