How to Fix HTTP 429 Too Many Requests Error

Quick Fix Summary

TL;DR

Immediately implement exponential backoff with jitter in your client code, and honor the rate limit headers the server returns.

HTTP 429 indicates your client has exceeded the rate limit set by the server or API. It's a protective mechanism to prevent abuse and ensure service stability.
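
To see the protective mechanism from the server's side, here is a minimal sketch of an endpoint that returns 429 with a Retry-After header once a client exceeds a fixed window. The Flask framework, the /data route, the limits, and the in-memory counter are all illustrative assumptions; real deployments typically use a shared store such as Redis.

python
import time
from flask import Flask, request, jsonify

app = Flask(__name__)
WINDOW = 60   # seconds per window
LIMIT = 100   # requests allowed per window, per client IP
hits = {}     # ip -> (window_start, count); in-memory for illustration only

@app.route('/data')
def data():
    now = time.time()
    start, count = hits.get(request.remote_addr, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0  # start a new window
    count += 1
    hits[request.remote_addr] = (start, count)
    if count > LIMIT:
        retry_after = max(1, int(start + WINDOW - now))
        return jsonify(error='rate limit exceeded'), 429, {'Retry-After': str(retry_after)}
    return jsonify(ok=True)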

Diagnosis & Causes

  • Client making requests faster than the allowed rate limit.
  • Aggressive retry logic without backoff in client code.
  • Shared API key/IP address exceeding quotas across multiple services.
  • Sudden traffic spike or misconfigured load balancer.
  • Server-side rate limit misconfiguration or overly restrictive rules.

Recovery Steps

    Step 1: Implement Exponential Backoff with Jitter

    Immediately modify your client's retry logic to respect rate limits. Exponential backoff increases wait time between retries, and jitter adds randomness to prevent thundering herds.

    python
    import time
    import random
    import requests

    def make_request_with_backoff(url, max_retries=5):
        base_delay = 1  # seconds
        for attempt in range(max_retries):
            try:
                response = requests.get(url)
                if response.status_code == 429:
                    # Prefer the server's Retry-After hint; fall back to
                    # exponential backoff. (This assumes Retry-After carries
                    # delta-seconds; it may also be an HTTP-date -- see the FAQ.)
                    retry_after = int(response.headers.get('Retry-After', base_delay * (2 ** attempt)))
                    jitter = random.uniform(0, 0.1 * retry_after)  # de-synchronize retrying clients
                    time.sleep(retry_after + jitter)
                    continue
                return response
            except requests.RequestException:
                # Transient network failure: back off exponentially with jitter.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
        raise Exception('Max retries exceeded')

    Step 2: Inspect and Respect Rate Limit Headers

    Parse the server's response headers to understand your current quota and adjust request pacing dynamically.

    python
    import requests

    response = requests.get('https://api.example.com/data')
    print(f"Status: {response.status_code}")
    # The X-RateLimit-* names below are a widespread convention, not a
    # standard; check your API's documentation for the exact header names.
    print(f"Rate Limit: {response.headers.get('X-RateLimit-Limit')}")
    print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
    print(f"Reset Time: {response.headers.get('X-RateLimit-Reset')}")
    print(f"Retry After: {response.headers.get('Retry-After')}")

    Step 3: Implement Client-Side Request Throttling

    Use a client-side limiter to pace outgoing requests and stay within limits before the server rejects them. The semaphore-based fixed-window limiter below is the simplest option; a token bucket, sketched after it, smooths traffic and avoids bursts at window boundaries.

    python
    from threading import Semaphore, Timer, Lock

    class RateLimiter:
        """Fixed-window limiter: at most `rate` acquisitions per `per` seconds."""
        def __init__(self, rate, per):
            self.rate = rate  # requests
            self.per = per    # seconds
            self.semaphore = Semaphore(rate)
            self.lock = Lock()
            self.in_window = 0  # acquisitions in the current window
            self.timer = None

        def acquire(self):
            self.semaphore.acquire()  # blocks once `rate` permits are in use
            with self.lock:
                self.in_window += 1
                if self.timer is None:
                    self.timer = Timer(self.per, self._reset)
                    self.timer.daemon = True
                    self.timer.start()

        def _reset(self):
            with self.lock:
                released, self.in_window = self.in_window, 0
                self.timer = None
            # Release only the permits actually consumed, so the semaphore
            # never grows past its initial capacity.
            for _ in range(released):
                self.semaphore.release()

    limiter = RateLimiter(100, 60)  # 100 requests per minute
    limiter.acquire()
    # Make your request here
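
    For smoother pacing, here is a minimal token bucket sketch: tokens refill continuously at rate/per tokens per second, each request consumes one token, and callers sleep when the bucket is empty.

    python
    import time
    import threading

    class TokenBucket:
        def __init__(self, rate, per):
            self.capacity = rate
            self.refill_rate = rate / per  # tokens added per second
            self.tokens = float(rate)
            self.last = time.monotonic()
            self.lock = threading.Lock()

        def acquire(self):
            while True:
                with self.lock:
                    now = time.monotonic()
                    # Refill based on elapsed time, capped at capacity.
                    self.tokens = min(self.capacity,
                                      self.tokens + (now - self.last) * self.refill_rate)
                    self.last = now
                    if self.tokens >= 1:
                        self.tokens -= 1
                        return
                    wait = (1 - self.tokens) / self.refill_rate
                time.sleep(wait)

    bucket = TokenBucket(100, 60)  # 100 requests per minute, smoothed
    bucket.acquire()
    # Make your request here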

    Step 4: Cache Frequent Responses

    Reduce the number of live API calls by implementing a caching layer for idempotent GET requests.

    python
    from functools import lru_cache
    import requests

    @lru_cache(maxsize=128)
    def get_cached_data(endpoint, params=None):
        """Cache GET requests to avoid hitting rate limits on repeated calls.

        Note: lru_cache hashes its arguments, so `params` must be hashable --
        pass a tuple of (key, value) pairs rather than a dict.
        """
        return requests.get(endpoint, params=dict(params) if params else None).json()
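
    One caveat: lru_cache never expires entries, so stale responses can linger indefinitely. Below is a minimal time-bucketed variant that forces a refresh every `ttl` seconds; the helper names are illustrative, not a library API.

    python
    import time
    import requests
    from functools import lru_cache

    @lru_cache(maxsize=128)
    def _cached_get(endpoint, frozen_params, time_bucket):
        # `time_bucket` is part of the cache key; when it changes, we miss.
        params = dict(frozen_params) if frozen_params else None
        return requests.get(endpoint, params=params).json()

    def get_with_ttl(endpoint, params=None, ttl=300):
        frozen = tuple(sorted(params.items())) if params else None
        return _cached_get(endpoint, frozen, int(time.time() // ttl))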

    Step 5: Batch Requests Where Possible

    If the API supports it, combine multiple logical operations into a single HTTP request to reduce call volume.

    python
    import requests

    # Instead of N requests for N items...
    # response_1 = api.get_item(1)
    # response_2 = api.get_item(2)
    # ...
    # Use a batch endpoint in a single request.
    batch_ids = [1, 2, 3, 4, 5]
    response = requests.post('https://api.example.com/batch', json={'ids': batch_ids})
    all_items = response.json()
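
    Many batch endpoints cap how many items a single call may carry. Here is a small sketch that splits IDs into chunks; the /batch endpoint and the cap of 100 are assumptions, so check your API's documentation.

    python
    import requests

    def fetch_in_batches(ids, batch_size=100):
        items = []
        for i in range(0, len(ids), batch_size):
            chunk = ids[i:i + batch_size]
            response = requests.post('https://api.example.com/batch', json={'ids': chunk})
            response.raise_for_status()
            items.extend(response.json())
        return items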

    Step 6: Scale or Distribute Request Sources

    If limits are enforced per IP address or API key, distribute traffic across multiple IPs (using a proxy pool) or rotate API keys, provided the provider's terms of service permit it.

    python
    import itertools
    import requests

    API_KEYS = ['key1', 'key2', 'key3']
    key_cycle = itertools.cycle(API_KEYS)

    def make_request_with_rotation(url):
        current_key = next(key_cycle)  # round-robin through available keys
        headers = {'Authorization': f'Bearer {current_key}'}
        return requests.get(url, headers=headers)

    Architect's Pro Tip

    "Always design clients to treat 429 as a normal, expected response—not an error. Log it as INFO, not ERROR, to avoid alert noise and trigger your backoff logic silently."

    Frequently Asked Questions

    What's the difference between HTTP 429 and 503?

    429 means *your client* is sending too many requests. 503 means *the server* is overloaded or down, often due to aggregate traffic from all clients.

    Should I use 'Retry-After' header or implement my own backoff?

    Always prefer the server-provided 'Retry-After' header if present. If absent, fall back to your exponential backoff strategy with jitter.
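
    Note that Retry-After can carry either delta-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT") per RFC 9110. Below is a minimal parser sketch that handles both forms; the function name is illustrative.

    python
    import time
    from email.utils import parsedate_to_datetime

    def retry_after_seconds(header_value, fallback):
        if header_value is None:
            return fallback
        try:
            return int(header_value)  # delta-seconds form
        except ValueError:
            # HTTP-date form; convert to a wait relative to now.
            dt = parsedate_to_datetime(header_value)
            return max(0.0, dt.timestamp() - time.time())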

    How do I know if the rate limit is per user, IP, or API key?

    Check the API documentation. If unspecified, test by making requests from different IPs or with different credentials while monitoring the rate limit headers.
