How to Fix HTTP 429 Too Many Requests Error

Quick Fix Summary

TL;DR

Immediately implement exponential backoff with jitter in your client code, and honor the rate limit headers the server returns.

HTTP 429 indicates your client has exceeded the rate limit set by the server or API. It's a protective mechanism to prevent abuse and ensure service stability.
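
To see the protective mechanism from the server's side, here is a minimal sketch of an endpoint that returns 429 with a Retry-After header once a client exceeds a fixed window. The Flask framework, the /data route, the limits, and the in-memory counter are all illustrative assumptions; real deployments typically use a shared store such as Redis.

python
import time
from flask import Flask, request, jsonify

app = Flask(__name__)
WINDOW = 60   # seconds per window
LIMIT = 100   # requests allowed per window, per client IP
hits = {}     # ip -> (window_start, count); in-memory for illustration only

@app.route('/data')
def data():
    now = time.time()
    start, count = hits.get(request.remote_addr, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0  # start a new window
    count += 1
    hits[request.remote_addr] = (start, count)
    if count > LIMIT:
        retry_after = max(1, int(start + WINDOW - now))
        return jsonify(error='rate limit exceeded'), 429, {'Retry-After': str(retry_after)}
    return jsonify(ok=True)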

Diagnosis & Causes

  • Client making requests faster than the allowed rate limit.
  • Aggressive retry logic without backoff in client code.
  • Shared API key/IP address exceeding quotas across multiple services.
  • Sudden traffic spike or misconfigured load balancer.
  • Server-side rate limit misconfiguration or overly restrictive rules.

Recovery Steps

    Step 1: Implement Exponential Backoff with Jitter

    Immediately modify your client's retry logic to respect rate limits. Exponential backoff increases wait time between retries, and jitter adds randomness to prevent thundering herds.

    python
    import time
    import random
    import requests

    def make_request_with_backoff(url, max_retries=5):
        base_delay = 1  # seconds
        for attempt in range(max_retries):
            try:
                response = requests.get(url)
                if response.status_code == 429:
                    # Prefer the server's Retry-After hint; fall back to
                    # exponential backoff. (This assumes Retry-After carries
                    # delta-seconds; it may also be an HTTP-date -- see the FAQ.)
                    retry_after = int(response.headers.get('Retry-After', base_delay * (2 ** attempt)))
                    jitter = random.uniform(0, 0.1 * retry_after)  # de-synchronize retrying clients
                    time.sleep(retry_after + jitter)
                    continue
                return response
            except requests.RequestException:
                # Transient network failure: back off exponentially with jitter.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
        raise Exception('Max retries exceeded')

    Step 2: Inspect and Respect Rate Limit Headers

    Parse the server's response headers to understand your current quota and adjust request pacing dynamically.

    python
    import requests

    response = requests.get('https://api.example.com/data')
    print(f"Status: {response.status_code}")
    # The X-RateLimit-* names below are a widespread convention, not a
    # standard; check your API's documentation for the exact header names.
    print(f"Rate Limit: {response.headers.get('X-RateLimit-Limit')}")
    print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
    print(f"Reset Time: {response.headers.get('X-RateLimit-Reset')}")
    print(f"Retry After: {response.headers.get('Retry-After')}")

    Step 3: Implement Client-Side Request Throttling

    Use a client-side limiter to pace outgoing requests and stay within limits before the server rejects them. The semaphore-based fixed-window limiter below is the simplest option; a token bucket, sketched after it, smooths traffic and avoids bursts at window boundaries.

    python
    from threading import Semaphore, Timer, Lock

    class RateLimiter:
        """Fixed-window limiter: at most `rate` acquisitions per `per` seconds."""
        def __init__(self, rate, per):
            self.rate = rate  # requests
            self.per = per    # seconds
            self.semaphore = Semaphore(rate)
            self.lock = Lock()
            self.in_window = 0  # acquisitions in the current window
            self.timer = None

        def acquire(self):
            self.semaphore.acquire()  # blocks once `rate` permits are in use
            with self.lock:
                self.in_window += 1
                if self.timer is None:
                    self.timer = Timer(self.per, self._reset)
                    self.timer.daemon = True
                    self.timer.start()

        def _reset(self):
            with self.lock:
                released, self.in_window = self.in_window, 0
                self.timer = None
            # Release only the permits actually consumed, so the semaphore
            # never grows past its initial capacity.
            for _ in range(released):
                self.semaphore.release()

    limiter = RateLimiter(100, 60)  # 100 requests per minute
    limiter.acquire()
    # Make your request here
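
    For smoother pacing, here is a minimal token bucket sketch: tokens refill continuously at rate/per tokens per second, each request consumes one token, and callers sleep when the bucket is empty.

    python
    import time
    import threading

    class TokenBucket:
        def __init__(self, rate, per):
            self.capacity = rate
            self.refill_rate = rate / per  # tokens added per second
            self.tokens = float(rate)
            self.last = time.monotonic()
            self.lock = threading.Lock()

        def acquire(self):
            while True:
                with self.lock:
                    now = time.monotonic()
                    # Refill based on elapsed time, capped at capacity.
                    self.tokens = min(self.capacity,
                                      self.tokens + (now - self.last) * self.refill_rate)
                    self.last = now
                    if self.tokens >= 1:
                        self.tokens -= 1
                        return
                    wait = (1 - self.tokens) / self.refill_rate
                time.sleep(wait)

    bucket = TokenBucket(100, 60)  # 100 requests per minute, smoothed
    bucket.acquire()
    # Make your request here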

    Step 4: Cache Frequent Responses

    Reduce the number of live API calls by implementing a caching layer for idempotent GET requests.

    python
    from functools import lru_cache
    import requests

    @lru_cache(maxsize=128)
    def get_cached_data(endpoint, params=None):
        """Cache GET requests to avoid hitting rate limits on repeated calls.

        Note: lru_cache hashes its arguments, so `params` must be hashable --
        pass a tuple of (key, value) pairs rather than a dict.
        """
        return requests.get(endpoint, params=dict(params) if params else None).json()
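
    One caveat: lru_cache never expires entries, so stale responses can linger indefinitely. Below is a minimal time-bucketed variant that forces a refresh every `ttl` seconds; the helper names are illustrative, not a library API.

    python
    import time
    import requests
    from functools import lru_cache

    @lru_cache(maxsize=128)
    def _cached_get(endpoint, frozen_params, time_bucket):
        # `time_bucket` is part of the cache key; when it changes, we miss.
        params = dict(frozen_params) if frozen_params else None
        return requests.get(endpoint, params=params).json()

    def get_with_ttl(endpoint, params=None, ttl=300):
        frozen = tuple(sorted(params.items())) if params else None
        return _cached_get(endpoint, frozen, int(time.time() // ttl))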

    Step 5: Batch Requests Where Possible

    If the API supports it, combine multiple logical operations into a single HTTP request to reduce call volume.

    python
    import requests

    # Instead of N requests for N items...
    # response_1 = api.get_item(1)
    # response_2 = api.get_item(2)
    # ...
    # Use a batch endpoint in a single request.
    batch_ids = [1, 2, 3, 4, 5]
    response = requests.post('https://api.example.com/batch', json={'ids': batch_ids})
    all_items = response.json()
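
    Many batch endpoints cap how many items a single call may carry. Here is a small sketch that splits IDs into chunks; the /batch endpoint and the cap of 100 are assumptions, so check your API's documentation.

    python
    import requests

    def fetch_in_batches(ids, batch_size=100):
        items = []
        for i in range(0, len(ids), batch_size):
            chunk = ids[i:i + batch_size]
            response = requests.post('https://api.example.com/batch', json={'ids': chunk})
            response.raise_for_status()
            items.extend(response.json())
        return items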

    Step 6: Scale or Distribute Request Sources

    If limits are enforced per IP address or API key, distribute traffic across multiple IPs (using a proxy pool) or rotate API keys, provided the provider's terms of service permit it.

    python
    import itertools
    import requests

    API_KEYS = ['key1', 'key2', 'key3']
    key_cycle = itertools.cycle(API_KEYS)

    def make_request_with_rotation(url):
        current_key = next(key_cycle)  # round-robin through available keys
        headers = {'Authorization': f'Bearer {current_key}'}
        return requests.get(url, headers=headers)

    Architect's Pro Tip

    "Always design clients to treat 429 as a normal, expected response—not an error. Log it as INFO, not ERROR, to avoid alert noise and trigger your backoff logic silently."

    Frequently Asked Questions

    What's the difference between HTTP 429 and 503?

    429 means *your client* is sending too many requests. 503 means *the server* is overloaded or down, often due to aggregate traffic from all clients.

    Should I use 'Retry-After' header or implement my own backoff?

    Always prefer the server-provided 'Retry-After' header if present. If absent, fall back to your exponential backoff strategy with jitter.
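
    Note that Retry-After can carry either delta-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT") per RFC 9110. Below is a minimal parser sketch that handles both forms; the function name is illustrative.

    python
    import time
    from email.utils import parsedate_to_datetime

    def retry_after_seconds(header_value, fallback):
        if header_value is None:
            return fallback
        try:
            return int(header_value)  # delta-seconds form
        except ValueError:
            # HTTP-date form; convert to a wait relative to now.
            dt = parsedate_to_datetime(header_value)
            return max(0.0, dt.timestamp() - time.time())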

    How do I know if the rate limit is per user, IP, or API key?

    Check the API documentation. If unspecified, test by making requests from different IPs or with different credentials while monitoring the rate limit headers.
