Imagine pouring a large jug of water into a funnel. Dump it all at once and it overflows everywhere; pour it at a controlled pace and every drop makes it through. When your application calls an external API — a payment provider, a mapping service, a partner feed — you're pouring requests into someone else's funnel, and they've decided exactly how fast it can drain.
The Rate Limiting pattern is how you do the controlled pour. Rather than firing requests the instant you have them and hoping the provider keeps up, you deliberately pace your own outbound traffic to stay within the limit the downstream service allows.
The problem
Almost every external service caps how often you may call it — say, 100 requests per second, or 10,000 per day. Exceed it and you don't just get a polite slowdown: you get rejected requests, 429 Too Many Requests errors, temporary bans, or even billing penalties. Your work fails not because anything was wrong with it, but because you sent it too fast.
Rate limiting is easy to confuse with throttling, but the direction is opposite. Throttling is defensive on the inbound side: it protects your service from being overwhelmed by callers. Rate limiting is considerate on the outbound side: it protects a downstream service from being overwhelmed by you. Naively retrying rejected calls only makes the problem worse, hammering an already-saturated limiter and triggering ever-longer penalties.
- Burst of work: Calls fired the instant there's work to do, with no self-pacing. Spikes go out at full speed straight at the provider.
- Downstream API (over quota): The external service's published limit is blown past. It defends itself by rejecting the excess instead of absorbing it.
- 429 / ban: Rejected requests, 'Too Many Requests' errors, temporary bans, or billing penalties. Naive retries only hammer the limiter harder.
How it works
The classic mechanism is a token bucket. The bucket holds tokens and refills at exactly the rate the downstream service permits — say, 100 tokens per second. Every outbound call must spend a token. If tokens are available, the call goes immediately; if the bucket is empty, the call waits until a token refills rather than being fired off to fail.
The bucket's size sets how big a burst you can absorb before pacing kicks in, while the refill rate enforces the long-run average. Work that can't go out right now sits in a buffer until its turn, so a sudden spike of 1,000 requests drains out smoothly at the allowed pace instead of being rejected en masse. The diagram below shows requests arriving in bursts, being metered by the limiter, and leaving as a steady, compliant stream to the downstream service.
- Token bucket: Refills tokens at exactly the allowed rate. Each outbound call spends one; when the bucket is empty, the call waits instead of being rejected.
- Downstream API: The external service with a published quota. Smoothing your calls keeps you inside it and avoids 429s and penalties.
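The token bucket described above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration rather than a production limiter: the class name `TokenBucket`, its parameters, and the injectable `clock` and `sleep` hooks are all assumptions made for the example, not part of any particular library.

```python
import time


class TokenBucket:
    """Illustrative limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst can go out
        self._clock = clock            # injectable so the pacing can be tested
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add the tokens earned since the last check, never exceeding capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self):
        """Wait until one token is available, spend it, and return the seconds waited."""
        waited = 0.0
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return waited
            # Bucket is empty: sleep just long enough for the missing fraction to refill.
            delay = (1 - self.tokens) / self.rate
            self._sleep(delay)
            waited += delay
```

A caller would invoke `acquire()` immediately before each outbound request. With `rate=100` and `capacity=100`, the first 100 calls go out at once and later calls are spaced roughly 10 ms apart. A multi-threaded caller would additionally need a lock around the refill-and-spend step.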
Coordinate the bucket across instances. A per-process limiter is fine for one worker, but ten workers each pacing to the full limit can collectively send up to ten times the allowed rate. When you scale out, the token bucket usually needs to live in shared state (a cache like Redis) so the whole fleet shares one budget.
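To make the shared-budget idea concrete, here is a rough sketch in which every worker counts against one store. It deliberately swaps the token bucket for a simpler fixed-window counter, because that needs only an atomic increment from the shared store. `InMemoryStore`, `SharedWindowLimiter`, and the `outbound:` key prefix are hypothetical names, and the in-memory dict merely stands in for a real shared cache.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a shared cache such as Redis."""

    def __init__(self):
        self._counts = {}

    def incr(self, key):
        # A real shared store would do this increment atomically and expire old keys;
        # this dict does neither and only serves to make the sketch runnable.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


class SharedWindowLimiter:
    """Fixed-window limiter: at most `limit` calls per one-second window, fleet-wide."""

    def __init__(self, store, limit, clock=time.time):
        self._store = store
        self._limit = limit
        self._clock = clock

    def try_acquire(self):
        # Every worker increments the same per-second key, so the count is global.
        window = int(self._clock())
        return self._store.incr(f"outbound:{window}") <= self._limit
```

Unlike the waiting behavior of a token bucket, `try_acquire()` returns `False` when the window's budget is spent, and the caller is expected to hold the work until the next window. A fixed window can also let through a burst that straddles two windows, so a shared token bucket remains the stricter option when the provider enforces a smooth rate.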
When to use it
Use rate limiting whenever you call a service that publishes a quota and you'd rather pace yourself than be cut off. It pairs naturally with queue-load-leveling: a queue absorbs the bursty work, and the rate limiter drains it at a sustainable speed. It also makes your retry logic far gentler — instead of retrying immediately into a wall, retries wait for the next available token, so you stop amplifying the very congestion you're trying to recover from.
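The pairing of a queue with a paced drain, including gentler retries, might look like the sketch below. It uses fixed spacing between calls (in effect a token bucket that holds a single token) to keep the example short. The `drain` function, its `send` callback that returns `True` on success, and the `max_attempts` cap are assumptions for illustration.

```python
import time
from collections import deque


def drain(queue, send, rate, clock=time.monotonic, sleep=time.sleep, max_attempts=3):
    """Send queued (item, attempts) pairs at no more than `rate` calls per second.

    Rejected items go to the back of the queue and wait for their own slot,
    so a retry never fires immediately. Returns the items that ran out of attempts.
    """
    interval = 1.0 / rate
    next_slot = clock()
    failed = []
    while queue:
        item, attempts = queue.popleft()
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)  # wait for the next permitted send time
        next_slot = max(next_slot, clock()) + interval
        if send(item):
            continue
        if attempts + 1 < max_attempts:
            queue.append((item, attempts + 1))  # retry later, still paced
        else:
            failed.append(item)
    return failed
```

A producer would append `(item, 0)` pairs to a `deque` as work arrives, and a single consumer would call `drain(queue, send, rate=100)` to empty it at the provider's pace.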
Where it's overkill: low-volume calls that never approach any limit, or fire-and-forget traffic where the occasional rejection genuinely doesn't matter. But the moment you're doing bulk work against a metered API — sending notifications, syncing records, scraping a feed — pacing your outbound flow is the difference between steady throughput and a stream of rejections.