Picture a coffee shop with one barista and a line out the door. It doesn't matter how fast that barista works — orders pile up, and if they call in sick, the whole shop grinds to a halt. The obvious fix is to put two or three baristas behind the counter, each grabbing the next ticket as it comes in.
The Competing Consumers pattern is exactly that arrangement applied to message queues: instead of one worker draining a queue, you run several, and they share the load by competing for whatever arrives.
The problem
A single consumer reading from a queue is fine when traffic is light, but it has two structural weaknesses. First, it's a bottleneck: the queue can only drain as fast as that one process can handle messages, so when work arrives faster than it can keep up, the backlog grows without bound and latency climbs.
Second, it's a single point of failure. If that lone consumer crashes, hangs, or gets stuck on a poison message, nothing gets processed until it comes back — the queue just keeps filling while everyone waits.
How it works
The fix is to run multiple consumer instances that all read from the same queue and compete for messages. The queue itself does the coordinating: when a consumer asks for the next message, the queue hands it out and hides it from everyone else, so no two consumers hold the same message at the same time. In the normal case each message is processed by exactly one consumer, which means the work naturally spreads across however many workers you have running.
Because the consumers don't know or care about each other, scaling becomes trivial — you turn throughput up or down simply by changing how many of them are running. That's horizontal scaling applied to the processing side: more workers, more messages per second, no code changes. The animation below shows one queue feeding several consumers, each pulling a different message off the front.
- Queue: Holds messages; each is delivered to exactly one consumer.
- Consumer: One of many instances competing for messages. Add more to scale throughput.
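To make the arrangement concrete, here is a minimal in-process sketch that uses Python's standard `queue.Queue` as a stand-in for a real message broker and threads as stand-ins for consumer instances. The function name `run_consumers` and the integer messages are illustrative assumptions, not part of any particular queueing product.

```python
import queue
import threading


def run_consumers(messages, consumer_count):
    """Drain one shared queue with several competing worker threads."""
    work = queue.Queue()
    for message in messages:
        work.put(message)

    # consumer id -> messages that consumer handled
    handled = {i: [] for i in range(consumer_count)}

    def consume(consumer_id):
        while True:
            try:
                # Queue hands each item to exactly one caller.
                message = work.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            handled[consumer_id].append(message)
            work.task_done()

    threads = [
        threading.Thread(target=consume, args=(i,))
        for i in range(consumer_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


handled = run_consumers(list(range(100)), consumer_count=4)
```

How the 100 messages are split between the four consumers varies from run to run, but every message ends up in exactly one consumer's list. Changing `consumer_count` is the only thing needed to scale the pool.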
Throughput scales almost linearly with consumer count as long as the queue and any shared downstream (a database, an API) can keep up. The moment you saturate that downstream, adding consumers stops helping and just moves the bottleneck — watch where the real constraint lives before turning the dial.
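A rough way to reason about that ceiling is a simple `min()` model: the pool delivers the lesser of what the consumers could do together and what the shared downstream can absorb. The function below is a back-of-the-envelope sketch with made-up rates, not a measurement; real systems also lose some throughput to contention and queue overhead.

```python
def effective_throughput(consumers, per_consumer_rate, downstream_limit):
    """Messages/sec the pool can sustain under a simple min() model.

    All three inputs are hypothetical numbers you would measure yourself.
    """
    return min(consumers * per_consumer_rate, downstream_limit)


# Assume each consumer handles 50 msg/s and the database tops out at 400 msg/s.
four = effective_throughput(4, 50, 400)      # 200: consumers are the limit
eight = effective_throughput(8, 50, 400)     # 400: exactly at the ceiling
sixteen = effective_throughput(16, 50, 400)  # 400: extra consumers add nothing
```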
Resilience and at-least-once delivery
Spreading work across many consumers also removes the single point of failure. When a consumer picks up a message, the queue doesn't delete it outright — it makes the message invisible for a window called the visibility timeout and waits for the consumer to confirm it's done. If the consumer finishes, it acknowledges and the message is deleted for good.
But if that consumer dies mid-message and never acknowledges, the visibility timeout expires and the message simply reappears on the queue, where another healthy consumer picks it up. A crash costs you a brief delay, not a lost message — the system heals itself.
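The receive, hide, and acknowledge cycle can be sketched with a toy in-memory queue. The class `VisibilityQueue` and its methods are hypothetical names for illustration, and time is passed in explicitly as `now` so the behaviour is easy to follow; a real broker tracks the clock itself.

```python
import itertools


class VisibilityQueue:
    """Toy queue with a visibility timeout (illustrative, not a real broker)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}         # message id -> body, in arrival order
        self._invisible_until = {}  # message id -> time it becomes visible again
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = body
        self._invisible_until[message_id] = 0.0
        return message_id

    def receive(self, now):
        """Return (id, body) for the oldest visible message, or None."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                # Hide the message instead of deleting it.
                self._invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def ack(self, message_id):
        """Consumer confirms success; only now is the message deleted."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.send("resize image 17")
first = q.receive(now=0)    # consumer A takes it, then crashes without acking
hidden = q.receive(now=10)  # None: still invisible to everyone else
retry = q.receive(now=31)   # timeout expired, consumer B gets the same message
q.ack(retry[0])             # B finishes, so the message is gone for good
```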
This safety net gives you at-least-once delivery, not exactly-once. A message can be processed more than once, for example if a consumer finishes the work but crashes just before acknowledging, or if processing outlasts the visibility timeout and the message is handed to a second consumer. Design your handlers to be idempotent (see the idempotency lesson) so that reprocessing the same message is harmless rather than a double charge or duplicate record.
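One common way to get idempotency is to record the ID of every message already handled and skip repeats. The sketch below assumes each message carries a unique `message_id`; the `ChargeHandler` class is hypothetical, and the in-memory set stands in for a durable store. In a real system the duplicate check and the side effect would need to be committed together (for instance in one database transaction), otherwise a crash between the two steps reopens the gap.

```python
class ChargeHandler:
    """Applies each charge at most once, keyed by message id (illustrative)."""

    def __init__(self):
        self.processed_ids = set()  # stand-in for a durable store
        self.charges = []           # stand-in for the real side effect

    def handle(self, message_id, amount):
        """Return True if the charge was applied, False if it was a duplicate."""
        if message_id in self.processed_ids:
            return False  # redelivery: already done, safe to ack again
        self.charges.append(amount)
        self.processed_ids.add(message_id)
        return True


handler = ChargeHandler()
applied_first = handler.handle("msg-42", 19.99)   # True: charge applied
applied_again = handler.handle("msg-42", 19.99)   # False: redelivery ignored
```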
Competing consumers vs. pub/sub
It's easy to confuse this pattern with pub/sub because both involve messages and multiple receivers, but they answer opposite questions. Competing consumers is about sharing work: one message goes to one consumer, and adding consumers spreads the load so the queue drains faster.
Pub/sub is about broadcasting: one message goes to every subscriber, because each subscriber needs its own copy to react to independently. If you want N workers to split a workload, that's competing consumers; if you want N services to each learn that something happened, that's pub/sub.
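The difference shows up clearly when the two delivery rules are written side by side. This is a toy sketch: both function names are made up, and round-robin assignment stands in for "whichever consumer is free next" in the competing case.

```python
from collections import deque


def deliver_competing(messages, consumer_names):
    """Each message goes to exactly one consumer; the work is split."""
    inboxes = {name: [] for name in consumer_names}
    pending = deque(messages)
    turn = 0
    while pending:
        name = consumer_names[turn % len(consumer_names)]
        inboxes[name].append(pending.popleft())
        turn += 1
    return inboxes


def deliver_pubsub(messages, subscriber_names):
    """Every subscriber gets its own copy of every message."""
    return {name: list(messages) for name in subscriber_names}


jobs = ["m1", "m2", "m3", "m4"]
shared = deliver_competing(jobs, ["worker-a", "worker-b"])
broadcast = deliver_pubsub(jobs, ["billing", "email"])
```

With competing consumers the inbox sizes add up to the number of messages sent; with pub/sub every inbox holds the full set.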
Ordering caveats
Sharing messages across parallel consumers comes at a cost: you generally lose strict global ordering. If consumer A grabs message 1 and consumer B grabs message 2, message 2 might well finish first — the consumers run independently and at different speeds. For many workloads (resizing images, sending emails, processing independent jobs) that's perfectly fine.
When order does matter, the usual answer is to partition the queue by some key so that all related messages — say, every event for a single account — land in the same partition and are handled by one consumer in sequence. You keep parallelism across keys while preserving order within each key.
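Key-based routing can be sketched with a stable hash. The example below uses CRC32 from the standard library rather than Python's built-in `hash()`, which is randomized per process for strings; the function names, the `(key, payload)` event shape, and the account IDs are assumptions for illustration.

```python
import zlib


def partition_for(key, partition_count):
    """Map a key to a partition so the same key always lands in the same place."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(events, partition_count):
    """Distribute (key, payload) events into per-partition lists, in arrival order."""
    partitions = [[] for _ in range(partition_count)]
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions


events = [
    ("acct-1", "opened"),
    ("acct-2", "opened"),
    ("acct-1", "deposit"),
    ("acct-2", "withdraw"),
    ("acct-1", "closed"),
]
partitions = route(events, partition_count=3)
```

Each partition would then be read by a single consumer, so the events for `acct-1` are seen in the order they arrived, while other accounts can be handled in parallel on other partitions.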
When to use it
Reach for competing consumers whenever you have a stream of independent units of work that need to be processed reliably and you want processing capacity that scales with demand. It pairs naturally with queue-based load leveling: the queue absorbs bursts and smooths the spikes, while a pool of competing consumers drains it at whatever rate you've scaled them to.
It's less suited to work that must run in a strict global order, or to fan-out scenarios where multiple distinct services each need to react to the same event — that second case calls for pub/sub instead.