Think of a factory assembly line for bottling juice: one station washes the bottles, the next fills them, another caps them, a fourth slaps on a label. No single worker does everything — each handles one step and slides the bottle along to the next. New bottles keep entering while finished ones roll off the end.
Pipes and Filters is that assembly line for data. A big processing job is split into a series of small stages (the filters), connected by channels (the pipes) that carry the output of one stage into the input of the next.
The problem
When all the steps of a complex task live inside one big function or service, they fuse into a tangle. Validation, transformation, enrichment, and formatting all share the same code and state, so you can't change one step without risking the others, and you can't reuse a step elsewhere because it's welded to its neighbors.
Scaling gets awkward too. Maybe one step is CPU-heavy and slow while the rest are trivial — but because they're bundled together, you have to scale the whole monolith just to give that one hot step more room. You end up paying for capacity the other steps don't need.
- Monolithic step: validation, transformation, enrichment, and formatting are fused into one function, so changing one risks all the rest.
- Raw items: everything funnels through the single processor, so one slow, CPU-heavy step blocks the trivial ones behind it.
- No independent scaling: because the steps are welded together, you can't reuse one elsewhere or scale just the hot step; you have to scale the whole block.
How it works
You pull each step out into its own self-contained filter. A filter receives data on its input pipe, performs exactly one transformation, and writes the result to its output pipe — and that's all it knows about. It doesn't know who fed it or who consumes it, only the shape of the data flowing through.
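Here is a minimal sketch of that contract in Python. It assumes each filter is a generator function that takes an iterable of items (its input pipe) and yields results (its output pipe). The filter names `strip_blank`, `normalize`, and `tag` and the helper `run_pipeline` are hypothetical and exist only for illustration. In-process generators stand in for the pipes here; a distributed pipeline would use queues instead.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A filter consumes a stream of items and yields a stream of items.
# It never references the filter before or after it, only the data shape.
Filter = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: drop empty or whitespace-only lines."""
    for line in lines:
        if line.strip():
            yield line


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: trim surrounding whitespace and lowercase."""
    for line in lines:
        yield line.strip().lower()


def tag(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: prefix each line with a fixed marker."""
    for line in lines:
        yield f"[log] {line}"


def run_pipeline(source: Iterable[str], *filters: Filter) -> Iterator[str]:
    """Chain filters in order; each generator is the pipe into the next one."""
    return reduce(lambda stream, f: f(stream), filters, iter(source))


raw = ["  Disk FULL ", "", "User Login  ", "   "]
print(list(run_pipeline(raw, strip_blank, normalize, tag)))
# ['[log] disk full', '[log] user login']
```

Because each filter sees only a stream of strings, you can reorder or drop stages without editing any of them. For example, `run_pipeline(raw, normalize, strip_blank)` yields the same two lines without the `[log] ` prefix.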
That independence is the whole payoff. You can reorder stages, drop a stage in or out, reuse a stage in another pipeline, and scale each stage on its own — give the slow one more workers while the fast ones run lean. And because every stage processes a stream, they all run at once: while stage three works on item one, stage one is already pulling in item three. This is map/filter/reduce thinking stretched into a distributed pipeline. The diagram below shows data flowing left to right through a chain of filter stages.
- Filter: a self-contained stage that does exactly one transformation. It knows only the data shape, not its neighbors.
- Pipe: the channel carrying one filter's output into the next filter's input, often a durable queue between stages.
- Sink: the end of the line where finished items land. New items keep entering the front while these roll off the back.
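To see every stage running at once, the sketch below swaps the generators for threads and in-memory `queue.Queue` objects. Those queues stand in for the durable pipes a real deployment would use. The slow stage gets four competing workers while the others run with one. `start_stage`, the `STOP` sentinel, and the stage functions are hypothetical names. `time.sleep` only imitates a slow step. In CPython, genuinely CPU-bound filters would need processes rather than threads because of the global interpreter lock.

```python
import queue
import threading
import time
from typing import Any, Callable

STOP = object()  # hypothetical sentinel marking the end of the stream


def start_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue,
                workers: int = 1) -> threading.Thread:
    """Run `workers` competing consumers of `inbox`, each applying `fn`."""

    def work() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                inbox.put(STOP)  # re-post so sibling workers see it and stop too
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()

    def close() -> None:
        for t in threads:
            t.join()
        outbox.put(STOP)  # exactly one STOP downstream, after every worker exits

    closer = threading.Thread(target=close)
    closer.start()
    return closer


def parse(line: str) -> int:
    return int(line)


def slow_square(n: int) -> int:
    time.sleep(0.01)  # imitates a slow step; sleep releases the GIL
    return n * n


def fmt(n: int) -> str:
    return f"result={n}"


q_in, q_parsed, q_squared, q_out = (queue.Queue() for _ in range(4))
closers = [
    start_stage(parse, q_in, q_parsed),
    start_stage(slow_square, q_parsed, q_squared, workers=4),  # scale only the hot stage
    start_stage(fmt, q_squared, q_out),
]

for line in ["1", "2", "3", "4", "5"]:
    q_in.put(line)
q_in.put(STOP)

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
for c in closers:
    c.join()

print(sorted(results))
```

When one stage has several workers, results can leave it in a different order than they entered. That is why the output is sorted before printing.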
Make filters idempotent and let pipes buffer. If a stage crashes partway through an item, the pipe will usually hand that item out again, so reprocessing it must not corrupt anything. Design each filter so that running it twice on the same item has the same effect as running it once. Using a durable queue as the pipe between stages also lets you fan a busy stage out to multiple competing consumers, absorbing bursts and recovering cleanly from failures.
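The difference between an idempotent filter and one that is not shows up as soon as a pipe redelivers an item. This sketch assumes at-least-once delivery and uses a plain dict wrapped in `InMemoryStore`, a hypothetical stand-in for a durable store. The `Order` type and both filter functions are illustrative and are not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int


class InMemoryStore:
    """Hypothetical stand-in for a durable key-value store."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def upsert(self, key: str, value: int) -> None:
        self.rows[key] = value  # same key and value twice: the second write changes nothing


def record_amount(order: Order, store: InMemoryStore) -> None:
    """Idempotent filter: the write is keyed by order_id."""
    store.upsert(order.order_id, order.amount_cents)


def add_to_running_total(order: Order, totals: dict[str, int]) -> None:
    """NOT idempotent: every delivery adds to the total again."""
    totals["sum"] = totals.get("sum", 0) + order.amount_cents


# Order a1 arrives twice, as if its first attempt crashed after doing its work.
deliveries = [Order("a1", 500), Order("b2", 250), Order("a1", 500)]

store = InMemoryStore()
totals: dict[str, int] = {}
for order in deliveries:
    record_amount(order, store)
    add_to_running_total(order, totals)

print(sum(store.rows.values()))  # 750
print(totals["sum"])             # 1250
```

Keying the write by `order_id` turns a redelivery into a harmless overwrite. The running total, by contrast, counts order `a1` twice.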
When to use it
Pipes and filters fits naturally when a task is a clear sequence of distinct steps that operate on a stream of data — ETL jobs, image and video processing, log enrichment, or any workflow where stages have different resource appetites and you want to scale them independently.
It's overkill for a quick task that runs in a few milliseconds inside one process; the pipes themselves add latency and operational overhead. It's also a poor fit when the steps are tightly interdependent and need to share lots of state, since the whole point is that filters stay isolated. And you'll need to think hard about failures and ordering up front. Decide deliberately what happens when a stage dies mid-stream, and whether downstream stages can tolerate items arriving out of order once a stage is fanned out to competing consumers.
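One deliberate answer to the ordering question is a resequencer filter placed after a fanned-out stage. This approach assumes the source stamps each item with a gap-free sequence number starting at 0. The resequencer buffers early arrivals in a min-heap and releases items only when the next expected number shows up. The `resequence` function is a hypothetical sketch built on the standard `heapq` module.

```python
import heapq
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def resequence(items: Iterable[tuple[int, T]]) -> Iterator[T]:
    """Yield payloads in sequence order, buffering any that arrive early.

    Assumes sequence numbers start at 0, are unique, and have no gaps.
    """
    buffer: list[tuple[int, T]] = []  # min-heap ordered by sequence number
    next_seq = 0
    for seq, payload in items:
        heapq.heappush(buffer, (seq, payload))
        # Release every buffered item that is now contiguous with what was emitted.
        while buffer and buffer[0][0] == next_seq:
            yield heapq.heappop(buffer)[1]
            next_seq += 1


arrived = [(2, "c"), (0, "a"), (3, "d"), (1, "b")]  # out of order after fan-out
print(list(resequence(arrived)))  # ['a', 'b', 'c', 'd']
```

If an item is lost for good, its sequence number never arrives and everything behind it stays buffered. A production version would therefore also need a policy for that case, such as a timeout.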