Sharding

Imagine a library so popular that one front desk can't check books in and out fast enough. You could hire a faster clerk, but there's a ceiling. The smarter move is to open several desks and split the work: authors A–F at desk one, G–M at desk two, and so on. No single desk handles everything, and the whole library serves far more readers at once.

Sharding does the same thing to a database. Instead of one giant database straining under every write, you split the data across many smaller databases — each responsible for its own slice.

The problem

A single database server has hard limits: so much CPU, so much RAM, so much disk. You can buy a bigger box — vertical scaling — but that runs out fast and gets expensive fast. Adding read replicas helps you serve more reads, but every write still has to land on the one primary. Once your write volume or your total data size outgrows the biggest machine you can rent, replicas don't save you.

That's the wall sharding is built to break. The bottleneck isn't reads you can copy around — it's a single point that must accept every write and store every row.

Before sharding — one box must accept every write

every write lands on one box

All writes + all data

One database (overloaded)

Read replica (writes still hit primary)

A single server has hard CPU, RAM, and disk limits. Read replicas spread reads around, but every write still lands on the one primary — once write volume or data size outgrows the biggest machine, replicas don't save you.

How it works

You pick a shard key — a column like user_id or region — and a rule that maps each key to a shard. The rule might be a range (user_id 1–1M on shard A), a hash of the key (spreads rows evenly), or a lookup table. A router sits in front: when a write or query arrives, it inspects the shard key, figures out which shard owns that data, and sends the request straight there.

Because each shard holds only a fraction of the rows and handles only a fraction of the traffic, the system scales out simply by adding more shards. Each shard can even keep its own indexes tuned to its slice. The diagram below shows the router fanning writes out to three independent shards based on the key.

Sharding — one dataset split across many databases

routed by shard key

Writes

Shard router

Shard A · keys 0–9

Shard B · keys a–m

Shard C · keys n–z

The router inspects each write's shard key and forwards it to the single shard responsible for that range; no shard holds everything, so the system scales out by adding more.

Tip

The shard key is a near-permanent decision. Pick one that spreads load evenly and keeps data you query together on the same shard. A key like country may look tidy but will overload the shard holding your biggest market; a hashed user_id usually balances better. Queries that need to touch every shard at once are the slow path you're trying to avoid.

When to use it

Reach for sharding when a single database genuinely can't hold your data or absorb your write throughput, and replicas alone aren't enough. It's the workhorse behind large-scale systems where the dataset is naturally partitionable by a key — per-user, per-tenant, or per-region data.

It's not free. Cross-shard queries and transactions get complicated, rebalancing data when a shard fills up is genuinely hard, and your application (or a router layer) now has to be shard-aware. So don't shard preemptively. Most systems do fine with a beefier box and read replicas for a long time — sharding is what you reach for once you've truly outgrown them.

Sharding

The problem

How it works

When to use it

Key takeaways

Keep going