Imagine a hotel with no front desk. To check in you'd have to find the housekeeping office yourself, walk to a separate building for the restaurant, and track down the valet on your own — and you'd need to memorize where each one lives. A front desk fixes this: one counter handles everything, and you never need to know how the back of the house is organized.
An API gateway is that front desk for your backend. Instead of clients reaching into a sprawl of microservices directly, they send every request to one well-known endpoint, and the gateway figures out where it should go.
The problem
When a mobile app or browser talks directly to a fleet of microservices, the client has to know the topology — which service owns which feature, where each one lives, and how to reach it. A single screen might fire off calls to the user service, the orders service, and the inventory service, then stitch the results together itself. Every time you split, rename, or move a service, every client has to be updated.
It gets worse with cross-cutting concerns. Authentication, TLS termination, rate limiting, and logging are needed by every service, so each team ends up reimplementing them — slightly differently, with slightly different bugs. That's duplicated effort, inconsistent behavior, and a chatty client making many round trips over a slow network just to render one page.
How it works
The fix is to place an API gateway as a single entry point in front of the services. Clients only ever know one address; the gateway holds the map of what lives where and does three common jobs on their behalf:
- Routing: inspect each incoming request and send it to the right backend service based on its path or host (/orders/* goes to the orders service, /users/* to the user service). The client stays blissfully unaware of the layout behind the door.
- Aggregation: for a screen that needs data from several services, the gateway fans out the calls, waits for the responses, and combines them into one payload. The client makes a single request instead of three.
- Offloading: handle the shared concerns at the edge. Authentication, TLS termination, throttling, and caching all happen once, in the gateway, so the services behind it don't each have to.
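To make the routing job concrete, here is a minimal sketch of a path-prefix route table. The internal hostnames, the port, and the function names are assumptions made up for illustration; a real gateway would usually load its routes from configuration or service discovery.

```python
from typing import Optional

# Hypothetical route table: path prefix -> internal service base URL.
ROUTES = {
    "/orders/": "http://orders.internal:8080",
    "/users/": "http://users.internal:8080",
}


def resolve_backend(path: str) -> Optional[str]:
    """Return the backend base URL for a request path, or None if no route matches."""
    # Try longer prefixes first so a more specific route wins over a general one.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return None


def upstream_url(path: str) -> Optional[str]:
    """Build the full URL the gateway would forward this request to."""
    backend = resolve_backend(path)
    return None if backend is None else backend + path
```

The client only ever sees the gateway's address; moving the orders service means changing one entry in the table rather than updating every client.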
[Animation: clients send requests to a single gateway endpoint, which routes each one onward to the appropriate backend service. Legend: API Gateway, one entry point that routes to services and handles auth, TLS, and rate limiting; Service, a backend microservice the gateway forwards to.]
Offloading is where a gateway earns its keep. Pushing authentication, TLS termination, and throttling to the edge means you write and harden that logic once. Your backend services get to assume the request is already authenticated and rate-limited, so they can stay small and focused on business logic.
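As a rough illustration of offloading, the sketch below authenticates and throttles a request at the edge before handing it to a backend. Everything in it is an assumption for the sake of the example: the in-memory token table, the per-user token bucket, and the `handle_at_edge` function are hypothetical, and a production gateway would verify signed tokens and keep rate-limit state somewhere shared.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical token store; a real gateway would verify a signed token
# or ask an identity provider instead.
VALID_TOKENS = {"secret-token-alice": "alice"}


@dataclass
class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def handle_at_edge(headers: dict[str, str], forward: Callable[[str], str]) -> tuple[int, str]:
    """Authenticate and throttle, then hand the request to `forward(user)`."""
    auth = headers.get("Authorization", "")
    user = VALID_TOKENS.get(auth.removeprefix("Bearer "))
    if user is None:
        return 401, "unauthorized"
    bucket = buckets.setdefault(user, TokenBucket(capacity=2, rate=1.0))
    if not bucket.allow():
        return 429, "too many requests"
    # The backend can assume `user` is authenticated and within quota.
    return 200, forward(user)
```

The backend passed in as `forward` contains no auth or throttling code at all, which is the point of doing this work once at the edge.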
[Animation: the gateway makes several backend calls for one client request and merges the results into a single response.]
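Aggregation can be sketched with concurrent calls that are merged into one payload. The three `fetch_*` coroutines below are hypothetical stand-ins for HTTP calls to the user, orders, and inventory services, and the returned data is invented for the example.

```python
import asyncio


# Hypothetical stand-ins for HTTP calls to three backend services.
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 1, "sku": "A-100"}]


async def fetch_inventory() -> dict:
    await asyncio.sleep(0.01)
    return {"A-100": 12}


async def dashboard(user_id: int) -> dict:
    """Fan out to three services and merge the results into one response body."""
    # The calls run concurrently, so total latency is roughly that of the
    # slowest call rather than the sum of all three.
    user, orders, inventory = await asyncio.gather(
        fetch_user(user_id), fetch_orders(user_id), fetch_inventory()
    )
    return {"user": user, "orders": orders, "inventory": inventory}
```

A client would call one gateway endpoint backed by something like `dashboard`, for example `asyncio.run(dashboard(7))`, instead of making three round trips itself.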
It's not a load balancer
Gateways and load balancers both sit in front of your servers, so they're easy to confuse, but they solve different problems. A load balancer spreads traffic across a pool of identical servers; in its classic form it doesn't care what the request says, only that the next instance gets its fair share.
An API gateway routes to different services based on what the request actually is, and adds logic — auth, aggregation, caching — on the way through. It operates at the application layer and understands paths, headers, and tokens. In practice the two are often layered: a load balancer distributes traffic across several gateway instances, and the gateway then routes onward to the correct service.
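The difference, and the layering, can be shown in a few lines. This is a toy sketch under assumed names: the gateway instance IDs, service names, and functions are all hypothetical, and round-robin is just one of several balancing strategies.

```python
from itertools import cycle

# Hypothetical instance and service names.
GATEWAY_INSTANCES = ["gw-1", "gw-2", "gw-3"]
SERVICE_BY_PREFIX = {"/orders/": "orders-service", "/users/": "users-service"}

_next_gateway = cycle(GATEWAY_INSTANCES)


def load_balance(path: str) -> str:
    """Round-robin: pick the next identical instance. `path` is deliberately ignored."""
    return next(_next_gateway)


def gateway_route(path: str) -> str:
    """Content-based: pick a different service depending on the request path."""
    for prefix, service in SERVICE_BY_PREFIX.items():
        if path.startswith(prefix):
            return service
    raise LookupError(f"no route for {path}")


def dispatch(path: str) -> tuple[str, str]:
    """Layered setup: the load balancer picks a gateway instance, the gateway picks the service."""
    return load_balance(path), gateway_route(path)
```

Repeated calls with the same path land on different gateway instances but always on the same service, which is exactly the split of responsibilities described above.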
Backends for Frontends
A common variation is Backends for Frontends (BFF): instead of one gateway trying to serve every client, you run a tailored gateway per client type — one for the mobile app, one for the web app, maybe one for partners. A phone on a flaky connection wants compact, pre-aggregated responses; a desktop browser can handle richer payloads. A BFF lets each frontend get a shape that fits it, without bloating a single shared gateway with conditional logic for everyone.
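As a small sketch of the idea, the two functions below shape the same backend record differently for each frontend. The product record, its field names, and the `mobile_bff` / `web_bff` functions are invented for illustration.

```python
# Hypothetical full record as the backend services would return it.
PRODUCT = {
    "id": 42,
    "name": "Trail Shoe",
    "price_cents": 12900,
    "description": "Long marketing copy for the product page.",
    "images": ["hero.jpg", "side.jpg", "sole.jpg"],
    "reviews": [{"stars": 5, "text": "Great"}, {"stars": 4, "text": "Good"}],
}


def mobile_bff(product: dict) -> dict:
    """Compact, pre-aggregated payload for a phone on a slow connection."""
    reviews = product["reviews"]
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": product["images"][0],
        # Send one number instead of the full review list.
        "avg_stars": sum(r["stars"] for r in reviews) / len(reviews),
    }


def web_bff(product: dict) -> dict:
    """Richer payload for a desktop browser: every field plus a formatted price."""
    return {**product, "display_price": f"${product['price_cents'] / 100:.2f}"}
```

Each function lives in its own gateway, so neither needs to branch on client type.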
The front door is also a chokepoint. Because every request flows through it, an unscaled or unprotected gateway becomes a bottleneck and a single point of failure — if it goes down, your whole API goes dark. Run multiple instances behind a load balancer, keep its logic thin, and protect it with timeouts and a circuit breaker so a slow backend can't drag the gateway down with it.
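A circuit breaker can be sketched as a small wrapper around backend calls. This is a simplified, single-threaded version under stated assumptions: the class name, thresholds, and injectable `clock` parameter are hypothetical, and the wrapped function is expected to enforce its own timeout and raise when it expires.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and a call is rejected without touching the backend."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a backend known to be unhealthy.
                raise CircuitOpenError("backend marked unhealthy; failing fast")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            # `func` should apply its own timeout and raise if it expires.
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With one breaker per backend, a slow service costs the gateway a few timed-out calls and then cheap, immediate rejections until the cool-down passes.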
When to use it
Reach for an API gateway once you have more than a handful of services and external clients that would otherwise need to know your internal layout. It shines when you want a single place for auth, TLS, and throttling, or when clients are making many round trips that aggregation could collapse into one.
For a small system with one or two services it's overkill: the indirection costs more than it saves. And keep the gateway disciplined: it should route, aggregate, and offload, not become a dumping ground for business logic. That logic belongs in the services, with any service-specific helpers deployed alongside them as a sidecar, so the gateway stays a thin, fast, well-protected front door.