Explainstuff.mebeta
All concepts
Cloud Native Patternsintermediate6 min

Index Table

When your data store can't index a field you query often, build your own lookup table keyed by that field so you skip the full scan.

A cookbook is organized by chapter — soups, mains, desserts. That's great if you know you want dessert. But if you've got a bag of mushrooms and want every recipe that uses them, you'd have to read the whole book page by page. That's why good cookbooks add an ingredient index in the back: look up "mushrooms" and jump straight to the right pages.

Index Table is exactly that back-of-the-book index, but for a data store that can only find things by its primary key. You build and maintain a second table keyed by the field you actually search on.

The problem

Many data stores — especially key-value and partitioned cloud stores — are blazing fast when you look up a record by its primary key, and miserably slow at everything else. Ask one for "all orders placed by customer 42" when the key is the order ID, and it has no choice but to scan every partition and inspect every row.

In a relational database you'd just add a secondary index and move on. But the simpler stores at cloud scale often don't offer secondary indexes, or limit them severely. Without one, your common-but-non-key queries get slower and pricier as the data grows, until the full scan becomes unworkable.

Without an index table — every query is a full scan
scan every row
Query by customer
Orders store (key = order ID)
Row 1 · check
Row 2 · check
Row N · check
With no secondary index on the customer field, a query for one customer's orders forces the store to inspect every row in turn — cost grows with the table.

How it works

You create a second table whose key is the field you want to query by. To find orders by customer, you build an index table keyed on customer ID; each entry points to (or contains) the matching orders. Now "orders for customer 42" is a single direct lookup instead of a scan.

There are two flavors. A lean index table stores just the key and a pointer — the primary keys of the matching records — so you then fetch the full rows from the main table. A fatter one duplicates whole records (or the columns a query needs) into the index so the lookup returns everything in one hop, trading storage and write cost for read speed. Either way, your application is responsible for writing to the index whenever the source data changes. The diagram below shows a query bypassing a full scan by going through a customer-keyed index table to reach the right records.

Index Table — your own back-of-the-book index
direct lookup, no scan
Query by customer
Index table (by customer)
Full scan (avoided)
Orders table
A query on a non-key field goes through a customer-keyed index table straight to the right records, skipping the expensive full scan.
Tip

Every index table is another write you have to keep honest. Inserting or updating a record now means updating its index entries too, ideally in the same transaction or via a reliable async pipeline. If they ever drift apart, the index lies. This is the classic trade-off against strict normalization: you're duplicating data on purpose to make reads fast, and you owe the discipline to keep the copies in sync.

When to use it

Use an index table when you frequently query a store on a non-key field and the store itself can't index it for you — the typical situation with large key-value or sharded NoSQL stores. It's a great fit alongside CQRS, where the read side is free to maintain whatever purpose-built lookup structures make queries fast.

Skip it when your database already supports the secondary indexes you need — let the engine do the work and keep the consistency guarantees. And weigh the write penalty: if the field is rarely queried but constantly updated, the extra writes and sync risk may cost more than the occasional scan you're trying to avoid.

Key takeaways

  • An index table is a secondary table you maintain yourself, keyed by a field the primary store can't index.
  • It turns an expensive full scan into a cheap direct lookup for queries on non-key fields.
  • Each index table holds enough to find or return the records for one query shape — sometimes a pointer, sometimes a copy.
  • It's the application-managed cousin of a database's built-in secondary index, common in key-value and partitioned stores.
  • The cost is write amplification and the risk of the index drifting out of sync with the source data.

Keep going