Glossary

Index (Database)

A database index is an auxiliary data structure (usually a B-tree) that allows the database to find rows matching a condition in O(log n) time instead of scanning every row (O(n)). Indexes speed up reads but slow down writes.

Explanation

Without an index, a query like WHERE email = 'alice@example.com' requires a full table scan — examining every row. For a table with 10 million rows, that's millions of disk reads. With an index on email, the database uses a B-tree to find the matching row in O(log n) steps — roughly 23 comparisons for 10 million rows. Index types: B-tree (default, excellent for equality and range queries: =, <, >, BETWEEN, LIKE 'prefix%'), hash (perfect O(1) equality lookups but no range queries — less commonly used explicitly, though databases use hash indexes internally), GiST and GIN (PostgreSQL — for full-text search, geometric data, arrays), BRIN (block range index — for very large tables with naturally-ordered data like timestamps). Multi-column (composite) indexes support queries that filter on multiple columns. Index on (user_id, created_at) supports WHERE user_id = 42 ORDER BY created_at — both the filter and the sort are indexed. Column order in composite indexes matters: the index is most useful when queries filter on a prefix of the columns (user_id alone, or user_id + created_at, but not created_at alone). Index trade-offs: reads are faster; writes are slower (every INSERT/UPDATE/DELETE must maintain all indexes); indexes consume storage. Don't add indexes speculatively — add them when you have a specific slow query that would benefit. Use EXPLAIN (ANALYZE) to see if a query uses an index and at what cost.

Code Example

sql

-- Database indexing strategy

-- Basic index on a frequently-queried column
CREATE INDEX idx_users_email ON users(email);
-- SELECT * FROM users WHERE email = 'x@y.com' now uses idx_users_email

-- Composite index: supports queries filtering on user_id and/or created_at
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC → uses index
-- SELECT * FROM orders WHERE user_id = 42 → uses index (prefix match)
-- SELECT * FROM orders WHERE created_at > '2024-01-01' → does NOT use index (not a prefix)

-- Partial index: only index active users (smaller, faster)
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;

-- Unique index: enforce uniqueness AND speed up lookups
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- (Often created automatically by UNIQUE constraint)

-- See if a query uses an index (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;
-- Look for "Index Scan" vs "Seq Scan" in the output
-- Seq Scan on a large table = missing index

Why It Matters for Engineers

Missing indexes are the most common cause of slow database queries in production applications. A query that runs in 1ms with an index can take 30 seconds without one on a large table — and this is the kind of bug that's invisible in development (small datasets) but catastrophic in production. Adding the right index is often a one-line fix with a 1000x performance improvement. Understanding indexes also informs schema design: you should design your tables with the access patterns in mind, adding indexes to columns used in WHERE, JOIN ON, and ORDER BY clauses. This is one of those topics where practical knowledge translates directly to better production systems.

Learn This In Practice

Go deeper with the full module on Beyond Vibe Code.

Databases Fundamentals → →