Database Design for Developers Who Learned by Vibe Coding
AI writes your database schemas. But bad schema design is the most expensive mistake to fix later. Here's what you need to know to design databases that age well.
Why Schema Design Is the Most Consequential Decision You'll Make
Application code can be refactored in a day. Database schemas, once they have real data in them, are expensive to change — they require migrations, potential downtime, and careful coordination if other systems read the same database. The schema you design upfront will shape your application's architecture, query patterns, and performance characteristics for years. AI tools generate schemas that work for the immediate use case. They rarely consider growth, access patterns, or the queries you'll be running six months from now.
Normalization: When to Follow the Rules and When to Break Them
Database normalization (3NF, Boyce-Codd) is the theory of organizing data to eliminate redundancy and dependency. In practice: normalize your data by default, denormalize deliberately when you have a proven performance problem. The most common beginner mistake is either over-normalizing (six tables where two would do) or storing JSON blobs in columns to avoid designing proper tables. JSON columns have legitimate uses, but 'I don't want to think about this relationship' is not one of them.
-- Over-normalized (excessive joins for simple reads):
CREATE TABLE orders (id, customer_id, created_at);
CREATE TABLE order_totals (order_id, subtotal, tax, shipping, total);
CREATE TABLE order_statuses (order_id, status, updated_at);
-- Three tables for one logical entity. Every order read = 3 JOINs.
-- Appropriately normalized:
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL REFERENCES customers(id),
status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending','processing','shipped','delivered','cancelled')),
subtotal_cents INTEGER NOT NULL,
tax_cents INTEGER NOT NULL,
total_cents INTEGER NOT NULL GENERATED ALWAYS AS (subtotal_cents + tax_cents) STORED,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);Indexes: The Difference Between Fast and Unusable
An unindexed query on a table with 10 rows is fast. The same query on a table with 1 million rows is a full table scan — every row examined, regardless of how many match. Indexes create efficient lookup structures that let PostgreSQL find matching rows in O(log n) instead of O(n). The rule: every foreign key gets an index. Every column you filter by in WHERE clauses gets an index. Every column you ORDER BY gets an index. Composite indexes for multi-column WHERE conditions. The counter-rule: too many indexes slow down writes (every insert/update/delete must update the indexes).
-- Add indexes proactively on columns you know you'll query:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC); -- for time-sorted queries
-- Composite index for multi-column filters:
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- This index satisfies: WHERE customer_id = ? AND status = ?
-- But also: WHERE customer_id = ? (uses first column)
-- But NOT: WHERE status = ? alone (doesn't use the index efficiently)
-- Check if a query uses an index:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 'abc123' ORDER BY created_at DESC;The Many-to-Many Relationship Trap
AI-generated schemas often get many-to-many relationships wrong. A user can have many roles, a role can have many users — this is many-to-many, and it requires a junction table. But junction tables don't have to be dumb join tables. They can carry their own data — when was this role assigned? who assigned it? was it a temporary assignment? Designing junction tables that carry meaningful data from the start is far easier than retrofitting this data later.
-- Dumb junction table (works but throws away data):
CREATE TABLE user_roles (user_id UUID, role_id UUID);
-- Rich junction table (captures the relationship itself):
CREATE TABLE user_roles (
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
role_id UUID NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
assigned_by UUID REFERENCES users(id),
assigned_at TIMESTAMPTZ DEFAULT now(),
expires_at TIMESTAMPTZ, -- temporary roles
PRIMARY KEY (user_id, role_id)
);
-- Now you have an audit trail and support for role expiry — free.Migrations: Changing Schemas Safely
Schema migrations are the hardest database operation. Adding a column to a large table can lock the table, causing downtime. Renaming a column breaks code that uses the old name. Removing a column before updating the code causes crashes. The professional approach: always add before removing (blue-green migrations), use NOT NULL with a DEFAULT for new required columns, deploy the application changes before removing the old column. Prisma and other ORMs provide migration tooling, but understanding the operations they perform is essential for running migrations safely in production. The database fundamentals module covers zero-downtime migration patterns explicitly.