PostgreSQL is the world's most advanced open-source relational database. It's the default choice for serious applications: powerful enough to handle the most complex data requirements, reliable enough for financial and healthcare systems, and open enough to deploy anywhere from a Raspberry Pi to a 64-core cloud instance.
This guide is for application developers who use Postgres through ORMs but want to understand what's happening underneath, and for those ready to write SQL directly. We cover schema design, indexes, transactions, JSONB, full-text search, connection pooling, and production operations.
Why PostgreSQL?
There are many database options. Why Postgres?
- Reliability: ACID transactions, point-in-time recovery, and battle-tested durability. Postgres has been in production at scale for 30+ years.
- Richness: full SQL, window functions, CTEs, arrays, JSONB, full-text search, custom types, materialized views, and a procedural language (PL/pgSQL). No other open-source database matches the feature set.
- Extensions: PostGIS (geospatial), pgvector (vector embeddings for AI), TimescaleDB (time-series), Citus (distributed), pg_trgm (trigram similarity search), and hundreds more.
- Managed options: Supabase, AWS RDS/Aurora, Google Cloud SQL, Neon, Railway, Render, and Fly.io all offer managed Postgres. You're never locked in.
- Ecosystem: every language has a mature Postgres driver, every ORM supports it, and every migration tool works with it.
Schema Design Fundamentals
Data types matter
-- Use appropriate types - don't store everything as text
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY, -- not INT for large tables
sku TEXT NOT NULL, -- variable length text
name VARCHAR(255) NOT NULL, -- text with max length
price NUMERIC(12, 2) NOT NULL, -- not FLOAT for money
weight_kg REAL, -- float is fine for measurements
in_stock BOOLEAN NOT NULL DEFAULT TRUE,
metadata JSONB, -- structured variable data
tags TEXT[], -- array of text
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- timezone-aware
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Common type mistakes:
- FLOAT/DOUBLE for money: floating-point rounding errors (demonstrated below). Use NUMERIC(precision, scale).
- TIMESTAMP without timezone: stores the value without timezone context. Always use TIMESTAMPTZ.
- INT for IDs on large tables: max 2.1 billion. Use BIGINT or BIGSERIAL.
- CHAR(n): pads with spaces. Use VARCHAR(n) or TEXT.
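A minimal demonstration of the money pitfall, runnable in psql (column aliases are just for illustration):
-- float8 accumulates binary rounding error; numeric is exact decimal
SELECT 0.1::float8 + 0.2::float8 AS float_sum,     -- 0.30000000000000004
       0.1::numeric + 0.2::numeric AS numeric_sum; -- 0.3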
Constraints as documentation
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id),
status TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
total NUMERIC(12, 2) NOT NULL CHECK (total >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Unique constraint (creates an index automatically)
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
-- Composite unique constraint
ALTER TABLE order_items ADD CONSTRAINT order_item_unique
UNIQUE (order_id, product_id);
Constraints document intent and catch bugs. A CHECK constraint on status prevents invalid states from being stored, with no application code required.
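For example, with the orders table above, an unexpected status is rejected at write time (the constraint name shown assumes Postgres's default naming for inline column constraints):
INSERT INTO orders (user_id, status, total) VALUES (1, 'refunded', 10.00);
-- ERROR: new row for relation "orders" violates check constraint "orders_status_check"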
Indexing Strategy
Indexing is the most impactful performance optimisation in Postgres; most query slowness is caused by missing indexes.
B-tree indexes (default)
-- Single column - for equality and range queries
CREATE INDEX ON users(email); -- equality: WHERE email = ?
CREATE INDEX ON orders(created_at); -- range: WHERE created_at > ?
-- Composite - for multi-column queries
-- Order matters: put equality columns first, then range
CREATE INDEX ON orders(user_id, created_at DESC); -- WHERE user_id = ? ORDER BY created_at DESC
-- Partial index - only indexes rows matching a condition
-- Much smaller and faster for sparse conditions
CREATE INDEX ON orders(created_at) WHERE status = 'pending';
CREATE INDEX ON users(email) WHERE deleted_at IS NULL;
-- Covering index - includes columns to avoid table lookups
CREATE INDEX ON orders(user_id) INCLUDE (status, total, created_at);
When to add an index
Add an index when:
- You filter, sort, or join on that column regularly
- The column has high cardinality (many distinct values)
- The table is large enough that sequential scans are slow
Don't add an index on:
- Columns you never query on
- Low-cardinality columns (boolean, status with 2-3 values), unless you use a partial index
- Columns on small tables (< 1,000 rows), where a sequential scan is faster
- Every foreign key automatically; only add FK indexes if you query by that FK
Every index slows writes slightly and uses disk space. Index selectively.
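One way to audit this: pg_stat_user_indexes tracks how often each index is actually scanned. After a representative stretch of production traffic, indexes with zero scans are candidates for removal (a sketch; verify before dropping anything):
-- Find indexes that have never been used, largest first
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;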
Finding slow queries
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = '100ms'; -- log queries > 100ms
SELECT pg_reload_conf();
-- Find tables with frequent sequential scans (candidates for missing indexes)
SELECT schemaname, tablename, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;
-- EXPLAIN ANALYZE - the most important diagnostic command
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 20;
Look for Seq Scan on large tables (with a high row count). That's a missing index. Index Scan and Bitmap Index Scan are what you want.
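As a rough illustration, here is the shape of the plan for the query above before and after adding the composite index from earlier (output abbreviated and hypothetical; real plans include cost, timing, and buffer numbers):
-- Before: every row is read and most are thrown away
--   Seq Scan on orders
--     Filter: (user_id = 123)
--     Rows Removed by Filter: 4999980
-- After CREATE INDEX ON orders(user_id, created_at DESC):
--   Index Scan using orders_user_id_created_at_idx on orders
--     Index Cond: (user_id = 123)
--   (the index order also satisfies ORDER BY created_at DESC, so no Sort node)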
Transactions
-- Explicit transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Both succeed or both fail
COMMIT;
-- Savepoints for partial rollback
BEGIN;
INSERT INTO orders (...) VALUES (...);
SAVEPOINT order_created;
INSERT INTO order_items (...) VALUES (...);
-- If this fails, roll back only to savepoint
ROLLBACK TO SAVEPOINT order_created;
COMMIT;
Isolation levels
-- Default: READ COMMITTED (each statement sees the latest committed data)
BEGIN;
-- ... your statements
COMMIT;
-- REPEATABLE READ: consistent snapshot of the database throughout the transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Good for reports that span multiple queries
COMMIT;
-- SERIALIZABLE: strictest, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Use for financial calculations where concurrent modifications matter
COMMIT;
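One caveat: at REPEATABLE READ and SERIALIZABLE, a transaction that conflicts with a concurrent one is aborted rather than made to wait, and the application is expected to retry it.
-- Conflicts at these levels fail with SQLSTATE 40001, e.g.:
--   ERROR: could not serialize access due to concurrent update
-- Catch SQLSTATE 40001 in application code and re-run the whole transaction.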
JSONB: When and How
JSONB is Postgres's binary JSON type: indexed, queryable, and powerful for variable-schema data.
-- Schema: structured columns + flexible JSONB for extension data
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
properties JSONB DEFAULT '{}' -- flexible per-event-type data
);
-- Insert with JSONB
INSERT INTO events (user_id, event_type, properties) VALUES
(1, 'purchase', '{"sku": "SHOE-42", "price": 129.99, "currency": "USD"}'),
(1, 'page_view', '{"path": "/products/shoes", "referrer": "google"}');
-- Query by JSONB key
SELECT * FROM events WHERE properties->>'sku' = 'SHOE-42';
SELECT * FROM events WHERE (properties->>'price')::numeric > 100;
-- JSONB contains operator (@>)
SELECT * FROM events WHERE properties @> '{"currency": "USD"}';
-- Index for JSONB queries
CREATE INDEX ON events USING gin(properties); -- GIN index for @> and existence checks
CREATE INDEX ON events ((properties->>'sku')); -- B-tree on specific key
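If @> is the only operator you use, the jsonb_path_ops operator class builds a smaller, faster GIN index, at the cost of not supporting the key-existence operators (?, ?|, ?&):
-- Leaner GIN variant: supports @> only
CREATE INDEX ON events USING gin(properties jsonb_path_ops);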
When to use JSONB:
- Configuration or preferences per entity (variable keys)
- Event properties (different per event type)
- Third-party webhook payloads
- Schema-flexible extension data
When not to use JSONB:
- Data you filter/sort/join on frequently: use typed columns
- Data with a rigid structure known at design time: normalise it
- Replacing your entire schema with JSONB documents: that's MongoDB; don't do it in Postgres
Full-Text Search
-- Add a tsvector column for full-text search
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;
-- Populate it
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || coalesce(body, ''));
-- Create a GIN index
CREATE INDEX ON articles USING gin(search_vector);
-- Keep it updated with a trigger
CREATE FUNCTION update_article_search_vector()
RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector := to_tsvector('english', NEW.title || ' ' || coalesce(NEW.body, ''));
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER articles_search_vector_update
BEFORE INSERT OR UPDATE OF title, body ON articles
FOR EACH ROW EXECUTE FUNCTION update_article_search_vector();
-- Search query
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- Highlight matching terms
SELECT title, ts_headline('english', body, to_tsquery('english', 'postgres & performance'),
'MaxFragments=3, MaxWords=30') AS excerpt
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgres & performance');
Postgres full-text search handles stemming, stop words, and ranking. For most apps it's good enough; reach for Algolia or Typesense only when you need fuzzy matching, typo tolerance, or faceted search.
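If fuzzy matching is the only gap, the pg_trgm extension mentioned earlier closes much of it before you need an external engine. A minimal sketch:
-- Trigram similarity tolerates typos that to_tsquery cannot
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON articles USING gin(title gin_trgm_ops);
SELECT title, similarity(title, 'postgers performence') AS score
FROM articles
WHERE title % 'postgers performence' -- % matches above the similarity threshold
ORDER BY score DESC
LIMIT 10;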
Connection Pooling
Postgres creates one OS process per connection, and each connection consumes roughly 5-10MB of memory. A web server with 100 instances × 20 connections each = 2,000 connections, enough to bring a Postgres server to its knees.
Always use a connection pooler in production.
PgBouncer (most common)
# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction # Most efficient: connection returned after each transaction
max_client_conn = 1000 # Max clients connecting to pgBouncer
default_pool_size = 25 # Postgres connections per database
Pool modes:
- session: one connection per client session (least efficient)
- transaction: connection returned after each transaction (recommended for web apps)
- statement: connection returned after each statement (can break multi-statement transactions)
Application-level pooling
For ORMs and query builders, set maximum connection limits:
# Prisma - set the pool size via the connection string (.env)
DATABASE_URL="postgresql://user:pass@localhost:5432/myapp?connection_limit=10"
// knex - explicit pool bounds (most query builders expose similar options)
const db = knex({
client: "postgresql",
connection: config.database,
pool: { min: 2, max: 10 },
});
Production Operations
Backups
# Logical backup (portable, can restore to different version)
pg_dump myapp > backup.sql
pg_dump --format=custom myapp > backup.dump # Compressed, faster restore
# Restore
psql myapp < backup.sql
pg_restore --dbname=myapp backup.dump
# Continuous archiving (WAL archiving) - point-in-time recovery
# Configure in postgresql.conf:
# archive_mode = on
# archive_command = 'cp %p /path/to/archive/%f'
For managed Postgres (Supabase, RDS, Neon), backups and PITR are handled automatically. Verify the retention period matches your RTO/RPO requirements.
Vacuum and maintenance
Postgres uses MVCC (Multi-Version Concurrency Control). Deleted/updated rows leave "dead tuples" that VACUUM cleans up. Autovacuum runs automatically, but monitor it:
-- Check for tables needing vacuum
SELECT schemaname, tablename, n_dead_tup, n_live_tup,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- Manual vacuum (non-blocking)
VACUUM ANALYZE tablename;
-- VACUUM FULL reclaims disk space but locks the table
VACUUM FULL tablename; -- Use pg_repack for zero-downtime space reclaim
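For high-churn tables, autovacuum's defaults (vacuum at roughly 20% dead rows) can lag badly. The thresholds can be tightened per table; the values here are illustrative, not a recommendation:
-- Vacuum the events table at ~2% dead rows instead of the default 20%
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_threshold = 1000
);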
Monitoring queries
-- Active long-running queries
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;
-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE pid = 12345;
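Beyond pg_stat_activity, the pg_stat_statements extension aggregates timing statistics per normalised query. A sketch, assuming Postgres 13+ column names (it also requires shared_preload_libraries = 'pg_stat_statements' and a restart):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- The queries consuming the most total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;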
Common Pitfalls
1. Missing NOT NULL constraints: NULL is a common source of bugs. Default to NOT NULL and add DEFAULT values. Only allow NULL when the absence of a value is semantically meaningful.
2. Storing sensitive data in plaintext: passwords must be hashed (never store plaintext). Consider encrypting PII columns with pgcrypto or application-level encryption.
3. Long-running transactions: transactions that stay open block VACUUM and hold locks. Commit or roll back promptly; see the query after this list for spotting offenders.
4. No connection pooling: covered above. This is the #1 production mistake for web apps.
5. SELECT * in production: fetch only the columns you need. SELECT * transfers more data, breaks when columns are added, and can't benefit from covering indexes.
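For pitfall 3, this query surfaces transactions that have been open too long (the 10-minute threshold is arbitrary), and a server-side timeout can clean up sessions that sit idle inside a transaction:
-- Transactions open longer than 10 minutes
SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '10 minutes'
ORDER BY xact_age DESC;
-- Terminate sessions that idle inside a transaction for 15+ minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '15min';
SELECT pg_reload_conf();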
FAQ
Q: Postgres vs MySQL: when to choose which? Postgres for complex queries, JSONB, full-text search, custom types, and extensions; MySQL for teams already invested in the MySQL ecosystem. For new projects, Postgres is the better choice in almost all scenarios.
Q: How do I run migrations safely in production? Use Flyway or Liquibase (Java), Alembic (Python), Prisma Migrate (Node.js), or plain SQL migration files. Always test migrations on a staging database first. Non-destructive migrations (adding columns, new indexes concurrently) are safe; dropping columns or changing types requires care.
Q: How do I add an index without locking the table?
Use CREATE INDEX CONCURRENTLY: it builds the index without taking a lock that blocks writes. It takes longer but is safe for production.
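For example (the index name is illustrative):
-- Cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
-- A failed concurrent build leaves an INVALID index behind; drop it and retry
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_user_id;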
Q: When should I use a materialized view?
When a complex aggregation or join is needed repeatedly but the underlying data changes infrequently. REFRESH MATERIALIZED VIEW updates it; REFRESH MATERIALIZED VIEW CONCURRENTLY does it without locking.
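A minimal sketch using the orders table from earlier (note that the concurrent refresh requires a unique index on the view):
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day, sum(total) AS revenue
FROM orders
GROUP BY 1;
-- Required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON daily_revenue (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;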
Q: How do I handle schema migrations with zero downtime?
- Add columns as nullable first; backfill; add the NOT NULL constraint once backfilled (see the sketch after this list)
- Remove columns in two deployments: first remove code references, then run the DROP
- Use CREATE INDEX CONCURRENTLY for new indexes
- Never rename a column without an aliasing step
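The first bullet as a concrete sequence (the locale column is hypothetical):
-- Step 1: add the column as nullable - instant, no table rewrite
ALTER TABLE users ADD COLUMN locale TEXT;
-- Step 2: backfill; batch this on large tables to limit vacuum pressure
UPDATE users SET locale = 'en' WHERE locale IS NULL;
-- Step 3: enforce the constraint once the backfill is complete
ALTER TABLE users ALTER COLUMN locale SET NOT NULL;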
Conclusion
Postgres is the foundation most serious production applications are built on. Its combination of reliability, SQL power, extension ecosystem, and managed hosting options makes it the default choice for application developers.
Master the fundamentals: proper types, indexes tuned to your queries, tightly scoped transactions, and connection pooling always on. Postgres will handle volumes that would surprise you.
Next: JSONB Indexing and Query Performance in Postgres, a deep dive into getting the most out of Postgres's JSON capabilities.