
PostgreSQL for Application Developers: The Complete Guide

Complete PostgreSQL guide - schema design, indexes, transactions, JSONB, full-text search, connection pooling, and production operations.

Domain: Software
Format: tutorial
Published: 25 Jun 2024
Tags: postgres · postgresql · sql

PostgreSQL is the world's most advanced open-source relational database, and the default choice for serious applications: powerful enough to handle the most complex data requirements, reliable enough for financial and healthcare systems, and open enough to deploy anywhere from a Raspberry Pi to a 64-core cloud instance.

This guide is for application developers who use Postgres through ORMs but want to understand what's happening underneath, and for those ready to write SQL directly. We cover schema design, indexes, transactions, JSONB, full-text search, connection pooling, and production operations.

Why PostgreSQL?

There are many database options. Why Postgres?

Reliability: ACID transactions, point-in-time recovery, and battle-tested durability. Postgres has been in production at scale for 30+ years.

Richness: full SQL, window functions, CTEs, arrays, JSONB, full-text search, custom types, materialized views, and a procedural language (PL/pgSQL). No other open-source database matches the feature set.

Extensions: PostGIS (geospatial), pgvector (vector embeddings for AI), TimescaleDB (time-series), Citus (distributed), pg_trgm (trigram similarity for fuzzy matching), and hundreds more.

Managed options: Supabase, AWS RDS/Aurora, Google Cloud SQL, Neon, Railway, Render, Fly.io all offer managed Postgres. You're never locked in.

Ecosystem: every language has a mature Postgres driver. Every ORM supports it. Every migration tool works with it.

Schema Design Fundamentals

Data types matter

-- Use appropriate types - don't store everything as text
CREATE TABLE products (
  id          BIGSERIAL PRIMARY KEY,           -- not INT for large tables
  sku         TEXT NOT NULL,                   -- variable length text
  name        VARCHAR(255) NOT NULL,           -- text with max length
  price       NUMERIC(12, 2) NOT NULL,         -- not FLOAT for money
  weight_kg   REAL,                            -- float is fine for measurements
  in_stock    BOOLEAN NOT NULL DEFAULT TRUE,
  metadata    JSONB,                           -- structured variable data
  tags        TEXT[],                          -- array of text
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- timezone-aware
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Common type mistakes:

  • FLOAT/DOUBLE for money: floating-point rounding errors. Use NUMERIC(precision, scale).
  • TIMESTAMP without time zone: stores the value with no timezone context. Always use TIMESTAMPTZ.
  • INT for IDs on large tables: maxes out at ~2.1 billion. Use BIGINT or BIGSERIAL.
  • CHAR(n): pads with spaces. Use VARCHAR(n) or TEXT.
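The float-for-money pitfall is easy to see directly in psql; binary floats cannot represent most decimal fractions exactly (the exact output string can vary with Postgres version and extra_float_digits):

```sql
-- Floats accumulate binary rounding error:
SELECT 0.1::float8 + 0.2::float8;    -- e.g. 0.30000000000000004

-- NUMERIC does exact decimal arithmetic:
SELECT 0.1::numeric + 0.2::numeric;  -- 0.3
```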

Constraints as documentation

CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  user_id     BIGINT NOT NULL REFERENCES users(id),
  status      TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
  total       NUMERIC(12, 2) NOT NULL CHECK (total >= 0),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Unique constraint (creates an index automatically)
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

-- Composite unique constraint
ALTER TABLE order_items ADD CONSTRAINT order_item_unique
  UNIQUE (order_id, product_id);

Constraints document intent and catch bugs. A CHECK constraint on status prevents invalid states from ever being stored, with no application code required.
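To see the CHECK constraint doing its job, try inserting an out-of-range status into the orders table defined above (the constraint name shown is the one Postgres auto-generates; yours may differ if you named it explicitly):

```sql
INSERT INTO orders (user_id, status, total) VALUES (1, 'refunded', 50.00);
-- ERROR:  new row for relation "orders" violates check constraint "orders_status_check"
```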

Indexing Strategy

Indexing is the most impactful performance optimisation in Postgres: most query slowness is caused by missing indexes.

B-tree indexes (default)

-- Single column - for equality and range queries
CREATE INDEX ON users(email);  -- equality: WHERE email = ?
CREATE INDEX ON orders(created_at);  -- range: WHERE created_at > ?

-- Composite - for multi-column queries
-- Order matters: put equality columns first, then range
CREATE INDEX ON orders(user_id, created_at DESC);  -- WHERE user_id = ? ORDER BY created_at DESC

-- Partial index - only indexes rows matching a condition
-- Much smaller and faster for sparse conditions
CREATE INDEX ON orders(created_at) WHERE status = 'pending';
CREATE INDEX ON users(email) WHERE deleted_at IS NULL;

-- Covering index - includes columns to avoid table lookups
CREATE INDEX ON orders(user_id) INCLUDE (status, total, created_at);

When to add an index

Add an index when:

  • You filter, sort, or join on that column regularly
  • The column has high cardinality (many distinct values)
  • The table is large enough that sequential scans are slow

Don't add an index on:

  • Columns you never query on
  • Low-cardinality columns (boolean, or a status with 2-3 values), unless it's a partial index
  • Columns on small tables (< 1,000 rows), where a sequential scan is faster
  • Every foreign key automatically; only add FK indexes if you query by that FK

Every index slows writes slightly and uses disk space. Index selectively.
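The flip side of selective indexing is pruning indexes that never get used. One rough audit, based on the pg_stat_user_indexes statistics view (let the counters accumulate for a while first, since they reset when statistics are reset):

```sql
-- Indexes that have never been scanned, largest first
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping anything, remember that unique indexes enforce constraints even if they are never scanned by queries.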

Finding slow queries

-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = '100ms';  -- log queries > 100ms
SELECT pg_reload_conf();

-- Find tables with frequent sequential scans (candidates for new indexes)
SELECT schemaname, tablename, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC;

-- EXPLAIN ANALYZE - the most important diagnostic command
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 20;

Look for Seq Scan nodes on large tables (high row counts): that usually means a missing index. Index Scan and Bitmap Index Scan are what you want.

Transactions

-- Explicit transaction
BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  -- Both succeed or both fail
COMMIT;

-- Savepoints for partial rollback
BEGIN;
  INSERT INTO orders (...) VALUES (...);
  SAVEPOINT order_created;

  INSERT INTO order_items (...) VALUES (...);
  -- If this fails, roll back only to savepoint
  ROLLBACK TO SAVEPOINT order_created;

COMMIT;
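The transfer example above is atomic, but if your application first reads a balance and then writes based on what it read, two concurrent transactions can race under the default isolation level. Row locks with SELECT ... FOR UPDATE are the usual fix; a sketch against the same accounts table:

```sql
BEGIN;
  -- Lock both rows: a concurrent transfer touching them will wait, not race
  SELECT balance FROM accounts WHERE id IN (1, 2) FOR UPDATE;

  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```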

Isolation levels

-- Default: READ COMMITTED (each statement sees data committed before it started)
BEGIN;
-- ... your statements
COMMIT;

-- REPEATABLE READ: consistent snapshot of the database throughout the transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Good for reports that span multiple queries
COMMIT;

-- SERIALIZABLE: strictest, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Use for financial calculations where concurrent modifications matter
COMMIT;
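One operational caveat with SERIALIZABLE: when concurrent transactions conflict, Postgres aborts one of them with SQLSTATE 40001 (serialization_failure), and the application is expected to retry it:

```sql
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- Under conflicting concurrent writes this can fail with:
--   ERROR:  could not serialize access due to read/write dependencies among transactions
-- On SQLSTATE 40001, re-run the whole transaction from BEGIN.
```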

JSONB: When and How

JSONB is Postgres's binary JSON type: indexed, queryable, and well suited to variable-schema data.

-- Schema: structured columns + flexible JSONB for extension data
CREATE TABLE events (
  id          BIGSERIAL PRIMARY KEY,
  user_id     BIGINT NOT NULL,
  event_type  TEXT NOT NULL,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  properties  JSONB DEFAULT '{}'  -- flexible per-event-type data
);

-- Insert with JSONB
INSERT INTO events (user_id, event_type, properties) VALUES
  (1, 'purchase', '{"sku": "SHOE-42", "price": 129.99, "currency": "USD"}'),
  (1, 'page_view', '{"path": "/products/shoes", "referrer": "google"}');

-- Query by JSONB key
SELECT * FROM events WHERE properties->>'sku' = 'SHOE-42';
SELECT * FROM events WHERE (properties->>'price')::numeric > 100;

-- JSONB contains operator (@>)
SELECT * FROM events WHERE properties @> '{"currency": "USD"}';

-- Index for JSONB queries
CREATE INDEX ON events USING gin(properties);  -- GIN index for @> and existence checks
CREATE INDEX ON events ((properties->>'sku'));  -- B-tree on specific key
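If @> containment is the only JSONB operator you query with, the jsonb_path_ops operator class gives a smaller, faster GIN index; the trade-off is that it drops support for key-existence operators like ?:

```sql
-- Smaller GIN index that supports @> but not ? / ?| / ?&
CREATE INDEX ON events USING gin(properties jsonb_path_ops);
```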

When to use JSONB:

  • Configuration or preferences per entity (variable keys)
  • Event properties (different per event type)
  • Third-party webhook payloads
  • Schema-flexible extension data

When not to use JSONB:

  • Data you filter/sort/join on frequently: use typed columns
  • Data with a rigid structure known at design time: normalise it
  • Replacing your entire schema with JSONB documents: that's MongoDB; don't do it in Postgres

Full-Text Search

-- Add a tsvector column for full-text search
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

-- Populate it
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);

-- Create a GIN index
CREATE INDEX ON articles USING gin(search_vector);

-- Keep it updated with a trigger
CREATE FUNCTION update_article_search_vector()
RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector := to_tsvector('english', NEW.title || ' ' || coalesce(NEW.body, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_vector_update
  BEFORE INSERT OR UPDATE OF title, body ON articles
  FOR EACH ROW EXECUTE FUNCTION update_article_search_vector();

-- Search query
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgres & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Highlight matching terms
SELECT title, ts_headline('english', body, to_tsquery('english', 'postgres & performance'),
  'MaxFragments=3, MaxWords=30') AS excerpt
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgres & performance');

Postgres full-text search handles stemming, stop words, and ranking. For most apps it's good enough; reach for Algolia or Typesense only when you need typo tolerance, instant search-as-you-type, or faceted search (pg_trgm covers basic fuzzy matching within Postgres itself).
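For user-supplied search input, websearch_to_tsquery (Postgres 11+) is safer than to_tsquery: it understands quoted phrases, OR, and -exclusions, and never raises a syntax error on malformed input:

```sql
-- Search-engine-style syntax, safe to feed raw user input into
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"connection pooling" -mysql');
```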

Connection Pooling

Postgres creates one OS process per connection, and each connection consumes roughly 5-10 MB of memory. A web server with 100 instances × 20 connections each = 2,000 connections: enough to bring a Postgres server to its knees.

Always use a connection pooler in production.

PgBouncer (most common)

# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction     # Most efficient: connection returned after each transaction
max_client_conn = 1000      # Max clients connecting to PgBouncer
default_pool_size = 25      # Postgres connections per database

Pool modes:

  • session: one connection per client session (least efficient)
  • transaction: connection returned after each transaction (recommended for web apps)
  • statement: connection returned after each statement (can break multi-statement transactions)

Application-level pooling

For ORMs and query builders, set maximum connection limits:

# Prisma - connection limit set as a DATABASE_URL parameter
DATABASE_URL="postgresql://user:pass@localhost:5432/myapp?connection_limit=10"

// knex
const db = knex({
const db = knex({
  client: "postgresql",
  connection: config.database,
  pool: { min: 2, max: 10 },
});

Production Operations

Backups

# Logical backup (portable, can restore to different version)
pg_dump myapp > backup.sql
pg_dump --format=custom myapp > backup.dump  # Compressed, faster restore

# Restore
psql myapp < backup.sql
pg_restore --dbname=myapp backup.dump

# Continuous archiving (WAL archiving) - point-in-time recovery
# Configure in postgresql.conf:
# archive_mode = on
# archive_command = 'cp %p /path/to/archive/%f'

For managed Postgres (Supabase, RDS, Neon), backups and PITR are handled automatically. Verify the retention period matches your RTO/RPO requirements.

Vacuum and maintenance

Postgres uses MVCC (Multi-Version Concurrency Control). Deleted/updated rows leave "dead tuples" that VACUUM cleans up. Autovacuum runs automatically, but monitor it:

-- Check for tables needing vacuum
SELECT schemaname, tablename, n_dead_tup, n_live_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

-- Manual vacuum (non-blocking)
VACUUM ANALYZE tablename;

-- VACUUM FULL reclaims disk space but locks the table
VACUUM FULL tablename;  -- Use pg_repack for zero-downtime space reclaim
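On large, high-churn tables the default autovacuum trigger (roughly 20% of rows dead) can be far too lazy. The threshold can be tightened per table; the 0.02 below is an illustrative value, not a universal recommendation:

```sql
-- Vacuum this table when ~2% of rows are dead instead of the default ~20%
ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.02);
```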

Monitoring queries

-- Active long-running queries
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;

-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE pid = 12345;
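For aggregate query statistics over time, rather than point-in-time snapshots, the standard tool is the pg_stat_statements extension (it must be added to shared_preload_libraries and the server restarted; the mean_exec_time column name applies to Postgres 13+, where older versions use mean_time):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by average execution time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```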

Common Pitfalls

1. Missing NOT NULL constraints: NULL is a common source of bugs. Default to NOT NULL and add DEFAULT values. Only allow NULL when the absence of a value is semantically meaningful.

2. Storing sensitive data in plaintext: passwords must be hashed (never store plaintext). Consider encrypting PII columns with pgcrypto or application-level encryption.

3. Long-running transactions: transactions that stay open block VACUUM and hold locks. Always commit or rollback quickly.

4. No connection pooling: covered above. This is the #1 production mistake for web apps.

5. SELECT * in production: fetch only the columns you need. SELECT * transfers more data, breaks when columns are added, and can't benefit from covering indexes.
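Pitfall 3, long-running transactions, usually shows up as sessions sitting in the "idle in transaction" state; pg_stat_activity makes them easy to spot:

```sql
-- Sessions holding a transaction open without doing anything
SELECT pid, usename, state, now() - xact_start AS open_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```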

FAQ

Q: Postgres vs MySQL: when should I choose which? Postgres for complex queries, JSONB, full-text search, custom types, and extensions. MySQL for teams already invested in the MySQL ecosystem. For new projects, Postgres is the better choice in almost all scenarios.

Q: How do I run migrations safely in production? Use Flyway or Liquibase (Java), Alembic (Python), Prisma Migrate (Node.js), or plain SQL migration files. Always test migrations on a staging database first. Non-destructive migrations (adding columns, new indexes concurrently) are safe; dropping columns or changing types requires care.

Q: How do I add an index without locking the table? Use CREATE INDEX CONCURRENTLY: it builds the index without a table lock. Takes longer but safe for production.
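A minimal sketch of the concurrent build, including the cleanup needed if it fails partway (the index name is illustrative; note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

```sql
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);

-- A failed concurrent build leaves an INVALID index behind; drop it and retry:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_user_id;
```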

Q: When should I use a materialized view? When a complex aggregation or join is needed repeatedly but the underlying data changes infrequently. REFRESH MATERIALIZED VIEW updates it; REFRESH MATERIALIZED VIEW CONCURRENTLY does it without locking.
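A sketch of the materialized-view workflow (table and column names assumed); note that REFRESH ... CONCURRENTLY requires a unique index on the view:

```sql
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day, sum(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index is required before CONCURRENTLY refreshes are allowed
CREATE UNIQUE INDEX ON daily_revenue (day);

-- Rebuild without blocking readers of the view
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
```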

Q: How do I handle schema migrations with zero downtime?

  • Add columns as nullable first; backfill; add NOT NULL constraint once backfilled
  • Remove columns in two deployments: first remove code references, then run the DROP
  • Use CREATE INDEX CONCURRENTLY for new indexes
  • Never rename a column without an aliasing step

Conclusion

Postgres is the foundation most serious production applications are built on. Its combination of reliability, SQL power, extension ecosystem, and managed hosting options makes it the default choice for application developers.

Master the fundamentals (proper types, indexes tuned to your queries, tightly scoped transactions, connection pooling always on) and Postgres will handle volumes that would surprise you.

Next up: JSONB Indexing and Query Performance in Postgres, a deep dive into getting the most out of Postgres's JSON capabilities.