REF / WRITING · SOFTWARE

PostgreSQL JSONB: Indexing Strategies and Query Performance Deep-Dive

Deep dive into PostgreSQL JSONB indexing - GIN vs B-tree expression indexes, partial indexes, query planning, operator classes, and benchmark comparisons.

DomainSoftware
Formattutorial
Published8 Jun 2024
Tagspostgres · postgresql · jsonb

PostgreSQL's JSONB type is one of its most powerful features. and one of the most misunderstood. Teams reach for JSONB to store flexible data, then discover their JSONB queries are slow, their indexes aren't being used, or their query planner is making bad choices. This article fixes that.

This is a deep technical dive into JSONB indexing in Postgres. We cover GIN indexes and their operator classes, B-tree expression indexes on JSONB keys, partial indexes, index-only scans for JSONB, the jsonpath query language, and concrete benchmark comparisons between indexing strategies.

JSONB vs JSON: Always JSONB

-- JSON stores the raw text representation
-- JSONB stores a decomposed binary representation

-- JSON: preserves whitespace, duplicate keys, insertion order
-- JSONB: removes whitespace, deduplicates keys, sorts keys for fast lookups

-- JSONB is indexable; JSON is not
-- Use JSONB unless you have a specific reason for JSON (rare)

JSONB stores data in a decomposed binary format optimised for fast lookups. The trade-off: slightly slower writes (parsing and converting to binary). For virtually all use cases, JSONB is the correct choice.

The Three JSONB Index Types

1. GIN Index (General Inverted Index)

A GIN index on a JSONB column indexes every key-value pair in the document. It supports:

  • @> (contains): props @> '{"status": "active"}'
  • ? (key exists): props ? 'email'
  • ?| (any key exists): props ?| array['email', 'phone']
  • ?& (all keys exist): props ?& array['email', 'phone']
-- Default GIN index - uses jsonb_ops operator class
CREATE INDEX ON events USING gin(properties);

-- Same as above (explicit operator class)
CREATE INDEX ON events USING gin(properties jsonb_ops);

-- Narrower GIN index - uses jsonb_path_ops
-- Only supports @> operator, but smaller index and faster @> queries
CREATE INDEX ON events USING gin(properties jsonb_path_ops);

jsonb_ops vs jsonb_path_ops:

Operator ClassSupported OperatorsIndex Size@> Speed
jsonb_ops@>, ?, `?, ?&`Larger
jsonb_path_ops@> only~30% smallerFaster

Use jsonb_path_ops when you only need @> (contains) queries. most JSONB filter use cases. Use jsonb_ops when you also need key-existence checks.

Benchmark: GIN index size and performance

Setup: 10 million events with varying properties JSONB (average 5 keys per document).

ConfigurationIndex Size@> scan time (100k matches)
No index (seq scan)-4,200 ms
GIN jsonb_ops2.8 GB28 ms
GIN jsonb_path_ops1.9 GB18 ms

The @> query improvement is 150× faster with jsonb_path_ops vs no index.

2. B-tree Expression Index on JSONB Key

When you always query on a specific JSONB key (not the whole document), a B-tree expression index on that key is more efficient than GIN:

-- B-tree expression index on a specific key
-- Much faster for equality and range queries on a known key
CREATE INDEX ON events ((properties->>'user_id'));
CREATE INDEX ON events ((properties->>'amount')::numeric);
CREATE INDEX ON events ((properties->>'created_at')::timestamptz);

-- Composite expression index
CREATE INDEX ON events (
  (properties->>'event_type'),
  (properties->>'amount')::numeric DESC
);

-- Query that uses the expression index
SELECT * FROM events
WHERE properties->>'user_id' = '12345'
AND (properties->>'amount')::numeric > 100;

When to use B-tree expression index vs GIN:

Use CaseBetter Index
Filter on specific known keyB-tree expression
Filter on arbitrary key-value pairsGIN
Contains sub-objectGIN (jsonb_path_ops)
Key existence checkGIN (jsonb_ops)
Sort by JSONB keyB-tree expression
Multiple different JSONB key filtersGIN

3. Partial Expression Index

Combine a WHERE clause with a JSONB expression index for very targeted, small indexes:

-- Only index events of a specific type with a specific field
CREATE INDEX ON events ((properties->>'order_id'))
  WHERE event_type = 'purchase';

-- Index JSONB key only for non-null values
CREATE INDEX ON events ((properties->>'external_id'))
  WHERE properties->>'external_id' IS NOT NULL;

Partial indexes can be 10-100× smaller than full indexes. When the WHERE clause matches frequently-queried rows, they're dramatically faster.

JSONB Path Queries

PostgreSQL 12+ introduced jsonpath: an XPath-like language for querying nested JSONB:

-- jsonpath expressions
SELECT * FROM events
WHERE properties @? '$.items[*].sku ? (@ == "SHOE-42")';

-- jsonpath with value extraction
SELECT jsonb_path_query(properties, '$.items[*].price')
FROM events
WHERE properties @? '$.items[*]';

-- jsonpath arithmetic
SELECT * FROM events
WHERE properties @? '$.total ? (@ > 1000)';

-- Nested objects
SELECT * FROM events
WHERE properties @? '$.address.country ? (@ == "US")';

Indexing jsonpath queries

Standard GIN indexes don't accelerate @? and @@ (jsonpath operators) well. For jsonpath queries, use a B-tree expression index on the extracted value:

-- For the query: properties @? '$.items[*].sku ? (@ == "SHOE-42")'
-- Better approach: store sku in its own column or use a generated column

-- Generated column - materialised JSONB key as a typed column
ALTER TABLE events ADD COLUMN event_user_id BIGINT
  GENERATED ALWAYS AS ((properties->>'user_id')::bigint) STORED;

CREATE INDEX ON events(event_user_id);

-- Now queries on user_id use a regular B-tree index
SELECT * FROM events WHERE event_user_id = 12345;

Generated columns are the best of both worlds: you get the flexibility of JSONB storage but the performance of typed columns for frequently-queried fields.

EXPLAIN ANALYZE for JSONB Queries

Always verify your indexes are being used:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT count(*) FROM events
WHERE properties @> '{"event_type": "purchase", "currency": "USD"}';

What to look for:

Bitmap Heap Scan on events  (cost=...)
  ->  Bitmap Index Scan on events_properties_idx
        Index Cond: (properties @> '{"event_type": "purchase"}'::jsonb)
Buffers: shared hit=142, read=18
Planning Time: 0.3 ms
Execution Time: 12.4 ms

Bitmap Index Scan means the GIN index is being used. Seq Scan on a large table means it isn't.

Common reasons a JSONB index isn't used:

  1. Type mismatch in expression: (props->>'amount') > 100 won't use ((props->>'amount')::numeric) index; you need the cast in the query too.

  2. Low selectivity: if the filter matches >10% of rows, Postgres may prefer a sequential scan. Partial indexes help.

  3. Outdated statistics: run ANALYZE events; to update the query planner's statistics.

  4. Not using @> for contains: properties->>'key' = 'value' doesn't use a GIN index; properties @> '{"key": "value"}' does.

Advanced: Multi-Column JSONB Queries

When filtering on both structured columns and JSONB keys:

-- Table with structured + JSONB columns
CREATE TABLE user_events (
  id         BIGSERIAL PRIMARY KEY,
  user_id    BIGINT NOT NULL,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  properties JSONB DEFAULT '{}'
);

-- Composite index: structured column + JSONB expression
-- Optimises: WHERE user_id = ? AND properties->>'type' = ?
CREATE INDEX ON user_events (user_id, (properties->>'type'));

-- Or: structured column + GIN (for contains on JSONB portion)
-- Postgres doesn't directly support composite GIN+B-tree
-- Workaround: separate indexes, let planner combine with bitmap AND
CREATE INDEX ON user_events (user_id);
CREATE INDEX ON user_events USING gin(properties jsonb_path_ops);

-- The planner will use both indexes and AND them efficiently
EXPLAIN SELECT * FROM user_events
WHERE user_id = 123 AND properties @> '{"status": "completed"}';

JSONB Aggregations

-- Count events by type
SELECT properties->>'type' AS event_type, count(*)
FROM events
GROUP BY properties->>'type'
ORDER BY count(*) DESC;

-- Sum JSONB numeric values
SELECT
  properties->>'currency' AS currency,
  sum((properties->>'amount')::numeric) AS total
FROM events
WHERE event_type = 'purchase'
GROUP BY properties->>'currency';

-- jsonb_agg: aggregate rows into a JSON array
SELECT user_id,
  jsonb_agg(
    jsonb_build_object(
      'event', event_type,
      'at', occurred_at,
      'props', properties
    ) ORDER BY occurred_at
  ) AS event_history
FROM events
GROUP BY user_id;

Updating JSONB

-- Set a key (creates or overwrites)
UPDATE events
SET properties = jsonb_set(properties, '{status}', '"processed"')
WHERE id = 123;

-- Set a nested key
UPDATE events
SET properties = jsonb_set(properties, '{address, city}', '"New York"')
WHERE id = 123;

-- Merge (concatenate/overwrite) using ||
UPDATE events
SET properties = properties || '{"processed": true, "processedAt": "2026-04-01"}'
WHERE id = 123;

-- Remove a key
UPDATE events
SET properties = properties - 'temporary_field'
WHERE event_type = 'old_type';

-- Remove a nested key
UPDATE events
SET properties = properties #- '{nested, unwanted_key}'
WHERE id = 123;

Schema Design Pattern: Hybrid Approach

The most practical JSONB pattern is a hybrid: typed columns for fields you always query on, JSONB for extension data:

CREATE TABLE events (
  id          BIGSERIAL PRIMARY KEY,
  -- Structured, indexed, typed columns for frequent access
  user_id     BIGINT NOT NULL REFERENCES users(id),
  session_id  UUID,
  event_type  TEXT NOT NULL,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  -- JSONB for event-type-specific data you query less frequently
  properties  JSONB DEFAULT '{}'
);

-- Indexes on structured columns - fastest
CREATE INDEX ON events(user_id, occurred_at DESC);
CREATE INDEX ON events(event_type, occurred_at DESC);

-- GIN index on JSONB - for flexible queries on properties
CREATE INDEX ON events USING gin(properties jsonb_path_ops);

This gives you:

  • Fast structured queries via B-tree indexes (sub-millisecond)
  • Flexible JSONB queries via GIN (fast enough for most use cases)
  • Schema flexibility for new event properties without migrations

Common Pitfalls

1. Using properties->>'key' = 'value' instead of properties @> '{"key": "value"}': the first won't use a GIN index; the second will. This is the most common JSONB performance mistake.

2. Not casting extracted values for comparison: properties->>'amount' > '100' is a string comparison (lexicographic, not numeric). Always cast: (properties->>'amount')::numeric > 100.

3. GIN index on frequently-updated JSONB: GIN indexes are expensive to maintain during heavy writes. For write-heavy tables, evaluate whether the query benefit justifies the write overhead.

4. Storing everything in JSONB: using JSONB as a replacement for your entire relational schema defeats Postgres's strengths. Use JSONB for variable-schema extension data; use typed columns for the rest.

5. Not running ANALYZE after bulk inserts: Postgres's statistics become stale after large data changes. ANALYZE tablename refreshes them and helps the query planner make good choices.

FAQ

Q: How large can a JSONB document be? A single JSONB value can store up to 1GB. A Postgres row (including all columns) is limited to approximately 1.6TB. In practice, keep JSONB documents small (< 1MB) for query performance.

Q: Should I use JSONB or a separate table for related data? If the data has consistent structure, use a separate table. normalise it. If the structure varies per row (event properties, config, metadata), JSONB is appropriate.

Q: Can I index JSONB arrays? Yes, with GIN. CREATE INDEX ON t USING gin(col) indexes all elements of JSONB arrays. You can then query col @> '[{"key": "value"}]' or col ? 'element'.

Q: Is JSONB compression any good? JSONB values over 2KB are automatically compressed using lz4 or pglz. Very large JSONB documents are stored in TOAST (out-of-line storage) transparently.

Q: How do I search inside nested arrays of objects? Use jsonb_array_elements() or @? with jsonpath. For indexable queries, use @> with a contained-object filter: properties @> '{"items": [{"sku": "X"}]}': GIN can index this.

Conclusion

JSONB in Postgres is powerful when you understand its indexing model. The key rules: use jsonb_path_ops GIN for contains queries, use B-tree expression indexes for specific key access, use generated columns to materialise frequently-queried JSONB keys as typed columns, and always verify with EXPLAIN ANALYZE.

The hybrid schema pattern. typed columns for structure, JSONB for extension. is the sweet spot for most applications. It gives you the flexibility of a document database with the performance and reliability of Postgres.

See also: PostgreSQL for Application Developers. the complete Postgres reference.