Why your PostgreSQL planner is wrong (and what to do about it)

Frank Heikens
PostgreSQL · Performance

The planner sees the world differently than you do

When you write a SQL query, you think about what you want. PostgreSQL's query planner thinks about how to get it. These are fundamentally different problems, and the gap between them is where most performance issues live.

Consider a simple join:

SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '30 days'
  AND c.region = 'eu-west';

You read this as: get recent EU orders with customer names. The planner reads it as a constraint satisfaction problem with multiple possible execution strategies, each with different cost profiles depending on data distribution, index availability, and buffer cache state.

What the planner actually decides

For every query, the planner must choose:

  • Join order -- which table to scan first
  • Join method -- nested loop, hash join, or merge join
  • Access method -- sequential scan, index scan, bitmap scan, or index-only scan
  • Filter placement -- where to apply WHERE conditions in the execution tree

Each combination produces a different execution plan. The planner estimates the cost of each and picks the cheapest. When those estimates are accurate, this works well. When they are wrong, performance degrades -- sometimes by orders of magnitude.
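To see which combination the planner chose, prefix the query with EXPLAIN. A minimal example, reusing the tables from the query above:

EXPLAIN
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '30 days'
  AND c.region = 'eu-west';

The output is a tree of plan nodes: the join method (Nested Loop, Hash Join, Merge Join), the access method per table, and the estimated cost and row count for each node.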

Where estimates go wrong

The planner relies on statistics collected by ANALYZE (or autovacuum). These statistics describe:

  • Row counts per table
  • Most common values and their frequencies
  • Histogram boundaries for value distribution
  • Correlation between physical and logical row ordering

flowchart TD
    A[Query arrives] --> B[Parser]
    B --> C[Planner]
    C --> D{Check statistics}
    D -->|Accurate| E[Good plan]
    D -->|Stale/wrong| F[Bad plan]
    E --> G[Fast execution]
    F --> H[Slow execution]
    H --> I[Investigate with EXPLAIN ANALYZE]
    I --> J[Fix: update stats / add index / rewrite query]
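
You can inspect these statistics directly. A sketch using the standard pg_stats and pg_class catalogs, assuming the orders table from earlier:

-- Per-column statistics: distinct values, common values, physical correlation
SELECT attname, n_distinct, most_common_vals, correlation
FROM pg_stats
WHERE tablename = 'orders';

-- Table-level estimates the planner starts from
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'orders';

Note that reltuples is itself an estimate, refreshed by ANALYZE and VACUUM, not a live count.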

The most common failure modes:

Stale statistics. After a bulk load or major data change, the statistics no longer reflect reality. The planner might estimate 100 rows when there are 100,000. Run ANALYZE on affected tables.
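
For example, after a bulk load you can refresh statistics for just the affected table, and check when it was last analyzed via the standard pg_stat_user_tables view:

-- Refresh statistics for a single table
ANALYZE orders;

-- When were statistics last updated, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';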

Correlated columns. The planner assumes column values are independent. If region = 'eu-west' strongly correlates with created_at being recent, the planner cannot know that and will underestimate the selectivity of the combined condition. PostgreSQL 10+ supports extended statistics (CREATE STATISTICS) for correlations between columns of the same table.
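
A sketch of extended statistics in action. Since CREATE STATISTICS only covers columns within a single table, this example assumes two hypothetical correlated columns, city and region, both on customers (the statistics name is arbitrary):

-- Teach the planner about functional dependencies between the columns
CREATE STATISTICS customers_city_region (dependencies)
  ON city, region
  FROM customers;

-- Populate the new statistics object
ANALYZE customers;

After this, the planner can recognize that a filter on city already implies the region, instead of multiplying the two selectivities as if they were independent.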

Join cardinality misestimation. When joining multiple tables, estimation errors compound. A 2x error on two joins becomes a 4x error. This is where nested loops get chosen over hash joins (or vice versa) with catastrophic results.

How to diagnose this

EXPLAIN (ANALYZE, BUFFERS) is the tool. Not EXPLAIN alone -- you need actual row counts to compare against estimates.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '30 days'
  AND c.region = 'eu-west';

Look for plan nodes where the actual row count differs significantly from the planner's estimate. A 10x discrepancy is a strong signal. A 100x discrepancy is almost certainly causing a bad plan choice.
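
To illustrate the pattern, here is a hand-written plan fragment (not real output; the index name and numbers are invented):

Index Scan using orders_created_at_idx on orders o
  (cost=0.43..1210.55 rows=120 width=16)
  (actual time=0.031..48.220 rows=98000 loops=1)

The first rows is the estimate (120); the rows inside the actual section is what really came back (98,000) -- a roughly 800x misestimate, and exactly the kind of node to investigate first.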

What Arq does with this

Arq Signal watches for exactly these discrepancies continuously. When a query's actual execution diverges from the planner's expectations, Signal surfaces it as a finding -- not as a raw EXPLAIN output, but as a diagnosis: which estimates are wrong, why they might be wrong, and what to do about it.

Arq Analyzer goes deeper: it compares execution profiles across time, detects when a query's plan changes (plan regression), and recommends specific actions -- whether that is running ANALYZE, creating extended statistics, adding an index, or rewriting the query.

The goal is not to replace your understanding of the planner. It is to make sure you see the problems that matter before your users do.


This is the first post on the Elevarq blog. We will be writing regularly about PostgreSQL internals, performance analysis, and the tools we are building. If you want to follow along, subscribe to the RSS feed.
