If you've been running PostgreSQL for a while, you've probably already set shared_buffers to 25% of RAM, bumped work_mem, and added indexes to your slowest queries. Those are the right first moves. But they're also where a lot of tuning advice stops, which is a problem because the performance challenges that emerge at scale are qualitatively different from the ones you encounter on a single, lightly-loaded instance.

This guide covers the areas where experienced PostgreSQL operators find the most leverage once the basics are in place: vacuum configuration that prevents slow-building degradation, parameter settings whose defaults are actively harmful at scale, index selection as a tuning lever in its own right, query performance tooling that goes beyond pg_stat_statements, and the PostgreSQL 17 improvements that are worth upgrading for.

The parameters that matter most (and their dangerous defaults)

PostgreSQL ships with defaults designed to run on minimal hardware. That was a reasonable choice when the project started, but it means that a fresh installation on a modern server with 64GB of RAM and NVMe storage is leaving performance on the table. Many operators know to adjust shared_buffers and work_mem. Fewer know about the vacuum-related defaults that cause serious problems at scale.

Parameter | Default | Recommended | Why it matters
shared_buffers | 128 MB | 25-40% of RAM[1] | PostgreSQL's main shared cache. The default is sized for systems with very little memory.
work_mem | 4 MB | 64-256 MB (with care)[1] | Per-operation allocation for sorts and hash joins. 100 connections running complex queries can exhaust RAM if set too high.
effective_cache_size | 4 GB | 75% of RAM[1] | Planner hint only (no memory is allocated). Affects whether index scans are preferred over sequential scans.
autovacuum_vacuum_scale_factor | 0.2 (20%) | 0.02-0.05 for large tables[2] | At the default, a billion-row table needs 200 million dead tuples before vacuum triggers.
autovacuum_vacuum_cost_delay | 20 ms (pre-PG12) / 2 ms (PG12+) | 0 ms for highest-churn tables on SSDs[3] | The original 20 ms throttle was designed for spinning disks. Modern releases default to 2 ms, but on SSDs even that can slow vacuum down.
max_connections | 100 | Workload-dependent; use PgBouncer[1] | Each connection uses 5-10 MB of RAM. A high max_connections limits how much work_mem you can safely allocate.

The first three are covered in nearly every PostgreSQL tuning guide. The last three are where teams at scale find (or lose) significant performance, and they deserve a closer look.
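Translated into postgresql.conf, the recommendations above might look like the following for a hypothetical 64 GB server with SSD storage. These are illustrative starting values drawn from the table, not a prescription:

```ini
# Illustrative starting point for a 64 GB RAM server on SSDs.
# Every value here is a rule-of-thumb floor; tune against real telemetry.
shared_buffers = 16GB                    # ~25% of RAM
effective_cache_size = 48GB              # ~75% of RAM (planner hint only)
work_mem = 64MB                          # per sort/hash operation, not a per-connection total
max_connections = 100                    # keep modest; pool with PgBouncer
autovacuum_vacuum_scale_factor = 0.05    # global setting; go lower per-table
autovacuum_vacuum_cost_delay = 2ms       # consider 0 for high-churn tables on SSDs
```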

A caveat on these recommendations: the values above are useful starting points, but they oversimplify the problem. Research from Andy Pavlo's Database Group at Carnegie Mellon University has demonstrated that database knobs interact with each other in ways that rule-of-thumb guidance can't capture.[16] Changing work_mem affects how much benefit you get from shared_buffers, and the optimal combination depends on your specific workload mix. Their OtterTune project showed that ML-driven holistic tuning, where all parameters are optimised together against actual workload telemetry, can achieve dramatically better results than tuning parameters one at a time.[17] The table above will get you past the worst defaults, but treat it as a floor, not a ceiling.

Vacuum: the silent performance killer

PostgreSQL's MVCC architecture means that UPDATE and DELETE operations don't immediately remove old row versions. Instead, they create "dead tuples" that remain on disk until vacuum cleans them up. This is a feature, not a bug: it's what makes PostgreSQL's concurrency model work. But it means that vacuum configuration directly determines how much wasted space and I/O overhead your database accumulates over time.

The problem is the defaults. With autovacuum_vacuum_scale_factor set to 0.2, vacuum won't trigger until 20% of a table's rows are dead. On a small table, that's fine. On a table with a billion rows, it means 200 million dead tuples accumulate before anything happens.[2] Those dead tuples bloat your indexes, degrade query plan quality, consume storage, and increase I/O for every query that touches the table.
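Autovacuum's trigger point for a table is autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples. A quick way to see how far each table is from that point is a query like the following, a sketch that assumes the global defaults of 50 and 0.2, uses n_live_tup as a proxy for reltuples, and ignores any per-table overrides:

```sql
-- How close is each table to its autovacuum trigger point?
-- Assumes global defaults (threshold 50, scale factor 0.2).
SELECT relname,
       n_live_tup,
       n_dead_tup,
       50 + 0.2 * n_live_tup AS vacuum_triggers_at,
       round(100.0 * n_dead_tup / greatest(n_live_tup, 1), 1) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```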

The second parameter, autovacuum_vacuum_cost_delay, compounds the problem. In older releases the default was 20ms, chosen to stop vacuum from competing with production workloads on spinning disks. PostgreSQL 12 lowered the default to 2ms.[3] On SSD-backed systems it often makes sense to lower it further, even to 0ms for your highest-churn tables. If you're still running a pre-12 version, reducing this from 20ms is one of the highest-leverage changes you can make to vacuum throughput.

Practical vacuum tuning

The most effective approach is per-table configuration for your highest-traffic tables. PostgreSQL allows you to override autovacuum settings at the table level:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_cost_delay = 0,
  autovacuum_analyze_scale_factor = 0.005
);

This tells PostgreSQL to vacuum the orders table when just 1% of its rows are dead, with no throttling, and to update planner statistics when 0.5% of rows have changed. For a high-churn OLTP table, this prevents the slow accumulation of bloat that eventually degrades performance across the board.[4]

You can monitor the effectiveness of your vacuum configuration by checking pg_stat_user_tables. The columns n_dead_tup (dead tuples waiting for vacuum) and last_autovacuum (when vacuum last ran) tell you whether your settings are keeping up with your write workload. If dead tuples are growing faster than vacuum can clean them, your settings need adjusting.[5]
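A starting query for that check might look like this (adjust the ordering and limit to your environment):

```sql
-- Is autovacuum keeping up? Tables with the most dead tuples,
-- plus when vacuum and analyze last ran on each.
SELECT relname,
       n_dead_tup,
       n_live_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 10;
```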

Where bottlenecks actually hide at scale

Once the parameter basics are in place, performance problems at scale tend to cluster in a few areas that are harder to diagnose with standard monitoring.

Query planning failures

Stale statistics are the most common cause of bad query plans. When the planner doesn't have accurate information about data distribution, it makes poor choices: sequential scans where index scans would be faster, nested loops where hash joins would be appropriate, or the reverse. The fix is to ensure autovacuum_analyze_scale_factor is low enough that statistics refresh frequently, and to run ANALYZE manually after bulk data loads.[5]

Correlated subqueries are another common offender. A subquery that references the outer query executes once per row, which is catastrophic on large result sets. Recent PostgreSQL releases (16 and 17) add optimisations that can turn some correlated subqueries into joins automatically,[6] but for earlier versions, identifying and rewriting correlated subqueries is one of the highest-leverage query optimisations available.
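As an illustration with hypothetical customers and orders tables, the rewrite usually means replacing a per-row subquery with a grouped join:

```sql
-- Correlated form: the subquery runs once per customers row.
SELECT c.id,
       (SELECT max(o.created_at)
        FROM orders o
        WHERE o.customer_id = c.id) AS last_order
FROM customers c;

-- Equivalent join form: one aggregate pass over orders.
SELECT c.id, o.last_order
FROM customers c
LEFT JOIN (
    SELECT customer_id, max(created_at) AS last_order
    FROM orders
    GROUP BY customer_id
) o ON o.customer_id = c.id;
```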

Lock contention

Long-running transactions holding row locks create cascading contention, where a single slow UPDATE can block dozens of subsequent transactions.[8] DDL operations like ALTER TABLE are particularly dangerous because they acquire exclusive locks that block all other access. Tools like pg_repack and careful partition strategies can mitigate the DDL problem, but the underlying issue is usually application-level: transactions that hold locks longer than necessary.
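To see contention as it happens, pg_blocking_pids() maps each waiting backend to the sessions holding the locks it needs. A minimal diagnostic query along these lines:

```sql
-- Who is blocking whom right now.
SELECT waiting.pid    AS waiting_pid,
       waiting.query  AS waiting_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query,
       blocking.state AS blocking_state
FROM pg_stat_activity waiting
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(waiting.pid));
```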

I/O pressure

Index bloat is an underappreciated source of I/O overhead. As dead tuples accumulate, indexes grow larger than they need to be, which means every index scan reads more pages than necessary. This shows up as gradually degrading query performance rather than a sudden failure, making it difficult to diagnose without tracking index sizes over time. Monitoring pg_stat_user_tables for high n_dead_tup relative to n_live_tup is the simplest early warning.[9]

Replication lag

Logical replication slot accumulation is a subtle but dangerous problem. If a replication consumer falls behind or disconnects, the replication slot prevents PostgreSQL from cleaning up WAL segments, which can eventually exhaust disk space on the primary. Monitoring replication slot lag and setting up alerts for stale slots is essential for any PostgreSQL deployment using logical replication.
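A simple check is to measure how much WAL each slot is forcing the primary to retain; a large or growing figure for an inactive slot is the warning sign:

```sql
-- WAL retained on the primary by each replication slot.
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```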

Index selection: the other major tuning lever

Parameter tuning and vacuum configuration get a lot of attention, but for many workloads, choosing the right indexes has a larger impact on query performance than any server-level setting. A missing index on a frequently-filtered column can force sequential scans on tables with millions of rows. Conversely, unused indexes consume storage, slow down writes (every INSERT and UPDATE must maintain them), and add to vacuum's workload.

The challenge is that optimal indexes depend entirely on your query patterns, and those patterns change as your application evolves. An index that was critical six months ago may now serve queries that have been rewritten or removed. PostgreSQL provides pg_stat_user_indexes to identify unused indexes (check idx_scan = 0 over a representative time period), and pg_stat_user_tables to spot tables where sequential scans dominate despite available indexes.[9]
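A starting point for the unused-index check might be the query below. Keep in mind that idx_scan counts only since the last statistics reset, and that indexes backing primary keys or unique constraints must stay regardless of scan counts:

```sql
-- Indexes with zero scans since the last stats reset, largest first.
SELECT s.relname      AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan
FROM pg_stat_user_indexes s
WHERE s.idx_scan = 0
ORDER BY pg_relation_size(s.indexrelid) DESC;
```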

Composite indexes are where the decisions get more interesting. The column order in a multi-column index determines which queries it can serve efficiently, and getting that order wrong means the index exists but the planner ignores it. Partial indexes (with a WHERE clause) can be dramatically smaller and faster than full-table indexes for queries that filter on a known condition, but they're underused because they require more thought about access patterns up front.
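As a sketch against a hypothetical orders table (the column names are illustrative), the two patterns look like this:

```sql
-- Composite: serves filters on customer_id, and on customer_id plus
-- created_at, but NOT a filter on created_at alone -- the leading
-- column must appear in the query for the index to help.
CREATE INDEX idx_orders_customer_date
    ON orders (customer_id, created_at);

-- Partial: indexes only the rows most queries actually touch,
-- so it stays small and cheap to maintain.
CREATE INDEX idx_orders_pending
    ON orders (created_at)
    WHERE status = 'pending';
```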

This is also an area where automated approaches are gaining traction. CMU's database research group has explored automated index recommendation as part of their holistic tuning work, treating index selection as a joint optimisation problem alongside knob configuration rather than a separate manual task.[19] For teams managing dozens of PostgreSQL instances, each with different query workloads, the case for automated index analysis grows stronger as the fleet grows.

Beyond pg_stat_statements

pg_stat_statements is the standard PostgreSQL extension for query performance tracking, and it's enabled by default on most managed services. It tracks execution statistics (total time, call count, rows returned, cache hits) for normalised query patterns.[10] For basic query performance monitoring, it's indispensable.

But it has real limitations that become apparent at scale.[11] Statistics are cumulative since the last reset, so there's no time-series history: you can see that a query is slow on average, but not whether it became slow yesterday or has been slow for months. Queries are normalised with parameters stripped, so you can't identify the specific parameter values that cause poor performance. And critically, there's no query plan capture. You know a query is slow, but not why.

pg_stat_monitor, an open-source extension from Percona, addresses all of these limitations.[12] It provides time-based buckets (query statistics per time interval rather than cumulative totals), query plan capture, per-plan statistics, and enhanced metrics including latency distributions. For teams doing serious performance analysis, it's the practical upgrade path from pg_stat_statements.

The difference matters in practice. With pg_stat_statements, you know that a query averages 200ms. With pg_stat_monitor, you can see that it averaged 20ms until Tuesday, jumped to 500ms, and correlated that with a plan change from an index scan to a sequential scan. That's the difference between knowing you have a performance problem and understanding what caused it.

This kind of time-series workload telemetry is also the foundation for the next generation of automated tuning. CMU's self-driving database research relies on exactly this data, detailed query performance over time, to build behaviour models that predict how configuration changes will affect a given workload.[18] Whether you're tuning manually or feeding data into an automated system, richer observability is the prerequisite.

What PostgreSQL 17 changes

PostgreSQL 17 (released September 2024) includes several performance improvements that are worth understanding whether or not you've upgraded yet.[13]

Streaming I/O

The new read stream API changes how PostgreSQL performs sequential scans and ANALYZE, combining adjacent block reads into larger I/O requests. Separately, improvements to WAL processing mean high-concurrency workloads can see up to 2x better write throughput.[14] Both are the kind of improvement that benefits most workloads without requiring any configuration changes.

B-tree index improvements

Queries with large IN clauses previously required multiple B-tree index scans, one per value in the list. PostgreSQL 17 consolidates these into a single scan, resulting in 20-30% faster execution for CPU-bound workloads that use this pattern.[15] If your application generates queries with large IN lists (common in batch operations and ORM-generated queries), this is a meaningful improvement.

Vacuum memory reduction

A new TidStore data structure reduces vacuum memory usage by up to 20x.[13] This means vacuum operations on large tables consume dramatically less memory, reducing contention with production workloads and allowing vacuum to process larger tables without hitting memory limits.

Incremental sorting improvements

Multi-column ORDER BY queries on tables with partial indexes see 40-60% query time reduction.[15] This matters particularly for reporting and analytics queries that sort large result sets by multiple columns.

Logical replication upgrades

Starting with PostgreSQL 17, pg_upgrade preserves logical replication slots, so upgrades to future major versions no longer require dropping and recreating them.[13] This removes one of the most painful operational requirements of PostgreSQL major version upgrades, where you previously had to coordinate replication slot recreation with downtime.

A tuning process, not a checklist

The parameters and techniques above are useful starting points, but PostgreSQL performance tuning is ultimately an iterative process. The right work_mem depends on your query mix. The right vacuum settings depend on your write patterns. The right indexes depend on your access patterns, which change as your application evolves.

What distinguishes teams that maintain good PostgreSQL performance at scale from those that don't isn't a specific set of parameter values. It's the practice of continuously monitoring query performance, understanding how the database engine is executing their workload, and adjusting configuration as conditions change. The teams that treat tuning as an ongoing process rather than a one-time setup catch problems early, before they cascade into incidents that take hours to resolve.

Manual tuning has a ceiling

It's worth being honest about the limits of guides like this one. Andy Pavlo's research group at Carnegie Mellon has spent over a decade demonstrating that manual, parameter-by-parameter tuning leaves significant performance on the table. PostgreSQL's knob count has grown steadily with each major release, the interactions between parameters are non-linear, and the optimal configuration for one workload may be wrong for another running on the same hardware. Their Proto-X project showed that holistic, ML-driven tuning, optimising all parameters simultaneously against measured workload behaviour, can deliver a 10x improvement over standard configurations.[19]

That research points to a future where tuning guides become less necessary. But today, most teams are still tuning manually, and the defaults described above are still causing real problems in production. The practical path forward is to get the fundamentals right (vacuum, observability, avoiding the worst defaults), build the telemetry infrastructure that captures workload behaviour over time, and progressively adopt automated tuning as the tooling matures. The teams that have rich performance data will be best positioned to benefit from ML-driven approaches when they're ready.

This is also where the DBRE discipline earns its keep. The investigative work of identifying query regressions, tracking vacuum effectiveness, and correlating performance changes with deployment activity is exactly the kind of repetitive, high-frequency analysis that benefits from systematic automation. Not because the analysis is simple, but because it needs to happen continuously across every instance in the fleet, and because the data it generates is the foundation for increasingly intelligent tuning over time.

References

  1. How to Tune shared_buffers and work_mem in PostgreSQL — OneUptime. Guidelines for memory parameter configuration based on system resources. See also: PostgreSQL Fastware: Understanding shared_buffers, work_mem, and wal_buffers.
  2. Debugging Postgres autovacuum problems: 13 tips — Citus Data. Detailed exploration of autovacuum configuration and the scale_factor problem on large tables.
  3. How to Implement PostgreSQL Bloat Prevention — OneUptime. Covers vacuum_cost_delay tuning for SSD storage and bloat prevention strategies.
  4. How to Prevent Table Bloat with Autovacuum Tuning — OneUptime. Per-table autovacuum configuration patterns for high-churn OLTP tables.
  5. PostgreSQL Bloat Is a Feature, Not a Bug — Roger Welin. Explains MVCC dead tuple mechanics and monitoring approaches.
  6. Postgres 17 Query Performance Improvements — Microsoft Tech Community. Covers correlated subquery optimisation, IS NULL improvements, and other PG17 query planner changes.
  7. PostgreSQL: Why table bloat ratio is higher than autovacuum_vacuum_scale_factor — Stack Overflow. Real-world example of bloat accumulation exceeding configured thresholds.
  8. Locks, Contention, and Performance: A Technical Study — dbsnoop. Documents lock contention cascades in production PostgreSQL workloads.
  9. Debugging Postgres autovacuum problems: 13 tips — Citus Data. Includes monitoring approaches for identifying index bloat and dead tuple accumulation.
  10. PostgreSQL Extensions: pg_stat_statements — TigerData. Overview of pg_stat_statements capabilities and configuration.
  11. Query observability and performance tuning with pg_stat_monitor and pg_stat_statements — Severalnines. Comparative analysis of both extensions, including pg_stat_statements limitations.
  12. Query observability and performance tuning with pg_stat_monitor and pg_stat_statements — Severalnines. Covers pg_stat_monitor's time-based buckets, plan capture, and enhanced metrics.
  13. PostgreSQL 17 Released with Improved Vacuum Process — InfoQ. Covers TidStore vacuum memory reduction, logical replication upgrades, and other PG17 changes.
  14. PostgreSQL 17 Released — PostgreSQL Global Development Group. Official release announcement covering streaming I/O, BRIN parallel builds, and WAL improvements.
  15. PostgreSQL 17 Features You Should Actually Be Using — DevStarsJ. Practical analysis of B-tree IN clause improvements and incremental sorting gains. See also: pgEdge: PostgreSQL 17 - A Major Step Forward.
  16. Automatic Database Management System Tuning Through Large-scale Machine Learning — Van Aken et al., Carnegie Mellon University. Foundational OtterTune paper demonstrating that knob interactions make parameter-by-parameter tuning suboptimal.
  17. OtterTune: Automatic Database Configuration Tuning — Carnegie Mellon Database Group. ML-driven holistic tuning that reuses training data from previous sessions to optimise configurations for specific workloads.
  18. Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation — Pavlo et al., VLDB 2021. Describes the workload forecasting, behaviour modelling, and action planning architecture for autonomous database management.
  19. The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving DBMS — Zhang et al., VLDB 2024. Proto-X holistic tuning using vector embeddings to optimise all database parameters simultaneously, achieving 10x gains over standard configurations.