PostgreSQL Query Optimization Guide | EXPLAIN, Indexes, and Slow Query Fixes
The Core of This Article
PostgreSQL performance problems follow predictable patterns — missing indexes, N+1 queries, bloated tables, and misconfigured connection pools. This guide teaches you to find and fix each one systematically.
Systematic Performance Approach
1. Find slow queries (pg_stat_statements, logs)
2. Understand execution plan (EXPLAIN ANALYZE)
3. Fix: missing index, bad query, table bloat, config
4. Verify (re-run EXPLAIN ANALYZE, monitor)
5. Repeat
1. Finding Slow Queries
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 slowest queries by total time
SELECT
round(total_exec_time::numeric, 2) AS total_ms,
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
left(query, 200) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- High average time (optimization candidates)
SELECT
round(mean_exec_time::numeric, 2) AS avg_ms,
calls,
left(query, 200) AS query
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
# postgresql.conf: log queries over 1 second
log_min_duration_statement = 1000
log_checkpoints = on
track_io_timing = on
2. EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 20;
Red flags in output:
- Seq Scan on large tables → needs an index
- actual rows >> estimated rows → stale statistics; run ANALYZE
- Sort Method: external merge → sort spilled to disk; increase work_mem
- Buffers: hit=0, read=N → I/O bound; consider increasing shared_buffers
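These red flags can be checked mechanically. A minimal Python sketch that scans EXPLAIN (ANALYZE, BUFFERS) text output for them (the plan fragment below is invented for illustration; the actual-vs-estimated-rows check would need extra parsing and is omitted):

```python
def find_red_flags(plan_text: str) -> list[str]:
    """Scan EXPLAIN (ANALYZE, BUFFERS) text output for common warning signs."""
    flags = []
    for line in plan_text.splitlines():
        if "Seq Scan" in line:
            flags.append("sequential scan: consider an index")
        if "Sort Method: external merge" in line:
            flags.append("sort spilled to disk: consider raising work_mem")
        if "Buffers:" in line and "hit=0" in line:
            flags.append("no cache hits: I/O bound, check shared_buffers")
    return flags

# Hypothetical plan fragment, not real server output
plan = """\
Seq Scan on orders  (cost=0.00..4580.00 rows=100000 width=16) (actual rows=98231 loops=1)
  Sort Method: external merge  Disk: 10240kB
  Buffers: hit=0 read=1250"""

print(find_red_flags(plan))  # all three flags fire on this fragment
```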
3. Index Strategies
B-tree indexes
-- Single column
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Composite (most selective column first)
CREATE INDEX CONCURRENTLY idx_users_status_created
ON users(status, created_at);
-- Covering index (index-only scan — no table access)
CREATE INDEX CONCURRENTLY idx_users_status_cover
ON users(status) INCLUDE (id, email);
CONCURRENTLY builds the index without blocking writes to the table, so prefer it in production. Caveats: it cannot run inside a transaction block, it takes longer than a plain build, and if it fails it leaves an INVALID index behind (drop it and retry).
Partial indexes
-- Only index active users
CREATE INDEX CONCURRENTLY idx_users_email_active
ON users(email) WHERE is_active = true;
-- Only index pending jobs
CREATE INDEX CONCURRENTLY idx_jobs_pending
ON jobs(created_at) WHERE status = 'pending';
JSONB and full-text
-- GIN index for JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_meta ON products USING GIN(metadata);
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
-- Full-text search
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
CREATE INDEX idx_articles_fts ON articles USING GIN(search_vector);
SELECT * FROM articles WHERE search_vector @@ plainto_tsquery('english', 'postgres index');
4. Find Unused and Missing Indexes
-- Unused indexes (candidates for removal)
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Tables with too many sequential scans (missing index)
SELECT
relname AS table_name,
seq_scan,
idx_scan,
pg_size_pretty(pg_total_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 100
AND pg_total_relation_size(relid) > 10 * 1024 * 1024
ORDER BY seq_scan DESC
LIMIT 20;
5. Table Bloat and VACUUM
Under MVCC, UPDATE and DELETE do not overwrite rows in place; they leave dead row versions behind. VACUUM reclaims that space for reuse.
-- Check bloat
SELECT
relname AS table_name,
n_dead_tup AS dead_rows,
n_live_tup AS live_rows,
round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- Manual vacuum (safe, non-locking)
VACUUM ANALYZE orders;
-- VACUUM FULL (rewrites entire table — locks it)
-- Only during maintenance windows
VACUUM FULL orders;
# postgresql.conf: tune autovacuum for high-write tables
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005
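Autovacuum fires on a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * n_live_tup (the formula from the PostgreSQL docs; defaults are 50 and 0.2). A quick Python sketch of that arithmetic shows why the default scale factor is too lazy for large, high-write tables:

```python
def autovacuum_trigger(n_live_tup: int, scale_factor: float = 0.2,
                       threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum runs, per PostgreSQL's formula."""
    return threshold + int(scale_factor * n_live_tup)

# Defaults: a 10M-row table accumulates ~2M dead rows before autovacuum runs
print(autovacuum_trigger(10_000_000))        # 2000050
# With the 0.01 scale factor tuned above, it runs ~20x sooner
print(autovacuum_trigger(10_000_000, 0.01))  # 100050
```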
6. Connection Pooling (PgBouncer)
PostgreSQL creates one OS process per connection. At 500+ connections, this overloads the server.
# pgbouncer.ini
[databases]
myapp = host=db.example.com dbname=myapp
[pgbouncer]
pool_mode = transaction # recommended
max_client_conn = 1000 # app-facing connections
default_pool_size = 20 # actual PostgreSQL connections
server_idle_timeout = 600
App (1000 connections) → PgBouncer → PostgreSQL (20 connections)
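How big should the server-side pool be? One widely cited rule of thumb (from the HikariCP/PostgreSQL-wiki pool-sizing guidance) is (cores * 2) + effective spindles; treat it as a starting point to benchmark against, not a law:

```python
def suggested_pool_size(cpu_cores: int, effective_spindles: int = 1) -> int:
    """Rule-of-thumb server-side pool size: (cores * 2) + effective spindles.
    effective_spindles is ~1 for a single SSD; tune from there by measuring."""
    return cpu_cores * 2 + effective_spindles

print(suggested_pool_size(8))  # 17, in the same ballpark as default_pool_size = 20
```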
7. postgresql.conf Key Settings
# Memory
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # per sort/hash node; one query can use several
maintenance_work_mem = 512MB # for VACUUM, CREATE INDEX
# WAL
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Parallelism
max_parallel_workers_per_gather = 4
# Logging
log_min_duration_statement = 1000
log_lock_waits = on
log_temp_files = 0 # log all temp files (work_mem tuning signal)
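work_mem applies per sort/hash node rather than per connection, so the worst case multiplies up quickly. A rough back-of-envelope estimate (nodes_per_query = 2 is an illustrative assumption, not a PostgreSQL constant):

```python
def worst_case_work_mem_gb(max_connections: int, work_mem_mb: int,
                           nodes_per_query: int = 2) -> float:
    """Rough upper bound on memory concurrent sorts/hashes can claim,
    on top of shared_buffers. Each connection may run several sort/hash
    nodes at once, each entitled to up to work_mem."""
    return max_connections * work_mem_mb * nodes_per_query / 1024

# 200 connections x 64MB x 2 nodes ≈ 25 GB beyond shared_buffers
print(worst_case_work_mem_gb(200, 64))  # 25.0
```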
8. Common Patterns and Fixes
-- ❌ OFFSET pagination: scans 100,000 rows then discards them
SELECT * FROM posts ORDER BY id DESC LIMIT 20 OFFSET 100000;
-- ✅ Keyset pagination: O(log n) via index
SELECT * FROM posts
WHERE id < :last_seen_id
ORDER BY id DESC LIMIT 20;
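The keyset pattern is easy to mirror in application code. A minimal Python simulation over in-memory rows (hypothetical post records, standing in for the SQL above):

```python
def keyset_page(rows, last_seen_id=None, page_size=20):
    """Keyset pagination over rows by id DESC, mirroring:
    WHERE id < :last_seen_id ORDER BY id DESC LIMIT :page_size"""
    ordered = sorted(rows, key=lambda r: r["id"], reverse=True)
    if last_seen_id is not None:
        ordered = [r for r in ordered if r["id"] < last_seen_id]
    return ordered[:page_size]

posts = [{"id": i} for i in range(1, 101)]  # ids 1..100
page1 = keyset_page(posts, page_size=3)
page2 = keyset_page(posts, last_seen_id=page1[-1]["id"], page_size=3)
print([p["id"] for p in page1])  # [100, 99, 98]
print([p["id"] for p in page2])  # [97, 96, 95]
```

The client keeps only the last id it saw; no page is ever scanned twice, which is what makes deep pagination cheap.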
-- ❌ Exact COUNT(*) on large table: full scan
SELECT COUNT(*) FROM events;
-- ✅ Approximate count (instant; accuracy depends on last VACUUM/ANALYZE)
SELECT reltuples::bigint FROM pg_class WHERE relname = 'events';
-- ❌ Function on indexed column (can't use index)
SELECT * FROM users WHERE lower(email) = '[email protected]';
-- ✅ Expression index
CREATE INDEX idx_users_email_lower ON users(lower(email));
SELECT * FROM users WHERE lower(email) = '[email protected]';
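The intro names N+1 queries as another predictable pattern. A toy Python sketch (CountingDB is a made-up stand-in, not a real driver) contrasting per-row lookups with one batched query, counting round trips:

```python
class CountingDB:
    """Toy stand-in for a connection; counts round trips to the server."""
    def __init__(self):
        self.queries = 0
        self.orders = {1: ["a"], 2: ["b", "c"], 3: []}

    def fetch_orders(self, user_id):
        self.queries += 1  # one SELECT per call
        return self.orders.get(user_id, [])

    def fetch_orders_bulk(self, user_ids):
        self.queries += 1  # one SELECT ... WHERE user_id = ANY(:ids)
        return {uid: self.orders.get(uid, []) for uid in user_ids}

naive = CountingDB()
for uid in [1, 2, 3]:            # ❌ N+1: one query per user
    naive.fetch_orders(uid)
print(naive.queries)             # 3

batched = CountingDB()
batched.fetch_orders_bulk([1, 2, 3])  # ✅ one batched query
print(batched.queries)           # 1
```

With 1,000 users the naive loop costs 1,000 round trips; the batched form still costs one.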
9. Monitoring Queries
-- Long-running queries
SELECT pid, now() - query_start AS duration, state, left(query, 100) AS query
FROM pg_stat_activity
WHERE state != 'idle' AND query_start < now() - interval '5 seconds'
ORDER BY duration DESC;
-- Kill a query (try pg_cancel_backend(pid) first to cancel only the statement)
SELECT pg_terminate_backend(pid);
-- Lock waits
SELECT
blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
left(blocked.query, 80) AS blocked_query,
left(blocking.query, 80) AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
-- Buffer cache hit ratio (should be > 99%)
SELECT
round(sum(heap_blks_hit)::numeric /
NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2) AS cache_hit_pct
FROM pg_statio_user_tables;
Key Takeaways
| Problem | Fix |
|---|---|
| Slow SELECT | Index on WHERE/JOIN/ORDER BY columns |
| High write latency | Remove unused indexes |
| Too many connections | PgBouncer connection pooling |
| Table bloat | Tune autovacuum or VACUUM ANALYZE |
| Slow pagination | Keyset instead of OFFSET |
| Low cache hit ratio | Increase shared_buffers |
| Sort spills to disk | Increase work_mem |
| Planner makes bad plans | Run ANALYZE, update statistics |
Master EXPLAIN ANALYZE and pg_stat_statements — 90% of PostgreSQL performance problems are solved with the right indexes and query rewrites, before you ever touch configuration.