[2026] C++ Query Optimization Guide | Index Selection, Plans, Statistics, Cost Model, Production Patterns [#49-3]

Key points

C++ query optimization: index selection, execution plans, statistics, cost models, and production patterns [#49-3], illustrated with real-world problems and scenarios.

Introduction

“Our query takes 3 seconds in production”

Why query optimization

You run a REST API server, but the user list endpoint takes more than 3 seconds. With only 100 concurrent users, CPU hits 90% and timeouts appear. Query optimization is the process of finding such bottlenecks and fixing them using indexes, execution plans, statistics, and cost models.

This post covers:

  • Problem scenarios: seven situations you see in real production
  • Index selection: single/composite/covering indexes, criteria, and full examples
  • Execution plans: EXPLAIN / EXPLAIN ANALYZE for full scans vs index scans
  • Statistics and cost models: ANALYZE, pg_stat, optimizer cost estimates
  • Common mistakes: 13 patterns, including N+1, full scans, unused indexes, and missing prepared statements
  • Best practices: index design, query style, monitoring
  • Production patterns: connection pooling, read/write split, batching

Related posts: Database query optimization #51-8, Database fundamentals.

A mental model

Treat query performance as a system of interlocking parts: schema and indexes, planner statistics and cost estimates, and application access patterns. A choice in one layer affects the others, so this article frames each fix with numbers and a concrete pattern.


Practical experience: this article is grounded in problems and fixes from large C++ projects. It includes pitfalls and debugging tips you rarely see in textbooks.

Table of contents

  1. Problem scenarios
  2. Index selection guide
  3. Execution plan analysis (EXPLAIN)
  4. Statistics and cost model
  5. End-to-end query optimization example
  6. Common mistakes and fixes
  7. Best practices
  8. Production patterns
  9. Checklist
  10. Summary

Core loop: index selection → verify plan → refresh statistics → remove N+1 → prepared statements. Following this order resolves most query bottlenecks.


1. Problem scenarios

Scenario 1: “The user list API takes 3 seconds”

Situation: GET /users takes 3.2 seconds for 1000 users. The cause is N+1: an extra query per user for orders.

// Bad: 1 + 1000 = 1001 queries
void get_users_with_orders(PGconn* conn) {
    PGresult* users = PQexec(conn, "SELECT id, name FROM users");
    for (int i = 0; i < PQntuples(users); ++i) {
        std::string id = PQgetvalue(users, i, 0);  // keep the parameter alive
        const char* params[] = {id.c_str()};
        PGresult* orders = PQexecParams(conn,
            "SELECT * FROM orders WHERE user_id = $1", 1, nullptr,
            params, nullptr, nullptr, 0);
        // ... handle ...
        PQclear(orders);
    }
    PQclear(users);
}

Cause: one query per user inside the loop. 1000 users × ~3 ms ≈ 3 seconds.

Fix: reduce to 1–2 queries with JOIN or batched IN.
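The batching half of the fix boils down to collapsing the per-user loop into one array parameter for `WHERE user_id = ANY($1::int[])`. A minimal sketch of the parameter builder (the helper name `build_id_array` is illustrative, not from the original post; section 5 shows the full query):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: format ids as a PostgreSQL array literal so that
// 1000 per-user queries collapse into a single round trip.
std::string build_id_array(const std::vector<int>& ids) {
    std::string out = "{";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) out += ",";
        out += std::to_string(ids[i]);
    }
    out += "}";
    return out;
}
// Pass the result as $1 to:
//   SELECT * FROM orders WHERE user_id = ANY($1::int[])
```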


Scenario 2: “We have an index but still get a full scan”

Situation: table orders has idx_orders_user_id, but EXPLAIN shows Seq Scan.

-- Query
SELECT * FROM orders WHERE LOWER(status) = 'pending';

Note: If collation rules change, expression indexes and results can diverge—validate during migrations.

Cause: applying a function like LOWER(status) prevents a plain index on status from being used; the optimizer cannot probe the index on raw status values.

Fix: remove the function from the predicate, or add a functional index.

-- Functional index (PostgreSQL)
CREATE INDEX idx_orders_status_lower ON orders (LOWER(status));

Scenario 3: “Stale statistics pick the wrong plan”

Situation: 10 million new rows were added, but the optimizer still estimates 1000 rows and picks nested loop join. Hash join would be faster.

Cause: pg_stat_user_tables statistics are stale; without ANALYZE, the optimizer uses outdated histograms.

Fix: run ANALYZE regularly; after large INSERT/UPDATE batches, run it immediately.

ANALYZE orders;
-- Or specific columns only
ANALYZE orders (user_id, created_at);

Scenario 4: “Composite index column order is wrong”

Situation: for WHERE user_id = ? AND created_at > ? you created idx_orders_created_user (created_at, user_id). The index is not used effectively.

Cause: composite indexes are used from the left. If created_at is first, predicates that only filter on user_id get limited benefit.

Fix: put the most selective column first—e.g. if user_id narrows 10M rows to ~10, put user_id first.

-- Correct order: user_id (high selectivity) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

Scenario 5: “Batch INSERT of 10k rows takes 30 seconds”

Situation: inserting 10k rows into a log table takes 30 seconds.

// Bad: 10k individual INSERTs
for (int i = 0; i < 10000; ++i) {
    PQexecParams(conn, "INSERT INTO logs (ts, msg) VALUES ($1, $2)", ...);
}

Cause: commit per row, WAL flush, index maintenance—10k round trips.

Fix: wrap in BEGIN/COMMIT, or use COPY.

// Batch INSERT
PQexec(conn, "BEGIN");
// e.g. insert in chunks of 100
PQexec(conn, "COMMIT");
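A sketch of the chunking step above, assuming a simple `LogRow` struct. Values are inlined here only to keep the example short; production code should bind parameters instead (see the SQL injection pitfall later in this article):

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct LogRow { std::string ts, msg; };

// Build one multi-row INSERT per chunk so 10k rows become ~100 statements
// inside a single BEGIN/COMMIT instead of 10k round trips.
std::string build_insert_chunk(const std::vector<LogRow>& rows,
                               std::size_t begin, std::size_t count) {
    std::string sql = "INSERT INTO logs (ts, msg) VALUES ";
    for (std::size_t i = 0; i < count; ++i) {
        const LogRow& r = rows[begin + i];
        if (i > 0) sql += ", ";
        sql += "('" + r.ts + "', '" + r.msg + "')";  // bind in real code
    }
    return sql;
}
```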

Scenario 6: “Same query 10k times—parse cost adds up”

Situation: login runs SELECT * FROM users WHERE email = $1 1000 times per second. CPU is high and pg_stat_activity shows heavy Parse work.

Cause: repeated parse and plan per request without prepared statements.

Fix: prepare once; bind parameters for each execution.

// Prepared statement (PostgreSQL)
// 1. PREPARE once
PQexec(conn, "PREPARE get_user (text) AS SELECT * FROM users WHERE email = $1");
// 2. EXECUTE with different bindings
const char* params[] = {"[email protected]"};
PQexecPrepared(conn, "get_user", 1, params, nullptr, nullptr, 0);

Scenario 7: “Large join query OOMs the server”

Situation: joining 1M users to 10M orders OOMs a 32 GB machine.

Cause: nested loop join materializes huge intermediates; exceeding work_mem spills to disk and slows everything.

Fix: inspect join strategy with EXPLAIN, refresh stats with ANALYZE to favor hash join; tune work_mem if needed.

-- Refresh stats to help hash join
ANALYZE users;
ANALYZE orders;
-- Session-local; use with care
SET work_mem = '256MB';

Scenario diagram

flowchart TB
    subgraph Problems[Query performance issues]
        P1[N+1 queries]
        P2[Full table scan]
        P3[Stale statistics]
        P4[Wrong index column order]
        P5[Per-row INSERT]
    end
    subgraph Solutions[Fixes]
        S1[JOIN / IN batch]
        S2[Add index / remove function]
        S3[ANALYZE]
        S4[Fix composite order]
        S5[Batch INSERT / COPY]
    end
    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5

2. Index selection guide

Why indexes matter

Without an index, WHERE user_id = 123 may read the entire table. With a B-tree index, lookup is O(log N) in a few milliseconds.

flowchart TB
    subgraph NoIndex[No index]
        N1[Row 1] --> N2[Row 2]
        N2 --> N3[Row 3]
        N3 --> N4[...]
        N4 --> N5[Row 1M]
        N5 --> N6["Full table scan: read 1M rows"]
    end
    subgraph WithIndex[With index]
        I1["B-tree index"] --> I2["log2(1M) ≈ 20 nodes"]
        I2 --> I3["Seek to matching rows"]
    end

Index selection cheat sheet

| Column usage | Index type | Example | Reason |
|---|---|---|---|
| WHERE equality | Single B-tree | idx_users_email | Exact match |
| WHERE range | Composite (equality first) | idx_orders_user_created | user_id eq + created_at range |
| JOIN keys | Both sides | orders.user_id, users.id | Join performance |
| ORDER BY | Composite | (user_id, created_at DESC) | Avoid sort |
| SELECT columns | Covering | INCLUDE (amount, status) | Avoid heap fetch |

Single-column indexes

-- Lookup user by email (equality)
CREATE INDEX idx_users_email ON users(email);
-- Sort by created_at
CREATE INDEX idx_users_created_at ON users(created_at);
// Create indexes from C++ (SQLite)
#include <sqlite3.h>
void create_indexes(sqlite3* db) {
    const char* sql = R"(
        CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
        CREATE INDEX IF NOT EXISTS idx_users_created ON users(created_at);
    )";
    char* err = nullptr;
    sqlite3_exec(db, sql, nullptr, nullptr, &err);
    if (err) {
        sqlite3_free(err);
    }
}

Composite indexes and column order

Rule: put the most selective column first; equality before range.

-- user_id returns ~10 rows out of 10M; created_at is a range
-- Order: user_id (equality, selective) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
-- Wrong: created_at first makes user_id-only filters inefficient
CREATE INDEX idx_orders_created_user ON orders(created_at, user_id);

Queryable patterns:

| Index (user_id, created_at) | Works? | Notes |
|---|---|---|
| WHERE user_id = ? | Yes | Left-prefix |
| WHERE user_id = ? AND created_at > ? | Yes | Both used |
| WHERE created_at > ? | No | Missing leading user_id |
| WHERE user_id = ? ORDER BY created_at DESC | Yes | Index order covers sort |

Covering indexes (index-only scan)

If all selected columns live in the index, PostgreSQL can satisfy the query from the index alone.

-- PostgreSQL: INCLUDE
CREATE INDEX idx_orders_user_cover ON orders(user_id) INCLUDE (amount, status);
-- SQLite: include columns in the key list
CREATE INDEX idx_orders_user_cover ON orders(user_id, amount, status);
-- Can become an index-only scan (with visibility map / ANALYZE as applicable)
SELECT user_id, amount, status FROM orders WHERE user_id = 123;

Partial indexes

Index only rows matching a predicate—smaller index, faster maintenance.

-- Only active orders
CREATE INDEX idx_orders_active ON orders(user_id) WHERE status = 'active';
-- Recent logs: PostgreSQL requires an IMMUTABLE predicate, so NOW() is not
-- allowed here; use a fixed cutoff and recreate the index periodically
CREATE INDEX idx_logs_recent ON logs(created_at) WHERE created_at > '2026-01-01';

When indexes cannot be used

-- Function on column — index not used (plain column index)
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Index-friendly
SELECT * FROM users WHERE email = '[email protected]';
-- Cast applied to the column (PostgreSQL) — index not used
SELECT * FROM users WHERE id::text = '123';  -- id is integer
-- Compare in the column's native type
SELECT * FROM users WHERE id = 123;
-- OR — may merge indexes but often inefficient
SELECT * FROM users WHERE email = '[email protected]' OR name = 'Alice';
-- Prefer UNION ALL with disjoint predicates
SELECT * FROM users WHERE email = '[email protected]'
UNION ALL
SELECT * FROM users WHERE name = 'Alice' AND email != '[email protected]';

3. Execution plan analysis (EXPLAIN)

What is an execution plan

An execution plan shows how the engine will run a query. EXPLAIN reveals full scans, index scans, and join order.

flowchart LR
    A[SQL query] --> B[Parser]
    B --> C[Optimizer]
    C --> D[Execution plan]
    D --> E[Index Scan]
    D --> F[Seq Scan]
    D --> G[Nested Loop]

PostgreSQL EXPLAIN ANALYZE

-- Actual timings (ANALYZE)
-- Buffer hits (BUFFERS)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 10;

Using an index (good):

Limit  (cost=0.42..8.44 rows=10 width=40) (actual time=0.05..0.08 rows=10 loops=1)
  ->  Index Scan using idx_orders_user_created on orders
        Index Cond: (user_id = 123)
        Buffers: shared hit=4
Planning Time: 0.12 ms
Execution Time: 0.15 ms

Full scan (bad):

Limit  (cost=0.00..20834.00 rows=10 width=40) (actual time=45.2..182.3 rows=10 loops=1)
  ->  Seq Scan on orders
        Filter: (user_id = 123)
        Rows Removed by Filter: 999990
        Buffers: shared read=8000
Planning Time: 0.08 ms
Execution Time: 182.5 ms

Plan terminology

| Term | Meaning | Action |
|---|---|---|
| Seq Scan | Full table scan | Add or fix index |
| Index Scan | Index + heap fetch | Usually good |
| Index Only Scan | Index only (covering) | Best when applicable |
| Bitmap Index Scan | Bitmap then heap | Common for large result sets |
| Nested Loop | Nested-loop join | Good for small outer |
| Hash Join | Hash join | Large equi-joins |
| Merge Join | Sort-merge join | Pre-sorted inputs |

Running EXPLAIN from C++

#include <libpq-fe.h>
#include <iostream>
#include <string>
void explain_query(PGconn* conn, const std::string& sql) {
    std::string explain_sql = "EXPLAIN (ANALYZE, BUFFERS) " + sql;
    PGresult* res = PQexec(conn, explain_sql.c_str());
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int i = 0; i < PQntuples(res); ++i) {
            const char* row = PQgetvalue(res, i, 0);
            if (row) std::cout << row << "\n";
        }
    }
    PQclear(res);
}

SQLite EXPLAIN QUERY PLAN

EXPLAIN QUERY PLAN
SELECT id, name FROM users WHERE email = '[email protected]';
-- With index
SEARCH users USING INDEX idx_users_email (email=?)
-- Without index
SCAN users

Plan review checklist

| Check | Good | Bad | Mitigation |
|---|---|---|---|
| Scan type | Index / index-only | Seq Scan | Add or change index |
| Rows removed by filter | ~0 | Very large | Index WHERE columns |
| Buffers shared hit | Mostly hit | Mostly read | Warm cache / smaller reads |
| actual time | Low ms | High ms | Revisit query/index |
| Planning time | Sub-ms to low ms | Very high | Stats, simpler query |

// SQLite EXPLAIN from C++
void explain_sqlite(sqlite3* db, const std::string& sql) {
    sqlite3_stmt* stmt = nullptr;
    std::string explain_sql = "EXPLAIN QUERY PLAN " + sql;
    sqlite3_prepare_v2(db, explain_sql.c_str(), -1, &stmt, nullptr);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        const char* detail = reinterpret_cast<const char*>(sqlite3_column_text(stmt, 3));
        if (detail) std::cout << detail << "\n";
    }
    sqlite3_finalize(stmt);
}

4. Statistics and cost model

How statistics affect plans

The optimizer estimates cost from table and index statistics. Stale stats yield wrong row estimates and bad joins.

flowchart TB
    A[pg_stat_user_tables] --> B[Optimizer]
    C[pg_stats] --> B
    B --> D[Cost estimate]
    D --> E[Plan choice]

Refresh with ANALYZE

-- Whole database
ANALYZE;
-- One table
ANALYZE orders;
-- Specific columns on huge tables
ANALYZE orders (user_id, created_at);
// Run ANALYZE from C++
void refresh_statistics(PGconn* conn) {
    PGresult* res = PQexec(conn, "ANALYZE orders");
    if (PQresultStatus(res) != PGRES_COMMAND_OK) {
        std::cerr << "ANALYZE failed: " << PQerrorMessage(conn) << "\n";
    }
    PQclear(res);
}

pg_stat catalogs

-- Table scan stats
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Per-index stats
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders';

| Column | Meaning |
|---|---|
| seq_scan | Sequential scan count |
| seq_tup_read | Rows read by seq scan |
| idx_scan | Index scan count |
| idx_tup_fetch | Rows fetched via index |

Monitoring full-scan-heavy tables

-- Tables with many seq scans (candidates for indexes)
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
       seq_scan::float / NULLIF(seq_scan + idx_scan, 0) AS seq_ratio
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 10;
-- Unused indexes (candidates to drop)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey';
// Log table stats from C++
void log_table_stats(PGconn* conn) {
    const char* sql = R"(
        SELECT relname, seq_scan, idx_scan, seq_tup_read, idx_tup_fetch
        FROM pg_stat_user_tables
        WHERE seq_scan > 1000 OR seq_tup_read > 100000
    )";
    PGresult* res = PQexec(conn, sql);
    for (int i = 0; i < PQntuples(res); ++i) {
        printf("Table: %s, seq_scan=%s, idx_scan=%s\n",
               PQgetvalue(res, i, 0), PQgetvalue(res, i, 1), PQgetvalue(res, i, 2));
    }
    PQclear(res);
}

Cost model (PostgreSQL)

How PostgreSQL estimates cost:

Sequential scan cost = (pages × seq_page_cost) + (rows × cpu_tuple_cost)
Index scan cost ≈ (index pages × random_page_cost) + (heap pages × random_page_cost) + (rows × cpu_tuple_cost)

Example:
orders: 10M rows, 125K pages

Seq Scan:
= 125,000 × 1.0 + 10,000,000 × 0.01
= 225,000 cost units

Index Scan (user_id = 123, 100 rows):
= (4 index pages × 4.0) + (100 heap pages × 4.0) + (100 × 0.01)
≈ 417 cost units

→ Optimizer prefers index scan

PostgreSQL compares abstract cost units. Defaults:

  • seq_page_cost = 1.0 — sequential page read
  • random_page_cost = 4.0 — random page (often 1.1–1.5 on SSD)
  • cpu_tuple_cost = 0.01 — per-row CPU
SHOW seq_page_cost;
SHOW random_page_cost;
-- SSD tuning example
ALTER SYSTEM SET random_page_cost = 1.1;

Reading a cost line

Limit  (cost=0.42..8.44 rows=10 width=40)
  • cost=0.42..8.44: startup cost .. total cost
  • rows=10: estimated rows
  • width=40: average row width in bytes

High total cost (e.g. cost=0.00..20834.00) often indicates a large sequential scan.


5. End-to-end query optimization example

Example: ten latest orders per user

Requirement: for 1000 users, fetch the ten most recent orders each, under 3 seconds total.

Step 1: Baseline

-- users: id, name, email, created_at (1M rows)
-- orders: id, user_id, amount, status, created_at (10M rows)
-- Naive approach (N+1)
SELECT * FROM users;
-- loop: SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 10

EXPLAIN:

  • users: seq scan (no helpful index)
  • orders: repeated seq scans without user_id index

Step 2: Index design

CREATE INDEX idx_users_email ON users(email);
-- Core: user_id equality + created_at order
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Step 3: Remove N+1 — batched ANY / window query

#include <libpq-fe.h>
#include <vector>
#include <string>
#include <unordered_map>
struct Order {
    int id, user_id, amount;
    std::string status;
};
std::unordered_map<int, std::vector<Order>> get_recent_orders_batch(
    PGconn* conn, const std::vector<int>& user_ids, int limit = 10)
{
    if (user_ids.empty()) return {};
    std::string sql = "SELECT user_id, id, amount, status FROM (";
    sql += "SELECT user_id, id, amount, status, "
           "ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn ";
    sql += "FROM orders WHERE user_id = ANY($1::int[])) sub ";
    sql += "WHERE rn <= " + std::to_string(limit);
    std::string array_str = "{";
    for (size_t i = 0; i < user_ids.size(); ++i) {
        if (i > 0) array_str += ",";
        array_str += std::to_string(user_ids[i]);
    }
    array_str += "}";
    const char* params[] = {array_str.c_str()};
    PGresult* res = PQexecParams(conn, sql.c_str(), 1, nullptr, params, nullptr, nullptr, 0);
    std::unordered_map<int, std::vector<Order>> result;
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int i = 0; i < PQntuples(res); ++i) {
            Order o;
            o.user_id = atoi(PQgetvalue(res, i, 0));
            o.id = atoi(PQgetvalue(res, i, 1));
            o.amount = atoi(PQgetvalue(res, i, 2));
            o.status = PQgetvalue(res, i, 3) ? PQgetvalue(res, i, 3) : "";
            result[o.user_id].push_back(o);
        }
    }
    PQclear(res);
    return result;
}

Step 4: Verify the plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, id, amount, status FROM (
    SELECT user_id, id, amount, status,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn
    FROM orders WHERE user_id = ANY(ARRAY[1,2,3,...,1000])
) sub WHERE rn <= 10;

Expect: Index Scan using idx_orders_user_created or Bitmap Heap Scan.

Step 5: Compare approaches

| Approach | Queries | Approx. time |
|---|---|---|
| N+1 (no index) | 1001 | 3200 ms |
| N+1 (with index) | 1001 | 800 ms |
| Batch ANY (with index) | 2 | 95 ms |

Step 6: Prepared statements for hot paths

Repeated execution benefits from one-time parse/plan; binding also prevents SQL injection.

// PostgreSQL: PQprepare + PQexecPrepared
#include <libpq-fe.h>
#include <stdexcept>
#include <string>
class PreparedQuery {
    PGconn* conn_;
    std::string name_;
    int n_params_;
public:
    PreparedQuery(PGconn* conn, const std::string& name,
                  const std::string& sql, int n_params)
        : conn_(conn), name_(name), n_params_(n_params)
    {
        PGresult* res = PQprepare(conn_, name_.c_str(), sql.c_str(), n_params_, nullptr);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
            PQclear(res);
            throw std::runtime_error(PQerrorMessage(conn_));
        }
        PQclear(res);
    }
    PGresult* execute(const char* const* params) {
        return PQexecPrepared(conn_, name_.c_str(), n_params_, params, nullptr, nullptr, 0);
    }
};
// SQLite: sqlite3_prepare_v2 + bind + step in a loop
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "SELECT id, name FROM users WHERE email = ?", -1, &stmt, nullptr);
for (const auto& email : emails) {
    sqlite3_bind_text(stmt, 1, email.c_str(), -1, SQLITE_TRANSIENT);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        // handle row
    }
    sqlite3_reset(stmt);
}
sqlite3_finalize(stmt);

Step 7: Single JOIN alternative

You can fetch users and ranked orders in one query.

WITH ranked AS (
    SELECT o.*, ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.created_at > '2025-01-01'
)
SELECT u.id, u.name, r.id AS order_id, r.amount, r.status
FROM users u
JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
ORDER BY u.id, r.rn;
// Process JOIN result in C++
std::vector<UserWithOrders> get_users_with_orders_join(PGconn* conn) {
    const char* sql = R"(
        WITH ranked AS (
            SELECT o.user_id, o.id AS order_id, o.amount, o.status,
                   ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
            FROM orders o JOIN users u ON o.user_id = u.id
        )
        SELECT u.id, u.name, r.order_id, r.amount, r.status
        FROM users u
        JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
        ORDER BY u.id, r.rn
    )";
    PGresult* res = PQexec(conn, sql);
    std::vector<UserWithOrders> result;
    int current_user_id = -1;
    UserWithOrders* current = nullptr;
    for (int i = 0; i < PQntuples(res); ++i) {
        int uid = atoi(PQgetvalue(res, i, 0));
        if (uid != current_user_id) {
            result.push_back({uid, PQgetvalue(res, i, 1), {}});
            current = &result.back();
            current_user_id = uid;
        }
        if (current)
            current->orders.push_back({atoi(PQgetvalue(res, i, 2)),
                                       atoi(PQgetvalue(res, i, 3)),
                                       PQgetvalue(res, i, 4)});
    }
    PQclear(res);
    return result;
}

Step 8: Plan comparison workflow

1. Run EXPLAIN (ANALYZE, BUFFERS) on the query
2. If Seq Scan → check Rows Removed by Filter; consider indexes
3. If Index Scan → high shared hit ratio means cache friendly
4. Compare cost vs actual time — cost is estimate, actual is truth
5. Nested loop vs hash join — depends on cardinality and stats
-- Compare plans before/after ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.id, u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

6. Common mistakes and fixes

1. N+1 queries

Symptom: list API > 3 seconds.
Cause: query inside loop.
Fix: JOIN or batched IN / ANY.

// Bad
for (auto& user : users) {
    auto orders = query_orders(conn, user.id);
}

2. Full table scan

Symptom: Seq Scan over millions of rows.
Cause: no index on WHERE columns.

CREATE INDEX idx_orders_user_id ON orders(user_id);

3. Functions prevent index use

Symptom: index exists but plan shows seq scan.

SELECT * FROM users WHERE email = '[email protected]';
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

4. Wrong composite order

Symptom: index (created_at, user_id) but query filters only user_id.
Fix: (user_id, created_at).

DROP INDEX idx_orders_created_user;
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

5. Connection leak

Symptom: too many connections.
Fix: RAII guards / pool release.

PgConnectionGuard guard(pool);
PGconn* conn = guard.get();
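`PgConnectionGuard` is not defined in this article; here is a minimal generic sketch of the same RAII pattern. A real pool would add locking, waiting, and health checks, and `Conn` would be `PGconn*`:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy pool: hands out connections and takes them back.
template <typename Conn>
class Pool {
    std::vector<Conn> free_;
public:
    explicit Pool(std::vector<Conn> conns) : free_(std::move(conns)) {}
    Conn acquire() { Conn c = free_.back(); free_.pop_back(); return c; }
    void release(Conn c) { free_.push_back(c); }
    std::size_t available() const { return free_.size(); }
};

// RAII guard: the destructor returns the connection to the pool,
// even when an exception unwinds the stack.
template <typename Conn>
class ConnectionGuard {
    Pool<Conn>& pool_;
    Conn conn_;
public:
    explicit ConnectionGuard(Pool<Conn>& p) : pool_(p), conn_(p.acquire()) {}
    ~ConnectionGuard() { pool_.release(conn_); }
    ConnectionGuard(const ConnectionGuard&) = delete;
    ConnectionGuard& operator=(const ConnectionGuard&) = delete;
    Conn get() const { return conn_; }
};
```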

6. SQL injection

Symptom: hostile input exfiltrates data.

// Never
std::string sql = "SELECT * FROM users WHERE id = " + user_input;
// Always bind
PQexecParams(conn, "SELECT * FROM users WHERE id = $1::int", 1, nullptr, params, nullptr, nullptr, 0);

7. Stale statistics

Symptom: wrong join after bulk load.
Fix: ANALYZE.


8. SELECT *

Symptom: wide rows, no covering index benefit.

SELECT id, amount, status FROM orders WHERE user_id = 123;

9. Too many indexes

Symptom: slow writes.
Fix: index only query patterns you run; drop unused indexes.


10. Cache not invalidated

Symptom: stale reads.
Fix: invalidate or update cache on writes.

void update_user(..., LruCache<int, User>& cache, ...) {
    cache.put(id, new_user);
}
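The `LruCache` in the snippet is assumed from an earlier post in this series; a minimal sketch of a put/get/erase cache that supports both update-on-write (`put`) and invalidate-on-write (`erase`):

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Minimal LRU cache: front of the list is most recently used.
template <typename K, typename V>
class LruCache {
    std::size_t cap_;
    std::list<std::pair<K, V>> items_;
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
public:
    explicit LruCache(std::size_t cap) : cap_(cap) {}
    void put(const K& k, const V& v) {      // update-on-write
        erase(k);
        items_.emplace_front(k, v);
        index_[k] = items_.begin();
        if (items_.size() > cap_) {         // evict least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
    std::optional<V> get(const K& k) {
        auto it = index_.find(k);
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);  // mark as recent
        return it->second->second;
    }
    void erase(const K& k) {                // invalidate-on-write
        auto it = index_.find(k);
        if (it == index_.end()) return;
        items_.erase(it->second);
        index_.erase(it);
    }
};
```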

11. No prepared statements

Symptom: high CPU on repeated queries.
Fix: PQprepare / PQexecPrepared or long-lived sqlite3_stmt*.


12. Huge IN lists

Symptom: thousands of literals—bad plans or errors.
Fix: chunk (e.g. 500–1000) or temp table + JOIN.

const size_t BATCH = 500;
for (size_t i = 0; i < ids.size(); i += BATCH) {
    auto end = std::min(i + BATCH, ids.size());
    query_batch(conn, ids.data() + i, end - i);  // app-level helper per chunk
}

13. Batch INSERT without a transaction

Symptom: 10k inserts in 30+ seconds.
Fix: BEGIN/COMMIT or COPY.

PQexec(conn, "BEGIN");
for (const auto& row : rows) {
    PQexecParams(conn, "INSERT INTO logs (...) VALUES ($1, $2)", ...);
}
PQexec(conn, "COMMIT");

7. Best practices

Index design

  • Index WHERE, JOIN, and ORDER BY columns as needed
  • Composite: most selective column first
  • Use covering / INCLUDE for index-only scans when worthwhile
  • Drop unused indexes

Query style

  • Avoid SELECT *; project needed columns
  • Use parameter binding and prepared statements
  • Avoid N+1 (JOIN, batching)
  • Use LIMIT on large result sets

Statistics

  • ANALYZE after bulk changes; schedule periodic analyze
  • Watch pg_stat_user_tables

Plans

  • After index changes, run EXPLAIN ANALYZE
  • Confirm seq → index transition
  • Compare estimated vs actual rows

Connections

  • Use a pool (typical sizes 10–20)
  • RAII for acquire/release

Monitoring

  • Log slow queries (e.g. > 100 ms)
  • Inspect pg_stat_activity
  • Set statement timeouts

Index decision flow

flowchart TD
    A[Analyze query] --> B{WHERE shape?}
    B -->|Single equality| C[Single B-tree]
    B -->|Equality + range| D[Composite: equality first]
    B -->|Range only| E[Single or partial index]
    C --> F[Most selective column]
    D --> F
    E --> F
    F --> G{Can cover SELECT?}
    G -->|Yes| H[Covering / INCLUDE]
    G -->|No| I[Non-covering index]

N+1 alternatives

| Pattern | When | Pros | Cons |
|---|---|---|---|
| JOIN | 1:N, one round trip | Single query | Wider rows, grouping in app |
| IN batch | Many IDs | Flexible | Limit IN size |
| Lazy load | Rare access | Fast first paint | Easy to N+1 |
| Lazy + cache | Hot reads | Fast on hit | Invalidation complexity |

8. Production patterns

Pool + prepared statements

struct DbContext {
    PgConnectionPool& pool;
    PgConnectionGuard guard;
    PGconn* conn;
    PreparedStatementCache& stmt_cache;
    explicit DbContext(PgConnectionPool& p, PreparedStatementCache& c)
        : pool(p), guard(p), conn(guard.get()), stmt_cache(c) {}
};

Read/write split

flowchart TB
    App[App] --> Router[Router]
    Router -->|SELECT| Replica[Replica]
    Router -->|INSERT/UPDATE| Primary[Primary]
    Primary -.->|Replication| Replica
class ReadWriteSplit {
    PgConnectionPool* primary_;
    PgConnectionPool* replica_;
public:
    PGconn* acquire_read() { return replica_->acquire(); }
    PGconn* acquire_write() { return primary_->acquire(); }
};

Batched INSERT

PQexec(conn, "BEGIN");
for (size_t i = 0; i < batch.size(); i += 100) {
    // batch_insert: app-level helper that builds one multi-row INSERT per chunk
    batch_insert(conn, batch, i, std::min(i + 100, batch.size()));
}
PQexec(conn, "COMMIT");

Statement timeout

PQexec(conn, "SET statement_timeout = 5000");  // 5s

Slow query logging

auto start = std::chrono::steady_clock::now();
PGresult* res = PQexecParams(conn, sql.c_str(), ...);
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::steady_clock::now() - start).count();
if (elapsed > 100) {
    log_warn("Slow query: {}ms - {}", elapsed, sql);
}

Scheduled ANALYZE

-- pg_cron or OS cron
-- 0 3 * * * psql -c "ANALYZE"
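If neither cron nor pg_cron is available, the application can schedule maintenance itself. `PeriodicTask` below is a hypothetical helper, not from the original post: pass a lambda that issues `ANALYZE orders` over its own connection. Interval handling is deliberately simplified:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Runs the given task on a fixed interval in a background thread.
// The destructor signals shutdown and waits for the worker to finish.
class PeriodicTask {
    std::atomic<bool> running_{true};
    std::thread worker_;
public:
    PeriodicTask(std::chrono::milliseconds interval, std::function<void()> task)
        : worker_([this, interval, task] {
              while (running_) {
                  task();                               // e.g. issue ANALYZE
                  std::this_thread::sleep_for(interval);
              }
          }) {}
    ~PeriodicTask() {
        running_ = false;   // worker exits after its current sleep
        worker_.join();
    }
};
```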

Prepared statement cache

class PreparedStatementCache {
    PGconn* conn_;
    std::unordered_map<std::string, std::string> stmts_;
public:
    explicit PreparedStatementCache(PGconn* conn) : conn_(conn) {}
    PGresult* execute(const std::string& name, const std::string& sql,
                      int n_params, const char* const* params) {
        if (stmts_.find(name) == stmts_.end()) {
            // PQprepare avoids building a "PREPARE ... AS" string by hand
            PGresult* prep = PQprepare(conn_, name.c_str(), sql.c_str(), n_params, nullptr);
            if (PQresultStatus(prep) != PGRES_COMMAND_OK) {
                PQclear(prep);
                return nullptr;  // caller checks for failure
            }
            PQclear(prep);
            stmts_[name] = sql;
        }
        return PQexecPrepared(conn_, name.c_str(), n_params, params, nullptr, nullptr, 0);
    }
};

COPY for bulk load

void bulk_insert_logs(PGconn* conn, const std::vector<Log>& logs) {
    PQexec(conn, "BEGIN");
    PGresult* res = PQexec(conn, "COPY logs (ts, msg, level) FROM STDIN WITH (FORMAT text)");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        PQclear(res);
        return;
    }
    PQclear(res);
    for (const auto& log : logs) {
        std::string row = log.ts + "\t" + log.msg + "\t" + log.level + "\n";
        PQputCopyData(conn, row.c_str(), static_cast<int>(row.size()));
    }
    PQputCopyEnd(conn, nullptr);
    PQgetResult(conn);
    PQexec(conn, "COMMIT");
}

Naive SQL-based routing

PGconn* acquire_for_query(PgConnectionPool* primary, PgConnectionPool* replica,
                          const std::string& sql) {
    return (sql.find("SELECT") == 0) ? replica->acquire() : primary->acquire();
}

9. Checklist

Indexes

  • Index WHERE columns
  • Index join keys
  • Consider ORDER BY in composite indexes
  • Confirm index use with EXPLAIN
  • Functional predicates → expression index or rewrite query

Queries

  • Prepared statements / binding
  • No N+1 (JOIN or batch)
  • Project only needed columns
  • LIMIT where appropriate

Statistics

  • ANALYZE after large changes
  • Monitor pg_stat_*

Connections

  • Pooling
  • RAII guards
  • No leaks

Security and ops

  • No string concatenation for values
  • Query timeouts
  • Slow query logs

10. Summary

| Topic | Takeaway |
|---|---|
| Indexes | B-tree on WHERE / JOIN / ORDER BY; composite order by selectivity |
| Plans | EXPLAIN ANALYZE to move seq scan → index scan |
| Statistics | ANALYZE for accurate estimates |
| Cost | Interpret cost, rows; lower random_page_cost on SSD |
| N+1 | JOIN or batch to cut round trips |
| Prepared | Less parse CPU; safer binding |
| Production | Pooling, replicas, batch insert, COPY, timeouts, monitoring |

One line: diagnose with scenarios, then apply index choice, plans, statistics, and cost awareness to turn multi-second queries into sub-100 ms where possible.


FAQ (expanded)

Q. When do I apply this at work?

A. When production queries exceed your latency budget—use scenarios, index rules, EXPLAIN, statistics, pitfalls, and patterns from this article with your own measurements.

Q. What should I read first?

A. Follow previous post links at the bottom of each article or use the C++ series index for order.

Q. Where can I go deeper?

A. cppreference and your driver/database docs. Official references below help too.

Next post: Database query optimization #51-8 — prepared statements, pooling, caching in depth.


References


  • C++ database query optimization #51-8
  • C++ database integration #31-3
  • C++ database engine #50-4

Keywords

C++, database, query optimization, index, query plan, EXPLAIN, PostgreSQL, SQLite

