[2026] C++ Query Optimization Guide | Index Selection, Plans, Statistics, Cost Model, Production Patterns [#49-3]

Key points

C++ query optimization: index selection, execution plans, statistics, cost models, and production patterns [#49-3], illustrated with real-world problems and scenarios.

Introduction

“Our query takes 3 seconds in production”

Why query optimization

You run a REST API server, but the user list endpoint takes more than 3 seconds. With only 100 concurrent users, CPU hits 90% and timeouts appear. Query optimization is the process of finding such bottlenecks and fixing them using indexes, execution plans, statistics, and cost models.

This post covers:

  • Problem scenarios: seven situations you see in real production
  • Index selection: single/composite/covering indexes, criteria, and full examples
  • Execution plans: EXPLAIN / EXPLAIN ANALYZE for full scans vs index scans
  • Statistics and cost models: ANALYZE, pg_stat, optimizer cost estimates
  • Common mistakes: 13 patterns, including N+1, full scans, unused indexes, and missing prepared statements
  • Best practices: index design, query style, monitoring
  • Production patterns: connection pooling, read/write split, batching

Related posts: Database query optimization #51-8, Database fundamentals.

A mental model

Treat query performance as a system of interlocking parts: schema and indexes, planner statistics and cost estimates, and application access patterns. A choice in one layer affects the others, so this article frames each fix with numbers and a concrete pattern.


Practical experience: this article is grounded in problems and fixes from large C++ projects. It includes pitfalls and debugging tips you rarely see in textbooks.

Table of contents

  1. Problem scenarios
  2. Index selection guide
  3. Execution plan analysis (EXPLAIN)
  4. Statistics and cost model
  5. End-to-end query optimization example
  6. Common mistakes and fixes
  7. Best practices
  8. Production patterns
  9. Checklist
  10. Summary

Core loop: index selection → verify plan → refresh statistics → remove N+1 → prepared statements. Following this order resolves most query bottlenecks.


1. Problem scenarios

Scenario 1: “The user list API takes 3 seconds”

Situation: GET /users takes 3.2 seconds for 1000 users. The cause is N+1: an extra query per user for orders.

// Bad: 1 + 1000 = 1001 queries
void get_users_with_orders(PGconn* conn) {
    PGresult* users = PQexec(conn, "SELECT id, name FROM users");
    for (int i = 0; i < PQntuples(users); ++i) {
        std::string id = PQgetvalue(users, i, 0);  // keep the parameter alive
        const char* params[] = {id.c_str()};
        PGresult* orders = PQexecParams(conn,
            "SELECT * FROM orders WHERE user_id = $1", 1, nullptr,
            params, nullptr, nullptr, 0);
        // ... handle ...
        PQclear(orders);
    }
    PQclear(users);
}

Cause: one query per user inside the loop. 1000 users × ~3 ms ≈ 3 seconds.

Fix: reduce to 1–2 queries with JOIN or batched IN.
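The batching half of the fix boils down to collapsing the per-user loop into one array parameter for `WHERE user_id = ANY($1::int[])`. A minimal sketch of the parameter builder (the helper name `build_id_array` is illustrative, not from the original post; section 5 shows the full query):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: format ids as a PostgreSQL array literal so that
// 1000 per-user queries collapse into a single round trip.
std::string build_id_array(const std::vector<int>& ids) {
    std::string out = "{";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) out += ",";
        out += std::to_string(ids[i]);
    }
    out += "}";
    return out;
}
// Pass the result as $1 to:
//   SELECT * FROM orders WHERE user_id = ANY($1::int[])
```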


Scenario 2: “We have an index but still get a full scan”

Situation: table orders has idx_orders_user_id, but EXPLAIN shows Seq Scan.

-- Query
SELECT * FROM orders WHERE LOWER(status) = 'pending';

Note: If collation rules change, expression indexes and results can diverge—validate during migrations.

Cause: applying a function like LOWER(status) prevents a plain index on status from being used; the optimizer cannot probe the index on raw status values.

Fix: remove the function from the predicate, or add a functional index.

-- Functional index (PostgreSQL)
CREATE INDEX idx_orders_status_lower ON orders (LOWER(status));

Scenario 3: “Stale statistics pick the wrong plan”

Situation: 10 million new rows were added, but the optimizer still estimates 1000 rows and picks nested loop join. Hash join would be faster.

Cause: pg_stat_user_tables statistics are stale; without ANALYZE, the optimizer uses outdated histograms.

Fix: run ANALYZE regularly; after large INSERT/UPDATE batches, run it immediately.

ANALYZE orders;
-- Or specific columns only
ANALYZE orders (user_id, created_at);

Scenario 4: “Composite index column order is wrong”

Situation: for WHERE user_id = ? AND created_at > ? you created idx_orders_created_user (created_at, user_id). The index is not used effectively.

Cause: composite indexes are used from the left. If created_at is first, predicates that only filter on user_id get limited benefit.

Fix: put the most selective column first—e.g. if user_id narrows 10M rows to ~10, put user_id first.

-- Correct order: user_id (high selectivity) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

Scenario 5: “Batch INSERT of 10k rows takes 30 seconds”

Situation: inserting 10k rows into a log table takes 30 seconds.

// Bad: 10k individual INSERTs
for (int i = 0; i < 10000; ++i) {
    PQexecParams(conn, "INSERT INTO logs (ts, msg) VALUES ($1, $2)", ...);
}

Cause: commit per row, WAL flush, index maintenance—10k round trips.

Fix: wrap in BEGIN/COMMIT, or use COPY.

// Batch INSERT
PQexec(conn, "BEGIN");
// e.g. insert in chunks of 100
PQexec(conn, "COMMIT");
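A sketch of the chunking step above, assuming a simple `LogRow` struct. Values are inlined here only to keep the example short; production code should bind parameters instead (see the SQL injection pitfall later in this article):

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct LogRow { std::string ts, msg; };

// Build one multi-row INSERT per chunk so 10k rows become ~100 statements
// inside a single BEGIN/COMMIT instead of 10k round trips.
std::string build_insert_chunk(const std::vector<LogRow>& rows,
                               std::size_t begin, std::size_t count) {
    std::string sql = "INSERT INTO logs (ts, msg) VALUES ";
    for (std::size_t i = 0; i < count; ++i) {
        const LogRow& r = rows[begin + i];
        if (i > 0) sql += ", ";
        sql += "('" + r.ts + "', '" + r.msg + "')";  // bind in real code
    }
    return sql;
}
```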

Scenario 6: “Same query 10k times—parse cost adds up”

Situation: login runs SELECT * FROM users WHERE email = $1 1000 times per second. CPU is high and pg_stat_activity shows heavy Parse work.

Cause: repeated parse and plan per request without prepared statements.

Fix: prepare once; bind parameters for each execution.

// Prepared statement (PostgreSQL)
// 1. PREPARE once
PQexec(conn, "PREPARE get_user (text) AS SELECT * FROM users WHERE email = $1");
// 2. EXECUTE with different bindings
const char* params[] = {"[email protected]"};
PQexecPrepared(conn, "get_user", 1, params, nullptr, nullptr, 0);

Scenario 7: “Large join query OOMs the server”

Situation: joining 1M users to 10M orders OOMs a 32 GB machine.

Cause: nested loop join materializes huge intermediates; exceeding work_mem spills to disk and slows everything.

Fix: inspect join strategy with EXPLAIN, refresh stats with ANALYZE to favor hash join; tune work_mem if needed.

-- Refresh stats to help hash join
ANALYZE users;
ANALYZE orders;
-- Session-local; use with care
SET work_mem = '256MB';

Scenario diagram

flowchart TB
    subgraph Problems[Query performance issues]
        P1[N+1 queries]
        P2[Full table scan]
        P3[Stale statistics]
        P4[Wrong index column order]
        P5[Per-row INSERT]
    end
    subgraph Solutions[Fixes]
        S1[JOIN / IN batch]
        S2[Add index / remove function]
        S3[ANALYZE]
        S4[Fix composite order]
        S5[Batch INSERT / COPY]
    end
    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5

2. Index selection guide

Why indexes matter

Without an index, WHERE user_id = 123 may read the entire table. With a B-tree index, lookup is O(log N) in a few milliseconds.

flowchart TB
    subgraph NoIndex[No index]
        N1[Row 1] --> N2[Row 2]
        N2 --> N3[Row 3]
        N3 --> N4[...]
        N4 --> N5[Row 1M]
        N5 --> N6["Full table scan: read 1M rows"]
    end
    subgraph WithIndex[With index]
        I1["B-tree index"] --> I2["log2(1M) ≈ 20 nodes"]
        I2 --> I3["Seek to matching rows"]
    end

Index selection cheat sheet

| Column usage | Index type | Example | Reason |
|---|---|---|---|
| WHERE equality | Single B-tree | idx_users_email | Exact match |
| WHERE range | Composite (equality first) | idx_orders_user_created | user_id eq + created_at range |
| JOIN keys | Both sides | orders.user_id, users.id | Join performance |
| ORDER BY | Composite | (user_id, created_at DESC) | Avoid sort |
| SELECT columns | Covering | INCLUDE (amount, status) | Avoid heap fetch |

Single-column indexes

-- Lookup user by email (equality)
CREATE INDEX idx_users_email ON users(email);
-- Sort by created_at
CREATE INDEX idx_users_created_at ON users(created_at);
// Create indexes from C++ (SQLite)
#include <sqlite3.h>
void create_indexes(sqlite3* db) {
    const char* sql = R"(
        CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
        CREATE INDEX IF NOT EXISTS idx_users_created ON users(created_at);
    )";
    char* err = nullptr;
    sqlite3_exec(db, sql, nullptr, nullptr, &err);
    if (err) {
        sqlite3_free(err);
    }
}

Composite indexes and column order

Rule: put the most selective column first; equality before range.

-- user_id returns ~10 rows out of 10M; created_at is a range
-- Order: user_id (equality, selective) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
-- Wrong: created_at first makes user_id-only filters inefficient
CREATE INDEX idx_orders_created_user ON orders(created_at, user_id);

Queryable patterns:

| Index (user_id, created_at) | Works? | Notes |
|---|---|---|
| WHERE user_id = ? | Yes | Left-prefix |
| WHERE user_id = ? AND created_at > ? | Yes | Both used |
| WHERE created_at > ? | No | Missing leading user_id |
| WHERE user_id = ? ORDER BY created_at DESC | Yes | Index order covers sort |

Covering indexes (index-only scan)

If all selected columns live in the index, PostgreSQL can satisfy the query from the index alone.

-- PostgreSQL: INCLUDE
CREATE INDEX idx_orders_user_cover ON orders(user_id) INCLUDE (amount, status);
-- SQLite: include columns in the key list
CREATE INDEX idx_orders_user_cover ON orders(user_id, amount, status);
-- Can become an index-only scan (with visibility map / ANALYZE as applicable)
SELECT user_id, amount, status FROM orders WHERE user_id = 123;

Partial indexes

Index only rows matching a predicate—smaller index, faster maintenance.

-- Only active orders
CREATE INDEX idx_orders_active ON orders(user_id) WHERE status = 'active';
-- Recent logs: PostgreSQL requires an IMMUTABLE predicate, so NOW() is not
-- allowed here; use a fixed cutoff and recreate the index periodically
CREATE INDEX idx_logs_recent ON logs(created_at) WHERE created_at > '2026-01-01';

When indexes cannot be used

-- Function on column — index not used (plain column index)
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Index-friendly
SELECT * FROM users WHERE email = '[email protected]';
-- Cast applied to the column (PostgreSQL) — index not used
SELECT * FROM users WHERE id::text = '123';  -- id is integer
-- Compare in the column's native type
SELECT * FROM users WHERE id = 123;
-- OR — may merge indexes but often inefficient
SELECT * FROM users WHERE email = '[email protected]' OR name = 'Alice';
-- Prefer UNION ALL with disjoint predicates
SELECT * FROM users WHERE email = '[email protected]'
UNION ALL
SELECT * FROM users WHERE name = 'Alice' AND email != '[email protected]';

3. Execution plan analysis (EXPLAIN)

What is an execution plan

An execution plan shows how the engine will run a query. EXPLAIN reveals full scans, index scans, and join order.

flowchart LR
    A[SQL query] --> B[Parser]
    B --> C[Optimizer]
    C --> D[Execution plan]
    D --> E[Index Scan]
    D --> F[Seq Scan]
    D --> G[Nested Loop]

PostgreSQL EXPLAIN ANALYZE

-- Actual timings (ANALYZE)
-- Buffer hits (BUFFERS)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 10;

Using an index (good):

Limit  (cost=0.42..8.44 rows=10 width=40) (actual time=0.05..0.08 rows=10 loops=1)
  ->  Index Scan using idx_orders_user_created on orders
        Index Cond: (user_id = 123)
        Buffers: shared hit=4
Planning Time: 0.12 ms
Execution Time: 0.15 ms

Full scan (bad):

Limit  (cost=0.00..20834.00 rows=10 width=40) (actual time=45.2..182.3 rows=10 loops=1)
  ->  Seq Scan on orders
        Filter: (user_id = 123)
        Rows Removed by Filter: 999990
        Buffers: shared read=8000
Planning Time: 0.08 ms
Execution Time: 182.5 ms

Plan terminology

| Term | Meaning | Action |
|---|---|---|
| Seq Scan | Full table scan | Add or fix index |
| Index Scan | Index + heap fetch | Usually good |
| Index Only Scan | Index only (covering) | Best when applicable |
| Bitmap Index Scan | Bitmap then heap | Common for large result sets |
| Nested Loop | Nested-loop join | Good for small outer |
| Hash Join | Hash join | Large equi-joins |
| Merge Join | Sort-merge join | Pre-sorted inputs |

Running EXPLAIN from C++

#include <libpq-fe.h>
#include <iostream>
#include <string>
void explain_query(PGconn* conn, const std::string& sql) {
    std::string explain_sql = "EXPLAIN (ANALYZE, BUFFERS) " + sql;
    PGresult* res = PQexec(conn, explain_sql.c_str());
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int i = 0; i < PQntuples(res); ++i) {
            const char* row = PQgetvalue(res, i, 0);
            if (row) std::cout << row << "\n";
        }
    }
    PQclear(res);
}

SQLite EXPLAIN QUERY PLAN

EXPLAIN QUERY PLAN
SELECT id, name FROM users WHERE email = '[email protected]';
-- With index
SEARCH users USING INDEX idx_users_email (email=?)
-- Without index
SCAN users

Plan review checklist

| Check | Good | Bad | Mitigation |
|---|---|---|---|
| Scan type | Index / index-only | Seq Scan | Add or change index |
| Rows removed by filter | ~0 | Very large | Index WHERE columns |
| Buffers shared hit | Mostly hit | Mostly read | Warm cache / smaller reads |
| actual time | Low ms | High ms | Revisit query/index |
| Planning time | Sub-ms to low ms | Very high | Stats, simpler query |

// SQLite EXPLAIN from C++
void explain_sqlite(sqlite3* db, const std::string& sql) {
    sqlite3_stmt* stmt = nullptr;
    std::string explain_sql = "EXPLAIN QUERY PLAN " + sql;
    sqlite3_prepare_v2(db, explain_sql.c_str(), -1, &stmt, nullptr);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        const char* detail = reinterpret_cast<const char*>(sqlite3_column_text(stmt, 3));
        if (detail) std::cout << detail << "\n";
    }
    sqlite3_finalize(stmt);
}

4. Statistics and cost model

How statistics affect plans

The optimizer estimates cost from table and index statistics. Stale stats yield wrong row estimates and bad joins.

flowchart TB
    A[pg_stat_user_tables] --> B[Optimizer]
    C[pg_stats] --> B
    B --> D[Cost estimate]
    D --> E[Plan choice]

Refresh with ANALYZE

-- Whole database
ANALYZE;
-- One table
ANALYZE orders;
-- Specific columns on huge tables
ANALYZE orders (user_id, created_at);
// Run ANALYZE from C++
void refresh_statistics(PGconn* conn) {
    PGresult* res = PQexec(conn, "ANALYZE orders");
    if (PQresultStatus(res) != PGRES_COMMAND_OK) {
        std::cerr << "ANALYZE failed: " << PQerrorMessage(conn) << "\n";
    }
    PQclear(res);
}

pg_stat catalogs

-- Table scan stats
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Per-index stats
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders';

| Column | Meaning |
|---|---|
| seq_scan | Sequential scan count |
| seq_tup_read | Rows read by seq scan |
| idx_scan | Index scan count |
| idx_tup_fetch | Rows fetched via index |

Monitoring full-scan-heavy tables

-- Tables with many seq scans (candidates for indexes)
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
       seq_scan::float / NULLIF(seq_scan + idx_scan, 0) AS seq_ratio
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 10;
-- Unused indexes (candidates to drop)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey';
// Log table stats from C++
void log_table_stats(PGconn* conn) {
    const char* sql = R"(
        SELECT relname, seq_scan, idx_scan, seq_tup_read, idx_tup_fetch
        FROM pg_stat_user_tables
        WHERE seq_scan > 1000 OR seq_tup_read > 100000
    )";
    PGresult* res = PQexec(conn, sql);
    for (int i = 0; i < PQntuples(res); ++i) {
        printf("Table: %s, seq_scan=%s, idx_scan=%s\n",
               PQgetvalue(res, i, 0), PQgetvalue(res, i, 1), PQgetvalue(res, i, 2));
    }
    PQclear(res);
}

Cost model (PostgreSQL)

How PostgreSQL estimates cost:

Sequential scan cost = (pages × seq_page_cost) + (rows × cpu_tuple_cost)
Index scan cost ≈ (index pages × random_page_cost) + (heap pages × random_page_cost) + (rows × cpu_tuple_cost)

Example:
orders: 10M rows, 125K pages

Seq Scan:
= 125,000 × 1.0 + 10,000,000 × 0.01
= 225,000 cost units

Index Scan (user_id = 123, 100 rows):
= (4 index pages × 4.0) + (100 heap pages × 4.0) + (100 × 0.01)
≈ 417 cost units

→ Optimizer prefers index scan

PostgreSQL compares abstract cost units. Defaults:

  • seq_page_cost = 1.0 — sequential page read
  • random_page_cost = 4.0 — random page (often 1.1–1.5 on SSD)
  • cpu_tuple_cost = 0.01 — per-row CPU
SHOW seq_page_cost;
SHOW random_page_cost;
-- SSD tuning example
ALTER SYSTEM SET random_page_cost = 1.1;

Reading a cost line

Limit  (cost=0.42..8.44 rows=10 width=40)
  • cost=0.42..8.44: startup cost .. total cost
  • rows=10: estimated rows
  • width=40: average row width in bytes

High total cost (e.g. cost=0.00..20834.00) often indicates a large sequential scan.


5. End-to-end query optimization example

Example: ten latest orders per user

Requirement: for 1000 users, fetch the ten most recent orders each, under 3 seconds total.

Step 1: Baseline

-- users: id, name, email, created_at (1M rows)
-- orders: id, user_id, amount, status, created_at (10M rows)
-- Naive approach (N+1)
SELECT * FROM users;
-- loop: SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 10

EXPLAIN:

  • users: seq scan (no helpful index)
  • orders: repeated seq scans without user_id index

Step 2: Index design

CREATE INDEX idx_users_email ON users(email);
-- Core: user_id equality + created_at order
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Step 3: Remove N+1 — batched ANY / window query

#include <libpq-fe.h>
#include <vector>
#include <string>
#include <unordered_map>
struct Order {
    int id, user_id, amount;
    std::string status;
};
std::unordered_map<int, std::vector<Order>> get_recent_orders_batch(
    PGconn* conn, const std::vector<int>& user_ids, int limit = 10)
{
    if (user_ids.empty()) return {};
    std::string sql = "SELECT user_id, id, amount, status FROM (";
    sql += "SELECT user_id, id, amount, status, "
           "ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn ";
    sql += "FROM orders WHERE user_id = ANY($1::int[])) sub ";
    sql += "WHERE rn <= " + std::to_string(limit);
    std::string array_str = "{";
    for (size_t i = 0; i < user_ids.size(); ++i) {
        if (i > 0) array_str += ",";
        array_str += std::to_string(user_ids[i]);
    }
    array_str += "}";
    const char* params[] = {array_str.c_str()};
    PGresult* res = PQexecParams(conn, sql.c_str(), 1, nullptr, params, nullptr, nullptr, 0);
    std::unordered_map<int, std::vector<Order>> result;
    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        for (int i = 0; i < PQntuples(res); ++i) {
            Order o;
            o.user_id = atoi(PQgetvalue(res, i, 0));
            o.id = atoi(PQgetvalue(res, i, 1));
            o.amount = atoi(PQgetvalue(res, i, 2));
            o.status = PQgetvalue(res, i, 3) ? PQgetvalue(res, i, 3) : "";
            result[o.user_id].push_back(o);
        }
    }
    PQclear(res);
    return result;
}

Step 4: Verify the plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, id, amount, status FROM (
    SELECT user_id, id, amount, status,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn
    FROM orders WHERE user_id = ANY(ARRAY[1,2,3,...,1000])
) sub WHERE rn <= 10;

Expect: Index Scan using idx_orders_user_created or Bitmap Heap Scan.

Step 5: Compare approaches

| Approach | Queries | Approx. time |
|---|---|---|
| N+1 (no index) | 1001 | 3200 ms |
| N+1 (with index) | 1001 | 800 ms |
| Batch ANY (with index) | 2 | 95 ms |

Step 6: Prepared statements for hot paths

Repeated execution benefits from one-time parse/plan; binding also prevents SQL injection.

// PostgreSQL: PQprepare + PQexecPrepared
#include <libpq-fe.h>
#include <stdexcept>
#include <string>
class PreparedQuery {
    PGconn* conn_;
    std::string name_;
    int n_params_;
public:
    PreparedQuery(PGconn* conn, const std::string& name,
                  const std::string& sql, int n_params)
        : conn_(conn), name_(name), n_params_(n_params)
    {
        PGresult* res = PQprepare(conn_, name_.c_str(), sql.c_str(), n_params_, nullptr);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
            PQclear(res);
            throw std::runtime_error(PQerrorMessage(conn_));
        }
        PQclear(res);
    }
    PGresult* execute(const char* const* params) {
        return PQexecPrepared(conn_, name_.c_str(), n_params_, params, nullptr, nullptr, 0);
    }
};
// SQLite: sqlite3_prepare_v2 + bind + step in a loop
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "SELECT id, name FROM users WHERE email = ?", -1, &stmt, nullptr);
for (const auto& email : emails) {
    sqlite3_bind_text(stmt, 1, email.c_str(), -1, SQLITE_TRANSIENT);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        // handle row
    }
    sqlite3_reset(stmt);
}
sqlite3_finalize(stmt);

Step 7: Single JOIN alternative

You can fetch users and ranked orders in one query.

WITH ranked AS (
    SELECT o.*, ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE u.created_at > '2025-01-01'
)
SELECT u.id, u.name, r.id AS order_id, r.amount, r.status
FROM users u
JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
ORDER BY u.id, r.rn;
// Process JOIN result in C++
std::vector<UserWithOrders> get_users_with_orders_join(PGconn* conn) {
    const char* sql = R"(
        WITH ranked AS (
            SELECT o.user_id, o.id AS order_id, o.amount, o.status,
                   ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
            FROM orders o JOIN users u ON o.user_id = u.id
        )
        SELECT u.id, u.name, r.order_id, r.amount, r.status
        FROM users u
        JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
        ORDER BY u.id, r.rn
    )";
    PGresult* res = PQexec(conn, sql);
    std::vector<UserWithOrders> result;
    int current_user_id = -1;
    UserWithOrders* current = nullptr;
    for (int i = 0; i < PQntuples(res); ++i) {
        int uid = atoi(PQgetvalue(res, i, 0));
        if (uid != current_user_id) {
            result.push_back({uid, PQgetvalue(res, i, 1), {}});
            current = &result.back();
            current_user_id = uid;
        }
        if (current)
            current->orders.push_back({atoi(PQgetvalue(res, i, 2)),
                                       atoi(PQgetvalue(res, i, 3)),
                                       PQgetvalue(res, i, 4)});
    }
    PQclear(res);
    return result;
}

Step 8: Plan comparison workflow

1. Run EXPLAIN (ANALYZE, BUFFERS) on the query
2. If Seq Scan → check Rows Removed by Filter; consider indexes
3. If Index Scan → high shared hit ratio means cache friendly
4. Compare cost vs actual time — cost is estimate, actual is truth
5. Nested loop vs hash join — depends on cardinality and stats
-- Compare plans before/after ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.id, u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

6. Common mistakes and fixes

1. N+1 queries

Symptom: list API > 3 seconds.
Cause: query inside loop.
Fix: JOIN or batched IN / ANY.

// Bad
for (auto& user : users) {
    auto orders = query_orders(conn, user.id);
}

2. Full table scan

Symptom: Seq Scan over millions of rows.
Cause: no index on WHERE columns.

CREATE INDEX idx_orders_user_id ON orders(user_id);

3. Functions prevent index use

Symptom: index exists but plan shows seq scan.

SELECT * FROM users WHERE email = '[email protected]';
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

4. Wrong composite order

Symptom: index (created_at, user_id) but query filters only user_id.
Fix: (user_id, created_at).

DROP INDEX idx_orders_created_user;
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

5. Connection leak

Symptom: too many connections.
Fix: RAII guards / pool release.

PgConnectionGuard guard(pool);
PGconn* conn = guard.get();
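`PgConnectionGuard` is not defined in this article; here is a minimal generic sketch of the same RAII pattern. A real pool would add locking, waiting, and health checks, and `Conn` would be `PGconn*`:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy pool: hands out connections and takes them back.
template <typename Conn>
class Pool {
    std::vector<Conn> free_;
public:
    explicit Pool(std::vector<Conn> conns) : free_(std::move(conns)) {}
    Conn acquire() { Conn c = free_.back(); free_.pop_back(); return c; }
    void release(Conn c) { free_.push_back(c); }
    std::size_t available() const { return free_.size(); }
};

// RAII guard: the destructor returns the connection to the pool,
// even when an exception unwinds the stack.
template <typename Conn>
class ConnectionGuard {
    Pool<Conn>& pool_;
    Conn conn_;
public:
    explicit ConnectionGuard(Pool<Conn>& p) : pool_(p), conn_(p.acquire()) {}
    ~ConnectionGuard() { pool_.release(conn_); }
    ConnectionGuard(const ConnectionGuard&) = delete;
    ConnectionGuard& operator=(const ConnectionGuard&) = delete;
    Conn get() const { return conn_; }
};
```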

6. SQL injection

Symptom: hostile input exfiltrates data.

// Never
std::string sql = "SELECT * FROM users WHERE id = " + user_input;
// Always bind
PQexecParams(conn, "SELECT * FROM users WHERE id = $1::int", 1, nullptr, params, nullptr, nullptr, 0);

7. Stale statistics

Symptom: wrong join after bulk load.
Fix: ANALYZE.


8. SELECT *

Symptom: wide rows, no covering index benefit.

SELECT id, amount, status FROM orders WHERE user_id = 123;

9. Too many indexes

Symptom: slow writes.
Fix: index only query patterns you run; drop unused indexes.


10. Cache not invalidated

Symptom: stale reads.
Fix: invalidate or update cache on writes.

void update_user(..., LruCache<int, User>& cache, ...) {
    cache.put(id, new_user);
}
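The `LruCache` in the snippet is assumed from an earlier post in this series; a minimal sketch of a put/get/erase cache that supports both update-on-write (`put`) and invalidate-on-write (`erase`):

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Minimal LRU cache: front of the list is most recently used.
template <typename K, typename V>
class LruCache {
    std::size_t cap_;
    std::list<std::pair<K, V>> items_;
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
public:
    explicit LruCache(std::size_t cap) : cap_(cap) {}
    void put(const K& k, const V& v) {      // update-on-write
        erase(k);
        items_.emplace_front(k, v);
        index_[k] = items_.begin();
        if (items_.size() > cap_) {         // evict least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
    std::optional<V> get(const K& k) {
        auto it = index_.find(k);
        if (it == index_.end()) return std::nullopt;
        items_.splice(items_.begin(), items_, it->second);  // mark as recent
        return it->second->second;
    }
    void erase(const K& k) {                // invalidate-on-write
        auto it = index_.find(k);
        if (it == index_.end()) return;
        items_.erase(it->second);
        index_.erase(it);
    }
};
```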

11. No prepared statements

Symptom: high CPU on repeated queries.
Fix: PQprepare / PQexecPrepared or long-lived sqlite3_stmt*.


12. Huge IN lists

Symptom: thousands of literals—bad plans or errors.
Fix: chunk (e.g. 500–1000) or temp table + JOIN.

const size_t BATCH = 500;
for (size_t i = 0; i < ids.size(); i += BATCH) {
    auto end = std::min(i + BATCH, ids.size());
    query_batch(conn, ids.data() + i, end - i);  // app-level helper per chunk
}

13. Batch INSERT without a transaction

Symptom: 10k inserts in 30+ seconds.
Fix: BEGIN/COMMIT or COPY.

PQexec(conn, "BEGIN");
for (const auto& row : rows) {
    PQexecParams(conn, "INSERT INTO logs (...) VALUES ($1, $2)", ...);
}
PQexec(conn, "COMMIT");

7. Best practices

Index design

  • Index WHERE, JOIN, and ORDER BY columns as needed
  • Composite: most selective column first
  • Use covering / INCLUDE for index-only scans when worthwhile
  • Drop unused indexes

Query style

  • Avoid SELECT *; project needed columns
  • Use parameter binding and prepared statements
  • Avoid N+1 (JOIN, batching)
  • Use LIMIT on large result sets

Statistics

  • ANALYZE after bulk changes; schedule periodic analyze
  • Watch pg_stat_user_tables

Plans

  • After index changes, run EXPLAIN ANALYZE
  • Confirm seq → index transition
  • Compare estimated vs actual rows

Connections

  • Use a pool (typical sizes 10–20)
  • RAII for acquire/release

Monitoring

  • Log slow queries (e.g. > 100 ms)
  • Inspect pg_stat_activity
  • Set statement timeouts

Index decision flow

flowchart TD
    A[Analyze query] --> B{WHERE shape?}
    B -->|Single equality| C[Single B-tree]
    B -->|Equality + range| D[Composite: equality first]
    B -->|Range only| E[Single or partial index]
    C --> F[Most selective column]
    D --> F
    E --> F
    F --> G{Can cover SELECT?}
    G -->|Yes| H[Covering / INCLUDE]
    G -->|No| I[Non-covering index]

N+1 alternatives

| Pattern | When | Pros | Cons |
|---|---|---|---|
| JOIN | 1:N, one round trip | Single query | Wider rows, grouping in app |
| IN batch | Many IDs | Flexible | Limit IN size |
| Lazy load | Rare access | Fast first paint | Easy to N+1 |
| Lazy + cache | Hot reads | Fast on hit | Invalidation complexity |

8. Production patterns

Pool + prepared statements

struct DbContext {
    PgConnectionPool& pool;
    PgConnectionGuard guard;
    PGconn* conn;
    PreparedStatementCache& stmt_cache;
    explicit DbContext(PgConnectionPool& p, PreparedStatementCache& c)
        : pool(p), guard(p), conn(guard.get()), stmt_cache(c) {}
};

Read/write split

flowchart TB
    App[App] --> Router[Router]
    Router -->|SELECT| Replica[Replica]
    Router -->|INSERT/UPDATE| Primary[Primary]
    Primary -.->|Replication| Replica
class ReadWriteSplit {
    PgConnectionPool* primary_;
    PgConnectionPool* replica_;
public:
    PGconn* acquire_read() { return replica_->acquire(); }
    PGconn* acquire_write() { return primary_->acquire(); }
};

Batched INSERT

PQexec(conn, "BEGIN");
for (size_t i = 0; i < batch.size(); i += 100) {
    // batch_insert: app-level helper that builds one multi-row INSERT per chunk
    batch_insert(conn, batch, i, std::min(i + 100, batch.size()));
}
PQexec(conn, "COMMIT");

Statement timeout

PQexec(conn, "SET statement_timeout = 5000");  // 5s

Slow query logging

auto start = std::chrono::steady_clock::now();
PGresult* res = PQexecParams(conn, sql.c_str(), ...);
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
    std::chrono::steady_clock::now() - start).count();
if (elapsed > 100) {
    log_warn("Slow query: {}ms - {}", elapsed, sql);
}

Scheduled ANALYZE

-- pg_cron or OS cron
-- 0 3 * * * psql -c "ANALYZE"
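If neither cron nor pg_cron is available, the application can schedule maintenance itself. `PeriodicTask` below is a hypothetical helper, not from the original post: pass a lambda that issues `ANALYZE orders` over its own connection. Interval handling is deliberately simplified:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Runs the given task on a fixed interval in a background thread.
// The destructor signals shutdown and waits for the worker to finish.
class PeriodicTask {
    std::atomic<bool> running_{true};
    std::thread worker_;
public:
    PeriodicTask(std::chrono::milliseconds interval, std::function<void()> task)
        : worker_([this, interval, task] {
              while (running_) {
                  task();                               // e.g. issue ANALYZE
                  std::this_thread::sleep_for(interval);
              }
          }) {}
    ~PeriodicTask() {
        running_ = false;   // worker exits after its current sleep
        worker_.join();
    }
};
```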

Prepared statement cache

class PreparedStatementCache {
    PGconn* conn_;
    std::unordered_map<std::string, std::string> stmts_;
public:
    explicit PreparedStatementCache(PGconn* conn) : conn_(conn) {}
    PGresult* execute(const std::string& name, const std::string& sql,
                      int n_params, const char* const* params) {
        if (stmts_.find(name) == stmts_.end()) {
            // PQprepare avoids building a "PREPARE ... AS" string by hand
            PGresult* prep = PQprepare(conn_, name.c_str(), sql.c_str(), n_params, nullptr);
            if (PQresultStatus(prep) != PGRES_COMMAND_OK) {
                PQclear(prep);
                return nullptr;  // caller checks for failure
            }
            PQclear(prep);
            stmts_[name] = sql;
        }
        return PQexecPrepared(conn_, name.c_str(), n_params, params, nullptr, nullptr, 0);
    }
};

COPY for bulk load

void bulk_insert_logs(PGconn* conn, const std::vector<Log>& logs) {
    PQexec(conn, "BEGIN");
    PGresult* res = PQexec(conn, "COPY logs (ts, msg, level) FROM STDIN WITH (FORMAT text)");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        PQclear(res);
        return;
    }
    PQclear(res);
    for (const auto& log : logs) {
        std::string row = log.ts + "\t" + log.msg + "\t" + log.level + "\n";
        PQputCopyData(conn, row.c_str(), static_cast<int>(row.size()));
    }
    PQputCopyEnd(conn, nullptr);
    PQgetResult(conn);
    PQexec(conn, "COMMIT");
}

Naive SQL-based routing

PGconn* acquire_for_query(PgConnectionPool* primary, PgConnectionPool* replica,
                          const std::string& sql) {
    return (sql.find("SELECT") == 0) ? replica->acquire() : primary->acquire();
}

9. Checklist

Indexes

  • Index WHERE columns
  • Index join keys
  • Consider ORDER BY in composite indexes
  • Confirm index use with EXPLAIN
  • Functional predicates → expression index or rewrite query

Queries

  • Prepared statements / binding
  • No N+1 (JOIN or batch)
  • Project only needed columns
  • LIMIT where appropriate

Statistics

  • ANALYZE after large changes
  • Monitor pg_stat_*

Connections

  • Pooling
  • RAII guards
  • No leaks

Security and ops

  • No string concatenation for values
  • Query timeouts
  • Slow query logs

10. Summary

| Topic | Takeaway |
|---|---|
| Indexes | B-tree on WHERE / JOIN / ORDER BY; composite order by selectivity |
| Plans | EXPLAIN ANALYZE to move seq scan → index scan |
| Statistics | ANALYZE for accurate estimates |
| Cost | Interpret cost, rows; lower random_page_cost on SSD |
| N+1 | JOIN or batch to cut round trips |
| Prepared | Less parse CPU; safer binding |
| Production | Pooling, replicas, batch insert, COPY, timeouts, monitoring |

One line: diagnose with scenarios, then apply index choice, plans, statistics, and cost awareness to turn multi-second queries into sub-100 ms where possible.


FAQ (expanded)

Q. When do I apply this at work?

A. When production queries exceed your latency budget—use scenarios, index rules, EXPLAIN, statistics, pitfalls, and patterns from this article with your own measurements.

Q. What should I read first?

A. Follow previous post links at the bottom of each article or use the C++ series index for order.

Q. Where can I go deeper?

A. cppreference and your driver/database docs. Official references below help too.

Next post: Database query optimization #51-8 — prepared statements, pooling, caching in depth.


References


  • C++ database query optimization #51-8
  • C++ database integration #31-3
  • C++ database engine #50-4

Keywords

C++, database, query optimization, index, query plan, EXPLAIN, PostgreSQL, SQLite

