[2026] C++ Query Optimization Guide | Index Selection, Plans, Statistics, Cost Model, Production Patterns [#49-3]
Key points of this article
C++ query optimization: index selection, execution plans, statistics, cost models, and production patterns [#49-3]. Real-world problems and scenarios.
Introduction
“Our query takes 3 seconds in production”
Why query optimization
You run a REST API server, but the user list endpoint takes more than 3 seconds. With only 100 concurrent users, CPU hits 90% and timeouts appear. Query optimization is the process of finding such bottlenecks and fixing them using indexes, execution plans, statistics, and cost models.
This post covers:
- Problem scenarios: seven situations you see in real production
- Index selection: single/composite/covering indexes, criteria, and full examples
- Execution plans: EXPLAIN / EXPLAIN ANALYZE for full scans vs index scans
- Statistics and cost models: ANALYZE, pg_stat, optimizer cost estimates
- Common mistakes: N+1, full scans, unused indexes, prepared statements, and 13 patterns total
- Best practices: index design, query style, monitoring
- Production patterns: connection pooling, read/write split, batching
Related posts: Database query optimization #51-8, Database fundamentals.
A mental model
Treat query performance as a system of interlocking parts: schema and indexes, the optimizer's statistics, and your application's access patterns. A choice in one layer shifts costs in the others, so this article frames each tradeoff with concrete numbers and plans.
Practical experience: this article is grounded in problems and fixes from large C++ projects. It includes pitfalls and debugging tips you rarely see in textbooks.
Table of contents
- Problem scenarios
- Index selection guide
- Execution plan analysis (EXPLAIN)
- Statistics and cost model
- End-to-end query optimization example
- Common mistakes and fixes
- Best practices
- Production patterns
- Checklist
- Summary
Core loop: index selection → verify plan → refresh statistics → remove N+1 → prepared statements. Following this order resolves most query bottlenecks.
1. Problem scenarios
Scenario 1: “The user list API takes 3 seconds”
Situation: GET /users takes 3.2 seconds for 1000 users. The cause is N+1: an extra query per user for orders.
// Bad: 1 + 1000 = 1001 queries
void get_users_with_orders(PGconn* conn) {
    PGresult* users = PQexec(conn, "SELECT id, name FROM users");
    for (int i = 0; i < PQntuples(users); ++i) {
        std::string id = PQgetvalue(users, i, 0);
        const char* params[] = {id.c_str()};  // keep the string alive for the call
        PGresult* orders = PQexecParams(conn,
            "SELECT * FROM orders WHERE user_id = $1", 1, nullptr,
            params, nullptr, nullptr, 0);
        // ... handle ...
        PQclear(orders);
    }
    PQclear(users);
}
Cause: one query per user inside the loop. 1000 users × ~3 ms ≈ 3 seconds.
Fix: reduce to 1–2 queries with JOIN or batched IN.
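The batched half of that fix needs the collected IDs as a single parameter. A minimal helper (name is ours) that renders them as a PostgreSQL array literal for a `WHERE user_id = ANY($1::int[])` query; Section 5 shows the full batched query built on this idea:

```cpp
#include <string>
#include <vector>

// Render IDs as a PostgreSQL array literal ("{1,2,3}") so one
// "SELECT ... WHERE user_id = ANY($1::int[])" replaces the per-user loop.
std::string make_pg_int_array(const std::vector<int>& ids) {
    std::string out = "{";
    for (size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) out += ",";
        out += std::to_string(ids[i]);
    }
    out += "}";
    return out;
}
```

Pass the returned string as the single text-format parameter to PQexecParams.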
Scenario 2: “We have an index but still get a full scan”
Situation: table orders has idx_orders_user_id, but EXPLAIN shows Seq Scan.
-- Query
SELECT * FROM orders WHERE LOWER(status) = 'pending';
Note: If collation rules change, expression indexes and results can diverge—validate during migrations.
Cause: applying a function like LOWER(status) prevents a plain index on status from being used; the optimizer cannot probe the index on raw status values.
Fix: remove the function from the predicate, or add a functional index.
-- Functional index (PostgreSQL)
CREATE INDEX idx_orders_status_lower ON orders (LOWER(status));
Scenario 3: “Stale statistics pick the wrong plan”
Situation: 10 million new rows were added, but the optimizer still estimates 1000 rows and picks a nested loop join where a hash join would be faster.
Cause: pg_stat_user_tables statistics are stale; without ANALYZE, the optimizer uses outdated histograms.
Fix: run ANALYZE regularly; after large INSERT/UPDATE batches, run it immediately.
ANALYZE orders;
-- Or specific columns only
ANALYZE orders (user_id, created_at);
Scenario 4: “Composite index column order is wrong”
Situation: for WHERE user_id = ? AND created_at > ? you created idx_orders_created_user (created_at, user_id). The index is not used effectively.
Cause: composite indexes are used from the left. If created_at is first, predicates that only filter on user_id get limited benefit.
Fix: put the most selective column first—e.g. if user_id narrows 10M rows to ~10, put user_id first.
-- Correct order: user_id (high selectivity) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
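The "10M rows to ~10" figure reflects the planner's default equality selectivity: with no better statistics it assumes roughly total_rows / n_distinct matching rows. A toy version of that estimate (simplified; real planners also consult most-common-value lists and histograms):

```cpp
// Simplified equality-selectivity estimate for "col = const":
// rows_out ≈ total_rows / n_distinct when no MCV list applies.
long long estimated_rows(long long total_rows, long long n_distinct) {
    if (n_distinct <= 0) return total_rows;  // unknown: assume worst case
    return total_rows / n_distinct;
}
```

With 10M orders and 1M distinct user_id values this yields 10, which is why user_id leads the composite index.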
Scenario 5: “Batch INSERT of 10k rows takes 30 seconds”
Situation: inserting 10k rows into a log table takes 30 seconds.
// Bad: 10k individual INSERTs
for (int i = 0; i < 10000; ++i) {
PQexecParams(conn, "INSERT INTO logs (ts, msg) VALUES ($1, $2)", ...);
}
Cause: commit per row, WAL flush, index maintenance—10k round trips.
Fix: wrap in BEGIN/COMMIT, or use COPY.
// Batch INSERT
PQexec(conn, "BEGIN");
// e.g. insert in chunks of 100
PQexec(conn, "COMMIT");
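One way to fill in the "insert in chunks" step, assuming the two-column logs table from the scenario: build a multi-row VALUES statement so each round trip carries a whole chunk (COPY, shown later, is faster still):

```cpp
#include <string>

// Build "INSERT INTO logs (ts, msg) VALUES ($1,$2),($3,$4),..." for n rows;
// one statement per chunk instead of one per row.
std::string make_multirow_insert(int n_rows) {
    std::string sql = "INSERT INTO logs (ts, msg) VALUES ";
    int p = 1;  // next placeholder number
    for (int i = 0; i < n_rows; ++i) {
        if (i > 0) sql += ",";
        sql += "($" + std::to_string(p) + ",$" + std::to_string(p + 1) + ")";
        p += 2;
    }
    return sql;
}
```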
Scenario 6: “Same query 10k times—parse cost adds up”
Situation: login runs SELECT * FROM users WHERE email = $1 1000 times per second. CPU is high and pg_stat_activity shows heavy Parse work.
Cause: repeated parse and plan per request without prepared statements.
Fix: prepare once; bind parameters for each execution.
// Prepared statement (PostgreSQL)
// 1. PREPARE once
PQexec(conn, "PREPARE get_user (text) AS SELECT * FROM users WHERE email = $1");
// 2. EXECUTE with different bindings
const char* params[] = {"[email protected]"};
PQexecPrepared(conn, "get_user", 1, params, nullptr, nullptr, 0);
Scenario 7: “Large join query OOMs the server”
Situation: joining 1M users to 10M orders OOMs a 32 GB machine.
Cause: nested loop join materializes huge intermediates; exceeding work_mem spills to disk and slows everything.
Fix: inspect join strategy with EXPLAIN, refresh stats with ANALYZE to favor hash join; tune work_mem if needed.
-- Refresh stats to help hash join
ANALYZE users;
ANALYZE orders;
-- Session-local; use with care
SET work_mem = '256MB';
Scenario diagram
flowchart TB
subgraph Problems[Query performance issues]
P1[N+1 queries]
P2[Full table scan]
P3[Stale statistics]
P4[Wrong index column order]
P5[Per-row INSERT]
end
subgraph Solutions[Fixes]
S1[JOIN / IN batch]
S2[Add index / remove function]
S3[ANALYZE]
S4[Fix composite order]
S5[Batch INSERT / COPY]
end
P1 --> S1
P2 --> S2
P3 --> S3
P4 --> S4
P5 --> S5
2. Index selection guide
Why indexes matter
Without an index, WHERE user_id = 123 may read the entire table. With a B-tree index, lookup is O(log N) in a few milliseconds.
flowchart TB
subgraph NoIndex[No index]
N1[Row 1] --> N2[Row 2]
N2 --> N3[Row 3]
N3 --> N4[...]
N4 --> N5[Row 1M]
N5 --> N6["Full table scan: read 1M rows"]
end
subgraph WithIndex[With index]
I1["B-tree index"] --> I2["log2(1M) ≈ 20 nodes"]
I2 --> I3["Seek to matching rows"]
end
Index selection cheat sheet
| Column usage | Index type | Example | Reason |
|---|---|---|---|
| WHERE equality | Single B-tree | idx_users_email | Exact match |
| WHERE range | Composite (equality first) | idx_orders_user_created | user_id eq + created_at range |
| JOIN keys | Both sides | orders.user_id, users.id | Join performance |
| ORDER BY | Composite | (user_id, created_at DESC) | Avoid sort |
| SELECT columns | Covering | INCLUDE (amount, status) | Avoid heap fetch |
Single-column indexes
-- Lookup user by email (equality)
CREATE INDEX idx_users_email ON users(email);
-- Sort by created_at
CREATE INDEX idx_users_created_at ON users(created_at);
// Create indexes from C++ (SQLite)
#include <sqlite3.h>
void create_indexes(sqlite3* db) {
const char* sql = R"(
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
CREATE INDEX IF NOT EXISTS idx_users_created ON users(created_at);
)";
char* err = nullptr;
sqlite3_exec(db, sql, nullptr, nullptr, &err);
if (err) {
sqlite3_free(err);
}
}
Composite indexes and column order
Rule: put the most selective column first; equality before range.
-- user_id returns ~10 rows out of 10M; created_at is a range
-- Order: user_id (equality, selective) → created_at (range)
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
-- Wrong: created_at first makes user_id-only filters inefficient
CREATE INDEX idx_orders_created_user ON orders(created_at, user_id);
Queryable patterns:
| Index (user_id, created_at) | Works? | Notes |
|---|---|---|
| WHERE user_id = ? | Yes | Left-prefix |
| WHERE user_id = ? AND created_at > ? | Yes | Both used |
| WHERE created_at > ? | No | Missing leading user_id |
| WHERE user_id = ? ORDER BY created_at DESC | Yes | Index order covers sort |
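The left-prefix behavior mirrors lexicographic sorting. A sketch using std::lower_bound/upper_bound over (user_id, created_at) pairs: fixing the leading column maps to one contiguous range, while created_at alone has no usable order:

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// A composite index behaves like entries sorted by (user_id, created_at):
// fixing the leading column yields a contiguous range to scan.
using Key = std::pair<int, int>;  // (user_id, created_at)

std::pair<std::size_t, std::size_t>
range_for_user(const std::vector<Key>& index, int user_id) {
    auto lo = std::lower_bound(index.begin(), index.end(), Key{user_id, INT_MIN});
    auto hi = std::upper_bound(index.begin(), index.end(), Key{user_id, INT_MAX});
    return {static_cast<std::size_t>(lo - index.begin()),
            static_cast<std::size_t>(hi - index.begin())};
}
```

Filtering on created_at alone would have to visit every entry, which is why (created_at, user_id) serves user_id-only queries poorly.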
Covering indexes (index-only scan)
If all selected columns live in the index, PostgreSQL can satisfy the query from the index alone.
-- PostgreSQL: INCLUDE
CREATE INDEX idx_orders_user_cover ON orders(user_id) INCLUDE (amount, status);
-- SQLite: include columns in the key list
CREATE INDEX idx_orders_user_cover ON orders(user_id, amount, status);
-- Can become an index-only scan (with visibility map / ANALYZE as applicable)
SELECT user_id, amount, status FROM orders WHERE user_id = 123;
Partial indexes
Index only rows matching a predicate—smaller index, faster maintenance.
-- Only active orders
CREATE INDEX idx_orders_active ON orders(user_id) WHERE status = 'active';
-- Recent logs: partial-index predicates must be immutable, so NOW() is not allowed;
-- use a fixed cutoff and recreate the index periodically (or partition for a rolling window)
CREATE INDEX idx_logs_recent ON logs(created_at) WHERE created_at > '2026-01-01';
When indexes cannot be used
-- Function on column — index not used (plain column index)
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Index-friendly
SELECT * FROM users WHERE email = '[email protected]';
-- Cast applied to the column (PostgreSQL) defeats the index
SELECT * FROM users WHERE id::text = '123'; -- id is integer
-- Compare in the column's native type
SELECT * FROM users WHERE id = 123;
-- OR — may merge indexes but often inefficient
SELECT * FROM users WHERE email = '[email protected]' OR name = 'Alice';
-- Prefer UNION ALL with disjoint predicates
SELECT * FROM users WHERE email = '[email protected]'
UNION ALL
SELECT * FROM users WHERE name = 'Alice' AND email != '[email protected]';
3. Execution plan analysis (EXPLAIN)
What is an execution plan
The engine shows how it will run the query. EXPLAIN reveals full scans, index scans, and join order.
flowchart LR
A[SQL query] --> B[Parser]
B --> C[Optimizer]
C --> D[Execution plan]
D --> E[Index Scan]
D --> F[Seq Scan]
D --> G[Nested Loop]
PostgreSQL EXPLAIN ANALYZE
-- Actual timings (ANALYZE)
-- Buffer hits (BUFFERS)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 10;
Using an index (good):
Limit (cost=0.42..8.44 rows=10 width=40) (actual time=0.05..0.08 rows=10 loops=1)
-> Index Scan using idx_orders_user_created on orders
Index Cond: (user_id = 123)
Buffers: shared hit=4
Planning Time: 0.12 ms
Execution Time: 0.15 ms
Full scan (bad):
Limit (cost=0.00..20834.00 rows=10 width=40) (actual time=45.2..182.3 rows=10 loops=1)
-> Seq Scan on orders
Filter: (user_id = 123)
Rows Removed by Filter: 999990
Buffers: shared read=8000
Planning Time: 0.08 ms
Execution Time: 182.5 ms
Plan terminology
| Term | Meaning | Action |
|---|---|---|
| Seq Scan | Full table scan | Add or fix index |
| Index Scan | Index + heap fetch | Usually good |
| Index Only Scan | Index only (covering) | Best when applicable |
| Bitmap Index Scan | Bitmap then heap | Common for large result sets |
| Nested Loop | Nested-loop join | Good for small outer |
| Hash Join | Hash join | Large equi-joins |
| Merge Join | Sort-merge join | Pre-sorted inputs |
Running EXPLAIN from C++
#include <libpq-fe.h>
#include <iostream>
#include <string>
void explain_query(PGconn* conn, const std::string& sql) {
std::string explain_sql = "EXPLAIN (ANALYZE, BUFFERS) " + sql;
PGresult* res = PQexec(conn, explain_sql.c_str());
if (PQresultStatus(res) == PGRES_TUPLES_OK) {
for (int i = 0; i < PQntuples(res); ++i) {
const char* row = PQgetvalue(res, i, 0);
if (row) std::cout << row << "\n";
}
}
PQclear(res);
}
SQLite EXPLAIN QUERY PLAN
EXPLAIN QUERY PLAN
SELECT id, name FROM users WHERE email = '[email protected]';
-- With index
SEARCH users USING INDEX idx_users_email (email=?)
-- Without index
SCAN users
Plan review checklist
| Check | Good | Bad | Mitigation |
|---|---|---|---|
| Scan type | Index / index-only | Seq Scan | Add or change index |
| Rows removed by filter | ~0 | Very large | Index WHERE columns |
| Buffers shared hit | Mostly hit | Mostly read | Warm cache / smaller reads |
| actual time | Low ms | High ms | Revisit query/index |
| Planning time | Sub-ms to low ms | Very high | Stats, simpler query |
// SQLite EXPLAIN from C++
void explain_sqlite(sqlite3* db, const std::string& sql) {
sqlite3_stmt* stmt = nullptr;
std::string explain_sql = "EXPLAIN QUERY PLAN " + sql;
sqlite3_prepare_v2(db, explain_sql.c_str(), -1, &stmt, nullptr);
while (sqlite3_step(stmt) == SQLITE_ROW) {
const char* detail = reinterpret_cast<const char*>(sqlite3_column_text(stmt, 3));
if (detail) std::cout << detail << "\n";
}
sqlite3_finalize(stmt);
}
4. Statistics and cost model
How statistics affect plans
The optimizer estimates cost from table and index statistics. Stale stats yield wrong row estimates and bad joins.
flowchart TB
A[pg_stat_user_tables] --> B[Optimizer]
C[pg_stats] --> B
B --> D[Cost estimate]
D --> E[Plan choice]
Refresh with ANALYZE
-- Whole database
ANALYZE;
-- One table
ANALYZE orders;
-- Specific columns on huge tables
ANALYZE orders (user_id, created_at);
// Run ANALYZE from C++
void refresh_statistics(PGconn* conn) {
PGresult* res = PQexec(conn, "ANALYZE orders");
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
std::cerr << "ANALYZE failed: " << PQerrorMessage(conn) << "\n";
}
PQclear(res);
}
pg_stat catalogs
-- Table scan stats
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Per-index stats
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders';
| Column | Meaning |
|---|---|
| seq_scan | Sequential scan count |
| seq_tup_read | Rows read by seq scan |
| idx_scan | Index scan count |
| idx_tup_fetch | Rows fetched via index |
Monitoring full-scan-heavy tables
-- Tables with many seq scans (candidates for indexes)
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
seq_scan::float / NULLIF(seq_scan + idx_scan, 0) AS seq_ratio
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 10;
-- Unused indexes (candidates to drop)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey';
// Log table stats from C++
void log_table_stats(PGconn* conn) {
const char* sql = R"(
SELECT relname, seq_scan, idx_scan, seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 1000 OR seq_tup_read > 100000
)";
PGresult* res = PQexec(conn, sql);
for (int i = 0; i < PQntuples(res); ++i) {
printf("Table: %s, seq_scan=%s, idx_scan=%s\n",
PQgetvalue(res, i, 0), PQgetvalue(res, i, 1), PQgetvalue(res, i, 2));
}
PQclear(res);
}
Cost model (PostgreSQL)
How PostgreSQL estimates cost:
Sequential scan cost = (pages × seq_page_cost) + (rows × cpu_tuple_cost)
Index scan cost ≈ (index pages × random_page_cost) + (heap pages × random_page_cost) + (rows × cpu_tuple_cost)
Example:
orders: 10M rows, 125K pages
Seq Scan:
= 125,000 × 1.0 + 10,000,000 × 0.01
= 225,000 cost units
Index Scan (user_id = 123, 100 rows):
= (4 index pages × 4.0) + (100 heap pages × 4.0) + (100 × 0.01)
≈ 417 cost units
→ Optimizer prefers index scan
PostgreSQL compares abstract cost units. Defaults:
- seq_page_cost = 1.0: sequential page read
- random_page_cost = 4.0: random page (often 1.1–1.5 on SSD)
- cpu_tuple_cost = 0.01: per-row CPU
SHOW seq_page_cost;
SHOW random_page_cost;
-- SSD tuning example
ALTER SYSTEM SET random_page_cost = 1.1;
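The worked example above, restated as a few lines of C++ with the default constants (a simplification; the real planner also charges cpu_index_tuple_cost and per-operator costs):

```cpp
// Simplified PostgreSQL-style cost formulas with default planner constants.
double seq_scan_cost(double pages, double rows) {
    const double seq_page_cost = 1.0, cpu_tuple_cost = 0.01;
    return pages * seq_page_cost + rows * cpu_tuple_cost;
}

double index_scan_cost(double index_pages, double heap_pages, double rows) {
    const double random_page_cost = 4.0, cpu_tuple_cost = 0.01;
    return (index_pages + heap_pages) * random_page_cost + rows * cpu_tuple_cost;
}
// seq_scan_cost(125000, 10000000)  ~ 225000 cost units
// index_scan_cost(4, 100, 100)     ~ 417 cost units
```

Lowering random_page_cost (as in the SSD tuning above) narrows the gap the optimizer sees between random and sequential page reads.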
Reading a cost line
Limit (cost=0.42..8.44 rows=10 width=40)
- cost=0.42..8.44: startup cost .. total cost
- rows=10: estimated rows
- width=40: average row width in bytes
High total cost (e.g. cost=0.00..20834.00) often indicates a large sequential scan.
5. End-to-end query optimization example
Example: ten latest orders per user
Requirement: for 1000 users, fetch the ten most recent orders each, under 3 seconds total.
Step 1: Baseline
-- users: id, name, email, created_at (1M rows)
-- orders: id, user_id, amount, status, created_at (10M rows)
-- Naive approach (N+1)
SELECT * FROM users;
-- loop: SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 10
EXPLAIN:
- users: seq scan (no helpful index)
- orders: repeated seq scans without a user_id index
Step 2: Index design
CREATE INDEX idx_users_email ON users(email);
-- Core: user_id equality + created_at order
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
Step 3: Remove N+1 — batched ANY / window query
#include <libpq-fe.h>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <vector>
struct Order {
int id, user_id, amount;
std::string status;
};
std::unordered_map<int, std::vector<Order>> get_recent_orders_batch(
PGconn* conn, const std::vector<int>& user_ids, int limit = 10)
{
if (user_ids.empty()) return {};
std::string sql = "SELECT user_id, id, amount, status FROM (";
sql += "SELECT user_id, id, amount, status, "
"ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn ";
sql += "FROM orders WHERE user_id = ANY($1::int[])) sub ";
sql += "WHERE rn <= " + std::to_string(limit);
std::string array_str = "{";
for (size_t i = 0; i < user_ids.size(); ++i) {
if (i > 0) array_str += ",";
array_str += std::to_string(user_ids[i]);
}
array_str += "}";
const char* params[] = {array_str.c_str()};
PGresult* res = PQexecParams(conn, sql.c_str(), 1, nullptr, params, nullptr, nullptr, 0);
std::unordered_map<int, std::vector<Order>> result;
if (PQresultStatus(res) == PGRES_TUPLES_OK) {
for (int i = 0; i < PQntuples(res); ++i) {
Order o;
o.user_id = atoi(PQgetvalue(res, i, 0));
o.id = atoi(PQgetvalue(res, i, 1));
o.amount = atoi(PQgetvalue(res, i, 2));
o.status = PQgetvalue(res, i, 3) ? PQgetvalue(res, i, 3) : "";
result[o.user_id].push_back(o);
}
}
PQclear(res);
return result;
}
Step 4: Verify the plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, id, amount, status FROM (
SELECT user_id, id, amount, status,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) rn
FROM orders WHERE user_id = ANY(ARRAY[1,2,3,...,1000])
) sub WHERE rn <= 10;
Expect: Index Scan using idx_orders_user_created or Bitmap Heap Scan.
Step 5: Compare approaches
| Approach | Queries | Approx. time |
|---|---|---|
| N+1 (no index) | 1001 | 3200 ms |
| N+1 (with index) | 1001 | 800 ms |
| Batch ANY (with index) | 2 | 95 ms |
Step 6: Prepared statements for hot paths
Repeated execution benefits from one-time parse/plan; binding also prevents SQL injection.
// PostgreSQL: PQprepare + PQexecPrepared
#include <libpq-fe.h>
#include <stdexcept>
#include <string>
class PreparedQuery {
PGconn* conn_;
std::string name_;
int n_params_;
public:
PreparedQuery(PGconn* conn, const std::string& name,
const std::string& sql, int n_params)
: conn_(conn), name_(name), n_params_(n_params)
{
PGresult* res = PQprepare(conn_, name_.c_str(), sql.c_str(), n_params_, nullptr);
if (PQresultStatus(res) != PGRES_COMMAND_OK) {
PQclear(res);
throw std::runtime_error(PQerrorMessage(conn_));
}
PQclear(res);
}
PGresult* execute(const char* const* params) {
return PQexecPrepared(conn_, name_.c_str(), n_params_, params, nullptr, nullptr, 0);
}
};
// SQLite: sqlite3_prepare_v2 + bind + step in a loop
sqlite3_stmt* stmt = nullptr;
sqlite3_prepare_v2(db, "SELECT id, name FROM users WHERE email = ?", -1, &stmt, nullptr);
for (const auto& email : emails) {
sqlite3_bind_text(stmt, 1, email.c_str(), -1, SQLITE_TRANSIENT);
while (sqlite3_step(stmt) == SQLITE_ROW) {
// handle row
}
sqlite3_reset(stmt);
}
sqlite3_finalize(stmt);
Step 7: Single JOIN alternative
You can fetch users and ranked orders in one query.
WITH ranked AS (
SELECT o.*, ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
)
SELECT u.id, u.name, r.id AS order_id, r.amount, r.status
FROM users u
JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
ORDER BY u.id, r.rn;
// Process JOIN result in C++ (result row types assumed for this sketch)
struct OrderRow { int order_id; int amount; std::string status; };
struct UserWithOrders { int id; std::string name; std::vector<OrderRow> orders; };
std::vector<UserWithOrders> get_users_with_orders_join(PGconn* conn) {
const char* sql = R"(
WITH ranked AS (
SELECT o.user_id, o.id AS order_id, o.amount, o.status,
ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) rn
FROM orders o JOIN users u ON o.user_id = u.id
)
SELECT u.id, u.name, r.order_id, r.amount, r.status
FROM users u
JOIN ranked r ON u.id = r.user_id AND r.rn <= 10
ORDER BY u.id, r.rn
)";
PGresult* res = PQexec(conn, sql);
std::vector<UserWithOrders> result;
int current_user_id = -1;
UserWithOrders* current = nullptr;
for (int i = 0; i < PQntuples(res); ++i) {
int uid = atoi(PQgetvalue(res, i, 0));
if (uid != current_user_id) {
result.push_back({uid, PQgetvalue(res, i, 1), {}});
current = &result.back();
current_user_id = uid;
}
if (current)
current->orders.push_back({atoi(PQgetvalue(res, i, 2)),
atoi(PQgetvalue(res, i, 3)),
PQgetvalue(res, i, 4)});
}
PQclear(res);
return result;
}
Step 8: Plan comparison workflow
1. Run EXPLAIN (ANALYZE, BUFFERS) on the query
2. If Seq Scan → check Rows Removed by Filter; consider indexes
3. If Index Scan → high shared hit ratio means cache friendly
4. Compare cost vs actual time — cost is estimate, actual is truth
5. Nested loop vs hash join — depends on cardinality and stats
-- Compare plans before/after ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.id, u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
6. Common mistakes and fixes
1. N+1 queries
Symptom: list API > 3 seconds.
Cause: query inside loop.
Fix: JOIN or batched IN / ANY.
// Bad
for (auto& user : users) {
auto orders = query_orders(conn, user.id);
}
2. Full table scan
Symptom: Seq Scan over millions of rows.
Cause: no index on WHERE columns.
CREATE INDEX idx_orders_user_id ON orders(user_id);
3. Functions prevent index use
Symptom: index exists but plan shows seq scan.
-- Fix A: compare the raw column so the plain index applies
SELECT * FROM users WHERE email = '[email protected]';
-- Fix B: keep LOWER() in the query and add an expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
4. Wrong composite order
Symptom: index (created_at, user_id) but query filters only user_id.
Fix: (user_id, created_at).
DROP INDEX idx_orders_created_user;
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
5. Connection leak
Symptom: too many connections.
Fix: RAII guards / pool release.
PgConnectionGuard guard(pool);
PGconn* conn = guard.get();
6. SQL injection
Symptom: hostile input exfiltrates data.
// Never
std::string sql = "SELECT * FROM users WHERE id = " + user_input;
// Always bind
PQexecParams(conn, "SELECT * FROM users WHERE id = $1::int", 1, nullptr, params, nullptr, nullptr, 0);
7. Stale statistics
Symptom: wrong join after bulk load.
Fix: ANALYZE.
8. SELECT *
Symptom: wide rows, no covering index benefit.
SELECT id, amount, status FROM orders WHERE user_id = 123;
9. Too many indexes
Symptom: slow writes.
Fix: index only query patterns you run; drop unused indexes.
10. Cache not invalidated
Symptom: stale reads.
Fix: invalidate or update cache on writes.
void update_user(..., LruCache<int, User>& cache, ...) {
cache.put(id, new_user);
}
11. No prepared statements
Symptom: high CPU on repeated queries.
Fix: PQprepare / PQexecPrepared or long-lived sqlite3_stmt*.
12. Huge IN lists
Symptom: thousands of literals—bad plans or errors.
Fix: chunk (e.g. 500–1000) or temp table + JOIN.
const size_t BATCH = 500;
for (size_t i = 0; i < ids.size(); i += BATCH) {
auto end = std::min(i + BATCH, ids.size());
query_batch(conn, ids.data() + i, end - i);
}
13. Batch INSERT without a transaction
Symptom: 10k inserts in 30+ seconds.
Fix: BEGIN/COMMIT or COPY.
PQexec(conn, "BEGIN");
for (const auto& row : rows) {
PQexecParams(conn, "INSERT INTO logs (...) VALUES ($1, $2)", ...);
}
PQexec(conn, "COMMIT");
7. Best practices
Index design
- Index WHERE, JOIN, and ORDER BY columns as needed
- Composite: most selective column first
- Use covering / INCLUDE indexes for index-only scans when worthwhile
- Drop unused indexes
Query style
- Avoid SELECT *; project only needed columns
- Use parameter binding and prepared statements
- Avoid N+1 (JOIN, batching)
- Use LIMIT on large result sets
Statistics
- ANALYZE after bulk changes; schedule periodic analyzes
- Watch pg_stat_user_tables
Plans
- After index changes, run EXPLAIN ANALYZE
- Confirm the seq scan → index scan transition
- Compare estimated vs actual rows
Connections
- Use a pool (typical sizes 10–20)
- RAII for acquire/release
Monitoring
- Log slow queries (e.g. > 100 ms)
- Inspect pg_stat_activity
- Set statement timeouts
Index decision flow
flowchart TD
A[Analyze query] --> B{WHERE shape?}
B -->|Single equality| C[Single B-tree]
B -->|Equality + range| D[Composite: equality first]
B -->|Range only| E[Single or partial index]
C --> F[Most selective column]
D --> F
E --> F
F --> G{Can cover SELECT?}
G -->|Yes| H[Covering / INCLUDE]
G -->|No| I[Non-covering index]
N+1 alternatives
| Pattern | When | Pros | Cons |
|---|---|---|---|
| JOIN | 1:N, one round trip | Single query | Wider rows, grouping in app |
| IN batch | Many IDs | Flexible | Limit IN size |
| Lazy load | Rare access | Fast first paint | Easy to N+1 |
| Lazy + cache | Hot reads | Fast on hit | Invalidation complexity |
8. Production patterns
Pool + prepared statements
struct DbContext {
PgConnectionPool& pool;
PgConnectionGuard guard;
PGconn* conn;
PreparedStatementCache& stmt_cache;
explicit DbContext(PgConnectionPool& p, PreparedStatementCache& c)
: pool(p), guard(p), conn(guard.get()), stmt_cache(c) {}
};
Read/write split
flowchart TB
App[App] --> Router[Router]
Router -->|SELECT| Replica[Replica]
Router -->|INSERT/UPDATE| Primary[Primary]
Primary -.->|Replication| Replica
class ReadWriteSplit {
PgConnectionPool* primary_;
PgConnectionPool* replica_;
public:
PGconn* acquire_read() { return replica_->acquire(); }
PGconn* acquire_write() { return primary_->acquire(); }
};
Batched INSERT
PQexec(conn, "BEGIN");
for (size_t i = 0; i < batch.size(); i += 100) {
batch_insert(conn, batch, i, std::min(i + 100, batch.size()));
}
PQexec(conn, "COMMIT");
Statement timeout
PQexec(conn, "SET statement_timeout = 5000"); // 5s
Slow query logging
auto start = std::chrono::steady_clock::now();
PGresult* res = PQexecParams(conn, sql.c_str(), ...);
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::steady_clock::now() - start).count();
if (elapsed > 100) {
log_warn("Slow query: {}ms - {}", elapsed, sql);
}
Scheduled ANALYZE
-- pg_cron or OS cron
-- 0 3 * * * psql -c "ANALYZE"
Prepared statement cache
class PreparedStatementCache {
PGconn* conn_;
std::unordered_map<std::string, std::string> stmts_;
public:
explicit PreparedStatementCache(PGconn* conn) : conn_(conn) {}
PGresult* execute(const std::string& name, const std::string& sql,
int n_params, const char* const* params) {
if (stmts_.find(name) == stmts_.end()) {
std::string prepare_sql = "PREPARE " + name + " AS " + sql;
PGresult* prep = PQexec(conn_, prepare_sql.c_str());
PQclear(prep); // check PQresultStatus before caching in production
stmts_[name] = sql;
}
return PQexecPrepared(conn_, name.c_str(), n_params, params, nullptr, nullptr, 0);
}
};
COPY for bulk load
void bulk_insert_logs(PGconn* conn, const std::vector<Log>& logs) {
PQexec(conn, "BEGIN");
PGresult* res = PQexec(conn, "COPY logs (ts, msg, level) FROM STDIN WITH (FORMAT text)");
if (PQresultStatus(res) != PGRES_COPY_IN) {
PQclear(res);
return;
}
PQclear(res);
for (const auto& log : logs) {
// NOTE: COPY text format requires escaping \t, \n, and \\ inside fields
std::string row = log.ts + "\t" + log.msg + "\t" + log.level + "\n";
PQputCopyData(conn, row.c_str(), static_cast<int>(row.size()));
}
PQputCopyEnd(conn, nullptr);
PQgetResult(conn);
PQexec(conn, "COMMIT");
}
Naive SQL-based routing
PGconn* acquire_for_query(PgConnectionPool* primary, PgConnectionPool* replica,
const std::string& sql) {
// crude: misses leading whitespace, lowercase sql, and CTEs ("WITH ... SELECT")
return (sql.rfind("SELECT", 0) == 0) ? replica->acquire() : primary->acquire();
}
9. Checklist
Indexes
- Index WHERE columns
- Index join keys
- Consider ORDER BY in composite indexes
- Confirm index use with EXPLAIN
- Functional predicates → expression index or rewrite query
Queries
- Prepared statements / binding
- No N+1 (JOIN or batch)
- Project only needed columns
- LIMIT where appropriate
Statistics
- ANALYZE after large changes
- Monitor pg_stat_*
Connections
- Pooling
- RAII guards
- No leaks
Security and ops
- No string concatenation for values
- Query timeouts
- Slow query logs
10. Summary
| Topic | Takeaway |
|---|---|
| Indexes | B-tree on WHERE / JOIN / ORDER BY; composite order by selectivity |
| Plans | EXPLAIN ANALYZE to move seq scan → index scan |
| Statistics | ANALYZE for accurate estimates |
| Cost | Interpret cost, rows; lower random_page_cost on SSD |
| N+1 | JOIN or batch to cut round trips |
| Prepared | Less parse CPU; safer binding |
| Production | Pooling, replicas, batch insert, COPY, timeouts, monitoring |
One line: diagnose with scenarios, then apply index choice, plans, statistics, and cost awareness to turn multi-second queries into sub-100 ms where possible.
FAQ (expanded)
Q. When do I apply this at work?
A. When production queries exceed your latency budget—use scenarios, index rules, EXPLAIN, statistics, pitfalls, and patterns from this article with your own measurements.
Q. What should I read first?
A. Follow previous post links at the bottom of each article or use the C++ series index for order.
Q. Where can I go deeper?
A. cppreference and your driver/database docs. Official references below help too.
Next post: Database query optimization #51-8 — prepared statements, pooling, caching in depth.
References
Related reading
- C++ database query optimization #51-8
- C++ database integration #31-3
- C++ database engine #50-4
Keywords
C++, database, query optimization, index, query plan, EXPLAIN, PostgreSQL, SQLite