C++ DB Engine Fundamentals: Storage Engine, Query Parser, Executor & Transactions [#49-1]
Key takeaways
A database combines a storage engine, query parser, executor, and transaction layer. Even when using SQLite or MySQL, understanding internals helps answer why indexes matter and how transactions preserve atomicity. This article walks through real scenarios, end-to-end C++ sketches (pager, parser, executor, WAL), common pitfalls, and production patterns.
Introduction: “Why does finding one key in a million rows take three seconds?”
Why DB engine fundamentals
A database is a system that combines a storage engine, query parser, executor, and transactions. Even if you only use SQLite or MySQL, without understanding internals it is hard to answer “why do we need indexes?” or “how does a transaction guarantee atomicity?”. This article covers realistic scenarios, complete C++-style examples (storage engine, parser, executor, transactions), common mistakes, and production patterns from a practical angle. Topics covered:
- Scenarios: full-scan bottlenecks, failed rollbacks, parser errors, and more
- Storage engine: page-based pager, B-Tree index
- Query parser: SELECT/INSERT parsing
- Executor: running queries and returning rows
- Transactions: WAL and ACID guarantees
- Common pitfalls: cache blowups, corrupted WAL
- Production patterns: LRU cache, checkpointing, backups

See also: Database connectivity, Binary serialization.
A mental model
A DB engine maps well to a library. The storage engine is bookshelves (pages); a B-Tree index is the card catalog; the executor is the procedure for locating and reading a book. Scanning every row without an index is like walking every shelf from end to end—cost grows with table size.
From production experience: this article is grounded in large-scale C++ work—real failures and fixes that rarely appear in textbooks, including debugging traps and practical tips.
Table of contents
- Problem scenarios
- DB engine architecture
- Storage engine
- Query parser
- Executor
- Transaction
- End-to-end DB engine example
- Common mistakes and fixes
- Best practices
- Production patterns
- Summary and checklist
1. Problem scenarios
Scenario 1: “Finding id=12345 in a million rows takes three seconds”
Situation: A users table has one million rows and SELECT * FROM users WHERE id = 12345 takes about three seconds.
Cause: Full table scan. With no index, every row is checked sequentially—O(N) complexity.
Fix: Build a B-Tree index on id to get O(log N) lookups, typically completing in milliseconds.
Scenario 2: “If one of ten INSERTs fails, I need to roll back all of them”
Situation: During a batch INSERT, the fifth row violates a constraint. What happens to the four rows already inserted?
Cause: Without a transaction, each INSERT may commit independently—partial commits with no way to roll back.
Fix: BEGIN → ten INSERTs → COMMIT, or ROLLBACK on error. Use WAL (write-ahead logging) to enforce atomicity.
Scenario 3: “Parsing SELECT * FROM users WHERE name = 'Alice' fails”
Situation: A simple SQL string fails with something like “Expected FROM”.
Cause: Fragile handling of case, whitespace, and quotes—inputs like "select * from users" break naive parsers.
Fix: In the tokenizer, normalize (e.g., lowercase keywords, trim) and treat quoted strings as single tokens.
Scenario 4: “The DB file is 10GB but only 2GB RAM is available”
Situation: The pager tries to cache every page and the process runs out of memory.
Cause: Unbounded cache—every get_page() adds a page with no eviction.
Fix: An LRU cache with a cap such as MAX_CACHED_PAGES (e.g., 1000); flush dirty pages before evicting.
Scenario 5: “After a crash, half the data disappears”
Situation: Power loss during commit; after restart, only some changes are visible.
Cause: Crash before fsync completes on the WAL, or incorrect recovery ordering.
Fix: Treat a commit as complete only after WAL records are fsynced; on recovery, REDO the WAL in order.
Scenario 6: “Two clients update the same row at once”
Situation: Clients A and B both run UPDATE users SET balance = balance - 100. The final balance is wrong.
Cause: No concurrency control—both transactions read, modify, and commit, and one update is lost.
Fix: Locks or MVCC. This article focuses on single-threaded behavior; concurrency is mentioned under production patterns.
Scenario diagram
flowchart TB
subgraph Problems[Production scenarios]
P1[Full scan 3s]
P2[Partial commit no rollback]
P3[SQL parse failure]
P4[Cache OOM]
P5[Data loss after crash]
end
subgraph Solutions[Directions]
S1[B-Tree index]
S2[WAL transactions]
S3[Tokenizer normalization]
S4[LRU page cache]
S5[WAL fsync + REDO]
end
P1 --> S1
P2 --> S2
P3 --> S3
P4 --> S4
P5 --> S5
2. DB engine architecture
Big picture
flowchart TB
subgraph Client[Client]
C1["SQL string"]
end
subgraph Engine[DB engine]
subgraph Parser[Query parser]
P1[Tokenizer]
P2[Syntax analysis]
end
subgraph Executor[Executor]
E1[Execution plan]
E2[Operators]
end
subgraph Storage[Storage engine]
S1[Pager]
S2[B-Tree]
end
subgraph Txn[Transaction]
T1[WAL]
end
end
C1 --> Parser
Parser --> Executor
Executor --> Storage
Storage --> Txn
Component roles
| Component | Role | Analogy |
|---|---|---|
| Query parser | SQL string → syntax tree (AST) | Translator |
| Executor | AST → actual work (scan, filter, insert) | Worker |
| Storage engine | Page I/O, B-Tree index | Warehouse |
| Transaction | WAL, commit/rollback | Ledger |
3. Storage engine
Why page-based storage
Disk I/O is most efficient in page-sized chunks. Using 4KB pages aligns with OS page cache and reduces random I/O.
Pager: page cache
// pager.hpp
#pragma once
#include <fstream>
#include <unordered_map>
#include <memory>
#include <cstring>
#include <cstdint>
constexpr size_t PAGE_SIZE = 4096;
struct Page {
uint32_t page_id;
uint8_t data[PAGE_SIZE];
bool dirty = false;
};
class Pager {
std::string filename_;
std::fstream file_;
std::unordered_map<uint32_t, std::unique_ptr<Page>> cache_;
uint32_t num_pages_ = 0;
public:
explicit Pager(const std::string& filename) : filename_(filename) {
file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
if (!file_) {
file_.open(filename, std::ios::out | std::ios::binary);
file_.close();
file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
}
file_.seekg(0, std::ios::end);
auto size = file_.tellg();
num_pages_ = static_cast<uint32_t>(size) / PAGE_SIZE;
}
Page* get_page(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it != cache_.end()) {
return it->second.get();
}
auto page = std::make_unique<Page>();
page->page_id = page_id;
if (page_id < num_pages_) {
file_.seekg(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
} else {
std::memset(page->data, 0, PAGE_SIZE);
num_pages_ = page_id + 1;
}
auto* ptr = page.get();
cache_[page_id] = std::move(page);
return ptr;
}
void mark_dirty(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it != cache_.end()) {
it->second->dirty = true;
}
}
void flush_page(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it == cache_.end() || !it->second->dirty) return;
file_.seekp(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
file_.write(reinterpret_cast<const char*>(it->second->data), PAGE_SIZE);
file_.flush();
it->second->dirty = false;
}
void flush_all() {
for (auto& [id, page] : cache_) {
if (page->dirty) flush_page(id);
}
}
uint32_t allocate_page() {
return num_pages_++;
}
~Pager() {
flush_all();
}
};
Key points:
- get_page(): on cache miss, load from disk; new pages are zero-filled.
- mark_dirty(): marks modified pages.
- flush_all(): writes all dirty pages at commit time.
B-Tree node (simplified)
// btree.hpp
#pragma once
#include "pager.hpp"
#include <vector>
#include <optional>
#include <algorithm>
#include <cstring>
struct BTreeNode {
bool is_leaf;
uint32_t num_keys;
std::vector<int32_t> keys;
std::vector<uint32_t> children;
std::vector<std::vector<uint8_t>> values;
static constexpr size_t MAX_KEYS = 100;
void serialize(uint8_t* buf) const {
size_t offset = 0;
std::memcpy(buf + offset, &is_leaf, sizeof(is_leaf));
offset += sizeof(is_leaf);
std::memcpy(buf + offset, &num_keys, sizeof(num_keys));
offset += sizeof(num_keys);
for (uint32_t i = 0; i < num_keys; ++i) {
std::memcpy(buf + offset, &keys[i], sizeof(keys[i]));
offset += sizeof(keys[i]);
}
if (!is_leaf) {
for (uint32_t i = 0; i <= num_keys; ++i) {
std::memcpy(buf + offset, &children[i], sizeof(children[i]));
offset += sizeof(children[i]);
}
} else {
for (uint32_t i = 0; i < num_keys; ++i) {
uint16_t sz = static_cast<uint16_t>(values[i].size());
std::memcpy(buf + offset, &sz, sizeof(sz));
offset += sizeof(sz);
std::memcpy(buf + offset, values[i].data(), sz);
offset += sz;
}
}
}
static BTreeNode deserialize(const uint8_t* buf) {
BTreeNode node;
size_t offset = 0;
std::memcpy(&node.is_leaf, buf + offset, sizeof(node.is_leaf));
offset += sizeof(node.is_leaf);
std::memcpy(&node.num_keys, buf + offset, sizeof(node.num_keys));
offset += sizeof(node.num_keys);
node.keys.resize(node.num_keys);
for (uint32_t i = 0; i < node.num_keys; ++i) {
std::memcpy(&node.keys[i], buf + offset, sizeof(node.keys[i]));
offset += sizeof(node.keys[i]);
}
if (!node.is_leaf) {
node.children.resize(node.num_keys + 1);
for (uint32_t i = 0; i <= node.num_keys; ++i) {
std::memcpy(&node.children[i], buf + offset, sizeof(node.children[i]));
offset += sizeof(node.children[i]);
}
} else {
node.values.resize(node.num_keys);
for (uint32_t i = 0; i < node.num_keys; ++i) {
uint16_t sz;
std::memcpy(&sz, buf + offset, sizeof(sz));
offset += sizeof(sz);
node.values[i].resize(sz);
std::memcpy(node.values[i].data(), buf + offset, sz);
offset += sz;
}
}
return node;
}
};
class BTree {
Pager& pager_;
uint32_t root_page_id_;
public:
BTree(Pager& pager, uint32_t root_page_id)
: pager_(pager), root_page_id_(root_page_id) {}
std::optional<std::vector<uint8_t>> search(int32_t key) {
return search_recursive(root_page_id_, key);
}
void insert(int32_t key, const std::vector<uint8_t>& value) {
auto* root_page = pager_.get_page(root_page_id_);
auto root = BTreeNode::deserialize(root_page->data);
if (root.num_keys >= BTreeNode::MAX_KEYS) {
uint32_t new_root = pager_.allocate_page();
split_root(root_page_id_, new_root);
root_page_id_ = new_root;
}
insert_non_full(root_page_id_, key, value);
}
private:
std::optional<std::vector<uint8_t>> search_recursive(uint32_t page_id, int32_t key) {
auto* page = pager_.get_page(page_id);
auto node = BTreeNode::deserialize(page->data);
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
if (it != node.keys.end() && *it == key) {
if (node.is_leaf) return node.values[pos];
return search_recursive(node.children[pos + 1], key);
}
if (node.is_leaf) return std::nullopt;
return search_recursive(node.children[pos], key);
}
void split_root(uint32_t old_root, uint32_t new_root) {
(void)old_root;
(void)new_root;
}
void insert_non_full(uint32_t page_id, int32_t key, const std::vector<uint8_t>& value) {
auto* page = pager_.get_page(page_id);
auto node = BTreeNode::deserialize(page->data);
if (node.is_leaf) {
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
node.keys.insert(it, key);
node.values.insert(node.values.begin() + pos, value);
node.num_keys++;
node.serialize(page->data);
pager_.mark_dirty(page_id);
} else {
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
insert_non_full(node.children[pos], key, value);
}
}
};
4. Query parser
Tokenizer and syntax analysis
// parser.hpp
#pragma once
#include <string>
#include <vector>
#include <optional>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <stdexcept>  // std::runtime_error in expect()
enum class TokenType { SELECT, INSERT, INTO, VALUES, FROM, WHERE, IDENT, NUMBER, STRING, COMMA, LPAREN, RPAREN, EQ, SEMICOLON, END };
struct Token {
TokenType type;
std::string value;
};
class Tokenizer {
std::string input_;
size_t pos_ = 0;
char peek() const {
return pos_ < input_.size() ? input_[pos_] : '\0';
}
char consume() {
return pos_ < input_.size() ? input_[pos_++] : '\0';
}
void skip_whitespace() {
while (std::isspace(static_cast<unsigned char>(peek()))) consume();
}
public:
explicit Tokenizer(const std::string& input) : input_(input) {}
Token next() {
skip_whitespace();
if (pos_ >= input_.size()) return {TokenType::END, ""};
size_t start = pos_;
if (std::isalpha(static_cast<unsigned char>(peek())) || peek() == '_') {
while (std::isalnum(static_cast<unsigned char>(peek()))) consume();
std::string word = input_.substr(start, pos_ - start);
std::transform(word.begin(), word.end(), word.begin(), ::tolower);
if (word == "select") return {TokenType::SELECT, word};
if (word == "insert") return {TokenType::INSERT, word};
if (word == "into") return {TokenType::INTO, word};
if (word == "values") return {TokenType::VALUES, word};
if (word == "from") return {TokenType::FROM, word};
if (word == "where") return {TokenType::WHERE, word};
return {TokenType::IDENT, word};
}
if (std::isdigit(static_cast<unsigned char>(peek()))) {
while (std::isdigit(static_cast<unsigned char>(peek()))) consume();
return {TokenType::NUMBER, input_.substr(start, pos_ - start)};
}
if (peek() == '\'' || peek() == '"') {
char quote = consume();
start = pos_;
while (peek() != quote && peek() != '\0') consume();
std::string s = input_.substr(start, pos_ - start);
if (peek() == quote) consume();
return {TokenType::STRING, s};
}
if (peek() == ',') { consume(); return {TokenType::COMMA, ","}; }
if (peek() == '(') { consume(); return {TokenType::LPAREN, "("}; }
if (peek() == ')') { consume(); return {TokenType::RPAREN, ")"}; }
if (peek() == '=') { consume(); return {TokenType::EQ, "="}; }
if (peek() == ';') { consume(); return {TokenType::SEMICOLON, ";"}; }
consume();
return {TokenType::END, ""};
}
};
struct SelectStmt {
std::vector<std::string> columns;
std::string table_name;
std::optional<std::string> where_column;
std::optional<std::string> where_value;
};
struct InsertStmt {
std::string table_name;
std::vector<std::string> values;
};
class SQLParser {
Tokenizer tokenizer_;
Token current_;
void advance() { current_ = tokenizer_.next(); }
bool check(TokenType t) const { return current_.type == t; }
void expect(TokenType t) {
if (!check(t)) throw std::runtime_error("Expected token, got: " + current_.value);
advance();
}
public:
explicit SQLParser(const std::string& sql) : tokenizer_(sql) {
advance();
}
SelectStmt parse_select() {
SelectStmt stmt;
expect(TokenType::SELECT);
while (!check(TokenType::FROM)) {
if (check(TokenType::END)) throw std::runtime_error("Expected FROM before end of input");
if (current_.type == TokenType::IDENT) {
stmt.columns.push_back(current_.value);
advance();
}
if (check(TokenType::COMMA)) advance();
}
expect(TokenType::FROM);
stmt.table_name = current_.value;
expect(TokenType::IDENT);
if (check(TokenType::WHERE)) {
advance();
stmt.where_column = current_.value;
expect(TokenType::IDENT);
expect(TokenType::EQ);
stmt.where_value = current_.value;
if (current_.type == TokenType::STRING) advance();
else expect(TokenType::NUMBER);
}
return stmt;
}
InsertStmt parse_insert() {
InsertStmt stmt;
expect(TokenType::INSERT);
expect(TokenType::INTO);
stmt.table_name = current_.value;
expect(TokenType::IDENT);
expect(TokenType::VALUES);
expect(TokenType::LPAREN);
while (!check(TokenType::RPAREN)) {
if (check(TokenType::END)) throw std::runtime_error("Expected ) before end of input");
if (current_.type == TokenType::NUMBER)
stmt.values.push_back(current_.value);
else if (current_.type == TokenType::STRING)
stmt.values.push_back(current_.value);
advance();
if (check(TokenType::COMMA)) advance();
}
expect(TokenType::RPAREN);
return stmt;
}
bool is_select() const { return current_.type == TokenType::SELECT; }
bool is_insert() const { return current_.type == TokenType::INSERT; }
};
Key points:
- Normalization: keywords are lowercased so SELECT and select both work.
- Quoted strings: 'Alice' becomes a single STRING token.
- expect(): throws with context when the next token is wrong.
5. Executor
SELECT / INSERT execution
// executor.hpp
#pragma once
#include "parser.hpp"
#include "pager.hpp"
#include "btree.hpp"
#include <functional>
#include <vector>
#include <string>
#include <unordered_map>
#include <cstring>    // std::memcpy in deserialize_row
#include <stdexcept>  // std::runtime_error
struct Schema {
std::string table_name;
std::vector<std::pair<std::string, std::string>> columns;
};
class Executor {
Pager& pager_;
std::unordered_map<std::string, std::pair<Schema, uint32_t>> tables_;
public:
explicit Executor(Pager& pager) : pager_(pager) {}
void create_table(const Schema& schema) {
uint32_t root = pager_.allocate_page();
// Initialize the root as an empty leaf: a zero-filled page would
// deserialize with is_leaf == false and break the first insert.
BTreeNode leaf;
leaf.is_leaf = true;
leaf.num_keys = 0;
auto* page = pager_.get_page(root);
leaf.serialize(page->data);
pager_.mark_dirty(root);
tables_[schema.table_name] = {schema, root};
}
std::vector<std::vector<std::string>> execute_select(const SelectStmt& stmt) {
std::vector<std::vector<std::string>> results;
auto it = tables_.find(stmt.table_name);
if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
const auto& [schema, root_id] = it->second;
BTree btree(pager_, root_id);
if (stmt.where_column && stmt.where_value) {
int32_t key = std::stoi(*stmt.where_value);
auto opt = btree.search(key);
if (opt) {
auto row = deserialize_row(*opt, schema);
results.push_back(row);
}
} else {
scan_all(btree, schema, [&](const std::vector<std::string>& row) {
results.push_back(row);
});
}
return results;
}
void execute_insert(const InsertStmt& stmt) {
auto it = tables_.find(stmt.table_name);
if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
const auto& [schema, root_id] = it->second;
BTree btree(pager_, root_id);
int32_t key = std::stoi(stmt.values[0]);
std::vector<uint8_t> row_data = serialize_row(stmt.values, schema);
btree.insert(key, row_data);
}
private:
std::vector<uint8_t> serialize_row(const std::vector<std::string>& values, const Schema& schema) {
std::vector<uint8_t> data;
for (size_t i = 0; i < schema.columns.size() && i < values.size(); ++i) {
const auto& [name, type] = schema.columns[i];
const auto& val = values[i];
(void)name;
if (type == "int") {
int32_t num = std::stoi(val);
data.insert(data.end(),
reinterpret_cast<const uint8_t*>(&num),
reinterpret_cast<const uint8_t*>(&num) + sizeof(num));
} else {
uint16_t len = static_cast<uint16_t>(val.size());
data.insert(data.end(),
reinterpret_cast<const uint8_t*>(&len),
reinterpret_cast<const uint8_t*>(&len) + sizeof(len));
data.insert(data.end(), val.begin(), val.end());
}
}
return data;
}
std::vector<std::string> deserialize_row(const std::vector<uint8_t>& data, const Schema& schema) {
std::vector<std::string> row;
size_t offset = 0;
for (const auto& [name, type] : schema.columns) {
(void)name;
if (type == "int") {
int32_t num;
std::memcpy(&num, data.data() + offset, sizeof(num));
row.push_back(std::to_string(num));
offset += sizeof(num);
} else {
uint16_t len;
std::memcpy(&len, data.data() + offset, sizeof(len));
offset += sizeof(len);
row.push_back(std::string(data.begin() + offset, data.begin() + offset + len));
offset += len;
}
}
return row;
}
void scan_all(BTree& btree, const Schema& schema,
const std::function<void(const std::vector<std::string>&)>& callback) {
(void)btree;
(void)schema;
(void)callback;
}
};
6. Transaction
WAL (write-ahead logging)
// transaction.hpp
#pragma once
#include "pager.hpp"
#include <fstream>
#include <vector>
#include <cstring>
#include <cstdio> // EOF
#include <stdexcept>
#include <string>
struct WALEntry {
enum Type { INSERT, UPDATE, DELETE } type;
uint32_t page_id;
std::vector<uint8_t> before_image;
std::vector<uint8_t> after_image;
};
class TransactionManager {
std::string wal_path_;
std::fstream wal_file_;
std::vector<WALEntry> current_txn_;
bool in_transaction_ = false;
public:
explicit TransactionManager(const std::string& db_path)
: wal_path_(db_path + ".wal") {
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
if (!wal_file_) {
wal_file_.open(wal_path_, std::ios::out | std::ios::binary);
wal_file_.close();
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
}
}
void begin() {
if (in_transaction_) throw std::runtime_error("Transaction already in progress");
current_txn_.clear();
in_transaction_ = true;
}
void log_page_change(uint32_t page_id,
const uint8_t* before, const uint8_t* after) {
WALEntry entry;
entry.type = WALEntry::UPDATE;
entry.page_id = page_id;
entry.before_image.assign(before, before + PAGE_SIZE);
entry.after_image.assign(after, after + PAGE_SIZE);
current_txn_.push_back(entry);
}
void commit(Pager& pager) {
if (!in_transaction_) return;
for (const auto& e : current_txn_) {
write_entry(e);
}
wal_file_.flush();
pager.flush_all();
current_txn_.clear();
in_transaction_ = false;
}
void rollback(Pager& pager) {
if (!in_transaction_) return;
for (auto it = current_txn_.rbegin(); it != current_txn_.rend(); ++it) {
auto* page = pager.get_page(it->page_id);
std::memcpy(page->data, it->before_image.data(), PAGE_SIZE);
pager.mark_dirty(it->page_id);
}
current_txn_.clear();
in_transaction_ = false;
}
void recover(Pager& pager) {
wal_file_.seekg(0, std::ios::beg);
while (wal_file_.good() && wal_file_.peek() != EOF) {
auto entry = read_entry();
if (!wal_file_) break;  // torn/partial trailing record: stop REDO here
auto* page = pager.get_page(entry.page_id);
std::memcpy(page->data, entry.after_image.data(), PAGE_SIZE);
pager.mark_dirty(entry.page_id);
}
pager.flush_all();
}
bool is_in_transaction() const { return in_transaction_; }
private:
void write_entry(const WALEntry& e) {
uint8_t t = static_cast<uint8_t>(e.type);
wal_file_.write(reinterpret_cast<const char*>(&t), sizeof(t));
wal_file_.write(reinterpret_cast<const char*>(&e.page_id), sizeof(e.page_id));
uint32_t before_sz = static_cast<uint32_t>(e.before_image.size());
wal_file_.write(reinterpret_cast<const char*>(&before_sz), sizeof(before_sz));
wal_file_.write(reinterpret_cast<const char*>(e.before_image.data()), before_sz);
uint32_t after_sz = static_cast<uint32_t>(e.after_image.size());
wal_file_.write(reinterpret_cast<const char*>(&after_sz), sizeof(after_sz));
wal_file_.write(reinterpret_cast<const char*>(e.after_image.data()), after_sz);
}
WALEntry read_entry() {
WALEntry e;
uint8_t t;
wal_file_.read(reinterpret_cast<char*>(&t), sizeof(t));
e.type = static_cast<WALEntry::Type>(t);
wal_file_.read(reinterpret_cast<char*>(&e.page_id), sizeof(e.page_id));
uint32_t before_sz, after_sz;
wal_file_.read(reinterpret_cast<char*>(&before_sz), sizeof(before_sz));
e.before_image.resize(before_sz);
wal_file_.read(reinterpret_cast<char*>(e.before_image.data()), before_sz);
wal_file_.read(reinterpret_cast<char*>(&after_sz), sizeof(after_sz));
e.after_image.resize(after_sz);
wal_file_.read(reinterpret_cast<char*>(e.after_image.data()), after_sz);
return e;
}
};
Key points:
- Write-ahead: log changes before they are durable on the main file.
- Rollback: restore pages from before_image.
- Recovery: REDO the WAL from start to finish.
7. End-to-end DB engine example
Database class
// database.hpp
#pragma once
#include "pager.hpp"
#include "parser.hpp"
#include "executor.hpp"
#include "transaction.hpp"
#include <iostream>
#include <stdexcept>
class Database {
Pager pager_;
TransactionManager txn_mgr_;
Executor executor_;
public:
explicit Database(const std::string& path)
: pager_(path),
txn_mgr_(path),
executor_(pager_) {
txn_mgr_.recover(pager_);
}
void execute(const std::string& sql) {
SQLParser parser(sql);
if (parser.is_select()) {
auto stmt = parser.parse_select();
auto results = executor_.execute_select(stmt);
for (const auto& row : results) {
for (const auto& col : row) std::cout << col << " ";
std::cout << "\n";
}
} else if (parser.is_insert()) {
auto stmt = parser.parse_insert();
executor_.execute_insert(stmt);
} else {
throw std::runtime_error("Unknown statement type");
}
}
void begin() { txn_mgr_.begin(); }
void commit() { txn_mgr_.commit(pager_); }
void rollback() { txn_mgr_.rollback(pager_); }
void create_table(const Schema& schema) {
executor_.create_table(schema);
}
};
Example main
// main.cpp
#include "database.hpp"
int main() {
Database db("mydb.dat");
Schema schema;
schema.table_name = "users";
schema.columns = {{"id", "int"}, {"name", "varchar"}, {"age", "int"}};
db.create_table(schema);
db.begin();
try {
db.execute("INSERT INTO users VALUES (1, 'Alice', 30)");
db.execute("INSERT INTO users VALUES (2, 'Bob', 25)");
db.execute("INSERT INTO users VALUES (3, 'Charlie', 35)");
db.commit();
} catch (const std::exception& e) {
db.rollback();
std::cerr << "Rollback: " << e.what() << "\n";
}
db.execute("SELECT id, name, age FROM users WHERE id = 2");
return 0;
}
Sequence diagram: running SELECT
sequenceDiagram
participant Client
participant Database
participant Parser
participant Executor
participant Pager
participant BTree
Client->>Database: execute("SELECT ... WHERE id=2")
Database->>Parser: parse_select()
Parser->>Parser: tokenize, parse
Parser-->>Database: SelectStmt
Database->>Executor: execute_select(stmt)
Executor->>BTree: search(2)
BTree->>Pager: get_page()
Pager-->>BTree: Page*
BTree-->>Executor: row data
Executor-->>Database: results
Database-->>Client: print rows
8. Common mistakes and fixes
Issue 1: “Page not found” or garbage on read
Symptom: get_page(0) returns junk or crashes.
Cause: Empty file means num_pages_ == 0 and read() reads zero bytes.
// ❌ Wrong
if (page_id < num_pages_) {
file_.read(...);
}
// ✅ Correct
if (page_id < num_pages_) {
file_.seekg(...);
file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
} else {
std::memset(page->data, 0, PAGE_SIZE);
num_pages_ = page_id + 1;
}
Issue 2: B-Tree “vector subscript out of range”
Symptom: Crash on node.values[pos] during leaf insert.
Cause: num_keys out of sync with values size.
// ✅ Keep keys and values in lockstep
node.keys.insert(it, key);
node.values.insert(node.values.begin() + pos, value);
node.num_keys++;
Issue 3: Data loss after WAL recovery
Symptom: Some transactions disappear after restart.
Cause: Treating a commit as complete without fsync on the WAL.
// ❌ Wrong
wal_file_.flush();
// ✅ Production: fsync is required
wal_file_.flush();
// POSIX: fsync() on the underlying file descriptor — std::fstream does not
// expose one, so production engines use open()/write()/fsync() directly
Issue 4: Cache memory explosion (OOM)
Symptom: Opening a 10GB file grows RAM toward 10GB.
Cause: No eviction in the pager cache.
Fix: Cap with LRU and MAX_CACHED (e.g., 1000); call evict_lru_page() when cache_.size() >= MAX_CACHED.
Issue 5: Parser “Expected FROM”
Symptom: "select * from users" fails.
Cause: * is not a token.
Fix: if (peek() == '*') { consume(); stmt.columns.push_back("*"); }.
Issue 6: Nested transactions / schema version skew
Nested transactions: two BEGIN calls corrupt state—use nesting depth or SAVEPOINTs.
Schema mismatch: reading old rows after schema change fails—store a schema version in page headers and migrate.
9. Best practices
- Page size: use PAGE_SIZE = 4096 to match typical OS pages.
- WAL fsync: flush() alone is often insufficient; call platform fsync on commit in production.
- B-Tree tuning: size MAX_KEYS to fit a 4KB page, often on the order of hundreds of entries.
- Batch flush: flush at commit boundaries; flushing every page update spikes I/O.
- Ordered flush: sort dirty pages by page_id for more sequential disk writes.
- Error messages: include context, e.g. throw std::runtime_error("Expected FROM, got: " + current_.value);.
- Schema metadata: store catalog/system tables on dedicated pages.
10. Production patterns
Pattern 1: LRU page cache
class LRUPager {  // sketch: flush_page() writes a dirty page back, as in Pager above
std::list<uint32_t> lru_list_;
std::unordered_map<uint32_t, std::pair<std::unique_ptr<Page>, std::list<uint32_t>::iterator>> cache_;
static constexpr size_t MAX_PAGES = 1000;
void evict() {
while (cache_.size() >= MAX_PAGES && !lru_list_.empty()) {
uint32_t victim = lru_list_.back();
lru_list_.pop_back();
auto it = cache_.find(victim);
if (it->second.first->dirty) flush_page(victim);
cache_.erase(it);
}
}
};
Pattern 2: WAL checkpoint
void checkpoint() {
pager_.flush_all();
wal_file_.close();
std::ofstream(wal_path_, std::ios::trunc).close();
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
}
Pattern 3: Health metrics
Track cache_size, cache_hits, cache_misses; monitor hit_ratio = cache_hits / (cache_hits + cache_misses).
Pattern 4: Backup and restore
void backup(const std::string& backup_path) {
pager_.flush_all();
std::filesystem::copy(db_path_, backup_path, std::filesystem::copy_options::overwrite_existing);
std::filesystem::copy(db_path_ + ".wal", backup_path + ".wal",
std::filesystem::copy_options::overwrite_existing);
}
Pattern 5: Externalized config and thread safety
Use something like DBConfig { page_size, max_cached_pages, fsync_on_commit }. For multiple threads, protect the pager with std::shared_mutex (shared_lock for get_page, unique_lock for flush_page).
11. Summary and checklist
At a glance
| Component | Role |
|---|---|
| Storage engine | Page-based pager, B-Tree index |
| Query parser | Tokenizer + parser → AST |
| Executor | SELECT/INSERT, row serialization |
| Transaction | WAL, REDO/UNDO |
Implementation checklist
- Pager: zero new pages, flush_all
- B-Tree: keep keys/values in sync, implement split_root
- Parser: normalized tokens, handle *
- Executor: schema-driven serialization
- Transactions: WAL, before_image on rollback
- LRU cache (production)
- WAL checkpoints (production)
- WAL checkpoints (production)
One-line summary: paging, B-Trees, a small SQL parser, an executor, and WAL transactions are enough to sketch a minimal database engine.
References:
- SQLite Architecture
- Database Internals (Alex Petrov, O’Reilly)
- cppreference — File I/O
Related posts on this site
- C++ database connectivity — SQLite, PostgreSQL, pools, transactions [#31-3]
- C++ binary serialization — endianness, padding, save files
- C++ database engine implementation — B-Tree, transactions, query optimization [#50-4]
FAQ (from the body)
Q. When do I use this at work?
A. Whenever you implement or embed a small engine in C++: paging, a parser, executor, and ACID transactions—with scenarios, pitfalls, and patterns like those above.
Q. What should I read before this?
A. Follow Previous / Related links at the bottom of each post, or use the C++ series index for the full learning path.
Q. How do I go deeper?
A. Use cppreference and official docs for libraries you depend on, plus the links at the end of this article.