본문으로 건너뛰기 C++ DB Engine Fundamentals: Storage Engine, Query Parser, Executor & Transactions [#49-1]

C++ DB Engine Fundamentals: Storage Engine, Query Parser, Executor & Transactions [#49-1]

C++ DB Engine Fundamentals: Storage Engine, Query Parser, Executor & Transactions [#49-1]

이 글의 핵심

A database combines a storage engine, query parser, executor, and transaction layer. Even when using SQLite or MySQL, understanding internals helps answer why indexes matter and how transactions preserve atomicity. This article walks through real scenarios, end-to-end C++ sketches (pager, parser, executor, WAL), common pitfalls, and production patterns.

Introduction: “Why does finding one key in a million rows take three seconds?”

Why DB engine fundamentals

A database is a system that combines a storage engine, query parser, executor, and transactions. Even if you only use SQLite or MySQL, without understanding internals it is hard to answer “why do we need indexes?” or “how does a transaction guarantee atomicity?”. This article covers realistic scenarios, complete C++-style examples (storage engine, parser, executor, transactions), common mistakes, and production patterns from a practical angle. Topics covered:

  • Scenarios: full-scan bottlenecks, failed rollbacks, parser errors, and more
  • Storage engine: page-based pager, B-Tree index
  • Query parser: SELECT/INSERT parsing
  • Executor: running queries and returning rows
  • Transactions: WAL and ACID guarantees
  • Common pitfalls: cache blowups, corrupted WAL
  • Production patterns: LRU cache, checkpointing, backups See also: Database connectivity, Binary serialization.

A mental model

A DB engine maps well to a library. The storage engine is bookshelves (pages); a B-Tree index is the card catalog; the executor is the procedure for locating and reading a book. Scanning every row without an index is like walking every shelf from end to end—cost grows with table size.

From production experience: this article is grounded in large-scale C++ work—real failures and fixes that rarely appear in textbooks, including debugging traps and practical tips.

Table of contents

  1. Problem scenarios
  2. DB engine architecture
  3. Storage engine
  4. Query parser
  5. Executor
  6. Transaction
  7. End-to-end DB engine example
  8. Common mistakes and fixes
  9. Best practices
  10. Production patterns
  11. Summary and checklist

1. Problem scenarios

Scenario 1: “Finding id=12345 in a million rows takes three seconds”

Situation: A users table has one million rows and SELECT * FROM users WHERE id = 12345 takes about three seconds. Cause: Full table scan. With no index, every row is checked sequentially—O(N) complexity. Fix: Build a B-Tree index on id to get O(log N) lookups, typically completing in milliseconds.

Scenario 2: “If one of ten INSERTs fails, I need to roll back all of them”

Situation: During a batch INSERT, the fifth row violates a constraint. What happens to the four rows already inserted? Cause: Without a transaction, each INSERT may commit independently—partial commits with no way to roll back. Fix: BEGIN → ten INSERTs → COMMIT, or ROLLBACK on error. Use WAL (write-ahead logging) to enforce atomicity.

Scenario 3: “Parsing SELECT * FROM users WHERE name = 'Alice' fails”

Situation: A simple SQL string fails with something like “Expected FROM”. Cause: Fragile handling of case, whitespace, and quotes—inputs like "select * from users" break naive parsers. Fix: In the tokenizer, normalize (e.g., lowercase keywords, trim) and treat quoted strings as single tokens.

Scenario 4: “The DB file is 10GB but only 2GB RAM is available”

Situation: The pager tries to cache every page and the process runs out of memory. Cause: Unbounded cache—every get_page() adds a page with no eviction. Fix: An LRU cache with a cap such as MAX_CACHED_PAGES (e.g., 1000); flush dirty pages before evicting.

Scenario 5: “After a crash, half the data disappears”

Situation: Power loss during commit; after restart, only some changes are visible. Cause: Crash before fsync completes on the WAL, or incorrect recovery ordering. Fix: Treat a commit as complete only after WAL records are fsynced; on recovery, REDO the WAL in order.

Scenario 6: “Two clients update the same row at once”

Situation: Clients A and B both run UPDATE users SET balance = balance - 100. The final balance is wrong. Cause: No concurrency control—both transactions read, modify, and commit, and one update is lost. Fix: Locks or MVCC. This article focuses on single-threaded behavior; concurrency is mentioned under production patterns.

Scenario diagram

flowchart TB
    subgraph Problems[Production scenarios]
        P1[Full scan 3s]
        P2[Partial commit no rollback]
        P3[SQL parse failure]
        P4[Cache OOM]
        P5[Data loss after crash]
    end
    subgraph Solutions[Directions]
        S1[B-Tree index]
        S2[WAL transactions]
        S3[Tokenizer normalization]
        S4[LRU page cache]
        S5[WAL fsync + REDO]
    end
    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5

2. DB engine architecture

Big picture

flowchart TB
    subgraph Client[Client]
        C1["SQL string"]
    end
    subgraph Engine[DB engine]
        subgraph Parser[Query parser]
            P1[Tokenizer]
            P2[Syntax analysis]
        end
        subgraph Executor[Executor]
            E1[Execution plan]
            E2[Operators]
        end
        subgraph Storage[Storage engine]
            S1[Pager]
            S2[B-Tree]
        end
        subgraph Txn[Transaction]
            T1[WAL]
        end
    end
    C1 --> Parser
    Parser --> Executor
    Executor --> Storage
    Storage --> Txn

Component roles

ComponentRoleAnalogy
Query parserSQL string → syntax tree (AST)Translator
ExecutorAST → actual work (scan, filter, insert)Worker
Storage enginePage I/O, B-Tree indexWarehouse
TransactionWAL, commit/rollbackLedger

3. Storage engine

Why page-based storage

Disk I/O is most efficient in page-sized chunks. Using 4KB pages aligns with OS page cache and reduces random I/O.

Pager: page cache

// pager.hpp
#pragma once
#include <fstream>
#include <unordered_map>
#include <memory>
#include <cstring>
#include <cstdint>
constexpr size_t PAGE_SIZE = 4096;
struct Page {
    uint32_t page_id;
    uint8_t data[PAGE_SIZE];
    bool dirty = false;
};
class Pager {
    std::string filename_;
    std::fstream file_;
    std::unordered_map<uint32_t, std::unique_ptr<Page>> cache_;
    uint32_t num_pages_ = 0;
public:
    explicit Pager(const std::string& filename) : filename_(filename) {
        file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
        if (!file_) {
            file_.open(filename, std::ios::out | std::ios::binary);
            file_.close();
            file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
        }
        file_.seekg(0, std::ios::end);
        auto size = file_.tellg();
        num_pages_ = static_cast<uint32_t>(size) / PAGE_SIZE;
    }
    Page* get_page(uint32_t page_id) {
        auto it = cache_.find(page_id);
        if (it != cache_.end()) {
            return it->second.get();
        }
        auto page = std::make_unique<Page>();
        page->page_id = page_id;
        if (page_id < num_pages_) {
            file_.seekg(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
            file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
        } else {
            std::memset(page->data, 0, PAGE_SIZE);
            num_pages_ = page_id + 1;
        }
        auto* ptr = page.get();
        cache_[page_id] = std::move(page);
        return ptr;
    }
    void mark_dirty(uint32_t page_id) {
        auto it = cache_.find(page_id);
        if (it != cache_.end()) {
            it->second->dirty = true;
        }
    }
    void flush_page(uint32_t page_id) {
        auto it = cache_.find(page_id);
        if (it == cache_.end() || !it->second->dirty) return;
        file_.seekp(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
        file_.write(reinterpret_cast<const char*>(it->second->data), PAGE_SIZE);
        file_.flush();
        it->second->dirty = false;
    }
    void flush_all() {
        for (auto& [id, page] : cache_) {
            if (page->dirty) flush_page(id);
        }
    }
    uint32_t allocate_page() {
        return num_pages_++;
    }
    ~Pager() {
        flush_all();
    }
};

Key points:

  • get_page(): on cache miss, load from disk; new pages are zero-filled.
  • mark_dirty(): marks modified pages.
  • flush_all(): writes all dirty pages at commit time.

B-Tree node (simplified)

// btree.hpp
#pragma once
#include "pager.hpp"
#include <vector>
#include <optional>
#include <algorithm>
#include <cstring>
struct BTreeNode {
    bool is_leaf;
    uint32_t num_keys;
    std::vector<int32_t> keys;
    std::vector<uint32_t> children;
    std::vector<std::vector<uint8_t>> values;
    static constexpr size_t MAX_KEYS = 100;
    void serialize(uint8_t* buf) const {
        size_t offset = 0;
        std::memcpy(buf + offset, &is_leaf, sizeof(is_leaf));
        offset += sizeof(is_leaf);
        std::memcpy(buf + offset, &num_keys, sizeof(num_keys));
        offset += sizeof(num_keys);
        for (uint32_t i = 0; i < num_keys; ++i) {
            std::memcpy(buf + offset, &keys[i], sizeof(keys[i]));
            offset += sizeof(keys[i]);
        }
        if (!is_leaf) {
            for (uint32_t i = 0; i <= num_keys; ++i) {
                std::memcpy(buf + offset, &children[i], sizeof(children[i]));
                offset += sizeof(children[i]);
            }
        } else {
            for (uint32_t i = 0; i < num_keys; ++i) {
                uint16_t sz = static_cast<uint16_t>(values[i].size());
                std::memcpy(buf + offset, &sz, sizeof(sz));
                offset += sizeof(sz);
                std::memcpy(buf + offset, values[i].data(), sz);
                offset += sz;
            }
        }
    }
    static BTreeNode deserialize(const uint8_t* buf) {
        BTreeNode node;
        size_t offset = 0;
        std::memcpy(&node.is_leaf, buf + offset, sizeof(node.is_leaf));
        offset += sizeof(node.is_leaf);
        std::memcpy(&node.num_keys, buf + offset, sizeof(node.num_keys));
        offset += sizeof(node.num_keys);
        node.keys.resize(node.num_keys);
        for (uint32_t i = 0; i < node.num_keys; ++i) {
            std::memcpy(&node.keys[i], buf + offset, sizeof(node.keys[i]));
            offset += sizeof(node.keys[i]);
        }
        if (!node.is_leaf) {
            node.children.resize(node.num_keys + 1);
            for (uint32_t i = 0; i <= node.num_keys; ++i) {
                std::memcpy(&node.children[i], buf + offset, sizeof(node.children[i]));
                offset += sizeof(node.children[i]);
            }
        } else {
            node.values.resize(node.num_keys);
            for (uint32_t i = 0; i < node.num_keys; ++i) {
                uint16_t sz;
                std::memcpy(&sz, buf + offset, sizeof(sz));
                offset += sizeof(sz);
                node.values[i].resize(sz);
                std::memcpy(node.values[i].data(), buf + offset, sz);
                offset += sz;
            }
        }
        return node;
    }
};
class BTree {
    Pager& pager_;
    uint32_t root_page_id_;
public:
    BTree(Pager& pager, uint32_t root_page_id)
        : pager_(pager), root_page_id_(root_page_id) {}
    std::optional<std::vector<uint8_t>> search(int32_t key) {
        return search_recursive(root_page_id_, key);
    }
    void insert(int32_t key, const std::vector<uint8_t>& value) {
        auto* root_page = pager_.get_page(root_page_id_);
        auto root = BTreeNode::deserialize(root_page->data);
        if (root.num_keys >= BTreeNode::MAX_KEYS) {
            uint32_t new_root = pager_.allocate_page();
            split_root(root_page_id_, new_root);
            root_page_id_ = new_root;
        }
        insert_non_full(root_page_id_, key, value);
    }
private:
    std::optional<std::vector<uint8_t>> search_recursive(uint32_t page_id, int32_t key) {
        auto* page = pager_.get_page(page_id);
        auto node = BTreeNode::deserialize(page->data);
        auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
        size_t pos = it - node.keys.begin();
        if (it != node.keys.end() && *it == key) {
            if (node.is_leaf) return node.values[pos];
            return search_recursive(node.children[pos + 1], key);
        }
        if (node.is_leaf) return std::nullopt;
        return search_recursive(node.children[pos], key);
    }
    void split_root(uint32_t old_root, uint32_t new_root) {
        (void)old_root;
        (void)new_root;
    }
    void insert_non_full(uint32_t page_id, int32_t key, const std::vector<uint8_t>& value) {
        auto* page = pager_.get_page(page_id);
        auto node = BTreeNode::deserialize(page->data);
        if (node.is_leaf) {
            auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
            size_t pos = it - node.keys.begin();
            node.keys.insert(it, key);
            node.values.insert(node.values.begin() + pos, value);
            node.num_keys++;
            node.serialize(page->data);
            pager_.mark_dirty(page_id);
        } else {
            auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
            size_t pos = it - node.keys.begin();
            insert_non_full(node.children[pos], key, value);
        }
    }
};

4. Query parser

Tokenizer and syntax analysis

// parser.hpp
#pragma once
#include <string>
#include <vector>
#include <optional>
#include <sstream>
#include <algorithm>
#include <cctype>
enum class TokenType { SELECT, INSERT, INTO, VALUES, FROM, WHERE, IDENT, NUMBER, STRING, COMMA, LPAREN, RPAREN, EQ, SEMICOLON, END };
struct Token {
    TokenType type;
    std::string value;
};
class Tokenizer {
    std::string input_;
    size_t pos_ = 0;
    char peek() const {
        return pos_ < input_.size() ? input_[pos_] : '\0';
    }
    char consume() {
        return pos_ < input_.size() ? input_[pos_++] : '\0';
    }
    void skip_whitespace() {
        while (std::isspace(static_cast<unsigned char>(peek()))) consume();
    }
public:
    explicit Tokenizer(const std::string& input) : input_(input) {}
    Token next() {
        skip_whitespace();
        if (pos_ >= input_.size()) return {TokenType::END, ""};
        size_t start = pos_;
        if (std::isalpha(static_cast<unsigned char>(peek())) || peek() == '_') {
            while (std::isalnum(static_cast<unsigned char>(peek()))) consume();
            std::string word = input_.substr(start, pos_ - start);
            std::transform(word.begin(), word.end(), word.begin(), ::tolower);
            if (word == "select") return {TokenType::SELECT, word};
            if (word == "insert") return {TokenType::INSERT, word};
            if (word == "into") return {TokenType::INTO, word};
            if (word == "values") return {TokenType::VALUES, word};
            if (word == "from") return {TokenType::FROM, word};
            if (word == "where") return {TokenType::WHERE, word};
            return {TokenType::IDENT, word};
        }
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            while (std::isdigit(static_cast<unsigned char>(peek()))) consume();
            return {TokenType::NUMBER, input_.substr(start, pos_ - start)};
        }
        if (peek() == '\' || peek() == '"') {
            char quote = consume();
            start = pos_;
            while (peek() != quote && peek() != '\0') consume();
            std::string s = input_.substr(start, pos_ - start);
            if (peek() == quote) consume();
            return {TokenType::STRING, s};
        }
        if (peek() == ',') { consume(); return {TokenType::COMMA, ","}; }
        if (peek() == '(') { consume(); return {TokenType::LPAREN, "("}; }
        if (peek() == ')') { consume(); return {TokenType::RPAREN, ")"}; }
        if (peek() == '=') { consume(); return {TokenType::EQ, "="}; }
        if (peek() == ';') { consume(); return {TokenType::SEMICOLON, ";"}; }
        consume();
        return {TokenType::END, ""};
    }
};
struct SelectStmt {
    std::vector<std::string> columns;
    std::string table_name;
    std::optional<std::string> where_column;
    std::optional<std::string> where_value;
};
struct InsertStmt {
    std::string table_name;
    std::vector<std::string> values;
};
class SQLParser {
    Tokenizer tokenizer_;
    Token current_;
    void advance() { current_ = tokenizer_.next(); }
    bool check(TokenType t) const { return current_.type == t; }
    void expect(TokenType t) {
        if (!check(t)) throw std::runtime_error("Expected token, got: " + current_.value);
        advance();
    }
public:
    explicit SQLParser(const std::string& sql) : tokenizer_(sql) {
        advance();
    }
    SelectStmt parse_select() {
        SelectStmt stmt;
        expect(TokenType::SELECT);
        while (!check(TokenType::FROM)) {
            if (current_.type == TokenType::IDENT) {
                stmt.columns.push_back(current_.value);
                advance();
            }
            if (check(TokenType::COMMA)) advance();
        }
        expect(TokenType::FROM);
        stmt.table_name = current_.value;
        expect(TokenType::IDENT);
        if (check(TokenType::WHERE)) {
            advance();
            stmt.where_column = current_.value;
            expect(TokenType::IDENT);
            expect(TokenType::EQ);
            stmt.where_value = current_.value;
            if (current_.type == TokenType::STRING) advance();
            else expect(TokenType::NUMBER);
        }
        return stmt;
    }
    InsertStmt parse_insert() {
        InsertStmt stmt;
        expect(TokenType::INSERT);
        expect(TokenType::INTO);
        stmt.table_name = current_.value;
        expect(TokenType::IDENT);
        expect(TokenType::VALUES);
        expect(TokenType::LPAREN);
        while (!check(TokenType::RPAREN)) {
            if (current_.type == TokenType::NUMBER)
                stmt.values.push_back(current_.value);
            else if (current_.type == TokenType::STRING)
                stmt.values.push_back(current_.value);
            advance();
            if (check(TokenType::COMMA)) advance();
        }
        expect(TokenType::RPAREN);
        return stmt;
    }
    bool is_select() const { return current_.type == TokenType::SELECT; }
    bool is_insert() const { return current_.type == TokenType::INSERT; }
};

Key points:

  • Normalization: lowercasing keywords so SELECT and select both work.
  • Quoted strings: 'Alice' becomes a single STRING token.
  • expect(): throws with context when the next token is wrong.

5. Executor

SELECT / INSERT execution

// executor.hpp
#pragma once
#include "parser.hpp"
#include "pager.hpp"
#include "btree.hpp"
#include <functional>
#include <vector>
#include <string>
#include <unordered_map>
struct Schema {
    std::string table_name;
    std::vector<std::pair<std::string, std::string>> columns;
};
class Executor {
    Pager& pager_;
    std::unordered_map<std::string, std::pair<Schema, uint32_t>> tables_;
public:
    explicit Executor(Pager& pager) : pager_(pager) {}
    void create_table(const Schema& schema) {
        uint32_t root = pager_.allocate_page();
        tables_[schema.table_name] = {schema, root};
    }
    std::vector<std::vector<std::string>> execute_select(const SelectStmt& stmt) {
        std::vector<std::vector<std::string>> results;
        auto it = tables_.find(stmt.table_name);
        if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
        const auto& [schema, root_id] = it->second;
        BTree btree(pager_, root_id);
        if (stmt.where_column && stmt.where_value) {
            int32_t key = std::stoi(*stmt.where_value);
            auto opt = btree.search(key);
            if (opt) {
                auto row = deserialize_row(*opt, schema);
                results.push_back(row);
            }
        } else {
            scan_all(btree, schema, [&](const std::vector<std::string>& row) {
                results.push_back(row);
            });
        }
        return results;
    }
    void execute_insert(const InsertStmt& stmt) {
        auto it = tables_.find(stmt.table_name);
        if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
        const auto& [schema, root_id] = it->second;
        BTree btree(pager_, root_id);
        int32_t key = std::stoi(stmt.values[0]);
        std::vector<uint8_t> row_data = serialize_row(stmt.values, schema);
        btree.insert(key, row_data);
    }
private:
    std::vector<uint8_t> serialize_row(const std::vector<std::string>& values, const Schema& schema) {
        std::vector<uint8_t> data;
        for (size_t i = 0; i < schema.columns.size() && i < values.size(); ++i) {
            const auto& [name, type] = schema.columns[i];
            const auto& val = values[i];
            (void)name;
            if (type == "int") {
                int32_t num = std::stoi(val);
                data.insert(data.end(),
                    reinterpret_cast<const uint8_t*>(&num),
                    reinterpret_cast<const uint8_t*>(&num) + sizeof(num));
            } else {
                uint16_t len = static_cast<uint16_t>(val.size());
                data.insert(data.end(),
                    reinterpret_cast<const uint8_t*>(&len),
                    reinterpret_cast<const uint8_t*>(&len) + sizeof(len));
                data.insert(data.end(), val.begin(), val.end());
            }
        }
        return data;
    }
    std::vector<std::string> deserialize_row(const std::vector<uint8_t>& data, const Schema& schema) {
        std::vector<std::string> row;
        size_t offset = 0;
        for (const auto& [name, type] : schema.columns) {
            (void)name;
            if (type == "int") {
                int32_t num;
                std::memcpy(&num, data.data() + offset, sizeof(num));
                row.push_back(std::to_string(num));
                offset += sizeof(num);
            } else {
                uint16_t len;
                std::memcpy(&len, data.data() + offset, sizeof(len));
                offset += sizeof(len);
                row.push_back(std::string(data.begin() + offset, data.begin() + offset + len));
                offset += len;
            }
        }
        return row;
    }
    void scan_all(BTree& btree, const Schema& schema,
                  const std::function<void(const std::vector<std::string>&)>& callback) {
        (void)btree;
        (void)schema;
        (void)callback;
    }
};

6. Transaction

WAL (write-ahead logging)

// transaction.hpp
#pragma once
#include "pager.hpp"
#include <fstream>
#include <vector>
#include <cstring>
#include <cstdio>  // EOF
struct WALEntry {
    enum Type { INSERT, UPDATE, DELETE } type;
    uint32_t page_id;
    std::vector<uint8_t> before_image;
    std::vector<uint8_t> after_image;
};
class TransactionManager {
    std::string wal_path_;
    std::fstream wal_file_;
    std::vector<WALEntry> current_txn_;
    bool in_transaction_ = false;
public:
    explicit TransactionManager(const std::string& db_path)
        : wal_path_(db_path + ".wal") {
        wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
        if (!wal_file_) {
            wal_file_.open(wal_path_, std::ios::out | std::ios::binary);
            wal_file_.close();
            wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
        }
    }
    void begin() {
        if (in_transaction_) throw std::runtime_error("Transaction already in progress");
        current_txn_.clear();
        in_transaction_ = true;
    }
    void log_page_change(uint32_t page_id,
                         const uint8_t* before, const uint8_t* after) {
        WALEntry entry;
        entry.type = WALEntry::UPDATE;
        entry.page_id = page_id;
        entry.before_image.assign(before, before + PAGE_SIZE);
        entry.after_image.assign(after, after + PAGE_SIZE);
        current_txn_.push_back(entry);
    }
    void commit(Pager& pager) {
        if (!in_transaction_) return;
        for (const auto& e : current_txn_) {
            write_entry(e);
        }
        wal_file_.flush();
        pager.flush_all();
        current_txn_.clear();
        in_transaction_ = false;
    }
    void rollback(Pager& pager) {
        if (!in_transaction_) return;
        for (auto it = current_txn_.rbegin(); it != current_txn_.rend(); ++it) {
            auto* page = pager.get_page(it->page_id);
            std::memcpy(page->data, it->before_image.data(), PAGE_SIZE);
            pager.mark_dirty(it->page_id);
        }
        current_txn_.clear();
        in_transaction_ = false;
    }
    void recover(Pager& pager) {
        wal_file_.seekg(0, std::ios::beg);
        while (wal_file_.good() && wal_file_.peek() != EOF) {
            auto entry = read_entry();
            auto* page = pager.get_page(entry.page_id);
            std::memcpy(page->data, entry.after_image.data(), PAGE_SIZE);
            pager.mark_dirty(entry.page_id);
        }
        pager.flush_all();
    }
    bool is_in_transaction() const { return in_transaction_; }
private:
    void write_entry(const WALEntry& e) {
        uint8_t t = static_cast<uint8_t>(e.type);
        wal_file_.write(reinterpret_cast<const char*>(&t), sizeof(t));
        wal_file_.write(reinterpret_cast<const char*>(&e.page_id), sizeof(e.page_id));
        uint32_t before_sz = static_cast<uint32_t>(e.before_image.size());
        wal_file_.write(reinterpret_cast<const char*>(&before_sz), sizeof(before_sz));
        wal_file_.write(reinterpret_cast<const char*>(e.before_image.data()), before_sz);
        uint32_t after_sz = static_cast<uint32_t>(e.after_image.size());
        wal_file_.write(reinterpret_cast<const char*>(&after_sz), sizeof(after_sz));
        wal_file_.write(reinterpret_cast<const char*>(e.after_image.data()), after_sz);
    }
    WALEntry read_entry() {
        WALEntry e;
        uint8_t t;
        wal_file_.read(reinterpret_cast<char*>(&t), sizeof(t));
        e.type = static_cast<WALEntry::Type>(t);
        wal_file_.read(reinterpret_cast<char*>(&e.page_id), sizeof(e.page_id));
        uint32_t before_sz, after_sz;
        wal_file_.read(reinterpret_cast<char*>(&before_sz), sizeof(before_sz));
        e.before_image.resize(before_sz);
        wal_file_.read(reinterpret_cast<char*>(e.before_image.data()), before_sz);
        wal_file_.read(reinterpret_cast<char*>(&after_sz), sizeof(after_sz));
        e.after_image.resize(after_sz);
        wal_file_.read(reinterpret_cast<char*>(e.after_image.data()), after_sz);
        return e;
    }
};

Key points:

  • Write-ahead: log changes before they are durable on the main file.
  • Rollback: restore pages from before_image.
  • Recovery: REDO the WAL from start to finish.

7. End-to-end DB engine example

Database class

// database.hpp
#pragma once
#include "pager.hpp"
#include "parser.hpp"
#include "executor.hpp"
#include "transaction.hpp"
#include <iostream>
#include <stdexcept>
class Database {
    Pager pager_;
    TransactionManager txn_mgr_;
    Executor executor_;
public:
    explicit Database(const std::string& path)
        : pager_(path),
          txn_mgr_(path),
          executor_(pager_) {
        txn_mgr_.recover(pager_);
    }
    void execute(const std::string& sql) {
        SQLParser parser(sql);
        if (parser.is_select()) {
            auto stmt = parser.parse_select();
            auto results = executor_.execute_select(stmt);
            for (const auto& row : results) {
                for (const auto& col : row) std::cout << col << " ";
                std::cout << "\n";
            }
        } else if (parser.is_insert()) {
            auto stmt = parser.parse_insert();
            executor_.execute_insert(stmt);
        } else {
            throw std::runtime_error("Unknown statement type");
        }
    }
    void begin() { txn_mgr_.begin(); }
    void commit() { txn_mgr_.commit(pager_); }
    void rollback() { txn_mgr_.rollback(pager_); }
    void create_table(const Schema& schema) {
        executor_.create_table(schema);
    }
};

Example main

// main.cpp
#include "database.hpp"
int main() {
    Database db("mydb.dat");
    Schema schema;
    schema.table_name = "users";
    schema.columns = {{"id", "int"}, {"name", "varchar"}, {"age", "int"}};
    db.create_table(schema);
    db.begin();
    try {
        db.execute("INSERT INTO users VALUES (1, 'Alice', 30)");
        db.execute("INSERT INTO users VALUES (2, 'Bob', 25)");
        db.execute("INSERT INTO users VALUES (3, 'Charlie', 35)");
        db.commit();
    } catch (const std::exception& e) {
        db.rollback();
        std::cerr << "Rollback: " << e.what() << "\n";
    }
    db.execute("SELECT id, name, age FROM users WHERE id = 2");
    return 0;
}

Sequence diagram: running SELECT

sequenceDiagram
    participant Client
    participant Database
    participant Parser
    participant Executor
    participant Pager
    participant BTree
    Client->>Database: execute("SELECT ....WHERE id=2")
    Database->>Parser: parse_select()
    Parser->>Parser: tokenize, parse
    Parser-->>Database: SelectStmt
    Database->>Executor: execute_select(stmt)
    Executor->>BTree: search(2)
    BTree->>Pager: get_page()
    Pager-->>BTree: Page*
    BTree-->>Executor: row data
    Executor-->>Database: results
    Database-->>Client: print rows

8. Common mistakes and fixes

Issue 1: “Page not found” or garbage on read

Symptom: get_page(0) returns junk or crashes. Cause: Empty file means num_pages_ == 0 and read() reads zero bytes.

// ❌ Wrong
if (page_id < num_pages_) {
    file_.read(...);
}
// ✅ Correct
if (page_id < num_pages_) {
    file_.seekg(...);
    file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
} else {
    std::memset(page->data, 0, PAGE_SIZE);
    num_pages_ = page_id + 1;
}

Issue 2: B-Tree “vector subscript out of range”

Symptom: Crash on node.values[pos] during leaf insert. Cause: num_keys out of sync with values size.

// ✅ Keep keys and values in lockstep
node.keys.insert(it, key);
node.values.insert(node.values.begin() + pos, value);
node.num_keys++;

Issue 3: Data loss after WAL recovery

Symptom: Some transactions disappear after restart. Cause: Treating a commit as complete without fsync on the WAL.

// ❌ Wrong
wal_file_.flush();
// ✅ Production: fsync is required
wal_file_.flush();
// POSIX: fsync(fileno(wal_file_)) or platform-specific sync

Issue 4: Cache memory explosion (OOM)

Symptom: Opening a 10GB file grows RAM toward 10GB. Cause: no eviction in the pager cache. Fix: cap with LRU and MAX_CACHED (e.g., 1000); call evict_lru_page() when cache_.size() >= MAX_CACHED.

Issue 5: Parser “Expected FROM”

Symptom: "select * from users" fails. Cause: * is not a token. Fix: if (peek() == '*') { consume(); stmt.columns.push_back("*"); }.

Issue 6: Nested transactions / schema version skew

Nested transactions: two BEGIN calls corrupt state—use nesting depth or SAVEPOINTs. Schema mismatch: reading old rows after schema change fails—store a schema version in page headers and migrate.

9. Best practices

  1. Page size: use PAGE_SIZE = 4096 to match typical OS pages.
  2. WAL fsync: flush() alone is often insufficient; call platform fsync on commit in production.
  3. B-Tree tuning: size MAX_KEYS to fit a 4KB page—often on the order of hundreds of entries.
  4. Batch flush: flush at commit boundaries; flushing every page update spikes I/O.
  5. Ordered flush: sort dirty pages by page_id for more sequential disk writes.
  6. Error messages: include context, e.g. throw std::runtime_error("Expected FROM, got: " + current_.value);.
  7. Schema metadata: store catalog/system tables on dedicated pages.

10. Production patterns

Pattern 1: LRU page cache

class LRUPager {
    std::list<uint32_t> lru_list_;
    std::unordered_map<uint32_t, std::pair<std::unique_ptr<Page>, std::list<uint32_t>::iterator>> cache_;
    static constexpr size_t MAX_PAGES = 1000;
    void evict() {
        while (cache_.size() >= MAX_PAGES && !lru_list_.empty()) {
            uint32_t victim = lru_list_.back();
            lru_list_.pop_back();
            auto it = cache_.find(victim);
            if (it->second.first->dirty) flush_page(victim);
            cache_.erase(it);
        }
    }
};

Pattern 2: WAL checkpoint

void checkpoint() {
    pager_.flush_all();
    wal_file_.close();
    std::ofstream(wal_path_, std::ios::trunc).close();
    wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
}

Pattern 3: Health metrics

Track cache_size, cache_hits, cache_misses; monitor hit_ratio = cache_hits / (cache_hits + cache_misses).

Pattern 4: Backup and restore

void backup(const std::string& backup_path) {
    pager_.flush_all();
    std::filesystem::copy(db_path_, backup_path, std::filesystem::copy_options::overwrite_existing);
    std::filesystem::copy(db_path_ + ".wal", backup_path + ".wal",
        std::filesystem::copy_options::overwrite_existing);
}

Pattern 5: Externalized config and thread safety

Use something like DBConfig { page_size, max_cached_pages, fsync_on_commit }. For multiple threads, protect the pager with std::shared_mutex (shared_lock for get_page, unique_lock for flush_page).

11. Summary and checklist

At a glance

ComponentRole
Storage enginePage-based pager, B-Tree index
Query parserTokenizer + parser → AST
ExecutorSELECT/INSERT, row serialization
TransactionWAL, REDO/UNDO

Implementation checklist

  • Pager: zero new pages, flush_all
  • B-Tree: keep keys/values in sync, implement split_root
  • Parser: normalized tokens, handle *
  • Executor: schema-driven serialization
  • Transactions: WAL, before_image on rollback
  • LRU cache (production)
  • WAL checkpoints (production)

One-line summary: paging, B-Trees, a small SQL parser, an executor, and WAL transactions are enough to sketch a minimal database engine. References:


  • C++ database connectivity — SQLite, PostgreSQL, pools, transactions [#31-3]
  • C++ binary serialization — endianness, padding, save files
  • C++ database engine implementation — B-Tree, transactions, query optimization [#50-4]

C++, database, DB engine, storage engine, query parser, transaction, ACID, and related terms should surface this article in search.

FAQ (from the body)

Q. When do I use this at work?

A. Whenever you implement or embed a small engine in C++: paging, a parser, executor, and ACID transactions—with scenarios, pitfalls, and patterns like those above.

Q. What should I read before this?

A. Follow Previous / Related links at the bottom of each post, or use the C++ series index for the full learning path.

Q. How do I go deeper?

A. Use cppreference and official docs for libraries you depend on, plus the links at the end of this article.