C++ DB Engine Fundamentals: Storage Engine, Query Parser, Executor & Transactions [#49-1]
Key takeaways
A database combines a storage engine, query parser, executor, and transaction layer. Even when using SQLite or MySQL, understanding internals helps answer why indexes matter and how transactions preserve atomicity. This article walks through real scenarios, end-to-end C++ sketches (pager, parser, executor, WAL), common pitfalls, and production patterns.
Introduction: “Why does finding one key in a million rows take three seconds?”
Why DB engine fundamentals
A database is a system that combines a storage engine, query parser, executor, and transactions. Even if you only use SQLite or MySQL, without understanding internals it is hard to answer “why do we need indexes?” or “how does a transaction guarantee atomicity?”. This article covers realistic scenarios, complete C++-style examples (storage engine, parser, executor, transactions), common mistakes, and production patterns from a practical angle. Topics covered:
- Scenarios: full-scan bottlenecks, failed rollbacks, parser errors, and more
- Storage engine: page-based pager, B-Tree index
- Query parser: SELECT/INSERT parsing
- Executor: running queries and returning rows
- Transactions: WAL and ACID guarantees
- Common pitfalls: cache blowups, corrupted WAL
- Production patterns: LRU cache, checkpointing, backups

See also: Database connectivity, Binary serialization.
A mental model
A DB engine maps well to a library. The storage engine is bookshelves (pages); a B-Tree index is the card catalog; the executor is the procedure for locating and reading a book. Scanning every row without an index is like walking every shelf from end to end—cost grows with table size.
From production experience: this article is grounded in large-scale C++ work—real failures and fixes that rarely appear in textbooks, including debugging traps and practical tips.
Table of contents
- Problem scenarios
- DB engine architecture
- Storage engine
- Query parser
- Executor
- Transaction
- End-to-end DB engine example
- Common mistakes and fixes
- Best practices
- Production patterns
- Summary and checklist
1. Problem scenarios
Scenario 1: “Finding id=12345 in a million rows takes three seconds”
Situation: A users table has one million rows and SELECT * FROM users WHERE id = 12345 takes about three seconds.
Cause: Full table scan. With no index, every row is checked sequentially—O(N) complexity.
Fix: Build a B-Tree index on id to get O(log N) lookups, typically completing in milliseconds.
Scenario 2: “If one of ten INSERTs fails, I need to roll back all of them”
Situation: During a batch INSERT, the fifth row violates a constraint. What happens to the four rows already inserted?
Cause: Without a transaction, each INSERT may commit independently—partial commits with no way to roll back.
Fix: BEGIN → ten INSERTs → COMMIT, or ROLLBACK on error. Use WAL (write-ahead logging) to enforce atomicity.
Scenario 3: “Parsing SELECT * FROM users WHERE name = 'Alice' fails”
Situation: A simple SQL string fails with something like “Expected FROM”.
Cause: Fragile handling of case, whitespace, and quotes—inputs like "select * from users" break naive parsers.
Fix: In the tokenizer, normalize (e.g., lowercase keywords, trim) and treat quoted strings as single tokens.
Scenario 4: “The DB file is 10GB but only 2GB RAM is available”
Situation: The pager tries to cache every page and the process runs out of memory.
Cause: Unbounded cache—every get_page() adds a page with no eviction.
Fix: An LRU cache with a cap such as MAX_CACHED_PAGES (e.g., 1000); flush dirty pages before evicting.
Scenario 5: “After a crash, half the data disappears”
Situation: Power loss during commit; after restart, only some changes are visible.
Cause: Crash before fsync completes on the WAL, or incorrect recovery ordering.
Fix: Treat a commit as complete only after WAL records are fsynced; on recovery, REDO the WAL in order.
Scenario 6: “Two clients update the same row at once”
Situation: Clients A and B both run UPDATE users SET balance = balance - 100. The final balance is wrong.
Cause: No concurrency control—both transactions read, modify, and commit, and one update is lost.
Fix: Locks or MVCC. This article focuses on single-threaded behavior; concurrency is mentioned under production patterns.
Scenario diagram
flowchart TB
subgraph Problems[Production scenarios]
P1[Full scan 3s]
P2[Partial commit no rollback]
P3[SQL parse failure]
P4[Cache OOM]
P5[Data loss after crash]
end
subgraph Solutions[Directions]
S1[B-Tree index]
S2[WAL transactions]
S3[Tokenizer normalization]
S4[LRU page cache]
S5[WAL fsync + REDO]
end
P1 --> S1
P2 --> S2
P3 --> S3
P4 --> S4
P5 --> S5
2. DB engine architecture
Big picture
flowchart TB
subgraph Client[Client]
C1["SQL string"]
end
subgraph Engine[DB engine]
subgraph Parser[Query parser]
P1[Tokenizer]
P2[Syntax analysis]
end
subgraph Executor[Executor]
E1[Execution plan]
E2[Operators]
end
subgraph Storage[Storage engine]
S1[Pager]
S2[B-Tree]
end
subgraph Txn[Transaction]
T1[WAL]
end
end
C1 --> Parser
Parser --> Executor
Executor --> Storage
Storage --> Txn
Component roles
| Component | Role | Analogy |
|---|---|---|
| Query parser | SQL string → syntax tree (AST) | Translator |
| Executor | AST → actual work (scan, filter, insert) | Worker |
| Storage engine | Page I/O, B-Tree index | Warehouse |
| Transaction | WAL, commit/rollback | Ledger |
3. Storage engine
Why page-based storage
Disk I/O is most efficient in page-sized chunks. Using 4KB pages aligns with OS page cache and reduces random I/O.
Pager: page cache
// pager.hpp
#pragma once
#include <fstream>
#include <unordered_map>
#include <memory>
#include <cstring>
#include <cstdint>
constexpr size_t PAGE_SIZE = 4096;
struct Page {
uint32_t page_id;
uint8_t data[PAGE_SIZE];
bool dirty = false;
};
class Pager {
std::string filename_;
std::fstream file_;
std::unordered_map<uint32_t, std::unique_ptr<Page>> cache_;
uint32_t num_pages_ = 0;
public:
explicit Pager(const std::string& filename) : filename_(filename) {
file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
if (!file_) {
file_.open(filename, std::ios::out | std::ios::binary);
file_.close();
file_.open(filename, std::ios::in | std::ios::out | std::ios::binary);
}
file_.seekg(0, std::ios::end);
auto size = file_.tellg();
num_pages_ = static_cast<uint32_t>(size) / PAGE_SIZE;
}
Page* get_page(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it != cache_.end()) {
return it->second.get();
}
auto page = std::make_unique<Page>();
page->page_id = page_id;
if (page_id < num_pages_) {
file_.seekg(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
} else {
std::memset(page->data, 0, PAGE_SIZE);
num_pages_ = page_id + 1;
}
auto* ptr = page.get();
cache_[page_id] = std::move(page);
return ptr;
}
void mark_dirty(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it != cache_.end()) {
it->second->dirty = true;
}
}
void flush_page(uint32_t page_id) {
auto it = cache_.find(page_id);
if (it == cache_.end() || !it->second->dirty) return;
file_.seekp(static_cast<std::streamoff>(page_id) * PAGE_SIZE);
file_.write(reinterpret_cast<const char*>(it->second->data), PAGE_SIZE);
file_.flush();
it->second->dirty = false;
}
void flush_all() {
for (auto& [id, page] : cache_) {
if (page->dirty) flush_page(id);
}
}
uint32_t allocate_page() {
return num_pages_++;
}
~Pager() {
flush_all();
}
};
Key points:
- get_page(): on cache miss, load from disk; new pages are zero-filled.
- mark_dirty(): marks modified pages.
- flush_all(): writes all dirty pages at commit time.
B-Tree node (simplified)
// btree.hpp
#pragma once
#include "pager.hpp"
#include <vector>
#include <optional>
#include <algorithm>
#include <cstring>
struct BTreeNode {
bool is_leaf;
uint32_t num_keys;
std::vector<int32_t> keys;
std::vector<uint32_t> children;
std::vector<std::vector<uint8_t>> values;
static constexpr size_t MAX_KEYS = 100;
void serialize(uint8_t* buf) const {
size_t offset = 0;
std::memcpy(buf + offset, &is_leaf, sizeof(is_leaf));
offset += sizeof(is_leaf);
std::memcpy(buf + offset, &num_keys, sizeof(num_keys));
offset += sizeof(num_keys);
for (uint32_t i = 0; i < num_keys; ++i) {
std::memcpy(buf + offset, &keys[i], sizeof(keys[i]));
offset += sizeof(keys[i]);
}
if (!is_leaf) {
for (uint32_t i = 0; i <= num_keys; ++i) {
std::memcpy(buf + offset, &children[i], sizeof(children[i]));
offset += sizeof(children[i]);
}
} else {
for (uint32_t i = 0; i < num_keys; ++i) {
uint16_t sz = static_cast<uint16_t>(values[i].size());
std::memcpy(buf + offset, &sz, sizeof(sz));
offset += sizeof(sz);
std::memcpy(buf + offset, values[i].data(), sz);
offset += sz;
}
}
}
static BTreeNode deserialize(const uint8_t* buf) {
BTreeNode node;
size_t offset = 0;
std::memcpy(&node.is_leaf, buf + offset, sizeof(node.is_leaf));
offset += sizeof(node.is_leaf);
std::memcpy(&node.num_keys, buf + offset, sizeof(node.num_keys));
offset += sizeof(node.num_keys);
node.keys.resize(node.num_keys);
for (uint32_t i = 0; i < node.num_keys; ++i) {
std::memcpy(&node.keys[i], buf + offset, sizeof(node.keys[i]));
offset += sizeof(node.keys[i]);
}
if (!node.is_leaf) {
node.children.resize(node.num_keys + 1);
for (uint32_t i = 0; i <= node.num_keys; ++i) {
std::memcpy(&node.children[i], buf + offset, sizeof(node.children[i]));
offset += sizeof(node.children[i]);
}
} else {
node.values.resize(node.num_keys);
for (uint32_t i = 0; i < node.num_keys; ++i) {
uint16_t sz;
std::memcpy(&sz, buf + offset, sizeof(sz));
offset += sizeof(sz);
node.values[i].resize(sz);
std::memcpy(node.values[i].data(), buf + offset, sz);
offset += sz;
}
}
return node;
}
};
class BTree {
Pager& pager_;
uint32_t root_page_id_;
public:
BTree(Pager& pager, uint32_t root_page_id)
: pager_(pager), root_page_id_(root_page_id) {}
std::optional<std::vector<uint8_t>> search(int32_t key) {
return search_recursive(root_page_id_, key);
}
void insert(int32_t key, const std::vector<uint8_t>& value) {
auto* root_page = pager_.get_page(root_page_id_);
auto root = BTreeNode::deserialize(root_page->data);
if (root.num_keys >= BTreeNode::MAX_KEYS) {
uint32_t new_root = pager_.allocate_page();
split_root(root_page_id_, new_root);
root_page_id_ = new_root;
}
insert_non_full(root_page_id_, key, value);
}
private:
std::optional<std::vector<uint8_t>> search_recursive(uint32_t page_id, int32_t key) {
auto* page = pager_.get_page(page_id);
auto node = BTreeNode::deserialize(page->data);
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
if (it != node.keys.end() && *it == key) {
if (node.is_leaf) return node.values[pos];
return search_recursive(node.children[pos + 1], key);
}
if (node.is_leaf) return std::nullopt;
return search_recursive(node.children[pos], key);
}
void split_root(uint32_t old_root, uint32_t new_root) {
(void)old_root;
(void)new_root;
}
void insert_non_full(uint32_t page_id, int32_t key, const std::vector<uint8_t>& value) {
auto* page = pager_.get_page(page_id);
auto node = BTreeNode::deserialize(page->data);
if (node.is_leaf) {
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
node.keys.insert(it, key);
node.values.insert(node.values.begin() + pos, value);
node.num_keys++;
node.serialize(page->data);
pager_.mark_dirty(page_id);
} else {
auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
size_t pos = it - node.keys.begin();
insert_non_full(node.children[pos], key, value);
}
}
};
4. Query parser
Tokenizer and syntax analysis
// parser.hpp
#pragma once
#include <string>
#include <vector>
#include <optional>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <stdexcept>  // std::runtime_error in expect()
enum class TokenType { SELECT, INSERT, INTO, VALUES, FROM, WHERE, IDENT, NUMBER, STRING, COMMA, LPAREN, RPAREN, EQ, SEMICOLON, END };
struct Token {
TokenType type;
std::string value;
};
class Tokenizer {
std::string input_;
size_t pos_ = 0;
char peek() const {
return pos_ < input_.size() ? input_[pos_] : '\0';
}
char consume() {
return pos_ < input_.size() ? input_[pos_++] : '\0';
}
void skip_whitespace() {
while (std::isspace(static_cast<unsigned char>(peek()))) consume();
}
public:
explicit Tokenizer(const std::string& input) : input_(input) {}
Token next() {
skip_whitespace();
if (pos_ >= input_.size()) return {TokenType::END, ""};
size_t start = pos_;
if (std::isalpha(static_cast<unsigned char>(peek())) || peek() == '_') {
while (std::isalnum(static_cast<unsigned char>(peek()))) consume();
std::string word = input_.substr(start, pos_ - start);
std::transform(word.begin(), word.end(), word.begin(), ::tolower);
if (word == "select") return {TokenType::SELECT, word};
if (word == "insert") return {TokenType::INSERT, word};
if (word == "into") return {TokenType::INTO, word};
if (word == "values") return {TokenType::VALUES, word};
if (word == "from") return {TokenType::FROM, word};
if (word == "where") return {TokenType::WHERE, word};
return {TokenType::IDENT, word};
}
if (std::isdigit(static_cast<unsigned char>(peek()))) {
while (std::isdigit(static_cast<unsigned char>(peek()))) consume();
return {TokenType::NUMBER, input_.substr(start, pos_ - start)};
}
if (peek() == '\'' || peek() == '"') {
char quote = consume();
start = pos_;
while (peek() != quote && peek() != '\0') consume();
std::string s = input_.substr(start, pos_ - start);
if (peek() == quote) consume();
return {TokenType::STRING, s};
}
if (peek() == ',') { consume(); return {TokenType::COMMA, ","}; }
if (peek() == '(') { consume(); return {TokenType::LPAREN, "("}; }
if (peek() == ')') { consume(); return {TokenType::RPAREN, ")"}; }
if (peek() == '=') { consume(); return {TokenType::EQ, "="}; }
if (peek() == ';') { consume(); return {TokenType::SEMICOLON, ";"}; }
consume();
return {TokenType::END, ""};
}
};
struct SelectStmt {
std::vector<std::string> columns;
std::string table_name;
std::optional<std::string> where_column;
std::optional<std::string> where_value;
};
struct InsertStmt {
std::string table_name;
std::vector<std::string> values;
};
class SQLParser {
Tokenizer tokenizer_;
Token current_;
void advance() { current_ = tokenizer_.next(); }
bool check(TokenType t) const { return current_.type == t; }
void expect(TokenType t) {
if (!check(t)) throw std::runtime_error("Expected token, got: " + current_.value);
advance();
}
public:
explicit SQLParser(const std::string& sql) : tokenizer_(sql) {
advance();
}
SelectStmt parse_select() {
SelectStmt stmt;
expect(TokenType::SELECT);
while (!check(TokenType::FROM)) {
if (check(TokenType::END)) throw std::runtime_error("Expected FROM before end of input");
if (current_.type == TokenType::IDENT) {
stmt.columns.push_back(current_.value);
advance();
}
if (check(TokenType::COMMA)) advance();
}
expect(TokenType::FROM);
stmt.table_name = current_.value;
expect(TokenType::IDENT);
if (check(TokenType::WHERE)) {
advance();
stmt.where_column = current_.value;
expect(TokenType::IDENT);
expect(TokenType::EQ);
stmt.where_value = current_.value;
if (current_.type == TokenType::STRING) advance();
else expect(TokenType::NUMBER);
}
return stmt;
}
InsertStmt parse_insert() {
InsertStmt stmt;
expect(TokenType::INSERT);
expect(TokenType::INTO);
stmt.table_name = current_.value;
expect(TokenType::IDENT);
expect(TokenType::VALUES);
expect(TokenType::LPAREN);
while (!check(TokenType::RPAREN)) {
if (check(TokenType::END)) throw std::runtime_error("Expected ) before end of input");
if (current_.type == TokenType::NUMBER)
stmt.values.push_back(current_.value);
else if (current_.type == TokenType::STRING)
stmt.values.push_back(current_.value);
advance();
if (check(TokenType::COMMA)) advance();
}
expect(TokenType::RPAREN);
return stmt;
}
bool is_select() const { return current_.type == TokenType::SELECT; }
bool is_insert() const { return current_.type == TokenType::INSERT; }
};
Key points:
- Normalization: keywords are lowercased so SELECT and select both work.
- Quoted strings: 'Alice' becomes a single STRING token.
- expect(): throws with context when the next token is wrong.
5. Executor
SELECT / INSERT execution
// executor.hpp
#pragma once
#include "parser.hpp"
#include "pager.hpp"
#include "btree.hpp"
#include <functional>
#include <vector>
#include <string>
#include <unordered_map>
#include <cstring>    // std::memcpy in deserialize_row
#include <stdexcept>  // std::runtime_error
struct Schema {
std::string table_name;
std::vector<std::pair<std::string, std::string>> columns;
};
class Executor {
Pager& pager_;
std::unordered_map<std::string, std::pair<Schema, uint32_t>> tables_;
public:
explicit Executor(Pager& pager) : pager_(pager) {}
void create_table(const Schema& schema) {
uint32_t root = pager_.allocate_page();
// Initialize the root as an empty leaf: a zero-filled page would
// deserialize with is_leaf == false and break the first insert.
BTreeNode leaf;
leaf.is_leaf = true;
leaf.num_keys = 0;
auto* page = pager_.get_page(root);
leaf.serialize(page->data);
pager_.mark_dirty(root);
tables_[schema.table_name] = {schema, root};
}
std::vector<std::vector<std::string>> execute_select(const SelectStmt& stmt) {
std::vector<std::vector<std::string>> results;
auto it = tables_.find(stmt.table_name);
if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
const auto& [schema, root_id] = it->second;
BTree btree(pager_, root_id);
if (stmt.where_column && stmt.where_value) {
int32_t key = std::stoi(*stmt.where_value);
auto opt = btree.search(key);
if (opt) {
auto row = deserialize_row(*opt, schema);
results.push_back(row);
}
} else {
scan_all(btree, schema, [&](const std::vector<std::string>& row) {
results.push_back(row);
});
}
return results;
}
void execute_insert(const InsertStmt& stmt) {
auto it = tables_.find(stmt.table_name);
if (it == tables_.end()) throw std::runtime_error("Table not found: " + stmt.table_name);
const auto& [schema, root_id] = it->second;
BTree btree(pager_, root_id);
int32_t key = std::stoi(stmt.values[0]);
std::vector<uint8_t> row_data = serialize_row(stmt.values, schema);
btree.insert(key, row_data);
}
private:
std::vector<uint8_t> serialize_row(const std::vector<std::string>& values, const Schema& schema) {
std::vector<uint8_t> data;
for (size_t i = 0; i < schema.columns.size() && i < values.size(); ++i) {
const auto& [name, type] = schema.columns[i];
const auto& val = values[i];
(void)name;
if (type == "int") {
int32_t num = std::stoi(val);
data.insert(data.end(),
reinterpret_cast<const uint8_t*>(&num),
reinterpret_cast<const uint8_t*>(&num) + sizeof(num));
} else {
uint16_t len = static_cast<uint16_t>(val.size());
data.insert(data.end(),
reinterpret_cast<const uint8_t*>(&len),
reinterpret_cast<const uint8_t*>(&len) + sizeof(len));
data.insert(data.end(), val.begin(), val.end());
}
}
return data;
}
std::vector<std::string> deserialize_row(const std::vector<uint8_t>& data, const Schema& schema) {
std::vector<std::string> row;
size_t offset = 0;
for (const auto& [name, type] : schema.columns) {
(void)name;
if (type == "int") {
int32_t num;
std::memcpy(&num, data.data() + offset, sizeof(num));
row.push_back(std::to_string(num));
offset += sizeof(num);
} else {
uint16_t len;
std::memcpy(&len, data.data() + offset, sizeof(len));
offset += sizeof(len);
row.push_back(std::string(data.begin() + offset, data.begin() + offset + len));
offset += len;
}
}
return row;
}
void scan_all(BTree& btree, const Schema& schema,
const std::function<void(const std::vector<std::string>&)>& callback) {
(void)btree;
(void)schema;
(void)callback;
}
};
6. Transaction
WAL (write-ahead logging)
// transaction.hpp
#pragma once
#include "pager.hpp"
#include <fstream>
#include <vector>
#include <cstring>
#include <cstdio> // EOF
#include <stdexcept>
#include <string>
struct WALEntry {
enum Type { INSERT, UPDATE, DELETE } type;
uint32_t page_id;
std::vector<uint8_t> before_image;
std::vector<uint8_t> after_image;
};
class TransactionManager {
std::string wal_path_;
std::fstream wal_file_;
std::vector<WALEntry> current_txn_;
bool in_transaction_ = false;
public:
explicit TransactionManager(const std::string& db_path)
: wal_path_(db_path + ".wal") {
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
if (!wal_file_) {
wal_file_.open(wal_path_, std::ios::out | std::ios::binary);
wal_file_.close();
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
}
}
void begin() {
if (in_transaction_) throw std::runtime_error("Transaction already in progress");
current_txn_.clear();
in_transaction_ = true;
}
void log_page_change(uint32_t page_id,
const uint8_t* before, const uint8_t* after) {
WALEntry entry;
entry.type = WALEntry::UPDATE;
entry.page_id = page_id;
entry.before_image.assign(before, before + PAGE_SIZE);
entry.after_image.assign(after, after + PAGE_SIZE);
current_txn_.push_back(entry);
}
void commit(Pager& pager) {
if (!in_transaction_) return;
for (const auto& e : current_txn_) {
write_entry(e);
}
wal_file_.flush();
pager.flush_all();
current_txn_.clear();
in_transaction_ = false;
}
void rollback(Pager& pager) {
if (!in_transaction_) return;
for (auto it = current_txn_.rbegin(); it != current_txn_.rend(); ++it) {
auto* page = pager.get_page(it->page_id);
std::memcpy(page->data, it->before_image.data(), PAGE_SIZE);
pager.mark_dirty(it->page_id);
}
current_txn_.clear();
in_transaction_ = false;
}
void recover(Pager& pager) {
wal_file_.seekg(0, std::ios::beg);
while (wal_file_.good() && wal_file_.peek() != EOF) {
auto entry = read_entry();
if (!wal_file_) break;  // torn/partial trailing record: stop REDO here
auto* page = pager.get_page(entry.page_id);
std::memcpy(page->data, entry.after_image.data(), PAGE_SIZE);
pager.mark_dirty(entry.page_id);
}
pager.flush_all();
}
bool is_in_transaction() const { return in_transaction_; }
private:
void write_entry(const WALEntry& e) {
uint8_t t = static_cast<uint8_t>(e.type);
wal_file_.write(reinterpret_cast<const char*>(&t), sizeof(t));
wal_file_.write(reinterpret_cast<const char*>(&e.page_id), sizeof(e.page_id));
uint32_t before_sz = static_cast<uint32_t>(e.before_image.size());
wal_file_.write(reinterpret_cast<const char*>(&before_sz), sizeof(before_sz));
wal_file_.write(reinterpret_cast<const char*>(e.before_image.data()), before_sz);
uint32_t after_sz = static_cast<uint32_t>(e.after_image.size());
wal_file_.write(reinterpret_cast<const char*>(&after_sz), sizeof(after_sz));
wal_file_.write(reinterpret_cast<const char*>(e.after_image.data()), after_sz);
}
WALEntry read_entry() {
WALEntry e;
uint8_t t;
wal_file_.read(reinterpret_cast<char*>(&t), sizeof(t));
e.type = static_cast<WALEntry::Type>(t);
wal_file_.read(reinterpret_cast<char*>(&e.page_id), sizeof(e.page_id));
uint32_t before_sz, after_sz;
wal_file_.read(reinterpret_cast<char*>(&before_sz), sizeof(before_sz));
e.before_image.resize(before_sz);
wal_file_.read(reinterpret_cast<char*>(e.before_image.data()), before_sz);
wal_file_.read(reinterpret_cast<char*>(&after_sz), sizeof(after_sz));
e.after_image.resize(after_sz);
wal_file_.read(reinterpret_cast<char*>(e.after_image.data()), after_sz);
return e;
}
};
Key points:
- Write-ahead: log changes before they are durable on the main file.
- Rollback: restore pages from before_image.
- Recovery: REDO the WAL from start to finish.
7. End-to-end DB engine example
Database class
// database.hpp
#pragma once
#include "pager.hpp"
#include "parser.hpp"
#include "executor.hpp"
#include "transaction.hpp"
#include <iostream>
#include <stdexcept>
class Database {
Pager pager_;
TransactionManager txn_mgr_;
Executor executor_;
public:
explicit Database(const std::string& path)
: pager_(path),
txn_mgr_(path),
executor_(pager_) {
txn_mgr_.recover(pager_);
}
void execute(const std::string& sql) {
SQLParser parser(sql);
if (parser.is_select()) {
auto stmt = parser.parse_select();
auto results = executor_.execute_select(stmt);
for (const auto& row : results) {
for (const auto& col : row) std::cout << col << " ";
std::cout << "\n";
}
} else if (parser.is_insert()) {
auto stmt = parser.parse_insert();
executor_.execute_insert(stmt);
} else {
throw std::runtime_error("Unknown statement type");
}
}
void begin() { txn_mgr_.begin(); }
void commit() { txn_mgr_.commit(pager_); }
void rollback() { txn_mgr_.rollback(pager_); }
void create_table(const Schema& schema) {
executor_.create_table(schema);
}
};
Example main
// main.cpp
#include "database.hpp"
int main() {
Database db("mydb.dat");
Schema schema;
schema.table_name = "users";
schema.columns = {{"id", "int"}, {"name", "varchar"}, {"age", "int"}};
db.create_table(schema);
db.begin();
try {
db.execute("INSERT INTO users VALUES (1, 'Alice', 30)");
db.execute("INSERT INTO users VALUES (2, 'Bob', 25)");
db.execute("INSERT INTO users VALUES (3, 'Charlie', 35)");
db.commit();
} catch (const std::exception& e) {
db.rollback();
std::cerr << "Rollback: " << e.what() << "\n";
}
db.execute("SELECT id, name, age FROM users WHERE id = 2");
return 0;
}
Sequence diagram: running SELECT
sequenceDiagram
participant Client
participant Database
participant Parser
participant Executor
participant Pager
participant BTree
Client->>Database: execute("SELECT ... WHERE id=2")
Database->>Parser: parse_select()
Parser->>Parser: tokenize, parse
Parser-->>Database: SelectStmt
Database->>Executor: execute_select(stmt)
Executor->>BTree: search(2)
BTree->>Pager: get_page()
Pager-->>BTree: Page*
BTree-->>Executor: row data
Executor-->>Database: results
Database-->>Client: print rows
8. Common mistakes and fixes
Issue 1: “Page not found” or garbage on read
Symptom: get_page(0) returns junk or crashes.
Cause: Empty file means num_pages_ == 0 and read() reads zero bytes.
// ❌ Wrong
if (page_id < num_pages_) {
file_.read(...);
}
// ✅ Correct
if (page_id < num_pages_) {
file_.seekg(...);
file_.read(reinterpret_cast<char*>(page->data), PAGE_SIZE);
} else {
std::memset(page->data, 0, PAGE_SIZE);
num_pages_ = page_id + 1;
}
Issue 2: B-Tree “vector subscript out of range”
Symptom: Crash on node.values[pos] during leaf insert.
Cause: num_keys out of sync with values size.
// ✅ Keep keys and values in lockstep
node.keys.insert(it, key);
node.values.insert(node.values.begin() + pos, value);
node.num_keys++;
Issue 3: Data loss after WAL recovery
Symptom: Some transactions disappear after restart.
Cause: Treating a commit as complete without fsync on the WAL.
// ❌ Wrong
wal_file_.flush();
// ✅ Production: fsync is required
wal_file_.flush();
// POSIX: fsync() on the underlying file descriptor — std::fstream does not
// expose one, so production engines use open()/write()/fsync() directly
Issue 4: Cache memory explosion (OOM)
Symptom: Opening a 10GB file grows RAM toward 10GB.
Cause: No eviction in the pager cache.
Fix: Cap with LRU and MAX_CACHED (e.g., 1000); call evict_lru_page() when cache_.size() >= MAX_CACHED.
Issue 5: Parser “Expected FROM”
Symptom: "select * from users" fails.
Cause: * is not a token.
Fix: if (peek() == '*') { consume(); stmt.columns.push_back("*"); }.
Issue 6: Nested transactions / schema version skew
Nested transactions: two BEGIN calls corrupt state—use nesting depth or SAVEPOINTs.
Schema mismatch: reading old rows after schema change fails—store a schema version in page headers and migrate.
9. Best practices
- Page size: use PAGE_SIZE = 4096 to match typical OS pages.
- WAL fsync: flush() alone is often insufficient; call platform fsync on commit in production.
- B-Tree tuning: size MAX_KEYS to fit a 4KB page, often on the order of hundreds of entries.
- Batch flush: flush at commit boundaries; flushing every page update spikes I/O.
- Ordered flush: sort dirty pages by page_id for more sequential disk writes.
- Error messages: include context, e.g. throw std::runtime_error("Expected FROM, got: " + current_.value);.
- Schema metadata: store catalog/system tables on dedicated pages.
10. Production patterns
Pattern 1: LRU page cache
class LRUPager {  // sketch: flush_page() writes a dirty page back, as in Pager above
std::list<uint32_t> lru_list_;
std::unordered_map<uint32_t, std::pair<std::unique_ptr<Page>, std::list<uint32_t>::iterator>> cache_;
static constexpr size_t MAX_PAGES = 1000;
void evict() {
while (cache_.size() >= MAX_PAGES && !lru_list_.empty()) {
uint32_t victim = lru_list_.back();
lru_list_.pop_back();
auto it = cache_.find(victim);
if (it->second.first->dirty) flush_page(victim);
cache_.erase(it);
}
}
};
Pattern 2: WAL checkpoint
void checkpoint() {
pager_.flush_all();
wal_file_.close();
std::ofstream(wal_path_, std::ios::trunc).close();
wal_file_.open(wal_path_, std::ios::in | std::ios::out | std::ios::binary | std::ios::app);
}
Pattern 3: Health metrics
Track cache_size, cache_hits, cache_misses; monitor hit_ratio = cache_hits / (cache_hits + cache_misses).
Pattern 4: Backup and restore
void backup(const std::string& backup_path) {
pager_.flush_all();
std::filesystem::copy(db_path_, backup_path, std::filesystem::copy_options::overwrite_existing);
std::filesystem::copy(db_path_ + ".wal", backup_path + ".wal",
std::filesystem::copy_options::overwrite_existing);
}
Pattern 5: Externalized config and thread safety
Use something like DBConfig { page_size, max_cached_pages, fsync_on_commit }. For multiple threads, protect the pager with std::shared_mutex (shared_lock for get_page, unique_lock for flush_page).
11. Summary and checklist
At a glance
| Component | Role |
|---|---|
| Storage engine | Page-based pager, B-Tree index |
| Query parser | Tokenizer + parser → AST |
| Executor | SELECT/INSERT, row serialization |
| Transaction | WAL, REDO/UNDO |
Implementation checklist
- Pager: zero new pages, flush_all
- B-Tree: keep keys/values in sync, implement split_root
- Parser: normalized tokens, handle *
- Executor: schema-driven serialization
- Transactions: WAL, before_image on rollback
- LRU cache (production)
- WAL checkpoints (production)
- WAL checkpoints (production)
One-line summary: paging, B-Trees, a small SQL parser, an executor, and WAL transactions are enough to sketch a minimal database engine.
References:
- SQLite Architecture
- Database Internals (Alex Petrov, O’Reilly)
- cppreference — File I/O
Related posts on this site
- C++ database connectivity — SQLite, PostgreSQL, pools, transactions [#31-3]
- C++ binary serialization — endianness, padding, save files
- C++ database engine implementation — B-Tree, transactions, query optimization [#50-4]
FAQ (from the body)
Q. When do I use this at work?
A. Whenever you implement or embed a small engine in C++: paging, a parser, executor, and ACID transactions—with scenarios, pitfalls, and patterns like those above.
Q. What should I read before this?
A. Follow Previous / Related links at the bottom of each post, or use the C++ series index for the full learning path.
Q. How do I go deeper?
A. Use cppreference and official docs for libraries you depend on, plus the links at the end of this article.