Custom C++ Memory Pools: Fixed Blocks, TLS, and Benchmarks


What this article covers

Fixed-size block pools, free lists, thread-local pools, object pools, frame allocators, and benchmarking vs global new/delete. When pools help and when they hurt.

Why Memory Pools?

The global heap (new/delete) is a general-purpose allocator that handles any size, any lifetime, from any thread. That flexibility has costs: synchronization between threads (potential lock contention), fragmentation, and per-allocation bookkeeping.

When a hot path allocates many objects of the same size with the same lifetime pattern, a custom pool can be 5-50x faster than new/delete for that specific case. Game engines, network servers, and database implementations all use pools for exactly this reason.

The tradeoff: pools are less flexible. You must know the object size at pool creation time, and you must carefully track which objects belong to which pool.


Fixed-Size Block Pool

The simplest pool: a slab of memory divided into fixed-size blocks, linked together as a free list. Allocation pops from the list, deallocation pushes back.

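#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>    // std::size_t, std::max_align_t
#include <cstdint>    // uint32_t (used by Packet below)
#include <cstdlib>    // std::aligned_alloc, std::free (C++17; not provided by MSVC)
#include <new>        // std::bad_alloc, placement new
#include <iostream>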

class FixedBlockPool {
    struct Block {
        Block* next;  // intrusive free list — stored in the block itself
    };

    char*  slab_;        // raw memory
    Block* free_head_;   // head of free list
    size_t block_size_;  // must be >= sizeof(Block*)
    size_t capacity_;    // total number of blocks
    size_t allocated_;   // currently in use

public:
    FixedBlockPool(size_t block_size, size_t capacity)
        : block_size_(std::max(block_size, sizeof(Block*)))
        , capacity_(capacity)
        , allocated_(0)
    {
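        // Round block size up to a multiple of alignof(std::max_align_t) so every
        // block is suitably aligned and the slab size is a multiple of the alignment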
        block_size_ = (block_size_ + alignof(std::max_align_t) - 1)
                    & ~(alignof(std::max_align_t) - 1);

        slab_ = static_cast<char*>(
            std::aligned_alloc(alignof(std::max_align_t),
                               block_size_ * capacity_));
        if (!slab_) throw std::bad_alloc{};

        // Build the free list by linking all blocks
        free_head_ = nullptr;
        for (size_t i = capacity_; i-- > 0; ) {
            Block* b = reinterpret_cast<Block*>(slab_ + i * block_size_);
            b->next = free_head_;
            free_head_ = b;
        }
    }

    ~FixedBlockPool() {
        std::free(slab_);
    }

    // Non-copyable, non-movable
    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    void* allocate() {
        if (!free_head_) return nullptr;  // pool exhausted
        Block* b = free_head_;
        free_head_ = b->next;
        ++allocated_;
        return b;
    }

    void deallocate(void* ptr) {
        if (!ptr) return;
        Block* b = static_cast<Block*>(ptr);
        b->next = free_head_;
        free_head_ = b;
        --allocated_;
    }

    size_t allocated() const { return allocated_; }
    size_t available() const { return capacity_ - allocated_; }
};

Using the Pool

struct Packet {
    uint32_t id;
    uint32_t length;
    char data[56];  // 64 bytes total
};

int main() {
    FixedBlockPool pool(sizeof(Packet), 1024);

    // Allocate and construct
    void* mem = pool.allocate();
    Packet* p = new(mem) Packet{};  // placement new — construct in pool memory
    p->id = 1;
    p->length = 10;

    // Destroy and return to pool
    p->~Packet();          // explicit destructor — no delete
    pool.deallocate(mem);

    std::cout << "Available: " << pool.available() << '\n';  // 1024
}

Object Pool (RAII Wrapper)

A cleaner interface that handles construction/destruction automatically:

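#include <cstddef>     // std::max_align_t
#include <functional>  // std::function
#include <iostream>
#include <memory>      // std::unique_ptr
#include <new>         // placement new, std::bad_alloc
#include <string>
#include <utility>     // std::forward, std::move

template<typename T>
class ObjectPool {
    // FixedBlockPool only guarantees alignof(std::max_align_t)
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types need a pool with stronger alignment");

    FixedBlockPool block_pool_;

public:
    explicit ObjectPool(size_t capacity)
        : block_pool_(sizeof(T), capacity) {}

    // Custom deleter that returns the block to the pool.
    // std::function keeps this short but type-erases the deleter; a small
    // functor holding the pool pointer would be cheaper.
    using UniquePtr = std::unique_ptr<T, std::function<void(T*)>>;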

    template<typename... Args>
    UniquePtr acquire(Args&&... args) {
        void* mem = block_pool_.allocate();
        if (!mem) throw std::bad_alloc{};

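        T* obj = nullptr;
        try {
            obj = new(mem) T(std::forward<Args>(args)...);
        } catch (...) {
            block_pool_.deallocate(mem);  // don't leak the block if T's constructor throws
            throw;
        }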

        return UniquePtr(obj, [this](T* p) {
            p->~T();                    // call destructor
            block_pool_.deallocate(p);  // return to pool
        });
    }

    size_t available() const { return block_pool_.available(); }
};

// Usage
struct Connection {
    int fd;
    std::string address;
    Connection(int fd, std::string addr) : fd(fd), address(std::move(addr)) {}
    ~Connection() { std::cout << "Connection " << fd << " closed\n"; }
};

int main() {
    ObjectPool<Connection> pool(100);

    auto conn = pool.acquire(42, "192.168.1.1");
    std::cout << "Connected: " << conn->address << '\n';
    std::cout << "Available: " << pool.available() << '\n';  // 99

    // conn goes out of scope — destructor called, block returned to pool
}
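The std::function deleter keeps ObjectPool short, but std::function type-erases the callable and is typically several pointers wide, so every UniquePtr carries that weight. A minimal sketch of a lighter alternative, assuming you are free to change the pool's interface: a deleter struct that stores only a pool pointer. SimplePool and its nested Deleter are hypothetical names, and SimplePool is a cut-down stand-in for FixedBlockPool (a free list over a vector of slots) so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for FixedBlockPool, sized for one type T
template<typename T>
class SimplePool {
    union Slot {
        Slot* next;                                   // valid while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // holds a T while in use
    };
    std::vector<Slot> slots_;  // never resized, so slot addresses stay stable
    Slot* free_head_ = nullptr;
    std::size_t available_ = 0;

public:
    explicit SimplePool(std::size_t capacity) : slots_(capacity) {
        for (Slot& s : slots_) { s.next = free_head_; free_head_ = &s; ++available_; }
    }
    SimplePool(const SimplePool&) = delete;             // deleters hold `this`
    SimplePool& operator=(const SimplePool&) = delete;

    // Deleter holding only a pool pointer: no type erasure, easy to inline
    struct Deleter {
        SimplePool* pool;
        void operator()(T* p) const { pool->release(p); }
    };
    using UniquePtr = std::unique_ptr<T, Deleter>;

    template<typename... Args>
    UniquePtr acquire(Args&&... args) {
        if (!free_head_) throw std::bad_alloc{};
        Slot* s = free_head_;
        free_head_ = s->next;
        --available_;
        try {
            T* obj = ::new (static_cast<void*>(s->storage)) T(std::forward<Args>(args)...);
            return UniquePtr(obj, Deleter{this});
        } catch (...) {
            s->next = free_head_;  // constructor threw: put the slot back
            free_head_ = s;
            ++available_;
            throw;
        }
    }

    void release(T* p) {
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);  // storage sits at offset 0 of the slot
        s->next = free_head_;
        free_head_ = s;
        ++available_;
    }

    std::size_t available() const { return available_; }
};
```

Usage mirrors ObjectPool: `auto c = pool.acquire(args...)` returns a std::unique_ptr whose deleter is one pointer, and the pool must still outlive every handle it gives out.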

Thread-Local Pool

For multi-threaded code, each thread gets its own pool. No locks needed:

class TLSPool {
    static constexpr size_t BLOCK_SIZE = 64;
    static constexpr size_t CAPACITY   = 4096;

    struct ThreadPool {
        FixedBlockPool pool{BLOCK_SIZE, CAPACITY};
    };

    // Each thread has its own pool instance
    static thread_local ThreadPool tl_pool_;

public:
    static void* allocate() {
        return tl_pool_.pool.allocate();
    }

    // ptr must come from allocate() on this same thread
    static void deallocate(void* ptr) {
        tl_pool_.pool.deallocate(ptr);
    }
};

thread_local TLSPool::ThreadPool TLSPool::tl_pool_;

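The thread-local pool needs no locks: each thread allocates from and frees to its own pool without touching another thread's state. That guarantee holds only if every block is freed on the thread that allocated it. Freeing on another thread pushes the block onto that thread's free list, and when the owning thread exits, its thread_local pool is destroyed and its slab freed even if blocks from it are still in use. Each thread that uses the pool also gets its own 256 KB slab (64 bytes × 4096 blocks).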
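The no-locks design quietly assumes that every block is freed on the thread that allocated it, and nothing in TLSPool enforces that. A minimal sketch of a debug guard, assuming a hypothetical OwnedPool type that records std::this_thread::get_id() when it is constructed and asserts on every call from a different thread. It uses pointer-sized blocks to stay short; the idea carries over to FixedBlockPool unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical debug pool: remembers its creating thread and asserts that
// allocate()/deallocate() are only ever called from that thread.
class OwnedPool {
    struct Block { Block* next; };
    std::vector<Block> blocks_;  // pointer-sized blocks keep the sketch short
    Block* free_head_ = nullptr;
    std::thread::id owner_ = std::this_thread::get_id();

public:
    explicit OwnedPool(std::size_t capacity) : blocks_(capacity) {
        for (Block& b : blocks_) { b.next = free_head_; free_head_ = &b; }
    }

    bool on_owner_thread() const { return std::this_thread::get_id() == owner_; }

    void* allocate() {
        assert(on_owner_thread() && "allocate() called from a foreign thread");
        if (!free_head_) return nullptr;
        Block* b = free_head_;
        free_head_ = b->next;
        return b;
    }

    void deallocate(void* ptr) {
        assert(on_owner_thread() && "block freed on a thread that does not own this pool");
        if (!ptr) return;
        Block* b = static_cast<Block*>(ptr);
        b->next = free_head_;
        free_head_ = b;
    }
};

// One pool per thread, constructed on that thread the first time it calls this
inline OwnedPool& thread_pool() {
    thread_local OwnedPool pool(1024);
    return pool;
}
```

The check costs one thread-id comparison per call and disappears entirely when NDEBUG is defined, so it can stay in the code permanently.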


Frame Allocator (Linear / Bump Allocator)

For objects that all share the same lifetime (a single game frame, a request handler), a bump allocator is even faster than a free list — just increment a pointer:

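#include <cassert>
#include <cstddef>   // std::size_t, std::max_align_t
#include <iostream>
#include <memory>    // std::align

class FrameAllocator {
    char*  buffer_;
    size_t capacity_;
    size_t offset_;  // next free byte

public:
    explicit FrameAllocator(size_t capacity)
        : buffer_(new char[capacity])
        , capacity_(capacity)
        , offset_(0)
    {}

    ~FrameAllocator() { delete[] buffer_; }

    // Owns a raw buffer, so copying would double-delete it
    FrameAllocator(const FrameAllocator&) = delete;
    FrameAllocator& operator=(const FrameAllocator&) = delete;

    void* allocate(size_t size, size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");

        // std::align rounds the actual address (not just the offset) up to
        // `align`, which matters when align exceeds the buffer's own alignment,
        // and fails if `size` bytes no longer fit in the remaining space
        void*  ptr   = buffer_ + offset_;
        size_t space = capacity_ - offset_;
        if (!std::align(align, size, ptr, space)) return nullptr;

        offset_ = static_cast<size_t>(static_cast<char*>(ptr) - buffer_) + size;
        return ptr;
    }

    // Reset all allocations at once: O(1)
    void reset() {
        offset_ = 0;
        // All pointers from before reset() are now invalid
    }

    size_t used() const { return offset_; }
    size_t remaining() const { return capacity_ - offset_; }
};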

// Game loop usage
int main() {
    FrameAllocator frame(1024 * 1024);  // 1 MB per frame

    for (int frameNum = 0; frameNum < 3; ++frameNum) {
        // All per-frame allocations from the frame allocator
        auto* enemies  = static_cast<int*>(frame.allocate(sizeof(int) * 100));
        auto* messages = static_cast<char*>(frame.allocate(256));

        // Use enemies and messages this frame...
        (void)enemies; (void)messages;

        // End of frame — free everything at once
        frame.reset();
        std::cout << "Frame " << frameNum << " done\n";
    }
}

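No individual frees: the whole frame is reset at once. Because allocations are packed contiguously, this is also very cache-friendly. Note that reset() runs no destructors, so a frame allocator suits trivially destructible data, or objects whose destructors you call yourself before resetting.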
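Because reset() only rewinds the offset and never runs destructors, one way to make that rule hard to break is a typed helper that refuses types with non-trivial destructors at compile time. A sketch, assuming a hypothetical create<T>() member on a cut-down copy of FrameAllocator named Frame; std::align handles the address rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

class Frame {
    std::unique_ptr<char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;

public:
    explicit Frame(std::size_t capacity)
        : buffer_(new char[capacity]), capacity_(capacity) {}

    void* allocate(std::size_t size, std::size_t align) {
        void* ptr = buffer_.get() + offset_;
        std::size_t space = capacity_ - offset_;
        if (!std::align(align, size, ptr, space)) return nullptr;  // out of space
        offset_ = static_cast<std::size_t>(static_cast<char*>(ptr) - buffer_.get()) + size;
        return ptr;
    }

    // Hypothetical typed helper: reset() will never call ~T(), so reject
    // types whose destructors do real work
    template<typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "frame memory is reset without running destructors");
        void* mem = allocate(sizeof(T), alignof(T));
        return mem ? ::new (mem) T(std::forward<Args>(args)...) : nullptr;
    }

    void reset() { offset_ = 0; }
    std::size_t used() const { return offset_; }
};
```

With this helper, `frame.create<std::string>()` fails to compile instead of leaking the string's heap buffer at the next reset().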


Scenarios for Each Pool Type

| Problem | Best pool |
|---|---|
| Many objects of the same type (particles, packets) | Fixed-block pool |
| Multi-threaded, high-churn allocations | Thread-local pool |
| Per-frame or per-request temporary data | Frame/linear allocator |
| Heterogeneous small objects | Size-class pools (multiple fixed-block pools) |
| Already using standard containers | std::pmr with monotonic_buffer_resource |
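The size-class row can be made concrete by routing each request to the smallest fixed-block pool whose block fits and falling back to the global heap above the largest class. A minimal sketch, assuming three hypothetical classes (16, 32 and 64 bytes) and a caller that passes the same size to deallocate() that it passed to allocate(), which is how the right pool is found again without a per-block header. BlockPool is a small stand-in for FixedBlockPool.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <new>
#include <vector>

// Minimal fixed-block pool (a stand-in for FixedBlockPool)
class BlockPool {
    struct Block { Block* next; };
    using Unit = std::max_align_t;  // slab element type: keeps blocks maximally aligned
    std::vector<Unit> slab_;
    Block* free_head_ = nullptr;

public:
    BlockPool(std::size_t block_size, std::size_t count) {
        std::size_t units = (block_size + sizeof(Unit) - 1) / sizeof(Unit);  // round up
        slab_.resize(units * count);
        for (std::size_t i = count; i-- > 0; )
            free_head_ = ::new (static_cast<void*>(slab_.data() + i * units)) Block{free_head_};
    }

    void* allocate() {
        if (!free_head_) return nullptr;
        Block* b = free_head_;
        free_head_ = b->next;
        return b;
    }

    void deallocate(void* p) { free_head_ = ::new (p) Block{free_head_}; }
};

// Routes each request to the smallest size class that fits
class SizeClassAllocator {
    static constexpr std::size_t kClasses[] = {16, 32, 64};  // hypothetical classes, bytes
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);
    std::vector<BlockPool> pools_;

    static std::size_t class_for(std::size_t size) {
        for (std::size_t i = 0; i < std::size(kClasses); ++i)
            if (size <= kClasses[i]) return i;
        return kNone;  // larger than every class
    }

public:
    explicit SizeClassAllocator(std::size_t blocks_per_class) {
        pools_.reserve(std::size(kClasses));
        for (std::size_t bytes : kClasses) pools_.emplace_back(bytes, blocks_per_class);
    }

    void* allocate(std::size_t size) {
        std::size_t c = class_for(size);
        if (c == kNone) return ::operator new(size);  // too big: use the global heap
        void* p = pools_[c].allocate();
        if (!p) throw std::bad_alloc{};  // this size class is exhausted
        return p;
    }

    // The caller must pass back the size it allocated with
    void deallocate(void* p, std::size_t size) {
        if (!p) return;
        std::size_t c = class_for(size);
        if (c == kNone) ::operator delete(p);
        else pools_[c].deallocate(p);
    }
};
```

Real size-class allocators usually use many more classes and grow each pool on demand; a fixed three-class, fixed-capacity table just keeps the routing logic visible.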

Benchmarking Pool vs new/delete

Always benchmark on your actual hardware and workload. Here’s a simple benchmark pattern:

#include <chrono>
#include <vector>
#include <iostream>

template<typename F>
double measureMs(F&& f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    f(iterations);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

struct SmallObj { int data[8]; };

void benchNewDelete(int n) {
    std::vector<SmallObj*> ptrs;
    ptrs.reserve(n);
    for (int i = 0; i < n; ++i) ptrs.push_back(new SmallObj{});
    for (auto* p : ptrs) delete p;
}

void benchPool(int n) {
    FixedBlockPool pool(sizeof(SmallObj), static_cast<size_t>(n));
    std::vector<void*> ptrs;
    ptrs.reserve(n);
    for (int i = 0; i < n; ++i) ptrs.push_back(pool.allocate());
    for (auto* p : ptrs) pool.deallocate(p);
}

int main() {
    const int N = 100'000;
    int reps = 10;

    double nd = 0, pool = 0;
    for (int i = 0; i < reps; ++i) {
        nd   += measureMs(benchNewDelete, N);
        pool += measureMs(benchPool, N);
    }

    std::cout << "new/delete: " << nd/reps   << " ms avg\n";
    std::cout << "pool:       " << pool/reps << " ms avg\n";
    std::cout << "speedup:    " << nd/pool   << "x\n";
}

Typical results on a modern machine for 100K small objects:

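  • new/delete: ~15-30 ms (general-purpose bookkeeping on every call; this benchmark is single-threaded, so it does not exercise heap lock contention)
  • Fixed-block pool: ~1-3 ms (one free-list pop/push per call, no system calls)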

Results vary significantly by platform, allocator implementation, and contention level. Measure your specific case.


Common Mistakes

Pool outlived by objects: if the pool is destroyed while objects are still in use, touching those objects or returning them to the pool accesses freed memory. That is undefined behavior, and it often shows up as silent corruption.

// WRONG
ObjectPool<Widget>* pool = new ObjectPool<Widget>(100);
auto w = pool->acquire();
delete pool;  // pool gone
// w's deleter still refers to the destroyed pool, so running it is undefined behavior

// CORRECT — ensure pool outlives all acquired objects
ObjectPool<Widget> pool(100);
auto w = pool.acquire();
// Locals are destroyed in reverse declaration order, so w is released before pool
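One cheap safeguard against a pool dying before its objects is a debug check in the destructor that every block has come back. A sketch, assuming a hypothetical CheckedPool that keeps an allocated() counter like FixedBlockPool's; the check uses assert, so it is compiled out when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool with a debug-only "nothing outstanding" check
class CheckedPool {
    struct Block { Block* next; };
    std::vector<Block> blocks_;  // pointer-sized blocks keep the sketch short
    Block* free_head_ = nullptr;
    std::size_t allocated_ = 0;

public:
    explicit CheckedPool(std::size_t capacity) : blocks_(capacity) {
        for (Block& b : blocks_) { b.next = free_head_; free_head_ = &b; }
    }
    CheckedPool(const CheckedPool&) = delete;
    CheckedPool& operator=(const CheckedPool&) = delete;

    ~CheckedPool() {
        // A block still in use here would later be returned to freed memory
        assert(allocated_ == 0 && "pool destroyed while blocks are still in use");
    }

    void* allocate() {
        if (!free_head_) return nullptr;
        Block* b = free_head_;
        free_head_ = b->next;
        ++allocated_;
        return b;
    }

    void deallocate(void* p) {
        if (!p) return;
        Block* b = static_cast<Block*>(p);
        b->next = free_head_;
        free_head_ = b;
        --allocated_;
    }

    std::size_t allocated() const { return allocated_; }
};
```

In a debug build this turns the WRONG example's silent use-after-free into an immediate abort at `delete pool`, which points straight at the ordering bug.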

Deallocating to the wrong pool: two pools of the same block size still have separate slabs. Returning a block from pool A to pool B corrupts pool B’s free list.
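Each pool knows its own slab's address range, so a debug build can catch a block handed to the wrong pool. A minimal sketch, assuming a hypothetical owns() helper on a simplified pool; it converts addresses to std::uintptr_t for the range and block-boundary test, which is implementation-defined but behaves as expected on mainstream flat-address platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class RangeCheckedPool {
    struct Block { Block* next; };
    std::vector<Block> blocks_;  // pointer-sized blocks keep the sketch short
    Block* free_head_ = nullptr;

public:
    explicit RangeCheckedPool(std::size_t capacity) : blocks_(capacity) {
        for (Block& b : blocks_) { b.next = free_head_; free_head_ = &b; }
    }

    // True only if p is the start of one of this pool's blocks
    bool owns(const void* p) const {
        if (blocks_.empty()) return false;
        auto addr  = reinterpret_cast<std::uintptr_t>(p);
        auto begin = reinterpret_cast<std::uintptr_t>(blocks_.data());
        auto end   = begin + blocks_.size() * sizeof(Block);
        return addr >= begin && addr < end && (addr - begin) % sizeof(Block) == 0;
    }

    void* allocate() {
        if (!free_head_) return nullptr;
        Block* b = free_head_;
        free_head_ = b->next;
        return b;
    }

    void deallocate(void* p) {
        if (!p) return;
        assert(owns(p) && "block returned to a pool that did not allocate it");
        Block* b = static_cast<Block*>(p);
        b->next = free_head_;
        free_head_ = b;
    }
};
```

The boundary test also catches interior pointers (for example a pointer to a member rather than to the object's start), which would corrupt the free list just as badly.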

Object larger than block size: the pool rounds the block size up for alignment, but it never grows a block. Constructing a 128-byte object in a block from a 64-byte pool overflows into the neighbouring block (and the free-list pointer stored there), again silently. Add an assertion:

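// Size-checked overload, added as a member of FixedBlockPool (assert is compiled out when NDEBUG is defined)
void* allocate(size_t requested) {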
    assert(requested <= block_size_ && "Requested size exceeds pool block size");
    return allocate();
}

Frame allocator: using pointers after reset: reset() does not zero memory, so old data lingers and stale pointers can appear to work until the next frame's allocations overwrite it. Treat every pointer obtained before reset() as invalid.
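Since stale pointers keep "working" until new allocations overwrite the bytes, a debug build can make misuse obvious by filling the used part of the buffer with a recognizable pattern in reset(). A sketch, assuming a hypothetical DebugFrame type and the arbitrary poison byte 0xDD; byte_at() is a hypothetical inspection hook added only so the effect can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

class DebugFrame {
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;

public:
    static constexpr unsigned char kPoison = 0xDD;  // arbitrary, easy to spot in a debugger

    explicit DebugFrame(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        void* ptr = buffer_.get() + offset_;
        std::size_t space = capacity_ - offset_;
        if (!std::align(align, size, ptr, space)) return nullptr;  // out of space
        offset_ = static_cast<std::size_t>(static_cast<unsigned char*>(ptr) - buffer_.get()) + size;
        return ptr;
    }

    void reset() {
#ifndef NDEBUG
        // Poison only the bytes handed out since the last reset: O(bytes used)
        std::memset(buffer_.get(), kPoison, offset_);
#endif
        offset_ = 0;
    }

    unsigned char byte_at(std::size_t i) const { return buffer_[i]; }
};
```

The poisoning makes reset() O(bytes used) instead of O(1), which is why it is confined to debug builds.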


Key Takeaways

  • Fixed-block pools remove most allocator overhead for uniform-sized objects: allocation and deallocation are O(1) free-list operations with no system calls
  • Thread-local pools need no locks because each thread owns its pool, provided every block is freed on the thread that allocated it
  • Frame allocators are the fastest option when all allocations share the same lifetime — reset is O(1)
  • Always benchmark — pools help when the global allocator is a proven bottleneck, not by assumption
  • Lifetime discipline is critical — the pool must outlive all objects it allocates
  • std::pmr (C++17) provides monotonic_buffer_resource, unsynchronized_pool_resource, and synchronized_pool_resource for standard containers; try them before writing a custom pool
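For comparison with the hand-written pools, here is how little code the standard resources need. A minimal sketch using the C++17 <memory_resource> API: a monotonic_buffer_resource over a stack buffer plays the frame-allocator role for a vector, and an unsynchronized_pool_resource plays the pooled role for a node-based list. The function names and the 4096-byte buffer are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory_resource>
#include <numeric>
#include <vector>

// Frame-style: allocations come from `buf` until it runs out, then from the
// default upstream resource; everything is released when `arena` is destroyed.
int sum_with_monotonic(int n) {
    std::byte buf[4096];
    std::pmr::monotonic_buffer_resource arena(buf, sizeof(buf));
    std::pmr::vector<int> v(&arena);
    for (int i = 1; i <= n; ++i) v.push_back(i);
    return std::accumulate(v.begin(), v.end(), 0);
}

// Pool-style: list nodes are carved from size-class pools and reused.
// unsynchronized_pool_resource takes no locks, so keep it on one thread.
std::size_t list_with_pool(int n) {
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::list<int> nodes(&pool);
    for (int i = 0; i < n; ++i) nodes.push_back(i);
    nodes.remove_if([](int x) { return x % 2 == 0; });  // freed nodes go back to the pool
    return nodes.size();
}
```

The containers only see a std::pmr::polymorphic_allocator, so switching between these resources (or a custom std::pmr::memory_resource) does not change the container type.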

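Frequently Asked Questions (FAQ)

Q. When would I use this in practice?

A. Fixed-size block pools, free lists, thread-local pools, object pools, frame allocators, and benchmarking vs global new/d… In practice, apply them by following the examples and the selection guide in the article above.

Q. What should I read first?

A. Follow the previous-post or related-post links at the bottom of each post to learn in order. The C++ series table of contents shows the overall progression.

Q. How can I go deeper?

A. See cppreference and the official documentation for the relevant library. The reference links at the end of the article are also useful.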




Keywords covered in this article (related searches)

Searching for C++, memory pool, allocator, fragmentation, object pool, game engine, benchmark, and similar terms should lead you to this article.