Custom C++ Memory Pools: Fixed Blocks, TLS, and Benchmarks
Key Points
Fixed-size block pools, free lists, thread-local pools, object pools, frame allocators, and benchmarking vs global new/delete. When pools help and when they hurt.
Why Memory Pools?
The global heap (new/delete) is a general-purpose allocator that handles any size, any lifetime, from any thread. That flexibility costs performance: lock contention, fragmentation, and bookkeeping overhead.
When a hot path allocates many objects of the same size with the same lifetime pattern, a custom pool can be 5-50x faster than new/delete for that specific case, depending on the platform's allocator and how contended it is. Game engines, network servers, and database implementations all use pools for exactly this reason.
The tradeoff: pools are less flexible. You must know the object size at pool creation time, and you must carefully track which objects belong to which pool.
Fixed-Size Block Pool
The simplest pool: a slab of memory divided into fixed-size blocks, linked together as a free list. Allocation pops from the list, deallocation pushes back.
#include <algorithm>  // std::max
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <new>

class FixedBlockPool {
    struct Block {
        Block* next;  // intrusive free list: stored in the block itself
    };

    char* slab_;         // raw memory
    Block* free_head_;   // head of free list
    size_t block_size_;  // must be >= sizeof(Block*)
    size_t capacity_;    // total number of blocks
    size_t allocated_;   // currently in use

public:
    FixedBlockPool(size_t block_size, size_t capacity)
        : block_size_(std::max(block_size, sizeof(Block*)))
        , capacity_(capacity)
        , allocated_(0)
    {
        // Round the block size up to a multiple of alignof(std::max_align_t)
        // so every block is suitably aligned for any ordinary type
        block_size_ = (block_size_ + alignof(std::max_align_t) - 1)
                      & ~(alignof(std::max_align_t) - 1);

        // std::aligned_alloc expects a size that is a multiple of the
        // alignment, which the rounding above guarantees
        slab_ = static_cast<char*>(
            std::aligned_alloc(alignof(std::max_align_t),
                               block_size_ * capacity_));
        if (!slab_) throw std::bad_alloc{};

        // Build the free list by linking all blocks, last to first,
        // so the first allocation returns the first block in the slab
        free_head_ = nullptr;
        for (size_t i = capacity_; i-- > 0; ) {
            Block* b = reinterpret_cast<Block*>(slab_ + i * block_size_);
            b->next = free_head_;
            free_head_ = b;
        }
    }

    ~FixedBlockPool() {
        std::free(slab_);
    }

    // Non-copyable, non-movable
    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    void* allocate() {
        if (!free_head_) return nullptr;  // pool exhausted
        Block* b = free_head_;
        free_head_ = b->next;
        ++allocated_;
        return b;
    }

    void deallocate(void* ptr) {
        if (!ptr) return;
        Block* b = static_cast<Block*>(ptr);
        b->next = free_head_;
        free_head_ = b;
        --allocated_;
    }

    size_t allocated() const { return allocated_; }
    size_t available() const { return capacity_ - allocated_; }
};
Using the Pool
#include <cstdint>  // uint32_t

struct Packet {
    uint32_t id;
    uint32_t length;
    char data[56];  // 64 bytes total
};

int main() {
    FixedBlockPool pool(sizeof(Packet), 1024);

    // Allocate and construct
    void* mem = pool.allocate();
    Packet* p = new(mem) Packet{};  // placement new: construct in pool memory
    p->id = 1;
    p->length = 10;

    // Destroy and return to pool
    p->~Packet();  // explicit destructor call, not delete
    pool.deallocate(mem);

    std::cout << "Available: " << pool.available() << '\n';  // 1024
}
Object Pool (RAII Wrapper)
A cleaner interface that handles construction/destruction automatically:
#include <functional>
#include <memory>
#include <string>   // std::string in Connection
#include <utility>  // std::forward, std::move

template<typename T>
class ObjectPool {
    // The underlying block pool only guarantees max_align_t alignment
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types need a pool with stricter alignment");

    FixedBlockPool block_pool_;

public:
    explicit ObjectPool(size_t capacity)
        : block_pool_(sizeof(T), capacity) {}

    // Custom deleter that returns the block to the pool
    using UniquePtr = std::unique_ptr<T, std::function<void(T*)>>;

    template<typename... Args>
    UniquePtr acquire(Args&&... args) {
        void* mem = block_pool_.allocate();
        if (!mem) throw std::bad_alloc{};
        T* obj = nullptr;
        try {
            obj = new(mem) T(std::forward<Args>(args)...);
        } catch (...) {
            block_pool_.deallocate(mem);  // constructor threw: give the block back
            throw;
        }
        return UniquePtr(obj, [this](T* p) {
            p->~T();                    // call destructor
            block_pool_.deallocate(p);  // return to pool
        });
    }

    size_t available() const { return block_pool_.available(); }
};

// Usage
struct Connection {
    int fd;
    std::string address;
    Connection(int fd, std::string addr) : fd(fd), address(std::move(addr)) {}
    ~Connection() { std::cout << "Connection " << fd << " closed\n"; }
};

int main() {
    ObjectPool<Connection> pool(100);
    auto conn = pool.acquire(42, "192.168.1.1");
    std::cout << "Connected: " << conn->address << '\n';
    std::cout << "Available: " << pool.available() << '\n';  // 99
    // conn goes out of scope: destructor called, block returned to pool
}
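The std::function deleter is convenient, but it type-erases the callable and makes every smart pointer carry a full std::function object. One possible alternative, sketched here and not taken from the original design, is a small hand-written deleter that stores only a pointer back to the pool. `MiniPool` and its nested `Deleter` are hypothetical names, and the pool is a deliberately tiny fixed array so that the example stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for a block pool: N slots in a fixed array,
// threaded into an intrusive free list while they are unused.
template <typename T, std::size_t N>
class MiniPool {
    union Slot {
        Slot* next;                                   // meaningful while free
        alignas(T) unsigned char storage[sizeof(T)];  // meaningful while in use
    };

    Slot slots_[N];
    Slot* free_head_ = nullptr;
    std::size_t in_use_ = 0;

    void release(T* p) {
        // The object lives at the start of its slot, so the addresses match.
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_head_;
        free_head_ = s;
        --in_use_;
    }

public:
    // Deleter that carries just one pointer: the owning pool.
    struct Deleter {
        MiniPool* pool = nullptr;
        void operator()(T* p) const {
            p->~T();
            pool->release(p);
        }
    };
    using Ptr = std::unique_ptr<T, Deleter>;

    MiniPool() {
        for (std::size_t i = N; i-- > 0;) {
            slots_[i].next = free_head_;
            free_head_ = &slots_[i];
        }
    }
    MiniPool(const MiniPool&) = delete;
    MiniPool& operator=(const MiniPool&) = delete;

    template <typename... Args>
    Ptr acquire(Args&&... args) {
        if (!free_head_) throw std::bad_alloc{};
        Slot* s = free_head_;
        Slot* next = s->next;  // read before the object overwrites the link
        // If the constructor throws, the free list has not been modified yet.
        T* obj = ::new (static_cast<void*>(s->storage)) T(std::forward<Args>(args)...);
        free_head_ = next;
        ++in_use_;
        return Ptr(obj, Deleter{this});
    }

    std::size_t in_use() const { return in_use_; }
};
```

With this shape the smart pointer is typically two pointers wide, and the deleter call can be inlined. The lifetime rule is unchanged: the pool still has to outlive every pointer it hands out.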
Thread-Local Pool
For multi-threaded code, each thread gets its own pool. No locks needed:
class TLSPool {
    static constexpr size_t BLOCK_SIZE = 64;
    static constexpr size_t CAPACITY = 4096;

    struct ThreadPool {
        FixedBlockPool pool{BLOCK_SIZE, CAPACITY};
    };

    // Each thread has its own pool instance
    static thread_local ThreadPool tl_pool_;

public:
    static void* allocate() {
        return tl_pool_.pool.allocate();
    }

    // ptr must have come from allocate() on this same thread
    static void deallocate(void* ptr) {
        tl_pool_.pool.deallocate(ptr);
    }
};

thread_local TLSPool::ThreadPool TLSPool::tl_pool_;
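The thread-local pool needs no locks: each thread allocates and frees from its own pool without ever touching another thread’s state. The catch is that a block must be returned on the same thread that allocated it. Freeing a pointer on a different thread pushes the block onto the wrong pool’s free list, and each thread’s pool is destroyed when that thread exits, which invalidates any of its blocks still in use elsewhere.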
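As a rough illustration of that ownership rule, the sketch below runs several threads that each allocate and free only from their own thread_local pool. `TinyPool`, `tl_pool`, `worker_roundtrip`, and `run_threads` are hypothetical names introduced for this example. The pool is a simplified fixed array, not the FixedBlockPool from earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread pool: 256 blocks of 64 bytes with an intrusive free list.
struct TinyPool {
    static constexpr std::size_t kBlockSize = 64;
    static constexpr std::size_t kCapacity = 256;

    union Block {
        Block* next;                                                // while free
        alignas(std::max_align_t) unsigned char bytes[kBlockSize];  // while in use
    };

    Block blocks[kCapacity];
    Block* free_head = nullptr;
    std::size_t in_use = 0;

    TinyPool() {
        for (std::size_t i = kCapacity; i-- > 0;) {
            blocks[i].next = free_head;
            free_head = &blocks[i];
        }
    }

    void* allocate() {
        if (!free_head) return nullptr;  // pool exhausted
        Block* b = free_head;
        free_head = b->next;
        ++in_use;
        return b;
    }

    void deallocate(void* p) {
        if (!p) return;
        Block* b = static_cast<Block*>(p);
        b->next = free_head;
        free_head = b;
        --in_use;
    }
};

// One independent pool per thread, constructed on first use in that thread.
thread_local TinyPool tl_pool;

// Allocates up to `count` blocks and frees them again on the same thread.
// Returns true if this thread's pool is empty afterwards.
bool worker_roundtrip(int count) {
    std::vector<void*> held;
    for (int i = 0; i < count; ++i) {
        void* p = tl_pool.allocate();
        if (!p) break;  // this thread's pool is full
        held.push_back(p);
    }
    for (void* p : held) tl_pool.deallocate(p);  // same thread that allocated
    return tl_pool.in_use == 0;
}

// Runs the round trip on several threads at once, with no locking anywhere.
bool run_threads(std::size_t num_threads, int count) {
    std::vector<char> ok(num_threads, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t)
        threads.emplace_back([&ok, t, count] { ok[t] = worker_roundtrip(count); });
    for (auto& th : threads) th.join();
    for (char c : ok)
        if (!c) return false;
    return true;
}
```

Calling `run_threads(4, 100)` exercises four separate pools concurrently. If blocks really do need to migrate between threads, a different design is required, for example a per-pool return queue, which this sketch does not attempt.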
Frame Allocator (Linear / Bump Allocator)
For objects that all share the same lifetime (a single game frame, a request handler), a bump allocator is even faster than a free list — just increment a pointer:
class FrameAllocator {
    char* buffer_;
    size_t capacity_;
    size_t offset_;  // next free byte

public:
    explicit FrameAllocator(size_t capacity)
        : buffer_(new char[capacity])
        , capacity_(capacity)
        , offset_(0)
    {}

    ~FrameAllocator() { delete[] buffer_; }

    // Non-copyable: a copy would delete[] the same buffer twice
    FrameAllocator(const FrameAllocator&) = delete;
    FrameAllocator& operator=(const FrameAllocator&) = delete;

    // align must be a power of two. Only the offset is aligned here, so the
    // result is correct as long as the buffer itself is at least that aligned.
    void* allocate(size_t size, size_t align = alignof(std::max_align_t)) {
        // Round up offset to alignment
        size_t aligned = (offset_ + align - 1) & ~(align - 1);
        // Written this way so a huge size cannot overflow the comparison
        if (aligned > capacity_ || size > capacity_ - aligned) return nullptr;
        void* ptr = buffer_ + aligned;
        offset_ = aligned + size;
        return ptr;
    }

    // Reset all allocations at once: O(1)
    void reset() {
        offset_ = 0;
        // All pointers from before reset() are now invalid
    }

    size_t used() const { return offset_; }
    size_t remaining() const { return capacity_ - offset_; }
};

// Game loop usage
int main() {
    FrameAllocator frame(1024 * 1024);  // 1 MB per frame
    for (int frameNum = 0; frameNum < 3; ++frameNum) {
        // All per-frame allocations from the frame allocator
        auto* enemies = static_cast<int*>(frame.allocate(sizeof(int) * 100));
        auto* messages = static_cast<char*>(frame.allocate(256));
        // Use enemies and messages this frame...
        (void)enemies; (void)messages;
        // End of frame: free everything at once
        frame.reset();
        std::cout << "Frame " << frameNum << " done\n";
    }
}
No individual frees — the whole frame is reset at once. This is extremely cache-friendly since all allocations are contiguous.
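Because reset() never runs destructors, a bump allocator is safest with trivially destructible types. The sketch below adds a typed helper that enforces this with a static_assert and aligns the actual address using alignof(T). `Arena` and `make` are hypothetical names. This is one possible shape for such a helper, not part of the FrameAllocator above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical bump allocator with a typed construction helper.
class Arena {
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;  // next free byte

public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // `align` must be a power of two.
    void* allocate(std::size_t size, std::size_t align) {
        // Align the real address, not only the offset, so the result is
        // correct even if the buffer is less aligned than `align`.
        auto base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        std::uintptr_t current = base + offset_;
        std::uintptr_t aligned =
            (current + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        offset_ = start + size;
        return buffer_.get() + start;
    }

    // Constructs a T in arena memory; returns nullptr when the arena is full.
    template <typename T, typename... Args>
    T* make(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "reset() never runs destructors");
        void* mem = allocate(sizeof(T), alignof(T));
        return mem ? ::new (mem) T{std::forward<Args>(args)...} : nullptr;
    }

    void reset() { offset_ = 0; }  // invalidates every pointer handed out so far
    std::size_t used() const { return offset_; }
};
```

A call such as `arena.make<double>(3.5)` returns a correctly aligned pointer into the buffer. Types that own resources, such as std::string, are rejected at compile time. Supporting them would require recording and running destructors before each reset.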
Scenarios for Each Pool Type
| Problem | Best Pool |
|---|---|
| Many objects of the same type (particles, packets) | Fixed-block pool |
| Multi-threaded, high-churn allocations | Thread-local pool |
| Per-frame or per-request temp data | Frame/linear allocator |
| Heterogeneous small objects | Size-class pools (multiple fixed-block pools) |
| Already using standard containers | std::pmr with monotonic_buffer_resource |
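For the last row of the table, the following is a minimal sketch of backing a standard container with std::pmr::monotonic_buffer_resource (C++17). The function name `sum_with_stack_buffer` and the 4096-byte buffer size are arbitrary choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Fills a pmr vector whose storage comes from a stack buffer and returns the
// sum of its elements. null_memory_resource() as the upstream means the
// resource throws std::bad_alloc when the buffer runs out, so it never
// falls back to the heap silently.
int sum_with_stack_buffer(int n) {
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&arena);  // allocates from `arena`
    values.reserve(static_cast<std::size_t>(n));
    for (int i = 1; i <= n; ++i) values.push_back(i);

    int sum = 0;
    for (int v : values) sum += v;
    return sum;
    // `values` is destroyed before `arena`, and `arena` before `buffer`,
    // because locals are destroyed in reverse order of declaration.
}
```

The monotonic resource behaves much like the frame allocator above: individual deallocations are no-ops, and memory is reclaimed only when the resource is released or destroyed.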
Benchmarking Pool vs new/delete
Always benchmark on your actual hardware and workload. Here’s a simple benchmark pattern:
#include <chrono>
#include <vector>
#include <iostream>

template<typename F>
double measureMs(F&& f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    f(iterations);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

struct SmallObj { int data[8]; };

void benchNewDelete(int n) {
    std::vector<SmallObj*> ptrs;
    ptrs.reserve(n);
    for (int i = 0; i < n; ++i) ptrs.push_back(new SmallObj{});
    for (auto* p : ptrs) delete p;
}

void benchPool(int n) {
    FixedBlockPool pool(sizeof(SmallObj), static_cast<size_t>(n));
    std::vector<void*> ptrs;
    ptrs.reserve(n);
    for (int i = 0; i < n; ++i) ptrs.push_back(pool.allocate());
    for (auto* p : ptrs) pool.deallocate(p);
}

int main() {
    const int N = 100'000;
    int reps = 10;
    double nd = 0, pool = 0;
    for (int i = 0; i < reps; ++i) {
        nd += measureMs(benchNewDelete, N);
        pool += measureMs(benchPool, N);
    }
    std::cout << "new/delete: " << nd/reps << " ms avg\n";
    std::cout << "pool: " << pool/reps << " ms avg\n";
    std::cout << "speedup: " << nd/pool << "x\n";
}
Typical results on a modern machine for 100K small objects:
- new/delete: ~15-30 ms (general-purpose allocator bookkeeping; lock contention adds to this when other threads allocate at the same time)
- Fixed-block pool: ~1-3 ms (free list pop/push, no system calls)
Results vary significantly by platform, allocator implementation, and contention level. Measure your specific case.
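One thing to watch for when measuring: an optimizer is allowed to remove new/delete pairs whose results are never observed, so a loop that only allocates and frees may end up timing less work than intended. A hedged way to guard against this is to write to each object and publish a checksum through a volatile variable. The names `g_sink`, `allocate_touch_free`, and `best_of_ms` below are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct SmallObj { int data[8]; };

// A volatile store is an observable side effect, so the work that
// produced the stored value cannot be optimized away entirely.
volatile long long g_sink = 0;

// Allocates n objects, writes to each one, reads the values back while
// freeing, and returns the checksum so callers can verify the work happened.
long long allocate_touch_free(int n) {
    std::vector<SmallObj*> ptrs;
    ptrs.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        SmallObj* p = new SmallObj{};
        p->data[0] = i;  // touch the memory
        ptrs.push_back(p);
    }
    long long checksum = 0;
    for (SmallObj* p : ptrs) {
        checksum += p->data[0];
        delete p;
    }
    g_sink = checksum;
    return checksum;
}

// Runs f() `reps` times and returns the fastest run in milliseconds.
// Taking the minimum filters out runs disturbed by other system activity.
template <typename F>
double best_of_ms(F&& f, int reps) {
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        f();
        auto end = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(end - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

The same touch-and-checksum pattern can wrap the pool variant, so that both sides of the comparison do identical work apart from the allocator itself.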
Common Mistakes
Pool outlived by objects: if the pool is destroyed while objects are still in use, any subsequent deallocation writes into freed memory — silent corruption.
// WRONG
ObjectPool<Widget>* pool = new ObjectPool<Widget>(100);
auto w = pool->acquire();
delete pool; // pool gone
// w's deleter will call pool->deallocate — pool is destroyed!
// CORRECT — ensure pool outlives all acquired objects
ObjectPool<Widget> pool(100);
auto w = pool.acquire();
// w destroyed before pool goes out of scope — correct order
Deallocating to the wrong pool: two pools of the same block size still have separate slabs. Returning a block from pool A to pool B corrupts pool B’s free list.
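A cheap debug-time guard against this mistake is a range check on the pointer before it is pushed onto the free list. The helper below is a sketch with a hypothetical name, `points_to_block`. It assumes the pool can pass in its slab start, block size, and block count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `ptr` is the start of a block inside the slab
// [slab, slab + block_size * capacity). Addresses are compared as integers
// so the check is well defined even for pointers into unrelated memory.
inline bool points_to_block(const void* ptr, const void* slab,
                            std::size_t block_size, std::size_t capacity) {
    auto p = reinterpret_cast<std::uintptr_t>(ptr);
    auto base = reinterpret_cast<std::uintptr_t>(slab);
    if (p < base) return false;            // before the slab
    std::uintptr_t offset = p - base;
    return offset < block_size * capacity  // inside the slab
        && offset % block_size == 0;       // on a block boundary
}
```

Inside a pool's deallocate, this could be used as `assert(points_to_block(ptr, slab_, block_size_, capacity_));`. It catches pointers from another pool and pointers into the middle of a block, but not double frees, which need separate tracking.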
Object larger than block size: the block pool rounds up to alignment, but it won’t grow. If you allocate 128 bytes from a 64-byte pool, you overflow into adjacent blocks — silent corruption. Add an assertion:
void* allocate(size_t requested) {
    // Debug-only check: assert is compiled out when NDEBUG is defined
    assert(requested <= block_size_ && "Requested size exceeds pool block size");
    return allocate();
}
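Frame allocator: using pointers after reset: the frame allocator’s reset() does not zero memory or run destructors. Old data lingers until the next frame overwrites it, so a stale pointer can appear to work for a while. Treat all pointers from before reset() as invalid.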
Key Takeaways
- Fixed-block pools eliminate allocator overhead for uniform-sized objects — allocation/deallocation is O(1) with no system calls
- Thread-local pools need no locks: each thread owns its pool, so there is no contention, provided every block is freed on the thread that allocated it
- Frame allocators are the fastest option when all allocations share the same lifetime — reset is O(1)
- Always benchmark — pools help when the global allocator is a proven bottleneck, not by assumption
- Lifetime discipline is critical — the pool must outlive all objects it allocates
- std::pmr (C++17) provides monotonic_buffer_resource and the pool resources (unsynchronized_pool_resource, synchronized_pool_resource) for use with standard containers; try these before writing custom pools