C++ Memory Leak Debugging Case Study | Fixing a Production Server Memory Spike
Key points of this article
Real-world C++ production memory leak debugging with Valgrind, AddressSanitizer, and Heaptrack.
Introduction
In production, memory leaks are bugs that slowly kill a server. This article walks through a real leak we hit: from first symptoms through root cause, fix, and prevention.
What you will learn
- How to spot memory leak symptoms early
- How to use Valgrind, ASan, and Heaptrack in practice
- Strategies for tracing leaks in a large codebase
- Coding patterns that help prevent leaks
Table of contents
- Symptom: server memory keeps growing
- Initial analysis: monitoring data
- Tool choice: Valgrind vs ASan vs Heaptrack
- First pass with Valgrind
- Fast reproduction with ASan
- Allocation patterns with Heaptrack
- Root cause: accumulating event listeners
- Fix: RAII and smart pointers
- Verification: comparing memory profiles
- Prevention: ASan in CI
- Lessons and best practices
- Closing thoughts
1. Symptom: server memory keeps growing
What we saw
We ran a chat server. Starting three days after deploy, memory use grew roughly linearly.
# Right after deploy
$ ps aux | grep chat_server
user 12345 0.5 2.1 524288 ... ./chat_server
# Three days later
$ ps aux | grep chat_server
user 12345 0.5 8.7 2162688 ... ./chat_server
# Seven days later (killed by OOM killer)
[ 123.456] Out of memory: Killed process 12345 (chat_server)
Early hypotheses
- Are connection objects not freed properly?
- Is a log buffer growing without bound?
- Is a cache growing forever?
2. Initial analysis: monitoring data
Prometheus metrics
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
// Metrics collection added to the server
class MemoryMetrics {
public:
static size_t getCurrentRSS() {
std::ifstream stat("/proc/self/status");
std::string line;
while (std::getline(stat, line)) {
if (line.find("VmRSS:") == 0) {
std::istringstream iss(line);
std::string key, value, unit;
iss >> key >> value >> unit;
return std::stoull(value) * 1024; // KB to bytes
}
}
return 0;
}
};
// Send metrics periodically
void reportMetrics() {
auto rss = MemoryMetrics::getCurrentRSS();
prometheus_gauge_set(memory_rss_bytes, rss);
}
Pattern
From Grafana:
- Memory growth rate: ~50 MB per hour
- Connection count: stable (100–200)
- Throughput: unchanged
Conclusion: it is not “memory per connection” but something that accumulates over time.
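To make "accumulates over time" concrete, the growth rate can be estimated as the least-squares slope of RSS samples against elapsed time. The following is a minimal sketch, not the code we ran in production: RssSample, growthRateMbPerHour, and the sample values are hypothetical names and numbers chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observation: hours since deploy and resident set size in MB.
struct RssSample {
    double hours;
    double rss_mb;
};

// Least-squares slope of RSS over time, in MB per hour.
// Returns 0.0 for fewer than two samples or when all timestamps are equal.
double growthRateMbPerHour(const std::vector<RssSample>& samples) {
    const std::size_t n = samples.size();
    if (n < 2) return 0.0;

    double sum_t = 0.0, sum_m = 0.0;
    for (const RssSample& s : samples) {
        sum_t += s.hours;
        sum_m += s.rss_mb;
    }
    const double mean_t = sum_t / static_cast<double>(n);
    const double mean_m = sum_m / static_cast<double>(n);

    double cov = 0.0, var = 0.0;
    for (const RssSample& s : samples) {
        cov += (s.hours - mean_t) * (s.rss_mb - mean_m);
        var += (s.hours - mean_t) * (s.hours - mean_t);
    }
    return var == 0.0 ? 0.0 : cov / var;
}

// Usage sketch with made-up samples that grow by 50 MB every hour.
void exampleGrowthRate() {
    const std::vector<RssSample> samples = {
        {0.0, 500.0}, {1.0, 550.0}, {2.0, 600.0}, {3.0, 650.0}};
    const double rate = growthRateMbPerHour(samples);
    assert(rate > 49.999 && rate < 50.001);
}
```

A steadily positive slope while the connection count stays flat is the signature described above: the growth tracks elapsed time, not load.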
3. Tool choice: Valgrind vs ASan vs Heaptrack
Comparison
| Tool | Strengths | Weaknesses | Best for |
|---|---|---|---|
| Valgrind | Accurate leak detection | Very slow (10–50×) | Dev, small repro cases |
| ASan | Fast (~2×), many bug classes | Needs recompile | CI, integration tests |
| Heaptrack | Allocation visualization | Not ideal for “leak only” | Memory profiling |
Strategy
- Try ASan for quick reproduction
- If it does not repro, use Valgrind for deeper analysis
- Use Heaptrack for allocation hotspots
4. First pass with Valgrind
Build and run
# Debug symbols, no optimization
$ g++ -g -O0 -std=c++17 *.cpp -o chat_server
# Run under Valgrind
$ valgrind --leak-check=full --show-leak-kinds=all \
--track-origins=yes --log-file=valgrind.log \
./chat_server
Problem
The server became too slow to reproduce real load. After 10 minutes, memory growth was tiny.
==12345== HEAP SUMMARY:
==12345== in use at exit: 1,234,567 bytes in 1,234 blocks
==12345== total heap usage: 12,345 allocs, 11,111 frees, 123,456,789 bytes allocated
Conclusion: Valgrind is too slow to replay production-like load.
5. Fast reproduction with ASan
ASan build
# Recompile with ASan
$ g++ -g -O1 -fsanitize=address -fno-omit-frame-pointer \
-std=c++17 *.cpp -o chat_server_asan
$ export ASAN_OPTIONS=detect_leaks=1:log_path=asan.log
Load test
# Simulate real traffic
$ ./load_test.sh --connections=200 --duration=600s
Result
In 10 minutes the leak reproduced; ASan reported:
=================================================================
==23456==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 48000 byte(s) in 1000 object(s) allocated from:
#0 0x7f123456 in operator new(unsigned long)
#1 0x7f234567 in EventManager::subscribe(std::string const&, EventCallback)
#2 0x7f345678 in ChatRoom::addUser(User*)
#3 0x7f456789 in Server::handleJoin(Connection*)
...
SUMMARY: AddressSanitizer: 48000 byte(s) leaked in 1000 allocations.
Finding: leak originates in EventManager::subscribe.
6. Allocation patterns with Heaptrack
Running Heaptrack
$ heaptrack ./chat_server
$ heaptrack_gui heaptrack.chat_server.12345.gz
Findings
From the Heaptrack flame graph:
- EventManager::subscribe accounts for 35% of allocations
- Allocations keep growing; almost no corresponding frees
- Call stack: ChatRoom::addUser → subscribe
7. Root cause: accumulating event listeners
Buggy code
class EventManager {
std::unordered_map<std::string, std::vector<EventCallback*>> listeners_;
public:
void subscribe(const std::string& event, EventCallback callback) {
// Bug: allocated with new but never freed
auto* cb = new EventCallback(std::move(callback));
listeners_[event].push_back(cb);
}
void publish(const std::string& event, const EventData& data) {
if (auto it = listeners_.find(event); it != listeners_.end()) {
for (auto* cb : it->second) {
(*cb)(data);
}
}
}
// Destructor does not free listeners!
~EventManager() = default;
};
class ChatRoom {
EventManager& eventMgr_;
public:
void addUser(User* user) {
// Register a listener on every join
eventMgr_.subscribe("message", [user](const EventData& data) {
user->sendMessage(data);
});
// User leaves but listeners remain!
}
};
Why it leaked
- Every addUser did new EventCallback
- After a user left, pointers stayed in listeners_
- The destructor did not free them
- 1000 joins → 1000 allocations → 0 frees ≈ 48 KB leak (scaled up in production)
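The imbalance in that list can be reproduced in a few lines. This is a reduced sketch of the pattern, not the server code: LeakyRegistry, Callback, and simulateChurn are hypothetical names, and the releaseAll method exists only so the demo does not itself leak.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Callback = std::function<void(const std::string&)>;

// Mirrors the buggy ownership pattern: raw pointers and no removal path.
class LeakyRegistry {
    std::unordered_map<std::string, std::vector<Callback*>> listeners_;
    std::size_t allocs_ = 0;
    std::size_t frees_ = 0;
public:
    void subscribe(const std::string& event, Callback cb) {
        listeners_[event].push_back(new Callback(std::move(cb)));
        ++allocs_;
    }
    // Callbacks that were allocated and have not been freed yet.
    std::size_t outstanding() const { return allocs_ - frees_; }
    // Demo-only cleanup; the original class had nothing like this.
    void releaseAll() {
        for (auto& entry : listeners_) {
            for (Callback* cb : entry.second) {
                delete cb;
                ++frees_;
            }
            entry.second.clear();
        }
    }
    ~LeakyRegistry() { releaseAll(); }
};

// Simulates `joins` users joining and then leaving.
// Returns the number of callbacks still allocated afterwards.
std::size_t simulateChurn(LeakyRegistry& reg, std::size_t joins) {
    for (std::size_t i = 0; i < joins; ++i) {
        reg.subscribe("message", [](const std::string&) {});
        // The user "leaves" here, but nothing removes the listener.
    }
    return reg.outstanding();
}

// Usage sketch: 1000 joins leave 1000 callbacks outstanding.
void exampleChurn() {
    LeakyRegistry reg;
    assert(simulateChurn(reg, 1000) == 1000);
}
```

The outstanding count grows by one per join no matter how many users are connected at once, which matches the time-driven growth seen in monitoring.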
8. Fix: RAII and smart pointers
Option 1: smart pointers
class EventManager {
using CallbackPtr = std::shared_ptr<EventCallback>;
std::unordered_map<std::string, std::vector<CallbackPtr>> listeners_;
public:
// Returns subscription id for later unsubscribe
size_t subscribe(const std::string& event, EventCallback callback) {
auto cb = std::make_shared<EventCallback>(std::move(callback));
listeners_[event].push_back(cb);
return reinterpret_cast<size_t>(cb.get());
}
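void unsubscribe(const std::string& event, size_t id) {
// Use find() so an unknown event does not create an empty entry
auto it = listeners_.find(event);
if (it == listeners_.end()) return;
auto& cbs = it->second;
cbs.erase(
std::remove_if(cbs.begin(), cbs.end(),
[id](const CallbackPtr& cb) {
return reinterpret_cast<size_t>(cb.get()) == id;
}),
cbs.end()
);
}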
~EventManager() = default; // shared_ptr cleans up
};
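One caveat with deriving the id from cb.get(): once a callback is freed, the allocator may hand the same address to a later callback, so a stale id could match the wrong listener. A possible alternative is a monotonically increasing counter. The sketch below is a variant under that assumption, not the fix we shipped: CounterIdEventManager and listenerCount are hypothetical names, and EventData is replaced by std::string to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using EventData = std::string;  // assumption: stand-in for the real payload type
using EventCallback = std::function<void(const EventData&)>;

class CounterIdEventManager {
    struct Entry {
        std::size_t id;
        EventCallback cb;
    };
    std::unordered_map<std::string, std::vector<Entry>> listeners_;
    std::size_t next_id_ = 1;  // ids are never reused within one manager
public:
    std::size_t subscribe(const std::string& event, EventCallback cb) {
        const std::size_t id = next_id_++;
        listeners_[event].push_back(Entry{id, std::move(cb)});
        return id;
    }
    void unsubscribe(const std::string& event, std::size_t id) {
        auto it = listeners_.find(event);
        if (it == listeners_.end()) return;
        auto& cbs = it->second;
        cbs.erase(std::remove_if(cbs.begin(), cbs.end(),
                                 [id](const Entry& e) { return e.id == id; }),
                  cbs.end());
        if (cbs.empty()) listeners_.erase(it);  // drop empty event buckets
    }
    void publish(const std::string& event, const EventData& data) const {
        auto it = listeners_.find(event);
        if (it == listeners_.end()) return;
        for (const Entry& e : it->second) e.cb(data);
    }
    std::size_t listenerCount(const std::string& event) const {
        auto it = listeners_.find(event);
        return it == listeners_.end() ? 0 : it->second.size();
    }
};

// Usage sketch: after unsubscribing, the callback no longer fires.
void exampleCounterIds() {
    CounterIdEventManager mgr;
    int hits = 0;
    const std::size_t id = mgr.subscribe("message", [&hits](const EventData&) { ++hits; });
    mgr.publish("message", "hello");
    mgr.unsubscribe("message", id);
    mgr.publish("message", "ignored");
    assert(hits == 1);
}
```

Storing the callback by value also removes the need for shared_ptr here, since the vector owns each entry outright.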
Option 2: RAII wrapper
class Subscription {
EventManager* mgr_;
std::string event_;
size_t id_;
public:
Subscription(EventManager* mgr, std::string event, size_t id)
: mgr_(mgr), event_(std::move(event)), id_(id) {}
~Subscription() {
if (mgr_) {
mgr_->unsubscribe(event_, id_);
}
}
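Subscription(Subscription&& other) noexcept
: mgr_(other.mgr_), event_(std::move(other.event_)), id_(other.id_) {
other.mgr_ = nullptr;
}
// Move assignment is required for std::vector<Subscription>::erase to compile
Subscription& operator=(Subscription&& other) noexcept {
if (this != &other) {
if (mgr_) {
mgr_->unsubscribe(event_, id_); // release the subscription we currently hold
}
mgr_ = other.mgr_;
event_ = std::move(other.event_);
id_ = other.id_;
other.mgr_ = nullptr;
}
return *this;
}
Subscription(const Subscription&) = delete;
Subscription& operator=(const Subscription&) = delete;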
};
class ChatRoom {
EventManager& eventMgr_;
std::vector<Subscription> subscriptions_;
public:
void addUser(User* user) {
auto id = eventMgr_.subscribe("message", [user](const EventData& data) {
user->sendMessage(data);
});
subscriptions_.emplace_back(&eventMgr_, "message", id);
}
void removeUser(User* user) {
// Removing from subscriptions_ triggers unsubscribe
// (in practice, map users to subscriptions)
}
};
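The removeUser stub leaves the user-to-subscription mapping open. One way to fill it in is a map keyed by user pointer, so that erasing the entry runs the RAII destructor. The sketch below is one possible shape under stated assumptions, not our production code: DemoUser, DemoEventManager, ScopedSubscription, and DemoChatRoom are hypothetical reduced stand-ins, and EventData is std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using EventData = std::string;  // assumption: stand-in for the real payload type
using EventCallback = std::function<void(const EventData&)>;

struct DemoUser {  // hypothetical stand-in for User
    std::vector<EventData> inbox;
    void sendMessage(const EventData& data) { inbox.push_back(data); }
};

class DemoEventManager {  // reduced manager with counter-based ids
    struct Entry { std::size_t id; EventCallback cb; };
    std::unordered_map<std::string, std::vector<Entry>> listeners_;
    std::size_t next_id_ = 1;
public:
    std::size_t subscribe(const std::string& event, EventCallback cb) {
        listeners_[event].push_back(Entry{next_id_, std::move(cb)});
        return next_id_++;
    }
    void unsubscribe(const std::string& event, std::size_t id) {
        auto it = listeners_.find(event);
        if (it == listeners_.end()) return;
        auto& cbs = it->second;
        cbs.erase(std::remove_if(cbs.begin(), cbs.end(),
                                 [id](const Entry& e) { return e.id == id; }),
                  cbs.end());
    }
    void publish(const std::string& event, const EventData& data) const {
        auto it = listeners_.find(event);
        if (it == listeners_.end()) return;
        for (const Entry& e : it->second) e.cb(data);
    }
    std::size_t listenerCount(const std::string& event) const {
        auto it = listeners_.find(event);
        return it == listeners_.end() ? 0 : it->second.size();
    }
};

class ScopedSubscription {  // unsubscribes in its destructor
    DemoEventManager* mgr_;
    std::string event_;
    std::size_t id_;
public:
    ScopedSubscription(DemoEventManager* mgr, std::string event, std::size_t id)
        : mgr_(mgr), event_(std::move(event)), id_(id) {}
    ~ScopedSubscription() { if (mgr_) mgr_->unsubscribe(event_, id_); }
    ScopedSubscription(ScopedSubscription&& other) noexcept
        : mgr_(other.mgr_), event_(std::move(other.event_)), id_(other.id_) {
        other.mgr_ = nullptr;  // the moved-from object must not unsubscribe
    }
    ScopedSubscription(const ScopedSubscription&) = delete;
    ScopedSubscription& operator=(const ScopedSubscription&) = delete;
};

class DemoChatRoom {
    DemoEventManager& mgr_;
    std::unordered_map<DemoUser*, ScopedSubscription> subs_;  // one per user
public:
    explicit DemoChatRoom(DemoEventManager& mgr) : mgr_(mgr) {}
    void addUser(DemoUser* user) {
        if (subs_.count(user) != 0) return;  // already joined
        const std::size_t id = mgr_.subscribe(
            "message", [user](const EventData& data) { user->sendMessage(data); });
        subs_.emplace(user, ScopedSubscription(&mgr_, "message", id));
    }
    // Erasing the map entry runs ~ScopedSubscription, which unsubscribes.
    void removeUser(DemoUser* user) { subs_.erase(user); }
};

// Usage sketch: a removed user stops receiving messages.
void exampleRemoveUser() {
    DemoEventManager mgr;
    DemoUser alice;
    DemoChatRoom room(mgr);
    room.addUser(&alice);
    room.removeUser(&alice);
    mgr.publish("message", "hello");
    assert(alice.inbox.empty());
}
```

This sketch relies on the event manager outliving the room and on each user outliving its subscription; if either can be destroyed first, the raw pointers here would dangle.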
9. Verification: comparing memory profiles
Before
$ heaptrack ./chat_server_before
# After 10 minutes
Peak heap memory: 2.1 GB
Total allocations: 1,234,567
Total deallocations: 234,567
Leaked: 1,000,000 allocations
After
$ heaptrack ./chat_server_after
# After 10 minutes
Peak heap memory: 156 MB
Total allocations: 1,234,567
Total deallocations: 1,234,565
Leaked: 2 allocations (static objects)
ASan final check
$ ./chat_server_asan
# After the 10-minute load test, shut the server down cleanly
# LeakSanitizer printed no leak report at exit (it stays silent when it finds nothing)
Success: the leak is gone.
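The before/after comparison comes down to one subtraction per run: allocations minus deallocations. As a small worked check using the totals quoted above, the following sketch computes the outstanding block count and compares it to an allowed budget. HeapSummary and withinLeakBudget are hypothetical helpers, not part of Heaptrack.

```cpp
#include <cassert>
#include <cstdint>

// Totals as reported by a profiler for one run.
struct HeapSummary {
    std::uint64_t allocations;
    std::uint64_t deallocations;

    // Blocks allocated but never freed; clamps to 0 on inconsistent input.
    std::uint64_t outstanding() const {
        return allocations >= deallocations ? allocations - deallocations : 0;
    }
};

// True when the run leaves at most `allowed` blocks unfreed.
bool withinLeakBudget(const HeapSummary& run, std::uint64_t allowed) {
    return run.outstanding() <= allowed;
}

// Usage sketch with the totals from the two runs above.
void exampleProfiles() {
    const HeapSummary before{1234567, 234567};
    const HeapSummary after{1234567, 1234565};
    assert(before.outstanding() == 1000000);
    assert(after.outstanding() == 2);
    assert(withinLeakBudget(after, 2));
}
```

A small nonzero budget accounts for allocations that intentionally live until exit, such as the two static objects in the "after" run.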
10. Prevention: ASan in CI
GitHub Actions
# .github/workflows/sanitizers.yml
name: Memory Sanitizers
on: [push, pull_request]
jobs:
asan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build with ASan
run: |
cmake -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" \
-B build
cmake --build build
- name: Run tests with ASan
run: |
export ASAN_OPTIONS=detect_leaks=1:halt_on_error=1
cd build && ctest --output-on-failure
Code review checklist
- If new is used, is there a matching delete (or smart pointer)?
- Can this be a smart pointer?
- Is RAII used for resource acquisition?
- For callbacks/listeners, is there an unsubscribe path?
11. Lessons and best practices
Takeaways
- Detect early: wire up memory monitoring from day one of deploys
- Combine tools: Valgrind, ASan, Heaptrack for different situations
- RAII: acquire resources in constructors and release them in destructors
- Automate: sanitizers in CI to catch regressions
Patterns that help avoid leaks
// Bad: manual memory
class BadCache {
std::map<std::string, Data*> cache_;
public:
void add(const std::string& key, Data* data) {
cache_[key] = data; // who deletes?
}
};
// Good: smart pointers
class GoodCache {
std::map<std::string, std::unique_ptr<Data>> cache_;
public:
void add(const std::string& key, std::unique_ptr<Data> data) {
cache_[key] = std::move(data);
}
};
// Better: value semantics
class BestCache {
std::map<std::string, Data> cache_;
public:
void add(const std::string& key, Data data) {
cache_[key] = std::move(data);
}
};
Closing thoughts
What we learned:
- Leaks often show up slowly, so monitoring is not optional
- Picking the right tool cuts debugging time sharply
- RAII and smart pointers are the baseline for memory safety
- CI sanitizers catch regressions early
If you are fighting memory issues in production, use this workflow systematically.
FAQ
Q1. Can we run ASan in production?
It is possible but costly: roughly 2× CPU overhead is common, and memory use rises as well. Safer options are to route a small fraction of traffic to an ASan build or to replay production traffic in staging.
Q2. Valgrind says “still reachable”. Is that a leak?
“Still reachable” means the memory was still referenced by a pointer at exit. That is fine for static singletons; if the amount grows over time, treat it as a leak.
Q3. Don’t smart pointers cause leaks via cycles?
Break shared_ptr cycles with weak_ptr; prefer unique_ptr when ownership is clear.
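As an illustration of the weak_ptr advice, the sketch below links two objects in both directions but makes the back-reference non-owning, so both are destroyed when the last outside shared_ptr goes away. Parent, Child, and the live-object counter are hypothetical names used only to make the destruction observable.

```cpp
#include <cassert>
#include <memory>

int g_live_nodes = 0;  // demo-only counter of objects currently alive

struct Parent;

struct Child {
    std::weak_ptr<Parent> parent;  // non-owning back-reference breaks the cycle
    Child() { ++g_live_nodes; }
    ~Child() { --g_live_nodes; }
};

struct Parent {
    std::shared_ptr<Child> child;  // owning reference
    Parent() { ++g_live_nodes; }
    ~Parent() { --g_live_nodes; }
};

// Builds a parent/child pair, lets it go out of scope, and reports
// how many objects are still alive afterwards.
int liveNodesAfterScope() {
    {
        auto parent = std::make_shared<Parent>();
        parent->child = std::make_shared<Child>();
        parent->child->parent = parent;  // weak: does not raise the use count
        // If Child::parent were a shared_ptr, the two objects would keep
        // each other alive here and neither destructor would run.
    }
    return g_live_nodes;
}

// Usage sketch: nothing survives the scope.
void exampleWeakPtr() {
    assert(liveNodesAfterScope() == 0);
}
```

To use the back-reference, call lock() on the weak_ptr and check the result, since the parent may already be gone.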
Checklists
Memory leak debugging
- Memory monitoring (Prometheus, Grafana)
- Pattern analysis (linear, stepped, periodic)
- Tool choice (Valgrind, ASan, Heaptrack)
- Repro environment (load tests)
- Call stack analysis (where allocations happen)
- Root cause (why no free?)
- Fix (RAII, smart pointers)
- Verify (profile before/after)
- Sanitizers in CI
- Update review guidelines
Memory-safe coding
- new/delete pairing or smart pointers
- RAII for resources
- Unsubscribe path for callbacks/listeners
- Check for cycles (weak_ptr)
- Exception safety (still freed on throw?)
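The exception-safety item is easy to check in isolation. In this minimal sketch, a std::unique_ptr owns a buffer while a function throws, and stack unwinding frees the buffer before the exception reaches the caller. DemoBuffer, mayThrow, processSafely, and the counter are hypothetical names for illustration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int g_live_buffers = 0;  // demo-only counter of buffers currently alive

struct DemoBuffer {
    DemoBuffer() { ++g_live_buffers; }
    ~DemoBuffer() { --g_live_buffers; }
};

void mayThrow(bool fail) {
    if (fail) throw std::runtime_error("simulated failure");
}

// The unique_ptr releases the buffer on both the normal and the throwing path.
void processSafely(bool fail) {
    auto buffer = std::make_unique<DemoBuffer>();
    mayThrow(fail);
    // With `new DemoBuffer` and a manual delete placed here, the throwing
    // path would skip the delete and leak the buffer.
}

// Runs processSafely, swallows the simulated error, and reports live buffers.
int liveBuffersAfter(bool fail) {
    try {
        processSafely(fail);
    } catch (const std::runtime_error&) {
        // expected when fail == true
    }
    return g_live_buffers;
}

// Usage sketch: no buffer survives either path.
void exampleExceptionSafety() {
    assert(liveBuffersAfter(true) == 0);
    assert(liveBuffersAfter(false) == 0);
}
```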
Keywords
C++, memory leak, debugging, Valgrind, ASan, AddressSanitizer, Heaptrack, production, case study, RAII, smart pointers, profiling, CI/CD