Debugging Memory Leaks Guide | Profiler, Heap Snapshots, Node.js, Python, Java

At a Glance

Memory leaks cause a server's memory footprint to grow until the process crashes. This guide gives you a systematic approach, using profilers and heap snapshots, to find and fix leaks in Node.js, Python, and Java.

Memory Leak vs. High Memory Usage

A memory leak is memory that is allocated but never released because references are retained unintentionally. Symptoms:

  • Memory usage increases monotonically over time
  • GC pauses become longer and more frequent
  • Service eventually crashes (OOM / exit code 137 in containers)

High memory usage that plateaus is not a leak — it may be a sizing issue but doesn’t require leak-hunting techniques.
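To tell the two apart mechanically, a rough heuristic can help (illustrative only; the window size and tolerance are assumptions, and the samples would come from your own monitoring):

```python
def looks_like_leak(samples, window=5, tolerance=0.02):
    """Rough heuristic: True if memory samples keep climbing.

    samples: memory readings (e.g. RSS in MB) taken at a fixed interval.
    Compares the mean of the last `window` samples to the mean of the
    previous `window`; fractional growth beyond `tolerance` suggests a
    leak rather than a plateau.
    """
    if len(samples) < 2 * window:
        raise ValueError("need at least 2*window samples")
    prev = sum(samples[-2 * window:-window]) / window
    last = sum(samples[-window:]) / window
    return (last - prev) / prev > tolerance

# A plateaued service is not flagged; steady growth is.
print(looks_like_leak([255, 256, 255, 256, 256, 256, 255, 256, 256, 255]))  # False
print(looks_like_leak([100, 120, 140, 160, 180, 200, 220, 240, 260, 280]))  # True
```

In practice you would feed this from whatever metric you already collect (RSS, heapUsed, Old Gen %); the point is to compare windows, not individual samples, so GC sawtooth noise doesn't trigger false alarms.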


General Approach

1. Confirm the leak — monitor memory over time
2. Isolate the scenario — which request/operation triggers growth?
3. Take heap snapshots before and after
4. Find objects growing between snapshots
5. Trace references back to root cause
6. Fix, deploy, confirm memory stabilizes

Node.js

Monitoring memory

// Log memory usage every 30 seconds
setInterval(() => {
  const m = process.memoryUsage();
  console.log({
    rss: `${Math.round(m.rss / 1024 / 1024)}MB`,       // total process memory
    heapUsed: `${Math.round(m.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(m.heapTotal / 1024 / 1024)}MB`,
    external: `${Math.round(m.external / 1024 / 1024)}MB`,
  });
}, 30000);

Watch heapUsed: if it climbs continuously, that confirms a V8 heap leak. If rss climbs but heapUsed doesn't, the leak is likely in native code or in external memory (e.g., Buffers).

Heap snapshots with Chrome DevTools

node --inspect app.js

Open Chrome → chrome://inspect → click “inspect” → Memory tab → Take heap snapshot.

Take a snapshot, trigger the suspected leak (run 100 requests, etc.), take another snapshot. In the second snapshot, select “Comparison” view to see what grew.

Heap snapshots programmatically

import v8 from 'v8';
import fs from 'fs';

// Take a snapshot and write it to a file; returns the filename
const filename = v8.writeHeapSnapshot();
console.log('Snapshot written to:', filename);

// Or use heapdump package for older Node versions
// npm install heapdump
// process.kill(process.pid, 'SIGUSR2')  → writes .heapsnapshot

Common Node.js leak patterns

1. Event listeners not removed

// LEAK: adds a listener on every request, never removes it
app.get('/data', (req, res) => {
  emitter.on('update', (data) => {  // grows unboundedly
    res.json(data);
  });
});

// FIX: use .once() or remove the listener
app.get('/data', (req, res) => {
  emitter.once('update', (data) => {
    res.json(data);
  });
});

2. Growing cache with no eviction

// LEAK: cache grows forever
const cache = new Map();
app.get('/user/:id', async (req, res) => {
  if (!cache.has(req.params.id)) {
    cache.set(req.params.id, await fetchUser(req.params.id));
  }
  res.json(cache.get(req.params.id));
});

// FIX: use LRU cache with size limit
import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 5 });

3. Closures holding large objects

// LEAK: data (potentially large) is captured in the closure
function processLargeFile(filePath) {
  const data = fs.readFileSync(filePath);  // large buffer
  return setInterval(() => {
    console.log('File size:', data.length);  // data never released
  }, 1000);
}

// FIX: extract only what you need
function processLargeFile(filePath) {
  const size = fs.statSync(filePath).size;  // just the number
  return setInterval(() => {
    console.log('File size:', size);
  }, 1000);
}

4. Timers keeping references alive

// LEAK: interval keeps the object alive forever
class DataProcessor {
  constructor() {
    this.data = new Array(100000).fill('x');
    setInterval(() => this.process(), 1000);  // keeps `this` alive
  }
  process() { /* ... */ }
  destroy() {
    // No way to clean up — interval captures `this`
  }
}

// FIX: store interval reference and clear it
class DataProcessor {
  constructor() {
    this.data = new Array(100000).fill('x');
    this.interval = setInterval(() => this.process(), 1000);
  }
  process() { /* ... */ }
  destroy() {
    clearInterval(this.interval);
    this.data = null;
  }
}

Clinic.js (comprehensive profiling)

npm install -g clinic
clinic heapprofiler -- node app.js
# Run your load test, then Ctrl+C
# Opens a flamegraph showing allocations

Python

Monitor memory with tracemalloc

import tracemalloc
import linecache

tracemalloc.start()

# ... run suspected leaky code ...

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)

memory_profiler — line-by-line

pip install memory_profiler

# script.py
from memory_profiler import profile

@profile
def my_function():
    data = [i for i in range(1000000)]
    result = sum(data)
    return result

# Run it under the profiler:
# python -m memory_profiler script.py

Output shows memory usage per line, pinpointing exactly where allocations happen.

objgraph — find growing objects

pip install objgraph

import objgraph

# Show most common object types
objgraph.show_most_common_types(limit=20)

# Show what's growing between two points
objgraph.show_growth()

# Trace references to an object
objgraph.show_backrefs(objgraph.by_type('MyClass')[0], max_depth=5)

Common Python leak patterns

1. Mutable default argument

# LEAK: list is created once and reused across all calls
def add_item(item, items=[]):
    items.append(item)
    return items

# FIX
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

2. Circular references with __del__

# The cyclic GC handles reference cycles, but before Python 3.4
# (PEP 442) a __del__ method prevented the cycle from being collected
class Node:
    def __init__(self):
        self.other = None
    def __del__(self):  # blocks cycle collection on Python < 3.4
        pass

# FIX: use weakref for back-references
import weakref

class Node:
    def __init__(self):
        self._other = None

    @property
    def other(self):
        return self._other() if self._other else None

    @other.setter
    def other(self, value):
        self._other = weakref.ref(value) if value is not None else None

3. Django queryset caching

# LEAK in a background task: the queryset caches every row it fetches
def process_all_users():
    users = User.objects.all()  # 1M rows loaded into memory
    for user in users:
        send_email(user)

# FIX: use iterator() to avoid caching
def process_all_users():
    for user in User.objects.all().iterator(chunk_size=1000):
        send_email(user)

Java / JVM

Heap dump

# Trigger heap dump on OOM automatically
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -jar app.jar

# Trigger manually (find PID first)
jmap -dump:live,format=b,file=/tmp/heapdump.hprof <PID>

Analyze with Eclipse Memory Analyzer (MAT) or VisualVM:

  • “Leak Suspects Report” in MAT identifies the most likely leak
  • Look for objects with many retained instances

jstat — GC monitoring

# Monitor GC every 1 second
jstat -gcutil <PID> 1000

# Output columns: S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
# O = Old Gen utilization (%). If it keeps growing across full GCs, there's a leak
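Since jstat only prints numbers, it helps to post-process its output. A small sketch (the sample lines are made up, but the column order matches jstat -gcutil with O at index 3) that flags a continuously rising Old Gen:

```python
def old_gen_rising(jstat_lines, min_increase=0.5):
    """Check whether Old Gen utilization only ever rises.

    jstat_lines: data rows from `jstat -gcutil` output, columns
    S0 S1 E O M CCS YGC YGCT FGC FGCT GCT (O is index 3).
    Returns True if O% increases by at least `min_increase`
    percentage points between every pair of samples, which is a
    strong leak signal when it persists across full GCs.
    """
    old = [float(line.split()[3]) for line in jstat_lines]
    return all(b - a >= min_increase for a, b in zip(old, old[1:]))

sample = [
    "0.00 95.12 61.40 42.10 96.01 92.33 120 1.532 4 0.401 1.933",
    "0.00 93.20 12.77 48.65 96.01 92.33 131 1.671 4 0.401 2.072",
    "0.00 97.45 80.02 55.90 96.02 92.34 142 1.804 5 0.498 2.302",
]
print(old_gen_rising(sample))  # True: O% climbs 42.10 -> 48.65 -> 55.90
```

A healthy heap shows O% dropping back down after each full GC; a value that ratchets up despite full GCs means live objects are accumulating.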

Common Java leak patterns

1. Static collections

// LEAK: static map grows forever
public class Cache {
    private static final Map<String, Object> store = new HashMap<>();
    
    public static void put(String key, Object value) {
        store.put(key, value);  // never evicted
    }
}

// FIX: use Caffeine or Guava cache with eviction
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

Cache<String, Object> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .build();

2. Unclosed resources

// LEAK: connection never closed if exception is thrown
public void query() throws SQLException {
    Connection conn = dataSource.getConnection();
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT ...");
    // Exception here → conn never closed → connection leak
}

// FIX: try-with-resources
public void query() throws SQLException {
    try (Connection conn = dataSource.getConnection();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT ...")) {
        // auto-closed even on exception
    }
}

3. ThreadLocal not cleaned up

// LEAK in thread pools: ThreadLocal value survives thread reuse
private static final ThreadLocal<LargeObject> holder = new ThreadLocal<>();

// FIX: always remove in finally
LargeObject obj = new LargeObject();
holder.set(obj);
try {
    processRequest();
} finally {
    holder.remove();  // critical in thread pool environments
}

Container / Kubernetes Context

# Watch memory usage of pods
kubectl top pods --sort-by=memory

# Check OOMKilled history
kubectl get events --field-selector reason=OOMKilling

# If container is OOMKilled, increase limit and check for leak
kubectl describe pod <pod-name>
# Look for: OOMKilled, Last State exit code 137

Set a memory limit in Kubernetes — this forces a crash (which you’ll notice) instead of unbounded growth:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

Profiling Tools Summary

Language | Tool                        | Use for
---------|-----------------------------|------------------------------------
Node.js  | Chrome DevTools / --inspect | Heap snapshots, allocation timeline
Node.js  | Clinic.js                   | Production-safe heap profiling
Node.js  | process.memoryUsage()       | Continuous monitoring
Python   | tracemalloc                 | Allocation by file/line
Python   | memory_profiler             | Line-by-line memory usage
Python   | objgraph                    | Object count growth
Java     | jmap + MAT                  | Heap dump analysis
Java     | jstat -gcutil               | GC health monitoring
All      | Prometheus + Grafana        | Long-term memory trend
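For the long-term trend, the usual approach is to expose a process-memory gauge that Prometheus scrapes. A minimal stdlib-only sketch (Linux-only, since it reads /proc/self/statm; in a real service you would use an official Prometheus client library, which exports this metric for you):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def rss_bytes():
    """Current resident set size, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

def render_metrics():
    """Prometheus text exposition format: HELP/TYPE lines, then the sample."""
    return (
        "# HELP process_resident_memory_bytes Resident memory size in bytes.\n"
        "# TYPE process_resident_memory_bytes gauge\n"
        f"process_resident_memory_bytes {rss_bytes()}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Point a Prometheus scrape job at the port and graph the gauge in Grafana; a leak shows up as the monotonic climb described at the top of this guide.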

Key Takeaways

  1. Confirm first — monitor memory over time before hunting
  2. Snapshot comparison — take heap snapshots before/after suspected scenario
  3. Common culprits: event listeners, unbounded caches, closures, static collections, unclosed resources
  4. Use LRU/TTL caches — every in-memory cache needs an eviction policy
  5. try-with-resources / finally cleanup — don’t trust GC for external resources
  6. Container limits — set memory limits in Kubernetes to catch leaks fast

Memory leaks are almost always about forgotten references. Once you can see which objects are accumulating in a heap snapshot, tracing back to where they’re being held is usually straightforward.