Asio Deadlock Debugging: Async Callbacks, Locks, and Strands [#49-3]
Key takeaways
Why Asio deadlocks are subtle: callbacks, thread pools, and implicit lock order. Use strands, avoid waiting under locks, and debug with `gdb`'s `thread apply all bt`.
Introduction: “The Asio server sometimes hangs”
Asio servers can deadlock when a mutex is held while waiting for another async operation whose completion handler needs the same mutex. Multi-threaded io_context::run() makes this timing-dependent and “hidden.”
Topics:
- Pattern: lock → `async_*` → `cv.wait` while the handler needs the lock
- Different lock orders across threads
- Strands, minimal lock scope, uniform ordering
- gdb `thread apply all bt`, logging, TSan
See also: Multithreaded Asio, Strand.
Scenarios (short)
- Chat server: sync wait for async_write completion under a lock.
- Session pool: lock A then B vs B then A.
- HTTP proxy: blocking wait for upstream under lock.
- Async logging flush under lock.
- Timer vs I/O callbacks taking locks in opposite order.
Pattern 1: Lock held while waiting for completion
```cpp
// Dangerous: deadlocks with any number of io_context threads.
// Note: cv.wait(lock) would release the mutex during the wait, so the
// truly fatal form is a blocking wait that keeps the lock held, e.g.:
void on_send() {
    std::lock_guard<std::mutex> lk(mtx);          // mtx held for the whole wait
    std::promise<void> done;
    boost::asio::async_write(socket, buf,         // free function, not a socket member
        [&](boost::system::error_code ec, std::size_t n) {
            std::lock_guard<std::mutex> lk2(mtx); // blocks forever: mtx held below
            done.set_value();
        });
    done.get_future().wait();                     // waits while still holding mtx
}
```
```mermaid
sequenceDiagram
    participant T1 as Thread 1 (on_send)
    participant T2 as Thread 2 (io.run)
    participant Mtx as mutex
    T1->>Mtx: lock
    T1->>T2: async_write scheduled
    T1->>T1: blocking wait (still holds mtx)
    T2->>Mtx: handler tries lock → blocks
    Note over T1,T2: Deadlock
```
Fix: Continue work in the completion handler, or post to a strand; do not wait on async completion while holding the mutex the handler needs.
Pattern 2: Lock order inversion
Thread A: mtx1 then mtx2. Thread B: mtx2 then mtx1. → Cycle.
Fix: establish one global order for all mutexes, or use `std::scoped_lock` (which applies `std::lock`'s deadlock-avoidance algorithm) to acquire both together.
Solutions: strand, small critical sections, ordering
Per-connection strand serializes handlers for that connection—often no mutex for session state.
```cpp
boost::asio::async_write(socket, buf,
    boost::asio::bind_executor(strand_,
        [self = shared_from_this()](boost::system::error_code ec, std::size_t n) {
            self->on_write_done();
        }));
```
Rules:
- Do not wait for an async completion while holding locks its handler needs.
- If multiple mutexes are involved: fixed order or `std::scoped_lock`.
Debugging
- Hang with ~0% CPU → suspect deadlock.
- `gdb -p <pid>` → `thread apply all bt full`
- Look for `pthread_mutex_lock`, `pthread_cond_wait`, and cross-thread lock cycles.

```shell
gdb -p <pid> -batch -ex "thread apply all bt"
```

- TSan (`-fsanitize=thread`) may report lock-order inversion.
Production patterns
- Document lock hierarchy (e.g. session → cache → log).
- Strand-first design for connection state.
- Optional watchdog timer if progress stalls.
- `SIGUSR1` handler for stack dump (limited; use gdb for all threads).
Checklist
- No `cv.wait` under a mutex that async handlers for the same operation also take.
- Consistent lock order or `std::scoped_lock`.
- Per-session strand where possible.
- Small lock scope; no unknown callbacks invoked under a lock.
Related posts
- Multithreaded Asio data race
- Composed operations
- Asio intro
FAQ
Q. When to use this?
A. Any multi-threaded Asio app with mutexes + condition variables + async I/O.
Q. Next reads?
A. Series index, strand and executor docs.
Summary
- Deadlock: lock + wait for async completion whose handler needs the same lock.
- Fix: async chaining, strands, lock ordering, `std::scoped_lock`.
- Debug: all-thread backtraces, logging, TSan.
Previous: CMake link errors (#49-2)
Related: High-performance networking guide index
Related articles
- Segfault debugging
- CMake link errors
- Load balancer