Asio Deadlock Debugging: Async Callbacks, Locks, and Strands [#49-3]

Key takeaways

Why Asio deadlocks are subtle: callbacks, thread pools, and implicit lock order. Use strands, avoid waiting under locks, and debug with gdb thread apply all bt.

Introduction: “The Asio server sometimes hangs”

Asio servers can deadlock when a mutex is held while waiting for another async operation whose completion handler needs the same mutex. Multi-threaded io_context::run() makes this timing-dependent and “hidden.”

Topics:

  • Pattern: lock → async_* → blocking wait while the handler needs the lock
  • Different lock orders across threads
  • Strands, minimal lock scope, uniform ordering
  • gdb thread apply all bt, logging, TSan

See also: Multithreaded Asio, Strand.


Scenarios (short)

  • Chat server: sync wait for async_write completion under a lock.
  • Session pool: lock A then B vs B then A.
  • HTTP proxy: blocking wait for upstream under lock.
  • Async logging flush under lock.
  • Timer vs I/O callbacks taking locks in opposite order.

Pattern 1: Lock held while waiting for completion

// Dangerous (deadlock): mtx stays locked for the whole scope while we
// block until the handler completes — but the handler needs mtx too.
void on_send() {
    std::lock_guard<std::mutex> lk(mtx);             // held until on_send returns
    std::promise<void> done;
    boost::asio::async_write(socket, boost::asio::buffer(data),
        [&](boost::system::error_code, std::size_t) {
            std::lock_guard<std::mutex> lk2(mtx);    // blocks forever: caller holds mtx
            done.set_value();
        });
    done.get_future().wait();                        // waits forever → deadlock
}
sequenceDiagram
    participant T1 as Thread 1 (on_send)
    participant T2 as Thread 2 (io.run)
    participant Mtx as mutex
    T1->>Mtx: lock
    T1->>T2: async_write scheduled
    T1->>T1: blocking wait (holds mtx)
    T2->>Mtx: try lock in handler → blocks
    Note over T1,T2: Deadlock

Fix: Continue work in the completion handler, or post to a strand; do not wait on async completion while holding the mutex the handler needs.


Pattern 2: Lock order inversion

Thread A: mtx1 then mtx2. Thread B: mtx2 then mtx1. → Cycle.

Fix: Global order for all mutexes, or std::lock / std::scoped_lock to acquire both atomically.


Solutions: strand, small critical sections, ordering

Per-connection strand serializes handlers for that connection—often no mutex for session state.

boost::asio::async_write(socket_, boost::asio::buffer(msg_),  // msg_: pending outbound buffer
    boost::asio::bind_executor(strand_,
        [self = shared_from_this()](boost::system::error_code ec, std::size_t n) {
            self->on_write_done(ec, n);
        }));

Rules:

  • Do not wait for async completion while holding locks the handler needs.
  • If multiple mutexes: fixed order or std::scoped_lock.

Debugging

  • Hang + ~0% CPU → suspect deadlock.
  • gdb -p <pid>, then thread apply all bt full
  • Look for frames in pthread_mutex_lock / pthread_cond_wait and cross-thread wait cycles.
gdb -p <pid> -batch -ex "thread apply all bt"

TSan (-fsanitize=thread) may report lock-order inversion.


Production patterns

  • Document lock hierarchy (e.g. session → cache → log).
  • Strand-first design for connection state.
  • Optional watchdog timer if progress stalls.
  • SIGUSR1 handler for stack dump (limited; use gdb for all threads).

Checklist

  • No blocking wait for an async completion while holding a mutex its handler needs (and never block an io_context thread waiting on its own handlers)
  • Consistent lock order or std::scoped_lock
  • Per-session strand where possible
  • Small lock scope; no unknown callbacks under lock

Related posts:

  • Multithreaded Asio data race
  • Composed operations
  • Asio intro

FAQ

Q. When to use this?
A. Any multi-threaded Asio app with mutexes + condition variables + async I/O.

Q. Next reads?
A. Series index, strand and executor docs.


Summary

  • Deadlock: lock + wait for async completion whose handler needs the same lock.
  • Fix: async chaining, strands, lock ordering, std::scoped_lock.
  • Debug: all-thread backtraces, logging, TSan.

Previous: CMake link errors (#49-2)

Related: High-performance networking guide index


  • Segfault debugging
  • CMake link errors
  • Load balancer