Asio Deadlock Debugging: Async Callbacks, Locks, and Strands

Key Points

This article covers hidden deadlocks in Boost.Asio: how blocking on an async completion while holding a mutex creates a cycle, how consistent lock ordering prevents cross-thread deadlocks, and how strands can remove the need for most per-connection mutexes.

The Asio Deadlock Problem

Boost.Asio async servers typically run with multiple threads calling io_context::run(). Completion handlers execute on whichever thread picks them up. This makes it tempting to protect shared state with mutexes — but it also creates a class of deadlocks that are timing-dependent and hard to reproduce.

The fundamental pattern:

  1. Thread A holds a mutex and waits for an async operation to complete
  2. The async operation’s completion handler, running on Thread B, tries to acquire the same mutex
  3. Thread B blocks forever — Thread A never releases the mutex because it’s waiting for Thread B

In synchronous code this would be obvious. In async code, the lock acquisition in step 1 and the handler in step 2 might be in completely different files.
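The three steps above can be reproduced without Asio at all. The following is a minimal sketch using only the standard library; the function name is hypothetical, and a std::timed_mutex with a 100 ms timeout stands in for the session mutex so that the demo terminates instead of hanging the way real code would.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if the "handler" managed to take the lock while the caller
// was still holding it and waiting. With the pattern above it never does.
bool handler_got_lock_while_caller_waits() {
    std::timed_mutex state_mutex;        // stands in for the session mutex
    std::promise<bool> handler_result;   // what the handler reports back
    std::future<bool> result = handler_result.get_future();

    // Step 1: the caller takes the lock.
    std::unique_lock<std::timed_mutex> held(state_mutex);

    // Step 2: the "completion handler" runs on another thread and needs the lock.
    std::thread handler([&] {
        // A plain lock() would block forever here. try_lock_for gives up
        // after 100 ms so that this demo can finish.
        bool got = state_mutex.try_lock_for(std::chrono::milliseconds(100));
        if (got) state_mutex.unlock();
        handler_result.set_value(got);
    });

    // Step 3: the caller waits for the handler while still holding the lock.
    bool got = result.get();
    held.unlock();
    handler.join();
    return got;
}
```

Calling handler_got_lock_while_caller_waits() returns false: the handler times out because the caller never lets go of the mutex while it waits. Replace try_lock_for with lock() and the program hangs.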


Setting Up the Examples

All examples use Boost.Asio with C++17:

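#include <boost/asio.hpp>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <thread>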

namespace asio = boost::asio;
using tcp = asio::ip::tcp;

Deadlock Pattern 1: Mutex Held While Waiting for Async Completion

This is the most common Asio deadlock. A "synchronous" wrapper starts an async operation and then blocks until the handler signals completion:

class BrokenSession {
    tcp::socket socket_;
    std::mutex  mtx_;
    std::condition_variable cv_;
    bool        write_done_ = false;

public:
    void sendSync(std::string data) {
        std::unique_lock<std::mutex> lock(mtx_);  // acquire mutex

        write_done_ = false;

        asio::async_write(socket_, asio::buffer(data),
            [this](boost::system::error_code /*ec*/, std::size_t) {
                // Runs on whichever thread is calling io_context::run()
                std::lock_guard<std::mutex> lk(mtx_);
                write_done_ = true;
                cv_.notify_one();
            });

        // cv_.wait() releases mtx_ while it sleeps, so the handler CAN take
        // the lock, provided some other thread is free to run the handler.
        cv_.wait(lock, [this] { return write_done_; });
        // HANGS when sendSync() is called from an io_context thread and no
        // other thread is available to run the handler: a single-threaded
        // io_context, or a pool whose threads are all blocked right here.
        // It also deadlocks outright if cv_.wait() is replaced by a wait
        // that keeps mtx_ locked (for example std::future::get()).
    }
};

Whether this hangs depends on which thread calls sendSync() and how many io_context threads are free at that moment, which is why the bug is timing-dependent. A common attempt to work around it is to drive the io_context manually while waiting:

// Fragile workaround: block the caller but keep pumping the io_context.
// Assumes sendAndWaitBlocking() may be called from an io_context thread.

class FragileSession {
    asio::io_context& io_;
    tcp::socket socket_;
    std::mutex mtx_;
    bool done_ = false;

public:
    void sendAndWaitBlocking(std::string data) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            done_ = false;
        }

        asio::async_write(socket_, asio::buffer(data),
            [this](boost::system::error_code, std::size_t) {
                std::lock_guard<std::mutex> lock(mtx_);  // needs mtx_
                done_ = true;
            });

        // The handler needs a thread to run on. If this is the only
        // io_context thread, simply blocking here would starve it,
        // so the loop runs ready handlers itself.
        std::unique_lock<std::mutex> lock(mtx_);
        while (!done_) {
            lock.unlock();    // must not hold mtx_ while handlers run
            io_.poll_one();   // may run ANY ready handler, not just ours
            lock.lock();
        }
        // Problems: forgetting the unlock() deadlocks on mtx_ as soon as the
        // handler runs on this thread, poll_one() can re-enter this function
        // through another handler, and the loop busy-spins while the write
        // is still pending.
    }
};

Fix: never block an io_context thread waiting for async completion. Instead, chain operations:

// CORRECT: chain — when write finishes, call the next step
class FixedSession : public std::enable_shared_from_this<FixedSession> {
    tcp::socket socket_;

public:
    void send(std::string data) {
        auto self = shared_from_this();
        auto buf = std::make_shared<std::string>(std::move(data));

        asio::async_write(socket_, asio::buffer(*buf),
            [self, buf](boost::system::error_code ec, std::size_t) {
                if (!ec) {
                    self->onWriteComplete();  // chain to next step
                }
            });
        // Return immediately — don't wait here
    }

    void onWriteComplete() {
        // Continue the session: read next request, process next message, etc.
        startRead();
    }

    void startRead();  // defined elsewhere: starts the next async read
};

Deadlock Pattern 2: Lock Order Inversion

Two threads take the same two mutexes but in opposite order:

std::mutex session_mutex;   // protects session state
std::mutex cache_mutex;     // protects a shared cache

// Thread A (handles incoming data):
void onReceive(const std::string& data) {
    std::lock_guard<std::mutex> session_lock(session_mutex);  // lock session FIRST
    // ... process data ...
    {
        std::lock_guard<std::mutex> cache_lock(cache_mutex);  // then lock cache
        cache.update(data);
    }
}

// Thread B (flushes cache periodically):
void flushCache() {
    std::lock_guard<std::mutex> cache_lock(cache_mutex);  // lock cache FIRST
    // ... flush cache data ...
    {
        std::lock_guard<std::mutex> session_lock(session_mutex);  // then lock session
        // Update session stats
    }
}

// DEADLOCK CYCLE:
// Thread A holds session_mutex, waits for cache_mutex
// Thread B holds cache_mutex, waits for session_mutex

Fix option 1: enforce a global lock order (always session → cache, never cache → session):

// Global rule: always acquire session_mutex before cache_mutex
void flushCache() {
    // Must acquire session_mutex first, even though we primarily want cache_mutex
    std::lock_guard<std::mutex> session_lock(session_mutex);
    std::lock_guard<std::mutex> cache_lock(cache_mutex);
    // ... flush ...
}
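A global lock order is easier to keep when violations fail immediately. One way to get that is a rank-checked mutex wrapper, sketched below. RankedMutex, the rank values, and the demo functions are hypothetical names for illustration; the rule assumed here is that a thread may only lock a mutex whose rank is lower than the lowest rank it already holds, and that locks are released in reverse order (which nested lock_guards guarantee).

```cpp
#include <limits>
#include <mutex>
#include <stdexcept>

class RankedMutex {
    std::mutex mtx_;
    const int rank_;
    int previous_rank_ = 0;                 // saved on lock, restored on unlock
    static thread_local int current_rank_;  // lowest rank held by this thread

public:
    explicit RankedMutex(int rank) : rank_(rank) {}

    void lock() {
        // Reject the acquisition before blocking if it breaks the order.
        if (rank_ >= current_rank_) throw std::logic_error("lock order violation");
        mtx_.lock();
        previous_rank_ = current_rank_;
        current_rank_ = rank_;
    }

    void unlock() {
        current_rank_ = previous_rank_;
        mtx_.unlock();
    }
};
thread_local int RankedMutex::current_rank_ = std::numeric_limits<int>::max();

// Session must be locked before cache, so session gets the higher rank.
RankedMutex ranked_session_mutex(200);
RankedMutex ranked_cache_mutex(100);

bool correct_order_ok() {
    std::lock_guard<RankedMutex> s(ranked_session_mutex);
    std::lock_guard<RankedMutex> c(ranked_cache_mutex);
    return true;
}

bool wrong_order_throws() {
    std::lock_guard<RankedMutex> c(ranked_cache_mutex);
    try {
        std::lock_guard<RankedMutex> s(ranked_session_mutex);  // violates the order
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

The check runs on every acquisition, even when no other thread is involved, so an inverted order shows up in a single-threaded unit test rather than as a rare production hang.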

Fix option 2: acquire both together with std::scoped_lock (C++17):

// std::scoped_lock locks all of its mutexes as if by std::lock, which avoids
// deadlock (the exact algorithm is left to the implementation)
void flushCache() {
    std::scoped_lock lock(session_mutex, cache_mutex);  // order doesn't matter
    // ... flush ...
}

void onReceive(const std::string& data) {
    std::scoped_lock lock(session_mutex, cache_mutex);
    // ... update both ...
}

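With multiple arguments, std::scoped_lock locks the mutexes as if by std::lock, which uses a deadlock-avoidance algorithm, so the order of the arguments does not matter. This only helps when every code path that needs both mutexes acquires them together in one scoped_lock; locking one mutex first and the other later in a nested scope can still invert the order.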
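As a small check of that claim, the sketch below runs two threads that pass the same two mutexes to std::scoped_lock in opposite argument order. The function name and iteration count are arbitrary choices for the demo.

```cpp
#include <mutex>
#include <thread>

// Two threads lock the same pair of mutexes with the arguments swapped.
// Returns the final counter value once both threads have finished.
int run_scoped_lock_demo() {
    std::mutex session_mutex;
    std::mutex cache_mutex;
    int shared_counter = 0;          // protected by holding both mutexes
    constexpr int kIterations = 10000;

    std::thread a([&] {
        for (int i = 0; i < kIterations; ++i) {
            std::scoped_lock lock(session_mutex, cache_mutex);
            ++shared_counter;
        }
    });
    std::thread b([&] {
        for (int i = 0; i < kIterations; ++i) {
            std::scoped_lock lock(cache_mutex, session_mutex);  // opposite order
            ++shared_counter;
        }
    });

    a.join();
    b.join();
    return shared_counter;
}
```

The function returns 20000. Writing the same loops with two nested std::lock_guard objects in opposite order would be free to deadlock on any iteration.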


Deadlock Pattern 3: Strand Misuse

Strands serialize handlers for a connection. Deadlock occurs when a handler running on the strand blocks waiting for work that must itself run on the same strand:

asio::strand<asio::io_context::executor_type> strand_;

void badHandler() {
    // This handler runs inside the strand (requires <future>)
    std::future<int> f = std::async(std::launch::async, [this]() {
        std::promise<void> done;
        // Called from a non-strand thread, so dispatch() queues the work
        // on the strand instead of running it inline
        asio::dispatch(strand_, [this, &done]() {
            // Cannot start until badHandler() returns, because the strand
            // runs only one handler at a time
            doWork();
            done.set_value();
        });
        done.get_future().wait();  // waits for the strand to run doWork()
        return 42;
    });

    int result = f.get();  // DEADLOCK: blocks the strand that the task is waiting on
}

Fix: never wait inside a strand handler. Post the follow-up work to the strand and return, so the chain continues once the current handler has finished:

// CORRECT: no blocking waits inside strand handlers
void goodHandler() {
    // Schedule the next step via post — returns immediately
    asio::post(strand_, [this]() {
        doWork();
    });
    // Return — let the strand execute doWork() after this handler finishes
}

Fixing with Per-Connection Strands

The strand-first approach eliminates most mutex needs for per-connection state:

class Session : public std::enable_shared_from_this<Session> {
    tcp::socket socket_;
    asio::strand<tcp::socket::executor_type> strand_;  // serializes this session's handlers

    // No mutex needed for these — strand guarantees serial access
    std::string write_buffer_;
    bool        writing_ = false;
    std::deque<std::string> pending_writes_;

public:
    Session(tcp::socket socket)
        : socket_(std::move(socket))
        , strand_(socket_.get_executor())
    {}

    // Can be called from any thread — always posts to strand
    void send(std::string data) {
        asio::post(strand_, [self = shared_from_this(), data = std::move(data)]() mutable {
            self->sendOnStrand(std::move(data));
        });
    }

private:
    // Always called on the strand — no mutex needed
    void sendOnStrand(std::string data) {
        pending_writes_.push_back(std::move(data));
        if (!writing_) {
            writeNext();
        }
    }

    void writeNext() {
        if (pending_writes_.empty()) {
            writing_ = false;
            return;
        }
        writing_ = true;
        write_buffer_ = std::move(pending_writes_.front());
        pending_writes_.pop_front();

        asio::async_write(socket_, asio::buffer(write_buffer_),
            asio::bind_executor(strand_,  // completion handler runs on strand too
                [self = shared_from_this()](boost::system::error_code ec, std::size_t) {
                    if (!ec) self->writeNext();
                }));
    }
};

This design handles concurrent sends safely with no locks:

  • send() can be called from any thread — it posts to the strand
  • sendOnStrand() and writeNext() run on the strand — no concurrent access
  • The write chain continues naturally without blocking

Debugging a Live Deadlock

Thread Dump with gdb

When a server hangs at ~0% CPU, attach gdb:

# Find the process ID
ps aux | grep my-server

# Attach and dump all thread backtraces
gdb -p <pid> -batch -ex "thread apply all bt full" -ex "quit" 2>&1 | tee deadlock.txt

# Or interactively:
gdb -p <pid>
(gdb) thread apply all bt full
(gdb) quit

Look for threads blocked in pthread_mutex_lock or std::condition_variable::wait (in glibc backtraces these typically show up as __lll_lock_wait and pthread_cond_wait frames). Then find the cycle:

  • Thread 1: holds mutex A, waiting for mutex B
  • Thread 2: holds mutex B, waiting for mutex A (or waiting for a handler that needs mutex A)

ThreadSanitizer for Lock-Order Inversion

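TSan can report a lock-order inversion even on a run where the deadlock does not actually happen, as long as both acquisition orders are exercised:

# Compile with ThreadSanitizer
clang++ -std=c++17 -fsanitize=thread -g -O1 -pthread server.cpp -o server -lboost_system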

# Run under stress — TSan logs order violations
./server

# TSan output example:
# WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
# Cycle in lock order graph: M0 => M1 => M0
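To see what such a report is based on, here is a minimal reproducer sketch (names are hypothetical). The two threads run one after the other, so the program itself cannot deadlock, but both acquisition orders are exercised. Built with -fsanitize=thread, this should be enough for TSan to flag a cycle in the lock order graph; the exact report text depends on the toolchain version.

```cpp
#include <mutex>
#include <thread>

// Locks m0 then m1 on one thread, and m1 then m0 on a second thread.
// Returns the number of completed critical sections.
int run_inversion_demo() {
    std::mutex m0;
    std::mutex m1;
    int completed = 0;

    std::thread first([&] {
        std::lock_guard<std::mutex> a(m0);
        std::lock_guard<std::mutex> b(m1);   // order: m0 => m1
        ++completed;
    });
    first.join();   // the threads never overlap, so no deadlock is possible

    std::thread second([&] {
        std::lock_guard<std::mutex> b(m1);
        std::lock_guard<std::mutex> a(m0);   // order: m1 => m0
        ++completed;
    });
    second.join();

    return completed;
}
```

Without the sanitizer the function simply returns 2, which is the point: the inversion is invisible in normal runs until the two threads happen to overlap.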

Watchdog Timer

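Add a watchdog that logs if no progress is made in N seconds. Run it on an io_context (and thread) separate from the workers: if it shares the deadlocked io_context, its own timer handler may never run.

class Watchdog {
    using Clock = std::chrono::steady_clock;

    asio::steady_timer timer_;
    std::chrono::seconds interval_;
    std::function<void()> callback_;
    std::atomic<Clock::rep> last_heartbeat_;  // in ticks of Clock::duration

public:
    Watchdog(asio::io_context& io, std::chrono::seconds interval, std::function<void()> cb)
        : timer_(io), interval_(interval), callback_(std::move(cb))
        , last_heartbeat_(Clock::now().time_since_epoch().count())  // no false alarm at startup
    {
        arm();
    }

    // Call from worker code whenever it makes progress
    void heartbeat() {
        last_heartbeat_.store(Clock::now().time_since_epoch().count());
    }

private:
    void arm() {
        timer_.expires_after(interval_);
        // Captures this: the Watchdog must outlive the pending wait
        timer_.async_wait([this](boost::system::error_code ec) {
            if (ec) return;   // timer was cancelled
            // Build a duration from the tick difference instead of assuming nanoseconds
            Clock::duration idle(Clock::now().time_since_epoch().count() - last_heartbeat_.load());
            if (idle > interval_) {
                callback_();   // log a warning or dump state
            }
            arm();
        });
    }
};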

// Usage:
Watchdog watchdog(io, std::chrono::seconds(30), []() {
    std::cerr << "WARNING: no progress for 30 seconds — possible deadlock\n";
    // dump active sessions, pending operations, etc.
});
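If every io_context thread can end up blocked, a watchdog that does not depend on Asio at all is the safer variant. The following sketch runs the check on a dedicated std::thread; ThreadWatchdog and stall_is_detected are hypothetical names, and the millisecond intervals are chosen only so the demo finishes quickly.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

class ThreadWatchdog {
    using Clock = std::chrono::steady_clock;

    std::chrono::milliseconds interval_;
    std::function<void()> on_stall_;
    std::atomic<Clock::rep> last_beat_;   // in ticks of Clock::duration
    std::atomic<bool> stop_{false};
    std::thread worker_;                  // declared last: starts after the rest is ready

public:
    ThreadWatchdog(std::chrono::milliseconds interval, std::function<void()> on_stall)
        : interval_(interval), on_stall_(std::move(on_stall)),
          last_beat_(Clock::now().time_since_epoch().count()),
          worker_([this] { loop(); }) {}

    ~ThreadWatchdog() {
        stop_ = true;
        worker_.join();   // may take up to one interval
    }

    // Call from worker code whenever it makes progress.
    void heartbeat() { last_beat_ = Clock::now().time_since_epoch().count(); }

private:
    void loop() {
        while (!stop_) {
            std::this_thread::sleep_for(interval_);
            Clock::duration idle(Clock::now().time_since_epoch().count() - last_beat_.load());
            if (!stop_ && idle > interval_) on_stall_();
        }
    }
};

// Demo: nobody calls heartbeat(), so the stall callback should fire.
bool stall_is_detected() {
    std::atomic<bool> stalled{false};
    {
        ThreadWatchdog dog(std::chrono::milliseconds(20), [&] { stalled = true; });
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }
    return stalled;
}
```

The stall callback runs on the watchdog thread, so it should restrict itself to logging and reading state that is safe to access concurrently.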

Deadlock Prevention Checklist

Rule | Why
--- | ---
Never hold a mutex while blocking on an async completion whose handler needs that mutex | The handler can't run, which closes the cycle
Use std::scoped_lock when acquiring multiple mutexes | Deadlock-avoiding acquisition makes the argument order irrelevant
Prefer per-connection strands over mutexes for session state | Strand serialization without locks
Never block an io_context thread | Blocking prevents handlers from running; use post and chaining instead
Document the lock hierarchy | Makes order violations visible in code review
Run under TSan in CI | Catches order inversions before production

Key Takeaways

  • The core pattern: blocking on an async completion while holding mutex A, when the handler also needs mutex A, creates a deadlock cycle; blocking the only free io_context thread has the same effect even without a mutex
  • Never block an io_context thread: it prevents completion handlers from running — use async chaining instead
  • Lock order inversion: two threads taking the same two mutexes in opposite order → use std::scoped_lock(m1, m2) for simultaneous acquisition
  • Strands: asio::strand serializes handlers without mutexes — prefer it for per-connection state
  • asio::bind_executor(strand_, handler): ensures the completion handler runs on the strand
  • Debug with gdb: thread apply all bt full reveals threads blocked in pthread_mutex_lock
  • TSan: -fsanitize=thread catches lock-order inversions at runtime before they deadlock in production
  • Watchdog timer: log a warning if no progress is made for N seconds, and run it on a thread that cannot be blocked by the workers it monitors

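Frequently Asked Questions (FAQ)

Q. When is this used in practice?

A. Hidden deadlocks in Boost.Asio: mutex + condition_variable with async completion, lock ordering, and fixes with strands,… In practice, apply it by referring to the examples and guidance in the article above.

Q. What should I read first?

A. Follow the previous-post or related-post links at the bottom of each article to read them in order. The C++ series table of contents shows the overall flow.

Q. How can I study this in more depth?

A. See cppreference and the official documentation of the relevant library. The reference links at the end of the article are also useful.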

