Asio Deadlock Debugging: Async Callbacks, Locks, and Strands
Key Points
This article covers hidden deadlocks in Boost.Asio: how holding a mutex while waiting for an async completion creates a cycle, how a consistent lock order prevents cross-thread deadlocks, and how strands can remove the need for most mutexes.
The Asio Deadlock Problem
Boost.Asio async servers typically run with multiple threads calling io_context::run(). Completion handlers execute on whichever thread picks them up. This makes it tempting to protect shared state with mutexes — but it also creates a class of deadlocks that are timing-dependent and hard to reproduce.
The fundamental pattern:
- Thread A holds a mutex and waits for an async operation to complete
- The async operation’s completion handler, running on Thread B, tries to acquire the same mutex
- Thread B blocks forever — Thread A never releases the mutex because it’s waiting for Thread B
In synchronous code this would be obvious. In async code, the lock acquisition (first bullet) and the handler that needs the lock (second bullet) may live in completely different files.
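The cycle can be reproduced without Asio at all. The following is a minimal sketch using only the standard library; the function name is hypothetical, and a std::timed_mutex with try_lock_for stands in for the real mutex so that the demonstration reports the failure instead of hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Returns true if the "handler" thread managed to take the mutex while the
// "caller" thread was holding it and waiting for the handler to finish.
bool handlerAcquiresWhileCallerWaits() {
    std::timed_mutex mtx;               // stands in for the shared-state mutex
    std::promise<bool> handler_result;  // stands in for "async completion"
    std::future<bool> done = handler_result.get_future();

    // Thread A: take the mutex and keep it.
    std::unique_lock<std::timed_mutex> lock(mtx);

    // Thread B: the "completion handler". It needs the same mutex.
    std::thread handler([&] {
        bool got = mtx.try_lock_for(std::chrono::milliseconds(200));
        if (got) mtx.unlock();
        handler_result.set_value(got);
    });

    // Thread A blocks on the handler's result while still holding the mutex.
    bool acquired = done.get();
    lock.unlock();
    handler.join();
    return acquired;
}
```

With a plain lock() in place of try_lock_for, neither thread could make progress: the handler would wait for the mutex and the caller would wait for the handler.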
Setting Up the Examples
All examples use Boost.Asio with C++17:
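#include <boost/asio.hpp>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
namespace asio = boost::asio;
using tcp = asio::ip::tcp;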
Deadlock Pattern 1: Mutex Held While Waiting for Async Completion
This is the most common Asio deadlock:
class BrokenSession {
    tcp::socket socket_;
    std::mutex mtx_;
    std::condition_variable cv_;
    bool write_done_ = false;
public:
    // Blocks the calling thread until the write has completed.
    void sendSync(std::string data) {
        std::unique_lock<std::mutex> lock(mtx_);  // acquire mutex
        write_done_ = false;
        asio::async_write(socket_, asio::buffer(data),
            [this](boost::system::error_code ec, std::size_t) {
                // Runs on whichever thread is inside io_context::run()
                std::lock_guard<std::mutex> lk(mtx_);
                write_done_ = true;
                cv_.notify_one();
            });
        // cv_.wait() releases mtx_ while it sleeps, so the handler CAN take
        // the lock if some other io_context thread is free to run it.
        cv_.wait(lock, [this] { return write_done_; });
        // The hang appears when no such thread exists:
        //  - sendSync() was called from a handler on a single-threaded
        //    io_context, so the only thread able to run the completion
        //    handler is the one blocked here; or
        //  - every io_context thread is blocked in a wait like this one.
        // In both cases write_done_ never becomes true.
    }
};
Whether this hangs therefore depends on which thread calls sendSync(). A tempting workaround is to let the waiting thread drive the io_context itself. That avoids the hang, but it is fragile:
// Fragile workaround: the waiting thread polls the io_context "inline"
// so that the completion handler always has a thread to run on.
class FragilePollingSession {
    asio::io_context& io_;
    tcp::socket socket_;
    std::mutex mtx_;
    bool done_ = false;
public:
    void sendAndWaitBlocking(std::string data) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            done_ = false;
        }
        asio::async_write(socket_, asio::buffer(data),
            [this](boost::system::error_code, std::size_t) {
                std::lock_guard<std::mutex> lock(mtx_);  // needs mtx_
                done_ = true;
            });
        // If this thread is the only one running the io_context, nobody
        // else can execute the completion handler, so we run it ourselves.
        {
            std::unique_lock<std::mutex> lock(mtx_);
            while (!done_) {
                // mtx_ must be released around poll_one(): the handler may
                // run on THIS thread, and locking a std::mutex the thread
                // already holds is undefined behavior (in practice a hang).
                lock.unlock();
                io_.poll_one();  // run at most one ready handler
                lock.lock();
            }
        }
        // Fragile: the loop busy-spins, and poll_one() may run unrelated
        // handlers re-entrantly in the middle of this function.
    }
};
Fix: never block an io_context thread waiting for async completion. Instead, chain operations:
// CORRECT: chain the next step from the completion handler
class FixedSession : public std::enable_shared_from_this<FixedSession> {
    tcp::socket socket_;
public:
    void send(std::string data) {
        auto self = shared_from_this();
        // Keep the payload alive until the write completes
        auto buf = std::make_shared<std::string>(std::move(data));
        asio::async_write(socket_, asio::buffer(*buf),
            [self, buf](boost::system::error_code ec, std::size_t) {
                if (!ec) {
                    self->onWriteComplete();  // chain to next step
                }
            });
        // Return immediately; do not wait here
    }
    void onWriteComplete() {
        // Continue the session: read next request, process next message, etc.
        startRead();
    }
    void startRead();  // assumed to be defined elsewhere: issues the next async read
};
Deadlock Pattern 2: Lock Order Inversion
Two threads take the same two mutexes but in opposite order:
std::mutex session_mutex;  // protects session state
std::mutex cache_mutex;    // protects a shared cache

// Thread A (handles incoming data):
void onReceive(const std::string& data) {
    std::lock_guard<std::mutex> session_lock(session_mutex);  // lock session FIRST
    // ... process data ...
    {
        std::lock_guard<std::mutex> cache_lock(cache_mutex);  // then lock cache
        cache.update(data);
    }
}

// Thread B (flushes cache periodically):
void flushCache() {
    std::lock_guard<std::mutex> cache_lock(cache_mutex);  // lock cache FIRST
    // ... flush cache data ...
    {
        std::lock_guard<std::mutex> session_lock(session_mutex);  // then lock session
        // Update session stats
    }
}

// DEADLOCK CYCLE:
// Thread A holds session_mutex, waits for cache_mutex
// Thread B holds cache_mutex, waits for session_mutex
Fix option 1: enforce a global lock order (always session → cache, never cache → session):
// Global rule: always acquire session_mutex before cache_mutex
void flushCache() {
    // Must acquire session_mutex first, even though we primarily want cache_mutex
    std::lock_guard<std::mutex> session_lock(session_mutex);
    std::lock_guard<std::mutex> cache_lock(cache_mutex);
    // ... flush ...
}
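A global order is easy to state and easy to break by accident. One way to make violations fail loudly in debug builds is a ranked mutex wrapper. The sketch below is not part of Boost.Asio or the standard library: RankedMutex and the two helper functions are hypothetical names, and it assumes locks are released in reverse order of acquisition, as nested lock guards do.

```cpp
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical helper: a mutex with a numeric rank. A thread may only lock a
// mutex whose rank is strictly lower than the rank of the last one it locked.
class RankedMutex {
public:
    explicit RankedMutex(unsigned long rank) : rank_(rank) {}

    void lock() {
        if (current_rank() <= rank_)
            throw std::logic_error("lock order violation");
        mtx_.lock();
        previous_rank_ = current_rank();  // protected by mtx_
        current_rank() = rank_;
    }

    void unlock() {
        current_rank() = previous_rank_;  // restore before releasing mtx_
        mtx_.unlock();
    }

private:
    // Rank of the most recently locked RankedMutex on the calling thread.
    static unsigned long& current_rank() {
        thread_local unsigned long rank = ULONG_MAX;
        return rank;
    }

    std::mutex mtx_;
    const unsigned long rank_;
    unsigned long previous_rank_ = 0;
};

// Rule "session before cache" expressed as ranks: session = 200, cache = 100.
bool sessionThenCacheAllowed() {
    RankedMutex session(200), cache(100);
    std::lock_guard<RankedMutex> a(session);
    std::lock_guard<RankedMutex> b(cache);
    return true;
}

bool cacheThenSessionRejected() {
    RankedMutex session(200), cache(100);
    std::lock_guard<RankedMutex> a(cache);
    try {
        std::lock_guard<RankedMutex> b(session);  // wrong order: throws
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Because the check runs on every lock, a single-threaded unit test is enough to catch an inversion; no unlucky interleaving is required.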
Fix option 2: acquire both atomically with std::scoped_lock (C++17):
// std::scoped_lock acquires multiple mutexes with a deadlock-avoidance
// algorithm (implementations typically try-lock and back off)
void flushCache() {
    std::scoped_lock lock(session_mutex, cache_mutex);  // order doesn't matter
    // ... flush ...
}
void onReceive(const std::string& data) {
    std::scoped_lock lock(session_mutex, cache_mutex);
    // ... update both ...
}
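std::scoped_lock with multiple arguments uses a deadlock-avoidance algorithm (the same one as std::lock) that guarantees no cycle regardless of the order in which the mutexes are listed. The guarantee holds only when each thread acquires the whole set in a single scoped_lock call; nesting separate single-mutex locks can still invert the order.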
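As a small illustration, the following sketch runs two threads that list the same two mutexes in opposite order. The function name and the counter are made up for the example; the point is only that both threads finish.

```cpp
#include <mutex>
#include <thread>

// Two workers take the same pair of mutexes in opposite argument order.
// With std::scoped_lock both always finish; with two nested lock_guards
// in these orders they could deadlock.
int runOppositeOrderWorkers(int iterations) {
    std::mutex a;
    std::mutex b;
    int counter = 0;  // only touched while both mutexes are held

    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i) {
            std::scoped_lock lock(a, b);
            ++counter;
        }
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i) {
            std::scoped_lock lock(b, a);  // opposite order on purpose
            ++counter;
        }
    });

    t1.join();
    t2.join();
    return counter;
}
```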
Deadlock Pattern 3: Strand Misuse
Strands serialize the handlers of a connection: at most one handler submitted through the strand runs at a time. A deadlock occurs when a handler running on the strand blocks until some other piece of work has run on that same strand:
asio::strand<asio::io_context::executor_type> strand_;
void badHandler() {
    // This handler is running on strand_. (Requires <future>.)
    std::future<int> f = std::async(std::launch::async, [this]() {
        // This lambda runs on a separate thread, outside the strand.
        std::promise<void> done;
        asio::dispatch(strand_, [this, &done]() {
            // Queued behind badHandler(): the strand will not start this
            // function until badHandler() returns.
            doWork();
            done.set_value();
        });
        done.get_future().wait();  // blocks until the strand has run doWork()
        return 42;
    });
    int result = f.get();  // DEADLOCK: badHandler() waits for the task,
                           // the task waits for the strand, and the strand
                           // waits for badHandler() to return
}
Fix: use asio::post instead of blocking gets, or structure the handler to return and let the chain continue:
// CORRECT: no blocking waits inside strand handlers
void goodHandler() {
// Schedule the next step via post — returns immediately
asio::post(strand_, [this]() {
doWork();
});
// Return — let the strand execute doWork() after this handler finishes
}
Fixing with Per-Connection Strands
The strand-first approach eliminates most mutex needs for per-connection state:
class Session : public std::enable_shared_from_this<Session> {
    tcp::socket socket_;
    // Serializes this session's handlers. Using the socket's own executor
    // type keeps this valid across Boost versions.
    asio::strand<tcp::socket::executor_type> strand_;
    // No mutex needed for these: the strand guarantees serial access
    std::string write_buffer_;
    bool writing_ = false;
    std::deque<std::string> pending_writes_;
public:
    explicit Session(tcp::socket socket)
        : socket_(std::move(socket))
        , strand_(socket_.get_executor())
    {}
    // Can be called from any thread; always posts to the strand
    void send(std::string data) {
        asio::post(strand_, [self = shared_from_this(), data = std::move(data)]() mutable {
            self->sendOnStrand(std::move(data));
        });
    }
private:
    // Always called on the strand, so no mutex is needed
    void sendOnStrand(std::string data) {
        pending_writes_.push_back(std::move(data));
        if (!writing_) {
            writeNext();
        }
    }
    void writeNext() {
        if (pending_writes_.empty()) {
            writing_ = false;
            return;
        }
        writing_ = true;
        write_buffer_ = std::move(pending_writes_.front());
        pending_writes_.pop_front();
        asio::async_write(socket_, asio::buffer(write_buffer_),
            asio::bind_executor(strand_,  // completion handler runs on strand too
                [self = shared_from_this()](boost::system::error_code ec, std::size_t) {
                    if (ec) {
                        // Stop the chain; a real server would also close the socket
                        self->writing_ = false;
                        return;
                    }
                    self->writeNext();
                }));
    }
};
This design handles concurrent sends safely with no locks:
- send() can be called from any thread: it posts to the strand
- sendOnStrand() and writeNext() run on the strand, so there is no concurrent access
- The write chain continues naturally without blocking
Debugging a Live Deadlock
Thread Dump with gdb
When a server hangs at ~0% CPU, attach gdb:
# Find the process ID
ps aux | grep my-server
# Attach and dump all thread backtraces
gdb -p <pid> -batch -ex "thread apply all bt full" -ex "quit" 2>&1 | tee deadlock.txt
# Or interactively:
gdb -p <pid>
(gdb) thread apply all bt full
(gdb) quit
Look for threads blocked in pthread_mutex_lock or std::condition_variable::wait. Find the cycle:
- Thread 1: holds mutex A, waiting for mutex B
- Thread 2: holds mutex B, waiting for mutex A (or waiting for a handler that needs mutex A)
ThreadSanitizer for Lock-Order Inversion
TSan detects lock-order inversions before they cause a deadlock:
# Compile with ThreadSanitizer
clang++ -fsanitize=thread -g -O1 server.cpp -o server -lboost_system
# Run under stress — TSan logs order violations
./server
# TSan output example:
# WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
# Cycle in lock order graph: M0 => M1 => M0
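A small input for trying this out is sketched below. The struct and function names are hypothetical. The two threads run one after the other, so the program itself never deadlocks, yet both lock orders are recorded, which should be enough for TSan's lock-order graph to report the inversion when the file is built with -fsanitize=thread.

```cpp
#include <functional>
#include <mutex>
#include <thread>

struct SharedState {
    std::mutex session_mutex;
    std::mutex cache_mutex;
    int updates = 0;
};

void lockSessionThenCache(SharedState& s) {
    std::lock_guard<std::mutex> session_lock(s.session_mutex);
    std::lock_guard<std::mutex> cache_lock(s.cache_mutex);
    ++s.updates;
}

void lockCacheThenSession(SharedState& s) {
    std::lock_guard<std::mutex> cache_lock(s.cache_mutex);      // opposite order
    std::lock_guard<std::mutex> session_lock(s.session_mutex);
    ++s.updates;
}

// Runs both orders sequentially: safe at runtime, but the inconsistent
// ordering is still there to be detected.
int runInversionSequentially() {
    SharedState s;
    std::thread t1(lockSessionThenCache, std::ref(s));
    t1.join();
    std::thread t2(lockCacheThenSession, std::ref(s));
    t2.join();
    return s.updates;
}
```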
Watchdog Timer
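Add a watchdog that logs if no progress is made in N seconds. Drive it from its own io_context on a dedicated thread: a timer on the io_context being monitored cannot fire once all of that io_context's threads are blocked.
class Watchdog {
    using clock = std::chrono::steady_clock;
    asio::steady_timer timer_;
    std::chrono::seconds interval_;
    std::function<void()> callback_;
    std::atomic<clock::rep> last_heartbeat_;  // ticks since the clock's epoch
public:
    Watchdog(asio::io_context& io, std::chrono::seconds interval, std::function<void()> cb)
        : timer_(io), interval_(interval), callback_(std::move(cb)),
          // Start from "now" so the first check does not report a false stall
          last_heartbeat_(clock::now().time_since_epoch().count())
    {
        arm();
    }
    // Call from the monitored code whenever it makes progress
    void heartbeat() {
        last_heartbeat_.store(clock::now().time_since_epoch().count());
    }
private:
    void arm() {
        timer_.expires_after(interval_);
        timer_.async_wait([this](boost::system::error_code ec) {
            if (ec) return;  // timer was cancelled
            // Compare durations rather than raw counts so the check does
            // not depend on the clock's tick period
            clock::duration idle = clock::now().time_since_epoch()
                                 - clock::duration(last_heartbeat_.load());
            if (idle > interval_) {
                callback_();  // log a warning or dump state
            }
            arm();
        });
    }
};
// Usage (watchdog_io is assumed to be a separate io_context run by its own thread):
Watchdog watchdog(watchdog_io, std::chrono::seconds(30), []() {
    std::cerr << "WARNING: no progress for 30 seconds, possible deadlock\n";
    // dump active sessions, pending operations, etc.
});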
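A timer that lives on the io_context it monitors shares that io_context's fate: if every thread is stuck, the timer handler never runs. One alternative, sketched here with the standard library only, is a monitor that owns its thread. StallMonitor and countStallsWithoutHeartbeats are hypothetical names, not part of Asio.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Calls on_stall() from its own thread whenever more than `interval`
// has passed since the last heartbeat().
class StallMonitor {
    using clock = std::chrono::steady_clock;
public:
    StallMonitor(std::chrono::milliseconds interval, std::function<void()> on_stall)
        : interval_(interval), on_stall_(std::move(on_stall)),
          last_(clock::now()), worker_([this] { run(); }) {}

    ~StallMonitor() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called by the monitored code whenever it makes progress.
    void heartbeat() {
        std::lock_guard<std::mutex> lk(m_);
        last_ = clock::now();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        while (!stop_) {
            // Sleep for one interval, waking early only on shutdown.
            cv_.wait_for(lk, interval_, [this] { return stop_; });
            if (stop_) break;
            if (clock::now() - last_ > interval_) {
                lk.unlock();   // do not hold the lock while running user code
                on_stall_();
                lk.lock();
            }
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> on_stall_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    clock::time_point last_;
    std::thread worker_;  // declared last so it starts after the other members
};

// Simulates a hung server: nobody calls heartbeat() during `observe`.
int countStallsWithoutHeartbeats(std::chrono::milliseconds interval,
                                 std::chrono::milliseconds observe) {
    std::atomic<int> stalls{0};
    {
        StallMonitor monitor(interval, [&stalls] { ++stalls; });
        std::this_thread::sleep_for(observe);
    }  // destructor stops and joins the monitor thread
    return stalls.load();
}
```

In a server, heartbeat() would be called from completion handlers, and on_stall would log or dump state as in the usage above.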
Deadlock Prevention Checklist
| Rule | Why |
|---|---|
| Never hold a mutex while waiting for an async completion whose handler needs that mutex | The handler can't run, so the wait never ends |
| Use std::scoped_lock when acquiring multiple mutexes | Acquiring them together prevents order inversions |
| Prefer per-connection strands over mutexes for session state | Strands serialize handlers without locks |
| Never block an io_context thread | A blocked thread cannot run handlers; use post and chain instead |
| Document the lock hierarchy | Makes order violations visible in code review |
| Run under TSan in CI | Catches order inversions before production |
Key Takeaways
- The core pattern: holding mutex A while waiting for an async completion whose handler needs mutex A creates a deadlock cycle on a multi-threaded io_context
- Never block an io_context thread: it prevents completion handlers from running; use async chaining instead
- Lock order inversion: two threads taking the same two mutexes in opposite order; use std::scoped_lock(m1, m2) for simultaneous acquisition
- Strands: asio::strand serializes handlers without mutexes; prefer it for per-connection state
- asio::bind_executor(strand_, handler): ensures the completion handler runs on the strand
- Debug with gdb: thread apply all bt full reveals threads blocked in pthread_mutex_lock
- TSan: -fsanitize=thread catches lock-order inversions at runtime before they deadlock in production
- Watchdog timer: log a warning if no progress for N seconds; catches deadlocks in production