C++ std::thread basics — join/detach mistakes, mutex,
Key points of this post
Create and manage std::thread with join/detach; mutex, condition_variable, atomic, and jthread basics; process vs thread; common mistakes (missing join, detach misuse) and fixes—with runnable examples.
Introduction: when single-threaded work is too slow
“Why is this taking so long?” — I/O waits and using only one CPU core
In an image conversion service, several user-uploaded files were resized one after another. Even ten files could take more than ten seconds.
The code at the time used a for loop to process files sequentially. loadImage and saveImage hit disk I/O, so the CPU waited; only resize used the CPU, so one thread used one core. With many files, handing each file to a separate thread (a thread is an independent flow of execution inside a process—think of it as several people working at once inside one team, the process) lets you overlap I/O waits and CPU work and reduce total time. After splitting work across threads, wall time for ten files dropped to roughly half or less.
void processAll(const std::vector<std::string>& files) {
    for (const auto& path : files) {
        Image img = loadImage(path);                       // disk I/O wait
        img.resize(800, 600);                              // CPU work
        const std::string outputPath = path + ".out.jpg";  // illustrative naming only
        saveImage(img, outputPath);                        // disk I/O wait
    }
}
What this shows: the loop runs on one thread, so during loadImage / saveImage I/O the CPU sits idle, and resize still uses only one core. More files mean longer sequential time; per-file threads let you overlap I/O and CPU work.
Why it was slow:
- All work ran on the main thread only
- The CPU idled during disk I/O waits
- The machine had multiple cores but only one was used
After splitting with std::thread:
- One thread per file to overlap I/O and CPU
- For ten files, perceived time dropped to about half or less
That experience drove home the need for multithreading basics.
Threads look like “doing many things at once,” but if several threads touch the same memory without coordination you get data races and bugs. So learn thread creation, join (wait until the thread finishes), and detach (let the thread run in the background), then continue with mutex, condition_variable, and atomic for protecting shared data.
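If you come from another language: std::thread is an OS thread, much like a classic Java Thread (and the threads behind an Executor). Go's goroutines are different: they are lightweight tasks that the Go runtime multiplexes onto OS threads, with different scheduling trade-offs. Rust's std::thread is also an OS thread, and its channels and ownership rules make a useful comparison when you read them alongside this post.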
After reading this post you will:
- Know how to create std::thread instances and join or detach them
- Understand what threads are and why we use them
- Get intuition for thread safety
- Avoid common mistakes (missing join, detach misuse)
More problem scenarios
Scenario 1: UI freezes
In a desktop app, clicking Save compressed a large file and wrote to disk; the UI froze for seconds. Users thought the app hung and clicked repeatedly. Fix: run compression and I/O on a worker thread; keep the main thread for UI events only.
Scenario 2: delayed / corrupted logs
Several worker threads wrote logs to a file at once; while one called fprintf, others touched the same file handle—lines interleaved or the process crashed. Fix: one dedicated log thread; other threads enqueue messages; only the log thread writes to the file.
Scenario 3: waiting on network calls
REST calls blocked the main logic while waiting for responses; sequential calls added latencies. Fix: launch requests with std::thread in parallel and join() until all finish—total wait tends toward the max, not the sum.
Scenario 4: counter does not match
On event day, workers incremented a single shared int for order counts; the total was tens of thousands below the DB. Cause: counter++ is not atomic—data race. Fix: protect with a mutex or use atomic for a single variable.
Scenario 5: server stuck in deadlock
Two threads locked mutex A and mutex B in different orders and waited forever. Requests stopped until restart. Cause: inconsistent lock order. Fix: always lock in the same order (e.g. A then B) or use std::scoped_lock(mtxA, mtxB) to lock both at once.
Scenario 6: 100% CPU from queue polling
A worker polled a queue every 1 ms in a while loop. Even when idle, CPU stayed high. Fix: use condition_variable—“wake only when work arrives”—so idle CPU use drops near zero.
Practical note: this article is grounded in real large-scale C++ projects: pitfalls and debugging tips you rarely see in textbooks.
Table of contents
- What is a thread?
- Creating threads with std::thread
- join and detach: thread lifetime
- Managing threads safely with RAII
- What is thread safety?
- Mutex basics: protecting shared data
- condition_variable basics
- atomic basics: counter without a lock
- std::jthread and stop_token
- Common mistakes and caveats
- Performance: single vs multi-threaded
- Best practices and production patterns
- Implementation checklist
1. What is a thread?
Process vs thread
A process is one running program. Memory, file handles, and environment are per process. Think of a process as one “book in progress”; threads are multiple bookmarks (instruction pointers) in that same book.
A thread is a unit of execution inside that process. A process starts with one main thread and can create more to do work concurrently.
A simplified view of processes, threads, and std::thread:
flowchart TB
subgraph proc["Process (one address space)"]
main[Main thread]
t1[Thread 1]
t2[Thread 2]
main --- t1
main --- t2
end
main -->|std::thread t(fn);| t1
main -->|std::thread t2(fn2);| t2
A more detailed view: one process whose four threads share a single address space.
graph TB
PROCESS["Process myapp"]
MEMORY["Address space\n━━━━━━━━━━━━━━━\ncode, data, heap\nshared by all threads ⚠️"]
T1["Thread 1 main\n━━━━━━━━━━━━\nmain, UI"]
T2["Thread 2\n━━━━━━━━━━━━\nnetwork recv"]
T3["Thread 3\n━━━━━━━━━━━━\nimage processing"]
T4["Thread 4\n━━━━━━━━━━━━\nlogging"]
PROCESS --> MEMORY
PROCESS --> T1
PROCESS --> T2
PROCESS --> T3
PROCESS --> T4
T1 -.->|shared access| MEMORY
T2 -.->|shared access| MEMORY
T3 -.->|shared access| MEMORY
T4 -.->|shared access| MEMORY
style PROCESS fill:#e1f5fe,stroke:#0277bd,stroke-width:3px
style MEMORY fill:#fff3e0,stroke:#f57c00,stroke-width:3px
style T1 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
style T2 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
style T3 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
style T4 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
Threads in the same process share the heap, globals, and static storage. Local variables and the stack are per thread.
So if several threads read and write the same global / heap data at once, you get data races: undefined behavior, wrong results, or crashes. Shared data needs mutex or atomic protection.
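Why you must choose join or detach: if a std::thread object is destroyed while it is still joinable (it owns a thread and neither join() nor detach() has been called, even if the thread function has already returned), its destructor calls std::terminate() and the program aborts. After creating a thread you must call join() (wait until it finishes) or detach() (let it run independently of the std::thread object). Note that detach() does not let a worker outlive the process: when main returns, the process exits and any detached threads still running are stopped. If main needs the workers' results, or the workers use data owned by main, use join.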
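The joinable rule is easy to misread, so here is a minimal sketch of it. The helper name joinableStates is made up for this illustration; it records joinable() right after construction, after the worker body has certainly run, and after join().

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: records joinable() at three points in a thread's life.
std::vector<bool> joinableStates() {
    std::atomic<bool> finished{false};
    std::vector<bool> states;

    std::thread t([&finished] { finished.store(true); });
    states.push_back(t.joinable());      // just constructed: joinable

    while (!finished.load()) {           // wait until the worker body has run
        std::this_thread::yield();
    }
    states.push_back(t.joinable());      // body done, not yet joined: still joinable

    t.join();
    states.push_back(t.joinable());      // after join(): no longer joinable
    return states;
}
```

The middle entry is the one that matters: a thread whose function has already returned still has to be joined or detached before its std::thread object is destroyed.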
Why use multithreading?
- Throughput: while one thread waits on I/O, others can use the CPU
- Responsiveness: keep the UI thread responsive; move heavy work to workers
- Multi-core: split work across cores
2. Creating threads with std::thread
Since C++11, <thread> provides std::thread. Pass a function, lambda, or functor to start a thread.
Passing a function
The std::thread constructor runs the callable on a new thread. In std::thread t(sayHello);, the function name sayHello decays to a function pointer, so do not add (). Writing sayHello() would call the function on the current thread and try to pass its result; because sayHello returns void, that does not compile. t.join() blocks the main thread until that thread finishes. Without join or detach, destroying t calls std::terminate(), so you must call one of them.
// Paste and build: g++ -std=c++17 -pthread -o thread_hello thread_hello.cpp && ./thread_hello
#include <thread>
#include <iostream>
void sayHello() {
std::cout << "Hello from thread!\n";
}
int main() {
std::thread t(sayHello); // create and run thread
t.join(); // wait until thread finishes
return 0;
}
Explanation: std::thread t(sayHello) runs sayHello on a new thread; t.join() blocks main until it completes. You pass sayHello without () because you pass the function pointer, not the return value of an immediate call. Without join/detach, thread destruction triggers std::terminate().
Output: one line: Hello from thread!
Passing a lambda
#include <thread>
#include <iostream>
int main() {
std::thread t([] {
std::cout << "Hello from lambda thread!\n";
});
t.join();
return 0;
}
Explanation: passing a lambda runs it on the new thread—handy for small snippets. You still must call join() or detach().
Output: Hello from lambda thread!
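Functions and lambdas are shown above; the third option, a functor (a class with operator()), works the same way. The sketch below is only an illustration: SquareFiller and squaresOnThread are hypothetical names, and the functor writes into a vector owned by the caller.

```cpp
#include <thread>
#include <vector>

// Hypothetical functor: fills a caller-owned vector with squares 0..count-1.
struct SquareFiller {
    std::vector<int>* out;   // non-owning; the vector must outlive the thread
    int count;

    void operator()() const {
        for (int i = 0; i < count; ++i) {
            out->push_back(i * i);
        }
    }
};

std::vector<int> squaresOnThread(int count) {
    std::vector<int> result;
    SquareFiller task{&result, count};
    std::thread t(task);     // task is copied into the thread; the pointer still targets result
    t.join();                // result is only read after the worker has finished
    return result;
}
```

Because std::thread copies the functor, any state you want to see afterwards has to live outside it (here, behind the pointer), and that state must stay alive until join() returns.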
Passing arguments
#include <thread>
#include <iostream>
void printSum(int a, int b) {
std::cout << "Sum: " << (a + b) << "\n";
}
int main() {
std::thread t(printSum, 10, 20); // pass 10 and 20
t.join();
return 0;
}
Explanation: arguments listed after the callable are copied (or moved) into storage owned by the new thread. To pass a reference, wrap the variable in std::ref(variable) and make sure the referred object outlives the thread.
Output: Sum: 30
Passing references with std::ref
Use std::ref() or std::cref() for references. Important: the referred object must outlive the thread, or you get use-after-free.
#include <thread>
#include <iostream>
#include <functional>
void increment(int& value) {
++value;
}
int main() {
int counter = 0;
std::thread t(increment, std::ref(counter)); // pass by reference
t.join();
std::cout << "counter: " << counter << "\n"; // 1
return 0;
}
Explanation: std::ref(counter) lets the thread mutate counter in place. Without std::ref this call does not compile: std::thread passes its internal copy as an rvalue, which cannot bind to int&. Use std::cref() for const references.
Output: counter: 1
3. join and detach: thread lifetime
Kitchen analogy: concurrency is like one cook switching between tasks (single thread). Parallelism is several cooks working on different dishes at once (multiple threads).
When a std::thread object is destroyed while still joinable, std::terminate() runs. You must join or detach.
join vs detach flow
sequenceDiagram
participant M as Main thread
participant T as Worker thread
Note over M,T: When using join()
M->>T: std::thread t(doWork);
M->>M: call t.join()
Note over M: blocking (wait)
T->>T: run doWork()
T->>M: exit
M->>M: join returns, continue
Note over M,T: When using detach()
M->>T: std::thread t(doWork);
M->>M: call t.detach()
M->>M: continue immediately
T->>T: runs independently in background
join(): wait until the thread finishes
std::thread t(doWork);
t.join(); // block until doWork() finishes
// from here on, t has completed
Explanation: join() blocks until the thread ends. After join() returns, the thread has finished, its thread-local state is gone, and the std::thread object is no longer joinable, so you cannot join it a second time.
- join() blocks until completion.
- Return values are not passed through join() (use std::async / std::promise later).
- Calling join() again on a thread that is no longer joinable throws std::system_error.
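Since join() cannot hand back a result, one common route is std::async, which returns a std::future. The following is a minimal sketch, not a full treatment; sumAsync is a name invented for this example.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Sums a vector on another thread and hands the result back through a future.
long long sumAsync(const std::vector<int>& values) {
    // std::launch::async runs the task as if on a new thread instead of deferring it.
    std::future<long long> result = std::async(std::launch::async, [&values] {
        return std::accumulate(values.begin(), values.end(), 0LL);
    });
    return result.get();   // blocks until the worker finishes, then yields its value
}
```

Capturing values by reference is safe here only because get() is called before the function returns, so the vector is still alive while the worker reads it.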
detach(): run in the background
std::thread t(doWork);
t.detach(); // disown t; thread keeps running
// you no longer know when t finishes—lifetime of data it uses must be correct
Explanation: detach() severs the std::thread handle from the OS thread; the thread keeps running. Main does not wait—ensure anything the thread uses is not destroyed before the thread finishes.
- After detach, you cannot join that thread.
- Lifetime of objects the thread uses must extend appropriately—otherwise undefined behavior (e.g. use-after-free).
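One way to satisfy the lifetime rule is to let the detached thread own everything it touches. The sketch below assumes C++14 or newer (for init-captures); measureDetached is a hypothetical helper that moves a string and a std::promise into the worker and returns a future so the caller can still learn when the work is done.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: starts a detached worker that owns all the data it uses.
std::future<std::size_t> measureDetached(std::string text) {
    std::promise<std::size_t> done;
    std::future<std::size_t> result = done.get_future();

    // The lambda owns the string and the promise, so nothing dangles
    // even after this function has returned.
    std::thread worker([text = std::move(text), p = std::move(done)]() mutable {
        p.set_value(text.size());
    });
    worker.detach();
    return result;
}
```

The caller can do other work and later call get() on the returned future; no reference to a local variable ever crosses into the detached thread.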
joinable()
Calling join() on a thread that is not joinable (already joined or detached) throws std::system_error. Use joinable() to check first.
#include <thread>
#include <iostream>
void doWork() {
std::cout << "Working...\n";
}
int main() {
std::thread t(doWork);
if (t.joinable()) {
t.join(); // safe join
}
// t.join(); // ❌ would throw std::system_error: t is no longer joinable
return 0;
}
Explanation: joinable() is true until join or detach. Prefer RAII patterns for exception safety.
Practical guidance
- Default to join when you need ordering or when data lives in the current scope.
- Use detach sparingly—e.g. a process-lifetime log thread—and always verify data lifetime.
4. Managing threads safely with RAII
If an exception is thrown before join, destroying a still-joinable std::thread calls std::terminate(). RAII wrappers that join in the destructor are common.
ThreadGuard pattern
#include <thread>
#include <iostream>
#include <stdexcept>
class ThreadGuard {
public:
explicit ThreadGuard(std::thread& t) : t_(t) {}
~ThreadGuard() {
if (t_.joinable()) {
t_.join(); // auto join on destruction
}
}
// non-copyable
ThreadGuard(const ThreadGuard&) = delete;
ThreadGuard& operator=(const ThreadGuard&) = delete;
private:
std::thread& t_;
};
void doWork() {
std::cout << "Working...\n";
}
void mightThrow() {
std::thread t(doWork);
ThreadGuard guard(t); // t.join() on guard destruction
// throw std::runtime_error("oops"); // still joins on exception
}
Explanation: ThreadGuard holds a reference and joins in the destructor if still joinable—so std::terminate() is avoided even on exceptions. C++20 std::jthread standardizes this.
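A common variation on this pattern takes ownership of the std::thread instead of holding a reference, so the guard cannot outlive the thread object it protects. This is a sketch under that assumption; ScopedThread and joinsDespiteException are illustrative names, not part of the standard library.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical owning variant: takes the std::thread by value, joins on destruction.
class ScopedThread {
public:
    explicit ScopedThread(std::thread t) : t_(std::move(t)) {}
    ~ScopedThread() {
        if (t_.joinable()) {
            t_.join();
        }
    }
    ScopedThread(const ScopedThread&) = delete;
    ScopedThread& operator=(const ScopedThread&) = delete;

private:
    std::thread t_;
};

// Returns true if the worker finished even though the scope exited by exception.
bool joinsDespiteException() {
    std::atomic<bool> workerDone{false};
    try {
        ScopedThread guard{std::thread([&workerDone] { workerDone.store(true); })};
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // guard's destructor already ran during unwinding, so the join happened
    }
    return workerDone.load();
}
```

The join happens during stack unwinding, before the catch block runs, which is why the flag is reliably set by the time it is read.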
std::jthread (C++20)
std::jthread (joining thread) joins automatically in its destructor.
#include <thread>
#include <iostream>
void doWork() {
std::cout << "Working...\n";
}
int main() {
std::jthread t(doWork); // auto join on destruction
// no manual t.join()
return 0;
}
Explanation: std::jthread joins on destruction—good way to avoid forgetting join. Requires C++20 or newer.
5. What is thread safety?
Code is thread-safe if, when several threads access shared data, results stay correct and there are no memory errors.
Example that is not thread-safe
#include <thread>
#include <iostream>
int counter = 0; // global
void increment() {
for (int i = 0; i < 100000; ++i) {
counter++; // read-modify-write — data race if concurrent
}
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << counter << "\n"; // may not be 200000 (data race)
return 0;
}
counter++ is read → add → write. Two threads interleaving those steps can lose updates—a data race.
Explanation: concurrent counter++ is not atomic; one thread’s write can overwrite another’s read-modify-write. The sum can be less than 200000. Protect shared mutation with mutex or atomic.
See mutex and atomic for fixes.
Visualizing a data race
sequenceDiagram
participant T1 as Thread 1
participant T2 as Thread 2
participant C as counter (memory)
Note over C: initial: 0
T1->>C: read (0)
T2->>C: read (0)
T1->>T1: 0+1=1
T2->>T2: 0+1=1
T1->>C: write (1)
T2->>C: write (1)
Note over C: expected 2, got 1 (lost update)
When it is safe
- Read-only shared data (no writes—usually fine).
- Threads use disjoint variables only.
- APIs documented as thread-safe by the standard.
6. Mutex basics: protecting shared data
When several threads modify the same variable, use a mutex (mutual exclusion) so only one thread enters the critical section at a time. std::mutex blocks others until the lock is released. Details are in the next article; here is minimal usage.
std::mutex + lock_guard full example
// Paste and build: g++ -std=c++17 -pthread -o mutex_basic mutex_basic.cpp && ./mutex_basic
#include <thread>
#include <iostream>
#include <mutex>
int counter = 0;
std::mutex counter_mutex;
void safeIncrement() {
for (int i = 0; i < 100000; ++i) {
std::lock_guard<std::mutex> lock(counter_mutex); // acquire
++counter; // critical section: one thread at a time
} // unlock on destruction
}
int main() {
std::thread t1(safeIncrement);
std::thread t2(safeIncrement);
t1.join();
t2.join();
std::cout << "counter: " << counter << "\n"; // always 200000
return 0;
}
Explanation:
- One std::mutex per shared resource.
- std::lock_guard locks on construction and unlocks on destruction (RAII), even on exceptions.
- ++counter is now serialized, so there is no data race.
Output: always counter: 200000
Without vs with mutex
// ❌ No mutex: data race
void unsafeIncrement() {
for (int i = 0; i < 100000; ++i) {
++counter; // concurrent access → undefined results
}
}
// ✅ Mutex: thread-safe
void safeIncrement() {
for (int i = 0; i < 100000; ++i) {
std::lock_guard<std::mutex> lock(counter_mutex);
++counter;
}
}
lock_guard caveats
- Minimize lock scope: holding a lock across I/O or heavy work hurts concurrency—protect only what you must.
- Deadlock: locking two mutexes in different orders can deadlock. See the mutex article for lock ordering and std::lock().
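To make the deadlock caveat concrete, here is a small sketch that locks two mutexes together with std::scoped_lock (C++17). The Account type and function names are invented for this example, and transfer assumes from and to are different accounts.

```cpp
#include <mutex>
#include <thread>

// Hypothetical account type used only for this illustration.
struct Account {
    std::mutex mtx;
    long long balance = 0;
};

// Locks both accounts together. std::scoped_lock uses a deadlock-avoidance
// algorithm, so the order of the two arguments does not matter.
// Assumes from and to are different accounts.
void transfer(Account& from, Account& to, long long amount) {
    std::scoped_lock lock(from.mtx, to.mtx);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, the classic deadlock setup.
long long totalAfterOpposingTransfers() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If transfer instead took two separate lock_guard objects in argument order, t1 would lock a then b while t2 locks b then a, which is exactly the inconsistent ordering that can hang.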
7. condition_variable basics
A condition_variable lets threads wait until a predicate becomes true and wake others when state changes. Mutex alone does not express “sleep until the queue is non-empty” cleanly; cv avoids busy polling—foundation for work queues and producer–consumer. Details: next post.
Producer–consumer example
// Paste and build: g++ -std=c++20 -pthread -o cv_demo cv_demo.cpp && ./cv_demo
#include <thread>
#include <iostream>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <chrono>
std::queue<int> queue;
std::mutex mtx;
std::condition_variable cv;
bool done = false;
void producer() {
for (int i = 0; i < 5; ++i) {
{
std::lock_guard<std::mutex> lock(mtx);
queue.push(i);
}
cv.notify_one(); // wake consumer
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
{
std::lock_guard<std::mutex> lock(mtx);
done = true;
}
cv.notify_one();
}
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [] { return !queue.empty() || done; }); // wait until predicate
while (!queue.empty()) {
int val = queue.front();
queue.pop();
lock.unlock();
std::cout << "Consumed: " << val << "\n";
lock.lock();
}
if (done) break;
}
}
int main() {
std::thread p(producer);
std::thread c(consumer);
p.join();
c.join();
return 0;
}
Explanation: producer pushes and notify_one(); consumer uses wait(lock, predicate) so it sleeps until the queue has data or done. Always pass a predicate to handle spurious wakeups.
Output: Consumed: 0 through Consumed: 4 in order.
8. atomic basics: counter without a lock
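For a single variable (counter, flag), std::atomic is often lighter than a mutex. Each operation is indivisible; for types like int it typically compiles to lock-free hardware instructions, although the standard only guarantees lock-freedom for std::atomic_flag. Deep dive: atomic article.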
std::atomic full example
// Paste and build: g++ -std=c++17 -pthread -o atomic_demo atomic_demo.cpp && ./atomic_demo
#include <thread>
#include <iostream>
#include <atomic>
std::atomic<int> counter{0};
void increment() {
for (int i = 0; i < 100000; ++i) {
counter++; // atomic — no data race on counter
}
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << "counter: " << counter << "\n"; // always 200000
return 0;
}
Explanation: counter++ on std::atomic<int> is atomic—no mutex contention for this pattern. Note: updating several related variables consistently still needs a mutex. Use atomic for single-variable read/write/increment/decrement.
Output: always counter: 200000
atomic vs mutex
| Situation | Prefer | Why |
|---|---|---|
| Single variable (counter, flag) | std::atomic | Usually lock-free atomic ops, less contention |
| Multiple variables / complex invariants | std::mutex | Protect a whole section |
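The second row of the table is worth a quick illustration. In this sketch (RunningStats and recordFromThreads are hypothetical names), a count and a sum must always change together, so one mutex protects both rather than two independent atomics.

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical statistics holder: count_ and sum_ must always change together.
class RunningStats {
public:
    void add(long long value) {
        std::lock_guard<std::mutex> lock(mtx_);
        ++count_;
        sum_ += value;
    }
    // Returns a consistent (count, sum) pair taken under the same lock.
    std::pair<long long, long long> snapshot() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return {count_, sum_};
    }

private:
    mutable std::mutex mtx_;
    long long count_ = 0;
    long long sum_ = 0;
};

std::pair<long long, long long> recordFromThreads(int threads, int perThread) {
    RunningStats stats;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&stats, perThread] {
            for (int i = 0; i < perThread; ++i) stats.add(2);
        });
    }
    for (auto& th : pool) th.join();
    return stats.snapshot();
}
```

With two separate atomics, a reader could observe the new count with the old sum; taking the snapshot under the same mutex rules that out.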
9. std::jthread and stop_token
std::jthread (C++20) joins on destruction and can pass a std::stop_token for cooperative shutdown—standardizing what people used to do by hand with std::thread + flags.
std::jthread + stop_token example
// Paste and build: g++ -std=c++20 -pthread -o jthread_demo jthread_demo.cpp && ./jthread_demo
#include <thread>
#include <iostream>
#include <chrono>
void worker(std::stop_token st) {
while (!st.stop_requested()) {
std::cout << "Working...\n";
std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
std::cout << "Stopped.\n";
}
int main() {
std::jthread t(worker); // stop_token injected
std::this_thread::sleep_for(std::chrono::seconds(1));
t.request_stop(); // request shutdown
// ~jthread calls request_stop() and then join(); the explicit call above just makes the intent visible
return 0;
}
Explanation: std::jthread passes std::stop_token to callables that accept it. Loop on st.stop_requested(); call request_stop() from outside. Destructor joins—reduces missing join bugs.
Output: several Working... lines, then Stopped.
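For comparison, this is roughly what the hand-rolled version looks like with plain std::thread and an atomic flag. It is a sketch of the older idiom, not a replacement for std::jthread; StoppableCounter is a name made up for this example.

```cpp
#include <atomic>
#include <thread>

// Hypothetical pre-C++20 equivalent: an atomic flag plays the role of stop_token.
class StoppableCounter {
public:
    StoppableCounter() : worker_([this] { run(); }) {}
    ~StoppableCounter() { stop(); }

    // Requests shutdown and waits for the worker; safe to call more than once.
    void stop() {
        stopRequested_.store(true);
        if (worker_.joinable()) worker_.join();
    }
    long long ticks() const { return ticks_.load(); }

private:
    void run() {
        while (!stopRequested_.load()) {
            ticks_.fetch_add(1);
            std::this_thread::yield();
        }
    }
    // Declared before worker_ so they are initialized before the thread starts.
    std::atomic<bool> stopRequested_{false};
    std::atomic<long long> ticks_{0};
    std::thread worker_;
};
```

Everything this class does by hand (the flag, the join in the destructor, the member ordering) is what std::jthread and std::stop_token package up for you.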
10. Common mistakes and caveats
(1) Missing join/detach
void bad() {
std::thread t(doWork);
} // if still joinable → std::terminate()
Explanation: destroying t while joinable terminates the program. Always join or detach.
Fix:
void good() {
std::thread t(doWork);
t.join(); // or t.detach()
}
(2) join twice
std::thread t(doWork);
t.join();
t.join(); // throws std::system_error: t is no longer joinable
Explanation: after the first join, the thread is no longer joinable, so the second join throws std::system_error. Use joinable().
Fix:
std::thread t(doWork);
t.join();
if (t.joinable()) { // false — block skipped
t.join();
}
Only one of join or detach per thread.
(3) Lambdas capturing references to locals
void bad() {
int value = 42;
std::thread t([&value]() {
std::cout << value; // value may be gone after bad() returns
});
t.detach(); // thread may outlive bad()
}
Explanation: [&value] with detach can run after bad() returns—use-after-free. Capture by value or join before returning.
Fix:
void good() {
int value = 42;
std::thread t([value]() {
std::cout << value; // safe copy
});
t.join(); // or ensure lifetime if using detach
}
(4) Unprotected shared writes
Multiple writers without synchronization → data race. See mutex.
(5) Parentheses on function pointer
std::thread t(sayHello()); // ❌ calls sayHello() here; with a void return this does not compile
std::thread t(sayHello); // ✅ pass callable
Common issues and fixes
Issue: “terminate called without an active exception”
Cause: std::thread destroyed while still joinable.
Fix:
void bad() {
std::thread t(doWork);
} // terminate
void good() {
std::thread t(doWork);
t.join(); // or detach()
}
Issue: AddressSanitizer: heap-use-after-free
Cause: detached thread uses a destroyed local.
Fix: capture by value or join first.
void bad() {
std::string msg = "hello";
std::thread t([&msg]() { std::cout << msg; });
t.detach();
} // msg destroyed while thread may still run
void good() {
std::string msg = "hello";
std::thread t([msg]() { std::cout << msg; }); // copy
t.detach();
}
Issue: counter smaller than expected
Cause: data race on a shared variable.
Fix: mutex or atomic.
Issue: double free / heap corruption
Cause: one thread frees memory another still uses—often unsynchronized shared pointers.
Fix: synchronize mutations; clarify ownership; consider single-owner designs.
Issue: too many threads → OOM
Cause: one thread per file → thousands of threads → huge stack usage.
Fix: thread pool; cap near std::thread::hardware_concurrency(); queue work.
Issue: condition_variable wait without predicate
Cause: cv.wait(lock) alone can wake spuriously.
Fix: cv.wait(lock, [] { return condition; })
// ❌ risky
cv.wait(lock);
// ✅ safe
cv.wait(lock, [] { return !queue.empty() || done; });
Issue: re-locking std::mutex in same thread
Cause: std::mutex is not recursive. Locking it again on the thread that already holds it is undefined behavior; in practice it usually deadlocks.
Fix: refactor locking; use std::recursive_mutex only when truly needed (often hides design issues).
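One refactoring that often removes the need for a recursive mutex is to split each class into public methods that lock and private helpers that assume the lock is already held. A minimal sketch, with Inventory and its members invented for illustration:

```cpp
#include <mutex>

// Hypothetical inventory class. Public methods lock; the private *Locked helper
// assumes the caller already holds mtx_, so no method locks twice on one thread.
class Inventory {
public:
    void add(int n) {
        std::lock_guard<std::mutex> lock(mtx_);
        addLocked(n);
    }
    // Adds n only if the result stays within limit; returns whether it did.
    bool addIfBelow(int n, int limit) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (count_ + n > limit) return false;
        addLocked(n);          // calling add() here would re-lock mtx_
        return true;
    }
    int count() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return count_;
    }

private:
    void addLocked(int n) { count_ += n; }   // precondition: mtx_ is held

    mutable std::mutex mtx_;
    int count_ = 0;
};
```

The naming convention (a Locked suffix) is just one way to document the precondition; what matters is that locking happens at exactly one layer.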
Issue: main blocked forever on join
Cause: worker infinite loop or blocked I/O.
Fix: cooperative shutdown (stop_token, atomic flags), std::jthread::request_stop(), or wait_for where appropriate.
11. Performance: single vs multi-threaded
CPU-bound work
Splitting pure CPU work can use multiple cores, but thread create/join and scheduling have cost—too small tasks can go slower.
#include <thread>
#include <iostream>
#include <chrono>
#include <vector>
#include <numeric>
long long sumSingle(long long n) {
long long result = 0;
for (long long i = 0; i < n; ++i) {
result += i;
}
return result;
}
long long sumMulti(long long n, int numThreads = 4) {
std::vector<std::thread> threads;
std::vector<long long> partialSums(numThreads, 0);
long long chunk = n / numThreads;
for (int i = 0; i < numThreads; ++i) {
long long start = i * chunk;
long long end = (i == numThreads - 1) ? n : (i + 1) * chunk;
threads.emplace_back([&partialSums, i, start, end]() {
for (long long j = start; j < end; ++j) {
partialSums[i] += j;
}
});
}
for (auto& t : threads) {
t.join();
}
return std::accumulate(partialSums.begin(), partialSums.end(), 0LL);
}
int main() {
const long long n = 100'000'000;
auto start = std::chrono::high_resolution_clock::now();
auto r1 = sumSingle(n);
auto end = std::chrono::high_resolution_clock::now();
auto singleMs = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
start = std::chrono::high_resolution_clock::now();
auto r2 = sumMulti(n);
end = std::chrono::high_resolution_clock::now();
auto multiMs = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
std::cout << "Single: " << singleMs << " ms, result=" << r1 << "\n";
std::cout << "Multi: " << multiMs << " ms, result=" << r2 << "\n";
return 0;
}
Explanation: sumSingle is sequential; sumMulti partitions ranges—can win on 4+ cores for large n. Here each thread writes only its partialSums[i]—no race on that array; more complex sharing needs mutex.
Typical (varies by machine): on a quad-core, multi might be ~2–3× faster for large n.
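One detail can eat into that speedup: neighbouring elements of partialSums are likely to sit on the same cache line, so threads that update them on every iteration may slow each other down (false sharing). A hedged variant, here called sumMultiLocal, accumulates into a local variable and writes to the shared vector once per thread.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Variant of sumMulti: each thread sums into a local variable and stores once.
long long sumMultiLocal(long long n, int numThreads = 4) {
    std::vector<std::thread> threads;
    std::vector<long long> partialSums(numThreads, 0);
    const long long chunk = n / numThreads;

    for (int i = 0; i < numThreads; ++i) {
        const long long start = i * chunk;
        const long long end = (i == numThreads - 1) ? n : (i + 1) * chunk;
        threads.emplace_back([&partialSums, i, start, end] {
            long long local = 0;               // lives on this thread's stack
            for (long long j = start; j < end; ++j) {
                local += j;
            }
            partialSums[i] = local;            // single write to the shared vector
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return std::accumulate(partialSums.begin(), partialSums.end(), 0LL);
}
```

Whether this is measurably faster depends on the compiler and hardware, so treat it as something to benchmark rather than a guaranteed win.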
I/O-bound work
Image pipelines mixing disk I/O and CPU often speed up with parallel files.
| Mode | 10 files (illustrative) |
|---|---|
| Single thread, sequential | ~10 s |
| Multi-threaded, per file | ~3–5 s |
Batch images: before / after
Before (single thread):
void processAll(const std::vector<std::string>& files,
const std::string& outputDir) {
for (size_t i = 0; i < files.size(); ++i) {
Image img = loadImage(files[i]);
img.resize(800, 600);
std::string outPath = outputDir + "/" + std::to_string(i) + ".jpg";
saveImage(img, outPath);
}
}
After (multi-threaded):
void processAllParallel(const std::vector<std::string>& files,
const std::string& outputDir) {
std::vector<std::thread> threads;
threads.reserve(files.size());
for (size_t i = 0; i < files.size(); ++i) {
threads.emplace_back([&files, &outputDir, i]() {
Image img = loadImage(files[i]);
img.resize(800, 600);
std::string outPath = outputDir + "/" + std::to_string(i) + ".jpg";
saveImage(img, outPath);
});
}
for (auto& t : threads) {
t.join();
}
}
Explanation: [&files, &outputDir, i] — files and outputDir live until processAllParallel returns; i is captured by value per iteration. If loadImage / saveImage use hidden global state, add synchronization.
Caveat: if those functions mutate shared caches/globals, you can still race—prefer independent per-thread data or mutexes.
How many threads?
| Workload | Suggestion | Why |
|---|---|---|
| CPU-bound | std::thread::hardware_concurrency() | Match cores |
| I/O-heavy | ~2–4× cores | overlap waits |
| Mixed | measure | tune with benchmarks |
#include <thread>
#include <iostream>
int main() {
unsigned int n = std::thread::hardware_concurrency();
std::cout << "CPU cores: " << n << "\n";
return 0;
}
hardware_concurrency() may return 0 if unknown.
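Because of that zero case, code that sizes a pool or a batch from hardware_concurrency() usually wants a fallback. A small sketch, where workerCount and its default fallback of 2 are assumptions chosen for illustration:

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: never returns 0, even when the core count is unknown.
unsigned int workerCount(unsigned int fallback = 2) {
    const unsigned int detected = std::thread::hardware_concurrency();
    return detected != 0 ? detected : std::max(1u, fallback);
}
```

Using a helper like this wherever a thread count is needed avoids loops that never advance or pools with no workers.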
12. Best practices and production patterns
Best practices
- Prefer join when you need ordering or stack-scoped data used by the thread.
- Detach rarely—long-lived daemon threads (e.g. logging)—and verify lifetimes.
- Lifetime: detached threads must not capture dangling references—copy or extend lifetime.
- Shared writes: mutex + lock_guard, or atomic for single variables.
- Small critical sections: do not hold mutexes across I/O or heavy work.
- Exception safety: ThreadGuard or
std::jthread.
Production patterns
Pattern 1: worker pool + queue
Use hardware_concurrency() workers and a condition_variable to wake on work—see the condition_variable article.
Pattern 2: dedicated log thread
Producers enqueue; one thread writes—often detached for process lifetime.
Pattern 3: parallel batch with a cap
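// processFile(index, files) is assumed to be defined elsewhere.
// hardware_concurrency() may return 0, so clamp to at least 1 (needs <algorithm>).
const size_t maxConcurrent =
    std::max<size_t>(1, std::thread::hardware_concurrency());
std::vector<std::thread> workers;
workers.reserve(maxConcurrent);
for (size_t i = 0; i < files.size(); i += maxConcurrent) {
    workers.clear();
    const size_t batchEnd = std::min(i + maxConcurrent, files.size());
    for (size_t j = i; j < batchEnd; ++j)
        workers.emplace_back(processFile, j, std::cref(files));
    for (auto& t : workers) t.join();   // wait for the whole batch before starting the next
}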
Explanation: limits concurrent threads instead of spawning one per file.
Production tips
- Cap thread count—unbounded threads waste memory and time in context switches.
- Exceptions do not propagate from worker to main automatically; an exception that escapes the thread function calls std::terminate(). Use std::promise/future or in-thread handling + shared error state.
- Debug data races with ThreadSanitizer: g++ -std=c++17 -pthread -fsanitize=thread -g -o myapp myapp.cpp
- native_handle() for OS-specific tuning when needed.
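The "in-thread handling + shared error state" option can be sketched with std::exception_ptr: the worker catches everything, stores the exception, and the caller rethrows it after join(). The function name runAndReport and the "disk full" failure are made up for this example.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Runs a failing task on a worker and reports the error message on the caller's thread.
std::string runAndReport() {
    std::exception_ptr error;                 // written by the worker, read after join()

    std::thread worker([&error] {
        try {
            throw std::runtime_error("disk full");   // simulated failure
        } catch (...) {
            error = std::current_exception(); // store instead of letting it escape
        }
    });
    worker.join();                            // join() orders the write before the read

    if (error) {
        try {
            std::rethrow_exception(error);
        } catch (const std::exception& e) {
            return e.what();
        }
    }
    return "ok";
}
```

std::async and std::promise do the same bookkeeping for you: an exception stored in the shared state is rethrown when the caller calls get() on the future.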
13. Implementation checklist
- After creating a thread, always join() or detach()
- Call join/detach only once (a second call throws std::system_error)
- If using detach(), guarantee lifetime of data the thread uses
- Beware reference captures [&] (use-after-free)
- Protect shared mutable state with mutex + lock_guard or atomic
- Minimize lock scope (no I/O under lock unless required)
- On exceptions, use RAII (ThreadGuard) or std::jthread
- Pass function pointers without () (sayHello, not sayHello())
Related posts (internal)
- C++ mutex: race conditions and lock_guard
- C++ atomic: thread-safe counters and memory_order
- C++ practical series index
Keywords
C++ threads, std::thread, multithreading basics, join and detach, thread creation, concurrent programming.
Summary
- Threads run inside a process and share heap and globals; stacks are per-thread.
- std::thread runs functions, lambdas, or functors; arguments are copied or passed with std::ref.
- Always join or detach; join is the safer default when lifetimes align.
- Concurrent writes to shared data cause data races. Use std::mutex + lock_guard, or std::atomic for single variables.
- condition_variable enables event-driven waits instead of polling for queues.
- RAII (ThreadGuard) or std::jthread avoids forgotten joins; stop_token helps cooperative shutdown.
- In production: cap thread count, use pools, run ThreadSanitizer.
One-liner: create std::thread, always join or detach, protect shared data with mutex/atomic, use condition_variable for condition-based waiting. Next, read mutex.
Next article
Once you can launch threads, you need to share data safely.
FAQ
Q. When do I use this in production?
A. As a C++11 concurrency primer: std::thread, join/detach, process vs thread, thread safety, and common pitfalls—apply the examples and selection guidance from the body.
Q. What should I read first?
A. Follow previous post links in order, or open the C++ series index.
Q. How do I go deeper?
A. cppreference and library docs; see References below.
Q. std::thread vs std::async?
A. Use std::thread when you manage the thread directly. Use std::async when you want a future / deferred result. For “just run in the background,” std::thread is often enough.
Q. When do I need a thread pool?
A. Creating/destroying threads repeatedly is expensive; for steady workloads, reuse a pool. After this article, you can build pools with mutex and condition_variable.
Q. “undefined reference to pthread” when linking
A. On Linux, link with -pthread:
g++ -std=c++17 -pthread -o myapp myapp.cpp
Q. join or detach by default?
A. Default to join. Use detach only for long-lived daemon-style work (e.g. logging) and verify data lifetime.
Next: C++ guide #7-2: mutex and synchronization — races, mutex, lock_guard.
References
See also
- C++ condition_variable patterns
- C++ atomic (memory_order)
- C++ mutex and races
- C++ advanced threading: pools and work stealing
- C++ stack vs heap
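Related reading (internal links)
Other posts connected to this topic.
- Fixing race conditions with C++ mutex | From an order-counter bug to lock_guard
- C++ atomic | Building a thread-safe counter without a mutex (including memory_order)
- C++ promise: a complete guide to std::promise | future and asynchronous programming
- C++ async & launch | std::async, future, and launch policies explained
- C++20 Coroutines: a complete guide | A new era of asynchronous programming
- I/O multiplexing with select and epoll: a complete guide | I/O multiplexing in practice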