C++ std::thread basics — join/detach mistakes, mutex,


Key points of this article

Create and manage std::thread with join/detach; mutex, condition_variable, atomic, and jthread basics; process vs thread; common mistakes (missing join, detach misuse) and fixes—with runnable examples.

Introduction: when single-threaded work is too slow

“Why is this taking so long?” — I/O waits and using only one CPU core

In an image conversion service, several user-uploaded files were resized one after another. Even ten files could take more than ten seconds.

The code at the time processed the files sequentially in a for loop. loadImage and saveImage hit disk I/O, so the CPU sat waiting; only resize used the CPU, and only one core at that. A thread is an independent flow of execution inside a process; think of it as several people working at once inside one team (the process). Handing each file to a separate thread lets I/O waits and CPU work overlap and reduces the total time. After splitting the work across threads, wall time for ten files dropped to roughly half or less.

// Image, loadImage, saveImage and outputPath belong to the service's own code (not shown)
void processAll(const std::vector<std::string>& files) {
    for (const auto& path : files) {
        Image img = loadImage(path);   // disk I/O wait
        img.resize(800, 600);          // CPU work (one core)
        saveImage(img, outputPath);    // disk I/O wait
    }
}

What this shows: the loop runs on one thread, so during loadImage / saveImage I/O the CPU sits idle, and resize still uses only one core. More files mean longer sequential time; per-file threads let you overlap I/O and CPU work.

Why it was slow:

  • All work ran on the main thread only
  • The CPU idled during disk I/O waits
  • The machine had multiple cores but only one was used

After splitting with std::thread:

  • One thread per file to overlap I/O and CPU
  • For ten files, perceived time dropped to about half or less

That experience drove home the need for multithreading basics.

Threads look like “doing many things at once,” but if several threads touch the same memory without coordination you get data races and bugs. So learn thread creation, join (wait until the thread finishes), and detach (let the thread run in the background), then continue with mutex, condition_variable, and atomic for protecting shared data.

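In terms of OS threads, std::thread corresponds closely to Java's Thread (and to the pooled threads behind an Executor). Go goroutines are different: the Go runtime multiplexes many lightweight goroutines onto fewer OS threads, with different scheduling tradeoffs. Rust's std::thread and channels make a good comparison if you read them alongside Rust's ownership rules.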

After reading this post you will:

  • Know how to create std::thread instances and join or detach them
  • Understand what threads are and why we use them
  • Get intuition for thread safety
  • Avoid common mistakes (missing join, detach misuse)

More problem scenarios

Scenario 1: UI freezes

In a desktop app, clicking Save compressed a large file and wrote to disk; the UI froze for seconds. Users thought the app hung and clicked repeatedly. Fix: run compression and I/O on a worker thread; keep the main thread for UI events only.

Scenario 2: delayed / corrupted logs

Several worker threads wrote logs to a file at once; while one called fprintf, others touched the same file handle—lines interleaved or the process crashed. Fix: one dedicated log thread; other threads enqueue messages; only the log thread writes to the file.

Scenario 3: waiting on network calls

REST calls blocked the main logic while waiting for responses; sequential calls added latencies. Fix: launch requests with std::thread in parallel and join() until all finish—total wait tends toward the max, not the sum.
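A minimal sketch of this fix, assuming each REST call can be stood in for by a hypothetical fakeRequest that simply sleeps for its latency. Each thread writes only its own element of results, so no lock is needed, and calling join() on every thread means the total wait is close to the slowest call rather than the sum of all calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking REST call: sleeps, then returns a body.
std::string fakeRequest(int id, int latencyMs) {
    std::this_thread::sleep_for(std::chrono::milliseconds(latencyMs));
    return "response " + std::to_string(id);
}

// Hypothetical helper: issues all requests in parallel and waits for every one.
std::vector<std::string> fetchAllParallel(const std::vector<int>& latenciesMs) {
    std::vector<std::string> results(latenciesMs.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < latenciesMs.size(); ++i) {
        // Each thread writes only results[i]: disjoint elements, no data race.
        threads.emplace_back([&results, &latenciesMs, i] {
            results[i] = fakeRequest(static_cast<int>(i), latenciesMs[i]);
        });
    }
    for (auto& t : threads) t.join();  // total wait ~ max latency, not the sum
    return results;
}
```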

Scenario 4: counter does not match

On event day, workers incremented a single shared int for order counts; the total was tens of thousands below the DB. Cause: counter++ is not atomic—data race. Fix: protect with a mutex or use atomic for a single variable.

Scenario 5: server stuck in deadlock

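Two threads locked mutex A and mutex B in opposite orders and waited on each other forever; requests stopped until a restart. Cause: inconsistent lock order. Fix: always lock in the same order (e.g. A then B), or lock both at once with a named C++17 lock such as std::scoped_lock lock(mtxA, mtxB); (an unnamed std::scoped_lock(mtxA, mtxB); only creates a temporary that unlocks at the end of the statement).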
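A short sketch of the std::scoped_lock fix, using a hypothetical Account type with one mutex per balance. The two threads transfer in opposite directions, which is exactly the A-then-B versus B-then-A pattern described above; because std::scoped_lock acquires both mutexes with a deadlock-avoidance algorithm, the order of its arguments does not matter.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type: each balance is protected by its own mutex.
struct Account {
    std::mutex mtx;
    long long balance = 0;
};

// std::scoped_lock (C++17) locks both mutexes without deadlocking,
// whichever order the accounts are passed in.
void transfer(Account& from, Account& to, long long amount) {
    std::scoped_lock lock(from.mtx, to.mtx);
    from.balance -= amount;
    to.balance += amount;
}

// Hypothetical demo: two threads transfer A->B and B->A concurrently.
long long totalAfterCrossTransfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;  // money is conserved
}
```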

Scenario 6: 100% CPU from queue polling

A worker polled a queue every 1 ms in a while loop. Even when idle, CPU stayed high. Fix: use condition_variable—“wake only when work arrives”—so idle CPU use drops near zero.

Practical note: this article draws on real large-scale C++ projects and includes pitfalls and debugging tips you rarely see in textbooks.

Table of contents

  1. What is a thread?
  2. Creating threads with std::thread
  3. join and detach: thread lifetime
  4. Managing threads safely with RAII
  5. What is thread safety?
  6. Mutex basics: protecting shared data
  7. condition_variable basics
  8. atomic basics: counter without a lock
  9. std::jthread and stop_token
  10. Common mistakes and caveats
  11. Performance: single vs multi-threaded
  12. Best practices and production patterns
  13. Implementation checklist

1. What is a thread?

Process vs thread

A process is one running program. Memory, file handles, and environment are per process. Think of a process as one “book in progress”; threads are multiple bookmarks (instruction pointers) in that same book.

A thread is a unit of execution inside that process. A process starts with one main thread and can create more to do work concurrently.

A simplified view of processes, threads, and std::thread:

flowchart TB
  subgraph proc["Process (one address space)"]
    main[Main thread]
    t1[Thread 1]
    t2[Thread 2]
    main --- t1
    main --- t2
  end
  main -->|std::thread t(fn);| t1
  main -->|std::thread t2(fn2);| t2
A more detailed view of what the threads in one process share:
graph TB
    PROCESS["Process myapp"]
    MEMORY["Address space\n━━━━━━━━━━━━━━━\ncode, data, heap\nshared by all threads ⚠️"]
    T1["Thread 1 main\n━━━━━━━━━━━━\nmain, UI"]
    T2["Thread 2\n━━━━━━━━━━━━\nnetwork recv"]
    T3["Thread 3\n━━━━━━━━━━━━\nimage processing"]
    T4["Thread 4\n━━━━━━━━━━━━\nlogging"]
    PROCESS --> MEMORY
    PROCESS --> T1
    PROCESS --> T2
    PROCESS --> T3
    PROCESS --> T4
    T1 -.->|shared access| MEMORY
    T2 -.->|shared access| MEMORY
    T3 -.->|shared access| MEMORY
    T4 -.->|shared access| MEMORY
    style PROCESS fill:#e1f5fe,stroke:#0277bd,stroke-width:3px
    style MEMORY fill:#fff3e0,stroke:#f57c00,stroke-width:3px
    style T1 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style T2 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style T3 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style T4 fill:#c8e6c9,stroke:#388e3c,stroke-width:2px

Threads in the same process share the heap, globals, and static storage. Local variables and the stack are per thread.

So if several threads read and write the same global / heap data at once, you get data races: undefined behavior, wrong results, or crashes. Shared data needs mutex or atomic protection.

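Why you must choose join or detach: if a std::thread object is destroyed while it is still joinable, the standard calls std::terminate() and the program aborts. Joinable means the object still represents a thread of execution that was neither joined nor detached, even if that thread has already finished running. After creating a thread you must call join() (wait until it finishes) or detach() (let it run on without the std::thread object). Use join when the current code must wait for the worker, for example before main returns. Use detach only when the thread may keep running after its std::thread object is gone; detached threads still do not outlive the process and end when it exits (for example, when main returns).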
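A small sketch that illustrates the rule (the helper name joinableAfterThreadFinished is made up for this example): even after the worker's body has run to completion, the std::thread object remains joinable until join() or detach() is called.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: joinable() means "not yet joined or detached",
// not "still running".
bool joinableAfterThreadFinished() {
    std::atomic<bool> done{false};
    std::thread t([&done] { done = true; });
    while (!done) {  // wait until the worker's body has run
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // give it time to exit
    bool stillJoinable = t.joinable();  // true: finished, but never joined
    t.join();                           // still required; skipping it would terminate
    return stillJoinable;
}
```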

Why use multithreading?

  • Throughput: while one thread waits on I/O, others can use the CPU
  • Responsiveness: keep the UI thread responsive; move heavy work to workers
  • Multi-core: split work across cores

2. Creating threads with std::thread

Since C++11, <thread> provides std::thread. Pass a function, lambda, or functor to start a thread.

Passing a function

The std::thread constructor starts a new thread that runs the callable. In std::thread t(sayHello);, sayHello names the function (it decays to a function pointer); do not add (). Writing sayHello() would call the function immediately on the current thread and pass its return value instead: since sayHello returns void, that does not compile, and with a function that returned something callable it would silently do the wrong thing. t.join() blocks the main thread until the new thread finishes. If t is destroyed without join() or detach(), std::terminate() is called, so you must call one of them.

// Paste and build: g++ -std=c++17 -pthread -o thread_hello thread_hello.cpp && ./thread_hello
#include <thread>
#include <iostream>
void sayHello() {
    std::cout << "Hello from thread!\n";
}
int main() {
    std::thread t(sayHello);   // create and run thread
    t.join();                  // wait until thread finishes
    return 0;
}

Explanation: std::thread t(sayHello) runs sayHello on a new thread; t.join() blocks main until it completes. You pass sayHello without () because you pass the function pointer, not the return value of an immediate call. Without join/detach, thread destruction triggers std::terminate().

Output: one line: Hello from thread!

Passing a lambda

#include <thread>
#include <iostream>
int main() {
    std::thread t([] {
        std::cout << "Hello from lambda thread!\n";
    });
    t.join();
    return 0;
}

Explanation: passing a lambda runs it on the new thread—handy for small snippets. You still must call join() or detach().

Output: Hello from lambda thread!
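The third kind of callable, a functor (an object with operator()), works the same way. In this sketch the hypothetical Accumulator writes its result through a pointer, and the thread is created with braces, because std::thread t(Accumulator()); would hit the "most vexing parse" and declare a function named t instead of starting a thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical functor: sums 1..n and stores the result through out.
struct Accumulator {
    int n;
    long long* out;
    void operator()() const {
        long long sum = 0;
        for (int i = 1; i <= n; ++i) sum += i;
        *out = sum;
    }
};

long long runFunctor(int n) {
    long long result = 0;
    // Braces avoid the most vexing parse; the functor is copied into the thread.
    std::thread t{Accumulator{n, &result}};
    t.join();  // after join, reading result is safe
    return result;
}
```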

Passing arguments

#include <thread>
#include <iostream>
void printSum(int a, int b) {
    std::cout << "Sum: " << (a + b) << "\n";
}
int main() {
    std::thread t(printSum, 10, 20);  // pass 10 and 20
    t.join();
    return 0;
}

Explanation: arguments after the callable are copied by value into the new thread. For references, use std::ref(variable) and ensure the referred object outlives the thread.

Arguments are listed after the function name. Because they are copied, use std::ref() for references.

Output: Sum: 30

Passing references with std::ref

Use std::ref() or std::cref() for references. Important: the referred object must outlive the thread, or you get use-after-free.

#include <thread>
#include <iostream>
#include <functional>
void increment(int& value) {
    ++value;
}
int main() {
    int counter = 0;
    std::thread t(increment, std::ref(counter));  // pass by reference
    t.join();
    std::cout << "counter: " << counter << "\n";  // 1
    return 0;
}

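Explanation: std::ref(counter) lets the thread mutate counter in place. Without std::ref, std::thread t(increment, counter) does not even compile: std::thread stores its own copy of each argument and passes that copy as an rvalue, which cannot bind to the non-const int& parameter. (With a const int& parameter it would compile, but the reference would point at the thread's private copy.) Use std::cref() for const references.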

Output: counter: 1
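A way to see the copying directly, as a small sketch with made-up helper names: the thread function compares the address of its const int& parameter with the address of the caller's variable. Passed plainly, the parameter binds to the thread's own copy; passed through std::cref, it binds to the original.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical helper: reports whether v refers to the caller's variable.
void bindsTo(const int& v, const int* original, bool& same) {
    same = (&v == original);
}

// Hypothetical demo: std::thread stores copies of its arguments;
// only std::cref/std::ref make the parameter refer to the caller's object.
bool referencesOriginal(bool useCref) {
    int value = 7;
    bool same = false;
    std::thread t = useCref
        ? std::thread(bindsTo, std::cref(value), &value, std::ref(same))
        : std::thread(bindsTo, value, &value, std::ref(same));  // binds to a copy
    t.join();
    return same;
}
```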


3. join and detach: thread lifetime

Kitchen analogy: concurrency is like one cook switching between tasks (single thread). Parallelism is several cooks working on different dishes at once (multiple threads).

When a std::thread object is destroyed while still joinable, std::terminate() runs. You must join or detach.

join vs detach flow

sequenceDiagram
    participant M as Main thread
    participant T as Worker thread
    Note over M,T: When using join()
    M->>T: std::thread t(doWork);
    M->>M: call t.join()
    Note over M: blocking (wait)
    T->>T: run doWork()
    T->>M: exit
    M->>M: join returns, continue
    Note over M,T: When using detach()
    M->>T: std::thread t(doWork);
    M->>M: call t.detach()
    M->>M: continue immediately
    T->>T: runs independently in background

join(): wait until the thread finishes

std::thread t(doWork);
t.join();  // block until doWork() finishes
// from here on, t has completed

Explanation: join() blocks until the thread ends. After join() returns, thread-local state is gone. You cannot join the same thread twice.

  • join() blocks until completion.
  • Return values are not passed through join() (use std::async / std::promise later).
  • A second join() is an error, not a no-op: the thread is no longer joinable, so join() throws std::system_error.

detach(): run in the background

std::thread t(doWork);
t.detach();  // disown t; thread keeps running
// you no longer know when t finishes—lifetime of data it uses must be correct

Explanation: detach() severs the std::thread handle from the OS thread; the thread keeps running. Main does not wait—ensure anything the thread uses is not destroyed before the thread finishes.

  • After detach, you cannot join that thread.
  • Lifetime of objects the thread uses must extend appropriately—otherwise undefined behavior (e.g. use-after-free).
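One way to meet that lifetime requirement, sketched under the assumption that the caller also wants a result back: move everything the detached thread needs into the lambda's captures, and return a std::future so the caller can still tell when the work is done. startDetachedLength is a hypothetical name for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: the lambda owns its data (string and promise are moved
// into it), so nothing it touches can be destroyed out from under it.
std::future<std::size_t> startDetachedLength(std::string text) {
    std::promise<std::size_t> done;
    std::future<std::size_t> result = done.get_future();
    std::thread([s = std::move(text), p = std::move(done)]() mutable {
        p.set_value(s.size());  // signals completion to whoever holds the future
    }).detach();
    return result;
}
```

Because the lambda owns both the string and the promise, the function can return immediately. Keep in mind that detached threads still end when the process exits.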

joinable()

Calling join() a second time throws std::system_error because the thread is no longer joinable. Use joinable() to check first.

#include <thread>
#include <iostream>
void doWork() {
    std::cout << "Working...\n";
}
int main() {
    std::thread t(doWork);
    if (t.joinable()) {
        t.join();  // safe join
    }
    // t.join();  // ❌ would throw std::system_error: t is no longer joinable
    return 0;
}
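What a second join() actually does is easy to check. Mainstream standard library implementations report it by throwing std::system_error; this sketch, with the made-up name secondJoinThrows, catches the exception instead of letting it crash the program.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Hypothetical helper: returns true if a second join() throws std::system_error.
bool secondJoinThrows() {
    std::thread t([] {});
    t.join();               // first join: fine; afterwards t is not joinable
    assert(!t.joinable());
    try {
        t.join();           // second join on a non-joinable thread
    } catch (const std::system_error&) {
        return true;        // reported as an error, not silently ignored
    }
    return false;
}
```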

Explanation: joinable() is true until join or detach. Prefer RAII patterns for exception safety.

Practical guidance

  • Default to join when you need ordering or when data lives in the current scope.
  • Use detach sparingly—e.g. a process-lifetime log thread—and always verify data lifetime.

4. Managing threads safely with RAII

If an exception is thrown before join, destroying a still-joinable std::thread calls std::terminate(). RAII wrappers that join in the destructor are common.

ThreadGuard pattern

#include <thread>
#include <iostream>
#include <stdexcept>
class ThreadGuard {
public:
    explicit ThreadGuard(std::thread& t) : t_(t) {}
    ~ThreadGuard() {
        if (t_.joinable()) {
            t_.join();  // auto join on destruction
        }
    }
    // non-copyable
    ThreadGuard(const ThreadGuard&) = delete;
    ThreadGuard& operator=(const ThreadGuard&) = delete;
private:
    std::thread& t_;
};
void doWork() {
    std::cout << "Working...\n";
}
void mightThrow() {
    std::thread t(doWork);
    ThreadGuard guard(t);  // t.join() on guard destruction
    // throw std::runtime_error("oops");  // still joins on exception
}

Explanation: ThreadGuard holds a reference and joins in the destructor if still joinable—so std::terminate() is avoided even on exceptions. C++20 std::jthread standardizes this.
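An alternative worth knowing, shown here as a sketch with the hypothetical name ScopedThread: an owning guard that takes the std::thread by move, so no reference can outlive the thread object. The test function throws after starting a worker and confirms that the guard's destructor joined the worker during stack unwinding, before the catch block ran.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical owning variant of ThreadGuard: it holds the std::thread itself.
class ScopedThread {
public:
    explicit ScopedThread(std::thread t) : t_(std::move(t)) {}
    ~ScopedThread() {
        if (t_.joinable()) t_.join();  // joins even during stack unwinding
    }
    ScopedThread(const ScopedThread&) = delete;
    ScopedThread& operator=(const ScopedThread&) = delete;
private:
    std::thread t_;
};

// Hypothetical demo: returns whether the worker had finished by the time
// the exception reached the catch block.
bool workerFinishedBeforeCatch() {
    std::atomic<bool> finished{false};
    try {
        ScopedThread guard{std::thread{[&finished] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            finished = true;
        }}};
        throw std::runtime_error("oops");  // guard's destructor joins first
    } catch (const std::runtime_error&) {
        return finished.load();
    }
    return false;  // not reached
}
```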

std::jthread (C++20)

std::jthread (joining thread) joins automatically in its destructor.

#include <thread>
#include <iostream>
void doWork() {
    std::cout << "Working...\n";
}
int main() {
    std::jthread t(doWork);  // auto join on destruction
    // no manual t.join()
    return 0;
}

Explanation: std::jthread joins on destruction—good way to avoid forgetting join. Requires C++20 or newer.


5. What is thread safety?

Code is thread-safe if, when several threads access shared data, results stay correct and there are no memory errors.

Example that is not thread-safe

#include <thread>
#include <iostream>
int counter = 0;  // global
void increment() {
    for (int i = 0; i < 100000; ++i) {
        counter++;  // read-modify-write — data race if concurrent
    }
}
int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << "\n";  // may not be 200000 (data race)
    return 0;
}

counter++ is read → add → write. Two threads interleaving those steps can lose updates—a data race.

Explanation: concurrent counter++ is not atomic; one thread’s write can overwrite another’s read-modify-write. The sum can be less than 200000. Protect shared mutation with mutex or atomic.

See mutex and atomic for fixes.

Visualizing a data race

sequenceDiagram
    participant T1 as Thread 1
    participant T2 as Thread 2
    participant C as counter (memory)
    Note over C: initial: 0
    T1->>C: read (0)
    T2->>C: read (0)
    T1->>T1: 0+1=1
    T2->>T2: 0+1=1
    T1->>C: write (1)
    T2->>C: write (1)
    Note over C: expected 2, got 1 (lost update)

When it is safe

  • Read-only shared data (no writes—usually fine).
  • Threads use disjoint variables only.
  • APIs documented as thread-safe by the standard.

6. Mutex basics: protecting shared data

When several threads modify the same variable, use a mutex (mutual exclusion) so only one thread enters the critical section at a time. std::mutex blocks others until the lock is released. Details are in the next article; here is minimal usage.

std::mutex + lock_guard full example

// Paste and build: g++ -std=c++17 -pthread -o mutex_basic mutex_basic.cpp && ./mutex_basic
#include <thread>
#include <iostream>
#include <mutex>
int counter = 0;
std::mutex counter_mutex;
void safeIncrement() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counter_mutex);  // acquire
        ++counter;  // critical section: one thread at a time
    }  // unlock on destruction
}
int main() {
    std::thread t1(safeIncrement);
    std::thread t2(safeIncrement);
    t1.join();
    t2.join();
    std::cout << "counter: " << counter << "\n";  // always 200000
    return 0;
}

Explanation:

  • One std::mutex per shared resource.
  • std::lock_guard locks on construction and unlocks on destruction (RAII), even on exceptions.
  • ++counter is now serialized—no data race.

Output: always counter: 200000

Without vs with mutex

// ❌ No mutex: data race
void unsafeIncrement() {
    for (int i = 0; i < 100000; ++i) {
        ++counter;  // concurrent access → undefined results
    }
}
// ✅ Mutex: thread-safe
void safeIncrement() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }
}

lock_guard caveats

  • Minimize lock scope: holding a lock across I/O or heavy work hurts concurrency—protect only what you must.
  • Deadlock: locking two mutexes in different orders can deadlock—see the mutex article for ordering and std::lock().
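The first caveat can be sketched like this (slowSquare and squaresInParallel are hypothetical names): each thread does the expensive computation with no lock held and takes the mutex only for the push_back into the shared vector, so threads spend most of their time working in parallel rather than waiting on each other.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical "expensive" work, done outside any lock.
long long slowSquare(long long x) {
    long long r = 0;
    for (long long i = 0; i < x; ++i) r += x;  // deliberately slow x*x
    return r;
}

// Hypothetical demo: computes squares of 0..numThreads*perThread-1.
std::vector<long long> squaresInParallel(int numThreads, int perThread) {
    std::vector<long long> results;
    std::mutex resultsMutex;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&, t] {
            for (int k = 0; k < perThread; ++k) {
                long long x = t * perThread + k;
                long long value = slowSquare(x);   // heavy work: no lock held
                std::lock_guard<std::mutex> lock(resultsMutex);
                results.push_back(value);          // only the shared push is locked
            }
        });
    }
    for (auto& th : threads) th.join();
    std::sort(results.begin(), results.end());
    return results;
}
```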

7. condition_variable basics

A condition_variable lets threads wait until a predicate becomes true and wake others when state changes. Mutex alone does not express “sleep until the queue is non-empty” cleanly; cv avoids busy polling—foundation for work queues and producer–consumer. Details: next post.

Producer–consumer example

// Paste and build: g++ -std=c++20 -pthread -o cv_demo cv_demo.cpp && ./cv_demo
#include <thread>
#include <iostream>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <chrono>
std::queue<int> queue;
std::mutex mtx;
std::condition_variable cv;
bool done = false;
void producer() {
    for (int i = 0; i < 5; ++i) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            queue.push(i);
        }
        cv.notify_one();  // wake consumer
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
    }
    cv.notify_one();
}
void consumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return !queue.empty() || done; });  // wait until predicate
        while (!queue.empty()) {
            int val = queue.front();
            queue.pop();
            lock.unlock();
            std::cout << "Consumed: " << val << "\n";
            lock.lock();
        }
        if (done) break;
    }
}
int main() {
    std::thread p(producer);
    std::thread c(consumer);
    p.join();
    c.join();
    return 0;
}

Explanation: producer pushes and notify_one(); consumer uses wait(lock, predicate) so it sleeps until the queue has data or done. Always pass a predicate to handle spurious wakeups.

Output: Consumed: 0 through Consumed: 4 in order.


8. atomic basics: counter without a lock

For a single variable (counter, flag), std::atomic is often lighter than a mutex. Each operation on it is indivisible, and on mainstream platforms std::atomic<int> maps directly to hardware atomic instructions. Deep dive: atomic article.

std::atomic full example

// Paste and build: g++ -std=c++17 -pthread -o atomic_demo atomic_demo.cpp && ./atomic_demo
#include <thread>
#include <iostream>
#include <atomic>
std::atomic<int> counter{0};
void increment() {
    for (int i = 0; i < 100000; ++i) {
        counter++;  // atomic — no data race on counter
    }
}
int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << "counter: " << counter << "\n";  // always 200000
    return 0;
}

Explanation: counter++ on std::atomic<int> is atomic—no mutex contention for this pattern. Note: updating several related variables consistently still needs a mutex. Use atomic for single-variable read/write/increment/decrement.

Output: always counter: 200000

atomic vs mutex

| Situation | Prefer | Why |
| --- | --- | --- |
| Single variable (counter, flag) | std::atomic | Usually lock-free atomic ops, less contention |
| Multiple variables / complex invariants | std::mutex | Protects a whole critical section |
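For the flag row of the table, a minimal C++17 sketch with a made-up function name: a single std::atomic<bool> is enough for one thread to ask another to stop, with no mutex. This is the hand-rolled pattern that std::jthread and std::stop_token later standardized.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical demo: runs a worker for about runForMs, then stops it via a flag.
long long countUntilStopped(int runForMs) {
    std::atomic<bool> stop{false};
    std::atomic<long long> iterations{0};
    std::thread worker([&] {
        while (!stop.load()) {  // check the flag on every iteration
            iterations.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(runForMs));
    stop.store(true);           // request shutdown
    worker.join();              // returns promptly: the loop sees the flag
    return iterations.load();
}
```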

9. std::jthread and stop_token

std::jthread (C++20) joins on destruction and can pass a std::stop_token for cooperative shutdown—standardizing what people used to do by hand with std::thread + flags.

std::jthread + stop_token example

// Paste and build: g++ -std=c++20 -pthread -o jthread_demo jthread_demo.cpp && ./jthread_demo
#include <thread>
#include <iostream>
#include <chrono>
void worker(std::stop_token st) {
    while (!st.stop_requested()) {
        std::cout << "Working...\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }
    std::cout << "Stopped.\n";
}
int main() {
    std::jthread t(worker);  // stop_token injected
    std::this_thread::sleep_for(std::chrono::seconds(1));
    t.request_stop();  // request shutdown explicitly
    // optional here: ~jthread() calls request_stop() and then join()
    return 0;
}

Explanation: std::jthread passes std::stop_token to callables that accept it. Loop on st.stop_requested(); call request_stop() from outside. Destructor joins—reduces missing join bugs.

Output: several Working... lines, then Stopped.


10. Common mistakes and caveats

(1) Missing join/detach

void bad() {
    std::thread t(doWork);
}  // if still joinable → std::terminate()

Explanation: destroying t while joinable terminates the program. Always join or detach.

Fix:

// Example
void good() {
    std::thread t(doWork);
    t.join();  // or t.detach()
}

(2) join twice

std::thread t(doWork);
t.join();
t.join();  // ❌ throws std::system_error: t is no longer joinable

Explanation: after the first join, the thread is no longer joinable, so a second join() throws std::system_error. Check joinable() first.

Fix:

std::thread t(doWork);
t.join();
if (t.joinable()) {  // false — block skipped
    t.join();
}

Only one of join or detach per thread.

(3) Lambdas capturing references to locals

void bad() {
    int value = 42;
    std::thread t([&value]() {
        std::cout << value;  // value may be gone after bad() returns
    });
    t.detach();  // thread may outlive bad()
}

Explanation: [&value] with detach can run after bad() returns—use-after-free. Capture by value or join before returning.

Fix:

void good() {
    int value = 42;
    std::thread t([value]() {
        std::cout << value;  // safe copy
    });
    t.join();  // or ensure lifetime if using detach
}

(4) Unprotected shared writes

Multiple writers without synchronization → data race. See mutex.

(5) Parentheses on function pointer

std::thread t(sayHello());  // ❌ calls sayHello() now and passes its result; with a void return this does not compile
std::thread t(sayHello);   // ✅ pass the callable itself

Common issues and fixes

Issue: “terminate called without an active exception”

Cause: std::thread destroyed while still joinable.

Fix:

void bad() {
    std::thread t(doWork);
}  // terminate
void good() {
    std::thread t(doWork);
    t.join();  // or detach()
}

Issue: AddressSanitizer: heap-use-after-free

Cause: detached thread uses a destroyed local.

Fix: capture by value or join first.

void bad() {
    std::string msg = "hello";
    std::thread t([&msg]() { std::cout << msg; });
    t.detach();
}  // msg destroyed while thread may still run
void good() {
    std::string msg = "hello";
    std::thread t([msg]() { std::cout << msg; });  // copy
    t.detach();
}

Issue: counter smaller than expected

Cause: data race on a shared variable.

Fix: mutex or atomic.

Issue: double free / heap corruption

Cause: one thread frees memory that another thread is still using, often through a raw pointer shared between threads without a clear owner.

Fix: synchronize mutations; clarify ownership; consider single-owner designs.

Issue: too many threads → OOM

Cause: one thread per file → thousands of threads → huge stack usage.

Fix: thread pool; cap near std::thread::hardware_concurrency(); queue work.

Issue: condition_variable wait without predicate

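Cause: cv.wait(lock) alone can return on a spurious wakeup, and it can also sleep forever if the notification was sent before the thread started waiting.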

Fix: cv.wait(lock, [] { return condition; })

// ❌ risky
cv.wait(lock);
// ✅ safe
cv.wait(lock, [] { return !queue.empty() || done; });

Issue: re-locking std::mutex in same thread

Cause: std::mutex is not recursive—double lock deadlocks.

Fix: refactor locking; use std::recursive_mutex only when truly needed (often hides design issues).

Issue: main blocked forever on join

Cause: worker infinite loop or blocked I/O.

Fix: cooperative shutdown (stop_token, atomic flags), std::jthread::request_stop(), or wait_for where appropriate.
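For the wait_for option, here is a minimal sketch (popWithTimeout is a hypothetical helper): the predicate overload of condition_variable::wait_for returns false if the timeout expires while the queue is still empty, so a waiting thread can wake up periodically and check for shutdown instead of blocking forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical helper: waits up to `timeout` for an item instead of forever.
// The predicate also protects against spurious wakeups.
bool popWithTimeout(std::queue<int>& q, std::mutex& m, std::condition_variable& cv,
                    std::chrono::milliseconds timeout, int& out) {
    std::unique_lock<std::mutex> lock(m);
    if (!cv.wait_for(lock, timeout, [&q] { return !q.empty(); })) {
        return false;  // timed out: caller can check a stop flag, log, or retry
    }
    out = q.front();
    q.pop();
    return true;
}
```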


11. Performance: single vs multi-threaded

CPU-bound work

Splitting pure CPU work can use multiple cores, but thread create/join and scheduling have cost—too small tasks can go slower.

#include <thread>
#include <iostream>
#include <chrono>
#include <vector>
#include <numeric>
long long sumSingle(long long n) {
    long long result = 0;
    for (long long i = 0; i < n; ++i) {
        result += i;
    }
    return result;
}
long long sumMulti(long long n, int numThreads = 4) {
    std::vector<std::thread> threads;
    std::vector<long long> partialSums(numThreads, 0);
    long long chunk = n / numThreads;
    for (int i = 0; i < numThreads; ++i) {
        long long start = i * chunk;
        long long end = (i == numThreads - 1) ? n : (i + 1) * chunk;
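        threads.emplace_back([&partialSums, i, start, end]() {
            long long local = 0;  // accumulate locally to avoid false sharing on partialSums
            for (long long j = start; j < end; ++j) {
                local += j;
            }
            partialSums[i] = local;  // one write to this thread's own slot
        });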
    }
    for (auto& t : threads) {
        t.join();
    }
    return std::accumulate(partialSums.begin(), partialSums.end(), 0LL);
}
int main() {
    const long long n = 100'000'000;
    auto start = std::chrono::high_resolution_clock::now();
    auto r1 = sumSingle(n);
    auto end = std::chrono::high_resolution_clock::now();
    auto singleMs = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    start = std::chrono::high_resolution_clock::now();
    auto r2 = sumMulti(n);
    end = std::chrono::high_resolution_clock::now();
    auto multiMs = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Single: " << singleMs << " ms, result=" << r1 << "\n";
    std::cout << "Multi:  " << multiMs << " ms, result=" << r2 << "\n";
    return 0;
}

Explanation: sumSingle is sequential; sumMulti partitions ranges—can win on 4+ cores for large n. Here each thread writes only its partialSums[i]—no race on that array; more complex sharing needs mutex.

Typical (varies by machine): on a quad-core, multi might be ~2–3× faster for large n.

I/O-bound work

Image pipelines mixing disk I/O and CPU often speed up with parallel files.

| Mode | 10 files (illustrative) |
| --- | --- |
| Single thread, sequential | ~10 s |
| Multi-threaded, one thread per file | ~3–5 s |

Batch images: before / after

Before (single thread):

void processAll(const std::vector<std::string>& files,
                const std::string& outputDir) {
    for (size_t i = 0; i < files.size(); ++i) {
        Image img = loadImage(files[i]);
        img.resize(800, 600);
        std::string outPath = outputDir + "/" + std::to_string(i) + ".jpg";
        saveImage(img, outPath);
    }
}

After (multi-threaded):

void processAllParallel(const std::vector<std::string>& files,
                        const std::string& outputDir) {
    std::vector<std::thread> threads;
    threads.reserve(files.size());
    for (size_t i = 0; i < files.size(); ++i) {
        threads.emplace_back([&files, &outputDir, i]() {
            Image img = loadImage(files[i]);
            img.resize(800, 600);
            std::string outPath = outputDir + "/" + std::to_string(i) + ".jpg";
            saveImage(img, outPath);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
}

Explanation: in [&files, &outputDir, i], files and outputDir are captured by reference, which is safe because all threads are joined before processAllParallel returns; i is captured by value, so each thread gets its own index.

Caveat: if those functions mutate shared caches/globals, you can still race—prefer independent per-thread data or mutexes.

How many threads?

| Workload | Suggestion | Why |
| --- | --- | --- |
| CPU-bound | std::thread::hardware_concurrency() | Match the core count |
| I/O-heavy | ~2–4× cores | Overlap waits |
| Mixed | Measure | Tune with benchmarks |

#include <thread>
#include <iostream>
int main() {
    unsigned int n = std::thread::hardware_concurrency();
    std::cout << "CPU cores: " << n << "\n";
    return 0;
}

hardware_concurrency() may return 0 if unknown.
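A small guard for that case, as a sketch: workerCount is a hypothetical helper, and the fallback of 2 is an arbitrary assumption you would tune for your workload.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: hardware_concurrency() is only a hint and may be 0,
// so fall back to a fixed default before sizing a pool with it.
unsigned workerCount(unsigned fallback = 2) {
    unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : fallback;
}
```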


12. Best practices and production patterns

Best practices

  1. Prefer join when you need ordering or stack-scoped data used by the thread.
  2. Detach rarely—long-lived daemon threads (e.g. logging)—and verify lifetimes.
  3. Lifetime: detached threads must not capture dangling references—copy or extend lifetime.
  4. Shared writes: mutex + lock_guard, or atomic for single variables.
  5. Small critical sections: do not hold mutexes across I/O or heavy work.
  6. Exception safety: ThreadGuard or std::jthread.

Production patterns

Pattern 1: worker pool + queue

Use hardware_concurrency() workers and a condition_variable to wake on work—see the condition_variable article.
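A compact sketch of this pattern, assuming tasks are plain std::function<void()> callables. SimplePool is a hypothetical name and deliberately minimal: no futures for results, no exception handling inside tasks, and submit() must not be called once destruction has started.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: a mutex-guarded task queue plus a condition_variable
// that wakes workers only when a task arrives or the pool shuts down.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        if (n == 0) n = 1;  // hardware_concurrency() may report 0
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue, then exit
    }
    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task without holding the lock
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

The destructor sets the stop flag and joins every worker; each worker keeps draining the queue until it is empty, so tasks submitted before destruction still run.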

Pattern 2: dedicated log thread

Producers enqueue; one thread writes—often detached for process lifetime.
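A sketch of the dedicated log thread, with the hypothetical class name AsyncLogger. Unlike the detached variant mentioned above, this version joins its writer thread in the destructor, which guarantees that queued lines are written before the logger goes away.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>  // std::ostringstream, handy as a test sink
#include <string>
#include <thread>
#include <utility>

// Hypothetical logger: log() only enqueues; the single writer thread is the
// only one that touches the output stream.
class AsyncLogger {
public:
    explicit AsyncLogger(std::ostream& out) : out_(out), writer_([this] { run(); }) {}
    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            done_ = true;
        }
        cv_.notify_one();
        writer_.join();  // joined, not detached, so queued lines are flushed
    }
    AsyncLogger(const AsyncLogger&) = delete;
    AsyncLogger& operator=(const AsyncLogger&) = delete;

    void log(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            lines_.push(std::move(line));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mtx_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !lines_.empty(); });
            while (!lines_.empty()) {
                std::string line = std::move(lines_.front());
                lines_.pop();
                lock.unlock();  // write without holding the lock
                out_ << line << '\n';
                lock.lock();
            }
            if (done_) return;
        }
    }

    std::ostream& out_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::string> lines_;
    bool done_ = false;
    std::thread writer_;  // declared last: started after the other members exist
};
```

Passing std::cout gives ordinary console logging; passing a std::ostringstream makes the output easy to check in a test.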

Pattern 3: parallel batch with a cap

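#include <algorithm>
#include <functional>
#include <string>
#include <thread>
#include <vector>
void processFile(size_t index, const std::vector<std::string>& files);  // defined elsewhere
void processInBatches(const std::vector<std::string>& files) {
    // hardware_concurrency() may return 0; fall back to 1 so i += maxConcurrent cannot loop forever
    const size_t maxConcurrent = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    workers.reserve(maxConcurrent);
    for (size_t i = 0; i < files.size(); i += maxConcurrent) {
        workers.clear();  // every thread from the previous batch was joined below
        for (size_t j = i; j < std::min(i + maxConcurrent, files.size()); ++j)
            workers.emplace_back(processFile, j, std::cref(files));
        for (auto& t : workers) t.join();
    }
}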

Explanation: limits concurrent threads instead of spawning one per file.

Production tips

  1. Cap thread count—unbounded threads waste memory and time in context switches.
  2. Exceptions do not propagate from worker to main automatically—use std::promise/future or in-thread handling + shared error state.
  3. Debug data races with ThreadSanitizer: g++ -std=c++17 -pthread -fsanitize=thread -g -o myapp myapp.cpp
  4. native_handle() for OS-specific tuning when needed.
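For the second tip, a minimal sketch using std::async, one of the future-based options (parsePositive and runAndReport are hypothetical names): the future stores either the worker's return value or the exception it threw, and get() rethrows that exception in the calling thread.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical worker that fails on negative input.
int parsePositive(int x) {
    if (x < 0) throw std::invalid_argument("negative input");
    return x * 2;
}

// Hypothetical demo: std::launch::async runs the call on a new thread;
// the future carries back either the value or the exception.
std::string runAndReport(int x) {
    std::future<int> f = std::async(std::launch::async, parsePositive, x);
    try {
        return "ok " + std::to_string(f.get());  // get() rethrows the worker's exception
    } catch (const std::invalid_argument& e) {
        return std::string("error: ") + e.what();
    }
}
```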

13. Implementation checklist

  • After creating a thread, always join() or detach()
  • Call join/detach only once (a second call throws std::system_error)
  • If using detach(), guarantee lifetime of data the thread uses
  • Beware reference captures [&]: risk of use-after-free
  • Protect shared mutable state with mutex + lock_guard or atomic
  • Minimize lock scope (no I/O under lock unless required)
  • On exceptions, use RAII (ThreadGuard) or std::jthread
  • Pass function pointers without () (sayHello, not sayHello())

  • C++ mutex: race conditions and lock_guard
  • C++ atomic: thread-safe counters and memory_order
  • C++ practical series index

Keywords

C++ threads, std::thread, multithreading basics, join and detach, thread creation, concurrent programming.

Summary

  • Threads run inside a process and share heap and globals; stacks are per-thread.
  • std::thread runs functions, lambdas, or functors; arguments are copied or passed with std::ref.
  • Always join or detach; join is the safer default when lifetimes align.
  • Concurrent writes to shared data cause data races—use std::mutex + lock_guard, or std::atomic for single variables.
  • condition_variable enables event-driven waits instead of polling for queues.
  • RAII (ThreadGuard) or std::jthread avoids forgotten joins; stop_token helps cooperative shutdown.
  • In production: cap thread count, use pools, run ThreadSanitizer.

One-liner: create std::thread, always join or detach, protect shared data with mutex/atomic, use condition_variable for condition-based waiting. Next, read mutex.

Next article

Once you can launch threads, you need to share data safely.


FAQ

Q. When do I use this in production?

A. As a C++11 concurrency primer: std::thread, join/detach, process vs thread, thread safety, and common pitfalls—apply the examples and selection guidance from the body.

Q. What should I read first?

A. Follow previous post links in order, or open the C++ series index.

Q. How do I go deeper?

A. cppreference and library docs; see References below.

Q. std::thread vs std::async?

A. Use std::thread when you manage the thread directly. Use std::async when you want a future / deferred result. For “just run in the background,” std::thread is often enough.

Q. When do I need a thread pool?

A. Creating/destroying threads repeatedly is expensive; for steady workloads, reuse a pool. After this article, you can build pools with mutex and condition_variable.

Q. “undefined reference to pthread” when linking

A. On Linux, link with -pthread:

g++ -std=c++17 -pthread -o myapp myapp.cpp

Q. join or detach by default?

A. Default to join. Use detach only for long-lived daemon-style work (e.g. logging) and verify data lifetime.

Next: C++ guide #7-2: mutex and synchronization — races, mutex, lock_guard.

References


See also

  • C++ condition_variable patterns
  • C++ atomic (memory_order)
  • C++ mutex and races
  • C++ advanced threading: pools and work stealing
  • C++ stack vs heap
