What is the difference between `par` and `par_unseq`?

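`par` lets the algorithm run element functions on multiple threads. `par_unseq` also lets them be **vectorized and interleaved** within a single thread, so it has stricter requirements. The element functions must not take locks, allocate or free memory, or make any other vectorization-unsafe call. Data races are undefined behavior under both policies.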
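Here is a minimal sketch of an element function that follows the `par_unseq` rules. The helper name `scale_in_place` is hypothetical. Each call reads and writes only its own element, takes no lock, and allocates nothing.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical helper: every invocation touches only the element it is given,
// so calls may run on several threads and be interleaved within one thread.
void scale_in_place(std::vector<int>& v, int factor) {
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [factor](int& x) { x *= factor; });  // no locks, no allocation, no shared writes
}
```

If the lambda pushed into a shared container or locked a mutex instead, it would no longer be valid under `par_unseq`.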

Will slapping `par` on every algorithm make it faster?

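Not necessarily. Small inputs, memory-bandwidth limits, and synchronization overhead can make it **slower**, so profile first. Results can also change. Algorithms such as `std::reduce` may regroup and reorder operations, so a non-associative or non-commutative operation can give a different result from the sequential version. Floating-point addition counts here because it is not exactly associative.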
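Below is a small sketch of the ordering point. It assumes a vector of `double`s, and the helper names are hypothetical. `std::accumulate` is specified to add strictly left to right. `std::reduce` may group the additions any way it likes, so the two sums can differ slightly even though both are valid.

```cpp
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: std::accumulate adds strictly left to right.
double sum_in_order(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Hypothetical helper: std::reduce may regroup and reorder the additions
// (here possibly across threads), so the last bits can differ from sum_in_order.
double sum_any_order(const std::vector<double>& v) {
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
}
```

For `int` with plain addition, both functions would agree exactly. With an operation like subtraction, which is neither associative nor commutative, `std::reduce`'s result is non-deterministic.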

What happens to exceptions in parallel algorithms?

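With the standard policies (`seq`, `par`, `par_unseq`, and C++20's `unseq`), an element function that exits via an uncaught exception causes `std::terminate` to be called. The exception is **not** propagated to the caller. The algorithm itself may throw `std::bad_alloc` if it cannot obtain temporary memory. Implementation-defined policies can behave differently, so check your library's documentation if you use one.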

Why pass `seq` explicitly?

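To request the sequential version explicitly in environments where parallel execution must be **forbidden** or made obvious at the call site, such as real-time code or debugging. Note that `seq` is not identical to calling the algorithm without a policy: an exception escaping the element function still calls `std::terminate`.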
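One way to keep that choice in a single place is to pass the policy in as a template parameter. This is a minimal sketch with a hypothetical `sort_with` helper.

```cpp
#include <algorithm>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper: the caller chooses the policy, so a debug or real-time
// build can pass std::execution::seq without touching the algorithm code.
template <typename Policy>
void sort_with(Policy&& policy, std::vector<int>& v) {
    std::sort(std::forward<Policy>(policy), v.begin(), v.end());
}
```

Each policy has its own type, so the choice is made at compile time. A runtime switch needs an ordinary `if` that calls `sort_with` with different policies.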

C++ Execution Policies | Parallel and Vectorized STL (C++17)

March 12, 2026 · 10 min read · Updated March 12, 2026 · Advanced Tutorial

Key points

Guide to C++17 execution policies: parallel algorithms, safety rules, and practical examples.

What is an execution policy?

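An execution policy (C++17, header `<execution>`) is an optional first argument to many standard algorithms. It tells the library whether the algorithm may run sequentially, in parallel on multiple threads, and/or vectorized.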

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};

    // Sequential
    std::sort(std::execution::seq, v.begin(), v.end());

    // Parallel
    std::sort(std::execution::par, v.begin(), v.end());

    // Parallel + vectorization
    std::sort(std::execution::par_unseq, v.begin(), v.end());
}

Policy kinds

#include <execution>

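// std::execution::sequenced_policy: runs in the calling thread, in order
std::execution::seq

// std::execution::parallel_policy: may use multiple threads
std::execution::par

// std::execution::parallel_unsequenced_policy: multiple threads + vectorization
std::execution::par_unseq

// std::execution::unsequenced_policy (C++20): vectorization in the calling thread
std::execution::unseq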
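The standard also provides a trait, `std::is_execution_policy_v` in `<execution>`, that is true only for policy types. The policy overloads of the algorithms take part in overload resolution only when it holds. The compile-time checks below illustrate it. They stay within C++17, so `unseq` is left out.

```cpp
#include <execution>
#include <type_traits>

// The three C++17 policy types satisfy the trait; ordinary types do not.
static_assert(std::is_execution_policy_v<std::execution::sequenced_policy>);
static_assert(std::is_execution_policy_v<std::execution::parallel_policy>);
static_assert(std::is_execution_policy_v<std::execution::parallel_unsequenced_policy>);
static_assert(!std::is_execution_policy_v<int>);

// Each policy object has its own distinct type.
static_assert(std::is_same_v<std::decay_t<decltype(std::execution::par)>,
                             std::execution::parallel_policy>);
```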

Practical examples

Example 1: Parallel sort benchmark

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <vector>

// With GCC's libstdc++, real parallelism typically requires Intel TBB (link with -ltbb)
void benchmark() {
    std::vector<int> data(10000000);
    std::generate(data.begin(), data.end(), std::rand);
    
    // Sequential
    auto v1 = data;
    auto start1 = std::chrono::steady_clock::now();
    std::sort(std::execution::seq, v1.begin(), v1.end());
    auto end1 = std::chrono::steady_clock::now();
    
    // Parallel
    auto v2 = data;
    auto start2 = std::chrono::steady_clock::now();
    std::sort(std::execution::par, v2.begin(), v2.end());
    auto end2 = std::chrono::steady_clock::now();
    
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    
    // A single run is noisy; repeat and compare medians before drawing conclusions
    std::cout << "Sequential: " << time1.count() << "ms\n";
    std::cout << "Parallel: " << time2.count() << "ms\n";
}

Example 2: Parallel transform

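#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    // long long: squares of values up to 1,000,000 overflow a 32-bit int
    std::vector<long long> v(1000000);
    std::iota(v.begin(), v.end(), 1LL);  // 1, 2, ..., 1000000
    
    // Square every element in place; each call writes only its own element
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
        [](long long x) { return x * x; });
}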

Example 3: Parallel reduction

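#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(10000000, 1);
    
    // Unlike std::accumulate, std::reduce accepts a policy and may add in any order
    int sum = std::reduce(std::execution::par, v.begin(), v.end(), 0);
    
    std::cout << "Sum: " << sum << std::endl;
}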
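`std::reduce` has a sibling, `std::transform_reduce`, that applies a function to each element (or pair of elements) before reducing. Below is a sketch of a parallel dot product. `dot` is a hypothetical helper name, and it assumes `b` has at least as many elements as `a`.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: multiply pairs element-wise, then sum the products,
// in one pass that may be split across threads. Assumes b.size() >= a.size().
long long dot(const std::vector<int>& a, const std::vector<int>& b) {
    return std::transform_reduce(
        std::execution::par, a.begin(), a.end(), b.begin(), 0LL,
        std::plus<>{},                                                        // reduce step
        [](int x, int y) { return static_cast<long long>(x) * y; });          // transform step
}
```

Widening to `long long` before multiplying avoids overflowing `int` on large products.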

Example 4: Conditional parallel sort

#include <algorithm>
#include <execution>
#include <vector>

template<typename T>
void conditionalSort(std::vector<T>& v, bool parallel = true) {
    // 10000 is a rough cutoff; tune it by measuring on your data and hardware
    if (parallel && v.size() > 10000) {
        std::sort(std::execution::par, v.begin(), v.end());
    } else {
        std::sort(v.begin(), v.end());
    }
}

Choosing a policy

// seq: sequential
// - Single thread, in order
// - Predictable (but, unlike the no-policy overload, an escaping exception calls std::terminate)

// par: parallel
// - Multiple threads
// - Watch for data races

// par_unseq: parallel + unsequenced vectorization
// - SIMD + threads
// - Stricter: no locks, no memory allocation, no other vectorization-unsafe calls

Common problems

Problem 1: Data races

std::vector<int> v(1000);

// ❌ Data race: many threads modify the same int
int counter = 0;
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++counter;  // race
});

// ✅ std::atomic makes each increment indivisible (needs <atomic>)
std::atomic<int> safe_counter{0};
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++safe_counter;
});
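An atomic counter is correct, but every increment still contends on one shared variable. When the goal is only to count, the algorithm can do the work, and user code then holds no shared state. Below is a sketch with a hypothetical `count_even` helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <vector>

// Hypothetical helper: std::count_if with a policy does the counting itself,
// so the predicate only reads its own element and shares nothing.
std::ptrdiff_t count_even(const std::vector<int>& v) {
    return std::count_if(std::execution::par, v.begin(), v.end(),
                         [](int x) { return x % 2 == 0; });
}
```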

Problem 2: Synchronization with `par_unseq`

std::mutex mtx;

// ❌ Mutex with par_unseq — undefined behavior
std::for_each(std::execution::par_unseq, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});

// ✅ A mutex with par is allowed, but every lock serializes the work it protects
std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});
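Often the mutex only protects a shared output container. A common alternative is to size the output in advance so that each element writes its own slot. With no shared writes and no locks, `par_unseq` is allowed. Below is a sketch with a hypothetical `squares` helper.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical helper: instead of appending to a shared vector under a mutex,
// preallocate the output; element i of `in` writes only element i of `out`.
std::vector<int> squares(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::transform(std::execution::par_unseq, in.begin(), in.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}
```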

Problem 3: Overhead

std::vector<int> small(100);

// ❌ Parallel on tiny input
std::sort(std::execution::par, small.begin(), small.end());
// scheduling and synchronization overhead can exceed the benefit

// ✅ Parallel for large inputs
std::vector<int> large(10000000);
std::sort(std::execution::par, large.begin(), large.end());

Problem 4: Exceptions

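// ❌ Under the standard policies, an exception that escapes the lambda calls
//    std::terminate; it never reaches the catch block
try {
    std::for_each(std::execution::par, v.begin(), v.end(), [](int x) {
        if (x < 0) {
            throw std::runtime_error("negative");
        }
    });
} catch (const std::bad_alloc&) {
    // The algorithm itself may throw std::bad_alloc if it cannot get temporary memory
}

// ✅ Record the problem inside the lambda, then throw from the calling thread
std::atomic<bool> found_negative{false};
std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
    if (x < 0) found_negative = true;
});
if (found_negative) throw std::runtime_error("negative");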
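Under the standard policies, an exception escaping an element function calls `std::terminate`. For pure validation, another option is to let a parallel `std::any_of` do the search and throw afterwards on the calling thread, where an ordinary `try`/`catch` sees the exception. Below is a sketch with a hypothetical `require_non_negative` helper.

```cpp
#include <algorithm>
#include <execution>
#include <stdexcept>
#include <vector>

// Hypothetical helper: the parallel search itself never throws; the exception
// is raised afterwards, on the calling thread, where it can be caught normally.
void require_non_negative(const std::vector<int>& v) {
    bool has_negative = std::any_of(std::execution::par, v.begin(), v.end(),
                                    [](int x) { return x < 0; });
    if (has_negative) {
        throw std::invalid_argument("negative value");
    }
}
```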

Supported algorithms

// A few of the algorithms with execution-policy overloads (C++17)
std::sort(policy, begin, end)
std::transform(policy, begin, end, out, func)
std::for_each(policy, begin, end, func)
std::reduce(policy, begin, end, init)
std::find(policy, begin, end, value)
// ... (std::accumulate has no policy overload; use std::reduce instead)
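The parallel overloads return the same result as the sequential ones. For example, a parallel `std::find` still yields the first match, not whichever thread happens to find one first. Below is a sketch with a hypothetical `index_of` helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the index of the FIRST element equal to value,
// or -1 if there is none; the search work may be split across threads.
std::ptrdiff_t index_of(const std::vector<int>& v, int value) {
    auto it = std::find(std::execution::par, v.begin(), v.end(), value);
    return it == v.end() ? -1 : std::distance(v.begin(), it);
}
```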

FAQ

Q1: Execution policy?

A: Chooses how the algorithm executes (C++17).

Q2: Kinds?

A: seq, par, par_unseq, plus unseq in C++20.

Q3: When is parallel worth it?

Large data
Independent work
No data races

Q4: Synchronization?

A: par_unseq forbids mutexes and other vectorization-unsafe calls in element functions; par allows them, although locks serialize the work they protect.

Q5: Performance?

A: Helps most on large, parallel-friendly workloads.

Q6: Learning resources?

“C++17 The Complete Guide”
“C++ Concurrency in Action”
cppreference.com


Practical tips

Tips you can apply at work.

Debugging

When something breaks, check compiler warnings first
Reproduce with a small test case

Performance

Do not optimize without profiling
Define measurable targets first

Code review

Check in advance the areas reviewers commonly flag
Follow team conventions

Production checklist

Things to verify when applying this idea in practice.

Before coding

Is this technique the best fit for the problem?
Can teammates understand and maintain it?
Does it meet performance requirements?

While coding

Are all compiler warnings addressed?
Are edge cases considered?
Is error handling appropriate?

At review

Is intent clear?
Are tests sufficient?
Is it documented?

Use this checklist to reduce mistakes and improve quality.
