C++ Execution Policies | Parallel and Vectorized STL (C++17)
Key points of this post
Guide to C++17 execution policies: parallel algorithms, safety rules, and practical examples.
What is an execution policy?
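An execution policy, introduced in C++17, is an extra first argument that tells a standard algorithm how it is allowed to run: sequentially, in parallel across threads, or vectorized. It is a permission rather than a guarantee, so an implementation may still fall back to sequential execution.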
#include <algorithm>
#include <execution>
#include <vector>
std::vector<int> v = {3, 1, 4, 1, 5};
// Sequential
std::sort(std::execution::seq, v.begin(), v.end());
// Parallel
std::sort(std::execution::par, v.begin(), v.end());
// Parallel + vectorization
std::sort(std::execution::par_unseq, v.begin(), v.end());
Policy kinds
#include <execution>
// sequenced_policy: sequential
std::execution::seq
// parallel_policy: parallel threads
std::execution::par
// parallel_unsequenced_policy: parallel + SIMD-style
std::execution::par_unseq
// unsequenced_policy (C++20): vectorized on the calling thread
std::execution::unseq
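Each policy object has its own distinct type, which is how the algorithm overloads tell them apart at compile time. The sketch below checks this with static_assert; the is_policy variable template is a hypothetical helper introduced only for illustration, while std::is_execution_policy_v is the standard trait from <execution>.

```cpp
#include <execution>
#include <type_traits>

// Hypothetical helper: true when Policy is a standard execution policy type.
template <typename Policy>
inline constexpr bool is_policy =
    std::is_execution_policy_v<std::decay_t<Policy>>;

// The policy objects are constants of three different types.
static_assert(std::is_same_v<std::decay_t<decltype(std::execution::seq)>,
                             std::execution::sequenced_policy>);
static_assert(std::is_same_v<std::decay_t<decltype(std::execution::par)>,
                             std::execution::parallel_policy>);
static_assert(std::is_same_v<std::decay_t<decltype(std::execution::par_unseq)>,
                             std::execution::parallel_unsequenced_policy>);

// The trait accepts policy types and rejects everything else.
static_assert(is_policy<decltype(std::execution::par)>);
static_assert(!is_policy<int>);
```

A trait like this can be used to constrain your own function templates that forward a policy to a standard algorithm.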
Practical examples
Example 1: Parallel sort benchmark
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <vector>

void benchmark() {
    std::vector<int> data(10000000);
    std::generate(data.begin(), data.end(), std::rand);

    // Sequential: sort a copy so both runs see the same input
    auto v1 = data;
    auto start1 = std::chrono::steady_clock::now();
    std::sort(std::execution::seq, v1.begin(), v1.end());
    auto end1 = std::chrono::steady_clock::now();

    // Parallel
    auto v2 = data;
    auto start2 = std::chrono::steady_clock::now();
    std::sort(std::execution::par, v2.begin(), v2.end());
    auto end2 = std::chrono::steady_clock::now();

    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    std::cout << "Sequential: " << time1.count() << "ms" << std::endl;
    std::cout << "Parallel: " << time2.count() << "ms" << std::endl;
}
Example 2: Parallel transform
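#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    // long long: squaring values up to 1,000,000 would overflow a 32-bit int
    std::vector<long long> v(1000000);
    std::iota(v.begin(), v.end(), 1);
    // Each element is squared independently, so there is no shared state
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
                   [](long long x) { return x * x; });
}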
Example 3: Parallel reduction
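#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(10000000, 1);
    // std::reduce may regroup and reorder the additions, which is fine for integer +
    int sum = std::reduce(std::execution::par, v.begin(), v.end(), 0);
    std::cout << "Sum: " << sum << std::endl;
}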
Example 4: Conditional parallel sort
#include <algorithm>
#include <execution>
#include <vector>

template<typename T>
void conditionalSort(std::vector<T>& v, bool parallel = true) {
    // Only go parallel when the input is large enough to amortize the overhead
    if (parallel && v.size() > 10000) {
        std::sort(std::execution::par, v.begin(), v.end());
    } else {
        std::sort(v.begin(), v.end());
    }
}
Choosing a policy
// seq: sequential (default-like)
// - Single thread
// - Predictable
// par: parallel
// - Multiple threads
// - Watch for data races
// par_unseq: parallel + unsequenced vectorization
// - SIMD + threads
// - Stricter: limited synchronization patterns
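Work that needs no synchronization at all is the natural fit for par_unseq. As a minimal sketch, the hypothetical helper sum_of_squares below lets std::transform_reduce combine the partial results itself, so the callable touches no shared state. On GCC with libstdc++, the parallel policies are typically backed by TBB, so linking with -ltbb may be required.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: sum of squares with no shared mutable state.
// Each element is transformed independently and the algorithm combines
// the partial results, so no locks or atomics are needed.
long long sum_of_squares(const std::vector<int>& v) {
    return std::transform_reduce(
        std::execution::par_unseq,
        v.begin(), v.end(),
        0LL,             // initial value; also makes the result a long long
        std::plus<>{},   // reduction step: associative and commutative
        [](int x) { return static_cast<long long>(x) * x; });  // per-element step
}
```

The reduction operation has to be associative and commutative here, because the algorithm is free to regroup and reorder the partial sums.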
Common problems
Problem 1: Data races
int counter = 0;
std::vector<int> v(1000);
// ❌ Data race: unsynchronized writes to a shared int
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++counter; // race
});
// ✅ std::atomic (requires <atomic>)
std::atomic<int> atomic_counter{0};
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++atomic_counter;
});
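An atomic counter removes the race, but every increment still contends on one variable. When the goal is only to count or accumulate, an algorithm that returns the result directly avoids shared state altogether. A minimal sketch, where count_even is a hypothetical helper:

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <vector>

// Hypothetical helper: counts even values without any shared counter.
// std::count_if aggregates the per-chunk counts internally.
std::ptrdiff_t count_even(const std::vector<int>& v) {
    return std::count_if(std::execution::par, v.begin(), v.end(),
                         [](int x) { return x % 2 == 0; });
}
```

The predicate only reads its own argument, so it is safe to call from several threads at once.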
Problem 2: Synchronization with par_unseq
std::mutex mtx;
// ❌ Locking a mutex under par_unseq: undefined behavior (it can deadlock)
std::for_each(std::execution::par_unseq, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});
// ✅ A mutex is allowed under par, but it serializes the critical section
std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});
Problem 3: Overhead
std::vector<int> small(100);
// ❌ Parallel on tiny input
std::sort(std::execution::par, small.begin(), small.end());
// overhead can exceed benefit
// ✅ Parallel for large inputs
std::vector<int> large(10000000);
std::sort(std::execution::par, large.begin(), large.end());
Problem 4: Exceptions
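try {
    std::for_each(std::execution::par, v.begin(), v.end(), [](int x) {
        if (x < 0) {
            // An exception that escapes the callable under seq, par, or
            // par_unseq makes the algorithm call std::terminate
            throw std::runtime_error("negative");
        }
    });
} catch (...) {
    // Not reached for the throw above; only failures raised by the
    // algorithm itself (for example std::bad_alloc) can be caught here
}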
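Because an exception that escapes the callable ends in std::terminate under the standard policies, one option is to catch inside the callable and report failure through a flag. The following is a sketch under that assumption; validate and all_valid are hypothetical names, not part of any library.

```cpp
#include <algorithm>
#include <atomic>
#include <execution>
#include <stdexcept>
#include <vector>

// Hypothetical validation step that throws on bad input.
inline void validate(int x) {
    if (x < 0) throw std::runtime_error("negative");
}

// Returns true when every element passed validation.
// The try/catch lives inside the callable, so nothing thrown by
// validate() escapes into the parallel algorithm.
bool all_valid(const std::vector<int>& v) {
    std::atomic<bool> failed{false};
    std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
        try {
            validate(x);
        } catch (const std::exception&) {
            failed.store(true);  // record the failure instead of propagating
        }
    });
    return !failed.load();
}
```

This loses the exception message; if the details matter, they could be stored in a container guarded by a mutex, which is permitted under par but not under par_unseq.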
Supported algorithms
// Most parallelizable STL algorithms
std::sort(policy, begin, end)
std::transform(policy, begin, end, out, func)
std::for_each(policy, begin, end, func)
std::reduce(policy, begin, end, init)
std::find(policy, begin, end, value)
// ...
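The policy overloads keep the same result as their sequential counterparts; std::find, for instance, still returns the first match even when run in parallel. A small sketch, with index_of as a hypothetical wrapper:

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <optional>
#include <vector>

// Hypothetical helper: index of the first occurrence of value, if any.
std::optional<std::size_t> index_of(const std::vector<int>& v, int value) {
    auto it = std::find(std::execution::par, v.begin(), v.end(), value);
    if (it == v.end()) {
        return std::nullopt;  // value is not present
    }
    return static_cast<std::size_t>(it - v.begin());
}
```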
FAQ
Q1: Execution policy?
A: Chooses how the algorithm executes (C++17).
Q2: Kinds?
A: seq, par, and par_unseq in C++17; unseq was added in C++20.
Q3: When is parallel worth it?
A:
- Large data
- Independent work
- No data races
Q4: Synchronization?
A: par_unseq forbids mutexes and other blocking synchronization. par permits them, but locks serialize the work and avoiding data races is still your responsibility.
Q5: Performance?
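A: Gains show up mostly on large inputs with independent per-element work. On small inputs the scheduling overhead can make the parallel version slower, so measure before committing.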
Q6: Learning resources?
A:
- “C++17 The Complete Guide”
- “C++ Concurrency in Action”
- cppreference.com
Related reading (internal links)
- C++ parallel algorithms
- C++ path handling
- C++ policy-based design
Practical tips
Tips you can apply at work.
Debugging
- When something breaks, check compiler warnings first
- Reproduce with a small test case
Performance
- Do not optimize without profiling
- Define measurable targets first
Code review
- Pre-check areas that often get flagged in review
- Follow team conventions
Production checklist
Things to verify when applying this idea in practice.
Before coding
- Is this technique the best fit for the problem?
- Can teammates understand and maintain it?
- Does it meet performance requirements?
While coding
- Are all compiler warnings addressed?
- Are edge cases considered?
- Is error handling appropriate?
At review
- Is intent clear?
- Are tests sufficient?
- Is it documented?
Use this checklist to reduce mistakes and improve quality.