C++ Execution Policies | Parallel and Vectorized STL (C++17)

Key points

Guide to C++17 execution policies: parallel algorithms, safety rules, and practical examples.

What is an execution policy?

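An execution policy is an optional first argument, introduced in C++17, that tells a standard algorithm how it may run: sequentially, in parallel across threads, or in parallel with vectorization.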

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};

    // Sequential
    std::sort(std::execution::seq, v.begin(), v.end());

    // Parallel
    std::sort(std::execution::par, v.begin(), v.end());

    // Parallel + vectorization
    std::sort(std::execution::par_unseq, v.begin(), v.end());
}

Policy kinds

#include <execution>

// sequenced_policy: sequential
std::execution::seq

// parallel_policy: parallel threads
std::execution::par

// parallel_unsequenced_policy: parallel + SIMD-style
std::execution::par_unseq

// unsequenced_policy (C++20): vectorization on the calling thread
std::execution::unseq
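
Each policy object has its own type, and the algorithm overload is chosen at compile time from that type. The sketch below illustrates this with the standard trait std::is_execution_policy_v. The helper policy_name is a hypothetical name introduced only for this illustration.

```cpp
#include <execution>
#include <string_view>
#include <type_traits>

// The standard trait tells policy types apart from everything else.
static_assert(std::is_execution_policy_v<std::execution::parallel_policy>);
static_assert(!std::is_execution_policy_v<int>);

// Hypothetical helper (not part of the standard library): maps a policy
// type to a readable name, resolved entirely at compile time.
template <class Policy>
constexpr std::string_view policy_name() {
    using P = std::decay_t<Policy>;  // drop const and references
    static_assert(std::is_execution_policy_v<P>, "not an execution policy");
    if constexpr (std::is_same_v<P, std::execution::sequenced_policy>) {
        return "seq";
    } else if constexpr (std::is_same_v<P, std::execution::parallel_policy>) {
        return "par";
    } else if constexpr (std::is_same_v<P, std::execution::parallel_unsequenced_policy>) {
        return "par_unseq";
    } else {
        return "other";  // e.g. unsequenced_policy in C++20
    }
}
```

For example, policy_name<decltype(std::execution::par)>() yields "par". Because the choice depends on the type, a policy cannot be switched through a single runtime variable. Code that decides at run time needs a branch for each policy.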

Practical examples

Example 1: Parallel sort benchmark

#include <algorithm>
#include <chrono>
#include <cstdlib>    // std::rand
#include <execution>
#include <iostream>   // std::cout
#include <vector>

void benchmark() {
    std::vector<int> data(10000000);
    std::generate(data.begin(), data.end(), std::rand);
    
    // Sequential
    auto v1 = data;
    auto start1 = std::chrono::steady_clock::now();
    std::sort(std::execution::seq, v1.begin(), v1.end());
    auto end1 = std::chrono::steady_clock::now();
    
    // Parallel
    auto v2 = data;
    auto start2 = std::chrono::steady_clock::now();
    std::sort(std::execution::par, v2.begin(), v2.end());
    auto end2 = std::chrono::steady_clock::now();
    
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    
    std::cout << "Sequential: " << time1.count() << "ms" << std::endl;
    std::cout << "Parallel: " << time2.count() << "ms" << std::endl;
}

Example 2: Parallel transform

#include <algorithm>
#include <execution>
#include <numeric>    // std::iota
#include <vector>

int main() {
    std::vector<int> v(1000000);
    std::iota(v.begin(), v.end(), 1);
    
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
        [](int x) { return x * x; });
}

Example 3: Parallel reduction

#include <execution>
#include <iostream>   // std::cout
#include <numeric>    // std::reduce
#include <vector>

int main() {
    std::vector<int> v(10000000, 1);
    
    int sum = std::reduce(std::execution::par, v.begin(), v.end(), 0);
    
    std::cout << "Sum: " << sum << std::endl;
}

Example 4: Conditional parallel sort

#include <algorithm>
#include <execution>
#include <vector>

template<typename T>
void conditionalSort(std::vector<T>& v, bool parallel = true) {
    if (parallel && v.size() > 10000) {
        std::sort(std::execution::par, v.begin(), v.end());
    } else {
        std::sort(v.begin(), v.end());
    }
}

Choosing a policy

// seq: sequential, on the calling thread
// - Single thread
// - Predictable

// par: parallel
// - Multiple threads
// - Watch for data races

// par_unseq: parallel + unsequenced vectorization
// - SIMD + threads
// - Stricter: the callable must not lock mutexes or use other blocking synchronization

Common problems

Problem 1: Data races

std::vector<int> v(1000);

// ❌ Data race: unsynchronized writes to a shared int
int counter = 0;
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++counter;  // race
});

// ✅ std::atomic (from <atomic>) makes each increment safe
std::atomic<int> safeCounter{0};
std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
    ++safeCounter;
});
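
When the shared variable is only a tally, the data race can often be avoided altogether by letting the algorithm produce the result. The sketch below uses std::count_if for this. The helper count_even and its predicate are hypothetical, chosen only to illustrate the pattern. The policy is a template parameter so the caller decides how the work runs.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper: counts even values without shared mutable state.
// The predicate only reads its argument, so there is nothing to race on.
template <class Policy>
std::ptrdiff_t count_even(Policy&& policy, const std::vector<int>& v) {
    return std::count_if(std::forward<Policy>(policy), v.begin(), v.end(),
                         [](int x) { return x % 2 == 0; });
}
```

A call such as count_even(std::execution::par, v) returns the tally directly. The algorithm combines partial results internally, which typically causes less contention than incrementing one atomic from every thread.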

Problem 2: Synchronization with par_unseq

std::mutex mtx;

// ❌ Mutex with par_unseq — undefined behavior
std::for_each(std::execution::par_unseq, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});

// ✅ Mutex with par is allowed, but the locked region runs one thread at a time
std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
    std::lock_guard lock{mtx};
    // ...
});
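
A lock is often there only to protect a shared output container. If each input element writes to its own pre-allocated output slot, no lock is needed, and the callable stays within what par_unseq permits. The following is a minimal sketch under that assumption. The helper scaled and its factor parameter are hypothetical.

```cpp
#include <algorithm>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical helper: every element writes to its own slot in `out`,
// so the callable needs no mutex and touches no shared mutable state.
template <class Policy>
std::vector<int> scaled(Policy&& policy, const std::vector<int>& in, int factor) {
    std::vector<int> out(in.size());  // allocate once, before the algorithm runs
    std::transform(std::forward<Policy>(policy), in.begin(), in.end(), out.begin(),
                   [factor](int x) { return x * factor; });
    return out;
}
```

The output is sized before the algorithm starts, so the callable itself performs no allocation or synchronization.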

Problem 3: Overhead

std::vector<int> small(100);

// ❌ Parallel on tiny input
std::sort(std::execution::par, small.begin(), small.end());
// overhead can exceed benefit

// ✅ Parallel for large inputs
std::vector<int> large(10000000);
std::sort(std::execution::par, large.begin(), large.end());

Problem 4: Exceptions

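try {
    std::for_each(std::execution::par, v.begin(), v.end(), [](int x) {
        if (x < 0) {
            // With a standard policy, an exception that escapes the callable
            // calls std::terminate; it never reaches the catch block below.
            throw std::runtime_error("negative");
        }
    });
} catch (const std::bad_alloc&) {
    // The algorithm itself may throw std::bad_alloc if it cannot allocate
    // the temporary resources it needs for parallel execution.
}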
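
With the standard policies, an exception that escapes the callable ends in std::terminate. One workable pattern is therefore to keep the callable non-throwing and raise the error afterwards on the calling thread. The sketch below takes that approach. The helpers has_negative and require_non_negative are hypothetical names.

```cpp
#include <algorithm>
#include <execution>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: the predicate is noexcept, so nothing can escape
// from inside the algorithm.
template <class Policy>
bool has_negative(Policy&& policy, const std::vector<int>& v) {
    return std::any_of(std::forward<Policy>(policy), v.begin(), v.end(),
                       [](int x) noexcept { return x < 0; });
}

// Hypothetical helper: reports the error on the calling thread, after the
// algorithm has returned, where an ordinary try/catch can handle it.
template <class Policy>
void require_non_negative(Policy&& policy, const std::vector<int>& v) {
    if (has_negative(std::forward<Policy>(policy), v)) {
        throw std::runtime_error("negative");
    }
}
```

A caller can wrap require_non_negative(std::execution::par, v) in a regular try block. The throw happens outside the algorithm, so the usual exception rules apply.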

Supported algorithms

// Most parallelizable STL algorithms
std::sort(policy, begin, end)
std::transform(policy, begin, end, out, func)
std::for_each(policy, begin, end, func)
std::reduce(policy, begin, end, init)
std::find(policy, begin, end, value)
// ...
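
std::transform_reduce from <numeric> also accepts a policy. It fuses a per-element transformation with a reduction, which avoids a temporary container. The sketch below computes a sum of squares. The helper sum_of_squares is a hypothetical name, and the long long accumulator is a choice made here to reduce the risk of overflow.

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: squares each element and sums the results in one pass.
// std::plus<> is associative and commutative on integers, which the
// algorithm requires because it may regroup and reorder the operands.
template <class Policy>
long long sum_of_squares(Policy&& policy, const std::vector<int>& v) {
    return std::transform_reduce(
        std::forward<Policy>(policy), v.begin(), v.end(),
        0LL,                                                  // initial value
        std::plus<>{},                                        // reduction
        [](int x) { return static_cast<long long>(x) * x; }); // transformation
}
```

For the input {1, 2, 3} this gives 14 under any policy, because integer addition does not depend on the grouping.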

FAQ

Q1: Execution policy?

A: Chooses how the algorithm executes (C++17).

Q2: Kinds?

A: seq, par, and par_unseq in C++17; C++20 adds unseq.

Q3: When is parallel worth it?

A:

  • Large data
  • Independent work
  • No data races

Q4: Synchronization?

A: par_unseq forbids locking a mutex inside the callable. par allows it, but you must still protect any shared state yourself.

Q5: Performance?

A: Helps most on large, parallel-friendly workloads.

Q6: Learning resources?

A:

  • “C++17 The Complete Guide”
  • “C++ Concurrency in Action”
  • cppreference.com


Practical tips

Tips you can apply at work.

Debugging

  • When something breaks, check compiler warnings first
  • Reproduce with a small test case

Performance

  • Do not optimize without profiling
  • Define measurable targets first

Code review

  • Pre-check areas that often get flagged in review
  • Follow team conventions

Production checklist

Things to verify when applying this idea in practice.

Before coding

  • Is this technique the best fit for the problem?
  • Can teammates understand and maintain it?
  • Does it meet performance requirements?

While coding

  • Are all compiler warnings addressed?
  • Are edge cases considered?
  • Is error handling appropriate?

At review

  • Is intent clear?
  • Are tests sufficient?
  • Is it documented?

Use this checklist to reduce mistakes and improve quality.

