C++ Benchmarking | A Guide to Benchmarking
Key Takeaways
C++ microbenchmarking with chrono and Google Benchmark: warmup, iterations, variance, and measuring real optimizations—not noise.
What is Benchmarking?
Benchmarking measures how long code takes to run, much like timing it with a stopwatch; in C++ this typically means taking chrono timestamps and converting the difference to a convenient time unit. By running the same benchmark before and after a performance optimization, you can quantify the improvement rather than guess at it.
Benchmarking Process
graph LR
A[Write Code] --> B[Warmup]
B --> C[Start Measure]
C --> D[Repeat Run]
D --> E[End Measure]
E --> F[Calc Stats]
F --> G{Goal Met?}
G -->|No| H[Optimize]
H --> A
G -->|Yes| I[Done]
#include <chrono>
#include <iostream>

auto start = std::chrono::high_resolution_clock::now();
// Code to measure
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
    end - start
);
std::cout << "Time: " << duration.count() << "μs" << std::endl;
Basic Measurement
#include <chrono>
#include <iostream>
#include <vector>

// Run f() `iterations` times and return the average time per call in microseconds.
template<typename Func>
auto benchmark(Func f, int iterations = 1000) {
    using namespace std::chrono;
    auto start = high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    auto end = high_resolution_clock::now();
    auto total = duration_cast<microseconds>(end - start);
    return total.count() / iterations;
}

int main() {
    auto avgTime = benchmark([]() {
        std::vector<int> v(1000);
    });
    std::cout << "Average: " << avgTime << "μs" << std::endl;
}
Practical Examples
Example 1: Comparing Algorithms
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

void compareSort() {
    std::vector<int> data(100000);
    std::generate(data.begin(), data.end(), std::rand);
    // std::sort
    auto data1 = data;
    auto start1 = std::chrono::high_resolution_clock::now();
    std::sort(data1.begin(), data1.end());
    auto end1 = std::chrono::high_resolution_clock::now();
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    // std::stable_sort
    auto data2 = data;
    auto start2 = std::chrono::high_resolution_clock::now();
    std::stable_sort(data2.begin(), data2.end());
    auto end2 = std::chrono::high_resolution_clock::now();
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    std::cout << "sort: " << time1.count() << "ms" << std::endl;
    std::cout << "stable_sort: " << time2.count() << "ms" << std::endl;
}
Performance Comparison of Sorting Algorithms (100,000 elements):
| Algorithm | Average Time | Complexity | Stability | Memory |
|---|---|---|---|---|
| std::sort | ~8ms | O(N log N) worst case | ❌ | O(log N) |
| std::stable_sort | ~12ms | O(N log² N) worst case (O(N log N) with extra memory) | ✅ | O(N) |
| std::partial_sort | ~5ms (top 10%) | O(N log K) | ❌ | O(1) |
| std::nth_element | ~2ms (median) | O(N) average | ❌ | O(1) |
Example 2: Collecting Statistics
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

class BenchmarkStats {
    std::vector<double> samples;
public:
    void addSample(double microseconds) {
        samples.push_back(microseconds);
    }
    void printStats() const {
        auto sum = std::accumulate(samples.begin(), samples.end(), 0.0);
        auto avg = sum / samples.size();
        auto sorted = samples;
        std::sort(sorted.begin(), sorted.end());
        auto median = sorted[sorted.size() / 2];
        auto min = *std::min_element(samples.begin(), samples.end());
        auto max = *std::max_element(samples.begin(), samples.end());
        // Standard deviation
        double variance = 0.0;
        for (double s : samples) {
            variance += (s - avg) * (s - avg);
        }
        double stddev = std::sqrt(variance / samples.size());
        std::cout << "Average: " << avg << "μs" << std::endl;
        std::cout << "Median: " << median << "μs" << std::endl;
        std::cout << "Min: " << min << "μs" << std::endl;
        std::cout << "Max: " << max << "μs" << std::endl;
        std::cout << "StdDev: " << stddev << "μs" << std::endl;
    }
};
Meaning of Statistical Metrics:
| Metric | Meaning | Usage |
|---|---|---|
| Mean | Overall average time | General performance |
| Median | Middle value | Outlier removal |
| Min | Best performance | Optimal conditions |
| Max | Worst performance | Worst-case scenario |
| StdDev | Variability | Stability evaluation |
| Percentile (P95, P99) | Top 5%, 1% | SLA benchmarks |
Example 3: Google Benchmark
#include <benchmark/benchmark.h>
#include <vector>

static void BM_VectorPushBack(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}
BENCHMARK(BM_VectorPushBack)->Range(8, 8<<10);

static void BM_VectorReserve(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        v.reserve(state.range(0));
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}
BENCHMARK(BM_VectorReserve)->Range(8, 8<<10);
BENCHMARK_MAIN();
Example 4: Warmup
#include <chrono>

// Requires the BenchmarkStats class from Example 2.
template<typename Func>
auto benchmarkWithWarmup(Func f, int warmup, int iterations) {
    // Warmup
    for (int i = 0; i < warmup; ++i) {
        f();
    }
    // Measurement
    BenchmarkStats stats;
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        f();
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            end - start
        );
        stats.addSample(duration.count());
    }
    return stats;
}
Benchmarking Tips
Checklist for Accurate Measurement
| Item | Recommendation | Reason |
|---|---|---|
| Warmup | 10-100 runs | Warm up CPU caches and branch predictors |
| Repetitions | 100-1000 runs | Ensure statistical significance |
| Prevent Optimization | volatile or DoNotOptimize | Avoid compiler optimizations |
| Isolated Runs | Close other processes | Minimize noise |
| CPU Pinning | taskset (Linux) | Prevent core switching |
| Release Build | -O3 -DNDEBUG | Measure real performance |
// 1. Warmup
for (int i = 0; i < 10; ++i) {
    f(); // Warm up cache
}

// 2. Multiple measurements
for (int i = 0; i < 100; ++i) {
    benchmark(f);
}

// 3. Prevent optimization
volatile int result = compute();

// 4. Statistical analysis
// Mean, Median, StdDev
Common Issues
Issue 1: Optimization
// ❌ Removed by optimization
auto time = benchmark([]() {
    int x = 42;
    return x; // nothing observes x, so the compiler can delete the whole body
});

// ✅ Use the result so the work cannot be elided
int result;
auto time = benchmark([&]() {
    result = compute();
    benchmark::DoNotOptimize(result); // or write to a volatile without Google Benchmark
});
Issue 2: Cache
// ❌ Cache effects
auto time1 = benchmark(f); // Cache miss
auto time2 = benchmark(f); // Cache hit (faster)
// ✅ Warmup
for (int i = 0; i < 10; ++i) f();
auto time = benchmark(f);
Issue 3: Variability
// ❌ A single measurement is unreliable
auto time = benchmark(f);

// ✅ Multiple measurements
BenchmarkStats stats;
for (int i = 0; i < 100; ++i) {
    stats.addSample(benchmark(f));
}
stats.printStats();
Issue 4: Measurement Overhead
// ❌ Very short task
auto time = benchmark([]() {
    int x = 1 + 1;
});
// Measurement overhead > actual work time
// ✅ Time many iterations in one window and divide by the count
Google Benchmark
# Installation
git clone https://github.com/google/benchmark.git
cd benchmark
cmake -E make_directory "build"
cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
cmake --build "build" --config Release
# Compilation
g++ -std=c++17 bench.cpp -lbenchmark -lpthread -o bench
# Execution
./bench
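Instead of a system-wide install, Google Benchmark can also be pulled in per project via CMake's FetchContent; this is a sketch of a minimal CMakeLists.txt, with the project name, source file, and pinned tag as illustrative assumptions.

```cmake
# Hypothetical minimal project; fetches Google Benchmark at configure time.
cmake_minimum_required(VERSION 3.14)
project(bench_demo CXX)

include(FetchContent)
set(BENCHMARK_ENABLE_TESTING OFF CACHE BOOL "" FORCE)  # skip the library's own tests
FetchContent_Declare(
  benchmark
  GIT_REPOSITORY https://github.com/google/benchmark.git
  GIT_TAG        v1.8.3  # pin a release for reproducible builds
)
FetchContent_MakeAvailable(benchmark)

add_executable(bench bench.cpp)
target_link_libraries(bench PRIVATE benchmark::benchmark)
```

Remember to configure with -DCMAKE_BUILD_TYPE=Release; Google Benchmark itself warns at runtime when the library was built in debug mode.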
FAQ
Q1: What is Benchmarking?
A: Measuring how long code takes to run under controlled, repeatable conditions.
Q2: What is Warmup?
A: Running the code several times before measuring so caches, branch predictors, and CPU frequency scaling reach a steady state.
Q3: Which statistics should I report?
A: Mean, median, standard deviation, and tail percentiles (P95/P99).
Q4: Recommended Tools?
A: Google Benchmark, perf, Intel VTune.
Q5: How to Prevent Optimization?
A: Use volatile or DoNotOptimize.
Q6: Learning Resources?
A:
- “Optimized C++”
- “Google Benchmark Docs”
- cppreference.com
Related Posts: Stopwatch Benchmarking, Duration, Time Conversion, Performance Optimization.