C++ Benchmarking | A Guide to Benchmarking
Key Takeaways
C++ microbenchmarking with chrono and Google Benchmark: warmup, iterations, variance, and measuring real optimizations—not noise.
What is Benchmarking?
Benchmarking measures how long code takes to run, much like timing it with a stopwatch; in C++ this typically means taking chrono timestamps and converting the difference to a convenient time unit. By running the same benchmark before and after a performance optimization, you can quantify the improvement rather than guess at it.
Benchmarking Process
graph LR
A[Write Code] --> B[Warmup]
B --> C[Start Measure]
C --> D[Repeat Run]
D --> E[End Measure]
E --> F[Calc Stats]
F --> G{Goal Met?}
G -->|No| H[Optimize]
H --> A
G -->|Yes| I[Done]
#include <chrono>
#include <iostream>

auto start = std::chrono::high_resolution_clock::now();
// Code to measure
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
    end - start
);
std::cout << "Time: " << duration.count() << "μs" << std::endl;
Basic Measurement
#include <chrono>
#include <iostream>
#include <vector>

// Run f() `iterations` times and return the average time per call in microseconds.
template<typename Func>
auto benchmark(Func f, int iterations = 1000) {
    using namespace std::chrono;
    auto start = high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    auto end = high_resolution_clock::now();
    auto total = duration_cast<microseconds>(end - start);
    return total.count() / iterations;
}

int main() {
    auto avgTime = benchmark([]() {
        std::vector<int> v(1000);
    });
    std::cout << "Average: " << avgTime << "μs" << std::endl;
}
Practical Examples
Example 1: Comparing Algorithms
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

void compareSort() {
    std::vector<int> data(100000);
    std::generate(data.begin(), data.end(), std::rand);
    // std::sort
    auto data1 = data;
    auto start1 = std::chrono::high_resolution_clock::now();
    std::sort(data1.begin(), data1.end());
    auto end1 = std::chrono::high_resolution_clock::now();
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    // std::stable_sort
    auto data2 = data;
    auto start2 = std::chrono::high_resolution_clock::now();
    std::stable_sort(data2.begin(), data2.end());
    auto end2 = std::chrono::high_resolution_clock::now();
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    std::cout << "sort: " << time1.count() << "ms" << std::endl;
    std::cout << "stable_sort: " << time2.count() << "ms" << std::endl;
}
Performance Comparison of Sorting Algorithms (100,000 elements):
| Algorithm | Average Time | Complexity | Stability | Memory |
|---|---|---|---|---|
| std::sort | ~8ms | O(N log N) worst case | ❌ | O(log N) |
| std::stable_sort | ~12ms | O(N log² N) worst case (O(N log N) with extra memory) | ✅ | O(N) |
| std::partial_sort | ~5ms (top 10%) | O(N log K) | ❌ | O(1) |
| std::nth_element | ~2ms (median) | O(N) average | ❌ | O(1) |
Example 2: Collecting Statistics
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

class BenchmarkStats {
    std::vector<double> samples;
public:
    void addSample(double microseconds) {
        samples.push_back(microseconds);
    }
    void printStats() const {
        auto sum = std::accumulate(samples.begin(), samples.end(), 0.0);
        auto avg = sum / samples.size();
        auto sorted = samples;
        std::sort(sorted.begin(), sorted.end());
        auto median = sorted[sorted.size() / 2];
        auto min = *std::min_element(samples.begin(), samples.end());
        auto max = *std::max_element(samples.begin(), samples.end());
        // Standard deviation
        double variance = 0.0;
        for (double s : samples) {
            variance += (s - avg) * (s - avg);
        }
        double stddev = std::sqrt(variance / samples.size());
        std::cout << "Average: " << avg << "μs" << std::endl;
        std::cout << "Median: " << median << "μs" << std::endl;
        std::cout << "Min: " << min << "μs" << std::endl;
        std::cout << "Max: " << max << "μs" << std::endl;
        std::cout << "StdDev: " << stddev << "μs" << std::endl;
    }
};
Meaning of Statistical Metrics:
| Metric | Meaning | Usage |
|---|---|---|
| Mean | Overall average time | General performance |
| Median | Middle value | Outlier removal |
| Min | Best performance | Optimal conditions |
| Max | Worst performance | Worst-case scenario |
| StdDev | Variability | Stability evaluation |
| Percentile (P95, P99) | Top 5%, 1% | SLA benchmarks |
Example 3: Google Benchmark
#include <benchmark/benchmark.h>
#include <vector>

static void BM_VectorPushBack(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}
BENCHMARK(BM_VectorPushBack)->Range(8, 8<<10);

static void BM_VectorReserve(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        v.reserve(state.range(0));
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}
BENCHMARK(BM_VectorReserve)->Range(8, 8<<10);
BENCHMARK_MAIN();
Example 4: Warmup
#include <chrono>

// Requires the BenchmarkStats class from Example 2.
template<typename Func>
auto benchmarkWithWarmup(Func f, int warmup, int iterations) {
    // Warmup
    for (int i = 0; i < warmup; ++i) {
        f();
    }
    // Measurement
    BenchmarkStats stats;
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        f();
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            end - start
        );
        stats.addSample(duration.count());
    }
    return stats;
}
Benchmarking Tips
Checklist for Accurate Measurement
| Item | Recommendation | Reason |
|---|---|---|
| Warmup | 10-100 runs | Warm up CPU caches and branch predictors |
| Repetitions | 100-1000 runs | Ensure statistical significance |
| Prevent Optimization | volatile or DoNotOptimize | Avoid compiler optimizations |
| Isolated Runs | Close other processes | Minimize noise |
| CPU Pinning | taskset (Linux) | Prevent core switching |
| Release Build | -O3 -DNDEBUG | Measure real performance |
// 1. Warmup
for (int i = 0; i < 10; ++i) {
    f(); // Warm up cache
}

// 2. Multiple measurements
for (int i = 0; i < 100; ++i) {
    benchmark(f);
}

// 3. Prevent optimization
volatile int result = compute();

// 4. Statistical analysis
// Mean, Median, StdDev
Common Issues
Issue 1: Optimization
// ❌ Removed by optimization
auto time = benchmark([]() {
    int x = 42;
    return x; // nothing observes x, so the compiler can delete the whole body
});

// ✅ Use the result so the work cannot be elided
int result;
auto time = benchmark([&]() {
    result = compute();
    benchmark::DoNotOptimize(result); // or write to a volatile without Google Benchmark
});
Issue 2: Cache
// ❌ Cache effects
auto time1 = benchmark(f); // Cache miss
auto time2 = benchmark(f); // Cache hit (faster)
// ✅ Warmup
for (int i = 0; i < 10; ++i) f();
auto time = benchmark(f);
Issue 3: Variability
// ❌ A single measurement is unreliable
auto time = benchmark(f);

// ✅ Multiple measurements
BenchmarkStats stats;
for (int i = 0; i < 100; ++i) {
    stats.addSample(benchmark(f));
}
stats.printStats();
Issue 4: Measurement Overhead
// ❌ Very short task
auto time = benchmark([]() {
    int x = 1 + 1;
});
// Measurement overhead > actual work time
// ✅ Time many iterations in one window and divide by the count
Google Benchmark
# Installation
git clone https://github.com/google/benchmark.git
cd benchmark
cmake -E make_directory "build"
cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
cmake --build "build" --config Release
# Compilation
g++ -std=c++17 bench.cpp -lbenchmark -lpthread -o bench
# Execution
./bench
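Instead of a system-wide install, Google Benchmark can also be pulled in per project via CMake's FetchContent; this is a sketch of a minimal CMakeLists.txt, with the project name, source file, and pinned tag as illustrative assumptions.

```cmake
# Hypothetical minimal project; fetches Google Benchmark at configure time.
cmake_minimum_required(VERSION 3.14)
project(bench_demo CXX)

include(FetchContent)
set(BENCHMARK_ENABLE_TESTING OFF CACHE BOOL "" FORCE)  # skip the library's own tests
FetchContent_Declare(
  benchmark
  GIT_REPOSITORY https://github.com/google/benchmark.git
  GIT_TAG        v1.8.3  # pin a release for reproducible builds
)
FetchContent_MakeAvailable(benchmark)

add_executable(bench bench.cpp)
target_link_libraries(bench PRIVATE benchmark::benchmark)
```

Remember to configure with -DCMAKE_BUILD_TYPE=Release; Google Benchmark itself warns at runtime when the library was built in debug mode.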
FAQ
Q1: What is Benchmarking?
A: Measuring how long code takes to run under controlled, repeatable conditions.
Q2: What is Warmup?
A: Running the code several times before measuring so caches, branch predictors, and CPU frequency scaling reach a steady state.
Q3: Which statistics should I report?
A: Mean, median, standard deviation, and tail percentiles (P95/P99).
Q4: Recommended Tools?
A: Google Benchmark, perf, Intel VTune.
Q5: How to Prevent Optimization?
A: Use volatile or DoNotOptimize.
Q6: Learning Resources?
A:
- “Optimized C++”
- “Google Benchmark Docs”
- cppreference.com
Related Posts: Stopwatch Benchmarking, Duration, Time Conversion, Performance Optimization.