C++ Benchmarking: chrono, Warmup, Statistics, and Google Benchmark


Key takeaways

Practical C++ benchmarking: timing, statistics, and Google Benchmark.

What is benchmarking?

Benchmarking is the systematic measurement of how long code takes to run. It builds on stopwatch-style timing patterns and chrono time conversion; running the same benchmark before and after an optimization gives numeric proof of the improvement.

Benchmark workflow

graph LR
    A[Write Code] --> B[Warmup]
    B --> C[Start Measure]
    C --> D[Repeat Run]
    D --> E[End Measure]
    E --> F[Calc Stats]
    F --> G{Goal Met?}
    G -->|No| H[Optimize]
    H --> A
    G -->|Yes| I[Done]
#include <chrono>
#include <iostream>

auto start = std::chrono::high_resolution_clock::now();

// code under test

auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
    end - start
);

std::cout << "Time: " << duration.count() << "μs" << std::endl;

Basic measurement

#include <chrono>
#include <iostream>
#include <vector>

template<typename Func>
auto benchmark(Func f, int iterations = 1000) {
    using namespace std::chrono;
    
    auto start = high_resolution_clock::now();
    
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    
    auto end = high_resolution_clock::now();
    auto total = duration_cast<microseconds>(end - start);
    
    // Use floating point so sub-microsecond averages are not truncated to 0.
    return static_cast<double>(total.count()) / iterations;
}

int main() {
    auto avgTime = benchmark([] {
        std::vector<int> v(1000);
    });
    
    std::cout << "Average: " << avgTime << "μs" << std::endl;
}

Examples

Example 1: Comparing sort algorithms

#include <algorithm>
#include <chrono>
#include <cstdlib>   // std::rand
#include <iostream>
#include <vector>

void compareSort() {
    std::vector<int> data(100000);
    std::generate(data.begin(), data.end(), std::rand);
    
    auto data1 = data;
    auto start1 = std::chrono::high_resolution_clock::now();
    std::sort(data1.begin(), data1.end());
    auto end1 = std::chrono::high_resolution_clock::now();
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    
    auto data2 = data;
    auto start2 = std::chrono::high_resolution_clock::now();
    std::stable_sort(data2.begin(), data2.end());
    auto end2 = std::chrono::high_resolution_clock::now();
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    
    std::cout << "sort: " << time1.count() << "ms" << std::endl;
    std::cout << "stable_sort: " << time2.count() << "ms" << std::endl;
}

Sort performance (100,000 elements, illustrative):

| Algorithm | Typical time | Worst case | Stable | Extra memory |
|---|---|---|---|---|
| std::sort | ~8ms | O(N log N) | No | O(log N) |
| std::stable_sort | ~12ms | O(N log² N) | Yes | O(N) |
| std::partial_sort | ~5ms (top 10%) | O(N log K) | No | O(1) |
| std::nth_element | ~2ms (median) | O(N) average | No | O(1) |

Example 2: Statistics

#include <algorithm>
#include <cmath>     // std::sqrt
#include <iostream>
#include <numeric>
#include <vector>

class BenchmarkStats {
    std::vector<double> samples;
    
public:
    void addSample(double microseconds) {
        samples.push_back(microseconds);
    }
    
    void printStats() const {
        auto sum = std::accumulate(samples.begin(), samples.end(), 0.0);
        auto avg = sum / samples.size();
        
        auto sorted = samples;
        std::sort(sorted.begin(), sorted.end());
        auto median = sorted[sorted.size() / 2];
        
        auto min = *std::min_element(samples.begin(), samples.end());
        auto max = *std::max_element(samples.begin(), samples.end());
        
        double variance = 0.0;
        for (double s : samples) {
            variance += (s - avg) * (s - avg);
        }
        double stddev = std::sqrt(variance / samples.size());
        
        std::cout << "Mean: " << avg << "μs" << std::endl;
        std::cout << "Median: " << median << "μs" << std::endl;
        std::cout << "Min: " << min << "μs" << std::endl;
        std::cout << "Max: " << max << "μs" << std::endl;
        std::cout << "StdDev: " << stddev << "μs" << std::endl;
    }
};

What the stats mean:

| Metric | Meaning | Use |
|---|---|---|
| Mean | Average time | Typical performance |
| Median | Middle value | Robust to outliers |
| Min | Best run | Best-case conditions |
| Max | Worst run | Tail latency |
| StdDev | Spread | Stability |
| P95 / P99 | Slow tail | SLA-style targets |

Example 3: Google Benchmark

#include <benchmark/benchmark.h>
#include <vector>

static void BM_VectorPushBack(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}

BENCHMARK(BM_VectorPushBack)->Range(8, 8<<10);

static void BM_VectorReserve(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        v.reserve(state.range(0));
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}

BENCHMARK(BM_VectorReserve)->Range(8, 8<<10);

BENCHMARK_MAIN();

Example 4: Warmup

#include <chrono>

template<typename Func>
auto benchmarkWithWarmup(Func f, int warmup, int iterations) {
    // Warmup runs: prime caches and branch predictors; not measured.
    for (int i = 0; i < warmup; ++i) {
        f();
    }
    
    BenchmarkStats stats;
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        f();
        auto end = std::chrono::high_resolution_clock::now();
        
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            end - start
        );
        stats.addSample(duration.count());
    }
    
    return stats;
}

Benchmarking tips

Checklist for accurate measurement

| Item | Recommendation | Why |
|---|---|---|
| Warmup | 10–100 iterations | Stabilize cache and branch prediction |
| Repetitions | 100–1000 | Statistical significance |
| Anti-DCE | volatile or DoNotOptimize | Prevent optimizing away the work |
| Isolation | Close noisy processes | Less noise |
| CPU affinity | taskset on Linux | Fewer core migrations |
| Release build | -O3 -DNDEBUG | Match production performance |
// Warmup: run the code a few times before measuring
for (int i = 0; i < 10; ++i) {
    f();
}

// Repetitions: collect many measurements, not just one
for (int i = 0; i < 100; ++i) {
    benchmark(f);
}

// Anti-DCE: volatile keeps the result observable
volatile int result = compute();

Common pitfalls

Pitfall 1: Compiler eliminates “useless” work

int result;
auto time = benchmark([&]() {
    result = compute();
    benchmark::DoNotOptimize(result);  // mark the result as used every iteration
});

Pitfall 2: Cache effects

for (int i = 0; i < 10; ++i) f();   // warmup fills caches and trains predictors
auto time = benchmark(f);           // measure steady-state behavior

Pitfall 3: Variance

BenchmarkStats stats;
for (int i = 0; i < 100; ++i) {
    stats.addSample(benchmark(f));
}
stats.printStats();

Pitfall 4: Timer dominates tiny work

Repeat many times and average.

Google Benchmark setup

git clone https://github.com/google/benchmark.git
cd benchmark
cmake -E make_directory "build"
cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
cmake --build "build" --config Release

g++ -std=c++17 bench.cpp -lbenchmark -lpthread -o bench
./bench

FAQ

Q1: What is benchmarking?

A: Measuring performance of code paths.

Q2: Why warm up?

A: The first runs pay for cold caches, branch-predictor training, and CPU frequency ramp-up; warmup iterations absorb that cost so the measured runs reflect steady state.

Q3: Which statistics?

A: Mean, median, standard deviation at minimum.

Q4: Tools?

A: Google Benchmark, perf, VTune.

Q5: Preventing optimization?

A: volatile, benchmark::DoNotOptimize, or careful harness design.

Q6: Resources?

A: Optimized C++, Google Benchmark docs, cppreference.com.

See also:


  • C++ stopwatch and benchmarks
  • C++ duration
  • C++ time conversion
  • C++ performance optimization

Practical tips

Debugging

  • Fix compiler warnings first
  • Reproduce with a minimal test

Performance

  • Profile before micro-optimizing
  • Define measurable targets

Code review

  • Follow team conventions

Checklist

Before coding

  • Right technique for the problem?
  • Maintainable by the team?
  • Meets performance requirements?

While coding

  • Warnings cleared?
  • Edge cases covered?
  • Error handling appropriate?

At review

  • Intent clear?
  • Tests sufficient?
  • Documentation adequate?

Keywords

C++, benchmarking, performance, testing, Google Benchmark, chrono.

Related posts


  • C++ algorithm sort
  • C++ cache optimization
  • C++ CMake
  • C++ code coverage
  • C++ Conan