C++ Benchmarking: chrono, Warmup, Statistics, and Google Benchmark


Key takeaways

Practical C++ benchmarking: timing, statistics, and Google Benchmark.

What is benchmarking?

Benchmarking is the systematic measurement of how long code takes to run. It builds on stopwatch-style timing patterns and chrono time conversion; running the same benchmark before and after an optimization gives numeric proof of the improvement.

Benchmark workflow

graph LR
    A[Write Code] --> B[Warmup]
    B --> C[Start Measure]
    C --> D[Repeat Run]
    D --> E[End Measure]
    E --> F[Calc Stats]
    F --> G{Goal Met?}
    G -->|No| H[Optimize]
    H --> A
    G -->|Yes| I[Done]
#include <chrono>
#include <iostream>

auto start = std::chrono::high_resolution_clock::now();

// code under test

auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
    end - start
);

std::cout << "Time: " << duration.count() << "μs" << std::endl;

Basic measurement

#include <chrono>
#include <iostream>
#include <vector>

template<typename Func>
auto benchmark(Func f, int iterations = 1000) {
    using namespace std::chrono;
    
    auto start = high_resolution_clock::now();
    
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    
    auto end = high_resolution_clock::now();
    auto total = duration_cast<microseconds>(end - start);
    
    // Use floating point so sub-microsecond averages are not truncated to 0.
    return static_cast<double>(total.count()) / iterations;
}

int main() {
    auto avgTime = benchmark([] {
        std::vector<int> v(1000);
    });
    
    std::cout << "Average: " << avgTime << "μs" << std::endl;
}

Examples

Example 1: Comparing sort algorithms

#include <algorithm>
#include <chrono>
#include <cstdlib>   // std::rand
#include <iostream>
#include <vector>

void compareSort() {
    std::vector<int> data(100000);
    std::generate(data.begin(), data.end(), std::rand);
    
    auto data1 = data;
    auto start1 = std::chrono::high_resolution_clock::now();
    std::sort(data1.begin(), data1.end());
    auto end1 = std::chrono::high_resolution_clock::now();
    auto time1 = std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start1);
    
    auto data2 = data;
    auto start2 = std::chrono::high_resolution_clock::now();
    std::stable_sort(data2.begin(), data2.end());
    auto end2 = std::chrono::high_resolution_clock::now();
    auto time2 = std::chrono::duration_cast<std::chrono::milliseconds>(end2 - start2);
    
    std::cout << "sort: " << time1.count() << "ms" << std::endl;
    std::cout << "stable_sort: " << time2.count() << "ms" << std::endl;
}

Sort performance (100,000 elements, illustrative):

| Algorithm | Typical time | Worst case | Stable | Extra memory |
|---|---|---|---|---|
| std::sort | ~8ms | O(N log N) | No | O(log N) |
| std::stable_sort | ~12ms | O(N log² N) | Yes | O(N) |
| std::partial_sort | ~5ms (top 10%) | O(N log K) | No | O(1) |
| std::nth_element | ~2ms (median) | O(N) average | No | O(1) |

Example 2: Statistics

#include <algorithm>
#include <cmath>     // std::sqrt
#include <iostream>
#include <numeric>
#include <vector>

class BenchmarkStats {
    std::vector<double> samples;
    
public:
    void addSample(double microseconds) {
        samples.push_back(microseconds);
    }
    
    void printStats() const {
        auto sum = std::accumulate(samples.begin(), samples.end(), 0.0);
        auto avg = sum / samples.size();
        
        auto sorted = samples;
        std::sort(sorted.begin(), sorted.end());
        auto median = sorted[sorted.size() / 2];
        
        auto min = *std::min_element(samples.begin(), samples.end());
        auto max = *std::max_element(samples.begin(), samples.end());
        
        double variance = 0.0;
        for (double s : samples) {
            variance += (s - avg) * (s - avg);
        }
        double stddev = std::sqrt(variance / samples.size());
        
        std::cout << "Mean: " << avg << "μs" << std::endl;
        std::cout << "Median: " << median << "μs" << std::endl;
        std::cout << "Min: " << min << "μs" << std::endl;
        std::cout << "Max: " << max << "μs" << std::endl;
        std::cout << "StdDev: " << stddev << "μs" << std::endl;
    }
};

What the stats mean:

| Metric | Meaning | Use |
|---|---|---|
| Mean | Average time | Typical performance |
| Median | Middle value | Robust to outliers |
| Min | Best run | Best-case conditions |
| Max | Worst run | Tail latency |
| StdDev | Spread | Stability |
| P95 / P99 | Slow tail | SLA-style targets |

Example 3: Google Benchmark

#include <benchmark/benchmark.h>
#include <vector>

static void BM_VectorPushBack(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}

BENCHMARK(BM_VectorPushBack)->Range(8, 8<<10);

static void BM_VectorReserve(benchmark::State& state) {
    for (auto _ : state) {
        std::vector<int> v;
        v.reserve(state.range(0));
        for (int i = 0; i < state.range(0); ++i) {
            v.push_back(i);
        }
    }
}

BENCHMARK(BM_VectorReserve)->Range(8, 8<<10);

BENCHMARK_MAIN();

Example 4: Warmup

#include <chrono>

template<typename Func>
auto benchmarkWithWarmup(Func f, int warmup, int iterations) {
    // Warmup runs: prime caches and branch predictors; not measured.
    for (int i = 0; i < warmup; ++i) {
        f();
    }
    
    BenchmarkStats stats;
    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        f();
        auto end = std::chrono::high_resolution_clock::now();
        
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            end - start
        );
        stats.addSample(duration.count());
    }
    
    return stats;
}

Benchmarking tips

Checklist for accurate measurement

| Item | Recommendation | Why |
|---|---|---|
| Warmup | 10–100 iterations | Stabilize cache and branch prediction |
| Repetitions | 100–1000 | Statistical significance |
| Anti-DCE | volatile or DoNotOptimize | Prevent optimizing away the work |
| Isolation | Close noisy processes | Less noise |
| CPU affinity | taskset on Linux | Fewer core migrations |
| Release build | -O3 -DNDEBUG | Match production performance |
// Warmup: run the code a few times before measuring
for (int i = 0; i < 10; ++i) {
    f();
}

// Repetitions: collect many measurements, not just one
for (int i = 0; i < 100; ++i) {
    benchmark(f);
}

// Anti-DCE: volatile keeps the result observable
volatile int result = compute();

Common pitfalls

Pitfall 1: Compiler eliminates “useless” work

int result;
auto time = benchmark([&]() {
    result = compute();
    benchmark::DoNotOptimize(result);  // mark the result as used every iteration
});

Pitfall 2: Cache effects

for (int i = 0; i < 10; ++i) f();   // warmup fills caches and trains predictors
auto time = benchmark(f);           // measure steady-state behavior

Pitfall 3: Variance

BenchmarkStats stats;
for (int i = 0; i < 100; ++i) {
    stats.addSample(benchmark(f));
}
stats.printStats();

Pitfall 4: Timer dominates tiny work

Repeat many times and average.

Google Benchmark setup

git clone https://github.com/google/benchmark.git
cd benchmark
cmake -E make_directory "build"
cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
cmake --build "build" --config Release

g++ -std=c++17 bench.cpp -lbenchmark -lpthread -o bench
./bench

FAQ

Q1: What is benchmarking?

A: Measuring performance of code paths.

Q2: Why warm up?

A: The first runs pay for cold caches, branch-predictor training, and CPU frequency ramp-up; warmup iterations absorb that cost so the measured runs reflect steady state.

Q3: Which statistics?

A: Mean, median, standard deviation at minimum.

Q4: Tools?

A: Google Benchmark, perf, VTune.

Q5: Preventing optimization?

A: volatile, benchmark::DoNotOptimize, or careful harness design.

Q6: Resources?

A: Optimized C++, Google Benchmark docs, cppreference.com.

See also:


  • C++ stopwatch and benchmarks
  • C++ duration
  • C++ time conversion
  • C++ performance optimization

Practical tips

Debugging

  • Fix compiler warnings first
  • Reproduce with a minimal test

Performance

  • Profile before micro-optimizing
  • Define measurable targets

Code review

  • Follow team conventions

Checklist

Before coding

  • Right technique for the problem?
  • Maintainable by the team?
  • Meets performance requirements?

While coding

  • Warnings cleared?
  • Edge cases covered?
  • Error handling appropriate?

At review

  • Intent clear?
  • Tests sufficient?
  • Documentation adequate?

Keywords

C++, benchmarking, performance, testing, Google Benchmark, chrono.

Related posts


  • C++ algorithm sort
  • C++ cache optimization
  • C++ CMake
  • C++ code coverage
  • C++ Conan