C++ Profiling: Find Bottlenecks with Timers, gprof, perf, and Callgrind

Key takeaway

Practical C++ profiling from concepts to gprof, perf, and Callgrind.

What is profiling?

Profiling is the process of measuring where a program spends its time (and other resources) so you can find and fix the actual bottlenecks instead of guessing.

// Before: you do not know what is slow
void process() {
    step1();
    step2();
    step3();
}

// After: step2 takes ~90% of the time

Basic timing

#include <chrono>
#include <iostream>

void measureTime() {
    auto start = std::chrono::high_resolution_clock::now();
    
    for (int i = 0; i < 1000000; i++) {
        // work
    }
    
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    std::cout << "Time: " << duration.count() << "ms" << std::endl;
}
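One caveat worth knowing: `high_resolution_clock` is often an alias for `system_clock`, which can jump when the wall clock is adjusted. For measuring intervals, `std::chrono::steady_clock` is guaranteed monotonic and is the safer default. A minimal sketch (the helper name `elapsedMicros` is illustrative, not from the text above):

```cpp
#include <chrono>

// Runs a callable and returns its elapsed wall time in microseconds,
// using steady_clock, which never goes backwards.
template <typename F>
long long elapsedMicros(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Usage: `long long us = elapsedMicros([]{ /* work */ });`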

Examples

Example 1: Scoped function timer

#include <chrono>
#include <iostream>
#include <string>

class Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
    std::string name;
    
public:
    Timer(const std::string& n) : name(n) {
        start = std::chrono::high_resolution_clock::now();
    }
    
    ~Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << duration.count() << "μs" << std::endl;
    }
};

void slowFunction() {
    Timer t("slowFunction");
    // ... work being measured; the destructor prints on scope exit ...
}

void fastFunction() {
    Timer t("fastFunction");
    // ... work being measured ...
}
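In practice the work being timed sits in the same scope as the Timer, and the destructor prints when that scope ends. A self-contained usage sketch (the Timer is repeated here so the snippet compiles on its own; `sumTo` is an illustrative workload):

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Same RAII idea as above: construction records the start,
// destruction prints the elapsed time.
class Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
    std::string name;

public:
    Timer(const std::string& n)
        : start(std::chrono::high_resolution_clock::now()), name(n) {}

    ~Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << us.count() << "μs" << std::endl;
    }
};

long long sumTo(int n) {
    Timer t("sumTo");          // prints when the function returns
    long long sum = 0;
    for (int i = 0; i < n; i++) sum += i;
    return sum;
}
```

Because the timer is tied to scope, early returns and exceptions are measured correctly without any extra bookkeeping.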

Example 2: gprof

g++ -pg program.cpp -o program
./program                     # writes gmon.out in the current directory
gprof program gmon.out > analysis.txt

Example 3: perf

perf record ./program
perf report
perf stat ./program

Example 4: Valgrind Callgrind

valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

Finding bottlenecks

#include <chrono>
#include <iostream>
#include <map>
#include <string>

class Profiler {
    struct Entry {
        size_t count = 0;
        long long totalTime = 0;  // accumulated microseconds
    };
    
    std::map<std::string, Entry> entries;
    std::map<std::string, std::chrono::high_resolution_clock::time_point> running;
    
public:
    void start(const std::string& name) {
        running[name] = std::chrono::high_resolution_clock::now();
    }
    
    void end(const std::string& name) {
        auto now = std::chrono::high_resolution_clock::now();
        auto& e = entries[name];
        e.count++;
        e.totalTime += std::chrono::duration_cast<std::chrono::microseconds>(
            now - running[name]).count();
    }
    
    void report() {
        for (const auto& [name, entry] : entries) {
            std::cout << name << ": " 
                      << entry.totalTime / entry.count << "μs average" 
                      << " (" << entry.count << " calls)" << std::endl;
        }
    }
};

Common pitfalls

Pitfall 1: Measurement overhead

// ❌ Measuring inside a tight loop
for (int i = 0; i < 1000000; i++) {
    auto start = std::chrono::high_resolution_clock::now();
    doWork();
    auto end = std::chrono::high_resolution_clock::now();
}

// ✅ Measure the whole loop
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++) {
    doWork();
}
auto end = std::chrono::high_resolution_clock::now();
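The overhead is easy to quantify: each `now()` call typically costs tens of nanoseconds, which can dwarf a cheap loop body. A rough sketch that estimates the cost of the clock calls themselves by timing a batch of them with a single outer measurement (the function name `clockCallOverheadNs` is illustrative):

```cpp
#include <chrono>

// Estimates the average cost of one high_resolution_clock::now()
// call in nanoseconds. A volatile sink keeps the compiler from
// optimizing the inner calls away.
double clockCallOverheadNs(int calls = 100000) {
    volatile long long sink = 0;
    auto outerStart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < calls; i++) {
        sink = sink + std::chrono::high_resolution_clock::now()
                          .time_since_epoch().count();
    }
    auto outerEnd = std::chrono::high_resolution_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        outerEnd - outerStart).count();
    return static_cast<double>(ns) / calls;
}
```

If the reported per-call cost is comparable to the work inside your loop, the measurement is dominated by its own overhead.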

Pitfall 2: Unoptimized debug builds

# ❌ Debug build can be misleadingly slow
g++ -g program.cpp

# ✅ Profile an optimized build with symbols
g++ -O2 -g program.cpp

Pitfall 3: Cache effects

// First run may be slow (cold cache)
// Later runs faster (warm cache)

// ✅ Run multiple times and average
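One way to implement this: run the workload once unmeasured to warm caches, then take the median of several timed runs, which is more robust to outliers than the mean. A sketch under those assumptions (`medianMicros` is an illustrative name):

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Runs `work` once as an unmeasured warm-up, then `runs` more
// times, and returns the median duration in microseconds.
template <typename F>
long long medianMicros(F&& work, int runs = 5) {
    work();  // warm-up: populate caches, not measured
    std::vector<long long> samples;
    for (int r = 0; r < runs; r++) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration_cast<std::chrono::microseconds>(
            end - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Some benchmarking guides prefer the minimum over the median, on the reasoning that the fastest run has the least interference; either is far more stable than a single cold measurement.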

Pitfall 4: Premature optimization

// ❌ Optimize before measuring

// ✅ Measure → find hotspot → optimize that area only

Profiling tools (quick reference)

# gprof
g++ -pg program.cpp
./a.out
gprof a.out gmon.out

# perf (Linux)
perf record ./program
perf report

# Valgrind Callgrind
valgrind --tool=callgrind ./program

# Instruments (macOS)
instruments -t "Time Profiler" ./program

# Visual Studio Profiler (Windows)

FAQ

Q1: When should I profile?

A: When you have a performance issue, before major optimizations, or as part of regular monitoring.

Q2: Which tool?

A: gprof for a quick start; perf for detail on Linux; Callgrind for accurate call graphs; Instruments on Mac.

Q3: What units?

A: Microseconds (μs), milliseconds (ms), or CPU cycles depending on the tool.

Q4: Optimization order?

A: Measure → find bottleneck → optimize → measure again.

Q5: Production profiling?

A: Prefer sampling profilers with low overhead and aggregate statistics.

Q6: Learning resources?

A: Optimized C++, perf docs, Valgrind docs.



Practical tips

Debugging

  • Fix compiler warnings first
  • Reproduce with a small test case

Performance

  • Do not optimize without profiling
  • Define measurable goals first

Code review

  • Follow team conventions

Checklist

Before coding

  • Is this the right technique for the problem?
  • Can teammates maintain it?
  • Does it meet performance requirements?

While coding

  • Warnings cleared?
  • Edge cases handled?
  • Error handling appropriate?

At review

  • Intent clear?
  • Tests sufficient?
  • Documentation adequate?

Keywords

C++, profiling, performance, optimization, gprof, perf, Callgrind.

