C++ Profiling: Find Bottlenecks with Timers, gprof, perf, and Callgrind

Key takeaway

Practical C++ profiling from concepts to gprof, perf, and Callgrind.

What is profiling?

Profiling is the process of measuring where a program spends its time (and other resources) so you can find and fix the actual bottlenecks instead of guessing.

// Before: you do not know what is slow
void process() {
    step1();
    step2();
    step3();
}

// After: step2 takes ~90% of the time

Basic timing

#include <chrono>
#include <iostream>

void measureTime() {
    auto start = std::chrono::high_resolution_clock::now();
    
    for (int i = 0; i < 1000000; i++) {
        // work
    }
    
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    std::cout << "Time: " << duration.count() << "ms" << std::endl;
}
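One caveat worth knowing: `high_resolution_clock` is often an alias for `system_clock`, which can jump when the wall clock is adjusted. For measuring intervals, `std::chrono::steady_clock` is guaranteed monotonic and is the safer default. A minimal sketch (the helper name `elapsedMicros` is illustrative, not from the text above):

```cpp
#include <chrono>

// Runs a callable and returns its elapsed wall time in microseconds,
// using steady_clock, which never goes backwards.
template <typename F>
long long elapsedMicros(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Usage: `long long us = elapsedMicros([]{ /* work */ });`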

Examples

Example 1: Scoped function timer

#include <chrono>
#include <iostream>
#include <string>

class Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
    std::string name;
    
public:
    Timer(const std::string& n) : name(n) {
        start = std::chrono::high_resolution_clock::now();
    }
    
    ~Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << duration.count() << "μs" << std::endl;
    }
};

void slowFunction() {
    Timer t("slowFunction");
    // ... work being measured; the destructor prints on scope exit ...
}

void fastFunction() {
    Timer t("fastFunction");
    // ... work being measured ...
}
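In practice the work being timed sits in the same scope as the Timer, and the destructor prints when that scope ends. A self-contained usage sketch (the Timer is repeated here so the snippet compiles on its own; `sumTo` is an illustrative workload):

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Same RAII idea as above: construction records the start,
// destruction prints the elapsed time.
class Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
    std::string name;

public:
    Timer(const std::string& n)
        : start(std::chrono::high_resolution_clock::now()), name(n) {}

    ~Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << us.count() << "μs" << std::endl;
    }
};

long long sumTo(int n) {
    Timer t("sumTo");          // prints when the function returns
    long long sum = 0;
    for (int i = 0; i < n; i++) sum += i;
    return sum;
}
```

Because the timer is tied to scope, early returns and exceptions are measured correctly without any extra bookkeeping.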

Example 2: gprof

g++ -pg program.cpp -o program
./program                     # writes gmon.out in the current directory
gprof program gmon.out > analysis.txt

Example 3: perf

perf record ./program
perf report
perf stat ./program

Example 4: Valgrind Callgrind

valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

Finding bottlenecks

#include <chrono>
#include <iostream>
#include <map>
#include <string>

class Profiler {
    struct Entry {
        size_t count = 0;
        long long totalTime = 0;  // accumulated microseconds
    };
    
    std::map<std::string, Entry> entries;
    std::map<std::string, std::chrono::high_resolution_clock::time_point> running;
    
public:
    void start(const std::string& name) {
        running[name] = std::chrono::high_resolution_clock::now();
    }
    
    void end(const std::string& name) {
        auto now = std::chrono::high_resolution_clock::now();
        auto& e = entries[name];
        e.count++;
        e.totalTime += std::chrono::duration_cast<std::chrono::microseconds>(
            now - running[name]).count();
    }
    
    void report() {
        for (const auto& [name, entry] : entries) {
            std::cout << name << ": " 
                      << entry.totalTime / entry.count << "μs average" 
                      << " (" << entry.count << " calls)" << std::endl;
        }
    }
};

Common pitfalls

Pitfall 1: Measurement overhead

// ❌ Measuring inside a tight loop
for (int i = 0; i < 1000000; i++) {
    auto start = std::chrono::high_resolution_clock::now();
    doWork();
    auto end = std::chrono::high_resolution_clock::now();
}

// ✅ Measure the whole loop
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++) {
    doWork();
}
auto end = std::chrono::high_resolution_clock::now();
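The overhead is easy to quantify: each `now()` call typically costs tens of nanoseconds, which can dwarf a cheap loop body. A rough sketch that estimates the cost of the clock calls themselves by timing a batch of them with a single outer measurement (the function name `clockCallOverheadNs` is illustrative):

```cpp
#include <chrono>

// Estimates the average cost of one high_resolution_clock::now()
// call in nanoseconds. A volatile sink keeps the compiler from
// optimizing the inner calls away.
double clockCallOverheadNs(int calls = 100000) {
    volatile long long sink = 0;
    auto outerStart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < calls; i++) {
        sink = sink + std::chrono::high_resolution_clock::now()
                          .time_since_epoch().count();
    }
    auto outerEnd = std::chrono::high_resolution_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        outerEnd - outerStart).count();
    return static_cast<double>(ns) / calls;
}
```

If the reported per-call cost is comparable to the work inside your loop, the measurement is dominated by its own overhead.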

Pitfall 2: Unoptimized debug builds

# ❌ Debug build can be misleadingly slow
g++ -g program.cpp

# ✅ Profile an optimized build with symbols
g++ -O2 -g program.cpp

Pitfall 3: Cache effects

// First run may be slow (cold cache)
// Later runs faster (warm cache)

// ✅ Run multiple times and average
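One way to implement this: run the workload once unmeasured to warm caches, then take the median of several timed runs, which is more robust to outliers than the mean. A sketch under those assumptions (`medianMicros` is an illustrative name):

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Runs `work` once as an unmeasured warm-up, then `runs` more
// times, and returns the median duration in microseconds.
template <typename F>
long long medianMicros(F&& work, int runs = 5) {
    work();  // warm-up: populate caches, not measured
    std::vector<long long> samples;
    for (int r = 0; r < runs; r++) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration_cast<std::chrono::microseconds>(
            end - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Some benchmarking guides prefer the minimum over the median, on the reasoning that the fastest run has the least interference; either is far more stable than a single cold measurement.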

Pitfall 4: Premature optimization

// ❌ Optimize before measuring

// ✅ Measure → find hotspot → optimize that area only

Profiling tools (quick reference)

# gprof
g++ -pg program.cpp
./a.out
gprof a.out gmon.out

# perf (Linux)
perf record ./program
perf report

# Valgrind Callgrind
valgrind --tool=callgrind ./program

# Instruments (macOS)
instruments -t "Time Profiler" ./program

# Visual Studio Profiler (Windows)

FAQ

Q1: When should I profile?

A: When you have a performance issue, before major optimizations, or as part of regular monitoring.

Q2: Which tool?

A: gprof for a quick start; perf for detail on Linux; Callgrind for accurate call graphs; Instruments on Mac.

Q3: What units?

A: Microseconds (μs), milliseconds (ms), or CPU cycles depending on the tool.

Q4: Optimization order?

A: Measure → find bottleneck → optimize → measure again.

Q5: Production profiling?

A: Prefer sampling profilers with low overhead and aggregate statistics.

Q6: Learning resources?

A: Optimized C++, perf docs, Valgrind docs.



Practical tips

Debugging

  • Fix compiler warnings first
  • Reproduce with a small test case

Performance

  • Do not optimize without profiling
  • Define measurable goals first

Code review

  • Follow team conventions

Checklist

Before coding

  • Is this the right technique for the problem?
  • Can teammates maintain it?
  • Does it meet performance requirements?

While coding

  • Warnings cleared?
  • Edge cases handled?
  • Error handling appropriate?

At review

  • Intent clear?
  • Tests sufficient?
  • Documentation adequate?

Keywords

C++, profiling, performance, optimization, gprof, perf, Callgrind.

