C++ Profiling: Find Bottlenecks with Timers, gprof, perf,

Key points

A C++ profiling guide covering chrono timers, gprof, Linux perf, Valgrind Callgrind, and common pitfalls. The rule throughout: measure before you optimize.

What is profiling?

Profiling is the process of measuring program performance and finding bottlenecks.

// Before: you do not know what is slow
void process() {
    step1();
    step2();
    step3();
}
// After: step2 takes ~90% of the time
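
One way to get from "before" to "after" without any external tool is to time each step and print its share of the total. The sketch below assumes the steps can be passed as callables; `StepTiming`, `timeSteps`, and `printShares` are hypothetical names introduced for illustration, not part of any library.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one step's name and its wall time in microseconds.
struct StepTiming {
    std::string name;
    long long micros;
};

// Runs each named step once, in order, and records how long it took.
std::vector<StepTiming> timeSteps(
    const std::vector<std::pair<std::string, std::function<void()>>>& steps) {
    using Clock = std::chrono::steady_clock;
    std::vector<StepTiming> result;
    for (const auto& [name, fn] : steps) {
        assert(fn && "each step needs a callable");
        auto start = Clock::now();
        fn();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - start);
        result.push_back({name, elapsed.count()});
    }
    return result;
}

// Prints each step's time and its percentage of the total measured time.
void printShares(const std::vector<StepTiming>& timings) {
    long long total = 0;
    for (const auto& t : timings) total += t.micros;
    for (const auto& t : timings) {
        double pct = total > 0 ? 100.0 * t.micros / total : 0.0;
        std::cout << t.name << ": " << t.micros << " us (" << pct << "%)\n";
    }
}
```

Calling `printShares(timeSteps(steps))` with `step1`, `step2`, and `step3` wrapped as callables would show which step dominates. A single run like this is only a first look; the pitfalls later in the article (overhead, debug builds, cold caches) still apply.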

Basic timing

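#include <chrono>
#include <iostream>

void measureTime() {
    // steady_clock is monotonic, so system clock adjustments cannot distort the interval
    auto start = std::chrono::steady_clock::now();

    volatile long long sink = 0;  // volatile keeps the optimizer from deleting the loop
    for (int i = 0; i < 1000000; i++) {
        sink = sink + i;  // stand-in for real work
    }

    auto end = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Time: " << duration.count() << "ms" << std::endl;
}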
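
The same pattern can be wrapped in a small reusable helper so the timing code is not copied around each call site. This is a minimal sketch; `elapsedMs` is a hypothetical name, and it assumes the work can be passed as a callable taking no arguments.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: calls fn once and returns the wall-clock time in milliseconds.
// A floating-point duration keeps sub-millisecond results instead of truncating to 0.
template <typename F>
double elapsedMs(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(fn)();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

A call such as `double ms = elapsedMs([] { doWork(); });` then reads as a single line, where `doWork` stands for whatever is being measured.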

Examples

Example 1: Scoped function timer

#include <chrono>
#include <iostream>
#include <string>

// RAII timer: starts on construction, prints the elapsed time when it goes out of scope
class Timer {
    std::chrono::steady_clock::time_point start;
    std::string name;

public:
    explicit Timer(const std::string& n)
        : start(std::chrono::steady_clock::now()), name(n) {}

    ~Timer() {
        auto end = std::chrono::steady_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << duration.count() << "μs" << std::endl;
    }
};

void slowFunction() {
    Timer t("slowFunction");
    // ... slow work goes here; the timer prints when t is destroyed
}

void fastFunction() {
    Timer t("fastFunction");
    // ... fast work goes here
}

Example 2: gprof

g++ -pg program.cpp -o program
./program                                # writes gmon.out to the current directory
gprof program gmon.out > analysis.txt

Example 3: perf

perf record ./program
perf report
perf stat ./program

Example 4: Valgrind Callgrind

valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

Finding bottlenecks

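#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

class Profiler {
    using Clock = std::chrono::steady_clock;

    struct Entry {
        std::size_t count = 0;
        long long totalTime = 0;     // accumulated microseconds
        Clock::time_point started;   // set by start(), read by end()
    };

    std::map<std::string, Entry> entries;

public:
    void start(const std::string& name) {
        entries[name].started = Clock::now();
    }

    void end(const std::string& name) {
        auto now = Clock::now();
        auto it = entries.find(name);
        if (it == entries.end()) return;  // end() without a matching start()
        Entry& e = it->second;
        e.totalTime += std::chrono::duration_cast<std::chrono::microseconds>(now - e.started).count();
        ++e.count;
    }

    // Prints the average time per call for every completed section
    void report() const {
        for (const auto& [name, entry] : entries) {
            if (entry.count == 0) continue;  // started but never ended; avoids division by zero
            std::cout << name << ": "
                      << entry.totalTime / entry.count << "μs"
                      << " (" << entry.count << " calls)" << std::endl;
        }
    }
};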
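
Manual start/end pairs are easy to unbalance when a function returns early or throws. One possible variation, sketched here under the assumption that per-label totals live in a plain map, is an RAII guard that records on destruction. `Stats`, `StatsMap`, and `ScopedProfile` are hypothetical names, not part of the class above.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator: call count and total microseconds per label.
struct Stats {
    std::size_t count = 0;
    long long totalMicros = 0;
};

using StatsMap = std::map<std::string, Stats>;

// RAII guard: adds the lifetime of the enclosing scope to stats[name] when
// destroyed, so early returns are counted as well.
class ScopedProfile {
    using Clock = std::chrono::steady_clock;
    StatsMap& stats_;
    std::string name_;
    Clock::time_point start_;

public:
    ScopedProfile(StatsMap& stats, std::string name)
        : stats_(stats), name_(std::move(name)), start_(Clock::now()) {}

    // Copying would record the same scope twice.
    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

    ~ScopedProfile() {
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - start_);
        Stats& s = stats_[name_];
        ++s.count;
        s.totalMicros += elapsed.count();
    }
};
```

Placing `ScopedProfile p(stats, "parse");` at the top of a function is then enough to count every call. The map lookup happens after the end time is read, so it is not included in the measured interval, but it still adds cost to each call and makes this unsuitable for very hot, very short functions.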

Common pitfalls

Pitfall 1: Measurement overhead

// ❌ Measuring inside a tight loop: two clock reads per iteration can cost as much as doWork() itself
for (int i = 0; i < 1000000; i++) {
    auto start = std::chrono::steady_clock::now();
    doWork();
    auto end = std::chrono::steady_clock::now();
}
// ✅ Measure the whole loop, then divide by the iteration count if you need a per-call figure
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < 1000000; i++) {
    doWork();
}
auto end = std::chrono::steady_clock::now();
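
To judge whether the overhead matters for a given workload, it can help to estimate what a single clock read costs on the target machine. The following is a rough sketch; `estimateClockOverheadNs` is a hypothetical helper, and the figure it returns varies with platform, clock source, and system load.

```cpp
#include <cassert>
#include <chrono>

// Estimates the average cost of one steady_clock::now() call in nanoseconds
// by timing `calls` back-to-back clock reads.
double estimateClockOverheadNs(int calls) {
    assert(calls > 0);
    using Clock = std::chrono::steady_clock;
    auto start = Clock::now();
    Clock::time_point last = start;
    for (int i = 0; i < calls; ++i) {
        last = Clock::now();  // the final reading is used below as the end time
    }
    double totalNs = std::chrono::duration<double, std::nano>(last - start).count();
    return totalNs / calls;
}
```

If the result of `estimateClockOverheadNs(1000000)` is a noticeable fraction of the time one `doWork()` call takes, per-iteration timing will distort the result and timing the whole loop is the safer choice.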

Pitfall 2: Unoptimized debug builds

# ❌ Debug build (no optimization): timings can be misleadingly slow
g++ -g program.cpp
# ✅ Profile an optimized, release-like build that keeps debug symbols
g++ -O2 -g program.cpp

Pitfall 3: Cache effects

// First run may be slow (cold cache)
// Later runs faster (warm cache)
// ✅ Run multiple times and average
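
A minimal sketch of that advice: run the code a few times untimed to warm up, then collect several timed samples and summarize them. `sampleRunsMs` and `median` are hypothetical helpers. The sketch reports the median rather than the mean, a deliberate substitution, because a single slow run caused by an interrupt or a page fault shifts the mean far more than the median.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Returns the median of the samples (the upper middle element for even sizes).
double median(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Runs fn `warmups` times untimed, then `runs` times timed, and returns the
// wall time of each timed run in milliseconds.
template <typename F>
std::vector<double> sampleRunsMs(F fn, int warmups, int runs) {
    assert(warmups >= 0 && runs > 0);
    using Clock = std::chrono::steady_clock;
    for (int i = 0; i < warmups; ++i) fn();  // untimed: lets caches warm up first
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        auto start = Clock::now();
        fn();
        auto end = Clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    return samples;
}
```

For example, `median(sampleRunsMs([] { doWork(); }, 3, 20))` gives a single representative figure, where `doWork` stands for the code under test. Keeping the raw samples around also makes it possible to look at the minimum or the spread when results are noisy.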

Pitfall 4: Premature optimization

// ❌ Optimize before measuring
// ✅ Measure → find hotspot → optimize that area only

Profiling tools (quick reference)

# gprof
g++ -pg program.cpp
./a.out
gprof a.out gmon.out
# perf (Linux)
perf record ./program
perf report
# Valgrind Callgrind
valgrind --tool=callgrind ./program
# Instruments (macOS); recent Xcode releases replace the old `instruments` CLI with xctrace
xcrun xctrace record --template "Time Profiler" --launch -- ./program
# Visual Studio Profiler (Windows)

FAQ

Q1: When should I profile?

A: When you have a performance issue, before major optimizations, or as part of regular monitoring.

Q2: Which tool?

A: gprof for a quick start; perf for low-overhead sampling and detailed reports on Linux; Callgrind for exact call counts and call graphs, at the cost of a much slower run; Instruments on macOS.

Q3: What units?

A: Microseconds (μs), milliseconds (ms), or CPU cycles depending on the tool.

Q4: Optimization order?

A: Measure → find bottleneck → optimize → measure again.

Q5: Production profiling?

A: Prefer sampling profilers with low overhead and aggregate statistics.

Q6: Learning resources?

A: The book Optimized C++ (Kurt Guntheroth), the perf documentation, and the Valgrind documentation.

Practical tips

Debugging

  • Fix compiler warnings first
  • Reproduce with a small test case

Performance

  • Do not optimize without profiling
  • Define measurable goals first

Code review

  • Follow team conventions

Checklist

Before coding

  • Is this the right technique for the problem?
  • Can teammates maintain it?
  • Does it meet performance requirements?

While coding

  • Warnings cleared?
  • Edge cases handled?
  • Error handling appropriate?

At review

  • Intent clear?
  • Tests sufficient?
  • Documentation adequate?

Keywords

C++, profiling, performance, optimization, gprof, perf, Callgrind.

