C++ Profiling: Find Bottlenecks with Timers, gprof, perf, and Callgrind
Key takeaway
Practical C++ profiling from concepts to gprof, perf, and Callgrind.
What is profiling?
Profiling is the process of measuring program performance and finding bottlenecks.
// Before: you do not know what is slow
void process() {
    step1();
    step2();
    step3();
}
// After: step2 takes ~90% of the time
Basic timing
#include <chrono>
#include <iostream>

void measureTime() {
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 1000000; i++) {
        // work
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::cout << "Time: " << duration.count() << "ms" << std::endl;
}
Examples
Example 1: Scoped function timer
#include <chrono>
#include <iostream>
#include <string>

class Timer {
    std::chrono::time_point<std::chrono::high_resolution_clock> start;
    std::string name;
public:
    Timer(const std::string& n) : name(n) {
        start = std::chrono::high_resolution_clock::now();
    }
    ~Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << duration.count() << "μs" << std::endl;
    }
};

void slowFunction() {
    Timer t("slowFunction");  // prints the elapsed time when t goes out of scope
    for (volatile int i = 0; i < 1000000; i++) {}  // simulated work
}

void fastFunction() {
    Timer t("fastFunction");
    for (volatile int i = 0; i < 1000; i++) {}  // much less work
}
Example 2: gprof
g++ -pg program.cpp -o program
./program                          # writes gmon.out in the current directory
gprof program gmon.out > analysis.txt
Example 3: perf
perf record -g ./program   # sample with call-graph information
perf report                # interactive browser of the hottest functions
perf stat ./program        # hardware counters: cycles, instructions, cache misses
Example 4: Valgrind Callgrind
valgrind --tool=callgrind ./program
callgrind_annotate callgrind.out.*   # text report
kcachegrind callgrind.out.*          # GUI visualization
Finding bottlenecks
#include <chrono>
#include <iostream>
#include <map>
#include <string>

class Profiler {
    struct Entry {
        size_t count = 0;
        long long totalTime = 0;  // accumulated microseconds
    };
    std::map<std::string, Entry> entries;
    std::map<std::string, std::chrono::steady_clock::time_point> running;
public:
    void start(const std::string& name) {
        running[name] = std::chrono::steady_clock::now();
    }
    void end(const std::string& name) {
        auto stop = std::chrono::steady_clock::now();
        auto& entry = entries[name];
        entry.count++;
        entry.totalTime += std::chrono::duration_cast<std::chrono::microseconds>(
            stop - running[name]).count();
    }
    void report() {
        for (const auto& [name, entry] : entries) {
            std::cout << name << ": "
                      << entry.totalTime / entry.count << "μs"
                      << " (" << entry.count << " calls)" << std::endl;
        }
    }
};
Common pitfalls
Pitfall 1: Measurement overhead
// ❌ Measuring inside a tight loop
for (int i = 0; i < 1000000; i++) {
    auto start = std::chrono::high_resolution_clock::now();
    doWork();
    auto end = std::chrono::high_resolution_clock::now();
}

// ✅ Measure the whole loop
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++) {
    doWork();
}
auto end = std::chrono::high_resolution_clock::now();
Pitfall 2: Unoptimized debug builds
# ❌ Debug build can be misleadingly slow
g++ -g program.cpp
# ✅ Profile a release-like build with symbols
g++ -O2 -g program.cpp
Pitfall 3: Cache effects
// First run may be slow (cold cache)
// Later runs faster (warm cache)
// ✅ Run multiple times and average
Pitfall 4: Premature optimization
// ❌ Optimize before measuring
// ✅ Measure → find hotspot → optimize that area only
Profiling tools (quick reference)
# gprof
g++ -pg program.cpp
./a.out
gprof a.out gmon.out
# perf (Linux)
perf record ./program
perf report
# Valgrind Callgrind
valgrind --tool=callgrind ./program
# Instruments (macOS)
instruments -t "Time Profiler" ./program
# Visual Studio Profiler (Windows)
FAQ
Q1: When should I profile?
A: When you have a performance issue, before major optimizations, or as part of regular monitoring.
Q2: Which tool?
A: gprof for a quick start; perf for detail on Linux; Callgrind for accurate call graphs; Instruments on Mac.
Q3: What units?
A: Microseconds (μs), milliseconds (ms), or CPU cycles depending on the tool.
Q4: Optimization order?
A: Measure → find bottleneck → optimize → measure again.
Q5: Production profiling?
A: Prefer sampling profilers with low overhead and aggregate statistics.
Q6: Learning resources?
A: Optimized C++, perf docs, Valgrind docs.
See also (internal links)
- C++ exception performance
- C++ cache optimization
- C++ profiling with perf and gprof
Practical tips
Debugging
- Fix compiler warnings first
- Reproduce with a small test case
Performance
- Do not optimize without profiling
- Define measurable goals first
Code review
- Follow team conventions
Checklist
Before coding
- Is this the right technique for the problem?
- Can teammates maintain it?
- Does it meet performance requirements?
While coding
- Warnings cleared?
- Edge cases handled?
- Error handling appropriate?
At review
- Intent clear?
- Tests sufficient?
- Documentation adequate?
Related posts
- C++ cache optimization
- C++ exception performance
- C++ algorithm sort
- C++ benchmarking
- C++ branch prediction