Why Is My C++ Program Slow? Find Bottlenecks with Profiling

Key takeaways

Beyond Big-O: copying, allocations, cache misses, branch mispredictions, and virtual calls all cost time. Use perf and the Visual Studio profiler to find hotspots, read flame graphs, and apply the fix patterns below.

Complexity basics: [arrays and lists](/en/blog/algorithm-series-01-array-list/) (Big-O intuition alongside profiling).

Introduction: “The code looks correct but it’s slow”

“Same complexity, but slower than Python sometimes”

When C++ feels slow, profiling turns guesses into hotspots: functions and lines that dominate time or hardware events. This article covers:

  • Seven major causes of slowdown
  • Choosing a profiler
  • perf basics (Linux)
  • Visual Studio Profiler (Windows)
  • Ten common performance patterns
  • Case studies and a five-step tuning loop

Table of contents

  1. Seven root causes
  2. Profiler guide
  3. perf (Linux)
  4. Visual Studio Profiler
  5. Ten performance patterns
  6. Case studies
  7. Summary

1. Seven major causes (overview)

  1. Wrong asymptotics (e.g. nested loops vs hash set)
  2. Pass-by-value of large containers
  3. Excessive allocations inside hot loops
  4. Cache-unfriendly access patterns (stride, AoS vs SoA)
  5. Branch-heavy unpredictable control flow
  6. Virtual dispatch on hot inner loops
  7. Inefficient string building (repeated reallocations, excessive flushing)

Each has small code examples and fixes in the original article; the remedy is almost always measure → change data layout or algorithm → measure again.
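
As an illustration of cause 1, here is a minimal sketch of the "nested loops vs hash set" contrast, using duplicate detection as the task. The function names `has_duplicate_quadratic` and `has_duplicate_hashed` are hypothetical and chosen only for this example; they are not taken from the original article's code.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): compares every pair. Fine for tiny inputs, dominant for large ones.
bool has_duplicate_quadratic(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            if (values[i] == values[j]) return true;
        }
    }
    return false;
}

// Average O(n): one hash-set insertion per element.
bool has_duplicate_hashed(const std::vector<int>& values) {
    std::unordered_set<int> seen;
    seen.reserve(values.size());  // avoid rehashing while inserting
    for (int v : values) {
        // insert() returns {iterator, bool}; the bool is false if v was already present.
        if (!seen.insert(v).second) return true;
    }
    return false;
}
```

Both functions take the input by `const&`, which also sidesteps cause 2. Which version wins for small `n` is something to measure rather than assume, since the hash set pays for allocation and hashing.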

2. Profiler guide

| Platform | Tool | Notes |
| --- | --- | --- |
| Linux | perf | Low overhead, stack + HW counters |
| macOS | Instruments | Great UI integration |
| Windows | VS Profiler | Easy CPU sampling |
| Cross | Valgrind/callgrind | Slower, no recompile for some modes |

3. perf (Linux)

perf record -g ./myapp
perf report
perf stat -e cache-misses,cache-references ./myapp

Flame graphs: fold stacks with Brendan Gregg’s FlameGraph scripts for visual hotspots.
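
To have something concrete to point `perf stat -e cache-misses,cache-references` at, the following sketch sums the same matrix in two traversal orders. The `Matrix` type and both function names are assumptions made up for this example. Whether the counters differ noticeably depends on the matrix size relative to your caches and on what the optimizer does with the loops, so treat it as an experiment, not a guaranteed result.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix in one contiguous vector: element (r, c) lives at r * cols + c.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;
    Matrix(std::size_t r, std::size_t c, int fill)
        : rows(r), cols(c), data(r * c, fill) {}
};

// Inner loop walks adjacent elements (stride 1), matching the storage order.
long long sum_row_major(const Matrix& m) {
    long long total = 0;
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            total += m.data[r * m.cols + c];
        }
    }
    return total;
}

// Same result, but the inner loop jumps `cols` elements at a time.
long long sum_column_major(const Matrix& m) {
    long long total = 0;
    for (std::size_t c = 0; c < m.cols; ++c) {
        for (std::size_t r = 0; r < m.rows; ++r) {
            total += m.data[r * m.cols + c];
        }
    }
    return total;
}
```

Calling each function from its own small `main` on a matrix much larger than the last-level cache, then running both binaries under `perf stat`, gives a before/after pair of counter readings to compare.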

4. Visual Studio

Debug → Performance Profiler → CPU Usage — inspect exclusive vs inclusive time and call trees.

5. Ten patterns (titles)

  1. Pass const T& instead of T for large inputs.
  2. Reuse buffers / reserve vectors in loops.
  3. reserve / ostringstream for string assembly.
  4. Prefer unordered_map when average O(1) beats tree map.
  5. SoA for hot fields vs AoS when you touch only part of a struct.
  6. Reduce virtual calls in inner loops (batch by type, CRTP, etc.—design-dependent).
  7. Avoid std::endl in tight loops (forces flush); use '\n'.
  8. Compile regexes once, not per iteration.
  9. Reduce lock contention with local buffers then merge.
  10. Prefer contiguous vector<int> over unique_ptr per element when possible.
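
Pattern 5 is easier to see in code. The sketch below assumes a hypothetical particle record where a hot loop reads only the `x` field; the type and function names are invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// AoS: all fields of one particle sit together, so a loop over x
// also pulls y, z, mass, and id through the cache.
struct ParticleAoS {
    float x, y, z;
    float mass;
    int id;
};

float sum_x_aos(const std::vector<ParticleAoS>& particles) {
    float total = 0.0f;
    for (const ParticleAoS& p : particles) total += p.x;
    return total;
}

// SoA: one contiguous array per field, so a loop over x touches only x.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
    std::vector<int> id;

    void push_back(float px, float py, float pz, float m, int i) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
        mass.push_back(m);
        id.push_back(i);
    }
};

float sum_x_soa(const ParticlesSoA& particles) {
    float total = 0.0f;
    for (float value : particles.x) total += value;
    return total;
}
```

The trade-off is that SoA makes "give me one whole particle" more awkward, so it tends to pay off only where a profile shows a loop touching a small part of a large struct.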

6. Case studies (short)

  • JSON-like string building: reserve cut reallocations → large speedups.
  • N+1 queries: one JOIN vs per-row queries → orders of magnitude.
  • Image filters: raw pixel pointer vs virtual getPixel per pixel → fewer calls.
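
The first case study can be sketched as follows. This is not the original case-study code: `join_as_json_array` is a hypothetical helper, and the size estimate assumes a 32-bit `int` (at most 11 characters plus a separator). The optional counter records how often `capacity()` was seen to change, as a rough proxy for reallocations.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds text like "[1,2,3]". When reserve_first is true, capacity is
// requested once up front. If `reallocations` is non-null, it receives the
// number of checkpoints at which capacity() had changed.
std::string join_as_json_array(const std::vector<int>& values,
                               bool reserve_first,
                               std::size_t* reallocations = nullptr) {
    std::string out;
    if (reserve_first) {
        // Upper bound assuming 32-bit int: 11 chars + ',' per value, plus "[]".
        out.reserve(values.size() * 12 + 2);
    }

    std::size_t growths = 0;
    std::size_t last_capacity = out.capacity();
    auto note_growth = [&] {
        if (out.capacity() != last_capacity) {
            ++growths;
            last_capacity = out.capacity();
        }
    };

    out += '[';
    note_growth();
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out += ',';
        out += std::to_string(values[i]);
        note_growth();
    }
    out += ']';
    note_growth();

    if (reallocations != nullptr) *reallocations = growths;
    return out;
}
```

Comparing the counter for `reserve_first = false` and `true` on a realistic input shows how many growth steps the reservation removes; the actual time saved still has to come from a profile or timing run.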

Five-step process

  1. Measure end-to-end time + profiler trace
  2. Identify top exclusive-time functions
  3. Hypothesize (allocations? copies? cache?)
  4. Change one thing at a time
  5. Re-measure; repeat until goals met
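
For steps 1 and 5, a small timing helper is often enough to get end-to-end numbers alongside the profiler trace. The sketch below is one possible shape, not a benchmarking framework: `median_runtime_us` is a hypothetical name, and it assumes the work is deterministic enough that a median over a few runs is meaningful.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in
// microseconds. Returns 0.0 when runs == 0.
template <typename F>
double median_runtime_us(F&& work, std::size_t runs) {
    if (runs == 0) return 0.0;

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so it is not affected by system clock changes.
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

A typical use is `median_runtime_us([&] { result = process(input); }, 11)` before and after a change, where `process` and `input` stand in for your own code. Make sure the result is actually used afterwards, otherwise an optimizing build may remove the work being timed.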

Summary

Checklist

  • Algorithm class appropriate?
  • Avoid large copies?
  • Hot loops allocation-free after reserve?
  • Cache-friendly traversal?
  • Locks not dominating?

Priority

  1. Algorithmic improvements
  2. Remove copies / tighten interfaces
  3. Allocation reduction
  4. Data layout / cache
  5. Compiler flags last—after correctness and profiling


Keywords

slow C++, profiling, perf, bottleneck, CPU profiler, cache miss

Practical tips

  • Never optimize without a profile on realistic input.
  • Compare before/after with fixed seeds and hardware when possible.
  • Watch self time, not only inclusive time, to pick real hotspots.
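
The "fixed seeds" tip can be sketched like this. `make_benchmark_input` is a hypothetical helper; the point is only that a fixed seed replaces `std::random_device`, so the before and after runs process identical data.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// The same (n, seed) pair always yields the same values.
// The raw std::mt19937 output is specified by the standard, whereas
// distributions such as std::uniform_int_distribution may produce
// different sequences on different standard library implementations.
std::vector<std::uint32_t> make_benchmark_input(std::size_t n, std::uint32_t seed) {
    std::mt19937 rng(seed);  // fixed seed instead of std::random_device
    std::vector<std::uint32_t> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(static_cast<std::uint32_t>(rng()));
    }
    return values;
}
```

Recording the seed and input size next to each measurement makes it possible to rerun exactly the same workload later.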

Closing

“Slow” becomes actionable when a profiler shows where time goes. Fix the algorithm, data layout, and allocations first; micro-optimize only on evidence. Next: the cache-friendly coding and SIMD articles, for when the program is CPU-bound.


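Frequently asked questions (FAQ)

Q. When is this useful in practice?

A. Beyond Big-O: copying, allocations, cache misses, branch mispredictions, virtual calls. Use perf and Visual Studio to fi… In practice, apply the examples and selection guidance from the article above.

Q. What should I read first?

A. Follow the previous-post or related-post links at the bottom of each article to learn in order. The C++ series table of contents shows the overall flow.

Q. How can I study this in more depth?

A. See cppreference and the official documentation for the relevant libraries. The reference links at the end of the article are also useful.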


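Related posts (internal links)

Other posts connected to this topic:

  • [Arrays and Lists](/en/blog/algorithm-series-01-array-list/)
  • C++ Profiling | "I don't know where it's slow": finding bottlenecks with perf and gprof
  • C++ Performance Optimization | "10x faster" practical techniques
  • Writing Cache-Friendly C++ Code | 10x performance gains through memory access patterns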
