My program is slow—where do I start?

Use a profiler: perf (Linux), Instruments (macOS), Visual Studio Profiler (Windows). Find functions with high self time or exclusive %.

My algorithm is O(n)—why is it still slow?

Constants matter: cache behavior, allocations, copies, branches, virtual dispatch. Profile hardware counters (cache misses, branch misses).
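As a rough illustration of how cache behavior changes the constant factor, the sketch below sums the same matrix in two traversal orders. Both functions are O(rows × cols) and return the same value. The function names and the row-major layout are assumptions made for this example, not code from the article.

```cpp
#include <cstddef>
#include <vector>

// `m` holds a rows x cols matrix in row-major order:
// element (r, c) lives at index r * cols + c.

// Walks memory sequentially, so every fetched cache line is fully used.
long long sum_by_rows(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same work and same result, but each step jumps `cols` elements ahead,
// so on large matrices almost every access lands on a different cache line.
long long sum_by_cols(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

Running both under `perf stat -e cache-misses,cache-references` on a matrix much larger than the last-level cache is one way to see the difference in counters rather than guess at it.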

Prefer perf: no special instrumentation rebuild, rich hardware counters. gprof needs -pg and is weaker for multithreaded apps.

The compiler cannot change your algorithm. Fix hotspots found by profiling: data layout, fewer allocations, better containers.

Multithreading made it slower.

Common causes: lock contention, false sharing, tasks too small to amortize scheduling cost, and thread creation overhead. Use perf and ThreadSanitizer (TSan), and run scaling tests across thread counts.
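A minimal sketch of one way to sidestep both lock contention and false sharing: each worker accumulates into a local variable and writes its result once into a padded slot. `PaddedSum` and `parallel_sum` are hypothetical names, the 64-byte cache line is an assumption that varies by CPU, and aligned storage inside `std::vector` relies on C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (common on x86-64, not guaranteed elsewhere).
struct alignas(64) PaddedSum {
    std::uint64_t value = 0;
};

// Sums `data` with `num_threads` workers. No mutex is needed because each
// worker owns one slot, and padding keeps the slots on separate cache lines.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data,
                           unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<PaddedSum> partial(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin =
                std::min(data.size(), static_cast<std::size_t>(t) * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            std::uint64_t local = 0;  // hot loop touches only this local
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t].value = local;  // single write to shared memory
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& p : partial) total += p.value;
    return total;
}
```

Creating threads on every call has its own cost, so for small inputs the single-threaded loop may still win. A scaling test over 1, 2, 4, ... threads is the way to find out.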

Why Is My C++ Program Slow? Find Bottlenecks with Profiling

March 28, 2026 · 19 min read · Updated March 28, 2026 · Intermediate · Problem-solving

Key points of this article

Beyond Big-O: copying, allocations, cache misses, branch mispredictions, virtual calls. Use perf and Visual Studio to find hotspots, flame graphs, and fix patterns.

Complexity basics: [arrays and lists](/en/blog/algorithm-series-01-array-list/) (Big-O intuition alongside profiling).

Introduction: “The code looks correct but it’s slow”

“Same complexity, but slower than Python sometimes”

When C++ feels slow, profiling turns guesses into hotspots: functions and lines that dominate time or hardware events. This article covers:

Seven major causes of slowdown
Choosing a profiler
perf basics (Linux)
Visual Studio Profiler (Windows)
Ten common performance patterns
Case studies and a five-step tuning loop

Seven root causes
Profiler guide
perf (Linux)
Visual Studio Profiler
Ten performance patterns
Case studies
Summary

1. Seven major causes (overview)

Wrong asymptotics (e.g. nested loops vs hash set)
Pass-by-value of large containers
Excessive allocations inside hot loops
Cache-unfriendly access patterns (stride, AoS vs SoA)
Branch-heavy unpredictable control flow
Virtual dispatch on hot inner loops
Inefficient string building (repeated reallocations, excessive flushing)
Each has small code examples and fixes in the original article; the remedy is almost always measure → change data layout or algorithm → measure again.
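To make the first cause concrete, here is a small sketch of the "nested loops vs hash set" contrast, using duplicate detection as an illustrative task (the task and function names are assumptions, not taken from the original article). Both functions take the vector by `const` reference, which also avoids the pass-by-value copy from the second cause.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2): compares every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// Average O(n): a single pass that remembers values already seen.
bool has_duplicate_hashed(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    seen.reserve(v.size());  // avoid rehashing while inserting
    for (int x : v)
        if (!seen.insert(x).second) return true;  // insert failed: already present
    return false;
}
```

For very small inputs the quadratic version can still be faster because it allocates nothing, which is one more reason to measure on realistic sizes.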

2. Profiler guide

Platform	Tool	Notes
Linux	perf	Low overhead, stack + HW counters
macOS	Instruments	Great UI integration
Windows	VS Profiler	Easy CPU sampling
Cross	Valgrind/callgrind	Much slower; no special rebuild needed, though debug symbols (-g) help

3. perf (Linux)

perf record -g ./myapp
perf report
perf stat -e cache-misses,cache-references ./myapp

Flame graphs: pipe perf script output through Brendan Gregg’s FlameGraph scripts (stackcollapse-perf.pl, then flamegraph.pl) to fold the stacks into an SVG that shows hotspots visually.

4. Visual Studio

Debug → Performance Profiler → CPU Usage — inspect exclusive vs inclusive time and call trees.

5. Ten patterns (titles)

Pass const T& instead of T for large inputs.
Reuse buffers / reserve vectors in loops.
reserve / ostringstream for string assembly.
Prefer unordered_map when average O(1) beats tree map.
SoA for hot fields vs AoS when you touch only part of a struct.
Reduce virtual calls in inner loops (batch by type, CRTP, etc.—design-dependent).
Avoid std::endl in tight loops (forces flush); use '\n'.
Compile regexes once, not per iteration.
Reduce lock contention with local buffers then merge.
Prefer contiguous vector<int> over unique_ptr per element when possible.
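The buffer-reuse pattern from the list above can be sketched as follows. The token-counting task, the function names, and the `reserve(64)` figure are all illustrative assumptions. The point is only the contrast between a vector constructed inside the loop and one hoisted out and cleared.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends the start index of each space-separated token in `line` to `out`.
void token_starts(const std::string& line, std::vector<std::size_t>& out) {
    bool in_token = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const bool is_space = (line[i] == ' ');
        if (!is_space && !in_token) out.push_back(i);
        in_token = !is_space;
    }
}

// Fresh vector per line: allocates and frees on every iteration that finds a token.
std::size_t count_tokens_fresh(const std::vector<std::string>& lines) {
    std::size_t total = 0;
    for (const std::string& line : lines) {
        std::vector<std::size_t> starts;
        token_starts(line, starts);
        total += starts.size();
    }
    return total;
}

// Reused vector: clear() keeps the capacity, so later iterations rarely allocate.
std::size_t count_tokens_reused(const std::vector<std::string>& lines) {
    std::size_t total = 0;
    std::vector<std::size_t> starts;
    starts.reserve(64);  // assumed typical token count; tune from real data
    for (const std::string& line : lines) {
        starts.clear();
        token_starts(line, starts);
        total += starts.size();
    }
    return total;
}
```

Whether this matters depends on how hot the loop is. An allocation profile or `perf report` showing `malloc`/`free` near the top is the evidence to look for before applying it.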

6. Case studies (short)

JSON-like string building: reserve cut reallocations → large speedups.
N+1 queries: one JOIN vs per-row queries → orders of magnitude.
Image filters: raw pixel pointer vs virtual getPixel per pixel → fewer calls.
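The image-filter case can be illustrated with a hypothetical interface. `Image`, `GrayImage`, and both brightness functions below are invented for this sketch and are not the code from the case study. One version pays a virtual call per pixel, the other makes one virtual call to obtain a raw pointer and then runs a plain loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical image interface.
struct Image {
    virtual ~Image() = default;
    virtual std::size_t size() const = 0;
    virtual std::uint8_t getPixel(std::size_t i) const = 0;  // per-pixel access
    virtual const std::uint8_t* data() const = 0;            // bulk access
};

struct GrayImage final : Image {
    std::vector<std::uint8_t> pixels;
    explicit GrayImage(std::vector<std::uint8_t> p) : pixels(std::move(p)) {}
    std::size_t size() const override { return pixels.size(); }
    std::uint8_t getPixel(std::size_t i) const override { return pixels[i]; }
    const std::uint8_t* data() const override { return pixels.data(); }
};

// Two virtual calls per iteration (size and getPixel) unless the compiler
// can prove the dynamic type and devirtualize.
std::uint64_t brightness_virtual(const Image& img) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < img.size(); ++i) sum += img.getPixel(i);
    return sum;
}

// Two virtual calls in total, then a tight loop over contiguous memory
// that the compiler is free to unroll or vectorize.
std::uint64_t brightness_raw(const Image& img) {
    const std::uint8_t* p = img.data();
    const std::size_t n = img.size();
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += p[i];
    return sum;
}
```

Exposing a raw pointer ties callers to a contiguous layout, so this trade-off is design-dependent. It is worth making only where the profile shows the per-pixel call dominating.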

Five-step process

Measure end-to-end time + profiler trace
Identify top exclusive-time functions
Hypothesize (allocations? copies? cache?)
Change one thing at a time
Re-measure; repeat until goals met
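For the "measure" and "re-measure" steps, a small timing helper is often enough to get an end-to-end number before reaching for a profiler. The sketch below is one possible shape. `best_time_ns` is a hypothetical helper, and taking the minimum over a few repeats is a design choice assumed here, not something the article prescribes.

```cpp
#include <chrono>
#include <cstdint>

// Runs `fn` `repeats` times and returns the best wall-clock time in
// nanoseconds, or -1 if `repeats` is not positive. The minimum tends to be
// less sensitive to scheduler noise than the mean.
template <typename Fn>
std::int64_t best_time_ns(Fn&& fn, int repeats = 5) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    std::int64_t best = -1;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        fn();
        const auto stop = clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

Call it once on the baseline and once after each single change, with the same input and build flags. The code under test should produce a result that is actually used (for example, accumulated and printed), otherwise an optimizing compiler may remove the work being timed.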

Summary

Checklist

Priority

Algorithmic improvements
Remove copies / tighten interfaces
Allocation reduction
Data layout / cache
Compiler flags last—after correctness and profiling

Related posts (internal)

Profiling deep dive
Performance patterns
Cache-friendly code
Benchmarking

Keywords

slow C++, profiling, perf, bottleneck, CPU profiler, cache miss

Practical tips

Never optimize without a profile on realistic input.
Compare before/after with fixed seeds and hardware when possible.
Watch self time, not only inclusive time, to pick real hotspots.
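One way to apply the fixed-seed tip is to generate benchmark input from a seeded engine, so that before and after runs see identical data. `make_input` is a hypothetical helper. It uses raw `std::mt19937` output with a modulo because the engine's sequence is specified by the standard, whereas distribution classes may produce different values across standard libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds the same pseudo-random input on every run for a given seed.
// Precondition: modulus > 0. The modulo introduces a slight bias, which is
// usually acceptable for benchmark input but not for statistical work.
std::vector<std::uint32_t> make_input(std::size_t n, std::uint32_t seed,
                                      std::uint32_t modulus) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<std::uint32_t>(rng() % modulus));
    return out;
}
```

Synthetic data like this is a complement to realistic input, not a replacement for it. Value distribution affects branch prediction and cache behavior, so a hotspot seen on uniform random data may not match production.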

Closing

“Slow” becomes actionable when a profiler shows where time goes. Fix algorithm + data layout + allocations first; micro-optimize only on evidence. Next: Cache-friendly coding and SIMD articles when CPU-bound.



