Why Is My C++ Program Slow? Find Bottlenecks with Profiling (perf, VS Profiler)

Key takeaways

Measure first: seven common slowdown causes, perf/VS workflows, ten fixable patterns (copies, allocations, AoS vs SoA), and a five-step optimization loop.

Complexity basics: arrays and lists (Big-O intuition alongside profiling).

Introduction: “The code looks correct but it’s slow”

“The Big-O complexity is the same, yet sometimes it’s slower than Python.”

When C++ feels slow, profiling turns guesses into hotspots: functions and lines that dominate time or hardware events.

This article covers:

  • Seven major causes of slowdown
  • Choosing a profiler
  • perf basics (Linux)
  • Visual Studio Profiler (Windows)
  • Ten common performance patterns
  • Case studies and a five-step tuning loop

Table of contents

  1. Seven root causes
  2. Profiler guide
  3. perf (Linux)
  4. Visual Studio Profiler
  5. Ten performance patterns
  6. Case studies
  7. Summary

1. Seven major causes (overview)

  1. Wrong asymptotics (e.g. nested loops vs hash set)
  2. Pass-by-value of large containers
  3. Excessive allocations inside hot loops
  4. Cache-unfriendly access patterns (stride, AoS vs SoA)
  5. Branch-heavy unpredictable control flow
  6. Virtual dispatch on hot inner loops
  7. Inefficient string building (repeated reallocations, excessive flushing)

Each cause comes with a small code example and fix below; the remedy is almost always the same loop: measure → change the data layout or algorithm → measure again.


2. Profiler guide

| Platform | Tool | Notes |
| --- | --- | --- |
| Linux | perf | Low overhead; call stacks + hardware counters |
| macOS | Instruments | Polished UI, Xcode integration |
| Windows | VS Profiler | Easy CPU sampling |
| Cross-platform | Valgrind/Callgrind | Much slower, but no recompilation needed |

3. perf (Linux)

perf record -g ./myapp                                # sample with call-graph capture
perf report                                           # interactive hotspot browser
perf stat -e cache-misses,cache-references ./myapp    # hardware-counter summary

Flame graphs: fold stacks with Brendan Gregg’s FlameGraph scripts for visual hotspots.
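A typical pipeline, assuming the FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) have been cloned from Brendan Gregg's repository and are on your PATH:

```shell
# Sample with call-graph capture (use --call-graph dwarf if frame pointers are omitted)
perf record -g ./myapp
# Fold the raw stacks and render an interactive SVG
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
# Open flame.svg in a browser; the widest frames are your hotspots
```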


4. Visual Studio

Debug → Performance Profiler → CPU Usage — inspect exclusive vs inclusive time and call trees.


5. Ten patterns (titles)

  1. Pass const T& instead of T for large inputs.
  2. Reuse buffers / reserve vectors in loops.
  3. reserve / ostringstream for string assembly.
  4. Prefer unordered_map when average O(1) beats tree map.
  5. SoA for hot fields vs AoS when you touch only part of a struct.
  6. Reduce virtual calls in inner loops (batch by type, CRTP, etc.—design-dependent).
  7. Avoid std::endl in tight loops (forces flush); use '\n'.
  8. Compile regexes once, not per iteration.
  9. Reduce lock contention with local buffers then merge.
  10. Prefer contiguous vector<int> over unique_ptr per element when possible.

6. Case studies (short)

  • JSON-like string building: reserve cut reallocations → large speedups.
  • N+1 queries: one JOIN vs per-row queries → orders of magnitude.
  • Image filters: raw pixel pointer vs virtual getPixel per pixel → fewer calls.

Five-step process

  1. Measure end-to-end time + profiler trace
  2. Identify top exclusive-time functions
  3. Hypothesize (allocations? copies? cache?)
  4. Change one thing at a time
  5. Re-measure; repeat until goals met

Summary

Checklist

  • Algorithm class appropriate?
  • Avoid large copies?
  • Hot loops allocation-free after reserve?
  • Cache-friendly traversal?
  • Locks not dominating?

Priority

  1. Algorithmic improvements
  2. Remove copies / tighten interfaces
  3. Allocation reduction
  4. Data layout / cache
  5. Compiler flags last—after correctness and profiling

Related articles

  • Profiling deep dive
  • Performance patterns
  • Cache-friendly code
  • Benchmarking

Keywords

slow C++, profiling, perf, bottleneck, CPU profiler, cache miss

Practical tips

  • Never optimize without a profile on realistic input.
  • Compare before/after with fixed seeds and hardware when possible.
  • Watch self time, not only inclusive time, to pick real hotspots.

Closing

“Slow” becomes actionable when a profiler shows where time goes. Fix algorithm + data layout + allocations first; micro-optimize only on evidence.

Next: Cache-friendly coding and SIMD articles when CPU-bound.