C++ Performance Optimization: Copies, Allocations, Cache, and SIMD
Key points of this post
Practical techniques for C++ performance: avoid unnecessary copies, reduce allocations, keep data cache-friendly, and use SIMD where it pays off.
1. Avoid unnecessary copies
Pass by value vs reference
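// Three alternative signatures, not overloads meant to coexist.
void process(vector<int> data) { }         // by value: copies the whole vector when passed an lvalue
void process(const vector<int>& data) { }  // const reference: no copy, read-only access
void process(vector<int>& data) { }        // non-const reference: no copy, may modify the caller's vector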
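As a rough guide to choosing among the three signatures above, here is a minimal sketch. The names `total`, `scale_in_place`, and `Inbox::store` are hypothetical and only illustrate the pattern: a const reference for read-only access, a non-const reference for in-place changes, and by-value plus `std::move` when the function keeps the data.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Read-only access: const reference, no copy is made.
long long total(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// In-place modification: non-const reference, the caller sees the change.
void scale_in_place(std::vector<int>& data, int factor) {
    for (int& x : data) x *= factor;
}

// Sink parameter: the function keeps the data, so take it by value and
// move it into place. The caller decides whether to copy or to move.
struct Inbox {
    std::vector<int> stored;
    void store(std::vector<int> data) { stored = std::move(data); }
};
```

A caller that no longer needs its vector can write `inbox.store(std::move(v))` and pay only for a move, while `inbox.store(v)` still compiles and makes exactly one copy.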
Move semantics
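vector<int> createLargeVector() {
    vector<int> v(1000000);
    return v;  // no copy: copy elision (NRVO) or, failing that, an implicit move
}
vector<int> v1 = {1, 2, 3};
vector<int> v2 = std::move(v1);  // v2 takes over v1's buffer; v1 is left valid but unspecified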
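To see the difference between a copy and a move in numbers, one option is a small counting type. `Tracked` and `copy_then_move` are hypothetical helpers written for this illustration, and the sketch assumes C++17 for the `static inline` counters.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper type that counts how often it is copied or moved.
struct Tracked {
    static inline int copies = 0;
    static inline int moves = 0;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++moves; return *this; }
};

// Copies one element into a vector, moves another one in, and reports
// the counters as {copies, moves}.
std::pair<int, int> copy_then_move() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> items;
    items.reserve(2);               // keep reallocation moves out of the counts
    Tracked a;
    Tracked b;
    items.push_back(a);             // lvalue argument: copy constructor
    items.push_back(std::move(b));  // rvalue argument: move constructor
    return {Tracked::copies, Tracked::moves};
}
```

For a type that owns a large buffer, the copy duplicates every element while the move only transfers a few pointers, which is why the distinction matters for containers such as `vector`.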
2. Memory allocation
reserve to reduce reallocations
vector<int> v;
v.reserve(1000);  // one allocation up front instead of repeated growth
for (int i = 0; i < 1000; i++) {
    v.push_back(i);
}
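The effect of `reserve` can be observed by watching the capacity change while appending. The function name `count_reallocations` below is an assumption made for this sketch, not a standard facility.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times push_back changed the capacity (that is, reallocated)
// while appending n ints, with or without an up-front reserve.
int count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);
    int reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, the count is zero. Without it, the exact number depends on the growth factor of the standard library in use, so treat any specific figure as implementation-dependent.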
Object pool (sketch)
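template <typename T>
class ObjectPool {
private:
    vector<unique_ptr<T>> pool;  // objects currently available for reuse
public:
    // Returns a raw pointer that the caller owns until it is passed back to release().
    T* acquire() {
        if (pool.empty()) {
            return new T();  // nothing to reuse: allocate a fresh object
        }
        T* obj = pool.back().release();
        pool.pop_back();
        return obj;
    }
    void release(T* obj) {
        pool.push_back(unique_ptr<T>(obj));  // the pool takes ownership again
    }
};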
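One weakness of handing out raw pointers is that an object never passed back to the pool is leaked. A possible variant, sketched here under the hypothetical name `OwningPool`, hands out `std::unique_ptr` instead so that forgotten objects are still destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Variant of the pool sketch that hands out std::unique_ptr, so an object
// that is never returned is destroyed rather than leaked.
template <typename T>
class OwningPool {
public:
    std::unique_ptr<T> acquire() {
        if (free_.empty()) {
            return std::make_unique<T>();   // pool is empty: allocate
        }
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;                         // reuse a pooled object
    }

    void release(std::unique_ptr<T> obj) {
        if (obj) free_.push_back(std::move(obj));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<T>> free_;
};
```

Reused objects keep whatever state they had when they were released, so callers usually need to reset them after `acquire()`.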
3. Cache-friendly code
Data locality
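struct Bad {
    int id;
    char padding[60];  // cold bytes between the two hot fields
    int value;         // 64 bytes after id with a 4-byte int, so on a different 64-byte cache line
};
struct Good {
    int id;
    int value;  // both hot fields sit next to each other
};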
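The cost of the larger layout shows up in how many objects fit into one cache line. The sketch below repeats the two structs so it is self-contained; `per_cache_line` is a hypothetical helper, and the 64-byte line size is an assumption that holds on many but not all CPUs.

```cpp
#include <cassert>
#include <cstddef>

struct Bad {
    int id;
    char padding[60];
    int value;
};

struct Good {
    int id;
    int value;
};

// How many whole objects of type T fit in one cache line.
// The default of 64 bytes is an assumption; it is common but not universal.
template <typename T>
constexpr std::size_t per_cache_line(std::size_t line_bytes = 64) {
    return line_bytes / sizeof(T);
}

static_assert(sizeof(Good) < sizeof(Bad), "Good should be the smaller layout");
```

On a platform with a 4-byte `int`, `Bad` occupies 68 bytes and `Good` 8 bytes, so an array of `Good` packs several elements into each line while every `Bad` element spans more than one.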
Matrix traversal
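int matrix[1000][1000];  // about 4 MB with a 4-byte int: make it static or global, not a stack local
// Row-major order: the inner loop walks j, so memory is touched sequentially.
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
        matrix[i][j] = 0;
    }
}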
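To compare traversal orders on the same data, a minimal sketch could store the matrix in one contiguous buffer and sum it both ways. `Matrix`, `sum_row_major`, and `sum_column_major` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major storage of a rows x cols matrix in one contiguous buffer.
// Every element starts at 1 so that the sums are easy to check.
struct Matrix {
    std::size_t rows, cols;
    std::vector<int> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 1) {}
    int at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Inner loop walks j: consecutive addresses, cache friendly.
long long sum_row_major(const Matrix& m) {
    long long sum = 0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            sum += m.at(i, j);
    return sum;
}

// Inner loop walks i: each access jumps ahead by `cols` ints.
long long sum_column_major(const Matrix& m) {
    long long sum = 0;
    for (std::size_t j = 0; j < m.cols; ++j)
        for (std::size_t i = 0; i < m.rows; ++i)
            sum += m.at(i, j);
    return sum;
}
```

Both functions return the same value. Only the memory access pattern differs, and how much that matters depends on the matrix size and the cache hierarchy, so it is worth timing both on the target machine.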
4. Compiler optimizations
Inline / constexpr
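// inline permits the definition to appear in several translation units;
// whether the call is actually inlined is the optimizer's decision.
inline int add(int a, int b) {
    return a + b;
}
// constexpr allows evaluation at compile time when the arguments are constant expressions.
constexpr int add_ct(int a, int b) {
    return a + b;
}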
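A `constexpr` function is only guaranteed to run at compile time when its result is required in a constant expression. The sketch below forces that with `static_assert` and a `constexpr` variable; `make_squares` and `kSquares` are hypothetical names, and the loop inside a `constexpr` function writing to `std::array` assumes C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int add_ct(int a, int b) { return a + b; }

// Used in a constant expression, so the compiler must evaluate the call.
static_assert(add_ct(2, 3) == 5, "evaluated at compile time");

// Hypothetical helper: builds a table of squares at compile time.
template <std::size_t N>
constexpr std::array<int, N> make_squares() {
    std::array<int, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = static_cast<int>(i * i);
    }
    return out;
}

// The whole table is computed by the compiler and stored as constant data.
constexpr auto kSquares = make_squares<8>();
static_assert(kSquares[7] == 49, "table is available at compile time");
```

Called with runtime arguments, the same functions still work as ordinary functions, so marking them `constexpr` costs nothing when the inputs are not constants.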
Flags
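g++ -O0                # no optimization (the default), best for debugging
g++ -O1                # basic optimizations
g++ -O2                # common choice for release builds
g++ -O3                # more aggressive optimization, such as extra inlining and vectorization
g++ -O3 -march=native  # also target the build machine's CPU; the binary may not run on older CPUs
g++ -O3 -flto          # link-time optimization across translation units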
Examples
Example 1: String concatenation
#include <iostream>
#include <string>
#include <sstream>
#include <chrono>
using namespace std;
string concat1(int n) {
    string result;
    for (int i = 0; i < n; i++) {
        result += to_string(i);
    }
    return result;
}
string concat2(int n) {
    ostringstream oss;
    for (int i = 0; i < n; i++) {
        oss << i;
    }
    return oss.str();
}
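Neither version is guaranteed to win. Repeated += is amortized constant time per append and is often faster than ostringstream, which carries stream and locale overhead, so measure both on your workload. When the final size is roughly known, calling reserve on the string first avoids intermediate reallocations.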
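A third variant worth including in such a comparison reserves capacity before appending. The function name `concat_reserved` and the estimate of 6 characters per number are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the decimal form of 0..n-1 after reserving capacity up front.
// The estimate of 6 characters per number is an assumption for
// illustration; a poor estimate only costs a few extra reallocations.
std::string concat_reserved(int n) {
    std::string result;
    if (n > 0) result.reserve(static_cast<std::size_t>(n) * 6);
    for (int i = 0; i < n; ++i) {
        result += std::to_string(i);
    }
    return result;
}
```

The output is identical to the other two versions, so the three can be swapped freely inside a timing loop.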
Example 2: Lookup table
#include <iostream>
#include <cmath>
#include <chrono>
using namespace std;
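double slow(int x) {
    return sin(x * 0.01);
}
class FastSin {
private:
    static constexpr int SIZE = 360;
    double table[SIZE];
public:
    FastSin() {
        for (int i = 0; i < SIZE; i++) {
            table[i] = sin(i * 0.01);  // precompute slow(i) once
        }
    }
    // The table covers 0 <= x < SIZE; outside that range, fall back to sin().
    // Wrapping with x % SIZE would be wrong: sin(x * 0.01) does not repeat every SIZE steps.
    double get(int x) const {
        return (x >= 0 && x < SIZE) ? table[x] : sin(x * 0.01);
    }
};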
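Wrapping the index is only valid when the table covers exactly one period of the function. One way to get that property is to index by whole degrees, as in this sketch; `DegreeSin` is a hypothetical class and a different design from the radian-step table above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Sine lookup indexed by whole degrees. The 360 entries cover exactly one
// period, so wrapping the index is mathematically valid.
class DegreeSin {
public:
    DegreeSin() {
        const double pi = std::acos(-1.0);
        for (int d = 0; d < kSize; ++d) {
            table_[d] = std::sin(d * pi / 180.0);
        }
    }

    double get(int degrees) const {
        int idx = degrees % kSize;   // may be negative for negative input
        if (idx < 0) idx += kSize;   // bring the index into [0, kSize)
        return table_[idx];
    }

private:
    static constexpr int kSize = 360;
    std::array<double, kSize> table_{};
};
```

The trade-off is resolution: this table only answers for integer degrees, so inputs between entries would need interpolation or a finer table.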
Example 3: SIMD
#include <immintrin.h>
#include <iostream>
using namespace std;
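void add_scalar(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
// AVX version: requires AVX to be enabled at compile time (for example -mavx or -march=native).
void add_simd(const float* a, const float* b, float* c, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 floats per 256-bit register
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vc = _mm256_add_ps(va, vb);
        _mm256_storeu_ps(&c[i], vc);
    }
    for (; i < n; i++) {  // scalar tail when n is not a multiple of 8
        c[i] = a[i] + b[i];
    }
}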
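AVX intrinsics only compile when the target supports them, so portable code often guards the vector path. The sketch below, with the hypothetical name `add_arrays`, relies on the `__AVX__` macro that GCC, Clang, and MSVC define when AVX code generation is enabled, and falls back to the scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Adds a[i] + b[i] into c[i] for any n. Uses AVX when the compiler was
// invoked with AVX enabled and plain scalar code otherwise.
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) {  // remaining elements, or the whole array without AVX
        c[i] = a[i] + b[i];
    }
}
```

This is a compile-time choice. Selecting the path at run time based on the CPU the program happens to run on needs a separate dispatch mechanism, which is outside the scope of this sketch. At -O2 or -O3 the compiler may also vectorize the scalar loop on its own, so compare against that baseline before keeping hand-written intrinsics.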
Profiling tools
gprof
g++ -pg program.cpp -o program
./program
gprof program gmon.out > analysis.txt
Valgrind Callgrind
valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*
perf
perf record ./program
perf report
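Once a profiler has pointed at a hotspot, a small in-process timer helps compare candidate fixes. This is a minimal sketch using `std::chrono::steady_clock`; the helper name `best_time_us` and the default of 5 repeats are assumptions, and it complements the profilers above rather than replacing them.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `repeats` times and returns the fastest wall-clock time in
// microseconds. Taking the minimum reduces noise from other processes.
// Returns -1.0 if repeats is not positive.
template <typename F>
double best_time_us(F&& work, int repeats = 5) {
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double, std::micro> elapsed =
            clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
    }
    return best;
}
```

A call such as `best_time_us([&] { result = concat1(100000); })` times one candidate. Make sure the result is actually used afterwards, otherwise the optimizer may remove the work being measured.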
Optimization checklist
Algorithms
- Complexity (e.g. O(n²) → O(n log n))
- Remove redundant work
- Right data structure
Memory
- reserve where needed
- Fewer copies
- Moves where appropriate
Compiler
- -O2 or -O3
- inline/constexpr where it helps
- Consider LTO
Cache
- Locality
- Sequential access
- Struct padding awareness
Parallelism
- Threads where appropriate
- SIMD
- GPU when applicable
Common mistakes
Mistake 1: Premature micro-optimization
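// Write the arithmetic plainly; the compiler already picks the cheapest instructions.
// Hand-written shifts such as (a << 1) + (b >> 2) rarely help, and b >> 2 differs from b / 4 for negative b.
int x = a * 2 + b / 4;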
Mistake 2: Optimizing without profiling
1. Profile
2. Optimize hotspots
3. Profile again
Mistake 3: Chasing micro-opts before algorithms
Algorithm > data structures > line-level tweaks
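As an illustration of why the algorithm comes first, here are two ways to detect a duplicate in a list of integers. The function names are hypothetical, and the example is a sketch of the general point rather than a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// O(n^2): compares every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n): sorts a copy, after which equal values are adjacent.
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}
```

No amount of line-level tuning of the quadratic version will catch up with the sorted version on large inputs, although for very small inputs the simpler loop may still be faster.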
FAQ
Q1: When to optimize?
A: After profiling shows a real bottleneck; verify with measurements.
Q2: Biggest wins?
A: Algorithmic improvements (e.g. better asymptotic complexity).
Q3: Trust the compiler?
A: Yes, for most local optimizations, but still measure hot paths.
Q4: Performance vs readability?
A: Prefer readability; optimize proven bottlenecks.
Q5: Profiling tools?
A: Linux: perf, Valgrind; Windows: VS Profiler; cross-platform: Tracy.
Q6: Resources?
A: Optimized C++ by Kurt Guntheroth, CppCon talks, Compiler Explorer.
See also
- C++ alignment and padding
- C++ profiling
- C++ profiling series
Practical tips
Debugging
- Warnings first
- Small repro
Performance
- Profile before optimizing
- Define metrics
Code review
- Conventions
Checklist
Before coding
- Right technique?
- Maintainable?
- Meets requirements?
While coding
- Warnings?
- Edge cases?
- Errors?
At review
- Clear?
- Tests?
- Docs?