C++ Performance Optimization: Copies, Allocations, Cache, and SIMD

March 12, 2026 · 13 min read · Updated March 12, 2026 · Advanced Tutorial

Key takeaway

Practical C++ performance techniques: avoid unnecessary copies, allocate less often, keep data cache-friendly, and use SIMD where it pays off.

1. Avoid unnecessary copies

Pass by value vs reference

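// By value: copies the whole vector on every call with an lvalue argument.
void processCopy(vector<int> data) { }

// By const reference: no copy, read-only access. The usual choice for large inputs.
void processReadOnly(const vector<int>& data) { }

// By non-const reference: no copy, and the caller sees any modifications.
void processInPlace(vector<int>& data) { }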

Move semantics

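// Returning a local by value is cheap: the compiler elides the copy (NRVO) or moves.
// Do not write `return std::move(v);`, which can prevent that elision.
vector<int> createLargeVector() {
    vector<int> v(1000000);
    return v;
}

vector<int> v1 = {1, 2, 3};
// Transfers v1's buffer to v2 without copying the elements.
// v1 stays valid but in an unspecified state; do not rely on its contents.
vector<int> v2 = std::move(v1);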

2. Memory allocation

`reserve` to reduce reallocations

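vector<int> v;
v.reserve(1000);  // one allocation up front instead of repeated regrowth
for (int i = 0; i < 1000; i++) {
    v.push_back(i);  // never reallocates: size stays within the reserved capacity
}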
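
To see what reserve saves, one option is to count how often the capacity changes while filling a vector. The following is a minimal sketch; the function name count_reallocations is illustrative and not part of the original example, and the exact count without reserve depends on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many times push_back changed the capacity, i.e. reallocated.
// Hypothetical helper for illustration only.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // capacity is now at least n
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {  // buffer was regrown
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set, the count is zero for any n. Without it, the count grows roughly logarithmically with n on implementations that grow the buffer geometrically.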

Object pool (sketch)

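#include <memory>
#include <utility>
#include <vector>
using namespace std;

template <typename T>
class ObjectPool {
private:
    vector<unique_ptr<T>> pool;  // idle objects owned by the pool

public:
    // Hand out an idle object, or allocate a new one if the pool is empty.
    // Returning unique_ptr keeps ownership explicit, so nothing leaks if the caller forgets release().
    unique_ptr<T> acquire() {
        if (pool.empty()) {
            return make_unique<T>();
        }
        unique_ptr<T> obj = std::move(pool.back());
        pool.pop_back();
        return obj;
    }

    // Give an object back for reuse. Its state is not reset here.
    void release(unique_ptr<T> obj) {
        pool.push_back(std::move(obj));
    }
};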

3. Cache-friendly code

Data locality

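// id and value sit 64 bytes apart (with a 4-byte int), so on a CPU with
// 64-byte cache lines reading both always touches two lines.
struct Bad {
    int id;
    char padding[60];
    int value;
};

// Fields that are used together are adjacent: 8 bytes in total,
// so one cache-line fetch normally brings in both.
struct Good {
    int id;
    int value;
};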
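
Layout claims like these can be checked rather than guessed. The sketch below uses offsetof to test whether two fields fall in the same cache line. The names Padded, Packed, and same_cache_line are illustrative stand-ins, and the 64-byte line size and the line-aligned object start are assumptions that hold on common x86-64 hardware but are not guaranteed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins mirroring the two layouts above.
struct Padded {
    int id;
    char padding[60];
    int value;
};

struct Packed {
    int id;
    int value;
};

// True if two byte offsets inside an object fall in the same cache line,
// assuming the object itself starts on a line boundary.
bool same_cache_line(std::size_t offset_a, std::size_t offset_b,
                     std::size_t line_bytes = 64) {
    assert(line_bytes > 0);
    return offset_a / line_bytes == offset_b / line_bytes;
}

bool padded_fields_share_line() {
    return same_cache_line(offsetof(Padded, id), offsetof(Padded, value));
}

bool packed_fields_share_line() {
    return same_cache_line(offsetof(Packed, id), offsetof(Packed, value));
}
```

On a platform with a 4-byte int, padded_fields_share_line() returns false and packed_fields_share_line() returns true.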

Matrix traversal

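int matrix[1000][1000];  // row-major: matrix[i][0..999] are contiguous in memory

// Row index in the outer loop, column index in the inner loop:
// memory is visited sequentially. Swapping the loops would jump
// 1000 ints between consecutive accesses.
for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
        matrix[i][j] = 0;
    }
}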
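
To compare traversal orders on your own machine, the two orders can be written side by side over the same data. This is a sketch under assumptions: the matrix is stored as a flat row-major std::vector, and the function names are illustrative. Both functions return the same sum; only the memory access pattern differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major traversal: the inner loop walks adjacent elements.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            total += m[i * cols + j];
        }
    }
    return total;
}

// Column-major traversal of the same row-major data:
// consecutive accesses are `cols` elements apart.
long long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            total += m[i * cols + j];
        }
    }
    return total;
}
```

For matrices that fit in cache the two usually perform alike. The gap tends to appear once a row of the matrix no longer fits in the cache levels closest to the core, so time both with realistic sizes.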

4. Compiler optimizations

Inline / constexpr

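// inline lets the definition appear in several translation units (e.g. in a header).
// It is only a hint for actual inlining; the optimizer makes the final call.
inline int add(int a, int b) {
    return a + b;
}

// constexpr allows evaluation at compile time when the arguments are constant expressions.
constexpr int add_ct(int a, int b) {
    return a + b;
}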
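
A constexpr function only pays off when it is used in a context that requires a constant expression. The sketch below shows two such contexts, a static_assert and an array size, plus an ordinary runtime call. The function names are illustrative, and the overflow bound in square_checked assumes a 32-bit int.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int square(int x) { return x * x; }

// Evaluated during compilation: a wrong result would be a build error.
static_assert(square(12) == 144, "square must work in constant expressions");

// The array size is a template argument, so square(4) must be computed at compile time.
std::array<int, square(4)> make_squares() {
    std::array<int, square(4)> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = square(static_cast<int>(i));
    }
    return table;
}

// The same function is also callable at run time.
// The bound guards against signed overflow, assuming a 32-bit int.
int square_checked(int x) {
    assert(x >= -46340 && x <= 46340);
    return square(x);
}
```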

Flags

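g++ -O0                  # no optimization (the default); fastest builds, easiest debugging
g++ -O1                  # basic optimizations
g++ -O2                  # common choice for release builds
g++ -O3                  # more aggressive inlining and vectorization; not always faster than -O2
g++ -O3 -march=native    # also use every instruction set of the build machine; the binary may not run on other CPUs
g++ -O3 -flto            # link-time optimization across translation units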

Examples

Example 1: String concatenation

#include <iostream>
#include <string>
#include <sstream>
#include <chrono>
using namespace std;

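// Repeated +=: the string grows its buffer geometrically as needed.
string concat1(int n) {
    string result;
    for (int i = 0; i < n; i++) {
        result += to_string(i);
    }
    return result;
}

// Stream-based: formats directly into the stream's buffer; str() then copies it out.
string concat2(int n) {
    ostringstream oss;
    for (int i = 0; i < n; i++) {
        oss << i;
    }
    return oss.str();
}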

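Neither version is quadratic: += grows the buffer geometrically, so appends are amortized constant time. In practice += is often at least as fast as ostringstream, which adds stream and locale overhead and copies the buffer in str(). Which one wins depends on the standard library and the workload, so time both.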
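
When the final length can be estimated, reserving capacity once removes most of the remaining reallocations from the += version. A minimal sketch follows; concat_reserved and its digits_hint parameter are hypothetical names introduced here, and the default of 6 characters per number is only a guess to tune for your data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the decimal text of 0..n-1, reserving capacity once up front.
// digits_hint is an estimate of characters per number (an assumption, not a limit).
std::string concat_reserved(int n, std::size_t digits_hint = 6) {
    assert(n >= 0);
    std::string result;
    result.reserve(static_cast<std::size_t>(n) * digits_hint);
    for (int i = 0; i < n; ++i) {
        result += std::to_string(i);
    }
    return result;
}
```

An estimate that is too low is harmless: the string simply falls back to normal growth. An estimate that is too high only wastes memory.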

Example 2: Lookup table

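#include <iostream>
#include <cmath>
#include <chrono>
using namespace std;

// Assumption: x is an angle in whole degrees, so a 360-entry table spans one full period.
constexpr double DEG_TO_RAD = 3.14159265358979323846 / 180.0;

double slow(int x) {
    return sin(x * DEG_TO_RAD);
}

class FastSin {
private:
    static constexpr int SIZE = 360;  // one entry per degree
    double table[SIZE];

public:
    FastSin() {
        for (int i = 0; i < SIZE; i++) {
            table[i] = sin(i * DEG_TO_RAD);
        }
    }

    // Wrap x into [0, SIZE); the second modulo keeps negative x in range.
    double get(int x) const {
        int idx = ((x % SIZE) + SIZE) % SIZE;
        return table[idx];
    }
};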

Example 3: SIMD

#include <immintrin.h>
#include <iostream>
using namespace std;

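void add_scalar(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

// AVX version: 8 floats per iteration. Build with -mavx (or -march=native).
void add_simd(const float* a, const float* b, float* c, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);  // unaligned load: no alignment requirement
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vc = _mm256_add_ps(va, vb);
        _mm256_storeu_ps(&c[i], vc);
    }
    // Scalar tail for the last n % 8 elements, so nothing is read or written past the end.
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}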

Profiling tools

gprof

g++ -pg program.cpp -o program
./program
gprof program gmon.out > analysis.txt

Valgrind Callgrind

valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

perf

perf record ./program
perf report

Optimization checklist

Algorithms

Complexity (e.g. O(n²) → O(n log n))
Remove redundant work
Right data structure

Memory

reserve where needed
Fewer copies
Moves where appropriate

Compiler

-O2 or -O3
inline / constexpr where it helps
Consider LTO

Cache

Locality
Sequential access
Struct padding awareness

Parallelism

Threads where appropriate
SIMD
GPU when applicable

Common mistakes

Mistake 1: Premature micro-optimization

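Write the readable form and let the optimizer handle strength reduction. A hand-written (a << 1) + (b >> 2) is no faster and gives different results for negative b.

int x = a * 2 + b / 4;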

Mistake 2: Optimizing without profiling

1. Profile a realistic workload
2. Optimize the hotspots the profile shows
3. Profile again to confirm the change helped
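
For the "profile again" step, a small timing helper is often enough to confirm a change on one hot function. The sketch below is one possible shape, not a replacement for a real profiler; best_time_us is a hypothetical name, and taking the minimum over several runs is a design choice to reduce noise from other processes.

```cpp
#include <cassert>
#include <chrono>

// Run fn `repeats` times and return the fastest wall-clock time in microseconds.
template <typename Fn>
double best_time_us(Fn&& fn, int repeats = 5) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock adjustments
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        auto start = clock::now();
        fn();
        auto stop = clock::now();
        double us = std::chrono::duration<double, std::micro>(stop - start).count();
        if (best < 0.0 || us < best) {
            best = us;
        }
    }
    return best;
}
```

Make sure the measured code produces a result that is actually used (for example, accumulated into a variable that is printed afterwards). Otherwise the optimizer may remove the work and the timing becomes meaningless.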

Mistake 3: Chasing micro-opts before algorithms

Order of payoff: algorithm choice first, then data structures, then line-level tweaks.
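
As an illustration of why the algorithm comes first, here is duplicate detection written two ways: an O(n²) pairwise comparison and an O(n log n) sort-based version. This is a sketch with illustrative function names. The wrapper has_duplicate cross-checks the two in debug builds, which is itself an assumption about how you might validate such a rewrite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// O(n^2): compare every pair.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (v[i] == v[j]) return true;
        }
    }
    return false;
}

// O(n log n): sort a private copy, then equal values are adjacent.
// Taken by value on purpose: the function needs its own copy to sort.
bool has_duplicate_sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}

// Fast version, cross-checked against the simple one when assertions are enabled.
bool has_duplicate(const std::vector<int>& v) {
    bool result = has_duplicate_sorted(v);
    assert(result == has_duplicate_quadratic(v));
    return result;
}
```

No amount of tuning inside the quadratic loop closes the gap for large inputs, which is the point of ranking algorithms above line-level tweaks. Build with -DNDEBUG for measurements so the cross-check is compiled out.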

FAQ

Q1: When to optimize?

A: After profiling shows a real bottleneck; verify with measurements.

Q2: Biggest wins?

A: Algorithmic improvements (e.g. better asymptotic complexity).

Q3: Trust the compiler?

A: Yes for most local optimizations, but still measure hot paths.

Q4: Performance vs readability?

A: Prefer readability; optimize proven bottlenecks.

Q5: Profiling tools?

A: Linux: perf, Valgrind; Windows: VS Profiler; cross-platform: Tracy.

Q6: Resources?

A: Optimized C++ by Kurt Guntheroth, CppCon talks, Compiler Explorer.

Practical tips

Debugging

Warnings first
Small repro

Performance

Profile before optimizing
Define metrics

Code review

Conventions

Checklist

Before coding

Right technique?
Maintainable?
Meets requirements?

While coding

Warnings?
Edge cases?
Errors?

At review

Clear?
Tests?
Docs?

Keywords

C++, performance, optimization, SIMD, profiling, move semantics.

C++ algorithm sort
C++ alignment and padding
C++ benchmarking
C++ cache optimization
C++ string vs string_view