Cache-Friendly C++: Data-Oriented Design Guide

Key takeaways

Hardware-aware layout: SoA vs AoS, 64-byte cache lines, false sharing, and profiling with perf.

Introduction: cache decides throughput

Modern CPUs are often memory bound. Data-oriented design (DoD) lays out data for sequential access and SIMD. Structure-of-arrays (SoA) often beats array-of-structures (AoS) when loops touch few fields of many objects. False sharing kills parallel scaling unless you pad or align per-thread counters to separate cache lines (~64 bytes).

This article covers: DoD, cache lines, alignas, false sharing, scenarios, AoS→SoA examples, pitfalls, benchmarks, engine/simulation patterns.


Table of contents

  1. Why cache optimization matters
  2. Data-oriented design
  3. Cache lines & alignment
  4. False sharing & padding
  5. Complete examples
  6. Common mistakes
  7. Benchmarks
  8. Production patterns
  9. Summary

1. Why cache optimization matters

  • 100k entities: an AoS position-only update still drags velocity/color/id through the cache → wasted bandwidth.
  • More threads, slower code: false sharing on adjacent counters wrecks scaling.
  • Loops won’t vectorize: AoS scatters x values across large strides, defeating SIMD.

2. Data-oriented design

```mermaid
flowchart TB
    subgraph AoS["AoS"]
        E1["Entity0: pos, vel, id"]
        E2["Entity1: ..."]
    end
    subgraph SoA["SoA"]
        X["x[]"]
        Y["y[]"]
        Z["z[]"]
    end
    AoS -->|"position-only loop"| Waste["Loads unused fields"]
    SoA -->|"position-only loop"| Hit["Sequential x,y,z"]
```

Rule of thumb: thousands+ entities, field-specific hot loops, SIMD → SoA. Small counts (<~100–1000) may favor simpler AoS.
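To make the rule of thumb concrete, here is a minimal sketch of the two layouts and a field-specific hot loop; the names (EntityAoS, EntitiesSoA, update_positions) are illustrative, not from the series:

```cpp
#include <cstddef>
#include <vector>

// AoS: a position-only update still pulls vx/vy/vz/id through the cache.
struct EntityAoS { float x, y, z; float vx, vy, vz; int id; };

// SoA: the same loop touches only the contiguous x/y/z/vx/vy/vz arrays.
struct EntitiesSoA {
    std::vector<float> x, y, z, vx, vy, vz;
    std::vector<int> id;
    explicit EntitiesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n), id(n) {}
    std::size_t size() const { return x.size(); }
};

// Hot loop: sequential loads over contiguous arrays, trivially
// auto-vectorizable by the compiler.
void update_positions(EntitiesSoA& e, float dt) {
    for (std::size_t i = 0; i < e.size(); ++i) {
        e.x[i] += e.vx[i] * dt;
        e.y[i] += e.vy[i] * dt;
        e.z[i] += e.vz[i] * dt;
    }
}
```

With SoA, the id array is never loaded during a position update; with AoS, every 4-byte id rides along in the same cache line as the floats.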


3. Cache lines & alignment

Cache lines are typically 64 bytes. Use alignas(64) to give hot atomics/counters their own line. Prefer std::hardware_destructive_interference_size (C++17, in <new>) over a hard-coded 64 when your standard library provides it.
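A minimal sketch of line-sized alignment, falling back to 64 bytes where the C++17 constant is unavailable (kLine and PaddedCounter are illustrative names):

```cpp
#include <atomic>
#include <cstddef>
#include <new>  // std::hardware_destructive_interference_size (C++17)

// Fall back to the common 64-byte line when the library lacks the constant.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kLine = 64;
#endif

// alignas pads the struct out to a full line, so adjacent counters in an
// array never share a line.
struct alignas(kLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == kLine, "line-aligned");
static_assert(sizeof(PaddedCounter) % kLine == 0, "padded to whole lines");
```

The static_asserts document the invariant: two PaddedCounter objects placed back to back land on different cache lines.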


4. False sharing & padding

Independent variables on the same cache line invalidate each other across cores. Fix with line-sized padding or per-thread shards.
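A minimal sketch of the padded-counter fix for parallel increments; Slot and count_parallel are illustrative names, and 64 bytes is the assumed line size:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kLineSize = 64;  // assumed typical cache-line size

// Bad: std::atomic<long> counters[N]; -- adjacent counters share a line,
// so every increment on one core invalidates the others' cached copy.

// Good: one line per counter.
struct alignas(kLineSize) Slot { std::atomic<long> n{0}; };

long count_parallel(std::size_t threads, long iters) {
    std::vector<Slot> slots(threads);   // C++17 aligned new honors alignas
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t)
        pool.emplace_back([&slots, t, iters] {
            for (long i = 0; i < iters; ++i)
                slots[t].n.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    long total = 0;
    for (auto& s : slots) total += s.n.load();
    return total;
}
```

Each thread writes only its own line; the single merge at the end is the only cross-thread traffic.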


5. Complete examples

This series also walks through a full particle AoS vs SoA benchmark and padded atomic counters for parallel increments; adapt the code and comments to your codebase.
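The series' full benchmark is not reproduced here; a compressed sketch of its shape, assuming illustrative names (ParticleAoS, ParticlesSoA, step_aos/step_soa, time_ms):

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

struct ParticleAoS { float x, y, z, vx, vy, vz, mass, life; };

struct ParticlesSoA {
    std::vector<float> x, vx;  // only the fields the hot loop needs
    explicit ParticlesSoA(std::size_t n) : x(n, 0.0f), vx(n, 1.0f) {}
};

// Identical arithmetic in both layouts; only the memory traffic differs.
void step_aos(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) p.x += p.vx * dt;   // 32-byte stride per particle
}
void step_soa(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.x.size(); ++i)
        ps.x[i] += ps.vx[i] * dt;          // dense, sequential floats
}

// Tiny timing helper; call each step in a loop and compare wall time.
template <class F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

At large particle counts the SoA step reads roughly a quarter of the bytes the AoS step does for this loop, which is where the speedup comes from.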


6. Common mistakes

  • SoA index mismatch after partial deletes: use swap-with-last across all arrays.
  • Over-padding everything: only hot, frequently written fields need line isolation.
  • SoA with random indices loses locality: sort/pack active entities.
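The first mistake above is worth a sketch: swap-with-last deletion must touch every parallel array, or indices silently desynchronize (SoA and remove are illustrative names):

```cpp
#include <cstddef>
#include <vector>

struct SoA {
    std::vector<float> x, y;
    std::vector<int> id;

    // Remove element i in O(1): overwrite slot i with the LAST element
    // in EVERY array, then pop each array. Forgetting even one array
    // leaves slot i holding stale data for that field.
    void remove(std::size_t i) {
        const std::size_t last = x.size() - 1;
        x[i] = x[last];   x.pop_back();
        y[i] = y[last];   y.pop_back();
        id[i] = id[last]; id.pop_back();
        // Note for callers: the entity formerly at `last` now lives at `i`,
        // so any external index pointing at `last` must be remapped.
    }
};
```

This keeps the arrays dense (no holes), which preserves the sequential-access property that made SoA fast in the first place.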

7. Benchmarks

Measure with perf stat -e cache-misses,cache-references, and always benchmark Release (-O3 -march=native) builds; Debug builds hide layout effects behind unoptimized loads.
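A typical invocation, assuming a hypothetical bench.cpp containing the hot loop (the miss ratio is cache-misses divided by cache-references):

```shell
# Release build; -march=native enables the host CPU's full SIMD ISA.
g++ -O3 -march=native -DNDEBUG bench.cpp -o bench

# Hardware counters for the run; compare AoS vs SoA variants.
perf stat -e cache-misses,cache-references,cycles,instructions ./bench
```

Run each variant several times and compare the miss ratio, not just wall time; frequency scaling can mask layout differences.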


8. Production patterns

ECS-style component arrays, batch processing, blocked matrix multiply, hybrid AoSoA blocks.
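A minimal sketch of the hybrid AoSoA idea: fixed-width blocks keep each field contiguous across SIMD lanes while keeping related fields of the same entities nearby (kW, Block, Entities, integrate are illustrative names):

```cpp
#include <cstddef>
#include <vector>

// 8 lanes per block, e.g. one 256-bit AVX register of floats.
constexpr std::size_t kW = 8;

// Within a block: SoA (x[0..7] contiguous, SIMD-friendly).
// Across blocks: AoS (all fields of 8 entities stay close together).
struct Block {
    float x[kW], y[kW], vx[kW], vy[kW];
};

struct Entities {
    std::vector<Block> blocks;
    std::size_t count = 0;

    void push(float x, float y, float vx, float vy) {
        if (count % kW == 0) blocks.emplace_back();  // zero-initialized
        Block& b = blocks.back();
        const std::size_t lane = count % kW;
        b.x[lane] = x; b.y[lane] = y; b.vx[lane] = vx; b.vy[lane] = vy;
        ++count;
    }
};

void integrate(Entities& e, float dt) {
    for (Block& b : e.blocks)
        for (std::size_t l = 0; l < kW; ++l) {  // fixed-trip inner loop vectorizes
            b.x[l] += b.vx[l] * dt;
            b.y[l] += b.vy[l] * dt;
        }
}
```

Unused tail lanes in the last block are zero-initialized, so integrating them is harmless; that trade (a little wasted work, no branch in the inner loop) is typical of AoSoA.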


9. Summary

| Topic         | Takeaway                                  |
|---------------|-------------------------------------------|
| DoD           | Prefer SoA when loops are field-specific  |
| Cache line    | 64 B; alignas to avoid false sharing      |
| False sharing | Pad or shard counters                     |
| Production    | Measure with perf; profile hot loops      |

Keywords

data-oriented design, cache optimization, AoS SoA, false sharing, cache line, SIMD

Next: Custom allocators & pmr (#39-2)
Previous: PIMPL & ABI (#38-3)