C++ Expression Templates | Lazy Evaluation for Math Library Performance
Key takeaway
Expression templates encode deferred arithmetic as expression trees in the type system and evaluate them in a single pass. They are a core technique behind many high-performance math libraries.
What are expression templates? Why use them?
Problem: temporaries in vector math
With a naive operator+, the chained expression result = a + b + c + d allocates a fresh vector for every +. Each intermediate costs an allocation, a full pass over the data, and extra cache traffic when it is written and then read back.
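To make the cost concrete, here is a minimal sketch of the eager approach. NaiveVec, its allocations() counter, and count_chain_allocations are hypothetical names introduced only for this illustration; they are not from the original article.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical eager vector type: every operator+ returns a new NaiveVec.
struct NaiveVec {
    std::vector<double> data;

    // Counts buffers created by the sizing constructor (moves are not counted,
    // and moving a std::vector does not allocate).
    static int& allocations() {
        static int n = 0;
        return n;
    }

    explicit NaiveVec(std::size_t n, double v = 0.0) : data(n, v) {
        ++allocations();
    }
};

// Eager addition: allocates a result buffer and makes a full pass over it.
NaiveVec operator+(const NaiveVec& x, const NaiveVec& y) {
    NaiveVec out(x.data.size());
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = x.data[i] + y.data[i];
    return out;
}

// Returns how many buffers are allocated while evaluating a + b + c + d.
int count_chain_allocations(std::size_t n) {
    NaiveVec a(n, 1.0), b(n, 2.0), c(n, 3.0), d(n, 4.0);
    const int before = NaiveVec::allocations();
    NaiveVec result = a + b + c + d;  // three operator+ calls, three loops
    (void)result;
    return NaiveVec::allocations() - before;
}
```

Each + runs its own loop over a freshly allocated buffer, so the chain makes three allocations and three passes where one of each would do.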
Solution: lazy expression trees
Build an object representing Add(Add(Add(a,b),c),d) without computing element values until you assign into a concrete Vector (or evaluate an element).
flowchart TD
subgraph normal["Naive evaluation"]
n1["a + b → temp1 (alloc)"]
n2["temp1 + c → temp2 (alloc)"]
n3["temp2 + d → result (alloc)"]
end
subgraph expr["Expression template"]
e1["a + b + c + d → expression tree"]
e2["result = expr (single alloc)"]
e3["one pass evaluation"]
end
n1 --> n2 --> n3
e1 --> e2 --> e3
Table of contents
- Basic structure
- Vector operations
- Matrix operations
- Common pitfalls
- Production patterns
- Worked example: mini math library
- Performance
1. Basic structure
The original Korean article provides VecExpr, VecAdd, a Vector that can be assigned from any VecExpr, and an operator+ that returns VecAdd<...> rather than a computed vector, so evaluation is deferred until assignment. The same pattern extends to VecSub, VecScale, and combined expressions such as 2.0 * a + b - c.
Key idea: a + b + c becomes a nested VecAdd type; evaluation runs when operator= materializes into storage.
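The structure described above can be sketched as follows. This is a minimal version and not the original article's listing. It assumes a CRTP base for VecExpr and operand storage by const reference; both are common choices, but they are assumptions here. The helper demo_sum_at is hypothetical.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// CRTP base: tags a type as a vector expression and forwards to the derived type.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

// Concrete storage. Constructing or assigning from an expression evaluates it.
class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    explicit Vector(std::size_t n, double v = 0.0) : data_(n, v) {}

    template <typename E>
    Vector(const VecExpr<E>& e) : data_(e.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }

    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

// Lazy node: remembers its operands and computes one element on demand.
template <typename L, typename R>
class VecAdd : public VecExpr<VecAdd<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    VecAdd(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

// operator+ builds a node instead of computing a vector.
template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecAdd<L, R>(l.self(), r.self());
}

// Hypothetical usage helper: evaluates a + b + c and returns element i.
double demo_sum_at(std::size_t i) {
    Vector a(4, 1.0), b(4, 2.0), c(4, 3.0);
    // The type of the expression records the whole tree; nothing is computed yet.
    static_assert(std::is_same<decltype(a + b + c),
                               VecAdd<VecAdd<Vector, Vector>, Vector>>::value,
                  "a + b + c is a nested VecAdd");
    Vector r(4);
    r = a + b + c;  // one loop: r[i] = (a[i] + b[i]) + c[i]
    return r[i];
}
```

Because the nodes hold references, the expression is only valid while its operands are alive, which is why it is assigned in the same statement that builds it.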
2. Vector operations
Subtraction (VecSub) and scalar multiplication (VecScale) follow the same pattern as VecAdd: each is a small node type that computes one element on demand. Overloaded +, -, and * combine these nodes, as in the original post.
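A sketch of how VecSub and VecScale might look next to VecAdd, again assuming a CRTP VecExpr base and reference-held operands. Only scalar-on-the-left multiplication is shown, and demo_combined_at is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    explicit Vector(std::size_t n, double v = 0.0) : data_(n, v) {}
    // Evaluates any expression in a single loop.
    template <typename E>
    Vector(const VecExpr<E>& e) : data_(e.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

template <typename L, typename R>
class VecAdd : public VecExpr<VecAdd<L, R>> {
    const L& l_; const R& r_;
public:
    VecAdd(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }
};

template <typename L, typename R>
class VecSub : public VecExpr<VecSub<L, R>> {
    const L& l_; const R& r_;
public:
    VecSub(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] - r_[i]; }
    std::size_t size() const { return l_.size(); }
};

// Scalar multiply: the scalar is stored by value, the operand by reference.
template <typename E>
class VecScale : public VecExpr<VecScale<E>> {
    double s_; const E& e_;
public:
    VecScale(double s, const E& e) : s_(s), e_(e) {}
    double operator[](std::size_t i) const { return s_ * e_[i]; }
    std::size_t size() const { return e_.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecAdd<L, R>(l.self(), r.self());
}
template <typename L, typename R>
VecSub<L, R> operator-(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecSub<L, R>(l.self(), r.self());
}
template <typename E>
VecScale<E> operator*(double s, const VecExpr<E>& e) {
    return VecScale<E>(s, e.self());
}

// Hypothetical usage helper: element i of 2.0 * a + b - c.
double demo_combined_at(std::size_t i) {
    Vector a(4, 3.0), b(4, 4.0), c(4, 5.0);
    Vector r = 2.0 * a + b - c;  // type: VecSub<VecAdd<VecScale<Vector>, Vector>, Vector>
    return r[i];
}
```

With a = 3, b = 4, and c = 5 in every slot, each element of r is 2 * 3 + 4 - 5 = 5, computed in one loop with no intermediate vectors.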
3. Matrix operations
MatExpr, Matrix, MatMul, and operator* play the same roles for matrices: operator* returns a MatMul node, and the product is computed only when the node is assigned to a Matrix. The code structure matches section 3 of the Korean article.
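A minimal sketch of the matrix side, under the same assumptions as the vector code (CRTP base, operands held by reference). The row-major layout, the assignment through a temporary, and the demo_matmul helper are choices made for this illustration, not details confirmed by the original article.

```cpp
#include <cstddef>
#include <vector>

template <typename E>
struct MatExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator()(std::size_t i, std::size_t j) const { return self()(i, j); }
    std::size_t rows() const { return self().rows(); }
    std::size_t cols() const { return self().cols(); }
};

class Matrix : public MatExpr<Matrix> {
    std::size_t rows_, cols_;
    std::vector<double> data_;  // row-major storage
public:
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows_(r), cols_(c), data_(r * c, v) {}

    // Evaluates the expression element by element into fresh storage.
    template <typename E>
    Matrix(const MatExpr<E>& e)
        : rows_(e.rows()), cols_(e.cols()), data_(rows_ * cols_) {
        for (std::size_t i = 0; i < rows_; ++i)
            for (std::size_t j = 0; j < cols_; ++j)
                data_[i * cols_ + j] = e(i, j);
    }

    // Assignment goes through a temporary so that A = A * B stays correct.
    template <typename E>
    Matrix& operator=(const MatExpr<E>& e) {
        Matrix tmp(e);
        rows_ = tmp.rows_;
        cols_ = tmp.cols_;
        data_.swap(tmp.data_);
        return *this;
    }

    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
};

// Lazy product node: element (i, j) is a dot product computed on demand.
template <typename L, typename R>
class MatMul : public MatExpr<MatMul<L, R>> {
    const L& lhs_;
    const R& rhs_;
public:
    MatMul(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator()(std::size_t i, std::size_t j) const {
        double sum = 0.0;
        for (std::size_t k = 0; k < lhs_.cols(); ++k) sum += lhs_(i, k) * rhs_(k, j);
        return sum;
    }
    std::size_t rows() const { return lhs_.rows(); }
    std::size_t cols() const { return rhs_.cols(); }
};

template <typename L, typename R>
MatMul<L, R> operator*(const MatExpr<L>& l, const MatExpr<R>& r) {
    return MatMul<L, R>(l.self(), r.self());
}

// Hypothetical usage helper: element (i, j) of [[1,2],[3,4]] * [[5,6],[7,8]].
double demo_matmul(std::size_t i, std::size_t j) {
    Matrix A(2, 2), B(2, 2);
    A(0, 0) = 1; A(0, 1) = 2; A(1, 0) = 3; A(1, 1) = 4;
    B(0, 0) = 5; B(0, 1) = 6; B(1, 0) = 7; B(1, 1) = 8;
    Matrix C = A * B;  // the product is computed here, not in operator*
    return C(i, j);
}
```

Unlike element-wise nodes, a nested lazy product such as (A * B) * C recomputes inner dot products many times, so materializing product sub-expressions is often the better trade-off.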
4. Common pitfalls
Dangling references
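Do not store an expression object (for example with auto) when it refers to temporaries: the temporaries are destroyed at the end of the full expression, and the stored node is left with dangling references. Evaluate immediately into a Vector or Matrix instead.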
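The hazard is easiest to see next to the safe form. In this sketch the nodes hold their operands by const reference (an assumption about the design); make_ones and demo_safe_eval are hypothetical helpers. The dangerous lines are left as comments because running them would be undefined behavior.

```cpp
#include <cstddef>
#include <vector>

template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    explicit Vector(std::size_t n, double v = 0.0) : data_(n, v) {}
    template <typename E>
    Vector(const VecExpr<E>& e) : data_(e.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

template <typename L, typename R>
class VecAdd : public VecExpr<VecAdd<L, R>> {
    const L& lhs_;  // reference members: valid only while the operands live
    const R& rhs_;
public:
    VecAdd(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecAdd<L, R>(l.self(), r.self());
}

// Hypothetical factory that returns a temporary Vector.
Vector make_ones(std::size_t n) { return Vector(n, 1.0); }

double demo_safe_eval() {
    Vector b(3, 2.0);

    // Hazard (intentionally commented out): the VecAdd node would keep a
    // reference to the Vector returned by make_ones, which is destroyed at
    // the end of the statement.
    //   auto expr = make_ones(3) + b;
    //   Vector bad = expr;  // reads a destroyed Vector: undefined behavior

    // Safe: the temporary is still alive while the constructor evaluates.
    Vector ok = make_ones(3) + b;
    return ok[0] + ok[1] + ok[2];
}
```

The same reasoning applies to nested nodes: in a + b + c the inner VecAdd is itself a temporary, so storing the outer node with auto dangles even when a, b, and c are all named variables.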
Type complexity
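Very deeply nested types such as VecAdd<VecAdd<...>> increase compile time and make error messages harder to read. When that becomes a problem, split the expression by evaluating part of it into a named Vector.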
Aliasing
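a = a + b reads and writes a in the same loop. For purely element-wise expressions this is usually harmless, because element i is read before it is overwritten, but any expression that reads other indices of the destination (a matrix product, a reversal, a shift) sees partially updated data. Evaluate into a temporary first, or have operator= detect that the destination also appears in the expression.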
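A small sketch of aliasing going wrong. VecReverse, reversed, assign_in_place, assign_via_temp, and demo_alias are all hypothetical names for this illustration. A reversal is used instead of a + b because it reads an index other than the one being written, which is what makes in-place evaluation incorrect.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    Vector(std::initializer_list<double> v) : data_(v) {}

    // Writes straight into the destination: wrong if e reads other indices of *this.
    template <typename E>
    void assign_in_place(const VecExpr<E>& e) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }

    // Evaluates into a temporary buffer first, then swaps it in.
    template <typename E>
    void assign_via_temp(const VecExpr<E>& e) {
        std::vector<double> tmp(e.size());
        for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = e[i];
        data_.swap(tmp);
    }

    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
    const std::vector<double>& data() const { return data_; }
};

// Hypothetical node: element i reads element size - 1 - i of the source.
template <typename E>
class VecReverse : public VecExpr<VecReverse<E>> {
    const E& src_;
public:
    explicit VecReverse(const E& s) : src_(s) {}
    double operator[](std::size_t i) const { return src_[src_.size() - 1 - i]; }
    std::size_t size() const { return src_.size(); }
};

template <typename E>
VecReverse<E> reversed(const VecExpr<E>& e) {
    return VecReverse<E>(e.self());
}

// Reverses {1, 2, 3, 4} onto itself, with or without a temporary.
std::vector<double> demo_alias(bool use_temp) {
    Vector a{1.0, 2.0, 3.0, 4.0};
    if (use_temp) a.assign_via_temp(reversed(a));
    else          a.assign_in_place(reversed(a));
    return a.data();
}
```

In place, the first two writes overwrite values that the last two reads still need, so the result is {4, 3, 3, 4}; through the temporary it is the expected {4, 3, 2, 1}.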
5. Production patterns
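Optional: SIMD evaluation loops (for example with __m256d) or parallel element-wise evaluation with <execution>. Add these only after the scalar version is correct and profiling shows the evaluation loop is the bottleneck.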
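As a rough illustration of the SIMD option, here is what a hand-written fused kernel for a + b + c might look like with AVX. The function fused_add3 and the helper demo_fused are hypothetical. The AVX path is compiled only when the compiler defines __AVX__ (for example with -mavx), and a scalar loop handles the tail and every other target. A real library would generate such loops from the expression tree rather than writing one per expression.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical fused kernel: out[i] = a[i] + b[i] + c[i] in a single pass.
void fused_add3(const double* a, const double* b, const double* c,
                double* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // Four doubles per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        const __m256d va = _mm256_loadu_pd(a + i);
        const __m256d vb = _mm256_loadu_pd(b + i);
        const __m256d vc = _mm256_loadu_pd(c + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(_mm256_add_pd(va, vb), vc));
    }
#endif
    // Scalar tail, and the whole loop when AVX is not enabled.
    for (; i < n; ++i) out[i] = a[i] + b[i] + c[i];
}

// Hypothetical usage helper: sums three constant vectors of length n.
std::vector<double> demo_fused(std::size_t n) {
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 3.0), out(n);
    fused_add3(a.data(), b.data(), c.data(), out.data(), n);
    return out;
}
```

Modern compilers often auto-vectorize the plain scalar loop at higher optimization levels, so it is worth checking the generated code before maintaining intrinsics by hand.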
6. Worked example
The full worked example combines additions, multiplications, and norms, mirroring the listing in the Korean article (result = 2.0 * a + b * c, and so on).
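The sketch below is one possible shape for such a mini library, not the original listing. It assumes that b * c means an element-wise product (named VecMul here) and that the norm is the Euclidean norm; norm2, demo_result_at, and demo_norm are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    explicit Vector(std::size_t n, double v = 0.0) : data_(n, v) {}
    template <typename E>
    Vector(const VecExpr<E>& e) : data_(e.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
    }
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

template <typename L, typename R>
class VecAdd : public VecExpr<VecAdd<L, R>> {
    const L& l_; const R& r_;
public:
    VecAdd(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }
};

// Assumption: vector * vector is an element-wise product.
template <typename L, typename R>
class VecMul : public VecExpr<VecMul<L, R>> {
    const L& l_; const R& r_;
public:
    VecMul(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] * r_[i]; }
    std::size_t size() const { return l_.size(); }
};

template <typename E>
class VecScale : public VecExpr<VecScale<E>> {
    double s_; const E& e_;
public:
    VecScale(double s, const E& e) : s_(s), e_(e) {}
    double operator[](std::size_t i) const { return s_ * e_[i]; }
    std::size_t size() const { return e_.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecAdd<L, R>(l.self(), r.self());
}
template <typename L, typename R>
VecMul<L, R> operator*(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecMul<L, R>(l.self(), r.self());
}
template <typename E>
VecScale<E> operator*(double s, const VecExpr<E>& e) {
    return VecScale<E>(s, e.self());
}

// Euclidean norm of any expression, computed without materializing it.
template <typename E>
double norm2(const VecExpr<E>& e) {
    double sum = 0.0;
    for (std::size_t i = 0; i < e.size(); ++i) {
        const double x = e[i];
        sum += x * x;
    }
    return std::sqrt(sum);
}

// Hypothetical usage helpers with a = 1, b = 2, c = 3 in every slot.
double demo_result_at(std::size_t i) {
    Vector a(4, 1.0), b(4, 2.0), c(4, 3.0);
    Vector result = 2.0 * a + b * c;  // each element: 2 * 1 + 2 * 3 = 8
    return result[i];
}
double demo_norm() {
    Vector a(4, 1.0), b(4, 2.0), c(4, 3.0);
    return norm2(2.0 * a + b * c);  // sqrt(4 * 8 * 8) = 16
}
```

Reductions such as norm2 can consume the expression directly, so even the single result vector is skipped when only a scalar is needed.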
7. Performance
The sample benchmark shows that expression templates can be several times faster than naive temporaries on large vectors. The gain depends on vector size, allocator, and compiler optimization level, so always validate on your workload.
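To check the claim on your own machine, a timing harness along these lines may be a starting point. It is a rough sketch, not the original benchmark: eager_add, BenchResult, and run_benchmark are hypothetical names, no timing numbers are implied, and a serious measurement would add warm-up runs, several sizes, and a guard against the optimizer removing work.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
    std::vector<double> data_;
public:
    explicit Vector(std::size_t n, double v = 0.0) : data_(n, v) {}
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }
};

template <typename L, typename R>
class VecAdd : public VecExpr<VecAdd<L, R>> {
    const L& l_; const R& r_;
public:
    VecAdd(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecAdd<L, R>(l.self(), r.self());
}

// Naive baseline: every call allocates and fills a new buffer.
std::vector<double> eager_add(const std::vector<double>& x,
                              const std::vector<double>& y) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = x[i] + y[i];
    return out;
}

struct BenchResult {
    double naive_ms;  // total time for the eager chain
    double fused_ms;  // total time for the expression-template chain
    bool same;        // both approaches produced identical values
};

// Times a + b + c + d both ways, reps times each, on vectors of length n.
BenchResult run_benchmark(std::size_t n, int reps) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    std::vector<double> ra(n, 1.0), rb(n, 2.0), rc(n, 3.0), rd(n, 4.0);
    Vector a(n, 1.0), b(n, 2.0), c(n, 3.0), d(n, 4.0);
    std::vector<double> naive;
    Vector fused(n);

    const auto t0 = Clock::now();
    for (int r = 0; r < reps; ++r)
        naive = eager_add(eager_add(eager_add(ra, rb), rc), rd);
    const auto t1 = Clock::now();
    for (int r = 0; r < reps; ++r)
        fused = a + b + c + d;
    const auto t2 = Clock::now();

    bool same = (reps > 0) && naive.size() == fused.size();
    for (std::size_t i = 0; same && i < n; ++i) same = (naive[i] == fused[i]);
    return {Ms(t1 - t0).count(), Ms(t2 - t1).count(), same};
}
```

Compare naive_ms and fused_ms only in an optimized build, and only after confirming that same is true.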
Summary
| Concept | Role |
|---|---|
| Expression template | Store deferred operations as types |
| Goal | Fewer temporaries, fused loops |
| Pros | Fewer allocations, better locality |
| Cons | Complex types, aliasing hazards, harder errors |
| Typical use | Linear algebra libraries (Eigen, Blaze), vector/matrix kernels |
FAQ
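Q1: When are expression templates worth using? In numeric-heavy code with large chained expressions.
Q2: What are the downsides? Added complexity, longer compile times, and harder debugging.
Q3: How does Eigen use them? Eigen combines expression templates with SIMD and tuning.
Q4: How do they relate to C++20 Ranges? Ranges focus on lazy view composition, while expression templates target fused numeric kernels. The spirit is related, but the domain is different.
Q5: How do I fix aliasing? Evaluate into a temporary, or write operator= carefully.
Q6: Where can I learn more? See C++ Templates: The Complete Guide, the Eigen documentation, and Modern C++ Design.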
Related posts
- CRTP
- Template basics
- Move semantics
Keywords
C++, expression template, templates, optimization, lazy evaluation, Eigen.
See also
- Expression templates (overview)
- auto deduction
- Branch prediction
- Cache optimization
- CTAD