C++ Expression Templates | Lazy Evaluation for Math Library Performance

Key takeaways

Expression templates encode deferred arithmetic as expression trees built from types, then evaluate the whole expression in a single pass. They are a core technique behind many high-performance math libraries.

What are expression templates? Why use them?

Problem: temporaries in vector math

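With a naive operator+ that returns a new vector, the chain result = a + b + c + d allocates a fresh vector for every +. That costs three allocations and three separate passes over the data. It also adds cache traffic for intermediate results that are thrown away immediately.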
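The sketch below makes that cost concrete. It assumes a plain std::vector-backed type. NaiveVector and its buffers counter are hypothetical names used only for this illustration. The counter records how many result buffers the arithmetic creates.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Naive vector type: every operator+ returns a brand-new object.
// `buffers` counts constructions through the two constructors below,
// i.e. how many buffers get created (moves are not counted).
struct NaiveVector {
    static inline int buffers = 0;
    std::vector<double> data;

    NaiveVector(std::initializer_list<double> init) : data(init) { ++buffers; }
    explicit NaiveVector(std::size_t n) : data(n) { ++buffers; }
};

// Each call allocates one result buffer and makes one full pass over the data.
NaiveVector operator+(const NaiveVector& x, const NaiveVector& y) {
    assert(x.data.size() == y.data.size());
    NaiveVector out(x.data.size());
    for (std::size_t i = 0; i < out.data.size(); ++i) out.data[i] = x.data[i] + y.data[i];
    return out;
}

// Returns how many buffers `a + b + c + d` creates, not counting the inputs.
int naive_chain_buffers() {
    const NaiveVector a{1.0, 2.0}, b{3.0, 4.0}, c{5.0, 6.0}, d{7.0, 8.0};
    const int before = NaiveVector::buffers;
    const NaiveVector result = a + b + c + d;  // parsed as ((a + b) + c) + d
    assert(result.data[0] == 16.0 && result.data[1] == 20.0);
    return NaiveVector::buffers - before;
}
```

naive_chain_buffers() returns 3. Two of those buffers are temporaries that die at the end of the statement.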

Solution: lazy expression trees

Build an object representing Add(Add(Add(a,b),c),d) without computing element values until you assign into a concrete Vector (or evaluate an element).

```mermaid
flowchart TD
    subgraph normal["Naive evaluation"]
        n1["a + b → temp1 (alloc)"]
        n2["temp1 + c → temp2 (alloc)"]
        n3["temp2 + d → result (alloc)"]
    end
    subgraph expr["Expression template"]
        e1["a + b + c + d → expression tree"]
        e2["result = expr (single alloc)"]
        e3["one pass evaluation"]
    end
    n1 --> n2 --> n3
    e1 --> e2 --> e3
```

Table of contents

  1. Basic structure
  2. Vector operations
  3. Matrix operations
  4. Common pitfalls
  5. Production patterns
  6. Worked example: mini math library
  7. Performance

1. Basic structure

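The original (Korean) article defines four pieces:

- a base class VecExpr;
- a VecAdd node;
- an assignment operator that copies any VecExpr into a Vector;
- an operator+ that returns VecAdd<...> instead of a Vector.

Elements are computed lazily, only when the expression is assigned. The same pattern extends to VecSub, VecScale, and combined expressions such as 2.0 * a + b - c.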

Key idea: a + b + c becomes a nested VecAdd type; evaluation runs when operator= materializes into storage.
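The sketch below shows one way those pieces fit together. It assumes double elements, a std::vector-backed Vector, a CRTP-style VecExpr base, and nodes that hold their operands by const reference. The self() helper and the sum3 function are illustrative names, not necessarily the original article's. The static_assert checks the nested type described above.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

// CRTP base: every expression type E derives from VecExpr<E>, so operators can
// accept "any expression" while calls are still resolved at compile time.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

// The only type that owns storage. Building or assigning from an expression
// runs a single loop that asks the expression tree for each element.
class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}

    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }

    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// Lazy node for lhs + rhs: stores references and computes element i on demand.
template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// operator+ only builds a node; no element is added here.
template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) {
    return VecAdd<L, R>(a.self(), b.self());
}

// a + b + c has the nested type VecAdd<VecAdd<Vector, Vector>, Vector>.
static_assert(std::is_same_v<
    decltype(std::declval<const Vector&>() + std::declval<const Vector&>() +
             std::declval<const Vector&>()),
    VecAdd<VecAdd<Vector, Vector>, Vector>>);

// The whole chain is evaluated in one pass when the returned Vector is built.
Vector sum3(const Vector& a, const Vector& b, const Vector& c) { return a + b + c; }
```

Storing operands by reference keeps the nodes cheap to build. The trade-off is that a node must not outlive its operands.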


2. Vector operations

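Subtraction (VecSub) and scalar multiplication (VecScale) follow the same pattern as VecAdd. Each is a node type with its own element formula, returned by an overloaded -, or * operator as in the original post. Because the operators compose, 2.0 * a + b - c still evaluates in a single loop.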
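The sketch below adds both node types. It assumes the scalar is stored by value and vector operands by const reference. It includes a minimal VecExpr/Vector/VecAdd core so it compiles on its own. The combo function is a hypothetical name for 2.0 * a + b - c.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base: static dispatch to the concrete expression E
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {  // owns storage; evaluates in one loop
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {  // element i = lhs[i] + rhs[i]
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
struct VecSub : VecExpr<VecSub<L, R>> {  // element i = lhs[i] - rhs[i]
    const L& lhs;
    const R& rhs;
    VecSub(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] - rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Scalar node: the factor is stored by value, the operand by reference.
template <typename E>
struct VecScale : VecExpr<VecScale<E>> {
    double s;
    const E& v;
    VecScale(double factor, const E& operand) : s(factor), v(operand) {}
    double operator[](std::size_t i) const { return s * v[i]; }
    std::size_t size() const { return v.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }
template <typename L, typename R>
VecSub<L, R> operator-(const VecExpr<L>& a, const VecExpr<R>& b) { return VecSub<L, R>(a.self(), b.self()); }
template <typename E>
VecScale<E> operator*(double s, const VecExpr<E>& v) { return VecScale<E>(s, v.self()); }
template <typename E>
VecScale<E> operator*(const VecExpr<E>& v, double s) { return VecScale<E>(s, v.self()); }

// Parsed as ((2.0 * a) + b) - c: a three-level tree, still one evaluation loop.
Vector combo(const Vector& a, const Vector& b, const Vector& c) { return 2.0 * a + b - c; }
```

Both scalar orders (2.0 * a and a * 2.0) are provided. Templates do not add the commutative overload automatically.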


3. Matrix operations

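The matrix version reuses the same structure (section 3 of the original article). MatExpr is the base, Matrix is the storage type, and operator* returns a lazy MatMul node that is evaluated when assigned into a Matrix. Each element of a lazy product is a full dot product, so reading the same element twice computes it twice.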
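The sketch below is a minimal lazy matrix multiply. It assumes row-major double storage and operands held by const reference. In this sketch, operator= evaluates into a fresh buffer and swaps it in, so that m = m * m stays correct. That choice is an assumption rather than the original listing. The square_in_place name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// CRTP base for matrix expressions: element (i, j) plus dimensions.
template <typename E>
struct MatExpr {
    const E& self() const { return static_cast<const E&>(*this); }
    double operator()(std::size_t i, std::size_t j) const { return self()(i, j); }
    std::size_t rows() const { return self().rows(); }
    std::size_t cols() const { return self().cols(); }
};

// Row-major storage type; the only place where results are materialized.
class Matrix : public MatExpr<Matrix> {
public:
    Matrix(std::size_t r, std::size_t c, std::initializer_list<double> init)
        : rows_(r), cols_(c), data_(init) { assert(data_.size() == r * c); }

    template <typename E>
    Matrix(const MatExpr<E>& e) { *this = e; }

    // Evaluate into a fresh buffer, then swap it in. The old contents stay
    // intact while the expression reads them, so m = m * m is also correct.
    template <typename E>
    Matrix& operator=(const MatExpr<E>& e) {
        const std::size_t r = e.rows(), c = e.cols();
        std::vector<double> out(r * c);
        for (std::size_t i = 0; i < r; ++i)
            for (std::size_t j = 0; j < c; ++j) out[i * c + j] = e(i, j);
        rows_ = r;
        cols_ = c;
        data_.swap(out);
        return *this;
    }

    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_ = 0, cols_ = 0;
    std::vector<double> data_;
};

// Lazy product node: element (i, j) is a dot product computed on request.
template <typename L, typename R>
struct MatMul : MatExpr<MatMul<L, R>> {
    const L& lhs;
    const R& rhs;
    MatMul(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.cols() == b.rows()); }
    double operator()(std::size_t i, std::size_t j) const {
        double sum = 0.0;
        for (std::size_t k = 0; k < lhs.cols(); ++k) sum += lhs(i, k) * rhs(k, j);
        return sum;
    }
    std::size_t rows() const { return lhs.rows(); }
    std::size_t cols() const { return rhs.cols(); }
};

template <typename L, typename R>
MatMul<L, R> operator*(const MatExpr<L>& a, const MatExpr<R>& b) {
    return MatMul<L, R>(a.self(), b.self());
}

// Squares m in place: both operands of the product refer to m itself.
Matrix square_in_place(Matrix m) {
    m = m * m;
    return m;
}
```

If a MatMul node becomes the operand of another product, its elements are recomputed for every use. That is one reason to materialize products into a Matrix rather than nesting them lazily.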


4. Common pitfalls

Dangling references

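Expression nodes usually hold references to their operands. Do not keep an expression object (for example through auto) whose operands include temporaries. Those temporaries are destroyed at the end of the statement, and the stored references dangle. Evaluate immediately into a Vector or Matrix instead.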
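The sketch below shows the trap and the safe pattern. It assumes nodes that store operands by const reference and includes a minimal core so it compiles on its own. The dangerous line is kept only as a comment, since executing it would be undefined behavior. sum3 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {  // holds references, not copies
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }

// `auto` would deduce the node type, not Vector.
static_assert(!std::is_same_v<decltype(std::declval<const Vector&>() + std::declval<const Vector&>()), Vector>);

Vector sum3(const Vector& a, const Vector& b, const Vector& c) {
    // Dangerous (left as a comment on purpose):
    //   auto lazy = a + b + c;
    // `lazy` is VecAdd<VecAdd<Vector, Vector>, Vector>. Its left operand is a
    // reference to the temporary node built for `a + b`, which is destroyed at
    // the end of that declaration, so any later lazy[i] is undefined behavior.

    // Safe: naming the concrete type forces evaluation within the statement.
    Vector result = a + b + c;
    return result;
}
```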

Type complexity

Very deep VecAdd<VecAdd<...>> types increase compile time and produce long error messages. When that becomes a problem, split the expression by materializing part of it into a named Vector.
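The sketch below contrasts a fully fused four-term sum with a split version. It assumes the same reference-holding core, included so the block compiles alone. The function names are hypothetical. In an expression this short the split only costs performance; the compile-time benefit appears with much longer chains.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }

// One loop, but a deeply nested type:
// VecAdd<VecAdd<VecAdd<Vector, Vector>, Vector>, Vector>.
Vector sum4_fused(const Vector& a, const Vector& b, const Vector& c, const Vector& d) {
    return a + b + c + d;
}

// Split version: a + b is materialized into a named Vector first, so each
// statement has a shallower type, at the cost of one extra buffer and pass.
Vector sum4_split(const Vector& a, const Vector& b, const Vector& c, const Vector& d) {
    const Vector ab = a + b;  // VecAdd<Vector, Vector>
    return ab + c + d;        // VecAdd<VecAdd<Vector, Vector>, Vector>
}
```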

Aliasing

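Aliasing happens when the destination also appears on the right-hand side, and the expression reads elements other than the one currently being written. A purely element-wise statement such as a = a + b is safe, because element i of the result depends only on a[i] and b[i]. A matrix product A = A * B is not safe: computing A(i, j) reads a whole row of A, which may already have been overwritten. Fix this by evaluating into a temporary and then moving or swapping it into place, or by detecting the alias in operator=.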
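The sketch below contrasts the two cases on vectors. It uses a hypothetical VecReversed node (element i reads element n - 1 - i), because it reads other indices the way a product does. assign_noalias is a hypothetical fast path that writes straight into storage, named after the idea behind Eigen's noalias(). The default operator= buffers the result first.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }

    // Safe default: evaluate into a temporary buffer, then swap it in, so the
    // expression never reads partially overwritten data.
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        std::vector<double> tmp(e.size());
        for (std::size_t i = 0; i < tmp.size(); ++i) tmp[i] = e[i];
        data_.swap(tmp);
        return *this;
    }

    // Fast path without a temporary (hypothetical). Correct only when element i
    // of e depends on index i alone, as in a + b.
    template <typename E>
    Vector& assign_noalias(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {  // element-wise: reads index i only
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }

// Hypothetical node: element i reads element n - 1 - i of its operand.
template <typename E>
struct VecReversed : VecExpr<VecReversed<E>> {
    const E& v;
    explicit VecReversed(const E& operand) : v(operand) {}
    double operator[](std::size_t i) const { return v[v.size() - 1 - i]; }
    std::size_t size() const { return v.size(); }
};

template <typename E>
VecReversed<E> reversed(const VecExpr<E>& v) { return VecReversed<E>(v.self()); }

// a = a + b through the fast path: safe, because the expression is element-wise.
Vector add_into(Vector a, const Vector& b) { a.assign_noalias(a + b); return a; }

// a = reversed(a) through the fast path: a[0] is overwritten before the final
// iteration reads it back, so {1, 2, 3} becomes {3, 2, 3}.
Vector reverse_unsafe(Vector a) { a.assign_noalias(reversed(a)); return a; }

// The same expression through the buffered operator=: {1, 2, 3} becomes {3, 2, 1}.
Vector reverse_safe(Vector a) { a = reversed(a); return a; }
```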


5. Production patterns

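Only after the expressions are correct and profiling shows the evaluation loop is hot should the single loop in operator= be optimized further. Options include explicit SIMD (for example AVX __m256d intrinsics) and parallel element-wise evaluation with the <execution> policies.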
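The sketch below parallelizes the fused evaluation loop. To keep it portable, it swaps in OpenMP, a different tool from the __m256d intrinsics and <execution> policies named above. It assumes element-wise nodes only, so iterations are independent. With -fopenmp the _OPENMP macro is defined and the loop is split across threads. Without it the pragma is compiled out and the loop runs serially, so the result is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }

    // For element-wise expressions such as VecAdd, iteration i reads index i of
    // every leaf and writes only out[i], so iterations do not interfere.
    // Parallel writes are NOT safe for aliasing expressions that read other indices.
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        const auto n = static_cast<std::ptrdiff_t>(data_.size());
        double* out = data_.data();
#ifdef _OPENMP
#pragma omp parallel for
#endif
        for (std::ptrdiff_t i = 0; i < n; ++i) out[i] = e[static_cast<std::size_t>(i)];
        return *this;
    }

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }

Vector sum3(const Vector& a, const Vector& b, const Vector& c) { return a + b + c; }
```

Thread start-up has a cost, so this only pays off for large vectors. Profile before and after.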


6. Worked example: mini math library

The original article's full listing combines addition, multiplication, and norms into one mini library, with statements such as result = 2.0 * a + b * c.
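The sketch below is a compact version of such a library. It assumes that b * c means the element-wise product and that the norm is the Euclidean norm computed directly from an expression, without materializing it. VecMul, norm, and scaled_sum are hypothetical names and not necessarily the original article's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {  // element i = lhs[i] + rhs[i]
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
struct VecMul : VecExpr<VecMul<L, R>> {  // element-wise product: lhs[i] * rhs[i]
    const L& lhs;
    const R& rhs;
    VecMul(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename E>
struct VecScale : VecExpr<VecScale<E>> {  // element i = s * v[i]
    double s;
    const E& v;
    VecScale(double factor, const E& operand) : s(factor), v(operand) {}
    double operator[](std::size_t i) const { return s * v[i]; }
    std::size_t size() const { return v.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }
template <typename L, typename R>
VecMul<L, R> operator*(const VecExpr<L>& a, const VecExpr<R>& b) { return VecMul<L, R>(a.self(), b.self()); }
template <typename E>
VecScale<E> operator*(double s, const VecExpr<E>& v) { return VecScale<E>(s, v.self()); }

// Euclidean norm of any expression: one pass, no temporary Vector.
template <typename E>
double norm(const VecExpr<E>& e) {
    double sum = 0.0;
    for (std::size_t i = 0; i < e.size(); ++i) {
        const double x = e[i];  // evaluate the tree once per element
        sum += x * x;
    }
    return std::sqrt(sum);
}

// result = 2.0 * a + b * c, evaluated in a single loop.
Vector scaled_sum(const Vector& a, const Vector& b, const Vector& c) { return 2.0 * a + b * c; }
```

With a = {1, 2}, b = {3, 4}, c = {0.5, 2}, the result is {3.5, 12}, and its norm is sqrt(12.25 + 144) = 12.5.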


7. Performance

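The original article's sample benchmark shows expression templates running several times faster than the naive temporary-based version on large vectors. Results depend on vector size, compiler, optimization flags, and hardware, so always measure on your own workload.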
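The sketch below is a minimal harness for that measurement. It assumes std::chrono::steady_clock timing and a volatile sink that keeps each result from being optimized away. add, naive_sum4, fused_sum4, ms_per_run, and run_benchmark are hypothetical names. No timings are claimed here. Build with optimizations and call, for example, run_benchmark(1 << 20, 20) on your own machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <initializer_list>
#include <vector>

template <typename E>
struct VecExpr {  // CRTP base
    const E& self() const { return static_cast<const E&>(*this); }
    double operator[](std::size_t i) const { return self()[i]; }
    std::size_t size() const { return self().size(); }
};

class Vector : public VecExpr<Vector> {
public:
    Vector(std::initializer_list<double> init) : data_(init) {}
    Vector(std::size_t n, double value) : data_(n, value) {}
    template <typename E>
    Vector(const VecExpr<E>& e) { *this = e; }
    template <typename E>
    Vector& operator=(const VecExpr<E>& e) {
        data_.resize(e.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
    const L& lhs;
    const R& rhs;
    VecAdd(const L& a, const R& b) : lhs(a), rhs(b) { assert(a.size() == b.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& a, const VecExpr<R>& b) { return VecAdd<L, R>(a.self(), b.self()); }

// Baseline: each call returns a new std::vector, so a + b + c + d creates
// three result buffers and makes three passes.
std::vector<double> add(const std::vector<double>& x, const std::vector<double>& y) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + y[i];
    return out;
}
std::vector<double> naive_sum4(const std::vector<double>& a, const std::vector<double>& b,
                               const std::vector<double>& c, const std::vector<double>& d) {
    return add(add(add(a, b), c), d);
}

// Expression-template version: one result buffer, one loop.
Vector fused_sum4(const Vector& a, const Vector& b, const Vector& c, const Vector& d) {
    return a + b + c + d;
}

// Average milliseconds per call of f(); f must return a double.
template <typename F>
double ms_per_run(F f, int reps) {
    volatile double sink = 0.0;  // keeps each result observable
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink = f();
    const std::chrono::duration<double, std::milli> elapsed = std::chrono::steady_clock::now() - start;
    (void)sink;
    return elapsed.count() / reps;
}

// Prints per-run times for both variants on vectors of length n (n > 0).
void run_benchmark(std::size_t n, int reps) {
    const std::vector<double> va(n, 1.0), vb(n, 2.0), vc(n, 3.0), vd(n, 4.0);
    const Vector a(n, 1.0), b(n, 2.0), c(n, 3.0), d(n, 4.0);
    const double naive_ms = ms_per_run([&] { return naive_sum4(va, vb, vc, vd)[n / 2]; }, reps);
    const double fused_ms = ms_per_run([&] { return fused_sum4(a, b, c, d)[n / 2]; }, reps);
    std::printf("n = %zu: naive %.3f ms, expression template %.3f ms\n", n, naive_ms, fused_ms);
}
```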


Summary

| Concept | Role |
|---|---|
| Expression template | Stores deferred operations as types |
| Goal | Fewer temporaries, fused loops |
| Pros | Fewer allocations, better locality |
| Cons | Complex types, aliasing hazards, harder-to-read errors |
| Typical use | Linear algebra libraries (Eigen, Blaze), vector/matrix kernels |

FAQ

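Q1: When are expression templates worth it? In numeric-heavy code that evaluates large chained expressions over vectors or matrices.
Q2: What are the downsides? Implementation complexity, longer compile times, and harder debugging.
Q3: How does Eigen use them? Eigen combines expression templates with SIMD vectorization and further tuning.
Q4: How do they relate to C++20 Ranges? Ranges focus on composing lazy views, while expression templates target fused numeric kernels. The spirit is related, but the domain differs.
Q5: How do I fix aliasing? Evaluate into a temporary, or write operator= so that it detects and handles the alias.
Q6: Where can I read more? C++ Templates: The Complete Guide, the Eigen documentation, and Modern C++ Design.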


  • CRTP
  • Template basics
  • Move semantics

Keywords

C++, expression template, templates, optimization, lazy evaluation, Eigen.

See also

  • Expression templates (overview)
  • auto deduction
  • Branch prediction
  • Cache optimization
  • CTAD