Python Meets C++: High-Performance Engines with pybind11
Key Points
Bind C++ functions and classes to Python with pybind11: write the module macro, build with CMake, expose NumPy arrays via py::array_t, release the GIL for pure C++ kernels, and ship manylinux wheels.
Why pybind11?
Python dominates data science and AI, but pure Python loops can be orders of magnitude slower than equivalent C++. The typical bottleneck is a preprocessing loop, a custom distance metric, or a simulation kernel that runs millions of times.
pybind11 is the standard way to expose C++ as a native Python module. Unlike ctypes (which calls pre-built C libraries) or Cython (which adds a compiled superset of Python), pybind11 lets you write type-safe C++ binding code that integrates naturally with your existing C++ library. It produces a .so (Linux/macOS) or .pyd (Windows) that Python can import like any other module.
The library is header-only, requires no code generation step, and handles reference counting, exception translation, and STL container conversions automatically.
How pybind11 Works
Python script
└── import example                # loads example.cpython-311-x86_64.so
    └── PYBIND11_MODULE           # your C++ binding code
        └── your C++ library functions and classes
When Python calls example.add(1, 2):
- Python’s C API calls the extension module’s function pointer
- pybind11 unpacks Python objects into C++ types (int, std::string, etc.)
- Your C++ function runs
- pybind11 converts the return value back to a Python object
- Python receives the result
The conversion layer handles type checking and raises TypeError if the arguments don’t match any registered signature.
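The call sequence above can be modelled in plain Python. This is only a rough sketch of the idea, not how pybind11 is implemented: the `bind` decorator and its `isinstance` check are hypothetical stand-ins for pybind11's type casters, which follow richer conversion rules.

```python
def bind(*param_types):
    """Wrap a function so its positional arguments are type-checked first,
    loosely mimicking the argument-conversion step described above."""
    def decorator(func):
        def wrapper(*args):
            if len(args) != len(param_types):
                raise TypeError(
                    f"{func.__name__}(): expected {len(param_types)} "
                    f"arguments, got {len(args)}"
                )
            for value, expected in zip(args, param_types):
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{func.__name__}(): incompatible argument {value!r}; "
                        f"expected {expected.__name__}"
                    )
            # Arguments accepted: run the "native" function and hand back the result
            return func(*args)
        return wrapper
    return decorator


@bind(int, int)
def add(a, b):
    return a + b


print(add(1, 2))
try:
    add("1", 2)
except TypeError as exc:
    print("TypeError:", exc)
```

The point of the sketch is the ordering: the check happens before the wrapped function runs, so a mismatched call never reaches your C++ code.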
Minimal Example
// example.cpp
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // converts a Python list to std::vector for mean()
#include <vector>
namespace py = pybind11;
int add(int a, int b) { return a + b; }
double mean(std::vector<double> v) {
double s = 0;
for (double x : v) s += x;
return v.empty() ? 0.0 : s / v.size();
}
PYBIND11_MODULE(example, m) {
m.doc() = "Minimal pybind11 example module";
m.def("add", &add, "Add two integers",
py::arg("a"), py::arg("b")); // named args for Python call sites
m.def("mean", &mean, "Compute arithmetic mean of a list",
py::arg("values"));
}
import example
print(example.add(1, 2)) # 3
print(example.add(a=3, b=4)) # 7 — named arguments work
print(example.mean([1.0, 2.0, 3.0])) # 2.0
Note: the name in PYBIND11_MODULE(example, m) must match the filename (example.so/example.pyd) and the import statement.
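To see which filenames the interpreter would accept for a given module name, you can inspect the extension suffixes it recognizes. The helper below is a small illustrative sketch; `is_importable_as` is a hypothetical name, while `importlib.machinery.EXTENSION_SUFFIXES` is part of the standard library.

```python
import importlib.machinery


def is_importable_as(filename: str, module_name: str) -> bool:
    """Return True if `filename` is a name the import system would accept
    for an extension module called `module_name` on this interpreter."""
    return any(
        filename == module_name + suffix
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


# The first entry is typically the most specific, ABI-tagged suffix
specific = importlib.machinery.EXTENSION_SUFFIXES[0]
print(importlib.machinery.EXTENSION_SUFFIXES)
print(is_importable_as("example" + specific, "example"))     # True
print(is_importable_as("libexample" + specific, "example"))  # False
```

A library-style name such as libexample fails the check, which is one reason to let pybind11_add_module choose the output name rather than setting it by hand.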
Building with CMake
# CMakeLists.txt
cmake_minimum_required(VERSION 3.20)
project(example)
# Find Python and pybind11 (pybind11's FindPython mode looks for the "Python" package name)
find_package(Python COMPONENTS Interpreter Development REQUIRED)
find_package(pybind11 CONFIG REQUIRED) # install: pip install pybind11
# Build the extension module
pybind11_add_module(example example.cpp)
target_compile_features(example PRIVATE cxx_std_17)
# Build
cmake -B build
cmake --build build -j$(nproc)
# Test import
python3 -c "import sys; sys.path.insert(0, 'build'); import example; print(example.add(1,2))"
If find_package(pybind11) fails:
pip install pybind11
# Then set the CMake prefix path
cmake -B build -DCMAKE_PREFIX_PATH=$(python3 -c "import pybind11; print(pybind11.get_cmake_dir())")
Alternative: FetchContent
include(FetchContent)
FetchContent_Declare(pybind11
GIT_REPOSITORY https://github.com/pybind/pybind11.git
GIT_TAG v2.12.0)
FetchContent_MakeAvailable(pybind11)
Exposing Classes
#include <pybind11/pybind11.h>
#include <string>
#include <vector>
namespace py = pybind11;
class Matrix {
int rows_, cols_;
std::vector<double> data_;
public:
Matrix(int rows, int cols)
: rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
double get(int r, int c) const { return data_[r * cols_ + c]; }
void set(int r, int c, double v) { data_[r * cols_ + c] = v; }
int rows() const { return rows_; }
int cols() const { return cols_; }
Matrix operator+(const Matrix& rhs) const {
Matrix result(rows_, cols_);
for (size_t i = 0; i < data_.size(); ++i)
result.data_[i] = data_[i] + rhs.data_[i];
return result;
}
std::string repr() const {
return "Matrix(" + std::to_string(rows_) + "x" + std::to_string(cols_) + ")";
}
};
PYBIND11_MODULE(linalg, m) {
py::class_<Matrix>(m, "Matrix")
.def(py::init<int, int>(), py::arg("rows"), py::arg("cols"))
.def("get", &Matrix::get, py::arg("row"), py::arg("col"))
.def("set", &Matrix::set, py::arg("row"), py::arg("col"), py::arg("value"))
.def_property_readonly("rows", &Matrix::rows)
.def_property_readonly("cols", &Matrix::cols)
.def("__add__", &Matrix::operator+)
.def("__repr__", &Matrix::repr);
}
from linalg import Matrix
m = Matrix(3, 3)
m.set(0, 0, 1.0)
m.set(1, 1, 2.0)
print(m) # Matrix(3x3)
print(m.rows) # 3
NumPy Integration
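The most common use case: pass a NumPy array from Python, process it in C++, and return the result. pybind11 reads the input buffer in place when its dtype and memory layout already match the declared py::array_t type; otherwise it makes a converted copy first.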
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>
#include <stdexcept>
namespace py = pybind11;
// Compute dot product of two 1D float64 arrays.
// c_style | forcecast guarantees contiguous float64 data, so raw pointer
// indexing is safe; pybind11 copies only if the input is strided or has another dtype.
double dot(py::array_t<double, py::array::c_style | py::array::forcecast> a,
           py::array_t<double, py::array::c_style | py::array::forcecast> b) {
// Request buffer info: pointer, strides, shape
py::buffer_info buf_a = a.request();
py::buffer_info buf_b = b.request();
if (buf_a.ndim != 1 || buf_b.ndim != 1)
throw std::runtime_error("Expected 1D arrays");
if (buf_a.shape[0] != buf_b.shape[0])
throw std::runtime_error("Array size mismatch");
const auto* pa = static_cast<const double*>(buf_a.ptr);
const auto* pb = static_cast<const double*>(buf_b.ptr);
py::ssize_t n = buf_a.shape[0];  // py::ssize_t is portable; plain ssize_t is POSIX-only
double result = 0.0;
for (py::ssize_t i = 0; i < n; ++i)
result += pa[i] * pb[i];
return result;
}
// Apply sqrt element-wise, return a new array
py::array_t<double> elementwiseSqrt(py::array_t<double, py::array::c_style> arr) {
py::buffer_info info = arr.request();
auto result = py::array_t<double>(info.shape);
const auto* src = static_cast<const double*>(info.ptr);
auto* dst = static_cast<double*>(result.request().ptr);
py::ssize_t n = info.size;
for (py::ssize_t i = 0; i < n; ++i)
dst[i] = std::sqrt(src[i]);
return result;
}
PYBIND11_MODULE(numops, m) {
m.def("dot", &dot,
"Dot product of two 1D float64 arrays",
py::arg("a"), py::arg("b"));
m.def("sqrt", &elementwiseSqrt,
"Element-wise square root",
py::arg("arr"));
}
import numpy as np
from numops import dot, sqrt
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(dot(a, b)) # 32.0 (1*4 + 2*5 + 3*6)
print(sqrt(np.array([4.0, 9.0, 16.0]))) # [2. 3. 4.]
Always require contiguous arrays when using raw pointer access. Non-contiguous NumPy arrays (slices, transposed views) have non-unit strides:
# This may fail or give wrong results if your C++ code ignores strides
sliced = np.array([1.0, 2.0, 3.0, 4.0])[::2] # non-contiguous
# Safe: force contiguous before passing
dot(np.ascontiguousarray(sliced), ...)
# Or: use py::array::c_style flag in the type — pybind11 will copy if needed
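The stride problem can be reproduced without NumPy, using only the standard library's `memoryview`, which exposes the same buffer metadata (strides, contiguity) that pybind11 reads. This is a minimal sketch; the variable names are illustrative.

```python
from array import array

# Four float64 values in one contiguous buffer
base = memoryview(array("d", [1.0, 2.0, 3.0, 4.0]))
sliced = base[::2]  # every second element, same underlying memory

print(base.strides, base.c_contiguous)      # unit stride (one itemsize), contiguous
print(sliced.strides, sliced.c_contiguous)  # doubled stride, not contiguous

# What stride-aware code sees
correct = sliced.tolist()

# What code sees if it walks from the start pointer with unit stride
n = len(sliced)
naive = base.tolist()[:n]

print(correct)  # [1.0, 3.0]
print(naive)    # [1.0, 2.0]
```

The naive read returns plausible-looking numbers rather than crashing, which is why this class of bug tends to go unnoticed until results are compared against a reference.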
Releasing the GIL
For long-running pure C++ work, release the GIL so other Python threads can run concurrently:
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <algorithm>  // std::copy
namespace py = pybind11;
// Heavy computation: no Python objects touched
static void heavyWork(double* data, py::ssize_t n) {
for (py::ssize_t i = 0; i < n; ++i)
data[i] *= data[i]; // pure C++, no Python API
}
py::array_t<double> squareAll(py::array_t<double, py::array::c_style> arr) {
py::buffer_info info = arr.request();
auto result = py::array_t<double>(info.shape);
const auto* src = static_cast<const double*>(info.ptr);
auto* dst = static_cast<double*>(result.request().ptr);
py::ssize_t n = info.size;
// Copy input to output buffer (done before releasing the GIL,
// while it is still safe to rely on the Python-owned buffers)
std::copy(src, src + n, dst);
{
py::gil_scoped_release release; // release GIL here
// Only raw C++ operations here — never call Python API
heavyWork(dst, n);
// GIL reacquired when 'release' goes out of scope
}
return result;
}
Critical rule: once you have released the GIL, never call any Python C API function until the GIL is reacquired. This includes creating Python objects, calling Python functions, or accessing pybind11 wrapper types. Violations cause crashes and are hard to debug.
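From the Python side, the payoff of releasing the GIL is that calls made from several threads overlap instead of running one after another. The sketch below uses `time.sleep` as a stand-in for a GIL-releasing native function such as squareAll, since `time.sleep` also releases the GIL while it waits; `blocking_call` and `run_in_threads` are hypothetical helper names.

```python
import threading
import time


def blocking_call(seconds: float) -> None:
    # Stand-in for a native function that releases the GIL while it works
    # (time.sleep releases the GIL for the duration of the sleep).
    time.sleep(seconds)


def run_in_threads(n_threads: int, seconds: float) -> float:
    """Run blocking_call in n_threads threads and return the wall time."""
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
print(f"4 threads x 0.2 s took {elapsed:.2f} s")
```

If the calls held the GIL for their whole duration, the total would approach the sum of the individual times; with the GIL released it stays close to the time of a single call.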
Exception Handling
C++ exceptions automatically translate to Python exceptions when they propagate through pybind11:
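#include <pybind11/pybind11.h>
#include <exception>
#include <string>
namespace py = pybind11;
// Built-in translations (no extra headers required):
//   std::exception        → RuntimeError
//   std::invalid_argument → ValueError
// Custom C++ exception → custom Python exception via py::register_exception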
class DatabaseError : public std::exception {
std::string msg_;
public:
explicit DatabaseError(std::string msg) : msg_(std::move(msg)) {}
const char* what() const noexcept override { return msg_.c_str(); }
};
PYBIND11_MODULE(mymodule, m) {
// Register the custom exception
py::register_exception<DatabaseError>(m, "DatabaseError");
m.def("connectDB", []() {
throw DatabaseError("Connection refused: host unreachable");
});
}
from mymodule import DatabaseError, connectDB
try:
connectDB()
except DatabaseError as e:
print(f"Caught: {e}") # Caught: Connection refused: host unreachable
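On the calling side, translated exceptions behave like any other Python exception, so handlers can be ordered from most to least specific. The following sketch lists a subset of the built-in translations from the pybind11 documentation and uses pure-Python stand-ins (`DatabaseError`, `connect_db`, `describe` are hypothetical) so it runs without the compiled module.

```python
# Subset of the built-in C++ → Python translations documented by pybind11
CPP_TO_PYTHON = {
    "std::exception": RuntimeError,
    "std::invalid_argument": ValueError,
    "std::out_of_range": IndexError,
    "std::overflow_error": OverflowError,
    "std::bad_alloc": MemoryError,
}


class DatabaseError(Exception):
    """Pure-Python stand-in for the class py::register_exception creates."""


def connect_db():
    # Stand-in for the bound connectDB() function
    raise DatabaseError("Connection refused: host unreachable")


def describe(call):
    """Call `call` and report which handler caught the failure,
    trying the most specific handler first."""
    try:
        call()
    except DatabaseError as exc:
        return f"database: {exc}"
    except ValueError as exc:
        return f"bad argument: {exc}"
    except RuntimeError as exc:
        return f"generic C++ failure: {exc}"
    return "ok"


print(describe(connect_db))  # database: Connection refused: host unreachable
print(describe(lambda: None))  # ok
```

Because an exception registered with py::register_exception derives from Exception by default, a broad `except RuntimeError` will not catch it; it needs its own handler as shown.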
Common Errors
| Error | Cause | Fix |
|---|---|---|
| ModuleNotFoundError: No module named 'example' | .so not on Python path | sys.path.insert(0, 'build/') or install with pip |
| ImportError: dynamic module does not define module export function (PyInit_example) | Module name in PYBIND11_MODULE differs from filename | Name in macro must match .so filename exactly |
| find_package(pybind11) failed | pybind11 not installed, or CMake cannot locate it | pip install pybind11 + set CMAKE_PREFIX_PATH |
| Crash or wrong results on non-contiguous NumPy array | Ignoring strides, treating data as contiguous | Use py::array::c_style flag or call np.ascontiguousarray() |
| Crash after gil_scoped_release | Called Python API without GIL | Only raw C++ operations inside gil_scoped_release block |
| TypeError: incompatible function arguments | Python type doesn’t match C++ parameter | Check pybind11 type caster for the type; include <pybind11/stl.h> for STL |
| Double-free or dangling pointer on returned object | pybind11 took ownership of a pointer or reference that C++ still owns | Use py::return_value_policy to control ownership |
STL Container Conversions
Include <pybind11/stl.h> for automatic conversion:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <vector>
#include <map>
#include <optional>
#include <string>
namespace py = pybind11;
std::vector<int> makeRange(int n) {
std::vector<int> v(n);
for (int i = 0; i < n; ++i) v[i] = i;
return v; // converted to Python list automatically
}
std::map<std::string, int> wordCount(std::vector<std::string> words) {
std::map<std::string, int> counts;
for (const auto& w : words) ++counts[w];
return counts; // converted to Python dict automatically
}
std::optional<std::string> findUser(int id) {
if (id == 42) return "Alice";
return std::nullopt; // converted to None
}
PYBIND11_MODULE(collections_ext, m) {  // must match the import name; plain "collections" would shadow the standard library
m.def("make_range", &makeRange, py::arg("n"));
m.def("word_count", &wordCount, py::arg("words"));
m.def("find_user", &findUser, py::arg("id"));
}
from collections_ext import make_range, word_count, find_user
print(make_range(5)) # [0, 1, 2, 3, 4]
print(word_count(["a", "b", "a", "c", "b", "a"])) # {'a': 3, 'b': 2, 'c': 1}
print(find_user(42)) # Alice
print(find_user(99)) # None
Note: STL conversions copy the data — modifications to the Python list do not affect the C++ vector.
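The copy semantics can be modelled in plain Python. This is an illustrative sketch only: the `Registry` class is a hypothetical stand-in for a bound C++ object that owns a std::vector<int> and returns it by value through the stl.h conversion.

```python
class Registry:
    """Stand-in for a C++ object that owns a std::vector<int>."""

    def __init__(self, values):
        self._values = list(values)  # plays the role of the C++ side storage

    def values(self):
        # Mirrors the stl.h behaviour: every call builds a fresh Python list
        return list(self._values)


reg = Registry([1, 2, 3])
snapshot = reg.values()
snapshot.append(4)   # mutates only the Python copy

print(snapshot)      # [1, 2, 3, 4]
print(reg.values())  # [1, 2, 3]
```

If callers need to mutate the container in place, the copy has to be written back through an explicit setter, or the container bound as an opaque type instead of being converted.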
Production Patterns
Expose version information
PYBIND11_MODULE(mylib, m) {
m.attr("__version__") = "1.2.0";
m.attr("__author__") = "Your Name";
// ...
}
Build manylinux wheels for PyPI
# .github/workflows/wheels.yml
jobs:
  build-wheels:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: '3.12' }
      - run: pip install cibuildwheel
      # cibuildwheel builds one wheel per CPython version by itself,
      # so the job matrix only varies the OS; CIBW_BUILD selects the versions
      - run: cibuildwheel --output-dir dist/
        env: { CIBW_BUILD: 'cp39-* cp310-* cp311-* cp312-*' }
      - uses: actions/upload-artifact@v4
        with: { name: 'wheels-${{ matrix.os }}', path: dist/*.whl }
cibuildwheel automatically uses manylinux Docker images on Linux to produce wheels compatible with most Linux distributions.
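Each wheel that the build produces encodes its compatibility in its filename, which is a quick way to confirm that CI built what you expected. Below is a minimal sketch of a parser for the common five-field form; `parse_wheel_filename` and the sample filename are hypothetical, and the optional build-tag field is deliberately not handled.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its tag fields.

    Assumes the common five-field form without the optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename}")
    return WheelTags(*parts)


tags = parse_wheel_filename("mylib-1.2.0-cp311-cp311-manylinux_2_17_x86_64.whl")
print(tags.python_tag, tags.platform_tag)  # cp311 manylinux_2_17_x86_64
```

A platform tag starting with manylinux indicates the wheel was built inside one of the manylinux images; a tag such as linux_x86_64 would mean it was built directly on the host and is not suitable for upload to PyPI.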
Key Takeaways
- PYBIND11_MODULE(name, m) is the entry point: name must match the .so/.pyd filename and the Python import statement
- py::array_t<T> with .request() gives direct pointer access to NumPy buffers: zero copy for input arrays
- py::gil_scoped_release releases the GIL for pure C++ work; never call Python API inside the release block
- <pybind11/stl.h> enables automatic conversion of std::vector, std::map, std::optional, etc.; data is copied
- Contiguous arrays: check strides or use the py::array::c_style flag, since non-contiguous arrays have stride != sizeof(T)
- Exception translation: std::exception → RuntimeError automatically; use py::register_exception for custom types
- Wheels: build per-Python-version and per-platform; use cibuildwheel for CI, with manylinux for Linux compatibility