Python Meets C++: High-Performance Engines with pybind11 [#35-1]
이 글의 핵심
Build importable C++ extensions for Python with pybind11—NumPy, GIL, packaging, and pitfalls.
Introduction: “Python is nice—only this loop should be C++”
“Our preprocessing loop takes 30 minutes”
In AI/data pipelines, pure Python loops can be orders of magnitude slower than C. pybind11 is the most ergonomic way to expose C++ as import-able native modules—lighter than Boost.Python, great C++11+ ergonomics, strong NumPy support.
This article covers: what pybind11 is, minimal PYBIND11_MODULE, CMake/setuptools builds, py::array_t, GIL rules, common import/build errors, performance notes, and shipping wheels.
Environment: Python 3.6+ dev headers, CMake 3.4+ or setuptools, C++11+, NumPy optional.
Table of contents
- What is pybind11?
- Minimal example
- Building with CMake
- NumPy integration
- Complete examples
- Common errors
- Performance
- Production patterns
- Tips
1. What is pybind11?
- Header-only bindings from C++ functions/classes/enums to Python.
- Produces native extensions (
.so/.pyd) using the Python C API. - Converts many STL containers to Python lists/dicts automatically.
flowchart TB
subgraph py["Python"]
P[Script] --> M[Extension module]
end
subgraph cpp["C++"]
M --> B[pybind11 bindings]
B --> C[Your code]
B --> N[py::array_t]
end
2. Minimal example
#include <pybind11/pybind11.h>
namespace py = pybind11;
int add(int a, int b) { return a + b; }
PYBIND11_MODULE(example, m) {
m.doc() = "minimal pybind11 example";
m.def("add", &add, "Add two integers");
}
import example
print(example.add(1, 2)) # 3
3. Build (CMake sketch)
find_package(Python3 COMPONENTS Interpreter Development REQUIRED)
find_package(pybind11 CONFIG REQUIRED)
pybind11_add_module(example example.cpp)
target_link_libraries(example PRIVATE pybind11::module)
Use pybind11::embed when embedding a Python interpreter inside a C++ executable; extension modules link pybind11::module.
4. NumPy
Use py::array_t<T> and buf.request() for pointer/size/strides. Release the GIL for heavy numeric kernels:
py::gil_scoped_release release;
// touch only raw pointers / POD here—no Python API calls
5. Complete examples
Complete examples in this series include dot products over raw buffers, custom C++ exceptions mapped to Python, and STL containers exposed as Python sequences—patterns you reuse when wrapping numeric kernels or ML preprocessing.
6. Common errors
| Issue | Fix |
|---|---|
ModuleNotFoundError | Put built .so/.pyd on PYTHONPATH or install into site-packages |
undefined symbol: PyInit_* | Module name in PYBIND11_MODULE(name, m) must match import |
find_package(pybind11) fails | pip install pybind11, set CMAKE_PREFIX_PATH to cmake dir, or FetchContent |
| non-contiguous NumPy | np.ascontiguousarray or honor strides in C++ |
crash after gil_scoped_release | Never call Python C API without GIL |
7. Performance
Pure Python loops are slow; NumPy is fast; hand-written C++ on contiguous buffers via pybind11 can beat NumPy on irregular control flow. Measure your workload.
8. Production patterns
- Expose
__version__, validate inputs, translate C++ exceptions withpy::register_exception. - Build manylinux/macOS/Windows wheels per Python ABI.
- Document contiguous-array requirements.
9. Tips
Profile before optimizing; keep hot loops GIL-free and pointer-only.
Related posts
- vcpkg & Conan (#40-1)
- C++ vs Python
Keywords
pybind11 tutorial, Python C++ binding, NumPy C++, Python extension module, machine learning acceleration
Next: WebAssembly & Emscripten (#35-2)
Previous: Cache alignment (#34-2)