Python Meets C++: High-Performance Engines with pybind11

Key Points of This Article

Bind C++ functions and classes to Python with pybind11: write the module macro, build with CMake, expose NumPy arrays via py::array_t, release the GIL for pure C++ kernels, and ship manylinux wheels.

Why pybind11?

Python dominates data science and AI, but pure Python loops can be orders of magnitude slower than equivalent C++. The typical bottleneck is a preprocessing loop, a custom distance metric, or a simulation kernel that runs millions of times.

pybind11 is one of the most widely used ways to expose C++ as a native Python module. Unlike ctypes (which calls pre-built C libraries) or Cython (which adds a compiled superset of Python), pybind11 lets you write type-safe C++ binding code that integrates naturally with your existing C++ library. It produces a .so (Linux/macOS) or .pyd (Windows) that Python can import like any other module.

The library is header-only, requires no code generation step, and handles reference counting and exception translation automatically. STL container conversions are also automatic once you include <pybind11/stl.h>.


How pybind11 Works

Python script
  └── import example              # loads e.g. example.cpython-311-x86_64-linux-gnu.so
        └── PYBIND11_MODULE       # your C++ binding code
              └── your C++ library functions and classes

When Python calls example.add(1, 2):

  1. Python’s C API calls the extension module’s function pointer
  2. pybind11 unpacks Python objects into C++ types (int, std::string, etc.)
  3. Your C++ function runs
  4. pybind11 converts the return value back to a Python object
  5. Python receives the result

The conversion layer handles type checking and raises TypeError if the arguments don’t match.
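The call sequence above can be modeled in plain Python. This is only a conceptual sketch and not how pybind11 is implemented: the `bind` helper and its `isinstance` checks are hypothetical stand-ins for pybind11's C++ type casters.

```python
from typing import Any, Callable


def bind(func: Callable[..., Any], *arg_types: type) -> Callable[..., Any]:
    """Wrap func so its arguments are type-checked before the call.

    Hypothetical stand-in for the conversion layer; pybind11 does this in
    C++ with type casters, not with isinstance checks.
    """

    def wrapper(*args: Any) -> Any:
        # Step 2: check (and, in real pybind11, convert) each argument.
        if len(args) != len(arg_types):
            raise TypeError(f"{func.__name__}(): incompatible function arguments")
        for value, expected in zip(args, arg_types):
            if not isinstance(value, expected):
                raise TypeError(f"{func.__name__}(): incompatible function arguments")
        # Steps 3 and 4: run the wrapped function and hand the result back.
        return func(*args)

    return wrapper


def _add(a: int, b: int) -> int:
    return a + b


add = bind(_add, int, int)
print(add(1, 2))  # 3

try:
    add("1", 2)
except TypeError as exc:
    print(exc)
```

The point of the model is that a mismatched call fails before the wrapped function ever runs, which is also what you observe with a real pybind11 module.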


Minimal Example

// example.cpp
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>   // needed to convert a Python list to std::vector
#include <vector>

namespace py = pybind11;

int add(int a, int b) { return a + b; }

// Takes the vector by const reference; pybind11 still builds it from a copy
// of the Python list.
double mean(const std::vector<double>& v) {
    double s = 0;
    for (double x : v) s += x;
    return v.empty() ? 0.0 : s / v.size();
}

PYBIND11_MODULE(example, m) {
    m.doc() = "Minimal pybind11 example module";

    m.def("add", &add, "Add two integers",
          py::arg("a"), py::arg("b"));   // named args for Python call sites

    m.def("mean", &mean, "Compute arithmetic mean of a list",
          py::arg("values"));
}

import example

print(example.add(1, 2))          # 3
print(example.add(a=3, b=4))      # 7 — named arguments work
print(example.mean([1.0, 2.0, 3.0]))  # 2.0

Note: the name in PYBIND11_MODULE(example, m) must match the module-name part of the filename (example.so, example.pyd, or an ABI-tagged name such as example.cpython-311-x86_64-linux-gnu.so) and the import statement.
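To see which filenames the running interpreter would accept for a given module name, you can list its extension suffixes. A minimal sketch using only the standard library; the helper name `expected_filenames` is made up for illustration.

```python
import importlib.machinery


def expected_filenames(module_name: str) -> list[str]:
    """Return the extension-module filenames this interpreter would import."""
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


# The exact output depends on the platform and Python version.
for filename in expected_filenames("example"):
    print(filename)
```

If the file in your build directory is not one of the printed names, `import example` will not find it, even when the directory is on `sys.path`.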


Building with CMake

# CMakeLists.txt
cmake_minimum_required(VERSION 3.20)
project(example)

# Find Python and pybind11
find_package(Python3 COMPONENTS Interpreter Development REQUIRED)
find_package(pybind11 CONFIG REQUIRED)   # install: pip install pybind11

# Build the extension module
pybind11_add_module(example example.cpp)
target_compile_features(example PRIVATE cxx_std_17)

# Build
cmake -B build
cmake --build build -j$(nproc)

# Test import
python3 -c "import sys; sys.path.insert(0, 'build'); import example; print(example.add(1,2))"

If find_package(pybind11) fails:

pip install pybind11
# Then set the CMake prefix path
cmake -B build -DCMAKE_PREFIX_PATH=$(python3 -c "import pybind11; print(pybind11.get_cmake_dir())")

Alternative: FetchContent

include(FetchContent)
FetchContent_Declare(pybind11
    GIT_REPOSITORY https://github.com/pybind/pybind11.git
    GIT_TAG        v2.12.0)
FetchContent_MakeAvailable(pybind11)

Exposing Classes

#include <pybind11/pybind11.h>
#include <string>
#include <vector>   // std::vector backs Matrix::data_

namespace py = pybind11;

class Matrix {
    int rows_, cols_;
    std::vector<double> data_;

public:
    Matrix(int rows, int cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double get(int r, int c) const { return data_[r * cols_ + c]; }
    void   set(int r, int c, double v) { data_[r * cols_ + c] = v; }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

    Matrix operator+(const Matrix& rhs) const {
        Matrix result(rows_, cols_);
        for (size_t i = 0; i < data_.size(); ++i)
            result.data_[i] = data_[i] + rhs.data_[i];
        return result;
    }

    std::string repr() const {
        return "Matrix(" + std::to_string(rows_) + "x" + std::to_string(cols_) + ")";
    }
};

PYBIND11_MODULE(linalg, m) {
    py::class_<Matrix>(m, "Matrix")
        .def(py::init<int, int>(), py::arg("rows"), py::arg("cols"))
        .def("get",  &Matrix::get,  py::arg("row"), py::arg("col"))
        .def("set",  &Matrix::set,  py::arg("row"), py::arg("col"), py::arg("value"))
        .def_property_readonly("rows", &Matrix::rows)
        .def_property_readonly("cols", &Matrix::cols)
        .def("__add__",  &Matrix::operator+)
        .def("__repr__", &Matrix::repr);
}

from linalg import Matrix

m = Matrix(3, 3)
m.set(0, 0, 1.0)
m.set(1, 1, 2.0)
print(m)           # Matrix(3x3)
print(m.rows)      # 3

NumPy Integration

The most common use case: pass a NumPy array from Python, process it in C++, return results without copying data.

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <cmath>
#include <stdexcept>

namespace py = pybind11;

// c_style | forcecast: pybind11 converts non-contiguous or non-float64 input
// into a contiguous float64 copy, so the raw pointer access below is safe.
// Input that already matches is passed through without a copy.
using DenseArray = py::array_t<double, py::array::c_style | py::array::forcecast>;

// Compute dot product of two 1D float64 arrays
double dot(DenseArray a, DenseArray b) {
    // Request buffer info: pointer, strides, shape
    py::buffer_info buf_a = a.request();
    py::buffer_info buf_b = b.request();

    if (buf_a.ndim != 1 || buf_b.ndim != 1)
        throw std::runtime_error("Expected 1D arrays");

    if (buf_a.shape[0] != buf_b.shape[0])
        throw std::runtime_error("Array size mismatch");

    auto* pa = static_cast<const double*>(buf_a.ptr);
    auto* pb = static_cast<const double*>(buf_b.ptr);
    py::ssize_t n = buf_a.shape[0];   // py::ssize_t is portable, including MSVC

    double result = 0.0;
    for (py::ssize_t i = 0; i < n; ++i)
        result += pa[i] * pb[i];

    return result;
}

// Apply sqrt element-wise, return a new array
py::array_t<double> elementwiseSqrt(DenseArray arr) {
    py::buffer_info info = arr.request();
    auto result = py::array_t<double>(info.shape);

    auto* src = static_cast<const double*>(info.ptr);
    auto* dst = static_cast<double*>(result.request().ptr);
    py::ssize_t n = info.size;

    for (py::ssize_t i = 0; i < n; ++i)
        dst[i] = std::sqrt(src[i]);

    return result;
}

PYBIND11_MODULE(numops, m) {
    m.def("dot", &dot,
          "Dot product of two 1D float64 arrays",
          py::arg("a"), py::arg("b"));

    m.def("sqrt", &elementwiseSqrt,
          "Element-wise square root",
          py::arg("arr"));
}

import numpy as np
from numops import dot, sqrt

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(dot(a, b))              # 32.0  (1*4 + 2*5 + 3*6)
print(sqrt(np.array([4.0, 9.0, 16.0])))  # [2. 3. 4.]

Always require contiguous arrays when using raw pointer access. Non-contiguous NumPy arrays (slices, transposed views) have non-unit strides:

# This may fail or give wrong results if your C++ code ignores strides
sliced = np.array([1.0, 2.0, 3.0, 4.0])[::2]  # non-contiguous

# Safe: force contiguous before passing
dot(np.ascontiguousarray(sliced), ...)

# Or: use py::array::c_style flag in the type — pybind11 will copy if needed
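The effect of ignoring strides can be reproduced without NumPy. The sketch below uses the standard library's `array` and `memoryview`, which expose the same buffer protocol fields (shape, strides, contiguity) that `py::buffer_info` reports. The helpers `naive_read` and `make_contiguous` are hypothetical names for illustration.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0, 4.0])
full = memoryview(values)   # contiguous view over four doubles
sliced = full[::2]          # every second element, no copy


def naive_read(view: memoryview, base: memoryview) -> list[float]:
    """Read len(view) items from the start of base, ignoring view.strides.

    Mimics C++ code that walks a raw pointer with unit stride.
    """
    return base[: len(view)].tolist()


def make_contiguous(view: memoryview) -> memoryview:
    """Copy the selected elements into a fresh C-contiguous buffer."""
    return memoryview(view.tobytes()).cast(view.format)


print(full.strides, full.c_contiguous)
print(sliced.strides, sliced.c_contiguous)
print(sliced.tolist())                   # [1.0, 3.0]
print(naive_read(sliced, full))          # [1.0, 2.0]
print(make_contiguous(sliced).tolist())  # [1.0, 3.0]
```

The stride of the sliced view is twice the item size, so a loop that assumes unit stride reads the neighbouring element instead of the intended one. Copying into a contiguous buffer first plays the same role as `np.ascontiguousarray`.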

Releasing the GIL

For long-running pure C++ work, release the GIL so other Python threads can run concurrently:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <algorithm>   // std::copy

namespace py = pybind11;

// Heavy computation: no Python objects touched
static void heavyWork(double* data, py::ssize_t n) {
    for (py::ssize_t i = 0; i < n; ++i)
        data[i] *= data[i];   // pure C++, no Python API
}

py::array_t<double> squareAll(
        py::array_t<double, py::array::c_style | py::array::forcecast> arr) {
    py::buffer_info info = arr.request();
    auto result = py::array_t<double>(info.shape);

    auto* src = static_cast<const double*>(info.ptr);
    auto* dst = static_cast<double*>(result.request().ptr);
    py::ssize_t n = info.size;

    // Copy input to output buffer
    std::copy(src, src + n, dst);

    {
        py::gil_scoped_release release;   // release GIL here
        // Only raw C++ operations here; never call the Python API
        heavyWork(dst, n);
        // GIL reacquired when 'release' goes out of scope
    }

    return result;
}

Critical rule: once you have released the GIL, never call any Python C API function until the GIL is reacquired. This includes creating Python objects, calling Python functions, or accessing pybind11 wrapper types. Violations cause crashes and are hard to debug.
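From the Python side, the benefit shows up when several threads call a GIL-releasing function at once. The sketch below uses `time.sleep` (which releases the GIL while waiting) as a hypothetical stand-in for a C++ kernel wrapped with `py::gil_scoped_release`; `fake_kernel` and `run_concurrently` are illustrative names, not part of any module in this article.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(delay: float) -> float:
    """Hypothetical stand-in for a C++ kernel that releases the GIL."""
    time.sleep(delay)  # time.sleep releases the GIL while it waits
    return delay


def run_concurrently(n_calls: int, delay: float) -> tuple[list[float], float]:
    """Run n_calls kernels on separate threads and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_calls) as pool:
        results = list(pool.map(fake_kernel, [delay] * n_calls))
    return results, time.perf_counter() - start


results, elapsed = run_concurrently(n_calls=4, delay=0.2)
print(f"{len(results)} calls finished in {elapsed:.2f} s")
```

Because the calls overlap, the batch takes roughly one delay rather than four. A function that holds the GIL for its whole duration would instead run the four calls one after another.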


Exception Handling

C++ exceptions automatically translate to Python exceptions when they propagate through pybind11:

#include <pybind11/pybind11.h>
#include <exception>
#include <string>
#include <utility>   // std::move

namespace py = pybind11;

// std::exception → RuntimeError (automatic)
// std::invalid_argument → ValueError (automatic, no extra header needed)

// Custom C++ exception → custom Python exception
class DatabaseError : public std::exception {
    std::string msg_;
public:
    explicit DatabaseError(std::string msg) : msg_(std::move(msg)) {}
    const char* what() const noexcept override { return msg_.c_str(); }
};

PYBIND11_MODULE(mymodule, m) {
    // Register the custom exception
    py::register_exception<DatabaseError>(m, "DatabaseError");

    m.def("connectDB", []() {
        throw DatabaseError("Connection refused: host unreachable");
    });
}

from mymodule import DatabaseError, connectDB

try:
    connectDB()
except DatabaseError as e:
    print(f"Caught: {e}")   # Caught: Connection refused: host unreachable
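For reference, the built-in translations can be written down as a lookup table. The mapping below follows the table in the pybind11 documentation as I understand it, so check it against the version you build with. The helper `python_exception_for` is a hypothetical name used only for this sketch.

```python
# C++ exception type -> Python exception raised by pybind11's default translator.
# Assumption: taken from the pybind11 documentation; verify for your version.
BUILTIN_TRANSLATIONS = {
    "std::exception": RuntimeError,
    "std::bad_alloc": MemoryError,
    "std::domain_error": ValueError,
    "std::invalid_argument": ValueError,
    "std::length_error": ValueError,
    "std::out_of_range": IndexError,
    "std::range_error": ValueError,
    "std::overflow_error": OverflowError,
}


def python_exception_for(cpp_type: str) -> type:
    """Look up the Python exception for a C++ type name.

    Unlisted subclasses of std::exception fall back to RuntimeError.
    """
    return BUILTIN_TRANSLATIONS.get(cpp_type, RuntimeError)


print(python_exception_for("std::out_of_range").__name__)  # IndexError
print(python_exception_for("DatabaseError").__name__)      # RuntimeError
```

The second lookup shows why `py::register_exception` matters: without it, a custom exception such as DatabaseError reaches Python as a plain RuntimeError and cannot be caught by its own name.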

Common Errors

Error | Cause | Fix
ModuleNotFoundError: No module named 'example' | .so not on Python path | sys.path.insert(0, 'build/') or install with pip
undefined symbol: PyInit_example | Module name in PYBIND11_MODULE differs from filename | Name in macro must match .so filename exactly
find_package(pybind11) failed | pybind11 not installed | pip install pybind11 + set CMAKE_PREFIX_PATH
Crash on non-contiguous NumPy array | Ignoring strides, treating data as contiguous | Use py::array::c_style flag or call np.ascontiguousarray()
Crash after gil_scoped_release | Called Python API without GIL | Only raw C++ operations inside gil_scoped_release block
TypeError: incompatible types | Python type doesn’t match C++ parameter | Check pybind11 type caster for the type; include <pybind11/stl.h> for STL
Double-free on returned container | Container ownership ambiguous | Use py::return_value_policy to control ownership

STL Container Conversions

Include <pybind11/stl.h> for automatic conversion:

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <vector>
#include <map>
#include <optional>
#include <string>

namespace py = pybind11;

std::vector<int> makeRange(int n) {
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i;
    return v;  // converted to Python list automatically
}

std::map<std::string, int> wordCount(std::vector<std::string> words) {
    std::map<std::string, int> counts;
    for (const auto& w : words) ++counts[w];
    return counts;  // converted to Python dict automatically
}

std::optional<std::string> findUser(int id) {
    if (id == 42) return "Alice";
    return std::nullopt;  // converted to None
}

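// The module is named collections_ext so it matches the import below and
// does not shadow Python's standard-library collections module.
PYBIND11_MODULE(collections_ext, m) {
    m.def("make_range",  &makeRange,  py::arg("n"));
    m.def("word_count",  &wordCount,  py::arg("words"));
    m.def("find_user",   &findUser,   py::arg("id"));
}

from collections_ext import make_range, word_count, find_user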

print(make_range(5))                              # [0, 1, 2, 3, 4]
print(word_count(["a", "b", "a", "c", "b", "a"]))  # {'a': 3, 'b': 2, 'c': 1}
print(find_user(42))                              # Alice
print(find_user(99))                              # None

Note: STL conversions copy the data — modifications to the Python list do not affect the C++ vector.
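The copy semantics can be illustrated without a compiled module. In this sketch the `Registry` class is a hypothetical pure-Python stand-in for a bound C++ object that owns a `std::vector<int>`: its getter returns a fresh list on every call, as the <pybind11/stl.h> conversion does.

```python
class Registry:
    """Pure-Python stand-in for a C++ object that owns a std::vector<int>."""

    def __init__(self) -> None:
        self._values = [1, 2, 3]   # plays the role of the C++ vector

    def values(self) -> list[int]:
        # Fresh copy on every call, like the stl.h conversion to a Python list.
        return list(self._values)


registry = Registry()
snapshot = registry.values()
snapshot.append(99)          # mutates only the Python-side copy

print(snapshot)              # [1, 2, 3, 99]
print(registry.values())     # [1, 2, 3]
```

If Python code needs to mutate the container in place, the copy-based conversion is the wrong tool. In that case expose explicit methods on the C++ class instead.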


Production Patterns

Expose version information

PYBIND11_MODULE(mylib, m) {
    m.attr("__version__") = "1.2.0";
    m.attr("__author__")  = "Your Name";
    // ...
}

Build manylinux wheels for PyPI

# .github/workflows/wheels.yml
jobs:
  build-wheels:
    strategy:
      matrix:
        # cibuildwheel builds every targeted Python version itself,
        # so the matrix only needs one job per operating system.
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with: { python-version: '3.12' }
      - run: pip install cibuildwheel
      - run: cibuildwheel --output-dir dist/
        env:
          CIBW_BUILD: 'cp39-* cp310-* cp311-* cp312-*'
      - uses: actions/upload-artifact@v4
        # upload-artifact@v4 needs a unique artifact name per job
        with: { name: 'wheels-${{ matrix.os }}', path: dist/*.whl }

cibuildwheel automatically uses manylinux Docker images on Linux to produce wheels compatible with most Linux distributions.
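Each wheel's filename records which interpreter and platform it targets, which is why one wheel per Python version and platform is needed. A minimal sketch of reading those tags with the standard library follows. The filename is a made-up example and `parse_wheel_filename` is a hypothetical helper, not a packaging API.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):   # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(parts[0], parts[1], python_tag, abi_tag, platform_tag)


# Hypothetical filename of the kind cibuildwheel produces on Linux.
tags = parse_wheel_filename(
    "example-1.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(tags.python_tag)     # cp311
print(tags.platform_tag)   # manylinux_2_17_x86_64.manylinux2014_x86_64
```

A wheel tagged cp311 is only installable on CPython 3.11, and the manylinux platform tag states the minimum glibc baseline the binary was built against.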


Key Takeaways

  • PYBIND11_MODULE(name, m) is the entry point — name must match the .so/.pyd filename and the Python import statement
  • py::array_t<T> with request() gives direct pointer access to NumPy buffers, with no copy when the input already has the requested dtype and layout
  • py::gil_scoped_release releases the GIL for pure C++ work; never call Python API inside the release block
  • <pybind11/stl.h> enables automatic conversion of std::vector, std::map, std::optional, etc. — data is copied
  • Contiguous arrays: check strides or use py::array::c_style flag — non-contiguous arrays have stride != sizeof(T)
  • Exception translation: std::exception → RuntimeError automatically; use py::register_exception for custom types
  • Wheels: build per-Python-version and per-platform; use cibuildwheel for CI — manylinux for Linux compatibility

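Frequently Asked Questions (FAQ)

Q. When would I use this in practice?

A. Bind C++ to Python with pybind11: minimal modules, CMake/setuptools builds, NumPy buffers, GIL release, wheels, and prod… In practice, apply it by following the examples and selection guidance in the article above.

Q. What should I read first?

A. Follow the previous-post or related-post links at the bottom of each article to learn in order. The C++ series table of contents shows the overall flow.

Q. How can I study this in more depth?

A. See cppreference and the official documentation of the library. The reference links at the end of the article are also useful.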

