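When is this used in practice?

It is used for large-scale log aggregation, real-time statistics, read-load distribution (replica sets), and analytical queries that need complex joins and grouping. Apply it by following the examples and selection guidance in the body of this article.

What should I read first?

For basic MongoDB CRUD and connections, read [MongoDB Complete Guide (#52-3)](/blog/cpp-series-52-3-mongodb-cpp/) first. Following the "previous post" link at the bottom of each post lets you work through the series in order.

How can I go deeper?

See the Aggregation Pipeline, Index Strategies, and Replica Set sections of the official MongoDB documentation. The reference links at the end of this post are also worth using.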

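C++ MongoDB Driver Advanced | Complete Guide to Aggregation Pipelines, Indexing, and Replica Sets [#52-4]

April 7, 2026 · 28 min read · Updated April 7, 2026 Advanced Hands-on

Key points of this post

Advanced MongoDB usage from C++: aggregation pipelines ($match, $group, $lookup), index strategy, and replica set connections with Read Preference. Covers real-world problem scenarios, fixes for common errors, performance optimization, and production patterns.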

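Introduction: when you need aggregation, indexing, and replica sets

Problem scenarios you actually run into

Scenario 1: Log data has to be aggregated by day and by hour
You have millions of event log documents in the database, and statistics such as "daily active users" or "average response time per hour" cannot be computed with find() alone. You need the equivalent of SQL's GROUP BY, SUM, and AVG.

Scenario 2: find() is slow and times out
When a query on userId and createdAt runs against hundreds of thousands of documents or more, scanning the whole collection without an index takes several seconds. It is not obvious which index to create or how to order the fields of a compound index.

Scenario 3: Heavy read load overloads the database
In a service with few writes and many reads, a single MongoDB instance becomes the bottleneck. You want to spread reads across the secondaries of a replica set, but do not know how to configure that in the C++ driver.

Scenario 4: Aggregation pipeline exceeds its memory limit
$group or $sort fails on a large dataset with an "Exceeded memory limit" error. You need the allow_disk_use option or a way to split the pipeline into smaller steps.

Scenario 5: Results must be built by joining several collections
The orders and users collections have to be linked by userId so that each order is returned with its user information attached. You need $lookup, the counterpart of SQL's JOIN.

Scenario 6: "Not a replica set member" error when connecting to a replica set
You tried to connect with a single-instance URI, but the cluster is actually configured as a replica set and the connection fails. The URI format and options are confusing.

Solved with the advanced features of the MongoDB C++ driver:

Aggregation pipeline: complex analytical queries with $match, $group, $lookup, $sort, and others
Index strategy: single-field, compound, and TTL indexes, plus create_index options
Replica sets: read_preference, write_concern, and multi-host URIs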

flowchart LR
  subgraph Agg[Aggregation]
    A1[Documents] --> A2[$match]
    A2 --> A3[$group]
    A3 --> A4[$sort]
    A4 --> A5[Result]
  end
  subgraph Idx[Indexing]
    I1[Query] --> I2[Index lookup]
    I2 --> I3[Fast return]
  end
  subgraph RS[Replica set]
    R1[Primary] --> R2[Writes]
    R3[Secondary] --> R4[Distributed reads]
  end

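What this post covers:

Complete aggregation pipeline examples ($match, $group, $lookup, $project, $sort)
Index strategy: single-field, compound, TTL, and text indexes
Replica set connections, Read Preference, and Write Concern
Common errors and how to fix them
Performance optimization tips
Production deployment patterns

Requirements: C++17 or later; reading the MongoDB basics guide (#52-3) first is recommended

Practical experience: this post is based on problems encountered in a large C++ project and how they were resolved. It includes practical pitfalls and debugging tips that books and documentation tend not to cover.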

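1. Aggregation pipeline

Core concept

An aggregation pipeline transforms documents by passing them through a sequence of stages. Stages such as $match, $group, and $sort correspond to SQL's WHERE, GROUP BY, and ORDER BY. In C++, stages are chained on a mongocxx::pipeline object.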

flowchart TB
  subgraph Pipeline[Aggregation pipeline]
    S1[$match\nFilter] --> S2[$group\nGroup]
    S2 --> S3[$sort\nSort]
    S3 --> S4["$project\nSelect fields"]
  end
  Docs[Documents] --> S1
  S4 --> Result[Result]
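
To make the stage-by-stage flow concrete, the following is a minimal sketch in standard C++ that mimics $match, $group, and $sort on an in-memory vector. It does not use the driver at all; `Event`, `GroupResult`, and `count_by_type_since` are hypothetical names introduced only for this illustration, and the integer `day` stands in for a real date field.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a document in an events collection.
struct Event {
    std::string type;
    int day;  // day number, standing in for createdAt
};

// Hypothetical stand-in for one output document of the pipeline.
struct GroupResult {
    std::string type;  // plays the role of _id
    int count;
};

// Mimics: $match {day >= min_day} -> $group {_id: "$type", count: {$sum: 1}} -> $sort {count: -1}
std::vector<GroupResult> count_by_type_since(const std::vector<Event>& events, int min_day) {
    std::map<std::string, int> groups;
    for (const auto& e : events) {
        if (e.day < min_day) continue;  // $match: drop documents before grouping
        ++groups[e.type];               // $group with $sum: 1
    }
    std::vector<GroupResult> out;
    for (const auto& [type, count] : groups) out.push_back({type, count});
    // $sort: descending by count; stable_sort keeps ties in key order
    std::stable_sort(out.begin(), out.end(),
                     [](const GroupResult& a, const GroupResult& b) { return a.count > b.count; });
    return out;
}
```

The filter runs before the grouping step, which is the same reason $match is usually placed at the front of a real pipeline: later stages see fewer documents.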

1.1 Basic aggregation: $match + $group

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/json.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/pipeline.hpp>
#include <mongocxx/uri.hpp>
#include <chrono>
#include <iostream>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto collection = client["analytics"]["events"];

    // Aggregate active users per day
    mongocxx::pipeline stages;
    stages.match(make_document(
        kvp("createdAt", make_document(
            kvp("$gte", bsoncxx::types::b_date{std::chrono::system_clock::now() - std::chrono::hours(24*7)})))))
        .group(make_document(
            kvp("_id", make_document(kvp("$dateToString", make_document(
                kvp("format", "%Y-%m-%d"),
                kvp("date", "$createdAt"))))),
            kvp("activeUsers", make_document(kvp("$addToSet", "$userId"))),
            kvp("count", make_document(kvp("$sum", 1)))));

    auto cursor = collection.aggregate(stages);
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
    return 0;
}

Code walkthrough:

$match: keeps only events from the last 7 days
$group: _id is the date string, $addToSet collects the distinct userId values (the size of that array is the daily active user count), and $sum counts events
aggregate(stages): runs the pipeline and returns a cursor

1.2 Joining collections with $lookup

// Join orders and users on userId
// (make_array comes from <bsoncxx/builder/basic/array.hpp>)
void lookup_example(mongocxx::collection& orders) {
    mongocxx::pipeline stages;
    stages.lookup(make_document(
        kvp("from", "users"),
        kvp("localField", "userId"),
        kvp("foreignField", "_id"),
        kvp("as", "userInfo")));

    stages.project(make_document(
        kvp("orderId", "$_id"),
        kvp("amount", 1),
        kvp("userName", make_document(kvp("$arrayElemAt", make_array("$userInfo.name", 0))))));

    auto cursor = orders.aggregate(stages);
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}

$lookup options:

from: the collection to join with
localField: the field in the current collection
foreignField: the field in the target collection
as: the name of the array field that receives the matches

1.3 Complete example: $match + $group + $sort + $limit

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/json.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
#include <iostream>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;
using bsoncxx::builder::basic::make_array;

void aggregation_full_example(mongocxx::collection& collection) {
    // Group by cuisine, compute the average rating, keep the top 5
    mongocxx::pipeline stages;

    stages.match(make_document(kvp("rating", make_document(kvp("$exists", true)))))
        .group(make_document(
            kvp("_id", "$cuisine"),
            kvp("avgRating", make_document(kvp("$avg", "$rating"))),
            kvp("count", make_document(kvp("$sum", 1)))))
        .sort(make_document(kvp("avgRating", -1)))
        .limit(5);

    // allow_disk_use: avoids memory-limit failures on large datasets
    mongocxx::options::aggregate opts{};
    opts.allow_disk_use(true);

    auto cursor = collection.aggregate(stages, opts);
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}

Note: allow_disk_use(true) lets stages spill to disk when they exceed the 100MB memory limit. $graphLookup stays bound by the 100MB limit regardless.

1.4 Running several aggregations at once with $facet

// Compute several statistics in a single aggregate call
void facet_example(mongocxx::collection& collection) {
    mongocxx::pipeline stages;
    stages.facet(make_document(
        kvp("dailyStats", make_array(
            make_document(kvp("$match", make_document(kvp("type", "daily")))),
            make_document(kvp("$group", make_document(
                kvp("_id", "$date"),
                kvp("total", make_document(kvp("$sum", "$amount")))))))),
        kvp("topUsers", make_array(
            make_document(kvp("$group", make_document(
                kvp("_id", "$userId"),
                kvp("count", make_document(kvp("$sum", 1)))))),
            make_document(kvp("$sort", make_document(kvp("count", -1)))),
            make_document(kvp("$limit", 10))))));

    auto cursor = collection.aggregate(stages);
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}

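Using $facet: the same input documents are run through several sub-pipelines within one stage, so a single round trip yields multiple statistics. The combined output is one document, so it is subject to the 16MB BSON document size limit.

2. Indexing strategy

Core concept

Without an index, MongoDB performs a collection scan (COLLSCAN), which gets slower as the number of documents grows. With an index, the query planner can use an index scan (IXSCAN) that examines only the matching index entries and returns results quickly.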

flowchart LR
  subgraph NoIdx[No index]
    N1[Query] --> N2[Full scan]
    N2 --> N3[Slow]
  end
  subgraph WithIdx[With index]
    W1[Query] --> W2[Index lookup]
    W2 --> W3[Fast]
  end

2.1 Creating single-field and compound indexes

#include <bsoncxx/builder/basic/document.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

void create_indexes(mongocxx::collection& collection) {
    // Single-field index: userId ascending
    auto idx1 = make_document(kvp("userId", 1));
    collection.create_index(idx1.view());

    // Compound index: userId + createdAt (field order matters and should follow the query pattern)
    auto idx2 = make_document(
        kvp("userId", 1),
        kvp("createdAt", -1));
    collection.create_index(idx2.view());

    // Unique index
    mongocxx::options::index opts{};
    opts.unique(true);
    auto idx3 = make_document(kvp("email", 1));
    collection.create_index(idx3.view(), opts);
}

Compound index order: for a query that filters on userId first and then sorts by createdAt, the order (userId, createdAt) is the right fit. The order (createdAt, userId) suits a different query pattern, such as a date-range scan across all users.
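
Why the field order matters can be sketched without the driver. In the snippet below a `std::map` keyed by `(userId, createdAt)` stands in for a compound index; this is only an analogy, and `CompoundIndex` and `newest_first_for_user` are hypothetical names. Because keys are sorted by userId first, all entries for one user form a contiguous range that is already ordered by createdAt.

```cpp
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compound index on (userId, createdAt):
// keys are kept sorted by userId first, then by createdAt.
using CompoundIndex = std::map<std::pair<std::string, long>, int>;  // key -> document id

// Returns the document ids of one user, newest first, by walking one contiguous key range.
std::vector<int> newest_first_for_user(const CompoundIndex& index, const std::string& user) {
    auto first = index.lower_bound({user, std::numeric_limits<long>::min()});
    auto last = index.upper_bound({user, std::numeric_limits<long>::max()});
    std::vector<int> ids;
    // Walk the range backwards: the entries are already ordered, so no sort step is needed.
    for (auto it = last; it != first;) {
        --it;
        ids.push_back(it->second);
    }
    return ids;
}
```

With the fields reversed, (createdAt, userId), the entries of a single user would be scattered across the whole key space, so the same lookup would have to visit every entry and sort afterwards.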

2.2 TTL index (automatic expiry)

// Delete documents automatically 7 days after createdAt
void create_ttl_index(mongocxx::collection& collection) {
    auto keys = make_document(kvp("createdAt", 1));
    mongocxx::options::index opts{};
    opts.expire_after(std::chrono::seconds(7 * 24 * 60 * 60));  // 7 days

    collection.create_index(keys.view(), opts);
}

TTL index: a MongoDB background task periodically deletes expired documents. This is useful for logs and session data.
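
The expiry rule itself is simple arithmetic, sketched below in standard C++. `is_expired` is a hypothetical helper, not a driver function, and treating the exact boundary instant as expired is an assumption of this sketch. Since the background task runs periodically (every 60 seconds by default), actual deletion can lag behind the moment a document becomes eligible.

```cpp
#include <chrono>

using Clock = std::chrono::system_clock;

// Sketch of the TTL rule: a document becomes eligible for deletion once
// the indexed date plus the expire-after interval is at or before the current time.
bool is_expired(Clock::time_point created_at, std::chrono::seconds expire_after, Clock::time_point now) {
    return created_at + expire_after <= now;
}
```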

2.3 Text index

// For full-text search
void create_text_index(mongocxx::collection& collection) {
    auto keys = make_document(
        kvp("title", "text"),
        kvp("content", "text"));
    mongocxx::options::index opts{};
    opts.default_language("en");

    collection.create_index(keys.view(), opts);
}

// Text search query
void text_search(mongocxx::collection& collection) {
    auto filter = make_document(kvp("$text", make_document(kvp("$search", "mongodb driver"))));
    mongocxx::options::find opts{};
    // Pass the document by value: a view of the temporary would dangle once this statement ends
    opts.projection(make_document(kvp("score", make_document(kvp("$meta", "textScore")))));

    auto cursor = collection.find(filter.view(), opts);
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}

2.4 Listing indexes and running explain

// List indexes
void list_indexes(mongocxx::collection& collection) {
    auto cursor = collection.list_indexes();
    for (auto&& doc : cursor) {
        std::cout << bsoncxx::to_json(doc) << std::endl;
    }
}

// Inspect the query execution plan.
// Sent as the server's explain command through run_command, so this takes the database;
// the collection name "events" is an assumption for illustration.
void explain_query(mongocxx::database& db) {
    auto cmd = make_document(
        kvp("explain", make_document(
            kvp("find", "events"),
            kvp("filter", make_document(kvp("userId", "user123"))))),
        kvp("verbosity", "executionStats"));

    auto plan = db.run_command(cmd.view());
    std::cout << "Plan: " << bsoncxx::to_json(plan.view()) << std::endl;
}

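3. Connecting to a replica set

Core concept

A replica set consists of one Primary, which accepts all writes and by default also serves reads, and Secondary nodes that replicate its data and can serve reads. With the C++ driver you list several hosts in the URI and use read_preference to control where read requests are sent.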

flowchart TB
  subgraph RS[Replica set]
    P[Primary\nWrites]
    S1[Secondary 1]
    S2[Secondary 2]
  end
  App[C++ app] -->|write| P
  App -->|read: primary| P
  App -->|read: secondary| S1
  App -->|read: secondary| S2

3.1 Replica set URI

// Multiple hosts plus the replicaSet name (required)
mongocxx::uri uri_rs("mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0");

// With authentication
mongocxx::uri uri_auth("mongodb://user:pass@host1:27017,host2:27017/?replicaSet=rs0&authSource=admin");

// MongoDB Atlas (srv)
mongocxx::uri uri_atlas("mongodb+srv://cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority");

Note: if the deployment is a replica set and the replicaSet option is omitted, a "Not a replica set member" error can occur.
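
When the host list comes from configuration, the URI can be assembled in code. The helper below is a minimal sketch; `make_replica_set_uri` is a hypothetical name, and it assumes the host strings and set name need no percent-encoding and that no credentials are embedded.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins host:port entries with commas and appends the replicaSet option.
std::string make_replica_set_uri(const std::vector<std::string>& hosts, const std::string& replica_set) {
    std::string uri = "mongodb://";
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        if (i > 0) uri += ',';
        uri += hosts[i];
    }
    uri += "/?replicaSet=" + replica_set;
    return uri;
}
```

The resulting string can be passed straight to the mongocxx::uri constructor, as in the examples above.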

3.2 Configuring Read Preference

#include <mongocxx/options/client.hpp>
#include <mongocxx/read_preference.hpp>

void read_preference_examples(mongocxx::collection& collection) {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    auto filter = make_document(kvp("status", "active"));

    // 1) Read from the Primary only (default; consistent with writes that were just made)
    mongocxx::read_preference rp_primary{};
    rp_primary.mode(mongocxx::read_preference::read_mode::k_primary);

    mongocxx::options::find opts_primary{};
    opts_primary.read_preference(rp_primary);
    auto cursor1 = collection.find(filter.view(), opts_primary);

    // 2) Read from a Secondary (spreads read load)
    mongocxx::read_preference rp_secondary{};
    rp_secondary.mode(mongocxx::read_preference::read_mode::k_secondary);

    mongocxx::options::find opts_secondary{};
    opts_secondary.read_preference(rp_secondary);
    auto cursor2 = collection.find(filter.view(), opts_secondary);

    // 3) SecondaryPreferred: use a Secondary if one is available, otherwise the Primary
    mongocxx::read_preference rp_sp{};
    rp_sp.mode(mongocxx::read_preference::read_mode::k_secondary_preferred);
    mongocxx::options::find opts_sp{};
    opts_sp.read_preference(rp_sp);
    auto cursor3 = collection.find(filter.view(), opts_sp);

    // 4) Nearest: the member with the lowest latency
    mongocxx::read_preference rp_nearest{};
    rp_nearest.mode(mongocxx::read_preference::read_mode::k_nearest);
    mongocxx::options::find opts_nearest{};
    opts_nearest.read_preference(rp_nearest);
    auto cursor4 = collection.find(filter.view(), opts_nearest);
}

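Read mode summary:

Mode	Description
k_primary	Read from the Primary only (default)
k_primary_preferred	Prefer the Primary, fall back to a Secondary
k_secondary	Read from Secondaries only
k_secondary_preferred	Prefer a Secondary, fall back to the Primary
k_nearest	Node with the lowest latency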

3.3 Configuring Write Concern

void write_concern_example(mongocxx::collection& collection) {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    mongocxx::write_concern wc;
    wc.acknowledge_level(mongocxx::write_concern::level::k_majority);
    wc.timeout(std::chrono::milliseconds{5000});

    mongocxx::options::insert opts{};
    opts.write_concern(wc);

    auto doc = make_document(kvp("event", "test"), kvp("ts", bsoncxx::types::b_date{std::chrono::system_clock::now()}));
    collection.insert_one(doc.view(), opts);
}

Write Concern: k_majority waits until the write has been replicated to a majority of the nodes. Write durability improves, at the cost of higher latency.
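
As a rough guide to what "majority" means in numbers, the sketch below computes how many acknowledgements a simple majority requires. `majority_of` is a hypothetical helper, and it assumes every member is a data-bearing voting member (no arbiters); the server's own calculation accounts for more cases.

```cpp
// Sketch: number of acknowledgements a simple majority requires.
// Assumes every member is a data-bearing voting member (no arbiters).
int majority_of(int voting_members) {
    return voting_members / 2 + 1;
}
```

Under these assumptions, a three-member set needs two acknowledgements, so a majority write can still succeed with one secondary down.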

3.4 Complete replica set connection example

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/json.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
#include <mongocxx/read_preference.hpp>
#include <mongocxx/options/client.hpp>
#include <chrono>
#include <iostream>
#include <string>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};

    // Replica set URI
    std::string uriStr = "mongodb://localhost:27017,localhost:27018,localhost:27019/?replicaSet=rs0";
    mongocxx::uri uri(uriStr);
    mongocxx::client client(uri);

    auto db = client["mydb"];
    auto collection = db["events"];

    // Write: goes to the Primary
    auto doc = make_document(
        kvp("type", "page_view"),
        kvp("userId", "user1"),
        kvp("ts", bsoncxx::types::b_date{std::chrono::system_clock::now()}));
    collection.insert_one(doc.view());

    // Read: SecondaryPreferred (spreads load)
    mongocxx::read_preference rp{};
    rp.mode(mongocxx::read_preference::read_mode::k_secondary_preferred);
    mongocxx::options::find opts{};
    opts.read_preference(rp);

    auto cursor = collection.find(make_document(kvp("userId", "user1")), opts);
    for (auto&& d : cursor) {
        std::cout << bsoncxx::to_json(d) << std::endl;
    }

    return 0;
}

4. Common errors and how to fix them

Error 1: Exceeded memory limit for $group

Symptom: the aggregation pipeline fails with "Exceeded memory limit of 100MB"

Cause: a stage such as $group or $sort processes a large dataset in memory

Fix:

// ✅ Enable allow_disk_use
mongocxx::options::aggregate opts{};
opts.allow_disk_use(true);
auto cursor = collection.aggregate(stages, opts);

Alternatively, restructure the pipeline so that $match reduces the data first, or process it in batches with $limit.

Error 2: Not a replica set member

Symptom: "Not a replica set member" or "No primary found" when connecting to a replica set cluster

Cause: the URI has no replicaSet name, or a single-instance URI is used against a replica set

Fix:

// ❌ Wrong URI (single host only)
mongocxx::uri uri("mongodb://host1:27017");

// ✅ Replica set URI
mongocxx::uri uri("mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0");

Error 3: Dangling view (document::view lifetime)

Symptom: a view of an already destroyed document is used, leading to crashes or corrupted data (undefined behavior)

Cause: the value returned by make_document() goes out of scope and is destroyed, and its view() is used afterwards

Fix:

// ❌ Dangerous code
bsoncxx::document::view getFilter() {
    auto doc = make_document(kvp("name", "test"));
    return doc.view();  // doc is destroyed here, so the view dangles
}

// ✅ Return the value instead
bsoncxx::document::value getFilter() {
    return make_document(kvp("name", "test"));
}
auto filter = getFilter();
collection.find_one(filter.view());

Error 4: $lookup result is empty

Symptom: the userInfo array is always empty after $lookup

Cause: localField and foreignField have different types (for example ObjectId vs string)

Fix:

// Make sure the types match: if userId is a string, users._id must be a string too.
// If userId holds the hex string of an ObjectId, convert inside a pipeline-style $lookup:
stages.lookup(make_document(
    kvp("from", "users"),
    kvp("let", make_document(kvp("uid", "$userId"))),
    kvp("pipeline", make_array(
        make_document(kvp("$match", make_document(
            kvp("$expr", make_document(kvp("$eq", make_array(
                make_document(kvp("$toString", "$_id")), "$$uid"))))))))),
    kvp("as", "userInfo")));

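Error 5: Index creation fails (Duplicate key)

Symptom: create_index fails with "Index already exists" or a unique-constraint violation

Cause: a conflicting index already exists, or existing data violates the unique constraint

Fix:

# Check the existing indexes
mongosh --eval "db.collection.getIndexes()"

// Tolerate a conflict with an existing index; rethrow everything else
try {
    collection.create_index(keys.view(), opts);
} catch (const mongocxx::operation_exception& e) {
    if (e.code().value() != 85) {  // 85: IndexOptionsConflict
        throw;  // e.g. 11000 (DuplicateKey): existing data must be cleaned up first
    }
}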

Error 6: Read Preference is not applied

Symptom: read_preference is set to secondary, but reads still go to the Primary

Cause: the read_preference was never set on options::find (or the equivalent options object), or the server is a single instance rather than a replica set

Fix:

// ✅ Pass opts to find/aggregate and similar calls
mongocxx::options::find opts{};
opts.read_preference(rp_secondary);
auto cursor = collection.find(filter.view(), opts);

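Error 7: $graphLookup exceeds its memory limit

Symptom: "Exceeded memory limit" when using $graphLookup (regardless of allow_disk_use)

Cause: the 100MB limit of $graphLookup is fixed

Fix: reduce the recursion depth, or redesign the query so the data is processed in smaller parts.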

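5. Performance optimization tips

Tip 1: Reduce the data with $match before aggregating

// ✅ Place $match at the front of the pipeline
// (startDate and endDate are b_date values defined elsewhere)
stages.match(make_document(kvp("createdAt", make_document(
    kvp("$gte", startDate),
    kvp("$lte", endDate)))))
    .group(...);

Why: when $match runs first, fewer documents reach $group, and a leading $match can also use an index.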

Tip 2: Limit fields with a projection

// Drop fields that are not needed from the aggregation result
stages.project(make_document(
    kvp("_id", 0),
    kvp("date", "$_id"),
    kvp("count", 1)));

Tip 3: Order compound index fields to match the query pattern

// Query: filter on userId, sort by createdAt
// Index: (userId, createdAt)
auto idx = make_document(kvp("userId", 1), kvp("createdAt", -1));
collection.create_index(idx.view());

Tip 4: Process large aggregation results through the cursor

// ❌ Collecting everything into a vector
std::vector<bsoncxx::document::value> all;
auto cursor = collection.aggregate(stages);
for (auto&& doc : cursor) {
    all.push_back(bsoncxx::document::value(doc));  // memory usage balloons
}

// ✅ Streaming
auto cursor = collection.aggregate(stages);
for (auto&& doc : cursor) {
    processResult(doc);
}

Tip 5: Run several aggregations at once with $facet

// Several statistics from a single aggregate call, saving round trips
stages.facet(make_document(
    kvp("stats1", make_array(...)),
    kvp("stats2", make_array(...))));

Performance comparison (for reference)

Approach	Round trips	Memory
Several find calls	N	Low
One aggregate call	1	Medium
aggregate + allow_disk_use	1	Uses disk
Several aggregations via $facet	1	Medium

6. Production patterns

Pattern 1: Aggregation combined with Read Preference

class AnalyticsService {
public:
    AnalyticsService(mongocxx::client& client) : client_(client) {
        collection_ = client_["analytics"]["events"];
    }

    mongocxx::cursor runDailyStats() {
        mongocxx::pipeline stages;
        stages.match(make_document(kvp("type", "daily")))
            .group(make_document(
                kvp("_id", "$date"),
                kvp("total", make_document(kvp("$sum", "$amount")))));

        mongocxx::options::aggregate opts{};
        opts.allow_disk_use(true);
        mongocxx::read_preference rp{};
        rp.mode(mongocxx::read_preference::read_mode::k_secondary_preferred);
        opts.read_preference(rp);

        return collection_.aggregate(stages, opts);
    }

private:
    mongocxx::client& client_;
    mongocxx::collection collection_;
};

Pattern 2: Creating indexes automatically (at application startup)

void ensureIndexes(mongocxx::database db) {
    auto collection = db["events"];
    std::vector<std::pair<bsoncxx::document::value, mongocxx::options::index>> indexes;

    indexes.push_back({make_document(kvp("userId", 1), kvp("createdAt", -1)), {}});
    indexes.push_back({make_document(kvp("type", 1), kvp("date", 1)), {}});

    for (auto& [keys, opts] : indexes) {
        try {
            collection.create_index(keys.view(), opts);
        } catch (const mongocxx::operation_exception& e) {
            if (e.code().value() != 85) throw;
        }
    }
}

Pattern 3: Caching aggregation results

// Store the aggregation result in a separate collection and refresh it periodically
void cacheAggregationResult(mongocxx::database db) {
    auto events = db["events"];

    mongocxx::pipeline stages;
    stages.match(make_document(kvp("createdAt", make_document(
        kvp("$gte", bsoncxx::types::b_date{std::chrono::system_clock::now() - std::chrono::hours(24)})))))
        .group(make_document(
            kvp("_id", make_document(kvp("$dateToString", make_document(
                kvp("format", "%Y-%m-%d"),
                kvp("date", "$createdAt"))))),
            kvp("count", make_document(kvp("$sum", 1)))))
        .out("daily_stats_cache");  // $out replaces the target collection with the result

    // $out replaces the cache collection, so no separate delete_many is needed.
    // The command is sent when the cursor is first iterated; $out yields no documents.
    auto cursor = events.aggregate(stages);
    for (auto&& doc : cursor) {
        (void)doc;
    }
}

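Pattern 4: Retry + Read Preference

#include <chrono>
#include <stdexcept>
#include <thread>

// Retries f with exponential backoff (100ms, 200ms, ...); rethrows after the last attempt
template<typename Func>
auto withRetry(Func&& f, int maxRetries = 3) -> decltype(f()) {
    for (int i = 0; i < maxRetries; ++i) {
        try {
            return f();
        } catch (const mongocxx::exception& e) {
            if (i == maxRetries - 1) throw;
            std::this_thread::sleep_for(std::chrono::milliseconds(100 * (1 << i)));
        }
    }
    throw std::runtime_error("Unreachable");
}

// Usage: secondaryPreferred falls back to the Primary when no Secondary is available;
// withRetry covers transient failures on top of that
auto result = withRetry([&]() {
    mongocxx::read_preference rp{};
    rp.mode(mongocxx::read_preference::read_mode::k_secondary_preferred);
    mongocxx::options::find opts{};
    opts.read_preference(rp);
    return collection.find_one(filter.view(), opts);
});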
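
To see which delays this retry loop actually produces, the backoff arithmetic can be isolated and checked on its own. The sketch below is driver-free; `backoff_delay` and `backoff_schedule` are hypothetical helper names that simply restate the `100 * (1 << i)` expression used in withRetry.

```cpp
#include <chrono>
#include <vector>

// Delay before the retry that follows failed attempt `attempt` (0-based): 100ms * 2^attempt.
std::chrono::milliseconds backoff_delay(int attempt) {
    return std::chrono::milliseconds(100 * (1 << attempt));
}

// Delays the loop would sleep through if every attempt failed.
// The final failure rethrows without sleeping, so there are maxRetries - 1 entries.
std::vector<std::chrono::milliseconds> backoff_schedule(int maxRetries) {
    std::vector<std::chrono::milliseconds> delays;
    for (int i = 0; i + 1 < maxRetries; ++i) delays.push_back(backoff_delay(i));
    return delays;
}
```

With the default of three attempts the worst case adds 300ms of waiting in total, which is worth keeping in mind when choosing request timeouts.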

Pattern 5: Externalizing configuration

#include <cstdlib>
#include <string>

struct MongoConfig {
    std::string uri = "mongodb://localhost:27017";
    std::string readPreference = "secondaryPreferred";
    bool allowDiskUse = true;
};

MongoConfig loadFromEnv() {
    MongoConfig c;
    if (const char* u = std::getenv("MONGODB_URI")) c.uri = u;
    if (const char* rp = std::getenv("MONGODB_READ_PREFERENCE")) c.readPreference = rp;
    return c;
}
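
The readPreference value loaded this way is still a string, so it has to be mapped to a mode before it can be applied. The following is one possible sketch: `ReadMode` is a local stand-in for mongocxx::read_preference::read_mode so that the snippet compiles without the driver, and `parse_read_mode` is a hypothetical helper. The accepted names follow the values used by the readPreference connection string option.

```cpp
#include <optional>
#include <string>

// Local stand-in for mongocxx::read_preference::read_mode (assumption for this sketch).
enum class ReadMode { Primary, PrimaryPreferred, Secondary, SecondaryPreferred, Nearest };

// Maps a connection-string style mode name to the enum; returns nullopt for unknown values.
std::optional<ReadMode> parse_read_mode(const std::string& name) {
    if (name == "primary") return ReadMode::Primary;
    if (name == "primaryPreferred") return ReadMode::PrimaryPreferred;
    if (name == "secondary") return ReadMode::Secondary;
    if (name == "secondaryPreferred") return ReadMode::SecondaryPreferred;
    if (name == "nearest") return ReadMode::Nearest;
    return std::nullopt;
}
```

Returning an empty optional for unknown names lets the caller decide whether to fail at startup or fall back to a default, rather than silently reading from an unintended node.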

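7. Implementation checklist

Aggregation

Place $match at the front of the pipeline
Set allow_disk_use(true) for large datasets
Manage document::value lifetimes (avoid dangling views)
Consider $facet for running several aggregations at once

Indexing

Order compound index fields to match the query pattern
Use TTL indexes (logs, sessions, and similar data)
Check execution plans with explain
Handle errors when creating an index that already exists

Replica sets

Include the replicaSet name in the URI
Set read_preference when distributing read load
Set write_concern when write durability matters
Account for eventual consistency when reading from a Secondary

Error handling

On aggregation memory errors, use allow_disk_use or split the pipeline
On replica set connection failures, check the URI
Check for type mismatches in $lookup

Production

Create indexes automatically (at application startup)
Externalize configuration (environment variables)
Define a retry policy
Consider caching aggregation results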

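Summary of solutions to the problem scenarios

Problem	MongoDB C++ solution
Aggregation by day or hour	`$match` + `$group` + `$dateToString`
Joining several collections	`$lookup`
Aggregation memory limit exceeded	`allow_disk_use(true)` or split the pipeline
Slow find	Create a compound index and check the plan with explain
Automatic log deletion	TTL index
Read-load distribution	Replica set + `read_preference::k_secondary_preferred`
Not a replica set member	Add `replicaSet=name` to the URI
Dangling view	Return `document::value` and use the view only within its scope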

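Wrap-up

Topic	Summary
Aggregation	`$match` + `$group` + `$lookup` + `$sort` + `$project`, `allow_disk_use`
Indexing	Single-field, compound, TTL, and text indexes, ordered to match the query pattern
Replica sets	replicaSet in the URI, read_preference, write_concern
Errors	Memory limit, view lifetime, replicaSet URI
Performance	$match first, projection, indexes, cursor streaming
Production	Automatic index creation, externalized configuration, retries, caching

Core principles:

In aggregations, reduce the data with $match first and use allow_disk_use for large datasets
Choose the field order of compound indexes from the query pattern
For replica sets, the URI must include replicaSet, and read_preference is set when distributing reads
Watch document::value lifetimes to avoid dangling views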

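Frequently asked questions (FAQ)

Q. What is the difference between aggregation and find?

A. find supports only filtering, sorting, and projection. Aggregation operations such as group, sum, avg, and join have to go through an aggregate pipeline.

Q. What happens if I set read_preference without a replica set?

A. A single instance has only one server, so reads go to it even in secondary mode. No error is raised, but there is no distribution effect either.

Q. $lookup is slow.

A. Check whether foreignField is indexed. Without a suitable index on the from collection, every input document triggers a scan of that collection, much like a nested loop join.

Q. When should indexes be created?

A. The usual approach is to create them once, after the collection is created, at application startup. If the collection already holds a large amount of data, building the index can take a while.

One-line summary: with aggregation pipelines, indexing, and replica sets, you can apply MongoDB's advanced features from C++ in real projects.

Next post: C++ PostgreSQL Driver (#52-4)

Previous post: C++ MongoDB Complete Guide (#52-3)

References

C++ MongoDB Practical Complete Guide | mongocxx CRUD, Aggregation, Indexing, Replica Sets, Production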
