MongoDB Schema Design: Embedded vs Referenced Documents
Key takeaways
Choose embedded vs referenced collections using document size, read/write patterns, 1:N growth, and consistency—plus buckets, partial updates, and the 16MB limit.
Introduction
MongoDB is “schemaless,” but production services benefit from an explicit schema. Embedded vs referenced design is the core fork: store child data inside the parent document (embedded) or in another collection linked by id (referenced). Embedded documents can satisfy a read in one round trip; references keep document growth in check and let several parents reuse the same document. This article replaces gut feel with measurable criteria: size, frequency, consistency.
After reading this post
- You can apply a checklist for embedded vs referenced designs
- You get hints for 1:N, N:M, i18n/versioning variants
- You understand the 16MB document limit and index interactions
Table of contents
- Concepts
- Hands-on implementation
- Advanced usage
- Performance comparison
- Real-world cases
- Troubleshooting
- Conclusion
Concepts
Embedded
Related data lives in one BSON document as nested arrays or sub-documents.
// User + addresses (embedded example)
{
_id: ObjectId("..."),
name: "Kim",
addresses: [
{ label: "home", city: "Seoul", zip: "03000" },
{ label: "work", city: "Seongnam", zip: "13000" }
]
}
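Pros: When parent and children are almost always read together, a single query returns everything. Writes to a single document are atomic, so the parent and its embedded children can change together without a multi-document transaction.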
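As a minimal sketch of that single-document atomicity, the snippet below builds one update that renames a user and appends an address together. The helper name `buildProfileUpdate`, the collection name `users`, and the sample values are assumptions for illustration; the code only constructs the update document, and the mongosh call appears as a comment.

```javascript
// Hypothetical helper: one update document that changes a parent field
// and the embedded array in the same operation.
function buildProfileUpdate(newName, newAddress) {
  return {
    $set: { name: newName },
    $push: { addresses: newAddress },
  };
}

const profileUpdate = buildProfileUpdate("Kim Minsu", {
  label: "parents",
  city: "Busan",
  zip: "48000",
});

// In mongosh (assumed collection name `users`, `userId` is a placeholder):
//   db.users.updateOne({ _id: userId }, profileUpdate);
// Both modifications target one document, so they are applied atomically.
console.log(JSON.stringify(profileUpdate, null, 2));
```

With a referenced model, the same change would touch two collections and would need application logic or a transaction to stay consistent.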
Referenced
Separate collections linked by ObjectId (or similar).
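// users (ids abbreviated for readability; real ObjectIds are 24 hex characters)
{ _id: ObjectId("u1"), name: "Kim" }
// addresses: each document points back to its user through userId
{ _id: ObjectId("a1"), userId: ObjectId("u1"), city: "Seoul" }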
Pros: Child collections can grow large without bloating the parent. Fits models where many parents share the same child.
BSON document size
A single BSON document is capped at 16MB. Data that can grow without bound, such as comments or logs, should start in its own collection with supporting indexes.
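The limit can be turned into a rough growth budget. The sketch below is only arithmetic: the helper names and the byte estimates (1 KB of parent fields, 200 bytes per embedded element, 500 appends per day) are assumptions, not measurements.

```javascript
// The document limit expressed in bytes (16 * 1024 * 1024).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: how many array elements fit before the limit,
// given rough byte estimates for the parent fields and for each element.
function maxEmbeddedItems(baseBytes, bytesPerItem) {
  return Math.floor((MAX_BSON_BYTES - baseBytes) / bytesPerItem);
}

// Hypothetical helper: whole days until the limit at a steady append rate.
function daysUntilLimit(baseBytes, bytesPerItem, itemsPerDay) {
  return Math.floor(maxEmbeddedItems(baseBytes, bytesPerItem) / itemsPerDay);
}

console.log(maxEmbeddedItems(1024, 200));    // 83880
console.log(daysUntilLimit(1024, 200, 500)); // 167
```

Under these assumed numbers an embedded array would hit the cap in under six months, which argues for a separate collection from day one. On a live collection, the `$bsonSize` aggregation operator (MongoDB 4.4+) reports real document sizes instead of estimates.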
Hands-on implementation
Decision checklist
| Question | Favors embedded | Favors referenced |
|---|---|---|
| Read together? | Almost always | Rarely or partial reads |
| Updated together? | Often at once | Frequently independent |
| Bounded child count? | Small (e.g. a few addresses) | Thousands/unbounded |
| Shared child across parents? | Rarely | Often |
| Strong cross-doc integrity? | If it fits one document | Multiple collections + app logic/transactions |
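One way to make the checklist repeatable in design reviews is to encode it as a small function. This is a hypothetical heuristic: the function name `recommendModel`, the answer field names, the equal weights, and the two-of-three threshold are all assumptions, not rules from MongoDB.

```javascript
// Hypothetical encoding of the checklist. Each answer is a boolean.
function recommendModel(answers) {
  // An unbounded child count eventually runs into the document size limit,
  // so it overrides every other answer.
  if (!answers.boundedChildCount) {
    return { model: "referenced", reason: "child count is unbounded" };
  }
  // Remaining questions each cast one vote for embedding.
  const embeddedVotes = [
    answers.readTogether,
    answers.updatedTogether,
    !answers.sharedAcrossParents,
  ].filter(Boolean).length;

  return embeddedVotes >= 2
    ? { model: "embedded", reason: `${embeddedVotes} of 3 answers favor embedding` }
    : { model: "referenced", reason: `${3 - embeddedVotes} of 3 answers favor references` };
}

// A user with a handful of addresses.
const addressesDecision = recommendModel({
  readTogether: true,
  updatedTogether: true,
  boundedChildCount: true,
  sharedAcrossParents: false,
});

// Comments on a post that may go viral.
const commentsDecision = recommendModel({
  readTogether: true,
  updatedTogether: false,
  boundedChildCount: false,
  sharedAcrossParents: false,
});

console.log(addressesDecision.model, commentsDecision.model);
```

The output of such a function is a starting point for discussion; the load tests mentioned in the conclusion remain the real arbiter.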
Pattern 1: 1:few — embedded
If comments are always shown with a post and counts are limited (or page-sized), consider embedding.
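A page-sized embedded array can be kept bounded with `$push` combined with `$each` and a negative `$slice`, which retains only the last N elements. The sketch below assumes hypothetical field names `recentComments` and `commentCount`, and assumes older comments are either stored elsewhere or may be discarded. `applyCappedPush` is a plain JavaScript stand-in that mimics the effect on an array so the behavior is visible without a database.

```javascript
// Hypothetical helper: update document that appends one comment and keeps
// only the newest `maxEmbedded` entries in the embedded array.
function buildCappedCommentPush(comment, maxEmbedded) {
  return {
    $push: {
      recentComments: { $each: [comment], $slice: -maxEmbedded },
    },
    $inc: { commentCount: 1 },
  };
}

// Local simulation of what the capped push does to the array.
function applyCappedPush(existing, item, maxEmbedded) {
  return [...existing, item].slice(-maxEmbedded);
}

const cappedUpdate = buildCappedCommentPush({ author: "Lee", text: "Nice post" }, 50);
// In mongosh (assumed collection `posts`, `postId` is a placeholder):
//   db.posts.updateOne({ _id: postId }, cappedUpdate);

console.log(applyCappedPush(["c1", "c2", "c3"], "c4", 3)); // [ 'c2', 'c3', 'c4' ]
```

Because the cap is enforced on every write, the post document cannot drift toward the size limit no matter how many comments arrive.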
Pattern 2: 1:many (large) — referenced + indexes
// posts collection
db.posts.createIndex({ slug: 1 }, { unique: true });
// comments collection
db.comments.createIndex({ postId: 1, createdAt: -1 });
db.comments.find({ postId: postObjectId }).sort({ createdAt: -1 }).limit(50);
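The `{ postId: 1, createdAt: -1 }` index above also supports paging deeper into a long comment list by range instead of `skip`. The sketch below builds the query parts for such a page; the helper name `buildCommentPageQuery` and the string id are assumptions, and it assumes `createdAt` values are distinct enough that ties do not matter (otherwise add `_id` as a tiebreaker).

```javascript
// Hypothetical helper: query parts for one page of comments, newest first.
// `before` is the createdAt of the last comment on the previous page.
function buildCommentPageQuery(postId, before, pageSize = 50) {
  const filter = { postId };
  if (before) {
    filter.createdAt = { $lt: before };
  }
  return { filter, sort: { createdAt: -1 }, limit: pageSize };
}

const firstPage = buildCommentPageQuery("post-1");
const nextPage = buildCommentPageQuery("post-1", new Date("2026-03-30T00:00:00Z"));

// In mongosh:
//   db.comments.find(nextPage.filter).sort(nextPage.sort).limit(nextPage.limit);
console.log(JSON.stringify(firstPage));
console.log(JSON.stringify(nextPage));
```

Each page is an index range scan starting at the cursor position, so the cost stays roughly constant as users scroll back through older comments.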
Pattern 3: Middle ground — buckets
For time-series logs, group readings per day to reduce document count.
{
sensorId: "s1",
day: ISODate("2026-03-30T00:00:00Z"),
readings: [
{ t: ISODate("2026-03-30T00:05:00Z"), v: 23.1 },
// ....batched for the day (design an upper bound)
]
}
In practice, the embedded vs referenced choice often settles on a bucketed compromise: embed, but cap how much each document holds.
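A common way to enforce the upper bound is an upsert whose filter only matches buckets that still have room. The sketch below assumes an extra `count` field, a limit of 1000 readings per bucket, and no unique index on `sensorId` plus `day` (a full bucket causes a second document for the same day). The helper names are hypothetical.

```javascript
// Assumed upper bound on readings per bucket document.
const MAX_READINGS_PER_BUCKET = 1000;

// Truncate a timestamp to UTC midnight to derive the bucket's `day` key.
function bucketDay(timestamp) {
  return new Date(Date.UTC(
    timestamp.getUTCFullYear(),
    timestamp.getUTCMonth(),
    timestamp.getUTCDate()
  ));
}

// Hypothetical helper: filter, update, and options for appending one reading.
// If no bucket with spare room matches, the upsert creates a new one.
function buildBucketUpsert(sensorId, timestamp, value) {
  return {
    filter: {
      sensorId,
      day: bucketDay(timestamp),
      count: { $lt: MAX_READINGS_PER_BUCKET },
    },
    update: {
      $push: { readings: { t: timestamp, v: value } },
      $inc: { count: 1 },
    },
    options: { upsert: true },
  };
}

const bucketOp = buildBucketUpsert("s1", new Date("2026-03-30T00:05:00Z"), 23.1);
// In mongosh (assumed collection `readings`):
//   db.readings.updateOne(bucketOp.filter, bucketOp.update, bucketOp.options);
console.log(bucketOp.filter.day.toISOString()); // 2026-03-30T00:00:00.000Z
```

For new deployments, MongoDB's native time series collections (5.0+) handle bucketing internally and may be worth evaluating before hand-rolling this pattern.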
$lookup (join)
For occasional joins on referenced models, the aggregation stage $lookup works. On hot paths, prefer two reads in application code or caching.
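Both options can be sketched side by side, reusing the `posts` and `comments` collections from Pattern 2. The helper names are hypothetical, and `attachComments` stands in for the application-side join that follows two separate `find` calls; sample ids are plain strings for readability.

```javascript
// Option 1: aggregation pipeline joining comments onto one post by slug.
function buildPostWithCommentsPipeline(slug) {
  return [
    { $match: { slug } },
    {
      $lookup: {
        from: "comments",
        localField: "_id",
        foreignField: "postId",
        as: "comments",
      },
    },
  ];
}
// In mongosh: db.posts.aggregate(buildPostWithCommentsPipeline("hello-world"));

// Option 2: two reads, then join in application code.
// `posts` and `comments` are the arrays returned by the two queries.
function attachComments(posts, comments) {
  const byPost = new Map();
  for (const comment of comments) {
    const key = String(comment.postId);
    if (!byPost.has(key)) byPost.set(key, []);
    byPost.get(key).push(comment);
  }
  return posts.map((post) => ({
    ...post,
    comments: byPost.get(String(post._id)) || [],
  }));
}

const joined = attachComments(
  [{ _id: "p1", slug: "hello-world" }, { _id: "p2", slug: "empty" }],
  [{ _id: "c1", postId: "p1", text: "first" }, { _id: "c2", postId: "p1", text: "second" }]
);
console.log(joined.map((p) => p.comments.length)); // [ 2, 0 ]
```

The second option keeps each query simple and independently cacheable, at the cost of one extra round trip.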
Advanced usage
- i18n fields: { title: { ko: "...", en: "..." } } vs locale-split documents; match your translation workflow.
- Partial updates: Use array filters and positional operators to reduce write conflicts on embedded arrays.
- Schema validation: Enforce required fields/types with validator and JSON Schema for operational safety.
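The partial-update and schema-validation points above can be sketched against the embedded `addresses` example from the Concepts section. The helper name, the `users` collection name, and the `maxItems: 10` cap are assumptions; the code only builds the documents, with the mongosh calls shown as comments.

```javascript
// Partial update: change the city of only the address whose label matches,
// using the filtered positional operator $[<identifier>] with arrayFilters.
function buildAddressCityUpdate(label, newCity) {
  return {
    update: { $set: { "addresses.$[addr].city": newCity } },
    options: { arrayFilters: [{ "addr.label": label }] },
  };
}

const cityChange = buildAddressCityUpdate("work", "Suwon");
// In mongosh (`userId` is a placeholder):
//   db.users.updateOne({ _id: userId }, cityChange.update, cityChange.options);

// Schema validation: required fields, types, and a bounded array size.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "addresses"],
    properties: {
      name: { bsonType: "string" },
      addresses: {
        bsonType: "array",
        maxItems: 10, // assumed cap that keeps the 1:few relation bounded
        items: {
          bsonType: "object",
          required: ["label", "city"],
          properties: {
            label: { bsonType: "string" },
            city: { bsonType: "string" },
            zip: { bsonType: "string" },
          },
        },
      },
    },
  },
};
// In mongosh:
//   db.createCollection("users", { validator: userValidator });

console.log(JSON.stringify(cityChange));
```

Putting the array cap in the validator makes the "bounded child count" assumption from the checklist something the database enforces rather than something the team remembers.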
Performance comparison
| Aspect | Embedded | Referenced |
|---|---|---|
| Read latency (common path) | Single find | find + find or $lookup |
| Write contention | Hotspot if many writers to one doc | Can spread writes |
| Indexes | Nested field indexes—more complex | Per-collection, often simpler |
| Consistency | Single-doc atomic updates | Application logic + transactions |
Real-world cases
- E-commerce orders: Header + dozens of line items often embedded; product master referenced.
- Social feeds: Post referenced; like counts denormalized with eventual consistency via events.
- B2B multi-tenant: Include tenantId in every query and make it the leading field of the supporting indexes.
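The order and multi-tenant cases can be combined in one sketch: line items are embedded in the order, each item references the product master by `productId`, and `tenantId` sits on the document so it can lead an index. The helper name `buildOrder`, the field names, the name and price snapshot on each line item, and the use of integer minor currency units are assumptions for illustration.

```javascript
// Hypothetical helper: order document with embedded line items.
function buildOrder(tenantId, customerId, items) {
  const lineItems = items.map((item) => ({
    productId: item.productId, // reference into the product master collection
    name: item.name,           // assumed snapshot of the name at purchase time
    unitPrice: item.unitPrice, // integer minor units to avoid float rounding
    qty: item.qty,
  }));
  const total = lineItems.reduce((sum, li) => sum + li.unitPrice * li.qty, 0);
  return { tenantId, customerId, lineItems, total, createdAt: new Date() };
}

const order = buildOrder("tenant-a", "cust-1", [
  { productId: "prod-1", name: "Keyboard", unitPrice: 1200, qty: 2 },
  { productId: "prod-2", name: "Cable", unitPrice: 500, qty: 1 },
]);

// In mongosh (assumed collection `orders`), tenantId leads the index:
//   db.orders.createIndex({ tenantId: 1, createdAt: -1 });
//   db.orders.find({ tenantId: "tenant-a" }).sort({ createdAt: -1 }).limit(20);
console.log(order.total); // 2900
```

The order reads in one query, while product details that change over time stay in their own collection.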
Troubleshooting
| Symptom | Cause / fix |
|---|---|
| Document near 16MB | Split arrays, archive collections, buckets |
| Slow update on large embedded arrays | Split documents or move to references |
| Broken referential integrity | App validation + transactions if needed; explicit delete policies (soft delete) |
| Index used but still slow | Working set size, projection minimization, covering indexes |
Conclusion
Choosing between embedded and referenced documents in MongoDB is not “anything goes because NoSQL”; it means committing read patterns and growth boundaries to code. Apply this checklist first, then validate with load tests. For the relational side of the transaction/join trade-offs, see the PostgreSQL vs MySQL guide.
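Related posts (internal links)
Other posts connected to this topic.
- PostgreSQL vs MySQL: Differences and Selection Guide | Schema, Transactions, Operations
- TypeORM vs Prisma Comparison | Type Safety, Migrations, Queries, Performance: A Practical Guide
- Node.js Database Integration | MongoDB, PostgreSQL, MySQL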