Does MongoDB need normalization?

Yes, and it has to be a deliberate decision. Instead of relying on SQL joins, you choose between **application joins** and **controlled duplication**, each with its own trade-off.

When does the 16MB limit bite?

It bites when an array can grow without bound, for example when every event is appended to one document. Cap the array, archive old entries, or move the items to a referenced collection.

What about transactions?

Multi-document transactions are available (since 4.0 on replica sets and 4.2 on sharded clusters), but **shrinking the transactional scope at design time**, ideally to a single document, is still often cheaper.

MongoDB Schema Design: Embedded vs Referenced Documents

March 30, 2026 · 18 min read · Updated March 30, 2026 · Intermediate Guide

Key takeaway

Choose embedded vs referenced collections using document size, read/write patterns, 1:N growth, and consistency—plus buckets, partial updates, and the 16MB limit.

Introduction

MongoDB is “schemaless,” but production services benefit from an explicit schema. Embedded vs referenced design is the core fork: either store child data inside the parent document (embedded) or keep it in another collection linked by id (referenced). Embedding can satisfy a read in one round trip; referencing keeps document growth under control and lets several parents reuse the same document. This article replaces gut feel with measurable criteria: size, frequency, consistency.

After reading this post

You can apply a checklist for embedded vs referenced designs
You get hints for 1:N, N:M, i18n/versioning variants
You understand the 16MB document limit and index interactions

Table of contents

Concepts
Hands-on implementation
Advanced usage
Performance comparison
Real-world cases
Troubleshooting
Conclusion

Concepts

Embedded

Related data lives in one BSON document as nested arrays or sub-documents.

// User + addresses (embedded example)
{
  _id: ObjectId("..."),
  name: "Kim",
  addresses: [
    { label: "home", city: "Seoul", zip: "03000" },
    { label: "work", city: "Seongnam", zip: "13000" }
  ]
}

Pros: When the parent and its children are almost always read together, one query returns everything. Writes to a single document are atomic, so the parent and its embedded children can change together without a transaction.
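As a minimal sketch of the atomic-update point above: the helper below builds one update document that changes a top-level field and appends to the embedded array in the same write. The helper name `renameAndAddAddress`, the sample values, and the `userId` variable in the comment are assumptions for illustration only.

```javascript
// Hypothetical helper: builds ONE update document that renames the user and
// appends an address. Because both changes target the same document, the
// server applies them atomically.
function renameAndAddAddress(newName, address) {
  return {
    $set: { name: newName },
    $push: { addresses: address }
  };
}

const update = renameAndAddAddress("Kim Minsu", {
  label: "parents", city: "Busan", zip: "48000"
});

// With a live connection (userId is an assumed variable):
// db.users.updateOne({ _id: userId }, update);
console.log(JSON.stringify(update, null, 2));
```

In a referenced model the same change would touch two collections, so it would need either two independent writes or a transaction.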

Referenced

Separate collections linked by ObjectId (or similar).

// users
{ _id: ObjectId("65f000000000000000000001"), name: "Kim" }
// addresses (userId points at users._id)
{ _id: ObjectId("65f0000000000000000000a1"), userId: ObjectId("65f000000000000000000001"), city: "Seoul" }

Pros: Child collections can grow large without bloating the parent. Fits models where many parents share the same child.

BSON document size

A single document is capped at 16MB. Data that could grow without bound should live in a separate collection, with an index on the reference field, from the start.
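One way to keep an eye on the limit is to measure documents on the server. The sketch below builds an aggregation pipeline around the `$bsonSize` operator (available in MongoDB 4.4 and later) and a small helper that expresses a size as a share of the limit. The function names, the `sizeBytes` field, and the `events` collection in the comment are assumptions, not part of the original design.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // the 16MB document limit, in bytes

// Pipeline that reports the largest documents in a collection.
function largestDocsPipeline(limit = 5) {
  return [
    { $project: { sizeBytes: { $bsonSize: "$$ROOT" } } },
    { $sort: { sizeBytes: -1 } },
    { $limit: limit }
  ];
}

// How much of the limit a document of the given size already uses.
function percentOfLimit(sizeBytes) {
  return (sizeBytes / MAX_BSON_BYTES) * 100;
}

// With a live connection (collection name is an assumption):
// db.events.aggregate(largestDocsPipeline(5));
console.log(percentOfLimit(8 * 1024 * 1024)); // half of the limit
```

Running such a report periodically gives an early signal that an embedded array should be capped, bucketed, or moved out.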

Hands-on implementation

Decision checklist

Question	Favors embedded	Favors referenced
Read together?	Almost always	Rarely or partial reads
Updated together?	Often at once	Frequently independent
Bounded child count?	Small (e.g. a few addresses)	Thousands/unbounded
Shared child across parents?	Rarely	Often
Strong cross-doc integrity?	If it fits one document	Multiple collections + app logic/transactions
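The checklist can be turned into a rough, reviewable heuristic. The sketch below is only one possible encoding: the function name, the answer keys, the simple majority vote, and the choice to treat unbounded growth as a hard stop are all assumptions, not a rule from MongoDB.

```javascript
// Hypothetical scoring helper: each checklist answer casts one vote.
function suggestModel(answers) {
  // Unbounded child growth is treated as a hard stop because of the 16MB limit.
  if (!answers.boundedChildren) return "referenced";

  const votesForEmbedded = [
    answers.readTogether,          // almost always read together?
    answers.updatedTogether,       // usually updated in one write?
    answers.boundedChildren,       // small, bounded child count?
    !answers.sharedAcrossParents,  // child rarely shared across parents?
    answers.fitsOneDocument        // integrity needs fit in one document?
  ].filter(Boolean).length;

  return votesForEmbedded >= 3 ? "embedded" : "referenced";
}

const addresses = suggestModel({
  readTogether: true, updatedTogether: true, boundedChildren: true,
  sharedAcrossParents: false, fitsOneDocument: true
});
const eventLog = suggestModel({
  readTogether: true, updatedTogether: true, boundedChildren: false,
  sharedAcrossParents: false, fitsOneDocument: true
});
console.log(addresses, eventLog);
```

The output is a starting point for discussion; measured read/write patterns should override it.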

Pattern 1: 1:few — embedded

If comments are always shown with their post and their number is bounded (or you embed only one page's worth), consider embedding.
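To keep an embedded array page-sized, `$push` can be combined with the `$each` and `$slice` modifiers. The following is a sketch under assumptions: the `recentComments` field, the cap of 50, and the `slug` filter in the comment are illustrative names, not taken from the original schema.

```javascript
const MAX_EMBEDDED_COMMENTS = 50; // assumed page size

// Builds an update that appends one comment and keeps only the newest entries.
// A negative $slice keeps the LAST n elements of the array.
function pushRecentComment(comment) {
  return {
    $push: {
      recentComments: { $each: [comment], $slice: -MAX_EMBEDDED_COMMENTS }
    }
  };
}

const update = pushRecentComment({
  author: "Lee", text: "Nice post", createdAt: new Date()
});

// With a live connection (filter value is an assumption):
// db.posts.updateOne({ slug: "my-post" }, update);
console.log(JSON.stringify(update.$push.recentComments.$slice));
```

Older comments drop out of the embedded array with this approach; whether they are also stored elsewhere is a separate design decision.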

Pattern 2: 1:many (large) — referenced + indexes

// posts collection
db.posts.createIndex({ slug: 1 }, { unique: true });
// comments collection
db.comments.createIndex({ postId: 1, createdAt: -1 });

db.comments.find({ postId: postObjectId }).sort({ createdAt: -1 }).limit(50);
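For pages after the first, a range condition on `createdAt` lets the same `{ postId: 1, createdAt: -1 }` index serve every page without `skip`. The sketch below only builds the query parts; the helper name `commentsPage` and the string post id are placeholders (a real `postId` would typically be an ObjectId).

```javascript
// Builds filter/sort/limit for one page of comments.
// `before` is the createdAt of the last comment on the previous page.
function commentsPage(postId, before, pageSize = 50) {
  const filter = { postId };
  if (before) filter.createdAt = { $lt: before };
  return { filter, sort: { createdAt: -1 }, limit: pageSize };
}

const firstPage = commentsPage("p1");
const nextPage = commentsPage("p1", new Date("2026-03-30T00:00:00Z"));

// With a live connection:
// db.comments.find(nextPage.filter).sort(nextPage.sort).limit(nextPage.limit);
console.log(JSON.stringify(nextPage));
```

If two comments can share the same `createdAt`, a tie-breaker such as `_id` would be needed in both the index and the range condition.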

Pattern 3: Middle ground — buckets

For time-series logs, group readings per day to reduce document count.

{
  sensorId: "s1",
  day: ISODate("2026-03-30T00:00:00Z"),
  readings: [
    { t: ISODate("2026-03-30T00:05:00Z"), v: 23.1 },
    // ... batched for the day (design an upper bound)
  ]
}

In practice, the embedded vs referenced choice often lands on a compromise: embed, but in bounded buckets.
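A common way to enforce the upper bound is an upsert whose filter only matches buckets that still have room. The sketch below assumes an extra `count` field on each bucket and a cap of 200 readings; both are illustrative additions to the document shape shown above, as are the helper names.

```javascript
const MAX_READINGS_PER_BUCKET = 200; // assumed upper bound, tune per workload

// Midnight (UTC) of the day that contains timestamp t.
function dayStartUtc(t) {
  return new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate()));
}

// Builds an upsert that appends a reading to a non-full bucket for that day.
// If every bucket for the day is full, the upsert creates a new bucket.
function bucketUpsert(sensorId, t, v) {
  return {
    filter: {
      sensorId,
      day: dayStartUtc(t),
      count: { $lt: MAX_READINGS_PER_BUCKET }
    },
    update: { $push: { readings: { t, v } }, $inc: { count: 1 } },
    options: { upsert: true }
  };
}

const op = bucketUpsert("s1", new Date("2026-03-30T00:05:00Z"), 23.1);

// With a live connection (collection name is an assumption):
// db.readings.updateOne(op.filter, op.update, op.options);
console.log(op.filter.day.toISOString());
```

With this shape a busy day may span several bucket documents, so readers should query by `sensorId` and `day` and merge the `readings` arrays.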

`$lookup` (join)

For occasional joins on referenced models, aggregation $lookup works—but for hot paths, prefer two reads in application code or caching.
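The "two reads in application code" option amounts to fetching the parents, fetching the children with an `$in` filter on the reference field, and merging in memory. The sketch below shows only the merge step on plain arrays; the function name and the string ids are placeholders for illustration.

```javascript
// Merges children into their parents by userId (an application-side join).
function attachAddresses(users, addresses) {
  const byUser = new Map();
  for (const address of addresses) {
    const key = String(address.userId);
    if (!byUser.has(key)) byUser.set(key, []);
    byUser.get(key).push(address);
  }
  return users.map((user) => ({
    ...user,
    addresses: byUser.get(String(user._id)) || []
  }));
}

// Stand-ins for the results of the two reads:
//   read 1: db.users.find({ ... })
//   read 2: db.addresses.find({ userId: { $in: userIds } })
const users = [{ _id: "u1", name: "Kim" }, { _id: "u2", name: "Park" }];
const addresses = [{ _id: "a1", userId: "u1", city: "Seoul" }];

const joined = attachAddresses(users, addresses);
console.log(JSON.stringify(joined));
```

Converting ids with `String(...)` keeps the lookup working when the ids are ObjectId instances, which do not compare equal by reference.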

Advanced usage

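i18n fields: Either embed per-locale values, as in { title: { ko: "...", en: "..." } }, or split documents by locale. Pick whichever matches your translation workflow.
Partial updates: Use `arrayFilters` and the positional operators (`$`, `$[]`, `$[<identifier>]`) to modify only the matching elements of an embedded array, which keeps writes small and reduces write conflicts.
Schema validation: Enforce required fields and types with a collection `validator` that uses `$jsonSchema`, for operational safety.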
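The partial-update and validation items above can be sketched as plain objects. This is an illustration under assumptions: the `addr` identifier, the `users` collection, the `maxItems` cap of 10, and the choice of required fields are not from the original schema.

```javascript
// Partial update: change the city of the address whose label matches,
// without rewriting the rest of the embedded array.
function setAddressCity(label, city) {
  return {
    update: { $set: { "addresses.$[addr].city": city } },
    options: { arrayFilters: [{ "addr.label": label }] }
  };
}

// Validation: required fields, types, and a bound on the embedded array.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name"],
    properties: {
      name: { bsonType: "string" },
      addresses: {
        bsonType: "array",
        maxItems: 10,
        items: {
          bsonType: "object",
          required: ["label", "city"],
          properties: {
            label: { bsonType: "string" },
            city: { bsonType: "string" },
            zip: { bsonType: "string" }
          }
        }
      }
    }
  }
};

const change = setAddressCity("work", "Incheon");

// With a live connection (userId is an assumed variable):
// db.users.updateOne({ _id: userId }, change.update, change.options);
// db.runCommand({ collMod: "users", validator: userValidator });
console.log(JSON.stringify(change));
```

Putting `maxItems` in the validator turns the "bounded child count" assumption into something the database rejects when it is violated.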

Performance comparison

Aspect	Embedded	Referenced
Read latency (common path)	Single `find`	`find` + `find` or `$lookup`
Write contention	Hotspot if many writers to one doc	Can spread writes
Indexes	Nested field indexes—more complex	Per-collection, often simpler
Consistency	Single-doc atomic updates	Application logic + transactions

Real-world cases

E-commerce orders: Header + dozens of line items often embedded; product master referenced.
Social feeds: Post referenced; like counts denormalized with eventual consistency via events.
B2B multi-tenant: Include tenantId in every query, and make tenantId the leading field of the indexes those queries use.
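For the multi-tenant case, one hedged approach is to route every filter through a small helper so that `tenantId` cannot be forgotten or overridden. The helper name, the `orders` collection, and the index shape in the comment are assumptions for illustration.

```javascript
// Wraps any filter so it is always scoped to one tenant.
// tenantId is spread last, so a caller-supplied tenantId cannot override it.
function tenantScoped(tenantId, filter = {}) {
  if (!tenantId) throw new Error("tenantId is required");
  return { ...filter, tenantId };
}

const filter = tenantScoped("t1", { status: "open" });

// With a live connection (collection and index are assumptions):
// db.orders.createIndex({ tenantId: 1, createdAt: -1 });
// db.orders.find(filter).sort({ createdAt: -1 });
console.log(JSON.stringify(filter));
```

Because every query carries an equality match on `tenantId`, an index that starts with `tenantId` can serve all of them.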

Troubleshooting

Symptom	Cause / fix
Document near 16MB	Split arrays, archive collections, buckets
Slow `update` on large embedded arrays	Split documents or move to references
Broken referential integrity	App validation + transactions if needed; explicit delete policies (soft delete)
Index used but still slow	Working set size, projection minimization, covering indexes

Conclusion

MongoDB embedded vs referenced design is not “anything goes because NoSQL”. It means committing read patterns and growth boundaries to code. For the relational side of the transaction/join trade-offs, compare with the PostgreSQL vs MySQL guide; apply this checklist first, then validate with load tests.
