MongoDB Schema Design: Embedded vs Referenced Documents | Modeling Guide
Key takeaways
Model MongoDB data with measurable rules: co-location vs references, bucket patterns, indexes, and when multi-document transactions still cost more than a good schema.
Introduction
MongoDB is “schemaless,” but production services benefit from an explicit schema. Embedded vs referenced design is the core fork: store child data inside the parent document (embedded) or in another collection linked by id (referenced).
Embedded documents let a read finish in one round trip; references keep document growth under control and let several parents reuse the same child. This article replaces gut feel with measurable criteria: size, frequency, consistency.
After reading this post
- You can apply a checklist for embedded vs referenced designs
- You get hints for 1:N, N:M, i18n/versioning variants
- You understand the 16MB document limit and index interactions
Table of contents
- Concepts
- Hands-on implementation
- Advanced usage
- Performance comparison
- Real-world cases
- Troubleshooting
- Conclusion
Concepts
Embedded
Related data lives in one BSON document as nested arrays or sub-documents.
// User + addresses (embedded example)
{
_id: ObjectId("..."),
name: "Kim",
addresses: [
{ label: "home", city: "Seoul", zip: "03000" },
{ label: "work", city: "Seongnam", zip: "13000" }
]
}
Pros: If the parent and its children are almost always read together, one query finishes the job. Updates that stay inside the same document are atomic.
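To make the atomic-update point concrete, here is a minimal sketch that builds the arguments for a single `updateOne` call. The helper name `buildAddAddress` and the cap of 5 addresses are assumptions for illustration, not part of the original model. The filter checks the array length and the `$push` appends in the same single-document operation.

```javascript
// Hypothetical helper: builds the arguments for db.users.updateOne(filter, update).
const MAX_ADDRESSES = 5; // assumed upper bound for the embedded array

function buildAddAddress(userId, address) {
  return {
    // "addresses.4" exists only when the array already holds 5 items,
    // so this filter matches nothing once the cap is reached.
    filter: {
      _id: userId,
      [`addresses.${MAX_ADDRESSES - 1}`]: { $exists: false },
    },
    // $push appends inside the same document, so the length check and
    // the write happen as one atomic single-document update.
    update: { $push: { addresses: address } },
  };
}

// "u1" is a placeholder id; pass a real ObjectId in practice.
const addAddress = buildAddAddress("u1", { label: "work", city: "Seongnam", zip: "13000" });
console.log(JSON.stringify(addAddress));
// In mongosh or a driver: db.users.updateOne(addAddress.filter, addAddress.update)
```

If the update reports zero matched documents, either the user does not exist or the array is already full; the application decides how to surface that.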
Referenced
Separate collections linked by ObjectId (or similar).
// users ("u1" and "a1" are shortened placeholders; real ObjectIds are 24 hex characters)
{ _id: ObjectId("u1"), name: "Kim" }
// addresses
{ _id: ObjectId("a1"), userId: ObjectId("u1"), city: "Seoul" }
Pros: Child collections can grow large without bloating the parent. Fits models where many parents share the same child.
BSON document size
A single BSON document is capped at 16 MB (16 mebibytes). Data that can grow without bound should start in its own collection, with an index on the field that points back to the parent.
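One way to watch the size limit is to measure documents on the server. The sketch below builds an aggregation pipeline around the `$bsonSize` operator, assuming MongoDB 4.4 or later. The helper name, the 50% warning ratio, and the `users` collection in the usage comment are assumptions.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MiB = 16,777,216 bytes

// Hypothetical helper: pipeline that lists the largest documents whose
// BSON size exceeds `ratio` of the document limit.
function buildLargeDocPipeline(ratio = 0.5, limit = 10) {
  const threshold = Math.floor(MAX_BSON_BYTES * ratio);
  return [
    // $$ROOT is the whole document; $bsonSize returns its size in bytes.
    { $project: { bytes: { $bsonSize: "$$ROOT" } } },
    { $match: { bytes: { $gt: threshold } } },
    { $sort: { bytes: -1 } },
    { $limit: limit },
  ];
}

console.log(JSON.stringify(buildLargeDocPipeline(0.5, 5)));
// In mongosh: db.users.aggregate(buildLargeDocPipeline(0.5, 5))
```

The pipeline scans the whole collection, so it suits a periodic check rather than a request path.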
Hands-on implementation
Decision checklist
| Question | Favors embedded | Favors referenced |
|---|---|---|
| Read together? | Almost always | Rarely or partial reads |
| Updated together? | Often at once | Frequently independent |
| Bounded child count? | Small (e.g. a few addresses) | Thousands/unbounded |
| Shared child across parents? | Rarely | Often |
| Strong cross-doc integrity? | If it fits one document | Multiple collections + app logic/transactions |
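The checklist can be turned into a small scoring function for design reviews. This is only a sketch: the function name, the equal weighting of each row, the threshold of 3 votes, and the decision to treat unbounded growth as a hard stop are all assumptions, not rules from MongoDB.

```javascript
// Each flag describes the data being modeled; see the table rows above.
function recommendModel(answers) {
  const {
    readTogether,        // almost always read with the parent?
    updatedTogether,     // usually updated in the same operation?
    boundedChildren,     // small, known upper bound on child count?
    sharedAcrossParents, // same child referenced by many parents?
    fitsOneDocument,     // integrity needs fit inside one document?
  } = answers;

  // Unbounded growth is treated as a hard stop because of the 16 MB cap.
  if (!boundedChildren) return { model: "referenced", score: 0 };

  const votes = [readTogether, updatedTogether, !sharedAcrossParents, fitsOneDocument];
  const score = votes.filter(Boolean).length;
  return { model: score >= 3 ? "embedded" : "referenced", score };
}

// A user's handful of addresses: read together, bounded, not shared.
console.log(recommendModel({
  readTogether: true,
  updatedTogether: true,
  boundedChildren: true,
  sharedAcrossParents: false,
  fitsOneDocument: true,
}));
```

Treat the result as a starting point for discussion; a load test on real access patterns should still have the final say.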
Pattern 1: 1:few — embedded
If comments are always shown with a post and counts are limited (or page-sized), consider embedding.
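For the page-sized case, one possible approach is to keep only the newest page of comments inside the post using the `$each`, `$sort`, and `$slice` modifiers of `$push`. The field name `recentComments`, the page size of 20, and the helper name are assumptions. The sketch also assumes that older comments are stored elsewhere if they must be kept, because `$slice` discards them from the array.

```javascript
const PAGE_SIZE = 20; // assumed page size

// Hypothetical helper: arguments for db.posts.updateOne(filter, update).
function buildPushRecentComment(postId, comment, pageSize = PAGE_SIZE) {
  return {
    filter: { _id: postId },
    update: {
      $push: {
        recentComments: {
          $each: [comment],          // append the new comment
          $sort: { createdAt: -1 },  // newest first
          $slice: pageSize,          // keep only the first `pageSize` items
        },
      },
    },
  };
}

const pushComment = buildPushRecentComment("p1", {
  author: "Kim",
  body: "Nice post",
  createdAt: new Date("2026-03-30T00:05:00Z"),
});
console.log(JSON.stringify(pushComment.update));
// In mongosh: db.posts.updateOne(pushComment.filter, pushComment.update)
```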
Pattern 2: 1:many (large) — referenced + indexes
// posts collection
db.posts.createIndex({ slug: 1 }, { unique: true });
// comments collection
db.comments.createIndex({ postId: 1, createdAt: -1 });
db.comments.find({ postId: postObjectId }).sort({ createdAt: -1 }).limit(50);
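The same index can also serve pagination beyond the first 50 comments. The sketch below builds a range (keyset) query on `createdAt` instead of using `skip`; the helper name and the idea of passing the last seen timestamp as a cursor are assumptions. Comments that share an identical `createdAt` would need a tiebreaker such as `_id`, which this sketch leaves out.

```javascript
// Hypothetical helper: query parts for one page of comments, newest first.
// `before` is the createdAt of the last comment on the previous page.
function buildCommentsPage(postId, before = null, pageSize = 50) {
  const filter = { postId };
  if (before) {
    // Range condition on the second key of { postId: 1, createdAt: -1 }.
    filter.createdAt = { $lt: before };
  }
  return { filter, sort: { createdAt: -1 }, limit: pageSize };
}

const firstPage = buildCommentsPage("p1");
const nextPage = buildCommentsPage("p1", new Date("2026-03-30T00:05:00Z"));
console.log(JSON.stringify(firstPage), JSON.stringify(nextPage));
// In mongosh:
// db.comments.find(nextPage.filter).sort(nextPage.sort).limit(nextPage.limit)
```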
Pattern 3: Middle ground — buckets
For time-series logs, group readings per day to reduce document count.
{
sensorId: "s1",
day: ISODate("2026-03-30T00:00:00Z"),
readings: [
{ t: ISODate("2026-03-30T00:05:00Z"), v: 23.1 },
// ... batched for the day (design an upper bound)
]
}
In practice, the embedded vs referenced choice often lands on this compromise: embed, but in bounded buckets.
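A bucket write might be sketched as an upsert that only matches a bucket with room left. The `count` field, the cap of 200 readings, and the helper names are assumptions that do not appear in the document above. When a bucket is full the filter matches nothing, so the upsert starts a new bucket for the same sensor and day.

```javascript
const MAX_READINGS_PER_BUCKET = 200; // assumed upper bound per bucket

// Truncate a timestamp to midnight UTC, matching the `day` field.
function startOfUtcDay(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Hypothetical helper: arguments for db.readings.updateOne(filter, update, options).
function buildBucketUpsert(sensorId, t, v) {
  return {
    // Match only a bucket that still has room; `count` is an assumed counter field.
    filter: { sensorId, day: startOfUtcDay(t), count: { $lt: MAX_READINGS_PER_BUCKET } },
    update: { $push: { readings: { t, v } }, $inc: { count: 1 } },
    // With upsert, a missing or full bucket leads to a new bucket document.
    options: { upsert: true },
  };
}

const bucketOp = buildBucketUpsert("s1", new Date("2026-03-30T00:05:00Z"), 23.1);
console.log(JSON.stringify(bucketOp));
// In mongosh: db.readings.updateOne(bucketOp.filter, bucketOp.update, bucketOp.options)
```

Because a day can span several buckets under this scheme, a unique index on `{ sensorId, day }` would conflict with it; a non-unique index on those fields fits better.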
$lookup (join)
For occasional joins on referenced models, the aggregation $lookup stage works. On hot paths, prefer two reads in application code, or caching.
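Both options can be sketched side by side, reusing the `posts` and `comments` collections from Pattern 2. The helper names are assumptions, and the in-memory merge uses plain string ids as stand-ins for ObjectIds.

```javascript
// Option A: server-side join with $lookup.
function buildPostWithCommentsPipeline(slug) {
  return [
    { $match: { slug } },
    {
      $lookup: {
        from: "comments",        // collection to join
        localField: "_id",       // posts._id
        foreignField: "postId",  // comments.postId
        as: "comments",          // output array field
      },
    },
  ];
}

// Option B: two finds, then merge in application memory.
function attachComments(posts, comments) {
  const byPost = new Map();
  for (const c of comments) {
    const key = String(c.postId); // ObjectId and string ids both stringify
    if (!byPost.has(key)) byPost.set(key, []);
    byPost.get(key).push(c);
  }
  return posts.map((p) => ({ ...p, comments: byPost.get(String(p._id)) ?? [] }));
}

const merged = attachComments(
  [{ _id: "p1", slug: "hello" }, { _id: "p2", slug: "empty" }],
  [{ postId: "p1", body: "first" }, { postId: "p1", body: "second" }],
);
console.log(JSON.stringify(merged));
// In mongosh: db.posts.aggregate(buildPostWithCommentsPipeline("hello"))
```

Option B costs a second round trip, but each find can use its own index and the comment read can be paged or cached independently.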
Advanced usage
- i18n fields: { title: { ko: "...", en: "..." } } vs locale-split documents; match your translation workflow.
- Partial updates: Use array filters and positional operators to reduce write conflicts on embedded arrays.
- Schema validation: Enforce required fields/types with validator and JSON Schema for operational safety.
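The partial-update and validation items might look like the following for the users example from the Concepts section. The helper name, the `addr` identifier, and the specific rules in the validator (required fields, at most 5 addresses) are assumptions chosen for illustration.

```javascript
// Partial update: change the city of one embedded address, selected by label.
// "$[addr]" is a filtered positional operator bound by arrayFilters.
function buildUpdateAddressCity(userId, label, city) {
  return {
    filter: { _id: userId },
    update: { $set: { "addresses.$[addr].city": city } },
    options: { arrayFilters: [{ "addr.label": label }] },
  };
}

// Schema validation: a $jsonSchema validator for the users collection.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name"],
    properties: {
      name: { bsonType: "string" },
      addresses: {
        bsonType: "array",
        maxItems: 5, // assumed bound that keeps the embedded array small
        items: {
          bsonType: "object",
          required: ["label", "city"],
          properties: {
            label: { bsonType: "string" },
            city: { bsonType: "string" },
            zip: { bsonType: "string" },
          },
        },
      },
    },
  },
};

const cityOp = buildUpdateAddressCity("u1", "work", "Suwon");
console.log(JSON.stringify(cityOp), JSON.stringify(userValidator));
// In mongosh:
// db.users.updateOne(cityOp.filter, cityOp.update, cityOp.options)
// db.runCommand({ collMod: "users", validator: userValidator })
```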
Performance comparison
| Aspect | Embedded | Referenced |
|---|---|---|
| Read latency (common path) | Single find | find + find or $lookup |
| Write contention | Hotspot if many writers to one doc | Can spread writes |
| Indexes | Nested field indexes—more complex | Per-collection, often simpler |
| Consistency | Single-doc atomic updates | Application logic + transactions |
Real-world cases
- E-commerce orders: Header + dozens of line items often embedded; product master referenced.
- Social feeds: Post referenced; like counts denormalized with eventual consistency via events.
- B2B multi-tenant: Include tenantId in every query, backed by an index whose leading key is tenantId.
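For the multi-tenant case, a small wrapper can make it hard to forget the tenant condition. The helper name `scopedFilter`, the `orders` collection, and the `createdAt` sort key are assumptions.

```javascript
// Compound index with tenantId as the leading key (assumed second key: createdAt).
const tenantIndexKeys = { tenantId: 1, createdAt: -1 };
// In mongosh: db.orders.createIndex(tenantIndexKeys)

// Hypothetical helper: every query filter goes through this function.
function scopedFilter(tenantId, filter = {}) {
  if (tenantId === undefined || tenantId === null) {
    throw new Error("tenantId is required");
  }
  // tenantId is spread last so a caller-supplied filter cannot override it.
  return { ...filter, tenantId };
}

console.log(JSON.stringify(scopedFilter("t1", { status: "open" })));
// In mongosh:
// db.orders.find(scopedFilter("t1", { status: "open" })).sort({ createdAt: -1 })
```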
Troubleshooting
| Symptom | Cause / fix |
|---|---|
| Document near 16MB | Split arrays, archive collections, buckets |
| Slow update on large embedded arrays | Split documents or move to references |
| Broken referential integrity | App validation + transactions if needed; explicit delete policies (soft delete) |
| Index used but still slow | Working set size, projection minimization, covering indexes |
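For the last row, the output of `explain("executionStats")` gives numbers to reason with. The sketch below summarizes a few of its fields; the helper name is an assumption, and the sample input is hand-written for illustration rather than taken from a real run.

```javascript
// Summarize the executionStats section of an explain() result.
function summarizeExplain(explain) {
  const s = explain.executionStats;
  const returned = Math.max(s.nReturned, 1); // avoid division by zero
  return {
    millis: s.executionTimeMillis,
    // A ratio far above 1 means many documents are read per result returned.
    docsPerResult: s.totalDocsExamined / returned,
    keysPerResult: s.totalKeysExamined / returned,
    // A covered query answers from the index alone and examines no documents.
    covered: s.totalDocsExamined === 0 && s.nReturned > 0,
  };
}

// Illustrative input only; real values come from
// db.comments.find(...).explain("executionStats")
const sample = {
  executionStats: {
    nReturned: 50,
    totalKeysExamined: 5000,
    totalDocsExamined: 5000,
    executionTimeMillis: 120,
  },
};
console.log(summarizeExplain(sample));
```

A high `docsPerResult` with an index in use suggests the index is not selective enough for the filter, while a low ratio with slow timings points toward working set size or oversized documents.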
Conclusion
MongoDB embedded vs referenced design is not “anything goes because NoSQL”; it means locking read patterns and growth boundaries into code. Compare with relational transaction/join trade-offs using the PostgreSQL vs MySQL guide, apply this checklist first, then validate with load tests.