Astro Content Collections Advanced Guide — Schema, Type Safety, i18n, Dynamic Routing

Key points

Content Collections are Astro’s core feature for handling Markdown and MDX in a type-safe way. This post covers Zod schema design, using the CollectionEntry type, locale and originalId–based i18n, [...slug] dynamic routing, client search and server-side filtering, build and runtime optimization, and practical patterns for blogs and documentation sites.

What this post covers

Astro Content Collections treat static content as first-class data in your codebase. Markdown and MDX on disk are validated by schema, TypeScript infers frontmatter fields, and getCollection and getEntry expose a single consistent API. Beyond introductory tutorials, this article ties together type design, i18n, routing, search, and performance concerns you hit in production blogs and documentation sites.

You can expect:

  • How collections, entries, and references relate, and how they behave at build time
  • Validation strategies with Zod schemas and using the CollectionEntry generic
  • Patterns for translation pairs using metadata such as locale and originalId
  • [...slug], nested folders, and integration with getStaticPaths
  • Filtering via listings, tags, full-text search, and JSON APIs
  • Optimization from the perspective of images, MDX, and partial compilation
  • Common blog and documentation site architectures in practice

1. Core concepts of Content Collections

1.1 Why collections

You can build a blog with Markdown alone. As posts grow into the tens or hundreds, issues accumulate: missing frontmatter, mismatched field names, invalid date formats. Content Collections catch these at build time, reducing runtime bugs that only surface after deploy.

In short, collections provide:

  • A single source of truth: files under src/content/<name>/ are the data source
  • Schema validation: Zod for field types, defaults, and coercion rules
  • Type inference: CollectionEntry from astro:content powers autocomplete
  • Loader abstraction: loaders such as glob declare which directories and file patterns (including extensions) make up a collection

1.2 Collections, entries, and references

  • Collection: a logical group, e.g. blog, docs, changelog.
  • Entry: corresponds to one file in a collection. It has id, data (frontmatter), body (content), and more.
  • Reference: when one entry points to another, the schema can link them with reference() (availability depends on version and project setup).

The config file's location and the loader syntax vary by Astro major version (Astro 5 uses src/content.config.ts with loaders; Astro 2 to 4 used src/content/config.ts), so follow the official “Content Collections” docs for your version.

1.3 Build time vs runtime

In SSG, getCollection runs at build time to emit HTML and JSON, so filtering and sorting (your “DB queries”) cost roughly one pass per build instead of work on every request. Searching full post bodies in the browser trades payload size against initial load time; section 5 compares search strategies.


2. Schema definition and type inference

2.1 The role of Zod schemas

A Zod schema is not just static typing; it is a runtime validator. At build time it rejects invalid date strings, level values outside the enum, and (if you add a rule such as .min(1)) empty tag arrays.

Design recommendations:

  • Minimize required fields; use .optional() and .default() elsewhere to ease migrating older posts.
  • For dates, use z.coerce.date() so both strings and Date values are accepted and small frontmatter mistakes are absorbed.
  • Use z.enum() for enumerations so UI and filters share the same value set.
  • For images, schema: ({ image }) => ... with the image() helper integrates cleanly with the optimization pipeline.
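To make these recommendations concrete without an Astro project, here is a hypothetical, dependency-free sketch of what a required field, .default(), and z.coerce.date() do to raw frontmatter. parseBlogFrontmatter and its helpers are illustrative names, not Zod's or Astro's implementation; in a real project the schema in content.config.ts does this work for you.

```typescript
// Hypothetical, dependency-free sketch. NOT Zod: it only mimics a required field,
// .default() fallbacks, and z.coerce.date() for a few blog fields.
type RawFrontmatter = Record<string, unknown>;

interface BlogData {
  title: string;
  pubDate: Date;
  tags: string[];
  draft: boolean;
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((item) => typeof item === "string");
const isBoolean = (v: unknown): v is boolean => typeof v === "boolean";

// Like .default(): only an absent key gets the fallback; a wrong type is still an error.
function withDefault<T>(value: unknown, fallback: T, check: (v: unknown) => v is T, field: string): T {
  if (value === undefined) return fallback;
  if (!check(value)) throw new Error(`${field}: invalid type`);
  return value;
}

// Like z.coerce.date(): pass the input through the Date constructor and reject invalid dates.
function coerceDate(value: unknown, field: string): Date {
  const date = new Date(value instanceof Date ? value.getTime() : (value as string | number));
  if (Number.isNaN(date.getTime())) throw new Error(`${field}: invalid date`);
  return date;
}

function parseBlogFrontmatter(raw: RawFrontmatter): BlogData {
  // Required field: a missing or non-string title fails the build.
  if (typeof raw.title !== "string") throw new Error("title: required");
  return {
    title: raw.title,
    pubDate: coerceDate(raw.pubDate, "pubDate"),
    tags: withDefault(raw.tags, [] as string[], isStringArray, "tags"),
    draft: withDefault(raw.draft, false, isBoolean, "draft"),
  };
}
```

The split between "absent" and "wrong type" is deliberate: a default should smooth over older posts that never had the field, not hide a typo like `tags: astro`.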

2.2 Example content.config.ts in Astro 5 style

Below is a minimal blog collection read with the glob loader and validated with Zod. Extend fields to match your team’s conventions.

// src/content.config.ts (Astro 5)
import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ base: './src/content/blog', pattern: '**/*.{md,mdx}' }),
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      description: z.string(),
      pubDate: z.coerce.date(),
      updatedDate: z.coerce.date().optional(),
      heroImage: image().optional(),
      tags: z.array(z.string()).default([]),
      draft: z.boolean().default(false),
      locale: z.enum(['ko', 'en']).default('ko'),
      originalId: z.string().optional(),
      level: z.enum(['초급', '중급', '고급']).optional(), // beginner / intermediate / advanced; values must match existing frontmatter
      category: z.string().optional(),
      readingMinutes: z.number().optional(),
    }),
});

export const collections = { blog };

With this schema, TypeScript understands that data.title and similar fields always exist on each item from getCollection('blog'). Missing title in frontmatter becomes a build failure immediately.

2.3 CollectionEntry and utility types

When you work with types explicitly, use CollectionEntry:

import type { CollectionEntry } from 'astro:content';

type BlogEntry = CollectionEntry<'blog'>;

function summarize(post: BlogEntry): string {
  return post.data.description;
}

BlogEntry['data'] aligns with the frontmatter type inferred from the Zod schema. That is especially useful when centralizing shared helpers (date formatting, reading time, OG tags).
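As a sketch of such shared helpers, the block below uses a hypothetical BlogEntryLike interface as a stand-in for CollectionEntry<'blog'> so it runs outside Astro. readingMinutes and formatPubDate are illustrative names, and the 200 words-per-minute default is an assumption.

```typescript
// Minimal stand-in for CollectionEntry<'blog'>: only the fields these helpers read.
// In an Astro project you would type the parameter as CollectionEntry<'blog'> instead.
interface BlogEntryLike {
  id: string;
  body?: string; // raw Markdown/MDX source
  data: {
    title: string;
    pubDate: Date;
    readingMinutes?: number;
  };
}

// Prefer the frontmatter value; otherwise estimate from the raw body's word count.
function readingMinutes(post: BlogEntryLike, wordsPerMinute = 200): number {
  if (post.data.readingMinutes !== undefined) return post.data.readingMinutes;
  const words = (post.body ?? "").trim().split(/\s+/).filter(Boolean).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// One date format shared by cards, detail pages, and feeds.
// timeZone "UTC" keeps the printed day independent of the build machine's zone.
function formatPubDate(post: BlogEntryLike, locale = "en-US"): string {
  return post.data.pubDate.toLocaleDateString(locale, {
    year: "numeric",
    month: "short",
    day: "numeric",
    timeZone: "UTC",
  });
}
```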

2.4 Schema evolution and migration

When adding fields, prefer this order:

  1. Add to Zod as optional or with a default so all existing files pass.
  2. Fill in content gradually.
  3. When stable, promote to required if needed.

Making a field required in one step can force edits across hundreds of Markdown files. Gradual tightening hurts less on large blogs.
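Before promoting a field to required, it helps to know which entries still lack it. The sketch below is a hypothetical check over plain objects shaped like getCollection results (EntryLike and entriesMissing are illustrative names). Run it while the field is still .optional(): once a .default() is in place, the default fills the gap and hides the files you still need to edit.

```typescript
// Plain objects standing in for entries returned by getCollection('blog').
interface EntryLike {
  id: string;
  data: Record<string, unknown>;
}

// List ids whose frontmatter lacks `field` (absent or empty string), sorted for stable output.
function entriesMissing(entries: EntryLike[], field: string): string[] {
  return entries
    .filter((entry) => entry.data[field] === undefined || entry.data[field] === "")
    .map((entry) => entry.id)
    .sort();
}
```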


3. Managing multilingual content

3.1 Folder structure strategies

Three common patterns:

| Approach | Pros | Caveats |
| --- | --- | --- |
| Filename suffix (post-en.md) | Clear slug separation | Same-topic mapping must be managed in metadata |
| Subdirectories (en/, ko/) | Per-locale operations stay separate | Needs matching glob patterns and URL design |
| Single file + field (locale: en) | Fewer files | Rules needed when slugs would collide |

With locale and originalId in the schema (as on this site), translations can point to the source so lists and language switcher UIs are easier to build.

3.2 Linking translation pairs with originalId

For example, with a Korean source my-post.md and English my-post-en.md, set originalId: 'my-post' on the English post so the UI can render a “View in Korean” link. Slug generation must match your project’s [...slug].astro rules.
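One way to turn originalId into a language switcher is to group every post under its source id. This is a sketch under the assumption that source posts omit originalId and translations set it; PostLike, buildTranslationIndex, and findTranslation are hypothetical names, with PostLike standing in for CollectionEntry<'blog'>.

```typescript
type Locale = "ko" | "en";

// Minimal stand-in for CollectionEntry<'blog'> with the locale/originalId schema fields.
interface PostLike {
  id: string;
  data: { title: string; locale: Locale; originalId?: string };
}

type TranslationGroup = Partial<Record<Locale, PostLike>>;

// Key every post by its source id: the source has no originalId, translations point to it.
function groupKey(post: PostLike): string {
  return post.data.originalId ?? post.id;
}

function buildTranslationIndex(posts: PostLike[]): Map<string, TranslationGroup> {
  const index = new Map<string, TranslationGroup>();
  for (const post of posts) {
    const key = groupKey(post);
    const group = index.get(key) ?? {};
    group[post.data.locale] = post;
    index.set(key, group);
  }
  return index;
}

// Language switcher: the counterpart of `post` in `target`, or undefined if untranslated.
function findTranslation(
  index: Map<string, TranslationGroup>,
  post: PostLike,
  target: Locale,
): PostLike | undefined {
  return index.get(groupKey(post))?.[target];
}
```

Building the index once per build and passing it to pages keeps the switcher lookup cheap, and an undefined result tells the UI to hide the link.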

3.3 Query patterns

To load one language only, filter in getCollection:

import { getCollection } from 'astro:content';

const koPosts = await getCollection('blog', ({ data }) => data.locale === 'ko' && !data.draft);

On English-only routes (/en/blog/...), apply locale === 'en' the same way. Document team rules so routes, filters, and schema locale values stay aligned.


4. Dynamic routing integration

4.1 [...slug] and static path generation

Blog detail pages are often src/pages/blog/[...slug].astro. In getStaticPaths, read the collection and emit a path for every entry:

import { getCollection } from 'astro:content';

export async function getStaticPaths() {
  const posts = await getCollection('blog', ({ data }) => !data.draft);
  return posts.map((post) => ({
    params: { slug: post.id },
    props: { post },
  }));
}

post.id’s string shape depends on the loader and file path. Lock URL-friendly id rules early (extension stripping, nested paths allowed or not).

4.2 Custom slug field

If a frontmatter slug field overrides the URL, getStaticPaths should set params.slug from that field first and fall back to post.id. This helps when the file path contains awkward segments (e.g. names that are reserved words in some environments).
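A sketch of such a slug rule, assuming each entry exposes its file path relative to the collection base and an optional frontmatter slug. RoutableEntry, slugFromPath, resolveSlug, and staticPaths are hypothetical names; this is a team convention, not how Astro's loaders generate post.id.

```typescript
// Hypothetical entry shape: path relative to the collection base plus optional frontmatter slug.
interface RoutableEntry {
  filePath: string;
  data: { slug?: string };
}

// Strip the extension, drop a trailing "index" segment, lowercase,
// and turn whitespace into hyphens. Nested folders are kept as URL segments.
function slugFromPath(filePath: string): string {
  const withoutExt = filePath.replace(/\.(md|mdx)$/i, "");
  const segments = withoutExt.split("/").filter((s) => s.length > 0);
  if (segments[segments.length - 1] === "index") segments.pop();
  return segments.map((s) => s.trim().toLowerCase().replace(/\s+/g, "-")).join("/");
}

// Prefer the frontmatter slug (without leading/trailing slashes), else derive from the path.
function resolveSlug(entry: RoutableEntry): string {
  return entry.data.slug?.replace(/^\/+|\/+$/g, "") || slugFromPath(entry.filePath);
}

// Same shape getStaticPaths returns: one params/props pair per entry.
function staticPaths(entries: RoutableEntry[]) {
  return entries.map((entry) => ({ params: { slug: resolveSlug(entry) }, props: { entry } }));
}
```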

4.3 Nested documentation trees

Documentation sites often use deep trees like docs/getting-started/install.md. If post.id is path-shaped, you can build sidebar navigation from the same id. Parent–child ordering can use series, order, or similar fields in the schema.
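Building the sidebar can look roughly like this, assuming path-shaped ids and an optional order field in the schema. DocEntryLike and buildSidebar are hypothetical names; grouping by the first path segment is one possible rule, not an Astro feature.

```typescript
// Minimal stand-in for CollectionEntry<'docs'>; the id is path-shaped.
interface DocEntryLike {
  id: string; // e.g. "getting-started/install"
  data: { title: string; order?: number };
}

interface SidebarGroup {
  section: string;
  items: { id: string; title: string }[];
}

// Entries without `order` sort after ordered ones, then alphabetically by title.
const orderOf = (doc: DocEntryLike): number => doc.data.order ?? Number.MAX_SAFE_INTEGER;

// Group docs by their first path segment ("root" for top-level files).
function buildSidebar(docs: DocEntryLike[]): SidebarGroup[] {
  const groups: Record<string, DocEntryLike[]> = Object.create(null);
  docs.forEach((doc) => {
    const slash = doc.id.indexOf("/");
    const section = slash === -1 ? "root" : doc.id.slice(0, slash);
    if (!groups[section]) groups[section] = [];
    groups[section].push(doc);
  });
  return Object.keys(groups)
    .sort()
    .map((section) => ({
      section,
      items: groups[section]
        .slice()
        .sort((a, b) => orderOf(a) - orderOf(b) || a.data.title.localeCompare(b.data.title))
        .map((doc) => ({ id: doc.id, title: doc.data.title })),
    }));
}
```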


5. Search and filtering

5.1 Build-time filtering

For listing pages by tag, category, or difficulty, narrow with the second argument to getCollection:

const frontend = await getCollection(
  'blog',
  (e) => e.data.category === 'frontend' && !e.data.draft
);

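Sort the result yourself (e.g. by pubDate descending), because the order in which getCollection returns entries is not guaranteed. Extracting one shared sort helper keeps ordering consistent across every list.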
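A minimal shared sort helper might look like this. DatedPost and sortPosts are hypothetical names, with DatedPost standing in for a blog entry. Breaking ties by title is a design choice that keeps posts with the same date in the same order on every build.

```typescript
// Minimal stand-in for entries returned by getCollection('blog').
interface DatedPost {
  id: string;
  data: { title: string; pubDate: Date };
}

// Newest first; equal dates fall back to title order.
function byPubDateDesc(a: DatedPost, b: DatedPost): number {
  return b.data.pubDate.getTime() - a.data.pubDate.getTime() || a.data.title.localeCompare(b.data.title);
}

// Returns a sorted copy so the original array from getCollection is left untouched.
function sortPosts<T extends DatedPost>(posts: readonly T[]): T[] {
  return [...posts].sort(byPubDateDesc);
}
```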

5.2 Client search via a JSON API

To search full post bodies in the browser, a common pattern is to emit posts.json at build time containing summaries and snippets. In Astro, an endpoint such as src/pages/api/blog/posts.json.ts can call getCollection and return new Response(JSON.stringify(...)); with static output, the JSON file becomes part of the build.
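The record-building part of such an endpoint can be a plain function, which keeps it testable. A sketch, assuming entries carry an optional raw body string and the title, description, tags, and draft fields from the schema; PostForSearch, toSearchRecords, and plainText are hypothetical names, and plainText is a rough regex pass, not a Markdown parser. Inside the endpoint you would pass it the result of getCollection('blog') and return new Response(JSON.stringify(records)).

```typescript
interface PostForSearch {
  id: string;
  body?: string; // raw Markdown/MDX source
  data: { title: string; description: string; tags: string[]; draft: boolean };
}

interface SearchRecord {
  id: string;
  title: string;
  description: string;
  tags: string[];
  snippet: string;
}

// Rough cleanup so snippets read as plain text.
function plainText(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // drop fenced code blocks
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1") // links and images -> their text
    .replace(/[#>*_`~]/g, "") // heading, quote, emphasis, and code markers
    .replace(/\s+/g, " ")
    .trim();
}

// Drafts are excluded so they never leak into the public JSON.
function toSearchRecords(posts: PostForSearch[], snippetLength = 160): SearchRecord[] {
  return posts
    .filter((post) => !post.data.draft)
    .map((post) => ({
      id: post.id,
      title: post.data.title,
      description: post.data.description,
      tags: post.data.tags,
      snippet: plainText(post.body ?? "").slice(0, snippetLength),
    }));
}
```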

On the client:

  • Simple substring search: easy to implement; limited for morphologically rich languages
  • Lightweight indexes (Fuse.js, MiniSearch): good for mid-sized sites
  • Pagefind: post-build indexing for static search; often paired with Astro
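The first option, substring search, fits in a few lines. This sketch assumes records shaped like a hypothetical posts.json (id, title, description, tags); SearchRecord and searchRecords are illustrative names, and every query term must appear in the record. It matches exact substrings only, so it will not relate different inflected forms of a word.

```typescript
interface SearchRecord {
  id: string;
  title: string;
  description: string;
  tags: string[];
}

// Normalize Unicode form and case so composed/decomposed and upper/lower variants match.
function normalize(text: string): string {
  return text.normalize("NFC").toLocaleLowerCase();
}

// AND semantics: every whitespace-separated term must occur somewhere in the record.
function searchRecords(records: SearchRecord[], query: string): SearchRecord[] {
  const terms = normalize(query).split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return records.filter((record) => {
    const haystack = normalize([record.title, record.description, ...record.tags].join(" "));
    return terms.every((term) => haystack.includes(term));
  });
}
```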

For very large docs, consider external search (Algolia, Typesense, etc.).

5.3 Tags and RSS

Tag pages can flatMap tags after getCollection, or precompute a tag index in a build script. When building RSS with the @astrojs/rss package, reuse the same collection queries as your listings so lists, feeds, and sitemaps do not drift apart.
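A tag index can be computed once and reused by tag pages and feeds. Sketch only: TaggedPost and buildTagIndex are hypothetical names, and lowercasing tags is an assumed team rule.

```typescript
// Minimal stand-in for entries returned by getCollection('blog').
interface TaggedPost {
  id: string;
  data: { tags: string[]; draft: boolean };
}

// Build tag -> post ids once. Tags are trimmed and lowercased so "Astro" and
// "astro" land on the same page; duplicates within one post are ignored.
function buildTagIndex(posts: TaggedPost[]): Record<string, string[]> {
  // Prototype-free object so a tag such as "constructor" cannot clash with Object.prototype.
  const index: Record<string, string[]> = Object.create(null);
  posts
    .filter((post) => !post.data.draft)
    .forEach((post) => {
      const uniqueTags = Array.from(new Set(post.data.tags.map((tag) => tag.trim().toLowerCase())));
      uniqueTags.forEach((tag) => {
        if (!index[tag]) index[tag] = [];
        index[tag].push(post.id);
      });
    });
  return index;
}

// e.g. in getStaticPaths for a hypothetical src/pages/tags/[tag].astro:
// return Object.keys(index).map((tag) => ({ params: { tag }, props: { ids: index[tag] } }));
```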


6. Performance optimization

6.1 Avoid loading full bodies when unnecessary

List cards usually need only title, description, and date from data. Avoid calling render() for every entry on list pages; rendering is only needed where the body is actually displayed. On MDX-heavy projects, some teams go further and split a lightweight listing collection (e.g. blogMeta vs blog).

6.2 Images

Using the image() schema and getImage simplifies optimization and responsive srcset. For posts with large hero images, watch LCP and tune loading, decoding, and priority hints.

6.3 Scope of MDX components

Globally registering heavy components in MDX can bloat bundles. Inject only what each doc or section needs, or keep posts that do not need MDX as .md to reduce parse cost.

6.4 Incremental builds and CI

In monorepos or large content repositories, caching matters. When only content changes, avoid rerunning the full pipeline unnecessarily; consider CI caches keyed on a hash of the content files.
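One way to derive such a cache key is to hash every content file's path and contents in a stable order. This is a hypothetical sketch using Node's built-in crypto module; ContentFile and contentCacheKey are illustrative names. How the files are read and how the key is passed to your CI cache are left out, since both depend on your CI system.

```typescript
import { createHash } from "node:crypto";

interface ContentFile {
  path: string;
  contents: string;
}

// Same set of files and bytes -> same key, regardless of the order files were listed in.
function contentCacheKey(files: ContentFile[]): string {
  const hash = createHash("sha256");
  [...files]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .forEach((file) => {
      // NUL separators keep "ab" + "c" distinct from "a" + "bc".
      hash.update(file.path);
      hash.update("\0");
      hash.update(file.contents);
      hash.update("\0");
    });
  return hash.digest("hex");
}
```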


7. Practical blog and documentation patterns

7.1 Blogs

  • Consistent frontmatter: treat title, description, pubDate, tags, draft as near-required
  • Series: series, seriesOrder for prev/next navigation
  • Related posts: tag overlap scores or manual relatedPosts curation
  • Canonical and hreflang: use these meta tags to reduce duplicate-content issues in search across locales
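The related-posts rule can be sketched as a small scoring function: manual relatedPosts curation wins, otherwise other posts are ranked by the number of shared tags. RelatablePost and the relatedPosts function are hypothetical names; ranking by raw overlap count with id as a tiebreaker is one simple choice among many.

```typescript
// Minimal stand-in for a blog entry with tags and optional manual curation.
interface RelatablePost {
  id: string;
  data: { tags: string[]; relatedPosts?: string[] };
}

function relatedPosts(current: RelatablePost, all: RelatablePost[], limit = 3): string[] {
  // Manual curation takes precedence over computed scores.
  const curated = current.data.relatedPosts;
  if (curated && curated.length > 0) return curated.slice(0, limit);

  const tags = new Set(current.data.tags);
  return all
    .filter((post) => post.id !== current.id)
    .map((post) => ({ id: post.id, score: post.data.tags.filter((t) => tags.has(t)).length }))
    .filter((candidate) => candidate.score > 0)
    // Higher overlap first; ties broken by id for a deterministic order.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, limit)
    .map((candidate) => candidate.id);
}
```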

7.2 Documentation sites

  • Version directories: docs/v1/, docs/v2/ with a version field in the schema
  • Page types: type: 'guide' | 'api' | 'changelog' to branch layouts
  • Table of contents: parse headings for ToC (remark plugin or MDX exports)
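In addition to Content, render() from astro:content returns a headings array whose items have depth, slug, and text. Assuming that shape, a two-level ToC (h2 with nested h3) can be built like this. buildToc is a hypothetical helper, and promoting an h3 that appears before any h2 to the top level is a design choice.

```typescript
// Same shape as the items in the `headings` array returned by render().
interface Heading {
  depth: number;
  slug: string;
  text: string;
}

interface TocItem extends Heading {
  children: TocItem[];
}

// h2 headings become top-level items, h3 headings nest under the preceding h2,
// and all other depths are skipped.
function buildToc(headings: Heading[]): TocItem[] {
  const toc: TocItem[] = [];
  for (const heading of headings) {
    if (heading.depth === 2) {
      toc.push({ ...heading, children: [] });
    } else if (heading.depth === 3) {
      const parent = toc[toc.length - 1];
      if (parent) parent.children.push({ ...heading, children: [] });
      else toc.push({ ...heading, children: [] }); // h3 before any h2: keep it visible
    }
  }
  return toc;
}
```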

7.3 Alongside a CMS

Content Collections fit Git-centric workflows; for editing by non-developers, a headless CMS may work better. A hybrid setup that exports CMS content to Markdown at build time keeps schema validation while improving the editing experience.


8. getEntry and single-entry patterns

When you need one slug rather than a full list, getEntry is clearer—for example prev/next navigation when you only know neighbor ids.

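import { getEntry, render } from 'astro:content';

const prev = await getEntry('blog', 'some-slug');
if (prev) {
  // In Astro 5, render() is imported from astro:content instead of being called on the entry.
  const { Content } = await render(prev);
}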

getEntry returns undefined when the id does not exist (check the signature for your version), so always branch on the result. On sites with multiple collections, extracting collection names into constants avoids typos.
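Choosing which neighbor ids to pass to getEntry can be a pure function. A sketch, assuming you already have the ids in display order; COLLECTIONS, CollectionName, and neighbors are hypothetical names.

```typescript
// Hypothetical collection-name constants shared across files to avoid string typos;
// CollectionName can type helper parameters that take a collection.
const COLLECTIONS = { blog: "blog", docs: "docs" } as const;
type CollectionName = (typeof COLLECTIONS)[keyof typeof COLLECTIONS];

interface Neighbors {
  prev?: string;
  next?: string;
}

// Given ids in display order (e.g. sorted by pubDate), return the ids to pass to
// getEntry(collection, id). Missing neighbors stay undefined, so callers must branch.
function neighbors(orderedIds: readonly string[], currentId: string): Neighbors {
  const i = orderedIds.indexOf(currentId);
  if (i === -1) return {};
  return {
    prev: i > 0 ? orderedIds[i - 1] : undefined,
    next: i < orderedIds.length - 1 ? orderedIds[i + 1] : undefined,
  };
}
```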


9. Rendering pipeline and remark/rehype

Markdown and MDX pass through remark plugins (syntax extensions, math, code highlighting) and rehype plugins (HTML transforms, heading ids) before final output. Register plugins in Astro config under markdown or the MDX integration so they apply to all collection content.

Practical tips:

  • Slug-safe heading ids: headings in non-ASCII scripts such as Korean can produce long percent-encoded URLs, and repeated headings can collide; agree on a team rule such as a prefix or a custom slugger in place of rehype-slug.
  • External link rel: plugins like rehype-external-links can enforce noopener for security and SEO.
  • Code blocks: limiting Shiki themes and language lists can shorten builds (unsupported languages may fall back to plaintext with warnings).
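A collision-safe heading-id rule might look like the sketch below. It is a hypothetical alternative, not rehype-slug's or github-slugger's algorithm: createSlugger keeps Unicode letters and digits so Korean headings stay readable, turns whitespace into hyphens, and appends -1, -2, and so on when an id is already used on the page. The optional prefix is an assumed team convention.

```typescript
// Create one slugger per page so ids are unique within that page only.
function createSlugger(prefix = "") {
  const used = new Set<string>();
  // Anything that is not a letter, digit, whitespace, or hyphen is dropped.
  const disallowed = new RegExp("[^\\p{L}\\p{N}\\s-]", "gu");

  return (text: string): string => {
    let base = text.trim().toLowerCase().replace(disallowed, "").replace(/\s+/g, "-");
    if (base === "") base = "section"; // heading made only of punctuation
    base = prefix + base;

    // Append -1, -2, ... until the id is free (also covers headings that already end in -1).
    let candidate = base;
    let n = 0;
    while (used.has(candidate)) {
      n += 1;
      candidate = `${base}-${n}`;
    }
    used.add(candidate);
    return candidate;
  };
}
```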

To apply a different pipeline to only part of your content, keep blog posts as .md and docs as .mdx, or attach per-file metadata with export const in MDX.


10. Static search with Pagefind

Shipping full post bodies to the client for search becomes costly as content grows. Pagefind indexes the built HTML and pairs well with Astro. The flow is roughly:

  1. Run astro build to generate HTML.
  2. Run the Pagefind CLI against dist to build the index.
  3. Wire Pagefind’s JS UI or API into a search page.

This gets close to full-text search without a server, a database, or a paid SaaS. The downsides are one more build step and potentially large index files. For thousands of posts, consider sharding or per-category indexes.


11. Troubleshooting checklist

  • Build fails with “schema error”: check frontmatter field names and types on recently added Markdown.
  • Types look like any: make sure content.config.ts is saved, collection name strings have no typos, and the generated types are current (restart the dev server or run astro sync).
  • 404 on paths: suspect mismatch between getStaticPaths slug and file id.
  • Broken translation links: verify originalId matches filename slug rules.
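The last item can be automated with a small pre-build check. This sketch assumes originalId should equal the id of an existing source entry; CheckedPost and findBrokenTranslationLinks are hypothetical names.

```typescript
// Minimal stand-in for entries returned by getCollection('blog').
interface CheckedPost {
  id: string;
  data: { locale: string; originalId?: string };
}

// Report translations whose originalId matches no existing entry id,
// plus posts that point at themselves. Output is sorted for stable logs.
function findBrokenTranslationLinks(posts: CheckedPost[]): string[] {
  const ids = new Set(posts.map((post) => post.id));
  return posts
    .filter((post) => {
      const target = post.data.originalId;
      return target !== undefined && (target === post.id || !ids.has(target));
    })
    .map((post) => `${post.id} -> ${post.data.originalId}`)
    .sort();
}
```

Failing the build (or CI) when this list is non-empty catches renamed source files before readers hit a dead "View in Korean" link.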

12. Summary

Content Collections are Astro’s central way to treat content as data. Zod schemas add safety, CollectionEntry expresses domain logic, and metadata like locale, slug, and series fields let you scale blogs and docs. Balance build-time aggregation with client indexes for search and filtering; tie together images, MDX, and API design for perceived performance.

If this repo already defines a blog collection in content.config.ts, new posts only need frontmatter that passes the schema to flow into types, listings, RSS, and tag pages. Treat schema fields like a public API and keep migration notes when they change—that stays cheapest long term.


Appendix: minimal page example (detail render)

To render an entry’s body, use render() from astro:content (works for both Markdown and MDX):

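---
import { getCollection, render } from 'astro:content';

// Demo only: takes the first published post. Real detail pages receive `post` via getStaticPaths props.
const posts = await getCollection('blog', ({ data }) => !data.draft);
const post = posts[0];
if (!post) throw new Error('No published posts in the blog collection');
const { Content } = await render(post);
---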

<article>
  <h1>{post.data.title}</h1>
  <Content />
</article>

Real sites often wrap this in a BlogPost.astro layout that adds the page shell, ToC, ad slots, and SEO components. That pattern keeps markup consistent and gives Core Web Vitals measurement a single shared point.