Microservices Architecture Guide | Design, Patterns, and Pitfalls

Key points of this article

Microservices solve specific scaling and team-autonomy problems — but introduce distributed systems complexity. This guide covers architecture patterns, communication strategies, and the honest trade-offs you need to weigh before committing.

Microservices vs Monolith: The Honest Trade-off

Before choosing microservices, understand what you’re trading:

| | Monolith | Microservices |
|---|---|---|
| Development speed | Fast initially | Slower initially |
| Deployment | Simple, one unit | Complex, many units |
| Scaling | Scale everything | Scale independently |
| Team autonomy | Low | High |
| Debugging | Easy (local) | Hard (distributed) |
| Data consistency | Easy (single DB) | Hard (distributed transactions) |
| Operational overhead | Low | High |

Rule of thumb: Start with a monolith. Extract services when you have a specific, validated reason (team bottleneck, scaling requirement, technology mismatch) — not because microservices sound modern.


1. Service Decomposition

Domain-Driven Design (DDD) approach

Split services along bounded contexts — areas of the domain with their own language and models:

E-commerce system
├── user-service       ← registration, auth, profiles
├── product-service    ← catalog, inventory, pricing
├── order-service      ← cart, orders, order history
├── payment-service    ← payment processing, refunds
├── notification-service ← email, SMS, push
└── shipping-service   ← shipping, tracking, returns

Signs a service boundary is wrong:

  • Services constantly need to call each other to complete one operation
  • You always deploy multiple services together
  • A change in one service always requires a change in another

Size heuristics

  • Too small: a service that only wraps a database table with CRUD
  • Too large: a service that a team can’t understand in a day
  • Right: a service owned by one team, deployable independently

2. Communication Patterns

Synchronous (REST / gRPC)

Client → API Gateway → Order Service → [sync call] → Inventory Service
                                     → [sync call] → Payment Service

Good for: queries, reads, user-facing requests that need immediate response.

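// Order service calling payment service
async function processOrder(order) {
  // Synchronous (blocking) HTTP call: the order cannot proceed until payment answers
  const response = await fetch('http://payment-service/charge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ amount: order.total, userId: order.userId }),
  })
  // fetch only rejects on network errors, so check the HTTP status explicitly
  if (!response.ok) throw new Error(`Payment service returned ${response.status}`)

  const paymentResult = await response.json()
  if (!paymentResult.success) throw new Error('Payment failed')
  return updateOrderStatus(order.id, 'confirmed')
}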

Problem: if Payment Service is down, Order Service fails too — cascading failures.
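
One common first mitigation for this problem is to bound how long the caller is willing to wait. The sketch below assumes a hypothetical `withTimeout` helper (not part of the original example) that rejects when a promise takes too long:

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  })
  // Whichever settles first wins; clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

In `processOrder`, the `fetch(...)` call could be wrapped as `withTimeout(fetch(...), 2000, 'payment-service')`. Note that this only stops waiting; it does not cancel the underlying request. Passing an `AbortSignal` to `fetch` is the usual way to do that.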

Asynchronous (Events / Message Queue)

Order Service → [event: OrderPlaced] → Kafka/RabbitMQ
                           ↓
                           Payment Service (consumes)
                           Inventory Service (consumes)
                           Notification Service (consumes)

Good for: writes, workflows, processes that don’t need an immediate response.

// Order service publishes an event
// (`kafka` here is a simplified, hypothetical client wrapper, not a specific library's API)
await kafka.publish('orders', {
  event: 'OrderPlaced',
  orderId: order.id,
  userId: order.userId,
  items: order.items,
  total: order.total,
})
// Returns immediately — doesn't wait for payment/inventory

// Payment service subscribes
kafka.subscribe('orders', async (message) => {
  if (message.event === 'OrderPlaced') {
    const result = await chargeCustomer(message.userId, message.total)
    await kafka.publish('payments', {
      event: result.success ? 'PaymentSucceeded' : 'PaymentFailed',
      orderId: message.orderId,
    })
  }
})
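
Message brokers are commonly configured for at-least-once delivery, so a consumer like the payment service may see the same event twice. A minimal sketch of an idempotent handler follows, assuming each event carries a unique `eventId` (a field the payload above does not include) and using an in-memory `Set` as a stand-in for a durable store:

```javascript
// Wraps a handler so that each event is processed at most once per `eventId`.
// Assumption: events carry a unique `eventId`; the in-memory Set is for illustration only.
function makeIdempotent(handler, seen = new Set()) {
  return async (message) => {
    if (seen.has(message.eventId)) return { skipped: true }
    const result = await handler(message)
    seen.add(message.eventId) // record only after the handler succeeds
    return { skipped: false, result }
  }
}
```

This sketch does not guard against two copies of the same event being handled concurrently; a real implementation would typically rely on a unique constraint in the service's own database.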

3. API Gateway

The gateway is the single entry point for clients. It handles cross-cutting concerns so that individual services don’t have to:

Client
  ↓
API Gateway
  ├── Auth (JWT validation)
  ├── Rate limiting
  ├── SSL termination
  ├── Request routing
  ├── Load balancing
  └── Request/response transformation
  ↓
  ├── /api/users/*     → user-service
  ├── /api/products/*  → product-service
  └── /api/orders/*    → order-service
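
The routing step in the diagram above amounts to a prefix lookup. A minimal sketch, where the upstream ports for product-service and order-service are assumed for illustration:

```javascript
// Route table mirroring the diagram; upstream URLs and ports are illustrative assumptions.
const routes = [
  { prefix: '/api/users/', upstream: 'http://user-service:3001' },
  { prefix: '/api/products/', upstream: 'http://product-service:3002' },
  { prefix: '/api/orders/', upstream: 'http://order-service:3003' },
]

// Returns the upstream base URL for a request path, or null if no route matches.
function resolveUpstream(path) {
  const match = routes.find((route) => path.startsWith(route.prefix))
  return match ? match.upstream : null
}
```

A real gateway layers authentication, rate limiting, and load balancing around this lookup, which is why an off-the-shelf gateway is usually preferable to a hand-written one.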

Popular options: Kong, Nginx, AWS API Gateway, Traefik, Envoy.

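# Kong route example (declarative config)
_format_version: "3.0"   # required by Kong's declarative format; use the value matching your Kong version
services: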
  - name: user-service
    url: http://user-service:3001
    routes:
      - name: users-route
        paths: ["/api/users"]
        methods: ["GET", "POST", "PUT", "DELETE"]

plugins:
  - name: jwt          # Auth
  - name: rate-limiting
    config:
      minute: 100
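
To make the `minute: 100` setting concrete, here is a sketch of a fixed-window rate limiter. It illustrates the general idea only and is not a description of how Kong implements its plugin; the function names are hypothetical.

```javascript
// Fixed-window limiter: at most `limit` requests per client per `windowMs`.
// `now` is injectable so the behaviour can be checked without waiting.
function createRateLimiter(limit, windowMs, now = () => Date.now()) {
  const windows = new Map() // clientId -> { start, count }
  return function allow(clientId) {
    const t = now()
    const current = windows.get(clientId)
    if (!current || t - current.start >= windowMs) {
      windows.set(clientId, { start: t, count: 1 }) // start a fresh window
      return true
    }
    if (current.count < limit) {
      current.count += 1
      return true
    }
    return false // over the limit for this window
  }
}
```

The equivalent of the config above would be `createRateLimiter(100, 60000)`. Fixed windows allow short bursts at window boundaries; sliding-window and token-bucket variants smooth that out.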

4. Service Discovery

Services need to find each other without hardcoded IPs. In Kubernetes, this is built-in via DNS:

# Kubernetes Service — DNS: user-service.default.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 3001

Services call each other by name:

const user = await fetch('http://user-service/users/123').then(r => r.json())

Outside Kubernetes: use Consul or AWS Cloud Map for service registry.
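
Conceptually, a service registry maps a service name to its live instances. The in-memory sketch below is a toy model of that idea with round-robin selection; it is not the API of Consul or AWS Cloud Map, and the class and method names are hypothetical.

```javascript
// Minimal in-memory registry; real registries add health checks, TTLs, and replication.
class ServiceRegistry {
  constructor() {
    this.instances = new Map() // service name -> array of base URLs
    this.cursor = new Map()    // service name -> next round-robin index
  }

  register(name, url) {
    const list = this.instances.get(name) ?? []
    if (!list.includes(url)) list.push(url)
    this.instances.set(name, list)
  }

  deregister(name, url) {
    const list = (this.instances.get(name) ?? []).filter((u) => u !== url)
    this.instances.set(name, list)
  }

  // Returns the next instance in round-robin order, or null if none are registered.
  resolve(name) {
    const list = this.instances.get(name) ?? []
    if (list.length === 0) return null
    const index = (this.cursor.get(name) ?? 0) % list.length
    this.cursor.set(name, index + 1)
    return list[index]
  }
}
```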


5. The Saga Pattern (Distributed Transactions)

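When an operation spans multiple services, you can’t wrap it in a single database transaction. The Saga pattern replaces that transaction with a sequence of local transactions, one per service, each paired with a compensating action that undoes it if a later step fails. There are two ways to coordinate a saga: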

Choreography (event-driven)

OrderService → OrderCreated event
        ↓
        InventoryService reserves stock → StockReserved event
                                  ↓
                                  PaymentService charges card → PaymentProcessed event
                                                       ↓
                                                       ShippingService creates shipment

If payment fails: PaymentFailed event → InventoryService releases stock → OrderService marks order failed.

// Each service listens and reacts
eventBus.on('PaymentFailed', async ({ orderId }) => {
  await releaseReservedStock(orderId)
  await eventBus.emit('StockReleased', { orderId })
})
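
The failure path described above can be traced end to end with a tiny in-memory event bus. This is a sketch for illustration: `createEventBus`, the `stock` and `orders` stores, and the order ID are all assumptions, and in production a broker plays the role of the bus.

```javascript
// Tiny async event bus for illustration; a message broker does this job in production.
function createEventBus() {
  const handlers = new Map()
  return {
    on(event, handler) {
      handlers.set(event, [...(handlers.get(event) ?? []), handler])
    },
    async emit(event, payload) {
      for (const handler of handlers.get(event) ?? []) await handler(payload)
    },
  }
}

const eventBus = createEventBus()
const stock = { reserved: new Set(['order-1']) } // stands in for InventoryService state
const orders = new Map([['order-1', 'pending']]) // stands in for OrderService state

// InventoryService: undo the reservation, then announce it.
eventBus.on('PaymentFailed', async ({ orderId }) => {
  stock.reserved.delete(orderId)
  await eventBus.emit('StockReleased', { orderId })
})

// OrderService: mark the order failed once the stock is released.
eventBus.on('StockReleased', async ({ orderId }) => {
  orders.set(orderId, 'failed')
})
```

Emitting `PaymentFailed` for `order-1` releases the reservation and leaves the order in the `failed` state, with no service calling another directly.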

Orchestration (central coordinator)

A saga orchestrator directs each step:

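class OrderSaga {
  async execute(order) {
    // Track the step in flight so compensate() knows what has to be undone
    let step = 'stock'
    try {
      await this.reserveStock(order)
      step = 'payment'
      await this.processPayment(order)
      step = 'shipping'
      await this.scheduleShipping(order)
      step = 'complete'
      await this.completeOrder(order)
    } catch (error) {
      await this.compensate(order, step)
      throw error // let the caller know the saga failed
    }
  }

  async compensate(order, failedStep) {
    // Undo only the steps that completed before the failure
    if (failedStep === 'payment') await this.releaseStock(order)
    if (failedStep === 'shipping' || failedStep === 'complete') {
      // A failure in completeOrder would also need the shipment cancelled (not shown)
      await this.refundPayment(order)
      await this.releaseStock(order)
    }
    await this.cancelOrder(order)
  }
}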
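
An alternative way to structure an orchestrator is to pair each step with its own undo action and run the undo actions of completed steps in reverse order. The `runSaga` function and the step shape below are hypothetical, a sketch rather than a reference implementation:

```javascript
// Runs steps in order; on failure, runs the compensations of completed steps in reverse.
// Each step is { name, action, compensate }; these names are illustrative.
async function runSaga(steps, context) {
  const completed = []
  for (const step of steps) {
    try {
      await step.action(context)
      completed.push(step)
    } catch (error) {
      // The failed step itself did not complete, so only earlier steps are undone.
      for (const done of completed.reverse()) {
        await done.compensate(context)
      }
      return { ok: false, failedStep: step.name, error }
    }
  }
  return { ok: true }
}
```

With this shape, adding a step never requires editing a central `compensate` method, because each step carries its own rollback. Compensations can fail too, so a production orchestrator would persist saga state and retry them.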

6. Resilience Patterns

Circuit Breaker

Stop cascading failures when a downstream service is unhealthy:

import CircuitBreaker from 'opossum'

const options = {
  timeout: 3000,           // fail if takes > 3s
  errorThresholdPercentage: 50,  // open circuit if 50% fail
  resetTimeout: 30000,     // try again after 30s
}

const breaker = new CircuitBreaker(callPaymentService, options)

breaker.fallback(() => ({ status: 'payment-pending', retry: true }))

const result = await breaker.fire(paymentRequest)

States: Closed (normal) → Open (failing, use fallback) → Half-Open (testing recovery). From Half-Open, a successful trial call closes the circuit again; a failed one reopens it.
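
The state transitions can be sketched as a small state machine. This hand-rolled version is for illustration only: it counts consecutive failures, whereas opossum works from an error percentage, and the class and option names here are hypothetical.

```javascript
// Illustrative circuit breaker state machine with an injectable clock.
class BreakerState {
  constructor({ failureThreshold = 3, resetTimeout = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold
    this.resetTimeout = resetTimeout
    this.now = now
    this.state = 'closed'
    this.failures = 0
    this.openedAt = 0
  }

  // Should the next call be attempted?
  canRequest() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = 'half-open' // let trial calls through to test recovery
    }
    return this.state !== 'open'
  }

  recordSuccess() {
    this.state = 'closed'
    this.failures = 0
  }

  recordFailure() {
    this.failures += 1
    // A failure during half-open reopens immediately; otherwise wait for the threshold.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAt = this.now()
    }
  }
}
```

A caller would check `canRequest()` before each downstream call, return the fallback when it is `false`, and report the outcome with `recordSuccess()` or `recordFailure()`.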

Retry with exponential backoff

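// Only retry requests that are safe to repeat (idempotent), such as GETs
async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url)
      // fetch resolves on HTTP errors, so treat 5xx responses as retryable failures
      if (response.status >= 500) throw new Error(`Server error: ${response.status}`)
      return response
    } catch (err) {
      if (attempt === maxRetries) throw err
      // 200ms, 400ms, 800ms, ... plus up to 100ms of random jitter
      const delay = Math.pow(2, attempt) * 100 + Math.random() * 100
      await new Promise(r => setTimeout(r, delay))
    }
  }
}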
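
The delay formula in `fetchWithRetry` doubles on every attempt. Pulled out into its own function, the schedule is easy to inspect. The `capMs` upper bound is an added assumption (the original has no cap), and both function names are hypothetical:

```javascript
// Delay before the retry that follows failed attempt `attempt` (1-based), without jitter.
// `capMs` is an added assumption: an upper bound so delays do not grow without limit.
function backoffDelay(attempt, baseMs = 100, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt)
}

// Adds random jitter in [0, baseMs) so that many clients do not retry in lockstep.
function backoffDelayWithJitter(attempt, baseMs = 100, capMs = 5000, random = Math.random) {
  return backoffDelay(attempt, baseMs, capMs) + random() * baseMs
}

const schedule = [1, 2, 3, 4, 5, 6].map((attempt) => backoffDelay(attempt))
// schedule is [200, 400, 800, 1600, 3200, 5000]
```

Without a cap, a larger `maxRetries` would make callers wait a very long time; with it, the delay levels off once the downstream service has had a reasonable chance to recover.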

7. Distributed Tracing

With many services, debugging requires tracing a request across all of them:

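// OpenTelemetry setup (works with Jaeger, Zipkin, Tempo)
// Written against the 1.x JS SDK; newer SDK versions take span processors
// through the provider constructor instead of addSpanProcessor()
import { trace, SpanStatusCode } from '@opentelemetry/api'
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node'
import { SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base'
import { JaegerExporter } from '@opentelemetry/exporter-jaeger'

const provider = new NodeTracerProvider()
provider.addSpanProcessor(
  new SimpleSpanProcessor(new JaegerExporter({ endpoint: 'http://jaeger:14268/api/traces' }))
)
provider.register()

// Create spans
const tracer = trace.getTracer('order-service')

const span = tracer.startSpan('process-order')
span.setAttribute('order.id', orderId)
try {
  await processOrder(orderId)
  span.setStatus({ code: SpanStatusCode.OK })
} catch (err) {
  span.recordException(err)
  span.setStatus({ code: SpanStatusCode.ERROR, message: err.message })
  throw err // record the failure on the span, but don't swallow it
} finally {
  span.end()
}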
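
For a trace to span services, the trace ID has to travel with each request. OpenTelemetry's instrumentation normally handles this for you, commonly via the W3C `traceparent` HTTP header. The simplified sketch below shows what that header contains; the function names are hypothetical and the parser ignores rules for future header versions.

```javascript
// Format: version - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Parses a W3C `traceparent` header value; returns null if it is malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header ?? '')
  if (!match) return null
  const [, version, traceId, parentId, flags] = match
  // All-zero IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 }
}

// Builds the header for an outgoing call: same trace ID, the caller's span as the new parent.
function childTraceparent(parent, currentSpanId) {
  return `00-${parent.traceId}-${currentSpanId}-${parent.sampled ? '01' : '00'}`
}
```

Because every hop keeps the same trace ID and only swaps the parent span ID, a backend such as Jaeger can reassemble the full request tree.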

8. Database Per Service

Each service owns its own database — never share databases between services:

user-service    → PostgreSQL (users DB)
product-service → MongoDB (products DB)
order-service   → PostgreSQL (orders DB)
session-service → Redis (sessions)
search-service  → Elasticsearch (search index)

Benefits: services can use the best database for their needs, schema changes don’t affect other services, independent scaling.

Challenge: cross-service queries require API calls or event-driven data synchronization.
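
Event-driven synchronization usually means each service keeps a small local copy of the foreign data it needs and updates it from events. A sketch for order-service follows, where the event names and fields are assumptions for illustration:

```javascript
// order-service keeps a local, read-only copy of the user fields it needs.
// Event names and fields are illustrative assumptions.
const userReplica = new Map() // userId -> { name, email }

function applyUserEvent(event) {
  switch (event.type) {
    case 'UserRegistered':
    case 'UserUpdated':
      userReplica.set(event.userId, { name: event.name, email: event.email })
      break
    case 'UserDeleted':
      userReplica.delete(event.userId)
      break
    default:
      // Ignore events this service does not care about.
      break
  }
}

// Order queries can now be answered without a call to user-service.
function describeOrder(order) {
  const user = userReplica.get(order.userId)
  return { orderId: order.id, customer: user ? user.name : 'unknown' }
}
```

The trade-off is eventual consistency: the replica can lag behind user-service by however long event delivery takes.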


9. Health Checks and Observability

Every service should expose a liveness endpoint (is the process up?) and a readiness endpoint (can it serve traffic?):

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    version: process.env.APP_VERSION,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  })
})

// Readiness check (dependencies OK?)
app.get('/ready', async (req, res) => {
  try {
    await db.ping()
    res.json({ status: 'ready' })
  } catch {
    res.status(503).json({ status: 'not ready', reason: 'db unreachable' })
  }
})

Use Prometheus + Grafana for metrics, ELK stack or Loki for logs, Jaeger for traces.


Key Takeaways

| Pattern | When to use |
|---|---|
| Synchronous (REST/gRPC) | Reads, user-facing queries |
| Async (events/queues) | Writes, workflows, decoupled processes |
| API Gateway | Single entry point, auth, rate limiting |
| Circuit breaker | Protect against cascading failures |
| Saga | Distributed transactions |
| Distributed tracing | Debug cross-service request flows |
| DB per service | Always; never share a database |

Microservices are a team scaling solution, not a technical one. Before adopting them: make sure your monolith is well-structured, your team is large enough to own services independently, and you have the operational maturity to run distributed systems (monitoring, tracing, deployment pipelines). Done right, microservices enable organizational agility. Done wrong, they’re a distributed monolith with network calls instead of function calls.