Microservices Architecture Guide | Design, Patterns, and Pitfalls
Key Points
Microservices solve specific scaling and team-autonomy problems — but introduce distributed systems complexity. This guide covers architecture patterns, communication strategies, and the honest trade-offs you need to weigh before committing.
Microservices vs Monolith: The Honest Trade-off
Before choosing microservices, understand what you’re trading:
| | Monolith | Microservices |
|---|---|---|
| Development speed | Fast initially | Slower initially |
| Deployment | Simple, one unit | Complex, many units |
| Scaling | Scale everything | Scale independently |
| Team autonomy | Low | High |
| Debugging | Easy (local) | Hard (distributed) |
| Data consistency | Easy (single DB) | Hard (distributed transactions) |
| Operational overhead | Low | High |
Rule of thumb: Start with a monolith. Extract services when you have a specific, validated reason (team bottleneck, scaling requirement, technology mismatch) — not because microservices sound modern.
1. Service Decomposition
Domain-Driven Design (DDD) approach
Split services along bounded contexts — areas of the domain with their own language and models:
E-commerce system
├── user-service ← registration, auth, profiles
├── product-service ← catalog, inventory, pricing
├── order-service ← cart, orders, order history
├── payment-service ← payment processing, refunds
├── notification-service ← email, SMS, push
└── shipping-service ← shipping, tracking, returns
Signs a service boundary is wrong:
- Services constantly need to call each other to complete one operation
- You always deploy multiple services together
- A change in one service always requires a change in another
Size heuristics
- Too small: a service that only wraps a database table with CRUD
- Too large: a service that a team can’t understand in a day
- Right: a service owned by one team, deployable independently
2. Communication Patterns
Synchronous (REST / gRPC)
Client → API Gateway → Order Service → [sync call] → Inventory Service
                                     → [sync call] → Payment Service
Good for: queries, reads, user-facing requests that need immediate response.
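// Order service calling payment service
async function processOrder(order) {
  // Synchronous HTTP call: the order request waits for Payment Service to answer
  const response = await fetch('http://payment-service/charge', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ amount: order.total, userId: order.userId }),
  })
  // fetch() only rejects on network errors, so check the HTTP status explicitly
  if (!response.ok) throw new Error(`Payment service returned ${response.status}`)
  const paymentResult = await response.json()
  if (!paymentResult.success) throw new Error('Payment failed')
  return updateOrderStatus(order.id, 'confirmed')
}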
Problem: if Payment Service is down or slow, Order Service fails or stalls too, and the failure cascades to every service that calls Order Service.
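At minimum, bound how long a synchronous call may take, so a slow Payment Service ties up the caller for a fixed time instead of indefinitely. A minimal sketch, assuming a hypothetical `withTimeout` helper that wraps any promise-returning function:

```javascript
// Hypothetical helper: reject if `operation` takes longer than `ms` milliseconds
function withTimeout(operation, ms) {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
  })
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([operation(), timeout]).finally(() => clearTimeout(timer))
}
```

`Promise.race` only stops *waiting*; the underlying request keeps running. With `fetch`, passing `signal: AbortSignal.timeout(3000)` in the request options cancels the request itself.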
Asynchronous (Events / Message Queue)
Order Service → [event: OrderPlaced] → Kafka/RabbitMQ
                                             ↓
                                       Payment Service (consumes)
                                       Inventory Service (consumes)
                                       Notification Service (consumes)
Good for: writes, workflows, processes that don’t need an immediate response.
// `kafka` is a thin wrapper around a client library (e.g. kafkajs); its publish/subscribe API is illustrative
// Order service publishes an event
await kafka.publish('orders', {
  event: 'OrderPlaced',
  orderId: order.id,
  userId: order.userId,
  items: order.items,
  total: order.total,
})
// Resolves once the broker accepts the event; doesn't wait for payment/inventory to process it
// Payment service subscribes
kafka.subscribe('orders', async (message) => {
  if (message.event === 'OrderPlaced') {
    const result = await chargeCustomer(message.userId, message.total)
    await kafka.publish('payments', {
      event: result.success ? 'PaymentSucceeded' : 'PaymentFailed',
      orderId: message.orderId,
    })
  }
})
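The fan-out in the diagram can be illustrated without a broker. This `InMemoryBus` is a hypothetical stand-in with no persistence, ordering, or redelivery; it only shows that the publisher returns before any consumer runs, and that every subscribed service gets its own copy of the event.

```javascript
// Minimal in-memory stand-in for a broker topic (illustration only)
class InMemoryBus {
  constructor() {
    this.subscribers = new Map() // topic -> array of handlers
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, [])
    this.subscribers.get(topic).push(handler)
  }
  publish(topic, message) {
    for (const handler of this.subscribers.get(topic) ?? []) {
      // Deliver on a later tick so publish() returns before any consumer runs
      setTimeout(() => {
        Promise.resolve()
          .then(() => handler(message))
          .catch(err => console.error(`consumer failed on ${topic}:`, err))
      }, 0)
    }
  }
}

const bus = new InMemoryBus()
for (const consumer of ['payment', 'inventory', 'notification']) {
  bus.subscribe('orders', msg => console.log(`${consumer}-service got ${msg.event} for ${msg.orderId}`))
}
bus.publish('orders', { event: 'OrderPlaced', orderId: 'o-1' })
console.log('order-service: publish returned')
```

With a real broker, the per-service copy comes from configuration: in Kafka, each service reads the topic under its own consumer group; in RabbitMQ, each service binds its own queue to the exchange.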
3. API Gateway
The gateway is the single entry point for clients. It handles cross-cutting concerns so individual services don't have to:
Client
↓
API Gateway
├── Auth (JWT validation)
├── Rate limiting
├── SSL termination
├── Request routing
├── Load balancing
└── Request/response transformation
↓
├── /api/users/* → user-service
├── /api/products/* → product-service
└── /api/orders/* → order-service
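Request routing boils down to a prefix lookup. A sketch of the routing table in the diagram, assuming the upstream hostnames `user-service`, `product-service`, and `order-service`, and a request path without its query string:

```javascript
// Route table for the diagram (upstream URLs are assumptions)
const routes = [
  { prefix: '/api/users', upstream: 'http://user-service' },
  { prefix: '/api/products', upstream: 'http://product-service' },
  { prefix: '/api/orders', upstream: 'http://order-service' },
]

// Returns the upstream URL for a request path, or null if no route matches
function resolveUpstream(path) {
  const route = routes.find(r => path === r.prefix || path.startsWith(`${r.prefix}/`))
  return route ? new URL(path, route.upstream).href : null
}
```

Whether the gateway forwards the `/api/users` prefix or strips it before proxying is configurable in most gateways (Kong routes, for instance, have a `strip_path` setting), so check what your services expect.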
Popular options: Kong, Nginx, AWS API Gateway, Traefik, Envoy.
# Kong declarative configuration (kong.yml)
_format_version: "3.0"
services:
  - name: user-service
    url: http://user-service:3001
    routes:
      - name: users-route
        paths: ["/api/users"]
        methods: ["GET", "POST", "PUT", "DELETE"]
    plugins:
      - name: jwt  # Auth
      - name: rate-limiting
        config:
          minute: 100
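To make `minute: 100` concrete: one common way to enforce such a limit is a fixed-window counter per client. The class below is a hypothetical sketch of that technique, not necessarily how Kong implements it; `clientId` would typically be an API key, consumer, or IP address.

```javascript
// Fixed-window rate limiter: at most `limit` requests per client per window
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit
    this.windowMs = windowMs
    this.counters = new Map() // clientId -> { windowStart, count }
  }
  allow(clientId, now = Date.now()) {
    // Align windows to multiples of windowMs (e.g. whole minutes)
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs
    const entry = this.counters.get(clientId)
    if (!entry || entry.windowStart !== windowStart) {
      this.counters.set(clientId, { windowStart, count: 1 })
      return true
    }
    if (entry.count >= this.limit) return false // caller would answer 429
    entry.count++
    return true
  }
}
```

Fixed windows are cheap but let a client send up to twice the limit across a window boundary (100 requests at 0:59 and 100 more at 1:00); sliding-window or token-bucket limiters smooth that out. This sketch also never evicts old counters, and a gateway running several instances would need a shared store for them.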
4. Service Discovery
Services need to find each other without hardcoded IPs. In Kubernetes, this is built-in via DNS:
# Kubernetes Service (DNS name: user-service.default.svc.cluster.local)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
    - port: 80          # port callers use
      targetPort: 3001  # port the container listens on
Services call each other by name:
const user = await fetch('http://user-service/users/123').then(r => r.json())
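Outside Kubernetes, use a service registry such as Consul or AWS Cloud Map: instances are registered under a service name, and callers look up current addresses by that name.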
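Conceptually, a registry maps a service name to the addresses of its live instances, and callers pick one per request. A hypothetical in-memory sketch with round-robin selection; this is not Consul's or Cloud Map's API, and both of those also track instance health and expire stale registrations.

```javascript
// Hypothetical in-memory service registry with round-robin resolution
class ServiceRegistry {
  constructor() {
    this.instances = new Map() // service name -> array of "host:port"
    this.next = new Map()      // service name -> round-robin cursor
  }
  register(name, address) {
    const list = this.instances.get(name) ?? []
    if (!list.includes(address)) list.push(address)
    this.instances.set(name, list)
  }
  deregister(name, address) {
    const list = (this.instances.get(name) ?? []).filter(a => a !== address)
    this.instances.set(name, list)
  }
  // Return the next registered instance in round-robin order
  resolve(name) {
    const list = this.instances.get(name) ?? []
    if (list.length === 0) throw new Error(`no instances registered for ${name}`)
    const i = (this.next.get(name) ?? 0) % list.length
    this.next.set(name, i + 1)
    return list[i]
  }
}
```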
5. The Saga Pattern (Distributed Transactions)
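When an operation spans multiple services, you can't wrap it in a single database transaction. Use the Saga pattern instead: a sequence of local transactions, one per service, where each step has a compensating action that undoes it if a later step fails.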
Choreography (event-driven)
OrderService → OrderCreated event
↓
InventoryService reserves stock → StockReserved event
↓
PaymentService charges card → PaymentProcessed event
↓
ShippingService creates shipment
If payment fails: PaymentFailed event → InventoryService releases stock → OrderService marks order failed.
// Each service listens and reacts; here InventoryService compensates for a failed payment
eventBus.on('PaymentFailed', async ({ orderId }) => {
  await releaseReservedStock(orderId)
  await eventBus.emit('StockReleased', { orderId })
})
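The full choreography, including the compensation path, can be simulated in a few lines. The sketch below assumes a hypothetical synchronous, in-process bus so the whole flow completes in one call; the event names follow the diagram, and `paymentSucceeds` stands in for the card outcome.

```javascript
// Tiny synchronous event bus so the whole flow runs in one call (illustration only)
function createBus() {
  const handlers = new Map()
  return {
    on(event, fn) {
      if (!handlers.has(event)) handlers.set(event, [])
      handlers.get(event).push(fn)
    },
    emit(event, payload) {
      for (const fn of handlers.get(event) ?? []) fn(payload)
    },
  }
}

// Wires up the services from the diagram; `paymentSucceeds` simulates the card outcome
function runChoreography({ paymentSucceeds }) {
  const bus = createBus()
  const log = []
  const reservedStock = new Set()

  // InventoryService: reserve on OrderCreated, release on PaymentFailed
  bus.on('OrderCreated', ({ orderId }) => {
    reservedStock.add(orderId)
    log.push('stock reserved')
    bus.emit('StockReserved', { orderId })
  })
  bus.on('PaymentFailed', ({ orderId }) => {
    reservedStock.delete(orderId)
    log.push('stock released')
    bus.emit('StockReleased', { orderId })
  })

  // PaymentService: charge once stock is reserved
  bus.on('StockReserved', ({ orderId }) => {
    if (paymentSucceeds) {
      log.push('card charged')
      bus.emit('PaymentProcessed', { orderId })
    } else {
      log.push('payment declined')
      bus.emit('PaymentFailed', { orderId })
    }
  })

  // ShippingService: ship paid orders
  bus.on('PaymentProcessed', () => log.push('shipment created'))

  // OrderService: mark the order failed once compensation finishes
  bus.on('StockReleased', () => log.push('order marked failed'))

  bus.emit('OrderCreated', { orderId: 'order-1' })
  return { log, stockHeld: reservedStock.has('order-1') }
}
```

No service here knows the whole workflow; each only reacts to the events it subscribes to. That keeps services decoupled, but it also means the end-to-end flow is visible only by reading every service's handlers.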
Orchestration (central coordinator)
A saga orchestrator directs each step:
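class OrderSaga {
  async execute(order) {
    // Track the step in progress so we know what to compensate if it fails
    let step = 'stock'
    try {
      await this.reserveStock(order)
      step = 'payment'
      await this.processPayment(order)
      step = 'shipping'
      await this.scheduleShipping(order)
      step = 'complete'
      await this.completeOrder(order)
    } catch (error) {
      await this.compensate(order, step)
      throw error // let the caller know the saga failed
    }
  }
  // Undo the steps that succeeded before `failedStep`, most recent first
  async compensate(order, failedStep) {
    if (failedStep === 'payment') await this.releaseStock(order)
    if (failedStep === 'shipping') {
      await this.refundPayment(order)
      await this.releaseStock(order)
    }
    // (a failure in completeOrder would also need the shipment cancelled; omitted for brevity)
    await this.cancelOrder(order)
  }
}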
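A more general form of this orchestrator is a list of steps, each pairing an action with its compensation, where a failure undoes the completed steps in reverse order. A minimal sketch, assuming steps shaped like the hypothetical `{ name, action, compensate }` objects below (in `OrderSaga` terms: `reserveStock`/`releaseStock`, `processPayment`/`refundPayment`):

```javascript
// Run saga steps in order; on failure, compensate completed steps in reverse
async function runSaga(steps, ctx) {
  const completed = []
  for (const step of steps) {
    try {
      await step.action(ctx)
    } catch (err) {
      // Undo the steps that already succeeded, most recent first
      for (const done of completed.reverse()) {
        if (done.compensate) await done.compensate(ctx)
      }
      return { ok: false, failedStep: step.name, error: err }
    }
    completed.push(step)
  }
  return { ok: true }
}
```

A production orchestrator would also persist saga state so it can resume after a crash, and retry compensations that themselves fail; this sketch does neither.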
6. Resilience Patterns
Circuit Breaker
Stop cascading failures when a downstream service is unhealthy:
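import CircuitBreaker from 'opossum'
const options = {
  timeout: 3000,                // count a call as failed if it takes longer than 3 s
  errorThresholdPercentage: 50, // open the circuit once 50% of recent calls fail
  resetTimeout: 30000,          // after 30 s open, go half-open and allow a trial call
}
// callPaymentService is the async function that makes the actual HTTP request
const breaker = new CircuitBreaker(callPaymentService, options)
// Returned instead of an error when the call fails or the circuit is open
breaker.fallback(() => ({ status: 'payment-pending', retry: true }))
const result = await breaker.fire(paymentRequest)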
States: Closed (normal) → Open (failing, use fallback) → Half-Open (testing recovery). A successful trial call closes the circuit again; a failed one reopens it.
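The state machine behind those three states is small. Here is a simplified, hypothetical `SimpleCircuitBreaker`: it trips after a fixed number of consecutive failures (rather than opossum's percentage over a rolling window) and takes an injectable clock so the behavior is easy to test.

```javascript
// Simplified circuit breaker: trips after `failureThreshold` consecutive failures
class SimpleCircuitBreaker {
  constructor(action, { failureThreshold = 5, resetTimeoutMs = 30000, now = () => Date.now() } = {}) {
    this.action = action
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.now = now // injectable clock
    this.state = 'CLOSED'
    this.failures = 0
    this.openedAt = 0
  }

  async fire(...args) {
    if (this.state === 'OPEN') {
      // Fail fast until the reset timeout has elapsed
      if (this.now() - this.openedAt < this.resetTimeoutMs) throw new Error('circuit open')
      this.state = 'HALF_OPEN' // let a trial call through
    }
    try {
      const result = await this.action(...args)
      // Success (including a successful trial) closes the circuit
      this.state = 'CLOSED'
      this.failures = 0
      return result
    } catch (err) {
      this.failures++
      // A failed trial, or too many consecutive failures, opens the circuit
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN'
        this.openedAt = this.now()
      }
      throw err
    }
  }
}
```

While half-open, this sketch does not limit how many concurrent calls go through as trials; a production breaker would typically allow only a small number.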
Retry with exponential backoff
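async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetch(url)
      // fetch() resolves on HTTP errors, so treat 5xx responses as retryable failures
      if (res.status >= 500) throw new Error(`HTTP ${res.status}`)
      return res
    } catch (err) {
      if (attempt === maxRetries) throw err
      // Exponential backoff: 200 ms, 400 ms, ... plus up to 100 ms of random jitter
      const delay = Math.pow(2, attempt) * 100 + Math.random() * 100
      await new Promise(r => setTimeout(r, delay))
    }
  }
}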
7. Distributed Tracing
With many services, debugging requires tracing a request across all of them:
// OpenTelemetry setup: spans are exported over OTLP, which current Jaeger releases and Grafana Tempo accept
import { trace, SpanStatusCode } from '@opentelemetry/api'
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node'
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
const provider = new NodeTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' })),
  ],
})
provider.register()
// Create a span around one unit of work
async function tracedProcessOrder(orderId) {
  const tracer = trace.getTracer('order-service')
  const span = tracer.startSpan('process-order')
  span.setAttribute('order.id', orderId)
  try {
    await processOrder(orderId)
    span.setStatus({ code: SpanStatusCode.OK })
  } catch (err) {
    span.recordException(err)
    span.setStatus({ code: SpanStatusCode.ERROR, message: err.message })
    throw err
  } finally {
    span.end()
  }
}
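Tracing only works across services if each outgoing request carries the trace context. With OpenTelemetry this is normally the W3C Trace Context `traceparent` header (`version-traceId-parentId-flags`), which instrumentation libraries inject and extract for you. A hand-written sketch of the format, for illustration only; it accepts only version `00` and uses `Math.random` for the IDs.

```javascript
// Random lowercase hex string of `bytes` bytes
function randomHex(bytes) {
  let out = ''
  for (let i = 0; i < bytes; i++) out += Math.floor(Math.random() * 256).toString(16).padStart(2, '0')
  return out
}

// Parse "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>", or return null if invalid
function parseTraceparent(header) {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header ?? '')
  // All-zero trace or parent IDs are invalid per the spec
  if (!m || /^0+$/.test(m[1]) || /^0+$/.test(m[2])) return null
  return { traceId: m[1], parentSpanId: m[2], flags: m[3] }
}

// Header to send downstream: keep the incoming trace-id (or start a new trace),
// and use a fresh span ID for this service's span as the new parent-id
function outgoingTraceparent(incoming) {
  const parent = parseTraceparent(incoming)
  const traceId = parent ? parent.traceId : randomHex(16)
  const flags = parent ? parent.flags : '01' // 01 = sampled
  return `00-${traceId}-${randomHex(8)}-${flags}`
}
```

The trace ID stays the same on every hop, while the parent ID changes to the calling span's ID; that shared trace ID is what lets Jaeger stitch spans from different services into one trace.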
8. Database Per Service
Each service owns its own database — never share databases between services:
user-service → PostgreSQL (users DB)
product-service → MongoDB (products DB)
order-service → PostgreSQL (orders DB)
session-service → Redis (sessions)
search-service → Elasticsearch (search index)
Benefits: services can use the best database for their needs, schema changes don’t affect other services, independent scaling.
Challenge: cross-service queries require API calls or event-driven data synchronization.
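Event-driven synchronization usually means keeping a local, read-only copy of the data you need. A sketch, assuming a hypothetical `ProductUpdated` event that carries `productId`, `name`, `price`, and a per-product `version`: order-service applies these events to a local projection and can render order history without calling product-service.

```javascript
// Local read model of product data inside order-service (event shape is hypothetical)
class ProductProjection {
  constructor() {
    this.products = new Map() // productId -> { name, price, version }
  }
  apply(event) {
    if (event.type !== 'ProductUpdated') return
    const current = this.products.get(event.productId)
    // Ignore events older than what we already have (they can arrive out of order)
    if (current && current.version >= event.version) return
    this.products.set(event.productId, { name: event.name, price: event.price, version: event.version })
  }
  get(productId) {
    return this.products.get(productId) ?? null
  }
}

// Join order rows with locally cached product names; no call to product-service
function orderHistoryView(orders, projection) {
  return orders.map(order => ({
    orderId: order.id,
    productName: projection.get(order.productId)?.name ?? '(unknown product)',
  }))
}
```

The copy is eventually consistent: right after a product is renamed, order history may briefly show the old name. The version check keeps late, out-of-order events from overwriting newer data.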
9. Health Checks and Observability
Every service should expose:
// Liveness check: is the process up and able to respond?
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    version: process.env.APP_VERSION,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  })
})
// Readiness check: are the dependencies this service needs reachable?
app.get('/ready', async (req, res) => {
  try {
    await db.ping()
    res.json({ status: 'ready' })
  } catch {
    res.status(503).json({ status: 'not ready', reason: 'db unreachable' })
  }
})
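With more than one dependency, the readiness check should test them in parallel and give each a deadline, so one hung dependency can't hang the probe. A sketch of a hypothetical `checkReadiness` helper, where each check is any function returning a promise:

```javascript
// Run every dependency check in parallel, each with its own deadline
async function checkReadiness(checks, timeoutMs = 1000) {
  const results = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let timer
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs)
      })
      try {
        await Promise.race([check(), timeout])
        return { name, ok: true }
      } catch (err) {
        return { name, ok: false, reason: err.message }
      } finally {
        clearTimeout(timer)
      }
    }),
  )
  const failing = results.filter(r => !r.ok)
  return failing.length === 0
    ? { status: 'ready' }
    : { status: 'not ready', failing: failing.map(f => `${f.name}: ${f.reason}`) }
}
```

In the `/ready` handler it might be used as `const r = await checkReadiness({ db: () => db.ping() })` followed by `res.status(r.status === 'ready' ? 200 : 503).json(r)`.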
Use Prometheus + Grafana for metrics, ELK stack or Loki for logs, Jaeger for traces.
Key Takeaways
| Pattern | When to use |
|---|---|
| Synchronous (REST/gRPC) | Reads, user-facing queries |
| Async (events/queues) | Writes, workflows, decoupled processes |
| API Gateway | Single entry point, auth, rate limiting |
| Circuit breaker | Protect against cascading failures |
| Saga | Distributed transactions |
| Distributed tracing | Debug cross-service request flows |
| DB per service | Always — never share a database |
Microservices are a team scaling solution, not a technical one. Before adopting them: make sure your monolith is well-structured, your team is large enough to own services independently, and you have the operational maturity to run distributed systems (monitoring, tracing, deployment pipelines). Done right, microservices enable organizational agility. Done wrong, they’re a distributed monolith with network calls instead of function calls.