Backend | 2024-11-03 | 20 min read

Microservices Architecture: Hard Lessons from Production

Real-world lessons from building and scaling microservices. Learn what works, what doesn't, and how to avoid common pitfalls.

Manoj Dhiman

Everyone wants to build microservices. The promise is compelling: independent deployments, technology flexibility, and infinite scalability. The reality? It's complicated. After five years of building and maintaining microservices in production, here are the hard lessons I've learned.

The Monolith vs Microservices Debate

Controversial Opinion: Most companies should start with a monolith.

Microservices solve scaling problems you probably don't have yet. They introduce complexity that can kill early-stage companies. Start with a well-structured monolith and extract services when you have:

  • Clear domain boundaries
  • Team scaling problems
  • Specific scaling requirements
  • Resources to handle operational complexity

When Microservices Make Sense

  1. Different scaling requirements: Image processing needs different resources than user management
  2. Team organization: Multiple teams working on different domains
  3. Technology diversity: Some services benefit from specific technologies
  4. Independent deployments: Need to deploy parts of the system independently

Architecture Patterns

1. API Gateway Pattern

Don't expose microservices directly. Use an API gateway:

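// API Gateway
app.get('/api/users/:id', async (req, res) => {
  try {
    // Aggregate data from multiple services in parallel
    const [user, orders, preferences] = await Promise.all([
      userService.getUser(req.params.id),
      orderService.getUserOrders(req.params.id),
      preferencesService.getUserPreferences(req.params.id),
    ])

    res.json({ user, orders, preferences })
  } catch (error) {
    // Promise.all rejects as soon as any one call fails, so report
    // the upstream failure instead of leaving the request hanging
    res.status(502).json({ error: 'Upstream service unavailable' })
  }
})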

Benefits:

  • Single entry point
  • Request aggregation
  • Authentication/authorization
  • Rate limiting
  • Caching
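
One caveat with request aggregation: Promise.all fails the whole response when a single downstream call fails. The sketch below shows one way a gateway could degrade gracefully instead. The names aggregate and Fetcher are hypothetical, and each fetcher stands in for a real service client.

```typescript
// Hypothetical fetcher type: each one stands in for a call to a downstream service.
type Fetcher = (userId: string) => Promise<unknown>

interface AggregateResult {
  data: Record<string, unknown>
  failed: string[]
}

// Calls every fetcher in parallel and keeps whatever succeeded.
async function aggregate(
  userId: string,
  fetchers: Record<string, Fetcher>,
): Promise<AggregateResult> {
  const outcomes = await Promise.all(
    Object.entries(fetchers).map(async ([name, fetchOne]) => {
      try {
        return { name, ok: true, value: await fetchOne(userId) }
      } catch {
        return { name, ok: false, value: undefined }
      }
    }),
  )

  const result: AggregateResult = { data: {}, failed: [] }
  for (const outcome of outcomes) {
    if (outcome.ok) {
      result.data[outcome.name] = outcome.value
    } else {
      // Record the failure so the client knows the response is partial
      result.failed.push(outcome.name)
    }
  }
  return result
}
```

Whether a partial response is acceptable depends on the endpoint: a missing preferences block is usually fine, a missing user probably is not.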

2. Event-Driven Communication

Use events for cross-service communication:

// Order Service
await orderRepository.createOrder(order)
await eventBus.publish('order.created', {
  orderId: order.id,
  userId: order.userId,
  total: order.total,
})

// Email Service (separate service)
eventBus.subscribe('order.created', async (event) => {
  await sendOrderConfirmation(event.userId, event.orderId)
})

// Analytics Service (separate service)
eventBus.subscribe('order.created', async (event) => {
  await trackOrderMetrics(event)
})

Why Events?

  • Loose coupling
  • Easy to add new consumers
  • Natural audit log
  • Supports eventual consistency
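
To make the publish/subscribe contract concrete, here is a minimal in-process sketch. InMemoryEventBus is a hypothetical stand-in for a real broker client, and it skips everything a real broker adds (persistence, retries, ordering). It only shows that publishers don't know their consumers and that one failing consumer doesn't block the rest.

```typescript
type Handler<T> = (event: T) => Promise<void> | void

// Hypothetical in-process stand-in for a message broker client.
class InMemoryEventBus {
  private handlers = new Map<string, Handler<unknown>[]>()

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? []
    list.push(handler as Handler<unknown>)
    this.handlers.set(topic, list)
  }

  // Delivers the event to every subscriber of the topic and
  // returns how many handlers completed without throwing.
  async publish<T>(topic: string, event: T): Promise<number> {
    const list = this.handlers.get(topic) ?? []
    const results = await Promise.all(
      list.map(async (handler) => {
        try {
          await handler(event)
          return true
        } catch {
          return false
        }
      }),
    )
    return results.filter(Boolean).length
  }
}
```

Adding a new consumer is one more subscribe call; the publishing service does not change.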

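3. Saga Pattern for Distributed Transactions

A saga splits a distributed transaction into local steps, each paired with a compensating action that undoes it if a later step fails: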

class OrderSaga {
  async execute(order: Order) {
    // Declared outside the try block so the catch block can see
    // which steps completed before the failure
    let reservation
    let payment

    try {
      // Step 1: Reserve inventory
      reservation = await inventoryService.reserve(order.items)

      // Step 2: Process payment
      payment = await paymentService.charge(order.total)

      // Step 3: Create order
      const confirmedOrder = await orderService.create(order)

      return confirmedOrder
    } catch (error) {
      // Compensating transactions, in reverse order of the steps
      if (payment) await paymentService.refund(payment.id)
      if (reservation) await inventoryService.release(reservation.id)
      throw error
    }
  }
}
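
The same idea generalizes beyond orders. Below is a minimal sketch of a reusable saga runner; SagaStep and runSaga are hypothetical names, not a library API. It tracks which steps completed and compensates only those, newest first.

```typescript
interface SagaStep<C> {
  name: string
  run: (ctx: C) => Promise<void>
  compensate: (ctx: C) => Promise<void>
}

// Runs steps in order. If one fails, undoes the completed steps
// in reverse order and rethrows the original error.
async function runSaga<C>(ctx: C, steps: SagaStep<C>[]): Promise<void> {
  const completed: SagaStep<C>[] = []
  try {
    for (const step of steps) {
      await step.run(ctx)
      completed.push(step)
    }
  } catch (error) {
    for (const step of completed.reverse()) {
      await step.compensate(ctx)
    }
    throw error
  }
}
```

This sketch assumes compensations themselves succeed. In production a failed compensation needs its own retry or manual-intervention path.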

Hard Lessons Learned

Lesson 1: Distributed Systems Are Hard

Problem: Network calls fail. Services go down. Data gets inconsistent.

Solution:

  • Implement circuit breakers
  • Use timeouts and retries
  • Accept eventual consistency
  • Build for failure

// Circuit Breaker Implementation
class CircuitBreaker {
  private failures = 0
  private lastFailure?: Date
  private state: 'closed' | 'open' | 'half-open' = 'closed'
  
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure!.getTime() > 60000) {
        this.state = 'half-open'
      } else {
        throw new Error('Circuit breaker is open')
      }
    }
    
    try {
      const result = await fn()
      this.onSuccess()
      return result
    } catch (error) {
      this.onFailure()
      throw error
    }
  }
  
  private onSuccess() {
    this.failures = 0
    this.state = 'closed'
  }
  
  private onFailure() {
    this.failures++
    this.lastFailure = new Date()
    if (this.failures >= 5) {
      this.state = 'open'
    }
  }
}
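
Timeouts and retries from the list above can be sketched the same way. The helper names (withTimeout, retry) and the default values are assumptions for illustration. Note that the timeout only stops waiting; it does not cancel the underlying call.

```typescript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
    promise.then(
      (value) => {
        clearTimeout(timer)
        resolve(value)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      },
    )
  })
}

// Retries `fn` with exponential backoff: baseDelayMs, 2x, 4x, ...
async function retry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 100, timeoutMs = 1000 } = {},
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs)
    } catch (error) {
      lastError = error
      if (attempt < attempts - 1) {
        const delay = baseDelayMs * 2 ** attempt
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  throw lastError
}
```

Only retry operations that are safe to repeat. Retrying a non-idempotent call such as a payment charge can do more damage than the original failure.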

Lesson 2: Observability is Critical

Problem: Debugging across 20 services is impossible without proper tooling.

Solution: Invest heavily in observability:

  1. Distributed Tracing: See requests flow across services
  2. Centralized Logging: Aggregate logs from all services
  3. Metrics & Dashboards: Monitor service health
  4. Alerting: Know when things break

// Distributed Tracing with OpenTelemetry
import { trace, SpanStatusCode } from '@opentelemetry/api'

async function processOrder(orderId: string) {
  const tracer = trace.getTracer('order-service')

  // startActiveSpan makes this span the current one, so spans started
  // inside the calls below are parented to it
  return tracer.startActiveSpan('process-order', async (span) => {
    try {
      span.setAttribute('order.id', orderId)

      await inventoryService.checkStock(orderId)
      await paymentService.process(orderId)

      span.setStatus({ code: SpanStatusCode.OK })
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      span.setStatus({ code: SpanStatusCode.ERROR, message })
      throw error
    } finally {
      span.end()
    }
  })
}
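
Centralized logging only pays off when log lines from different services can be joined. A minimal sketch, assuming each request carries a correlation ID that every service copies into its logs; createLogger and the field names are hypothetical, not a specific logging library.

```typescript
interface LogEntry {
  timestamp: string
  level: 'info' | 'error'
  service: string
  correlationId: string
  message: string
  [key: string]: unknown
}

// Emits one JSON object per line so a log aggregator can index the fields.
function createLogger(
  service: string,
  correlationId: string,
  sink: (line: string) => void = console.log,
) {
  const write = (
    level: LogEntry['level'],
    message: string,
    fields: Record<string, unknown> = {},
  ) => {
    // Core fields come last so extra fields cannot overwrite them
    const entry: LogEntry = {
      ...fields,
      timestamp: new Date().toISOString(),
      level,
      service,
      correlationId,
      message,
    }
    sink(JSON.stringify(entry))
  }

  return {
    info: (message: string, fields?: Record<string, unknown>) => write('info', message, fields),
    error: (message: string, fields?: Record<string, unknown>) => write('error', message, fields),
  }
}
```

With this in place, searching the aggregated logs for one correlationId returns the full path of a request across services.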

Lesson 3: Service Boundaries Are Hard

Problem: Wrong boundaries lead to tight coupling and distributed monoliths.

Solution: Follow Domain-Driven Design:

  • Services should own complete domains
  • Minimize cross-service communication
  • Duplicate data when necessary
  • Accept some denormalization

Bad:

UserService -> OrderService -> PaymentService -> InventoryService
(Cascade of synchronous calls)

Good:

OrderService has:
  - Own user data (email, name)
  - Own inventory data (reserved items)
  - Publishes events
  - Eventually consistent

Lesson 4: Database Per Service is Complex

Problem: Each service needs its own database, but you still need to join data that now lives in different services.

Solution:

  1. Use API composition at gateway
  2. Replicate data via events
  3. Use CQRS for complex queries
  4. Accept data duplication

// CQRS: Separate read and write models
// Write Model (Order Service)
class OrderService {
  async createOrder(order: Order) {
    await this.db.orders.create(order)
    await this.eventBus.publish('order.created', order)
  }
}

// Read Model (Query Service)
class OrderQueryService {
  private denormalizedOrders = new Map<string, EnrichedOrder>()

  constructor() {
    // Subscribe to events from multiple services.
    // Arrow functions keep `this` bound to the query service;
    // passing this.handleOrderCreated directly would lose it.
    eventBus.subscribe('order.created', (e) => this.handleOrderCreated(e))
    eventBus.subscribe('user.updated', (e) => this.handleUserUpdated(e))
    eventBus.subscribe('product.updated', (e) => this.handleProductUpdated(e))
  }

  // Maintain denormalized view
  async handleOrderCreated(event: OrderCreatedEvent) {
    const user = await this.userCache.get(event.userId)
    const products = await this.productCache.getMany(event.productIds)

    this.denormalizedOrders.set(event.orderId, {
      ...event,
      userName: user.name,
      productNames: products.map(p => p.name),
    })
  }

  // handleUserUpdated and handleProductUpdated are omitted here

  // Fast queries from denormalized data
  async getOrderWithDetails(orderId: string) {
    return this.denormalizedOrders.get(orderId)
  }
}
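
The read model above subscribes to user.updated, but the handler isn't shown. Here is a minimal sketch of what such a handler could look like. OrderReadModel, the event shape, and the secondary index are assumptions for illustration; the point is that denormalized copies must be refreshed when the source data changes.

```typescript
interface EnrichedOrder {
  orderId: string
  userId: string
  userName: string
}

interface UserUpdatedEvent {
  userId: string
  name: string
}

class OrderReadModel {
  private orders = new Map<string, EnrichedOrder>()
  // Secondary index: which orders hold a copy of which user's data
  private orderIdsByUser = new Map<string, Set<string>>()

  addOrder(order: EnrichedOrder): void {
    this.orders.set(order.orderId, order)
    const ids = this.orderIdsByUser.get(order.userId) ?? new Set<string>()
    ids.add(order.orderId)
    this.orderIdsByUser.set(order.userId, ids)
  }

  // Rewrites the duplicated user name on every affected order.
  handleUserUpdated(event: UserUpdatedEvent): void {
    this.orderIdsByUser.get(event.userId)?.forEach((orderId) => {
      const order = this.orders.get(orderId)
      if (order) {
        this.orders.set(orderId, { ...order, userName: event.name })
      }
    })
  }

  get(orderId: string): EnrichedOrder | undefined {
    return this.orders.get(orderId)
  }
}
```

Until the event arrives, queries return the old name. That window is the eventual consistency you accept in exchange for fast reads.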

Lesson 5: Versioning is Essential

Problem: You can't update all services simultaneously, so APIs and events have to stay backward compatible while old and new versions run side by side.

Solution: API versioning strategy:

// URL Versioning
app.get('/api/v1/users/:id', v1.getUser)
app.get('/api/v2/users/:id', v2.getUser)

// Header Versioning
app.get('/api/users/:id', (req, res) => {
  const version = req.headers['api-version'] || 'v1'
  
  if (version === 'v2') {
    return v2.getUser(req, res)
  }
  return v1.getUser(req, res)
})

// Event Versioning
interface OrderCreatedV1 {
  version: 1
  orderId: string
  userId: string
}

interface OrderCreatedV2 {
  version: 2
  orderId: string
  userId: string
  metadata: Record<string, unknown>
}

type OrderCreated = OrderCreatedV1 | OrderCreatedV2

eventBus.subscribe('order.created', (event: OrderCreated) => {
  if (event.version === 1) {
    // Handle V1
  } else {
    // Handle V2
  }
})
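
Branching on version inside every handler gets unwieldy as versions accumulate. One common alternative is to upcast old events to the newest shape at the edge, so handlers only deal with one type. A minimal sketch using the interfaces above; the upcast function and the empty-metadata default are assumptions.

```typescript
interface OrderCreatedV1 {
  version: 1
  orderId: string
  userId: string
}

interface OrderCreatedV2 {
  version: 2
  orderId: string
  userId: string
  metadata: Record<string, unknown>
}

type OrderCreated = OrderCreatedV1 | OrderCreatedV2

// Converts any known version to the latest one.
function upcast(event: OrderCreated): OrderCreatedV2 {
  if (event.version === 2) {
    return event
  }
  // V1 had no metadata, so default to an empty object
  return { ...event, version: 2, metadata: {} }
}

// Handlers then only need to understand V2
function handleOrderCreated(raw: OrderCreated): string {
  const event = upcast(raw)
  return `${event.orderId}:${Object.keys(event.metadata).length}`
}
```

When a V3 arrives, only upcast changes; the handlers stay as they are.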

Operational Complexity

Deployment Strategy

Use blue-green or canary deployments:

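# Kubernetes Canary Deployment
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service  # matches pods from both Deployments
  ports:
    - port: 80

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v1
spec:
  replicas: 9
  selector:
    matchLabels:
      app: user-service
      version: v1
  template:
    metadata:
      labels:
        app: user-service
        version: v1
    spec:
      containers:
        - name: user-service
          image: user-service:v1  # placeholder image name

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service-v2
spec:
  replicas: 1  # roughly 10% of traffic (1 of 10 pods)
  selector:
    matchLabels:
      app: user-service
      version: v2
  template:
    metadata:
      labels:
        app: user-service
        version: v2
    spec:
      containers:
        - name: user-service
          image: user-service:v2  # placeholder image name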
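
The traffic split in a replica-based canary is just arithmetic on pod counts. A small sketch, assuming the Service spreads requests evenly across all ready pods, which is only approximately true in practice; canaryShare is a hypothetical helper.

```typescript
// Fraction of traffic the canary receives when a Service
// balances evenly across stable and canary pods.
function canaryShare(stableReplicas: number, canaryReplicas: number): number {
  const total = stableReplicas + canaryReplicas
  if (total === 0) {
    throw new Error('At least one replica is required')
  }
  return canaryReplicas / total
}

console.log(canaryShare(9, 1)) // 0.1
console.log(canaryShare(3, 1)) // 0.25
```

The granularity is limited by the replica count: with 10 pods in total you can only move in 10% steps. Finer splits need weighted routing at an ingress or service mesh.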

Health Checks

app.get('/health', async (req, res) => {
  const checks = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkDependentServices(),
  ])
  
  const healthy = checks.every(c => c.healthy)
  
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'healthy' : 'unhealthy',
    checks,
    timestamp: new Date(),
  })
})
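
A health endpoint should answer quickly even when a dependency hangs, and a check that throws should mark the service unhealthy rather than crash the handler. A minimal sketch of that behavior; runCheck, checkHealth, and the 2-second default are assumptions.

```typescript
interface CheckResult {
  name: string
  healthy: boolean
  error?: string
}

// Runs one check and always resolves: failures and timeouts
// become an unhealthy result instead of a rejection.
function runCheck(
  name: string,
  check: () => Promise<void>,
  timeoutMs: number,
): Promise<CheckResult> {
  return new Promise<CheckResult>((resolve) => {
    const timer = setTimeout(
      () => resolve({ name, healthy: false, error: `timed out after ${timeoutMs}ms` }),
      timeoutMs,
    )
    Promise.resolve()
      .then(check)
      .then(
        () => {
          clearTimeout(timer)
          resolve({ name, healthy: true })
        },
        (error: unknown) => {
          clearTimeout(timer)
          const message = error instanceof Error ? error.message : String(error)
          resolve({ name, healthy: false, error: message })
        },
      )
  })
}

async function checkHealth(
  checks: Record<string, () => Promise<void>>,
  timeoutMs = 2000,
) {
  const results = await Promise.all(
    Object.entries(checks).map(([name, check]) => runCheck(name, check, timeoutMs)),
  )
  const healthy = results.every((r) => r.healthy)
  return { status: healthy ? 'healthy' : 'unhealthy', checks: results }
}
```

The route handler can then map status to 200 or 503 exactly as above, without needing its own try/catch.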

Cost Considerations

Reality Check: Microservices are expensive:

  • More infrastructure
  • More operational overhead
  • More development time
  • More monitoring tools

Our costs:

  • Monolith: $500/month, 2 developers
  • Microservices: $5000/month, 8 developers

Is it worth it? Depends on your scale and team size.

When to Split a Monolith

Signs it's time:

  1. Teams are stepping on each other's toes
  2. Deployment takes hours and is risky
  3. Different parts have different scaling needs
  4. Part of the system needs different technology

Don't split because:

  1. "Microservices are cool"
  2. "Netflix does it"
  3. Your system has modules
  4. You want to learn

Conclusion

Microservices are a tool, not a goal. They solve specific problems but introduce new ones. The key questions:

  1. Do you have organizational scaling problems?
  2. Do you have technical scaling requirements?
  3. Can you handle the operational complexity?
  4. Do you have the resources?

If you answered yes to all four, microservices might be right for you. Otherwise, build a well-structured monolith and split it later when you actually need to.

Remember: Distributed systems are hard. Really hard. Make sure the benefits outweigh the costs.

What's your experience with microservices? I'd love to hear your lessons learned!

#Microservices #Architecture #Distributed Systems #Backend #System Design