System Design: Distributed Transactions — 2PC, Saga Pattern, and Event Sourcing

Series System Design Interview Prep — #14 of 15

2PC was invented for
distributed databases
in the 1970s. It works,
but the coordinator
crash problem is why
modern microservices
almost universally
prefer Sagas instead.

An e-commerce checkout looks simple from the outside: click Buy, get a confirmation email. Under the hood, that single user action touches four completely independent microservices — Payment, Inventory, Shipment, and Notification — each with its own database, its own failure modes, and zero awareness of what the others are doing. Keeping them consistent is one of the hardest unsolved problems in distributed systems.

The question: An e-commerce order involves charging the payment, reserving inventory, creating a shipment record, and sending a confirmation email — across 4 microservices with 4 separate databases. How do you ensure all succeed or all roll back?

1. The Problem with Distributed State

In a single relational database, ACID guarantees are free:

Atomicity — either all rows update or none do
Consistency — constraints are always valid
Isolation — concurrent transactions don’t see each other’s partial writes
Durability — committed data survives crashes

Across microservices, none of these guarantees exist. Each service owns its own database. There is no shared transaction manager. The network between services is unreliable. A service can crash at any moment.

The failure scenario is concrete: Payment service charges the customer’s card ✓, then the Inventory service crashes mid-reservation. The customer is charged. No order is placed. Support tickets flood in.

The core tension: You need atomicity across boundaries that were explicitly designed to be independent. Every solution is a trade-off between consistency, availability, latency, and complexity.

2. Level 1 — The Naïve Approach (Wrong Answer)

The first instinct is to just call services sequentially:

python

def place_order(order):
    # Step 1: charge payment
    payment_result = payment_service.charge(order.user_id, order.amount)

    # Step 2: reserve inventory  ← what if this crashes?
    inventory_result = inventory_service.reserve(order.items)

    # Step 3: create shipment  ← what if THIS crashes?
    shipment_result = shipment_service.create(order)

    # Step 4: send confirmation email
    email_service.send_confirmation(order.user_id, order.id)

    return "order placed"

There is no error handling. There is no rollback. If inventory_service.reserve() raises a network timeout, the payment is gone and no one knows. If shipment_service.create() fails after inventory is reserved, you now have reserved stock for an order that doesn’t exist.

This is unacceptable in any financial system. In a real interview, naming this anti-pattern early and immediately moving to solutions is a strong signal.

3. Level 2 — Two-Phase Commit (2PC)

2PC introduces a coordinator that orchestrates a two-round protocol across all participants:

Phase 1 — Prepare: The coordinator asks every participant: “Can you commit this transaction?” Each participant locks its resources, writes to a redo log, and votes YES or NO. Crucially, a YES vote is a promise — the participant guarantees it can commit if asked.

Phase 2 — Commit or Rollback: If every participant voted YES, the coordinator sends COMMIT to all. If any voted NO (or timed out), the coordinator sends ROLLBACK to all. Only once the coordinator gets acknowledgements from everyone does the transaction complete.

Interactive 2PC Visualizer

Why 2PC Fails in Practice

The fatal flaw is the coordinator crash problem. Once a participant votes YES in Phase 1, it is locked — it cannot unilaterally commit or rollback. It must wait for Phase 2. If the coordinator crashes after Phase 1 but before Phase 2, every participant holds its locks forever. The system deadlocks until someone manually recovers the coordinator’s log and replays Phase 2.

In a microservices environment with dozens of services, this is catastrophic. 2PC also requires all participants to be available simultaneously — one slow service blocks everyone. It is effectively incompatible with the independent deployability and failure isolation that microservices promise.

When 2PC is still used: Within a single database cluster (PostgreSQL distributed transactions), between two databases you fully control, or in systems where strong consistency is non-negotiable and the coordinator can recover quickly (XA transactions). For cross-team microservices: almost never.

4. Level 3 — The Saga Pattern

A Saga breaks the distributed transaction into a sequence of local transactions, each of which publishes an event or message when it completes. If any step fails, the Saga executes compensating transactions — undo operations — for every step that already succeeded.

The key insight: compensating transactions are not rollbacks. Rollbacks are atomic and invisible. Compensations are new transactions that undo the effect of previous ones. Other services may have already seen the intermediate state. This means Sagas provide eventual consistency, not ACID isolation.

Choreography vs Orchestration

Choreography: Services react to events. No central controller. Payment service emits PaymentCharged → Inventory service listens and reserves stock → emits InventoryReserved → Shipment service creates record → etc. Simple to set up, hard to trace and debug.

Orchestration: A dedicated Saga Orchestrator sends explicit commands to each service and tracks the state machine. When a step fails, the orchestrator issues compensating commands in reverse order. More complex upfront, much easier to observe and reason about.

Interactive Saga Flow

Compensating Transactions Are Not Free

Every forward step must have a pre-designed compensating action. RefundPayment needs to call the payment provider’s refund API. ReleaseInventory needs to decrement the reservation counter. Compensations can also fail — which requires their own retry logic and dead-letter queues. The distributed systems complexity doesn’t disappear; it moves into the compensation design.

python

class OrderSagaOrchestrator:
    def execute(self, order):
        completed = []

        steps = [
            ("create_order",   self.create_order,   self.cancel_order),
            ("charge_payment",  self.charge_payment,  self.refund_payment),
            ("reserve_inv",     self.reserve_inv,     self.release_inv),
            ("create_shipment", self.create_shipment, self.cancel_shipment),
            ("send_email",      self.send_email,      None),
        ]

        for name, forward, compensate in steps:
            try:
                forward(order)
                completed.append((name, compensate))
            except Exception as e:
                # run compensating transactions in reverse
                for _, comp in reversed(completed):
                    if comp:
                        comp(order)  # must be idempotent
                raise SagaFailedError(name, e)

5. Level 4 — The Outbox Pattern

Sagas rely on events being reliably published to a message bus (Kafka, RabbitMQ). Here lies a subtle trap: the dual-write problem.

When a service handles a command, it needs to do two things atomically:

Write to its own database (e.g., update order status to PAYMENT_CHARGED)
Publish an event to Kafka (PaymentCharged)

These two operations cannot be wrapped in a single transaction — one is a database write, the other is a network call to a different system. If the database commits but Kafka publish fails, the downstream services never see the event. The saga stalls silently.

The Outbox Pattern solves this by making Kafka publishing an asynchronous side effect of the database write:

sql

-- outbox table lives in the SAME database as your business data
CREATE TABLE outbox_events (
    id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_id  TEXT        NOT NULL,
    aggregate_type TEXT       NOT NULL,
    event_type    TEXT        NOT NULL,
    payload       JSONB       NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    published_at  TIMESTAMPTZ
);

python

# Inside a single database transaction:
with db.transaction():
    # business write
    db.execute(
        "UPDATE orders SET status='PAYMENT_CHARGED' WHERE id=?",
        order_id
    )
    # outbox write — same transaction, same atomicity guarantee
    db.execute(
        "INSERT INTO outbox_events (aggregate_id, aggregate_type, event_type, payload)"
        "VALUES (?, 'Order', 'PaymentCharged', ?)",
        order_id, json.dumps(payload)
    )
# If this transaction commits, BOTH writes are durable.
# If it rolls back, NEITHER write happens.

# A separate relay process (Debezium CDC) tails the outbox table
# and publishes events to Kafka — independently, with retries.

The relay process (typically Debezium reading the database’s change-data-capture stream) publishes the outbox event to Kafka and marks it as published_at. The relay can retry on Kafka failures without risk of data loss — the event is safely in the database until delivered. This gives you exactly-once delivery semantics at the cost of a small write amplification.

6. Level 5 — Event Sourcing

Event sourcing flips the storage model entirely. Instead of storing the current state of an entity, you store the sequence of events that produced that state. The current state is derived by replaying events from the beginning (or from a snapshot).

python

# Traditional storage: current state snapshot
orders_table = {
    "order_id": "ord-42",
    "status": "shipped",
    "amount": 99.00,
    "items": [...]
}

# Event sourcing: ordered log of immutable events
event_store = [
    { "type": "OrderPlaced",       "orderId": "ord-42", "userId": "u-7",         "ts": "10:00:01" },
    { "type": "PaymentCharged",    "orderId": "ord-42", "amount": 99.00,      "ts": "10:00:03" },
    { "type": "InventoryReserved", "orderId": "ord-42", "items": [...],    "ts": "10:00:05" },
    { "type": "OrderShipped",      "orderId": "ord-42", "tracking": "TRK-9", "ts": "10:02:10" },
]

def rebuild_order_state(events):
    state = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            state = { "id": e["orderId"], "status": "placed", "userId": e["userId"] }
        elif e["type"] == "PaymentCharged":
            state["status"] = "paid";  state["amount"] = e["amount"]
        elif e["type"] == "InventoryReserved":
            state["status"] = "reserved"
        elif e["type"] == "OrderShipped":
            state["status"] = "shipped"; state["tracking"] = e["tracking"]
    return state

Event sourcing sounds
simple but the devil is
in schema evolution —
when your event format
changes, you need to
migrate or version
every event ever stored.
This is why many teams
start with it and
regret it.

Interactive Event Log Replay

Event Log

Current State

💡 Snapshot after 100 events: skip replay, load snapshot + tail

Snapshot Optimization

For aggregates with long event histories (thousands of events), replaying from event zero on every read is expensive. The solution is periodic snapshots: after every N events, serialize the current state to a snapshot store. On read, load the most recent snapshot then replay only the events that occurred after it.

7. Level 6 — CQRS

Command Query Responsibility Segregation (CQRS) is the natural companion to Event Sourcing. The insight is that write workloads (processing commands, enforcing invariants) and read workloads (serving queries, generating views) have almost nothing in common — so separate them.

Write side: Commands → Aggregate (validates invariants, enforces business rules) → Event Store → Event Bus

Read side: Multiple independent projections subscribe to the event bus and maintain their own read-optimized stores. An Order History Projection might maintain a flat table sorted by user + timestamp. An Inventory Projection maintains current stock levels. An Analytics Projection feeds a data warehouse. Each is rebuilt independently from the event stream.

python

# WRITE SIDE — command handler
class OrderAggregate:
    def handle_place_order(self, cmd):
        if self.status != "new":
            raise InvalidStateError("Order already placed")
        self.apply(OrderPlaced(cmd.order_id, cmd.user_id, cmd.items))

    def apply(self, event):
        self.event_store.append(event)
        self.event_bus.publish(event)  # via outbox pattern

# READ SIDE — projections (one per read model)
class OrderHistoryProjection:
    def on_order_placed(self, event):
        self.read_db.upsert("order_history", {
            "order_id": event.order_id, "user_id": event.user_id,
            "placed_at": event.timestamp, "status": "placed"
        })

    def on_order_shipped(self, event):
        self.read_db.update("order_history",
            where={"order_id": event.order_id},
            set={"status": "shipped", "tracking": event.tracking}
        )

The key benefit: the read model is disposable. If you need a new query pattern, build a new projection and replay all historical events into it. The event store is the single source of truth.

8. Comparison: When to Use What

Pattern	Consistency	Complexity	Latency	Best For
2PC	Strong (ACID)	High	High (locks)	Financial, same-team DBs
Saga (Choreography)	Eventual	Medium	Low	Simple microservice flows
Saga (Orchestration)	Eventual	High	Low	Complex, observable flows
Outbox Pattern	Eventual	Low	Low	Reliable event publishing
Event Sourcing	Eventual	Very High	Low	Audit trails, time-travel
CQRS	Eventual	Very High	Low (reads)	High-read, complex domains

9. Real-World: Amazon Order Placement

Amazon’s order flow uses Saga orchestration extensively. Here’s a simplified walk-through of how placing a single order works, with the compensating actions that fire on failure:

Step	Service	Compensation
1	Create order record (PENDING)	Delete pending order
2	Authorize payment (hold, not capture)	Release payment authorization
3	Reserve inventory at warehouse	Release inventory reservation
4	Assign carrier and tracking	Cancel carrier booking
5	Capture payment (finalize charge)	Issue refund
6	Trigger fulfillment pick-pack	Cancel pick-pack job
7	Send confirmation email	(no compensation needed)

Notice that payment authorization (step 2) and payment capture (step 5) are separate. This is intentional: holding a reservation is cheaper to reverse than a completed charge. The saga is designed so the expensive, hard-to-compensate operations happen as late as possible.

Amazon’s saga orchestrator persists its state to DynamoDB. If the orchestrator itself crashes mid-saga, it rehydrates from the persisted state and resumes from the last completed step. This is why every step must be idempotent.

10. Idempotency Is Non-Negotiable

Every saga step — forward and compensating — must be safely retryable. A network timeout doesn’t mean the operation failed; the downstream service may have succeeded and the response was just lost. Retrying a non-idempotent operation doubles the charge, reserves twice the inventory, sends two emails.

The standard solution is the idempotency key pattern:

python

# Client side: generate a stable key from the logical operation
idempotency_key = hash(saga_id + "." + step_name)
# e.g., "saga-ord-42.charge_payment" → deterministic UUID

# Server side (payment service): check before processing
def charge_payment(request):
    existing = db.get("idempotency_keys", request.idempotency_key)
    if existing:
        return existing.result  # replay cached result, don't re-charge

    result = stripe.charge(request.amount, request.card_token)

    db.set("idempotency_keys", request.idempotency_key, result, ttl=86400)
    return result

The idempotency key table is keyed by (service, key) and stores the result. Retries within the TTL window return the cached response immediately without re-executing the business logic. Stripe, Braintree, and every major payment processor support this natively via the Idempotency-Key header.

Interview signal: Mentioning idempotency unprompted, alongside saga compensation design, is the mark of a senior engineer who has debugged distributed systems in production. Most candidates forget it until prompted.

Summary

Distributed transactions are fundamentally a problem of trust across failure boundaries. The evolution from naïve sequential calls → 2PC → Saga → Event Sourcing + CQRS reflects the industry’s growing understanding that strong consistency across microservices is usually not worth the coupling it requires.

For most e-commerce systems today the answer is Saga orchestration + Outbox pattern:

Outbox guarantees reliable event delivery without dual-write risk
Orchestrated Sagas give you observable, debuggable compensation flows
Idempotency keys make every step safely retryable
Eventually-consistent is acceptable for order processing (customers don’t expect sub-millisecond confirmation)

Reserve Event Sourcing and CQRS for domains where audit history is a first-class requirement (financial ledgers, healthcare records, compliance systems) — not as a default architectural choice.

The Saga pattern was
introduced by Hector
Garcia-Molina in 1987
for long-lived
transactions. The
distributed systems
community rediscovered
it for microservices
30 years later.