System Design: Shopping Cart — Consistency, Merging, and the Checkout Saga

Amazon’s Dynamo paper (2007) opens with the shopping cart as the primary example of why they chose eventual consistency over strong consistency, even for financial-adjacent data.

Design Amazon’s shopping cart. Users add and remove items. The cart must work offline, for example for a user on a plane with no connection, and sync when they reconnect. Users see the same cart on mobile and desktop. Checkout must reserve inventory and process payment atomically.

The question: Design a shopping cart service that works offline, syncs across devices, and has a correct, rollback-capable checkout flow.


1. More Complex Than It Looks

The shopping cart seems trivial until you consider the real edge cases:

  • User adds item on phone, adds a different item on laptop while both are offline → need to sync both additions
  • User removes item on phone while server is unreachable → removal must persist after reconnect
  • Same item added twice from two devices simultaneously → should not double-count
  • Item goes out of stock between “add to cart” and “checkout” → must handle gracefully

These edge cases map directly to four classic distributed-systems problems: partition tolerance, causal consistency, idempotency, and TOCTOU races. A good interview answer names all four and then addresses them in order.

Why interviewers love this question: The shopping cart looks like a simple CRUD service. It is actually a miniature distributed-systems course — offline-first, CRDT merge, inventory reservation, and the saga pattern, all in one scenario.

2. Level 1 — Naïve Server-Side Cart

The naïve model stores the cart as a single DB row:

sql
CREATE TABLE carts (
  user_id   BIGINT PRIMARY KEY,
  items     JSONB,           -- [{"itemId":"B001","qty":2}, ...]
  updated_at TIMESTAMPTZ
);

-- Add item: read-modify-write
UPDATE carts
  SET items = items || '[{"itemId":"B007","qty":1}]'::jsonb,
      updated_at = now()
  WHERE user_id = 42;

Every add or remove is an UPDATE. This breaks down immediately under two conditions:

Offline: user adds an item, but the network is unavailable — the UPDATE never reaches the DB. The change is lost when the app restarts.

Concurrent devices: phone and laptop both read the cart, each modify a different item, and both UPDATE. The second write stomps the first — last-write-wins silently drops one device’s changes.

Last-write-wins (LWW) is the silent killer of cart implementations. Two devices, both online, can race each other. Whichever network request arrives a few milliseconds later wins — and the loser's changes disappear with no error shown to the user.
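
The race is easy to reproduce in a few lines. The following is a minimal in-memory sketch, not the article's service: `db`, `read_cart`, and `write_cart` are hypothetical stand-ins for the `carts` table and a whole-row write.

```python
import copy

def read_cart(db, user_id):
    # Each device takes its own snapshot of the stored cart.
    return copy.deepcopy(db[user_id])

def write_cart(db, user_id, items):
    # Naive write: replaces the whole row, whatever it held before.
    db[user_id] = items

db = {42: [{"itemId": "B001", "qty": 2}]}

phone = read_cart(db, 42)
laptop = read_cart(db, 42)

phone.append({"itemId": "B007", "qty": 1})    # phone adds B007
laptop.append({"itemId": "B009", "qty": 1})   # laptop adds B009

write_cart(db, 42, phone)    # arrives first
write_cart(db, 42, laptop)   # arrives a few ms later and overwrites it

final_ids = [item["itemId"] for item in db[42]]
print(final_ids)  # ['B001', 'B009']
```

B007 is gone, and neither device saw an error. Nothing in the write path knows that the laptop's snapshot was stale.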

3. Level 2 — Cart as a CRDT

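Amazon’s Dynamo paper (2007) famously uses the shopping cart as the motivating example for an always-writeable store whose divergent versions are merged by the application. Conflict-free Replicated Data Types (CRDTs), formalized a few years later, make that kind of merge systematic. The insight: instead of storing only the current cart state, store the add and remove operations in a form that always merges to the same result regardless of order.

Model the cart as two sets, with the visible cart derived from them: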

  • added — every (itemId, uniqueTag) pair ever added
  • removed — every tag explicitly removed by the user
  • current_cart = added − removed

This is an OR-Set (Observed-Remove Set):

python
import uuid

class ORSetCart:
    def __init__(self):
        self.added   = {}   # itemId -> set of unique tags
        self.removed = set() # set of tags

    def add(self, item_id):
        tag = str(uuid.uuid4())
        self.added.setdefault(item_id, set()).add(tag)
        return tag

    def remove(self, item_id):
        # Remove ALL currently known tags for this item
        tags = self.added.get(item_id, set())
        self.removed.update(tags)

    def items(self):
        result = {}
        for item_id, tags in self.added.items():
            live = tags - self.removed
            if live:
                result[item_id] = len(live)  # qty = number of live tags
        return result

    def merge(self, other):
        # Union both added sets, union both removed sets
        merged = ORSetCart()
        for item_id in set(self.added) | set(other.added):
            merged.added[item_id] = (
                self.added.get(item_id, set()) |
                other.added.get(item_id, set())
            )
        merged.removed = self.removed | other.removed
        return merged

The key insight: when phone removes “Keyboard” and laptop concurrently re-adds “Keyboard”, the re-add generates a new tag not present in phone’s removed set, so after merge Keyboard remains. This is the OR-Set's add-wins behaviour: a remove cancels only the adds it has observed, so a concurrent add survives.
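
To make the Keyboard scenario concrete, here is a compact, function-based sketch of the same OR-Set idea. The helper names (`new_cart`, `fork`, and so on) are made up for this example and the state is a plain dict; it is an illustration under those assumptions, not a production CRDT.

```python
import uuid

def new_cart():
    return {"added": {}, "removed": set()}

def fork(cart):
    # Copy a replica's state, as a device does when it last synced.
    return {"added": {k: set(v) for k, v in cart["added"].items()},
            "removed": set(cart["removed"])}

def add(cart, item_id):
    tag = str(uuid.uuid4())                      # fresh tag per add
    cart["added"].setdefault(item_id, set()).add(tag)
    return tag

def remove(cart, item_id):
    # Tombstone only the tags this replica has observed.
    cart["removed"] |= cart["added"].get(item_id, set())

def merge(a, b):
    merged = new_cart()
    for item_id in set(a["added"]) | set(b["added"]):
        merged["added"][item_id] = (a["added"].get(item_id, set())
                                    | b["added"].get(item_id, set()))
    merged["removed"] = a["removed"] | b["removed"]
    return merged

def items(cart):
    result = {}
    for item_id, tags in cart["added"].items():
        live = tags - cart["removed"]
        if live:
            result[item_id] = len(live)          # qty = number of live tags
    return result

server = new_cart()
add(server, "Keyboard")
add(server, "Mouse")

phone, laptop = fork(server), fork(server)       # both go offline here
remove(phone, "Keyboard")                        # tombstones the old tag
add(laptop, "Keyboard")                          # new tag, unseen by phone

merged_a = merge(phone, laptop)
merged_b = merge(laptop, phone)
```

Both merge orders give the same cart, and merging a state with itself changes nothing. Those two properties are what let devices sync in any order and retry safely.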

How it works: Phone removes “Keyboard” (marks its existing tag as removed). Laptop re-adds “Keyboard” generating a new tag. On sync, merged removed contains only the old tag — the new tag is alive. Keyboard stays. Last-write-wins would have simply taken phone’s state and silently dropped the laptop’s re-addition.


4. Level 3 — Practical Cart Storage

Pure CRDTs are elegant but expensive to store at scale. Real systems use a pragmatic hybrid:

| Layer | Technology | Purpose | TTL |
|---|---|---|---|
| Hot cache | Redis Hash | Active cart reads/writes (sub-millisecond) | 30 days, sliding |
| Persistent store | DynamoDB / Cassandra | Durability, cross-region replication | 90 days |
| Client cache | IndexedDB / localStorage | Offline-first, instant UI response | Session |
| Sync log | DynamoDB (event log) | Ordered ops for delta sync protocol | 7 days |

Redis schema:

redis
# Cart items stored as Hash field=itemId, value=JSON
HSET cart:42  B001  '{"qty":2,"addedAt":1700000000,"price":29.99}'
HSET cart:42  B007  '{"qty":1,"addedAt":1700000100,"price":9.99}'
EXPIRE cart:42  2592000  # 30 days in seconds

# Read full cart
HGETALL cart:42

# Remove one item
HDEL cart:42  B001

# Inventory soft hold (15-minute TTL)
SET hold:B007:42  '{"qty":1,"expiresAt":1700000900}'  EX 900

Delta sync protocol — the client does not send the full cart on every reconnect. It sends only changes since the last successful sync:

json
// POST /cart/sync
{
  "userId": 42,
  "lastSyncTs": 1700000000,
  "ops": [
    { "op": "add",    "itemId": "B009", "tag": "uuid-1", "ts": 1700000050 },
    { "op": "remove", "itemId": "B001", "ts": 1700000080 }
  ]
}

// Server response: merged state + server-side changes since lastSyncTs
{
  "cart": [ /* full current cart */ ],
  "syncTs": 1700000200
}
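
On the server side, applying a sync request can be sketched roughly as follows. This is an assumption-laden illustration: `apply_sync` is a hypothetical name, the cart is held in OR-Set form (`added` and `removed`), a remove op without an explicit tag list tombstones every tag the server currently knows for that item, and `lastSyncTs` is ignored because the response returns the full cart.

```python
def apply_sync(cart, request, now_ts):
    """Apply client ops to an OR-Set style cart and build the sync response."""
    for op in sorted(request["ops"], key=lambda o: o["ts"]):
        item_id = op["itemId"]
        if op["op"] == "add":
            # Set insertion: replaying the same tag changes nothing.
            cart["added"].setdefault(item_id, set()).add(op["tag"])
        elif op["op"] == "remove":
            # Assumption: fall back to all server-known tags when the
            # client did not list the tags it observed.
            observed = op.get("tags") or cart["added"].get(item_id, set())
            cart["removed"] |= set(observed)

    live = {}
    for item_id, tags in cart["added"].items():
        count = len(tags - cart["removed"])
        if count:
            live[item_id] = count
    return {"cart": live, "syncTs": now_ts}


server_cart = {"added": {"B001": {"tag-old"}}, "removed": set()}
request = {
    "userId": 42,
    "lastSyncTs": 1700000000,
    "ops": [
        {"op": "add", "itemId": "B009", "tag": "uuid-1", "ts": 1700000050},
        {"op": "remove", "itemId": "B001", "ts": 1700000080},
    ],
}

first = apply_sync(server_cart, request, now_ts=1700000200)
retry = apply_sync(server_cart, request, now_ts=1700000200)  # client resends
```

Because adds carry a client-generated tag and removes only grow a tombstone set, a client that resends the same request after a timeout gets the same cart back.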

5. Level 4 — Cart Expiry and Guest Carts

Guest cart lifecycle:

  1. First visit → generate guestSessionId, store cart in localStorage + Redis key guest-cart:{sessionId}
  2. User signs up or logs in → merge guest cart with existing logged-in cart
  3. Merge policy: for each item, take max(guestQty, loggedInQty) — err on the side of the customer buying more
  4. After merge, delete the guest cart key; set cart:{userId} in Redis
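
python
import json

def merge_on_login(redis, user_id, guest_session_id):
    # Assumes a client created with decode_responses=True (str keys/values).
    guest_key  = "guest-cart:" + guest_session_id
    user_key   = "cart:" + str(user_id)

    guest_items = redis.hgetall(guest_key)   # {itemId: json}
    user_items  = redis.hgetall(user_key)

    for item_id, guest_val in guest_items.items():
        guest_data = json.loads(guest_val)
        user_data  = json.loads(user_items.get(item_id, '{"qty":0}'))
        merged_qty = max(guest_data['qty'], user_data['qty'])

        # Earliest addedAt that actually exists; a missing value must not
        # win the min() as 0.
        added_times = [d['addedAt'] for d in (guest_data, user_data)
                       if 'addedAt' in d]

        # Keep other fields (e.g. price) instead of dropping them.
        merged = {**user_data, **guest_data, 'qty': merged_qty}
        if added_times:
            merged['addedAt'] = min(added_times)
        redis.hset(user_key, item_id, json.dumps(merged))

    redis.expire(user_key, 2592000)  # 30 days in seconds
    redis.delete(guest_key)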

Abandoned cart emails are triggered by a background job: every hour, scan DynamoDB for carts with items that have not had a checkout event in 24 hours, then enqueue a notification. This is an async workflow entirely decoupled from the cart service itself.


6. The Checkout Saga

Checkout spans multiple services. Every step must succeed — or every completed step must be compensated (rolled back). This is the Saga pattern.

The eight steps of checkout:

  1. Lock cart — prevent concurrent checkout attempts for the same user
  2. Validate stock — confirm all items are currently available
  3. Reserve inventory — soft-hold the items (15-min TTL)
  4. Create order record — write order to DB in PENDING state
  5. Process payment — charge the card
  6. Confirm reservation — convert soft-hold to committed reservation
  7. Send confirmation email — async, fire-and-forget
  8. Clear cart — remove items from Redis and DB

Compensating transactions are not rollbacks — they are new, forward-moving operations that undo the effect of a previous step. They must be idempotent (safe to run twice) and durable (persisted to a saga state machine in DynamoDB so they survive crashes).

Saga state machine durability: the saga orchestrator writes its current step to a durable store before executing each action. If the orchestrator crashes mid-saga and restarts, it reads its last known state and resumes — either completing the saga or running compensation from the point of failure.
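
A minimal orchestrator sketch shows how the state record and the compensations fit together. Everything here is illustrative: `run_saga` and `StepFailed` are hypothetical names, a plain dict stands in for the durable DynamoDB table, only four of the eight steps are modelled, and the compensation names are assumptions rather than anything the list above specifies.

```python
class StepFailed(Exception):
    """Raised by a step action to signal a business failure."""


def run_saga(saga_id, steps, store):
    """Run steps in order; on failure, compensate completed steps in reverse.

    steps: list of (name, action, compensate) tuples; compensate may be None.
    store: a dict standing in for the durable saga table.
    """
    state = store.setdefault(
        saga_id, {"status": "RUNNING", "currentStep": None, "completed": []}
    )
    compensations = {name: comp for name, _, comp in steps}

    for name, action, _ in steps:
        if name in state["completed"]:
            continue                      # already done before a restart
        state["currentStep"] = name       # record intent before acting
        try:
            action()
        except StepFailed:
            state["status"] = "COMPENSATING"
            while state["completed"]:
                done = state["completed"][-1]
                if compensations[done] is not None:
                    compensations[done]()
                state["completed"].pop()  # record progress after each undo
            state["status"] = "FAILED"
            return state
        state["completed"].append(name)

    state["status"] = "COMPLETED"
    return state


log = []

def record(event):
    return lambda: log.append(event)

def decline_payment():
    raise StepFailed("card declined")

steps = [
    ("lock_cart",         record("lock cart"),         record("unlock cart")),
    ("reserve_inventory", record("reserve inventory"), record("release hold")),
    ("create_order",      record("create order"),      record("cancel order")),
    ("process_payment",   decline_payment,             record("refund payment")),
]

store = {}
final_state = run_saga("saga-1", steps, store)
```

When payment is declined, the three completed steps are undone newest-first, and the refund compensation never runs because the payment step never completed. In a real system each write to `state` would be a durable write, which is what lets a restarted orchestrator pick up from `completed`.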

7. Inventory Reservation — Hard vs Soft Hold

Two models exist for holding stock during checkout:

| Model | How it works | Pros | Cons |
|---|---|---|---|
| Hard reservation | Decrement stock immediately when item added to cart | Simple; no TOCTOU race at checkout | Abandoned carts lock up stock indefinitely; needs expiry job |
| Soft hold (recommended) | Tentative hold at checkout start, TTL 15 min; confirmed on payment success | Stock only locked when user is actively checking out | Race between two users checking out same last item |
| Optimistic (no hold) | Check stock at payment time; fail if unavailable | Zero lock contention; simplest | Payment processed then stock found gone → refund needed |

Soft-hold implementation with Redis:

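python
from redis.exceptions import WatchError

def acquire_soft_hold(redis, item_id, user_id, qty):
    """Atomically move qty units from available stock into a per-user hold."""
    hold_key    = "hold:" + item_id + ":" + str(user_id)
    stock_key   = "stock:" + item_id
    hold_ttl    = 900  # 15 minutes

    with redis.pipeline() as pipe:
        while True:
            try:
                pipe.watch(stock_key)   # EXEC aborts if stock changes after this
                available = int(pipe.get(stock_key) or 0)
                if available < qty:
                    return False  # out of stock

                pipe.multi()
                pipe.decrby(stock_key, qty)
                pipe.setex(hold_key, hold_ttl, qty)
                pipe.execute()
                return True

            except WatchError:
                continue  # concurrent modification, retry

def confirm_hold(redis, item_id, user_id):
    # Payment succeeded: delete the TTL key; stock is already decremented
    redis.delete("hold:" + item_id + ":" + str(user_id))

def release_hold(redis, item_id, user_id, qty):
    # Compensation: put stock back only if the hold still exists, so a
    # retried release cannot credit the same units twice.
    # Note: DELETE followed by INCRBY is not atomic; a crash between the
    # two would need reconciliation against the authoritative stock.
    if redis.delete("hold:" + item_id + ":" + str(user_id)):
        redis.incrby("stock:" + item_id, qty)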

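The WATCH/MULTI/EXEC pattern provides optimistic concurrency: if any other client modifies stock:{itemId} between the WATCH and the EXEC, the transaction aborts and the client retries. This eliminates the stock-going-negative race without any distributed locking. One caveat: when a hold key expires by TTL, Redis simply deletes it and the decremented units are not returned to stock:{itemId} automatically, so a separate expiry process has to restore that stock.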
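
The retry loop is easier to reason about without Redis in the picture. The sketch below models the same optimistic check with a version counter. `VersionedStock` and `try_hold` are hypothetical names, and the `interfere` hook exists only to inject a competing buyer between the read and the commit.

```python
class VersionedStock:
    def __init__(self, qty):
        self.qty = qty
        self.version = 0

    def read(self):
        # Plays the role of WATCH + GET: remember what we saw.
        return self.qty, self.version

    def commit(self, new_qty, seen_version):
        # Plays the role of EXEC: apply only if nobody wrote since the read.
        if seen_version != self.version:
            return False
        self.qty = new_qty
        self.version += 1
        return True


def try_hold(stock, qty, interfere=None):
    while True:
        available, version = stock.read()
        if available < qty:
            return False                 # out of stock
        if interfere is not None:
            interfere()                  # a rival acts in the gap, once
            interfere = None
        if stock.commit(available - qty, version):
            return True
        # commit refused: someone else changed the stock, so re-read


stock = VersionedStock(1)                # one unit left
rival_result = []

def rival():
    rival_result.append(try_hold(stock, 1))

our_result = try_hold(stock, 1, interfere=rival)
```

The rival takes the last unit in the gap, our commit is refused, and the second pass sees zero stock and returns False. The counter never goes below zero, which is the property the Redis version relies on.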


8. Price Guarantee

A user adds an item at $99. The price changes to $129 the next day. They check out two days later. Which price applies?

| Policy | Behaviour | User experience | Business impact |
|---|---|---|---|
| Cart price lock | Price at add-time locked for 24–48 h; after that, re-evaluate | Good: user sees the price they expected | Potential margin loss on price increases |
| Current price always | Cart always shows live price; checkout uses live price | Mixed: surprise at checkout if price changed | Maximises margin; simple to implement |
| Notify on change | Price snapshot stored; on change, show banner "price changed" | Good: user is informed before committing | Higher engineering cost; requires event-driven price feed |
| Lower of the two | Charge min(add-time price, checkout price) | Best for user: never worse than expected | Revenue impact on flash-sale recovery |

Amazon’s documented behaviour: cart shows current price; if the price changes while an item is in your cart, a notice appears at checkout. No price lock is applied. The trade-off: simplicity and accurate revenue over customer price-certainty.

Implementation of “notify on change”: when an item is added to cart, snapshot priceAtAdd in the cart record. A background job subscribes to the price-change event stream. When a price event fires for itemId, query all active carts containing that item (secondary index on DynamoDB) and write a priceChanged=true flag. Cart service reads this flag at checkout time and surfaces the banner.
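
The flagging job described above might look roughly like this. It is an in-memory sketch under stated assumptions: `carts_by_item` stands in for the DynamoDB secondary index, `on_price_change` is a hypothetical handler name, `currentPrice` is an extra field added for the banner, and prices are integer cents to keep the comparison exact.

```python
def on_price_change(carts, carts_by_item, item_id, new_price_cents):
    """Flag every active cart holding item_id whose snapshot differs."""
    flagged = []
    for cart_id in carts_by_item.get(item_id, ()):
        entry = carts[cart_id][item_id]
        entry["currentPrice"] = new_price_cents
        entry["priceChanged"] = entry["priceAtAdd"] != new_price_cents
        if entry["priceChanged"]:
            flagged.append(cart_id)
    return flagged


carts = {
    "cart:42": {"B001": {"qty": 1, "priceAtAdd": 9900}},
    "cart:77": {"B001": {"qty": 2, "priceAtAdd": 12900},
                "B007": {"qty": 1, "priceAtAdd": 999}},
}
carts_by_item = {"B001": ["cart:42", "cart:77"], "B007": ["cart:77"]}

flagged = on_price_change(carts, carts_by_item, "B001", 12900)
print(flagged)  # ['cart:42']
```

Only the cart that snapshotted the old $99 price is flagged. The cart that already added the item at $129 gets no banner.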


9. Capacity Estimate

| Metric | Estimate | Notes |
|---|---|---|
| Active carts | ~100 million | Amazon scale; each cart = one Redis Hash key |
| Cart size in Redis | ~500 bytes avg | ~10 items × 50 bytes each (itemId + qty + price JSON) |
| Total Redis memory | ~50 GB | 100M × 500 B; fits in a large Redis cluster |
| Add-to-cart events/sec | ~10,000 req/s | Redis HSET; well under 1M ops/s capacity |
| Checkout transactions/sec | ~5,000 tx/s | Each involves 8 saga steps; saga orchestrator must scale horizontally |
| Peak multiplier (Prime Day) | 10× | 50,000 checkouts/sec; requires pre-scaled Redis and inventory service |
| Abandoned cart email jobs/day | ~2 million | Async; DynamoDB scan + SQS queue + email workers |
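
The arithmetic behind those rows is worth checking once. This short calculation uses only the table's inputs, takes 1 GB as 10^9 bytes, and ignores Redis per-key overhead. The saga step rate at the end is derived here and is not a figure from the table.

```python
active_carts = 100_000_000
items_per_cart = 10
bytes_per_item = 50

bytes_per_cart = items_per_cart * bytes_per_item        # 500 bytes
total_gb = active_carts * bytes_per_cart / 1e9          # decimal gigabytes

checkouts_per_sec = 5_000
peak_multiplier = 10
saga_steps = 8

peak_checkouts = checkouts_per_sec * peak_multiplier    # Prime Day estimate
peak_step_rate = peak_checkouts * saga_steps            # derived: step executions/s

print(bytes_per_cart, total_gb, peak_checkouts, peak_step_rate)
# 500 50.0 50000 400000
```

The last number is the one that drives orchestrator sizing: at peak, the saga layer handles on the order of 400,000 step executions per second, before counting retries.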

10. Architecture Summary

The complete system breaks into five tiers:

Client tier: React/Native app with IndexedDB-backed cart. Every mutation is written locally first, queued for sync. Sync protocol sends delta ops to the cart API, receives merged state.

Cart API tier: Stateless Go/Java service. Reads/writes Redis for hot path. Writes DynamoDB asynchronously (write-behind cache). Exposes /cart/sync endpoint for delta merge.

Checkout orchestrator: Saga state machine backed by DynamoDB. Each saga instance is a record with sagaId, currentStep, status, and compensationLog. Orchestrator is idempotent — re-running the same saga from any step is safe.

Inventory service: Redis for real-time stock counters with optimistic locking. DynamoDB for authoritative stock. Publishes StockReserved and StockReleased events to Kafka.

Payment service: Wraps payment processor (Stripe/Braintree). Idempotency key = orderId to prevent double charges on retry. Emits PaymentSucceeded / PaymentFailed events consumed by the saga orchestrator.
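
Why the idempotency key matters can be shown with a stand-in processor. `FakeProcessor` below is a hypothetical in-memory class, not Stripe's or Braintree's API, and the `order-` key prefix is an assumption; the point is only that a retried charge with the same key returns the stored result instead of charging again.

```python
class FakeProcessor:
    """In-memory stand-in for a payment processor that honours idempotency keys."""

    def __init__(self):
        self.results = {}        # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self.results:
            # Retry of a request we already handled: replay the result.
            return self.results[idempotency_key]
        self.charges_made += 1
        result = {
            "chargeId": "ch_%d" % self.charges_made,
            "amountCents": amount_cents,
            "status": "succeeded",
        }
        self.results[idempotency_key] = result
        return result


def pay_for_order(processor, order_id, amount_cents):
    # The key is derived from orderId, so every retry of this order
    # maps to the same key.
    return processor.charge("order-%s" % order_id, amount_cents)


processor = FakeProcessor()
first = pay_for_order(processor, 1001, 12900)
retry = pay_for_order(processor, 1001, 12900)   # e.g. after a timeout
```

The saga can therefore retry the payment step after a timeout without knowing whether the first attempt landed.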

Key interview takeaway: The shopping cart is not a CRUD service. It is a distributed-systems problem that requires offline-first client design, CRDT-based merge semantics, event-sourced sync, an orchestrated checkout saga, and a careful inventory locking strategy, all working together.

Amazon’s Dynamo paper (2007) on the shopping cart: an “Add to Cart” operation can never be forgotten or rejected. A shopping cart that cannot accept items is just an empty promise.

The most important design decision in this entire system is the one Amazon made in 2007: allow the cart to accept writes even when the system is degraded. A cart that rejects adds because a replica is down is worse for the customer than a cart that temporarily shows a slightly stale state. The OR-Set CRDT is the technical embodiment of that business decision. Every subsequent design choice — soft holds, saga compensation, delta sync — flows from this same principle: prefer availability and eventual consistency over strong consistency for user-facing cart operations.


The guest cart merge problem has a famous edge case: user adds 2 units as guest; their logged-in cart has 3. Amazon’s policy: keep max(2,3)=3, erring toward buying more.

The “merge guest cart on login” problem appears in every e-commerce system and has no universally correct answer. Amazon chose max(guestQty, loggedInQty) per item. Etsy has historically chosen “union all items, keep guest quantities for new items”. Neither is wrong — they reflect different product philosophies about what a cart means to the customer. In an interview, the right answer is: name the ambiguity, pick a policy, justify the trade-off.
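
The two policies differ only when the same item sits in both carts. The sketch below uses plain `{itemId: qty}` dicts and hypothetical function names. It also reads "keep guest quantities for new items" as "the logged-in quantity wins for items in both carts", which is an interpretation rather than a documented rule.

```python
def merge_max(guest, user):
    """Per item, keep the larger of the two quantities."""
    return {
        item: max(guest.get(item, 0), user.get(item, 0))
        for item in set(guest) | set(user)
    }


def merge_union_keep_user(guest, user):
    """Union of items; guest quantity is used only for items new to the user cart."""
    merged = dict(user)
    for item, qty in guest.items():
        merged.setdefault(item, qty)
    return merged


guest_cart = {"B001": 5, "B007": 1}
user_cart = {"B001": 3}

by_max = merge_max(guest_cart, user_cart)
by_union = merge_union_keep_user(guest_cart, user_cart)
```

With these inputs the max policy ends up with 5 units of B001 and the union policy keeps 3. Both add B007. Which result is "right" depends on whether the guest session is read as a correction to the saved cart or as a separate browsing episode.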


At Amazon scale, a single checkout spans 50+ microservices. Each saga step has retry logic with exponential backoff + jitter. The orchestrator is a durable DynamoDB state machine.

Checkout sagas at Amazon scale are not simple sequential chains. Each step may itself fan out to multiple sub-services. The saga orchestrator maintains a fully durable state machine — if the entire orchestrator fleet is replaced during a deployment, in-flight sagas resume from their last committed step. Exponential backoff with jitter prevents thundering-herd retries during the payment processor brownouts that tend to happen precisely when checkout volume is highest.
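
Exponential backoff with jitter is short enough to sketch in full. This version uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling. The function names, the 0.1 s base, the 10 s cap, and the choice of `ConnectionError` as the retryable error are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=10.0, rng=random):
    """Full-jitter delays: uniform in [0, min(cap, base * 2**n)] for attempt n."""
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def call_with_retry(fn, attempts=5, base=0.1, cap=10.0,
                    sleep=time.sleep, rng=random):
    delays = backoff_delays(attempts, base, cap, rng)
    for attempt, delay in enumerate(delays):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise            # out of attempts: surface the last error
            sleep(delay)         # randomised wait spreads out the retries


calls = {"count": 0}

def flaky_charge():
    # Fails twice, then succeeds, like a processor recovering from a brownout.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("processor brownout")
    return "ok"


slept = []
result = call_with_retry(flaky_charge, sleep=slept.append,
                         rng=random.Random(7))
```

Passing `sleep` and `rng` as parameters keeps the example testable without real waiting. The randomisation is the important part: without it, every saga that failed at the same moment would retry at the same moment.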