System Design: Shopping Cart — Consistency, Merging, and the Checkout Saga
Amazon’s Dynamo paper (2007) opens with the shopping cart as the primary example of why they chose eventual consistency over strong consistency, even for financial-adjacent data.
Design Amazon’s shopping cart. Users add and remove items. The cart must work offline — user on a plane with no connection. When they reconnect, the cart syncs. Users have the same cart on mobile and desktop. Checkout must reserve inventory and process payment atomically.
The question: Design a shopping cart service that works offline, syncs across devices, and has a correct, rollback-capable checkout flow.
1. More Complex Than It Looks
The shopping cart seems trivial until you consider the real edge cases:
- User adds item on phone, adds a different item on laptop while both are offline → need to sync both additions
- User removes item on phone while server is unreachable → removal must persist after reconnect
- Same item added twice from two devices simultaneously → should not double-count
- Item goes out of stock between “add to cart” and “checkout” → must handle gracefully
These edge cases map directly to four classic distributed-systems problems: partition tolerance, causal consistency, idempotency, and TOCTOU races. A good interview answer names all four and then addresses them in order.
2. Level 1 — Naïve Server-Side Cart
The naïve model stores the cart as a single DB row:
```sql
CREATE TABLE carts (
    user_id    BIGINT PRIMARY KEY,
    items      JSONB,        -- [{"itemId":"B001","qty":2}, ...]
    updated_at TIMESTAMPTZ
);

-- Add item: read-modify-write
UPDATE carts
SET items = items || '[{"itemId":"B007","qty":1}]'::jsonb,
    updated_at = now()
WHERE user_id = 42;
```
Every add or remove is an UPDATE. This breaks down immediately under two conditions:
Offline: user adds an item, but the network is unavailable — the UPDATE never reaches the DB. The change is lost when the app restarts.
Concurrent devices: phone and laptop both read the cart, each modify a different item, and both UPDATE. The second write stomps the first — last-write-wins silently drops one device’s changes.
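The lost update can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical in-memory `db` dict standing in for the `carts` table and two devices that each write back the whole `items` array they read earlier:

```python
import copy

# Hypothetical in-memory stand-in for the single-row carts table.
db = {42: [{"itemId": "B001", "qty": 1}]}

def read_cart(user_id):
    # Each device gets its own snapshot, as if it had SELECTed the row.
    return copy.deepcopy(db[user_id])

def write_cart(user_id, items):
    # Whole-row overwrite: the last writer replaces everything.
    db[user_id] = items

phone = read_cart(42)
laptop = read_cart(42)

phone.append({"itemId": "B007", "qty": 1})   # phone adds one item
laptop.append({"itemId": "B009", "qty": 1})  # laptop adds a different item

write_cart(42, phone)
write_cart(42, laptop)  # overwrites the phone's write

final_ids = [item["itemId"] for item in db[42]]
print(final_ids)  # B007 is missing from the stored cart
```

Neither device did anything wrong in isolation; the data model simply has no way to express "both of these changes happened".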
3. Level 2 — Cart as a CRDT
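Amazon’s Dynamo paper (2007) uses the shopping cart as its motivating example for accepting writes on any replica and merging divergent versions afterwards. Conflict-free Replicated Data Types (CRDTs), formalised a few years later, turn that idea into a general technique. The insight: instead of storing only the current cart state, store the add and remove operations in a form that can always be merged correctly, regardless of the order in which replicas exchange them.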
Model the cart as two sets:
- added: every (itemId, uniqueTag) pair ever added
- removed: every tag explicitly removed by the user
- current_cart = added − removed
This is an OR-Set (Observed-Remove Set):
```python
import uuid


class ORSetCart:
    def __init__(self):
        self.added = {}       # itemId -> set of unique tags
        self.removed = set()  # set of tags

    def add(self, item_id):
        tag = str(uuid.uuid4())
        self.added.setdefault(item_id, set()).add(tag)
        return tag

    def remove(self, item_id):
        # Remove ALL currently known tags for this item
        tags = self.added.get(item_id, set())
        self.removed.update(tags)

    def items(self):
        result = {}
        for item_id, tags in self.added.items():
            live = tags - self.removed
            if live:
                result[item_id] = len(live)  # qty = number of live tags
        return result

    def merge(self, other):
        # Union both added sets, union both removed sets
        merged = ORSetCart()
        for item_id in set(self.added) | set(other.added):
            merged.added[item_id] = (
                self.added.get(item_id, set()) | other.added.get(item_id, set())
            )
        merged.removed = self.removed | other.removed
        return merged
```
The key insight: when the phone removes “Keyboard” and the laptop concurrently re-adds “Keyboard”, the re-add generates a new tag that is not in the phone’s removed set, so after the merge Keyboard remains. This is the OR-Set’s add-wins behaviour: a remove cancels only the adds it has observed, so a concurrent add survives it.
How it works: the phone removes “Keyboard” (marks its existing tag as removed). The laptop re-adds “Keyboard”, generating a new tag. On sync, the merged removed set contains only the old tag, so the new tag is alive and Keyboard stays. Last-write-wins would instead keep whichever device synced last and silently drop the other device’s change.
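The scenario can be traced in a few lines. This is a self-contained sketch that re-states the OR-Set as plain functions over dicts (the helper names `new_replica`, `add`, `remove`, `merge`, and `items` are illustrative, not from any library) and checks that the merge gives the same cart in either direction:

```python
import uuid

def new_replica():
    # added: itemId -> set of tags; removed: set of tombstoned tags
    return {"added": {}, "removed": set()}

def add(replica, item_id):
    replica["added"].setdefault(item_id, set()).add(str(uuid.uuid4()))

def remove(replica, item_id):
    # Tombstone only the tags this replica has observed.
    replica["removed"] |= replica["added"].get(item_id, set())

def merge(a, b):
    merged = new_replica()
    for item_id in set(a["added"]) | set(b["added"]):
        merged["added"][item_id] = (
            a["added"].get(item_id, set()) | b["added"].get(item_id, set())
        )
    merged["removed"] = a["removed"] | b["removed"]
    return merged

def items(replica):
    live = {}
    for item_id, tags in replica["added"].items():
        count = len(tags - replica["removed"])
        if count:
            live[item_id] = count
    return live

# Both devices start from the same synced state: one Keyboard.
base = new_replica()
add(base, "Keyboard")
phone = merge(base, new_replica())   # independent copies of the base state
laptop = merge(base, new_replica())

remove(phone, "Keyboard")  # offline removal on the phone
add(laptop, "Keyboard")    # concurrent re-add on the laptop
add(laptop, "Mouse")

merged_ab = merge(phone, laptop)
merged_ba = merge(laptop, phone)
print(items(merged_ab))
```

Because `merge` is built only from set unions, it is commutative and idempotent, which is why the sync order between devices does not matter.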
4. Level 3 — Practical Cart Storage
Pure CRDTs are elegant but expensive to store at scale. Real systems use a pragmatic hybrid:
| Layer | Technology | Purpose | TTL |
|---|---|---|---|
| Hot cache | Redis Hash | Active cart reads/writes (sub-millisecond) | 30 days, sliding |
| Persistent store | DynamoDB / Cassandra | Durability, cross-region replication | 90 days |
| Client cache | IndexedDB / localStorage | Offline-first, instant UI response | Session |
| Sync log | DynamoDB (event log) | Ordered ops for delta sync protocol | 7 days |
Redis schema:
```
# Cart items stored as Hash field=itemId, value=JSON
HSET cart:42 B001 '{"qty":2,"addedAt":1700000000,"price":29.99}'
HSET cart:42 B007 '{"qty":1,"addedAt":1700000100,"price":9.99}'
EXPIRE cart:42 2592000   # 30 days in seconds

# Read full cart
HGETALL cart:42

# Remove one item
HDEL cart:42 B001

# Inventory soft hold (15-minute TTL)
SET hold:B007:42 '{"qty":1,"expiresAt":1700000900}' EX 900
```
Delta sync protocol — the client does not send the full cart on every reconnect. It sends only changes since the last successful sync:
```
// POST /cart/sync
{
  "userId": 42,
  "lastSyncTs": 1700000000,
  "ops": [
    { "op": "add", "itemId": "B009", "tag": "uuid-1", "ts": 1700000050 },
    { "op": "remove", "itemId": "B001", "ts": 1700000080 }
  ]
}

// Server response: merged state + server-side changes since lastSyncTs
{
  "cart": [ /* full current cart */ ],
  "syncTs": 1700000200
}
```
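On the server side, the sync handler has to tolerate the client retrying the same request after a timeout. One possible sketch, assuming a hypothetical `apply_ops` function, an in-memory cart of `itemId -> set of tags`, and a dedupe key built from the fields of each op (a real service would more likely carry an explicit op ID):

```python
def apply_ops(cart, seen, ops):
    """Apply client ops in timestamp order; ops already seen are skipped."""
    for op in sorted(ops, key=lambda o: o["ts"]):
        key = (op["op"], op["itemId"], op.get("tag"), op["ts"])
        if key in seen:
            continue  # retry of an op the server already applied
        seen.add(key)
        if op["op"] == "add":
            cart.setdefault(op["itemId"], set()).add(op["tag"])
        elif op["op"] == "remove":
            cart.pop(op["itemId"], None)
    return cart

server_cart = {"B001": {"uuid-0"}}
seen_ops = set()
request_ops = [
    {"op": "add", "itemId": "B009", "tag": "uuid-1", "ts": 1700000050},
    {"op": "remove", "itemId": "B001", "ts": 1700000080},
]

apply_ops(server_cart, seen_ops, request_ops)
apply_ops(server_cart, seen_ops, request_ops)  # client retried after a timeout
print(server_cart)
```

Applying the same batch twice leaves the cart unchanged, so the client can resend a sync request whenever it is unsure whether the previous one arrived.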
5. Level 4 — Cart Expiry and Guest Carts
Guest cart lifecycle:
- First visit → generate guestSessionId, store cart in localStorage + Redis key guest-cart:{sessionId}
- User signs up or logs in → merge guest cart with existing logged-in cart
- Merge policy: for each item, take max(guestQty, loggedInQty), erring on the side of the customer buying more
- After merge, delete the guest cart key; set cart:{userId} in Redis
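```python
import json

CART_TTL_SECONDS = 2592000  # 30 days


def merge_on_login(redis, user_id, guest_session_id):
    guest_key = "guest-cart:" + guest_session_id
    user_key = "cart:" + str(user_id)

    guest_items = redis.hgetall(guest_key)  # {itemId: json}
    user_items = redis.hgetall(user_key)

    for item_id, guest_val in guest_items.items():
        guest_data = json.loads(guest_val)
        if item_id in user_items:
            user_data = json.loads(user_items[item_id])
        else:
            user_data = {"qty": 0}

        # Start from the guest fields and let the logged-in cart's fields win,
        # so extra fields such as price are not dropped by the merge.
        merged = {**guest_data, **user_data}
        merged["qty"] = max(guest_data["qty"], user_data["qty"])

        # Keep the earliest known add time; a side without addedAt is ignored
        # instead of forcing the result to 0.
        added_times = [d["addedAt"] for d in (guest_data, user_data) if "addedAt" in d]
        if added_times:
            merged["addedAt"] = min(added_times)

        redis.hset(user_key, item_id, json.dumps(merged))

    redis.expire(user_key, CART_TTL_SECONDS)
    redis.delete(guest_key)
```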
Abandoned cart emails are triggered by a background job: every hour, scan DynamoDB for carts with items that have not had a checkout event in 24 hours, then enqueue a notification. This is an async workflow entirely decoupled from the cart service itself.
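The selection rule of that job can be sketched as a pure function. The record fields used here (`lastActivityTs`, `checkedOut`) and the function name `find_abandoned` are assumptions for illustration; the real job would read them from the persistent store rather than a list:

```python
DAY_SECONDS = 24 * 60 * 60

def find_abandoned(carts, now):
    """Return user IDs whose non-empty cart has been idle for 24 h with no checkout."""
    abandoned = []
    for cart in carts:
        idle = now - cart["lastActivityTs"]
        if cart["items"] and not cart["checkedOut"] and idle >= DAY_SECONDS:
            abandoned.append(cart["userId"])
    return abandoned

now = 1700100000
carts = [
    {"userId": 1, "items": ["B001"], "lastActivityTs": now - 2 * DAY_SECONDS, "checkedOut": False},
    {"userId": 2, "items": ["B007"], "lastActivityTs": now - 3600, "checkedOut": False},
    {"userId": 3, "items": [], "lastActivityTs": now - 2 * DAY_SECONDS, "checkedOut": False},
    {"userId": 4, "items": ["B009"], "lastActivityTs": now - 2 * DAY_SECONDS, "checkedOut": True},
]

to_notify = find_abandoned(carts, now)
print(to_notify)
```

Only user 1 qualifies: user 2 was active recently, user 3 has an empty cart, and user 4 already checked out.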
6. The Checkout Saga
Checkout spans multiple services. Every step must succeed — or every completed step must be compensated (rolled back). This is the Saga pattern.
The eight steps of checkout:
- Lock cart: prevent concurrent checkout attempts for the same user
- Validate stock: confirm all items are currently available
- Reserve inventory: soft-hold the items (15-min TTL)
- Create order record: write order to DB in PENDING state
- Process payment: charge the card
- Confirm reservation: convert soft-hold to committed reservation
- Send confirmation email: async, fire-and-forget
- Clear cart: remove items from Redis and DB
Compensating transactions are not rollbacks — they are new, forward-moving operations that undo the effect of a previous step. They must be idempotent (safe to run twice) and durable (persisted to a saga state machine in DynamoDB so they survive crashes).
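A stripped-down orchestrator shows the control flow. This is a sketch only: `run_saga`, `StepFailed`, and the step names are illustrative, and the durable saga state described above is replaced by an in-memory list so the example stays runnable:

```python
class StepFailed(Exception):
    """Raised by a step that cannot complete."""

def run_saga(steps, ctx):
    """steps is a list of (name, action, compensation) tuples."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except StepFailed:
            # Undo the completed steps in reverse order.
            for _done_name, done_compensate in reversed(completed):
                done_compensate(ctx)
            return "COMPENSATED"
        completed.append((name, compensate))
    return "COMPLETED"

def record(label):
    # Build a step that only logs its label, standing in for a service call.
    def step(ctx):
        ctx["log"].append(label)
    return step

def fail(reason):
    def step(ctx):
        raise StepFailed(reason)
    return step

steps = [
    ("lock_cart", record("lock_cart"), record("unlock_cart")),
    ("reserve_inventory", record("reserve_inventory"), record("release_hold")),
    ("create_order", record("create_order"), record("cancel_order")),
    ("process_payment", fail("card declined"), record("refund_payment")),
]

ctx = {"log": []}
status = run_saga(steps, ctx)
print(status, ctx["log"])
```

The failed payment step is never compensated because it never completed; only the three steps before it are undone, newest first.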
7. Inventory Reservation — Hard vs Soft Hold
Two models exist for holding stock during checkout:
| Model | How it works | Pros | Cons |
|---|---|---|---|
| Hard reservation | Decrement stock immediately when item added to cart | Simple; no TOCTOU race at checkout | Abandoned carts lock up stock indefinitely; needs expiry job |
| Soft hold (recommended) | Tentative hold at checkout start, TTL 15 min; confirmed on payment success | Stock only locked when user is actively checking out | Race between two users checking out same last item |
| Optimistic — no hold | Check stock at payment time; fail if unavailable | Zero lock contention; simplest | Payment processed then stock found gone → refund needed |
Soft-hold implementation with Redis:
```python
from redis import WatchError

HOLD_TTL_SECONDS = 900  # 15 minutes


def acquire_soft_hold(redis, item_id, user_id, qty):
    hold_key = "hold:" + item_id + ":" + str(user_id)
    stock_key = "stock:" + item_id

    with redis.pipeline() as pipe:
        while True:
            try:
                pipe.watch(stock_key)
                available = int(pipe.get(stock_key) or 0)
                if available < qty:
                    return False  # out of stock
                pipe.multi()
                pipe.decrby(stock_key, qty)
                pipe.setex(hold_key, HOLD_TTL_SECONDS, qty)
                pipe.execute()
                return True
            except WatchError:
                continue  # concurrent modification, retry


def confirm_hold(redis, item_id, user_id):
    # Delete the TTL key; stock was already decremented when the hold was taken
    redis.delete("hold:" + item_id + ":" + str(user_id))


def release_hold(redis, item_id, user_id, qty):
    # Compensation: put stock back, but only if the hold key still exists,
    # so running the release twice does not add the stock back twice.
    # Note: a hold that expires by TTL does not restore stock by itself;
    # a separate expiry handler is assumed for that case.
    if redis.delete("hold:" + item_id + ":" + str(user_id)):
        redis.incrby("stock:" + item_id, qty)
```
The WATCH/MULTI/EXEC pattern provides optimistic concurrency: if any other client modifies stock:{itemId} between the watch and execute, the transaction aborts and the client retries — eliminating the stock-going-negative race without any distributed locking.
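The retry loop is easier to follow without Redis in the way. Below is an in-memory sketch of the same optimistic pattern, assuming a hypothetical `VersionedCounter` whose version number plays the role of the watched key and an `interfere` hook that simulates another client writing between the read and the commit:

```python
class VersionedCounter:
    """In-memory stand-in for a watched key: every write bumps a version."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        if self.version != expected_version:
            return False  # someone else wrote since our read
        self.value = new_value
        self.version += 1
        return True


def reserve(counter, qty, interfere=None):
    while True:
        available, version = counter.read()
        if interfere is not None:
            interfere()       # another client acts between read and commit
            interfere = None  # only interfere once
        if available < qty:
            return False
        if counter.compare_and_set(version, available - qty):
            return True
        # Version moved: loop and retry with a fresh read.


stock = VersionedCounter(1)
rival_results = []
mine = reserve(stock, 1, interfere=lambda: rival_results.append(reserve(stock, 1)))
print(mine, rival_results, stock.value)
```

The rival takes the last unit while the first caller is holding a stale read. The first caller's commit is rejected, its retry sees zero stock, and the counter never goes negative.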
8. Price Guarantee
A user adds an item at $99. The price changes to $129 the next day. They check out two days later. Which price applies?
| Policy | Behaviour | User experience | Business impact |
|---|---|---|---|
| Cart price lock | Price at add-time locked for 24–48 h; after that, re-evaluate | Good: user sees the price they expected | Potential margin loss on price increases |
| Current price always | Cart always shows live price; checkout uses live price | Mixed: surprise at checkout if price changed | Maximises margin; simple to implement |
| Notify on change | Price snapshot stored; on change, show banner "price changed" | Good: user is informed before committing | Higher engineering cost; requires event-driven price feed |
| Lower of the two | Charge min(add-time price, checkout price) | Best for user: never worse than expected | Revenue impact on flash-sale recovery |
Amazon’s documented behaviour: cart shows current price; if the price changes while an item is in your cart, a notice appears at checkout. No price lock is applied. The trade-off: simplicity and accurate revenue over customer price-certainty.
Implementation of “notify on change”: when an item is added to cart, snapshot priceAtAdd in the cart record. A background job subscribes to the price-change event stream. When a price event fires for itemId, query all active carts containing that item (secondary index on DynamoDB) and write a priceChanged=true flag. Cart service reads this flag at checkout time and surfaces the banner.
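The handler for a single price event might look like the following sketch. The in-memory `carts` dict stands in for the secondary-index query, and the function name `flag_price_changes` and the `currentPrice` field are assumptions added for illustration:

```python
from decimal import Decimal

def flag_price_changes(carts, item_id, new_price):
    """Mark every cart line for item_id whose snapshot differs from new_price."""
    flagged = []
    for user_id, cart in carts.items():
        line = cart.get(item_id)
        if line is not None and line["priceAtAdd"] != new_price:
            line["priceChanged"] = True
            line["currentPrice"] = new_price
            flagged.append(user_id)
    return flagged

carts = {
    42: {"B007": {"qty": 1, "priceAtAdd": Decimal("99.00")}},
    43: {"B007": {"qty": 2, "priceAtAdd": Decimal("129.00")}},
    44: {"B001": {"qty": 1, "priceAtAdd": Decimal("29.99")}},
}

flagged_users = flag_price_changes(carts, "B007", Decimal("129.00"))
print(flagged_users)
```

User 43 added the item at the new price already and user 44 does not hold the item, so only user 42 sees the banner. `Decimal` is used so that price comparisons are exact.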
9. Capacity Estimate
| Metric | Estimate | Notes |
|---|---|---|
| Active carts | ~100 million | Amazon scale; each cart = one Redis Hash key |
| Cart size in Redis | ~500 bytes avg | ~10 items × 50 bytes each (itemId + qty + price JSON) |
| Total Redis memory | ~50 GB | 100M × 500B; fits in a large Redis cluster |
| Add-to-cart events/sec | ~10,000 req/s | Redis HSET; well under 1M ops/s capacity |
| Checkout transactions/sec | ~5,000 tx/s | Each involves 8 saga steps; saga orchestrator must scale horizontally |
| Peak multiplier (Prime Day) | 10× | 50,000 checkouts/sec; requires pre-scaled Redis and inventory service |
| Abandoned cart email jobs/day | ~2 million | Async; DynamoDB scan + SQS queue + email workers |
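The arithmetic behind the table can be checked in a few lines. The inputs are the table's own estimates; the memory figure counts raw payload only and ignores Redis per-key overhead, and the last line simply multiplies peak checkouts by the eight saga steps:

```python
active_carts = 100_000_000
items_per_cart = 10
bytes_per_item = 50

avg_cart_bytes = items_per_cart * bytes_per_item          # 500 bytes
total_redis_gb = active_carts * avg_cart_bytes / 1e9      # raw payload, in GB

checkouts_per_sec = 5_000
peak_multiplier = 10
saga_steps = 8

peak_checkouts_per_sec = checkouts_per_sec * peak_multiplier
peak_saga_steps_per_sec = peak_checkouts_per_sec * saga_steps

print(avg_cart_bytes, total_redis_gb, peak_checkouts_per_sec, peak_saga_steps_per_sec)
```

At peak the orchestrator has to drive on the order of 400,000 step executions per second, which is the concrete reason it must scale horizontally.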
10. Architecture Summary
The complete system breaks into five tiers:
Client tier: React/Native app with IndexedDB-backed cart. Every mutation is written locally first, queued for sync. Sync protocol sends delta ops to the cart API, receives merged state.
Cart API tier: Stateless Go/Java service. Reads/writes Redis for hot path. Writes DynamoDB asynchronously (write-behind cache). Exposes /cart/sync endpoint for delta merge.
Checkout orchestrator: Saga state machine backed by DynamoDB. Each saga instance is a record with sagaId, currentStep, status, and compensationLog. Orchestrator is idempotent — re-running the same saga from any step is safe.
Inventory service: Redis for real-time stock counters with optimistic locking. DynamoDB for authoritative stock. Publishes StockReserved and StockReleased events to Kafka.
Payment service: Wraps payment processor (Stripe/Braintree). Idempotency key = orderId to prevent double charges on retry. Emits PaymentSucceeded / PaymentFailed events consumed by the saga orchestrator.
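The idempotency key in the payment tier is what makes saga retries safe. A minimal sketch, using a hypothetical `FakePaymentProcessor` rather than any real processor SDK, shows the behaviour the orchestrator relies on:

```python
class FakePaymentProcessor:
    """Stand-in for a payment processor that honours idempotency keys."""

    def __init__(self):
        self._results = {}     # idempotency key -> stored result
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            # Retry of a request already processed: return the stored result.
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "status": "succeeded",
            "amount_cents": amount_cents,
            "charge_id": "ch_" + str(self.charges_made),
        }
        self._results[idempotency_key] = result
        return result


processor = FakePaymentProcessor()
order_id = "order-1001"

first = processor.charge(order_id, 9900)
retry = processor.charge(order_id, 9900)  # e.g. the first response timed out
print(processor.charges_made, first == retry)
```

Using the order ID as the key means a retried payment step can never produce a second charge for the same order.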
Amazon’s Dynamo paper (2007) on the shopping cart: an “Add to Cart” operation can never be forgotten or rejected. A shopping cart that cannot accept items is just an empty promise.
The most important design decision in this entire system is the one Amazon made in 2007: allow the cart to accept writes even when the system is degraded. A cart that rejects adds because a replica is down is worse for the customer than a cart that temporarily shows a slightly stale state. The OR-Set CRDT is the technical embodiment of that business decision. Every subsequent design choice — soft holds, saga compensation, delta sync — flows from this same principle: prefer availability and eventual consistency over strong consistency for user-facing cart operations.
The guest cart merge problem has a famous edge case: user adds 2 units as guest; their logged-in cart has 3. Amazon’s policy: keep max(2,3)=3, erring toward buying more.
The “merge guest cart on login” problem appears in every e-commerce system and has no universally correct answer. Amazon chose max(guestQty, loggedInQty) per item. Etsy has historically chosen “union all items, keep guest quantities for new items”. Neither is wrong — they reflect different product philosophies about what a cart means to the customer. In an interview, the right answer is: name the ambiguity, pick a policy, justify the trade-off.
At Amazon scale, a single checkout spans 50+ microservices. Each saga step has retry logic with exponential backoff + jitter. The orchestrator is a durable DynamoDB state machine.
Checkout sagas at Amazon scale are not simple sequential chains. Each step may itself fan out to multiple sub-services. The saga orchestrator maintains a fully durable state machine — if the entire orchestrator fleet is replaced during a deployment, in-flight sagas resume from their last committed step. Exponential backoff with jitter prevents thundering-herd retries during the payment processor brownouts that tend to happen precisely when checkout volume is highest.
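Exponential backoff with jitter is short enough to sketch in full. This version uses the "full jitter" variant (a uniform random delay between zero and the exponential ceiling); the function name `backoff_delays` and the default `base` and `cap` values are illustrative choices, not figures from any production system:

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0, rng=None):
    """Return one randomised delay (seconds) per retry attempt."""
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        delays.append(rng.uniform(0, ceiling))     # full jitter
    return delays

delays = backoff_delays(8, rng=random.Random(0))
for attempt, delay in enumerate(delays):
    print(attempt, round(delay, 3))
```

Because each client draws its own random delay, retries that start at the same moment spread out over time instead of hitting the payment processor in synchronised waves.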