System Design: Flash Sale — Surviving Black Friday and Limited-Stock Drops
The Interview Question
You're sitting across from the interviewer. They lean forward:
"Design a flash sale system. One thousand limited-edition sneakers go on sale at exactly 12:00 PM. Five hundred thousand users are waiting. The sale must be fair, there should be no overselling, bots must not win, the site must stay up, and all one thousand items must sell in under thirty seconds."
This is not hypothetical. Nike SNKRS drops, Supreme launches, Taylor Swift tickets, Nvidia GPU releases: these systems fail publicly and memorably. Let's design one that does not.
1. The Three Hard Problems
Every flash sale lives or dies by three interlocked failure modes. You cannot solve them independently.
Problem 1: Overselling. When 500,000 people simultaneously try to buy the last item, naive code sells it to all of them. Each request reads stock = 1, checks 1 > 0, then decrements. You end up at stock = -499,999 and a customer service catastrophe.
Problem 2: Thundering herd. At 12:00:00.000, every one of those 500,000 users clicks "Buy" simultaneously. Even if your system handles 10,000 req/sec normally, a 500,000 req/sec spike is 50x capacity. Servers fall over. The CDN does not help because these are authenticated purchase requests, not static assets.
Problem 3: Bot fairness. Automated buyers run on cloud VMs with 1ms latency and sub-millisecond click timing. A human might submit their request 200ms after the sale opens. A bot cluster is already done. Without addressing bots, 100% of inventory goes to resellers every time.
The layered solution we build below addresses each problem specifically, with each level building on the last.
2. Level 1 — Naive SQL
Level 1 The first instinct
Here is what every developer writes first:
```sql
-- Check availability
SELECT stock FROM products WHERE id = ?;
-- Application logic: if stock > 0, proceed
UPDATE products SET stock = stock - 1 WHERE id = ?;
INSERT INTO orders (user_id, product_id, created_at) VALUES (?, ?, NOW());
```
This has a classic check-then-act race condition. Between the SELECT and the UPDATE, another request can read the same stock value. If two requests both read stock = 1 and both evaluate 1 > 0 as true, they both proceed to UPDATE and INSERT. Stock becomes -1. You have sold an item you do not have.
At 500,000 concurrent users this does not fail occasionally — it fails for nearly every transaction.
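The interleaving can be reproduced deterministically without a database. The following is a minimal sketch that uses JavaScript generators as a stand-in for concurrent requests, where each `yield` models a database round-trip. The names `db`, `naivePurchase`, and `runInterleaved` are hypothetical and used only for illustration.

```javascript
// In-memory stand-in for the products row and orders table.
const db = { stock: 1, orders: [] };

// The Level 1 check-then-act flow as a generator: each `yield` marks a
// database round-trip where another request is free to run.
function* naivePurchase(userId) {
  const stock = db.stock;        // SELECT stock FROM products WHERE id = ?
  yield;
  if (stock > 0) {               // check uses a value that may now be stale
    db.stock -= 1;               // UPDATE products SET stock = stock - 1
    yield;
    db.orders.push(userId);      // INSERT INTO orders ...
  }
}

// Advance every request one step per round: one possible interleaving
// of two requests that arrive at the same moment.
function runInterleaved(requests) {
  let active = requests;
  while (active.length > 0) {
    active = active.filter((req) => !req.next().done);
  }
}

runInterleaved([naivePurchase("alice"), naivePurchase("bob")]);
console.log(db.stock, db.orders.length); // -1 2
```

Both requests read `stock = 1` before either writes, so both pass the check and the single unit is sold twice.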
3. Level 2 — Pessimistic Locking
Level 2 Row-level locks
The first real fix is a database transaction with an exclusive row lock:
```sql
BEGIN;
SELECT stock FROM products WHERE id = ? FOR UPDATE;
-- acquires exclusive row lock; others block here
-- application: if stock > 0:
UPDATE products SET stock = stock - 1 WHERE id = ?;
INSERT INTO orders (user_id, product_id) VALUES (?, ?);
COMMIT; -- releases lock; next waiter proceeds
```
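FOR UPDATE acquires an exclusive lock on the row before reading. Any other transaction that tries to lock or modify the same row (another SELECT ... FOR UPDATE, or an UPDATE) must block until this transaction commits or rolls back. Plain non-locking SELECTs still proceed and see the last committed value. Because every purchase path takes the lock first, this is correct: no more overselling.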
The problem is throughput. Every purchase attempt serializes through that single row lock. With 500,000 concurrent connections:
- Connection pool exhaustion — databases cap connections at 200–500
- Lock queue — transactions pile up waiting, consuming memory and file descriptors
- Lock timeout cascades — transactions waiting too long start failing, generating user-visible errors
- Database CPU hits 100% managing the lock queue
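Bottleneck: throughput is bounded by lock serialization. Roughly one purchase attempt completes per lock-hold period (the whole transaction, including its round-trips between application and database). That typically works out to 50–200 purchases/sec even on powerful hardware. Adequate for a normal sale; fatal for a flash sale.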
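The bound can be turned into arithmetic. This is a back-of-envelope sketch: the 5 ms and 20 ms lock hold times are assumptions chosen to bracket the 50–200/sec range above, and the function names are hypothetical. Attempts that find `stock = 0` still have to take the lock to learn that, so every one of the 500,000 attempts queues on the row.

```javascript
// Under SELECT ... FOR UPDATE on one hot row, transactions run one at a
// time, so throughput is capped at 1 / (lock hold time). Attempts that
// find stock = 0 still take the lock to learn that, so all of them queue.
function maxAttemptsPerSec(lockHoldMs) {
  return 1000 / lockHoldMs;
}

function secondsToServe(attempts, lockHoldMs) {
  return (attempts * lockHoldMs) / 1000;
}

// Assumed lock hold times bracketing the 50-200/sec range in the text.
for (const holdMs of [5, 20]) {
  console.log(
    `${holdMs} ms hold: ${maxAttemptsPerSec(holdMs)}/sec, ` +
      `${secondsToServe(500000, holdMs)} s for 500,000 attempts`
  );
}
// 5 ms hold: 200/sec, 2500 s for 500,000 attempts
// 20 ms hold: 50/sec, 10000 s for 500,000 attempts
```

Even at the optimistic end, the last of 500,000 attempts waits more than 40 minutes for its turn at the lock.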
4. Level 3 — Optimistic Locking
Level 3 Conflict detection, not prevention
Optimistic locking assumes conflicts are rare and detects them on write instead of preventing them on read. Add a version column:
```sql
-- Schema
ALTER TABLE products ADD COLUMN version INT NOT NULL DEFAULT 1;

-- Read without a lock
SELECT stock, version FROM products WHERE id = ?;
-- got: stock=5, version=42

-- Update only if version still matches what we read
UPDATE products
SET stock = stock - 1, version = version + 1
WHERE id = ?
  AND version = 42 -- must match what we read
  AND stock > 0;
-- rows affected = 0: lost the race, retry or fail
-- rows affected = 1: success, proceed to INSERT order
```
This allows concurrent reads (no lock held during SELECT) and detects conflicts at write time. Under normal concurrent load it performs very well. Under flash-sale load — 500,000 synchronized arrivals — the retry rate becomes catastrophic:
- 500,000 users attempt at T=0
- Only 1 succeeds with version=42; 499,999 get 0-rows-affected
- All 499,999 retry, colliding on version=43
- Without exponential backoff this creates a retry storm that is worse than the original problem
- Still hammers the database with ~499,999 failed write attempts per tick
Optimistic locking is excellent for typical web workloads but poorly suited for the pathology of a synchronized-start flash sale where every contender arrives simultaneously.
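The retry storm in the list above can be counted directly. This is an in-memory sketch under a simplifying assumption: every waiting contender re-reads the row at the same instant and retries immediately, with no backoff. `simulateOptimistic` is a hypothetical name. Each round has exactly one winner, so N contenders for S units (S ≤ N) cost S·N − S(S−1)/2 write attempts. For 500,000 contenders and 1,000 units, that is about 499.5 million writes to complete 1,000 sales.

```javascript
// In-memory model of the versioned row; no database involved.
// Assumes every waiting contender re-reads the row at the same instant
// and retries immediately (no backoff), as in a synchronized start.
function simulateOptimistic(contenders, stock) {
  const row = { stock, version: 1 };
  let waiting = contenders;
  let rounds = 0;
  let writeAttempts = 0;

  while (waiting > 0 && row.stock > 0) {
    rounds++;
    const seenVersion = row.version;   // everyone's SELECT sees the same version
    let winners = 0;
    for (let i = 0; i < waiting; i++) {
      writeAttempts++;
      // UPDATE ... WHERE id = ? AND version = seenVersion AND stock > 0
      if (row.version === seenVersion && row.stock > 0) {
        row.stock -= 1;
        row.version += 1;
        winners++;
      }
    }
    waiting -= winners;                // every loser retries next round
  }
  return { rounds, writeAttempts, stockLeft: row.stock };
}

const r = simulateOptimistic(5000, 100);
console.log(r.rounds, r.writeAttempts); // 100 495050
```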
5. Level 4 — Redis Atomic Decrement
Level 4 Move the hot path to Redis
Redis executes commands one at a time on a single main thread, so each command runs atomically. The DECR command reads and decrements an integer in a single indivisible operation: no locks, no transactions, no race conditions at the application level.
Pre-load inventory before the sale opens:
```
-- Before 12:00 PM: seed inventory counter
SET product:sneaker-001:stock 1000
SET product:sneaker-001:stock:initial 1000

-- At each purchase attempt:
remaining = DECR product:sneaker-001:stock
IF remaining >= 0:
  -- Reservation successful
  createOrderAsync(user_id, 'sneaker-001')
  RETURN "reserved"
ELSE:
  -- Compensate: undo the decrement
  INCR product:sneaker-001:stock
  RETURN "sold_out"
```
DECR is atomic at the command level. There is no window between reading and writing the value: Redis runs each command to completion before starting the next, so no other client's command can interleave. This eliminates the overselling race condition entirely.
Performance characteristics of a single Redis instance:
- Throughput: 100,000–200,000 DECR operations per second
- Latency: under 0.1ms typical on the same network
- No connection pool exhaustion (Redis handles thousands of concurrent connections cheaply via epoll)
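- Redis Cluster scales roughly linearly with shard count when load spreads across many keys, but a single stock counter lives on one shard, so one product's counter is still bounded by one node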
The Redis DECR approach has a subtle failure mode: if DECR succeeds (reservation made) but the subsequent database write for the order fails, you have decremented stock without creating a confirmed order. The inventory count is now wrong. The fix is a compensation step — if the DB write fails, immediately run INCR to restore the count. This is the smallest possible saga pattern: a two-step distributed transaction with a defined rollback operation.
This solves overselling at high throughput. But it does not yet solve the thundering herd — 500,000 requests still hammer your API layer simultaneously at T=0. And it does not address fairness.
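The reserve-then-compensate flow from this level can be sketched without a Redis server. Here a tiny in-memory class stands in for the two commands used, and the function that writes the order is injected so a database failure can be simulated. `FakeRedis`, `purchase`, and `flakyWrite` are hypothetical names. This is not a Redis client API.

```javascript
// Minimal in-memory stand-in for the two Redis commands used here. A JS
// Map touched from one thread gives each call the same all-or-nothing
// behaviour DECR and INCR have in Redis.
class FakeRedis {
  constructor() { this.data = new Map(); }
  set(key, value) { this.data.set(key, value); }
  get(key) { return this.data.get(key); }
  decr(key) { return this._add(key, -1); }
  incr(key) { return this._add(key, 1); }
  _add(key, delta) {
    const value = (this.data.get(key) ?? 0) + delta;
    this.data.set(key, value);
    return value;
  }
}

const STOCK_KEY = "product:sneaker-001:stock";

// Reserve with DECR, then write the order. Either failure path runs the
// compensating INCR, so the counter never drifts from the orders table.
function purchase(redis, orders, userId, writeOrder) {
  const remaining = redis.decr(STOCK_KEY);
  if (remaining < 0) {
    redis.incr(STOCK_KEY);        // sold out: undo our decrement
    return "sold_out";
  }
  try {
    writeOrder(orders, userId);   // stands in for INSERT INTO orders ...
    return "reserved";
  } catch (err) {
    redis.incr(STOCK_KEY);        // DB write failed: saga rollback step
    return "error";
  }
}

// Usage: 2 units, and a simulated DB failure for "carol".
const redis = new FakeRedis();
redis.set(STOCK_KEY, 2);
const orders = [];
const flakyWrite = (list, user) => {
  if (user === "carol") throw new Error("simulated DB outage");
  list.push(user);
};
const results = ["alice", "carol", "bob", "dave"].map((u) =>
  purchase(redis, orders, u, flakyWrite)
);
console.log(results, redis.get(STOCK_KEY), orders);
// results: reserved, error, reserved, sold_out; stock 0; orders alice, bob
```

Carol's failed write hands her unit back, which is why Bob can still buy it.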
6. Level 5 — The Pre-Sale Queue
Level 5 Decouple demand from fulfillment
The key insight: you do not need to process 500,000 requests simultaneously. You only need to sell 1,000 items. Everything else is waste. The virtual queue separates accepting demand (which must be instantaneous and massively parallel) from fulfilling orders (which is controlled and serial).
Queue Entry — Absorbing the T=0 Spike
When the user clicks "Buy Now":
```
-- NX: only add if member does not exist (one entry per user)
ZADD queue:sale:001 NX timestamp_ms() user_id
-- Their position in line (0-indexed)
position = ZRANK queue:sale:001 user_id
-- Estimated wait at current drain rate
estimated_wait_sec = position / drain_rate_per_sec
-- Respond to the user immediately
RETURN position, estimated_wait_sec, queue_token
```
The NX flag ensures a user can only enter the queue once (idempotent retries are safe). The sorted set scores by timestamp, so first-come-first-served ordering is enforced by Redis itself. A ZADD is O(log N) — at 500,000 entries, this is still well under 1ms.
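The queue-entry semantics can be modelled in memory to see why retries are safe. In this sketch, a Map stands in for the sorted set, sorting on demand rather than using Redis's skip list. Equal scores are broken by member name, as Redis does. `SortedSet` and `handleQueueEntry` are hypothetical names.

```javascript
// In-memory stand-in for the queue's sorted set: member -> score.
// Redis keeps a skip list for O(log N) rank; this sorts on demand for clarity.
class SortedSet {
  constructor() { this.scores = new Map(); }

  // ZADD key NX score member: returns 1 if added, 0 if already present.
  zaddNX(score, member) {
    if (this.scores.has(member)) return 0;
    this.scores.set(member, score);
    return 1;
  }

  // ZRANK key member: 0-based rank; equal scores order by member name,
  // matching Redis's lexicographic tie-break.
  zrank(member) {
    if (!this.scores.has(member)) return null;
    const ordered = [...this.scores].sort(
      ([ma, sa], [mb, sb]) => sa - sb || (ma < mb ? -1 : ma > mb ? 1 : 0)
    );
    return ordered.findIndex(([m]) => m === member);
  }
}

// Hypothetical queue-entry handler mirroring the pseudocode above.
function handleQueueEntry(queue, userId, nowMs, drainRatePerSec) {
  queue.zaddNX(nowMs, userId);              // a retry keeps the original score
  const position = queue.zrank(userId);
  return { position, estimatedWaitSec: position / drainRatePerSec };
}

const q = new SortedSet();
handleQueueEntry(q, "u1", 1000, 200);
handleQueueEntry(q, "u2", 1005, 200);
console.log(handleQueueEntry(q, "u2", 2000, 200)); // retry: still position 1
```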
Queue Processor — Controlled Drain
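```
-- Runs in a loop, one batch per second:
LOOP:
  -- Atomically pop up to 200 entries from the front (ZPOPMIN with count: Redis 5.0+)
  entries = ZPOPMIN queue:sale:001 200
  FOR EACH entry IN entries:
    remaining = DECR product:sneaker-001:stock
    IF remaining >= 0:
      createOrder(entry.user_id, 'sneaker-001')
      notifyUser(entry.user_id, 'purchased')
    ELSE:
      INCR product:sneaker-001:stock -- compensate
      notifyUser(entry.user_id, 'sold_out')
      -- The rest of this batch was already popped, so the drain below never sees it
      notifyUsersSoldOut(remaining entries in this batch)
      drainRemainingQueueAsSoldOut()
      EXIT LOOP -- a bare BREAK would only leave the FOR and keep polling
  sleep(1000ms)
```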
The processor runs at a rate you control. Set it to 200/sec and 1,000 items sell in about 5 seconds (1,000 / 200). The database sees a steady 200 writes/sec, well within capacity. Users are notified via WebSocket or server-sent events as their turn arrives.
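The drain loop's outcome can be checked with a pure counting model. In this sketch, each tick pops one batch, and one tick equals one second at the 1000 ms sleep. `simulateDrain` is a hypothetical name, and no Redis is involved. The model also reflects the sold-out sweep: the remainder of the batch that hits zero is notified together with everyone still queued.

```javascript
// Counts outcomes of the drain loop without Redis: each tick pops one
// batch (one tick = one second with the 1000 ms sleep).
function simulateDrain(queueLength, stock, batchSize) {
  let queued = queueLength;
  let purchased = 0;
  let soldOut = 0;
  let ticks = 0;
  let lastSaleTick = 0;

  while (queued > 0) {
    ticks++;
    const batch = Math.min(batchSize, queued);   // ZPOPMIN queue batchSize
    queued -= batch;
    const sold = Math.min(batch, stock);         // entries whose DECR stays >= 0
    stock -= sold;
    purchased += sold;
    if (sold > 0) lastSaleTick = ticks;
    if (sold < batch) {
      // Stock ran out mid-batch: the rest of this batch plus everyone still
      // queued is told "sold out" in one sweep, and the loop stops.
      soldOut += batch - sold + queued;
      queued = 0;
    }
  }
  return { purchased, soldOut, lastSaleTick, ticks };
}

const run = simulateDrain(500000, 1000, 200);
console.log(run.purchased, run.soldOut, run.lastSaleTick); // 1000 499000 5
```

Halving the batch size to 100 moves the last sale to tick 10 without changing who gets notified.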
7. Level 6 — Anti-Bot Measures
Level 6 Making bots pay the human tax
A fair queue means nothing if automated buyers monopolize the first positions. Anti-bot layers must be enforced at queue entry, not at checkout.
Rate Limiting with Redis Fixed-Window Counters
```
-- Max 1 queue-entry attempt per user per 60-second window
key = "ratelimit:user:" + user_id + ":" + floor(now_sec / 60)
count = INCR key
IF count == 1: EXPIRE key 120 -- TTL outlives the window boundary
IF count > 1: RETURN "rate_limited"

-- IP-level: max 3 distinct users per IP per minute (catches bot farms).
-- Only a user's first attempt per window reaches this point, so the
-- counter counts distinct users.
ip_key = "ratelimit:ip:" + client_ip + ":" + floor(now_sec / 60)
ip_count = INCR ip_key
IF ip_count == 1: EXPIRE ip_key 120 -- without this, IP keys never expire
IF ip_count > 3: RETURN "rate_limited"
```
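The two checks can be exercised in memory to confirm their behaviour. This sketch models INCR with an expiry set on the first increment, and passes time in as a parameter so it can be tested. `ExpiringCounters` and `checkQueueEntry` are hypothetical names. Keying on `floor(now / 60)` makes these fixed-window counters: a user who tries at second 59 and again at second 60 passes both checks.

```javascript
// In-memory counters with expiry, mirroring INCR followed by EXPIRE on the
// first increment. Time is passed in so the behaviour is testable.
class ExpiringCounters {
  constructor() { this.entries = new Map(); }   // key -> { count, expiresAt }

  incr(key, nowSec, ttlSec) {
    const e = this.entries.get(key);
    if (!e || e.expiresAt <= nowSec) {
      this.entries.set(key, { count: 1, expiresAt: nowSec + ttlSec });
      return 1;
    }
    e.count += 1;
    return e.count;
  }
}

const WINDOW_SEC = 60;
const TTL_SEC = 120;          // outlives the window, as in the pseudocode
const MAX_USERS_PER_IP = 3;

function checkQueueEntry(counters, userId, clientIp, nowSec) {
  const windowId = Math.floor(nowSec / WINDOW_SEC);
  const userCount = counters.incr(`ratelimit:user:${userId}:${windowId}`, nowSec, TTL_SEC);
  if (userCount > 1) return "rate_limited";
  // Only a user's first attempt per window gets here, so this counter
  // counts distinct users per IP per window.
  const ipCount = counters.incr(`ratelimit:ip:${clientIp}:${windowId}`, nowSec, TTL_SEC);
  if (ipCount > MAX_USERS_PER_IP) return "rate_limited";
  return "ok";
}
```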
Multi-Layer Bot Defence
Account age gate. Bots register new accounts for each sale. Require accounts to be at least 30 days old. This forces operators to maintain aged accounts — expensive at scale and detectable by statistical clustering.
CAPTCHA before queue entry. Present an invisible CAPTCHA solved before the sale starts, not at T=0 when every second counts. Humans solve it during the countdown; bots that skip it are rejected at queue entry.
Behavioral fingerprinting. Bots exhibit characteristic timing signatures:
- Click arrives within 5ms of sale start — human reaction time is 150–300ms minimum
- Mouse path is a direct straight line from page load to the buy button with zero deviation
- No scroll events, no hover delay, no micro-pauses before clicking
- HTTP headers inconsistent with the declared browser version
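Two of the signals above can be expressed as simple rules. This is a hypothetical sketch, not a production detector. The 5 ms threshold comes from the text, while the 0.5 px straight-line tolerance is an assumption. Real systems combine many weak signals into a score (like the `bot_score` field in the token payload) rather than relying on hard rules. The straight-line check measures the largest perpendicular distance of any sampled mouse point from the line joining the first and last samples.

```javascript
// Largest perpendicular distance (px) of any sampled point from the straight
// line through the first and last points of the mouse path.
function maxDeviationFromLine(points) {
  const [x1, y1] = points[0];
  const [x2, y2] = points[points.length - 1];
  const length = Math.hypot(x2 - x1, y2 - y1);
  if (length === 0) return 0;
  let max = 0;
  for (const [x, y] of points) {
    // |cross product| / segment length = distance from the line
    const d = Math.abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length;
    max = Math.max(max, d);
  }
  return max;
}

// Illustrative thresholds only: 5 ms from the text, 0.5 px assumed.
function botSignals({ clickMs, saleStartMs, mousePath, scrollEvents }) {
  const flags = [];
  if (clickMs - saleStartMs < 5) flags.push("click_within_5ms");
  if (mousePath.length >= 2 && maxDeviationFromLine(mousePath) < 0.5) {
    flags.push("straight_line_path");
  }
  if (scrollEvents === 0) flags.push("no_scroll");
  return flags;
}
```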
Device-bound participation token. Issue a signed token 5–10 minutes before the sale. The token binds to a browser fingerprint (canvas hash, WebGL renderer string, installed fonts, screen resolution). Same device cannot join the queue twice:
```
-- Participation token payload (signed with HMAC-SHA256)
{
  "user_id": "u_abc123",
  "sale_id": "sale_2026_sneaker_001",
  "device_hash": "sha256_of_fingerprint_components",
  "issued_at": 1748984400,
  "expires_at": 1748988000,
  "bot_score": 0.02
}
-- Token is single-use: claim it on first queue entry. NX makes the
-- check-and-mark one atomic step; a nil reply means it was already used.
SET token:used:sha256(token) 1 NX EX 7200
```
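Signing and single-use redemption can be sketched with Node's built-in `crypto` module. This uses a hand-rolled `body.signature` format (base64url JSON plus an HMAC-SHA256 tag), not a standards-compliant JWT. A production system would more likely use a JWT library. The secret, `signToken`, and `redeemToken` are hypothetical. An in-memory `Set` stands in for the `token:used:` keys; in Redis, the same atomic claim is `SET` with the `NX` option. This requires Node 16+ for the `base64url` encoding.

```javascript
const crypto = require("crypto");

// Hypothetical secret for illustration; load a real one from a secrets store.
const SECRET = "demo-only-secret";

const b64url = (data) => Buffer.from(data).toString("base64url");
const hmac = (body) => crypto.createHmac("sha256", SECRET).update(body).digest();

// Token = base64url(JSON payload) + "." + base64url(HMAC-SHA256 of that text).
function signToken(payload) {
  const body = b64url(JSON.stringify(payload));
  return `${body}.${b64url(hmac(body))}`;
}

// In-memory stand-in for the token:used:<hash> keys.
const usedTokens = new Set();

function redeemToken(token, deviceHash, nowSec) {
  const [body, sig] = token.split(".");
  if (!body || !sig) return "malformed";
  const expected = hmac(body);
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return "bad_signature";
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (nowSec >= payload.expires_at) return "expired";
  if (payload.device_hash !== deviceHash) return "wrong_device";
  const id = crypto.createHash("sha256").update(token).digest("hex");
  if (usedTokens.has(id)) return "already_used"; // SET ... NX would return nil here
  usedTokens.add(id);
  return "ok";
}
```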
Per-sale purchase cap. One account, one item, enforced at queue processing time:
```
purchased_key = "purchased:" + sale_id + ":" + user_id
IF EXISTS purchased_key: RETURN "already_purchased"
-- Set on success:
SET purchased_key 1 EX 86400
```
Nike SNKRS drops are notoriously competitive, often with 100,000 people competing for 1,000 pairs. Nike moved to a randomized draw model instead of first-come-first-served specifically to neutralize bots. Speed buys nothing in a random draw: submitting faster gives zero advantage because the draw happens at a fixed cutoff time and all entries before that moment have equal probability. The queue-based approach can adopt the same idea: randomize queue order among entries that arrive within the first 500ms (the human reaction window).
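The randomized-window idea can be sketched as a reordering step applied before draining. This assumes each entry carries an arrival timestamp. `drawOrder` and `arrivedMs` are hypothetical names. The shuffle is Fisher-Yates, which gives every early entrant an equal chance at every position when `rng` is uniform. For a real draw, a cryptographically secure random source is a safer choice than `Math.random`.

```javascript
// Randomizes order among entries that arrived within `windowMs` of sale start
// (Fisher-Yates shuffle); later arrivals keep first-come-first-served order.
// `rng` is injectable so the result can be made deterministic in tests.
function drawOrder(entries, saleStartMs, windowMs = 500, rng = Math.random) {
  const byArrival = [...entries].sort((a, b) => a.arrivedMs - b.arrivedMs);
  const cutoff = saleStartMs + windowMs;
  const early = byArrival.filter((e) => e.arrivedMs < cutoff);
  const late = byArrival.filter((e) => e.arrivedMs >= cutoff);
  for (let i = early.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));   // uniform index in [0, i]
    [early[i], early[j]] = [early[j], early[i]];
  }
  return [...early, ...late];
}
```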
8. Level 7 — The Waiting Room
Level 7 Absorb pre-sale load on the CDN
The waiting room is a completely static HTML page served from the CDN edge. It collects users before the sale opens, pre-validates them, and releases a controlled burst at T=0.
Timeline
T-60 min: Users visit the product page and are served a redirect to the waiting room. This is a static file on CloudFront or Cloudflare: no backend load, very high concurrency, and low latency from nearby edge locations.
T-10 min: The waiting room begins accepting "intent registrations." The page sends the user's auth token to a lightweight validation endpoint which checks account age, purchase history, and device fingerprint, then issues a queue entry JWT valid for 15 minutes.
T=0: The waiting room JavaScript detects the countdown reaching zero — either by local clock or by a server-sent event — and fires the queue entry request with the pre-validated JWT. Since validation already happened, queue entry is a single Redis ZADD with no database calls and no auth overhead.
```javascript
// Waiting room countdown fires queue entry at T=0.
// queueEntryJWT (issued at the T-10 validation step), countdownEl and
// enterQueue are assumed to be defined elsewhere on the page.
fetch('/api/sale/sneaker-001/start-time')
  .then(function (r) { return r.json(); })
  .then(function (cfg) {
    var saleStart = new Date(cfg.start_time).getTime();
    var tick = setInterval(function () {
      // Uses the local clock, so a skewed client clock fires early or late.
      var remaining = saleStart - Date.now();
      if (remaining <= 0) {
        clearInterval(tick);
        enterQueue(queueEntryJWT); // single Redis ZADD on the server
        return;
      }
      var secs = Math.floor(remaining / 1000);
      var mins = Math.floor(secs / 60);
      var pad = (secs % 60) < 10 ? '0' : '';
      countdownEl.textContent = mins + ':' + pad + (secs % 60);
    }, 100);
  });
```
The key benefit: without the waiting room, T=0 triggers simultaneous authentication + authorization + bot-check + inventory operation for 500,000 users. With the waiting room, authentication is distributed over 10 minutes before the sale, and T=0 is reduced to a single Redis call per user.
Cloudflare Waiting Room and Queue-it sell exactly this pattern as managed products. A waiting room is fundamentally just a static countdown page with a WebSocket or SSE connection. The infrastructure cost to serve 500,000 people a CDN-cached countdown timer is essentially zero — a few dollars in bandwidth. The value is entirely in the controlled transition: at T=0, you decide exactly how many requests per second migrate from the waiting room to your real backend.
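The text leaves the release mechanism open. A token bucket is one common way to cap how many users per second leave the waiting room. This is a sketch under that assumption, and it makes no claim about how Cloudflare Waiting Room or Queue-it work internally. `TokenBucket`, the rate, and the capacity are illustrative. The rate sets the sustained release speed, and the capacity bounds the burst at T=0.

```javascript
// Token bucket for releasing users from the waiting room at a fixed rate.
// `nowMs` is passed in for testability; names and numbers are illustrative.
class TokenBucket {
  constructor(ratePerSec, capacity, nowMs) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity;
    this.lastMs = nowMs;
  }

  // Returns true if one more user may move to the backend right now.
  tryAdmit(nowMs) {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsed * this.ratePerSec) / 1000);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Example: release 2,000 users/sec with a burst allowance of 200.
const gate = new TokenBucket(2000, 200, Date.now());
```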
9. Handling Payment Failures
A user reaches the front of the queue, their slot is reserved, stock decremented — then their payment fails. What happens to that inventory unit?
The Soft Hold Pattern
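```
-- Queue processor reserves a slot: create a hold with a TTL
order_id = uuid()
SET hold:sneaker-001:<order_id> user_id EX 300 -- 5-minute TTL
INSERT INTO orders (id, user_id, product_id, status, hold_expires_at)
VALUES (order_id, user_id, 'sneaker-001', 'pending_payment',
        NOW() + INTERVAL '5 minutes'); -- PostgreSQL interval syntax

-- Each exit from pending_payment is conditional, so a payment callback
-- and the expiry sweep can never both settle the same order.

-- On successful payment:
rows = UPDATE orders SET status = 'confirmed'
       WHERE id = order_id AND status = 'pending_payment';
IF rows == 1:
  DEL hold:sneaker-001:<order_id>
  SET purchased:sale_id:user_id 1 EX 86400
ELSE:
  refund(order_id) -- hold already expired and its unit was released

-- On payment failure:
rows = UPDATE orders SET status = 'cancelled'
       WHERE id = order_id AND status = 'pending_payment';
DEL hold:sneaker-001:<order_id>
IF rows == 1: INCR product:sneaker-001:stock -- release unit back to inventory
```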
A background worker sweeps for expired holds every 60 seconds:
```
-- Cleanup job: runs every 60 seconds
expired = SELECT id, user_id FROM orders
          WHERE status = 'pending_payment' AND hold_expires_at < NOW();
FOR EACH order IN expired:
  rows = UPDATE orders SET status = 'expired'
         WHERE id = order.id AND status = 'pending_payment';
  IF rows == 1: -- skip orders a payment callback settled in the meantime
    INCR product:sneaker-001:stock -- reclaim the unit
    notifyUser(order.user_id, 'hold_expired')
```
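The race the sweep has to survive is a payment callback landing at the same moment the hold expires. This sketch models the hold lifecycle in memory. Every transition out of `pending_payment` is a compare-and-set, and a unit returns to stock only when that transition actually happened. As a result, a late payment failure after expiry cannot release the same unit twice, and a late payment success is detected instead of silently overselling. `HoldLedger` and its methods are hypothetical names.

```javascript
// In-memory model of the soft-hold lifecycle. Leaving "pending_payment" is a
// compare-and-set, like
//   UPDATE orders SET status = ? WHERE id = ? AND status = 'pending_payment'
// and a unit goes back to stock only when that update changed a row.
class HoldLedger {
  constructor(stock) {
    this.stock = stock;            // stands in for product:sneaker-001:stock
    this.orders = new Map();       // order_id -> { status, expiresAtMs }
  }

  reserve(orderId, nowMs, ttlMs = 300000) {
    if (this.stock <= 0) return false;
    this.stock -= 1;               // the queue processor's DECR
    this.orders.set(orderId, { status: "pending_payment", expiresAtMs: nowMs + ttlMs });
    return true;
  }

  // True only if this call moved the order out of pending_payment.
  transition(orderId, newStatus) {
    const order = this.orders.get(orderId);
    if (!order || order.status !== "pending_payment") return false;
    order.status = newStatus;
    return true;
  }

  // false means the hold already expired or was cancelled: refund the charge.
  paymentSucceeded(orderId) {
    return this.transition(orderId, "confirmed");
  }

  paymentFailed(orderId) {
    if (this.transition(orderId, "cancelled")) this.stock += 1;
  }

  sweepExpired(nowMs) {
    for (const [id, order] of this.orders) {
      if (order.expiresAtMs < nowMs && this.transition(id, "expired")) this.stock += 1;
    }
  }
}
```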
Inventory Consistency Invariant
At all times the following must hold true. Run this as a monitoring query and alert on any divergence:
```
-- The invariant:
--   redis_stock = initial_stock
--               - COUNT(confirmed orders)
--               - COUNT(pending_payment orders)
-- Pending orders past hold_expires_at still count until the sweeper
-- releases them; otherwise every unswept hold would read as divergence.
redis_stock == initial_stock
  - (SELECT COUNT(*) FROM orders WHERE status = 'confirmed')
  - (SELECT COUNT(*) FROM orders WHERE status = 'pending_payment')
-- Alert threshold: abs(divergence) > 1
-- Expected divergence in normal operation: 0
```
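The monitoring check reduces to a small pure function. This is a sketch with hypothetical names that reads orders from an array rather than SQL. It counts every `pending_payment` order, including holds past their expiry that the sweeper has not yet released, so the 60-second sweep interval does not show up as false divergence.

```javascript
// Mirrors the invariant: Redis should equal initial stock minus confirmed
// orders minus every pending_payment order (expired or not, since an expired
// hold keeps its unit until the sweeper releases it).
function inventoryDivergence(redisStock, initialStock, orders) {
  let confirmed = 0;
  let pending = 0;
  for (const o of orders) {
    if (o.status === "confirmed") confirmed += 1;
    else if (o.status === "pending_payment") pending += 1;
  }
  return redisStock - (initialStock - confirmed - pending);
}

const shouldAlert = (divergence) => Math.abs(divergence) > 1;

const sample = [
  { status: "confirmed" },
  { status: "confirmed" },
  { status: "pending_payment" },
  { status: "cancelled" },
  { status: "expired" },
];
console.log(inventoryDivergence(7, 10, sample)); // 0
```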
Concert ticketing platforms like Ticketmaster show the same pattern from the outside. The countdown timer you see while completing your purchase is a soft hold: if you abandon checkout, those seats return to inventory automatically when the hold expires. The checkout window is not just UX; it is the hold's TTL, and a background sweep like the one above is what catches seats abandoned mid-payment.
10. Capacity Estimates
| Metric | Value |
|---|---|
| Users waiting at T=0 | 500,000 |
| Queue entry requests at T=0 (burst) | ~500,000/sec |
| Redis ZADD throughput (single node) | ~100,000–200,000/sec unpipelined (as in Level 4); 500,000/sec needs pipelining |
| API pods needed for queue entry (10k req/s each) | 50 pods |
| Controlled queue drain rate | 200/sec |
| Time to sell all 1,000 items | ~5 seconds |
| Steady-state DB write rate (order creation) | 200/sec |
| Users notified "sold out" | ~499,000 |
| Active soft hold keys at peak | up to 1,000 (one per reserved unit, TTL 300s) |
| Redis memory for full queue (500k entries × ~50 bytes) | ~25 MB |
| Waiting room CDN cost (500k users, static page) | ~$0.50 |
The numbers reveal an important inversion: the hard engineering problem is not the 1,000 successful transactions — it is gracefully handling the 499,000 failures. Each rejection requires a polite notification. That is 499,000 WebSocket messages or SSE events to deliver, plus queue cleanup in Redis, plus user-facing messaging. Design your notification pipeline to handle this throughput before the first sale.
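The derived rows of the capacity table can be recomputed from its inputs, which makes it easy to rerun the estimate for a different drop. This is a minimal sketch: the input values are the table's own assumptions, and it treats the 500,000-user burst as arriving within one second. `capacity` is a hypothetical helper.

```javascript
// Recomputes the derived rows of the capacity table from its inputs.
// Input values are the table's own assumptions.
const inputs = {
  usersAtT0: 500000,          // treated as arriving within one second
  reqPerSecPerPod: 10000,
  drainPerSec: 200,
  itemsForSale: 1000,
  bytesPerQueueEntry: 50,
};

function capacity(i) {
  return {
    apiPods: Math.ceil(i.usersAtT0 / i.reqPerSecPerPod),
    secondsToSellOut: i.itemsForSale / i.drainPerSec,
    soldOutNotifications: i.usersAtT0 - i.itemsForSale,
    queueMemoryMB: (i.usersAtT0 * i.bytesPerQueueEntry) / 1e6,
  };
}

const est = capacity(inputs);
console.log(est);
// Expected values: apiPods 50, secondsToSellOut 5,
// soldOutNotifications 499000, queueMemoryMB 25
```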
11. Failure Modes and Recovery
What happens when components go down mid-sale?
Redis failure. Both the queue and the inventory counter live in Redis. If Redis goes down, you cannot accept new queue entries or decrement stock. Mitigations:
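- Redis Sentinel or Redis Cluster with automatic failover. Failure-detection timeouts are configured in seconds, so plan for a multi-second pause. Replication is asynchronous, so writes acknowledged just before a failover can be lost; reconcile Redis against the orders table afterwards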
- Accept that a brief Redis outage pauses the sale; surface a "Technical difficulty — please wait" banner
- Never run a flash sale without Redis replication. A single Redis node is a single point of failure
Queue processor crash. If the worker processing the queue crashes mid-drain, items may be in a gap between "popped from sorted set" and "order written to DB." Use a two-set approach for safe handoff:
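```
-- MULTI/EXEC cannot do this move: queued commands only return their
-- results at EXEC, so ZADD cannot consume ZPOPMIN's output. A Lua script
-- runs atomically on the server and can. (In Redis Cluster, both keys
-- must hash to the same slot; give them a common hash tag.)
EVAL "local popped = redis.call('ZPOPMIN', KEYS[1], ARGV[1])
      for i = 1, #popped, 2 do
        redis.call('ZADD', KEYS[2], ARGV[2], popped[i])
      end
      return popped" 2 queue:sale:001 queue:processing 200 <now_ms>

-- Only remove from processing set after DB write confirmed:
ZREM queue:processing user_id
-- On worker restart: re-process anything stuck in queue:processing,
-- skipping users who already have an order (their DECR may have happened).
-- Items stuck > 30s are stale; re-inject to front of main queue
```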
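The handoff and recovery logic can be modelled in memory to check that nothing is lost when a worker dies mid-batch. In this sketch, an array stands in for the main queue and a Map for `queue:processing`, and `claim` plays the role of the atomic pop-and-park. `QueueWithProcessing` and its methods are hypothetical names. Re-processing still has to be idempotent, because a recovered entry may already have had its DECR or order write applied.

```javascript
// In-memory model of the two-set handoff: entries move from the main queue
// to a "processing" map in one step, are acked after the DB write, and are
// re-injected at the front if the worker dies before acking.
class QueueWithProcessing {
  constructor(userIds) {
    this.queue = [...userIds];       // stands in for queue:sale:001 (front = index 0)
    this.processing = new Map();     // stands in for queue:processing: userId -> poppedAtMs
  }

  // Atomic pop-and-park (the Lua script's job in Redis).
  claim(count, nowMs) {
    const batch = this.queue.splice(0, count);
    for (const id of batch) this.processing.set(id, nowMs);
    return batch;
  }

  ack(userId) {
    this.processing.delete(userId);  // ZREM queue:processing user_id
  }

  // On restart: anything parked longer than staleMs goes back to the front,
  // in its original pop order, ahead of entries that were never claimed.
  recover(nowMs, staleMs = 30000) {
    const stale = [...this.processing]
      .filter(([, at]) => nowMs - at > staleMs)
      .map(([id]) => id);
    for (const id of stale) this.processing.delete(id);
    this.queue.unshift(...stale);
    return stale;
  }
}
```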
Database overload. If the database cannot sustain 200 writes/sec (unlikely on modern hardware, possible under high I/O contention):
- Switch to batched multi-row INSERT: accumulate 200 order records, insert as one statement per tick
- Temporarily reduce drain rate — 100/sec still sells 1,000 items in 10 seconds
- Write orders to a Kafka topic and let the DB consumer work at its own pace
Clock skew across API pods. Queue positions are sorted by arrival timestamp. If API servers disagree on the current time by ±50ms, queue ordering within that window is non-deterministic. This is acceptable — simultaneous arrival is indistinguishable from near-simultaneous arrival, and the window is far smaller than human reaction time differences. Use NTP with a local time server if tighter ordering matters.
12. Complete Architecture at a Glance
Summary: The Six-Layer Defence
| Layer | Problem Solved | Mechanism |
|---|---|---|
| Redis atomic DECR | Overselling | Atomic read-decrement; no application-level race |
| Virtual queue | Thundering herd | Absorb 500k burst; drain at controlled 200/sec |
| Rate limiting | Bot request spam | Per-user and per-IP Redis fixed-window counters |
| Account requirements | Throwaway bot accounts | 30-day age gate, prior purchase requirement |
| Device-bound token | Multi-entry bots | Fingerprint-bound JWT, single-use enforcement |
| Waiting room (CDN) | Pre-sale load spike | Static page absorbs crowd; pre-validates users |
The answer the interviewer is looking for is not "use Redis." It is the recognition that flash sales have three distinct failure modes — overselling, thundering herd, and bot fairness — each requiring a different mechanism, and that the virtual queue is the architectural cornerstone that makes the other layers composable. Without the queue, you are applying point fixes to a fundamentally broken request flow.
The real challenge in production is not the 1,000 successful sales. It is the 499,000 graceful failures — delivered fast, politely, without crashing anything.