System Design: The Like Button — Counting at Billions of Clicks per Second
Facebook’s “Reactions” (Like, Love, Haha, Wow, Sad, Angry) are architecturally the same problem: just 6 counters per post instead of 1. The 2016 launch added ~6x write load to their like infrastructure overnight.
Design the “Like” button for YouTube. At peak, a viral video receives 500,000 likes per minute. Likes must be accurate, fast, and eventually consistent. Users should see a near-real-time count. Likes must be idempotent — clicking twice must not double-count.
This is one of the most common system design interview questions. It looks trivial. It is not.
1. Scale & Constraints
The read:write asymmetry is critical. For every person clicking Like, roughly 1,000 users are just viewing the count. This means the display path must be ultra-cheap (cache-heavy), while the write path can tolerate slightly more latency and can be eventually consistent.
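As a rough sanity check on that asymmetry, the per-video rates can be worked out directly. This is only back-of-the-envelope arithmetic: the 500,000 likes/minute figure comes from the problem statement, the 1,000:1 ratio is the estimate above, and the variable names are illustrative.

```python
# Back-of-the-envelope rates for one viral video.
LIKES_PER_MINUTE = 500_000   # peak from the problem statement
READS_PER_WRITE = 1_000      # assumed read:write ratio from the text

writes_per_sec = LIKES_PER_MINUTE / 60
reads_per_sec = writes_per_sec * READS_PER_WRITE

print(f"writes/sec: {writes_per_sec:,.0f}")  # writes/sec: 8,333
print(f"reads/sec:  {reads_per_sec:,.0f}")   # reads/sec:  8,333,333
```

Under these assumptions the read path sees millions of requests per second for one video, which is why the count has to be served from cache rather than from the database.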
The constraints immediately rule out naive relational approaches. Let’s walk through each level.
2. Level 1 — Naive SQL
The first instinct is to increment a counter in the database:
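-- Idempotency via unique constraint on (user_id, video_id):
-- a repeat click changes nothing (0 rows affected)
INSERT INTO user_likes (user_id, video_id)
VALUES ('user_123', 'video_abc')
ON DUPLICATE KEY UPDATE user_id = user_id;

-- Like a video: run this only when the INSERT above added a row,
-- otherwise a repeat click would double-count
UPDATE videos SET like_count = like_count + 1 WHERE id = 'video_abc';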
Why it breaks at scale:
The UPDATE videos SET like_count = like_count + 1 statement acquires a row-level lock on that single row for the duration of the transaction. At 8,333 writes/sec (500,000 likes per minute ÷ 60) all hitting the same video_abc row, you get:
- Lock contention: writes queue up, latency climbs from milliseconds to seconds
- Connection pool exhaustion: threads holding locks block new connections
- Thundering herd: a cache expiry causes all readers to hit DB simultaneously
The idempotency table (user_likes) also creates a secondary write on every like, doubling DB load. And reads at 1000× the write rate hit the same DB unless you add read replicas — which don’t help write throughput at all.
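To make the Level 1 behaviour concrete, here is a minimal sketch of the "unique constraint plus counter" pattern. It uses SQLite from the Python standard library as a stand-in for MySQL so that it runs anywhere; `INSERT OR IGNORE` is SQLite syntax, and the `like` and `like_count` helper names are illustrative, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id TEXT PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE user_likes (
        user_id  TEXT NOT NULL,
        video_id TEXT NOT NULL,
        PRIMARY KEY (user_id, video_id)   -- enforces one like per user per video
    );
    INSERT INTO videos (id) VALUES ('video_abc');
""")

def like(user_id: str, video_id: str) -> bool:
    """Record a like; return True only the first time this user likes this video."""
    with conn:  # one transaction: both statements commit, or neither does
        cur = conn.execute(
            "INSERT OR IGNORE INTO user_likes (user_id, video_id) VALUES (?, ?)",
            (user_id, video_id),
        )
        if cur.rowcount == 0:
            return False  # duplicate click: the primary key rejected the row
        conn.execute(
            "UPDATE videos SET like_count = like_count + 1 WHERE id = ?",
            (video_id,),
        )
        return True

def like_count(video_id: str) -> int:
    row = conn.execute("SELECT like_count FROM videos WHERE id = ?", (video_id,)).fetchone()
    return row[0]

print(like("user_123", "video_abc"))  # True
print(like("user_123", "video_abc"))  # False (second click is a no-op)
print(like_count("video_abc"))        # 1
```

The logic is correct, but every like still performs two writes, and every like on the same video serializes on the same `videos` row, which is exactly the contention described above.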
Verdict: Works for a small site. Fails at YouTube scale on a single hot video.
3. Level 2 — Write-Through Redis Counter
Replace the DB write with an atomic Redis operation:
# Idempotency: only INCR if the user hasn't liked yet
SET user:liked:user123:abc123 1 NX EX 86400
# NX = only set if Not eXists → returns OK or nil
# EX 86400 = expire after 24h (memory management)

# Atomic increment, O(1), no locks needed (run only when SET returned OK)
INCR video:likes:abc123
# Returns: (integer) 500001

# Read the count (served from Redis, ~100k ops/sec)
GET video:likes:abc123
Redis INCR is atomic because Redis is single-threaded for command execution: no locks, no contention. That is one of the reasons Redis counters are so popular for this exact use case. A single Redis node handles ~100,000 simple operations/second, which comfortably covers 8,333 writes/sec for one video.
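The SET NX plus INCR flow can be sketched without a Redis server. `TinyKV` below is a hypothetical in-memory stand-in that mimics only the semantics of those two commands; it is not the redis-py API and it ignores expiry (EX).

```python
from typing import Optional

class TinyKV:
    """Hypothetical in-memory stand-in for SET ... NX and INCR (no TTL support)."""

    def __init__(self) -> None:
        self._data = {}

    def set_nx(self, key: str, value: str) -> bool:
        """Mimic SET key value NX: store only if the key is absent."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key: str) -> int:
        """Mimic INCR: treat a missing key as 0, add 1, return the new value."""
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

kv = TinyKV()

def like(user_id: str, video_id: str) -> Optional[int]:
    """Return the new count, or None if this user already liked the video."""
    if not kv.set_nx(f"user:liked:{user_id}:{video_id}", "1"):
        return None
    return kv.incr(f"video:likes:{video_id}")

first = like("user123", "abc123")
repeat = like("user123", "abc123")
other = like("user456", "abc123")
print(first, repeat, other)  # prints: 1 None 2
```

Each command is atomic on its own, but the pair is not: a crash between the SET and the INCR would record the user as having liked without bumping the counter. A real deployment might wrap both in a Lua script or a MULTI/EXEC transaction.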
Problems with Level 2:
- Redis is in-memory: if the node crashes, like counts reset to zero
- The idempotency keys (user:liked:…) use significant memory across millions of users and videos
- We still haven’t addressed persistence to a database
4. Level 3 — Write-Behind with Batch Flush
The solution to the durability problem: keep Redis as the live counter, but asynchronously flush deltas to the database.
Redis (live counter) → flush every 30s → database (source of truth)
import time

import mysql.connector
import redis

r = redis.Redis()
db = mysql.connector.connect(...)

def flush_like_counts():
    cursor = db.cursor()
    # SCAN iterates incrementally, so it does not block Redis the way KEYS would
    for key in r.scan_iter("video:likes:delta:*"):
        # GETDEL (Redis 6.2+) atomically reads and removes the delta key,
        # so likes that arrive during the flush land in a fresh key
        delta = r.getdel(key)
        if delta:
            video_id = key.decode().split(':')[-1]
            cursor.execute(
                "UPDATE videos SET likes = likes + %s WHERE id = %s",
                (int(delta), video_id)
            )
            db.commit()
    cursor.close()

while True:
    flush_like_counts()
    time.sleep(30)  # flush every 30 seconds
Two Redis keys per video:
# Live display counter (absolute, loaded from DB + live delta)
GET video:likes:abc123            # → "500423" (shown to the user as 500,423)

# Delta buffer (how many likes since last DB flush)
INCR video:likes:delta:abc123     # → incremented atomically

# On flush: read delta, write to DB, delete delta key
GETDEL video:likes:delta:abc123   # → "250000" (30s of likes at 8,333/sec)
What to say in an interview: “The trade-off is a 30-second window of data loss on Redis crash. We mitigate this with Redis persistence (AOF/RDB snapshots) and Redis Sentinel for HA. For a like count, losing 30 seconds of likes on a node failure is an acceptable trade-off versus the DB being the hot write path.”
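The loss window in that answer is easy to see in a toy model. The sketch below uses plain dictionaries as stand-ins for Redis and the database; the class and method names are hypothetical and no real client library is involved.

```python
from collections import defaultdict

class WriteBehindCounter:
    """Toy model: dicts stand in for Redis (display + delta) and the database."""

    def __init__(self) -> None:
        self.display = defaultdict(int)  # what viewers see
        self.delta = defaultdict(int)    # likes not yet persisted
        self.db = defaultdict(int)       # durable source of truth

    def like(self, video_id: str) -> None:
        self.display[video_id] += 1
        self.delta[video_id] += 1

    def flush(self) -> int:
        """Move every buffered delta into the database; return DB writes issued."""
        pending, self.delta = self.delta, defaultdict(int)  # swap, in the spirit of GETDEL
        for video_id, count in pending.items():
            self.db[video_id] += count
        return len(pending)

    def crash_cache(self) -> None:
        """Simulate losing the cache: unflushed deltas vanish, display reloads from DB."""
        self.delta = defaultdict(int)
        self.display = defaultdict(int, self.db)

counter = WriteBehindCounter()
for _ in range(8333):
    counter.like("abc123")
db_writes = counter.flush()      # 8,333 likes collapse into a single DB write

counter.like("abc123")
counter.like("abc123")           # two likes arrive after the flush...
counter.crash_cache()            # ...and are lost when the cache dies

print(db_writes, counter.db["abc123"], counter.display["abc123"])  # prints: 1 8333 8333
```

The two likes that arrived after the last flush are gone, which is the bounded loss the interview answer accepts in exchange for taking the database off the hot write path.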
5. Level 4 — Event Streaming with Kafka
For true scale, analytics, and full decoupling: publish every like/unlike as an event.
{
"userId": "user_a1b2c3",
"videoId": "video_abc123",
"action": "like", // "like" | "unlike"
"timestamp": 1748736000000, // Unix ms
"region": "us-east-1",
"sessionId": "sess_xyz789"
}
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['kafka-1:9092', 'kafka-2:9092'],
    value_serializer=lambda v: json.dumps(v).encode()
)

def publish_like_event(user_id, video_id, action):
    producer.send(
        topic='video-likes',
        key=video_id.encode(),  # partition by videoId so one video's events stay ordered
        value={
            'userId': user_id,
            'videoId': video_id,
            'action': action,  # "like" or "unlike"
            'timestamp': int(time.time() * 1000)  # Unix ms
        }
    )
Stream processor (Flink/Spark Streaming):
// For each 1-second window of events:
events
  .filter(e => e.action == "like")
  .keyBy(e => e.videoId)
  .window(TumblingEventTimeWindows.of(1, SECONDS))
  .aggregate(count)
  .sink(redisSink)  // INCRBY video:likes:X delta

// Every 60 seconds: snapshot Redis → MySQL
// (same flush pattern as Level 3)
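The windowed aggregation can be approximated in plain Python to show what the stream processor computes. This is a simplified sketch, not Flink code: it processes a finished batch rather than a live stream, and unlike the pseudocode above it nets unlikes as -1 instead of filtering them out, which is an assumption made here for illustration.

```python
from collections import Counter

WINDOW_MS = 1_000  # 1-second tumbling windows

def aggregate(events):
    """Return {(videoId, window_start_ms): net_delta} for a batch of events."""
    deltas = Counter()
    for e in events:
        window_start = e["timestamp"] - e["timestamp"] % WINDOW_MS
        step = 1 if e["action"] == "like" else -1  # assumption: an unlike counts as -1
        deltas[(e["videoId"], window_start)] += step
    return dict(deltas)

events = [
    {"userId": "u1", "videoId": "video_abc123", "action": "like",   "timestamp": 1748736000100},
    {"userId": "u2", "videoId": "video_abc123", "action": "like",   "timestamp": 1748736000900},
    {"userId": "u1", "videoId": "video_abc123", "action": "unlike", "timestamp": 1748736001200},
]

result = aggregate(events)
for (video_id, window_start), delta in sorted(result.items()):
    print(video_id, window_start, delta)
# video_abc123 1748736000000 2
# video_abc123 1748736001000 -1
```

Each resulting delta would then be applied to the display counter with a single INCRBY, so Redis sees one write per video per window instead of one per click.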
Why Kafka unlocks more:
- Replay: if the aggregator has a bug, replay all events to recompute counts
- Analytics: who liked what, from where, at what time — fan out to a data warehouse
- Multiple consumers: the like feed can power recommendations, notifications, trending algorithms — all from the same event stream
- Backpressure handling: Kafka buffers spikes; the aggregator processes at its own pace
6. Level 5 — Idempotency at Scale
The hardest constraint: one user = one like, even across distributed nodes. Compare the options:
| Option | Mechanism | Pros | Cons | Verdict |
|---|---|---|---|---|
| A — Redis SET NX | SET user:liked:{uid}:{vid} 1 NX | Fast, atomic, no DB touch | Memory: grows O(users × videos); eviction loses data | OK for hot videos |
| B — DB unique constraint | UNIQUE(user_id, video_id) in likes table | Perfectly accurate; no memory issue | DB write on every like; hot table at scale | Best for correctness |
| C — Bloom filter | Per-video probabilistic set membership | Sub-MB memory per video; ultra-fast | False positives → rare legitimate likes dropped; no undo | Not for unlikes |
| D — UserID partition | Shard by userId; each shard checks locally | Distributed; each shard is independent | Cross-shard queries needed for analytics; shard rebalancing | Best at extreme scale |
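Option C is the least familiar of the four, so here is a minimal Bloom filter sketch showing why it can wrongly report "already liked" but never misses a user who really did like. The sizes (8,192 bits, 4 hashes) and the seeded SHA-256 hashing scheme are arbitrary choices for illustration, not tuned values.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by hashing the item with different seeds
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

liked = BloomFilter()
for i in range(1000):
    liked.add(f"user_{i}")  # 1,000 users like the video

# Everyone who liked is always reported as present
all_found = all(liked.might_contain(f"user_{i}") for i in range(1000))

# Some users who never liked are reported as present too (false positives)
false_positives = sum(liked.might_contain(f"stranger_{i}") for i in range(1000))
print(all_found, false_positives)
```

With these sizes roughly 2% of strangers are expected to be false positives, and each one is a legitimate like that would be silently dropped. Bits also cannot be cleared safely, which is why the table rules this option out for unlikes.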
Recommended hybrid for an interview:
def like_video(user_id, video_id):
    key = 'user:liked:' + user_id + ':' + video_id

    # Fast path: Redis NX check (in-memory)
    if not r.set(key, 1, nx=True, ex=86400):
        return "already_liked"  # idempotent: no-op

    # Increment the display counter and the flush delta
    r.incr('video:likes:' + video_id)
    r.incr('video:likes:delta:' + video_id)

    # Async: write to DB (background job or queue)
    # DB has UNIQUE(user_id, video_id) as safety net
    queue.enqueue('persist_like', user_id, video_id)
    return "liked"
7. Level 6 — Sharding & Geographic Distribution
A single Redis node handles ~100k ops/sec. Platform-wide, we need ~2M ops/sec. The solution: shard Redis by videoId.
import hashlib

REDIS_SHARDS = [
    'redis-shard-0:6379',
    'redis-shard-1:6379',
    'redis-shard-2:6379',
    'redis-shard-3:6379',
]

def get_shard(video_id):
    h = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
    return REDIS_SHARDS[h % len(REDIS_SHARDS)]

# e.g. video_abc → shard-2, video_xyz → shard-0 (illustrative)
# With 4 shards, each owns ~25% of videos. At ~100k ops/sec per node,
# 2M ops/sec needs at least 20 shards, so extend the list accordingly.
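A quick calculation shows how many shards the stated numbers imply and how evenly hash-based placement spreads videos. The throughput figures are the estimates from the text; the synthetic `video_{i}` IDs and the `shard_index` helper are made up for this sketch.

```python
import hashlib
import math
from collections import Counter

PLATFORM_OPS_PER_SEC = 2_000_000  # platform-wide estimate from the text
NODE_OPS_PER_SEC = 100_000        # per-node estimate from the text

shards_needed = math.ceil(PLATFORM_OPS_PER_SEC / NODE_OPS_PER_SEC)

def shard_index(video_id: str, num_shards: int) -> int:
    """Same md5-modulo placement as get_shard, returning the index only."""
    h = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
    return h % num_shards

# Spread 10,000 synthetic video IDs and see how evenly they land
load = Counter(shard_index(f"video_{i}", shards_needed) for i in range(10_000))

print(shards_needed)                              # 20
print(min(load.values()), max(load.values()))     # both close to 500
```

Even key placement is not the same as even traffic: one viral video still lands entirely on one shard, so the per-video peak has to fit inside a single node's budget.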
Global distribution:
For a truly global video (viral in both Tokyo and New York simultaneously), regional Redis clusters reduce latency and distribute load.
Eventual consistency in practice: A user in Tokyo and a user in NYC may see like counts that differ by a few thousand for ~1 second. This is acceptable — like counts are inherently approximate displays, not financial ledgers. YouTube itself shows rounded counts (“1.2M likes”) for popular videos, which further masks small transient differences.
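The masking effect of rounded counts is easy to demonstrate. The formatter below is a hypothetical sketch: the exact abbreviation rules YouTube uses are not stated in this article, so the truncation to one decimal place is an assumption.

```python
def format_count(n: int) -> str:
    """Abbreviate a count for display: 999, 1.2K, 3.4M (truncated, not rounded)."""
    for unit, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= unit:
            tenths = n * 10 // unit            # integer math avoids float rounding surprises
            whole, frac = divmod(tenths, 10)
            return f"{whole}.{frac}{suffix}" if frac else f"{whole}{suffix}"
    return str(n)

tokyo_view = 1_203_000   # count as seen by the Tokyo cluster
nyc_view = 1_205_500     # count as seen by the New York cluster, a few thousand ahead

print(format_count(tokyo_view), format_count(nyc_view))  # prints: 1.2M 1.2M
```

Both regions render the same label even though their raw counters differ by 2,500, so the transient divergence is invisible to viewers.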
YouTube doesn’t show exact like counts for popular videos; it shows rounded approximations. This reduces the psychological “one more click matters” effect and, conveniently, reduces the idempotency enforcement cost.
8. The Unlike Problem & CRDTs
Eventual consistency gets non-trivial when users change their minds:
The naive G-Counter (grow-only counter) cannot model this — it has no decrement. You need a PN-Counter (Positive-Negative Counter):
# Instead of one counter, maintain two
INCR video:likes:p:abc123    # positive counter (likes)
INCR video:likes:n:abc123    # negative counter (unlikes)

# Display count = P - N (never negative as long as every unlike follows a like)
GET video:likes:p:abc123     # → "500423"
GET video:likes:n:abc123     # → "50001"
# net = 500,423 - 50,001 = 450,422

# Merging regions: each region increments only its own P and N slots.
# For any one slot, replicas converge by taking the MAX of the values they have seen.
# The global value is the SUM over the regional slots:
# P_global = P_us + P_eu + P_asia
# N_global = N_us + N_eu + N_asia
Why MAX for merging? Each region only increments its own P and N slots and never decrements them, so every slot is a monotonically increasing G-Counter. If two replicas hold 3 and 5 for the same region's slot, the replica holding 5 has simply seen more of that region's updates, and taking the MAX of the two is the correct CRDT merge. Slots belonging to different regions are added, not maxed: 3 likes recorded in Region A plus 5 recorded in Region B give a global P of 8.
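In the standard PN-Counter formulation, each replica keeps one slot per region, merging takes the per-slot maximum, and the value is the sum of the P slots minus the sum of the N slots. A minimal sketch of that structure follows; the class name, field names, and region labels are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PNCounter:
    region: str
    p: dict = field(default_factory=dict)  # region -> likes recorded in that region
    n: dict = field(default_factory=dict)  # region -> unlikes recorded in that region

    def like(self) -> None:
        self.p[self.region] = self.p.get(self.region, 0) + 1

    def unlike(self) -> None:
        self.n[self.region] = self.n.get(self.region, 0) + 1

    def merge(self, other: "PNCounter") -> None:
        """Element-wise MAX per region slot: commutative, associative, idempotent."""
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for region, count in theirs.items():
                mine[region] = max(mine.get(region, 0), count)

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

us, eu = PNCounter("us"), PNCounter("eu")
for _ in range(3):
    us.like()          # 3 likes recorded in the US
for _ in range(5):
    eu.like()          # 5 likes recorded in the EU
eu.unlike()            # 1 unlike recorded in the EU

us.merge(eu)
eu.merge(us)
us.merge(eu)           # repeating a merge changes nothing

print(us.value(), eu.value())  # prints: 7 7
```

Both replicas converge on 3 + 5 - 1 = 7 regardless of merge order or repetition, which is the property that makes the structure safe to replicate across regions.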
9. Capacity Estimate
| Component | Numbers | Notes |
|---|---|---|
| Viral video peak writes | 8,333 / sec | 500k likes/min ÷ 60 |
| Platform-wide like events | ~2M / sec peak | 800M DAU, avg 150 likes/day each |
| Kafka throughput needed | ~100 MB/sec | 2M events × ~50 bytes/event |
| Redis memory per video | ~80 bytes | P counter + N counter + delta + metadata |
| Top 1M videos in Redis | ~80 MB | Trivial; Redis can hold billions of small keys |
| Idempotency keys (Redis NX) | ~50 bytes each | Worst case: 10M active likers × top 10k videos × 50 bytes = 5 TB, so use TTL or DB fallback |
| DB write rate (after flush) | 1 write / 30s / video | vs 8,333/s without batching |
| Like table in DB | ~500 bytes / like row | userId(8) + videoId(8) + timestamp(8) + indexes + overhead |
| Annual like storage | ~22 PB / year | ~44T likes/year (800M DAU × 150 likes/day × 365) × 500 bytes |
10. Full Architecture
The end-to-end pipeline, stage by stage:
Client → CDN → API Gateway → Like Service → Redis Cluster → Kafka Topic → Stream Processor → Redis Display → DB Snapshot
11. Interview Cheat Sheet
When asked “Design the Like button” in an interview, structure your answer around these escalation levels:
| Level | Approach | Max Throughput | Key Trade-off |
|---|---|---|---|
| 1 | SQL UPDATE ... SET likes = likes + 1 | ~500 writes/sec (hot row) | Simple, correct, doesn't scale |
| 2 | Redis INCR + write-through | ~100k writes/sec | Fast; data loss on crash |
| 3 | Redis INCR + write-behind (30s flush) | ~100k writes/sec | Durable; 30s loss window |
| 4 | Kafka events + stream aggregation | ~2M writes/sec | Fully decoupled; operationally complex |
| 5 | Sharded Redis + geo-distribution + PN-Counters | Theoretically unlimited | Eventually consistent; ~1s lag |
Summary
The “Like” button is a masterclass in the gap between appearances and complexity. A single UPDATE statement works for your side project. At YouTube scale, it requires:
- Redis atomic counters for in-memory, lock-free increment/decrement
- Write-behind batching to protect the database from hot-row contention
- Kafka event streaming for durability, analytics, and decoupling
- Hybrid idempotency (Redis NX fast path + DB unique constraint fallback)
- PN-Counters for correct CRDT semantics when merging regional like/unlike data
- CDN-cached read path to absorb the 1000:1 read:write asymmetry
Every design decision is a trade-off: memory vs. durability, consistency vs. latency, simplicity vs. scale. The right answer depends on where on that curve your system needs to be.