System Design: Chat System — From Polling to WhatsApp at Scale
Series #12 of 15 — System Design Interview Prep
1. The Problem
Chat is deceptively simple at first glance: send a message, receive a message. You have probably built a toy version of this in an afternoon. The complexity emerges at scale.
The constraints turn a weekend project into a distributed systems challenge:
- Real-time delivery — users expect messages in under 100ms, not “eventually”
- Offline queuing — a message must survive even if the recipient is offline for days
- Ordering — messages must arrive in the order they were sent, across devices
- Delivery receipts — the double-tick ✓✓ is a product feature users depend on
- Group fan-out — one message to a 500-member group means 499 separate deliveries
- Connection scale — 50M concurrent WebSocket connections don’t fit on one machine
WhatsApp handled 100 billion messages per day at peak, with a team of around 50 engineers. The architecture decisions below are why that number is possible.
2. Level 1 — HTTP Polling
The simplest possible approach: the client asks the server for new messages every few seconds.
```javascript
// Client polls every 5 seconds
setInterval(function() {
  fetch('/messages?since=' + lastMessageId)
    .then(function(r) { return r.json(); })
    .then(function(msgs) { renderMessages(msgs); });
}, 5000);
```
What works: Extremely simple. Any HTTP server handles it. No persistent connections. Works through proxies, firewalls, and corporate networks that block WebSockets.
What breaks at scale: With 50M users polling every 5 seconds, that is 10 million requests per second — and the vast majority return an empty body. You are burning CPU, bandwidth, and money to serve [] billions of times per day. Latency averages 2.5 s (half the poll interval), making the chat feel sluggish compared to SMS.
3. Level 2 — Long Polling
The server holds the connection open until a message arrives (or a timeout, typically 30s). The client immediately reconnects after each response.
```javascript
// Server holds request open — pseudocode
app.get('/poll', async function(req, res) {
  const deadline = Date.now() + 30000;
  while (Date.now() < deadline) {
    const msgs = await checkForMessages(req.userId);
    if (msgs.length) {
      return res.json(msgs);
    }
    await sleep(500); // check every 500ms
  }
  res.json([]); // timeout — client reconnects
});
```
Improvements: Latency drops to near-zero when a message arrives. Wasted requests drop dramatically — each connection lasts up to 30s instead of firing every 5s.
Still problematic: Every message delivery requires a new TCP handshake and HTTP headers. Stateless HTTP servers cannot easily route to the right open connection — if user A’s request is held on server 1 and a message arrives on server 2, server 2 must find and wake server 1. Each held request also ties up a socket, a file descriptor, and memory, and the oft-quoted ~65,000 ephemeral-port limit (per source/destination address pair, typically hit on the load-balancer hop) caps how many connections a single box can hold open.
| Method | Latency | Wasted requests | Connection model |
|---|---|---|---|
| Polling (5 s) | ~2.5 s avg | ~99% empty | Short-lived, stateless |
| Long polling | ~100 ms | ~20% empty | One held request per user |
| WebSockets | <50 ms | None | One persistent, stateful connection per user |
4. Level 3 — WebSockets
A WebSocket starts as an HTTP request and upgrades to a persistent, bidirectional TCP connection. Once established, either side can send frames at any time, with no per-message HTTP headers, no new handshakes, and no polling overhead.
```javascript
// Client — single connection, messages flow both ways
const ws = new WebSocket('wss://chat.example.com/ws');

ws.onmessage = function(event) {
  const msg = JSON.parse(event.data);
  renderMessage(msg);
};

function sendMessage(text) {
  ws.send(JSON.stringify({ type: 'msg', text: text }));
}
```
The key property: the server pushes messages to the client the instant they arrive. No polling, no reconnect overhead. Latency is limited only by the network round trip.
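The client code above has a server-side counterpart. Below is a minimal sketch using the Node.js `ws` package; the `authenticate` and `handleIncomingMessage` helpers and the in-memory `connections` map are assumptions for illustration, not part of the reference design.

```javascript
// Minimal WebSocket server sketch (npm install ws).
// authenticate() and handleIncomingMessage() are assumed helpers.
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });
const connections = new Map(); // userId -> socket, so the server can push later

wss.on('connection', function(socket, req) {
  const userId = authenticate(req); // e.g. validate a token from the query string
  connections.set(userId, socket);

  socket.on('message', function(data) {
    const msg = JSON.parse(data);
    handleIncomingMessage(userId, msg); // hand off to persist + fan-out pipeline
  });

  socket.on('close', function() {
    connections.delete(userId);
  });
});

// Server push: the moment a message is routed here, write it to the open socket
function deliverToUser(userId, message) {
  const socket = connections.get(userId);
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify(message));
  }
}
```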
5. Level 4 — Message Service Architecture
A single Chat Server cannot hold 50M WebSocket connections. We need a fleet. The design splits responsibilities into purpose-built services.
Message flow summary (1-1 message):
- Client A sends message over WebSocket → Chat Server A
- Chat Server A → Message Service → Cassandra (persist) + Kafka (publish)
- Fan-out Service consumes Kafka → queries Presence → finds Client B on Chat Server B
- Fan-out Service tells Chat Server B to deliver
- Chat Server B pushes message to Client B over WebSocket
- Client B device sends “delivered” ACK → Chat Server B → back to Client A (second tick ✓✓)
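Step 2 of this flow (persist, then publish) is the heart of the Message Service. A minimal sketch using the `cassandra-driver` and `kafkajs` Node packages follows; the keyspace, topic name, `MACHINE_ID`, and `nextSequence()` helper are assumptions for illustration.

```javascript
// Sketch of step 2: persist to Cassandra, then publish to Kafka for the Fan-out Service.
const cassandra = require('cassandra-driver');
const { Kafka } = require('kafkajs');

const db = new cassandra.Client({
  contactPoints: ['cassandra-1'],
  localDataCenter: 'dc1',
  keyspace: 'chat',
  encoding: { useBigIntAsLong: true } // lets BigInt Snowflake IDs bind to BIGINT columns
});
const producer = new Kafka({ brokers: ['kafka-1:9092'] }).producer();
// producer.connect() is assumed to be called once at service startup

async function handleIncomingMessage(senderId, msg) {
  const messageId = generateSnowflakeId(MACHINE_ID, nextSequence());

  // 1. Persist: the message is durable before anything is acknowledged
  await db.execute(
    'INSERT INTO messages (channel_id, message_id, sender_id, content, type, created_at) ' +
    'VALUES (?, ?, ?, ?, ?, toTimestamp(now()))',
    [msg.channelId, messageId, senderId, msg.text, 'text'],
    { prepare: true }
  );

  // 2. Publish: keyed by channel_id so every message in a channel lands on the
  //    same Kafka partition, preserving per-channel ordering for fan-out workers
  await producer.send({
    topic: 'chat-messages',
    messages: [{
      key: String(msg.channelId),
      value: JSON.stringify({ msgId: messageId.toString(), senderId, channelId: msg.channelId, text: msg.text })
    }]
  });

  return messageId; // Chat Server A acks the sender: {type: "ack", status: "sent", msgId}
}
```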
6. Level 5 — Message Storage & Ordering
Why Cassandra?
Cassandra is a wide-column store built for high write throughput across distributed nodes. Chat is a write-heavy workload: every message is a write, every delivery receipt is a write, reads (fetching history) are less frequent. Cassandra also handles time-series data naturally through its clustering key ordering.
Schema
```sql
-- Messages table: one partition per channel, ordered by message time
CREATE TABLE messages (
  channel_id  UUID,       -- partition key: all messages for a channel together
  message_id  BIGINT,     -- clustering key DESC: newest first — Snowflake ID
  sender_id   UUID,
  content     TEXT,
  type        TEXT,       -- 'text' | 'image' | 'video' | 'system'
  created_at  TIMESTAMP,
  PRIMARY KEY ((channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Fetch last 20 messages in channel 123
SELECT * FROM messages WHERE channel_id = 123 LIMIT 20;

-- Fetch messages before a cursor (pagination)
SELECT * FROM messages
WHERE channel_id = 123
  AND message_id < 1745660400000   -- cursor from last page
LIMIT 20;
```
Snowflake IDs — Ordering Without Coordination
Cassandra does not provide auto-increment IDs across nodes. We need globally ordered IDs generated without coordination (no single sequence counter). Twitter’s Snowflake format:
```javascript
// 64-bit Snowflake ID layout
// [41 bits timestamp ms] [10 bits machine id] [12 bits sequence]
// 2^41 ms = ~69 years of timestamps
// 2^10 = 1024 machines
// 2^12 = 4096 IDs per machine per millisecond
function generateSnowflakeId(machineId, sequence) {
  const epoch = 1700000000000n;            // custom epoch (Nov 2023)
  const ts = BigInt(Date.now()) - epoch;   // 41-bit millisecond timestamp
  // Shift the timestamp left 22 bits (10 machine + 12 sequence) and pack the rest.
  // BigInt is required: the packed value exceeds JavaScript's 53-bit safe integer range.
  return (ts << 22n) | (BigInt(machineId) << 12n) | BigInt(sequence);
}
```
Snowflake IDs sort chronologically as integers — a range scan on message_id in Cassandra gives you messages in time order without any ORDER BY on a separate timestamp column.
The double-ratchet algorithm (Signal protocol) generates a new encryption key for every single message — even if one key is compromised, past and future messages remain secure. Forward secrecy means recorded ciphertext cannot be decrypted even if the long-term keys are later stolen.
7. Level 6 — Delivery Receipts (the ✓✓ Problem)
Three distinct states exist for every message:
- Sent (✓) — the server received and persisted the message
- Delivered (✓✓) — the recipient’s device received the message
- Read (✓✓ blue) — the recipient opened the conversation and saw the message
The ✓✓ mechanism is more complex than it appears because “delivered” and “read” are events that originate on the recipient’s device and must travel back to the sender.
```sql
-- Receipt events table
CREATE TABLE receipts (
  message_id  BIGINT,
  user_id     UUID,
  status      TEXT,       -- 'sent' | 'delivered' | 'read'
  updated_at  TIMESTAMP,
  PRIMARY KEY (message_id, user_id)
);
```
Flow for a 1-1 message:
- Sender’s device → Chat Server: `{type: "msg", text: "Hello"}` (over WebSocket)
- Message Service persists → returns `{type: "ack", status: "sent", msgId: X}` to sender → ✓ appears
- Fan-out Service delivers to recipient’s device → device auto-responds `{type: "delivered", msgId: X}`
- Server updates receipts table → pushes `{type: "receipt", msgId: X, status: "delivered"}` to sender → ✓✓ appears
- Recipient opens the chat window → device sends `{type: "read", msgId: X}`
- Server updates receipts → pushes `{type: "receipt", msgId: X, status: "read"}` to sender → ✓✓ turns blue
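A sketch of the server side of this flow is below, reusing the Cassandra client from the earlier Message Service sketch; `getSenderOf` and `deliverToUser` are assumed helpers, not part of the article's reference design.

```javascript
// Server-side receipt path: a 'delivered' or 'read' frame from the recipient
// updates the receipts table and is forwarded to the original sender.
async function handleReceiptFrame(recipientId, frame) {
  // frame = { type: 'delivered' | 'read', msgId: ... }
  await db.execute(
    'UPDATE receipts SET status = ?, updated_at = toTimestamp(now()) ' +
    'WHERE message_id = ? AND user_id = ?',
    [frame.type, frame.msgId, recipientId],
    { prepare: true }
  );

  // Route the receipt back to whoever sent the original message
  const senderId = await getSenderOf(frame.msgId);
  deliverToUser(senderId, {
    type: 'receipt',
    msgId: frame.msgId,
    status: frame.type   // sender's UI flips ✓ to ✓✓ (or ✓✓ to blue)
  });
}
```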
8. Level 7 — Offline Message Delivery
When the recipient is offline, the Fan-out Service has no Chat Server connection to route to. Messages must be durably queued until the user reconnects — possibly days later.
```
# Redis offline queue per user
# Key pattern: offline:{userId}
# Structure: Redis List (RPUSH to enqueue, LRANGE to drain)

# Fan-out Service: recipient offline
RPUSH offline:user_456 '{"msgId":X,"from":"alice","text":"Hey!"}'
# Set 30-day TTL so messages expire if never read
EXPIRE offline:user_456 2592000

# On reconnect: drain the queue (wrap in MULTI/EXEC if the read-and-clear must be atomic)
LRANGE offline:user_456 0 -1   # read all pending
DEL offline:user_456           # clear the queue
```
Reconnect flow:
- User opens the app → device establishes WebSocket connection to Chat Server
- Chat Server registers presence (Redis: `userId → serverId`)
- Chat Server queries offline queue: `LRANGE offline:userId 0 -1`
- Delivers all queued messages in order over the new WebSocket connection
- Device sends delivery ACKs for each message, triggering sender-side receipt updates
- Chat Server deletes the offline queue
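A minimal sketch of that reconnect handler, assuming the `ioredis` package, a `SERVER_ID` constant identifying this Chat Server, and the `deliverToUser` push helper from earlier:

```javascript
const Redis = require('ioredis');
const redis = new Redis({ host: 'redis-1' });

async function onReconnect(userId) {
  // 1. Register presence: userId -> serverId, with a TTL refreshed by heartbeats
  await redis.set('presence:' + userId, SERVER_ID, 'EX', 60);

  // 2. Drain the offline queue in FIFO order
  const pending = await redis.lrange('offline:' + userId, 0, -1);
  for (const raw of pending) {
    deliverToUser(userId, JSON.parse(raw)); // device ACKs each one, driving receipt updates
  }

  // 3. Clear the queue once everything has been handed to the socket
  await redis.del('offline:' + userId);
}
```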
Edge cases: Push notifications (APNs / FCM) wake the app when a message arrives so the reconnect happens quickly. Message deduplication is needed because the device may have received some messages via push before reconnecting.
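A client-side dedup sketch, assuming the globally unique Snowflake message ID is used as the dedup key:

```javascript
// The same message can arrive via a push notification and again via the drained
// offline queue, so render each msgId at most once.
const seenMessageIds = new Set();

function onIncomingMessage(msg) {
  if (seenMessageIds.has(msg.msgId)) {
    return;               // already rendered (e.g. delivered earlier via push)
  }
  seenMessageIds.add(msg.msgId);
  renderMessage(msg);
}
```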
9. Level 8 — Group Chat Fan-out
A 500-member group is the hardest fan-out scenario: one message generates up to 499 separate deliveries, each needing its own presence lookup and routing decision.
Fan-out strategy for large groups:
```javascript
// Message arrives for group_id = 77
// Kafka message: { group_id: 77, msg: {...} }
// Partitioned by group_id — ensures ordering within a group

// Fan-out worker algorithm
function fanoutGroupMessage(groupId, message) {
  var members = getGroupMembers(groupId);            // cached in Redis
  members.forEach(function(userId) {
    var serverId = presence.get(userId);             // O(1) Redis lookup
    if (serverId) {
      chatServers.deliver(serverId, userId, message); // online
    } else {
      offlineQueue.push(userId, message);             // offline
    }
  });
}
```
Optimizations for very large groups (1000+ members):
- Write fan-out vs. read fan-out: For extremely large groups (Slack channels with 10,000 members), instead of delivering to all connections immediately, store the message once and let clients pull when they open the channel. Hybrid: deliver to online members, pull for offline.
- Group member cache: Store member lists in Redis. Avoid hitting the database on every fan-out.
- Kafka partition count: More partitions → more parallel fan-out workers. Tune to match your delivery throughput target.
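A sketch of the hybrid decision described above; the 1,000-member threshold and the helper names (`deliverOrQueue`, `getGroupMembers`, `presence`, `chatServers`) are assumptions, not fixed values:

```javascript
const WRITE_FANOUT_LIMIT = 1000;

function fanoutMessage(groupId, message) {
  const members = getGroupMembers(groupId); // cached in Redis

  if (members.length <= WRITE_FANOUT_LIMIT) {
    // Write fan-out: deliver (or offline-queue) a copy per member
    members.forEach(function(userId) { deliverOrQueue(userId, message); });
  } else {
    // Read fan-out: the message is already in Cassandra; just nudge online members
    members.forEach(function(userId) {
      const serverId = presence.get(userId);
      if (serverId) {
        chatServers.deliver(serverId, userId, { type: 'channel_update', channelId: groupId });
      }
      // Offline members fetch history from Cassandra when they next open the channel
    });
  }
}
```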
WhatsApp famously runs on Erlang — a language built for telecom switches that needed to handle millions of concurrent connections with sub-millisecond failover. In 2014 they had 450M users with just 50 engineers. Erlang’s actor model maps directly to the “one process per connection” model that WebSocket servers require.
10. Level 9 — End-to-End Encryption
In E2EE, the server stores only ciphertext. It cannot read your messages — only the intended recipient’s device can decrypt them.
Key concepts:
The Signal Protocol (Double Ratchet Algorithm):
- Key agreement: X3DH (Extended Triple Diffie-Hellman). Each party publishes a "pre-key bundle" to the server, so a session can be established without both parties being online simultaneously.
- Double ratchet: a new key per message. Two ratchets run in parallel: a Diffie-Hellman ratchet (new DH key on each round trip) and a symmetric ratchet (the chain key advances on each message).
- Result: forward secrecy + break-in recovery.
- Apps using the Signal protocol: WhatsApp, Signal, Facebook Messenger (Secret Conversations), Google Messages.
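To make the symmetric ratchet concrete, here is a sketch of a single chain-key step using Node's built-in crypto module. This is only the symmetric half; X3DH and the DH ratchet are omitted, and a real system should use a vetted library such as libsignal rather than hand-rolled primitives.

```javascript
const crypto = require('crypto');

// Derive a one-off message key and the next chain key from the current chain key.
function ratchetStep(chainKey) {
  const messageKey   = crypto.createHmac('sha256', chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = crypto.createHmac('sha256', chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

// Usage: every message consumes one step of the chain
let chainKey = crypto.randomBytes(32);       // in reality: output of X3DH / the DH ratchet
const { messageKey, nextChainKey } = ratchetStep(chainKey);
chainKey = nextChainKey;                     // the old chain key is discarded: forward secrecy
// messageKey encrypts exactly one message (e.g. with AES-256-GCM) and is never reused
```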
Why this matters for system design:
- The server’s message store contains only encrypted blobs — a Cassandra data leak exposes ciphertext, not content.
- Search, spam filtering, and content moderation become impossible on-server — a deliberate privacy tradeoff.
- Multi-device support requires key exchange with each device separately.
11. Capacity Estimation
| Metric | Assumption | Result |
|---|---|---|
| Concurrent WebSocket connections | 50M concurrent users, 1 connection each | 50M connections |
| Chat Servers needed | 100K connections per server | ~500 servers |
| Message throughput | 100B messages/day (WhatsApp scale) | ~1.16M msg/sec |
| Peak throughput (3× avg) | Evening spike factor | ~3.5M msg/sec |
| Message storage per day | 100B msgs × 200 bytes avg (with metadata) | ~20 TB/day |
| Annual storage (3× replication) | 20 TB × 365 × 3 replicas | ~22 PB/year |
| Kafka throughput | 3.5M msg/sec × 200 bytes | ~700 MB/sec |
| Presence service memory | 50M users × 100 bytes (userId + serverId + TTL) | ~5 GB RAM |
| Fan-out at group scale | 1M group messages/min × 500 members avg | ~8.3M deliveries/sec |
Cassandra cluster sizing: At 3.5M writes/sec with 3× replication, that is 10.5M write ops/sec across the cluster. At ~50K writes/sec per Cassandra node, you need ~210 nodes for write throughput alone. Storage is the harder constraint: ~22 PB/year of replicated message data dwarfs the ~630 TB you would get from 3 TB SSDs on those 210 nodes, so plan for larger disks, more nodes, or tiering older messages to cheaper storage.
Kafka sizing: 700 MB/sec write × 3 replicas = 2.1 GB/sec. At 1 Gbps per broker, you need ~20 Kafka brokers with 7-day message retention (~1.2 PB total Kafka storage).
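The arithmetic above can be checked with a quick back-of-envelope script (same assumptions as the table):

```javascript
// Back-of-envelope check of the capacity numbers above.
const MSGS_PER_DAY  = 100e9;
const AVG_MSG_BYTES = 200;
const PEAK_FACTOR   = 3;
const REPLICATION   = 3;

const avgMsgPerSec     = MSGS_PER_DAY / 86400;                          // ~1.16M msg/sec
const peakMsgPerSec    = avgMsgPerSec * PEAK_FACTOR;                    // ~3.5M msg/sec
const storagePerDayTB  = MSGS_PER_DAY * AVG_MSG_BYTES / 1e12;           // ~20 TB/day
const storagePerYearPB = storagePerDayTB * 365 * REPLICATION / 1000;    // ~22 PB/year
const kafkaMBps        = peakMsgPerSec * AVG_MSG_BYTES / 1e6;           // ~700 MB/sec
const cassandraNodes   = Math.ceil(peakMsgPerSec * REPLICATION / 50e3); // ~210 nodes

console.log({ avgMsgPerSec, peakMsgPerSec, storagePerDayTB, storagePerYearPB, kafkaMBps, cassandraNodes });
```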
Summary
The journey from a polling endpoint to a WhatsApp-scale chat system follows a clear progression. Each level solves a specific bottleneck:
| Level | Problem Solved | Key Technology |
|---|---|---|
| 1 — Polling | Works at all | HTTP GET |
| 2 — Long Polling | Latency | HTTP hold-open |
| 3 — WebSockets | Real-time + wasted requests | TCP upgrade |
| 4 — Message Service | Durability + decoupling | Kafka + Cassandra |
| 5 — Snowflake IDs | Ordering at scale | 64-bit composite ID |
| 6 — Receipts | ✓✓ blue ticks | ACK events over WebSocket |
| 7 — Offline Queue | Durability while disconnected | Redis Lists + TTL |
| 8 — Group Fan-out | 499 deliveries from 1 message | Kafka partitions |
| 9 — E2EE | Privacy | Signal protocol |
WebSocket connections are stateful — this means you cannot simply add servers behind a load balancer. You need sticky sessions or a pub/sub layer (Redis Pub/Sub or Kafka) so any server can route delivery to any open connection. This is the core operational complexity of running a chat system at scale.
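One possible shape for that routing layer, sketched with Redis Pub/Sub via `ioredis`; the channel naming scheme and presence keys are assumptions carried over from the earlier sketches, and `deliverToUser` is the local push helper.

```javascript
const Redis = require('ioredis');
const sub = new Redis();  // a connection in subscriber mode cannot run other commands,
const pub = new Redis();  // so keep a separate client for GET / PUBLISH / RPUSH

// Each Chat Server listens for deliveries addressed to it
sub.subscribe('server:' + SERVER_ID);
sub.on('message', function(channel, payload) {
  const { userId, message } = JSON.parse(payload);
  deliverToUser(userId, message);   // write to the locally held WebSocket
});

// Any node can deliver: look up presence, then publish to the server holding the socket
async function routeDelivery(userId, message) {
  const serverId = await pub.get('presence:' + userId);
  if (serverId) {
    await pub.publish('server:' + serverId, JSON.stringify({ userId, message }));
  } else {
    await pub.rpush('offline:' + userId, JSON.stringify(message));
  }
}
```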
In an interview, the progression from polling to WebSockets to the full distributed architecture demonstrates the ability to start simple, identify bottlenecks, and apply the right tool at each layer. The delivery receipt mechanism and offline queue are the most commonly missed details — knowing them signals depth of understanding.