System Design: Chat System — From Polling to WhatsApp at Scale
Series #12 of 15 — System Design Interview Prep
1. The Problem
Chat is deceptively simple at first glance: send a message, receive a message. You have probably built a toy version of this in an afternoon. The complexity emerges at scale.
The constraints turn a weekend project into a distributed systems challenge:
- Real-time delivery — users expect messages in under 100ms, not “eventually”
- Offline queuing — a message must survive even if the recipient is offline for days
- Ordering — messages must arrive in the order they were sent, across devices
- Delivery receipts — the double-tick ✓✓ is a product feature users depend on
- Group fan-out — one message to a 500-member group means 499 separate deliveries
- Connection scale — 50M concurrent WebSocket connections don’t fit on one machine
WhatsApp handled 100 billion messages per day at peak, with a team of around 50 engineers. The architecture decisions below are why that number is possible.
2. Level 1 — HTTP Polling
The simplest possible approach: the client asks the server for new messages every few seconds.
```javascript
// Client polls every 5 seconds
setInterval(function() {
  fetch('/messages?since=' + lastMessageId)
    .then(function(r) { return r.json(); })
    .then(function(msgs) { renderMessages(msgs); });
}, 5000);
```
What works: Extremely simple. Any HTTP server handles it. No persistent connections. Works through proxies, firewalls, and corporate networks that block WebSockets.
What breaks at scale: With 50M users polling every 5 seconds, that is 10 million requests per second — and the vast majority return an empty body. You are burning CPU, bandwidth, and money to serve [] billions of times per day. Latency averages 2.5 s (half the poll interval), making the chat feel sluggish compared to SMS.
3. Level 2 — Long Polling
The server holds the connection open until a message arrives (or a timeout, typically 30s). The client immediately reconnects after each response.
```javascript
// Server holds request open — pseudocode
app.get('/poll', async function(req, res) {
  const deadline = Date.now() + 30000;
  while (Date.now() < deadline) {
    const msgs = await checkForMessages(req.userId);
    if (msgs.length) {
      return res.json(msgs);
    }
    await sleep(500); // check every 500ms
  }
  res.json([]); // timeout — client reconnects
});
```
Improvements: Latency drops to near-zero when a message arrives. Wasted requests drop dramatically — each connection lasts up to 30s instead of firing every 5s.
Still problematic: Every message delivery requires a new TCP handshake and HTTP headers. Stateless HTTP servers cannot easily route to the right open connection — if user A’s request is held on server 1 and a message arrives on server 2, server 2 must find and wake server 1. Each held request also ties up a socket, a file descriptor, and memory, and the oft-quoted ~65,000 ephemeral-port limit (per source/destination address pair, typically hit on the load-balancer hop) caps how many connections a single box can hold open.
| Method | Latency | Wasted requests | Connection model |
|---|---|---|---|
| Polling (5 s) | ~2.5 s avg | ~99% empty | Short-lived, stateless |
| Long polling | ~100 ms | ~20% empty | One held request per user |
| WebSockets | <50 ms | None | One persistent, stateful connection per user |
4. Level 3 — WebSockets
A WebSocket starts as an HTTP request and upgrades to a persistent, bidirectional TCP connection. Once established, either side can send frames at any time, with no per-message HTTP headers, no new handshakes, and no polling overhead.
```javascript
// Client — single connection, messages flow both ways
const ws = new WebSocket('wss://chat.example.com/ws');

ws.onmessage = function(event) {
  const msg = JSON.parse(event.data);
  renderMessage(msg);
};

function sendMessage(text) {
  ws.send(JSON.stringify({ type: 'msg', text: text }));
}
```
The key property: the server pushes messages to the client the instant they arrive. No polling, no reconnect overhead. Latency is limited only by the network round trip.
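The client code above has a server-side counterpart. Below is a minimal sketch using the Node.js `ws` package; the `authenticate` and `handleIncomingMessage` helpers and the in-memory `connections` map are assumptions for illustration, not part of the reference design.

```javascript
// Minimal WebSocket server sketch (npm install ws).
// authenticate() and handleIncomingMessage() are assumed helpers.
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });
const connections = new Map(); // userId -> socket, so the server can push later

wss.on('connection', function(socket, req) {
  const userId = authenticate(req); // e.g. validate a token from the query string
  connections.set(userId, socket);

  socket.on('message', function(data) {
    const msg = JSON.parse(data);
    handleIncomingMessage(userId, msg); // hand off to persist + fan-out pipeline
  });

  socket.on('close', function() {
    connections.delete(userId);
  });
});

// Server push: the moment a message is routed here, write it to the open socket
function deliverToUser(userId, message) {
  const socket = connections.get(userId);
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify(message));
  }
}
```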
5. Level 4 — Message Service Architecture
A single Chat Server cannot hold 50M WebSocket connections. We need a fleet. The design splits responsibilities into purpose-built services.
Message flow summary (1-1 message):
- Client A sends message over WebSocket → Chat Server A
- Chat Server A → Message Service → Cassandra (persist) + Kafka (publish)
- Fan-out Service consumes Kafka → queries Presence → finds Client B on Chat Server B
- Fan-out Service tells Chat Server B to deliver
- Chat Server B pushes message to Client B over WebSocket
- Client B device sends “delivered” ACK → Chat Server B → back to Client A (second tick ✓✓)
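Step 2 of this flow (persist, then publish) is the heart of the Message Service. A minimal sketch using the `cassandra-driver` and `kafkajs` Node packages follows; the keyspace, topic name, `MACHINE_ID`, and `nextSequence()` helper are assumptions for illustration.

```javascript
// Sketch of step 2: persist to Cassandra, then publish to Kafka for the Fan-out Service.
const cassandra = require('cassandra-driver');
const { Kafka } = require('kafkajs');

const db = new cassandra.Client({
  contactPoints: ['cassandra-1'],
  localDataCenter: 'dc1',
  keyspace: 'chat',
  encoding: { useBigIntAsLong: true } // lets BigInt Snowflake IDs bind to BIGINT columns
});
const producer = new Kafka({ brokers: ['kafka-1:9092'] }).producer();
// producer.connect() is assumed to be called once at service startup

async function handleIncomingMessage(senderId, msg) {
  const messageId = generateSnowflakeId(MACHINE_ID, nextSequence());

  // 1. Persist: the message is durable before anything is acknowledged
  await db.execute(
    'INSERT INTO messages (channel_id, message_id, sender_id, content, type, created_at) ' +
    'VALUES (?, ?, ?, ?, ?, toTimestamp(now()))',
    [msg.channelId, messageId, senderId, msg.text, 'text'],
    { prepare: true }
  );

  // 2. Publish: keyed by channel_id so every message in a channel lands on the
  //    same Kafka partition, preserving per-channel ordering for fan-out workers
  await producer.send({
    topic: 'chat-messages',
    messages: [{
      key: String(msg.channelId),
      value: JSON.stringify({ msgId: messageId.toString(), senderId, channelId: msg.channelId, text: msg.text })
    }]
  });

  return messageId; // Chat Server A acks the sender: {type: "ack", status: "sent", msgId}
}
```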
6. Level 5 — Message Storage & Ordering
Why Cassandra?
Cassandra is a wide-column store built for high write throughput across distributed nodes. Chat is a write-heavy workload: every message is a write, every delivery receipt is a write, reads (fetching history) are less frequent. Cassandra also handles time-series data naturally through its clustering key ordering.
Schema
```sql
-- Messages table: one partition per channel, ordered by message time
CREATE TABLE messages (
  channel_id  UUID,       -- partition key: all messages for a channel together
  message_id  BIGINT,     -- clustering key DESC: newest first — Snowflake ID
  sender_id   UUID,
  content     TEXT,
  type        TEXT,       -- 'text' | 'image' | 'video' | 'system'
  created_at  TIMESTAMP,
  PRIMARY KEY ((channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Fetch last 20 messages in channel 123
SELECT * FROM messages WHERE channel_id = 123 LIMIT 20;

-- Fetch messages before a cursor (pagination)
SELECT * FROM messages
WHERE channel_id = 123
  AND message_id < 1745660400000   -- cursor from last page
LIMIT 20;
```
Snowflake IDs — Ordering Without Coordination
Cassandra does not provide auto-increment IDs across nodes. We need globally ordered IDs generated without coordination (no single sequence counter). Twitter’s Snowflake format:
```javascript
// 64-bit Snowflake ID layout
// [41 bits timestamp ms] [10 bits machine id] [12 bits sequence]
// 2^41 ms = ~69 years of timestamps
// 2^10 = 1024 machines
// 2^12 = 4096 IDs per machine per millisecond
function generateSnowflakeId(machineId, sequence) {
  const epoch = 1700000000000n;            // custom epoch (Nov 2023)
  const ts = BigInt(Date.now()) - epoch;   // 41-bit millisecond timestamp
  // Shift the timestamp left 22 bits (10 machine + 12 sequence) and pack the rest.
  // BigInt is required: the packed value exceeds JavaScript's 53-bit safe integer range.
  return (ts << 22n) | (BigInt(machineId) << 12n) | BigInt(sequence);
}
```
Snowflake IDs sort chronologically as integers — a range scan on message_id in Cassandra gives you messages in time order without any ORDER BY on a separate timestamp column.
The double-ratchet algorithm (Signal protocol) generates a new encryption key for every single message — even if one key is compromised, past and future messages remain secure. Forward secrecy means recorded ciphertext cannot be decrypted even if the long-term keys are later stolen.
7. Level 6 — Delivery Receipts (the ✓✓ Problem)
Three distinct states exist for every message:
- Sent (✓) — the server received and persisted the message
- Delivered (✓✓) — the recipient’s device received the message
- Read (✓✓ blue) — the recipient opened the conversation and saw the message
The ✓✓ mechanism is more complex than it appears because “delivered” and “read” are events that originate on the recipient’s device and must travel back to the sender.
```sql
-- Receipt events table
CREATE TABLE receipts (
  message_id  BIGINT,
  user_id     UUID,
  status      TEXT,       -- 'sent' | 'delivered' | 'read'
  updated_at  TIMESTAMP,
  PRIMARY KEY (message_id, user_id)
);
```
Flow for a 1-1 message:
- Sender’s device → Chat Server: `{type: "msg", text: "Hello"}` (over WebSocket)
- Message Service persists → returns `{type: "ack", status: "sent", msgId: X}` to sender → ✓ appears
- Fan-out Service delivers to recipient’s device → device auto-responds `{type: "delivered", msgId: X}`
- Server updates receipts table → pushes `{type: "receipt", msgId: X, status: "delivered"}` to sender → ✓✓ appears
- Recipient opens the chat window → device sends `{type: "read", msgId: X}`
- Server updates receipts → pushes `{type: "receipt", msgId: X, status: "read"}` to sender → ✓✓ turns blue
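A sketch of the server side of this flow is below, reusing the Cassandra client from the earlier Message Service sketch; `getSenderOf` and `deliverToUser` are assumed helpers, not part of the article's reference design.

```javascript
// Server-side receipt path: a 'delivered' or 'read' frame from the recipient
// updates the receipts table and is forwarded to the original sender.
async function handleReceiptFrame(recipientId, frame) {
  // frame = { type: 'delivered' | 'read', msgId: ... }
  await db.execute(
    'UPDATE receipts SET status = ?, updated_at = toTimestamp(now()) ' +
    'WHERE message_id = ? AND user_id = ?',
    [frame.type, frame.msgId, recipientId],
    { prepare: true }
  );

  // Route the receipt back to whoever sent the original message
  const senderId = await getSenderOf(frame.msgId);
  deliverToUser(senderId, {
    type: 'receipt',
    msgId: frame.msgId,
    status: frame.type   // sender's UI flips ✓ to ✓✓ (or ✓✓ to blue)
  });
}
```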
8. Level 7 — Offline Message Delivery
When the recipient is offline, the Fan-out Service has no Chat Server connection to route to. Messages must be durably queued until the user reconnects — possibly days later.
```
# Redis offline queue per user
# Key pattern: offline:{userId}
# Structure: Redis List (RPUSH to enqueue, LRANGE to drain)

# Fan-out Service: recipient offline
RPUSH offline:user_456 '{"msgId":X,"from":"alice","text":"Hey!"}'
# Set 30-day TTL so messages expire if never read
EXPIRE offline:user_456 2592000

# On reconnect: drain the queue (wrap in MULTI/EXEC if the read-and-clear must be atomic)
LRANGE offline:user_456 0 -1   # read all pending
DEL offline:user_456           # clear the queue
```
Reconnect flow:
- User opens the app → device establishes WebSocket connection to Chat Server
- Chat Server registers presence (Redis: `userId → serverId`)
- Chat Server queries offline queue: `LRANGE offline:userId 0 -1`
- Delivers all queued messages in order over the new WebSocket connection
- Device sends delivery ACKs for each message, triggering sender-side receipt updates
- Chat Server deletes the offline queue
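A minimal sketch of that reconnect handler, assuming the `ioredis` package, a `SERVER_ID` constant identifying this Chat Server, and the `deliverToUser` push helper from earlier:

```javascript
const Redis = require('ioredis');
const redis = new Redis({ host: 'redis-1' });

async function onReconnect(userId) {
  // 1. Register presence: userId -> serverId, with a TTL refreshed by heartbeats
  await redis.set('presence:' + userId, SERVER_ID, 'EX', 60);

  // 2. Drain the offline queue in FIFO order
  const pending = await redis.lrange('offline:' + userId, 0, -1);
  for (const raw of pending) {
    deliverToUser(userId, JSON.parse(raw)); // device ACKs each one, driving receipt updates
  }

  // 3. Clear the queue once everything has been handed to the socket
  await redis.del('offline:' + userId);
}
```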
Edge cases: Push notifications (APNs / FCM) wake the app when a message arrives so the reconnect happens quickly. Message deduplication is needed because the device may have received some messages via push before reconnecting.
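A client-side dedup sketch, assuming the globally unique Snowflake message ID is used as the dedup key:

```javascript
// The same message can arrive via a push notification and again via the drained
// offline queue, so render each msgId at most once.
const seenMessageIds = new Set();

function onIncomingMessage(msg) {
  if (seenMessageIds.has(msg.msgId)) {
    return;               // already rendered (e.g. delivered earlier via push)
  }
  seenMessageIds.add(msg.msgId);
  renderMessage(msg);
}
```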
9. Level 8 — Group Chat Fan-out
A 500-member group is the hardest fan-out scenario: one message generates up to 499 separate deliveries, each needing its own presence lookup and routing decision.
Fan-out strategy for large groups:
```javascript
// Message arrives for group_id = 77
// Kafka message: { group_id: 77, msg: {...} }
// Partitioned by group_id — ensures ordering within a group

// Fan-out worker algorithm
function fanoutGroupMessage(groupId, message) {
  var members = getGroupMembers(groupId);            // cached in Redis
  members.forEach(function(userId) {
    var serverId = presence.get(userId);             // O(1) Redis lookup
    if (serverId) {
      chatServers.deliver(serverId, userId, message); // online
    } else {
      offlineQueue.push(userId, message);             // offline
    }
  });
}
```
Optimizations for very large groups (1000+ members):
- Write fan-out vs. read fan-out: For extremely large groups (Slack channels with 10,000 members), instead of delivering to all connections immediately, store the message once and let clients pull when they open the channel. Hybrid: deliver to online members, pull for offline.
- Group member cache: Store member lists in Redis. Avoid hitting the database on every fan-out.
- Kafka partition count: More partitions → more parallel fan-out workers. Tune to match your delivery throughput target.
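A sketch of the hybrid decision described above; the 1,000-member threshold and the helper names (`deliverOrQueue`, `getGroupMembers`, `presence`, `chatServers`) are assumptions, not fixed values:

```javascript
const WRITE_FANOUT_LIMIT = 1000;

function fanoutMessage(groupId, message) {
  const members = getGroupMembers(groupId); // cached in Redis

  if (members.length <= WRITE_FANOUT_LIMIT) {
    // Write fan-out: deliver (or offline-queue) a copy per member
    members.forEach(function(userId) { deliverOrQueue(userId, message); });
  } else {
    // Read fan-out: the message is already in Cassandra; just nudge online members
    members.forEach(function(userId) {
      const serverId = presence.get(userId);
      if (serverId) {
        chatServers.deliver(serverId, userId, { type: 'channel_update', channelId: groupId });
      }
      // Offline members fetch history from Cassandra when they next open the channel
    });
  }
}
```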
WhatsApp famously runs on Erlang — a language built for telecom switches that needed to handle millions of concurrent connections with sub-millisecond failover. In 2014 they had 450M users with just 50 engineers. Erlang’s actor model maps directly to the “one process per connection” model that WebSocket servers require.
10. Level 9 — End-to-End Encryption
In E2EE, the server stores only ciphertext. It cannot read your messages — only the intended recipient’s device can decrypt them.
Key concepts:
The Signal Protocol (Double Ratchet Algorithm):
- Key agreement: X3DH (Extended Triple Diffie-Hellman). Each party publishes a "pre-key bundle" to the server, so a session can be established without both parties being online simultaneously.
- Double ratchet: a new key per message. Two ratchets run in parallel: a Diffie-Hellman ratchet (new DH key on each round trip) and a symmetric ratchet (the chain key advances on each message).
- Result: forward secrecy + break-in recovery.
- Apps using the Signal protocol: WhatsApp, Signal, Facebook Messenger (Secret Conversations), Google Messages.
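To make the symmetric ratchet concrete, here is a sketch of a single chain-key step using Node's built-in crypto module. This is only the symmetric half; X3DH and the DH ratchet are omitted, and a real system should use a vetted library such as libsignal rather than hand-rolled primitives.

```javascript
const crypto = require('crypto');

// Derive a one-off message key and the next chain key from the current chain key.
function ratchetStep(chainKey) {
  const messageKey   = crypto.createHmac('sha256', chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = crypto.createHmac('sha256', chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

// Usage: every message consumes one step of the chain
let chainKey = crypto.randomBytes(32);       // in reality: output of X3DH / the DH ratchet
const { messageKey, nextChainKey } = ratchetStep(chainKey);
chainKey = nextChainKey;                     // the old chain key is discarded: forward secrecy
// messageKey encrypts exactly one message (e.g. with AES-256-GCM) and is never reused
```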
Why this matters for system design:
- The server’s message store contains only encrypted blobs — a Cassandra data leak exposes ciphertext, not content.
- Search, spam filtering, and content moderation become impossible on-server — a deliberate privacy tradeoff.
- Multi-device support requires key exchange with each device separately.
11. Capacity Estimation
| Metric | Assumption | Result |
|---|---|---|
| Concurrent WebSocket connections | 50M concurrent users, 1 connection each | 50M connections |
| Chat Servers needed | 100K connections per server | ~500 servers |
| Message throughput | 100B messages/day (WhatsApp scale) | ~1.16M msg/sec |
| Peak throughput (3× avg) | Evening spike factor | ~3.5M msg/sec |
| Message storage per day | 100B msgs × 200 bytes avg (with metadata) | ~20 TB/day |
| Annual storage (3× replication) | 20 TB × 365 × 3 replicas | ~22 PB/year |
| Kafka throughput | 3.5M msg/sec × 200 bytes | ~700 MB/sec |
| Presence service memory | 50M users × 100 bytes (userId + serverId + TTL) | ~5 GB RAM |
| Fan-out at group scale | 1M group messages/min × 500 members avg | ~8.3M deliveries/sec |
Cassandra cluster sizing: At 3.5M writes/sec with 3× replication, that is 10.5M write ops/sec across the cluster. At ~50K writes/sec per Cassandra node, you need ~210 nodes for write throughput alone. Storage is the harder constraint: ~22 PB/year of replicated message data dwarfs the ~630 TB you would get from 3 TB SSDs on those 210 nodes, so plan for larger disks, more nodes, or tiering older messages to cheaper storage.
Kafka sizing: 700 MB/sec write × 3 replicas = 2.1 GB/sec. At 1 Gbps per broker, you need ~20 Kafka brokers with 7-day message retention (~1.2 PB total Kafka storage).
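The arithmetic above can be checked with a quick back-of-envelope script (same assumptions as the table):

```javascript
// Back-of-envelope check of the capacity numbers above.
const MSGS_PER_DAY  = 100e9;
const AVG_MSG_BYTES = 200;
const PEAK_FACTOR   = 3;
const REPLICATION   = 3;

const avgMsgPerSec     = MSGS_PER_DAY / 86400;                          // ~1.16M msg/sec
const peakMsgPerSec    = avgMsgPerSec * PEAK_FACTOR;                    // ~3.5M msg/sec
const storagePerDayTB  = MSGS_PER_DAY * AVG_MSG_BYTES / 1e12;           // ~20 TB/day
const storagePerYearPB = storagePerDayTB * 365 * REPLICATION / 1000;    // ~22 PB/year
const kafkaMBps        = peakMsgPerSec * AVG_MSG_BYTES / 1e6;           // ~700 MB/sec
const cassandraNodes   = Math.ceil(peakMsgPerSec * REPLICATION / 50e3); // ~210 nodes

console.log({ avgMsgPerSec, peakMsgPerSec, storagePerDayTB, storagePerYearPB, kafkaMBps, cassandraNodes });
```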
Summary
The journey from a polling endpoint to a WhatsApp-scale chat system follows a clear progression. Each level solves a specific bottleneck:
| Level | Problem Solved | Key Technology |
|---|---|---|
| 1 — Polling | Works at all | HTTP GET |
| 2 — Long Polling | Latency | HTTP hold-open |
| 3 — WebSockets | Real-time + wasted requests | TCP upgrade |
| 4 — Message Service | Durability + decoupling | Kafka + Cassandra |
| 5 — Snowflake IDs | Ordering at scale | 64-bit composite ID |
| 6 — Receipts | ✓✓ blue ticks | ACK events over WebSocket |
| 7 — Offline Queue | Durability while disconnected | Redis Lists + TTL |
| 8 — Group Fan-out | 499 deliveries from 1 message | Kafka partitions |
| 9 — E2EE | Privacy | Signal protocol |
WebSocket connections are stateful — this means you cannot simply add servers behind a load balancer. You need sticky sessions or a pub/sub layer (Redis Pub/Sub or Kafka) so any server can route delivery to any open connection. This is the core operational complexity of running a chat system at scale.
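One possible shape for that routing layer, sketched with Redis Pub/Sub via `ioredis`; the channel naming scheme and presence keys are assumptions carried over from the earlier sketches, and `deliverToUser` is the local push helper.

```javascript
const Redis = require('ioredis');
const sub = new Redis();  // a connection in subscriber mode cannot run other commands,
const pub = new Redis();  // so keep a separate client for GET / PUBLISH / RPUSH

// Each Chat Server listens for deliveries addressed to it
sub.subscribe('server:' + SERVER_ID);
sub.on('message', function(channel, payload) {
  const { userId, message } = JSON.parse(payload);
  deliverToUser(userId, message);   // write to the locally held WebSocket
});

// Any node can deliver: look up presence, then publish to the server holding the socket
async function routeDelivery(userId, message) {
  const serverId = await pub.get('presence:' + userId);
  if (serverId) {
    await pub.publish('server:' + serverId, JSON.stringify({ userId, message }));
  } else {
    await pub.rpush('offline:' + userId, JSON.stringify(message));
  }
}
```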
In an interview, the progression from polling to WebSockets to the full distributed architecture demonstrates the ability to start simple, identify bottlenecks, and apply the right tool at each layer. The delivery receipt mechanism and offline queue are the most commonly missed details — knowing them signals depth of understanding.