System Design: Collaborative Document Editing — How Google Docs Works

Series System Design: Web Scenarios

Google Docs launched
in 2006. Real-time
multi-user editing felt
like magic then — it
still requires serious
distributed systems
thinking to build.

Design Google Docs. Multiple users edit the same document at the same time. Changes from any user appear in real-time on every other screen. The document must always be consistent — no lost edits, no corrupted state, no divergent copies.

The question: Design a collaborative document editor like Google Docs. Support concurrent edits from multiple users. Guarantee convergence — all clients must end up with the same document.

1. The Core Problem

Two users open the same document simultaneously. Both see “Hello”. Now:

User A inserts " World" at position 5 → sees "Hello World"
User B deletes "H" at position 0 → sees "ello"

Each operation is valid in isolation. But both happen concurrently — neither user has seen the other’s change when they act. If you apply both operations naively in different orders, you get different results:

pseudocode

# Order 1: A first, then B
"Hello"
→ INSERT(" World", pos=5) → "Hello World"
→ DELETE(pos=0)            → "ello World"

# Order 2: B first, then A
"Hello"
→ DELETE(pos=0)            → "ello"
→ INSERT(" World", pos=5) → "ello World" # but position 5 is now past the end!
# → or corrupted, depending on how you handle out-of-bounds

The two clients diverge. This is the concurrent edit conflict problem, and it is the central challenge of collaborative editing at any scale.

Convergence requirement: Every client that applies the same set of operations — in any order — must end up with the same document. This property is called strong eventual consistency.

2. Level 1 — Pessimistic Locking

The simplest solution: lock the document when someone is editing. Only one user types at a time.

pseudocode

function acquireLock(docId, userId):
    if locks[docId] == null:
        locks[docId] = userId
        return true
    return false  # Someone else has the lock — you must wait

function releaseLock(docId, userId):
    if locks[docId] == userId:
        locks[docId] = null

Why this fails as a product: Google Docs with a “Document locked — someone else is editing” banner would be unusable. Real-time collaboration is the product’s core value proposition. Locking destroys it. You also have to handle lock expiry (what if the lock holder crashes?), which adds more complexity.

Pessimistic locking is correct for preventing conflicts but catastrophically wrong for collaborative UX.

3. Level 2 — Last-Write-Wins

Every operation gets a timestamp. When two operations conflict, the one with the higher timestamp wins.

pseudocode

Operation {
    content:   string
    position:  int
    timestamp: int   # wall clock on client machine
    userId:    string
}

function applyWithLWW(ops):
    # Sort by timestamp, last writer wins on conflict
    return ops.sort((a, b) => b.timestamp - a.timestamp)[0]

Why this fails: Client clocks are not synchronized. NTP drift can be hundreds of milliseconds. A user on a laptop with a slightly fast clock always wins. More critically, LWW silently discards one user’s edit — User B’s carefully typed paragraph vanishes with no warning. That is not “conflict resolution”; it is data loss.

The silent data loss problem: In LWW, every "conflict resolution" is actually a decision to silently throw away one user's work. For a document editor, this is unacceptable. Users must not lose edits they typed.

4. Level 3 — Operational Transformation (OT)

The original Google Docs approach (2006–2010). Still used in many systems today.

Core idea: when you receive a remote operation that conflicts with your local state, transform the remote operation to account for what you have already done locally, before applying it.

How OT Works

Every operation has a type (INSERT or DELETE), a position, and content. The transformation function T(op1, op2) returns a new op1 that has been adjusted to account for op2 having already been applied.

javascript

// Transform op1 assuming op2 was already applied
function transform(op1, op2) {
  if (op1.type === 'INSERT' && op2.type === 'INSERT') {
    if (op2.position <= op1.position) {
      // op2 inserted before op1's target — shift op1 right
      return { ...op1, position: op1.position + op2.content.length };
    }
    return op1; // op2 was after op1's target — no shift needed
  }

  if (op1.type === 'INSERT' && op2.type === 'DELETE') {
    if (op2.position < op1.position) {
      // op2 deleted before op1's target — shift op1 left
      return { ...op1, position: op1.position - 1 };
    }
    return op1;
  }

  if (op1.type === 'DELETE' && op2.type === 'INSERT') {
    if (op2.position <= op1.position) {
      return { ...op1, position: op1.position + op2.content.length };
    }
    return op1;
  }

  if (op1.type === 'DELETE' && op2.type === 'DELETE') {
    if (op2.position < op1.position) {
      return { ...op1, position: op1.position - 1 };
    }
    if (op2.position === op1.position) {
      return null; // Both deleted the same character — op1 is a no-op
    }
    return op1;
  }
}

OT Example

Document: "Hello". User A: INSERT("!", 5) → "Hello!". User B (concurrently): INSERT("?", 5) → "Hello?". B’s op arrives at the server after A’s op was already applied.

pseudocode

# Server has already applied A's INSERT("!", 5)
# Document state: "Hello!"

# B's original op: INSERT("?", 5)
# Transform against A's INSERT("!", 5):
#   A inserted at position 5 → B's position 5 must shift to 6
transformed_B = transform(INSERT("?", 5), INSERT("!", 5))
# → INSERT("?", 6)

# Apply transformed_B to "Hello!"
# → "Hello!?"  ✓  (both users' intent preserved)

Google Docs originally
used OT, implemented
from scratch in JavaScript.
The OT algorithm for
rich text — handling
concurrent bold+italic+
cursor+insert correctly —
requires dozens of
transformation cases.

Interactive OT Demo

▶ Operational Transform — Conflict Visualizer

▶ User A

Hello

▶ User B

Hello

Without OT (merged result)

—

With OT (merged result)

—

OT Limitations

OT works but it has a nasty property: the transformation function must handle every combination of concurrent operations. For plain text, there are 4 combinations (INSERT/INSERT, INSERT/DELETE, DELETE/INSERT, DELETE/DELETE). For rich text with formatting (bold, italic, comments, table cells, embedded images), the number of cases explodes into the dozens. Even small bugs in transformation functions can cause subtle divergence that only appears under specific concurrency patterns.

The second critical limitation: OT requires a central server to serialize operations. All clients must agree on the canonical order of operations. Peer-to-peer OT is theoretically possible but extraordinarily difficult to get right.

5. Level 4 — CRDT (Conflict-free Replicated Data Types)

The modern approach. Used by Figma, Notion, Linear, and most new collaborative tools.

Core insight: instead of transforming operations based on positions, assign every character a globally unique ID and express insertions relative to IDs, not positions. Characters are never “at position 5” — they are “after character ID abc:3”.

Why This Solves Convergence

If two users insert characters concurrently “at the same place”, in OT they create a conflict that needs transformation. In a CRDT, each character has a unique ID and a pointer to its predecessor ID. Applying both insertions is always safe — the ordering between them is resolved by a deterministic tie-breaking rule (e.g., larger (timestamp, siteId) tuple goes first). No server serialization is needed.

javascript

// Each character in a CRDT document
class CRDTChar {
  constructor(char, id, afterId) {
    this.char    = char;     // The actual character
    this.id      = id;       // Unique ID: "siteId:counter"
    this.afterId = afterId;  // Insert after this ID (null = beginning)
    this.deleted = false;    // Tombstone — never physically remove
  }
}

class CRDTDocument {
  constructor(siteId) {
    this.siteId  = siteId;
    this.counter = 0;
    this.chars   = []; // ordered list of CRDTChar
  }

  insert(char, afterId) {
    const id = this.siteId + ':' + (++this.counter);
    const newChar = new CRDTChar(char, id, afterId);
    this._integrate(newChar);
    return id;
  }

  _integrate(newChar) {
    // Find the position of afterId, then insert after it
    // Tie-break concurrent inserts at same position by ID comparison
    let insertIdx = 0;
    for (let i = 0; i < this.chars.length; i++) {
      if (this.chars[i].id === newChar.afterId) {
        insertIdx = i + 1;
        // Skip past any chars already inserted here with higher ID (tie-break)
        while (insertIdx < this.chars.length
               && this.chars[insertIdx].afterId === newChar.afterId
               && this.chars[insertIdx].id > newChar.id) {
          insertIdx++;
        }
        break;
      }
    }
    this.chars.splice(insertIdx, 0, newChar);
  }

  delete(id) {
    // Tombstone — mark deleted but keep in array (preserves IDs for future refs)
    const ch = this.chars.find(c => c.id === id);
    if (ch) ch.deleted = true;
  }

  getText() {
    return this.chars.filter(c => !c.deleted).map(c => c.char).join('');
  }
}

Figma switched to
CRDTs in 2019. Their
post “How Figma’s
multiplayer technology
works” is one of the
best technical deep-
dives on real-world
CRDT implementation.
They chose CRDTs over
OT because OT requires
a central server to
serialize operations —
CRDTs can work
peer-to-peer.

Interactive CRDT Visualizer

▶ CRDT Document — ID-Based Character Nodes

View mode:

Types of Text CRDTs

Algorithm	Approach	Used by	Notes
RGA	Each char has unique (timestamp, siteId) tuple; ordering is deterministic	Many research systems	Simple, well-studied
LSEQ	Tree-based fractional position identifiers	Academic	Space-efficient IDs
Logoot/WOOT	Position identifiers are tuples; unique across all sites	Atom Teletype (early)	Classic, readable spec
Yjs	YATA algorithm — custom CRDT with efficient encoding	VS Code Live Share, Overleaf, ProseMirror	Production proven
Automerge	JSON-compatible CRDT; RGA for sequences	Various collaborative tools	Rich data model

6. Level 5 — Real-Time Transport

Operations need to reach all collaborators within milliseconds. The transport layer is separate from the conflict-resolution algorithm.

WebSocket Architecture

javascript

// Client — connect to collaboration server and send ops
class CollabClient {
  constructor(docId, userId) {
    this.ws = new WebSocket('wss://collab.example.com/doc/' + docId);
    this.userId = userId;
    this.opBuffer = [];     // Ops queued while offline
    this.revision = 0;     // Last server revision seen

    this.ws.onopen = () => this._flush();
    this.ws.onmessage = (e) => this._recv(JSON.parse(e.data));
    this.ws.onclose  = () => setTimeout(() => this._reconnect(), 1000);
  }

  sendOp(op) {
    const msg = { type: 'op', op, revision: this.revision, userId: this.userId };
    if (this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(msg));
    } else {
      this.opBuffer.push(msg);  // Queue for when reconnected
    }
  }

  _recv(msg) {
    if (msg.type === 'op' && msg.userId !== this.userId) {
      this.revision = msg.revision;
      applyRemoteOp(msg.op);  // Apply the OT-transformed / CRDT op
    }
    if (msg.type === 'ack') {
      this.revision = msg.revision;  // Server confirmed our op
    }
    if (msg.type === 'presence') {
      updateCursor(msg.userId, msg.position, msg.color);
    }
  }

  _flush() {
    while (this.opBuffer.length) {
      this.ws.send(JSON.stringify(this.opBuffer.shift()));
    }
  }
}

Operation Flow

Every keystroke generates an operation that travels through this pipeline:

Client A

types char

→

WebSocket

op + revision

→

Collab Server

OT/CRDT transform

→

Op Log DB

append operation

→

Clients B,C,D

broadcast

The server serializes the operation stream. This is why OT requires a server — the server assigns a global revision number to each op, and clients use that revision to know which ops they need to transform against.

7. Level 6 — Persistence & Version History

Operation Log

Every operation is appended to an operation log — an immutable, ordered record of every change ever made to the document.

sql

-- Operation log table
CREATE TABLE doc_operations (
  revision   BIGINT      NOT NULL,
  doc_id     UUID        NOT NULL,
  user_id    UUID        NOT NULL,
  op_type    VARCHAR(16) NOT NULL, -- INSERT | DELETE | FORMAT
  op_data    JSONB       NOT NULL, -- {position, content, attrs, ...}
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (doc_id, revision)
);

-- Snapshot table (periodic compaction)
CREATE TABLE doc_snapshots (
  doc_id          UUID        NOT NULL,
  at_revision     BIGINT      NOT NULL,
  snapshot_data   BYTEA       NOT NULL, -- serialized document state
  created_at      TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (doc_id, at_revision)
);

Document State Reconstruction

pseudocode

function loadDocument(docId, atRevision = null):
    # Find nearest snapshot before atRevision
    snap = getLatestSnapshot(docId, before = atRevision)

    # Replay all ops from snapshot to target revision
    ops = getOps(docId, fromRevision = snap.atRevision, toRevision = atRevision)

    doc = deserialize(snap.data)
    for op in ops:
        doc = applyOp(doc, op)
    return doc

Version history for free: because every operation is stored in order, "See version history" in Google Docs is just loadDocument(docId, atRevision=X) — replay the op log up to revision X. The document at any point in time is always reconstructible.

Storage Estimates

A typical heavily edited document after one year:

~100B

Bytes per op

~1M

Ops / year / heavy doc

~100MB

Op log / year

~1–5MB

Snapshot size

Snapshots are taken periodically (e.g., every 10,000 ops or every 24 hours). On load, the client fetches the latest snapshot plus any ops since the snapshot — typically a small number. This bounds cold-start latency regardless of document age.

8. Level 7 — Cursors & Presence Awareness

Showing other users’ cursors and selections is a separate problem from document consistency. Cursor positions are ephemeral — they do not need to be persisted or replayed.

javascript

// Broadcast presence on every cursor move
function onCursorMove(position, selection) {
  ws.send(JSON.stringify({
    type:      'presence',
    userId:    currentUser.id,
    color:     currentUser.color,
    position:  position,
    selection: selection,   // {start, end} or null
    name:      currentUser.displayName
  }));
}

// Render remote cursors as overlays
function updateCursor(userId, position, color) {
  let cursor = document.getElementById('cursor-' + userId);
  if (!cursor) {
    cursor = document.createElement('div');
    cursor.id = 'cursor-' + userId;
    cursor.className = 'remote-cursor';
    cursor.style.borderColor = color;
    document.getElementById('editor').appendChild(cursor);
  }
  // Convert document position to pixel coords and reposition the cursor overlay
  const coords = positionToCoords(position);
  cursor.style.left = coords.x + 'px';
  cursor.style.top  = coords.y + 'px';
}

Redis Pub/Sub for Presence

Each document session has a Redis channel. When a user joins or their cursor moves, the collaboration server publishes to that channel. All servers subscribed to that document receive the update and relay it to their connected clients.

pseudocode

# Server: user joins document session
on ws_connect(userId, docId):
    redis.subscribe("doc:" + docId)
    redis.publish("doc:" + docId, { event: "join", userId })
    redis.hset("presence:" + docId, userId, JSON.stringify({ online: true, lastSeen: now() }))
    redis.expire("presence:" + docId, 86400)  # TTL — auto-clean inactive sessions

# Broadcast presence updates to all servers for this doc
on ws_message(userId, docId, presenceMsg):
    redis.publish("doc:" + docId, presenceMsg)

# On message received from Redis channel
on redis_message(channel, msg):
    docId = channel.split(":")[1]
    broadcastToLocalClients(docId, msg)

9. Conflict-Free Features Beyond Text

The CRDT concept extends naturally to other document features. Each feature maps to a specific CRDT data structure:

Feature	CRDT Type	Semantics
Comments / suggestions	G-Set (grow-only set)	Comments can only be added. "Delete" is a soft-delete flag, not a true removal. Ensures no comment is lost if two users delete the same comment concurrently.
Checkboxes / toggles	LWW-Register	Last-write-wins per checkbox ID. Acceptable because the "conflict" (two users clicking a checkbox simultaneously) has no meaningful resolution — the last click wins.
Bold / italic formatting	OR-Set (observed-remove set)	Removes only what the remover "saw". If A adds bold and B concurrently removes bold (on a stale view), A's add wins because B didn't observe A's add.
Document title	LWW-Register	Last timestamp wins. Title conflicts are rare and LWW is acceptable for short string fields.
Cursor / selection	Ephemeral (not a CRDT)	Not persisted. Broadcast only. No convergence needed — latest update always replaces previous.

The classic paper
“Operational Transformation
in Real-Time Group
Editors” (Ellis & Gibbs,
1989) was written before
the web existed. Google
Docs is essentially a
web-scale implementation
of a 1989 research paper.

10. Capacity Estimate

Metric	Number
Concurrent collaborators per document	typically 2–10, max ~100
Operations per user per second (active editing)	5–20
Operation payload size	50–200 bytes
Bandwidth per active collab session	~2–10 KB/s inbound per server
WebSocket connections per server	~50K–100K (depends on RAM)
Documents per collaboration server	~5K–10K active sessions
Op log for a 1-year heavily edited document	~100MB
Snapshot size for a large document	1–5 MB
Redis presence TTL	24 hours after last activity

11. Full Architecture

Click each component to see its responsibilities:

▶ Collaborative Editing Architecture

Client A

WebSocket

⟶

Load Balancer

sticky sessions

⟶

Collab Server

OT Engine

Op Log DB

append-only

Redis

pub/sub + presence

Snapshot Worker

periodic compaction

Clients B, C, D

receive broadcast

12. OT vs CRDT — When to Use Which

Property	OT	CRDT
Requires central server	Yes — to serialize ops	No — peer-to-peer possible
Algorithm complexity	High for rich text (many transformation cases)	Moderate; simpler reasoning
Storage overhead	Low — ops are compact	Higher — tombstones accumulate, IDs add metadata
Offline support	Difficult — need server to reintegrate	Natural — merge on reconnect
Industry adoption	Google Docs (original), Google Wave	Figma, Notion, Linear, Yjs ecosystem
Proven at scale	Yes — 20 years production use	Yes — growing rapidly

For a system design interview: either is a valid answer. State the trade-off explicitly: OT is simpler to reason about when you have a central server (as most web apps do); CRDT is preferable if you need offline-first or peer-to-peer capability.

13. Interview Checklist

When asked to design Google Docs in an interview, hit these points in order:

1. Identify the core problem — concurrent edits create divergence. Define convergence as the requirement.

2. Eliminate bad approaches — pessimistic locking destroys UX; last-write-wins loses edits. Explain why, briefly.

3. Choose OT or CRDT — describe the algorithm at a conceptual level. Show the INSERT/INSERT transformation example.

4. Transport — WebSocket per client, Collaboration Server serializes the op stream, Redis pub/sub for multi-server broadcast.

5. Persistence — append-only op log, periodic snapshots, version history is replay from snapshot + ops.

6. Presence & cursors — ephemeral, broadcast via WebSocket, Redis for cross-server fanout.

7. Scale numbers — op size (~100B), bandwidth per session (~5 KB/s), storage (~100MB/year per heavy doc).

1. The Core Problem

2. Level 1 — Pessimistic Locking

3. Level 2 — Last-Write-Wins

4. Level 3 — Operational Transformation (OT)

How OT Works

OT Example

Interactive OT Demo

OT Limitations

5. Level 4 — CRDT (Conflict-free Replicated Data Types)

Why This Solves Convergence

Interactive CRDT Visualizer

Types of Text CRDTs

6. Level 5 — Real-Time Transport

WebSocket Architecture

Operation Flow

7. Level 6 — Persistence & Version History

Operation Log

Document State Reconstruction

Storage Estimates

8. Level 7 — Cursors & Presence Awareness

Redis Pub/Sub for Presence

9. Conflict-Free Features Beyond Text

10. Capacity Estimate

11. Full Architecture

12. OT vs CRDT — When to Use Which

13. Interview Checklist

Further Reading