System Design: Distributed ID Generator — From UUID to Twitter Snowflake

Series #13 of 15 — System Design Interview Prep

**Interview question:** Design a distributed unique ID generator that produces IDs at 10,000/sec, is sortable by creation time, works across 100 servers without coordination, and fits in 64 bits.

This question recurs in every senior distributed-systems interview because it is deceptively simple on the surface and layered with real trade-offs underneath. We will work through every level — from the naïve answer to Twitter's production design — and build interactive tools along the way.

---

## 1. The Problem — Why Not AUTO_INCREMENT?

In a single relational database, `AUTO_INCREMENT` (MySQL) or `SERIAL` (Postgres) is perfectly fine. The database engine guarantees a monotonically increasing counter, atomic across concurrent inserts.

Distributed systems shatter this assumption immediately. Imagine you are Twitter, storing 500 million tweets per day, spread across 100 MySQL shards to handle the write volume. Which shard assigns the next ID? If you designate a single ID authority, you have created a single point of failure: every write must round-trip to the ID server before it can proceed, under load it becomes the bottleneck, and if it fails your entire write path stops.

The requirements we need to satisfy before proposing a solution:

- **Globally unique** — no two IDs ever collide, across any server, any datacenter, any point in time
- **64-bit** — must fit in a `long` / `int64`; 128-bit IDs double the storage cost of every index and foreign key
- **Time-sortable** — if ID *A* was generated before ID *B*, then *A* < *B* numerically. This makes range queries (`WHERE id > X AND id < Y`) meaningful as time ranges
- **No runtime coordination** — each node generates IDs fully independently at runtime. Startup coordination (e.g. assigning machine IDs) is acceptable
- **10,000+ IDs/sec per node** — at minimum. Production systems need 100K–1M/sec

---

## 2. Level 1 — UUID v4

The simplest globally unique ID is a UUID (Universally Unique Identifier). Version 4 uses 122 random bits:
// UUID v4 format (128 bits total, displayed as 32 hex digits + 4 dashes)
550e8400-e29b-41d4-a716-446655440000
//            ^                     version = 4 (random)
//                 ^                variant = 10xx (RFC 4122)
// Java
import java.util.UUID;

public class UUIDDemo {
    public static String generate() {
        return UUID.randomUUID().toString();
        // "a3bb189e-8bf9-3888-9912-ace4e6543002"
    }
}

// Python
import uuid
uid = str(uuid.uuid4())   # "f47ac10b-58cc-4372-a567-0e02b2c3d479"
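How unique is "probabilistically unique"? A quick back-of-envelope birthday bound, sketched in Python (an illustration, not part of any UUID library):

```python
# Birthday bound for UUID v4 collisions, assuming n IDs drawn uniformly
# from 2^122 possibilities: P(collision) ≈ n^2 / (2 * 2^122)
def uuid4_collision_probability(n: int) -> float:
    return n * n / (2 * 2**122)

print(uuid4_collision_probability(103 * 10**12))  # ≈ 1e-9: one in a billion
```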
**What works:** Zero coordination needed. Works offline. Probabilistically unique — you would need to generate roughly 103 trillion UUIDs before the collision probability reaches one in a billion. Good enough for almost any real-world system.

**The four problems:**

1. **128 bits** — doubles the key size of every index and foreign key. A billion-row table with a UUID primary key pays 16 bytes per key vs 8 bytes for a 64-bit integer, in the primary index and again in every secondary index and referencing table.
2. **Not time-sortable** — the 122 random bits have no time component. You cannot derive insertion order from UUID values.
3. **B-tree fragmentation** — random values insert at unpredictable positions in the B-tree index, causing frequent page splits and cache evictions. We will visualize this in Section 11.
4. **URL-unfriendly** — 36 characters with dashes; awkward in REST endpoints and log files.

UUID v1 embeds a timestamp in its structure, but it leaks the MAC address of the generating machine (a privacy issue) and still has poor index locality because the low-order timestamp bytes come first.

---

## 3. Level 2 — Database Auto-Increment with Multiple Servers

A pragmatic workaround: multiple database servers, each claiming a different stride of the integer space.

{: class="marginalia" }
Twitter open-sourced Snowflake in 2010, and it has since been reimplemented in every major language. Instagram, Discord, and Mastodon all use Snowflake-style IDs.

With two servers generating IDs cooperatively:
-- Server A: generates odd IDs (1, 3, 5, 7 ...)
SET @@auto_increment_offset    = 1;
SET @@auto_increment_increment = 2;

-- Server B: generates even IDs (2, 4, 6, 8 ...)
SET @@auto_increment_offset    = 2;
SET @@auto_increment_increment = 2;
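The same offset/increment settings, sketched app-side in Python to show the interleaving (illustrative only; in this pattern the database, not the application, hands out the IDs):

```python
from itertools import count, islice

# With N = 2 servers, server A takes the odd integers and server B the
# even ones. IDs never collide, but the merged stream is not time-ordered.
server_a = count(start=1, step=2)   # offset 1, increment 2: 1, 3, 5, 7 ...
server_b = count(start=2, step=2)   # offset 2, increment 2: 2, 4, 6, 8 ...

print(list(islice(server_a, 4)))    # [1, 3, 5, 7]
print(list(islice(server_b, 4)))    # [2, 4, 6, 8]
```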
Flickr famously used this pattern in production for years. It is simple, battle-tested, and requires no application-level changes — the database handles everything.

**Where it breaks down:**

- Hard-coded to exactly *N* servers. Scaling from 2 to 3 means reassigning offsets and carefully migrating live data.
- Adding a server mid-flight risks collisions unless every table's current max is analysed and new offsets are chosen carefully.
- IDs from the two servers are interleaved but not strictly time-ordered. Server A's ID 999 may have been generated hours after Server B's ID 1000.
- Each server is still a single point of failure for its ID range.

This is an acceptable solution for an early-stage product with one or two databases. It stops scaling cleanly just when you need it most.

---

## 4. Level 3 — Timestamp + Random Suffix

Each server independently concatenates the current millisecond with random bits:
// 48-bit ms timestamp + 16-bit random = 64-bit ID
// BigInt keeps all 64 bits intact — a plain Number silently loses
// precision past 2^53, and ts * 65536 exceeds that today.
function generate() {
    const ts   = BigInt(Date.now());                        // ms since Unix epoch (~41 bits today, 48 allocated)
    const rand = BigInt(Math.floor(Math.random() * 65536)); // 16-bit random
    return (ts << 16n) | rand;
}
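How often does that 16-bit suffix actually collide? A standalone Python check of the exact probability, for comparison with the birthday approximation quoted below:

```python
# Exact probability that k IDs minted in the same millisecond collide
# when each draws a suffix uniformly from a 16-bit space.
def collision_probability(k: int, space: int = 2**16) -> float:
    p_unique = 1.0
    for i in range(k):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

print(f"{collision_probability(10):.4%}")  # ≈ 0.07%, same ballpark as the ≈0.08% estimate below
```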
IDs generated in different milliseconds will sort correctly. IDs in the *same* millisecond sort by their random component — which may not match generation order.

**The collision problem:** At 10,000 IDs/sec you expect roughly 10 IDs per millisecond per server. With 65,536 random values available, the birthday-problem collision probability within one millisecond is approximately 10² / (2 × 65,536) ≈ 0.08%. Small — but not zero. Under a traffic spike it gets meaningfully worse.

Adding a machine ID prefix reduces cross-server collisions to zero, but within a single server in a single millisecond, collisions remain possible under load. For any system requiring a hard uniqueness guarantee this is not good enough.

---

## 5. Level 4 — Twitter Snowflake ⭐

Snowflake solves every requirement by carefully partitioning 64 bits into three non-overlapping fields:
64-bit Snowflake = [1 sign bit] + [41 timestamp bits] + [10 machine-ID bits] + [12 sequence bits]
The timestamp occupies the most significant bits, so IDs are strictly ordered by time. The sequence increments atomically within a millisecond, so no coordination is needed between calls on the same node.
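The packing itself is just shifts and masks. A minimal Python sketch of composing and decoding an ID under this layout (the epoch constant matches the Java example in Section 6; any fixed custom epoch works):

```python
EPOCH_MS = 1577836800000  # Jan 1 2020 UTC, an illustrative custom epoch

def compose(ms: int, machine_id: int, sequence: int) -> int:
    # [41-bit timestamp][10-bit machine][12-bit sequence]
    return ((ms - EPOCH_MS) << 22) | (machine_id << 12) | sequence

def decompose(snowflake_id: int) -> dict:
    return {
        "timestamp_ms": (snowflake_id >> 22) + EPOCH_MS,
        "machine_id":   (snowflake_id >> 12) & 0x3FF,  # 10-bit mask
        "sequence":     snowflake_id & 0xFFF,          # 12-bit mask
    }

sid = compose(ms=1700000000000, machine_id=42, sequence=7)
assert decompose(sid) == {"timestamp_ms": 1700000000000, "machine_id": 42, "sequence": 7}
```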
**Interactive Snowflake Visualizer — try it:**

*(Interactive tool: generate or paste a Snowflake ID and it displays the decimal value, timestamp offset, UTC date/time, machine ID, sequence, and segmented binary form. The field layout it illustrates:)*

| Field | Bits | Position | Meaning |
|---|---|---|---|
| Sign | 1 | bit 63 | Always 0, keeping the ID a positive signed 64-bit integer |
| Timestamp | 41 | bits 62–22 | Milliseconds since the custom epoch (the demo uses Jan 1 2010); 2^41 ms ≈ 69.7 years |
| Machine ID | 10 | bits 21–12 | Identifies the generating server; 2^10 = 1,024 machines, often split 5 bits datacenter + 5 bits node |
| Sequence | 12 | bits 11–0 | Increments per millisecond on this machine, resets to 0 each new ms; 2^12 = 4,096 IDs/ms/machine |
**The maths behind Snowflake:**

- **41-bit timestamp** = 2^41 ms = 2,199,023,255,552 ms ÷ 86,400,000 ms/day ÷ 365.25 ≈ **69.7 years**. Set your custom epoch to Jan 1 2020 and you are covered until 2089.
- **10-bit machine ID** = 2^10 = **1,024 unique nodes**. In practice this is often split into 5 bits for datacenter (32 datacenters) and 5 bits for machine within that datacenter (32 machines each), giving 1,024 total nodes with structured routing.
- **12-bit sequence** = 2^12 = **4,096 IDs per millisecond per machine** → **4,096,000 IDs/sec per machine**. The 10K/sec requirement is exceeded by a factor of roughly 400.

The key insight: because the timestamp occupies bits 62–22 (the most significant non-sign bits), any two Snowflake IDs are ordered by time as long as they were generated in different milliseconds. Within the same millisecond, the machine ID and sequence break ties deterministically.

---

## 6. Level 5 — The Clock Drift Problem

Snowflake has one serious vulnerability: it assumes server clocks move only forward. In production, NTP (Network Time Protocol) adjustments can move a server's clock backward by tens of milliseconds. If your clock jumps back 50 ms, the Snowflake generator will start producing IDs with timestamps it has already used — creating duplicates.

{: class="marginalia" }
The 41-bit timestamp in Snowflake gives you 69.7 years from your custom epoch. Set your epoch to 2020 and you are covered until 2089 — almost certainly enough for any system you build today.

Every production Snowflake implementation must include a clock-drift guard:
public class SnowflakeGenerator {

    private static final long EPOCH      = 1577836800000L; // Jan 1 2020 UTC
    private static final long MACHINE_ID = readMachineId(); // from config / ZooKeeper

    private long lastTimestamp = -1L;
    private long sequence      = 0L;

    public synchronized long nextId() {
        long currentMs = currentTimeMs();

        // ─── Clock drift guard ──────────────────────────────────
        if (currentMs < lastTimestamp) {
            long drift = lastTimestamp - currentMs;
            if (drift <= 5) {
                // Small drift: sleep it off
                sleepMs(drift);
                currentMs = currentTimeMs();
            } else {
                // Large drift: fail loudly — cannot guarantee uniqueness
                throw new ClockMovedBackwardException(
                    "Clock moved back " + drift + "ms. Last="
                    + lastTimestamp + " current=" + currentMs
                );
            }
        }
        // ────────────────────────────────────────────────────────

        if (currentMs == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFFL;   // 12-bit mask
            if (sequence == 0) {
                currentMs = waitNextMs(lastTimestamp); // exhausted 4096 IDs this ms
            }
        } else {
            sequence = 0;
        }

        lastTimestamp = currentMs;

        return ((currentMs - EPOCH) << 22)
             | (MACHINE_ID           << 12)
             |  sequence;
    }

    // ─── Helpers ─────────────────────────────────────────────────
    private static long currentTimeMs() {
        return System.currentTimeMillis();
    }

    private static void sleepMs(long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    private static long waitNextMs(long lastMs) {
        long now = currentTimeMs();
        while (now <= lastMs) { now = currentTimeMs(); } // spin into the next ms
        return now;
    }

    private static long readMachineId() {
        // Placeholder: resolve from static config, a ZooKeeper ephemeral
        // node, or the host IP — see Section 7 for the IP-based variant.
        return Long.parseLong(System.getProperty("machine.id", "0"));
    }

    static class ClockMovedBackwardException extends RuntimeException {
        ClockMovedBackwardException(String msg) { super(msg); }
    }
}
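The helper `currentTimeMs()` above reads the wall clock. Strategy 3 in the list below swaps in a monotonic source instead; a minimal Python sketch of that idea (names are illustrative, not from any particular library):

```python
import time

# Anchor a monotonic clock to wall-clock time once at startup, then derive
# every timestamp from the monotonic source. NTP adjustments change the
# wall clock but never the monotonic reading, so timestamps cannot regress.
WALL_ANCHOR_MS = time.time_ns() // 1_000_000
MONO_ANCHOR_NS = time.monotonic_ns()

def current_time_ms() -> int:
    elapsed_ms = (time.monotonic_ns() - MONO_ANCHOR_NS) // 1_000_000
    return WALL_ANCHOR_MS + elapsed_ms
```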
Three strategies for handling drift, in order of aggressiveness:

1. **Wait and retry (small drift)** — if drift is ≤ 5 ms, sleep until the clock catches up. Safe for most workloads; adds negligible latency.
2. **Throw an exception (large drift)** — for drifts above a threshold, fail fast. The caller retries against a healthy node. Prevents silent ID corruption.
3. **Monotonic clock** — decouple from wall-clock time entirely. Java's `System.nanoTime()` is monotonic. Use it to increment a logical counter, anchored to wall-clock time at startup (sketched above). This is the most robust approach but the hardest to implement correctly.

Modern cloud VMs (AWS, GCP, Azure) have well-disciplined clocks, with NTP adjustments typically under 1 ms. But the guard code is still essential — especially for financial systems, where duplicate IDs cause data corruption.

---

## 7. Level 6 — Sonyflake (A Snowflake Variant)

Sony's open-source [Sonyflake](https://github.com/sony/sonyflake) makes different trade-offs: lower timestamp resolution in exchange for a longer lifespan and automatic machine ID assignment:
| Field | Snowflake | Sonyflake |
|---|---|---|
| Total bits | 64 | 63 |
| Timestamp resolution | 1 ms | 10 ms |
| Timestamp bits | 41 | 39 |
| Lifespan from epoch | 69.7 years | 174 years |
| Machine ID bits | 10 (1,024 machines) | 16 (65,536 machines) |
| Machine ID source | Config / ZooKeeper | Lower 16 bits of private IP |
| Sequence bits | 12 (4,096/ms) | 8 (256/10 ms) |
| Max IDs/sec/machine | 4,096,000 | 25,600 |
Sonyflake's headline design choice: use the lower 16 bits of the machine's private IP address as the machine identifier. In a standard AWS VPC (10.0.0.0/16), the last two octets are unique per instance — no configuration required. The trade-off is per-machine throughput: with only an 8-bit sequence per 10 ms tick, a node tops out at 25,600 IDs/sec, which is plenty for the 10K/sec requirement but far below Snowflake's 4 million.
// Sonyflake 63-bit layout (Go)
// [39 bits: 10ms timestamp] [8 bits: sequence] [16 bits: machine ID]

type Sonyflake struct {
    mutex       sync.Mutex
    startTime   int64    // custom epoch in 10ms units
    elapsedTime int64    // 10ms ticks since startTime
    sequence    uint16   // 8-bit per-tick counter (masked below)
    machineID   uint16   // lower 16 bits of private IP
}

func (sf *Sonyflake) NextID() (uint64, error) {
    const maskSequence = uint16(0xFF) // 8-bit sequence
    sf.mutex.Lock()
    defer sf.mutex.Unlock()
    current := toSonyflakeTime(time.Now())
    if sf.elapsedTime < current {
        sf.elapsedTime = current
        sf.sequence = 0
    } else {
        sf.sequence = (sf.sequence + 1) & maskSequence
        if sf.sequence == 0 {
            sf.elapsedTime++                    // 256 IDs used this tick
            sleepTime(sf.elapsedTime - current) // wait for the next tick
        }
    }
    return sf.toID(), nil
}
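The lifespan numbers in the table follow directly from the timestamp width times the tick length; a two-line check in Python:

```python
# Lifespan = 2^(timestamp bits) × tick length, converted to years
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

print(2**41 * 1 / MS_PER_YEAR)   # Snowflake: 41 bits of 1 ms ticks  ≈ 69.7 years
print(2**39 * 10 / MS_PER_YEAR)  # Sonyflake: 39 bits of 10 ms ticks ≈ 174.2 years
```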
---

## 8. Level 7 — MongoDB ObjectID

MongoDB's ObjectID predates Snowflake and uses a different decomposition — 96 bits (12 bytes) with second-precision timestamps:
// MongoDB ObjectID: 12 bytes = 96 bits
//  ┌──────────────────────────────────────────────────────────┐
//  │ 4-byte timestamp │ 5-byte random │ 3-byte counter        │
//  │  (Unix seconds)  │ (machine+pid) │ (incrementing)        │
//  └──────────────────────────────────────────────────────────┘

// Example: 507f1f77bcf86cd799439011
//          ^^^^^^^^
//          507f1f77 → 0x507f1f77 = 1350508407 (Unix) = Oct 17 2012 21:13:27 UTC

Bytes 0–3:   4-byte big-endian Unix timestamp (second precision) → sortable to the second
Bytes 4–8:   5-byte random value generated at process startup (machine + pid hash)
Bytes 9–11:  3-byte incrementing counter, random initial value → 16,777,216 IDs/sec/process
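Because the timestamp sits in the first four bytes, the creation time can be recovered from the hex string directly; a small Python check using the example above:

```python
from datetime import datetime, timezone

# Decode the creation time embedded in an ObjectID's first 4 bytes
def objectid_time(oid_hex: str) -> datetime:
    seconds = int(oid_hex[:8], 16)  # big-endian Unix seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(objectid_time("507f1f77bcf86cd799439011"))  # 2012-10-17 21:13:27+00:00
```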
MongoDB ObjectIDs are **roughly sortable**: two IDs from different seconds will compare correctly. Two IDs from the *same* second will sort by the 5-byte random machine field, which carries no time information.

The 5-byte random value was historically derived from the machine's hostname + process ID. Modern MongoDB drivers use a purely random value generated at startup, to avoid hostname collisions in containerised environments where many processes share the same hostname.

At 96 bits, ObjectIDs are heavier than Snowflake but lighter than UUID. If you are already using MongoDB and do not need strict millisecond ordering, ObjectID is a fine native choice.

---

## 9. Level 8 — ULID

ULID (Universally Unique Lexicographically Sortable Identifier) combines UUID's universality with Snowflake's sortability, encoded as a URL-safe string:
// ULID: 128 bits → 26 Crockford Base32 characters
// Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
//          ^^^^^^^^^^                   = 10-char timestamp (48-bit ms Unix epoch)
//                    ^^^^^^^^^^^^^^^^   = 16-char random    (80-bit cryptographic random)

// Crockford Base32 alphabet — removes I, L, O, U to avoid visual confusion:
// 0123456789ABCDEFGHJKMNPQRSTVWXYZ
// ULID generator (JavaScript, no external libs)
function generateULID() {
    const ENC = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    // 10-char timestamp (48-bit ms, most significant character first)
    let t  = Date.now();
    let ts = "";
    for (let i = 0; i < 10; i++) {
        ts = ENC[t % 32] + ts;
        t  = Math.floor(t / 32);
    }

    // 16-char random: 5 random bits per char = 80 bits total.
    // crypto.getRandomValues supplies the cryptographic randomness the
    // spec calls for; Math.random() would "work" but is weaker.
    const bytes = new Uint8Array(16);
    crypto.getRandomValues(bytes);
    let rand = "";
    for (let j = 0; j < 16; j++) {
        rand += ENC[bytes[j] & 31];
    }

    return ts + rand;
}
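The 10-character time prefix decodes back to milliseconds with plain base-32 arithmetic; a small Python check (uses the alphabet above; `.upper()` reflects the case-insensitive parsing rule noted below):

```python
# Decode the 48-bit millisecond timestamp from a ULID's first 10 characters
ENC = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_timestamp_ms(ulid: str) -> int:
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + ENC.index(ch)
    return ms

# The example ULID from above decodes back to its creation time in ms:
print(ulid_timestamp_ms("01ARZ3NDEKTSV4RRFFQ69G5FAV"))
```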
ULID advantages over Snowflake and UUID:

- **String-sortable** — `01ARZ3` < `01ARZ4` lexicographically, which maps to time order. Store it as a `VARCHAR(26)` and `ORDER BY id` gives you time order.
- **No special characters** — safe in URLs, filenames, JSON keys, and database columns without quoting.
- **Case-insensitive** — `01arz3ndek` and `01ARZ3NDEK` are the same ULID.
- **128-bit collision resistance** — same strength as UUID v4 for the random component.

The main disadvantage vs Snowflake: 128 bits vs 64. And two ULIDs generated in the same millisecond will sort by their random component, not strict generation order. The optional monotonic ULID spec addresses this by incrementing the random part within a millisecond.

---

## 10. Interactive: ID Algorithm Comparison

Generate IDs for each algorithm and observe sortability. The colour indicates whether the five generated IDs, when sorted lexicographically, appear in the same order they were created (green = time-sorted, red = scrambled).
*(Interactive demo: one generator panel each for UUID v4 (128 bits, no coordination), Auto-Increment (64 bits, DB coordination), Snowflake (64 bits, no coordination), and ULID (128 bits, no coordination). Green = IDs sort in generation order; red = sort order does not match generation order.)*

---

## 11. DB Index Locality — The Hidden Performance Cliff

One of the most practically important differences between random and sequential IDs is their impact on write performance in clustered database indexes (InnoDB, Postgres B-tree, RocksDB).

{: class="marginalia" }
Random UUID inserts into a MySQL InnoDB table with 100 million rows can be 10–50× slower than sequential IDs due to B-tree page fragmentation. This is a real production performance cliff that teams hit when they first scale a UUID-keyed table.

A B-tree index stores keys in sorted order across fixed-size pages (16 KB in MySQL). When you insert a new key:

- **Sequential IDs** always insert at the rightmost leaf page — the current maximum. Pages fill left to right, the buffer pool cache stays warm, writes are sequential on disk.
- **Random IDs** insert at a random position in the tree. The target leaf page is almost certainly not in the buffer pool. The engine must perform a random disk read, find the target page, potentially split it (expensive), and write it back.
*(Figure: B-tree leaf page fill pattern — sequential inserts (Snowflake) vs random inserts (UUID v4).)*

Sequential inserts always append to the rightmost page: minimal cache misses, and no page splits until a page fills, at which point only the last page splits. Random inserts scatter across all pages; with a large table most pages will be cold in the buffer pool, so each insert pays a random I/O penalty.
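A toy model makes the cliff measurable. The sketch below stands in an LRU dict for the buffer pool and counts cold-page reads for the two insert patterns; the page and pool sizes are arbitrary assumptions, chosen only to show the shape of the effect:

```python
import random
from collections import OrderedDict

# Toy buffer-pool model: each insert touches the leaf page that owns its
# key; an LRU cache of hot pages stands in for the InnoDB buffer pool.
NUM_PAGES, POOL_SIZE = 10_000, 100

def buffer_misses(page_ids):
    pool, misses = OrderedDict(), 0
    for pid in page_ids:
        if pid in pool:
            pool.move_to_end(pid)        # hot page, already cached
        else:
            misses += 1                  # cold page: one random disk read
            pool[pid] = True
            if len(pool) > POOL_SIZE:
                pool.popitem(last=False) # evict least-recently-used page
    return misses

n = 100_000
sequential = [i * NUM_PAGES // n for i in range(n)]       # rightmost-page pattern
scattered  = [random.randrange(NUM_PAGES) for _ in range(n)]
print("sequential misses:", buffer_misses(sequential))    # ≈ 10,000 (one per page)
print("random misses:    ", buffer_misses(scattered))     # ≈ 99,000 (~99% miss rate)
```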
In practice, on a 100-million-row InnoDB table running on spinning disks:

- **Snowflake / sequential IDs:** sustained insert throughput of **50,000–200,000 rows/sec**
- **UUID v4 / random IDs:** insert throughput of **3,000–20,000 rows/sec**, degrading as the table grows

On SSDs the gap narrows (random reads are cheaper) but does not disappear — even on NVMe storage, scattered B-tree page splits consume buffer pool capacity and create write amplification that hurts throughput under load.

---

## 12. Full Algorithm Comparison
| Algorithm | Bits | Time-Sortable | Coordination | Max IDs/sec/node | Best Use Case |
|---|---|---|---|---|---|
| UUID v4 | 128 | No | No | Unlimited | Simple unique IDs, no ordering needed |
| Auto-Increment | 64 | Yes | Yes (DB) | ~10K | Single-database, low-scale systems |
| Timestamp + Random | 64 | ~Yes | No | ~65K | Approximate ordering, low contention |
| Snowflake | 64 | Yes | No | 4,096,000 | Distributed systems, social platforms |
| Sonyflake | 63 | Yes | No | 25,600 | Auto-configured, private-IP networks |
| MongoDB ObjectID | 96 | ~Yes (1 s) | No | 16,777,216 | MongoDB-native applications |
| ULID | 128 | Yes | No | Unlimited | URL-safe sortable IDs, human-readable logs |
---

## Interview Answer Summary

When this question is posed in a system design interview, the ideal answer moves through these steps:

1. **Clarify requirements** — confirm the 64-bit constraint, throughput target, whether strict millisecond sorting is required, and the number of nodes.
2. **Rule out naive solutions** — explain why UUID is too large and not sortable; explain why auto-increment requires central coordination.
3. **Propose Snowflake** — sketch the three-field layout, derive the capacity maths out loud (69.7 years, 4M IDs/sec per node), explain how uniqueness is guaranteed without coordination.
4. **Address the failure modes** — clock drift guard (wait or throw), machine ID assignment strategy (static config, ZooKeeper ephemeral node, or IP-based like Sonyflake).
5. **Mention alternatives** — ULID if URL-safety or lexicographic sortability matters, Sonyflake if you want zero-configuration machine IDs, MongoDB ObjectID if the stack is already Mongo.
The 64-bit constraint is the whole point. With 128 bits you can lazily concatenate timestamp and random bits and get good-enough behaviour. With 64 bits, every bit counts — you are forced to reason carefully about the trade-offs between lifespan, node count, and per-node throughput. That reasoning is what interviewers are evaluating.