System Design: Link Unfurling — How Slack and WhatsApp Generate URL Previews

Series: System Design: Web Scenarios

The Open Graph Protocol was created by Facebook in 2010 alongside the “Like” button launch. The intent was to turn the entire web into a Facebook graph, and it succeeded beyond all expectations.

The question: Design the link unfurling system used by Slack. When a user pastes a URL in a message, Slack fetches that URL and shows a rich preview with title, description, and thumbnail image. Handle 250M unfurl requests per day, protect against SSRF attacks, cache aggressively, and support custom Open Graph metadata.


When you paste https://github.com/torvalds/linux in Slack, within seconds it expands into a rich card:

🐧
github.com
torvalds/linux: Linux kernel source tree
Linux kernel source tree. Contribute to torvalds/linux development by creating an account on GitHub.

Slack’s servers fetched the URL behind the scenes, parsed the <meta property="og:..."> tags, stored a proxied copy of the thumbnail image, and pushed the preview to every client in the channel, all in under 500ms.

This is called link unfurling (or URL enrichment). It involves web scraping, caching, security validation, and real-time push to clients. Every major messaging platform has this problem: Slack, WhatsApp, iMessage, Discord, Telegram.


2. The Open Graph Protocol

Facebook created OG in 2010 alongside the “Like” button. The spec was intentionally open so every platform would adopt it, and every major one did.

Facebook’s Open Graph Protocol (2010) standardised how pages declare preview metadata. A page that wants to control its preview embeds these tags in <head>:

```html
<!-- Open Graph meta tags in <head> -->
<meta property="og:title"        content="Linux kernel source tree" />
<meta property="og:description"  content="Contribute to torvalds/linux..." />
<meta property="og:image"        content="https://opengraph.githubassets.com/..." />
<meta property="og:url"          content="https://github.com/torvalds/linux" />
<meta property="og:type"         content="website" />
```

When a page has no OG tags, the unfurler falls back gracefully:

  1. og:title → <title> tag content
  2. og:description → <meta name="description"> → first 200 chars of <p> text
  3. og:image → first <img> with width > 200px
  4. og:url → the requested URL itself
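The fallback chain above can be sketched with nothing but Python's standard-library `html.parser`. This is a minimal illustration, not Slack's parser: `OGParser` and `extract_preview` are hypothetical names, and the "width > 200px" rule can only use a declared `width` attribute, because real image dimensions are unknown until the image is downloaded.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class OGParser(HTMLParser):
    """Collects og:* meta tags plus the fallback sources listed above."""

    def __init__(self) -> None:
        super().__init__()
        self.og: Dict[str, str] = {}
        self.meta_description: Optional[str] = None
        self.title_parts: List[str] = []
        self.paragraph_text: List[str] = []
        self.first_wide_img: Optional[str] = None
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property") or ""
            content = a.get("content")
            if prop.startswith("og:") and content is not None:
                self.og.setdefault(prop, content)  # first occurrence wins
            elif (a.get("name") or "").lower() == "description" and content is not None:
                self.meta_description = content
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
        elif tag == "img" and self.first_wide_img is None:
            # Only a declared numeric width can be checked without downloading the image
            width = a.get("width") or ""
            if width.isdigit() and int(width) > 200 and a.get("src"):
                self.first_wide_img = a["src"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_p:
            self.paragraph_text.append(data)


def extract_preview(html: str, url: str) -> Dict[str, Optional[str]]:
    p = OGParser()
    p.feed(html)
    p.close()
    body_text = " ".join(" ".join(p.paragraph_text).split())  # collapse whitespace
    return {
        "title": p.og.get("og:title") or "".join(p.title_parts).strip() or None,
        "description": p.og.get("og:description") or p.meta_description or body_text[:200] or None,
        "image": p.og.get("og:image") or p.first_wide_img,
        "url": p.og.get("og:url") or url,
    }
```

Each field tries the OG tag first and only then walks down the fallback list, so a page with complete OG tags never pays for the heuristics.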

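Twitter (now X) added its own variant, the twitter:* Card tags (twitter:card, twitter:title, twitter:description, twitter:image), which follow the same idea but use different names and are usually declared with name= rather than property=.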


3. The Unfurling Pipeline

There are two paths: synchronous (fast, cached) and asynchronous (slow, background).

Synchronous path — target: under 500ms

User sends message → Detect URL pattern → POST /unfurl {url} → Redis cache check → Fetch & parse HTML → Cache & return → Preview shown

On a cache hit (the Redis check succeeds), the pipeline short-circuits: no HTTP fetch is needed and Redis responds in under 5ms. On a cache miss, the system fetches the URL with a strict 3-second timeout, parses the HTML, stores the result in Redis, then returns the preview data. Because a 3-second fetch can blow the 500ms budget, slow URLs need the asynchronous path below.

Asynchronous path — for slow URLs

Some URLs take 2–5 seconds (JavaScript-heavy pages, slow servers). Making the user wait is bad UX. Instead:

  1. Message is sent immediately — no preview yet
  2. Unfurl job is enqueued (SQS / Kafka)
  3. Worker fetches and parses asynchronously
  4. When complete, result is pushed via WebSocket to every client in the channel
  5. Preview appears inline — message UI updates seamlessly

Interview signal: Mentioning both paths, and knowing when to use each, demonstrates you understand latency/UX tradeoffs. Don't design a purely synchronous system; slow URLs will block message sending.
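One way to join the two paths is a fixed time budget per unfurl: if the fetch finishes inside it, the preview ships with the message; if not, the message goes out bare and the preview is pushed when it lands. A minimal asyncio sketch, assuming hypothetical `fetch_preview` and `push_to_channel` callables; an in-process background task stands in for the SQS/Kafka queue and worker fleet a production system would use.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Set

Preview = Dict[str, Any]
SYNC_BUDGET_S = 0.5                     # the <500 ms synchronous target
_background: Set[asyncio.Task] = set()  # keep references so pending tasks aren't garbage-collected


async def unfurl_with_budget(
    url: str,
    fetch_preview: Callable[[str], Awaitable[Preview]],
    push_to_channel: Callable[[Preview], Awaitable[None]],
    budget: float = SYNC_BUDGET_S,
) -> Optional[Preview]:
    """Return the preview if it is ready within `budget`; otherwise finish it in the background."""
    task = asyncio.create_task(fetch_preview(url))
    done, _ = await asyncio.wait({task}, timeout=budget)
    if task in done:
        return task.result()  # fast path: the preview ships with the message

    # Slow path: the message is sent without a preview; push the card when the fetch lands
    async def push_when_ready() -> None:
        await push_to_channel(await task)

    bg = asyncio.create_task(push_when_ready())
    _background.add(bg)
    bg.add_done_callback(_background.discard)
    return None
```

Using `asyncio.wait` rather than `asyncio.wait_for` matters here: `wait` leaves the fetch running when the budget expires, whereas `wait_for` would cancel it and throw the work away.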

The unfurler service (pseudocode)

```python
# Assumes: `redis` is a redis.asyncio client; cache_key, validate_url, fetch_html,
# parse_og, choose_ttl and Preview are the helpers described in this article.
async def unfurl(url: str) -> Preview:
    # 1. Normalise: lowercase scheme + host, strip tracking params (paths stay case-sensitive)
    key = cache_key(url)

    # 2. Cache check: a hit needs no network call at all
    cached = await redis.get(key)
    if cached:
        return Preview.from_json(cached)

    # 3. SSRF validation BEFORE any network call
    validate_url(url)   # raises SSRFError if blocked

    # 4. Fetch with hard limits
    html = await fetch_html(
        url,
        timeout=3.0,
        max_size=500_000,     # 500 KB max
        max_redirects=3,
        re_validate_ip=True,  # re-check the destination IP on every redirect hop
    )

    # 5. Parse OG tags with fallback
    preview = parse_og(html, url)

    # 6. Cache the result
    ttl = choose_ttl(url)  # 1h news, 4h Twitter/X, 24h default
    await redis.setex(key, ttl, preview.to_json())

    return preview
```

4. SSRF Protection

SSRF was the attack vector in the 2019 Capital One breach. An SSRF flaw let the attacker reach the AWS EC2 metadata endpoint and steal IAM credentials, exposing 100 million customer records.

Server-Side Request Forgery (SSRF) is when an attacker tricks your server into making HTTP requests to internal infrastructure — using your server’s network privileges.

Imagine this request arriving at Slack’s unfurl endpoint:

```http
POST /unfurl
Content-Type: application/json

{
  "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
}
```

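Without validation, an unfurl service running on AWS would dutifully fetch that URL from inside its own network. The metadata service answers with the name of the instance’s IAM role, and a follow-up request to http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name> returns that role’s temporary credentials. If the response is reflected into the preview, the attacker now holds whatever AWS access the role grants.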

SSRF mitigations

| Mitigation | How it works | Catches |
|---|---|---|
| DNS pre-resolve + block | Resolve the hostname to IPs before fetching; reject RFC1918, loopback, and link-local ranges | Direct IP attacks, hostname aliases |
| Separate VPC | Unfurl service runs in an isolated network with no route to internal services | Lateral movement even if the IP check is bypassed |
| Redirect re-validation | After each redirect, re-check the destination IP | Open-redirect SSRF chains |
| Allowlist public IPs only | Only allow routable, non-reserved IP space | Cloud metadata endpoints, private ranges |
| Max 3 redirects | Limit redirect chain length | Redirect loops, deep redirect chains |
| DNS rebinding protection | Connect to the IP that was validated (pin it) instead of letting the HTTP client resolve the hostname a second time | DNS rebinding attacks |

Blocked IP ranges:

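```python
import ipaddress
import socket
import urllib.parse


class SSRFError(Exception):
    """Raised when a URL targets a blocked scheme or address."""


# Parsed once at import time instead of on every check
BLOCKED_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "0.0.0.0/8",         # "this" network (reaches localhost on Linux)
    "10.0.0.0/8",        # RFC1918 private
    "100.64.0.0/10",     # carrier-grade NAT (RFC 6598)
    "172.16.0.0/12",     # RFC1918 private
    "192.168.0.0/16",    # RFC1918 private
    "127.0.0.0/8",       # loopback
    "169.254.0.0/16",    # link-local / AWS IMDS
    "::1/128",           # IPv6 loopback
    "fc00::/7",          # IPv6 unique-local
    "fe80::/10",         # IPv6 link-local
)]


def is_safe_ip(ip_str: str) -> bool:
    addr = ipaddress.ip_address(ip_str)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the IPv4 ranges
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped:
        addr = addr.ipv4_mapped
    return not any(addr in net for net in BLOCKED_RANGES)


def validate_url(url: str) -> list[str]:
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise SSRFError("non-http scheme blocked")
    if not parsed.hostname:
        raise SSRFError("missing hostname")
    infos = socket.getaddrinfo(parsed.hostname, None)
    ips = [sockaddr[0] for (_, _, _, _, sockaddr) in infos]
    for ip in ips:
        if not is_safe_ip(ip):
            raise SSRFError("resolved to blocked IP: " + ip)
    # Return the vetted IPs so the fetcher can connect to them directly (rebinding defence)
    return ips
```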
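The redirect rows of the mitigation table (re-validate every hop, cap the chain at 3 redirects) can be sketched as a loop that fetches one hop at a time. Assumptions: `fetch_once` is a hypothetical single-request function whose HTTP client does not auto-follow redirects, and the demo validator swaps real DNS for a fake lookup table and uses `ipaddress`'s `is_global` property as the "public IPs only" allowlist, so the example runs offline.

```python
import ipaddress
import urllib.parse
from typing import Callable, Dict, Optional, Tuple

MAX_REDIRECTS = 3
REDIRECT_CODES = {301, 302, 303, 307, 308}


class SSRFError(Exception):
    pass


def follow_redirects_safely(
    url: str,
    fetch_once: Callable[[str], Tuple[int, Optional[str]]],
    validate: Callable[[str], None],
) -> Tuple[str, int]:
    """Follow at most MAX_REDIRECTS hops, validating every destination before it is fetched."""
    for _ in range(MAX_REDIRECTS + 1):
        validate(url)                       # scheme + resolved-IP check on EVERY hop
        status, location = fetch_once(url)  # one request; the client must NOT auto-follow
        if status not in REDIRECT_CODES or not location:
            return url, status
        url = urllib.parse.urljoin(url, location)  # Location may be relative
    raise SSRFError("too many redirects")


# --- Demo wiring: a fake resolver and fetcher instead of real DNS/HTTP (hypothetical hosts) ---
FAKE_DNS: Dict[str, str] = {"evil.example": "93.184.216.34", "safe.example": "93.184.216.34"}


def validate_demo(url: str) -> None:
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise SSRFError("non-http scheme blocked")
    host = parsed.hostname or ""
    ip = ipaddress.ip_address(FAKE_DNS.get(host, host))  # IP literals pass straight through
    if not ip.is_global:
        raise SSRFError(f"blocked IP: {ip}")


def fetch_demo(url: str) -> Tuple[int, Optional[str]]:
    if urllib.parse.urlparse(url).hostname == "evil.example":
        return 302, "http://10.0.0.1/admin"  # open redirect into the private network
    return 200, None
```

The first hop to `evil.example` passes validation because it resolves to a public address; only the per-hop check catches the jump to `10.0.0.1`, which is exactly the open-redirect chain the table describes.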

Example verdicts from the validation pipeline:

  • https://github.com/torvalds/linux → ✓ Safe
  • http://192.168.1.1/admin → ✗ Blocked
  • http://169.254.169.254/latest/meta-data/iam/security-credentials/ → ✗ Blocked
  • http://localhost:6379/ → ✗ Blocked
  • https://evil.com/redirect?to=http://10.0.0.1 → ✗ Blocked (after redirect)

5. Caching Strategy

Slack’s unfurl system has been creatively abused: users craft pages with specific OG metadata to generate custom Slack message cards. Rate limiting was added to prevent this kind of misuse.

Caching is the most important performance lever in link unfurling. The same GitHub or YouTube URL will be pasted by thousands of users — fetching it every time is wasteful and slow.

Cache key normalisation

Before looking up the cache, normalise the URL so that equivalent URLs share one cache entry:

https://GitHub.com/torvalds/linux?utm_source=newsletter&utm_medium=email
↓ normalize
https://github.com/torvalds/linux

https://example.com/page?ref=twitter&fbclid=abc123
↓ normalize
https://example.com/page

Normalisation steps:

  1. Lowercase the scheme and hostname
  2. Remove known tracking params: utm_*, fbclid, ref, source, campaign
  3. Sort remaining query parameters alphabetically
  4. Remove trailing slashes from the path
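A sketch of the four normalisation steps with `urllib.parse`. The tracking-parameter set mirrors the list above and is not exhaustive; dropping the `#fragment` and any `user:pass@` credentials are extra assumptions (neither affects what the server returns), and the `unfurl:` prefix plus SHA-256 digest is a hypothetical key scheme for the `cache_key` helper used in the pseudocode.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"fbclid", "ref", "source", "campaign"}  # plus any utm_* key


def normalise_url(url: str) -> str:
    parts = urlsplit(url)
    # 1. Lowercase scheme and host (.hostname is already lowercased); keep an explicit port,
    #    drop any user:pass@ credentials
    host = parts.hostname or ""
    if ":" in host:                      # IPv6 literal needs its brackets back
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    # 2 + 3. Drop tracking params, sort what remains
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_") and k.lower() not in TRACKING_PARAMS
    )
    # 4. Remove trailing slashes from the path
    path = parts.path.rstrip("/")
    # Assumption: the #fragment never reaches the server, so it is dropped as well
    return urlunsplit((parts.scheme.lower(), netloc, path, urlencode(query), ""))


def cache_key(url: str) -> str:
    # Hypothetical key scheme: fixed prefix + SHA-256 of the normalised URL
    return "unfurl:" + hashlib.sha256(normalise_url(url).encode()).hexdigest()
```

For example, `normalise_url("https://GitHub.com/torvalds/linux?utm_source=newsletter&utm_medium=email")` yields `https://github.com/torvalds/linux`, matching the first example above. Hashing keeps Redis keys a fixed length no matter how long the URL is.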

TTL policy

| URL type | TTL | Rationale |
|---|---|---|
| News articles (bbc.com, nytimes.com) | 1 hour | Content and headline may change |
| GitHub repos, docs, wikis | 24 hours | Rarely change intraday |
| YouTube videos | 24 hours | Title/thumbnail very stable |
| Twitter/X posts | 4 hours | Engagement numbers update frequently |
| 404 / error responses | 1 hour | Negative cache; site may come back |
| Default | 24 hours | Conservative default |
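A sketch of the `choose_ttl` helper from the unfurler pseudocode, mirroring the table. The domain map is illustrative rather than exhaustive, and the optional `status` argument for negative caching is an assumption about how the caller reports fetch errors.

```python
from typing import Optional
from urllib.parse import urlsplit

HOUR = 3600
# Illustrative domain map following the TTL table (not an exhaustive list)
DOMAIN_TTLS = {
    "bbc.com": 1 * HOUR, "nytimes.com": 1 * HOUR,         # news: headlines change
    "github.com": 24 * HOUR, "youtube.com": 24 * HOUR,    # stable content
    "twitter.com": 4 * HOUR, "x.com": 4 * HOUR,           # engagement counts change
}
ERROR_TTL = 1 * HOUR      # negative cache for 404s and fetch errors
DEFAULT_TTL = 24 * HOUR


def choose_ttl(url: str, status: Optional[int] = None) -> int:
    if status is not None and status >= 400:
        return ERROR_TTL
    labels = (urlsplit(url).hostname or "").split(".")
    # Walk up the domain labels so www.bbc.com and news.bbc.com match bbc.com,
    # without ever matching a bare TLD or a lookalike such as evilbbc.com
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TTLS:
            return DOMAIN_TTLS[candidate]
    return DEFAULT_TTL
```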

Image proxying

Never hotlink the og:image directly. Instead:

  1. Download the image to your own CDN at unfurl time, through the same SSRF-validated fetcher (the og:image URL is attacker-controlled too)
  2. Serve from your CDN — stable URL, controlled by you
  3. Resize to max 1200×630px to save bandwidth
  4. Convert to WebP for modern clients

If you hotlink directly, images disappear when the source site removes them, and the site owner can track every Slack user who views the message via image request logs.
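Steps 2 and 3 can be sketched with the standard library. Assumptions: objects are content-addressed by the SHA-256 of the downloaded bytes (so identical images dedupe and the CDN URL never changes), the `previews/` prefix is hypothetical, and the actual pixel resizing and WebP encoding would be done by an imaging library; only the target-size arithmetic is shown.

```python
import hashlib
from typing import Tuple

MAX_W, MAX_H = 1200, 630


def fit_within(width: int, height: int, max_w: int = MAX_W, max_h: int = MAX_H) -> Tuple[int, int]:
    """Scale (width, height) down to fit the box, preserving aspect ratio; never upscale."""
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def cdn_object_key(image_bytes: bytes) -> str:
    """Content-addressed key: same bytes -> same object -> stable, deduplicated URL."""
    return "previews/" + hashlib.sha256(image_bytes).hexdigest() + ".webp"
```

With Pillow, `Image.thumbnail((1200, 630))` applies the same fit-inside, never-enlarge rule in place before the image is saved as WebP.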

Cache invalidation

Users can click “Refresh preview” on any Slack message. This:

  1. Issues POST /unfurl?bust=1 {url} — skip cache, force re-fetch
  2. Overwrites the cache entry with fresh data
  3. Pushes the updated preview to all channel clients via WebSocket
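A minimal synchronous sketch of the refresh flow. Assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis (its `setex` mirrors Redis's key, TTL, value argument order), the URL passed in is already normalised, and `fetch` and `push` are caller-supplied functions standing in for the fetcher and the WebSocket fan-out.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for the Redis preview cache (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def setex(self, key: str, ttl: int, value: Any) -> None:
        self._data[key] = (time.monotonic() + ttl, value)


def unfurl_with_refresh(
    url: str,
    cache: TTLCache,
    fetch: Callable[[str], Any],
    push: Callable[[Any], None],
    *,
    bust: bool = False,
    ttl: int = 86_400,
) -> Any:
    """bust=True mirrors POST /unfurl?bust=1: skip the read, overwrite the entry, push the new card."""
    if not bust:
        cached = cache.get(url)
        if cached is not None:
            return cached
    preview = fetch(url)
    cache.setex(url, ttl, preview)  # overwrite, so later readers see the fresh preview
    if bust:
        push(preview)               # clients already showing the old card get updated
    return preview
```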

6. Scale Estimates

  • 50M Slack users
  • 250M unfurl requests / day
  • ~3K requests / sec
  • 80% cache hit rate
  • 600 actual fetches / sec
  • 120 MB/s external fetch bandwidth (page downloads)

The math: 50M users × 5 URLs/day = 250M requests/day ≈ 2,900/sec, rounded to ~3,000/sec. With an 80% cache hit rate (GitHub, YouTube, and Twitter dominate), actual external HTTP fetches drop to ~600/sec. At a 200KB average page size, that’s ~120 MB/sec of page downloads from external sites, manageable with a fleet of ~20 fetch workers at ~30 fetches/sec each.

Why is cache hit rate so high? Pareto principle applies hard here. A small number of popular domains (GitHub, YouTube, Twitter, Wikipedia, Notion) account for a huge fraction of all shared URLs. These hit the cache constantly.

Sizing Redis cache storage:

  • 100M cached URL previews × ~1KB per entry = ~100 GB
  • Redis with LRU eviction handles this comfortably on a few large nodes
  • Use Redis Cluster for horizontal scaling and HA

7. JavaScript-Rendered Pages

A simple HTTP GET of a React or Vue app returns near-empty HTML — the content is injected by JavaScript after load. The <head> OG tags may be present (server-side rendering), but increasingly they are not.

Solutions in order of cost:

| Approach | Latency | Cost | Use case |
|---|---|---|---|
| Plain HTTP fetch + HTML parse | <300ms | Very low | Most sites (SSR, static) |
| Server-side rendering detection | <300ms | Very low | Sites with SSR but a blank SPA shell |
| Headless Chrome render | 2–5 sec | High (CPU) | Pure SPA sites, no SSR |
| Pre-rendering cache (Rendertron) | Varies | Medium | Frequently shared SPA URLs |

Tiered approach (recommended):

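```python
async def fetch_with_fallback(url: str) -> Preview:
    # http_get, parse_og, queue, extract_domain and Preview are placeholder helpers;
    # url is assumed to have passed validate_url already, as in the unfurler above.
    # Tier 1: fast plain fetch
    html = await http_get(url, timeout=3.0)
    preview = parse_og(html, url)

    if preview.is_empty():
        # Tier 2: queue for a headless render (async) instead of blocking the message
        await queue.enqueue("headless_render", url=url)
        # Return a minimal preview now; push the full one when the headless render completes
        return Preview(url=url, title=extract_domain(url))

    return preview
```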

Headless Chrome workers are expensive, and a small warm pool does not go far. At 600 actual fetches/sec with ~5% needing headless rendering, that’s 30 headless renders/sec. Each render takes 2–5 seconds, so a pool of ~10 warm Chromium instances (10 workers × 0.5 renders/sec each at 2 seconds per render) sustains only 5 renders/sec. Keeping up needs roughly 60 renders in flight at 2 seconds each, or up to ~150 at 5 seconds, so run headless rendering as a separate, autoscaled fleet.
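The sizing argument is Little's law: renders in flight = arrival rate × time per render. A small sketch, assuming each Chromium instance renders one page at a time; the helper names are illustrative.

```python
import math


def headless_concurrency(fetches_per_sec: float, headless_pct: int, render_seconds: float) -> int:
    """Little's law: in-flight renders = arrival rate x time per render."""
    renders_per_sec = fetches_per_sec * headless_pct / 100
    return math.ceil(renders_per_sec * render_seconds)


def pool_throughput(workers: int, render_seconds: float) -> float:
    """Renders/sec a pool sustains if each Chromium instance renders one page at a time."""
    return workers / render_seconds
```

`headless_concurrency(600, 5, 2.0)` gives 60 and `headless_concurrency(600, 5, 5.0)` gives 150, while `pool_throughput(10, 2.0)` is only 5.0 renders/sec, the gap the paragraph above describes.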


8. Special Handling for Major Platforms

Most traffic comes from a few sites. Custom parsers for each deliver better previews than generic OG parsing:

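```python
from bs4 import BeautifulSoup


def parse_og(html: str, url: str) -> Preview:
    domain = extract_domain(url)  # assumed to lowercase and strip a leading "www."

    # Custom parsers for high-traffic domains
    if domain in ("youtube.com", "youtu.be"):
        return parse_youtube(html, url)

    if domain in ("twitter.com", "x.com"):
        return parse_twitter(html, url)

    if domain == "github.com":
        return parse_github(html, url)

    # Generic OG parser as default
    return parse_generic_og(html, url)


def meta_content(soup: BeautifulSoup, key: str):
    # twitter:* tags are usually declared with name=, og:* tags with property=; accept either
    tag = soup.find("meta", attrs={"name": key}) or soup.find("meta", attrs={"property": key})
    return tag.get("content") if tag else None


def parse_twitter(html: str, url: str) -> Preview:
    soup = BeautifulSoup(html, "html.parser")  # explicit parser avoids bs4's guessed-parser warning
    # Twitter uses twitter:* tags, not og:*
    title = meta_content(soup, "twitter:title")
    image = meta_content(soup, "twitter:image")
    desc  = meta_content(soup, "twitter:description")
    return Preview(title=title, image=image, description=desc, url=url)
```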

Platform-specific notes:

  • YouTube: thumbnail URL follows a stable pattern (img.youtube.com/vi/{video_id}/maxresdefault.jpg) and can be constructed without parsing HTML at all; maxresdefault.jpg is not generated for every video, so fall back to hqdefault.jpg
  • GitHub: uses a dynamic OG image service that generates repo cards server-side — these are reliable and cache well
  • Instagram / Facebook: rate-limit aggressive scrapers heavily; requires rotating user-agents and respecting crawl delays
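Because the thumbnail URL depends only on the video ID, the YouTube parser can skip HTML entirely. A sketch, assuming the common URL shapes (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`) and the 11-character ID format YouTube uses today; neither is a documented guarantee, so a production parser should still fall back to OG tags.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-] (true today, not a documented guarantee)
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def youtube_video_id(url: str) -> Optional[str]:
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parts.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parts.path == "/watch":
            candidate = parse_qs(parts.query).get("v", [""])[0]
        elif parts.path.startswith(("/shorts/", "/embed/")):
            candidate = parts.path.split("/")[2]
    # Validating the ID also stops arbitrary path text from being spliced into the image URL
    return candidate if candidate and VIDEO_ID.fullmatch(candidate) else None


def youtube_thumbnail(url: str, quality: str = "hqdefault") -> Optional[str]:
    # hqdefault is the safer default; maxresdefault is missing for some videos
    video_id = youtube_video_id(url)
    return f"https://img.youtube.com/vi/{video_id}/{quality}.jpg" if video_id else None
```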

9. Content Safety

Before surfacing a preview to a user, the unfurl pipeline should validate it:

| Risk | Check | When |
|---|---|---|
| Phishing / malware URL | Google Safe Browsing API lookup | Synchronous, before fetch |
| NSFW thumbnail image | ML image classifier (nudity/gore) | Async, after image download |
| Malware download | VirusTotal URL scan | Async (result cached) |
| Misleading OG title | Text similarity between og:title and page title | Synchronous |
| Copyright trap | Block known piracy domains | URL blocklist lookup |

Misleading previews are a real attack vector: a bad actor sets og:title to “Official COVID Vaccine Guide” while the page content is disinformation. One defence: if og:title diverges significantly from the <title> tag (cosine similarity below threshold), display the <title> instead and flag for review.
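A sketch of that defence using bag-of-words cosine similarity from the standard library. The 0.3 threshold and the function names are hypothetical; a real system would tune the threshold on labelled examples and might compare embeddings rather than raw word counts.

```python
import math
import re
from collections import Counter
from typing import Tuple

SIMILARITY_THRESHOLD = 0.3  # hypothetical; tune on labelled data


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two strings."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def choose_display_title(og_title: str, title_tag: str) -> Tuple[str, bool]:
    """Return (title_to_show, flag_for_review)."""
    if not og_title or not title_tag:
        return og_title or title_tag, False  # nothing to compare against
    if cosine_similarity(og_title, title_tag) < SIMILARITY_THRESHOLD:
        return title_tag, True               # diverges: show <title>, flag for review
    return og_title, False
```

GitHub's og:title "Linux kernel source tree" scores about 0.83 against a `<title>` of "GitHub - torvalds/linux: Linux kernel source tree", so it is kept; a title sharing no words with the page title scores 0 and is replaced.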


10. Architecture Diagram

Slack Client → POST /unfurl → API Gateway → Unfurl Service
Unfurl Service → check Redis Cache (hit: return the cached preview)
  ↓ miss
SSRF Validator → HTTP Fetcher → OG Parser
  slow URLs → Job Queue → Headless Chrome Workers → WebSocket push → Slack Client

11. Capacity Estimate

| Metric | Value |
|---|---|
| Unfurl requests / sec | 3,000 |
| Cache hit rate | 80% |
| Actual external fetches / sec | 600 |
| Avg fetched page size | 200 KB |
| External fetch bandwidth (page downloads) | 120 MB/sec |
| Redis cache storage | ~100 GB (100M entries × 1 KB) |
| Image CDN storage | ~5 TB |
| Headless workers needed | ~60 (30 renders/sec × 2 sec each) |
| Fetch worker fleet | ~20 pods (600 fetches/sec ÷ 30/pod) |

The Open Graph Protocol is now supported by every major social platform, messaging app, search engine, and link-in-bio tool: a rare case of a single company’s proprietary format becoming a genuine open standard.