System Design: Link Unfurling — How Slack and WhatsApp Generate URL Previews
The Open Graph Protocol was created by Facebook in 2010 alongside the “Like” button launch. The intent was to turn the entire web into a Facebook graph, and it succeeded beyond all expectations.
The question: Design the link unfurling (URL preview) system used by Slack. When a user pastes a URL in a message, Slack fetches it and shows a rich preview with title, description, and thumbnail image. Handle 250M unfurl requests per day, protect against SSRF attacks, cache aggressively, and support custom Open Graph metadata.
1. What is link unfurling?
When you paste https://github.com/torvalds/linux in Slack, it expands into a rich card showing the page title, a description, and a thumbnail.
Slack’s servers fetched the URL behind the scenes, parsed the <meta property="og:..."> tags, stored a proxied copy of the image, and pushed the preview to every client in the channel. On the fast path the target for all of this is under 500ms.
This is called link unfurling (or URL enrichment). It involves web scraping, caching, security validation, and real-time push to clients. Every major messaging platform has this problem: Slack, WhatsApp, iMessage, Discord, Telegram.
2. The Open Graph Protocol
Facebook created OG in 2010 alongside the “Like” button. The spec was intentionally open so every platform would adopt it, and every major one did.
Facebook’s Open Graph Protocol (2010) standardised how pages declare preview metadata. A page that wants to control its preview embeds these tags in <head>:
    <!-- Open Graph meta tags in <head> -->
    <meta property="og:title" content="Linux kernel source tree" />
    <meta property="og:description" content="Contribute to torvalds/linux..." />
    <meta property="og:image" content="https://opengraph.githubassets.com/..." />
    <meta property="og:url" content="https://github.com/torvalds/linux" />
    <meta property="og:type" content="website" />
When a page has no OG tags, the unfurler falls back gracefully:
- og:title → <title> tag content
- og:description → <meta name="description"> → first 200 chars of <p> text
- og:image → first <img> with width > 200px
- og:url → the requested URL itself
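The fallback chain above can be sketched with nothing but the standard library. This is a minimal illustration, not Slack's parser: `PreviewParser` and `extract_preview` are hypothetical names, and the `<img>` fallback for og:image is left out because judging image width usually means downloading the image.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects og:* tags plus the fallback sources (title, description, first <p>)."""

    def __init__(self):
        super().__init__()
        self.og = {}                  # e.g. {"og:title": "..."}
        self.meta_description = None  # <meta name="description">
        self.title = ""               # text inside <title>
        self.first_paragraph = ""     # text inside the first <p>
        self._in_title = False
        self._in_p = False
        self._p_done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content"):
                self.og.setdefault(prop, a["content"])  # first occurrence wins
            elif a.get("name") == "description":
                self.meta_description = a.get("content")
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and not self._p_done:
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._p_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.first_paragraph += data


def extract_preview(html: str, requested_url: str) -> dict:
    parser = PreviewParser()
    parser.feed(html)
    description = (
        parser.og.get("og:description")
        or parser.meta_description
        or parser.first_paragraph.strip()[:200]
    )
    return {
        "title": parser.og.get("og:title") or parser.title.strip(),
        "description": description,
        "image": parser.og.get("og:image"),  # <img> fallback omitted in this sketch
        "url": parser.og.get("og:url") or requested_url,
    }
```

Each field is resolved independently, so a page that sets only og:image still gets its title from `<title>` and its description from the meta tag or first paragraph.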
Twitter (now X) added their own variant — twitter:card tags — which follow the same idea but with different property names.
3. The Unfurling Pipeline
There are two paths: synchronous (fast, cached) and asynchronous (slow, background).
Synchronous path — target: under 500ms
[Pipeline diagram: message → URL pattern → {url} → cache check → parse HTML → return → shown]
On a cache hit (the cache check step succeeds), the pipeline short-circuits: no HTTP fetch is needed, and the response comes back from Redis in under 5ms. On a cache miss, the system fetches the URL with a strict 3-second timeout, parses the HTML, stores the result in Redis, then returns the preview data.
Asynchronous path — for slow URLs
Some URLs take 2–5 seconds (JavaScript-heavy pages, slow servers). Making the user wait is bad UX. Instead:
- Message is sent immediately — no preview yet
- Unfurl job is enqueued (SQS / Kafka)
- Worker fetches and parses asynchronously
- When complete, result is pushed via WebSocket to every client in the channel
- Preview appears inline — message UI updates seamlessly
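The steps above can be sketched as a small worker loop. This is an illustration under stated assumptions: `asyncio.Queue` stands in for SQS or Kafka, and `fetch_preview` and `push_to_channel` are hypothetical coroutines standing in for the fetch/parse stage and the WebSocket fan-out.

```python
import asyncio


async def unfurl_worker(jobs: asyncio.Queue, fetch_preview, push_to_channel) -> None:
    """Consume (channel_id, message_id, url) jobs and push each finished preview."""
    while True:
        channel_id, message_id, url = await jobs.get()
        try:
            preview = await fetch_preview(url)
            await push_to_channel(channel_id, message_id, preview)
        except Exception:
            # A failed unfurl simply leaves the message without a card.
            pass
        finally:
            jobs.task_done()
```

The message itself never waits on this loop. The sender's client shows the plain text immediately, and the card arrives as a later update keyed by `message_id`.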
The unfurler service (pseudocode)
    async def unfurl(url: str) -> Preview:
        # 1. Normalise: lowercase, strip tracking params
        key = cache_key(url)

        # 2. Cache check
        cached = await redis.get(key)
        if cached:
            return Preview.from_json(cached)

        # 3. SSRF validation BEFORE any network call
        validate_url(url)  # raises SSRFError if blocked

        # 4. Fetch with timeout
        html = await fetch_html(
            url,
            timeout=3.0,
            max_size=500_000,  # 500 KB max
            follow_redirects=3,
            re_validate_ip=True,
        )

        # 5. Parse OG tags with fallback
        preview = parse_og(html, url)

        # 6. Cache the result
        ttl = choose_ttl(url)  # 1h news, 24h default
        await redis.setex(key, ttl, preview.to_json())
        return preview
4. SSRF: The Most Dangerous Bug in Link Unfurling
SSRF was the attack vector in the 2019 Capital One breach. An SSRF flaw let the attacker reach the AWS EC2 metadata endpoint and steal IAM credentials, exposing 100 million customer records.
Server-Side Request Forgery (SSRF) is when an attacker tricks your server into making HTTP requests to internal infrastructure — using your server’s network privileges.
Imagine this request arriving at Slack’s unfurl endpoint:
    POST /unfurl
    Content-Type: application/json

    { "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/" }
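Slack’s unfurl service, running on AWS, dutifully fetches that URL. The response names the EC2 instance’s IAM role, and one more request to the same path with the role name appended returns that role’s temporary credentials. The attacker can now do anything in AWS that the role is allowed to do.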
SSRF mitigations
| Mitigation | How it works | Catches |
|---|---|---|
| DNS pre-resolve + block | Resolve hostname to IP before fetching; reject RFC1918 / loopback / link-local ranges | Direct IP attacks, hostname aliases |
| Separate VPC | Unfurl service runs in isolated network with no route to internal services | Lateral movement even if IP check bypassed |
| Redirect re-validation | After each redirect, re-check the destination IP | Open-redirect SSRF chains |
| Allowlist public IPs only | Only allow routable, non-reserved IP space | Cloud metadata endpoints, private ranges |
| Max 3 redirects | Limit redirect chain length | Redirect loops, deep redirect chains |
| DNS rebinding protection | Re-resolve hostname at connection time; compare to pre-resolved IP | DNS rebinding attacks |
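Blocked IP ranges:

    import ipaddress
    import socket
    import urllib.parse

    class SSRFError(Exception):
        """Raised when a URL must not be fetched."""

    BLOCKED_RANGES = [
        "0.0.0.0/8",       # "this network"; 0.0.0.0 often reaches localhost
        "10.0.0.0/8",      # RFC1918 private
        "172.16.0.0/12",   # RFC1918 private
        "192.168.0.0/16",  # RFC1918 private
        "127.0.0.0/8",     # loopback
        "169.254.0.0/16",  # link-local / AWS IMDS
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique-local
        "fe80::/10",       # IPv6 link-local
    ]

    def is_safe_ip(ip_str: str) -> bool:
        addr = ipaddress.ip_address(ip_str)
        # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the IPv4 ranges
        if addr.version == 6 and addr.ipv4_mapped is not None:
            addr = addr.ipv4_mapped
        for cidr in BLOCKED_RANGES:
            if addr in ipaddress.ip_network(cidr):
                return False
        return True

    def validate_url(url: str) -> None:
        parsed = urllib.parse.urlparse(url)
        if parsed.scheme not in ("http", "https"):
            raise SSRFError("non-http scheme blocked")
        if not parsed.hostname:
            raise SSRFError("URL has no hostname")
        # Resolve once and check every address the hostname maps to
        ips = socket.getaddrinfo(parsed.hostname, None)
        for (_, _, _, _, sockaddr) in ips:
            if not is_safe_ip(sockaddr[0]):
                raise SSRFError("resolved to blocked IP: " + sockaddr[0])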
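The "Redirect re-validation" and "Max 3 redirects" rows of the mitigation table can be sketched as one loop. This is a sketch under assumptions, not a production client: `fetch_once` and `resolve` are hypothetical injected callables (an HTTP call that does not follow redirects, and a DNS lookup returning IP strings), `BlockedURL` is an illustrative exception name, and the standard library's `is_global` property stands in for an explicit blocklist.

```python
import ipaddress
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 3
REDIRECT_CODES = (301, 302, 303, 307, 308)


class BlockedURL(Exception):
    """Raised when a hop in the redirect chain must not be fetched."""


def check_destination(url: str, resolve) -> None:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise BlockedURL(f"unsupported URL: {url}")
    for ip in resolve(parts.hostname):
        # is_global is False for private, loopback, link-local and reserved space
        if not ipaddress.ip_address(ip).is_global:
            raise BlockedURL(f"{parts.hostname} resolves to non-public {ip}")


def fetch_with_revalidation(url: str, fetch_once, resolve):
    """fetch_once(url) -> (status, headers, body); it must NOT follow redirects."""
    for _ in range(MAX_REDIRECTS + 1):
        check_destination(url, resolve)  # re-checked on every hop
        status, headers, body = fetch_once(url)
        if status in REDIRECT_CODES and "Location" in headers:
            url = urljoin(url, headers["Location"])  # handles relative redirects
            continue
        return status, body
    raise BlockedURL("too many redirects")
```

Disabling automatic redirect handling in the HTTP client is what makes the check meaningful. If the client follows redirects itself, only the first hop is ever validated.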
5. Caching Strategy
Slack’s unfurl system has been creatively abused: users craft pages with specific OG metadata to generate custom Slack message cards. Rate limiting was added to prevent this kind of misuse.
Caching is the most important performance lever in link unfurling. The same GitHub or YouTube URL will be pasted by thousands of users — fetching it every time is wasteful and slow.
Cache key normalisation
Before looking up the cache, normalise the URL so that equivalent URLs share one cache entry:
Normalisation steps:
- Lowercase the scheme and hostname
- Remove known tracking params: utm_*, fbclid, ref, source, campaign
- Sort remaining query parameters alphabetically
- Remove trailing slashes from the path
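The normalisation steps above map directly onto `urllib.parse`. The sketch below is one possible reading of them: `normalise_url` is a hypothetical name, and dropping the `#fragment` is an added assumption (servers never see the fragment), not one of the listed steps.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Exact-match tracking params from the list above; utm_* is handled by prefix.
TRACKING_PARAMS = {"fbclid", "ref", "source", "campaign"}


def normalise_url(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # note: also lowercases any user:pass@ prefix
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_") and k.lower() not in TRACKING_PARAMS
    ]
    kept.sort()                    # alphabetical by key, then value
    path = parts.path.rstrip("/")  # "/a/b/" and "/a/b" share one entry
    # Assumption: the fragment is dropped so it does not split cache entries.
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

For example, `normalise_url("HTTPS://GitHub.com/torvalds/linux/?utm_source=x&b=2&a=1")` yields `https://github.com/torvalds/linux?a=1&b=2`. Generic names such as `ref` and `source` are meaningful on some sites, so a real deployment would probably scope that list per domain.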
TTL policy
| URL type | TTL | Rationale |
|---|---|---|
| News articles (bbc.com, nytimes.com) | 1 hour | Content and headline may change |
| GitHub repos, docs, wikis | 24 hours | Rarely changes intraday |
| YouTube videos | 24 hours | Title/thumbnail very stable |
| Twitter/X posts | 4 hours | Engagement numbers update frequently |
| 404 / error responses | 1 hour | Negative cache — site may come back |
| Default | 24 hours | Conservative default |
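The TTL table can be expressed as a small lookup. This is a sketch only: the domain map below is a hypothetical stand-in covering the table's examples, and the `status` parameter is an assumption about how the negative-cache row would be signalled.

```python
from urllib.parse import urlsplit

HOUR = 3600

# Hypothetical per-domain TTLs mirroring the rows of the table above.
TTL_BY_DOMAIN = {
    "bbc.com": 1 * HOUR,
    "nytimes.com": 1 * HOUR,
    "github.com": 24 * HOUR,
    "youtube.com": 24 * HOUR,
    "twitter.com": 4 * HOUR,
    "x.com": 4 * HOUR,
}
DEFAULT_TTL = 24 * HOUR
ERROR_TTL = 1 * HOUR  # negative cache for 404 / error responses


def choose_ttl(url: str, status: int = 200) -> int:
    """Return the cache TTL in seconds for a fetched URL."""
    if status >= 400:
        return ERROR_TTL
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the host itself, then each parent domain (www.bbc.com -> bbc.com).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in TTL_BY_DOMAIN:
            return TTL_BY_DOMAIN[candidate]
    return DEFAULT_TTL
```

Checking parent domains means `www.bbc.com` and `m.youtube.com` inherit the TTL of their registered domain without separate entries.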
Image proxying
Never hotlink the og:image directly. Instead:
- Download the image to your own CDN at unfurl time
- Serve from your CDN — stable URL, controlled by you
- Resize to max 1200×630px to save bandwidth
- Convert to WebP for modern clients
If you hotlink directly, images disappear when the source site removes them, and the site owner can track every Slack user who views the message via image request logs.
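The resize step amounts to fitting the source image inside a 1200×630 box without changing its aspect ratio. A minimal sketch of that arithmetic follows; `fit_within` is a hypothetical helper, and never upscaling smaller images is an assumption.

```python
def fit_within(width: int, height: int, max_w: int = 1200, max_h: int = 630) -> tuple:
    """Return (new_width, new_height) that fits the box and keeps the aspect ratio."""
    # The 1.0 cap means images already inside the box are left at their size.
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting dimensions would then be passed to whatever image library performs the actual resize and WebP conversion.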
Cache invalidation
Users can click “Refresh preview” on any Slack message. This:
- Issues POST /unfurl?bust=1 {url} to skip the cache and force a re-fetch
- Overwrites the cache entry with fresh data
- Pushes the updated preview to all channel clients via WebSocket
6. Scale Estimates
The math: 50M users × 5 URLs/day = 250M requests/day ≈ 2,900/sec, rounded up to ~3,000/sec. With an 80% cache hit rate (GitHub, YouTube, Twitter dominate), actual external HTTP fetches drop to ~600/sec. At 200KB average page size, that’s 120 MB/sec (about 1 Gbit/s) of inbound response traffic, manageable with a fleet of ~20 fetch workers handling ~30 fetches/sec each.
Why is cache hit rate so high? Pareto principle applies hard here. A small number of popular domains (GitHub, YouTube, Twitter, Wikipedia, Notion) account for a huge fraction of all shared URLs. These hit the cache constantly.
Sizing Redis cache storage:
- 100M cached URL previews × ~1KB per entry = ~100 GB
- Redis with LRU eviction handles this comfortably on a few large nodes
- Use Redis Cluster for horizontal scaling and HA
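The estimates in this section can be reproduced with a few lines of arithmetic. The inputs are the article's own assumptions; the variable names are illustrative. The unrounded figures come out slightly below the rounded ones quoted in the text.

```python
SECONDS_PER_DAY = 86_400

users = 50_000_000
urls_per_user_per_day = 5
cache_hit_rate = 0.80
avg_page_bytes = 200_000      # 200 KB
cached_entries = 100_000_000
bytes_per_entry = 1_000       # ~1 KB

requests_per_day = users * urls_per_user_per_day           # 250M
requests_per_sec = requests_per_day / SECONDS_PER_DAY      # just under 3,000
fetches_per_sec = requests_per_sec * (1 - cache_hit_rate)  # cache misses only
fetch_bandwidth_mb = fetches_per_sec * avg_page_bytes / 1e6
redis_gb = cached_entries * bytes_per_entry / 1e9

print(f"requests/sec:    {requests_per_sec:,.0f}")
print(f"fetches/sec:     {fetches_per_sec:,.0f}")
print(f"fetch bandwidth: {fetch_bandwidth_mb:,.0f} MB/sec")
print(f"redis storage:   {redis_gb:,.0f} GB")
```

The cache hit rate dominates everything downstream: dropping it from 80% to 60% doubles the fetch rate, the bandwidth, and the worker fleet.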
7. JavaScript-Rendered Pages
A simple HTTP GET of a React or Vue app returns near-empty HTML — the content is injected by JavaScript after load. The <head> OG tags may be present (server-side rendering), but increasingly they are not.
Solutions in order of cost:
| Approach | Latency | Cost | Use case |
|---|---|---|---|
| Plain HTTP fetch + HTML parse | <300ms | Very low | Most sites (SSR, static) |
| Server-Side Rendering detection | <300ms | Very low | Sites with SSR but blank SPA shell |
| Headless Chrome render | 2–5 sec | High (CPU) | Pure SPA sites, no SSR |
| Pre-rendering cache (rendertron) | Varies | Medium | Frequently-shared SPA URLs |
Tiered approach (recommended):
    async def fetch_with_fallback(url: str) -> Preview:
        # Tier 1: fast plain fetch
        html = await http_get(url, timeout=3.0)
        preview = parse_og(html, url)

        if preview.is_empty():
            # Tier 2: queue for headless render (async)
            await queue.enqueue('headless_render', url=url)
            # Return partial preview now; push update when headless completes
            return Preview(url=url, title=extract_domain(url))

        return preview
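Headless Chrome workers are expensive, so keep a pool of warm Chromium instances rather than launching one per request. Each render takes 2–5 seconds. At 600 actual fetches/sec with ~5% needing headless rendering, that’s 30 headless renders/sec. A pool of 10 workers, each finishing at best one render every 2 seconds (0.5 renders/sec), handles only 5/sec, so a starting pool of ~10 is far too small. Sustaining 30 renders/sec takes 60 concurrent renders at 2 seconds each and 150 at 5 seconds each, which argues for a separate, autoscaled headless fleet.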
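The worker count follows from Little's law: renders in flight equal arrival rate times time per render. A small sketch of that calculation, assuming one render per worker at a time (`workers_needed` is a hypothetical helper):

```python
import math


def workers_needed(renders_per_sec: float, seconds_per_render: float) -> int:
    """Little's law: concurrent renders = arrival rate x time per render."""
    return math.ceil(renders_per_sec * seconds_per_render)


fetches_per_sec = 600
headless_percent = 5
renders_per_sec = fetches_per_sec * headless_percent / 100  # 30 renders/sec

best_case = workers_needed(renders_per_sec, 2)   # every render takes 2 s
worst_case = workers_needed(renders_per_sec, 5)  # every render takes 5 s
print(best_case, worst_case)
```

The spread between the two cases is why a fixed pool is hard to size. Autoscaling on queue depth tracks the actual mix of fast and slow pages.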
8. Special Handling for Major Platforms
Most traffic comes from a few sites. Custom parsers for each deliver better previews than generic OG parsing:
    from bs4 import BeautifulSoup

    def parse_og(html: str, url: str) -> Preview:
        domain = extract_domain(url)

        # Custom parsers for high-traffic domains
        if domain in ('youtube.com', 'youtu.be'):
            return parse_youtube(html, url)
        if domain in ('twitter.com', 'x.com'):
            return parse_twitter(html, url)
        if domain == 'github.com':
            return parse_github(html, url)

        # Generic OG parser as default
        return parse_generic_og(html, url)

    def parse_twitter(html: str, url: str) -> Preview:
        # Name the parser explicitly; BeautifulSoup warns when it has to guess
        soup = BeautifulSoup(html, 'html.parser')
        # Read the twitter:* card tags rather than og:*
        title = meta_content(soup, 'twitter:title')
        image = meta_content(soup, 'twitter:image')
        desc = meta_content(soup, 'twitter:description')
        return Preview(title=title, image=image, description=desc, url=url)
Platform-specific notes:
- YouTube: thumbnail URL follows a stable pattern (img.youtube.com/vi/{video_id}/maxresdefault.jpg), so it can be constructed without parsing HTML at all. maxresdefault.jpg is not generated for every video, so keep hqdefault.jpg as a fallback.
- GitHub: uses a dynamic OG image service that generates repo cards server-side; these are reliable and cache well
- Instagram / Facebook: rate-limit aggressive scrapers heavily; requires rotating user-agents and respecting crawl delays
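The YouTube shortcut can be sketched as follows. The function names are hypothetical, and the sketch recognises only the two common URL shapes (`youtube.com/watch?v=...` and `youtu.be/...`); other shapes fall through to the generic parser.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def youtube_video_id(url: str) -> Optional[str]:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the id as the first path segment
        return parts.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS and parts.path == "/watch":
        return parse_qs(parts.query).get("v", [None])[0]
    return None


def youtube_thumbnail(url: str) -> Optional[str]:
    """Build the thumbnail URL from the pattern above; None if no video id is found."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    return f"https://img.youtube.com/vi/{video_id}/maxresdefault.jpg"
```

Because no HTML is fetched, this path costs no outbound request for the thumbnail URL itself. The title and description still need a fetch or an API call.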
9. Content Safety
Before surfacing a preview to a user, the unfurl pipeline should validate it:
| Risk | Check | When |
|---|---|---|
| Phishing / malware URL | Google Safe Browsing API lookup | Synchronous, before fetch |
| NSFW thumbnail image | ML image classifier (nudity/gore) | Async, after image download |
| Malware download | VirusTotal URL scan | Async (result cached) |
| Misleading OG title | Text similarity between og:title and page title | Synchronous |
| Copyright trap | Block known piracy domains | URL blocklist lookup |
Misleading previews are a real attack vector: a bad actor sets og:title to “Official COVID Vaccine Guide” while the page content is disinformation. One defence: if og:title diverges significantly from the <title> tag (cosine similarity below threshold), display the <title> instead and flag for review.
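The divergence check can be sketched with a bag-of-words cosine similarity. This is a deliberately simple illustration: the tokeniser, the `0.3` threshold, and the function names are all assumptions, and a production system would more likely compare embeddings.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.3  # hypothetical cut-off; tune on real data


def _tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = _tokens(a), _tokens(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def display_title(og_title: str, html_title: str) -> tuple:
    """Return (title_to_show, flagged_for_review)."""
    if html_title and cosine_similarity(og_title, html_title) < SIMILARITY_THRESHOLD:
        return html_title, True
    return og_title, False
```

Legitimate pages often append a site name to `<title>`, which lowers the score slightly, so the threshold needs to be loose enough to tolerate that.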
10. Architecture Diagram
11. Capacity Estimate
| Metric | Value |
|---|---|
| Unfurl requests / sec | 3,000 |
| Cache hit rate | 80% |
| Actual external fetches / sec | 600 |
| Avg fetched page size | 200 KB |
| Fetch bandwidth (inbound responses) | 120 MB/sec |
| Redis cache storage | ~100 GB (100M entries × 1 KB) |
| Image CDN storage | ~5 TB |
| Headless workers needed | ~60–150 (30 renders/sec × 2–5 sec each) |
| Fetch worker fleet | ~20 pods (600 fetches/sec ÷ 30/pod) |
The Open Graph Protocol is now supported by every major social platform, messaging app, search engine, and link-in-bio tool: a rare case of a single company’s proprietary format becoming a genuine open standard.