System Design: SSO and Session Management — Authentication at Scale

Series System Design: Web Scenarios

Google’s auth system is called GAIA (Google Accounts and ID Administration). It handles every login across every Google product for billions of users, every day.

Logging into Gmail and instantly being logged into YouTube, Drive, and Maps feels like magic. It isn’t. Behind that seamless experience sits one of the most carefully engineered systems in software: a distributed Single Sign-On (SSO) infrastructure that manages billions of active sessions, issues and rotates cryptographic tokens, and must never go down — because when it does, half the internet notices.

The interview question: Design the authentication system for a company like Google, where logging into one service (Gmail) also logs you into all other services (Drive, YouTube, Maps). Handle millions of sessions, token refresh, logout-everywhere, and support third-party apps via OAuth.


1. Session vs Token: The Fundamental Choice

Every authentication system faces the same foundational question first: where does the server keep track of who is logged in?

Server-Side Sessions

The traditional model: a user logs in, the server generates a random sessionId, stores the session data in a database (or Redis), and sends only the sessionId to the browser as a cookie. On every subsequent request, the server looks up the sessionId to find the user.

python
import json, secrets, time

SESSION_TTL = 86400 * 30   # 30 days in seconds

# Login: server creates a session in Redis
def login(username, password):
    user = db.find_user(username)
    # Unknown user and wrong password produce the same error
    if user is None or not verify_password(password, user.password_hash):
        raise AuthError("invalid credentials")

    session_id = secrets.token_urlsafe(16)   # 128-bit random ID
    session_data = {
        "userId":    user.id,
        "createdAt": int(time.time()),                 # Unix seconds, JSON-serializable
        "expiresAt": int(time.time()) + SESSION_TTL,
        "ip":        request.remote_addr,
        "userAgent": request.headers["User-Agent"],
    }
    redis.setex("session:" + session_id, SESSION_TTL, json.dumps(session_data))
    return session_id       # stored in browser cookie

# Every request: server validates the session
def authenticate_request(request):
    session_id = request.cookies.get("session_id")
    if not session_id:
        raise AuthError("no session cookie")
    raw = redis.get("session:" + session_id)   # serialized JSON, or None if missing
    if raw is None:
        raise AuthError("session expired or invalid")
    session = json.loads(raw)
    if session["expiresAt"] < time.time():
        raise AuthError("session expired or invalid")
    return session["userId"]

Pros: Instant revocation — delete the key from Redis and the user is immediately logged out on their next request. Small cookie (just the ID). Full control over session lifecycle.

Cons: Stateful — every application server must reach the same session store, adding a network round-trip to every authenticated request. The session store becomes a critical single point of failure.

JWT (JSON Web Tokens)

A different model: the server signs a token containing the user’s identity and hands it back to the client. The client sends that token on every request. The server verifies the signature locally — no database lookup required.

A JWT has three base64url-encoded parts separated by dots:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJ1XzEyMyIsImVtYWlsIjoiYWxpY2VAZ21haWwuY29tIiwidG9rZW5WZXJzaW9uIjo3LCJpYXQiOjE3MTc5MzQwMDAsImV4cCI6MTcxNzkzNDkwMH0.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
header: algorithm & type
payload: user data (claims)
signature: HMAC or RSA over header.payload

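To make the three-part structure concrete, the sketch below splits the sample token above on its dots and base64url-decodes the first two segments with the Python standard library only. The helper names `b64url_decode` and `inspect_jwt` are illustrative and not part of any JWT library. Nothing here verifies the signature, so treat it as an inspection aid rather than an authentication step.

```python
import base64
import json

SAMPLE_JWT = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJ1c2VySWQiOiJ1XzEyMyIsImVtYWlsIjoiYWxpY2VAZ21haWwuY29tIiwidG9rZW5WZXJzaW9uIjo3LCJpYXQiOjE3MTc5MzQwMDAsImV4cCI6MTcxNzkzNDkwMH0."
    "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)

def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def inspect_jwt(token: str):
    """Return (header, payload, raw signature segment) WITHOUT verifying anything."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, signature_b64

header, payload, _ = inspect_jwt(SAMPLE_JWT)
print(header)
print(payload)
print("lifetime in seconds:", payload["exp"] - payload["iat"])
```

Because the payload is only encoded, not encrypted, anyone holding the token can read its claims. The signature protects integrity, not confidentiality.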
python
# Login: server issues a signed JWT
def login_jwt(username, password):
    user = db.find_user(username)
    if not verify_password(password, user.password_hash):
        raise AuthError("invalid credentials")

    payload = {
        "userId":       user.id,
        "email":        user.email,
        "tokenVersion": user.token_version,   # for revocation (section 2)
        "iat":          now_unix(),
        "exp":          now_unix() + 900,      # expires in 15 minutes
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")

# Every request: server verifies locally, with NO Redis lookup
def authenticate_request_jwt(request):
    # Expect "Authorization: Bearer <token>"; reject missing or malformed headers
    auth_header = request.headers.get("Authorization", "")
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise AuthError("missing bearer token")
    try:
        claims = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        raise AuthError("token expired")
    except jwt.InvalidTokenError:
        raise AuthError("invalid token")

    # Optionally verify tokenVersion against DB (section 2)
    return claims["userId"]

Pros: Stateless — any server can verify a token without shared storage. Scales horizontally with zero coordination. Works naturally across domains.

Cons: The logout problem — a signed token is valid until it expires. You can’t “un-sign” it. If a token is stolen, you’re stuck until expiry (up to 15 minutes for a short-lived token, or days if misconfigured).

| Property | Server Sessions | JWT |
| --- | --- | --- |
| Revocation speed | Instant | On expiry only |
| Horizontal scaling | Needs shared store | Zero coordination |
| Cross-domain | Cookie limitations | Header-based, works anywhere |
| Token size | ~50 bytes (ID only) | ~200–500 bytes |
| DB lookup per request | Always | Never (or optional) |
| Payload tampering | Not possible | Detected by signature |

2. The JWT Revocation Problem

The JWT spec (RFC 7519) defines no revocation mechanism at all. This was a deliberate trade-off for statelessness, and the source of countless security bugs.

A JWT cannot be “un-issued.” Once signed, it is valid until its exp claim passes. This creates a fundamental tension: short expiry improves security but creates constant re-authentication friction. Long expiry improves UX but leaves stolen tokens valid for hours or days.

Three real solutions exist, each with different trade-offs:

Solution A: Short-Lived Access Tokens + Refresh Tokens

This is the industry standard (used by Google, GitHub, Stripe, and most major platforms).

  • Access token: Short-lived (15 minutes). Stateless JWT. Used for every API call.
  • Refresh token: Long-lived (30 days). Opaque random string stored in the DB. Used only to get a new access token.

Revocation is now possible: delete the refresh token from the database. The access token lives at most 15 more minutes — an acceptable window for most threat models.

python
# Shared helper: mint an access token plus a fresh refresh token
def issue_token_pair(user_id, device_id):
    access_token = jwt.encode({
        "userId": user_id,
        "exp":    now_unix() + 900,     # 15 minutes
    }, SECRET_KEY, algorithm="HS256")

    refresh_token = generate_secure_random(64)
    db.store_refresh_token({
        "token":     sha256(refresh_token),  # store hash, not plaintext
        "userId":    user_id,
        "expiresAt": now() + timedelta(days=30),
        "deviceId":  device_id,
    })

    return {"access_token": access_token, "refresh_token": refresh_token}

# Issuing tokens at login
def login_with_refresh(username, password):
    user = db.authenticate(username, password)
    return issue_token_pair(user.id, request.get_device_id())

# Client calls this when access_token expires (HTTP 401)
def refresh_access_token(refresh_token):
    token_hash = sha256(refresh_token)
    record = db.find_refresh_token(token_hash)

    if not record or record["expiresAt"] < now():
        raise AuthError("refresh token invalid or expired")

    # Rotate: old token out, new token in (prevents replay)
    db.delete_refresh_token(token_hash)
    return issue_token_pair(record["userId"], record["deviceId"])

Solution B: Token Blacklist in Redis

When a token is revoked, store its jti (JWT ID claim) in Redis with TTL equal to the token’s remaining lifetime. Each request checks the blacklist.

Trade-off: This effectively reintroduces a Redis lookup on every request — partially defeating the "stateless" argument for JWT. The upside is that only revoked tokens are in the blacklist (usually a tiny fraction), so the data structure stays small.
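A minimal sketch of the blacklist idea follows. To keep it runnable without a Redis server, it uses a small in-memory class that imitates key expiry; `InMemoryTTLStore`, `revoke`, and `is_revoked` are hypothetical names chosen for illustration. In a real deployment the same two operations would map to a Redis write with a TTL and an existence check.

```python
import time

class InMemoryTTLStore:
    """Tiny stand-in for a key store with expiry, so the sketch runs without Redis."""

    def __init__(self):
        self._expiry = {}  # key -> absolute expiry time (Unix seconds)

    def setex(self, key, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._expiry[key] = now + ttl_seconds

    def exists(self, key, now=None):
        now = time.time() if now is None else now
        expires_at = self._expiry.get(key)
        if expires_at is None:
            return False
        if expires_at <= now:
            del self._expiry[key]  # lazily drop expired entries
            return False
        return True

def revoke(store, claims, now=None):
    """Blacklist a token by its jti for exactly its remaining lifetime."""
    now = time.time() if now is None else now
    remaining = int(claims["exp"] - now)
    if remaining > 0:  # an already-expired token needs no blacklist entry
        store.setex("revoked:" + claims["jti"], remaining, now=now)

def is_revoked(store, claims, now=None):
    return store.exists("revoked:" + claims["jti"], now=now)

store = InMemoryTTLStore()
claims = {"jti": "tok-1", "exp": 1_000_900}
revoke(store, claims, now=1_000_000)
print(is_revoked(store, claims, now=1_000_100))  # inside the token lifetime
print(is_revoked(store, claims, now=1_001_000))  # after the token itself has expired
```

Tying the entry's TTL to the token's remaining lifetime is what keeps the blacklist small: once the token would have expired anyway, the entry disappears with it.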

Solution C: Token Versioning

Store a tokenVersion integer on the user record in the database. Include it in the JWT payload. On every request, verify the JWT’s tokenVersion matches the current value in the DB.

Revoking all sessions for a user is a single UPDATE users SET token_version = token_version + 1 WHERE id = ?. All existing tokens fail their version check on the next request.

sql
-- Revoke all sessions for a user
UPDATE users
SET    token_version = token_version + 1
WHERE  id = 'user_123';

-- Application check (pseudo-code in SQL style)
-- jwt.tokenVersion must equal users.token_version
SELECT token_version
FROM   users
WHERE  id = jwt_claim_user_id
  AND  token_version = jwt_claim_token_version;

This approach re-introduces one DB read per request, but only a single integer column — fast with a primary key lookup and easily cached.
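The "easily cached" part deserves a concrete picture, because the cache lifetime becomes the revocation delay. The sketch below is one possible shape, not a prescribed design: `TokenVersionChecker` and its `load_version` callable are hypothetical, and a plain dict stands in for the users table.

```python
import time

class TokenVersionChecker:
    """Compares a JWT's tokenVersion with the user's current one, caching DB reads briefly."""

    def __init__(self, load_version, cache_ttl=5.0):
        self._load_version = load_version  # callable: user_id -> current token_version
        self._cache_ttl = cache_ttl        # seconds a cached version may be reused
        self._cache = {}                   # user_id -> (version, fetched_at)

    def current_version(self, user_id, now=None):
        now = time.time() if now is None else now
        cached = self._cache.get(user_id)
        if cached is not None and now - cached[1] < self._cache_ttl:
            return cached[0]
        version = self._load_version(user_id)  # the single-column DB read
        self._cache[user_id] = (version, now)
        return version

    def is_valid(self, claims, now=None):
        return claims["tokenVersion"] == self.current_version(claims["userId"], now)

# Usage, with a dict standing in for the users table
users = {"user_123": 7}
checker = TokenVersionChecker(users.get, cache_ttl=5.0)
claims = {"userId": "user_123", "tokenVersion": 7}

print(checker.is_valid(claims, now=100.0))  # version matches
users["user_123"] += 1                      # the UPDATE ... token_version + 1
print(checker.is_valid(claims, now=102.0))  # cached value is still served
print(checker.is_valid(claims, now=106.0))  # cache expired, new version is seen
```

With a 5-second cache, a revoked token can survive for up to 5 seconds on a given server. Setting `cache_ttl` to 0 gives instant revocation at the cost of one read per request.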


3. SSO Architecture

The protocol underlying most SSO systems is SAML 2.0 (enterprises) or OpenID Connect (modern web). OIDC is OAuth 2.0 + an identity layer (the id_token). Google uses OIDC.

Single Sign-On answers the question: how does logging into one service automatically authenticate you to all others? The answer is a centralized Identity Provider (IdP), accounts.google.com in Google's case, that all services (called Service Providers or Relying Parties) delegate authentication to.

The canonical flow:

SSO Authentication Flow

1. User visits mail.google.com. Gmail checks for a local session and finds none. Gmail redirects to accounts.google.com/login?service=gmail&return_to=https://mail.google.com
2. User enters credentials on accounts.google.com. Auth Server verifies password (and MFA if enrolled). Creates a long-lived SSO session in Redis, sets an accounts.google.com cookie (httpOnly, Secure).
3. Auth Server generates a short-lived (60-second) SSO token (a signed, single-use ticket). Redirects to mail.google.com?sso_token=XYZ.
4. Gmail sends sso_token to Auth Server for validation (server-to-server). Auth Server verifies signature, marks token as used (prevents replay), returns user identity.
5. Gmail creates its own local session for the user. Sets a mail.google.com cookie. User is now authenticated to Gmail.
6. User clicks youtube.com. YouTube has no local session. Redirects to accounts.google.com. Auth Server detects the existing SSO session cookie, so no credentials need to be re-entered. Issues a new SSO token for YouTube. Steps 3–5 repeat silently.

The key architectural insight: each service maintains its own local session (for performance — they don’t hit the Auth Server on every request), but all of them were bootstrapped via the same central SSO session.

The SSO cookie lives on a different domain (accounts.google.com) from the service cookies (mail.google.com, youtube.com). Browsers scope cookies to domains, so the SSO cookie travels with every request to the Auth Server but is invisible to the individual services. This is not a bug — it's the design.

The SSO Token Exchange (Server-to-Server Validation)

python
import json

# Auth Server: issue SSO token after successful authentication
def issue_sso_token(user_id, service, return_to):
    token_id = generate_random_id()
    token_data = {
        "userId":    user_id,
        "service":   service,
        "return_to": return_to,
        "createdAt": now_unix(),
        "expiresAt": now_unix() + 60,   # 60-second window
    }
    # Redis TTL matches the validity window, so stale tickets vanish on their own
    redis.setex("sso_token:" + token_id, 60, json.dumps(token_data))
    return token_id

# Auth Server: validate an SSO token sent by a Service Provider (server-to-server)
def validate_sso_token(token_id):
    # GETDEL (Redis 6.2+) reads and deletes in one atomic step, so a ticket
    # can be consumed only once even when two requests race.
    raw = redis.getdel("sso_token:" + token_id)
    if raw is None:
        raise AuthError("token not found, expired, or already used")

    data = json.loads(raw)
    if data["expiresAt"] < now_unix():
        raise AuthError("token expired")

    return data["userId"]

4. OAuth 2.0 + PKCE for Third-Party Apps

OAuth 2.0 is not an authentication protocol. It is an authorization framework. OpenID Connect (OIDC) adds the identity layer on top. “Login with Google” is OIDC, not raw OAuth.

OAuth solves a different problem: how does a third-party application (say, a calendar app) get limited access to your Google data, without you giving it your Google password?

The Authorization Code Flow with PKCE (Proof Key for Code Exchange) is the current standard for all OAuth clients, especially mobile and single-page apps that cannot safely store a client secret.

Why PKCE?

Without PKCE, the authorization code returned in the redirect URL could be intercepted by a malicious app on the same device (common on mobile — any app can register a URL scheme). PKCE makes the authorization code useless without the original code_verifier known only to the legitimate app.

OAuth 2.0 + PKCE: Authorization Code Flow

1. App generates PKCE pair:
   code_verifier = 64 random bytes (base64url-encoded)
   code_challenge = BASE64URL(SHA256(code_verifier))
   App keeps code_verifier in memory. It is never sent in the browser redirect, only later in the token request (step 4).
2. Redirect user to Auth Server:
   GET /authorize?response_type=code&client_id=APP_ID&redirect_uri=https://app.example.com/callback&scope=email+calendar&code_challenge=CHALLENGE&code_challenge_method=S256&state=RANDOM_STATE
   The state parameter prevents CSRF attacks.
3. User authenticates and consents. Auth Server stores code_challenge alongside the generated authorization code. Redirects to:
   https://app.example.com/callback?code=AUTH_CODE&state=RANDOM_STATE
4. App exchanges code for tokens (back-channel: a direct HTTPS POST to the Auth Server rather than a browser redirect):
   POST /token { grant_type=authorization_code, code=AUTH_CODE, redirect_uri=..., client_id=APP_ID, code_verifier=VERIFIER }
   Auth Server recomputes BASE64URL(SHA256(code_verifier)) and verifies it matches the stored code_challenge. If it does, issues tokens.
5. Auth Server responds:
   { "access_token": "...", "token_type": "Bearer", "expires_in": 3600, "refresh_token": "...", "id_token": "..." }
   The id_token is an OIDC JWT containing the user's identity (sub, email, name).

javascript
// PKCE: generating the code_verifier and code_challenge
async function generatePKCE() {
    // 1. Generate a cryptographically random verifier
    const array = new Uint8Array(64);
    crypto.getRandomValues(array);
    const verifier = base64URLEncode(array);

    // 2. Hash it: challenge = BASE64URL(SHA256(verifier))
    const data = new TextEncoder().encode(verifier);
    const hashBuffer = await crypto.subtle.digest("SHA-256", data);
    const challenge = base64URLEncode(new Uint8Array(hashBuffer));

    return { verifier, challenge };
}

function base64URLEncode(buffer) {
    return btoa(String.fromCharCode(...buffer))
        .replace(/\+/g, "-")
        .replace(/\//g, "_")
        .replace(/=/g, "");
}
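The JavaScript above covers the client half of PKCE. The server half, the check described in step 4 of the flow, can be sketched as follows. This is an illustrative sketch rather than any particular Auth Server's implementation: `make_verifier`, `challenge_for`, and `verify_pkce` are hypothetical names, and a real server would also look up the stored challenge by authorization code and enforce single use.

```python
import base64
import hashlib
import hmac
import secrets

def make_verifier() -> str:
    # 64 random bytes -> 86-character base64url string (RFC 7636 allows 43-128 chars)
    return base64.urlsafe_b64encode(secrets.token_bytes(64)).rstrip(b"=").decode("ascii")

def challenge_for(verifier: str) -> str:
    # S256 method: BASE64URL(SHA256(ASCII(code_verifier))), without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def verify_pkce(stored_challenge: str, presented_verifier: str) -> bool:
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(challenge_for(presented_verifier), stored_challenge)

# Example values from RFC 7636, Appendix B
rfc_verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
rfc_challenge = "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"
print(challenge_for(rfc_verifier) == rfc_challenge)

# Round trip: the legitimate app passes, an interceptor guessing a verifier does not
verifier = make_verifier()
stored = challenge_for(verifier)           # what the Auth Server saved in step 3
print(verify_pkce(stored, verifier))
print(verify_pkce(stored, make_verifier()))
```

An attacker who intercepts the authorization code sees only the challenge in the redirect, and recovering the verifier from it would require inverting SHA-256.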

5. Interactive: JWT Playground

[Interactive widget: JWT decode and tamper playground]

6. Interactive: SSO Flow Visualizer

[Interactive widget: SSO session propagation between the Auth Server (accounts.google.com) and the Gmail, YouTube, and Drive services]

7. Session Storage at Scale

Redis is not a database. It is an in-memory store with optional persistence. For session data you can afford to lose (user just logs in again), this is fine. For refresh tokens, you need durability: use Redis AOF or a proper DB.

Assume roughly 5 billion active sessions at Google scale. Keeping all of them in a single Redis instance is impossible (memory limit) and unwise (single point of failure). The solution is tiered storage based on session activity.

Hot tier — Redis Cluster:

  • Sessions active in the last 7 days
  • Sharded by sessionId across 50+ nodes (~50 GB each)
  • O(1) reads, sub-millisecond latency
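  • Sessions idle past 7 days move to the warm tier (Redis LRU eviction on its own simply drops keys, so a background job or write-through copy must place them in the warm tier first)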

Warm tier — Redis with disk persistence:

  • Sessions 7–30 days inactive
  • Slower access acceptable — user is returning after a gap
  • When accessed, session is promoted back to hot tier

Cold tier — Cassandra:

  • Sessions 30+ days inactive (keep for “remember me” scenarios)
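  • Wide-column model: partition key is userId, clustering key is sessionId (this serves “all sessions for a user”; a lookup by sessionId alone needs a second table or index keyed by sessionId)
  • Expired sessions are removed automatically via per-row TTL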
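python
class TieredSessionStore:
    HOT_TTL = 604800  # 7 days in seconds

    def __init__(self, redis_hot, redis_warm, cassandra):
        self.redis_hot = redis_hot
        self.redis_warm = redis_warm
        self.cassandra = cassandra

    def get(self, session_id):
        key = "sess:" + session_id

        # 1. Check hot tier (Redis) first
        session = self.redis_hot.get(key)
        if session:
            self.redis_hot.expire(key, self.HOT_TTL)  # refresh TTL
            return decode(session)

        # 2. Check warm tier
        session = self.redis_warm.get(key)
        if session:
            self._promote_to_hot(session_id, session)
            return decode(session)

        # 3. Check cold tier (Cassandra). Unprepared statements in the DataStax
        #    Python driver take %s placeholders; ? is for prepared statements.
        #    Querying by session_id alone assumes a table or index keyed by it.
        row = self.cassandra.execute(
            "SELECT * FROM sessions WHERE session_id = %s",
            [session_id]
        ).one()
        if row:
            encoded = encode(row)
            self._promote_to_hot(session_id, encoded)
            return decode(encoded)  # same shape as the hot and warm paths

        return None  # session not found anywhere

    def _promote_to_hot(self, session_id, data):
        self.redis_hot.setex("sess:" + session_id, self.HOT_TTL, data)
        # Optionally delete from warm/cold to avoid duplication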

8. Logout Everywhere

When a user clicks “Sign out of all devices,” the system must invalidate every active session across every device, every browser, every service. This is the logout problem at its hardest.

1. Increment tokenVersion in DB: One SQL statement: UPDATE users SET token_version = token_version + 1 WHERE id = 'user_123'. All access tokens now carry a stale version and fail on next use at any service that checks tokenVersion.
2. Delete all refresh tokens: DELETE FROM refresh_tokens WHERE user_id = 'user_123'. Clients can no longer silently renew their access tokens.
3. Destroy SSO session: Delete the SSO session from Redis. Any service that redirects back to the Auth Server will find no active session and force re-authentication.
4. Local service sessions: These expire naturally. If the access token has a 15-minute TTL, within 15 minutes every service will return 401 and the user will be prompted to log in again. For near-instant revocation, services must check tokenVersion on each request.

The 15-minute gap: Even after logout-everywhere, an active access token remains usable until it expires at any service that only verifies the signature. For most systems, 15 minutes is acceptable. For high-security scenarios (compromised account, banking), use token versioning with a per-request DB check: you lose the stateless benefit but gain instant revocation.
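The first three steps can be sketched as a single server-side routine. The version below is a simplified illustration that uses in-memory dictionaries in place of the users table, the refresh_tokens table, and the Redis SSO store; `AuthState` and `logout_everywhere` are hypothetical names, and a production version would run the database steps in a transaction.

```python
from dataclasses import dataclass, field

@dataclass
class AuthState:
    """In-memory stand-ins for the users table, refresh_tokens table, and SSO store."""
    token_versions: dict = field(default_factory=dict)  # user_id -> token_version
    refresh_tokens: dict = field(default_factory=dict)  # token_hash -> user_id
    sso_sessions: dict = field(default_factory=dict)    # sso_session_id -> user_id

def logout_everywhere(state: AuthState, user_id: str) -> dict:
    # Step 1: bump the version so outstanding access tokens fail version checks
    state.token_versions[user_id] = state.token_versions.get(user_id, 0) + 1

    # Step 2: drop every refresh token so clients cannot silently renew
    stale_refresh = [h for h, uid in state.refresh_tokens.items() if uid == user_id]
    for token_hash in stale_refresh:
        del state.refresh_tokens[token_hash]

    # Step 3: destroy SSO sessions so redirects to the Auth Server require a fresh login
    stale_sso = [sid for sid, uid in state.sso_sessions.items() if uid == user_id]
    for session_id in stale_sso:
        del state.sso_sessions[session_id]

    return {
        "refresh_tokens_deleted": len(stale_refresh),
        "sso_sessions_deleted": len(stale_sso),
    }

state = AuthState(
    token_versions={"user_123": 7},
    refresh_tokens={"h1": "user_123", "h2": "user_123", "h3": "user_456"},
    sso_sessions={"sso-a": "user_123"},
)
print(logout_everywhere(state, "user_123"))
print(state.token_versions["user_123"], sorted(state.refresh_tokens))
```

Bumping the version first is a deliberate ordering choice in this sketch: if a later step fails partway, services that check tokenVersion already reject the old access tokens.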

9. Multi-Factor Authentication (MFA)

TOTP (RFC 6238) uses HMAC-SHA1 over the current Unix time divided by 30. The same algorithm runs on your phone and the server: if clocks are in sync, the codes match. No network needed.

MFA adds a second verification step after password authentication. The most common mechanism is TOTP (Time-based One-Time Password), used by Google Authenticator, Authy, and 1Password.

TOTP algorithm:

python
import hmac, hashlib, struct, time, base64

def generate_totp(secret_base32, digits=6, period=30, at=None):
    # 1. Decode the shared secret (stored in user DB, displayed as QR code)
    secret = base64.b32decode(secret_base32.upper())

    # 2. Compute time counter: 30-second windows since Unix epoch
    #    (`at` lets callers evaluate a neighbouring window; defaults to now)
    timestamp = time.time() if at is None else at
    counter = int(timestamp) // period

    # 3. HMAC-SHA1 of the 8-byte big-endian counter
    msg = struct.pack(">Q", counter)
    h = hmac.new(secret, msg, hashlib.sha1).digest()

    # 4. Dynamic truncation: take 4 bytes at offset indicated by last nibble
    offset = h[-1] & 0x0F
    code = struct.unpack(">I", h[offset:offset+4])[0] & 0x7FFFFFFF

    # 5. Modulo to get N-digit code
    return str(code % (10 ** digits)).zfill(digits)

def verify_totp(secret, provided_code, window=1, period=30):
    # Accept the current window and ±window neighbours (clock skew tolerance)
    now = time.time()
    for drift in range(-window, window + 1):
        # Shift the timestamp by whole periods so each iteration checks a different window
        expected = generate_totp(secret, period=period, at=now + drift * period)
        if hmac.compare_digest(expected, provided_code):
            return True
    return False

The MFA challenge flow:

python
def login_step1(username, password):
    user = db.authenticate(username, password)
    if not user.mfa_enabled:
        return issue_full_session(user)  # no MFA, done

    # Issue a short-lived challenge token (not a full session!)
    challenge = {
        "userId":    user.id,
        "mfaNeeded": True,
        "exp":       now_unix() + 60,  # 60-second window to enter MFA code
    }
    challenge_token = jwt.encode(challenge, MFA_KEY, algorithm="HS256")
    return {"mfa_required": True, "challenge_token": challenge_token}

def login_step2(challenge_token, totp_code):
    claims = jwt.decode(challenge_token, MFA_KEY, algorithms=["HS256"])
    user = db.find_user(claims["userId"])

    if not verify_totp(user.mfa_secret, totp_code):
        raise AuthError("invalid TOTP code")

    return issue_full_session(user)  # MFA passed, issue real session

10. Capacity Estimate

| Metric | Assumption | Result |
| --- | --- | --- |
| Active sessions (Google-scale) | ~5 billion logged-in users | ~5,000,000,000 |
| Session size in Redis | userId + metadata + expiry | ~500 bytes |
| Total session storage | 5B × 500 bytes | ~2.5 TB |
| Redis nodes required | 50 GB usable per node | ~50 nodes |
| Auth requests / second | ~5M sessions active at once (assumed 0.1% of 5B) / 10s avg request interval | ~500,000 req/s |
| Token refresh requests / day | Every access token refreshed every 15 min (upper bound: all sessions active all day) | ~5B × 96 = ~480B/day |
| Refresh token DB size | 1 row per device × avg 3 devices/user | ~15B rows |
| Auth Server replication | 500K req/s at 5ms/req per core | ~2,500 cores |

Why not one Redis? A single Redis node handles ~100K ops/sec with sub-millisecond latency. At 500K auth req/sec plus session reads/writes, you need a Redis Cluster with at minimum 10–20 shards, with replicas for fault tolerance. Memory pushes the number higher: 2.5 TB at 50 GB per node means realistically 50+ nodes for Google-scale with redundancy.
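The estimates are simple enough to reproduce in a few lines. The sketch below takes the table's inputs as constants (5 billion sessions, 500 bytes each, 50 GB per node, 500K auth requests per second, 5 ms of CPU per request) and derives the rest; the variable names are illustrative. Two derived values are worth a look: 500K requests per second at one request every 10 seconds corresponds to about 5 million sessions active at the same moment, and 500K requests per second at 5 ms each keeps about 2,500 cores busy.

```python
SESSIONS = 5_000_000_000
SESSION_BYTES = 500
NODE_BYTES = 50 * 10**9            # 50 GB usable per Redis node
AUTH_RPS = 500_000                 # auth requests per second
REQUEST_INTERVAL_S = 10            # average seconds between requests per active session
CPU_MS_PER_REQUEST = 5
REFRESHES_PER_DAY = 24 * 60 // 15  # one refresh every 15 minutes -> 96
DEVICES_PER_USER = 3

storage_bytes = SESSIONS * SESSION_BYTES
redis_nodes = -(-storage_bytes // NODE_BYTES)        # ceiling division
concurrently_active = AUTH_RPS * REQUEST_INTERVAL_S  # sessions needed to produce AUTH_RPS
busy_cores = AUTH_RPS * CPU_MS_PER_REQUEST // 1000   # core-seconds of work per second
refreshes_per_day = SESSIONS * REFRESHES_PER_DAY
refresh_rows = SESSIONS * DEVICES_PER_USER

print(f"storage:             {storage_bytes / 1e12:.1f} TB")
print(f"redis nodes:         {redis_nodes}")
print(f"concurrently active: {concurrently_active:,} sessions")
print(f"busy cores:          {busy_cores:,}")
print(f"refreshes per day:   {refreshes_per_day:,}")
print(f"refresh token rows:  {refresh_rows:,}")
```

These are lower bounds: replicas, headroom for traffic spikes, and memory overhead per key all push the real numbers up.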

11. Security Hardening Checklist

httpOnly cookies vs localStorage for JWTs: httpOnly prevents XSS reads but is vulnerable to CSRF. localStorage blocks CSRF but is readable by JS (XSS). Neither is perfect. The security community debates this endlessly; the right answer is “it depends on your threat model.”

Beyond the core architecture, production-grade auth systems require these mitigations:

CSRF protection: Every state-changing request must either carry a CSRF token (for example, via the double-submit cookie pattern) or rely on the SameSite=Strict cookie attribute, which stops the browser from attaching the cookie to cross-site requests.

Token storage: Store access tokens in httpOnly cookies (inaccessible to JavaScript — prevents XSS token theft). Store refresh tokens the same way. Never put tokens in localStorage if XSS is a realistic threat vector.

Rate limiting on auth endpoints: The login endpoint is the most-attacked endpoint in any system. Apply per-IP rate limiting (e.g., 10 attempts per 15 minutes), account lockout after N failures, and CAPTCHA after repeated failures.
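As an illustration of the per-IP limit mentioned above (10 attempts per 15 minutes), here is a minimal in-memory sliding-window limiter. `SlidingWindowLimiter` is a hypothetical class for this sketch. It works only within a single process, so a multi-server deployment would need the counters in a shared store.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allows at most max_attempts per key within any window_seconds span."""

    def __init__(self, max_attempts=10, window_seconds=15 * 60):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key (e.g. client IP) -> attempt timestamps

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        attempts = self._attempts[key]
        # Drop attempts that have slid out of the window
        while attempts and attempts[0] <= now - self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False          # over the limit: reject without recording
        attempts.append(now)
        return True

limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60)
for t in (0, 1, 2, 3, 61):
    print(t, limiter.allow("203.0.113.7", now=t))
```

Rejected attempts are not recorded here, so a client that keeps hammering the endpoint regains access as soon as its old attempts age out. Recording rejections as well would turn this into a stricter lockout.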

Refresh token rotation: On every use of a refresh token, immediately issue a new one and invalidate the old. If a refresh token is used twice, it likely means the original was stolen — revoke all tokens for that user.

python
def use_refresh_token(token):
    record = db.find_refresh_token(sha256(token))

    if not record:
        # Token not found — either expired or already used.
        # Check if this token was recently rotated (possible replay attack)
        rotated = db.find_rotated_token(sha256(token))
        if rotated:
            # Replay detected: revoke the entire token family
            db.revoke_token_family(rotated.family_id)
            raise SecurityAlert("refresh token reuse detected")
        raise AuthError("token invalid")

    # Valid: rotate (issue new, invalidate old)
    new_token = generate_secure_random(64)
    db.rotate_refresh_token(
        old_token_hash=sha256(token),
        new_token_hash=sha256(new_token),
        family_id=record.family_id
    )
    new_access = issue_access_token(record.user_id)
    return {"access_token": new_access, "refresh_token": new_token}

# Secure cookie settings
# Set-Cookie: refresh_token=XYZ; HttpOnly; Secure; SameSite=Strict; Path=/auth/refresh
# Path=/auth/refresh means the cookie is ONLY sent to the refresh endpoint

12. System Diagram: Full Architecture

Component Overview
┌──────────────────────────────────────────────────────────────┐
│                         Browser                              │
│   Cookies: accounts.google.com (SSO)  +  mail.google.com     │
└──────────┬──────────────────────────────────────┬────────────┘
           │  HTTPS                               │  HTTPS
           ▼                                      ▼
┌──────────────────────┐               ┌──────────────────────┐
│  Auth Server Cluster │               │  Service Cluster     │
│  accounts.google.com │               │  (Gmail, YouTube...) │
│                      │◄── s2s ──────►│                      │
│  - GAIA auth logic   │  token valid? │  - Local session     │
│  - MFA verification  │               │  - API calls         │
│  - OAuth 2.0 + OIDC  │               │  - Verifies JWT      │
└──────────┬───────────┘               └──────────┬───────────┘
           │                                      │
    ┌──────▼──────┐                        ┌──────▼──────┐
    │ Redis Cluster│                        │  Redis Hot  │
    │  SSO sessions│                        │  (sessions) │
    │  MFA tokens  │                        └──────┬──────┘
    │  Blacklist   │                               │
    └──────┬───────┘                        ┌──────▼──────┐
           │                                │  Redis Warm │
    ┌──────▼───────┐                        └──────┬──────┘
    │  Primary DB  │                               │
    │  - users     │                        ┌──────▼──────┐
    │  - tokenVer  │                        │  Cassandra  │
    │  - refresh   │                        │  (cold)     │
    │    tokens    │                        └─────────────┘
    └──────────────┘

Summary: Interview Cheat Sheet

| Topic | Key Decision | Production Recommendation |
| --- | --- | --- |
| Token type | Sessions vs JWT | JWT access tokens (15 min) + opaque refresh tokens (30 days) |
| Revocation | Instant vs eventual | Token versioning in DB for logout-everywhere; blacklist for single-token revocation |
| SSO mechanism | Central IdP | Single auth domain issues short-lived SSO tokens; services create local sessions |
| Third-party auth | OAuth flow | Authorization Code + PKCE; mandatory for mobile/SPA; never use Implicit Flow |
| Session storage | Hot/warm/cold | Redis Cluster (hot) → Redis+disk (warm) → Cassandra (cold) |
| Token storage | Cookie vs localStorage | httpOnly cookies; SameSite=Strict; Path-scoped refresh endpoint |
| MFA | TOTP vs push | TOTP (RFC 6238) + recovery codes; push notifications for enterprise |
| Scale | Auth bottleneck | Stateless JWT verification removes auth from critical path; 50+ Redis shards for sessions |