Design WhatsApp

WhatsApp is one of the world’s most popular messaging platforms, serving over 2 billion users globally. This document outlines the system design for a WhatsApp-like service, covering real-time messaging, end-to-end encryption, multimedia sharing, group chats, and presence management at massive scale.

Designing WhatsApp presents unique challenges: maintaining persistent connections for billions of users, guaranteeing message delivery, implementing end-to-end encryption at scale, and handling real-time presence updates, all while keeping latency sub-second.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

  1. Users should be able to send one-to-one real-time messages with sub-second latency.
  2. Users should be able to participate in group chats (up to 256 members per group).
  3. Users should be able to share multimedia content (images, videos, audio, documents up to 2GB).
  4. Users should be able to see online/offline status, last seen, and typing indicators.

Below the Line (Out of Scope):

  • Users should be able to make voice and video calls.
  • Users should be able to post 24-hour ephemeral status updates.
  • Users should be able to edit or delete sent messages.
  • Users should be able to create channels for one-to-many broadcasting.

Non-Functional Requirements

Core Requirements:

  • The system should ensure 95% of messages are delivered within 1 second.
  • The system should provide message delivery guarantees (at-least-once delivery).
  • The system should support end-to-end encryption using Signal Protocol.
  • The system should handle 100 billion messages per day with 500 million concurrent connections.

Below the Line (Out of Scope):

  • The system should achieve 99.99% uptime with no single point of failure.
  • The system should ensure GDPR compliance for data privacy.
  • The system should have multi-region deployment with disaster recovery.
  • The system should provide comprehensive monitoring and alerting.

Clarification Questions & Assumptions:

  • Platform: Mobile apps (iOS, Android) and web clients.
  • Scale: 2 billion monthly active users (MAU) with 1.5 billion daily active users (DAU).
  • Message Volume: 100 billion messages per day, averaging 1.2 million messages per second.
  • Peak Load: Up to 5 million messages per second during peak hours.
  • Storage: Average message size of 1KB for text, 500KB for media files.
  • Retention: 7-day retention for undelivered messages, with longer-term archival policies.

Back-of-the-Envelope Calculations

Traffic Estimates: With 1.5 billion daily active users sending an average of 67 messages per day each, we get approximately 100 billion messages daily. This translates to roughly 1.2 million messages per second on average, with peak loads reaching 5 million messages per second.

The read-to-write ratio is approximately 1:1, as every message sent is typically read by at least one recipient. For group messages, the ratio increases proportionally to group size.

Bandwidth Requirements: For text messages at 1KB average, incoming bandwidth is approximately 1.2GB per second. With 20% of messages containing media at 500KB average, media bandwidth adds roughly 115GB per second. Total bandwidth requirements (bidirectional) are around 230GB per second at average load.

Storage Requirements: Daily message storage is approximately 100TB for text messages. With media comprising 20% of messages, daily media storage reaches 10PB raw, but with deduplication and compression, this reduces to roughly 1-2PB per day. Annual storage needs, considering retention policies and compression, are in the range of 10-20PB for messages and several hundred petabytes for media.

Connection Distribution: With 500 million concurrent connections at peak and assuming each server can handle 100,000 concurrent WebSocket connections, we need approximately 5,000 servers dedicated to connection management. The message rate per connection is quite sparse, averaging only 0.0024 messages per second per connection.
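The estimates above can be sanity-checked with a few lines of arithmetic. All inputs are the document's stated assumptions, rounded:

```python
# Back-of-the-envelope sanity check for the estimates above.
DAU = 1.5e9                       # daily active users
MSGS_PER_USER_PER_DAY = 67
SECONDS_PER_DAY = 86_400

daily_messages = DAU * MSGS_PER_USER_PER_DAY          # ~1.0e11 (100B/day)
avg_msgs_per_sec = daily_messages / SECONDS_PER_DAY   # ~1.2M msgs/s

TEXT_SIZE_BYTES = 1_000           # ~1KB per text message
MEDIA_SHARE = 0.20                # 20% of messages carry media
MEDIA_SIZE_BYTES = 500_000        # ~500KB per media file

text_bw = avg_msgs_per_sec * TEXT_SIZE_BYTES                  # ~1.2GB/s
media_bw = avg_msgs_per_sec * MEDIA_SHARE * MEDIA_SIZE_BYTES  # ~116GB/s

CONCURRENT_CONNECTIONS = 500e6
CONNS_PER_SERVER = 100_000
gateway_servers = CONCURRENT_CONNECTIONS / CONNS_PER_SERVER   # 5,000

print(f"{daily_messages:.2e} msgs/day, {avg_msgs_per_sec/1e6:.2f}M msgs/s")
print(f"text {text_bw/1e9:.1f} GB/s, media {media_bw/1e9:.0f} GB/s")
print(f"{gateway_servers:.0f} gateway servers")
```

Note that media bandwidth dominates text by two orders of magnitude, which is why the media path gets its own upload channel and CDN later in the design.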

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

User: Any person who uses WhatsApp for messaging. Includes personal information such as phone number (used as unique identifier), display name, profile picture, public encryption keys (identity key, pre-keys), privacy settings, and account creation timestamp.

Message: An individual message sent between users. Contains the encrypted content blob, sender and receiver identifiers, conversation identifier, unique message ID (generated using Snowflake algorithm), timestamp, delivery status (sent, delivered, read), and metadata such as message type and encryption headers.

Conversation: A chat thread between two users or within a group. Includes conversation identifier, participant list, conversation type (one-to-one or group), last message timestamp, and unread message count. For groups, also includes group name, description, admin list, and group settings.

Media: Multimedia content shared in messages. Includes content hash (SHA-256 for deduplication), storage location (S3 key), file type, file size, encryption key, thumbnail references, and upload timestamp.

Presence: Real-time status information for users. Tracks online/offline status, last seen timestamp, typing indicator status, and the gateway server currently handling the user’s connection.

API Design

Send Message Endpoint: Used by clients to send messages to other users or groups. The message content is already encrypted on the client side.

POST /messages -> Message
Body: {
  conversationId: string,
  encryptedContent: blob,
  messageType: "text" | "media",
  metadata: object
}

Update Presence Endpoint: Used by clients to maintain connection state and update presence information.

POST /presence -> Success/Error
Body: {
  status: "online" | "typing",
  conversationId?: string
}

Upload Media Endpoint: Used to upload multimedia content before sending it in a message.

POST /media -> MediaReference
Body: {
  encryptedFile: blob,
  fileType: string,
  fileSize: number
}

Get Pre-Keys Endpoint: Used during encryption key exchange to retrieve another user’s public keys.

GET /users/:userId/keys -> KeyBundle
Returns: {
  identityKey,
  signedPreKey,
  oneTimePreKey
}

Acknowledge Message Endpoint: Used by clients to acknowledge message delivery and read status.

PATCH /messages/:messageId/status -> Success
Body: {
  status: "delivered" | "read"
}

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Users should be able to send one-to-one real-time messages with sub-second latency

The core components necessary to fulfill real-time messaging are:

  • Client Applications: Mobile apps for iOS and Android, plus web clients. Maintain persistent WebSocket connections to the backend for real-time bidirectional communication.
  • Load Balancer: GeoDNS-based routing directs clients to the nearest data center. Layer 4 load balancers then distribute connections across WebSocket gateway servers using IP hash for session stickiness.
  • WebSocket Gateway: Maintains persistent connections with clients. Handles connection lifecycle, authentication, heartbeat monitoring, and routing messages to appropriate services. Each gateway server can handle approximately 100,000 concurrent connections.
  • Message Service: Core message processing logic. Validates messages, generates unique message IDs using Snowflake algorithm, handles message routing, manages acknowledgments (sent, delivered, read), and coordinates with storage systems.
  • Message Store: Distributed database (Cassandra) for storing messages. Partitioned by user ID and conversation ID to ensure all messages for a user’s conversation are co-located. Provides high write throughput and horizontal scalability.
  • User Database: Relational database (MySQL with sharding) storing user profiles, contacts, encryption keys, and settings. Sharded by user ID using consistent hashing.

Message Flow:

  1. The sender’s client encrypts the message using the Signal Protocol and sends it via WebSocket to the Gateway.
  2. The Gateway authenticates the request and forwards it to the Message Service.
  3. The Message Service generates a unique message ID, stores the message in the sender’s partition in Cassandra, and returns ACK_SENT to the sender.
  4. The service checks if the recipient is online by querying Redis presence cache.
  5. If online, it routes the message to the recipient’s Gateway server (identified via Redis routing table) using Kafka as the internal message bus.
  6. The recipient’s Gateway delivers the message, and upon acknowledgment, the Message Service stores it in the recipient’s partition and sends ACK_DELIVERED to the sender.
  7. When the recipient opens the chat and views the message, a READ_RECEIPT is sent back to the sender.
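Steps 4-6 of the flow above can be sketched with in-memory stand-ins for the Redis presence cache, the gateway routing table, and the offline queues (all names here are illustrative, not WhatsApp's actual API):

```python
import time
from collections import defaultdict

presence = {}                       # user_id -> gateway_id if online
offline_queue = defaultdict(list)   # user_id -> queued messages
gateway_outbox = defaultdict(list)  # gateway_id -> messages to push

def route_message(msg_id, sender, recipient, encrypted_blob):
    """Store the message, then route to an online recipient or queue it."""
    record = {"id": msg_id, "from": sender, "blob": encrypted_blob,
              "ts": time.time(), "status": "sent"}
    # (durable write to the sender's Cassandra partition would happen here)
    gateway = presence.get(recipient)       # Redis presence lookup
    if gateway is not None:
        gateway_outbox[gateway].append(record)   # push via internal bus
    else:
        offline_queue[recipient].append(record)  # delivered on reconnect
    return "ACK_SENT"

presence["bob"] = "gw-7"                         # bob is online on gw-7
route_message("m1", "alice", "bob", b"...")      # routed to gw-7
route_message("m2", "alice", "carol", b"...")    # carol offline: queued
```

The key decision point is the presence lookup: it turns one code path (send) into two delivery outcomes (immediate push vs. offline queue) without the sender ever noticing the difference.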

2. Users should be able to participate in group chats

We extend our existing design to support group messaging:

  • Group Service: Manages group metadata including member lists, admin lists, group settings, and permissions. Handles group creation, member addition/removal, and enforces group policies.
  • Group Metadata Cache: Redis cache storing group membership information for fast lookups during message fan-out.

Group Message Flow:

  1. The sender’s client sends a message to a group conversation.
  2. The Message Service receives the message and queries the Group Service for the member list.
  3. For groups, WhatsApp uses the Sender Keys protocol where messages are encrypted once with a symmetric key, making group encryption efficient.
  4. The Message Service enqueues the message for fan-out to all group members using Kafka, with one message containing all recipient IDs.
  5. A consumer processes the fan-out, creating individual delivery tasks for each member.
  6. Online members receive messages immediately; offline members have messages queued in Redis.
  7. The sender receives acknowledgments as members receive and read the message.

Group Optimization: For small groups (under 50 members), standard fan-out works well. For larger groups, optimizations include batched enqueuing, lazy expansion (storing the message once and creating pointers in each user’s inbox), and prioritizing delivery to online members.
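The lazy-expansion idea can be sketched as follows: the ciphertext is stored once, each member's inbox receives only a pointer, and online members are enqueued first (names and structures here are hypothetical):

```python
from collections import defaultdict

message_store = {}              # msg_id -> message body, stored once
inboxes = defaultdict(list)     # user_id -> [msg_id pointers]

def fan_out(msg_id, sender, members, encrypted_blob, online):
    """Fan a group message out as pointers, online members first."""
    message_store[msg_id] = encrypted_blob        # single stored copy
    recipients = [m for m in members if m != sender]
    # stable sort: members present in `online` come first
    for user in sorted(recipients, key=lambda m: m not in online):
        inboxes[user].append(msg_id)              # pointer, not a copy
    return len(recipients)

delivered = fan_out("g1", "alice", ["alice", "bob", "carol", "dave"],
                    b"ciphertext", online={"carol"})
```

Compared to copying the full message into 256 inboxes, storing one copy plus 256 small pointers trades a pointer-resolution read at fetch time for a large write and storage saving at send time.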

3. Users should be able to share multimedia content

We add specialized components for media handling:

  • Media Service: Handles multimedia upload/download operations. Generates thumbnails and previews, performs compression and optimization, manages deduplication using content hashing, and integrates with CDN for fast delivery.
  • Media Storage: Object storage (S3) for scalable, durable media storage. Uses content-addressed storage where the SHA-256 hash of the content becomes the storage key, enabling automatic deduplication.
  • CDN: Content Delivery Network (CloudFront) caches media at edge locations globally for low-latency access.

Media Upload Flow:

  1. The client requests an upload token from the Media Service via the Gateway.
  2. The Media Service generates a pre-signed URL with expiration and returns it to the client.
  3. The client encrypts the media file using AES-256 with a randomly generated key.
  4. The client uploads the encrypted file directly to S3 using the pre-signed URL (bypassing the Gateway for large files).
  5. The Media Service computes the content hash, checks for deduplication, generates thumbnails, and returns a media ID.
  6. The client sends a message containing the media ID and the encryption key (encrypted via the Signal Protocol).
  7. The recipient downloads the encrypted media, decrypts it using the key from the message, and displays it.

Media Optimization: Images are re-encoded as JPEG with 85% quality. Videos are re-encoded to H.264 with AAC audio at maximum 1Mbps bitrate. Audio messages use the Opus codec at 32kbps. All media includes thumbnail generation for quick preview. Lifecycle policies automatically move media from hot storage (7 days) to warm (30 days) to cold (1 year) tiers, with eventual archival or deletion.
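The dedup check in step 5 can be sketched in a few lines. One caveat worth noting: because each file gets a random encryption key, identical plaintexts produce different ciphertexts, so in practice the dedup hash must be computed client-side over the plaintext before encryption; this sketch simply hashes whatever bytes are uploaded:

```python
import hashlib

media_store = {}   # sha256 hex -> stored bytes (stand-in for S3)

def upload_media(content: bytes):
    """Content-addressed upload: a repeated upload is a no-op."""
    media_id = hashlib.sha256(content).hexdigest()
    if media_id in media_store:
        return media_id, False          # duplicate: reuse existing object
    media_store[media_id] = content
    return media_id, True               # new object stored

mid1, stored1 = upload_media(b"\x01\x02cat-photo")
mid2, stored2 = upload_media(b"\x01\x02cat-photo")  # same content re-sent
```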

4. Users should be able to see online/offline status, last seen, and typing indicators

We add presence management components:

  • Presence Service: Tracks user online/offline status in real-time. Manages last seen timestamps, handles typing indicators, and publishes presence updates to interested parties.
  • Presence Cache: Redis-based cache storing presence information with short TTL (5 minutes). Automatically expires inactive users to offline state.
  • Pub/Sub System: Redis Pub/Sub for broadcasting presence updates to clients interested in a user’s status.

Presence Management Flow:

  1. When a client connects, the Gateway registers the user as online in the Presence Cache with a 5-minute TTL.
  2. Any activity (sending a message, viewing a chat) refreshes the TTL.
  3. Other users who have this user in their contacts receive a presence update via Pub/Sub.
  4. When typing, the client sends a typing indicator that’s broadcast to conversation participants with a 5-second TTL.
  5. If the TTL expires without refresh, Redis automatically sets the user as offline.
  6. The last seen timestamp is updated on disconnect or TTL expiry, subject to the user’s privacy settings.
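The TTL behavior in steps 1-5 can be mimicked with a tiny cache; the injectable clock below is a test convenience, not part of the design:

```python
import time

class PresenceCache:
    """Minimal TTL-based presence cache, mimicking the Redis behavior."""
    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry = {}   # user_id -> expiry timestamp

    def heartbeat(self, user_id):
        """Any activity (message, chat view) refreshes the TTL."""
        self._expiry[user_id] = self.clock() + self.ttl

    def is_online(self, user_id):
        """Expired or unknown users read as offline; no cleanup needed."""
        return self._expiry.get(user_id, 0) > self.clock()

now = [0.0]  # fake clock so expiry is easy to demonstrate
cache = PresenceCache(ttl_seconds=300, clock=lambda: now[0])
cache.heartbeat("alice")
now[0] = 200; online_at_200 = cache.is_online("alice")  # within TTL
now[0] = 400; online_at_400 = cache.is_online("alice")  # TTL expired
```

The nice property of TTL-based expiry is that "going offline" requires no explicit event: a crashed client or dead connection simply stops refreshing, and the state converges to offline on its own.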

Privacy Controls: Users can configure who sees their last seen status: Everyone, Contacts Only, or Nobody. The Presence Service checks these settings before returning presence information. Read receipts can also be disabled, though this prevents the user from seeing others’ read receipts as well.

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we maintain 500 million concurrent WebSocket connections?

Managing half a billion persistent connections is one of WhatsApp’s biggest challenges. Traditional web servers aren’t designed for this connection density.

Solution: Horizontally Scaled Gateway Cluster with Connection Routing

Architecture: We deploy a large cluster of dedicated WebSocket gateway servers. Each server is optimized to handle approximately 100,000 concurrent connections, requiring roughly 5,000 servers globally. These servers are distributed across multiple data centers in different geographic regions.

Connection Management: Each gateway server maintains a local in-memory map of user ID to connection object. This allows O(1) lookup when delivering messages to users connected to that server. A global routing table stored in Redis maps each user ID to the gateway server ID currently handling their connection.

Protocol Design: WhatsApp uses a custom binary protocol over WebSocket, influenced by XMPP but optimized for mobile networks. The protocol uses binary encoding rather than JSON or XML to minimize overhead. Each message frame contains a compact header (2 bytes), operation code (1 byte), length field (4 bytes), and payload. Operation codes identify message types: MESSAGE, ACK, PRESENCE, TYPING, READ_RECEIPT, HEARTBEAT, and MEDIA_MESSAGE.

Connection Lifecycle: When a client connects, it establishes a WebSocket connection to the nearest data center via GeoDNS routing. The Gateway performs authentication using the provided token, validates with the Auth Service, and registers the connection in both local memory and the global Redis routing table. The Gateway then checks Redis for any offline messages queued for this user and delivers them in order.

Heartbeat Mechanism: Clients send heartbeat messages every 30 seconds (adjustable based on network conditions). The server responds with an acknowledgment. If three consecutive heartbeats are missed, the server closes the connection and removes it from the routing table. Clients implement exponential backoff for reconnection, starting at 1 second and doubling up to a maximum of 60 seconds.
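The reconnection schedule described above (start at 1 second, double, cap at 60 seconds) is easy to express directly; the jitter option is a common addition to avoid reconnection stampedes, not something the text specifies:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, jitter=False):
    """Exponential backoff: base * 2^attempt, capped, optionally jittered."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay = random.uniform(0, delay)   # "full jitter" variant
    return delay

schedule = [backoff_delay(n) for n in range(8)]
# grows 1, 2, 4, 8, 16, 32, then stays pinned at the 60s cap
```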

Session Resumption: When clients reconnect after a brief disconnection, they include the last received message ID. The Gateway can then replay any missed messages without requiring a full conversation sync. This is crucial for mobile networks where connections frequently drop and reconnect.

Load Distribution: Layer 4 load balancers use IP hash for sticky sessions, ensuring clients reconnect to the same gateway server when possible. This optimizes connection state management. If a server needs maintenance or crashes, clients automatically reconnect to a different server through the load balancer, and the connection routing table in Redis is updated.

Resource Optimization: Gateway servers are optimized with minimal memory per connection, using efficient data structures. Connection buffers are sized appropriately to handle typical message sizes. Server operating systems are tuned for high connection counts, adjusting kernel parameters for file descriptors and TCP buffer sizes.
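On Linux, the kernel tuning mentioned above typically touches file-descriptor limits and TCP buffer sizes. The values below are illustrative starting points, not WhatsApp's production settings; they should be benchmarked before adoption:

```shell
# Raise system-wide and per-process file descriptor limits
sysctl -w fs.file-max=2000000               # system-wide open fd limit
ulimit -n 1048576                           # per-process fd limit

# Deepen accept/SYN queues for high connection-arrival rates
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535

# Cap TCP buffers; small buffers keep per-connection memory low
sysctl -w net.core.rmem_max=16777216        # max receive buffer (bytes)
sysctl -w net.core.wmem_max=16777216        # max send buffer (bytes)
```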

Deep Dive 2: How do we implement end-to-end encryption with the Signal Protocol?

WhatsApp’s end-to-end encryption ensures that only the sender and recipient can read message content, not even WhatsApp’s servers. This is implemented using the Signal Protocol.

Signal Protocol Overview: The Signal Protocol provides forward secrecy (compromising current keys doesn’t reveal past messages), future secrecy (self-healing from key compromise), and cryptographic deniability. It uses several types of keys:

Key Types:

  • Identity Key Pair: a long-term Curve25519 key pair unique to each user.
  • Signed Pre-Key: a medium-term key rotated weekly, used for X25519 Diffie-Hellman.
  • One-Time Pre-Keys: short-lived keys uploaded in batches; each key is used exactly once for a new session.
  • Ratchet Keys: per-message keys derived using the Double Ratchet Algorithm.

Key Exchange (X3DH): When Alice wants to send her first message to Bob, she must establish a shared secret. Bob has previously uploaded his Identity Key, Signed Pre-Key, and a batch of One-Time Pre-Keys to the server. Alice requests Bob’s key bundle from the server. She receives Bob’s public keys and performs the X3DH (Extended Triple Diffie-Hellman) key agreement locally on her device.

The shared secret is computed by combining the results of four Diffie-Hellman operations: Alice’s Identity Key with Bob’s Signed Pre-Key, Alice’s Ephemeral Key with Bob’s Identity Key, Alice’s Ephemeral Key with Bob’s Signed Pre-Key, and Alice’s Ephemeral Key with Bob’s One-Time Pre-Key. This multi-layered approach provides strong security properties.

Alice then sends her first message encrypted with keys derived from this shared secret. Bob receives the message, performs the same X3DH calculation using Alice’s public keys (included in the message header), derives the same shared secret, and decrypts the message.

Double Ratchet Algorithm: After the initial key exchange, the Double Ratchet Algorithm provides ongoing key management. It consists of two ratchets working together: the DH Ratchet and the Symmetric Ratchet.

The DH Ratchet performs periodic Diffie-Hellman key exchanges, typically with each message round trip. This generates new root keys and chain keys, providing forward secrecy. The Symmetric Ratchet derives individual message keys from chain keys using HKDF (HMAC-based Key Derivation Function).

For each message, the current chain key is used to derive both a message key (for encrypting that specific message) and the next chain key. The chain key is then deleted, ensuring that compromise of the current chain key doesn’t reveal previous message keys. This provides forward secrecy at the message level.
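The symmetric ratchet step can be sketched with HMAC-SHA256. The Double Ratchet specification derives the message key and next chain key with distinct constant inputs (0x01 and 0x02); everything else here (the seed, the loop) is illustrative:

```python
import hmac
import hashlib

def ratchet_step(chain_key: bytes):
    """One symmetric ratchet step: yields a message key + next chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

# Illustrative seed standing in for the root/chain key from X3DH.
ck = hashlib.sha256(b"shared-secret-from-x3dh").digest()
keys = []
for _ in range(3):
    mk, ck = ratchet_step(ck)   # old chain key is overwritten: once it's
    keys.append(mk)             # gone, past message keys can't be rederived
```

Because HMAC is one-way, knowing the current chain key reveals nothing about earlier chain keys, which is exactly the forward-secrecy property described above.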

Message Encryption Format: Each encrypted message contains a version byte, a header with the sender’s current ratchet public key (for DH ratchet), the previous chain length (for handling out-of-order delivery), the message number, the ciphertext, and a MAC for authentication. The recipient uses the header information to determine which keys to use for decryption.

Server’s Role: The server acts purely as a relay for encrypted blobs. It has no access to message plaintext. The server stores and distributes public keys to facilitate key exchange. It cannot decrypt messages since it doesn’t have access to private keys. However, metadata like sender, recipient, timestamp, and message size is visible to the server, as this information is necessary for routing.

Group Chat Encryption: Pairwise encryption for groups would be inefficient. For a group with N members, the sender would need to encrypt the message N times. WhatsApp uses Sender Keys to optimize group encryption.

Each group member generates a Sender Key (symmetric key). This Sender Key is distributed to other group members encrypted using pairwise Signal Protocol sessions. Messages are then encrypted once using the Sender Key and sent to all members. Members decrypt using the shared Sender Key. When a member leaves the group, a new Sender Key is generated and distributed, ensuring forward secrecy.

Key Storage: All private keys are stored locally on user devices, typically in secure enclaves (iOS Secure Enclave, Android Keystore). Keys are never transmitted to the server in unencrypted form. If a user reinstalls WhatsApp, they generate new keys, and a new verification code is displayed to contacts, allowing them to verify the security of the conversation.

Deep Dive 3: How do we ensure reliable message delivery with at-least-once semantics?

Message delivery reliability is critical. Messages must not be lost due to network failures, server crashes, or client disconnections.

Solution: Multi-Layer Acknowledgment System with Persistent Queues

Three-Tier Acknowledgment: The system uses three acknowledgment levels to track message progress through the delivery pipeline.

ACK_SENT (single gray checkmark) indicates the message successfully reached the server and was stored in the sender’s message partition in Cassandra. This guarantees durability; even if the client crashes, the message won’t be lost.

ACK_DELIVERED (double gray checkmarks) indicates the message was delivered to the recipient’s device. The recipient’s client sends an acknowledgment to the server, which stores the message in the recipient’s partition in Cassandra and notifies the sender.

ACK_READ (blue checkmarks) indicates the recipient opened the conversation and viewed the message. The client sends a READ_RECEIPT when the message enters the viewport, which is forwarded to the sender to update the UI.

Message Flow with Failures: When the sender’s client sends a message, it includes a client-generated message ID. If the client doesn’t receive ACK_SENT within 5 seconds, it retries. The server uses the client message ID to detect and handle duplicates, ensuring idempotent message processing.

Once the message is stored in Cassandra (sender’s partition), ACK_SENT is returned. If the recipient is online, the message is routed to their Gateway via Kafka. If offline, it’s queued in Redis. If delivery fails, the message remains in the queue for retry.

Kafka for Reliable Internal Messaging: Internal message routing between services uses Kafka for reliability. Messages are partitioned by conversation ID to maintain ordering. Each message is only acknowledged after successful processing. If a consumer crashes, Kafka’s rebalancing mechanism ensures another consumer picks up the messages.

Offline Message Storage: Redis is used for short-term offline message queuing. Messages are stored in sorted sets keyed by user ID, with timestamps as scores for ordering. When a user reconnects, the Gateway queries Redis for queued messages and delivers them in order from oldest to newest.

As each message is acknowledged by the client, it’s removed from the Redis queue. If the queue grows too large, messages are delivered in batches to avoid overwhelming the client. Messages older than 7 days are moved from Redis to long-term cold storage in Cassandra.
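A minimal stand-in for the sorted-set queue makes the ordering and batching concrete (in Redis this would be ZADD on enqueue and ZRANGE plus ZREM on drain; the class and method names are illustrative):

```python
class OfflineQueue:
    """Per-user offline queue ordered by timestamp, drained oldest-first."""
    def __init__(self):
        self._queues = {}   # user_id -> list of (timestamp, message)

    def enqueue(self, user_id, timestamp, message):
        self._queues.setdefault(user_id, []).append((timestamp, message))

    def drain(self, user_id, batch_size=100):
        """Return up to batch_size oldest messages and remove them."""
        queue = sorted(self._queues.get(user_id, []))
        batch = [msg for _, msg in queue[:batch_size]]
        self._queues[user_id] = queue[batch_size:]
        return batch

q = OfflineQueue()
q.enqueue("bob", 1700000002, "second")
q.enqueue("bob", 1700000001, "first")   # arrived out of order
messages = q.drain("bob")               # drains oldest-first
```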

Deduplication: Each message has a unique Snowflake ID generated by the server. Clients also include a client-generated message ID. The server maintains a mapping between client message IDs and server message IDs. If a client retries due to network failure, the server recognizes the duplicate and returns the existing message ID instead of creating a new message.
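The deduplication mapping can be sketched together with a Snowflake-style ID generator. The bit layout below (41-bit timestamp, 10-bit worker, 12-bit sequence) is the classic Snowflake scheme; the custom epoch and class names are illustrative:

```python
import threading
import time

class MessageServer:
    EPOCH_MS = 1_600_000_000_000   # custom epoch (illustrative)

    def __init__(self, worker_id=1):
        self.worker_id = worker_id
        self.sequence = 0
        self.client_to_server = {}  # client_msg_id -> server_msg_id
        self.lock = threading.Lock()

    def _snowflake(self):
        """41-bit ms timestamp | 10-bit worker | 12-bit sequence."""
        ts = int(time.time() * 1000) - self.EPOCH_MS
        self.sequence = (self.sequence + 1) & 0xFFF
        return (ts << 22) | (self.worker_id << 12) | self.sequence

    def send(self, client_msg_id, blob):
        """Idempotent send: a retried client ID returns the existing ID."""
        with self.lock:
            if client_msg_id in self.client_to_server:
                return self.client_to_server[client_msg_id]
            server_id = self._snowflake()
            self.client_to_server[client_msg_id] = server_id
            return server_id

srv = MessageServer()
first = srv.send("c-123", b"ciphertext")
retry = srv.send("c-123", b"ciphertext")   # network retry, same client ID
```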

Eventual Consistency: While WhatsApp aims for strong consistency within a conversation, the system accepts eventual consistency across replicas. Messages are written with QUORUM consistency in Cassandra (2 out of 3 replicas), balancing consistency with availability. If a replica is temporarily unavailable, hinted handoff ensures it eventually receives the update.

Deep Dive 4: How do we efficiently store and query 100 billion messages per day?

The sheer volume of messages requires careful database selection and sharding strategy.

Solution: Cassandra with User-Conversation Partitioning

Why Cassandra: Cassandra is a wide-column NoSQL database designed for high write throughput. It provides linear scalability by adding more nodes, with no single point of failure. Its tunable consistency model allows balancing between consistency and availability. The partition-key-based architecture aligns perfectly with WhatsApp’s access patterns.

Partitioning Strategy: Messages are partitioned by a composite key of user ID and conversation ID. This ensures all messages for a specific user’s conversation are stored together on the same partition. The clustering key is timestamp followed by message ID, ensuring messages are stored in chronological order.

This partitioning strategy provides several benefits. Fetching a user’s conversation is a single-partition query, which is very fast. Messages within a conversation are naturally ordered by timestamp. The load is evenly distributed across the cluster since user IDs are globally distributed.

Data Model: The messages table has a partition key of user ID and conversation ID combined. Clustering keys are timestamp (descending order for recent-first queries) and message ID. Columns include sender ID, encrypted content (stored as blob), metadata (map type for flexible attributes), and status (sent, delivered, read).

Replication: Each message is replicated across three nodes (replication factor 3) for durability. Write consistency level is set to QUORUM, meaning writes succeed when 2 out of 3 replicas acknowledge. Read consistency level is also QUORUM, ensuring readers see the most recent writes. This configuration can tolerate one node failure without data loss.

Cross-Datacenter Replication: For disaster recovery, Cassandra replicates data across multiple geographic data centers. Each data center has its own replica set. This provides geographic redundancy and allows users to access data from the nearest data center for lower latency.

Write Path Optimization: Cassandra’s write path is optimized for high throughput. Writes first go to a commit log (sequential disk write, very fast) and then to an in-memory structure (memtable). The commit log provides durability even if the server crashes before the memtable is flushed to disk. Periodically, memtables are flushed to disk as SSTables (immutable sorted files).

Read Path Optimization: Reads may need to merge data from multiple SSTables. Bloom filters quickly determine which SSTables might contain the requested data, avoiding unnecessary disk reads. Compaction periodically merges SSTables to reduce read amplification and reclaim space from deleted messages.

Handling Hot Partitions: Some users (celebrities, popular groups) may generate significantly more messages than average, creating hot partitions. Virtual nodes (vnodes) help distribute the load more evenly. For extreme cases, additional partitioning by time buckets can be introduced, though this complicates queries.

Data Retention: Not all messages need to be stored forever. Cassandra’s TTL (Time To Live) feature can automatically delete old messages. Alternatively, time-bucketed tables can be used, with old buckets dropped entirely. For compliance and backup, older messages can be archived to cheaper cold storage like S3 Glacier before deletion.

Deep Dive 5: How do we handle group message fan-out efficiently?

Group messages need to be delivered to all members, which can be expensive for large groups.

Solution: Tiered Fan-Out Strategy with Optimization

Group Size Tiers: WhatsApp handles groups differently based on size. Small groups (2-50 members) use standard fan-out where the message is individually queued for each member. Medium groups (51-256 members, WhatsApp’s current limit) use batched fan-out with optimizations. For very large groups or broadcast lists, different strategies like lazy expansion are employed.

Batched Enqueuing: Instead of creating 256 individual Kafka messages for a 256-member group, the system creates a single Kafka message containing all recipient IDs. A consumer then processes this message, expanding it to individual delivery tasks. This dramatically reduces the load on Kafka and improves throughput.

Lazy Expansion: The message content is stored once in Cassandra in a shared partition. Each user’s inbox contains a lightweight pointer (reference) to the shared message rather than a full copy. This reduces storage requirements significantly. When a user fetches their messages, the pointers are resolved to retrieve the actual content.

Presence-Based Prioritization: Before fan-out, the system queries Redis to determine which group members are currently online. Online members are prioritized for immediate delivery, while offline members have messages queued in their offline queue. This improves perceived latency for active users.

Async Fan-Out Processing: The sender receives ACK_SENT immediately after the message is stored and the fan-out is queued. They don’t wait for delivery to all recipients. Fan-out processing happens asynchronously in the background. This keeps the sender’s experience fast and responsive.

Group Metadata Caching: Group membership lists are cached in Redis with a 1-hour TTL. This avoids repeatedly querying the User Database during fan-out. When membership changes (member added or removed), the cache is invalidated, and an update notification is pushed to all online members.

Rate Limiting: To prevent spam in large groups, rate limiting is applied per user per group. Normal users might be limited to 10 messages per minute in a group, while admins have higher limits. Detected spam can result in shadowbanning where the sender thinks messages are delivered, but they’re actually dropped.

Delivery Metrics: For group messages, acknowledgments work differently. A single checkmark indicates the message was sent to the server. Double checkmarks appear when at least one member has received the message. Blue checkmarks show a list of members who have read the message, viewable by tapping the checkmark indicator.

Deep Dive 6: How do we handle media storage and delivery at scale?

With billions of photos, videos, and files shared daily, media storage is a massive challenge.

Solution: Content-Addressed Storage with CDN and Lifecycle Management

Content-Addressed Storage: Instead of random filenames, WhatsApp uses content hashing for media storage. Each file’s SHA-256 hash becomes its storage key. This provides automatic deduplication. If the same image is sent multiple times (common for viral content), it’s only stored once. When uploading, the client first computes the hash. The server checks if this hash already exists. If so, it returns the existing media ID without uploading. This saves significant bandwidth and storage.

Storage Structure: The hash is used to create a hierarchical folder structure for better performance. For example, hash “abc123…” becomes “media/ab/c1/abc123.jpg”. This prevents having billions of files in a single directory, which would degrade filesystem performance.
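The hashing and path-sharding steps above combine into one small function. This is a sketch of the scheme as described, with the file extension handling as an assumption:

```python
import hashlib

def media_key(data: bytes, ext: str = "jpg") -> str:
    """Derive the storage key from the file's SHA-256 content hash.

    Identical bytes always yield the identical key, which is what makes
    deduplication automatic: the server can check for an existing key
    before accepting an upload.
    """
    h = hashlib.sha256(data).hexdigest()
    # Two levels of fan-out ("ab/c1/") keep any one directory prefix small.
    return f"media/{h[:2]}/{h[2:4]}/{h}.{ext}"
```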

Encryption Before Upload: Media is encrypted on the client device before upload using AES-256. The encryption key is randomly generated for each file. The encrypted file is uploaded to S3, ensuring the server never sees plaintext media. The encryption key is sent in the message (encrypted via Signal Protocol). The recipient downloads the encrypted file, retrieves the key from the message, and decrypts locally.

Thumbnail Generation: The Media Service generates multiple thumbnail sizes (100x100 for lists, 300x300 for previews) server-side. Thumbnails are stored alongside the original with suffixes like “hash_thumb100.jpg”. Messages include thumbnail references for quick loading. The full-resolution image is loaded only when the user taps to view.

Compression: Images are re-encoded as JPEG at 85% quality, a good balance between file size and visual quality. Videos are re-encoded with the H.264 video codec and AAC audio at a maximum bitrate of 1 Mbps, significantly reducing file sizes while maintaining acceptable quality. Voice messages use the Opus audio codec at 32 kbps, which is optimized for speech. Documents are not compressed, preserving their integrity.

CDN Integration: S3 buckets are fronted by CloudFront CDN for global distribution. Media is cached at edge locations close to users, dramatically reducing latency. Cache TTL is 7 days for media files and 30 days for thumbnails. Origin shield adds an additional caching layer between CloudFront and S3 to reduce load on S3.

Signed URLs: Media URLs are signed with expiration timestamps (typically 24 hours). This prevents unauthorized access to media files. URLs contain HMAC signatures that are validated by CloudFront before serving content. This ensures media can only be accessed by authorized recipients.
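The signing scheme can be illustrated with Python's standard `hmac` module. This is a minimal sketch: the secret, URL format, and 24-hour default are illustrative (CloudFront's actual signed-URL format differs in its details, and real signing keys would live in a key-management service).

```python
import hashlib
import hmac
import time

SECRET = b"demo-signing-secret"  # hypothetical; real keys live in a KMS

def sign_url(path: str, ttl_seconds: int = 86400, now=None) -> str:
    """Append an expiry timestamp and an HMAC over path + expiry."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    sig = hmac.new(SECRET, f"{path}:{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(path: str, expires: int, sig: str, now=None) -> bool:
    """Reject expired links and forged signatures (constant-time compare)."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False
    expected = hmac.new(SECRET, f"{path}:{expires}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the expiry is inside the signed payload, a client cannot extend a link's lifetime without invalidating the signature.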

Lifecycle Management: Media storage is tiered based on access patterns. The hot tier uses S3 Standard for frequently accessed recent media (0-7 days). The warm tier uses S3 Infrequent Access for media accessed less often (7-30 days); it has lower storage cost but higher per-retrieval cost. The cold tier uses S3 Glacier for long-term archival (30 days to 1 year). After a year, media is either moved to Glacier Deep Archive for compliance or deleted according to retention policies.
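These tiers map naturally onto an S3 lifecycle configuration. The fragment below is illustrative, expressed in the shape boto3's `put_bucket_lifecycle_configuration` expects; the prefix and day counts mirror the text above rather than any production policy.

```python
# Illustrative S3 lifecycle configuration mirroring the tiers described above.
# Day thresholds are the ones from the text, not tuned production values.
lifecycle_rules = {
    "Rules": [
        {
            "ID": "media-tiering",
            "Filter": {"Prefix": "media/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 7, "StorageClass": "STANDARD_IA"},     # warm tier
                {"Days": 30, "StorageClass": "GLACIER"},        # cold tier
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # archival
            ],
        }
    ]
}
```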

Cost Optimization: Content deduplication significantly reduces storage costs. Lifecycle policies automatically move data to cheaper storage tiers. Compression reduces both storage costs and bandwidth costs. CDN caching reduces the number of requests to S3, lowering data transfer costs.

Deep Dive 7: How do we handle presence information for billions of users?

Tracking who’s online, when they were last seen, and when they’re typing is challenging at scale.

Solution: Redis-Based Presence with TTL and Pub/Sub

Presence Data Model: Redis stores presence information with keys like “presence:user123”. The value is a hash containing status (online or offline), last seen timestamp, and the gateway server ID handling the connection. Each key has a TTL of 5 minutes.

Automatic Expiration: The TTL-based approach elegantly handles disconnections. When a user connects, the Gateway sets their presence key with a 5-minute TTL. Any activity (sending a message, opening a chat) refreshes the TTL. If the user disconnects gracefully, the Gateway explicitly sets the status to offline and updates last seen. If the user disconnects ungracefully (network drop, crash), the key simply expires after 5 minutes; a Redis keyspace notification of the expiry lets the Presence Service record the user as offline.
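The TTL semantics can be modeled in a few lines. This sketch simulates Redis `SET ... EX` behavior with an in-memory dict so the expiry logic is visible; in the real design the expiry is enforced by Redis itself, not by the reader.

```python
import time

PRESENCE_TTL = 300  # 5 minutes, per the design above

_presence = {}  # user_id -> (record, expires_at); stand-in for Redis EXPIRE

def heartbeat(user_id, gateway_id, now=None):
    """Any activity refreshes the 5-minute TTL on the presence key."""
    now = time.time() if now is None else now
    _presence[user_id] = (
        {"status": "online", "gateway": gateway_id},
        now + PRESENCE_TTL,
    )

def get_presence(user_id, now=None):
    """A missing or expired key reads as offline, covering ungraceful drops."""
    now = time.time() if now is None else now
    entry = _presence.get(user_id)
    if entry is None or now >= entry[1]:
        return {"status": "offline"}
    return entry[0]
```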

Last Seen Logic: The last seen timestamp is updated when a user disconnects or when their TTL expires. This timestamp is subject to privacy settings. Users can configure who sees their last seen: Everyone, Contacts Only, or Nobody. The Presence Service checks these privacy settings (cached in Redis) before returning last seen information to other users.

Typing Indicators: When a user types in a conversation, the client sends a typing indicator to the Presence Service with the conversation ID. This is stored in Redis with a very short TTL (5 seconds). The Presence Service uses Pub/Sub to broadcast the typing status to other conversation participants. If no typing activity occurs for 5 seconds, the indicator automatically expires. Typing indicators are throttled to at most one update every 3 seconds to prevent spam.
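The client-side throttle is a simple last-sent check. The sketch below assumes per-conversation tracking, with the 3-second interval taken from the text:

```python
import time

TYPING_THROTTLE = 3.0  # minimum seconds between indicator sends

_last_sent = {}  # (user_id, conversation_id) -> timestamp of last indicator

def should_send_typing(user_id, conversation_id, now=None):
    """Emit at most one typing indicator every TYPING_THROTTLE seconds."""
    now = time.time() if now is None else now
    key = (user_id, conversation_id)
    last = _last_sent.get(key)
    if last is not None and now - last < TYPING_THROTTLE:
        return False  # keystroke swallowed; an indicator was sent recently
    _last_sent[key] = now
    return True
```

Because the server-side key carries a 5-second TTL, the indicator still clears on its own even if the client never sends an explicit "stopped typing" event.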

Pub/Sub for Real-Time Updates: When a user’s status changes (comes online, goes offline), the Presence Service publishes a notification to a Redis Pub/Sub channel. Clients interested in this user’s status (those who have the user in their contacts and are currently viewing a chat list) are subscribed to relevant channels. They receive the update in real-time without polling.

Scalability Considerations: Not all users need presence updates for all contacts. The system only tracks presence for contacts currently visible in the UI or recent conversations. This dramatically reduces the number of presence subscriptions. Presence updates are also batched when possible. If multiple contacts’ status changes simultaneously, they’re sent in a single notification.

Privacy Features: Beyond last seen privacy, users can hide their online status entirely. In this mode, their status always appears as offline to others, and they don’t receive real-time presence updates. Read receipts can also be disabled, though this is a trade-off: if you disable read receipts, you also can’t see others’ read receipts.

Read Receipts (Blue Checkmarks): Read receipts are a special form of presence. When a message enters the viewport (scrolled into view) in an open conversation, the client sends a READ_RECEIPT. This is processed by the Message Service, which updates the message status to “read” in Cassandra and notifies the sender via their WebSocket connection. The sender’s UI updates the checkmark from gray to blue. For group chats, individual read receipts are tracked, and the UI shows a list of members who have read the message.

Step 4: Wrap Up

In this chapter, we proposed a system design for a real-time messaging platform like WhatsApp. If there is extra time at the end of the interview, here are additional points to discuss:

Summary of Key Design Decisions

The architecture prioritizes several key design principles. WebSocket-based persistent connections enable real-time, low-latency messaging with a custom binary protocol for efficiency and horizontal scaling through connection routing via Redis. End-to-end encryption using the Signal Protocol ensures industry-leading security where the server never has access to message plaintext, with forward secrecy and future secrecy guarantees.

Database sharding uses Cassandra for messages, partitioned by user ID and conversation ID for co-location. MySQL handles user data, sharded by user ID with consistent hashing. Redis manages presence information and offline message queues.

Media handling uses content-addressed storage in S3 for deduplication. Client-side encryption before upload ensures privacy. CDN integration provides global, low-latency delivery.

High availability is achieved through eliminating single points of failure with all components replicated. Multi-region deployment with geo-routing ensures global coverage. Graceful degradation allows presence to be stale while ensuring messages are never lost.

Additional Features

Voice and video calling would require WebRTC for peer-to-peer media, a dedicated signaling channel (which the existing WebSocket infrastructure can carry), TURN and STUN servers for NAT traversal, and either a mesh architecture for small groups or an SFU (Selective Forwarding Unit) for larger groups. The existing infrastructure provides authentication and presence, while call media travels peer-to-peer when possible.

Status updates (24-hour ephemeral stories) require ephemeral storage with Redis TTL, fan-out to viewers, and separate UI flows. This is essentially a broadcast with automatic expiration.

Payments integration would need a separate payment service with strong consistency guarantees, integration with payment providers, fraud detection, and transaction history. This requires careful consideration of financial regulations across regions.

Channels for one-to-many broadcasting differ from groups in that they have no member limit, are one-way communication only, and use different fan-out strategies optimized for massive audiences.

Scalability and Future Enhancements

Scaling to 10 billion users would require more aggressive sharding, potentially increasing from hundreds to thousands of shards. Multi-region active-active deployment presents challenges with message ordering across regions but improves latency and availability. Edge gateways placed closer to users can further reduce connection latency.

Performance optimizations include message compression using gzip or brotli before transmission to reduce bandwidth. Connection multiplexing can combine multiple logical channels over a single WebSocket connection. Predictive prefetching using machine learning can preload conversations likely to be opened next.

Monitoring and observability are crucial. Key metrics include message throughput, latency (p50, p99, p999), connection count, error rates, and delivery success rates. Distributed tracing using tools like Jaeger or Zipkin traces message journeys end-to-end. Anomaly detection alerts on issues like sudden drops in delivery rates indicating outages. Real-time dashboards visualize system health.

Trade-offs and Considerations

The consistency versus availability trade-off manifests differently across components. For messages, we prefer availability with eventual consistency, accepting that retries handle transient failures. For user authentication, we prefer consistency to prevent duplicate accounts or security issues. For presence information, eventual consistency is acceptable since stale status for a few seconds doesn’t significantly impact user experience.

Latency versus throughput is optimized for low latency since real-time messaging is the core use case. Throughput is achieved through horizontal scaling rather than batching, which would increase latency.

Cost versus performance involves using Redis for hot data despite higher cost because of its speed. Cassandra handles warm data, being cheaper than Redis while still providing fast reads. S3 Glacier stores cold data at very low cost, accepting slow retrieval times.

Privacy versus features involves trade-offs where end-to-end encryption limits server-side features. Cloud search isn’t possible since the server can’t read messages. Smart replies require on-device machine learning. Metadata about who talks to whom and when remains visible to the server as it’s necessary for routing, balancing strong content privacy with some metadata exposure.

Operational Excellence

Deployment strategy uses blue-green deployments for zero-downtime updates. Canary releases gradually roll out changes, starting with 1% of traffic, then 10%, then 100%. Feature flags enable gradual rollout and quick rollback if issues arise without redeploying.

Disaster recovery involves multi-region replication in Cassandra and S3. Recovery Time Objective is set at 1 hour, defining the maximum acceptable downtime. Recovery Point Objective is 5 minutes, representing the maximum data loss from asynchronous replication lag. Regular disaster recovery drills conducted quarterly ensure the team is prepared.

Security measures include DDoS protection through rate limiting and services like Cloudflare. Spam detection uses machine learning models and heuristics. Abuse prevention combines automated systems with manual review. GDPR compliance includes handling data deletion requests and ensuring user privacy.

Conclusion

Designing WhatsApp at scale requires careful consideration of real-time requirements, security, and global distribution. The architecture presented here balances low latency through WebSocket connections and geo-distributed gateways, strong security via end-to-end encryption with Signal Protocol, high availability through replication and sharding, and cost efficiency via tiered storage and content deduplication.

The system is designed to handle 2 billion users, 100 billion messages per day, and 500 million concurrent connections while maintaining sub-second message delivery and 99.99% uptime. Key to this scalability is the stateless Gateway layer, efficient sharding strategies, and leveraging Redis for hot data and Cassandra for durable storage.

As messaging platforms continue to evolve, this architecture provides a solid foundation for adding features like payments, business messaging, and advanced AI capabilities while preserving the core values of simplicity, security, and reliability. The principles demonstrated here—using the right database for each use case, leveraging caching aggressively, embracing asynchronous processing, and designing for failure—apply broadly to distributed systems design.

Summary

This comprehensive guide covered the design of a real-time messaging platform like WhatsApp, including:

  1. Core Functionality: One-to-one messaging, group chats, multimedia sharing, and presence management.
  2. Key Challenges: Managing 500 million concurrent connections, implementing end-to-end encryption, handling 100 billion messages per day, and efficient media storage.
  3. Solutions: Horizontally scaled WebSocket gateways with Redis routing, Signal Protocol for encryption, Cassandra for message storage with smart partitioning, content-addressed media storage with CDN, and Redis-based presence with TTL.
  4. Scalability: Multi-region deployment, database sharding strategies, tiered storage, and caching at multiple layers.

The design demonstrates how to build real-time systems with high throughput, strong security, and global scale while maintaining sub-second latency and high availability.