Design ESPN

ESPN is a comprehensive sports platform that serves millions of concurrent users with live scores, play-by-play updates, news articles, video streaming, personalized content, and fantasy sports. The platform must handle massive traffic spikes during major sporting events like the Super Bowl, World Cup, and March Madness while maintaining sub-second latency for live score updates and delivering high-quality video streams across diverse network conditions.

Designing ESPN presents unique challenges including real-time score distribution to millions of users, handling high-velocity play-by-play data feeds, adaptive bitrate video streaming, personalized content recommendations at scale, and maintaining system stability during extreme traffic spikes.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

  1. Users should be able to view live scores and game details with sub-second latency.
  2. Users should be able to watch live sports streams and video highlights with minimal buffering.
  3. Users should be able to read sports news articles and search for content.
  4. Users should be able to follow favorite teams and players, receiving personalized content and notifications.

Below the Line (Out of Scope):

  • Users should be able to manage fantasy sports teams and leagues.
  • Users should be able to participate in live chat during games.
  • Users should be able to share content on social media platforms.
  • Users should be able to access historical statistics and advanced analytics.

Non-Functional Requirements

Core Requirements:

  • The system should provide sub-second latency for live score updates.
  • The system should support 50 million concurrent users during peak events.
  • The system should maintain 99.99% uptime for score services.
  • The system should handle 1 million requests per second during major events.
  • The system should ensure all users see consistent, accurate scores for the same game.

Below the Line (Out of Scope):

  • The system should ensure security and privacy of user data, complying with regulations like GDPR.
  • The system should be resilient to failures, with redundancy and failover mechanisms.
  • The system should have robust monitoring, logging, and alerting capabilities.
  • The system should facilitate easy updates and maintenance without significant downtime.

Clarification Questions & Assumptions:

  • Platform: Web, iOS, Android, Smart TV, and game consoles.
  • Scale: 100 million daily active users, with 50 million concurrent users during major events.
  • Content Volume: 1 million articles per year, 10,000 videos per month, 100,000 games per year.
  • Geographic Coverage: Global, with CDN distribution across all major regions.
  • Video Quality: Support for 360p to 4K adaptive bitrate streaming.

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

User: Any person who uses the platform to consume sports content. Includes personal information, preferences, followed teams and players, notification settings, and subscription status.

Game: A sporting event with all its associated data. Contains the competing teams, sport type, scheduled time, current status (scheduled, in-progress, final), scores, quarter/period information, and timestamps for key events.

Score Update: Real-time score changes and game events. Includes the game identifier, timestamp, home and away scores, game status, time remaining, and the type of scoring event.

Article: Written sports content including news, analysis, and editorials. Contains the title, author, body text, sport and team tags, publication timestamp, and associated media (images, videos).

Video Stream: Live or on-demand video content. Includes stream identifier, event details, available quality levels (bitrates), HLS/DASH manifest URLs, DVR window for live streams, and encoding status.

Notification: Alerts sent to users about game events or breaking news. Contains the recipient user, notification type (score update, game start, breaking news), message content, delivery status, and timestamp.

API Design

Get Live Scores Endpoint: Used by clients to fetch current scores for active games.

GET /scores/live -> List<GameScore>
Query params: {
  sport?: string,
  teamId?: string,
  limit?: number
}

Get Game Details Endpoint: Used to retrieve detailed information about a specific game including play-by-play data.

GET /games/:gameId -> GameDetail

Get Articles Endpoint: Used to fetch sports news articles with filtering and pagination.

GET /articles -> List<Article>
Query params: {
  sport?: string,
  teamId?: string,
  search?: string,
  page: number,
  limit: number
}

Get Video Stream Endpoint: Used to retrieve video stream manifest URL for playback.

GET /videos/:videoId/manifest -> StreamManifest

Follow Team Endpoint: Used by authenticated users to follow a team for personalized content.

POST /users/follows -> Success
Body: {
  entityType: "team" | "player",
  entityId: string
}

Get Recommendations Endpoint: Used to fetch personalized content recommendations.

GET /recommendations -> List<Content>
Query params: {
  type?: "articles" | "videos" | "games",
  limit: number
}

Note: User authentication is handled via JWT tokens in the Authorization header for all authenticated endpoints.

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Users should be able to view live scores and game details with sub-second latency

The core components necessary to fulfill live score delivery are:

  • Client Applications: The primary touchpoints for users across web, iOS, Android, Smart TV, and game consoles. Establish WebSocket connections for real-time score updates.
  • API Gateway: Acts as the entry point for client requests, routing to appropriate microservices. Manages authentication, rate limiting, and WebSocket connection upgrades.
  • Scores Service: Manages real-time game state and score updates. Ingests data from sports feed providers, validates and enriches the data, and broadcasts updates to connected clients via WebSocket.
  • WebSocket Server Cluster: Maintains persistent connections with millions of clients. Handles connection management, subscription routing (users subscribe to specific games/teams), and message broadcasting.
  • Redis Cluster: Distributed cache storing active game data with short TTLs (5 seconds). Provides sub-millisecond read latency for score queries and supports pub/sub for WebSocket message distribution.
  • Cassandra: Time-series database storing historical game data and play-by-play events. Optimized for write-heavy workloads and time-based queries.
  • Kafka: Message queue handling score update events. Decouples score ingestion from downstream consumers (WebSocket servers, fantasy service, notification service, analytics).

Live Score Flow:

  1. Sports data providers send real-time score updates via API calls or WebSocket feeds.
  2. The Scores Service receives updates, validates data integrity, and enriches with additional metadata.
  3. Updates are published to Kafka topics partitioned by sport or game.
  4. The Scores Service updates the Redis cache with the latest game state.
  5. Historical data is asynchronously written to Cassandra for permanent storage.
  6. WebSocket servers consume from Kafka and broadcast updates to subscribed clients.
  7. Clients receive delta updates (only changed fields) to minimize bandwidth.
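The delta updates in step 7 can be sketched as a simple diff between the previous and current game-state snapshots (a minimal illustration; the field names are hypothetical):

```python
def delta_update(prev: dict, curr: dict) -> dict:
    """Return only the fields that changed between two game-state snapshots."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

prev = {"gameId": "nba_001", "homeScore": 98, "awayScore": 101, "clock": "2:41"}
curr = {"gameId": "nba_001", "homeScore": 100, "awayScore": 101, "clock": "2:25"}

# Only the fields that changed are sent over the wire
print(delta_update(prev, curr))
```

Clients apply each delta to their local copy of the game state, so a two-point basket costs a few dozen bytes instead of the full game object.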

2. Users should be able to watch live sports streams and video highlights with minimal buffering

We extend our design to support video streaming:

  • Video Service: Manages video content lifecycle including live stream ingestion, on-demand video storage, and manifest generation for adaptive bitrate streaming.
  • Encoding Infrastructure: Transcodes live streams and recorded videos into multiple bitrates (360p, 720p, 1080p, 4K) and packages them for HLS/DASH delivery.
  • CDN (CloudFront): Global content delivery network with hundreds of edge locations. Caches video segments and manifests close to users for low-latency delivery.
  • S3 Storage: Object storage for video segments (live and on-demand), HLS/DASH manifests, and thumbnails. Organized with lifecycle policies to transition older content to cheaper storage tiers.
  • Origin Server: Central video distribution point that serves as the source of truth for CDN. Handles live stream packaging and DVR functionality.

Video Streaming Flow:

  1. Live sports broadcasts are ingested via RTMP feeds from broadcasting partners.
  2. The origin server receives the live stream and initiates real-time transcoding.
  3. Encoders produce multiple bitrate variants simultaneously (360p through 4K).
  4. Encoded segments are packaged into HLS or DASH format (typically 6-second segments).
  5. Segments and manifests are uploaded to S3 and distributed via CloudFront CDN.
  6. Clients request the master manifest from the CDN, which lists available quality levels.
  7. The video player’s adaptive bitrate algorithm selects the appropriate quality based on network conditions.
  8. Segments are fetched from the nearest CDN edge location, minimizing latency.
  9. For DVR functionality, the system maintains a rolling window of segments (typically 2-4 hours) allowing users to rewind live streams.
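The master manifest in step 6 might look like the following HLS playlist (bandwidths match the bitrate ladder above; the variant paths are illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15000000,RESOLUTION=3840x2160
4k/playlist.m3u8
```

The player reads this once, then fetches segments from whichever variant playlist its adaptive bitrate algorithm currently selects.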

3. Users should be able to read sports news articles and search for content

We add components for content management and search:

  • Content Service: Manages articles, editorials, and news content. Provides CRUD operations, content publishing workflows, and integration with the CMS (Content Management System).
  • Elasticsearch: Full-text search engine indexing articles, players, teams, and games. Supports complex queries with filters, sorting, and relevance scoring.
  • PostgreSQL: Relational database storing article metadata, author information, categories, and relationships between content and entities (teams, players, games).
  • S3 for Images: Object storage for article images, author photos, and other static assets. Served via CloudFront CDN for optimal performance.

Content Delivery Flow:

  1. Content editors create and publish articles through a CMS interface.
  2. The Content Service stores article metadata in PostgreSQL and indexes the full text in Elasticsearch.
  3. Images are uploaded to S3 and optimized for web delivery (multiple sizes, WebP format).
  4. When users browse articles, the Content Service queries PostgreSQL for metadata and returns paginated results.
  5. For search queries, the service queries Elasticsearch which returns ranked results based on relevance.
  6. The client renders articles with images served from CloudFront CDN.
  7. Related content recommendations are generated based on article tags, team/player associations, and user preferences.

4. Users should be able to follow favorite teams and players, receiving personalized content and notifications

We introduce personalization and notification components:

  • User Service: Manages user profiles, authentication, preferences, and following relationships. Stores user data in PostgreSQL and caches active user sessions in Redis.
  • Personalization Engine: Generates personalized content recommendations using collaborative filtering and content-based algorithms. Processes user behavior data (views, reads, watches) to build preference models.
  • Notification Service: Dispatches real-time notifications to users about game events and breaking news. Integrates with APNs (Apple Push Notification service) for iOS and FCM (Firebase Cloud Messaging) for Android.
  • Recommendation Cache: Redis cache storing pre-computed recommendations for active users. Refreshed periodically or invalidated when user preferences change.

Personalization Flow:

  1. Users authenticate and the User Service loads their profile including followed teams and players.
  2. When users follow a team, this relationship is stored in PostgreSQL and cached in Redis.
  3. The Personalization Engine periodically processes user behavior data to generate recommendations.
  4. Recommendations are cached in Redis with appropriate TTLs (5-15 minutes).
  5. When users request content, the API checks the recommendation cache and returns personalized results.
  6. Recommendation models are trained offline using batch processing on historical user interaction data.

Notification Flow:

  1. Game events (scores, game start/end, etc.) are published to Kafka topics.
  2. The Notification Service consumes these events and determines affected users (those following the relevant teams).
  3. The service checks user notification preferences and quiet hours settings.
  4. Rate limiting is applied to prevent overwhelming users (max 10 notifications per hour).
  5. Deduplication ensures users don’t receive identical notifications within a time window.
  6. Notifications are batched and sent via APNs/FCM to reduce API calls.
  7. Delivery status is tracked for analytics and troubleshooting.

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we deliver score updates to 50 million concurrent users with sub-second latency?

Managing the distribution of real-time score updates to millions of connected clients while maintaining consistency and low latency is challenging. The primary issues are connection scalability, message distribution efficiency, and cache coherence.

Problem Analysis:

With 50 million concurrent users, each maintaining a WebSocket connection, we face several challenges:

  • Each WebSocket server can handle approximately 50,000 concurrent connections (limited by file descriptors and memory).
  • This requires 1,000 WebSocket server instances at peak load.
  • Broadcasting updates naively would require each server to send updates to all connected clients, creating a massive bandwidth bottleneck.
  • We need intelligent routing so users only receive updates for games they care about.

Solution: Multi-Layer Architecture with Pub/Sub

We implement a sophisticated distribution system with multiple optimization layers:

Layer 1: Connection Management The WebSocket server cluster uses sticky sessions via load balancer configuration. Each client maintains a persistent connection to a specific server instance. Health checks ensure connections are distributed across healthy servers only. When a server goes down, clients automatically reconnect to another instance with exponential backoff and jitter to prevent thundering herd problems.
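The reconnect behavior in Layer 1 — exponential backoff with jitter — can be sketched as follows ("full jitter": each wait is drawn uniformly from zero up to the capped exponential bound, which spreads reconnects out and avoids a thundering herd):

```python
import random

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Exponential backoff with full jitter: delay n is uniform in [0, min(cap, base * 2^n)]."""
    return [random.uniform(0, min(cap, base * (2 ** n))) for n in range(attempts)]

for n, delay in enumerate(backoff_delays(6)):
    print(f"reconnect attempt {n}: wait {delay:.2f}s")
```

The cap keeps the worst-case wait bounded (30 seconds here) no matter how many attempts have failed.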

Layer 2: Subscription Model Clients don’t receive all score updates indiscriminately. Instead, they subscribe to specific games, teams, or leagues based on their preferences. When a client connects, they send a subscription message listing the entities they want to follow. The WebSocket server maintains a mapping of connection IDs to subscriptions in local memory. This dramatically reduces unnecessary message transmission.
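The connection-to-subscription mapping in Layer 2 amounts to an inverted index from entity to connection IDs. A minimal in-memory sketch (class and method names are illustrative):

```python
from collections import defaultdict

class SubscriptionRegistry:
    """Maps followed entities (game/team IDs) to the connections subscribed to them."""

    def __init__(self):
        self._subs: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, conn_id: str, entities: list[str]) -> None:
        for entity in entities:
            self._subs[entity].add(conn_id)

    def unsubscribe(self, conn_id: str) -> None:
        """Called when a connection closes; remove it from every entity set."""
        for conns in self._subs.values():
            conns.discard(conn_id)

    def targets(self, entity: str) -> set[str]:
        """Connections that should receive an update for this entity."""
        return self._subs.get(entity, set())

reg = SubscriptionRegistry()
reg.subscribe("conn-1", ["game:123", "team:lakers"])
reg.subscribe("conn-2", ["team:lakers"])
print(reg.targets("team:lakers"))  # both connections follow the Lakers
```

When a score update for `team:lakers` arrives via Redis Pub/Sub, the server broadcasts only to `targets("team:lakers")` rather than to every open connection.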

Layer 3: Redis Pub/Sub for Distribution When a score update arrives from the Scores Service via Kafka, it’s published to Redis Pub/Sub channels organized by entity (game ID, team ID, sport). Each WebSocket server subscribes to Redis channels for all entities that its connected clients care about. This creates an efficient fan-out mechanism where a single update is multiplied only to relevant servers.

Layer 4: In-Memory Caching Each WebSocket server maintains an in-memory cache of active game states with 1-second TTL and LRU eviction. When clients initially connect, they receive the current state from this cache immediately. This prevents overwhelming the Redis cluster with read queries. The cache is updated whenever new score updates arrive via Redis Pub/Sub.

Layer 5: Delta Updates and Compression Instead of sending complete game objects on every update, we send delta updates containing only changed fields. For example, if only the home team score changes, we send just that field. Messages larger than 1KB are compressed using gzip before transmission. This reduces bandwidth by 70-80%.
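The size-gated compression in Layer 5 is a small amount of code: serialize the update, and gzip it only when the payload exceeds the threshold (small deltas aren't worth the CPU, and gzip can even enlarge tiny payloads):

```python
import gzip
import json

COMPRESS_THRESHOLD = 1024  # bytes; messages larger than 1KB get compressed

def encode_message(update: dict) -> tuple[bytes, bool]:
    """Serialize an update; gzip it only when the payload exceeds the threshold.
    Returns (payload, compressed_flag) so the client knows how to decode."""
    raw = json.dumps(update).encode()
    if len(raw) > COMPRESS_THRESHOLD:
        return gzip.compress(raw), True
    return raw, False

small, small_gz = encode_message({"gameId": "g1", "homeScore": 21})
big, big_gz = encode_message({"playByPlay": ["touchdown by player 12"] * 200})
print(small_gz, big_gz)  # False True
```

In practice the compressed flag would travel in a message header so the client knows whether to decompress before parsing.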

Handling Connection Drops: Network interruptions are common on mobile devices. When a client reconnects, they send the timestamp of their last received update. The server sends any missed updates from the past 60 seconds (fetched from Redis or Cassandra). This ensures no data loss while avoiding sending duplicate updates.

Monitoring and Circuit Breaking: Each WebSocket server tracks key metrics: connection count, message latency (p50, p95, p99), Redis connectivity status, and Kafka consumer lag. If Redis becomes unavailable, servers implement circuit breaker logic and fall back to direct Cassandra reads (higher latency but maintains availability). Alerts trigger when consumer lag exceeds 100 messages or p99 latency exceeds 2 seconds.

Deep Dive 2: How do we ingest high-velocity play-by-play data from multiple sports feed providers?

Sports data arrives from multiple official feed providers, each with different formats, update frequencies, and reliability characteristics. During a busy Sunday with 20+ simultaneous NFL games, we might receive 10,000+ events per second. The system must normalize this data, ensure idempotency, maintain chronological order, and handle out-of-order or duplicate events.

Problem Analysis:

Different challenges arise with multi-source ingestion:

  • Feed formats vary: XML, JSON, WebSocket streams, each with provider-specific schemas.
  • Events may arrive out of order due to network delays or provider issues.
  • Duplicate events can occur due to provider retries or network issues.
  • Some providers are more reliable than others; we need failover strategies.
  • Event timestamps from providers might not be strictly monotonic.

Solution: Adapter Pattern with Stream Processing

Feed Adapter Layer: For each sports data provider, we implement a custom adapter that normalizes their specific format into a common internal schema. The adapter handles format conversion, field mapping, and preliminary validation. For example, an NBA feed adapter converts the provider’s event codes into standardized event types (field goal, free throw, turnover, etc.).

Validation and Enrichment: After normalization, events pass through a validation stage that checks for required fields (game ID, timestamp, event type, event ID). Invalid events are logged and sent to a dead letter queue for investigation. Valid events are enriched with additional context: team names, player details, current scores. This enrichment reduces the work required by downstream consumers.

Deduplication Strategy: To ensure idempotency, we use Redis to track processed event IDs. Each event has a unique identifier from the provider. Before processing, we check if this event ID exists in Redis (with a 24-hour TTL). If found, we skip the duplicate. If not found, we process the event and store its ID in Redis. This prevents the same touchdown or goal from being scored twice in our system.

Kafka for Event Streaming: Validated and deduplicated events are published to Kafka topics. We partition by game ID to ensure all events for a given game are processed in order. Kafka’s exactly-once semantics (when properly configured) ensure no event is lost or processed twice. Multiple consumer groups can subscribe to these topics: real-time score aggregators, fantasy score calculators, notification dispatchers, and analytics pipelines.

Handling Out-of-Order Events: We use Apache Flink for stream processing with event-time semantics rather than processing-time. Each event carries its original timestamp from the provider. Flink uses watermarks to handle late-arriving events, allowing a 5-second grace period. Events arriving within this window are correctly ordered and processed. Events arriving after the watermark are written to a separate “late events” table in Cassandra for manual reconciliation.
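The watermark mechanic can be sketched without Flink: the watermark trails the maximum event timestamp seen so far by the grace period, and anything older than the watermark goes to the late-events path (a simplification of Flink's bounded-out-of-orderness watermarks):

```python
GRACE_SECONDS = 5  # allowed lateness, matching the 5-second grace period above

def classify_events(events):
    """events: list of (event_id, event_time). Returns (on_time, late) ID lists.
    The watermark trails the max timestamp seen so far by GRACE_SECONDS."""
    max_ts = float("-inf")
    on_time, late = [], []
    for event_id, ts in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - GRACE_SECONDS
        (late if ts < watermark else on_time).append(event_id)
    return on_time, late

# e5 arrives 8 seconds behind the stream's progress, so it misses the watermark
events = [("e1", 100), ("e2", 103), ("e3", 101), ("e4", 110), ("e5", 102)]
print(classify_events(events))
```

Here `e3` is out of order but within the grace window, so it is processed normally; `e5` falls behind the watermark and would be written to the late-events table for reconciliation.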

Time-Series Storage: Cassandra is ideal for storing play-by-play events as it’s optimized for time-series data with high write throughput. The primary key structure uses game ID as the partition key and event time as a clustering key, ensuring events for each game are stored together and sorted chronologically. This makes queries like “show me all events for game X” or “show me all scoring plays” extremely efficient.

Feed Health Monitoring: We continuously monitor each feed provider’s health by tracking event arrival rates and detecting stale feeds. Each game has an expected event rate based on sport type and game state. If a feed stops sending events (or the rate drops significantly), alerts are triggered. We can then failover to a secondary provider or notify the operations team. This proactive monitoring prevents scenarios where a feed dies silently and users see frozen scores.

Deep Dive 3: How do we deliver adaptive bitrate video streaming with minimal buffering?

Video streaming to millions of concurrent users across varying network conditions (5G, 4G, 3G, WiFi) while minimizing buffering events and CDN costs is complex. A buffering event during a crucial game moment significantly degrades user experience.

Problem Analysis:

Video streaming challenges include:

  • Users have vastly different network conditions, from 5G at 100+ Mbps to 3G at 2 Mbps.
  • Network conditions fluctuate during a session due to movement, congestion, or interference.
  • Live streams have low latency requirements (5-10 second delay) compared to on-demand content.
  • Serving high-quality video (4K) to millions of users simultaneously is bandwidth-intensive and expensive.
  • Users expect instant start times (under 3 seconds) and smooth playback without buffering.

Solution: HLS/DASH with Intelligent Client-Side Adaptation

Multi-Bitrate Encoding: Every live stream and video is transcoded into multiple quality levels simultaneously: 360p at 1 Mbps, 720p at 3 Mbps, 1080p at 6 Mbps, and 4K at 15 Mbps. The encoding infrastructure uses hardware acceleration (GPUs) for real-time transcoding. Each quality level is segmented into 6-second chunks, allowing clients to switch quality every few seconds without interrupting playback.

HLS/DASH Protocol: We use HTTP Live Streaming (HLS) for Apple devices and DASH for others. Both protocols work similarly: a master manifest lists available quality levels, and variant playlists reference the actual video segments. Clients download segments sequentially, building a buffer of pre-fetched content. This buffer protects against temporary network issues.

Adaptive Bitrate Algorithm: The client-side video player implements an adaptive bitrate algorithm that continuously monitors network conditions and adjusts quality. The algorithm considers multiple factors: current bandwidth (estimated from recent segment download times), buffer level (how many seconds of video are pre-downloaded), and historical bandwidth using exponential weighted moving average. If the buffer is low (under 10 seconds), the algorithm conservatively selects a lower bitrate. If the buffer is healthy (over 20 seconds) and bandwidth is sufficient, it tries a higher bitrate.
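The decision logic just described can be sketched as two small functions — an EWMA bandwidth estimate and a buffer-aware bitrate pick (the safety multipliers are illustrative tuning values, not ESPN's actual parameters):

```python
BITRATES_KBPS = [1000, 3000, 6000, 15000]  # 360p, 720p, 1080p, 4K

def estimate_bandwidth(prev_estimate: float, sample: float, alpha: float = 0.3) -> float:
    """Exponential weighted moving average over measured segment throughput."""
    return alpha * sample + (1 - alpha) * prev_estimate

def select_bitrate(bandwidth_kbps: float, buffer_seconds: float) -> int:
    """Pick the highest bitrate that fits, with buffer-based safety margins."""
    if buffer_seconds < 10:        # low buffer: be conservative
        usable = bandwidth_kbps * 0.5
    elif buffer_seconds > 20:      # healthy buffer: allow quality upgrades
        usable = bandwidth_kbps * 0.9
    else:
        usable = bandwidth_kbps * 0.7
    candidates = [b for b in BITRATES_KBPS if b <= usable]
    return candidates[-1] if candidates else BITRATES_KBPS[0]

bw = estimate_bandwidth(prev_estimate=8000, sample=4000)  # throughput just dropped
print(select_bitrate(bw, buffer_seconds=8))   # low buffer forces a lower rung
print(select_bitrate(bw, buffer_seconds=25))  # healthy buffer permits a higher one
```

The EWMA smooths out single-segment throughput spikes, while the buffer thresholds decide how much of the estimate the player is willing to bet on.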

CDN Strategy and Origin Shield: Video segments and manifests are distributed via CloudFront CDN with over 200 edge locations worldwide. This ensures most users fetch content from a nearby server, reducing latency. We enable Origin Shield, which adds an additional caching layer between edge locations and the origin server. This dramatically reduces origin server load because all edge locations in a region share the Origin Shield cache. During a major event when millions request the same live stream segment, the origin serves it once to the Origin Shield, which then serves all edge locations.

Segment Caching and TTL: Live stream segments are cached at edge locations with a 24-hour TTL (even though they’re 6 seconds of video). This might seem counterintuitive, but once a segment is created for a live stream, it never changes. For on-demand content, segments are cached until evicted by LRU policies. Master and variant manifests have shorter TTLs (5-10 seconds for live, 1 hour for on-demand) as they’re updated when new segments become available.

DVR and Time-Shifting: For live streams, we maintain a rolling window of segments (typically 2-4 hours) enabling DVR functionality. The manifest includes all available segments within this window. When a user rewinds, the client simply requests older segments from the manifest. Segments older than the DVR window are deleted from S3 to save storage costs. The Origin server generates manifests dynamically based on the current live point and the configured DVR window.

Prefetching and Predictive Loading: The client player prefetches the next 2-3 segments at the current bitrate to build a healthy buffer. During stable network conditions, it might speculatively prefetch one segment at a higher quality to enable quicker quality upgrades. This prefetching strategy balances buffer health with bandwidth efficiency.

Quality of Experience Monitoring: We track detailed video metrics for every session: video start time, buffering ratio (percentage of time spent buffering), average bitrate, number of bitrate switches, and CDN cache hit rate. These metrics feed into quality dashboards and alert systems. If the average buffering ratio exceeds 1% or video start failures exceed 0.5%, incidents are triggered for investigation.

Deep Dive 4: How do we generate personalized content recommendations for 100 million users?

Delivering relevant recommendations at scale while maintaining freshness and balancing engagement with content diversity is challenging. Users expect Netflix-quality personalization where every article, video, and game suggestion feels tailored to their interests.

Problem Analysis:

Recommendation challenges include:

  • 100 million users each need personalized recommendations, but computing recommendations for all users on every request is impossible.
  • User preferences evolve over time; recommendations must stay fresh.
  • Cold start problem: new users have no behavior history.
  • Balancing exploitation (showing content similar to what users like) with exploration (introducing diverse content).
  • Recommendations must consider recency, trending topics, and breaking news alongside user preferences.

Solution: Multi-Stage Pipeline with Offline and Online Components

Stage 1: Candidate Generation When a user requests recommendations, we first generate a large candidate set (1,000-2,000 items) using multiple strategies running in parallel:

Collaborative Filtering: Users who follow similar teams and read similar articles likely have similar interests. We use matrix factorization to find users similar to the current user, then retrieve content those similar users engaged with. This runs as a batch job every few hours, pre-computing user similarity scores and caching them in Redis.

Content-Based Filtering: Based on the user’s followed teams, players, and sports, we retrieve recent articles and videos tagged with those entities. If a user follows the Lakers and LeBron James, we fetch all recent Lakers/LeBron content. This strategy is fast and works well for users with clear preferences.

Trending and Popular Items: We include content that’s currently trending globally or within specific sports. This ensures users discover breaking news and popular stories even if they don’t match the user’s historical preferences. Trending scores are computed in real-time using a sliding window over the past few hours.

Stage 2: Ranking The candidate set of 1,000-2,000 items is too large to present. We use a neural network-based ranker to score each candidate and select the top 20-50. The ranker considers:

User Features: Age group, country, favorite teams, favorite sports, historical engagement patterns (average session duration, articles read in the past 7 days, preferred content types), activity patterns (time of day when most active), and engagement metrics (click-through rate, average read time).

Item Features: Sport, teams, players, content type (news, analysis, opinion), word count, whether it includes video, author reputation score, freshness score (recent articles get a boost), and engagement signals (view count, average time on page, share count).

Contextual Features: Current time of day, user’s device type, current location (city/country), and whether any of the user’s followed teams are playing today.

The ranker is a deep neural network trained offline on historical engagement data. The model learns patterns like “users who read articles in the morning prefer shorter news pieces” or “users who follow multiple teams in the same sport engage more with league-wide analysis.”

Stage 3: Post-Processing The ranked results undergo final adjustments:

Diversity Filtering: We apply Maximal Marginal Relevance (MMR) to ensure recommendations aren’t all about the same team or topic. The algorithm selects the highest-ranked item, then iteratively selects items that balance relevance score with diversity from already-selected items. This ensures users see a mix of content about different teams, sports, and topics.
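The MMR loop can be sketched with a toy similarity function (1.0 when two items cover the same topic, 0.0 otherwise; a real system would use embedding similarity):

```python
def mmr_select(items, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items trading off relevance
    against similarity to already-selected items.
    items: list of (item_id, relevance, topic)."""
    def sim(a, b):
        return 1.0 if a[2] == b[2] else 0.0  # toy similarity: same topic or not

    selected, pool = [], list(items)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda x: lam * x[1]
            - (1 - lam) * max((sim(x, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return [x[0] for x in selected]

candidates = [
    ("a1", 0.95, "lakers"), ("a2", 0.90, "lakers"),
    ("a3", 0.80, "nfl"),    ("a4", 0.75, "lakers"),
]
print(mmr_select(candidates, k=3))  # the NFL story jumps ahead of a second Lakers piece
```

With lam=1.0 the output degrades to pure relevance ordering; lowering lam pushes harder for topical variety.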

Business Rules: Certain content might be boosted or suppressed based on business needs. Sponsored content, editorial picks, or content from new authors might receive artificial boosts. Breaking news gets maximum priority regardless of user preferences.

A/B Testing: Users are assigned to experiment variants that might test different ranking models, diversity weights, or UI presentations. The A/B testing framework tracks engagement metrics across variants to continuously improve recommendations.

Caching and Precomputation: Computing recommendations on-demand for every request would be too slow. Instead, we pre-compute recommendations for active users (those who’ve visited in the past 7 days) in batch jobs running every 15-30 minutes. These recommendations are cached in Redis with appropriate TTLs. When a user requests recommendations, we serve from cache and trigger an asynchronous refresh if the cached version is stale. This hybrid approach balances freshness with performance.

Cold Start Handling: For new users without behavioral history, we rely heavily on demographic signals and trending content. As soon as a user follows their first team, we have enough signal to generate better recommendations using content-based filtering. After a user engages with 10-20 pieces of content, collaborative filtering becomes effective.

Model Training Pipeline: The ranking model is retrained weekly using the past month’s engagement data. Training happens offline on a Spark cluster, processing billions of interaction events. Model features and embeddings are extracted, and the neural network is trained using positive examples (clicked/read content) and negative examples (shown but not clicked). The trained model is deployed to the serving infrastructure with A/B testing to ensure it outperforms the previous model.

Deep Dive 5: How do we send timely notifications without overwhelming users?

Sending relevant notifications about game events to millions of users while respecting their preferences, avoiding notification fatigue, and optimizing delivery costs requires careful orchestration.

Problem Analysis:

Notification challenges include:

  • During a major game, a single scoring play might trigger notifications for millions of users following that team.
  • Users have different tolerance levels; some want every score update while others only want final scores.
  • Quiet hours: users don’t want notifications at 3 AM unless it’s a championship game.
  • Notification fatigue: sending too many notifications leads to users disabling them entirely.
  • Delivery costs: APNs (Apple Push Notification service) and FCM (Firebase Cloud Messaging) have rate limits, and costs scale with volume.

Solution: Smart Notification Pipeline with User Preference Filtering

Event Triggering: Game events (scores, game start/end, player milestones, breaking news) are published to Kafka topics. The Notification Service consumes these events and determines which users should be notified. For a touchdown in an NFL game, the service queries the database for all users following either team involved.

User Preference Filtering: Before sending any notification, the system checks multiple layers of user preferences stored in PostgreSQL and cached in Redis:

Global Enable: Is the user opted into notifications at all? If not, skip immediately.

Channel Preferences: Does the user want push notifications, email, SMS, or some combination? This determines which delivery channels to use.

Type Preferences: Does the user want score updates, game reminders, breaking news, or fantasy updates? Different notification types can be independently enabled or disabled.

Quiet Hours: Has the user configured quiet hours (e.g., 10 PM to 8 AM)? The system converts to the user’s timezone and checks if the current time falls within this window. Exception: championship games or user-defined “always notify” events override quiet hours.
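The quiet-hours layer is the trickiest of these checks because the window usually wraps past midnight (10 PM to 8 AM). A minimal sketch, assuming a fixed UTC offset in place of a full IANA timezone lookup; the function name and parameters are illustrative:

```python
from datetime import datetime, time, timezone, timedelta

def in_quiet_hours(now_utc, user_offset, start, end):
    """Convert the current UTC time into the user's local time and test
    whether it falls inside a quiet-hours window that may wrap midnight.
    A fixed offset stands in for a real timezone database lookup."""
    local = now_utc.astimezone(timezone(user_offset)).time()
    if start <= end:                         # window within one day
        return start <= local < end
    return local >= start or local < end     # window wraps midnight
```

A caller would suppress non-critical notifications when this returns True, unless the event is flagged "always notify."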

Rate Limiting: Even if a notification passes all preference checks, rate limiting prevents overwhelming users. We track notification counts per user in Redis using sliding window counters:

Hourly Limit: Maximum 10 notifications per hour per user. This prevents notification storms during high-scoring games.

Daily Limit: Maximum 50 notifications per day per user. This ensures users aren’t bombarded across multiple events.

If a user hits their limit, lower-priority notifications are suppressed while critical notifications (like game start for a followed team) are still delivered.
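The per-user limits with a critical-priority override can be sketched as follows; an in-memory timestamp list stands in for Redis sorted sets, and the limit values mirror the text while everything else is illustrative:

```python
import time

class NotificationRateLimiter:
    """Sketch of sliding-window hourly/daily limits per user. Critical
    notifications bypass the limits, matching the suppression rule in
    the text. A dict of timestamp lists stands in for Redis."""

    HOURLY, DAILY = 10, 50

    def __init__(self, clock=time.time):
        self.clock = clock
        self.sent = {}                          # user_id -> [send timestamps]

    def allow(self, user_id, critical=False):
        now = self.clock()
        # Keep only sends within the 24-hour window.
        window = [t for t in self.sent.get(user_id, []) if now - t < 86400]
        self.sent[user_id] = window
        hourly = sum(1 for t in window if now - t < 3600)
        if critical:                            # critical events always deliver
            window.append(now)
            return True
        if hourly >= self.HOURLY or len(window) >= self.DAILY:
            return False                        # suppress lower-priority
        window.append(now)
        return True
```

An injectable clock keeps the sketch testable; a production version would use Redis `ZADD`/`ZREMRANGEBYSCORE` so the window survives process restarts.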

Deduplication: Users shouldn’t receive duplicate notifications for the same event within a short time window. We use Redis to track recently sent notifications with a key format like “sent:{userId}:{gameId}:{notificationType}” with a 1-hour TTL. Before sending, we check this key. If it exists, we skip the duplicate notification.
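The check-then-send step maps naturally onto an atomic set-if-absent with a TTL. A sketch, with a dict of key-to-expiry standing in for Redis (in Redis itself this would be a single `SET key 1 NX EX 3600`); the key format mirrors the one in the text:

```python
import time

def send_once(store, user_id, game_id, ntype, send_fn, ttl=3600, now=None):
    """Send a notification only if no identical one was sent within the
    TTL window. `store` maps dedup keys to expiry times, standing in for
    Redis SET NX EX; send_fn is a hypothetical delivery callback."""
    now = time.time() if now is None else now
    key = f"sent:{user_id}:{game_id}:{ntype}"
    expiry = store.get(key)
    if expiry is not None and expiry > now:
        return False                  # duplicate within the window: skip
    store[key] = now + ttl            # mark as sent for the next hour
    send_fn(user_id)
    return True
```

Note that in a distributed deployment the check and the mark must be one atomic operation (as `SET NX EX` is), or two workers can both pass the check and double-send.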

Batching for Cost Optimization: Instead of making individual API calls to APNs/FCM for each notification, we batch notifications by platform. FCM supports up to 500 messages per batch request. The Notification Service accumulates notifications in memory until either a batch is full (500 messages) or a timeout expires (5 seconds), then sends the batch. This dramatically reduces API calls and associated costs.
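The size-or-timeout flush logic can be sketched as follows. Time is passed in explicitly to keep the sketch testable; a real implementation would arm a 5-second timer when the first message arrives, and `flush_fn` stands in for the platform batch API call:

```python
class NotificationBatcher:
    """Accumulate messages and flush when the batch is full or the
    oldest pending message has waited past the timeout."""

    def __init__(self, flush_fn, max_batch=500, timeout=5.0):
        self.flush_fn = flush_fn        # stand-in for the FCM batch call
        self.max_batch = max_batch
        self.timeout = timeout
        self.pending = []
        self.first_at = None            # arrival time of oldest pending msg

    def add(self, message, now):
        if self.first_at is None:
            self.first_at = now
        self.pending.append(message)
        if len(self.pending) >= self.max_batch or now - self.first_at >= self.timeout:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)              # one call per batch
            self.pending, self.first_at = [], None
```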

Delivery Tracking: After sending, we track delivery status. Both APNs and FCM provide feedback about successful deliveries and failures (invalid device tokens, the user uninstalled the app). Failed device tokens are marked as inactive in the database to avoid retrying them. This keeps the database clean and reduces wasted delivery attempts.

Priority and Urgency: Notifications have priority levels that influence delivery timing:

Critical: Championship game events, breaking news about followed teams. Delivered immediately even during quiet hours.

High: Score updates for followed teams. Delivered immediately unless quiet hours are active.

Medium: Game reminders, fantasy updates. Delivered respecting all preference rules.

Low: General sports news, promotional content. Delivered only if the user hasn’t received many notifications recently.

Smart Notification Composition: Notification content is dynamically generated based on context. A touchdown notification includes the team name, current score, and time remaining. The message adapts to user preferences; some users get detailed play descriptions while others get simple score updates. Deep links in notifications take users directly to the relevant game or article within the app.

Deep Dive 6: How do we handle traffic spikes during big games like the Super Bowl?

Major sporting events can cause 10x to 100x traffic spikes within minutes. The Super Bowl might bring 50 million concurrent users when typical traffic is 5 million. The system must scale elastically without manual intervention while maintaining performance and availability.

Problem Analysis:

Traffic spike challenges include:

  • Sudden traffic surges: The Super Bowl kickoff creates an immediate spike as millions open the app simultaneously.
  • Sustained high load: Peak load persists for 3-4 hours rather than being transient.
  • Geographic concentration: Events like the Super Bowl concentrate traffic in specific regions.
  • Cascading failures: If one component fails under load, it can trigger failures in dependent components.
  • Cold start latency: Newly launched instances need time to warm up (load caches, establish connections).

Solution: Multi-Faceted Scaling Strategy

Auto-Scaling with Predictive Scaling: We use Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale services based on metrics like CPU utilization, memory usage, and custom metrics (WebSocket connection count). HPA targets are set conservatively: 70% CPU, 80% memory, 40,000 WebSocket connections per server.

For known events like the Super Bowl, we use predictive scaling. Historical data from similar events predicts peak traffic. The system pre-scales 30 minutes before kickoff, launching additional instances to handle the expected load. This avoids the cold start problem where rapid scaling causes latency spikes as new instances initialize.

Cache Warming: Before major events, a cache warming process pre-loads critical data into Redis: game metadata, team rosters, player statistics. This ensures the first wave of requests is served from cache rather than overwhelming the database. Cache warming happens automatically for any game marked as “high profile” in the system.

CDN Prefetching: For video streaming, we work with CDN providers to prefetch the live stream segments to edge locations before the event starts. This ensures the CDN cache is warm when millions of users start watching simultaneously, reducing origin server load by 90%+.

Database Read Replicas: PostgreSQL uses read replicas to scale read operations. During normal traffic, we might have 2-3 read replicas. Before major events, we automatically provision additional replicas (up to 10). Read queries are distributed across replicas using connection pooling and load balancing. This prevents the primary database from being overwhelmed by read traffic while it handles writes.
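The read/write split over the replica pool can be sketched as a simple router; strings stand in for connection handles, and the write-statement check is deliberately naive (a real router would parse the statement or rely on framework annotations):

```python
import itertools

class ReplicaRouter:
    """Send writes to the primary and round-robin reads across replicas,
    per the read-scaling approach described above. Illustrative only."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, primary, replicas):
        self.primary = primary
        self._cycle = itertools.cycle(replicas)   # simple load balancing

    def connection_for(self, query):
        verb = query.lstrip().split(None, 1)[0].upper()
        return self.primary if verb in self.WRITE_VERBS else next(self._cycle)
```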

Rate Limiting and Throttling: The API Gateway implements rate limiting per user and per IP to prevent abuse and ensure fair resource allocation. During extreme load (system above 90% capacity), we implement adaptive throttling: rate limits are temporarily reduced to 20-50% of normal values. This prevents complete system overload by gracefully degrading service rather than failing catastrophically.

Circuit Breaker Pattern: Services implement circuit breakers when calling dependencies. If the Content Service becomes slow or unavailable, the circuit breaker “opens” and the API returns cached data or a degraded response instead of waiting for timeouts. This prevents cascading failures where one slow service brings down everything else.

After a cooldown period (60 seconds), the circuit breaker enters “half-open” state and allows a few test requests through. If they succeed, the circuit closes and normal operation resumes. If they fail, the circuit re-opens for another cooldown cycle.
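The open / half-open / closed cycle can be sketched as follows. The failure threshold and 60-second cooldown mirror the text; the rest (names, single-test half-open) is one illustrative way to implement the pattern:

```python
import time

class CircuitBreaker:
    """Wrap calls to a dependency; after repeated failures, serve the
    fallback for a cooldown, then let a test request through."""

    def __init__(self, failure_threshold=5, cooldown=60.0, clock=time.time):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None                 # None means circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()             # open: serve degraded response
            # cooldown elapsed: half-open, allow a test request through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock() # (re-)open for another cooldown
            return fallback()
        self.failures, self.opened_at = 0, None   # success closes the circuit
        return result
```

The crucial behavior is that while open, the wrapped function is never invoked, so a slow dependency stops consuming threads and timeouts in its callers.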

Load Shedding: When the system approaches its limits, we implement load shedding based on request priority:

Critical: Live scores and game data. Always served.

High: Video streaming. Served unless system is above 95% capacity.

Medium: Articles and search. Served unless system is above 85% capacity.

Low: Recommendations and social features. Shed when system exceeds 70% capacity.

This priority-based approach ensures critical functionality remains available even when the system is overwhelmed. Users might not get personalized recommendations during the Super Bowl, but they’ll always see live scores.
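The shedding decision reduces to a lookup against per-tier utilization thresholds, taken directly from the table above (the function name is illustrative):

```python
# Requests in a tier are shed once utilization exceeds its threshold;
# critical traffic (live scores) is never shed.
SHED_ABOVE = {
    "critical": float("inf"),  # live scores and game data
    "high": 0.95,              # video streaming
    "medium": 0.85,            # articles and search
    "low": 0.70,               # recommendations and social features
}

def should_serve(priority, utilization):
    """Serve a request only while system utilization has not exceeded
    its priority tier's threshold."""
    return utilization <= SHED_ABOVE[priority]
```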

Geographic Distribution and Load Balancing: Traffic is distributed globally across multiple AWS regions: US-East, US-West, Europe, and Asia-Pacific. Geographic load balancing routes users to the nearest region. During region-specific events (like NFL games), we can temporarily shift capacity by launching additional instances in the affected region.

Monitoring and Alerting: Real-time dashboards show system health metrics: request rate, error rate, latency percentiles (p50, p95, p99), database CPU/memory, cache hit rates, Kafka consumer lag. Automated alerts trigger when metrics exceed thresholds, notifying on-call engineers. For major events, the engineering team monitors dashboards proactively rather than waiting for alerts.

Graceful Degradation: During partial outages or extreme load, the system degrades gracefully rather than failing completely:

  • If Redis is unavailable, fall back to database reads (higher latency but functional).
  • If Elasticsearch is down, disable search and show curated content instead.
  • If video encoding is delayed, show a lower-quality stream or the last available segment.
  • If recommendation service is slow, show trending content instead of personalized recommendations.

Each service has defined fallback behaviors that maintain core functionality even when dependencies fail.

Step 4: Wrap Up

In this chapter, we proposed a system design for a comprehensive sports platform like ESPN. If there is extra time at the end of the interview, here are additional points to discuss:

Additional Features:

  • Fantasy sports: Real-time player scoring, league management, and draft functionality.
  • Social features: Live chat during games, social sharing, community polls and predictions.
  • Advanced analytics: Historical statistics, player comparisons, and predictive analytics.
  • Multi-angle video: Allow users to switch between camera angles during live streams.
  • Interactive overlays: Real-time stats and graphics overlaid on video streams.

Scaling Considerations:

  • Horizontal Scaling: All services are stateless (session state in Redis) to enable horizontal scaling across hundreds of instances.
  • Database Sharding: Shard PostgreSQL by geographic region or user ID ranges to distribute load.
  • Caching Layers: Multi-level caching (application cache, distributed cache, CDN) with appropriate TTLs for each data type.
  • Message Queue Partitioning: Kafka topics are partitioned by game ID or sport to enable parallel processing across consumer groups.
  • Read-Write Separation: Direct writes to primary databases and reads to replicas to balance load.

Error Handling:

  • Network Failures: Implement retry logic with exponential backoff and jitter to avoid thundering herd.
  • Service Failures: Circuit breakers prevent cascading failures and allow graceful degradation.
  • Database Failures: Automatic failover to replica databases with health checks.
  • Third-Party API Failures: Multiple sports feed providers with automatic failover if primary provider fails.
  • Message Loss Prevention: Kafka’s replicated, acknowledged writes ensure events aren’t lost, while idempotent producers and exactly-once processing semantics prevent duplicates on retry.
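The retry bullet above is commonly implemented as exponential backoff with "full jitter": each delay is drawn uniformly from zero up to a capped exponential envelope, so clients that failed at the same moment spread their retries out instead of stampeding in lockstep. A sketch, with illustrative defaults:

```python
import random

def backoff_delays(base=0.1, cap=10.0, attempts=5, rng=random.random):
    """Generate retry delays: attempt n waits a uniform random time in
    [0, min(cap, base * 2**n)]. The rng parameter is injectable so the
    envelope is testable; defaults are illustrative."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(attempts)]
```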

Consistency and Reliability:

  • Score Consistency: All users should see the same score for a game at any point in time. Redis cache updates are atomic and Kafka ensures ordered delivery within partitions.
  • Fantasy Score Consistency: Fantasy scores are calculated once per event and cached to ensure all users in a league see identical scores.
  • Idempotent Operations: All critical operations (score updates, fantasy calculations, notifications) are designed to be safely retried without side effects.
  • Data Durability: Kafka retains messages for 7 days, Cassandra uses replication factor of 3, PostgreSQL uses streaming replication to standby instances.

Security Considerations:

  • Encrypt sensitive data in transit (TLS 1.3) and at rest (AES-256).
  • Implement proper authentication (JWT tokens with refresh tokens) and authorization (role-based access control).
  • Rate limiting per user and per IP to prevent abuse and DDoS attacks.
  • Input validation and sanitization to prevent injection attacks (SQL, XSS, etc.).
  • API keys and secrets stored in secure vaults (AWS Secrets Manager) and rotated regularly.
  • Regular security audits and penetration testing to identify vulnerabilities.

Monitoring and Analytics:

  • Track key metrics: daily active users, video streaming hours, article engagement rate, notification delivery rate, fantasy league participation.
  • Business metrics: subscription conversion rate, advertisement impressions, content popularity trends.
  • Technical metrics: API latency, error rates, cache hit rates, database query performance, message queue lag.
  • Real-time dashboards for operations team showing system health and traffic patterns.
  • A/B testing framework for recommendation algorithms, UI changes, and pricing strategies.
  • User behavior analytics to understand engagement patterns and optimize content strategy.

Future Improvements:

  • Machine learning for demand prediction and optimal resource allocation during events.
  • Computer vision for automated highlight generation from live streams.
  • Natural language processing for automated article summarization and tagging.
  • Personalized video thumbnails and preview clips tailored to user interests.
  • Voice assistant integration (Alexa, Google Assistant) for hands-free score updates.
  • Augmented reality features for enhanced viewing experiences.
  • Blockchain for transparent and secure fantasy sports transactions.

Data Consistency Trade-offs:

  • Live Scores: Strong consistency required. Use Redis with single writer pattern and Kafka for ordered delivery.
  • Articles and Content: Eventual consistency acceptable. Cache with longer TTLs (5-15 minutes).
  • Recommendations: Eventual consistency acceptable. Pre-computed periodically and cached.
  • Fantasy Scores: Strong consistency required. Use PostgreSQL transactions and Redis locks.
  • User Preferences: Strong consistency required for writes, eventual consistency acceptable for reads.

Cost Optimization:

  • Aggressive caching reduces database queries by 90%, saving on database capacity.
  • CDN for video reduces origin bandwidth costs by 95%.
  • Auto-scaling down during off-peak hours reduces compute costs by 60%.
  • S3 lifecycle policies move old videos to cheaper storage tiers (Glacier).
  • Reserved instances for baseline capacity, spot instances for burst capacity during events.
  • Efficient video encoding (H.265 instead of H.264) reduces bandwidth costs by 30-40%.

Congratulations on getting this far! Designing ESPN is a complex system design challenge that combines real-time systems, high-throughput data processing, personalization at scale, and global content delivery. The key is to start with core functional requirements, then systematically address non-functional requirements through deep dives into critical areas like real-time distribution, video streaming, and traffic handling.


Summary

This comprehensive guide covered the design of a sports content platform like ESPN, including:

  1. Core Functionality: Live scores with sub-second latency, adaptive bitrate video streaming, content management and search, personalized recommendations, and smart notifications.
  2. Key Challenges: Real-time score distribution to millions of users, high-velocity data ingestion from multiple providers, video streaming across varying network conditions, personalization at scale, and handling extreme traffic spikes.
  3. Solutions: Multi-layer WebSocket architecture with Redis pub/sub, feed adapter pattern with stream processing, HLS/DASH with CDN distribution, multi-stage recommendation pipeline with ML ranking, smart notification filtering with rate limiting, and predictive auto-scaling with graceful degradation.
  4. Scalability: Horizontal scaling across all services, multi-level caching strategies, database sharding and read replicas, message queue partitioning, and global CDN distribution.

The design demonstrates how to build a system that handles massive scale (50M+ concurrent users), maintains low latency (sub-second score updates), delivers high-quality user experiences (smooth video streaming, relevant recommendations), and remains reliable during extreme traffic events (10x-100x spikes).