Design Medium

Medium is a content publishing platform that serves millions of writers and readers worldwide. In this chapter, we'll walk through designing a system that handles article publishing, personalized reading feeds, social engagement, publication management, and subscription-based paywalls at scale.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

Core Article Features:

  1. Writers should be able to create, edit, publish, and delete articles with rich text formatting.
  2. The system should support drafts with auto-save functionality.
  3. Writers should be able to embed rich media including images, videos, code snippets, and embeds.
  4. Writers should have access to article versioning and edit history.
  5. The system should calculate reading time estimation based on content length.

Reading Experience:

  6. Readers should have access to a personalized home feed based on interests and following.
  7. Readers should be able to discover articles through tags and topics.
  8. The system should track reading progress, allowing users to resume where they left off.
  9. Readers should be able to highlight and annotate content.
  10. Readers should be able to bookmark and save articles for later.

Social Engagement:

  11. Readers should be able to clap for articles, up to 50 claps per article.
  12. Users should be able to comment with threaded discussions.
  13. Users should be able to follow writers and publications.
  14. Users should be able to share articles across platforms.

Publications:

  15. The system should support multi-author publications with role-based access.
  16. Publications should have administrators, editors, and writers with appropriate permissions.
  17. Writers should be able to submit articles to publications for editorial review.

Paywall & Monetization:

  18. The system should enforce member-only content with soft paywalls.
  19. Non-members should have a metered paywall allowing 3 free articles per month.
  20. Writers should earn based on member reading time.

Below the Line (Out of Scope):

  • Users should be able to create response articles to existing content.
  • Publications should be able to send custom newsletters.
  • Writers should have detailed analytics dashboards.
  • The system should support custom publication domains.
  • Advanced search filters by read time and date ranges.

Non-Functional Requirements

Core Requirements:

  • The system should load article pages in under 1 second at the 95th percentile.
  • The system should generate personalized feeds in under 500 milliseconds.
  • The system should return search results in under 200 milliseconds.
  • The system should update claps in real-time with under 100 milliseconds latency.
  • The system should ensure strong consistency for article content and payments.
  • The system should accept eventual consistency for claps, views, and recommendations.
  • The system should maintain 99.95% uptime for reading operations.
  • The system should maintain 99.9% uptime for writing operations.

Below the Line (Out of Scope):

  • The system should comply with data privacy regulations like GDPR.
  • The system should implement comprehensive monitoring and alerting.
  • The system should have automated CI/CD pipelines for deployments.
  • The system should implement disaster recovery procedures.

Clarification Questions & Assumptions:

  • Scale: 100M+ monthly active users, 10M+ published articles, 100K+ new articles daily.
  • Platform: Web and mobile applications for both readers and writers.
  • Geographic Coverage: Global with CDN distribution.
  • Payment Processing: Handled by third-party payment processors like Stripe.

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

Article: The central content entity containing the article title, subtitle, rich text content in structured format, tags, reading time, word count, view count, read count, clap count, publication status, and whether it’s member-only content. Articles also maintain version history for rollback capabilities.

User: Represents both writers and readers with personal information, authentication details, profile data, following relationships, notification preferences, and for writers, their earnings and statistics.

Publication: A multi-author content hub with name, description, branding information, member roles, submission queue, and analytics. Publications can have custom layouts and moderation workflows.

Engagement: Captures user interactions including claps, comments, highlights, and bookmarks. Each interaction links to both the user and the article.

Subscription: Manages user membership status including subscription plan, payment method, billing cycle, and current period dates.

API Design

Create Article Endpoint: Used by writers to create new articles or drafts.

POST /articles -> Article
Body: { title: string, subtitle: string, content: blocks[], tags: string[], isPublished: boolean, isMemberOnly: boolean }

Update Article Endpoint: Used by writers to edit existing articles or drafts with auto-save support.

PATCH /articles/:articleId -> Article
Body: { title: string, subtitle: string, content: blocks[], tags: string[] }

Get Personalized Feed Endpoint: Used by readers to retrieve their personalized home feed with pagination.

GET /feed?offset=0&limit=20 -> Article[]

Clap for Article Endpoint: Used by readers to express appreciation for an article.

POST /articles/:articleId/clap -> ClapCount
Body: { clapCount: number }

Subscribe to Member Endpoint: Used by users to become paying members.

POST /subscriptions -> Subscription
Body: { planType: "monthly" | "annual", paymentMethodId: string }

Search Articles Endpoint: Used by users to search for content across the platform.

GET /search?query=string&filters={tags,author,publication} -> Article[]

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Writers should be able to create, edit, publish, and delete articles with rich text formatting

The core components necessary to fulfill article management are:

  • Client Applications: Web and mobile interfaces where writers compose and manage their content. The editor provides a rich text interface supporting various content blocks.
  • API Gateway: Entry point for all client requests, handling authentication, authorization, and routing to appropriate services.
  • Article Service: Manages all article CRUD operations, stores content in structured format, calculates reading time, handles version control, and manages article metadata.
  • Database: PostgreSQL database storing articles, versions, tags, and metadata with strong consistency guarantees.
  • Media Storage: Amazon S3 for storing uploaded images, videos, and other media files.
  • CDN: CloudFront for delivering media assets with low latency globally.

Article Creation Flow:

  1. The writer composes content in the client editor, which structures text into blocks representing paragraphs, headings, images, code, and other content types.
  2. The editor auto-saves drafts after 3 seconds of inactivity by sending PATCH requests to the Article Service.
  3. When uploading images, the client sends files to the API Gateway which validates size and format.
  4. Images are stored in S3 with unique keys and processed asynchronously to generate multiple sizes and formats.
  5. The Article Service stores content in a JSONB column in PostgreSQL, allowing flexible querying and indexing.
  6. When the writer publishes, the status changes from draft to published and the article becomes visible to readers.

Content Storage Structure:

Articles are stored with a flexible block-based structure where each block has an ID, type, and content. Block types include paragraphs with inline styling, headings at various levels, images with captions and dimensions, code blocks with language specification, quotes, and embedded content. This structure allows efficient rendering, partial updates, and version comparison.
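
A minimal sketch of what this block-based structure might look like, with a partial-update helper. The field names here are illustrative assumptions, not Medium's actual schema:

```python
# Illustrative block-based article content (hypothetical field names).
article_content = {
    "blocks": [
        {"id": "b1", "type": "heading", "level": 1, "text": "Designing Medium"},
        {"id": "b2", "type": "paragraph", "text": "Medium stores articles as blocks.",
         "inlineStyles": [{"offset": 0, "length": 6, "style": "bold"}]},
        {"id": "b3", "type": "image", "url": "https://cdn.example.com/img.png",
         "caption": "Architecture diagram", "width": 1200, "height": 800},
        {"id": "b4", "type": "code", "language": "python", "text": "print('hi')"},
    ]
}

def update_block(content, block_id, **changes):
    """Partial update: patch a single block without rewriting the whole article."""
    for block in content["blocks"]:
        if block["id"] == block_id:
            block.update(changes)
            return True
    return False
```

Because each block carries a stable ID, auto-save can diff and patch individual blocks, and version comparison reduces to comparing block lists.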

2. Readers should have access to a personalized home feed based on interests and following

We extend our design to support personalized feed generation:

  • Feed Service: Generates and manages personalized feeds using a combination of pre-computed and real-time algorithms. The service ranks articles based on multiple signals including content relevance, engagement prediction, social signals, and freshness.
  • Recommendation Service: Runs machine learning models to predict which articles users are likely to read. Uses both content-based and collaborative filtering approaches.
  • Redis Cache: Stores pre-computed feeds as sorted sets with 24-hour TTL, enabling fast feed retrieval.
  • Cassandra: Wide-column store holding reading history and user interactions as time-series data, optimized for high write throughput.
  • Kafka: Event streaming platform publishing article publications, user follows, and other events that trigger feed updates.

Feed Generation Flow:

  1. A nightly batch job processes all active users, building personalized feeds for the next day.
  2. For each user, the system retrieves their reading history from the last 90 days from Cassandra.
  3. The system builds an interest profile by analyzing which topics and tags the user engages with most.
  4. The candidate generation phase assembles approximately 10,000 potential articles from multiple sources: articles from followed writers and publications, topic-based recommendations from Elasticsearch, and collaborative filtering recommendations from similar users.
  5. The system extracts features for each candidate including user features like read count and average read time, article features like age and engagement metrics, and interaction features like topic similarity.
  6. A machine learning model predicts the probability that the user will read each article.
  7. Articles are ranked by predicted engagement probability and the top 1000 are stored in Redis as a sorted set.
  8. When the user opens the app, the Feed Service retrieves articles from Redis with pagination support.
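
The storage and retrieval steps above can be sketched with a plain dict standing in for Redis sorted-set semantics (ZADD to store, ZREVRANGE for pagination). The `feed:{user_id}` key scheme and boost value are assumptions for illustration:

```python
# Simulates ranked feed storage as used here: ZADD feed:{user_id} score article_id,
# then reverse-range queries for pagination. A dict stands in for Redis.
feeds = {}

def store_feed(user_id, scored_articles):
    """scored_articles: list of (article_id, predicted_engagement_score)."""
    feeds[f"feed:{user_id}"] = dict(scored_articles)

def get_feed_page(user_id, offset=0, limit=20):
    """Highest score first, like ZREVRANGE key offset offset+limit-1."""
    ranked = sorted(feeds.get(f"feed:{user_id}", {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [article_id for article_id, _ in ranked[offset:offset + limit]]

def inject_article(user_id, article_id, boost=1000.0):
    """Real-time injection with a boosted score so fresh posts surface."""
    feeds.setdefault(f"feed:{user_id}", {})[article_id] = boost
```

Sorted sets make both pagination and real-time injection cheap: inserting one article is O(log N), and a page read is a single range query.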

Real-time Feed Updates:

When a writer publishes a new article or a user follows someone, the system publishes events to Kafka. Consumers process these events and inject relevant articles into pre-computed feeds. If a followed writer publishes, the article is immediately added to all followers’ feeds with a boosted score to ensure visibility.

3. Readers should be able to clap for articles up to 50 claps per article

We need components for real-time engagement tracking:

  • Engagement Service: Manages all user interactions including claps, comments, highlights, and bookmarks. Optimized for high write throughput and low latency.
  • Redis Counters: Track real-time clap counts for instant user feedback.
  • Kafka Event Stream: Buffers engagement events for asynchronous persistence to PostgreSQL.

Clapping Flow:

  1. When a reader claps, the client sends a POST request with the clap count, typically between 1 and 5 per interaction.
  2. The Engagement Service validates that the user hasn’t exceeded 50 total claps for this article by checking Redis.
  3. The service increments two Redis counters: one tracking the user’s clap count for this article and one tracking the article’s total clap count.
  4. The updated total is immediately returned to the client for optimistic UI updates.
  5. Concurrently, the service publishes a clap event to Kafka for asynchronous persistence.
  6. A Kafka consumer batches clap events and periodically flushes aggregated counts to PostgreSQL.
  7. This architecture provides sub-second clap feedback while ensuring durability and preventing PostgreSQL overload.

Handling Edge Cases:

The 50-clap limit is enforced using Redis with a 30-day TTL. If Redis fails or data expires, the system gracefully allows additional claps rather than blocking legitimate engagement. The tradeoff favors user experience over perfect accuracy for a non-critical feature.
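
The clap path above can be sketched as follows, with in-memory dicts standing in for the Redis counters and the Kafka stream. Key names are illustrative assumptions:

```python
# Sketch of the clap write path. `counters` mocks Redis INCRBY counters and
# `events` mocks the Kafka topic; key formats are assumed for illustration.
MAX_CLAPS = 50
counters = {}
events = []

def clap(user_id, article_id, count):
    user_key = f"claps:{article_id}:{user_id}"     # per-user cap counter
    total_key = f"claps:{article_id}:total"        # article-wide counter
    if counters.get(user_key, 0) + count > MAX_CLAPS:
        return {"ok": False, "total": counters.get(total_key, 0)}
    counters[user_key] = counters.get(user_key, 0) + count
    counters[total_key] = counters.get(total_key, 0) + count
    events.append({"article": article_id, "user": user_id, "count": count})
    return {"ok": True, "total": counters[total_key]}
```

In the real system the two increments would be pipelined in Redis and the event publish would be fire-and-forget, so the client sees the updated total without waiting on PostgreSQL.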

4. The system should enforce member-only content with soft paywalls

Additional components for subscription management:

  • Paywall Service: Determines access control for member-only articles, manages subscription state, and tracks metered paywall usage.
  • Payment Processing: Integration with Stripe for subscription billing, payment method management, and webhook handling.
  • Billing Database: PostgreSQL tables storing subscription status, payment history, and writer earnings.

Paywall Enforcement Flow:

  1. When a reader requests an article, the Article Service checks if the content is member-only.
  2. For public articles, access is immediately granted.
  3. For member-only articles, the Paywall Service checks several conditions in order: if the user is the article author, access is granted; if the user has an active subscription, access is granted; if the user is within their monthly 3-article limit, access is granted and the counter is incremented in Redis.
  4. The metered paywall uses a Redis counter keyed by user ID and month, with a 32-day TTL.
  5. If all checks fail, the article content is blocked and the user sees a subscription prompt.
  6. This approach balances access control with performance, using Redis for high-speed checks.
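
The ordered checks above can be expressed as a short access function. The data stores are mocked with in-memory structures, and the meter key format is an assumption:

```python
# Ordered paywall checks from the flow above; dicts and sets mock the real stores.
FREE_ARTICLES_PER_MONTH = 3
subscriptions = set()   # user_ids with an active subscription
meter = {}              # mocks the Redis counter keyed by user and month

def can_read(user_id, article, month="2024-06"):
    if not article["member_only"]:
        return True
    if article["author_id"] == user_id:        # authors always see their own work
        return True
    if user_id in subscriptions:               # active members bypass the meter
        return True
    key = f"meter:{user_id}:{month}"           # real system also sets a 32-day TTL
    if meter.get(key, 0) < FREE_ARTICLES_PER_MONTH:
        meter[key] = meter.get(key, 0) + 1
        return True
    return False
```

Ordering matters for performance: the cheap checks (public flag, authorship, cached subscription) run before the metered counter is touched, so most requests never increment anything.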

Writer Earnings Calculation:

Monthly earnings are calculated by analyzing member reading time across all of a writer’s articles. The system queries Cassandra for all member reading sessions, calculating total member reading minutes for each writer. Earnings are proportional to the writer’s share of total platform reading time, distributed from 50% of subscription revenue. Writers must reach a minimum threshold of $10 for payout eligibility. This model incentivizes quality content that keeps members engaged.

5. Readers should be able to discover articles through tags and topics

We introduce search capabilities:

  • Elasticsearch Cluster: Provides full-text search across article content, titles, tags, and author information. Supports faceted search, filtering, and relevance ranking.
  • Search Service: Handles search queries, applies filters, ranks results by relevance, and provides auto-complete suggestions.
  • Indexing Pipeline: Kafka consumers index new articles and updates in near real-time.

Search Flow:

  1. When an article is published or updated, the Article Service publishes an event to Kafka.
  2. The Search Service consumes these events and indexes the article in Elasticsearch.
  3. Articles are indexed with multiple fields including title, subtitle, full content, tags, author name, publication name, and reading time.
  4. When a user searches, the query is sent to the Search Service with optional filters for tags, authors, publications, or date ranges.
  5. Elasticsearch performs full-text search with relevance scoring, considering factors like term frequency, field weights, and recency.
  6. Results are returned ranked by relevance with highlighted matching text.
  7. The system supports “More Like This” queries to find similar articles based on content similarity.
6. The system should support multi-author publications with role-based access

Final components for publication management:

  • Publication Service: Manages publication creation, member roles, submission workflows, and editorial queues.
  • Authorization Service: Enforces role-based access control for publication operations.
  • Notification Service: Sends alerts to editors about pending submissions and to writers about review decisions.

Publication Workflow:

  1. A writer submits an article to a publication, creating a submission record with pending status.
  2. The system checks the writer’s role in the publication to ensure they have submission permissions.
  3. Editors receive notifications about the pending submission through the Notification Service.
  4. An editor reviews the submission and can approve or reject with feedback.
  5. Upon approval, the article’s publication ID is updated and it appears under the publication’s brand.
  6. Upon rejection, the writer receives feedback and can revise and resubmit.
  7. Publications maintain their own analytics tracking article performance across all published content.

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we design the article editor with auto-save and conflict resolution?

Medium’s editor must feel instant while handling concurrent edits and preventing data loss.

Editor Architecture:

The editor uses a sophisticated rich text editing framework similar to Draft.js or ProseMirror. Content is represented as an immutable data structure where edits create new versions rather than mutating existing state. This enables undo/redo functionality and conflict resolution.

Auto-Save Strategy:

The editor implements debounced saves where user keystrokes trigger a save timer reset. After 3 seconds of inactivity, the current content is serialized and sent to the server. The client maintains optimistic state, immediately reflecting changes while waiting for server confirmation.
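
The debounce logic can be sketched as below. Time is passed in explicitly so the behavior is deterministic; the class and method names are illustrative, not the actual editor's API:

```python
# Debounced auto-save sketch: a save fires only after 3 idle seconds.
# The clock is injected (the `now` arguments) to keep the logic testable.
IDLE_SECONDS = 3.0

class AutoSaver:
    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.content = None
        self.last_edit = None
        self.dirty = False

    def on_keystroke(self, content, now):
        """Every keystroke records new content and resets the idle timer."""
        self.content, self.last_edit, self.dirty = content, now, True

    def tick(self, now):
        """Called periodically; saves once the idle window has elapsed."""
        if self.dirty and now - self.last_edit >= IDLE_SECONDS:
            self.save_fn(self.content)   # PATCH /articles/:id in the real system
            self.dirty = False
```

The `dirty` flag ensures a save fires at most once per burst of edits, which is what keeps auto-save traffic proportional to editing sessions rather than keystrokes.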

Conflict Resolution:

When two clients edit simultaneously, the last write wins for the overall article, but the system preserves both versions in history. Writers receive a warning if their version is stale and can review changes before overwriting. For collaborative editing scenarios, more sophisticated operational transformation or CRDT approaches could be implemented.

Version History:

Every save creates a version record with the complete content snapshot, version number, timestamp, and user ID. This enables writers to compare versions, view diffs, and rollback to previous states. Version storage uses PostgreSQL with JSONB columns for efficient storage and querying.

Image Upload Pipeline:

When writers upload images, the client first validates file size and format before sending to the server. The server generates a unique S3 key and returns a signed URL for direct upload. After successful upload, an asynchronous job processes the image, generating thumbnails, medium-sized versions, and large displays. The system also creates WebP variants for browsers that support the format. Image optimization reduces file sizes without visible quality loss. EXIF metadata is extracted and sensitive information is stripped. Finally, the image URLs are updated in CloudFront for global CDN delivery.

Deep Dive 2: How do we generate personalized feeds efficiently at scale?

Feed generation must balance personalization quality with computational cost and latency.

Hybrid Feed Architecture:

The system uses pre-computation for expensive machine learning inference combined with real-time updates for time-sensitive content. This hybrid approach provides high-quality recommendations without excessive latency.

Feed Ranking Algorithm:

The ranking algorithm combines multiple signals weighted by importance. Content relevance contributes 40% based on topic matching between the article’s topic vector and the user’s interest vector calculated from reading history. Engagement prediction contributes 30% using a machine learning model that predicts the probability the user will read the article based on historical patterns. Social signals contribute 20% including whether the user follows the author or publication and engagement levels from the user’s network. Freshness and quality contribute the remaining 10% balancing recency with engagement metrics like clap-to-view ratio.
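
The weighted blend described above reduces to a simple linear combination. The weights come from the text; the individual signal scores (each normalized to 0..1) are assumed inputs produced upstream:

```python
# Linear blend of ranking signals with the weights from the text.
WEIGHTS = {
    "content_relevance": 0.40,
    "engagement_prediction": 0.30,
    "social_signals": 0.20,
    "freshness_quality": 0.10,
}

def rank_score(signals):
    """signals: dict of signal name -> normalized score in [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
```

Keeping the blend linear makes the weights auditable and easy to A/B test independently of the underlying models that produce each signal.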

Candidate Generation:

The system generates approximately 10,000 candidate articles from multiple sources to ensure diversity. Following feeds include recent articles from followed writers and publications with guaranteed inclusion. Content-based filtering finds articles with similar topics to recently read content using Elasticsearch’s More Like This queries. Collaborative filtering identifies similar users through embedding similarity and recommends articles they engaged with. Trending content in the user’s topics provides popular recent articles. Editorial picks ensure high-quality content appears even for new users with sparse history.

Storage and Caching:

Pre-computed feeds are stored in Redis as sorted sets where the score represents ranking and members are article IDs. Feeds have a 24-hour TTL and are regenerated nightly. Pagination is efficient using Redis’s range query operations. For active users, feeds are updated every hour, while inactive users receive weekly updates to conserve computational resources.

Real-time Feed Injection:

When followed authors publish new articles, the system immediately injects them into follower feeds rather than waiting for the next batch computation. Events are published to Kafka and consumers update Redis feeds asynchronously. New articles receive a recency boost to ensure visibility while maintaining overall ranking quality.

Deep Dive 3: How do we calculate reading time accurately?

Reading time estimation must account for different content types and reading speeds.

Reading Time Algorithm:

The algorithm processes each content block individually with different calculation methods. For text blocks including paragraphs and headings, the system counts words and divides by an average reading speed of 265 words per minute. For images, the first image adds 12 seconds as readers typically pause to examine it, while subsequent images add 3 seconds each. Code blocks are read more slowly with words counted and multiplied by a 0.5 factor reflecting the slower pace of reading technical content. The total time is summed across all blocks and rounded to the nearest minute with a minimum of 1 minute.
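
The per-block calculation above can be sketched directly. One assumption is flagged in the code: the text's 0.5 factor for code blocks is interpreted here as halving reading speed (i.e., code takes twice as long per word):

```python
# Reading time estimate per the rules above. Block shape matches the
# block-based content structure; constants come from the text.
WORDS_PER_MINUTE = 265
FIRST_IMAGE_SECONDS = 12
NEXT_IMAGE_SECONDS = 3
CODE_SPEED_FACTOR = 0.5   # assumption: code is read at half the normal speed

def reading_time_minutes(blocks):
    seconds, images_seen = 0.0, 0
    for block in blocks:
        if block["type"] in ("paragraph", "heading"):
            words = len(block["text"].split())
            seconds += words / WORDS_PER_MINUTE * 60
        elif block["type"] == "image":
            images_seen += 1
            seconds += FIRST_IMAGE_SECONDS if images_seen == 1 else NEXT_IMAGE_SECONDS
        elif block["type"] == "code":
            words = len(block["text"].split())
            seconds += words / (WORDS_PER_MINUTE * CODE_SPEED_FACTOR) * 60
    return max(1, round(seconds / 60))   # round, with a 1-minute floor
```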

Optimizations:

Reading time is calculated once when an article is published and stored in the database rather than computed on every request. When articles are edited, reading time is recalculated only if content blocks change. This cached approach ensures fast article page loads without sacrificing accuracy.

Deep Dive 4: How do we implement the clapping system with high throughput?

Claps must feel instant while handling potential abuse and ensuring durability.

Real-time Aggregation:

When a user claps, the request first checks Redis for the user’s current clap count for that article. If adding the new claps would exceed 50, the request is rejected. Otherwise, Redis increments both the user-specific counter and the article’s total counter atomically using pipelined commands. The updated total is immediately returned for UI display.

Asynchronous Persistence:

Concurrently with the Redis update, the system publishes a clap event to Kafka containing the article ID, user ID, clap count, and timestamp. A consumer service batches events for 5 seconds or until reaching 1000 events, whichever comes first. The consumer aggregates claps by article ID and performs batch updates to PostgreSQL, incrementing article clap counts. This batching reduces database load by up to 100x compared to synchronous writes.
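
The batching consumer's core logic is an aggregate-then-flush pass: group clap events by article, then issue one database update per article instead of one per clap. A sketch with a dict mocking the PostgreSQL counts:

```python
from collections import defaultdict

def flush(events, db_counts):
    """Aggregate one batch (time- or size-bounded upstream, e.g. 5s / 1000
    events) and apply it to the mocked database in a single pass."""
    agg = defaultdict(int)
    for e in events:
        agg[e["article"]] += e["count"]
    for article_id, delta in agg.items():
        db_counts[article_id] = db_counts.get(article_id, 0) + delta
    return len(agg)   # row updates performed, versus len(events) raw claps
```

This is where the claimed load reduction comes from: a batch of 1000 events touching a handful of popular articles collapses into a handful of row updates.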

Data Reconciliation:

Periodically, a background job compares Redis counts with PostgreSQL to detect and correct any discrepancies. This handles edge cases like Redis failures or consumer crashes. The reconciliation process runs daily during low-traffic hours.

Deep Dive 5: How do we build the recommendation engine with machine learning?

The recommendation system must learn from user behavior and improve over time.

Multi-Stage Pipeline:

The recommendation pipeline consists of three stages: candidate generation, feature engineering, and ML ranking.

Stage 1: Candidate Generation

The system retrieves potential articles from multiple sources. Content-based filtering uses Elasticsearch’s More Like This functionality to find articles similar to those the user recently read. The query analyzes text similarity across title, content, and tags fields. Collaborative filtering identifies users with similar reading patterns using approximate nearest neighbor search on user embedding vectors. The system retrieves top-rated articles from these similar users. Topic-based recommendations find trending articles in the user’s preferred topics from the last 7 days. Editorial picks ensure quality content for cold start users. Already-read articles are filtered out, leaving approximately 1000 candidates.

Stage 2: Feature Engineering

For each candidate article, the system extracts multiple feature categories. User features include their 30-day read count, average read time, and interest vector. Article features include age in hours, reading time, clap count, view count, clap rate, and topic vector. Interaction features calculate cosine similarity between user interests and article topics, whether the user follows the author or publication. Social features count how many connections have engaged with the article and how many similar users have read it.

Stage 3: ML Ranking

A gradient boosted tree model, specifically XGBoost, predicts the probability that a user will read each candidate article. The model is trained on historical engagement data from the past 30 days, where positive examples are articles users read and negative examples are articles shown but not clicked. The objective function optimizes for AUC (Area Under the Curve). The model is retrained nightly with fresh data and deployed to a model serving layer. For inference, features are extracted for all candidates, the model predicts engagement probabilities, and articles are ranked by these probabilities. The top articles are surfaced in the user’s feed.

Model Training Pipeline:

Training data is generated by joining article impressions with engagement events. Features are computed in batch using Spark jobs processing Cassandra reading history. Models are trained on GPU instances and validated on a holdout set. A/B testing compares new models against the production model before deployment. Model artifacts are versioned and stored in S3 for rollback capability.

Deep Dive 6: How do we manage publication workflows with role-based access?

Publications require sophisticated permission systems and editorial workflows.

Role-Based Access Control:

Publications support three role types with different permissions. Administrators can manage publication settings, add or remove members, assign roles, approve or reject submissions, and publish articles. Editors can review submissions, approve or reject articles, and provide feedback to writers. Writers can submit articles to publications and view submission status. The Authorization Service enforces these permissions by checking the user’s role before allowing operations.
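
The role-to-permission mapping above lends itself to a simple lookup table. The permission names are illustrative; the roles and their capabilities follow the text:

```python
# Role -> allowed actions, per the three publication roles described above.
PERMISSIONS = {
    "admin":  {"manage_settings", "manage_members", "assign_roles",
               "review", "publish", "submit"},
    "editor": {"review", "submit"},
    "writer": {"submit"},
}

def authorize(role, action):
    """Check a user's publication role before allowing an operation."""
    return action in PERMISSIONS.get(role, set())
```

Centralizing the mapping in one table keeps the Authorization Service's checks declarative: adding a role or permission is a data change, not a code change scattered across handlers.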

Submission Workflow:

When a writer submits an article, a submission record is created with pending status. The system validates that the writer has submission permissions for the publication. Editors receive notifications through multiple channels including in-app notifications, email, and mobile push depending on their preferences. The editorial queue displays pending submissions sorted by submission date. Editors can review articles, seeing the full content in a preview mode. They can approve the submission, which moves the article under the publication’s brand, or reject with written feedback. Writers receive notifications about review decisions and can revise and resubmit if rejected.

Publication Analytics:

Publications track aggregate statistics across all published articles. Metrics include total views, reads, claps, and follower growth over time. The system generates daily reports showing top-performing articles and engagement trends. These analytics help publication editors understand their audience and optimize content strategy.

Deep Dive 7: How do we implement the metered paywall accurately and performantly?

The paywall must be fast, accurate, and resistant to abuse while favoring legitimate access.

Metered Paywall Design:

The system allows non-members to read 3 member-only articles per month before hitting the paywall. The counter uses Redis for high-speed access checks. The key format includes user ID and year-month to partition by billing period. When accessing member content, the system increments the counter and sets a 32-day TTL, slightly longer than a month to handle edge cases. The counter check happens before serving article content to prevent unauthorized access.

Edge Cases:

If Redis is unavailable, the system degrades gracefully by allowing access rather than blocking legitimate readers. This favors user experience over perfect enforcement for a soft paywall. After Redis recovery, counters may be slightly inaccurate, but this is acceptable given the soft paywall nature. For users who attempt to bypass the paywall through clearing cookies or using incognito mode, additional fingerprinting techniques can be implemented but are out of scope for this design.

Subscription Management:

When users subscribe, their subscription record is created in PostgreSQL with active status. The Paywall Service checks subscription status by querying the database with caching in Redis for frequently accessed users. Subscription checks are prioritized over metered checks for performance. Stripe webhooks handle subscription lifecycle events like renewals, cancellations, and payment failures, updating the database accordingly. The system maintains strong consistency for subscription status to prevent billing issues.

Deep Dive 8: How do we optimize for SEO with server-side rendering?

Articles must be discoverable by search engines and load quickly for good rankings.

Server-Side Rendering:

Article pages are rendered on the server using a framework like Next.js. When a request arrives, the server fetches article data from the database, including content, author information, and metadata. The full HTML is generated server-side with proper semantic markup. SEO metadata is embedded in the HTML head including title tags, meta descriptions, Open Graph tags for social sharing, Twitter Card tags, canonical URLs, author information, and published/modified timestamps. This ensures search engine crawlers see complete content without requiring JavaScript execution.

Structured Data:

Articles include JSON-LD structured data using Schema.org Article vocabulary. This helps search engines understand the content type, author, publication, images, and dates. Rich snippets in search results improve click-through rates from search engines.

Performance Optimizations:

Several techniques ensure fast page loads. Critical CSS is inlined in the HTML head to render above-the-fold content immediately. Web fonts are preloaded to prevent flash of unstyled text. Images use lazy loading with blur placeholders, loading only as they enter the viewport. JavaScript bundles are code-split to load only necessary code for each page. Service workers cache content for offline reading and fast subsequent visits. Prefetching loads the next article in the feed while the user reads, enabling instant navigation.

Core Web Vitals:

The system optimizes for Google’s Core Web Vitals metrics. Largest Contentful Paint (LCP) is improved by optimizing image loading and reducing render-blocking resources. First Input Delay (FID, since succeeded by Interaction to Next Paint as a Core Web Vital) is minimized by reducing JavaScript execution time and using web workers for heavy computation. Cumulative Layout Shift (CLS) is prevented by reserving space for images and ads before they load. These optimizations improve search rankings and user experience.

Step 4: Wrap Up

In this chapter, we proposed a system design for a content publishing platform like Medium. If there is extra time at the end of the interview, here are additional points to discuss:

Key Design Decisions:

1. Hybrid Feed Architecture: We chose pre-computed feeds with real-time updates to balance personalization quality with latency. Nightly batch jobs generate high-quality ranked feeds using expensive ML models, while real-time injection keeps feeds fresh for time-sensitive content. This approach provides the best of both worlds: sophisticated recommendations without sacrificing responsiveness.

2. Redis for Clap Aggregation: Claps require instant visual feedback to encourage engagement. Redis provides sub-millisecond writes while Kafka batching ensures PostgreSQL isn’t overwhelmed. This achieves both excellent user experience and data durability. The eventual consistency is acceptable for a non-critical feature like claps.
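The write path described above can be sketched with two steps: a fast counter increment on the hot path, and a periodic batch flush to durable storage. In this illustration, in-memory dicts stand in for Redis counters and PostgreSQL rows, and the cap and function names are assumptions.

```python
from collections import defaultdict

# In-memory stand-ins: `hot_counts` plays the role of Redis INCRBY
# counters, `durable` the role of PostgreSQL rows written by the
# Kafka consumer. All names are illustrative assumptions.
hot_counts: defaultdict[str, int] = defaultdict(int)   # pending claps per article
durable: defaultdict[str, int] = defaultdict(int)

MAX_CLAPS_PER_READER = 50

def clap(article_id: str, reader_claps_so_far: int, n: int = 1) -> int:
    """Record up to the per-reader cap; returns claps actually counted."""
    allowed = max(0, min(n, MAX_CLAPS_PER_READER - reader_claps_so_far))
    hot_counts[article_id] += allowed          # sub-millisecond counter write
    return allowed

def flush() -> None:
    """Periodic batch job: drain hot counters into durable storage."""
    for article_id, pending in list(hot_counts.items()):
        durable[article_id] += pending         # one batched UPDATE per article
        hot_counts[article_id] = 0
```

Readers see the hot counter immediately, while the durable total lags by at most one flush interval, which is the eventual consistency the design accepts.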

3. Elasticsearch for Search: Full-text search across millions of articles demands inverted indexes and relevance ranking. Elasticsearch provides these capabilities along with faceted search and More Like This queries essential for discovery. The system shards by time period to manage costs, archiving old content to separate indexes.

4. Cassandra for Analytics: Reading sessions generate massive write volume, potentially millions per day. Cassandra’s write-optimized architecture and time-series data model fit this access pattern well. The system can horizontally scale writes by adding more nodes without complex sharding logic.

5. Metered Paywall in Redis: The 3-article limit must be performant and accurate. Redis counters with monthly TTLs provide fast checks without complex database queries. Lost counts due to cache eviction favor users, which is an acceptable tradeoff for a soft paywall. This design prioritizes user experience over perfect enforcement.
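The monthly-TTL counter can be sketched by keying the counter on the current month, so each month starts from a fresh key and old keys simply expire. A dict stands in for Redis here, and the key format is an illustrative assumption.

```python
import datetime

# Dict stands in for Redis; keys "expire" implicitly because each
# month uses a fresh key. The key format is an assumption.
meter: dict[str, int] = {}

def monthly_key(user_id: str, today: datetime.date) -> str:
    return f"meter:{user_id}:{today:%Y-%m}"    # e.g. meter:u1:2024-05

def record_read(user_id: str, today: datetime.date) -> int:
    """INCR the user's counter for the current month; returns new count."""
    key = monthly_key(user_id, today)
    meter[key] = meter.get(key, 0) + 1
    return meter[key]

def over_limit(user_id: str, today: datetime.date, limit: int = 3) -> bool:
    return meter.get(monthly_key(user_id, today), 0) >= limit
```

If the key is evicted, the counter restarts at zero and the reader gets extra free articles, which is exactly the user-favoring failure mode the design accepts.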

Scaling Bottlenecks & Solutions:

Feed Generation at Scale: Generating personalized feeds for 100M users is computationally expensive. The solution is to tier users by activity level where highly active users get fresh feeds hourly, moderately active users get daily updates, and inactive users get weekly updates. This focuses computational resources on users who will notice the difference.
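The tiering policy above reduces to a small mapping from recent activity to a refresh cadence. The thresholds below are illustrative assumptions, not production values.

```python
def refresh_interval_hours(sessions_last_week: int) -> int:
    """Map recent activity to a feed refresh cadence.
    Thresholds are illustrative assumptions."""
    if sessions_last_week >= 7:   # highly active: roughly daily visitors
        return 1                  # hourly refresh
    if sessions_last_week >= 1:   # moderately active
        return 24                 # daily refresh
    return 24 * 7                 # inactive: weekly refresh
```

The batch scheduler would consult this function to decide which users to include in each run, so most compute goes to the users most likely to notice stale feeds.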

Real-time Clap Counting: Popular articles receive thousands of claps per minute, which could overwhelm traditional databases. The solution uses Redis counters for real-time updates with Kafka batching for eventual persistence. Eventual consistency is acceptable for claps since they’re not transactional.

Article Search: Elasticsearch cluster costs scale linearly with data volume. The solution archives articles older than two years to a separate cold index with reduced replication. Most searches focus on recent content, so this reduces costs without impacting user experience.

Image Storage: User-uploaded images consume terabytes of storage. The solution uses S3 lifecycle policies to move old images to Glacier after 1 year. CloudFront CDN reduces origin load by caching popular images globally. Compression and format optimization reduce storage by 60% without visible quality loss.
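The Glacier transition is typically expressed as an S3 lifecycle configuration. A sketch of one such rule follows; the bucket prefix and rule ID are assumptions for illustration.

```json
{
  "Rules": [
    {
      "ID": "archive-old-article-images",
      "Filter": { "Prefix": "article-images/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 365, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

With CloudFront absorbing most reads, objects past the transition age are rarely fetched from origin, so the slower Glacier retrieval is seldom on the user-facing path.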

Additional Features:

If time permits, discuss these enhancements:

Video Content: Support for video articles with transcoding pipelines. Videos would be uploaded to S3, transcoded to multiple resolutions using services like AWS MediaConvert, and delivered via CloudFront with adaptive bitrate streaming. This enables rich multimedia content beyond text and images.

Audio Articles: Text-to-speech for listening mode, allowing users to consume content while commuting or exercising. The system would generate audio versions using services like Amazon Polly, cache them in S3, and stream via CDN. Writers could optionally record their own narration.

Live Events: Real-time collaborative writing and Q&A sessions using WebSocket connections. Writers could host live sessions where they compose articles with audience participation. This requires bi-directional communication with connection pooling and horizontal scaling.

Advanced Analytics: Comprehensive writer dashboards with audience insights including geographic distribution, referral sources, reading time patterns, and follower growth trends. Machine learning models could predict optimal publishing times and suggest topics likely to perform well.

Mobile Apps: Native iOS and Android applications with offline reading support. Articles would be downloaded and cached locally, with synchronization when connectivity is restored. This improves the reading experience in low-connectivity environments.

Monitoring and Observability:

Key Metrics:

  • Feed latency at p50, p95, and p99 percentiles to ensure consistent performance.
  • Article page load time tracking Core Web Vitals for SEO rankings.
  • Search query latency to maintain sub-200ms response times.
  • Clap processing lag by monitoring Kafka consumer offset lag.
  • Recommendation click-through rate to evaluate ML model effectiveness.
  • Subscription conversion rate to optimize paywall design.
  • Writer earnings accuracy to ensure fair compensation.

Alerting:

  • Feed generation failures indicating batch job issues.
  • PostgreSQL replication lag exceeding 5 seconds risking data inconsistency.
  • Redis cache hit rate below 95% suggesting cache warming issues.
  • Elasticsearch cluster health status yellow or red indicating node failures.
  • Stripe webhook failures risking billing issues.
  • CDN 5xx error rate above 0.1% indicating origin problems.

Error Handling:

The system implements comprehensive error handling strategies. Network failures use retry logic with exponential backoff and jitter to prevent thundering herd problems. Service failures employ circuit breakers to prevent cascading failures when dependencies are unhealthy. Database failures trigger automatic failover to read replicas. Third-party API failures fall back to cached data or degraded functionality rather than complete outage.
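The retry-with-jitter strategy mentioned above can be sketched as follows; the defaults and function name are illustrative assumptions.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry fn with full-jitter exponential backoff.
    Parameter defaults are illustrative assumptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                     # out of attempts: surface the error
            # Full jitter: sleep a random amount up to the capped
            # exponential delay, spreading retries across clients
            # to avoid a thundering herd.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

In production this would sit behind the circuit breaker: the breaker decides whether to attempt the call at all, and the backoff governs how retries are spaced when it does.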

Security Considerations:

All data in transit is encrypted using TLS 1.3. Sensitive data at rest including passwords and payment information is encrypted using AES-256. Authentication uses JWT tokens with short expiration and refresh token rotation. Authorization checks occur at multiple layers including API Gateway, service layer, and database row-level security. Rate limiting prevents abuse with different limits for authenticated and anonymous users. Input validation prevents SQL injection, XSS, and other injection attacks. Content Security Policy headers prevent XSS attacks in the browser.

Future Improvements:

Machine Learning Enhancements: Train specialized models for different user segments like new users, casual readers, and power users. Implement reinforcement learning to optimize long-term engagement rather than just immediate clicks. Use transformer models for better content understanding and similarity.

Global Expansion: Add multi-language support with automatic translation using neural machine translation. Implement region-specific recommendations accounting for cultural preferences. Deploy edge locations in more geographic regions for lower latency.

Creator Tools: Provide A/B testing for article titles and images to optimize engagement. Implement collaborative writing features for co-authored articles. Add monetization options beyond subscriptions including tips, paid articles, and premium publications.

Congratulations on getting this far! Designing Medium is a complex system design challenge that touches on many important concepts including content management, personalization, real-time systems, and monetization. The key is to start simple, satisfy functional requirements first, then layer in the non-functional requirements and optimizations.


Summary

This comprehensive guide covered the design of a content publishing platform like Medium, including:

  1. Core Functionality: Article creation with rich text editing, personalized feeds, social engagement through claps and comments, publication management, and subscription-based paywalls.
  2. Key Challenges: High-throughput clap counting, personalized feed generation at scale, metered paywall enforcement, and SEO optimization.
  3. Solutions: Redis for real-time counting, hybrid feed architecture with pre-computation and real-time updates, Elasticsearch for search and discovery, Cassandra for analytics, and machine learning for recommendations.
  4. Scalability: Tiered feed generation, asynchronous processing with Kafka, CDN for media delivery, and database sharding by geography.

The design demonstrates how to build a content platform that balances user experience, personalization, monetization, and operational efficiency at scale, serving millions of users worldwide.