Design Pinterest
Pinterest is a visual discovery platform where users find, save, and organize ideas through images (pins) on customizable boards. At 500M monthly active users, Pinterest must handle massive image storage, real-time feed generation, sophisticated visual search, and personalized recommendations.
Designing Pinterest presents unique challenges including handling billions of images, visual similarity search using deep learning, personalized feed generation at scale, and real-time updates across a distributed social graph.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.
Functional Requirements
Core Requirements:
- Users should be able to create pins by uploading images or saving from URLs with title, description, and link metadata.
- Users should be able to organize pins into boards with different privacy levels (public, private, secret).
- Users should be able to see a personalized home feed with pins from followed users, boards, and recommendations.
- Users should be able to search for pins using text queries or by uploading an image for visual similarity search.
Below the Line (Out of Scope):
- Users should be able to like and comment on pins.
- Users should be able to follow other users or specific boards.
- Users should be able to create collaborative boards where multiple users can pin.
- Users should be able to repin existing pins to their own boards.
- The system should support rich pins with extra metadata for articles, products, and recipes.
Non-Functional Requirements
Core Requirements:
- The system should prioritize low latency for home feed generation (< 200ms at p99).
- The system should handle massive scale (500M MAU, 300B total pins, 10M new pins daily).
- The system should ensure visual search returns results quickly (< 500ms).
- The system should serve images globally with low latency (< 100ms from CDN).
Below the Line (Out of Scope):
- The system should maintain 99.99% uptime with multi-region deployment.
- The system should prevent data loss for uploaded images and user content.
- The system should gracefully degrade during peak traffic.
- The system should comply with data privacy regulations.
Clarification Questions & Assumptions:
- Platform: Web, iOS, and Android apps.
- Scale: 500M monthly active users, 100M daily active users, 300B total pins.
- Image Volume: 10M image uploads per day, average pin size 500KB after processing.
- Consistency: Strong consistency for user actions, eventual consistency acceptable for feeds.
- Image Processing: Can be asynchronous with 1-2 minute delay acceptable.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
For user-facing product-style questions, we’ll build the design sequentially, going through functional requirements one by one. This keeps us focused and ensures comprehensive coverage of all requirements.
Defining the Core Entities
To satisfy our key functional requirements, we’ll need the following entities:
Pin: An individual image with metadata. Includes the image URL pointing to S3 storage, title, description, source link, category, dominant color, dimensions, engagement metrics (likes, repins, comments), and timestamps. Each pin is uniquely identified and belongs to a user and board.
Board: A collection of pins created by users. Contains name, description, category, privacy settings (public, private, secret), collaboration status, pin count, follower count, and timestamps. Boards are the primary organizational unit in Pinterest.
User: Any user on the platform. Contains personal information, authentication credentials, profile data, interests, privacy settings, and social graph relationships (followers and following).
Feed: A personalized stream of pins for each user. Represents the ranked and filtered collection of pins that appear on the home page, combining content from followed users, boards, and algorithmic recommendations.
Visual Embedding: A high-dimensional numerical representation of pin images generated by convolutional neural networks. Stored as float vectors (typically 2048 dimensions) and used for visual similarity search to find visually similar pins.
API Design
Pin Creation Endpoint: Used to create a new pin by uploading an image or providing a URL to save.
POST /pins -> Pin
Body: {
image: file or url,
title: string,
description: string,
link: string,
boardId: string
}
Board Creation Endpoint: Allows users to create new boards for organizing their pins.
POST /boards -> Board
Body: {
name: string,
description: string,
privacy: "public" | "private" | "secret"
}
Home Feed Endpoint: Returns the personalized home feed for the authenticated user with infinite scroll pagination.
GET /feed?cursor={cursor} -> { pins: Pin[], nextCursor: string }
Text Search Endpoint: Searches for pins, boards, and users based on text queries.
GET /search?q={query}&type=pins -> { results: Pin[], cursor: string }
Visual Search Endpoint: Finds visually similar pins by uploading a query image.
POST /search/visual -> { results: Pin[], similarities: number[] }
Body: {
image: file
}
High-Level Architecture
Let’s build up the system sequentially, addressing each functional requirement:
1. Users should be able to create pins by uploading images or saving from URLs
The core components necessary for pin creation are:
- Client Applications: Web, iOS, and Android apps that provide the user interface for creating pins. Handle image selection, metadata input, and upload initiation.
- API Gateway: Entry point for all client requests. Handles authentication using JWT tokens, rate limiting to prevent abuse, and routes requests to appropriate backend services.
- Pin Service: Manages all pin-related operations including creation, updates, and deletion. Validates uploaded images for file type, size (under 20MB), and dimensions. Generates unique pin IDs and coordinates the image upload to S3.
- Object Storage (S3): Stores all uploaded images across multiple tiers. Uses multipart upload for large files and maintains versioning for backup purposes. Images are organized by pin ID with different resolutions stored separately.
- Message Queue (Kafka): Distributes image processing tasks across worker nodes. Ensures asynchronous processing doesn’t block pin creation and provides retry capability for failed jobs.
- Image Processing Workers: Multiple consumer groups that transform images, generate embeddings, extract metadata, and perform content moderation in parallel.
- Database (PostgreSQL): Stores pin metadata including references to S3 image paths, user and board associations, engagement counts, and timestamps. Sharded by pin ID for horizontal scalability.
Pin Creation Flow:
- The user selects an image and enters metadata in the client app, which sends a POST request to create the pin.
- The API gateway authenticates the request and forwards it to the Pin Service.
- The Pin Service validates the image, generates a unique pin ID, and uploads the original image to S3.
- The service creates a pin record in PostgreSQL with status “processing” and returns the pin to the client.
- The Pin Service publishes a message to Kafka containing the pin ID and S3 path for asynchronous processing.
- Image processing workers consume the message and generate multiple resolutions, extract embeddings, perform moderation, and update the pin status to “active” when complete.
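The synchronous half of this flow can be sketched as follows. The storage, queue, and table objects are in-memory stand-ins for S3, Kafka, and PostgreSQL, and all names are illustrative rather than Pinterest's actual code:

```python
import uuid

# In-memory stand-ins for PostgreSQL, S3, and Kafka (names hypothetical).
pin_table = {}
object_store = {}
kafka_topic = []

MAX_BYTES = 20 * 1024 * 1024  # 20MB validation limit from the design

def create_pin(user_id, board_id, image_bytes, title, description, link):
    """Synchronous path: validate, store the original, write a
    'processing' record, and enqueue the async work."""
    if len(image_bytes) > MAX_BYTES:
        raise ValueError("image exceeds 20MB limit")
    pin_id = uuid.uuid4().hex
    s3_path = f"pins/{pin_id}/original"
    object_store[s3_path] = image_bytes          # upload original to S3
    pin_table[pin_id] = {                        # pin record, status "processing"
        "user_id": user_id, "board_id": board_id,
        "title": title, "description": description, "link": link,
        "s3_path": s3_path, "status": "processing",
    }
    # Publish for asynchronous resizing, embedding, and moderation.
    kafka_topic.append({"pin_id": pin_id, "s3_path": s3_path})
    return pin_table[pin_id]

def process_next():
    """Worker side: consume one message and mark the pin active."""
    msg = kafka_topic.pop(0)
    # ...generate resolutions, embeddings, and moderation results here...
    pin_table[msg["pin_id"]]["status"] = "active"
```

The key property the sketch preserves is that pin creation returns immediately with status "processing"; only the worker flips the pin to "active".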
2. Users should be able to organize pins into boards with different privacy levels
We extend our existing design to support board management:
- Board Service: Handles board CRUD operations, pin-to-board associations, permission management, and collaborative board features. Validates privacy settings and ensures users can only modify boards they own or have collaboration access to.
- Board Table: Stores board metadata in PostgreSQL including owner, privacy settings, collaboration status, and engagement metrics.
Board Creation Flow:
- The user creates a board in the client app, specifying name, description, and privacy level.
- The API gateway forwards the request to the Board Service after authentication.
- The Board Service creates a new board record in PostgreSQL associated with the user.
- When users add pins to boards, the Pin Service updates the board_id reference on the pin record.
3. Users should be able to see a personalized home feed with pins from followed users, boards, and recommendations
We introduce new components to generate personalized feeds:
- User Service: Manages user profiles, authentication, social graph (followers and following), interests, and preferences. Stores relationships between users and between users and boards.
- Feed Service: Generates personalized home feeds by combining content from followed sources with algorithmic recommendations. Implements smart ranking using machine learning to predict user engagement.
- Cache Layer (Redis): Stores pre-computed feeds, ranked results, and frequently accessed data. Uses sorted sets for feed storage with timestamps as scores for efficient pagination.
- Recommendation Engine: Analyzes user behavior, pin engagement, and visual similarity to suggest relevant pins. Uses collaborative filtering and content-based approaches.
- Graph Database (HBase): Stores the social graph and user-pin-board relationships optimized for graph traversal queries. Column families organize followers, following, pins, boards, and interest scores.
Feed Generation Flow:
- The user opens the app, which requests the home feed from the Feed Service.
- The Feed Service checks Redis cache for a pre-computed feed for this user.
- If not cached, it employs a hybrid push-pull approach: pins from creators with modest follower counts are fanned out on write (push), so they are already waiting in each follower's feed cache; pins from high-follower creators are fetched and merged at read time (pull), since writing each of their new pins to millions of follower caches would be impractical.
- The service fetches candidate pins from followed users and boards, adds recommended pins based on user interests, and includes trending content.
- A machine learning ranking model scores each pin by predicting engagement probability using features like category affinity, visual similarity to liked pins, social proof, and freshness.
- Post-processing applies diversity constraints to avoid consecutive pins from the same source and inserts discovery content periodically.
- The ranked feed is cached in Redis with a 10-minute TTL and the first page is returned to the client.
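The pagination step can be illustrated with a minimal sketch. The integer-offset cursor is a simplification (a production cursor would typically encode the last seen score and pin ID so pages stay stable as the cache changes):

```python
def get_feed_page(ranked_feed, cursor=None, page_size=50):
    """Cursor pagination over a cached ranked feed.
    ranked_feed is a list of (score, pin_id) pairs, highest score first;
    the cursor is an opaque offset (hypothetical simplification)."""
    start = int(cursor) if cursor else 0
    page = ranked_feed[start:start + page_size]
    end = start + page_size
    next_cursor = str(end) if end < len(ranked_feed) else None
    return {"pins": [pin_id for _, pin_id in page], "nextCursor": next_cursor}
```

This matches the GET /feed contract above: the client echoes nextCursor back to fetch the following page, and a null cursor signals the end of the cached feed.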
4. Users should be able to search for pins using text queries or by uploading images for visual similarity search
We add specialized search capabilities:
- Search Service: Handles both text and visual search queries. Coordinates between Elasticsearch for text search and vector databases for visual search. Implements query understanding, autocomplete, and result ranking.
- Elasticsearch: Indexes pin metadata, titles, descriptions, and categories for fast text-based retrieval. Supports autocomplete suggestions and trending searches.
- Vector Database (Faiss): Stores visual embeddings generated by CNNs and performs approximate nearest neighbor search for visual similarity queries. Uses HNSW (Hierarchical Navigable Small World) indexing for fast retrieval at scale.
- CNN Embedding Service: Processes query images through the same neural network model used during pin creation (ResNet-50 or EfficientNet) to generate consistent embeddings for comparison.
Text Search Flow:
- The user enters a search query in the client app.
- The Search Service receives the query and forwards it to Elasticsearch.
- Elasticsearch performs full-text search across indexed pin metadata, applying relevance scoring.
- Results are re-ranked based on engagement metrics and user preferences before being returned.
Visual Search Flow:
- The user uploads an image for visual search.
- The CNN Embedding Service processes the image, resizing to model input dimensions and extracting a 2048-dimensional feature vector.
- The vector is sent to Faiss for approximate nearest neighbor search, finding the top 100 most similar pins based on cosine similarity.
- Results are re-ranked using a hybrid score combining visual similarity, engagement metrics, and freshness.
- A diversification algorithm applies Maximal Marginal Relevance to ensure variety in colors, categories, and sources.
- The top 50 pins are returned to the client with relevance scores.
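Conceptually, the core of this flow is a dot product over L2-normalized vectors. A brute-force stand-in makes that explicit; Faiss's HNSW index computes the same comparison approximately, without scanning every vector:

```python
import math

def normalize(v):
    """L2-normalize so cosine similarity reduces to a dot product."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def top_k_similar(query, index, k=3):
    """Brute-force nearest-neighbor stand-in for the Faiss ANN search.
    index maps pin_id -> raw embedding vector."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, normalize(v))), pin_id)
              for pin_id, v in index.items()]
    scored.sort(reverse=True)
    return scored[:k]
```

This is O(n) per query, which is exactly why the real system needs the approximate index described in Deep Dive 2.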
Step 3: Design Deep Dive
With core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These critical areas separate good designs from great ones.
Deep Dive 1: How do we handle massive image storage and serve billions of images globally with low latency?
Managing petabytes of image data and ensuring fast delivery worldwide requires a sophisticated content delivery strategy.
Challenge: With 300B pins averaging 500KB each (after processing), we’re storing approximately 150 petabytes of image data. Additionally, serving 10B+ image requests per day globally while maintaining under 100ms latency is extremely challenging.
Solution: Multi-Tier CDN Architecture with Dynamic Optimization
The architecture uses CloudFront CDN with S3 origin, optimized across multiple dimensions:
Storage Organization: Images are stored in S3 organized by resolution tier. For each pin, multiple versions are generated: thumbnails at 236x236 for grid views, medium at 600px width for feed views, large at 1200px width for detail views, and the original high-resolution version for backup. Each resolution is stored in separate S3 prefixes for efficient retrieval. Modern image formats like WebP and AVIF are generated alongside JPEG, providing 30-50% size reduction.
CDN Distribution: CloudFront operates 300+ edge locations globally, serving cached images near users. Origin shield sits between edge locations and S3, reducing load on the origin by consolidating requests. Lambda@Edge functions run at edge locations for dynamic transformations and intelligent routing based on user location.
Dynamic Image Resizing: When a client requests a custom image size not pre-generated, Lambda@Edge intercepts the request. If the size isn’t cached, it fetches the original from S3, resizes using optimized libraries, caches the result at the edge with a one-year TTL, and returns it to the client. Subsequent requests for the same size are served directly from edge cache.
Format Negotiation: The system inspects the Accept header in requests to determine browser capabilities. Modern browsers receive WebP format for 30% smaller files, cutting-edge browsers get AVIF for 50% reduction, and older clients fall back to JPEG. This intelligent format selection happens transparently at the CDN edge.
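A minimal sketch of the negotiation logic, assuming it runs in an edge function and receives the raw Accept header string:

```python
def pick_format(accept_header: str) -> str:
    """Choose the smallest image format the client advertises support for;
    fall back to JPEG for older clients."""
    accept = accept_header.lower()
    if "image/avif" in accept:    # cutting-edge browsers: ~50% smaller
        return "avif"
    if "image/webp" in accept:    # modern browsers: ~30% smaller
        return "webp"
    return "jpeg"                 # universal fallback
```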
Progressive Loading: To improve perceived performance, the system employs multiple techniques. Progressive JPEGs load incrementally, showing a low-quality preview that sharpens as more data arrives. Low-Quality Image Placeholders (LQIP) are tiny 20x20 thumbnails stored base64-encoded in the database and inlined in HTML, providing instant visual feedback while full images load in the background.
Cache Strategy: Images are immutable since pin IDs never change, allowing aggressive caching. Cache-Control headers set max-age to one year with the immutable directive. This results in 95% of images served from edge in under 50ms, 4% from regional shield in under 150ms, and only 1% from S3 origin in under 300ms.
Cost Optimization: To manage storage costs at this scale, infrequently accessed images (older than one year) move to S3 Intelligent-Tiering automatically. Images over two years old migrate to Glacier for long-term archival. Lazy loading images below the fold saves approximately 40% of bandwidth. Combined with aggressive compression, these optimizations reduce storage and bandwidth costs by 60%.
Deep Dive 2: How do we implement visual search at scale using deep learning embeddings?
Visual search allows users to find similar pins by uploading images, requiring sophisticated computer vision and efficient similarity search across billions of vectors.
Challenge: Processing uploaded images through neural networks in real-time, searching billions of high-dimensional vectors for nearest neighbors, and returning results in under 500ms presents significant computational challenges.
Solution: CNN-Based Embedding Pipeline with Approximate Nearest Neighbor Search
The visual search system operates through several coordinated stages:
Embedding Generation: During pin creation, each image passes through a pre-trained convolutional neural network, specifically ResNet-50 or EfficientNet. The image is resized to the model’s expected input dimensions (typically 224x224), normalized, and fed through the network. The output from the last pooling layer before classification produces a 2048-dimensional feature vector that captures visual characteristics. This vector undergoes L2 normalization to enable cosine similarity comparisons and is stored in the vector database.
Vector Index Architecture: Faiss (Facebook AI Similarity Search) maintains the index of billions of vectors. The index uses HNSW (Hierarchical Navigable Small World) structure, which creates a multi-layer graph where each node represents a vector and edges connect similar vectors. This enables logarithmic search time complexity. Product Quantization compresses vectors from 8KB (2048 floats × 4 bytes) down to approximately 96 bytes, allowing billions of vectors to fit in memory while maintaining search accuracy.
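The compression idea behind product quantization can be shown with a toy encoder. The codebooks here are assumed to be pre-trained (in practice via k-means over the corpus), and the dimensions are tiny for illustration:

```python
def pq_encode(vec, codebooks):
    """Toy product quantization: split the vector into len(codebooks) equal
    subvectors and store only the index of the nearest centroid for each.
    With 256 centroids per codebook, each code fits in one byte, which is
    how ~8KB of floats shrinks to roughly 96 bytes when ~96 codebooks
    are used (a sketch of the idea, not Faiss's implementation)."""
    m = len(codebooks)
    sub = len(vec) // m
    codes = []
    for i, book in enumerate(codebooks):
        chunk = vec[i * sub:(i + 1) * sub]
        dists = [sum((a - b) ** 2 for a, b in zip(chunk, cent)) for cent in book]
        codes.append(dists.index(min(dists)))  # nearest centroid index
    return codes
```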
Query Processing: When a user uploads an image for visual search, it goes through the same CNN pipeline to generate a query embedding. The normalized vector is submitted to Faiss, which traverses the HNSW graph to find the approximate 100 nearest neighbors based on cosine similarity (implemented as dot product after normalization). This search completes in 20-50ms despite the massive index size.
Re-ranking and Filtering: The top 100 candidates undergo re-ranking with business logic. NSFW content is filtered if the user has safe search enabled. Pins from blocked users are excluded. The system applies a hybrid scoring formula combining 70% visual similarity, 20% engagement score (weighted likes, repins, and comments), and 10% freshness score (boosting recent pins).
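A sketch of this filtering and scoring, assuming all component scores arrive pre-normalized to [0, 1]:

```python
def hybrid_score(visual_sim, engagement, freshness):
    """Weights from the design: 70% visual, 20% engagement, 10% freshness."""
    return 0.7 * visual_sim + 0.2 * engagement + 0.1 * freshness

def rerank(candidates, safe_search=True, blocked=frozenset()):
    """Drop filtered content, then sort the survivors by hybrid score.
    Each candidate is a dict with user, nsfw, and the three score fields
    (field names are illustrative)."""
    kept = [c for c in candidates
            if c["user"] not in blocked and not (safe_search and c["nsfw"])]
    return sorted(kept,
                  key=lambda c: hybrid_score(c["visual_sim"],
                                             c["engagement"],
                                             c["freshness"]),
                  reverse=True)
```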
Diversification: To prevent showing too many visually identical or similar pins, the system applies Maximal Marginal Relevance. This algorithm selects results that are both relevant to the query and diverse from already-selected results. It ensures variety in dominant colors, categories, and pin sources, creating a more engaging and useful result set.
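A minimal MMR sketch, with the trade-off parameter lambda and the similarity and relevance functions supplied by the caller:

```python
def mmr(candidates, sim, relevance, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items that are relevant
    to the query (weight lam) but dissimilar to already-selected results
    (weight 1 - lam)."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With lam near 1 the output is pure relevance ranking; lowering it trades relevance for variety in colors, categories, and sources.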
Performance Optimizations: The vector index is sharded by category, allowing searches to target relevant subsets. Hot indexes containing recent pins stay in memory for fastest access. GPU acceleration speeds up embedding generation for query images. Batch processing handles multiple concurrent queries efficiently. Popular query embeddings are cached to avoid redundant CNN inference.
Continuous Improvement: The system uses a continuous training pipeline. User engagement data (clicks, saves on search results) trains the ranking model. Triplet loss optimization ensures the embedding space places visually similar pins closer together. The model learns from Pinterest-specific data rather than generic ImageNet features. New models undergo A/B testing before full deployment, ensuring improvements in user engagement metrics.
Deep Dive 3: How do we generate personalized feeds at scale with millions of updates per minute?
Generating personalized feeds for 100M daily active users while incorporating real-time updates from followed users and algorithmic recommendations requires careful architectural decisions.
Challenge: Each user might follow hundreds of other users and boards. With millions of pins created daily, naively regenerating feeds on every request would be computationally prohibitive and couldn’t meet the 200ms latency requirement.
Solution: Hybrid Push-Pull with Smart Ranking and Multi-Level Caching
The feed system employs different strategies based on user characteristics:
Fan-out on Write (Push) for Regular Creators: For creators with fewer than 1,000 followers, the system uses push-based feed distribution. When user A creates or repins a pin, the system fetches A's follower list from HBase. For each follower, it adds the pin ID to their feed cache in Redis, implemented as a sorted set with timestamps as scores. This allows efficient retrieval of recent pins. Each user's feed cache keeps only the latest 1000 pins, with automatic trimming of older content. The cache expires after 30 days to prevent unbounded growth.
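A sketch of the fan-out and trimming, using a plain dict per user as a stand-in for the Redis sorted set (production code would issue ZADD and ZREMRANGEBYRANK instead):

```python
FEED_CAP = 1000  # keep only the latest 1000 pins per feed, as in the design
feeds = {}       # user_id -> {pin_id: timestamp}, a stand-in for Redis

def fan_out(pin_id, timestamp, follower_ids):
    """Push a new pin into every follower's feed cache, trimming the
    oldest entry whenever a feed exceeds the cap."""
    for uid in follower_ids:
        feed = feeds.setdefault(uid, {})
        feed[pin_id] = timestamp                 # ~ ZADD feed:{uid} ts pin
        if len(feed) > FEED_CAP:                 # ~ ZREMRANGEBYRANK trim
            oldest = min(feed, key=feed.get)
            del feed[oldest]
```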
Fan-out on Read (Pull) for High-Follower Creators: For creators with more than 1,000 followers (celebrities, influencers), pushing each new pin to every follower's cache is impractical. Instead, when a follower requests their feed, the system fetches the high-follower accounts and boards they follow, queries recent pins from each (limited to the last 7 days), merges the results with the pushed portion of the feed, and ranks everything in real time. Results are cached for 5 minutes to serve subsequent requests efficiently.
Smart Ranking Algorithm: The ranking pipeline operates in stages. First, candidate generation collects 1000+ pins from the feed cache or fresh pulls, adds recommended pins based on user interests stored in HBase, includes trending pins from users with similar engagement patterns, and adds pins from boards the user might like based on collaborative filtering.
Feature Engineering: For each candidate pin, the system extracts comprehensive features. User features include interest categories, recent engagement history, and demographics. Pin features include category, engagement rate (weighted sum of likes, repins, comments divided by age in hours), visual characteristics, and creator authority metrics. User-pin interaction features measure category affinity (how much the user engages with this category), visual similarity to previously liked pins, social proof (mutual connections who engaged), and freshness.
Machine Learning Ranking: A Gradient Boosted Decision Tree model (XGBoost) trained on historical engagement data predicts the probability that the user will engage with each pin (like, repin, click, or save). The model incorporates 200+ features and updates daily with fresh training data to adapt to evolving user preferences and content trends.
Post-Processing: The top 500 pins undergo diversity filtering. The system enforces constraints: maximum 3 consecutive pins from the same board, maximum 5 pins from the same creator in the top 50, and category variation every 10 pins. Content pillars are inserted at regular intervals: every 20th pin shows trending or viral content, and every 30th pin introduces discovery content from new categories to broaden user interests. Duplicates (same pin appearing from different sources) are removed.
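The consecutive-source constraint can be enforced with a greedy pass that defers offending pins rather than dropping them. This is one possible sketch, not Pinterest's actual algorithm:

```python
def diversify(ranked, max_run=3):
    """Defer pins that would create more than max_run consecutive pins
    from the same board, re-inserting each as soon as it fits again.
    Leftovers go at the end (possibly violating the cap)."""
    out, deferred = [], []

    def fits(pin):
        run = 0
        for placed in reversed(out):
            if placed["board"] != pin["board"]:
                break
            run += 1
        return run < max_run

    for pin in ranked:
        for d in list(deferred):      # flush deferred pins that now fit
            if fits(d):
                out.append(d)
                deferred.remove(d)
        if fits(pin):
            out.append(pin)
        else:
            deferred.append(pin)
    out.extend(deferred)
    return out
```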
Caching and Pagination: The system returns the first 50 pins to the client while caching the complete ranked feed in Redis with a 10-minute TTL. The next page generates in the background using a cursor-based pagination scheme. Feed caches invalidate when users follow or unfollow accounts, ensuring relevance. For very active users, WebSocket connections enable real-time updates, while web users poll every 30 seconds for new pins.
Performance and Resilience: Pre-computation runs for active users every 15 minutes, storing results in Redis with 30-minute TTLs. If the ranking service fails, the system falls back to chronological ordering of followed content. The complete feed generation pipeline completes in under 200ms at p99, meeting the strict latency requirement.
Deep Dive 4: How do we prevent duplicate pins and maintain content quality?
With millions of uploads daily from diverse sources, duplicate detection is essential for content quality and storage efficiency.
Challenge: Users frequently upload the same images, sometimes with minor modifications like different crops or filters. Without deduplication, the platform would fill with redundant content, degrading user experience and wasting storage.
Solution: Multi-Level Deduplication Using Perceptual Hashing and CNN Embeddings
The deduplication system operates at multiple stages:
Perceptual Hashing: Each uploaded image generates a perceptual hash (pHash), a 64-bit fingerprint that remains stable despite minor image modifications like resizing, compression, or slight cropping. The algorithm converts the image to grayscale, resizes to a small fixed size, applies a discrete cosine transform, and extracts low-frequency components to form the hash. Similar images produce pHashes with low Hamming distance (number of differing bits).
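A simplified fingerprint (average hash rather than the DCT-based pHash) is enough to show the mechanics of the duplicate check; the Hamming-distance comparison is identical either way:

```python
def average_hash(gray_pixels):
    """Simplified 64-bit perceptual fingerprint over an 8x8 grayscale
    image: one bit per pixel, set when the pixel exceeds the mean.
    Real pHash applies a DCT first, but the comparison is the same."""
    mean = sum(gray_pixels) / len(gray_pixels)
    h = 0
    for p in gray_pixels:
        h = (h << 1) | (1 if p > mean else 0)
    return h

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def is_duplicate(h1, h2, threshold=5):
    """Hamming distance below 5 indicates a likely duplicate (per the design)."""
    return hamming(h1, h2) < threshold
```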
Hash Storage and Lookup: A dedicated table in PostgreSQL stores pin IDs with their corresponding pHash and dHash (difference hash, providing complementary detection). A B-tree index on the hash supports fast exact-match lookups; because a B-tree cannot answer Hamming-distance queries directly, the 64-bit hash is also split into fixed-width bands indexed separately, so any hash within the distance threshold must match at least one band exactly and can be found without a full scan. During upload, the system generates the pHash and checks candidates for a Hamming distance less than 5, indicating likely duplicates.
Upload-Time Deduplication: When a duplicate is detected during upload, the user sees the existing pin with options to save it to their board or proceed with uploading anyway. If they save the existing pin, its save count increments, and the new upload is prevented. This interactive approach respects user intent while preventing most duplicates.
Post-Processing Batch Jobs: Daily batch jobs scan for duplicates that slipped through initial detection. For each duplicate set, the system identifies the pin with the highest engagement (combined likes, repins, comments) as canonical. Other duplicates are marked as such in the database, their URLs redirect to the canonical pin, and comments and saves merge to the canonical pin. This consolidates engagement and improves content discovery.
CNN-Based Near-Duplicate Detection: For images that are visually similar but not identical (same content with different framing, filters, or edits), the visual embeddings from the CNN provide detection. If two pins have cosine similarity above 0.95 in embedding space, they’re flagged as near-duplicates for human review. This catches cases that perceptual hashing might miss.
URL-Based Deduplication: Since many pins originate from URLs, the system hashes and stores source URLs. When the same URL is saved multiple times, it allows different crops or edits (as users might want to highlight different aspects) but displays “X people saved this” on the pin detail page, providing social proof.
Impact Metrics: Deduplication reduces storage requirements by approximately 15%, improves feed quality by ensuring more diverse content, and saves bandwidth and processing costs. Users see fewer repetitive pins, improving engagement and satisfaction.
Deep Dive 5: How do we scale the database layer to handle hundreds of billions of pins?
Storing and querying metadata for 300B pins with complex access patterns requires sophisticated database architecture.
Challenge: A single PostgreSQL instance cannot handle this data volume or query load. Queries span different access patterns: by user, by board, by category, by time range, and by social graph relationships.
Solution: Horizontal Sharding with Hybrid Storage Architecture
The data layer uses multiple specialized databases optimized for different access patterns:
PostgreSQL Sharding for Pins and Boards: Pin and board metadata reside in PostgreSQL, sharded by pin ID. The system runs 256 shards distributed across 64 database servers, with each shard handling approximately 1.2B pins at full scale. Shard assignment hashes the pin ID and takes it modulo the shard count, ensuring even distribution; a directory maps logical shards to physical servers so shards can be rebalanced without rehashing keys. Each shard maintains indexes on user_id, board_id, category, and created_at for efficient querying.
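The routing logic is straightforward; md5 is used here only as an illustrative stable hash, and the shard-to-server mapping is a simple static layout for the sketch:

```python
import hashlib

NUM_SHARDS = 256
SHARDS_PER_SERVER = NUM_SHARDS // 64   # 256 shards over 64 servers

def shard_for(pin_id: str) -> int:
    """Stable shard assignment: hash the pin ID, take modulo shard count."""
    digest = hashlib.md5(pin_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def server_for(shard: int) -> int:
    """Static shard-to-server layout; production would consult a directory
    so shards can move between servers."""
    return shard // SHARDS_PER_SERVER
```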
HBase for Social Graph: The social graph (followers, following, user-pin-board relationships) is stored in HBase, a distributed column-oriented database optimized for sparse data and high write throughput. The row key is user_id, and column families organize different relationship types: following (user IDs and board IDs with timestamps), followers (user IDs), pins (pin IDs saved by the user), boards (board IDs owned), and interests (categories with affinity scores).
Redis for High-Speed Cache: Redis provides multiple caching layers. Feed caches store pre-computed personalized feeds as sorted sets. Metadata caches hold frequently accessed pin and board details with one-hour TTLs. Session data and distributed locks use Redis for its atomic operations and TTL support. Cache hit ratios above 90% dramatically reduce database load.
Elasticsearch for Search: Pin metadata is indexed in Elasticsearch for text search. The index includes titles, descriptions, categories, and user-generated content. Sharding across multiple Elasticsearch nodes enables horizontal scaling, with replicas providing fault tolerance and read scalability.
Faiss for Vector Search: Visual embeddings live in Faiss, optimized for high-dimensional similarity search. The index is sharded by category, allowing targeted searches within relevant content subsets.
Cross-Database Queries: When queries require data from multiple systems, the application layer performs scatter-gather operations. For example, finding pins from a user involves: fetching pin IDs from HBase (user’s pins), querying PostgreSQL shards in parallel for pin metadata, and assembling the final result. Caching aggressively reduces the need for repeated cross-database queries.
Consistency Considerations: The system uses strong consistency for user actions affecting their own data (creating pins, updating boards) by writing to the primary database shard and waiting for acknowledgment. Feed generation and recommendations use eventual consistency, reading from replicas that may be slightly behind. This trade-off prioritizes availability and performance for read-heavy operations while ensuring correctness for writes.
Deep Dive 6: How do we handle real-time updates and graph traversal for related pins?
Showing relevant “More like this” and “Related pins” requires efficient graph traversal across billions of relationships.
Challenge: Finding related pins involves multiple relationship types: same board, visual similarity, co-saved by similar users, and semantic similarity. Naively querying these relationships for every pin view would be prohibitively slow.
Solution: Pre-Computed Graph Edges with Multi-Hop Traversal
The related pins system builds on the graph structure stored in HBase:
Graph Edge Types: The system maintains multiple edge types between pins. Same-board edges connect pins in the same collection, providing contextual similarity. Visual similarity edges link pins with embedding cosine similarity above a threshold. Behavioral edges connect pins frequently saved by the same users. Semantic edges link pins with similar text descriptions.
Multi-Hop Traversal: Starting from a source pin, the algorithm performs multi-hop traversal. The first hop retrieves up to 20 pins from the same board using HBase column family queries. The second hop takes those pins and finds visually similar pins using Faiss approximate nearest neighbor search on their embeddings. The third hop queries behavioral data, finding pins saved by users who also saved the source pin. This generates approximately 500 candidate pins.
Similarity Scoring: Each candidate receives a composite score. Visual similarity comes from cosine distance in embedding space. Text similarity uses embeddings of titles and descriptions. Behavioral similarity measures the Jaccard similarity of saver sets (users who saved both pins divided by users who saved either). Same-board bonus applies if the candidate shares the board with the source. The final score combines these with weights: 50% visual, 20% text, 20% behavioral, and 10% same-board.
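The composite score is a weighted sum. A sketch, with the behavioral term computed as Jaccard similarity over saver sets and the visual and text similarities assumed to be pre-normalized to [0, 1]:

```python
def jaccard(savers_a: set, savers_b: set) -> float:
    """|intersection| / |union| of the two pins' saver sets."""
    union = savers_a | savers_b
    return len(savers_a & savers_b) / len(union) if union else 0.0

def related_score(visual, text, savers_src, savers_cand, same_board):
    """Weights from the design: 50% visual, 20% text, 20% behavioral,
    10% same-board bonus."""
    behavioral = jaccard(savers_src, savers_cand)
    return (0.5 * visual + 0.2 * text + 0.2 * behavioral
            + 0.1 * (1.0 if same_board else 0.0))
```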
Filtering and Ranking: The system removes the source pin and any duplicates. Low-quality pins (flagged as spam or NSFW) are filtered out. Remaining pins are ranked by similarity score, then diversity filtering ensures at most 5 pins from any single board. The top results balance relevance with variety.
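The filter-then-rank step can be sketched as follows. The candidate dictionary shape and parameter defaults are assumptions; the per-board cap of 5 comes from the text:

```python
# Sketch of ranking with diversity filtering: drop flagged pins,
# sort by score, and cap any single board at 5 results.

def rank_with_diversity(candidates, max_per_board=5, limit=50):
    """candidates: dicts with 'pin_id', 'board_id', 'score', 'flagged'."""
    ranked = sorted(
        (c for c in candidates if not c.get("flagged")),
        key=lambda c: c["score"],
        reverse=True,
    )
    per_board = {}
    results = []
    for c in ranked:
        board = c["board_id"]
        if per_board.get(board, 0) >= max_per_board:
            continue  # diversity cap: skip over-represented boards
        per_board[board] = per_board.get(board, 0) + 1
        results.append(c["pin_id"])
        if len(results) >= limit:
            break
    return results
```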
Caching Strategy: For popular pins (over 1000 saves), related pins are pre-computed and cached in Redis with a 24-hour TTL under the key pattern related:{pin_id}. Long-tail pins compute related content on-demand, which is acceptable given their lower view frequency. Background jobs asynchronously update cached results for trending pins as their engagement patterns change.
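This cache-aside pattern can be sketched as below. The in-memory cache class mimics the small subset of the redis-py API used here (`get`/`setex`); a real deployment would pass a `redis.Redis()` client instead. The save-count threshold and key pattern follow the text; everything else is an illustrative assumption.

```python
import json
import time

# Cache-aside sketch for related pins: popular pins are cached under
# related:{pin_id} with a 24-hour TTL; long-tail pins compute on demand.

POPULAR_SAVE_THRESHOLD = 1000
TTL_SECONDS = 24 * 60 * 60

class InMemoryCache:
    """Minimal stand-in for Redis get/setex with TTL semantics."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        value, expires = self._data.get(key, (None, 0.0))
        return value if time.time() < expires else None
    def setex(self, key, ttl, value):
        self._data[key] = (value, time.time() + ttl)

def related_pins(pin_id, save_count, cache, compute):
    key = f"related:{pin_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = compute(pin_id)            # on-demand graph traversal
    if save_count >= POPULAR_SAVE_THRESHOLD:
        cache.setex(key, TTL_SECONDS, json.dumps(result))
    return result
```

The background refresh for trending pins mentioned in the text would simply call `compute` and `setex` on a schedule, overwriting the cached entry before its TTL expires.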
HBase Schema Optimization: The HBase schema stores graph edges efficiently. Row keys are pin IDs. Column families separate edge types: same_board stores related pin IDs with scores, visual_sim stores visually similar pin IDs with cosine similarities, and co_saved stores co-saved pin IDs with Jaccard similarities. This organization enables fast retrieval of specific edge types without scanning unnecessary data.
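One row of that schema might look like the structure below. The specific key and qualifier formats are assumptions for illustration; the point is that each edge type lives in its own column family, so a query for one type never scans the others:

```python
# Illustrative layout of one row in the pin-edges HBase table.
# Row key = pin ID; each column family holds one edge type, with
# the related pin ID as qualifier and the edge score as value.

row = {
    "row_key": "pin:184723",
    "same_board": {            # contextual edges
        "pin:184901": 1.0,
        "pin:185002": 1.0,
    },
    "visual_sim": {            # embedding edges (cosine similarity)
        "pin:990121": 0.93,
        "pin:774530": 0.88,
    },
    "co_saved": {              # behavioral edges (Jaccard similarity)
        "pin:661204": 0.41,
    },
}

def edges_of_type(row, edge_type):
    """Fetch one edge type without touching other column families."""
    return row.get(edge_type, {})
```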
Performance Characteristics: Graph queries on HBase complete in under 50ms with proper key design and region distribution. Visual similarity queries to Faiss finish in under 20ms thanks to HNSW indexing. Total related pins computation completes in under 100ms, meeting real-time requirements. Batch pre-computation for hot content ensures the majority of requests hit cache.
Step 4: Wrap Up
In this design, we proposed a comprehensive system for a visual discovery platform like Pinterest. If there is extra time at the end of the interview, here are additional points to discuss:
Additional Features:
- Video pins: Extend the architecture to support video uploads with thumbnail generation, streaming optimization, and video embeddings for similarity search.
- Rich pins: Enhanced metadata for specific content types like articles (headline, author), products (price, availability), and recipes (ingredients, cooking time).
- Messaging: Direct messaging between users for sharing pins and boards privately.
- Real-time collaboration: Multiple users editing the same board simultaneously with conflict resolution.
- Native shopping: In-app checkout and shopping features with inventory management and order processing.
Scalability Considerations:
- Horizontal Scaling: All services are stateless, enabling easy horizontal scaling by adding more instances behind load balancers.
- Database Sharding: Geographic sharding can complement ID-based sharding, keeping European users’ data in EU data centers for GDPR compliance and reduced latency.
- Caching Layers: Multi-level caching from browser cache to CDN to application cache to database query cache reduces load at each tier.
- Asynchronous Processing: Image processing, embedding generation, and feed materialization all happen asynchronously via Kafka, decoupling write path from read path.
Error Handling:
- Image Processing Failures: Dead letter queues capture failed processing jobs for manual review. Pins remain in “processing” state until completion or failure is determined.
- CDN Failures: Multiple CDN providers can be used with automatic failover. If CloudFront fails, requests route to Fastly or Akamai.
- Database Failures: Read replicas provide failover for query traffic. PostgreSQL streaming replication enables quick promotion of replicas to primary in case of failure.
- Search Index Failures: Elasticsearch cluster replication across availability zones ensures search remains available despite node failures.
Security Considerations:
- Encrypt images at rest in S3 using AES-256 encryption and in transit using TLS 1.3.
- Implement proper authentication using JWT tokens with short expiration and refresh token rotation.
- Rate limiting prevents abuse at multiple layers: API gateway, service level, and database level.
- Input validation and sanitization prevent injection attacks and malicious file uploads.
- NSFW detection and content moderation use machine learning models to automatically flag inappropriate content for review.
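Of the controls above, the multi-layer rate limiting is the most mechanical; a minimal token-bucket limiter of the kind typically applied at the API gateway can be sketched as below. The class name, parameters, and injected clock are illustrative assumptions:

```python
import time

# Minimal token-bucket rate limiter sketch. Tokens refill continuously
# at `rate` per second up to `capacity` (the allowed burst size).

class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock          # injectable for testing
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice each layer (gateway, service, database) would keep a bucket per user or per API key, typically backed by Redis rather than process memory so limits hold across instances.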
Monitoring and Observability:
- Track key metrics: feed load time (p50, p95, p99), image upload success rate, visual search accuracy measured by click-through rate, cache hit ratios across all caching layers, database query latency, and Kafka consumer lag.
- Distributed tracing using OpenTelemetry tracks requests end-to-end across microservices, identifying bottlenecks in pin creation flows, CNN inference latency, and feed generation pipelines.
- Alerting triggers when feed latency exceeds 500ms for 5 minutes, image upload failure rate exceeds 1%, CDN error rate exceeds 0.1%, database replication lag exceeds 30 seconds, or Kafka consumer lag exceeds 1 million messages.
Disaster Recovery:
- S3 cross-region replication ensures images survive regional outages. PostgreSQL continuous Write-Ahead Log archiving enables point-in-time recovery. Daily HBase snapshots back up the social graph. Redis persistence with Append-Only File provides durability for cached data.
- Multi-region deployment with active-active in US-West and US-East and active-passive in EU and Asia provides resilience. Route 53 latency-based routing directs users to the nearest healthy region. Automated failover completes in under 5 minutes using health checks and DNS updates.
Future Enhancements:
- Real-time personalization: Update feed ranking models in real-time based on user actions in the current session, adapting to immediate interest signals.
- Augmented reality try-on: Allow users to visualize products in their physical space using AR, especially for home decor and furniture pins.
- Advanced recommendation models: Incorporate graph neural networks to better model user-pin-board relationships and improve recommendation quality.
- Improved visual search: Use multi-modal models that combine image and text understanding for more nuanced similarity matching.
- Social features expansion: Add live boards where users can collaborate in real-time, group discussions around pins, and shared collections.
This design provides a production-grade foundation for Pinterest that scales to hundreds of millions of users while maintaining low latency, high availability, and excellent user experience. The architecture leverages industry-standard technologies combined with machine learning and computer vision to deliver a sophisticated visual discovery platform that handles billions of images and serves personalized content to a global user base.
Summary
This comprehensive guide covered the design of a visual discovery platform like Pinterest, including:
- Core Functionality: Pin creation with image processing, board organization, personalized feed generation, and text/visual search capabilities.
- Key Challenges: Massive image storage and delivery, visual similarity search at scale, personalized feed generation with millions of updates, content deduplication, and database scaling to billions of records.
- Solutions: Multi-tier CDN architecture with dynamic optimization, CNN-based embeddings with Faiss ANN search, hybrid push-pull feed generation with ML ranking, multi-level deduplication using perceptual hashing and embeddings, and horizontal sharding with hybrid storage.
- Scalability: Stateless services for horizontal scaling, multi-level caching, asynchronous processing via message queues, and specialized databases for different access patterns.
The design demonstrates how to handle a content-heavy platform with sophisticated machine learning requirements, real-time personalization, and global distribution while maintaining sub-second latencies and high availability.