Design Airbnb

Airbnb is a global vacation rental marketplace connecting hosts who have properties to rent with guests seeking accommodations. At scale, Airbnb handles millions of listings across 220+ countries, processes billions in transactions annually, and serves over 150 million users. This system must handle complex geospatial searches, prevent double bookings through sophisticated concurrency control, process payments securely, manage dynamic pricing, and maintain trust through review systems.

Designing Airbnb presents unique challenges including efficient geospatial search, booking concurrency control, payment escrow management, fraud detection in reviews, calendar conflict resolution, and dynamic pricing optimization.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

  1. Hosts should be able to create, update, and delete property listings with photos, amenities, descriptions, and pricing.
  2. Guests should be able to search for properties by location, dates, and filters (price, amenities, property type).
  3. Guests should be able to book available properties for specific dates without double bookings.
  4. The system should process payments securely, holding funds in escrow until check-in completion.
  5. Guests and hosts should be able to review each other after stays through a dual-blind review system.

Below the Line (Out of Scope):

  • Hosts should be able to manage availability calendars with blocked dates and minimum stay requirements.
  • The system should provide dynamic pricing recommendations based on demand and competition.
  • Users should be able to message each other before and during bookings.
  • The system should support cancellations with policy-based refunds.
  • The system should handle special offers and promotional pricing.

Non-Functional Requirements

Core Requirements:

  • The system should prioritize low latency for searches (p99 < 200ms).
  • The system should ensure strong consistency in booking operations to prevent double bookings.
  • The system should be highly available (99.99% uptime for booking and payment systems).
  • The system should scale to handle 50,000 searches/second during peak travel seasons.

Below the Line (Out of Scope):

  • The system should comply with PCI DSS standards for payment data security.
  • The system should support multi-region deployment for global low latency.
  • The system should implement comprehensive fraud detection for reviews and bookings.
  • The system should provide real-time analytics for hosts and internal monitoring.

Clarification Questions & Assumptions:

  • Platform: Web and mobile apps for both guests and hosts.
  • Scale: 7 million active listings, 150 million users, 100 million bookings per year.
  • Search Volume: 50,000 queries per second during peak, 15,000 QPS average.
  • Booking Window: Support bookings up to 2 years in advance.
  • Payment Processing: Use third-party processor (Stripe) for actual payment handling.

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

Listing: A property available for rent. Contains details such as title, description, location coordinates, property type (entire home, private room, shared room), capacity, number of bedrooms and bathrooms, amenities list, photos, base price per night, house rules, cancellation policy, and availability calendar data.

User: Anyone using the platform as either a host or guest. Includes personal information, verification status (ID, phone, email), payment methods, payout preferences for hosts, booking history, review history, and trust scores.

Booking: An individual reservation from request through completion. Records the listing ID, guest ID, host ID, check-in and check-out dates, total price, booking status (pending, confirmed, completed, cancelled), payment details, created timestamp, and confirmation details.

Review: Feedback submitted by guests about properties and hosts, or by hosts about guests. Includes the booking reference, reviewer and reviewee IDs, rating (1-5 stars), review text, review type (guest-to-host or host-to-guest), submission timestamp, and visibility status (hidden until both parties submit).

Availability: Date-specific availability and pricing for listings. Tracks each date for a listing, whether it’s available or blocked, the booking ID if reserved, price override for that date, minimum stay requirement, and the reason for blocking (booked, host-blocked, maintenance).

API Design

Search Listings Endpoint: Used by guests to search for available properties based on location, dates, and various filters.

POST /search -> SearchResults
Body: {
  location: { lat, long, radius },
  checkIn: date,
  checkOut: date,
  filters: { priceRange, amenities, propertyType, capacity }
}

Create Listing Endpoint: Used by hosts to create new property listings with all necessary details.

POST /listings -> Listing
Body: {
  title, description, location, propertyType,
  capacity, amenities, photos, pricing, houseRules
}

Create Booking Endpoint: Used by guests to book a property for specific dates after reviewing availability and pricing.

POST /bookings -> Booking
Body: {
  listingId: string,
  checkInDate: date,
  checkOutDate: date,
  guestCount: number
}

Process Payment Endpoint: Used to authorize payment for a booking. The payment is held in escrow until check-in completion.

POST /payments -> Payment
Body: {
  bookingId: string,
  paymentMethodId: string,
  amount: number
}

Submit Review Endpoint: Used by guests or hosts to submit reviews after a completed stay. Reviews remain hidden until both parties submit.

POST /reviews -> Review
Body: {
  bookingId: string,
  rating: number,
  reviewText: string,
  reviewType: "guest-to-host" | "host-to-guest"
}

Note: User authentication is handled via JWT tokens in request headers, not in request bodies, to ensure security and prevent tampering.

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Hosts should be able to create and manage property listings

The core components necessary for listing management are:

  • Client Applications: Web and mobile interfaces for hosts to manage their listings. Provides rich UI for uploading photos, setting prices, managing availability, and viewing booking analytics.
  • API Gateway: Entry point for all client requests, handling authentication, rate limiting, request routing, and TLS termination. Routes listing management requests to the appropriate backend service.
  • Listing Service: Manages all CRUD operations for property listings. Validates listing data, stores listing metadata in the database, coordinates photo uploads to object storage, maintains listing state, and publishes listing change events to the event stream.
  • Object Storage (S3): Stores listing photos and documents. Photos are uploaded directly from clients with pre-signed URLs, processed for different sizes (thumbnail, medium, full), and served via CDN for fast global delivery.
  • Database (PostgreSQL): Stores listing metadata including details, amenities, and house rules. Uses relational structure to handle complex queries and maintain data integrity.
  • Event Stream (Kafka): Publishes listing creation and update events to downstream consumers, particularly the search indexing service for eventual consistency in search results.
  • CDN (CloudFront): Caches and serves listing photos globally with low latency. Reduces load on origin servers and provides excellent user experience worldwide.

Listing Creation Flow:

  1. The host fills out listing details in the client application and uploads photos, sending a POST request to /listings.
  2. The API Gateway authenticates the request, validates rate limits, and forwards to the Listing Service.
  3. The Listing Service validates the data, generates pre-signed URLs for photo uploads, and creates a listing record in the database.
  4. Photos are uploaded directly to S3 from the client using pre-signed URLs, bypassing the backend for better performance.
  5. The Listing Service publishes a “listing created” event to Kafka for downstream processing.
  6. The service returns the created Listing entity with photo URLs to the client.

2. Guests should be able to search for properties by location, dates, and filters

We need to introduce new components optimized for search:

  • Search Service: Handles all search queries from guests. Translates search parameters into database queries, applies filters and ranking, coordinates with the availability service to filter unavailable listings, implements pagination for large result sets, and caches popular searches for performance.
  • Elasticsearch Cluster: Specialized search engine optimized for geospatial queries. Stores denormalized listing data with geo_point fields for location-based searches, maintains search indices updated via Kafka events, provides sub-200ms query performance, and supports complex filtering on amenities, price, property type, and ratings.
  • Cache Layer (Redis): Caches frequently accessed search results and availability data. Reduces database load, improves response times for popular searches, stores hot listing availability with short TTL, and maintains cache consistency through event-driven invalidation.
  • ML Ranking Service: Provides personalized search result ranking. Analyzes user preferences and booking history, scores listings based on likelihood of booking, considers factors like location relevance, price competitiveness, listing quality, and host reputation, and returns ranked results optimized for conversion.

Search Flow:

  1. The guest enters location, dates, and filters into the client app, sending a POST request to /search.
  2. The API Gateway forwards the request to the Search Service after authentication and rate limiting.
  3. The Search Service first checks Redis cache for recently executed similar queries using a hash of search parameters.
  4. On cache miss, it queries Elasticsearch with a geo_distance or geo_bounding_box query to find listings within the specified location.
  5. It filters results by property type, amenities, and price range using Elasticsearch’s filter clauses for performance.
  6. The Search Service queries the Availability Service to filter out listings unavailable for the requested dates.
  7. Remaining candidates are sent to the ML Ranking Service for personalized scoring and ranking.
  8. The top results are paginated and returned to the client with listing previews, photos (via CDN URLs), pricing, and availability.
  9. The results are cached in Redis with a 60-second TTL for subsequent identical queries.
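Step 3 of the flow above depends on a deterministic cache key derived from the search parameters. A minimal stdlib sketch (function and key format are hypothetical, not Airbnb's actual scheme): normalizing with `json.dumps(sort_keys=True)` guarantees that logically identical queries hash to the same Redis key regardless of parameter ordering.

```python
import hashlib
import json

def search_cache_key(location: dict, check_in: str, check_out: str, filters: dict) -> str:
    """Build a deterministic Redis key from normalized search parameters.

    sort_keys=True canonicalizes nested dicts so identical queries always
    produce the same digest; compact separators keep the payload stable.
    """
    payload = json.dumps(
        {"location": location, "checkIn": check_in, "checkOut": check_out, "filters": filters},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"search:{digest}"

# Identical queries with differently ordered dicts map to the same key.
k1 = search_cache_key({"lat": 40.7, "long": -74.0, "radius": 10},
                      "2025-03-01", "2025-03-05", {"capacity": 2})
k2 = search_cache_key({"radius": 10, "long": -74.0, "lat": 40.7},
                      "2025-03-01", "2025-03-05", {"capacity": 2})
assert k1 == k2
```

The 60-second TTL from step 9 would be applied when the result set is written back under this key.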

3. Guests should be able to book available properties without double bookings

We extend our design with booking management and concurrency control:

  • Booking Service: Manages the entire booking lifecycle from request through completion. Implements distributed locking to prevent race conditions, validates availability with strong consistency, coordinates with payment processing, manages booking state transitions (requested, confirmed, completed, cancelled), and publishes booking events for downstream processing.
  • Availability Service: Maintains detailed availability calendars for all listings. Manages date-specific availability (2-year window pre-generated), handles blocking and unblocking of dates, enforces minimum stay requirements, detects and prevents booking conflicts, and provides fast availability checks with Redis caching.
  • Distributed Lock (Redis): Ensures only one booking process can reserve specific dates for a listing at a time. Uses SET with NX (not exists) and EX (expiration) flags for atomic lock acquisition, implements 10-second TTL to prevent deadlocks, provides lock release on booking completion or failure, and prevents double bookings under high concurrency.
  • Booking Database (PostgreSQL): Stores all booking records with ACID guarantees. Maintains booking details, uses SELECT FOR UPDATE for pessimistic locking, tracks booking history and status changes, and ensures financial transaction consistency.

Booking Flow:

  1. The guest selects a listing and dates, sending a POST request to /bookings with listing ID and date range.
  2. The API Gateway routes the request to the Booking Service after authentication.
  3. The Booking Service generates a lock key based on listing ID and date range, then attempts to acquire a distributed lock in Redis using SET NX with 10-second TTL.
  4. If the lock is acquired successfully, the service proceeds; otherwise, it returns an error indicating concurrent booking in progress.
  5. With the lock held, it queries the Booking Database with SELECT FOR UPDATE to check for overlapping confirmed bookings.
  6. If no conflicts exist, it creates a pending booking record in a database transaction, updates the availability calendar to mark dates as reserved, and commits the transaction.
  7. The service releases the distributed lock and triggers the payment authorization flow.
  8. It publishes a “booking created” event to Kafka for downstream processing (notifications, search index updates, analytics).
  9. The booking details are returned to the client with payment instructions.
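The lock semantics in steps 3-4 and 7 can be sketched with an in-memory stand-in for Redis (class names are hypothetical; a real deployment would issue `SET key value NX EX ttl` and run the standard token-checked Lua release script against an actual Redis instance):

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the two Redis operations the booking flow
    relies on: SET with NX + EX, and a token-checked delete (the Lua
    release), so only the lock owner can release the lock."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key: str, value: str, ttl_seconds: int) -> bool:
        now = time.monotonic()
        current = self._store.get(key)
        if current and current[1] > now:
            return False  # lock held and not yet expired
        self._store[key] = (value, now + ttl_seconds)
        return True

    def release_if_owner(self, key: str, token: str) -> bool:
        current = self._store.get(key)
        if current and current[0] == token:
            del self._store[key]
            return True
        return False  # expired, or now owned by another process

def lock_key(listing_id: int, check_in: str, check_out: str) -> str:
    return f"lock:listing:{listing_id}:{check_in}:{check_out}"

redis = FakeRedis()
key = lock_key(12345, "2025-03-01", "2025-03-05")
token = str(uuid.uuid4())

assert redis.set_nx_ex(key, token, ttl_seconds=10)              # first guest wins
assert not redis.set_nx_ex(key, "other-token", ttl_seconds=10)  # concurrent guest fails fast
assert not redis.release_if_owner(key, "other-token")           # only the owner can release
assert redis.release_if_owner(key, token)
```

The token check on release matters: if our lock expired and another process acquired it, a blind `DEL` would release *their* lock.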

4. The system should process payments securely with escrow

We add payment processing components:

  • Payment Service: Manages all payment operations. Integrates with Stripe for payment processing, implements authorization and capture flow (authorize on booking, capture after check-in), manages escrow holding periods (24 hours post check-in), processes refunds based on cancellation policies, handles host payouts with platform fee deduction, implements idempotency to prevent duplicate charges, and maintains payment audit logs.
  • Stripe API: Third-party payment processor handling actual financial transactions. Provides PaymentIntent API for authorization and capture, stores payment methods securely (PCI compliant), handles card processing, fraud detection, and chargebacks, supports Connect API for host payouts, and provides webhook notifications for payment events.
  • Payment Database (PostgreSQL): Stores payment records with full audit trail. Tracks payment intents, authorizations, captures, refunds, and host payouts with transaction history.

Payment Flow:

  1. After a booking is created, the Payment Service is called to authorize payment.
  2. It retrieves the guest’s payment method from the User Service (stored Stripe payment method ID).
  3. The service generates an idempotency key from the booking ID and the booking's creation timestamp; using only stable values means a retried request produces the same key, so Stripe deduplicates the charge.
  4. It calls Stripe’s PaymentIntent API with capture_method set to “manual” to authorize but not immediately capture the payment.
  5. Stripe validates the payment method, checks for sufficient funds, and places a hold on the amount.
  6. The Payment Service stores the payment record with status “authorized” and the Stripe PaymentIntent ID.
  7. A scheduled job monitors bookings for check-in completion, waiting 24 hours after check-in to ensure no issues arise.
  8. After the escrow period, the Payment Service calls Stripe to capture the authorized payment, transferring funds from the guest.
  9. It updates the payment status to “captured” and triggers the host payout process.
  10. The host payout subtracts the platform fee (typically 3%), then transfers remaining funds to the host’s connected Stripe account.
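Two details of this flow are easy to get wrong: the idempotency key must be built from stable values (a call-time timestamp would defeat retry deduplication), and money math should be done in integer cents. A minimal sketch, with hypothetical function names:

```python
def idempotency_key(booking_id: str, booking_created_at_epoch: int) -> str:
    """Built only from stable values, so a retried charge for the same
    booking produces the same key and the processor deduplicates it."""
    return f"charge:{booking_id}:{booking_created_at_epoch}"

def host_payout_cents(captured_cents: int, platform_fee_pct: float = 3.0) -> tuple:
    """Split a captured payment into (host payout, platform fee), in integer
    cents to avoid floating-point rounding on money."""
    fee = round(captured_cents * platform_fee_pct / 100)
    return captured_cents - fee, fee

# Retries reuse the same key; a $500.00 capture yields a $15.00 platform fee.
assert idempotency_key("bk_123", 1700000000) == idempotency_key("bk_123", 1700000000)
assert host_payout_cents(50_000) == (48_500, 1_500)
```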

5. Guests and hosts should review each other through a dual-blind system

We add review management components:

  • Review Service: Manages the review system and fraud detection. Stores reviews from both guests and hosts, implements dual-blind revelation (reviews hidden until both parties submit), enforces review eligibility (must complete booking), calculates aggregate ratings and trust scores, implements fraud detection algorithms, and handles review disputes and moderation.
  • Fraud Detection Engine: Protects platform trust through multi-layered analysis. Detects behavioral anomalies (review velocity, patterns, new accounts), analyzes content for similarity and bot-generated text, identifies network patterns (review rings, IP clustering), checks sentiment-rating consistency, assigns fraud scores, and flags suspicious reviews for manual review.
  • Review Database (PostgreSQL): Stores all reviews with metadata for fraud analysis. Maintains review text, ratings, submission timestamps, visibility status, fraud scores, and reviewer/reviewee relationships.

Review Submission Flow:

  1. After a booking is completed, both guest and host can submit reviews within a time window (typically 14 days).
  2. The client sends a POST request to /reviews with booking ID, rating, and review text.
  3. The Review Service validates that the user is eligible (completed the booking and hasn’t already reviewed).
  4. It passes the review through the Fraud Detection Engine, which analyzes behavioral signals (review velocity, account age), content signals (text similarity with previous reviews, sentiment analysis), and network signals (IP address, device fingerprinting).
  5. The fraud detection assigns a score; if the score exceeds threshold, the review is rejected or flagged for manual review.
  6. Valid reviews are stored in the database with status “hidden” initially.
  7. When both parties (guest and host) have submitted reviews, a trigger reveals both reviews simultaneously by updating their visibility status.
  8. The Review Service recalculates aggregate ratings for the listing and trust scores for the users.
  9. A notification is sent to both parties that reviews are now visible.
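The dual-blind reveal in step 7 is a small state machine: each review is stored hidden, and both flip visible only when both sides have submitted. A self-contained sketch (class and field names are illustrative, not a real schema):

```python
from dataclasses import dataclass

@dataclass
class Review:
    booking_id: str
    review_type: str  # "guest-to-host" or "host-to-guest"
    rating: int
    visible: bool = False

class ReviewStore:
    """Minimal in-memory model of the dual-blind reveal trigger."""

    def __init__(self):
        self.by_booking = {}

    def submit(self, review: Review) -> None:
        reviews = self.by_booking.setdefault(review.booking_id, [])
        if any(r.review_type == review.review_type for r in reviews):
            raise ValueError("this party has already reviewed this booking")
        reviews.append(review)
        # Dual-blind: reveal both reviews only once both sides have submitted.
        if {r.review_type for r in reviews} == {"guest-to-host", "host-to-guest"}:
            for r in reviews:
                r.visible = True

store = ReviewStore()
store.submit(Review("b-1", "guest-to-host", 5))
assert not store.by_booking["b-1"][0].visible      # hidden until both submit
store.submit(Review("b-1", "host-to-guest", 4))
assert all(r.visible for r in store.by_booking["b-1"])  # revealed together
```

A production version would also reveal unilaterally when the 14-day window expires, which this sketch omits.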

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we efficiently search millions of listings with sub-200ms latency?

Searching through millions of listings with complex geospatial queries, availability filtering, and multi-dimensional filters requires specialized infrastructure.

Problem: Traditional Database Limitations

Using PostgreSQL or DynamoDB with simple location queries would require full table scans or complex index management. Even with lat/long indexes, traditional B-tree indexes aren’t optimized for two-dimensional geospatial data. Joining with availability data and applying multiple filters would further degrade performance.

Solution: Elasticsearch with Geo Queries

Elasticsearch provides specialized geospatial capabilities optimized for our use case:

Index Design: We maintain a denormalized Elasticsearch index with listing data structured for fast queries. Each document contains the listing ID, title, description, location as a geo_point type, price per night, property type and amenities as keyword fields, aggregate rating and review count, superhost and instant book flags, guest capacity and bedroom count, an availability bitmap for quick date checking, and the last update timestamp.

Geospatial Query Types: Elasticsearch supports several geo query types. The geo_distance query finds listings within a radius of coordinates (e.g., within 10km of downtown). The geo_bounding_box query finds listings within a rectangular area (useful for map viewport queries). The geo_shape query handles complex polygons for neighborhood boundaries.

Query Execution: When a search request arrives, we construct an Elasticsearch query with a geo_distance or geo_bounding_box in the must clause. We add filters for price range, property type, amenities, capacity, instant book, and superhost status in the filter clause (cached and fast). The query is sorted by geo_distance for proximity ordering or by our custom scoring function for relevance.

Availability Filtering - Two-Phase Approach: Phase one executes the Elasticsearch query to retrieve 1,000 candidate listings based on location and static filters. Phase two filters these candidates by availability using either Redis cache (for hot listings with TTL of 60 seconds) or PostgreSQL query (for cold listings or exact availability). This two-phase approach balances speed and accuracy.

Availability Bitmap Optimization: We store a 365-day availability bitmap in Elasticsearch for quick pre-filtering. Each bit represents one day (1 for available, 0 for booked). We perform a quick bitwise AND operation with the requested date range to eliminate obviously unavailable listings. For listings passing the bitmap check, we verify exact availability in PostgreSQL before returning results.
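The bitmap check described above is a single AND against a request mask: if every requested bit survives the AND, the listing *might* be available and moves on to the exact PostgreSQL check. A sketch using Python integers as bitmaps (the encoding is an assumption; bit i = i days from the calendar's start):

```python
def request_mask(start_offset: int, nights: int) -> int:
    """Bitmask with one bit per requested night, where bit i represents
    day i from the start of the availability window."""
    return ((1 << nights) - 1) << start_offset

def maybe_available(availability_bitmap: int, start_offset: int, nights: int) -> bool:
    """Fast pre-filter: True means 'possibly available, verify exactly';
    False means at least one requested day is definitely booked."""
    mask = request_mask(start_offset, nights)
    return availability_bitmap & mask == mask

# Listing available every day of the year except day 12.
bitmap = ((1 << 365) - 1) & ~(1 << 12)
assert maybe_available(bitmap, start_offset=5, nights=4)       # days 5-8: all free
assert not maybe_available(bitmap, start_offset=10, nights=4)  # days 10-13 include day 12
```

Because the bitmap in the search index lags the source of truth (it is updated via Kafka events), a pass here is only a candidate, never a confirmation.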

Cluster Configuration: The Elasticsearch cluster is sharded by geographic region (US-West, US-East, Europe, Asia-Pacific) for query performance. Each shard has three replicas for read scaling and fault tolerance. We use a hot-warm-cold architecture where recent and popular listings are on SSD nodes, older listings move to HDD nodes, and inactive listings are archived.

Caching Strategy: We implement multi-level caching. Layer one is Redis cache for popular searches with 60-second TTL, using a hash of location, dates, and filters as the key. Layer two is Elasticsearch’s query cache that automatically caches filter clauses. Layer three is CDN caching for search result pages with 30-second TTL.

Index Updates: The Elasticsearch index is updated asynchronously via Kafka. When listings are created or updated, the Listing Service publishes events to Kafka. A consumer service reads these events and updates the Elasticsearch index. The refresh interval is 5 seconds, providing near-real-time search results while optimizing indexing performance.

Deep Dive 2: How do we prevent double bookings under high concurrency?

Preventing double bookings is critical for platform trust and requires strong consistency guarantees. With thousands of concurrent booking requests, race conditions are likely without proper coordination.

Problem: Race Conditions

Consider two guests simultaneously trying to book the same dates for a listing. Without coordination, both might check availability, find the dates free, and create bookings, resulting in a double booking. Even with database transactions, the time between checking availability and creating the booking creates a race window.

Solution: Distributed Locks + Pessimistic Database Locking

We implement a two-layer locking strategy for robustness:

Layer 1: Distributed Lock (Redis): When a booking request arrives, we immediately attempt to acquire a distributed lock using Redis SET with NX and EX flags. The lock key is composed of listing ID and the date range (e.g., “lock:listing:12345:2025-03-01:2025-03-05”). The lock value is a unique token (UUID) to ensure only the lock owner can release it. The TTL is set to 10 seconds to prevent deadlocks if the service crashes.

If the lock acquisition fails, it means another booking process is already working on these dates. We immediately return an error to the client asking them to retry. If successful, we proceed to the database layer.

Layer 2: Pessimistic Database Locking: With the distributed lock held, we query the Booking Database using SELECT FOR UPDATE NOWAIT to lock rows representing overlapping bookings. The query checks for any confirmed or pending bookings whose dates overlap the requested range. The NOWAIT clause means that if another transaction already holds a lock on a conflicting row, the query errors immediately instead of blocking, so we detect conflicts quickly. (SKIP LOCKED would be wrong here: it silently omits locked rows from the result, hiding exactly the conflicting bookings we need to see.)

If conflicting bookings exist, we rollback, release the distributed lock, and return an error. If no conflicts exist, we proceed with the booking creation within the same transaction.

Booking Creation: Still within the transaction, we insert a new booking record with status “pending”. We update the availability_calendar table to mark each date in the range as unavailable and link it to the booking ID. We commit the transaction, making the booking and availability updates atomic.

Lock Release: After the transaction commits (or rolls back on error), we release the distributed lock using a Lua script to ensure we only delete the lock if we own it (token matches). This prevents accidentally releasing another process’s lock if ours expired.

Edge Cases Handled: Lock expiration during processing is handled by implementing lock refresh/extension if the booking process takes longer than expected. Partial booking conflicts (where check-in or check-out dates overlap) are caught by comprehensive date range logic in the SQL query. The system handles cancellations by marking bookings as cancelled, releasing dates in the availability calendar, and publishing events to update search indices.

Database Schema Design: The bookings table includes listing ID, guest ID, host ID, check-in and check-out dates, status, total price, and timestamps. An index on (listing_id, check_in_date, check_out_date) filtered by status enables fast conflict checking. The availability_calendar table has a composite primary key of (listing_id, date) with columns for availability status, booking ID, price override, and minimum stay.
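The "comprehensive date range logic" above reduces to the standard half-open interval overlap predicate: two stays conflict when each starts before the other ends, which correctly allows same-day checkout/check-in turnover. A sketch, with the equivalent conflict query as an illustrative SQL string (parameter style assumes a psycopg-like driver):

```python
from datetime import date

def ranges_overlap(check_in_a: date, check_out_a: date,
                   check_in_b: date, check_out_b: date) -> bool:
    """Half-open [check_in, check_out): a guest checking out the morning
    another checks in is not a conflict."""
    return check_in_a < check_out_b and check_in_b < check_out_a

# The same predicate as it would appear in the conflict-check query,
# matching the (listing_id, check_in_date, check_out_date) index.
CONFLICT_QUERY = """
SELECT id FROM bookings
WHERE listing_id = %(listing_id)s
  AND status IN ('pending', 'confirmed')
  AND check_in_date  < %(check_out)s
  AND check_out_date > %(check_in)s
FOR UPDATE
"""

assert ranges_overlap(date(2025, 3, 1), date(2025, 3, 5),
                      date(2025, 3, 4), date(2025, 3, 8))       # one-night overlap
assert not ranges_overlap(date(2025, 3, 1), date(2025, 3, 5),
                          date(2025, 3, 5), date(2025, 3, 8))   # back-to-back is fine
```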

Deep Dive 3: How do we implement dynamic pricing based on demand and competition?

Dynamic pricing helps hosts maximize revenue by adjusting prices based on market conditions. The system needs to analyze historical demand, seasonality, competitive pricing, and listing quality to recommend optimal prices.

Challenge: Multi-Factor Pricing Optimization

Pricing decisions depend on numerous factors: historical occupancy rates, seasonal demand patterns, local events and holidays, competitive pricing from similar listings, listing quality metrics (reviews, superhost status), lead time (days until check-in), cancellation policy flexibility, minimum stay requirements, and recent booking velocity.

Solution: ML-Based Pricing Engine

The pricing engine consists of several components working together:

Feature Engineering: We extract features from multiple data sources. From the listing itself, we get bedrooms, bathrooms, capacity, property type, amenities count, average rating, review count, superhost status, and instant book availability. From historical data, we calculate occupancy rate over the last 90 days, average booked price, and booking velocity. From competitive analysis, we determine median price of similar listings within 1km, the 25th and 75th percentile prices, and the listing’s price position relative to competitors. From temporal factors, we identify days until check-in, day of week, whether it’s a weekend, holiday flags, and seasonal indicators. From market demand, we calculate the percentage of nearby listings booked and demand growth trends.

Model Architecture: We use gradient boosting (XGBoost or LightGBM) for price prediction due to its excellent performance on tabular data with mixed feature types. The model is trained on historical booking data with 100 million samples. The target variable is the actual booked price (not asking price) to optimize for conversion. Features include the 50+ engineered features described above. The model outputs a base price recommendation with confidence intervals.

Training Pipeline: The model is retrained weekly on the latest booking data. We use time-based train-test split to prevent lookahead bias. Hyperparameters are tuned using cross-validation optimizing for RMSE. Feature importance analysis helps identify key pricing drivers. A/B testing validates model improvements before deployment.

Business Rules Layer: After the model prediction, we apply business constraints. We respect host-specified minimum and maximum price boundaries. We round to friendly price points ($95 instead of $97.32) for psychological pricing. We cap surge pricing to prevent extreme increases. We smooth day-to-day variations to avoid price whiplash. We adjust for promotional campaigns and platform incentives.

Pricing Recommendations: The service provides hosts with a base price recommendation for each date, weekend premium suggestions, seasonal adjustment factors, expected booking probability at different price points, and revenue optimization suggestions (e.g., lower price on low-demand dates to increase occupancy).

Real-Time Updates: Base prices are recalculated nightly in batch jobs for all listings. Real-time adjustments are made for surge pricing during high-demand events like conferences or festivals. The system performs A/B testing of different pricing strategies to continuously improve recommendations. Hosts can override with manual pricing if they prefer.

Deep Dive 4: How do we detect and prevent review fraud?

Review fraud undermines platform trust and misleads users. We need comprehensive detection systems to identify fake reviews, review bombing, incentivized reviews, and review rings.

Challenge: Multi-Dimensional Fraud Detection

Fraudsters employ various tactics: posting fake positive reviews to boost ratings, coordinating review rings where users review each other, generating reviews via bots with templated text, creating new accounts to circumvent restrictions, paying for positive reviews, and posting retaliatory negative reviews.

Solution: Multi-Layered Fraud Detection System

Behavioral Signals: We monitor review velocity by tracking reviews per user per time period (suspicious if more than 10 reviews in 7 days). We analyze review patterns like always posting 5 stars or always 1 star. We consider account age where new accounts posting reviews are flagged. We enforce booking verification ensuring reviews only come from completed stays.

Content Signals: We perform text similarity detection using TF-IDF and cosine similarity to catch copy-paste reviews. We identify language patterns indicative of bot-generated content using statistical analysis. We check sentiment-rating consistency flagging cases where 5-star reviews contain negative sentiment or vice versa. We detect excessive keyword stuffing typical of spam reviews.
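The copy-paste signal reduces to a vector similarity between review texts. A stdlib sketch using raw term counts (production would weight terms by TF-IDF and tokenize more carefully, but the signal is the same):

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over raw term counts; near-duplicate reviews
    score close to 1.0, unrelated reviews close to 0.0."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

near_dupe = cosine_similarity(
    "amazing host great location would stay again",
    "amazing host great location would book again",
)
unrelated = cosine_similarity(
    "amazing host great location",
    "noisy street broken heater",
)
assert near_dupe > 0.8 > unrelated  # threshold of 0.8 is illustrative
```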

Network Signals: We analyze IP address clustering where multiple reviews from the same IP in short time span are suspicious. We use device fingerprinting to detect multiple accounts from the same device. We perform social graph analysis to identify review rings where groups of users systematically review each other’s listings.

Historical Signals: We examine host and guest review history for patterns. We check dispute history for users involved in previous fraud investigations. We consider verification status where unverified accounts receive higher scrutiny. We track previous fraud flags and ban evasion attempts.

Fraud Scoring Algorithm: When a review is submitted, we extract all relevant features from the categories above. We calculate individual signal scores for each category. An ML model (trained on labeled fraud examples) predicts fraud probability. We compute a composite fraud score from 0 to 100. If the score exceeds 70, the review is automatically rejected. If the score is between 40 and 70, the review is flagged for manual moderation. Reviews with scores below 40 are automatically approved.
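The threshold routing at the end of that pipeline is a simple decision function over the composite score:

```python
def route_review(fraud_score: float) -> str:
    """Route a review by composite fraud score (0-100 scale), using the
    thresholds from the scoring policy above."""
    if fraud_score > 70:
        return "reject"          # automatically rejected
    if fraud_score >= 40:
        return "manual_review"   # flagged for human moderation
    return "approve"             # automatically approved

assert route_review(85) == "reject"
assert route_review(55) == "manual_review"
assert route_review(12) == "approve"
```

Keeping the thresholds in one place like this makes them easy to tune as false positive/negative rates are measured.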

Dual-Blind Review System: The dual-blind review mechanism prevents retaliatory reviews. Both guest and host must submit reviews independently. Reviews remain hidden until both parties submit or the review window expires (14 days). Once both submit, reviews are revealed simultaneously. This prevents one party from tailoring their review based on the other’s feedback.

Continuous Improvement: We maintain a feedback loop where human moderators label flagged reviews as fraud or legitimate. These labels are fed back into the ML model for retraining. We track false positive and false negative rates to tune thresholds. We periodically audit high-rated listings and top reviewers for fraud patterns.

Deep Dive 5: How do we handle high-frequency availability calendar updates?

Managing availability calendars for millions of listings with frequent updates (bookings, cancellations, host blocks) while providing fast lookups for search queries is challenging.

Challenge: Availability Calendar Complexity

Each listing maintains availability for a 2-year booking window (730 days). Updates occur on every booking, cancellation, host manual block, minimum stay change, and price override. Search queries need to filter by availability for date ranges, requiring fast lookups. Calendar operations must be transactional to prevent corruption. Caching is difficult due to high update frequency.

Solution: Calendar Service with Redis Cache + PostgreSQL

Schema Design: The availability_calendar table uses a composite primary key of (listing_id, date). Each row represents a single date for a listing with columns for is_available (boolean), booking_id (reference if booked), price_override (optional custom price for this date), minimum_stay (integer, defaults to 1), blocked_reason (enum: booked, host_blocked, maintenance), and timestamps for created and updated times.

Calendar Pre-Generation: When a listing is created, we pre-generate 2 years of calendar data (730 rows) with default availability set to true. This avoids sparse data issues and makes queries simpler. A scheduled job extends the calendar window as time passes, always maintaining 2 years of future data.

Availability Lookup: To check if dates are available, we query the availability_calendar table for the listing ID and date range. We check that every date in the range has is_available set to true and no booking_id set. We also check that the stay duration satisfies the strictest (maximum) minimum_stay value across the range. This query uses the composite primary key index and is very fast.
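The schema, pre-generation, and lookup described above can be sketched end to end. This uses SQLite in place of PostgreSQL for a self-contained example; column names follow the schema in the text, and the 730-day window matches the pre-generation step:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE availability_calendar (
        listing_id     INTEGER NOT NULL,
        date           TEXT    NOT NULL,           -- ISO-8601, sorts chronologically
        is_available   INTEGER NOT NULL DEFAULT 1,
        booking_id     INTEGER,
        price_override REAL,
        minimum_stay   INTEGER NOT NULL DEFAULT 1,
        blocked_reason TEXT,                       -- 'booked' | 'host_blocked' | 'maintenance'
        PRIMARY KEY (listing_id, date)
    )""")

def pregenerate(listing_id, start, days=730):
    """Create one row per date with default availability, as described above."""
    rows = [(listing_id, (start + timedelta(d)).isoformat()) for d in range(days)]
    conn.executemany(
        "INSERT INTO availability_calendar (listing_id, date) VALUES (?, ?)", rows)
    conn.commit()

def is_bookable(listing_id, check_in, check_out):
    """True if every night in [check_in, check_out) exists, is free, and the
    stay length meets the strictest minimum_stay across the range."""
    nights = (check_out - check_in).days
    total, free, min_stay = conn.execute(
        """SELECT COUNT(*), SUM(is_available), MAX(minimum_stay)
           FROM availability_calendar
           WHERE listing_id = ? AND date >= ? AND date < ?""",
        (listing_id, check_in.isoformat(), check_out.isoformat())).fetchone()
    return total == nights and free == nights and nights >= (min_stay or 1)
```

The range query walks the composite primary key directly, which is why the lookup stays fast even at 730 rows per listing.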

Caching Strategy: We cache availability data in Redis with a two-tier approach. For hot listings (frequently viewed), we cache the entire 90-day availability bitmap with 60-second TTL. For specific date range queries, we cache the availability result with the cache key composed of listing ID and date range. When availability changes (booking, cancellation), we invalidate affected cache entries.
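The 90-day availability bitmap for hot listings can be packed into a few bytes per listing, which keeps the Redis value tiny and makes range checks a single mask operation. A sketch (the Redis `SET key value EX 60` call itself is omitted; bit i represents day i from today, an assumption about the encoding):

```python
def pack_bitmap(availability):
    """Pack a list of per-day booleans (day 0 = today) into bytes for the cache."""
    bits = 0
    for i, free in enumerate(availability):
        if free:
            bits |= 1 << i
    return bits.to_bytes((len(availability) + 7) // 8, "little")

def range_free(bitmap_bytes, start_day, nights):
    """Check that all nights in [start_day, start_day + nights) are free."""
    bits = int.from_bytes(bitmap_bytes, "little")
    mask = ((1 << nights) - 1) << start_day
    return bits & mask == mask
```

A 90-day bitmap is 12 bytes, so even millions of hot listings fit comfortably in memory alongside the 60-second TTL described above.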

Blocking Dates (Booking): To block dates for a booking, we begin a database transaction, use SELECT FOR UPDATE to lock the relevant rows in the availability calendar for the date range, check if all dates are currently available, update all dates to set is_available to false, set booking_id to the new booking ID, set blocked_reason to ‘booked’, commit the transaction, invalidate Redis cache for the listing, and publish an availability changed event to Kafka for search index updates.
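The check-then-update transaction can be sketched as below. For a self-contained example this uses SQLite, where `BEGIN IMMEDIATE` serializes writers and stands in for PostgreSQL's `SELECT ... FOR UPDATE` row locks; the cache invalidation and Kafka publish are left as comments:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("""CREATE TABLE availability_calendar (
    listing_id INTEGER, date TEXT, is_available INTEGER DEFAULT 1,
    booking_id INTEGER, blocked_reason TEXT,
    PRIMARY KEY (listing_id, date))""")
for d in ("2025-06-10", "2025-06-11", "2025-06-12"):
    conn.execute("INSERT INTO availability_calendar (listing_id, date) VALUES (1, ?)", (d,))

def block_dates(listing_id, dates, booking_id):
    """Atomically mark dates as booked; returns False if any date is taken."""
    placeholders = ",".join("?" * len(dates))
    conn.execute("BEGIN IMMEDIATE")  # PostgreSQL: SELECT ... FOR UPDATE on these rows
    (free,) = conn.execute(
        f"""SELECT COUNT(*) FROM availability_calendar
            WHERE listing_id = ? AND date IN ({placeholders}) AND is_available = 1""",
        (listing_id, *dates)).fetchone()
    if free != len(dates):
        conn.execute("ROLLBACK")
        return False
    conn.execute(
        f"""UPDATE availability_calendar
            SET is_available = 0, booking_id = ?, blocked_reason = 'booked'
            WHERE listing_id = ? AND date IN ({placeholders})""",
        (booking_id, listing_id, *dates))
    conn.execute("COMMIT")
    # In production: invalidate the Redis entries for this listing and
    # publish an availability-changed event to Kafka here.
    return True
```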

Unblocking Dates (Cancellation): To release dates, we begin a transaction, lock the relevant calendar rows, update dates to set is_available back to true, clear the booking_id, clear the blocked_reason, commit the transaction, invalidate cache, and publish events.

Minimum Stay Validation: Before accepting a booking, we validate minimum stay requirements by querying the maximum minimum_stay value across the date range. If the actual stay duration is less than this maximum, we reject the booking with an error message indicating the required minimum stay.

Conflict Resolution: The combination of distributed locks (in the Booking Service) and database row locking (SELECT FOR UPDATE) ensures that concurrent attempts to book the same dates are serialized. Only one booking succeeds while others receive conflict errors.

Deep Dive 6: How do we process payments securely with escrow and handle edge cases?

Payment processing requires PCI compliance, escrow management, fraud prevention, refund handling, and host payouts with platform fees.

Challenge: Payment Lifecycle Management

The payment flow is complex: authorize payment when booking is created (don’t charge yet), hold funds in escrow until check-in completion, handle booking cancellations with policy-based refunds, capture payment 24 hours after check-in to allow issue resolution, distribute payouts to hosts minus platform fees, manage payment failures and retries, handle disputes and chargebacks, and support multiple currencies and payment methods.

Solution: Payment Service with Stripe Integration

Authorization Flow: When a booking is created, the Payment Service is called to authorize payment. We retrieve the guest’s stored payment method from Stripe (payment method ID stored in our User database). We generate an idempotency key using booking ID and timestamp to prevent duplicate charges if the request is retried. We call Stripe’s PaymentIntent API with the amount in cents, currency, customer ID, payment method ID, and critically, capture_method set to “manual”. This authorizes the payment and places a hold but doesn’t capture funds yet. Stripe validates the payment method, checks for sufficient funds or credit, performs fraud checks, and returns a PaymentIntent ID. We store a payment record in our database with status “authorized” and the PaymentIntent ID.

Escrow Period: After authorization, funds are held by Stripe but not transferred to our platform account yet. The guest sees the charge as pending on their card statement. We maintain the booking in “confirmed” status. We monitor for check-in completion based on the check-in date. After check-in occurs, we wait an additional 24 hours (escrow period) to allow for issue resolution. This protects both parties: guests can report serious issues before we capture payment, and hosts are assured of payment after successful check-in.

Payment Capture: A scheduled job (cron or Lambda) runs hourly to identify bookings ready for capture where check-in date has passed, 24 hours have elapsed since check-in, payment status is “authorized”, and no disputes are open. For each eligible booking, we call Stripe’s PaymentIntent capture API, update our payment record to status “captured”, record the capture timestamp, and trigger the host payout flow.
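The eligibility filter the scheduled job applies can be written as a small predicate (a sketch; the booking record fields are hypothetical names for the attributes listed above):

```python
from datetime import datetime, timedelta

ESCROW_PERIOD = timedelta(hours=24)  # post-check-in wait from the text

def ready_for_capture(booking, now):
    """Mirror the job's filter: check-in plus the 24h escrow period has passed,
    payment is still authorized, and no dispute is open."""
    return (booking["check_in_at"] <= now - ESCROW_PERIOD
            and booking["payment_status"] == "authorized"
            and not booking["has_open_dispute"])
```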

Host Payout: After capturing payment from the guest, we process the host payout. We retrieve the host’s Stripe Connect account ID from our User database. We calculate the platform fee (typically 3% host service fee) and the net payout amount. We call Stripe’s Transfer API to send funds from our platform account to the host’s connected account with metadata including booking ID. We record the payout in our database with the transfer ID, payout amount, platform fee, and status “completed”.

Refund Handling: When a booking is cancelled, the refund amount depends on the cancellation policy (flexible, moderate, strict). The Payment Service calculates the refund amount based on the policy and time until check-in. If payment is still in “authorized” status, we cancel the authorization via Stripe’s PaymentIntent cancel API, releasing the hold. If payment is already “captured”, we create a Stripe Refund for the calculated amount. We record the refund in our database with the refund ID, amount, reason, and timestamp. We update the booking status to “cancelled” and publish events.
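The policy-based refund calculation might look like the sketch below. The text names the three policies but not their cutoffs, so the thresholds and percentages here are illustrative assumptions, not Airbnb's actual terms:

```python
from datetime import timedelta

def refund_fraction(policy, time_until_check_in):
    """Assumed policy table: flexible refunds fully until 24h out, moderate
    until 5 days out (50% after), strict gives 50% until 7 days out, then 0."""
    if policy == "flexible":
        return 1.0 if time_until_check_in >= timedelta(hours=24) else 0.5
    if policy == "moderate":
        return 1.0 if time_until_check_in >= timedelta(days=5) else 0.5
    if policy == "strict":
        return 0.5 if time_until_check_in >= timedelta(days=7) else 0.0
    raise ValueError(f"unknown policy: {policy}")

def refund_amount_cents(total_cents, policy, time_until_check_in):
    """Amounts in cents to avoid floating-point money arithmetic."""
    return round(total_cents * refund_fraction(policy, time_until_check_in))
```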

Idempotency: All payment operations use idempotency keys to ensure requests can be safely retried without duplicating charges. Stripe guarantees that requests with the same idempotency key return the same result. We store idempotency keys in our database and check for duplicates before processing.
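The key derivation and duplicate check can be sketched as follows (an in-memory dict stands in for the database table of stored keys; the `do_charge` callable stands in for the Stripe PaymentIntent request):

```python
import hashlib

class IdempotentPayments:
    """Sketch of the dedupe check: the first call with a key runs the charge;
    replays of the same key return the stored result without charging again."""

    def __init__(self):
        self._results = {}  # production: a unique-keyed database table

    def key_for(self, booking_id, attempt_ts):
        """Derive a stable key from booking ID and timestamp, as described above."""
        return hashlib.sha256(f"{booking_id}:{attempt_ts}".encode()).hexdigest()

    def charge(self, key, do_charge):
        if key in self._results:       # duplicate request: replay stored outcome
            return self._results[key]
        result = do_charge()           # e.g. the Stripe PaymentIntent call
        self._results[key] = result
        return result
```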

Error Handling: Payment failures (declined cards, insufficient funds, fraud blocks) are handled by returning specific error codes to the client. We implement retry logic with exponential backoff for transient failures. Failed payments result in booking cancellation after retry exhaustion. We log all payment operations for audit and debugging.

Deep Dive 7: How do we personalize search results with machine learning?

Generic search results don’t optimize for individual user preferences. Personalized ranking can significantly improve booking conversion rates.

Challenge: Search Result Ranking

Different users have different preferences: some prioritize price, others prefer highly rated properties. Location relevance varies by trip purpose (business vs. leisure). Previous booking history indicates preferences. Listing quality and host reputation matter differently to different users. Engagement signals (views, wishlists) provide implicit feedback.

Solution: Two-Stage Ranking (Retrieval + Ranking)

Stage 1: Retrieval (Elasticsearch): This stage quickly generates candidates (1,000s of listings) based on hard filters. We perform geospatial queries to find listings in the target location. We apply filters for price range, dates, amenities, property type, and capacity. We use Elasticsearch scoring for basic relevance (text match, distance). This stage prioritizes recall and speed, returning many candidates quickly.

Stage 2: Ranking (ML Model): This stage re-ranks the top candidates (50-100) for personalization. We extract detailed features for each listing, apply an ML model to predict booking probability, and sort by predicted score to optimize for conversion.

Feature Extraction: For each listing-user pair, we extract rich features. Listing quality features include average rating, review count, host response rate, superhost status, instant book availability, and listing age. Price features include price per night, price percentile in the area, and price relative to user’s search budget. Location features include distance from search center, neighborhood popularity score, and proximity to landmarks. Availability features include available nights in the next 30 days and recent booking rate. User personalization features include user’s previous booking count, average price of past bookings, preferred property types inferred from history, amenity preferences, and whether the user has booked with this host before. Engagement features include listing views in the last 7 days, wishlist count, and conversion rate (views to bookings ratio). Time-based features include whether the search is for weekends, days until check-in, and time of day for the search.

Model Architecture: We use a Learning to Rank (LTR) approach with LambdaMART or a neural network architecture. The model is trained on historical search and booking data (100+ million samples). The label is binary: 1 if the user booked the listing, 0 if they viewed but didn’t book. The loss function optimizes ranking quality (NDCG, MAP) rather than classification accuracy. Features include the 50+ features described above plus feature crosses.

Training Pipeline: We collect training data from search logs and booking events. We sample negative examples (viewed but not booked) at a 10:1 ratio relative to positive examples. We split data by time to prevent lookahead bias. We retrain the model weekly on fresh data. We evaluate using offline metrics (NDCG, MRR) and online A/B tests (booking conversion rate). We gradually roll out model updates to monitor for regressions.
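The offline ranking metric mentioned above is standard. As a concrete reference, NDCG@k for a single query with binary labels (1 = booked, 0 = viewed only) is:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one query: `relevances` is the model's ranked list of true
    labels. DCG discounts relevance by log2 of position; dividing by the
    ideal ordering's DCG normalizes the score into [0, 1]."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Putting the booked listing first scores 1.0; burying it at position 3 scores 0.5, which is the kind of gap weekly retraining is meant to close.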

Online Serving: When a search request arrives, the Search Service retrieves 1,000 candidates from Elasticsearch. It sends the top 100 candidates to the ML Ranking Service with user context. The ranking service extracts features for each listing-user pair in parallel. It runs the model to predict booking probability for each listing. It sorts listings by predicted score and returns the ranked order. The Search Service returns the top 20 results to the client. Results are personalized per user while maintaining fairness to hosts.

Performance Optimization: Feature extraction is parallelized across listings. Hot features are cached in Redis (user preferences, listing aggregate stats). We use model serving infrastructure (TensorFlow Serving, SageMaker) for low latency. We set a tight SLA for ranking (adding at most 50ms to search latency). We fall back to non-personalized ranking if the ML service is slow or unavailable.

Step 4: Wrap Up

In this chapter, we proposed a system design for a vacation rental marketplace like Airbnb. If there is extra time at the end of the interview, here are additional points to discuss:

Additional Features:

  • Messaging system: Real-time chat between guests and hosts using WebSockets for instant communication.
  • Advanced calendar management: Hosts can set custom pricing by date, seasonal rules, and last-minute discounts.
  • Wishlists and saved searches: Users can save favorite listings and receive notifications when prices drop.
  • Experiences and activities: Expand beyond accommodations to bookable experiences hosted by locals.
  • Multi-listing management: Tools for professional hosts managing multiple properties.

Scaling Considerations:

  • Horizontal Scaling: All services are stateless and can scale independently based on load. The Listing Service, Search Service, Booking Service, and Payment Service can each scale to hundreds of instances.
  • Database Sharding: Shard listings and bookings by geographic region for locality and performance. User data can be sharded by user ID hash.
  • Read Replicas: Use read replicas for analytics queries and non-critical reads to offload the primary database.
  • Elasticsearch Scaling: Add nodes to the Elasticsearch cluster to handle increased query load and index size.
  • Message Queue Scaling: Use partitioned Kafka topics for parallel event processing across consumers.

Error Handling:

  • Network Failures: Implement retry logic with exponential backoff for all external API calls (Stripe, mapping services).
  • Service Failures: Use circuit breakers to prevent cascading failures when downstream services are unhealthy.
  • Database Failures: Automatic failover to replica databases with minimal downtime.
  • Payment Failures: Queue failed payments for retry with alerting for manual intervention if needed.
  • Search Degradation: Fall back to simpler queries or cached results if Elasticsearch is struggling.

Security Considerations:

  • Encrypt sensitive data in transit using TLS 1.3 for all API communication.
  • Encrypt sensitive data at rest including payment details and personal information.
  • Implement rate limiting per user and per IP to prevent abuse and DDoS attacks.
  • Use JWT tokens with short expiration for authentication and authorization.
  • Perform input validation and sanitization to prevent injection attacks.
  • Implement CSRF protection for state-changing operations.
  • Regular security audits and penetration testing.

Monitoring and Analytics:

  • Track key metrics: search latency (p50, p95, p99), booking conversion rate, payment success rate, double booking incidents (target: zero), cache hit ratio, and service health.
  • Implement distributed tracing (using tools like Jaeger or X-Ray) to track requests across microservices.
  • Set up alerting for anomalies: search latency exceeding SLA, booking errors, payment failures, Elasticsearch cluster health issues, and database replication lag.
  • Build dashboards for business metrics: bookings per day, revenue, host earnings, guest acquisition, and platform growth.
  • A/B testing framework for pricing strategies, search ranking algorithms, and UI changes.

Future Improvements:

  • Real-time pricing: Continuously adjust prices based on live demand signals and competitor pricing.
  • Smart calendars: ML-powered availability and pricing suggestions for hosts to maximize revenue.
  • Video tours: Support for 360-degree video walkthroughs of properties.
  • AI-powered customer support: Chatbots for common guest and host questions.
  • Blockchain payments: Support for cryptocurrency payments for international transactions.
  • Carbon footprint tracking: Calculate and display environmental impact of trips to support sustainable travel.
  • Voice search: Integration with smart assistants for voice-based search and booking.

Congratulations on getting this far! Designing Airbnb is a complex system design challenge that touches on many important distributed systems concepts including geospatial search, distributed locking, payment processing, fraud detection, and machine learning. The key is to start simple, satisfy functional requirements first, then layer in the non-functional requirements and optimizations.


Summary

This comprehensive guide covered the design of a vacation rental marketplace like Airbnb, including:

  1. Core Functionality: Listing management, geospatial search, booking with concurrency control, payment processing with escrow, and dual-blind review system.
  2. Key Challenges: Efficient search at scale, preventing double bookings, dynamic pricing optimization, review fraud detection, calendar conflict resolution, and payment lifecycle management.
  3. Solutions: Elasticsearch for geospatial search, distributed locks with Redis for booking concurrency, ML-based pricing and ranking models, multi-layered fraud detection, and Stripe integration for payments.
  4. Scalability: Horizontal service scaling, database sharding, caching strategies, async processing, and message queues for reliability.

The design demonstrates how to handle complex marketplace dynamics with strong consistency requirements, sophisticated fraud prevention, and personalization at scale while maintaining high availability and performance.