Design Postmates
Postmates is an on-demand delivery platform that connects customers with couriers who can deliver anything from local stores and restaurants. The system must handle millions of orders daily, match couriers efficiently, optimize routes for multi-stop deliveries, and provide real-time tracking with accurate ETAs.
Designing Postmates presents unique challenges including real-time courier location tracking, efficient multi-stop route optimization, dynamic ETA calculation with traffic data, batching multiple orders for the same courier, and handling peak demand during meal times while maintaining delivery quality.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.
Functional Requirements
Core Requirements (Priority 1-3):
- Customers should be able to browse merchant catalogs (restaurants, stores, pharmacies) and place orders for delivery.
- Customers should be able to track orders in real-time with live courier location and receive accurate ETA updates.
- Upon order placement, the system should match a courier efficiently, considering multi-stop delivery optimization.
- Couriers should be able to accept/reject delivery offers, navigate with optimized routes, and handle multi-stop deliveries.
Below the Line (Out of Scope):
- Customers should be able to schedule deliveries for specific times.
- Merchants should be able to manage their catalog, prices, and availability through a dashboard.
- Merchants should be able to integrate with their POS systems.
- Customers should be able to rate couriers and merchants post-delivery.
- The system should handle promotions, discounts, and loyalty programs.
Non-Functional Requirements
Core Requirements:
- The system should prioritize low latency courier matching (< 3 seconds).
- The system should ensure real-time tracking updates with < 1 second latency.
- The system should maintain strong consistency for payment transactions and order state transitions.
- The system should handle peak traffic during lunch and dinner hours (3x average load, approximately 2000 orders/second).
Below the Line (Out of Scope):
- The system should ensure 99.99% uptime for order placement.
- The system should comply with data privacy regulations and secure user information.
- The system should be resilient to failures with robust monitoring and alerting.
- The system should support fraud detection and prevention mechanisms.
Clarification Questions & Assumptions:
- Platform: Mobile apps for customers and couriers, web dashboard for merchants.
- Scale: 10 million daily active users, 500,000 active couriers globally, 1 million merchants.
- Order Volume: 5 million orders per day with peak hours handling 2000 orders per second.
- Location Update Frequency: Couriers update their location roughly every 5 seconds, resulting in approximately 100,000 updates per second.
- Geographic Coverage: Global coverage with focus on major metropolitan areas.
- Payment: Integrated with third-party payment processors (Stripe, PayPal).
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.
Defining the Core Entities
To satisfy our key functional requirements, we’ll need the following entities:
Customer: Any user who uses the platform to order items for delivery. Includes personal information, delivery addresses, payment methods, and order history.
Courier: Any user registered as a courier on the platform who provides delivery services. Contains personal details, vehicle information, current location, availability status, capacity (how many orders they can handle), and performance metrics (rating, acceptance rate).
Merchant: Any business that offers items for delivery. Includes business details, catalog of items with prices and availability, delivery zones, operating hours, and POS integration details.
Order: An individual order from creation to delivery completion. Records customer identity, merchant identity, ordered items, pricing breakdown, delivery address, order state (created, confirmed, assigned, picked up, delivered), and timestamps for each state transition.
Fare: An estimated delivery fee for an order. Includes pickup and destination locations, distance, estimated delivery time, base fee, surge multiplier, and total estimated cost.
Route: The optimized path for a courier handling multiple orders. Contains an ordered list of stops (pickups and dropoffs), estimated travel time for each segment, total distance, and constraints (time windows, precedence).
API Design
Create Order Endpoint: Used by customers to place an order after selecting items from a merchant catalog.
POST /orders -> Order
Body: {
merchantId: string,
items: [{ itemId, quantity, specialInstructions }],
deliveryAddress: { lat, long, streetAddress },
paymentMethodId: string
}
Get Fare Estimate Endpoint: Used by customers to get an estimated delivery fee before confirming their order.
POST /fare -> Fare
Body: {
merchantLocation: { lat, long },
deliveryLocation: { lat, long },
orderSubtotal: number
}
Update Courier Location Endpoint: Used by couriers to update their location in real-time. Called periodically by the courier client.
POST /couriers/location -> Success/Error
Body: {
lat: number,
long: number,
heading: number
}
Note: The courierId is present in the session cookie or JWT and not in the body or path params. Always consider security implications - never trust data sent from the client as it can be easily manipulated.
Accept Delivery Offer Endpoint: Allows couriers to accept or reject a delivery offer. Upon acceptance, the system updates the order status and provides the courier with navigation details.
PATCH /deliveries/:orderId -> Delivery
Body: {
action: "accept" | "reject"
}
Update Order Status Endpoint: Used by couriers to update the order status as they progress through the delivery (arrived at merchant, picked up, delivered).
PATCH /orders/:orderId/status -> Order
Body: {
status: "at_merchant" | "picked_up" | "delivered",
timestamp: ISO8601,
location: { lat, long }
}
High-Level Architecture
Let’s build up the system sequentially, addressing each functional requirement:
1. Customers should be able to browse merchant catalogs and place orders for delivery
The core components necessary to fulfill order placement are:
- Customer Client: The primary touchpoint for users, available on iOS and Android. Interfaces with the system’s backend services to browse catalogs, place orders, and track deliveries.
- API Gateway: Acts as the entry point for client requests, routing requests to appropriate microservices. Manages cross-cutting concerns such as authentication, rate limiting, and request validation.
- Merchant Service: Manages merchant profiles, catalogs, and availability. Stores menu items with prices, descriptions, and availability status. Provides search functionality to help customers discover merchants and items.
- Order Service: Manages order lifecycle from creation to completion. Validates orders against merchant catalog, calculates pricing, processes order creation, manages order state transitions, and maintains order history.
- Pricing Engine: Calculates delivery fees based on distance, demand, time of day, and other factors. Implements dynamic pricing (surge pricing) during peak hours and applies promotions and discounts.
- Database: Stores Order, Merchant, and Customer entities. Uses PostgreSQL for transactional data requiring strong consistency.
- Search Index: Uses Elasticsearch to enable fast full-text search across merchant catalogs. Indexed by item name, merchant name, cuisine type, and location for efficient discovery.
Order Placement Flow:
- The customer browses merchants and items through the client app, which queries the Merchant Service via the API Gateway.
- The customer selects items and adds them to their cart, then requests a fare estimate by calling the Pricing Engine.
- The Pricing Engine calculates the delivery fee based on distance, time, and demand, returning an estimate to the customer.
- The customer confirms the order, sending a POST request to the Order Service with order details.
- The Order Service validates item availability with the Merchant Service, creates a new Order entity in the database with status “created”, and initiates the courier matching process.
- The Order Service returns the order confirmation to the customer, including estimated delivery time.
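The order placement step can be sketched as a single validation-and-create function. This is a minimal sketch, assuming a hypothetical in-memory catalog keyed by item ID; in the real design the Order Service would call the Merchant Service and write to PostgreSQL.

```python
from datetime import datetime, timezone
from uuid import uuid4

# Hypothetical in-memory catalog: itemId -> (price_cents, available)
CATALOG = {
    "burger": (1099, True),
    "fries": (399, True),
    "shake": (549, False),  # merchant marked it unavailable
}

def place_order(merchant_id: str, items: list, delivery_address: dict) -> dict:
    """Validate items against the merchant catalog and create an Order
    in the 'created' state (the first state in the order lifecycle)."""
    subtotal = 0
    for line in items:
        entry = CATALOG.get(line["itemId"])
        if entry is None or not entry[1]:
            raise ValueError(f"item unavailable: {line['itemId']}")
        subtotal += entry[0] * line["quantity"]
    return {
        "orderId": str(uuid4()),
        "merchantId": merchant_id,
        "items": items,
        "subtotalCents": subtotal,
        "status": "created",
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }

order = place_order("m1", [{"itemId": "burger", "quantity": 2}],
                    {"lat": 37.7, "long": -122.4})
print(order["status"], order["subtotalCents"])  # created 2198
```

Rejecting an order whose items are unavailable (the `shake` above) happens before anything is persisted, which keeps the "created" state trustworthy for downstream matching.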
2. Customers should be able to track orders in real-time with live courier location and accurate ETA updates
We extend our existing design to support real-time tracking:
- Tracking Service: Provides real-time location streaming to customers. Manages WebSocket connections for live tracking, calculates dynamic ETAs based on courier location and traffic, and sends push notifications for order updates.
- Location Service: Manages real-time location data of couriers. Receives location updates from couriers every 5 seconds, stores location in a geospatial data store for efficient proximity queries, and provides location data to the Tracking Service.
- Notification Service: Dispatches notifications to customers and couriers. Sends push notifications via APNs and FCM for order status updates, uses SMS for critical events, and sends email notifications for receipts.
- Redis Geo Store: Uses Redis with GEO commands to store courier locations with geospatial indexing. Enables efficient proximity searches and supports TTL for stale location cleanup.
Real-Time Tracking Flow:
- After an order is placed and a courier is assigned, the customer opens the tracking screen in the app.
- The client establishes a WebSocket connection with the Tracking Service.
- The courier continuously sends location updates to the Location Service as they move.
- The Location Service stores locations in Redis using GEOADD and publishes updates to a pub/sub channel.
- The Tracking Service subscribes to location updates for active orders and pushes them to connected customers via WebSocket.
- The Tracking Service recalculates ETA every 30 seconds using the courier’s current location, traffic data from mapping APIs, and remaining stops.
3. Upon order placement, the system should match a courier efficiently
We need to introduce new components to facilitate courier matching:
- Courier Service: Manages courier profiles, availability, and status. Tracks courier capacity (how many active orders), maintains performance metrics (rating, acceptance rate, on-time percentage), and handles courier earnings calculations.
- Dispatch Engine: The brain of Postmates, responsible for matching orders to couriers. Implements sophisticated algorithms considering distance, courier capacity, current route compatibility, ratings, and predicted acceptance probability. Supports batching multiple orders to the same courier for efficiency.
- Routing Service: Calculates optimal routes for multi-stop deliveries. Integrates with mapping APIs (Google Maps, Mapbox) for distance and time calculations, factors in real-time traffic data, and solves the Traveling Salesman Problem with precedence constraints (pickup before dropoff).
- Message Queue (Kafka): Ensures order requests are not dropped during peak demand. Provides durable storage of order events, enables asynchronous processing of courier matching, and supports retry logic for failed matches.
Courier Matching Flow:
- When an order is created, the Order Service publishes an order event to Kafka.
- The Dispatch Engine consumes order events from Kafka and begins the matching workflow.
- The Dispatch Engine queries the Location Service for available couriers within the delivery zone using Redis GEORADIUS command.
- For each potential courier, the Dispatch Engine calculates a matching score based on multiple factors: distance to merchant, current capacity, courier rating, acceptance rate, and route compatibility if they have existing orders.
- The Dispatch Engine sorts couriers by score and attempts to acquire a distributed lock on the top-ranked courier using Redis SET with NX flag.
- If the lock is acquired, a delivery offer is sent to the courier via push notification.
- The system waits for courier response with a timeout (typically 10 seconds).
- If the courier accepts, the order is assigned and the lock is released. If they reject or timeout, the system tries the next courier in the ranked list.
4. Couriers should be able to accept/reject delivery offers and navigate with optimized routes
The final piece involves courier interaction and route optimization:
- Courier Client: Interface for couriers to receive delivery offers, view order details, and navigate. Provides GPS navigation with turn-by-turn directions, allows updating order status at each step, and displays earnings information.
- Route Optimizer: Advanced component within the Routing Service that solves multi-stop optimization problems. When a courier has multiple orders, it determines the optimal sequence of pickups and dropoffs to minimize total travel time while respecting constraints (pickup must precede dropoff for each order, time windows for food freshness).
Courier Accept and Navigation Flow:
- The courier receives a push notification for a new delivery offer.
- The courier opens the app and sees order details including merchant location, delivery location, estimated earnings, and estimated time.
- The courier accepts the offer by sending a PATCH request to the Order Service.
- The Order Service updates the order status to “assigned” and publishes an event.
- If the courier already has active orders, the Routing Service recalculates the optimal route considering all pickups and dropoffs.
- The Route Optimizer determines the best sequence of stops that minimizes detour while meeting delivery time constraints.
- The Courier Client displays the optimized route with navigation instructions.
- As the courier completes each step (arrives at merchant, picks up, arrives at customer, delivers), they update the order status, triggering notifications to the customer.
Step 3: Design Deep Dive
With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.
Deep Dive 1: How do we efficiently match orders to couriers while supporting multi-stop deliveries?
The Dispatch Engine is the most critical component of Postmates. It must match orders to couriers quickly while optimizing for multiple objectives: fast delivery, courier efficiency, and customer satisfaction. Unlike simple ride-sharing where a driver picks up one passenger and drops them off, Postmates couriers often handle multiple orders simultaneously.
The Matching Challenge:
We need to consider several factors when scoring potential courier-order matches:
- Distance to Merchant: Couriers closer to the merchant’s location can pick up orders faster, reducing total delivery time. We calculate the great-circle distance (haversine formula) initially, then use cached routing data for accuracy.
- Courier Capacity: Each courier can handle multiple orders, but there’s a practical limit (typically 3-4 orders). We prioritize couriers with available capacity who can efficiently batch orders.
- Route Compatibility: If a courier already has active orders, we need to evaluate how adding this new order affects their route. A new order that’s “on the way” is much more efficient than one requiring a significant detour.
- Courier Quality Metrics: We factor in courier rating (customer satisfaction), acceptance rate (how often they accept offers), and completion rate (how often they complete accepted orders).
- Time to Pickup: We predict when the courier will arrive at the merchant versus when the food will be ready. Ideally, the courier arrives right when the food is ready to minimize wait time.
Scoring Algorithm:
The Dispatch Engine assigns a score to each potential courier based on these weighted factors. The scoring function combines multiple components:
- Distance component: Higher scores for closer couriers, using an inverse distance formula.
- Capacity component: Bonus points for couriers who have capacity for more orders.
- Rating component: Multiply by the courier’s average rating.
- Acceptance rate component: Favor couriers with high historical acceptance rates to reduce matching latency.
- Route compatibility component: Calculate the detour cost if the courier has existing orders, penalizing routes that require significant extra distance.
- Timing component: Bonus if the courier’s ETA to merchant aligns with the food preparation time.
After scoring all potential couriers, the system sorts them in descending order and attempts to assign the order sequentially, starting with the highest-scored courier.
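A minimal scoring sketch, with illustrative weights (the real weights would be tuned, likely by ML): each component maps to one bullet above, and couriers are ranked by the combined score.

```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def match_score(courier, merchant_loc, prep_minutes, detour_km=0.0):
    """Weighted combination of the matching factors; weights are illustrative."""
    dist_km = haversine_km(courier["loc"], merchant_loc)
    distance = 1.0 / (1.0 + dist_km)             # inverse distance: closer is better
    capacity = 0.2 if courier["active_orders"] < courier["max_orders"] else 0.0
    quality = courier["rating"] / 5.0 * courier["acceptance_rate"]
    detour = -0.1 * detour_km                    # penalize detours on batched routes
    eta_min = dist_km / 0.4                      # assume ~24 km/h urban travel speed
    timing = 0.15 if abs(eta_min - prep_minutes) < 3 else 0.0  # arrive as food is ready
    return distance + capacity + quality + detour + timing

couriers = [
    {"id": "c1", "loc": (37.776, -122.420), "active_orders": 0, "max_orders": 3,
     "rating": 4.9, "acceptance_rate": 0.95},
    {"id": "c2", "loc": (37.800, -122.500), "active_orders": 3, "max_orders": 3,
     "rating": 4.2, "acceptance_rate": 0.60},
]
merchant = (37.775, -122.418)
ranked = sorted(couriers, key=lambda c: match_score(c, merchant, prep_minutes=10),
                reverse=True)
print([c["id"] for c in ranked])  # c1 first: closer, has capacity, better metrics
```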
Batching Strategy:
When a new order comes in, the Dispatch Engine checks if any nearby orders can be batched together. Batching is beneficial when:
- Multiple orders are from the same merchant (easy to pick up together).
- Orders are from nearby merchants (within 1km) and going to nearby customers (within 2km).
- Adding the batched order doesn’t violate delivery time constraints.
The system uses Redis GEORADIUS to find pending orders near the new order’s merchant location. It then evaluates each potential batch for efficiency, calculating whether batching reduces total courier travel time while meeting delivery promises.
Multi-Pickup Optimization:
When a courier already has orders, adding a new order requires solving an insertion problem: where in the current route should we insert the new pickup and dropoff to minimize additional distance?
We try inserting the new pickup at each possible position in the route, and for each pickup position, try inserting the dropoff at each valid position after it (respecting the constraint that pickup must precede dropoff). We calculate the total distance for each valid route configuration and select the one with minimum added distance, provided it doesn’t violate time constraints.
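The insertion search above can be sketched directly. This uses planar distances and ignores time-window checks for brevity; stop names and structure are illustrative.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])  # planar distance for the sketch

def route_length(start, stops):
    total, cur = 0.0, start
    for s in stops:
        total += dist(cur, s["loc"])
        cur = s["loc"]
    return total

def best_insertion(start, route, pickup, dropoff):
    """Try every (pickup, dropoff) insertion pair with pickup before dropoff
    and return the route with the minimum total length."""
    best, best_len = None, float("inf")
    for i in range(len(route) + 1):              # pickup insertion position
        for j in range(i + 1, len(route) + 2):   # dropoff must come after pickup
            candidate = route[:i] + [pickup] + route[i:]
            candidate = candidate[:j] + [dropoff] + candidate[j:]
            length = route_length(start, candidate)
            if length < best_len:
                best, best_len = candidate, length
    return best, best_len

# Existing route: pickup P1 then dropoff D1; the new order is "on the way"
existing = [{"stop": "P1", "loc": (1, 0)}, {"stop": "D1", "loc": (3, 0)}]
new_pickup = {"stop": "P2", "loc": (1.5, 0)}
new_dropoff = {"stop": "D2", "loc": (2.5, 0)}
best, best_len = best_insertion((0, 0), existing, new_pickup, new_dropoff)
print([s["stop"] for s in best], best_len)  # ['P1', 'P2', 'D2', 'D1'] 3.0
```

With N stops in the current route this evaluates O(N²) candidate routes, which is cheap at courier batch sizes of 3-4 orders.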
Deep Dive 2: How do we calculate accurate ETAs with real-time traffic data?
Accurate ETAs are crucial for customer satisfaction. An ETA that’s too optimistic leads to disappointment, while one that’s too pessimistic may deter orders. The challenge is that ETAs depend on multiple dynamic factors.
ETA Components:
A complete ETA calculation considers several stages:
- Courier to Merchant Travel Time: If the courier is not yet at the merchant, we need to estimate how long it takes them to reach the merchant location. This uses real-time traffic data from mapping APIs.
- Merchant Preparation Time: The time the merchant needs to prepare the order. This is estimated based on historical data for the merchant, current order volume, and item complexity. The merchant can also update this estimate in real-time.
- Wait Time at Merchant: If the courier arrives before the food is ready, they must wait. This is calculated as the maximum of zero and the difference between prep time and courier travel time.
- Merchant to Customer Travel Time: The delivery leg, from merchant to customer location. This also uses real-time traffic data and is recalculated as the courier moves.
- Buffer Time: A small buffer (typically 2 minutes) accounts for parking, walking to the door, and handoff time.
The total ETA is the sum of these components, providing the customer with an estimated delivery time.
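As a small worked sketch: since wait time is max(0, prep − travel), the travel-plus-wait pair collapses to max(travel, prep). The numbers here are illustrative.

```python
def estimate_eta(courier_to_merchant_min: float, prep_min: float,
                 merchant_to_customer_min: float, buffer_min: float = 2.0) -> float:
    """Sum the ETA stages. The courier leg and prep overlap: the courier
    waits only if they arrive before the food is ready, so
    travel + max(0, prep - travel) == max(travel, prep)."""
    time_to_pickup = max(courier_to_merchant_min, prep_min)
    return time_to_pickup + merchant_to_customer_min + buffer_min

# Courier needs 8 min, food takes 12 min: the 4-minute wait is absorbed by max()
print(estimate_eta(8, 12, 15))   # 29.0 minutes
# Courier needs 14 min, food takes 8 min: food waits for the courier instead
print(estimate_eta(14, 8, 10))   # 26.0 minutes
```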
Dynamic ETA Updates:
ETAs are not static—they should update as circumstances change:
- As the courier moves toward the merchant, we recalculate the remaining travel time every 30 seconds using their current location.
- If the merchant updates the prep time (e.g., they’re busier than expected), we immediately recalculate the ETA.
- If traffic conditions change significantly, the mapping API returns updated travel times, and we push new ETAs to the customer.
- After pickup, the ETA simplifies to just delivery time plus buffer.
Traffic Data Integration:
We integrate with third-party mapping APIs (Google Maps Directions API, Mapbox) to get real-time traffic data. These APIs provide “duration_in_traffic” which considers current traffic conditions, accidents, and road closures.
To reduce API costs, we implement aggressive caching:
- Cache travel time estimates between popular locations with a 5-minute TTL.
- Use historical traffic patterns as fallbacks during API failures.
- Batch multiple ETA calculations when possible.
Machine Learning for Prep Time:
Rather than relying solely on merchant estimates, we can use historical data to predict preparation times more accurately. We train models that consider:
- Merchant characteristics (cuisine type, kitchen capacity).
- Time of day and day of week (busier during lunch/dinner rush).
- Current order volume at the merchant.
- Item complexity (simple drinks vs. multi-course meals).
- Historical prep times for similar orders.
This ML-based approach can reduce ETA prediction errors by 30-40% compared to static estimates.
Deep Dive 3: How do we optimize routes for multi-stop deliveries?
When a courier handles multiple orders simultaneously, determining the optimal sequence of pickups and dropoffs is a complex optimization problem. This is a variant of the Traveling Salesman Problem (TSP) with precedence constraints.
The Challenge:
Given N orders, we have 2N stops (N pickups and N dropoffs). We need to find the sequence that minimizes total travel distance while respecting constraints:
- Each order’s pickup must occur before its dropoff.
- Each order must be delivered within its promised time window.
- Food quality considerations (hot food should not sit too long).
This is NP-hard, meaning there’s no known polynomial-time algorithm for the optimal solution. However, for small N (typically N ≤ 4 in practice), we can find good solutions quickly.
Solution Approaches:
For batches of 4 or fewer orders (8 stops), we can use brute force with optimizations:
- Generate all valid permutations of stops that respect precedence constraints.
- For each permutation, calculate the total route distance.
- Select the permutation with minimum distance that meets time constraints.
- This has factorial complexity but is fast enough for small N.
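The brute-force approach can be sketched as follows, using planar distances and skipping time-window checks for brevity. Order IDs and coordinates are illustrative.

```python
import math
from itertools import permutations

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def brute_force_route(start, orders):
    """orders: {order_id: (pickup_loc, dropoff_loc)}. Enumerate all stop
    permutations, keep those where each pickup precedes its dropoff, and
    return the shortest. Feasible only for small N (2N stops)."""
    stops = []
    for oid, (p, d) in orders.items():
        stops.append(("P", oid, p))
        stops.append(("D", oid, d))
    best, best_len = None, float("inf")
    for perm in permutations(stops):
        picked, valid = set(), True
        for kind, oid, _ in perm:
            if kind == "P":
                picked.add(oid)
            elif oid not in picked:   # dropoff scheduled before its pickup
                valid = False
                break
        if not valid:
            continue
        total, cur = 0.0, start
        for _, _, loc in perm:
            total += dist(cur, loc)
            cur = loc
        if total < best_len:
            best, best_len = perm, total
    return best, best_len

route, length = brute_force_route(
    (0, 0), {"a": ((1, 0), (3, 0)), "b": ((1.5, 0), (2.5, 0))})
print([(k, o) for k, o, _ in route], length)
# [('P', 'a'), ('P', 'b'), ('D', 'b'), ('D', 'a')] 3.0
```

For N = 4 orders there are 8! = 40,320 permutations before pruning, which is still well within real-time budgets.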
For larger batches or when we need faster results, we use heuristic approaches:
Nearest Neighbor with Constraints: Start at the courier’s current location and repeatedly choose the nearest valid next stop. A stop is valid only if:
- If it’s a dropoff, we’ve already picked up that order.
- Adding this stop doesn’t violate time constraints.
This greedy approach runs in polynomial time and typically produces routes within 15-20% of optimal.
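A sketch of the constrained nearest-neighbor heuristic (time-constraint checks omitted; coordinates are illustrative):

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_neighbor_route(start, orders):
    """Greedy: from the current position, always move to the nearest valid
    stop. A dropoff is valid only after its pickup has been visited."""
    pending = {}
    for oid, (p, d) in orders.items():
        pending[("P", oid)] = p
        pending[("D", oid)] = d
    picked, route, cur = set(), [], start
    while pending:
        valid = [(k, loc) for k, loc in pending.items()
                 if k[0] == "P" or k[1] in picked]
        key, loc = min(valid, key=lambda kv: dist(cur, kv[1]))
        if key[0] == "P":
            picked.add(key[1])
        route.append(key)
        del pending[key]
        cur = loc
    return route

route = nearest_neighbor_route(
    (0, 0), {"a": ((1, 0), (3, 0)), "b": ((1.5, 0), (2.5, 0))})
print(route)  # [('P', 'a'), ('P', 'b'), ('D', 'b'), ('D', 'a')]
```

On this small example the greedy route happens to match the optimum; in general it trades a bounded quality loss for O(N²) runtime.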
Production-Grade Optimization:
For production systems, we use specialized libraries like Google OR-Tools that implement sophisticated algorithms:
- OR-Tools provides a Vehicle Routing Problem (VRP) solver with support for pickup-delivery constraints.
- We model our problem as a VRP with a single vehicle (the courier).
- We create a distance matrix between all locations (courier’s current position plus all pickup and dropoff points).
- We add pickup-delivery pair constraints ensuring each pickup precedes its corresponding dropoff.
- We can also add time window constraints and capacity constraints.
- The solver uses advanced techniques like local search and constraint programming to find near-optimal solutions in seconds.
Real-World Considerations:
Beyond pure distance optimization, we consider:
- Traffic conditions: Some routes may be shorter but have worse traffic.
- Parking availability: Some merchants have easier pickup than others.
- Building access: High-rise apartments take longer than houses.
- Customer preferences: VIP customers might get priority in the route.
These factors can be incorporated as costs in the distance matrix or as additional constraints in the optimization problem.
Deep Dive 4: How do we prevent the same courier from being assigned multiple orders simultaneously?
We need strong consistency in courier assignment to ensure each courier only receives one delivery offer at a time. Without this, we might send multiple offers to the same courier, leading to confusion and inefficiency.
The Problem:
In a distributed system with multiple Dispatch Engine instances, two orders might be processed simultaneously, both identifying the same courier as the best match. Without coordination, both instances might send offers to that courier.
Solution: Distributed Locking
We use Redis-based distributed locking to ensure exclusive access to a courier during the matching process:
When the Dispatch Engine wants to send an offer to a courier, it attempts to acquire a lock using Redis SET command with the NX (Not eXists) flag and EX (EXpiry) flag. The command is: SET lock:courier:{courierId} {orderId} EX 10 NX.
The NX flag ensures this is an atomic test-and-set operation: the lock is only set if it doesn’t already exist. If the command succeeds, this instance has exclusive access to that courier. If it fails, another instance already holds the lock, so we skip this courier and try the next one.
The EX flag sets a 10-second TTL on the lock, ensuring that even if the Dispatch Engine instance crashes or network fails, the lock automatically expires, making the courier available again.
The Matching Loop:
The Dispatch Engine uses a while loop to iterate through ranked couriers:
- Get the next highest-scored courier from the ranked list.
- Attempt to acquire the lock for that courier.
- If lock acquisition succeeds, send a delivery offer notification to the courier.
- Wait for the courier’s response (accept/reject) or timeout (10 seconds).
- If the courier accepts, assign the order and release the lock.
- If the courier rejects or times out, release the lock and continue to the next courier.
- If lock acquisition fails (courier is already locked), immediately skip to the next courier.
This ensures that at any given time, each courier is considering at most one delivery offer, providing a clean user experience and preventing resource conflicts.
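The matching loop can be sketched against a minimal in-memory stand-in for Redis `SET ... NX EX` (a real deployment would use a Redis client and cross-process locks; `FakeRedis` and `send_offer` here are hypothetical):

```python
import time

class FakeRedis:
    """Minimal single-process stand-in for Redis SET ... NX EX semantics."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl_s):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return False  # lock already held and not yet expired
        self.store[key] = (value, time.monotonic() + ttl_s)
        return True

    def delete(self, key):
        self.store.pop(key, None)

def match_order(r, order_id, ranked_couriers, send_offer):
    """Walk the ranked list; lock each courier (SET NX EX 10) before offering."""
    for courier_id in ranked_couriers:
        if not r.set_nx_ex(f"lock:courier:{courier_id}", order_id, ttl_s=10):
            continue  # another dispatcher holds this courier; try the next one
        accepted = send_offer(courier_id, order_id)  # blocks until accept/reject/timeout
        r.delete(f"lock:courier:{courier_id}")       # release in every case
        if accepted:
            return courier_id
    return None  # exhausted the list; caller retries or dead-letters the order

r = FakeRedis()
r.set_nx_ex("lock:courier:c1", "other-order", ttl_s=10)  # c1 is mid-offer elsewhere
winner = match_order(r, "o42", ["c1", "c2"], lambda c, o: c == "c2")
print(winner)  # c2
```

One subtlety the sketch glosses over: before deleting, production code should verify the lock value still equals its own `order_id` (e.g. via a Lua script) so an instance that stalled past the TTL cannot release a lock now held by someone else.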
Deep Dive 5: How do we ensure no orders are dropped during peak demand or system failures?
During peak periods (lunch and dinner rush), the system receives thousands of orders per second. Additionally, individual service instances may crash or be restarted for deployments. We need to ensure no orders are lost.
Solution: Durable Message Queue
We use Apache Kafka as a durable message queue to decouple order creation from courier matching:
When an order is created, instead of immediately triggering the matching process, the Order Service publishes an order event to a Kafka topic. This write to Kafka is extremely fast and reliable—Kafka persists messages to disk and replicates them across multiple brokers.
The Dispatch Engine instances run as Kafka consumers in a consumer group. Kafka automatically distributes messages across consumers in the group, providing parallel processing. Each consumer processes messages from assigned partitions.
Effectively Exactly-Once Processing:
Kafka’s at-least-once delivery, combined with idempotent consumers, ensures each order is processed effectively once even if there are failures:
- Consumers only commit offsets after successfully completing processing.
- If a consumer crashes while processing a message, Kafka will redeliver that message to another consumer.
- Idempotent processing ensures that even if a message is processed twice, it doesn’t cause duplicate courier assignments.
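The idempotency half of this contract is the consumer's job. A minimal sketch, assuming a hypothetical event shape; in production the dedupe set would be a database unique constraint or a Redis key rather than process memory:

```python
processed = set()   # in production: a unique constraint or Redis key per order_id
assignments = {}

def handle_order_event(event):
    """Idempotent handler: a redelivered event (at-least-once) is a no-op."""
    oid = event["order_id"]
    if oid in processed:
        return False  # duplicate delivery, already handled
    assignments[oid] = "matching_started"  # side effect happens exactly once
    processed.add(oid)
    return True

first = handle_order_event({"order_id": "o1"})
redelivered = handle_order_event({"order_id": "o1"})  # e.g. after a consumer crash
print(first, redelivered, len(assignments))  # True False 1
```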
Handling Failures:
If a Dispatch Engine instance crashes:
- Kafka’s consumer group rebalancing automatically reassigns its partitions to healthy instances.
- Messages that were being processed (but not yet committed) are redelivered to new consumers.
- No messages are lost.
If the entire matching process fails for an order (e.g., no couriers accept):
- The message can be retried with exponential backoff.
- After maximum retries, the message goes to a Dead Letter Queue for manual review.
- The customer is notified that no courier is available, and we may trigger surge pricing to attract more couriers.
Scaling for Peak Load:
During peak hours, we can dynamically scale the number of Dispatch Engine instances:
- Kafka partitions allow parallel processing—more partitions enable more parallel consumers.
- Auto-scaling policies monitor queue depth and add instances when backlog grows.
- Geographic partitioning allows region-specific scaling.
Deep Dive 6: How do we handle high-frequency location updates while maintaining system performance?
With 500,000 active couriers sending location updates every 5 seconds, we receive approximately 100,000 location updates per second. Writing this volume to a traditional database would be prohibitively expensive and slow.
Storage Solution: Redis Geo
We use Redis with geospatial commands for location storage:
Redis provides specialized commands for geospatial data:
- GEOADD adds a location to a sorted set with geospatial indexing.
- GEORADIUS finds all items within a specified radius of a point.
- GEOSEARCH (Redis 6.2+) supersedes GEORADIUS, supporting searches by radius or by rectangular bounding box.
GEOADD writes are O(log N), and radius queries only examine members whose geohash-encoded scores fall within the search area, keeping them efficient even with millions of locations.
When a courier sends a location update, we execute: GEOADD courier_locations {longitude} {latitude} {courierId}. This updates the courier’s position in the geospatial index.
When matching an order, we execute: GEORADIUS courier_locations {merchant_longitude} {merchant_latitude} 5000 m WITHDIST WITHCOORD. This returns all couriers within 5km of the merchant, along with their distances and coordinates.
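To make the semantics concrete, here is a toy in-memory equivalent of the two commands, using the haversine formula in place of Redis's geohash index (pure illustration, not how Redis implements it internally):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

class GeoIndex:
    """Toy stand-in for a Redis geo set: geoadd upserts, georadius filters."""
    def __init__(self):
        self.members = {}  # courierId -> (lat, lon)

    def geoadd(self, member, lat, lon):
        self.members[member] = (lat, lon)  # last write wins, like GEOADD

    def georadius(self, lat, lon, radius_m):
        hits = [(m, haversine_m(lat, lon, *loc)) for m, loc in self.members.items()]
        return sorted([h for h in hits if h[1] <= radius_m], key=lambda h: h[1])

idx = GeoIndex()
idx.geoadd("c1", 37.776, -122.420)   # a few blocks from the merchant
idx.geoadd("c2", 37.600, -122.420)   # ~20 km away
nearby = idx.georadius(37.775, -122.418, 5000)  # GEORADIUS ... 5000 m WITHDIST
print(nearby)  # only c1, with its distance in meters
```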
Handling Stale Data:
We need to remove location data for couriers who go offline. Redis doesn’t support TTL on individual sorted set members, so we have several options:
- Time-bucketed sets: Create a new sorted set every minute (e.g., courier_locations_2024_03_28_1801) with a TTL on the entire set. When searching, query the current and previous bucket.
- Separate tracking: Maintain a separate key for each courier with TTL, and use Redis keyspace notifications to remove them from the geospatial index when they expire.
- Periodic cleanup: Run a background job that periodically removes couriers who haven’t updated their location in the last minute.
Client-Side Optimization:
We can reduce the volume of location updates through client-side intelligence:
- Adaptive update frequency: If the courier is stationary (velocity near zero), reduce update frequency to every 30 seconds. If moving, update every 5 seconds. If moving fast or turning, update every 3 seconds.
- Distance-based updates: Only send updates if the courier has moved more than 50 meters from their last reported position.
- Batching: If the network is slow, batch multiple location points and send them together.
These optimizations can reduce updates by 50-70% while maintaining accuracy, significantly lowering server load and costs.
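The client-side throttling rules above can be folded into one decision function. The thresholds mirror the heuristics listed and are illustrative, not tuned values:

```python
def should_send_update(speed_mps, seconds_since_last, meters_moved, turning=False):
    """Decide whether the courier client should transmit this GPS fix.
    Combines the distance gate with adaptive frequency; thresholds are
    illustrative."""
    if meters_moved < 50:                 # distance gate: barely moved
        # still send a slow heartbeat while stationary so the server can
        # distinguish "parked" from "offline"
        return speed_mps < 0.5 and seconds_since_last >= 30
    if turning or speed_mps > 15:         # fast (>54 km/h) or changing heading
        return seconds_since_last >= 3
    if speed_mps > 0.5:                   # normal driving
        return seconds_since_last >= 5
    return seconds_since_last >= 30       # stationary fallback

print(should_send_update(8, 5, 100))   # True: driving, 5 s elapsed
print(should_send_update(0, 10, 5))    # False: parked, heartbeat not yet due
```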
Write Optimization:
For very high write volumes, we can add optimizations:
- Write-behind caching: Buffer location updates in memory and flush to Redis in batches.
- Sharding: Partition the location data by geographic region (geohash prefix) across multiple Redis instances.
- Read replicas: Use Redis replication for read scaling, with writes going to master and proximity searches reading from replicas.
Deep Dive 7: How do we manage merchant integration and real-time catalog updates?
Merchants need to manage their catalogs, update item availability in real-time, and receive orders through their existing workflows. Many merchants already use POS (Point of Sale) systems, and we need to integrate smoothly.
Catalog Management:
Merchants can manage their catalogs through a web dashboard or mobile app. When they update their menu:
- The Merchant Service validates the menu structure (items, prices, categories).
- The data is stored in PostgreSQL for durability and transactional integrity.
- The catalog is indexed in Elasticsearch for full-text search across items.
- The catalog is cached in Redis with a TTL for fast reads.
- Cache invalidation ensures customers see updated menus immediately.
Search Functionality:
Customers search for items using full-text search powered by Elasticsearch:
- Searches consider item name, merchant name, cuisine type, and item descriptions.
- Geospatial filtering ensures only merchants within delivery range appear.
- Results are ranked by relevance, merchant rating, and distance.
- Filters (price range, dietary restrictions, merchant rating) narrow results.
Real-Time Availability:
When a merchant runs out of an item, they need to mark it unavailable immediately:
- The merchant updates availability through their dashboard or POS.
- The Merchant Service updates Redis cache immediately (HSET availability:{merchantId} {itemId} 0).
- The change is persisted to PostgreSQL.
- Connected customers viewing that menu receive a real-time update via WebSocket.
When a customer places an order, we validate availability before accepting:
- Check Redis cache for item availability.
- If any item is unavailable, reject the order and notify the customer.
- This prevents customers from ordering items that are no longer available.
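The availability check at order time is a straightforward lookup against the cached hash. A minimal sketch, where `availability` mirrors the Redis hash `availability:{merchantId}` with `"1"`/`"0"` string values as in the HSET example above:

```python
def validate_order(item_ids: list[str],
                   availability: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (accepted, unavailable_items). An item missing from the
    hash is treated as unavailable, failing safe."""
    unavailable = [i for i in item_ids if availability.get(i, "0") != "1"]
    return (len(unavailable) == 0, unavailable)
```

Returning the specific unavailable items lets the client prompt the customer to remove or substitute them rather than rejecting the whole cart opaquely.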
POS Integration:
Many merchants use existing POS systems (Square, Toast, Clover, etc.). We integrate through APIs or webhooks:
When a Postmates order is placed, the Merchant Service sends it to the merchant’s POS system via their API. The order appears in the POS just like an in-store order.
Different POS systems have different APIs, so we implement adapters for each:
- Square integration: POST to Square Orders API with idempotency keys.
- Toast integration: Use Toast’s Orders API with location IDs.
- Generic webhooks: For merchants without specific integrations, send standardized webhooks.
The idempotency keys ensure that network retries don’t create duplicate orders. We use the Postmates order ID as the idempotency key.
When the merchant confirms the order in their POS, we receive a callback, and the Order Service transitions the order state to “confirmed” and starts searching for a courier.
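The per-provider adapters can be sketched with a small adapter pattern. The request shapes below are illustrative, not exact provider payloads, though Square's Orders API does accept an explicit idempotency key; the Postmates order ID is reused as that key, as described above:

```python
from abc import ABC, abstractmethod

class PosAdapter(ABC):
    """One adapter per POS provider, all producing a provider-specific request."""
    @abstractmethod
    def build_request(self, order: dict) -> dict: ...

class SquareAdapter(PosAdapter):
    def build_request(self, order: dict) -> dict:
        # Square's Orders API takes an idempotency key in the request body,
        # so network retries cannot create duplicate orders.
        return {"endpoint": "/v2/orders",
                "idempotency_key": order["order_id"],
                "line_items": order["items"]}

class GenericWebhookAdapter(PosAdapter):
    def build_request(self, order: dict) -> dict:
        # Standardized payload for merchants without a native integration.
        return {"endpoint": order["webhook_url"],
                "event": "order.created",
                "order": order}

ADAPTERS: dict[str, PosAdapter] = {
    "square": SquareAdapter(),
    "webhook": GenericWebhookAdapter(),
}

def dispatch(order: dict, pos_type: str) -> dict:
    return ADAPTERS[pos_type].build_request(order)
```

Adding a new POS provider then means adding one adapter class, with no changes to the Order Service itself.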
Deep Dive 8: How do we implement dynamic pricing to balance supply and demand?
The Pricing Engine calculates delivery fees dynamically based on multiple factors. This helps balance supply and demand: when couriers are scarce, higher prices both incentivize more couriers to come online and dampen customer demand until supply recovers.
Pricing Components:
The total delivery fee consists of several components:
- Base Fee: A fixed minimum fee (e.g., $2.99) that covers basic operational costs.
- Distance Fee: Additional charges based on delivery distance, typically structured in tiers: free for the first 2 km, then $0.50 per km up to 5 km, then $0.75 per km beyond that.
- Surge Multiplier: A dynamic multiplier based on the supply-demand ratio, calculated by comparing pending orders to available couriers in the delivery zone. If demand exceeds supply, the multiplier increases (e.g., 1.25x, 1.5x, 2.0x).
- Small Order Fee: An additional fee (e.g., $2.00) for orders below a minimum subtotal (e.g., $10) to ensure orders are economically viable.
- Service Fee: A percentage of the order subtotal (e.g., 15%) that covers platform costs.
The total fee is calculated as: (base_fee + distance_fee) * surge_multiplier + small_order_fee + service_fee.
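The formula can be sketched as a small function, with the tier boundaries and example rates taken from the components listed in this section:

```python
def distance_fee(km: float) -> float:
    """Tiered distance fee: first 2 km free, $0.50/km up to 5 km,
    $0.75/km beyond that."""
    fee = max(0.0, min(km, 5) - 2) * 0.50   # km 2..5
    fee += max(0.0, km - 5) * 0.75          # km beyond 5
    return fee

def delivery_fee(km: float, subtotal: float, surge: float = 1.0,
                 base: float = 2.99, min_subtotal: float = 10.0,
                 small_order: float = 2.00, service_pct: float = 0.15) -> float:
    """(base + distance) * surge + small_order_fee + service_fee."""
    fee = (base + distance_fee(km)) * surge
    if subtotal < min_subtotal:
        fee += small_order
    fee += subtotal * service_pct
    return round(fee, 2)
```

Note that the surge multiplier applies only to the base and distance components; the small-order and service fees are added afterward, so surge does not compound the percentage-based service fee.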
Surge Pricing Calculation:
The surge multiplier is determined by the current demand-supply ratio in the delivery zone:
- Calculate demand: count of pending orders (not yet assigned to a courier).
- Calculate supply: count of available couriers (online, not at capacity).
- Calculate ratio: demand / max(supply, 1).
- Apply surge based on ratio: < 1 → 1.0x, 1-2 → 1.25x, 2-3 → 1.5x, > 3 → 2.0x.
We also apply time-based adjustments:
- Peak hours (lunch: 12-1pm, dinner: 6-8pm) get an additional 1.2x multiplier.
- Weather conditions (rain, snow) get an additional 1.3x multiplier due to increased difficulty.
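Putting the ratio tiers and time/weather adjustments together, a minimal sketch of the surge calculation:

```python
def surge_multiplier(pending_orders: int, available_couriers: int,
                     peak: bool = False, bad_weather: bool = False) -> float:
    """Surge from the zone's demand/supply ratio, with time- and
    weather-based adjustments applied on top."""
    ratio = pending_orders / max(available_couriers, 1)  # guard divide-by-zero
    if ratio < 1:
        m = 1.0
    elif ratio < 2:
        m = 1.25
    elif ratio < 3:
        m = 1.5
    else:
        m = 2.0
    if peak:            # lunch/dinner rush
        m *= 1.2
    if bad_weather:     # rain or snow
        m *= 1.3
    return round(m, 2)
```

In production these adjustments would typically be capped (e.g., at 3.0x) to avoid runaway pricing during compound events; that cap is an assumption, not stated above.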
Courier Payout Calculation:
Courier earnings are calculated separately to ensure fair compensation:
- Base Pay: Minimum pay per delivery (e.g., $3.00).
- Distance Pay: Payment per kilometer (e.g., $0.60/km).
- Time Pay: Payment per minute (e.g., $0.15/min) to compensate for traffic and wait times.
- Pickup/Dropoff Fees: Fixed fees for each pickup ($1.00) and dropoff ($1.50).
- Tips: Customer tips go entirely to the courier.
- Surge/Boost: When surge pricing is active, couriers receive boost pay proportional to the surge multiplier.
Total payout = base_pay + distance_pay + time_pay + pickup_fees + dropoff_fees + tips + boost.
This ensures couriers earn fairly even when customers pay surge pricing—the extra revenue is shared with couriers who work during high-demand periods.
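The payout formula can be sketched directly from the components above. The boost term's exact shape is an assumption: here it is modeled as the base pay scaled by the surge excess, which keeps it "proportional to the surge multiplier" as described:

```python
def courier_payout(km: float, minutes: float, pickups: int, dropoffs: int,
                   tip: float = 0.0, surge: float = 1.0) -> float:
    """base + distance + time + stop fees + tip + boost, using the
    example rates from this section."""
    base = 3.00
    pay = (base
           + km * 0.60          # distance pay
           + minutes * 0.15     # time pay (traffic, wait time)
           + pickups * 1.00     # per-pickup fee
           + dropoffs * 1.50)   # per-dropoff fee
    boost = base * (surge - 1.0)  # illustrative surge-share formula
    return round(pay + tip + boost, 2)
```

A single-stop 5 km, 20-minute delivery at normal pricing pays $11.50 before tip under these rates.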
A/B Testing Pricing:
The Pricing Engine supports A/B testing different pricing strategies:
- Random assignment of customers to control vs. experimental groups.
- Track metrics: conversion rate, average order value, customer lifetime value.
- Gradually roll out successful experiments to all customers.
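Random assignment is usually made deterministic so the same customer always sees the same variant without storing an assignment table. A common sketch using a stable hash (the experiment name salts the hash so experiments are independent):

```python
import hashlib

def assign_group(customer_id: str, experiment: str,
                 treatment_pct: int = 50) -> str:
    """Deterministically bucket a customer into treatment or control."""
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"
```

Gradual rollout then just means raising `treatment_pct` over time; already-assigned customers never flip groups as the percentage grows, since their bucket is fixed.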
Step 4: Wrap Up
In this design, we proposed a comprehensive system for an on-demand delivery platform like Postmates. If there is extra time at the end of the interview, here are additional points to discuss:
Additional Features:
- Scheduled deliveries: Allow customers to schedule deliveries for specific times in the future.
- Group orders: Enable multiple items from different merchants in a single delivery.
- Priority delivery: Offer expedited delivery for additional fees.
- Subscription programs: Unlimited free delivery for monthly subscribers.
- Live chat: Real-time communication between customers, couriers, and merchants.
- Alcohol delivery: Age verification and compliance with local regulations.
Scaling Considerations:
- Horizontal Scaling: All services should be stateless to allow horizontal scaling. Use container orchestration (Kubernetes) for dynamic scaling.
- Database Sharding: Shard order data by customer ID or geographic region. Partition location data by geohash for geographic distribution.
- Caching Layers: Multi-level caching with application cache (in-memory), distributed cache (Redis), and CDN for static assets (merchant photos, menus).
- Geographic Distribution: Deploy services in multiple regions with data center affinity routing customers to the nearest data center.
Error Handling:
- Network Failures: Implement retry logic with exponential backoff for transient failures. Use circuit breakers to prevent cascading failures when downstream services are unhealthy.
- Service Failures: Implement health checks and automatic failover. Use load balancers to route traffic away from unhealthy instances.
- Database Failures: Use database replication with automatic failover to replicas. Implement connection pooling and timeout handling.
- Third-Party API Failures: Have fallback mapping services or cache recent results. Implement graceful degradation (use historical average travel times if traffic APIs fail).
Security Considerations:
- Encrypt sensitive data in transit (TLS/HTTPS) and at rest (database encryption).
- Implement proper authentication (JWT tokens) and authorization (role-based access control).
- Rate limiting to prevent abuse and DDoS attacks.
- Input validation and sanitization to prevent injection attacks.
- PCI compliance for payment data handling.
- Background checks and identity verification for couriers.
Monitoring and Analytics:
- Track key metrics: order acceptance rate, average delivery time, courier utilization, customer satisfaction.
- Real-time dashboards for operations team to monitor system health and respond to issues.
- Alerting for anomalies: long delivery times, high cancellation rate, low courier availability.
- A/B testing framework for pricing, matching algorithms, and UI changes.
- Business intelligence: analyze order patterns, popular items, merchant performance.
Fraud Prevention:
- Detect fake orders from patterns (same customer, many cancellations).
- Monitor courier GPS spoofing (detecting teleportation or impossible speeds).
- Flag suspicious merchant behavior (fake orders, price manipulation).
- Implement velocity checks on payment methods.
- Verify merchant business licenses and permits.
Data Partitioning Strategy:
- Shard orders by customer ID or order ID range for write distribution.
- Partition location data by geohash prefix for geographic queries.
- Separate hot data (active orders, online couriers) from cold data (historical data) with different storage tiers.
- Use time-based partitioning for order history (partition by month).
Future Improvements:
- Machine learning for demand prediction to proactively position couriers in high-demand areas.
- Improved matching algorithms using reinforcement learning to optimize for multiple objectives.
- Computer vision for receipt scanning and automatic pricing.
- Autonomous delivery vehicles and drones for certain delivery types.
- Predictive ETAs using neural networks trained on historical data.
Congratulations on getting this far! Designing Postmates is a complex system design challenge that touches on many important distributed systems concepts including real-time location tracking, route optimization, distributed locking, message queues, and dynamic pricing. The key is to start simple, satisfy functional requirements first, then layer in the non-functional requirements and optimizations.
Summary
This comprehensive guide covered the design of an on-demand delivery platform like Postmates, including:
- Core Functionality: Order placement, merchant catalog management, courier matching with multi-stop support, real-time tracking, and dynamic ETA calculation.
- Key Challenges: High-frequency location updates, efficient proximity searches, multi-stop route optimization, distributed locking for courier assignment, and peak traffic handling.
- Solutions: Redis Geo for geospatial queries, Kafka for durable order processing, distributed locking for consistency, route optimization using TSP algorithms, dynamic pricing based on supply-demand, and merchant POS integration.
- Scalability: Horizontal scaling of stateless services, database sharding by geography, multi-level caching, client-side optimizations, and message queue partitioning.
The design demonstrates how to handle complex logistics systems with real-time requirements, multi-objective optimization, strong consistency needs, and integration with third-party systems.