Design Proximity Service

A proximity service is a geospatial system that enables users to discover nearby points of interest (POI) such as restaurants, gas stations, hotels, or other businesses. Systems like Yelp, Google Maps, Uber, and Foursquare rely on efficient proximity search at massive scale. This design covers how to build a production-grade proximity service handling billions of places and millions of concurrent queries.

Step 1: Requirements Clarification

Functional Requirements

Core Features:

  • Search for nearby places within a given radius (e.g., 5km, 10km)
  • Filter results by category (restaurants, gas stations, hotels, etc.)
  • Return detailed place information (name, address, rating, photos)
  • Rank results by distance, popularity, ratings, and relevance
  • Support real-time updates when places are added, modified, or closed
  • Handle both stationary objects (businesses) and moving objects (users)
  • Support different search modes: radius search, k-nearest neighbors (kNN)

Optional Features:

  • Place recommendations based on user preferences
  • Real-time place availability (e.g., wait times, parking spots)
  • Direction and navigation integration
  • User check-ins and reviews

Non-Functional Requirements

Scale:

  • 500 million places globally
  • 100 million daily active users (DAU)
  • 5 billion search queries per day (~57,870 QPS)
  • Peak load: 200,000 QPS
  • Each place has ~1KB of data
  • Search latency: p99 < 200ms

Availability and Reliability:

  • 99.99% availability (52 minutes downtime per year)
  • Geo-redundant deployment across multiple regions
  • Graceful degradation during partial failures

Data Characteristics:

  • Read-heavy: 99% reads, 1% writes
  • Place data changes infrequently (hours/days)
  • User location changes frequently (seconds/minutes)
  • Search patterns are geographically clustered

Storage Estimation:

  • Place data: 500M places * 1KB = 500GB
  • With metadata, indexes: ~2TB total
  • User location cache: 100M users * 100 bytes = 10GB
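These figures can be sanity-checked with quick arithmetic:

```python
# Back-of-envelope check of the capacity estimates above
PLACES = 500_000_000
PLACE_BYTES = 1_000                 # ~1KB per place
QUERIES_PER_DAY = 5_000_000_000
USERS = 100_000_000
USER_LOC_BYTES = 100

avg_qps = QUERIES_PER_DAY / 86_400              # seconds per day
place_storage_gb = PLACES * PLACE_BYTES / 1e9
user_cache_gb = USERS * USER_LOC_BYTES / 1e9

print(f"Average QPS:         {avg_qps:,.0f}")           # ~57,870
print(f"Place storage:       {place_storage_gb:,.0f} GB")
print(f"User location cache: {user_cache_gb:,.0f} GB")
```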

Step 2: High-Level Design

System Architecture

┌─────────────┐
│   Clients   │ (Mobile, Web)
└──────┬──────┘

┌──────▼──────────────────────────────────────┐
│          API Gateway / Load Balancer         │
│        (Rate Limiting, Authentication)       │
└──────┬──────────────────────────────────────┘

┌──────▼───────────────────────────────────────┐
│           Location Service Cluster           │
│  (Geohash, Quadtree, Proximity Algorithms)   │
└──┬───────────────────────────────────────┬───┘
   │                                       │
┌──▼─────────────────┐         ┌──────────▼──────────┐
│  Search Service    │         │   Ranking Service   │
│  (Elasticsearch/   │         │  (ML-based scoring) │
│   Redis GEO)       │         └──────────┬──────────┘
└──┬─────────────────┘                    │
   │                                       │
┌──▼───────────────────────────────────────▼───┐
│            Place Service (CRUD)              │
│         (Place metadata management)          │
└──┬───────────────────────────────────────────┘

┌──▼──────────────────────────────────────────┐
│     Data Layer                               │
│  ┌────────────┐  ┌──────────┐  ┌─────────┐ │
│  │ PostgreSQL │  │  Redis   │  │ S3/CDN  │ │
│  │ (Master-   │  │  (Cache) │  │(Photos) │ │
│  │  Replica)  │  └──────────┘  └─────────┘ │
│  └────────────┘                             │
└─────────────────────────────────────────────┘

Core Components

1. API Gateway:

  • Authentication and authorization
  • Rate limiting per user/API key
  • Request routing and load balancing
  • SSL termination

2. Location Service:

  • Receives user coordinates and search radius
  • Translates geographic coordinates to geospatial indexes
  • Performs initial candidate selection
  • Handles geospatial computations

3. Search Service:

  • Executes proximity queries using geospatial indexes
  • Filters results by category, hours, ratings
  • Returns candidate list to ranking service
  • Powered by Redis GEO or Elasticsearch

4. Ranking Service:

  • Scores candidates based on multiple factors
  • Distance-based scoring (closer = higher)
  • Popularity signals (reviews, ratings, check-ins)
  • Personalization based on user preferences
  • ML models for relevance ranking

5. Place Service:

  • CRUD operations for place data
  • Manages place metadata (name, category, hours, photos)
  • Handles place updates and deletions
  • Synchronizes with search indexes

6. Data Layer:

  • PostgreSQL: Primary source of truth for place data
  • Redis: Geospatial index and cache layer
  • Elasticsearch: Full-text search and complex geo queries
  • S3/CDN: Static assets (photos, logos)

API Design

GET /v1/search/nearby
Parameters:
  - latitude: double (required)
  - longitude: double (required)
  - radius: int (meters, default: 5000, max: 50000)
  - category: string (optional)
  - limit: int (default: 20, max: 100)
  - offset: int (pagination)
  - sort: string (distance, rating, popularity)

Response:
{
  "results": [
    {
      "place_id": "uuid",
      "name": "Blue Bottle Coffee",
      "category": "cafe",
      "location": {"lat": 37.7749, "lon": -122.4194},
      "distance": 450, // meters
      "rating": 4.5,
      "price_level": 2,
      "open_now": true,
      "photos": ["url1", "url2"]
    }
  ],
  "total": 156,
  "next_offset": 20
}
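As a sketch of client usage (the host below is a placeholder, not part of the spec), a request URL for this endpoint can be built from the documented parameters:

```python
from urllib.parse import urlencode

# Hypothetical base URL; only the path and parameters come from the spec
BASE_URL = "https://api.example.com/v1/search/nearby"

params = {
    "latitude": 37.7749,
    "longitude": -122.4194,
    "radius": 5000,        # meters
    "category": "cafe",
    "limit": 20,
    "sort": "distance",
}

url = f"{BASE_URL}?{urlencode(params)}"
print(url)
# The response body follows the JSON schema shown above
```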

Step 3: Deep Dive into Critical Components

3.1 Geospatial Indexing Strategies

The core challenge is efficiently finding nearby places among 500M+ locations. Traditional one-dimensional B-tree indexes don't work well for 2D spatial queries: an index on latitude alone still forces a scan across every longitude in the band.

Option 1: Geohash

How it works: Geohash encodes latitude/longitude into a short alphanumeric string. Nearby locations share common prefixes.

San Francisco: 9q8yy (37.7749, -122.4194)
Oakland:       9q9p1 (37.8044, -122.2712)

Properties:

  • 4-character geohash: ~39.1km x 19.5km grid
  • 5-character geohash: ~4.9km x 4.9km grid
  • 6-character geohash: ~1.2km x 0.61km grid
  • 7-character geohash: ~153m x 152m grid

Implementation in Redis:

# Add places to geospatial index
GEOADD places:geo -122.4194 37.7749 "place:123"
GEOADD places:geo -122.2712 37.8044 "place:456"

# Search within radius
GEORADIUS places:geo -122.4194 37.7749 5 km WITHDIST WITHCOORD COUNT 20

# Search by existing member
GEORADIUSBYMEMBER places:geo "place:123" 10 km WITHDIST

# Redis 6.2+ deprecates GEORADIUS in favor of GEOSEARCH
GEOSEARCH places:geo FROMLONLAT -122.4194 37.7749 BYRADIUS 5 km ASC COUNT 20 WITHDIST

Advantages:

  • Simple to implement
  • Fast lookups using prefix matching
  • Works with standard databases (index on geohash string)
  • Consistent grid size at same precision level

Disadvantages:

  • Edge cases: Places just across geohash boundaries might be missed
  • Requires checking neighboring geohashes for border queries
  • Fixed grid doesn’t adapt to density

Production Implementation:

import geohash2

# Approximate geohash cell height (km) by precision level
PRECISION_KM = {4: 39.1, 5: 4.9, 6: 1.2, 7: 0.153}

def precision_for_radius(radius_km):
    """Pick the finest precision whose cells are at least as large as
    the radius, so the center cell + 8 neighbors cover the circle."""
    for p in sorted(PRECISION_KM, reverse=True):
        if PRECISION_KM[p] >= radius_km:
            return p
    return 4

def find_nearby_geohashes(lat, lon, radius_km):
    """
    Returns the list of geohashes to check for the given radius:
    the center cell plus its 8 neighbors.
    """
    center_hash = geohash2.encode(lat, lon,
                                  precision=precision_for_radius(radius_km))
    neighbors = geohash2.neighbors(center_hash)
    return [center_hash] + neighbors

def search_nearby(lat, lon, radius_km, category=None):
    geohashes = find_nearby_geohashes(lat, lon, radius_km)

    candidates = []
    for gh in geohashes:
        # Query database by geohash prefix; apply the category
        # filter only when one was supplied
        if category:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ? AND category = ?",
                (gh + '%', category)
            )
        else:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ?",
                (gh + '%',)
            )
        candidates.extend(places)

    # Filter by actual distance
    results = []
    for place in candidates:
        distance = haversine(lat, lon, place.lat, place.lon)
        if distance <= radius_km:
            results.append((place, distance))

    return sorted(results, key=lambda x: x[1])
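The haversine helper used for the final distance filter is not defined above; a standard great-circle implementation:

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# San Francisco to Oakland: roughly 13 km
print(round(haversine(37.7749, -122.4194, 37.8044, -122.2712), 1))
```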

Option 2: Quadtree

How it works: Quadtree recursively divides 2D space into four quadrants. Dense areas get more subdivisions, sparse areas remain coarse.

Root (World)
├─ NW (North America)
│  ├─ NW (Pacific Northwest)
│  ├─ NE (Northeast US)
│  ├─ SW (Southwest US)
│  └─ SE (Southeast US)
├─ NE (Europe)
├─ SW (South America)
└─ SE (Asia)

Structure:

class QuadTreeNode:
    def __init__(self, boundary, capacity=50):
        self.boundary = boundary  # Boundary(min_lat, max_lat, min_lon, max_lon)
        self.capacity = capacity
        self.places = []
        self.divided = False
        self.nw = self.ne = self.sw = self.se = None

    def subdivide(self):
        b = self.boundary
        mid_lat = (b.min_lat + b.max_lat) / 2
        mid_lon = (b.min_lon + b.max_lon) / 2

        self.nw = QuadTreeNode(Boundary(mid_lat, b.max_lat, b.min_lon, mid_lon))
        self.ne = QuadTreeNode(Boundary(mid_lat, b.max_lat, mid_lon, b.max_lon))
        self.sw = QuadTreeNode(Boundary(b.min_lat, mid_lat, b.min_lon, mid_lon))
        self.se = QuadTreeNode(Boundary(b.min_lat, mid_lat, mid_lon, b.max_lon))
        self.divided = True

    def _insert_to_child(self, place):
        # Exactly one child's boundary contains the place
        return (self.nw.insert(place) or self.ne.insert(place) or
                self.sw.insert(place) or self.se.insert(place))

    def insert(self, place):
        if not self.boundary.contains(place.location):
            return False

        if len(self.places) < self.capacity:
            self.places.append(place)
            return True

        if not self.divided:
            self.subdivide()
            # Redistribute existing places
            for p in self.places:
                self._insert_to_child(p)
            self.places = []

        return self._insert_to_child(place)

    def search_radius(self, center, radius):
        results = []
        if not self.boundary.intersects_circle(center, radius):
            return results

        # Check places stored at this node
        for place in self.places:
            if distance(center, place.location) <= radius:
                results.append(place)

        # Recurse to children
        if self.divided:
            results.extend(self.nw.search_radius(center, radius))
            results.extend(self.ne.search_radius(center, radius))
            results.extend(self.sw.search_radius(center, radius))
            results.extend(self.se.search_radius(center, radius))

        return results

Advantages:

  • Adaptive: More subdivisions in dense areas (Manhattan) vs sparse areas (rural)
  • Efficient for k-nearest neighbor queries
  • No edge case issues like geohash
  • Memory-efficient for sparse regions

Disadvantages:

  • Complex to implement and maintain
  • Expensive to rebalance on updates
  • Difficult to distribute across multiple servers
  • In-memory structure, hard to persist

Best for: In-memory caching layer, not primary storage.
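The QuadTreeNode code assumes a Boundary helper with contains and intersects_circle. A minimal planar sketch (it treats degrees as flat coordinates, which is acceptable for small cells; a production version would use proper geodesic distances):

```python
from dataclasses import dataclass

@dataclass
class Boundary:
    """Axis-aligned lat/lon box: [min_lat, max_lat) x [min_lon, max_lon)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, location):
        """location is a (lat, lon) tuple."""
        lat, lon = location
        return (self.min_lat <= lat < self.max_lat and
                self.min_lon <= lon < self.max_lon)

    def intersects_circle(self, center, radius_deg):
        """Clamp the circle's center to the box and compare the planar
        distance against the radius (measured in degrees here)."""
        lat, lon = center
        nearest_lat = max(self.min_lat, min(lat, self.max_lat))
        nearest_lon = max(self.min_lon, min(lon, self.max_lon))
        return ((lat - nearest_lat) ** 2 +
                (lon - nearest_lon) ** 2) <= radius_deg ** 2
```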

Option 3: R-tree

How it works: R-tree is similar to B-tree but for multi-dimensional data. Groups nearby objects into hierarchical bounding boxes.

Use with PostgreSQL + PostGIS:

-- Create table with geospatial column
CREATE TABLE places (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),
    category VARCHAR(50)
);

-- Create spatial index (uses R-tree internally)
CREATE INDEX idx_places_location ON places USING GIST(location);

-- Query nearby places
SELECT
    id,
    name,
    ST_Distance(location, ST_MakePoint(-122.4194, 37.7749)::geography) AS distance
FROM places
WHERE ST_DWithin(
    location,
    ST_MakePoint(-122.4194, 37.7749)::geography,
    5000  -- 5km in meters
)
AND category = 'restaurant'
ORDER BY distance
LIMIT 20;

Advantages:

  • Battle-tested with PostgreSQL PostGIS
  • Handles complex geospatial queries
  • ACID transactions for updates
  • Production-grade reliability

Disadvantages:

  • Slower than in-memory solutions (Redis GEO)
  • Database load increases with query volume
  • Scaling requires read replicas and sharding

3.2 Redis GEO as the Primary Search Layer

Architecture:

┌──────────────────────────────────────┐
│       Redis GEO Cluster              │
│  ┌────────┐  ┌────────┐  ┌────────┐ │
│  │Shard 1 │  │Shard 2 │  │Shard 3 │ │
│  │ US-West│  │ US-East│  │ Europe │ │
│  └────────┘  └────────┘  └────────┘ │
└──────────────────────────────────────┘

Why Redis GEO:

  • Sub-millisecond latency
  • 100K+ queries per second per instance
  • Built-in geospatial commands
  • Sorted set implementation using geohash

Production Implementation:

import redis
from typing import List, Dict

class ProximitySearchService:
    def __init__(self):
        self.redis_client = redis.Redis(
            host='redis-cluster.internal',
            port=6379,
            decode_responses=True,
            socket_connect_timeout=2,
            socket_timeout=2
        )

    def index_place(self, place_id: str, lat: float, lon: float,
                    category: str, name: str, rating: float):
        """
        Index place in Redis GEO by category.
        Key pattern: places:geo:{category}
        """
        key = f"places:geo:{category}"
        self.redis_client.geoadd(key, (lon, lat, place_id))

        # Also add to global index for category-agnostic search
        self.redis_client.geoadd("places:geo:all", (lon, lat, place_id))

        # Store place metadata separately
        self.redis_client.hset(f"place:{place_id}", mapping={
            "name": name,
            "category": category,
            "rating": rating,
            "lat": lat,
            "lon": lon
        })

    def search_nearby(self, lat: float, lon: float, radius_m: int,
                      category: str = None, limit: int = 20) -> List[Dict]:
        """
        Search nearby places using Redis GEORADIUS.
        """
        key = f"places:geo:{category}" if category else "places:geo:all"

        # GEORADIUS with distance and coordinates
        results = self.redis_client.georadius(
            name=key,
            longitude=lon,
            latitude=lat,
            radius=radius_m,
            unit='m',
            withdist=True,
            withcoord=True,
            count=limit,
            sort='ASC'  # Closest first
        )

        # Fetch place metadata
        places = []
        for place_id, distance, coords in results:
            metadata = self.redis_client.hgetall(f"place:{place_id}")
            places.append({
                "place_id": place_id,
                "distance": distance,
                "location": {"lat": coords[1], "lon": coords[0]},
                **metadata
            })

        return places

    def search_knn(self, lat: float, lon: float, k: int = 10,
                   category: str = None) -> List[Dict]:
        """
        Find k-nearest neighbors regardless of distance.
        Start with a small radius and expand until k results are found.
        """
        radius = 1000  # Start with 1km
        max_radius = 50000  # Max 50km

        results = []
        while radius <= max_radius:
            results = self.search_nearby(lat, lon, radius, category, limit=k)
            if len(results) >= k:
                return results[:k]
            radius *= 2  # Double the radius and retry

        return results  # Fewer than k places within max_radius

Redis GEO Internals:

  • Uses sorted set with geohash as score
  • Geohash is 52-bit integer (fits in Redis score)
  • GEORADIUS queries sorted set by geohash range
  • Performance: O(N+log(M)) where N = results, M = total items
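As an illustrative sketch of that packing (the bit layout here is simplified; the authoritative implementation is Redis's geohash.c), latitude and longitude can each be quantized to 26 bits and interleaved into a single 52-bit integer:

```python
def interleave52(lat: float, lon: float) -> int:
    """Pack a coordinate into a 52-bit integer, roughly as Redis does:
    quantize lat and lon to 26 bits each, then interleave the bits.
    Assumes -90 <= lat < 90 and -180 <= lon < 180."""
    lat_q = int((lat + 90.0) / 180.0 * (1 << 26))   # 26-bit latitude
    lon_q = int((lon + 180.0) / 360.0 * (1 << 26))  # 26-bit longitude
    score = 0
    for i in range(26):
        score |= ((lon_q >> i) & 1) << (2 * i + 1)  # lon on odd bits
        score |= ((lat_q >> i) & 1) << (2 * i)      # lat on even bits
    return score  # nearby points share their high-order bits
```

Because the score preserves locality in its high-order bits, a radius query can be translated into a small set of score ranges over the sorted set.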

Sharding Strategy: Shard by geographic region to keep related data together:

def get_redis_shard(lat: float, lon: float) -> str:
    """Route to appropriate Redis shard based on location."""
    if -125 < lon < -65 and 25 < lat < 50:
        return "redis-us"
    elif -10 < lon < 40 and 35 < lat < 70:
        return "redis-eu"
    elif 100 < lon < 145 and 20 < lat < 45:
        return "redis-asia"
    else:
        return "redis-global"

3.3 Elasticsearch for Complex Geo Queries

When to use Elasticsearch:

  • Need full-text search (“coffee near me”)
  • Complex filters (category AND open_now AND rating > 4.0)
  • Faceted search (aggregate by category)
  • Geospatial bounding box queries

Index Mapping:

{
  "mappings": {
    "properties": {
      "place_id": {"type": "keyword"},
      "name": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword"}
        }
      },
      "location": {"type": "geo_point"},
      "category": {"type": "keyword"},
      "rating": {"type": "float"},
      "price_level": {"type": "integer"},
      "open_now": {"type": "boolean"},
      "hours": {
        "type": "nested",
        "properties": {
          "day": {"type": "keyword"},
          "open": {"type": "keyword"},
          "close": {"type": "keyword"}
        }
      },
      "popularity_score": {"type": "float"}
    }
  }
}

Geo Query Examples:

# Geo distance query
{
  "query": {
    "bool": {
      "must": {
        "term": {"category": "restaurant"}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 37.7749,
            "lon": -122.4194
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 37.7749,
          "lon": -122.4194
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

# Geo bounding box query
{
  "query": {
    "bool": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {"lat": 37.8, "lon": -122.5},
            "bottom_right": {"lat": 37.7, "lon": -122.3}
          }
        }
      }
    }
  }
}

# Complex query with multiple filters
{
  "query": {
    "bool": {
      "must": [
        {"match": {"name": "coffee"}}
      ],
      "filter": [
        {"term": {"category": "cafe"}},
        {"term": {"open_now": true}},
        {"range": {"rating": {"gte": 4.0}}},
        {
          "geo_distance": {
            "distance": "2km",
            "location": {"lat": 37.7749, "lon": -122.4194}
          }
        }
      ]
    }
  },
  "sort": [
    {"_score": "desc"},
    {"_geo_distance": {
      "location": {"lat": 37.7749, "lon": -122.4194},
      "order": "asc"
    }}
  ]
}

Sharding by Region:

# Create index per region for better performance
indices = [
    "places-us-west",
    "places-us-east",
    "places-europe",
    "places-asia"
]

def search_places(lat, lon, query_params):
    """Search appropriate regional index."""
    index = get_index_by_location(lat, lon)

    response = es_client.search(
        index=index,
        body={
            "query": build_geo_query(lat, lon, query_params),
            "size": 20
        }
    )

    return response['hits']['hits']
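The get_index_by_location routing helper is assumed above; one possible sketch keyed on longitude bands (the band boundaries are illustrative, not authoritative):

```python
def get_index_by_location(lat: float, lon: float) -> str:
    """Map a query location to a regional index name.
    lat is accepted for future finer-grained routing but unused here."""
    if -180 <= lon < -100:
        return "places-us-west"
    elif -100 <= lon < -30:
        return "places-us-east"
    elif -30 <= lon < 60:
        return "places-europe"
    else:
        return "places-asia"
```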

3.4 Ranking Service

Ranking Factors:

  1. Distance (primary)
  2. Popularity (reviews, check-ins)
  3. Rating
  4. Personalization (user preferences)
  5. Freshness (newly opened places)
  6. Business tier (promoted listings)

Scoring Formula:

def calculate_score(place, user_location, user_prefs):
    """
    Multi-factor scoring for place ranking.
    """
    # Distance score (inverse exponential)
    distance_km = haversine(user_location, place.location)
    distance_score = math.exp(-distance_km / 5.0)  # Decay over 5km

    # Rating score (normalized)
    rating_score = place.rating / 5.0

    # Popularity score (log scale)
    popularity_score = math.log10(place.review_count + 1) / 4.0

    # Personalization score
    category_match = 1.0 if place.category in user_prefs else 0.5

    # Weighted combination
    total_score = (
        0.50 * distance_score +
        0.20 * rating_score +
        0.15 * popularity_score +
        0.15 * category_match
    )

    return total_score

# Sort by score
ranked_places = sorted(
    candidates,
    key=lambda p: calculate_score(p, user_loc, user_prefs),
    reverse=True
)

ML-Based Ranking:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class MLRankingService:
    def __init__(self):
        self.model = self.load_model()

    def extract_features(self, place, user_location, user_prefs):
        """Extract features for ML model."""
        distance = haversine(user_location, place.location)

        return np.array([
            distance,
            place.rating,
            place.review_count,
            place.price_level,
            int(place.open_now),
            place.popularity_score,
            int(place.category in user_prefs),
            place.days_since_opened
        ])

    def rank(self, places, user_location, user_prefs):
        """Rank places using ML model."""
        features = [
            self.extract_features(p, user_location, user_prefs)
            for p in places
        ]

        scores = self.model.predict(features)

        ranked = sorted(
            zip(places, scores),
            key=lambda x: x[1],
            reverse=True
        )

        return [place for place, score in ranked]

3.5 Handling Moving Objects (Users)

Challenge: User locations change frequently, but we don’t want to reindex a geospatial structure on every location ping.

Solution: Separate User Location Cache

class UserLocationService:
    def __init__(self):
        self.redis = redis.Redis()

    def update_user_location(self, user_id: str, lat: float, lon: float):
        """
        Cache user location with TTL.
        No need to index in geospatial structure.
        """
        self.redis.setex(
            f"user:location:{user_id}",
            300,  # 5 minute TTL
            json.dumps({"lat": lat, "lon": lon, "timestamp": time.time()})
        )

    def get_user_location(self, user_id: str):
        """Retrieve cached user location."""
        data = self.redis.get(f"user:location:{user_id}")
        return json.loads(data) if data else None

For ride-sharing (moving objects that need to be searched):

# Update driver location in Redis GEO
def update_driver_location(driver_id, lat, lon):
    # Remove old location
    redis.zrem("drivers:active", driver_id)

    # Add new location
    redis.geoadd("drivers:active", (lon, lat, driver_id))

    # Set expiry on driver metadata
    redis.setex(f"driver:{driver_id}", 60, json.dumps({
        "lat": lat,
        "lon": lon,
        "status": "available"
    }))

# Cleanup expired drivers
def cleanup_expired_drivers():
    """Remove drivers who haven't updated in 60 seconds."""
    all_drivers = redis.zrange("drivers:active", 0, -1)
    for driver_id in all_drivers:
        if not redis.exists(f"driver:{driver_id}"):
            redis.zrem("drivers:active", driver_id)

3.6 Place Data Management and Updates

Write Path:

Place Update → API → Place Service → PostgreSQL (Write)
                           │
                           ├─→ Kafka → Update Redis GEO (Async)
                           └─→ Kafka → Update Elasticsearch (Async)

Implementation:

class PlaceService:
    def __init__(self):
        self.db = PostgresConnection()
        self.redis = RedisConnection()
        self.es = ElasticsearchConnection()
        self.mq = KafkaProducer()

    def create_place(self, place_data):
        """Create new place."""
        # 1. Write to PostgreSQL (source of truth)
        place_id = self.db.execute("""
            INSERT INTO places (name, category, location, rating)
            VALUES (%s, %s, ST_Point(%s, %s), %s)
            RETURNING id
        """, (
            place_data['name'],
            place_data['category'],
            place_data['lon'],
            place_data['lat'],
            place_data['rating']
        ))

        # 2. Publish to Kafka for async indexing
        self.mq.send('place-updates', {
            'event': 'create',
            'place_id': place_id,
            'data': place_data
        })

        return place_id

    def update_place(self, place_id, updates):
        """Update existing place."""
        # Update PostgreSQL
        self.db.execute("""
            UPDATE places
            SET name = %s, rating = %s, updated_at = NOW()
            WHERE id = %s
        """, (updates['name'], updates['rating'], place_id))

        # Publish update event
        self.mq.send('place-updates', {
            'event': 'update',
            'place_id': place_id,
            'updates': updates
        })

# Async indexer consumer
class PlaceIndexer:
    def consume_updates(self):
        """Process place updates from Kafka."""
        for message in kafka_consumer:
            event = message.value

            if event['event'] == 'create':
                self._index_new_place(event['place_id'], event['data'])
            elif event['event'] == 'update':
                self._update_indexes(event['place_id'], event['updates'])
            elif event['event'] == 'delete':
                self._remove_from_indexes(event['place_id'])

    def _index_new_place(self, place_id, data):
        """Add to Redis GEO and Elasticsearch."""
        # Index in Redis GEO
        category = data['category']
        redis.geoadd(
            f"places:geo:{category}",
            (data['lon'], data['lat'], place_id)
        )

        # Index in Elasticsearch
        es.index(
            index='places',
            id=place_id,
            document={
                'place_id': place_id,
                'name': data['name'],
                'category': category,
                'location': {'lat': data['lat'], 'lon': data['lon']},
                'rating': data['rating']
            }
        )

3.7 Caching Strategy

Multi-Layer Cache:

L1: Application Cache (In-Memory)

from collections import OrderedDict

class CacheLayer:
    def __init__(self, max_size=1000):
        self.local_cache = OrderedDict()  # in-memory LRU
        self.max_size = max_size

    def get_cache_key(self, lat, lon, radius, category):
        """Generate cache key from search params.
        Round to 3 decimal places (~100m precision) so nearby
        queries share a cache entry."""
        return f"{round(lat, 3)}:{round(lon, 3)}:{radius}:{category}"

    def search_cached(self, lat, lon, radius, category):
        """LRU cache for frequent searches."""
        cache_key = self.get_cache_key(lat, lon, radius, category)

        # Check local cache first
        if cache_key in self.local_cache:
            self.local_cache.move_to_end(cache_key)  # mark recently used
            return self.local_cache[cache_key]

        # Execute search
        results = self.search_nearby(lat, lon, radius, category)

        # Cache results, evicting the least recently used entry
        self.local_cache[cache_key] = results
        if len(self.local_cache) > self.max_size:
            self.local_cache.popitem(last=False)
        return results

L2: Redis Cache (Distributed)

def search_with_cache(lat, lon, radius, category):
    """Search with Redis cache layer."""
    cache_key = f"search:{round(lat,2)}:{round(lon,2)}:{radius}:{category}"

    # Check cache
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Execute search
    results = proximity_search(lat, lon, radius, category)

    # Cache for 5 minutes
    redis.setex(cache_key, 300, json.dumps(results))

    return results

Cache Invalidation:

def invalidate_place_cache(place_id):
    """Invalidate cached searches when a place is updated.
    Assumes search cache keys are prefixed with the geohash of the
    query area, e.g. search:{geohash}:{radius}:{category}."""
    # Get place location
    place = db.get_place(place_id)

    # Invalidate cache keys in the surrounding geohash cells
    geohashes = find_nearby_geohashes(place.lat, place.lon, radius_km=10)

    for gh in geohashes:
        # SCAN instead of KEYS to avoid blocking Redis in production
        keys = list(redis.scan_iter(match=f"search:{gh}:*"))
        if keys:
            redis.delete(*keys)

3.8 Database Sharding by Region

Sharding Strategy: Partition data by geographic region to improve query performance and enable regional isolation.

# Shard mapping
SHARDS = {
    'us-west': {
        'bounds': {'min_lat': 32, 'max_lat': 49, 'min_lon': -125, 'max_lon': -100},
        'db': 'postgres-us-west.internal'
    },
    'us-east': {
        'bounds': {'min_lat': 25, 'max_lat': 48, 'min_lon': -100, 'max_lon': -65},
        'db': 'postgres-us-east.internal'
    },
    'europe': {
        'bounds': {'min_lat': 35, 'max_lat': 70, 'min_lon': -10, 'max_lon': 40},
        'db': 'postgres-eu.internal'
    }
}

def get_shard_for_location(lat, lon):
    """Route query to appropriate database shard."""
    for shard_name, config in SHARDS.items():
        bounds = config['bounds']
        if (bounds['min_lat'] <= lat <= bounds['max_lat'] and
            bounds['min_lon'] <= lon <= bounds['max_lon']):
            return config['db']
    return 'postgres-global.internal'  # Fallback

# Query router
def query_places(lat, lon, radius_km):
    """Route query to correct shard."""
    db_conn = get_shard_for_location(lat, lon)

    # For cross-boundary queries, query multiple shards
    if is_near_boundary(lat, lon, radius_km):
        shards = get_affected_shards(lat, lon, radius_km)
        results = []
        for shard in shards:
            results.extend(query_shard(shard, lat, lon, radius_km))
        return merge_and_sort(results, lat, lon)
    else:
        return query_shard(db_conn, lat, lon, radius_km)
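The is_near_boundary and get_affected_shards helpers referenced above can be sketched as follows. The degree conversion (~111 km per degree of latitude) is a rough approximation; a production version would scale the longitude padding by cos(lat):

```python
def get_affected_shards(lat, lon, radius_km, shards=None):
    """Return the DB of every shard whose bounding box the search
    circle may touch. `shards` defaults to the SHARDS map above."""
    if shards is None:
        shards = SHARDS
    pad = radius_km / 111.0  # rough degrees per km of radius
    hit = []
    for name, config in shards.items():
        b = config['bounds']
        if (b['min_lat'] - pad <= lat <= b['max_lat'] + pad and
                b['min_lon'] - pad <= lon <= b['max_lon'] + pad):
            hit.append(config['db'])
    return hit

def is_near_boundary(lat, lon, radius_km, shards=None):
    """True when the search circle may span more than one shard."""
    return len(get_affected_shards(lat, lon, radius_km, shards)) > 1
```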

Step 4: Wrap-Up

Final Architecture Summary

The production-grade proximity service uses a multi-layered approach:

Geospatial Indexing:

  • Redis GEO for low-latency proximity queries (primary)
  • Elasticsearch for complex filters and full-text search
  • PostgreSQL + PostGIS as source of truth
  • Geohash for cache keys and routing

Data Flow:

  • Writes: PostgreSQL → Kafka → Redis GEO + Elasticsearch
  • Reads: Application → L1 Cache → Redis GEO → Ranking → Response
  • Complex queries: Elasticsearch with geo filters

Scalability:

  • Geographic sharding for PostgreSQL and Elasticsearch
  • Redis cluster with regional shards
  • Multi-layer caching (application + Redis)
  • Async indexing via Kafka

Performance:

  • p99 latency < 200ms
  • 200K QPS capacity with auto-scaling
  • 99.99% availability with multi-region deployment

Key Design Decisions

  1. Redis GEO as primary search layer - Sub-millisecond latency for simple proximity queries
  2. PostgreSQL as source of truth - ACID guarantees for place data
  3. Async indexing - Eventual consistency acceptable for search indexes
  4. Geographic sharding - Keeps related data together, reduces cross-region queries
  5. Multi-layer caching - Reduces load on search layer by 80%+

Extensions and Future Work

Real-time place availability:

  • WebSocket connection for live updates
  • Redis Pub/Sub for place status changes
  • Event-driven updates to mobile clients

Machine learning enhancements:

  • Personalized ranking using collaborative filtering
  • Demand prediction for popular areas
  • Anomaly detection for fake reviews

Advanced features:

  • Routing and navigation integration
  • AR-based place discovery
  • Social features (friend check-ins, recommendations)

This design handles 500M+ places, 100M+ DAU, and 200K+ QPS while maintaining sub-200ms p99 latency, making it production-ready at scale.