Design Proximity Service

A proximity service is a geospatial system that enables users to discover nearby points of interest (POI) such as restaurants, gas stations, hotels, or other businesses. Systems like Yelp, Google Maps, Uber, and Foursquare rely on efficient proximity search at massive scale. This design covers how to build a production-grade proximity service handling billions of places and millions of concurrent queries.

Step 1: Requirements Clarification

Functional Requirements

Core Features:

  • Search for nearby places within a given radius (e.g., 5km, 10km)
  • Filter results by category (restaurants, gas stations, hotels, etc.)
  • Return detailed place information (name, address, rating, photos)
  • Rank results by distance, popularity, ratings, and relevance
  • Support real-time updates when places are added, modified, or closed
  • Handle both stationary objects (businesses) and moving objects (users)
  • Support different search modes: radius search, k-nearest neighbors (kNN)

Optional Features:

  • Place recommendations based on user preferences
  • Real-time place availability (e.g., wait times, parking spots)
  • Direction and navigation integration
  • User check-ins and reviews

Non-Functional Requirements

Scale:

  • 500 million places globally
  • 100 million daily active users (DAU)
  • 5 billion search queries per day (~57,870 QPS)
  • Peak load: 200,000 QPS
  • Each place has ~1KB of data
  • Search latency: p99 < 200ms

Availability and Reliability:

  • 99.99% availability (52 minutes downtime per year)
  • Geo-redundant deployment across multiple regions
  • Graceful degradation during partial failures

Data Characteristics:

  • Read-heavy: 99% reads, 1% writes
  • Place data changes infrequently (hours/days)
  • User location changes frequently (seconds/minutes)
  • Search patterns are geographically clustered

Storage Estimation:

  • Place data: 500M places * 1KB = 500GB
  • With metadata, indexes: ~2TB total
  • User location cache: 100M users * 100 bytes = 10GB
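These figures can be sanity-checked with quick arithmetic:

```python
# Back-of-envelope check of the capacity estimates above
PLACES = 500_000_000
PLACE_BYTES = 1_000                 # ~1KB per place
QUERIES_PER_DAY = 5_000_000_000
USERS = 100_000_000
USER_LOC_BYTES = 100

avg_qps = QUERIES_PER_DAY / 86_400              # seconds per day
place_storage_gb = PLACES * PLACE_BYTES / 1e9
user_cache_gb = USERS * USER_LOC_BYTES / 1e9

print(f"Average QPS:         {avg_qps:,.0f}")           # ~57,870
print(f"Place storage:       {place_storage_gb:,.0f} GB")
print(f"User location cache: {user_cache_gb:,.0f} GB")
```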

Step 2: High-Level Design

System Architecture

┌─────────────┐
│   Clients   │ (Mobile, Web)
└──────┬──────┘

┌──────▼──────────────────────────────────────┐
│          API Gateway / Load Balancer         │
│        (Rate Limiting, Authentication)       │
└──────┬──────────────────────────────────────┘

┌──────▼───────────────────────────────────────┐
│           Location Service Cluster           │
│  (Geohash, Quadtree, Proximity Algorithms)   │
└──┬───────────────────────────────────────┬───┘
   │                                       │
┌──▼─────────────────┐         ┌──────────▼──────────┐
│  Search Service    │         │   Ranking Service   │
│  (Elasticsearch/   │         │  (ML-based scoring) │
│   Redis GEO)       │         └──────────┬──────────┘
└──┬─────────────────┘                    │
   │                                       │
┌──▼───────────────────────────────────────▼───┐
│            Place Service (CRUD)              │
│         (Place metadata management)          │
└──┬───────────────────────────────────────────┘

┌──▼──────────────────────────────────────────┐
│     Data Layer                               │
│  ┌────────────┐  ┌──────────┐  ┌─────────┐ │
│  │ PostgreSQL │  │  Redis   │  │ S3/CDN  │ │
│  │ (Master-   │  │  (Cache) │  │(Photos) │ │
│  │  Replica)  │  └──────────┘  └─────────┘ │
│  └────────────┘                             │
└─────────────────────────────────────────────┘

Core Components

1. API Gateway:

  • Authentication and authorization
  • Rate limiting per user/API key
  • Request routing and load balancing
  • SSL termination

2. Location Service:

  • Receives user coordinates and search radius
  • Translates geographic coordinates to geospatial indexes
  • Performs initial candidate selection
  • Handles geospatial computations

3. Search Service:

  • Executes proximity queries using geospatial indexes
  • Filters results by category, hours, ratings
  • Returns candidate list to ranking service
  • Powered by Redis GEO or Elasticsearch

4. Ranking Service:

  • Scores candidates based on multiple factors
  • Distance-based scoring (closer = higher)
  • Popularity signals (reviews, ratings, check-ins)
  • Personalization based on user preferences
  • ML models for relevance ranking

5. Place Service:

  • CRUD operations for place data
  • Manages place metadata (name, category, hours, photos)
  • Handles place updates and deletions
  • Synchronizes with search indexes

6. Data Layer:

  • PostgreSQL: Primary source of truth for place data
  • Redis: Geospatial index and cache layer
  • Elasticsearch: Full-text search and complex geo queries
  • S3/CDN: Static assets (photos, logos)

API Design

GET /v1/search/nearby
Parameters:
  - latitude: double (required)
  - longitude: double (required)
  - radius: int (meters, default: 5000, max: 50000)
  - category: string (optional)
  - limit: int (default: 20, max: 100)
  - offset: int (pagination)
  - sort: string (distance, rating, popularity)

Response:
{
  "results": [
    {
      "place_id": "uuid",
      "name": "Blue Bottle Coffee",
      "category": "cafe",
      "location": {"lat": 37.7749, "lon": -122.4194},
      "distance": 450, // meters
      "rating": 4.5,
      "price_level": 2,
      "open_now": true,
      "photos": ["url1", "url2"]
    }
  ],
  "total": 156,
  "next_offset": 20
}
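As a sketch of client usage (the host below is a placeholder, not part of the spec), a request URL for this endpoint can be built from the documented parameters:

```python
from urllib.parse import urlencode

# Hypothetical base URL; only the path and parameters come from the spec
BASE_URL = "https://api.example.com/v1/search/nearby"

params = {
    "latitude": 37.7749,
    "longitude": -122.4194,
    "radius": 5000,        # meters
    "category": "cafe",
    "limit": 20,
    "sort": "distance",
}

url = f"{BASE_URL}?{urlencode(params)}"
print(url)
# The response body follows the JSON schema shown above
```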

Step 3: Deep Dive into Critical Components

3.1 Geospatial Indexing Strategies

The core challenge is efficiently finding nearby places among 500M+ locations. Traditional one-dimensional B-tree indexes don't work well for 2D spatial queries: an index on latitude alone still forces a scan across every longitude in the band.

Option 1: Geohash

How it works: Geohash encodes latitude/longitude into a short alphanumeric string. Nearby locations share common prefixes.

San Francisco: 9q8yy (37.7749, -122.4194)
Oakland:       9q9p1 (37.8044, -122.2712)

Properties:

  • 4-character geohash: ~39.1km x 19.5km grid
  • 5-character geohash: ~4.9km x 4.9km grid
  • 6-character geohash: ~1.2km x 0.61km grid
  • 7-character geohash: ~153m x 152m grid

Implementation in Redis:

# Add places to geospatial index
GEOADD places:geo -122.4194 37.7749 "place:123"
GEOADD places:geo -122.2712 37.8044 "place:456"

# Search within radius
GEORADIUS places:geo -122.4194 37.7749 5 km WITHDIST WITHCOORD COUNT 20

# Search by existing member
GEORADIUSBYMEMBER places:geo "place:123" 10 km WITHDIST

# Redis 6.2+ deprecates GEORADIUS in favor of GEOSEARCH
GEOSEARCH places:geo FROMLONLAT -122.4194 37.7749 BYRADIUS 5 km ASC COUNT 20 WITHDIST

Advantages:

  • Simple to implement
  • Fast lookups using prefix matching
  • Works with standard databases (index on geohash string)
  • Consistent grid size at same precision level

Disadvantages:

  • Edge cases: Places just across geohash boundaries might be missed
  • Requires checking neighboring geohashes for border queries
  • Fixed grid doesn’t adapt to density

Production Implementation:

import geohash2

# Approximate geohash cell height (km) by precision level
PRECISION_KM = {4: 39.1, 5: 4.9, 6: 1.2, 7: 0.153}

def precision_for_radius(radius_km):
    """Pick the finest precision whose cells are at least as large as
    the radius, so the center cell + 8 neighbors cover the circle."""
    for p in sorted(PRECISION_KM, reverse=True):
        if PRECISION_KM[p] >= radius_km:
            return p
    return 4

def find_nearby_geohashes(lat, lon, radius_km):
    """
    Returns the list of geohashes to check for the given radius:
    the center cell plus its 8 neighbors.
    """
    center_hash = geohash2.encode(lat, lon,
                                  precision=precision_for_radius(radius_km))
    neighbors = geohash2.neighbors(center_hash)
    return [center_hash] + neighbors

def search_nearby(lat, lon, radius_km, category=None):
    geohashes = find_nearby_geohashes(lat, lon, radius_km)

    candidates = []
    for gh in geohashes:
        # Query database by geohash prefix; apply the category
        # filter only when one was supplied
        if category:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ? AND category = ?",
                (gh + '%', category)
            )
        else:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ?",
                (gh + '%',)
            )
        candidates.extend(places)

    # Filter by actual distance
    results = []
    for place in candidates:
        distance = haversine(lat, lon, place.lat, place.lon)
        if distance <= radius_km:
            results.append((place, distance))

    return sorted(results, key=lambda x: x[1])
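The haversine helper used for the final distance filter is not defined above; a standard great-circle implementation:

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# San Francisco to Oakland: roughly 13 km
print(round(haversine(37.7749, -122.4194, 37.8044, -122.2712), 1))
```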

Option 2: Quadtree

How it works: Quadtree recursively divides 2D space into four quadrants. Dense areas get more subdivisions, sparse areas remain coarse.

Root (World)
├─ NW (North America)
│  ├─ NW (Pacific Northwest)
│  ├─ NE (Northeast US)
│  ├─ SW (Southwest US)
│  └─ SE (Southeast US)
├─ NE (Europe)
├─ SW (South America)
└─ SE (Asia)

Structure:

class QuadTreeNode:
    def __init__(self, boundary, capacity=50):
        self.boundary = boundary  # Boundary(min_lat, max_lat, min_lon, max_lon)
        self.capacity = capacity
        self.places = []
        self.divided = False
        self.nw = self.ne = self.sw = self.se = None

    def subdivide(self):
        b = self.boundary
        mid_lat = (b.min_lat + b.max_lat) / 2
        mid_lon = (b.min_lon + b.max_lon) / 2

        self.nw = QuadTreeNode(Boundary(mid_lat, b.max_lat, b.min_lon, mid_lon))
        self.ne = QuadTreeNode(Boundary(mid_lat, b.max_lat, mid_lon, b.max_lon))
        self.sw = QuadTreeNode(Boundary(b.min_lat, mid_lat, b.min_lon, mid_lon))
        self.se = QuadTreeNode(Boundary(b.min_lat, mid_lat, mid_lon, b.max_lon))
        self.divided = True

    def _insert_to_child(self, place):
        # Exactly one child's boundary contains the place
        return (self.nw.insert(place) or self.ne.insert(place) or
                self.sw.insert(place) or self.se.insert(place))

    def insert(self, place):
        if not self.boundary.contains(place.location):
            return False

        if len(self.places) < self.capacity:
            self.places.append(place)
            return True

        if not self.divided:
            self.subdivide()
            # Redistribute existing places
            for p in self.places:
                self._insert_to_child(p)
            self.places = []

        return self._insert_to_child(place)

    def search_radius(self, center, radius):
        results = []
        if not self.boundary.intersects_circle(center, radius):
            return results

        # Check places stored at this node
        for place in self.places:
            if distance(center, place.location) <= radius:
                results.append(place)

        # Recurse to children
        if self.divided:
            results.extend(self.nw.search_radius(center, radius))
            results.extend(self.ne.search_radius(center, radius))
            results.extend(self.sw.search_radius(center, radius))
            results.extend(self.se.search_radius(center, radius))

        return results

Advantages:

  • Adaptive: More subdivisions in dense areas (Manhattan) vs sparse areas (rural)
  • Efficient for k-nearest neighbor queries
  • No edge case issues like geohash
  • Memory-efficient for sparse regions

Disadvantages:

  • Complex to implement and maintain
  • Expensive to rebalance on updates
  • Difficult to distribute across multiple servers
  • In-memory structure, hard to persist

Best for: In-memory caching layer, not primary storage.
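The QuadTreeNode code assumes a Boundary helper with contains and intersects_circle. A minimal planar sketch (it treats degrees as flat coordinates, which is acceptable for small cells; a production version would use proper geodesic distances):

```python
from dataclasses import dataclass

@dataclass
class Boundary:
    """Axis-aligned lat/lon box: [min_lat, max_lat) x [min_lon, max_lon)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, location):
        """location is a (lat, lon) tuple."""
        lat, lon = location
        return (self.min_lat <= lat < self.max_lat and
                self.min_lon <= lon < self.max_lon)

    def intersects_circle(self, center, radius_deg):
        """Clamp the circle's center to the box and compare the planar
        distance against the radius (measured in degrees here)."""
        lat, lon = center
        nearest_lat = max(self.min_lat, min(lat, self.max_lat))
        nearest_lon = max(self.min_lon, min(lon, self.max_lon))
        return ((lat - nearest_lat) ** 2 +
                (lon - nearest_lon) ** 2) <= radius_deg ** 2
```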

Option 3: R-tree

How it works: R-tree is similar to B-tree but for multi-dimensional data. Groups nearby objects into hierarchical bounding boxes.

Use with PostgreSQL + PostGIS:

-- Create table with geospatial column
CREATE TABLE places (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),
    category VARCHAR(50)
);

-- Create spatial index (uses R-tree internally)
CREATE INDEX idx_places_location ON places USING GIST(location);

-- Query nearby places
SELECT
    id,
    name,
    ST_Distance(location, ST_MakePoint(-122.4194, 37.7749)::geography) AS distance
FROM places
WHERE ST_DWithin(
    location,
    ST_MakePoint(-122.4194, 37.7749)::geography,
    5000  -- 5km in meters
)
AND category = 'restaurant'
ORDER BY distance
LIMIT 20;

Advantages:

  • Battle-tested with PostgreSQL PostGIS
  • Handles complex geospatial queries
  • ACID transactions for updates
  • Production-grade reliability

Disadvantages:

  • Slower than in-memory solutions (Redis GEO)
  • Database load increases with query volume
  • Scaling requires read replicas and sharding

3.2 Redis GEO as the Primary Search Layer

Architecture:

┌──────────────────────────────────────┐
│       Redis GEO Cluster              │
│  ┌────────┐  ┌────────┐  ┌────────┐ │
│  │Shard 1 │  │Shard 2 │  │Shard 3 │ │
│  │ US-West│  │ US-East│  │ Europe │ │
│  └────────┘  └────────┘  └────────┘ │
└──────────────────────────────────────┘

Why Redis GEO:

  • Sub-millisecond latency
  • 100K+ queries per second per instance
  • Built-in geospatial commands
  • Sorted set implementation using geohash

Production Implementation:

import redis
from typing import List, Dict

class ProximitySearchService:
    def __init__(self):
        self.redis_client = redis.Redis(
            host='redis-cluster.internal',
            port=6379,
            decode_responses=True,
            socket_connect_timeout=2,
            socket_timeout=2
        )

    def index_place(self, place_id: str, lat: float, lon: float,
                    category: str, name: str, rating: float):
        """
        Index place in Redis GEO by category.
        Key pattern: places:geo:{category}
        """
        key = f"places:geo:{category}"
        self.redis_client.geoadd(key, (lon, lat, place_id))

        # Also add to global index for category-agnostic search
        self.redis_client.geoadd("places:geo:all", (lon, lat, place_id))

        # Store place metadata separately
        self.redis_client.hset(f"place:{place_id}", mapping={
            "name": name,
            "category": category,
            "rating": rating,
            "lat": lat,
            "lon": lon
        })

    def search_nearby(self, lat: float, lon: float, radius_m: int,
                      category: str = None, limit: int = 20) -> List[Dict]:
        """
        Search nearby places using Redis GEORADIUS.
        """
        key = f"places:geo:{category}" if category else "places:geo:all"

        # GEORADIUS with distance and coordinates
        results = self.redis_client.georadius(
            name=key,
            longitude=lon,
            latitude=lat,
            radius=radius_m,
            unit='m',
            withdist=True,
            withcoord=True,
            count=limit,
            sort='ASC'  # Closest first
        )

        # Fetch place metadata
        places = []
        for place_id, distance, coords in results:
            metadata = self.redis_client.hgetall(f"place:{place_id}")
            places.append({
                "place_id": place_id,
                "distance": distance,
                "location": {"lat": coords[1], "lon": coords[0]},
                **metadata
            })

        return places

    def search_knn(self, lat: float, lon: float, k: int = 10,
                   category: str = None) -> List[Dict]:
        """
        Find k-nearest neighbors regardless of distance.
        Start with a small radius and expand until k results are found.
        """
        radius = 1000  # Start with 1km
        max_radius = 50000  # Max 50km

        results = []
        while radius <= max_radius:
            results = self.search_nearby(lat, lon, radius, category, limit=k)
            if len(results) >= k:
                return results[:k]
            radius *= 2  # Double the radius and retry

        return results  # Fewer than k places within max_radius

Redis GEO Internals:

  • Uses sorted set with geohash as score
  • Geohash is 52-bit integer (fits in Redis score)
  • GEORADIUS queries sorted set by geohash range
  • Performance: O(N+log(M)) where N = results, M = total items
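As an illustrative sketch of that packing (the bit layout here is simplified; the authoritative implementation is Redis's geohash.c), latitude and longitude can each be quantized to 26 bits and interleaved into a single 52-bit integer:

```python
def interleave52(lat: float, lon: float) -> int:
    """Pack a coordinate into a 52-bit integer, roughly as Redis does:
    quantize lat and lon to 26 bits each, then interleave the bits.
    Assumes -90 <= lat < 90 and -180 <= lon < 180."""
    lat_q = int((lat + 90.0) / 180.0 * (1 << 26))   # 26-bit latitude
    lon_q = int((lon + 180.0) / 360.0 * (1 << 26))  # 26-bit longitude
    score = 0
    for i in range(26):
        score |= ((lon_q >> i) & 1) << (2 * i + 1)  # lon on odd bits
        score |= ((lat_q >> i) & 1) << (2 * i)      # lat on even bits
    return score  # nearby points share their high-order bits
```

Because the score preserves locality in its high-order bits, a radius query can be translated into a small set of score ranges over the sorted set.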

Sharding Strategy: Shard by geographic region to keep related data together:

def get_redis_shard(lat: float, lon: float) -> str:
    """Route to appropriate Redis shard based on location."""
    if -125 < lon < -65 and 25 < lat < 50:
        return "redis-us"
    elif -10 < lon < 40 and 35 < lat < 70:
        return "redis-eu"
    elif 100 < lon < 145 and 20 < lat < 45:
        return "redis-asia"
    else:
        return "redis-global"

3.3 Elasticsearch for Complex Geo Queries

When to use Elasticsearch:

  • Need full-text search (“coffee near me”)
  • Complex filters (category AND open_now AND rating > 4.0)
  • Faceted search (aggregate by category)
  • Geospatial bounding box queries

Index Mapping:

{
  "mappings": {
    "properties": {
      "place_id": {"type": "keyword"},
      "name": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword"}
        }
      },
      "location": {"type": "geo_point"},
      "category": {"type": "keyword"},
      "rating": {"type": "float"},
      "price_level": {"type": "integer"},
      "open_now": {"type": "boolean"},
      "hours": {
        "type": "nested",
        "properties": {
          "day": {"type": "keyword"},
          "open": {"type": "keyword"},
          "close": {"type": "keyword"}
        }
      },
      "popularity_score": {"type": "float"}
    }
  }
}

Geo Query Examples:

# Geo distance query
{
  "query": {
    "bool": {
      "must": {
        "term": {"category": "restaurant"}
      },
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 37.7749,
            "lon": -122.4194
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 37.7749,
          "lon": -122.4194
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

# Geo bounding box query
{
  "query": {
    "bool": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {"lat": 37.8, "lon": -122.5},
            "bottom_right": {"lat": 37.7, "lon": -122.3}
          }
        }
      }
    }
  }
}

# Complex query with multiple filters
{
  "query": {
    "bool": {
      "must": [
        {"match": {"name": "coffee"}}
      ],
      "filter": [
        {"term": {"category": "cafe"}},
        {"term": {"open_now": true}},
        {"range": {"rating": {"gte": 4.0}}},
        {
          "geo_distance": {
            "distance": "2km",
            "location": {"lat": 37.7749, "lon": -122.4194}
          }
        }
      ]
    }
  },
  "sort": [
    {"_score": "desc"},
    {"_geo_distance": {
      "location": {"lat": 37.7749, "lon": -122.4194},
      "order": "asc"
    }}
  ]
}

Sharding by Region:

# Create index per region for better performance
indices = [
    "places-us-west",
    "places-us-east",
    "places-europe",
    "places-asia"
]

def search_places(lat, lon, query_params):
    """Search appropriate regional index."""
    index = get_index_by_location(lat, lon)

    response = es_client.search(
        index=index,
        body={
            "query": build_geo_query(lat, lon, query_params),
            "size": 20
        }
    )

    return response['hits']['hits']
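The get_index_by_location routing helper is assumed above; one possible sketch keyed on longitude bands (the band boundaries are illustrative, not authoritative):

```python
def get_index_by_location(lat: float, lon: float) -> str:
    """Map a query location to a regional index name.
    lat is accepted for future finer-grained routing but unused here."""
    if -180 <= lon < -100:
        return "places-us-west"
    elif -100 <= lon < -30:
        return "places-us-east"
    elif -30 <= lon < 60:
        return "places-europe"
    else:
        return "places-asia"
```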

3.4 Ranking Service

Ranking Factors:

  1. Distance (primary)
  2. Popularity (reviews, check-ins)
  3. Rating
  4. Personalization (user preferences)
  5. Freshness (newly opened places)
  6. Business tier (promoted listings)

Scoring Formula:

def calculate_score(place, user_location, user_prefs):
    """
    Multi-factor scoring for place ranking.
    """
    # Distance score (inverse exponential)
    distance_km = haversine(user_location, place.location)
    distance_score = math.exp(-distance_km / 5.0)  # Decay over 5km

    # Rating score (normalized)
    rating_score = place.rating / 5.0

    # Popularity score (log scale)
    popularity_score = math.log10(place.review_count + 1) / 4.0

    # Personalization score
    category_match = 1.0 if place.category in user_prefs else 0.5

    # Weighted combination
    total_score = (
        0.50 * distance_score +
        0.20 * rating_score +
        0.15 * popularity_score +
        0.15 * category_match
    )

    return total_score

# Sort by score
ranked_places = sorted(
    candidates,
    key=lambda p: calculate_score(p, user_loc, user_prefs),
    reverse=True
)

ML-Based Ranking:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class MLRankingService:
    def __init__(self):
        self.model = self.load_model()

    def extract_features(self, place, user_location, user_prefs):
        """Extract features for ML model."""
        distance = haversine(user_location, place.location)

        return np.array([
            distance,
            place.rating,
            place.review_count,
            place.price_level,
            int(place.open_now),
            place.popularity_score,
            int(place.category in user_prefs),
            place.days_since_opened
        ])

    def rank(self, places, user_location, user_prefs):
        """Rank places using ML model."""
        features = [
            self.extract_features(p, user_location, user_prefs)
            for p in places
        ]

        scores = self.model.predict(features)

        ranked = sorted(
            zip(places, scores),
            key=lambda x: x[1],
            reverse=True
        )

        return [place for place, score in ranked]

3.5 Handling Moving Objects (Users)

Challenge: User locations change frequently, but we don’t want to reindex a geospatial structure on every location ping.

Solution: Separate User Location Cache

class UserLocationService:
    def __init__(self):
        self.redis = redis.Redis()

    def update_user_location(self, user_id: str, lat: float, lon: float):
        """
        Cache user location with TTL.
        No need to index in geospatial structure.
        """
        self.redis.setex(
            f"user:location:{user_id}",
            300,  # 5 minute TTL
            json.dumps({"lat": lat, "lon": lon, "timestamp": time.time()})
        )

    def get_user_location(self, user_id: str):
        """Retrieve cached user location."""
        data = self.redis.get(f"user:location:{user_id}")
        return json.loads(data) if data else None

For ride-sharing (moving objects that need to be searched):

# Update driver location in Redis GEO
def update_driver_location(driver_id, lat, lon):
    # Remove old location
    redis.zrem("drivers:active", driver_id)

    # Add new location
    redis.geoadd("drivers:active", (lon, lat, driver_id))

    # Set expiry on driver metadata
    redis.setex(f"driver:{driver_id}", 60, json.dumps({
        "lat": lat,
        "lon": lon,
        "status": "available"
    }))

# Cleanup expired drivers
def cleanup_expired_drivers():
    """Remove drivers who haven't updated in 60 seconds."""
    all_drivers = redis.zrange("drivers:active", 0, -1)
    for driver_id in all_drivers:
        if not redis.exists(f"driver:{driver_id}"):
            redis.zrem("drivers:active", driver_id)

3.6 Place Data Management and Updates

Write Path:

Place Update → API → Place Service → PostgreSQL (Write)
                           │
                           ├─→ Kafka → Update Redis GEO (Async)
                           └─→ Kafka → Update Elasticsearch (Async)

Implementation:

class PlaceService:
    def __init__(self):
        self.db = PostgresConnection()
        self.redis = RedisConnection()
        self.es = ElasticsearchConnection()
        self.mq = KafkaProducer()

    def create_place(self, place_data):
        """Create new place."""
        # 1. Write to PostgreSQL (source of truth)
        place_id = self.db.execute("""
            INSERT INTO places (name, category, location, rating)
            VALUES (%s, %s, ST_Point(%s, %s), %s)
            RETURNING id
        """, (
            place_data['name'],
            place_data['category'],
            place_data['lon'],
            place_data['lat'],
            place_data['rating']
        ))

        # 2. Publish to Kafka for async indexing
        self.mq.send('place-updates', {
            'event': 'create',
            'place_id': place_id,
            'data': place_data
        })

        return place_id

    def update_place(self, place_id, updates):
        """Update existing place."""
        # Update PostgreSQL
        self.db.execute("""
            UPDATE places
            SET name = %s, rating = %s, updated_at = NOW()
            WHERE id = %s
        """, (updates['name'], updates['rating'], place_id))

        # Publish update event
        self.mq.send('place-updates', {
            'event': 'update',
            'place_id': place_id,
            'updates': updates
        })

# Async indexer consumer
class PlaceIndexer:
    def consume_updates(self):
        """Process place updates from Kafka."""
        for message in kafka_consumer:
            event = message.value

            if event['event'] == 'create':
                self._index_new_place(event['place_id'], event['data'])
            elif event['event'] == 'update':
                self._update_indexes(event['place_id'], event['updates'])
            elif event['event'] == 'delete':
                self._remove_from_indexes(event['place_id'])

    def _index_new_place(self, place_id, data):
        """Add to Redis GEO and Elasticsearch."""
        # Index in Redis GEO
        category = data['category']
        redis.geoadd(
            f"places:geo:{category}",
            (data['lon'], data['lat'], place_id)
        )

        # Index in Elasticsearch
        es.index(
            index='places',
            id=place_id,
            document={
                'place_id': place_id,
                'name': data['name'],
                'category': category,
                'location': {'lat': data['lat'], 'lon': data['lon']},
                'rating': data['rating']
            }
        )

3.7 Caching Strategy

Multi-Layer Cache:

L1: Application Cache (In-Memory)

from collections import OrderedDict

class CacheLayer:
    def __init__(self, max_size=1000):
        self.local_cache = OrderedDict()  # in-memory LRU
        self.max_size = max_size

    def get_cache_key(self, lat, lon, radius, category):
        """Generate cache key from search params.
        Round to 3 decimal places (~100m precision) so nearby
        queries share a cache entry."""
        return f"{round(lat, 3)}:{round(lon, 3)}:{radius}:{category}"

    def search_cached(self, lat, lon, radius, category):
        """LRU cache for frequent searches."""
        cache_key = self.get_cache_key(lat, lon, radius, category)

        # Check local cache first
        if cache_key in self.local_cache:
            self.local_cache.move_to_end(cache_key)  # mark recently used
            return self.local_cache[cache_key]

        # Execute search
        results = self.search_nearby(lat, lon, radius, category)

        # Cache results, evicting the least recently used entry
        self.local_cache[cache_key] = results
        if len(self.local_cache) > self.max_size:
            self.local_cache.popitem(last=False)
        return results

L2: Redis Cache (Distributed)

def search_with_cache(lat, lon, radius, category):
    """Search with Redis cache layer."""
    cache_key = f"search:{round(lat,2)}:{round(lon,2)}:{radius}:{category}"

    # Check cache
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Execute search
    results = proximity_search(lat, lon, radius, category)

    # Cache for 5 minutes
    redis.setex(cache_key, 300, json.dumps(results))

    return results

Cache Invalidation:

def invalidate_place_cache(place_id):
    """Invalidate cached searches when a place is updated.
    Assumes search cache keys are prefixed with the geohash of the
    query area, e.g. search:{geohash}:{radius}:{category}."""
    # Get place location
    place = db.get_place(place_id)

    # Invalidate cache keys in the surrounding geohash cells
    geohashes = find_nearby_geohashes(place.lat, place.lon, radius_km=10)

    for gh in geohashes:
        # SCAN instead of KEYS to avoid blocking Redis in production
        keys = list(redis.scan_iter(match=f"search:{gh}:*"))
        if keys:
            redis.delete(*keys)

3.8 Database Sharding by Region

Sharding Strategy: Partition data by geographic region to improve query performance and enable regional isolation.

# Shard mapping
SHARDS = {
    'us-west': {
        'bounds': {'min_lat': 32, 'max_lat': 49, 'min_lon': -125, 'max_lon': -100},
        'db': 'postgres-us-west.internal'
    },
    'us-east': {
        'bounds': {'min_lat': 25, 'max_lat': 48, 'min_lon': -100, 'max_lon': -65},
        'db': 'postgres-us-east.internal'
    },
    'europe': {
        'bounds': {'min_lat': 35, 'max_lat': 70, 'min_lon': -10, 'max_lon': 40},
        'db': 'postgres-eu.internal'
    }
}

def get_shard_for_location(lat, lon):
    """Route query to appropriate database shard."""
    for shard_name, config in SHARDS.items():
        bounds = config['bounds']
        if (bounds['min_lat'] <= lat <= bounds['max_lat'] and
            bounds['min_lon'] <= lon <= bounds['max_lon']):
            return config['db']
    return 'postgres-global.internal'  # Fallback

# Query router
def query_places(lat, lon, radius_km):
    """Route query to correct shard."""
    db_conn = get_shard_for_location(lat, lon)

    # For cross-boundary queries, query multiple shards
    if is_near_boundary(lat, lon, radius_km):
        shards = get_affected_shards(lat, lon, radius_km)
        results = []
        for shard in shards:
            results.extend(query_shard(shard, lat, lon, radius_km))
        return merge_and_sort(results, lat, lon)
    else:
        return query_shard(db_conn, lat, lon, radius_km)
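The is_near_boundary and get_affected_shards helpers referenced above can be sketched as follows. The degree conversion (~111 km per degree of latitude) is a rough approximation; a production version would scale the longitude padding by cos(lat):

```python
def get_affected_shards(lat, lon, radius_km, shards=None):
    """Return the DB of every shard whose bounding box the search
    circle may touch. `shards` defaults to the SHARDS map above."""
    if shards is None:
        shards = SHARDS
    pad = radius_km / 111.0  # rough degrees per km of radius
    hit = []
    for name, config in shards.items():
        b = config['bounds']
        if (b['min_lat'] - pad <= lat <= b['max_lat'] + pad and
                b['min_lon'] - pad <= lon <= b['max_lon'] + pad):
            hit.append(config['db'])
    return hit

def is_near_boundary(lat, lon, radius_km, shards=None):
    """True when the search circle may span more than one shard."""
    return len(get_affected_shards(lat, lon, radius_km, shards)) > 1
```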

Step 4: Wrap-Up

Final Architecture Summary

The production-grade proximity service uses a multi-layered approach:

Geospatial Indexing:

  • Redis GEO for low-latency proximity queries (primary)
  • Elasticsearch for complex filters and full-text search
  • PostgreSQL + PostGIS as source of truth
  • Geohash for cache keys and routing

Data Flow:

  • Writes: PostgreSQL → Kafka → Redis GEO + Elasticsearch
  • Reads: Application → L1 Cache → Redis GEO → Ranking → Response
  • Complex queries: Elasticsearch with geo filters

Scalability:

  • Geographic sharding for PostgreSQL and Elasticsearch
  • Redis cluster with regional shards
  • Multi-layer caching (application + Redis)
  • Async indexing via Kafka

Performance:

  • p99 latency < 200ms
  • 200K QPS capacity with auto-scaling
  • 99.99% availability with multi-region deployment

Key Design Decisions

  1. Redis GEO as primary search layer - Sub-millisecond latency for simple proximity queries
  2. PostgreSQL as source of truth - ACID guarantees for place data
  3. Async indexing - Eventual consistency acceptable for search indexes
  4. Geographic sharding - Keeps related data together, reduces cross-region queries
  5. Multi-layer caching - Reduces load on search layer by 80%+

Extensions and Future Work

Real-time place availability:

  • WebSocket connection for live updates
  • Redis Pub/Sub for place status changes
  • Event-driven updates to mobile clients

Machine learning enhancements:

  • Personalized ranking using collaborative filtering
  • Demand prediction for popular areas
  • Anomaly detection for fake reviews

Advanced features:

  • Routing and navigation integration
  • AR-based place discovery
  • Social features (friend check-ins, recommendations)

This design handles 500M+ places, 100M+ DAU, and 200K+ QPS while maintaining sub-200ms p99 latency, making it production-ready at scale.