Weather Service
Walk through a complete weather service design: ingesting from 100K sensors at 1,700 writes/sec, resolving arbitrary coordinates to nearby readings via PostGIS in under 5ms, and serving 33K reads/sec through a layered Redis and CDN cache.
What is a weather data service?
A weather data service ingests atmospheric readings from distributed sensor networks and third-party data providers, stores billions of time-stamped measurements, and serves current conditions plus short-term forecasts by geographic coordinate. The engineering challenge is not the meteorology: every user query must translate an arbitrary lat/lng into readings from the nearest stations and aggregate them in under 100ms while the ingestion pipeline handles thousands of sensor writes per second in parallel. It tests IoT ingestion, time-series storage, geospatial indexing, and read-heavy caching strategy in a single system.
Functional Requirements
Core Requirements
- Ingest weather readings from sensor networks or third-party weather APIs at regular intervals.
- Store current and historical observations per geographic location.
- Serve current conditions and short-term forecasts by latitude/longitude.
- Refresh the client UI with updated data at a configurable interval.
Below the Line (out of scope)
- Building numerical weather prediction (NWP) forecasting models.
- Severe weather push alerting.
- Historical data bulk export and analytics.
- User authentication and personalization.
Building forecasting models is out of scope because NWP algorithms (ensemble methods, physics simulations) require specialized modeling and compute infrastructure separate from the data serving layer. In practice, you call an external model API (NOAA, Tomorrow.io) and cache the returned forecast. The serving logic is a cache-aside proxy, not a modeling pipeline.
Severe weather alerting could sit beside the write path as a separate consumer: read each new Observation off the Kafka topic, evaluate it against geo-fenced alert cells, and fire a notification if a threshold is crossed. The alert evaluation design mirrors a price alert system. It is deliberately deferred because it does not change the ingestion or read path we are designing.
Historical export belongs in a data warehouse (BigQuery, Redshift) fed by a separate Kafka consumer off the same ingestion topic. The export consumer writes to the warehouse; the serving path never touches it. It adds a consumer but does not fundamentally change the read or write path.
User authentication would add a user_id context to the read API and allow personalization such as saved locations. It layers on top of the existing API design without changing the storage or geo-resolution logic.
The hardest part in scope: Translating an arbitrary user coordinate into up-to-date conditions from nearby stations requires two things: a geospatial index that stays fast as the station count grows, and a cache invalidation policy that knows when a reading is stale. The caching side carries the main trade-off: a tighter cache TTL means fresher data but more database reads at scale.
Non-Functional Requirements
Core Requirements
- Scale (writes): 100K weather stations each reporting one reading every 60 seconds = approximately 1,667 writes/sec sustained. Design for 3x burst headroom (about 5,000 writes/sec) to handle thunderstorm events, when stations report more frequently.
- Scale (reads): 10M DAU each polling every 5 minutes = approximately 33K reads/sec at even distribution. Expect 5x peak spikes = up to 165K reads/sec during morning weather checks.
- Read latency: Current conditions delivered in under 100ms p99 end-to-end.
- Data freshness: Current conditions are at most 5 minutes stale. If the newest reading is older than 5 minutes, the app should show the last-updated timestamp rather than silently displaying stale data.
- Availability: 99.9% uptime for the read path (current conditions and forecasts). Brief ingestion lag during a partial outage is acceptable; dark screens are not.
- Durability: Historical observations retained for at least 2 years for trend analysis and display.
Below the Line
- Sub-second real-time streaming of sensor readings (pub/sub dashboard use cases).
- Per-sensor raw data export with millisecond timestamps (industrial IoT).
Read/write ratio: Roughly 33K reads/sec vs 1,700 writes/sec (both average rates) = approximately 20:1. The system is read-dominant but not as skewed as a URL shortener (1,000:1). The interesting design tension here is that the write path must be durable and throughput-consistent (sensors cannot block waiting for slow writes), while the read path must be fast (geo-query plus aggregation under 100ms). These two paths have conflicting needs that push toward a clean separation via a message queue.
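The load figures above can be rechecked with a few lines of arithmetic. This is only a back-of-envelope sketch using the numbers stated in the requirements; the constant and function names are illustrative. Note that the exact 5x read peak comes out slightly above the rounded 165K quoted earlier, because that figure multiplies the already-rounded 33K.

```python
# Back-of-envelope check of the load figures from the requirements.
STATIONS = 100_000            # weather stations
REPORT_INTERVAL_S = 60        # one reading per station per minute
DAU = 10_000_000              # daily active users
POLL_INTERVAL_S = 5 * 60      # each client polls every 5 minutes
WRITE_BURST_FACTOR = 3        # thunderstorm headroom
READ_PEAK_FACTOR = 5          # morning weather-check spike


def sustained_writes_per_sec() -> float:
    return STATIONS / REPORT_INTERVAL_S


def average_reads_per_sec() -> float:
    return DAU / POLL_INTERVAL_S


writes = sustained_writes_per_sec()
reads = average_reads_per_sec()

print(f"sustained writes/sec: {writes:,.0f}")
print(f"burst writes/sec:     {writes * WRITE_BURST_FACTOR:,.0f}")
print(f"average reads/sec:    {reads:,.0f}")
print(f"peak reads/sec:       {reads * READ_PEAK_FACTOR:,.0f}")
print(f"read/write ratio:     {reads / writes:.1f}:1")
```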
Core Entities
- Station: A geographic measurement source (lat, lng, altitude, station_type). Can be a physical IoT sensor or a virtual aggregation point from a third-party provider. Changes rarely; the table is small and cache-friendly.
- Observation: A single time-stamped reading from one station. Captures temperature_c, humidity_pct, pressure_hpa, wind_speed_kph, wind_direction_deg, and precipitation_mm. The primary hot data; billions of rows accumulate over months.
- Forecast: A set of predicted conditions for a future time window at a location. Sourced from an external NWP API and cached locally. Read-only to this system; the weather service does not generate forecasts.
- WeatherSnapshot: A pre-computed "latest reading" record per station, held in Redis. Avoids live aggregation on every user query. Rebuilt from new Observations as they arrive.
The primary data flow is: raw Observations feed both the durable historical store (TimescaleDB) and the live snapshot cache (Redis). Every user read for current conditions hits the snapshot cache, not the historical store. Schema and indexing decisions are deferred to the deep dives.
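As a rough illustration of the entities above, the sketch below models Station and Observation as Python dataclasses and flattens an Observation into the string fields a Redis hash would hold for a WeatherSnapshot. The field names follow the entity descriptions; `altitude_m`, the `station_type` values, and the helper `to_snapshot_fields` are assumptions made for this example, not a schema the design prescribes.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Station:
    station_id: str
    lat: float
    lng: float
    altitude_m: float      # unit is an assumption
    station_type: str      # e.g. "sensor" or "virtual" (assumed values)


@dataclass(frozen=True)
class Observation:
    station_id: str
    observed_at: datetime
    temperature_c: float
    humidity_pct: float
    pressure_hpa: float
    wind_speed_kph: float
    wind_direction_deg: float
    precipitation_mm: float


def to_snapshot_fields(obs: Observation) -> dict[str, str]:
    """Flatten an Observation into string fields suitable for a hash-style snapshot."""
    fields = asdict(obs)
    fields["observed_at"] = obs.observed_at.isoformat()
    return {name: str(value) for name, value in fields.items()}


sample = Observation(
    station_id="KNYC",
    observed_at=datetime(2026, 4, 3, 12, 0, tzinfo=timezone.utc),
    temperature_c=18.5,
    humidity_pct=72,
    pressure_hpa=1013,
    wind_speed_kph=14,
    wind_direction_deg=270,
    precipitation_mm=0,
)
snapshot_fields = to_snapshot_fields(sample)
print(snapshot_fields)
```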
API Design
Endpoints are grouped by the functional requirement they satisfy; FR 2 and FR 4 share one endpoint.
FR 1 (ingest observations):
POST /v1/observations
Content-Type: application/json
Body: {
station_id: "KNYC",
readings: [
{
observed_at: "2026-04-03T12:00:00Z",
temperature_c: 18.5,
humidity_pct: 72,
pressure_hpa: 1013,
wind_speed_kph: 14,
wind_direction_deg: 270,
precipitation_mm: 0
}
]
}
Response 202: { ingested_count: 1 }
Batch array rather than single-record: sensors often buffer 5-10 readings locally during connectivity gaps and flush in a burst. A batch endpoint handles this without the client making one HTTP call per reading. The 202 (Accepted) response confirms the payload was received and queued; it does not guarantee durable storage, which is handled asynchronously by the Kafka consumer.
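The accept-then-queue behaviour can be sketched as a small handler. This is a minimal illustration, not the service's actual code: a plain Python list stands in for the Kafka producer, `handle_ingest` and `REQUIRED_FIELDS` are hypothetical names, and rejecting the whole batch when one reading is malformed is an assumption.

```python
from datetime import datetime

REQUIRED_FIELDS = (
    "observed_at", "temperature_c", "humidity_pct", "pressure_hpa",
    "wind_speed_kph", "wind_direction_deg", "precipitation_mm",
)


def handle_ingest(payload: dict, queue: list) -> tuple[int, dict]:
    """Validate a batch and enqueue one event per reading; returns (status, body)."""
    station_id = payload.get("station_id")
    readings = payload.get("readings")
    if not station_id or not isinstance(readings, list) or not readings:
        return 400, {"error": "station_id and a non-empty readings array are required"}

    events = []
    for reading in readings:
        if not isinstance(reading, dict):
            return 400, {"error": "each reading must be an object"}
        missing = [name for name in REQUIRED_FIELDS if name not in reading]
        if missing:
            return 400, {"error": f"reading is missing fields: {missing}"}
        try:
            # Older Python versions reject a trailing "Z", so normalize it first.
            datetime.fromisoformat(str(reading["observed_at"]).replace("Z", "+00:00"))
        except ValueError:
            return 400, {"error": "observed_at must be an ISO 8601 timestamp"}
        events.append({"station_id": station_id, **reading})

    # Enqueue only after the whole batch validated (all-or-nothing is an assumption).
    queue.extend(events)
    # 202: received and queued, not yet durably stored.
    return 202, {"ingested_count": len(events)}


fake_queue: list = []  # stand-in for the Kafka producer
example_payload = {
    "station_id": "KNYC",
    "readings": [{
        "observed_at": "2026-04-03T12:00:00Z",
        "temperature_c": 18.5, "humidity_pct": 72, "pressure_hpa": 1013,
        "wind_speed_kph": 14, "wind_direction_deg": 270, "precipitation_mm": 0,
    }],
}
status, body = handle_ingest(example_payload, fake_queue)
print(status, body)
```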
FR 2 and FR 4 (current conditions with configurable refresh):
GET /v1/weather/current?lat=40.7128&lng=-74.0060&radius_km=25
Response 200: {
location: { lat: 40.7128, lng: -74.0060 },
observed_at: "2026-04-03T12:00:00Z",
temperature_c: 18.5,
humidity_pct: 72,
pressure_hpa: 1013,
wind_speed_kph: 14,
wind_direction_deg: 270,
precipitation_mm: 0,
nearest_station_id: "KNYC",
nearest_station_distance_km: 2.4
}
Cache-Control: max-age=300
The Cache-Control: max-age=300 header drives the UI refresh interval without a separate parameter. CDN edges and browsers respect this header and serve cached responses for 5 minutes before re-requesting. The optional radius_km parameter (default 25 km) controls how wide a net to cast for nearby stations; clients in rural areas with sparse sensor coverage can increase this to 100 km without changing anything else in the pipeline.
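To make the radius_km behaviour concrete, here is a minimal in-memory sketch of nearest-station resolution. It swaps in a linear scan with the haversine formula in place of the PostGIS index the design actually relies on, so it shows the semantics of the lookup, not its performance. The station coordinates and the names `haversine_km` and `nearest_station` are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lng, stations, radius_km=25.0):
    """Return (station_id, distance_km) of the closest station within radius_km, or None."""
    best = None
    for station_id, (s_lat, s_lng) in stations.items():
        distance = haversine_km(lat, lng, s_lat, s_lng)
        if distance <= radius_km and (best is None or distance < best[1]):
            best = (station_id, distance)
    return best


# Hypothetical station coordinates, for illustration only.
example_stations = {
    "KNYC": (40.78, -73.97),
    "FAR1": (42.00, -74.00),
}

match = nearest_station(40.7128, -74.0060, example_stations)
no_match = nearest_station(40.7128, -74.0060, example_stations, radius_km=1.0)
print(match, no_match)
```

A caller in a sparse area would pass a larger radius_km (for example 100) to widen the search; when nothing falls inside the radius the function returns None, which the API layer would need to translate into an appropriate response.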
FR 3 (short-term forecast):
GET /v1/weather/forecast?lat=40.7128&lng=-74.0060&hours=24
Response 200: {
location: { lat: 40.7128, lng: -74.0060 },
generated_at: "2026-04-03T12:00:00Z",
hourly: [
{ hour: "2026-04-03T13:00:00Z", temperature_c: 19.0, precipitation_prob: 0.1, wind_speed_kph: 12 }
]
}
Cache-Control: max-age=1800
Forecast data from NWP providers updates at most every 30 minutes, so a max-age=1800 TTL is honest about freshness without over-fetching upstream. The generated_at field tells the client when the underlying model ran, independent of when the cached copy was served.
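The forecast path is essentially a cache-aside proxy with a 30-minute TTL, which can be sketched as follows. This is an in-process illustration under stated assumptions: a dictionary stands in for the shared cache, `fetch` is a placeholder for the external NWP API call, and rounding coordinates to two decimals (roughly 1 km of latitude) so nearby requests share an entry is a choice made for this example.

```python
import time
from typing import Callable


class ForecastCache:
    """Cache-aside wrapper: serve from cache while fresh, otherwise fetch and store."""

    def __init__(
        self,
        fetch: Callable[[float, float, int], dict],
        ttl_s: float = 1800.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # placeholder for the upstream forecast call
        self._ttl_s = ttl_s
        self._clock = clock      # injectable so expiry can be tested
        self._entries: dict[tuple, tuple[float, dict]] = {}

    def get(self, lat: float, lng: float, hours: int) -> dict:
        # Assumption: 2-decimal rounding buckets nearby coordinates together.
        key = (round(lat, 2), round(lng, 2), hours)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]                        # fresh cache hit
        forecast = self._fetch(lat, lng, hours)  # miss or expired: go upstream
        self._entries[key] = (now, forecast)
        return forecast


upstream_calls = []


def fake_fetch(lat: float, lng: float, hours: int) -> dict:
    upstream_calls.append((lat, lng, hours))
    return {"generated_at": "2026-04-03T12:00:00Z", "hourly": []}


fake_now = [0.0]
cache = ForecastCache(fake_fetch, ttl_s=1800.0, clock=lambda: fake_now[0])
cache.get(40.7128, -74.0060, 24)   # miss: calls upstream
cache.get(40.7131, -74.0055, 24)   # same bucket: served from cache
calls_before_expiry = len(upstream_calls)
fake_now[0] = 1801.0
cache.get(40.7128, -74.0060, 24)   # TTL elapsed: calls upstream again
calls_after_expiry = len(upstream_calls)
print(calls_before_expiry, calls_after_expiry)
```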
High-Level Design
1. Ingest weather data from sensor networks and third-party APIs
The write path receives batches of sensor readings from two source types: direct sensor adapters using MQTT or HTTP push, and scheduled pollers that call third-party APIs (NOAA, Tomorrow.io) every few minutes.
Components:
- Adapter Service: Thin normalizers that translate heterogeneous sensor protocols (MQTT, HTTP, CoAP) into a canonical Observation event. One adapter type per source protocol.
- Kafka (raw-observations topic): Decouples ingest acceptance from storage writes. Sensors get a fast 202 ACK; the write consumers work at their own pace. Also absorbs thunderstorm write bursts without backpressure reaching sensors.
- Ingestion Consumer: Reads from Kafka in 5-second batches and bulk-inserts into TimescaleDB. At roughly 1,700 writes/sec, each 5-second batch turns about 8,500 tiny row inserts into a small number of large COPY operations.
- Snapshot Updater: A second Kafka consumer on the same topic that upserts the latest reading per station_id into Redis. This keeps the fast read cache always-warm without touching the TSDB.
- TimescaleDB: The durable historical store for all Observations, auto-partitioned by time.
- Redis (station:{id} hash): Holds the latest Observation JSON for each active station. Sub-millisecond lookup; no TSDB round-trip needed for current conditions.
Request walkthrough:
- A sensor flushes a batch of buffered readings to the Adapter Service via MQTT/HTTP.
- The Adapter normalizes the payload to a canonical Observation schema and produces one Kafka message per reading to the raw-observations topic.
- The Adapter returns HTTP 202 to the sensor immediately (queue accepted, not yet written).
- The Ingestion Consumer reads a 5-second window of messages, batches them, and bulk-inserts into TimescaleDB.
- The Snapshot Updater reads the same messages and UPSERTs each station's latest reading into Redis: HSET station:KNYC temperature_c 18.5 humidity_pct 72 ...
- Both consumers commit their Kafka offsets after a successful write. If either fails, Kafka replays from the last committed offset.
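Because Kafka replays from the last committed offset after a failure, the Snapshot Updater can see the same reading twice or see readings out of order. The sketch below shows one way to keep the upsert safe under replay: keep only the newest reading per station in a batch and skip any write that is not newer than what is already stored. A dictionary stands in for Redis, and the `observed_at` comparison guard and the function names are assumptions for this example, not something the walkthrough specifies.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Normalize a trailing "Z" so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_per_station(batch: list[dict]) -> dict[str, dict]:
    """Collapse a batch to the newest event per station_id."""
    latest: dict[str, dict] = {}
    for event in batch:
        station_id = event["station_id"]
        current = latest.get(station_id)
        if current is None or parse_ts(event["observed_at"]) > parse_ts(current["observed_at"]):
            latest[station_id] = event
    return latest


def apply_snapshots(batch: list[dict], store: dict[str, dict]) -> int:
    """Upsert the newest reading per station; returns how many snapshots changed."""
    written = 0
    for station_id, event in latest_per_station(batch).items():
        key = f"station:{station_id}"
        existing = store.get(key)
        if existing and parse_ts(existing["observed_at"]) >= parse_ts(event["observed_at"]):
            continue  # replayed or out-of-order reading: keep the stored snapshot
        # Mirrors an HSET of flat string fields under station:{id}.
        store[key] = {k: str(v) for k, v in event.items() if k != "station_id"}
        written += 1
    return written


example_batch = [
    {"station_id": "KNYC", "observed_at": "2026-04-03T12:00:00Z", "temperature_c": 18.5},
    {"station_id": "KNYC", "observed_at": "2026-04-03T12:01:00Z", "temperature_c": 19.0},
    {"station_id": "KLGA", "observed_at": "2026-04-03T12:00:00Z", "temperature_c": 17.8},
]
snapshot_store: dict[str, dict] = {}  # stand-in for Redis
first_pass = apply_snapshots(example_batch, snapshot_store)
replay_pass = apply_snapshots(example_batch, snapshot_store)  # simulated Kafka replay
print(first_pass, replay_pass)
```

With this guard, committing offsets only after a successful write gives at-least-once delivery without the snapshot ever moving backwards in time.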