Time series storage internals
How time series databases store, compress, and query high-rate timestamped data, including delta-of-delta encoding, Gorilla compression, downsampling, and retention policies that power monitoring and IoT systems.
The problem
Your monitoring system ingests 1 million sensor readings per second from an IoT fleet. You store them in PostgreSQL: INSERT INTO readings (sensor_id, ts, value) VALUES (101, now(), 23.5). At 1M inserts/sec, the B-tree index on (sensor_id, ts) generates massive write amplification, random I/O saturates the disk, and insert throughput collapses to a fraction of what the sensors produce.
The read side is equally painful. An engineer queries 30 days of data for one sensor: SELECT AVG(value) FROM readings WHERE sensor_id = 101 AND ts > now() - interval '30 days'. Thirty days is about 2.6 million seconds, so for a sensor that reports 1,000 times per second that is 2.6 billion rows to scan. Even with a proper index, the query takes minutes because the data is spread across thousands of B-tree pages scattered on disk.
PostgreSQL was not designed for this access pattern. It optimizes for random reads and point updates, not for append-only, high-throughput timestamped writes with range scan reads. The timestamps are monotonically increasing, the writes are always at the current time, and reads are almost always "give me everything between time A and time B." A storage engine built around these three properties can be orders of magnitude faster. This is what time series databases solve.
What it is
A time series database (TSDB) is a storage engine optimized for timestamped, append-mostly data with temporal query patterns. Instead of general-purpose B-trees, it uses specialized compression (delta-of-delta for timestamps, XOR for float values), time-partitioned storage (chunks of hours or days), and purpose-built indexes that exploit the monotonic nature of time.
Think of a library that only collects daily newspapers. A general library files every book by author and title, with a card catalog for random lookups. A newspaper archive instead stacks papers chronologically by date. To find all articles from March 2024, you pull the March shelf. No catalog lookup needed, no random seeking. The temporal ordering is the index.
The three properties that TSDBs exploit are: writes are append-only (always at the current timestamp), write volume is high (thousands to millions of points per second), and reads are almost always time-bounded ranges. Every design decision in a TSDB follows from these three assumptions.
How it works
The core architecture of a TSDB has three layers: an in-memory write buffer for absorbing burst traffic, immutable on-disk chunks for persistence, and a time-based index for fast range lookups.
Pseudocode for the write and read paths:
// Write path
function ingest(metric_name, labels, timestamp, value):
    series_id = series_index.get_or_create(metric_name, labels)
    wal.append(series_id, timestamp, value)              // crash safety
    head_chunk[series_id].append(timestamp, value)       // in-memory
    if head_chunk[series_id].age() > CHUNK_DURATION:     // e.g., 2 hours
        flush_to_disk(head_chunk[series_id])             // compress + write
        head_chunk[series_id] = new Chunk()

// Read path: range query
function query_range(metric_name, labels, start, end):
    series_id = series_index.lookup(metric_name, labels)
    chunks = chunk_index.overlapping(series_id, start, end)
    results = []
    for chunk in chunks:
        data = decompress(chunk)
        results.append(data.filter(start, end))
    return merge_sorted(results)
Writes always go to the head chunk in memory, which is periodically flushed to an immutable compressed chunk on disk. Reads identify the relevant chunks by time range, decompress only those chunks, and merge the results. This is fundamentally different from B-tree storage where reads and writes operate on the same mutable pages.
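To make the head-chunk idea concrete, here is a minimal in-memory sketch in Python. It is not how any particular TSDB is implemented: the names TinySeriesStore, Chunk, and CHUNK_DURATION are hypothetical, there is no WAL or compression, and it assumes timestamps arrive in increasing order. Unlike the pseudocode, it seals the head chunk just before appending a point that would exceed the window, which keeps every chunk shorter than CHUNK_DURATION.

```python
import bisect
from dataclasses import dataclass, field

CHUNK_DURATION = 2 * 60 * 60  # seconds; hypothetical 2-hour window


@dataclass
class Chunk:
    """A run of (timestamp, value) points for one series, in time order."""
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, ts, value):
        self.timestamps.append(ts)
        self.values.append(value)

    def overlaps(self, start, end):
        # A chunk is relevant if its time span intersects [start, end].
        return (bool(self.timestamps)
                and self.timestamps[0] <= end
                and self.timestamps[-1] >= start)

    def window(self, start, end):
        # Timestamps are sorted, so binary search finds the slice bounds.
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


class TinySeriesStore:
    def __init__(self):
        self.head = {}    # series_id -> mutable head Chunk
        self.sealed = {}  # series_id -> sealed chunks, oldest first

    def ingest(self, series_id, ts, value):
        head = self.head.setdefault(series_id, Chunk())
        # Seal the head once the new point falls outside its time window.
        if head.timestamps and ts - head.timestamps[0] >= CHUNK_DURATION:
            self.sealed.setdefault(series_id, []).append(head)
            head = self.head[series_id] = Chunk()
        head.append(ts, value)

    def query_range(self, series_id, start, end):
        chunks = list(self.sealed.get(series_id, []))
        if series_id in self.head:
            chunks.append(self.head[series_id])
        out = []
        for chunk in chunks:  # chunks are already ordered by time
            if chunk.overlaps(start, end):
                out.extend(chunk.window(start, end))
        return out
```

Because chunks are kept oldest first and never overlap, concatenating their slices already yields sorted output, so this sketch needs no separate merge step.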
Compression: the Gorilla algorithm
Facebook's Gorilla paper (2015) introduced the compression scheme that most TSDBs now use. It exploits two properties of sensor data: timestamps arrive at regular intervals, and consecutive values change slowly.
Delta-of-delta timestamp encoding
Instead of storing raw timestamps, Gorilla stores the difference between consecutive deltas. For regular-interval data, the delta-of-delta is zero, which encodes in a single bit:
Raw timestamps:  [1704067200, 1704067260, 1704067320, 1704067380]
Deltas:          [-, +60, +60, +60]   (regular 60s interval)
Delta-of-deltas: [-, -, 0, 0]         (pure zeros for regular data!)
Encoding: 0        = same delta as last time (1 bit)
          non-zero = store the actual delta-of-delta (variable bits)
In the Gorilla paper's production data, about 96% of timestamps compressed to a single bit.
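The following Python sketch works through the same transformation. The function names are hypothetical, and encoded_bits only counts bits rather than producing a bitstream; its bucket sizes follow the variable-length scheme described in the Gorilla paper, so treat them as an illustration of that paper rather than of any specific database.

```python
def delta_of_deltas(timestamps):
    """Return the second differences of a list of timestamps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def restore(first_ts, first_delta, dods):
    """Rebuild the timestamps from the first value, first delta, and the
    delta-of-deltas."""
    out = [first_ts, first_ts + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out


def encoded_bits(dod):
    """Bits needed for one delta-of-delta (control prefix + payload),
    using the bucket ranges from the Gorilla paper."""
    if dod == 0:
        return 1            # single '0' bit
    if -63 <= dod <= 64:
        return 2 + 7        # '10' + 7-bit value
    if -255 <= dod <= 256:
        return 3 + 9        # '110' + 9-bit value
    if -2047 <= dod <= 2048:
        return 4 + 12       # '1110' + 12-bit value
    return 4 + 32           # '1111' + 32-bit value


regular = [1704067200, 1704067260, 1704067320, 1704067380]
jittery = [0, 60, 121, 180]  # one reading arrives a second late

regular_dods = delta_of_deltas(regular)   # all zeros
jittery_dods = delta_of_deltas(jittery)   # small non-zero corrections
```

Even the jittery series stays cheap: a one-second slip costs 9 bits for that point instead of 1, which is still far below a raw 64-bit timestamp.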
XOR value encoding
For float values, Gorilla XORs each value with the previous one. Sensor values that change slowly produce XOR results with many leading and trailing zeros:
Raw values: [23.5, 23.6, 23.5, 23.7]
XOR with previous:
  23.5 XOR 23.6 → sign, exponent, and top mantissa bits cancel, leaving a run of leading zeros
Store: number of leading zeros + meaningful bits only
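A short Python sketch shows what the XOR looks like at the bit level. The helper names are hypothetical, and the code only measures the zero runs rather than writing Gorilla's actual bitstream. It also shows a caveat: decimal fractions such as 23.6 have no exact binary representation, so their XOR with a neighbor keeps a long run of meaningful bits, while repeated values and values with short binary mantissas compress far better.

```python
import struct


def float_bits(x):
    """Reinterpret a float64 as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_window(prev, curr):
    """Return (leading_zeros, trailing_zeros, meaningful_bits) of
    prev XOR curr, or None when the values are bit-identical (the case
    Gorilla stores as a single 0 bit)."""
    x = float_bits(prev) ^ float_bits(curr)
    if x == 0:
        return None
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return leading, trailing, 64 - leading - trailing


same = xor_window(23.5, 23.5)       # identical values
friendly = xor_window(12.0, 24.0)   # only one exponent bit differs
decimal = xor_window(23.5, 23.6)    # 23.6 has a repeating binary fraction
```

Here friendly is (11, 52, 1), a single meaningful bit, while decimal is (19, 1, 44): the leading zeros help, but most of the mantissa still has to be stored.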
The combined result: Gorilla achieves approximately 1.37 bytes per data point compared to 16 bytes raw (8-byte timestamp + 8-byte float64). That is a 12x compression ratio before any general-purpose compression (LZ4, Zstd) on top.
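As a quick back-of-the-envelope check, the arithmetic can be written out in Python. The 1 million points per second rate is borrowed from the opening example, and the constant names are made up for this sketch; the per-point sizes are the ones quoted above.

```python
RAW_BYTES_PER_POINT = 16         # 8-byte timestamp + 8-byte float64
GORILLA_BYTES_PER_POINT = 1.37   # average figure quoted above
POINTS_PER_SECOND = 1_000_000    # assumed ingest rate from the opening example
SECONDS_PER_DAY = 86_400


def bytes_per_day(bytes_per_point, points_per_second=POINTS_PER_SECOND):
    """Storage consumed by one day of ingest at a given per-point size."""
    return bytes_per_point * points_per_second * SECONDS_PER_DAY


ratio = RAW_BYTES_PER_POINT / GORILLA_BYTES_PER_POINT
raw_tb_per_day = bytes_per_day(RAW_BYTES_PER_POINT) / 1e12
gorilla_gb_per_day = bytes_per_day(GORILLA_BYTES_PER_POINT) / 1e9

print(f"ratio: {ratio:.1f}x")
print(f"raw: {raw_tb_per_day:.2f} TB/day, compressed: {gorilla_gb_per_day:.0f} GB/day")
```

Under these assumptions the fleet would write roughly 1.38 TB per day uncompressed versus about 118 GB per day compressed, which is where the "about 12x" figure comes from (16 / 1.37 is roughly 11.7).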
Interview tip: name the compression ratio
When time series storage comes up, say "Gorilla-style compression stores regular metrics at ~1.37 bytes per point versus 16 bytes raw, a 12x reduction, by exploiting the regularity of timestamps and the slow-changing nature of sensor values." This shows you understand the mechanism, not just the existence of compression.
Prometheus uses Gorilla-inspired encoding for its chunks. InfluxDB uses a similar delta + XOR scheme in its TSM engine. VictoriaMetrics further optimizes with custom varint encoding for even better compression on high-cardinality data.
Chunking and the write path
TSDBs partition data into time-bounded chunks (also called blocks or shards). Each chunk covers a fixed time window (typically 2 hours in Prometheus, configurable in InfluxDB). This design has three advantages: writes are always appended to the current chunk, old chunks are immutable and compressible, and retention is implemented by deleting entire chunk files.
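A small Python sketch of the time-window bookkeeping, assuming fixed 2-hour windows aligned to the Unix epoch. The function names and the retention helper are hypothetical; real systems differ in how they align and name chunks.

```python
CHUNK_SECONDS = 2 * 60 * 60  # assumed fixed 2-hour window


def chunk_start(ts, chunk_seconds=CHUNK_SECONDS):
    """Map a Unix timestamp to the start of the window that contains it."""
    return ts - ts % chunk_seconds


def expired_chunks(chunk_starts, now, retention_seconds,
                   chunk_seconds=CHUNK_SECONDS):
    """Return the chunks whose entire window is older than the retention
    cutoff. These can be dropped as whole files, with no per-row deletes."""
    cutoff = now - retention_seconds
    return [s for s in chunk_starts if s + chunk_seconds <= cutoff]


# Every point in the same 2-hour window lands in the same chunk.
same_chunk = chunk_start(1704067260) == chunk_start(1704067200 + 7199)

# With a retention cutoff at t=15000, only fully expired windows qualify.
to_delete = expired_chunks([0, 7200, 14400], now=30000, retention_seconds=15000)
```

A chunk is only deleted once its whole window has aged out, so retention is accurate to within one chunk duration rather than to the individual point.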