Time series storage internals
How time series databases store, compress, and query high-rate timestamped data, including delta-of-delta encoding, Gorilla compression, downsampling, and retention policies that power monitoring and IoT systems.
The problem
Your monitoring system ingests 1 million sensor readings per second from an IoT fleet. You store them in PostgreSQL: INSERT INTO readings (sensor_id, ts, value) VALUES (101, now(), 23.5). At 1M inserts/sec, the B-tree index on (sensor_id, ts) generates massive write amplification, random I/O saturates the disk, and insert throughput collapses to a fraction of what the sensors produce.
The read side is equally painful. An engineer queries 30 days of data for one sensor: SELECT AVG(value) FROM readings WHERE sensor_id = 101 AND ts > now() - interval '30 days'. At one reading per second, that is roughly 2.6 million rows to scan. Even with a proper index, the query takes minutes because the rows are spread across thousands of B-tree pages scattered on disk.
PostgreSQL was not designed for this access pattern. It optimizes for random reads and point updates, not for append-only, high-throughput timestamped writes with range scan reads. The timestamps are monotonically increasing, the writes are always at the current time, and reads are almost always "give me everything between time A and time B." A storage engine built around these three properties can be orders of magnitude faster. This is what time series databases solve.
What it is
A time series database (TSDB) is a storage engine optimized for timestamped, append-mostly data with temporal query patterns. Instead of general-purpose B-trees, it uses specialized compression (delta-of-delta for timestamps, XOR for float values), time-partitioned storage (chunks of hours or days), and purpose-built indexes that exploit the monotonic nature of time.
Think of a library that only collects daily newspapers. A general library files every book by author and title, with a card catalog for random lookups. A newspaper archive instead stacks papers chronologically by date. To find all articles from March 2024, you pull the March shelf. No catalog lookup needed, no random seeking. The temporal ordering is the index.
The three properties that TSDBs exploit are: writes are append-only (always at the current timestamp), write volume is high (thousands to millions of points per second), and reads are almost always time-bounded ranges. Every design decision in a TSDB follows from these three assumptions.
How it works
The core architecture of a TSDB has three layers: an in-memory write buffer for absorbing burst traffic, immutable on-disk chunks for persistence, and a time-based index for fast range lookups.
Pseudocode for the write and read paths:
```
// Write path
function ingest(metric_name, labels, timestamp, value):
    series_id = series_index.get_or_create(metric_name, labels)
    wal.append(series_id, timestamp, value)            // crash safety
    head_chunk[series_id].append(timestamp, value)     // in-memory
    if head_chunk[series_id].age() > CHUNK_DURATION:   // e.g., 2 hours
        flush_to_disk(head_chunk[series_id])           // compress + write
        head_chunk[series_id] = new Chunk()

// Read path: range query
function query_range(metric_name, labels, start, end):
    series_id = series_index.lookup(metric_name, labels)
    chunks = chunk_index.overlapping(series_id, start, end)
    results = []
    for chunk in chunks:
        data = decompress(chunk)
        results.append(data.filter(start, end))
    return merge_sorted(results)
```
Writes always go to the head chunk in memory, which is periodically flushed to an immutable compressed chunk on disk. Reads identify the relevant chunks by time range, decompress only those chunks, and merge the results. This is fundamentally different from B-tree storage where reads and writes operate on the same mutable pages.
Compression: the Gorilla algorithm
Facebook's Gorilla paper (2015) introduced the compression scheme that most TSDBs now use. It exploits two properties of sensor data: timestamps arrive at regular intervals, and consecutive values change slowly.
Delta-of-delta timestamp encoding
Instead of storing raw timestamps, Gorilla stores the difference between consecutive deltas. For regular-interval data, the delta-of-delta is zero, which encodes in a single bit:
```
Raw timestamps:   [1704067200, 1704067260, 1704067320, 1704067380]
Deltas:           [-, +60, +60, +60]   (regular 60s interval)
Delta-of-deltas:  [-,   -,   0,   0]   (pure zeros for regular data!)

Encoding: 0        → same delta as last time (1 bit)
          non-zero → store the actual delta-of-delta (variable bits)
```
In the Gorilla paper's production workload, about 96% of timestamps compressed down to that single bit.
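A minimal Python sketch of the encoder side. This is simplified: the real Gorilla format packs non-zero delta-of-deltas into variable-width bit buckets, and the function name here is illustrative.

```python
def delta_of_delta_encode(timestamps):
    """Split timestamps into a header (first value, first delta) and
    the delta-of-delta stream. Simplified: Gorilla emits a single '0'
    bit when the dod is zero and variable-width buckets otherwise."""
    if len(timestamps) < 2:
        return list(timestamps), []
    header = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = header[1]
    dods = []
    for prev, curr in zip(timestamps[1:], timestamps[2:]):
        delta = curr - prev
        dods.append(delta - prev_delta)  # 0 for a perfectly regular interval
        prev_delta = delta
    return header, dods

# Regular 60-second interval: every delta-of-delta is zero
print(delta_of_delta_encode([1704067200, 1704067260, 1704067320, 1704067380]))
# → ([1704067200, 60], [0, 0])
```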
XOR value encoding
For float values, Gorilla XORs each value with the previous one. Sensor values that change slowly produce XOR results with many leading and trailing zeros:
```
Raw values: [23.5, 23.6, 23.5, 23.7]

XOR with previous:
    bits(23.5) XOR bits(23.6) → mostly zeros, only a few mantissa bits differ
    Store: number of leading zeros + the meaningful bits only
```
The combined result: Gorilla achieves approximately 1.37 bytes per data point compared to 16 bytes raw (8-byte timestamp + 8-byte float64). That is a 12x compression ratio before any general-purpose compression (LZ4, Zstd) on top.
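The XOR step can be demonstrated directly on IEEE 754 bit patterns. A sketch with helper names of my own choosing, not a real library API:

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a float64 as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_deltas(values):
    """XOR each value's bit pattern with its predecessor's. Slowly
    changing values share sign, exponent, and high mantissa bits, so
    the result has long leading-zero runs that Gorilla stores as
    (leading-zero count, meaningful bits)."""
    bits = [float_to_bits(v) for v in values]
    return [prev ^ curr for prev, curr in zip(bits, bits[1:])]

for d in xor_deltas([23.5, 23.6, 23.5, 23.7]):
    print(f"{d:064b}")  # note the long runs of leading zeros
```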
Interview tip: name the compression ratio
When time series storage comes up, say "Gorilla-style compression stores regular metrics at ~1.37 bytes per point versus 16 bytes raw, a 12x reduction, by exploiting the regularity of timestamps and the slow-changing nature of sensor values." This shows you understand the mechanism, not just the existence of compression.
Prometheus uses Gorilla-inspired encoding for its chunks. InfluxDB uses a similar delta + XOR scheme in its TSM engine. VictoriaMetrics further optimizes with custom varint encoding for even better compression on high-cardinality data.
Chunking and the write path
TSDBs partition data into time-bounded chunks (also called blocks or shards). Each chunk covers a fixed time window (typically 2 hours in Prometheus, configurable in InfluxDB). This design has three advantages: writes are always appended to the current chunk, old chunks are immutable and compressible, and retention is implemented by deleting entire chunk files.
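Mapping a timestamp to its chunk is plain integer alignment to the window start. A sketch assuming fixed 2-hour windows and second-resolution timestamps:

```python
CHUNK_DURATION = 2 * 60 * 60  # 2-hour windows, timestamps in seconds

def chunk_start(ts: int) -> int:
    """Align a timestamp down to the start of its chunk window."""
    return ts - ts % CHUNK_DURATION

def chunks_for_range(start: int, end: int) -> list[int]:
    """Start times of every chunk overlapping [start, end]."""
    return list(range(chunk_start(start), chunk_start(end) + 1, CHUNK_DURATION))

# A 01:30-03:30 query touches the 00:00 and 02:00 chunks and nothing else
print(chunks_for_range(90 * 60, 210 * 60))  # → [0, 7200]
```

Retention falls out of the same alignment: any chunk whose start time is older than the retention threshold is deleted as a whole file.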
InfluxDB's TSM (Time-Structured Merge Tree) engine follows this pattern with an LSM-inspired twist: flushed chunks undergo background compaction, merging small chunks into larger, better-compressed ones. Each TSM file ends with a block index that maps each series ID to the byte offset and time range of its data blocks. Queries consult this index to skip irrelevant blocks entirely.
WAL size can surprise you
The WAL holds all writes since the last chunk flush. If your chunk duration is 2 hours and your write rate is 1M points/sec, the WAL accumulates ~7.2 billion uncompressed data points before flushing. Size the WAL disk accordingly, and monitor WAL replay time after crashes. I've seen teams discover their WAL holds 30 GB of uncompressed data only after a crash takes 20 minutes to replay.
Downsampling and retention policies
Raw data at 1-second resolution for one year equals 31.5 million points per sensor. Multiply that across 100,000 sensors and you have 3.15 trillion data points. Keeping all historical data at full resolution is both expensive and unnecessary: nobody needs per-second granularity when looking at a 6-month trend.
Downsampling reduces data density progressively as data ages:
| Age of data | Resolution | Reduction factor | Points per sensor per day |
|---|---|---|---|
| 0-24 hours | Raw (1s) | 1x | 86,400 |
| 1-7 days | 1-minute averages | 60x | 1,440 |
| 7 days - 1 year | 5-minute averages | 300x | 288 |
| Over 1 year | 1-hour averages | 3,600x | 24 |
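Under the hood, downsampling is bucketed aggregation over the raw points. A minimal sketch of the average case (production systems also keep min/max/count so aggregates compose correctly across levels):

```python
from collections import defaultdict

def downsample_avg(points, bucket_seconds):
    """Average raw (timestamp, value) points into fixed time buckets --
    the core of what a recording rule or continuous query precomputes."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, sum(vs) / len(vs)) for start, vs in buckets.items())

raw = [(0, 20.0), (30, 22.0), (60, 24.0), (90, 26.0)]
print(downsample_avg(raw, 60))  # → [(0, 21.0), (60, 25.0)]
```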
Continuous queries (InfluxDB) or recording rules (Prometheus) run on a schedule to precompute downsampled data:
```yaml
# Prometheus recording rules: precompute 5-minute aggregates
groups:
  - name: downsampling
    interval: 5m
    rules:
      - record: sensor:temperature:avg5m
        expr: avg_over_time(sensor_temperature[5m])
      - record: sensor:temperature:max5m
        expr: max_over_time(sensor_temperature[5m])
```
Retention policies automatically delete data older than a threshold. In InfluxDB, this is a drop of entire shard files covering the expired time range, which is nearly instantaneous because shards are self-contained. No row-level deletion needed.
My recommendation for interviews: always mention that downsampling is not just about saving storage. Precomputed aggregates make dashboards over long time ranges load in milliseconds instead of scanning billions of raw points. The storage saving is a bonus; the query speed improvement is the primary value.
Labels, series cardinality, and the inverted index
Prometheus and most modern TSDBs identify each time series by a metric name plus a set of key-value labels:
```
temperature{sensor="101", room="kitchen", building="HQ"}  23.5  @1704067200
temperature{sensor="102", room="lobby"}                   21.0  @1704067200
```
The unique combination of metric name + label set = one series. The TSDB maintains an inverted index mapping each label value to the set of series IDs that contain it. A query like temperature{building="HQ"} looks up "building=HQ" in the inverted index, retrieves matching series IDs, then reads only those series' chunks.
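A toy version of such an inverted index, showing how label matchers become set intersections (simplified: real implementations keep sorted posting lists, intersect them lazily, and deduplicate existing series, which this sketch omits):

```python
from collections import defaultdict

class SeriesIndex:
    """Toy inverted label index: (label, value) -> set of series IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.next_id = 0

    def get_or_create(self, metric, labels):
        """Register a new series and index it under every label pair.
        (Dedup of already-known series is omitted for brevity.)"""
        sid = self.next_id
        self.next_id += 1
        self.postings[("__name__", metric)].add(sid)
        for key, value in labels.items():
            self.postings[(key, value)].add(sid)
        return sid

    def lookup(self, metric, **matchers):
        """Intersect the posting sets of the metric name and each matcher."""
        result = set(self.postings[("__name__", metric)])
        for key, value in matchers.items():
            result &= self.postings[(key, value)]
        return result

idx = SeriesIndex()
idx.get_or_create("temperature", {"sensor": "101", "building": "HQ"})
idx.get_or_create("temperature", {"sensor": "102", "building": "Annex"})
print(idx.lookup("temperature", building="HQ"))  # → {0}
```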
This is fast when cardinality is bounded. The danger is cardinality explosion: using high-cardinality values as labels.
```
# Safe: bounded cardinality (thousands of series)
temperature{sensor="101"}            # one series per physical sensor

# Dangerous: unbounded cardinality (millions of series)
http_requests{request_id="abc123"}   # unique label per request →
                                     # millions of new series per second
```
High cardinality violates TSDB assumptions at every layer. The inverted index grows linearly with series count and must fit in memory. Gorilla compression resets at series boundaries, so millions of short-lived series compress poorly. Memory usage scales with active series count because each series has a head chunk in memory.
Prometheus has a practical ceiling around 5-10 million active series before memory usage and compaction times become unmanageable. I've seen teams blow through this limit by adding a pod_name label in Kubernetes environments where pods are ephemeral, creating millions of short-lived series that never get enough data points to compress well.
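The explosion is multiplicative: the total series count is roughly the product of the distinct-value counts of each label. A back-of-the-envelope calculation with hypothetical fleet numbers (not figures from this text):

```python
# Hypothetical counts of distinct values per label, for illustration only
distinct_values = {"cluster": 5, "namespace": 40, "pod": 500, "metric": 200}

total_series = 1
for count in distinct_values.values():
    total_series *= count  # every label multiplies the series count

print(f"{total_series:,} active series")  # → 20,000,000
```

Adding one more ephemeral label (a pod restart, a request ID) multiplies this again, which is why a single bad label can push an instance past its ceiling.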
Time-based indexing
TSDBs use the monotonic nature of time as a primary index dimension. Instead of a general-purpose B-tree, the index structure is simpler and more efficient.
For a query spanning 01:30 to 03:30, the index identifies that chunks 00:00-02:00 and 02:00-04:00 overlap the range. Only those two chunks are decompressed and scanned. The 04:00-06:00 chunk is skipped entirely based on its time range metadata.
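The chunk-skipping step reduces to an interval-overlap test against each chunk's time-range metadata. A sketch of the 01:30-03:30 query above, with timestamps in seconds and a dict standing in for real chunk metadata:

```python
def overlapping(chunks, start, end):
    """Keep chunks whose [min_time, max_time] intersects [start, end]."""
    return [c for c in chunks if c["min_time"] <= end and c["max_time"] >= start]

chunks = [
    {"min_time": 0,     "max_time": 7200},    # 00:00-02:00
    {"min_time": 7200,  "max_time": 14400},   # 02:00-04:00
    {"min_time": 14400, "max_time": 21600},   # 04:00-06:00
]
hits = overlapping(chunks, 90 * 60, 210 * 60)  # the 01:30-03:30 query
print(len(hits))  # → 2: the 04:00-06:00 chunk is never decompressed
```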
This is fundamentally simpler than a B-tree because the key space (time) is ordered and append-only. There are no random insertions into the middle of the index, no page splits, and no rebalancing. The entire index update on write is "append to head chunk." The per-series chunk list grows linearly and is sorted by time implicitly.
Prometheus's TSDB implementation adds tombstones for deleted ranges. When you call the delete API, Prometheus writes a tombstone record that marks a time range as deleted for a series. The actual data is only removed during the next compaction cycle. This means deletes are fast (just write a marker) but disk space is not reclaimed immediately.
The combination of time-partitioned chunks and inverted label indexes is what makes TSDBs fundamentally faster than a general-purpose database for time-range queries. A query over 24 hours of data for 10 series touches exactly 12 chunk files (24h / 2h per chunk) and reads only those 10 series within each chunk. In PostgreSQL, the same query would walk a B-tree index, follow pointers to heap pages scattered across disk, and reassemble rows from pages shared with unrelated data.
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Prometheus | Pull-based monitoring for Kubernetes and cloud-native systems | Gorilla-inspired chunk encoding. Local disk storage with 2-hour chunk blocks. Inverted index for label queries. Practical limit ~10M active series per instance. |
| InfluxDB | IoT and infrastructure monitoring | TSM engine (LSM-inspired with columnar compression). Tag-based series model. Continuous queries for downsampling. Flux query language. |
| TimescaleDB | PostgreSQL extension for time series | Hypertables partition data by time into chunks. Leverages PostgreSQL's B-tree indexes within each chunk. Full SQL support including JOINs. Best choice when you need relational features with time series performance. |
| VictoriaMetrics | Long-term storage for Prometheus metrics | Custom compression (better than Gorilla for high cardinality). Supports 100M+ active series. Global deduplication. Often used as remote storage backend for Prometheus. |
| Apache Druid | Real-time analytics on event data | Column-oriented segments partitioned by time. Roll-up (pre-aggregation) at ingestion. Sub-second queries on billions of rows. Hybrid between TSDB and OLAP. |
Limitations and when NOT to use it
- Ad-hoc joins across time series and relational data are painful. TSDBs optimize for single-series or label-filtered queries, not for joining sensor readings with a users table. If you need relational joins, use TimescaleDB (PostgreSQL extension) or export to an analytical database.
- Retroactive writes (backfill) are slow or unsupported. Most TSDBs assume writes arrive in time order. Writing data with timestamps hours or days in the past may land in already-compacted chunks, requiring re-opening and recompression. Prometheus explicitly rejects out-of-order samples by default.
- Cardinality explosion can bring the system down. Using unbounded label values (request IDs, user IDs, pod names in ephemeral environments) creates millions of short-lived series that blow through memory limits and crash the TSDB.
- Complex queries beyond time-range aggregation perform poorly. Window functions, subqueries, and multi-step transformations are limited in PromQL and Flux. If your query workload is more analytical than monitoring, a columnar OLAP database is a better fit.
- Small-scale use cases gain nothing. If you have fewer than 10,000 metrics at 1-minute intervals, PostgreSQL with a timestamp index handles the load fine. TSDBs add operational complexity (retention policies, downsampling rules, cardinality monitoring) that is not justified at low volume.
Interview cheat sheet
- When the interviewer mentions metrics, monitoring, IoT, or sensor data, immediately name time series storage. Say "TSDBs exploit three properties: append-only writes, high throughput, and temporal read patterns."
- When asked about compression, explain Gorilla's delta-of-delta encoding for timestamps and XOR encoding for values. State the concrete ratio: "~1.37 bytes per point versus 16 bytes raw, roughly 12x compression."
- When asked about the write path, describe the WAL + head chunk + flush pattern. "Writes append to an in-memory head chunk backed by a WAL. When the chunk window expires (typically 2 hours), it flushes to an immutable compressed file on disk."
- When cardinality comes up, flag it as the number one operational risk. "Every unique label combination creates a new series. Unbounded labels like request IDs create cardinality explosions that consume all available memory."
- When asked about querying old data, explain downsampling. "Retention policies keep raw data for hours or days, then progressively downsample to 1-minute, 5-minute, and 1-hour averages. This makes year-scale dashboard queries return in milliseconds."
- When comparing TSDBs to general-purpose databases, state the tradeoff directly. "TSDBs trade query flexibility (no joins, limited ad-hoc queries) for orders-of-magnitude better write throughput and compression on timestamped data."
- When Prometheus vs InfluxDB comes up, name the key difference: Prometheus uses a pull model (scrapes targets) with PromQL; InfluxDB uses a push model (agents push data) with Flux/SQL. Both use Gorilla-style compression internally.
- When asked about scaling beyond a single node, mention that most TSDBs shard by series (consistent hashing on series ID). Thanos and Cortex add multi-tenant, horizontally-scalable query layers on top of Prometheus.
Quick recap
- Time series databases exploit three data properties: append-only writes, high write throughput, and temporal read patterns, enabling specialized compression and indexing that general-purpose databases cannot match.
- Gorilla compression (delta-of-delta for timestamps, XOR for values) achieves ~1.37 bytes per data point versus 16 bytes raw, a 12x compression ratio, by exploiting the regularity of timestamps and slow-changing nature of sensor values.
- Time-partitioned chunks (2-hour blocks written from an in-memory head chunk backed by a WAL) give TSDBs both high write throughput (sequential appends only) and efficient reads (skip irrelevant chunks by time range).
- Downsampling with retention policies keeps recent data at full resolution and progressively coarser resolution for older data, making year-scale queries return in milliseconds instead of scanning billions of raw points.
- Cardinality explosion is the primary operational failure mode: unbounded label values (request IDs, ephemeral pod names) create millions of short-lived series that exhaust memory and destroy index performance.
- TSDBs trade query flexibility for write performance: no joins, limited ad-hoc queries, append-only writes. If you need relational features with time series performance, use TimescaleDB; if you need analytics over event data, consider a columnar OLAP engine.
Related concepts
- Databases - Time series databases are a specialized category within the broader database landscape. Understanding when to choose a TSDB over a general-purpose database is a core system design decision.
- LSM trees - InfluxDB's TSM engine and Prometheus's chunk storage are both inspired by LSM-tree architecture: WAL-backed memory buffer, immutable flushed files, and background compaction.
- Column-oriented storage - Apache Druid and some TSDB backends use columnar storage within time-partitioned chunks, combining time-series chunking with columnar compression for analytical queries over event data.