Segment trees
How segment trees enable efficient range queries and range updates in O(log N), the data structure behind real-time leaderboards, financial analytics, and database range aggregations.
The problem
You are building a real-time gaming leaderboard. One million players have scores that change constantly (point updates), and your API needs to answer "how many players have scores between 500 and 1000?" (range queries) in under 10 milliseconds.
With a raw array, answering a range query requires scanning every element in the range. For a million elements, that is O(N) per query. At 10,000 queries per second, you are doing 10 billion comparisons per second. The leaderboard page takes 3 seconds to load. Response times degrade linearly as the player base grows.
You could precompute prefix sums for O(1) range queries. But when a player's score changes, you must rebuild the entire prefix sum array, which is O(N). With 50,000 score updates per second, the prefix sum array is permanently stale. By the time you finish rebuilding, thousands of new updates have arrived.
This is the fundamental tension: arrays are fast to update but slow to query. Prefix sums are fast to query but slow to update. You need a structure that gives O(log N) for both operations simultaneously.
| Approach | Query | Update |
|---|---|---|
| Raw array | O(N) scan | O(1) |
| Prefix sum array | O(1) | O(N) rebuild |
| Segment tree | O(log N) | O(log N) |
When both queries and updates happen at scale, neither extreme works. Segment trees handle both in O(log N).
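As a quick illustration of the prefix-sum half of this tradeoff, here is a minimal sketch (the sample scores are made up):

```typescript
// Prefix sums: O(1) range queries, but any point update forces an O(N) rebuild.
function buildPrefix(arr: number[]): number[] {
  const prefix = new Array(arr.length + 1).fill(0);
  for (let i = 0; i < arr.length; i++) prefix[i + 1] = prefix[i] + arr[i];
  return prefix;
}

// O(1) sum over the inclusive range [l, r].
function rangeSum(prefix: number[], l: number, r: number): number {
  return prefix[r + 1] - prefix[l];
}

const scores = [5, 2, 7, 1, 9];
let prefix = buildPrefix(scores);
console.log(rangeSum(prefix, 1, 3)); // 2 + 7 + 1 = 10

// A single score change invalidates every later prefix entry: rebuild, O(N).
scores[2] = 4;
prefix = buildPrefix(scores);
console.log(rangeSum(prefix, 1, 3)); // 2 + 4 + 1 = 7
```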
What it is
A segment tree is a binary tree where each node stores a precomputed aggregate (sum, min, max, count) over a contiguous segment of an array. The root covers the entire array, and each leaf covers a single element.
Think of a company org chart for budget tracking. The CEO node holds the total budget of the entire company. Each VP node holds the total for their division. Each manager node holds the total for their team. Each individual contributor is a leaf with their own budget. To find the total budget of divisions 3 through 5, you sum three VP nodes instead of iterating through hundreds of employees. When one employee's budget changes, you update their leaf and recalculate upward through their manager, VP, and CEO, touching only log(N) nodes.
Interview tip: the one-liner
"A segment tree precomputes aggregates over every possible range in O(N) space, then answers any range query in O(log N) by combining at most 2 * log(N) precomputed nodes, and handles point updates in O(log N) by recomputing ancestors."
How it works
A segment tree is a balanced binary tree. Each node stores the aggregate of a range of the original array. The root covers the full array; leaves cover individual elements.
For N elements, the tree has at most 4N nodes. Storage is O(N). The tree is stored as a flat array where node i has children at 2i+1 and 2i+2. No pointers needed.
class SegmentTree {
  private tree: number[];
  private n: number;

  constructor(arr: number[]) {
    this.n = arr.length;
    this.tree = new Array(4 * this.n).fill(0);
    this.build(arr, 0, 0, this.n - 1);
  }

  private build(arr: number[], node: number, start: number, end: number): void {
    if (start === end) {
      this.tree[node] = arr[start]; // leaf: a single element
      return;
    }
    const mid = Math.floor((start + end) / 2);
    const left = 2 * node + 1;
    const right = 2 * node + 2;
    this.build(arr, left, start, mid);
    this.build(arr, right, mid + 1, end);
    this.tree[node] = this.tree[left] + this.tree[right]; // merge: sum
  }
}
Build time: O(N). Each element is visited once. The recursive build visits every node exactly once, and each node's computation (merging two children) is O(1).
Array-based storage
The tree is stored as a flat array rather than with node pointers. Node at index i has its left child at 2i + 1 and right child at 2i + 2. This eliminates pointer overhead and allows single-block allocation. Why 4N nodes? For power-of-2 input sizes, the tree has exactly 2N - 1 nodes. For non-power-of-2 sizes, the last level may be sparse, requiring up to 4N slots in the worst case.
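The addressing scheme can be sanity-checked in a few lines (a standalone toy, not part of the class above):

```typescript
// Flat-array tree addressing: node i's children live at 2i+1 and 2i+2,
// and a child's parent is floor((child - 1) / 2). No pointers needed.
const leftChild = (i: number): number => 2 * i + 1;
const rightChild = (i: number): number => 2 * i + 2;
const parent = (i: number): number => Math.floor((i - 1) / 2);

console.log(leftChild(0), rightChild(0)); // 1 2
console.log(parent(1), parent(2));        // 0 0
console.log(leftChild(3), rightChild(3)); // 7 8
```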
Range query
A query for sum([L, R]) descends the tree, cutting off branches that lie entirely outside [L, R] and using precomputed sums for branches entirely inside.
For a sample array [1, 3, 2, 3, 6, 8, 4, 2], the query sum([2, 5]) decomposes into two precomputed nodes: sum[2,3] = 5 and sum[4,5] = 14. Answer: 19. Seven tree nodes are visited instead of the 4 array elements a scan would touch; for a tiny array that is no faster, but at scale this decomposition is O(log N) versus O(N).
query(queryLeft: number, queryRight: number): number {
  return this._query(0, 0, this.n - 1, queryLeft, queryRight);
}

private _query(
  node: number, start: number, end: number,
  qLeft: number, qRight: number
): number {
  // No overlap
  if (qRight < start || end < qLeft) return 0; // identity for sum
  // Total overlap: use precomputed value
  if (qLeft <= start && end <= qRight) return this.tree[node];
  // Partial overlap: recurse into both children
  const mid = Math.floor((start + end) / 2);
  const leftSum = this._query(2 * node + 1, start, mid, qLeft, qRight);
  const rightSum = this._query(2 * node + 2, mid + 1, end, qLeft, qRight);
  return leftSum + rightSum;
}
At each level, at most 4 nodes are visited (2 partial overlaps, potentially 2 full overlaps that terminate). At most 2 nodes per level create "partial overlap" recursive calls, and the rest terminate early. Result: O(log N).
The three cases for each node during a query:
- No overlap (query range is entirely outside the node's range): return the identity element immediately. Zero work.
- Total overlap (node's range is entirely inside the query range): return the precomputed value immediately. Zero recursion.
- Partial overlap (query range partially covers the node's range): recurse into both children and merge results.
Point update
Change arr[i] = newValue. Walk from root to the leaf at index i, updating every ancestor.
The update touches exactly log(N) nodes: one per tree level. Each node recomputation is O(1) (summing two children), so the total is O(log N).
update(index: number, newValue: number): void {
  this._update(0, 0, this.n - 1, index, newValue);
}

private _update(
  node: number, start: number, end: number,
  index: number, newValue: number
): void {
  if (start === end) {
    this.tree[node] = newValue;
    return;
  }
  const mid = Math.floor((start + end) / 2);
  if (index <= mid) {
    this._update(2 * node + 1, start, mid, index, newValue);
  } else {
    this._update(2 * node + 2, mid + 1, end, index, newValue);
  }
  // Recompute this node from updated children
  this.tree[node] = this.tree[2 * node + 1] + this.tree[2 * node + 2];
}
Touches exactly log N nodes. O(log N).
Notice the update code is nearly identical to the query code. The only difference: at the leaf, it sets the value instead of returning it, and on the way back up, it recomputes each parent from its two children. This symmetry makes segment trees easy to implement once you understand the recursive pattern.
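To see the whole pattern end to end, here is a condensed, self-contained version of the class above with a small worked example (the sample array and values are illustrative):

```typescript
// Condensed sum segment tree: build O(N), query O(log N), point update O(log N).
class MiniSegmentTree {
  private tree: number[];
  private n: number;

  constructor(arr: number[]) {
    this.n = arr.length;
    this.tree = new Array(4 * this.n).fill(0);
    this.build(arr, 0, 0, this.n - 1);
  }

  private build(arr: number[], node: number, start: number, end: number): void {
    if (start === end) { this.tree[node] = arr[start]; return; }
    const mid = Math.floor((start + end) / 2);
    this.build(arr, 2 * node + 1, start, mid);
    this.build(arr, 2 * node + 2, mid + 1, end);
    this.tree[node] = this.tree[2 * node + 1] + this.tree[2 * node + 2];
  }

  query(l: number, r: number): number {
    return this._query(0, 0, this.n - 1, l, r);
  }

  private _query(node: number, start: number, end: number, l: number, r: number): number {
    if (r < start || end < l) return 0;                  // no overlap
    if (l <= start && end <= r) return this.tree[node];  // total overlap
    const mid = Math.floor((start + end) / 2);
    return this._query(2 * node + 1, start, mid, l, r)
         + this._query(2 * node + 2, mid + 1, end, l, r);
  }

  update(index: number, value: number): void {
    this._update(0, 0, this.n - 1, index, value);
  }

  private _update(node: number, start: number, end: number, index: number, value: number): void {
    if (start === end) { this.tree[node] = value; return; }
    const mid = Math.floor((start + end) / 2);
    if (index <= mid) this._update(2 * node + 1, start, mid, index, value);
    else this._update(2 * node + 2, mid + 1, end, index, value);
    this.tree[node] = this.tree[2 * node + 1] + this.tree[2 * node + 2];
  }
}

const st = new MiniSegmentTree([1, 3, 2, 3, 6, 8, 4, 2]);
console.log(st.query(2, 5)); // 2 + 3 + 6 + 8 = 19
st.update(4, 10);            // arr[4]: 6 -> 10
console.log(st.query(2, 5)); // 2 + 3 + 10 + 8 = 23
```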
Lazy propagation for range updates
When updating a range [L, R] instead of a single point, the naive approach degrades to O(N log N) because you call point update for each element. Lazy propagation defers updates: tag a node "this entire range needs +5" and only push the tag down when you actually need to recurse into children.
Common interview mistake
Candidates often describe range updates as "just call point update in a loop." This is O(N log N), which defeats the purpose. Always mention lazy propagation when range updates come up. It reduces range update + range query to O(log N) each.
class LazySegmentTree {
  private tree: number[];
  private lazy: number[]; // pending range-add values
  private n: number;

  // ... build same as before ...

  private pushDown(node: number, start: number, end: number): void {
    if (this.lazy[node] !== 0) {
      const mid = Math.floor((start + end) / 2);
      const leftSize = mid - start + 1;
      const rightSize = end - mid;
      this.tree[2 * node + 1] += this.lazy[node] * leftSize;
      this.tree[2 * node + 2] += this.lazy[node] * rightSize;
      this.lazy[2 * node + 1] += this.lazy[node];
      this.lazy[2 * node + 2] += this.lazy[node];
      this.lazy[node] = 0; // clear after propagating
    }
  }

  rangeUpdate(qLeft: number, qRight: number, delta: number): void {
    this._rangeUpdate(0, 0, this.n - 1, qLeft, qRight, delta);
  }

  private _rangeUpdate(
    node: number, start: number, end: number,
    qLeft: number, qRight: number, delta: number
  ): void {
    if (qRight < start || end < qLeft) return;
    if (qLeft <= start && end <= qRight) {
      // Tag the whole range, don't recurse yet
      this.tree[node] += delta * (end - start + 1);
      this.lazy[node] += delta;
      return;
    }
    this.pushDown(node, start, end);
    const mid = Math.floor((start + end) / 2);
    this._rangeUpdate(2 * node + 1, start, mid, qLeft, qRight, delta);
    this._rangeUpdate(2 * node + 2, mid + 1, end, qLeft, qRight, delta);
    this.tree[node] = this.tree[2 * node + 1] + this.tree[2 * node + 2];
  }
}
With lazy propagation, range update + range query are both O(log N). The lazy array adds O(N) space but transforms the asymptotic behavior of range operations.
How pushDown works conceptually: When a node has a pending lazy tag of +5, it means "every element in this node's range needs +5 added, but the children don't know yet." When we need to access a child (during query or update), we first push the tag down: add (+5 x child_size) to each child's sum, transfer the +5 tag to each child's lazy slot, and clear the parent's lazy slot. This ensures the children always have correct values when accessed.
The key invariant: tree[node] always reflects the correct aggregate for its range, including any pending lazy values. Only the children can be stale. This makes the implementation local: each node only needs to push down before accessing its own children.
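A self-contained sketch of the lazy tree in action, adding a `query` method that applies the pushDown-before-descending rule (the tree starts as all zeros; sizes and values are illustrative):

```typescript
// Lazy range-add sum tree over an initially all-zero array of length n.
class MiniLazyTree {
  private tree: number[];
  private lazy: number[];
  private n: number;

  constructor(n: number) {
    this.n = n;
    this.tree = new Array(4 * n).fill(0);
    this.lazy = new Array(4 * n).fill(0);
  }

  private pushDown(node: number, start: number, end: number): void {
    if (this.lazy[node] !== 0) {
      const mid = Math.floor((start + end) / 2);
      this.tree[2 * node + 1] += this.lazy[node] * (mid - start + 1);
      this.tree[2 * node + 2] += this.lazy[node] * (end - mid);
      this.lazy[2 * node + 1] += this.lazy[node];
      this.lazy[2 * node + 2] += this.lazy[node];
      this.lazy[node] = 0;
    }
  }

  rangeUpdate(l: number, r: number, delta: number): void {
    this._update(0, 0, this.n - 1, l, r, delta);
  }

  private _update(node: number, start: number, end: number, l: number, r: number, delta: number): void {
    if (r < start || end < l) return;
    if (l <= start && end <= r) {
      this.tree[node] += delta * (end - start + 1); // tag, don't recurse
      this.lazy[node] += delta;
      return;
    }
    this.pushDown(node, start, end);
    const mid = Math.floor((start + end) / 2);
    this._update(2 * node + 1, start, mid, l, r, delta);
    this._update(2 * node + 2, mid + 1, end, l, r, delta);
    this.tree[node] = this.tree[2 * node + 1] + this.tree[2 * node + 2];
  }

  query(l: number, r: number): number {
    return this._query(0, 0, this.n - 1, l, r);
  }

  private _query(node: number, start: number, end: number, l: number, r: number): number {
    if (r < start || end < l) return 0;
    if (l <= start && end <= r) return this.tree[node];
    this.pushDown(node, start, end); // children must be fresh before descending
    const mid = Math.floor((start + end) / 2);
    return this._query(2 * node + 1, start, mid, l, r)
         + this._query(2 * node + 2, mid + 1, end, l, r);
  }
}

const lt = new MiniLazyTree(8);
lt.rangeUpdate(0, 7, 1);     // every element +1
lt.rangeUpdate(2, 5, 5);     // indices 2..5 get +5
console.log(lt.query(0, 7)); // 8 * 1 + 4 * 5 = 28
console.log(lt.query(3, 4)); // (1 + 5) * 2 = 12
```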
Aggregate function variants
The examples above use sum as the aggregate. Segment trees work with any associative binary operation that has an identity element.
| Aggregate | Merge function | Identity element | Use case |
|---|---|---|---|
| Sum | a + b | 0 | Total revenue in range, event counts |
| Min | min(a, b) | +infinity | Lowest price in time window |
| Max | max(a, b) | -infinity | Peak traffic in interval |
| GCD | gcd(a, b) | 0 | Number theory problems |
| Bitwise OR | a \| b | 0 | Flag union queries |
| Count (with condition) | a + b | 0 | "How many elements > X in range?" |
The identity element is returned when a query falls entirely outside a node's range (no overlap). For sum, returning 0 has no effect on the total. For min, returning +infinity cannot be the minimum.
My recommendation: when building a segment tree, parameterize the merge function and identity. This makes the implementation reusable across different aggregate types without duplicating code.
The reason this works: the segment tree algorithm only cares that the merge function is associative (merge(merge(a, b), c) == merge(a, merge(b, c))) and has an identity element. It never needs to know what the aggregate actually means. This is why one segment tree template can handle sum, min, max, GCD, and any custom associative function.
function build_segment_tree(array, merge_fn, identity):
    // merge_fn: (a, b) -> result
    // identity: value that satisfies merge_fn(identity, x) == x
    tree = new Array(4 * length(array))
    fill(tree, identity)
    build_recursive(tree, array, merge_fn, 0, 0, length(array) - 1)
    return tree
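A TypeScript rendering of this parameterized pattern, instantiated twice from the same template (class and variable names are my own, not a standard library API):

```typescript
// Generic segment tree: the merge function and identity are the only knobs.
class GenericSegTree<T> {
  private tree: T[];
  private n: number;

  constructor(
    arr: T[],
    private merge: (a: T, b: T) => T,
    private identity: T,
  ) {
    this.n = arr.length;
    this.tree = new Array<T>(4 * this.n).fill(identity);
    this.build(arr, 0, 0, this.n - 1);
  }

  private build(arr: T[], node: number, start: number, end: number): void {
    if (start === end) { this.tree[node] = arr[start]; return; }
    const mid = Math.floor((start + end) / 2);
    this.build(arr, 2 * node + 1, start, mid);
    this.build(arr, 2 * node + 2, mid + 1, end);
    this.tree[node] = this.merge(this.tree[2 * node + 1], this.tree[2 * node + 2]);
  }

  query(l: number, r: number): T {
    return this._query(0, 0, this.n - 1, l, r);
  }

  private _query(node: number, start: number, end: number, l: number, r: number): T {
    if (r < start || end < l) return this.identity;  // no overlap
    if (l <= start && end <= r) return this.tree[node];
    const mid = Math.floor((start + end) / 2);
    return this.merge(
      this._query(2 * node + 1, start, mid, l, r),
      this._query(2 * node + 2, mid + 1, end, l, r),
    );
  }
}

// Same template, two different aggregates:
const prices = [7, 2, 9, 4, 6];
const minTree = new GenericSegTree(prices, Math.min, Infinity);
const sumTree = new GenericSegTree(prices, (a, b) => a + b, 0);
console.log(minTree.query(1, 3)); // min(2, 9, 4) = 2
console.log(sumTree.query(1, 3)); // 2 + 9 + 4 = 15
```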
Persistent segment trees
A persistent segment tree preserves all previous versions of the tree after each update. Instead of modifying nodes in-place, each update creates new nodes for the path from root to the updated leaf, sharing all unmodified subtrees with the previous version.
This is copy-on-write applied to a segment tree. An update creates O(log N) new nodes (one per level), while the remaining O(N) nodes are shared with the previous version. The result: you can query any historical version of the array in O(log N) with no performance penalty compared to querying the current version.
Where this matters: online judges and competitive programming aside, persistent segment trees appear in version-controlled databases and time-travel queries. If your system needs "what was the sum of this range at timestamp T?", a persistent segment tree answers this without storing full snapshots.
The space cost is O(N + Q x log N) for Q updates, since each update adds log N new nodes. This is much cheaper than storing Q full copies of the tree (which would be O(Q x N)).
For a concrete example: 10 million data points give a base tree of roughly 2 x 10M = 20M nodes, and 1 million updates add about 1M x 24 = 24M path-copy nodes, for ~44M nodes total. Even at 24 bytes per node (an 8-byte value plus two child pointers), that is about 1 GB. Storing 1 million full snapshots of the 10M-element array would require 10M x 1M x 8 bytes = 80 TB. Persistent segment trees cut this by a factor of tens of thousands.
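A minimal pointer-based sketch of the copy-on-write idea (identifiers are illustrative; a production version would also need version pruning or garbage collection):

```typescript
// Persistent sum segment tree: updates return a NEW root, sharing
// every unmodified subtree with the previous version.
interface PNode {
  sum: number;
  left: PNode | null;
  right: PNode | null;
}

function pBuild(arr: number[], start: number, end: number): PNode {
  if (start === end) return { sum: arr[start], left: null, right: null };
  const mid = Math.floor((start + end) / 2);
  const left = pBuild(arr, start, mid);
  const right = pBuild(arr, mid + 1, end);
  return { sum: left.sum + right.sum, left, right };
}

// Creates O(log N) new nodes (the root-to-leaf path); old root stays valid.
function pUpdate(node: PNode, start: number, end: number, index: number, value: number): PNode {
  if (start === end) return { sum: value, left: null, right: null };
  const mid = Math.floor((start + end) / 2);
  if (index <= mid) {
    const left = pUpdate(node.left!, start, mid, index, value);
    return { sum: left.sum + node.right!.sum, left, right: node.right };
  }
  const right = pUpdate(node.right!, mid + 1, end, index, value);
  return { sum: node.left!.sum + right.sum, left: node.left, right };
}

function pQuery(node: PNode, start: number, end: number, l: number, r: number): number {
  if (r < start || end < l) return 0;
  if (l <= start && end <= r) return node.sum;
  const mid = Math.floor((start + end) / 2);
  return pQuery(node.left!, start, mid, l, r) + pQuery(node.right!, mid + 1, end, l, r);
}

const v0 = pBuild([1, 2, 3, 4], 0, 3); // version 0
const v1 = pUpdate(v0, 0, 3, 2, 10);   // version 1: arr[2] = 10
console.log(pQuery(v0, 0, 3, 1, 3));   // old version still answers 2+3+4 = 9
console.log(pQuery(v1, 0, 3, 1, 3));   // new version: 2+10+4 = 16
```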
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Real-time leaderboards (gaming, competitions) | Rank queries and score updates | A segment tree over score buckets answers "how many players have score >= X?" in O(log N). When a player's score changes, update two buckets (decrement old, increment new) in O(log N). |
| Time-series databases (InfluxDB, TimescaleDB) | Range aggregations over time windows | Segment trees (or conceptually equivalent structures) over time buckets answer "sum of events in [t1, t2]" without scanning all data points. Combined with downsampling for older data. |
| Database query engines (columnar stores) | Min/max zone maps for block pruning | PostgreSQL's BRIN indexes and columnar engines use min/max metadata per block (conceptually a leaf-level segment tree) to skip irrelevant blocks during range queries. |
| Computational geometry | Sweep-line interval queries | Count overlapping intervals, find all segments intersecting a vertical line. Segment trees handle these in O(log N + k) where k is the output size. |
| Network monitoring | Traffic aggregation over IP ranges | Aggregate bandwidth or packet counts over IP address ranges. Each IP prefix maps to a segment tree node for hierarchical aggregation. |
For your interview: leaderboards and time-series databases are the two most common real-world applications to mention. Both involve frequent updates and frequent range queries, which is exactly the segment tree's sweet spot.
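The leaderboard row in the table above can be sketched with a compact iterative sum tree over score buckets (MAX_SCORE = 1024 and the sample scores are assumptions for the sketch; real leaderboards would bucket a wider score range):

```typescript
// Sum segment tree indexed by SCORE (value space), not by player.
// Leaf s counts players whose score is exactly s.
const MAX_SCORE = 1024;                        // power of two for simplicity
const tree = new Array(2 * MAX_SCORE).fill(0); // iterative (bottom-up) tree

function addPlayerAtScore(score: number, delta: number): void {
  let i = score + MAX_SCORE;                   // leaf index
  for (tree[i] += delta; i > 1; i >>= 1) {
    tree[i >> 1] = tree[i] + tree[i ^ 1];      // recompute ancestors
  }
}

// Count players with score in [lo, hi], O(log MAX_SCORE).
function countInRange(lo: number, hi: number): number {
  let res = 0;
  let l = lo + MAX_SCORE;
  let r = hi + MAX_SCORE + 1;                  // half-open [l, r)
  while (l < r) {
    if (l & 1) res += tree[l++];
    if (r & 1) res += tree[--r];
    l >>= 1; r >>= 1;
  }
  return res;
}

[500, 750, 999, 1000, 120].forEach(s => addPlayerAtScore(s, 1));
console.log(countInRange(500, 1000)); // 4 players in [500, 1000]

// A player's score changes from 750 to 300: update two buckets.
addPlayerAtScore(750, -1);
addPlayerAtScore(300, 1);
console.log(countInRange(500, 1000)); // now 3
```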
Limitations and when NOT to use it
- Overkill for static data. If your data never changes after initial load, a simple prefix sum array gives O(1) range queries with O(N) build time. The segment tree's O(log N) queries are slower than O(1), and the 4N space overhead is unnecessary.
- High constant factor for small N. For arrays under ~1000 elements, the overhead of tree traversal, cache misses, and the 4N allocation make a segment tree slower than a brute-force scan. Profile before committing to the data structure.
- Not cache-friendly by default. The flat array representation places parent and children nodes far apart in memory for large trees. At N = 10 million, a query touches ~23 nodes scattered across a 40M-entry array. Each access is a likely cache miss.
- Only supports associative merge functions. Median, mode, and percentile queries do not have associative merge functions. You cannot compute the median of a range by merging sub-range medians. For these, use a merge sort tree or wavelet tree.
- Memory overhead is 4x the input. A segment tree for N elements requires 4N nodes in the worst case. Compare to a Fenwick tree which requires only N entries for range-sum functionality.
- Lazy propagation is tricky to implement correctly. Bugs are subtle: forgetting to push down before recursing, double-counting lazy tags. Composite lazy tags (combining add and set) increase complexity further. Prefer well-tested library implementations.
Interview cheat sheet
- When asked about range queries with updates, say "segment tree" immediately. It is the canonical answer for O(log N) range query + O(log N) point update on mutable arrays. Follow up with construction cost: O(N) time, O(N) space.
- When comparing to prefix sums, state the tradeoff clearly: prefix sums give O(1) queries but O(N) updates. Segment trees give O(log N) for both. If updates are rare or nonexistent, prefix sums win.
- When range updates are mentioned, always bring up lazy propagation. Say: "Lazy propagation defers range updates by tagging subtree roots and pushing tags down on demand. This keeps both range update and range query at O(log N)."
- When asked about leaderboards or ranking, describe a segment tree over score buckets: "Maintain an array where index i counts players with score i. Range sum [X, max] gives the number of players with score >= X. Score changes update two buckets." The tree is indexed by score (value space), not by player ID.
- When discussing time-series aggregations, mention that segment trees are the mechanism behind efficient range queries in time-series databases. "Sum of events in [t1, t2] decomposes into O(log N) precomputed sub-ranges."
- When asked about persistent data structures, explain: "Each update creates O(log N) new nodes while sharing unmodified subtrees. Enables time-travel queries on any historical version at O(log N) per query."
- When the interviewer asks about alternatives, compare to Fenwick trees: "Fenwick trees also do O(log N) range sum + point update with less memory (N entries instead of 4N). The tradeoff: a Fenwick tree natively answers only prefix queries, so an arbitrary range requires subtracting two prefix sums, which works only for invertible operations; it cannot handle range min/max."
- When discussing database internals like BRIN indexes or zone maps, connect to segment trees: "A BRIN index is conceptually a single level of a segment tree, storing min/max per block. A full segment tree would give O(log N) pruning instead of linear block scanning."
Quick recap
- A segment tree stores precomputed aggregates over every sub-range of an array in a complete binary tree with 4N nodes, enabling O(log N) range queries, O(log N) point updates, and O(N) construction.
- Range queries decompose [L, R] into at most O(log N) non-overlapping precomputed nodes, visiting at most 4 nodes per level.
- Point updates walk from leaf to root, recomputing log(N) ancestor aggregates at O(1) per node.
- Lazy propagation defers range updates to subtree roots and pushes tags down on demand, keeping both range update and range query at O(log N). The pushDown call must appear before any child access.
- Segment trees work with any associative aggregate (sum, min, max, GCD). Persistent variants share unmodified subtrees across versions, creating O(log N) new nodes per update.
- Use segment trees when you need both range queries and frequent updates. Use prefix sums for static data. Use Fenwick trees for range sums when memory matters. Use sparse tables for static range-min in O(1).
Related concepts
- Time-series storage - Time-series databases use hierarchical aggregation structures (conceptually similar to segment trees) for range queries over time windows. Understanding segment trees explains why range aggregations are efficient.
- B-tree indexes - Both segment trees and B-trees are tree structures that trade O(log N) operations for hierarchical organization. B-trees optimize for disk I/O with high branching factors; segment trees optimize for in-memory range aggregation with binary splitting.
- Databases - Database engines use segment-tree-like structures for zone maps, BRIN indexes, and block-level min/max pruning. Understanding segment trees illuminates how range predicates skip irrelevant data blocks.