HyperLogLog: counting unique things without counting them
How HyperLogLog estimates cardinality (distinct count) using probabilistic hashing. How hash bucketing and harmonic mean estimation work, why it uses only 12KB for any cardinality, and how to merge HLL sketches.
38 min read · 2026-04-04 · hard · algorithms · internals · probabilistic · analytics
Your analytics dashboard shows "unique visitors today." The product team wants this number on every page, every dashboard, every A/B experiment summary. The naive implementation: maintain a HashSet of all user IDs you have seen, then call .size().
At 10,000 daily users, this costs a few hundred KB. At 100 million daily users, the set costs over a gigabyte of RAM for a single counter. Now multiply that by every page, every experiment, every time window. You are looking at terabytes of memory just for counting distinct things.
It gets worse in distributed systems. Your traffic is split across 20 shards. To compute "unique visitors across all shards," you need to transfer every shard's full set to a central node and compute the union. That is 20 gigabytes of network transfer for a single number. And you need this number updated every minute for real-time dashboards.
Even with clever deduplication (Bloom filters, sorted merges), the fundamental problem remains: exact distinct counting requires memory proportional to cardinality. When cardinality is in the hundreds of millions, the cost is prohibitive.
You can approximate with sampling, but sampling introduces bias: rare items are underrepresented. You can use bitmaps, but bitmap size is proportional to the domain size, not the cardinality.
This is the problem HyperLogLog solves: counting distinct elements (cardinality estimation) using a fixed, tiny amount of memory, regardless of whether the actual count is 1,000 or 1 trillion.
Here is a concrete comparison of the approaches:
| Approach | Memory for 100M distinct users | Merge cost across 20 shards | Time complexity per add |
|---|---|---|---|
| HashSet | ~4 GB | ~80 GB network transfer | O(1) amortized, but GC spikes |
| Sorted file + dedup | ~1.6 GB on disk | Full dump + sort-merge | O(log N) |
| HyperLogLog | 12-16 KB | 320 KB total | O(1), constant |
The difference is not incremental. It is orders of magnitude. And the accuracy loss (0.81%) is invisible on any dashboard or report where the numbers are already rounded for display.
HyperLogLog (HLL) is a probabilistic data structure that estimates the number of distinct elements in a dataset using constant memory, typically 12-16KB, regardless of actual cardinality. The standard error is about 0.81% with the typical 16,384-bucket configuration. It was invented by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier in 2007.
Analogy: Imagine you are flipping coins and recording the longest streak of heads you have ever seen. If you have only flipped 10 times, a streak of 3 heads is reasonable. If you have flipped 10 million times, seeing a streak of 20+ heads is expected. The longest streak you have observed tells you roughly how many coin flips you must have done. HLL applies this same idea to hash outputs: longer runs of leading zeros imply more distinct inputs.
The core observation: if you hash items uniformly, the more distinct items you process, the longer the longest run of leading zero bits you will encounter. If a hash starts with 5 leading zeros (00000xxx...), the probability of that specific output is 1/32. You probably needed to hash around 32 distinct items to see it. Ten leading zeros means you likely processed about 1,024 items.
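This intuition is easy to check empirically. The sketch below is illustrative: it uses SHA-256 truncated to 64 bits as a stand-in uniform hash (real implementations prefer MurmurHash3 or xxHash for speed) and reports the longest run of leading zeros as the number of distinct items grows:

```python
import hashlib

def hash64(item):
    # Deterministic 64-bit hash derived from SHA-256 (illustrative stand-in)
    return int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")

def leading_zeros(h, bits=64):
    # Count leading zero bits of h viewed as a `bits`-wide unsigned integer
    return bits - h.bit_length()

# More distinct items -> a longer longest zero run, roughly log2(n)
for n in (100, 10_000, 100_000):
    longest = max(leading_zeros(hash64(i)) for i in range(n))
    print(f"{n:>7} items -> longest leading-zero run: {longest}")
```

The longest run climbs by roughly one bit each time the item count doubles, which is exactly the logarithmic signal HLL extracts.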
Why 'Hyper'?
The name comes from using the harmonic mean (related to "hyperharmonic" numbers) to combine estimates from multiple buckets. The original "LogLog" algorithm used the geometric mean. HyperLogLog's harmonic mean reduces variance from outlier buckets significantly.
The single crude estimator 2^(max_leading_zeros) is wildly inaccurate on its own. The improvement comes from splitting the hash space into thousands of independent buckets, computing the estimator per bucket, and combining them with a harmonic mean. This is the core of the algorithm, and I will walk through each step.
The algorithm has four steps: hash, split, track, and estimate. Each step is simple on its own. The clever engineering is in how they combine to produce a cardinality estimate from nothing but a small byte array.
Before diving in, here is the high-level intuition: hash each element, use part of the hash to pick a bucket, and use the rest of the hash to make a "how rare is this hash?" observation. Track the rarest observation per bucket. When you have seen many distinct elements, rare observations (long zero runs) become more likely, so the tracked maximums increase.
Step 1: Hash every element to a uniform 64-bit string. The hash function must distribute outputs uniformly across the full 64-bit range. MurmurHash3 and xxHash are common choices. Cryptographic hashes work but are needlessly slow.
Step 2: Split the hash into two parts. The first b bits select which of the 2^b buckets (registers) this element maps to. The remaining 64 - b bits are used for the leading-zero observation. With b=14, you get 16,384 buckets and 50 remaining bits.
Step 3: Track the maximum zero-run per bucket. For the remaining bits, count the number of leading zeros plus one. If this value exceeds the current register value, update it.
```python
def add(element):
    h = hash64(element)
    bucket_index = h >> (64 - b)               # first b bits pick the bucket
    remaining = h & ((1 << (64 - b)) - 1)      # remaining 64 - b bits
    run_length = count_leading_zeros(remaining) + 1   # zeros counted in the (64 - b)-bit field
    registers[bucket_index] = max(registers[bucket_index], run_length)
```
Each register fits in a single byte. With b=14, the maximum value is 51 (all 50 remaining bits zero, plus one), which fits in 6 bits, though many implementations use a full byte for alignment. Total memory: 16,384 registers x 1 byte = 16KB.
Step 4: Estimate cardinality using the harmonic mean of 2^(-M[j]) across all buckets. This is the aggregation step that turns 16,384 independent observations into one number:
```python
def estimate():
    m = len(registers)                  # 16,384
    alpha = 0.7213 / (1 + 1.079 / m)    # bias-correction constant for large m
    raw = alpha * m * m / sum(2 ** -r for r in registers)
    return round(raw)
```
The harmonic mean is critical. A simple average would be destroyed by outlier buckets that saw an unusually long zero run by chance. The harmonic mean naturally downweights outliers. For your interview: just say "it uses buckets and harmonic mean to reduce variance from outliers."
In a toy run with only 16 buckets, an estimate of ~5 against a true count of 4 is typical: so few buckets carry very high variance (26% standard error). With 16,384 buckets, this variance shrinks to 0.81%.
Consider what happens with the arithmetic mean. If one bucket sees a hash with 15 leading zeros (a 1-in-32768 event), that single outlier dominates the average. The arithmetic mean of {1, 1, 1, 15} is 4.5, suggesting about 2^4.5 ≈ 23 distinct elements from 4 observations.
The harmonic mean resists outliers by working with reciprocals: 4 / (2^(-1) + 2^(-1) + 2^(-1) + 2^(-15)) ≈ 4 / 1.5 ≈ 2.67, much closer to the truth. This outlier resistance is why the algorithm jumped from "LogLog" (geometric mean, requiring more buckets for the same accuracy) to "HyperLogLog" (harmonic mean).
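The numbers above are easy to verify. Using the same four register values, compare the two estimators:

```python
regs = [1, 1, 1, 15]   # three typical registers plus one lucky outlier

# Arithmetic mean: the outlier drags the exponent up
arith = 2 ** (sum(regs) / len(regs))           # 2^4.5, about 22.6

# Harmonic mean of the 2^r values: the outlier barely registers
harm = len(regs) / sum(2 ** -r for r in regs)  # about 2.67

print(f"arithmetic-mean estimate: {arith:.1f}")
print(f"harmonic-mean estimate:   {harm:.2f}")
```

The reciprocal 2^-15 contributes almost nothing to the denominator, so one lucky bucket cannot move the harmonic estimate.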
Notice that a duplicate element, say "alice" added twice, has zero effect. The hash is identical both times, so the bucket and run length are the same. The max operation is idempotent. This is exactly what we want: duplicates are automatically ignored without any explicit deduplication.
A single maximum-zero-run counter is extremely noisy. If you hash 1,000 elements into one counter, a single lucky hash with 15 leading zeros makes the counter estimate 32,768. That is useless.
The fix: split the input into m independent substreams using the first b bits of the hash. Each substream has roughly n/m elements (where n is the total count). Each substream produces its own estimate. Combining m independent estimates reduces the standard deviation, and thus the relative error, by a factor of sqrt(m).
This is called stochastic averaging, and it is the same principle behind ensemble methods in machine learning: many weak estimators combined produce a strong estimator. Each individual bucket is noisy. Together, 16,384 buckets average out to 0.81% error.
The tradeoff is intuitive: more buckets means more memory (one byte each) but less noise. Fewer buckets means less memory but wilder estimates. The formula 1.04/sqrt(m) captures this exactly. Adding buckets has rapidly diminishing returns because of the square root: going from 16 to 16,384 buckets (1,024x) only improves accuracy by ~32x (from 26% to 0.81%).
Redis uses 6-bit registers (max value 63, sufficient for up to 2^63 cardinality with 64-bit hashes). The dense representation packs registers tightly: 16,384 × 6 bits = 98,304 bits = 12,288 bytes. Some implementations use 8-bit registers for simplicity, costing 16,384 bytes.
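As a sketch of how 6-bit packing can work (an illustration, not Redis's actual implementation), registers can be stored little-endian across byte boundaries; one spare byte simplifies reads that straddle the final boundary:

```python
# Illustrative 6-bit register packing (not Redis's actual code).
M = 16_384
buf = bytearray(M * 6 // 8 + 1)    # 12,288 payload bytes + 1 spare byte

def set_register(buf, i, value):
    # Write a 6-bit value at register index i
    bit = i * 6
    byte, off = divmod(bit, 8)
    word = buf[byte] | (buf[byte + 1] << 8)          # 16-bit window
    word = (word & ~(0x3F << off)) | ((value & 0x3F) << off)
    buf[byte] = word & 0xFF
    buf[byte + 1] = (word >> 8) & 0xFF

def get_register(buf, i):
    # Read the 6-bit value at register index i
    bit = i * 6
    byte, off = divmod(bit, 8)
    word = buf[byte] | (buf[byte + 1] << 8)
    return (word >> off) & 0x3F
```

The payload is exactly 16,384 × 6 / 8 = 12,288 bytes, which is where the 12KB figure comes from.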
The entire structure fits in a few L1 cache lines when accessed sequentially. This makes HLL not just memory-efficient but also fast: updating a register is a single byte write with no branching beyond the max comparison.
For performance-sensitive applications, the per-element cost breaks down to:
- One hash computation (~25ns for MurmurHash3 on a 16-byte key)
- One bit shift to extract the bucket index
- One leading-zero count (a single CPU instruction: LZCNT on x86, CLZ on ARM)
- One max comparison and conditional store
Total: roughly 30-50 nanoseconds per element on modern hardware. At 50ns per element, a single core processes 20 million distinct-count insertions per second. Redis achieves close to this in practice.
The standard error formula is 1.04 / sqrt(m) where m is the number of buckets (registers). You trade memory for accuracy in a straightforward way: doubling the buckets halves the error. This is the only tuning knob that matters.
| Buckets (m) | b (prefix bits) | Memory | Standard error |
|---|---|---|---|
| 16 | 4 | 16 bytes | ~26% |
| 64 | 6 | 64 bytes | ~13% |
| 256 | 8 | 256 bytes | ~6.5% |
| 1,024 | 10 | 1 KB | ~3.25% |
| 4,096 | 12 | 4 KB | ~1.6% |
| 16,384 | 14 | 16 KB | ~0.81% |
| 65,536 | 16 | 64 KB | ~0.41% |
The sweet spot for most production systems is b=14 (16,384 buckets, 16KB, 0.81% error). Redis, BigQuery, Presto, and Druid all default to this. I almost never see a valid reason to go below b=12 or above b=16.
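The tradeoff is one line of arithmetic; this snippet reproduces the error column of the table above:

```python
import math

def std_error(b):
    # Relative standard error for m = 2**b registers: 1.04 / sqrt(m)
    m = 1 << b
    return 1.04 / math.sqrt(m)

for b in (4, 10, 14, 16):
    print(f"b={b:>2}  m={1 << b:>6}  error={std_error(b):.4%}")
```

Doubling m shaves the error by sqrt(2), so each extra prefix bit buys a fixed multiplicative improvement at double the memory.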
Common misconception: error grows with cardinality
The standard error of HLL is relative, not absolute. Whether you have 1,000 or 1 billion distinct items, the error stays at ~0.81%. A 0.81% error on 1 billion items means ±8.1 million, which sounds large but is still only 0.81%. The memory stays at 16KB regardless.
The raw harmonic mean estimator has known bias at the extremes. HyperLogLog++ (the Google improvement adopted by most implementations) adds two corrections:
Small range (below ~2.5m estimates): When many registers are still at zero, the estimate is too high. The algorithm falls back to LinearCounting, which uses the number of empty registers to produce a better estimate for small cardinalities. Specifically: LinearCounting(m, V) = m × ln(m / V) where V is the count of zero-valued registers.
The intuition: if most registers are still zero, you have not seen many distinct elements. The fraction of empty registers directly indicates how much of the hash space has been "hit." This gives a more accurate estimate than the harmonic mean when cardinality is very low.
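The LinearCounting formula is tiny. A sketch, with V the count of zero-valued registers as defined above:

```python
import math

def linear_counting(m, zero_registers):
    # Cardinality estimate from the fraction of still-empty registers:
    # m * ln(m / V). Used when the raw HLL estimate falls below ~2.5m.
    return m * math.log(m / zero_registers)

print(linear_counting(16_384, 16_384))          # all empty -> 0.0
print(round(linear_counting(16_384, 8_192)))    # half empty -> ~11,357
```

When every register is empty the estimate is zero, and as registers fill, the logarithm compensates for multiple elements landing in the same register.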
Large range (approaching ~2^32): With 32-bit hashes, hash collisions become significant as cardinality approaches 2^32. HyperLogLog++ uses 64-bit hashes and applies a large-range correction factor. In practice, 64-bit hashes push this threshold so high that the correction rarely matters.
Bias correction (HLL++ enhancement): Google's HyperLogLog++ paper introduced empirical bias correction for the medium range (between the small and large thresholds). They precomputed the expected bias for various cardinalities and bucket counts, then stored these as lookup tables. During estimation, the algorithm interpolates the bias correction from the closest precomputed values. This eliminates the "bump" in error rate that raw HLL produces around the transition from LinearCounting to the harmonic mean estimator.
HLL's accuracy depends entirely on the hash function producing a uniform distribution. If the hash is biased (certain bit patterns are more likely), the leading-zero observations are skewed and the estimate goes wrong.
| Hash function | Suitable for HLL? | Notes |
|---|---|---|
| MurmurHash3 (128-bit) | Yes | Fast, excellent distribution. Most common choice. |
| xxHash (64-bit) | Yes | Fastest option. Used in many streaming systems. |
| SHA-256 | Yes, but slow | Cryptographic strength is unnecessary for HLL. Use only if hashes double as security tokens. |
| CRC32 | No | Poor distribution. Correlated bit patterns produce biased zero-run observations. |
| Java hashCode() | No | Often has low-bit clustering. Integer hashCode is the identity function, which is terrible for HLL. |
The mistake I see most often: using a weak hash function because "it's fast" and then wondering why the cardinality estimates have 5-10% error instead of 0.81%. The hash function is not where you save cycles. MurmurHash3 hashes a 16-byte key in about 25 nanoseconds. That is faster than a single L2 cache miss.
If you are working in Java, avoid String.hashCode() for HLL. It uses a polynomial hash that clusters in the low bits for short strings. Use Google Guava's Hashing.murmur3_128() or similar.
When should you use HLL versus an exact counting method? The decision depends on cardinality, number of counters, and whether you need mergeability.
| Approach | Memory per counter | Accuracy | Merge support | Best for |
|---|---|---|---|---|
| HashSet | 32-40 bytes × cardinality | Exact | Expensive (full set transfer) | Low cardinality (under 100K), need element retrieval |
| Sorted bitmap (Roaring) | ~1 bit per element + overhead | Exact | Moderate (bitmap OR) | Integer domains, moderate cardinality |
| HyperLogLog | 12-16 KB fixed | ~0.81% error | Cheap (16KB max per merge) | High cardinality, distributed, many counters |
For a single counter tracking 1,000 users, a HashSet uses ~40KB and gives exact results while HLL uses 12KB for approximate results. HLL wins only when you have thousands of counters, need mergeability, or the cardinality is unknown and could be enormous.
So when does HLL actually pay off? Two scenarios dominate:
Many counters, unknown cardinality. You need "unique visitors per page" for 10 million pages. With HashSets, the worst-case page (10M unique visitors) costs 400MB alone. With HLL, every page costs exactly 12KB. Total: 120GB versus potentially terabytes.
Distributed aggregation. You need "unique visitors globally" from 100 shards. With HashSets, you ship 100 full sets (potentially gigabytes each) and union them. With HLL, you ship 100 × 12KB = 1.2MB and take element-wise max. The network savings are orders of magnitude.
HLL is not the only cardinality sketch. The main alternatives:

| Alternative | Strength | Tradeoff |
|---|---|---|
| HyperLogLog++ | Now the de facto standard | Slightly more complex implementation |
| MinCount | Simpler than HLL; tracks minimum hash values per bucket | Less accurate per byte than HLL |
| Linear Counting | Exact at low cardinality, degrades at high | Only works until the bit array is mostly full |
| Theta Sketch | Supports union, intersection, AND difference | ~2x memory of HLL; from Apache DataSketches |
| CPC Sketch (Compressed Probabilistic Counting) | ~40% less memory than HLL for same accuracy | From Apache DataSketches; more complex; supports merging |
For most practical purposes, HyperLogLog++ is the right default. Theta sketches are the alternative when you need set intersection. CPC sketches are worth considering when memory per sketch matters (e.g., millions of per-key sketches) and the ~40% memory savings compound to terabytes.
My recommendation for interviews: always say "HyperLogLog" first. If the interviewer presses on set intersection, pivot to theta sketches. If they press on memory efficiency for per-key tracking, mention CPC sketches. Layer the sophistication; don't lead with it.
One more you may encounter: streaming HLL. Some systems (Apache Flink, Spark Structured Streaming) integrate HLL directly into their stream processing operators. Each window or partition maintains its own sketch, and the framework handles merge on shuffle boundaries. If an interviewer asks about real-time cardinality in streaming systems, this is the right answer.
This is HLL's killer feature for distributed systems: two HLL sketches merge by taking the element-wise maximum of their registers. No coordination protocol, no deduplication logic, no data transfer beyond the sketches themselves.
```python
def merge(hll1, hll2):
    return [max(hll1[i], hll2[i]) for i in range(len(hll1))]
```
Why does this work? Each register tracks the maximum zero-run for its bucket. Taking the max across two sketches is equivalent to having observed all elements from both sketches in a single stream. The merged result is identical to what you would get from processing all elements through a single HLL. This property holds for any number of sketches, not just two.
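This equivalence is checkable in a few lines. Using a tiny 16-register sketch and SHA-256 as a stand-in hash, merging two overlapping shard sketches yields exactly the sketch you would get from one combined stream:

```python
import hashlib

B = 4                       # 16 registers for the demo (production uses b=14)
M = 1 << B

def hash64(x):
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:8], "big")

def add(regs, x):
    h = hash64(x)
    bucket = h >> (64 - B)
    rest = h & ((1 << (64 - B)) - 1)
    regs[bucket] = max(regs[bucket], (64 - B) - rest.bit_length() + 1)

def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

shard1, shard2, combined = [0] * M, [0] * M, [0] * M
for u in range(600):                      # users 0-599 on shard 1
    add(shard1, f"user:{u}")
for u in range(400, 1000):                # users 400-999 on shard 2 (overlap!)
    add(shard2, f"user:{u}")
for u in range(1000):                     # all users through one sketch
    add(combined, f"user:{u}")

assert merge(shard1, shard2) == combined  # merge == single-stream sketch
```

The overlap between the shards costs nothing: max absorbs duplicates exactly the way a set union would.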
This makes distributed distinct counting trivially cheap. Each shard maintains its own HLL sketch. To get the global count, merge all sketches. Each merge transfers only 16KB per shard. Compare this to the naive approach of transferring entire sets of user IDs across the network.
Here is what the cost looks like for a typical analytics workload:
| Metric | Exact (HashSet) | HLL (b=14) |
|---|---|---|
| Per-shard memory | ~4 GB (100M users) | 12 KB |
| Network per merge (20 shards) | ~80 GB | 240 KB |
| Merge computation | Minutes (set union) | Microseconds (byte-wise max) |
| Result accuracy | Exact | ±0.81% |
The 0.81% accuracy loss buys you a 300,000x reduction in network transfer. For dashboards and analytics, that tradeoff is a no-brainer. I have never seen a product team reject a 0.81% error rate on a "unique visitors" metric. The number on the dashboard is already rounded to the nearest thousand.
Interview tip: lead with mergeability
When HLL comes up in an interview, lead with the merge property. Say: "HLL sketches are mergeable, so each shard maintains a 16KB sketch and the coordinator just takes element-wise max. The network cost is 16KB per shard regardless of cardinality." This shows you understand the distributed systems angle, not just the algorithm.
My recommendation: any time you need a "count distinct" metric across sharded data, HLL should be your default answer. The merge cost is negligible compared to any alternative. Say this in your interview and move on to the next design decision.
Redis uses a sparse representation for HLLs with low cardinality, switching to a dense representation as more registers become non-zero. The sparse encoding uses run-length encoding and typically consumes far less than 12KB for small sets.
| Representation | When used | Memory |
|---|---|---|
| Sparse (RLE) | Few non-zero registers | Tens of bytes to ~3KB |
| Dense | Many registers occupied | Exactly 12,288 bytes |
The transition from sparse to dense is automatic and transparent. Redis's dense representation uses 6 bits per register (not 8), which is why the total is 12KB (16,384 × 6 bits / 8 = 12,288 bytes) instead of the theoretical 16KB.
I'll often see teams build their own HLL implementation when Redis is already in their stack. Don't. Redis's implementation includes the HyperLogLog++ corrections for small and large ranges, the sparse-to-dense optimization, and it has been production-tested for years.
A common production pattern: track unique visitors per hour and merge hourly sketches to answer arbitrary time-range queries.
The key insight: you cannot subtract from an HLL. If you merge hours 14-17 and then want to remove hour 14, you must recompute by merging hours 15-17 from scratch. Always keep the granular sketches and merge on read.
This pattern costs 12KB per time bucket per metric. For 24 hourly buckets across 1,000 metrics, that is 24 × 1,000 × 12KB = 288MB. Entirely reasonable for real-time analytics.
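In Redis terms, the hourly pattern looks like this (key names are illustrative):

```
# Each hour gets its own sketch
PFADD visitors:2026-04-04:14 user:42 user:99
PFADD visitors:2026-04-04:15 user:42

# Daily or range counts: merge the hourly sketches on read
PFMERGE visitors:2026-04-04:afternoon visitors:2026-04-04:14 visitors:2026-04-04:15
PFCOUNT visitors:2026-04-04:afternoon
```

Because merge is union-only, the granular hourly keys are the source of truth; any window is answered by merging the hours it covers.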
A common variation of the time-windowed pattern: track unique participants per A/B experiment variant.
```
# User enters experiment "checkout_v2", variant "control"
PFADD experiment:checkout_v2:control user:42

# User enters variant "treatment"
PFADD experiment:checkout_v2:treatment user:99

# How many unique users participated in the experiment total?
PFMERGE experiment:checkout_v2:all experiment:checkout_v2:control experiment:checkout_v2:treatment
PFCOUNT experiment:checkout_v2:all
```
This tells you the total unique participants across all variants with a single merge operation. Without HLL, you would need to union potentially millions of user IDs across variants. With hundreds of concurrent experiments, HLL keeps memory bounded while set-based approaches blow up.
The cost per experiment is just (number_of_variants + 1) × 12KB. Even 500 experiments with 5 variants each cost only 500 × 6 × 12KB = 36MB.
This pattern is used at every major tech company that runs A/B experiments at scale. Netflix, Spotify, and LinkedIn all use HLL-based cardinality tracking for experiment analysis. The alternative (exact sets per variant per experiment) would consume orders of magnitude more memory and make cross-variant aggregation prohibitively expensive.
HLL support is built into most analytical systems:

| System | Feature | Notes |
|---|---|---|
| BigQuery | APPROX_COUNT_DISTINCT() | HLL sketches stored as column metadata; query time drops from minutes to milliseconds on large tables |
| Apache Druid | Built-in HLL aggregator for real-time OLAP | Uses DataSketches library; supports theta sketch union operations for set intersection |
| Presto / Trino | approx_distinct() function | Configurable precision parameter; returns HLL digest for downstream merging |
| Elasticsearch | cardinality aggregation | Precision threshold 0-40000 (default 3000); trades accuracy for speed on high-cardinality fields |
| PostgreSQL | hll extension from Citus | Stores HLL columns directly in tables; aggregate functions for GROUP BY cardinality queries |
The pattern across all these systems is the same: expose an approximate distinct count function that returns in constant time regardless of data size, and support partial aggregation (merge) for distributed query execution. When BigQuery or Presto splits a query across 50 workers, each worker computes a local HLL sketch and the coordinator merges them. The user sees APPROX_COUNT_DISTINCT(user_id) and gets the answer in seconds instead of minutes.
In BigQuery specifically, you can also persist HLL sketches as columns using HLL_COUNT.INIT(), store them in a table, and merge them later with HLL_COUNT.MERGE(). This lets you precompute partial sketches during ETL and merge them at query time, which is how Google keeps interactive query performance on multi-petabyte datasets.
Interview tip: reference a specific system
Don't just say "HLL is used in databases." Name one: "Redis PFADD uses HLL internally with 12KB per key and 0.81% error" or "BigQuery's APPROX_COUNT_DISTINCT uses HLL++ with partial aggregation across workers." Specificity signals depth.
HLL is powerful within its niche, but using it outside that niche produces subtle bugs that are hard to catch because the estimates look plausible. Here are the concrete scenarios where HLL is the wrong tool.
You need exact counts. HLL gives ~0.81% error. If your use case requires exact distinct counts (billing, compliance, deduplication with zero tolerance), HLL is the wrong tool. Use an exact set or bitmap. A 0.81% error on a $10M revenue report is an $81K discrepancy.
You need to enumerate the elements. HLL tells you how many distinct items, not which ones. If you need to retrieve the actual elements (e.g., "list all users who visited this page"), you need a set or Bloom filter.
Low cardinality with high accuracy needs. For sets under 10,000 elements, the absolute error of HLL (up to ~80 elements off) may be unacceptable, and an exact HashSet costs only a few KB anyway. There is no memory advantage.
Set intersection is needed. HLL supports union (merge) but not intersection. You cannot directly compute "users who visited page A AND page B" from two HLL sketches. The inclusion-exclusion workaround has unbounded relative error when the intersection is small. Use theta sketches instead.
Per-key cardinality tracking at massive scale. If you need an HLL per key (e.g., distinct visitors per URL for 100 million URLs), the total memory is 100M × 12KB = 1.2TB. Consider whether you need that granularity or can use sampling. CPC sketches cut this by ~40%.
Deletions are required. Standard HLL does not support removing elements. Once a hash sets a register, that register cannot be decremented. If elements come and go, look at time-bucketed sketches (create new sketches for each period and expire old ones) or CPC sketches.
The rule of thumb: if someone asks "how many unique X" and the answer can be approximate, use HLL. If they ask "which X" or "exactly how many X," you need something else.
When asked "how do you count unique users across shards": say HyperLogLog immediately. Each shard keeps a 16KB sketch, sketches merge by element-wise max, total network cost is 16KB per shard regardless of cardinality. This single sentence covers the distributed angle.
When asked "how much memory does it use": say 12-16KB regardless of cardinality. 16,384 registers at 6-8 bits each. Error is ~0.81%. Compare: an exact HashSet for 100M users costs ~4GB. The memory savings are 250,000x.
When asked "can you do real-time unique counts": say yes. HLL add is O(1) per element (one hash, one register update). Redis PFADD handles millions of adds per second on a single core.
When asked "what about accuracy": say standard error is 1.04/sqrt(m). With 16,384 registers that is 0.81%. Double the registers to halve the error, but 0.81% is sufficient for analytics, dashboards, and A/B experiments.
When asked "how do you combine counts from multiple time windows": say merge the HLL sketches for each window. Keep hourly sketches, merge them for daily counts. You cannot subtract, though, so you must keep the granular sketches and merge on read.
When asked "how is this different from a Bloom filter": say Bloom filters answer membership ("is X in the set?") while HLL answers cardinality ("how many distinct X?"). Different questions, both probabilistic, both constant memory.
When asked "can you get the intersection of two sets": say HLL supports union (merge) but not intersection directly. If the interviewer wants intersection, suggest MinHash or theta sketches from the Apache DataSketches library.
When asked "what systems support this natively": name three: Redis (PFADD/PFCOUNT/PFMERGE), BigQuery (APPROX_COUNT_DISTINCT), and Elasticsearch's cardinality aggregation. This shows you have seen HLL in real production contexts.
Here are the six things to remember about HyperLogLog. If you can state all six in 30 seconds, you have the mechanism down.
1. HyperLogLog estimates distinct count (cardinality) using constant memory, typically 12-16KB for ~0.81% error, regardless of whether the cardinality is 1,000 or 1 trillion.
2. The algorithm hashes each element, routes it to one of 16,384 buckets using the first 14 bits, and tracks the maximum run of leading zeros in the remaining bits per bucket.
3. The harmonic mean across all buckets produces the final estimate, naturally downweighting outlier buckets that saw unusually long zero runs by chance.
4. HLL sketches merge by element-wise max, enabling distributed distinct counting where each shard sends only 16KB to the coordinator.
5. Redis provides native HLL via PFADD, PFCOUNT, and PFMERGE with sparse-to-dense auto-promotion that saves memory at low cardinalities.
6. HLL supports union (merge) but not intersection or deletion. Use theta sketches for set intersection and time-bucketed keys for rolling windows.
- Bloom filters: Both are probabilistic data structures with constant memory, but Bloom filters answer membership ("is X in the set?") while HLL answers cardinality ("how many distinct X?"). You often use them together: Bloom filter to check membership, HLL to count distinct members.
- Count-min sketch: Another probabilistic sketch for streaming analytics. CMS estimates frequency ("how often was X seen?") rather than cardinality. Where HLL counts unique things, CMS counts how many times each thing appeared.
- Hashing: HLL depends entirely on uniform hash distribution. A biased hash function breaks HLL's accuracy guarantees. Understanding hash functions helps you pick the right one (MurmurHash3, xxHash) and avoid the wrong ones (CRC32, Java hashCode).
- Caching: HLL cardinality estimates drive cache sizing decisions: knowing the distinct key count helps set appropriate cache capacities and predict hit rates without tracking every key individually.
- Databases: Most modern analytical databases (BigQuery, Redshift, Druid) include HLL as a built-in aggregate function for interactive query performance on high-cardinality columns.