EAV tables anti-pattern
Learn why Entity-Attribute-Value schema makes queries unreadable, indexes useless, and constraints impossible, and what flexible schema alternatives actually work at scale.
TL;DR
- Entity-Attribute-Value (EAV) stores attributes as rows (entity_id, attribute_name, attribute_value) instead of columns. It feels flexible; it is slow and unmaintainable at scale.
- Retrieving a single entity requires self-joining the EAV table once per attribute. Five filters means five self-joins. There is no type safety, no referential integrity, and indexes are nearly useless.
- Fix: Use JSONB columns (PostgreSQL) for flexible attributes, document stores (MongoDB, DynamoDB) for truly schemaless data, or wide column stores (Cassandra) for sparse attributes at massive scale.
- When it's acceptable: Only for runtime user-defined attributes (form builders, plugin systems) on legacy RDBMS with no JSON support.
The Problem
It starts innocently. You're building a product catalog for an e-commerce platform. The product team keeps asking for new attributes: colour, material, weight, thread count, battery life. Every new attribute means a schema migration, a deployment, and a prayer that nothing breaks.
A senior dev proposes the "flexible" solution: store attributes as rows instead of columns. No more migrations. No more deployment risk. The schema is "future-proof."
Every team I've seen adopt EAV describes that moment the same way: "We thought we were being clever."
CREATE TABLE product_attributes (
product_id INT,
attr_name VARCHAR(100),
attr_value VARCHAR(1000)
);
Six months later, the catalog has 2 million products with an average of 18 attributes each. That's 36 million rows in product_attributes. Fetching a single product's colour, size, material, and weight looks like this:
SELECT
p.name,
colour.attr_value AS colour,
size.attr_value AS size,
material.attr_value AS material,
weight.attr_value AS weight
FROM products p
LEFT JOIN product_attributes colour ON p.id = colour.product_id AND colour.attr_name = 'colour'
LEFT JOIN product_attributes size ON p.id = size.product_id AND size.attr_name = 'size'
LEFT JOIN product_attributes material ON p.id = material.product_id AND material.attr_name = 'material'
LEFT JOIN product_attributes weight ON p.id = weight.product_id AND weight.attr_name = 'weight'
WHERE p.id = 123;
Four self-joins to read one product. Filtering for "all red T-shirts smaller than size L costing under $50" requires five self-joins plus WHERE clauses on text values. I've profiled this exact query pattern at a mid-size retailer. It went from 2ms with normal columns to 1,400ms with EAV at 500K products.
The query planner can't use indexes effectively because every attribute is a row, not a column. An index on (attr_name, attr_value) helps filter one attribute, but the self-join still has to happen for each additional attribute in the WHERE clause.
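The pivot pattern above can be reproduced end to end with SQLite standing in for the production database. This is a minimal sketch (table contents and attribute names are illustrative), but the shape of the query is exactly the one from the article: one self-join per attribute read.

```python
import sqlite3

# In-memory stand-in for the relational EAV schema from the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attributes (
        product_id INT, attr_name TEXT, attr_value TEXT
    );
    INSERT INTO products VALUES (123, 'Basic Tee');
    INSERT INTO product_attributes VALUES
        (123, 'colour', 'red'),
        (123, 'size', 'M'),
        (123, 'material', 'cotton');
""")

# One self-join per attribute, exactly like the query in the text.
row = conn.execute("""
    SELECT p.name, colour.attr_value, size.attr_value, material.attr_value
    FROM products p
    LEFT JOIN product_attributes colour
        ON p.id = colour.product_id AND colour.attr_name = 'colour'
    LEFT JOIN product_attributes size
        ON p.id = size.product_id AND size.attr_name = 'size'
    LEFT JOIN product_attributes material
        ON p.id = material.product_id AND material.attr_name = 'material'
    WHERE p.id = 123
""").fetchone()
print(row)  # ('Basic Tee', 'red', 'M', 'cotton')
```

Three attributes, three joins; every additional attribute you read adds another copy of the same table to the plan.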
Here's how the query cost scales with the number of attribute filters:
| Attribute filters | Self-joins required | Relative query time (500K products) |
|---|---|---|
| 1 | 1 | 1x (15ms) |
| 3 | 3 | 8x (120ms) |
| 5 | 5 | 35x (520ms) |
| 8 | 8 | 95x (1,400ms) |
| 12 | 12 | 250x+ (timeout) |
The problem isn't just speed. EAV also makes your codebase harder to maintain. Every query that touches attributes needs hand-written pivot SQL. ORMs can't map EAV naturally. New developers spend their first week figuring out how to add a column that doesn't exist as a column.
Why It Happens
EAV feels like the smart, forward-thinking choice. Every decision that leads to it sounds reasonable in isolation:
- "We don't know all the attributes yet." The product team is still figuring out what fields they need. EAV means no schema migrations.
- "Different product types have different attributes." A laptop has RAM and screen size; a shirt has material and thread count. EAV handles both.
- "We need users to define custom fields." Admin panels where merchants add their own attributes. EAV is the obvious data model.
- "It worked fine in the prototype." With 500 products and 3 attributes each, EAV queries run in under 10ms. The pain only shows at scale.
The trap is that EAV optimises for write-time flexibility (adding new attributes) at the expense of read-time performance (querying across attributes). Most systems read 10-100x more than they write. I've seen three teams adopt EAV in the past five years. All three migrated off it within 18 months.
The worst part: by the time EAV becomes painful, migration is expensive. You have millions of rows to backfill, dozens of queries to rewrite, and application logic built around the pivot pattern.
| Issue | Impact |
|---|---|
| All values are strings | No numeric sorting, no date range queries, no type safety |
| No column constraints | NULL attributes look identical to missing attributes |
| No referential integrity | attr_name = 'clour' typo is silently stored |
| Self-joins for multi-attribute queries | Each additional filter multiplies join complexity |
| Statistics are useless | Query planner can't estimate selectivity for arbitrary attr_name values |
| Indexes don't help | An index on attr_value is nearly useless without also filtering on attr_name |
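The "all values are strings" row bites in practice: text comparison and text sort give wrong answers for numbers. A minimal illustration with weights stored as VARCHAR-style strings:

```python
# Attribute values stored as VARCHAR compare lexicographically, not numerically.
weights = ["90", "500", "1200"]  # grams, stored as text in attr_value

# Text sort: '1' < '5' < '9', so a "sort by weight" feature gets this order.
sorted_weights = sorted(weights)
print(sorted_weights)  # ['1200', '500', '90']

# Text filter: "90" > "500" lexicographically, so a
# WHERE attr_value < '500' filter silently excludes the 90 g product.
print("90" < "500")  # False
```

The only way around this in classic EAV is casting in every query, which defeats the indexes you just built.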
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| Table with columns entity_id, attribute_name, attribute_value | Classic EAV schema | SELECT table_name FROM information_schema.columns WHERE column_name LIKE '%attr%name%' |
| Queries with 3+ self-joins on the same table | EAV pivot queries | Search codebase for repeated JOIN same_table patterns |
| All attribute values stored as VARCHAR/TEXT | No type enforcement | Check column types in the EAV table |
| Slow product listing/search pages | Self-join explosion under load | Monitor query latency on catalog endpoints |
| Frequent typo bugs in attribute names | No constraint on attr_name | SELECT DISTINCT attr_name FROM product_attributes ORDER BY attr_name and look for near-duplicates |
| ORM generates massive SQL for simple reads | ORM pivoting EAV rows into objects | Enable query logging and check generated SQL |
If your slowest queries all involve the same "attributes" table with multiple self-joins, you have an EAV problem. I once inherited a codebase where the grep for the EAV table name returned 340 hits across 90 files. Every single one was a multi-join pivot query.
The quickest diagnostic: run EXPLAIN ANALYZE on your catalog search query and count the sequential scans on the attributes table.
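The near-duplicate hunt from the symptom table ('clour' vs 'colour') can be mechanised. A sketch using Python's difflib over the distinct attribute names; the name list here is illustrative, and in practice it would come from SELECT DISTINCT attr_name:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Stand-in for: SELECT DISTINCT attr_name FROM product_attributes
attr_names = ["colour", "clour", "size", "material", "mat_erial", "weight"]

# Flag pairs of attribute names that are suspiciously similar --
# likely typos that were silently stored as separate attributes.
suspects = [
    (a, b)
    for a, b in combinations(attr_names, 2)
    if SequenceMatcher(None, a, b).ratio() > 0.8
]
print(suspects)  # [('colour', 'clour'), ('material', 'mat_erial')]
```

Each flagged pair is a candidate for a merge-and-delete cleanup pass before any migration.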
The Fix
Fix 1: JSONB columns (PostgreSQL)
The most common migration path. Move EAV rows into a single JSONB column on the parent table.
ALTER TABLE products ADD COLUMN attributes JSONB;
-- Migrate existing EAV data
UPDATE products p SET attributes = (
SELECT jsonb_object_agg(attr_name, attr_value)
FROM product_attributes pa
WHERE pa.product_id = p.id
);
-- Query: single table scan, no joins
SELECT * FROM products
WHERE attributes->>'colour' = 'red'
AND (attributes->>'weight_g')::int < 500;
-- GIN index for fast attribute queries
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
JSONB gives you schema flexibility without self-join complexity. You get type coercion at query time, GIN indexes, and containment queries (@> operator). Trade-off: no column-level constraints or foreign keys on individual attributes.
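The single-table query shape can be tried locally with SQLite's json1 functions standing in for the Postgres JSONB operators (json_extract in place of ->>). This is a sketch of the access pattern, not the Postgres syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, attributes TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Basic Tee", '{"colour": "red", "weight_g": 180}'),
        (2, "Hoodie",    '{"colour": "red", "weight_g": 650}'),
        (3, "Cap",       '{"colour": "blue", "weight_g": 90}'),
    ],
)

# Single table scan, no self-joins: filter on two attributes at once.
rows = conn.execute("""
    SELECT name FROM products
    WHERE json_extract(attributes, '$.colour') = 'red'
      AND json_extract(attributes, '$.weight_g') < 500
""").fetchall()
print(rows)  # [('Basic Tee',)]
```

Note that json_extract returns a real number for JSON numbers, so the `< 500` comparison is numeric, not the lexicographic trap from the EAV version.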
I've run this exact migration twice. The pattern that works:
- Add the JSONB column (non-breaking schema change).
- Backfill JSONB from EAV rows in batches of 10K to avoid locking.
- Update read queries to use JSONB. Keep EAV writes in parallel during transition.
- Once all reads are migrated and validated, stop writing to the EAV table.
- Drop the EAV table after a bake-in period (1-2 weeks of monitoring).
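The backfill step can be sketched as a batched loop, again with SQLite as a stand-in (json_group_object plays the role of jsonb_object_agg; batch size and table contents are illustrative). The point is one short transaction per batch rather than one giant UPDATE holding locks:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, attributes TEXT);
    CREATE TABLE product_attributes (product_id INT, attr_name TEXT, attr_value TEXT);
    INSERT INTO products (id) VALUES (1), (2);
    INSERT INTO product_attributes VALUES
        (1, 'colour', 'red'), (1, 'size', 'M'),
        (2, 'colour', 'blue');
""")

BATCH = 10_000  # keep each transaction short to avoid long lock holds
last_id = 0
while True:
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH))]
    if not ids:
        break
    with conn:  # one transaction per batch
        conn.executemany("""
            UPDATE products SET attributes = (
                SELECT json_group_object(attr_name, attr_value)
                FROM product_attributes WHERE product_id = ?
            ) WHERE id = ?
        """, [(i, i) for i in ids])
    last_id = ids[-1]

attrs = json.loads(conn.execute(
    "SELECT attributes FROM products WHERE id = 1").fetchone()[0])
print(attrs)  # {'colour': 'red', 'size': 'M'}
```

Keyset pagination on the primary key (id > last_id) keeps each batch cheap even when the table is large.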
Fix 2: Document store for truly schemaless data
If your data is genuinely schemaless (every document can have completely different fields), use a document database. MongoDB, DynamoDB, and Firestore are built for variable schemas. No joins, no schema migrations for new attributes.
// MongoDB: each product document has its own attribute shape
const laptop = {
name: "ThinkPad X1",
category: "laptop",
attributes: { ram_gb: 16, screen_inches: 14, cpu: "i7-1365U" }
};
const shirt = {
name: "Oxford Button-Down",
category: "clothing",
attributes: { colour: "blue", size: "M", material: "cotton" }
};
// Query: no joins, no pivoting
const results = await db.collection("products").find({
"attributes.colour": "red",
"attributes.size": "M"
}).toArray();
Best for: product catalogs with wildly different attribute sets per category, user-generated content with freeform metadata. Trade-off: cross-document transaction support is weaker than a relational database's, and there's no JOIN back to your other relational tables.
Fix 3: Wide column store for sparse attributes at scale
Apache Cassandra and HBase support sparse column families. Each row can have a different set of columns without the storage overhead of NULL columns. This is what EAV was trying to be, implemented correctly at the storage engine level.
-- Cassandra: each product row has dynamic columns
CREATE TABLE product_attributes (
product_id UUID,
attr_name TEXT,
attr_value TEXT,
PRIMARY KEY (product_id, attr_name)
);
-- Reads are fast: single partition scan, no joins
SELECT * FROM product_attributes WHERE product_id = ?;
The key difference from relational EAV: Cassandra partitions by product_id and stores all attributes for that product physically together on disk. There's no multi-table join. It's the same logical shape as EAV but with a storage engine designed for it.
Best for: IoT telemetry, time-series with variable sensors, billions of rows with sparse attributes.
EAV vs alternatives at a glance
| Approach | Query complexity | Type safety | Schema flexibility | Scale limit |
|---|---|---|---|---|
| EAV (relational) | O(n) self-joins per n attributes | None (all VARCHAR) | Unlimited | ~1M entities before pain |
| JSONB column | Single table scan | At query time via casts | Unlimited | ~50M entities with GIN |
| Document store | Single document fetch | Schema validation optional | Unlimited | Billions (sharded) |
| Wide column store | Single partition scan | Per-column types | Unlimited | Billions (distributed) |
Which fix to use
Default to JSONB if you're already on PostgreSQL; it's the smallest migration. Reach for a document store when entities share almost no common fields, and a wide column store when sparse attributes meet billion-row, distributed scale.
Severity and Blast Radius
EAV is a slow-burn anti-pattern, not a sudden outage. It starts as mild query complexity and degrades as data grows. The problem is that by the time it's painful, the codebase is deeply coupled to the EAV schema.
- Read latency: Degrades with every additional attribute queried. 5 attributes = 5 self-joins, and measured query time grows faster than linearly. At 10+ attributes, queries regularly time out.
- Write latency: Minimal impact. Inserting EAV rows is fast. This asymmetry is why EAV survives so long: writes feel fine, reads degrade slowly.
- Affected systems: Every feature that queries the EAV table: search, filtering, reporting, admin dashboards, exports, and APIs that return entity details.
- Developer productivity: New team members struggle with EAV pivot queries. I've seen onboarding time for the catalog service triple because of EAV complexity.
- Recovery difficulty: Medium. Migration to JSONB is straightforward but requires a data backfill, rewriting all EAV queries, updating application code, and testing every feature that touches attributes. Budget 1-2 sprints for a mid-size system.
- Cascade risk: Low. EAV doesn't cause cascading failures; it just makes everything slower. The danger is that teams work around it (caching EAV results, denormalizing into Redis) instead of fixing the root cause.
Blast radius by system size
| System size | EAV row count | Typical impact |
|---|---|---|
| Startup (10K entities) | 100K rows | Barely noticeable |
| Mid-size (500K entities) | 5-10M rows | Search pages slow, reports timeout |
| Large (5M+ entities) | 50M+ rows | Catalog unusable without caching layer |
| Enterprise (50M+ entities) | 500M+ rows | EAV table is the largest table in the DB, consumes most I/O |
When It's Actually OK
Not every EAV-shaped table is a problem. Context matters.
- Plugin/extension systems where attribute names are user-defined at runtime (e.g., a form builder where users create custom fields). Even here, JSONB usually outperforms classic EAV, but the schema shape is the same.
- Prototypes and MVPs with fewer than 10K entities and fewer than 10 attributes. The performance penalty is negligible at this scale. Ship the MVP and migrate later if you need to.
- Audit/metadata tables where you're logging key-value pairs for compliance and never querying across multiple attributes simultaneously. If every query filters on a single entity_id, EAV is fine because there's no multi-attribute join.
- Legacy systems where you're locked to an RDBMS with no JSON column support (SQL Server 2012, older MySQL). In this case, EAV may be the least-bad option until you can upgrade.
- Configuration storage where the attribute set is small (fewer than 20 per entity) and reads are always "get all config for entity X." The single self-join is tolerable.
How This Shows Up in Interviews
EAV surfaces in two common interview contexts: product catalog design and multi-tenant SaaS with custom fields.
When designing a catalog or configuration system with variable attributes, mentioning EAV as an anti-pattern signals maturity. The strong answer sounds like:
- "I'd use a JSONB column for flexible per-product attributes, with a GIN index for attribute-level queries."
- "If the schema variance is extreme (every entity has completely different fields), I'd consider a document store like MongoDB instead of forcing relational schema to be flexible."
- "EAV creates self-join complexity that doesn't scale. The flexibility it provides at write time costs you exponentially at read time."
For multi-tenant scenarios, emphasise that JSONB handles per-tenant custom fields without EAV's performance penalty, and that you can validate the JSON structure at the application layer.
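Application-layer validation of the JSON structure can be as simple as checking writes against a per-tenant field registry. A hedged sketch; the registry shape and function names are invented for illustration:

```python
# Hypothetical per-tenant registry of allowed custom fields and their types.
TENANT_FIELDS = {
    "acme": {"colour": str, "weight_g": int},
}

def validate_custom_fields(tenant: str, fields: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is OK."""
    allowed = TENANT_FIELDS.get(tenant, {})
    errors = []
    for name, value in fields.items():
        if name not in allowed:
            errors.append(f"unknown field: {name}")
        elif not isinstance(value, allowed[name]):
            errors.append(f"{name}: expected {allowed[name].__name__}")
    return errors

errs = validate_custom_fields("acme", {"colour": "red", "weight_g": "heavy"})
print(errs)  # ['weight_g: expected int']
```

Rejecting unknown field names at write time also closes the typo hole ('clour') that classic EAV leaves open.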
Callout: "The EAV trap trades write-time convenience for read-time complexity. JSONB gives you both. That's the one-liner interviewers remember."
What not to say
Don't propose EAV as your schema design, even with caveats. Interviewers who know databases will probe the self-join problem immediately. Don't say "we could use EAV and cache aggressively" because that's treating the symptom, not the cause. If asked about flexible schemas, go straight to JSONB or document stores.
If the interviewer brings up EAV themselves (some do, to test whether you'll push back), explain the self-join scaling problem and propose the migration path: JSONB column, GIN index, batch backfill.
Quick Recap
- EAV stores attributes as rows instead of columns, creating self-join complexity that scales terribly with the number of attributes queried.
- All values are strings, so type safety, constraints, and effective indexes are lost.
- EAV happens because teams optimise for write-time flexibility (no migrations) while underestimating read-time cost.
- PostgreSQL JSONB is the right answer for flexible attributes in an RDBMS: schema-flexible, GIN-indexed, no multi-join fetches.
- Document stores (MongoDB, DynamoDB) handle genuinely schemaless data better than any relational EAV.
- Wide column stores (Cassandra, HBase) handle sparse attributes at billion-row scale.
- EAV is legitimate only for runtime user-defined schemas with no JSONB alternative, and even then, JSONB usually wins.