Polymorphic associations anti-pattern

TL;DR

Polymorphic associations store a foreign key plus a type discriminator column to point to one of several tables: parent_id = 42, parent_type = 'Post'.
The database cannot enforce referential integrity on a column that points to any of N tables. Orphaned rows, inconsistent deletes, and silent data corruption are guaranteed over time.
You also can't create a standard foreign key constraint on parent_id. The index must cover (parent_id, parent_type) and query performance degrades accordingly.
Fix: Use separate join tables per association type (normalised) or nullable FK columns per parent type with a check constraint.
When it's acceptable: ORM-managed polymorphism (Rails, Django) where application-layer code handles cascades and you have integration tests covering delete paths.

The Problem

You're building a comments system. Comments can be left on Posts, Videos, or Products. A developer introduces a "flexible" schema:

CREATE TABLE comments (
    id          BIGINT PRIMARY KEY,
    body        TEXT,
    user_id     BIGINT,
    parent_id   BIGINT,           -- points to a post, video, or product
    parent_type VARCHAR(20)       -- 'Post', 'Video', 'Product'
);

This looks clean. One table, one query pattern, works for all three parent types. I've seen this exact design proposed in at least a dozen schema reviews. It always feels elegant at design time.

Then someone deletes a Video. Does the database automatically delete its comments? No. There's no FOREIGN KEY (parent_id) REFERENCES videos(id). The database has no idea that parent_type = 'Video' means this row should be tied to the videos table. Those comments are now orphans, permanently pointing to a non-existent video ID.

Over 18 months, your comments table accumulates thousands of orphans. Customer support reports "ghost comments" appearing on unrelated pages. The bug is intermittent because orphaned parent_id values occasionally collide with new IDs in different parent tables.

Querying all comments for a post requires filtering on two columns: WHERE parent_id = ? AND parent_type = 'Post'. Joining comments to their parent is not possible with a standard SQL JOIN. You need a UNION across three separate queries or a CASE statement in your ORM.

The performance cost is real too. An index on parent_id alone is nearly useless because it matches rows across all parent types. You need a composite index on (parent_type, parent_id), and even then, the index is less selective than a proper FK index on a single-type column.

Here's how index efficiency compares across 10 million comments:

Schema approach	Index type	Index scan rows for "comments on post 42"	Query time
Polymorphic `(parent_type, parent_id)`	Composite B-tree	Filters on type string first, then ID	~3ms
Nullable FK `post_id`	Standard B-tree	Direct integer lookup	~0.5ms
Separate `post_comments` table	Standard B-tree on smaller table	Direct lookup on focused table	~0.3ms

The performance gap widens with more parent types because the composite index has more distinct parent_type values to filter through.

Why It Happens

Polymorphic associations emerge from reasonable instincts:

"One table is simpler than three." Nobody wants to create post_comments, video_comments, and product_comments. That feels like unnecessary duplication.
"Our ORM supports it." Rails' belongs_to :parent, polymorphic: true and Django's GenericForeignKey make this pattern a one-liner. If the framework supports it, it must be fine, right?
"We might add more parent types later." Adding a new commentable type with polymorphic associations means zero schema changes. With separate FK tables, you need a new table and migration.
"The prototype worked." With 1,000 comments and 3 parent types, everything is fast and no orphans exist yet. The problems emerge at scale and over time.

The core mistake: treating the database as a dumb storage layer and pushing integrity enforcement to the application. The database exists to enforce constraints. When you take that away, bugs are a matter of when, not if.

I reviewed a codebase last year where the parent_type column had 14 distinct values. Fourteen tables that could be the "parent" of an activity log row. Seven of those parent types had no cascade logic at the application level. The orphan count was in the hundreds of thousands.

The hidden cost: query complexity

Even if you manage to prevent orphans, polymorphic associations make queries harder than they need to be. Fetching a comment with its parent data requires knowing the type at query time:

TL;DR

Polymorphic associations store a foreign key plus a type discriminator column to point to one of several tables: parent_id = 42, parent_type = 'Post'.
The database cannot enforce referential integrity on a column that points to any of N tables. Orphaned rows, inconsistent deletes, and silent data corruption are guaranteed over time.
You also can't create a standard foreign key constraint on parent_id. The index must cover (parent_id, parent_type) and query performance degrades accordingly.
Fix: Use separate join tables per association type (normalised) or nullable FK columns per parent type with a check constraint.
When it's acceptable: ORM-managed polymorphism (Rails, Django) where application-layer code handles cascades and you have integration tests covering delete paths.

The Problem

You're building a comments system. Comments can be left on Posts, Videos, or Products. A developer introduces a "flexible" schema:

CREATE TABLE comments (
    id          BIGINT PRIMARY KEY,
    body        TEXT,
    user_id     BIGINT,
    parent_id   BIGINT,           -- points to a post, video, or product
    parent_type VARCHAR(20)       -- 'Post', 'Video', 'Product'
);

This looks clean. One table, one query pattern, works for all three parent types. I've seen this exact design proposed in at least a dozen schema reviews. It always feels elegant at design time.

Here's how index efficiency compares across 10 million comments:

Schema approach	Index type	Index scan rows for "comments on post 42"	Query time
Polymorphic `(parent_type, parent_id)`	Composite B-tree	Filters on type string first, then ID	~3ms
Nullable FK `post_id`	Standard B-tree	Direct integer lookup	~0.5ms
Separate `post_comments` table	Standard B-tree on smaller table	Direct lookup on focused table	~0.3ms

The performance gap widens with more parent types because the composite index has more distinct parent_type values to filter through.

Why It Happens

Polymorphic associations emerge from reasonable instincts:

"One table is simpler than three." Nobody wants to create post_comments, video_comments, and product_comments. That feels like unnecessary duplication.
"Our ORM supports it." Rails' belongs_to :parent, polymorphic: true and Django's GenericForeignKey make this pattern a one-liner. If the framework supports it, it must be fine, right?
"We might add more parent types later." Adding a new commentable type with polymorphic associations means zero schema changes. With separate FK tables, you need a new table and migration.
"The prototype worked." With 1,000 comments and 3 parent types, everything is fast and no orphans exist yet. The problems emerge at scale and over time.

The hidden cost: query complexity

Even if you manage to prevent orphans, polymorphic associations make queries harder than they need to be. Fetching a comment with its parent data requires knowing the type at query time:

Polymorphic associations anti-pattern

TL;DR

The Problem

Why It Happens

The hidden cost: query complexity

Continue Reading with Premium

Comments

Polymorphic associations anti-pattern

TL;DR

The Problem

Why It Happens

The hidden cost: query complexity

Continue Reading with Premium

Comments