Polymorphic associations anti-pattern
Learn why polymorphic foreign keys with a type discriminator break referential integrity and indexes, and what foreign key tables and typed joins look like as clean alternatives.
TL;DR
- Polymorphic associations store a foreign key plus a type discriminator column to point to one of several tables:
parent_id = 42, parent_type = 'Post'. - The database cannot enforce referential integrity on a column that points to any of N tables. Orphaned rows, inconsistent deletes, and silent data corruption are guaranteed over time.
- You also can't create a standard foreign key constraint on
parent_id. The index must cover(parent_id, parent_type)and query performance degrades accordingly. - Fix: Use separate join tables per association type (normalised) or nullable FK columns per parent type with a check constraint.
- When it's acceptable: ORM-managed polymorphism (Rails, Django) where application-layer code handles cascades and you have integration tests covering delete paths.
The Problem
You're building a comments system. Comments can be left on Posts, Videos, or Products. A developer introduces a "flexible" schema:
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
body TEXT,
user_id BIGINT,
parent_id BIGINT, -- points to a post, video, or product
parent_type VARCHAR(20) -- 'Post', 'Video', 'Product'
);
This looks clean. One table, one query pattern, works for all three parent types. I've seen this exact design proposed in at least a dozen schema reviews. It always feels elegant at design time.
Then someone deletes a Video. Does the database automatically delete its comments? No. There's no FOREIGN KEY (parent_id) REFERENCES videos(id). The database has no idea that parent_type = 'Video' means this row should be tied to the videos table. Those comments are now orphans, permanently pointing to a non-existent video ID.
Over 18 months, your comments table accumulates thousands of orphans. Customer support reports "ghost comments" appearing on unrelated pages. The bug is intermittent because orphaned parent_id values occasionally collide with new IDs in different parent tables.
Querying all comments for a post requires filtering on two columns: WHERE parent_id = ? AND parent_type = 'Post'. Joining comments to their parent is not possible with a standard SQL JOIN. You need a UNION across three separate queries or a CASE statement in your ORM.
The performance cost is real too. An index on parent_id alone is nearly useless because it matches rows across all parent types. You need a composite index on (parent_type, parent_id), and even then, the index is less selective than a proper FK index on a single-type column.
Here's how index efficiency compares across 10 million comments:
| Schema approach | Index type | Index scan rows for "comments on post 42" | Query time |
|---|---|---|---|
Polymorphic (parent_type, parent_id) | Composite B-tree | Filters on type string first, then ID | ~3ms |
Nullable FK post_id | Standard B-tree | Direct integer lookup | ~0.5ms |
Separate post_comments table | Standard B-tree on smaller table | Direct lookup on focused table | ~0.3ms |
The performance gap widens with more parent types because the composite index has more distinct parent_type values to filter through.
Why It Happens
Polymorphic associations emerge from reasonable instincts:
- "One table is simpler than three." Nobody wants to create
post_comments,video_comments, andproduct_comments. That feels like unnecessary duplication. - "Our ORM supports it." Rails'
belongs_to :parent, polymorphic: trueand Django'sGenericForeignKeymake this pattern a one-liner. If the framework supports it, it must be fine, right? - "We might add more parent types later." Adding a new commentable type with polymorphic associations means zero schema changes. With separate FK tables, you need a new table and migration.
- "The prototype worked." With 1,000 comments and 3 parent types, everything is fast and no orphans exist yet. The problems emerge at scale and over time.
The core mistake: treating the database as a dumb storage layer and pushing integrity enforcement to the application. The database exists to enforce constraints. When you take that away, bugs are a matter of when, not if.
I reviewed a codebase last year where the parent_type column had 14 distinct values. Fourteen tables that could be the "parent" of an activity log row. Seven of those parent types had no cascade logic at the application level. The orphan count was in the hundreds of thousands.
The hidden cost: query complexity
Even if you manage to prevent orphans, polymorphic associations make queries harder than they need to be. Fetching a comment with its parent data requires knowing the type at query time:
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.