Polymorphic associations anti-pattern
Learn why polymorphic foreign keys with a type discriminator break referential integrity and indexes, and what foreign key tables and typed joins look like as clean alternatives.
TL;DR
- Polymorphic associations store a foreign key plus a type discriminator column to point to one of several tables:
parent_id = 42, parent_type = 'Post'. - The database cannot enforce referential integrity on a column that points to any of N tables. Orphaned rows, inconsistent deletes, and silent data corruption are guaranteed over time.
- You also can't create a standard foreign key constraint on
parent_id. The index must cover(parent_id, parent_type)and query performance degrades accordingly. - Fix: Use separate join tables per association type (normalised) or nullable FK columns per parent type with a check constraint.
- When it's acceptable: ORM-managed polymorphism (Rails, Django) where application-layer code handles cascades and you have integration tests covering delete paths.
The Problem
You're building a comments system. Comments can be left on Posts, Videos, or Products. A developer introduces a "flexible" schema:
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
body TEXT,
user_id BIGINT,
parent_id BIGINT, -- points to a post, video, or product
parent_type VARCHAR(20) -- 'Post', 'Video', 'Product'
);
This looks clean. One table, one query pattern, works for all three parent types. I've seen this exact design proposed in at least a dozen schema reviews. It always feels elegant at design time.
Then someone deletes a Video. Does the database automatically delete its comments? No. There's no FOREIGN KEY (parent_id) REFERENCES videos(id). The database has no idea that parent_type = 'Video' means this row should be tied to the videos table. Those comments are now orphans, permanently pointing to a non-existent video ID.
Over 18 months, your comments table accumulates thousands of orphans. Customer support reports "ghost comments" appearing on unrelated pages. The bug is intermittent because orphaned parent_id values occasionally collide with new IDs in different parent tables.
Querying all comments for a post requires filtering on two columns: WHERE parent_id = ? AND parent_type = 'Post'. Joining comments to their parent is not possible with a standard SQL JOIN. You need a UNION across three separate queries or a CASE statement in your ORM.
The performance cost is real too. An index on parent_id alone is nearly useless because it matches rows across all parent types. You need a composite index on (parent_type, parent_id), and even then, the index is less selective than a proper FK index on a single-type column.
Here's how index efficiency compares across 10 million comments:
| Schema approach | Index type | Index scan rows for "comments on post 42" | Query time |
|---|---|---|---|
Polymorphic (parent_type, parent_id) | Composite B-tree | Filters on type string first, then ID | ~3ms |
Nullable FK post_id | Standard B-tree | Direct integer lookup | ~0.5ms |
Separate post_comments table | Standard B-tree on smaller table | Direct lookup on focused table | ~0.3ms |
The performance gap widens with more parent types because the composite index has more distinct parent_type values to filter through.
Why It Happens
Polymorphic associations emerge from reasonable instincts:
- "One table is simpler than three." Nobody wants to create
post_comments,video_comments, andproduct_comments. That feels like unnecessary duplication. - "Our ORM supports it." Rails'
belongs_to :parent, polymorphic: trueand Django'sGenericForeignKeymake this pattern a one-liner. If the framework supports it, it must be fine, right? - "We might add more parent types later." Adding a new commentable type with polymorphic associations means zero schema changes. With separate FK tables, you need a new table and migration.
- "The prototype worked." With 1,000 comments and 3 parent types, everything is fast and no orphans exist yet. The problems emerge at scale and over time.
The core mistake: treating the database as a dumb storage layer and pushing integrity enforcement to the application. The database exists to enforce constraints. When you take that away, bugs are a matter of when, not if.
I reviewed a codebase last year where the parent_type column had 14 distinct values. Fourteen tables that could be the "parent" of an activity log row. Seven of those parent types had no cascade logic at the application level. The orphan count was in the hundreds of thousands.
The hidden cost: query complexity
Even if you manage to prevent orphans, polymorphic associations make queries harder than they need to be. Fetching a comment with its parent data requires knowing the type at query time:
-- You can't do this:
SELECT c.*, p.* FROM comments c JOIN ??? ON c.parent_id = ???.id;
-- You have to do this:
SELECT c.*, p.title FROM comments c
JOIN posts p ON c.parent_id = p.id WHERE c.parent_type = 'Post'
UNION ALL
SELECT c.*, v.title FROM comments c
JOIN videos v ON c.parent_id = v.id WHERE c.parent_type = 'Video'
UNION ALL
SELECT c.*, pr.name FROM comments c
JOIN products pr ON c.parent_id = pr.id WHERE c.parent_type = 'Product';
Three queries unioned together. Every new parent type adds another UNION branch. ORMs hide this complexity, but the SQL is still being generated and executed.
Debugging these queries is painful. When a comment appears under the wrong parent, you have to check the parent_type value, cross-reference against the actual table, verify the ID still exists, and determine whether it's an orphan or an ID collision. With proper FKs, none of these failure modes exist.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
Column named *_type or *_kind next to an ID column | Polymorphic discriminator | SELECT column_name FROM information_schema.columns WHERE column_name LIKE '%_type' |
| No FK constraints on the association table | Database can't enforce integrity | Check information_schema.table_constraints for the table |
| Orphaned rows (ID points to deleted parent) | Cascade deletes not working | SELECT COUNT(*) FROM comments c LEFT JOIN posts p ON c.parent_id = p.id WHERE c.parent_type = 'Post' AND p.id IS NULL |
ORM uses GenericForeignKey or polymorphic: true | Framework-level polymorphism | Search codebase for these patterns |
| UNION queries to join child to parent | Can't use a standard JOIN | Search for UNION on the comments table |
| Bug reports about "ghost" data or wrong associations | ID collision across parent types | Correlate orphaned row count with bug frequency |
The fastest check: count orphaned rows. If orphans exist and nobody noticed, you have a polymorphic association problem with missing application-level cascade logic.
-- Run this for each parent type to count orphans
SELECT 'Post' AS parent_type, COUNT(*) AS orphan_count
FROM comments c
LEFT JOIN posts p ON c.parent_id = p.id
WHERE c.parent_type = 'Post' AND p.id IS NULL
UNION ALL
SELECT 'Video', COUNT(*)
FROM comments c
LEFT JOIN videos v ON c.parent_id = v.id
WHERE c.parent_type = 'Video' AND v.id IS NULL;
If this returns anything above zero, you have orphans. The higher the count, the more urgent the migration.
The Fix
Fix 1: Separate join tables per type
Create one join table per type of association. Each table has a proper foreign key.
CREATE TABLE post_comments (
comment_id BIGINT REFERENCES comments(id) ON DELETE CASCADE,
post_id BIGINT REFERENCES posts(id) ON DELETE CASCADE,
PRIMARY KEY (comment_id)
);
CREATE TABLE video_comments (
comment_id BIGINT REFERENCES comments(id) ON DELETE CASCADE,
video_id BIGINT REFERENCES videos(id) ON DELETE CASCADE,
PRIMARY KEY (comment_id)
);
CREATE TABLE product_comments (
comment_id BIGINT REFERENCES comments(id) ON DELETE CASCADE,
product_id BIGINT REFERENCES products(id) ON DELETE CASCADE,
PRIMARY KEY (comment_id)
);
Referential integrity enforced by the database. Cascades work correctly. Standard indexed joins. Trade-off: more tables, more joins, and adding a new commentable type requires a new table and migration.
This approach is best when you have many parent types (5+) or expect to add new types frequently. The overhead of a new join table is low, and each table is simple and self-contained.
-- Querying: clean JOIN, no type discriminator
SELECT c.body, p.title AS parent_title
FROM comments c
JOIN post_comments pc ON c.id = pc.comment_id
JOIN posts p ON pc.post_id = p.id
WHERE p.id = 42;
Fix 2: Nullable foreign keys per parent type
Add one nullable FK column per possible parent type. A comment for a post has post_id set and video_id and product_id as NULL. A check constraint ensures exactly one is non-null.
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
body TEXT,
user_id BIGINT REFERENCES users(id),
post_id BIGINT REFERENCES posts(id) ON DELETE CASCADE,
video_id BIGINT REFERENCES videos(id) ON DELETE CASCADE,
product_id BIGINT REFERENCES products(id) ON DELETE CASCADE,
CONSTRAINT exactly_one_parent CHECK (
(post_id IS NOT NULL)::int +
(video_id IS NOT NULL)::int +
(product_id IS NOT NULL)::int = 1
)
);
Full referential integrity. Standard indexed joins per type. Trade-off: sparse columns (many NULLs) and adding a new commentable type requires a schema migration to add a column.
I prefer this pattern for systems with fewer than 5 parent types. It's the simplest to query and the easiest for new developers to understand.
-- Querying: simple WHERE, standard index
SELECT c.body, c.post_id, c.video_id, c.product_id
FROM comments c
WHERE c.post_id = 42;
-- The query planner uses a standard B-tree index on post_id
CREATE INDEX idx_comments_post_id ON comments(post_id) WHERE post_id IS NOT NULL;
Which fix to use
Severity and Blast Radius
Polymorphic associations are a data integrity anti-pattern. The failures are silent and cumulative.
- Data corruption: Orphaned rows accumulate slowly. You often don't notice until a customer reports "ghost" content or a report shows impossible data.
- Affected systems: Every feature that reads or writes the polymorphic table. Comments, notifications, activity feeds, audit logs, and any table using the
(parent_id, parent_type)pattern. - Recovery difficulty: Medium-high. Finding and cleaning orphaned rows requires cross-table scans for each parent type. Migrating the schema requires rewriting queries and updating application code.
- Cascade risk: Low for read paths. High for delete paths. One missed cascade can silently corrupt data for months before anyone notices.
- ID collision risk: If you reuse auto-increment IDs across parent tables (
posts.id = 42andvideos.id = 42), orphaned comments can resolve to the wrong parent type. This is a data leak, not just a bug.
Comparing the approaches
| Aspect | Polymorphic FK | Nullable FK columns | Separate join tables |
|---|---|---|---|
| Referential integrity | None (application only) | Full (database enforced) | Full (database enforced) |
| Cascade deletes | Manual, error-prone | Automatic via FK | Automatic via FK |
| Orphan risk | High | None | None |
| Query complexity | UNION per parent type | Simple WHERE on FK column | JOIN to specific table |
| Schema changes for new type | Zero | Add column + migration | Add table + migration |
| Index efficiency | Composite (type, id) | Standard FK index | Standard FK index |
When It's Actually OK
- ORM-managed polymorphism (Rails
belongs_to :parent, polymorphic: true, DjangoGenericForeignKey) where the application layer handles cascade deletes and you have integration tests covering every delete path. The risk is managed, not eliminated. - Read-only analytics tables where referential integrity doesn't matter because the data is a snapshot, not a live association.
- Internal tools with 2-3 parent types where the developer team is small, the codebase is well-understood, and the data volume is low (under 100K rows).
- Event sourcing stores where the "parent" is an event aggregate ID and events are append-only (never deleted). If nothing is ever deleted, orphans can't exist.
A useful rule of thumb: if the child data matters (comments, payments, audit records), use proper FKs. If the child data is ephemeral (logs, temp caches, analytics events), polymorphic associations are tolerable.
Even in these cases, nullable FK columns are almost always a better choice. The schema migration cost of adding a new parent type is far lower than the debugging cost of orphaned rows.
The bottom line: if your data matters (and it does), let the database enforce integrity. Application-level constraints are a safety net with holes.
How This Shows Up in Interviews
Polymorphic associations come up when designing commenting systems, notification systems, or any feature that attaches to multiple entity types.
The strong answer addresses integrity directly:
- "I'd use nullable FK columns per parent type with a CHECK constraint ensuring exactly one is non-null. This gives the database full FK enforcement."
- "For more than 5 parent types, I'd use separate join tables per type. More tables, but each has proper FK constraints and cascade deletes."
- "I'd avoid a
parent_typediscriminator column because the database can't enforce referential integrity on it. Orphaned rows are inevitable."
Callout: "The database exists to enforce constraints. A
parent_typecolumn is a promise only the application can keep, and applications have bugs."
What interviewers probe
They'll ask: "What happens when you delete a Video? How do you prevent orphaned comments?" If you're using polymorphic associations, you need a convincing answer about application-level cascades. If you're using proper FK patterns, the answer is simply "the database handles it."
They may also test edge cases: "What if posts.id = 42 and videos.id = 42 both exist, and a comment with parent_id = 42, parent_type = 'Video' is queried without the type filter?" The answer: the wrong data is returned. With proper FKs, this scenario is impossible because each FK points to exactly one table.
Migration talking point
If asked about migrating an existing polymorphic schema, describe the dual-write approach: add new FK columns alongside the old discriminator, backfill in batches, migrate reads first, then writes, then drop the old columns. This shows you've done it before (or at least thought it through).
Quick Recap
- Polymorphic associations use one FK column that can reference multiple tables, distinguished by a type discriminator. The database can't enforce referential integrity on this.
- Orphaned rows accumulate silently over time. ID collisions across parent types can cause data to appear under the wrong entity.
- Fix 1: Nullable FK column per parent type with a CHECK constraint. Best for fewer than 5 parent types.
- Fix 2: Separate join tables per parent type. Best for 5+ parent types or when types change frequently.
- ORM-level polymorphism is acceptable if you manage cascade logic at the application layer, test it rigorously, and accept the risk that direct SQL operations bypass your hooks.
- The database exists to enforce constraints. Don't make promises in application code that the schema should guarantee.