PostgreSQL vs. MySQL vs. SQL Server
How PostgreSQL, MySQL, and SQL Server compare for system design, covering feature depth, replication models, JSON support, full-text search, cloud-managed options, and when each is the right default.
TL;DR
| Dimension | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Default for | New projects, complex queries, JSON + relational hybrid | Read-heavy web apps, PHP/Java stacks, Vitess sharding | .NET ecosystem, Microsoft shops, BI + OLTP hybrid |
| Killer feature | Extensions (PostGIS, pgvector, TimescaleDB, Citus) | Battle-tested replication at massive scale (Aurora, Vitess) | Columnstore indexes, Always On AG, SSMS tooling |
| Watch out for | VACUUM tuning at scale, connection pooling needed | Weak partial indexes, limited SQL features | Per-core licensing costs, Windows dependency |
| Replication | Streaming + logical replication | Binary log + Group Replication | Always On Availability Groups |
| JSON support | JSONB with GIN indexes (first-class) | JSON type with generated columns for indexing | JSON functions (adequate, no native type) |
| Cloud managed | Aurora PG, Cloud SQL, Azure DB for PG, Supabase, Neon | Aurora MySQL, PlanetScale, Cloud SQL | Azure SQL Database, RDS for SQL Server |
Default answer: PostgreSQL for most new projects. MySQL when your team has deep MySQL expertise or you need Vitess/Aurora MySQL specifically. SQL Server when you're in a Microsoft/.NET environment or need columnstore analytics.
All three are ACID-compliant, support standard SQL, have mature ecosystems, and can handle terabytes of data. The differences show up at the edges: extensions, replication topology, licensing, and operational ergonomics.
The Framing
You're starting a new project and need a relational database. The team has two camps: "just use Postgres" and "our backend devs all know MySQL." The CTO asks whether SQL Server would integrate better with Azure.
This conversation happens at every startup and inside every enterprise team. The SQL syntax is 90% identical across all three, so the decision feels trivial. It isn't. The differences show up at 3 AM when you're debugging replication lag, when your JSONB query needs an index that MySQL can't create, or when your DBA discovers PostgreSQL's VACUUM is eating 40% of disk I/O because nobody tuned autovacuum.
Each database has a philosophy. PostgreSQL prioritizes correctness and extensibility. MySQL prioritizes simplicity and read throughput. SQL Server prioritizes enterprise integration and tooling. The "right" choice depends on which trade-offs match your constraints.
For your interview: lead with PostgreSQL as the default, then explain when MySQL or SQL Server is better. That shows judgment, not bias.
How Each Works
PostgreSQL: Correctness-First with Extensible Architecture
PostgreSQL uses Multi-Version Concurrency Control (MVCC), where readers never block writers and vice versa. Each row version is stored inline in the table heap, and VACUUM reclaims dead tuples. This architecture also supports serializable snapshot isolation (the strongest isolation level available in production databases) with far less locking overhead than traditional lock-based implementations.
The extension system is PostgreSQL's superpower. Unlike MySQL's plugin model or SQL Server's CLR integration, PostgreSQL extensions can define new types, operators, index methods, and even storage engines:
-- PostGIS: geographic queries
SELECT name FROM stores
WHERE ST_DWithin(location, ST_MakePoint(-73.99, 40.73)::geography, 5000);
-- pgvector: ML embedding similarity search
SELECT title FROM articles
ORDER BY embedding <=> '[0.12, 0.45, ...]'::vector
LIMIT 10;
-- JSONB: index and query JSON natively
CREATE INDEX idx_meta ON products USING GIN (metadata jsonb_path_ops);
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
Replication comes in two flavors. Streaming replication sends WAL (Write-Ahead Log) bytes to replicas for physical standby. Logical replication decodes WAL into row-level changes, enabling selective replication (specific tables), cross-version upgrades, and feeding change data to external systems like Kafka via Debezium. Streaming replication is simpler to set up but requires identical PostgreSQL versions on primary and replica. Logical replication allows cross-version replication, making zero-downtime major version upgrades possible.
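A minimal logical replication sketch, assuming `wal_level = logical` on the publisher and a `users` table that already exists on both sides (the publication, subscription, and connection names here are illustrative):

```sql
-- On the publisher (primary): replicate only the users table
CREATE PUBLICATION app_pub FOR TABLE users;

-- On the subscriber (which may run a newer PostgreSQL version):
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary.example.com dbname=app user=replicator'
    PUBLICATION app_pub;
```

Because the subscriber applies decoded row changes rather than raw WAL bytes, it can run a different major version than the publisher, which is what makes near-zero-downtime upgrades possible.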
Window functions, CTEs (Common Table Expressions), recursive queries, lateral joins, and row-level security (RLS) are all mature. I've seen PostgreSQL handle analytics queries that would require a separate data warehouse in MySQL. The query planner is sophisticated enough to flatten subqueries, push down predicates into views, and parallelize hash joins across multiple workers.
Partitioning in PostgreSQL supports range, list, and hash strategies natively since PG 10. A 500M-row orders table partitioned by month lets VACUUM work on individual monthly partitions (fast) instead of the entire table (slow). Partition pruning eliminates irrelevant partitions at query time. Combined with parallel query execution (available since PG 9.6, extended to partitioned tables in PG 11), analytical queries on partitioned tables scale nearly linearly with available CPU cores.
-- Declarative partitioning
CREATE TABLE orders (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
total NUMERIC(10,2),
customer_id BIGINT
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2026_01 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
JIT compilation (Just-In-Time, using LLVM) compiles query expressions into machine code at runtime. For CPU-bound analytical queries touching millions of rows, JIT can reduce execution time by 20-50%. It's enabled by default in PG 12+ but only kicks in for queries whose estimated cost exceeds jit_above_cost (default 100,000). For simple OLTP queries, JIT adds overhead and is skipped automatically.
The pg_stat_statements extension is essential for production PostgreSQL. It tracks execution statistics for all SQL statements: total time, number of calls, mean time, rows returned, and buffer hits. This is how you identify slow queries without application-level instrumentation. Combined with auto_explain (which logs execution plans for slow queries), you get full observability into query performance.
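A typical sketch for surfacing the worst offenders (column names follow PG 13+, where `total_time` was split into `total_exec_time`; the extension also requires `shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf):

```sql
-- One-time setup per database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by cumulative execution time
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```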
MySQL: Simplicity and Read Throughput at Scale
MySQL's InnoDB engine uses a clustered index (primary key physically orders data on disk), which makes primary key lookups blazing fast. Simple SELECT * WHERE id = ? queries are faster on MySQL than PostgreSQL for this reason: one B-tree traversal hits the data page directly.
Replication is binary log (binlog) based. The primary writes changes to the binlog; replicas read and apply them. This is battle-tested at massive scale. Group Replication adds multi-primary support with Paxos-based consensus for automatic failover.
-- InnoDB clustered index advantage: fast PK lookup
SELECT * FROM users WHERE id = 12345; -- Single B-tree traversal
-- MySQL 8.0+ CTE support (finally)
WITH active_users AS (
SELECT user_id, COUNT(*) as order_count
FROM orders WHERE created_at > NOW() - INTERVAL 30 DAY
GROUP BY user_id
)
SELECT u.name, a.order_count
FROM users u JOIN active_users a ON u.id = a.user_id;
MySQL's ecosystem for scaling reads is unmatched. ProxySQL handles read/write splitting transparently. Vitess (built at YouTube) shards MySQL horizontally across thousands of nodes. Aurora MySQL provides up to 15 read replicas with sub-second replication lag.
The trade-off: MySQL's SQL dialect is less expressive. LATERAL derived tables arrived only in 8.0.14, window function support is narrower than PostgreSQL's, there are no native array or composite types, and JSON indexing requires generated columns (or functional indexes) rather than native GIN indexes.
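To illustrate the generated-column workaround for JSON indexing (table and column names are hypothetical; MySQL 8.0.13+ can also create a functional index directly on the expression):

```sql
-- MySQL: index a JSON attribute by extracting it into a generated column
CREATE TABLE products (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    metadata JSON,
    -- Virtual generated column materializing metadata->>'$.color' for indexing
    color VARCHAR(32) AS (metadata->>'$.color'),
    INDEX idx_color (color)
);

-- The optimizer can match the expression to the generated column's index
SELECT * FROM products WHERE metadata->>'$.color' = 'red';
```

Compare this with PostgreSQL's GIN index on the whole JSONB document shown earlier: in MySQL, each attribute you want indexed needs its own column.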
Global Transaction Identifiers (GTIDs) simplify replication management by assigning a unique ID to every transaction. This makes failover and replica promotion straightforward: the new primary picks up from the last committed GTID rather than tracking binary log file positions. Semi-synchronous replication (rpl_semi_sync_source_enabled=ON) waits for at least one replica to acknowledge the transaction before returning to the client, providing durability guarantees closer to synchronous replication without the full latency penalty.
-- Check replication lag on a replica (8.0.22+ syntax; older versions use SHOW SLAVE STATUS)
SHOW REPLICA STATUS\G
-- Key fields: Seconds_Behind_Source, Replica_SQL_Running_State
-- GTID-based replication setup
SET GLOBAL gtid_mode = ON;
SET GLOBAL enforce_gtid_consistency = ON;
-- Replicas auto-track by GTID, no need to specify log file positions
SQL Server: Enterprise Integration and Hybrid Analytics
SQL Server's strength is integration with the Microsoft ecosystem: .NET, Azure, Visual Studio, Power BI, and SQL Server Management Studio (SSMS). For .NET shops, the developer experience is seamless from IDE to database.
Columnstore indexes are the standout technical feature. They store data column-wise with batch-mode processing, enabling OLAP-style analytics directly in the OLTP database. A query scanning billions of rows across a few columns runs 10-100x faster with columnstore than traditional row-based indexes.
-- Columnstore index for analytics on OLTP data
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_orders_analytics
ON orders (order_date, total_amount, status, customer_id);
-- This analytical query now runs in seconds, not minutes
SELECT DATEPART(year, order_date) AS order_year,
DATEPART(month, order_date) AS order_month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY DATEPART(year, order_date), DATEPART(month, order_date);
Always On Availability Groups provide synchronous or asynchronous replication with automatic failover. Readable secondary replicas handle read-only workloads, offloading reporting from the primary.
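A hedged sketch of directing read-only traffic to a secondary; the availability group and replica names are hypothetical, and in practice clients opt in with `ApplicationIntent=ReadOnly` in their connection strings:

```sql
-- Allow the secondary replica to serve read-intent connections
ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Publish the routing endpoint for read-intent connections
ALTER AVAILABILITY GROUP [AppAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://sqlnode2.example.com:1433'));
```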
T-SQL extends standard SQL with procedural programming features. While PostgreSQL's PL/pgSQL is comparable, T-SQL's integration with SSMS (execution plans, query store, performance dashboards) provides a developer experience that other databases lack.
SQL Server's automatic tuning feature analyzes query patterns and automatically creates or drops indexes based on workload data. The Database Engine Tuning Advisor has existed for years, but the automatic tuning mode (enabled in Azure SQL by default) takes action without human intervention. It monitors for plan regressions (queries that suddenly got slower due to a plan change) and automatically forces the previous good plan. This self-tuning capability is unique among the three databases.
Linked Servers allow SQL Server to query external data sources (other SQL Server instances, Oracle, PostgreSQL, MySQL) using standard T-SQL syntax. Combined with distributed transactions, this enables cross-database operations that would require application-level coordination in PostgreSQL or MySQL. For enterprises with data spread across multiple database systems, Linked Servers reduce integration complexity.
-- Query a remote PostgreSQL database from SQL Server
SELECT p.product_name, o.order_date
FROM OPENQUERY(PG_LINKED_SERVER,
'SELECT product_id, product_name FROM products WHERE active = true'
) AS p
JOIN local_orders o ON o.product_id = p.product_id;
Temporal tables (system-versioned) automatically track all row changes with valid-from/valid-to timestamps. This provides built-in audit history without application code. Every UPDATE or DELETE creates a historical record in a paired history table. Querying "what was this row at timestamp X" is a single SQL statement.
-- System-versioned temporal table
CREATE TABLE employees (
id INT PRIMARY KEY,
name NVARCHAR(100),
salary DECIMAL(10,2),
department NVARCHAR(50),
valid_from DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
valid_to DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH (SYSTEM_VERSIONING = ON);
-- Query historical state: what was the salary on Jan 15?
SELECT name, salary
FROM employees FOR SYSTEM_TIME AS OF '2026-01-15'
WHERE id = 42;
In-Memory OLTP (Hekaton) provides lock-free, latch-free memory-optimized tables for extreme OLTP workloads. This is a niche feature but delivers 10-30x throughput improvement for contention-heavy workloads like session stores and high-frequency counters.
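A minimal memory-optimized table sketch for the session-store use case mentioned above (requires a `MEMORY_OPTIMIZED_DATA` filegroup on the database; names and the bucket count are illustrative):

```sql
-- Durable memory-optimized table for a high-contention session store
CREATE TABLE sessions (
    session_id UNIQUEIDENTIFIER NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    user_id INT NOT NULL,
    payload NVARCHAR(4000),
    last_seen DATETIME2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

`DURABILITY = SCHEMA_ONLY` trades persistence for even higher throughput, which suits pure caches.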
Query Store captures execution plan history for every query. When a query suddenly becomes slow (plan regression), Query Store lets you identify the exact point where the plan changed and force the previous good plan. This is a built-in feature that eliminates one of the most frustrating database performance problems: queries that were fast yesterday and slow today with no code changes. PostgreSQL's pg_stat_statements tracks aggregated statistics but not plan history; you'd need third-party tools like pganalyze to get comparable plan regression detection.
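Enabling Query Store and pinning a known-good plan looks roughly like this (the query and plan IDs in the final call are placeholders you'd read from the Query Store views):

```sql
-- Enable Query Store on the current database
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Inspect queries and their plan history by average duration
SELECT q.query_id, p.plan_id, rs.avg_duration
FROM sys.query_store_query q
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;

-- Force the previously good plan for a regressed query
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;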
Monitoring Essentials by Database
Each database has different critical metrics to monitor. Getting these dashboards right from day one prevents most production incidents.
- PostgreSQL: `pg_stat_user_tables.n_dead_tup` (bloat indicator), `age(datfrozenxid)` (transaction ID wraparound risk, must stay under 2 billion), replication lag via `pg_stat_replication`, and connection count vs `max_connections`. If `n_dead_tup` grows continuously, autovacuum isn't keeping up. Set up PagerDuty alerts for `age(datfrozenxid)` > 1 billion.
- MySQL: `Seconds_Behind_Master` (replication lag), InnoDB buffer pool hit ratio (should be >99%), `Threads_running` vs `Threads_connected` (contention indicator), and binary log position for failover readiness. The `performance_schema` provides deep instrumentation for wait events and lock contention.
- SQL Server: wait statistics (`sys.dm_os_wait_stats` reveals bottlenecks), `sys.dm_exec_query_stats` for expensive queries, `sys.dm_db_index_usage_stats` for unused indexes to drop, and the Always On dashboard for replica synchronization health. The Dynamic Management Views (DMVs) provide the most comprehensive built-in monitoring of all three databases.
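The PostgreSQL checks above can be scripted directly; a sketch (thresholds and limits are illustrative):

```sql
-- Dead tuples per table: is autovacuum keeping up?
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Transaction ID age per database (alert well before 2 billion)
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Replication lag per replica, in bytes of WAL not yet replayed
SELECT client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```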
The key takeaway from the replication models: PostgreSQL gives you the most flexibility (streaming + logical), MySQL gives you the simplest setup (binlog relay), and SQL Server gives you the best tooling (Always On dashboard, automatic failover with SSMS integration).
One thing to note: all three databases support synchronous replication for zero data loss, but the performance penalty is significant (2-5x latency on writes). Most production deployments use asynchronous replication and accept the small data loss window (typically sub-second) in exchange for write performance.
Head-to-Head Comparison
| Dimension | PostgreSQL | MySQL | SQL Server | Verdict |
|---|---|---|---|---|
| SQL feature depth | Full SQL:2016, lateral joins, recursive CTEs, window functions, JSONB | SQL:2016 partial, CTEs since 8.0, limited lateral | T-SQL (proprietary extensions), good window support | PostgreSQL has deepest standard SQL |
| Extensions/plugins | PostGIS, pgvector, TimescaleDB, Citus, pg_trgm (100+ extensions) | Limited plugin model, no new types or indexes | CLR integration, columnstore built-in | PostgreSQL, dramatically more extensible |
| JSON support | JSONB (binary, GIN-indexed, operators @>, ->, #>>) | JSON type, functional indexes via generated columns | JSON functions, no native JSON type | PostgreSQL, native indexing wins |
| Replication maturity | Streaming + logical (mature since PG 10) | Binlog replication (decades of battle-testing at scale) | Always On AG (enterprise-grade, GUI-managed) | MySQL for sheer scale; SQL Server for enterprise |
| Horizontal scaling | Citus extension, manual sharding | Vitess (YouTube scale), ProxySQL, PlanetScale | Elastic pools in Azure SQL | MySQL via Vitess is most proven |
| Concurrency model | MVCC, readers never block writers, VACUUM required | MVCC with undo logs, no vacuum needed | Row versioning with tempdb, snapshot isolation | Both PG and MySQL strong; PG needs vacuum tuning |
| Operational complexity | VACUUM tuning, bloat monitoring, extension management | Simpler day-to-day, but replication topology gets complex | SSMS simplifies most operations | MySQL simplest for basic OLTP |
| Cloud managed options | Aurora PG, Cloud SQL, Azure DB, Supabase, Neon | Aurora MySQL, PlanetScale, Cloud SQL, Azure DB | Azure SQL (best integration), RDS for SQL Server | All have strong cloud options |
| Licensing | Open source (PostgreSQL License, very permissive) | Open source (GPL) with commercial Enterprise features | Commercial (per-core licensing, expensive) | PostgreSQL and MySQL win on cost |
| Full-text search | tsvector/tsquery with GIN indexes (good, not great) | FULLTEXT indexes on InnoDB (basic) | Full-text with semantic search (best built-in) | SQL Server built-in; PG with pg_trgm for fuzzy |
The fundamental tension: PostgreSQL gives you the most features and flexibility at the cost of operational complexity (vacuum, bloat). MySQL gives you simplicity and proven scale at the cost of SQL expressiveness. SQL Server gives you the best enterprise tooling at the cost of vendor lock-in and licensing fees.
Performance Benchmarks (Approximate)
These numbers are approximate and workload-dependent, but they give a sense of relative performance:
# Simple OLTP (SELECT by PK, INSERT, UPDATE)
PostgreSQL (PG 16, pgbouncer): 45K-60K TPS
MySQL (InnoDB, 8.0): 55K-75K TPS # Clustered index advantage
SQL Server (2022): 50K-65K TPS
# Complex analytical query (window functions, CTEs, 10M rows)
PostgreSQL: 2-5 seconds # Parallel query, JIT
MySQL: 15-45 seconds # Limited parallelism
SQL Server (columnstore): 0.5-2 seconds # Batch-mode columnar
# JSON operations (JSONB query with GIN index, 1M docs)
PostgreSQL (JSONB + GIN): 0.5-2ms per query
MySQL (JSON + generated col): 2-8ms per query
SQL Server (JSON functions): 5-15ms per query
Connection pooling matters for PostgreSQL. Each PostgreSQL connection is a separate OS process (not a thread), consuming ~5-10 MB of RAM. At 500 connections, that's 2.5-5 GB just for connection overhead. PgBouncer or built-in connection pooling (Supabase, Neon) is essential for production deployments. MySQL uses threads (~256 KB per connection), handling thousands of connections with minimal overhead. SQL Server also uses threads and handles high connection counts natively.
When PostgreSQL Wins
PostgreSQL is the right default for most new server-side projects. Here's when it's the clearly better choice:
You need more than relational data. JSONB with GIN indexes lets you store and query semi-structured data without a separate document store. PostGIS handles geospatial queries. pgvector handles ML embedding search. TimescaleDB handles time-series. Rather than running four specialized databases, you run one.
Complex queries are core to your product. If your application relies on window functions, recursive CTEs, lateral joins, or advanced aggregations, PostgreSQL's SQL engine is years ahead of MySQL. I've seen teams rewrite MySQL stored procedures as single PG queries using window functions, reducing both code complexity and latency.
You need strong consistency guarantees. PostgreSQL implements true serializable isolation (Serializable Snapshot Isolation); the default is READ COMMITTED, but you can set SERIALIZABLE per transaction or cluster-wide. MySQL's default is REPEATABLE READ, which allows phantom reads in some scenarios. For financial or inventory systems where anomalies are unacceptable, PostgreSQL's isolation model is safer.
Logical replication for CDC pipelines. PostgreSQL's logical decoding feeds row-level changes to Debezium, Kafka, or downstream systems without performance impact on the primary. This enables event-driven architectures built on top of the database, not alongside it. Combined with pgoutput (the default output plugin since PG 10), you get structured change events without third-party extensions.
Row-level security (RLS) for multi-tenancy. RLS policies enforce tenant isolation at the database level, not just the application level. This is defense-in-depth for SaaS platforms where a code bug shouldn't leak data across tenants. You can combine RLS with current_setting() to pass tenant context through connection variables, making multi-tenant queries transparent to application code.
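A minimal RLS sketch using a connection-local setting as the tenant context (the `app.tenant_id` setting name and the `invoices` schema are illustrative conventions; note that policies don't apply to the table owner unless you also use FORCE ROW LEVEL SECURITY):

```sql
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Each session sees only its own tenant's rows
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::bigint);

-- The application sets the context once per request or transaction
SET app.tenant_id = '42';
SELECT * FROM invoices;  -- only tenant 42's rows are visible
```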
Partitioning for large tables. Declarative partitioning (range, list, hash) makes PostgreSQL viable for billion-row tables. Combined with parallel query (utilizing multiple CPU cores per query), partition pruning, and JIT compilation, PostgreSQL handles analytical workloads that MySQL would struggle with.
Advanced data types. Arrays, hstore (key-value), composite types, range types, and domain types are all first-class. These reduce the need for join tables and lookup patterns. A tags TEXT[] column with a GIN index replaces an entire many-to-many tag relationship pattern.
Foreign Data Wrappers (FDW) for cross-database queries. FDW lets you query external data sources (MySQL, MongoDB, Redis, CSV files, other PostgreSQL instances) directly from PostgreSQL using standard SQL. postgres_fdw connects to remote PostgreSQL servers. mysql_fdw connects to MySQL. You can JOIN a local PostgreSQL table with a remote MySQL table in a single query. This is invaluable during migrations or in polyglot persistence architectures where data lives across multiple systems.
LISTEN/NOTIFY for real-time updates. PostgreSQL's LISTEN/NOTIFY provides pub/sub messaging within the database. Applications can subscribe to channels and receive notifications when data changes. Combined with triggers, this enables reactive patterns without polling. For small-scale real-time features (notify connected clients when a row changes), LISTEN/NOTIFY eliminates the need for a separate message broker.
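A small sketch of the trigger-driven pattern described above (function, trigger, and channel names are illustrative):

```sql
-- Notify listeners whenever an order is inserted or updated
CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('order_events', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_notify
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION notify_order_change();

-- A client subscribes and receives payloads asynchronously
LISTEN order_events;
```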
Concurrent index creation. CREATE INDEX CONCURRENTLY builds indexes without blocking writes to the table. MySQL's online DDL and SQL Server's ONLINE = ON index builds (an Enterprise edition feature) offer similar capabilities with caveats, but PostgreSQL's works in every edition and for every index type. For systems that can't afford write-blocking schema changes, this is a significant operational advantage.
UPSERT with conflict resolution. INSERT ... ON CONFLICT DO UPDATE handles insert-or-update atomically. The conflict target can be any unique constraint or index, and the UPDATE clause can reference both the existing row (via the table name) and the incoming values (via EXCLUDED). MySQL's INSERT ... ON DUPLICATE KEY UPDATE provides similar functionality, but PostgreSQL's syntax is more flexible (you can specify which constraint triggers the upsert).
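For example, an idempotent counter upsert keyed on a unique constraint (table and columns are illustrative):

```sql
-- Assumes a UNIQUE constraint on page_views(url)
INSERT INTO page_views (url, views)
VALUES ('/pricing', 1)
ON CONFLICT (url)
DO UPDATE SET views = page_views.views + EXCLUDED.views;
```

Here `page_views.views` is the value already stored and `EXCLUDED.views` is the value the INSERT proposed.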
When MySQL Wins
MySQL's advantages are specific and real. Choose MySQL when:
Massive read-replica scaling is your primary concern. Aurora MySQL supports up to 15 read replicas with sub-second lag. Vitess (powering YouTube and Slack) shards MySQL horizontally across thousands of instances. If your scaling strategy is "more read replicas" or "horizontal sharding via Vitess," MySQL's ecosystem is more mature.
Simple OLTP with high read throughput. InnoDB's clustered primary key index means SELECT * WHERE id = ? is a single B-tree traversal hitting the data page directly. For workloads that are 90%+ primary key lookups, MySQL is measurably faster. I've benchmarked 15-20% higher throughput on MySQL for this specific pattern.
Your team and tooling are MySQL-native. PHP frameworks (Laravel, Symfony), Java ORMs (Hibernate), and many older web stacks default to MySQL. Migrating from MySQL to PostgreSQL has a cost: query syntax differences, ORM compatibility, operational procedures, and team retraining. That cost needs a feature-driven justification.
Read/write splitting with ProxySQL. ProxySQL sits between your application and MySQL, automatically routing read queries to replicas and write queries to the primary. The application doesn't need to know about the topology. ProxySQL handles connection pooling, query caching, and automatic failover. Combined with GTID-based replication, this provides a high-availability setup with minimal application changes. PostgreSQL has pgBouncer for pooling and Patroni for HA, but ProxySQL's integrated read/write splitting is more turnkey.
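ProxySQL's routing rules are configured through its SQL-like admin interface; a hedged sketch (the hostgroup numbers are local conventions, not fixed values):

```sql
-- hostgroup 10 = primary (writes), hostgroup 20 = replicas (reads)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1),  -- locking reads go to the primary
       (2, 1, '^SELECT', 20, 1);              -- everything else read-only to replicas

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```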
PlanetScale or Vitess for serverless/sharding. PlanetScale's branching model and non-blocking schema migrations are built on Vitess and only work with MySQL. If you want Git-like database branching for development workflows, PlanetScale is MySQL-only.
You're deeply invested in an existing MySQL ecosystem. When 500K lines of code and 200 stored procedures target MySQL, the migration cost to PostgreSQL is enormous. New features within the same app might as well stay MySQL. Migrate only when a PostgreSQL feature is blocking a business requirement.
Performance for simple workloads is measurably better. In TPC-C style benchmarks, MySQL with InnoDB consistently outperforms PostgreSQL by 10-20% for simple OLTP workloads (INSERT, UPDATE, SELECT by PK). This gap has narrowed with each PostgreSQL release, but for applications where every millisecond counts on simple operations, MySQL's simpler execution path provides a genuine edge.
Aurora MySQL serverless v2. Aurora Serverless v2 for MySQL scales compute capacity from 0.5 to 256 ACUs (Aurora Capacity Units) in increments of 0.5, adjusting within seconds based on workload. This is ideal for variable-traffic applications that need MySQL compatibility without capacity planning. Aurora Serverless for PostgreSQL exists but Aurora MySQL's serverless implementation has more production mileage.
When SQL Server Wins
SQL Server is the right choice in specific contexts that other databases can't match:
Microsoft/.NET ecosystem. Entity Framework, Azure Functions, Azure App Service, and Power BI integrate seamlessly with SQL Server. The tooling chain (Visual Studio, SSMS, Azure Data Studio) provides the best developer experience for .NET shops. If your entire stack is Microsoft, fighting the ecosystem to use PostgreSQL creates friction for no gain.
Hybrid OLTP + analytics without a warehouse. Columnstore indexes let you run analytical queries (GROUP BY, SUM, aggregations across millions of rows) directly on your OLTP database. For companies that need basic BI without the cost of a separate data warehouse (Snowflake, BigQuery), SQL Server's columnstore eliminates an entire infrastructure layer.
Enterprise compliance and support. Regulated industries (banking, healthcare, government) often require vendor-backed support SLAs. Microsoft's enterprise support for SQL Server is well-established. Open-source support contracts exist for PostgreSQL, but Microsoft's enterprise support apparatus is more familiar to compliance teams.
Temporal tables for built-in audit history. SQL Server's system-versioned temporal tables automatically track all changes to rows with valid-from/valid-to timestamps. This provides out-of-the-box audit trails without application-side event sourcing. PostgreSQL can do this with triggers, but SQL Server makes it declarative.
In-Memory OLTP for hot tables. Memory-optimized tables (Hekaton engine) provide lock-free, latch-free data access for extreme contention workloads. Session stores, counters, and lookup tables with millions of concurrent reads/writes benefit from 10-30x throughput improvement over disk-based tables.
Query Store for performance regression detection. Query Store captures execution plan history, making it easy to identify when a plan change caused a performance regression. This built-in feature eliminates the need for third-party monitoring tools for plan analysis. PostgreSQL has pg_stat_statements but it doesn't track plan history.
The Nuance
The honest answer for most teams: PostgreSQL is the safe default. Not because it wins every dimension, but because it loses the fewest.
MySQL is faster for simple lookups and has proven sharding infrastructure. SQL Server has better OLAP-in-OLTP and enterprise tooling. But PostgreSQL's extension model means it can grow with your needs without adding new database technologies to your stack.
The biggest mistake I see: choosing a database based on what the team already knows rather than what the workload requires. Familiarity is a valid factor, but it shouldn't override a clear technical mismatch. If you need JSONB queries, geospatial joins, and strong isolation, "but we know MySQL" is not a sufficient counterargument.
The second biggest mistake: choosing SQL Server for a startup just because the team uses .NET. Modern .NET works excellently with PostgreSQL via Npgsql. SQL Server's licensing costs ($3,586 per core for Standard edition) add up fast when you scale. That money buys a lot of PostgreSQL operational expertise.
Operational Overhead Comparison
PostgreSQL's main operational burden is VACUUM. Dead tuples (from UPDATE and DELETE operations) accumulate until VACUUM reclaims the space. Autovacuum handles this automatically, but poorly tuned autovacuum on high-write tables leads to table bloat, index bloat, and eventually transaction ID wraparound (a catastrophic failure at 2 billion transactions if VACUUM doesn't keep up). Monitoring pg_stat_user_tables.n_dead_tup and age(datfrozenxid) is essential.
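Per-table autovacuum tuning for a high-write table might look like this (the values are illustrative starting points, not universal recommendations):

```sql
-- Vacuum the hot table after ~1% dead tuples instead of the default 20%
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2
);

-- Verify autovacuum is keeping up on that table
SELECT relname, n_dead_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';
```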
MySQL's InnoDB uses undo logs instead of inline tuple versions. Dead row versions live in the undo tablespace and are purged automatically. There's no equivalent of PostgreSQL's VACUUM. The main operational concern is the purge lag: if long-running transactions hold old undo logs, the undo tablespace grows. But this is far less common than PostgreSQL bloat issues. InnoDB also has a background merge thread for its change buffer, which batches writes to secondary indexes and reduces random I/O.
SQL Server uses tempdb for row versioning (when snapshot isolation is enabled). Tempdb contention under heavy load is the main operational concern. The mitigation is straightforward: put tempdb on fast storage and create multiple tempdb data files (one per CPU core, up to 8). SQL Server 2022 introduced tempdb metadata optimization by default, reducing latch contention that plagued earlier versions during high-connection workloads.
Licensing Cost Comparison
| Database | Self-hosted Cost | Cloud Managed (2 vCPU, 8 GB) |
|---|---|---|
| PostgreSQL | Free (open source) | ~$130-200/mo (RDS, Aurora) |
| MySQL | Free (Community) | ~$120-180/mo (RDS, Aurora) |
| SQL Server | $3,586/core (Standard) | ~$200-350/mo (RDS, Azure SQL) |
SQL Server's per-core licensing multiplied across a cluster of 8 servers with 16 cores each comes to roughly $459,000 just for database licenses (128 cores × $3,586). For startups and scale-ups, this is a significant factor that should be weighed against SQL Server's genuine advantages in tooling and enterprise features.
The migration question comes up often. PostgreSQL to MySQL and vice versa is painful: different SQL dialects, different procedural languages, different replication models. SQL Server to PostgreSQL is often more tractable because PostgreSQL's SQL covers most common T-SQL patterns. Tools like pgLoader (MySQL to PostgreSQL) and ora2pg (Oracle to PostgreSQL) automate the bulk of schema and data migration. But stored procedures, triggers, and database-specific features always require manual conversion.
Common Migration Pitfalls
Migrating between SQL databases is always harder than expected. Here are the traps I've seen teams fall into:
- Date/time handling differs. MySQL's `DATETIME` has no timezone. PostgreSQL's `TIMESTAMPTZ` normalizes timestamps to UTC and renders them in the session time zone. SQL Server's `DATETIMEOFFSET` stores the offset explicitly. Converting between these requires careful handling of the timezone assumptions baked into existing data.
- Auto-increment behavior varies. Before MySQL 8.0, the `AUTO_INCREMENT` counter was recalculated from the column's maximum value after a server restart (8.0 persists it). PostgreSQL's `SERIAL` (backed by sequences) never reuses values but can leave gaps. SQL Server's `IDENTITY` can also have gaps. Applications that assume gapless IDs will break.
- Stored procedure languages are incompatible. T-SQL, PL/pgSQL, and MySQL's procedural SQL have different syntax, error handling, and capabilities. Every stored procedure needs a manual rewrite. For large codebases with hundreds of procedures, this is the dominant migration cost.
- Implicit type coercion rules differ. With strict mode disabled (it has been on by default since MySQL 5.7), MySQL silently truncates strings that exceed the column length and converts invalid dates to `0000-00-00`; PostgreSQL raises an error in both cases. These silent data corruption patterns in MySQL become errors in PostgreSQL, which is the correct behavior but breaks applications that depend on the coercion.
- Index behavior and collation. PostgreSQL supports partial indexes (`CREATE INDEX ... WHERE active = true`), expression indexes (`CREATE INDEX ... ON lower(email)`), and covering indexes (the `INCLUDE` clause). MySQL's index capabilities are more limited. Collation defaults also differ: MySQL uses a case-insensitive collation by default (`utf8mb4_0900_ai_ci`), while PostgreSQL's comparisons are case-sensitive under typical locales (`en_US.UTF-8`). Applications migrating from MySQL to PostgreSQL may find that `WHERE name = 'john'` no longer matches `'John'`.
- Transaction behavior edge cases. MySQL's REPEATABLE READ can still expose phantom rows (for example, through locking reads or updates that see rows inserted by concurrent transactions). PostgreSQL's REPEATABLE READ prevents phantoms. This matters for count-then-insert patterns and range queries: code that was "correct" under MySQL's REPEATABLE READ may behave subtly differently under PostgreSQL's stricter implementation.
For greenfield projects, the decision framework is straightforward. Do you need PostgreSQL's extensions or complex SQL? Choose PostgreSQL. Do you need Vitess-level sharding or Aurora MySQL? Choose MySQL. Are you in a Microsoft shop where Azure SQL's integration saves development weeks? Choose SQL Server. If none of these clearly applies, PostgreSQL is the safe bet because it has the fewest limitations you'll hit later.
ORM and Driver Ecosystem
All three databases have mature ORM support, but the depth varies:
- Python: SQLAlchemy and Django ORM support all three. PostgreSQL gets the best support for advanced features (JSONB operators, array fields, full-text search). Django's
django.contrib.postgresmodule provides PostgreSQL-specific fields and lookups. - Java/Kotlin: Hibernate and jOOQ support all three. SQL Server via Microsoft's JDBC driver, MySQL via Connector/J, PostgreSQL via the JDBC driver. jOOQ generates type-safe SQL and handles dialect differences.
- Node.js/TypeScript: Prisma, Drizzle, and TypeORM support all three. Prisma's migration system works across databases but PostgreSQL gets the most complete type mapping.
- .NET: Entity Framework Core has first-class SQL Server support (it's the reference implementation). Npgsql provides excellent PostgreSQL support. Pomelo.EntityFrameworkCore.MySql handles MySQL. For .NET developers, SQL Server and PostgreSQL are equally well-supported.
The practical impact: if you're using an ORM for 90% of queries and raw SQL for 10%, the database choice matters less for the ORM queries and more for the raw SQL edge cases where PostgreSQL's SQL depth shines.
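One example of such a raw-SQL edge case, sketched against a hypothetical `orders` table: "latest order per customer" has a PostgreSQL-only shorthand and a portable formulation, and the ORM will typically generate neither.

```sql
-- PostgreSQL-specific: DISTINCT ON keeps the first row per customer_id
-- according to the ORDER BY, a shorthand MySQL and SQL Server lack.
SELECT DISTINCT ON (customer_id) customer_id, id, created_at
FROM orders
ORDER BY customer_id, created_at DESC;

-- Portable alternative using a standard window function
-- (works on PostgreSQL, MySQL 8.0+, and SQL Server).
SELECT customer_id, id, created_at
FROM (
    SELECT customer_id, id, created_at,
           row_number() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
) ranked
WHERE rn = 1;
```

If those 10% of queries lean on dialect-specific shorthands, the migration cost of the database choice is concentrated there.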
One last consideration: local development. PostgreSQL runs identically on macOS, Linux, and Windows (or Docker). MySQL is similarly cross-platform. SQL Server requires Docker (Linux container) or Windows for local development; the macOS developer experience is weaker. For teams with diverse development environments, PostgreSQL and MySQL have a practical edge in local development ergonomics.
The bottom line: you can build a successful product on any of these three databases. The differences matter at scale, at the margins, and when you hit a feature gap. Choose deliberately based on your workload, team, and cloud platform. Then stop second-guessing and start building.
Interview tip: lead with the default, then qualify
Say: "I'd default to PostgreSQL for most relational workloads because of its feature depth and extensibility. But if the team has deep MySQL expertise and the workload is simple OLTP, MySQL is a fine choice. If we're in a Microsoft shop with Azure infrastructure, SQL Server's integration advantages are worth the licensing cost." This shows you know the landscape, not just one tool.
Real-World Examples
Instagram: Runs one of the largest PostgreSQL deployments in the world. Their main database cluster handles user profiles, follows, likes, and media metadata. They rely heavily on PostgreSQL's JSONB for flexible metadata storage and logical replication for feeding changes to downstream systems. At their scale (2B+ monthly active users), they shard PostgreSQL manually at the application level using a custom sharding library rather than using Citus or switching to MySQL + Vitess. Their engineering blog reports managing over 10,000 PostgreSQL instances. The key architectural decision: PostgreSQL's JSONB flexibility allowed them to evolve their schema rapidly without costly migrations.
Shopify: Built their entire platform on MySQL and scaled it using Vitess. Their deployment spans thousands of MySQL instances sharded by merchant ID. ProxySQL handles read/write splitting. When they needed features MySQL couldn't provide (full-text search, analytics), they added Elasticsearch and a data warehouse rather than migrating to PostgreSQL. Scale what works, add specialized tools for gaps. At their scale (hundreds of millions of transactions per day), Vitess's automated sharding, resharding, and schema migration capabilities are critical operational tools that justified staying on MySQL.
Stack Overflow: Runs on SQL Server with just two primary database servers handling 1.3 billion pageviews per month. Their entire stack is .NET, and SQL Server's columnstore indexes allow analytical queries (trending questions, tag analytics, user reputation calculations) directly on the production database without a separate warehouse. Their architecture proves two things: (1) SQL Server can be remarkably efficient when the workload matches, and (2) vertical scaling with the right database is often simpler than horizontal scaling with the wrong one. They estimate that their SQL Server licensing costs are a fraction of what a microservices + PostgreSQL + Elasticsearch + analytics warehouse setup would cost in engineering hours.
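The columnstore approach Stack Overflow relies on can be sketched in T-SQL. The `Posts` table and columns here are illustrative, not their actual schema:

```sql
-- Add a nonclustered columnstore index (SQL Server 2016+ keeps it
-- updatable) so analytical scans run against the OLTP table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Posts
    ON dbo.Posts (CreationDate, Score, Tags);

-- Aggregations like this read the compressed column segments in
-- batch mode instead of scanning the rowstore.
SELECT Tags, COUNT(*) AS Questions, AVG(Score) AS AvgScore
FROM dbo.Posts
GROUP BY Tags;
```

This is the pattern that lets a single server serve both transactional writes and analytics without an ETL pipeline to a warehouse.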
Notion: Uses PostgreSQL as their primary database. As their product grew to millions of pages and blocks, they hit PostgreSQL's VACUUM performance limits with multi-billion-row tables. Their engineering team invested heavily in partitioning, custom VACUUM tuning, and eventually sharding at the application layer. The lesson: PostgreSQL scales further than most teams expect, but multi-billion-row tables require deliberate operational investment. Their choice to stay on PostgreSQL (rather than migrate to a distributed SQL database) was driven by PostgreSQL's rich feature set for their block-based content model.
How This Shows Up in Interviews
Database selection comes up early in most system design interviews. The interviewer mentions "we need a database" and watches what you say.
What they're testing: Do you have a default with reasoning, or do you say "it depends" without committing? Can you explain when your default doesn't apply?
Depth expected at senior level:
- Know PostgreSQL's extension model and name 2-3 extensions (PostGIS, pgvector, TimescaleDB)
- Understand MVCC and what VACUUM does (dead tuple reclamation, bloat risk)
- Know MySQL's clustered index advantage for primary key lookups
- Explain why Aurora MySQL/PostgreSQL differs from RDS (storage-compute separation, 6-way replication)
- Articulate when SQL Server's columnstore indexes eliminate the need for a data warehouse
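As a concrete illustration of the extension point above, a minimal pgvector session might look like this. The `docs` table is hypothetical, and the sketch assumes the pgvector binaries are installed on the server (HNSW indexes require pgvector 0.5+):

```sql
-- One command turns PostgreSQL into a vector database.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO docs (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Approximate nearest-neighbor index; <-> is L2 distance.
CREATE INDEX ON docs USING hnsw (embedding vector_l2_ops);
SELECT id FROM docs ORDER BY embedding <-> '[2,3,4]' LIMIT 1;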
| Interviewer asks | Strong answer |
|---|---|
| "Which database would you use here?" | "PostgreSQL as the default. We need JSONB for the flexible metadata schema, and PostGIS for the location-based queries. If the team had deep MySQL expertise, I'd consider Aurora MySQL, but we'd lose native JSONB indexing." |
| "Why not MySQL?" | "MySQL's JSON indexing requires generated columns, which is more complex. The query patterns here lean on partial indexes and JSONB containment operators that MySQL lacks. MySQL's advantage is simpler read-replica scaling, but we're not at that scale yet." |
| "The company uses .NET and Azure. Still PostgreSQL?" | "In that case, Azure SQL is worth serious consideration. The integration with Entity Framework, SSMS tooling, and columnstore indexes for the analytics requirements would reduce development friction. The trade-off is licensing cost and vendor lock-in." |
| "How would you handle scaling reads?" | "Start with Aurora PostgreSQL read replicas (up to 15, sub-second lag). If we outgrow that, add a Redis cache layer for hot data. For MySQL, the answer is similar but Vitess gives an additional horizontal sharding option that PostgreSQL lacks." |
| "What about CockroachDB or other NewSQL?" | "CockroachDB gives distributed SQL with strong consistency. I'd consider it for multi-region writes with serializable isolation. For single-region, PostgreSQL is simpler and has a larger ecosystem. CockroachDB uses PG's wire protocol, so migration is feasible later." |
| "How do you handle connection pooling?" | "PostgreSQL uses process-per-connection, so PgBouncer or Supabase's built-in pooler is essential above 100 connections. MySQL uses threads and handles thousands of connections natively. SQL Server also uses threads. This is a real operational difference for high-concurrency workloads." |
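The generated-column workaround mentioned in the "Why not MySQL?" answer looks like this in practice. Both snippets assume a hypothetical `orders` table with a JSON `payload` column:

```sql
-- MySQL: extract one JSON field into an indexed generated column.
-- Each field you want to filter on needs its own column + index.
ALTER TABLE orders
    ADD COLUMN customer_id INT UNSIGNED
        GENERATED ALWAYS AS (CAST(payload->>'$.customer_id' AS UNSIGNED)) VIRTUAL,
    ADD INDEX idx_orders_customer (customer_id);
SELECT * FROM orders WHERE customer_id = 42;

-- PostgreSQL: one GIN index serves arbitrary containment queries.
CREATE INDEX idx_orders_payload ON orders USING gin (payload);
SELECT * FROM orders WHERE payload @> '{"customer_id": 42}';
```

The difference compounds as the number of queried JSON fields grows: MySQL needs a column and index per field, PostgreSQL needs one index total.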
Gotcha: don't say 'Postgres is always the answer'
If the system design involves a Microsoft stack or the interviewer mentions Azure/.NET, recommending PostgreSQL without acknowledging SQL Server is a miss. Similarly, dismissing MySQL when the scenario involves Vitess-level sharding shows a blind spot. Show you know the full landscape.
Quick Recap
- PostgreSQL is the strongest default for new projects: MVCC with serializable isolation, JSONB with GIN indexes, extensions (PostGIS, pgvector, TimescaleDB), and the deepest SQL feature set. The trade-off is VACUUM tuning and operational complexity.
- MySQL wins for simple OLTP workloads with massive read-replica scaling (Aurora, Vitess). InnoDB's clustered index makes primary key lookups measurably faster. Choose it when your team has deep MySQL expertise or needs Vitess-level horizontal sharding.
- SQL Server wins in Microsoft/.NET ecosystems and when you need columnstore indexes for hybrid OLTP+analytics without a separate warehouse. The trade-off is licensing cost (list price around $3,586 per 2-core pack for Standard, far more for Enterprise) and vendor lock-in.
- All three have mature cloud-managed options. Aurora (PG and MySQL), Azure SQL (SQL Server), and Cloud SQL (all three) handle most operational burden. Don't choose a database based on operations you'll never do manually.
- PostgreSQL's extension model is the key differentiator. It turns one database into many (geospatial, vector search, time-series, distributed) without adding infrastructure components. MySQL and SQL Server can't match this flexibility.
- In interviews, lead with PostgreSQL, then name the specific scenario where MySQL or SQL Server is better. "Default to PostgreSQL, but for this workload..." shows judgment, not bias.
Backup and Recovery Comparison
| Capability | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Point-in-time recovery | Continuous WAL archiving + base backup | Binlog + full backup | Full + differential + log backups |
| Online backup | pg_basebackup (no downtime) | mysqldump (locking, unless `--single-transaction` on InnoDB) or Percona XtraBackup | BACKUP DATABASE (no downtime) |
| Incremental | WAL segments (continuous) | Binary log (continuous) | Differential + log backups |
| Managed (AWS) | Automated snapshots, PITR to any second in retention | Automated snapshots, PITR to any second | Automated snapshots, PITR |
| Recovery speed | Fast (replay WAL to target time) | Moderate (replay binlog) | Fast (restore differential + log chain) |
All three support point-in-time recovery in managed environments. The practical difference is in self-hosted scenarios: PostgreSQL's continuous WAL archiving is the most straightforward, MySQL requires Percona XtraBackup for non-locking backups, and SQL Server's multi-tier backup strategy (full + differential + log) provides the most granular restore options.
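SQL Server's full + differential + log chain can be sketched in T-SQL. Paths, database name, and the target time are illustrative:

```sql
-- Nightly full, periodic differential, frequent log backups.
BACKUP DATABASE Shop TO DISK = N'D:\bak\shop_full.bak';
BACKUP DATABASE Shop TO DISK = N'D:\bak\shop_diff.bak' WITH DIFFERENTIAL;
BACKUP LOG Shop TO DISK = N'D:\bak\shop_log1.trn';

-- Point-in-time restore: full, newest differential, then replay the
-- log chain, stopping at the target time.
RESTORE DATABASE Shop FROM DISK = N'D:\bak\shop_full.bak' WITH NORECOVERY;
RESTORE DATABASE Shop FROM DISK = N'D:\bak\shop_diff.bak' WITH NORECOVERY;
RESTORE LOG Shop FROM DISK = N'D:\bak\shop_log1.trn'
    WITH STOPAT = '2024-01-15 03:00:00', RECOVERY;
```

The granularity comes at the cost of managing the chain: a broken or missing log backup limits how far forward you can roll.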
For managed services, AWS RDS and Aurora handle backups automatically. Automated daily snapshots with configurable retention (up to 35 days) and continuous transaction log backups enable point-in-time restore to any second within the retention window. This is a strong reason to use managed services for databases: the backup and recovery story is dramatically simpler than self-hosted.
The disaster recovery strategy differs by database too. PostgreSQL uses streaming replication to a standby in another region (or Aurora Global Database with <1s replication lag). MySQL uses binlog-based cross-region replication (or Aurora Global Database). SQL Server uses Always On Availability Groups with asynchronous replicas across regions. In all cases, cross-region DR adds latency to writes if synchronous, or data loss risk if asynchronous. The trade-off is the same regardless of database: RPO (Recovery Point Objective) vs write latency.
Related Trade-offs
- Databases for relational database fundamentals and design principles
- SQL vs. NoSQL for when to leave relational databases entirely
- Normalization vs. denormalization for schema design trade-offs within any SQL database
- Replication for leader-follower and multi-leader replication mechanics
- Sharding for horizontal partitioning strategies across database instances