PostgreSQL vs. MySQL vs. SQL Server
How PostgreSQL, MySQL, and SQL Server compare for system design, covering feature depth, replication models, JSON support, full-text search, cloud-managed options, and when each is the right default.
TL;DR
| Dimension | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| Default for | New projects, complex queries, JSON + relational hybrid | Read-heavy web apps, PHP/Java stacks, Vitess sharding | .NET ecosystem, Microsoft shops, BI + OLTP hybrid |
| Killer feature | Extensions (PostGIS, pgvector, TimescaleDB, Citus) | Battle-tested replication at massive scale (Aurora, Vitess) | Columnstore indexes, Always On AG, SSMS tooling |
| Watch out for | VACUUM tuning at scale, connection pooling needed | Weak partial indexes, limited SQL features | Per-core licensing costs, Windows dependency |
| Replication | Streaming + logical replication | Binary log + Group Replication | Always On Availability Groups |
| JSON support | JSONB with GIN indexes (first-class) | JSON type with generated columns for indexing | JSON functions (adequate, no native type) |
| Cloud managed | Aurora PG, Cloud SQL, Azure DB for PG, Supabase, Neon | Aurora MySQL, PlanetScale, Cloud SQL | Azure SQL Database, RDS for SQL Server |
Default answer: PostgreSQL for most new projects. MySQL when your team has deep MySQL expertise or you need Vitess/Aurora MySQL specifically. SQL Server when you're in a Microsoft/.NET environment or need columnstore analytics.
All three are ACID-compliant, support standard SQL, have mature ecosystems, and can handle terabytes of data. The differences are in the edges: extensions, replication topology, licensing, and operational ergonomics.
The Framing
You're starting a new project and need a relational database. The team has two camps: "just use Postgres" and "our backend devs all know MySQL." The CTO asks whether SQL Server would integrate better with Azure.
This conversation happens at every startup and inside every enterprise team. The SQL syntax is 90% identical across all three, so the decision feels trivial. It isn't. The differences show up at 3 AM when you're debugging replication lag, when your JSONB query needs an index that MySQL can't create, or when your DBA discovers PostgreSQL's VACUUM is eating 40% of disk I/O because nobody tuned autovacuum.
Each database has a philosophy. PostgreSQL prioritizes correctness and extensibility. MySQL prioritizes simplicity and read throughput. SQL Server prioritizes enterprise integration and tooling. The "right" choice depends on which trade-offs match your constraints.
For your interview: lead with PostgreSQL as the default, then explain when MySQL or SQL Server is better. That shows judgment, not bias.
How Each Works
PostgreSQL: Correctness-First with Extensible Architecture
PostgreSQL uses Multi-Version Concurrency Control (MVCC) where readers never block writers and vice versa. Each row version is stored inline, and VACUUM reclaims dead tuples. This enables serializable snapshot isolation (the strongest isolation level available in production databases) without the performance penalty most engines impose.
The extension system is PostgreSQL's superpower. Unlike MySQL's plugin model or SQL Server's CLR integration, PostgreSQL extensions can define new types, operators, index methods, and even storage engines:
-- PostGIS: geographic queries
SELECT name FROM stores
WHERE ST_DWithin(location, ST_MakePoint(-73.99, 40.73)::geography, 5000);
-- pgvector: ML embedding similarity search
SELECT title FROM articles
ORDER BY embedding <=> '[0.12, 0.45, ...]'::vector
LIMIT 10;
-- JSONB: index and query JSON natively
CREATE INDEX idx_meta ON products USING GIN (metadata jsonb_path_ops);
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
Replication comes in two flavors. Streaming replication sends WAL (Write-Ahead Log) bytes to replicas for physical standby. Logical replication decodes WAL into row-level changes, enabling selective replication (specific tables), cross-version upgrades, and feeding change data to external systems like Kafka via Debezium. Streaming replication is simpler to set up but requires identical PostgreSQL versions on primary and replica. Logical replication allows cross-version replication, making zero-downtime major version upgrades possible.
Window functions, CTEs (Common Table Expressions), recursive queries, lateral joins, and row-level security (RLS) are all mature. I've seen PostgreSQL handle analytics queries that would require a separate data warehouse in MySQL. The query planner is sophisticated enough to flatten subqueries, push down predicates into views, and parallelize hash joins across multiple workers.
Partitioning in PostgreSQL supports range, list, and hash strategies natively since PG 10. A 500M-row orders table partitioned by month lets VACUUM work on individual monthly partitions (fast) instead of the entire table (slow). Partition pruning eliminates irrelevant partitions at query time. Combined with parallel query execution (PG 11+), analytical queries on partitioned tables scale nearly linearly with available CPU cores.
-- Declarative partitioning
CREATE TABLE orders (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
total NUMERIC(10,2),
customer_id BIGINT
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2026_01 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
JIT compilation (Just-In-Time, using LLVM) compiles query expressions into machine code at runtime. For CPU-bound analytical queries touching millions of rows, JIT can reduce execution time by 20-50%. It's enabled by default in PG 12+ but only kicks in for queries whose estimated cost exceeds jit_above_cost (default 100,000). For simple OLTP queries, JIT adds overhead and is skipped automatically.
The pg_stat_statements extension is essential for production PostgreSQL. It tracks execution statistics for all SQL statements: total time, number of calls, mean time, rows returned, and buffer hits. This is how you identify slow queries without application-level instrumentation. Combined with auto_explain (which logs execution plans for slow queries), you get full observability into query performance.
MySQL: Simplicity and Read Throughput at Scale
MySQL's InnoDB engine uses a clustered index (primary key physically orders data on disk), which makes primary key lookups blazing fast. Simple SELECT * WHERE id = ? queries are faster on MySQL than PostgreSQL for this reason: one B-tree traversal hits the data page directly.
Replication is binary log (binlog) based. The primary writes changes to the binlog; replicas read and apply them. This is battle-tested at massive scale. Group Replication adds multi-primary support with Paxos-based consensus for automatic failover.
-- InnoDB clustered index advantage: fast PK lookup
SELECT * FROM users WHERE id = 12345; -- Single B-tree traversal
-- MySQL 8.0+ CTE support (finally)
WITH active_users AS (
SELECT user_id, COUNT(*) as order_count
FROM orders WHERE created_at > NOW() - INTERVAL 30 DAY
GROUP BY user_id
)
SELECT u.name, a.order_count
FROM users u JOIN active_users a ON u.id = a.user_id;
MySQL's ecosystem for scaling reads is unmatched. ProxySQL handles read/write splitting transparently. Vitess (built at YouTube) shards MySQL horizontally across thousands of nodes. Aurora MySQL provides up to 15 read replicas with sub-second replication lag.
The trade-off: MySQL's SQL dialect is less expressive. No LATERAL joins, limited window function support compared to PostgreSQL, no native array or composite types, and JSON indexes require generated columns rather than native GIN indexes.
Global Transaction Identifiers (GTIDs) simplify replication management by assigning a unique ID to every transaction. This makes failover and replica promotion straightforward: the new primary picks up from the last committed GTID rather than tracking binary log file positions. Semi-synchronous replication (rpl_semi_sync_source_enabled=ON) waits for at least one replica to acknowledge the transaction before returning to the client, providing durability guarantees closer to synchronous replication without the full latency penalty.
-- Check replication lag on a MySQL replica
SHOW SLAVE STATUS\G
-- Key fields: Seconds_Behind_Master, Slave_SQL_Running_State
-- GTID-based replication setup
SET GLOBAL gtid_mode = ON;
SET GLOBAL enforce_gtid_consistency = ON;
-- Replicas auto-track by GTID, no need to specify log file positions
SQL Server: Enterprise Integration and Hybrid Analytics
SQL Server's strength is integration with the Microsoft ecosystem: .NET, Azure, Visual Studio, Power BI, and SQL Server Management Studio (SSMS). For .NET shops, the developer experience is seamless from IDE to database.
Columnstore indexes are the standout technical feature. They store data column-wise with batch-mode processing, enabling OLAP-style analytics directly in the OLTP database. A query scanning billions of rows across a few columns runs 10-100x faster with columnstore than traditional row-based indexes.
-- Columnstore index for analytics on OLTP data
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_orders_analytics
ON orders (order_date, total_amount, status, customer_id);
-- This analytical query now runs in seconds, not minutes
SELECT DATEPART(month, order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY DATEPART(month, order_date);
Always On Availability Groups provide synchronous or asynchronous replication with automatic failover. Readable secondary replicas handle read-only workloads, offloading reporting from the primary.
T-SQL extends standard SQL with procedural programming features. While PostgreSQL's PL/pgSQL is comparable, T-SQL's integration with SSMS (execution plans, query store, performance dashboards) provides a developer experience that other databases lacks.
SQL Server's automatic tuning feature analyzes query patterns and automatically creates or drops indexes based on workload data. The Database Engine Tuning Advisor has existed for years, but the automatic tuning mode (enabled in Azure SQL by default) takes action without human intervention. It monitors for plan regressions (queries that suddenly got slower due to a plan change) and automatically forces the previous good plan. This self-tuning capability is unique among the three databases.
Linked Servers allow SQL Server to query external data sources (other SQL Server instances, Oracle, PostgreSQL, MySQL) using standard T-SQL syntax. Combined with distributed transactions, this enables cross-database operations that would require application-level coordination in PostgreSQL or MySQL. For enterprises with data spread across multiple database systems, Linked Servers reduce integration complexity.
-- Query a remote PostgreSQL database from SQL Server
SELECT p.product_name, o.order_date
FROM OPENQUERY(PG_LINKED_SERVER,
'SELECT product_id, product_name FROM products WHERE active = true'
) AS p
JOIN local_orders o ON o.product_id = p.product_id;
Temporal tables (system-versioned) automatically track all row changes with valid-from/valid-to timestamps. This provides built-in audit history without application code. Every UPDATE or DELETE creates a historical record in a paired history table. Querying "what was this row at timestamp X" is a single SQL statement.
-- System-versioned temporal table
CREATE TABLE employees (
id INT PRIMARY KEY,
name NVARCHAR(100),
salary DECIMAL(10,2),
department NVARCHAR(50),
valid_from DATETIME2 GENERATED ALWAYS AS ROW START,
valid_to DATETIME2 GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH (SYSTEM_VERSIONING = ON);
-- Query historical state: what was the salary on Jan 15?
SELECT name, salary
FROM employees FOR SYSTEM_TIME AS OF '2026-01-15'
WHERE id = 42;
In-Memory OLTP (Hekaton) provides lock-free, latch-free memory-optimized tables for extreme OLTP workloads. This is a niche feature but delivers 10-30x throughput improvement for contention-heavy workloads like session stores and high-frequency counters.
Query Store captures execution plan history for every query. When a query suddenly becomes slow (plan regression), Query Store lets you identify the exact point where the plan changed and force the previous good plan. This is a built-in feature that eliminates one of the most frustrating database performance problems: queries that were fast yesterday and slow today with no code changes. PostgreSQL's pg_stat_statements tracks aggregated statistics but not plan history; you'd need third-party tools like pganalyze to get comparable plan regression detection.
Monitoring Essentials by Database
Each database has different critical metrics to monitor. Getting these dashboards right from day one prevents most production incidents.
- PostgreSQL:
pg_stat_user_tables.n_dead_tup(bloat indicator),age(datfrozenxid)(transaction ID wraparound risk, must stay under 2 billion), replication lag viapg_stat_replication, and connection count vsmax_connections. Ifn_dead_tupgrows continuously, autovacuum isn't keeping up. Set up PagerDuty alerts forage(datfrozenxid)> 1 billion. - MySQL:
Seconds_Behind_Master(replication lag), InnoDB buffer pool hit ratio (should be >99%),Threads_runningvsThreads_connected(contention indicator), and binary log position for failover readiness. Theperformance_schemaprovides deep instrumentation for wait events and lock contention. - SQL Server: Wait statistics (
sys.dm_os_wait_statsreveals bottlenecks),sys.dm_exec_query_statsfor expensive queries,sys.dm_db_index_usage_statsfor unused indexes to drop, and Always On dashboard for replica synchronization health. The Dynamic Management Views (DMVs) provide the most comprehensive built-in monitoring of all three databases.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.