Horizontal vs. vertical scaling
The difference between scaling up (bigger machines) and scaling out (more machines): when each strategy applies, the limits of vertical scaling, and why horizontal scaling requires stateless architecture.
TL;DR
| Dimension | Choose Vertical Scaling | Choose Horizontal Scaling |
|---|---|---|
| Architecture | No code changes needed, single-node simplicity | Requires stateless design, load balancer, distributed state |
| Ceiling | Hardware limit (AWS: 448 vCPUs, 24 TB RAM on u-24tb1.metal) | Theoretically unlimited, add machines indefinitely |
| Fault tolerance | Single point of failure, the big machine goes down and everything stops | Built-in redundancy, one node fails and others continue |
| Cost curve | Non-linear: 2x CPU costs 2.5-3x the price | Near-linear: 2x capacity costs ~2x (commodity hardware) |
| Best for | Databases, GPU workloads, coordination nodes, early-stage startups | Stateless web/API servers, read-heavy workloads, traffic spikes |
Default answer: scale vertically first (it's simpler), then scale horizontally when you hit the ceiling or need fault tolerance. For stateless application servers, go horizontal from day one. For databases, go vertical until it hurts, then add read replicas, then shard as a last resort.
The Framing
A startup I worked with had a PostgreSQL database on a db.r6g.xlarge (4 vCPUs, 32 GB RAM). At 2,000 queries/sec it maxed out CPU. The team debated for two weeks: shard the database or upgrade the instance?
They upgraded to db.r6g.4xlarge (16 vCPUs, 128 GB RAM). Took 20 minutes of downtime. Cost went from $0.65/hr to $2.60/hr. Problem solved for the next 18 months. No code changes, no schema redesign, no distributed transaction headaches.
The team that sharded their database at 2,000 QPS? They spent three months building a sharding layer, introduced cross-shard query bugs that took weeks to diagnose, and ended up with two shards doing 1,000 QPS each. They could have clicked "modify instance" and gone to lunch.
This is the core lesson. Vertical scaling is boring. Horizontal scaling is interesting. Engineers systematically over-invest in horizontal scaling because it's more technically challenging. But "boring and simple" is usually the right first move. Scale vertically until you can't, then scale horizontally where you must.
How Each Works
Vertical Scaling: Bigger Machine
Vertical scaling means replacing your current server with a more powerful one. More CPU cores, more RAM, faster NVMe storage, better network bandwidth. The application code stays exactly the same.
AWS EC2 vertical scaling path:
```
t3.medium     →   2 vCPU,   4 GB RAM → $0.042/hr
m6i.xlarge    →   4 vCPU,  16 GB RAM → $0.192/hr
m6i.4xlarge   →  16 vCPU,  64 GB RAM → $0.768/hr
m6i.16xlarge  →  64 vCPU, 256 GB RAM → $3.072/hr
u-24tb1.metal → 448 vCPU,  24 TB RAM → ~$218/hr (the ceiling)
```
RDS PostgreSQL vertical scaling:
```
db.r6g.large    →  2 vCPU,  16 GB → ~3,000 QPS simple queries
db.r6g.4xlarge  → 16 vCPU, 128 GB → ~20,000 QPS simple queries
db.r6g.16xlarge → 64 vCPU, 512 GB → ~60,000 QPS simple queries
```
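Those throughput estimates make right-sizing a lookup exercise. A hypothetical sketch (the QPS figures are the rough numbers above, not benchmarks, and the 70% utilization cap is an assumption, not an AWS guideline):

```python
# Rough per-instance throughput from the table above (simple queries).
RDS_CAPACITY_QPS = {
    "db.r6g.large": 3_000,
    "db.r6g.4xlarge": 20_000,
    "db.r6g.16xlarge": 60_000,
}

def smallest_instance_for(qps: float, max_utilization: float = 0.7):
    """Pick the smallest instance that serves `qps` below the utilization cap."""
    for name, capacity in RDS_CAPACITY_QPS.items():
        if qps <= capacity * max_utilization:
            return name
    return None  # past the vertical ceiling: add read replicas or shard

print(smallest_instance_for(12_000))  # db.r6g.4xlarge
print(smallest_instance_for(50_000))  # None: no single instance fits at <70%
```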
The advantages are obvious. No architectural changes. No load balancer. No distributed state management. ACID transactions work on a single machine without coordination overhead. Upgrades take minutes (managed databases handle it with a brief failover).
The limits are equally obvious. There's a ceiling: the biggest machine AWS sells has 448 vCPUs and 24 TB RAM. You can't go bigger. The cost curve is non-linear: doubling CPU roughly triples the price at the high end. And you have one machine. If it fails, everything fails.
My rule of thumb: if a vertical upgrade buys you 12+ months of headroom and costs less than $5,000/month, do it. It's almost always cheaper than the engineering time to build horizontal scaling infrastructure.
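That rule of thumb is simple enough to write down. A minimal sketch (the 12-month and $5,000 thresholds are the ones from this section, not universal constants):

```python
HOURS_PER_MONTH = 730  # common cloud-billing approximation

def should_scale_vertically(headroom_months: float, hourly_cost_usd: float) -> bool:
    """Rule of thumb: take the vertical upgrade if it buys 12+ months
    of headroom for under $5,000/month."""
    return headroom_months >= 12 and hourly_cost_usd * HOURS_PER_MONTH < 5_000

# The db.r6g.4xlarge upgrade from the Framing story: $2.60/hr, 18 months of headroom.
print(should_scale_vertically(18, 2.60))  # True
```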
Horizontal Scaling: More Machines
Horizontal scaling means running multiple instances of your service behind a load balancer. Each instance handles a portion of the traffic. Add more instances to handle more load.
```python
# The fundamental requirement: stateless design.
# Each request can be handled by ANY instance.

# BAD: state stored in process memory
class BadServer:
    def __init__(self):
        self.sessions = {}  # <-- lost if this instance dies

    def handle(self, request):
        user = self.sessions[request.session_id]  # breaks on a different instance
        return process(request, user)

# GOOD: state stored externally
class GoodServer:
    def __init__(self, redis_client, db_client):
        self.redis = redis_client
        self.db = db_client

    def handle(self, request):
        user = self.redis.get(f"session:{request.session_id}")  # any instance works
        return process(request, user)
```
The prerequisite is stateless architecture. If an instance stores user sessions in process memory, requests from the same user must always hit the same instance (sticky sessions). That defeats the purpose. Move state to Redis, a database, or a distributed cache, and every instance is interchangeable.
Horizontal scaling is theoretically unlimited: need 10x capacity? Add 10x instances. Need 100x for Black Friday? Autoscaling handles it. It also provides fault tolerance: one instance crashes and the load balancer routes around it.
The costs are architectural complexity (load balancer, health checks, service registration, distributed state), operational complexity (deploying to N machines, monitoring N machines, debugging issues that only happen on 1 of N machines), and the requirement to move all state out of the process.
Head-to-Head Comparison
| Dimension | Vertical | Horizontal | Verdict |
|---|---|---|---|
| Implementation effort | Click "resize," wait 5 minutes | Redesign for statelessness, add LB, externalize state | Vertical, much simpler |
| Max capacity | Hardware ceiling (448 vCPU, 24 TB RAM) | Theoretically unlimited | Horizontal |
| Fault tolerance | None. One machine = one failure domain | Built-in. N-1 instances survive one failure | Horizontal |
| Cost efficiency at scale | Non-linear: 2x resources costs 2.5-3x | Near-linear: 2x resources costs ~2x | Horizontal at scale |
| Cost efficiency at small scale | One machine, no LB overhead, no coordination | LB + multiple instances + external state | Vertical at small scale |
| Latency | Everything on one box: no network hops | Cross-instance communication adds latency | Vertical for single-request |
| ACID transactions | Single-node transactions, no coordination | Distributed transactions or eventual consistency | Vertical for transactional workloads |
| Scaling granularity | Large jumps (4x CPU minimum upgrade steps) | Incremental (add one instance at a time) | Horizontal |
| Downtime during scaling | Brief (managed DB failover: ~30s) | Zero (add instances behind LB) | Horizontal |
| Operational complexity | One machine to monitor and debug | N machines, distributed logs, network partitions | Vertical |
The pattern I've seen repeatedly: teams adopt horizontal scaling for the app tier (correct, since web servers are naturally stateless) but keep the database vertical far longer than people expect. Shopify ran their core commerce database on a single very large MySQL instance well past $1B GMV. Scaling the database horizontally (sharding) has enormous complexity costs that vertical scaling avoids.
When Vertical Scaling Wins
Vertical scaling is right when you want simplicity and haven't hit the ceiling.
Databases, almost always first. PostgreSQL, MySQL, MongoDB. Vertical first because distributed database complexity is enormous. Sharding requires choosing a partition key, handling cross-shard queries, managing data migration, and dealing with hotspots. A db.r6g.16xlarge (64 vCPU, 512 GB RAM) handles far more than most startups need. I've seen databases serving 50,000+ simple queries per second on a single vertical instance.
Specialized compute. GPU workloads (ML inference, video encoding) scale better with a bigger GPU than with multiple smaller GPUs. An A100 80 GB outperforms two A10G 24 GBs for most inference workloads because the model fits in one GPU's memory without cross-GPU communication.
Coordination-sensitive services. Leader election quorum nodes, ZooKeeper clusters, etcd clusters. These run on 3-5 nodes by design. Adding more nodes increases coordination overhead. Scale each node up, not the cluster out.
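The quorum arithmetic behind that design choice: an n-node cluster needs a majority of votes to make progress, so it tolerates only ⌊(n-1)/2⌋ failures, and an even-sized cluster pays extra coordination cost without tolerating any extra failures. A quick sketch:

```python
def quorum(n: int) -> int:
    """Votes needed for a majority in an n-node cluster."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """Nodes that can fail while the rest still form a quorum."""
    return (n - 1) // 2

for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
# A 4-node cluster tolerates no more failures than a 3-node one,
# which is why these clusters run with an odd node count.
```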
Early-stage startups. Your first 100K users don't need horizontal scaling. A single $200/month instance handles more traffic than most startups will see in year one. Spend engineering time on the product, not the infrastructure.
When Horizontal Scaling Wins
Horizontal scaling is right when you need fault tolerance, handle variable traffic, or have hit vertical limits.
Stateless application servers. Web servers, API servers, microservices. These are designed to be stateless and interchangeable. Horizontal scaling is the natural model. There's no reason to run one giant API server when four medium ones give you the same capacity plus fault tolerance.
Traffic with spikes. E-commerce during flash sales, streaming during live events, news sites during breaking stories. Vertical scaling can't react: you can't resize an instance in 30 seconds. Auto-scaling adds instances in under a minute. My recommendation: if your peak traffic is 5x+ your baseline, horizontal with auto-scaling is the only practical option.
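Target-tracking autoscaling is proportional control: choose a desired count so the per-instance metric returns to the target. A simplified sketch of the math (real AWS policies and Kubernetes HPA add cooldowns, instance warm-up, and tolerance bands):

```python
import math

def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 3, max_size: int = 50) -> int:
    """Scale the group so per-instance load returns to the target,
    clamped to the group's configured bounds."""
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))

print(desired_capacity(10, 90.0, 60.0))  # 15: 10 instances at 90% CPU, 60% target
print(desired_capacity(6, 20.0, 60.0))   # 3: scale-in stops at the minimum
```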
Read-heavy workloads. Database read replicas are a form of horizontal read scaling. Adding 3 read replicas gives you 4x the read capacity without touching the primary. Works beautifully for workloads that are 90%+ reads.
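Read replicas only pay off if the application actually routes reads to them. A minimal routing sketch (the connection names are placeholders; a production router would also handle replica lag and read-your-writes consistency):

```python
import random

class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def connection_for(self, sql: str):
        # Naive classification: anything that isn't a SELECT mutates state.
        if sql.lstrip().lower().startswith("select"):
            return random.choice(self.replicas) if self.replicas else self.primary
        return self.primary

router = ReadWriteRouter("pg-primary", ["replica-1", "replica-2", "replica-3"])
print(router.connection_for("SELECT * FROM users"))       # one of the replicas
print(router.connection_for("UPDATE users SET plan='x'")) # pg-primary
```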
Fault tolerance requirements. SLA demands 99.99% availability? You need redundancy across instances and availability zones. No single machine, no matter how powerful, achieves four nines alone. You need at least 2-3 instances with health checks and automatic failover.
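The availability claim is straightforward probability, under the optimistic assumption that instance failures are independent — which is exactly why you also spread replicas across availability zones:

```python
def availability(single_instance: float, n: int) -> float:
    """Probability that at least one of n independent instances is up."""
    return 1 - (1 - single_instance) ** n

print(f"1 instance at 99.9%:  {availability(0.999, 1):.4%}")
print(f"3 instances at 99.9%: {availability(0.999, 3):.7%}")
# Three nines per instance becomes far better than four nines in aggregate.
```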
Cost optimization at scale. At large scale, horizontal scaling on commodity hardware is cheaper per unit of compute than vertical scaling on premium instances. Four db.r6g.4xlarge instances ($2.60/hr each = $10.40/hr total) give you 64 vCPUs and 512 GB RAM. One db.r6g.16xlarge with the same specs costs $12.80/hr. That's a 23% premium for the convenience of a single machine.
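The 23% figure above and the ~19% figure quoted elsewhere in this document are the same $2.40/hr gap measured against different baselines, which is worth being able to reproduce on the spot:

```python
four_medium = 4 * 2.60  # four db.r6g.4xlarge at $2.60/hr = $10.40/hr
one_large = 12.80       # one db.r6g.16xlarge with the same total vCPU/RAM

gap = one_large - four_medium  # $2.40/hr either way
premium = gap / four_medium    # vs. the horizontal option: ~23%
savings = gap / one_large      # vs. the vertical option: ~19%

print(f"premium for one big machine:   {premium:.1%}")
print(f"savings from going horizontal: {savings:.1%}")
```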
The Nuance
The Hybrid Reality
In practice, every production system uses both strategies at different layers. Here's the standard pattern:
The app tier scales horizontally from day one. The cache starts vertical (single Redis node) and goes horizontal (Redis Cluster) when data exceeds one node's memory. The database stays vertical for as long as possible, then adds read replicas, then shards.
The Cost Crossover
There's a specific point where horizontal becomes cheaper than vertical. Below that point, the operational overhead of horizontal scaling is a waste.
For small teams (under 10 engineers), the operational overhead of horizontal infrastructure often costs more in engineering time than the savings on compute. For large teams with platform engineering capabilities, horizontal scaling is almost always cheaper at scale.
When Vertical Hits the Wall
The moment vertical scaling fails is abrupt. You're on the largest available instance and CPU is at 90%. There's no bigger machine. Now you must go horizontal, and you must do it under pressure, which is the worst time to make architectural decisions.
My advice: plan for horizontal scaling in your architecture (stateless services, external state) even if you deploy vertically today. The migration from "one big instance" to "four small instances" should be a configuration change, not a rewrite. If your application requires sticky sessions or stores state in local files, fix that now while you have time.
Real-World Examples
Shopify: Runs one of the largest Ruby on Rails monoliths on a vertically scaled database. Their core commerce database was a single large MySQL instance for years, handling millions of merchants. They stayed vertical with aggressive query optimization and caching, only sharding when they absolutely had to (and it took years of engineering effort). The lesson: vertical scaling's simplicity is worth protecting as long as possible.
Netflix: The poster child for horizontal scaling. Their microservices run on thousands of EC2 instances with auto-scaling. During peak hours (8 PM on a Sunday), they scale up to handle 200M+ active users. Their stateless architecture means instances are disposable: any instance can handle any request for any user. They pioneered Chaos Monkey specifically because horizontal architectures need automated failure testing.
Instagram: Scaled their PostgreSQL backend by combining aggressive vertical scaling (very large instances), read replicas, and application-level caching, famously serving tens of millions of users with a tiny engineering team. When they finally sharded, they took months to plan the partition strategy and migration. The delay was worth it because every month of vertical scaling was a month of simpler operations.
How This Shows Up in Interviews
This tradeoff comes up early in almost every system design interview, usually when you draw your first architecture box. The interviewer wants to see that you know when each strategy applies.
What they're testing: Do you default to horizontal because it sounds more "scalable," or do you show judgment about when vertical is the right call? Senior candidates know that horizontal scaling has real costs.
Depth expected at senior level:
- Know specific instance sizes and their limits (r6g.16xlarge: 64 vCPU, 512 GB)
- Explain the database scaling ladder: vertical, connection pooling, read replicas, caching, sharding
- Identify which tiers scale which way and why
- Discuss auto-scaling mechanics: metrics, cool-down periods, predictive scaling
- Name the stateless-architecture prerequisite for horizontal scaling
| Interviewer asks | Strong answer |
|---|---|
| "How would you scale this?" | "The app tier scales horizontally behind an ALB, starting at 3 instances with CPU-based autoscaling. The database scales vertically first. At this traffic level, a db.r6g.4xlarge handles it easily. I'd add read replicas when reads hit 80% of capacity." |
| "Why not shard the database from the start?" | "Sharding adds cross-shard query complexity, application-level routing, and months of migration work. Vertical scaling on a single instance handles 50K+ simple QPS. I'd exhaust vertical scaling, read replicas, and caching before introducing sharding." |
| "What happens when you hit the vertical ceiling?" | "The ceiling is real: 448 vCPU, 24 TB RAM for the largest EC2 instance. For databases, the path is: read replicas for read scaling, then sharding for write scaling. For compute, horizontal scaling with a load balancer and stateless design." |
| "How does auto-scaling work?" | "Kubernetes HPA scales pods based on CPU/memory metrics. AWS Auto Scaling uses target tracking policies. Key: scale up fast (30s), scale down slow (5 min) to avoid thrashing. Set minimum replicas to 3 for availability." |
| "What's the cost tradeoff?" | "Vertical is cheaper in ops overhead for small teams. Horizontal is cheaper in raw compute at scale. The crossover depends on team size and traffic patterns. Four r6g.4xlarge instances cost 19% less than one r6g.16xlarge with the same total specs, but you need a load balancer, distributed monitoring, and ops tooling." |
Interview tip: mention the database scaling ladder
When the interviewer asks about database scaling, don't jump to sharding. Walk the ladder: "I'd start with a vertically scaled primary, add PgBouncer for connection pooling, then read replicas, then Redis caching. Sharding is a last resort." This shows you understand the complexity cost of each step.
Quick Recap
- Vertical scaling (bigger machine) requires no code changes and is the simplest way to increase capacity. It's the correct first step for databases and most services that haven't hit their ceiling.
- Horizontal scaling (more machines) requires stateless architecture, a load balancer, and externalized state. It provides fault tolerance, auto-scaling, and theoretically unlimited capacity.
- The database scaling ladder runs: vertical upgrade, connection pooling, read replicas, caching, sharding. Most applications never need sharding. Exhaust every earlier rung first.
- Vertical scaling has a hard ceiling (448 vCPU, 24 TB RAM on AWS's largest instance) and no fault tolerance. Plan your architecture for horizontal scaling even if you deploy vertically today.
- The cost crossover favors horizontal at scale (4 medium instances are ~19% cheaper than 1 equivalent large instance) but vertical at small scale (simpler ops, no LB overhead).
- In interviews, show you know which tier scales which way: app tier horizontal from day one, database vertical first, cache vertical then cluster, and use auto-scaling with fast scale-up and slow scale-down.
Related Trade-offs
- Scalability for the broader principles of designing systems that handle growing load
- Load balancing for the routing algorithms and health checks that make horizontal scaling work
- Stateful vs. stateless for why stateless design is the prerequisite for horizontal scaling
- SQL vs. NoSQL for how database choice affects your scaling options
- Read replicas vs. caching for the two read-scaling strategies that come before sharding
- Consistent hashing for how keys are distributed across nodes
- Message brokers (e.g., Kafka) for partition-based horizontal scaling (add brokers, add partitions)
- CDN / edge nodes, which are horizontal by definition (nodes near users around the world)
The Database Sharding Decision
Databases require special consideration because they're inherently stateful:
Scaling path:
1. Vertical scaling: increase primary hardware → cheapest, simplest
2. Connection pooling: PgBouncer, RDS Proxy → more connections, not more write capacity
3. Read replicas: add horizontal read capacity → adds replica lag
4. Caching: remove read load from the DB entirely → most effective read scaling
5. Sharding: horizontal write scaling → major application changes required
Each step is an order of magnitude more complex than the previous. Don't jump to sharding until you've exhausted steps 1-4.
Sharding is the last resort, not the first answer, for databases.