AWS core services for system design interviews
A practitioner's guide to the 30+ AWS services that matter for system design: S3, DynamoDB, Lambda, Kinesis, SQS, ECS, CloudFront, Bedrock, and their architectural sweet spots.
Why This Guide Exists
Most system design interviews are cloud-agnostic in theory but AWS-flavored in practice. When your interviewer says "how would you store images?" they expect to hear "S3" not "some object store." When they say "how do you process events in real-time?" they are thinking Kinesis or SQS, not a generic message queue.
This guide covers every AWS service that matters for system design interviews. I organize them by category, cover the architecture, note the production gotchas I have seen in real systems, and give you the exact talking points interviewers want to hear. This is not documentation. This is what you need to know to sound like someone who has actually built on AWS.
How to use this guide
You do not need to memorize every service. Focus on the top 10 (S3, DynamoDB, Lambda, SQS, SNS, Kinesis, CloudFront, RDS/Aurora, ECS, API Gateway) and know the rest well enough to reference them when the design calls for it. An interviewer will never fault you for saying "I would use Kinesis Data Firehose for the delivery stream" even if you do not know every configuration detail.
1. Compute
Compute is the foundation. Every system design needs something to run the code, and your choice of compute primitive shapes everything downstream: scaling model, deployment strategy, cost structure, and cold start behavior.
EC2 (Elastic Compute Cloud)
What it solves: You need a virtual machine with full OS-level control.
Key talking points:
- EC2 gives you full control over the OS, runtime, and networking. Use it when you need GPU instances (P4d, G5), custom kernel modules, or long-running stateful processes.
- Instance families matter: C-series for compute (API servers), R-series for memory (caches, in-memory DBs), M-series for general purpose, G/P-series for ML inference.
- Auto Scaling Groups (ASG) with launch templates are the standard HA pattern. Spread across 3 AZs minimum.
- Spot instances save 60-90% for fault-tolerant workloads (batch processing, CI/CD runners, training jobs). Do not use them for your primary API servers.
- Graviton (ARM) instances offer ~20% better price/performance for most workloads. I always recommend them unless your binary requires x86.
EC2 is rarely the right first answer
In a system design interview, if you reach for EC2 first, the interviewer will ask "why not Lambda?" or "why not containers?" EC2 is the right answer when you need long-running processes, GPU access, or very specific OS configurations. For stateless API servers, prefer ECS/Fargate or Lambda.
Production gotchas I have seen:
- EBS volumes are AZ-locked. If your instance fails and relaunches in a different AZ, the volume does not follow. Use EFS for shared storage or design for statelessness.
- The default instance limit per region is surprisingly low (around 20 for some instance types). I have seen production launches fail because nobody requested a limit increase ahead of time.
- Placement groups matter for HPC: cluster placement for low latency, spread placement for fault isolation.
Lambda
What it solves: Run code without provisioning or managing servers. You pay per request plus execution duration, billed in 1ms increments.
Key talking points:
- Lambda scales from zero to thousands of concurrent executions in seconds. The service handles all capacity planning.
- Maximum execution time is 15 minutes. Maximum memory is 10GB (CPU scales proportionally with memory).
- Lambda pricing is per-invocation ($0.20 per 1M requests) plus duration ($0.0000166667 per GB-second). At high volume, this can be more expensive than containers.
- Cold starts vary by runtime: Python/Node.js ~200-500ms, Java ~3-10s (unless you use SnapStart). For latency-sensitive APIs, use provisioned concurrency.
- Lambda has a concurrency limit of 1000 per region by default (can be raised to tens of thousands).
The Lambda crossover point
Lambda is cheaper than Fargate for workloads under roughly 1M requests/day with short durations (under 1 second). Beyond that, the per-invocation pricing adds up. I use Lambda for event-driven glue, webhooks, and bursty workloads. For sustained high-throughput APIs, containers win on cost.
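A back-of-the-envelope model makes the crossover concrete. This is a sketch using the list prices quoted in this guide; the workload numbers (1M requests/day, 200ms duration, 512MB memory) are illustrative assumptions, not a sizing recommendation:

```python
# Rough daily-cost model: Lambda (per request + per GB-second)
# vs a single always-on Fargate task (per vCPU-hour + per GB-hour).
LAMBDA_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per 1M requests
LAMBDA_PER_GB_SECOND = 0.0000166667
FARGATE_PER_VCPU_HOUR = 0.04048
FARGATE_PER_GB_HOUR = 0.004445

def lambda_daily_cost(requests: int, duration_s: float, memory_gb: float) -> float:
    request_cost = requests * LAMBDA_PER_REQUEST
    duration_cost = requests * duration_s * memory_gb * LAMBDA_PER_GB_SECOND
    return request_cost + duration_cost

def fargate_daily_cost(vcpu: float, memory_gb: float, tasks: int = 1) -> float:
    return tasks * 24 * (vcpu * FARGATE_PER_VCPU_HOUR + memory_gb * FARGATE_PER_GB_HOUR)

# 1M requests/day at 200ms and 512MB:
print(round(lambda_daily_cost(1_000_000, 0.2, 0.5), 2))  # ≈ $1.87/day
# One 0.25 vCPU / 0.5GB Fargate task running all day:
print(round(fargate_daily_cost(0.25, 0.5), 2))           # ≈ $0.30/day
```

At 1M requests/day sustained, one small Fargate task already undercuts Lambda; drop the volume to a few thousand bursty requests a day and Lambda wins, which is the crossover in one calculation.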
Production gotchas I have seen:
- Cold starts are the number one complaint. Use provisioned concurrency for user-facing APIs or switch to SnapStart for Java. For background tasks, cold starts do not matter.
- The 6MB payload limit on synchronous invocation (via API Gateway) catches people. For larger payloads, use S3 presigned URLs.
- Lambda functions share a regional concurrency pool. One runaway function can starve all others. Use reserved concurrency to isolate critical functions.
- Environment variables are limited to 4KB total. Use SSM Parameter Store or Secrets Manager for larger config.
Failure modes practitioners know:
- Async invocations retry twice on failure, then go to a dead-letter queue (if configured). If you forget the DLQ, failed events silently vanish.
- SQS-triggered Lambdas use long polling. If your function throws an error, the message goes back to the queue and retries in a loop until the visibility timeout expires. Without a DLQ and maxReceiveCount, you get an infinite retry storm.
- VPC-attached Lambdas used to have 10-30 second cold starts. AWS fixed this with Hyperplane ENIs in 2019, but interviewers still ask about it.
ECS and Fargate
What it solves: Run Docker containers at scale without managing the underlying EC2 instances (Fargate) or with managed EC2 capacity (ECS on EC2).
Key talking points:
- ECS is AWS's native container orchestrator. Fargate is the serverless compute engine for ECS (and EKS). With Fargate, you define CPU/memory per task and AWS handles placement.
- ECS Service with desired count + ALB health checks gives you self-healing. Failed containers restart automatically.
- Fargate pricing: pay per vCPU-hour ($0.04048) and per GB-hour ($0.004445). No charge for idle EC2 capacity.
- Use ECS on EC2 when you need GPU containers, larger instance sizes, or cost optimization via Spot + Reserved instances at high scale.
- ECS Exec lets you open an interactive shell in a running container for debugging (over SSM Session Manager, no SSH daemon required), similar to kubectl exec.
ECS vs EKS: the interview answer
"If my team already uses Kubernetes and needs multi-cloud portability, I would use EKS. If I am building on AWS with a small-to-medium team and want less operational overhead, I would use ECS with Fargate. ECS is simpler, has deeper AWS integration (IAM task roles, CloudMap service discovery), and costs less in operational burden."
This is the answer interviewers want. They are testing whether you pick the right tool rather than the most complex one.
Production gotchas I have seen:
- Fargate tasks take 30-60 seconds to provision. If your scaling policy reacts too late, you get a latency spike before new tasks come online. Set aggressive scaling thresholds (CPU 40%) and use target-tracking policies.
- Task role vs execution role confusion is extremely common. The execution role pulls images from ECR and writes logs. The task role is what your application code uses to call other AWS services. Mix these up and you get cryptic "access denied" errors.
- Container health checks in the task definition are separate from ALB health checks. You need both. If you only configure ALB health checks, ECS will not restart a container that is stuck in a bad state but still passing the ALB check.
EKS (Elastic Kubernetes Service)
What it solves: Managed Kubernetes control plane on AWS. You run standard Kubernetes workloads without managing the API server, etcd, or scheduler.
Key talking points:
- EKS charges $0.10/hour for the control plane ($73/month) plus your node costs. The control plane is multi-AZ by default.
- Use managed node groups for most workloads. Karpenter replaces the Kubernetes Cluster Autoscaler with faster, more efficient node provisioning (launches right-sized instances in seconds).
- EKS on Fargate eliminates node management entirely but has limitations: no DaemonSets, no stateful storage, max 4 vCPU/30GB per pod.
- IRSA (IAM Roles for Service Accounts) is the correct way to give pods AWS permissions. Do not attach IAM roles to nodes.
EKS operational overhead is real
EKS gives you Kubernetes, and Kubernetes has a tax. You need to manage the VPC CNI plugin, CoreDNS, kube-proxy, cert-manager, ingress controllers, and version upgrades. A team of two engineers should not run EKS. Use ECS/Fargate and save the operational burden for when you genuinely need Kubernetes features (CRDs, service mesh, multi-cloud).
App Runner
What it solves: The simplest way to deploy a containerized web service on AWS. You point it at a container image or source code, and it handles building, deploying, scaling, and TLS.
Key talking points:
- App Runner is "Heroku on AWS." Minimal configuration, automatic scaling from zero to thousands of requests, built-in HTTPS.
- Pricing: pay per vCPU-hour and GB-hour when active, plus a small provisioned charge when idle.
- No VPC configuration needed by default (but you can connect to VPC resources via a VPC connector).
- Use it for prototypes, internal tools, or simple APIs where you want to minimize operational overhead.
When to mention App Runner in an interview
App Runner is a good answer when the interviewer says "assume a small team, keep it simple." It shows you know the full spectrum from serverless (Lambda) to managed containers (App Runner/Fargate) to full control (EC2). Do not design a high-scale system on App Runner; it has limited customization.
2. Storage
Storage services are the most commonly referenced AWS services in system design interviews. You will mention S3 in almost every design.
Step Functions
Before jumping to storage, one critical compute service deserves mention: Step Functions orchestrates multi-step workflows using a visual state machine. When your Lambda function needs to call three services in sequence with error handling, retries, and conditional branching, do not build that logic in code. Use Step Functions.
Key talking points:
- Standard Workflows: long-running (up to 1 year), exactly-once execution, $0.025 per 1,000 state transitions.
- Express Workflows: high-volume, short-duration (up to 5 minutes), at-least-once execution, $1.00 per 1M requests. Use for real-time data processing.
- Step Functions integrates directly with 200+ AWS services (S3, DynamoDB, ECS, SQS) without writing Lambda glue code. An "optimized integration" calls the service directly from the state machine.
- Use Step Functions for: order processing pipelines, data transformation workflows, human approval processes, and any orchestration that spans multiple services.
Step Functions replaces Lambda orchestration code
The mistake I see most often: a Lambda function that calls three other Lambdas in sequence, with try/catch blocks, retry logic, and state tracking in DynamoDB. This should be a Step Functions state machine. The state machine handles retries, error handling, parallel execution, and wait states declaratively. Your Lambdas do one thing each, and Step Functions orchestrates them.
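The orchestration described above can be sketched in Amazon States Language (ASL), the JSON dialect Step Functions executes. The function names, account ID, and retry settings here are hypothetical placeholders:

```python
import json

# Minimal ASL sketch: three hypothetical Lambdas in sequence, with retries
# and a catch-all failure state handled declaratively by the state machine.
definition = {
    "Comment": "Order pipeline: validate -> charge -> notify",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "ChargePayment",
        },
        "ChargePayment": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-payment",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "NotifyCustomer",
        },
        "NotifyCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-customer",
            "End": True,
        },
        "OrderFailed": {"Type": "Fail", "Cause": "Order pipeline failed"},
    },
}
print(json.dumps(definition, indent=2))
```

Every retry, backoff, and error branch that would otherwise live in hand-written Lambda code is declared here, and each Lambda stays single-purpose.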
Now, storage:
S3 (Simple Storage Service)
What it solves: Infinitely scalable object storage. Store anything from user uploads to data lake files to static website assets.
Key talking points:
- S3 durability is 99.999999999% (11 nines). You will not lose data. Availability is 99.99% for Standard.
- S3 supports 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix. Partition your prefixes to scale beyond this (use random prefixes, not date-based).
- Maximum object size is 5TB. Use multipart upload for anything over 100MB (required for over 5GB).
- S3 Select and S3 Object Lambda let you query or transform data in-place without downloading the entire object.
- S3 Event Notifications trigger Lambda, SQS, or SNS when objects are created/deleted, enabling event-driven pipelines.
The presigned URL pattern
For user uploads, never proxy files through your API server. Generate a presigned URL (valid for 5-15 minutes), return it to the client, and let the client upload directly to S3. This eliminates your server as a bottleneck, reduces bandwidth costs, and supports files up to 5GB. I use this pattern in every system that handles user-generated content.
Production gotchas I have seen:
- S3 has provided strong read-after-write consistency for all operations, including overwrites and deletes, since December 2020. The old eventual-consistency behavior still appears in stale interview prep material; mentioning the change signals current knowledge.
- S3 bucket names are globally unique across all AWS accounts. If someone takes your desired name, you cannot have it. Use a naming convention like {company}-{env}-{purpose}.
- S3 Transfer Acceleration uses CloudFront edge locations for faster uploads from distant clients. It costs extra ($0.04/GB) but reduces upload latency by 50-500% for global users.
- Requester Pays buckets shift download costs to the requester. Use this for public datasets to avoid bandwidth cost surprises.
Failure modes practitioners know:
- Accidental public bucket exposure has caused countless data breaches. S3 Block Public Access (account-level and bucket-level) should be enabled by default on every account. AWS now blocks public access by default on new buckets.
- If you enable versioning, deletes do not actually remove data; they add a delete marker. Your storage costs grow silently. Use lifecycle rules to expire old versions.
- Cross-region replication has a lag of seconds to minutes. Do not rely on it for real-time failover.
EBS (Elastic Block Store)
What it solves: Persistent block storage volumes for EC2 instances. Think of it as a virtual hard drive that persists independently from the instance lifecycle.
Key talking points:
- EBS is AZ-scoped. A volume in us-east-1a cannot be attached to an instance in us-east-1b.
- Volume types matter: gp3 (general purpose, 3000 baseline IOPS, $0.08/GB), io2 (provisioned IOPS up to 64,000, for databases), st1 (throughput-optimized HDD for big data), sc1 (cold HDD for infrequent access).
- gp3 is almost always the right choice. It decouples IOPS and throughput from volume size (unlike gp2, where you needed bigger volumes for more IOPS).
- EBS snapshots are stored in S3 (incremental). Use them for backups and cross-region disaster recovery.
EBS in system design interviews
EBS rarely comes up as a primary topic, but it matters when discussing databases on EC2 (self-managed Postgres, MongoDB). Always mention the volume type and IOPS when designing a database tier. "I would use io2 Block Express volumes provisioned at 40,000 IOPS for the primary database" shows you understand storage performance.
EFS (Elastic File System)
What it solves: Managed NFS file system that multiple EC2 instances or containers can mount simultaneously. Unlike EBS, EFS volumes span AZs.
Key talking points:
- EFS automatically scales from gigabytes to petabytes. No capacity provisioning needed.
- Throughput modes: bursting (scales with size), provisioned (fixed throughput), elastic (automatically scales with workload). Elastic mode is the new default and usually the right choice.
- EFS costs more than EBS ($0.30/GB for Standard vs $0.08/GB for gp3). Use it only when you need shared file access.
- Use cases: shared config files for a container fleet, CMS media directories, ML training data shared across instances, WordPress uploads.
EFS latency is higher than EBS
EFS single-operation latency is 1-3ms vs 0.1-0.5ms for EBS io2. Do not use EFS as your database storage. Use it for shared file access patterns where latency is not the bottleneck.
FSx
What it solves: Managed file systems for specialized workloads. FSx for Lustre provides high-performance parallel file access for HPC and ML training. FSx for Windows File Server provides fully managed Windows-native SMB shares. FSx for NetApp ONTAP provides multi-protocol enterprise storage.
Key talking points:
- FSx for Lustre: sub-millisecond latency, hundreds of GB/s throughput. Integrates natively with S3 (auto-import/export). Use for ML training data, HPC simulations, and video rendering.
- FSx for Windows: fully managed Active Directory integrated Windows file shares. Use for Windows application workloads (SQL Server, .NET apps, IIS shared content).
- FSx for NetApp ONTAP: multi-protocol (NFS, SMB, iSCSI), deduplication, compression, snapshots, cross-region replication. Enterprise storage without managing the appliance.
- In interviews, FSx rarely comes up directly. Mention it when the design requires high-throughput parallel I/O (ML training at scale) or Windows-native file shares.
When to mention FSx
"For ML training with 100GB+ datasets and multiple GPU instances, I would use FSx for Lustre backed by S3. Lustre provides sub-millisecond latency and hundreds of GB/s throughput, which eliminates I/O bottlenecks during training." This shows you understand that not all storage is S3.
3. Databases
Database selection is the single most impactful decision in system design. Get it wrong and you spend months migrating. Get it right and the system scales naturally.
RDS (Relational Database Service)
What it solves: Managed relational databases (PostgreSQL, MySQL, MariaDB, Oracle, SQL Server). AWS handles patching, backups, failover, and replication.
Key talking points:
- Multi-AZ deployment provides an automatic failover standby (30-60 second failover). This is synchronous replication, not a read replica.
- Read replicas are async. You can have up to 15 read replicas (5 for MySQL/MariaDB on basic RDS, 15 on Aurora). Cross-region replicas enable global reads.
- RDS Proxy pools database connections, reducing the overhead of thousands of Lambda or container connections hitting the database.
- Automated backups with point-in-time recovery (up to 35 days). Snapshots for longer retention.
RDS vs Aurora: the interview shortcut
"For PostgreSQL or MySQL, I would use Aurora unless cost is the primary constraint. Aurora provides 5x throughput improvement over standard MySQL and 3x over PostgreSQL, with up to 15 read replicas, automatic storage scaling to 128TB, and sub-10 second failover."
This is almost always the right answer. Standard RDS is cheaper for small, low-traffic databases.
Production gotchas I have seen:
- Connection limits are instance-size dependent. A db.t3.micro supports ~60 connections. I have seen production outages because developers launched with a tiny instance during development and forgot to resize before go-live.
- Multi-AZ failover changes your database endpoint's DNS record. Applications with DNS caching (JVM default caches for 30 seconds) may continue connecting to the old instance. Set TTL to 5 seconds or use RDS Proxy.
- Parameter group changes (like changing max_connections) often require a reboot. Schedule these during maintenance windows.
Aurora
What it solves: AWS's cloud-native relational database (MySQL and PostgreSQL compatible) with distributed storage that separates compute from storage.
Key talking points:
- Aurora stores 6 copies of your data across 3 AZs. Writes require 4/6 quorum, reads require 3/6. This means it tolerates losing an entire AZ plus one more node.
- Storage auto-scales from 10GB to 128TB. You never provision storage.
- Aurora Serverless v2 scales compute from 0.5 to 256 ACUs (Aurora Capacity Units) based on load. Scales in increments of 0.5 ACU in milliseconds. Ideal for variable workloads.
- Aurora Global Database provides cross-region read replicas with under 1 second replication lag and promotes a secondary region in under 1 minute for disaster recovery.
- Aurora DSQL (released 2024) is AWS's distributed SQL offering, providing active-active multi-region writes with strong consistency.
Aurora is not cheap
Aurora costs about 20% more than standard RDS for equivalent instance sizes. Aurora storage is $0.10/GB/month (vs $0.115/GB for RDS gp3 storage, though Aurora auto-scales and bills only for consumed storage). I still recommend it for production workloads because the operational benefits (auto-scaling storage, faster failover, built-in replication) outweigh the cost premium.
DynamoDB
What it solves: Fully managed NoSQL key-value and document database. Single-digit millisecond performance at any scale, with no capacity planning for on-demand mode.
Key talking points:
- DynamoDB has two capacity modes: on-demand (pay per request, no planning) and provisioned (cheaper at steady-state, requires you to set read/write capacity units).
- Partition key design is everything. A hot partition key caps your throughput. Use high-cardinality keys (userId, orderId) and avoid time-based partitions that create hot spots.
- Global Secondary Indexes (GSI) let you query on non-primary-key attributes. Each GSI is essentially a separate table with its own capacity.
- DynamoDB Streams provides change data capture (CDC) for event-driven architectures, search indexing, and cross-region replication.
- Global Tables replicate data across regions with sub-second replication lag, enabling active-active architectures.
DynamoDB is not a relational database replacement
DynamoDB does not support JOINs, complex aggregations, or ad-hoc queries. You must know your access patterns upfront and design your table schema (partition key, sort key, GSIs) around them. If your access patterns are unknown or change frequently, use a relational database.
Production gotchas I have seen:
- The single-partition throughput limit is 3,000 RCU and 1,000 WCU (about 3,000 reads and 1,000 writes per second per partition). If all traffic hits one partition key, you are throttled regardless of your total table capacity.
- On-demand mode has a "burst" limit: it can instantly handle 2x your previous peak. If traffic jumps from 100 TPS to 10,000 TPS without ramp-up, you get throttled. Pre-warm the table by gradually increasing traffic.
- GSI updates are eventually consistent. If you write to the base table and immediately query the GSI, you might not see the write.
- Item size limit is 400KB. For larger documents, store a pointer to S3.
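The standard mitigation for the single-partition throughput limit above is write sharding: suffix the hot partition key with a random shard number so writes spread across partitions, and have readers scatter-gather across all shards. A minimal sketch; the shard count of 10 is an illustrative choice (each extra shard adds roughly another partition's worth of write headroom):

```python
import random

SHARD_COUNT = 10  # illustrative; size to the write throughput you need

def write_key(hot_key: str) -> str:
    """Suffix the hot key with a random shard so writes spread across partitions."""
    return f"{hot_key}#{random.randrange(SHARD_COUNT)}"

def read_keys(hot_key: str) -> list[str]:
    """Readers query every shard key and merge the results (scatter-gather)."""
    return [f"{hot_key}#{i}" for i in range(SHARD_COUNT)]

print(read_keys("game-1234")[:3])  # e.g. the first three shard keys to query
```

The trade-off: reads cost N queries instead of one, so shard only keys you have measured to be hot.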
ElastiCache
What it solves: Managed Redis or Memcached for in-memory caching. Sub-millisecond read latency for frequently accessed data.
Key talking points:
- Redis on ElastiCache supports data structures (strings, hashes, sorted sets, lists, streams), pub/sub, Lua scripting, and cluster mode for horizontal scaling.
- Cluster mode enabled: data is sharded across up to 500 shards with up to 5 replicas each. This scales to millions of operations per second.
- Memcached is simpler (pure key-value, multi-threaded) and better for simple caching. Redis is better for everything else.
- ElastiCache Serverless (released 2023) automatically scales cache capacity. You pay per GB stored and per ECPUs consumed.
Cache-aside is the default pattern
Always describe cache-aside (lazy loading) in interviews: read from cache first, on miss read from DB and populate cache. Write-through (write to cache and DB on every write) is only needed for read-dominated workloads where cache misses are expensive.
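The cache-aside flow above can be sketched in a few lines. This uses an in-process dict as a stand-in for Redis purely to show the control flow; the TTL and function names are illustrative:

```python
import time

_cache: dict = {}  # stand-in for Redis: key -> (value, expires_at)

def get_cached(key, load_from_db, ttl_seconds=300):
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit
    value = load_from_db(key)                 # cache miss: read from the database
    _cache[key] = (value, time.time() + ttl_seconds)  # populate with a TTL
    return value

# Demo with a fake database loader that counts its calls.
calls = []
def fake_db(key):
    calls.append(key)
    return {"id": key, "name": "Ada"}

get_cached("user-1", fake_db)
get_cached("user-1", fake_db)   # served from cache; fake_db ran only once
print(len(calls))               # 1
```

With Redis the dict becomes GET/SETEX calls, and the TTL doubles as your staleness bound.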
Production gotchas I have seen:
- Redis failover on ElastiCache takes 15-30 seconds. During failover, writes fail. Your application needs retry logic and must handle connection interruptions.
- The "thundering herd" problem: when a popular cache key expires, hundreds of concurrent requests miss the cache and slam the database simultaneously. Use cache stampede protection (lock or probabilistic early recomputation).
- ElastiCache Redis runs in your VPC. Lambda functions in a VPC can connect to it, but Lambda functions outside a VPC cannot. Plan your networking.
MemoryDB for Redis
What it solves: Redis-compatible, durable in-memory database. Unlike ElastiCache, MemoryDB provides durability via a distributed transaction log, making it suitable as a primary database.
Key talking points:
- MemoryDB stores data durably across multiple AZs using a transaction log. You do not lose data on node failure (unlike ElastiCache, where data can be lost during failover).
- Microsecond read latency, single-digit millisecond write latency.
- Use MemoryDB when Redis is your primary data store (session store, leaderboard, feature store). Use ElastiCache when Redis is a cache in front of another database.
- MemoryDB costs about 2x ElastiCache for equivalent instance sizes because of the durability guarantee.
MemoryDB vs ElastiCache: the decision
"If Redis is my cache, I use ElastiCache. If Redis is my database, I use MemoryDB." This is the one-liner interviewers want. MemoryDB survives node failures without data loss. ElastiCache can lose the most recent writes during failover.
DocumentDB
What it solves: Managed document database compatible with MongoDB workloads.
Key talking points:
- DocumentDB implements the MongoDB 3.6/4.0/5.0 wire protocol. Most MongoDB drivers and tools work without modification.
- Under the hood, DocumentDB uses a similar architecture to Aurora (shared distributed storage, separate compute). It is not running MongoDB code.
- Use DocumentDB when you have an existing MongoDB workload and want to move to managed AWS infrastructure. For new projects, I recommend DynamoDB for key-value access and Aurora for relational access.
DocumentDB is not MongoDB
DocumentDB has behavioral differences from MongoDB: different consistency guarantees, limited aggregation pipeline support, and no multi-document ACID transactions across shards. Test your application thoroughly before migrating. I have seen teams assume compatibility and encounter subtle bugs in production.
4. Messaging and Streaming
Messaging services decouple producers from consumers. In system design, they are how you go from "synchronous monolith" to "scalable distributed system." Know when to use queues vs topics vs streams.
SQS (Simple Queue Service)
What it solves: Fully managed message queue for decoupling microservices. Producers send messages, consumers poll and process them independently.
Key talking points:
- Standard queues: nearly unlimited throughput, at-least-once delivery, best-effort ordering. FIFO queues: exactly-once processing, strict ordering, limited to 3,000 messages/second (with batching).
- Visibility timeout: once a consumer receives a message, it is invisible to other consumers for a configurable period (default 30 seconds). If the consumer does not delete the message before the timeout, it becomes visible again for retry.
- Long polling (WaitTimeSeconds: 20) reduces empty responses and API costs. Always use long polling.
- Dead Letter Queues (DLQ) capture messages that fail processing after a configured number of attempts. Never deploy an SQS consumer without a DLQ.
- Message retention: up to 14 days. Maximum message size: 256KB. For larger payloads, store the body in S3 and send an S3 pointer in the message (using the Extended Client Library).
SQS + Lambda is the default async pattern
When an interviewer says "process this in the background," my default answer is: "Drop a message on SQS, trigger a Lambda consumer." Lambda handles polling, scaling, and retry automatically. The only configuration is the batch size, visibility timeout, and DLQ.
Production gotchas I have seen:
- Standard SQS can deliver duplicates. Your consumer must be idempotent. The most common approach: use a deduplication key in a database and check before processing.
- Visibility timeout must be longer than your processing time. If processing takes 5 minutes and your visibility timeout is 30 seconds, the message reappears and gets processed by another consumer concurrently, causing duplicate processing.
- FIFO queues require a MessageGroupId. Messages within the same group are ordered, but different groups process in parallel. Do not use a single group for all messages (this serializes everything).
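The rules above (always long-poll, size the visibility timeout above worst-case processing time, never skip the DLQ) translate into a small set of queue attributes. A sketch; the DLQ ARN, queue name, and specific numbers are illustrative, and the dict would be passed as Attributes to sqs.create_queue or set_queue_attributes:

```python
import json

WORST_CASE_PROCESSING_SECONDS = 120  # measured worst case for this consumer

queue_attributes = {
    "ReceiveMessageWaitTimeSeconds": "20",  # long polling: always on
    # Visibility timeout comfortably above processing time, so a slow
    # consumer does not trigger concurrent duplicate processing.
    "VisibilityTimeout": str(WORST_CASE_PROCESSING_SECONDS * 3),
    "MessageRetentionPeriod": str(14 * 24 * 3600),  # keep up to the 14-day max
    # Poison messages move to the DLQ instead of retrying forever.
    "RedrivePolicy": json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
        "maxReceiveCount": 5,
    }),
}
print(queue_attributes["VisibilityTimeout"])
```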
SNS (Simple Notification Service)
What it solves: Fully managed pub/sub messaging. A single message published to a topic fans out to multiple subscribers simultaneously.
Key talking points:
- SNS supports multiple subscriber types: SQS, Lambda, HTTP/S endpoints, email, SMS, mobile push.
- The SNS + SQS fan-out pattern is the standard for event-driven architectures. Publish once to SNS, fan out to multiple SQS queues, each powering an independent consumer.
- SNS message filtering lets subscribers receive only messages matching a filter policy (based on message attributes). This avoids every consumer receiving every message.
- SNS FIFO topics pair with SQS FIFO queues for ordered fan-out.
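Message filtering from the list above looks like this as a subscription filter policy: a hypothetical fraud-check queue subscribes to the orders topic but only receives high-value events. The attribute names and threshold are illustrative assumptions, not a fixed schema:

```python
import json

# Attached to the fraud-check queue's SNS subscription; other subscribers
# keep their own policies (or none, to receive everything).
fraud_queue_filter_policy = {
    "event_type": ["order.placed"],
    "amount_usd": [{"numeric": [">=", 1000]}],  # SNS numeric-match operator
}
print(json.dumps(fraud_queue_filter_policy))
```

Filtering happens in SNS, so low-value events never reach the queue and the consumer never pays to discard them.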
SNS vs SQS: the key distinction
SQS is a queue (point-to-point, one consumer processes each message). SNS is a topic (pub/sub, every subscriber gets every message). In practice, I almost always use them together: publish to SNS, subscribe SQS queues, and have each service consume from its own queue.
Kinesis Data Streams
What it solves: Real-time streaming data ingestion. Collect, process, and analyze hundreds of thousands of records per second from clickstreams, IoT sensors, logs, and application events.
Key talking points:
- Kinesis uses shards for throughput. Each shard supports 1MB/s or 1,000 records/s for writes and 2MB/s for reads. Scale by adding shards.
- Data retention: 24 hours by default, extendable to 365 days. Unlike SQS, Kinesis retains data after consumption, supporting multiple consumers and replay.
- Enhanced fan-out gives each consumer a dedicated 2MB/s read throughput per shard (vs the shared 2MB/s without it). Use this when you have more than 2 consumers.
- Kinesis is ordered within a shard. The partition key determines the shard. Same partition key = same shard = ordered processing.
Kinesis vs SQS: the decision matrix
Use SQS when: messages are independent, you want automatic scaling, you need exactly-once processing (FIFO), or you do not need replay. Use Kinesis when: you need real-time streaming, ordered processing, multiple consumers reading the same data, or the ability to replay events. Kinesis is more complex and expensive. Default to SQS unless you genuinely need streaming.
Production gotchas I have seen:
- "Hot shards" are the number one operational problem. If 80% of your events have the same partition key, one shard handles 80% of the traffic. Use high-cardinality partition keys.
- Kinesis has a 5 read transactions per second per shard limit (shared across all consumers). With 3 consumers polling, you exhaust this quickly. Use enhanced fan-out for production workloads.
- Resharding (splitting or merging shards) is manual and takes time. Plan your initial shard count based on peak throughput, not average.
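Hot shards follow directly from how Kinesis routes records: the MD5 hash of the partition key picks the shard. This simplified model assumes shards evenly split the 128-bit hash space (real shards own explicit hash ranges you can read via DescribeStream):

```python
import hashlib

def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index via MD5, as Kinesis does."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * shard_count // 2 ** 128  # hash space split evenly across shards

# The same key always routes to the same shard (per-key ordering)...
print(shard_for("user-42", 4) == shard_for("user-42", 4))  # True
# ...which is also why one dominant key becomes one hot shard.
```

High-cardinality keys spread uniformly across shards; a single dominant key concentrates on one, no matter how many shards you add.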
EventBridge
What it solves: Serverless event bus for routing events between AWS services, SaaS applications, and your custom applications based on rules and content-based filtering.
Key talking points:
- EventBridge Scheduler replaces CloudWatch Events for cron jobs. One-time or recurring schedules with built-in retry, dead-letter queues, and timezone support.
- EventBridge Pipes connects sources (SQS, Kinesis, DynamoDB Streams) to targets with optional filtering and transformation. No Lambda glue code needed.
- Content-based filtering evaluates event content (not just source). You can route "order.placed" events where amount > $1000 to a fraud-check Lambda.
- Schema Registry automatically discovers and stores event schemas, enabling code generation for producers and consumers.
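The content-based routing example above can be sketched as an EventBridge rule pattern: match order.placed events whose amount exceeds $1000 and route them to the fraud-check target. The source name and detail fields are hypothetical:

```python
import json

# Rule pattern evaluated against the event body, not just its source.
high_value_order_pattern = {
    "source": ["com.myapp.orders"],          # hypothetical producer name
    "detail-type": ["order.placed"],
    "detail": {
        "amount": [{"numeric": [">", 1000]}]  # EventBridge numeric matching
    },
}
print(json.dumps(high_value_order_pattern))
```

Events that do not match never invoke the target, so the routing logic lives in the bus rather than in consumer code.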
EventBridge is the modern default for event routing
If I am designing a microservices system from scratch, I use EventBridge as the event bus instead of raw SNS/SQS. EventBridge provides content-based routing, schema registry, archive/replay, and native AWS service integration. SNS + SQS is simpler for pure fan-out; EventBridge is better for complex routing logic.
MSK (Managed Streaming for Apache Kafka)
What it solves: Fully managed Apache Kafka for teams that need Kafka-specific features (log compaction, exactly-once semantics, Kafka Streams, Kafka Connect) with reduced operational burden.
Key talking points:
- MSK manages Kafka brokers, Apache ZooKeeper (or KRaft mode), patching, and storage. You still manage topics, partitions, and consumer groups.
- MSK Serverless eliminates cluster management entirely. You pay per data ingested and stored, with automatic capacity scaling.
- Use MSK when: you have existing Kafka expertise, need Kafka-specific features (log compaction, Kafka Connect ecosystem), or are migrating from self-managed Kafka. Use Kinesis when: you want simpler operations and deeper AWS integration.
- MSK pricing: per broker instance-hour plus EBS storage. A 3-broker cluster on m5.large costs ~$500/month before storage.
Kafka vs Kinesis in interviews
"If the team has Kafka expertise and uses the Kafka ecosystem (Kafka Connect, Kafka Streams, Schema Registry), I would use MSK. If we are building from scratch on AWS and want simpler operations, I would use Kinesis Data Streams." This distinction shows you understand that the choice is about operational trade-offs, not technical capability.
5. Networking and Content Delivery
Networking services are the connective tissue of every architecture. In system design interviews, knowing when to use an ALB versus an API Gateway, or how CloudFront caching works, separates practitioners from people who just read documentation.
CloudFront
What it solves: Global content delivery network (CDN) with 450+ edge locations. Caches static and dynamic content close to users for lower latency.
Key talking points:
- CloudFront supports multiple origins with path-based routing. Route `/api/*` to your ALB and `/*` to S3. This is the standard pattern for serving a single-page app with an API backend.
- Origin Shield adds a centralized caching tier between edge locations and your origin. It reduces origin load by collapsing duplicate requests from different edges.
- Lambda@Edge and CloudFront Functions run code at the edge. Use CloudFront Functions for lightweight transformations (URL rewriting, header manipulation) and Lambda@Edge for heavier logic (authentication, A/B testing, dynamic content generation).
- CloudFront signed URLs and signed cookies restrict access to private content. Use signed URLs for individual files, signed cookies for multiple files (like video segments).
- Data transfer from S3 to CloudFront is free. You only pay for CloudFront data transfer to the user ($0.085/GB for the first 10TB).
CloudFront as a security layer
CloudFront integrated with AWS WAF blocks malicious traffic at the edge before it reaches your origin. I always put CloudFront in front of every public-facing application, even if the users are all in one region. The WAF integration, DDoS protection (Shield Standard is free), and TLS termination at the edge make it worthwhile.
Production gotchas I have seen:
- CloudFront caches 4xx and 5xx error responses by default. If your origin returns a temporary 500, CloudFront caches it and serves it to all users. Configure custom error caching TTLs (set error TTL to 0 or 5 seconds).
- Cache invalidation is slow (takes 1-5 minutes) and the first 1,000 invalidation paths per month are free, then $0.005 per path. Instead of invalidation, use versioned filenames (e.g., `app.abc123.js`).
- CloudFront behaviors (path patterns) are evaluated in order. The most specific pattern must come first. I have seen wildcard patterns accidentally match traffic intended for a more specific path.
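The versioned-filename alternative to invalidation is a few lines of build tooling: embed a content hash in the name so each deploy produces a new cache key and old objects can be cached forever. A sketch (the `versioned_name` helper and 8-character digest length are my own choices, not a CloudFront API):

```python
import hashlib
from pathlib import PurePosixPath

def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Turn app.js into app.<hash>.js so CloudFront never needs invalidating:
    changed content gets a new name, unchanged content keeps its cache entry."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"

v1 = versioned_name("app.js", b"console.log('v1')")
v2 = versioned_name("app.js", b"console.log('v2')")
assert v1 != v2                                                # new deploy, new cache key
assert v1 == versioned_name("app.js", b"console.log('v1')")    # deterministic builds
```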
Route 53
What it solves: Managed DNS service with health checking and traffic routing policies (weighted, latency-based, geolocation, failover).
Key talking points:
- Routing policies: Simple (single resource), Weighted (distribute a percentage of traffic), Latency (route to lowest-latency region), Geolocation (route by user location), Failover (active-passive DR), and Multivalue (return multiple healthy IPs).
- Health checks monitor endpoints and trigger failover. Check interval: 30 seconds (or 10 seconds for fast health checks). Combine with CloudWatch alarms for complex health evaluation.
- Route 53 is the only AWS service with a 100% SLA. It is globally distributed and highly available by design.
- Alias records are Route 53's special record type that maps a domain directly to an AWS resource (ALB, CloudFront, S3 website) without a CNAME's extra DNS hop.
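The weighted and failover policies can be sketched as weighted selection over healthy records. This illustrates the routing semantics only — it is not Route 53's implementation, and the record names are hypothetical:

```python
import random

random.seed(7)  # deterministic for the assertions below

def pick_record(records, rng=random):
    """Weighted-routing sketch: choose among healthy records with probability
    proportional to weight. Failover falls out naturally when a record's
    health check fails and it drops out of the healthy set."""
    healthy = [r for r in records if r["healthy"]]
    roll = rng.uniform(0, sum(r["weight"] for r in healthy))
    for r in healthy:
        roll -= r["weight"]
        if roll <= 0:
            return r["value"]
    return healthy[-1]["value"]

records = [
    {"value": "alb-us-east-1", "weight": 90, "healthy": True},
    {"value": "alb-eu-west-1", "weight": 10, "healthy": True},
]
hits = [pick_record(records) for _ in range(5000)]
assert 0.85 < hits.count("alb-us-east-1") / len(hits) < 0.95   # ~90% of traffic

records[0]["healthy"] = False                     # primary fails its health check
assert pick_record(records) == "alb-eu-west-1"    # all traffic fails over
```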
Route 53 in multi-region designs
When your interviewer asks about multi-region failover, Route 53 is always part of the answer. "I would use Route 53 latency-based routing to direct users to the nearest healthy region, with health checks on each region's ALB. If the primary region fails, Route 53 automatically routes traffic to the secondary region within 30-60 seconds."
API Gateway
What it solves: Managed API proxy that handles authentication, throttling, request transformation, and API versioning for REST, HTTP, and WebSocket APIs.
Key talking points:
- REST API vs HTTP API: REST API has more features (request validation, API keys, usage plans, caching, AWS X-Ray tracing) but costs 3.5x more and has higher latency. HTTP API is the right choice for most Lambda proxy integrations.
- API Gateway throttling: 10,000 requests/second account-level limit (adjustable), with per-stage and per-method limits. Protects your backend from traffic spikes.
- WebSocket API maintains persistent connections for real-time features (chat, notifications, live updates). Manages connection state and routes frames to Lambda.
- API Gateway caching (REST API only) puts a TTL-based cache in front of your backend, reducing Lambda invocations and backend load.
API Gateway payload limits
API Gateway has a 10MB payload limit and a 29-second integration timeout. If your API handles file uploads or long-running operations, you need workarounds: presigned S3 URLs for uploads and async patterns (SQS + polling or WebSocket callbacks) for long operations.
ALB and NLB (Elastic Load Balancing)
What it solves: Distribute incoming traffic across multiple targets (EC2, containers, Lambda, IP addresses).
Key talking points:
- ALB (Layer 7): Routes based on HTTP path, host header, query string, and HTTP method. Supports sticky sessions, WebSocket, HTTP/2, and gRPC. Use for web applications and REST APIs.
- NLB (Layer 4): Routes TCP/UDP connections. Provides static IP addresses, ultra-low latency (~100us for connection setup), and millions of requests per second. Use for gRPC, gaming, IoT, and anything requiring static IPs or non-HTTP protocols.
- Cross-zone load balancing distributes traffic evenly across all targets in all AZs. Enabled by default on ALB, optional on NLB.
- ALB costs ~$0.0225/hour plus $0.008 per LCU (Load Balancer Capacity Unit). NLB costs ~$0.0225/hour plus $0.006 per NLCU.
VPC (Virtual Private Cloud)
What it solves: Isolated virtual network within AWS. VPCs define the network topology for all your resources: subnets, routing tables, network ACLs, and security groups.
Key talking points:
- Every AWS resource that sits on a network (EC2, RDS, ECS, Lambda with VPC access) runs inside a VPC. Understanding VPC design is fundamental.
- Standard VPC architecture: public subnets (internet-facing ALB, NAT gateways), private subnets (app servers, databases), across 3 AZs.
- Security groups are stateful (allow return traffic automatically). Network ACLs are stateless (must explicitly allow both inbound and outbound).
- VPC peering connects two VPCs for private communication. Transit Gateway simplifies multi-VPC routing (hub-and-spoke model).
- VPC endpoints (Gateway endpoints for S3/DynamoDB, Interface endpoints for other services) keep AWS API traffic on the private network, avoiding NAT gateway costs.
NAT Gateway cost trap
NAT Gateways cost $0.045/hour ($32/month) per AZ plus $0.045/GB processed. In a 3-AZ setup, that is $96/month just for the gateway, before any data processing. If your private subnets talk to AWS services (S3, DynamoDB, SQS), use VPC endpoints instead of routing through the NAT Gateway. I have seen teams cut their NAT Gateway bill by 80% by adding VPC endpoints.
6. Security and Identity
Security services come up in every system design interview, usually when discussing authentication, authorization, and data protection. "How do you secure this?" is a question you should answer proactively.
IAM (Identity and Access Management)
What it solves: Controls who can do what in your AWS account. Every API call to AWS is authenticated and authorized through IAM.
Key talking points:
- Principle of least privilege: grant only the permissions required to perform a task. Use specific resource ARNs and actions, not `*`.
- IAM roles are preferred over IAM users for everything except human console access: EC2 instance profiles, ECS task roles, Lambda execution roles, and IRSA for EKS pods.
- Service Control Policies (SCPs) in AWS Organizations set guardrails across your entire account structure. Use them to prevent disabling CloudTrail, creating unencrypted resources, or deploying in unapproved regions.
- IAM policy evaluation order: explicit deny wins, then check SCPs, then identity policies, then resource policies, then permission boundaries.
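The evaluation order above can be sketched as a toy evaluator. It handles only identity-policy Allow/Deny statements with trailing-wildcard matching — real IAM also layers SCPs, resource policies, and permission boundaries on top:

```python
def evaluate(statements, action: str, resource: str) -> str:
    """Simplified IAM logic: an explicit Deny in any matching statement wins;
    otherwise any matching Allow permits; otherwise the implicit deny applies."""
    def match(pattern: str, value: str) -> bool:
        return pattern == "*" or pattern == value or (
            pattern.endswith("*") and value.startswith(pattern[:-1]))

    decision = "ImplicitDeny"
    for s in statements:
        if match(s["Action"], action) and match(s["Resource"], resource):
            if s["Effect"] == "Deny":
                return "ExplicitDeny"   # deny always wins, regardless of order
            decision = "Allow"
    return decision

policy = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::app-bucket/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
]
assert evaluate(policy, "s3:GetObject", "arn:aws:s3:::app-bucket/logo.png") == "Allow"
assert evaluate(policy, "s3:DeleteObject", "arn:aws:s3:::app-bucket/logo.png") == "ExplicitDeny"
assert evaluate(policy, "s3:GetObject", "arn:aws:s3:::other-bucket/x") == "ImplicitDeny"
```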
Never use IAM access keys in code
Hardcoding AWS access keys in application code or environment variables is a critical security vulnerability. Use IAM roles everywhere: EC2 instance profiles, ECS task roles, Lambda execution roles. For local development, use `aws sso login`. I have seen production databases compromised because someone committed access keys to a public GitHub repo.
Cognito
What it solves: User authentication and authorization for web and mobile applications. Provides user sign-up, sign-in, and token management without building your own auth system.
Key talking points:
- User Pools handle authentication (sign-up, sign-in, MFA, password policies, social login federation with Google/Facebook/Apple).
- Identity Pools (Federated Identities) exchange Cognito tokens for temporary AWS credentials, letting users directly access AWS resources (S3, DynamoDB) with fine-grained IAM policies.
- Cognito issues standard JWTs (ID token, access token, refresh token). API Gateway can validate these directly without custom code.
- Supports OpenID Connect and SAML 2.0 for enterprise SSO federation.
- Pricing: free for the first 50,000 monthly active users (MAUs), then $0.0055/MAU.
Cognito limitations to know
Cognito's advanced features have rough edges: the Hosted UI is not highly customizable, email sending requires SES configuration for production volumes (default sandbox limit is 50 emails/day), and user migration from another system requires a Lambda trigger that runs on first login. Evaluate these limitations before choosing Cognito over Auth0 or a custom solution.
KMS (Key Management Service)
What it solves: Create and manage encryption keys for data encryption at rest and in transit across AWS services.
Key talking points:
- KMS integrates with almost every AWS service (S3, EBS, RDS, DynamoDB, SQS, Kinesis) for envelope encryption. You enable encryption with a checkbox and point to a KMS key.
- AWS-managed keys (free), Customer-managed keys ($1/month per key + $0.03 per 10,000 API calls), and custom key stores (CloudHSM backed).
- Envelope encryption: KMS generates a data key, encrypts your data with the data key, then encrypts the data key with the KMS key. Only the encrypted data key is stored alongside your data.
- Key rotation: automatic annual rotation for customer-managed keys. Old versions are retained for decrypting existing data.
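The envelope-encryption flow can be simulated end to end. The "cipher" below is a toy XOR keystream — absolutely not real cryptography (KMS uses AES-256-GCM) — and the key-wrapping steps are a local simulation of the `GenerateDataKey`/`Decrypt` calls, not actual KMS API usage. Only the flow is the point:

```python
import hashlib
import os

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy symmetric XOR 'cipher' (NOT real crypto) used only to show the
    envelope flow; applying it twice with the same key round-trips the data."""
    stream = hashlib.sha256(key).digest()
    stream = (stream * (len(data) // len(stream) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, stream))

# 1. "GenerateDataKey": get a plaintext data key plus the same key wrapped
#    under the KMS master key (simulated locally here).
master_key = os.urandom(32)
data_key = os.urandom(32)
encrypted_data_key = toy_encrypt(master_key, data_key)

# 2. Encrypt the payload locally with the data key; store the ciphertext
#    alongside the *encrypted* data key; discard the plaintext data key.
ciphertext = toy_encrypt(data_key, b"patient record #1234")

# 3. To decrypt later: ask KMS to unwrap the data key, then decrypt locally.
recovered_key = toy_encrypt(master_key, encrypted_data_key)
assert recovered_key == data_key
assert toy_encrypt(recovered_key, ciphertext) == b"patient record #1234"
```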
Encryption in interview answers
Always mention encryption in your system design. "All data at rest is encrypted with customer-managed KMS keys, and all data in transit uses TLS 1.2+." This takes 10 seconds to say and shows security awareness. It is especially important for healthcare (HIPAA), finance (PCI-DSS), and any system handling PII.
WAF (Web Application Firewall)
What it solves: Protects web applications from common exploits (SQL injection, XSS, OWASP Top 10) and bot traffic by filtering HTTP/HTTPS requests at CloudFront, ALB, or API Gateway.
Key talking points:
- WAF rules evaluate request parameters (headers, body, URI, query strings, IP) and allow, block, count, or CAPTCHA-challenge matching requests.
- AWS Managed Rules provide pre-built rule sets: Core Rule Set (OWASP), SQL injection, XSS, IP reputation list, and Bot Control. These cost $1-10/month per rule group.
- Rate-based rules block IPs exceeding a threshold (e.g., 2000 requests per 5 minutes). This is your first line of defense against L7 DDoS.
- WAF is attached to CloudFront distributions, ALBs, or API Gateway stages. Always attach it to CloudFront for global protection.
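A rate-based rule is essentially a sliding-window counter per source IP. A minimal in-memory, single-node sketch of that behavior (WAF does this distributed across the edge; the class and thresholds here are illustrative):

```python
from collections import defaultdict, deque

class RateBasedRule:
    """Sketch of a WAF-style rate rule: block an IP once it exceeds
    `limit` requests within a sliding `window` of seconds."""
    def __init__(self, limit: int = 2000, window: int = 300):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()            # drop requests that aged out of the window
        q.append(now)
        return len(q) <= self.limit

rule = RateBasedRule(limit=5, window=300)
assert all(rule.allow("10.0.0.1", t) for t in range(5))   # under the limit
assert not rule.allow("10.0.0.1", 5)                      # sixth request blocked
assert rule.allow("10.0.0.1", 400)                        # window slid past old hits
```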
WAF + Shield for DDoS protection
AWS Shield Standard (free, automatic L3/L4 DDoS protection) covers all AWS resources. Shield Advanced ($3,000/month) adds L7 DDoS protection, real-time visibility, a DDoS response team, and cost protection for scaling during attacks. For most applications, Shield Standard + WAF rate-based rules are sufficient.
Secrets Manager
What it solves: Store, rotate, and retrieve database credentials, API keys, and other secrets. Applications call the Secrets Manager API instead of hardcoding secrets.
Key talking points:
- Automatic rotation: Secrets Manager can rotate RDS, Redshift, and DocumentDB credentials automatically on a schedule. Custom Lambda functions handle rotation for other secret types.
- Pricing: $0.40 per secret per month + $0.05 per 10,000 API calls. Use SSM Parameter Store (SecureString, free for standard parameters) for non-rotating secrets to save cost.
- Secrets Manager integrates with ECS (inject secrets as environment variables), Lambda (environment variables or direct API call), and RDS Proxy (automatic credential handling).
7. Observability
You cannot operate a system you cannot observe. Observability services provide the metrics, logs, and traces you need to detect problems, diagnose root causes, and verify that your system is healthy.
CloudWatch
What it solves: Unified monitoring for AWS resources and applications. Collects metrics, logs, and events. Triggers alarms and automated actions.
Key talking points:
- CloudWatch Metrics: built-in metrics for every AWS service (free, 5-minute resolution or 1-minute for detailed monitoring). Custom metrics via PutMetricData API ($0.30/metric/month).
- CloudWatch Logs: centralized log storage. Use metric filters to extract numeric values from logs and create alarms (e.g., count ERROR occurrences).
- CloudWatch Alarms: threshold-based or anomaly detection. Alarm states: OK, ALARM, INSUFFICIENT_DATA. Actions: SNS notification, Auto Scaling trigger, or EC2 action.
- CloudWatch Logs Insights: SQL-like query language for searching and analyzing log data. Fast and cheap for ad-hoc troubleshooting.
- Embedded Metric Format (EMF): emit custom metrics directly from your application logs without separate PutMetricData calls.
The three alarms every system needs
1. Error rate: alarm when 5xx errors exceed 1% of total requests.
2. Latency: alarm when P99 latency exceeds your SLA (e.g., 500ms).
3. Queue depth: alarm when SQS ApproximateAgeOfOldestMessage exceeds 5 minutes.

These three catch most production incidents before users notice.
X-Ray
What it solves: Distributed tracing for microservices. Traces a request as it flows across Lambda functions, API Gateway, ECS services, and other AWS resources.
Key talking points:
- X-Ray creates a service map showing how requests flow through your system and where latency or errors originate.
- The X-Ray SDK instruments outgoing HTTP calls, AWS SDK calls, and SQL queries automatically. Add it to your Lambda function or ECS task with minimal code.
- Trace sampling reduces cost: you do not need to trace every request. The default is 1 request/second + 5% of additional requests.
- X-Ray integrates with CloudWatch ServiceLens for a unified view of traces, metrics, and logs.
X-Ray vs third-party tracing
X-Ray is good for basic distributed tracing within AWS. For complex microservice architectures, I prefer Datadog APM or Jaeger with OpenTelemetry. X-Ray's UI for trace analysis is limited compared to dedicated observability platforms. That said, mentioning X-Ray in an interview shows you know the AWS native tool.
CloudTrail
What it solves: Logs every API call made in your AWS account. Who did what, when, from where, and to which resource.
Key talking points:
- CloudTrail delivers API call records to S3 for auditing, compliance, and forensics. Management events (IAM, EC2 operations) are recorded by default and retained for 90 days in the event history.
- Data events (S3 object-level operations, Lambda invocations) require explicit configuration and cost $0.10 per 100,000 events.
- CloudTrail Lake provides SQL queryable event storage for security investigations (who deleted this bucket? when was this IAM policy changed?).
- Organization trails capture events across all accounts in an AWS Organization.
CloudTrail is your forensic lifeline
If someone compromises an IAM key or deletes a resource, CloudTrail is how you trace the damage. Always send CloudTrail logs to a centralized S3 bucket in a separate security account with Object Lock (WORM) enabled. An attacker who compromises your application account should not be able to delete your audit trail.
8. AI/ML Services
AI/ML services are increasingly appearing in system design interviews, especially for recommendation engines, content moderation, document processing, and generative AI applications.
SageMaker
What it solves: End-to-end ML platform for building, training, and deploying machine learning models.
Key talking points:
- SageMaker manages the full ML lifecycle: data labeling (Ground Truth), notebooks, training, hyperparameter tuning, model registry, endpoint deployment, and monitoring.
- Real-time endpoints: dedicated instances serving your model with autoscaling. For GPUs, use `ml.g5.xlarge` for inference ($1.41/hour).
- SageMaker Serverless Inference: scale to zero when no traffic, cold start of 1-2 minutes. Good for low-traffic or experimental endpoints.
- Multi-model endpoints: load multiple models on a single instance and route by request. Saves cost when serving many low-traffic models.
SageMaker in interviews
In most system design interviews, you do not need to describe the ML model training pipeline in detail. Focus on the inference endpoint: "The ML model runs on a SageMaker real-time endpoint behind an API Gateway. The endpoint auto-scales based on invocation count and P99 latency. Model artifacts are stored in S3 and versioned in the SageMaker Model Registry."
Bedrock
What it solves: Managed service for accessing foundation models (Claude, Llama, Titan, Mistral) via API for generative AI applications without managing infrastructure.
Key talking points:
- Bedrock provides a unified API (Converse API) for calling different foundation models. Switching models requires changing one parameter, not your architecture.
- Knowledge Bases implement RAG (Retrieval-Augmented Generation): ingest documents from S3, embed them, store in OpenSearch Serverless, and retrieve relevant chunks to augment model prompts.
- Bedrock Agents can call tools (Lambda functions), maintain conversation state, and orchestrate multi-step workflows autonomously.
- Guardrails filter harmful content, block PII, enforce topic policies, and control hallucination. Layer them in front of any model.
- Pricing: per-token (input and output). Claude 3.5 Sonnet on Bedrock costs $3/million input tokens and $15/million output tokens. Provisioned Throughput reduces per-token cost for steady workloads.
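The per-token pricing is worth being able to do on a whiteboard. A calculator using the Claude 3.5 Sonnet rates quoted above; the request sizes (a 4K-token RAG prompt, a 500-token answer) are assumed for illustration:

```python
def bedrock_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_m: float = 3.00, out_per_m: float = 15.00) -> float:
    """Per-request cost at $3 / $15 per million input / output tokens."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# A typical RAG request: retrieved context + prompt in, a short answer out.
cost = bedrock_cost_usd(input_tokens=4_000, output_tokens=500)
assert abs(cost - 0.0195) < 1e-9     # about 2 cents per request

# At 1M requests/month that is ~$19,500 -- why Provisioned Throughput (or a
# cheaper model tier) matters for steady high-volume workloads.
assert round(cost * 1_000_000) == 19500
```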
Bedrock vs SageMaker for GenAI
"If I am using a foundation model off the shelf (with or without fine-tuning), I use Bedrock. If I am training a custom model from scratch or need full control over the serving infrastructure, I use SageMaker." Bedrock is faster to start, SageMaker is more flexible. For most generative AI applications in interviews (chatbot, RAG, content generation), Bedrock is the right answer.
Comprehend, Rekognition, and Textract
What it solves: Pre-built AI services for natural language processing (Comprehend), image/video analysis (Rekognition), and document extraction (Textract).
Key talking points:
- Comprehend: sentiment analysis, entity extraction, language detection, topic modeling, PII detection. No ML expertise needed. Use for content moderation, document classification, and customer feedback analysis.
- Rekognition: object/face detection, celebrity recognition, text in images, content moderation, custom labels for domain-specific detection. Use for photo moderation, identity verification, and visual search.
- Textract: extract text, tables, and forms from scanned documents and PDFs. Goes beyond OCR by understanding document structure. Use for invoice processing, identity document extraction, and form digitization.
Pre-built AI in system design
When an interviewer asks about content moderation, document processing, or sentiment analysis, reach for these services before saying "train a custom model." "I would use Rekognition for image moderation with a confidence threshold of 90%, backed by a human review queue (A2I) for borderline cases." This shows you prefer managed services over building from scratch.
9. Data and Analytics
Data and analytics services come up in system design interviews involving reporting, ETL pipelines, real-time dashboards, and data warehousing.
Redshift
What it solves: Managed data warehouse for petabyte-scale analytics using SQL. Columnar storage optimized for OLAP queries over large datasets.
Key talking points:
- Redshift Serverless eliminates cluster management. You pay per RPU-hour (Redshift Processing Unit) based on actual query compute. Ideal for variable workloads.
- Redshift Spectrum queries data directly in S3 without loading it into Redshift tables. Combine Redshift tables (hot data) with S3 (cold data) for a lakehouse architecture.
- Redshift ML runs SageMaker models directly in SQL with `CREATE MODEL` statements.
- Concurrency scaling automatically adds cluster capacity when query queues build up. No manual intervention needed.
Redshift vs Athena: the decision
Use Redshift for: dashboards with frequent, complex queries by many concurrent users, sub-second query latency requirements, and workloads where the same data is queried repeatedly. Use Athena for: ad-hoc queries, infrequent analysis, and querying data that is already in S3 without loading it into a warehouse. Redshift is more expensive but faster for repeated queries.
Athena
What it solves: Serverless SQL query engine for data in S3. No infrastructure to manage, pay per query ($5 per TB scanned).
Key talking points:
- Athena is Presto/Trino under the hood. It supports standard SQL, window functions, and complex joins.
- Cost optimization is about scanning less data: partition your S3 data by date/region, use columnar formats (Parquet, ORC), and compress with Snappy or ZSTD. Converting from JSON to Parquet can reduce query costs by 90%.
- Athena Federated Query lets you query RDS, DynamoDB, Redshift, and other sources alongside S3 data.
- Use the Glue Data Catalog as a shared metastore for both Athena and Redshift Spectrum.
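The 90% claim is just scan-volume arithmetic at $5/TB: columnar formats read only the referenced columns, and partition pruning skips irrelevant files. The 10% scan fraction below is an assumed illustration, not a guarantee:

```python
def athena_cost_usd(tb_scanned: float, price_per_tb: float = 5.0) -> float:
    """Athena bills per TB of data scanned."""
    return tb_scanned * price_per_tb

# Assumed: a 2 TB JSON table vs the same data as partitioned Parquet, where
# column projection + partition pruning scans ~10% as many bytes per query.
full_scan = athena_cost_usd(2.0)
pruned = athena_cost_usd(2.0 * 0.10)
assert full_scan == 10.0
assert pruned == 1.0      # the ~90% cost reduction mentioned above
```

At thousands of queries per month, this difference is the whole Athena cost story, which is why the Parquet conversion step shows up in nearly every data lake design.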
Glue
What it solves: Managed ETL (Extract, Transform, Load) service for data preparation and integration. Serverless Apache Spark engine plus a central data catalog.
Key talking points:
- Glue ETL jobs run Apache Spark serverlessly. You define the transformation logic, and Glue manages the cluster.
- Glue Data Catalog is the central metadata store for S3 data lakes. Athena, Redshift Spectrum, and EMR all use it for table definitions.
- Glue Crawlers automatically discover schema from S3 data and populate the catalog. Save hours of manual schema definition.
- Glue Studio provides a visual ETL editor for non-engineers (data analysts, BI teams).
- Pricing: $0.44 per DPU-hour (Data Processing Unit). A small job with 2 DPUs running for 10 minutes costs $0.15.
Glue as the default ETL choice
When an interviewer asks about data pipelines, my default answer is Glue. "I would use Glue ETL to transform raw JSON from S3 into partitioned Parquet, update the Glue Data Catalog, and make the data queryable by Athena and Redshift Spectrum." This covers the full ETL pipeline in one sentence.
EMR (Elastic MapReduce)
What it solves: Managed Hadoop/Spark/Hive/Presto clusters for big data processing. Full control over the cluster configuration and open-source framework versions.
Key talking points:
- EMR is the right choice when Glue ETL is insufficient: custom Spark configurations, specific library versions, notebook-based exploration, or Hadoop ecosystem tools (HBase, Flink, Hudi).
- EMR Serverless eliminates cluster management. Submit Spark or Hive jobs and EMR provisions the compute. Pricing: per vCPU-hour and per GB-hour.
- EMR on EKS runs Spark on your existing Kubernetes cluster, sharing node resources with other workloads.
- Use Spot instances for EMR task nodes (not core or master nodes) to reduce costs by 60-90%.
EMR vs Glue in 10 seconds
Glue: simple ETL, no cluster management, visual editor, good for routine data transformation. EMR: complex big data workloads, custom configurations, interactive notebooks, full Hadoop ecosystem access. Default to Glue and escalate to EMR only when Glue cannot meet the requirement.
Kinesis Data Firehose
What it solves: Fully managed delivery stream that captures, transforms, and loads streaming data into S3, Redshift, OpenSearch, and HTTP endpoints. Zero administration.
Key talking points:
- Firehose automatically batches, compresses, and encrypts data before delivery. Minimum buffer interval: 60 seconds.
- Built-in format conversion: JSON to Parquet or ORC using the Glue Data Catalog schema. This eliminates a separate ETL step for data lake ingestion.
- Lambda transformation: run a Lambda function on each batch for data enrichment, filtering, or reformatting before delivery.
- Firehose is the simplest path from real-time data to S3. "Kinesis Data Streams for real-time processing, Firehose for delivery to storage."
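Firehose's buffering semantics (flush when either the size threshold or the time interval is hit, whichever comes first) can be sketched in a few lines. The thresholds in the example are deliberately tiny for the demonstration, not Firehose's actual buffer hints:

```python
class Buffer:
    """Sketch of Firehose-style buffering: accumulate records, flush a batch
    when size or age crosses its threshold, whichever comes first."""
    def __init__(self, max_bytes: int = 5 * 1024 * 1024, max_seconds: float = 60):
        self.max_bytes, self.max_seconds = max_bytes, max_seconds
        self.records, self.size, self.opened_at = [], 0, None
        self.delivered = []   # stand-in for S3 objects written

    def put(self, record: bytes, now: float):
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_seconds:
            self.flush()

    def flush(self):
        self.delivered.append(b"".join(self.records))  # one object per batch
        self.records, self.size, self.opened_at = [], 0, None

buf = Buffer(max_bytes=100, max_seconds=60)
buf.put(b"x" * 60, now=0)
buf.put(b"x" * 60, now=1)     # 120 bytes >= 100 -> size-triggered flush
assert len(buf.delivered) == 1
buf.put(b"x" * 10, now=2)
buf.put(b"x" * 10, now=70)    # batch is 68s old -> time-triggered flush
assert len(buf.delivered) == 2 and len(buf.delivered[1]) == 20
```

Batching is why Firehose output lands in S3 as a stream of modest-sized objects rather than one file per event, which is exactly what Athena-friendly data lakes want.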
The streaming data pipeline pattern
"Producers send events to Kinesis Data Streams. Lambda consumers process events in real-time (alerts, aggregations). Kinesis Data Firehose consumes from the same stream and delivers batched, Parquet-formatted data to S3 for the data lake. Athena queries the S3 data for ad-hoc analysis." This is the standard real-time + batch pattern.
10. Infrastructure as Code
Infrastructure as Code (IaC) rarely comes up as a primary interview topic, but mentioning it shows operational maturity. "All infrastructure is defined in CDK/CloudFormation and deployed via CI/CD" is a powerful one-liner.
CloudFormation
What it solves: AWS-native IaC service. Define your infrastructure as JSON or YAML templates and deploy them as versioned stacks.
Key talking points:
- CloudFormation creates, updates, and deletes resources in dependency order. If creation fails, it rolls back to the previous state.
- Stack sets deploy the same template across multiple accounts and regions simultaneously (for AWS Organizations).
- Drift detection identifies resources that have been manually modified outside CloudFormation.
- Change sets preview changes before applying them, reducing "surprise" modifications.
CloudFormation vs Terraform
"I prefer CDK (which generates CloudFormation) for AWS-only environments because of the deep AWS integration. I use Terraform for multi-cloud or when the team already has Terraform expertise." This shows you understand both tools and make pragmatic decisions.
CDK (Cloud Development Kit)
What it solves: Define infrastructure using familiar programming languages (TypeScript, Python, Java, Go, C#). CDK synthesizes your code into CloudFormation templates.
Key talking points:
- CDK Constructs are reusable building blocks at three levels: L1 (raw CloudFormation resources), L2 (opinionated defaults with best practices), L3 (patterns that combine multiple resources).
- L2 constructs handle 80% of the boilerplate: a `Function` construct creates the Lambda function, its execution role, and its log group. A `RestApi` construct creates the API Gateway, stages, and deployment.
- CDK is the recommended IaC tool for new AWS projects. It provides type safety, IDE autocomplete, and testability (write unit tests for your infrastructure).
- `cdk diff` shows exactly what changes will be applied before deployment.
SAM (Serverless Application Model)
What it solves: An extension of CloudFormation optimized for serverless applications (Lambda, API Gateway, DynamoDB, Step Functions). Simpler syntax for common serverless patterns.
Key talking points:
- SAM templates use shorthand resource types like `AWS::Serverless::Function` that expand into multiple CloudFormation resources (Lambda function, IAM role, event source mapping, log group).
- SAM CLI provides local development: `sam local invoke` runs a Lambda function locally in a Docker container, and `sam local start-api` starts a local API Gateway.
- SAM Accelerate (`sam sync`) speeds up development by deploying code changes directly to the cloud in seconds, bypassing full CloudFormation deployments.
- Use SAM for small serverless applications. Use CDK for anything more complex, or when you want a real programming language instead of YAML.
IaC in your interview answer
At the end of any system design answer, add: "All of this infrastructure is defined in CDK, deployed via CI/CD pipeline using CodePipeline, with separate stacks for networking, compute, and data. Each environment (dev, staging, production) has its own stack with parameterized configuration." This takes 15 seconds and signals operational maturity.
Quick Reference: When to Use What
This table summarizes the key decision points for the most commonly confused services.
| Decision | Use This | When |
|---|---|---|
| Compute: simple API | Lambda or Fargate | Lambda for bursty/low-traffic, Fargate for sustained |
| Compute: need GPU | EC2 (P4d/G5) or SageMaker | EC2 for custom workloads, SageMaker for ML |
| Database: relational | Aurora | Default for PostgreSQL/MySQL on AWS |
| Database: key-value | DynamoDB | Known access patterns, single-digit ms latency |
| Database: cache | ElastiCache Redis | Cache in front of DB, sessions, leaderboards |
| Queue: async processing | SQS | Decouple services, background jobs |
| Queue: fan-out | SNS + SQS | One event, multiple consumers |
| Streaming: real-time | Kinesis Data Streams | Ordered events, replay, multiple consumers |
| Streaming: Kafka | MSK | Existing Kafka ecosystem, Kafka Connect |
| Storage: objects | S3 | Files, images, backups, data lake |
| Storage: block | EBS gp3 | EC2 instance storage, databases |
| CDN | CloudFront | All public-facing content |
| DNS routing | Route 53 | Multi-region, failover, latency-based |
| API management | API Gateway (HTTP API) | Auth, throttling, Lambda integration |
| Auth: users | Cognito | User sign-up/sign-in for apps |
| Auth: AWS resources | IAM roles | Service-to-service, cross-account |
| Secrets | Secrets Manager | Credentials with rotation |
| ETL | Glue | Standard data transformation |
| Data warehouse | Redshift Serverless | Frequent complex queries |
| Ad-hoc query | Athena | Infrequent S3 queries |
| GenAI | Bedrock | Foundation model API access |
| Custom ML | SageMaker | Train and deploy custom models |
| IaC | CDK | Default for new AWS projects |
Interview Cheat Sheet
When an AWS service question comes up in your interview, use this mental framework:
- Name the service and its one-line purpose.
- State the architectural pattern: "S3 + CloudFront for static assets" or "SQS + Lambda for async processing."
- Mention one production gotcha: "S3 throughput limits are per prefix, not per bucket" or "Lambda cold starts are 200ms for Python but 3-10 seconds for Java." (Do not cite S3 eventual consistency as a gotcha; S3 has been strongly consistent since December 2020.)
- Give a number: latency, throughput, cost, or limit that demonstrates practitioner knowledge.
- Mention what you would NOT use: "I would not use DynamoDB here because the access patterns are not yet defined."
This five-step formula takes 30 seconds per service and demonstrates depth without rambling. Interviewers prefer a practitioner who knows trade-offs over someone who recites documentation.
Common Architectural Patterns with AWS Services
To tie everything together, here are the four patterns I reach for most often in system design interviews.
Pattern 1: The basic web app. CloudFront + S3 for static assets, API Gateway + Lambda (or ALB + Fargate) for the API, Aurora for relational data, ElastiCache for caching, Cognito for auth. This covers 80% of system design questions.
Pattern 2: Event-driven microservices. API Gateway receives requests, Lambda processes them, SNS fans out events to SQS queues, each microservice consumes from its own queue. DynamoDB for service-local storage. EventBridge for cross-service routing with content filtering.
Pattern 3: Real-time data pipeline. Kinesis Data Streams ingests events, Lambda processes in real-time, Firehose delivers to S3 in Parquet format, Athena or Redshift queries the data lake. CloudWatch dashboards for operational visibility.
Pattern 4: GenAI application. API Gateway + Lambda for the API layer, Bedrock for model inference, Knowledge Bases + OpenSearch Serverless for RAG, DynamoDB for conversation history, S3 for document storage, Guardrails for content safety.
In every pattern, wrap the infrastructure in CDK, deploy via CI/CD, encrypt everything with KMS, log everything to CloudWatch, and monitor with alarms. These operational concerns are not optional extras; they are table stakes.
The best interview candidates do not just name services. They explain why they chose one service over another, what failure mode they are protecting against, and what the cost trade-off looks like. That is what this guide prepares you for.