Push vs pull
When to use WebSocket, SSE, long polling, and webhooks, and the backpressure and fan-out traps that kill push systems quietly at scale.
TL;DR
- Use push (WebSocket/SSE) when sub-second latency matters to the user and the server can predict when to send: chat, live prices, collaborative editing.
- Use pull (polling) when clients need data on their own schedule, the server doesn't know which clients care about a given event, or the data changes less often than the poll interval.
- Use webhooks when the consumer is a server, not a browser, and you want reliable server-to-server event delivery with retry semantics.
- Use long polling as the stepping stone: lower latency than polling, no persistent connection overhead, works through any proxy or firewall.
- Push looks simpler until you hit 100K concurrent connections. At that scale you're managing connection state, fan-out amplification, backpressure, and stale connection cleanup simultaneously.
- Pull looks wasteful until you realize it gives you natural backpressure, horizontal scalability, and zero connection-state management for free.
The Framing
In 2013, a VC-backed retail startup built a live inventory dashboard for warehouse workers. They chose WebSockets. Every item scan showed up immediately on every manager's screen. Demo looked incredible.
Three months after launch, their on-call engineer got paged at 2 a.m. Their WebSocket server was struggling to maintain 8,000 connections, each receiving 40 messages per second from 200 inventory scanners. Memory on the single WebSocket server had climbed past 90%. The server was spending more time managing TCP state than actually delivering messages.
The fix wasn't a bigger machine. It was realizing their actual requirement was "managers see updates within 5 seconds," not "managers see updates in 50ms." A 5-second poll interval would have worked perfectly, scaled effortlessly across multiple app servers, and required zero specialized infrastructure.
The real question is never "push or pull?" It is: what is the actual latency requirement, and who needs to know when data changes?
How Each Works
Push: The Server Is in Control
In a push model, the server holds the initiative. When an event occurs, the server sends it to all interested clients without waiting to be asked. The client's job is to stay connected and receive.
The server must maintain some form of persistent state: which clients are connected, and which events they care about. Every event multiplies by the number of subscribers. A single price_update event sent to 10,000 connected clients requires 10,000 individual TCP writes.
The server bears the full cost of delivering each event to every subscriber. The client bears almost no cost. This asymmetry is fine at small scale and becomes the bottleneck at large scale.
I keep this asymmetry as the first thing I draw on the whiteboard when reviewing any push design: how many subscribers per event, and how often does the event fire? Those two numbers tell you whether push is feasible before you write a single line of code.
Pull: The Client Is in Control
In a pull model, the client holds the initiative. It asks for data when it wants it. The server responds and closes the connection; state lives in the database.
I'll often recommend starting here even when the team is leaning toward push. Validate the actual latency requirement first. If 5 seconds is acceptable, the polling architecture gives you horizontal scalability and backpressure for free: an overwhelmed client simply stops polling. The server never needs to know who is listening or when.
The maximum latency from event to client awareness equals the poll interval. If a price changes at t=0 and the client polls every 5 seconds, the client may not see the change until t=5. That's the cost you pay for simplicity.
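A minimal sketch of that loop, assuming a hypothetical `/api/updates` endpoint that returns items newer than a `since` cursor (the endpoint and `Update` shape are illustrations, not a real API):

```typescript
// Sketch of a polling client with a cursor. The /api/updates endpoint and
// the Update shape are assumptions for illustration.
type Update = { id: number; payload: unknown };

// Pure cursor logic: after a batch arrives, only ask for what we haven't seen.
function advanceCursor(cursor: number, batch: Update[]): number {
  return batch.reduce((max, u) => Math.max(max, u.id), cursor);
}

async function pollLoop(intervalMs: number) {
  let since = 0;
  for (;;) {
    const res = await fetch(`/api/updates?since=${since}`);
    const batch: Update[] = await res.json();
    since = advanceCursor(since, batch);
    // render(batch) ...
    await new Promise((r) => setTimeout(r, intervalMs)); // worst-case staleness = intervalMs
  }
}
```

The `since` cursor is what keeps repeated polls cheap: the server returns only the delta, and an empty response costs almost nothing.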
The Push Mechanisms
Push is not one thing. It's a family of protocols with very different properties.
WebSocket: Bidirectional Real-Time Channel
WebSocket starts as an HTTP request, then upgrades via the Upgrade: websocket header to a persistent, full-duplex channel over the same TCP connection. From that point on, both sides can send frames at any time, in either direction, with only a few bytes of framing overhead per message instead of full HTTP headers.
```typescript
// Server: Node.js with the ws library
import { WebSocketServer, WebSocket } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  // Register client in your subscription store
  subscriptionStore.add(ws);

  ws.on('message', (data) => {
    let msg: Record<string, unknown>;
    try {
      msg = JSON.parse(data.toString()) as Record<string, unknown>;
    } catch {
      ws.send(JSON.stringify({ error: 'invalid_json' }));
      return;
    }
    if (msg.action === 'subscribe') {
      subscriptionStore.subscribe(ws, msg.channel as string);
    }
  });

  ws.on('close', () => {
    // CRITICAL: always clean up or your store leaks stale sockets
    subscriptionStore.remove(ws);
  });
});

// When an event fires elsewhere in your system:
function broadcastPriceUpdate(ticker: string, price: number) {
  const subscribers = subscriptionStore.getByChannel(`prices:${ticker}`);
  const payload = JSON.stringify({ event: 'price', ticker, price });
  for (const ws of subscribers) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(payload); // one TCP write per subscriber
    }
  }
}
```
The readyState check before sending is critical. Stale connections that haven't sent a close frame yet will cause writes to dead sockets and silent message loss.
WebSocket is the right choice when the client also sends data to the server frequently. Chat, collaborative editing, multiplayer game state, and live trading interfaces all benefit from bidirectionality.
Server-Sent Events (SSE): One-Way Stream
SSE is a regular HTTP GET that the server never closes. The response body is an open-ended stream of newline-delimited `data:` events. The browser's built-in EventSource API handles reconnection automatically.
```typescript
// Express SSE endpoint
import type { Response } from 'express';

const sseClients = new Map<string, Response>();

app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Register this response as a subscriber
  const clientId = crypto.randomUUID();
  sseClients.set(clientId, res);

  // Send initial state immediately (prevents an empty wait on first connect)
  res.write(`data: ${JSON.stringify({ type: 'connected', id: clientId })}\n\n`);

  // Ping every 25s to prevent proxy timeouts
  const heartbeat = setInterval(() => {
    res.write(': ping\n\n'); // comment line, no "data:" — client ignores it
  }, 25_000);

  req.on('close', () => {
    clearInterval(heartbeat);
    sseClients.delete(clientId); // always clean up
  });
});

// Broadcasting an event to all SSE clients
function broadcastEvent(payload: object) {
  const data = JSON.stringify(payload);
  for (const [, res] of sseClients) {
    res.write(`data: ${data}\n\n`);
  }
}
```
The heartbeat keep-alive is not optional. Load balancers and corporate proxies close idle connections after 60–90 seconds. Without it, you'll see clients mysteriously disconnect in production and not in local development.
I reach for SSE over WebSocket by default on any feature where the client only reads. The built-in EventSource reconnect logic has saved me hours of manual reconnect implementation, and you get plain HTTP semantics that work through any proxy.
SSE is the right choice when the client only needs to receive, not send. Notification feeds, live dashboard metrics, deployment progress streams, and AI model response streaming (ChatGPT's interface is SSE) are all natural SSE use cases.
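To make the wire format concrete, here is a deliberately simplified parser for the frames such an endpoint emits. The real EventSource does this for you, plus reconnection and Last-Event-ID tracking; multi-line `data:` events are ignored here for brevity.

```typescript
// Simplified SSE frame parser (sketch): extracts single-line `data:` payloads
// and drops comment lines like `: ping`.
function parseSseChunk(chunk: string): string[] {
  return chunk
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice('data: '.length));
}
```

Note how the heartbeat comment line (`: ping`) falls out naturally: anything without a `data:` prefix is simply ignored by the client.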
Webhooks: Push for Server-to-Server
A webhook is push with the roles inverted. Instead of a server pushing to browsers, your service pushes HTTP POST requests to another service's registered URL when events occur.
```typescript
// Your webhook endpoint
// CRITICAL: this route must receive the raw body, not parsed JSON.
// Mount express.raw() on this route so a global express.json() middleware
// doesn't consume the body first.
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

app.post(
  '/api/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    // 1. Verify the signature FIRST — never skip this
    const sig = req.headers['stripe-signature'] as string;
    let event: Stripe.Event;
    try {
      event = stripe.webhooks.constructEvent(req.body, sig, process.env.STRIPE_WEBHOOK_SECRET!);
    } catch (err) {
      // Invalid signature = attacker spoofing events
      return res.status(400).send('Webhook signature verification failed');
    }

    // 2. ACK immediately to prevent retry storms
    res.status(200).json({ received: true });

    // 3. Process ASYNC after responding. Queuing prevents slow processing
    // from making Stripe retry an event that was already received successfully.
    await queue.enqueue({
      type: 'stripe_webhook',
      eventId: event.id, // idempotency key
      eventType: event.type,
      data: event.data.object,
    });
    // A separate worker processes the queue
  }
);
```
The two absolutes for webhook endpoints: verify the HMAC signature before doing anything, and return 200 before processing. Violating either causes security holes (accepting spoofed events) or retry storms (processing takes 6s, source treats it as failure, retries, your handler processes the same event twice).
Long Polling: Pull That Mimics Push
Long polling is the awkward middle child. The client sends an HTTP request that the server holds open until data is available. On response, the client immediately opens a new connection.
The latency behavior is nearly identical to real push: the client gets data within milliseconds of the event firing. But the delivery mechanism is pure HTTP, works through any proxy, needs no special protocol support, and degrades cleanly. No WebSocket upgrade failure, no SSE connection drop.
Where long polling falls apart: each pending request holds a server thread (or an event loop subscription, in Node.js). At 10,000 concurrent long-polling clients, you have 10,000 requests in flight simultaneously. With traditional thread-per-request servers, thread pool exhaustion is a real risk.
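In Node.js, the hold-until-data mechanic is cheap because parked requests are promise resolvers rather than threads. A sketch of the core, with the Express wiring shown as an illustrative comment:

```typescript
// Long-polling core (sketch): requests park as promise resolvers;
// publish() wakes every parked request with the new event.
type Waiter = (events: unknown[]) => void;
const waiters: Waiter[] = [];

function publish(event: unknown) {
  // One event wakes every currently parked request.
  for (const wake of waiters.splice(0, waiters.length)) wake([event]);
}

function waitForEvents(timeoutMs: number): Promise<unknown[]> {
  return new Promise((resolve) => {
    const waiter: Waiter = (events) => {
      clearTimeout(timer);
      resolve(events);
    };
    const timer = setTimeout(() => {
      const i = waiters.indexOf(waiter);
      if (i >= 0) waiters.splice(i, 1);
      resolve([]); // timed out empty: the client simply re-polls
    }, timeoutMs);
    waiters.push(waiter);
  });
}

// Hypothetical Express wiring:
// app.get('/poll', async (_req, res) => {
//   res.json(await waitForEvents(25_000)); // hold up to 25s, then respond
// });
```

The timeout path matters: responding with an empty body before the proxy's idle limit is what keeps long polling working through infrastructure that would kill a silent connection.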
Head-to-Head Comparison
| Dimension | Short Polling | Long Polling | SSE | WebSocket | Webhook |
|---|---|---|---|---|---|
| Client initiates | Yes | Yes | Once | Once | No |
| Bidirectional | No | No | No | Yes | No |
| Latency | = poll interval | ~event time | ~event time | ~event time | ~event time |
| Connection state | None | Per-request | Per-client | Per-client | None |
| Works through proxy | Yes | Yes | Yes | Sometimes | Yes |
| Client type | Any | Any | Browser | Browser + native | Server |
| Reconnect handling | Implicit | Manual | Auto (EventSource) | Manual | N/A |
| Wasted requests | Many | Few | None | None | None |
| Scale model | Stateless | Stateless | Stateful | Stateful | Stateless |
| Libraries needed | None | None | EventSource (built-in) | ws / socket.io | HTTPS client |
The "Works through proxy" entry for WebSocket deserves a footnote. WebSockets work through most modern proxies, but you will encounter corporate firewalls, Nginx configs, and load balancers that drop WebSocket connections silently. SSE uses plain HTTP and never has this problem.
The Push Traps You'll Hit at Scale
These are the failure modes that don't show up in demos but will end your on-call weekend.
Fan-Out Amplification
Every push event multiplies by N subscribers. A 1 KB message sent to 500,000 subscribers is 500 MB of outbound bandwidth, every time that event fires.
During peak events (a big game kicks off, a post goes viral, a flash sale launches), the fan-out cost spikes exactly when your systems are under the most load. I've seen engineering teams add caching, optimize databases, and tune app servers, only to find that the WebSocket fan-out in a completely different service was the actual bottleneck under load.
The mitigation is fan-out-on-read: don't push directly to all subscribers. Push one message to a pub/sub layer (Redis Pub/Sub, Kafka, or a message queue) and let per-region edge servers pull and fan out to their local connection set. Twitter uses exactly this split: the fanout service writes to a Kafka topic; regional socket servers consume from it and push to their local connections.
```mermaid
flowchart TD
    subgraph Source["⚙️ Event Source"]
        App["App Server\nevent fires"]
    end
    subgraph FanoutTier["📨 Fan-out Tier: Single Write"]
        Kafka["📨 Kafka Topic\nprice-updates\n1 message written"]
    end
    subgraph EdgeTier["🔀 Regional Socket Servers: Localized Fan-out"]
        US["🔀 US East\n200K connections\n→ 200K socket writes"]
        EU["🔀 EU West\n150K connections\n→ 150K socket writes"]
        AP["🔀 APAC\n150K connections\n→ 150K socket writes"]
    end
    App -->|"1 write"| Kafka
    Kafka -->|"1 message consumed"| US
    Kafka -->|"1 message consumed"| EU
    Kafka -->|"1 message consumed"| AP
```
Fan-out via message queue: the source writes once, regional socket servers fan out locally. Without this, the source makes 500K socket writes per event.
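The per-edge-server half of this can be sketched as a single function. The `Socket` interface here is an illustration modeled loosely on the ws library's `readyState` convention:

```typescript
// Local fan-out at one edge server: one consumed pub/sub message becomes
// N local socket writes; dead sockets are skipped, not written to.
interface Socket { readyState: number; send(data: string): void; }
const OPEN = 1; // ws convention: WebSocket.OPEN === 1

function fanOutLocal(payload: string, sockets: Iterable<Socket>): number {
  let delivered = 0;
  for (const ws of sockets) {
    if (ws.readyState !== OPEN) continue; // stale connection: skip
    ws.send(payload);
    delivered++;
  }
  return delivered;
}
```

The source service never touches a socket; it produces one message, and each region pays only for its own local connection set.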
Backpressure: The Push System Killer
Push has no natural mechanism for consumers to signal "slow down." The producer has no visibility into whether the consumer's buffer is full.
Pull inherits backpressure for free. The consumer only requests what it can process: read 100 messages, process them, ask for 100 more. The queue never grows unbounded.
For push systems where messages can arrive faster than the consumer processes them, you need explicit flow control: credit-based windowing (consumer grants the producer permission to send N messages before acknowledging), or consumer-side rate limiting (server checks consumer's processing rate before sending).
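Credit-based windowing can be sketched as a small buffer in front of the socket write. This `CreditWindow` class is an illustration of the idea, not any particular library's API:

```typescript
// Credit-based flow control (sketch): the consumer grants N credits;
// the producer may send only while credits remain, buffering otherwise.
class CreditWindow {
  private credits = 0;
  private buffered: string[] = [];

  constructor(private send: (msg: string) => void) {}

  // Consumer side calls this (e.g. via a control message) after processing a batch.
  grant(n: number) {
    this.credits += n;
    this.flush();
  }

  // Producer side: enqueue instead of writing straight to the socket.
  offer(msg: string) {
    this.buffered.push(msg);
    this.flush();
  }

  private flush() {
    while (this.credits > 0 && this.buffered.length > 0) {
      this.send(this.buffered.shift()!);
      this.credits--;
    }
  }

  get backlog() { return this.buffered.length; } // alert when this grows
}
```

The backlog size becomes your overload signal: a monotonically growing buffer means the consumer can't keep up, and you can drop, coalesce, or disconnect instead of exhausting memory.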
The Stale Connection Problem
WebSocket connections silently die. Server process crashes, network partitions, NAT timeouts, client browser tab backgrounded and phone locked. The server TCP stack has no idea the client is gone until it tries to write to the socket.
At 100K connections, roughly 5–10% of your connections at any moment are dead sockets the server doesn't know about yet. Your memory grows, your subscription store has ghost entries, and broadcastPriceUpdate is writing to sockets that silently drop the frames.
The fix is a heartbeat ping/pong cycle:
```typescript
const PING_INTERVAL_MS = 30_000;

wss.on('connection', (ws) => {
  let isAlive = true;
  ws.on('pong', () => { isAlive = true; });

  const heartbeat = setInterval(() => {
    if (!isAlive) {
      ws.terminate(); // forcibly kill — close() is polite but unreliable on dead connections
      return;
    }
    isAlive = false;
    ws.ping(); // a live connection answers with a pong before the next tick
  }, PING_INTERVAL_MS);

  ws.on('close', () => clearInterval(heartbeat));
});
```
Set the ping interval to 30 seconds and reset isAlive to false on each ping. If no pong has arrived by the next 30-second tick, terminate the connection. This pattern keeps your active connection count accurate and your memory bounded.
When Push Wins
So when does push actually win? The honest answer is narrower than most engineers expect.
In interviews, I see candidates default to WebSocket for any feature described as "real-time." My first question is always: what is the actual latency requirement? That single question changes the answer most of the time.
Use push when all three conditions hold:
- The latency requirement is below what any poll interval can achieve (sub-second)
- The server knows exactly which clients care about which events (subscription model)
- The server can sustain connection state for the peak concurrent connection count
Concrete scenarios where push is clearly correct:
- Live multiplayer game state synchronization (sub-50ms)
- Collaborative document editing (cursor positions, presence indicators)
- Financial market data feeds (bid/ask changes arriving 1,000×/second)
- Chat systems where message delivery within 200ms is a product requirement
- Real-time monitoring dashboards showing live metrics
For your interview: say you'll use WebSocket for bidirectional real-time, SSE for server-to-browser push with no client messages, and webhooks for server-to-server event delivery.
When Pull Wins
Pull wins most of the time, for most systems, at most scales.
I'll often see engineers propose WebSocket for social feed updates or notification feeds. A 10-second poll interval is invisible to users and eliminates an entire stateful infrastructure tier. The burden of proof should be on push, not pull.
Use pull when:
- Users would not notice a delay of 5–60 seconds
- The server cannot predict which clients care about a given event
- The connection pool overhead of persistent connections would consume more resources than the poll requests themselves
- The system spans multiple services and you need natural backpressure at every consumer boundary
- Clients connect intermittently (mobile apps, scheduled jobs, third-party integrations)
Concrete scenarios where pull is clearly correct:
- Email inbox refresh (checking every 30 seconds is fine)
- Feed/timeline updates in a social app (a 10-second delay is invisible)
- CI/CD build status (polling a job status endpoint every 5 seconds)
- Any consumer-to-broker queue consumption (Kafka, SQS, RabbitMQ, all pull-based)
- Third-party API integrations where you need to audit which events you've processed
If you're unsure whether you need push, you probably don't. Poll first. Add push when users complain about latency.
The Nuance
The False Choice: Hybrid Is Usually the Answer
Real systems use push for the perception of real-time and pull for the actual data delivery. This is not a cop-out. It's the correct architecture at scale.
The hybrid pattern is what I draw on the whiteboard first for any large-scale notification or feed system. The key insight that took me a while to internalize: the WebSocket channel is not a data channel, it's a signaling channel.
The pattern: push a lightweight notification ("something changed for you"), pull the actual data.
```mermaid
sequenceDiagram
    participant App as ⚙️ App Server
    participant WS as 🔀 WebSocket Gateway
    participant C as 👤 Client Browser
    Note over App,C: Hybrid push-then-pull
    App->>WS: "invalidate user:7429 feed"
    WS->>C: push: { type: 'invalidate', resource: 'feed' }
    Note over C: Cache invalidated locally
    C->>App: GET /api/feed?since=lastSeenId
    App-->>C: 200 OK { items: [...] }
    Note over C: Full data fetched via HTTP
```
The WebSocket message is just a 40-byte invalidation signal. The actual data is fetched via a normal HTTP GET that can be served from a CDN, is cacheable, and benefits from all the infrastructure around normal API calls.
This hybrid pattern is used by Slack (WebSocket for presence/typing indicators, REST for message history), by most modern gaming backends (WebSocket for game events, HTTP for leaderboard data), and by Linear (WebSocket for live document sync, REST for initial document load at page open).
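The client side of push-then-pull can be sketched as a dirty-set that coalesces signals into fetches. The class and callback names here are illustrative, not from any of those products:

```typescript
// Client side of push-then-pull (sketch). Signals mark resources dirty;
// flush() coalesces a burst of signals into one refetch per resource.
class InvalidationClient {
  private dirty = new Set<string>();

  constructor(private refetch: (resource: string) => void) {}

  // Called for every frame arriving on the signaling socket.
  onSignal(raw: string) {
    const msg = JSON.parse(raw) as { type?: string; resource?: string };
    if (msg.type === 'invalidate' && msg.resource) this.dirty.add(msg.resource);
  }

  // Called on a short timer after signals arrive.
  flush(): string[] {
    const resources = [...this.dirty];
    this.dirty.clear();
    for (const r of resources) this.refetch(r); // e.g. GET /api/feed?since=lastSeenId
    return resources;
  }
}
```

The Set is doing real work here: ten invalidation signals for the same feed in one burst still produce exactly one HTTP fetch.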
Message Queue Pull Is Not Polling
When engineers see Kafka or SQS consumers using a poll() loop, they sometimes call it "polling" with a negative connotation. It isn't the same thing as HTTP short polling.
SQS long polling holds the request open for up to 20 seconds (WaitTimeSeconds). Kafka's consumer poll(timeout) blocks until records arrive or the timeout you pass elapses; max.poll.interval.ms separately bounds how long a consumer may go between polls before it's evicted from its group. These are event-driven pull mechanisms with built-in long-poll semantics and almost zero wasted network round-trips when no messages exist.
More importantly: message queue pull gives you consumer groups, independent consumption offsets, and per-partition backpressure. These properties are what make Kafka the backbone of most large-scale data pipelines. A push-based message queue would require the broker to know which consumers are alive, track delivery per consumer, and retry failed deliveries. That's exactly the operational complexity that Kafka sidesteps by making consumers pull.
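The backpressure property is visible in the shape of the consume loop itself. The broker interface below is hypothetical; real Kafka and SQS clients differ, but the control flow is the same:

```typescript
// Pull-based consumption (sketch): the consumer bounds its own in-flight work.
interface Broker { fetch(max: number): string[]; }

function consumeBatch(broker: Broker, handle: (m: string) => void, max = 100): number {
  const batch = broker.fetch(max); // ask only for what we can process
  for (const m of batch) handle(m); // the next fetch can't happen until this finishes
  return batch.length;
}
```

Nothing here asks the broker to slow down, because nothing needs to: a slow consumer simply fetches less often, and the backlog stays on the broker's durable log instead of in the consumer's memory.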
Real-World Examples
Slack: WebSocket for meta, REST for content
Slack's desktop client maintains a WebSocket connection to their gateway per active session. That connection carries only lightweight events: message-posted signals, typing indicators, and user presence changes. When you receive a "message posted" signal, the client fires a REST conversations.history API call to fetch the actual message content. This is the hybrid pattern at production scale. The WebSocket channel carries roughly 50 bytes per event; the payload is fetched separately and cached locally.
I bring up Slack's hybrid model in every system design interview about real-time messaging. It shows you understand that WebSocket is not a replacement for HTTP. It's a complement for the events that HTTP can't efficiently deliver.
Uber: Demand-supply matching via pull at the broker level
Uber's location updates from drivers arrive via HTTP POST every 5 seconds. Driver location is consumed by a matching service that pulls from an in-memory geospatial index. Riders see "near real-time" driver movement. The actual position they see is exactly the last polled position, up to 5 seconds stale. Uber determined that 5-second staleness in driver position was invisible to users because network rendering latency and UI animation smooth over the gap.
The lesson: your latency requirement is the user-visible latency, not the technical update frequency. Uber's riders don't need position accuracy under 1 second. Designing for it would have added complexity and cost for zero user benefit.
Stripe Webhooks: Push with at-least-once delivery
Stripe's webhook system delivers all events as HTTP POSTs with exponential backoff retry (5s, 5min, 30min, 2h, 5h, 10h, 24h). Your endpoint may receive the same event more than once if:
- Your server returns non-200 during processing
- A network partition causes the delivery to time out
- Your server crashes after processing but before returning 200
This is at-least-once delivery, and every serious Stripe integration handles it by storing event IDs and skipping duplicate processing. The event.id field is the idempotency key. Missing this produces double-charges, double-fulfillments, and doubled email sends (all of which have happened to startups that skipped idempotency handling).
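A sketch of the dedupe step. The in-memory Set stands in for a database table with a unique constraint on event ID; in production the check-and-record must be atomic (SQL unique index, Redis SETNX):

```typescript
// Idempotent webhook processing (sketch): at-least-once delivery in,
// exactly-once side effects out.
const processedEventIds = new Set<string>();

function processOnce(eventId: string, handler: () => void): boolean {
  if (processedEventIds.has(eventId)) return false; // duplicate delivery: skip
  processedEventIds.add(eventId); // in production: atomic insert with unique constraint
  handler();
  return true;
}
```

The return value matters operationally: a duplicate is a success (respond 200), not an error, or the source will keep retrying the very event you already handled.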
How This Shows Up in Interviews
Here's the honest answer on this topic in interviews: most candidates pick one and defend it. The correct answer is different for different requirements, and showing the reasoning chain is what earns staff-level credit.
Interview tip: state the latency requirement first
Before naming a protocol, say: "What's the latency tolerance for this feature?" If the interviewer says 5 seconds is fine, use polling. If they say sub-second, discuss push options. That one question changes the entire answer, and asking it signals you understand the tradeoff.
The WebSocket default is a trap
Juniors and many mid-levels jump to WebSockets for any 'real-time' feature. Interviewers who know the domain will probe: 'How do you handle 1M concurrent connections on a single server?' If you said WebSocket but didn't account for connection state, fan-out cost, and horizontal scaling strategy, you've signaled that WebSocket was pattern-matching, not reasoning.
Depth expected at senior/staff level:
- Name the latency requirement and derive the protocol from it, not the other way around.
- For WebSocket at scale: fan-out via message queue, not direct socket writes from the app server. The app server produces to Kafka; socket gateway servers consume and write to local connections.
- Mention connection state as a scaling concern: how many connections per server, how do you route from app server to the right socket server for a given user ID.
- Address the stale connection problem: heartbeat ping/pong or OS-level TCP keepalives with short idle timers.
- For webhooks: idempotency by event ID is non-negotiable. Mention at-least-once delivery semantics and how your processing handles duplicate delivery.
- For polling: explain how ETag/If-None-Match headers make polling almost free when data hasn't changed. The server compares the hash and returns 304 Not Modified with an empty body: minimal CPU, minimal bandwidth.
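A framework-agnostic sketch of that check. Hashing the serialized body is one common way to derive the ETag; the function names here are illustrative:

```typescript
import { createHash } from 'node:crypto';

// Derive a strong ETag from the response body.
function computeEtag(body: string): string {
  return '"' + createHash('sha256').update(body).digest('hex').slice(0, 16) + '"';
}

// What a handler returns for a GET with an optional If-None-Match header.
function conditionalGet(body: string, ifNoneMatch?: string) {
  const etag = computeEtag(body);
  if (ifNoneMatch === etag) return { status: 304 as const, etag }; // unchanged: empty body
  return { status: 200 as const, etag, body };
}
```

In practice you'd cache the ETag alongside the data so the hash isn't recomputed per poll, but even this naive version turns most poll responses into a handful of header bytes.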
Common follow-up questions and strong answers:
| Interviewer asks | Strong answer |
|---|---|
| "How does Slack scale WebSockets to millions of users?" | "They separate the connection layer from the application logic. A stateful WebSocket gateway tier handles connection bookkeeping. The app servers are stateless and communicate with the gateway via a pub/sub channel (Redis or Kafka). The gateway maintains which user IDs are connected to which socket, and routes inbound events to the right connections. The app server never holds a socket." |
| "What happens to your WebSocket server when you deploy a new version?" | "Rolling deploy kills each node's connections. Clients need auto-reconnect logic with exponential backoff and jitter. The client should detect the close event and reconnect within 1–3 seconds. During reconnect, it should fetch missed events via REST (using a sequence ID or timestamp it stored before disconnect). This is the 'offline reconciliation' pattern." |
| "Your notification feed has 500M users. Do you use push or pull?" | "Pull. Push to 500M users means maintaining 500M persistent connections and a fan-out that writes 500M times per notification. That's not feasible. Instead: push a badge count increment (cheap) via mobile push notification (APNS/FCM), and pull the actual feed items on app open. The badge is the signal; the feed is fetched on demand." |
| "How do you handle a client that falls behind in a message stream?" | "For WebSocket: track the last acknowledged sequence ID per client. On reconnect, replay all messages from that sequence ID from a short-term buffer (Redis sorted set by seq ID, TTL 5 minutes). After 5 minutes of disconnect, don't replay — send a 'stream reset' signal and let the client fetch current state via REST. This is how Kafka handles consumer offset lag, applied to WebSocket reconnect." |
| "Webhook consumer is slow — takes 30 seconds to process. Source retries. What's your design?" | "Accept immediately, queue internally, process async. Your endpoint returns 200 OK in under 500ms, writes the raw payload to an internal queue (SQS, Redis, Kafka), and a separate worker processes it. The worker is idempotent by event ID. This decouples delivery from processing. You can scale workers independently and replay failed events from the queue without the source needing to retry." |
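The replay-buffer answer above can be sketched in memory. A Redis sorted set (ZADD / ZRANGEBYSCORE with a TTL) would replace this in production; the class name is illustrative:

```typescript
// Per-channel replay buffer keyed by sequence ID (in-memory sketch).
class ReplayBuffer {
  private entries: { seq: number; msg: string }[] = [];

  constructor(private capacity = 1000) {}

  append(seq: number, msg: string) {
    this.entries.push({ seq, msg });
    if (this.entries.length > this.capacity) this.entries.shift(); // evict oldest
  }

  // Messages the client missed after lastAckedSeq, or null when the gap has
  // been evicted; on null the client must do a full REST resync instead.
  replayFrom(lastAckedSeq: number): string[] | null {
    const oldest = this.entries[0];
    if (oldest && oldest.seq > lastAckedSeq + 1) return null; // gap: can't replay
    return this.entries.filter((e) => e.seq > lastAckedSeq).map((e) => e.msg);
  }
}
```

The null case is the "stream reset" signal: better to force one full refetch than to silently deliver a stream with a hole in it.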
Quick Recap
- Push delivers data to clients without a request; pull waits for clients to ask. Push minimizes latency; pull minimizes server-side state.
- WebSocket is TCP full-duplex: use for bidirectional real-time work. SSE is one-way HTTP: use for server-to-browser notifications. Long polling bridges both with plain HTTP. Webhooks are server-to-server HTTP push with retries.
- Push costs connection state (memory per connection), fan-out bandwidth (message × N subscribers), and stale connection hygiene (heartbeat or silent socket leaks).
- Pull provides backpressure inherently: the consumer controls dequeue rate, making it stable under overload.
- At mobile scale (hundreds of millions of users), neither WebSocket nor SSE is feasible directly. APNS (Apple) and FCM (Google) are infrastructure that exists specifically to absorb persistent connection state at device scale.
- Hybrid wins in practice: push a lightweight invalidation signal, pull the actual data. Slack, Google Docs, and Uber all do this.
- In interviews, state the latency requirement before naming a protocol. "Under 200ms" leads to WebSocket. "Under 5 seconds" leads to SSE or long polling. "Under 5 minutes" leads to polling. "No real-time requirement" leads to polling with ETags.
Related Concepts
- Message queues — Kafka and SQS consumers are pull-based systems. Understanding why the pull model was chosen for message queues (backpressure, replay, independent offsets) deepens your intuition for when pull beats push.
- Caching — The hybrid push-then-pull pattern works by pushing cache invalidation signals. Understanding TTL and event-driven invalidation is the other half of that pattern.
- API gateway — WebSocket connections typically pass through or terminate at the API gateway layer, which has its own connection multiplexing and routing concerns.
- Rate limiting — Webhook endpoints need rate limiting to handle retry storms. Understanding token bucket and sliding window rate limiting is directly applicable.