How WhatsApp shows the typing indicator in real time

The Problem Statement

Interviewer: "You are a user on WhatsApp. Your friend opens the chat with you and starts typing a reply. You see the three-dot 'typing...' indicator appear almost instantly. Walk me through how that works under the hood. How does your phone know they are typing? And how does the indicator disappear about 3 seconds after they stop, even though nothing tells your phone they stopped?"

This question appears simple on the surface but has real depth. The interviewer is testing whether you understand ephemeral signal design (data with zero value the moment its context changes), real-time push delivery over WebSockets, and the specific design choice of client-side auto-expiry. A strong answer covers the full signal path from keystroke to rendered dots, the throttle mechanism that keeps volume manageable, and the 3-second auto-expire loop that makes this work without any server storage.

I find this question rewarding to answer because the typing indicator is a common-knowledge feature that almost no one thinks deeply about. Most engineers assume it works like a message. It does not. It is architecturally different in every meaningful way: no persistence, no retry, no delivery guarantee, no database writes. Getting that contrast out early is the move.

The same pattern appears in "recording audio...", "online" dots, Discord's "User is typing...", and Google Docs live cursors. Master this one system and you have a template for the whole class.

Clarifying the Scenario

You: "Before I design anything, a few clarifying questions."

You: "Should I focus on one-to-one chats first, then extend to groups?"

Interviewer: "Yes, start with 1:1, then cover the group case."

You: "I want to confirm my mental model first: the signal is ephemeral, right? If the recipient is offline when someone starts typing, they should not see the indicator when they reconnect 30 seconds later."

Interviewer: "Correct. Why does that matter to the design?"

You: "Because it changes the entire storage and delivery strategy. If we need guaranteed delivery, we persist the signal and replay on reconnect. But a typing indicator that is 30 seconds old is worse than nothing. It misleads. So I would design this as fire-and-forget with no database writes, no retry, and no queue. The signal either reaches the recipient right now, or it is dropped."

Interviewer: "Exactly. Walk me through it."

You: "One more: the indicator disappears a few seconds after the sender stops typing, even though no 'stopped typing' event is ever sent. What drives that?"

Interviewer: "Right. Why do you think that is?"

You: "Two mechanisms working together. The sender throttles keystrokes so they send one composing signal every 2 seconds regardless of how fast they type. The recipient starts a local 3-second timer when they receive that signal. If no refresh arrives before the timer fires, the indicator hides. When the sender stops typing, no more refreshes come, and after 3 seconds the indicator auto-hides. No explicit 'stopped typing' event is needed."

Explaining the auto-expire logic proactively is the signal that separates a senior answer from a mid-level one. Go beyond the happy path before the interviewer asks.

My Approach

I break this into four areas:

The signal path: How a keystroke on the sender's phone becomes three animated dots on the recipient's screen
Throttle and debounce: Why 50 fast keystrokes produce only one server event every 2 seconds
The 3-second auto-expire loop: How the indicator self-heals without any explicit "stopped typing" event
Group chat aggregation: How the server coalesces signals from multiple simultaneous typists without quadratic fan-out

The core insight is that the typing indicator is a presence signal, not a message. Messages are durable. Presence is ephemeral. Once you internalize that distinction, every design choice in this system follows logically. You skip the database, skip the queue, skip the retry. You optimize purely for speed and let the failure path be a noop.

Think of it like a hand wave across a crowded room. If the other person is looking, they see it. If they are not looking, it is gone. You do not write it down and hand them a note saying "I waved at 2:03 PM." The wave was only meaningful in the moment it happened.

Numbers at a glance

Metric	Approximate value
Concurrent active users at peak (assumed)	~200 million
Typing signals per second (peak)	~5 million
Signal payload size	~100 bytes as JSON, 30-50 bytes on the wire
End-to-end latency (good connection)	50-150ms
Sender-side throttle window	2 seconds
Recipient auto-expire timer	3 seconds
Network jitter budget	1 second (gap between intervals)
Presence map entry TTL	30-60 seconds (heartbeat refreshed)

Scale context: why this matters

WhatsApp serves 2 billion users with roughly 100 billion messages per day. Typing signals are sent far more frequently than messages, because a single sent message may be preceded by dozens of composing refreshes. The entire typing indicator system must add near-zero marginal cost to the infrastructure that already handles message delivery.

The Architecture

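Here is the full typing signal path from first keystroke to rendered indicator: keystroke, sender-side throttle buffer, WebSocket frame to the sender's gateway, presence map lookup, server-to-server forward, push down the recipient's socket, local expiry timer on the recipient's device.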

The entire path from keystroke to rendered indicator completes in under 100ms on a good connection. The server never touches a database. The routing layer consults an in-memory presence map to find which server holds the recipient's active socket, routes the signal server-to-server, and the recipient's gateway pushes it down. If the recipient is offline, the signal is dropped silently at the Forward Layer with no retry, no persistence, no dead-letter queue.
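The routing decision described above can be sketched in a few lines. This is a minimal illustration, not WhatsApp's implementation: a plain dict stands in for the Redis presence map, and `Gateway`, `route_composing`, and the server names are hypothetical. The point it shows is that every failure branch simply returns, with no retry and no queue.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical gateway server holding live sockets, keyed by user ID."""
    name: str
    sockets: dict = field(default_factory=dict)  # user_id -> list of pushed frames

    def push(self, user_id: str, frame: dict) -> bool:
        inbox = self.sockets.get(user_id)
        if inbox is None:
            return False  # socket closed since the lookup: discard the frame
        inbox.append(frame)
        return True


def route_composing(frame: dict, presence: dict, gateways: dict) -> bool:
    """Forward a composing frame; return True only if it reached a live socket."""
    server = presence.get(frame["to"])  # stand-in for GET presence:<user>
    if server is None:
        return False  # recipient offline: drop silently, no retry, no queue
    return gateways[server].push(frame["to"], frame)
```

A stale presence entry (the map still points at a server whose socket has closed) falls through the same path as an offline recipient: the push returns False and nothing else happens.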

Step-by-step walkthrough

Step 1: Keystroke fires, throttle buffer absorbs it. Alice presses a key. The app fires a keystroke event. If it is the first keystroke in the current 2-second window, the throttle buffer creates a composing signal. If it is a subsequent keystroke within the same window, the event is discarded. The buffer holds one pending signal at a time.

Step 2: Composing signal sent over existing WebSocket. The composing signal is a tiny frame, roughly 100 bytes as JSON. It contains Alice's user ID, Bob's user ID, a conversation ID, and the signal type ("composing"). No content, no timestamp, no sequence number. The signal travels over the persistent WebSocket connection that Alice's app already maintains, the same pipe used for message delivery. No new connection is required.

Step 3: Sender's gateway receives and routes. Alice's regional gateway receives the composing frame. It does not write to any log or database. It looks up Bob's location in the presence map (a Redis lookup: GET presence:bob returns server-eu-3). If the key does not exist, Bob is offline and the signal is dropped here.

Step 4: Server-to-server gRPC forward. The gateway forwards the composing frame to Bob's gateway via an internal gRPC call. This is low-latency (typically 5-30ms for intra-region, 50-200ms cross-region). WhatsApp runs its own inter-datacenter backbone, so these hops are not over the public internet.

Step 5: Recipient's gateway pushes to socket. Bob's gateway receives the forwarded frame and immediately pushes it down Bob's WebSocket connection. No buffering, no acknowledgment required. If Bob's socket has since closed (he just put his phone down), the push fails and the frame is discarded.

Step 6: Client renders and starts timer. Bob's app receives the composing push, sets the UI state to "typing...", and starts a 3-second countdown. The signal event itself contains no timer value: the 3-second duration is hardcoded in the client. When a new composing push arrives, the timer resets to 3 seconds. When 3 seconds elapse with no new push, the "typing..." state is cleared.
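The throttle in Step 1 can be sketched as a leading-edge throttle: the first keystroke in a window emits a signal, and every later keystroke in that window is discarded. This is a sketch under assumptions: `ComposingThrottle`, its `send` callback, and the injectable `clock` are illustrative names, not part of any WhatsApp client.

```python
import time


class ComposingThrottle:
    """Leading-edge throttle: at most one composing signal per window."""

    def __init__(self, send, window_s: float = 2.0, clock=time.monotonic):
        self._send = send          # callback that writes the frame to the socket
        self._window_s = window_s
        self._clock = clock        # injectable so the behavior can be tested
        self._last_sent = None     # time of the last emitted signal

    def on_keystroke(self) -> bool:
        """Return True if this keystroke emitted a composing signal."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._window_s:
            return False  # still inside the current window: discard
        self._last_sent = now
        self._send()
        return True
```

Because the signal fires on the leading edge, the recipient sees the indicator on the very first keystroke rather than after a 2-second delay, which a trailing-edge debounce would impose.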

The signal payload

The composing signal is intentionally minimal. A conceptual JSON representation:

{
  "type": "composing",
  "to": "15551234567@s.whatsapp.net",
  "from": "15559876543@s.whatsapp.net",
  "conversation_id": "conv_abc123"
}

WhatsApp uses binary protocol framing over WebSocket rather than raw JSON, so the actual on-wire size is closer to 30-50 bytes. There is no message ID, no content hash, no acknowledgment field. There is nothing to track, deduplicate, or retry. Every signal is entirely self-contained and stateless.
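To make the size difference concrete, here is a rough comparison between the JSON form and a fixed-width binary layout. The binary layout is purely hypothetical (a one-byte type code followed by three 64-bit integers) and is not WhatsApp's wire format; it only illustrates why a binary frame is several times smaller than the equivalent JSON.

```python
import json
import struct

SIGNAL_COMPOSING = 1  # hypothetical one-byte type code


def encode_json(sender: str, recipient: str, conversation_id: str) -> bytes:
    """Compact JSON encoding of the conceptual payload."""
    payload = {"type": "composing", "to": recipient, "from": sender,
               "conversation_id": conversation_id}
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def encode_binary(sender_phone: int, recipient_phone: int, conversation: int) -> bytes:
    """1-byte type + sender + recipient + conversation ID, big-endian, unpadded."""
    return struct.pack(">BQQQ", SIGNAL_COMPOSING, sender_phone, recipient_phone, conversation)


def decode_binary(frame: bytes) -> tuple:
    """Inverse of encode_binary: (type, sender, recipient, conversation)."""
    return struct.unpack(">BQQQ", frame)
```

With this assumed layout the binary frame is 25 bytes before any transport framing, while the compact JSON for the example payload is a little over 100 bytes.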

Compare this to a message delivery frame, which includes a message ID, encrypted content, delivery receipt fields, and push notification metadata. A composing signal is a small fraction of the size of even a minimal message.

For the interview: walk the signal path left to right. Call out the throttle buffer (50 fast keystrokes collapse to one signal per 2 seconds) and the local timer (auto-expire at 3 seconds, no explicit stop event). Those two facts are the core of the design.

Deep Dive 1: Typing Signal Delivery Without Persistence

The most consequential design decision in this system is choosing never to write a typing event to any database. Every other part of WhatsApp writes to storage: messages, delivery receipts, read receipts, media. But typing indicators are treated differently because their value is zero outside of the live session.

The sequence for a single signal delivery is the six-step path above: throttle, send, look up, forward, push, render. Notice what never appears in it.

The absence of a "stopped typing" event is intentional. Designing explicit start/stop events creates a synchronization problem: what if the stop event is lost due to a network drop? What if the user's app crashes before it can send the stop? The auto-expire model sidesteps both failure modes entirely. Time handles it.

XMPP heritage: where this pattern comes from

WhatsApp's original protocol was built on XMPP (Extensible Messaging and Presence Protocol). XMPP defines typing notifications in XEP-0085 (Chat State Notifications), which relies on explicit state changes: a composing stanza is sent when typing starts, and a paused stanza is sent when typing stops. WhatsApp simplified this to a single composing signal with client-side auto-expire, which is more resilient than relying on the sender to broadcast an explicit "stopped" state.

The "drop if offline" behavior is the right call and I want to make it explicit. The normal instinct is: if delivery fails, retry. But for ephemeral presence signals, retry is actively harmful. Imagine Alice starts typing at 2:00 PM. Bob's phone loses signal. At 2:00:10 his phone reconnects. If the server had queued the typing signal and delivered it on reconnect, Bob's phone would show "Alice is typing" based on a 10-second-old signal. Alice has already sent the message. The indicator has become misleading. The right behavior is exactly what WhatsApp does: treat failed delivery as a noop.

The three failure modes and how they resolve

Understanding failure handling cements your mastery of this design. There are exactly three failure modes, and each one resolves gracefully without special-case logic.

Sender loses network mid-session. Alice is typing. Her WiFi drops. The composing signal that was in flight is lost. Bob does not receive the refresh. After 3 seconds, Bob's local timer fires and hides the indicator. This is the right outcome: Alice may keep typing locally, but nothing she writes can reach Bob until she reconnects. The indicator disappears without any explicit "stop" signal.

Recipient loses network mid-session. Bob loses connectivity while Alice is still actively typing. The gateway tries to push the next composing signal to Bob's socket and gets an error. The gateway drops the signal and marks Bob's session as disconnected. When Bob reconnects, his presence map entry is updated with the new server assignment. Alice's next throttle window fires a fresh signal, which finds the new server and delivers successfully. Bob sees the indicator appear within 2 seconds of reconnecting, showing Alice is still typing. The reconnection is invisible.

Gateway server restarts. The server handling Bob's session restarts during a rolling deployment. Bob's WebSocket connection drops momentarily. The client reconnects (within 1-3 seconds typically). The presence map key for Bob is updated to point to the new server handling his reconnection. Any signals in flight to the old server are lost, but the next 2-second composing cycle sends fresh ones. From the user's perspective: the typing indicator might briefly disappear during the reconnection window (at most 5 seconds worst-case), then reappear. Acceptable degradation.

The elegance is that none of these failure modes requires explicit error handling on the server. The auto-expire timer handles all three. If you stop receiving signals for any reason, the indicator hides after 3 seconds. This is a self-healing design with no recovery code.
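The recipient-side behavior that absorbs all three failure modes can be sketched as a single deadline that every composing push moves forward. This is a minimal sketch: `TypingIndicator` and its injectable `clock` are illustrative names, and a real client would schedule a UI timer rather than compare timestamps on demand.

```python
import time


class TypingIndicator:
    """Recipient-side state: visible until expiry_s passes with no refresh."""

    def __init__(self, expiry_s: float = 3.0, clock=time.monotonic):
        self._expiry_s = expiry_s  # hardcoded in the client, never sent on the wire
        self._clock = clock
        self._deadline = None      # None means no composing push has been seen

    def on_composing(self) -> None:
        # Every push resets the countdown to the full expiry window.
        self._deadline = self._clock() + self._expiry_s

    def is_visible(self) -> bool:
        return self._deadline is not None and self._clock() < self._deadline
```

There is no branch for "sender dropped", "recipient dropped", or "server restarted". In each case refreshes stop arriving, the deadline passes, and `is_visible()` returns False.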

Multi-device typing synchronization

WhatsApp Multi-Device (introduced in 2021) allows a user's account to be active on a phone and up to four linked companion devices (WhatsApp Web, desktop apps) simultaneously. This adds a twist to the typing indicator design.

When Alice (on her phone) types to Bob, Bob's indicator appears on all of Bob's linked devices: his phone, his laptop with WhatsApp Web open, and his tablet. Each device that is online receives the same composing push from their shared gateway session. This is handled by WhatsApp's multi-device architecture: all of a user's linked devices receive delivery fan-out from the same presence map entry.

The interesting case is the reverse: what do Alice's other devices see while she is typing? When Alice types on her phone, her phone sends composing signals to Bob. Nothing is sent to Alice's own other linked devices: they do not need to know that Alice is typing, since Alice is the one doing the typing. The composing signal is a notification for the recipient, not a sync event for the sender. Alice's WhatsApp Web session does not show any "you are typing from your phone" indicator.

This is a subtle but important distinction. Composing signals are directional: from the person typing to the person receiving. They are not broadcast to every device in the conversation, and the sender's own linked devices never receive them.

Deep Dive 2: The 3-Second Auto-Expire Rule

The 3-second timer on the recipient's device is the most elegant piece of the design. It creates a self-healing presence loop without any server involvement after initial signal delivery.

The sender sends composing signals every 2 seconds while actively typing. The recipient's timer is 3 seconds. The 1-second gap between the 2-second send interval and the 3-second expire timer is the network jitter budget: even if a refresh arrives up to 1 second later than expected because of congestion or routing, the recipient's timer will not fire before it lands. The budget covers variation in delay between consecutive signals, not absolute latency: a steady 180ms path delays every signal equally and consumes none of it. A false expiry happens only when one refresh takes more than 1 second longer to arrive than the signal before it.
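The interaction between send interval, delivery delay, and expiry timer is easy to check numerically. The helper below is a hypothetical sketch (`hidden_gaps` is not from any real client): given the times signals were sent and the delay each one experienced, it returns the windows during which the indicator would wrongly hide between refreshes.

```python
def hidden_gaps(send_times, delays, expiry_s=3.0):
    """Return (hide_time, reappear_time) pairs where the indicator flickers off."""
    # A signal sent at time s with delivery delay d arrives at s + d.
    arrivals = sorted(s + d for s, d in zip(send_times, delays))
    gaps = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        deadline = prev + expiry_s  # the timer started by the previous arrival
        if nxt > deadline:
            gaps.append((deadline, nxt))
    return gaps


# Steady 250ms latency on a 2-second cadence: no flicker.
steady = hidden_gaps([0, 2, 4, 6], [0.25, 0.25, 0.25, 0.25])

# One refresh delayed 1.25s more than its predecessor: the budget is exceeded.
spike = hidden_gaps([0, 2, 4, 6], [0.25, 0.25, 1.5, 0.25])

# A 4-second send interval against a 3-second timer: flicker on every cycle.
mismatched = hidden_gaps([0, 4, 8], [0, 0, 0])
```

Here `steady` is empty, `spike` contains a single gap from 5.25s to 5.5s, and `mismatched` contains a one-second gap in every cycle, which is why the recipient timer has to grow whenever the sender interval does.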

Name the jitter budget in your interview answer

When you describe this mechanism, name the overlap explicitly: "The 1-second gap between the 2-second send interval and the 3-second expire timer is the jitter budget. As long as no refresh arrives more than 1 second later than expected, the indicator never flickers." This precision signals that you understand the design intent, not just the implementation surface.

Battery optimization on mobile

This detail signals real mobile experience and most candidates never raise it. Sending a WebSocket frame every 2 seconds keeps the device radio awake during the typing session. For a 30-second typing session that is about 15 outgoing frames, which is negligible on modern hardware.

But when Android or iOS signals that the device is in battery-saver mode, the WhatsApp client widens the throttle window from 2 seconds to 4 or 5 seconds. The auto-expire timer on the recipient side must always be longer than the sender interval by at least 1 second, so extending the sender interval requires a correspondingly longer recipient timer. Signal frequency drops, the indicator lingers a few seconds longer after the sender stops typing, but battery is protected. A minor UX degradation traded for a meaningful reduction in radio wake-ups.

Signal volume at WhatsApp scale

This is a useful back-of-envelope to have ready if asked. WhatsApp reported 2 billion monthly active users in 2023. Assume 10% are active at peak: 200 million concurrent users. In any given second, assume 5% are actively typing: 10 million concurrent typing sessions. Each sender fires one composing signal per 2 seconds, so peak signal rate is 5 million composing signals per second.

Each signal generates one server-to-server forward and one WebSocket push, for about 10 million downstream operations per second (5 million forwards plus 5 million pushes). Each signal is roughly 50 bytes on the wire. That is about 500 MB/s of composing-signal traffic globally, spread across hundreds of regional gateway clusters. Each gateway server handles 50,000-100,000 concurrent connections. The load per machine is minimal: no disk I/O, no database calls, just memory reads and socket writes.

Compare this to message delivery traffic: WhatsApp processes roughly 100 billion messages per day, which is about 1.15 million messages per second at average (with peak 3-5x higher). Composing signals at peak are several times more frequent per minute of typing than messages, but each is dramatically cheaper to process (no storage, no ACK, no fan-out for 1:1 chats) and most are dropped silently if the recipient is offline.
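The back-of-envelope numbers above can be reproduced directly. Every input is an assumption taken from the text (10% of users concurrent at peak, 5% of those typing, 50 bytes per signal), and the function names are illustrative; change any input to see how sensitive the estimate is.

```python
def typing_signal_estimate(
    monthly_active_users: int = 2_000_000_000,
    peak_concurrent_percent: int = 10,   # assumption from the text
    typing_percent: int = 5,             # assumption from the text
    throttle_window_s: int = 2,
    bytes_per_signal: int = 50,
) -> dict:
    """Peak composing-signal load under the stated assumptions."""
    concurrent = monthly_active_users * peak_concurrent_percent // 100
    typing = concurrent * typing_percent // 100
    signals_per_s = typing // throttle_window_s
    operations_per_s = signals_per_s * 2  # one forward plus one push per signal
    return {
        "concurrent_users": concurrent,
        "typing_sessions": typing,
        "signals_per_s": signals_per_s,
        "operations_per_s": operations_per_s,
        "megabytes_per_s": operations_per_s * bytes_per_signal / 1_000_000,
    }


def average_messages_per_s(messages_per_day: int = 100_000_000_000) -> float:
    """Average message rate for comparison (86,400 seconds per day)."""
    return messages_per_day / 86_400
```

With the defaults this gives 5 million signals per second, 10 million forward-plus-push operations per second, and 500 MB/s, against an average of roughly 1.16 million messages per second.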

Deep Dive 3: Group Chats and Multi-User Typing

One-to-one typing is straightforward. Groups are where the aggregation problem appears. In a group with 50 members, any subset might be typing simultaneously. If Alice, Bob, and Carol are all typing, the recipient should see "Alice, Bob, and Carol are typing..." (or "Several people are typing..." for large groups). The naive approach generates quadratic fan-out. The correct approach coalesces signals server-side.

The server maintains an in-memory set per group conversation. Each entry in the set is a user ID with an associated TTL, refreshed each time that user's composing signal arrives. When the set membership changes, the server pushes an updated typing list to all group members. This is still an in-memory-only operation. The group typing set lives in process memory or Redis with short TTLs, never in a durable store.

Group chat is significantly more expensive per signal

In a 1:1 chat, one composing signal triggers one server lookup and one push. In a group with 50 members, one composing signal from Alice triggers pushes to all 49 other online members. At scale, a busy group where 5 people are simultaneously typing generates 5 x 49 = 245 push operations per composing cycle. WhatsApp limits the displayed typists to the first 2-3 names and collapses the rest to "and others," which bounds the push volume in large concurrent-typing scenarios.
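The server-side aggregation can be sketched as a per-group map of user ID to expiry deadline, with a push emitted only when the visible label changes. This is an illustration under assumptions: `GroupTypingSet`, the label wording, the two-name cap, and alphabetical ordering are all hypothetical choices, not WhatsApp's actual behavior.

```python
import time


class GroupTypingSet:
    """In-memory typing state for one group: user_id -> expiry deadline."""

    def __init__(self, ttl_s: float = 3.0, max_names: int = 2, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._max_names = max_names
        self._clock = clock
        self._deadlines = {}
        self._last_label = ""

    def _label(self) -> str:
        now = self._clock()
        # Drop expired typists lazily, then build the visible label.
        self._deadlines = {u: d for u, d in self._deadlines.items() if d > now}
        names = sorted(self._deadlines)
        if not names:
            return ""
        shown = names[: self._max_names]
        if len(names) > self._max_names:
            return ", ".join(shown) + " and others are typing..."
        verb = "is" if len(shown) == 1 else "are"
        return " and ".join(shown) + f" {verb} typing..."

    def _changed(self):
        label = self._label()
        if label == self._last_label:
            return None  # nothing visible changed: no push to group members
        self._last_label = label
        return label

    def on_composing(self, user_id: str):
        """Refresh one typist; return the new label to push, or None."""
        self._deadlines[user_id] = self._clock() + self._ttl_s
        return self._changed()

    def tick(self):
        """Periodic sweep; returns "" when the last typist has expired."""
        return self._changed()
```

Once the cap is reached, additional typists refresh their TTLs without changing the label, so they generate no pushes at all. That is how capping the displayed names bounds fan-out.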

The Tricky Parts

End-to-end encryption does not apply here. WhatsApp is well-known for E2E encryption on messages, but typing indicators are not message content. They are transport-layer presence signals. They travel with TLS transport encryption (the WebSocket connection is TLS), but they are not encrypted with the recipient's public key the way messages are. The server can read composing signals. This is a deliberate tradeoff: applying the Signal protocol cipher-per-message overhead to signals that are discarded immediately would be wasteful without providing meaningful privacy protection.
The presence map consistency problem. The routing layer needs to know which gateway server holds the recipient's WebSocket connection. This is stored in an in-memory map (usually Redis). If the recipient's connection migrates from server A to server B during a server restart or load-balancer rebalance, the map might still point to A for a few seconds. During that window, composing signals get forwarded to the wrong server and are dropped. The degraded experience is that the typing indicator does not appear for a few seconds during reconnection, which is acceptable.
Privacy controls create conditional delivery. WhatsApp allows users to disable typing indicators in privacy settings. When a user disables this, their client neither sends composing signals nor renders received ones. The server checks this preference before routing. Since there is no DB write for composing signals, the check happens at the gateway level using a preference loaded at session start. A real-time preference change mid-session requires invalidating the cached preference in the gateway, which adds a small message to the session management system.
Regional routing adds a hop. WhatsApp routes traffic through regional servers. A signal from Alice in London to Bob in Mumbai routes through a regional gateway, not directly across the globe. This adds one extra server-to-server hop to the signal path. London to London gateway (10ms), London gateway to Mumbai gateway (160ms), Mumbai gateway to Bob (10ms) is roughly 180ms total. Still under 200ms, which users perceive as real-time.
Battery state interaction with throttle interval. The throttle window widens on low-battery devices (2 seconds to 4-5 seconds). If Alice is on low-battery (4s interval) and Bob's recipient timer is the standard 3 seconds, the indicator would flicker: show for 3s, briefly hide, reappear at 4s. The simplest fix is to set the standard recipient timer high enough to tolerate the low-battery interval, trading a slightly longer "fade out" after the sender stops typing for a stable indicator during active sessions.

What Most People Get Wrong

Mistake	What they say	Why it is wrong	What to say instead
Treating it like a message	"Write the typing event to the message queue"	Typing signals are ephemeral presence, not messages. Queuing them for durability inverts the design intent	"Drop if offline. No retry. No persistence. Presence signals are only useful in the present moment."
Polling instead of push	"The recipient polls every 500ms to check if the other person is typing"	Polling 200M users every 500ms is 400M requests per second just for presence	"Push over the existing WebSocket connection. Zero polling, sub-100ms delivery"
Assuming E2E encryption	"The typing signal is encrypted with the recipient's public key"	Transport TLS only. The server reads these signals to route them	"Transport TLS protects the signal in transit. The server can see composing signals at the gateway layer."
Ignoring group fan-out cost	"Same as 1:1, just send to the group"	One signal in a 50-person group triggers 49 pushes, multiplied by all concurrent typists	"Cap the displayed typists at 2-3 names. Only push when the visible label changes, not on every signal."
Forgetting the jitter budget	"The timer is 2 seconds to match the throttle interval"	If signal is even 100ms late, the indicator flickers off and immediately back on	"The timer must be longer than the send interval by at least 1 second to absorb network jitter."

How I Would Communicate This in an Interview

I start with the framing principle before touching any mechanism: "typing indicators are ephemeral presence signals, not messages, and that single insight drives every design choice here."

Then I sketch the signal path at a high level: keystroke on sender, throttle to one signal per 2 seconds, WebSocket frame to gateway, routing lookup via presence map, server-to-server forward to recipient's gateway, push to recipient's socket, auto-expire on recipient after 3 seconds if no refresh arrives.

I name the two key behaviors explicitly: the sender-side throttle ("50 fast keystrokes collapse to one signal per 2 seconds") and the recipient-side auto-expire ("the 3-second timer means we never need an explicit stopped-typing event"). Those two together make the system stateless on the server.

I then call out ephemeral-vs-durable explicitly: "This never writes to a database. If the recipient is offline, we drop the signal silently. Stale typing indicators are worse than no indicator." That contrast is the thing most candidates miss.

For depth, I flag the group chat fan-out problem, the E2E encryption nuance, and the battery optimization tradeoff. Not all at once: I let the interviewer's follow-up questions guide expansion. But I mention them briefly to show awareness.

My recommendation: spend the first 90 seconds on the happy-path signal flow with a quick sketch. That is the foundation. Then use the next 90 seconds for the 3-second auto-expire mechanism, because that is the most surprising and elegant part. Leave group chat and privacy for follow-ups.

Common follow-up questions

Interviewer asks	Strong answer
"What if both people are typing at the same time?"	"Two independent signal flows. No interaction. Traffic doubles, but trivially. Each signal only routes to the other party."
"Does this work end-to-end encrypted?"	"Transport TLS only. The server reads composing signals to route them. Content encryption with the Signal protocol would add key exchange overhead that is not justified for zero-content signals."
"How do you handle the indicator staying on after someone's app crashes?"	"Client-side timer handles it automatically. If no refresh arrives in 3 seconds, the indicator hides, regardless of why the refreshes stopped."
"What would you change if you were rebuilding this today?"	"Nothing fundamental. The design is already optimal for ephemeral presence. I might consider using QUIC instead of WebSocket for better connection migration, reducing flickering during network switches."
"What is the most common mistake teams make when building something like this?"	"Treating typing indicators like messages and routing them through Kafka for reliability. They do not need reliability. They need speed and zero storage overhead."
"How does this scale to 2 billion users?"	"The load is distributed across regional servers. Each server handles tens of thousands of concurrent connections. The presence map is sharded Redis. Nothing is centralized. The system scales horizontally."

One sentence to anchor your answer

"Typing indicators are presence signals, not messages. They have zero value the moment the context changes, so we design them to be fast, fire-and-forget, and never stored." Say this in your first 30 seconds and it frames everything you say after.

Interview Cheat Sheet

Typing indicators are ephemeral presence signals, not durable messages. This single distinction drives every design choice.
Sender throttles to one composing signal per 2 seconds regardless of how many keystrokes fire in that window.
The 1-second gap between the 2-second send interval and the 3-second expire timer is the jitter budget. A refresh that arrives up to 1 second later than expected still lands before a false expiry.
No database writes at any step. Signal is dropped silently if the recipient is offline. No retry, no queue.
Recipient's client starts a 3-second local timer on receiving the signal. If no refresh arrives in 3 seconds, the indicator auto-hides.
The budget is consumed by variation in delay, not by latency itself: a steady cross-regional hop of 150-200ms shifts every signal equally and leaves the full second intact.
No explicit "stopped typing" event is needed. Time does the work on the recipient side.
Three self-healing failure modes: sender drops, recipient drops, server restarts. All handled by the auto-expire timer with zero server recovery code.
Group chats: server maintains an in-memory set per group with per-user TTLs. Only pushes to group members when the visible label changes.
WhatsApp caps displayed typist names at 2-3 and collapses the rest to "and others," preventing unbounded fan-out in large concurrent-typing scenarios.
Transport TLS only, not E2E encryption. The server reads composing signals to route them. Typing state carries no message content.
Users can opt out of typing indicators in privacy settings. The gateway checks this preference at session start and filters signals accordingly.

Why This Design Generalizes

The WhatsApp typing indicator is a specific instance of a general pattern I call "ephemeral client-synced state": state that only has value in the present moment, changes frequently, and can tolerate loss without degrading the core product experience.

Once you recognize the pattern, you see it everywhere:

Cursor position in collaborative editors. Google Docs shows your teammates' colored cursors in real time. The cursor position is updated whenever a collaborator's caret moves, pushed to connected clients, and maintained only in working memory. If a client disconnects, their cursor disappears. There is no database record of where your cursor was at 3:04 PM yesterday. Same pattern, shorter refresh interval (50ms instead of 2s).
Online/offline presence badges. The green dot next to a user's name in Slack or WhatsApp is maintained by the same WebSocket keepalive mechanism. Presence state is in-memory, refreshed by heartbeats, and expires when heartbeats stop. The design challenge is identical to typing indicators, differing only in the event sources and refresh intervals.
"User is viewing this item" state. Shopping sites that show "3 people are viewing this item right now" use in-memory counters with TTLs, not database queries. Each visitor's session sends a keep-alive, the counter increments, and the UI is pushed to other active viewers. No persistence, no replay, no retry.
Live reaction overlays in video streams. Emoji reactions floating up a live stream are ephemeral by design. They never replay; if you were not watching at that moment, they are gone. The fan-out problem (one reaction from one viewer broadcasts to all viewers) is the same group chat fan-out problem WhatsApp solves with aggregation.

The generalizable insight: whenever a piece of state is only meaningful right now, treat it as a signal rather than a record. Do not write it to a durable store. Do not retry failed delivery. Use push over pull and let time handle expiry. This shifts work from servers to clients, from storage to compute, and from synchronous consistency to eventual self-correction.

There is a practical test for whether a piece of state belongs in this category: ask yourself, "What is the worst outcome if a user misses this update entirely?" For typing indicators the answer is: they do not see the three dots briefly. They lose no data and they experience no confusion, because the next state transition (message arrival or silence) is self-explanatory. If the worst-case outcome of a missed update is "user is slightly confused for 3 seconds," you have found an ephemeral signal pattern. Design it accordingly.

The converse is equally useful. If missing an update causes data loss, incorrect state, or user-visible inconsistency that persists beyond a few seconds, it is not a signal. It is a message, and it belongs in a durable store with reliable delivery. Typing indicators: signal. Message delivery receipts: message. Read receipts: message. Last-seen timestamp updates: borderline (batched, eventually consistent writes to a DB, refreshed on heartbeat). Cursor position in a live collaborative doc: signal.

Quick Recap

The typing indicator is an ephemeral presence signal that is never stored in any database, because its value decays to zero the moment the recipient's context changes.
The sender throttles keystrokes to one composing signal every 2 seconds, keeping bandwidth and server load bounded regardless of typing speed.
Signals travel over the existing WebSocket connection to the sender's regional gateway, then hop server-to-server to the recipient's gateway, with no disk I/O at any step.
The recipient's client starts a 3-second local timer on receiving the signal. If no refresh arrives before the timer fires, the indicator auto-hides with no server involvement.
The 1-second gap between the 2-second send interval and the 3-second expire timer is the network jitter budget, tolerating a refresh that arrives up to 1 second late before a false expiry.
Failed delivery to an offline recipient is silently dropped with no retry, because a queued and replayed typing signal would be stale and worse than nothing.
Group chats extend the pattern with server-side aggregation, per-user TTLs in an in-memory set, and diff-based push updates that only broadcast when the visible typing label changes.
The three failure modes (sender drops, recipient drops, server restarts) all resolve through the same auto-expire mechanism with no special-case code.
Multi-device accounts (phone + WhatsApp Web + tablet) receive the same composing push fan-out to all online linked devices, but the sender's own devices are not informed that they are typing.
The pattern generalizes to any ephemeral client-synced state: cursor positions, online presence indicators, live reaction overlays. Design it as a signal, not a record.

WebSocket and long-polling: The transport layer that makes sub-100ms push delivery possible without client polling. Understanding why HTTP long-polling was replaced by WebSocket is essential context for any real-time presence system. The key difference: WebSocket is full-duplex and persistent, so the server can push at any time with no new TCP handshake. Long-polling requires the client to re-initiate each request. At 200 million concurrent connections even a 1-second poll interval is unsustainable (200 million requests per second on HTTP infrastructure).
Presence systems: The broader problem class that typing indicators belong to. Online/offline status, "last seen" timestamps, and "active now" badges all share the same ephemeral-signal architecture. The challenge for online/offline is scale: a user with 500 contacts generating a "came online" event means 500 presence update fan-outs. Large platforms like WhatsApp limit this with selective presence: you only see presence data for contacts with whom you have had a recent conversation, not your entire contact list.
CRDT and real-time collaboration: The next layer of complexity beyond typing indicators. When multiple users edit the same document simultaneously, conflict-free replicated data types handle the merge problem that a simple typing indicator sidesteps entirely. Typing indicators tell you someone is editing; CRDTs handle the merging of what they typed. Operational Transform is the older alternative (it is the approach Google Docs is built on, while CRDTs are more common in newer collaborative tools).
Rate limiting and throttling: The sender-side throttle is a form of client-side rate limiting. The same token-bucket and leaky-bucket patterns used in API rate limiters apply here, just at the client tier. There is an important distinction: server-side rate limiting protects the server from abuse; client-side rate limiting in WhatsApp protects the radio, the battery, and the server simultaneously. In battery-optimization mode, the throttle interval doubles, which is equivalent to halving the token-budget for a given period.
Hashed Wheel Timers: The data structure most commonly used to implement large numbers of concurrent timers efficiently on a server or in a runtime. If you are designing a server-side equivalent of this pattern (e.g., for the group typing set with per-user TTLs), a hashed wheel timer is the appropriate building block. It offers O(1) timer insertion and expiry for millions of concurrent timers, far more efficient than a sorted set for uniform expiry distributions.
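For the hashed wheel timer mentioned in the last item, a minimal sketch looks like this. It is a simplified illustration, not a production implementation: time advances only when `tick()` is called, each slot is a dict of key to remaining full rotations, and the class and method names are hypothetical.

```python
class HashedWheelTimer:
    """Minimal timing wheel: O(1) schedule and cancel, one slot scanned per tick."""

    def __init__(self, slots: int = 8):
        self._slots = [dict() for _ in range(slots)]  # key -> remaining rotations
        self._where = {}                               # key -> slot index
        self._cursor = 0

    def schedule(self, key, ticks_from_now: int) -> None:
        """Schedule (or reschedule) key to expire after ticks_from_now ticks."""
        if ticks_from_now < 1:
            raise ValueError("ticks_from_now must be >= 1")
        self.cancel(key)  # a refresh replaces any earlier deadline
        slot = (self._cursor + ticks_from_now) % len(self._slots)
        self._slots[slot][key] = (ticks_from_now - 1) // len(self._slots)
        self._where[key] = slot

    def cancel(self, key) -> None:
        slot = self._where.pop(key, None)
        if slot is not None:
            del self._slots[slot][key]

    def tick(self) -> list:
        """Advance one slot and return the keys that expired on this tick."""
        self._cursor = (self._cursor + 1) % len(self._slots)
        bucket = self._slots[self._cursor]
        expired = [k for k, rounds in bucket.items() if rounds == 0]
        for k in bucket:
            if bucket[k] > 0:
                bucket[k] -= 1  # not due yet: one fewer rotation to wait
        for k in expired:
            del bucket[k]
            del self._where[k]
        return expired
```

For per-user typing TTLs, each composing signal would call `schedule(user_id, ttl_ticks)` and the keys returned by `tick()` are the typists to remove from the group set.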