Auth System
Design a secure login and session management system for a web application, covering credential storage, session tokens, multi-factor authentication, OAuth flows, and password reset at a scale of millions of users.
What is a user authentication system?
Authentication confirms that a user is who they claim to be. The system verifies credentials, issues a session token, enforces expiry, and provides account recovery.
The engineering challenge is correctness: one misconfigured hash function, one missing rate limiter, or one improperly validated token can expose millions of accounts. I open this interview by saying "everything in this design is about not being the next headline breach" because it immediately frames every decision as security-first rather than feature-first. This question forces candidates to reason simultaneously about security primitives, distributed session state, and safe credential storage.
Functional Requirements
Core Requirements
- Users can register with email and password.
- Users can log in and receive a session token.
- Authenticated sessions expire after a configurable period.
- Users can reset forgotten passwords securely.
Below the Line (out of scope)
- Fine-grained authorization / RBAC (separate from authentication).
- Federated SSO across organizations (SAML).
The hardest part in scope: Credential storage. A bcrypt misconfiguration, a skipped timing-safe comparison, or a missing per-user salt turns a routine database breach into immediate mass credential exposure, with downstream account takeovers at every other service where users reuse the same password.
Social login via OAuth/OIDC is out of scope because it replaces the credential check entirely. To add it: implement Authorization Code flow with PKCE (Proof Key for Code Exchange) to exchange an authorization code for an access token at the provider, then store a provider_id and provider_user_id alongside the user record. Never store the provider access token in the database.
Federated SSO (SAML) is out of scope because it introduces an identity provider protocol that sits above the auth system. To add it: implement a SAML Service Provider that validates signed assertions from the IdP and maps them to local user records.
Non-Functional Requirements
Core Requirements
- Latency: Login completes in under 200ms p99.
- Scale: 10M registered users, 1M DAU. Login rate peaks at approximately 700 requests per second.
- Availability: 99.99% uptime. Authentication is the gateway to every other feature; a login outage halts all downstream services for all users.
- Security: Passwords are never stored in plaintext. All session tokens are cryptographically signed and verifiable server-side.
- Brute force protection: Login and password reset endpoints enforce rate limiting. Accounts lock after 5 consecutive failed login attempts within 15 minutes.
Below the Line
- Social login via OAuth/OIDC providers (Google, GitHub)
- Passkeys and WebAuthn credential management
- TOTP or SMS multi-factor authentication
- Session concurrency limits (maximum N active sessions per user)
The hardest engineering problem in scope: Getting password hashing right under real operational constraints. Argon2id at memory=64MB, time=3 takes approximately 100ms to compute, which fits within our 200ms login budget but consumes half of it, leaving about 100ms for DB latency, network overhead, and headroom. Tuning parameters that are simultaneously attack-resistant and operationally viable is the core challenge here. I'd draw the 200ms budget breakdown on the whiteboard: 100ms hash + 20ms DB + 30ms network + 50ms headroom. That visual makes the constraint concrete and shows the interviewer you understand operational math, not just security theory.
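The budget arithmetic can be extended into a rough capacity estimate. The sketch below is a back-of-envelope calculation, not a measurement: it reuses the figures stated above (700 logins per second at peak, 100ms and 64MB per hash) and applies Little's law to estimate how many hashes are in flight at once. The variable and function names are illustrative.

```python
# Figures taken from the requirements above; the names are illustrative.
PEAK_LOGINS_PER_SEC = 700
HASH_MS = 100            # Argon2id at memory=64MB, time=3
HASH_MEMORY_MB = 64
LOGIN_BUDGET_MS = 200

# The whiteboard breakdown of the login budget, in milliseconds.
budget_ms = {"hash": 100, "db": 20, "network": 30, "headroom": 50}


def hashes_in_flight(rate_per_sec: int, hash_ms: int) -> float:
    """Little's law: average concurrency = arrival rate x time in service."""
    return rate_per_sec * hash_ms / 1000


in_flight = hashes_in_flight(PEAK_LOGINS_PER_SEC, HASH_MS)
peak_hash_memory_mb = in_flight * HASH_MEMORY_MB

if __name__ == "__main__":
    print(f"budget used: {sum(budget_ms.values())} of {LOGIN_BUDGET_MS} ms")
    print(f"hashes in flight at peak: {in_flight:.0f}")
    print(f"memory held by in-flight hashes: {peak_hash_memory_mb:.0f} MB")
```

Under these assumptions about 70 hashes run concurrently at peak, holding roughly 4,480 MB across the Auth Service fleet. That is why the hash parameters have to be sized against CPU and memory capacity as well as against the latency budget.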
Social login is below the line because it replaces credential checking rather than extending it. To add it: implement OAuth 2.0 Authorization Code + PKCE and map provider identities to local user records via a SocialIdentity join table.
Passkeys and WebAuthn are below the line because they require a separate registration ceremony and a different assertion verification path. To add them: store a credential_id and public_key per authenticator in a WebAuthnCredential table and verify the signed client data assertion on each login.
TOTP MFA is below the line but only just. To add it: store an encrypted TOTP secret per user in MFACredential, issue a short-lived challenge token after password verification, require a valid 6-digit code before issuing a full session token, and add a POST /auth/mfa/verify endpoint.
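For reference, the 6-digit code check can be sketched with the standard library alone. This is a minimal illustration of TOTP as specified in RFC 6238 (HMAC-SHA1, 30-second steps), assuming the secret has already been decrypted. The one-step drift window and the function names are assumptions, and the sketch omits replay protection (rejecting a code that was already accepted).

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

STEP_SECONDS = 30


def totp(secret: bytes, at: float, digits: int = 6) -> str:
    """Compute the TOTP code for the 30-second step containing `at`."""
    counter = int(at) // STEP_SECONDS
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str,
                at: Optional[float] = None, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    ok = False
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * STEP_SECONDS)
        # Constant-time comparison; check every candidate, no early exit.
        ok |= hmac.compare_digest(expected.encode(), submitted.encode())
    return ok
```

In the flow described above, POST /auth/mfa/verify would call something like verify_totp only after the challenge token has been validated, and would issue the full session token only when it returns True.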
Session concurrency limits are below the line because they require active session enumeration per user. To add them: maintain a SET sessions_by_user:{user_id} in Redis with session IDs as members, and reject new logins when the cardinality exceeds the configured maximum.
Core Entities
- User: The registered account. Carries email, hashed_password (the Argon2id or bcrypt output string, which embeds the salt automatically), created_at, and is_verified.
- Session: An active authenticated session. Carries session_id (a cryptographically random opaque value), user_id, expires_at, and a revoked boolean for explicit invalidation.
- PasswordResetToken: A one-time recovery credential. Carries token_hash (the database never stores the raw token, only sha256(token)), user_id, expires_at, and a used flag.
- MFACredential (when MFA is in scope): A registered second factor. Carries user_id, type (TOTP or SMS), and secret (encrypted at rest with application-level encryption, not just database-level).
Full column types, indexes, and foreign key constraints are deferred to a data model deep dive. The four entities above are sufficient to drive every endpoint and system walkthrough in this article.
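As a rough illustration, the four entities could be written as plain data classes. The field types below are assumptions, since the article defers column types. The user_id field on User is also an assumption, inferred from the other entities referencing it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class User:
    user_id: str              # assumed primary key, referenced by the others
    email: str
    hashed_password: str      # encoded hash string; the salt is embedded
    created_at: datetime
    is_verified: bool = False


@dataclass
class Session:
    session_id: str           # cryptographically random opaque value
    user_id: str
    expires_at: datetime
    revoked: bool = False

    def is_active(self, now: datetime) -> bool:
        """A session is usable only if it is neither revoked nor expired."""
        return not self.revoked and now < self.expires_at


@dataclass
class PasswordResetToken:
    token_hash: str           # sha256(token); the raw token is never stored
    user_id: str
    expires_at: datetime
    used: bool = False


@dataclass
class MFACredential:
    user_id: str
    type: str                 # "TOTP" or "SMS"
    secret: bytes             # ciphertext from application-level encryption
```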
API Design
FR 1 - Register a new account:
POST /auth/register
Body: { email, password }
Response: 201 Created · { user_id }
Return only user_id on registration. Do not issue a session token until the email is verified; an unverified account should not access protected resources.
FR 2 - Log in and receive tokens:
POST /auth/login
Body: { email, password }
Response: 200 OK · { access_token, refresh_token, expires_at }
The access_token is a short-lived signed JWT (15 minutes). The refresh_token is an opaque random value stored server-side. The full token lifecycle tradeoff is covered in the deep dives.
FR 2 (with MFA) - Verify a TOTP code:
POST /auth/mfa/verify
Body: { challenge_token, totp_code }
Response: 200 OK · { access_token, refresh_token, expires_at }
challenge_token is a short-lived intermediate token (5 minutes) issued after a correct password but before MFA passes. This prevents a full session from being issued until both factors succeed.
FR 3 - Validate a session:
GET /auth/me
Headers: Authorization: Bearer <access_token>
Response: 200 OK · { user_id, email }
GET /auth/me serves as both a profile endpoint and a session health check. Clients call it to confirm a stored token is still valid before making other authenticated requests.
FR 3 - Log out:
POST /auth/logout
Headers: Authorization: Bearer <access_token>
Body: { refresh_token }
Response: 204 No Content
Use POST for logout because it is side-effecting (it invalidates server-side state). Include the refresh token in the body so it is revoked immediately, not just left to expire on its own.
FR 4 - Initiate password reset:
POST /auth/forgot-password
Body: { email }
Response: 202 Accepted · { message: "If that address is registered, a reset link was sent." }
Always return 202 with an identical message regardless of whether the email exists. Any response that diverges based on email presence leaks account enumeration information.
FR 4 - Complete password reset:
POST /auth/reset-password
Body: { token, new_password }
Response: 200 OK
token is the raw value from the email link. The server computes sha256(token) before any database lookup. The raw token never persists in the database.
High-Level Design
1. User registration
Solving: Store a new account so the user can authenticate in the future.
Components:
- Client: Web or mobile app sending POST /auth/register.
- Auth Service: Validates email format, checks for duplicate accounts, hashes the password, and writes to the database.
- Database: Stores user records with a UNIQUE constraint on email.
Request walkthrough:
- Client sends POST /auth/register with email and password.
- Auth Service validates the email format; reject 400 for malformed addresses.
- Auth Service queries the DB for an existing user with that email; return 409 if found.
- Auth Service hashes the password and writes the new user row.
- Auth Service returns 201 with user_id.
Naive approach: Store the password directly.
Plaintext storage fails immediately. Any database breach exposes every password with zero effort. Because users reuse passwords across services, a single breach cascades into account takeovers at dozens of other services without any additional attack work. I'd pause here in the interview and explicitly say "this is the naive version, and I'm drawing it only to show what breaks" so the interviewer knows you're building toward a better design, not proposing this.
The fix is to hash with Argon2id. Never store the password. Store only the output of a deliberately slow one-way function. Argon2id handles passwords of arbitrary length (unlike bcrypt, which silently truncates input at 72 bytes), making it the correct choice for new systems.
bcrypt truncates input at 72 bytes. A password of 73 bytes produces the exact same hash as the first 72 bytes, silently. Argon2id has no such limit. Use Argon2id for all new systems; upgrade existing bcrypt hashes on next login using the migration path described in the deep dives.
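The shape of "hash with a per-user salt, store one self-describing string, verify in constant time" can be sketched as follows. Argon2id is not in the Python standard library, so this sketch substitutes scrypt (another memory-hard function, available as hashlib.scrypt) purely to stay runnable. The cost parameters and the string format are assumptions for illustration, not tuned recommendations. A production system following this article would use an Argon2id library instead.

```python
import base64
import hashlib
import hmac
import secrets

# scrypt cost parameters: assumed values for illustration only.
N, R, P = 2 ** 14, 8, 1


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def hash_password(password: str) -> str:
    """Return an encoded string that embeds the parameters and the salt."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"),
                            salt=salt, n=N, r=R, p=P, dklen=32)
    return f"scrypt${N}${R}${P}${_b64(salt)}${_b64(digest)}"


def verify_password(stored: str, password: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        scheme, n, r, p, salt_b64, digest_b64 = stored.split("$")
        if scheme != "scrypt":
            return False
        salt = base64.b64decode(salt_b64)
        expected = base64.b64decode(digest_b64)
        actual = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                                n=int(n), r=int(r), p=int(p),
                                dklen=len(expected))
    except ValueError:
        return False  # malformed stored value
    return hmac.compare_digest(actual, expected)
```

Because the parameters travel inside the stored string, the service can raise the cost later and still verify hashes created under the old settings.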
This diagram covers the registration write path. Login and session issuance come next.
2. Login and session issuance
Solving: Verify credentials and issue a token the client uses for all subsequent authenticated requests.
Components:
- Client: Sends POST /auth/login.
- Auth Service: Loads the user record, runs a timing-safe hash comparison, and issues a session token on success.
- Session Store: Persists the issued token so any Auth Service instance can validate it on subsequent requests.
Request walkthrough (naive: session stored in the primary DB):
- Client sends POST /auth/login with email and password.
- Auth Service queries the DB for the user by email.
- Auth Service runs argon2id.verify(stored_hash, submitted_password) using a timing-safe comparison function.
- On failure: increment the attempt counter and return 401.
- On success: generate a cryptographically random 32-byte session ID, write a Session row to the DB, and return the session token.
Session in the primary DB fails at scale. Every protected API call generates a DB read just to validate the session. The primary DB becomes a bottleneck for authentication, sharing capacity with all the business data it also serves.
The fix is to move sessions to Redis. Redis reads complete in under 1ms versus 10-20ms for a PostgreSQL row. Any Auth Service instance validates any session. Logout becomes a single DEL command with instant effect.
The key question for the interviewer: do they need instant revocation, or is eventual expiry acceptable? That single answer determines whether you use opaque Redis sessions or stateless JWTs with refresh tokens. I'd ask this question aloud in the interview because it shows you understand the fundamental tradeoff rather than defaulting to one approach. The answer reshapes the entire session layer.
Use a timing-safe comparison when verifying hashes. A naive string equality short-circuits on the first mismatched byte, leaking the hash character by character via response time differences. Node.js provides crypto.timingSafeEqual; most hashing libraries expose this as verify() rather than making you test equality yourself.
This diagram shows the login path with Redis as the session store. The next section covers how subsequent requests validate that session.
3. Session validation on protected routes
Solving: Verify that every authenticated request carries a valid, non-expired session token before it reaches downstream services.
Components:
- API Gateway or auth middleware: Intercepts every incoming request, extracts the token from the Authorization header or session cookie, and validates it before forwarding.
- Redis: Answers session lookups in under 1ms using the token as the key.
- Downstream service: Receives the request with an attached user_id and handles business logic without its own auth DB call.
Request walkthrough:
- Client sends any authenticated request with Authorization: Bearer <session_token>.
- API Gateway extracts the token and queries Redis: GET session:{token}.
- If the key is absent or the TTL has elapsed, return 401 immediately.
- If the key exists, extract user_id from the stored value.
- Attach user_id to the request context and forward to the downstream service.
Session revocation works by deleting the Redis key: DEL session:{token}. This covers user-initiated logout and admin-forced invalidation (account compromise detected) with the same operation. I've seen candidates forget this step entirely and leave sessions immortal until TTL expiry. Mentioning the DEL for both logout and admin-kill signals that you think about session lifecycle end-to-end.
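The create, validate, and revoke lifecycle can be sketched with an in-memory stand-in for Redis. The class below is hypothetical and exists only to show the three operations and the session:{token} key shape. In the design above they would map to a Redis write with a TTL, a GET, and a DEL respectively.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class InMemorySessionStore:
    """Illustrative stand-in for Redis; not safe for multi-process use."""

    def __init__(self, ttl_seconds: int = 3600) -> None:
        self.ttl_seconds = ttl_seconds  # assumed default, configurable
        self._data: Dict[str, Tuple[str, float]] = {}

    def create(self, user_id: str, now: Optional[float] = None) -> str:
        """Issue a random opaque token and store it with an expiry."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
        self._data[f"session:{token}"] = (user_id, now + self.ttl_seconds)
        return token

    def validate(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the user_id for a live session, or None (respond 401)."""
        now = time.time() if now is None else now
        entry = self._data.get(f"session:{token}")
        if entry is None:
            return None
        user_id, expires_at = entry
        if now >= expires_at:
            self._data.pop(f"session:{token}", None)  # lazy expiry
            return None
        return user_id

    def revoke(self, token: str) -> None:
        """Logout or admin kill: delete the key so validation fails at once."""
        self._data.pop(f"session:{token}", None)
```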
4. Password reset flow
Solving: Allow a user who has forgotten their password to set a new one via a time-limited, verified email link.
Components:
- Auth Service: Generates a cryptographically random reset token, stores only sha256(token) in the DB, and triggers an email delivery task.
- Email Queue: Accepts the send task asynchronously so the HTTP response returns before email delivery completes.
- Email Service: Sends the reset link containing the raw token.
- Database: Stores the token hash with a 15-minute TTL and a used flag.
Request walkthrough:
- Client sends POST /auth/forgot-password with the email address.
- Auth Service queries for the user. Respond 202 regardless of whether the email is registered.
- If the user exists: generate 32 random bytes as the reset token, compute sha256(token), write { token_hash, user_id, expires_at = now + 15min, used = false } to PasswordResetToken.
- Enqueue a send-email task containing the raw token embedded in the reset URL.
- User clicks the reset link; the client sends POST /auth/reset-password with the raw token and new password.
- Auth Service computes sha256(submitted_token) and queries PasswordResetToken WHERE token_hash = ? AND used = false AND expires_at > now().
- If found: hash the new password with Argon2id, update the user row, and mark the token used = true.
Store only sha256(token) in the database, never the raw token. The raw token appears only in the email link. If the database is breached, the attacker has only hashes of tokens, which are useless without the raw values that went out in emails.
The 15-minute expiry and one-time used flag work together: the expiry limits the attack window if a reset email is intercepted, and the used flag prevents replay after a successful reset. I'd call out this double-guard pattern explicitly on the whiteboard because it demonstrates defense-in-depth thinking, which is exactly what interviewers look for in auth system designs.
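The issue and redeem halves of this flow can be sketched as below. A dictionary stands in for the PasswordResetToken table, and the function names are hypothetical. Email delivery and the password update itself are left out.

```python
import hashlib
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional

RESET_TTL_SECONDS = 15 * 60


@dataclass
class ResetRecord:
    user_id: str
    expires_at: float
    used: bool = False


# token_hash -> record; stands in for the PasswordResetToken table.
reset_tokens: Dict[str, ResetRecord] = {}


def _token_hash(raw_token: str) -> str:
    return hashlib.sha256(raw_token.encode("utf-8")).hexdigest()


def issue_reset_token(user_id: str, now: Optional[float] = None) -> str:
    """Store only the hash; return the raw token for the email link."""
    now = time.time() if now is None else now
    raw_token = secrets.token_urlsafe(32)  # 32 random bytes
    reset_tokens[_token_hash(raw_token)] = ResetRecord(
        user_id, now + RESET_TTL_SECONDS)
    return raw_token


def redeem_reset_token(raw_token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user_id once if the token is unexpired and unused."""
    now = time.time() if now is None else now
    record = reset_tokens.get(_token_hash(raw_token))
    if record is None or record.used or now >= record.expires_at:
        return None
    record.used = True  # single use: a replay of the same link now fails
    return record.user_id
```

In a real database the check and the used = true update would need to happen atomically (for example in a single conditional UPDATE) so that two concurrent requests cannot both redeem the same token.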
Potential Deep Dives
1. How do we store passwords securely?
The goal is to make offline brute-forcing a stolen users table as expensive as possible. The choice of hash function and its parameters determines what that cost is.
2. Session management: opaque token vs JWT
The session representation is the most consequential design choice in the auth system. It affects per-request latency, revocation speed, and horizontal scaling behavior.
3. How do we defend against brute force and credential stuffing?
Brute force attacks try many passwords against one account. Credential stuffing attacks replay username/password pairs from prior data breaches against many accounts. Both attack the same login endpoint but require different defenses.
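A two-layer limiter along these lines can be sketched as a sliding window kept in memory. The limits used (10 attempts per 60 seconds per IP, 5 per 15 minutes per account) are the ones this article uses elsewhere. The class and function names are hypothetical, and the dictionary stands in for Redis. As a simplification, the sketch counts every attempt instead of only failed ones.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the trailing window."""

    def __init__(self, max_attempts: int, window_seconds: float) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        attempts = self._attempts[key]
        # Drop timestamps that have slid out of the window.
        while attempts and attempts[0] <= now - self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


per_ip = SlidingWindowLimiter(max_attempts=10, window_seconds=60)
per_account = SlidingWindowLimiter(max_attempts=5, window_seconds=15 * 60)


def login_attempt_allowed(ip: str, email: str, now: float) -> bool:
    """Both layers must pass; both are evaluated so each records the attempt."""
    ip_ok = per_ip.allow(f"ip:{ip}", now)
    account_ok = per_account.allow(f"account:{email.lower()}", now)
    return ip_ok and account_ok
```

The per-account layer is what slows brute force against one user from many IPs, while the per-IP layer is what slows credential stuffing that spreads one attempt each across many accounts.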
Final Architecture
The single most important insight is the JWT split: access tokens validate locally inside the Auth Service with no Redis round-trip, while the refresh token lifecycle (issuance, rotation, revocation) lives entirely in Redis. The vast majority of authentication checks consume zero extra network I/O, while instant revocation remains available by deleting the refresh key.
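The refresh half of that split can be sketched as rotate-on-use with family revocation: each refresh call retires the old token and issues a new one, and any replay of a retired token revokes every token descended from the same login. The class below is a hypothetical in-memory stand-in for the Redis-backed store, and the refresh:{token} key shape is an assumption modeled on the session keys above.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class RefreshRecord:
    user_id: str
    family_id: str     # shared by every token descended from one login
    used: bool = False


class RefreshTokenStore:
    """Illustrative stand-in for the Redis-backed refresh token store."""

    def __init__(self) -> None:
        self._records: Dict[str, RefreshRecord] = {}
        self._revoked_families: Set[str] = set()

    def issue(self, user_id: str, family_id: Optional[str] = None) -> str:
        """Issue an opaque token; a new login starts a new family."""
        token = secrets.token_urlsafe(32)
        family = family_id or secrets.token_hex(8)
        self._records[f"refresh:{token}"] = RefreshRecord(user_id, family)
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one, or None if rejected."""
        record = self._records.get(f"refresh:{token}")
        if record is None or record.family_id in self._revoked_families:
            return None
        if record.used:
            # Replay of a retired token: treat the whole family as stolen.
            self._revoked_families.add(record.family_id)
            return None
        record.used = True
        return self.issue(record.user_id, record.family_id)
```

After a replay is detected, even the newest token in the family stops working, so both the attacker and the legitimate user are forced back through a full login.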
Interview Cheat Sheet
- Authentication proves identity; authorization controls what the proven identity can do. Scope the interview to authentication only and name RBAC as explicitly out of scope.
- Anchor the scale early: 10M registered users, 1M DAU, 700 logins per second at peak. These numbers justify Redis for sessions and drive Argon2id parameter tuning.
- Never store passwords in plaintext or with fast hashes (MD5, SHA-1, SHA-256). Use Argon2id; bcrypt is acceptable but silently truncates input at 72 bytes, which is a correctness hazard.
- Set Argon2id parameters to 64 MB memory, 3 iterations, 4 parallel threads. Target 50-100ms per hash on your login hardware. Retune after hardware upgrades.
- Always use a timing-safe comparison when verifying hashes. Naive string equality short-circuits on differing bytes and leaks hash prefixes via response time differences.
- Store password reset tokens as sha256(token) in the database. The raw token appears only in the email link. A DB breach cannot expose usable reset tokens without the raw values.
- The forgot-password endpoint always returns 202 with an identical neutral message regardless of whether the email is registered. Any divergence leaks account enumeration information.
- Reset tokens expire in 15 minutes and are single-use. Mark them used = true immediately on first redemption to prevent replay attacks.
- Issue short-lived access JWTs (15 minutes) plus long-lived opaque refresh tokens (30 days). Access JWTs validate locally with no Redis call. Refresh tokens are the revocation surface.
- On refresh token replay (a previously used token is submitted again), revoke the entire token family immediately. This is the primary detection signal for stolen refresh tokens.
- JWT is signed (HS256 for single-service, RS256 for multi-service), not encrypted. Never put sensitive claims, PII, or secrets in JWT payloads.
- Rate limit login at two layers: per-IP sliding window in Redis (10 attempts per 60 seconds) and per-account counter in Redis (5 attempts per 15 minutes). Both counters live in Redis, not the primary DB.
- After 3 consecutive login failures, require a CAPTCHA. After 5, enforce account lockout. Add progressive server-side delays (1 second after failure 2, 5 seconds after failure 4) to slow automated attacks.
- At registration, check the submitted password against the HaveIBeenPwned k-anonymity API. Send only the first 5 hex characters of sha1(password). Reject the password if the remainder of the hash appears among the suffixes in the response.
- For OAuth/OIDC social login: use Authorization Code + PKCE flow, never the Implicit flow. Store only the provider user ID, not the provider access token.
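The k-anonymity check in the cheat sheet can be sketched without any network code. The functions below split the SHA-1 hash into the 5-character prefix that is sent and the suffix that is matched locally. The response format assumed here (one SUFFIX:COUNT pair per line, as returned by the Pwned Passwords range endpoint) reflects my understanding of that API and should be confirmed against its documentation. The function names are hypothetical.

```python
import hashlib
from typing import Tuple


def hibp_prefix_and_suffix(password: str) -> Tuple[str, str]:
    """Split the uppercase hex SHA-1 into (sent prefix, locally kept suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def is_breached(password: str, range_response: str) -> bool:
    """Check whether the password's suffix appears in the range response.

    range_response is the body returned for the 5-character prefix,
    assumed to hold one "SUFFIX:COUNT" entry per line.
    """
    _, suffix = hibp_prefix_and_suffix(password)
    for line in range_response.splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return True
    return False
```

Only the prefix ever leaves the Auth Service, so the remote service cannot tell which of the many hashes sharing that prefix was being checked.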