Feature flags
How to decouple feature releases from code deployments using flags, covering flag types, targeting rules, evaluation architecture, flag debt cleanup, and gradual rollout strategies.
TL;DR
- Feature flags decouple deployment (code in production) from release (users see the feature). You deploy code on Monday but release the feature on Friday, with no additional deploy.
- Four flag types serve different purposes: release toggles (temporary rollout), ops toggles (kill switches), experiment toggles (A/B tests), and permission toggles (entitlement gates).
- Flag evaluation happens in-process via an SDK with a local cache synced from a central flag service. No network call on the hot path. Evaluation must be under 1ms.
- The biggest operational risk is flag debt: stale flags accumulate, nobody knows which are safe to remove, and the codebase fills with dead conditional branches.
- Combined with canary deployment, feature flags give you two layers of safety: canary validates the code works, flags control who sees the feature.
The Problem
Your team builds a new checkout flow over three months. The feature touches 40 files across the frontend and backend. Deployment day arrives. You merge the feature branch, deploy, and pray. The new checkout breaks for users on mobile Safari. Reverting requires a hotfix deploy, which takes 45 minutes. During that window, mobile conversion drops to zero.
The core issue: the code went from "never seen by a real user" to "seen by all users" in a single atomic deploy. There was no gradual exposure, no ability to disable it without a full revert, and no way to show it to internal users first.
What if the new checkout code were deployed but invisible? What if you could flip a switch in a dashboard to show it to 1% of users, watch the metrics, expand to 10%, then to everyone? And if something broke, you could flip the switch back in seconds, with no deployment needed?
That's what feature flags enable. The code is in production from day one. The feature is visible only when you decide.
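Here's the same deploy with a feature flag: the code ships with the flag off, the flag is turned on for 1% of users, the mobile Safari breakage shows up in that small cohort, and the flag is turned back off.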
The difference: the fix took 5 seconds (toggle a flag) instead of 45 minutes (emergency deploy). And only 1% of users were affected, not everyone.
One-Line Definition
Feature flags are conditional branches in your code that control which features are visible or active, evaluated at runtime against configuration rules that can be changed without redeploying.
Analogy
Think of a house with every room wired for electricity, but each room has its own circuit breaker in the panel. The wiring (code) is installed during construction (deployment). But the lights (features) only turn on when you flip the breaker (flag). If a room's wiring has a problem, you flip that one breaker off. No need to rewire the house (redeploy). No need to cut power to the whole building (rollback).
Solution Walkthrough
Flag types taxonomy
Not all flags serve the same purpose. The type determines the flag's expected lifetime, who manages it, and how it's cleaned up.
The most dangerous confusion: treating a release toggle like a permanent ops toggle. Release toggles should be removed within weeks. If they linger for months, nobody remembers whether the old code path still works, and you've accidentally created permanent dead code.
Release toggles
The most common type. Temporarily wraps a new feature to control its rollout.
if feature_flags.is_enabled("new_checkout_flow", user=current_user):
    return render_new_checkout()
else:
    return render_old_checkout()
Expected lifetime: days to weeks. Should be removed once rollout hits 100% and is stable. I've seen codebases with 200+ release toggles that were never cleaned up. At that point, the flag system is no longer helping you ship safely. It's actively hurting code clarity.
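One way to keep release toggles from lingering is to track a little metadata per flag and periodically list the ones that look finished. The sketch below is illustrative only: the `FlagRecord` fields, the `new_search_ui` flag, and the 30-day idle threshold are assumptions, not part of any particular flag product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical flag metadata; field names are illustrative."""
    key: str
    flag_type: str        # "release", "ops", "experiment", or "permission"
    rollout_percent: int  # 0-100
    last_changed: date    # last time the flag's configuration was edited


def stale_release_flags(flags, today, max_idle=timedelta(days=30)):
    """Return keys of release toggles at 100% that nobody has touched for max_idle."""
    return [
        f.key
        for f in flags
        if f.flag_type == "release"
        and f.rollout_percent == 100
        and today - f.last_changed >= max_idle
    ]


flags = [
    FlagRecord("new_checkout_flow", "release", 100, date(2024, 1, 5)),
    FlagRecord("recommendation_engine", "ops", 100, date(2023, 6, 1)),
    FlagRecord("new_search_ui", "release", 25, date(2024, 1, 10)),
]
print(stale_release_flags(flags, today=date(2024, 3, 1)))  # ['new_checkout_flow']
```

Only the fully rolled-out release toggle is reported. The ops toggle is older but permanent by design, and the partial rollout is still in progress.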
Ops toggles (kill switches)
Used to disable a feature during an incident without a code deployment:
if feature_flags.is_enabled("recommendation_engine"):
    items = recommendations.get(user_id)
else:
    items = fallback_popular_items()  # graceful degradation
Expected lifetime: permanent. The flag is the operational circuit breaker. During an incident, on-call flips the flag, and the expensive recommendation engine stops being called. No deploy. No code change. Response time: seconds.
For your interview: mention kill switches when discussing graceful degradation. "We'd wrap the recommendation call in an ops toggle so we can disable it during a database overload without redeploying."
Experiment toggles (A/B tests)
Split users into cohorts for product experiments:
variant = experiments.get_variant("checkout_button_color", user_id)
# variant is the "control", "blue_button", or "green_button" cohort; each carries a button_color
render_checkout(button_color=variant.button_color)
Expected lifetime: duration of the experiment (weeks to months). Must be cleaned up after the experiment concludes. Unlike release toggles, experiment toggles need stable cohort assignment, meaning the same user always sees the same variant for the duration of the experiment.
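Stable cohort assignment is commonly achieved by hashing the experiment key together with the user ID, so no per-user state has to be stored. The following is a minimal sketch under that assumption; `bucket`, `get_variant`, and the equal weights are hypothetical and do not describe the `experiments` API used above.

```python
import hashlib


def bucket(experiment_key: str, user_id: str) -> float:
    """Map (experiment, user) to a stable number in [0, 1)."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32), so the quotient stays below 1.
    return int.from_bytes(digest[:4], "big") / 2**32


def get_variant(experiment_key, user_id, variants):
    """Pick a variant name; `variants` is a list of (name, weight) summing to 1.0."""
    point = bucket(experiment_key, user_id)
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding in the weights


VARIANTS = [("control", 1 / 3), ("blue_button", 1 / 3), ("green_button", 1 / 3)]

first = get_variant("checkout_button_color", "user-42", VARIANTS)
second = get_variant("checkout_button_color", "user-42", VARIANTS)
print(first == second)  # True: the same user always lands in the same cohort
```

Including the experiment key in the hash means a user's bucket in one experiment says nothing about their bucket in another, which keeps experiments from being correlated.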
Permission toggles
Control access to features based on user attributes:
if user.tier == "enterprise" and feature_flags.is_enabled("sso"):
    show_sso_settings()
Expected lifetime: potentially permanent. These aren't really "deployment" tools. They're entitlement gates. The SSO feature is always deployed but only visible to enterprise customers.
Targeting rules
Modern flag systems support complex targeting beyond simple on/off:
"Show new checkout to users where:
    user.country == 'US'
    AND user.account_age_days > 30
    AND random_bucket(user_id) < 0.05   (5% of eligible users)"
Evaluation order (most specific first):
1. Enable for internal users (email matches *@company.com)
2. Enable for beta users (opted in)
3. Enable for 5% random users (percentage rollout)
4. Default: disabled for everyone else
Targeting rules should always evaluate from most specific to most general. The first matching rule wins. This lets you override the general rollout percentage for specific users or segments without changing the overall rollout.
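The first-match-wins ordering can be sketched as an ordered list of rules walked from top to bottom. Everything here (`User`, `Rule`, `in_rollout`, `evaluate`) is a hypothetical illustration of the evaluation order above, not the API of a real flag SDK.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    beta_opt_in: bool = False


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[User], bool]
    enabled: bool


def in_rollout(flag_key: str, user_id: str, fraction: float) -> bool:
    """Deterministic percentage rollout: hash the user into [0, 1) and compare."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32 < fraction


def evaluate(rules, user, default=False):
    """Return (enabled, rule_name) for the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(user):
            return rule.enabled, rule.name
    return default, "default"


# Ordered from most specific to most general.
RULES = [
    Rule("internal", lambda u: u.email.endswith("@company.com"), True),
    Rule("beta", lambda u: u.beta_opt_in, True),
    Rule("rollout_5pct", lambda u: in_rollout("new_checkout_flow", u.user_id, 0.05), True),
]

print(evaluate(RULES, User("u1", "dev@company.com")))          # (True, 'internal')
print(evaluate(RULES, User("u2", "pat@example.com", True)))    # (True, 'beta')
```

Returning the name of the matching rule alongside the result is a design choice that makes it easier to explain why a given user saw a feature.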
The Flag Evaluation Architecture
Flag evaluation must be fast. In a web application, a single request might evaluate 10-20 flags. If each evaluation makes a network call to a flag service, you've added 10-20 network round trips. Unacceptable.
The solution: evaluate flags locally using an in-process SDK backed by a cached copy of the flag configuration. The SDK syncs with the central flag service in the background.
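A minimal sketch of that idea follows. It assumes a hypothetical `fetch_config` callable standing in for the call to the central flag service, and it uses simple periodic polling for the background sync rather than a streaming connection, purely to keep the example short. `LocalFlagClient` is an illustrative name, not a real SDK class.

```python
import threading


class LocalFlagClient:
    """Serves flag reads from memory; a background thread refreshes the copy."""

    def __init__(self, fetch_config, poll_interval_s=30.0):
        self._fetch_config = fetch_config  # hypothetical call to the flag service
        self._lock = threading.Lock()
        self._flags = {}
        self.refresh()                     # blocking load at startup
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._sync_loop, args=(poll_interval_s,), daemon=True
        )
        self._thread.start()

    def refresh(self):
        """Pull the full configuration once and swap it in."""
        fresh = dict(self._fetch_config())  # the only place a network call would happen
        with self._lock:
            self._flags = fresh

    def _sync_loop(self, interval_s):
        # Event.wait returns False on timeout and True once close() sets the event.
        while not self._stop.wait(interval_s):
            try:
                self.refresh()
            except Exception:
                pass  # keep serving the last known config if the service is down

    def is_enabled(self, key, default=False):
        """Hot path: a dictionary lookup, no I/O."""
        with self._lock:
            return self._flags.get(key, default)

    def close(self):
        self._stop.set()
        self._thread.join()


# Stand-in for the central flag service.
remote_config = {"recommendation_engine": True}
client = LocalFlagClient(lambda: remote_config, poll_interval_s=60.0)

print(client.is_enabled("recommendation_engine"))  # True
remote_config["recommendation_engine"] = False      # an operator flips the flag
print(client.is_enabled("recommendation_engine"))  # True: local copy not yet synced
client.refresh()                                    # simulate the background sync firing
print(client.is_enabled("recommendation_engine"))  # False
client.close()
```

If the flag service is unreachable, the client keeps answering from its last known copy, so an outage of the flag service does not take down request handling.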
Client-side vs server-side evaluation:
| Approach | Evaluation Location | Latency | Security | Use Case |
|---|---|---|---|---|
| Server-side SDK | Application server | Sub-millisecond | Rules stay server-side | Backend services |
| Client-side SDK | Browser/mobile | Sub-millisecond (cached) | Evaluated flag values visible to client; rules stay server-side | Frontend features |
| Edge evaluation | CDN edge | Sub-millisecond | Rules held in CDN edge config | Performance-critical |
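For the server-side SDK, the application process downloads the full flag ruleset at startup and evaluates locally. Updates stream from the flag service via Server-Sent Events (SSE) or WebSocket, so a change typically reaches connected processes within seconds. SDKs that poll instead of streaming see a delay of up to one polling interval, commonly 30 to 60 seconds.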
For client-side SDKs, the evaluation happens in the browser or mobile app. The SDK calls the flag service once (on page load or app launch), receives the evaluated results for the current user, and caches them. The key difference: the client-side SDK receives pre-evaluated boolean results, not the rules themselves. Sending targeting rules to the browser would expose business logic and user segmentation to anyone with DevTools open.
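The server side of that exchange might look like the sketch below: the rules stay in server memory, and only the evaluated results for one user are serialized for the browser. `RULESET` and `bootstrap_payload` are hypothetical names, and the rules reuse the targeting examples from earlier in this article.

```python
import json

# Hypothetical server-side ruleset; it is never sent to the client.
RULESET = {
    "new_checkout_flow": lambda u: u["country"] == "US" and u["account_age_days"] > 30,
    "sso": lambda u: u["tier"] == "enterprise",
}


def bootstrap_payload(user: dict) -> str:
    """Evaluate every flag for one user and serialize only the results."""
    results = {key: bool(rule(user)) for key, rule in RULESET.items()}
    return json.dumps(results, sort_keys=True)


user = {"country": "US", "account_age_days": 45, "tier": "free"}
print(bootstrap_payload(user))  # {"new_checkout_flow": true, "sso": false}
```

Anyone inspecting the response sees which flags are on for them, but not the country, account-age, or tier conditions that produced those values.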
Interview tip: flag evaluation performance
When discussing flag evaluation in interviews, mention "in-process SDK with local cache, no network call on the hot path." This shows you understand that flag evaluation is on the critical request path and must be fast.
Implementation Sketch