Feature flag engine

The Problem

Your company ships features behind boolean flags: if (featureFlags.get("dark-mode")) { ... }. The flags live in a YAML file that someone edits, commits, and redeploys. A typo in the YAML brought down checkout for 45 minutes last month. Product wants percentage rollouts ("give 5% of users the new search bar"), segment targeting ("enable for premium users in Canada"), and the ability to toggle flags in seconds without a deploy.

A hardcoded config file cannot do any of that. You need an evaluation engine that takes a flag key and user context, walks a tree of targeting rules, and returns the correct variant. The engine must be fast (it runs on every request), deterministic (same user always sees the same variant), and extensible (new rule types without editing existing evaluation code).

Design the core classes for a feature-flag evaluation engine that supports boolean and multivariate flags, percentage rollouts with sticky bucketing, composite targeting rules (AND/OR/NOT), override precedence, flag-change notification, and audit logging.

Requirements

Clarifying Questions

Before jumping into class design, ask questions to turn the vague prompt into a concrete specification. Cover four areas: core actions, error handling, boundaries, and future extensions.

You: "Are flags boolean only, or do we need multivariate flags that return strings or numbers?"

Interviewer: "Both. A boolean flag returns true/false. A multivariate flag returns one of several named variants, like 'control', 'variant-a', 'variant-b'."

Two flag types. Boolean is a special case of multivariate with exactly two variants. We can unify the model by treating every flag as multivariate and making boolean a convenience wrapper.
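A minimal sketch of that unification, assuming a hypothetical `Variant` type and an `as_bool` helper (neither name comes from the interviewer): a boolean flag is simply a flag whose two variants carry `True` and `False`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Variant:
    """A named value a flag can resolve to."""
    name: str
    value: Any


# Assumed convention: a boolean flag is a multivariate flag with
# exactly two variants, "on" and "off".
BOOLEAN_VARIANTS = {
    "on": Variant("on", True),
    "off": Variant("off", False),
}


def as_bool(variant: Variant) -> bool:
    """Convenience wrapper: read a resolved variant as a plain bool."""
    if not isinstance(variant.value, bool):
        raise TypeError(f"variant {variant.name!r} does not hold a boolean")
    return variant.value
```

The evaluation path then only ever deals with variants; callers that want `if enabled:` go through the wrapper.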

You: "For percentage rollouts, does the same user always see the same variant, or is it random on every evaluation?"

Interviewer: "Sticky. Once a user lands in the 20% bucket, they stay there. Use consistent hashing so the assignment is deterministic without storing per-user state."

Sticky bucketing via a deterministic hash of userId + flagKey: the same inputs always map to the same bucket, so there are no database lookups on the hot path.
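One way to sketch the bucketing, assuming SHA-256 as the hash and 100 buckets (both are illustrative choices, not something the interviewer specified). Python's built-in `hash()` is deliberately avoided because string hashes are randomized per process by default, which would break stickiness across servers and restarts.

```python
import hashlib


def bucket(user_id: str, flag_key: str, buckets: int = 100) -> int:
    """Map (user, flag) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{user_id}:{flag_key}".encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(user_id: str, flag_key: str, percentage: int) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return bucket(user_id, flag_key) < percentage
```

Including the flag key in the hashed string means a user who lands in the first 20% for one flag is not automatically in the first 20% for every other flag.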

You: "How complex can targeting rules get? Simple key-value checks, or nested AND/OR/NOT trees?"

Interviewer: "Nested. For example: (country = 'US' AND plan = 'premium') OR (email ends with '@beta.com'). Rules can be composed arbitrarily."

That rules out flat if-else chains. We need a tree structure for rule evaluation. Composite pattern is the natural fit.
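A rough sketch of such a tree, using the interviewer's example rule. The operator names (`"eq"`, `"ends_with"`) and the choice to treat a missing attribute as a non-match are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple, Union

# Assumed leaf operators; a real engine would likely make these pluggable.
_LEAF_OPS = {
    "eq": lambda actual, target: actual == target,
    "ends_with": lambda actual, target: isinstance(actual, str)
    and actual.endswith(target),
}


@dataclass(frozen=True)
class RuleCondition:
    """Leaf node: a single attribute check."""
    attribute: str
    operator: str
    target_value: Any

    def matches(self, attrs: Mapping[str, Any]) -> bool:
        if self.attribute not in attrs:
            return False  # assumption: a missing attribute never matches
        return _LEAF_OPS[self.operator](attrs[self.attribute], self.target_value)


@dataclass(frozen=True)
class CompositeRule:
    """Inner node: combines children with AND, OR, or NOT."""
    operator: str
    children: Tuple["Rule", ...]

    def matches(self, attrs: Mapping[str, Any]) -> bool:
        if self.operator == "AND":
            return all(child.matches(attrs) for child in self.children)
        if self.operator == "OR":
            return any(child.matches(attrs) for child in self.children)
        if self.operator == "NOT":
            if len(self.children) != 1:
                raise ValueError("NOT takes exactly one child")
            return not self.children[0].matches(attrs)
        raise ValueError(f"unknown operator: {self.operator}")


Rule = Union[RuleCondition, CompositeRule]

# (country = 'US' AND plan = 'premium') OR (email ends with '@beta.com')
example_rule = CompositeRule("OR", (
    CompositeRule("AND", (
        RuleCondition("country", "eq", "US"),
        RuleCondition("plan", "eq", "premium"),
    )),
    RuleCondition("email", "ends_with", "@beta.com"),
))
```

Leaves and inner nodes expose the same `matches` method, so the caller never needs to know how deep the tree goes.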

You: "What is the override precedence? If a user matches multiple rules, which one wins?"

Interviewer: "User-level override beats everything. Then segment rules in priority order. Then percentage rollout. Then the default variant."

Clear precedence chain: user override > segment rules (ordered) > percentage rollout > default.
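The chain can be sketched as a function that returns at the first level that produces a variant. The parameter shapes here (a dict of overrides, a list of predicate/variant pairs, a rollout callable that returns `None` when the user is outside the rollout) are hypothetical simplifications.

```python
from typing import Any, Callable, Mapping, Optional, Sequence, Tuple

Predicate = Callable[[Mapping[str, Any]], bool]


def resolve_variant(
    user_id: str,
    attrs: Mapping[str, Any],
    overrides: Mapping[str, str],
    segment_rules: Sequence[Tuple[Predicate, str]],
    rollout: Optional[Callable[[str], Optional[str]]],
    default: str,
) -> str:
    # 1. A user-level override beats everything.
    if user_id in overrides:
        return overrides[user_id]
    # 2. Segment rules, already sorted by priority; the first match wins.
    for matches, variant in segment_rules:
        if matches(attrs):
            return variant
    # 3. Percentage rollout; None means the user is outside the rollout.
    if rollout is not None:
        variant = rollout(user_id)
        if variant is not None:
            return variant
    # 4. Nothing applied, so fall back to the default variant.
    return default
```

Because each level returns early, a lower-precedence level is never consulted once a higher one has answered.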

You: "Do other parts of the system need to react when a flag changes? For example, invalidating a cache or refreshing a UI."

Interviewer: "Yes. SDK clients and internal services should be notified when a flag configuration is updated so they can re-evaluate."

Observer pattern for flag-change propagation. Listeners register and get notified on config updates.
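A small sketch of the listener mechanics, assuming listeners are plain callables and that one failing listener should not stop the others from being notified (the interviewer did not specify a failure policy). `FlagChangeNotifier` is a hypothetical name.

```python
import logging
from typing import Any, Callable, List

Listener = Callable[[str, Any, Any], None]  # (flag_key, old_config, new_config)

logger = logging.getLogger(__name__)


class FlagChangeNotifier:
    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> Callable[[], None]:
        """Register a listener and return a function that removes it again."""
        self._listeners.append(listener)

        def unsubscribe() -> None:
            if listener in self._listeners:
                self._listeners.remove(listener)

        return unsubscribe

    def notify(self, flag_key: str, old_config: Any, new_config: Any) -> None:
        # Iterate over a copy so listeners may unsubscribe during notification.
        for listener in list(self._listeners):
            try:
                listener(flag_key, old_config, new_config)
            except Exception:
                # Assumed policy: log the failure and keep notifying the rest.
                logger.exception("listener failed for flag %s", flag_key)
```

A cache invalidator, for example, would subscribe once at startup and drop its entry for `flag_key` whenever it is called.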

You: "Should we support different flag configs per environment, like dev, staging, and production?"

Interviewer: "Keep it simple. Assume a single environment for now. Environment scoping is an extension."

Good. One fewer dimension to model. We can note it in Extensibility.

You: "Do we need an audit trail of flag evaluations or config changes?"

Interviewer: "Config changes, yes. Log who changed what and when. Evaluation-level logging is out of scope."

Audit log for config mutations only. Per-evaluation logging would be too noisy.
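As a sketch, an audit entry needs to capture who, what, and when. The field names below are assumptions, and the in-memory list stands in for whatever storage a real system would use (persistence is out of scope here).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Tuple


@dataclass(frozen=True)
class AuditEntry:
    actor: str          # who made the change
    flag_key: str       # which flag was changed
    old_config: Any     # what it looked like before
    new_config: Any     # what it looks like now
    timestamp: datetime  # when, recorded in UTC


class AuditLog:
    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def log(self, actor: str, flag_key: str, old_config: Any, new_config: Any) -> AuditEntry:
        entry = AuditEntry(actor, flag_key, old_config, new_config,
                           datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[AuditEntry, ...]:
        """Return a read-only copy so callers cannot rewrite history."""
        return tuple(self._entries)
```

Entries are frozen and the list is only ever appended to, which keeps the trail trustworthy.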

Perfect. You have clarified scope and ruled out unnecessary complexity.

Final Requirements

Functional Requirements:

Evaluate a flag for a given user context and return the resolved variant (boolean or multivariate).
Support targeting rules that can be composed into AND/OR/NOT trees.
Support percentage rollouts with sticky bucketing (consistent hashing).
Enforce override precedence: user override > segment rules > percentage rollout > default.
Notify registered listeners when a flag configuration changes.
Log all flag configuration changes in an audit trail.

Non-Functional Requirements:

Fast evaluation: sub-millisecond for the common case (hot path on every request).
Thread-safe: concurrent evaluations and config updates must not corrupt state.
Extensible: new rule condition types (geo, device, date range) without editing existing evaluation code.
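One possible way to meet the thread-safety requirement without slowing the hot path is to treat the whole flag map as an immutable snapshot that writers replace wholesale. The sketch below assumes that rebinding an attribute is atomic (true in CPython) and uses a hypothetical `FlagStore` class; it is one option among several, not a prescribed design.

```python
import threading
from types import MappingProxyType
from typing import Any, Mapping, Optional


class FlagStore:
    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        self._snapshot: Mapping[str, Any] = MappingProxyType({})

    def get(self, flag_key: str) -> Optional[Any]:
        # Readers take no lock: they see whichever complete snapshot
        # was current when they read the attribute.
        return self._snapshot.get(flag_key)

    def put(self, flag_key: str, config: Any) -> None:
        # Writers serialize on the lock, copy the current map, apply the
        # change, and publish the new snapshot in a single assignment.
        with self._write_lock:
            updated = dict(self._snapshot)
            updated[flag_key] = config
            self._snapshot = MappingProxyType(updated)
```

Reads stay cheap because they never contend with writers. Writes pay for a full copy, which is acceptable when config changes are rare compared with evaluations.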

Out of Scope:

UI for flag management
Persistence / database layer
Environment scoping (dev/staging/prod)
Per-evaluation logging
A/B test metric collection

Example Inputs and Outputs

Scenario 1: Boolean flag with user override

Flag: dark-mode (boolean, default = false)
User override: userId "user-42" is force-enabled
Context: { userId: "user-42", country: "US", plan: "free" }
Expected: true (user override wins)
Why: validates that user overrides take highest precedence

Scenario 2: Percentage rollout

Flag: new-search-bar (boolean, default = false, 20% rollout enabled)
No user override, no segment rules match
Context: { userId: "user-77", country: "DE", plan: "premium" }
Expected: true if hash("user-77:new-search-bar") % 100 < 20, else false
Why: validates sticky percentage bucketing

Scenario 3: Composite targeting rule

Flag: premium-feature (boolean, default = false)
Segment rule: (country = "US" AND plan = "premium") OR (email endsWith "@beta.com")
Context: { userId: "user-99", country: "US", plan: "premium", email: "alice@work.com" }
Expected: true (left branch of OR matches)
Why: validates composite rule tree evaluation

Try It Yourself

Before reading the solution, spend 15-20 minutes sketching your own class diagram. Focus on how you would model the rule tree and the override precedence chain. Compare your approach with the walkthrough below.

Step 1: Identify Core Entities

Start by asking: what are the main "things" in this problem? Look for nouns in the requirements and think about what responsibilities each one should have.

A feature-flag engine does one thing on the hot path: take a flag key and user context, walk a set of rules, and return a variant. Around that core loop, it manages flag configuration, notifies listeners of changes, and logs mutations.

Entity	Responsibility	Key attributes
`FeatureFlag`	Holds the full config for one flag: variants, rules, rollout, overrides, and default.	key, flagType, variants, defaultVariant, rules, overrides, rollout
`Variant`	A named value a flag can resolve to. For booleans, just `"on"` and `"off"`.	name, value
`EvaluationContext`	Carries user attributes at evaluation time. Immutable data holder.	userId, attributes map
`RuleCondition`	A single leaf condition like `country = "US"`. Composite nodes combine these.	attribute, operator, targetValue
`CompositeRule`	AND/OR/NOT node that composes child conditions into a tree.	operator (AND/OR/NOT), children
`RolloutStrategy`	Determines how percentage rollouts are bucketed. Strategy interface.	percentage, evaluate(context, flagKey)
`FlagChangeListener`	Observer notified when a flag config is updated.	onFlagChanged(flagKey, oldConfig, newConfig)
`AuditLog`	Records config mutations with who, when, and what.	entries list, log(entry)
`FlagEngine`	Orchestrator. Evaluates flags, manages configs, notifies listeners.	flagStore, evaluator, listeners, auditLog

Notice we separated RuleCondition (leaf) from CompositeRule (tree node). A flat list of conditions cannot express (A AND B) OR C. The Composite pattern lets us nest arbitrarily. We also separated RolloutStrategy from FeatureFlag because the bucketing algorithm is a policy that can vary independently.
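To make the "immutable data holder" row concrete, here is a rough sketch of `EvaluationContext`. The defensive copy and the read-only mapping view are assumptions about how immutability might be enforced, and `get` is a hypothetical convenience accessor.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class EvaluationContext:
    user_id: str
    attributes: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy the caller's dict and wrap it in a read-only view so that
        # neither the caller nor a rule can mutate the context afterwards.
        object.__setattr__(
            self, "attributes", MappingProxyType(dict(self.attributes))
        )

    def get(self, name: str, default: Any = None) -> Any:
        return self.attributes.get(name, default)
```

An immutable context can be shared freely between threads and passed down the rule tree without any risk that one rule changes what the next rule sees.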

Step 2: Define Relationships and Class Design

FlagEngine (orchestrator)

The engine is the entry point. It owns the flag store, the listener list, and the audit log. Evaluation is its core responsibility.

Deriving state from requirements: