How content moderation works at scale

The Problem Statement

Interviewer: "Facebook processes over 100 million posts per day. YouTube receives 500 hours of video every minute. How do these platforms decide what content violates their policies and should be removed? Walk me through the architecture of a content moderation system at scale."

This question tests whether you understand the tradeoffs between speed, accuracy, and cost in a classification pipeline. It also tests whether you know that "content moderation" is not one system but a tiered pipeline where each layer handles different confidence levels.

The hidden rubric: the interviewer wants to hear about the three tiers (hash matching, ML classification, human review), the concept of confidence thresholds and routing, and the tradeoff between false positives (removing good content, angering users) and false negatives (leaving bad content up, causing harm). If you only talk about "an ML model that classifies content," you have missed 80% of the system.

Clarifying the Scenario

You: "When you say content moderation, are we talking about a specific content type (images, text, video) or the full pipeline across all types?"

Interviewer: "Think about a platform that handles all types, but focus your deep dives on whichever parts are most interesting architecturally."

You: "Got it. And should I assume we are building this for a large platform (100M+ posts per day) or a smaller one?"

Interviewer: "Large scale. Think Facebook or YouTube."

You: "OK. I also want to clarify: are we covering only the automated pipeline, or also the human review workflow and appeals process?"

Interviewer: "All of it. I want to see the end-to-end system."

You: "Perfect. I will structure my answer in three parts: hash-based detection for known bad content, ML classification for new violations, and the human review queue that handles ambiguous cases. Then I will cover how these tiers connect and the key tradeoffs."

My Approach

I think about content moderation as a funnel with three stages, each more expensive per item than the last and better suited to ambiguous cases:

Perceptual hashing (Tier 1): Compare incoming content against a database of known-bad hashes. This is fast (~1ms), highly precise, and handles previously identified illegal content (child exploitation material, known terrorist propaganda). It catches 5-15% of all violations but with near-zero false positives.
ML classification (Tier 2): Run every piece of content that clears Tier 1 through machine learning classifiers that detect categories like nudity, violence, hate speech, and spam. This is moderately fast (~50-200ms) and catches 70-85% of remaining violations, but it produces probabilistic confidence scores that need threshold tuning.
Human review (Tier 3): Route content that falls in the "uncertain" confidence range to human moderators, prioritized by severity and reach. This is slow (minutes to hours) and expensive ($1-3 per review) but handles context-dependent decisions that ML cannot make.
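To make the funnel concrete in an interview, it helps to turn the percentages above into shares of all violations. The sketch below is only arithmetic on the ranges quoted in this section; the function name and the dictionary keys are my own labels, not part of any real system.

```python
def funnel_coverage(tier1_rate: float, tier2_rate: float) -> dict[str, float]:
    """Split all violations into shares caught by each automated tier.

    tier1_rate: fraction of ALL violations caught by hash matching.
    tier2_rate: fraction of the REMAINING violations caught by ML classifiers.
    """
    caught_by_tier1 = tier1_rate
    caught_by_tier2 = (1.0 - tier1_rate) * tier2_rate
    uncaught = 1.0 - caught_by_tier1 - caught_by_tier2
    return {
        "tier1_hash": caught_by_tier1,
        "tier2_ml": caught_by_tier2,
        "uncaught_by_automation": uncaught,
    }


if __name__ == "__main__":
    # Best case and worst case from the ranges quoted above.
    for t1, t2 in [(0.15, 0.85), (0.05, 0.70)]:
        shares = funnel_coverage(t1, t2)
        print(t1, t2, {k: round(v, 4) for k, v in shares.items()})
```

With the quoted ranges, roughly 13% to 29% of violations are not caught by the two automated tiers. That residue is what human review, user reports, and appeals have to absorb.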

The key architectural challenge is not any single tier. It is the routing logic between tiers: which confidence thresholds trigger which actions, how to prioritize the human review queue, and how to handle the tradeoff between speed (take it down now) and accuracy (are we sure it is actually a violation).
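A small sketch can show why threshold choice is the real design decision. It counts false positives, false negatives, and human reviews for a given pair of thresholds. The sample scores are toy data I made up for illustration, and the sketch assumes anything sent to human review is decided correctly, which is a simplification.

```python
def route_counts(samples, remove_at: float, review_at: float) -> dict[str, int]:
    """Count outcomes for (score, is_violation) pairs under two thresholds.

    score > remove_at          -> auto-removed
    score < review_at          -> auto-allowed
    anything in between        -> sent to human review
    """
    counts = {"false_positives": 0, "false_negatives": 0, "human_reviews": 0}
    for score, is_violation in samples:
        if score > remove_at:
            if not is_violation:
                counts["false_positives"] += 1  # good content removed
        elif score < review_at:
            if is_violation:
                counts["false_negatives"] += 1  # bad content left up
        else:
            counts["human_reviews"] += 1
    return counts


# Toy data (hypothetical): classifier score and ground-truth label.
SAMPLES = [
    (0.99, True), (0.97, True), (0.96, False), (0.90, True), (0.85, False),
    (0.75, True), (0.65, True), (0.40, False), (0.10, False), (0.05, False),
]

if __name__ == "__main__":
    for remove_at, review_at in [(0.95, 0.70), (0.98, 0.50), (0.80, 0.70)]:
        print(remove_at, review_at, route_counts(SAMPLES, remove_at, review_at))
```

On this toy data, widening the review band removes the automated errors but doubles the human workload, while narrowing it cuts reviews and adds wrongful removals. That is the speed, accuracy, and cost tradeoff in miniature.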

The Architecture

Here is how the flow works, step by step:

Content arrives: A user posts an image, writes a comment, or uploads a video. The preprocessor extracts analyzable content: text is tokenized, images are resized and normalized, video is broken into key frames.
Tier 1 hash check: The content's perceptual hash is compared against a database of known-bad hashes. If there is a match, the content is removed immediately with no further analysis. This catches re-uploads of previously identified illegal material. Even at Facebook's scale, this lookup takes roughly 1 millisecond per item.
Tier 2 ML classification: If no hash match, the content runs through multiple ML classifiers simultaneously (one for nudity, one for violence, one for hate speech, etc.). Each classifier returns a confidence score between 0 and 1.
Confidence routing: A routing engine examines the scores. High confidence (above 0.95) triggers automatic removal. Low confidence (below 0.7) means the content is allowed. The middle band (0.7 to 0.95) is routed to human review.
Human review: A priority queue ranks items by severity (the type of violation) multiplied by reach (how many people will see this content). A moderator reviews the content in context and makes a final decision.
Appeals: Users whose content was removed can appeal. A different moderator reviews the decision. If the second moderator disagrees with the first, the content is restored.
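The steps above can be sketched as one small routing function plus a priority queue. This is a minimal sketch under stated assumptions, not how any particular platform implements it: routing on the single highest classifier score is my assumption (the text does not say how multiple scores are combined), the severity weights are hypothetical, and exact set membership stands in for a real perceptual-hash lookup.

```python
import heapq

REMOVE_THRESHOLD = 0.95  # above this: automatic removal
REVIEW_THRESHOLD = 0.70  # below this: allowed

# Hypothetical severity weights per violation category.
SEVERITY = {"violence": 3.0, "hate_speech": 2.0, "nudity": 2.0, "spam": 1.0}


def route(scores: dict[str, float]) -> tuple[str, str]:
    """Return (action, category) based on the highest classifier score."""
    category, top_score = max(scores.items(), key=lambda kv: kv[1])
    if top_score > REMOVE_THRESHOLD:
        return "remove", category
    if top_score < REVIEW_THRESHOLD:
        return "allow", category
    return "review", category


class ReviewQueue:
    """Human review queue ordered by severity multiplied by reach."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker so equal priorities stay FIFO

    def push(self, content_id: str, category: str, reach: int) -> None:
        priority = SEVERITY.get(category, 1.0) * reach
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, content_id))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def moderate(content_id: str, content_hash: str, scores: dict[str, float],
             reach: int, known_bad: set[str], queue: ReviewQueue) -> str:
    """Run one item through Tier 1, Tier 2 routing, and Tier 3 queueing."""
    if content_hash in known_bad:  # Tier 1: stand-in for a perceptual match
        return "remove"
    action, category = route(scores)  # Tier 2: confidence routing
    if action == "review":  # Tier 3: queue for a human moderator
        queue.push(content_id, category, reach)
    return action


if __name__ == "__main__":
    queue = ReviewQueue()
    known_bad = {"hash-of-known-bad-item"}
    print(moderate("post-1", "hash-of-known-bad-item", {"spam": 0.1}, 10, known_bad, queue))
    print(moderate("post-2", "h2", {"spam": 0.80}, 1000, known_bad, queue))
    print(moderate("post-3", "h3", {"violence": 0.90}, 500, known_bad, queue))
    print(queue.pop())  # highest severity-times-reach item comes out first
```

Keeping the thresholds as named constants makes the point that they are policy knobs, tuned per category in practice, rather than properties of the model.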

For the interview, the most important thing to communicate is the funnel shape. Tier 1 is the cheapest and handles the clearest cases. Tier 2 is moderate cost and handles the bulk. Tier 3 is expensive and handles only the ambiguous middle.

Perceptual Hashing and Known-Bad Content Detection

The fastest and most accurate layer in the pipeline is hash matching. It answers one question: "Have we seen this exact (or nearly exact) piece of content before, and was it previously classified as a violation?"

The key concept: a perceptual hash is not a cryptographic hash. A cryptographic hash such as MD5 or SHA-256 produces a completely different output if you change a single pixel. A perceptual hash produces a similar output for visually similar images. Crop the image slightly, add a watermark, change the resolution, or apply a filter, and the perceptual hash stays nearly identical.
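The difference is easy to demonstrate with a toy example. The sketch below uses an "average hash", a very simple perceptual hash that I am substituting for illustration; it is not PhotoDNA or any production algorithm. The 8x8 pixel grids are made-up stand-ins for downscaled grayscale images.

```python
import hashlib


def average_hash(pixels: list[list[int]]) -> int:
    """64-bit average hash: one bit per pixel, set if the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def sha256_hex(pixels: list[list[int]]) -> str:
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


# A simple 8x8 gradient as the "original" image (values 0..252).
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
# A slightly brightened copy, as if a mild filter had been applied.
brightened = [[min(255, p + 2) for p in row] for row in original]
# A visually different image: the inverted gradient.
inverted = [[252 - p for p in row] for row in original]

if __name__ == "__main__":
    print("SHA-256 equal:", sha256_hex(original) == sha256_hex(brightened))
    print("aHash distance (brightened):", hamming(average_hash(original), average_hash(brightened)))
    print("aHash distance (inverted):", hamming(average_hash(original), average_hash(inverted)))
```

The brightened copy gets a different SHA-256 digest but the same average hash, while the inverted image differs in every bit. Real perceptual hashes are far more robust than this, but the matching idea is the same: compare by distance, not equality.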

PhotoDNA (developed by Microsoft, adopted as an industry standard) generates a 144-byte hash from the image's structural properties. Two visually identical images produce hashes with a Hamming distance of less than 10 (out of 1152 bits). Two visually different images produce a distance greater than 100.
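The comparison step described above can be sketched as a Hamming distance check over 144-byte hashes. This does not reproduce PhotoDNA's hashing algorithm, which is proprietary; the byte strings here are made-up stand-ins, and the helper names and the linear scan are my own simplifications. The threshold of 10 bits is the figure quoted in this section.

```python
HASH_BYTES = 144       # 144 bytes = 1152 bits, as quoted above
MATCH_THRESHOLD = 10   # distances below this count as a match


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must be the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bits(h: bytes, positions: list[int]) -> bytes:
    """Return a copy of h with the given bit positions flipped (test helper)."""
    out = bytearray(h)
    for pos in positions:
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def is_known_bad(candidate: bytes, known_bad: list[bytes]) -> bool:
    """Linear scan for any known-bad hash within the match threshold."""
    return any(hamming_distance(candidate, bad) < MATCH_THRESHOLD for bad in known_bad)


# Made-up hashes for illustration only.
known = bytes(range(HASH_BYTES))
near_duplicate = flip_bits(known, [3, 200, 777])   # 3 bits differ
unrelated = bytes(b ^ 0xFF for b in known)         # every bit differs

if __name__ == "__main__":
    database = [known]
    print(hamming_distance(known, near_duplicate), is_known_bad(near_duplicate, database))
    print(hamming_distance(known, unrelated), is_known_bad(unrelated, database))
```

A linear scan like this would not hold up against a database of millions of hashes. If the interviewer pushes on that, the natural follow-up is an index built for nearest-neighbor search over bit strings, which I would flag as a design direction rather than a detail of any specific platform.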

