Design a Version Control System
OOP design for a Git-like version control system covering repositories, commits as a DAG, branching, merging, diffs, staging area, and conflict detection.
The Problem
Your team builds developer tools for a fast-growing startup. Engineers currently share code by copying zip files to a shared drive, emailing diffs, and hoping nobody overwrites someone else's changes. Last week two developers spent a full day reconciling conflicting edits to the same service. The CTO wants an internal version control system that tracks changes, supports branching for parallel feature work, and detects conflicts before they reach production.
A naive approach is storing full copies of every file on each save. That wastes storage, provides no way to trace who changed what, and gives zero support for parallel development. A content-addressable object model (like Git's) solves all three: identical content is stored once, commits form a directed acyclic graph (DAG) that records full history, and branches are lightweight pointers that enable parallel work with merge-time conflict detection.
Design the core classes for a version control system that supports repositories, content-addressable storage, commits forming a DAG, branching, staging, diffing, and three-way merge with conflict detection.
Requirements
Clarifying Questions
Before jumping into class design, ask questions to turn the vague prompt into a concrete specification. Cover four areas: storage model, branching, merging, and boundaries.
You: "How should file content be stored? Full snapshots per commit, or deltas between versions?"
Interviewer: "Use content-addressable storage. Hash the content to get an ID. If two files have identical content, store it once."
Content-addressable storage means a Blob is keyed by SHA-256(content). Deduplication comes for free.
You: "How do we represent directory structure within a commit? A flat file list, or a tree?"
Interviewer: "A tree. Each commit points to a root Tree object that maps names to either Blobs (files) or child Trees (subdirectories)."
Composite pattern. A Tree contains entries that are either Blob or Tree, recursively representing the full project snapshot.
You: "What does the commit model look like? Linear history, or do we support merge commits with multiple parents?"
Interviewer: "Commits form a DAG. Regular commits have one parent. Merge commits have two parents. The very first commit has no parent."
Good. Each Commit stores a List<String> of parent IDs. DAG traversal enables log, blame, and merge-base detection.
You: "How should branching work? Heavyweight copies of all files, or lightweight references?"
Interviewer: "Lightweight. A branch is just a named pointer to a commit ID. Creating a branch is instant."
That means a Branch is essentially a record(String name, String commitId). HEAD points to the current branch (or directly to a commit in detached mode).
You: "What merge strategies do we need to support?"
Interviewer: "Start with fast-forward when possible. Fall back to three-way merge using the merge base. Detect conflicts but do not auto-resolve them."
Strategy pattern for merge. Fast-forward is trivial (just move the pointer). Three-way merge diffs both branches against the common ancestor and flags conflicts.
You: "Do we need a staging area (index) like Git, or do commits snapshot the entire working directory?"
Interviewer: "Yes, support a staging area. Users explicitly add files to the index before committing."
The staging area is a mutable snapshot of what the next commit will contain. add(path) copies the working file into the index.
You: "How should diffs work? Line-by-line comparison, or something simpler?"
Interviewer: "Line-by-line is sufficient. Show added, removed, and modified lines between two file versions."
Strategy pattern for diff algorithms. Start with a simple line-by-line comparison, but keep the interface pluggable.
You: "Are remote repositories, push, and pull in scope?"
Interviewer: "Out of scope for initial design. But design so that remote support can be added later."
Perfect. You have now clarified scope and ruled out unnecessary complexity.
Final Requirements
Functional Requirements:
- Initialize a repository with an empty root tree and a default
mainbranch. - Stage file changes (add, modify, remove) into a staging area.
- Create commits that snapshot the staged tree, record author/message/timestamp, and link to parent commit(s).
- Create branches as lightweight pointers to a commit.
- Switch branches (checkout), updating the working tree to match the target commit's snapshot.
- Merge two branches using fast-forward when possible, three-way merge otherwise.
- Detect and report conflicts when the same file is modified on both branches.
- Generate line-by-line diffs between any two commits or between working tree and staged tree.
Non-Functional Requirements:
- Content-addressable storage: identical content produces the same hash, enabling deduplication.
- Extensibility for new merge strategies, diff algorithms, and future remote support.
- Thread safety for concurrent read operations on the object store.
Out of Scope:
- Remote repositories (push, pull, fetch)
- File permission tracking
- Binary file diffing
- Rebasing (discussed in Extensibility)
- UI or CLI implementation
Example Inputs and Outputs
Scenario 1: Basic commit workflow
- Input: Initialize repo, create
README.mdwith content"hello", stage it, commit with message"initial commit". - Expected: One commit on
main. The commit's tree contains a single blob entryREADME.mdwith hashsha256("hello").mainpoints to this commit. HEAD points tomain. - Why: Validates the full add-stage-commit pipeline and content-addressable storage.
Scenario 2: Branching and fast-forward merge
- Input: From the state above, create branch
feature, switch to it, modifyREADME.mdto"hello world", stage, commit. Switch back tomainand mergefeature. - Expected: Since
mainhas no new commits, the merge is a fast-forward.mainnow points to the same commit asfeature. No conflicts, no merge commit. - Why: Validates branch creation, checkout, and fast-forward merge detection.
Scenario 3: Three-way merge with conflict
- Input: From scenario 1, create branch
feature. Onmain, changeREADME.mdto"hello from main", stage, commit. Switch tofeature, changeREADME.mdto"hello from feature", stage, commit. Switch tomainand mergefeature. - Expected: Both branches modified the same file. The merge detects a conflict on
README.mdand returns aMergeResultwith statusCONFLICTand aConflictMarkershowing both versions. - Why: Validates three-way merge and conflict detection.
Try It Yourself
Try it yourself
Before reading the solution, spend 20-25 minutes sketching your class diagram. Focus on the object model first: how do Blob, Tree, and Commit relate? How does content-addressable storage work? Then think about how branches and the staging area fit in. Compare your approach with the walkthrough below.
Step 1: Identify Core Entities
Start by asking: what are the main "things" in this system? Look at the nouns in your requirements. You will find objects for storage (blobs, trees), history (commits), navigation (branches, HEAD), workspace management (staging area), and merge results.
A common mistake is cramming everything into a single Repository god class. Good design means each class has a single, clear job.
| Entity | Responsibility | Key attributes |
|---|---|---|
| Repository | The orchestrator. Manages branches, HEAD, object store, and staging area. | objectStore, branches, head, stagingArea |
| Blob | Immutable file snapshot. Stored by content hash. | id (hash), content |
| Tree | Immutable directory snapshot. Maps names to child blobs or trees. | id (hash), entries |
| TreeEntry | A single entry in a tree: name, type (blob or tree), and target hash. | name, type, targetId |
| Commit | Immutable snapshot in history. Points to root tree and parent commit(s). | id, treeId, parentIds, author, message, timestamp |
| Branch | Lightweight named pointer to a commit. | name, commitId |
| HeadRef | Points to current branch name (attached) or commit ID (detached). | branchName or commitId |
| StagingArea | Mutable working index. Tracks files staged for the next commit. | stagedEntries |
| ObjectStore | Content-addressable map. Stores blobs, trees, and commits by hash. | objects |
| DiffResult | Output of comparing two files. Contains list of line-level changes. | changes |
| DiffEntry | A single line change: added, removed, or context. | type, lineNumber, content |
| MergeResult | Outcome of merging two branches: success, fast-forward, or conflict. | status, conflicts, mergeCommitId |
| ConflictMarker | Details of a single conflicting file: base, ours, and theirs content. | path, baseContent, oursContent, theirsContent |
Notice we separated Blob and Tree from Commit. A commit references a tree snapshot. The tree and blob objects are reusable across commits (content-addressable deduplication). Merging them would destroy that property.
Step 2: Define Relationships and Class Design
Repository (The Orchestrator)
Repository is the top-level entry point. Every user action (stage, commit, branch, merge) goes through it.
Deriving state from requirements:
| Requirement | What Repository must track |
|---|---|
| "Content-addressable storage" | An ObjectStore for blobs, trees, and commits |
| "Create branches as lightweight pointers" | A map of branch names to Branch objects |
| "Switch branches (checkout)" | A HeadRef pointing to active branch or detached commit |
| "Stage file changes into a staging area" | A StagingArea holding the staged snapshot |
| "Merge using fast-forward or three-way" | A MergeStrategy for delegating merge logic |
| "Generate line-by-line diffs" | A DiffStrategy for delegating diff logic |
Deriving methods from needs:
| Need from requirements | Method |
|---|---|
| "Initialize a repository" | init() |
| "Stage file changes" | stage(path, content), unstage(path) |
| "Create commits" | commit(author, message) |
| "Create branches" | createBranch(name) |
| "Switch branches" | checkout(branchName) |
| "Merge two branches" | merge(branchName) |
| "Generate diffs" | diff(commitId1, commitId2) |
| "View history" | log() |
StagingArea
The staging area sits between the working directory and the commit history. It accumulates changes until the user commits.
Deriving state from requirements:
| Requirement | What StagingArea must track |
|---|---|
| "Users explicitly add files before committing" | A map of file paths to file content |
| "Commit snapshots the staged tree" | Must be convertible to a Tree object |
The key method is buildTree(objectStore). It takes the flat path-to-content map, creates Blob objects for each file, constructs the nested Tree hierarchy, and returns the root Tree. This bridges the mutable staging world and the immutable object store.
Key Relationship Decisions
ObjectStore owns all immutable objects. Blobs, Trees, and Commits are never stored directly on other classes. Everything is referenced by hash ID. This is the heart of content-addressable design.
Commit references Tree by ID, not by direct object reference. This keeps Commit a simple record and lets the ObjectStore handle retrieval and deduplication.
MergeStrategy and DiffStrategy are injected into Repository. This makes merge and diff behavior swappable without changing the Repository class.
Step 3: Choose Design Patterns
Pattern: Composite -- for the file tree snapshot
The signal: Commits reference a "root tree" that contains files and subdirectories. Subdirectories contain more files and subdirectories. That is a recursive tree structure.
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.