P2P File Sharing
Walk through a complete BitTorrent-style P2P system design, from a basic tracker server to a fully decentralized DHT network that serves petabytes of content with no central point of failure and automatic integrity verification.
What is a peer-to-peer file sharing system?
A P2P file sharing system distributes large files directly between participants, with no central server holding the content. Every downloader simultaneously becomes an uploader, so aggregate bandwidth scales with the number of participants instead of shrinking per-user as more people join. I treat this question as a trust-elimination problem first and a bandwidth problem second: once you internalize that no single node can be relied upon, every mechanism in the design follows naturally. The interesting engineering challenges are coordination without central authority (how do peers find each other?), incentive alignment (how do you stop users from downloading without sharing?), and data integrity (how do you trust data arriving from strangers?). This question tests DHT design, piece-selection and choking algorithms, content addressing, and Merkle tree integrity, making it one of the richest distributed-systems questions at the hard tier.
Functional Requirements
Core Requirements
- A user can add a file to the network and receive a shareable identifier (a magnet link or torrent file) that others can use to download it.
- Any peer can download a file using only its identifier, without contacting the original uploader.
- Files remain downloadable as long as at least one other peer holds a complete copy.
- Every downloaded piece is verified for integrity; corrupted or tampered pieces are rejected and re-fetched.
Below the Line (out of scope)
- User authentication or private sharing (all shared content is treated as public).
- Rate limiting, traffic shaping, or ISP-level optimization (e.g., BitTorrent's µTP transport and LEDBAT congestion control).
- Streaming playback before a download completes.
- Legal enforcement or content takedown mechanisms.
The hardest part in scope: Ensuring a file remains downloadable after the original seeder goes offline. Solving this requires distributing both the content and the knowledge of who holds it across the peer swarm, with no single node that can become unavailable.
Authentication is below the line because the protocol operates on content-addressed data: knowing the infohash of a torrent already proves you have the right to request it. Production systems like private trackers layer HTTP authentication on top of the tracker protocol, separate from the core P2P mechanism.
Streaming is below the line because piece selection changes dramatically for video (sequential instead of rarest-first). Adding streaming support means running a parallel sequential-priority policy on the top N pieces while the rest of the torrent uses rarest-first. The tradeoff is covered in the piece selection deep dive.
Non-Functional Requirements
Core Requirements
- Availability: A torrent remains downloadable as long as at least one seeder is online. Target: 99.9% torrent availability for any torrent with 3+ seeders. There is no single point of failure in the content delivery path.
- Throughput: A peer downloading a popular file should saturate its uplink by receiving pieces from multiple peers simultaneously. Target is full bandwidth utilization (a 100 Mbps client downloads at 100 Mbps).
- Scale: The system must support millions of active torrents and tens of millions of simultaneous peers. The tracker or DHT must handle a sustained announce rate of hundreds of thousands of requests per second.
- Integrity: A corrupted or maliciously altered piece must be detected and discarded before assembly. Target: zero corrupted bytes written to disk; SHA-1 per-piece verification (BitTorrent v1 baseline) completes under 5 ms per 256 KB piece; SHA-256 per-block verification (BitTorrent v2) completes under 1 ms per 16 KB block on commodity hardware.
- Latency (peer discovery): A new peer should find its first set of peers for a torrent within 5 seconds of announcing.
Below the Line
- Sub-second peer discovery (requires centralized index or pre-seeded local DHT nodes).
- Global deduplication across torrents with identical content but different infohashes.
Read/write ratio: For every torrent created (one write to the network), thousands of downloads happen. Unlike a URL shortener where reads hit one central server, reads in P2P are distributed across the swarm. The bottleneck is not serving bandwidth from a single source but coordinating which peer sends which piece to which downloader. Optimize for coordination efficiency, not central throughput.
Full bandwidth utilization across millions of peers means a central server model fails immediately. A server directly feeding 10 million concurrent peers at 100 Mbps each would need to push 1 petabit per second (roughly 125 terabytes per second). The tracker's only job is coordination (returning peer lists); the data transfer happens directly between peers.
I'd call out this number on the whiteboard early because it shuts down any central-server proposal immediately. Once the interviewer sees the petabit math, they understand why P2P is the only viable architecture.
Integrity at petabyte scale means we cannot rely on TLS or a trusted origin. Data arrives from anonymous strangers who may be misconfigured, malicious, or silently corrupting packets. Each piece must carry a self-certifying hash so any receiver can verify it independently.
Core Entities
- Torrent: The metadata bundle for a shared file, containing the infohash (SHA-1 of its info dictionary), file name, total size, piece length (fixed, typically 256 KB to 1 MB), and an ordered list of SHA-1 hashes for each piece.
- Piece: A fixed-size chunk of the file data, identified by its zero-based index within the torrent. Pieces are downloaded independently and verified against the torrent's piece hash list before assembly.
- Block: A sub-unit of a piece (typically 16 KB) used in the wire protocol. Peers request blocks, not full pieces. Once all blocks of a piece arrive, the piece is assembled and verified.
- Peer: A network participant identified by a 20-byte peer ID, IP address, and port. A peer is a leecher if its download is incomplete and a seeder if it holds the complete file.
- Swarm: The complete set of peers (seeders and leechers) currently sharing a particular torrent.
- Tracker: A centralized coordination server that maps an infohash to a list of peer IP/port pairs. Peers announce their presence to the tracker periodically.
The primary relationships: a Torrent has many Pieces. A Swarm belongs to one Torrent and contains many Peers. Each Peer maintains a bitfield (a bit vector, one bit per piece) indicating which pieces it currently holds.
Schema detail and DHT storage layout are deferred to the deep dives.
API Design
The system has two distinct protocol surfaces: the tracker HTTP protocol (used by peers to discover each other) and the peer wire protocol (used by peers to exchange pieces directly).
FR 1 and FR 2: Announce presence and discover peers (tracker)
GET /announce
?info_hash=<20-byte-urlencoded-SHA1>
&peer_id=<20-byte-urlencoded-random>
&port=6881
&uploaded=0
&downloaded=0
&left=<bytes-remaining>
&event=started | completed | stopped
Response (compact format; the wire encoding is bencoded, shown here as JSON for readability):
{
"interval": 1800,
"peers": "<binary blob: 6 bytes per peer (4-byte IP + 2-byte port)>"
}
The tracker returns up to 50 peer addresses. The peer opens connections to those addresses and begins exchanging pieces. The interval field tells the peer how often to re-announce (default 30 minutes). Use event=completed when the download finishes to tell the tracker the peer has become a seeder.
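Decoding the compact peer blob is mechanical. A minimal sketch (the function name is illustrative, not from any client library):

```python
import struct

def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode the tracker's compact peer format: 6 bytes per peer
    (4-byte IPv4 address + 2-byte big-endian port)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer blob must be a multiple of 6 bytes")
    peers = []
    for i in range(0, len(blob), 6):
        ip = ".".join(str(b) for b in blob[i:i + 4])
        (port,) = struct.unpack(">H", blob[i + 4:i + 6])
        peers.append((ip, port))
    return peers
```

Given the 6-byte entry `0A 00 00 01 1A E1`, this yields a single peer at 10.0.0.1:6881.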
Supplemental (FR 1): Scrape (current swarm stats)
GET /scrape?info_hash=<20-byte-urlencoded-SHA1>
Response:
{
"files": {
"<info_hash>": {
"complete": 142,
"incomplete": 37,
"downloaded": 9821
}
}
}
Scrape lets a client check swarm health before committing to a download. A torrent with 0 seeders cannot be completed regardless of leecher count.
FR 3 and FR 4: Peer wire protocol (direct peer-to-peer)
The wire protocol is length-prefixed binary, not HTTP. Key message types:
// Peer wire protocol messages (simplified)
HANDSHAKE:
<pstrlen=19><"BitTorrent protocol"><reserved 8 bytes>
<info_hash 20 bytes><peer_id 20 bytes>
BITFIELD (after handshake):
<length><0x05><bitfield> // one bit per piece; 1 = peer has it
INTERESTED / NOT_INTERESTED:
<length><0x02> // "I want pieces you have"
<length><0x03> // "I no longer want pieces from you"
CHOKE / UNCHOKE:
<length><0x00> // "I will not upload to you"
<length><0x01> // "I will now upload to you"
REQUEST (download block from peer):
<length><0x06><piece_index><block_offset><block_length>
PIECE (upload block to peer):
<length><0x07><piece_index><block_offset><block_data>
HAVE (announce newly completed piece):
<length><0x04><piece_index>
REST-style HTTP verbs do not apply to the peer wire protocol. The protocol is a persistent binary TCP connection, not stateless request-response. Authentication is out of scope; in a production private tracker, the torrent file embeds a per-user passkey in the announce URL.
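The framing above is easy to serialize directly. A hedged sketch of the handshake and a length-prefixed REQUEST message (helper names are mine, not from a real client):

```python
import struct

PSTR = b"BitTorrent protocol"

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """68-byte handshake: <pstrlen=19><pstr><8 reserved><info_hash><peer_id>."""
    assert len(info_hash) == 20 and len(peer_id) == 20
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id

def build_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Length-prefixed wire message: <4-byte big-endian length><1-byte id><payload>."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload

def build_request(piece_index: int, block_offset: int, block_length: int = 16384) -> bytes:
    """REQUEST (id 0x06): ask a peer for one 16 KB block of a piece."""
    return build_message(6, struct.pack(">III", piece_index, block_offset, block_length))
```

Note the handshake is the only unframed message; everything after it carries the 4-byte length prefix.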
High-Level Design
1. Adding a file to the network
A peer who wants to share a file creates a torrent descriptor locally before any network communication.
Components:
- Seeding client: Hashes the file into fixed-size pieces and computes SHA-1(piece bytes) for each piece. Assembles the info dictionary and computes infohash = SHA-1(bencoded info dict).
- Torrent file / magnet link: The serialized descriptor containing the infohash, tracker URL, piece hashes, and file metadata. This is what the uploader shares with others.
- Tracker: Receives the seeder's first announce and adds it to the swarm list for that infohash.
Request walkthrough:
- Seeding client reads the file from disk and splits it into N pieces of fixed size (e.g., 256 KB each).
- For each piece, the client computes SHA-1(piece bytes) and stores the 20-byte hash in an ordered list.
- Client computes infohash = SHA-1(bencoded info dictionary containing piece hashes + file name + length).
- Client writes a .torrent file containing infohash, tracker URL, piece hashes, and piece length.
- Client announces to the tracker with event=started, left=0 (it has the complete file).
- Client generates a magnet link: magnet:?xt=urn:btih:<infohash>&dn=<name>&tr=<tracker_url>.
The infohash is the only trust anchor in the system. Whoever holds the infohash and finds peers with matching data is guaranteed content integrity by the piece hashes embedded in the torrent. No central authority is needed after this step.
I always start at the whiteboard by drawing the torrent creation flow first, because it establishes the infohash as the single root of trust before any network communication happens. Candidates who skip this step end up hand-waving about "how peers verify data" later.
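The hashing steps above fit in a few lines. A minimal sketch with a toy bencoder (covering only the types an info dictionary needs; function names are illustrative):

```python
import hashlib

def bencode(obj) -> bytes:
    """Minimal bencoder: integers, byte strings, and dicts with sorted keys."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode())
    if isinstance(obj, dict):
        # Per the spec, dict keys are byte strings in lexicographic order.
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj)}")

def make_info_dict(data: bytes, name: str, piece_length: int = 262144) -> dict:
    """Split the file into fixed-size pieces and concatenate SHA-1(piece) hashes."""
    piece_hashes = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": piece_hashes}

def infohash(info: dict) -> bytes:
    """infohash = SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()
```

A 300 KB file with 256 KB pieces yields two pieces, so the "pieces" field holds two 20-byte hashes.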
2. Peer discovery and swarm joining
A new peer with a magnet link needs to find other peers before it can download anything.
Naive approach: tracker-only discovery
Components:
- Leeching client: Has the infohash from the magnet link, contacts the tracker to get a peer list.
- Tracker: Stores infohash-to-peer-list mappings. Returns up to 50 peers per announce.
- Seeder peers: Already in the tracker's peer list, ready to answer incoming connections.
Request walkthrough:
- Leeching client contacts tracker: GET /announce?info_hash=<infohash>&event=started&left=<file_size>.
- Tracker looks up the infohash and returns a list of up to 50 peer IP:port addresses.
- Leeching client opens TCP connections to those peers and sends a handshake.
- Each peer replies with its bitfield (which pieces it currently holds).
- Client picks pieces to request based on peer bitfields.
What breaks: The tracker is a single point of failure. If the tracker goes down, new peers cannot discover the swarm even if dozens of seeders remain online. Many early torrents became permanently unavailable when their original tracker shut down.
Evolved approach: trackerless DHT
The fix moves peer discovery into the swarm itself using Kademlia DHT. Each peer maintains a routing table of other DHT nodes; the infohash is used as a key to locate which DHT nodes store the peer list for that torrent.
DHT eliminates the availability dependency on the tracker while still converging in O(log N) hops. If an interviewer asks "what happens when the tracker goes down?", DHT is the answer. The Kademlia deep dive covers the routing algorithm in detail.
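The core of a Kademlia lookup step can be sketched in a few lines: distance is XOR of IDs, and each iteration replaces the current candidates with any strictly closer nodes they return (function names are illustrative):

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia's distance metric: XOR of two 160-bit IDs, compared as integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(target: bytes, known: list[bytes], k: int = 8) -> list[bytes]:
    """One lookup step: keep the k known node IDs closest to the target infohash.
    Querying these and merging in the closer nodes they return halves the
    remaining distance each round, giving O(log N) hops to the peer-list holders."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

Because each hop at least halves the distance, a 10-million-node network converges in roughly log2(10^7) ≈ 23 queries.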
3. Downloading pieces in parallel
Once a peer has swarm member addresses and their bitfields, it needs to select and request pieces efficiently.
Components:
- Piece picker: Chooses which piece to request next. Algorithm covered in the deep dive.
- Connection manager: Maintains open TCP connections to 30-50 peers concurrently.
- Block requester: Pipelines multiple 16 KB block requests per connection to prevent round-trip stalls.
- Piece buffer: Holds incoming blocks in memory until the full piece assembles for verification.
Request walkthrough:
- Peer computes missing pieces (total pieces minus own bitfield).
- Piece picker selects a target piece from the intersection of missing pieces and what connected peers have.
- Client sends REQUEST messages for all 16 KB blocks of the chosen piece across multiple connections.
- Peers respond with PIECE messages containing block data.
- Once all blocks of a piece arrive, the piece is assembled and verified (next step).
- Client sends HAVE <piece_index> to all connected peers to update their records.
Block pipelining is critical for throughput. Without it, each 16 KB block waits for a full round trip before the next is requested. With a 50 ms RTT and no pipelining, a single connection delivers only 16 KB * (1000/50) = 320 KB/s. Pipelining 10 blocks per connection pushes that to 3.2 MB/s regardless of latency.
I've seen candidates overlook pipelining and then struggle to explain how a peer with 30 connections actually saturates a 100 Mbps link. Without pipelining, 30 connections at 320 KB/s give only 9.6 MB/s (77 Mbps). With pipelining, the same 30 connections can deliver up to 96 MB/s, far more than enough to saturate the link.
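The throughput arithmetic generalizes to a one-line model (a simplification that ignores the bandwidth ceiling and assumes the peer always keeps the pipeline full):

```python
def connection_throughput(rtt_ms: float, block_kb: int = 16,
                          pipeline_depth: int = 1) -> float:
    """Steady-state throughput (KB/s) of one connection with `pipeline_depth`
    block requests in flight. Depth 1 pays a full round trip per block;
    deeper pipelines overlap the waits."""
    round_trips_per_sec = 1000 / rtt_ms
    return block_kb * pipeline_depth * round_trips_per_sec
```

At 50 ms RTT this reproduces the numbers above: depth 1 gives 320 KB/s, depth 10 gives 3,200 KB/s per connection.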
4. Verifying piece integrity
Every piece received from the network is untrustworthy until verified. The torrent file carries a SHA-1 hash for each piece computed by the original seeder.
Components:
- Piece verifier: Computes SHA-1(assembled piece bytes) and compares against the torrent's piece hash at that index.
- Bad piece handler: Discards the corrupted piece, marks it as missing again, and re-requests it from a different peer.
Request walkthrough:
- All blocks of piece index i arrive in the piece buffer.
- Verifier computes sha1(piece_bytes) and compares against torrent.piece_hashes[i].
- If match: write piece to disk. Send HAVE <i> to all connected peers.
- If mismatch: discard piece. Mark index i as missing. Queue another request.
The verifier is the cryptographic trust boundary. Because piece hashes are embedded in the torrent file (itself authenticated by the infohash provided out-of-band), the entire content verification chain is self-contained with no central authority.
I always draw the verification step as a gate between the network and disk in the diagram. It makes the trust model visually obvious: untrusted bytes enter from the left, verified bytes exit to the right. Nothing crosses that boundary without a hash check.
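The gate is small enough to show in full. A minimal sketch (the function and its write callback are illustrative):

```python
import hashlib

def verify_and_commit(piece_bytes: bytes, index: int,
                      piece_hashes: list[bytes], write_to_disk) -> bool:
    """The gate between network and disk: a piece crosses only if its SHA-1
    matches the hash the original seeder embedded in the torrent."""
    if hashlib.sha1(piece_bytes).digest() != piece_hashes[index]:
        return False  # discard; caller marks the index missing and re-requests
    write_to_disk(index, piece_bytes)
    return True
```

The caller sends HAVE on True and queues a re-request (from a different peer) on False.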
Potential Deep Dives
1. How do peers discover each other without a central tracker?
When the tracker is down or the user connects via a magnet link with no tracker, the system needs a fully decentralized fallback.
2. What piece should a peer request next?
The piece selection strategy determines how quickly rare pieces spread across the swarm.
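The standard answer is rarest-first. A minimal sketch (real clients randomize ties rather than using the deterministic index tie-break shown here):

```python
from collections import Counter

def rarest_first(missing: set[int], peer_bitfields: list[set[int]]) -> int | None:
    """Pick the missing piece held by the fewest connected peers. Requesting
    rare pieces first raises minimum availability across the swarm, so no
    piece's survival depends on a single holder."""
    counts = Counter(p for bf in peer_bitfields for p in bf if p in missing)
    if not counts:
        return None  # no connected peer has anything we need
    return min(counts, key=lambda p: (counts[p], p))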
3. How do we prevent free-riders from downloading without uploading?
Without enforcement, every rational agent disables uploading. A swarm of free-riders has no upload bandwidth to share and collapses to zero throughput.
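BitTorrent's answer is tit-for-tat choking. A hedged sketch of one unchoke round (the mapping of peer IDs to observed rates and the function name are mine):

```python
import random

def choose_unchoked(upload_rates: dict[str, float], regulars: int = 3) -> set[str]:
    """Tit-for-tat sketch: unchoke the `regulars` peers currently uploading
    the most to us, plus one random optimistic unchoke so a newcomer with
    nothing to trade yet can bootstrap. Real clients rotate the optimistic
    slot every 30 seconds."""
    top = sorted(upload_rates, key=upload_rates.get, reverse=True)[:regulars]
    rest = [p for p in upload_rates if p not in top]
    optimistic = {random.choice(rest)} if rest else set()
    return set(top) | optimistic
```

The optimistic slot is what keeps the scheme from deadlocking: without it, a brand-new peer with zero pieces could never earn its first unchoke.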
4. How do we verify data integrity efficiently at scale?
Two schemes exist: flat SHA-1 hashes per piece (BitTorrent v1) and Merkle trees (BitTorrent v2).
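The Merkle scheme lets a receiver verify a single 16 KB block against the root with one sibling hash per tree level. A generic sketch of the proof check (helper names are mine; BitTorrent v2's actual layered layout differs in detail):

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify_block(block: bytes, leaf_index: int,
                 proof: list[bytes], root: bytes) -> bool:
    """Hash the block, then fold in one sibling hash per level, walking up
    the tree; the block is valid iff we arrive at the known root. For a
    large file this is ~20 hashes instead of re-fetching a 256 KB piece."""
    h = sha256(block)
    for sibling in proof:
        if leaf_index % 2 == 0:
            h = sha256(h + sibling)   # we are the left child
        else:
            h = sha256(sibling + h)   # we are the right child
        leaf_index //= 2
    return h == root
```

With flat v1 hashes, a single corrupt 16 KB block forces the whole piece to be discarded and re-downloaded; here only the bad block is re-fetched.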
Final Architecture
The central insight: every mechanism is a response to the same constraint, namely that no single node can be trusted or relied upon. DHT removes the tracker availability risk. Rarest-first removes the seeder piece-distribution bottleneck. Tit-for-tat removes the free-rider economic incentive. Merkle trees remove the need to trust the data source. Each layer is independently motivated and independently replaceable.
I recommend closing the interview by walking through these four layers in sequence. It gives the interviewer a clean narrative arc and demonstrates that you understand how each component earns its place in the architecture.
Interview Cheat Sheet
- Start by framing the core constraint: no single node can be trusted or relied upon. Every mechanism flows from that.
- The infohash is the only trust anchor. It is SHA-1 of the torrent info dictionary, which contains all piece hashes. Knowing the infohash is sufficient to verify all content.
- A tracker is a directory service, not a CDN. It stores peer lists, never file data. Most tracker traffic is a GET /announce that returns 50 peer addresses.
- BitTorrent DHT uses Kademlia to find peers for any infohash in O(log N) hops across a 160-bit keyspace. With 10 million nodes, convergence takes approximately 23 hops.
- Rarest-first piece selection ensures every piece spreads quickly. Minimum availability across all pieces increases monotonically; no piece stays rare for long.
- Endgame mode: when fewer remaining pieces exist than connected peers, send the same REQUEST to all peers simultaneously and cancel duplicates on receipt. This removes tail latency from one slow peer.
- Tit-for-tat: each peer uploads only to the top 3 peers uploading the most to it, plus 1 random optimistic unchoke rotated every 30 seconds for new entrants.
- A seeder ranks unchoke candidates by how fast it can upload to them (each peer's download rate from the seeder), not by reciprocal upload, because a seeder has nothing left to download and tit-for-tat no longer applies.
- BitTorrent v1 uses flat SHA-1 per-piece hashes. v2 uses SHA-256 Merkle trees enabling block-level verification with a 20-hash proof path instead of re-fetching a full 256 KB piece for a 16 KB corruption.
- Block pipelining is critical for throughput. Without pipelining, a 50 ms RTT limits one connection to 320 KB/s. Pipelining 10 blocks per connection raises that above 3 MB/s regardless of latency.
- A seeder should stay online until it has uploaded at least 1.0x the file size (1:1 share ratio) to ensure the swarm has enough copies to survive after the seeder disconnects.
- The Sybil attack on DHT: an adversary controls many nodes near the target infohash to poison GET_PEERS responses. Mitigation: require ANNOUNCE_PEER to carry a short-lived token issued by the receiving node.
- PEX solves the "tracker is down" case only when at least one existing peer connection exists. It cannot bootstrap a peer that knows nobody. DHT solves bootstrap; PEX supplements it.