Cloudflare BGP leak 2019
How a configuration error at a small ISP created a BGP route leak that caused Cloudflare, Amazon, and hundreds of other services to become unreachable for 6 hours in 2019.
TL;DR
- On June 24, 2019, a small Pennsylvania ISP (Allegheny Technologies, AS396531) misconfigured a BGP route optimizer and re-advertised ~20,000 Cloudflare routes through Verizon's backbone to the global internet.
- BGP's best-path selection preferred the leaked 3-hop path over the legitimate 7-hop path, funneling hundreds of gigabits of traffic through a link with ~100 Mbps capacity.
- Cloudflare's DNS resolver (1.1.1.1), Amazon, Google, and hundreds of smaller services became unreachable or severely degraded for approximately 6 hours.
- Three defense layers failed simultaneously: no maximum-prefix limits or prefix filters at Verizon, no RPKI validation, and no automated leak detection.
- The fix was entirely manual: Verizon removed the invalid routes after ~6 hours of escalation and verification.
- The transferable lesson: any system built on trust-based protocols needs defense-in-depth, because a single misconfiguration by a third party can take you offline.
What Happened
On the morning of June 24, 2019, a routine day on the internet turned into a 6-hour crisis that affected millions of users.
Allegheny Technologies (AS396531), a small ISP in Pennsylvania operating as a subsidiary of DQE Communications under Verizon's network, made a BGP configuration error. Their optimizer software leaked approximately 20,000 prefixes it had learned from Cloudflare's network (AS13335) and re-advertised them as customer routes to its upstream peers.
To put the scale in context: Allegheny was a tiny ISP, the kind that serves a local region with modest bandwidth. Cloudflare is one of the largest network operators in the world, with a global anycast network spanning 200+ data centers on six continents. The idea that a configuration change at Allegheny could affect Cloudflare's global reachability sounds absurd. But BGP doesn't care about your network's size. It cares about your AS path length.
Those routes propagated through Verizon's backbone (AS701) and out to the global internet. Because the leaked AS path was shorter than the legitimate path, BGP routers worldwide selected the invalid route. Traffic destined for Cloudflare, Amazon, and hundreds of other major services got funneled through Allegheny's tiny network link.
The speed of propagation is worth emphasizing. BGP updates propagate across the internet in seconds to minutes. By the time Cloudflare's monitoring detected the anomaly (roughly 10 minutes), the leaked routes had already been accepted by thousands of autonomous systems worldwide. There's no "recall" mechanism for a bad route. Once it's propagated, it stays until someone explicitly withdraws it.
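To make the detection side concrete, here is a minimal sketch of the kind of check an external monitor might run against AS paths collected from looking glasses. It is not a description of Cloudflare's actual monitoring. The `EXPECTED_TRANSIT` set, the `unexpected_hops` helper, and the leaked path are all hypothetical, chosen only for illustration.

```python
# Assumption: ASNs we expect to see between an observer and our origin.
EXPECTED_TRANSIT = {174, 701, 1299, 3356}
ORIGIN_ASN = 13335  # Cloudflare, per this article


def unexpected_hops(as_path, origin=ORIGIN_ASN, expected=EXPECTED_TRANSIT):
    """Return the ASNs in an observed path that are neither the origin nor expected transit."""
    return [asn for asn in as_path if asn != origin and asn not in expected]


observed = [
    [701, 3356, 13335],    # ordinary-looking path
    [701, 396531, 13335],  # hypothetical leaked path through AS396531
]

# Keep only the paths that contain at least one unexpected hop.
alerts = {tuple(p): unexpected_hops(p) for p in observed if unexpected_hops(p)}
print(alerts)
```

A check like this only raises an alarm. As the rest of this section shows, it does nothing to pull the bad route back out of other networks' routing tables.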
I've dealt with BGP incidents in production, and the scariest part is always the same: the problem originates in a network you don't control, and there's nothing you can do except wait for someone else to fix it.
| Timestamp (UTC) | Event |
|---|---|
| ~09:00 | Allegheny Technologies' BGP optimizer leaks ~20,000 prefixes as customer routes |
| ~09:02 | Leaked routes propagate through Verizon (AS701) to global routing tables |
| ~09:05 | Cloudflare, Amazon, Google begin experiencing traffic blackholing and severe congestion |
| ~09:10 | Cloudflare detects the anomaly via external monitoring and begins investigation |
| ~09:15 | Internet routing community begins observing anomalous AS paths via looking glasses |
| ~09:30 | Cloudflare identifies the route leak source (AS396531) and contacts Verizon |
| ~10:00+ | Multiple NOCs (Network Operations Centers) begin filing abuse reports with Verizon |
| ~11:00 | Verizon acknowledges the issue internally and begins investigating |
| ~13:00 | Verizon identifies the specific peering session and misconfigured routes |
| ~15:00 | Verizon manually removes the invalid routes from their backbone |
| ~15:30 | Global routing tables converge back to legitimate paths; services recover |
The total duration was approximately 6 hours. Six hours of degraded or unreachable service for some of the largest properties on the internet, caused by a misconfiguration at a tiny ISP most people had never heard of.
The most frustrating part for every network engineer watching this unfold in real time: there was nothing Cloudflare could do. The misconfiguration wasn't in their network. The invalid routes weren't in their routing tables. They had to wait for Verizon to act.
What is a BGP route leak?
A BGP route leak occurs when an AS advertises routing information to a neighbor that violates the intended routing policy. The most common type: re-advertising routes learned from a provider to another provider, effectively becoming an unauthorized transit point. RFC 7908 formally defines six types of route leaks.
How the System Worked Before
To understand why this happened, you need to understand how BGP (Border Gateway Protocol) actually works. BGP is the routing protocol that holds the internet together. Every large network on the internet operates as an Autonomous System (AS), identified by a unique AS number. As of 2019, there were over 65,000 active ASes on the internet.
When Cloudflare wants the world to reach its network, it advertises its IP prefixes to its transit providers. Those providers propagate the advertisements to their peers and customers. The result is a distributed routing table where every AS on the internet knows how to reach every other AS.
An IP prefix is a range of IP addresses expressed in CIDR notation (like 104.16.0.0/12). Cloudflare owns thousands of these prefixes. When Cloudflare "advertises" a prefix, it sends a BGP UPDATE message to its transit providers saying: "I can deliver traffic for these IP addresses." The transit provider then sends its own UPDATE to its peers, prepending its own AS number to the path.
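If CIDR notation is unfamiliar, Python's standard `ipaddress` module is a quick way to see what a prefix covers. This sketch uses the example prefix from the paragraph above. The two test addresses are arbitrary picks for illustration.

```python
import ipaddress

# The example prefix from the text, parsed with the standard library.
prefix = ipaddress.ip_network("104.16.0.0/12")

print(prefix.num_addresses)        # how many addresses the prefix covers
print(prefix[0], "-", prefix[-1])  # first and last address in the range

# Membership test: does a given address fall inside the advertised range?
inside = ipaddress.ip_address("104.20.1.1") in prefix
outside = ipaddress.ip_address("104.32.0.1") in prefix
print(inside, outside)
```

A /12 leaves 20 host bits, so a single advertisement like this one speaks for 2^20 (about a million) addresses.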
The AS path is the sequence of AS numbers that a route has traversed. Each time a route passes through an AS, that AS's number gets prepended. The result: every router on the internet can see the complete chain of networks a packet would traverse to reach a destination.
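The prepending behavior is easy to model. This is a toy sketch, not router code: `propagate` is a hypothetical helper, and the ASNs are the ones used in this article's example path.

```python
def propagate(as_path, forwarding_asn):
    """Return the path as advertised onward by forwarding_asn: its own ASN goes in front."""
    return [forwarding_asn] + as_path


# Cloudflare (AS13335) originates the route; each AS that passes it on prepends itself.
path = [13335]
for asn in (3356, 701):
    path = propagate(path, asn)

print(path)
```

Reading the resulting list right to left gives the order in which the announcement traveled, with the origin AS always in the last position.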
Here's what a normal BGP route looks like in a routing table:
```text
// BGP routing table entry for Cloudflare prefix
Prefix: 104.16.0.0/12
Next Hop: 192.0.2.1 (transit provider router)
AS Path: [701, 3356, 13335]
Origin: IGP
Local Pref: 100
MED: 0
Valid: yes
Best: yes
```
The AS path [701, 3356, 13335] means: this route was originated by AS13335 (Cloudflare), passed through AS3356 (Lumen), and arrived via AS701 (Verizon). Three hops. A router comparing this against an alternative path of [174, 1299, 3356, 13335] (four hops) would prefer the shorter one.
BGP routing relies on three relationship types between autonomous systems:
| Relationship | Description | Route propagation rule |
|---|---|---|
| Customer to Provider | Customer pays provider for transit | Provider re-advertises customer routes to its peers, its other customers, and its own providers |
| Peer to Peer | Two networks exchange traffic for free | Peer routes only advertised to customers, not to other peers or providers |
| Provider to Customer | Provider sends full routing table to customer | Customer should only re-advertise its own prefixes and its customers' prefixes upstream |
The critical rule: a customer should never re-advertise routes learned from one provider to another provider. That would make the customer a free transit point, funneling traffic it cannot handle. This is called the "valley-free" routing principle: traffic should flow up to a common transit provider, across at a peering point, or down to a customer. It should never go up, down, and up again.
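The valley-free rule can be written down as a small state machine. The sketch below assumes you already know the business relationship on every link an announcement crosses, which real BGP does not tell you. The link labels and the `is_valley_free` helper are my own naming, not part of any protocol.

```python
def is_valley_free(links):
    """Check a sequence of link types, ordered from the origin AS outward.

    "up"   = customer to provider
    "peer" = peer to peer
    "down" = provider to customer
    Valid shape: zero or more "up", at most one "peer", then zero or more "down".
    """
    phase = 0  # 0 = still climbing, 1 = crossed a peer link, 2 = descending
    for link in links:
        if link == "up":
            if phase != 0:
                return False  # climbing again after a peer or down link: a leak
        elif link == "peer":
            if phase != 0:
                return False  # second peer link, or a peer link after descending
            phase = 1
        elif link == "down":
            phase = 2
        else:
            raise ValueError(f"unknown link type: {link}")
    return True


print(is_valley_free(["up", "peer", "down"]))  # normal path
print(is_valley_free(["down", "up"]))          # provider -> customer -> another provider
```

The second call models a customer handing one provider's routes to another provider, and it fails the check.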
In the Cloudflare leak, Allegheny violated this principle. They learned Cloudflare's routes from their transit relationship (downward flow) and then re-advertised them upward to Verizon as if they were Allegheny's own customer routes. Verizon had no way to know this was wrong because BGP carries no metadata about the relationship under which a route was learned.
The attribute that matters most here is AS path length: when multiple routes exist for the same prefix and nothing ranked higher (such as local preference) breaks the tie, BGP prefers the route with the shortest AS path. This is the specific property that made the leak so damaging.
BGP's best-path selection actually considers several attributes in order: local preference, AS path length, origin type, MED (Multi-Exit Discriminator), and several others. But in practice, AS path length is the attribute that matters most in leak scenarios because leaked routes often have artificially short paths.
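Here is a simplified sketch of that ordering, covering only the four attributes named above. Real implementations have more steps and vendor-specific tie-breakers, and MED is normally compared only between routes from the same neighboring AS, which this sketch ignores. The `Route` class and `best_path` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # lower is preferred


@dataclass
class Route:
    prefix: str
    as_path: list
    local_pref: int = 100
    origin: str = "IGP"
    med: int = 0


def best_path(routes):
    """Pick a route using a simplified subset of BGP's decision steps, in order."""
    return min(
        routes,
        key=lambda r: (
            -r.local_pref,          # 1. highest local preference wins
            len(r.as_path),         # 2. then shortest AS path
            ORIGIN_RANK[r.origin],  # 3. then lowest origin type
            r.med,                  # 4. then lowest MED (simplified)
        ),
    )


short = Route("104.16.0.0/12", [701, 3356, 13335])
long_ = Route("104.16.0.0/12", [174, 1299, 3356, 13335])
print(best_path([short, long_]).as_path)

# Local preference outranks path length: an operator can deliberately prefer the longer path.
pinned = Route("104.16.0.0/12", [174, 1299, 3356, 13335], local_pref=200)
print(best_path([short, pinned]).as_path)
```

The second example is the reason local preference matters for defense: a network that sets higher local preference on routes from trusted sessions will not be swayed by a shorter path arriving from somewhere else.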
Think of it like airport routing. Normally, to fly from New York to Sydney, you might go through Los Angeles or San Francisco (long but correct). A route leak is like a new airline suddenly advertising a "direct" flight from New York to Sydney that actually stops at a tiny airstrip with one runway. Every booking system prefers the "shorter" route, and the airstrip collapses under the traffic.
BGP predates modern security
BGP was designed in 1989 (RFC 1105) and standardized in 1995 (BGP-4, RFC 1771). The protocol assumes that every AS operator is competent and trustworthy. There are no cryptographic signatures on route announcements. Any AS can advertise any prefix, and neighbors will accept it unless they've explicitly configured filters.
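RPKI route origin validation is one of the filters operators can bolt on after the fact. The sketch below follows the valid / invalid / not-found classification described in RFC 6811, but the ROA table is made up for illustration and `validate_origin` is a hypothetical helper, not any validator's real API.

```python
import ipaddress

# Hypothetical ROA table: (authorized prefix, max length, authorized origin ASN).
ROAS = [
    (ipaddress.ip_network("104.16.0.0/12"), 12, 13335),
]


def validate_origin(prefix, origin_asn, roas=ROAS):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_asn in roas:
        # A ROA covers the announcement if the announced prefix sits inside the ROA prefix.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_asn == roa_asn and net.prefixlen <= max_len:
                return "valid"
    # Covered by at least one ROA but matching none of them means invalid.
    return "invalid" if covered else "not-found"


print(validate_origin("104.16.0.0/12", 13335))   # right origin, allowed length
print(validate_origin("104.16.0.0/12", 396531))  # wrong origin
print(validate_origin("104.16.0.0/21", 13335))   # more specific than max length allows
print(validate_origin("192.0.2.0/24", 64500))    # no covering ROA
```

One caveat: origin validation checks only the origin AS and the prefix length. A leak that keeps the original origin and prefix length intact would still come back as valid, so this is one layer of defense, not a complete fix.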
The Failure Cascade
OK, now let's trace exactly what went wrong, step by step. This is where the incident gets technically interesting.
Step 1: The misconfiguration. Allegheny Technologies ran a BGP route optimizer, a software tool designed to influence how traffic enters and exits their network by selectively advertising routes. The optimizer was supposed to manage Allegheny's own small set of prefixes (probably fewer than 100). Instead, a configuration error caused it to grab all ~20,000 prefixes it had learned from Cloudflare via its legitimate transit relationship.
BGP route optimizers are common in the ISP world. They work by selectively announcing routes to influence inbound traffic patterns, for example, making certain routes look less attractive via AS path prepending to shift traffic to cheaper links. The tool itself isn't the problem. The problem was that it was configured without proper guardrails to prevent it from announcing routes the ISP didn't own.
Step 2: The re-advertisement. The optimizer re-advertised these 20,000 Cloudflare prefixes as customer routes to Verizon. In BGP terms, this means Allegheny was telling Verizon: "I am the authorized path to reach all of these Cloudflare addresses." This was false, but BGP has no mechanism to verify the claim.
Step 3: Verizon propagates. Verizon's routers accepted these routes without question. No maximum-prefix limits. No RPKI validation. No IRR checks. Because Allegheny was a legitimate customer, Verizon treated these as valid customer routes and propagated them globally to all peers and other customers.
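A per-session prefix limit (often called a maximum-prefix limit) is the simplest of these guardrails, and a toy model shows why it would have mattered. The `CustomerSession` class below is hypothetical, the limit of 100 is an assumption based on the "fewer than 100 prefixes" estimate earlier in this section, and the announced prefixes are synthetic.

```python
class CustomerSession:
    """Hypothetical model of one BGP customer session with a maximum-prefix limit."""

    def __init__(self, peer_asn, max_prefixes):
        self.peer_asn = peer_asn
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.up = True

    def receive(self, prefix):
        """Accept an announced prefix; tear the session down if the limit is exceeded."""
        if not self.up:
            return False
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            self.up = False
            self.prefixes.clear()  # routes from a torn-down session are withdrawn
            return False
        return True


# 20,000 synthetic /24s standing in for the leaked routes.
announcements = [f"10.{i // 256}.{i % 256}.0/24" for i in range(20_000)]

session = CustomerSession(peer_asn=396531, max_prefixes=100)
accepted = sum(session.receive(p) for p in announcements)
print(accepted, session.up, len(session.prefixes))
```

In this model the session drops on the 101st prefix and everything learned over it is discarded, so none of the 20,000 routes would have been propagated any further.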
Step 4: Global adoption. As the leaked routes spread across the internet's default-free zone (the set of routers that carry a full routing table), routers everywhere compared the leaked path against the legitimate path and selected the shorter one. Within minutes, a significant portion of internet traffic destined for Cloudflare's prefixes was being routed through Allegheny.
Step 5: Capacity collapse. Allegheny's ~100 Mbps link was instantly saturated. Packets backed up in router buffers and were dropped. From the perspective of end users, Cloudflare's services simply stopped responding. DNS queries timed out. HTTPS connections failed. CDN content was unreachable.
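A back-of-the-envelope calculation shows how lopsided this was. The ~100 Mbps figure comes from this article. The 200 Gbps offered load is my assumption, a stand-in for "hundreds of gigabits", and the model ignores retransmissions and protocol backoff.

```python
link_capacity_gbps = 0.1   # ~100 Mbps, the figure quoted in this article
offered_load_gbps = 200.0  # assumption: a stand-in for "hundreds of gigabits"

# Fraction of offered traffic the link could carry at best.
delivered = min(1.0, link_capacity_gbps / offered_load_gbps)
dropped = 1.0 - delivered

print(f"delivered: {delivered:.4%}, dropped: {dropped:.4%}")
```

Under these assumptions, only about one packet in two thousand could get through, which from the outside looks the same as a total outage.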
The blast radius is worth spelling out. Not all internet traffic to Cloudflare was affected, only traffic that traversed networks which accepted the leaked routes. But Verizon is one of the largest backbone operators in the world, so a substantial portion of global internet traffic passes through their network at some point. The blast radius extended far beyond Verizon's direct customers.