Incidents over time
This page orders the atlas by date so you can scan how outage patterns changed over time and which dependencies kept failing.
Open any incident to jump straight to its full archive record.
Every incident below links directly into the canonical archive record. The timeline is the navigation layer. The archive is the record.
- Case file · April 1997 · BGP leak · High Severity
AS7007 Route Leak
A misconfigured router at MAI Network Services originated a massive set of more-specific routes and polluted routing tables across the internet. The event became an early proof that one broken routing announcement could destabilize far more than the network that sent it.
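As a purely illustrative aside (not part of the archive record), here is a minimal Python sketch of why a leaked more-specific prefix spreads so destructively: routers select the longest matching prefix, so a bogus /24 beats the legitimate aggregate everywhere it reaches. The prefixes and labels below are made up for illustration.

```python
# Illustrative only: longest-prefix-match route selection.
# The prefixes and labels are hypothetical, not taken from the AS7007 record.
import ipaddress

routes = {
    ipaddress.ip_network("203.0.113.0/24"): "leaked more-specific (misconfigured network)",
    ipaddress.ip_network("203.0.0.0/16"): "legitimate aggregate (rightful origin)",
}

def best_route(destination: str) -> str:
    """Pick the matching entry with the longest prefix, as routers do."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(best_route("203.0.113.10"))  # -> leaked more-specific (misconfigured network)
print(best_route("203.0.200.5"))   # -> legitimate aggregate (rightful origin)
```

Once the more-specific route is in a table, it wins on specificity alone; nothing has to trust the leaking network more than the rightful origin.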
Open in archives
- Case file · August 2005 · Telecom infrastructure · Critical
Hurricane Katrina Telecom Failures
Katrina destroyed fiber paths, towers, power, and fuel logistics together, collapsing multiple redundant systems at once. It remains one of the clearest examples of geography overpowering abstract redundancy claims.
Open in archives
- Case file · February 2008 · BGP Route Hijack · Critical
Pakistan Telecom Blacks Out YouTube for the World
Pakistan tried to block YouTube at home. The route escaped, spread, and briefly blacked out YouTube for everyone else. A local censorship action became a global routing fact.
Open in archives
- Case file · February 2008 · Datacenter power · High Severity
The Planet Houston Outage
A power failure at The Planet's Houston datacenter exposed how fragile backup systems can be when they are tested under real pressure. Thousands of hosted servers went dark and recovery stretched across days.
Open in archives
- Case file · December 2010 · P2P platform failure · High Severity
Skype Supernode Failure
A software problem destabilized Skype's supernode layer and broke service for a huge share of users, showing how coordination nodes inside distributed platforms can still become central failure points.
Open in archives
- Case file · April 2011 · Cloud Infrastructure · Critical
AWS EC2 Failure Exposes Limits of Availability Zone Isolation
A network upgrade misrouted EBS traffic and took volumes offline in us-east-1. Reddit, Quora, Foursquare, and other services built too tightly around one zone lost their cushion fast.
Open in archives
- Case file · August 2013 · Platform failure · High Severity
Google Global 5-minute Outage
A brief but iconic Google outage took major services offline at the same time and became a durable example of how concentrated platform ecosystems can vanish all at once, even during a short failure.
Open in archives
- Case file · October 2013 · Commerce platform · High Severity
Amazon.com Retail Outage
A high-profile Amazon retail outage showed how visible and immediate the impact becomes when a single commerce platform failure blocks browsing, purchasing, and order flow together.
Open in archives
- Case file · April 2014 · Route hijack · High Severity
Indosat BGP Hijack
An Indonesian provider briefly announced routes for large portions of the internet, diverting traffic that had nothing to do with it. The incident showed how an operational mistake in one network can distort global reachability in minutes.
Open in archives
- Case file · 2014 · Backbone fiber failure · High Severity
Level 3 Fiber Outage
A backbone fiber disruption at Level 3 highlighted how physical transport failures can still cascade into broad connectivity problems across downstream networks that rely on the same paths.
Open in archives
- Case file · November 2014 · Cloud deploy failure · Critical
Azure Storage Outage
Human error during a storage-system deployment led to a broad Azure outage, one of the clearer early examples of a control-plane mistake causing large customer impact. The trigger was routine. The spread was not.
Open in archives
- Case file · July 2015 · Route leak · High Severity
AxcelX and AWS Route Leak
Routes connected to AWS address space leaked outward and disrupted access to major sites and services. The fault was not in application code. It was in the routing layer that decides where traffic goes at all.
Open in archives
- Case file · October 2016 · DNS / DDoS · Critical
Dyn DNS DDoS Attack
Mirai used hacked cameras, routers, DVRs, and other junk devices to hammer Dyn's DNS infrastructure. Twitter, Spotify, GitHub, Reddit, and much of the East Coast web started failing together. The point was not only the attack. It was how much of the visible web depended on one naming layer.
Open in archives
- Case file · February 2017 · Cloud / Human Error · Critical
AWS S3 US-East-1 Outage
A mistyped debugging command removed more S3 capacity than intended. Thousands of applications went down with it, including systems people did not realize depended on that region so heavily. It became a lasting example of how one control-plane mistake can turn a local action into a public outage.
Open in archives
- Case file · June 2017 · Software supply chain · Critical
NotPetya Global Outage
NotPetya spread through a trusted software-update channel and crippled shipping, logistics, hospitals, and enterprise networks around the world. It remains one of the clearest demonstrations of software supply chains acting like outage multipliers.
Open in archives
- Case file · June 2018 · Fiber cut · High Severity
Comcast Fiber Cut Outage
A large Comcast outage traced back to physical infrastructure damage and showed how ordinary cable-path failures can still produce wide consumer and enterprise impact. The cloud did not make the fiber less real.
Open in archives
- Case file · November 2018 · Route leak · High Severity
MainOne and Google Route Leak
A route leak involving MainOne and China Telecom redirected traffic for Google and other large services through unexpected paths. It was a sharp demonstration of how brittle inter-network trust still is.
Open in archives
- Case file · December 2018 · Public-safety outage · Critical
CenturyLink 911 Outage
A network failure disrupted 911 service across multiple states and affected millions of customers. The incident showed how emergency calling systems could still share failure domains with commercial backbone infrastructure.
Open in archives
- Case file · June 2019 · Cloud networking · Critical
Google Cloud Networking Outage
A routine change cascaded through Google Cloud's networking systems and led to major traffic loss and degraded access across services. The incident showed how internal reliability changes can widen into public unavailability.
Open in archives
- Case file · June 2019 · BGP Route Leak · Critical
Verizon Route Leak Disrupts 15 Percent of Global Internet Traffic
A small provider leaked routes it did not own. Verizon accepted them and propagated them. Cloudflare, Facebook, Google, and much more of the network got pulled into the mistake.
Open in archives
- Case file · 2020 · Cloud DNS · High Severity
Azure DNS Outage
A separate Azure DNS incident in 2020 reinforced that naming failures recur even inside large cloud platforms, and that those failures can outrank the health of the underlying services they point to.
Open in archives
- Case file · May 2020 · PKI expiration · High Severity
Sectigo AddTrust Root Expiration
The expiration of the AddTrust root certificate triggered trust failures on legacy systems and broke connections that still depended on that chain. A quiet certificate deadline turned into a visible service problem for older clients.
Open in archives
- Case file · August 2020 · Video platform · High Severity
Zoom Partial Global Outage
Zoom experienced a broad service disruption during the period when remote work had made video infrastructure a daily dependency. The incident showed how a platform that looks optional can become operationally central very quickly.
Open in archives
- Case file · August 2020 · BGP / ISP · High Severity
CenturyLink / Level 3 Backbone Outage
A bad Flowspec rule was supposed to block abuse. Instead it blocked BGP itself across one of the internet's largest backbones. Routers crashed, restarted, received the same bad rule, and crashed again. The network started rejecting the information needed to fix the network.
Open in archives
- Case file · December 2020 · Authentication Failure · Critical
Google Auth Outage Renders All Google Services Inaccessible
A quota enforcement mistake during an auth migration knocked out Gmail, YouTube, Drive, and everything else tied to the same gate. The apps were not the first problem. Access was.
Open in archives
- Case file · 2021 (2020–2022) · Storage exhaustion · High Severity
Slack File-storage Outage
A file-storage failure inside Slack disrupted access to uploads and working materials, showing how collaboration platforms break not only when messaging fails but also when their attached operational data stops moving.
Open in archives
- Case file · 2021 · Certificate expiration · High Severity
Google Voice Expired TLS Certificate
An expired TLS certificate broke access to Google Voice and showed, again, that trust-chain maintenance is part of availability engineering rather than a side concern reserved for security teams.
Open in archives
- Case file · May 2021 · SaaS DNS · High Severity
Salesforce Multi-hour Outage
A DNS-related failure affected Salesforce services and disrupted the large body of business workflows built on top of them. The outage hit as an enterprise dependency problem, not just an app problem.
Open in archives
- Case file · June 2021 · CDN / Edge · Critical
Fastly Global Content Delivery Outage
A dormant software bug sat in Fastly's network until a customer pushed a valid configuration change. Within seconds, most of Fastly's global edge started returning errors. News sites, commerce platforms, and government pages disappeared together because the front door was more shared than it looked.
Open in archives
- Case file · July 2021 · Cloud DNS · High Severity
Azure DNS Outage
A DNS-layer problem inside Azure interrupted name resolution and widened into a broader cloud-service disruption. Parts of the visibility and status path became unreliable too, which made diagnosis harder for customers already in the dark.
Open in archives
- Case file · July 2021 · DDoS mitigation · High Severity
Akamai Prolexic Outage
A platform designed to preserve availability became the source of unavailability instead. Customers depending on Prolexic lost service because the defensive layer itself failed under load.
Open in archives
- Case file · July 2021 · DNS / CDN · Critical
Akamai DNS Outage Silences FedEx, Airlines, and Major Banks
A configuration update triggered a bug in Akamai Edge DNS and took down a long list of companies that looked unrelated until they failed at the same time.
Open in archives
- Case file · September 2021 · Identity failure · Critical
Azure AD Key-Rotation Outage
A signing-key problem became a long authentication outage across Microsoft 365, Teams, Exchange Online, and related services. Systems were still there, but access to them was blocked by the gate in front.
Open in archives
- Case file · September 2021 · PKI expiration · High Severity
Let's Encrypt DST Root CA X3 Expiration
The expiration of DST Root CA X3 caused compatibility failures on older Android devices and legacy clients that still anchored trust there. Modern infrastructure stayed up while part of the user base lost the ability to connect cleanly.
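As an illustrative aside (not from the archive record), the sketch below captures the shape of the failure: one expired anchor in an otherwise valid chain is enough for a client still pinned to that anchor to reject the whole connection. Only the 30 September 2021 expiry of DST Root CA X3 is real; the other names and dates are hypothetical.

```python
# Illustrative only: a chain fails for clients anchored on an expired root,
# even when every other certificate in the chain is still valid.
from datetime import datetime, timezone

chain_expiry = {
    "leaf certificate": datetime(2022, 1, 15, tzinfo=timezone.utc),                # hypothetical
    "intermediate": datetime(2025, 9, 15, tzinfo=timezone.utc),                    # hypothetical
    "DST Root CA X3 (legacy anchor)": datetime(2021, 9, 30, tzinfo=timezone.utc),  # real expiry date
}

checked_at = datetime(2021, 10, 1, tzinfo=timezone.utc)
expired = [name for name, not_after in chain_expiry.items() if not_after < checked_at]

print(expired)  # -> ['DST Root CA X3 (legacy anchor)']
```

Clients that had already switched to a newer trust anchor never noticed; only the ones still pinned to the legacy root saw the failure.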
Open in archives
- Case file · October 2021 · BGP / Platform · Critical
Facebook Global Blackout
One backbone maintenance change withdrew Meta's BGP routes. Facebook, Instagram, WhatsApp, and Messenger vanished at the same time. The harder part came next. The same failure also cut engineers off from some of the internal systems needed to fix it, which turned an outage into a recovery trap.
Open in archives
- Case file · October 2021 · Distributed systems failure · Critical
Roblox 73-hour Outage
Roblox went down for roughly three days after failures involving internal service-discovery and data systems compounded across a highly interconnected platform. The length of the outage made the recovery-path problem impossible to ignore.
Open in archives
- Case file · December 2021 · Cloud DNS and control plane · Critical
AWS US-East-1 Control-Plane Outage
Internal networking and DNS issues in US-East-1 disrupted AWS services, Amazon devices, logistics systems, and third-party applications. The region concentration problem was visible, but so was the depth of internal dependency inside the same region.
Open in archives
- Case file · February 2022 · Collaboration platform · High Severity
Slack Outage
Slack suffered a cascading failure involving database and cache systems, which disrupted messaging, connections, and workflow continuity for teams that depend on it as operating infrastructure. Recovery was shaped by how many internal pieces were failing together.
Open in archives
- Case file · July 2022 · Telecom Core Failure · Critical
Rogers Canada: 12 Million Without Service, Including 911
A core network upgrade removed a critical filter and sent traffic into the wrong place at the wrong scale. Rogers collapsed under the load, taking mobile service, internet access, and 911 with it for millions of people.
Open in archives
- Case file · 2023 · Regional cloud networking · High Severity
GCP us-east4 Traffic Loss
Traffic loss in Google Cloud's us-east4 region highlighted how regional networking faults can still create large downstream application problems when many services quietly share the same cloud locality.
Open in archives
- Case file · 2023 · Local network control · Medium Severity
UAF DHCP Server Outage
A DHCP outage at the University of Alaska Fairbanks remains useful because it shows a bounded local-control failure that stayed local, which makes it a clean contrast against the much wider shared-layer incidents elsewhere in the atlas.
Open in archives
- Case file · 2024 · Mobile carrier outage · Critical
Verizon Mobile Outage
A major Verizon mobile outage underscored how quickly carrier failures still spill into daily public life once voice, data, authentication, and payment flows all assume cellular reachability.
Open in archives
- Case file · 2024 · Third-party roaming dependency · Critical
AT&T / T-Mobile / Verizon Roaming Outage
A shared roaming dependency disrupted multiple major U.S. carriers at once, making the outage notable less for any one brand than for the hidden third-party relationship that linked them together.
Open in archives
- Case file · March 2024 · Platform ecosystem outage · Critical
Meta (Facebook / Instagram) Outage
A broad Meta outage affecting Facebook and Instagram showed that even without a long root-cause disclosure, the operational story remains the same: concentrated social platforms fail at the scale of their audience.
Open in archives
- Case file · July 2024 · Software Supply Chain · Critical
CrowdStrike Falcon Global Outage
A routine CrowdStrike update shipped a bad configuration file and crashed Windows at kernel level on an estimated 8.5 million devices. Airlines could not board passengers. Hospitals switched to paper. Banks shut down systems. Recovery was slow because every broken machine needed hands-on work.
Open in archives
- Case file · 2025 · Azure Front Door configuration · Critical
Microsoft 365 / Azure Global Outage
A global Microsoft 365 and Azure outage tied to Azure Front Door configuration reinforced the atlas theme that the front door often fails harder than the applications behind it.
Open in archives
- Case file · 2025 · Developer platform outage · High Severity
GitHub Outage
A GitHub outage disrupted repository operations and development workflows at a layer many teams now treat as critical infrastructure rather than an optional collaboration tool.
Open in archives
- Case file · 2025 · Protective edge logic failure · Critical
Cloudflare Bot-management Outage
An internal Cloudflare bot-management failure propagated widely because the protective layer itself sat in front of customer traffic at massive scale. The case fits the broader pattern of defensive systems becoming shared failure systems.
Open in archives
- Case file · 2025 · Route leak · High Severity
Telekom Malaysia Route Leak
A modern route leak involving Telekom Malaysia preserved the same old lesson in contemporary form: routing mistakes still escape local intent and become international reachability problems very quickly.
Open in archives
- Case file · October 2025 · Cloud / DNS · Critical
AWS DynamoDB DNS Failure
DNS resolution for DynamoDB failed in us-east-1. Disney+, Delta, Reddit, Robinhood, Roblox, and many other services went dark. The data was still there. The names stopped resolving. A naming failure overruled the resilience of the underlying system.
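A minimal illustrative sketch (not from the incident report) of why a resolution failure hides a healthy service: if the name never resolves, the client never obtains an address to connect to, so the state of the backend stops mattering. The hostname below is a placeholder, not the real endpoint.

```python
# Illustrative only: a name that does not resolve makes the service
# unreachable regardless of how healthy the backend is.
import socket

def reachable_by_name(hostname: str, port: int = 443) -> bool:
    """Return True only if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        return False

# Placeholder name; the .invalid TLD is reserved and never resolves.
print(reachable_by_name("service.example.invalid"))  # -> False
```

Retries, failover logic, and healthy replicas all sit behind that first lookup.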
Open in archives
- Case file · 2026 · Mobile carrier outage · High Severity
Verizon Mobile Outage
A later Verizon mobile outage, even with thinner public disclosure, remains useful in the atlas because it reinforces how dependent daily communications and service access remain on a small number of carrier systems.
Open in archives
- Case file · 2026 · Enterprise suite outage · Critical
Microsoft 365 Outage
A long Microsoft 365 outage highlighted how deeply office coordination, messaging, documents, and identity have been consolidated into one operational dependency for many organizations.
Open in archives
- Case file · 2026 · BGP / address announcement failure · Critical
Cloudflare BYOIP BGP Outage
A large-scale Cloudflare routing incident tied to BYOIP handling showed how reachability can still disappear at internet scale when address announcement logic goes wrong at a major provider edge.
Open in archives
- Case file · 2026 · Physical attack on cloud infrastructure · Critical
AWS Middle East Drone-strike Outage
This incident is preserved because it forces the cloud back into physical reality: regional availability ultimately depends on facilities, geography, power, and security on the ground.
Open in archives