Amazon Finally Explains What Caused the Massive AWS Outage

8 months ago

Amazon Web Services (AWS) says a massive overnight outage in its US-East-1 region, one of its busiest, caused widespread internet disruptions across North America and beyond. The company says it restored all services by Monday at 3:01pm PT and has shared more details on what went wrong.

The outage began late Sunday night after a DNS error in Amazon’s DynamoDB database triggered cascading failures that broke more than 140 AWS services, including EC2, Lambda, CloudWatch, and SQS. Engineers worked through the night and restored full service by Monday afternoon, though some systems, like Redshift, Config, and Connect, needed more time to clear backlogs. These services underpin a large share of websites and apps, so an outage in any of them can stop those sites and apps from working properly.

AWS hosts about 30% of the web’s infrastructure, including one-third of Fortune 500 companies, which made the impact especially widespread. When the US-East-1 (Northern Virginia) region fails, it affects a massive portion of the internet that depends on it for cloud operations.

Analysis of the Alexa Top 1,000 sites showed that 11.5% use AWS, and 9.9% rely solely on it. About 25% of infrastructure for major sites like Zillow, ESPN, and IMDb went offline. Roughly 24% of IP addresses in those top sites route through AWS, amplifying the outage’s reach.

During the peak of the incident, over 6.5 million outage reports were logged globally, with AWS network traffic dropping by 68%. Monitoring sites like Downdetector saw spikes for more than 1,000 services, and roughly 2,000 companies were impacted worldwide.

Affected sectors included:

Finance (30%+ of apps): Coinbase, Robinhood, Venmo, and PayPal, plus Lloyds and Halifax in the UK. Canadian services like Wealthsimple were also affected, with users unable to see account balances.
Gaming (50%+): Fortnite, Roblox, and Pokémon GO all experienced downtime.
Streaming and Social (20–40%): Prime Video, Disney+, Hulu, Snapchat, Duolingo (streaks!) and Instagram saw disruptions.
Other services: Canva, Figma, Canvas LMS, several airline systems, and even McDonald’s apps were hit.

Amazon said the outage started because of DNS resolution errors affecting its DynamoDB database service. In short, its systems had trouble finding and connecting to the right servers. That glitch then caused a chain reaction, disrupting other internal tools and services that depend on DynamoDB. The company says it will soon publish a detailed report outlining what went wrong and what steps it will take to avoid another major outage.
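For readers wondering what a DNS resolution error looks like from an application's side, here is a minimal Python sketch. It is not Amazon's code and does not describe AWS internals. The function name, retry count, and delays are assumptions made purely for illustration. It looks up the IP addresses behind a hostname and retries with a growing delay if the lookup fails.

```python
import socket
import time


def resolve_with_retry(hostname, port=443, attempts=3, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve a hostname to a sorted list of IP addresses, retrying on DNS errors.

    Hypothetical helper: the name, defaults, and retry policy are
    illustrative assumptions, not part of any AWS tooling.
    """
    for attempt in range(attempts):
        try:
            records = resolver(hostname, port, proto=socket.IPPROTO_TCP)
            # Each record is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the IP address as a string.
            return sorted({record[4][0] for record in records})
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # out of retries: surface the DNS failure to the caller
            sleep(base_delay * 2 ** attempt)  # back off: 0.5s, then 1s, ...
```

Calling `resolve_with_retry("dynamodb.us-east-1.amazonaws.com")` would try to look up the region's public DynamoDB endpoint. If the lookup keeps failing, the app never learns which server to contact, even if the servers themselves are running, which is why a DNS fault can ripple out to every service that depends on that endpoint.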
