Top 7 outages of 2023
Outlook users lose access to the Microsoft service: Feb. 7
Shortly after the January incident, Microsoft Outlook users experienced another outage on February 7, 2023. Microsoft customers across North America, Europe, and Asia had trouble accessing Outlook for several hours, with the greatest impact in the U.S.
Although the outage was global, ThousandEyes determined that, unlike in the previous incident, the network did not appear to be the root cause: it observed no significant packet loss, latency, or unusual routing behavior during the incident. "During the outage, ThousandEyes vantage points observed symptoms indicative of application-related issues, including elevated service response timeouts and increased page load times," ThousandEyes reported.
Two outages impact service for Virgin Media UK: April 4
BGP routing appeared to be the primary cause of two outages that hit Virgin Media UK on April 4, 2023. The outages affected the reachability of the Virgin Media UK network and its services from the global internet. Both incidents occurred on the same day, each lasting several hours and together spanning most of the day. According to ThousandEyes, "a lack of viable BGP routes appeared to cause most of the observed traffic loss."
ThousandEyes determined that the two outages shared similar characteristics, including the withdrawal of routes to Virgin Media UK's network, traffic loss, and intermittent periods of service restoration. "Given that the initial incident began in a period of time typical of maintenance activities (half past midnight local time), it may have resulted from a change to the network state by the service provider," ThousandEyes said. "Recurrence of a near identical incident later in the day could indicate that the triggering mechanism for the first incident was either not fully understood or was not completely resolved."
AWS incident impacts services for 2 hours: June 13
On June 13, 2023, Amazon Web Services (AWS) experienced an incident lasting more than two hours that affected a number of services on the East Coast of the U.S. The disruption began in the evening and was resolved a couple of hours later. ThousandEyes did not observe any significant network issues, such as high latency or packet loss, on paths to AWS servers. However, the network observability provider did see elevated response times, server timeouts, and HTTP server errors that affected the availability of applications hosted within AWS.
"The incident appears to have manifested as elevated response times, timeouts, and HTTP 5XX server errors for users attempting to access impacted applications," ThousandEyes said. Shortly after the incident began, AWS identified the source of the issue as a capacity management subsystem, which was affecting the availability of many of its services, including AWS Lambda and the AWS Management Console. According to ThousandEyes, AWS confirmed that the affected services were experiencing "increased error rates and latencies," which caused availability issues for applications using these AWS services, "regardless of where they were hosted or where they were serving users."