At Microsoft's security summit, experts debated how to prevent another global IT meltdown. Will it help?
There is no doubt that the great CrowdStrike-Windows meltdown in July 2024 was an economic disaster. It was the largest IT outage in history. Its effects disrupted banking systems, healthcare networks, and the global air transportation network. As the post-incident analyses made clear, it was entirely preventable.
In the wake of that incident, Microsoft convened a day-long Windows Endpoint Security Ecosystem Summit earlier this week at its Redmond headquarters. The goal of the event, which was closed to the press and outside observers, was to bring together what Microsoft called “a diverse group of endpoint security vendors and government officials from the US and Europe to discuss strategies for improving resiliency and protecting our mutual customers’ critical infrastructure.”
Did anything useful come out of the session? Who knows? Microsoft VP of Enterprise and OS Security David Weston delivered a wrap-up of the session that was clearly scrubbed by lawyers and communication professionals until all that was left was optimistic corporate messaging and a few vague hints (“key themes and consensus points”) of what might happen in Windows and in endpoint security products … someday, but probably not soon.
As that report notes, the roundtable “was not a decision-making meeting … we discussed the complexities of the modern security landscape, acknowledging there are no simple solutions.” But one theme that runs through the meeting summary is a collective realization that the industry cannot afford another CrowdStrike incident.
The CrowdStrike incident in July underscored the responsibility security vendors have to drive both resiliency and agile, adaptive protection. … We face a common set of challenges in safely rolling out updates to the large Windows ecosystem, from deciding how to do measured rollouts with a diverse set of endpoints to being able to pause or rollback if needed. A core [Safe Deployment Practices] principle is gradual and staged deployment of updates sent to customers.
That’s a direct critique of CrowdStrike, which caused the IT outage by rolling out a flawed update to its entire universe of devices rather than using a staged deployment that could have identified the problem early and shut off updates to minimize widespread damage.
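To make the idea of a staged deployment concrete, here is a minimal Python sketch. It is purely illustrative and is not CrowdStrike's or Microsoft's actual process: the function name `staged_rollout`, the stage percentages, and the 2% failure threshold are all assumptions chosen for the example. The update goes to progressively larger groups of devices, and the rollout pauses as soon as one group shows too many failures.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RolloutResult:
    updated: int   # devices that received the update
    failed: int    # devices that reported a failure after updating
    halted: bool   # True if the rollout was paused before reaching everyone


def staged_rollout(
    devices: Sequence[str],
    apply_update: Callable[[str], bool],       # hypothetical: returns True if the device stays healthy
    stage_percents: Sequence[int] = (1, 10, 50, 100),  # assumed cumulative stage sizes
    max_failure_rate: float = 0.02,            # assumed threshold for pausing
) -> RolloutResult:
    updated = failed = 0
    start = 0
    for percent in stage_percents:
        # Cumulative number of devices that should be updated by the end of this stage.
        end = len(devices) * percent // 100
        batch = devices[start:end]
        batch_failed = sum(1 for device in batch if not apply_update(device))
        updated += len(batch)
        failed += batch_failed
        # Pause the rollout if this stage failed too often; later stages never run.
        if batch and batch_failed / len(batch) > max_failure_rate:
            return RolloutResult(updated, failed, halted=True)
        start = end
    return RolloutResult(updated, failed, halted=False)


if __name__ == "__main__":
    fleet = [f"device-{i}" for i in range(1000)]
    # Simulate a flawed update that crashes every device it touches.
    print(staged_rollout(fleet, lambda device: False))
```

In this simulation, a flawed update that crashes every device it touches reaches only the first 1% group (10 of 1,000 devices) before the rollout halts. Pushing the same update to everyone at once would have taken down all 1,000.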
There’s slightly more color in the comments from meeting participants that were appended to the end of Microsoft’s corporate blog post, like this blast from Ric Smith, Chief Product and Technology Officer of CrowdStrike competitor SentinelOne:
SentinelOne thanks Microsoft for its leadership in convening the Windows Endpoint Security Ecosystem Summit and we are fully committed to helping drive its goal of reducing the chance of future events like the one caused by CrowdStrike. We believe that transparency is critical and strongly agree with Microsoft that security companies must live up to stringent engineering, testing and deployment standards and follow software development and deployment best practices. We are proud that we have followed the processes that Microsoft has discussed today for years and will continue to do so going forward. [emphasis added]
Ouch.
What was clearly the most energized discussion, though, revolved around kernel-mode access to Windows, a key cause of the CrowdStrike debacle. As I noted a few months ago, the scope of the CrowdStrike outage was due in large part to the Windows architecture:
Developers of system-level apps for Windows, including security software, historically implement their features using kernel extensions and drivers. As this example illustrates, faulty code running in the kernel space can cause unrecoverable crashes, whereas code running in user space can’t.
That used to be the case with macOS as well, but in 2020, with macOS 11, Apple changed the architecture of its flagship OS to strongly discourage the use of kernel extensions. Instead, developers are urged to write system extensions that run in user space rather than at the kernel level. On macOS, CrowdStrike uses Apple’s Endpoint Security Framework and says that with this design, “Falcon achieves the same levels of visibility, detection, and protection exclusively via a user space sensor.”
Could Microsoft make the same sort of change for Windows? Perhaps, but doing so would certainly bring down the wrath of antitrust regulators, especially in Europe.
In the broadest possible terms, Microsoft’s post refers to “platform capabilities Microsoft plans to make available in Windows,” with a specific shout-out to security defaults in Windows 11 that “enable the platform to provide more security capabilities to solution providers outside of kernel mode. Both our customers and ecosystem partners have called on Microsoft to provide additional security capabilities outside of kernel mode….”
Not every attendee is thrilled at that idea. Sophos CEO Joe Levy, for example, politely noted, “We were very pleased to see Microsoft support many of Sophos’ recommendations, based on the collection of architectural and process innovations we’ve built over the years and present today on the 30 million Windows endpoints we protect globally. The summit was an important and encouraging first step in a journey that will produce incremental improvement over time….”
What are those recommendations? In an August blog post, Sophos Chief Research and Scientific Officer Simon Reed made clear that the company considers access to the Windows kernel to be fundamental. “Operating in ‘kernel-space’ – the most privileged layer of an operating system, with direct access to memory, hardware, resource management, and storage – is vitally important for security products.” Kernel drivers are “fundamental,” he wrote, not just to Sophos products but to “robust Windows endpoint security, in general.”
In a statement that wasn’t attributed to an individual, ESET was even more blunt:
ESET supports modifications to the Windows ecosystem that demonstrate measurable improvements to stability, on condition that any change must not weaken security, affect performance, or limit the choice of cybersecurity solutions. It remains imperative that kernel access remains an option for use by cybersecurity products to allow continued innovation and the ability to detect and block future cyberthreats. We look forward to the continued collaboration on this important initiative. [emphasis added]
And that, ultimately, is why it’s unrealistic to expect any sweeping changes in the Windows platform any time soon. Those arguments from Sophos and ESET are clearly shared by leaders at other security companies, who fear that restricting access to the Windows kernel will give Microsoft’s own endpoint protection products a crucial competitive advantage.
That’s the kind of debate that quickly gets handed off from engineers to lawyers. Given Microsoft’s history with antitrust regulators in Europe and the US, it’s likely to end up in court. That’s probably why “government officials from the US and Europe” were invited attendees at the summit, and there’s no doubt they were taking notes.