CrowdStrike Outage - Debunking the Monopoly Myth
EDR needs low-level access, and Cyber Builders must have access to low-level APIs. As such, the Microsoft model is not that bad.
Hello, Cyber Builders 🖖
Last week, we witnessed one of the biggest IT outages ever. I won't delve into all the details, as many bloggers and journalists have covered them extensively. However, I must express my disappointment with the narratives pushed by TV shows and major newspapers.
I want to counter one of the prevalent ideas circulating, especially in Europe: that the outage's root cause is Microsoft's Windows monopoly, that MS Windows is therefore detrimental, and that we, as Europeans, must find alternative solutions.
This perspective is misguided. It mixes many arguments, and I feel the need to separate them. In this post, I’ll do my best to cover each one separately:
Why the incident's impact is expected, as modern societies are dependent on IT
How an EDR works and why, by design, these outages can happen
Why Microsoft Windows is probably one of the most open systems available (don’t shout at me before reading the section!)
Why we, as Cyber Builders, need to work harder and invest more in our (cyber)security
By the way, the root cause of the outage is NOT Microsoft but CrowdStrike, the vendor of the Falcon EDR, which published a bad software update. I guess people at Microsoft must feel bad about all the unjustified negative buzz around their flagship product.
The Reality of IT Dependency
The recent outage has enormous implications, but it's crucial to recognize that these implications merely underscore our dependence on IT systems. Historically, societies have always relied on key technologies: in the medieval economy, it was horses or other animals; in the 19th century, it was trains; and in the modern economy, we rely heavily on electricity.
Whenever these technologies failed, the impacts were significant. We are increasingly dependent on IT, which isn't inherently negative because IT provides numerous new services and facilities.
Moreover, as IT accelerates our economies and lives, an incident lasting only a few minutes can have a large impact. Look at the outage timeline.
CrowdStrike documented the incident: “On July 19, 2024, at 04:09 UTC, as part of ongoing operations, CrowdStrike released a sensor configuration update to Windows systems. Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform. This configuration update triggered a logic error, resulting in a system crash and a blue screen (BSOD) for impacted systems. The sensor configuration update that caused the system crash was remediated on Friday, July 19, 2024, 05:27 UTC.”
So the faulty update was live for only 78 minutes. That is a very short window for such a huge impact.
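The 78-minute figure follows directly from the two timestamps in CrowdStrike's statement. Here is a minimal Python sketch to check it; the variable names are mine, and only the timestamps come from the quote.

```python
from datetime import datetime, timezone

# Timestamps taken from the CrowdStrike statement quoted above (UTC).
released = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
remediated = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)

# Window during which the faulty configuration update was being distributed.
exposure = remediated - released
exposure_minutes = int(exposure.total_seconds() // 60)

print(f"Faulty update was live for {exposure_minutes} minutes")
```

Keep in mind this only measures how long the bad update was available. Machines that had already downloaded it still needed manual remediation afterwards.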
Microsoft has confirmed that 8.5 million Windows systems have been impacted (source here). MS also documented manual remediation and scripts, which can be found here.
We must live with these impacts and invest in incident preparation and response.
We must acknowledge this dependence and ensure we have backup plans and recovery strategies similar to those for electricity or transportation (don’t you have a flashlight running on batteries at home?). We are not investing enough in resilience-driven technologies. Corporations are not getting ready because potential IT system failures are not considered.
I wrote about this topic last year.
Understanding the EDR Outage
The recent CrowdStrike incident was triggered by a flawed update to their Falcon Sensor, leading to blue screens of death and boot loops on affected Windows devices. These issues can arise from insufficient testing or unexpected interactions between the update and the operating system. Despite rigorous development processes, occasional errors in updates can still slip through, especially in complex systems like EDR software.
How an Endpoint Detection and Response (EDR) System Works
An Endpoint Detection and Response (EDR) system, such as CrowdStrike's Falcon Sensor, is designed to monitor, detect, and respond to cybersecurity threats on endpoints like computers and servers. The primary functions of an EDR system include:
Monitoring System Processes: Keeping track of all running processes to detect anomalies.
Analyzing Network Activity: Inspecting inbound and outbound traffic to spot unusual patterns.
Logging File Operations: Monitoring file creation, modification, and access to detect malicious actions.
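To make these three functions concrete, here is a toy sketch in Python. It is not how Falcon or any real EDR is implemented: the Event type, the rules, and the blocklist are all hypothetical, and a real product collects these events from kernel-level hooks rather than from a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "process", "network", or "file"
    source: str  # process that produced the event
    target: str  # child process, remote address, or file path


# Made-up blocklist; 203.0.113.0/24 is a range reserved for documentation.
BLOCKLIST = {"203.0.113.7:443"}


def is_suspicious(event: Event) -> bool:
    """Deliberately naive rules, one per telemetry type."""
    if event.kind == "process":
        # An office application spawning a shell is a classic anomaly.
        return event.source == "winword.exe" and event.target == "powershell.exe"
    if event.kind == "network":
        # Outbound connection to a blocklisted address.
        return event.target in BLOCKLIST
    if event.kind == "file":
        # An executable dropped into a temp directory.
        path = event.target.lower()
        return "\\temp\\" in path and path.endswith(".exe")
    return False


events = [
    Event("process", "explorer.exe", "notepad.exe"),
    Event("process", "winword.exe", "powershell.exe"),
    Event("network", "powershell.exe", "203.0.113.7:443"),
    Event("file", "powershell.exe", r"C:\Users\alice\AppData\Local\Temp\update.exe"),
    Event("file", "winword.exe", r"C:\Users\alice\Documents\report.docx"),
]

alerts = [event for event in events if is_suspicious(event)]
for alert in alerts:
    print(f"ALERT [{alert.kind}] {alert.source} -> {alert.target}")
```

The rules are only as good as the events they receive: if the sensor cannot see a process start or a file write, no rule can flag it.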
Why EDR Systems Need Low-Level Access
To achieve these functions effectively, an EDR product must be able to collect signals, logs, and events about all significant activities happening on the endpoint, such as a Windows PC. This involves detecting attempts to hack the system through the network or identifying threats introduced by users opening malicious files or clicking on dangerous links.
To meet these objectives, the EDR must be loaded at the beginning of the PC boot process. This requirement means the EDR operates at a low level within the operating system, often within the kernel. Kernel-level drivers, like those used by CrowdStrike's EDR and similar products, have the privilege of monitoring and intercepting system events comprehensively. This deep integration allows the EDR to capture all the details needed to detect and respond to threats effectively. (more information here)
The efficacy of an EDR system often correlates with how well it is integrated at this low level. In benchmarks, the better an EDR product is integrated into the operating system, the more effectively it can detect and respond to threats. For more detailed technical insights, refer to the EDR telemetry repository on GitHub - a great project that provides examples and benchmarks for EDR performance.
The Misguided Monopoly Argument
Microsoft Windows is arguably one of the most open systems available today, far more so than others like Apple iOS or Google/Samsung Android. It's poor judgment to say we should dismantle this and switch to something else, like Google Chrome OS.
It's easy to bash Microsoft because it's the biggest company running the dominant OS, but in reality, alternatives often present more restrictions for independent software vendors. For example, platforms like iOS and Android require going through app stores and being vetted by tech juggernauts. Moreover, developing a low-level kernel driver and providing a high level of “independent” security on these systems is impossible.
In contrast, Microsoft Windows allows developers to implement software directly within the OS, offering some flexibility. A third-party vendor, whether in the US, the EU, or anywhere else, can develop a module that controls the key security elements and sends telemetry. Users can keep using Windows, a leading system they know well, while being protected by a trusted solution.
While the impact of IT failures is significant, it doesn't justify claims that we should dismantle Microsoft's monopoly. The system we have now, with Microsoft Windows, is not that bad because it allows significant customization and integration at a low level, which is crucial for creating effective cybersecurity solutions.
Cyber Builders must get low-level APIs
So, Laurent, are you telling me that you are okay with the current status?
No, I am just saying we should not bash Microsoft because it is one of the last open operating systems. Consider this list:
Smartphones and Tablets - 100% closed environment, bound to hardware vendors, application distribution is restricted and controlled
Laptops - Still flexible environments; Apple and Microsoft have developer programs for security vendors, and we, as an industry, must continue to defend access to low-level APIs. App and driver signing are good measures, but they must not become a way to move to a closed environment.
Servers - Well, it depends, but if you consider Broadcom's bold business and licensing moves after the VMware acquisition (see https://techmonitor.ai/technology/cloud/vmware-takeover-by-broadcom-mistake), I am quite unsure we are gaining freedom here.
Cloud - Do I need to explain?
We must firmly ask our “infrastructure” vendors (OS, Hardware, Platforms) to commit to open APIs, open-source components, and technical documentation access. Our economy depends on IT technology, and our security stacks depend on low-level access.
No major system should be a black box without any capacity to “repair” it.
Cyber Builders—startups building new technologies and practitioners securing IT systems—won't be autonomous or efficient if they don’t defend this open access to low-level APIs.
I want to see more low-level APIs from cloud providers and other OS vendors. We can secure this through code signing and other crypto methods. Opening up the platforms we use daily would help us provide more security.
It is crucial. Don’t you think?
Laurent 💚