
CrowdStrike outage: technical analysis


On 19 July 2024, a faulty content update for CrowdStrike’s Falcon sensor crashed Windows systems around the world. CrowdStrike traced the root cause to Channel File 291.

The primary issue was a mismatch between the input fields expected and those provided. Channel File 291 specified a comparison against 21 input values, but the integration code supplied only 20.

This discrepancy resulted in an out-of-bounds memory read when the Content Interpreter attempted to access the 21st value, causing system crashes at the next IPC (Inter-Process Communication) notification from the operating system.
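The mismatch can be modelled in a few lines. This is only an illustrative sketch, not CrowdStrike's code: the counts 21 and 20 come from the description above, while every name in it is hypothetical. Python raises an `IndexError` on the bad access, whereas unchecked native code would read whatever memory lies past the end of the array.

```python
# Illustrative model only. The counts 21 and 20 come from the article;
# every name below is hypothetical.
FIELDS_IN_CHANNEL_FILE = 21   # values the channel file asks to compare
INPUTS_FROM_SENSOR = 20       # values the integration code supplied


def first_bad_index(criteria, inputs):
    """Return the first index read past the end of inputs, or None."""
    for index in range(len(criteria)):
        try:
            inputs[index]  # Python raises here; unchecked native code reads stray memory
        except IndexError:
            return index
    return None


criteria = [f"pattern{i}" for i in range(FIELDS_IN_CHANNEL_FILE)]
inputs = [f"value{i}" for i in range(INPUTS_FROM_SENSOR)]

bad_index = first_bad_index(criteria, inputs)
print(bad_index)  # prints 20, the zero-based position of the 21st value
```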

Channel File 291 is part of CrowdStrike’s Rapid Response Content, which enhances the Falcon sensor’s detection capabilities without requiring changes to the sensor code. The file contained a latent out-of-bounds read issue due to the incorrect number of input values, leading to the Content Interpreter crashing and causing system-wide failures.

Kevin Beaumont, a security researcher, noted that the channel updates were not tested on actual Windows PCs before deployment. CrowdStrike relied instead on automated testing with bespoke code.

This lack of real-world testing contributed to the undetected input mismatch and subsequent crash.

CrowdStrike has implemented several measures to prevent similar incidents in the future. Enhanced testing procedures have been integrated into the Content Configuration System to ensure all Template Types undergo automated testing.
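One way to picture such an automated check is a test that compares the number of fields each Template Type expects with the number of values the sensor supplies. The sketch below is an assumption about what that kind of test might look like, not CrowdStrike's implementation. The template names and the count of 12 are hypothetical.

```python
# Hypothetical pre-deployment check; names and the value 12 are invented
# for illustration. Only the counts 21 and 20 come from the article.
SENSOR_INPUT_COUNT = 20  # values the sensor integration code supplies


def validate_template_types(template_types):
    """Return the names of template types that expect more inputs than supplied."""
    return [
        name
        for name, field_count in template_types.items()
        if field_count > SENSOR_INPUT_COUNT
    ]


template_types = {"ipc_template": 21, "other_template": 12}
failures = validate_template_types(template_types)
print(failures)  # prints ['ipc_template']
```

A non-empty result would block the release before any content reaches a sensor.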

Additionally, runtime input array bounds checks have been added to the Content Interpreter to prevent out-of-bounds memory reads. CrowdStrike has introduced extra deployment layers and acceptance checks to validate updates before they reach production.
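A runtime bounds check of this kind can be sketched as follows. The function name and return convention are hypothetical. The point is only that the interpreter compares the lengths before indexing, and rejects the content rather than reading a value that was never supplied.

```python
def evaluate_with_bounds_check(criteria, inputs):
    """Compare criteria to inputs, refusing to read past the inputs."""
    if len(criteria) > len(inputs):
        # Reject the content instead of touching a value that was never supplied.
        return None
    return [pattern == value for pattern, value in zip(criteria, inputs)]


# 21 criteria against 20 inputs: the check fails safely.
mismatched = evaluate_with_bounds_check(["x"] * 21, ["x"] * 20)

# Matching lengths: each criterion is compared with its input.
matched = evaluate_with_bounds_check(["a", "b"], ["a", "c"])
```

Here `mismatched` is `None` and `matched` is `[True, False]`.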

The company has also engaged third-party security vendors to review the Falcon sensor code and the entire quality control process from development through deployment.

Furthermore, CrowdStrike is collaborating closely with Microsoft to use new Windows features that allow security functions to be performed in user space, reducing reliance on kernel drivers. This collaboration aims to enhance system resilience and prevent similar incidents in the future.

The root cause analysis by CrowdStrike identified a content validation issue and inadequate input handling as the primary factors behind the outage.
