CrowdStrike Incident
Published On: July 23rd, 2024 | 5 min read | Categories: Cybersecurity

On July 19, 2024, a routine software update from cybersecurity giant CrowdStrike triggered a cascading failure that resulted in one of the largest IT outages in history. This incident affected thousands of businesses and organizations worldwide, causing widespread disruptions across various sectors including aviation, banking, healthcare, and government services.

Timeline of Events

  • July 19, 2024, 04:09 UTC: CrowdStrike releases a sensor configuration update for Windows systems.
  • 04:09 – 05:27 UTC: Systems running Falcon sensor for Windows version 7.11 and above download the faulty update, causing widespread crashes.
  • 05:27 UTC: CrowdStrike identifies and remedies the issue in the sensor configuration update.
  • Early morning hours (various time zones): Reports of outages begin to flood in from across the globe.
  • Later on July 19: CrowdStrike CEO George Kurtz issues a public apology on NBC’s Today show.
  • July 19-20: Governments worldwide, including Australia and the UK, activate emergency response mechanisms.
  • Ongoing: Recovery efforts continue, with manual fixes required for many affected systems.

What Happened?

Figure: CrowdStrike Outage in Numbers

The outage was caused by a defect in a Falcon content update for Windows hosts. Specifically, the update was related to Channel File 291, which controls how Falcon evaluates named pipe execution on Windows systems. The configuration update triggered a logic error that resulted in system crashes and blue screens of death (BSODs) on impacted systems.
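One common defense against this class of failure is to validate a configuration update before loading it and to stay on the last known-good version if validation fails. The sketch below illustrates that general idea only. It is not CrowdStrike's implementation or file format: the JSON layout, the `ContentUpdate` type, the `EXPECTED_FIELDS` count, and both function names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentUpdate:
    """Hypothetical parsed configuration update (not CrowdStrike's format)."""
    version: int
    rules: tuple


EXPECTED_FIELDS = 3  # assumed number of fields every rule must carry


def parse_update(raw: str) -> ContentUpdate:
    """Parse and validate an update; raise ValueError on malformed input."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict) or not isinstance(doc.get("version"), int):
        raise ValueError("missing or non-integer 'version'")
    rules = doc.get("rules")
    if not isinstance(rules, list) or not rules:
        raise ValueError("'rules' must be a non-empty list")
    for i, rule in enumerate(rules):
        # Reject rules whose field count differs from what the consumer expects.
        if not isinstance(rule, list) or len(rule) != EXPECTED_FIELDS:
            raise ValueError(f"rule {i} does not have {EXPECTED_FIELDS} fields")
    return ContentUpdate(doc["version"], tuple(tuple(r) for r in rules))


def apply_update(current: ContentUpdate, raw: str) -> ContentUpdate:
    """Return the new update if it validates, otherwise keep the current one."""
    try:
        return parse_update(raw)
    except ValueError:
        return current  # fail safe: stay on the last known-good content
```

The design choice here is that a bad update degrades to "no update" rather than to a crash, which keeps the host running while the vendor corrects the content.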

This incident was not the result of a cyberattack but rather a software bug that slipped through CrowdStrike’s quality control processes. The impact was so widespread because of CrowdStrike’s significant market share: more than 24,000 customers, including nearly 60% of Fortune 500 companies.

Impact and Consequences

The outage affected a wide range of industries and services:

  • Healthcare providers, including hospitals, encountered system failures.
  • Airlines grounded flights and experienced severe delays.
  • Banks and financial institutions faced disruptions in their operations.
  • Government services, including emergency numbers and websites, were impacted.
  • Media outlets, including broadcasters, experienced outages.

The economic impact of this incident is expected to be significant, potentially running into billions of dollars.

Could This Happen to Other Vendors?

The CrowdStrike incident serves as a reminder that no software vendor, regardless of size or reputation, is immune to the risks associated with software updates. This event highlights several key points:

  • Interconnectedness of systems: Modern businesses rely on complex software ecosystems, making them vulnerable to cascading failures.
  • Automation risks: While automated updates are necessary for managing large-scale systems, they can also amplify the impact of errors.
  • Single points of failure: Overreliance on a single vendor or technology can create dangerous vulnerabilities.
  • Need for redundancy: Implementing multiple layers of security with different vendors can help mitigate risks.
  • Importance of testing: Rigorous testing procedures are needed to prevent such incidents.

BlackFog’s Approach to Mitigating Update Risks

In light of this incident, it’s worth highlighting BlackFog’s engineering practices that aim to prevent similar occurrences:

BlackFog prides itself on engineering best practices. As such, it has established canary releases: every release involving significant features or critical code changes is deployed to only a subset of customers at any one time. If any significant issues are discovered, changes can be reverted immediately using a global flag on our master servers.

This approach offers several advantages:

  • Controlled rollout: By deploying updates to a limited subset of customers initially, BlackFog can detect potential issues before they affect the entire user base.
  • Quick reversion: The ability to revert changes using a global flag allows for rapid response to any discovered problems.
  • Minimized impact: Even if an issue occurs, it would only affect a small portion of users, significantly reducing the potential for widespread disruption.
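The canary-plus-revert-flag pattern described above can be sketched in a few lines. This is a minimal illustration of the general technique, not BlackFog's actual implementation: the hashing scheme, the `in_canary` function, and the `ReleaseController` class are all hypothetical names chosen for this example.

```python
import hashlib


def in_canary(customer_id: str, release: str, percent: float) -> bool:
    """Deterministically place a customer in the canary cohort for a release."""
    digest = hashlib.sha256(f"{release}:{customer_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100  # e.g. percent=5.0 selects buckets 0..499


class ReleaseController:
    """Hypothetical server-side controller with a global revert flag."""

    def __init__(self, stable: str, candidate: str, percent: float):
        self.stable = stable
        self.candidate = candidate
        self.percent = percent
        self.reverted = False  # the global flag; False means canary is live

    def revert(self) -> None:
        """Flip the global flag so every customer returns to the stable build."""
        self.reverted = True

    def version_for(self, customer_id: str) -> str:
        """Decide which version a given customer should receive."""
        if self.reverted:
            return self.stable
        if in_canary(customer_id, self.candidate, self.percent):
            return self.candidate
        return self.stable
```

Hashing the release name together with the customer ID keeps each customer's assignment stable for a given release while selecting a different subset for the next one, so the same customers are not always first in line.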

Lessons Learned

The CrowdStrike incident highlights the importance of thorough testing, phased rollout plans, and redundancy in IT systems. It also highlights the need for businesses to maintain thorough business continuity plans that account for potential failures in their cybersecurity infrastructure.
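As a rough illustration of what a phased rollout plan can look like in practice, the sketch below widens a deployment stage by stage and halts as soon as a health signal crosses a threshold. The stage percentages, the 1% crash-rate threshold, and the `crash_rate_at` telemetry callback are assumptions for this example, not values taken from any vendor.

```python
from typing import Callable, Sequence, Tuple


def run_phased_rollout(
    stages: Sequence[float],
    crash_rate_at: Callable[[float], float],
    max_crash_rate: float = 0.01,
) -> Tuple[float, bool]:
    """Advance through rollout stages, halting when the crash rate is too high.

    Returns (last healthy percentage reached, whether the rollout completed).
    """
    reached = 0.0
    for percent in stages:
        rate = crash_rate_at(percent)  # hypothetical telemetry lookup
        if rate > max_crash_rate:
            return reached, False  # halt; do not widen past the healthy stage
        reached = percent
    return reached, True
```

For example, `run_phased_rollout([1, 10, 50, 100], get_rate)` would stop at the first stage whose observed crash rate exceeds the threshold, where `get_rate` stands in for whatever monitoring an organization already has.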

Events such as these are an important reminder of how vulnerable our technological infrastructure is, especially as our dependence on networked digital systems grows. They underline that the software industry as a whole must adopt fail-safe mechanisms, strengthen testing protocols, and remain constantly vigilant.

Work With BlackFog

Prevent global IT meltdowns with BlackFog’s multi-layered cybersecurity approach. Our anti data exfiltration (ADX) technology, advanced threat hunting, and automated 24/7 protection safeguard against ransomware, data breaches, and cyberattacks. Discover how BlackFog’s innovative solutions go beyond traditional EDR/XDR to keep your organization secure.
