Protecting your business from unforeseen outages: Lessons from the recent CrowdStrike incident

The recent global outage at CrowdStrike, a leading cybersecurity firm, has sent shockwaves through the business community. As organizations increasingly rely on cloud-based services and cybersecurity solutions to protect their digital assets, the incident serves as a stark reminder of the vulnerabilities inherent in even the most robust systems. This article explores the lessons businesses can learn from the CrowdStrike outage and underscores the importance of proactive measures like performing a business impact assessment (BIA) to safeguard operations against similar disruptions.

The CrowdStrike outage: A wake-up call

CrowdStrike’s global outage disrupted numerous organizations running Microsoft Windows, highlighting the far-reaching impact that a single point of failure can have on businesses. The incident not only affected the availability of crucial cybersecurity defenses but also laid bare the broader operational risks associated with third-party service dependencies. Businesses whose Windows systems were protected by CrowdStrike’s Falcon software faced immediate challenges in maintaining their security postures and continuing operations without interruption. According to IDC Research, “Downtime continues to cost cloud buyers” (IDC’s Cloud Pulse, 4Q23: Executive Summary, Part II - 2024 Cloud Return on Investment). So while the CrowdStrike incident is dramatic, cloud outages are not rare.

Lessons learned and steps to protect your business

Cloud-delivered services have been adopted rapidly, and while they have generally been reliable, they remain subject to high-impact outages. CIOs should plan accordingly by assessing business impacts, focusing on the most critical systems. (See more from IDC on business impact assessments.)

A BIA helps organizations identify critical business functions and assess threats and the potential impact of disruptions. By understanding which processes and systems are most vital, businesses can prioritize resources and recovery efforts effectively. In the recent global outage, air travel was affected by problems with reservations and check-in systems. Impacts extended to the emails and SMS messages that carriers send to customers. Carriers’ ability to respond and recover varied, reflecting different levels of dependency on third parties and different response capabilities.

Similarly, some hospitals and medical offices could not obtain vital patient information or maintain visit schedules.

Gaining a deep understanding of threats and vulnerabilities requires careful planning by CIOs. This includes identifying the most critical systems and deciding how to respond in the event of an outage.
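The prioritization a BIA produces can be sketched as a simple ranking. The example below is a minimal illustration only: the function names, the dollar figures, and the scoring formula (hourly impact divided by tolerable downtime, halved when a manual workaround exists) are assumptions made for this sketch, not an IDC methodology or an industry standard.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    hourly_impact_usd: float    # estimated loss per hour of downtime (illustrative)
    max_tolerable_hours: float  # how long the business can operate without it
    has_manual_workaround: bool


def priority_score(fn: BusinessFunction) -> float:
    """Higher score means recover first. The weighting is an assumption."""
    score = fn.hourly_impact_usd / fn.max_tolerable_hours
    # A tested manual workaround buys time, so this sketch halves the urgency.
    return score * 0.5 if fn.has_manual_workaround else score


def rank_functions(functions: list[BusinessFunction]) -> list[BusinessFunction]:
    """Return the functions ordered from most to least urgent."""
    return sorted(functions, key=priority_score, reverse=True)


# Made-up figures for illustration only.
functions = [
    BusinessFunction("Internal reporting", 1_000, 48, True),
    BusinessFunction("Check-in and reservations", 200_000, 1, False),
    BusinessFunction("Customer notifications", 20_000, 4, True),
]

for fn in rank_functions(functions):
    print(f"{fn.name}: {priority_score(fn):,.0f}")
```

In practice the inputs would come from interviews with business owners rather than from a formula, but even a rough ranking like this makes it clear where recovery effort should go first.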

The BIA process involves evaluating various threat scenarios, including the failure of key third-party services like Microsoft Azure. This prepares businesses for a range of disruptions and ensures that contingency plans are in place. When developing the BIA, it is important to consider specific scenarios.

Had the CrowdStrike scenario been considered by airlines and hospitals, the impacts would likely have been mitigated. While you need to be careful to avoid “fighting the last war,” using recent events from the real world can help teams think through the appropriate response. This may involve everything from manual workarounds to training for staff and support teams or additional redundancy. The global head of cyberthreat management at a leading global property and casualty insurance company suggests: “Look across your portfolio where you’ve got that kind of auto-update automation and [ask], ‘Can they be slowed down some to keep this from being bigger than it needs to be?’ ”
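One way to picture "slowing down" automated updates is a ring-based rollout, in which an update reaches a small group of hosts first and proceeds only if they stay healthy. The sketch below is a generic illustration under stated assumptions: `Ring`, `staged_rollout`, the `apply_update` and `is_healthy` hooks, and the 2% failure threshold are all hypothetical names and values, and none of this represents CrowdStrike's or any other vendor's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Ring:
    name: str
    hosts: list[str]


def staged_rollout(rings, apply_update, is_healthy, max_failure_rate=0.02):
    """Apply an update ring by ring and halt if too many hosts become unhealthy.

    `apply_update` and `is_healthy` are caller-supplied callables (hypothetical
    hooks into whatever deployment and monitoring tooling is in use).
    Returns the names of the rings that completed within the failure threshold.
    """
    completed = []
    for ring in rings:
        for host in ring.hosts:
            apply_update(host)
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            print(f"Halting after ring '{ring.name}': {failures} unhealthy hosts")
            break
        completed.append(ring.name)
    return completed


# Demo: a bad update breaks every canary host, so later rings are never touched.
demo_rings = [
    Ring("canary", ["canary-1", "canary-2"]),
    Ring("early", ["early-1", "early-2", "early-3"]),
    Ring("broad", ["prod-1", "prod-2", "prod-3", "prod-4"]),
]
demo_updated = []
demo_done = staged_rollout(
    demo_rings,
    demo_updated.append,
    lambda host: not host.startswith("canary"),
)
print(demo_done, demo_updated)
```

The design choice is the trade-off the quote points at: each ring adds delay before protection reaches every host, in exchange for limiting how far a faulty update can spread.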

For critical systems, consider:

Diversification and redundancy

  • Avoid single points of failure: For critical systems, build in redundancy at the component level. In the CrowdStrike incident, Linux and Mac endpoints were not affected, which suggests that platform diversity can limit the reach of a single faulty update.
  • Implement redundant systems: Establishing redundant systems and data backups is essential for maintaining continuity.
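A dependency map makes single points of failure easier to spot. The following sketch assumes a hypothetical inventory in which each business function lists groups of interchangeable providers; a group with only one member has no fallback. The function and provider names are invented for illustration.

```python
# Hypothetical dependency map: each function lists groups of interchangeable
# providers. A group with a single member has no fallback.
DEPENDENCIES = {
    "Check-in": [{"windows-endpoints"}, {"cloud-region-a", "cloud-region-b"}],
    "Patient records": [{"windows-endpoints"}, {"ehr-vendor"}],
    "Payroll": [{"linux-servers", "windows-endpoints"}],
}


def single_points_of_failure(deps):
    """Map each provider that has no alternative to the functions relying on it."""
    spofs = {}
    for function, groups in deps.items():
        for group in groups:
            if len(group) == 1:
                (provider,) = group
                spofs.setdefault(provider, []).append(function)
    return spofs


def functions_down(deps, failed):
    """Functions that stop when every provider in the set `failed` is unavailable."""
    return sorted(
        function
        for function, groups in deps.items()
        # A function stops if all providers in any one of its groups have failed.
        if any(group <= failed for group in groups)
    )


print(single_points_of_failure(DEPENDENCIES))
print(functions_down(DEPENDENCIES, {"windows-endpoints"}))
```

Running a scenario such as "all Windows endpoints fail" against the map shows which functions stop and which survive because an alternative platform exists.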

Enhance incident response plans

  • Regularly test and conduct drills: Incident response plans should be tested and updated regularly to address shortfalls discovered when walking through or testing scenarios.
  • Integrate communication channels: Effective incident response requires seamless communication between IT, security, and business teams.

Vendor risk management

  • Assess vendor capabilities: Regularly evaluate the risk management and disaster recovery capabilities of key vendors. Ensure they have robust plans in place to handle outages and can provide timely support during incidents.
  • Review and adjust third-party rollouts: CrowdStrike Falcon users have the capability to throttle rollouts. A review of third-party rollouts should consider such capabilities.
  • Safeguard contracts: Include clauses in vendor contracts that address service-level agreements (SLAs), response times, and penalties for prolonged outages. This helps mitigate risks and ensures accountability.
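When reviewing SLA clauses, it helps to translate an availability percentage into the downtime it actually permits. The short calculation below is a sketch that assumes a 30-day measurement period; real contracts define their own measurement windows and exclusions, and the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime that an availability target permits over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes per 30 days")
```

A 99.9% target, for example, still permits roughly 43 minutes of downtime in a 30-day month, which is worth comparing against the tolerable downtime identified in the BIA.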

Continuous improvement and review

  • Stay informed: Keep abreast of industry trends and incidents like the CrowdStrike outage to learn from the experiences of others. This knowledge can inform your own risk management and business continuity strategies.
  • Update regularly: Continuously review and update your BIA, incident response plans, and vendor assessments to reflect changes in the business environment, emerging threats, and lessons learned from past incidents.

The CrowdStrike outage serves as a critical reminder of the importance of proactive risk management and business continuity planning. By conducting a comprehensive BIA, diversifying dependencies, enhancing incident response plans, managing vendor risks, and committing to continuous improvement, businesses can better protect themselves against unforeseen disruptions. These measures are not just advisable; they are essential for ensuring resilience and maintaining trust with stakeholders.

Learn more about IDC’s research for technology leaders.

International Data Corporation (IDC) is the premier global provider of market intelligence, advisory services, and events for the technology markets. IDC is a wholly owned subsidiary of International Data Group (IDG Inc.), the world’s leading tech media, data, and marketing services company. IDC was recently voted Analyst Firm of the Year for the third consecutive time, and its Technology Leader Solutions provide you with expert guidance backed by our industry-leading research and advisory services, robust leadership and development programs, and best-in-class benchmarking and sourcing intelligence data from the industry’s most experienced advisors. Contact us today to learn more.

Gerald Johnston, an adjunct research advisor with IDC’s IT Executive Programs (IEP), founded GJ Technology Consulting, LLC, where he assisted global financial institutions and helped launch a UK startup bank. Johnston is an experienced financial services and consulting executive who excels at collaborating across teams to deliver results. Prior to his current role, Johnston led technology delivery for Wells Fargo’s Information Cyber Security, Technology, and Corporate Properties groups, where he and his team modernized the company’s Cyber Threat Fusion Center on behalf of the cybersecurity team. He was selected as a Wells Fargo Global Fellow and, in that role, helped a Philippine Micro Finance Bank and its clients in conjunction with Bankers Without Borders. He is the former CTO of shared services for Wachovia, leading technology for Core Banking, Bank Operations, Finance, Risk, Legal, and Marketing business units.


