Microsoft has a fix for preventing the next CrowdStrike fiasco, but is it a good one?

The massive worldwide Windows outage caused by a disastrous update from the security company CrowdStrike made clear again just how reliant the world is on technologies few people understand — seemingly even the companies in charge of them.

The incident is a case study in how vulnerable the world is, not just to technology, but to the occasional short-sightedness and incompetence of the billion-dollar companies that use it.

In this case, a run-of-the-mill security update of the kind that’s been done thousands of times over the years went wrong because CrowdStrike simply wasn’t paying attention. In the aftermath, there have been calls for changes to the way those kinds of updates are handled to make sure this kind of thing never happens again.

Chief among those calls is one that says CrowdStrike — or any other company — shouldn’t be allowed access to a key part of Windows that could lead to a crash on every system that uses it. By only allowing Microsoft to touch the most vulnerable part of Windows, the thinking goes, Microsoft can keep the OS safe. Those who make this argument say it’s inevitable that if many companies can muck around with the central core of Windows, one of them will make an error and we’ll have more massive crashes like the one caused by CrowdStrike.

But is that really the case — will that solve the problem? To figure that out, we need to first take a look into how the worldwide crash occurred.

Anatomy of a disastrous outage

CrowdStrike offers security software to enterprises and claims it “secures the most critical areas of risk — endpoints and cloud workloads, identity, and data — to keep customers ahead of today’s adversaries and stop breaches.” 

The company says on its website that it is in widespread use among the world’s top companies, including 298 of the Fortune 500 companies, eight of the top 10 financial services firms, six out of the top 10 healthcare providers, and so on.

It provides cybersecurity via its CrowdStrike platform, which, like many other pieces of security software, is composed of two primary parts: the “Falcon sensor,” which is essentially a kind of security engine, and “Rapid Response Content,” which contains data the Falcon sensor uses to check for potential cyberattacks and malware.

The Falcon sensor does not get updated frequently, but the Rapid Response Content is constantly being updated, sometimes multiple times a day. That’s because cyberattacks and malware are constantly evolving. The Rapid Response Content has information about new potential attacks, and the Falcon sensor uses that information to keep companies safe. The more frequently the content is updated, the safer companies should be.

The Falcon sensor and Rapid Response Content both have access to the Windows kernel, the very core of the operating system. That means if something goes wrong with a CrowdStrike update, it can crash Windows and make it difficult to get the operating system restarted.

That’s exactly what happened here. CrowdStrike didn’t properly vet a Rapid Response Content update, and it brought down every Windows system that received the update with the dreaded Blue Screen of Death. Restarting Windows didn’t solve the problem because the issue affected the Windows kernel. Each PC with the bad update had to be restarted manually, booted into Safe Mode, and then someone had to navigate with File Explorer to Windows > System32 > drivers > CrowdStrike, and delete a specific file. That’s why it took so long to recover from the flawed update; there simply weren’t enough IT staffers available to do the work.

As for why CrowdStrike let a bad update into the Windows kernel, one reason is that Rapid Response Content updates don’t go through as comprehensive a checking procedure as Falcon sensor updates do. The company apparently thought bad Rapid Response Content updates couldn’t do as much harm as bad sensor updates. The company certainly got that wrong and pledged to fix the problem quickly.

Microsoft’s suggestion for better security

Soon after the crash, Microsoft’s John Cable, vice president of program management for Windows servicing and delivery, wrote a blog post about how Windows could better protect against widespread crashes in the future. There wasn’t anything particularly startling in his recommendations, which included this one: “This incident shows clearly that Windows must prioritize change and innovation in the area of end-to-end resilience.” Cable also recommended using technologies that don’t require security companies to access the Windows kernel.

Because of that, many people and companies, including CrowdStrike and other security vendors, believe Cable’s comments were a first step towards taking away security companies’ access to the kernel. They fear Microsoft could argue that allowing too many companies to use the kernel makes Windows less secure, and that the more companies that access it, the more opportunities there are for errors. The companies see that possibility as essentially a land grab by Microsoft; if companies are denied kernel access, Microsoft could take away their business.

Cloudflare CEO Matthew Prince, for example, warns, “Lest we forget, Microsoft themselves had their own eternal screw up where they potentially let a foreign actor read every customer’s email because they failed to adequately secure their session signing keys. We still have no idea how bad the implications of #EternalBlue are.”

At the moment, even if Microsoft wanted to ban access to the kernel, it couldn’t do so. An agreement it made with the European Union in 2009 guarantees kernel access to security vendors. But the company could use the CrowdStrike fiasco as a way to reopen negotiations.

Would Windows be safer if only Microsoft had kernel access? Certainly not. Prince is right — Microsoft has a history of big-time security screw-ups. I’ve often written about them, notably lax security practices that allowed Chinese spies to hack the accounts of high-level government officials, including US Commerce Secretary Gina Raimondo, Ambassador to China Nicholas Burns, and Rep. Don Bacon (R-NE). (All of them are involved with the country’s relationship with China.)

The best way to make Windows safer is to give reliable security companies access to the Windows kernel. Collective security is a better bet than allowing Microsoft to go it alone, particularly given its problematic security history. That was true before the CrowdStrike mess, and it remains true in the aftermath.