On Friday 19th July 2024, screens at airports, hospitals and businesses around the world went blue. What followed sent shock waves through the IT industry and disrupted millions of Windows users across the globe. The cause? A faulty update from CrowdStrike, one of the largest and most trusted cybersecurity companies in the world.
At the time of writing, the effects of this failure are still being felt in almost every industry and on every continent, and people are still staring at blue screens and trying to recover their data. So, in this blog, we look at what we can learn from this catastrophic failure and how SMEs can protect themselves from similar incidents in the future. If CrowdStrike has taught us anything, it’s that incidents like this can occur anytime, anywhere and to anyone.
Blue Screens and the CrowdStrike Failure
CrowdStrike is an incredibly large security company that operates globally. Its software product, Falcon, is installed on computers and acts as a sensor for malicious software or activity. It offers anti-malware and anti-ransomware protection for devices, acting like antivirus software on steroids, and is trusted by many of the world’s largest organisations.
Unlike most software, which runs at user level, the Falcon software runs deep within the kernel of the computer, the core of the operating system with direct access to the hardware, where it lies in wait looking out for unusual activity. If a user-level program crashes, it might cause a minor inconvenience but is unlikely to have any lasting impact on your computer. If a kernel-level driver crashes, however, Windows deliberately stops and shows a blue screen, because it can no longer be sure that it is safe to carry on running.
Microsoft guards against such issues by rigorously testing any driver that has kernel-level access to its systems. The Falcon driver had been tested, signed off and later installed on millions of devices, including computers at the NHS, major airports and huge corporations all over the world for whom security is paramount.
It should have been safe. Sadly, whilst the Falcon driver itself had been tested by Microsoft, the individual micro updates that CrowdStrike sends to it are not. So when CrowdStrike rolled out its latest update across the world in the early hours of the morning (UTC), that update had not been tested in the way it should have been, and disaster struck.
Essentially, the update included a single faulty file. Early reports suggested the file was simply a series of zeros, although CrowdStrike has since said that null bytes were not the cause. When the Falcon driver tried to process the file, it crashed, and because the driver runs at kernel level, Windows reacted as it is designed to do and blue screened every computer that had received the update.
Fixing the CrowdStrike problem
The Falcon software is installed on millions of devices. Around 8.5 million computers are estimated to have been affected by the update, although some people believe the number is closer to 10 million. Once blue screened, most of these computers cannot be fixed remotely, because they typically crash again on restart before any remote management tool has a chance to load. Instead, the only way to regain control is to physically go to the computer, boot it into safe mode, find the faulty file and delete it.
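To give a feel for the "find the file and delete it" step, here is a minimal Python sketch. It is an illustration only, not CrowdStrike's or Microsoft's official fix. The function name `find_faulty_files` is our own, and the folder and file pattern shown in the usage note are taken from CrowdStrike's public remediation guidance as we understand it, so check the current vendor guidance before relying on them. By default the function only lists matching files and deletes nothing.

```python
from pathlib import Path


def find_faulty_files(directory, pattern, delete=False):
    """Return the files in `directory` whose names match `pattern`.

    Files are only deleted when `delete=True` is passed explicitly,
    so the default behaviour is a safe dry run.
    """
    folder = Path(directory)
    if not folder.is_dir():
        # Nothing to do if the folder does not exist on this machine.
        return []

    matches = sorted(p for p in folder.glob(pattern) if p.is_file())

    if delete:
        for path in matches:
            path.unlink()

    return matches
```

On an affected machine booted into safe mode, a call might look like `find_faulty_files(r"C:\Windows\System32\drivers\CrowdStrike", "C-00000291*.sys")`. Review the list it returns, then repeat the call with `delete=True` once you are happy with it.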
That is fine if you are an individual user with one computer at home, but when you consider the organisations that are using the Falcon driver and the number of computers they have on their network, it becomes a different matter entirely.
Take an airport, for example. Every screen you see across the entire airport is controlled by a computer, often deliberately hidden from view and sometimes very difficult to reach. Every single one of those thousands of computers will need to be visited by a security-cleared, trained IT engineer who can safely boot the computer into safe mode, then find and delete the file, before normal service can be restored.
This is thousands of hours’ worth of work and billions of pounds’ worth of damages and downtime while the engineers work their way round every single device. The other fly in the ointment was that many of the organisations affected had done the right thing and encrypted their hard drives. An encrypted hard drive requires a recovery key to unlock it, even in safe mode.
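To see how quickly manual visits add up, here is a back-of-the-envelope estimate in Python. The function name and every figure in the example are hypothetical and chosen purely for illustration. They are not measurements from any affected organisation.

```python
def estimate_recovery_hours(devices, minutes_per_device, engineers):
    """Return (total engineer-hours, elapsed hours with the team working in parallel)."""
    if engineers <= 0:
        raise ValueError("at least one engineer is needed")

    total_hours = devices * minutes_per_device / 60
    elapsed_hours = total_hours / engineers
    return total_hours, elapsed_hours


# Hypothetical example: 5,000 devices, 15 minutes each, 10 engineers.
total, elapsed = estimate_recovery_hours(5000, 15, 10)
print(f"{total:.0f} engineer-hours, about {elapsed:.0f} hours elapsed")
```

Swap in your own device count and a realistic time per visit, remembering to include travel between sites and any time spent retrieving recovery keys.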
Unfortunately, it seems that a lot of these organisations had been using a fairly outdated approach to encryption, storing their BitLocker recovery keys on servers that had also been affected by the CrowdStrike update. This meant that the engineers needed to get into the servers to recover the encryption keys before they could start working on restoring the affected computers, adding to the recovery time. At the time of writing, there still isn’t a remote method for fixing this problem and, as a result, many of these organisations are still struggling to reboot all of their devices.
The good news is that for most, no data would have been lost on the devices that were affected, so when they are eventually rebooted and the file deleted, normal service will be resumed. However, for those that are unable to recover their encryption keys, the only option is to wipe the computer and start from scratch, which would result in all of the data being lost.
What can SMEs learn from the CrowdStrike update?
CrowdStrike lost around $24 billion in market value as a result of the Falcon update, approximately 15% of its value at the time, but has since recovered. It has a lot to learn from this disaster, but so does everyone else. If huge corporations like CrowdStrike and Microsoft can fall foul of a faulty update, causing a global outage that will take months to rectify, so can any organisation. So, what can we learn from this?
IT Fragility
Whenever something happens on this scale, it provides an opportunity for every organisation, big and small, to reassess its processes and how fragile it would be when faced with something similar. Every company owner should be sitting down with their senior team and asking:
- If this happened to us, what would the cost be?
- What would the remediation be?
- Who would we call first?
- What would we do about it?
- How would we keep our team working and business running?
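For the first of those questions, even a rough number is a useful starting point for the discussion. The Python sketch below is one simple way to estimate the cost of an outage. The function name, the formula and the example figures are our own assumptions for illustration, and it ignores harder-to-measure costs such as reputational damage or contractual penalties.

```python
def estimate_downtime_cost(staff, hourly_cost_per_person, hours_down,
                           lost_revenue_per_hour=0.0):
    """Rough outage cost: idle staff time plus revenue lost while systems are down."""
    staff_cost = staff * hourly_cost_per_person * hours_down
    revenue_cost = lost_revenue_per_hour * hours_down
    return staff_cost + revenue_cost


# Hypothetical SME: 20 staff at £25 per hour, down for one 8-hour day,
# losing £500 of revenue per hour.
cost = estimate_downtime_cost(20, 25, 8, lost_revenue_per_hour=500)
print(f"Estimated cost of the outage: £{cost:,.0f}")
```

Running the same calculation for a one-day, three-day and one-week outage gives you a sense of how much it is worth investing in prevention and recovery planning.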
If you have an internal IT department or an MSP like Ask4Support, you should be having that discussion with them. We are having the conversation internally and with our clients.
Disaster recovery planning
Unfortunately, there is no way of completely guarding against something like this happening. We have to trust that security software companies are testing their products and doing the necessary due diligence to prevent incidents like this from occurring. But if a company of this size can fail to do so, it just goes to show that it can happen to anyone at any time.
As a result, we all need to ask ourselves what we would do if, for example, Windows stopped working. Microsoft estimates that the 8.5 million computers affected by the CrowdStrike update make up less than 1% of all the Windows machines in the world. The truth is that if something affected all Windows users, it would likely cause a global outage on a far greater scale and have huge ramifications for everyone, from the global markets to the transport system.
You should consider how you would enable your team to carry on working if your IT systems fail. It is likely to require an old school approach including pen and paper, landline calls and some grit and determination from everyone in your team, but with some forward planning, you could survive it.
Keep a paper record of your disaster recovery plan and distribute it to your entire team, so everyone knows what to do in the event of an IT failure. Your plan should also cover wider disasters, such as the office being broken into, the building being damaged by fire, or key personnel being incapacitated for any reason. Disasters will not always be IT related, so your plan should cover as many eventualities as you can reasonably foresee.
Put systems in place to protect your SME against future failures
For anyone who has a basic Microsoft subscription, e.g. Microsoft 365, we suggest using BitLocker with the recovery keys stored in the cloud, not on servers that could also be affected by a disaster. This means you should be able to restore your systems more quickly, as you would just need access to the cloud, which you can reach from any computer with internet access, to recover your encryption keys.
You should also use the cloud to store data wherever possible, so that if your computers are damaged, you can still access your important information and pick up where you left off as soon as you get access to new devices. With Microsoft 365, files saved to OneDrive and SharePoint are synced to the cloud automatically, so you can continue to operate your business in the event of a device failure.
Ensure that everyone in your business has company-owned devices that are monitored by Microsoft Defender or similar security software. If all of your employees’ devices are managed and monitored in this way, you can identify security risks at an early stage and shut down an employee’s access to your data and systems before damage is done.
This is increasingly important in the current digital age, where every organisation is under threat from malware and spyware that can be deposited in your systems by a malicious email, file or employee activity. CrowdStrike has said that the faulty update was not the result of a cyberattack, but regardless, SMEs need to guard against this threat.
Choose an IT management service you can rely on
Whether you choose to have an in-house team or outsource your IT support to an MSP like Ask4Support, you need to have complete trust that your IT team will do everything in their power to prevent incidents like the CrowdStrike update from costing your SME downtime. We regularly offer consultation services to in-house IT teams to ensure that they have covered every angle and checked every box from a security perspective, but we also work with many of our clients to offer a fully outsourced IT management service.
Thankfully, none of our clients were affected by CrowdStrike, but if anything like this were to happen again, we know that we have protected their businesses as much as possible from the consequences of such an incident.
If you would like to discuss your IT security or if you have been affected by CrowdStrike and need help recovering your devices, get in touch.