Balancing Operational Continuity and Safety in Critical Infrastructure
The recent high-profile attacks against Colonial Pipeline and JBS Foods highlight that operational technology (OT) — the devices that drive gas flows and food processing, along with essentially all other machine-driven physical processes — does not need to be directly targeted in order to be shut down as the result of a cyber-attack.
Indeed, in the Colonial Pipeline incident, the information technology (IT) systems were reportedly compromised, with operations shut down intentionally out of an abundance of caution, that is, so as to not risk the attack spreading to OT and threatening safety. This highlights that threats to both human and environmental safety, along with uncertainty as to the scope of infection, present risk factors for these sensitive industrial environments.
Continuity through availability and integrity
In most countries, critical infrastructure (CI), which ranges from power grids and pipelines to transportation and health care, must maintain continuous activity. The recent ransomware attack against Colonial Pipeline demonstrates why: gas shortages caused by the compromise led to dangerous panic buying and long lines at the pumps.
Ensuring continuous operation of critical infrastructure requires safeguarding the availability and integrity of machinery. This means that organizations overseeing critical infrastructure must foresee any possible risks and implement systems, procedures, and technologies that mitigate or remove these risks so as to keep their operations running.
Operational demand versus safety
Alongside this requirement for operational continuity, and often in opposition to it, is the requirement for operational safety. These requirements can be in opposition because operational continuity demands that devices remain up and running at all costs, and operational safety demands that humans and the environment be protected at all costs.
Safety measures in critical infrastructure have improved and become increasingly prioritized over the last 50 years following numerous high-profile incidents, such as the Bhopal chemical disaster, the Texas City refinery explosion, and the Deepwater Horizon oil spill. Appropriate safety precautions could likely have prevented these incidents, but at the expense of operational continuity.
Consequently, administrators of critical infrastructure have to balance the very real threat that an incident may pose to both human life and the environment with the demand to remain operational at all times. More often than not, the final decision regarding what constitutes an acceptable risk is determined by budgets and cost-benefit analyses.
Cyber-attack: A rising risk profile for critical infrastructure
In 2010, the discovery of the Stuxnet malware — which resulted in a nuclear facility in Iran having its centrifuges ruined via compromised programmable logic controllers (PLCs) — demonstrated that critical infrastructure could be targeted by a cyber-attack.
At the time of Stuxnet, critical infrastructure industries used computers designed to ensure operational continuity with little regard for cyber security, because the risk of a cyber-attack seemed either non-existent or vanishingly low. Since then, a number of attacks targeting industrial environments have emerged on the global threat landscape.
Classic strains of industrial malware, such as Stuxnet, Triton, and Industroyer, have historically been installed via removable media, such as USB drives. This is because OT networks are traditionally segregated from the Internet in what is known as an ‘air gap.’ Removable media remains a prevalent attack vector: a recent study found that cyber-threats installed via USB and other external media doubled in 2021, with 79% of these holding the potential to disrupt OT.
In many ways, operational demands in the subsequent 10 years have made critical infrastructure even more vulnerable. These include the convergence of information technology and operational technology (IT/OT convergence), the adoption of devices in the Industrial Internet of Things (IIoT), and the deprecation of manual back-up systems. This means that OT can be disrupted by cyber-attacks that first target IT systems, rather than having to be installed manually via external media.
At the same time, recent government initiatives — such as the Department of Energy’s 100-day ‘cyber sprint’ to protect electricity operations and President Biden’s Executive Order on Improving the Nation’s Cybersecurity — and regulatory frameworks and directives such as the EU’s NIS directive have either encouraged or mandated that critical infrastructure industries start addressing this new risk.
With the severe and persistent threat that cyber-attacks pose to critical infrastructure, including the maritime sector, and the increasing calls to address the issue, the question remains as to how best to achieve robust cyber defense.
Assessing the risk
To claim that administrators of critical infrastructure are ignorant of or oblivious to the threat posed by cyber-attacks would be unfair. Many organizations have implemented changes to mitigate or remove the risk, either as a result of regulation or through their own forward thinking.
However, these projects can take years, even decades. High costs and ever-changing operational demand also mean that these projects may never fully remove the risk.
As a result, many operators may understand the threat of a cyber-attack but not be in a position to do anything about it in the short or medium term. Instead, procedures have to be put in place to minimize risk even if this threatens operational continuity.
For example, a risk assessment may decide it is best to shut down all OT operations in the event of a cyber-attack in order to avoid a major accident. This abundance of caution is forced upon operators, who do not have the ability to immediately confirm the boundaries of a compromise. The prevalence of cyber insurance provides this option with further appeal. Any losses incurred by stopping operations can theoretically be recouped and the risk is therefore transferred.
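The trade-off described above can be sketched as a simple expected-cost comparison. This is only an illustrative model, not a method taken from any operator's actual risk assessment: the `Scenario` fields, the function names, and every figure used with them are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical inputs to a shutdown decision; all values are illustrative."""
    p_ot_compromised: float         # estimated probability the attack has reached OT
    incident_cost: float            # cost of a safety incident if compromised OT keeps running
    shutdown_cost: float            # losses incurred by halting operations
    insured_fraction: float = 0.0   # share of shutdown losses assumed recoverable via insurance


def expected_cost_of_running(s: Scenario) -> float:
    # Expected loss if operations continue while the scope of infection is unknown.
    return s.p_ot_compromised * s.incident_cost


def expected_cost_of_shutdown(s: Scenario) -> float:
    # Shutdown losses that remain after the assumed insurance recovery.
    return s.shutdown_cost * (1.0 - s.insured_fraction)


def should_shut_down(s: Scenario) -> bool:
    # Shut down when halting is cheaper, in expectation, than continuing to run.
    return expected_cost_of_shutdown(s) < expected_cost_of_running(s)
```

In this toy model, an operator who cannot bound the compromise has to assume a high `p_ot_compromised`, which pushes the decision toward shutdown. A larger `insured_fraction` pushes it the same way, which matches the appeal of cyber insurance noted above.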
While the full details of the Colonial Pipeline ransomware incident are still to be determined, the sequence of events outlined below provides a plausible explanation for how a cyber-attack could take down critical infrastructure, even when that cyber-attack does not reach or even target OT systems. Indeed, the CEO of Colonial Pipeline, in testimony to Congress, confirmed “the imperative to isolate and contain the attack to help ensure the malware did not spread to the operational technology network, which controls our pipeline operations, if it had not already.”
The limits of securing IT or OT in isolation
The emergence of OT cyber security solutions in the last five years demonstrates that critical infrastructure industries are trying to find a way to address the risks posed by cyber-attacks. But these solutions have limited scope, as they assume IT and OT are separated and use legacy security techniques such as malware signatures and patch management.
The 2021 SANS ICS Security Summit highlighted how the OT security community suffers from a lack of visibility into its own networks. For many organizations, simply determining whether an unusual incident is an attack or the result of a software error is a challenge.
Given that most OT cyber-attacks actually start in IT networks before pivoting into OT, investing in an IT security solution rather than an OT-specific solution may at first seem like a better business decision. But IT solutions fall short if an attacker successfully pivots into the OT network, or if the attacker is a rogue insider who already has direct access to the OT network. A siloed approach to securing either IT or OT in isolation will thus fall short of the full scope needed to safeguard industrial systems.
It is clear that a mature security posture for critical infrastructure would include security solutions for both IT and OT. Even then, using separate solutions to protect the IT and OT networks is limited, as it presents challenges when defending network boundaries and detecting incidents when an attacker pivots from IT to OT. Under time pressure, a security team does not want changes in visibility, detection, language or interface while trying to determine whether a threat crossed the ‘boundary’ between IT and OT.
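As a rough illustration of what checking whether a threat crossed the IT/OT boundary involves, the sketch below labels each endpoint of a network flow by zone and returns the flows that enter or leave OT. The two subnets, the `zone` and `crossing_flows` names, and the flow format (source and destination address pairs) are all assumptions made for the example. Real environments rarely have boundaries this clean.

```python
import ipaddress

# Hypothetical address plan; real IT/OT segmentation is usually far messier.
IT_NET = ipaddress.ip_network("10.1.0.0/16")
OT_NET = ipaddress.ip_network("10.2.0.0/16")


def zone(addr: str) -> str:
    """Classify an IP address as IT, OT, or EXTERNAL under the assumed plan."""
    ip = ipaddress.ip_address(addr)
    if ip in IT_NET:
        return "IT"
    if ip in OT_NET:
        return "OT"
    return "EXTERNAL"


def crossing_flows(flows):
    """Return the (src, dst) pairs that cross into or out of the OT zone."""
    result = []
    for src, dst in flows:
        src_zone, dst_zone = zone(src), zone(dst)
        if src_zone != dst_zone and "OT" in (src_zone, dst_zone):
            result.append((src, dst))
    return result
```

A single view over both zones lets one query like this cover the whole path of an attack. With separate IT and OT tools, each tool would only see its own side of the flow.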
Separate solutions can also make it much harder to detect an attacker abusing traditional IT tactics, techniques, and procedures (TTPs) within an OT network if the security team is relying on a purely OT solution to defend the OT environment. Examples of this include the abuse of IT remote management tools to affect industrial environments, such as in the suspected cyber-attack at the Florida water facility in February 2021. Cybersecurity for utilities is becoming increasingly important as the sector faces growing cyber threats that can disrupt essential services.
Using AI to minimize cyber risk and maximize cyber safety
In contrast, Darktrace AI is able to defend an entire digital estate, building a ‘pattern of life’ across IT and OT, as well as the points at which they converge. Consequently, cyber security teams can use a single pane of glass to detect and respond to cyber-attacks as they emerge and develop, regardless of where they are in the environment.
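The general idea of learning a baseline and flagging deviations from it can be illustrated with a very small statistical sketch. This is not Darktrace's method, which the article does not describe. It is a generic z-score check, and the `is_anomalous` function, its `threshold` parameter, and the notion of a single numeric metric per device are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it deviates strongly from a device's past values.

    `history` is a list of past measurements for one device (for example,
    connections per hour); `threshold` is the number of sample standard
    deviations treated as unusual. Both are illustrative assumptions.
    """
    if len(history) < 2:
        return False  # too little data to form any baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

The same check applies whether the device is an office laptop or a PLC, which is the point of modeling IT and OT together rather than in separate tools.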
Use cases for Darktrace’s Self-Learning AI include containing pre-existing threats to maintain continuous operations. This was seen when Darktrace’s AI detected pre-existing infections and acted autonomously to contain the threat, allowing the operator to leave infected IIoT devices active while waiting for replacements. Darktrace can also thwart ransomware in IT before it can spread into OT, as when Darktrace detected a ransomware attack targeting a supplier for critical infrastructure in North America at its earliest stages.
Darktrace’s unified protection, including visibility and early detection of zero-days, empowers security teams to overcome uncertainty and make a confident decision not to shut down operations. Darktrace has already demonstrated this ability in the wild, and allows organizations to understand normal machine and human behavior in order to enforce this behavior, even in the face of an emerging cyber-attack.