Attacks Against Process Control Systems: Risk Assessment, Detection, and Response

Alvaro A. Cárdenas§, Saurabh Amin‡, Zong-Syun Lin†, Yu-Lun Huang†, Chi-Yen Huang† and Shankar Sastry‡

§ Fujitsu Laboratories of America
‡ University of California, Berkeley
† National Chiao Tung University, Taiwan

ABSTRACT

In recent years there has been increasing interest in the security of process control and SCADA systems. Furthermore, recent computer attacks such as the Stuxnet worm have shown there are parties with the motivation and resources to effectively attack control systems.

While previous work has proposed new security mechanisms for control systems, few have explored new and fundamentally different research problems for securing control systems when compared to securing traditional information technology (IT) systems. In particular, the sophistication of new malware attacking control systems–malware including zero-day exploits, rootkits created for control systems, and software signed by trusted certificate authorities–has shown that it is very difficult to prevent and detect these attacks based solely on IT system information.

In this paper we show how, by incorporating knowledge of the physical system under control, we are able to detect computer attacks that change the behavior of the targeted control system. By using knowledge of the physical system we are able to focus on the final objective of the attack, and not on the particular mechanisms of how vulnerabilities are exploited or how the attack is hidden. We analyze the security and safety of our mechanisms by exploring the effects of stealthy attacks, and by ensuring that automatic attack-response mechanisms will not drive the system to an unsafe state.

A secondary goal of this paper is to initiate the discussion between control and security practitioners–two areas that have had little interaction in the past. We believe that control engineers can leverage security engineering to design–based on a combination of their best practices–control algorithms that go beyond safety and fault tolerance, and include considerations for surviving targeted attacks.

Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: Security and Protection; B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids

General Terms
Security

Keywords
SCADA, security, IDS, control systems, critical infrastructure protection, cyber-physical systems

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ASIACCS '11, March 22–24, 2011, Hong Kong, China.
Copyright 2011 ACM 978-1-4503-0564-8/11/03 ...$10.00.

1. INTRODUCTION

Control systems are computer-based systems that monitor and control physical processes. These systems represent a wide variety of networked information technology (IT) systems connected to the physical world. Depending on the application, these control systems are also called Process Control Systems (PCS), Supervisory Control and Data Acquisition (SCADA) systems (in industrial control or in the control of critical infrastructures), Distributed Control Systems (DCS), or Cyber-Physical Systems (CPS) (to refer to embedded sensor and actuator networks).

Control systems are usually composed of a set of networked agents, consisting of sensors, actuators, control processing units such as programmable logic controllers (PLCs), and communication devices. For example, the oil and gas industry uses integrated control systems to manage refining operations at plant sites, remotely monitor the pressure and flow of gas pipelines, and control the flow and pathways of gas transmission. Water utilities can remotely monitor well levels and control the well pumps; monitor flows, tank levels, or pressure in storage tanks; monitor pH, turbidity, and chlorine residual; and control the addition of chemicals to the water.

Several control applications can be labeled safety-critical: their failure can cause irreparable harm to the physical system being controlled and to the people who depend on it. SCADA systems, in particular, perform vital functions in national critical infrastructures, such as electric power distribution, oil and natural gas distribution, water and waste-water treatment, and transportation systems. They are also at the core of health-care devices, weapons systems, and transportation management. The disruption of these control systems could have a significant impact on public health and safety, and lead to large economic losses.

Control systems have been at the core of critical infrastructures, manufacturing and industrial plants for decades, and yet, there have been few confirmed cases of cyberattacks. Control systems, however, are now at a higher risk of computer attacks because their vulnerabilities are increasingly becoming exposed and available to an ever-growing set of motivated and highly skilled attackers. No other attack demonstrates the threat to control systems as well as the
Stuxnet worm [1, 2]. The ultimate goal of Stuxnet is to sabotage its target facility by reprogramming controllers to operate, most likely, out of their specified boundaries [1]. Stuxnet demonstrates that the motivation and capability exist for creating computer attacks capable of achieving military goals [3].

Not only can Stuxnet cause devastating consequences, but it is also very difficult to detect. Because Stuxnet used zero-day vulnerabilities, antivirus software would not have prevented the attack. In fact, the level of sophistication of the attack prevented some well-known security companies, such as Kaspersky, from detecting it initially [4]. In addition, victims attempting to detect modifications to their embedded controllers would not see any rogue code, as Stuxnet hides its modifications with sophisticated PLC rootkits and validated its drivers with trusted certificates.

The main motivation behind this work is the observation that while attackers may be able to hide the specific information technology methods used to exploit the system and reprogram their computers, they cannot hide their final goal: the need to cause an adverse effect on the physical system by sending malicious sensor or controller data that will not match the behavior expected by a supervisory control or an anomaly detection system.

Therefore, in this paper we explore security mechanisms that detect attacks by monitoring the physical system under control, and the sensor and actuator values. Our goal is to detect modifications to the sensed or controlled data as soon as possible, before the attack causes irreversible damage to the system (such as compromising safety margins).

In the rest of the paper we first summarize the vulnerability of control systems by discussing known attacks. We then discuss the efforts for securing control systems solely from an information technology perspective and identify the new and unique research problems that can be formulated by including a model of the physical system under control. We then develop a new attack detection algorithm and study the methodology of how to evaluate anomaly detection algorithms and their possible response strategies.

2. THE VULNERABILITY OF CONTROL SYSTEMS AND STUXNET

There have been many computer-based incidents in control systems. Computer-based accidents can be caused by any unanticipated software error, like the power plant shutdown caused by a computer rebooting after a patch [5]. Non-targeted attacks are incidents caused by the same attacks that any computer connected to the Internet may suffer, such as the Slammer worm infecting the Davis-Besse nuclear power plant [6], or the case of a controller being used to send spam in a water filtering plant [7].

However, the biggest threat to control systems are targeted attacks. These attacks are the ones where the miscreants know that they are targeting control systems, and therefore tailor their attack strategy with the aim of damaging the physical system under control. Targeted attacks against control systems are not new. Physical attacks–for extortion and terrorism–are a reality in some countries [8]. Cyber-attacks are a natural progression from physical attacks: they are cheaper, less risky for the attacker, not constrained by distance, and easier to replicate and coordinate.

A classic computer-based targeted attack on SCADA systems is the attack on Maroochy Shire Council's sewage control system in Queensland, Australia [9]. There are many other reported targeted attacks [10–16]; however, no other attack has demonstrated the threats that control systems are subject to as well as the Stuxnet worm [1, 2]. Stuxnet has made clear that there are groups with the motivation and skills to mount sophisticated computer-based attacks against critical infrastructures, and that these attacks are not just speculations or belong only in Hollywood movies.

Stuxnet intercepts routines to read, write and locate blocks on a Programmable Logic Controller (PLC). By intercepting these requests, Stuxnet is able to modify the data sent to or returned from the PLC without the operator of the PLC ever realizing it [1].

Stuxnet was discovered on systems in Iran in June 2010 by researchers from the Belarusian company VirusBlokAda; however, it is believed to have been released more than a year before. Stuxnet is a worm that spreads by infecting Windows computers. It uses multiple methods and zero-day exploits to spread itself via LANs or USB sticks. It is likely that propagation by LAN served as the first step, and propagation through removable drives was used to reach PCs not connected to other networks–therefore, being isolated from the Internet or other networks is not a complete defense.

Once Stuxnet infects a Windows computer, it installs its own drivers. Because these drivers have to be signed, Stuxnet used two stolen certificates. Stuxnet also installs a rootkit to hide itself. The goal of the worm on a Windows computer is to search for WinCC/Step 7, a type of software used to program and monitor PLCs. (PLCs are the embedded systems attached to sensors and actuators that run control algorithms to keep the physical system operating correctly. They are typically programmed with a ladder logic program: a logic traditionally used to design control algorithms for panels of electromechanical relays.)

If Stuxnet does not find the WinCC/Step 7 software on the infected Windows machine, it does nothing; however, if it finds the software, it infects the PLC with another zero-day exploit, and then reprograms it. Stuxnet also attempts to hide the PLC changes with a PLC rootkit.

The reprogramming is done by changing only particular parts of the code–overwriting certain process variables every five seconds and inserting rogue ladder logic–therefore it is impossible to predict the effects of this change without knowing exactly how the PLC is originally programmed and what it is connected to, since the PLC program depends on the physical system under control, and typically, physical system parameters are unique to each individual facility. This means that the attackers were targeting a very specific PLC program and configuration (i.e., a very specific control system deployment).

Many security companies, including Symantec and Kaspersky, have said that Stuxnet is the most sophisticated attack they have ever analyzed, and it is not difficult to see the reasons. Stuxnet uses four zero-day exploits, a Windows rootkit, the first known PLC rootkit, antivirus evasion techniques, peer-to-peer updates, and stolen certificates from trusted CAs. There is evidence that Stuxnet kept evolving since its initial deployment, as attackers upgraded the infections with encryption and exploits, apparently adapting to conditions they found on the way to their target. The command and control architecture used two servers if the infected machines were able to access the Internet, or a peer-to-peer messaging system for machines that were offline. In addition, the attackers had a good level of intelligence about their target; they knew all the details of the control system configuration and its programs.

The sophistication of this attack has led many to speculate that Stuxnet is the creation of a state-sponsored attacker. Since Iran has an unusually high percentage of the total number of reported infections of the worm in the world [1], there has been some speculation that the target was a specific industrial control system in Iran [2], such as a gas pipeline or power plant.

We argue that a threat like the Stuxnet worm must be dealt with using defense-in-depth mechanisms like anomaly detection schemes. While traditional anomaly detection mechanisms may have some drawbacks, like false alarms, we argue that for certain control systems, anomaly detection schemes focusing on the physical system–instead
of using software or network models–can provide good detection capabilities with negligible false alarm rates.

3. NEW SECURITY PROBLEMS FOR CONTROL SYSTEMS

3.1 Efforts for Securing Control Systems

Most of the efforts for protecting control systems (and in particular SCADA) have focused on safety and reliability (the protection of the system against random and/or independent faults). Traditionally, control systems have not dealt with intentional actions or systematic failures. There is, however, an urgent and growing concern for protecting control systems against malicious cyberattacks [6, 17–24].

There are several industrial and government-led efforts to improve the security of control systems. Several sectors–including chemical, oil and gas, and water–are currently developing programs for securing their infrastructure. The electric sector is leading the way with the North American Electric Reliability Corporation (NERC) cybersecurity standards for control systems [25]. NERC is authorized to enforce compliance with these standards, and it is expected that all electric utilities will be fully compliant with these standards by the end of 2010.

NIST has also published a guideline for security best practices for general IT in Special Publication 800-53. Federal agencies must meet NIST SP 800-53. To address the security of control systems, NIST has also published a Guide to Industrial Control System (ICS) Security [26], and a guideline for smart grid security in NIST-IR 7628. Although these recommendations are not enforceable, they can provide guidance for analyzing the security of most utility companies.

ISA (a society for industrial automation and control systems) is developing ISA-SP 99: a security standard to be used in manufacturing and general industrial controls.

The Department of Energy has also led security efforts by establishing the national SCADA test bed program [27] and by developing a 10-year outline for securing control systems in the energy sector [21]. The report–released in January 2006–identifies four main goals (in order from short-term to long-term): (1) measure the current security posture of the power grid, (2) develop and integrate protective measures, (3) implement attack detection and response strategies, and (4) sustain security improvements.

The use of wireless sensor networks in SCADA systems is becoming pervasive, and thus we also need to study their security. A number of companies have teamed up to bring sensor networks to the field of process control systems, and currently there are two working groups to standardize their communications [28, 29]. Their wireless communication proposals have options to configure hop-by-hop and end-to-end confidentiality and integrity mechanisms. Similarly, they provide the necessary protocols for access control and key management.

All these efforts have essentially three goals: (1) create awareness of security issues in control systems, (2) help control systems operators and IT security officers design a security policy, and (3) recommend basic security mechanisms for prevention (authentication, access controls, etc.), detection, and response to security breaches.

While these recommendations and standards have placed significant importance on the survivability of control systems (their ability to operate while under attack), we believe that they have not explored some new research problems that arise when control systems are under attack.

3.2 Differences

While it is clear that the security of control systems has become an active area in recent years, we believe that, so far, no one has been able to articulate what is new and fundamentally different in this field from a research point of view when compared to traditional IT security.

In this paper we would like to start this discussion by summarizing some previously identified differences and by proposing some new problems.

The property of control systems that is most commonly brought up as a distinction from IT security is that software patching and frequent updates are not well suited for control systems. For example, upgrading a system may require months of advance planning for how to take the system offline; it is, therefore, economically difficult to justify suspending the operation of an industrial computer on a regular basis to install new security patches. Some security patches may even violate the certification of control systems, or–as previously mentioned–cause accidents in control systems [5].

Patching, however, is not a fundamental limitation of control systems. A number of companies have demonstrated that a careful antivirus and patching policy (e.g., the use of tiered approaches) can be used successfully [30, 31]. Also, most of the major control equipment vendors now offer guidance on both patch management and antivirus deployment for their control products. Thus there is little reason for SCADA system operators not to have good patch and antivirus programs in place today [32].

Large industrial control systems also have a large number of legacy systems. Lightweight cryptographic mechanisms to ensure data integrity and confidentiality have been proposed to secure these systems [33, 34]. The recent IEEE P1711 standard is designed to provide security for legacy serial links [35]. Having some small level of security is better than having no security at all; however, we believe that most of the efforts done for legacy systems should be considered short-term solutions. To properly secure critical control systems, the underlying technology must satisfy some minimum performance requirements to allow the implementation of well-tested security mechanisms and standards.

Another property of control systems that is commonly mentioned is their real-time requirements. Control systems are autonomous decision-making agents which need to make decisions in real time. While availability is a well-studied problem in information security, real-time availability imposes a stricter operational environment than most traditional IT systems. We show in this paper that real-time availability requirements depend on the dynamics (fast vs. slow) of the physical system.

Not all operational differences are more severe in control systems than in traditional IT systems. In comparison to enterprise systems, control systems exhibit comparatively simpler network dynamics: servers change rarely, there is a fixed topology, a stable user population, regular communication patterns, and a limited number of protocols. Therefore, implementing network intrusion detection systems, anomaly detection, and whitelisting may be easier than in traditional enterprise systems [36].

3.3 What is new and fundamentally different?

While all these differences are important, we believe that the major distinction of control systems with respect to other IT systems is the interaction of the control system with the physical world. While current tools from information security can provide necessary mechanisms for securing control systems, these mechanisms alone are not sufficient for the defense-in-depth of control systems. When attackers bypass basic security defenses they may be able to affect
the physical world. In particular, research in computer security has traditionally focused on the protection of information; it has not considered how attacks affect estimation and control algorithms–and ultimately, how attacks affect the physical world.

We believe that by understanding the interactions of the control system with the physical world, we should be able to develop a general and systematic framework for securing control systems in three fundamentally new areas:

1. Better understand the consequences of an attack for risk assessment. While there have been previous risk assessment studies on cyber security for SCADA systems [18, 37–39], currently there are few studies on identifying the attack strategy of an adversary once it has obtained unauthorized access to some control network devices. Notable exceptions are the studies of false data-injection attacks on state estimation in power grids [40–45] and electricity markets [46]. We need further research to understand the threat model in order to design appropriate defenses and to invest in securing the most critical sensors or actuators.

2. Design new attack-detection algorithms. By monitoring the behavior of the physical system under control, we should be able to detect a wide range of attacks by compromised measurements. The work closest to ours is the study of false data injection attacks in control systems [47] and the intrusion detection models of Rrushi [48]; this last work, however, does not consider dynamical models of the process control system. We need further research on dynamical system models used in control theory as a tool for specification-based intrusion detection systems.

3. Design new attack-resilient algorithms and architectures: we need to design and operate control systems to survive an intentional cyber assault with no loss of critical functions. Our goal is to design systems where even if attackers manage to bypass some basic security mechanisms, they will still face several control-specific security devices that will minimize the damage done to the system. In particular, we need to investigate how to reconfigure and adapt control systems when they are under attack to increase the resiliency of the system. We are not aware of any other work on designing new control algorithms or reconfiguration and control algorithms able to withstand attacks, or that reconfigure their operations based on detected attacks. There is previous work on safety and fault diagnosis; however, as we explain in this paper, these systems are not enough for detecting deception attacks launched by an intelligent attacker with knowledge of how to evade the fault detection methods used by the system.

In the next sections we describe our ideas, experiments, and results for (1) risk assessment, (2) false-data-injection detection, and (3) automatic attack-response in process control systems. In each section we first present a general theory for approaching the topic, and then, for experimental validation, we apply our ideas to the model of a chemical reactor process.

4. RISK ASSESSMENT

Risk management is the process of shifting the odds in your favor by finding, among all possible alternatives, the one that minimizes the impact of uncertain events. Probably the best-known risk metric is the average loss R_µ = E[L] ≈ Σ_i L_i p_i, where L_i is the loss if event i occurs, and p_i is the probability that event i occurs. Other risk metrics try to capture more information about the probability distribution of the losses, and not only its mean value (R_µ). For example, the variance of the losses, R_χ = E[L²] − R_µ², is very useful in finance, since it gives more information to risk-averse individuals. This is particularly important if the average loss is computed over a large period of time (e.g., annually). If the loss is considered every time there is a computer event, then we believe the average loss by itself provides enough risk information to make a rational decision.

In this paper we focus on attacks on sensor networks and the effects they have on the process control system. Therefore p_i denotes the likelihood that an attacker will compromise sensor i, and L_i denotes the losses associated with that particular compromise. To simplify our presentation we assume that p_i is the same for all sensors; therefore our focus in the remainder of this section is to estimate the potential losses L_i. The results can then be used to identify high-priority sensors and to invest a given security budget in the most cost-effective way.

4.1 Attack models

We consider the case when the state of the system is measured by a sensor network of p sensors with measurement vector y(k) = {y_1(k), ..., y_p(k)}, where y_i(k) denotes the measurement by sensor i at time k. All sensors have a dynamic range that defines the domain of y_i for all k. That is, all sensors have defined minimum and maximum values: ∀k, y_i(k) ∈ [y_i^min, y_i^max]. Let Y_i = [y_i^min, y_i^max]. We assume each sensor has a unique identity protected by a cryptographic key.

Let ỹ(k) ∈ R^p denote the measurements received by the controller at time k. Based on these measurements the control system defines control actions to maintain certain operational goals. If some of the sensors are under attack, ỹ(k) may be different from the real measurement y(k); however, we assume that the attacked signals ỹ_i(k) also lie within Y_i (signals outside this range can be easily detected by fault-tolerant algorithms).

Let K_a = {k_s, ..., k_e} represent the attack duration, between the start time k_s and the stop time k_e of an attack. A general model for the observed signal is the following:

    ỹ_i(k) = y_i(k)   for k ∉ K_a
    ỹ_i(k) = a_i(k)   for k ∈ K_a, with a_i(k) ∈ Y_i

where a_i(k) is the attack signal. This general sensor attack model can be used to represent integrity attacks and DoS attacks. In an integrity attack we assume that if attackers have compromised a sensor, then they can inject any arbitrary value; therefore, in this case, a_i(k) is some arbitrary non-zero value.

In a DoS attack, the controller will notice the lack of new measurements and will react accordingly. An intuitive response for a controller to implement against a DoS attack is to use the last signal received: a_i(k) = y_i(k_s), where y_i(k_s) is the last measurement received before the DoS attack starts.

4.2 Experiments

To test our attacks, we use the Tennessee-Eastman process control system (TE-PCS) model and the associated multi-loop PI control law as proposed by Ricker [49]. We briefly describe the process architecture and the control loops in Figure 1. The original process model is implemented in FORTRAN and the PI control law is implemented in MATLAB. We use this code for our study.

The chemical process consists of an irreversible reaction which occurs in the vapour phase inside a reactor of fixed volume V of 122 m³. Two non-condensible reactants A and C react in the
When u3 saturates, the loop−4 controller uses u1 to control the
Loop 4 p
Controller
pressure P . The controllers for all four loops in figure 1 are pro-
F4sp
portional integral (PI) controllers.
pmax
Valve
In steady-state operation, the production rate F4 is 100 kmol h−1 ,
F4 Loop 1 F3
Purge
the pressure P is 2700 KP a and the fraction of A in the purge is
Controller
u1 47 mol%.
u3 We study the security issues of control systems by experiment-
F1 Loop 2
ing and simulating cyber attacks on sensor signals in the TE-PCS
Feed 1 psp
(A+B+C) Sensor y5 p model. Because operating the chemical reactor with a pressure
Controller
Valve
larger than 3000 kPa is unsafe (it may lead to an explosion or dam-
Product(D)
Valve age of the equipment) We.assume that that the goal of the attacker
F2
Feed 2
(pure A) Sensor y7 is to raise the pressure level of the tank to a value larger than 3000
Sensor y4 kPa. We model an attacker that only has access to a single sensor at
u2
a given time. We also assume Li > Lj , when an attack i can drive
yA3
the system to an unsafe state and an attack j cannot, and Li = Lj
yA3sp Loop 3 F4 if both attacks i and j either do not drive the system to an unsafe
Controller state, or both can compromise the safety of the sytem.
From the experimental results, we found that the most effective
of these attacks were max/min attacks (i.e., when ai (k) = yimin or
ai (k) = yjmax ). However, not all of the max/min attacks were able
to drive the pressure to unsafe levels. We now summarize some of
Figure 1: Architecture of the Simplified TE Plant.
the results.

presence of an inert B to form a non-volatile liquid product D: • By attacking the sensors, a controller is expected to respond
with incorrect control signals since it receives wrong infor-
B
A + C −→ D. mation from the compromised sensors. For example, by forg-
ing y7 as y7max from t = 0 to 30, the controller believes there
The feed stream 1 contains A, C and trace of B; feed stream 2 is is a large amount of component A in the tank.
pure A; stream 3 is the purge containing vapours of A, B, C; and
stream 4 is the exit for liquid product D. The measured flow rates y5 y7
3000 120
y˜7
of stream i is denoted by Fi (kmol h−1 ). The control objectives

Amount of A in purge ( mol % )


2900
100 y7
are
Pressure ( kPa )

2800
- Regulate F4, the rate of production of the product D, at a set-point F4sp (kmol h−1),

- Maintain P, the operating pressure of the reactor, below the shut-down limit of 3000 kPa, as dictated by safety considerations,

- Minimize C, the operating cost measured in (kmol-of-product). The cost depends linearly on the purge loss of A and C relative to the production rate of D. The cost considerations dictate that the pressure be maintained as close as possible to 3000 kPa.

The production rate of D, denoted by rD (kmol h−1), is

    rD = k0 · yA3^v1 · yC3^v2 · P^v3,

where yA3 and yC3 denote the respective fractions of A and C in the purge, and v1, v2, v3 are given constants.

There are four input variables (or command signals) available to achieve the above control objectives. The first three input variables, denoted as u1, u2 and u3, trigger the actuators that can change the positions of the respective valves. The fourth input variable, denoted as u4, is the set point for the proportional controller for the liquid inventory. The input variables are used by the controller in the following way:

• Production rate y4 = F4 is controlled using Feed 1 (u1) by the loop-1 controller,

• Pressure y5 = P is controlled using the purge rate (u3) by the loop-2 controller,

• Partial pressure of product A in the purge y7 = yA3 is controlled using Feed 2 (u2) by the loop-3 controller,

Figure 2: Integrity attack y7max from t = 0 to 30. The system remains in a safe state for attacks on y7.

From the experiments, we found that the plant can go back to the steady state after the attack finishes, as illustrated in Fig 2. Furthermore, the pressure in the main tank never reaches 3000 kPa. In general we found that the plant is very resilient to attacks on y7 and y4. Attacks at the limits of the sensing range (ymin and ymax) were the most damaging, but they did not force the system into an unsafe state.

• By launching attack y5min the controller turns down the purge valve to increase the pressure and prevent the liquid products from accumulating. We can see that the real pressure of the tank (y5 in Fig 3(a)) keeps increasing past 3000 kPa and the system operates in an unsafe state. In this experiment, it takes about 20 hours (t = 10 to t = 30) to shut down (or cause an explosion to) the plant. This long delay in causing an effective attack may give defenders an advantage: for physical processes with slow dynamics, it is possible that human system operators have enough time to observe the unusual phenomena and take proper actions against the attack.

• We found that in general DoS attacks do not affect the plant. We ran the plant 20 times for 40 hours each, and for a DoS attack lasting 20 hours the pressure in the tank never exceeded 2900 kPa.
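The danger of the y5min attack discussed above can be illustrated with a toy control loop. The sketch below is NOT the TE-PCS model: it uses a hypothetical first-order pressure process and made-up constants, only to show why a sensor pinned at the bottom of its range drives the true pressure past the safety limit.

```python
# Illustrative sketch of the y5_min integrity attack on a toy pressure
# loop. This is NOT the TE-PCS model: the first-order dynamics and all
# constants below are hypothetical, chosen only to show why a sensor
# pinned at the bottom of its range drives the plant into an unsafe state.

def final_pressure(attack: bool, steps: int = 400, dt: float = 0.1) -> float:
    P = 2700.0        # true pressure (kPa)
    P_sp = 2900.0     # controller set-point (kPa)
    y_min = 2300.0    # bottom of the sensing range (hypothetical)
    Kp = 0.05         # proportional gain (hypothetical)
    for _ in range(steps):
        y = y_min if attack else P       # the reading the controller sees
        u = Kp * (P_sp - y)              # valve command that raises pressure
        P += dt * (5.0 * u - 0.01 * (P - 2700.0))   # toy pressure dynamics
    return P

# Honest sensor: the loop settles near the set-point, below 3000 kPa.
# Compromised sensor: the controller keeps "correcting" a pressure it
# believes is low, and the true pressure climbs past the 3000 kPa limit.
print(round(final_pressure(False)), round(final_pressure(True)))
```

With an honest sensor the loop settles just under the set-point; with the sensor pinned at y_min the controller applies a constant maximal correction and the true pressure leaves the safe region, mirroring the y5min experiment.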
Figure 3: Safety can be breached by compromising sensor y5 (3(a)). DoS attacks, on the other hand, do not cause any damage (and they are easy to detect) (3(b)).

We conclude that if the plant operator wants to prevent an attack from making the system operate in an unsafe state, it should prioritize defenses against integrity attacks rather than DoS attacks. If the plant operator only has enough budget to deploy advanced security mechanisms for one sensor (e.g., tamper resistance, or TPM chips), y5 should be the priority.

5. DETECTION OF ATTACKS

Detecting attacks on control systems can be formulated as an anomaly-based intrusion detection problem [50]. One big difference in control systems compared to traditional IT systems is that instead of creating models of network traffic or software behavior, we can use a representative model of the physical system.

The intuition behind this approach is the following: if we know how the output sequence of the physical system, y(k), should react to the control input sequence, u(k), then any attack on the sensor data can potentially be detected by comparing the expected output ŷ(k) with the received (and possibly compromised) signal ỹ(k). Depending on the quality of our estimate ŷ(k) we may have some false alarms. We revisit this problem in the next section.

To formalize the anomaly detection problem, we need (1) a model of the behavior of the physical system, and (2) an anomaly detection algorithm. In Section 5.1 we discuss our choice of linear models as an approximation of the behavior of the physical system. In Section 5.2, we describe change detection theory and the detection algorithm we use: a nonparametric cumulative sum (CUSUM) statistic.

5.1 Linear Model

To develop accurate control algorithms, control engineers often construct a representative model that captures the behavior of the physical system in order to predict how the system will react to a given control signal. A process model can be derived from first principles (a model based on the fundamental laws of physics) or from empirical input and output data (a model obtained by stimulating the process inputs with a carefully designed test sequence). It is also very common to use a combination of these two models; for example, first-principle models are typically calibrated by using process test data to estimate key parameters. Likewise, empirical models are often adjusted to account for known process physics [51, 52].

For highly safety-critical applications, such as the aerospace industry, it is technically and economically feasible to develop accurate models from first principles [51]. However, for the majority of process control systems, the development of process models from fundamental physics is difficult. In many cases such detailed models are difficult to justify economically, and even impossible to obtain in reasonable time due to the complex nature of many systems and processes. (The TE-PCS system used in our experiments is one of the few cases available in the literature of a detailed nonlinear model of an industrial control problem; this is the reason why the TE-PCS system has been used as a standard testbed in many industrial control papers.)

To facilitate the creation of physical models, most industrial control vendors provide tools (called identification packages) to develop models of physical systems from training data. The most common models are linear systems, which can be used to model dynamics that are linear in the state x(k) and the control input u(k):

    x(k + 1) = Ax(k) + Bu(k),    (1)

where time is represented by k ∈ Z+, x(k) = (x1(k), ..., xn(k)) ∈ R^n is the state of the system, and u(k) = (u1(k), ..., um(k)) ∈ R^m is the control input. The matrix A = (aij) ∈ R^{n×n} models the physical dependence of state i on state j, and B = (bij) ∈ R^{n×m} is the input matrix for state i from control input j.

Assume the system (1) is monitored by a sensor network with p sensors. We obtain the measurement sequence from the observation equations

    ŷ(k) = Cx(k),    (2)

where ŷ(k) = (ŷ1(k), ..., ŷp(k)) ∈ R^p, and ŷl(k) ∈ R is the estimated measurement collected by sensor l at time k. The matrix C ∈ R^{p×n} is called the output matrix.

5.2 Detection Methods

The physical-model-based attack detection method presented in this paper can be viewed as complementary to intrusion detection methods based on network and computer system models.

Because we need to detect anomalies in real time, we can use results from sequential detection theory to give a sound foundation to our approach. Sequential detection theory considers the problem where the measurement time is not fixed, but can be chosen online as and when the measurements are obtained. Such problem formulations are called optimal stopping problems. Two such problem formulations are sequential detection (also known as sequential hypothesis testing) and quickest detection (also known as change detection). A good survey of these problems is given by Kailath and Poor [53].

In optimal stopping problems, we are given a time series sequence z(1), z(2), ..., z(N), and the goal is to determine the minimum number of samples, N, the anomaly detection scheme should observe before making a decision dN between two hypotheses: H0 (normal behavior) and H1 (attack).

The difference between sequential detection and change detection is that the former assumes the sequence z(i) is generated either by the normal hypothesis (H0) or by the attack hypothesis (H1), and the goal is to decide which hypothesis is true in minimum time. Change detection, on the other hand, assumes the observation z(i) starts under H0 and then, at a given change time ks, switches to hypothesis H1; here the goal is to detect this change as soon as possible.

Both problem formulations are very popular, but security researchers have used sequential detection more frequently. However, for our attack detection method, the change detection formulation is more intuitive. To facilitate this intuition, we now briefly describe the two formulations.

5.2.1 Sequential Detection

Given a fixed probability of false alarm and a fixed probability of detection, the goal of sequential detection is to minimize the number of observations required to make a decision between two
hypotheses. The solution is the classic sequential probability ratio test (SPRT) of Wald [54] (also referred to as the threshold random walk (TRW) in some security papers). The SPRT has been widely used in various problems in information security, such as detecting portscans [55], worms [56], proxies used by spammers [57], and botnets [58].

Assuming that the observations z(k) under Hj are generated with a probability distribution pj, the SPRT algorithm can be described by the following equations:

    S(k + 1) = log [ p1(z(k)) / p0(z(k)) ] + S(k)
    N = inf_n { n : S(n) ∉ [L, U] },

starting with S(0) = 0. The SPRT decision rule dN is defined as:

    dN = { H1 if S(N) ≥ U
         { H0 if S(N) ≤ L,    (3)

where L ≈ ln [ b / (1 − a) ] and U ≈ ln [ (1 − b) / a ], and where a is the desired probability of false alarm and b is the desired probability of missed detection (usually chosen as small values).

5.2.2 Change Detection

The goal of the change detection problem is to detect a possible change at an unknown change point ks. Cumulative sum (CUSUM) and Shiryaev-Roberts statistics are the two most commonly used algorithms for change detection problems. In this paper we use the CUSUM statistic because it is very similar to the SPRT.

Given a fixed false alarm rate, the CUSUM algorithm attempts to minimize the time N (where N ≥ ks) at which the test stops and decides that a change has occurred. Let S(0) = 0. The CUSUM statistic is updated according to

    S(k + 1) = ( log [ p1(z(k)) / p0(z(k)) ] + S(k) )+,    (4)

where (a)+ = a if a ≥ 0 and zero otherwise. The stopping time is

    N = inf_n { n : S(n) ≥ τ }    (5)

for a given threshold τ selected based on the false alarm constraint. We can see that the CUSUM algorithm is an SPRT test with L = 0 and U = τ that re-starts whenever the statistic reaches the lower threshold L.

We now describe how to adapt the results of change detection theory to the particular problem of detecting compromised sensors. In the following, we use the subscript i to denote the sequence corresponding to sensor i.

One problem in our case is that we do not know the probability distribution p1 for an attack. In general, an adaptive adversary can select any arbitrary (and possibly non-stationary) sequence zi(k); assuming a fixed p1 will thus limit our ability to detect a wide range of attacks. To avoid making assumptions about the probability distribution of an attacker, we use ideas from nonparametric statistics. We do not assume a parametric distribution for p1 and p0; instead, we only place mild constraints on the observation sequence. One of the simplest constraints is to assume that the expected value of the random process Zi(k) that generates the sequence zi(k) is less than zero under H0 (E0[Zi] < 0) and greater than zero under H1 (E1[Zi] > 0). To achieve these conditions, let us define

    zi(k) := ||ỹi(k) − ŷi(k)|| − bi,    (6)

where bi is a small positive constant chosen such that

    E0[ ||ỹi(k) − ŷi(k)|| − bi ] < 0.    (7)

The nonparametric CUSUM statistic for sensor i is then

    Si(k) = ( Si(k − 1) + zi(k) )+,  Si(0) = 0,    (8)

and the corresponding decision rule is

    dN,i ≡ dτ(Si(k)) = { H1 if Si(k) > τi
                       { H0 otherwise,    (9)

where τi is the threshold selected based on the false alarm rate for sensor i.

Following [59], we state two important results for Eqs. (8)-(9):

- The probability of false alarm decreases exponentially as the threshold τi increases,

- The time to detect an attack, (Ni − ks,i)+, is inversely proportional to bi.

5.3 Stealthy Attacks

A fundamental problem in intrusion detection is the existence of adaptive adversaries that will attempt to evade the detection scheme; therefore, we now consider an adversary who knows about our anomaly detection scheme. We take a conservative approach in our models by assuming a very powerful attacker with knowledge of (1) the exact linear model that we use (i.e., matrices A, B, and C), (2) the parameters (τi and bi), and (3) the control command signals. Such a powerful attacker may be unrealistic in some scenarios, but we want to test the resiliency of our system to such an attacker to guarantee safety for a wide range of attack scenarios.

The goal of the attacker is to raise the pressure in the tank without being detected (i.e., raise the pressure while keeping the statistic he controls below the corresponding threshold τi). We model three types of attacks: surge attacks, bias attacks, and geometric attacks. Surge attacks model attackers that want to achieve maximum damage as soon as they get access to the system. A bias attack models attackers that try to modify the system discreetly by adding small perturbations over a long period of time. Finally, geometric attacks model attackers that try to shift the behavior of the system very discreetly at the beginning of the attack and then maximize the damage after the system has been moved to a more vulnerable state.

5.4 Surge Attacks

In a surge attack the adversary tries to maximize the damage as soon as possible, but when the statistic reaches the threshold, it then stays at the threshold level: Si(k) = τi for the remaining time of the attack. To stay at the threshold, the attacker needs to solve the following equation:

    Si(k) + √( (ŷi(k) − ỹi(k))² ) − bi = τi.

The resulting attack (for y5 and y4) is:

    ỹi(k) = { yi_min                       if Si(k + 1) ≤ τi
            { ŷi(k) − |τi + bi − Si(k)|    if Si(k + 1) > τi

For y7 we use

    ỹ7(k) = { y7_max                       if Sy7(k) ≤ τ7
            { ŷ7(k) + |τ7 + b7 − Sy7(k)|   if Sy7(k) > τ7

5.5 Bias Attacks
In a bias attack the attacker adds a small constant ci at each time step:

    ỹi(k) = ŷi(k) − ci ∈ Yi.

In this case, the nonparametric CUSUM statistic can be written as

    Si(n) = Σ_{k=0}^{n−1} |ŷi(k) − ỹi(k)| − n·bi.

Assuming the attack starts at time k = 0 and the attacker wants to remain undetected for n time steps, the attacker needs to solve the following equation:

    Σ_{k=0}^{n−1} ci = τi + n·bi.

Therefore ci = τi/n + bi. This attack creates a bias of τi/n + bi for each attacked signal.

This equation shows the limitations of the attacker. If an attacker wants to maximize the damage (maximize the bias of a signal), the attacker needs to select the smallest n it can find; because ỹi ∈ Yi, this attack reduces to an impulse attack. If an attacker wants to attack for a long time, then n will be very large, and the bias will be correspondingly smaller.

5.6 Geometric Attacks

In a geometric attack, the attacker wants to drift the value very slowly at the beginning and maximize the damage at the end. This attack combines the slow initial drift of the bias attack with a surge attack at the end to cause maximum damage.

Let αi ∈ (0, 1). The attack is:

    ỹi(k) = ŷi(k) − βi·αi^{n−k}.

Now we need to find αi and βi such that Si(n) = τi. Assume the attack starts at time k = 0 and the attacker wants to remain undetected for n time steps. The attacker then needs to solve the following equation:

    Σ_{k=0}^{n−1} βi·αi^{n−k} − n·bi = τi.

This sum is a geometric progression:

    Σ_{k=0}^{n−1} βi·αi^{n−k} = βi·αi^n · Σ_{k=0}^{n−1} (αi^{−1})^k = βi · (1 − αi^n) / (αi^{−1} − 1).

By fixing αi, the attacker can select the appropriate βi to satisfy the above equation.

5.7 Experiments

We continue our use of the TE-PCS model. In this section we first describe our selection criteria for the matrices A, B, and C of the linear model, and for the parameters bi and τi of the CUSUM statistic. We then describe the tradeoffs between false alarm rates and the delay in detecting attacks. The section ends with a study of stealthy attacks.

5.7.1 Linear Model

In this paper we use the linear system characterized by the matrices A, B, and C obtained by linearizing the nonlinear TE-PCS model about the steady-state operating conditions (see Ricker [49]). The linear model is a good representative of the actual TE-PCS model when the operating conditions are reasonably close to the steady state.

5.7.2 Nonparametric CUSUM parameters

In order to select bi for each sensor i, we need to estimate the expected value of the distance |ŷi(k) − yi(k)| between the linear model estimate ŷi(k) and the sensor measurement yi(k) (i.e., the sensor signal without attacks).

Figure 4: The parameter b of the ADM. For y4, 9951 of the b samples are 0.015. The mean value of by4 is 0.0642.

We ran the experiment ten thousand times (for 40 hours each time) without any attacks to gather statistics. Fig 4 shows the estimated probability distributions (without normalization). To obtain bi, we compute the empirical expected value for each distance and then round up to the two most significant units. We obtain by4 = 0.065, by5 = 4.1, and by7 = 0.042.

Once we have bi for each sensor, we need to find a threshold τi to balance the tradeoff between false alarms and detection time.

False Alarm Rate.
We ran simulations twenty times without attacks and computed the total number of false alarms for different values of τ (for each sensor). Fig 5 shows the results. Taking y4 as an example, we notice that Sy4 alerts frequently if we set τy4 < 6.

Figure 5: The number of false alarms decreases exponentially with increasing τ. These results confirm the theory supporting the nonparametric CUSUM algorithm.

In general, we would like to select τ as high as possible for each sensor to avoid any false alarms; however, increasing τ increases the time to detect attacks.

Detection Time.
To measure the time to detect attacks, we ran simulations launching scaling attacks (ai(k) = λm·yi(k)) on sensors y4, y5 and y7. Figs 6 and 7 show the experimental results.

The selection of τ is a trade-off between detection time and the number of false alarms. The appropriate value differs from system to system. Because a large number of false alarms is one of the main problems for anomaly detection systems, and because the TE-PCS process takes at least 10 hours to reach the unsafe state (based on our risk assessment section), we choose the conservative set of parameters τy4 = 50, τy5 = 10000, τy7 = 200. These parameters allow us to detect attacks within a couple of hours, while not raising any false alarms.
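The per-sensor nonparametric CUSUM test of Eqs. (6), (8) and (9), with the b and τ values selected for y5 above, can be sketched as follows. The residuals fed to the detector here are synthetic stand-ins for |ỹ5(k) − ŷ5(k)|; only the detector itself follows the equations in the text.

```python
# Minimal sketch of the per-sensor nonparametric CUSUM test of
# Eqs. (6), (8) and (9). The b and tau values are the ones selected
# for y5 above; the residuals fed to the detector are synthetic.

class CusumDetector:
    def __init__(self, b: float, tau: float):
        self.b, self.tau, self.S = b, tau, 0.0

    def update(self, y_received: float, y_estimated: float) -> bool:
        z = abs(y_received - y_estimated) - self.b   # Eq. (6)
        self.S = max(0.0, self.S + z)                # Eq. (8), the (.)+ operator
        return self.S > self.tau                     # Eq. (9): True means alarm

detector = CusumDetector(b=4.1, tau=10000.0)

# Small residuals (normal model mismatch below b) never accumulate...
normal_alarms = sum(detector.update(2701.0, 2700.0) for _ in range(1000))

# ...while a sustained 30 kPa bias, whose drift exceeds b, is eventually
# flagged once the accumulated statistic crosses tau.
detected_at = None
for k in range(500):
    if detector.update(2730.0, 2700.0) and detected_at is None:
        detected_at = k

print(normal_alarms, detected_at)
```

This also makes the two results quoted from [59] concrete: raising tau delays `detected_at` (fewer false alarms, slower detection), while raising b shrinks each increment z and likewise slows detection.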
Figure 6: Detection time vs. scaling attack. Note that for λm = 1 there is no alarm.

Figure 7: The time for detection increases linearly with increasing τ. These results confirm the theory behind the nonparametric CUSUM algorithm.

5.7.3 Stealthy Attacks

To test if our selected values for τ are resilient to stealthy attacks, we decided to investigate the effect of stealthy attacks as a function of τ. To test how the attacks change for all thresholds, we parameterize each threshold by a parameter p: τi_test = p·τi. Fig. 8 shows the percentage of times that geometric stealthy attacks (assuming the attacker controls all three sensor readings) were able to drive the pressure above 3000 kPa while remaining undetected (as a function of p).

Figure 8: Percentage of stealthy attacks that increase the pressure of the tank above 3,000 kPa as a function of the scaling parameter p.

We implemented all stealthy attacks starting at time T = 10 (hrs). We assume the goal of the attacker is to be undetected until T = 30 (hrs). For example, Fig. 9 shows the results of attacking all three sensors with a geometric attack. The nonparametric CUSUM statistic shown in Fig. 10 shows how the attacker remains undetected until time T = 30 (hrs).

We found that a surge attack does not cause significant damage because of the inertia of the chemical reactor: by the time the statistic reaches the threshold τ, the chemical reactor is only starting to respond to the attack. However, since the attacker can only add very small variations to the signal once it is close to the threshold, the attack ceases to produce any effect and the plant continues operating normally.

Figure 9: Geometric attacks on the three sensors. The solid lines represent the real state of the system, while the dotted lines represent the information sent by the attacker.

Figure 10: Statistics of geometric attacks with 3 sensors compromised.

Finally, we assume two types of attackers: an attacker that has compromised y5 (but who does not know the values of the other sensors, and therefore can only control Sy5(k)), and an attacker that has compromised all three sensors (and therefore can control the statistic S(k) for all sensors). We launched each attack 20 times. The results are summarized in Figure 11.

Figure 11: Effect of stealthy attacks. Each attack lasts 20 hours.
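The geometric attack of Section 5.6 can be sketched directly against the nonparametric CUSUM statistic. In the sketch below, b and τ are the y5 values selected in Section 5.7.2, while n = 720 steps and α = 0.99 are hypothetical choices; β is solved from the equation Si(n) = τi derived in Section 5.6.

```python
# Sketch of the geometric attack (Section 5.6): the injected error
# beta * alpha**(n-k) is tiny at the start of the attack and largest at
# the final step. For a fixed alpha, beta is chosen so the accumulated
# CUSUM drift reaches tau at step n. b and tau are the y5 values of
# Section 5.7.2; n = 720 and alpha = 0.99 are hypothetical.

def geometric_attack_errors(n, alpha, b, tau):
    # sum_{k=0}^{n-1} alpha**(n-k) = alpha * (1 - alpha**n) / (1 - alpha)
    geom_sum = alpha * (1.0 - alpha ** n) / (1.0 - alpha)
    beta = (tau + n * b) / geom_sum      # solve S_i(n) = tau_i for beta
    return [beta * alpha ** (n - k) for k in range(n)]

b5, tau5, n = 4.1, 10000.0, 720
errors = geometric_attack_errors(n, 0.99, b5, tau5)

S, first_alarm = 0.0, None
for k, e in enumerate(errors):           # nonparametric CUSUM, Eq. (8)
    S = max(0.0, S + e - b5)
    if S > tau5 and first_alarm is None:
        first_alarm = k

print(first_alarm, n, round(errors[0], 3), round(errors[-1], 1))
```

The early errors sit below the detectable level b, so the statistic stays near zero; it only crosses the threshold in the final stretch of the attack, when most of the damage has already been injected, which is exactly the behavior visible in Figs. 9 and 10.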
Our results show that even though our detection algorithm fails to detect stealthy attacks, we can keep the plant in safe conditions. We also find that the most successful attack strategy is the geometric attack.

6. RESPONSE TO ATTACKS

A comprehensive security posture for any system should include mechanisms for prevention, detection, and response to attacks. Automatic response to computer attacks is one of the fundamental problems in information assurance. While most of the research efforts found in the literature focus on prevention (authentication, access controls, cryptography, etc.) or detection (intrusion detection systems), in practice there are quite a few response mechanisms. For example, many web servers send CAPTCHAs to the client whenever they find that connections resemble bot connections, firewalls drop connections that conform to their rules, the execution of anomalous processes can be slowed down by intrusion detection systems, etc.

Given that we already have an estimate of the state of the system (given by the linear model), a natural response strategy for control systems is to use this estimate when the anomaly detection statistic fires an alarm. Fig 12 shows our proposed architecture. Specifically, for sensor i, if Si(k) > τi, the ADM replaces the sensor measurements ỹi(k) with measurements generated by the linear model ŷi(k) (that is, the controller will receive ŷi(k) as input instead of ỹi(k)). Otherwise, it treats ỹi(k) as the correct sensor signal.

Figure 12: An Anomaly Detection Module (ADM) can detect an attack and send an estimate of the state of the system to the controller.

Introducing automatic response mechanisms is, however, not an easy solution. Every time systems introduce an automatic response to an alarm, they have to consider the cost of dealing with false alarms. In our proposed detection and response architecture (Fig. 12), we have to make sure that if there is a false alarm, controlling the system by using the estimated values from the linear system will not cause any safety concerns.

6.1 Experiments

The automatic response mechanism works well when we are under attack. For example, Fig. 13 shows that when an attack is detected, the response algorithm manages to keep the system in a safe state. Similar results were obtained for all detectable attacks.

Figure 13: Attack ỹ5 = y5 ∗ 0.5. (a) Without the ADM. (b) The ADM detects and responds to the attack at T = 10.7 (hr).

While our attack response mechanism is a good solution when the alarms are indeed an indication of attacks, our main concern in this section is the cost of false alarms. To address these concerns we ran the simulation scenario without any attacks 1000 times; each time the experiment ran for 40 hours. As expected, with the parameter set τy4 = 50, τy5 = 10000, τy7 = 200 our system did not raise any false alarms (see Table 1); therefore we decided to reduce the detection thresholds to τy4 = 5, τy5 = 1000, τy7 = 20 and ran the same experiments again.

Table 1: For thresholds τy4 = 50, τy5 = 10000, τy7 = 200 we obtain no false alarms; therefore we only report the expected pressure, the standard deviation of the pressure, and the maximum pressure reached under no false alarms.

    Alarms | Avg y5 | Std Dev | Max y5
    0      | 2700.4 | 14.73   | 2757

Table 2 shows the behavior of the pressure after the response to a false alarm. We can see that while a false response mechanism increases the pressure of the tank, it never reaches unsafe levels. The maximum pressure obtained while controlling the system based on the linear model was 2779 kPa, which is of the same order of magnitude as the normal variation of the pressure without any false alarms (2757 kPa).

Table 2: Behavior of the plant after the response to a false alarm with thresholds τy4 = 5, τy5 = 1000, τy7 = 20.

         | Alarms | Avg y5 | Std Dev | Max y5
    y4   | 61     | 2710   | 30.36   | 2779
    y5   | 106    | 2705   | 18.72   | 2794
    y7   | 53     | 2706   | 20.89   | 2776

In our case, even if the system is kept in a safe state by the automated response, our response strategy is meant as a temporary solution before a human operator responds to the alarm. Based on our results we believe that the time available for a human response can be very large (a couple of hours).

7. CONCLUSIONS

In this work we identified three new research challenges for securing control systems. We showed that by incorporating a physical model of the system we were able to identify the most critical sensors and attacks. We also studied the use of physical models for anomaly detection and proposed three generic types of stealthy attacks. Finally, we proposed the use of automatic response mechanisms based on estimates of the state of the system. Automatic responses may be problematic in some cases (especially if the response to a false alarm is costly); therefore, we would like to emphasize that the automatic response mechanism should be considered as a temporary solution before a human investigates the alarm. A full deployment of any automatic response mechanism should take into consideration the amount of time in which it is reasonable for a human operator to respond, and the potential side effects of
responding to a false alarm.

In our experiments with the TE-PCS process we found several interesting results. (1) Protecting against integrity attacks is more important than protecting against DoS attacks; in fact, we believe that DoS attacks have a negligible impact on the TE-PCS process. (2) The chemical reactor process is a well-behaved system, in the sense that even under perturbations, the response of the system follows our linear models very closely. In addition, the slow dynamics of this process allow us to detect attacks even with large delays, with the benefit of not raising any false alarms. (3) Even when we configured the system to have false alarms, we saw that the automatic response mechanism was able to keep the system in a safe mode.

One of our main conclusions regarding the TE-PCS plant is that it is a very resiliently-designed process control system. Designing resilient process control systems takes control system design experience and expertise. The design process is based on iteratively evaluating the performance on a set of bad situations that can arise during the operation of the plant, and modifying the control loop structures to build in resilience. In particular, Ricker's paper discusses the set of random faults that the four-loop PI control is able to withstand.

We would like to make two points in this regard: (1) the PI control loop structure is distributed, in the sense that no PI control loop controls all actuators and no PI loop has access to all sensor measurements; and (2) the set of bad situations that this control structure is able to withstand may itself result from one or more cyber attacks. However, even though the resilience of the TE-PCS plant is ensured by expert design, we find it interesting to directly test this resilience within the framework of assessment, detection and response that we present in this article.

As a word of caution, however, large scale control system designs are often not resilient by design and may become prey to such stealthy attacks if sufficient resilience is not built in from the start. Thus, our ideas become all the more relevant for operational security until there is a principled way of designing fully attack-resilient control structures and algorithms (which is by itself a very challenging research endeavor and may not offer a cost-effective design solution).

Even though we have focused on the analysis of a chemical reactor system, our principles and techniques can be applied to many other physical processes. An automatic detection and response module may not be a practical solution for all control system processes; however, we believe that many processes with characteristics similar to those of the TE-PCS can benefit from this kind of response.

Acknowledgments

We would like to thank Gabor Karsai, Adrian Perrig, Bruno Sinopoli, and Jon Wiley for helpful discussions on the security of control systems. This work was supported in part by the iCAST-TRUST collaboration project, and by CHESS at UC Berkeley, which receives support from the NSF awards #0720882 (CSR-EHS: PRET) and #0931843 (ActionWebs), ARO #W911NF-07-2-0019, MURI #FA9550-06-0312, AFRL, and MuSyC.

8. REFERENCES
[1] Nicolas Falliere, Liam O Murchu, and Eric Chien. W32.Stuxnet Dossier. Symantec, version 1.3 edition, November 2010.
[2] Ralph Langner. Langner Communications. http://www.langner.com/en/, October 2010.
[3] Steve Bellovin. Stuxnet: The first weaponized software? http://www.cs.columbia.edu/~smb/blog//2010-09-27.html, October 2010.
[4] Dale Peterson. Digital Bond: Weisscon and Stuxnet. http://www.digitalbond.com/index.php/2010/09/22/weisscon-and-stuxnet/, October 2010.
[5] Brian Krebs. Cyber incident blamed for nuclear power plant shutdown. Washington Post, http://www.washingtonpost.com/wp-dyn/content/article/2008/06/05/AR2008060501958.html, June 2008.
[6] Robert J. Turk. Cyber incidents involving control systems. Technical Report INL/EXT-05-00671, Idaho National Laboratory, October 2005.
[7] Richard Esposito. Hackers penetrate water system computers. http://blogs.abcnews.com/theblotter/2006/10/hackers_penetra.html, October 2006.
[8] BBC News. Colombia rebels blast power pylons. BBC, http://news.bbc.co.uk/2/hi/americas/607782.stm, January 2000.
[9] Jill Slay and Michael Miller. Lessons learned from the Maroochy water breach. In Critical Infrastructure Protection, volume 253/2007, pages 73–82. Springer Boston, November 2007.
[10] Paul Quinn-Judge. Cracks in the system. TIME Magazine, 9 January 2002.
[11] Thomas Reed. At the Abyss: An Insider's History of the Cold War. Presidio Press, March 2004.
[12] United States Attorney, Eastern District of California. Willows man arrested for hacking into Tehama Colusa Canal Authority computer system. http://www.usdoj.gov/usao/cae/press_releases/docs/2007/11-28-07KeehnInd.pdf, November 2007.
[13] United States Attorney, Eastern District of California. Sacramento man pleads guilty to attempting to shut down California's power grid. http://www.usdoj.gov/usao/cae/press_releases/docs/2007/12-14-07DenisonPlea.pdf, November 2007.
[14] David Kravets. Feds: Hacker disabled offshore oil platform leak-detection system. http://www.wired.com/threatlevel/2009/03/feds-hacker-dis/, March 2009.
[15] John Leyden. Polish teen derails tram after hacking train network. The Register, 11 January 2008.
[16] Andrew Greenberg. Hackers cut cities' power. Forbes, January 2008.
[17] V.M. Igure, S.A. Laughter, and R.D. Williams. Security issues in SCADA networks. Computers & Security, 25(7):498–506, 2006.
[18] P. Oman, E. Schweitzer, and D. Frincke. Concerns about intrusions into remotely accessible substation controllers and SCADA systems. In Proceedings of the Twenty-Seventh Annual Western Protective Relay Conference, volume 160. Citeseer, 2000.
[19] US-CERT. Control Systems Security Program. US Department of Homeland Security, http://www.us-cert.gov/control_systems/index.html, 2008.
[20] GAO. Critical infrastructure protection: Multiple efforts to secure control systems are under way, but challenges remain. Technical Report GAO-07-1036, Report to Congressional Requesters, September 2007.
[21] Jack Eisenhauer, Paget Donnelly, Mark Ellis, and Michael O'Brien. Roadmap to Secure Control Systems in the Energy Sector. Energetics Incorporated. Sponsored by the U.S. Department of Energy and the U.S. Department of Homeland Security, January 2006.
[22] Eric Byres and Justin Lowe. The myths and facts behind cyber security risks for industrial control systems. In Proceedings of the VDE Congress, VDE Association for Electrical Electronic & Information Technologies, October 2004.
[23] D. Geer. Security of critical control systems sparks concern. Computer, 39(1):20–23, January 2006.
[24] A.A. Cardenas, T. Roosta, and S. Sastry. Rethinking security properties, threat models, and the design space in sensor
[41] …Overbye. Detecting false data injection attacks on DC state estimation. In Preprints of the 1st Workshop on Secure Control Systems, 2010.
[42] Henrik Sandberg, Andre Teixeira, and Karl H. Johansson. On security indices for state estimators in power networks. In Preprints of the 1st Workshop on Secure Control Systems, 2010.
[43] Oliver Kosut, Liyan Jia, Robert J. Thomas, and Lang Tong. Malicious data attacks on smart grid state estimation: Attack strategies and countermeasures. In First International
networks: A case study in SCADA systems. Ad Hoc Conference on Smart Grid Communications
Networks, 2009. (SmartGridComm), pages 220–225, 2010.
[25] NERC-CIP. Critical Infrastructure Protection. North [44] Oliver Kosut, Liyan Jia, Robert J. Thomas, and Lang Tong.
American Electric Reliability Corporation, On malicious data attacks on power system state estimation.
http://www.nerc.com/cip.html, 2008. In UPEC, 2010.
[26] K. Stouffer, J. Falco, and K. Kent. Guide to supervisory [45] A Teixeira, S. Amin, H. Sandberg, K.H. Johansson, and S.S.
control and data acquisition (SCADA) and industrial control Sastry. Cyber-security analysis of state estimators in electric
systems security. Sp800-82, NIST, September 2006. power systems. In IEEE Conference on Decision and
[27] Idaho National Laboratory. National SCADA Test Bed Control (CDC), 2010.
Program. http://www.inl.gov/scada. [46] Le Xie, Yilin Mo, and Bruno Sinopoli. False data injection
[28] Hart. http://www.hartcomm2.org/frontpage/ attacks in electricity markets. In First International
wirelesshart.html. WirelessHart whitepaper, 2007. Conference on Smart Grid Communications
[29] ISA. http://isa.org/isasp100. Wireless Systems (SmartGridComm), pages 226–231, 2010.
for Automation, 2007. [47] Yilin Mo and Bruno Sinopoli. False data injection attacks in
[30] Eric Cosman. Patch management at Dow chemical. In ARC control systems. In Preprints of the 1st Workshop on Secure
Tenth Annual Forum on Manufacturing, February 20-24 Control Systems, 2010.
2006. [48] Julian Rrushi. Composite Intrusion Detection in Process
[31] Patch management strategies for the electric sector. Edison Control Networks. PhD thesis, Universita Degli Studi Di
Electric Institute–IT Security Working Group, March 2004. Milano, 2009.
[32] Eric Byres, David Leversage, and Nate Kube. Security [49] NL Ricker. Model predictive control of a continuous,
incidents and trends in SCADA and process industries. The nonlinear, two-phase reactor. JOURNAL OF PROCESS
Industrial Ethernet Book, 39(2):12–20, May 2007. CONTROL, 3:109–109, 1993.
[33] Andrew K. Wright, John A. Kinast, and Joe McCarty. [50] Dorothy Denning. An intrusion-detection model. Software
Low-latency cryptographic protection for SCADA Engineering, IEEE Transactions on, SE-13(2):222–232, Feb.
communications. In Applied Cryptography and Network 1987.
Security (ACNS), pages 263–277, 2004. [51] S. Joe Quin and Thomas A. Badgwell. A survey of industrial
[34] Patrick P. Tsang and Sean W. Smith. YASIR: A low-latency model predictive control technology. Control Engineering
high-integrity security retrofit for lecacy SCADA systems. In Practice, 11(7):733–764, July 2003.
23rd International Information Security Conference (IFIC [52] J.B. Rawlings. Tutorial overview of model predictive control.
SEC), pages 445–459, September 2008. Control Systems Magazine, IEEE, 20(3):38–52, Jun 2000.
[35] Steven Hurd, Rhett Smith, and Garrett Leischner. Tutorial: [53] T. Kailath and H. V. Poor. Detection of stochastic processes.
Security in electric utility control systems. In 61st Annual IEEE Transactions on Information Theory,
Conference for Protective Relay Engineers, pages 304–309, 44(6):2230–2258, October 1998.
April 2008. [54] A. Wald. Sequential Analysis. J. Wiley & Sons, New York,
[36] Steven Cheung, Bruno Dutertre, Martin Fong, Ulf Lindqvist, 1947.
Keith Skinner, and Alfonso Valdes. Using model-based [55] Jaeyeon Jung, Vern Paxson, Arthur Berger, and Hari
intrusion detection for SCADA networks. In Proceedings of Balakrishan. Fast portscan detection using sequential
the SCADA Security Scientific Symposium, Miami Beach, hypothesis testing. In Proceedings of the 2004 IEEE
FL, USA, 2007 2007. Symposium on Security and Privacy, pages 211–225, May
[37] PAS Ralston, JH Graham, and JL Hieb. Cyber security risk 2004.
assessment for SCADA and DCS networks. ISA [56] Stuart Schechter and Jaeyeon Jung Arthur Berger. Fast
transactions, 46(4):583–594, 2007. detection of scanning worm infections. In Proc. of the
[38] P.A. Craig, J. Mortensen, and J.E. Dagle. Metrics for the Seventh International Symposium on Recent Advances in
National SCADA Test Bed Program. Technical report, Intrusion Detection (RAID), September 2004.
PNNL-18031, Pacific Northwest National Laboratory [57] M. Xie, H. Yin, and H. Wang. An effective defense against
(PNNL), Richland, WA (US), 2008. email spam laundering. In Proceedings of the 13th ACM
[39] G. Hamoud, R.L. Chen, and I. Bradley. Risk assessment of Conference on Computer and Communications Security,
power systems SCADA. In IEEE Power Engineering Society pages 179–190, October 30–November 3 2006.
General Meeting, 2003, volume 2, 2003. [58] Guofei Gu, Junjie Zhang, and Wenke Lee. Botsniffer:
[40] Yao Liu, Michael K. Reiter, and Peng Ning. False data Detecting botnet command and control channels in network
injection attacks against state estimation in electric power traffic. In Proceedings of the 15th Annual Network and
grids. In CCS ’09: Proceedings of the 16th ACM conference Distributed System Security Symposium (NDSS’08), San
on Computer and communications security, pages 21–32, Diego, CA, February 2008.
New York, NY, USA, 2009. ACM. [59] B.E. Brodsky and B.S. Darkhovsky. Non-Parametric
[41] Rakesh Bobba, Katherine M. Rogers, Qiyan Wang, Methods in Change-Point Problems. Kluwer Academic
Himanshu Khurana, Klara Nahrstedt, and Thomas J. Publishers, 1993.