Attacks Against Process Control Systems: Risk Assessment, Detection, and Response
presence of an inert B to form a non-volatile liquid product D:

$$A + C \longrightarrow D.$$

Feed stream 1 contains A, C and a trace of B; feed stream 2 is pure A; stream 3 is the purge containing vapours of A, B, C; and stream 4 is the exit for liquid product D. The measured flow rate of stream $i$ is denoted by $F_i$ (kmol h$^{-1}$). The control objectives are:

- Regulate $F_4$, the rate of production of the product D, at a set-point $F_4^{sp}$ (kmol h$^{-1}$),

[...]

• By attacking the sensors, a controller is expected to respond with incorrect control signals since it receives wrong information from the compromised sensors. For example, by forging $y_7$ as $y_7^{max}$ from $t = 0$ to 30, the controller believes there is a large amount of component A in the tank.
[Figure 3 plots omitted: pressure (kPa) against time (hour).]

Figure 3: Safety can be breached by compromising sensor $y_5$ (3(a)). DoS attacks, on the other hand, do not cause any damage (and they are easy to detect) (3(b)).

We conclude that if the plant operator wants to prevent an attack from making the system operate in an unsafe state, it should prioritize defenses against integrity attacks rather than DoS attacks. If the plant operator only has enough budget to deploy advanced security mechanisms for one sensor (e.g., tamper resistance, or TPM chips), $y_5$ should be the priority.

5. DETECTION OF ATTACKS

Detecting attacks on control systems can be formulated as an anomaly-based intrusion detection problem [50]. One big difference in control systems compared to traditional IT systems is that, instead of creating models of network traffic or software behavior, we can use a representative model of the physical system.

The intuition behind this approach is the following: if we know how the output sequence of the physical system, $y(k)$, should react to the control input sequence, $u(k)$, then any attack on the sensor data can potentially be detected by comparing the expected output $\hat{y}(k)$ with the received (and possibly compromised) signal $\tilde{y}(k)$. Depending on the quality of our estimate $\hat{y}(k)$ we may have some false alarms. We revisit this problem in the next section.

To formalize the anomaly detection problem, we need (1) a model of the behavior of the physical system, and (2) an anomaly detection algorithm. In section 5.1 we discuss our choice of linear models as an approximation of the behavior of the physical system. In section 5.2, we describe change detection theory and the detection algorithm we use: a nonparametric cumulative sum (CUSUM) statistic.

5.1 Linear Model

To develop accurate control algorithms, control engineers often construct a representative model that captures the behavior of the physical system in order to predict how the system will react to a given control signal. A process model can be derived from first principles (a model based on the fundamental laws of physics) or from empirical input and output data (a model obtained by simulating the process inputs with a carefully designed test sequence). It is also very common to use a combination of these two models; for example, first-principle models are typically calibrated by using process test data to estimate key parameters. Likewise, empirical models are often adjusted to account for known process physics [51, 52].

For highly safety-critical applications, such as the aerospace industry, it is technically and economically feasible to develop accurate models from first principles [51]. However, for the majority of process control systems, the development of process models from fundamental physics is difficult.

In many cases such detailed models are difficult to justify economically [...] the literature of a detailed nonlinear model of an industrial control problem; this is the reason why the TE-PCS system has been used as a standard testbed in many industrial control papers.)

To facilitate the creation of physical models, most industrial control vendors provide tools (called identification packages) to develop models of physical systems from training data. The most common models are linear systems. Linear systems can be used to model dynamics that are linear in the state $x(k)$ and the control input $u(k)$:

$$x(k+1) = A x(k) + B u(k) \qquad (1)$$

where time is represented by $k \in \mathbb{Z}^+$, $x(k) = (x_1(k), \ldots, x_n(k)) \in \mathbb{R}^n$ is the state of the system, and $u(k) = (u_1(k), \ldots, u_m(k)) \in \mathbb{R}^m$ is the control input. The matrix $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ models the physical dependence of state $i$ on state $j$, and $B = (b_{ij}) \in \mathbb{R}^{n \times m}$ is the input matrix for state $i$ from control input $j$.

Assume the system (1) is monitored by a sensor network with $p$ sensors. We obtain the measurement sequence from the observation equations

$$\hat{y}(k) = C x(k), \qquad (2)$$

where $\hat{y}(k) = (\hat{y}_1(k), \ldots, \hat{y}_p(k)) \in \mathbb{R}^p$, and $\hat{y}_l(k) \in \mathbb{R}$ is the estimated measurement collected by sensor $l$ at time $k$. The matrix $C \in \mathbb{R}^{p \times n}$ is called the output matrix.

5.2 Detection Methods

The physical-model-based attack detection method presented in this paper can be viewed as complementary to intrusion detection methods based on network and computer systems models.

Because we need to detect anomalies in real time, we can use results from sequential detection theory to give a sound foundation to our approach. Sequential detection theory considers the problem where the measurement time is not fixed, but can be chosen online as and when the measurements are obtained. Such problem formulations are called optimal stopping problems. Two such problem formulations are: sequential detection (also known as sequential hypothesis testing), and quickest detection (also known as change detection). A good survey of these problems is given by Kailath and Poor [53].

In optimal stopping problems, we are given a time series sequence $z(1), z(2), \ldots, z(N)$, and the goal is to determine the minimum number of samples, $N$, the anomaly detection scheme should observe before making a decision $d_N$ between two hypotheses: $H_0$ (normal behavior) and $H_1$ (attack).

The difference between sequential detection and change detection is that the former assumes the sequence $z(i)$ is generated either by the normal hypothesis ($H_0$), or by the attack hypothesis ($H_1$). The goal is to decide which hypothesis is true in minimum time. On the other hand, change detection assumes the observation $z(i)$ starts under $H_0$ and then, at a given time $k_s$, changes to hypothesis $H_1$. Here the goal is to detect this change as soon as possible.

Both problem formulations are very popular, but security researchers have used sequential detection more frequently. However, for our attack detection method, the change detection formulation is more intuitive. To facilitate this intuition, we now briefly describe the two formulations.

5.2.1 Sequential Detection

Given a fixed probability of false alarm and a fixed probability of detection, the goal of sequential detection is to minimize the number of observations required to make a decision between two
hypotheses. The solution is the classic sequential probability ratio test (SPRT) of Wald [54] (also referred to as the threshold random walk (TRW) by some security papers). SPRT has been widely used in various problems in information security such as detecting portscans [55], worms [56], proxies used by spammers [57], and botnets [58].

Assuming that the observations $z(k)$ under $H_j$ are generated with a probability distribution $p_j$, the SPRT algorithm can be described by the following equations:

$$S(k+1) = \log \frac{p_1(z(k))}{p_0(z(k))} + S(k)$$

$$N = \inf_n \{ n : S(n) \notin [L, U] \},$$

starting with $S(0) = 0$. The SPRT decision rule $d_N$ is defined as: [...]

[...] where $b_i$ is a small positive constant chosen such that

$$\mathbb{E}_0\left[ \| \tilde{y}_i(k) - \hat{y}_i(k) \| - b_i \right] < 0. \qquad (7)$$

The nonparametric CUSUM statistic for sensor $i$ is then:

$$S_i(k) = \left( S_i(k-1) + z_i(k) \right)^+, \quad S_i(0) = 0 \qquad (8)$$

and the corresponding decision rule is

$$d_{N,i} \equiv d_\tau(S_i(k)) = \begin{cases} H_1 & \text{if } S_i(k) > \tau_i \\ H_0 & \text{otherwise,} \end{cases} \qquad (9)$$

where $\tau_i$ is the threshold selected based on the false alarm rate for sensor $i$.

Following [59], we state the following two important results for Eq. (8)-(9): [...]
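The CUSUM recursion and decision rule of Eq. (8)-(9) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes $z_i(k) = |\tilde{y}_i(k) - \hat{y}_i(k)| - b_i$ for a scalar sensor (the form suggested by Eq. (7)), and it restarts the statistic at zero after each alarm, which is a design choice made here for illustration. The function name `cusum_alarms` and its parameters are hypothetical.

```python
def cusum_alarms(received, expected, b, tau):
    """Return the sample indices at which the CUSUM statistic exceeds tau.

    received: measurements seen by the detector (possibly forged)
    expected: model predictions for the same time steps
    b:        positive constant; Eq. (7) asks that the attack-free mean
              of |received - expected| - b be negative
    tau:      alarm threshold of Eq. (9)
    """
    s = 0.0                                # S_i(0) = 0
    alarms = []
    for k, (y_tilde, y_hat) in enumerate(zip(received, expected)):
        z = abs(y_tilde - y_hat) - b       # assumed form of z_i(k)
        s = max(0.0, s + z)                # Eq. (8): (x)^+ = max(0, x)
        if s > tau:                        # Eq. (9): decide H1
            alarms.append(k)
            s = 0.0                        # assumption: restart after an alarm
    return alarms
```

With attack-free data the statistic is pulled back to zero by the negative drift, so no index is reported; a sustained deviation larger than `b` accumulates until it crosses `tau`. Counting the returned indices over attack-free runs gives the number of false alarms for a given threshold.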
[Figure 4 plots omitted.]

Figure 4: The parameter of ADM: $b$. For $y_4$, 9951 $b$s are 0.015. The mean value of $b_{y4}$ is 0.0642.

We ran the experiments ten thousand times (for 40 hours each time) without any attacks to gather statistics. Fig 4 shows the estimated probability distributions (without normalization).

To obtain $b_i$, we compute the empirical expected value for each distance and then round up to two significant digits. We obtain $b_{y4} = 0.065$, $b_{y5} = 4.1$, $b_{y7} = 0.042$.

Once we have $b_i$ for each sensor, we need to find a threshold $\tau_i$ to balance the tradeoff between false alarms and detection time.

False Alarm Rate.

We ran the simulations twenty times without attacks and computed the total number of false alarms for different values of $\tau$ (and for each sensor). Fig 5 shows the results. Taking $y_4$ as an example, we notice that $S_{y4}$ alerts frequently if we set $\tau_{y4} < 6$.

[Figure 5 plots omitted: panels y4, y5, y7; vertical axis: false alarm; marked data points (X: 7, Y: 1), (X: 4900, Y: 1), (X: 44, Y: 1).]

Figure 5: The number of false alarms decreases exponentially with increasing $\tau$. These results confirm the theory supporting the nonparametric CUSUM algorithm.

In general, we would like to select $\tau$ as high as possible for each sensor to avoid any false alarm; however, increasing $\tau$ increases the time to detect attacks.

[...]

$$\sum_{k=0}^{n-1} c_i = \tau_i + n b_i$$

Therefore $c_i = \tau_i / n + b_i$. This attack creates a bias of $\tau_i / n + b_i$ for each attacked signal.

This equation shows the limitations of the attacker. If an attacker wants to maximize the damage (maximize the bias of a signal), the attacker needs to select the smallest $n$ it can find. Because $\tilde{y}_i \in Y_i$ this attack reduces to an impulse attack.

If an attacker wants to attack for a long time, then $n$ will be very large, and the larger $n$ is, the smaller the bias will be.

5.6 Geometric Attacks

In a geometric attack, the attacker wants to drift the value very slowly at the beginning and maximize the damage at the end. This attack combines the slow initial drift of the bias attack with a surge attack at the end to cause maximum damage.

Let $\alpha \in (0, 1)$. The attack is:

$$\tilde{y}_i(k) = \hat{y}_i(k) - \beta_i \alpha_i^{n-k}.$$

Now we need to find $\alpha$ and $\beta$ such that $S_i(n) = \tau_i$.

Assume the attack starts at time $k = 0$ and the attacker wants to be undetected for $n$ time steps. The attacker then needs to solve the following equation:

$$\sum_{k=0}^{n-1} \beta_i \alpha_i^{n-k} = \tau_i + n b_i$$

This addition is a geometric progression:

$$\sum_{k=0}^{n-1} \beta_i \alpha_i^{n-k} = \beta_i \alpha_i^{n} \sum_{k=0}^{n-1} (\alpha_i^{-1})^{k} = \beta_i \frac{1 - \alpha_i^{n}}{\alpha_i^{-1} - 1}$$

By fixing $\alpha$ the attacker can select the appropriate $\beta$ to satisfy the above equation.
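As a worked illustration of the geometric attack in Section 5.6, the sketch below solves for $\beta_i$ once $\alpha_i$ is fixed. It assumes the attacker's condition is that the injected offsets sum to $\tau_i + n b_i$, by analogy with the bias attack, and it does not model the $(\cdot)^+$ clamp of the CUSUM statistic, so it should be read as an approximation rather than a guarantee of stealthiness. The function names and the value $n = 20$ are hypothetical; $\tau_{y4} = 50$ and $b_{y4} = 0.065$ are taken from the text.

```python
def geometric_beta(alpha, n, tau, b):
    """Solve beta * (1 - alpha**n) / (1/alpha - 1) = tau + n*b for beta."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Closed form of sum_{k=0}^{n-1} alpha**(n - k)
    geometric_sum = (1.0 - alpha ** n) / (1.0 / alpha - 1.0)
    return (tau + n * b) / geometric_sum


def geometric_offsets(alpha, beta, n):
    """Offsets beta * alpha**(n - k) subtracted from the signal, k = 0..n-1."""
    return [beta * alpha ** (n - k) for k in range(n)]


# Hypothetical scenario: sensor y4 with tau = 50, b = 0.065, n = 20 steps.
beta_y4 = geometric_beta(alpha=0.9, n=20, tau=50.0, b=0.065)
offsets_y4 = geometric_offsets(0.9, beta_y4, 20)
```

Because $\alpha < 1$, the offsets start small and grow towards the final step, which matches the slow drift followed by a surge that the attack description calls for.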
[Figure 7 plots omitted: panels y4, y5, y7, each labelled "y = y * 0.9"; vertical axis: average detection time (hour); horizontal axis: tau.]

Figure 7: The time for detection increases linearly with increasing $\tau$. These results confirm the theory behind the nonparametric CUSUM algorithm.

[Figure 9 plots omitted: axes include pressure (kPa) against time (hour).]

Figure 9: Geometric attacks to the three sensors. The solid lines represent the real state of the system, while the dotted lines represent the information sent by the attacker.

[...] of $\tau$. To test how the attacks change for all thresholds we parameter- [...] percentage of times that geometric stealthy attacks (assuming the attacker controls all three sensor readings) were able to drive the pressure above 3000 kPa while remaining undetected (as a function of $p$).

[Figure 10 plots omitted: panels S4, S5, S7 against time (hour).]

Figure 10: Statistics of geometric attacks with 3 sensors compromised.
[...] We also find that the most successful attack strategy are geometric [...]

A comprehensive security posture for any system should include mechanisms for prevention, detection, and response to attacks. Automatic response to computer attacks is one of the fundamental problems in information assurance. While most of the research efforts found in the literature focus on prevention (authentication, access controls, cryptography etc.) or detection (intrusion detection systems), in practice there are quite a few response mechanisms. For example, many web servers send CAPTCHAs to the client whenever they find that connections resemble bot connections, firewalls drop connections that conform to their rules, the execution of anomalous processes can be slowed down by intrusion detection systems, etc.

Given that we already have an estimate for the state of the system (given by a linear model), a natural response strategy for control systems is to use this estimate when the anomaly detection statistic fires an alarm. Fig 12 shows our proposed architecture. Specifically: for sensor $i$, if $S_i(k) > \tau_i$, the ADM replaces the sensor measurements $\tilde{y}_i(k)$ with measurements generated by the linear model $\hat{y}_i(k)$ (that is, the controller will receive as input $\hat{y}_i(k)$ instead of $\tilde{y}_i(k)$). Otherwise, it treats $\tilde{y}_i(k)$ as the correct sensor [...]

[Figure 12 diagram omitted: block labels include Disturbance, ADM, Controller, Linear, Computing Blocks.]

Figure 12: An Anomaly Detection Module (ADM) can detect an attack and send an estimate of the state of the system to the controller.

Introducing automatic response mechanisms is, however, not an easy solution. Every time systems introduce an automatic response to an alarm, they have to consider the cost of dealing with false alarms. In our proposed detection and response architecture (Fig. 12), we have to make sure that if there is a false alarm, controlling the system by using the estimated values from the linear system will not cause any safety concerns.

6.1 Experiments

The automatic response mechanism works well when we are under attack. For example, Fig. (13) shows that when an attack is detected, the response algorithm manages to keep the system in a safe state. Similar results were obtained for all detectable attacks.

[Figure 13 plots omitted: both panels show pressure (kPa) against time (hour), with a marked data point (X: 10.6, Y: 1369). (a) Without ADM. (b) ADM detects and responds to the attack at T = 10.7 (hr).]

Figure 13: $\tilde{y}_5 = y_5 * 0.5$

While our attack response mechanism is a good solution when the alarms are indeed an indication of attacks, our main concern in this section is the cost of false alarms. To address these concerns we ran the simulation scenario without any attacks 1000 times; each time the experiment ran for 40 hours. As expected, with the parameter set $\tau_{y4} = 50$, $\tau_{y5} = 10000$, $\tau_{y7} = 200$ our system did not detect any false alarm (see Table 1); therefore we decided to reduce the detection threshold to $\tau_{y4} = 5$, $\tau_{y5} = 1000$, $\tau_{y7} = 20$ and run the same experiments again. Table 2 shows the behavior of the pressure after the response to a false alarm. We can see that while a false response mechanism increases the pressure of the tank, it never reaches unsafe levels. The maximum pressure obtained while controlling the system based on the linear model was 2779 kPa, which is of the same order of magnitude as the normal variation of the pressure without any false alarm (2757 kPa).

Alarms   Avg y5   Std Dev   Max y5
0        2700.4   14.73     2757

Table 1: For thresholds $\tau_{y4} = 50$, $\tau_{y5} = 10000$, $\tau_{y7} = 200$ we obtain no false alarm. Therefore we only report the expected pressure, the standard deviation of the pressure, and the maximum pressure reached under no false alarm.

In our case, even if the system is kept in a safe state by the automated response, our response strategy is meant as a temporary solution before a human operator responds to the alarm. Based on our results we believe that the time for a human response can be very large (a couple of hours).

In this work we identified three new research challenges for securing control systems. We showed that by incorporating a physical model of the system we were able to identify the most critical sensors and attacks. We also studied the use of physical models for anomaly detection and proposed three generic types of stealthy attacks. Finally, we proposed the use of automatic response mechanisms based on estimates of the state of the system. Automatic responses may be problematic in some cases (especially if the response to a false alarm is costly); therefore, we would like to emphasize that the automatic response mechanism should be considered as a temporary solution before a human investigates the alarm. A full deployment of any automatic response mechanism should take into consideration the amount of time in which it is reasonable for a human operator to respond, and the potential side effects of [...]

Sensor   Alarms   Avg y5   Std Dev   Max y5
y4       61       2710     30.36     2779
y5       106      2705     18.72     2794
y7       53       2706     20.89     2776

Table 2: Behavior of the plant after response to a false alarm with thresholds $\tau_{y4} = 5$, $\tau_{y5} = 1000$, $\tau_{y7} = 20$.
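The response rule of the ADM (forward the linear-model estimate $\hat{y}_i(k)$ whenever $S_i(k) > \tau_i$, otherwise forward the received measurement) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the model predictions are passed in as a precomputed list rather than generated in closed loop, $z_i(k)$ is taken to be $|\tilde{y}_i(k) - \hat{y}_i(k)| - b_i$, and the statistic is not reset after an alarm. The name `adm_filter` is hypothetical.

```python
def adm_filter(received, predicted, b, tau):
    """Return the values an ADM would forward to the controller.

    received:  sensor readings as received (possibly compromised)
    predicted: linear-model estimates for the same time steps
    b, tau:    CUSUM drift constant and threshold for this sensor
    """
    s = 0.0
    forwarded = []
    for y_tilde, y_hat in zip(received, predicted):
        # Nonparametric CUSUM update (assumed form of z_i(k)).
        s = max(0.0, s + abs(y_tilde - y_hat) - b)
        # Response rule: use the model estimate while the alarm is raised.
        forwarded.append(y_hat if s > tau else y_tilde)
    return forwarded
```

In this sketch the statistic decays by `b` per step once the deviation disappears, so the filter falls back to the real sensor after a false alarm without operator action; whether that is desirable depends on the plant and is a choice made here for illustration only.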
