Information Security - Theory vs. Reality
0368-4474, Winter 2013-2014

Lecture 6: Machine Learning Approach to Side-Channel Attacks
Guest lecturer: Guy Wolf (guy.wolf@cs.tau.ac.il)
2013
[Diagram: side channels leaking from a device - probing, power consumption, response times, acoustics, electromagnetic radiation, temperatures]
[Diagram: Key k and Plaintext p enter the device, which outputs the Ciphertext; an internal value y = f(k, p) is computed inside; the leaked data forms vectors in R^O(100+)]
Typical assumption:
power consumption correlates with Hamming Weight of y
plaintext p1 → internal value 00101001 → Hamming weight 3
  ⋮
plaintext pn → internal value 10011011 → Hamming weight 5
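As a concrete aside, the Hamming weight used throughout this lecture is a one-liner in Python (this snippet is illustrative, not code from the lecture):

```python
def hamming_weight(v: int) -> int:
    """Number of 1 bits in the internal value v = f(k, p)."""
    return bin(v).count("1")

assert hamming_weight(0b00101001) == 3  # the p1 row in the table above
assert hamming_weight(0b10011011) == 5  # the pn row in the table above
```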
[Diagram: Traces matrix (plaintexts × times), with rows trace x1, …, trace xn; Hamming-weight matrix (plaintexts × key candidates), where entry (i, j) holds the Hamming weight of the internal value for the designated plaintext pi and key candidate kj]
Compare the columns of the traces matrix with the columns of the Hamming-weight matrix:
Pearson correlation: cov(x, y) / (std(x) · std(y))
Mutual Information (general statistical correlation): Entropy(x) + Entropy(y) − Entropy(x, y)
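A minimal numpy sketch of this comparison, for the Pearson variant only; the array names and shapes (`traces` as plaintexts × times, `hw` as plaintexts × key candidates) are illustrative assumptions, not the lecture's code:

```python
import numpy as np

def correlation_dpa(traces, hw):
    """Rank key candidates by Pearson correlation between trace samples
    and predicted Hamming weights (columns of hw)."""
    T = traces - traces.mean(axis=0)   # center each time-sample column
    H = hw - hw.mean(axis=0)           # center each key-candidate column
    num = T.T @ H                      # covariances up to a constant factor
    den = np.outer(np.linalg.norm(T, axis=0), np.linalg.norm(H, axis=0))
    corr = num / den                   # (times x key candidates)
    # Best candidate: highest absolute correlation at any time sample.
    return int(np.argmax(np.max(np.abs(corr), axis=0)))
```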
Alternative Attack: Template Power Analysis
Representing power traces as vectors
[Scatter diagram: traces plotted as vectors, labeled by (plaintext, key) pairs such as (p2, k2) and (p3, k3)]
¹ "Template Attacks", 2003, by S. Chari, J.R. Rao, and P. Rohatgi
Alternative Attack: Template Power Analysis
Curse of Dimensionality
A general term for various phenomena that arise when analyzing or organizing high-dimensional data.
Common theme: it is difficult (often impractical or impossible) to obtain statistical significance, due to the sparsity of data in high dimensions.
This causes statistical methods to perform poorly compared to their performance on low-dimensional data.
[Diagram: trace vectors as points in 3D space]
Assume: avg = 0 (the traces are centered)
Find: the directions of maximal variance
Covariance matrix (times × times), computed from the traces matrix
The spectral theorem applies to covariance matrices:
  Covariance matrix · φi = λi φi   (eigenvector φi, eigenvalue λi)
  Spectral Theorem: cov(t1, t2) = Σi λi φi(t1) φi(t2)
Equivalently, the covariance matrix factors by SVD (Singular Value Decomposition) into its eigenvectors and eigenvalues (a quick numerical check follows below).
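A quick numpy check of the spectral theorem on a covariance matrix; the data here is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))  # hypothetical traces: rows x time samples
X -= X.mean(axis=0)                  # assume avg = 0
C = (X.T @ X) / len(X)               # covariance matrix (times x times)

lam, phi = np.linalg.eigh(C)         # eigenvalues lam[i], eigenvectors phi[:, i]
# Spectral theorem: cov(t1, t2) = sum_i lam[i] * phi[t1, i] * phi[t2, i]
assert np.allclose(C, (phi * lam) @ phi.T)
```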
Dimensionality Reduction
Principal Component Analysis (PCA) - truncated SVD
[Plot: eigenvalues λ1, λ2, λ3, λ4, λ5 in descending order; truncation keeps only the largest]
Consider the simple case of traces that all lie on the same high-dimensional line.
The straight line is defined by a unit vector ~ψ, with ‖~ψ‖ = 1.
[Diagram: traces in 3D space scattered along a line; PCA recovers the line as the first eigenvector, φ1 = ~ψ. Directions come from the eigenvectors, lengths from the eigenvalues: λ1 φ1 along the line, λ2 φ2 orthogonal to it]
Dimensionality Reduction
Principal Component Analysis (PCA) - projection
[Diagram: 3D space projected onto the direction λ1 φ1, mapping each trace to a 1D space]
PCA algorithm:
1 Centering
2 Covariance
3 Eigendecomposition
4 Projection
Each trace is thus mapped by PCA to a low-dimensional vector (a minimal sketch of the four steps follows below).
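A minimal numpy sketch of the four steps above, assuming traces are the rows of a matrix; the function name is illustrative:

```python
import numpy as np

def pca_project(traces, d):
    """Map each trace to a d-dimensional vector along the top principal components."""
    X = traces - traces.mean(axis=0)         # 1. Centering
    C = (X.T @ X) / len(X)                   # 2. Covariance
    lam, phi = np.linalg.eigh(C)             # 3. Eigendecomposition (ascending order)
    top = phi[:, np.argsort(lam)[::-1][:d]]  #    keep the d largest-variance directions
    return X @ top                           # 4. Projection
```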
Next task: how to find keys from the low-dimensional vectors?
Clustering & Classification
with
Support Vector Machine
Cluster analysis
Clustering - the task of grouping objects such that objects in the same cluster are more similar to each other than to those in other clusters.
Learning types
Unsupervised learning: trying to find hidden structures in unlabeled data.
Supervised learning: inferring functions from labeled training data.
[Diagram: PCA-projected traces (p0, k0), (p1, k1), (p2, k2), (p3, k3), each labeled by bit i of the key (b=0 or b=1); classification separates the two label groups]
Classification
Support Vector Machine (SVM) - quantifying robustness with margins
[Diagram: separating line between the b=0 and b=1 points, drawn with a margin on each side]
Classification
SVM formulation - hyperplane
[Diagram: hyperplane through the origin with normal vector ~w; points with ~w · ~x > 0 on one side and ~w · ~x < 0 on the other]
Classification
SVM formulation - hyperplane with margin
[Diagram: the same hyperplane with margin α: ~w · ~x ≥ α on one side, ~w · ~x ≤ −α on the other]
Classification
SVM formulation - shifted hyperplane with margin
[Diagram: hyperplane shifted by ~u, with offset c = ~w · ~u; one side satisfies ~w · ~x − c ≥ α, the other ~w · ~x − c ≤ −α]
Classification
SVM algorithm

SVM training
Input:
  Points {~xi} from PCA of the traces
  Labels {bi} according to the attacked bit: bi = 1 if the bit is 0, bi = −1 if the bit is 1
Find max α s.t. bi (~w · ~xi − c) ≥ α for all i

SVM classifier
Input:
  New point ~x from the PCA projection of the attacked trace
  The solution (~w, c, α) from SVM training
Classify by the value of ~w · ~x − c; its sign gives the guessed bit (a sketch follows below).

Empirical results (on 3DES): success rate descends by bit position,
from ∼95% for the 7th bit down to ∼50% for the 1st bit.
² "Side channel attack: an approach based on machine learning", 2011, by L. Lerman, G. Bontempi, and O. Markowitch
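In practice, max-margin training is available off the shelf; here is a hedged sketch with scikit-learn's LinearSVC (a soft-margin stand-in for the exact formulation above, on made-up data):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
xs = rng.standard_normal((200, 3))   # stand-in for PCA-projected traces
bs = np.where(xs[:, 0] > 0, 1, -1)   # stand-in labels of the attacked bit

clf = LinearSVC()                    # maximizes the margin of w . x - c
clf.fit(xs, bs)

x_new = rng.standard_normal((1, 3))  # PCA projection of an attacked trace
print(clf.decision_function(x_new))  # value of w . x - c
print(clf.predict(x_new))            # its sign: the guessed bit label
```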
Power Trace Alignment with DTW
[Diagram: theoretically, two traces of the same operation line up; realistically, they are shifted in time]
Naïve approach: static alignment by a time offset. Which offset to use?
Training:
1 Acquire power traces
2 Choose a reference trace (e.g., arbitrarily, or use the mean of all traces)
3 Align each trace to the reference trace using the pairwise alignment
4 Apply the training algorithm (e.g., PCA & SVM) to the aligned traces
Online:
1 Acquire a trace from the attacked hardware
2 Align the trace to the reference trace (from the training) using pairwise alignment
3 Apply the classification algorithm (e.g., PCA & SVM)
(an end-to-end sketch of both phases follows below)
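An illustrative sketch composing the steps above; `align(trace, ref)` is a hypothetical stand-in for the pairwise alignment described next, while PCA and the SVM come from scikit-learn. This shows the flow only, not the lecture's code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def train(traces, labels, align, d=3):
    ref = traces.mean(axis=0)                            # 2. reference trace
    aligned = np.array([align(t, ref) for t in traces])  # 3. pairwise alignment
    pca = PCA(n_components=d).fit(aligned)
    clf = LinearSVC().fit(pca.transform(aligned), labels)  # 4. train on aligned traces
    return ref, pca, clf

def attack(trace, ref, pca, clf, align):
    aligned = align(trace, ref)                          # 2. align to reference
    return clf.predict(pca.transform([aligned]))[0]      # 3. classify
```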
Power Trace Alignment
Pairwise alignment
[Diagram: pairwise difference matrix between trace x (horizontal) and trace y (vertical); cell (i, j) holds the difference x[i] − y[j]]
Power Trace Alignment
Pairwise alignment
Alignment path: a path through the matrix that gets from the start to the end of both traces.
1:1 alignment: trivial - nothing is modified by the alignment.
  Aligned distance: Σ (x[i] − y[i])² = ‖x − y‖²
Time offset: works sometimes, but not always optimal.
  Aligned distance: Σ (·)² = ?
Extreme offset: complete misalignment - the worst alignment alternative.
  Aligned distance: Σ (·)² = ‖x‖² + ‖y‖²
Optimal alignment: optimize the alignment by minimizing the aligned distance.
  Aligned distance: Σ (·)² = min
Dynamic Programming
A method for solving complex problems by breaking them down into simpler subproblems.
Applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure.
Performs better than naïve methods that do not exploit the subproblem overlap.
(a minimal DTW sketch using dynamic programming follows below)
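A minimal sketch of the optimal aligned distance via dynamic programming, using the classic DTW recurrence; variable names are mine, not the lecture's:

```python
import numpy as np

def dtw_distance(x, y):
    """Minimal aligned distance between traces x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # cell of the pairwise diff. matrix
            # extend the alignment path: match, or stretch either trace
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```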
Power Trace Alignment
Dynamic Time Warping (DTW)
Experimental results:
Compare correlation DPA using 3 alignment methods:
  Static: simple static alignment by time offset
  SW: replace trace entries with the average of a sliding window
      (not strictly an alignment method, but simple & sometimes effective)
  DTW: elastic alignment with DTW
³ "Improving Differential Power Analysis by Elastic Alignment", 2011, by J.G.J. van Woudenberg, M.F. Witteman, and B. Bakker
Power Analysis with DTW-based Alignment
based on recent paper³
[Plots: correlation DPA results under the Static, SW, and DTW alignments]
[Diagram: chain of states qi−2 → qi−1 → qi → qi+1 = ?]
Transition probabilities: Pr [qi+1 = ? | qi, qi−1, …, q2, q1]
[Diagram: the same chain, where the next state depends only on the current one: Pr [qi+1 = ? | qi]]
[Diagram: Markov chain over letters, e.g., t → e → x → t]
[Diagram: Hidden Markov Model - hidden states h1 → h2 → h3 → h4, each emitting an observation o1, o2, o3, o4]
Viterbi Algorithm
A dynamic programming algorithm for finding the most likely sequence of hidden states, especially in the context of hidden Markov models (a minimal sketch follows below).
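A compact implementation of the Viterbi recurrence in its standard textbook form; the array names are mine, not the lecture's:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path h1..hT given observations o1..oT.

    pi[s]   -- initial probability of state s
    A[s, t] -- transition probability s -> t
    B[s, o] -- probability that state s emits observation o
    """
    T, S = len(obs), len(pi)
    V = np.zeros((T, S))                # V[t, s]: best log-prob of a path ending in s
    back = np.zeros((T, S), dtype=int)  # back[t, s]: best predecessor of s at step t
    V[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = V[t - 1][:, None] + np.log(A)  # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)
        V[t] = scores[back[t], np.arange(S)] + np.log(B[:, obs[t]])
    path = [int(np.argmax(V[-1]))]              # best final state
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```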
Acoustic Analysis of Keyboards
based on paper⁴
[Figures: recovered typed text; recovery with the HMM only; password retrieval]
⁴ "Keyboard Acoustic Emanations Revisited", 2005, by L. Zhuang, F. Zhou, and J.D. Tygar
Acoustic Analysis of Printers
based on recent paper⁵
[Plot: sound intensity of the printer over time]
Training:
  Feature extraction (split into words, noise reduction, etc.)
  Construct a DB with (word, sound) pairs
Online:
  Feature extraction (same as in training)
  For each word:
    Sort the DB by similarity/difference from the recorded sound
    Reorder the DB by n-gram/word distribution using an HMM
    Guess the printed word as the top candidate from the reordered DB
(a sketch of the online loop follows below)
⁵ "Acoustic Side-Channel Attacks on Printers", 2010, by M. Backes, M. Dürmuth, S. Gerling, M. Pinkal, C. Sporleder
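A hedged sketch of the online loop; `similarity` and `bigram_prob` are hypothetical stand-ins for the paper's audio-feature distance and HMM-based language model, and the shortlist size is arbitrary:

```python
def guess_word(sound, db, prev_word, similarity, bigram_prob, shortlist=50):
    """db: list of (word, sound) training pairs; returns the top candidate."""
    # 1. Sort the DB by similarity to the recorded sound.
    ranked = sorted(db, key=lambda e: similarity(sound, e[1]), reverse=True)
    # 2. Reorder the shortlist by word/n-gram likelihood (HMM stand-in).
    rescored = sorted(ranked[:shortlist],
                      key=lambda e: bigram_prob(prev_word, e[0]), reverse=True)
    # 3. Guess the printed word as the top candidate.
    return rescored[0][0]
```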
Further Reading I