Research Article
PyEEG: An Open Source Python Module for
EEG/MEG Feature Extraction
Copyright © 2011 Forrest Sheng Bao et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in recent years. Extracting features is a key component in the analysis of EEG signals. In our previous work, we implemented many EEG feature extraction functions in the Python programming language. As Python gains more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction.
2. Main Framework
PyEEG's target users are programmers (anyone who writes programs) working in computational neuroscience. Figure 1 shows its framework. PyEEG is a Python module that focuses only on extracting features from EEG/MEG segments. Therefore, it does not contain functions to import data of various formats or to export features to a classifier. This follows the modularity and composition principles of building open source software, which hold that small programs that work well together via simple interfaces are better than big monolithic programs. Since open source tools such as EEG/MEG data importers (e.g., EEGLab, Biosig) and classifier front-ends are already available, there is no need for us to reinvent the wheel. Users can easily hook PyEEG up with various existing open source software packages to build toolchains for their EEG/MEG research.

Figure 1: PyEEG framework.

PyEEG consists of two sets of functions.

(1) Preprocessing functions, which do not return any feature values. Only two such functions have been implemented so far: one builds an embedding sequence (from a given lag and embedding dimension), and the other computes the first-order differential sequence. Differential sequences of higher orders can be built by repeatedly applying first-order differencing. A brief sketch of both operations is given at the end of this section.

(2) Feature extraction functions, which return feature values. These are listed in Table 1.

PyEEG only uses functions in the standard Python library and SciPy, the de facto Python module for scientific computing. PyEEG does not define any new data structure, but instead uses only standard Python and NumPy data structures. The reason is that we want to simplify the use of PyEEG, especially for users without much programming background. The inputs of all functions are a time sequence, given as a list of floating-point numbers, and a set of optional feature extraction parameters. Parameters have default values. The output of a feature extraction function is a floating-point number if the feature is a scalar, or a list of floating-point numbers (a vector) otherwise. Details about the functions are available in the PyEEG reference guide at http://PyEEG.SourceForge.net/.
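For illustration, here is a minimal NumPy sketch of what these two preprocessing steps compute. The names build_embed_seq and first_order_diff are illustrative and not necessarily PyEEG's exact API; see the reference guide for the actual function signatures.

import numpy as np

def build_embed_seq(x, tau, d):
    # Build the embedding sequence for lag tau and embedding dimension d:
    # row i is [x[i], x[i + tau], ..., x[i + (d - 1) * tau]].
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (d - 1) * tau
    return np.array([x[i:i + (d - 1) * tau + 1:tau] for i in range(n_rows)])

def first_order_diff(x):
    # First-order differential sequence [x[1] - x[0], x[2] - x[1], ...];
    # applying it repeatedly yields higher-order differences.
    return np.diff(np.asarray(x, dtype=float))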
3. Supported Feature Extraction

In this section, we detail the definitions and computation procedures used to extract the EEG features (listed in Table 1) in PyEEG. Since there are many parameters and various algorithms for one feature, the numerical value of a feature extracted by PyEEG may differ from that extracted by other toolboxes. Users may need to adjust our code or use non-default parameter values to meet their needs. Please note that the index of an array or a vector starts from 1 rather than 0 in this section.

3.1. Power Spectral Intensity and Relative Intensity Ratio. For a time series [x1, x2, ..., xN], denote its Fast Fourier Transform (FFT) result as [X1, X2, ..., XN]. A continuous frequency band from $f_{\mathrm{low}}$ to $f_{\mathrm{up}}$ is sliced into K bins, which can be of equal width or not. Boundaries of the bins are specified by a vector band = [f1, f2, ..., fK], such that the lower and upper frequencies of the ith bin are fi and fi+1, respectively. Commonly used unequal bins are the EEG/MEG rhythms: δ (0.5–4 Hz), θ (4–7 Hz), α (8–12 Hz), β (12–30 Hz), and γ (30–100 Hz). For these bins, we have band = [0.5, 4, 7, 12, 30, 100].

The Power Spectral Intensity (PSI) [12] of the kth bin is evaluated as

\[
\mathrm{PSI}_k = \sum_{i = \lfloor N f_k / f_s \rfloor}^{\lfloor N f_{k+1} / f_s \rfloor} |X_i|, \qquad k = 1, 2, \ldots, K - 1, \tag{1}
\]

where fs is the sampling rate and N is the series length.

The Relative Intensity Ratio (RIR) [12] is defined on top of PSI:

\[
\mathrm{RIR}_j = \frac{\mathrm{PSI}_j}{\sum_{k=1}^{K-1} \mathrm{PSI}_k}, \qquad j = 1, 2, \ldots, K - 1. \tag{2}
\]

PSI and RIR are both vector features.
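As a concrete illustration of equations (1) and (2), the following sketch computes PSI and RIR with NumPy's FFT. It is a simplified re-implementation for clarity (the function name psi_rir is ours), not necessarily identical to PyEEG's own code.

import numpy as np

def psi_rir(x, band, fs):
    # band: bin boundaries in Hz, e.g. [0.5, 4, 7, 12, 30, 100]; fs: sampling rate.
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.abs(np.fft.fft(x))              # |X_i| in equation (1)
    psi = np.zeros(len(band) - 1)
    for k in range(len(band) - 1):                # k-th bin spans [f_k, f_{k+1})
        lo = int(np.floor(n * band[k] / fs))
        hi = int(np.floor(n * band[k + 1] / fs))
        psi[k] = np.sum(spectrum[lo:hi])
    rir = psi / np.sum(psi)                       # equation (2)
    return psi, rir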
Table 1: PyEEG-supported features and extraction functions with their return types.
3.2. Petrosian Fractal Dimension (PFD). For a time series, the PFD is defined as

\[
\mathrm{PFD} = \frac{\log_{10} N}{\log_{10} N + \log_{10}\bigl(N / (N + 0.4 N_\delta)\bigr)}, \tag{3}
\]

where N is the series length and N_δ is the number of sign changes in the signal derivative [13]. PFD is a scalar feature.
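Equation (3) translates almost directly into code; a possible sketch (the name petrosian_fd is illustrative):

import numpy as np

def petrosian_fd(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = np.diff(x)                             # signal derivative (first difference)
    n_delta = np.sum(diff[1:] * diff[:-1] < 0)    # number of sign changes
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))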
3.3. Higuchi Fractal Dimension (HFD). Higuchi's algorithm [14] constructs k new series from the original series [x1, x2, ..., xN] by

\[
x_m, \; x_{m+k}, \; x_{m+2k}, \; \ldots, \; x_{m + \lfloor (N - m)/k \rfloor k}, \tag{4}
\]

where m = 1, 2, ..., k.

For each time series constructed from (4), the length L(m, k) is computed as

\[
L(m, k) = \frac{\Bigl( \sum_{i=2}^{\lfloor (N-m)/k \rfloor} \bigl| x_{m+ik} - x_{m+(i-1)k} \bigr| \Bigr) (N - 1)}{\lfloor (N - m)/k \rfloor \, k}. \tag{5}
\]

The average length is computed as $L(k) = \bigl[\sum_{i=1}^{k} L(i, k)\bigr] / k$. This procedure is repeated for each k from 1 to kmax, and a least-squares fit is then used to determine the slope of the line that best fits the curve of ln(L(k)) versus ln(1/k). That slope is the Higuchi Fractal Dimension. HFD is a scalar feature.
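A compact sketch of Higuchi's procedure; kmax is the only parameter, and the function name higuchi_fd is illustrative rather than PyEEG's API:

import numpy as np

def higuchi_fd(x, kmax):
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_lk, log_inv_k = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(1, k + 1):                 # m = 1, ..., k as in equation (4)
            idx = np.arange(m - 1, n, k)          # x_m, x_{m+k}, x_{m+2k}, ...
            intervals = (n - m) // k
            if intervals < 1:
                continue
            # curve length L(m, k) of equation (5)
            lm = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / (intervals * k)
            lengths.append(lm)
        log_lk.append(np.log(np.mean(lengths)))
        log_inv_k.append(np.log(1.0 / k))
    slope, _ = np.polyfit(log_inv_k, log_lk, 1)   # slope of ln L(k) versus ln(1/k)
    return slope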
3.4. Hjorth Parameters. For a time series [x1, x2, ..., xN], the Hjorth mobility and complexity [15] are defined, respectively, as

\[
\sqrt{\frac{M2}{TP}} \qquad \text{and} \qquad \sqrt{\frac{M4 \cdot TP}{M2 \cdot M2}},
\]

where $TP = \sum_i x_i / N$, $M2 = \sum_i d_i / N$, $M4 = \sum_i (d_i - d_{i-1})^2 / N$, and $d_i = x_i - x_{i-1}$. Hjorth mobility and complexity are both scalar features.
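As a sketch, the following computes Hjorth mobility and complexity under the common convention that TP, M2, and M4 are the mean squared values of the signal, its first difference, and its second difference; this convention is our assumption here, and PyEEG's own normalization may differ.

import numpy as np

def hjorth_params(x):
    x = np.asarray(x, dtype=float)
    d = np.diff(x)                    # d_i = x_i - x_{i-1}
    dd = np.diff(d)                   # d_i - d_{i-1}
    tp = np.mean(x ** 2)              # TP (mean squared amplitude, assumed convention)
    m2 = np.mean(d ** 2)              # M2 (mean squared first difference)
    m4 = np.mean(dd ** 2)             # M4 (mean squared second difference)
    mobility = np.sqrt(m2 / tp)
    complexity = np.sqrt(m4 * tp / (m2 * m2))
    return mobility, complexity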
3.5. Spectral Entropy. The spectral entropy [16] is defined as

\[
H = -\frac{1}{\log(K)} \sum_{i=1}^{K} \mathrm{RIR}_i \log \mathrm{RIR}_i, \tag{6}
\]

where RIR_i and K are defined in (2). Spectral entropy is a scalar feature.
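Given the RIR vector of equation (2) (for example, from the psi_rir sketch above), equation (6) is a one-liner; in this sketch K is taken as the number of RIR entries:

import numpy as np

def spectral_entropy(rir):
    rir = np.asarray(rir, dtype=float)
    return -np.sum(rir * np.log(rir)) / np.log(len(rir))   # equation (6)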
3.6. SVD Entropy. Reference [17] defines an entropy measure using Singular Value Decomposition (SVD). Let the input signal be [x1, x2, ..., xN]. We construct delay vectors as

\[
\mathbf{y}(i) = \bigl[x_i, \, x_{i+\tau}, \, \ldots, \, x_{i+(d_E - 1)\tau}\bigr], \tag{7}
\]

where τ is the delay and dE is the embedding dimension. In this paper, dE = 20 and τ = 2. The embedding space is then constructed by

\[
Y = \bigl[\mathbf{y}(1), \, \mathbf{y}(2), \, \ldots, \, \mathbf{y}(N - (d_E - 1)\tau)\bigr]^{T}. \tag{8}
\]

The SVD is then performed on the matrix Y to produce M singular values, σ1, ..., σM, known as the singular spectrum. The SVD entropy is then defined as

\[
H_{\mathrm{SVD}} = -\sum_{i=1}^{M} \bar{\sigma}_i \log_2 \bar{\sigma}_i, \tag{9}
\]

where M is the number of singular values and $\bar{\sigma}_1, \ldots, \bar{\sigma}_M$ are the normalized singular values, $\bar{\sigma}_i = \sigma_i / \sum_{j=1}^{M} \sigma_j$. SVD entropy is a scalar feature.

3.7. Fisher Information. The Fisher information [18] can be defined on the normalized singular spectrum used in (9):

\[
I = \sum_{i=1}^{M-1} \frac{(\bar{\sigma}_{i+1} - \bar{\sigma}_i)^2}{\bar{\sigma}_i}. \tag{10}
\]

Fisher information is a scalar feature.
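A minimal sketch of equations (7)–(10), computing both SVD entropy and Fisher information from the same normalized singular spectrum; the function name and the defaults (τ = 2, dE = 20, as in the text) are illustrative:

import numpy as np

def svd_entropy_and_fisher(x, tau=2, de=20):
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (de - 1) * tau
    # Embedding matrix Y of equations (7)-(8): one delay vector per row
    Y = np.array([x[i:i + (de - 1) * tau + 1:tau] for i in range(n_rows)])
    sigma = np.linalg.svd(Y, compute_uv=False)               # singular spectrum
    sigma_bar = sigma / np.sum(sigma)                         # normalized singular values
    h_svd = -np.sum(sigma_bar * np.log2(sigma_bar))           # equation (9)
    fisher = np.sum(np.diff(sigma_bar) ** 2 / sigma_bar[:-1]) # equation (10)
    return h_svd, fisher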
3.8. Approximate Entropy. Approximate entropy (ApEn) is a statistical parameter that quantifies the regularity of a time series [19]. ApEn is computed by the following steps.

(1) Let the input signal be [x1, x2, ..., xN].

(2) Build the subsequences x(i, m) = [xi, xi+1, ..., xi+m−1] for 1 ≤ i ≤ N − m, where m is the length of each subsequence. In [7], m = 1, 2, or 3.

(3) Let r represent the noise filter level, defined as r = k × SD for k = 0, 0.1, 0.2, ..., 0.9, where SD is the standard deviation of the time series.

(4) Build the set of subsequences {x(j, m)} = {x(j, m) | j ∈ [1..N − m]}, where x(j, m) is defined in step (2).

(5) For each x(i, m) ∈ {x(j, m)}, compute

\[
C(i, m) = \frac{\sum_{j=1}^{N-m} k_j}{N - m}, \tag{11}
\]

where

\[
k_j =
\begin{cases}
1 & \text{if } \bigl| x(i, m) - x(j, m) \bigr| < r, \\
0 & \text{otherwise}.
\end{cases} \tag{12}
\]

(6) Finally,

\[
\mathrm{ApEn}(m, r, N) = \frac{1}{N - m} \sum_{i=1}^{N-m} \ln \frac{C(i, m)}{C(i, m + 1)}. \tag{13}
\]

ApEn is a scalar feature.
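A readable, if slow, sketch of steps (1)–(6). The vector distance in equation (12) is taken here as the maximum absolute difference between corresponding elements (the usual Chebyshev convention for ApEn), and the defaults m = 2 and r = 0.3 × SD are illustrative:

import numpy as np

def ap_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.3 * np.std(x)                       # noise filter level r = k * SD

    def correlations(mm):
        # subsequences of length mm (step 2) and C(i, mm) of equations (11)-(12)
        subs = np.array([x[i:i + mm] for i in range(n - m)])
        c = np.zeros(len(subs))
        for i in range(len(subs)):
            dist = np.max(np.abs(subs - subs[i]), axis=1)   # Chebyshev distance
            c[i] = np.sum(dist < r) / float(n - m)
        return c

    c_m, c_m1 = correlations(m), correlations(m + 1)
    return np.mean(np.log(c_m / c_m1))            # equation (13)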
3.9. Detrended Fluctuation Analysis. Detrended Fluctuation Analysis (DFA) was proposed in [20]. The procedure to compute the DFA of a time series [x1, x2, ..., xN] is as follows.

(1) First integrate x into a new series y = [y(1), ..., y(N)], where $y(k) = \sum_{i=1}^{k} (x_i - \bar{x})$ and $\bar{x}$ is the average of x1, x2, ..., xN.

(2) The integrated series is then sliced into boxes of equal length n. In each box of length n, a least-squares line is fit to the data, representing the trend in that box. The y coordinate of these straight-line segments is denoted by yn(k).

(3) The root-mean-square fluctuation of the integrated series is calculated as

\[
F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \bigl[ y(k) - y_n(k) \bigr]^2},
\]

where subtracting y_n(k) from y(k) is called detrending.

(4) The DFA value is defined as the slope of the line relating log F(n) to log n.

DFA is a scalar feature.
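A minimal sketch of steps (1)–(4); the set of box sizes n is a free choice and the values used here are only illustrative:

import numpy as np

def dfa(x, box_sizes=(4, 8, 16, 32, 64)):
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - np.mean(x))                 # step (1): integrated series
    log_n, log_f = [], []
    for n in box_sizes:
        n_boxes = len(y) // n
        if n_boxes < 1:
            continue                              # series too short for this box size
        sq = []
        for b in range(n_boxes):
            seg = y[b * n:(b + 1) * n]
            t = np.arange(n)
            trend = np.polyval(np.polyfit(t, seg, 1), t)    # step (2): local trend
            sq.append(np.mean((seg - trend) ** 2))          # step (3): detrending
        log_n.append(np.log(n))
        log_f.append(np.log(np.sqrt(np.mean(sq))))          # log F(n)
    slope, _ = np.polyfit(log_n, log_f, 1)        # step (4): slope of log F(n) vs log n
    return slope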
3.10. Hurst Exponent. The Hurst exponent (HURST) [21] is also called Rescaled Range statistics (R/S). To calculate the Hurst exponent of a time series X = [x1, x2, ..., xN], the first step is to calculate the accumulated deviation from the mean within a range T:

\[
X(t, T) = \sum_{i=1}^{t} (x_i - \bar{x}), \qquad \bar{x} = \frac{1}{T} \sum_{i=1}^{T} x_i, \qquad t \in [1..N]. \tag{14}
\]

Then, R(T)/S(T) is calculated as

\[
\frac{R(T)}{S(T)} = \frac{\max\bigl(X(t, T)\bigr) - \min\bigl(X(t, T)\bigr)}{\sqrt{(1/T) \sum_{t=1}^{T} \bigl[ x(t) - \bar{x} \bigr]^2}}. \tag{15}
\]

The Hurst Exponent is obtained as the slope of the line produced by ln(R(n)/S(n)) versus ln(n) for n ∈ [2..N]. The Hurst Exponent is a scalar feature.
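A straightforward, O(N²) sketch of the rescaled-range calculation in equations (14) and (15); it favors clarity over speed:

import numpy as np

def hurst(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_t, log_rs = [], []
    for t in range(2, n + 1):                     # window sizes T = 2, ..., N
        window = x[:t]
        dev = np.cumsum(window - np.mean(window)) # accumulated deviation, equation (14)
        r = np.max(dev) - np.min(dev)             # range R(T)
        s = np.std(window)                        # S(T), the standard deviation
        if s == 0:
            continue                              # skip degenerate windows
        log_t.append(np.log(t))
        log_rs.append(np.log(r / s))
    slope, _ = np.polyfit(log_t, log_rs, 1)       # slope of ln(R/S) versus ln(n)
    return slope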
(4) The fluctuation can be defined as the slope of the line >>> &' !!!
relating log F(n) to log n.
>>> &'
DFA is a scalar feature.
()**+,-)(+-.*)
3.10. Hurst Exponent. The hurst exponent (HURST) [21] is >>> / 0
1/!!
also called Rescaled Range statistics (R/S). To calculate the >>> / 0
hurst exponent for time series X = [x1 , x2 , . . . , xN ], the first
,(*..+(++),2*
step is to calculate the accumulated deviation from the mean
of time series within range T >>> '& !!
>>> '&
1
t T
X(t, T) = (xi − x), where x = xi , t ∈ [1..N ]. *(,*(.+2)(-.+
i=1
T i=1
(14) Due to space limitations, we are not able to print all
feature values of all EEG segments. Instead, we visualize
Then, R(T)/S(T) is calculated as
the averages of the features (except RIR and PSI) within
R(T) max(X(t, T)) − min(X(t, T)) each of the five sets in Figure 2. Error bars represent the
= . (15)
S(T) (1/T) Tt=1 [x(t) − x]2 variances of features in each set. PSIs for five sets are
plotted in Figure 3. Users can replot these pictures and get
The Hurst Exponent is obtained by calculating the slope of averages of features on Python interpreter by a testing script
the line produced by ln(R(n)/S(n)) versus ln(n) for n ∈ (http://code.google.com/p/pyeeg/wiki/TestScript) from our
[2..N ]. Hurst Exponent is a scalar feature. project website.
Figure 2: Averages of the features (except RIR and PSI) within each of the five sets; panels include the Petrosian fractal dimension, Higuchi fractal dimension, Fisher information, and detrended fluctuation analysis.
Table 2: Values of parameters used in our example.

    Parameter name    Value                  In feature(s)
    kmax              5                      HFD
    τ                 4                      SVD Entropy
    dE                10                     Fisher Information
    r                 0.3σ*                  ApEn
    m                 10                     ApEn
    fs                173                    Spectral Entropy
    band              [1, 3, 5, ..., 85]     PSI and RIR

*σ is the standard deviation of the EEG segment.

From Figures 2 and 3, we can see that the healthy, interictal, and ictal EEG signals have different distributions for most features. Table 2 lists the parameters used in this experiment.

5. Discussion and Future Development

So far, we have listed the features that can be extracted by PyEEG and their definitions. Our implementation sticks to these definitions precisely, even though faster algorithms may exist. There are many other EEG features, such as Lyapunov exponents, that have not yet been implemented in PyEEG. More EEG features will be added to PyEEG in the future as we finish unit testing and documentation for each function. In personal emails, some open source projects, such as ConnectomeViewer (http://www.connectomeviewer.org/viewer) and NIPY/PBrain (http://nipy.sourceforge.net/pbrain/), have expressed interest in including PyEEG in their code. Therefore, we will keep maintaining PyEEG as long as it can benefit the entire computational neuroscience community.

Availability

The software is released under GNU GPL v.3 at Google Code: http://code.google.com/p/pyeeg/. No commercial software is required to run PyEEG. Because Python is cross-platform, PyEEG can run on Linux, Mac OS, and Windows.
Figure 3: Average PSI of each set. Note that the y-axis scale for set E is much larger than that of the other sets.