
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/333802253

[Short Version] A Survey on Deep Learning based Brain-Computer Interface: Recent Advances and New Frontiers

Preprint · May 2019


All content following this page was uploaded by Xiang Zhang on 15 June 2019.


A Survey on Deep Learning based Brain Computer Interface:


Recent Advances and New Frontiers
XIANG ZHANG, University of New South Wales
LINA YAO, University of New South Wales
XIANZHI WANG, University of Technology Sydney
JESSICA MONAGHAN, Macquarie University
DAVID MCALPINE, Macquarie University
YU ZHANG, Stanford University

Brain-Computer Interface (BCI) bridges the human neural world and the outer physical world by decoding
individuals’ brain signals into commands recognizable by computer devices. Deep learning has enhanced the
performance of brain-computer interface systems significantly in recent years. In this article, we systematically
investigate brain signal types for BCI and related deep learning concepts for brain signal analysis. We
then present a comprehensive survey of deep learning techniques used for BCI, by summarizing over 230
contributions, most published in the past five years. Finally, we discuss the applied areas, emerging challenges,
and future directions for deep learning-based BCI.

Additional Key Words and Phrases: Brain-Computer Interface, deep learning, survey
ACM Reference format:
Xiang Zhang, Lina Yao, Xianzhi Wang, Jessica Monaghan, David McAlpine, and Yu Zhang. 2016. A Survey
on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers. 1, 1, Article 1
(January 2016), 66 pages.
DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Brain-Computer Interface (BCI)¹ is a system that translates activity patterns of the human brain
into messages or commands to communicate with the outer world [119]. BCI underpins many novel
applications that are important to people’s daily lives, especially for people with psychological or
physical diseases or disabilities. For example, ordinary individuals can enjoy enhanced entertainment
and security when brain-wave-based techniques are applied to highly fake-resistant user identification
[249]. As another example, BCI can assist the disabled, the elderly, and people with limited motion
ability (e.g., people with muscle diseases) in controlling wheelchairs, home appliances, and robots.
The key challenge of BCI is to recognize human intents accurately given the low Signal-to-Noise
Ratio (SNR) of brain signals. Both low classification accuracy and poor generalization ability limit
the real-world application of BCI.

1 There are several terms similar to BCI, e.g., Brain Machine Interface (BMI), Brain Interface (BI),
Direct Brain Interface (DBI), and Adaptive Brain Interface (ABI). They all describe machines that are
directly controlled by human brain signals.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2016 ACM. XXXX-XXXX/2016/1-ART1 $15.00
DOI: 10.1145/nnnnnnn.nnnnnnn

, Vol. 1, No. 1, Article 1. Publication date: January 2016.



(a) # of papers for all the BCI signals (b) # of papers for the subcategories of EEG

Fig. 1. Breakdown of the papers included in this survey by year of publication and BCI signal. Not all papers
are counted for the years 2018 and 2019 due to limited availability of the related data.

To overcome the above challenges, deep learning techniques, i.e., deep neural networks, have
been investigated for processing brain signals in the past few years. Deep learning is a
sub-field of machine learning inspired by the structure and function of the brain. It has shown
excellent representation learning ability since 2006 [42] and has therefore influenced a wide range
of information-processing domains such as computer vision, natural language processing, activity
recognition, and logic reasoning [217]. Unlike traditional machine learning algorithms,
deep learning can learn distinct high-level features from raw brain signals without manual feature
selection, and its accuracy scales well with the size of the training set.

1.1 Why Deep Learning?


Although traditional BCI systems have made tremendous progress [2, 20] in the past decades, the
research in BCI still faces significant challenges. First, brain signals are easily corrupted by various
biological (e.g., eye blinks, muscle artifacts, fatigue and concentration level) and environmental
artifacts (e.g., environmental noise) [2]. Therefore, it is crucial to distill informative data from
corrupted brain signals and build a robust BCI system that works under different situations.
Second, BCI has a low SNR due to the non-stationary nature of electrophysiological brain signals
[168]. Although several preprocessing and feature engineering methods have been developed
to decrease the noise level, such methods (e.g., feature selection and extraction both in the time
domain and frequency domain) are time-consuming and may cause information loss in the extracted
features [250].
Third, feature engineering highly depends on human expertise in the specific domain. For
example, it requires basic knowledge of biology to investigate the sleep state through EEG signals.
Human experience may help capture features on some particular aspects but prove insufficient
in more general conditions. Therefore, an algorithm is required to extract representative features
automatically.
Moreover, most existing machine learning research focuses on static data and therefore cannot
classify rapidly changing brain signals accurately. For example, the state-of-the-art classification
accuracy for motor imagery EEG is merely 60% to 80% [120], which is too low for practical use.
Novel learning methods are therefore required to deal with dynamic data streams in BCI systems.


Table 1. Existing surveys on BCI in the last decade. The column ‘Comprehensive on Signals?’ indicates
whether the survey has summarized all the BCI signals or not. A dash marks a cell that is empty in the table.

No. | Reference | Comprehensive on Signals? | Signal | Deep Learning | Publication Time | Area
1 | [48] | No | EMG, EOG | No | 2007 | -
2 | [222] | No | fMRI | Yes | 2018 | Mental Disease Diagnosis
3 | [120] | Partial | EEG (mainly MI EEG and P300) | No | 2007 | Classification
4 | [119] | Partial | EEG (mainly MI EEG, P300) | Partial | 2018 | Classification
5 | [134] | Partial | EEG (ERD, P300, SSVEP, VEP, AEP) | No | 2007 | -
6 | [113] | No | MRI, CT | Partial | 2017 | Medical Image Analysis
7 | [233] | No | EEG | Yes (but without any model introduction) | 2019 | -
8 | [20] | No | EEG | No | 2007 | Signal Processing
9 | [220] | Partial | EEG | No | 2016 | BCI Applications
10 | [2] | Yes | - | No | 2015 | -
11 | [140] | No | EEG | Partial (only DBN) | 2018 | -
12 | [184] | No | EEG, fMRI | No | 2015 | Neurorehabilitation of Stroke
13 | [5] | No | MI EEG | No | 2015 | -
14 | [165] | No | fMRI | No | 2014 | -
15 | [63] | No | ERP (P300) | No | 2017 | Applications of ERP
16 | [114] | No | fMRI | Yes | 2018 | Applications of fMRI
17 | [206] | No | ERP | No | 2017 | Classification
18 | [61] | Partial | EEG | No | 2019 | Brain Biometrics
19 | Ours | Yes | Invasive, EEG and the subcategories, fNIRS, fMRI, EOG, MEG | Yes | - | -

To date, deep learning has been applied extensively in BCI applications and has shown success
in addressing the above challenges [30, 124]. Deep learning possesses two advantages. First, it
avoids the time-consuming preprocessing and feature engineering steps by working directly on raw
brain signals to learn distinguishable information through back-propagation. Second, deep neural
networks can capture both representative high-level features and latent dependencies through
deep structures. Our investigation (Figure 1) shows a surge of publications in deep learning based
BCI since 2014.

1.2 Why Is this Survey Necessary?


We conduct this survey for three reasons. First, a systematic and comprehensive introduction
to BCI signals is lacking. Table 1 summarizes the existing surveys on BCI. To the best of our
knowledge, the limited existing surveys [2, 20, 48, 114, 119, 120, 134, 222] only focus on partial EEG
signals. For example, Lotte et al. [120] and Wang et al. [220] focus on EEG without analyzing EEG
signal types; Cecotti et al. [32] focus on Event-Related Potentials (ERP); Haseer et al. [142] focus
on functional near-infrared spectroscopy (fNIRS); Fatourechi et al. [48] only focus on EMG and
EOG; Wen et al. [222] and Liu et al. [114] only focus on functional magnetic resonance imaging
(fMRI); Mason et al. [134] briefly describe neurological phenomena such as event-related
desynchronization (ERD), P300, SSVEP, Visual Evoked Potentials (VEP), and Auditory Evoked Potentials
(AEP) but have not organized them systematically; Abdulkader et al. [2] present a taxonomy of brain
signals but have not mentioned spontaneous EEG and Rapid Serial Visual Presentation (RSVP); Lotte et al.
[119] have not considered ERD and RSVP; Roy et al. [233] list some deep learning based EEG
studies but provide little analysis.
Second, although some overviews have been conducted in deep learning ([42, 43, 174]) and BCI
([2, 20, 48, 119, 120, 134]), few focus on their combination. To the best of our knowledge, this paper
is the first comprehensive summary of the recent advances in deep learning-based BCI. This work
also presents the current frontiers and potential directions in research.
Lastly, unlike this survey, all previous BCI surveys focus on specific areas or applications without
giving an overview of the broad scenarios. For example, Litjens et al. [113] review some leading
deep learning concepts pertinent to medical image analysis without covering many other deep
learning models; Soekadar et al. [184] review the BCI systems and machine learning methods that
help overcome stroke-related motor paralysis and focus on Sensorimotor Rhythms (SMR); Vieira et
al. [213] investigate the application of BCI systems to neurological and psychiatric disorders.

1.3 Our Contributions


This survey aims to present a comprehensive and systematic introduction to the recent advances
and new frontiers of deep learning based brain-computer interface techniques. We summarize over
240 contributions in this field, most of which were published in the last five years (after 2014). We
make several key contributions in this survey:
• We provide the first comprehensive summary of the brain signals used for BCI. This is also the
first investigation of the biometric signals in deep learning based BCI.
• We summarize deep learning techniques for BCI applications. To the best of our knowledge, we
are the first to systematically review deep learning models for BCI.
• We provide guidelines for choosing a suitable deep learning model for a specific BCI system
and a specific brain signal type.
• We discuss the challenges of deep learning based BCI and highlight promising topics
for future research.
The rest of this survey is structured as follows. Section 2 briefly introduces the paradigm of BCI
systems. Section 3 gives a comprehensive introduction to the biometric signals used in BCI. Section 4
overviews the commonly used deep learning models. Section 5 presents the state-of-the-art deep
learning techniques for BCI. Section 6 discusses the applications related to brain signals. Section 7
points out the emerging challenges and future directions. Finally, Section 8 gives the concluding
remarks.

2 GENERAL BCI SYSTEM


Figure 2 shows the general paradigm of a BCI system, which receives brain signals and converts
them into control commands for computers. The system includes several key components: brain
signal collection, signal preprocessing, feature engineering, classification, and smart equipment.
The brain signals are collected from humans and sent to the preprocessing component for denoising
and enhancement. Then, the discriminating features are extracted from the processed signals and
sent to the classifier, which recognizes the signals and converts them into external device commands.
The collection methods differ from signal to signal. For example, EEG signals measure the voltage
fluctuation resulting from ionic current within the neurons of the brain. Collecting EEG signals
requires placing a series of electrodes on the scalp of the human head to record the electrical
activity of the brain. Since the ionic current generated within the brain is measured at the scalp,
obstacles (e.g., the skull) greatly decrease the signal quality: the fidelity of the collected EEG
signals, measured as Signal-to-Noise Ratio (SNR), is approximately 5% of that of the original brain
signals [17]. Therefore, brain signals are usually preprocessed before feature engineering to increase
the SNR. The preprocessing component contains multiple steps such as signal cleaning (smoothing the
noisy signals or resolving the inconsistencies), signal normalization (normalizing each channel of
the signals along the time axis), signal enhancement (removing direct current), and signal reduction
(presenting a reduced representation of the signal).
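
The four preprocessing steps named above can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the procedure of any surveyed work: the function names, the moving-average window, the z-score normalization, and the naive decimation factor are all hypothetical choices, and the recording is a toy list of channels.

```python
from statistics import mean, pstdev


def remove_dc(channel):
    """Signal enhancement: subtract the channel mean (the direct-current offset)."""
    m = mean(channel)
    return [x - m for x in channel]


def smooth(channel, width=3):
    """Signal cleaning: centred moving average; the window shrinks at the edges."""
    half = width // 2
    return [mean(channel[max(0, i - half): i + half + 1])
            for i in range(len(channel))]


def zscore(channel):
    """Signal normalization along the time axis: zero mean, unit variance."""
    m, s = mean(channel), pstdev(channel)
    if s == 0:
        return [0.0 for _ in channel]  # a flat channel carries no information
    return [(x - m) / s for x in channel]


def downsample(channel, factor=2):
    """Signal reduction: keep every `factor`-th sample (naive decimation)."""
    return channel[::factor]


def preprocess(recording, width=3, factor=2):
    """Apply the four steps to each channel of a [channels][time] recording."""
    return [downsample(zscore(smooth(remove_dc(ch), width)), factor)
            for ch in recording]


# Toy two-channel recording (assumed values, arbitrary units).
recording = [[5.0, 6.0, 7.0, 6.0, 5.0, 4.0, 5.0, 6.0], [1.0] * 8]
clean = preprocess(recording)
```

Real pipelines typically apply a proper low-pass filter before reducing the sampling rate; here the moving average only plays that role crudely.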


Fig. 2. General workflow of a BCI system.

Feature engineering refers to the process of extracting discriminating features from the input
signals through domain knowledge. Traditional features are extracted from the time domain (e.g.,
variance, mean value, kurtosis), the frequency domain (e.g., fast Fourier transform), and the
time-frequency domain (e.g., discrete wavelet transform). These features enrich the distinguishable
information regarding user intention. Feature engineering is highly dependent on domain knowledge.
For example, biomedical knowledge is required to extract features from brain signals of epileptic
seizures. Manual feature extraction is also time-consuming and difficult. Recently, deep learning has
provided a better option to automatically extract distinguishable features.
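
As a concrete illustration of the hand-crafted features mentioned above, the following sketch computes the mean, variance, and kurtosis of a signal and finds its dominant frequency with a naive discrete Fourier transform. The sampling rate, the synthetic 10 Hz sine, and all function names are assumptions made for the example; a real system would use a fast Fourier transform from a numerical library instead of the O(n²) loop shown here.

```python
import cmath
import math


def time_features(x):
    """Time-domain features: mean, variance, and (non-excess) kurtosis."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    # Kurtosis = fourth central moment / variance squared (3.0 for a Gaussian).
    kurt = (sum((v - mu) ** 4 for v in x) / n) / var ** 2 if var > 0 else 0.0
    return {"mean": mu, "variance": var, "kurtosis": kurt}


def dft_magnitudes(x):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n//2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_frequency(x, fs):
    """Frequency-domain feature: frequency (Hz) of the largest non-DC bin."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * fs / len(x)


fs = 128  # assumed sampling rate in Hz
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s of a 10 Hz sine
features = time_features(signal)
features["dominant_hz"] = dominant_frequency(signal, fs)
```

For this synthetic sine the variance is 0.5 and the kurtosis 1.5, which shows how such features summarize a whole window of samples in a few numbers.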
The classification component refers to the machine learning algorithms that classify the extracted
features into logical control signals recognizable by external devices. Deep learning algorithms are
shown to be more powerful than traditional classifiers such as Support Vector Machine (SVM) and
Linear discriminant analysis (LDA).
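
To make the role of a traditional classifier concrete, here is a minimal sketch of a two-class Fisher linear discriminant on two-dimensional feature vectors. It is only an illustration of LDA as named above, under the assumptions of equal class priors and a shared covariance; the feature values are synthetic, and the function names (`fit_lda`, `predict`) are hypothetical rather than taken from any library.

```python
def mean_vec(rows):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """2x2 within-class scatter matrix around the class mean `mu`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_lda(class0, class1):
    """Return (w, b) with w = Sw^-1 (mu1 - mu0) and the boundary at the class midpoint."""
    mu0, mu1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def predict(model, x):
    """Map a feature vector to a class label (0 or 1), i.e. a logical control signal."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


# Synthetic 2-D features for two mental states (assumed values).
rest = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
task = [(5.0, 6.0), (6.0, 5.0), (5.5, 6.5), (6.5, 5.5)]
model = fit_lda(rest, task)
labels = [predict(model, x) for x in [(1.0, 1.0), (6.0, 6.0)]]
```

The predicted label would then be mapped to a device command; deep learning models replace both the hand-made features and this linear decision rule.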
In this survey, we summarize the state-of-the-art studies that adopt deep learning models
(detailed in Section 5): 1) for feature engineering only; 2) for classification only; 3) for both
feature engineering and classification. BCI has vast potential applications for both the disabled
and ordinary individuals. For instance, a BCI system can control household appliances through
patients’ brain signals. Such a system can also serve for entertainment and security purposes. More
BCI applications based on deep learning are introduced in Section 6.

3 BCI SIGNAL RECORDING


In this section, we present a comprehensive and systematic introduction to the brain signals used in
BCI systems. Figure 3 shows a taxonomy of brain signals, divided into invasive and non-invasive
signals according to the signal collection method. Invasive signals are collected from the
surface of the cortex or beneath it (Section 3.1); non-invasive signals are collected
by external sensors (Section 3.2). The EEG signal plays a dominant role among non-invasive signals.
Therefore, we introduce the EEG signal and its subordinate categories in particular in Section 3.3.
The basic characteristics of various brain signals are summarized in Table 2.

[Figure 3: tree diagram of BCI signals. Invasive signals: Intracortical, ECoG. Non-invasive signals:
EEG, fNIRS, fMRI, EOG, MEG. EEG nodes: Spontaneous EEG (Sleeping, Motor Imagery, Emotional EEG,
Mental Disease, Others); Evoked Potential (EP), comprising Event-Related Potential (ERP: Visual
Evoked Potential (VEP) with Rapid Serial Visual Presentation (RSVP), Auditory Evoked Potential (AEP)
with Rapid Serial Auditory Presentation (RAVP), Somatosensory Evoked Potential (SEP)) and Steady
State Evoked Potentials (SSEP: SSVEP, SSAEP, SSSEP); and Event Related
Desynchronization/Synchronization (ERD/ERS).]

Fig. 3. The biometric signals generally used in BCI systems. The dashed quadrilaterals (Intracortical, RAVP,
SEP, SSAEP, and SSSEP) are not included in this survey because no existing work applies deep learning
algorithms to them. P300, which is a positive potential recorded approximately 300 ms after
the onset of presented stimuli, is not listed in this signal tree because it is included in ERP (which
refers to all the potentials after the presented stimuli).

3.1 Invasive Recording


Invasive recordings are acquired by electrodes deployed under the scalp. Figure 4 [102] shows both
‘intraparenchymal signals’ gathered from within the cortex and ‘Electrocorticography (ECoG)’²
gathered from the surface of the cortex (dura and arachnoid).

2 Some studies describe intracortical recording as ‘invasive’ and ECoG as ‘semi-invasive’. In this
survey, we group both under ‘invasive’ since both require surgery.


Fig. 4. Signal source locations in the brain

Table 2. Summary of various brain signals’ characteristics.

Signals | Intracortical (invasive) | ECoG (invasive) | EEG (non-invasive) | fNIRS (non-invasive) | fMRI (non-invasive) | EOG (non-invasive) | MEG (non-invasive)
Risk | High | High | Low | Low | Low | Low | Low
Spatial resolution | Very High | High | Low | Medium | High | Low | Medium
Temporal resolution | High | High | Medium | Low | Low | Medium | High
Signal-to-Noise Ratio | High | High | Low | Low | Medium | Medium | Low
Portability | Medium | Medium | High | High | Low | High | Low
Cost | High | High | Low | Low | High | Low | High
Characteristic | Electrical | Electrical | Electrical | Optical | Metabolic | Electrical | Magnetic

Invasive techniques can provide high-quality brain signals as electrodes collect signals directly
from locations near the brain neurons. The collected signals have high temporal and spatial
resolution³ and high SNR. Nevertheless, invasive methods suffer from two challenges. First, the
implantation of electrodes requires a surgical procedure, which is expensive and risky due to
potential medical complications such as transplant rejection. Second, implanted electrodes are
fixed and therefore can only measure the brain signals from the same locations. For the above
reasons, invasive BCI techniques are in practice mainly used in animals (e.g., monkeys and rats) and
for people with severe disabilities (e.g., ALS patients) [2].
3.1.1 Intracortical. The intracortical recording technique involves the insertion of electrodes
into the cortex of the subject’s brain (Figure 4). The implanted microelectrode can be a single
electrode or an array of electrodes. Generally, intracortical electrodes provide high-resolution
motor-control brain signals, as movement is more easily observable than other phenomena such as
hearing. Under the cortex, the electrodes are sensitive enough to pick up
the discrete all-or-none output of single neurons, the action potential, commonly referred to as a
“spike”, as well as the summed voltage fluctuations from small to large numbers of neurons, called
field potentials. Each electrode records spiking from up to a few neurons, yielding the population’s
time-evolving output pattern. These neurons represent only a small sample of the entire set of
neurons in this limited region, as spiking can only be detected by microelectrodes in close proximity
to a neuron [76]. The authors of [151] developed a high-performance BCI system for communication by
ALS patients. They implanted a 96-channel silicon microelectrode array in the motor cortex area
corresponding to the hand and recorded users’ motor intention with the array. The recorded signals
were then decoded into point-and-click commands to control a cursor.

3 Spatial resolution refers to how well the signal discriminates between nearby locations.

(a) ECoG microelectrodes (b) ECoG signals

Fig. 5. ECoG grid on cortical surface and ECoG signals.
3.1.2 Electrocorticography (ECoG). Electrocorticography (ECoG) is an extracortical invasive
electrophysiological monitoring method to record brain activity. The electrodes collecting ECoG
are attached under the skull, above (epidural) or below (subdural) the dura mater, but not within the
brain parenchyma itself (Figure 4) [102]. ECoG provides a trade-off: it offers a higher SNR than
non-invasive recordings and a lower risk than intracortical recordings. In other words, it combines
relatively high spatial resolution and SNR with a lower surgical risk. Therefore, ECoG has a better
prospect in the medical arena than intracortical recordings.
The ECoG collection approach and the signals are shown in Figure 5 [19]. ECoG signals have a
higher amplitude than non-invasive brain wave signals. For instance, ECoG has a maximum amplitude
higher than 50 µV, while the EEG amplitude is generally lower than 20 µV. The higher
amplitude renders ECoG less vulnerable to artifacts such as eye blink activity. Moreover, ECoG has
a bandwidth of 0-500 Hz, much wider than that of EEG (0-40 Hz), which is limited by the low-pass
filtering effect of the skull. The wider frequency bands carry substantial information from functional
areas of the brain (e.g., motor and language) and thus can be used to train a higher-performance BCI
system. However, the disadvantages of invasive methods like ECoG (such as the risky surgery and the
inconvenience of permanently attached devices) naturally limit their wide deployment in real-world
scenarios.

3.2 Noninvasive Recording


Noninvasive recordings can gather the user’s brain information without inserting electrodes. Signals
can be collected using electrical, magnetic or metabolic methods. Noninvasive signals mainly
include Electroencephalography (EEG), Functional near-infrared spectroscopy (fNIRS), Functional
magnetic resonance imaging (fMRI), Electrooculography (EOG), and Magnetoencephalography
(MEG). EEG-related studies account for the large majority of work on noninvasive signals, and EEG has
numerous sub-classes. We will introduce more details and sub-classes of EEG in Section 3.3.
3.2.1 Electroencephalography (EEG). Electroencephalography (EEG) is the most commonly
used noninvasive technique for measuring brain activities. EEG monitors the voltage fluctuations
generated by an electrical current within human neurons. Electrodes placed on the scalp measure
the amplitude of EEG signals. EEG signals have a low spatial resolution because the number of
electrodes is limited. EEG electrode locations generally follow the international 10-20 system or
the intermediate 10% electrode positions [125]. The international 10-20 system divides the scalp
in 10% and 20% intervals and contains 21 electrode locations in total (Figure 6). The intermediate
10% electrode position system is standardized by the American Electroencephalographic Society and
splits the scalp in 10% intervals, giving 75 electrodes. Existing EEG collection systems
generally use fewer than 75 electrodes: for example, 64 electrodes (BCI 2000 system), 32 electrodes
(OpenBCI headset), 14 electrodes (Emotiv EPOC+ headset), five electrodes (Emotiv Insight headset),
and one electrode (Mindware headset).

Fig. 6. (A) and (B) are the left and top views of the international 10-20 system. (C) presents the intermediate
10% electrode positions.

(a) EEG (b) EEG signals

Fig. 7. EEG collection scenario and the gathered signals. The subject is undertaking an imagination task.

Table 3. EEG patterns and corresponding characteristics. Awareness Degree denotes the degree of
being aware of the external world.

Patterns | Frequency (Hz) | Amplitude | Brain State | Awareness Degree | Produced Location
Delta | 0.5-4 | Higher | Deep sleep pattern | Lower | Frontally and posteriorly
Theta | 4-8 | High | Light sleep pattern | Low | Entorhinal cortex, hippocampus
Alpha | 8-12 | Medium | Closing the eyes, relaxed state | Medium | Posterior regions of head
Beta | 12-30 | Low | Active thinking, focus, high alert, anxious | High | Most evident frontally
Gamma | 30-100 | Lower | During cross-modal sensory processing | Higher | Somatosensory cortex
The temporal resolution of EEG signals is much better than the spatial resolution. The ionic
current changes rapidly, which offers a temporal resolution higher than 1000 Hz. The SNR of
EEG is generally very poor due to both objective and subjective factors. Objective factors include
environmental noise, the obstruction of the skull and other tissues between cortex and scalp, and
different stimulations. Subjective factors include the subject’s mental state, fatigue status, and the
variance among different subjects.
EEG recording equipment can be installed in a cap-like headset. As shown in Figure 7 [250], the
EEG headset can be mounted on the user’s head to gather signals. Compared to other equipment
used to measure brain signals, EEG headsets are portable and more accessible for most applications.
The EEG signals collected from any typical EEG hardware have several non-overlapping frequency
bands (Delta, Theta, Alpha, Beta, and Gamma) based on the strong intra-band correlation with a
distinct behavioral state [250]. Each EEG pattern contains signals associated with particular brain
information. Table 3 shows EEG frequency patterns and the corresponding characteristics. In this
paper the degree of awareness denotes the perception of individuals when presented with external
stimuli. Each frequency band represents a brain state and a qualitative assessment of awareness:
• Delta pattern (0.5 − 4 Hz) corresponds to deep sleep when the subject has lower awareness.
• Theta pattern (4 − 8 Hz) corresponds to light sleep in the realm of low awareness.
• Alpha pattern (8 − 12 Hz) mainly occurs when the eyes are closed and the subject is deeply
relaxed, and corresponds to medium awareness.
• Beta pattern (12 − 30 Hz) is the dominant rhythm while the subject’s eyes are open and is
associated with high awareness. Beta patterns capture most of our daily activities (such as
eating, walking, and talking).
• Gamma pattern (30 − 100 Hz) represents the co-interaction of several brain areas to carry
out a specific motor and cognitive function. This pattern is associated with the highest
awareness.
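
The band definitions above can be turned into a small lookup and a band-power estimate. The sketch below is illustrative only: it treats each band as a half-open interval (an assumption, since adjacent bands share an edge in the list), uses a naive discrete Fourier transform, and analyses a synthetic signal with an assumed 256 Hz sampling rate.

```python
import cmath
import math

# Band edges in Hz from the list above; the upper edge is exclusive in this sketch.
BANDS = [("delta", 0.5, 4.0), ("theta", 4.0, 8.0), ("alpha", 8.0, 12.0),
         ("beta", 12.0, 30.0), ("gamma", 30.0, 100.0)]


def band_of(freq_hz):
    """Return the name of the EEG band containing `freq_hz`, or None."""
    for name, low, high in BANDS:
        if low <= freq_hz < high:
            return name
    return None


def band_powers(x, fs):
    """Sum the squared, length-normalized DFT magnitudes that fall in each band."""
    n = len(x)
    powers = {name: 0.0 for name, _, _ in BANDS}
    for k in range(1, n // 2 + 1):
        name = band_of(k * fs / n)
        if name is None:
            continue  # bin lies outside 0.5-100 Hz
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        powers[name] += abs(coeff) ** 2 / n ** 2
    return powers


fs = 256  # assumed sampling rate in Hz
# Synthetic 1 s signal: strong 10 Hz (alpha) plus weaker 20 Hz (beta) component.
signal = [math.sin(2 * math.pi * 10 * t / fs) + 0.3 * math.sin(2 * math.pi * 20 * t / fs)
          for t in range(fs)]
powers = band_powers(signal, fs)
dominant = max(powers, key=powers.get)
```

Relative band powers of this kind are a common hand-crafted EEG feature; the dominant band gives a rough indication of the brain state described in Table 3.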
3.2.2 Functional Near-infrared Spectroscopy (fNIRS). Functional near-infrared spectroscopy
(fNIRS) is a noninvasive functional neuro-imaging technology using near-infrared (NIR) light [143].
Specifically, fNIRS employs NIR light to measure the concentration of oxygenated hemoglobin
(Hb) and deoxygenated hemoglobin (deoxy-Hb), because Hb and deoxy-Hb have a higher absorbance
of light than other head components such as the skull and scalp. fNIRS relies on the blood-oxygen-
level-dependent (BOLD) response, or hemodynamic response, to form a functional neuro-image.
The BOLD response reflects the level of oxygenated or deoxygenated blood in the brain. The
relative levels reflect the blood flow and neural activation, where increased blood flow implies a
higher metabolic demand caused by active neurons. For example, when the user is concentrating
on a mental task, the prefrontal cortex neurons will be activated, and the BOLD response in the
prefrontal cortex area will be stronger [72].

(a) fNIRS collection equipment (b) fNIRS signals

Fig. 8. fNIRS collection equipment and the gathered signals
Figure 8⁴ shows the fNIRS collection hardware and the collected signals. Single or multiple
emitter-detector pairs measure the Hb and deoxy-Hb: the emitter transmits NIR light through the
blood vessels to the detector. Most existing studies use fNIRS technologies to measure the status
of the prefrontal and motor cortex. The former responds to mental tasks and music/image imagery,
while the latter responds to motor-related tasks (e.g., motor imagery). The monitored Hb
and deoxy-Hb change slowly since the blood speed varies at a relatively slow rate compared
to electrical signals. Therefore, fNIRS signals have a lower temporal resolution⁵ than
electrical or magnetic signals. The spatial resolution depends on the number of emitter-detector
pairs. In current studies, three emitters and eight detectors would suffice for adequately acquiring
the prefrontal cortex signals, and six emitters and six detectors would suffice for covering the
motor cortex area [142]. fNIRS has a drawback in that it cannot be used to measure cortical activity
occurring deeper than 4 cm in the brain, due to the limitations in light emitter power and spatial
resolution.
3.2.3 Functional Magnetic Resonance Imaging (fMRI). Functional magnetic resonance imaging
(fMRI) monitors brain activities by detecting changes associated with blood flow in brain areas
[222]. Similar to fNIRS, fMRI relies on the BOLD response. The main differences between fNIRS
and fMRI are as follows [114]. First, as the name implies, fMRI measures BOLD response through
magnetic instead of optical methods. Hemoglobin differs in how it responds to magnetic fields,
depending on whether it has a bound oxygen molecule. The magnetic fields are more sensitive to
and are more easily distorted by deoxy-Hb than Hb molecules. Second, the magnetic fields have
higher penetration than NIR light, which gives fMRI greater ability to capture information from
deep parts of the brain than fNIRS. Third, fMRI has a higher spatial resolution than fNIRS since the
latter’s spatial resolution is limited by the emitter-detector pairs. However, the temporal resolutions
of fMRI and fNIRS are at an equal level because both are constrained by the blood flow speed.

4 https://www.artinis.com/fnirs
5 Temporal resolution refers to the smallest time of neural activity reliably separated by the signal.

, Vol. 1, No. 1, Article 1. Publication date: January 2016.



(a) fMRI collection equipment (b) fMRI images

Fig. 9. fMRI collection equipment and the gathered fMRI signals while the subject is speaking and finger
tapping

(a) EOG collection (b) EOG signals

Fig. 10. EOG collection equipment and the gathered vertical signals while the subject is looking in different
directions and blinking

fMRI has several flaws compared to fNIRS: 1) fMRI requires an expensive scanner to generate
magnetic fields; 2) the scanner is heavy and has poor portability. Figure 9⁶ shows the fMRI
acquisition machine and the resulting brain images. The fMRI images of speech perception and finger
tapping differ significantly, which indicates that fMRI has a high SNR.
3.2.4 Electrooculography (EOG). Electrooculography (EOG) is a technique for measuring the
corneo-retinal standing potential that exists between the front and the back of the human eyes.
Most patients who have lost voluntary motor movements (e.g., locked-in syndrome patients) remain
in partial control of the eyes [231]. The eye movements can be detected by EOG signals to interact
with the external devices. Therefore, we regard EOG signals as one class of BCI signals in this survey.
EOG enables communication between the user and the outer world because different eye movements
cause different electrical potentials. Pairs of electrodes are typically placed above/below the
eye or to the left/right of the eye to measure EOG signals. The EOG collection equipment [15] and
the collected signals [166] can be found in Figure 10. Figure 10a shows the EOG electrode placement,
where electrodes Ch.V+ and Ch.V- measure the vertical movements, and Ch.H+ and Ch.H- measure
the horizontal movements. The G electrode, representing the ground line, works as a reference point.
Figure 10b shows the vertical EOG in the time domain under six scenarios (looking upward, looking
downward, single blink, double blink, looking leftward, and looking rightward). We can observe that
EOG signals have large variances between different scenarios, indicating that they have a relatively high
SNR and are easily recognizable by machine learning algorithms. EOG has low spatial resolution
compared to other brain signals since we can only detect the vertical and horizontal potentials.
The temporal resolution of EOG is higher than that of neuroimaging techniques because electrical
potentials vary faster than metabolic features (e.g., blood flow).

6 https://www.jameco.com/Jameco/workshop/HowItWorks/what-is-an-fmri-scan-and-how-does-it-work.html

(a) MEG equipment (b) MEG signals

Fig. 11. MEG collection equipment and the gathered signals

3.2.5 Magnetoencephalography (MEG). Magnetoencephalography (MEG) is a functional neuroimaging
technique for mapping brain activity by recording magnetic fields produced by electrical
currents occurring naturally in the brain, using very sensitive magnetometers [39]. The ionic
currents of active neurons create weak magnetic fields, which can
be measured by magnetometers such as SQUIDs (superconducting quantum interference devices).
However, producing a detectable magnetic field requires a large number (e.g., 50,000) of active neurons with
similar orientation. The source of the magnetic field measured by MEG is the pyramidal cells, which
are perpendicular to the cortex surface.
MEG has a relatively low spatial resolution since the signal quality highly depends on the
measurement factors (e.g., brain area, neuron orientations, neuron depth). However, MEG can
provide very high temporal resolution (≥1000Hz) since MEG directly monitors the brain activity
at the neuron level, which is at the same level as intracortical signals. The MEG equipment⁷ and
the collected signals [163] are shown in Figure 11. MEG equipment is expensive and not portable,
which limits its real-world deployment for BCI. The brain map snapshots of MEG signals in Figure 11 were
collected at different times (140, 200, 350, and 500 ms) from two subjects under different mental
tasks.

7 https://www.biomagcentral.org/biomagnetism/meg


3.3 EEG Paradigms


Compared to other noninvasive signals (e.g., fMRI, fNIRS, EOG, MEG), EEG has several important
advantages: 1) the hardware is more portable and much cheaper; 2) the temporal resolution
is very high (millisecond level)⁸; 3) EEG is relatively tolerant of subject movement and artifacts,
which can be minimized by existing signal processing methods; 4) the subject does not need to be
exposed to high-intensity (>1 Tesla) magnetic fields. Thus, EEG can serve subjects that have metal
implants in their body (such as metal-containing pacemakers).
EEG signals are the most commonly used BCI signals and have a large number of sub-classes.
In this section, we present a systematic introduction to these sub-classes. As shown in
Figure 3, we divided EEG signals into spontaneous EEG, evoked potentials, and event-related
desynchronization/synchronization. Evoked potentials can be split into event-related potentials
and steady-state evoked potentials based on the frequency of the external stimuli. Each potential
contains visual-, auditory-, and somatosensory- potentials based on the external stimuli types. The
dashed quadrilaterals in Figure 3, such as Intracortical, SEP, SSAEP, SSSEP, and RSAP, are not
included in this survey because there are very few existing studies working on them with deep
learning algorithms. We list these signals for systematic completeness.
3.3.1 Spontaneous EEG. Generally, the term ‘EEG’ refers to spontaneous
EEG, which measures the brain signals under a specific state without external
stimulation. For example, spontaneous EEG includes the EEG signals recorded while the user is sleeping,
undertaking a mental task (e.g., counting), fatigued, suffering from brain disorders, or undertaking
motor imagery tasks.
The EEG signals recorded while a user stares at a color/shape/image belong to this category.
While the subject is gazing at a specific image, the visual stimuli are steady without any change. This
scenario differs from the visual stimuli in evoked potential, where the visual stimuli are changing
at a specific frequency. Thus, we regard the image stimulation as a particular state and categorise it
as spontaneous EEG. BCI systems based on spontaneous EEG are challenging to train, due to the
lower SNR and the larger variation across subjects [155].
3.3.2 Evoked Potential (EP). Evoked Potentials (EP), or evoked responses, refer to the EEG
signals which are evoked by an event stimulus rather than arising spontaneously. An EP is time-locked to
the external stimulus while the aforementioned spontaneous EEG is non-time-locked. In contrast
to spontaneous EEG, EP generally has higher amplitude and lower frequency. As a result, the
EP signals are more robust across subjects. According to the stimulation method, there exist
two categories of EP: the Event-Related Potential (ERP) and the Steady State Evoked Potential
(SSEP). ERP records the EEG signals in response to an isolated discrete stimulus event. To achieve
this isolation, stimuli in an ERP experiment are typically separated from each other by a long
inter-stimulus interval, allowing for the estimation of a stimulus-independent baseline reference
[144]. The stimulus frequency of ERP is generally lower than 2 Hz. In contrast, SSEP is generated
in response to a periodic stimulus at a fixed rate. The stimulus frequency of SSEP generally ranges
within 3.5-75 Hz.
Event-related potential (ERP). There are three kinds of evoked potentials in extensive research
and clinical use: Visual Evoked Potentials (VEP); Auditory Evoked Potentials (AEP); and Somatosensory
Evoked Potentials (SEP) [32]. The VEP signals mainly appear over the occipital lobe, and the highest
signal amplitudes are collected at the calcarine sulcus.
1) Visual Evoked Potentials (VEP). Visual Evoked Potentials are a specific category of ERP which
is caused by visual stimuli (e.g., an alternating checkerboard pattern on a computer screen).
8 Among other noninvasive techniques, only MEG has the same level of temporal resolution.


(a) ERP components in the 500 ms after the stimulus (b) P300 speller

Fig. 12. P300 waves and visual-based P300 speller

VEP signals are hidden within the normal spontaneous EEG. To separate VEP signals from the
background EEG readings, repetitive stimulation and time-locked signal-averaging techniques are
generally employed.
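Time-locked signal averaging can be illustrated with a minimal sketch. The sampling rate, the noise level, and the half-sine "evoked response" below are made-up values chosen for illustration, not parameters from any cited study. Epochs aligned to the stimulus onsets are averaged, so the non-time-locked background EEG tends to cancel while the time-locked response remains.

```python
import numpy as np

def average_epochs(eeg, onsets, epoch_len):
    """Average fixed-length epochs of a 1-D signal, each starting at a stimulus onset."""
    epochs = np.stack([eeg[s:s + epoch_len] for s in onsets])
    return epochs.mean(axis=0)

rng = np.random.default_rng(0)
fs = 250                       # assumed sampling rate (Hz)
epoch_len = fs // 2            # analyse 0.5 s after each stimulus
n_trials = 200                 # number of repeated stimuli

# Toy evoked response (hypothetical shape) buried in much larger background activity.
template = 2.0 * np.sin(np.pi * np.arange(epoch_len) / epoch_len)
onsets = np.arange(n_trials) * fs                  # one stimulus per second
eeg = rng.normal(0.0, 10.0, size=n_trials * fs)    # background EEG modelled as noise
for s in onsets:
    eeg[s:s + epoch_len] += template

averaged = average_epochs(eeg, onsets, epoch_len)
single_trial_err = np.abs(eeg[:epoch_len] - template).mean()
averaged_err = np.abs(averaged - template).mean()
print(f"single-trial error: {single_trial_err:.2f}, averaged error: {averaged_err:.2f}")
```

With independent noise, the residual noise in the average shrinks roughly with the square root of the number of trials, which is why repetitive stimulation is needed.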
Rapid Serial Visual Presentation (RSVP) [101] can be regarded as one kind of VEP. The RSVP
paradigm is commonly used to examine the temporal characteristics of attention. The subject is
required to stare at a screen where a series of items (e.g., images) are presented one-by-one. One
specific item (called the target) differs from the other items (called distracters).
The subject knows which item is the target before the RSVP experiment. Generally, the distracters can
either be a color change or letters among numbers. RSVP has a static mode (the items appear
on the screen and then disappear without moving) and a moving mode (the items appear on the
screen, move to another place, and finally disappear). Nowadays, BCI research mainly focuses on
the static mode RSVP. Usually, the frequency of RSVP is 10Hz, which means that each item
stays on the screen for 0.1 seconds.
2) Auditory Evoked Potentials (AEP). Auditory Evoked Potentials are a specific subclass of ERP
in which responses to auditory (sound) stimuli are recorded. AEP is mainly recorded from the
scalp but originates at the brainstem or cortex. The most common AEP measured is the auditory
brainstem response (ABR) which is generally employed to test the hearing ability of newborns
and infants. In the BCI area, AEP is mainly used in clinical tests for its accuracy and reliability in
detecting unilateral loss [36]. Similar to RSVP, Rapid Serial Auditory Presentation (RSAP) refers to
experiments with rapid serial presentation of sound stimuli. The task for the subject is to recognize
the target audio among the distracters.
3) Somatosensory Evoked Potentials (SEP).⁹ Somatosensory Evoked Potentials are another
commonly used subcategory of ERP, elicited by electrical stimulation of the peripheral
nerves. SEP signals comprise a series of amplitude deflections that can be elicited by virtually any
sensory stimuli.

9 Generally, Somatosensory Evoked Potentials is abbreviated as SSEP or SEP. In this paper, we choose SEP as the abbreviation
to avoid a conflict with Steady-State Evoked Potentials (SSEP).


P300. P300 (also called P3) is an important component in ERP [60]. Here we introduce the P300 signal
separately since it is widely used in BCI. Figure 12a shows the ERP signal fluctuation in the 500
ms after the stimulus onset¹⁰. The waveform mainly comprises five components: P1, N1, P2, N2,
and P3. The capital letter P/N represents a positive/negative electrical potential. The following
number refers to the occurrence time of the specific potential. Thus, P300 denotes the positive
potential of the ERP waveform at approximately 300 ms after the presented stimulus. Compared to other
components, P300 has the highest amplitude and is easiest to detect. Thus, a large number of BCI
studies focus on P300 analysis. P300 is more of an informative feature than a type of brain
signal (e.g., VEP). Therefore, we do not list P300 in Figure 3. P300 can be analyzed in most ERP
signals, such as VEP, AEP, and SEP.
In practice, P300 can be elicited by rare, task-relevant events in an ‘oddball’ paradigm (e.g., the P300
speller). In the oddball paradigm, the subject receives a series of stimuli where low-probability
target items are mixed with high-probability non-target items. Visual and auditory stimuli are
the most commonly used in the oddball paradigm. Figure 12b shows an example of a visual-based
P300 speller, which enables the subject to spell letters/numbers directly through brain signals
[47]. The 26 letters of the alphabet and the Arabic numerals are displayed on a computer screen
which serves as the keyboard. The subject focuses attention successively on the characters they
wish to spell. The computer detects the chosen character online in real time. This detection is
achieved by repeatedly flashing rows and columns of the matrix. When the row or column containing
the selected character flashes, a P300 fluctuation is elicited. In the 6 × 6 matrix screen, the
rows and columns flash in mixed random order. The flash duration and the interval between adjacent
flashes are generally set as 100 ms [25]. The columns and rows flash separately. First, the columns
flash six times, with each column flashing once. Second, the rows flash six times. After
that, this procedure repeats several times (e.g., N times). The P300 signals of the total 12N flashes
are analyzed to output a single outcome (i.e., one letter/number).
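The step from 12N flashes to one character can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any cited speller: the character layout `MATRIX`, the function `decode_character`, and the per-flash "P300 scores" (standing in for the output of some classifier) are all hypothetical. Scores are averaged per column and per row over the N repetitions, and the best column and row jointly select the character.

```python
import numpy as np

# Hypothetical 6 x 6 layout: the 26 letters followed by the 10 digits.
MATRIX = np.array(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")).reshape(6, 6)

def decode_character(scores, flash_ids):
    """Pick one character from 12N flashes.

    scores    : one P300 score per flash (higher = more P300-like), length 12N
    flash_ids : which line flashed, 0-5 for columns and 6-11 for rows, length 12N
    """
    scores = np.asarray(scores, dtype=float)
    flash_ids = np.asarray(flash_ids)
    # Average the scores of each of the 12 lines over all repetitions.
    mean_score = np.array([scores[flash_ids == k].mean() for k in range(12)])
    col = int(np.argmax(mean_score[:6]))
    row = int(np.argmax(mean_score[6:]))
    return str(MATRIX[row, col])

# Simulated session: N = 10 repetitions, target character at row 2, column 3.
rng = np.random.default_rng(1)
n_rep, target_row, target_col = 10, 2, 3
flash_ids = np.concatenate([rng.permutation(12) for _ in range(n_rep)])
is_target = (flash_ids == target_col) | (flash_ids == 6 + target_row)
scores = rng.normal(0.0, 1.0, size=flash_ids.size) + 1.5 * is_target
print(decode_character(scores, flash_ids))
```

Averaging over repetitions plays the same role as time-locked averaging: a single flash gives a noisy score, while the mean over N flashes of the same line is more reliable.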
Steady State Evoked Potentials (SSEP). Steady State Evoked Potentials is another subcategory
of evoked potentials, which are periodic cortical responses evoked by certain repetitive stimuli
with a constant frequency. It has been demonstrated that the brain oscillations generally maintain
a steady level over time while the potentials are evoked by steady state stimuli (e.g., a flickering
light with fixed frequency). Technically, SSEP is defined as a form of response to repetitive sensory
stimulation in which the constituent frequency components of the response remain constant over
time in both amplitude and phase [161]. Depending on the type of stimuli, SSEP can be divided
into three subcategories: Steady-State Visually Evoked Potentials (SSVEP), Steady-State Auditory
Evoked Potentials (SSAEP), and Steady-State Somatosensory Evoked Potentials (SSSEP). In the BCI
area, most studies focus on steady-state visually evoked potentials, and only rarely do papers focus
on auditory and somatosensory stimuli. Therefore, in this survey, we mainly introduce SSVEP
rather than SSAEP and SSSEP.

10 Note that the negative voltage of ERP is plotted upward, which is common in ERP research.
11 https://www.youtube.com/watch?v=iUW l5YAEEM

[Figure 13: taxonomy tree. Deep Learning Models divide into Discriminative Models (MLP; RNN, with LSTM and GRU; CNN), Representative Models (AE, with D-AE; RBM, with D-RBM; DBN, with DBN-AE and DBN-RBM), Generative Models (VAE, GAN), and Hybrid Models.]

Fig. 13. Deep learning models. They can be divided into discriminative, representative, generative, and hybrid
models based on the algorithm function. D-AE denotes Stacked-Autoencoder, which refers to the autoencoder
with multiple hidden layers. A Deep Belief Network can be composed of AE or RBM; therefore, we divide DBN
into DBN-AE (stacked AE) and DBN-RBM (stacked RBM).

Difference among visual evoked potential paradigms. Visual evoked potentials are
the most commonly used potentials. Therefore, it is essential to distinguish the three different visual
evoked potential paradigms: VEP, RSVP, and SSVEP. Here, we theoretically introduce the characteristics
of each paradigm and then give three demonstration videos to provide a better understanding. First,
the frequencies are different: the frequency of VEP is less than 2Hz, the frequency of RSVP
is around 10Hz, and the frequency of SSVEP ranges from 3.5 to 75Hz. Second, they have different
presentation protocols. In the VEP paradigm, different visual patterns are presented on the
screen to check the changes in the user’s brain signals. For instance, in this video¹¹, the image pattern
fills the screen and changes dramatically. In the RSVP paradigm, several items are presented on a
screen one-by-one. All the items are shown in the same place and share the same frequency. For
example, the video¹² shows an RSVP scenario called speed reading. In the SSVEP paradigm,
several items are presented on a screen at the same time, and the items are shown at different
positions with different frequencies. For example, in this demonstration video¹³, there are four
circles distributed on the top, bottom, left, and right sides of a screen, and each circle flickers
at a different frequency.
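Because each SSVEP item flickers at its own frequency, the attended item can in principle be identified from the dominant frequency of the recorded signal. The following is a minimal sketch of that idea and not a method from the cited studies: the function `detect_ssvep_frequency`, the candidate frequencies, the sampling rate, and the synthetic single-channel signal are all assumptions made for illustration.

```python
import numpy as np

def detect_ssvep_frequency(signal, fs, candidates):
    """Return the candidate flicker frequency with the largest spectral power."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Read the power at the FFT bin closest to each candidate frequency.
    powers = [spectrum[np.argmin(np.abs(freqs - f))] for f in candidates]
    return candidates[int(np.argmax(powers))]

fs = 250                                   # assumed sampling rate (Hz)
t = np.arange(4 * fs) / fs                 # 4 s of data, 0.25 Hz resolution
candidates = [8.0, 10.0, 12.0, 15.0]       # hypothetical flicker rates of four items

# Synthetic recording: the subject attends the item flickering at 12 Hz.
rng = np.random.default_rng(2)
eeg = np.sin(2 * np.pi * 12.0 * t) + rng.normal(0.0, 1.0, size=t.size)
print(detect_ssvep_frequency(eeg, fs, candidates))
```

Real SSVEP decoders usually combine several channels and harmonics; the single-bin comparison here only shows why distinct frequencies make the items separable.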
3.3.3 Event-related Desynchronization/Synchronization (ERD/ERS). Event-related
desynchronization/synchronization (ERD/ERS) refers to the phenomenon in which the magnitude and frequency
distribution of the EEG signal power change during a specific brain state [81]. In particular, ERD
denotes a power decrease of ongoing EEG signals while ERS represents a power increase of
EEG signals. This characteristic of brain signals can be used to detect the event which
caused the EEG fluctuation. For example, [154] presents the ERD/ERS phenomena in the motor cortex
recorded during a motor-imagery task. The task causes an ERD in the mu band (8-13 Hz) of EEG
and an ERS in the beta band (13-30 Hz).
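The power decrease and increase can be quantified with a short sketch. The relative-change formula used here, 100 × (task power − reference power) / reference power, is a common convention in the ERD/ERS literature and is an assumption, since the text above does not define one. The function names, sampling rate, and synthetic signals are likewise hypothetical.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Mean periodogram power of `signal` in the band [low, high) Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= low) & (freqs < high)
    return spectrum[mask].mean()

def relative_power_change(reference, task, fs, low, high):
    """Percentage band-power change; negative suggests ERD, positive suggests ERS."""
    p_ref = band_power(reference, fs, low, high)
    p_task = band_power(task, fs, low, high)
    return 100.0 * (p_task - p_ref) / p_ref

fs = 250                                  # assumed sampling rate (Hz)
t = np.arange(2 * fs) / fs                # 2 s per segment
rng = np.random.default_rng(3)

# Synthetic segments: the 10 Hz (mu) rhythm weakens and the 20 Hz (beta) rhythm grows.
rest = (1.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
        + rng.normal(0.0, 0.1, size=t.size))
imagery = (0.4 * np.sin(2 * np.pi * 10 * t) + 1.0 * np.sin(2 * np.pi * 20 * t)
           + rng.normal(0.0, 0.1, size=t.size))

mu_change = relative_power_change(rest, imagery, fs, 8, 13)
beta_change = relative_power_change(rest, imagery, fs, 13, 30)
print(f"mu band: {mu_change:.0f}%, beta band: {beta_change:.0f}%")
```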
Compared to spontaneous EEG signals, ERD/ERS is not only a kind of spontaneous brain signal
but also a power decreasing/increasing phenomenon. Differing from other spontaneous EEG, ERD/ERS
analysis exploits power fluctuations. Compared to evoked potentials, ERD/ERS does not require
external stimuli. ERD/ERS can be collected by performing mental tasks, such as motor imagery,
mental arithmetic, or mental rotation. However, to collect high-quality ERD/ERS signals, the
subjects are required to undergo extensive training, which may take several weeks. Moreover, the
performance of ERD/ERS among users is quite variable, and the accuracy is not very high [2]. Thus,
this paradigm is not one of the most dominant BCI approaches.

4 DEEP LEARNING MODELS


12 https://www.youtube.com/watch?v=5yddeRrd0hA&t=36s
13 https://www.youtube.com/watch?v=t96rl1SFHlI

In this section, we formally introduce the deep learning models, including concepts, architectures,
and techniques commonly used in the BCI field. Deep learning is a class of machine learning
techniques that use many layers of information-processing stages in hierarchical architectures
for pattern classification and feature/representation learning [42]. The standard neural network
(Figure 14a) contains three neuron layers: an input layer, a hidden layer, and an output
layer. Any neural network with at least four layers (one input layer, ≥ 2 hidden layers, and
one output layer) can be called a deep neural network because it is ‘deeper’ than the
standard/shallow neural networks (3 layers).
In this survey, we give a relatively detailed introduction to various deep learning models because
some potential readers from non-computing areas (e.g., biomedical) may
not be familiar with deep learning.
Deep learning algorithms divide into several subcategories based on the aim of the techniques
(as shown in Figure 13):
• Discriminative deep learning models, which classify the input data into a pre-known label
based on the adaptively learned discriminative features. Discriminative algorithms are able
to learn distinctive features by non-linear transformation and to classify through probabilistic
prediction¹⁴. Thus, these algorithms can play the role of both feature engineering
and classification (corresponding to Figure 2). Discriminative architectures mainly include
Multi-Layer Perceptron (MLP), Recurrent Neural Networks (RNN), Convolutional Neural
Networks (CNN), along with their variations.
• Representative deep learning models, which learn the pure and representative features
from the input data. These algorithms only have the function of feature engineering
(corresponding to Figure 2) but cannot classify. Commonly used deep learning algorithms
for representation are Autoencoder (AE), Restricted Boltzmann Machine (RBM), Deep Belief
Networks (DBN), along with their variations.
• Generative deep learning models, which learn the joint probability distribution of the
input data and the target label. In the BCI scope, generative algorithms are mostly used
for reconstruction or to generate a batch of brain signal samples to enhance the training
set. Generative models commonly used in BCI include variational Autoencoder (VAE)¹⁵,
Generative Adversarial Networks (GANs), etc.
• Hybrid deep learning models, which combine two or more deep learning models. For
example, a typical hybrid deep learning model employs a representation algorithm for
feature extraction and a discriminative algorithm for classification.
A summary of the characteristics of each deep learning subcategory is given in Table 4. Almost
all the classification functions in neural networks are implemented by a softmax layer, which is
not regarded as an algorithmic component in this survey. For instance, a model combining a
DBN and a softmax layer is still regarded as a representative model instead of a hybrid model.

4.1 Discriminative Deep Learning Models


14 The classification function is achieved by the combination of a softmax layer and one-hot label encoding. One-hot
label encoding represents each label as a group of bits among which the only valid
combinations of values are those with a single high (1) bit and all the others low (0). For instance, a set of labels 0, 1, 2,
3 can be encoded as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1).
15 VAE is a variation of AE. However, they work on different aspects. Therefore, we introduce AE and VAE separately.

Table 4. Summary of deep learning model types

Deep Learning    Input        Output           Function                             Training method
Discriminative   Input data   Label            Feature extraction, Classification   Supervised
Representative   Input data   Representation   Feature extraction                   Unsupervised
Generative       Input data   New Sample       Generation, Reconstruction           Unsupervised
Hybrid           Input data   –                –                                    –

[Figure 14: (a) Basic fully-connected neural network with an input layer, a hidden layer, and an output layer. (b) Multilayer Perceptron with an input layer, hidden layer (1), hidden layer (2), and an output layer.]

Fig. 14. Illustration of standard neural network and multilayer perceptron. (a) The basic structure of the
fully-connected neural network. The input layer receives the raw data or extracted features of brain signals
while the output layer shows the classification results. The term ‘fully-connected’ denotes that each node in a
specific layer is connected with all the nodes in the previous and next layers. (b) An MLP can have multiple
hidden layers; the more hidden layers, the deeper the network. This is an example of an MLP with two hidden
layers, which is the simplest MLP model.

[Figure 15: (a) Recurrent Neural Networks: one node unrolled over time, with inputs $I_1, \dots, I_{t+1}$, hidden states $c_1, \dots, c_{t+1}$, and outputs $O_1, \dots, O_{t+1}$. (b) Convolutional Neural Networks: input layer, convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully-connected layer, output layer.]

Fig. 15. Illustration of RNN and CNN models. (a) The recurrent procedure of the RNN model. This procedure
describes the recurrent procedure of a specific node in the time range $[1, t+1]$. The node at time $t$ receives two
input variables ($I_t$ denotes the input at time $t$ and $c_{t-1}$ denotes the hidden state at time $t-1$) and exports
two variables (the output $O_t$ and the hidden state $c_t$ at time $t$). (b) The paradigm of the CNN model, which
includes two convolutional layers, two pooling layers, and one fully-connected layer.

Since the main task of BCI is brain signal recognition, the discriminative deep learning models
are the most popular and powerful algorithms. Suppose we have a dataset of brain signal samples
$\{X, Y\}$ where $X$ denotes the set of brain signal observations and $Y$ denotes the set of sample
ground truth (i.e., labels). Consider a specific sample-label pair $\{x \in \mathbb{R}^N, y \in \mathbb{R}^M\}$ where $N$ and $M$
denote the dimension of observations and the number of sample categories, respectively. The aim
of discriminative deep learning models is to learn a function with the mapping $x \rightarrow y$. In short,
the discriminative models receive the input data and output the corresponding category or label.
All the discriminative models introduced in this section are supervised learning techniques which
require the information of both the observations and the ground truth.
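How a discriminative model turns its output into a category can be sketched with the softmax layer and one-hot label encoding mentioned in footnote 14. This is a minimal illustration, assuming four classes and made-up output values; the helper names `one_hot` and `softmax` are introduced here and do not come from the survey.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1 and all other entries 0."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

labels = np.array([0, 1, 2, 3])
print(one_hot(labels, num_classes=4))

# Hypothetical raw outputs of a network for one sample with M = 4 categories.
logits = np.array([[2.0, 0.5, -1.0, 0.1]])
probs = softmax(logits)
predicted = int(np.argmax(probs, axis=-1)[0])
print(probs, predicted)
```

The predicted category is the position of the largest probability, which can be compared directly with the position of the 1 in the one-hot ground truth.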
4.1.1 Multi-Layer Perceptron (MLP). Multilayer Perceptron is the simplest and the most basic
deep learning model. The key difference between MLP and the shallow neural network is that MLP
has more than one hidden layer. All the nodes are fully-connected with the nodes of the adjacent
layers but have no connection with the other nodes of the same layer. MLP includes multiple
hidden layers.

(a) Structure of LSTM cell (b) Structure of GRU cell

Fig. 16. Illustration of detailed LSTM and GRU cell structures. (a) The LSTM cell receives three inputs ($I_t$ denotes
the input at time $t$, $O_{t-1}$ denotes the output at the previous time, and $c_{t-1}$ denotes the hidden state at the
previous time) and exports two outputs (the output at this time $O_t$ and the hidden state at this time $c_t$).
The LSTM cell contains four gates to control the data flow: the input gate, output gate, forget
gate, and input modulation gate. (b) The GRU cell receives two inputs (the input at this time $I_t$ and the output at
the previous time $O_{t-1}$) and exports its output $O_t$. The GRU cell contains only two gates: the reset gate
and the update gate. Unlike the LSTM cell with its hidden state $c_t$, the GRU cell has no transmittable hidden state
except one intermediate variable $\bar{O}_t$.

As shown in Figure 14b, we take a structure with two hidden layers as an example
to describe the data flow in MLP. First, we define an operation $T(\cdot)$ as
$$T(x) = w * x + b$$
$$T(x, x') = w * x + b + w' * x' + b'$$
where $x$ and $x'$ denote two variables while $w$, $w'$, $b$, and $b'$ denote the corresponding weights and
biases.
The input layer receives the observation $x$ and feeds it forward to the first hidden layer,
$$x^{h1} = \sigma(T(x))$$
where $x^{h1}$ denotes the data flow in the first hidden layer and $\sigma$ represents the non-linear activation
function. There are several commonly used activation functions, such as sigmoid/logistic, tanh, and ReLU;
we choose the sigmoid activation function as an example in this section. Then, the data flow to the
second hidden layer and the output layer,
$$x^{h2} = \sigma(T(x^{h1}))$$
$$y' = \sigma(T(x^{h2}))$$
where $y'$ denotes the predicted results in one-hot format. The error (i.e., loss) can be calculated
based on the distance between $y'$ and the ground truth $y$. For instance, the Euclidean-distance-based
error can be calculated by
$$\text{error} = \|y' - y\|_2 \qquad (1)$$
where $\|\cdot\|_2$ denotes the Euclidean norm. Afterward, the error is back-propagated and minimized
by a suitable optimizer. The optimizer adjusts all the weights and biases in the model until the
error converges. The most widely used loss functions include cross-entropy, negative log likelihood,
mean squared error, etc. The most widely used optimizers include Adaptive moment estimation
(Adam), Stochastic Gradient Descent (SGD), Adagrad (Adaptive subgradient method), etc.
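The forward data flow and the Euclidean error of Eq. (1) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the product in T(·) is read as a matrix-vector product, the layer sizes and random weights are hypothetical, and no training (back-propagation) is performed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def T(x, w, b):
    """Affine operation T(x) = w * x + b, with * read as a matrix-vector product."""
    return w @ x + b

def mlp_forward(x, params):
    """Forward pass through two hidden layers and one output layer."""
    (w1, b1), (w2, b2), (w3, b3) = params
    x_h1 = sigmoid(T(x, w1, b1))        # first hidden layer
    x_h2 = sigmoid(T(x_h1, w2, b2))     # second hidden layer
    return sigmoid(T(x_h2, w3, b3))     # prediction y'

rng = np.random.default_rng(4)
N, H1, H2, M = 8, 16, 16, 4             # hypothetical layer sizes
params = [
    (rng.normal(0.0, 0.5, (H1, N)), np.zeros(H1)),
    (rng.normal(0.0, 0.5, (H2, H1)), np.zeros(H2)),
    (rng.normal(0.0, 0.5, (M, H2)), np.zeros(M)),
]

x = rng.normal(size=N)                  # one observation
y = np.array([0.0, 0.0, 1.0, 0.0])      # one-hot ground truth
y_pred = mlp_forward(x, params)
error = np.linalg.norm(y_pred - y)      # Euclidean error, Eq. (1)
print(y_pred, error)
```

An optimizer such as SGD or Adam would then adjust every `w` and `b` to reduce `error`; that step is omitted here.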
Several terms are easily confused with each other: Artificial Neural Network (ANN), Deep
Neural Network (DNN), and MLP. These terms have no strict distinction and are often used interchangeably in the literature.
Generally, ANN refers to neural networks with fewer hidden layers (shallow) while DNN has
more (in this case, DNN is equivalent to MLP). Additionally, DNN can be used to describe deep
learning models overall, including not only fully-connected networks but also other networks (e.g.,
recurrent and convolutional networks).
4.1.2 Recurrent Neural Networks (RNN). The Recurrent Neural Network is a specific subclass of
discriminative deep learning models designed to capture temporal dependencies among
input data. Figure 15a describes the activity of a specific RNN node in the time domain. At each
time step in the range $[1, t+1]$, the node receives an input $I$ and a hidden state $c$ from the previous
time step (except at the first time step). For instance, at time $t$ it receives not only the input $I_t$ but also the
hidden state of the previous time step, $c_{t-1}$.¹⁶ The hidden state can be regarded as the ‘memory’ of the
node, which helps the RNN ‘remember’ the historical input.
Next, we describe two typical RNN architectures which have attracted much attention and
achieved great success: long short-term memory and gated recurrent units. They both follow the
basic principles of RNN, and we focus on the complicated internal structure of
each node. Since the structure is much more complicated than that of general neural nodes, we call it a
‘cell.’ Cells in RNN are equivalent to nodes in MLP.
Long Short-Term Memory (LSTM). Figure 16a shows the structure of a single LSTM cell at time
$t$. The LSTM cell has three inputs ($I_t$, $O_{t-1}$, and $c_{t-1}$) and two outputs ($c_t$ and $O_t$). The operation is
as follows:
$$I_t, O_{t-1}, c_{t-1} \rightarrow c_t, O_t$$
$I_t$ denotes the input value at time $t$, $O_{t-1}$ denotes the output at the previous time (i.e., time $t-1$),
and $c_{t-1}$ denotes the hidden state at the previous time. $c_t$ and $O_t$ denote the hidden
state and the output at time $t$, respectively. Therefore, we can observe that the output $O_t$ at time $t$ is
related not only to the input $I_t$ but also to the information at the previous time. In this way, LSTM is
empowered to remember the important information in the time domain. Moreover, the essential
idea of LSTM is to control the memory of specific information. For this aim, the LSTM cell adopts four
gates: the input gate, forget gate, output gate, and input modulation gate. Each gate is a weight that
controls how much information can flow through this gate. For example, if the weight of the forget
gate is one, the LSTM cell remembers all the information passed from the previous time
$t-1$; if the weight is zero, the LSTM cell remembers nothing. The corresponding activation
function determines the weight. The detailed data flow is as follows:
$$f = \sigma(T(I_t, O_{t-1}))$$
$$i = \sigma(T(I_t, O_{t-1}))$$
$$o = \sigma(T(I_t, O_{t-1}))$$
$$m = \tanh(T(I_t, O_{t-1}))$$
$$c_t = f * c_{t-1} + i * m$$
$$O_t = o * \tanh(c_t)$$
where $i$, $f$, $o$, and $m$ represent the input gate, forget gate, output gate, and input modulation gate,
respectively, and each gate applies $T(\cdot)$ with its own weights and biases.
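The LSTM data flow can be written as a single step function. The sketch below is one possible reading, with several assumptions: the products inside T(·) are taken as matrix-vector products, the product between a gate and a state as element-wise multiplication, each gate has its own weights and biases, and the cell output is written `O_t` to match the mapping from ($I_t$, $O_{t-1}$, $c_{t-1}$) to ($c_t$, $O_t$). The sizes and the random initialisation in `init_gate` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def T(x, x_prev, p):
    """T(x, x') = w * x + b + w' * x' + b', with * taken as a matrix-vector product."""
    return p["w"] @ x + p["b"] + p["w_prev"] @ x_prev + p["b_prev"]

def init_gate(rng, n_in, n_hidden):
    """Hypothetical small random initialisation of one gate's weights and biases."""
    return {
        "w": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b": np.zeros(n_hidden),
        "w_prev": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b_prev": np.zeros(n_hidden),
    }

def lstm_step(I_t, O_prev, c_prev, gates):
    """One LSTM step: (I_t, O_{t-1}, c_{t-1}) -> (c_t, O_t)."""
    f = sigmoid(T(I_t, O_prev, gates["f"]))    # forget gate
    i = sigmoid(T(I_t, O_prev, gates["i"]))    # input gate
    o = sigmoid(T(I_t, O_prev, gates["o"]))    # output gate
    m = np.tanh(T(I_t, O_prev, gates["m"]))    # input modulation gate
    c_t = f * c_prev + i * m                   # keep part of the old state, add new content
    O_t = o * np.tanh(c_t)
    return c_t, O_t

rng = np.random.default_rng(5)
n_in, n_hidden, n_steps = 3, 4, 6
gates = {name: init_gate(rng, n_in, n_hidden) for name in "fiom"}
sequence = rng.normal(size=(n_steps, n_in))    # toy input sequence

c, O = np.zeros(n_hidden), np.zeros(n_hidden)
for I_t in sequence:
    c, O = lstm_step(I_t, O, c, gates)
print(O)
```

In this form, a forget gate close to one copies the previous hidden state forward, while a value close to zero discards it.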
16 The subscript represents the specific time.


Gated Recurrent Units (GRU). Another widely used RNN architecture is GRU. Similar to LSTM,
GRU attempts to exploit the information from the past. However, GRU does not require hidden states;
it receives temporal information only from the output at time $t-1$. Thus, as shown in Figure 16b,
GRU has two inputs ($I_t$ and $O_{t-1}$) and one output ($O_t$). The mapping can be described as:
$$I_t, O_{t-1} \rightarrow O_t$$
GRU contains two gates: reset gate $r$ and update gate $z$. The former decides how to combine the
input with previous memory. The latter decides how much of previous memory to keep around,
which is similar to the forget gate of LSTM. The data flow is as follows:
$$z = \sigma(T(I_t, O_{t-1}))$$
$$r = \sigma(T(I_t, O_{t-1}))$$
$$\bar{O}_t = \tanh(T(I_t, r * O_{t-1}))$$
$$O_t = (1 - z) * O_{t-1} + z * \bar{O}_t$$
It can be observed that there is an intermediate variable $\bar{O}_t$ which is similar to the hidden state of
LSTM. However, $\bar{O}_t$ is used only at the current time point and is not passed to the next time point.
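The GRU data flow can be sketched in the same style. As before, this is an illustrative reading under assumptions: matrix-vector products inside T(·), element-wise products between gates and states, separate weights and biases for the update gate, the reset gate, and the candidate output, and hypothetical sizes and initialisation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def T(x, x_prev, p):
    """T(x, x') = w * x + b + w' * x' + b', with * taken as a matrix-vector product."""
    return p["w"] @ x + p["b"] + p["w_prev"] @ x_prev + p["b_prev"]

def init_block(rng, n_in, n_hidden):
    """Hypothetical small random initialisation of one set of weights and biases."""
    return {
        "w": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b": np.zeros(n_hidden),
        "w_prev": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b_prev": np.zeros(n_hidden),
    }

def gru_step(I_t, O_prev, params):
    """One GRU step: (I_t, O_{t-1}) -> O_t."""
    z = sigmoid(T(I_t, O_prev, params["z"]))              # update gate
    r = sigmoid(T(I_t, O_prev, params["r"]))              # reset gate
    O_bar = np.tanh(T(I_t, r * O_prev, params["cand"]))   # intermediate variable
    return (1.0 - z) * O_prev + z * O_bar                 # blend old output and candidate

rng = np.random.default_rng(6)
n_in, n_hidden, n_steps = 3, 4, 6
params = {name: init_block(rng, n_in, n_hidden) for name in ("z", "r", "cand")}
sequence = rng.normal(size=(n_steps, n_in))               # toy input sequence

O = np.zeros(n_hidden)
for I_t in sequence:
    O = gru_step(I_t, O, params)
print(O)
```

Only `O` is carried from one step to the next; `O_bar` is recomputed at every step, which is the sense in which it is not a transmittable hidden state.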
We here give a brief comparison between LSTM and GRU since they are very similar. First,
LSTM and GRU have comparable performance according to the literature. For any specific task, it is
recommended to try both of them to determine which provides better performance. Second, GRU
is lightweight since it has only two gates and no hidden state. Therefore, GRU is faster to
train and requires less data for generalization. Third, in contrast, LSTM generally works better if
the training dataset is big enough.
4.1.3 Convolutional Neural Networks (CNN). The Convolutional Neural Network is one of the most
popular deep learning models specialized in spatial information exploration. This section will
briefly introduce the working mechanism of CNN. CNN is widely used to discover the latent spatial
information in applications such as image recognition, ubiquitous computing, and object searching due to
its salient features such as regularized structure, good spatial locality, and translation invariance.
In BCI, specifically, CNN is supposed to capture the distinctive dependencies among the patterns
associated with different brain signals.
We present a standard CNN architecture as shown in Figure 15b. The CNN contains one input
layer, two convolutional layers each followed by a pooling layer, one fully-connected layer, and
one output layer. The square patch in each layer shows the processing progress of a specific batch of
input values. The key to the CNN is to reduce the input data into a form which is easier to recognize,
with as little information loss as possible. CNN is built from three types of layers: the convolutional layer,
the pooling layer, and the fully-connected layer.
The convolutional layer is the core block of CNN, which contains a set of filters to convolve
the input data followed by a nonlinear transformation to extract the spatial features. In
the deep learning implementation, several key hyper-parameters should be set in the
convolutional layer, such as the number of filters and the size of each filter. The pooling layer generally
follows the convolutional layer. The pooling layer aims to reduce the spatial size of the features
progressively. In this way, it can help to decrease the number of parameters (e.g., weights and biases)
and the computing burden. There are three kinds of pooling operation: max, min, and average. Take
max pooling for example: the pooling operation outputs the maximum value of the pooling area
as a result. The hyper-parameters in the pooling layer include the pooling operation, the size of
the pooling area, the strides, etc. In the fully-connected layer, as in the basic neural network, the
nodes have full connections to all activations in the previous layer.
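
To make the convolution and pooling steps concrete, here is a small plain-Python sketch for a single 2-D input and a single filter. It is an illustration under simplifying assumptions: one channel, no padding, a stride of one for the convolution, ReLU as the nonlinear transformation, and hand-picked filter values (a trained CNN would learn them).

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(k_h) for b in range(k_w))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    """Nonlinear transformation applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size, stride):
    """Output the maximum value of each pooling area."""
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [[max(feature_map[r + a][c + b]
                 for a in range(size) for b in range(size))
             for c in cols]
            for r in rows]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[0, 0],
          [0, 1]]  # hypothetical filter that picks the bottom-right value of each patch

features = relu(conv2d_valid(image, kernel))      # 3 x 3 feature map
pooled = max_pool(features, size=2, stride=1)     # 2 x 2 map after max pooling
```

The pooling size and stride are the hyper-parameters mentioned above; a larger stride shrinks the feature map more aggressively.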


[Figure 17. Panels: (a) Autoencoder; (b) Restricted Boltzmann Machine; (c) Deep AE; (d) Deep RBM.]

Fig. 17. Illustration of several standard representative deep learning models. (a) A basic autoencoder contains
three layers where the input layer and the output layer are supposed to have the same values. The process
from the input layer to the hidden layer is an encoder while the process from the hidden layer to the output
layer is a decoder. (b) In the Restricted Boltzmann Machine, the encoder and the decoder share the same
transformation weights. The input layer and the output layer are merged into the visible layer. (c) The deep
autoencoder has more than one hidden layer. Generally, the number of hidden layers is odd, and the middle
layer is the learned representative features. (d) The deep RBM has one visible layer and multiple hidden
layers; the last layer is the encoded representation.

The CNN is the most popular deep learning model in BCI research, and it can be used to exploit
the latent spatial dependencies among input brain signals such as fMRI images and spontaneous EEG.
More details will be reported in Section 5.

4.2 Representative Deep Learning Models


The essential blocks of representative deep learning models are autoencoders and restricted
Boltzmann machines17. Deep Belief Networks are composed of AEs or RBMs. The representative
models, including AE, RBM18, and DBN, are unsupervised learning methods. Thus, they can learn
the representative features from only the input observations x without the ground truth y. In short,
representative models receive the input data and output a dense representation of the data. Several
models (such as DBN, Deep RBM, and Deep AE) are defined differently across studies; in this survey,
we choose the most understandable definitions and present them in detail
in this section.
17 AE and RBM are generally regarded as a kind of deep learning although they only have three and two layers, respectively.
18 We regard AE and RBM as representative methods as most studies in BCI adopt them for feature representation.


4.2.1 Autoencoder (AE). As shown in Figure 17a, an autoencoder is a neural network that has
three layers: the input layer, the hidden layer, and the output layer. It differs from the standard
neural network in that the AE is trained to reconstruct its inputs, which forces the hidden layer to
try to learn good representations of the inputs.
The structure of AE contains two blocks. The first block is called the encoder, which embeds the
observation to a latent representation (also called ‘code’),

x_h = σ(T(x))

where x_h represents the hidden layer. The second block is called the decoder, which decodes the
representation into the original space,

y′ = σ(T(x_h))

where y′ represents the output.

AE forces y′ to be equal to the input x and calculates the error based on the distance between
them. Thus, AE can compute the loss function only by x without the ground truth y:

error = ‖y′ − x‖_2    (2)

Compared to Equation 1, this equation does not involve the variable y because it takes the input x
as the ground truth. This is the reason why AE is able to perform unsupervised learning.
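
The encoder, decoder, and reconstruction error of a basic AE can be sketched as follows. This is a minimal forward pass only, with no training; it assumes that T(·) is an affine map, that σ is the sigmoid function, and that the distance in the error is the Euclidean norm. The weight values are arbitrary placeholders.

```python
import math

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def affine(weights, bias, x):
    """T(x) = W x + b, with W given as a list of rows (assumed form of T)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def autoencoder_forward(x, encoder, decoder):
    """Return the code x_h, the reconstruction y', and the reconstruction error."""
    x_h = sigmoid(affine(*encoder, x))        # encoder: input -> code
    y_prime = sigmoid(affine(*decoder, x_h))  # decoder: code -> original space
    # The error depends only on x and y'; no label y is involved.
    error = math.sqrt(sum((yj - xj) ** 2 for yj, xj in zip(y_prime, x)))
    return x_h, y_prime, error

# Placeholder parameters: 4 inputs, 2 hidden nodes (the code has lower dimension).
encoder = ([[0.1, -0.2, 0.3, 0.0], [0.0, 0.4, -0.1, 0.2]], [0.0, 0.0])
decoder = ([[0.0, 0.0]] * 4, [0.0] * 4)  # zero weights: every output is sigmoid(0) = 0.5

x = [0.9, 0.1, 0.5, 0.5]
code, reconstruction, error = autoencoder_forward(x, encoder, decoder)
```

Training would adjust the encoder and decoder parameters to reduce this error over the whole dataset, after which the code can serve as a feature vector for a downstream classifier.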
Naturally, one variant of AE is Deep-AE (D-AE) which has more than one hidden layer. We
present the structure of D-AE with three hidden layers in Figure 17c. From the figure, we can
observe that there is one more hidden layer in both the encoder and the decoder. The symmetrical
structure ensures the smoothness of the encoding and decoding procedure. Thus, D-AE generally has
an odd number of hidden layers (e.g., 2n + 1) where the first n layers belong to the encoder, the
(n + 1)-th layer works as the code which belongs to both encoder and decoder, and the last n layers
belong to the decoder. The data flow of D-AE (Figure 17c) can be represented as

x_h1 = σ(T(x))

x_h2 = σ(T(x_h1))
where x_h2 denotes the middle hidden layer (the code). Decoding the hidden layer, we get

x_h3 = σ(T(x_h2))

y′ = σ(T(x_h3))
It is almost the same as AE except that D-AE has more hidden layers. Apart from D-AE, AE has
many other variants like denoising autoencoder, sparse autoencoder, contractive AE, etc. Here we
only introduce the D-AE because it is easily confused with the AE-based deep belief network. The
key difference between them will be provided in Section 4.2.3.
The core idea of AE and its variants is simple: condense the input data x into a
code x_h (generally the code layer has lower dimension) and then reconstruct the data based
on the code. If the reconstructed y′ approximates the input data x, this indicates
that the condensed code x_h carries enough information about x; thus, we can regard x_h as a
representation of the input data for future operation (e.g., classification).


[Figure 18. Panels: (a) DBN-AE, built from Autoencoder 1 and Autoencoder 2; (b) DBN-RBM, built from RBM 1 and RBM 2.]

Fig. 18. Illustration of deep belief networks. (a) DBN composed of autoencoders. DBN-AE contains multiple
AE components (in this case, two AE), with the hidden layer of the previous AE working as the input layer of
the next AE. The hidden layer of the last AE is the learned representation. (b) DBN composed of RBM. In this
illustration, there are two RBM components with the hidden layer of the first RBM working as the visible
layer of the second RBM. The last hidden layer is the encoded representation. While DBN-RBM and D-RBM
(Figure 17d) have similar architecture, the former is trained greedily while the latter is trained jointly.

4.2.2 Restricted Boltzmann Machine (RBM). Restricted Boltzmann Machine is a stochastic artificial
neural network that can learn a probability distribution over its set of inputs. It contains two
layers including one visible layer (input layer) and one hidden layer, as shown in Figure 17b. From
the figure, we can see that the connection lines between the two layers are bidirectional. RBM is a
variant of Boltzmann Machine with the stronger restriction of being without intra-layer connections19.
Similar to AE, the procedure of RBM also includes two steps. The first step condenses the input data
from the original space to the hidden layer in a latent space. After that, the hidden layer is used to
reconstruct the input data in an identical way. Compared to AE, RBM has a stronger constraint
which is that the encoder weights and the decoder weights should be equal. We have
x_h = σ(T(x))
x′ = σ(T(x_h))
In the above two equations, the weights of T(·) are the same. Then, the error for training can be
calculated by
error = ‖x′ − x‖_2
We can observe from Figure 17d that the Deep-RBM (D-RBM) is an RBM with multiple hidden
layers. The input data from the visible layer first flow to the first hidden layer and then to the second
hidden layer. Then, the code flows backward into the visible layer for reconstruction.
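
The weight-sharing constraint can be illustrated with a deterministic sketch in which the hidden layer is computed with a weight matrix W and the reconstruction uses its transpose. This follows the simplified description above only; an actual RBM samples stochastic units and is usually trained with contrastive divergence, neither of which is shown here. The weight values and the Euclidean error are illustrative assumptions.

```python
import math

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def affine(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def transpose(weights):
    return [list(col) for col in zip(*weights)]

def tied_reconstruction(x, weights, b_hidden, b_visible):
    """Visible -> hidden with W, hidden -> visible with the transpose of the same W."""
    x_h = sigmoid(affine(weights, b_hidden, x))
    x_rec = sigmoid(affine(transpose(weights), b_visible, x_h))
    error = math.sqrt(sum((rj - xj) ** 2 for rj, xj in zip(x_rec, x)))
    return x_h, x_rec, error

# Placeholder weights: 3 visible nodes, 2 hidden nodes, one shared 2 x 3 matrix.
W = [[0.2, -0.1, 0.4],
     [0.0, 0.3, -0.2]]
hidden, reconstructed, error = tied_reconstruction([1.0, 0.0, 1.0], W, [0.0, 0.0], [0.0, 0.0, 0.0])
```

Because the same matrix is used in both directions, this model has roughly half as many weights as an autoencoder with separate encoder and decoder matrices of the same size.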
4.2.3 Deep Belief Networks (DBN). A Deep Belief Network (DBN) is a stack of simple networks,
such as AEs or RBMs [55]. Thus, we divide DBN into DBN-AE (also called stacked AE), which is
composed of AEs, and DBN-RBM (also called stacked RBM), which is composed of RBMs.
As shown in Figure 18a, the DBN-AE contains two AE structures where the hidden layer of the
first AE works as the input layer of the second AE. Training proceeds in two stages. In the first stage,
19 In a general Boltzmann machine, the nodes in the same hidden layer will connect.


[Figure 19. Panels: (a) Variational Autoencoder; (b) Generative Adversarial Networks.]

Fig. 19. Illustration of generative deep learning models. (a) VAE contains two hidden layers. The first
hidden layer is composed of two components: the expectation and the standard deviation, which are learned
separately from the input layer. The second hidden layer represents the encoded information. ϵ denotes
a sample from the standard normal distribution. (b) GAN mainly contains two crucial components: the generator and the
discriminator network. The former receives a latent random variable to generate a fake brain signal while the
latter receives both the real and the generated brain signals and attempts to determine whether each is generated or not.
In BCI, GAN is used to reconstruct or augment data rather than to perform classification.

the input data fed into the first AE follow the rules introduced in Section 4.2.1. The reconstruction
error is calculated and back-propagated to adjust the corresponding weights and biases. This iteration
continues until the AE converges. We get the mapping,

x_1 → x_h1
Then, we move on to the second stage where the learned representative code in the hidden layer
x_h1 is used as the input layer of the second AE, which is

x_2 = x_h1
and then, after the second AE converges, we have

x_2 → x_h2
where x_h2 denotes the hidden layer of the second AE; it is also the final outcome of the
DBN-AE.
The core idea of AE is that of learning a representative code with lower dimensionality but
containing most information of the input data. The idea behind DBN-AE is to learn a more
representative and purer code.
Similarly, the DBN-RBM is composed of several single RBM structures. Figure 18b shows a DBN
with two RBMs where the hidden layer of the first RBM is used as the visible layer of the second
RBM.
Compare the DBN-RBM (Figure 18b) and D-RBM (Figure 17d): they have almost the same
architecture. Likewise, DBN-AE (Figure 18a) and D-AE (Figure 17c) have similar architectures. The
most important difference between the DBN and the deep AE/RBM is that the former is trained
greedily while the latter is trained jointly. In particular, for the DBN, the first AE/RBM is trained
first; after it converges, the second AE/RBM is trained [74]. For the deep AE/RBM, joint training
means that the whole structure is trained together, no matter how many layers it has.
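
Greedy layer-wise training of a DBN-AE can be sketched as a loop in which each AE is trained to completion on the codes produced by the previous one. To keep the example short, each AE here is linear (no activation), is trained with plain stochastic gradient descent on the squared reconstruction error, and runs for a fixed number of epochs instead of testing for convergence. These choices, the function names, and the hyper-parameter values are all assumptions made for illustration.

```python
import random

def matvec(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train_linear_ae(data, n_hidden, epochs=100, lr=0.01, seed=0):
    """Train one linear AE on `data` and return its encoder weights."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h = matvec(w_enc, x)            # code
            y = matvec(w_dec, h)            # reconstruction
            d_y = [2.0 * (yj - xj) for yj, xj in zip(y, x)]  # gradient of squared error
            # Back-propagate to the code before the decoder weights are changed.
            d_h = [sum(w_dec[j][k] * d_y[j] for j in range(n_in)) for k in range(n_hidden)]
            for j in range(n_in):
                for k in range(n_hidden):
                    w_dec[j][k] -= lr * d_y[j] * h[k]
            for k in range(n_hidden):
                for j in range(n_in):
                    w_enc[k][j] -= lr * d_h[k] * x[j]
    return w_enc

def train_dbn_ae(data, hidden_sizes):
    """Greedy training: AE k is trained only after AE k-1 has finished."""
    encoders, layer_input = [], data
    for n_hidden in hidden_sizes:
        w_enc = train_linear_ae(layer_input, n_hidden)
        encoders.append(w_enc)
        # The hidden layer of this AE becomes the input layer of the next AE.
        layer_input = [matvec(w_enc, x) for x in layer_input]
    return encoders, layer_input

rng = random.Random(1)
samples = [[rng.random() for _ in range(4)] for _ in range(6)]
encoders, final_codes = train_dbn_ae(samples, hidden_sizes=[3, 2])
```

Joint training of a deep AE would instead build the whole encoder and decoder stack first and update all layers from a single reconstruction error in every step.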


4.3 Generative Deep Learning Models


Generative deep learning models are mainly used to generate training samples for data augmentation.
In other words, generative deep learning models play a supporting role in the BCI field to enhance
the training data quality and quantity. After the data augmentation, the discriminative models
are employed for the classification. This procedure is designed to improve the robustness and
effectiveness of the trained deep learning networks, especially when the training data is limited.
In short, the generative models receive the input data and output a batch of similar data. In this
section, we will introduce two typical generative deep learning models: Variational Autoencoder
(VAE) and Generative Adversarial Networks (GAN).
4.3.1 Variational Autoencoder (VAE). Variational Autoencoder, proposed in 2013 [92], is an
important variant of AE, and one of the most powerful generative algorithms. The standard AE
and its other variants can be used for representation but fail in generation because
the learned code (or representation) may not be continuous. Therefore, we cannot generate a
random sample which is similar to the input sample. In other words, the standard AE does not
allow interpolation. Thus, we can replicate the input sample but cannot generate a similar one. VAE
has one fundamentally unique property that separates it from other AEs, and it is this property
that makes VAE so useful for generative modeling: the latent spaces are designed to be continuous,
which allows easy random sampling and interpolation. Next, we will introduce how VAE works.
Similar to the standard AE, VAE can be divided into an encoder and decoder where the former
embeds the input data to a latent space and the latter transfers the data from the latent space to the
original space. However, the learned representation in the latent space is forced to approximate
a prior distribution p(z̄), which is generally set as the standard Gaussian distribution. Based on the
reparameterization trick [92], the first hidden layer of VAE is designed to have two parts where
one denotes the expectation µ and the other denotes the standard deviation σ; thus we have
µ = σ(T(x))
σ = σ(T(x))
where σ on the left-hand side is the standard deviation and σ(·) on the right-hand side is the activation function.
Then, the latent code in the hidden layer is not directly calculated but sampled from a Gaussian
distribution N(µ, σ²). The sampled code is
z = µ + σ ∗ ε    (3)
where ε ∼ N(0, I). The representation z is forced to a prior distribution, and the distance error_KL
is measured by the Kullback–Leibler divergence,
error_KL = D_KL(z, p(z̄))
where p(z̄) denotes the prior distribution. In the decoder, z is decoded into the output y′,
y′ = σ(T(z))
and the reconstruction error is
error_recon = ‖y′ − x‖_2
The overall error for VAE combines the KL divergence and the reconstruction error,
error = error_KL + error_recon
The key point of VAE is that all the latent representations z are forced to obey the normal
distribution. Thus, we can randomly sample a representation z′ ∼ p(z̄) from the prior distribution
and then reconstruct a sample based on z′. This is why VAE is so powerful in generation.
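
The sampling step and the KL term can be sketched in plain Python as below. The sketch assumes a diagonal Gaussian code N(µ, σ²) and a standard Gaussian prior, for which the KL divergence has the well-known closed form 0.5 · Σ(µ² + σ² − 1 − ln σ²). The encoder and decoder themselves are omitted, and the function names are illustrative.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL divergence between N(mu, diag(sigma^2)) and N(0, I)."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

rng = random.Random(0)
mu, sigma = [0.5, -1.0], [1.0, 0.5]   # placeholder encoder outputs
z = reparameterize(mu, sigma, rng)     # stochastic code fed to the decoder
error_kl = kl_to_standard_normal(mu, sigma)

# Generation: sample a code directly from the prior and pass it to the decoder.
z_new = [rng.gauss(0.0, 1.0) for _ in mu]
```

The randomness is isolated in eps, so gradients can still flow through µ and σ during training; this is the purpose of the reparameterization.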


4.3.2 Generative Adversarial Networks (GAN). Generative Adversarial Networks [57] were proposed
in 2014 and have achieved great success in a wide range of research areas (e.g., computer vision and
natural language processing). GAN is composed of two simultaneously trained neural networks:
a generator and a discriminator. The generator captures the distribution of the input data, and
the discriminator is used to estimate the probability that a sample came from the training data.
The generator aims to generate fake samples while the discriminator aims to distinguish whether
the sample is genuine. The functions of the generator and the discriminator are opposite; that is
why GAN is called ‘adversarial.’ After the convergence of both the generator and the discriminator,
the discriminator ought to be unable to recognize the generated samples. Thus, the trained
generator can be used to create a batch of samples for further operations such as
classification.
Figure 19b shows the procedure of a standard GAN. The generator receives a noise signal s which
is randomly sampled from a multivariate Gaussian distribution and outputs the fake brain signals
x_F. The discriminator receives the real brain signals x_R and the generated fake sample x_F, and then it
predicts whether the received sample is real or fake. The internal architectures of the generator and
discriminator are designed depending on the data types and scenarios. For instance, we can build
the GAN with convolutional layers on fMRI images since CNN has an excellent ability to extract
spatial features. The discriminator and the generator are trained jointly. After the convergence,
numerous brain signals x_G can be created by the generator. Thus, the training set is enlarged from
x_R to {x_R, x_G} to train a more effective and robust classifier.
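
The opposing objectives of the two networks can be written down as a pair of loss functions. The sketch below takes the discriminator outputs (probabilities of being real) as given numbers, so no network is actually trained. It uses the binary cross-entropy form of the discriminator loss and the commonly used non-saturating generator loss, which is one of several possible choices. The function names and sample values are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator scores generated samples near 1 (non-saturating form)."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

def augment_training_set(real_samples, generated_samples):
    """Enlarge the training set from the real samples to real plus generated samples."""
    return real_samples + generated_samples

# Early in training the discriminator separates the two sets easily.
early = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# At the ideal equilibrium it outputs 0.5 for every sample and cannot tell them apart.
converged = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

x_real = [[0.1, 0.2], [0.3, 0.4]]        # placeholder real signals
x_generated = [[0.15, 0.25]]             # placeholder generator output
training_set = augment_training_set(x_real, x_generated)
```

In an actual GAN, each loss is minimized with respect to its own network in alternating steps, and the generated samples would come from the trained generator rather than being fixed by hand.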

4.4 Hybrid Model


Hybrid deep learning models refer to models which are composed of at least two basic deep learning
models where the basic model is a discriminative, representative, or generative deep learning model.
Hybrid models comprise two subcategories based on their targets: classification-aimed (CA) hybrid
models and non-classification-aimed (NCA) hybrid models.
Most of the deep learning related studies in BCI focus on the first category. Based
on the existing literature, the representative and generative models are employed to enhance
the discriminative models. The representative models can provide more informative and
low-dimensional features for the discrimination while the generative models can help to augment the
training data quality and quantity, which supplies more information for the classification. The CA
hybrid models can be further subdivided into20: 1) several discriminative models combined to
extract more distinctive and robust features (e.g., CNN+RNN); 2) representative model followed
by a discriminative model (e.g., DBN+MLP); 3) generative + representative model followed by
a discriminative model; 4) generative + representative model followed by a non-deep learning
classifier.
A few NCA hybrid models aim for brain signal reconstruction. For example, St-Yves et al. [188]
adopted GAN to reconstruct visual stimuli based on fMRI images.

5 STATE-OF-THE-ART DL TECHNIQUES FOR BCI


In this section, we will systematically summarize the existing state-of-the-art studies for BCI
based on deep learning. Some studies that combine deep learning with traditional machine learning
methods are also listed.

20 The representative model followed by a non-deep learning classifier is regarded as a representative deep learning model.


5.1 Intracortical and ECoG


Because their acquisition is highly invasive, intracortical brain signals are mainly investigated by researchers in
medical or biological fields who may not pay much attention to deep learning techniques. Thus,
few publications work on intracortical brain signals and ECoG using deep learning algorithms.
Antoniades et al. [12] employed CNN to automatically extract features from epileptic intracortical
data in the field of interictal epileptic discharge (IED) detection. IEDs are transients of
electrical activity that appear in the brainwaves of patients with epilepsy. Their accurate detection
and localization are vital to the diagnosis and treatment of epilepsy. This paper designed a CNN
model with two convolutional layers to automatically explore the latent features from the raw
input signals. The input data are sliced into many 80 ms segments with 40 ms overlap, and the
designed model achieved an epilepsy state recognition accuracy of 87.51%. To solve the problem
that the intracortical signals are expensive to collect, the authors also proposed a deep neural
architecture aimed at mapping scalp signals to pseudo-intracranial brain signals [11].
Most ECoG related studies focus on medical healthcare, especially epileptic seizure diagnosis.
For example, Hosseini et al. [77] worked on seizure prediction and localization based on scalp EEG
and ECoG. The ECoG signals were filtered by a fourth-order Butterworth bandpass filter (0.5–150
Hz). After that, the authors manually extracted features through Principal Component Analysis
(PCA), ICA, and Differential Search Algorithm (DSA). Then they compared two deep learning
structures. The first structure is composed of three convolutional layers followed by a softmax
layer, which achieved a binary recognition accuracy of 96%. The second structure adopted a
DBN-AE model with two AE components, and the learned representations were fed into a softmax
layer for classification, which obtained an accuracy of around 93%. This work suggested that
CNN is more powerful than DBN in feature engineering of seizure signals. Kiral-Kornek et al. [93]
attempted to develop an epileptic seizure prediction system operable on a wearable device for
ultra-low power applications. They proposed an MLP algorithm for the prediction and achieved a
mean sensitivity of 69% and a mean time in warning of 27%. Apart from seizure diagnosis, Xie et al.
[225, 226] focused on finger trajectory tracking from ECoG signals. They developed a hybrid deep
learning model based on convolutional layers and LSTM cells. The main contribution of this work
is that they employed CNN for not only spatial convolution but also temporal convolution. The
motivation for the temporal convolutional layer was to make the model learn the optimal band partition
in a data-driven way. The convolution operation produced fixed-length vector representations to
send to the LSTM cell for trajectory tracking. Each ECoG segment lasts for 1 second with 40 ms
overlap. Thus, the model was able to receive a stream of ECoG and form a complete finger
trajectory.
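
The overlapping segmentation used in these studies (for example, 80 ms segments with 40 ms overlap) can be sketched as a sliding window. The sampling rate below is a hypothetical value chosen only to convert milliseconds into samples; the studies' actual recording parameters are not assumed here.

```python
def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(duration_ms * fs_hz / 1000.0))

def sliding_windows(signal, window, overlap):
    """Split a 1-D signal into fixed-length windows that share `overlap` samples."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window length")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]

fs_hz = 1000                                   # hypothetical sampling rate
window = ms_to_samples(80, fs_hz)              # 80 ms segment
overlap = ms_to_samples(40, fs_hz)             # 40 ms overlap
signal = [0.0] * 200                           # placeholder for 200 ms of one channel
segments = sliding_windows(signal, window, overlap)
```

Trailing samples that do not fill a complete window are dropped in this sketch; padding the last segment is an equally reasonable alternative.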

5.2 EEG
More than half of the recent publications are related to EEG signals because this approach is
non-invasive, highly portable, and low-cost. In this section, we will summarize the state-of-the-art
research based on three aspects: EEG oscillations, evoked potentials, and ERD/ERS.
5.2.1 EEG Oscillations. Spontaneous EEG has a vast range of applications since it is well suited
to a range of different scenarios. In particular, spontaneous EEG includes sleeping EEG, motor
imagery EEG, emotional EEG, mental disease EEG, and others. Next, we will present the studies in
each scenario and the deep learning models used.
(1) Sleeping EEG. Assessing sleep quality is important for diagnosing sleep disorders and cultivating
healthy habits. Sleep EEG is mainly used to recognize sleep stages (or sleep score/state) [35]. In
the Rechtschaffen and Kales (R&K) rules, the sleep stages include wakefulness, non-REM (rapid eye
movement) 1, non-REM 2, non-REM 3, non-REM 4, and REM. However, there is no clear distinction
between non-REM 3 and non-REM 4. Therefore, they are combined into slow wave sleep (SWS)
[241]. The American Academy of Sleep Medicine (AASM) recommends segmentation of sleep into
five stages: wakefulness, non-REM 1, non-REM 2, SWS, and REM. Generally,
in sleep stage analysis, the EEG signals are preprocessed by a filter whose passband varies across
papers, but most studies apply a notch filter at 50 Hz to remove powerline noise. The EEG
signals are usually segmented into 30 s windows.
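
The label merging and the 30 s segmentation described above can be sketched as follows. The stage label strings and the sampling rate are hypothetical choices for illustration; datasets use their own label encodings.

```python
# Hypothetical label strings for the six R&K stages, mapped to the five stages used above.
RK_TO_FIVE_STAGE = {
    "W": "W",
    "NREM1": "NREM1",
    "NREM2": "NREM2",
    "NREM3": "SWS",   # non-REM 3 and non-REM 4 are merged into slow wave sleep
    "NREM4": "SWS",
    "REM": "REM",
}

def merge_stages(rk_labels):
    """Convert a sequence of R&K stage labels to the five-stage scheme."""
    return [RK_TO_FIVE_STAGE[label] for label in rk_labels]

def split_into_epochs(signal, fs_hz, epoch_seconds=30):
    """Cut a 1-D recording into consecutive, non-overlapping epochs."""
    n = fs_hz * epoch_seconds
    return [signal[start:start + n] for start in range(0, len(signal) - n + 1, n)]

labels = merge_stages(["W", "NREM1", "NREM3", "NREM4", "REM"])
recording = [0.0] * 250                         # placeholder single-channel signal
epochs = split_into_epochs(recording, fs_hz=2)  # hypothetical 2 Hz rate keeps the example small
```

Each epoch would then receive one stage label, which is the unit of classification in the studies summarized next.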
(i) Discriminative models. Many publications have adopted CNN for sleep-stage classification on
single-channel EEG [186, 206]. Viamala et al. [214] manually extracted time-frequency features
from sleeping EEG signals and adopted a CNN algorithm to analyze them. The EEG signal, collected
from the Fpz−Cz and Pz−Oz channels, was sliced into 30 s segments. The employed CNN achieved
an accuracy of 86% in five-class classification. Shahin et al. [177] manually extracted 57 features in
the frequency domain and fed them into an MLP for classification, which obtained an accuracy
of 90% in insomnia detection. Fernande et al. [49] adopted CNN to analyze physiological signals
including EEG, EOG, and EMG. The model was evaluated over the Sleep Heart Health Scoring
dataset and achieved a precision of 91%, recall of 90%, and F-1 score of 90%.
RNN is also often used in sleep disorder detection. Biswal et al. [27] demonstrated that RNN
performed better than MLP and CNN for sleep stage prediction. Tsiouris et al. [207] extracted
many features, including time-domain, frequency-domain, correlation, and graph-theoretical features.
An LSTM was employed to discover the latent dependencies of the features for seizure detection.
(ii) Representative models. Zhang et al. [241] combined a DBN-RBM with three RBMs for sleep
feature extraction and traditional machine learning classifiers (e.g., SVM) for classification. Tan et
al. [201] adopted a DBN-RBM algorithm to detect sleep spindles from the extracted PSD features of
the sleeping EEG signals. They finally reached an F-1 measure of 92.78% on a local dataset.
(iii) Hybrid models. Manzano et al. [128, 129] proposed a multi-view model to predict sleep stage
by combining CNN and MLP. The CNN was employed to receive the raw EEG data in the time
domain while the MLP received the spectrum obtained by a Short-Time Fourier Transform (STFT)
between 0.5 and 32 Hz. Supratak et al. [197] proposed a model combining a multi-view CNN and
LSTM for automatic sleep-stage scoring based on raw single-channel EEG. The proposed method
utilized convolutional neural networks to extract time-invariant features, and bidirectional long
short-term memory to learn transition rules among sleep stages. Dong et al. [44] proposed a hybrid
deep learning model aimed at temporal sleep stage classification. They took advantage of MLP
for detecting hierarchical features and LSTM for sequential data learning to optimize classification
performance with single-channel recordings.
(2) MI EEG. Extreme Learning Machine (ELM) [46]. Deep learning models have shown
superior performance on the classification of MI EEG and real-motor EEG [69, 145].
(i) Discriminative models. CNN is widely used for the recognition of MI EEG [245]. On the
one hand, in some studies CNN is used only as a classifier to recognize manually extracted features
[86, 232]. Uktveris et al. [210] extracted a large number of EEG features including Mean channel
energy (MCE), Mean window energy (MWE), Channel variance (CV), Mean band power (BP), etc.
All the extracted features were sent into a 2-D CNN for classification. Lee et al. [100] first processed
the MI EEG signals through wavelet transformation and then manually extracted PSD from mu
and beta bands. Finally, they employed a CNN model for recognition and achieved an accuracy
of 78.93%. Apart from CNN, Zhang et al. [247] used a modified LSTM structure to learn affective
information from EEG signals to control smart home appliances.
On the other hand, CNN is applied to the raw EEG data for both feature engineering and
classification [202]. Wang et al. [219] designed a fast convolutional feature extraction approach
based on CNN to learn the latent features from MI-EEG signals. Several weak classifiers are applied
to choose important features for the final classification. Hartmann et al. [69] worked on the
EEG signals corresponding to real motor action. They investigated how the CNN represented
spectral features through the sequence of intermediate stages of the network, which showed higher
sensitivity to EEG phase features at earlier stages and higher sensitivity to EEG amplitude features
at later stages. Moreover, MLP is also applied for MI EEG recognition [193].
(ii) Representative models. DBN is widely employed for MI EEG classification because of its
high representative ability [97, 121]. Ren et al. [162] applied a convolutional DBN based on RBM
components. They claimed that the DBN worked better in feature representation than traditional
hand-crafted features (e.g., CSP, band powers). Li et al. [103] processed EEG signals with discrete
wavelet transformation and then applied a DBN-AE based on denoising AE. They achieved an
accuracy of 73.86% over a local MI EEG dataset. The authors also used denoising AE to generate the
missing values in incomplete EEG signals such as an EEG segment with a portion of data removed
(unevenly spaced). Rekar et al. [160] employed an AE model for feature extraction followed by a
KNN classifier, which achieved an accuracy of 72.38% in binary classification over a local dataset.
Nurse et al. [146] proposed a model combining MLP with Genetic Algorithm (GA) where the
GA was used for optimal hyper-parameter selection (e.g., the number of hidden layers in MLP)
and the MLP worked as the classifier. Zhang et al. [252] combined AE with an XGBoost classifier
to recognize the EEG signals in a multi-person scenario. The authors also proposed a complex
framework by combining LSTM with reinforcement learning to classify multi-modality signals
[248, 253].
(iii) Hybrid models. Several studies proposed hybrid models for the recognition of MI EEG [41].
Fraiwan et al. [50] combined DBN with MLP for neonatal sleep state identification. Twelve features
were extracted from the time and frequency domain of the sleeping EEG signals, which were
refined by a designed DBN-AE. After that, the MLP classifier gave an accuracy of 80.4% on a public
dataset. Tabar et al. [198] combined the time, frequency and location information of the EEG
signals as the input data which would be fed into a CNN for high-level feature extraction. The
features were classified through a DBN-AE with seven AEs while the hidden layer of AE only
had two nodes which corresponded to the probability of the two labels. Tan et al. [200] proposed
a complicated system to achieve multimodal EEG classification. A denoising AE was employed
for dimensionality reduction. A multi-view CNN combined with RNN was proposed to discover the
latent temporal and spatial information from the low-dimension representations. They obtained an
average accuracy of 72.22% over the IIa dataset from BCI competition IV.
(3) Emotional EEG. The emotion of an individual can be evaluated along three aspects: valence,
arousal, and dominance. Each aspect can be rated by an integer between 1 and 9 or can be divided
into positive and negative. The combination of the three aspects forms familiar emotions such as
fear, sadness, and anger. The subject’s EEG signals can be used to predict the
affective state.
(i) Discriminative models. When deep learning first arose, the basic MLP was adopted to classify
manually extracted features [234]. Frydenlund et al. [51] extracted the average
and standard deviation of each EEG band and then fed them into an MLP for emotional affect
estimation.
However, CNN is the most popular model in the area of EEG-based emotion prediction [105, 117]. Li et
al. [105] proposed a hierarchical CNN to implement an EEG-based emotion classifier (positive,
negative and neutral) in a movie-watching task. Differential Entropy (DE) was calculated as the
main feature. This paper was the first to propose converting multi-channel EEG signals into a 2-D
matrix, which takes advantage of the spatial dependencies among EEG channels. For the emotion
recognition task, this paper compared the proposed CNN with a DBN-AE and demonstrated that

, Vol. 1, No. 1, Article 1. Publication date: January 2016.


CNN has better performance than DBN, which is similar to [77]. Wang et al. [216] employed a CNN
algorithm to classify emotional EEG signals. Notably, they augmented the training
set with new EEG samples generated by adding Gaussian noise to the original samples. Li et al. [106]
proposed a novel hierarchical convolutional neural network (HCNN) to recognize the subject’s
emotional state (positive, neutral, and negative) and obtained an accuracy of 88.2%. In the HCNN
structure, each convolutional kernel has only a localized receptive field, so the kernels can capture
the correlations among adjacent electrodes, which might be of great value for the recognition task.
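The noise-based augmentation mentioned above for [216] can be illustrated with a minimal sketch. The function name, the array layout (trials, channels, time points), the number of copies, and the noise level are all assumptions made for illustration; the survey does not report the settings used in [216].

```python
import numpy as np

def augment_with_gaussian_noise(samples, labels, copies=2, noise_std=0.1, seed=0):
    """Return the original trials plus `copies` noisy versions of each trial.

    samples: array of shape (n_trials, n_channels, n_times)
    labels:  array of shape (n_trials,)
    noise_std is relative to each trial's own standard deviation
    (a hypothetical choice, not taken from the surveyed paper).
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    # One noise scale per trial so that quiet and loud trials are perturbed alike.
    scale = noise_std * samples.std(axis=(1, 2), keepdims=True)
    augmented = [samples]
    for _ in range(copies):
        augmented.append(samples + rng.normal(size=samples.shape) * scale)
    # Labels are repeated in the same order as the concatenated trial blocks.
    return np.concatenate(augmented), np.tile(labels, copies + 1)
```

Scaling the noise per trial is one possible design choice; a fixed absolute noise level would be an equally plausible reading of the description.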
RNN and its variants are another group of widely used discriminative models. Talathi [199]
utilized a discriminative deep learning model composed of GRU cells to detect early seizure disease
and achieved competitive performance. Zhang et al. [244] proposed a spatial-temporal recurrent
neural network (STRNN) to integrate feature learning from both spatial and temporal information.
To capture the spatially co-occurring variations of human emotions, a multidirectional RNN
layer captures long-range contextual cues by traversing the spatial regions of each temporal
slice along different directions. Then, a bi-directional temporal RNN layer is used
to learn the discriminative features characterizing the temporal dependencies of the sequences
produced by the spatial RNN layer.
(ii) Representative models. DBN, especially DBN-RBM, is widely used for its unsupervised
representation ability in emotion recognition [53, 107, 110]. For instance, Xu et al. [227] proposed a
DBN-RBM algorithm with three RBMs and an RBM-AE to predict the subject’s affective state.
Nevertheless, it is not a strictly semi-supervised method: the model reported by [227] is composed
of unsupervised feature representation and a supervised softmax layer. The authors also tried
to manually extract the PSD features from 14 narrow-down bands of the EEG signals and then
fed them into DBN-RBM for classification [228]. For Alzheimer’s Disease diagnosis, Zhao et al.
[254] adopted DBN-RBM with three RBMs to extract informative representations after filtering
(0.5 ∼ 30 Hz). The proposed representative model is combined with a traditional classifier (SVM)
and achieved an accuracy of 92%. Another work combined DBN-RBM with Hidden Markov Model
(HMM) and achieved an accuracy of 87.62% in a local dataset [258].
Compared to other representative models, D-RBM appears in only a few studies. Zheng et al. [255, 256]
introduced a D-RBM with five hidden RBM layers to investigate critical frequency bands and
channels in emotion recognition. The authors claimed that they employed a DBN-RBM; however,
the RBMs are trained jointly. Thus it is regarded as D-RBM in this survey. Jia et al. [85] proposed
an interesting algorithm composed of RBMs. The algorithm contains a channel selection
component and an RBM classifier. The data from each EEG channel are reconstructed through
an RBM, and the channels with high reconstruction error are eliminated. The representative features of the
remaining channels are then sent to a D-RBM for affective state recognition.
Emotion is affected by many subjective and environmental factors, such as gender, fatigue,
etc. Yan et al. [118, 230] investigated the differences between males and females in emotion
recognition using EEG and eye movement data. They proposed a novel model called Bimodal Deep
AutoEncoder (BDAE) which is, however, actually formed by RBMs. The BDAE received both EEG
and eye movement features and shared the information in a fusion layer which connected with
an SVM classifier. The results showed that the fearful emotion is more diverse among women
compared with men, and men behave more diversely on the sad emotion compared with women.
Moreover, individual differences in fear are more pronounced than in the other three emotions for
females.
To overcome the mismatched distribution among the samples collected from different subjects
or different experimental sessions, Chai et al. [34] proposed an unsupervised domain adaptation
technique called the subspace alignment autoencoder (SAAE). SAAE combined an AE

and a subspace alignment solution, which could take advantage of both nonlinear transformation
and a consistency constraint. The proposed approach obtained a mean accuracy of 77.88% in a
person-independent scenario.
(iii) Hybrid models. One commonly-used hybrid model is a combination of RNN and MLP. For
example, Alhagry et al. [8] employed an LSTM architecture for feature extraction from emotional
EEG signals and forwarded the features into an MLP for classification, which achieved 85.65%,
85.45%, and 87.99% accuracy on the arousal, valence, and liking classes, respectively. Furthermore, Yin
et al. [237] proposed a multi-view ensemble classifier to recognize emotions using multimodal
physiological signals. The ensemble classifier contains several D-AEs with three hidden layers and
a fusion structure. Each D-AE receives one physiological signal (e.g., EEG, EOG, EMG) and
sends its output to a fusion structure composed of another D-AE. Finally, an
MLP classifier classifies the mixed features. Kawde et al. [90] implemented an affect recognition
system by combining a DBN-RBM for effective feature extraction and an MLP for classification.
(4) Mental Disease EEG. A large number of researchers exploited EEG signals to diagnose
neurological disorders, especially epileptic seizures [240].
(i) Discriminative models. CNN is widely used in the automatic detection of epileptic seizures
[3, 173, 211, 218]. For example, Johansen et al. [87] applied a CNN to high-pass
filtered (>1 Hz) EEG signals of epileptic spikes and achieved an AUC of 94.7%. Acharya et al.
[4] employed a CNN model with 13 layers (five convolutional layers, five pooling layers, and three
fully-connected layers) on depression detection. The method was evaluated on a local dataset with
30 subjects (15 normal and 15 depressed) and achieved the accuracies of 93.5% and 96.0% using
EEG signals from the left and right hemisphere, respectively. Morabito et al. [138] exploited a
CNN structure to extract suitable features of multi-channel EEG signals to classify Alzheimer’s
Disease from a prodromal version of dementia (Mild Cognitive Impairment, MCI) and age-matched
Healthy Controls (HC). The EEG signals were bandpass filtered (0.1 ∼ 30 Hz), and the model achieved
an accuracy of around 82% for three-class classification.
In some research, the discriminative model is only employed for feature extraction. For example,
Ansari et al. [10] used CNN to extract the latent features which are fed into a Random Forest
classifier for the final seizure detection in neonates. Chu et al. [38] employed a CNN to extract
features, which were sent to a random forest for schizophrenia recognition.
REM Behavior Disorder (RBD) may lead to many disorders such as Parkinson’s disease
(PD). Ruffini et al. [164] described an Echo State Network (ESN) model to distinguish RBD patients
from healthy individuals. An ESN, as a particular class of RNN, implements nonlinear dynamics with
memory and seems ideally poised for the classification of complex time series data. The central
concept in ESNs and related types of so-called “reservoir computation” systems is to have data
inputs drive a semi-randomly connected, large, fixed recurrent neural network (the “reservoir”)
where each node/neuron in the reservoir is activated in a nonlinear way.
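The reservoir idea described above can be sketched in a few lines. This is a generic, minimal ESN under stated assumptions, not the model of [164]: the class name, reservoir size, sparsity, spectral radius, tanh activation, and the ridge-regression readout are all illustrative choices. Only the readout weights are trained; the input and recurrent weights stay fixed after random initialization.

```python
import numpy as np

class EchoStateNetwork:
    """Minimal echo state network: fixed random reservoir, trained linear readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 density=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input weights (never trained).
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
        # Sparse, semi-random recurrent weights (never trained).
        w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        w *= rng.random((n_reservoir, n_reservoir)) < density
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        radius = np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w * (spectral_radius / radius)
        self.w_out = None

    def final_state(self, series):
        """Drive the reservoir with a (n_times, n_inputs) series; return the last state."""
        state = np.zeros(self.w.shape[0])
        for u in series:
            state = np.tanh(self.w_in @ u + self.w @ state)
        return state

    def fit(self, trials, targets, ridge=1e-3):
        """Fit the linear readout by ridge regression on the final reservoir states."""
        states = np.array([self.final_state(t) for t in trials])
        gram = states.T @ states + ridge * np.eye(states.shape[1])
        self.w_out = np.linalg.solve(gram, states.T @ np.asarray(targets, dtype=float))
        return self

    def predict(self, trials):
        """Return one real-valued score per trial; threshold it for classification."""
        states = np.array([self.final_state(t) for t in trials])
        return states @ self.w_out
```

Using only the final state as the trial descriptor is the simplest option; averaging states over time is a common alternative.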
(ii) Representative models. For disease detection, one commonly used method is adopting a
representative model (e.g., DBN) followed by a softmax layer for classification [209, 229]. Page et al.
[149] adopted DBN-AE to extract useful features from seizure EEG signals. The extracted features
were fed into a traditional logistic regression classifier for seizure detection. Al et al. [7] proposed
a multi-view DBN-RBM structure to analyze EEG signals from depressed patients. The proposed
approach contains multiple input pathways, each composed of two RBMs and corresponding to
one EEG channel. All the input pathways merge into a shared structure composed
of additional RBMs. The multi-view DBN-RBM achieved competitive results.
Yuan et al. [239] extracted EEG context features in parallel by using global principal component

analysis (GPCA), deep denoising AE, and EEG embeddings, respectively. The multi-features are
concatenated into a fixed-length feature vector for seizure classification.
Some papers favor preprocessing the EEG signals through dimensionality reduction methods
such as PCA and ICA [78] while others prefer to feed the raw signals directly into the representative
model [111]. Lin et al. [111] proposed a sparse D-AE with three hidden layers to extract the
representative features from epileptic EEG signals while Hosseini et al. [78] adopted a similar
sparse D-AE with two hidden layers.
(iii) Hybrid models. A popular hybrid method is a combination of RNN and CNN. Shah et al. [176]
investigated the performance of CNN-LSTM on seizure detection after channel selection. They used
a reduced number of channels ranging from 8 to 20, and achieved sensitivities between 33% and
37% with false alarms in the range of 38% and 50%. Golmohammadi et al. [56] proposed a hybrid
architecture for automatic interpretation of EEG that integrates temporal and spatial context for
sequential decoding of EEG events. 2D and 1D CNNs capture the spatial features while LSTM
networks capture the temporal features. The authors reported a sensitivity of 30.83% and a specificity
of 96.86% on the well-known TUH EEG seizure corpus.
In the detection of early-stage Creutzfeldt-Jakob Disease (CJD), Morabito et al. [139] combined
a D-AE and an MLP. The EEG signals of CJD patients were first bandpass filtered (0.5 ∼ 70 Hz) and
then fed into a D-AE with two hidden layers for feature representation. Finally, the MLP classifier
obtained an accuracy of 81 ∼ 83% on a local dataset. A convolutional autoencoder, which replaces the
fully-connected layers in a standard AE with convolutional and de-convolutional layers, has been applied to
extract seizure features in an unsupervised manner [223].
(5) Data augmentation. Generative models such as GAN can be used for data augmentation in
BCI classification [1]. Palazzo et al. [150] first demonstrated that EEG signals of brain activity encode
visually-related information that enables accurate discrimination between visual object categories.
Then, they extracted a more compact class-dependent representation of EEG data using recurrent
neural networks. Finally, they used the learned EEG manifold to condition image generation
with GANs, which, during inference, read EEG signals and convert them into images.
Kavasidis et al. [88] aimed to convert EEG signals into images. The EEG signals were collected
when the subjects were observing images on a screen. An LSTM layer was employed to extract the
latent features from the EEG signals, and the extracted features were regarded as the input of a GAN
structure. The generator and the discriminator of the GAN were both composed of convolutional
layers. The generator was supposed to generate an image based on the input EEG signals after the
pre-training. Abdelfattach et al. [1] adopted a GAN for seizure data augmentation. The generator
and discriminator are both composed of fully-connected layers. The authors demonstrated that
GAN outperforms AE and VAE. After the augmentation, the classification accuracy increased
dramatically from 48% to 82%.
(6) Others. Other researchers have explored a wide range of interesting topics. The first one is
how EEG is affected by audio/visual stimuli. This differs from the potentials evoked by audio/visual
stimulations because the stimuli in this phenomenon are constant instead of fluctuating at a
particular frequency. Stober et al. [190, 191] claimed that EEG signals of rhythm perception might
contain enough information to distinguish different rhythm types/genres or even identify the
rhythms themselves. The authors conducted an experiment where 13 participants were stimulated
by 24 rhythmic stimuli including 12 East African and 12 Western stimuli. For the 24-category
classification, the proposed CNN achieved a mean accuracy of 24.4%. After that, the authors
exploited convolutional AE for feature learning and CNN for classification and achieved an accuracy
of 27% for 12-class classification [192]. Sternin et al. [189] adopted CNN to extract discriminative
features from the EEG signals to distinguish whether the subject was listening to or imagining music.

Similarly, Sarkar et al. [170] designed two deep learning models to recognize the EEG signals
evoked by audio or visual stimuli. For this binary classification task, the proposed CNN and
DBN-RBM with three RBMs achieved accuracies of 91.63% and 91.75%, respectively. Furthermore,
the spontaneous EEG could be used to distinguish the user’s mental state (logical versus emotional)
[21].
Moreover, some researchers focus on the impact of cognitive load [180] or physical workload
[58] on EEG. Bashivan et al. [23] first extracted informative features through wavelet entropy
and band-specific power which were fed into a DBN-RBM for further refining. Finally, an MLP was
employed for cognitive load level recognition. The authors, in another work [22], also aimed to find
representations that are invariant to inter- and intra-subject differences from multi-channel EEG
time-series in the context of the mental load classification task. They transformed EEG activities into
a sequence of topology-preserving multi-spectral images and then trained a recurrent-convolutional
network to preserve the spatial, spectral, and temporal features of the EEG signals. Yin et al. [236]
collected the EEG signals from different mental workload levels (e.g., high and low) for binary
classification. The EEG signals were filtered by a low-pass filter and transformed to the frequency
domain, and the power spectral density (PSD) was calculated. The extracted PSD features were fed
into a denoising D-AE structure for further refining. They finally achieved an accuracy of 95.48%. Li
et al. [108] worked on the recognition of mental fatigue level including alert, slight fatigue, and
severe fatigue. They adopted a simple DBN-RBM to extract the related features from single-channel
EEG.
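As an illustration of the PSD-based features mentioned above, the following sketch computes the average periodogram power in a few frequency bands of a single-channel signal. It is not the pipeline of [236]: the function names, the band limits (common conventions), and the plain periodogram estimator are assumptions, and the low-pass filtering step is omitted.

```python
import numpy as np

# Band limits in Hz; these follow common conventions and are assumed here.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_power(signal, fs, band):
    """Average periodogram power of a 1-D signal within band = (low_hz, high_hz)."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()          # remove the DC offset
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * signal.size)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

def psd_features(signal, fs):
    """Feature vector with one band-power value per entry of BANDS, in order."""
    return np.array([band_power(signal, fs, band) for band in BANDS.values()])
```

A feature vector of this kind, computed per channel and concatenated, is the sort of input that could then be passed to a representative model for refinement.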
In addition, EEG-based driver fatigue detection is a popular area of research [33, 45, 65, 67].
Huang et al. [82] designed a 3D CNN to predict reaction time in drowsy driving, which could help
reduce traffic accidents. Hajinoroozi et al. [64] adopted a DBN-RBM to handle the EEG signals
which were processed by ICA. They achieved an accuracy of around 85% in binary classification
(‘drowsy’ or ‘alert’). The strength of this paper is that they evaluated the DBN-RBM on three levels:
time samples, channel epochs, and windowed samples. The experiments showed that the channel
epoch level provided the best performance. San et al. [169] combined deep learning models with a
traditional classifier to detect driver fatigue. The model contains a DBN-RBM structure followed by
an SVM classifier, which achieved a detection accuracy of 73.29%. Almogbel et al. [9] investigated
the drivers’ mental state under high workload and low workload. A proposed CNN is claimed to
detect the driver’s cognitive workload directly based on the raw EEG signals.
Research into detection of eye state has shown exceedingly high accuracy. Narejo et al. [141]
explored the detection of eye state (closed or open) based on EEG signals. They tried a DBN-RBM
with three RBMs and a DBN-AE with three AEs and achieved a very high accuracy of 98.9%. Reddy
et al. [159] tried a simpler structure, MLP, for eye state detection and got a slightly lower accuracy
of 97.5%.
There are still a lot of promising areas that have not drawn much attention to date. Baltatzis et al.
[18] adopted CNN to detect school bullying through EEG recorded while the subject watched specific videos. They
achieved 93.7% and 88.58% for binary and four-class classification, respectively. Khurana et al. [91] proposed
deep dictionary learning that outperformed several deep learning methods. Volker et al. [215]
evaluated the use of a deep CNN in a flanker task, which achieved an average accuracy of 84.1%
within subjects and 81.7% on unseen subjects. Zhang et al. [246] combined CNN and graph network
to discover the latent information from the EEG signal.
Miranda-Correa et al. [137] proposed a cascaded framework by combining RNN and CNN to
predict individuals’ affective level and personal factors (Big-five personality traits, mood, and social
context). An experiment conducted by Putten et al. [157] attempted to identify the user’s gender
based on their EEG signals. They employed a standard CNN algorithm and achieved the binary

classification accuracy of 81% over a local dataset. The detection of emergency braking intention
could help to reduce the response time. Hernandez et al. [73] demonstrated that the driver’s
EEG signals could distinguish braking intention from the normal driving state. They employed a CNN
algorithm which achieved an accuracy of 71.8% in binary classification. Behncke et al. [24] applied
deep learning, specifically a CNN model, in the context of robot assistive devices. They attempted to use CNN
to improve the accuracy of decoding robot errors from EEG recorded while the subject watched the robot
perform an object-grasping task and a pouring task.
Teo et al. [203, 204] tried to combine the BCI and recommender system, which predicted the
user’s preference by EEG signals. A cohort of 16 users was shown 60 bracelet-like objects as
rotating visual stimuli (a 3D object) on a computer display while their preferences and EEGs were
recorded. Then, an MLP algorithm was adopted to classify whether the user liked or disliked the
object. This exploration achieved a prediction accuracy of 63.99%. Some researchers have tried to
explore a common framework which can be used for various BCI paradigms. Lawhern et al. [99]
introduced a compact CNN for EEG-based BCI. The authors described the use of depth-wise and
separable convolutions to construct an EEG-specific model which encapsulates well-known EEG
feature extraction concepts for BCI. The proposed EEGNet is evaluated on four BCI paradigms:
P300 visual-evoked potentials, error-related negativity responses (ERN), movement-related cortical
potentials (MRCP), and sensory-motor rhythms (SMR).

5.2.2 Evoked Potential. (1) ERP. In most situations, ERP signals are analyzed in terms of the P300
peak. Likewise, almost all the studies on P300 are based on the ERP paradigm. Therefore, in this
section, a majority of the P300 related publications are introduced in the subsection of VEP/AEP
according to the paradigm.
(i) VEP. VEP is one of the most popular subcategories of ERP [63, 187, 235]. Ma et al. [136] worked
on motion-onset VEP (mVEP) by extracting representative features through deep learning. They
used improved multi-level compressed sensing combined with a genetic algorithm as the first stage
to compress the original mVEP EEG. The compressed signals were sent to a DBN-RBM algorithm to
capture the more abstract high-level features. Maddula et al. [123] filtered the P300 signals evoked by visual
stimuli using a bandpass filter (2 ∼ 35 Hz) and then fed them into a proposed hybrid deep learning
model for further analysis. The model included a 2D CNN structure to capture the spatial features,
followed by an LSTM layer for temporal feature extraction. Liu et al. [116] combined a DBN-RBM
representative model with an SVM classifier for concealed information test and achieved a high
accuracy of 97.3% over a local dataset. Gao et al. [52] employed an AE model for feature extraction
followed by an SVM classifier. In the experiment, each segment contains 150 points which were
divided into five time-steps, and each step had 30 points. This model achieved an accuracy of 88.1%
over a local dataset. A wide range of P300 related studies are based on the P300 speller [179] which
allows the user to write characters, as introduced in Section 3.3.2. Cecotti et al. [29] tried to increase
the P300 detection accuracy for more precise word-spelling. A new model was presented based on
CNN, which included five low-level CNN classifiers with different feature sets; the final
high-level results were obtained by voting among the low-level classifiers. The highest accuracy reached 95.5% over
dataset II from the third BCI competition. Liu et al. [115] proposed a Batch Normalized Neural
Network (BN3), a variant of CNN, for the P300 speller. The proposed method consists of six
layers, and batch normalization was applied to each batch. Kawasaki et al. [89] employed an
MLP model to distinguish P300 segments from non-P300 segments and achieved an accuracy of 90.8%.
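The voting step described above for the multi-classifier CNN of [29] can be sketched as a simple majority vote over binary decisions (P300 versus non-P300). The function name and the hard 0/1 voting rule are assumptions for illustration; the survey does not specify how the low-level outputs are combined in detail.

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary decisions from several low-level classifiers.

    predictions: array of shape (n_classifiers, n_samples) with entries 0 or 1,
    where 1 means "P300 detected". Returns an array of shape (n_samples,).
    With an odd number of classifiers (e.g., five) ties cannot occur.
    """
    predictions = np.asarray(predictions)
    votes_for_target = predictions.sum(axis=0)
    # A sample is labeled 1 only if more than half of the classifiers say so.
    return (2 * votes_for_target > predictions.shape[0]).astype(int)
```

Averaging the classifiers' output scores before thresholding would be an alternative combination rule with the same interface.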
(ii) AEP. A few works focused on the recognition of AEP. For example, Carabez et al. [28]
proposed and tested 18 CNN structures to identify and classify single-trial AEP signals. In the
experiment, the auditory stimuli following the oddball paradigm were presented via earphones
from six different virtual directions. The authors found that the models that consider data from

both the time and space domains and those that overlap in the pooling process usually offer better
results regardless of the number of CNN layers. The AEP signals are bandpass filtered between
0.1 ∼ 8 Hz and downsampled from 256 Hz to 25 Hz. The experimental results showed that the
downsampled data works better.
(iii) RSVP. Among various VEP paradigms, RSVP has attracted much attention [59]. In the analysis
of RSVP, a number of discriminative deep learning models (e.g., CNN [29, 112, 185] and MLP [131])
have achieved great success. A common preprocessing method used in RSVP signals is frequency
filtering. The pass bands generally lie within 0.1 ∼ 50 Hz [126, 178]. Cecotti et al. [30]
worked on the classification of ERP signals in the RSVP paradigm. This paper proposed a CNN
algorithm with a layer dedicated to spatial filtering for the detection of the specific target in RSVP.
In the experiment, the images of faces and cars were regarded as target or non-target, respectively.
Each image was presented for 500 ms and immediately replaced by the subsequent image. In each
session, the target probability was 10%. The proposed model offered an AUC of 86.1%. Yoon et al.
[238] provided a way to analyze the spatial and temporal features of ERP. The authors trained a
CNN with two convolutional layers whose feature maps represented spatial and temporal features
of the event-related potential. The results demonstrated that literate subjects’ ERP shows a high
correlation between the occipital lobe and parietal lobe, whereas illiterate subjects only show the
correlation between neural activities from the frontal lobe and central lobe. Most importantly,
they found that the P700 may be used to distinguish illiterate and literate subjects when the P300
peak is not shown in some subjects’ ERP signals. Hajinoroozi et al. [66] adopted a CNN model for
cross-subject and cross-task detection of RSVP. CNN was designed to capture both temporal and
spatial features. The experimental results showed that CNN worked well in the cross-task scenario but failed to
get satisfying performance in the cross-subject scenario. Mao et al. [130] compared three different
deep learning models in the prediction of whether the subject had seen the target or not. The MLP,
CNN, and DBN models obtained AUCs of 81.7%, 79.6%, and 81.6%, respectively. The authors also
applied a CNN model to analyze the RSVP signals for person identification [132].
Representative deep learning models are also applied in RSVP. Vareka et al. [212] verified whether deep
learning performs well for single-trial P300 classification. They conducted an RSVP experiment
in which the subjects were asked to distinguish the target from non-targets and distracters. A
DBN-AE was implemented and compared with some non-deep learning algorithms. The DBN-AE
was composed of five AEs, while the hidden layer of the last AE had only two nodes, which can be
used for classification through a softmax function. Finally, the proposed model achieved an accuracy
of 69.2%. Manor et al. [127] applied two deep learning models to deal with the RSVP signals after
lowpass filtering (0 ∼ 51 Hz). The discriminative CNN achieved the accuracy of 85.06%. Meanwhile,
the representative convolutional D-AE achieved the accuracy of 80.68%.
(3) SSEP. Most deep-learning based studies in the SSEP field focus on SSVEP, such as [6, 98].
SSVEP are neural oscillations from the parietal and occipital regions of the brain evoked by
flickering visual stimuli. Attia et al. [14] aimed at finding an appropriate intermediate representation
of SSVEP. A hybrid method combining CNN and RNN was proposed to capture the meaningful
features from the time domain directly, which achieved an accuracy of 93.59%. Waytowich et
al. [221] applied a compact CNN model to directly work on the raw SSVEP signals without any
hand-crafted features. The reported cross-subject mean accuracy was approximately 80%. Thomas
et al. [205] first filtered the raw SSVEP signals through a bandpass filter (5 ∼ 48 Hz) and then
applied a discrete FFT to consecutive 512-point segments. The processed data were classified by a CNN
(69.03%) and an LSTM (66.89%) independently.
Perez et al. [153] adopted a representative model, a sparse AE, to extract the distinct features from
the SSVEP from multi-frequency visual stimuli. The proposed model employed a softmax layer for

the final classification and achieved an accuracy of 97.78%. Kulasingham et al. [96] classified SSVEP
signals in the context of a guilty knowledge test. The authors applied DBN-RBM and DBN-AE
independently and achieved the accuracy of 86.9% and 86.01%, respectively. Hachem et al. [62]
investigated the influence of fatigue on SSVEP through an MLP model during wheelchair navigation.
The goal of this study was to seek the key parameters to switch between manual, semi-autonomous,
and autonomous wheelchair command. Aznan et al. [16] explored SSVEP classification where the
signals were collected through dry electrodes. Dry-electrode signals are more challenging because of their lower
SNR compared with standard EEG signals. This study applied a discriminative CNN model and achieved the
highest accuracy of 96% over a local dataset.

5.2.3 ERD/ERS. ERD/ERS is not widely used in BCI due to drawbacks such as unstable accuracy
across subjects [81]. In most situations, the ERD/ERS is regarded as a specific feature of EEG power
for further analysis [41, 198]. In particular, the ERD/ERS is calculated as the relative change in
power with respect to a baseline [37]:
\[
\mathrm{ERD/ERS} = \frac{P_e - P_b}{P_b},
\]
where $P_e$ denotes the average power over a one-second window during the event and $P_b$ denotes
the baseline average power in a one-second window preceding the event. Generally, the baseline
refers to the rest state. For example, Sakhavi et al. [167] calculated the ERD/ERS map and analyzed
the different patterns among different tasks. The analysis demonstrated that changes in energy
should be considered because static energy is not sufficiently discriminatory for some tasks.
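The relative power change defined above can be computed directly from a single-channel trial. The sketch below assumes that the signal has already been band-pass filtered to the frequency band of interest and that power is estimated as the mean squared amplitude; the function name and arguments are hypothetical.

```python
import numpy as np

def erd_ers(trial, fs, event_index):
    """Relative power change (P_e - P_b) / P_b around an event.

    trial:       1-D band-limited signal (assumed already filtered)
    fs:          sampling rate in Hz
    event_index: sample index at which the event starts
    The baseline is the one-second window just before the event and the
    event window is the one second starting at the event.
    """
    trial = np.asarray(trial, dtype=float)
    window = int(fs)
    if event_index < window or event_index + window > trial.size:
        raise ValueError("need one full second before and after the event")
    p_b = np.mean(trial[event_index - window:event_index] ** 2)  # baseline power
    p_e = np.mean(trial[event_index:event_index + window] ** 2)  # event power
    return (p_e - p_b) / p_b
```

Negative values indicate a power decrease relative to baseline (ERD) and positive values indicate an increase (ERS).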

5.3 fNIRS
Up to now, only a few researchers have focussed on deep learning based fNIRS. Naseer et al.
[143] analyzed the difference between two mental tasks (mental arithmetic and rest) based on
fNIRS signals. The authors manually extracted six features from the prefrontal cortex fNIRS and
compared six different classifiers. The results demonstrated that the MLP with the accuracy of
96.3% outperformed all the traditional classifiers including SVM, KNN, naive Bayes, etc. Huve et al.
[83] classified the fNIRS signals which were collected from the subjects during three mental states
including subtractions, word generation, and rest. The employed MLP model achieved an accuracy
of 66.48% based on the hand-crafted features (e.g., the concentration of OxyHb/DeoxyHb). After
that, the authors studied mobile robot control through fNIRS signals and obtained a binary classification
accuracy of 82% (offline) and 66% (online) [84]. Chiarelli et al. [37] exploited the combination of
fNIRS and EEG for left/right MI EEG classification. Sixteen features extracted from fNIRS signals
(eight from OxyHb and eight from DeoxyHb) were fed into an MLP classifier with four hidden
layers.
On the other hand, Hiroyasu et al. [75] attempted to detect the gender of the subject through
their fNIRS signals. The authors employed a denoising D-AE with three hidden layers to extract
distinctive features to be fed into an MLP classifier for gender detection. The model was evaluated
over a local dataset and achieved the average accuracy of 81%. This paper also pointed out that,
compared with PET and fMRI, fNIRS is cheaper and can measure cerebral blood flow changes with
higher temporal resolution.

5.4 fMRI
Recently, several deep learning methods have been applied to fMRI analysis, especially on the
diagnosis of cognitive impairment [213, 222].
(1) Discriminative models.

Among the discriminative models, CNN is a promising model for analyzing fMRI [182]. For
example, Havaei et al. [71] presented brain tumor segmentation approaches based on fMRI. A novel
CNN algorithm was proposed, which can capture both the local features and the global features
simultaneously. The convolutional filters have different sizes, so that the small and large
filters can exploit the local and global features, respectively. Tu et al. [208] used a simultaneous
EEG-fMRI dataset to demonstrate the temporal and spatial hierarchical correspondences
between the multi-stage processing in CNN and the activity observed in the EEG and fMRI. Sarraf
et al. [171, 172] applied deep CNN to recognize Alzheimer’s Disease based on fMRI and MRI data.
Morenolopez et al. [133] employed a CNN model to deal with fMRI of brain tumor patients for
three-class recognition (normal, edema, or active tumor). The model was evaluated over BRATS
dataset and obtained the F1 score of 88%. Hosseini et al. [79] employed CNN for feature extraction.
The extracted features were classified by SVM for the detection of an epileptic seizure.
Furthermore, Li et al. [109] proposed a data completion method based on CNN. Specifically,
the information from fMRI data is used to complete positron emission tomography (PET) data, and
a classifier is then trained on both fMRI and PET data. The input of the proposed CNN is an
fMRI patch with shape [15, 15, 15] and the output is a PET patch with shape [3, 3, 3]. Two
convolutional layers with ten filters each map the input to the output. The experiments showed
that the classifier trained on the combination of fMRI and PET (92.87%) outperformed the one
trained on fMRI alone (91.92%).
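The patch shapes reported for Li et al. [109] constrain how much each convolutional layer can shrink its input. The sketch below is only a sanity check of that arithmetic for unpadded ("valid") convolutions. The kernel size of 7 per layer is an assumption chosen because it maps a side length of 15 to 3 in two layers; it is not a value stated in this survey, and the function name is hypothetical.

```python
def valid_conv_output(size, kernels, stride=1):
    """Side length after a stack of unpadded ('valid') convolutions."""
    for kernel in kernels:
        if kernel > size:
            raise ValueError("kernel is larger than its input")
        size = (size - kernel) // stride + 1
    return size


# Assumed kernel size (hypothetical): 7 in both layers, stride 1.
side = valid_conv_output(15, [7, 7])
print([side] * 3)  # output patch shape for a cubic [15, 15, 15] input
```

With these assumed kernels the side length goes from 15 to 9 after the first layer and from 9 to 3 after the second, which matches the [3, 3, 3] output patch.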
Moreover, Koyamada et al. [95] used a nonlinear MLP to extract common features from different
subjects. The model was evaluated on a dataset from the Human Connectome Project (HCP).
(2) Representative models
A wide range of publications demonstrated the effectiveness of representative models in the
recognition of fMRI data [26, 194]. Hu et al. [80] demonstrated the advantages of deep learning
in diagnosing brain disease and providing clinical decision support for Alzheimer’s disease (AD)
detection. First, the fMRI images were converted to a matrix representing the activity of 90
brain regions. Second, a correlation matrix was obtained by calculating the correlation between
each pair of brain regions to represent the functional connectivity between them. Finally, a
targeted AE was built to classify the correlation matrix, which was sensitive to AD. The
proposed approach achieved an accuracy of 87.5%. Plis et al. [156] employed a DBN-RBM with
three RBM components to extract distinctive features from ICA-processed fMRI and achieved an
average F1 measure above 90% over four public datasets. Suk et al. [195] compared the
effectiveness of DBN-RBM and DBN-AE on Alzheimer’s disease detection. The experimental results
showed that the former obtained an accuracy of 95.4%, slightly lower than the latter (97.9%).
Suk et al. [196] applied a D-AE model to extract latent features from resting-state fMRI data
for the diagnosis of Mild Cognitive Impairment (MCI). The latent features were classified by an
SVM, achieving an accuracy of 72.58%. Ortiz et al. [147] proposed a multi-view DBN-RBM that
receives MRI and PET information simultaneously. The learned representations were sent to
several simple SVM classifiers, which were combined by voting into a stronger, high-level
classifier.
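The functional connectivity step described for Hu et al. [80] amounts to computing a Pearson correlation for every pair of regional time series. The following is a minimal, dependency-free sketch of that step, not the authors' implementation. The random signals, the 120 time-points, and the function names are assumptions made for illustration; only the number of regions (90) comes from the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


def connectivity_matrix(regions):
    """Symmetric region-by-region correlation matrix."""
    n = len(regions)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            corr[i][j] = corr[j][i] = pearson(regions[i], regions[j])
    return corr


random.seed(0)
N_REGIONS, N_TIMEPOINTS = 90, 120  # 120 time-points is a placeholder value
signals = [[random.gauss(0.0, 1.0) for _ in range(N_TIMEPOINTS)]
           for _ in range(N_REGIONS)]
fc = connectivity_matrix(signals)
print(len(fc), len(fc[0]))  # one row and one column per brain region
```

The resulting 90 x 90 matrix (or its upper triangle, since it is symmetric) is the kind of input that a downstream classifier such as an AE would receive.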
(3) Generative models
The reconstruction of natural images from fMRI data has attracted considerable attention [68,
175, 181, 244]. Seeliger et al. [175] proposed a deep convolutional GAN for reconstructing
visual stimuli from fMRI. The objective was to create an image similar to the presented stimulus
image through a well-trained generator. The generator is composed of four convolutional layers
that convert the input fMRI into a natural image. Han et al. [68] focused on generating
synthetic multi-sequence fMRI using GAN. The generated images can be used for data augmentation
to improve diagnostic accuracy, or for physician training to help better understand various
diseases. The authors applied the existing Deep Convolutional GAN (DCGAN) [158] and Wasserstein
GAN (WGAN) [13] and found that the former works better. Shen et al. [181] presented a novel
image reconstruction method in which the pixel values of a generated image are optimized so
that its features, as decoded by an MLP, are similar to those decoded from the real fMRI.
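The idea attributed to Shen et al. [181], optimizing pixel values until the image's features match a target feature vector, can be illustrated with a deliberately simplified toy. The sketch below replaces the real feature extractor with a random linear map and uses plain gradient descent on a squared error; both choices, as well as all names and sizes, are assumptions for illustration and are not the method of the cited paper.

```python
import random


def extract_features(weights, pixels):
    """Toy linear feature extractor: one feature per weight row."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def feature_loss(weights, pixels, target):
    """Squared distance between the image's features and the target features."""
    return sum((f - t) ** 2
               for f, t in zip(extract_features(weights, pixels), target))


def reconstruct(weights, target, steps=2000):
    """Gradient descent on the pixel values, starting from a blank image."""
    n_pixels = len(weights[0])
    # Conservative step size: 0.5 / (sum of squared weights) keeps the
    # iteration stable for this quadratic objective.
    step = 0.5 / sum(w * w for row in weights for w in row)
    pixels = [0.0] * n_pixels
    for _ in range(steps):
        residual = [f - t
                    for f, t in zip(extract_features(weights, pixels), target)]
        # d(loss)/d(pixel j) = 2 * sum_i residual_i * weights[i][j]
        gradient = [2.0 * sum(r * row[j] for r, row in zip(residual, weights))
                    for j in range(n_pixels)]
        pixels = [p - step * g for p, g in zip(pixels, gradient)]
    return pixels


random.seed(0)
weights = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
true_image = [0.2, 0.8, 0.5, 0.1]                # hypothetical 4-pixel "image"
target = extract_features(weights, true_image)   # stands in for fMRI-decoded features
estimate = reconstruct(weights, target)
print(feature_loss(weights, [0.0] * 4, target), feature_loss(weights, estimate, target))
```

In a realistic setting the feature extractor is a deep network and the target features are predicted from brain activity, so the optimization is non-convex and usually needs additional image priors.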

5.5 EOG
In most situations, EOG signals are regarded as artifacts that should be removed from the
collected EEG. However, they can also serve as informative signals for EOG-based BCI.
Although a number of researchers have focused on EOG analysis, only a limited number of papers
utilized deep learning. For example, Xia et al. [224] attempted to detect the subjects’ sleep stage
using only EOG signals. They employed a DBN-RBM for feature representation and an HMM for
classification. Moreover, EOG has been widely used as a supplement to other signals (e.g.,
EEG) in several research topics such as emotion detection [90, 118, 218], sleep stage recognition
[49, 197], and driving fatigue detection [45].

5.6 MEG
Garg et al. [54] worked on refining MEG signals by removing artifacts such as eye-blinks
and cardiac activity. The MEG signals were first decomposed by ICA and then classified by a 1-D
CNN model. The proposed approach achieved a sensitivity of 85% and a specificity of 97%
on a local dataset. Hasasneh et al. [70] also focused on artifact detection (cardiac and ocular
artifacts). The proposed approach used a CNN to capture temporal features and an MLP to extract
spatial information. Shu et al. [183] employed a sparse AE to learn the latent dependencies of MEG
signals in the task of single-word decoding. The results demonstrated that the proposed approach is
advantageous for some subjects, although it did not produce an overall increase in decoding accuracy.
Cichy et al. [40] applied a CNN model to recognize visual objects based on MEG and fMRI signals.
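Sensitivity and specificity, the two metrics reported for Garg et al. [54], are computed from the confusion counts of a binary detector. The sketch below shows the computation on made-up labels; the counts (20 artifact components, 100 clean components) are hypothetical and were chosen only so that the result reproduces 85% and 97%.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # artifact correctly flagged
            else:
                fn += 1   # artifact missed
        else:
            if pred == positive:
                fp += 1   # clean component wrongly flagged
            else:
                tn += 1   # clean component correctly kept
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 17 of 20 artifacts detected, 97 of 100 clean kept.
y_true = [1] * 20 + [0] * 100
y_pred = [1] * 17 + [0] * 3 + [0] * 97 + [1] * 3
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both values matters for artifact removal because a detector can reach a high accuracy on imbalanced data while still missing many of the rare artifact components.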

5.7 Discussion
In this section, we first analyze which deep learning models are most suitable for each type of
BCI signal. Then, we summarize the popular deep learning models in BCI research. We hope this
survey can help readers select the most effective and efficient methods when dealing with BCI
signals. In Table 5, we summarize the BCI signals and the corresponding deep learning models of
the state-of-the-art papers. Hybrid models are divided into three parts: the combination of RNN
and CNN, the combination of representative and discriminative models, and others. Figure 20
illustrates the proportions of publications for the crucial BCI signals and deep learning models.
5.7.1 BCI Signal based Discussion. Our investigations above reveal that studies on non-invasive
signals dominate BCI research. Among the 238 summarized publications, only seven focused on
invasive BCI, and most of them worked on ECoG instead of intracortical signals. One important
reason for this phenomenon is that invasive BCI places higher requirements on the hardware and
the experimental environment. For example, collecting ECoG signals requires a volunteer patient
and a surgeon who can perform a craniotomy, which is beyond the reach of most researchers.
Moreover, there are few public datasets of invasive brain signals, so most people cannot access
invasive data. In terms of the classification of invasive signals, CNN-related algorithms often
show a higher ability to recognize the spikes from cortical neurons.
Besides, among the non-invasive signals, the studies on EEG far outnumber the sum of all the
other BCI paradigms (fNIRS, fMRI, EOG, and MEG). Furthermore, about 70% of the EEG
papers concentrate on spontaneous EEG (133 publications). For better understanding, we split the
spontaneous EEG into several aspects: sleeping, motor imagery, emotional, mental disease, data



Table 5. Summary of BCI studies based on deep learning models. Repre + Discri denotes the hybrid models that combine representative and
discriminative models.

Deep Learning Models


Discriminative Models Representative Models Generative Models Hybrid Models
BCI Signals
DBN
MLP RNN CNN AE (D-AE) RBM (D-RBM) VAE GAN LSTM+CNN Repre + Discri Others
DBN-AE DBN-RBM
[11, 77]
Invasive [93] [12, 77] [225]
[226]
[214],[35],
[197] [128],
Sleeping EEG [27, 177] [27, 207] [186, 206], [201, 241] [50]
[27] [44, 129]
[27, 49]
Non- Spontaneous
[100], [210],
invasive EEG
[145],[242], [146, 248],
Signals EEG [249], [104, 160] [97, 162],
MI EEG [37],[193] [86],[99], [103] [41] [200, 250] [198],[46] [167, 219]
[245, 247] [252] [121]
[69, 232] [253]
[202]
[105],[117], [85],[227],
Emotional [34],[230], [255, 256] [53, 90]
[51] [244] [216], [227] [110],[228], [137] [8]
EEG [118] [85] [237]
[106, 218] [107, 258]
[211],[4],
[3],[138], [239],[111],
Mental Disease [77, 78],
[240] [199],[164] [173],[79], [139, 229], [149] [209, 254] [176]
EEG [7, 56]
[10, 87] [223]
[77]
[1, 41],
Data
[150]
Augmentation
[88]
[24], [82],
[18],[190],
[159, 203] [180],[215],[73], [64],[141],
Others [180] [236],[45] [141] [65, 65] [192],[23, 33] [246]
[234] [9, 157] [108, 169]
[65, 65]
[38, 189]
Table 5. The summary of BCI studies based on deep learning models (continued)

Deep Learning Models


Discriminative Models Representative Models Generative Models Hybrid Models
BCI Signals
DBN
MLP RNN CNN AE (D-AE) RBM (D-RBM) VAE GAN LSTM+CNN Repre + Discri Others
DBN-AE DBN-RBM
[187],[99]
[65, 115], [116, 122], [123],[22], [179, 255]
VEP [89, 94], [88, 187] [52]
[65, 170] [170] [21] [256]
ERP
EP [31]
Non- EEG
[126],[29],
[185],[66],
invasive
[127, 132],
RSVP [130, 131], [212] [127, 130] [30]
[112, 130]
Signals
[59, 238]
[30, 178]
[28, 170],
AEP [170]
[31, 191]
[98], [221],
SSEP SSVEP [62] [205] [16, 205] [96] [96] [14] [153]
[208]
ERD/ERS [37] [41, 198] [167]
[143],[83],
fNIRS [84],[37], [83] [75]
[72]
[40],[86],
[71],[182],
[68, 175],
fMRI [95, 181] [79],[171], [196] [195], [156],[195],[147, 194] [80],[26]
[181, 243]
[109, 208],
[172]
[189, 218]
EOG [45, 118] [224]
[49]
MEG [54],[40] [183] [70]

augmentation, and others. First, the classification of sleeping EEG mainly depends on discriminative
and hybrid models. Among the nineteen studies on sleep stage classification, six employed CNN
or modified CNN models independently, while two adopted RNN models. Three studies used hybrid
models that combine CNN and RNN. Second, in research on MI EEG (30 publications), independent
CNN and CNN-based hybrid models are widely used. As for representative models, DBN-RBM is often
applied to capture the latent features from MI EEG signals. Third, there are twenty-five
publications related to spontaneous emotional EEG. More than half of them employed
representative models (such as D-AE, D-RBM, and especially DBN-RBM) for unsupervised feature
learning. Most affective state recognition works classify the user’s emotion as positive,
neutral, or negative. Some researchers go a step further and classify valence and arousal
levels, which is more complex and challenging. Fourth, research on mental disease diagnosis is
promising and attractive. The majority of the related research focuses on the detection of
epileptic seizures and Alzheimer’s Disease. Since detection is a binary classification problem,
many studies achieve a high accuracy, above 90%. In this area, the standard CNN model and the
D-AE are prevalent. One possible reason is that CNN and AE are the most well-known and
effective deep learning models for classification and dimensionality reduction, respectively.
Fifth, several publications focus on GAN-based data augmentation. Finally, about thirty studies
investigated other spontaneous EEG topics such as driving fatigue, the impact of audio/visual
stimuli, cognitive/mental load, and eye state detection. These studies extensively apply
standard CNN models and their variants.
Moreover, apart from spontaneous EEG, evoked potentials have also attracted much attention. On
the one hand, within ERP, VEP and its subcategory RSVP have drawn many investigations because
visual stimuli, compared to other stimuli, are easier to present and more applicable in the real
world (e.g., the P300 speller can be used for brain typing). For VEP (21 publications), 11
studies applied discriminative models and six adopted hybrid models. In terms of RSVP, CNN is
the most prevalent algorithm. Additionally, five papers focused on the analysis of AEP signals.
On the other hand, among the steady-state related research, only SSVEP has been studied using
deep learning models. Most of these studies only applied discriminative models to recognize the
target image. Finally, few papers attempted to investigate ERD/ERS signals. Several publications
utilized ERD/ERS to analyze the signals and calculated the ERD/ERS value as a distinctive feature.
Furthermore, beyond the diverse EEG paradigms, a number of papers paid attention to
fNIRS and fMRI. fNIRS images are rarely studied with deep learning, and the majority of studies
just employed simple MLP models. We believe more attention should be paid to research
on fNIRS because of its high portability and low cost relative to fMRI. As for fMRI, twenty-three
papers proposed deep learning models for classification. The CNN model is widely used for its
outstanding performance in feature learning from images. Several papers are also interested
in image reconstruction based on fMRI signals. One reason why fMRI is so popular is that several
public datasets are available on the internet, even though fMRI equipment is expensive. EOG
has mainly been regarded as noise rather than a useful signal. However, it enables individuals to
communicate with the outer world through the detection of the user’s eye movements. MEG signals
are mainly used in the medical field, where deep learning algorithms are not widely employed.
Thus, we found very few studies on MEG. The sparse AE and CNN algorithms have a positive influence
on the feature refining and classification of MEG.

5.7.2 Deep Learning Model-based Discussion. In this section, we discuss the deep learning
models applied in BCI systems. First of all, from a high-level view, discriminative models,
especially CNN, are the most widely adopted in the 238 summarized publications. This is reasonable
because almost all BCI issues can be regarded as classification problems. CNN algorithms


Fig. 20. Publication proportions for the crucial BCI signals and deep learning models. Panel (a): publication proportion for BCI signals; panel (b): publication proportion for deep learning models.

account for more than 70% of discriminative models. We provide several possible reasons. First,
the design of CNN is powerful enough to extract the latent discriminative features and spatial
dependencies from EEG signals for classification. As a result, CNN structures are adopted for
classification in some studies and for feature engineering in others. Second, CNN has achieved
great success in other research areas (e.g., computer vision), which makes it extremely well
known and easy to implement (through publicly available code). Thus, BCI researchers have more
opportunities to understand CNN and apply it to their work. Third, some BCI signals (e.g.,
fMRI, MEG) naturally form two-dimensional images conducive to processing by CNN. Meanwhile,
other 1-D signals (e.g., EEG) can be converted into 2-D images for further analysis by CNN.
Here, we provide two methods for converting 1-D EEG signals (with multiple channels) to a 2-D
matrix: 1) convert each time-point (footnote 21) to a 2-D image; 2) convert a segment into a 2-D
matrix. For the first method, suppose we have 32 channels, so that we collect 32 elements (one
per channel) at each time-point. As described in [105], the 32 collected elements can be
converted into a 2-D image based on the spatial positions of the electrodes, as shown in
Figure 6. For the second method, suppose we have 32 channels and the segment contains 100
time-points. The collected data can be arranged as a matrix with the shape [32, 100], where each
row and each column refers to a specific channel and time-point, respectively. Fourth, there are
many variants of CNN that are suitable for a wide range of BCI scenarios. For example,
single-channel EEG signals can be processed by a 1-D CNN. In terms of RNN, only about 20% of the
papers based on discriminative models adopted RNN, which is much less than we expected since RNN
has proven powerful in temporal feature learning. One possible reason for this phenomenon is
that processing a long sequence with an RNN is time-consuming, and EEG signals generally form
long sequences. For example, sleeping EEG signals are usually sliced into 30-second segments,
each of which has 3000 time-points at a 100 Hz sampling rate. For a sequence with 3000 elements,
our preliminary experiments show that an RNN takes more than 20 times as long to train as a CNN.
Moreover, MLP is not popular because its simple architecture makes it less effective (e.g., in
non-linear modeling ability) than the other algorithms.
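The two conversions of multi-channel EEG into 2-D inputs described above can be sketched in a few lines. This is an illustrative sketch only: the synthetic readings, the 6 x 6 grid, and the row-by-row channel placement are assumptions, since the actual electrode layout of Figure 6 is not reproduced here.

```python
def segment_to_matrix(frames):
    """Arrange a segment as [channels][time-points].

    frames[t][ch] is the reading of channel ch at time-point t.
    """
    n_channels = len(frames[0])
    return [[frame[ch] for frame in frames] for ch in range(n_channels)]


def timepoint_to_image(frame, positions, height, width, fill=0.0):
    """Place each channel's reading at its (row, col) position on a grid."""
    image = [[fill] * width for _ in range(height)]
    for value, (row, col) in zip(frame, positions):
        image[row][col] = value
    return image


N_CHANNELS, N_TIMEPOINTS = 32, 100
# Synthetic placeholder readings: frames[t][ch] = t + ch / 100.
frames = [[t + ch / 100 for ch in range(N_CHANNELS)]
          for t in range(N_TIMEPOINTS)]

# Method 2: one segment becomes a [32, 100] matrix.
matrix = segment_to_matrix(frames)
print(len(matrix), len(matrix[0]))

# Method 1: one time-point becomes a 2-D image.
# Hypothetical layout: fill a 6 x 6 grid row by row; a real layout would
# follow the electrode positions on the scalp.
GRID = 6
positions = [(ch // GRID, ch % GRID) for ch in range(N_CHANNELS)]
image = timepoint_to_image(frames[0], positions, GRID, GRID)
print(len(image), len(image[0]))

# Length of a 30-second sleep segment sampled at 100 Hz.
print(30 * 100)
```

Grid cells that no electrode maps to keep the fill value, which is one simple way to handle a non-rectangular electrode arrangement.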
Footnote 21: A time-point is one sampling point. For example, one second of signal contains 100 time-points if the sampling rate is 100 Hz.

Additionally, for the representative models, DBN, especially DBN-RBM, is the most popular
model for feature extraction. DBN is widely used in BCI because of two advantages: 1) there is
an efficient procedure to learn the top-down generative parameters that determine how the
variables in one layer depend on the variables in the layer above; 2) the values of the latent
variables in every layer can be inferred by a single, bottom-up pass that starts with an
observed data vector in the bottom layer and uses the generative weights in the reverse
direction. However, most work that employs the DBN-RBM model was published before 2016,
indicating that DBN is currently less popular. It can be inferred that before 2016 researchers
preferred to use DBN for feature learning followed by a non-deep-learning classifier, whereas
more recent studies increasingly adopt CNN or hybrid models for both feature learning and
classification.
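The second advantage, inferring every layer's latent variables with a single bottom-up pass, can be sketched as repeated sigmoid activations through stacked RBM layers. The sketch below assumes the weights are already trained and uses activation probabilities rather than sampled binary states; the toy weights, biases, and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bottom_up_pass(visible, layers):
    """Infer the hidden activations of each layer in one upward sweep.

    layers is a list of (weights, biases); weights[j][i] links unit i of the
    layer below to hidden unit j, i.e. the generative (top-down) weights
    used in the reverse direction.
    """
    activations = [list(visible)]
    for weights, biases in layers:
        below = activations[-1]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, below)) + b)
                  for row, b in zip(weights, biases)]
        activations.append(hidden)
    return activations


# Toy, untrained parameters for a 4 -> 3 -> 2 stack (illustration only).
layers = [
    ([[0.5, -0.2, 0.1, 0.0],
      [0.3, 0.8, -0.5, 0.2],
      [-0.7, 0.1, 0.4, 0.6]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5],
      [-0.3, 0.2, 0.9]], [0.05, -0.05]),
]
observed = [1.0, 0.0, 1.0, 1.0]
for depth, layer in enumerate(bottom_up_pass(observed, layers)):
    print(depth, [round(a, 3) for a in layer])
```

The top-layer activations are the learned features that, in the pre-2016 studies mentioned above, would typically be handed to a separate classifier.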
Moreover, generative models are rarely employed independently. GAN- and VAE-based
data augmentation and image reconstruction mainly focus on fMRI and EEG signals. It has
been demonstrated that a trained classifier achieves more competitive performance after data
augmentation. Therefore, this is a promising direction for future research.
Last but not least, fifty-three publications proposed hybrid models for BCI studies.
Among them, the combinations of RNN and CNN make up about one-fifth. Since RNN and CNN are
known for their excellent temporal and spatial feature extraction abilities, respectively, it is
natural to combine them for joint temporal and spatial feature learning. Another type of hybrid
model is the combination of representative and discriminative models. This is easy to understand
because the former is employed for feature refining and the latter for classification.
Twenty-eight publications used this type of hybrid deep learning model, covering almost all
types of BCI signals. The adopted representative models are mostly AE or DBN-RBM, while the
adopted discriminative models are mostly CNN. Furthermore, twelve papers proposed other hybrid
models, such as combinations of two discriminative models. For example, several studies proposed
the combination of CNN and MLP, where the CNN structure extracts spatial features that are fed
into an MLP for classification.

6 BCI APPLICATIONS
Deep learning models have contributed to various BCI applications including health care, smart
environments, security, affective computing, etc. In Table 7, we summarize deep learning based
BCI applications. Papers that focus on signal classification without a specific application are
not listed in this table. Therefore, the publication numbers in this table are lower than in Table 5.

6.1 Health Care


In the health care area, deep learning based BCI systems mainly work on the detection and diagnosis
of mental diseases such as sleeping disorders, Alzheimer’s Disease, epileptic seizure, and other
disorders. First, for sleeping disorder detection, most studies focus on sleep-stage detection
based on sleeping spontaneous EEG. In this situation, the researchers do not need to recruit
patients with sleeping disorders because sleeping EEG signals can be easily collected from
healthy individuals. In terms of the algorithm, it can be observed from Table 7 that DBN-RBM and
CNN are widely adopted for feature engineering and classification. Ruffini et al. [164] went one
step further by detecting REM Behavior Disorder (RBD), which may precede neurodegenerative
diseases such as Parkinson’s disease. They achieved an average accuracy of 85% in distinguishing
RBD patients from healthy controls.
Moreover, fMRI is widely used in the diagnosis of Alzheimer’s Disease. By taking advantage
of the high spatial resolution of fMRI, the diagnosis achieved an accuracy above 90% in several
studies. Another reason for the competitive performance is the binary classification
paradigm. Additionally, several publications aim to diagnose AD based on spontaneous EEG
[138, 254].


Table 7. Summary of deep learning based BCI applications. A ‘local’ dataset is a private or otherwise not publicly
available dataset; the public datasets (with links) are introduced in Section 6.9. In the signals column, S-EEG,
MD EEG, and E-EEG denote sleeping EEG, mental disease EEG, and emotional EEG, respectively. A bare
‘EEG’ refers to the other subcategories of spontaneous EEG. In the models column, RF and LR denote the random forest
and logistic regression algorithms, respectively. In the performance column, ‘N/A’, ‘sen’, ‘spe’, ‘aro’, ‘val’,
‘dom’, and ‘like’ denote not-found, sensitivity, specificity, arousal, valence, dominance, and liking, respectively.

Deep Learning
BCI Applications Reference Signals Dataset Performance
Models
Vilamala et al. [214] S-EEG CNN Sleep-EDF 0.86
Chambon et al. [35] S-EEG Multi-view CNN MASS session 3 N/A
Zhang et al. [241] S-EEG DBN + voting UCD 0.9131
Tsinalis et al. [206] S-EEG CNN Sleep-EDF 0.82
Sors et al. [186] S-EEG CNN SHHS 0.87
Manzano et al. [128] S-EEG CNN + MLP Sleep-EDF 0.732
Sleeping University
Quality Shahin et al. [177] S-EEG MLP Hospital 0.9
Evaluation in Berlin
Manzano et al. [129] S-EEG CNN, MLP Sleep-EDF 0.686/0.689
MASS/
Supratak et al. [197] EEG, EOG CNN + LSTM 0.862/0.82
Sleep-EDF
DBN-RBM
Xia et al. [224] EOG Sleep-EDF 0.833
+ HMM
Ruffini et al. [164] S-EEG RNN Local 0.85
Fraiwan et al. [50] S-EEG DBN-AE + MLP Local 0.804
Tan et al. [201] S-EEG DBN-RBM Local 0.9278 (F1)
Fernandez et al. [49] EEG, EOG CNN SHHS 0.9 (F1)
Biswai et al. [27] S-EEG RNN Local 0.8576
Health Hu et al. [80] fMRI D-AE + MLP ADNI 0.875
Care Morabito et al. [138] MD EEG CNN Local 0.82
DBN-AE; 0.979;
Suk et al. [195] fMRI ADNI
DBN-RBM 0.954
AD
Zhao et al. [254] MD EEG DBN-RBM Local 0.92
Detection
Sarraf et al. [171] fMRI CNN ADNI 0.9685
Sarraf et al. [172] fMRI CNN ADNI 0.999
Bhatkoti et al. [26] fMRI, PET AE + MLP ADNI 0.7922
DBN-RBM
Ortiz et al. [147] fMRI, PET ADNI 0.9
+ SVM
Li et al. [109] fMRI CNN + LR ADNI 0.9192
Tsiouris et al. [207] MD EEG LSTM CHB-MIT >0.99
Yuan et al. [240] MD EEG Attention-MLP CHB-MIT 0.9661
Yuan et al. [239] MD EEG D-AE + SVM CHB-MIT 0.95
Ullah et al. [211] MD EEG CNN + voting UBD 0.954
Lin et al.[111] MD EEG D-AE UBD 0.96
Hosseini et al. [78] MD EEG D-AE + MLP Local 0.94
Page et al. [149] MD EEG DBN-AE + LR N/A 0.8 ∼ 0.9
Seizure Sen: 0.3083;
Golmohammadi et al. [56] MD EEG RNN+CNN TUH
Detection Spe: 0.9686
Wen et al. [223] MD EEG AE Local 0.92
Acharya et al. [3] MD EEG CNN UBD 0.8867
Schirmeister et al. [173] MD EEG CNN TUH 0.854
Hosseini et al. [79] MD EEG CNN Local N/A
Talathi et al. [199] MD EEG GRU BUD 0.996
Kiral et al. [93] EcoG MLP Local Sen: 0.69
Johansen et al. [87] MD EEG CNN Local 0.947 (AUC)
Ansari et al. [10] MD EEG CNN + RF Local 0.77


Table 7. Summary of deep learning based BCI applications (Continued).

Deep Learning
BCI Applications Reference Signals Dataset Performance
Models
Seizure
Hosseini et al. [77] EEG, EcoG CNN Local 0.96
Detection
Sen: 0.39;
Shah et al. [176] MD EEG CNN+ LSTM TUH
Spe: 0.9037
DBN-RBM
Turner et al. [209] MD EEG Local N/A
+ LR
Others:
Sen: 0.85,
Cardiac Garg [54] MEG CNN Local
Health Spe: 0.97
Detection
Care Hasasneh et al. [70] MEG CNN + MLP Local 0.944
Acharya et al. [4] MD EEG CNN Local 0.935 ∼ 0.9596
Depression
DBN-RBM
Al et al. [7] MD EEG Local 0.695
+ MLP
Morenolopez et al. [133] fMRI CNN BRATS 0.88 (F1)
Brain Tumor Havaei et al. [71] fMRI Multi-scale CNN BRATS 0.88 (F1)
Shreyas et al. [182] fMRI CNNC BRATS 0.83
Interictal Epileptic Antoniades et al. [12] EcoG CNN Local 0.8751
Discharge (IED) Antoniades et al. [11] EEG, EcoG AE + CNN Local 0.68
Plis et al. [156] fMRI DBN-RBM Combined >0.9 (F1)
Schizophrenia
CNN + RF
Chu et al. [38] Local 0.816, 0.967, 0.992
+ Voting
Creutzfeldt-Jakob
Morabito et al. [139] MD EEG D-AE Local 0.81 ∼ 0.83
Disease (CJD)
Mild Cognitive
Suk et al. [196] fMRI AE + SVM ADNI2 0.7258
Impairment (MCI)
Robot Behncke et al. [24] EEG CNN Local 0.75
Smart Control Huve et al. [84] fNIRS MLP Local 0.82
Environment Exoskeleton
Kwak et al. [98] SSVEP CNN Local 0.9403
Control
Smart
Zhang et al. [247] MI EEG RNN EEGMMI 0.9553
Home
Kawasaki et al. [89] VEP MLP Local 0.908
The third BCI
Cecotti et al. [31] VEP CNN + Voting competition, 0.955
Brain Communication
Dataset II
LSTM+CNN
Zhang et al. [250] MI EEG Local 0.9452
+AE
The third BCI
Cecotti et al. [31] VEP CNN competition, 0.945
Dataset II
Maddula et al. [123] VEP RCNN Local 0.65∼0.76
The third BCI
Liu et al. [115] VEP CNN competition, 0.92 ∼ 0.96
Dataset II
Attention-based
Zhang et al. [249] MI-EEG EEGMMI + local 0.9882
RNN
Identification
Security Koike et al. [94] VEP MLP Local 0.976
Mao et al. [132] RSVP CNN Local 0.97
Authentication Zhang et al. [245] MI EEG Hybrid EEGMMI + local 0.984
Mioranda et al. [137] E-EEG RNN + CNN AMIGOS <0.7
0.8 ∼
Jia et al. [85] E-EEG DBN-RBM DEAP
0.85 (AUC)
Hierarchical
Li et al. [105] E-EEG SEED 0.882
CNN
DBN-AE,
Affective Computing Xu et al. [227] E-EEG DEAP >0.86 (F1)
DBN-RBM
Liu et al. [117] E-EEG CNN Local 0.82
Frydenlund et al. [51] E-EEG MLP DEAP N/A
Multi-view D-AE Aro: 0.7719;
Yin et al. [237] E-EEG DEAP
+ MLP Val: 0.7617
Chai et al. [34] E-EEG AE SEED 0.818
Aro: 0.7033;
Kawde et al. [90] EEG, EOG DBN-RBM DEAP Val: 0.7828;
Dom: 0.7016
Aro:0.642,
Li et al. [110] E-EEG DBN-RBM DEAP Val:0.584,
Dom: 0.658

Table 7. Summary of deep learning based BCI applications (Continued).

BCI Applications Reference Signals Deep Learning Models Dataset Performance


Aro:0.6984,
Xu et al. [228] E-EEG DBN-RBM DEAP Val:0.6688,
Lik: 0.7539
DBN-RBM
Zheng et al. [258] E-EEG Local 0.8762
+ HMM
Affective Computing
Aro:0.8565,
Alhagry et al. [8] E-EEG LSTM + MLP DEAP Val:0.8545,
Lik: 0.8799
Li et al. [75] E-EEG CNN SEED 0.882
DBN-RBM
Zhang et al. [255, 256] E-EEG SEED 0.8608
+ MLP
EEG, SEED,
Liu et al. [118] AE 0.9101, 0.8325
EOG DEAP
DBN-RBM
Gao et al. [53] E-EEG Local 0.684
+ MLP
Zhang et al. [244] E-EEG RNN SEED 0.895
Hung et al. [82, 82] EEG CNN Local 0.572 (RMSE)
Hajinoroozi et al. [64] EEG DBN-RBM Local 0.85
Hung et al. [82] EEG CNN Local
EEG,
Du et al. [45] D-AE + SVM Local 0.094 (RMSE)
Drive Fatigue Detection EOG
San et al. [169] EEG DBN-RBM + SVM Local 0.7392
Almogbel et al. [9] EEG CNN Local 0.9531
Hachem et al. [62] SSVEP MLP Local 0.75
Chai et al. [33] EEG DBN + MLP Local 0.931
Hajinoroozi et al. [65, 65] EEG CNN Local 0.8294
Naseer et al. [143] fNIRS MLP Local 0.963
Yin et al. [236] EEG D-AE Local 0.9584
Hennrich et al. [72] fNIRS MLP Local 0.641
Mental Load Measurement Bashivan et al. [22] EEG R-CNN Local 0.9111
Bashivan et al. [21] EEG DBN + MLP Local N/A
Bashivan et al. [23] EEG DBN-RBM Local 0.92
Li et al. [108] EEG DBN-RBM Local 0.9886
School Bullying Baltatzis et al. [18] EEG CNN Local 0.937
Stober et al. [190] EEG CNN Local 0.776
Stober et al. [192] EEG AE + CNN Open MIIR 0.27 for 12-class
Music Detection
Stober et al. [191] EEG CNN Local 0.244
EEG,
Sternin et al. [189] CNN Local 0.75
EOG
Number
Waytowich et al. [221] SSVEP CNN Local 0.8
Choosing
fMRI,
Cichy et al. [40] CNN N/A N/A
MEG
Other
Manor et al. [126] RSVP CNN Local 0.75
Appli- Visual Object
Cecotti et al. [29] RSVP CNN Local 0.897 (AUC)
-cations Recognition
Hajinoroozi et al. [66] RSVP CNN Local 0.7242 (AUC)
Perez et al. [153] SSVEP AE Local 0.9778
Shamwell et al. [178] RSVP CNN Local 0.7252 (AUC)
Finger BCI
Xie et al. [225, 226] EcoG RNN+CNN N/A
Trajectory Competition IV
Guilty
DBN-RBM; 0.869;
Knowledge Kulasingham et al. [96] SSVEP Local
DBN-AE 0.8601
Test
Concealed
Information Liu et al. [116] EEG DBN-RBM Local 0.973
Test
Flanker Task Volker et al. [215] EEG CNN Local 0.841
Narejo et al. [141] EEG DBN-RBM UCI 0.989
Eye State
Reddy et al. [159] EEG MLP Local 0.975
User Preference Teo et al. [203] EEG MLP Local 0.6399
Emergency
Hernandez et al. [73] EEG CNN Local 0.718
Braking
Gender Hiroyasu et al. [75] fNIRS D-AE + MLP Local 0.81
Detection Putten et al. [157] EEG CNN Local 0.81

Another area that has attracted much attention is the diagnosis of epileptic seizures. Seizure
detection is mainly based on mental disease spontaneous EEG and occasionally on ECoG signals.
The popular deep learning models in this scenario are independent CNN and RNN, along with
hybrid models combining RNN and CNN. Some approaches integrated deep learning models for feature
extraction with traditional classifiers for detection [149, 209]. For example, Yuan et al. [239]
applied a D-AE for feature engineering followed by an SVM for seizure diagnosis. Ullah et al.
[211] trained several different CNN classifiers and adopted voting as a post-processing step to
predict the final result.
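Voting over several classifiers, as described for Ullah et al. [211], can be sketched as a per-sample majority vote over the predicted labels. The predictions below are made up, and the function is a generic illustration rather than the cited authors' post-processing.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per sample.

    predictions[k][n] is the label that classifier k assigns to sample n.
    On a tie, the label seen first among the tied ones is returned.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three classifiers on four EEG segments
# (1 = seizure, 0 = non-seizure).
predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(predictions))
```

Using an odd number of classifiers avoids ties in the binary case, so the tie-breaking rule only matters for multi-class problems or even-sized ensembles.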
Furthermore, there are many other healthcare issues that can potentially be solved by
BCI systems. Cardiac artifacts in MEG signals can be automatically detected by deep learning
models [54, 70]. Several modified CNN structures have been proposed to detect brain tumors based on
fMRI from the public BRATS dataset [71, 133, 182]. The literature demonstrates the effectiveness
of deep learning models in the detection of a number of mental disorders such as depression [4],
Interictal Epileptic Discharge (IED) [12], schizophrenia [156], Creutzfeldt-Jakob Disease (CJD) [139],
and Mild Cognitive Impairment (MCI) [196].

6.2 Smart Environment


The smart environment is a promising application scenario for BCI. With the development of the
Internet of Things (IoT), an increasing number of smart environments can be connected
to BCI. For example, an assistive robot can be used in a smart home [247, 253], in which the robot
is controlled by the brain signals of the individual. Moreover, Behncke et al. [24] and Huve et
al. [84] investigated how to control a robot based on visually stimulated spontaneous EEG and
fNIRS signals, respectively. BCI-controlled exoskeletons could help people with impaired
lower-limb motor control in walking and daily activities [98]. In the future, research on
brain-controlled appliances may benefit the elderly and the disabled by enabling smart homes and
smart hospitals.

6.3 Brain Communication


The biggest advantage of BCI, compared to other human-machine interface techniques, is that
it enables patients who have lost most motor abilities, such as speaking, to communicate with the
outer world. Deep learning technology has substantially improved the efficiency of brain signal
based communication. One typical paradigm that enables an individual to type without using the
motor system is the P300 speller, which converts the user’s intent into text [89]. Powerful deep
learning models allow BCI systems to distinguish P300 segments from non-P300 segments, where the
former contain the communication information of the user [31]. At a higher level, representative
deep learning models can help detect which character (as shown in Figure 12b) the user is focusing
on and print it on the screen to chat with others [31, 115, 123].
Additionally, Zhang et al. [250] proposed a hybrid model combining RNN, CNN, and AE to
extract informative features from MI EEG to recognize which letter the user wants to type. The
proposed interface includes 27 characters (the 26 English letters and the space bar), which
are divided into 3 character blocks (each block contains 9 characters) in the initial interface.
Overall, there are three alternative selections, and each selection leads to a specific sub-interface
that includes 9 characters. Again, the 9 = 3 × 3 characters are divided into three
character blocks, and each of them is connected to a lower-level interface. At the bottom level, each
block represents only one character. However, compared to the P300 speller, the MI-based protocol
has a lower information transfer rate because it requires three operations to reach the specific
letter at the bottom level.
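
The three-level selection can be illustrated with a short sketch. It assumes, purely for illustration, that the 27 characters are laid out in alphabetical order with the space bar last; the actual layout used in [250] may differ, and the names `CHARS` and `decode` are hypothetical.

```python
import string

# Assumed layout: A..Z followed by the space bar (27 symbols in total).
CHARS = string.ascii_uppercase + " "


def decode(selections):
    """Map three ternary selections (each 0, 1 or 2) to one character.

    The first selection picks a block of 9 characters, the second a block
    of 3 within it, and the third a single character.
    """
    if len(selections) != 3 or any(s not in (0, 1, 2) for s in selections):
        raise ValueError("expected three selections, each 0, 1 or 2")
    top, mid, bottom = selections
    return CHARS[9 * top + 3 * mid + bottom]


print(decode((0, 0, 0)))  # A
print(decode((1, 0, 2)))  # L
```

Every character costs exactly three selections under this scheme, which is why the text above describes the protocol as slower than a P300 speller.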

6.4 Security
The security field is a common area of interest for BCI researchers. The security problem can be
divided into identification (also called recognition) and authentication (also called verification)
aspects. The former generally is a multi-class classification problem, and its aim is to recognize the
identity of the test-person [249]. The latter usually is a binary classification problem, which only
cares whether the test-person is authorized or unauthorized [245].
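
The difference between the two tasks can be sketched as two decision rules. The per-user scores, the threshold value, and the function names below are assumptions for illustration only; they do not describe the models in [245, 249].

```python
def identify(scores):
    """Identification (multi-class): return the enrolled user with the highest score."""
    return max(scores, key=scores.get)


def authenticate(score, threshold=0.5):
    """Authentication (binary): accept only if the score reaches the threshold."""
    return score >= threshold


# Hypothetical classifier scores for one EEG sample against three enrolled users.
scores = {"user_a": 0.2, "user_b": 0.7, "user_c": 0.1}
print(identify(scores))                # user_b
print(authenticate(scores["user_b"]))  # True
```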
The existing biometric identification/authentication systems are mainly based on individuals’
unique intrinsic physiological features (e.g., face, iris, retina, voice, and fingerprint). However, the
state-of-the-art person identification systems are vulnerable, e.g., anti-surveillance prosthetic masks
can thwart face recognition, contact lenses can trick iris recognition, vocoders can compromise
voice identification, and fingerprint films can deceive fingerprint sensors. From this perspective,
EEG (electroencephalography) based biometric person identification systems are emerging as
promising alternatives due to their high attack-resilience. An individual’s EEG signals are virtually
impossible for an imposter to mimic, making this approach highly resilient to the spoofing attacks
encountered by other identification techniques. Koike et al. [94] have adopted deep neural networks
to identify the user’s ID based on VEP signals while Mao et al. [132] applied CNN for person
identification based on RSVP signals. Zhang et al. [249] proposed an attention-based LSTM model
and evaluated it over both public and local datasets. The authors [245] then combined EEG signals
with gait information to introduce a dual-authentication system with a hybrid deep learning model.

6.5 Affective Computing


The affective states of a user provide critical information for many applications such as personalized
information (e.g., multimedia content) retrieval or intelligent human-computer interface design
[227]. Recent research illustrated that deep learning models can enhance the performance of
affective computing. Emotion can be described along several dimensions. Dimensional models
of emotion attempt to conceptualize human emotions by defining where they lie in two or three
dimensions. The most widely used circumplex model states that emotions are distributed in two
dimensions: arousal and valence. Arousal refers to the intensity of the emotional stimulus, i.e., how
strong the emotion is. Valence refers to how the emotion is experienced by the person, ranging
from positive to negative. In some other models, the dominance and liking dimensions are
used instead.
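
A minimal sketch of how a rating in the two-dimensional model can be turned into binary labels is given below. The 1 to 9 rating scale and the midpoint of 5 are assumptions (a common convention, not something stated in this survey), and the function names are hypothetical.

```python
def binarize(rating, midpoint=5.0):
    """Return 'high' if the rating lies above the assumed midpoint, else 'low'."""
    return "high" if rating > midpoint else "low"


def emotion_labels(valence, arousal, midpoint=5.0):
    """Binarize each dimension of the circumplex model independently."""
    return {
        "valence": binarize(valence, midpoint),
        "arousal": binarize(arousal, midpoint),
    }


# A hypothetical self-assessment: pleasant (valence 7) but calm (arousal 3).
print(emotion_labels(7.0, 3.0))  # {'valence': 'high', 'arousal': 'low'}
```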
Some papers only attempt to classify the user’s emotional state into two (positive/negative)
or three (positive, neutral, and negative) categories and seek to identify them using deep
learning algorithms [51]. A range of publications adopted CNN and its variants to classify emotional
EEG signals [105, 117, 216]. The DBN-RBM is the most representative deep learning model used to
discover concealed features from emotional spontaneous EEG [227, 256]. Xu et al. [227] applied a
DBN-RBM as a feature extractor for the affective state classification problem using EEG
signals.
Furthermore, at a more fundamental level, some researchers aim to recognize a positive/negative
state for each specific emotional dimension. For example, Yin et al. [237] proposed a
multiple-fusion-layer based ensemble classifier of AEs for recognizing emotions. Each AE consists
of three hidden layers to filter the unwanted noise in the physiological features and derive
stable feature representations. The proposed model was evaluated on the benchmark DEAP dataset
and achieved 77.19% for arousal and 76.17% for valence. Mioranda et al. [137] presented a
multi-task cascaded deep neural network which jointly predicts people’s affective levels (valence
and arousal) and personal factors using EEG signals recorded in response to the presentation of
affective multimedia content.

6.6 Driver Fatigue Detection


A vehicle driver’s ability to maintain optimal performance and attention is essential to ensure
traffic safety. EEG signals have been proven to be useful in evaluating people’s cognitive state
during specific tasks [9]. Generally, the driver is regarded as being in an alert state if the reaction
time is lower than or equal to 0.7 seconds and in a fatigued state if the reaction time is higher than
or equal to 2.1 seconds. Hajinoroozi et al. [64] considered the prediction of driver fatigue from EEG
signals by extracting distinctive features. They explored an approach based on DBN for dimensionality
reduction.
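
The reaction-time rule above can be written as a small labeling function. The two thresholds come from the text; returning `None` for reaction times between them is an assumption, since the text does not say how such trials are treated.

```python
def label_from_reaction_time(rt_seconds, alert_max=0.7, fatigue_min=2.1):
    """Label a trial as 'alert' or 'fatigued' from the driver's reaction time.

    Trials between the two thresholds are returned as None (assumed here to
    be left unlabeled).
    """
    if rt_seconds <= alert_max:
        return "alert"
    if rt_seconds >= fatigue_min:
        return "fatigued"
    return None


for rt in (0.5, 1.4, 2.5):
    print(rt, label_from_reaction_time(rt))
```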
The detection of driver fatigue is crucial because the drowsiness of the driver may lead to
accidents. Additionally, driver fatigue detection is feasible in the real world. In terms of the
hardware, the equipment used to collect EEG signals is off-the-shelf and portable enough to be
used in a car. Moreover, the price of an EEG headset is affordable for most people. In terms of the
algorithms, deep learning models have greatly enhanced the performance of fatigue detection. As
summarized in this survey, EEG-based driving drowsiness can be recognized with high accuracy (82% ∼
95%).
The future scope of driver fatigue detection lies in the self-driving scenario. In
most self-driving situations (e.g., automation level 3; see footnote 22), the human driver is expected
to respond appropriately to a request to intervene, which requires the driver to maintain an alert
state. Therefore, we believe the application of BCI-based driver fatigue detection will benefit the
development of self-driving cars.

6.7 Mental Load Measurement


Evaluation of operator mental workload levels via ongoing electroencephalogram (EEG) is quite
promising in human-machine collaborative task environments for raising an alert when operator
performance is degraded [236]. The human operator is a vital component of automation systems
for decision making and strategy development. However, unlike machines or computers,
human functional states cannot always fit the task requirements due to limited working memory
and time-dependent psychophysiological experience. In such cases, operator performance
degradation caused by abnormal cognitive states, e.g., high working stress or distraction, is
considered to be a crucial factor in catastrophic accidents [152].
A number of researchers have focused on this topic. Mental workload can be measured from
fNIRS signals or spontaneous EEG. Naseer et al. [143] analyzed and compared the classification
accuracies of six different classifiers, including five traditional classifiers and an MLP classifier, for
a two-class mental task (mental arithmetic and rest) using fNIRS signals. The experimental results
showed that the MLP outperformed traditional classifiers such as SVM and kNN and achieved the
highest accuracy of 96.3%. Bashivan et al. [23] presented a statistical approach, a DBN model, to
predict cognitive load from single-trial EEG. Before applying the DBN, the authors manually extracted
wavelet entropy and band-specific power from the theta, alpha, and beta bands. Finally, the experiments
demonstrated the recognition of cognitive load across four different levels with an overall accuracy
of 92% during execution of a memory task.
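
To make the notion of band-specific power concrete, the following sketch computes the power of a single-channel signal in the theta, alpha, and beta bands with a naive discrete Fourier transform. The band limits (4 to 8, 8 to 13, and 13 to 30 Hz) are conventional values assumed here, not taken from [23], and a real pipeline would normally use an FFT library and a windowed estimator instead.

```python
import cmath
import math

# Assumed conventional band limits in Hz (lower bound inclusive, upper exclusive).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_powers(signal, fs):
    """Sum the DFT power of `signal` (sampled at `fs` Hz) inside each band."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):  # positive-frequency bins, skipping DC
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power = abs(coeff) ** 2 / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += power
    return powers


# One second of a synthetic 10 Hz oscillation sampled at 128 Hz.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
powers = band_powers(signal, fs)
print(max(powers, key=powers.get))  # alpha
```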

22 https://en.wikipedia.org/wiki/Self-driving_car


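Table 10. Summary of public datasets for BCI systems. ‘# Sub’, ‘# Cla’, and ‘S-Rate’ denote the number of
subjects, the number of classes, and the sampling rate, respectively. FM denotes finger movement while BCI-C
denotes the BCI competition. The datasets may contain more biometric signals (e.g., ECG) but we only list the
channels related to BCI. The ‘Link’ column gives the footnote number of the download link.

BCI Signals          | Name                      | Link (fn.) | # Sub | # Cla | S-Rate  | # Channel
---------------------+---------------------------+------------+-------+-------+---------+--------------------------------------------
Invasive: FM ECoG    | BCI-C IV, Dataset IV      | 23         | 3     | 5     | 1000    | 48 ∼ 64
Invasive: MI ECoG    | BCI-C III, Dataset I      | 24         | 1     | 2     | 1000    | 64
EEG: Sleeping EEG    | Sleep-EDF: Telemetry      | 25         | 22    | 6     | 100     | 2 EEG, 1 EOG, 1 EMG
EEG: Sleeping EEG    | Sleep-EDF: Cassette       |            | 78    | 6     | 100, 1  | 2 EEG (100Hz), 1 EOG (100Hz), 1 EMG (1Hz)
EEG: Sleeping EEG    | MASS-1                    | 26         | 53    | 5     | 256     | 17/19 EEG, 2 EOG, 5 EMG
EEG: Sleeping EEG    | MASS-2                    |            | 19    | 6     | 256     | 19 EEG, 4 EOG, 1 EMG
EEG: Sleeping EEG    | MASS-3                    |            | 62    | 5     | 256     | 20 EEG, 2 EOG, 3 EMG
EEG: Sleeping EEG    | MASS-4                    |            | 40    | 6     | 256     | 4 EEG, 4 EOG, 1 EMG
EEG: Sleeping EEG    | MASS-5                    |            | 26    | 6     | 256     | 20 EEG, 2 EOG, 3 EMG
EEG: Sleeping EEG    | SHHS                      | 27         | 5804  | N/A   | 125, 50 | 2 EEG (125Hz), 1 EOG (50Hz), 1 EMG (125Hz)
EEG: Seizure EEG     | CHB-MIT                   | 28         | 22    | 2     | 256     | 18
EEG: Seizure EEG     | TUH                       | 29         | 315   | 2     | 200     | 19
EEG: MI EEG          | EEGMMI                    | 30         | 109   | 4     | 160     | 64
EEG: MI EEG          | BCI-C II, Dataset III     | 31         | 1     | 2     | 128     | 3
EEG: MI EEG          | BCI-C III, Dataset III a  |            | 3     | 4     | 250     | 60
EEG: MI EEG          | BCI-C III, Dataset III b  |            | 3     | 2     | 125     | 2
EEG: MI EEG          | BCI-C III, Dataset IV a   |            | 5     | 2     | 1000    | 118
EEG: MI EEG          | BCI-C III, Dataset IV b   |            | 1     | 2     | 1001    | 119
EEG: MI EEG          | BCI-C III, Dataset IV c   |            | 1     | 2     | 1002    | 120
EEG: MI EEG          | BCI-C IV, Dataset I       |            | 7     | 2     | 1000    | 64
EEG: MI EEG          | BCI-C IV, Dataset II a    |            | 9     | 4     | 250     | 22 EEG, 3 EOG
EEG: MI EEG          | BCI-C IV, Dataset II b    |            | 9     | 2     | 250     | 3 EEG, 3 EOG
EEG: Emotional EEG   | AMIGOS                    | 32         | 40    | 4     | 128     | 14
EEG: Emotional EEG   | SEED                      | 33         | 15    | 3     | 200     | 62
EEG: Emotional EEG   | DEAP                      | 34         | 32    | 4     | 512     | 32
EEG: Others EEG      | Open MIIR                 | 35         | 10    | 12    | 512     | 64
VEP                  | BCI-C II, Dataset II b    |            | 1     | 36    | 240     | 64
VEP                  | BCI-C III, Dataset II     |            | 2     | 26    | 240     | 64
fMRI                 | ADNI                      | 36         | 202   | 3     | N/A     | N/A
fMRI                 | BRATS 2013                | 37         | 65    | 4     | N/A     | N/A
MEG                  | BCI-C IV, Dataset III     |            | 2     | 4     | 400     | 10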

23 http://www.bbci.de/competition/iv/
24 http://www.bbci.de/competition/iii/
25 https://physionet.org/physiobank/database/sleep-edfx/
26 https://massdb.herokuapp.com/en/
27 https://physionet.org/pn3/shhpsgdb/
28 https://physionet.org/pn6/chbmit/
29 https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml
30 https://physionet.org/pn4/eegmmidb/
31 http://www.bbci.de/competition/ii/
32 http://www.eecs.qmul.ac.uk/mmv/datasets/amigos/readme.html
33 http://bcmi.sjtu.edu.cn/~seed/download.html

6.8 Other Applications


Apart from the aforementioned key applications, there are other interesting scenarios, such
as recommender systems [203] and emergency braking [73], to which deep learning based BCI can
be applied. One possible topic is the recognition of a visual object, which may be used in the guilty
knowledge test [96] and the concealed information test [116]. The neurons of the participant
produce a pulse when he/she suddenly perceives a familiar object. Based on this theory, visual
target recognition mainly uses RSVP signals. Cecotti et al. [29] investigated the performance of
CNNs in terms of their architecture and how they are evaluated. Specifically, the authors aimed to
build a common target recognition model which can work for various subjects instead of a specific
subject. They examined the change in performance observed between training a
neural network for a single subject and training a neural network for a group of subjects,
the latter taking advantage of a larger number of trials from different subjects.
Other researchers have investigated whether it is possible to distinguish the subject’s gender
using fNIRS [75] or spontaneous EEG [157]. Hiriyasu et al. [75] adopted deep learning to recognize
the gender of the subject based on the cerebral blood flow. The experiment results suggested that
there exists a relation between cerebral blood flow changes and biological information. Putten et al.
[157] tried to discover the sex-specific information from the brain rhythms and adopted a CNN
model to recognize the participant’s gender. This paper illustrated that fast beta activity (20 ∼ 25
Hz) and its spatial distribution are among the main distinctive attributes.

6.9 Benchmark Datasets


In this section, we explore the benchmark datasets which can be used in deep learning
based BCI. As listed in Table 10, we provide 31 reusable public datasets with download links, which
cover most BCI signals. Some sources provide several datasets; for example, the BCI competition IV
(BCI-C IV) contains five datasets. In such cases, we give the access link at the first dataset of that
source. For better understanding, we present the number of subjects, the
number of classes (how many categories), the sampling rate, and the number of channels of each dataset.
In the ‘# Channel’ column, channels are EEG channels unless stated otherwise.

7 FUTURE DIRECTIONS
Although deep learning has increased the performance of BCI systems, technical and usability
challenges remain. The technical challenges concern the classification ability in complex BCI
scenarios, and the usability challenges refer to limitations in large-scale real-world deployment. In
this section, we introduce these challenges and point out possible solutions.

7.1 General Framework


Until now, we have introduced several types of BCI signals (e.g., spontaneous EEG, ERP, fMRI) and
the deep learning models that have been applied to each type. One promising research direction for
deep learning based BCI is to develop a general framework that can handle various BCI signals
regardless of the number of channels used for signal collection, the sample dimensions (e.g., 1-D or
2-D samples), the stimulation types (e.g., visual or audio stimuli), etc. The general framework would
require two key capabilities: an attention mechanism and the ability to capture latent features.
The former guarantees that the framework can focus on the most valuable parts of the input signals, and
the latter enables the framework to capture the distinctive and informative features.

34 https://www.eecs.qmul.ac.uk/mmv/datasets/deap/
35 https://owenlab.uwo.ca/research/the_openmiir_dataset.html
36 http://adni.loni.usc.edu/data-samples/access-data/
37 https://www.med.upenn.edu/sbia/brats2018/data.html

The attention mechanism can be implemented based on attention scores or by various machine
learning algorithms such as reinforcement learning. The attention scores can be inferred from the
input data and work as a weight to help the framework to pay attention to the parts with high
attention scores. Reinforcement learning has been shown to be able to find the most valuable part
through a policy search [248]. CNN is the most suitable structure for capturing features in various
levels and ranges. In the future, CNN could be used as a fundamental feature learning tool and be
integrated with suitable attention mechanisms to form a general classification framework.
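
A minimal sketch of score-based attention is given below: each signal segment receives a score, the scores are normalised into weights, and the segments are combined by a weighted sum. Using a softmax for the normalisation and the names `softmax` and `attend` are assumptions for illustration; in practice the scores would be produced by a learned layer.

```python
import math


def softmax(scores):
    """Turn raw attention scores into positive weights that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(segments, scores):
    """Weighted sum of equal-length feature vectors, weighted by attention."""
    weights = softmax(scores)
    dim = len(segments[0])
    return [sum(w * seg[i] for w, seg in zip(weights, segments))
            for i in range(dim)]


# Two hypothetical segment feature vectors with equal scores: the result is their mean.
print(attend([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]))  # [2.0, 1.0]
```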

7.2 Person-independent Classification


Until now, most BCI classification tasks focus on person-dependent scenarios, where the training and
the testing sets come from the same person. The future direction is to realize person-independent
classification, in which the test subject’s data never appear in the training set. High-performance
person-independent classification is necessary for the wide application of BCI systems in the
real world.
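
One common way to evaluate in the person-independent setting is a leave-one-subject-out split, sketched below under the assumption that each sample carries a subject identifier. The function name and the example identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_indices, test_indices) for every subject.

    All samples of the held-out subject go to the test set, so no data from
    that person is seen during training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Subject identifier of each of six hypothetical EEG samples.
subjects = ["s1", "s1", "s2", "s3", "s3", "s2"]
for held_out, train, test in leave_one_subject_out(subjects):
    print(held_out, train, test)
```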
One possible solution to achieving this goal is to build a personalized model with transfer
learning. A personalized effective model can adopt a transductive parameter transfer approach
to construct individual classifiers and to learn a regression function that maps the relationship
between data distribution and classifier parameters [257]. Another potential solution is mining
the subject-independent components from the input data. The input data can be decomposed
into two parts: a subject-dependent component, which depends on the subject and a subject-
independent component, which is common for all subjects. A hybrid multi-task model can work on
two tasks simultaneously, one focusing on person identification and the other on class recognition.
A well-trained and converged model ought to extract the subject-independent features in a class
recognition task.

7.3 Semi-supervised and Unsupervised Classification


The performance of deep learning models highly depends on the size of the training data, and
collecting abundant class labels requires expensive and time-consuming manual labeling in a wide
range of scenarios such as sleeping EEG. While supervised learning requires both observations and
labels for training, unsupervised learning requires no labels and semi-supervised learning only
requires partial labels [85]. Therefore, the latter two are more suitable for problems with little
ground truth data available.
Zhang et al. proposed an Adversarial Variational Embedding (AVAE) framework that combines
a VAE++ model (as a high-quality generative model) and semi-supervised GAN (as a posterior
distribution learner) [251] for robust and effective semi-supervised learning. Jia et al. [85] proposed
a semi-supervised framework by leveraging label information in feature extraction and integrating
unlabeled information to regularize the supervised training.
Two methods may enhance unsupervised learning: one is to employ crowd-sourcing to label the
unlabeled observations; the other is to leverage unsupervised domain adaptation to align the
distribution of source BCI signals with the distribution of target signals through a linear transformation.
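
As a deliberately simple illustration of aligning two distributions with a linear transformation, the sketch below rescales and shifts one feature of the source domain so that its mean and standard deviation match those of the target domain. This per-feature moment matching is an assumption chosen for brevity; it is not a specific method from the surveyed literature, and practical approaches usually align full covariance structure.

```python
import statistics


def align_feature(source, target):
    """Linearly map `source` values so their mean and std match `target`."""
    mu_s, mu_t = statistics.mean(source), statistics.mean(target)
    sd_s, sd_t = statistics.pstdev(source), statistics.pstdev(target)
    scale = sd_t / sd_s if sd_s > 0 else 1.0  # constant features are only shifted
    return [(x - mu_s) * scale + mu_t for x in source]


# Hypothetical feature values from a source subject and a target subject.
source = [1.0, 2.0, 3.0]
target = [10.0, 20.0, 30.0]
aligned = align_feature(source, target)
print([round(x, 6) for x in aligned])  # [10.0, 20.0, 30.0]
```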

7.4 Hardware Portability


Poor portability of hardware has prevented the wide application of BCI systems in the real world.
In most scenarios, users would like to use small, comfortable, or even wearable BCI hardware to
collect brain signals and to control appliances and assistant robots.
Currently, there are three types of EEG collection equipment: non-portable equipment, portable
headsets, and ear-EEG sensors. Non-portable equipment (e.g., Neuroscan, Biosemi) offers high
sampling frequency, channel count, and signal quality but is expensive. It is suitable for physical
examination in hospitals. Portable headsets (e.g., Neurosky, Emotiv EPOC) have 1 ∼ 14 channels
and 128 ∼ 256 Hz sampling rates but may cause discomfort after long-time use. Ear-EEG
sensors, which are attached to the outer ear, have gained increasing attention recently but remain
mostly at the laboratory stage [148]. The ear-EEG platform comprises a set of electrodes placed
inside each ear canal, together with additional electrodes in the concha of each ear [135]. The
cEEGrid, to the best of our knowledge, is the only commercial ear-EEG device. It has multi-channel
sensor arrays placed around the ear using an adhesive (see footnote 38) and is even more expensive.
A promising future direction is to improve usability by developing cheaper (e.g., under $200) and more
comfortable (e.g., wearable for longer than 3 hours without discomfort) wireless ear-EEG
equipment.

8 CONCLUSION
In this paper, we systematically survey the recent advances in deep learning models for Brain-
Computer Interface. Compared with traditional methods, deep learning not only learns
high-level features automatically from BCI signals but also depends less on hand-crafted features
and domain knowledge. We summarize BCI signals and dominant deep learning models, then
discuss state-of-the-art deep learning techniques for BCI and identify the suitable deep
learning algorithms for each BCI signal type. Finally, we overview deep learning based BCI
applications and point out the open challenges and future directions.

REFERENCES
[1] Sherif M Abdelfattah, Ghodai M Abdelrahman, and Min Wang. 2018. Augmenting The Size of EEG datasets Using
Generative Adversarial Networks. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–6.
[2] Sarah N Abdulkader, Ayman Atia, and Mostafa-Sami M Mostafa. 2015. Brain computer interfacing: Applications and
challenges. Egyptian Informatics Journal 16, 2 (2015), 213–230.
[3] U Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Hojjat Adeli. 2018. Deep convolutional neural
network for the automated detection and diagnosis of seizure using EEG signals. Computers in biology and medicine
100 (2018), 270–278.
[4] U Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, Hojjat Adeli, and D Puthankattil Subha. 2018.
Automated EEG-based screening of depression using deep convolutional neural network. Computer methods and
programs in biomedicine 161 (2018), 103–113.
[5] Minkyu Ahn and Sung Chan Jun. 2015. Performance variation in motor imagery brain–computer interface: a brief
review. Journal of neuroscience methods 243 (2015), 103–110.
[6] Min-Hee Ahn and Byoung-Kyong Min. 2018. Applying deep-learning to a top-down SSVEP BMI. In Brain-Computer
Interface (BCI), 2018 6th International Conference on. IEEE, 1–3.
[7] Alaa M Al-kaysi, Ahmed Al-Ani, and Tjeerd W Boonstra. 2015. A multichannel deep belief network for the classification
of EEG data. In International Conference on Neural Information Processing. Springer, 38–45.
[8] Salma Alhagry, Aly Aly Fahmy, and Reda A El-Khoribi. 2017. Emotion Recognition based on EEG using LSTM
Recurrent Neural Network. Emotion 8, 10 (2017).
[9] Mohammad A Almogbel, Anh H Dang, and Wataru Kameyama. 2018. EEG-signals based cognitive workload detection
of vehicle driver using deep learning. In Advanced Communication Technology (ICACT), 2018 20th International
Conference on. IEEE, 256–259.
[10] Amir H Ansari, Perumpillichira J Cherian, Alexander Caicedo, Gunnar Naulaers, Maarten De Vos, and Sabine Van Huffel.
2018. Neonatal seizure detection using deep convolutional neural networks. International journal of neural systems
(2018), 1850011.
[11] Andreas Antoniades, Loukianos Spyrou, David Martin-Lopez, Antonio Valentin, Gonzalo Alarcon, Saeid Sanei, and
Clive Cheong Took. 2018. Deep neural architectures for mapping scalp to intracranial EEG. International journal of
neural systems (2018), 1850009.

38 http://ceegrid.com/home/concept/

[12] Andreas Antoniades, Loukianos Spyrou, Clive Cheong Took, and Saeid Sanei. 2016. Deep learning for epileptic
intracranial EEG data. In Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on.
IEEE, 1–6.
[13] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein generative adversarial networks. In
International Conference on Machine Learning (ICML). 214–223.
[14] Mohamed Attia, Imali Hettiarachchi, Mohammed Hossny, and Saeid Nahavandi. 2018. A time domain classification of
steady-state visual evoked potentials using deep recurrent-convolutional neural networks. In Biomedical Imaging (ISBI
2018), 2018 IEEE 15th International Symposium on. IEEE, 766–769.
[15] Siriwadee Aungsakun, Angkoon Phinyomark, Pornchai Phukpattaranont, and Chusak Limsakul. 2011. Robust eye
movement recognition using EOG signal for human-computer interface. In International Conference on Software
Engineering and Computer Systems. Springer, 714–723.
[16] Nik Khadijah Nik Aznan, Stephen Bonner, Jason D Connolly, Noura Al Moubayed, and Toby P Breckon. 2018. On the
Classification of SSVEP-Based Dry-EEG Signals via Convolutional Neural Networks. arXiv preprint arXiv:1805.04157
(2018).
[17] Tonio Ball, Markus Kern, Isabella Mutschler, Ad Aertsen, and Andreas Schulze-Bonhage. 2009. Signal quality of
simultaneously recorded invasive and non-invasive EEG. Neuroimage 46, 3 (2009), 708–716.
[18] Vasileios Baltatzis, Kyriaki-Margarita Bintsi, Georgios K Apostolidis, and Leontios J Hadjileontiadis. 2017. Bullying
incidences identification within an immersive environment using HD EEG-based analysis: A Swarm Decomposition
and Deep Learning approach. Scientific reports 7, 1 (2017), 17292.
[19] S Kathleen Bandt, Jarod L Roland, Mrinal Pahwa, Carl D Hacker, David T Bundy, Jonathan D Breshears, Mohit
Sharma, Joshua S Shimony, and Eric C Leuthardt. 2017. The impact of high grade glial neoplasms on human cortical
electrophysiology. PloS one 12, 3 (2017), e0173448.
[20] Ali Bashashati, Mehrdad Fatourechi, Rabab K Ward, and Gary E Birch. 2007. A survey of signal processing algorithms
in brain–computer interfaces based on electrical brain signals. Journal of Neural engineering 4, 2 (2007), R32.
[21] Pouya Bashivan, Irina Rish, and Steve Heisig. 2016. Mental state recognition via Wearable EEG. arXiv preprint
arXiv:1602.00985 (2016).
[22] Pouya Bashivan, Irina Rish, Mohammed Yeasin, and Noel Codella. 2016. Learning representations from EEG with
deep recurrent-convolutional neural networks. ICLR (2016).
[23] Pouya Bashivan, Mohammed Yeasin, and Gavin M Bidelman. 2015. Single trial prediction of normal and excessive
cognitive load through EEG feature fusion. In Signal Processing in Medicine and Biology Symposium (SPMB), 2015 IEEE.
IEEE, 1–5.
[24] Joos Behncke, Robin T Schirrmeister, Wolfram Burgard, and Tonio Ball. 2018. The signature of robot action success
in EEG signals of a human observer: Decoding and visualization using deep convolutional neural networks. In
Brain-Computer Interface (BCI), 2018 6th International Conference on. IEEE, 1–6.
[25] Andrei Belitski, Jason Farquhar, and Peter Desain. 2011. P300 audio-visual speller. Journal of Neural Engineering 8, 2
(2011), 025022.
[26] Pushkar Bhatkoti and Manoranjan Paul. 2016. Early diagnosis of Alzheimer’s disease: A multi-class deep learning
framework with modified k-sparse autoencoder classification. In Image and Vision Computing New Zealand (IVCNZ),
2016 International Conference on. IEEE, 1–5.
[27] Siddharth Biswal, Joshua Kulas, Haoqi Sun, Balaji Goparaju, M Brandon Westover, Matt T Bianchi, and Jimeng Sun.
2017. SLEEPNET: automated sleep staging system via deep learning. arXiv preprint arXiv:1707.08262 (2017).
[28] Eduardo Carabez, Miho Sugi, Isao Nambu, and Yasuhiro Wada. 2017. Identifying Single Trial Event-Related Potentials
in an Earphone-Based Auditory Brain-Computer Interface. Applied Sciences 7, 11 (2017), 1197.
[29] H Cecotti. 2017. Convolutional neural networks for event-related potential detection: impact of the architecture.
In Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International Conference of the IEEE. IEEE,
2031–2034.
[30] Hubert Cecotti, Miguel P Eckstein, and Barry Giesbrecht. 2014. Single-trial classification of event-related potentials
in rapid serial visual presentation tasks using supervised spatial filtering. IEEE transactions on neural networks and
learning systems 25, 11 (2014), 2030–2042.
[31] Hubert Cecotti and Axel Graser. 2011. Convolutional neural networks for P300 detection with application to brain-
computer interfaces. IEEE transactions on pattern analysis and machine intelligence 33, 3 (2011), 433–445.
[32] Hubert Cecotti and Anthony J Ries. 2017. Best practice for single-trial detection of event-related potentials: Application
to brain-computer interfaces. International Journal of Psychophysiology 111 (2017), 156–169.
[33] Rifai Chai, Sai Ho Ling, Phyo Phyo San, Ganesh R Naik, Tuan N Nguyen, Yvonne Tran, Ashley Craig, and Hung T
Nguyen. 2017. Improving eeg-based driver fatigue classification using sparse-deep belief networks. Frontiers in
neuroscience 11 (2017), 103.

[34] Xin Chai, Qisong Wang, Yongping Zhao, Xin Liu, Ou Bai, and Yongqiang Li. 2016. Unsupervised domain adaptation
techniques based on auto-encoder for non-stationary EEG-based emotion recognition. Computers in biology and
medicine 79 (2016), 205–214.
[35] Stanislas Chambon, Mathieu N Galtier, Pierrick J Arnal, Gilles Wainrib, and Alexandre Gramfort. 2018. A deep learning
architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Transactions
on Neural Systems and Rehabilitation Engineering (2018).
[36] Keith H Chiappa. 1997. Evoked potentials in clinical medicine. Lippincott Williams & Wilkins.
[37] Antonio Maria Chiarelli, Pierpaolo Croce, Arcangelo Merla, and Filippo Zappasodi. 2018. Deep learning for hybrid
EEG-fNIRS brain–computer interface: application to motor imagery classification. Journal of neural engineering 15, 3
(2018), 036028.
[38] Lei Chu, Robert Qiu, Haichun Liu, Zenan Ling, Tianhong Zhang, and Jijun Wang. 2017. Individual recognition in
schizophrenia using deep learning methods with random forest and voting classifiers: Insights from resting state EEG
streams. arXiv preprint arXiv:1707.03467 (2017).
[39] Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, and Aude Oliva. 2017. Dynamics of scene representations
in the human brain revealed by magnetoencephalography and deep neural networks. Neuroimage 153 (2017), 346–358.
[40] Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, and Aude Oliva. 2016. Comparison of
deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical
correspondence. Scientific reports 6 (2016), 27755.
[41] Mengxi Dai, Dezhi Zheng, Rui Na, Shuai Wang, and Shuailei Zhang. 2019. EEG Classification of Motor Imagery Using
a Novel Deep Learning Framework. Sensors 19, 3 (2019), 551.
[42] Li Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions
on Signal and Information Processing 3 (2014).
[43] Li Deng, Dong Yu, and others. 2014. Deep learning: methods and applications. Foundations and Trends® in Signal
Processing 7, 3–4 (2014), 197–387.
[44] Hao Dong, Akara Supratak, Wei Pan, Chao Wu, Paul M Matthews, and Yike Guo. 2018. Mixed neural network approach
for temporal sleep stage classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering 26, 2 (2018),
324–333.
[45] Li-Huan Du, Wei Liu, Wei-Long Zheng, and Bao-Liang Lu. 2017. Detecting driving fatigue with multimodal deep
learning. In Neural Engineering (NER), 2017 8th International IEEE/EMBS Conference on. IEEE, 74–77.
[46] Lijuan Duan, Menghu Bao, Jun Miao, Yanhui Xu, and Juncheng Chen. 2016. Classification Based on Multilayer Extreme
Learning Machine for Motor Imagery Task from EEG signals. Procedia Computer Science 88 (2016), 176–184.
[47] Lawrence Ashley Farwell and Emanuel Donchin. 1988. Talking off the top of your head: toward a mental prosthesis
utilizing event-related brain potentials. Electroencephalography and clinical Neurophysiology 70, 6 (1988), 510–523.
[48] Mehrdad Fatourechi, Ali Bashashati, Rabab K Ward, and Gary E Birch. 2007. EMG and EOG artifacts in brain computer
interface systems: A survey. Clinical neurophysiology 118, 3 (2007), 480–494.
[49] Isaac Fernández-Varela, Dimitrios Athanasakis, Samuel Parsons, Elena Hernández-Pereira, and Vicente Moret-Bonillo.
2018. Sleep staging with deep learning: a convolutional model. In Proceedings of the European Symposium on Artificial Neural
Networks, Computational Intelligence and Machine Learning (ESANN 2018).
[50] Luay Fraiwan and Khaldon Lweesy. 2017. Neonatal sleep state identification using deep learning autoencoders. In
Signal Processing & its Applications (CSPA), 2017 IEEE 13th International Colloquium on. IEEE, 228–231.
[51] Arvid Frydenlund and Frank Rudzicz. 2015. Emotional affect estimation using video and EEG data in deep neural
networks. In Canadian Conference on Artificial Intelligence. Springer, 273–280.
[52] Wei Gao, Jin-an Guan, Junfeng Gao, and Dao Zhou. 2015. Multi-ganglion ANN based feature learning with application
to P300-BCI signal classification. Biomedical Signal Processing and Control 18 (2015), 127–137.
[53] Yongbin Gao, Hyo Jong Lee, and Raja Majid Mehmood. 2015. Deep learninig of EEG signals for emotion recognition.
In Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on. IEEE, 1–5.
[54] Prabhat Garg, Elizabeth Davenport, Gowtham Murugesan, Ben Wagner, Christopher Whitlow, Joseph Maldjian, and
Albert Montillo. 2017. Automatic 1D convolutional neural network-based detection of artifacts in MEG acquired
without electrooculography or electrocardiography. In Pattern Recognition in Neuroimaging (PRNI), 2017 International
Workshop on. IEEE, 1–4.
[55] Patrick O Glauner. 2015. Comparison of training methods for deep neural networks. arXiv preprint arXiv:1504.06825
(2015).
[56] Meysam Golmohammadi, Saeedeh Ziyabari, Vinit Shah, Silvia Lopez de Diego, Iyad Obeid, and Joseph Picone. 2017.
Deep Architectures for Automated Seizure Detection in Scalp EEGs. arXiv preprint arXiv:1712.09776 (2017).
[57] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680.
[58] Yuri Gordienko, Sergii Stirenko, Yuriy Kochura, Oleg Alienin, Michail Novotarskiy, and Nikita Gordienko. 2017. Deep
learning for fatigue estimation on the basis of multimodal human-machine interactions. arXiv preprint arXiv:1801.06048
(2017).
[59] Stephen M Gordon, Matthew Jaswa, Amelia J Solon, and Vernon J Lawhern. 2017. Real world BCI: cross-domain
learning and practical applications. In Proceedings of the 2017 ACM Workshop on An Application-oriented Approach to
BCI out of the laboratory. ACM, 25–28.
[60] Christoph Guger, Shahab Daban, Eric Sellers, Clemens Holzner, Gunther Krausz, Roberta Carabalona, Furio Gramatica,
and Guenter Edlinger. 2009. How many people are able to control a P300-based brain–computer interface (BCI)?
Neuroscience letters 462, 1 (2009), 94–98.
[61] Qiong Gui, Maria Ruiz-Blondet, Sarah Laszlo, and Zhanpeng Jin. 2019. A Survey on Brain Biometrics. Comput. Surveys
51, 112 (2019).
[62] A Hachem, Mohamed Moncef Ben Khelifa, Adel M Alimi, Philippe Gorce, SP Valan Arasu, S Baulkani,
Sukant Kishoro Bisoy, Prasant Kumar Pattnaik, Selvi Ravindran, Narayanasamy Palanisamy,
and others. 2014. Effect of fatigue on SSVEP during virtual wheelchair navigation. Journal of Theoretical and Applied
Information Technology 65, 1 (2014).
[63] Ali Haider and Reza Fazel-Rezai. 2017. Application of P300 Event-Related Potential in Brain-Computer Interface. In
Event-Related Potentials and Evoked Potentials. InTech.
[64] Mehdi Hajinoroozi, Tzyy-Ping Jung, Chin-Teng Lin, and Yufei Huang. 2015. Feature extraction with deep belief
networks for driver’s cognitive states prediction from EEG data. In Signal and Information Processing (ChinaSIP), 2015
IEEE China Summit and International Conference on. IEEE, 812–815.
[65] Mehdi Hajinoroozi, Zijing Mao, and Yufei Huang. 2015. Prediction of driver’s drowsy and alert states from EEG
signals with deep learning. In Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015 IEEE 6th
International Workshop on. IEEE, 493–496.
[66] Mehdi Hajinoroozi, Zijing Mao, Yuan-Pin Lin, and Yufei Huang. 2017. Deep Transfer Learning for Cross-subject
and Cross-experiment Prediction of Image Rapid Serial Visual Presentation Events from EEG Data. In International
Conference on Augmented Cognition. Springer, 45–55.
[67] Mehdi Hajinoroozi, Jianqiu Zhang, and Yufei Huang. 2017. Prediction of fatigue-related driver performance from EEG
data by deep Riemannian model. In Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International
Conference of the IEEE. IEEE, 4167–4170.
[68] Changhee Han, Hideaki Hayashi, Leonardo Rundo, Ryosuke Araki, Wataru Shimoda, Shinichi Muramatsu, Yujiro
Furukawa, Giancarlo Mauri, and Hideki Nakayama. 2018. GAN-based synthetic brain MR image generation. In 2018
IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 734–738.
[69] Kay Gregor Hartmann, Robin Tibor Schirrmeister, and Tonio Ball. 2018. Hierarchical internal representation of
spectral features in deep convolutional networks trained for EEG decoding. In Brain-Computer Interface (BCI), 2018 6th
International Conference on. IEEE, 1–6.
[70] Ahmad Hasasneh, Nikolas Kampel, Praveen Sripad, N Jon Shah, and Jürgen Dammers. 2018. Deep Learning Approach
for Automatic Classification of Ocular and Cardiac Artifacts in MEG Data. Journal of Engineering 2018 (2018).
[71] Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal,
Pierre-Marc Jodoin, and Hugo Larochelle. 2017. Brain tumor segmentation with deep neural networks. Medical image
analysis 35 (2017), 18–31.
[72] Johannes Hennrich, Christian Herff, Dominic Heger, and Tanja Schultz. 2015. Investigating deep learning for fNIRS
based BCI. In EMBC. 2844–2847.
[73] Luis G Hernández, Oscar Martinez Mozos, José M Ferrández, and Javier M Antelis. 2018. EEG-Based Detection of
Braking Intention Under Different Car Driving Conditions. Frontiers in neuroinformatics 12 (2018).
[74] Geoffrey E Hinton and Ruslan R Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks.
Science 313, 5786 (2006), 504–507.
[75] Tomoyuki Hiroyasu, Kenya Hanawa, and Utako Yamamoto. 2014. Gender classification of subjects from cerebral blood
flow changes using Deep Learning. In Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on.
IEEE, 229–233.
[76] Mark L Homer, Arto V Nurmikko, John P Donoghue, and Leigh R Hochberg. 2013. Sensors and decoding for intracortical
brain computer interfaces. Annual review of biomedical engineering 15 (2013), 383–405.
[77] Mohammad-Parsa Hosseini, Dario Pompili, Kost Elisevich, and Hamid Soltanian-Zadeh. 2017. Optimized deep learning
for EEG big data and seizure prediction BCI via internet of things. IEEE Transactions on Big Data 3, 4 (2017), 392–404.
[78] Mohammad-Parsa Hosseini, Hamid Soltanian-Zadeh, Kost Elisevich, and Dario Pompili. 2017. Cloud-based deep
learning of big eeg data for epileptic seizure prediction. arXiv preprint arXiv:1702.05192 (2017).
[79] Mohammad-Parsa Hosseini, Tuyen X Tran, Dario Pompili, Kost Elisevich, and Hamid Soltanian-Zadeh. 2017. Deep
learning with edge computing for localization of epileptogenicity using multimodal rs-fMRI and EEG big data. In
Autonomic Computing (ICAC), 2017 IEEE International Conference on. IEEE, 83–92.
[80] Chenhui Hu, Ronghui Ju, Yusong Shen, Pan Zhou, and Quanzheng Li. 2016. Clinical decision support for Alzheimer’s
disease based on deep learning and brain network. In Communications (ICC), 2016 IEEE International Conference on.
IEEE, 1–6.
[81] Dandan Huang, Kai Qian, Ding-Yu Fei, Wenchuan Jia, Xuedong Chen, and Ou Bai. 2012. Electroencephalography
(EEG)-based brain–computer interface (BCI): A 2-D virtual wheelchair control based on event-related
desynchronization/synchronization and state control. IEEE Transactions on Neural Systems and Rehabilitation Engineering 20, 3
(2012), 379–388.
[82] Yu-Chia Hung, Yu-Kai Wang, Mukesh Prasad, and Chin-Teng Lin. 2017. Brain dynamic states analysis based on 3D
convolutional neural network. In Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on. IEEE,
222–227.
[83] Gauvain Huve, Kazuhiko Takahashi, and Masafumi Hashimoto. 2017. Brain activity recognition with a wearable
fNIRS using neural networks. In Mechatronics and Automation (ICMA), 2017 IEEE International Conference on. IEEE,
1573–1578.
[84] Gauvain Huve, Kazuhiko Takahashi, and Masafumi Hashimoto. 2018. Brain-computer interface using deep neural
network and its application to mobile robot control. In Advanced Motion Control (AMC), 2018 IEEE 15th International
Workshop on. IEEE, 169–174.
[85] Xiaowei Jia, Kang Li, Xiaoyi Li, and Aidong Zhang. 2014. A novel semi-supervised deep learning framework for
affective state recognition on eeg signals. In Bioinformatics and Bioengineering (BIBE), 2014 IEEE International Conference
on. IEEE, 30–37.
[86] Liu Jingwei, Cheng Yin, and Zhang Weidong. 2015. Deep learning EEG response representation for brain computer
interface. In Control Conference (CCC), 2015 34th Chinese. IEEE, 3518–3523.
[87] Alexander Rosenberg Johansen, Jing Jin, Tomasz Maszczyk, Justin Dauwels, Sydney S Cash, and M Brandon Westover.
2016. Epileptiform spike detection via convolutional neural networks. In Acoustics, Speech and Signal Processing
(ICASSP), 2016 IEEE International Conference on. IEEE, 754–758.
[88] Isaak Kavasidis, Simone Palazzo, Concetto Spampinato, Daniela Giordano, and Mubarak Shah. 2017. Brain2Image: Converting brain signals into images.
In Proceedings of the 25th ACM international conference on Multimedia. 1809–1817.
[89] Koki Kawasaki, Tomohiro Yoshikawa, and Takeshi Furuhashi. 2015. Visualizing extracted feature by deep learning in
P300 discrimination task. In Soft Computing and Pattern Recognition (SoCPaR), 2015 7th International Conference of.
IEEE, 149–154.
[90] Piyush Kawde and Gyanendra K Verma. 2017. Deep belief network based affect recognition from physiological signals.
In Electrical, Computer and Electronics (UPCON), 2017 4th IEEE Uttar Pradesh Section International Conference on. IEEE,
587–592.
[91] Prerna Khurana, Angshul Majumdar, and Rabab Ward. 2016. Class-wise deep dictionaries for EEG classification. In
Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 3556–3563.
[92] Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
[93] Isabell Kiral-Kornek, Subhrajit Roy, Ewan Nurse, Benjamin Mashford, Philippa Karoly, Thomas Carroll, Daniel Payne,
Susmita Saha, Steven Baldassano, Terence O’Brien, and others. 2018. Epileptic seizure prediction using big data and
deep learning: toward a mobile system. EBioMedicine 27 (2018), 103–111.
[94] Toshiaki Koike-Akino, Ruhi Mahajan, Tim K Marks, Ye Wang, Shinji Watanabe, Oncel Tuzel, and Philip Orlik. 2016.
High-accuracy user identification using EEG biometrics. In 2016 38th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society (EMBC). IEEE, 854–858.
[95] Sotetsu Koyamada, Yumi Shikauchi, Ken Nakae, Masanori Koyama, and Shin Ishii. 2015. Deep learning of fMRI big
data: a novel approach to subject-transfer decoding. arXiv preprint arXiv:1502.00093 (2015).
[96] JP Kulasingham, V Vibujithan, and AC De Silva. 2016. Deep belief networks and stacked autoencoders for the P300
Guilty Knowledge Test. In Biomedical Engineering and Sciences (IECBES), 2016 IEEE EMBS Conference on. IEEE, 127–132.
[97] Shiu Kumar, Alok Sharma, Kabir Mamun, and Tatsuhiko Tsunoda. 2016. A deep learning approach for motor imagery
EEG signal classification. In Computer Science and Engineering (APWC on CSE), 2016 3rd Asia-Pacific World Congress on.
IEEE, 34–39.
[98] No-Sang Kwak, Klaus-Robert Müller, and Seong-Whan Lee. 2017. A convolutional neural network for steady state
visual evoked potential classification under ambulatory environment. PloS one 12, 2 (2017), e0172578.
[99] Vernon Lawhern, Amelia Solon, Nicholas Waytowich, Stephen M Gordon, Chou Hung, and Brent J Lance. 2018.
EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of neural
engineering (2018).
[100] Hyeon Kyu Lee and Young-Seok Choi. 2018. A convolution neural networks scheme for classification of motor
imagery EEG based on wavelet time-frequecy image. In Information Networking (ICOIN), 2018 International Conference
on. IEEE, 906–909.
[101] Stephanie Lees, Natalie Dayan, Hubert Cecotti, Paul Mccullagh, Liam Maguire, Fabien Lotte, and Damien Coyle.
2018. A review of rapid serial visual presentation-based brain–computer interfaces. Journal of neural engineering 15, 2
(2018), 021001.
[102] Eric C Leuthardt, Gerwin Schalk, Jarod Roland, Adam Rouse, and Daniel W Moran. 2009. Evolution of brain-computer
interfaces: going beyond classic motor physiology. Neurosurgical focus 27, 1 (2009), E4.
[103] Junhua Li and Andrzej Cichocki. 2014. Deep learning of multifractal attributes from motor imagery induced EEG. In
International Conference on Neural Information Processing. Springer, 503–510.
[104] Junhua Li, Zbigniew Struzik, Liqing Zhang, and Andrzej Cichocki. 2015. Feature learning from incomplete EEG with
denoising autoencoder. Neurocomputing 165 (2015), 23–31.
[105] Jinpeng Li, Zhaoxiang Zhang, and Huiguang He. 2016. Implementation of eeg emotion recognition system based on
hierarchical convolutional neural networks. In International Conference on Brain Inspired Cognitive Systems. Springer,
22–33.
[106] Jinpeng Li, Zhaoxiang Zhang, and Huiguang He. 2017. Hierarchical convolutional neural networks for EEG-based
emotion recognition. Cognitive Computation (2017), 1–13.
[107] Kang Li, Xiaoyi Li, Yuan Zhang, and Aidong Zhang. 2013. Affective state recognition from EEG with deep belief
networks. In 2013 IEEE International Conference on Bioinformatics and Biomedicine. IEEE, 305–310.
[108] Pinyi Li, Wenhui Jiang, and Fei Su. 2016. Single-channel EEG-based mental fatigue detection based on deep belief
network. In Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the.
IEEE, 367–370.
[109] Rongjian Li, Wenlu Zhang, Heung-Il Suk, Li Wang, Jiang Li, Dinggang Shen, and Shuiwang Ji. 2014. Deep learning
based imaging data completion for improved brain disease diagnosis. In International Conference on Medical Image
Computing and Computer-Assisted Intervention. Springer, 305–312.
[110] Xiang Li, Peng Zhang, Dawei Song, Guangliang Yu, Yuexian Hou, and Bin Hu. 2015. EEG based emotion identification
using unsupervised deep feature learning. (2015).
[111] Qin Lin, Shu-qun Ye, Xiu-mei Huang, Si-you Li, Mei-zhen Zhang, Yun Xue, and Wen-Sheng Chen. 2016. Classification
of epileptic EEG signals with stacked sparse autoencoder based on deep learning. In International Conference on
Intelligent Computing. Springer, 802–810.
[112] Zhimin Lin, Ying Zeng, Li Tong, Hangming Zhang, Chi Zhang, and Bin Yan. 2017. Method for enhancing single-trial
P300 detection by introducing the complexity degree of image information in rapid serial visual presentation tasks.
PloS one 12, 12 (2017), e0184713.
[113] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen
Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. 2017. A survey on deep learning in
medical image analysis. Medical image analysis 42 (2017), 60–88.
[114] Jin Liu, Yi Pan, Min Li, Ziyue Chen, Lu Tang, Chengqian Lu, and Jianxin Wang. 2018. Applications of deep learning
to mri images: a survey. Big Data Mining and Analytics 1, 1 (2018), 1–18.
[115] Mingfei Liu, Wei Wu, Zhenghui Gu, Zhuliang Yu, FeiFei Qi, and Yuanqing Li. 2018. Deep learning based on Batch
Normalization for P300 signal detection. Neurocomputing 275 (2018), 288–297.
[116] Qi Liu, Xiao-Guang Zhao, Zeng-Guang Hou, and Hong-Guang Liu. 2017. Deep Belief Networks for EEG-Based
Concealed Information Test. In International Symposium on Neural Networks. Springer, 498–506.
[117] Wenqiang Liu, Huiping Jiang, and Yao Lu. 2017. Analyze EEG Signals with Convolutional Neural Network Based on
Power Spectrum Feature Selection. Proceedings of Science (2017).
[118] Wei Liu, Wei-Long Zheng, and Bao-Liang Lu. 2016. Emotion recognition using multimodal deep learning. In
International Conference on Neural Information Processing. Springer, 521–529.
[119] Fabien Lotte, Laurent Bougrain, Andrzej Cichocki, Maureen Clerc, Marco Congedo, Alain Rakotomamonjy, and
Florian Yger. 2018. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update.
Journal of neural engineering 15, 3 (2018), 031005.
[120] Fabien Lotte, Marco Congedo, Anatole Lécuyer, Fabrice Lamarche, and Bruno Arnaldi. 2007. A review of classification
algorithms for EEG-based brain–computer interfaces. Journal of neural engineering 4, 2 (2007), R1.
[121] Na Lu, Tengfei Li, Xiaodong Ren, and Hongyu Miao. 2017. A deep learning scheme for motor imagery classification
based on restricted boltzmann machines. IEEE transactions on neural systems and rehabilitation engineering 25, 6 (2017),
566–576.
[122] Teng Ma, Hui Li, Hao Yang, Xulin Lv, Peiyang Li, Tiejun Liu, Dezhong Yao, and Peng Xu. 2017. The extraction of
motion-onset VEP BCI features based on deep learning and compressed sensing. Journal of neuroscience methods 275
(2017), 80–92.
[123] RK Maddula, J Stivers, M Mousavi, S Ravindran, and VR de Sa. 2017. Deep recurrent convolutional neural networks
for classifying P300 BCI signals. In Proceedings of the 7th Graz Brain-Computer Interface Conference, Graz, Austria.
18–22.
[124] Mufti Mahmud, Mohammed Shamim Kaiser, Amir Hussain, and Stefano Vassanelli. 2018. Applications of deep
learning and reinforcement learning to biological data. IEEE transactions on neural networks and learning systems 29, 6
(2018), 2063–2079.
[125] Jaakko Malmivuo, Robert Plonsey, and others. 1995. Bioelectromagnetism: principles and applications of bioelectric and
biomagnetic fields. Oxford University Press, USA.
[126] Ran Manor and Amir B Geva. 2015. Convolutional neural network for multi-category rapid serial visual presentation
bci. Frontiers in computational neuroscience 9 (2015), 146.
[127] Ran Manor, Liran Mishali, and Amir B Geva. 2016. Multimodal neural network for rapid serial visual presentation
brain computer interface. Frontiers in computational neuroscience 10 (2016), 130.
[128] Martí Manzano, Alberto Guillén, Ignacio Rojas, and Luis Javier Herrera. 2017. Combination of EEG Data Time and
Frequency Representations in Deep Networks for Sleep Stage Classification. In International Conference on Intelligent
Computing. Springer, 219–229.
[129] Martí Manzano, Alberto Guillén, Ignacio Rojas, and Luis Javier Herrera. 2017. Deep learning using EEG data in time
and frequency domains for sleep stage classification. In International Work-Conference on Artificial Neural Networks.
Springer, 132–141.
[130] Zijing Mao. 2016. Deep learning for rapid serial visual presentation event from electroencephalography signal. Ph.D.
Dissertation. The University of Texas at San Antonio.
[131] Zijing Mao, Vernon Lawhern, Lenis Mauricio Merino, Kenneth Ball, Li Deng, Brent J Lance, Kay Robbins, and Yufei
Huang. 2014. Classification of non-time-locked rapid serial visual presentation events for brain-computer interaction
using deep learning. In Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference
on. IEEE, 520–524.
[132] Zijing Mao, Wan Xiang Yao, and Yufei Huang. 2017. EEG-based biometric identification with deep learning. In Neural
Engineering (NER), 2017 8th International IEEE/EMBS Conference on. IEEE, 609–612.
[133] Marc Moreno Lopez. 2017. Deep learning for brain tumor segmentation. Master's thesis. University of Colorado Colorado
Springs.
[134] SG Mason, A Bashashati, M Fatourechi, KF Navarro, and GE Birch. 2007. A comprehensive survey of brain interface
technology designs. Annals of biomedical engineering 35, 2 (2007), 137–169.
[135] Kaare B Mikkelsen, Simon L Kappel, Danilo P Mandic, and Preben Kidmose. 2015. EEG recorded from the ear:
Characterizing the ear-EEG method. Frontiers in neuroscience 9 (2015), 438.
[136] Seonwoo Min, Byunghan Lee, and Sungroh Yoon. 2017. Deep learning in bioinformatics. Briefings in bioinformatics
18, 5 (2017), 851–869.
[137] Juan Abdon Miranda-Correa and Ioannis Patras. 2018. A Multi-Task Cascaded Network for Prediction of Affect,
Personality, Mood and Social Context Using EEG Signals. In Automatic Face & Gesture Recognition (FG 2018), 2018 13th
IEEE International Conference on. IEEE, 373–380.
[138] Francesco Carlo Morabito, Maurizio Campolo, Cosimo Ieracitano, Javad Mohammad Ebadi, Lilla Bonanno, Alessia
Bramanti, Simona Desalvo, Nadia Mammone, and Placido Bramanti. 2016. Deep convolutional neural networks for
classification of mild cognitive impaired and Alzheimer’s disease patients from scalp EEG recordings. In Research and
Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2016 IEEE 2nd International Forum on. IEEE,
1–6.
[139] Francesco Carlo Morabito, Maurizio Campolo, Nadia Mammone, Mario Versaci, Silvana Franceschetti, Fabrizio
Tagliavini, Vito Sofia, Daniela Fatuzzo, Antonio Gambardella, Angelo Labate, and others. 2017. Deep learning
representation from electroencephalography of Early-Stage Creutzfeldt-Jakob disease and features for differentiation
from rapidly progressive dementia. International journal of neural systems 27, 02 (2017), 1650039.
[140] Faezeh Movahedi, James L Coyle, and Ervin Sejdić. 2018. Deep belief networks for electroencephalography: A review
of recent contributions and future outlooks. IEEE journal of biomedical and health informatics 22, 3 (2018), 642–652.
[141] Sanam Narejo, Eros Pasero, and Farzana Kulsoom. 2016. EEG based eye state classification using deep belief network
and stacked AutoEncoder. International Journal of Electrical and Computer Engineering (IJECE) 6, 6 (2016), 3131–3141.
[142] Noman Naseer and Keum-Shik Hong. 2015. fNIRS-based brain-computer interfaces: a review. Frontiers in human
neuroscience 9 (2015), 3.
[143] Noman Naseer, Nauman Khalid Qureshi, Farzan Majeed Noori, and Keum-Shik Hong. 2016. Analysis of different
classification techniques for two-class functional near-infrared spectroscopy-based brain-computer interface.
Computational intelligence and neuroscience 2016 (2016).
[144] Anthony M Norcia, L Gregory Appelbaum, Justin M Ales, Benoit R Cottereau, and Bruno Rossion. 2015. The
steady-state visual evoked potential in vision research: a review. Journal of vision 15, 6 (2015), 4–4.
[145] Ewan Nurse, Benjamin S Mashford, Antonio Jimeno Yepes, Isabell Kiral-Kornek, Stefan Harrer, and Dean R Freestone.
2016. Decoding EEG and LFP signals using deep learning: heading TrueNorth. In Proceedings of the ACM International
Conference on Computing Frontiers. ACM, 259–266.
[146] Ewan S Nurse, Philippa J Karoly, David B Grayden, and Dean R Freestone. 2015. A generalizable brain-computer
interface (bci) using machine learning for feature discovery. PloS one 10, 6 (2015), e0131328.
[147] Andres Ortiz, Jorge Munilla, Juan M Gorriz, and Javier Ramirez. 2016. Ensembles of deep learning architectures for
the early diagnosis of the Alzheimer’s disease. International journal of neural systems 26, 07 (2016), 1650025.
[148] Marlene Pacharra, Stefan Debener, and Edmund Wascher. 2017. Concealed around-the-ear EEG captures cognitive
processing in a visual simon task. Frontiers in human neuroscience 11 (2017), 290.
[149] Adam Page, JT Turner, Tinoosh Mohsenin, and Tim Oates. 2014. Comparing Raw Data and Feature Extraction for
Seizure Detection with Deep Learning Methods. In FLAIRS Conference.
[150] Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, and Mubarak Shah. 2017. Generative
adversarial networks conditioned by brain signals. In Proceedings of the IEEE International Conference on Computer
Vision. 3410–3418.
[151] Chethan Pandarinath, Paul Nuyujukian, Christine H Blabe, Brittany L Sorice, Jad Saab, Francis R Willett, Leigh R
Hochberg, Krishna V Shenoy, and Jaimie M Henderson. 2017. High performance communication by people with
paralysis using an intracortical brain-computer interface. Elife 6 (2017), e18554.
[152] Raja Parasuraman and Yang Jiang. 2012. Individual differences in cognition, affect, and performance: Behavioral,
neuroimaging, and molecular genetic approaches. NeuroImage 59, 1 (2012), 70–82.
[153] JL Pérez-Benítez, JA Pérez-Benítez, and JH Espina-Hernández. 2018. Development of a brain computer interface
interface using multi-frequency visual stimulation and deep neural networks. In Electronics, Communications and
Computers (CONIELECOMP), 2018 International Conference on. IEEE, 18–24.
[154] Gert Pfurtscheller and FH Lopes Da Silva. 1999. Event-related EEG/MEG synchronization and desynchronization:
basic principles. Clinical neurophysiology 110, 11 (1999), 1842–1857.
[155] Gert Pfurtscheller and Christa Neuper. 2001. Motor imagery and direct brain-computer communication. Proc. IEEE
89, 7 (2001), 1123–1134.
[156] Sergey M Plis, Devon R Hjelm, Ruslan Salakhutdinov, Elena A Allen, Henry J Bockholt, Jeffrey D Long, Hans J
Johnson, Jane S Paulsen, Jessica A Turner, and Vince D Calhoun. 2014. Deep learning for neuroimaging: a validation
study. Frontiers in neuroscience 8 (2014), 229.
[157] Michel JAM van Putten, Sebastian Olbrich, and Martijn Arns. 2018. Predicting sex from brain rhythms with deep learning.
Scientific reports 8, 1 (2018), 3069.
[158] Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional
generative adversarial networks. International Conference on Learning Representations (ICLR) (2016).
[159] Tharun Kumar Reddy and Laxmidhar Behera. 2016. Online Eye state recognition from EEG data using Deep
architectures. In Systems, Man, and Cybernetics (SMC), 2016 IEEE International Conference on. IEEE, 712–717.
[160] Sangram Redkar. 2015. Using Deep Learning for Human Computer Interface via Electroencephalography. IAES
International Journal of Robotics and Automation 4, 4 (2015).
[161] David Regan. 1977. Steady-state evoked potentials. JOSA 67, 11 (1977), 1475–1489.
[162] Yuanfang Ren and Yan Wu. 2014. Convolutional deep belief networks for feature extraction of EEG signal. In Neural
Networks (IJCNN), 2014 International Joint Conference on. IEEE, 2850–2853.
[163] Roozbeh Rezaie, Panagiotis G Simos, Jack M Fletcher, Jenifer Juranek, Paul T Cirino, Zhimin Li, Antony D Passaro, and
Andrew C Papanicolaou. 2011. The timing and strength of regional brain activation associated with word recognition
in children with reading difficulties. Frontiers in human neuroscience 5 (2011), 45.
[164] Giulio Ruffini, David Ibañez, Marta Castellano, Stephen Dunne, and Aureli Soria-Frisch. 2016. EEG-driven RNN
classification for prognosis of neurodegeneration in at-risk patients. In International Conference on Artificial Neural
Networks. Springer, 306–313.
[165] Sergio Ruiz, Korhan Buyukturkoglu, Mohit Rana, Niels Birbaumer, and Ranganatha Sitaram. 2014. Real-time fMRI
brain computer interfaces: self-regulation of single brain regions to networks. Biological psychology 95 (2014), 4–20.
[166] Muhammad Rusydi, Takeo Okamoto, Satoshi Ito, and Minoru Sasaki. 2014. Rotation matrix to operate a robot
manipulator for 2D analog tracking objects using electrooculography. Robotics 3, 3 (2014), 289–309.
[167] Siavash Sakhavi, Cuntai Guan, and Shuicheng Yan. 2015. Parallel convolutional-linear neural network for motor
imagery classification. In Signal Processing Conference (EUSIPCO), 2015 23rd European. IEEE, 2736–2740.
[168] Wojciech Samek, Klaus-Robert Müller, Motoaki Kawanabe, and Carmen Vidaurre. 2012. Brain-computer interfacing
in discriminative and stationary subspaces. In Engineering in Medicine and Biology Society (EMBC), 2012 Annual
International Conference of the IEEE. IEEE, 2873–2876.
[169] Phyo Phyo San, Sai Ho Ling, Rifai Chai, Yvonne Tran, Ashley Craig, and Hung Nguyen. 2016. EEG-based driver
fatigue detection using hybrid deep generic model. In Engineering in Medicine and Biology Society (EMBC), 2016 IEEE
38th Annual International Conference of the. IEEE, 800–803.
[170] Soumalya Sarkar, Kishore Reddy, Alex Dorgan, Cali Fidopiastis, and Michael Giering. 2016. Wearable EEG-based
activity recognition in PHM-related service environment via deep learning. Int. J. Progn. Health Manag 7 (2016), 1–10.
[171] Saman Sarraf and Ghassem Tofighi. 2016. Deep learning-based pipeline to recognize Alzheimer’s disease using fMRI
data. In Future Technologies Conference (FTC). IEEE, 816–820.
[172] Saman Sarraf, Ghassem Tofighi, and others. 2016. DeepAD: Alzheimer’s Disease Classification via Deep Convolutional
Neural Networks using MRI and fMRI. bioRxiv (2016), 070441.
[173] R Schirrmeister, Lukas Gemein, Katharina Eggensperger, Frank Hutter, and Tonio Ball. 2017. Deep learning with
convolutional neural networks for decoding and visualization of EEG pathology. In Signal Processing in Medicine and
Biology Symposium (SPMB), 2017 IEEE. IEEE, 1–7.
[174] Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural networks 61 (2015), 85–117.
[175] K Seeliger, U Güçlü, L Ambrogioni, Y Güçlütürk, and MAJ Van Gerven. 2018. Generative adversarial networks for
reconstructing natural images from brain activity. NeuroImage 181 (2018), 775–785.
[176] Vinit Shah, Meysam Golmohammadi, Saeedeh Ziyabari, Eva Von Weltin, Iyad Obeid, and Joseph Picone. 2017.
Optimizing channel selection for seizure detection. In Signal Processing in Medicine and Biology Symposium (SPMB),
2017 IEEE. IEEE, 1–5.
[177] Mostafa Shahin, Beena Ahmed, Sana Tmar-Ben Hamida, Fathima Lamana Mulaffer, Martin Glos, and Thomas Penzel.
2017. Deep Learning and Insomnia: Assisting Clinicians With Their Diagnosis. IEEE journal of biomedical and health
informatics 21, 6 (2017), 1546–1553.
[178] Jared Shamwell, Hyungtae Lee, Heesung Kwon, Amar R Marathe, Vernon Lawhern, and William Nothwang. 2016.
Single-trial EEG RSVP classification using convolutional neural networks. In Micro-and Nanotechnology Sensors,
Systems, and Applications VIII, Vol. 9836. International Society for Optics and Photonics, 983622.
[179] Ajay Shanbhag, Aman Prabhu Kholkar, Saish Sawant, Allister Vicente, Sparsh Martires, and Supriya Patil. 2017. P300
analysis using deep neural network. In 2017 International Conference on Energy, Communication, Data Analytics and
Soft Computing (ICECDS). IEEE, 3142–3147.
[180] Jian Shang, Wei Zhang, Jiang Xiong, and Qingshan Liu. 2017. Cognitive load recognition using multi-channel complex
network method. In International Symposium on Neural Networks. Springer, 466–474.
[181] Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. 2019. Deep image reconstruction from
human brain activity. PLoS computational biology 15, 1 (2019), e1006633.
[182] V Shreyas and Vinod Pankajakshan. 2017. A deep learning architecture for brain tumor segmentation in MRI images.
In Multimedia Signal Processing (MMSP), 2017 IEEE 19th International Workshop on. IEEE, 1–6.
[183] Michelle Shu and Alona Fyshe. 2013. Sparse autoencoders for word decoding from magnetoencephalography. In
Proceedings of the third NIPS Workshop on Machine Learning and Interpretation in NeuroImaging (MLINI).
[184] Surjo R Soekadar, Niels Birbaumer, Marc W Slutzky, and Leonardo G Cohen. 2015. Brain–machine interfaces in
neurorehabilitation of stroke. Neurobiology of disease 83 (2015), 172–179.
[185] Amelia J Solon, Stephen M Gordon, BJ Lance, and VJ Lawhern. 2017. Deep Learning Approaches for P300 Classification
in Image Triage: Applications to the NAILS Task. In Proceedings of the 13th NTCIR Conference on Evaluation of
Information Access Technologies, NTCIR-13, Tokyo, Japan. 5–8.
[186] Arnaud Sors, Stéphane Bonnet, Sébastien Mirek, Laurent Vercueil, and Jean-François Payen. 2018. A convolutional
neural network for sleep stage scoring from raw single-channel eeg. Biomedical Signal Processing and Control 42
(2018), 107–114.
[187] Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Nasim Souly, and Mubarak Shah. 2017.
Deep learning human mind for automated visual classification. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. 6809–6817.
[188] Ghislain St-Yves and Thomas Naselaris. 2018. Generative Adversarial Networks Conditioned on Brain Activity
Reconstruct Seen Images. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 1054–
1061.
[189] Avital Sternin, Sebastian Stober, JA Grahn, and AM Owen. 2015. Tempo estimation from the EEG signal during perception
and imagination of music. In 1st International Workshop on Brain-Computer Music Interfacing/11th International
Symposium on Computer Music Multidisciplinary Research (BCMI/CMMR'15) (Plymouth).
[190] Sebastian Stober, Daniel J Cameron, and Jessica A Grahn. 2014. Classifying EEG Recordings of Rhythm Perception.
In ISMIR. 649–654.
[191] Sebastian Stober, Daniel J Cameron, and Jessica A Grahn. 2014. Using Convolutional Neural Networks to Recognize
Rhythm Stimuli from Electroencephalography Recordings. In Advances in neural information processing systems.
1449–1457.
[192] Sebastian Stober, Avital Sternin, Adrian M Owen, and Jessica A Grahn. 2015. Deep feature learning for EEG recordings.
arXiv preprint arXiv:1511.04306 (2015).
[193] Irene Sturm, Sebastian Lapuschkin, Wojciech Samek, and Klaus-Robert Müller. 2016. Interpretable deep neural
networks for single-trial EEG classification. Journal of neuroscience methods 274 (2016), 141–145.

[194] Nur Farahana Mohd Suhaimi, Zaw Zaw Htike, and Nahrul Khair Alang Md Rashid. 2015. Studies on classification of
fMRI data using deep learning approach. (2015).
[195] Heung-Il Suk, Dinggang Shen, Alzheimer's Disease Neuroimaging Initiative, and others. 2015. Deep learning in
diagnosis of brain disorders. In Recent Progress in Brain and Cognitive Engineering. Springer, 203–213.
[196] Heung-Il Suk, Chong-Yaw Wee, Seong-Whan Lee, and Dinggang Shen. 2016. State-space model with deep learning
for functional dynamics estimation in resting-state fMRI. NeuroImage 129 (2016), 292–307.
[197] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. 2017. DeepSleepNet: a model for automatic sleep stage scoring
based on raw single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25, 11 (2017),
1998–2008.
[198] Yousef Rezaei Tabar and Ugur Halici. 2016. A novel deep learning approach for classification of EEG motor imagery
signals. Journal of neural engineering 14, 1 (2016), 016003.
[199] Sachin S Talathi. 2017. Deep Recurrent Neural Networks for seizure detection and early seizure detection systems.
arXiv preprint arXiv:1706.03283 (2017).
[200] Chuanqi Tan, Fuchun Sun, Wenchang Zhang, Jianhua Chen, and Chunfang Liu. 2017. Multimodal Classification with
Deep Convolutional-Recurrent Neural Networks for Electroencephalography. In International Conference on Neural
Information Processing. Springer, 767–776.
[201] Dakun Tan, Rui Zhao, Jinbo Sun, and Wei Qin. 2015. Sleep spindle detection using deep learning: a validation
study based on crowdsourcing. In Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International
Conference of the IEEE. IEEE, 2828–2831.
[202] Zhichuan Tang, Chao Li, and Shouqian Sun. 2017. Single-trial EEG classification of motor imagery using deep
convolutional neural networks. Optik-International Journal for Light and Electron Optics 130 (2017), 11–18.
[203] Jason Teo, Chew Lin Hou, and James Mountstephens. 2017. Deep learning for EEG-Based preference classification.
In AIP Conference Proceedings, Vol. 1891. AIP Publishing, 020141.
[204] Jason Teo, Chew Lin Hou, and James Mountstephens. 2018. Preference Classification Using Electroencephalography
(EEG) and Deep Learning. Journal of Telecommunication, Electronic and Computer Engineering (JTEC) 10, 1-11 (2018),
87–91.
[205] John Thomas, Tomasz Maszczyk, Nishant Sinha, Tilmann Kluge, and Justin Dauwels. 2017. Deep learning-based
classification for brain-computer interfaces. In Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference
on. IEEE, 234–239.
[206] Orestis Tsinalis, Paul M Matthews, Yike Guo, and Stefanos Zafeiriou. 2016. Automatic sleep stage scoring with
single-channel EEG using convolutional neural networks. arXiv preprint arXiv:1610.01683 (2016).
[207] Kostas M Tsiouris, Vasileios C Pezoulas, Michalis Zervakis, Spiros Konitsiotis, Dimitrios D Koutsouris, and Dimitrios I
Fotiadis. 2018. A Long Short-Term Memory deep learning network for the prediction of epileptic seizures using EEG
signals. Computers in biology and medicine 99 (2018), 24–37.
[208] Tao Tu, Jonathan Koss, and Paul Sajda. 2018. Relating deep neural network representations to EEG-fMRI spatiotemporal
dynamics in a perceptual decision-making task. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops. 1985–1991.
[209] JT Turner, Adam Page, Tinoosh Mohsenin, and Tim Oates. 2014. Deep belief networks used on high resolution
multichannel electroencephalography data for seizure detection. In 2014 AAAI Spring Symposium Series.
[210] Tomas Uktveris and Vacius Jusas. 2017. Application of convolutional neural networks to four-class motor imagery
classification problem. Information Technology And Control 46, 2 (2017), 260–273.
[211] Ihsan Ullah, Muhammad Hussain, Hatim Aboalsamh, and others. 2018. An automated system for epilepsy detection
using EEG brain signals based on deep learning approach. Expert Systems with Applications 107 (2018), 61–71.
[212] Lukáš Vařeka and Pavel Mautner. 2017. Stacked Autoencoders for the P300 Component Detection. Frontiers in
neuroscience 11 (2017), 302.
[213] Sandra Vieira, Walter HL Pinaya, and Andrea Mechelli. 2017. Using deep learning to investigate the neuroimaging
correlates of psychiatric and neurological disorders: Methods and applications. Neuroscience & Biobehavioral Reviews
74 (2017), 58–75.
[214] Albert Vilamala, Kristoffer H Madsen, and Lars Kai Hansen. 2017. Neural Networks for Interpretable Analysis of EEG
Sleep Stage Scoring. In International Workshop on Machine Learning for Signal Processing 2017.
[215] Martin Völker, Robin T Schirrmeister, Lukas DJ Fiederer, Wolfram Burgard, and Tonio Ball. 2018. Deep transfer
learning for error decoding from non-invasive EEG. In Brain-Computer Interface (BCI), 2018 6th International Conference
on. IEEE, 1–6.
[216] Fang Wang, Sheng-hua Zhong, Jianfeng Peng, Jianmin Jiang, and Yan Liu. 2018. Data Augmentation for EEG-Based
Emotion Recognition with Deep Convolutional Neural Networks. In International Conference on Multimedia Modeling.
Springer, 82–93.

[217] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. 2018. Deep learning for sensor-based activity
recognition: A survey. Pattern Recognition Letters (2018).
[218] Kai Wang, Youjin Zhao, Qingyu Xiong, Min Fan, Guotan Sun, Longkun Ma, and Tong Liu. 2016. Research on
healthy anomaly detection model based on deep learning from multiple time-series physiological signals. Scientific
Programming 2016 (2016).
[219] Qian Wang, Yongjun Hu, and He Chen. 2017. Multi-channel EEG Classification Based on Fast Convolutional Feature
Extraction. In International Symposium on Neural Networks. Springer, 533–540.
[220] Xiashuang Wang, Guanghong Gong, Ni Li, and Yaofei Ma. 2016. A Survey of the BCI and its Application Prospect. In
Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems. Springer, 102–111.
[221] Nicholas R Waytowich, Vernon Lawhern, Javier O Garcia, Jennifer Cummings, Josef Faller, Paul Sajda, and Jean M
Vettel. 2018. Compact Convolutional Neural Networks for Classification of Asynchronous Steady-state Visual Evoked
Potentials. arXiv preprint arXiv:1803.04566 (2018).
[222] Dong Wen, Zhenhao Wei, Yanhong Zhou, Guolin Li, Xu Zhang, and Wei Han. 2018. Deep Learning Methods to
Process fMRI Data and Their Application in the Diagnosis of Cognitive Impairment: A Brief Overview and Our
Opinion. Frontiers in neuroinformatics 12 (2018), 23.
[223] Tingxi Wen and Zhongnan Zhang. 2018. Deep Convolution Neural Network and Autoencoders-Based Unsupervised
Feature Learning of EEG Signals. IEEE Access 6 (2018), 25399–25410.
[224] Bin Xia, Qianyun Li, Jie Jia, Jingyi Wang, Ujwal Chaudhary, Ander Ramos-Murguialday, and Niels Birbaumer.
2015. Electrooculogram based sleep stage classification using deep belief network. In Neural Networks (IJCNN), 2015
International Joint Conference on. IEEE, 1–5.
[225] Ziqian Xie. 2018. Deep Learning Approach for Brain Machine Interface. (2018).
[226] Ziqian Xie, Odelia Schwartz, and Abhishek Prasad. 2018. Decoding of finger trajectory from ECoG using deep
learning. Journal of neural engineering 15, 3 (2018), 036009.
[227] Haiyan Xu and Konstantinos N Plataniotis. 2016. Affective states classification using EEG and semi-supervised deep
learning approaches. In Multimedia Signal Processing (MMSP), 2016 IEEE 18th International Workshop on. IEEE, 1–6.
[228] Haiyan Xu and Konstantinos N Plataniotis. 2016. EEG-based affect states classification using deep belief networks. In
Digital Media Industry & Academic Forum (DMIAF). IEEE, 148–153.
[229] Bo Yan, Yong Wang, Yuheng Li, Yejiang Gong, Lu Guan, and Sheng Yu. 2016. An EEG signal classification method
based on sparse auto-encoders and support vector machine. In Communications in China (ICCC), 2016 IEEE/CIC
International Conference on. IEEE, 1–6.
[230] Xue Yan, Wei-Long Zheng, Wei Liu, and Bao-Liang Lu. 2017. Identifying Gender Differences in Multimodal Emotion
Recognition Using Bimodal Deep AutoEncoder. In International Conference on Neural Information Processing. Springer,
533–542.
[231] CC Yang, JS Lieberman, and CZ Hong. 1989. Early smooth horizontal eye movement: a favorable prognostic sign in
patients with locked-in syndrome. Archives of physical medicine and rehabilitation 70, 3 (1989), 230–232.
[232] Huijuan Yang, Siavash Sakhavi, Kai Keng Ang, and Cuntai Guan. 2015. On the use of convolutional neural networks
and augmented CSP features for multi-class motor imagery of EEG signals classification. In Engineering in Medicine
and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE. IEEE, 2620–2623.
[233] Yannick Roy, Hubert Banville, Isabela Albuquerque, Alexandre Gramfort, Jocelyn Faubert, and others. 2019. Deep
learning-based electroencephalography analysis: a systematic review. arXiv preprint arXiv:1901.05498 (2019).
[234] Antonio Jimeno Yepes, Jianbin Tang, and Benjamin Scott Mashford. 2017. Improving classification accuracy of
feedforward neural networks for spiking neuromorphic chips. arXiv preprint arXiv:1705.07755 (2017).
[235] Erwei Yin, Timothy Zeyl, Rami Saab, Tom Chau, Dewen Hu, and Zongtan Zhou. 2015. A hybrid brain–computer
interface based on the fusion of P300 and SSVEP scores. IEEE Transactions on Neural Systems and Rehabilitation
Engineering 23, 4 (2015), 693–701.
[236] Zhong Yin and Jianhua Zhang. 2017. Cross-session classification of mental workload levels using EEG and an adaptive
deep learning model. Biomedical Signal Processing and Control 33 (2017), 30–47.
[237] Zhong Yin, Mengyuan Zhao, Yongxiong Wang, Jingdong Yang, and Jianhua Zhang. 2017. Recognition of emotions
using multimodal physiological signals and an ensemble deep learning model. Computer methods and programs in
biomedicine 140 (2017), 93–110.
[238] Jaehong Yoon, Jungnyun Lee, and Mincheol Whang. 2018. Spatial and Time Domain Feature of ERP Speller System
Extracted via Convolutional Neural Network. Computational intelligence and neuroscience 2018 (2018).
[239] Ye Yuan, Guangxu Xun, Kebin Jia, and Aidong Zhang. 2017. A novel wavelet-based model for eeg epileptic seizure
detection using multi-context learning. In Bioinformatics and Biomedicine (BIBM), 2017 IEEE International Conference
on. IEEE, 694–699.
[240] Ye Yuan, Guangxu Xun, Fenglong Ma, Qiuling Suo, Hongfei Xue, Kebin Jia, and Aidong Zhang. 2018. A novel channel-
aware attention framework for multi-channel EEG seizure detection via multi-view deep learning. In Biomedical &
Health Informatics (BHI), 2018 IEEE EMBS International Conference on. IEEE, 206–209.
[241] Junming Zhang, Yan Wu, Jing Bai, and Fuqiang Chen. 2016. Automatic sleep stage classification based on sparse
deep belief net and combination of multiple classifiers. Transactions of the Institute of Measurement and Control 38, 4
(2016), 435–451.
[242] Jin Zhang, Chungang Yan, and Xiaoliang Gong. 2017. Deep convolutional neural network for decoding motor imagery
based brain computer interface. In Signal Processing, Communications and Computing (ICSPCC), 2017 IEEE International
Conference on. IEEE, 1–5.
[243] Pengyue Zhang, Fusheng Wang, Wei Xu, and Yu Li. 2018. Multi-channel generative adversarial network for parallel
magnetic resonance image reconstruction in k-space. In International Conference on Medical Image Computing and
Computer-Assisted Intervention. Springer, 180–188.
[244] Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, and Yang Li. 2018. Spatial-temporal recurrent neural network
for emotion recognition. IEEE transactions on cybernetics 99 (2018), 1–9.
[245] Xiang Zhang, Lina Yao, Kaixuan Chen, Xianzhi Wang, Quan Z Sheng, and Tao Gu. 2017. DeepKey: An EEG and Gait
Based Dual-Authentication System. arXiv preprint arXiv:1706.01606 (2017).
[246] Xiang Zhang, Lina Yao, Chaoran Huang, Salil S Kanhere, and Dalin Zhang. 2018. Brain2Object: Printing Your Mind
from Brain Signals with Spatial Correlation Embedding. arXiv preprint arXiv:1810.02223 (2018).
[247] Xiang Zhang, Lina Yao, Chaoran Huang, Quan Z Sheng, and Xianzhi Wang. 2017. Intent recognition in smart
living through deep recurrent neural networks. In International Conference on Neural Information Processing. Springer,
748–758.
[248] Xiang Zhang, Lina Yao, Chaoran Huang, Sen Wang, Mingkui Tan, Guodong Long, and Can Wang. 2018. Multi-
modality sensor data classification with selective attention. International Joint Conferences on Artificial Intelligence
(IJCAI) (2018).
[249] Xiang Zhang, Lina Yao, Salil S Kanhere, Yunhao Liu, Tao Gu, and Kaixuan Chen. 2018. MindID: Person Identification
from Brain Waves through Attention-based Recurrent Neural Network. Proceedings of the ACM on Interactive, Mobile,
Wearable and Ubiquitous Technologies 2, 3 (2018), 149.
[250] Xiang Zhang, Lina Yao, Quan Z Sheng, Salil S Kanhere, Tao Gu, and Dalin Zhang. 2018. Converting your thoughts
to texts: Enabling brain typing via deep feature learning of EEG signals. In 2018 IEEE International Conference on
Pervasive Computing and Communications (PerCom). IEEE, 1–10.
[251] Xiang Zhang, Lina Yao, and Feng Yuan. 2019. Adversarial Variational Embedding for Robust Semi-supervised
Learning. In SIGKDD 2019.
[252] Xiang Zhang, Lina Yao, Dalin Zhang, Xianzhi Wang, Quan Z Sheng, and Tao Gu. 2017. Multi-person brain activity
recognition via comprehensive eeg signal analysis. In Proceedings of the 14th EAI International Conference on Mobile
and Ubiquitous Systems: Computing, Networking and Services. ACM, 28–37.
[253] Xiang Zhang, Lina Yao, Shuai Zhang, Salil Kanhere, Michael Sheng, and Yunhao Liu. 2018. Internet of Things Meets
Brain-Computer Interface: A Unified Deep Learning Framework for Enabling Human-Thing Cognitive Interactivity.
IEEE Internet of Things Journal (2018).
[254] Yilu Zhao and Lianghua He. 2014. Deep learning in the EEG diagnosis of Alzheimer's disease. In Asian Conference
on Computer Vision. Springer, 340–353.
[255] Wei-Long Zheng, Hao-Tian Guo, and Bao-Liang Lu. 2015. Revealing critical channels and frequency bands for
emotion recognition from EEG with deep belief network. In Neural Engineering (NER), 2015 7th International IEEE/EMBS
Conference on. IEEE, 154–157.
[256] Wei-Long Zheng and Bao-Liang Lu. 2015. Investigating critical frequency bands and channels for EEG-based emotion
recognition with deep neural networks. IEEE Transactions on Autonomous Mental Development 7, 3 (2015), 162–175.
[257] Wei-Long Zheng and Bao-Liang Lu. 2016. Personalizing EEG-based affective models with transfer learning. In
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, 2732–2738.
[258] Wei-Long Zheng, Jia-Yi Zhu, Yong Peng, and Bao-Liang Lu. 2014. EEG-based emotion classification using deep belief
networks. In Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 1–6.
