arXiv:1907.10509v1 [eess.SP] 19 Jul 2019
This is the version of the article before peer review or editing, as submitted by an
author to Journal of Neural Engineering. IOP Publishing Ltd is not responsible for any
errors or omissions in this version of the manuscript or any version derived from it. The
Version of Record is available online at https://doi.org/10.1088/1741-2552/aae8c7.
Direct information transfer rate optimisation for
SSVEP-based BCI
Anti Ingel
Institute of Computer Science, University of Tartu, Tartu, Estonia
E-mail: antiingel@gmail.com
Ilya Kuzovkin
Institute of Computer Science, University of Tartu, Tartu, Estonia
E-mail: ilya.kuzovkin@gmail.com
Raul Vicente
Institute of Computer Science, University of Tartu, Tartu, Estonia
E-mail: raulvicente@gmail.com
June 2018
Abstract. In this work, a classification method for SSVEP-based BCI is proposed.
The classification method uses features extracted by traditional SSVEP-based BCI
methods and finds optimal discrimination thresholds for each feature to classify
the targets. Optimising the thresholds is formalised as a maximisation task of a
performance measure of BCIs called information transfer rate (ITR). However, instead
of the standard method of calculating ITR, which makes certain assumptions about
the data, a more general formula is derived to avoid incorrect ITR calculation when the
standard assumptions are not met. This allows the optimal discrimination thresholds
to be automatically calculated and thus eliminates the need for manual parameter
selection or performing computationally expensive grid searches.
The proposed method shows good performance in classifying targets of a BCI,
outperforming previously reported results on the same dataset by a factor of 2 in
terms of ITR. The highest achieved ITR on the used dataset was 62 bit/min. The
proposed method also provides a way to reduce false classifications, which is important
in real-world applications.
1. Introduction
In this work, a classification method for steady-state visual evoked potential (SSVEP)
based brain-computer interface (BCI) is proposed. Brain-computer interface is a direct
communication channel between the brain and some external device. SSVEP-based BCI
uses visual stimuli, called targets, to elicit a response in the brain of the user and then
Direct information transfer rate optimisation for SSVEP-based BCI
2
extract features from the recorded brain activity and, finally, classifies the result as one
of the commands that the user can send to the external device.
Often, the visual stimuli used in SSVEP-based BCIs are multiple squares displayed
on a computer screen, each flickering with different frequency [1]. Different flickering
frequencies elicit different brain responses and there are many feature extraction
methods to evaluate the presence of this signal in a recorded electroencephalography
(EEG) signal.
The novelty of the classification method proposed in this work is that it is based
on information transfer rate (ITR) maximisation. ITR is a standard measure of
performance for BCIs [2]. It combines the accuracy and the speed of the classifier
into a single number which shows how much information is transferred by the BCI in
one unit of time. Therefore, maximising ITR maximises the amount of information that
the BCI can transfer in fixed time interval.
Often, for SSVEP-based BCIs with similar classification rules, parameters like
discrimination threshold are either chosen manually or found using grid search
approaches [3–7]. This work proposes classification method that finds discrimination
thresholds so that they maximise ITR thus improving the performance of the BCI. It
is faster than grid-search approach and in addition provides a way to compare different
feature extraction methods and parameters of the BCI without having to worry that
the results are affected by the manual choice of discrimination threshold.
The proposed method also provides a way to combine multiple feature extraction
methods using linear discriminant analysis (LDA). LDA is used to project feature vectors
produced by multiple feature extraction methods into lower dimensional space after
which the classification algorithm can be used the same way as for one feature extraction
method with distances to the decision borders as new features. Combining multiple
feature extraction methods together with the threshold selection described above, the
present approach achieves more than two times larger ITR than previously reported
comparable result. The comparison was made to the works in which the same SSVEP
dataset was used as in this work to make the comparison as fair as possible.
2. Methods
2.1. SSVEP dataset for offline experiments
The performance of the proposed method was tested using publicly available SSVEP
dataset by Bakardijan et al.‡ which contains SSVEP data from four subjects [8]. The
dataset contains EEG recordings of the subjects’ brain responses to 8 Hz, 14 Hz and
28 Hz flickering. The visual stimuli were displayed on a computer monitor with 170 Hz
refresh rate, placed approximately 90 cm away from the subjects eyes.
For each subject, the dataset contains five trials for each flickering frequency. Each
trial consists of about 25 seconds of EEG data, 10 seconds of which is without the visual
‡ http://www.bakardjian.com/work/ssvep_data_Bakardjian.html
Direct information transfer rate optimisation for SSVEP-based BCI
3
stimulation. EEG data in the dataset was recorded using BIOSEMI EEG system with
128 channels and 256 Hz sampling rate. However, in this work only the data from two
channels—O1 and O2—were used. These channels were chosen to make the results more
comparable to the previous works in which similar channels were used (see Section 3.2).
The exact choice of O1 and O2 was made to increase comparability to the results where
consumer-grade EEG device Emotiv EPOC was used and for Emotiv EPOC, channels
O1 and O2 are most suitable for SSVEP detection.
2.2. Feature extraction methods
In this work, two feature extraction methods were used
• Power spectral density analysis (PSDA) [9]
• Canonical correlation analysis (CCA) [10]
Both of these methods extract a feature for each target, that is for every possible
command of the BCI. In general, the larger the feature value for a target, the more likely
the target is the user’s choice. The same holds for other feature extraction methods, like
Minimum energy combination [11], Likelihood ratio test [12] and Continuous wavelet
transform [13]. Thus these feature extraction methods could also be used with the
proposed classification method.
2.2.1. Power spectral density analysis Power spectral density analysis (PSDA) [9]
feature extraction method is based on estimating the power spectrum of the EEG signal
using fast fourier transform (FFT). Since each of the flickering targets elicits brain
response with the same fundamental frequency as the flickering itself, the estimated
power of this frequency serves as a good feature for classifying which of the targets
is the user looking at. But since brain signals are inherently very noisy, single power
value at certain time point is not enough to reliably classify the target. Thus the
feature is calculated from a fixed-size signal segment after every fixed time step. In the
experiments of this paper, window length of 1 second and time step of 0.125 seconds
were used. In addition to the power of fundamental frequency of the flickering, also the
powers of its second and third harmonic were used as features. To obtain one feature
per target, all the powers obtained for one target can be added together. However, in
the experiments of this work, LDA dimensionality reduction, described in Section 2.3,
was used to find one feature for each target. These features are then used to classify
the target.
2.2.2. Canonical correlation analysis Canonical correlation analysis (CCA) [10] feature
extraction method is inherently multidimensional, meaning that it calculates single
feature value for each target for multichannel EEG signal. This is not the case with the
PSDA method, which calculates powers for each channel separately. CCA uses canonical
correlation as a feature value, which is a correlation between two sets of signals—one
Direct information transfer rate optimisation for SSVEP-based BCI
4
set is the multichannel EEG signal and the other is set of reference signals, which mimic
the expected behaviour of the brain’s response to the stimulation. Each target has its
own set of reference signals and reference signals also contain signals for the second and
third harmonic of the flickering frequency. The feature extraction method calculates
canonical correlation of each set of reference signals to the multichannel EEG signal.
As before, the larger the correlation is, the more likely the corresponding target is the
user’s choice.
2.3. Combining multiple feature extraction methods
To make it possible for the BCI to use multiple feature extraction methods at the
same time and thus improve performance of the BCI, a dimensionality reduction and
classification method called linear discriminant analysis (LDA) was used. LDA projects
the input data to a lower-dimensional space so that the variance between the classes is
maximised with respect to the variance within the classes. To classify a new sample,
LDA projects the new sample to the lower-dimensional space and finds which class mean
is closest to it. This classification rule defines decision borders for the classes and the
distance from the decision border to a new sample can be interpreted as confidence score
for corresponding class. Therefore, to combine multiple feature extraction methods,
LDA is used to find the lower-dimensional space into which the original features are
projected. Then their distances to decision borders are used as new features. This
combines the information from all the original features into single new feature per target.
Having one feature per target is needed because this is required by the classification rule
described in section 3.1.1.
3. Results
3.1. Classification method
This section describes the derivation of the proposed classification method that is
based on ITR maximisation. It follows the description of the method given in [14].
First, the classification rule is described, and then a formula for estimating ITR from
discrimination thresholds is derived. Finally, it is shown that the formula is differentiable
and thus gradient descent can be used to find optimal discrimination thresholds. For
more details about the derivation see Appendix C and Appendix D. Implementation of
the algorithm can be found in Appendix E.
3.1.1. Classification rule Assuming that there are n targets {1, . . . , n}, a feature
extraction method extracts n features {f1 , . . . , fn }. The classification rule uses a
discrimination threshold tj for each feature fj to classify a sample as follows:
A sample is classified as class k if for this sample fk ≥ tk and fj < tj for
j ∈ {1, 2 . . . , n} \ {k}.
Direct information transfer rate optimisation for SSVEP-based BCI
5
Or in other words, a sample is classified as class k if the feature value for this class is
over the threshold and all the other feature values are below their thresholds.
It can be seen that this rule might not classify all the samples. If multiple features
are over their thresholds, or none of the features are, then classification is not made.
This case is viewed as the BCI not being confident enough and thus it avoids making
a prediction. This is useful when not making a prediction is less costly than making a
wrong prediction.
3.1.2. Finding the thresholds Finding the thresholds {t1 , . . . , tn } will be formalised as
a maximisation task, where the function to be maximised is a performance measure of
the BCI. The performance measure ITR(~t ) is a function of thresholds ~t = (t1 , . . . , tn )T
and thus the maximisation task is
arg max ITR(~t )
(1)
~t
which was solved in this work using gradient descent algorithm. The performance
measure itself is defined similarly to standard ITR, which is used to evaluate SSVEPbased BCIs, but the assumptions [2] about precisions, the probabilities of errors and
predictions is not made. In particular, the standard assumptions are
• all predictions are equally likely
• if choosing a wrong target, all wrong targets are equally likely to be chosen
• the accuracy of successfully classifying a target has to be the same for every target.
Making these assumptions while they are not met can result in incorrect ITR calculation
and can cause the maximisation algorithm to return non-optimal thresholds.
The performance measure is desired to be a continuous function of thresholds and
will be defined as
ITR(~t ) = MI(~t ) ·
60
MDT(~t )
(2)
where MI(~t ) is the mutual information between random variables P and C, which model
the predicted class and the correct class respectively, and MDT(~t ) is the mean detection
time in seconds. Mutual information MI(~t ) shows how many bits of information is
transferred with one prediction and is multiplied with the amount of predictions made
in one minute, thus giving units of bits per minute.
3.1.3. Calculating mutual information By denoting the event of predicting class i by
Pi and the event of correct class being j by Cj , the mutual information between random
variables P and C as defined previously is
n X
n
X
P(P
i ∩ Cj )
.
(3)
P(Pi ∩ Cj ) log2
MI(~t ) = I(P, C) =
P(Pi ) · P(Cj )
i=1 j=1
Direct information transfer rate optimisation for SSVEP-based BCI
6
When using the assumptions discussed in 3.1.2 the formula (3) can be simplified to
standard ITR [2] calculation formula in units of bits per prediction
1−P
(4)
ITRS = log2 N + P log2 P (1 − P ) log2
N −1
where N is the number of targets and P is the classification accuracy. Therefore the
proposed ITR calculation method is a generalisation of the standard method.
Next we show that the mutual information is indeed a function of thresholds. If the
feature value for class k is modelled with random variable Fk , the event of predicting
class i can be represented as
^
Pi = ω ∈ Ω : Fi (ω) ≥ ti ∧
Fj (ω) < tj
(5)
j∈{1,...,n}\{i}
where Ω denotes the sample space. From this, by making the assumption that
the features are conditionally independent, conditioned on the correct class C, the
conditional probability of predicting a class can be calculated as
Y
P(Pi | Ck ) = 1 − FFi |Ck (ti )
FFj |Ck (tj )
(6)
j∈{1,...,n}\{i}
where FFℓ |Ck denotes the cumulative distribution function (CDF) of the conditional
distribution Fℓ | Ck . It was empirically observed in this work that skew normal
distribution models well the conditional distributions of features (see Appendix B).
Thus by using least squares method for fitting skew normal distribution to the data, the
required CDFs can be calculated.
Therefore, after having fit skew normal distributions to the data, the probability
P(Pi | Ck ) only depends on the thresholds ~t. Now, by using known rules from probability
theory, all the probabilities required in calculating mutual information can be estimated
Number of samples from class j
Number of samples
n
X
P(Pi ) =
P(Pi | Cj )P(Cj )
P(Cj ) =
(7)
(8)
j=1
P(Pi ∩ Cj ) = P(Pi | Cj )P(Cj ).
(9)
Thus mutual information can indeed be represented as a continuous function of
thresholds and hence its gradient can be calculated.
3.1.4. Calculating mean detection time So far it has been shown that the amount of
information transferred with one prediction can be calculated as a function of thresholds.
Now we show that the mean detection time can also be represented as function of
thresholds.
Direct information transfer rate optimisation for SSVEP-based BCI
7
Since it is desired that the performance measure would reflect the performance of
the online BCI, a method for estimating online MDT is proposed. First, let us discuss
how the implemented BCI splits the incoming signal. The feature extraction methods
always use w seconds of data at a time. Whether there is an overlap between consecutive
windows depends on whether the last prediction was successful, that is prediction was
made. If a prediction is made, there is no overlap between windows. In this case the
time step between consecutive feature extractions is w. If a prediction is not made, there
is w − s seconds of overlap between windows. In this case s is the time step between
consecutive feature extractions.
By making the assumption that the probability of making a prediction is the same
in every time step and that the probability of making a prediction will be the same in
online case as it was offline, the MDT can be calculated as
1
S
−1 ·s
(10)
MDT(~t ) = w +
P ( ni=1 Pi )
where s is multiplied with the expected number of failures before a successful prediction.
MDT is a continuous function of thresholds, because
!
n
n
[
X
P
Pi =
P(Pi )
(11)
i=1
i=1
and we have already shown that P(Pi ) is a continuous function of thresholds. Thus the
performance measure as shown in equation (2) is a continuous function of thresholds.
3.1.5. Gradient Descent Finally, thresholds which maximise the ITR directly are found
using gradient ascent (GA) algorithm. GA iteratively updates the threshold values by
moving in the direction of greatest increase of ITR using the rule
~tn+1 = ~tn + µ · ∇ITR(~tn )
(12)
where ∇ITR is the gradient of ITR and µ is step size parameter. For the calculation
of the gradient, see Appendix C and Appendix D. GA stops if ITR improves less than
10−6 . Since ITR(~t ) is not concave, GA has to be run multiple times, each time obtaining
a local maximum and the final result will be the largest from these local maxima. In
this work, 20 random initialisations were used for GA.
3.2. Comparison of the proposed method to previous results
The performance of the proposed method was tested using SSVEP dataset by Bakardijan
et al. [8] and the results were compared to two other works [3, 7] where the same dataset
was used. Detailed comparison of these works can be seen in Appendix A.
In the work by Demir et al. [7], a feature extraction method called bio-inspired filter
bank (BIFB) is introduced. The method is in essence very similar to PSDA method.
In their method, power spectral density of the EEG signal is estimated and triangular
8
Direct information transfer rate optimisation for SSVEP-based BCI
Table 1. Comparison of CCA, PSDA feature extraction methods and their
combination, using proposed classifier.
CCA + PSDA (using LDA)
CCA
PSDA
S1
S2
S3
S4
Avg
S1
S2
S3
S4
Avg
S1
S2
S3
S4
Avg
ITR proposed
64.62
53.73
33.19
5.65
39.30
55.6
47.33
13.71
14.47
32.78
43.41
28.2
16.68
5.06
23.34
ITR standard
62.33
48.31
27.61
1.74
35.00
54.52
44.08
16.12
9.91
31.16
40.93
22.9
11.18
2.84
19.46
Accuracy
0.94
0.89
0.82
0.55
0.80
0.94
0.9
0.74
0.63
0.80
0.87
0.76
0.62
0.49
0.68
Mean detection time
1.14
1.22
1.56
1.72
1.41
1.33
1.37
1.9
1.61
1.55
1.33
1.41
1.29
1.56
1.4
Number of predictions
790
606
308
348
513
1008
857
270
433
642
995
718
1231
479
856
Table 2. Comparison of proposed classification method, proposed method which
always makes a prediction and random forest.
Proposed method
Proposed method: Always predicts
Random Forest
S1
S3
S3
S4
Avg
S1
S2
S3
S4
Avg
S1
S2
S3
S4
Avg
ITR proposed
64.62
53.73
33.19
5.65
39.30
41.91
31.62
19.32
1.8
23.67
40.4
31.01
19.08
2.46
23.24
ITR standard
62.33
48.31
27.61
1.74
35.00
38.31
26.94
13.12
0.02
19.60
38.63
26.03
15.62
0.97
20.31
Accuracy
0.94
0.89
0.82
0.55
0.8
0.79
0.72
0.6
0.35
0.62
0.79
0.71
0.63
0.48
0.65
Mean detection time
1.14
1.22
1.56
1.72
1.41
1
1
1
1
1
1
1
1
1
1
Number of predictions
790
606
308
348
513
1695
1695
1695
1695
1695
1695
1695
1695
1695
1695
filters are used on the result to make the powers of different frequencies more comparable.
They also compare their method to CCA feature extraction method. Only the data for
Oz electrode is used in their work.
In the work by Jukiewicz and Cysewska-Sobusiak [3] PSDA method was used for
feature extraction and two classification methods were compared—bilinear separation
and SVM. They only used electrodes O1, O2 and Oz and two classes 8 Hz and 14 Hz.
These two works were chosen, because similarly to this work, the classifier was
allowed to not make predictions for some samples. As can be seen from Table 2, the
possibility to not make predictions results in better performance. Overview of the results
of other works where the same dataset was used for testing SSVEP-based BCI can be
seen in Appendix A.
The proposed classifier was also compared to Random Forest to see how well
it performs compared to standard machine learning algorithm. Detailed comparison
can be seen in Table 2 and comparison to other works and to random forest can be
seen in Table 3. All the tables contain both, the standard ITR and proposed ITR.
The difference between them is that standard ITR uses formula (4) to estimate the
amount of information sent by one prediction, while proposed ITR uses formula (3).
Table 1 contains the comparison of PSDA and CCA feature extraction methods to their
combination. In this work, classifiers were evaluated using five-fold cross-validation,
each fold corresponding to data from separate recording trial.
9
Direct information transfer rate optimisation for SSVEP-based BCI
Table 3. Proposed classifier compared to Random Forest and related work.
Feature
Classification
Classes
extraction
Window
MDT (s)
Accuracy
length (s)
ITR
ITR
standard
proposed
PSDA + CCA
Proposed method
3
1
1.41
80%
35.00
39.30
This
PSDA + CCA
Random Forest
3
1
1
64%
20.46
23.73
work
PSDA
Proposed method
3
1
1.4
68%
19.46
23.34
PSDA
Random Forest
3
1
1
53%
9.64
12.41
[7]
BIFB
No ML
3
N/A
8.4
88%
8.2
-
[3]
PSDA
Bilinear separation
2
1
N/A
74%
16
-
4. Discussion
From the results it can be seen that allowing the classifier to not make predictions
when it is not confident enough results in better performance. This can be seen as a
trade-off between accuracy and mean detection time: allowing the classifier to not make
predictions results in better accuracy but worse mean detection time. According to
ITR, which is a combination of both of these measures, allowing to not make predictions
results in better performance. Fair comparison between different methods was possible
thanks to the proposed classification method, which tries to find optimal discrimination
threshold for each of the methods and thus the results are not affected by manual
parameter selection.
The proposed method outperforms both, the Random Forest classifier and the
previously published comparable results in terms of ITR. If one values accuracy over
ITR, the performance measure can be modified to take this into account and therefore
a classifier with higher accuracy can be obtained. By combining multiple feature
extraction methods, the achieved ITR is more than two times larger than the ones
reported in related work. For one of the subjects, ITR of over 60 bit/min was achieved
in this work. For further comparison to related work see [14].
For future work, the classifier will be tested online as the functionality for doing that
is already implemented in our software. This is required because in the current work
online ITR and MDT were estimated and might differ from the actual values. Online
testing should also provide information whether the learned discrimination thresholds
can be used in long periods of time or the thresholds need to be re-learned frequently,
which would decrease usability of this method. In our offline experiments, however, the
proposed method clearly outperforms the previous comparable results.
5. Conclusion
The main conclusions of the work are the following
• Proposed classifier outperforms previous results (Table 3).
Direct information transfer rate optimisation for SSVEP-based BCI
10
• Combining feature extraction methods outperforms the same methods used
individually (Table 1).
• Allowing the classifier to not make predictions when it is not confident enough
results in better performance (Tables 2).
In this work, a classification method for SSVEP-based BCI based on ITR optimisation
was proposed. This enabled fair comparison of different feature extraction methods by
finding the best discrimination threshold for each method. This clearly gives better
results than manually chosen thresholds, as has been done in previous works, and
is faster than grid-search method. By combining the features of different feature
extraction methods using LDA, the proposed classification method can be used with
multiple feature extraction methods, which increases the overall performance. Finally,
by allowing the classifier to not make predictions when it is not confident enough is
shown to be a good way to decrease the proportion of false-positive predictions.
6. Acknowledgments
The authors thank the financial support from The Estonian Research Council through
the personal research grant PUT1476. We also acknowledge funding by the European
Regional Development Fund through the Estonian Center of Excellence in IT, EXCITE.
7. References
[1] Zhu D, Bieger J, Molina G G and Aarts R M 2010 A survey of stimulation methods used in
SSVEP-based BCIs Computational Intelligence and Neuroscience 2010 1–13
[2] Wolpaw J R, Ramoser H, McFarland D J and Pfurtscheller G 1998 EEG-Based Communication:
Improved Accuracy by Response Verification IEEE Transactions on Rehabilitation Engineering
6 326–333
[3] Jukiewicz M and Cysewska-Sobusiak A 2015 Implementation of Bilinear Separation algorithm as
a classification method for SSVEP-based brain-computer interface Measurement Automation
Monitoring 61 51–53
[4] Karnati V B R, Verma U S G, Amerineni J S and Shah V H 2014 Frequency detection in Medium
and High frequency SSVEP based Brain Computer Interface systems by scaling of sine-curve fit
amplitudes 2014 First International Conference on Networks Soft Computing (ICNSC2014) pp
300–303
[5] Velchev Y, Radev D and Radeva S 2016 Features Extraction Based on Subspace Methods
with Application to SSVEP BCI International Journal of Emerging Engineering Research and
Technology 4 52–58
[6] Anindya S F, Rachmat H H and Sutjiredjeki E 2016 A prototype of SSVEP-based BCI for home
appliances control 2016 1st International Conference on Biomedical Engineering (IBIOMED)
pp 1–6
[7] Demir A F, Arslan H and Uysal I 2016 Bio-inspired Filter Banks for SSVEP-based Brain-computer
Interfaces 2016 IEEE International Conference on Biomedical and Health Informatics (BHI)
(Las Vegas, NV, USA)
[8] Bakardjian H, Tanaka T and Cichocki A 2010 Optimization of SSVEP brain responses with
application to eight-command Brain–Computer Interface Neuroscience letters 469 34–38
Direct information transfer rate optimisation for SSVEP-based BCI
11
[9] Cheng M, Gao X, Gao S and Xu D 2002 Design and implementation of a brain-computer interface
with high transfer rates Biomedical Engineering, IEEE Transactions on 49 1181–1186
[10] Lin Z, Zhang C, Wu W and Gao X 2007 Frequency Recognition Based on Canonical Correlation
Analysis for SSVEP-based BCIs Biomedical Engineering 54
[11] Friman O, Volosyak I and Graser A 2007 Multiple channel detection of steady-state visual evoked
potentials for brain-computer interfaces Biomedical Engineering, IEEE Transactions on 54(4)
742–750
[12] Zhang Y, Dong L, Zhang R, Yao D, Zhang Y and Xu P 2014 An Efficient Frequency Recognition
Method Based on Likelihood Ratio Test for SSVEP-Based BCI Computational and Mathematical
Methods in Medicine 2014
[13] Zhang Z, Li X and Deng Z 2010 A CWT-based SSVEP classification method for braincomputer interface system 2010 International Conference on Intelligent Control and Information
Processing pp 43–48
[14] Ingel A 2017 Machine Learning in VEP-based BCI Master’s thesis University of Tartu
[15] Yehia A G, Eldawlatly S and Taher M 2015 Principal component analysis-based spectral
recognition for SSVEP-based Brain-Computer Interfaces 2015 Tenth International Conference
on Computer Engineering Systems (ICCES) pp 410–415
12
Direct information transfer rate optimisation for SSVEP-based BCI
Appendix A. Related work
Table A1. Results of the related articles [14]. In each of these works the dataset by
Bakardijan et al. [8] was used.
Article
Feature
extraction
Window
Classification
Preprocessing
Classes
MUSIC
Gaussian
Channels
kernel,
0.5
N/A
8 Hz,
1
14 Hz
2
SVM linear
PSDA
SVM RBF
Windowed
8 Hz,
sinc filter,
14 Hz,
Blackman
28 Hz
N/A
Accuracy
(%)
63.2
All
N/A
4
one-vs-one
[6]
Subjects
MDT (s)
SVM,
[5]
length/
75
91.1
94.3
All
O1, O2,
POz, Oz
65
71.67
window
BIFB
No machine
learning
(but requires
[7]
PSDA
finding
7.5
1
100
7.8
2
100
Bandpass
10
3
86.7
filter,
8.33
4
66.7
Hamming
10
1
66.7
9
2
15
3
15
4
6.7
4
1
73.3
4
2
60
5
3
66.7
3
4
66.7
1
1
93
8 Hz,
1
All
O1, O2,
∼74
14 Hz
1
1
Oz
90
1
All
window
8 Hz,
14 Hz,
28 Hz
subject
specific
parameters)
CCA
N/A
Bilinear
[3]
PSDA
separation
N/A
SVM
Least
[4]
square
sine fitting
No machine
learning
8 Hz
AMUSE
14 Hz
0.5
28 Hz
Two
subjects
Oz
66.7
60
∼71
8 subject
86
specific
83
channels
92
1
92.43
PSDA
2
87.39
+ PCA
3
94.56
4
Subject
80.50
2
specific
71.16
Average
3
from the
76.76
over time
4
selection:
60.53
windows
1
P7, P3,
69.98
up to 4 s
2
Pz, P4,
58.35
3
P8, O1,
79.80
4
Oz, O2
47.17
Multiway
LDA (and
CCA
grid search
Bandbass
8 Hz,
for subject
filter, CAR,
14 Hz,
Multi-
specific
MAF
28 Hz
set
parameters)
[15]
CCA
CCA
76.61
1
1
62.69
2
59.52
3
65.51
4
56.04
13
Direct information transfer rate optimisation for SSVEP-based BCI
Appendix B. Skew normal distribution fit to data
C=3
C=2
C=1
fF1 |C
fF2 |C
fF3 |C
8
6
4
2
08
8
6
4
2
08
8
6
4
2
08
6
4
2
80
6
4
2
08
6
4
2
80
6
4
2
0
6
4
2
0
6
4
2
0
0
0.5
1
0
0.5
1
0
0.5
1
Figure B1. Histograms of features given class (blue) for subject 1. Green line denotes
skew normal distribution fit to the data and red lines indicate possible thresholds [14].
F̄F2 |C
C=3
C=2
C=1
F̄F1 |C
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
0
0.5
1
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
0
0.5
F̄F3 |C
1
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
0
0.5
1
Figure B2. Complementary cumulative distribution function of the corresponding
probability density function in Figure B1 [14].
14
Direct information transfer rate optimisation for SSVEP-based BCI
Appendix C. ITR calculation when predictions can be skipped
In this appendix, we derive the formula for calculating ITR taking into account the
fact that the classifier might not make a prediction at each time step. Therefore, in
addition to the events of prediction class i and correct class being j denoted by Pi and
Cj respectively, let us denote the event of making a prediction as M. Furthermore, let
us assume that the number of classes is n. Then
M=
n
[
(C.1)
Pi
i=1
and since predicting different classes are mutually exclusive events, it holds that
!
n
n
[
X
P(M) = P
Pi =
P(Pi )
(C.2)
i=1
i=1
where P(Pi ) can be calculated using equation (8). The previous probability is required
to calculate
P(Pi | M) =
P(Pi )
P(Pi ∩ M)
=
P(M)
P(M)
(C.3)
which is needed to calculate ITR. We also need to condition Cj on the event M
S
P (Cj ∩ ( ni=1 Pi ))
P(Cj | M) =
P(M)
Sn
P ( i=1 (Cj ∩ Pi ))
=
P(M)
n
1 X
=
P(Cj ∩ Pi )
P(M) i=1
n
1 X
=
P(Pi | Cj )P(Cj )
P(M) i=1
(C.4)
where P(Cj ) can be calculated using equation (7). The last probability needed to
calculate ITR is given by
P(Pi | Cj , M) =
P(Pi ∩ Cj ∩ M)
P(Pi ∩ M | Cj )P(Cj )
P(Pi | Cj )
=
=
P(Cj ∩ M)
P(Cj ∩ M)
P(M | Cj )
(C.5)
where
P(M | Cj ) =
n
X
P(Pi | Cj ).
(C.6)
i=1
Finally, ITR can be calculated as
I(P |M, C|M) =
n X
n
X
i=1 j=1
P(Pi | Cj , M)P(Cj | M) log2
P(Pi | Cj , M)
P(Pj | M)
.
(C.7)
15
Direct information transfer rate optimisation for SSVEP-based BCI
Appendix D. Deriving gradient of ITR
In this appendix the formula for calculating gradient of ITR is derived. Let again feature
value for class i be modelled by random variable Fi and denote the event of k being the
correct class by Ck . First note that probability density function fFi |Ck of the random
variable Fi | Ck is the derivative of the corresponding cumulative density function FFi |Ck
dFFi |Ck
= fFi |Ck .
dti
This allows us to calculate
(
Q
−fFi |Ck (ti ) · j∈{1,...,n}\{i} FFj |Ck (tj )
∂
Q
P(Pi | Ck ) =
F Fi |Ck (ti ) · fFℓ |Ck (tℓ ) · j∈{1,...,n}\{i,ℓ} FFj |Ck (tj )
∂tℓ
(D.1)
if i = ℓ
if i =
6 ℓ
(D.2)
that is needed later.
To make differentiation of ITR easier, the following property will be used
I(P |M, C|M) = H(P |M) − H((P |M) | (C|M))
(D.3)
where
H(P |M) = −
n
X
P(Pi |M) log2 P(Pi |M)
(D.4)
P(Cj | M)H(P | Cj , M).
(D.5)
i=1
H((P |M) | (C|M)) =
n
X
j=1
The corresponding partial derivatives are
n
X
ln(P(Pi | M)) + 1 ∂
∂
H(P | M) = −
P(Pi | M)
(D.6)
∂tℓ
ln 2
∂tℓ
i=1
n
X
∂
∂
∂
H((P |M) | (C|M)) =
H(P | Cj , M) P(Ck | M) + P(Ck | M) H(P | Cj , M)
∂tℓ
∂tℓ
∂tℓ
j=1
(D.7)
and the partial derivatives for the probabilities are
n
X
∂
∂
P(Pi ) =
P(Cj ) P(Pi | Cj )
∂tℓ
∂tℓ
j=1
(D.8)
n
X ∂
∂
P(M) =
P(Pj )
∂tℓ
∂t
ℓ
j=1
(D.9)
P(M) ∂t∂ℓ P(Pi ) − P(Pi ) ∂t∂ℓ P(M)
∂
P(Pi | M) =
(D.10)
∂tℓ
(P(M))2
P
Pn ∂
n
∂
P(M)
P(P
∩
C
)
−
P(P
∩
C
)
P(M)
j
k
j
k ∂tℓ
j=1 ∂tℓ
j=1
∂
P(Ck | M) =
(D.11)
∂tℓ
(P(M))2
n
X
∂
∂
P(M | Cj ) =
P(Pi | Cj )
(D.12)
∂tℓ
∂t
ℓ
i=1
Direct information transfer rate optimisation for SSVEP-based BCI
P(M | Cj ) ∂t∂ℓ P(Pi | Cj ) − P(Pi | Cj ) ∂t∂ℓ P(M | Cj )
∂
.
P(Pi | Cj , M) =
∂tℓ
(P(M | Cj ))2
16
(D.13)
Therefore we are able to calculate
∂
∂
∂
I(P |M, C|M) =
H(P | M) −
H(P | C, M).
∂tℓ
∂tℓ
∂tℓ
(D.14)
Next, the partial derivative of MDT will be calculated
s
∂
∂
MDT = −
·
P(M)
2
∂tℓ
(P(M)) ∂tℓ
(D.15)
which together with the previous result can be used to get
∂
∂
60
∂
ITR = MDT I(P |M, C|M) − I(P |M, C|M) MDT
.
∂tℓ
∂tℓ
∂tℓ
MDT2
(D.16)
Finally, the required gradient is
∇ITR =
∂
∂
ITR, . . . ,
ITR
∂t1
∂tn
T
.
(D.17)
Appendix E. Implementation of the algorithm
The implementation of the proposed classification algorithm can be found in Github
repository§. The detailed instruction on how to run the scrip is in README.md file in the
repository.
§ https://github.com/antiingel/ITR-optimisation