Transfer Learning for Brain-Computer Interfaces: A Euclidean Space Data Alignment Approach
He He and Dongrui Wu
Abstract — Objective: This paper targets a major challenge in developing practical EEG-based brain-computer interfaces (BCIs): how to cope with individual differences so that better learning performance can be obtained for a new subject, with minimum or even no subject-specific data?

Methods: We propose a novel approach to align EEG trials from different subjects in the Euclidean space to make them more similar, and hence improve the learning performance for a new subject. Our approach has three desirable properties: 1) it aligns the EEG trials directly in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials; 2) its computational cost is very low; and 3) it is unsupervised and does not need any label information from the new subject.

Results: Both offline and simulated online experiments on motor imagery classification and event-related potential classification verified that our proposed approach outperformed a state-of-the-art Riemannian space data alignment approach, and several approaches without data alignment.

Conclusion: The proposed Euclidean space EEG data alignment approach can greatly facilitate transfer learning in BCIs.

Significance: Our proposed approach is effective, efficient, and easy to implement. It could be an essential pre-processing step for EEG-based BCIs.

Index Terms — Brain-computer interface, data alignment, EEG, Riemannian geometry, transfer learning
I. INTRODUCTION
A brain-computer interface (BCI) [17], [34] is a communication pathway for a user to interact with his/her surroundings by using brain signals, which contain information about the user's cognitive state or intentions. Electroencephalogram (EEG) is the most popular input in BCI systems. Motor imagery (MI) and event-related potentials (ERPs) are two common paradigms of EEG-based BCIs, and also the focus of this paper.

For MI-based BCIs, the user needs to imagine the movements of his/her body parts (e.g., hands, feet, and tongue), which causes modulations of brain rhythms in the involved cortical areas. So, the imagination of different movements can be distinguished from the spatial localization of the different sensorimotor rhythm modulations, and then used to control external devices. For ERP-based BCIs, the user is stimulated by a large number of common stimuli (non-target) and a small number of rare stimuli (target). The EEG response shows a special ERP pattern after the user perceives a target stimulus. So, a target stimulus can be detected by determining if there is an ERP pattern associated with it.

Early BCI systems were mainly used to help people with disabilities [24]. For example, MI-based BCIs have been used to help severely paralyzed patients control powered exoskeletons or wheelchairs without the involvement of muscles, and ERP spellers enable patients who can neither move nor speak to type. Recently, the application scope of BCIs has been extended to able-bodied people [22], [33], and EEG has become the most popular input signal because it is easy and safe to acquire, and has high temporal resolution. However, EEG measures very weak brain electrical signals from the scalp, which results in poor spatial resolution and low signal-to-noise ratio [4]. Consequently, sophisticated signal processing and machine learning algorithms are needed in EEG-based BCI systems to decode the EEG signals, especially for single-trial classification of EEG signals in real-world applications.

[He He and Dongrui Wu are with the Key Laboratory of Image Processing and Intelligent Control (Huazhong University of Science and Technology), Ministry of Education. They are also with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China. Email: [email protected], [email protected]. Dongrui Wu is the corresponding author. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.]
Usually the EEG signals are first band-pass filtered and spatially filtered to increase the signal-to-noise ratio, and then discriminative features are extracted, which are next fed into machine learning algorithms such as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) [3] for classification.

The covariance matrix of multi-channel EEG signals plays an important role in signal processing. For instance, common spatial pattern (CSP) filters [11], [16], [21], [25], computed directly from the covariance matrices, are the most popular spatial filters for MI. An intuitive explanation is that the interactions between different channels are encoded in the covariance matrices, which can be decomposed to find the spatial distribution of brain activities.

Recent years have also witnessed an increasing interest in using the EEG covariance matrices for both classification and regression [1], [7], [40], [41]. Since the covariance matrices are symmetric positive definite (SPD) and lie on a Riemannian manifold, a popular approach is to view each covariance matrix as a point in the Riemannian space, and use its geodesic distance to the Riemannian mean as a feature in classification. This approach is called the Minimum Distance to Riemannian Mean (MDRM) classifier [1], [7], [41].

MDRM can be directly applied to MI-based BCIs because the spatial information plays the most critical role in decoding MI signals. However, the discriminative information of ERP signals is represented temporally rather than spatially. So Barachant and Congedo [2] augmented the ERP trials to embed this temporal information. More specifically, the mean of the ERP trials is concatenated to each trial.
The covariance matrix of the concatenated trial then contains both temporal and spatial information, which makes MDRM also applicable to ERP classification.

Transfer learning (TL) [23], which utilizes information in source domains to improve the learning performance in a target domain, has also been successfully used for BCIs [12], [35], [36], [38], [39]. Kang et al. [13] and Lotte and Guan [19] improved covariance matrix estimation for CSP filters by regularizing it towards the average of other subjects, or constructing a common feature space. Samek et al. [27] proposed an approach to transfer information about non-stationarities in the data to reduce the shift between subjects, and verified its performance in MI BCIs. Kindermans et al. [14] integrated dynamic stopping, transfer learning and a language model in a probabilistic zero-training framework, and demonstrated competitive performance with a state-of-the-art supervised classifier in an ERP speller. Kobler and Scherer [15] pre-trained a Restricted Boltzmann Machine on a publicly available dataset and then adapted it to new observations in a sensorimotor rhythm based BCI.

Recently, Zanini et al. [42] proposed a TL framework for the MDRM classifier, denoted as Riemannian alignment (RA)-MDRM in this paper, by utilizing the information of the resting state. In MI, the resting state is a time window in which the subject is not performing any task, e.g., the transition window between two successive imageries. In ERP, particularly rapid serial visual presentation (RSVP), the stimuli are presented quickly one after another and the responses overlap, so it is difficult to find the resting state. [42] used the non-target stimuli as the resting state in ERP, which means some labeled data from the new subject must be known.

Experiments have shown that RA-MDRM outperformed MDRM in MI and ERP tasks [42], when compared in a TL setting.
But as mentioned above, RA-MDRM still needs a small amount of labeled subject-specific calibration trials for ERP classification. Moreover, for both MI and ERP, the classification is performed in the Riemannian space, whose geodesic computation is much more complicated, time-consuming, and unstable than the distance calculation in the Euclidean space. In this paper we propose a new EEG data alignment approach in the Euclidean space, which has the following desirable characteristics:

1) It transforms and aligns the EEG trials in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials. On the contrary, RA aligns the covariance matrices (instead of the EEG trials themselves) in the Riemannian space, and hence a subsequent classifier must be able to operate on the covariance matrices directly, whereas there are very few such classifiers.
2) It can be computed several times faster than RA.
3) It only requires unlabeled EEG trials and does not need any label information from the new subject; so, it can be used in completely unsupervised learning.

The effectiveness of our proposed approach is then demonstrated in two BCI classification scenarios:

1) Offline unsupervised classification, in which unlabeled EEG trials from a new subject are available, and we need to label them by making use of auxiliary labeled data from other subjects.
2) Simulated online supervised classification, in which a small number of labeled EEG epochs from a new subject are obtained sequentially on-the-fly, and a classifier is trained from them and auxiliary labeled data from other subjects to label future incoming epochs from the new subject.

The remainder of this paper is organized as follows: Section II introduces the RA-MDRM approach in the Riemannian space. Section III proposes our Euclidean space data alignment approach. Section IV introduces the three datasets used in our experiments, including two MI datasets and one ERP dataset. Sections V and VI compare the performance of our approach with RA-MDRM in offline and simulated online learning, respectively. Finally, Section VII draws conclusions and points out some future research directions.

II. RELATED WORK
The covariance matrices of EEG trials are SPD, and lie in a Riemannian space instead of a Euclidean space [41]. Since the covariance matrices directly encode the spatial information of the EEG trials, and by appropriately augmenting the EEG trials (such as in ERP classification) they can also encode the temporal information, we can perform EEG classification directly based on the covariance matrices.

This section introduces the MDRM classifier, which assigns a trial to the class whose Riemannian mean is the closest to its covariance matrix, and also a Riemannian space covariance matrix alignment approach (RA).
A. Riemannian Distance
The Riemannian distance between two SPD matrices $P_1$ and $P_2$ is called the geodesic distance, which is the minimum length of a curve connecting them on the Riemannian manifold:

\delta(P_1, P_2) = \| \log(P_1^{-1} P_2) \|_F = \Big[ \sum_{r=1}^{R} \log^2 \lambda_r \Big]^{1/2},   (1)

where the subscript $F$ denotes the Frobenius norm, and $\lambda_r$ ($r = 1, 2, \cdots, R$) are the real eigenvalues of $P_1^{-1} P_2$.

The Riemannian distance between two SPD matrices $P_1$ and $P_2$ remains unchanged under a linear invertible transformation:

\delta(C^T P_1 C, C^T P_2 C) = \delta(P_1, P_2),   (2)

where $C$ is an invertible matrix. This property of the Riemannian distance is called congruence invariance.
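As an illustration (ours, not the authors' code), the distance in (1) and the congruence invariance in (2) can be checked numerically; the eigenvalues of $P_1^{-1} P_2$ are obtained from the generalized eigenproblem $P_2 v = \lambda P_1 v$, which avoids forming the inverse explicitly:

```python
import numpy as np
from scipy.linalg import eigh

def riemannian_distance(P1, P2):
    """Affine-invariant Riemannian distance between SPD matrices, Eq. (1)."""
    # Eigenvalues of P1^{-1} P2 via the generalized eigenproblem P2 v = lam * P1 v
    lam = eigh(P2, P1, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 4, 4))
P1 = A @ A.T + 4 * np.eye(4)                      # two SPD matrices
P2 = B @ B.T + 4 * np.eye(4)
C = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # invertible here

d = riemannian_distance(P1, P2)
d_congruent = riemannian_distance(C.T @ P1 @ C, C.T @ P2 @ C)
# Congruence invariance, Eq. (2): the distance is unchanged.
print(abs(d - d_congruent) < 1e-8)
```

The invariance holds because $(C^T P_1 C)^{-1} (C^T P_2 C) = C^{-1} P_1^{-1} P_2 C$ is similar to $P_1^{-1} P_2$ and so has the same eigenvalues.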
B. Riemannian Mean

The mean of a set of SPD matrices can be computed in the Euclidean space as their arithmetic mean, and also in the Riemannian space as the Riemannian mean (geometric mean), defined as the matrix minimizing the sum of the squared Riemannian distances:

\mathfrak{G}(P_1, \cdots, P_N) = \arg\min_P \sum_{n=1}^{N} \delta^2(P, P_n).   (3)

There is no closed-form solution to (3), and it is usually computed by an iterative gradient descent algorithm [8].
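One common fixed-point form of this iteration (a minimal illustration, not the authors' code) maps all matrices to the tangent space at the current estimate, averages there, and maps back with the matrix exponential:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def riemannian_mean(mats, tol=1e-9, max_iter=50):
    """Karcher (geometric) mean of SPD matrices, Eq. (3), by fixed-point iteration."""
    M = np.mean(mats, axis=0)  # arithmetic mean as the initial guess
    for _ in range(max_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        # Average of the matrices mapped to the tangent space at M
        T = np.mean([logm(Mih @ P @ Mih) for P in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
        if np.linalg.norm(T, 'fro') < tol:
            break
    return M

# For commuting SPD matrices the Riemannian mean reduces to the geometric mean:
mats = np.array([2 * np.eye(3), 8 * np.eye(3)])
M = riemannian_mean(mats)
print(np.allclose(M, 4 * np.eye(3)))
```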
C. MDRM

The MDRM classifier [1], [7], [41] first computes the Riemannian mean of each class from the covariance matrices of the labeled training trials, then assigns each test trial to the class whose Riemannian mean is the closest to its covariance matrix, i.e.,

g(\Sigma) = \arg\min_c \delta(\Sigma, \bar{\Sigma}_c),   (4)

where $\Sigma$ is the covariance matrix of the test trial, $\bar{\Sigma}_c$ is the Riemannian mean of Class $c$, and $g(\Sigma)$ is the predicted class label of $\Sigma$.
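A compact, self-contained sketch of MDRM (illustrative only; the distance and mean helpers are re-implemented here, and the toy covariance matrices are hypothetical):

```python
import numpy as np
from scipy.linalg import eigh, sqrtm, logm, expm, inv

def dist(P1, P2):
    # Riemannian distance, Eq. (1)
    lam = eigh(P2, P1, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def rmean(mats, n_iter=30):
    # Riemannian (Karcher) mean, Eq. (3), by fixed-point iteration
    M = np.mean(mats, axis=0)
    for _ in range(n_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        T = np.mean([logm(Mih @ P @ Mih) for P in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
    return M

def mdrm_predict(Sigma, class_means):
    # Eq. (4): assign to the class whose Riemannian mean is closest
    return int(np.argmin([dist(Sigma, Sc) for Sc in class_means]))

# Toy example with diagonal covariance matrices per class.
train = {0: [2 * np.eye(3), 3 * np.eye(3)],
         1: [8 * np.eye(3), 12 * np.eye(3)]}
means = [rmean(np.array(train[c])) for c in (0, 1)]
print(mdrm_predict(2.5 * np.eye(3), means))  # -> 0
```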
D. RA-MDRM

Zanini et al. [42] proposed a novel TL approach in the Riemannian space, referred to in this paper as RA-MDRM, to improve the performance of the MDRM classifier by utilizing auxiliary data from other sessions and/or subjects when there are only a few labeled trials from a new subject. Since the covariance matrices of the trials are the input to MDRM, RA-MDRM aims to align the covariance matrices from different sessions/subjects to give them a common reference. [42] assumes that "different source configurations and electrode positions induce shifts of covariance matrices with respect to a reference (resting) state, but that when the brain is engaged in a specific task, covariance matrices move over the SPD manifold in the same direction." Then RA-MDRM centers "the covariance matrices of every session/subject with respect to a reference covariance matrix so that what we observe is only the displacement with respect to the reference state due to the task."

More specifically, RA-MDRM first computes the covariance matrices of some resting trials, $\{R_i\}_{i=1}^{k}$, in which the subject is not performing any task, and then computes the Riemannian mean $\bar{R}$ of these matrices. $\bar{R}$ is then used as the reference matrix in RA-MDRM to reduce the inter-session/subject variability by the following transformation:

\tilde{\Sigma}_i = \bar{R}^{-1/2} \Sigma_i \bar{R}^{-1/2},   (5)

where $\Sigma_i$ is the covariance matrix of the $i$th trial, and $\tilde{\Sigma}_i$ is the corresponding aligned covariance matrix.

Equation (5) makes the reference state of different sessions/subjects centered at the identity matrix. This transformation does not change the distances between the covariance matrices belonging to the same session/subject, because of the congruence invariance property in (2), but makes the covariance matrices of different sessions/subjects move over the Riemannian manifold in different directions with respect to the corresponding reference matrices, and hence reduces the cross-session/subject differences.
As a result, covariance matrices from different sessions/subjects can be aligned and become comparable if $\bar{R}$ can be appropriately estimated.

In MI, the resting state is the time window in which the subject is not performing any task, e.g., the transition window between two imageries. In ERP, particularly RSVP, the stimuli are presented quickly one after another and the responses overlap, so it is difficult to find the resting state. [42] used the non-target stimuli as the resting state in ERP, which requires that some labeled trials from the new subject be known. That is, in ERP,

\bar{R} = \arg\min_R \sum_{i \in I} \delta^2(R, \Sigma_i),   (6)

where $I$ is the index set of the non-target trials.

RA-MDRM can be applied to both MI and ERP data; however, there is an important difference in building covariance matrices in these two paradigms. Specifically, the covariance matrix of an MI trial $X_i$ is simply computed as:

\Sigma_i = X_i X_i^T.   (7)

$\Sigma_i$ encodes the most discriminative information of an MI trial, i.e., the spatial distribution of the brain activity.

However, the main discriminative information of ERP trials is carried temporally rather than spatially. The normal covariance matrix such as (7) ignores this temporal information. So Barachant and Congedo [2] proposed a novel approach to augment the ERP trials so that their covariance matrices can also encode the temporal information. They first compute the mean of the ERP trials:

\bar{X} = \frac{1}{|I|} \sum_{i \in I} X_i,   (8)

where $I$ is the index set of the ERP trials. They then build an augmented trial $X_i^*$ by concatenating $\bar{X}$ and $X_i$:

X_i^* = \begin{bmatrix} \bar{X} \\ X_i \end{bmatrix}.   (9)

The covariance matrix of $X_i^*$ is then used in RA-MDRM.
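RA's recentering step (5) can be sketched as follows (our illustration; the reference matrix is the Riemannian mean of the resting-state covariances, computed here with a compact Karcher-mean routine, and the resting covariances are synthetic):

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv, fractional_matrix_power

def rmean(mats, n_iter=30):
    # Riemannian mean of the resting-state covariance matrices {R_i}
    M = np.mean(mats, axis=0)
    for _ in range(n_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        T = np.mean([logm(Mih @ R @ Mih) for R in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
    return M

def ra_align(covs, resting_covs):
    """Recenter covariance matrices w.r.t. the resting-state reference, Eq. (5)."""
    R_bar = rmean(np.asarray(resting_covs))
    R_inv_half = fractional_matrix_power(R_bar, -0.5)
    return np.array([R_inv_half @ S @ R_inv_half for S in covs])

rng = np.random.default_rng(1)
resting = [np.diag(rng.uniform(1, 3, 4)) for _ in range(5)]
aligned_resting = ra_align(resting, resting)
# After recentering, the Riemannian mean of the aligned resting covariances
# is the identity matrix (a consequence of congruence equivariance).
print(np.allclose(rmean(aligned_resting), np.eye(4), atol=1e-6))
# ERP augmentation, Eqs. (8)-(9), would stack the mean trial on top of each
# trial, e.g. X_star = np.vstack([X_bar, X_i]), before computing its covariance.
```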
E. Limitations of RA

Although RA-MDRM has demonstrated promising performance in several BCI applications [42], it still has some limitations:

1) RA-MDRM aligns the covariance matrices in the Riemannian space, instead of the EEG trials themselves. A subsequent classifier must be able to operate on the covariance matrices directly, whereas there are very few such classifiers.
2) RA-MDRM uses the Riemannian mean of the covariance matrices, which is time-consuming to compute, especially when the number of EEG channels is large.
3) RA-MDRM for ERP classification needs some labeled trials from the new subject: RA needs some non-target trials to compute the reference matrix in (6), and MDRM needs some target trials to construct $X_i^*$ in (9). So it is a supervised learning approach, and cannot be used when there is no label information from the new subject at all.

III. EEG DATA ALIGNMENT IN THE EUCLIDEAN SPACE (EA)

This section introduces our proposed Euclidean-space alignment (EA) approach.
A. The EA
To cope with the limitations of RA, we propose EA, which does not need any labeled data from the new subject, and can be computed much more efficiently. The rationale is to make the data distributions from different subjects more similar, so that a classifier trained on the auxiliary data has a better chance of performing well on the new subject. This idea has been widely used in TL [23], [30], [39].

Similar to RA, our approach is also based on a reference matrix $\bar{R}$, but estimated in a different way. Assume a subject has $n$ trials. Then,

\bar{R} = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T,   (10)

i.e., $\bar{R}$ is the arithmetic mean of all covariance matrices from the subject. We then perform the alignment by

\tilde{X}_i = \bar{R}^{-1/2} X_i.   (11)

After the alignment, the mean covariance matrix of all $n$ aligned trials is:

\frac{1}{n} \sum_{i=1}^{n} \tilde{X}_i \tilde{X}_i^T = \frac{1}{n} \sum_{i=1}^{n} \bar{R}^{-1/2} X_i X_i^T \bar{R}^{-1/2} = \bar{R}^{-1/2} \Big( \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T \Big) \bar{R}^{-1/2} = \bar{R}^{-1/2} \bar{R} \bar{R}^{-1/2} = I,   (12)

i.e., the mean covariance matrices of all subjects are equal to the identity matrix after alignment, and hence the distributions of the covariance matrices from different subjects are more similar. This is very desirable in TL.

The idea of EA can also be explained using the concept of maximum mean discrepancy (MMD) [10], [39], widely used in TL. MMD represents the distance between different distributions as the distance between their mean embeddings of features. Smaller distances indicate that the distributions are more similar, and hence more suitable for TL. If we view the covariance matrices as the feature embeddings of EEG trials, then, after EA, the MMD between EEG trials from different subjects becomes zero (because the mean covariance matrices of all subjects are identical), which should generally benefit TL.

B. Comparison with RA
Both EA and RA keep the Riemannian distances among the covariance matrices unchanged after the alignment. However, there are three major differences between them:

1) RA computes the reference matrix $\bar{R}$ as the Riemannian (geometric) mean of the resting-state covariance matrices, whereas EA computes the reference matrix $\bar{R}$ as the Euclidean (arithmetic) mean of all covariance matrices.
2) RA aligns the covariance matrices in the Riemannian space, whereas EA aligns the time-domain EEG trials in the Euclidean space.
3) After RA, the Riemannian mean of the resting-state covariance matrices becomes the identity matrix (but the Riemannian mean of all covariance matrices does not). After EA, the Euclidean mean of all covariance matrices becomes the identity matrix.

Compared with RA, EA has the following desirable properties:

1) EA transforms and aligns the EEG trials in the Euclidean space. Any subsequent signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials. So, it has much broader applications than RA, which aligns the covariance matrices (instead of the EEG trials) in the Riemannian space.
2) EA can be computed much faster than RA, because EA uses the arithmetic mean as the reference matrix, whereas RA uses the Riemannian mean as the reference matrix.
3) EA does not need any label information from the new subject, whereas RA needs some label information for ERP classification.
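The EA transformation in (10)–(11), and the identity-mean property in (12), can be sketched as follows (a minimal illustration with hypothetical trial shapes, not the authors' released code):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """Align one subject's EEG trials in the Euclidean space, Eqs. (10)-(11).

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns the aligned trials R_bar^{-1/2} X_i.
    """
    # Eq. (10): arithmetic mean of the trial covariance matrices
    R_bar = np.mean([X @ X.T for X in trials], axis=0)
    R_inv_half = fractional_matrix_power(R_bar, -0.5)
    # Eq. (11): whiten every trial with the same reference matrix
    return np.array([R_inv_half @ X for X in trials])

rng = np.random.default_rng(2)
trials = rng.standard_normal((20, 8, 256))  # 20 trials, 8 channels, 256 samples
aligned = euclidean_alignment(trials)
# Eq. (12): the mean covariance matrix of the aligned trials is the identity.
mean_cov = np.mean([X @ X.T for X in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8), atol=1e-8))
```

In practice each subject would be aligned independently with its own reference matrix, and any Euclidean-space pipeline (e.g., CSP plus LDA) would then be trained on the pooled aligned trials.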
C. Relationship to CORAL
A "frustratingly easy domain adaptation" approach, CORrelation ALignment (CORAL) [30], was proposed in 2016 to minimize domain shift by aligning the second-order statistics of different distributions, without requiring any target labels. Its idea is very similar to EA.

CORAL considers 1D features (vectors), instead of 2D features (matrices) such as the EEG trials in this paper. Let $C_S \in R^{d_S \times d_S}$ and $C_T \in R^{d_T \times d_T}$ be the feature covariance matrices in the source and target domains, respectively, where $d_S$ and $d_T$ are the numbers of features in the source and target domains, respectively. Then, CORAL finds a linear transformation $A \in R^{d_S \times d_T}$ of the source domain features, so that the Frobenius norm of the difference between the covariance matrices is minimized, i.e.,

\min_A \| A^T C_S A - C_T \|_F.   (13)

The linear transformation $A$ has a simple closed-form solution [30].

EA and CORAL are similar; however, there are also some important differences:

1) CORAL considers 1D features, and each domain has only one covariance matrix, which measures the covariances between different pairs of individual features. EA considers 2D features (EEG trials), and each domain has many covariance matrices (each corresponding to one EEG trial), each of which measures the covariances between different pairs of EEG channels in an EEG trial.
2) CORAL minimizes the distance between the covariance matrices in different domains, whereas EA minimizes the distance between the mean covariance matrices in different domains.
3) CORAL finds a linear transformation of the source domain features only, so that the transformed source domain covariance matrix approaches the original target domain covariance matrix. EA finds a separate linear transformation for each domain, so that the mean of the transformed source domain covariance matrices equals the mean of the transformed target domain covariance matrices.
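For the square, full-rank case, one closed-form minimizer of (13) is the whitening-recoloring map $A = C_S^{-1/2} C_T^{1/2}$; a sketch (ours, with synthetic feature matrices):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral_transform(Cs, Ct):
    """One closed-form minimizer of Eq. (13): whiten the source covariance,
    then re-color it with the target covariance (square, full-rank case)."""
    return fractional_matrix_power(Cs, -0.5) @ fractional_matrix_power(Ct, 0.5)

rng = np.random.default_rng(3)
Xs = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))   # source features
Xt = rng.standard_normal((500, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # target
Cs, Ct = np.cov(Xs.T), np.cov(Xt.T)
A = coral_transform(Cs, Ct)
# After transforming the source features by A, their covariance matches C_T.
print(np.allclose(A.T @ Cs @ A, Ct, atol=1e-6))
```

In practice [30] regularizes both covariance matrices with a small multiple of the identity before taking the matrix square roots; that is omitted here since the synthetic covariances are well-conditioned.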
IV. DATASETS

This section introduces the two MI datasets and one ERP dataset used in our experiments.
A. MI Datasets
Two MI datasets from BCI Competition IV were used. Their experimental paradigms were similar: in each session a subject sat in a comfortable chair in front of a computer. At the beginning of a trial, a fixation cross appeared on the black screen to prompt the subject to be prepared. A moment later, an arrow pointing in a certain direction was presented as a visual cue for a few seconds. In this period the subject was asked to perform a specific MI task without feedback according to the direction of the arrow. Then the visual cue disappeared from the screen and a short break followed until the next trial began.

The first dataset (Dataset 1 [5]) was recorded from seven healthy subjects. For each subject two classes of MI were selected from three classes: left hand, right hand, and foot. Continuous 59-channel EEG signals were acquired in three phases: calibration, evaluation, and special feature. Here we only used the calibration data, which provided complete marker information. Each subject had 100 trials from each class in the calibration phase.

The second MI dataset (Dataset 2a) consisted of EEG data from nine healthy subjects. Each subject was instructed to perform four different MI tasks, namely the imagination of the movement of the left hand, right hand, both feet, and tongue. 22-channel EEG signals and 3-channel EOG signals were recorded at 250 Hz. A training phase and an evaluation phase were recorded on different days for each subject. Here we only used the EEG data from the training phase, which included complete marker information. Additionally, two MI classes (left hand and right hand) were selected, and each class had 72 trials.

A causal band-pass filter (50-order linear-phase Hamming-window FIR filter, designed by the Matlab function fir1) was applied. We used the interval [0. , . ] seconds after the cue appearance as our trials for both datasets. EEG signals between [4. , . ] seconds after the cue appearance were extracted as resting states.
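This preprocessing step can be sketched as follows (our illustration; the passband digits are lost in this copy, so the 8–30 Hz band below is only an assumption, with SciPy's firwin standing in for Matlab's fir1):

```python
import numpy as np
from scipy.signal import firwin, lfilter, freqz

fs = 250          # sampling rate of Dataset 2a (Hz)
order = 50        # 50-order linear-phase FIR -> 51 taps
band = [8, 30]    # assumed illustrative MI band (Hz); not taken from the paper

# Hamming-window band-pass design, the counterpart of Matlab's fir1(50, band)
taps = firwin(order + 1, band, window='hamming', pass_zero=False, fs=fs)

# Causal filtering of a multi-channel trial X of shape (channels, samples)
X = np.random.default_rng(4).standard_normal((22, 1000))
X_filt = lfilter(taps, 1.0, X, axis=-1)

# The response is ~1 at the passband center and attenuated far outside it
w, h = freqz(taps, worN=np.array([1.0, 19.0]), fs=fs)
print(abs(h[1]) > 0.9, abs(h[0]) < abs(h[1]))
```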
B. ERP Dataset

We used an RSVP dataset from PhysioNet [9] for ERP classification. It contained EEG data from 11 healthy subjects during rapid presentation of images at 5, 6, and 10 Hz [20]. Each subject was seated in front of a computer showing a series of images rapidly. The images were aerial pictures of London falling into two categories, namely target images and non-target images. Target images contained a randomly rotated and positioned airplane that had been photo-realistically superimposed, and non-target images did not contain airplanes. The task was to recognize from the EEG signals whether the images were target or non-target; the EEG was recorded from 8 channels at 2048 Hz.

For each presentation rate and subject there were two sessions, denoted "a" and "b", which indicated whether the first image was target or non-target, respectively. Here we used the 5 Hz version (five images per second) of Session a. The number of samples for different subjects varied between 368 and 565, and the target to non-target ratio was around 1:9.

The continuous EEG data had been band-pass filtered between [0. , ] Hz. We downsampled the EEG signals from 2048 Hz to 64 Hz, and epoched each trial to the [0, . ] second interval time-locked to the stimulus onset.

C. Data Visualization
It is interesting to visualize how the EEG trials are modified by EA. Fig. 1 shows two examples (one for left-hand imagery, and the other for right) from Subject 1 in Dataset 2a. The black and red curves are the EEG signals before and after EA, respectively, and the vertical axis numbers show their correlations. The magnitudes of the EEG signals are smaller and more uniform after EA, and the EEG signals before and after EA generally have low correlation.

To visualize how EA reduces individual differences, we used t-Stochastic Neighbor Embedding (t-SNE) [32], a nonlinear dimensionality reduction technique that embeds high-dimensional data in a two- or three-dimensional space, to show and compare the EEG trials before and after EA. Each time we picked the trials from one subject as the test set, and combined the trials from all remaining subjects as the training set. Fig. 2(a) shows the t-SNE visualization of the first two subjects in MI Dataset 1, each row corresponding to a different test subject. The red dots are trials from the test subject, and the blue dots from the training subjects. In each row, the left plot shows the trials before EA, and the right after EA. Corresponding visualization results for the first two subjects in MI Dataset 2a and the ERP dataset are shown in Figs. 2(b) and 2(c), respectively.
Fig. 1. EEG trials before (black curves) and after (red curves) EA. Each row is a different channel.
The training trials (blue dots) may be scattered far away from the test trials (red dots) before EA, especially in Fig. 2(a). So, applying a classifier designed on the training trials directly to the test trials may not achieve good performance. However, after EA, the training and test trials overlap with each other, i.e., the discrepancies between them are reduced.

V. PERFORMANCE EVALUATION: OFFLINE UNSUPERVISED CLASSIFICATION
This section presents the performance comparison of EA with other approaches on both the MI and ERP datasets in offline unsupervised classification.
A. Offline Unsupervised Classification
In each dataset, there were multiple subjects, and each subject was first aligned independently, either in the Riemannian space using (5), or in the Euclidean space using (11). Since we had access to all EEG recordings in offline classification, all trials, or all resting epochs between the trials, were used to estimate the reference matrices. We then used leave-one-subject-out cross-validation to evaluate the classification performance: each time we picked one subject as the new subject (test set), combined the EEG trials from all remaining subjects as the training set to build the classifier, and then tested the classifier on the new subject.
B. Offline Classification Results on the MI Datasets
We first tested EA on the two MI datasets, and compared its performance with RA-MDRM. In the Euclidean space, after EA, we used CSP [11], [16], [21], [25] for spatial filtering and LDA for classification. More specifically, the following four approaches were compared:

1) MDRM: the basic MDRM classifier, as introduced in Section II-C. It does not include any data alignment.
2) RA-MDRM: the approach introduced in Section II-D, which first aligns the covariance matrices in the Riemannian space, and then performs MDRM.

Fig. 2. t-SNE visualization of the first two subjects before and after EA: (a) MI Dataset 1; (b) MI Dataset 2a; (c) ERP. Red dots: trials from the test subject; blue dots: trials from the training subjects.
3) CSP-LDA: a standard Euclidean space classification approach for MI, which spatially filters the EEG trials by CSP and then classifies them by LDA. It does not include any data alignment.
4) EA-CSP-LDA: it first aligns the EEG trials in the Euclidean space by EA (Section III), and then performs CSP filtering and LDA classification.

The classification accuracies of the four approaches are presented in Fig. 3 and Table I, which show that:

1) RA-MDRM outperformed MDRM on 15 out of the 16 subjects, suggesting that RA was effective.
2) EA-CSP-LDA also outperformed CSP-LDA on 14 out of the 16 subjects, suggesting that the proposed EA was also effective.
3) EA-CSP-LDA outperformed RA-MDRM on 11 out of the 16 subjects, suggesting that the proposed EA, which enables the use of a wide range of Euclidean space signal processing and machine learning approaches, could be more effective than RA.

Finally, it is worth noting that for a small number of subjects (e.g., Subjects 4 and 9 in Dataset 2a), EA actually degraded the classification accuracy. Some possible reasons are explained at the end of the paper, and will be investigated in our future research.
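A self-contained sketch of the CSP-LDA pipeline on synthetic data (ours; the shapes and classes are hypothetical, not the competition data):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, m=2):
    """Common spatial pattern filters for two classes; returns 2*m filters."""
    def mean_cov(trials):
        return np.mean([X @ X.T / np.trace(X @ X.T) for X in trials], axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem Ca w = lam (Ca + Cb) w; eigenvectors with the
    # largest/smallest lam maximize variance for class a/b, respectively.
    lam, W = eigh(Ca, Ca + Cb)
    idx = np.argsort(lam)
    return W[:, np.concatenate([idx[:m], idx[-m:]])]

def log_var_features(trials, W):
    # Log-variance of each spatially filtered channel
    return np.array([np.log(np.var(W.T @ X, axis=1)) for X in trials])

def lda_fit(F, y):
    # Two-class LDA: w = Sw^{-1} (mu1 - mu0), threshold at the projected midpoint
    mu0, mu1 = F[y == 0].mean(0), F[y == 1].mean(0)
    Sw = np.cov(F[y == 0].T) + np.cov(F[y == 1].T)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w, -w @ (mu0 + mu1) / 2

# Synthetic two-class demo: class 1 has extra variance on channel 0.
rng = np.random.default_rng(5)
tr_a = rng.standard_normal((30, 6, 200))
tr_b = rng.standard_normal((30, 6, 200))
tr_b[:, 0, :] *= 3.0
W = csp_filters(tr_a, tr_b)
F = log_var_features(np.concatenate([tr_a, tr_b]), W)
y = np.array([0] * 30 + [1] * 30)
w, b = lda_fit(F, y)
acc = np.mean((F @ w + b > 0).astype(int) == y)
print(acc)
```

With EA, the only change would be aligning each subject's trials via Eq. (11) before computing the CSP filters.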
Fig. 3. Offline unsupervised classification accuracies (%) of MDRM, RA-MDRM, CSP-LDA, and EA-CSP-LDA on the MI datasets: (a) Dataset 1; (b) Dataset 2a.
To determine whether the differences between our proposed approach (EA-CSP-LDA) and each other approach were statistically significant, we performed paired-sample t-tests on the accuracies in Table I using the MATLAB function ttest.

TABLE I
OFFLINE UNSUPERVISED CLASSIFICATION ACCURACIES (%) ON THE TWO MI DATASETS.

Dataset        Subject   MDRM   RA-MDRM   CSP-LDA   EA-CSP-LDA
The null hypothesis for each pairwise comparison was that the difference between the paired samples has mean zero, and it was rejected if p ≤ α, where α = 0.05 was used. Before performing each t-test, we also performed a Lilliefors test [18] to verify that the null hypothesis that the data come from a normal distribution could not be rejected.

The paired-sample t-test results are shown in Table II, where the statistically significant ones are marked in bold. EA-CSP-LDA significantly outperformed CSP-LDA on both MI datasets, suggesting that EA was effective. In addition, EA-CSP-LDA significantly outperformed RA-MDRM on Dataset 1, and had comparable performance with it on Dataset 2a, suggesting that EA may be preferred over RA.
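This significance test can be sketched with SciPy's counterpart of Matlab's ttest (the per-subject accuracy vectors below are made up for illustration, not the values of Table I):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired accuracies (%) of two approaches over the same subjects
acc_ea_csp_lda = np.array([74.0, 68.5, 81.0, 62.0, 77.5, 70.0, 83.0])
acc_csp_lda    = np.array([66.0, 61.0, 75.5, 58.0, 70.0, 64.5, 76.0])

# H0: the paired differences have mean zero; reject when p <= alpha
t_stat, p_value = ttest_rel(acc_ea_csp_lda, acc_csp_lda)
alpha = 0.05
print(p_value <= alpha)
```

In the paper, a Lilliefors normality test on the paired differences would precede each t-test; statsmodels provides one, but it is omitted here to keep the sketch dependency-free.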
TABLE II
PAIRED-SAMPLE t-TEST RESULTS ON THE TEST ACCURACIES IN TABLE I.

MI Dataset 1    MDRM   RA-MDRM   CSP-LDA
EA-CSP-LDA

MI Dataset 2a   MDRM   RA-MDRM   CSP-LDA
EA-CSP-LDA
It is also interesting to compare the computational cost of the different data alignment approaches. The platform was a Dell XPS15 laptop with an Intel Core i7-6700HQ CPU @2.60GHz, 16GB memory, and a 512GB SSD, running 64-bit Windows 10 Education and Matlab 2017a. The results are shown in Table III. Our proposed EA-CSP-LDA was 3.6-19.5 times faster than RA-MDRM, and it also had a much smaller standard deviation. RA-MDRM ran much slower on Dataset 1 because it had many more channels than Dataset 2a (59 versus 22).
TABLE III
THE COMPUTING TIME (SECONDS) OF EA-CSP-LDA AND RA-MDRM.

                 EA-CSP-LDA           RA-MDRM
                 Mean      Std        Mean      Std
MI Dataset 1     0.3864    0.0514     7.5326    0.2200
MI Dataset 2a    0.2405    0.0322     0.8766    0.0729
In summary, we have demonstrated that our proposed EA is more effective and efficient than RA in offline unsupervised MI classification.
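Part of EA's efficiency is that it is closed-form: per the description in Section V-D, the reference matrix is the Euclidean (arithmetic) mean of the trial covariance matrices, and each trial is whitened by its inverse square root. A minimal numpy sketch of this procedure, with random data standing in for real EEG trials:

```python
import numpy as np

def euclidean_alignment(trials):
    """Align EEG trials (n_trials, n_channels, n_samples) in Euclidean space.

    The reference matrix is the arithmetic mean of the trial covariance
    matrices X X^T; each trial is then whitened by its inverse square root.
    """
    covs = np.stack([X @ X.T for X in trials])
    R = covs.mean(axis=0)                      # Euclidean-mean reference matrix
    # Inverse matrix square root via eigendecomposition (R is symmetric PD).
    w, V = np.linalg.eigh(R)
    R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.stack([R_inv_sqrt @ X for X in trials])

rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 8, 128))     # 20 trials, 8 channels, 128 samples
aligned = euclidean_alignment(trials)

# After alignment, the mean covariance of the aligned trials is the identity,
# since R^{-1/2} (mean X X^T) R^{-1/2} = R^{-1/2} R R^{-1/2} = I.
mean_cov = np.mean([X @ X.T for X in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8), atol=1e-8))  # True
```

Because every subject's mean covariance becomes the identity after alignment, the trials from different subjects share a common reference, which is the point of the method.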
C. Offline Classification Results on the ERP Dataset
As RA-MDRM cannot be applied to ERP classification when there are no labeled trials at all from the new subject [RA needs some non-target trials to compute the reference matrix in (6), and MDRM needs some target trials to construct X*_i in (9)], we only validated the effectiveness of EA by comparing it with cases in which no data alignment was performed, using leave-one-subject-out cross-validation. All approaches used SVM classifiers, which cannot be combined with RA, because RA only outputs covariance matrices.

More specifically, we compared the performances of the following four approaches (all trials were downsampled to 64 Hz):
1) SVM, which performs principal component analysis (PCA) on the EEG trials to suppress noise and extract features, and then SVM for classification. It does not include any data alignment.
2) EA-SVM, which first performs EA to align the trials from different subjects in the Euclidean space, and then PCA and SVM classification.
3) xDAWN-SVM, which first performs xDAWN [26], [37] to spatially filter the EEG trials, and then PCA and SVM classification. It does not include any data alignment.
4) EA-xDAWN-SVM, which first performs EA to align the trials from different subjects in the Euclidean space, then xDAWN to spatially filter the EEG trials, and finally PCA and SVM classification.
For all approaches, we first reshaped the 2D feature matrices of EEG data into 1D vectors, then normalized each dimension to zero mean and unit variance. We then applied PCA to extract 20 features. Because these features had different ranges, we further normalized each feature to the interval [0, 1]. LibSVM [6] with a linear kernel was used for classification. We selected the trade-off parameter C from a grid of candidate values, using nested 5-fold cross-validation on the training data to identify the optimal C.
Finally, we used all training data and the optimal C to train a linear SVM classifier, and applied it to the test data.

Because the ERP dataset had significant class imbalance, we used the balanced classification accuracy (BCA) as the performance measure. Let m+ and m− be the true numbers of trials from the target and non-target classes, respectively, and let n+ and n− be the numbers of trials that are correctly classified by an algorithm as target and non-target, respectively. Then, we first compute

a+ = n+ / m+,    a− = n− / m−,    (14)

where a+ is the classification accuracy on the target class, and a− on the non-target class. The BCA is then computed as:

BCA = (a+ + a−) / 2.    (15)

The BCAs of the four approaches are presented in Fig. 4 and Table IV, which show that:
1) EA-SVM outperformed SVM on nine out of 11 subjects, suggesting that the proposed EA was generally effective for ERP classification.
2) EA-xDAWN-SVM outperformed xDAWN-SVM on eight out of 11 subjects, again suggesting that the proposed EA was generally effective for ERP classification.
3) On average, xDAWN-SVM and SVM achieved similar performances, but EA-xDAWN-SVM slightly outperformed EA-SVM, suggesting that our proposed EA may also help unleash the full potential of xDAWN.
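The BCA in (14) and (15) is straightforward to compute from a labeled test set; a small sketch with hypothetical label vectors (target = 1, non-target = 0):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """BCA per (14)-(15): mean of the per-class accuracies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    a_pos = np.mean(y_pred[y_true == 1] == 1)   # a+ = n+ / m+
    a_neg = np.mean(y_pred[y_true == 0] == 0)   # a- = n- / m-
    return (a_pos + a_neg) / 2

# Imbalanced example: 4 targets, 8 non-targets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.6875 = (2/4 + 7/8) / 2
```

Unlike plain accuracy, a classifier that always predicts the majority (non-target) class scores only 0.5 here, which is why BCA is the appropriate measure for the imbalanced ERP data.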
Fig. 4. BCAs of offline unsupervised classification on the ERP dataset.

TABLE IV
BCAS (%) OF OFFLINE UNSUPERVISED CLASSIFICATION ON THE ERP DATASET.

Paired-sample t-tests were also performed for the results in Table IV. As RA-MDRM could not be applied in this scenario, only two pairs of algorithms were compared, i.e., SVM versus EA-SVM, and xDAWN-SVM versus EA-xDAWN-SVM. The results are shown in Table V, where the statistically significant ones are marked in bold. EA-SVM significantly outperformed SVM, and EA-xDAWN-SVM significantly outperformed xDAWN-SVM, suggesting that EA was effective on the ERP dataset, too.

TABLE V
PAIRED t-TEST RESULTS ON THE TEST BCAS IN TABLE IV.
D. Discussion: Different Choices of the Reference Matrix
Reference matrix estimation has a direct impact on the performance of the alignment algorithms. RA uses the Riemannian mean of the resting covariance matrices for MI classification, and the Riemannian mean of the non-target covariance matrices for ERP classification [see (6)]. EA estimates the reference matrix from all trials by (10), following the same procedure for both MI and ERP classification.

In summary, for MI classification the reference matrix can be estimated from two types of trials: 1) the resting trials, during which the subject is not performing any task; and 2) the imagery trials, during which the subject is performing a motor imagery task. Furthermore, the reference matrix can be computed as either the Riemannian mean or the Euclidean mean. So we have four possible combinations: the Riemannian mean of the resting trials (RR), the Euclidean mean of the resting trials (ER), the Riemannian mean of all imagery trials (RI), and the Euclidean mean of all imagery trials (EI).

This subsection compares the performances of the above four reference matrices. The results are shown in Figs. 5(a) and 5(b) for MI Datasets 1 and 2a, respectively. They show that:
1) On average, RI-MDRM outperformed RR-MDRM, and EI-CSP-LDA outperformed ER-CSP-LDA, on both datasets, suggesting that estimating the reference matrix from all imagery trials is better than using the resting trials.
2) On average across all 16 subjects, EI achieved the best performance for CSP-LDA, and RI achieved the best performance for MDRM. This is consistent with our expectation: MDRM operates in the Riemannian space, so the Riemannian mean may give a more accurate estimate of the mean covariance matrix than the Euclidean mean; CSP-LDA, on the other hand, operates in the Euclidean space, so the Euclidean mean is more natural.
3) On average across all 16 subjects, EI-CSP-LDA outperformed RI-MDRM, suggesting that EA was advantageous over RA even when both used their best reference matrices.

VI. PERFORMANCE EVALUATION: SIMULATED ONLINE SUPERVISED CLASSIFICATION
This section evaluates the performance of EA in simulated online supervised classification. The same three datasets were used.
A. Simulated Online Supervised Classification
In online supervised classification, we have labeled trials from multiple auxiliary subjects, but initially no trials at all from the new subject. We acquire labeled trials from the new subject sequentially on-the-fly, which are then used to train a classifier to label future trials from the new subject, with the help of data from the auxiliary subjects.

Fig. 5. Comparison of different reference matrices on the MI datasets: (a) Dataset 1; (b) Dataset 2a. RR: Riemannian mean of the resting trials; ER: Euclidean mean of the resting trials; RI: Riemannian mean of all imagery trials; EI: Euclidean mean of all imagery trials.

We simulated the online supervised classification scenario using the offline datasets presented in Section IV. Take MI Dataset 1 as an example. Each time we picked one subject as the new subject, and the remaining six subjects as auxiliary subjects. The new subject had 200 trials. We generated a random integer n ∈ [1, 200], reserved the subsequent m trials {n + i}, i = 1, ..., m, as the online pool, and used the remaining 200 − m trials as the test data. Starting from an empty training set, we added r trials from the online pool to it each time, built a classifier by combining the training set with the auxiliary data, and evaluated its performance on the test data, until all m trials in the online pool were exhausted.

The main difference between offline unsupervised classification and simulated online supervised classification is that the former has a large number of trials from the new subject, none of which are labeled, whereas the latter has only a small number of trials from the new subject, all of which are labeled.

B. Simulated Online Classification Results on the MI Datasets
The four approaches (MDRM, RA-MDRM, CSP-LDA and EA-CSP-LDA) introduced in Section V-B were compared again in simulated online MI classification. In offline unsupervised classification, we had access to all unlabeled EEG trials of the new subject, so its R̄ was computed using all trials for EA, and the resting trials between them for RA. In simulated online supervised classification, we only had access to a small number of labeled trials from the new subject, so its R̄ was computed using these trials for EA, and the resting trials between them for RA (the label information was not needed in either EA or RA; only the EEG trials were used). All labeled trials from the auxiliary subjects and the small number of available labeled trials from the new subject were combined to train MDRM, CSP and LDA. We paid special attention to the implementation to make sure it was causal, i.e., we did not make use of EEG or label information that was not supposed to be known at a given time point.

We used m = 40 and r = 4 for both MI datasets. In order to obtain statistically meaningful results, we repeated the experiment 30 times (each time with a random n) for each new subject. (When n + i was larger than 200, we rewound to the beginning of the trial sequence, i.e., replaced n + i by n + i − 200.) The average classification accuracies of the four approaches are presented in Fig. 6, which shows that:
1) RA-MDRM outperformed MDRM on 15 out of the 16 subjects, suggesting that RA was effective in simulated online supervised classification.
2) EA-CSP-LDA outperformed CSP-LDA on 14 out of the 16 subjects, suggesting that the proposed EA was also effective in simulated online supervised classification.
3) EA-CSP-LDA outperformed RA-MDRM on 12 out of the 16 subjects, suggesting that EA was generally more effective than RA in simulated online supervised classification.

To determine whether the differences between our proposed algorithm and the others were statistically significant in the simulated online experiments, we first defined an aggregated performance measure, the area under the curve (AUC). For a particular algorithm on a particular subject, the AUC was the area under its accuracy curve as the number of labeled subject-specific trials increased from 4 to 40. As we repeated the experiments 30 times, we first computed the mean AUC of these 30 repetitions for each subject, so each algorithm had N mean AUCs, where N was the number of subjects. We then compared these mean AUCs using paired-sample t-tests. The results are shown in Table VI, where the statistically significant ones are marked in bold. EA-CSP-LDA significantly outperformed RA-MDRM on Dataset 1, and had comparable performance with it on Dataset 2a, suggesting that EA may be preferred over RA.

TABLE VI
PAIRED-SAMPLE t-TEST RESULTS ON THE MEAN AUCS IN SIMULATED ONLINE MI CLASSIFICATION.
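The AUC aggregate used above can be computed with a simple trapezoidal rule over the accuracy curve; a sketch with hypothetical accuracies at trial counts 4, 8, ..., 40 (matching r = 4 and m = 40):

```python
import numpy as np

# Accuracy (%) after each batch of r = 4 labeled trials, up to m = 40
# (hypothetical values for one repetition of one subject).
n_trials = np.arange(4, 44, 4)                 # 4, 8, ..., 40
accuracy = np.array([55, 58, 60, 63, 64, 66, 68, 69, 70, 71], dtype=float)

# Area under the accuracy curve (trapezoidal rule) as the number of
# labeled subject-specific trials grows.
auc = float(np.sum((accuracy[1:] + accuracy[:-1]) / 2 * np.diff(n_trials)))
print(auc)  # 2324.0
```

Averaging these AUCs over the 30 repetitions gives the per-subject mean AUC that the paired t-tests compare; an algorithm that learns faster from the first few labeled trials accrues a larger area.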
C. Simulated Online Classification Results on the ERP Dataset
Fig. 6. Classification accuracies (%) of simulated online learning on the MI datasets: (a) Dataset 1; (b) Dataset 2a. The horizontal axis shows the number of subject-specific labeled trials from the new subject. The error bars indicate the 95% confidence intervals. The legends in (a) are the same as those in (b).

Four approaches (MDRM, RA-MDRM, xDAWN-SVM, and EA-xDAWN-SVM) were compared in simulated online supervised classification on the ERP dataset. Note that MDRM and RA-MDRM were not used in offline unsupervised ERP classification because they need some labeled trials from the new subject to construct the augmented trials, which were not available there. However, they could be used in simulated online supervised ERP classification, because labeled trials were available here.

We used m = 80 and r = 10, and started with 20 trials in the first iteration. In order to obtain statistically meaningful results, we again repeated the experiment 30 times (each time with a random n) for each new subject. The average BCAs of the four approaches are shown in Fig. 7. Observe that:
1) On average, RA-MDRM outperformed MDRM, and EA-xDAWN-SVM outperformed xDAWN-SVM, suggesting that both alignment approaches were effective in simulated online supervised classification.
2) EA-xDAWN-SVM outperformed RA-MDRM on all 11 subjects, suggesting that the proposed EA was more effective than RA in simulated online supervised classification.
Fig. 7. BCAs (%) of simulated online calibration on the ERP dataset. The horizontal axis shows the number of subject-specific labeled trials from the new subject. The error bars indicate the 95% confidence intervals.
Paired-sample t-tests were also performed to compare EA-xDAWN-SVM with the other three algorithms. The results are shown in Table VII, where the statistically significant ones are marked in bold. EA-xDAWN-SVM significantly outperformed all the other approaches, suggesting that the proposed EA was effective and may be preferred over RA.

TABLE VII
PAIRED-SAMPLE t-TEST RESULTS ON THE MEAN AUCS IN SIMULATED ONLINE ERP CLASSIFICATION.
VII. CONCLUSION AND FUTURE RESEARCH
Transfer learning is a promising approach to improve the EEG classification performance in BCIs, by using labeled data from auxiliary subjects in similar tasks. However, due to individual differences, if the EEG trials from different subjects are not aligned properly, the discrepancies among them may result in negative transfer. A Riemannian space covariance matrix alignment approach (RA) has been proposed to transform the covariance matrices of EEG trials to give them a common reference. However, it has some limitations: 1) it aligns the covariance matrices instead of the EEG trials, so a classifier that operates directly on the covariance matrices must be used to take advantage of the alignment, and there are very few such classifiers; 2) its computational cost is high; and 3) it needs some labeled subject-specific trials from the new subject for ERP-based BCIs.

This paper has proposed a Euclidean space EEG trial alignment approach (EA), which has three desirable properties: 1) it aligns the EEG trials directly in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials, so it has much broader applicability than the Riemannian space alignment approach; 2) it can be computed several times faster than the Riemannian space alignment approach; and 3) it does not need any labeled trials from the new subject. Experiments in offline and simulated online classification on two MI datasets and one ERP dataset verified the effectiveness and efficiency of EA.

However, the current EA may still have some limitations. Its goal is to compensate for the dataset shift among different subjects, which includes three types of shift:
1) Covariate shift [28], [29]: the distribution of the inputs (independent variables) changes.
2) Prior probability shift: the distribution of the output (target variable) changes.
3) Concept shift [31]: the relationship between the inputs and the output changes.
The current EA only considers covariate shift and ignores the other two, so the per-class input data distributions may still have large discrepancies among different subjects after EA. Moreover, in compensating for the covariate shift, EA may even increase the concept shift, i.e., it is possible that for a specific subject the two classes become more difficult to distinguish after EA. These could be some of the reasons why EA demonstrated improved performance on most, but not all, subjects. Another possible reason that EA did not offer advantages on some subjects is that there could be bad trials and/or outliers for these subjects; including such trials in computing the reference matrix R̄ would result in a large error, which further affects the classification accuracy.

Additionally, we acknowledge that the simulated online supervised classification experiments are not identical to real online experiments; our results would be more convincing if they were obtained from real experiments. Our future research will investigate and accommodate the limitations of EA, and validate the improvements in real-world closed-loop BCI experiments.

REFERENCES

[1] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, "Multiclass brain-computer interface classification by Riemannian geometry,"
IEEE Trans. on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
[2] A. Barachant and M. Congedo, "A plug & play P300 BCI using information geometry," arXiv:1409.0107, 2014.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. NY: Springer-Verlag, 2006.
[4] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. R. Müller, "Optimizing spatial filters for robust EEG single-trial analysis," IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
[5] B. Blankertz, G. Dornhege, M. Krauledat, K. R. Müller, and G. Curio, "The non-invasive Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects," NeuroImage, vol. 37, no. 2, pp. 539–550, 2007.
[6] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, 2011.
[7] M. Congedo, A. Barachant, and A. Andreev, "A new generation of brain-computer interface based on Riemannian geometry," arXiv:1310.8115, 2013.
[8] P. T. Fletcher and S. Joshi, "Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors," Lecture Notes in Computer Science, vol. 3117, pp. 87–98, 2004.
[9] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals," Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[10] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Proc. Advances in Neural Information Processing Systems, Vancouver, Canada, Dec. 2007, pp. 513–520.
[11] H. He and D. Wu, "Transfer learning enhanced common spatial pattern filtering for brain computer interfaces (BCIs): Overview and a new approach," in Proc. 24th Int'l Conf. on Neural Information Processing, Guangzhou, China, Nov. 2017.
[12] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup, "Transfer learning in brain-computer interfaces," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 20–31, 2016.
[13] H. Kang, Y. Nam, and S. Choi, "Composite common spatial pattern for subject-to-subject transfer," Signal Processing Letters, vol. 16, no. 8, pp. 683–686, 2009.
[14] P.-J. Kindermans, M. Tangermann, K.-R. Müller, and B. Schrauwen, "Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller," Journal of Neural Engineering, vol. 11, no. 3, p. 035005, 2014.
[15] R. J. Kobler and R. Scherer, "Restricted Boltzmann machines in sensory motor rhythm brain-computer interfacing: A study on inter-subject transfer and co-adaptation," in Proc. IEEE Int'l Conf. on Systems, Man, and Cybernetics, Budapest, Hungary, Oct. 2016, pp. 469–474.
[16] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, "Spatial patterns underlying population differences in the background EEG," Brain Topography, vol. 2, no. 4, pp. 275–284, 1990.
[17] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, "Brain-computer interface technologies in the coming decades," Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
[18] H. W. Lilliefors, "On the Kolmogorov-Smirnov test for normality with mean and variance unknown," Journal of the American Statistical Association, vol. 62, no. 318, pp. 399–402, 1967.
[19] F. Lotte and C. Guan, "Learning from other subjects helps reducing brain-computer interface calibration time," in Proc. IEEE Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Dallas, TX, Mar. 2010.
[20] A. Matran-Fernandez and R. Poli, "Towards the automated localisation of targets in rapid image-sifting by collaborative brain-computer interfaces," PLoS ONE, vol. 12, pp. 21–34, 2017.
[21] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, "Designing optimal spatial filters for single-trial EEG classification in a movement task," Clinical Neurophysiology, vol. 110, no. 5, pp. 787–798, 1999.
[22] L. F. Nicolas-Alonso and J. Gomez-Gil, "Brain computer interfaces, a review," Sensors, vol. 12, no. 2, pp. 1211–1279, 2012.
[23] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[24] G. Pfurtscheller, G. R. Müller-Putz, R. Scherer, and C. Neuper, "Rehabilitation with brain-computer interface systems," Computer, vol. 41, no. 10, pp. 58–65, 2008.
[25] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
[26] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, "xDAWN algorithm to enhance evoked potentials: Application to brain-computer interface," IEEE Trans. on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
[27] W. Samek, F. Meinecke, and K.-R. Muller, "Transferring subspaces between subjects in brain-computer interfacing," IEEE Trans. on Biomedical Engineering, vol. 60, no. 8, pp. 2289–2298, 2013.
[28] H. Shimodaira, "Improving predictive inference under covariate shift by weighting the log-likelihood function," Journal of Statistical Planning and Inference, vol. 90, no. 2, pp. 227–244, 2000.
[29] M. Sugiyama, S. Nakajima, H. Kashima, P. V. Buenau, and M. Kawanabe, "Direct importance estimation with model selection and its application to covariate shift adaptation," in Proc. 32nd Annual Conf. on Advances in Neural Information Processing Systems, Vancouver, Canada, Dec. 2008, pp. 1433–1440.
[30] B. Sun, J. Feng, and K. Saenko, "Return of frustratingly easy domain adaptation," in Proc. 30th AAAI Conf. on Artificial Intelligence, Phoenix, AZ, Feb. 2016, pp. 2058–2065.
[31] P. E. Utgoff, "Shift of bias for inductive concept learning," in Machine Learning: An Artificial Intelligence Approach, R. Michalski, J. Carbonell, and T. Mitchell, Eds. CA: Morgan Kaufmann, 1986, vol. 2, pp. 107–148.
[32] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[33] J. van Erp, F. Lotte, and M. Tangermann, "Brain-computer interfaces: Beyond medical applications," Computer, vol. 45, no. 4, pp. 26–34, 2012.
[34] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[35] D. Wu, "Active semi-supervised transfer learning (ASTL) for offline BCI calibration," in Proc. IEEE Int'l Conf. on Systems, Man and Cybernetics, Banff, Canada, Oct. 2017.
[36] D. Wu, "Online and offline domain adaptation for reducing BCI calibration effort," IEEE Trans. on Human-Machine Systems, vol. 47, no. 4, pp. 550–563, 2017.
[37] D. Wu, J.-T. King, C.-H. Chuang, C.-T. Lin, and T.-P. Jung, "Spatial filtering for EEG-based regression problems in brain-computer interface (BCI)," IEEE Trans. on Fuzzy Systems, vol. 26, no. 2, pp. 771–781, 2018.
[38] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, "Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR)," IEEE Trans. on Fuzzy Systems, vol. 25, no. 6, pp. 1522–1535, 2017.
[39] D. Wu, V. J. Lawhern, W. D. Hairston, and B. J. Lance, "Switching EEG headsets made easy: Reducing offline calibration effort using active weighted adaptation regularization," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 24, no. 11, pp. 1125–1137, 2016.
[40] D. Wu, V. J. Lawhern, B. J. Lance, S. Gordon, T.-P. Jung, and C.-T. Lin, "EEG-based user reaction time estimation using Riemannian geometry features," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 11, pp. 2157–2168, 2017.
[41] F. Yger, M. Berar, and F. Lotte, "Riemannian approaches in brain-computer interfaces: A review," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 10, pp. 1753–1762, 2017.
[42] P. Zanini, M. Congedo, C. Jutten, S. Said, and Y. Berthoumieu, "Transfer learning: A Riemannian geometry framework with applications to brain-computer interfaces,"