Transfer Learning for Brain-Computer Interfaces: A Euclidean Space Data Alignment Approach
He He and Dongrui Wu
Abstract — Objective: This paper targets a major challenge in developing practical EEG-based brain-computer interfaces (BCIs): how to cope with individual differences so that better learning performance can be obtained for a new subject, with minimum or even no subject-specific data?

Methods: We propose a novel approach to align EEG trials from different subjects in the Euclidean space to make them more similar, and hence improve the learning performance for a new subject. Our approach has three desirable properties: 1) it aligns the EEG trials directly in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials; 2) its computational cost is very low; and 3) it is unsupervised and does not need any label information from the new subject.

Results: Both offline and simulated online experiments on motor imagery classification and event-related potential classification verified that our proposed approach outperformed a state-of-the-art Riemannian space data alignment approach, and several approaches without data alignment.

Conclusion: The proposed Euclidean space EEG data alignment approach can greatly facilitate transfer learning in BCIs.

Significance: Our proposed approach is effective, efficient, and easy to implement. It could be an essential pre-processing step for EEG-based BCIs.

Index Terms — Brain-computer interface, data alignment, EEG, Riemannian geometry, transfer learning
I. INTRODUCTION
A brain-computer interface (BCI) [17], [34] is a communication pathway for a user to interact with his/her surroundings by using brain signals, which contain information about the user's cognitive state or intentions. Electroencephalogram (EEG) is the most popular input in BCI systems. Motor imagery (MI) and event-related potentials (ERPs) are two common paradigms of EEG-based BCIs, and also the focus of this paper.

For MI-based BCIs, the user needs to imagine the movements of his/her body parts (e.g., hands, feet, and tongue), which causes modulations of brain rhythms in the involved cortical areas. So, the imagination of different movements can be distinguished from the spatial localization of the different sensorimotor rhythm modulations, and then used to control external devices. For ERP-based BCIs, the user is stimulated by a large number of common stimuli (non-target) and a small number of rare stimuli (target). The EEG response shows a special ERP pattern after the user perceives a target stimulus. So, a target stimulus can be detected by determining if there is an ERP pattern associated with it.

Early BCI systems were mainly used to help people with disabilities [24]. For example, MI-based BCIs have been used to help severely paralyzed patients control powered exoskeletons or wheelchairs without the involvement of muscles, and ERP spellers enable patients who can neither move nor speak to type. Recently, the application scope of BCIs has been extended to able-bodied people [22], [33], and EEG has become the most popular input signal because it is easy and safe to acquire, and has high temporal resolution. However, EEG measures very weak brain electrical signals from the scalp, which results in poor spatial resolution and low signal-to-noise ratio [4]. Consequently, sophisticated signal processing and machine learning algorithms are needed in EEG-based BCI systems to decode the EEG signals, especially for single-trial classification of EEG signals in real-world applications.

[He He and Dongrui Wu are with the Key Laboratory of Image Processing and Intelligent Control (Huazhong University of Science and Technology), Ministry of Education. They are also with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China. Email: [email protected], [email protected]. Dongrui Wu is the corresponding author. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.]
Usually the EEG signals are first band-pass filtered and spatially filtered to increase the signal-to-noise ratio, and then discriminative features are extracted, which are next fed into machine learning algorithms such as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) [3] for classification.

The covariance matrix of multi-channel EEG signals plays an important role in signal processing. For instance, common spatial pattern (CSP) filters [11], [16], [21], [25], computed directly from the covariance matrices, are the most popular spatial filters for MI. An intuitive explanation is that the interactions between different channels are encoded in the covariance matrices, which can be decomposed to find the spatial distribution of brain activities.

Recent years have also witnessed an increasing interest in using the EEG covariance matrices for both classification and regression [1], [7], [40], [41]. Since the covariance matrices are symmetric positive definite (SPD) and lie on a Riemannian manifold, a popular approach is to view each covariance matrix as a point in the Riemannian space, and use its geodesic distance to the Riemannian mean as a feature in classification. This approach is called the Minimum Distance to Riemannian Mean (MDRM) classifier [1], [7], [41].

MDRM can be directly applied to MI-based BCIs because the spatial information plays the most critical role in decoding MI signals. However, the discriminative information of ERP signals is represented temporally rather than spatially. So Barachant and Congedo [2] augmented the ERP trials to embed this temporal information. More specifically, the mean of the ERP trials is concatenated to each trial.
The covariance matrix of the concatenated trial then contains both temporal and spatial information, which makes MDRM also applicable to ERP classification.

Transfer learning (TL) [23], which utilizes information in source domains to improve the learning performance in a target domain, has also been successfully used for BCIs [12], [35], [36], [38], [39]. Kang et al. [13] and Lotte and Guan [19] improved covariance matrix estimation for CSP filters by regularizing it towards the average of other subjects, or constructing a common feature space. Samek et al. [27] proposed an approach to transfer information about non-stationarities in the data to reduce the shift between subjects, and verified its performance in MI BCIs. Kindermans et al. [14] integrated dynamic stopping, transfer learning and a language model in a probabilistic zero-training framework, and demonstrated competitive performance with a state-of-the-art supervised classifier in an ERP speller. Kobler and Scherer [15] pre-trained a Restricted Boltzmann Machine on a publicly available dataset and then adapted it to new observations in a sensorimotor rhythm based BCI.

Recently, Zanini et al. [42] proposed a TL framework for the MDRM classifier, denoted as Riemannian alignment (RA)-MDRM in this paper, by utilizing the information of the resting state. In MI, the resting state is a time window in which the subject is not performing any task, e.g., the transition window between two successive imageries. In ERP, particularly rapid serial visual presentation (RSVP), the stimuli are presented quickly one after another and the responses overlap, so it is difficult to find the resting state. [42] used the non-target stimuli as the resting state in ERP, which means some labeled data from the new subject must be known.

Experiments have shown that RA-MDRM outperformed MDRM in MI and ERP tasks [42], when compared in a TL setting.
But as mentioned above, RA-MDRM still needs a small amount of labeled subject-specific calibration trials for ERP classification. Moreover, for both MI and ERP, the classification is performed in the Riemannian space, whose geodesic computation is much more complicated, time-consuming, and unstable than the distance calculation in the Euclidean space. In this paper we propose a new EEG data alignment approach in the Euclidean space, which has the following desirable characteristics:

1) It transforms and aligns the EEG trials in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials. On the contrary, RA aligns the covariance matrices (instead of the EEG trials themselves) in the Riemannian space, and hence a subsequent classifier must be able to operate on the covariance matrices directly, whereas there are very few such classifiers.
2) It can be computed several times faster than RA.
3) It only requires unlabeled EEG trials and does not need any label information from the new subject; so, it can be used in completely unsupervised learning.

The effectiveness of our proposed approach is then demonstrated in two BCI classification scenarios:

1) Offline unsupervised classification, in which unlabeled EEG trials from a new subject are available, and we need to label them by making use of auxiliary labeled data from other subjects.
2) Simulated online supervised classification, in which a small number of labeled EEG epochs from a new subject are obtained sequentially on-the-fly, and a classifier is trained from them and auxiliary labeled data from other subjects to label future incoming epochs from the new subject.

The remainder of this paper is organized as follows: Section II introduces the RA-MDRM approach in the Riemannian space. Section III proposes our Euclidean space data alignment approach. Section IV introduces the three datasets used in our experiments, including two MI datasets and one ERP dataset. Sections V and VI compare the performance of our approach with RA-MDRM in offline and simulated online learning, respectively. Finally, Section VII draws conclusions and points out some future research directions.

II. RELATED WORK
The covariance matrices of EEG trials are SPD, and lie in a Riemannian space instead of a Euclidean space [41]. Since the covariance matrices directly encode the spatial information of the EEG trials, and by appropriately augmenting the EEG trials (such as in ERP classification) they can also encode the temporal information, we can perform EEG classification directly based on the covariance matrices.

This section introduces the MDRM classifier, which assigns a trial to the class whose Riemannian mean is the closest to its covariance matrix, and also a Riemannian space covariance matrix alignment approach (RA).
A. Riemannian Distance
The Riemannian distance between two SPD matrices $P_1$ and $P_2$ is called the geodesic distance, which is the minimum length of a curve connecting them on the Riemannian manifold:

\delta(P_1, P_2) = \| \log(P_1^{-1} P_2) \|_F = \Big[ \sum_{r=1}^{R} \log^2 \lambda_r \Big]^{1/2},   (1)

where the subscript $F$ denotes the Frobenius norm, and $\lambda_r$ ($r = 1, 2, \cdots, R$) are the real eigenvalues of $P_1^{-1} P_2$.

The Riemannian distance between two SPD matrices $P_1$ and $P_2$ remains unchanged under a linear invertible transformation:

\delta(C^T P_1 C, C^T P_2 C) = \delta(P_1, P_2),   (2)

where $C$ is an invertible matrix. This property of the Riemannian distance is called congruence invariance.
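As an illustration (ours, not the authors' code), the distance in (1) and the congruence invariance in (2) can be checked numerically; the eigenvalues of $P_1^{-1} P_2$ are obtained from the generalized eigenproblem $P_2 v = \lambda P_1 v$, which avoids forming the inverse explicitly:

```python
import numpy as np
from scipy.linalg import eigh

def riemannian_distance(P1, P2):
    """Affine-invariant Riemannian distance between SPD matrices, Eq. (1)."""
    # Eigenvalues of P1^{-1} P2 via the generalized eigenproblem P2 v = lam * P1 v
    lam = eigh(P2, P1, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 4, 4))
P1 = A @ A.T + 4 * np.eye(4)                      # two SPD matrices
P2 = B @ B.T + 4 * np.eye(4)
C = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # invertible here

d = riemannian_distance(P1, P2)
d_congruent = riemannian_distance(C.T @ P1 @ C, C.T @ P2 @ C)
# Congruence invariance, Eq. (2): the distance is unchanged.
print(abs(d - d_congruent) < 1e-8)
```

The invariance holds because $(C^T P_1 C)^{-1} (C^T P_2 C) = C^{-1} P_1^{-1} P_2 C$ is similar to $P_1^{-1} P_2$ and so has the same eigenvalues.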
B. Riemannian Mean

The mean of a set of SPD matrices can be computed in the Euclidean space as their arithmetic mean, and also in the Riemannian space as the Riemannian mean (geometric mean), defined as the matrix minimizing the sum of the squared Riemannian distances:

\mathfrak{G}(P_1, \cdots, P_N) = \arg\min_P \sum_{n=1}^{N} \delta^2(P, P_n).   (3)

There is no closed-form solution to (3), and it is usually computed by an iterative gradient descent algorithm [8].
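One common fixed-point form of this iteration (a minimal illustration, not the authors' code) maps all matrices to the tangent space at the current estimate, averages there, and maps back with the matrix exponential:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def riemannian_mean(mats, tol=1e-9, max_iter=50):
    """Karcher (geometric) mean of SPD matrices, Eq. (3), by fixed-point iteration."""
    M = np.mean(mats, axis=0)  # arithmetic mean as the initial guess
    for _ in range(max_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        # Average of the matrices mapped to the tangent space at M
        T = np.mean([logm(Mih @ P @ Mih) for P in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
        if np.linalg.norm(T, 'fro') < tol:
            break
    return M

# For commuting SPD matrices the Riemannian mean reduces to the geometric mean:
mats = np.array([2 * np.eye(3), 8 * np.eye(3)])
M = riemannian_mean(mats)
print(np.allclose(M, 4 * np.eye(3)))
```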
C. MDRM

The MDRM classifier [1], [7], [41] first computes the Riemannian mean of each class from the covariance matrices of the labeled training trials, then assigns each test trial to the class whose Riemannian mean is the closest to its covariance matrix, i.e.,

g(\Sigma) = \arg\min_c \delta(\Sigma, \bar{\Sigma}_c),   (4)

where $\Sigma$ is the covariance matrix of the test trial, $\bar{\Sigma}_c$ is the Riemannian mean of Class $c$, and $g(\Sigma)$ is the predicted class label of $\Sigma$.
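A compact, self-contained sketch of MDRM (illustrative only; the distance and mean helpers are re-implemented here, and the toy covariance matrices are hypothetical):

```python
import numpy as np
from scipy.linalg import eigh, sqrtm, logm, expm, inv

def dist(P1, P2):
    # Riemannian distance, Eq. (1)
    lam = eigh(P2, P1, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def rmean(mats, n_iter=30):
    # Riemannian (Karcher) mean, Eq. (3), by fixed-point iteration
    M = np.mean(mats, axis=0)
    for _ in range(n_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        T = np.mean([logm(Mih @ P @ Mih) for P in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
    return M

def mdrm_predict(Sigma, class_means):
    # Eq. (4): assign to the class whose Riemannian mean is closest
    return int(np.argmin([dist(Sigma, Sc) for Sc in class_means]))

# Toy example with diagonal covariance matrices per class.
train = {0: [2 * np.eye(3), 3 * np.eye(3)],
         1: [8 * np.eye(3), 12 * np.eye(3)]}
means = [rmean(np.array(train[c])) for c in (0, 1)]
print(mdrm_predict(2.5 * np.eye(3), means))  # -> 0
```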
D. RA-MDRM

Zanini et al. [42] proposed a novel TL approach in the Riemannian space, referred to in this paper as RA-MDRM, to improve the performance of the MDRM classifier by utilizing auxiliary data from other sessions and/or subjects when there are only a few labeled trials from a new subject. Since the covariance matrices of the trials are the input to MDRM, RA-MDRM aims to align the covariance matrices from different sessions/subjects to give them a common reference. [42] assumes that "different source configurations and electrode positions induce shifts of covariance matrices with respect to a reference (resting) state, but that when the brain is engaged in a specific task, covariance matrices move over the SPD manifold in the same direction." Then RA-MDRM centers "the covariance matrices of every session/subject with respect to a reference covariance matrix so that what we observe is only the displacement with respect to the reference state due to the task."

More specifically, RA-MDRM first computes the covariance matrices of some resting trials, $\{R_i\}_{i=1}^{k}$, in which the subject is not performing any task, and then computes the Riemannian mean $\bar{R}$ of these matrices. $\bar{R}$ is then used as the reference matrix in RA-MDRM to reduce the inter-session/subject variability by the following transformation:

\tilde{\Sigma}_i = \bar{R}^{-1/2} \Sigma_i \bar{R}^{-1/2},   (5)

where $\Sigma_i$ is the covariance matrix of the $i$th trial, and $\tilde{\Sigma}_i$ is the corresponding aligned covariance matrix.

Equation (5) makes the reference state of different sessions/subjects centered at the identity matrix. This transformation does not change the distances between the covariance matrices belonging to the same session/subject, because of the congruence invariance property in (2), but makes the covariance matrices of different sessions/subjects move over the Riemannian manifold in different directions with respect to the corresponding reference matrices, and hence reduces the cross-session/subject differences.
As a result, covariance matrices from different sessions/subjects can be aligned and become comparable if $\bar{R}$ can be appropriately estimated.

In MI, the resting state is the time window in which the subject is not performing any task, e.g., the transition window between two imageries. In ERP, particularly RSVP, the stimuli are presented quickly one after another and the responses overlap, so it is difficult to find the resting state. [42] used the non-target stimuli as the resting state in ERP, which requires that some labeled trials from the new subject be known. That is, in ERP,

\bar{R} = \arg\min_R \sum_{i \in I} \delta^2(R, \Sigma_i),   (6)

where $I$ is the index set of the non-target trials.

RA-MDRM can be applied to both MI and ERP data; however, there is an important difference in building covariance matrices in these two paradigms. Specifically, the covariance matrix of an MI trial $X_i$ is simply computed as:

\Sigma_i = X_i X_i^T.   (7)

$\Sigma_i$ encodes the most discriminative information of an MI trial, i.e., the spatial distribution of the brain activity.

However, the main discriminative information of ERP trials is carried temporally rather than spatially. The normal covariance matrix such as (7) ignores this temporal information. So Barachant and Congedo [2] proposed a novel approach to augment the ERP trials so that their covariance matrices can also encode the temporal information. They first compute the mean of the ERP trials:

\bar{X} = \frac{1}{|I|} \sum_{i \in I} X_i,   (8)

where $I$ is the index set of the ERP trials. They then build an augmented trial $X_i^*$ by concatenating $\bar{X}$ and $X_i$:

X_i^* = \begin{bmatrix} \bar{X} \\ X_i \end{bmatrix}.   (9)

The covariance matrix of $X_i^*$ is then used in RA-MDRM.
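RA's recentering step (5) can be sketched as follows (our illustration; the reference matrix is the Riemannian mean of the resting-state covariances, computed here with a compact Karcher-mean routine, and the resting covariances are synthetic):

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv, fractional_matrix_power

def rmean(mats, n_iter=30):
    # Riemannian mean of the resting-state covariance matrices {R_i}
    M = np.mean(mats, axis=0)
    for _ in range(n_iter):
        Mh = sqrtm(M)
        Mih = inv(Mh)
        T = np.mean([logm(Mih @ R @ Mih) for R in mats], axis=0)
        M = np.real(Mh @ expm(T) @ Mh)
    return M

def ra_align(covs, resting_covs):
    """Recenter covariance matrices w.r.t. the resting-state reference, Eq. (5)."""
    R_bar = rmean(np.asarray(resting_covs))
    R_inv_half = fractional_matrix_power(R_bar, -0.5)
    return np.array([R_inv_half @ S @ R_inv_half for S in covs])

rng = np.random.default_rng(1)
resting = [np.diag(rng.uniform(1, 3, 4)) for _ in range(5)]
aligned_resting = ra_align(resting, resting)
# After recentering, the Riemannian mean of the aligned resting covariances
# is the identity matrix (a consequence of congruence equivariance).
print(np.allclose(rmean(aligned_resting), np.eye(4), atol=1e-6))
# ERP augmentation, Eqs. (8)-(9), would stack the mean trial on top of each
# trial, e.g. X_star = np.vstack([X_bar, X_i]), before computing its covariance.
```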
E. Limitations of RA

Although RA-MDRM has demonstrated promising performance in several BCI applications [42], it still has some limitations:

1) RA-MDRM aligns the covariance matrices in the Riemannian space, instead of the EEG trials themselves. A subsequent classifier must be able to operate on the covariance matrices directly, whereas there are very few such classifiers.
2) RA-MDRM uses the Riemannian mean of the covariance matrices, which is time-consuming to compute, especially when the number of EEG channels is large.
3) RA-MDRM for ERP classification needs some labeled trials from the new subject: RA needs some non-target trials to compute the reference matrix in (6), and MDRM needs some target trials to construct $X_i^*$ in (9). So it is a supervised learning approach, and cannot be used when there is no label information from the new subject at all.

III. EEG DATA ALIGNMENT IN THE EUCLIDEAN SPACE (EA)

This section introduces our proposed Euclidean-space alignment (EA) approach.
A. The EA
To cope with the limitations of RA, we propose EA, which does not need any labeled data from the new subject, and can be computed much more efficiently. The rationale is to make the data distributions from different subjects more similar, so that a classifier trained on the auxiliary data has a better chance of performing well on the new subject. This idea has been widely used in TL [23], [30], [39].

Similar to RA, our approach is also based on a reference matrix $\bar{R}$, but estimated in a different way. Assume a subject has $n$ trials. Then,

\bar{R} = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T,   (10)

i.e., $\bar{R}$ is the arithmetic mean of all covariance matrices from the subject. We then perform the alignment by

\tilde{X}_i = \bar{R}^{-1/2} X_i.   (11)

After the alignment, the mean covariance matrix of all $n$ aligned trials is:

\frac{1}{n} \sum_{i=1}^{n} \tilde{X}_i \tilde{X}_i^T = \frac{1}{n} \sum_{i=1}^{n} \bar{R}^{-1/2} X_i X_i^T \bar{R}^{-1/2} = \bar{R}^{-1/2} \Big( \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T \Big) \bar{R}^{-1/2} = \bar{R}^{-1/2} \bar{R} \bar{R}^{-1/2} = I,   (12)

i.e., the mean covariance matrices of all subjects are equal to the identity matrix after alignment, and hence the distributions of the covariance matrices from different subjects are more similar. This is very desirable in TL.

The idea of EA can also be explained using the concept of maximum mean discrepancy (MMD) [10], [39], widely used in TL. MMD represents the distance between different distributions as the distance between their mean embeddings of features. Smaller distances indicate that the distributions are more similar, and hence more suitable for TL. If we view the covariance matrices as the feature embeddings of EEG trials, then, after EA, the MMD between EEG trials from different subjects becomes zero (because the mean covariance matrices of all subjects are identical), which should generally benefit TL.

B. Comparison with RA
Both EA and RA keep the Riemannian distances among the covariance matrices unchanged after the alignment. However, there are three major differences between them:

1) RA computes the reference matrix $\bar{R}$ as the Riemannian (geometric) mean of the resting-state covariance matrices, whereas EA computes the reference matrix $\bar{R}$ as the Euclidean (arithmetic) mean of all covariance matrices.
2) RA aligns the covariance matrices in the Riemannian space, whereas EA aligns the time-domain EEG trials in the Euclidean space.
3) After RA, the Riemannian mean of the resting-state covariance matrices becomes the identity matrix (but the Riemannian mean of all covariance matrices does not). After EA, the Euclidean mean of all covariance matrices becomes the identity matrix.

Compared with RA, EA has the following desirable properties:

1) EA transforms and aligns the EEG trials in the Euclidean space. Any subsequent signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials. So, it has much broader applications than RA, which aligns the covariance matrices (instead of the EEG trials) in the Riemannian space.
2) EA can be computed much faster than RA, because EA uses the arithmetic mean as the reference matrix, whereas RA uses the Riemannian mean as the reference matrix.
3) EA does not need any label information from the new subject, whereas RA needs some label information for ERP classification.
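The EA transformation in (10)–(11), and the identity-mean property in (12), can be sketched as follows (a minimal illustration with hypothetical trial shapes, not the authors' released code):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """Align one subject's EEG trials in the Euclidean space, Eqs. (10)-(11).

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns the aligned trials R_bar^{-1/2} X_i.
    """
    # Eq. (10): arithmetic mean of the trial covariance matrices
    R_bar = np.mean([X @ X.T for X in trials], axis=0)
    R_inv_half = fractional_matrix_power(R_bar, -0.5)
    # Eq. (11): whiten every trial with the same reference matrix
    return np.array([R_inv_half @ X for X in trials])

rng = np.random.default_rng(2)
trials = rng.standard_normal((20, 8, 256))  # 20 trials, 8 channels, 256 samples
aligned = euclidean_alignment(trials)
# Eq. (12): the mean covariance matrix of the aligned trials is the identity.
mean_cov = np.mean([X @ X.T for X in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8), atol=1e-8))
```

In practice each subject would be aligned independently with its own reference matrix, and any Euclidean-space pipeline (e.g., CSP plus LDA) would then be trained on the pooled aligned trials.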
C. Relationship to CORAL
A "frustratingly easy domain adaptation" approach, CORrelation ALignment (CORAL) [30], was proposed in 2016 to minimize domain shift by aligning the second-order statistics of different distributions, without requiring any target labels. Its idea is very similar to EA.

CORAL considers 1D features (vectors), instead of 2D features (matrices) such as the EEG trials in this paper. Let $C_S \in R^{d_S \times d_S}$ and $C_T \in R^{d_T \times d_T}$ be the feature covariance matrices in the source and target domains, respectively, where $d_S$ and $d_T$ are the numbers of features in the source and target domains, respectively. Then, CORAL finds a linear transformation $A \in R^{d_S \times d_T}$ of the source domain features, so that the Frobenius norm of the difference between the covariance matrices is minimized, i.e.,

\min_A \| A^T C_S A - C_T \|_F.   (13)

The linear transformation $A$ has a simple closed-form solution [30].

EA and CORAL are similar; however, there are also some important differences:

1) CORAL considers 1D features, and each domain has only one covariance matrix, which measures the covariances between different pairs of individual features. EA considers 2D features (EEG trials), and each domain has many covariance matrices (each corresponding to one EEG trial), each of which measures the covariances between different pairs of EEG channels in an EEG trial.
2) CORAL minimizes the distance between the covariance matrices in different domains, whereas EA minimizes the distance between the mean covariance matrices in different domains.
3) CORAL finds a linear transformation of the source domain features only, so that the transformed source domain covariance matrix approaches the original target domain covariance matrix. EA finds a separate linear transformation for each domain, so that the mean of the transformed source domain covariance matrices equals the mean of the transformed target domain covariance matrices.
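For the square, full-rank case, one closed-form minimizer of (13) is the whitening-recoloring map $A = C_S^{-1/2} C_T^{1/2}$; a sketch (ours, with synthetic feature matrices):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral_transform(Cs, Ct):
    """One closed-form minimizer of Eq. (13): whiten the source covariance,
    then re-color it with the target covariance (square, full-rank case)."""
    return fractional_matrix_power(Cs, -0.5) @ fractional_matrix_power(Ct, 0.5)

rng = np.random.default_rng(3)
Xs = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))   # source features
Xt = rng.standard_normal((500, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # target
Cs, Ct = np.cov(Xs.T), np.cov(Xt.T)
A = coral_transform(Cs, Ct)
# After transforming the source features by A, their covariance matches C_T.
print(np.allclose(A.T @ Cs @ A, Ct, atol=1e-6))
```

In practice [30] regularizes both covariance matrices with a small multiple of the identity before taking the matrix square roots; that is omitted here since the synthetic covariances are well-conditioned.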
IV. DATASETS

This section introduces the two MI datasets and one ERP dataset used in our experiments.
A. MI Datasets
Two MI datasets from BCI Competition IV were used. Their experimental paradigms were similar: in each session a subject sat in a comfortable chair in front of a computer. At the beginning of a trial, a fixation cross appeared on the black screen to prompt the subject to be prepared. A moment later, an arrow pointing in a certain direction was presented as a visual cue for a few seconds. In this period the subject was asked to perform a specific MI task without feedback according to the direction of the arrow. Then the visual cue disappeared from the screen and a short break followed until the next trial began.

The first dataset (Dataset 1 [5]) was recorded from seven healthy subjects. For each subject two classes of MI were selected from three classes: left hand, right hand, and foot. Continuous 59-channel EEG signals were acquired in three phases: calibration, evaluation, and special feature. Here we only used the calibration data, which provided complete marker information. Each subject had 100 trials from each class in the calibration phase.

The second MI dataset (Dataset 2a) consisted of EEG data from nine healthy subjects. Each subject was instructed to perform four different MI tasks, namely the imagination of the movement of the left hand, right hand, both feet, and tongue. 22-channel EEG signals and 3-channel EOG signals were recorded at 250 Hz. A training phase and an evaluation phase were recorded on different days for each subject. Here we only used the EEG data from the training phase, which included complete marker information. Additionally, two MI classes (left hand and right hand) were selected, and each class had 72 trials.

A causal band-pass filter (50-order linear-phase Hamming-window FIR filter, designed by the Matlab function fir1) was applied. We used the interval [0. , . ] seconds after the cue appearance as our trials for both datasets. EEG signals between [4. , . ] seconds after the cue appearance were extracted as resting states.
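This preprocessing step can be sketched as follows (our illustration; the passband digits are lost in this copy, so the 8–30 Hz band below is only an assumption, with SciPy's firwin standing in for Matlab's fir1):

```python
import numpy as np
from scipy.signal import firwin, lfilter, freqz

fs = 250          # sampling rate of Dataset 2a (Hz)
order = 50        # 50-order linear-phase FIR -> 51 taps
band = [8, 30]    # assumed illustrative MI band (Hz); not taken from the paper

# Hamming-window band-pass design, the counterpart of Matlab's fir1(50, band)
taps = firwin(order + 1, band, window='hamming', pass_zero=False, fs=fs)

# Causal filtering of a multi-channel trial X of shape (channels, samples)
X = np.random.default_rng(4).standard_normal((22, 1000))
X_filt = lfilter(taps, 1.0, X, axis=-1)

# The response is ~1 at the passband center and attenuated far outside it
w, h = freqz(taps, worN=np.array([1.0, 19.0]), fs=fs)
print(abs(h[1]) > 0.9, abs(h[0]) < abs(h[1]))
```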
B. ERP Dataset

We used an RSVP dataset from PhysioNet [9] for ERP classification. It contained EEG data from 11 healthy subjects during rapid presentation of images at 5, 6, and 10 Hz [20]. Each subject was seated in front of a computer showing a series of images rapidly. The images were aerial pictures of London falling into two categories, namely target images and non-target images. Target images contained a randomly rotated and positioned airplane that had been photo-realistically superimposed, and non-target images did not contain airplanes. The task was to recognize from the EEG signals whether the images were target or non-target; the EEG was recorded from 8 channels at 2048 Hz.

For each presentation rate and subject there were two sessions, denoted "a" and "b", which indicated whether the first image was target or non-target, respectively. Here we used the 5 Hz version (five images per second) of Session a. The number of samples for different subjects varied between 368 and 565, and the target to non-target ratio was around 1:9.

The continuous EEG data had been band-pass filtered between [0. , ] Hz. We downsampled the EEG signals from 2048 Hz to 64 Hz, and epoched each trial to the [0, . ] second interval time-locked to the stimulus onset.

C. Data Visualization
It is interesting to visualize how the EEG trials are modified by EA. Fig. 1 shows two examples (one for left-hand imagery, and the other for right) from Subject 1 in Dataset 2a. The black and red curves are the EEG signals before and after EA, respectively, and the vertical axis numbers show their correlations. The magnitudes of the EEG signals are smaller and more uniform after EA, and the EEG signals before and after EA generally have low correlation.

To visualize how EA reduces individual differences, we used t-Stochastic Neighbor Embedding (t-SNE) [32], a nonlinear dimensionality reduction technique that embeds high-dimensional data in a two- or three-dimensional space, to show and compare the EEG trials before and after EA. Each time we picked the trials from one subject as the test set, and combined the trials from all remaining subjects as the training set. Fig. 2(a) shows the t-SNE visualization of the first two subjects in MI Dataset 1, each row corresponding to a different test subject. The red dots are trials from the test subject, and the blue dots from the training subjects. In each row, the left plot shows the trials before EA, and the right after EA. Corresponding visualization results for the first two subjects in MI Dataset 2a and the ERP dataset are shown in Figs. 2(b) and 2(c), respectively.
Fig. 1. EEG trials before (black curves) and after (red curves) EA. Each row is a different channel.
The training trials (blue dots) may be scattered far away from the test trials (red dots) before EA, especially in Fig. 2(a). So, applying a classifier designed on the training trials directly to the test trials may not achieve good performance. However, after EA, the training and test trials overlap with each other, i.e., the discrepancies between them are reduced.

V. PERFORMANCE EVALUATION: OFFLINE UNSUPERVISED CLASSIFICATION
This section presents the performance comparison of EA with other approaches on both the MI and ERP datasets in offline unsupervised classification.
A. Offline Unsupervised Classification
In each dataset, there were multiple subjects, and each subject was first aligned independently, either in the Riemannian space using (5), or in the Euclidean space using (11). Since we had access to all EEG recordings in offline classification, all trials, or all resting epochs between the trials, were used to estimate the reference matrices. We then used leave-one-subject-out cross-validation to evaluate the classification performance: each time we picked one subject as the new subject (test set), combined the EEG trials from all remaining subjects as the training set to build the classifier, and then tested the classifier on the new subject.
B. Offline Classification Results on the MI Datasets
We first tested EA on the two MI datasets, and compared its performance with RA-MDRM. In the Euclidean space, after EA, we used CSP [11], [16], [21], [25] for spatial filtering and LDA for classification. More specifically, the following four approaches were compared:

1) MDRM: the basic MDRM classifier, as introduced in Section II-C. It does not include any data alignment.
2) RA-MDRM: the approach introduced in Section II-D, which first aligns the covariance matrices in the Riemannian space, and then performs MDRM.

Fig. 2. t-SNE visualization of the first two subjects before and after EA: (a) MI Dataset 1; (b) MI Dataset 2a; (c) ERP. Red dots: trials from the test subject; blue dots: trials from the training subjects.
3) CSP-LDA: a standard Euclidean space classification approach for MI, which spatially filters the EEG trials by CSP and then classifies them by LDA. It does not include any data alignment.
4) EA-CSP-LDA: it first aligns the EEG trials in the Euclidean space by EA (Section III), and then performs CSP filtering and LDA classification.

The classification accuracies of the four approaches are presented in Fig. 3 and Table I, which show that:

1) RA-MDRM outperformed MDRM on 15 out of the 16 subjects, suggesting that RA was effective.
2) EA-CSP-LDA also outperformed CSP-LDA on 14 out of the 16 subjects, suggesting that the proposed EA was also effective.
3) EA-CSP-LDA outperformed RA-MDRM on 11 out of the 16 subjects, suggesting that the proposed EA, which enables the use of a wide range of Euclidean space signal processing and machine learning approaches, could be more effective than RA.

Finally, it is worth noting that for a small number of subjects (e.g., Subjects 4 and 9 in Dataset 2a), EA actually degraded the classification accuracy. Some possible reasons are explained at the end of the paper, and will be investigated in our future research.
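A self-contained sketch of the CSP-LDA pipeline on synthetic data (ours; the shapes and classes are hypothetical, not the competition data):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, m=2):
    """Common spatial pattern filters for two classes; returns 2*m filters."""
    def mean_cov(trials):
        return np.mean([X @ X.T / np.trace(X @ X.T) for X in trials], axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem Ca w = lam (Ca + Cb) w; eigenvectors with the
    # largest/smallest lam maximize variance for class a/b, respectively.
    lam, W = eigh(Ca, Ca + Cb)
    idx = np.argsort(lam)
    return W[:, np.concatenate([idx[:m], idx[-m:]])]

def log_var_features(trials, W):
    # Log-variance of each spatially filtered channel
    return np.array([np.log(np.var(W.T @ X, axis=1)) for X in trials])

def lda_fit(F, y):
    # Two-class LDA: w = Sw^{-1} (mu1 - mu0), threshold at the projected midpoint
    mu0, mu1 = F[y == 0].mean(0), F[y == 1].mean(0)
    Sw = np.cov(F[y == 0].T) + np.cov(F[y == 1].T)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w, -w @ (mu0 + mu1) / 2

# Synthetic two-class demo: class 1 has extra variance on channel 0.
rng = np.random.default_rng(5)
tr_a = rng.standard_normal((30, 6, 200))
tr_b = rng.standard_normal((30, 6, 200))
tr_b[:, 0, :] *= 3.0
W = csp_filters(tr_a, tr_b)
F = log_var_features(np.concatenate([tr_a, tr_b]), W)
y = np.array([0] * 30 + [1] * 30)
w, b = lda_fit(F, y)
acc = np.mean((F @ w + b > 0).astype(int) == y)
print(acc)
```

With EA, the only change would be aligning each subject's trials via Eq. (11) before computing the CSP filters.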
Fig. 3. Offline unsupervised classification accuracies (%) of MDRM, RA-MDRM, CSP-LDA, and EA-CSP-LDA on the MI datasets: (a) Dataset 1; (b) Dataset 2a.
To determine whether the differences between our proposed approach (EA-CSP-LDA) and each other approach were statistically significant, we performed paired-sample t-tests on the accuracies in Table I using the MATLAB function ttest.

TABLE I
OFFLINE UNSUPERVISED CLASSIFICATION ACCURACIES (%) ON THE TWO MI DATASETS.

Dataset        Subject   MDRM   RA-MDRM   CSP-LDA   EA-CSP-LDA
The null hypothesis for each pairwise comparison was that the difference between the paired samples has mean zero, and it was rejected if p ≤ α, where α = 0.05 was used. Before performing each t-test, we also performed a Lilliefors test [18] to verify that the null hypothesis that the data come from a normal distribution could not be rejected.

The paired-sample t-test results are shown in Table II, where the statistically significant ones are marked in bold. EA-CSP-LDA significantly outperformed CSP-LDA on both MI datasets, suggesting that EA was effective. In addition, EA-CSP-LDA significantly outperformed RA-MDRM on Dataset 1, and had comparable performance with it on Dataset 2a, suggesting that EA may be preferred over RA.
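This significance test can be sketched with SciPy's counterpart of Matlab's ttest (the per-subject accuracy vectors below are made up for illustration, not the values of Table I):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired accuracies (%) of two approaches over the same subjects
acc_ea_csp_lda = np.array([74.0, 68.5, 81.0, 62.0, 77.5, 70.0, 83.0])
acc_csp_lda    = np.array([66.0, 61.0, 75.5, 58.0, 70.0, 64.5, 76.0])

# H0: the paired differences have mean zero; reject when p <= alpha
t_stat, p_value = ttest_rel(acc_ea_csp_lda, acc_csp_lda)
alpha = 0.05
print(p_value <= alpha)
```

In the paper, a Lilliefors normality test on the paired differences would precede each t-test; statsmodels provides one, but it is omitted here to keep the sketch dependency-free.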
TABLE II
PAIRED-SAMPLE t-TEST RESULTS ON THE TEST ACCURACIES IN TABLE I.

MI Dataset 1    MDRM   RA-MDRM   CSP-LDA
EA-CSP-LDA

MI Dataset 2a   MDRM   RA-MDRM   CSP-LDA
EA-CSP-LDA
It is also interesting to compare the computational cost of the different data alignment approaches. The platform was a Dell XPS15 laptop with an Intel Core i7-6700HQ CPU @2.60GHz, 16GB memory, and a 512GB SSD, running 64-bit Windows 10 Education and Matlab 2017a. The results are shown in Table III. Our proposed EA-CSP-LDA was 3.6-19.5 times faster than RA-MDRM, and it also had a much smaller standard deviation. RA-MDRM ran much slower on Dataset 1 because it had many more channels than Dataset 2a (59 versus 22).
TABLE III
THE COMPUTING TIME (SECONDS) OF EA-CSP-LDA AND RA-MDRM.

                 EA-CSP-LDA           RA-MDRM
                 Mean      Std        Mean      Std
MI Dataset 1     0.3864    0.0514     7.5326    0.2200
MI Dataset 2a    0.2405    0.0322     0.8766    0.0729
In summary, we have demonstrated that our proposed EA is more effective and efficient than RA in offline unsupervised MI classification.
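Part of EA's efficiency is that it is closed-form: per the description in Section V-D, the reference matrix is the Euclidean (arithmetic) mean of the trial covariance matrices, and each trial is whitened by its inverse square root. A minimal numpy sketch of this procedure, with random data standing in for real EEG trials:

```python
import numpy as np

def euclidean_alignment(trials):
    """Align EEG trials (n_trials, n_channels, n_samples) in Euclidean space.

    The reference matrix is the arithmetic mean of the trial covariance
    matrices X X^T; each trial is then whitened by its inverse square root.
    """
    covs = np.stack([X @ X.T for X in trials])
    R = covs.mean(axis=0)                      # Euclidean-mean reference matrix
    # Inverse matrix square root via eigendecomposition (R is symmetric PD).
    w, V = np.linalg.eigh(R)
    R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return np.stack([R_inv_sqrt @ X for X in trials])

rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 8, 128))     # 20 trials, 8 channels, 128 samples
aligned = euclidean_alignment(trials)

# After alignment, the mean covariance of the aligned trials is the identity,
# since R^{-1/2} (mean X X^T) R^{-1/2} = R^{-1/2} R R^{-1/2} = I.
mean_cov = np.mean([X @ X.T for X in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8), atol=1e-8))  # True
```

Because every subject's mean covariance becomes the identity after alignment, the trials from different subjects share a common reference, which is the point of the method.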
C. Offline Classification Results on the ERP Dataset
As RA-MDRM cannot be applied to ERP classification when there are no labeled trials at all from the new subject [RA needs some non-target trials to compute the reference matrix in (6), and MDRM needs some target trials to construct X*_i in (9)], we only validated the effectiveness of EA by comparing it with cases in which no data alignment was performed, using leave-one-subject-out cross-validation. All approaches used SVM classifiers, which cannot be combined with RA, because RA only outputs covariance matrices.

More specifically, we compared the performances of the following four approaches (all trials were downsampled to 64 Hz):
1) SVM, which performs principal component analysis (PCA) on the EEG trials to suppress noise and extract features, and then SVM for classification. It does not include any data alignment.
2) EA-SVM, which first performs EA to align the trials from different subjects in the Euclidean space, and then PCA and SVM classification.
3) xDAWN-SVM, which first performs xDAWN [26], [37] to spatially filter the EEG trials, and then PCA and SVM classification. It does not include any data alignment.
4) EA-xDAWN-SVM, which first performs EA to align the trials from different subjects in the Euclidean space, then xDAWN to spatially filter the EEG trials, and finally PCA and SVM classification.
For all approaches, we first reshaped the 2D feature matrices of EEG data into 1D vectors, then normalized each dimension to zero mean and unit variance. We then applied PCA to extract 20 features. Because these features had different ranges, we further normalized each feature to the interval [0, 1]. LibSVM [6] with a linear kernel was used for classification. We selected the trade-off parameter C from a grid of candidate values, using nested 5-fold cross-validation on the training data to identify the optimal C.
Finally, we used all training data and the optimal C to train a linear SVM classifier, and applied it to the test data.

Because the ERP dataset had significant class imbalance, we used the balanced classification accuracy (BCA) as the performance measure. Let m+ and m− be the true numbers of trials from the target and non-target classes, respectively, and let n+ and n− be the numbers of trials that are correctly classified by an algorithm as target and non-target, respectively. Then, we first compute

a+ = n+ / m+,    a− = n− / m−,    (14)

where a+ is the classification accuracy on the target class, and a− on the non-target class. The BCA is then computed as:

BCA = (a+ + a−) / 2.    (15)

The BCAs of the four approaches are presented in Fig. 4 and Table IV, which show that:
1) EA-SVM outperformed SVM on nine out of 11 subjects, suggesting that the proposed EA was generally effective for ERP classification.
2) EA-xDAWN-SVM outperformed xDAWN-SVM on eight out of 11 subjects, again suggesting that the proposed EA was generally effective for ERP classification.
3) On average, xDAWN-SVM and SVM achieved similar performances, but EA-xDAWN-SVM slightly outperformed EA-SVM, suggesting that our proposed EA may also help unleash the full potential of xDAWN.
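The BCA in (14) and (15) is straightforward to compute from a labeled test set; a small sketch with hypothetical label vectors (target = 1, non-target = 0):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """BCA per (14)-(15): mean of the per-class accuracies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    a_pos = np.mean(y_pred[y_true == 1] == 1)   # a+ = n+ / m+
    a_neg = np.mean(y_pred[y_true == 0] == 0)   # a- = n- / m-
    return (a_pos + a_neg) / 2

# Imbalanced example: 4 targets, 8 non-targets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.6875 = (2/4 + 7/8) / 2
```

Unlike plain accuracy, a classifier that always predicts the majority (non-target) class scores only 0.5 here, which is why BCA is the appropriate measure for the imbalanced ERP data.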
Fig. 4. BCAs of offline unsupervised classification on the ERP dataset.

TABLE IV
BCAS (%) OF OFFLINE UNSUPERVISED CLASSIFICATION ON THE ERP DATASET.

Paired-sample t-tests were also performed for the results in Table IV. As RA-MDRM could not be applied in this scenario, only two pairs of algorithms were compared, i.e., SVM versus EA-SVM, and xDAWN-SVM versus EA-xDAWN-SVM. The results are shown in Table V, where the statistically significant ones are marked in bold. EA-SVM significantly outperformed SVM, and EA-xDAWN-SVM significantly outperformed xDAWN-SVM, suggesting that EA was effective on the ERP dataset, too.

TABLE V
PAIRED t-TEST RESULTS ON THE TEST BCAS IN TABLE IV.
D. Discussion: Different Choices of the Reference Matrix
Reference matrix estimation has a direct impact on the performance of the alignment algorithms. RA uses the Riemannian mean of the resting covariance matrices for MI classification, and the Riemannian mean of the non-target covariance matrices for ERP classification [see (6)]. EA estimates the reference matrix from all trials by (10), following the same procedure for both MI and ERP classification.

In summary, for MI classification the reference matrix can be estimated from two types of trials: 1) the resting trials, during which the subject is not performing any task; and 2) the imagery trials, during which the subject is performing a motor imagery task. Furthermore, the reference matrix can be computed as either the Riemannian mean or the Euclidean mean. So we have four possible combinations: the Riemannian mean of the resting trials (RR), the Euclidean mean of the resting trials (ER), the Riemannian mean of all imagery trials (RI), and the Euclidean mean of all imagery trials (EI).

This subsection compares the performances of the above four reference matrices. The results are shown in Figs. 5(a) and 5(b) for MI Datasets 1 and 2a, respectively. They show that:
1) On average, RI-MDRM outperformed RR-MDRM, and EI-CSP-LDA outperformed ER-CSP-LDA, on both datasets, suggesting that estimating the reference matrix from all imagery trials is better than using the resting trials.
2) On average across all 16 subjects, EI achieved the best performance for CSP-LDA, and RI achieved the best performance for MDRM. This is consistent with our expectation: MDRM operates in the Riemannian space, so the Riemannian mean may give a more accurate estimate of the mean covariance matrix than the Euclidean mean; CSP-LDA, on the other hand, operates in the Euclidean space, so the Euclidean mean is more natural.
3) On average across all 16 subjects, EI-CSP-LDA outperformed RI-MDRM, suggesting that EA was advantageous over RA even when both used their best reference matrices.

VI. PERFORMANCE EVALUATION: SIMULATED ONLINE SUPERVISED CLASSIFICATION
This section evaluates the performance of EA in simulated online supervised classification. The same three datasets were used.
A. Simulated Online Supervised Classification
In online supervised classification, we have labeled trials from multiple auxiliary subjects, but initially no trials at all from the new subject. We acquire labeled trials from the new subject sequentially on-the-fly, which are then used to train a classifier to label future trials from the new subject, with the help of data from the auxiliary subjects.

Fig. 5. Comparison of different reference matrices on the MI datasets: (a) Dataset 1; (b) Dataset 2a. RR: Riemannian mean of the resting trials; ER: Euclidean mean of the resting trials; RI: Riemannian mean of all imagery trials; EI: Euclidean mean of all imagery trials.

We simulated the online supervised classification scenario using the offline datasets presented in Section IV. Take MI Dataset 1 as an example. Each time we picked one subject as the new subject, and the remaining six subjects as auxiliary subjects. The new subject had 200 trials. We generated a random integer n ∈ [1, 200], reserved the subsequent m trials {n + i}, i = 1, ..., m, as the online pool, and used the remaining 200 − m trials as the test data. Starting from an empty training set, we added r trials from the online pool to it each time, built a classifier by combining the training set with the auxiliary data, and evaluated its performance on the test data, until all m trials in the online pool were exhausted.

The main difference between offline unsupervised classification and simulated online supervised classification is that the former has a large number of trials from the new subject, none of which are labeled, whereas the latter has only a small number of trials from the new subject, all of which are labeled.

B. Simulated Online Classification Results on the MI Datasets
The four approaches (MDRM, RA-MDRM, CSP-LDA and EA-CSP-LDA) introduced in Section V-B were compared again in simulated online MI classification. In offline unsupervised classification, we had access to all unlabeled EEG trials of the new subject, so its R̄ was computed using all trials for EA, and the resting trials between them for RA. In simulated online supervised classification, we only had access to a small number of labeled trials from the new subject, so its R̄ was computed using these trials for EA, and the resting trials between them for RA (the label information was not needed in either EA or RA; only the EEG trials were used). All labeled trials from the auxiliary subjects and the small number of available labeled trials from the new subject were combined to train MDRM, CSP and LDA. We paid special attention to the implementation to make sure it was causal, i.e., we did not make use of EEG or label information that was not supposed to be known at a given time point.

We used m = 40 and r = 4 for both MI datasets. In order to obtain statistically meaningful results, we repeated the experiment 30 times (each time with a random n) for each new subject. (When n + i was larger than 200, we rewound to the beginning of the trial sequence, i.e., replaced n + i by n + i − 200.) The average classification accuracies of the four approaches are presented in Fig. 6, which shows that:
1) RA-MDRM outperformed MDRM on 15 out of the 16 subjects, suggesting that RA was effective in simulated online supervised classification.
2) EA-CSP-LDA outperformed CSP-LDA on 14 out of the 16 subjects, suggesting that the proposed EA was also effective in simulated online supervised classification.
3) EA-CSP-LDA outperformed RA-MDRM on 12 out of the 16 subjects, suggesting that EA was generally more effective than RA in simulated online supervised classification.

To determine whether the differences between our proposed algorithm and the others were statistically significant in the simulated online experiments, we first defined an aggregated performance measure, the area under the curve (AUC). For a particular algorithm on a particular subject, the AUC was the area under its accuracy curve as the number of labeled subject-specific trials increased from 4 to 40. As we repeated the experiments 30 times, we first computed the mean AUC of these 30 repetitions for each subject, so each algorithm had N mean AUCs, where N was the number of subjects. We then compared these mean AUCs using paired-sample t-tests. The results are shown in Table VI, where the statistically significant ones are marked in bold. EA-CSP-LDA significantly outperformed RA-MDRM on Dataset 1, and had comparable performance with it on Dataset 2a, suggesting that EA may be preferred over RA.

TABLE VI
PAIRED-SAMPLE t-TEST RESULTS ON THE MEAN AUCS IN SIMULATED ONLINE MI CLASSIFICATION.
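The AUC aggregate used above can be computed with a simple trapezoidal rule over the accuracy curve; a sketch with hypothetical accuracies at trial counts 4, 8, ..., 40 (matching r = 4 and m = 40):

```python
import numpy as np

# Accuracy (%) after each batch of r = 4 labeled trials, up to m = 40
# (hypothetical values for one repetition of one subject).
n_trials = np.arange(4, 44, 4)                 # 4, 8, ..., 40
accuracy = np.array([55, 58, 60, 63, 64, 66, 68, 69, 70, 71], dtype=float)

# Area under the accuracy curve (trapezoidal rule) as the number of
# labeled subject-specific trials grows.
auc = float(np.sum((accuracy[1:] + accuracy[:-1]) / 2 * np.diff(n_trials)))
print(auc)  # 2324.0
```

Averaging these AUCs over the 30 repetitions gives the per-subject mean AUC that the paired t-tests compare; an algorithm that learns faster from the first few labeled trials accrues a larger area.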
C. Simulated Online Classification Results on the ERP Dataset
Fig. 6. Classification accuracies (%) of simulated online learning on the MI datasets: (a) Dataset 1; (b) Dataset 2a. The horizontal axis shows the number of subject-specific labeled trials from the new subject. The error bars indicate the 95% confidence intervals. The legends in (a) are the same as those in (b).

Four approaches (MDRM, RA-MDRM, xDAWN-SVM, and EA-xDAWN-SVM) were compared in simulated online supervised classification on the ERP dataset. Note that MDRM and RA-MDRM were not used in offline unsupervised ERP classification because they need some labeled trials from the new subject to construct the augmented trials, which were not available there. However, they could be used in simulated online supervised ERP classification, because labeled trials were available here.

We used m = 80 and r = 10, and started with 20 trials in the first iteration. In order to obtain statistically meaningful results, we again repeated the experiment 30 times (each time with a random n) for each new subject. The average BCAs of the four approaches are shown in Fig. 7. Observe that:
1) On average, RA-MDRM outperformed MDRM, and EA-xDAWN-SVM outperformed xDAWN-SVM, suggesting that both alignment approaches were effective in simulated online supervised classification.
2) EA-xDAWN-SVM outperformed RA-MDRM on all 11 subjects, suggesting that the proposed EA was more effective than RA in simulated online supervised classification.
Fig. 7. BCAs (%) of simulated online calibration on the ERP dataset. The horizontal axis shows the number of subject-specific labeled trials from the new subject. The error bars indicate the 95% confidence intervals.
Paired-sample t-tests were also performed to compare EA-xDAWN-SVM with the other three algorithms. The results are shown in Table VII, where the statistically significant ones are marked in bold. EA-xDAWN-SVM significantly outperformed all the other approaches, suggesting that the proposed EA was effective and may be preferred over RA.

TABLE VII
PAIRED-SAMPLE t-TEST RESULTS ON THE MEAN AUCS IN SIMULATED ONLINE ERP CLASSIFICATION.
VII. CONCLUSION AND FUTURE RESEARCH
Transfer learning is a promising approach to improve the EEG classification performance in BCIs, by using labeled data from auxiliary subjects in similar tasks. However, due to individual differences, if the EEG trials from different subjects are not aligned properly, the discrepancies among them may result in negative transfer. A Riemannian space covariance matrix alignment approach (RA) has been proposed to transform the covariance matrices of EEG trials to give them a common reference. However, it has some limitations: 1) it aligns the covariance matrices instead of the EEG trials, so a classifier that operates directly on the covariance matrices must be used to take advantage of the alignment, and there are very few such classifiers; 2) its computational cost is high; and 3) it needs some labeled subject-specific trials from the new subject for ERP-based BCIs.

This paper has proposed a Euclidean space EEG trial alignment approach (EA), which has three desirable properties: 1) it aligns the EEG trials directly in the Euclidean space, and any signal processing, feature extraction and machine learning algorithms can then be applied to the aligned trials, so it has much broader applicability than the Riemannian space alignment approach; 2) it can be computed several times faster than the Riemannian space alignment approach; and 3) it does not need any labeled trials from the new subject. Experiments in offline and simulated online classification on two MI datasets and one ERP dataset verified the effectiveness and efficiency of EA.

However, the current EA may still have some limitations. Its goal is to compensate for the dataset shift among different subjects, which includes three types of shift:
1) Covariate shift [28], [29]: the distribution of the inputs (independent variables) changes.
2) Prior probability shift: the distribution of the output (target variable) changes.
3) Concept shift [31]: the relationship between the inputs and the output changes.
The current EA only considers covariate shift and ignores the other two, so the per-class input data distributions may still have large discrepancies among different subjects after EA. Moreover, in compensating for the covariate shift, EA may even increase the concept shift, i.e., it is possible that for a specific subject the two classes become more difficult to distinguish after EA. These could be some of the reasons why EA demonstrated improved performance on most, but not all, subjects. Another possible reason that EA did not offer advantages on some subjects is that there could be bad trials and/or outliers for these subjects; including such trials in computing the reference matrix R̄ would result in a large error, which further affects the classification accuracy.

Additionally, we acknowledge that the simulated online supervised classification experiments are not identical to real online experiments; our results would be more convincing if they were obtained from real experiments. Our future research will investigate and accommodate the limitations of EA, and validate the improvements in real-world closed-loop BCI experiments.

REFERENCES

[1] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, "Multiclass brain-computer interface classification by Riemannian geometry,"
IEEE Trans. on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
[2] A. Barachant and M. Congedo, "A plug & play P300 BCI using information geometry," arXiv:1409.0107, 2014.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. NY: Springer-Verlag, 2006.
[4] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. R. Müller, "Optimizing spatial filters for robust EEG single-trial analysis," IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
[5] B. Blankertz, G. Dornhege, M. Krauledat, K. R. Müller, and G. Curio, "The non-invasive Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects," NeuroImage, vol. 37, no. 2, pp. 539–550, 2007.
[6] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, 2011.
[7] M. Congedo, A. Barachant, and A. Andreev, "A new generation of brain-computer interface based on Riemannian geometry," arXiv:1310.8115, 2013.
[8] P. T. Fletcher and S. Joshi, "Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors," Lecture Notes in Computer Science, vol. 3117, pp. 87–98, 2004.
[9] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals," Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[10] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Proc. Advances in Neural Information Processing Systems, Vancouver, Canada, Dec. 2007, pp. 513–520.
[11] H. He and D. Wu, "Transfer learning enhanced common spatial pattern filtering for brain computer interfaces (BCIs): Overview and a new approach," in Proc. 24th Int'l Conf. on Neural Information Processing, Guangzhou, China, Nov. 2017.
[12] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup, "Transfer learning in brain-computer interfaces," IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 20–31, 2016.
[13] H. Kang, Y. Nam, and S. Choi, "Composite common spatial pattern for subject-to-subject transfer," Signal Processing Letters, vol. 16, no. 8, pp. 683–686, 2009.
[14] P.-J. Kindermans, M. Tangermann, K.-R. Müller, and B. Schrauwen, "Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller," Journal of Neural Engineering, vol. 11, no. 3, p. 035005, 2014.
[15] R. J. Kobler and R. Scherer, "Restricted Boltzmann machines in sensory motor rhythm brain-computer interfacing: A study on inter-subject transfer and co-adaptation," in Proc. IEEE Int'l Conf. on Systems, Man, and Cybernetics, Budapest, Hungary, Oct. 2016, pp. 469–474.
[16] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, "Spatial patterns underlying population differences in the background EEG," Brain Topography, vol. 2, no. 4, pp. 275–284, 1990.
[17] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, "Brain-computer interface technologies in the coming decades," Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
[18] H. W. Lilliefors, "On the Kolmogorov-Smirnov test for normality with mean and variance unknown," Journal of the American Statistical Association, vol. 62, no. 318, pp. 399–402, 1967.
[19] F. Lotte and C. Guan, "Learning from other subjects helps reducing brain-computer interface calibration time," in Proc. IEEE Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Dallas, TX, Mar. 2010.
[20] A. Matran-Fernandez and R. Poli, "Towards the automated localisation of targets in rapid image-sifting by collaborative brain-computer interfaces," PLoS ONE, vol. 12, pp. 21–34, 2017.
[21] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, "Designing optimal spatial filters for single-trial EEG classification in a movement task," Clinical Neurophysiology, vol. 110, no. 5, pp. 787–798, 1999.
[22] L. F. Nicolas-Alonso and J. Gomez-Gil, "Brain computer interfaces, a review," Sensors, vol. 12, no. 2, pp. 1211–1279, 2012.
[23] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[24] G. Pfurtscheller, G. R. Müller-Putz, R. Scherer, and C. Neuper, "Rehabilitation with brain-computer interface systems," Computer, vol. 41, no. 10, pp. 58–65, 2008.
[25] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
[26] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, "xDAWN algorithm to enhance evoked potentials: Application to brain-computer interface," IEEE Trans. on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
[27] W. Samek, F. Meinecke, and K.-R. Muller, "Transferring subspaces between subjects in brain-computer interfacing," IEEE Trans. on Biomedical Engineering, vol. 60, no. 8, pp. 2289–2298, 2013.
[28] H. Shimodaira, "Improving predictive inference under covariate shift by weighting the log-likelihood function," Journal of Statistical Planning and Inference, vol. 90, no. 2, pp. 227–244, 2000.
[29] M. Sugiyama, S. Nakajima, H. Kashima, P. V. Buenau, and M. Kawanabe, "Direct importance estimation with model selection and its application to covariate shift adaptation," in Proc. 32nd Annual Conf. on Advances in Neural Information Processing Systems, Vancouver, Canada, Dec. 2008, pp. 1433–1440.
[30] B. Sun, J. Feng, and K. Saenko, "Return of frustratingly easy domain adaptation," in Proc. 30th AAAI Conf. on Artificial Intelligence, Phoenix, AZ, Feb. 2016, pp. 2058–2065.
[31] P. E. Utgoff, "Shift of bias for inductive concept learning," in Machine Learning: An Artificial Intelligence Approach, R. Michalski, J. Carbonell, and T. Mitchell, Eds. CA: Morgan Kaufmann, 1986, vol. 2, pp. 107–148.
[32] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[33] J. van Erp, F. Lotte, and M. Tangermann, "Brain-computer interfaces: Beyond medical applications," Computer, vol. 45, no. 4, pp. 26–34, 2012.
[34] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[35] D. Wu, "Active semi-supervised transfer learning (ASTL) for offline BCI calibration," in Proc. IEEE Int'l Conf. on Systems, Man and Cybernetics, Banff, Canada, Oct. 2017.
[36] D. Wu, "Online and offline domain adaptation for reducing BCI calibration effort," IEEE Trans. on Human-Machine Systems, vol. 47, no. 4, pp. 550–563, 2017.
[37] D. Wu, J.-T. King, C.-H. Chuang, C.-T. Lin, and T.-P. Jung, "Spatial filtering for EEG-based regression problems in brain-computer interface (BCI)," IEEE Trans. on Fuzzy Systems, vol. 26, no. 2, pp. 771–781, 2018.
[38] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C.-T. Lin, "Driver drowsiness estimation from EEG signals using online weighted adaptation regularization for regression (OwARR)," IEEE Trans. on Fuzzy Systems, vol. 25, no. 6, pp. 1522–1535, 2017.
[39] D. Wu, V. J. Lawhern, W. D. Hairston, and B. J. Lance, "Switching EEG headsets made easy: Reducing offline calibration effort using active weighted adaptation regularization," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 24, no. 11, pp. 1125–1137, 2016.
[40] D. Wu, V. J. Lawhern, B. J. Lance, S. Gordon, T.-P. Jung, and C.-T. Lin, "EEG-based user reaction time estimation using Riemannian geometry features," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 11, pp. 2157–2168, 2017.
[41] F. Yger, M. Berar, and F. Lotte, "Riemannian approaches in brain-computer interfaces: A review," IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 10, pp. 1753–1762, 2017.
[42] P. Zanini, M. Congedo, C. Jutten, S. Said, and Y. Berthoumieu, "Transfer learning: A Riemannian geometry framework with applications to brain-computer interfaces,"