Common Spatial Generative Adversarial Networks based EEG Data Augmentation for Cross-Subject Brain-Computer Interface
Yonghao Song, Lie Yang, Xueyu Jia, Longhan Xie,
Member, IEEE
Abstract—The cross-subject application of EEG-based brain-computer interfaces (BCIs) has always been limited by large individual differences and complex characteristics that are difficult to perceive. As a result, a long calibration session is needed to collect training data from each new user. Even transfer learning methods pre-trained on large amounts of subject-independent data cannot decode the different EEG signal categories without enough subject-specific data. Hence, we propose a cross-subject EEG classification framework with a generative adversarial network (GAN) based method named common spatial GAN (CS-GAN), which uses adversarial training between a generator and a discriminator to obtain high-quality data for augmentation. A particular module in the discriminator is employed to maintain the spatial features of the EEG signals and to increase the difference between categories, with two additional losses for further enhancement. Through adaptive training with sufficient augmentation data, our cross-subject classification accuracy yields a significant improvement of 15.85% over the leave-one-subject-out (LOO) test and 8.57% over adapting only 100 original samples on dataset 2a of BCI competition IV. Moreover, we design a convolutional neural network (CNN) based classification method as a benchmark with a similar spatial-enhancement idea, which achieves remarkable results in classifying motor imagery EEG data. In summary, our framework provides a promising way to deal with the cross-subject problem and to promote the practical application of BCI.
Index Terms—Electroencephalograph (EEG), generative adversarial networks (GANs), data augmentation, brain-computer interface (BCI), motor imagery (MI), cross-subject.
I. INTRODUCTION

Researchers have been trying to decode the information in the brain for many years. One of the tools commonly used in this field is electroencephalography (EEG), a non-invasive monitoring method that records the electrical activity of brain neurons with electrodes placed on the scalp [1]. Because of its stable temporal and spatial resolution, EEG has achieved good performance in diagnosing diseases such as epilepsy, insomnia and Alzheimer's disease [2]–[4], and also brings great possibilities to brain-computer interfaces (BCIs) [5].

BCI is a technology that aims to establish pathways between the brain and external devices, and is commonly used in robot control, entertainment and rehabilitation [6], [7]. Motor imagery is decoded to aid paralyzed people with physical therapy, which has proven to be beneficial for rehabilitation after a stroke or spinal cord injury [8]–[10].

(Corresponding author: Longhan Xie. Yonghao Song, Lie Yang, Xueyu Jia and Longhan Xie are with the Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou 510460, China; e-mail: [email protected], [email protected].)

With the improvement of classification methods, the decoding of motor intention is becoming increasingly accurate. However, most research trains classifiers on a single subject instead of pooling the data of different subjects [11]–[13], which means that sufficient training data have to be collected for each new user. This is obviously unfeasible in many scenarios, especially for some patients.

The classification problem called cross-subject is mainly restricted for two reasons. One reason is that EEG signals are non-stationary, with large individual differences caused by different physiological characteristics [14].
As in facial expression recognition research, classification performance is severely affected by identity information [15]. Furthermore, the limited amount of subject-specific EEG data is insufficient for powerful methods such as convolutional neural networks (CNNs) to perceive the individual features related to different categories [16]. Subsequently, transfer learning has been applied to cross-subject problems by first pre-training with subject-independent data, but it is still not competent unless there is enough data from the target subject for fine-tuning [17]. Naturally, researchers thought of artificially generating EEG data to resemble and augment the training set. Methods such as adding Gaussian noise and segmentation have been used and obtained a certain extent of improvement [18], [19], yet the redundant noise or the loss of information does not meet our needs. With an emphasis on data generation, generative models may be the potential solution to this dilemma. In particular, generative adversarial networks (GANs) have attracted great attention in computer vision due to their excellent artificial image generation capabilities [20].

Another reason limiting cross-subject decoding is the low signal-to-noise ratio of EEG signals, which are easily interfered with by various factors such as impedance and muscle artifacts. EEG is also prone to carrying a mass of irrelevant information when the subjects are not concentrating. This obstacle has not been handled well by end-to-end machine learning methods that use only the original temporal features. Therefore, some work added feature extraction processing before the classifier and confirmed that it makes some sense. This processing can be summarized into two types. One is to calculate different features of the signal, such as autoregressive models to obtain time-series features [21], the power spectrum to obtain spectral features, and wavelet transforms to obtain time-frequency features [22], [23]. The other is to project
the original signal into a subspace to obtain spatial features for better classification, such as the optimal spatial filter and the xDAWN algorithm [24], [25].

Although existing works have made some improvements in EEG decoding, few of them really deal with the cross-subject problem. In this article, therefore, we propose a GAN framework to augment multi-channel EEG data with motor intention, preserving the spatial features in addition to the temporal features while enhancing the discrimination between different categories of generated signals in a subspace. To verify the potential of this framework, named common spatial GAN (CS-GAN), a well-designed CNN model that emphasizes spatial enhancement for multiple tasks is first given as a benchmark classification method; it achieves state-of-the-art single-subject results on BCI competition IV Dataset 2a. Then, a minimal amount of subject-specific data is fed to CS-GAN to obtain generated data, which is used to adaptively augment the training set composed of subject-independent data. The results show that augmentation with the CS-GAN framework yields a more significant improvement in cross-subject classification than other existing augmentation approaches. Besides, a detailed analysis of the temporal, spatial and frequency features of the generated EEG signals is conducted to show their authenticity. Overall, the CS-GAN assessments from different perspectives demonstrate that it is a promising method to address the cross-subject problem and improve the usability of BCI systems.

The main contributions of this work can be summarized as follows.

1) We propose the CS-GAN framework for EEG data augmentation.
Not only are EEG signals generated with good temporal patterns, but their spatial characteristics are also well preserved, with the difference between categories increased.

2) With the data augmented by CS-GAN, we improve EEG classification performance under the cross-subject condition by a significant measure, which helps reduce the calibration time of BCI systems.

3) We also design a CNN model as a benchmark for motor intention classification, in which EEG signals are projected into a new space where the spatial difference is enhanced and the temporal information is retained. Remarkable performance on BCI competition IV Dataset 2a has been obtained.

4) Besides, we provide a feasible attempt to use a GAN-based framework for data augmentation with only a few data.

II. RELATED WORKS
A. GANs and GANs for EEG
GANs are a machine learning strategy inspired by game theory. Goodfellow et al. first proposed this network, consisting of a generator and a discriminator, for image generation in 2014 [20]. The generator generates fake data similar to the real data from random series by estimating the original data distribution, and the discriminator discriminates whether the generated data are real or fake. After multiple rounds of adversarial training, the two modules gradually reach equilibrium, at which point the generator can create data so realistic that the discriminator cannot distinguish it.

Research has emerged to solve several limitations of the first version of GAN. Conditional GANs proposed by Mirza et al. [26] and auxiliary classifier GANs proposed by Odena et al. [27] introduced category information as a prior condition into the generator and discriminator to generate samples of a specified category. Radford et al. presented deep convolutional GANs to build a bridge between supervised and unsupervised learning by adding convolutional structures; features were easy to capture, producing more delicate images [28]. It is worth noting that the Earth-Mover divergence was introduced by Arjovsky et al. in Wasserstein GANs (WGANs) [29], which, together with the gradient penalty [30], greatly improves the stability of GAN training. Some exciting applications have also been put forward, such as image-to-image translation [31] and image inpainting [32].

The amazing ability of GANs to generate artificial data quickly grabbed the attention of BCI researchers. Hartmann et al. proposed EEG-GAN to generate single-channel EEG signals that pass visual inspection very well [33]. Roy et al.
employed long short-term memory networks in the generator and discriminator and obtained motor imagery EEG signals with the same dynamic and time-frequency characteristics as the original signals [34]. After it became clear that GANs have the potential to generate EEG, a non-stationary time series, different applications have been tried, such as up-sampling EEG spatial resolution [35], session-invariant representation learning [36] and data augmentation. Luo et al. implemented a conditional Wasserstein GAN to generate the power spectral density and differential entropy of EEG signals for enhancing EEG-based emotion recognition [37]. Zhang et al. utilized a conditional deep convolutional GAN to augment data after a wavelet transform was applied [38]. In addition to generating EEG features, researchers have also tried to generate unprocessed EEG signals for broader purposes. Aznan et al. compared three generative models for EEG data generation, which was proven to be beneficial for EEG classification in the online control of humanoid robots [39]. Fahimi et al. extracted a feature vector from the target subject's data as a condition for GANs and obtained multi-channel EEG signals that inherited the specific characteristics of the subject [40]. This research shows a significant improvement in classifying motor imagery EEG. Does it also indicate that the possibilities of GANs are not limited to simply generating more EEG signals?
B. EEG Feature Extraction and Cross-subject Task
Various EEG feature extraction methods have been tried to decode EEG better. Zhao et al. arranged the channels according to their spatial distribution at each time point to obtain 3D data with more spatial information [41]. Fan et al. used spectral graph theoretic features to quantify temporal synchronization for detecting abnormal patterns of epileptic seizures with EEG [42]. The fast Fourier transform (FFT) and continuous wavelet transform (CWT) were applied by Durongbhan et al. to extract frequency and time-frequency features and construct a framework that uses EEG to classify Alzheimer's participants and healthy controls [43]. Spatial filters and some of their extensions, which find an optimal spatial filter to maximize the difference between two categories, have also achieved promising results. The filter bank common spatial pattern (FBCSP) proposed by Ang et al. decomposes EEG data into nine frequency bands, extracts the spatial features of each, and then selects features with a mutual information-based algorithm for better classification [44]. Some research employed end-to-end models that embed the feature extractor and classifier into a deep neural network for joint optimization. Three convolutional blocks of a CNN were designed by Gao et al. to obtain spatial-temporal features of EEG [45], similar to the approach of Li et al. [46].

Feature-oriented methods are often used for cross-subject tasks. Handiru et al. presented a channel selection method to find the most relevant common features of motor imagery [47]. Gupta et al. decomposed the EEG signal into sub-bands with a flexible analytic wavelet transform and applied information potential to extract features for cross-subject emotion recognition [48].
A popular technique recently used for cross-subject problems is transfer learning, or domain adaptation, which extracts important information or pre-trains classifiers on the training data of a source domain and then adapts to a target domain by fine-tuning to obtain better performance. Dose et al. trained a subject-independent classifier and adapted it to each single subject [49]. Hang et al. improved classification performance on the target domain with deep features extracted from the source domain's raw EEG signals, considering the data of the target subject as the target domain and the data of the other subjects as the source domain [50]. Zhao et al. further added a discriminator in an end-to-end model to learn the shared features of the source and target domains well [51].

In summary, we use CS-GAN for EEG data augmentation with a small number of samples from the target subject, and then use these augmented data to enhance cross-subject classification ability through adaptive training.

III. METHODS
In the actual use of BCI, too much subject-specific data is needed for calibration, given that good classification results cannot be achieved with only subject-independent data. Here, we propose CS-GAN, an EEG signal augmentation method focusing on spatial enhancement, which employs a small amount of subject-specific data to generate data with the same characteristics as the original signal and improve cross-subject performance in EEG classification tasks. The overall framework of the augmented classification with CS-GAN is shown in Fig. 1. Firstly, subject-specific data is processed to obtain the spatial features and spatial filters. The spatial features are then used as constraints in CS-GAN for data augmentation, enhancing the discrimination between categories, and the large amount of generated data is introduced into the subject-independent data for adaptive training. The spatial filters are also used in CS-GAN and finally applied to the training set of the classifier.
A. Data Description
Dataset 2a of BCI competition IV, provided by Graz University of Technology, is used to demonstrate our method [52].

Fig. 1. The overall framework of cross-subject EEG signal classification with data augmentation.

The dataset contains EEG data sampled at 250 Hz from 9 subjects, collected with 22 channels during four different motor imagery tasks: the imagination of movement of the left hand, right hand, both feet, and tongue. Two sessions recorded on different days were obtained for each subject, with 288 trials per session and 72 trials per motor imagery task. All the signals were bandpass filtered between 0.5 Hz and 100 Hz (with the 50 Hz notch filter enabled).
B. Pre-processing
A four-second segment of each trial ([2, 6] seconds) was taken as a sample, from the beginning of the cue to the end. 'NaN' values were replaced with 0. The EEG signals were further filtered between 4 Hz and 40 Hz to include the µ band and β band [53]. The 578 samples ('T' and 'E') of one subject were shuffled, of which 516 constituted the training set and 50 the test set to evaluate the performance of the proposed CNN. However, only 100 samples from the 'T' session of one subject were input to CS-GAN, demonstrating its ability to generate new data from little input. In the final cross-subject test, the 'T' data of 8 subjects were used for training, and 188 samples of the remaining subject were used for the test.

Z-score standardization was performed to reduce non-stationarity and fluctuation. The standardization can be formulated as

X' = (X − µ) / √σ   (1)

where X' and X denote the standardized signal and the input filtered signal, and µ and σ represent the mean and variance calculated on the training set, with which the signals become normally distributed with a mean of 0 and a standard deviation of 1. The same mean and variance are then applied to the test set.

C. Spatial Features and Filter
The spatial features of EEG, especially the correlations between different channels, are often ignored. Therefore, in this part, we extract the spatial features of the EEG samples and construct a mapping space in which different categories of motor imagery can be better distinguished.

Because of the multi-class task we face, a modified one-versus-rest (OVR) strategy is adopted to overcome the shortcoming of the traditional spatial filter, which only separates two categories. OVR means that the multi-class problem is transformed into multiple binary problems, each consisting of one class versus the remaining classes. For one binary split among the four of our whole task, we calculate each sample's covariance matrix as

cov(X) = X Xᵀ / trace(X Xᵀ)   (2)

where cov(·) denotes the covariance matrix and trace(·) the trace of a matrix. X is the EEG sample of shape C × T, where C is the number of channels and T is the sample length; in this task, X has shape 22 × 1000.

After that, all samples in a category are averaged to obtain two covariance matrices R₁ (the 'one' class) and R₂ (the 'rest'), which depict the spatial relationship between the channels of the two categories. This works because covariance is a measure of the joint variability of two random variables, and the covariance matrix contains the covariance of every two rows of the original sample. To utilize the spatial features derived from the covariance matrix, the Euclidean distance from each sample's covariance matrix to R₁ is calculated as

Dis = ‖cov(X) − R₁‖ = √( Σᵢ₌₁ᶜ Σⱼ₌₁ᶜ [cov(X)ᵢⱼ − R₁,ᵢⱼ]² )   (3)

where Dis denotes the Euclidean distance between cov(X) ∈ ℝ^{C×C} and R₁ ∈ ℝ^{C×C}. The mean Dis_mean and standard deviation Dis_std of all Dis, together with R₁, are saved for later use.

Next, a common space R is obtained by

R = R₁ + R₂   (4)

and its eigendecomposition is conducted as

R = U Λ Uᵀ   (5)

where U and Λ (sorted in descending order) are the eigenvectors and eigenvalues, respectively. Then the whitening matrix P of R is obtained:

P = Λ^(−1/2) Uᵀ   (6)

with which R₁ and R₂ are transformed as

S₁ = P R₁ Pᵀ,  S₂ = P R₂ Pᵀ   (7)

The orthogonal diagonalization of S₁ is obtained as

S₁ = B Λ_S Bᵀ   (8)

where B and Λ_S (sorted in ascending order) are the eigenvectors and eigenvalues of S₁. Due to orthogonality,

Bᵀ B = I   (9)

I = Bᵀ P R Pᵀ B = Bᵀ P R₁ Pᵀ B + Bᵀ P R₂ Pᵀ B = Λ_S + Bᵀ P R₂ Pᵀ B   (10)

where Bᵀ P R₂ Pᵀ B is a diagonal matrix, denoted Λ'_S, with value I − Λ_S; this means that Bᵀ P diagonalizes both R₁ and R₂. We can see that the values of Λ'_S become larger as the values of Λ_S become smaller. B and P are also saved for later use in CS-GAN. Moreover, Pᵀ B is considered a spatial filter with which the difference between the 'one' class and the 'rest' is maximized. We obtain four sub-filters here because of the four categories, and the first four columns of each sub-filter are taken, corresponding to the four largest eigenvalues of Λ'_S, to reduce computational complexity. The final spatial filter W is completed by stacking the four sub-filters and is used as

Z = Wᵀ X   (11)

where Z is the processed sample. In this framework, the spatial filter W, with a shape of channels × 16, is saved for CS-GAN. The number of columns of the spatial filter can be chosen according to actual needs.

D. CS-GAN
The most important contribution of this article is a novel EEG data augmentation method based on GANs, which effectively maintains the characteristics of the original signal while enhancing the spatial features and the discrimination between different categories.

GAN is a training strategy that constructs a competition between two networks, a generator G and a discriminator D. To illustrate with an analogy, it is a game between a counterfeiter and the police, where the counterfeiter tries to deceive the police with fake paintings. In actual use, artificial data generated by G from random noise are input to D along with real data, and D identifies whether each input is real or fake. G and D finally reach equilibrium after enough adversarial training. At this point, the generated data capture the distribution of the real data and cannot be distinguished by the discriminator.

The architecture of CS-GAN is given in Fig. 2. As we can see, it also includes a generator part and a discriminator part, but unlike other research, we have innovatively expanded the discriminator into two modules to distinguish EEG data and common spatial data (CS data), respectively. Besides, R₁, Dis_mean, Dis_std, P and B obtained in the previous step are used to constrain the generator and better retain the spatial features of EEG samples. The detailed design of CS-GAN is given as follows.
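Before the design details, the spatial quantities carried over from Section III-C (one OVR sub-filter plus the distance statistics Dis_mean and Dis_std) can be assembled in a short numpy sketch. The toy random data and all variable names are ours, not the paper's code; a real run would use the 100 subject-specific samples:

```python
import numpy as np

def norm_cov(x):
    # Eq. (2): trace-normalised covariance of one EEG sample (channels x time)
    c = x @ x.T
    return c / np.trace(c)

def csp_subfilter(one, rest, n_cols=4):
    # one/rest: lists of samples for the 'one' class and the pooled 'rest' classes
    R1 = np.mean([norm_cov(x) for x in one], axis=0)
    R2 = np.mean([norm_cov(x) for x in rest], axis=0)
    lam, U = np.linalg.eigh(R1 + R2)        # Eqs. (4)-(5), ascending order
    lam, U = lam[::-1], U[:, ::-1]          # sort descending as in the text
    P = np.diag(lam ** -0.5) @ U.T          # whitening matrix, Eq. (6)
    S1 = P @ R1 @ P.T                       # Eq. (7)
    lam_s, B = np.linalg.eigh(S1)           # ascending, Eq. (8)
    # smallest Lambda_S <=> largest Lambda'_S, so take the first columns
    W = (P.T @ B)[:, :n_cols]
    return W, R1

rng = np.random.default_rng(1)
one = [rng.normal(size=(22, 1000)) for _ in range(16)]
rest = [rng.normal(size=(22, 1000)) for _ in range(48)]
W, R1 = csp_subfilter(one, rest)

dis = [np.linalg.norm(norm_cov(x) - R1) for x in one]   # Eq. (3)
dis_mean, dis_std = np.mean(dis), np.std(dis)
Z = W.T @ one[0]   # Eq. (11): project one sample into the common space
```

The full filter W of Section III-C stacks four such sub-filters, one per OVR split.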
1) Generator: The generator is employed to generate fake EEG signals similar to the real EEG signals of the different motor imagery classes. We input a piece of normally distributed random noise to the generator and introduce one fully-connected layer and five transposed convolutional layers to up-sample the input to the same size as the original sample (channels × sample points).

Fig. 2. The architecture of CS-GAN, which contains a generator and a discriminator. The discriminator has an EEG module and a CS-module. The cov-loss and ev-loss are used as constraints. The CS-module, cov-loss and ev-loss are used together to maintain the spatial features of the EEG signal and enhance the spatial difference between categories.

Note that the input is longer than that of common GANs, because the scale of EEG samples is also more extensive. The generator's network structure is shown in Table I, where FC is the fully-connected layer and ConvTrans is the transposed convolutional layer. It is worth noting that our method uses 2-dimensional convolution to learn the deep relationships among channels, which carry some of the spatial features of EEG. Batch normalization is applied after the first four transposed convolutional layers. The activation function of the fully-connected layer and of all transposed convolutional layers except the last is LeakyReLU with a negative slope of 0.2.
TABLE I
THE NETWORK STRUCTURE OF THE GENERATOR

Layers    | Input | Output | Kernel  | Stride
FC        |   -   |   -    |    -    |   -
ConvTrans |  128  |  128   | (3, 15) | (1, 3)
ConvTrans |  128  |  128   | (3, 13) | (1, 3)
ConvTrans |  128  |   64   | (3, 5)  | (1, 2)
ConvTrans |   64  |   32   | (4, 5)  | (2, 1)
ConvTrans |   32  |    1   | (1, 2)  | (1, 1)
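As a sanity check on Table I, the no-padding transposed-convolution size formula out = (in − 1) · stride + kernel can be chained over the five layers. The (4, 50) starting feature map is our assumption (the size emitted by the fully-connected layer is not recoverable from the text); with it, the kernels and strides above land exactly on a 22 × 1000 sample (22 channels, 4 s at 250 Hz):

```python
def convT_out(size, kernel, stride):
    # Output size of a transposed convolution with no padding:
    # out = (in - 1) * stride + kernel
    return (size - 1) * stride + kernel

# (kernel, stride) pairs from Table I, applied to a hypothetical (4, 50)
# feature map produced by the fully-connected layer (assumed shape).
layers = [((3, 15), (1, 3)), ((3, 13), (1, 3)), ((3, 5), (1, 2)),
          ((4, 5), (2, 1)), ((1, 2), (1, 1))]
h, w = 4, 50
for (kh, kw), (sh, sw) in layers:
    h, w = convT_out(h, kh, sh), convT_out(w, kw, sw)

print(h, w)  # -> 22 1000: 22 channels x 1000 sample points
```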
2) Discriminator: Compared with past usage, we have considerably changed the discriminator, the purpose of which is to make the generated data retain the spatial features of the original data through more targeted adversarial training. The usual practice is to input original real data and generated fake data into the discriminator. In our method, the spatial filters obtained in the previous step are first applied to the real EEG data and the fake EEG data. The results (called real CS data and fake CS data) are then input to the discriminator together with the real EEG data and fake EEG data, and two modules distinguish whether the EEG data and CS data are real or fake.

The discriminator's network structure is given in Table II, where Conv is the convolutional layer, Maxpool is the max-pooling layer, and FC is the fully-connected layer. Temporal Conv and Spatial Conv are used to extract the temporal features and spatial features separately, enhancing the distinguishing ability of the discriminator. The only differences between the EEG module and the CS-module are the kernel size of Spatial Conv, which in the EEG module is (channels, 1), and an extra layer in the Spatial Conv of the CS-module that separates the channels according to the four sub-filters for the four categories. LeakyReLU with a negative slope of 0.2 is applied as the activation function after each convolutional layer, but no batch normalization is used. Finally, two fully-connected layers obtain the probabilities for identifying the EEG data and the CS data, respectively. In this way, not only are EEG signals conforming to the original distribution generated, but the good feature characteristics and separability brought by the spatial filter are maintained.
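The four inputs to the two discriminator modules can be sketched as follows. The 22 × 16 filter shape follows Section III-C; the random arrays and names are illustrative stand-ins, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical spatial filter W (channels x 16) from the previous step.
W = rng.normal(size=(22, 16))
real_eeg = rng.normal(size=(22, 1000))
fake_eeg = rng.normal(size=(22, 1000))   # stand-in for G(z)

# The EEG module sees the raw signals; the CS module sees their
# spatially filtered projections (real CS data and fake CS data).
real_cs = W.T @ real_eeg
fake_cs = W.T @ fake_eeg
```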
3) Loss Function: The loss function of CS-GAN mainly consists of the adversarial loss, a covariance loss (cov-loss) and an eigenvalue loss (ev-loss). The adversarial loss is constructed based on Wasserstein GAN with a gradient penalty to stabilize the training process [30]:

L_adv = E_{G(z)∼P_g}[D(G(z))] − E_{x∼P_r}[D(x)] + λ_gp GP(x̂)   (12)

GP(x̂) = E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖ − 1)²]   (13)
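Eq. (13) can be illustrated without automatic differentiation by using a toy linear critic, whose input gradient is simply its weight vector. Everything here (shapes, the λ_gp value, the critic itself) is an assumption made for the sketch, not the paper's actual setting:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear critic D(x) = w . x, so grad_x D(x) = w analytically
# (a stand-in for the real discriminator; no autodiff needed).
w = rng.normal(size=220)

def critic_grad(x_hat):
    return w  # constant gradient of a linear critic

def gradient_penalty(real, fake, lam_gp=10.0):
    # Eq. (13): sample x_hat on the line between real and fake data,
    # then penalise deviations of ||grad D(x_hat)|| from 1.
    alpha = rng.uniform(size=(real.shape[0], 1))
    x_hat = alpha * real + (1 - alpha) * fake
    grads = np.stack([critic_grad(x) for x in x_hat])
    norms = np.linalg.norm(grads, axis=1)
    return lam_gp * np.mean((norms - 1.0) ** 2)

real = rng.normal(size=(8, 220))
fake = rng.normal(size=(8, 220))
gp = gradient_penalty(real, fake)
```

For this linear critic the penalty reduces to λ_gp (‖w‖ − 1)², independent of the interpolation points, which makes the sketch easy to verify.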
TABLE II
THE NETWORK STRUCTURE OF THE DISCRIMINATOR

Modules | Layers        | Input | Output | Kernel  | Stride
EEG     | Temporal Conv |   -   |   -    |    -    |   -
        | Spatial Conv  |  10   |  30    | (22, 1) | (1, 1)
        | Conv          |  30   |  30    | (1, 17) | (1, 1)
        | Maxpool       |   -   |   -    | (1, 6)  | (1, 6)
        | Conv          |  30   |  30    | (1, 7)  | (1, 1)
        | Maxpool       |   -   |   -    | (1, 6)  | (1, 6)
        | FC            |  750  |   1    |    -    |   -
CS      | Temporal Conv |   -   |   -    |    -    |   -
        | Spatial Conv  |  10   |  30    | (4, 1)  | (4, 1)
        | Spatial Conv  |  30   |  30    | (4, 1)  | (1, 1)
        | Conv          |  30   |  30    | (1, 17) | (1, 1)
        | Maxpool       |   -   |   -    | (1, 6)  | (1, 6)
        | Conv          |  30   |  30    | (1, 7)  | (1, 1)
        | Maxpool       |   -   |   -    | (1, 6)  | (1, 6)
        | FC            |   -   |   -    |    -    |   -

where E is the expectation operator, G(z) denotes the generated sample produced by the generator G from a random noise input z, and D(x) is the probability that sample x is real. P_g is the generated data distribution and P_r is the real data distribution. GP(·) is the gradient penalty, and x̂ represents data obtained by linear sampling between the generated data G(z) and the real data x, i.e. x̂ = αx + (1 − α)G(z), where α is a random value on the interval (0, 1).

The cov-loss is proposed to make the covariance matrix of each generated sample approximate that of the real samples of its category, because the spatial relationship between the channels of a sample is reflected by its covariance matrix. It is calculated as

L_cov = |‖cov(G(z)) − R₁‖ − Dis_mean| / Dis_std   (14)

where the loss is set to 0 when it falls below a small threshold, |·| denotes the absolute value, and cov(G(z)) is the covariance matrix of the generated sample. The Euclidean distance from it to the average covariance matrix of the corresponding category is calculated and made as close as possible to the distance distribution of the original samples.

According to the previous steps, the eigenmatrix Λ_S of the covariance matrix can be diagonalized by Bᵀ, and larger values of Λ'_S correspond to smaller values of Λ_S. The first four columns of the spatial filter were chosen corresponding to the four largest values of Λ'_S when obtaining the spatial filter. Therefore, we design the eigenvalue loss as

ev = diag(Bᵀ P cov(G(z)) Pᵀ B)[1 : 4]   (15)

L_ev = |log(E(ev))|   (16)

where diag(·) extracts the diagonal elements, ev holds the largest four eigenvalues of the generated sample after transformation, and log(·) is the natural logarithm, which is used to push the eigenvalues larger within the range of zero to one, because the diagonal elements of both Bᵀ P R₁ Pᵀ B and Bᵀ P R₂ Pᵀ B are greater than zero.
Thus, the category difference of the generated samples increases with the eigenvalue loss, further promoting later classification.

The adversarial loss is extended in the discriminator part so that it can identify original EEG data and CS data simultaneously. The overall losses of the generator and the discriminator can be formulated as:

L_G = −E(D(G(z))) − λ_cs E(D(W G(z))) + λ_cov L_cov + λ_ev L_ev   (17)

L_D = E(D(G(z))) − E(D(x)) + λ_gp GP(x̂) + E(D(W G(z))) − E(D(W x)) + λ_gp GP(W x̂)   (18)

where W is the spatial filter obtained in the previous step, and λ_gp, λ_cs, λ_cov and λ_ev denote the weights of the gradient penalty, the common spatial part, the covariance loss and the eigenvalue loss, respectively, whose values were determined by comparative experiments.

E. Classifier
We present a general method for EEG classification, which is also used to test CS-GAN's performance. The first step is to obtain a spatial filter using the matrix transformations and the revised OVR strategy described above. The category difference is increased by projecting the original data of the different categories into a new common space. The processed data have 16 channels, and every four channels form a representation that is easier to distinguish as one category.

A CNN is designed to learn this mixed, enhanced representation, as shown in Table III. After initializing the network parameters, a convolutional layer separates the sample into four parts, with which the four categories are easier to pick out. Then several convolutional layers with max-pooling layers extract temporal features, and three fully-connected layers obtain the deep representations. The activation function is LeakyReLU with a negative slope of 0.2. Dropout with a rate of 0.3 is used in the fully-connected layers to avoid overfitting. Finally, a fully-connected layer with four units and a softmax function produces the classification probabilities of the four categories.

IV. EXPERIMENTS AND RESULTS
We have proposed an approach to improve the cross-subject practicability of motor imagery EEG classification along with the idea of adaptive training: a small amount of subject-specific data is augmented with CS-GAN to introduce individual-specific information into the subject-independent data.

In this section, we validate the ability of this method to improve cross-subject classification from multiple perspectives. Firstly, we confirm the effectiveness of the method by comparing results with and without augmented data. Secondly, we compare the improvement of our method with that of other strong augmentation methods in cross-subject classification and perform an ablation test of CS-GAN. Additionally, our
TABLE III
THE NETWORK STRUCTURE OF THE CNN CLASSIFIER

Layers  | Input | Output | Kernel  | Stride
CS Conv |   -   |   -    |    -    |   -
Conv    |  16   |  32    | (1, 23) | (1, 3)
Conv    |  32   |  64    | (1, 17) | (1, 1)
Maxpool |   -   |   -    | (1, 6)  | (1, 6)
Conv    |  64   |  128   | (1, 7)  | (1, 1)
Maxpool |   -   |   -    | (1, 2)  | (1, 2)
FC      |   -   |   -    |    -    |   -
A. Experiment Settings
Our framework was implemented with the PyTorch library in Python 3.6 on a server with an Intel Xeon CPU and a GeForce 2080Ti GPU. For the dataset, the three electrooculography (EOG) channels were directly discarded without any artifact removal. In CS-GAN, Adam was used as the optimizer with a learning rate of 0.0001, β₁ of 0.1 and β₂ of 0.999. The network parameters were updated after every batch of size 5, and Kaiming uniform initialization was employed. In the classifier, β₁ was changed to 0.9 and the batch size to 50.

The 'T' session of the nine subjects in BCI competition IV dataset 2a was chosen for the experiments, so that each subject had 288 samples, 72 per category. Besides, 100 samples were randomly chosen from the 288 to test the extreme performance with few data. GANs usually need a lot of data for training, but only 16 to 31 samples were used to augment one category with CS-GAN. Moreover, the paired t-test was employed for statistical analysis.

B. Data Augmentation for Cross-Subject Classification
Leave-one-subject-out (LOO) validation was first performed, and the results are shown in Table IV. As we can see, due to the inherent non-stationarity of EEG signals, if only the data of eight subjects is used for training and the data of the remaining subject is used for testing, a very poor average classification accuracy of 52.12% with a standard deviation of 14.08% is obtained, which cannot be used in practice.
As far as we know, there is no widely accepted method that directly achieves good enough LOO classification results, because we have no reasonable basis for separating the motor-imagery-related information from the identity-related information, nor can we introduce information that reflects the data distribution of a specific subject. Therefore, some research pre-trains with the subject-independent data and adaptively fine-tunes with the subject-specific data. As Table IV presents, the classification accuracy is 59.40% when adapting 100 real subject-specific samples into the subject-independent data, which is significantly higher than that of LOO. Adding the augmentation data generated by CS-GAN raises the accuracy further to 67.97%, an improvement of 15.85% over LOO and 8.57% over adapting only the 100 original samples.

Fig. 3. The average EEG classification accuracy with the different number of augmentation data.
C. Some Augmentation Methods and Ablation Study
In this part, we compare CS-GAN with some other strong augmentation methods and perform an ablation study to further ensure CS-GAN's superiority and the effectiveness of the additional CS-module, cov-loss and ev-loss. The input was the same as CS-GAN's, and 3000 fake samples generated by different
ways were introduced for estimation, respectively. The average accuracy was obtained from the results of the nine subjects.

TABLE IV
CLASSIFICATION ACCURACY (IN PERCENTAGE %) UNDER DIFFERENT AUGMENTATION SITUATIONS

Method                  S01    S02    S03    S04    S05    S06    S07    S08    S09    Accuracy
Leave-one-subject-out   69.10  36.81  60.76  45.83  33.33  42.71  45.83  65.97  68.75  52.12 ± 14.08
1) Adding Gaussian Noise:
Adding Gaussian noise with a normal distribution to the pre-processed data is a common method that does not change the original distribution of the EEG signals. The standard deviation of the noise was chosen to be 0.2, according to the optimal result obtained in [18].
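This augmentation can be sketched as follows; the trial array layout and the helper name are our own illustration, with σ = 0.2 following the setting above:

```python
import numpy as np

def augment_with_noise(trials, sigma=0.2, n_copies=1, seed=0):
    """Additive Gaussian-noise augmentation.
    trials: array of shape (n_trials, n_channels, n_samples), standardized EEG.
    Returns n_copies noisy versions of the whole trial set, concatenated."""
    rng = np.random.default_rng(seed)
    out = [trials + rng.normal(0.0, sigma, size=trials.shape)
           for _ in range(n_copies)]
    return np.concatenate(out, axis=0)
```

Because the noise is zero-mean, the augmented trials keep the original class-conditional mean while adding controlled variability.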
2) Segmentation and Recombination (S&R):
We implemented S&R, a widely accepted method, to generate augmentation data. EEG trials of the same category were divided into several segments, which were then concatenated randomly while maintaining the segment order within a trial. A trial was segmented into eight segments, as suggested in [54].
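The S&R procedure described above can be sketched as follows; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def segment_recombine(trials, n_segments=8, n_new=100, seed=0):
    """Segmentation & Recombination: split same-class trials into
    n_segments time slices, then build each new trial by drawing every
    slice from a randomly chosen source trial, preserving slice order."""
    rng = np.random.default_rng(seed)
    n_trials, n_ch, n_samp = trials.shape
    bounds = np.linspace(0, n_samp, n_segments + 1, dtype=int)
    new = np.empty((n_new, n_ch, n_samp))
    for i in range(n_new):
        for s in range(n_segments):
            src = rng.integers(n_trials)  # random source trial for this slice
            new[i, :, bounds[s]:bounds[s + 1]] = trials[src, :, bounds[s]:bounds[s + 1]]
    return new
```

Every generated trial is thus a temporally ordered mosaic of real segments from the same category.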
3) Variational Auto-Encoder (VAE):
Generative models that have appeared in recent years were also compared. The first was a VAE inspired by [39], which employed an encoder consisting of a 1-dimensional convolutional layer with a max-pooling layer and two fully-connected layers, and a decoder consisting of a fully-connected layer and three transposed convolutional layers. The encoder projected the EEG data to latent vectors, from which the decoder reconstructed the input.
4) Deep Convolutional GAN (DCGAN):
We also implemented DCGAN, which showed remarkable performance in [40], one of the few works to apply GANs for EEG data augmentation. The generator used two fully-connected layers and up-sampling followed by convolutional layers. The discriminator consisted of pairs of convolutional and max-pooling layers, followed by a fully-connected layer. Adam with a learning rate of 0.0001 and β1 of 0.2 was employed for optimization.
5) Ablating All:
The difference between our CS-GAN and other GANs lies in the CS-module, cov-loss and ev-loss, in addition to the well-designed network structure with particular parts for temporal and spatial information. Therefore, the ablation tests were conducted from these perspectives. Firstly, the CS-module was completely removed, and λ_cov as well as λ_ev was set to 0, which makes the model effectively a DCGAN with Wasserstein loss.
6) Ablating Common Spatial Module (CS-M), Covariance Loss (cov-loss), and Eigenvalue Loss (ev-loss):
The role of the CS-module was to maintain the spatial characteristics of the EEG data for classification by controlling the projected data, since we had constructed a new subspace where the four categories had greater discrimination. The cov-loss was then employed to further hold the spatial pattern of the EEG signals, and the ev-loss was used to improve the discrimination between different categories. Therefore, we tested the classification performance using the fake data generated with no CS-module, no cov-loss and no ev-loss, separately.
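The exact formulations of cov-loss and ev-loss are defined earlier in the paper; as a loose illustration only, the sketch below assumes cov-loss penalizes the Frobenius distance between the channel covariances of generated and real trials, and ev-loss is a hinge-style penalty keeping the eigenvalue spread of a projected covariance discriminative. The function names and the margin parameter are our assumptions, not the paper's definitions:

```python
import numpy as np

def channel_cov(x):
    """Channel covariance of one trial. x: (n_channels, n_samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    return xc @ xc.T / x.shape[1]

def cov_loss(fake, real):
    """Assumed cov-loss: Frobenius distance between spatial covariances,
    pulling the generated trial's spatial pattern toward the real one."""
    return np.linalg.norm(channel_cov(fake) - channel_cov(real), "fro")

def ev_loss(cov_proj, margin=1.0):
    """Assumed ev-loss: hinge penalty if the eigenvalue spread of the
    projected covariance (discriminative power) falls below a margin."""
    evals = np.sort(np.linalg.eigvalsh(cov_proj))[::-1]
    return max(0.0, margin - (evals[0] - evals[-1]))
```

Under these assumptions, cov-loss is zero when the spatial covariances match, and ev-loss is zero once the leading/trailing eigenvalue gap exceeds the margin.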
Fig. 4. The average classification accuracy using different augmentation methods, including adding Gaussian noise, segmentation and recombination, variational auto-encoder, deep convolutional GAN and the ablation tests.
These competitive methods were used to generate fake data for augmentation, and the classification results are shown in Fig. 4. As we can see, the two commonly used data augmentation methods, adding Gaussian noise and S&R, indeed perform well. However, the introduced noise reduces the signal-to-noise ratio of the EEG, which is low to begin with, and segmenting the samples and randomly recombining them destroys each sample's inherent consistency, because the latent period is difficult to keep the same. Such shortcomings make the results of these two methods unsatisfactory and lower than that of CS-GAN (Noise: 3.37% lower), with statistically significant differences. The same holds for the ablation tests: when there is no CS-module, the quality of the generated signals is greatly reduced (no CS-M: 10.64% lower), and removing the cov-loss or the ev-loss likewise degrades the results significantly.

D. Quality of The Generated Data
After confirming the improvement of cross-subject classification performance with CS-GAN, we evaluated the quality of the generated signals from the perspectives of the time domain, frequency domain and spatial domain. Since the CS-GAN model for each category of each subject is parallel, subject 1's training data and generated data of right-hand movement were averaged separately for visualization.
Firstly, the three main channels over the motor area, C3, Cz and C4, were chosen, as shown in Fig. 5, to compare the original real data and generated fake data in the time domain [55]. We put the real signal in orange and the fake signal in blue on the same axes. It can be seen that the temporal distribution of the generated data is similar to that of the original data; both the mean and the range are relatively close. There are some mismatches at the beginning and the end, due to the large kernels of the first two convolutional layers in the generator. Moreover, our spatial enhancement of the relationship between channels also makes the three channels more correlated, an attribute consistent with the real signal.
Secondly, we averaged the 22 channels of the samples and drew the spectrograms of the real and fake data in Fig. 6 to show the power spectral density, with which we can compare the original and generated signals by measuring the power
content versus frequency. The spectrograms display 4-40 Hz, as in the preprocessing. It can be seen that the generated signal shows higher power at the frequencies where the original signal power is higher. Especially in the range of 5 to 25 Hz, the power distribution over the entire time is close.
Thirdly, we focused on the spatial distribution of the generated data and used heat maps to plot the normalized covariance matrices of the original real data and generated fake data in Fig. 7. This allows us to observe the relationship between the channels, since the covariance matrix reflects the relationship between the rows of the data. From the heat maps, we can see that the relationship between adjacent EEG channels is well maintained, which shows that the spatial distribution of the generated data is consistent with the original data. So far, we can confirm that we have indeed generated fake data of sufficient quality for our augmentation.

Fig. 5. Comparison of channels C3, Cz and C4 of the original real data and generated fake data. The real data is shown in orange and the fake data in blue.

Fig. 6. Comparison of the spectrograms of the original real data and generated fake data, with the 22 channels averaged. The upper panel is real and the lower is fake. The unit of the vertical axis is Hz and that of the colorbar is dB.

Fig. 7. Comparison of the heat maps of the covariance matrices of the original real data and generated fake data, showing the relationship between channels. The coordinates from top to bottom and left to right refer to channel 1 to channel 22. Each small square represents the covariance between two channels.
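The normalized covariance matrices behind the heat maps of Fig. 7 can be computed as below. A correlation-style normalization (dividing by the per-channel standard deviations) is an assumption on our part; rendering with matplotlib's imshow is omitted:

```python
import numpy as np

def normalized_cov(trials):
    """Average normalized channel covariance over trials.
    trials: (n_trials, n_channels, n_samples).
    Returns an (n_channels, n_channels) matrix with unit diagonal."""
    mats = []
    for x in trials:
        c = np.cov(x)                    # channel covariance of one trial
        d = np.sqrt(np.diag(c))          # per-channel standard deviations
        mats.append(c / np.outer(d, d))  # normalize to correlation scale
    return np.mean(mats, axis=0)
```

The result is symmetric with ones on the diagonal, so differences between the real-data and fake-data maps directly reflect changes in inter-channel structure.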
E. Classification Benchmark
To verify the performance of CS-GAN for cross-subject BCI classification, we also designed a multi-classification method as a benchmark. The EEG samples were transformed into a new subspace with more category discrimination and then input to a CNN. Here we compare it with other strong methods to make sure our classification method is sound. It should be noted that most research trains one classifier for each subject's data in dataset 2a of BCI competition IV. We do the same, taking about one-tenth (50) of one subject's data as the test data and the rest as the training data. The comparison results are given in Table V.
For evaluation, we employ the accuracy and the kappa value, which corrects for the accuracy occurring by chance. From the table, it can be seen that our method is competitive in average results compared to other state-of-the-art methods. The result for each subject is also superior, except for subject 5. Thus, we can be sure that there is no problem using our classification method as a benchmark to verify the effect of data augmentation.
Obviously, this condition has enough subject-specific data for training (more than five times as much data as we used to train CS-GAN). As in the previous experiments, the classification result obtained without subject-specific data is 30.84% lower than the result with enough subject-specific data, and the result obtained with fewer data but no augmentation is 23.56% lower.
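The kappa value discounts the agreement expected by chance: kappa = (p_o − p_e)/(1 − p_e), where p_o is the observed accuracy and p_e the chance agreement from the confusion-matrix marginals. A minimal sketch (the helper name is ours; scikit-learn's cohen_kappa_score is equivalent):

```python
import numpy as np

def cohen_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa from label lists for an n_classes problem."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # confusion matrix
    n = cm.sum()
    p_o = np.trace(cm) / n                              # observed accuracy
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

For four balanced classes, chance agreement is 0.25, so an accuracy of 75.02% corresponds to kappa = (0.7502 − 0.25)/0.75 ≈ 0.6669, matching the Multi-Branch 3D row of Table V.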
F. Generalization Ability of the Proposed Method
Similar tests were also conducted on dataset 2b of BCI competition IV, which has two categories of 3-channel motor imagery data. The first session, with 120 trials per subject, was chosen and separated into a training set of 50 trials and a test set of 70 trials. Twenty-five samples were used to train CS-GAN and obtain the augmentation data of one category. One thousand samples, approximating the amount of subject-independent data, were generated for each subject's test. The average classification accuracy of LOO was 75.71%, and adapting 50 real samples and then 1000 fake samples brought further statistically significant improvements.

V. DISCUSSION
The cross-subject problem has restricted the application of EEG-based BCI for a long time. There are two stumbling blocks. One is the large individual difference that makes it difficult to classify EEG with just subject-independent data. The other is the low signal-to-noise ratio, with which existing classification methods cannot easily perceive the difference between different categories.
In this article, we propose a novel strategy to solve this problem from two angles, data augmentation and feature enhancement, on the basis of adaptive training. A generative adversarial network named CS-GAN has been developed to augment a very small amount of subject-specific data for adaptively training the classifier together with subject-independent data. In this way, the calibration time before the actual use of a BCI system can be significantly reduced, and the classifier is still well trained with enough subject-specific information.
Many related studies implement algorithms sourced from the field of image processing but ignore the inherent spatial characteristics of EEG signals, which is exactly what this paper focuses on. We use the covariance matrix to construct a subspace in which the discrimination between different categories is greatly enlarged. The spatial information obtained in this process is further used as constraints, the CS-module, cov-loss and ev-loss, in CS-GAN to generate high-quality multi-channel data with implicit discrimination. Even some good research using generative models only generates EEG with a few channels or just some specific features. Through this augmentation strategy and CS-GAN, we have greatly improved the performance of cross-subject EEG classification. Moreover, we design a CNN for motor imagery EEG classification along with the idea of spatial enhancement, which achieves remarkable results on a general public dataset.
It is worth noting that we also give a potential way of applying GANs, which often appear in entertainment scenes.
Although GANs-based data augmentation was proposed long ago, there is a paradox: it is impossible to introduce new information for deeper learning from the original dataset alone. Nevertheless, we use it to introduce individual information into the total dataset and show that it is more effective than other augmentation methods.
This paper also has some limitations and issues that need further verification. We study cross-subject transfer, but not cross-session, although the difference in performance between sessions is also not small. We think cross-subject is the more urgent problem, and cross-session variation can be alleviated by controlling objective factors such as the use environment, impedance and user mood.

TABLE V
SINGLE-SUBJECT CLASSIFICATION ACCURACY WITH DIFFERENT METHODS ON DATASET 2A OF BCI COMPETITION IV

Method                         S01    S02    S03    S04    S05    S06    S07    S08    S09    Accuracy (kappa)
Multi-Branch 3D [41]           77.40  60.14  82.93  72.29  75.84  68.99  76.04  76.85  84.66  75.02 (0.6669)
TSSM+LDA [56]                  81.80  62.50  88.80  63.70  62.90  58.50  86.60  85.10  90.00  75.50 (0.6733)
Envelop+CNN [57]               85.23  69.73  90.15  65.57  77.42  52.41  93.68  90.04  84.75  78.78 (0.7171)
Functional Brain Network [58]  82.80  65.50  87.90  77.60  72.40
Ours
Another limitation is that we actually conduct adaptive training by mixing the subject-independent data and the augmentation data to train the classifier at the same time, instead of strictly pre-training with subject-independent data and fine-tuning with augmentation data. This practice follows [60], and no significant difference between the two ways was observed in our pre-tests.
One more issue concerns evaluating the quality of the generated data. There is no widely accepted method to evaluate the quality of generated EEG signals, so we chose to compare the signals themselves in the time domain, frequency domain and spatial domain. In fact, it can be seen that the distribution of the generated signals is not perfect, due to the larger amount of Gaussian noise input to CS-GAN. However, if we used a shorter input like other GANs, the generator could easily fool a discriminator that has only learned from a small amount of real data, causing the model to collapse. That balance between noise and diversity is worth further investigation.

VI. CONCLUSION
This paper proposes CS-GAN for EEG signals, in which spatial features are used to generate EEG data close to the original data distribution and to increase the category discrimination. Experimental results show that the cross-subject problem in EEG-based BCI is successfully alleviated using the generated data. Besides, we also give a good classification method that can be used as a benchmark. The proposed framework has great potential to improve the practicality of BCI.

ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China (Grant No. 52075177), the Joint Fund of the Ministry of Education for Equipment Pre-Research (Grant No. 6141A02033124), the Research Foundation of Guangdong Province (Grant No. 2019A050505001 and 2018KZDXM002), and the Guangzhou Research Foundation (Grant No. 202002030324 and 201903010028).

REFERENCES
[1] J. S. Barlow, "Electroencephalography: Basic principles, clinical applications and related fields,"
JAMA, vol. 250, no. 22, pp. 3108–3108, 2011.
[2] H. Adeli, S. Ghosh-Dastidar, and N. Dadmehr, "A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy," IEEE Transactions on Biomedical Engineering, vol. 54, no. 2, pp. 205–211, 2007.
[3] N. P. Raju, U. Venkatesh, and S. Yadhav, "Diagnosing insomnia using single channel EEG signal," 2019, pp. 570–573.
[4] Y. Zhao, Y. Zhao, P. Durongbhan, L. Chen, J. Liu, S. A. Billings, P. Zis, Z. C. Unwin, M. De Marco, A. Venneri, D. J. Blackburn, and P. G. Sarrigiannis, "Imaging of nonlinear and dynamic functional brain connectivity based on EEG recordings with the application on the diagnosis of Alzheimer's disease," IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1571–1581, 2020.
[5] C. Guger, G. Edlinger, W. Harkam, I. Niedermayer, and G. Pfurtscheller, "How many people are able to operate an EEG-based brain-computer interface (BCI)?" IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, pp. 145–147, 2003.
[6] M. M. Shanechi, "Brain–machine interfaces from motor to mood," Nature Neuroscience, vol. 22, no. 10, pp. 1554–1564, 2019.
[7] L. F. Nicolas-Alonso and J. Gomez-Gil, "Brain computer interfaces, a review," Sensors (Basel, Switzerland), vol. 12, pp. 1211–1279, 2012.
[8] R. Foong, K. K. Ang, C. Quek, C. Guan, K. S. Phua, C. W. K. Kuah, V. A. Deshmukh, L. H. L. Yam, D. K. Rajeswaran, N. Tang, E. Chew, and K. S. G. Chua, "Assessment of the efficacy of EEG-based MI-BCI with visual feedback and EEG correlates of mental fatigue for upper-limb stroke rehabilitation," IEEE Transactions on Biomedical Engineering, vol. 67, no. 3, pp. 786–795, 2020.
[9] U. Chaudhary, N. Birbaumer, and A. Ramos-Murguialday, "Brain–computer interfaces for communication and rehabilitation," Nature Reviews Neurology, 2017.
[10] S. R. Soekadar, N. Birbaumer, M. W. Slutzky, and L. G. Cohen, "Brain-machine interfaces in neurorehabilitation of stroke," Neurobiology of Disease, vol. 83, no. 1, pp. 172–179, 2015.
[11] R. Zhang, Q. Zong, L. Dou, and X. Zhao, "A novel hybrid deep learning scheme for four-class motor imagery classification," Journal of Neural Engineering, vol. 16, no. 6, p. 066004, 2019.
[12] J. Luo, Z. Feng, and N. Lu, "Spatio-temporal discrepancy feature for classification of motor imageries," Biomedical Signal Processing and Control, vol. 47, pp. 137–144, 2019.
[13] Y. Hou, L. Zhou, S. Jia, and X. Lun, "A novel approach of decoding EEG four-class motor imagery tasks via scout ESI and CNN," Journal of Neural Engineering, vol. 17, no. 1, 2019.
[14] X. Li, D. Song, P. Zhang, Y. Zhang, Y. Hou, and B. Hu, "Exploring EEG features in cross-subject emotion recognition," Frontiers in Neuroscience, vol. 12, no. 1, 2018.
[15] X. Liu, B. V. K. Vijaya Kumar, P. Jia, and J. You, "Hard negative generation for identity-disentangled facial expression recognition," Pattern Recognition, vol. 88, pp. 1–12, 2019.
[16] W. Huang, L. Wang, Z. Yan, and Y. Liu, "Classify motor imagery by a novel CNN with data augmentation," 2020, pp. 192–195.
[17] S. Sakhavi and C. Guan, "Convolutional neural network-based transfer learning and knowledge distillation using multi-subject data in motor imagery BCI," 2017, pp. 588–591.
[18] F. Wang, S. Zhong, J. Peng, J. Jiang, and Y. Liu, "Data augmentation for EEG-based emotion recognition with deep convolutional neural networks," in MMM, 2018.
[19] E. Lashgari, D. Liang, and U. Maoz, "Data augmentation for deep-learning-based electroencephalography," Journal of Neuroscience Methods, vol. 346, p. 108885, 2020.
[20] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Advances in Neural Information Processing Systems, vol. 3, pp. 2672–2680, 2014.
[21] Y. Zhang, B. Liu, X. Ji, and D. Huang, "Classification of EEG signals based on autoregressive model and wavelet packet decomposition," Neural Processing Letters, vol. 45, no. 2, pp. 1–14, 2017.
[22] H. Zhang, C. E. Stevenson, T. Jung, and L. Ko, "Stress-induced effects in resting EEG spectra predict the performance of SSVEP-based BCI," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 8, pp. 1771–1780, 2020.
[23] M. Saini, U. Satija, and M. D. Upadhayay, "Wavelet based waveform distortion measures for assessment of denoised EEG quality with reference to noise-free EEG signal," IEEE Signal Processing Letters, vol. 27, pp. 1260–1264, 2020.
[24] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
[25] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, "xDAWN algorithm to enhance evoked potentials: Application to brain–computer interface," IEEE Transactions on Biomedical Engineering, vol. 56, no. 8, pp. 2035–2043, 2009.
[26] M. Mirza and S. Osindero, "Conditional generative adversarial nets," 2014.
[27] A. Odena, C. Olah, and J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," 2017.
[28] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2016.
[29] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," 2017.
[30] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs," 2017.
[31] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," 2020.
[32] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, "EdgeConnect: Generative image inpainting with adversarial edge learning," 2019.
[33] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, "EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals," 2018.
[34] S. Roy, S. Dora, K. McCreadie, and G. Prasad, "MIEEG-GAN: Generating artificial motor imagery electroencephalography signals," 2020, pp. 1–8.
[35] I. A. Corley and Y. Huang, "Deep EEG super-resolution: Upsampling EEG spatial resolution with generative adversarial networks," 2018, pp. 100–103.
[36] O. Ozdenizci, Y. Wang, T. Koike-Akino, and D. Erdogmus, "Adversarial deep learning in EEG biometrics," IEEE Signal Processing Letters, vol. 26, no. 5, pp. 710–714, 2019.
[37] Y. Luo, L.-Z. Zhu, Z.-Y. Wan, and B.-L. Lu, "Data augmentation for enhancing EEG-based emotion recognition with deep generative models," Journal of Neural Engineering, vol. 17, no. 5, p. 056021, Oct. 2020.
[38] Q. Zhang and Y. Liu, "Improving brain computer interface performance by data augmentation with conditional deep convolutional generative adversarial networks," 2018.
[39] N. K. Nik Aznan, A. Atapour-Abarghouei, S. Bonner, J. D. Connolly, N. Al Moubayed, and T. P. Breckon, "Simulating brain signals: Creating synthetic EEG data via neural-based generative models for improved SSVEP classification," Jul. 2019. [Online]. Available: http://dx.doi.org/10.1109/IJCNN.2019.8852227
[40] F. Fahimi, S. Dosen, K. K. Ang, N. Mrachacz-Kersting, and C. Guan, "Generative adversarial networks-based data augmentation for brain-computer interface," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13, 2020.
[41] X. Zhao, H. Zhang, G. Zhu, F. You, S. Kuang, and L. Sun, "A multi-branch 3D convolutional neural network for EEG-based motor imagery classification," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 10, pp. 2164–2177, 2019.
[42] M. Fan and C. Chun-An, "Detecting abnormal pattern of epileptic seizures via temporal synchronization of EEG signals," IEEE Transactions on Biomedical Engineering, vol. PP, pp. 1–1, 2018.
[43] P. Durongbhan, Y. Zhao, L. Chen, P. Zis, M. De Marco, Z. C. Unwin, A. Venneri, X. He, S. Li, and Y. Zhao, "A dementia classification framework using frequency and time-frequency features based on EEG signals," IEEE Transactions on Neural Systems and Rehabilitation Engineering, pp. 826–835, 2019.
[44] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, "Filter bank common spatial pattern (FBCSP) in brain-computer interface," 2008, pp. 2390–2397.
[45] Z. Gao, X. Wang, Y. Yang, C. Mu, Q. Cai, W. Dang, and S. Zuo, "EEG-based spatio–temporal convolutional neural network for driver fatigue evaluation," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 9, pp. 2755–2763, 2019.
[46] Y. Li, X. Zhang, B. Zhang, M. Lei, W. Cui, and Y. Guo, "A channel-projection mixed-scale convolutional neural network for motor imagery EEG decoding," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 6, pp. 1170–1180, 2019.
[47] V. S. Handiru and V. A. Prasad, "Optimized bi-objective EEG channel selection and cross-subject generalization with brain–computer interfaces," IEEE Transactions on Human-Machine Systems, vol. 46, no. 6, pp. 777–786, 2016.
[48] V. Gupta, M. D. Chopda, and R. B. Pachori, "Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals," IEEE Sensors Journal, vol. 19, no. 6, pp. 2266–2274, 2019.
[49] H. Dose, J. S. Moller, H. K. Iversen, and S. Puthusserypady, "An end-to-end deep learning approach to MI-EEG signal classification for BCIs," Expert Systems with Applications, vol. 114, pp. 532–542, 2018.
[50] W. Hang, W. Feng, R. Du, S. Liang, Y. Chen, Q. Wang, and X. Liu, "Cross-subject EEG signal recognition using deep domain adaptation network," IEEE Access, vol. 7, pp. 128273–128282, 2019.
[51] H. Zhao, Q. Zheng, K. Ma, H. Li, and Y. Zheng, "Deep representation-based domain adaptation for nonstationary EEG classification," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–11, 2020.
[52] K. K. Ang, Z. Y. Chin, C. Wang, C. Guan, and H. Zhang, "Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b," Frontiers in Neuroscience, vol. 6, p. 39, 2012.
[53] J. S. Kirar and R. K. Agrawal, "Relevant frequency band selection using sequential forward feature selection for motor imagery brain computer interfaces," 2018.
[54] F. Lotte, "Signal processing approaches to minimize or suppress calibration time in oscillatory activity-based brain–computer interfaces," Proceedings of the IEEE, vol. 103, no. 6, pp. 871–890, 2015.
[55] G. Pfurtscheller, C. Brunner, A. Schlogl, and F. H. Lopes da Silva, "Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks," NeuroImage, vol. 31, no. 1, pp. 153–159, 2006.
[56] X. Xie, Z. L. Yu, H. Lu, Z. Gu, and Y. Li, "Motor imagery classification based on bilinear sub-manifold learning of symmetric positive-definite matrices," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 6, pp. 504–516, 2017.
[57] S. Sakhavi, C. Guan, and S. Yan, "Learning temporal information for brain-computer interface using convolutional neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5619–5629, 2018.
[58] Q. Ai, A. Chen, K. Chen, Q. Liu, T. Zhou, S. Xin, and Z. Ji, "Feature extraction of four-class motor imagery EEG signals based on functional brain network," Journal of Neural Engineering, vol. 16, no. 2, p. 026032, 2019.
[59] L. Yang, Y. Song, K. Ma, and L. Xie, "Motor imagery EEG decoding method based on a discriminative feature learning strategy," IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2021.
[60] K. He, R. Girshick, and P. Dollar, "Rethinking ImageNet pre-training," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV).