Re-thinking EEG-based non-invasive brain interfaces: modeling and analysis
Gaurav Gupta*, Sérgio Pequito†, and Paul Bogdan*
*Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. Email: {ggaurav, pbogdan}@usc.edu
†Department of Industrial and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA. Email: [email protected]
Abstract—Brain interfaces are cyber-physical systems that aim to harvest information from the (physical) brain through sensing mechanisms, extract information about the underlying processes, and decide/actuate accordingly. Brain interfaces are still in their infancy, but they are maturing quickly as several initiatives are released to push forward their development (e.g., NeuraLink by Elon Musk and 'typing-by-brain' by Facebook). This has motivated us to revisit the design of EEG-based non-invasive brain interfaces. Specifically, current methodologies entail a highly skilled neuro-functional approach and evidence-based a priori knowledge about specific signal features and their interpretation from a neuro-physiological point of view. Hereafter, we propose to demystify such approaches by leveraging new time-varying complex network models that equip us with a fractal dynamical characterization of the underlying processes. Subsequently, the parameters of the proposed complex network models can be explained from a systems perspective and, consequently, used for classification with machine-learning algorithms and/or for actuation laws determined using control-systems theory. Besides, the proposed system identification methods and techniques have computational complexities comparable with those currently used in EEG-based brain interfaces, which enables comparable online performance. Furthermore, we foresee that the proposed models and approaches are also valid for other invasive and non-invasive technologies. Finally, we illustrate and experimentally evaluate this approach on real EEG datasets to assess and validate the proposed methodology. The classification accuracies are high even with a small number of training samples.
Keywords: brain interfaces; spatiotemporal; fractional dynamics; unknown inputs; classification; motor prediction
I. INTRODUCTION
We have recently witnessed a renewed interest in invasive and non-invasive brain interfaces. Elon Musk has released the NeuraLink initiative [1], which aims to develop devices and mechanisms to interact with the brain in a symbiotic fashion, thus merging artificial intelligence with the human brain. The potential is enormous, since it would ideally represent a leap in our understanding of the brain and an unprecedented enhancement of its functionality. Alternatively, Facebook just announced the 'Typing-by-Brain' project [2], which gathered a team of researchers whose target is to enable writing 100 words per minute, in contrast with the current state of the art of only a few words per minute. Towards this goal, Facebook has invested in developing new non-invasive optical imaging technology that is five times faster than, and portable compared with, what is available on the market, and that would possess increased spatial and temporal resolution. Nonetheless, these are just some of the initiatives by big Silicon Valley players that want to commercialize brain technologies [3].

Part of the motivation for the 'hype' around neurotechnologies (both invasive and non-invasive brain interfaces) is their promising application domains [4]: (i) replace, i.e., the interaction of the brain with a wheelchair or a prosthetic device to replace a permanent functionality loss; (ii) restore, i.e., to bring back to normal use some reversible loss of functionality, such as walking after a severe car accident or limb movement after a stroke; (iii) enhance, i.e., to enable outperforming in a specific function or task, for instance an alert system for driving over long periods of time while attention is high; and (iv) supplement, as in equipping one with extra functionality, such as a third arm to be used during surgery.
Notwithstanding, these are just some of the (known) potential uses of neurotechnology. Despite the developments and the promise of future applications of brain interfaces (some of which we cannot currently conceive), we believe that current approaches to both invasive and non-invasive brain interfaces can greatly benefit from cyber-physical systems (CPS) oriented approaches and tools to increase their efficacy and resilience. Hereafter, we focus on non-invasive technology relying on the electroencephalogram (EEG) and revisit it through a CPS lens. Moreover, we believe that the proposed methodology is easily applicable to other technologies, e.g., electromagnetic fields (magnetoencephalography (MEG) [5]) and the hemodynamic responses associated with neural activity (functional magnetic resonance imaging (fMRI) [6], [7], and functional near-infrared spectroscopy (fNIRS) [8]). Nonetheless, these technologies present several drawbacks compared to EEG, e.g., cost, scalability, and user comfort, which lead us to focus on EEG-based technologies. A similar argument applies in the context of non-invasive versus invasive technologies, the latter requiring patient surgery.

The traditional approach to EEG neuroadaptive technology proceeds through the following steps [9], [10]: (a) signal acquisition (in our case, measurement of
Figure 1: A systematic process flow of the brain interface. The imagined motor movements of the subject are captured in the form of EEG time series, which are fed to the computational unit. A fractional-order-dynamics-based complex network model is estimated for the time series, and the model parameters are used as features for machine-learning classification. The output of the classifier predicts various motor movements with a certain confidence.

the EEG time-series); (b) signal processing (e.g., filtering with respect to known error sources); (c) feature extraction (i.e., an artificial construction of the signal that aims to capture quantities of interest); (d) feature translation (i.e., classification of the signal according to some plausible neurophysiological hypothesis); and (e) decision making (e.g., providing instructions to the computer to move a cursor, or to a wheelchair to move forward); see also Figure 1 for an overview diagram.

In this paper, we propose to merge steps (b) and (c), motivated by the fact that they often account for spatial or temporal properties separately, which are only artificially combined in a later phase of the pipeline, i.e., at the feature-translation step (d). First, we construct a time-varying complex network where the activity of the nodes (the EEG signals) has long-range memory and the edges account for inter-node spatial dependence. We argue that the traditional approach discards several spatial-temporal properties that could be weighted in the signal-processing and feature-extraction phases. In other words, current EEG brain interfaces require an understanding of the different regions of the brain responsible for a task, for instance the regions for motor or visual actions, as well as artificial frequency bands believed to be more significant for a specific action (also known as evidence-based).
Besides, one needs to understand and anticipate the most likely causes of noise/artifacts in the collected EEG data and filter out entire frequency bands, which possibly compromises phenomena of interest that then become unavailable for post-processing. Instead, we propose a modeling capability for long-range memory time-series that at the same time accounts for unknown stimuli, e.g., artifacts or inter-region communication.

A. Related Work and Novel Contributions
We put forward that the ability to properly model EEG time-series with complex network models that account for realistic setups can transform brain-interface methods from detectors into decoders. In other words, we do not want to merely look for the existence of a peak of activity in a given band believed to be associated with a specific action; we want to decompose the collection of signals into different interpretable features, i.e., the parameters of our complex network model. This allows us to understand how different regions communicate during a specific action/task by representing them as nodes of the complex network and estimating their dependence via the coupling matrix. The node activities are assumed to evolve at different time scales and in the general setup of external stimuli being present, which could be either unwanted noise or an external process driving the system. In engineering terms, this enables us to depart from a skill-dependent situation to general context analysis, which enhances the resilience of practical nonsurgical brain interfaces. Besides, it equips bioengineers, neuroscientists, and physicians with an exploratory tool to pursue new technologies for neuro-related diagnostics and treatments, as well as neuro-enhancement.

The proposed approach departs from those available in the literature; see [4], [9], [10] and references therein. In fact, to the best of the authors' knowledge, in the context of noninvasive EEG-based technologies, [11] is the only existing work that explores fractional-order models, and it does so in the context of single-channel analysis, which contrasts with the spatiotemporal model leveraged in this paper that copes with multi-channel analysis.
Furthermore, the methodology presented in this paper also accommodates unknown stimuli [12], for which efficient algorithms are proposed and leveraged hereafter to simultaneously retrieve the model that best conforms with the unknown stimuli and to separate the unknown stimuli from the time-series associated with brain activity. Our methods are as computationally efficient and stable as the least-squares and spectral-analysis methods used in spatial and temporal analysis, respectively; thus, they are suitable for online implementation in nonsurgical brain interfaces.

The main contributions of the present paper are to leverage some recently proposed methods to develop new modeling capabilities for EEG-based neuro-wearables that enhance signal quality and decision-making. Furthermore, the parametric description of these models provides new features that are biologically motivated, easier to translate in the context of brain function associated with a signal characterization, and free of signal artifacts. This makes the brain-related activity interpretable, which leads to resilient and functional nonsurgical brain interfaces.
B. Paper Organization
The remainder of the paper is organized as follows. Section II introduces the complex network model considered in this paper and the main problem studied in this manuscript, together with the employed method for feature selection and the classification techniques. In Section III, we present an elaborate study on different datasets taken from the BCI competition [13].

II. RE-THINKING EEG-BASED NON-INVASIVE BRAIN INTERFACES
Brain interfaces aim to address the following problem.
Is it possible to classify a specific cognitive state, e.g., a motor task or its imagination, using measurements collected with a specific sensing technology that harvests information about brain activity?
In the current manuscript, we revisit this problem in the context of brain-computer interfaces (BCI) when dealing with EEG-based noninvasive brain interfaces. Towards this goal, we review the currently adopted procedures for solving this problem (see Figure 1 for an overview) and propose a systems perspective that enhances BCI reliability and resilience. In Section II-A we provide a brief overview of EEG-based technology and its connection with the functions of the brain areas established in past studies. Next, in Section II-B, we introduce the spatiotemporal fractional model under unknown stimuli; this is the core of the proposed approach to retrieve new features for classification and, subsequently, to enhance brain-interface capabilities. In Section II-C, we describe how to determine the system model's parameters, and in Section II-D we describe how to interpret them for the classification task.
A. EEG-based Technology for Brain Interfaces: a brief overview
EEG enables the electrophysiological monitoring of space-averaged synaptic source activity from millions of neurons at the neocortex level. EEG signals have poor spatial resolution but high temporal resolution, since the electrical activity generated at the level of neuronal ensembles arrives at the recording sites within milliseconds. The electrodes (i.e., sensors) are placed over an area of the brain of interest, most commonly the visual, motor, sensory, and pre-frontal cortices. They usually follow standard montages; the International 10-20 system is depicted in Figure 2.

Most of the activity captured by the EEG electrodes is due to the interactions between inhibitory interneurons and excitatory pyramidal cells, which produce rhythmic fluctuations commonly referred to as oscillations. The mechanisms that generate these oscillations are not yet completely understood, but it has already been identified that some 'natural oscillations' provide evidence of activity being 'processed' in certain regions of the brain at certain 'frequencies'. Therefore, the oscillatory behavior of the human brain is often partitioned into bands (covering a wide range of frequencies, with power decaying as 1/f): (i) δ-band (0.5-3 Hz); (ii) θ-band (3.5-7 Hz); (iii) α-band (8-13 Hz); (iv) β-band (14-30 Hz); and (v) γ-band (30-70 Hz). Furthermore, there is evidence that activity in certain bands is associated with sensory registration, perception, movement, and cognitive processes related to attention, learning, and memory [14]–[16]. Notwithstanding, such associations are often made using correlation and/or coherence techniques that only capture relationships between specific channels.
But such methods cannot assess causality between signals, which would enable forecasting of the signal evolution, as captured by the model-based representation we propose hereafter. Changes in the signals across different bands are also used to interpret event-related potentials (ERPs) in the EEG signals, i.e., variations due to specific events; see [4] for a detailed analysis. In the context of the sensory-motor data used in the current manuscript to validate the proposed methodology, sensorimotor rhythms (SMRs) are often considered. These are oscillations recorded over the posterior frontal and anterior parietal areas of the brain, i.e., over the sensorimotor cortices (see Figure 2). SMRs occur mainly in the α-band (for sensors located on top of the motor cortices), and in the β and lower γ bands for sensors on the sensorimotor cortices [17]. Consequently, these have been used as default features for classifying motor-related execution and even the mere imagination of performing a motor task. In contrast, in our approach the spatiotemporal structure is captured directly through state-space modeling, which enables a systems understanding of the dynamics of the underlying process. In addition, it provides a new set of attributes that can be used to improve feature translation, i.e., classification.

Figure 2: Description of the brain functional regions and their corresponding locations with respect to the EEG sensor cap.
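As a concrete point of reference for the band partition above, the band power that quantifies signal energy inside any of these bands can be computed from the one-sided FFT magnitude spectrum. A minimal sketch (the function name and the example sampling rate are illustrative choices, not taken from the paper):

```python
import numpy as np

def band_power(x, fs, band):
    """Energy of signal x in the frequency band (lo, hi) Hz,
    computed from the one-sided FFT magnitude spectrum."""
    spec = np.abs(np.fft.rfft(x))                 # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin frequencies in Hz
    mask = (freqs >= band[0]) & (freqs < band[1])
    return float(np.sum(spec[mask] ** 2))

# Example: a pure 10 Hz tone concentrates its energy in the alpha band.
fs = 100                       # assumed sampling rate (Hz)
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)
alpha = band_power(x, fs, (8, 13))
beta = band_power(x, fs, (14, 30))
```

Such band powers are precisely the kind of per-channel spectral features that the model-based approach below aims to go beyond.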
B. Spatiotemporal Fractional Model with Unknown Stimuli
A multitude of complex systems exhibit long-range (non-local) properties, interactions, and/or dependencies (e.g., power-law decays in memory). Examples of such systems include Hamiltonian systems, where memory results from the stickiness of trajectories in time to islands of regular motion [18]. Alternatively, it has been rigorously confirmed that viscoelastic properties are typical for a wide variety of biological entities such as stem cells, liver, pancreas, heart valve, brain, and muscles [18]–[26], suggesting that the memories of these systems obey power-law distributions. Such dynamical systems can be characterized by the well-established mathematical theory of fractional calculus [27], and the corresponding systems can be described by fractional differential equations [28]–[32]. However, it is only recently that fractional-order systems (FOS) have found a strong position in a wide spectrum of applications across different domains, owing to the availability of computing and data-acquisition methods to evaluate their efficacy in capturing the evolution of the underlying system states.

Subsequently, we consider a linear discrete-time fractional-order dynamical model under unknown stimuli (i.e., inputs), described as

Δ^α x[k+1] = A x[k] + B u[k],
y[k] = C x[k],    (1)

where x ∈ R^n is the state, u ∈ R^p is the unknown input, and y ∈ R^n is the output vector. The differencing operator Δ is the discrete version of the derivative; for example, Δ^1 x[k] = x[k] − x[k−1]. Evidently, the difference of order 1 has only one-step memory, and hence classic linear time-invariant models cannot capture the long-range memory property of several physiological signals, as discussed before.
On the other hand, the expansion of the fractional-order derivative in the discretized setting [33], for any ith state (1 ≤ i ≤ n), can be written as

Δ^{α_i} x_i[k] = Σ_{j=0}^{k} ψ(α_i, j) x_i[k − j],    (2)

where α_i is the fractional order corresponding to the ith state and ψ(α_i, j) = Γ(j − α_i) / (Γ(−α_i) Γ(j + 1)), with Γ(·) denoting the gamma function. We observe from (2) that the fractional-order derivative provides long-range memory by including all the terms x_i[k − j]. A quick comparison between the prediction accuracies of the fractional-order derivative model and a linear time-invariant model is shown in Figure 3: the fractional-order model can cope with sudden changes in the signals while the linear model cannot.

We can also describe the system by its tuple of matrices (α, A, B, C) of appropriate dimensions. The coupling matrix A represents the spatial coupling between the states across time, while the input coupling matrix B determines how the inputs affect the states. In what follows, we assume that the input size is always strictly smaller than the size of the state vector, i.e., p < n.

Having defined the system model, system identification, i.e., the estimation of the model parameters from given data, is an important step. It becomes nontrivial when we have unknown inputs, since one has to be able to differentiate which part of the evolution of the system is due to its
intrinsic dynamics and which part is due to the unknown inputs. Subsequently, the analysis that we need to address is that of system identification from the data, as described next.

Figure 3: Comparison of observed experimental EEG data with the predictions of the fractional-order model and a linear time-invariant model.
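The coefficients ψ(α, j) in (2) satisfy ψ(α, 0) = 1 and ψ(α, j) = ψ(α, j−1)(j − 1 − α)/j, which avoids evaluating gamma functions directly. A minimal sketch of the coefficients and the resulting truncated fractional difference (the truncation length J is a tuning choice, not prescribed by the paper):

```python
import numpy as np

def gl_coeffs(alpha, J):
    """Grunwald-Letnikov coefficients psi(alpha, j) = Gamma(j - alpha) /
    (Gamma(-alpha) Gamma(j + 1)) for j = 0..J-1, via the stable recurrence."""
    c = np.ones(J)
    for j in range(1, J):
        c[j] = c[j - 1] * (j - 1 - alpha) / j
    return c

def frac_diff(x, alpha, J=50):
    """Truncated fractional difference z[k] = sum_{j < min(J, k+1)} psi(alpha, j) x[k-j]."""
    c = gl_coeffs(alpha, J)
    z = np.zeros(len(x))
    for k in range(len(x)):
        jmax = min(J, k + 1)
        z[k] = np.dot(c[:jmax], x[k::-1][:jmax])  # x[k], x[k-1], ...
    return z
```

For α = 1 the coefficients reduce to (1, −1, 0, …), recovering the ordinary first difference Δx[k] = x[k] − x[k−1].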
C. Data-Driven System Identification
The problem consists of estimating α, A, and the inputs {u[k]}_{k=t}^{t+T−1} from the limited observations y[k], k ∈ [t, t+T−1], which, due to the dedicated nature of the sensing mechanism, equal x[k], under the assumption that the input matrix B is known. The realization of B can be application dependent and is computed separately using experimental data, as we explore later in the case study (Section III). For simplicity of notation, let z[k] = Δ^α x[k+1], with k chosen appropriately. The pre-factors in the summation in (2) decay as ψ(α_i, j) ~ O(j^{−α_i−1}); therefore, for computational ease, we limit the summation in (2) to J terms, where J > 0 is sufficiently large. Hence, z_i[k] can be written as

z_i[k] = Σ_{j=0}^{J−1} ψ(α_i, j) x_i[k + 1 − j],    (3)

with the assumption that x[k], u[k] = 0 for k ≤ t − 1. Using the notation introduced above and the model definition in (1), the given observations can be written as

z[k] = A x[k] + B u[k] + e[k],    (4)

where e ~ N(0, Σ) is Gaussian noise assumed independent across space and time. For simplicity, we assume Σ = σ^2 I. Also, denote the system matrices by A = [a_1, a_2, ..., a_n]^T and B = [b_1, b_2, ..., b_n]^T, and the vertically concatenated states and inputs during an arbitrary window of time by X_{[t−1, t+T−1]} = [x[t−1], x[t], ..., x[t+T−1]]^T and U_{[t−1, t+T−1]} = [u[t−1], u[t], ..., u[t+T−1]]^T, respectively; for any ith state, Z_{i,[t−1, t+T−1]} = [z_i[t−1], z_i[t], ..., z_i[t+T−1]]^T.
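The regression form (4) already suggests an alternating scheme: estimate sparse inputs given A, then refit A by least squares, which is the spirit of the EM-style procedure described below. A minimal numpy sketch, under the simplifying (hypothetical) assumption that B has orthonormal columns, so that the l1-regularized input step has a closed-form soft-threshold solution:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alternating_fit(X, Z, B, lam, n_iter=5):
    """Alternate sparse input estimation (E-step-like) with least-squares
    refitting of A (M-step-like) for z[k] = A x[k] + B u[k] + e[k].
    X, Z: T x n arrays stacking x[k] and z[k] row-wise;
    B: n x p input matrix, assumed here to have orthonormal columns."""
    A = np.linalg.lstsq(X, Z, rcond=None)[0].T     # initial least-squares A
    U = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(n_iter):
        R = Z - X @ A.T                            # residual after A x[k]
        U = soft_threshold(R @ B, lam)             # sparse input estimate
        A = np.linalg.lstsq(X, Z - U @ B.T, rcond=None)[0].T
    return A, U
```

With noiseless data and no true inputs, the initial least-squares fit is already exact and the input estimate stays at zero, which is the sanity check one would expect of such a scheme.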
For the sake of brevity, we drop the time-horizon subscript from the above matrices when it is clear from the context. Since the problem of jointly estimating the different parameters is highly nonlinear, we proceed as follows: (i) we estimate the fractional order α using the wavelet technique described in [34]; and (ii) with α known, z in equation (3) is computed under the additional assumption that the input matrix B is known. Therefore, the problem reduces to estimating A and the inputs {u[k]}_{k=t}^{t+T−1}. Towards this goal, we exploit an algorithm from [12] similar to expectation-maximization (EM) [35]. Briefly, the EM algorithm performs maximum-likelihood estimation (MLE) of parameters subject to hidden variables. Intuitively, in our case, Algorithm 1 estimates A in the presence of the hidden variables, or unknown unknowns, {u[k]}_{k=t}^{t+T−1}. The 'E-step' averages out the effects of the unknown unknowns and obtains an estimate of u, where, due to the diversity of possible solutions, we control the sparsity of the inputs using the parameter λ. Subsequently, the 'M-step' performs MLE to obtain an estimate of A.

It was shown theoretically in [12] that the algorithm is convergent in the likelihood sense. It should also be noted that the EM algorithm can converge to saddle points, as exemplified in [35]. Being iterative, Algorithm 1 depends crucially on the initial condition for convergence. We will see in Section III that the convergence is very fast, making it suitable for online estimation of the parameters.

Algorithm 1: EM algorithm
Input: x[k], k ∈ [t, t+T−1], and B
Output: A and {u[k]}_{k=t}^{t+T−1}
Initialize: compute α using [34] and then z[k]; for l = 0, initialize A^(l) by a_i^(l) = arg min_a ||Z_i − X a||_2.
repeat
  (i) 'E-step': for k ∈ [t, t+T−1], obtain u[k] = arg min_u ||z[k] − A^(l) x[k] − B u||_2^2 + λ' ||u||_1, where λ' = σ^2 λ;
  (ii) 'M-step': obtain A^(l+1) = [a_1^(l+1), a_2^(l+1), ..., a_n^(l+1)]^T, where a_i^(l+1) = arg min_a ||Z̃_i − X a||_2 and Z̃_i = Z_i − U b_i;
  l ← l + 1;
until convergence

D. Feature Translation (Classification)
The unprocessed EEG signals coming from the sensors, although carrying vital information, may not be directly useful for making predictions. However, by representing the signals in terms of the parametric model (α, A) and the unknown signals, as we did in the last section, we can gain better insight: the parameters, being representative of the original signal itself, can be used to make a concise differentiation.

The A matrix represents how strong a particular signal is and how much it affects, or is affected by, the other signals considered together. While performing or imagining particular motor tasks, certain regions of the brain become more activated than others; simultaneously, the inter-region activity also changes. Therefore, the columns of A, which contain the coefficients of the strength with which a signal affects the other signals, can be used as features for the classification of motor tasks. In this work, we consider machine-learning classification techniques such as logistic regression and support vector machines (SVMs) [36]. The choice of kernels varies from simple 'linear' to the radial basis function (RBF), i.e., k(x_i, x_j) = e^{−γ ||x_i − x_j||^2}. The values of the parameters of the classifier, and possibly of the kernels, are determined using cross-validation, with both γ and C = 1/λ swept on a logarithmic scale, where λ is the regularization parameter appearing in the optimization cost of the classifiers [36].

III. CASE STUDY
We will now illustrate the usefulness of the fractional-order dynamical model with unknown inputs in the context of classification for BCI. We consider two datasets from the BCI competition [13], selected for their larger numbers of EEG channels and of training trials. The available data is split into training and testing sets.

Figure 4: A description of the sensor distribution for the measurement of EEG. The channel labels for the selected sensors are shown for dataset-I.
A. Dataset-I
We consider for validation the dataset labeled 'dataset IVa' from BCI Competition III [37]. The recording was made using BrainAmp amplifiers and an electrode cap from which 118 channels were used. The signals were band-pass filtered and then digitized; for the purpose of this study, we have used the downsampled version of the recordings. The dataset for subject ID 'al' is considered. The subject was provided a visual cue and immediately afterwards asked to imagine one of two motor tasks: (R) right hand, and (F) right foot.
1) Sensor Selection and Modeling:
To avoid the curse of dimensionality, instead of considering all 118 available sensors, which would imply using 118 × 118 dynamics entries for classification, only a small subset of sensors is considered. Specifically, only the sensors indicated in Figure 4 are selected, on the basis that only hand and feet movements need to be predicted, so that a much smaller dynamics matrix and set of fractional-order coefficients suffice for modeling the fractional-order system. Moreover, these sensors are close to the region of the brain known to be associated with motor actions.

Figure 5: Mean squared error of Algorithm 1 as a function of the number of iterations, for dataset-I.
2) System Identification and Validation:
The model parameters (α, A) and the unknown inputs are estimated using Algorithm 1. As mentioned before, the performance of an iterative algorithm depends on the choice of the initial conditions. For the current case, we have observed that the algorithm converges very quickly; even a single iteration is enough. The convergence of the mean squared error in the M-step of Algorithm 1, for one sample from dataset-I, is shown in Figure 5, which indicates that the chosen initial conditions are fairly good. The one-step and five-step predictions of the estimated model are shown in Figure 6. The predicted values for one step follow the actual values very closely; there are some differences between the actual and predicted values for the five-step prediction.

Figure 6: Comparison of the predicted EEG state for channel T7 using the fractional-order dynamical model with unknown inputs. The one-step and five-step predictions are shown in (a) and (b), respectively.
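As an aside on how such one-step and multi-step forecasts are produced: rearranging Δ^α x[k+1] = A x[k] (with the inputs set to zero for forecasting) gives x[k+1] = A x[k] − Σ_{j≥1} ψ(α, j) x[k+1−j]. A minimal sketch under that zero-input assumption (the truncation length J is a tuning choice):

```python
import numpy as np

def predict_ahead(x_hist, A, alpha, steps, J=50):
    """Iterated prediction with the fractional model, inputs set to zero:
    x[k+1] = A x[k] - sum_{j=1..J-1} psi(alpha_i, j) x_i[k+1-j].
    x_hist: T x n array of past states (most recent last); alpha: n orders."""
    n = x_hist.shape[1]
    C = np.ones((n, J))                  # psi(alpha_i, j), one row per state
    for j in range(1, J):
        C[:, j] = C[:, j - 1] * (j - 1 - np.asarray(alpha)) / j
    hist = [row for row in x_hist]
    for _ in range(steps):
        x_next = A @ hist[-1]
        for j in range(1, min(J, len(hist) + 1)):
            x_next = x_next - C[:, j] * hist[len(hist) - j]   # x[k+1-j]
        hist.append(x_next)
    return np.array(hist[x_hist.shape[0]:])
```

For α = 1 this collapses to x[k+1] = (A + I) x[k], the familiar linear one-step recursion, which is a convenient sanity check.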
3) Discussion of the results:
The most popular features used in motor-imagery-based BCI classification exploit spectral information: band-power features that quantify the energy of the EEG signals in certain spectral bands [38]–[40]. The motor cortex of the brain is known to affect the energy in the α and β bands, as discussed in Section II. However, unwanted signal energy is captured in these bands as well while performing the experiments, for example from neck movement and other muscle activity. Filtering these so-called 'unwanted' components from the original signal is a challenging task for spectral techniques, since the components often share the same bands.

We used a different approach to deal with these unknown unknowns in Section II. The magnitude spectrum of the original EEG signal, and of the signal after removing the estimated unknown inputs, is shown in Figure 7. It should be observed that both the original signal and the signal with the unknown inputs removed have significant energy in the α and β bands. The unknown inputs behave like white noise, which is evident from their Gaussian probability density function (PDF) shown in Figure 8. The inputs are not zero mean; their PDF is centered around a mean value of approximately 58. The model parameters (α, A) are jointly estimated with the unknown inputs using Algorithm 1; therefore the effect

Figure 7: Magnitude spectrum of the signal recorded by channel T7 with and without unknown inputs.
Figure 8: Probability density function of the unknown inputs estimated from the signal recorded by channel T7.

of the inputs is inherently taken care of in the parameters. The structure of the matrix A for the two different labels is shown in Figure 9. We have used two of the central (C-row) sensors, whose indices are marked in Figure 9. It is apparent from Figure 9 that the columns corresponding to these sensors have different activity across labels, and hence they are fair candidates for the features to be used in classification. The total number of features is therefore twice the number of selected sensors.

B. Dataset-II

EEG data from BCI Competition III, labeled 'dataset IVb', is taken [37]. The data acquisition technique and sampling frequencies are the same as for the dataset of the previous subsection. The total number of labeled trials is relatively small. The subjects, upon being provided visual cues, were asked to imagine two motor tasks, namely (L) left hand and (F) right foot.

Figure 9: Estimated A matrix for dataset-I, with the marked columns corresponding to the sensor indices used for classification.
1) Sensor Selection and Modeling:
Due to the small number of training examples, we again select a subset of sensors for the model estimation, as we did for dataset-I in the previous section. Since the motor tasks were left hand and right foot, we selected sensors in the right half of the brain, close to the regions known to be associated with hand and feet movements, as shown in Figure 11. We will see in the final part of this section that selecting sensors in this way helps not only to reduce the number of features but also to obtain better and more meaningful results.

Figure 10: Mean squared error of Algorithm 1 as a function of the number of iterations, for dataset-II.
2) System Identification and Validation:
After estimating the model (α, A) and the unknown inputs using the subset of sensors, we observe performance on dataset-II similar to that on dataset-I. The convergence of the mean squared error in the M-step of Algorithm 1, for one sample from dataset-II, is shown in Figure 10. The one-step and five-step predictions are shown in Figure 12; the model predictions closely follow the original signal.
3) Discussion of the Results:
The spectra of the original EEG signal at channel CFC and of its version with the unknown inputs removed are shown in Figure 13.

Figure 11: A description of the sensor distribution for the measurement of EEG. The channel labels for the selected sensors are shown for dataset-II.

Figure 12: Comparison of the predicted EEG state for the channel CFC using the fractional-order dynamical model with unknown inputs. The one-step and five-step predictions are shown in (a) and (b), respectively.

The spectrum shows peaks in the α and β bands. As for dataset-I, both signals share the same band, which makes it difficult to remove the effects of the unwanted inputs by frequency-domain filtering alone. The unknown inputs resemble white noise, and their PDF is close to a Gaussian distribution with mean centered around 48.

Figure 13: Magnitude spectrum of the signal recorded by channel CFC with and without unknown inputs.

Figure 14: Probability density function of the unknown inputs estimated from the signal recorded by channel CFC.

The A matrix estimated by Algorithm 1 is shown in Figure 15 for the two labels. Out of all sensors, two of the CCP sensors, whose column indices are marked in Figure 15, show strikingly different activity. The columns corresponding to these two sensors are therefore a good choice of features for classification. Next, we discuss the classification accuracy for both datasets.

Figure 15: Estimated A matrix for dataset-II, with the columns corresponding to the selected sensor indices marked for classification.

Figure 16: Testing and training accuracies for various classifiers arranged in order of classification model complexity from left to right. The estimated accuracies for dataset-I and dataset-II are shown in (a) and (b), respectively.
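To make the feature-construction and classification pipeline concrete, the sketch below builds one feature vector per trial from the marked columns of an estimated coupling matrix A and compares classifiers of increasing complexity, assuming scikit-learn is available. The per-trial A matrices are synthetic stand-ins and the function names are our own; a kernelized logistic regression (the 'lR RBF' variant) would additionally need an explicit RBF feature map (e.g., scikit-learn's RBFSampler) and is omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def features_from_A(A, sensor_idx):
    """Stack the columns of the estimated coupling matrix A corresponding
    to the selected sensors into one feature vector per trial."""
    return A[:, sensor_idx].ravel(order="F")

# Synthetic stand-ins for per-trial estimates of a 6x6 coupling matrix,
# with the two marked columns shifted according to the class label.
rng = np.random.default_rng(0)
trials, labels = [], []
for _ in range(60):
    y = rng.integers(0, 2)
    A = 0.1 * rng.standard_normal((6, 6))
    A[:, [2, 4]] += (1.0 if y == 1 else -1.0)   # class-dependent columns
    trials.append(A)
    labels.append(y)

X = np.vstack([features_from_A(A, [2, 4]) for A in trials])  # (60, 12)
y = np.array(labels)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Classifiers arranged by model complexity, as in Figure 16.
models = {
    "lR linear": LogisticRegression(),
    "SVM linear": SVC(kernel="linear"),
    "SVM RBF": SVC(kernel="rbf"),
}
scores = {}
for name, clf in models.items():
    clf.fit(Xtr, ytr)
    scores[name] = (clf.score(Xtr, ytr), clf.score(Xte, yte))
```

Comparing the train/test entries of `scores` across the three models reproduces the complexity-versus-overfitting trade-off discussed next.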
C. Classification Performance
Finally, the performance of the classifiers using the features described above is shown in Figure 16 for both datasets. The classifiers are arranged in order of complexity from left to right, with logistic regression (lR) with a linear kernel being the simplest and SVM with an RBF kernel the most complex. The performance plot parallels the classic machine-learning divergence curve for both datasets: accuracy on the training data increases with classification model complexity, while accuracy on the testing data decreases. This is intuitive: a more complex classification model fits the training data more closely, but its test performance degrades due to overfitting. With very few training examples available to build the classifier, such a trend is expected. The classifiers perform well on both datasets, which reflects the strength of the estimated features; the exact accuracies depend on the cross-validation splits and on factors such as the choice of classifier, which can be tuned further.

For both datasets, the proposed methodology efficiently extracts features that serve as good candidates for differentiating the imagined motor movements. By implicitly removing the effects of the unwanted stimuli, the coefficients of the coupling matrix A are shown to be sufficient for discriminating between the EEG signals indicative of the different motor movements. The high testing accuracies indicate the good quality of the extracted features.

IV. CONCLUSION
We have revisited EEG-based noninvasive brain-interface feature extraction and translation through a cyber-physical systems' lens. Specifically, we leveraged spatiotemporal fractional-order models that cope with unknown inputs. The fractional-order models capture the dynamic coupling changes that rule the EEG data collected from the different EEG sensors, while the fractional-order exponents capture the long-term memory of the process. Subsequently, the unknown stimuli are determined as the external input that least conforms with the fractional-order model. By doing so, we have filtered out from the EEG signals the unknown inputs, which may originate in the deeper brain structures. The presence of unknown stimuli is possibly the result of the structural connectivity of the brain that crisscrosses different regions, or of artifacts originating in the muscles (e.g., eye blinking or head movement). As a consequence, the filtered signal does not need to annihilate an entire band in the frequency domain, thus keeping information about frequency regions of the signal that would otherwise be lost.

We have shown how the different features obtained from the proposed model can be used towards rethinking EEG-based noninvasive interfaces. In particular, two datasets used in BCI competitions were used to validate the performance of the methodology introduced in this manuscript, which is compatible with some of the state-of-the-art performances while requiring a relatively small number of training points. We believe that the proposed methodology can be used within the context of different neurophysiological processes and corresponding sensing technologies. Future research will focus on leveraging additional information from the retrieved unknown inputs to anticipate specific artifacts and enable the deployment of neuro-wearables in real-life scenarios.
Furthermore, the presented methodology can be used as an exploratory tool by neuroscientists and physicians: by testing input and output responses and tracking their impact on the unknown inputs retrieved by the proposed algorithm, one will be able to systematically identify the origin and dynamics of stimuli across space and time. Finally, it would be interesting to explore the proposed approach in a closed-loop context, where the present models would benefit from control-like strategies to enhance the brain towards certain tasks or to attenuate side effects of certain neurodegenerative diseases or disorders.

ACKNOWLEDGMENT

The authors are thankful to the reviewers for their valuable feedback. G.G. and P.B. gratefully acknowledge the support of the U.S. Army Defense Advanced Research Projects Agency (DARPA) under grant no. W911NF-17-1-0076, the DARPA Young Faculty Award under grant no. N66001-17-1-4044, and the National Science Foundation under CAREER Award CPS-1453860.
REFERENCES

[1] E. Strickland, "5 Neuroscience Experts Weigh in on Elon Musk's Mysterious 'Neural Lace' Company," IEEE Spectrum, Apr 2017.
[2] ——, "Facebook Announces 'Typing-by-Brain' Project," IEEE Spectrum, Apr 2017. [Online]. Available: https://spectrum.ieee.org/the-human-os/biomedical/bionics/facebook-announces-typing-by-brain-project
[3] ——, "Silicon Valley's Latest Craze: Brain Tech," IEEE Spectrum, Jun 2017. [Online]. Available: https://spectrum.ieee.org/biomedical/devices/silicon-valleys-latest-craze-brain-tech
[4] J. Wolpaw and E. W. Wolpaw, Brain-Computer Interfaces: Principles and Practice. OUP USA, 2012.
[5] J. Mellinger, G. Schalk, C. Braun, H. Preissl, W. Rosenstiel, N. Birbaumer, and A. Kübler, "An MEG-based brain–computer interface (BCI)," Neuroimage, vol. 36, no. 3, pp. 581–593, 2007.
[6] R. Sitaram, A. Caria, R. Veit, T. Gaber, G. Rota, A. Kuebler, and N. Birbaumer, "fMRI brain-computer interface: a tool for neuroscientific research and treatment," Computational Intelligence and Neuroscience, 2007.
[7] N. Weiskopf, K. Mathiak, S. W. Bock, F. Scharnowski, R. Veit, W. Grodd, R. Goebel, and N. Birbaumer, "Principles of a brain-computer interface (BCI) based on real-time functional magnetic resonance imaging (fMRI)," IEEE Transactions on Biomedical Engineering, vol. 51, no. 6, pp. 966–970, 2004.
[8] S. M. Coyle, T. E. Ward, and C. M. Markham, "Brain–computer interface using a simplified functional near-infrared spectroscopy system," Journal of Neural Engineering, vol. 4, no. 3, p. 219, 2007.
[9] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain–computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[10] F. Lotte, "A tutorial on EEG signal-processing techniques for mental-state recognition in brain–computer interfaces," in Guide to Brain-Computer Music Interfacing. Springer, 2014, pp. 133–161.
[11] N. Brodu, F. Lotte, and A. Lécuyer, "Exploring two novel features for EEG-based brain–computer interfaces: Multifractal cumulants and predictive complexity," Neurocomputing, vol. 79, pp. 87–94, 2012.
[12] G. Gupta, S. Pequito, and P. Bogdan, "Dealing with unknown unknowns: Identification and selection of minimal sensing for fractional dynamics with unknown inputs," to appear in American Control Conference, 2018.
[13] B. Blankertz, K. R. Müller, D. J. Krusienski, G. Schalk, J. R. Wolpaw, A. Schlögl, G. Pfurtscheller, J. R. Millán, M. Schröder, and N. Birbaumer, "The BCI competition III: validating alternative approaches to actual BCI problems," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 14, no. 2, pp. 153–159, June 2006.
[14] E. Başar, C. Başar-Eroglu, S. Karakaş, and M. Schürmann, "Gamma, alpha, delta, and theta oscillations govern cognitive processes," International Journal of Psychophysiology, vol. 39, no. 2, pp. 241–248, 2001.
[15] A. K. Engel, P. Fries, and W. Singer, "Dynamic predictions: oscillations and synchrony in top-down processing," Nature Reviews Neuroscience, vol. 2, no. 10, p. 704, 2001.
[16] G. Buzsáki, Rhythms of the Brain. Oxford University Press, 2006.
[17] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proceedings of the IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[18] T. McMillen, T. Williams, and P. Holmes, "Nonlinear muscles, passive viscoelasticity and body taper conspire to create neuromechanical phase lags in anguilliform swimmers," PLoS Computational Biology, vol. 4, no. 8, p. e1000157, 2008.
[19] Y. Kobayashi, H. Watanabe, T. Hoshi, K. Kawamura, and M. G. Fujie, "Viscoelastic and nonlinear liver modeling for needle insertion simulation," in Soft Tissue Biomechanical Modeling for Computer Assisted Surgery. Springer, 2012, pp. 41–67.
[20] C. Wex, M. Fröhlich, K. Brandstädter, C. Bruns, and A. Stoll, "Experimental analysis of the mechanical behavior of the viscoelastic porcine pancreas and preliminary case study on the human pancreas," Journal of the Mechanical Behavior of Biomedical Materials, vol. 41, pp. 199–207, 2015.
[21] K. Wang, R. McCarter, J. Wright, J. Beverly, and R. Ramirez-Mitchell, "Viscoelasticity of the sarcomere matrix of skeletal muscles. The titin-myosin composite filament is a dual-stage molecular spring," Biophysical Journal, vol. 64, no. 4, pp. 1161–1177, 1993.
[22] T. M. Best, J. McElhaney, W. E. Garrett, and B. S. Myers, "Characterization of the passive responses of live skeletal muscle using the quasi-linear theory of viscoelasticity," Journal of Biomechanics, vol. 27, no. 4, pp. 413–419, 1994.
[23] T. C. Doehring, A. D. Freed, E. O. Carew, and I. Vesely, "Fractional order viscoelasticity of the aortic valve cusp: an alternative to quasilinear viscoelasticity," Journal of Biomechanical Engineering, vol. 127, no. 4, pp. 700–708, 2005.
[24] E. Macé, I. Cohen, G. Montaldo, R. Miles, M. Fink, and M. Tanter, "In vivo mapping of brain elasticity in small animals using shear wave imaging," IEEE Transactions on Medical Imaging, vol. 30, no. 3, pp. 550–558, 2011.
[25] S. Nicolle, L. Noguer, and J.-F. Palierne, "Shear mechanical properties of the spleen: experiment and analytical modelling," Journal of the Mechanical Behavior of Biomedical Materials, vol. 9, pp. 130–136, 2012.
[26] N. Grahovac and M. Žigić, "Modelling of the hamstring muscle group by use of fractional derivatives," Computers & Mathematics with Applications, vol. 59, no. 5, pp. 1695–1700, 2010.
[27] V. E. Tarasov, Fractional Dynamics: Applications of Fractional Calculus to Dynamics of Particles, Fields and Media. Springer Science & Business Media, 2011.
[28] P. Bogdan, B. M. Deasy, B. Gharaibeh, T. Roehrs, and R. Marculescu, "Heterogeneous structure of stem cells dynamics: statistical models and quantitative predictions," Scientific Reports, vol. 4, 2014.
[29] M. Ghorbani and P. Bogdan, "A cyber-physical system approach to artificial pancreas design," in Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. IEEE Press, 2013, p. 17.
[30] Y. Xue, S. Rodriguez, and P. Bogdan, "A spatio-temporal fractal model for a CPS approach to brain-machine-body interfaces," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016. IEEE, 2016, pp. 642–647.
[31] Y. Xue, S. Pequito, J. R. Coelho, P. Bogdan, and G. J. Pappas, "Minimum number of sensors to ensure observability of physiological systems: A case study," in Communication, Control, and Computing (Allerton), 2016 54th Annual Allerton Conference on. IEEE, 2016, pp. 1181–1188.
[32] Y. Xue and P. Bogdan, "Constructing compact causal mathematical models for complex dynamics," in Proceedings of the 8th International Conference on Cyber-Physical Systems. ACM, 2017, pp. 97–107.
[33] A. Dzielinski and D. Sierociuk, "Adaptive feedback control of fractional order discrete state-space systems," in CIMCA-IAWTIC, 2005.
[34] P. Flandrin, "Wavelet analysis and synthesis of fractional Brownian motion," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 910–917, March 1992.
[35] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions. John Wiley & Sons, New York, 1996.
[36] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[37] G. Dornhege, B. Blankertz, G. Curio, and K.-R. Müller, "Boosting bit rates in non-invasive EEG single-trial classifications by feature combination and multi-class paradigms," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 993–1002, Jun 2004.
[38] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, "A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals," Journal of Neural Engineering, vol. 4, no. 2, p. R32, 2007.
[39] N. Brodu, F. Lotte, and A. Lécuyer, "Comparative study of band-power extraction techniques for motor imagery classification," in , April 2011, pp. 1–6.
[40] P. Herman, G. Prasad, T. M. McGinnity, and D. Coyle, "Comparative analysis of spectral approaches to feature extraction for EEG-based motor imagery classification,"