Bi-Smoothed Functional Independent Component Analysis for EEG Artifact Removal
arXiv [stat.ME], Jan. Vidal, Rosso and Aguilera, Preprint, arXiv.org
P-spline smoothed functional ICA of EEG data
Marc Vidal, Mattia Rosso, Ana M. Aguilera. *For correspondence: [email protected] (M.V.); [email protected] (A.M.A.). Present addresses: † Department of Statistics and O.R., University of Granada, Granada 18071, Spain; ‡ Department of Musicology, IPEM, Ghent University, Gent 9000, Belgium.
Abstract
We propose a novel functional data framework for artifact extraction and removal to estimate brain electrical activity sources from EEG signals. Our methodology is derived on the basis of event-related potential (ERP) analysis, and motivated by mapping adverse artifactual events caused by body movements and physiological activity originating outside the brain. A functional independent component analysis (FICA) based on the use of fourth moments is conducted on the principal component expansion in terms of B-spline basis functions. We extend this model setup by introducing a discrete roughness penalty in the orthonormality constraint of the functional principal component decomposition, which is later used to compute estimates of FICA. Compared to other ICA algorithms, our method combines a regularization mechanism stemming from the principal eigendirections with a discrete penalization given by the d-order difference operator. In this regard, it naturally controls high-frequency remnants of neural origin that overlap latent artifactual eigenfunctions, and thus preserves this persistent activity at the artifact-extraction level. Furthermore, we introduce a new cross-validation method for the selection of the penalization parameter which uses shrinkage to assess the performance of the estimators for functional representations with larger basis dimension and excess of roughness. This method is used in combination with a kurtosis measure in order to provide the optimal number of independent components. The FICA model is illustrated at functional and longitudinal dimensions by an example on real EEG data where a subject willingly performs arm gestures and stereotyped physiological artifacts. Our method can be relevant in neurocognitive research and related fields, particularly in situations where movement can bias the estimation of brain potentials.
Keywords: Electroencephalography; Fourth order blind identification; Functional data analysis; Functional independent component analysis; Karhunen-Loève expansion; Source separation
The human body is a complex self-regulatory system. As a result of physiological activity, some particular organs and individual cells undergo changes in electrical potentials, generating a bioelectrical signal (Singh et al., 2012) which can be recorded, monitored, or processed in real time for biomedical applications. Such signals are continuous in nature, but are sampled in a discrete set of observations with a temporal resolution that directly depends on the sampling rate of the recording device. The higher the sampling frequency, the higher the number of observations per time unit, and the better the approximation to the local shape of the data.

In the field of neurophysiology, electroencephalography (EEG) represents one of the few techniques providing a direct measure of bioelectrical brain activity, as oscillations in the excitability of populations of cortical pyramidal cells (Wang, 2010) contribute to variations in the electrical potentials over the scalp. Oscillations are characterized by dominant intrinsic rhythms conventionally grouped into frequency bands, which are by now validated as markers of several neurocognitive phenomena (Buzsáki, 2006). However, despite the temporal resolution achievable with its high sampling rate, EEG is a technique that suffers from a low signal-to-noise ratio. This is mainly due to the fact that the layers of tissue dividing the electrodes from the cortex act as a natural filter, attenuating genuine brain activity and mixing the sources: due to volume conduction, the activity originating from each single dipole is picked up by electrodes at all scalp locations, fading as a function of distance from the origin (Nunez and Srinivasan, 2006).
Furthermore, the dominant brain-related spectral features often overlap with artifactual activity in higher frequency bands (Castellanos and Makarov, 2006; Muthukumaraswamy, 2013), and particularly at lower frequencies most of the variance in the signal is explained by physiological sources outside the brain. For these reasons, analyzing EEG signals can ultimately be viewed as solving a source-separation problem with the goal of estimating brain potentials of interest.

Blind source separation (BSS) techniques such as independent component analysis (ICA) are commonly used to address artifact detection and correction of EEG signals. The term ICA encompasses a broad scope of algorithms and theoretical rudiments aligned with the assumption of independence of the underlying sources. ICA may be generally defined as an unsupervised statistical tool used to isolate mutually independent components from a sample assumed to be generated by a mixture of unknown marginal distributions. From the statistical perspective, it can be regarded as an extension of principal component analysis (PCA) that goes beyond the variance patterns of the data by introducing higher-order statistical measures such as kurtosis or negentropy. Since the ICA problem was framed in Herault and Jutten (1986), the technique has taken the form of sophisticated algorithms with varying approaches. In fact, the study of artifactual activity from observed multivariate EEG signals can be performed with a wide collection of BSS algorithms. Among the most popular, we mention fastICA (Hyvärinen and Oja, 1997), Infomax (Bell and Sejnowski, 1995) and the joint approximate diagonalisation of eigenmatrices (JADE; Cardoso and Souloumiac, 1993). In this paper, however, we will tackle the separation of source artefacts using fourth-order blind identification (FOBI; Cardoso, 1989), which is a classical solution based on the decomposition of the kurtosis matrix, or the weighted covariance.
Our choice is motivated by recent extensions of this method to the functional data paradigm. Indeed, the FOBI estimator has interesting properties (see Nordhausen and Virta, 2019, for a review) and is easy to compute, which makes it a suitable method for exploring and generalizing the ICA framework in functional spaces. We will use this kind of data to prove the validity of the method in extracting physiological artifacts.

Functional data analysis (FDA) is a branch of modern statistics with active research in methodological developments for sampling units in the form of functions instead of vectors of measurements as in multivariate analysis. In practice, however, curves are observed in a finite set of sampling points, and thereby the first step in FDA is to convert these values into a functional structure that mimics the original (continuous) nature of the sample paths. As functional data are inherently infinite-dimensional, generalizations of multivariate objects such as inverses become an important issue, which complicates the implementation of ICA in functional spaces. In this paper, we consider the projection of the sample curves onto a finite-dimensional space generated by a suitable basis of functions in order to solve the ICA problem in functional terms.

The use of functional data in brain imaging analysis has been gaining notoriety in recent years despite the problems arising in its application. These include a considerable computational cost due to the large volume of data and the high multicollinearity of electrode signals in regionally dense spatial sets. Moreover, the assumption of normality of the error term in the functional model (see (1)) should not be taken pragmatically, as it can be biased by sources of noise originating from non-neural agents (Tian, 2010). On the other hand, data acquired from EEG can elicit a wide variety of FDA methods, going from the estimation of smooth sample curves to more advanced reduction and prediction techniques.
An application of FDA to EEG data was proposed in Pokora et al. (2018) to study auditory evoked potentials (AEPs) using cross-distance measures on functional responses and their derivatives approximated by cubic smoothing B-splines. Scheffler et al. (2018) introduced a novel hybrid principal component analysis (HPCA) for high-dimensional data consisting of a PCA in the frequency domain along regional dimensions (electrode group) combined with a functional PCA in the longitudinal and functional dimensions, assuming, in the decomposition, the notion of weak separability among marginal covariances. This procedure is beneficial in the sense that it avoids collapsing sparse data in any of the considered dimensions. Similarly, Hasenstab et al. (2017) proposed a multidimensional functional PCA that preserves the singularity of the event-related potentials (ERPs) over the longitudinal domain from its moving average. EEG data were also used in the context of functional linear regression to test a supervised version of functional PCA (Nie et al., 2018). As noted, most of the FDA methods used in EEG brain studies focus on modelling data free of artifactual sources. However, the efficiency of FDA as a signal pre-processing tool at stages where data are contaminated by physiological artifacts has, to the best of our knowledge, not yet been tested.

During the past decade, neurocognitive research started to move towards ecological, mobile and interactive experimental scenarios (Gramann, 2014). In this context, full-body movements such as gait, trunk sway and arm gestures are likely to exacerbate the artefacts most commonly encountered in neurocognitive research (e.g., blinks, ocular movements, temporo-mandibular muscular activity, sweat, etc.), and bring into the scenario high-amplitude mechanical artefacts due to cable movements.
Given the need to overcome these issues, we propose a framework which accounts for both the most stereotyped artifacts and the ones strictly related to body movement. In order to demonstrate its validity, we reproduced a typical experimental scenario where a human participant had to perform full-arm movements synchronised to a periodic auditory stimulus during an EEG recording. Arguably, what we provide here is a paradigmatic example wherein the researcher needs to clean the signal from motion-related artefacts while preserving activity genuinely related to perceptual and motor brain processes.

Nevertheless, the main contribution of this work does not end with an ordinary estimation and removal of these artifacts; we propose a functional-valued denoising tool based on B-spline expansions that takes advantage of FDA smoothing techniques to preserve high-frequency remnants of neural origin overlapping latent artifactual sources. An earlier work that might resemble our method applies the discrete wavelet transform (DWT) directly to the data, using different levels of decomposition to smooth the approximation coefficients in order to obtain ocular artifacts free of neural data (Kelly et al., 2011). Further approaches combine the wavelet decomposition and ICA to denoise the captured artifactual components (Akhtar et al., 2012; Mahajan and Morshed, 2015). In spite of the obvious differences between the two kinds of data, our ICA model provides a component estimation based on a P-spline smoothed approach (Eilers and Marx, 1996; see also Xiao, 2019), which has a lower computational cost and is mathematically simpler. Moreover, in the proposed method, smoothing is the primary property that prompts the FICA decomposition of the EEG signal rather than an auxiliary strategy used to correct the estimates.
However, what differentiates our method from others is that the decomposition is naturally regulated by the principal component eigendirections and optimized by means of penalized estimators, whereas in wavelet decomposition this is decided on the basis of frequency-band features of the data or the components, as the case may be. The end user will finally appreciate how artifact extraction can be fine-tuned by regulating a single smoothing parameter, making it intuitive to improve the outcomes by means of a visual inspection of the independent component scores.

The paper is organized as follows. In Section 2 we start by defining the FICA framework from an infinite-dimensional space perspective to further develop the theoretical foundations of our model. Section 3 introduces the novel regularized FICA decomposition from the basis expansion representations of functional data. An innovative method for the selection of the model parameters based on the normalized kurtosis of the independent component vectors is also proposed. To prove the performance of our methods, in Section 4 we use real EEG data in single and group-level ERP designs. We also provide guidelines and the procedure for artifact detection and removal to obtain clean EEG functional data. Finally, some concluding remarks and prospective research directions are presented in Section 6.
The extension of ICA to functional data has not yet received the attention nor the prolific developments of its predecessor, functional principal component analysis (FPCA). A first attempt to implement independent techniques for functional data was proposed in Peña et al. (2014), where the kurtosis (FOBI) operator is defined as an estimator over an approximation to a separable infinite-dimensional Hilbert space. In this space setting, the independent component weight functions are expected to be rougher, as the space does not contain functions that are pointwise convergent. Their approach focuses on the classification properties of the kurtosis operator, whose decomposition is assumed to have a form similar to the Fisher discriminant function.

A version of functional ICA based on the FOBI method that can be regarded as an extension of its multivariate counterpart was first developed in Li et al. (2015). The distinctive aspect that characterizes the model is that the ICA procedure starts from the Karhunen-Loève (K-L) expansion (Ash and Gardner, 1975, p. 37), which is a less rough version of the original sample space since it stems from its optimality in the least-squares error sense. We extend this model, which throughout the paper is referred to as functional ICA in terms of principal components, or KL-FICA, endowing the space with a different geometrical structure given by a Sobolev inner product to control the roughness of the latent functions. In a sense, both approaches can be considered a refinement of Peña et al. (2014). More recently, Virta et al. (2020) proposed a version of the KL-FICA model for multivariate functional data using FOBI and JADE estimators.
Basic model setup
Let $x_i = (x_{i1}, \ldots, x_{im_i})^\top$ be a signal of $n$ ($i = 1, \ldots, n$) components digitized at the sampling points $t_{ij}$ ($j = 1, \ldots, m_i$). Consider that the sample data are observed with error, so that they can be modeled as
$$x_{ij} = x_i(t_{ij}) + \varepsilon_{ij}, \qquad (1)$$
where $x_i$ is the $i$-th functional trajectory of the signal and the $\varepsilon_{ij}$ are mutually independent measurement errors with zero means. The functions $x_1, \ldots, x_n$ are assumed to be independent and identically distributed copies of a random functional variable $x$ in $L^2(T)$, the separable Hilbert space of square integrable functions from $T$ to $\mathbb{R}$, endowed with inner product $\langle \cdot, \cdot \rangle$ and the induced norm $\|\cdot\|$. Throughout the text, $x$ is assumed to have zero mean and finite fourth moments, i.e. $E\|x\|^4 < \infty$, which implies that higher-order operators are well defined.

The concept of independent components of a random vector cannot be immediately extended to the case of Hilbert-valued random elements (functional data), because a probability density function is not generally defined in this context. Here, we use the definition of independence first introduced by Gutch and Theis (2012) and Li et al. (2015) in the ICA framework for infinite-dimensional spaces, which basically states that a random functional variable has independent components if the coordinates obtained after projecting onto a given orthonormal basis are independent variables. Then, the aim of FICA is to find an operator $\Gamma$ such that $\langle \Gamma x, \varphi_j \rangle$ ($j = 1, \ldots, q$) are mutually independent variables, with $\{\varphi_j : j \leq q\}$ a truncated orthonormal basis of $L^2(T)$. As functional data are not inherently Gaussian, our IC model is based on the assumption that if $x$ is approximately represented by a truncated orthonormal basis, then it admits the FICA decomposition. Otherwise, by considering $x$ generated by a Gaussian process, an FPCA will suffice to obtain the independent components (Ash and Gardner, 1975, p. 40).
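As an illustration of the sampling model (1), the following sketch simulates discretized curves observed with noise. The sizes, the toy Fourier-mode signal and the noise level are hypothetical choices for illustration only, not the paper's data.

```python
import numpy as np

# Minimal simulation of model (1): noisy discrete observations
# x_ij = x_i(t_ij) + e_ij of smooth latent trajectories.
rng = np.random.default_rng(0)
n, m = 20, 200                        # number of curves and sampling points
t = np.linspace(0.0, 1.0, m)          # common sampling grid t_j

# Smooth zero-mean trajectories x_i(t): random mixtures of two Fourier modes
coef = rng.normal(size=(n, 2))
X_true = coef[:, [0]] * np.sin(2 * np.pi * t) + coef[:, [1]] * np.cos(4 * np.pi * t)

# Mutually independent, zero-mean measurement errors e_ij
X_obs = X_true + rng.normal(scale=0.1, size=(n, m))
```

In practice, the rows of `X_obs` play the role of the digitized signals $x_i$ that are subsequently converted into functional form.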
This IC model inevitably raises the question of the choice of basis. In our FICA approach, the sample curves are reconstructed using the K-L expansion, meaning that the chosen basis is provided by the decomposition of the covariance operator.
For an arbitrary function $h \in L^2(T)$ and $s \in T$, we define the sample covariance operator as
$$\mathcal{C}(h)(s) = n^{-1} \sum_{i=1}^{n} \langle x_i, h \rangle\, x_i(s) = \langle C(s, \cdot), h \rangle,$$
which, induced by the covariance kernel of $x$, $C(s,t) = n^{-1} \sum_{i=1}^{n} x_i(s) x_i(t)$, $s, t \in T$, transforms $h$ into another function of $L^2(T)$. Then, Mercer's theorem provides the eigendecomposition
$$C(s,t) = \sum_{j=1}^{\infty} \lambda_j \varphi_j(s) \varphi_j(t),$$
denoting by $\{\lambda_j, \varphi_j\}_j$ the positive sequence of eigenvalues in descending order and their associated orthonormal eigenfunctions. As a reminder, observe that when the kernel is Hermitian (symmetric) and positive definite, uniformly converging series expansions are obtained for both the kernel and its associated operator. It follows that, for every $t \in T$, we can approximately represent $x_i(t)$ by a truncated K-L expansion
$$x_{iq}(t) = \sum_{j=1}^{q} z_{ij} \varphi_j(t),$$
where $z_{ij} = \langle x_i, \varphi_j \rangle$ are zero-mean random variables with $\mathrm{var}(z_j) = \lambda_j$ and $\mathrm{cov}(z_j, z_{j'}) = 0$ for $j \neq j'$. These variables are referred to as the principal component scores. Moreover, if the $q$-term of the K-L expansion is optimally selected, the mean squared error is minimized (Ghanem and Spanos, 1991, p. 21), providing the best linear approximation to $x_i(t)$.

Our main assumption is that the target functions can be found in the space spanned by the first $q$ eigenfunctions of $\mathcal{C}$, as it is endowed with a second-order structure represented by the major modes of variation of $x_i(t)$. Thus, in such an eigensubspace some accuracy is expected to be gained in the forthcoming results, due to the attenuation of the higher oscillation modes corresponding to the small eigenvalues of $\mathcal{C}$. In the following, we denote by $\mathcal{H}_q = \mathrm{span}\{\varphi_1, \ldots, \varphi_q\}$ the subspace spanned by the $q$ first eigenfunctions of $\mathcal{C}$, which form the chosen basis for our IC model.
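On a discrete grid, the covariance eigenanalysis and the truncated K-L expansion can be sketched as follows. The uniform-grid quadrature weight and the toy rank-3 sample are illustrative assumptions; in the paper the decomposition is carried out through basis expansions rather than raw grids.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 30, 150, 3
t = np.linspace(0, 1, m)
modes = np.vstack([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(3 * np.pi * t)])
X = rng.normal(size=(n, 3)) @ modes    # sample curves on the grid
X = X - X.mean(axis=0)                 # zero-mean assumption

w = (t[-1] - t[0]) / (m - 1)           # quadrature weight of the uniform grid
C = X.T @ X / n                        # covariance kernel values C(t_k, t_l)
lam, Phi = np.linalg.eigh(C * w)       # discretized eigenproblem of the operator
lam, Phi = lam[::-1], Phi[:, ::-1]     # eigenvalues in descending order
Phi = Phi / np.sqrt(w)                 # normalize so that int phi_j(t)^2 dt = 1

Z = X @ Phi[:, :q] * w                 # PC scores z_ij = <x_i, phi_j>
X_q = Z @ Phi[:, :q].T                 # truncated K-L reconstruction x_iq(t)
```

With the toy sample lying in a three-dimensional subspace, the three-term reconstruction recovers the curves essentially exactly, and the sample variances of the score columns match the leading eigenvalues.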
Without loss of generality, $\mathcal{H}_q$ will be assumed to preserve the inner product of $L^2(T)$.

Most multivariate ICA methods require the standardization of the observed data with the inverse square root of the covariance matrix, in order to remove any linear dependencies and normalize the variance along its dimensions. In infinite-dimensional spaces, however, covariance operators are not invertible, giving rise to an ill-posed problem. As long as our signal is represented in $\mathcal{H}_q$, no regularization is needed and, consequently, the inverse of the covariance operator is well defined. Then, the first step towards the estimation of the independent components consists of a transformation of the K-L coefficients (PCs) with respect to the usual Euclidean geometry. Since standardization is a particular case of whitening (or sphering), we can generalize the procedure in the form of a whitening operator $\Psi$ that transforms a function in $\mathcal{H}_q$ into a standardized function in the same space. This implies that $\Psi(x_i) = \tilde{x}_i$ is a standardized functional variable whose covariance operator $\mathcal{C}_{\tilde{x}}$ naturally satisfies being the identity on this space.

By analogy with FPCA, the decomposition of the FOBI (kurtosis) operator follows, defined as
$$\mathcal{K}(h)(s) = n^{-1} \sum_{i=1}^{n} \langle \tilde{x}_{iq}, \tilde{x}_{iq} \rangle \langle \tilde{x}_{iq}, h \rangle\, \tilde{x}_{iq}(s) = \langle K(s,\cdot), h \rangle, \qquad (2)$$
where $K(s,t) = n^{-1} \sum_{i=1}^{n} \|\tilde{x}_{iq}\|^2\, \tilde{x}_{iq}(s)\, \tilde{x}_{iq}(t)$, $s, t \in T$, denotes the FOBI kernel function of $\tilde{x}_i$, and $h$ is the function in $\mathcal{H}_q$ to be transformed. As in Li et al. (2015), we assume the FOBI operator to be positive definite, Hermitian and equivariant, such that there exists the eigendecomposition
$$K(s,t) = \sum_{j=1}^{q} \alpha_j \psi_j(s) \psi_j(t), \qquad j = 1, \ldots, q,$$
where $\{\alpha_j, \psi_j\}_j$ is a positive sequence of distinct eigenvalues and related eigenfunctions.
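At the level of coordinates, the whitening step amounts to multiplying the score matrix by the inverse square root of its covariance. A minimal sketch, with illustrative toy scores:

```python
import numpy as np

# Whitening of PC scores: the covariance of the standardized scores is I_q.
rng = np.random.default_rng(2)
n, q = 500, 3
Z = rng.normal(size=(n, q)) * np.array([3.0, 2.0, 0.5])   # toy score matrix
Z = Z - Z.mean(axis=0)

Sigma = Z.T @ Z / n                                       # covariance of the scores
vals, vecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T    # symmetric Sigma^{-1/2}
Z_tilde = Z @ Sigma_inv_sqrt                              # whitened scores
```

The symmetric square root is one of several valid whitening choices; any matrix $W$ with $W^\top W = \Sigma^{-1}$ would do.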
With this, we are now ready to define the independent components of $x_{iq}$ as generalized linear combinations with maximal kurtosis, given by $s_{ij,\tilde{x}} = \langle \tilde{x}_{iq}, \psi_j \rangle$. Challenging questions arise on how the Karhunen-Loève theorem might be applied in this context. Intuitively, we note that this procedure leads to the expansion $\tilde{x}_{iq}(t) = \sum_{j=1}^{q} s_{ij,\tilde{x}} \psi_j(t)$, which can also be approximated in terms of $c$ eigenfunctions $\psi_j$ of interest, e.g. those associated with the independent components with the most extreme kurtosis values. Under mild conditions, this problem was solved in Li et al. (2015) and Virta et al. (2020) by choosing $c = q$. However, there are other possibilities, such as considering $c < q$, or $\psi_j$ as a basis of projection for either $x$, $x_i$ or $\tilde{x}_i$, in view of the fact that it preserves the fourth-order structure of the standardized data. In our EEG study, we propose to project the original functions onto the basis $\{\psi_1, \ldots, \psi_q\}$ to discern artifactual patterns. We then subtract the space generated by a basis of selected artifactual eigenfunctions from the original curves to obtain a new sample that can be regarded as an estimate of the genuine brain activity.

Silverman (1996) introduced a method that uses the $d$-order differential operator to control the roughness of the weight functions in order to estimate smooth (or regularized) functional principal components. By this heuristic, it incorporates a fixed roughness penalty into the orthonormality constraint between principal components, unlike other approaches that penalize the variance (Rice and Silverman, 1991) or the covariance operator (Yao et al., 2005). In Aguilera and Aguilera-Morillo (2013b), two alternative versions of smoothed FPCA are discussed, taking advantage of a discrete penalization in terms of adjacent B-spline coefficients (P-spline penalty).
In a sense, the smoothed functional ICA considered here is based on the second FPCA version in Aguilera and Aguilera-Morillo (2013b), which incorporates the P-spline penalty in the orthonormality constraint.

In order to estimate the P-spline smoothed PCs, we assume next that the sample paths belong to a finite-dimensional analogue of the Hilbert space $L^2(T)$ spanned by a basis $\{\theta_1(t), \ldots, \theta_p(t)\}$. Each function of the sample can be expanded as $x_i(t) = \sum_{l=1}^{p} a_{il} \theta_l(t)$, where the $a_{il}$ are random coefficients. In matrix form, $x = A\theta$, where $A = (a_{il}) \in \mathbb{R}^{n \times p}$ is the coefficient matrix associated with the basis $\theta = (\theta_1, \ldots, \theta_p)^\top$, and $x = (x_1, \ldots, x_n)^\top$. For the rest of this section, we consider the sample curves to be expanded in terms of B-spline basis functions.

Under model (1), the basis coefficients of the sample curves can be found by least squares approximation, minimizing the mean squared error (MSE) for each $i$, i.e.,
$$(\hat{a}_{i1}, \ldots, \hat{a}_{ip})^\top = \operatorname*{argmin}_{a_i} \sum_{j=1}^{m_i} \Big\{ x_{ij} - \sum_{l=1}^{p} a_{il} \theta_l(t_{ij}) \Big\}^2,$$
and thus the estimate of $a_i = (a_{i1}, \ldots, a_{ip})^\top$ that minimizes the MSE is $\hat{a}_i = (\Theta_i^\top \Theta_i)^{-1} \Theta_i^\top x_i$, where $\Theta_i = \{\theta_l(t_{ij})\} \in \mathbb{R}^{m_i \times p}$. For general guidance on both basis selection and its order, we refer the reader to Chapters 3 and 4 in Ramsay and Silverman (2005). Although in this paper a non-penalised B-spline basis is assumed to approximate the functional representations of the data, Aguilera and Aguilera-Morillo (2013a) give a detailed account of how to solve this problem using different roughness penalty approaches (continuous and discrete) for estimating curves in terms of B-spline bases.

The truncated K-L expansion is generated by the first $q$ eigenfunctions ($q \leq p$) of the covariance operator, which are given in terms of the B-spline basis as
$$\varphi(t) = \sum_{l=1}^{p} b_l \theta_l(t) = \theta(t)^\top b,$$
with $b = (b_1, \ldots, b_p)^\top$ being its vector of basis coefficients. The discrete P-spline roughness penalty function is defined by $\mathrm{PEN}_d(\varphi) = b^\top D_d b$, where $D_d$ is a penalty matrix given by $D_d = (\Delta_d)^\top \Delta_d$, with $\Delta_d$ being the matrix representation of the $d$-order difference operator. Throughout the paper a penalty order $d = 2$ is usually assumed, which is equivalent to
$$b^\top D_2 b = (b_1 - 2b_2 + b_3)^2 + \cdots + (b_{p-2} - 2b_{p-1} + b_p)^2,$$
so that the difference matrix $\Delta_2$ has the form
$$\Delta_2 = \begin{pmatrix} 1 & -2 & 1 & & \\ & 1 & -2 & 1 & \\ & & \ddots & \ddots & \ddots \end{pmatrix}.$$
According to Aguilera and Aguilera-Morillo (2013b) and Silverman (1996), the weight functions $\varphi$ are determined by maximizing $\mathrm{var}\langle x, \varphi \rangle$ subject to orthonormality with respect to the Sobolev-type inner product given by $\langle f, h \rangle_\lambda = \langle f, h \rangle + \lambda\, c^\top D_d b$, with $c$ and $b$ being the vectors of basis coefficients of the functions $f(t)$ and $h(t)$, respectively.
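The $d$-order difference penalty matrix can be built directly from differences of the identity matrix. A minimal sketch; the helper name is ours, not from the paper:

```python
import numpy as np

def difference_penalty(p, d=2):
    """D_d = (Delta_d)^T Delta_d for p basis coefficients."""
    Delta = np.diff(np.eye(p), n=d, axis=0)   # (p - d, p): rows are d-th differences
    return Delta.T @ Delta

D2 = difference_penalty(6, d=2)
b_lin = np.arange(6, dtype=float)   # a linear coefficient sequence has zero
pen = b_lin @ D2 @ b_lin            # second differences -> penalty 0
```

Since $b^\top D_2 b$ sums the squared second differences of adjacent coefficients, polynomial coefficient sequences of degree below $d$ are left unpenalized.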
This is equivalent to maximizing the penalized sample variance defined by
$$\frac{\mathrm{var}\,\langle x, \varphi \rangle}{\langle \varphi, \varphi \rangle + \lambda \times \mathrm{PEN}_d(\varphi)} = \frac{b^\top G_\theta G_A G_\theta\, b}{b^\top (G_\theta + \lambda D_d)\, b}, \qquad (3)$$
where $G_\theta = (\langle \theta_l, \theta_{l'} \rangle)$ ($l, l' = 1, \ldots, p$), $G_A = n^{-1} A^\top A$, and $\lambda \geq 0$ is a penalty parameter that controls the trade-off between maximizing the sample variance and the strength of the penalty.

Because B-spline bases are non-orthogonal with respect to the usual $L^2$ geometry (inner product), we can apply a Cholesky factorization of the form $LL^\top = G_\theta + \lambda D_d$ in order to find a non-singular matrix that allows us to operate in terms of the B-spline geometrical structure induced in $\mathbb{R}^p$. Then, the solution leads to the eigenproblem of a Hermitian matrix,
$$L^{-1} G_\theta G_A G_\theta (L^{-1})^\top v_j = \lambda_j v_j, \qquad (4)$$
from which the coefficients of the weight functions are computed as $b_j = (L^{-1})^\top v_j$ and renormalized so that $b_j^\top G_\theta b_j = 1$. The $j$-th smoothed principal component is then obtained as $z_j = A G_\theta b_j$. Under this framework, the multivariate PCA of $A G_\theta (L^{-1})^\top$ in $\mathbb{R}^p$ and the P-spline smoothed FPCA of $x(t)$ in $L^2(T)$ are equivalent (see Section 4 in Aguilera and Aguilera-Morillo, 2013a).

Having estimated the weight function coefficients and principal component scores, assume next that the smooth K-L expansion is truncated at the $q$-th term; e.g. we may choose $q = p$. Then, the vector of sample curves is given by $x(t) = Z f(t)$, where $Z = (z_{ij}) \in \mathbb{R}^{n \times q}$ is the matrix of principal component coefficients (scores) with respect to the basis of smooth PC weight functions $f(t) = (f_1(t), \ldots, f_q(t))^\top$.

Following the ICA pre-processing steps, we first standardize the approximated K-L curves, defining the whitening operator as $\tilde{x}(t) = \Psi\{x(t)\} = \tilde{Z} f(t)$, with $\tilde{Z} = Z \Sigma_Z^{-1/2}$ being the matrix of standardized PCs and $\Sigma_Z^{-1/2} = \sqrt{n}\,(Z^\top Z)^{-1/2}$ the inverse square root of the covariance matrix of $Z$.
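A minimal sketch of the eigenproblem (3)-(4) follows. The knot placement, sample sizes, penalty value, and the quadrature used for the Gram matrix are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(3)
n, p, lam_pen = 40, 10, 1.0
grid = np.linspace(0.0, 1.0, 401)

# Cubic B-spline basis evaluated on a fine grid (clamped, equispaced knots)
knots = np.concatenate([np.zeros(4), np.linspace(0, 1, p - 2)[1:-1], np.ones(4)])
Theta = BSpline(knots, np.eye(p), 3)(grid)        # (401, p) basis matrix
w = 1.0 / (len(grid) - 1)
G = Theta.T @ Theta * w                            # Gram matrix <theta_l, theta_l'>

A = rng.normal(size=(n, p))                        # toy basis coefficients of curves
A = A - A.mean(axis=0)                             # zero-mean assumption
G_A = A.T @ A / n

Delta2 = np.diff(np.eye(p), n=2, axis=0)           # second-order difference operator
D = Delta2.T @ Delta2                              # P-spline penalty matrix
L = np.linalg.cholesky(G + lam_pen * D)            # L L^T = G_theta + lambda D_d

Linv = np.linalg.inv(L)
M = Linv @ G @ G_A @ G @ Linv.T                    # Hermitian matrix of eigenproblem (4)
evals, V = np.linalg.eigh((M + M.T) / 2)
evals, V = evals[::-1], V[:, ::-1]                 # descending order
B = np.linalg.solve(L.T, V)                        # b_j = (L^T)^{-1} v_j
B = B / np.sqrt(np.einsum('jk,jl,lk->k', B, G, B)) # renormalize b_j^T G b_j = 1
Z = A @ G @ B                                      # smoothed PC scores z_j = A G b_j
```

Setting `lam_pen = 0` recovers ordinary (unpenalized) FPCA in the B-spline geometry, which makes the effect of the penalty easy to inspect.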
As the described whitening transformation is essentially an orthogonalization of the probabilistic part of $x_i$, the matrix $\tilde{Z} \in \mathbb{R}^{n \times q}$ now satisfies $\Sigma_{\tilde{Z}} = I_q$.

Second, the diagonalization of the FOBI operator (expression (2)) of the standardized K-L curves $\tilde{x}_i(t)$ leads to the diagonalization of the FOBI matrix of the standardized PCs $\tilde{Z}$ as
$$K_{4,\tilde{Z}}\, u_j = \alpha_j u_j, \qquad j = 1, \ldots, q, \qquad (5)$$
where $K_{4,\tilde{Z}} = n^{-1} \sum_{i=1}^{n} \|\tilde{z}_i\|^2\, \tilde{z}_i \tilde{z}_i^\top = n^{-1}\, \tilde{Z}^\top D_{\tilde{Z}}\, \tilde{Z} \in \mathbb{R}^{q \times q}$, with $D_{\tilde{Z}} = \mathrm{diag}(\tilde{Z}\tilde{Z}^\top)$, and $\tilde{z}_i$ being the $q \times 1$ column vector containing the $i$-th row of the matrix $\tilde{Z}$. This means that the P-spline smoothed KL-FICA of $x(t)$ is obtained from the multivariate ICA of $Z$ in $\mathbb{R}^q$.

Expression (5) does not restrict $K_{4,\tilde{Z}}$ to be uniquely determined; in fact, several different definitions have been proposed since the classical formulation in Cardoso (1989). It is also worth noting the kurtosis matrix in Kollo (2008),
$$K_{4,\tilde{Z}} = n^{-1} \sum_{i=1}^{n} \tilde{z}_i^\top 1_q\, 1_q^\top \tilde{z}_i\, \tilde{z}_i \tilde{z}_i^\top + (q + 2)\, I_q,$$
which includes all mixed fourth moments by incorporating all-ones vectors, or the matrix proposed in Loperfido (2017) based on the dominant eigenpair of the fourth standardized moment. These FOBI extensions will be discussed in a future paper.

In this way, the KL-FICA decomposition of $x$ is obtained. The IC weight functions are now expressed as $\psi_j(t) = \sum_{l=1}^{q} u_{jl} f_l(t)$ ($j = 1, \ldots, q$), where the coefficient vectors $u_j$ are the eigenvectors of the predefined kurtosis matrix. Then, the independent components are obtained as $H_{\tilde{x}} = \tilde{Z} U$. Finally, the operator $\Gamma$ defining the FICA model is given by $\Gamma(x_{iq}) = f^\top U^\top \Sigma_Z^{-1/2} z_i$, with $z_i$ being the $q \times 1$ column vector containing the $i$-th row of the matrix $Z$.
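A coordinate-level sketch of the FOBI step (5) on simulated whitened scores is given below. The toy sources and the mixing matrix are hypothetical; note that FOBI requires the sources to have distinct kurtoses, so three different distributions are used.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 8000, 3
S = np.column_stack([
    rng.laplace(size=n),               # heavy-tailed source
    rng.uniform(-1.0, 1.0, size=n),    # light-tailed source
    rng.exponential(size=n) - 1.0,     # skewed, very heavy-tailed source
])
A_mix = rng.normal(size=(q, q))        # hypothetical mixing matrix
Z = S @ A_mix.T
Z = Z - Z.mean(axis=0)

# Whitening (as in the previous step): covariance of Z_t is the identity
vals, vecs = np.linalg.eigh(Z.T @ Z / n)
Z_t = Z @ vecs @ np.diag(vals ** -0.5) @ vecs.T

# FOBI (kurtosis) matrix K = n^-1 sum_i ||z_i||^2 z_i z_i^T and its eigenbasis
K = (Z_t * (Z_t ** 2).sum(axis=1, keepdims=True)).T @ Z_t / n
alpha, U = np.linalg.eigh(K)
H = Z_t @ U                            # estimated independent components
```

The columns of `H` recover the sources up to sign, permutation and scale, which is the usual ICA indeterminacy.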
Choosing $q$ according to the kurtosis excess of the coordinate vectors

The problem concerning the estimation of the independent component curves lies in finding an optimal truncation point of the K-L expansion and an appropriate selection of the penalty parameter. When $q$ approaches $n$, more of the higher oscillation modes of the standardized sample are induced in the estimation. Otherwise, we are denoising the weight functions from the fourth-order structure of the data. From this perspective, it is desirable to increase the value of $q$ so that the latent functions of the whitened space can be captured (Virta et al., 2020). Observe that this kind of regularization is not exactly the same as that provided by the P-spline penalization of the roughness of the weight functions. Attenuating the higher frequency components of the FPCA model does not necessarily affect an entire frequency bandwidth of the data. Thus, if the original curves are observed with independent error, it may overlap the estimation of the weight functions. In this context, smoothing would be appropriate. Once the value of $q$ is chosen, we should examine those components with kurtosis excess, contrary to FPCA, where the components associated with large eigenvalues are considered.

We next propose a new method to approach $q$ by defining a fourth-moment measure which expresses the degree of kurtosis excess in a given independent component coordinate space. Assume this space to be $s_{ij,\tilde{x}} = \langle \tilde{x}_{iq}, \psi_j \rangle$, i.e. the projection of the standardized original sample curves on the FOBI basis estimated from the FICA decomposition of $x$. In this independent setting, it seems more reasonable to evaluate the non-Gaussianity of the component vectors, as they provide the most direct eigenfunction contribution to the original sample.
Then, to calculate the degree of kurtosis excess in the ICs, we define a fourth-moment measure given by
$$KE_q = \Big\| \{ \mathrm{kurt}(H_{j,\tilde{x}}) \}_{j=1}^{q} \Big\|,$$
where $H_{j,\tilde{x}}$ is the $j$-th IC and $\mathrm{kurt}(H_{j,\tilde{x}}) = n^{-1} \sum_{i=1}^{n} \{ (s_{ij,\tilde{x}} - \bar{s}_{j,\tilde{x}}) / \sigma(s_{j,\tilde{x}}) \}^4 - 3$, which gives the normalized excess of kurtosis for each IC. Then the value of $q$ is selected according to
$$\operatorname*{argmax}_{q \leq n} \,(KE_q), \qquad (6)$$
over $q$-FICA decompositions of $x$. In our EEG example, (6) is iterated until the velocity of the eigenvalues $\Delta\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ associated with the FPCA of $x$ ceases locally to increase in the neighbourhood of the dominant eigenvalue. Velocity fluctuations can occur in the exponential decay of $\lambda_j$, meaning that asymptotic stability is not necessarily reached using this criterion. We do find, however, that this is a way of exploring independence in the high-variability structure of our data, and it also ensures that the FICA decomposition induces enough of the independent part of the model to separate the latent functions without losing accuracy in their estimation. In analysing EEG signals, this entails a major effectiveness in reducing the artifactual content to a few eigenfunctions, particularly for low-frequency physiological activity such as blinks and motor artifacts. The presence of other high-frequency muscular artifacts, however, places the researcher in a more challenging situation. The choice of an appropriate truncation point should be seen as a measure to improve the accuracy of the estimation of those artifacts so as to preserve modes of variability related to brain activity rhythms. We believe that, instead of adjusting $q$ to larger values of cumulated variance, an iterative FICA process to scale artifact removal may be considered to solve the problem.
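The measure $KE_q$ can be sketched as a small helper; the function names are ours, and the two-column toy IC matrix is illustrative only.

```python
import numpy as np

def excess_kurtosis(h):
    """Normalized excess kurtosis n^-1 sum ((h_i - mean)/sd)^4 - 3."""
    h = (h - h.mean()) / h.std()
    return float(np.mean(h ** 4) - 3.0)

def ke_measure(H):
    """KE_q = || (kurt(H_1), ..., kurt(H_q)) || for an n x q IC matrix H."""
    return float(np.linalg.norm([excess_kurtosis(H[:, j]) for j in range(H.shape[1])]))

# Heavy-tailed (artifact-like) components dominate the measure
rng = np.random.default_rng(5)
H = np.column_stack([rng.laplace(size=5000), rng.normal(size=5000)])
```

In the selection rule (6), `ke_measure` would be evaluated for each candidate truncation level $q$ and the maximizing $q$ retained.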
Penalty parameter selection
Leave-one-out cross-validation ($\mathrm{CV}$) is generally used to select the penalty parameter in order to achieve a desirable degree of smoothness of the weight functions. In a more explicit and condensed form, the $\mathrm{CV}$ procedure in our model consists of finding a value of $\lambda$ that minimizes
$$\mathrm{CV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| X_i - \hat{X}_i^{(-i)} \right\|^2, \qquad (7)$$
where $\hat{X}_i^{(-i)} = \sum_{j=1}^{q} \hat{z}_{ij}^{(-i)} \hat{f}_j^{(-i)}$ is the K-L representation of the $i$-th curve $X_i$ in terms of the first $q$ principal components, leaving that curve out of the estimation process. This method can be combined with (6) so that, once $\lambda$ is chosen for each fixed $q$, the optimum $q$ is the one that maximizes $\mathrm{KD}_q$.

Here, an important remark has to be made about the regularization of the weight functions. Castellanos and Makarov (2006) and references therein (see section 2.6) discuss the reliability of the estimated artifactual sources, as they are assumed to contain traces of independent leaked cerebral activity. This assumption complicates matters when oscillations related to brain rhythms are still observable on the estimated IC weight functions. Moreover, we found that cross-validation was not sensitive for a reasonably large basis dimension, forcing us to reformulate the strategy. To address this issue, the penalty parameter might be chosen subjectively to a suitable degree of smoothness; however, this can lead to distortion and poor extraction of the artifactual sources. For the results presented in this paper, we propose a novel adaptation of cross-validation which consists in replacing (7) by
$$\mathrm{BCV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2, \qquad (8)$$
where $\hat{X}_{i;\beta}^{(-i)}$ is a smoothed K-L representation for some baseline $\beta$, and $\lambda > 0$ is a value that increases the smoothing in the second term of the norm; here we assume $\beta = 0$. For a fixed $q$, (8) is iterated over a set of $\lambda$'s to find the one that minimizes $\mathrm{BCV}_q(\lambda)$, which provides the choice of the penalty parameter.
As in (7), it can be combined with (6) to select the most suitable $q$ via maximizing $\mathrm{KD}$. We call this method baseline cross-validation, as it operates across different K-L reconstructions of $X_i$ for a given baseline penalty parameter and a fixed $q$. This approach is more versatile and particularly useful when the original curves are extremely rough and have been approximated with a larger basis dimension. In addition, for a given $q$ it allows scoring more than one $\lambda$ as a result of the various relative minima it produces. The intuition behind baseline cross-validation is that there are several levels of smoothing that endow the estimator with the ability for predictive modelling; thus, the selection of the value that minimizes it is a merely practical matter in this work. The implementation of the method for a larger basis order using shrinkage estimators is given in Section 4. Let us note that no categorical rules are provided here for the selection of $\lambda$, as the physiological significance of smoothing is not tested in this paper; this is a matter for future research.

A further complication arises in evaluating the relationship between the kurtosis excess and the selected penalty parameter, from which $\mathrm{KD}$ for some $q$ is expected to be optimized with respect to $\lambda = 0$. In general, we advocate dismissing smoothing if $\mathrm{KD}$ diminishes, as this would indicate that regularization is naturally optimized in the sense of the principal eigendirections, and therefore the induced roughness may be attributable to the independent part of the model. In other words, the latent functions strongly depend on the induced noise to shape a more accurate independent space structure. However, this rule will be relaxed in order to estimate low-frequency artifactual curves free of persistent brain activity, even though it comes at the price of losing other interesting artifacts.
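The following is a simplified, runnable sketch of the baseline cross-validation idea on discretized curves. The penalized eigenproblem below (adding $-\lambda P$ to the covariance, with $P$ a $d$-order difference penalty) is a crude stand-in for the P-spline FPCA used in the paper, and the function names are our own:

```python
import numpy as np

def diff_penalty(p, d=2):
    """d-order difference penalty matrix P = D_d^T D_d on p coefficients."""
    D = np.diff(np.eye(p), n=d, axis=0)
    return D.T @ D

def loo_reconstruct(X, i, q, lam, P):
    """Rank-q reconstruction of row i of X from a fit that leaves it out.
    Roughness is discouraged by penalizing the covariance with lam * P."""
    Xmi = np.delete(X, i, axis=0)
    mu = Xmi.mean(axis=0)
    S = (Xmi - mu).T @ (Xmi - mu) / len(Xmi)
    _, evecs = np.linalg.eigh(S - lam * P)
    W = evecs[:, ::-1][:, :q]              # leading q penalized directions
    return mu + (X[i] - mu) @ W @ W.T

def bcv(X, q, lam, d=2):
    """Baseline cross-validation score, eq. (8) with beta = 0: compare
    the leave-one-out reconstructions at penalty 0 and penalty lam."""
    n, p = X.shape
    P = diff_penalty(p, d)
    return np.mean([np.sum((loo_reconstruct(X, i, q, 0.0, P)
                            - loo_reconstruct(X, i, q, lam, P)) ** 2)
                    for i in range(n)])
```

A grid search then keeps, for each fixed $q$, the $\lambda$ that minimizes `bcv(X, q, lam)`.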
We now discuss the efficacy of our method in estimating artifactual functions on real EEG data. A first use case is proposed in the context of event-related potential (ERP) analysis, where the researcher typically deals with a high number of short time series aligned to some event. We begin this section by describing the data and the method to reconstruct the functional form of the sample paths. Then, we analyze distinct short recordings containing common artifacts of induced movements and propose a procedure for artifact detection, correction and subtraction. The ordinary and penalized KL-FICA are compared. Finally, we propose an automated method for a group of functional ERPs.
Materials and experimental design
EEG data were recorded across different conditions from a single healthy subject (male, 35 years old) with a 64-channel eego mylab system, in the Art and Science Interaction Lab (ASIL) of Ghent University (Belgium). In a first session, the subject sat on a comfortable chair in front of a table and was instructed to perform the following classes of self-paced movements in separate single trials of 3 seconds: nodding, hand-tapping with a wide arm movement starting from the shoulder, eye-blinking and chewing. In a second session, the subject was instructed to tap his hand on the table in synchrony with a steady auditory stimulus in one condition, and to listen to the same stimulus in a condition without any movement involved. We recorded 100 trials per condition, divided into randomized blocks of 25 trials. The stimulus period was 750 ms. Movements were intentionally exaggerated to maximize potential movement-related artifacts.

Our purpose was two-fold. With the first recording, in the absence of sensory stimulation, we aimed at isolating the stereotyped artifacts and showing how we can estimate and selectively remove them from a heavily contaminated signal. In the second recording, we reproduced a minimal form of experimental design, contrasting a condition of listening while moving with a baseline condition of listening while sitting still. Having a baseline at our disposal, we could directly compare the outcome of our cleaning procedure with an uncontaminated experimental situation. Results will be shown on a selection of individual contaminated segments and on the grand-average ERPs for each condition. Any cognitive interpretation of the differences across conditions is beyond the scope of the present work.
Methods
Pre-processing and general parameter tuning.
All recordings were performed with a sampling rate of 1 kHz, so for a trial length of 3 seconds we have 3000 sampling points. The signal was high-pass and low-pass filtered using Butterworth filters (cut-offs at 0.5 and 30 Hz, orders 4 and 6, respectively). An additional notch filter was applied to suppress the 50 Hz power-line noise. For all the trials of the second recording, the filtered time series were baseline-corrected by subtracting the average of the −200 ms to 0 ms pre-stimulus interval from each time point.

Let $x_i(t_{ik})$ denote the EEG potentials for the signal component $i$ $(i = 1, \ldots, n)$ at time point $t_{ik}$ $(k = 1, \ldots, m)$ for each single trial; the FICA decompositions of the trials are conducted independently. In reconstructing the functional form of the sample paths, we sought a less smooth fitting, so that we could mimic the brain potential fluctuations. Accordingly, a basis of cubic B-spline functions of dimension $p = 230$ was fitted to all signal components, minimizing the mean squared error of the estimation for the B-spline coefficients to a negligible value.

Since we require a basis dimension greater than the number of signal components (sample size), a shrinkage covariance estimator (Schäfer and Strimmer, 2005) is considered to compute $\boldsymbol{\Sigma}_{\boldsymbol{A}}$ in the PCA algorithm. This method guarantees positive definiteness and, consequently, an estimate of the leading eigenvalues that is not biased upwards. The same strategy is used for baseline cross-validation. Recall the quadratic distances in (8). For the $i$-th trial, these distances in
terms of basis functions can be expressed as
$$\left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2 = \int_T \left[ \hat{X}_{i;\beta}^{(-i)}(t) - \hat{X}_{i;\beta+\lambda}^{(-i)}(t) \right]^2 dt = \int_T \left[ \sum_{j=1}^{q} z_{\beta,ij}^{(-i)} \sum_{l=1}^{p} b_{\beta,jl}^{(-i)} \phi_l(t) - \sum_{j=1}^{q} z_{\beta+\lambda,ij}^{(-i)} \sum_{l=1}^{p} b_{\beta+\lambda,jl}^{(-i)} \phi_l(t) \right]^2 dt = \int_T \left[ \sum_{l=1}^{p} d_{il} \phi_l(t) \right]^2 dt = \boldsymbol{d}_i^\top \boldsymbol{\Psi} \boldsymbol{d}_i,$$
where $\boldsymbol{d}_i = (d_{i1}, \ldots, d_{ip})^\top$. Next, the matrix $D = (d_{il}) \in \mathbb{R}^{n \times p}$ is reconstructed via shrinkage. That is, first we compute $\mathrm{cov}_s(D)$, where $\mathrm{cov}_s$ is the shrinkage covariance estimator; then we apply a Cholesky decomposition of the form $\boldsymbol{L}\boldsymbol{L}^\top = \mathrm{cov}_s(D)$. Finally, the basis coefficients of the reconstructed residual functions are given by $\tilde{\boldsymbol{d}}_i = \boldsymbol{L}^{-\top} \boldsymbol{d}_i$, and consequently now
$$\mathrm{BCV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2 = \frac{1}{n} \sum_{i=1}^{n} \tilde{\boldsymbol{d}}_i^\top \boldsymbol{\Psi} \tilde{\boldsymbol{d}}_i.$$
The baseline cross-validation method combined with shrinkage estimators has proven to be sensitive at larger basis dimensions, and it also provides insights into the different levels of roughness of the sample curves at each $\lambda$. For the present application, we take the value that minimizes $\mathrm{BCV}_q(\lambda)$ for different values of $q$ and use $\mathrm{KD}$ in its optimization.

Artifact detection and removal.
The process of identifying artifactual functions has been conducted not only on the basis of their suggested shape, but also by using topographic maps that roughly represent patterns of eigenactivity related to the distribution of bio-electric energy along the scalp. These maps have been elaborated from the coordinates of the projection of $X_i$ onto the basis of independent weight functions,
$$\zeta_{ij,q} = \langle X_i, \psi_j \rangle, \quad j = 1, \ldots, q, \qquad (9)$$
whose resulting score vectors $\boldsymbol{\zeta}_{j,q} = (\zeta_{1j}, \ldots, \zeta_{nj})^\top$ associated with each eigenfunction are depicted in the spatial electrode domain. Thus, the points in these maps represent energy markers color-coded by their positive or negative eigenfunction contribution to each signal component $X_i$. Once the artifactual eigenfunctions have been identified, they can be manually selected, together with their corresponding projection coefficients, to reconstruct an expansion that can be subtracted from the original functional observations in order to remove the undesired artifactual curves. The projection coefficients on which the selection is made are obtained from (9). In order to simplify the burden of a manual selection, assume
$$X_{ia}(t) = \sum_{j=1}^{q} \zeta_{ij,q}\, \psi_j(t)$$
to be an expansion of components corresponding to their related artifactual eigenfunctions. Then the artifact subtraction in terms of basis expansions is given by
$$X_i(t) - X_{ia}(t) = \sum_{l=1}^{p} a_{il} \phi_l(t) - \sum_{j=1}^{q} \zeta_{ij,q} \sum_{l=1}^{p} (\boldsymbol{h}_j^\top \boldsymbol{f}_l)\, \phi_l(t) = \sum_{l=1}^{p} e_{il} \phi_l(t), \qquad (10)$$
where $e_{il}$ are the residual (or cleaned) coefficients, with $\boldsymbol{h}_j$ being the vector of coefficients of the independent weight function $\psi_j$ in terms of the principal eigenfunctions, and $\boldsymbol{f}_l$ being the vector of coefficients of the principal eigenfunctions in the basis $\phi_l$. This amounts to assuming that the full set of IC weight functions obtained from our model corresponds to a structure of underlying artifactual patterns of the EEG signal.
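In terms of basis coefficients, the subtraction in (10) is a plain matrix operation. A minimal sketch (our own naming; we pass the already-assembled products $\boldsymbol{h}_j^\top \boldsymbol{f}_l$ as the rows of a coefficient matrix):

```python
import numpy as np

def remove_artifacts(A, W_psi, zeta, art_idx):
    """Subtract the selected artifactual expansion from the basis
    coefficients of the observations, following eq. (10).

    A      : (n, p) basis coefficients a_il of the observed curves
    W_psi  : (q, p) basis coefficients of the IC weight functions psi_j,
             i.e. the products h_j^T f_l assembled row-wise
    zeta   : (n, q) projection scores zeta_ij from eq. (9)
    art_idx: indices of the components judged artifactual
    """
    art = np.asarray(art_idx)
    A_art = zeta[:, art] @ W_psi[art, :]   # coefficients of X_ia
    return A - A_art                       # cleaned coefficients e_il
```

With all ICs selected, as in our procedure, `art_idx = range(q)` subtracts the whole estimated artifactual space.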
Algorithm
KL-FICA estimation procedure for artifact reduction

1. Find a suitable B-spline representation of the sampled signal components. Estimate the matrix of coefficients $\boldsymbol{A}$ by least squares approximation.
2. Calculate the P-spline FPCA of $X$, or equivalently, the vector-valued PCA of $\boldsymbol{A}\boldsymbol{\Psi}\boldsymbol{L}^{-\top}$, to obtain the matrix of smooth principal components $\boldsymbol{Z}$ and the coefficients of the principal weight functions $\boldsymbol{f}_j$.
   (a) If $p > n$, consider shrinkage estimators to compute $\boldsymbol{\Sigma}_{\boldsymbol{A}}$.
   (b) Obtain a $\lambda$ for each $q$ using $\mathrm{BCV}$.
3. Perform the vector-valued ICA on the matrix $\boldsymbol{Z}$ for each $q$.
   (a) Whiten the matrix of principal components: $\tilde{\boldsymbol{Z}} = \boldsymbol{Z}\boldsymbol{\Sigma}_{\boldsymbol{Z}}^{-1/2}$.
   (b) Fix a fourth-order moment matrix of $\tilde{\boldsymbol{Z}}$ and diagonalize it. Obtain the distinct eigenvalues $\alpha_j$ and associated eigenvectors $\boldsymbol{u}_j$.
   (c) Calculate the components of interest by projecting the original observations on the basis of IC weight functions, $\zeta_{ij,q} = \langle X_i, \psi_j \rangle$, or simply $\boldsymbol{\zeta}_{j,q} = \boldsymbol{Z}\boldsymbol{u}_j$.
4. Choose an optimal $q$ using $\mathrm{KD}$. Select manually the artifactual components (we consider all ICs) and expand the artifactual space.
5. Subtract the artifactual coefficients using (10) to obtain the coordinates of the cleaned signal components. Repeat the procedure for each trial.

Stereotyped artifacts
We first present full results of the unpenalized and penalized KL-FICA decompositions on the single trials corresponding to motor-related artifacts in the absence of sensory stimulation. Baseline cross-validation was performed on a given grid of $\lambda$'s, selecting the value which minimizes $\mathrm{BCV}_q(\lambda)$ for $q = 1, \ldots, \nu$, where $\nu$ is defined as $\nu = \{ j : \max(\Delta\lambda_j) \}$, i.e., the index entry corresponding to the first relative maximum of the FPCA eigenvalues in first differences. Among all the results obtained, we selected the $q$ that, for its corresponding estimated $\lambda$, maximizes $\mathrm{KD}$.

Our preliminary results comparing both decompositions show that the smoothed KL-FICA provides more stylised functions revealing the latent form of the artifact. More important, however, is that all topographic maps of the chosen components reflect well-known spatial firings of the expected artifactual activity. A selection of eigenfunctions from each dataset and their associated projection scores depicted on a topographic map are presented in Figure 1. The third eigenfunction, for example, corresponds to continuous blinking, which is characterised by a high energy intensity in the prefrontal area. Physiological non-brain activity that occurs near the recording zone can be easily detected, as it becomes a source of contrasting amplitude with respect to brain activity or other artifacts. In this case, the smoothed KL-FICA presumably attenuates the high-frequency activity associated with brain patterns that interferes in the estimation. Such reasoning can be extended to the first and second artifactual eigenfunctions.
Notice that, when the artifact has low frequency (as the one captured by the first eigenfunction), $\lambda$ increases considerably, whereas other artifactual forms require less smoothing; see Table 1 for more details.

However, when artifacts are characterised by localised high-amplitude curves, as is the case for the fourth artifactual eigenfunction (chewing), smoothing is not able to mimic those curves and does not discriminate effectively the high-frequency activity produced by other potentials. As has been observed before, this happens because the weight function strongly depends on the noise provided by the fourth-order structure of the model to set up its underlying shape. Thus, these artifacts are quite sensitive to smoothing, which in turn can have the opposite effect by causing a loss of accuracy. In addition, we had to increase the range of $q$ set to compute $\mathrm{KD}$, leading to the selection among a larger number of eigenfunctions.
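Our reading of the rule defining $\nu$ above can be sketched as follows; the exact convention for locating the first relative maximum of the eigenvalue first differences is our assumption:

```python
import numpy as np

def nu_from_eigenvalues(eigvals):
    """Return the index (1-based) of the first relative maximum of the
    FPCA eigenvalues in first differences, used as the upper bound of
    the grid of q values scored by baseline cross-validation."""
    d = np.diff(np.asarray(eigvals, dtype=float))
    for j in range(1, len(d) - 1):
        if d[j] >= d[j - 1] and d[j] >= d[j + 1]:
            return j + 1
    return len(eigvals)      # fall back to the full sequence
```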
Table 1. Summary of different parameters and kurtosis of components after performing the P-spline KL-FICA on the different ERP trials. $\mathrm{KD}$ is estimated without penalization as well as for the $\lambda$ obtained with baseline cross-validation ($\mathrm{BCV}$). The shaded kurtosis values of the components are those related to the depicted eigenfunctions in Figure 1.

Trial      ν   q   KD (λ=0)   KD (λ_BCV)   λ_BCV   kurt(ζ1)   kurt(ζ2)   kurt(ζ3)   kurt(ζ4)   kurt(ζ5)
Nodding    6   5   17.80      45.86        4800    48.73      5.473      5.165      3.825      3.601
Arm mov.   4   4   4.000      4.518        1000    7.441      2.637      3.736      2.891
Blinks     4   3   4.128      3.606        300     6.567      3.115      2.476
Chewing

[Figure 1 panel titles: (1) Eigenfunction (nodding); (2) Eigenfunction (arm movement); (3) Eigenfunction (blinks); (4) Eigenfunction (chewing); time axis 0–3000 ms.]
Figure 1. (a) A selection of illustrative artifactual eigenfunctions from different single-trial ERP datasets using the unpenalized KL-FICA (grey) and P-spline KL-FICA (blue) decompositions. The eigenfunctions are ordered from low- to high-frequency characteristics. (b) The components obtained by projection of the depicted smooth eigenfunction on the original functional sample, distributed in the spatial electrode domain. (c) Comparison of peripheral channels before the extraction, correction and removal (shaded in beige) and after (non-shaded).
From these results it is clearly observed that the penalized KL-FICA provides a smooth separation of the most common artifacts. We also notice that the second eigenfunction, whose characterisation is of interest for the next section, successfully captures the arm movement in the stipulated time
course. This, however, is only the first half of the process to estimate the underlying brain activity. Our tests on such datasets have shown that a good approximation of brain sources can be obtained by subtracting the whole estimated artifactual space. It seems reasonable to conjecture that restricting $q$ to the first FPCA eigenvalues decreases the odds of obtaining spurious artifactual functions, as these correspond to the highest modes of variability of the FPCA decomposition. Moreover, by having selected an appropriate penalty parameter we may avoid the distortion of the underlying neural activity. Other methods could be considered to fine-tune the selection of artifacts (see Zima et al., 2012), although their application is not as straightforward for functional data.

Group-level ERP
To illustrate our method for a group of ERPs, we consider two functional datasets on the following experimental conditions: tapping to the sound involving an arm movement, and listening without movement, with 100 trials per condition. The P-spline KL-FICA is iteratively used to obtain brain estimates by subtracting the smooth artifactual curves in each trial. Here, the complexity of extracting artifacts increases, as we now assume a mixture of them as well as of other brain processes due to the cognitive task. The boxplots of the number of components obtained and of the $\mathrm{KD}$ measures using the P-spline KL-FICA for each condition are shown in Figure 3. As we expected, the number of components obtained in condition 1 (movement) is significantly higher than in condition 2 (no movement). In addition, note that $\mathrm{KD}$ takes larger values for condition 2; we believe this is a direct consequence of a homogeneous independent mixture of artifactual and cerebral activity.
[Figure 2 panels: tapping with arm movement vs. no movement; channels shown include Fp1, Fpz, AF8 and FT8; timeline of auditory stimuli and EEG recording.]
Figure 2. (a) Grand average across trials in some prefrontal channels where the artifactual activity is expected. Shaded curves show the raw data; non-shaded curves, the data after artifact removal. (b) A descriptive scheme of the arm movement and the corresponding time measures is provided at the bottom of the panels.
Figure 2 shows a contrast between conditions using the proposed approach. All curves from each channel were averaged across trials. The upper left panel displays frontal channels where the movement-related artifact is clearly visible before the subtraction. Further evidence of such artifactual content is given in the right panel, where the raw curves of the other condition are shown. Clearly, the accumulated artifacts across the trials have a different origin here. Finally,
the same panel shows the curves after the subtraction of the artifactual curves. We observe that our procedure notably reduces the movement-related artifactual activity and renders the signal more stationary. The same applies to the other condition, although the differences are smaller. We show that, in both conditions, our algorithm is still capable of reducing artifactual content while retaining brain activity. However, we may expect some loss of information or distortion (if estimates are oversmoothed) at trial level depending on the selected $\lambda$. We also observe that, as the response to the repeated stimulus is assumed to be invariant and small in terms of amplitude, averaging suppresses non-phase-locked activity and reveals the potential elicited by the stimulus (Tong and Thakor, 2009). Consequently, attenuating artifactual sources will lead to a better estimation of brain potentials after averaging than subtracting rough artifactual curves.
Figure 3.
Boxplots of the number of components computed using the iterative KL-PFICA on 100 trials per condition, and boxplots of the related $\mathrm{KD}$ measures.

In this paper, we proposed a smoothed FICA approach based on a discrete roughness penalty. The fourth-order blind identification is performed on the smoothed Karhunen-Loève expansion, extending the functional IC model introduced by Li et al. (2015) in combination with the smooth FPCA proposed by Aguilera and Aguilera-Morillo (2013b), which introduces the P-spline penalty in the orthonormality constraint between principal components. We have shown that conducting the multivariate ICA procedure on the coordinate vectors of the K-L expansion is equivalent to a particular functional ICA of the original B-spline expansions. This equivalence is inherited from the procedure proposed in Ocaña et al. (2007) to compute estimates of functional PCA under general settings, although now, since the eigenfunctions $f_j$ are orthogonal, the task is simplified. The asymptotic properties of Silverman's method of smooth FPCA have been studied by Lakraj and Ruymgaart (2017) using expansions of the perturbed eigensystem of a smoothed covariance operator; however, there is work to be done to ensure these desirable asymptotic properties for our kurtosis operator.

The goal of this paper was to develop the mathematical framework for the identification and removal of artifacts from functional EEG data, using smooth estimators to prevent the loss of brain activity. In our test experiments, the kurtosis operator has proven to work well in capturing artifactual forms with different frequency characteristics. One of the strengths of our model is the double regularisation, providing a more versatile approach which may be beneficial depending on the objectives of preprocessing contaminated EEG data.
In essence, the degree of separation is defined through the space dimension, from more dependent (first $q$ terms of the K-L) to more independent; thus $q$ acts as a regularization parameter in the sense of the PC eigendirections. An
additional penalization using the integrated $d$-order derivative might be considered to obtain more accurate estimations, especially when the obtained IC weight functions are assumed to contain traces of brain activity.

We also introduce some tools to approach $q$ and $\lambda$. The former is selected with the kurtosis distance, a measure that reflects the kurtosis excess in a given coordinate space; the latter, with baseline cross-validation, a novel approach that does not collapse information for $p > n$ by taking advantage of shrinkage estimators. To facilitate the identification of artifactual functions, the coordinate vectors associated with the IC weight functions of interest were represented in the spatial electrode domain on the scalp. The visual inspection of these topographical maps has revealed interesting insights into the composite spectra of the artifactual curves, helping to determine whether or not they should be subtracted.

In order to provide a more feasible instrument for the researcher, we have proposed a method for artifact removal for a group of contaminated functional ERPs. Although our method is quite flexible, it only allows assembling artifacts with certain characteristics, so it does not gather all the artifactual components in a single space, or at least, functions free of leaked neural activity. However, it is meant as a powerful tool for capturing and denoising low-frequency artifacts related to body movements and other physiological events. In this framework, a discussion is pending on the choice of the regularization parameters as well as on the physiological significance of smoothing. We observed that, in the trials where the subject was performing a motor and perceptual task, the values obtained by baseline cross-validation were more variable than in our tests. This is probably due in part to a more complex mixture of brain and artifactual activity.
Although our choice was to exploit the minimization of $\mathrm{BCV}$, there are clearly other possible approaches to consider. We see that our method paves the way for further developments in the field of neural signal processing: in the future, a review comparing performance with different FOBI estimators and extensions of other ICA methodologies would be interesting as well.
P-spline KL-FICA was conducted using a modified version of the pspline.ffobi function of the pfica R package (Vidal and Aguilera, 2020). The implemented code, together with a sample input dataset and documentation, is available on request from the corresponding author (M.V.).
Author Contributions
Marc Vidal: Conceptualization, Methodology, Software, Data collection, Writing – original draft.
Mattia Rosso: Experimental design, Data collection, Preprocessing, Writing – editing.
Ana M. Aguilera: Methodology, Writing – review and editing, Supervision.

Acknowledgments
We thank Daniel Gost for helping with figures and formatting. The research of Marc Vidal was supported by the Methusalem funding from the Flemish Government awarded to Marc Leman. The authors also acknowledge the support of the Spanish Ministry of Science, Innovation and Universities under project MTM2017-88708-P (also supported by the FEDER program) and of the research group FQM307 funded by the Government of Andalusia (Spain).
Con๏ฌict of Interest : None declared.
References
Aguilera AM , Aguilera-Morillo MC. Comparative study of di๏ฌerent B-spline approaches for functional data.Mathematical and Computer Modelling. 2013; 58:1568โ1579.
Aguilera AM, Aguilera-Morillo MC. Penalized PCA approaches for B-spline expansions of smooth functional data. Applied Mathematics and Computation. 2013; 219:7805–7819.
Akhtar MT , Mitsuhashi W, James CJ. Employing spatially constrained ICA and wavelet denoising, for automaticremoval of artifacts from multichannel EEG data. Signal Processing. 2012; 92:401โ416.
Ash RB , Gardner MF. Topics in stochastic processes. New York: Academic Press; 1975.
Bell AJ , Sejnowski TJ. An information-maximization approach to blind separation and blind deconvolution.Neural Computation. 1995; 7:1129โ1159.
Buzsรกki G . Rhythms of the Brain. Oxford University Press; 2006.
Cardoso JF. Source separation using higher order moments. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 1989. p. 2109–2112.
Cardoso JF, Souloumiac A. Blind beamforming for non-Gaussian signals. In: IEE Proceedings F – Radar and Signal Processing, vol. 140; 1993. p. 362–370.
Castellanos NP , Makarov VA. Recovering EEG brain signals: Artifact suppression with wavelet enhanced inde-pendent component analysis. Journal of Neuroscience Methods. 2006; 158:300 โ 312.
Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996; 11:89–121.
Ghanem R , Spanos P. Stochastic Finite Elements: A Spectral Approach. New York: Springer-Verlag; 1991.
Gramann K , An introduction to mobile brain/body imaging (MoBI); 2014.
Gutch H, Theis F. To infinity and beyond: On ICA over Hilbert spaces. In: Theis F, Cichocki A, Yeredor A, Zibulevsky M, editors. Latent Variable Analysis and Signal Separation; 2012. p. 180–187.
Hasenstab K , Sche๏ฌer A, Telesca D, Sugar CA, Jeste S, DiStefano C, Sentรผrk D. A multi-dimensional functionalprincipal components analysis of EEG data. Biometrics. 2017; 3(73):999โ1009.
Herault J, Jutten C. Space or time adaptive signal processing by neural models. In: Denker JS, editor. AIP Conference: Neural Networks for Computing, vol. 151. American Institute of Physics; 1986. p. 206–211.
Hyvรคrinen A , Oja E. A fast ๏ฌxed-point algorithm for independent component analysis. Neural Computation.1997; 9:1483โ1492.
Kelly JW , Siewiorek DP, Smailagic A, Collinger JL, Weber DJ, Wang W. Fully automated reduction of ocularartifacts in high-dimensional neural data. IEEE Transactions on Biomedical Engineering. 2011; 58:598โ606.
Kollo T . Multivariate skewness and kurtosis measures with an application in ICA. Journal of Multivariate Analysis.2008; 99(10):2328โ2338.
Lakraj GP, Ruymgaart F. Some asymptotic theory for Silverman's smoothed functional principal components in an abstract Hilbert space. Journal of Multivariate Analysis. 2017; 155:122–132.
Li B, Bever GV, Oja H, Sabolová R, Critchley F. Functional independent component analysis: an extension of the fourth-order blind identification; 2015, submitted.
Loper๏ฌdo N . A new kurtosis matrix, with statistical applications. Linear Algebra and its Applications. 2017;512:1โ17.
Mahajan R , Morshed BI. Unsupervised eye blink artifact denoising of EEG data with modi๏ฌed multiscale sampleentropy, kurtosis, and Wavelet-ICA. IEEE Journal of Biomedical and Health Informatics. 2015; 19:158โ165.
Muthukumaraswamy SD . High-frequency brain activity and muscle artifacts in MEG/EEG: a review and rec-ommendations. Frontiers in human neuroscience. 2013; 7:138.
Nie Y , Wang L, Liu B, Cao J. Supervised functional principal component analysis. Statistics and Computing.2018; 28:713โ723.
Nordhausen K , Virta J. An overview of properties and extensions of FOBI. Knowledge-Based Systems. 2019;173:113โ116.
Nunez PL , Srinivasan R. Electric ๏ฌelds of the brain: the neurophysics of EEG. Oxford: Oxford University Press;2006.
Ocaña FA, Aguilera AM, Escabias M. Computational considerations in functional principal component analysis. Computational Statistics. 2007; 22:449–465.
Peรฑa C , Prieto J, Rendรณn C. Independent components techniques based on kurtosis for functional data analysis.Universidad Carlos III de Madrid; 2014.
Pokora O , Kolacek J, Chiu T, Qiu W. Functional data analysis of single-trial auditory evoked potentials recordedin the awake rat. Biosystems. 2018; 161:67โ75.
Ramsay J , Silverman BW. Functional Data Analysis. New York: Springer; 2005.
Rice JA , Silverman BW. Estimating the Mean and Covariance Structure Nonparametrically When the Data areCurves. Journal of the Royal Statistical Society: Series B (Methodological). 1991; 53(1):233โ243.
Schรคfer J , Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications forfunctional genomics. Statistical Applications in Genetics and Molecular Biology. 2005; 4:1โ29.
Scheffler A, Telesca D, Li Q, Sugar CA, Distefano C, Jeste S, Şentürk D. Hybrid principal components analysis for region-referenced longitudinal functional EEG data. Biostatistics. 2018; 21:139–157.
Silverman BW . Smoothed functional principal components analysis by choice of norm. The Annals of Statistics.1996; 24:1โ24.
Singh YN , Singh SK, Ray AK. Bioelectrical signals as emerging biometrics: Issues and challenges. ISRN signalprocessing. 2012; 2012:1โ13.
Tian TS . Functional data analysis in brain imaging studies. Frontiers Psychology. 2010; 1:1โ11.
Tong S, Thakor NV. Quantitative EEG Analysis Methods and Clinical Applications; 2009.
Vidal M, Aguilera AM. pfica: Penalized Independent Component Analysis for Univariate Functional Data; 2020, https://CRAN.R-project.org/package=pfica, R package version 0.1.1.
Virta J , Li B, Nordhausen K, Oja H. Independent component analysis for multivariate functional data. Journalof Multivariate Analysis. 2020; 176:1โ19.
Wang X . Neurophysiological and computational principles of cortical rhythms in cognition. Physiological re-views. 2010; 90:1195โ1268.
Xiao L . Asymptotic theory of penalized splines. Electronic Journal of Statistics. 2019; 13:747โ794.
Yao F , Mรผller HG, Wang JL. Functional Data Analysis for Sparse Longitudinal Data. Journal of the AmericanStatistical Association. 2005; 100(470):577โ590.