Bi-Smoothed Functional Independent Component Analysis for EEG Artifact Removal
arXiv [stat.ME], Jan. Vidal, Rosso and Aguilera, Preprint, arXiv.org
P-spline smoothed functional ICA of EEG data
Marc Vidal, Mattia Rosso, Ana M. Aguilera. *For correspondence: [email protected] (M.V.); [email protected] (A.M.A.). Present addresses: † Department of Statistics and O.R., University of Granada, Granada 18071, Spain; ‡ Department of Musicology, IPEM, Ghent University, Gent 9000, Belgium.
Abstract
We propose a novel functional data framework for artifact extraction and removal to estimate brain electrical activity sources from EEG signals. Our methodology is derived on the basis of event-related potential (ERP) analysis, and motivated by mapping adverse artifactual events caused by body movements and physiological activity originating outside the brain. A functional independent component analysis (FICA) based on the use of fourth moments is conducted on the principal component expansion in terms of B-spline basis functions. We extend this model setup by introducing a discrete roughness penalty in the orthonormality constraint of the functional principal component decomposition, which is later used to compute estimates of FICA. Compared to other ICA algorithms, our method combines a regularization mechanism stemming from the principal eigendirections with a discrete penalization given by the d-order difference operator. In this regard, it naturally controls high-frequency remnants of neural origin that overlap latent artifactual eigenfunctions, and thus preserves this persistent activity at the artifact-extraction level. Furthermore, we introduce a new cross-validation method for the selection of the penalization parameter which uses shrinkage to assess the performance of the estimators for functional representations with larger basis dimension and excess of roughness. This method is used in combination with a kurtosis measure in order to provide the optimal number of independent components. The FICA model is illustrated at functional and longitudinal dimensions by an example on real EEG data where a subject willingly performs arm gestures and stereotyped physiological artifacts. Our method can be relevant in neurocognitive research and related fields, particularly in situations where movement can bias the estimation of brain potentials.
Keywords: Electroencephalography; Fourth order blind identification; Functional data analysis; Functional independent component analysis; Karhunen-Loève expansion; Source separation
The human body is a complex self-regulatory system. As a result of physiological activity, some particular organs and individual cells undergo changes in electrical potentials, generating a bioelectrical signal (Singh et al., 2012) which can be recorded, monitored, or processed in real time for biomedical applications. Such signals are continuous in nature, but are sampled in a discrete set of observations with a temporal resolution that directly depends on the sampling rate of the recording device. The higher the sampling frequency, the higher the number of observations per time unit, and the better the approximation to the local shape of the data.

In the field of neurophysiology, electroencephalography (EEG) represents one of the few techniques providing a direct measure of bioelectrical brain activity, as oscillations in the excitability of populations of cortical pyramidal cells (Wang, 2010) contribute to variations in the electrical potentials over the scalp. Oscillations are characterized by dominant intrinsic rhythms conventionally grouped into frequency bands, which are by now validated as markers of several neurocognitive phenomena (Buzsáki, 2006). However, despite the temporal resolution achievable with its high sampling rate, EEG is a technique that suffers from a low signal-to-noise ratio. This is mainly due to the fact that the layers of tissue dividing the electrodes from the cortex act as a natural filter, attenuating genuine brain activity and mixing the sources: due to volume conduction, the activity originating from each single dipole is picked up by electrodes at all scalp locations, fading as a function of distance from the origin (Nunez and Srinivasan, 2006).
Furthermore, the dominant brain-related spectral features often overlap with artifactual activity in higher frequency bands (Castellanos and Makarov, 2006; Muthukumaraswamy, 2013), and particularly at lower frequencies most of the variance in the signal is explained by physiological sources outside the brain. For these reasons, analyzing EEG signals can ultimately be viewed as solving a source-separation problem with the goal of estimating brain potentials of interest.

Blind source separation (BSS) techniques such as independent component analysis (ICA) are commonly used to address artifact detection and correction of EEG signals. The term ICA encompasses a broad scope of algorithms and theoretical rudiments aligned with the assumption of independence of the underlying sources. ICA may be generally defined as an unsupervised statistical tool used to isolate mutually independent components from a sample assumed to be generated by a mixture of unknown marginal distributions. From the statistical perspective, it can be regarded as an extension of principal component analysis (PCA) that goes beyond the variance patterns of the data by introducing higher-order statistical measures such as kurtosis or negentropy. Since the ICA problem was framed in Herault and Jutten (1986), the technique has taken the form of sophisticated algorithms with varying approaches. In fact, the study of artifactual activity from observed multivariate EEG signals can be performed with a wide collection of BSS algorithms. Among the most popular, we mention fastICA (Hyvärinen and Oja, 1997), Infomax (Bell and Sejnowski, 1995) and the joint approximate diagonalisation of eigenmatrices (JADE; Cardoso and Souloumiac, 1993). In this paper, however, we will tackle the separation of source artefacts using fourth-order blind identification (FOBI; Cardoso, 1989), which is a classical solution based on the decomposition of the kurtosis matrix, or the weighted covariance.
Our choice is motivated by recent extensions of this method to the functional data paradigm. Indeed, the FOBI estimator has interesting properties (see Nordhausen and Virta, 2019, for a review) and is easy to compute, which makes it a suitable method for exploring and generalizing the ICA framework in functional spaces. We will use this kind of data to prove the validity of the method in extracting physiological artifacts.

Functional data analysis (FDA) is a branch of modern statistics with active research in methodological developments for sampling units in the form of functions instead of vectors of measurements as in multivariate analysis. In practice, however, curves are observed in a finite set of sampling points, and thereby the first step in FDA is to convert these values into a functional structure that mimics the original (continuous) nature of the sample paths. As functional data are inherently infinite-dimensional, generalizations of multivariate objects such as inverses become an important issue, which complicates the implementation of ICA in functional spaces. In this paper, we consider the projection of the sample curves onto a finite-dimensional space generated by a suitable basis of functions in order to solve the ICA problem in functional terms.

The use of functional data in brain imaging analysis has been gaining notoriety in recent years despite the problems arising in its application. These include a considerable computational cost due to the large volume of data and the high multicollinearity of electrode signals in regionally dense spatial sets. Moreover, the assumption of normality of the error term in the functional model (see (1)) should not be taken pragmatically, as it can be biased by sources of noise originating from non-neural agents (Tian, 2010). On the other hand, data acquired from EEG can elicit a wide variety of FDA methods, going from the estimation of smooth sample curves to more advanced reduction and prediction techniques.
An application of FDA to EEG data was proposed in Pokora et al. (2018) to study auditory evoked potentials (AEPs) using cross-distance measures on functional responses and their derivatives approximated by cubic smoothing B-splines. Scheffler et al. (2018) introduced a novel hybrid principal component analysis (HPCA) for high-dimensional data consisting of a PCA in the frequency domain along regional dimensions (electrode group) combined with a functional PCA in the longitudinal and functional dimensions, assuming, in the decomposition, the notion of weak separability among marginal covariances. This procedure is beneficial in the sense that it avoids collapsing sparse data in any of the considered dimensions. Similarly, Hasenstab et al. (2017) proposed a multidimensional functional PCA that preserves the singularity of the event-related potentials (ERPs) over the longitudinal domain from its moving average. EEG data were also used in the context of functional linear regression to test a supervised version of functional PCA (Nie et al., 2018). As noted, most of the FDA methods used in EEG brain studies focus on modelling data free of artifactual sources. However, the efficiency of FDA as a signal pre-processing tool at stages where data are contaminated by physiological artifacts has, to the best of our knowledge, not yet been tested.

During the past decade, neurocognitive research started to move towards ecological, mobile and interactive experimental scenarios (Gramann, 2014). In this context, full-body movements such as gait, trunk sway and arm gestures are likely to exacerbate the artefacts most commonly encountered in neurocognitive research (e.g., blinks, ocular movements, temporo-mandibular muscular activity, sweat, etc.), and bring into the scenario high-amplitude mechanical artefacts due to cable movements.
Given the need to overcome these issues, we propose a framework which accounts for both the most stereotyped artifacts and the ones strictly related to body movement. In order to demonstrate its validity, we reproduced a typical experimental scenario where a human participant had to perform full-arm movements synchronised to a periodic auditory stimulus during an EEG recording. Arguably, what we provide here is a paradigmatic example wherein the researcher needs to clean the signal from motion-related artefacts while preserving activity genuinely related to perceptual and motor brain processes.

Nevertheless, the main contribution of this work does not end with an ordinary estimation and removal of these artifacts; we propose a functional-valued denoising tool based on B-spline expansions that takes advantage of FDA smoothing techniques to preserve high-frequency remnants of neural origin overlapping latent artifactual sources. An earlier work that might resemble our method applies the discrete wavelet transform (DWT) directly to the data, using different levels of decomposition to smooth the approximation coefficients in order to obtain ocular artifacts free of neural data (Kelly et al., 2011). Further approaches combine the wavelet decomposition and ICA to denoise the captured artifactual components (Akhtar et al., 2012; Mahajan and Morshed, 2015). In spite of the obvious differences between the two kinds of data, our ICA model provides a component estimation based on a P-spline smoothed approach (Eilers and Marx, 1996; see also Xiao, 2019), which has a lower computational cost and is mathematically simpler. Moreover, in the proposed method, smoothing is the primary property that prompts the FICA decomposition of the EEG signal rather than an auxiliary strategy used to correct the estimates.
However, what differentiates our method from others is that the decomposition is naturally regulated by the principal component eigendirections and optimized by means of penalized estimators, whereas in wavelet decomposition this is decided on the basis of frequency-band features of the data or the components, as the case may be. The end user will finally appreciate how artifact extraction can be fine-tuned by regulating a single smoothing parameter, making it intuitive to improve the outcomes by means of a visual inspection of the independent component scores.

The paper is organized as follows. In Section 2 we start by defining the FICA framework from an infinite-dimensional space perspective to further develop the theoretical foundations of our model. Section 3 introduces the novel regularized FICA decomposition from the basis expansion representations of functional data. An innovative method for the selection of the model parameters based on the normalized kurtosis of the independent component vectors is also proposed. To prove the performance of our methods, in Section 4 we use real EEG data in single and group-level ERP designs. We also provide guidelines and the procedure for artifact detection and removal to obtain clean EEG functional data. Finally, some concluding remarks and prospective research directions are presented in Section 6.
The extension of ICA to functional data has not yet received the attention nor the prolific developments of its predecessor, functional principal component analysis (FPCA). A first attempt to implement independent techniques for functional data was proposed in Peña et al. (2014), where the kurtosis (FOBI) operator is defined as an estimator over an approximation to a separable infinite-dimensional Hilbert space. In this space setting, the independent component weight functions are expected to be rougher, as the space does not contain functions that are pointwise convergent. Their approach focuses on the classification properties of the kurtosis operator, whose decomposition is assumed to have a form similar to the Fisher discriminant function.

A version of functional ICA based on the FOBI method that can be regarded as an extension of its multivariate counterpart was first developed in Li et al. (2015). The distinctive aspect that characterizes the model is that the ICA procedure starts from the Karhunen-Loève (K-L) expansion (Ash and Gardner, 1975, p. 37), which is a less rough version of the original sample space since it stems from its optimality in the least-squares error sense. We extend this model, which throughout the paper is referred to as functional ICA in terms of principal components, or KL-FICA, endowing the space with a different geometrical structure given by a Sobolev inner product to control the roughness of the latent functions. In a sense, both approaches can be considered a refinement of Peña et al. (2014). More recently, Virta et al. (2020) proposed a version of the KL-FICA model for multivariate functional data using FOBI and JADE estimators.
Basic model setup
Let $x_i = (x_{i1}, \ldots, x_{im_i})^\top$ be a signal of $n$ ($i = 1, \ldots, n$) components digitized at the sampling points $t_{ij}$ ($j = 1, \ldots, m_i$). Consider that the sample data are observed with error, so that they can be modeled as
$$x_{ij} = x_i(t_{ij}) + \varepsilon_{ij}, \qquad (1)$$
where $x_i$ is the $i$-th functional trajectory of the signal and the $\varepsilon_{ij}$ are mutually independent measurement errors with zero means. The functions $x_1, \ldots, x_n$ are assumed to be independent and identically distributed copies of a random functional variable $x$ in $L^2(T)$, the separable Hilbert space of square integrable functions from $T$ to $\mathbb{R}$, endowed with inner product $\langle \cdot, \cdot \rangle$ and the induced norm $\|\cdot\|$. Throughout the text, $x$ is assumed to have zero mean and finite fourth moments, i.e. $E\|x\|^4 < \infty$, which implies that higher-order operators are well defined.

The concept of independent components of a random vector cannot be immediately extended to the case of Hilbert-valued random elements (functional data), because a probability density function is not generally defined in this context. Here, we use the definition of independence first introduced by Gutch and Theis (2012) and Li et al. (2015) in the ICA framework for infinite-dimensional spaces, which basically states that a random functional variable has independent components if the coordinates obtained after projecting onto a given orthonormal basis are independent variables. Then, the aim of FICA is to find an operator $\Gamma$ such that $\langle \Gamma x, \varphi_j \rangle$ ($j = 1, \ldots, q$) are mutually independent variables, with $\{\varphi_j : j \leq q\}$ a truncated orthonormal basis of $L^2(T)$. As functional data are not inherently Gaussian, our IC model is based on the assumption that if $x$ is approximately represented by a truncated orthonormal basis, then it admits the FICA decomposition. Otherwise, by considering $x$ generated by a Gaussian process, an FPCA will suffice to obtain the independent components (Ash and Gardner, 1975, p. 40).
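As an illustration of the sampling model (1), the following sketch simulates discretized curves observed with noise. The sizes, the toy Fourier-mode signal and the noise level are hypothetical choices for illustration only, not the paper's data.

```python
import numpy as np

# Minimal simulation of model (1): noisy discrete observations
# x_ij = x_i(t_ij) + e_ij of smooth latent trajectories.
rng = np.random.default_rng(0)
n, m = 20, 200                        # number of curves and sampling points
t = np.linspace(0.0, 1.0, m)          # common sampling grid t_j

# Smooth zero-mean trajectories x_i(t): random mixtures of two Fourier modes
coef = rng.normal(size=(n, 2))
X_true = coef[:, [0]] * np.sin(2 * np.pi * t) + coef[:, [1]] * np.cos(4 * np.pi * t)

# Mutually independent, zero-mean measurement errors e_ij
X_obs = X_true + rng.normal(scale=0.1, size=(n, m))
```

In practice, the rows of `X_obs` play the role of the digitized signals $x_i$ that are subsequently converted into functional form.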
This IC model inevitably raises the question of the choice of basis. In our FICA approach, the sample curves are reconstructed using the K-L expansion, meaning that the chosen basis is provided by the decomposition of the covariance operator.
For an arbitrary function $h \in L^2(T)$ and $s \in T$, we define the sample covariance operator as
$$\mathcal{C}(h)(s) = n^{-1} \sum_{i=1}^{n} \langle x_i, h \rangle\, x_i(s) = \langle C(s, \cdot), h \rangle,$$
which, induced by the covariance kernel of $x$, $C(s,t) = n^{-1} \sum_{i=1}^{n} x_i(s) x_i(t)$, $s, t \in T$, transforms $h$ into another function of $L^2(T)$. Then, Mercer's theorem provides the eigendecomposition
$$C(s,t) = \sum_{j=1}^{\infty} \lambda_j \varphi_j(s) \varphi_j(t),$$
denoting by $\{\lambda_j, \varphi_j\}_j$ the positive sequence of eigenvalues in descending order and their associated orthonormal eigenfunctions. As a reminder, observe that when the kernel is Hermitian (symmetric) and positive definite, uniformly converging series expansions are obtained for both the kernel and its associated operator. It follows that, for every $t \in T$, we can approximately represent $x_i(t)$ by a truncated K-L expansion
$$x_{iq}(t) = \sum_{j=1}^{q} z_{ij} \varphi_j(t),$$
where $z_{ij} = \langle x_i, \varphi_j \rangle$ are zero-mean random variables with $\mathrm{var}(z_j) = \lambda_j$ and $\mathrm{cov}(z_j, z_{j'}) = 0$ for $j \neq j'$. These variables are referred to as the principal component scores. Moreover, if the $q$-term of the K-L expansion is optimally selected, the mean squared error is minimized (Ghanem and Spanos, 1991, p. 21), providing the best linear approximation to $x_i(t)$.

Our main assumption is that the target functions can be found in the space spanned by the first $q$ eigenfunctions of $\mathcal{C}$, as it is endowed with a second-order structure represented by the major modes of variation of $x_i(t)$. Thus, in such an eigensubspace some accuracy is expected to be gained in the forthcoming results, due to the attenuation of the higher oscillation modes corresponding to the small eigenvalues of $\mathcal{C}$. In the following, we denote by $\mathcal{H}_q = \mathrm{span}\{\varphi_1, \ldots, \varphi_q\}$ the subspace spanned by the $q$ first eigenfunctions of $\mathcal{C}$, which form the chosen basis for our IC model.
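On a discrete grid, the covariance eigenanalysis and the truncated K-L expansion can be sketched as follows. The uniform-grid quadrature weight and the toy rank-3 sample are illustrative assumptions; in the paper the decomposition is carried out through basis expansions rather than raw grids.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 30, 150, 3
t = np.linspace(0, 1, m)
modes = np.vstack([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(3 * np.pi * t)])
X = rng.normal(size=(n, 3)) @ modes    # sample curves on the grid
X = X - X.mean(axis=0)                 # zero-mean assumption

w = (t[-1] - t[0]) / (m - 1)           # quadrature weight of the uniform grid
C = X.T @ X / n                        # covariance kernel values C(t_k, t_l)
lam, Phi = np.linalg.eigh(C * w)       # discretized eigenproblem of the operator
lam, Phi = lam[::-1], Phi[:, ::-1]     # eigenvalues in descending order
Phi = Phi / np.sqrt(w)                 # normalize so that int phi_j(t)^2 dt = 1

Z = X @ Phi[:, :q] * w                 # PC scores z_ij = <x_i, phi_j>
X_q = Z @ Phi[:, :q].T                 # truncated K-L reconstruction x_iq(t)
```

With the toy sample lying in a three-dimensional subspace, the three-term reconstruction recovers the curves essentially exactly, and the sample variances of the score columns match the leading eigenvalues.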
Without loss of generality, $\mathcal{H}_q$ will be assumed to preserve the inner product of $L^2(T)$.

Most multivariate ICA methods require the standardization of the observed data with the inverse square root of the covariance matrix, in order to remove any linear dependencies and normalize the variance along its dimensions. In infinite-dimensional spaces, however, covariance operators are not invertible, giving rise to an ill-posed problem. As long as our signal is represented in $\mathcal{H}_q$, no regularization is needed and, consequently, the inverse of the covariance operator is well defined. Then, the first step towards the estimation of the independent components consists of a transformation of the K-L coefficients (PCs) with respect to the usual Euclidean geometry. Since standardization is a particular case of whitening (or sphering), we can generalize the procedure in the form of a whitening operator $\Psi$ that transforms a function in $\mathcal{H}_q$ into a standardized function in the same space. This implies that $\Psi(x_i) = \tilde{x}_i$ is a standardized functional variable whose covariance operator $\mathcal{C}_{\tilde{x}}$ naturally satisfies being the identity on this space.

By analogy with FPCA, the decomposition of the FOBI (kurtosis) operator follows, defined as
$$\mathcal{K}(h)(s) = n^{-1} \sum_{i=1}^{n} \langle \tilde{x}_{iq}, \tilde{x}_{iq} \rangle \langle \tilde{x}_{iq}, h \rangle\, \tilde{x}_{iq}(s) = \langle K(s,\cdot), h \rangle, \qquad (2)$$
where $K(s,t) = n^{-1} \sum_{i=1}^{n} \|\tilde{x}_{iq}\|^2\, \tilde{x}_{iq}(s)\, \tilde{x}_{iq}(t)$, $s, t \in T$, denotes the FOBI kernel function of $\tilde{x}_i$, and $h$ is the function in $\mathcal{H}_q$ to be transformed. As in Li et al. (2015), we assume the FOBI operator to be positive definite, Hermitian and equivariant, such that there exists the eigendecomposition
$$K(s,t) = \sum_{j=1}^{q} \alpha_j \psi_j(s) \psi_j(t), \qquad j = 1, \ldots, q,$$
where $\{\alpha_j, \psi_j\}_j$ is a positive sequence of distinct eigenvalues and related eigenfunctions.
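At the level of coordinates, the whitening step amounts to multiplying the score matrix by the inverse square root of its covariance. A minimal sketch, with illustrative toy scores:

```python
import numpy as np

# Whitening of PC scores: the covariance of the standardized scores is I_q.
rng = np.random.default_rng(2)
n, q = 500, 3
Z = rng.normal(size=(n, q)) * np.array([3.0, 2.0, 0.5])   # toy score matrix
Z = Z - Z.mean(axis=0)

Sigma = Z.T @ Z / n                                       # covariance of the scores
vals, vecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T    # symmetric Sigma^{-1/2}
Z_tilde = Z @ Sigma_inv_sqrt                              # whitened scores
```

The symmetric square root is one of several valid whitening choices; any matrix $W$ with $W^\top W = \Sigma^{-1}$ would do.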
With this, we are now ready to define the independent components of $x_{iq}$ as generalized linear combinations with maximal kurtosis, given by $s_{ij,\tilde{x}} = \langle \tilde{x}_{iq}, \psi_j \rangle$. Challenging questions arise on how the Karhunen-Loève theorem might be applied in this context. Intuitively, we note that this procedure leads to the expansion $\tilde{x}_{iq}(t) = \sum_{j=1}^{q} s_{ij,\tilde{x}} \psi_j(t)$, which can also be approximated in terms of $c$ eigenfunctions $\psi_j$ of interest, e.g. those associated with the independent components with the most extreme kurtosis values. Under mild conditions, this problem was solved in Li et al. (2015) and Virta et al. (2020) by choosing $c = q$. However, there are other possibilities, such as considering $c < q$, or $\psi_j$ as a basis of projection for either $x$, $x_i$ or $\tilde{x}_i$, in view of the fact that it preserves the fourth-order structure of the standardized data. In our EEG study, we propose to project the original functions onto the basis $\{\psi_1, \ldots, \psi_q\}$ to discern artifactual patterns. We then subtract the space generated by a basis of selected artifactual eigenfunctions from the original curves to obtain a new sample that can be regarded as an estimate of the genuine brain activity.

Silverman (1996) introduced a method that uses the $d$-order differential operator to control the roughness of the weight functions in order to estimate smooth (or regularized) functional principal components. By this heuristic, it incorporates a fixed roughness penalty into the orthonormality constraint between principal components, unlike other approaches that penalize the variance (Rice and Silverman, 1991) or the covariance operator (Yao et al., 2005). In Aguilera and Aguilera-Morillo (2013b), two alternative versions of smoothed FPCA are discussed, taking advantage of a discrete penalization in terms of adjacent B-spline coefficients (P-spline penalty).
In a sense, the smoothed functional ICA considered here is based on the second FPCA version in Aguilera and Aguilera-Morillo (2013b), which incorporates the P-spline penalty in the orthonormality constraint.

In order to estimate the P-spline smoothed PCs, we assume next that the sample paths belong to a finite-dimensional analogue of the Hilbert space $L^2(T)$ spanned by a basis $\{\theta_1(t), \ldots, \theta_p(t)\}$. Each function of the sample can be expanded as $x_i(t) = \sum_{l=1}^{p} a_{il} \theta_l(t)$, where the $a_{il}$ are random coefficients. In matrix form, $x = A\theta$, where $A = (a_{il}) \in \mathbb{R}^{n \times p}$ is the coefficient matrix associated with the basis $\theta = (\theta_1, \ldots, \theta_p)^\top$, and $x = (x_1, \ldots, x_n)^\top$. For the rest of this section, we consider the sample curves to be expanded in terms of B-spline basis functions.

Under model (1), the basis coefficients of the sample curves can be found by least squares approximation, minimizing the mean squared error (MSE) for each $i$, i.e.,
$$(\hat{a}_{i1}, \ldots, \hat{a}_{ip})^\top = \operatorname*{argmin}_{a_i} \sum_{j=1}^{m_i} \Big\{ x_{ij} - \sum_{l=1}^{p} a_{il} \theta_l(t_{ij}) \Big\}^2,$$
and thus the estimate of $a_i = (a_{i1}, \ldots, a_{ip})^\top$ that minimizes the MSE is $\hat{a}_i = (\Theta_i^\top \Theta_i)^{-1} \Theta_i^\top x_i$, where $\Theta_i = \{\theta_l(t_{ij})\} \in \mathbb{R}^{m_i \times p}$. For general guidance on both basis selection and its order, we refer the reader to Chapters 3 and 4 in Ramsay and Silverman (2005). Although in this paper a non-penalised B-spline basis is assumed to approximate the functional representations of the data, Aguilera and Aguilera-Morillo (2013a) give a detailed account of how to solve this problem using different roughness penalty approaches (continuous and discrete) for estimating curves in terms of B-spline bases.

The truncated K-L expansion is generated by the first $q$ eigenfunctions ($q \leq p$) of the covariance operator, which are given in terms of the B-spline basis as
$$\varphi(t) = \sum_{l=1}^{p} b_l \theta_l(t) = \theta(t)^\top b,$$
with $b = (b_1, \ldots, b_p)^\top$ being its vector of basis coefficients. The discrete P-spline roughness penalty function is defined by $\mathrm{PEN}_d(\varphi) = b^\top D_d b$, where $D_d$ is a penalty matrix given by $D_d = (\Delta_d)^\top \Delta_d$, with $\Delta_d$ being the matrix representation of the $d$-order difference operator. Throughout the paper a penalty order $d = 2$ is usually assumed, which is equivalent to
$$b^\top D_2 b = (b_1 - 2b_2 + b_3)^2 + \cdots + (b_{p-2} - 2b_{p-1} + b_p)^2,$$
so that the difference matrix $\Delta_2$ has the form
$$\Delta_2 = \begin{pmatrix} 1 & -2 & 1 & & \\ & 1 & -2 & 1 & \\ & & \ddots & \ddots & \ddots \end{pmatrix}.$$
According to Aguilera and Aguilera-Morillo (2013b) and Silverman (1996), the weight functions $\varphi$ are determined by maximizing $\mathrm{var}\langle x, \varphi \rangle$ subject to orthonormality with respect to the Sobolev-type inner product given by $\langle f, h \rangle_\lambda = \langle f, h \rangle + \lambda\, c^\top D_d b$, with $c$ and $b$ being the vectors of basis coefficients of the functions $f(t)$ and $h(t)$, respectively.
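The $d$-order difference penalty matrix can be built directly from differences of the identity matrix. A minimal sketch; the helper name is ours, not from the paper:

```python
import numpy as np

def difference_penalty(p, d=2):
    """D_d = (Delta_d)^T Delta_d for p basis coefficients."""
    Delta = np.diff(np.eye(p), n=d, axis=0)   # (p - d, p): rows are d-th differences
    return Delta.T @ Delta

D2 = difference_penalty(6, d=2)
b_lin = np.arange(6, dtype=float)   # a linear coefficient sequence has zero
pen = b_lin @ D2 @ b_lin            # second differences -> penalty 0
```

Since $b^\top D_2 b$ sums the squared second differences of adjacent coefficients, polynomial coefficient sequences of degree below $d$ are left unpenalized.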
This is equivalent to maximizing the penalized sample variance defined by
$$\frac{\mathrm{var}\,\langle x, \varphi \rangle}{\langle \varphi, \varphi \rangle + \lambda \times \mathrm{PEN}_d(\varphi)} = \frac{b^\top G_\theta G_A G_\theta\, b}{b^\top (G_\theta + \lambda D_d)\, b}, \qquad (3)$$
where $G_\theta = (\langle \theta_l, \theta_{l'} \rangle)$ ($l, l' = 1, \ldots, p$), $G_A = n^{-1} A^\top A$, and $\lambda \geq 0$ is a penalty parameter that controls the trade-off between maximizing the sample variance and the strength of the penalty.

Because B-spline bases are non-orthogonal with respect to the usual $L^2$ geometry (inner product), we can apply a Cholesky factorization of the form $LL^\top = G_\theta + \lambda D_d$ in order to find a non-singular matrix that allows us to operate in terms of the B-spline geometrical structure induced in $\mathbb{R}^p$. Then, the solution leads to the eigenproblem of a Hermitian matrix,
$$L^{-1} G_\theta G_A G_\theta (L^{-1})^\top v_j = \lambda_j v_j, \qquad (4)$$
from which the coefficients of the weight functions are computed as $b_j = (L^{-1})^\top v_j$ and renormalized so that $b_j^\top G_\theta b_j = 1$. The $j$-th smoothed principal component is then obtained as $z_j = A G_\theta b_j$. Under this framework, the multivariate PCA of $A G_\theta (L^{-1})^\top$ in $\mathbb{R}^p$ and the P-spline smoothed FPCA of $x(t)$ in $L^2(T)$ are equivalent (see Section 4 in Aguilera and Aguilera-Morillo, 2013a).

Having estimated the weight function coefficients and principal component scores, assume next that the smooth K-L expansion is truncated at the $q$-th term; e.g. we may choose $q = p$. Then, the vector of sample curves is given by $x(t) = Z f(t)$, where $Z = (z_{ij}) \in \mathbb{R}^{n \times q}$ is the matrix of principal component coefficients (scores) with respect to the basis of smooth PC weight functions $f(t) = (f_1(t), \ldots, f_q(t))^\top$.

Following the ICA pre-processing steps, we first standardize the approximated K-L curves, defining the whitening operator as $\tilde{x}(t) = \Psi\{x(t)\} = \tilde{Z} f(t)$, with $\tilde{Z} = Z \Sigma_Z^{-1/2}$ being the matrix of standardized PCs and $\Sigma_Z^{-1/2} = \sqrt{n}\,(Z^\top Z)^{-1/2}$ the inverse square root of the covariance matrix of $Z$.
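A minimal sketch of the eigenproblem (3)-(4) follows. The knot placement, sample sizes, penalty value, and the quadrature used for the Gram matrix are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(3)
n, p, lam_pen = 40, 10, 1.0
grid = np.linspace(0.0, 1.0, 401)

# Cubic B-spline basis evaluated on a fine grid (clamped, equispaced knots)
knots = np.concatenate([np.zeros(4), np.linspace(0, 1, p - 2)[1:-1], np.ones(4)])
Theta = BSpline(knots, np.eye(p), 3)(grid)        # (401, p) basis matrix
w = 1.0 / (len(grid) - 1)
G = Theta.T @ Theta * w                            # Gram matrix <theta_l, theta_l'>

A = rng.normal(size=(n, p))                        # toy basis coefficients of curves
A = A - A.mean(axis=0)                             # zero-mean assumption
G_A = A.T @ A / n

Delta2 = np.diff(np.eye(p), n=2, axis=0)           # second-order difference operator
D = Delta2.T @ Delta2                              # P-spline penalty matrix
L = np.linalg.cholesky(G + lam_pen * D)            # L L^T = G_theta + lambda D_d

Linv = np.linalg.inv(L)
M = Linv @ G @ G_A @ G @ Linv.T                    # Hermitian matrix of eigenproblem (4)
evals, V = np.linalg.eigh((M + M.T) / 2)
evals, V = evals[::-1], V[:, ::-1]                 # descending order
B = np.linalg.solve(L.T, V)                        # b_j = (L^T)^{-1} v_j
B = B / np.sqrt(np.einsum('jk,jl,lk->k', B, G, B)) # renormalize b_j^T G b_j = 1
Z = A @ G @ B                                      # smoothed PC scores z_j = A G b_j
```

Setting `lam_pen = 0` recovers ordinary (unpenalized) FPCA in the B-spline geometry, which makes the effect of the penalty easy to inspect.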
As the described whitening transformation is essentially an orthogonalization of the probabilistic part of $x_i$, the matrix $\tilde{Z} \in \mathbb{R}^{n \times q}$ now satisfies $\Sigma_{\tilde{Z}} = I_q$.

Second, the diagonalization of the FOBI operator (expression (2)) of the standardized K-L curves $\tilde{x}_i(t)$ leads to the diagonalization of the FOBI matrix of the standardized PCs $\tilde{Z}$ as
$$K_{4,\tilde{Z}}\, u_j = \alpha_j u_j, \qquad j = 1, \ldots, q, \qquad (5)$$
where $K_{4,\tilde{Z}} = n^{-1} \sum_{i=1}^{n} \|\tilde{z}_i\|^2\, \tilde{z}_i \tilde{z}_i^\top = n^{-1}\, \tilde{Z}^\top D_{\tilde{Z}}\, \tilde{Z} \in \mathbb{R}^{q \times q}$, with $D_{\tilde{Z}} = \mathrm{diag}(\tilde{Z}\tilde{Z}^\top)$, and $\tilde{z}_i$ being the $q \times 1$ column vector containing the $i$-th row of the matrix $\tilde{Z}$. This means that the P-spline smoothed KL-FICA of $x(t)$ is obtained from the multivariate ICA of $Z$ in $\mathbb{R}^q$.

Expression (5) does not restrict $K_{4,\tilde{Z}}$ to be uniquely determined; in fact, several different definitions have been proposed since the classical formulation in Cardoso (1989). It is also worth noting the kurtosis matrix in Kollo (2008),
$$K_{4,\tilde{Z}} = n^{-1} \sum_{i=1}^{n} \tilde{z}_i^\top 1_q\, 1_q^\top \tilde{z}_i\, \tilde{z}_i \tilde{z}_i^\top + (q + 2)\, I_q,$$
which includes all mixed fourth moments by incorporating all-ones vectors, or the matrix proposed in Loperfido (2017) based on the dominant eigenpair of the fourth standardized moment. These FOBI extensions will be discussed in a future paper.

In this way, the KL-FICA decomposition of $x$ is obtained. The IC weight functions are now expressed as $\psi_j(t) = \sum_{l=1}^{q} u_{jl} f_l(t)$ ($j = 1, \ldots, q$), where the coefficient vectors $u_j$ are the eigenvectors of the predefined kurtosis matrix. Then, the independent components are obtained as $H_{\tilde{x}} = \tilde{Z} U$. Finally, the operator $\Gamma$ defining the FICA model is given by $\Gamma(x_{iq}) = f^\top U^\top \Sigma_Z^{-1/2} z_i$, with $z_i$ being the $q \times 1$ column vector containing the $i$-th row of the matrix $Z$.
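A coordinate-level sketch of the FOBI step (5) on simulated whitened scores is given below. The toy sources and the mixing matrix are hypothetical; note that FOBI requires the sources to have distinct kurtoses, so three different distributions are used.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 8000, 3
S = np.column_stack([
    rng.laplace(size=n),               # heavy-tailed source
    rng.uniform(-1.0, 1.0, size=n),    # light-tailed source
    rng.exponential(size=n) - 1.0,     # skewed, very heavy-tailed source
])
A_mix = rng.normal(size=(q, q))        # hypothetical mixing matrix
Z = S @ A_mix.T
Z = Z - Z.mean(axis=0)

# Whitening (as in the previous step): covariance of Z_t is the identity
vals, vecs = np.linalg.eigh(Z.T @ Z / n)
Z_t = Z @ vecs @ np.diag(vals ** -0.5) @ vecs.T

# FOBI (kurtosis) matrix K = n^-1 sum_i ||z_i||^2 z_i z_i^T and its eigenbasis
K = (Z_t * (Z_t ** 2).sum(axis=1, keepdims=True)).T @ Z_t / n
alpha, U = np.linalg.eigh(K)
H = Z_t @ U                            # estimated independent components
```

The columns of `H` recover the sources up to sign, permutation and scale, which is the usual ICA indeterminacy.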
Choosing $q$ according to the kurtosis excess of the coordinate vectors

The problem concerning the estimation of the independent component curves lies in finding an optimal truncation point of the K-L expansion and an appropriate selection of the penalty parameter. When $q$ approaches $n$, more of the higher oscillation modes of the standardized sample are induced in the estimation. Otherwise, we are denoising the weight functions from the fourth-order structure of the data. From this perspective, it is desirable to increase the value of $q$ so that the latent functions of the whitened space can be captured (Virta et al., 2020). Observe that this kind of regularization is not exactly the same as that provided by the P-spline penalization of the roughness of the weight functions. Attenuating the higher frequency components of the FPCA model does not necessarily affect an entire frequency bandwidth of the data. Thus, if the original curves are observed with independent error, it may overlap the estimation of the weight functions. In this context, smoothing would be appropriate. Once the value of $q$ is chosen, we should examine those components with kurtosis excess, contrary to FPCA, where the components associated with large eigenvalues are considered.

We next propose a new method to approach $q$ by defining a fourth-moment measure which expresses the degree of kurtosis excess in a given independent component coordinate space. Assume this space to be $s_{ij,\tilde{x}} = \langle \tilde{x}_{iq}, \psi_j \rangle$, i.e. the projection of the standardized original sample curves on the FOBI basis estimated from the FICA decomposition of $x$. In this independent setting, it seems more reasonable to evaluate the non-Gaussianity of the component vectors, as they provide the most direct eigenfunction contribution to the original sample.
Then, to calculate the degree of kurtosis excess in the ICs, we define a fourth-moment measure given by
$$KE_q = \Big\| \{ \mathrm{kurt}(H_{j,\tilde{x}}) \}_{j=1}^{q} \Big\|,$$
where $H_{j,\tilde{x}}$ is the $j$-th IC and $\mathrm{kurt}(H_{j,\tilde{x}}) = n^{-1} \sum_{i=1}^{n} \{ (s_{ij,\tilde{x}} - \bar{s}_{j,\tilde{x}}) / \sigma(s_{j,\tilde{x}}) \}^4 - 3$, which gives the normalized excess of kurtosis for each IC. Then the value of $q$ is selected according to
$$\operatorname*{argmax}_{q \leq n} \,(KE_q), \qquad (6)$$
over $q$-FICA decompositions of $x$. In our EEG example, (6) is iterated until the velocity of the eigenvalues $\Delta\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ associated with the FPCA of $x$ ceases locally to increase in the neighbourhood of the dominant eigenvalue. Velocity fluctuations can occur in the exponential decay of $\lambda_j$, meaning that asymptotic stability is not necessarily reached using this criterion. We do find, however, that this is a way of exploring independence in the high-variability structure of our data, and it also ensures that the FICA decomposition induces enough of the independent part of the model to separate the latent functions without losing accuracy in their estimation. In analysing EEG signals, this entails a major effectiveness in reducing the artifactual content to a few eigenfunctions, particularly for low-frequency physiological activity such as blinks and motor artifacts. The presence of other high-frequency muscular artifacts, however, places the researcher in a more challenging situation. The choice of an appropriate truncation point should be seen as a measure to improve the accuracy of the estimation of those artifacts so as to preserve modes of variability related to brain activity rhythms. We believe that, instead of adjusting $q$ to larger values of cumulated variance, an iterative FICA process to scale artifact removal may be considered to solve the problem.
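The measure $KE_q$ can be sketched as a small helper; the function names are ours, and the two-column toy IC matrix is illustrative only.

```python
import numpy as np

def excess_kurtosis(h):
    """Normalized excess kurtosis n^-1 sum ((h_i - mean)/sd)^4 - 3."""
    h = (h - h.mean()) / h.std()
    return float(np.mean(h ** 4) - 3.0)

def ke_measure(H):
    """KE_q = || (kurt(H_1), ..., kurt(H_q)) || for an n x q IC matrix H."""
    return float(np.linalg.norm([excess_kurtosis(H[:, j]) for j in range(H.shape[1])]))

# Heavy-tailed (artifact-like) components dominate the measure
rng = np.random.default_rng(5)
H = np.column_stack([rng.laplace(size=5000), rng.normal(size=5000)])
```

In the selection rule (6), `ke_measure` would be evaluated for each candidate truncation level $q$ and the maximizing $q$ retained.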
Penalty parameter selection
Leave-one-out cross-validation ($\mathrm{CV}$) is generally used to select the penalty parameter in order to achieve a desirable degree of smoothness of the weight functions. In a more explicit and condensed form, the $\mathrm{CV}$ procedure in our model consists of finding a value of $\lambda$ that minimizes
$$\mathrm{CV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| X_i - \hat{X}_i^{(-i)} \right\|^2, \qquad (7)$$
where $\hat{X}_i^{(-i)} = \sum_{j=1}^{q} \hat{z}_{ij}^{(-i)} \hat{f}_j^{(-i)}$ is the K-L representation of the $i$-th curve $X_i$ in terms of the first $q$ principal components, leaving that curve out of the estimation process. This method can be combined with (6) so that, once $\lambda$ is chosen for each fixed $q$, the optimum $q$ is the one that maximizes $\mathrm{KD}_q$.

Here, an important remark has to be made about the regularization of the weight functions. Castellanos and Makarov (2006) and references therein (see section 2.6) discuss the reliability of the estimated artifactual sources, as they are assumed to contain traces of independent leaked cerebral activity. This assumption complicates matters when oscillations related to brain rhythms are still observable on the estimated IC weight functions. Moreover, we found that cross-validation was not sensitive for a reasonably large basis dimension, forcing us to reformulate the strategy. To address this issue, the penalty parameter might be chosen subjectively to a suitable degree of smoothness; however, this can lead to distortion and poor extraction of the artifactual sources. For the results presented in this paper, we propose a novel adaptation of cross-validation which consists in replacing (7) by
$$\mathrm{BCV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2, \qquad (8)$$
where $\hat{X}_{i;\beta}^{(-i)}$ is a smoothed K-L representation for some baseline $\beta$, and $\lambda > 0$ is a value that increases the smoothing in the second term of the norm; here we assume $\beta = 0$. For a fixed $q$, (8) is iterated over a set of $\lambda$'s to find the one that minimizes $\mathrm{BCV}_q(\lambda)$, which provides the choice of the penalty parameter.
As in (7), it can be combined with (6) to select the most suitable $q$ via maximizing $\mathrm{KD}$. We call this method baseline cross-validation, as it operates across different K-L reconstructions of $X_i$ for a given baseline penalty parameter and a fixed $q$. This approach is more versatile and particularly useful when the original curves are extremely rough and have been approximated with a larger basis dimension. In addition, for a given $q$ it allows scoring more than one $\lambda$ as a result of the various relative minima it produces. The intuition behind baseline cross-validation is that there are several levels of smoothing that endow the estimator with the ability for predictive modelling; thus, the selection of the value that minimizes it is a merely practical matter in this work. The implementation of the method for a larger basis order using shrinkage estimators is given in Section 4. Let us note that no categorical rules are provided here for the selection of $\lambda$, as the physiological significance of smoothing is not tested in this paper; this is a matter for future research.

A further complication arises in evaluating the relationship between the kurtosis excess and the selected penalty parameter, from which $\mathrm{KD}$ for some $q$ is expected to be optimized with respect to $\lambda = 0$. In general, we advocate dismissing smoothing if $\mathrm{KD}$ diminishes, as this would indicate that regularization is naturally optimized in the sense of the principal eigendirections, and therefore the induced roughness may be attributable to the independent part of the model. In other words, the latent functions strongly depend on the induced noise to shape a more accurate independent space structure. However, this rule will be relaxed in order to estimate low-frequency artifactual curves free of persistent brain activity, even though it comes at the price of losing other interesting artifacts.
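The following is a simplified, runnable sketch of the baseline cross-validation idea on discretized curves. The penalized eigenproblem below (adding $-\lambda P$ to the covariance, with $P$ a $d$-order difference penalty) is a crude stand-in for the P-spline FPCA used in the paper, and the function names are our own:

```python
import numpy as np

def diff_penalty(p, d=2):
    """d-order difference penalty matrix P = D_d^T D_d on p coefficients."""
    D = np.diff(np.eye(p), n=d, axis=0)
    return D.T @ D

def loo_reconstruct(X, i, q, lam, P):
    """Rank-q reconstruction of row i of X from a fit that leaves it out.
    Roughness is discouraged by penalizing the covariance with lam * P."""
    Xmi = np.delete(X, i, axis=0)
    mu = Xmi.mean(axis=0)
    S = (Xmi - mu).T @ (Xmi - mu) / len(Xmi)
    _, evecs = np.linalg.eigh(S - lam * P)
    W = evecs[:, ::-1][:, :q]              # leading q penalized directions
    return mu + (X[i] - mu) @ W @ W.T

def bcv(X, q, lam, d=2):
    """Baseline cross-validation score, eq. (8) with beta = 0: compare
    the leave-one-out reconstructions at penalty 0 and penalty lam."""
    n, p = X.shape
    P = diff_penalty(p, d)
    return np.mean([np.sum((loo_reconstruct(X, i, q, 0.0, P)
                            - loo_reconstruct(X, i, q, lam, P)) ** 2)
                    for i in range(n)])
```

A grid search then keeps, for each fixed $q$, the $\lambda$ that minimizes `bcv(X, q, lam)`.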
We now discuss the efficacy of our method in estimating artifactual functions on real EEG data. A first use case is proposed in the context of event-related potential (ERP) analysis, where the researcher typically deals with a high number of short time series aligned to some event. We begin this section by describing the data and the method to reconstruct the functional form of the sample paths. Then, we analyze distinct short recordings containing common artifacts of induced movements and propose a procedure for artifact detection, correction and subtraction. The ordinary and penalized KL-FICA are compared. Finally, we propose an automated method for a group of functional ERPs.
Materials and experimental design
EEG data were recorded across different conditions from a single healthy subject (male, 35 years old) with a 64-channel eego mylab system, in the Art and Science Interaction Lab (ASIL) of Ghent University (Belgium). In a first session, the subject sat on a comfortable chair in front of a table and was instructed to perform the following classes of self-paced movements in separate single trials of 3 seconds: nodding, hand-tapping with a wide arm movement starting from the shoulder, eye-blinking and chewing. In a second session, the subject was instructed to tap his hand on the table in synchrony with a steady auditory stimulus in one condition, and to listen to the same stimulus in a condition without any movement involved. We recorded 100 trials per condition, divided into randomized blocks of 25 trials. The stimulus period was 750 ms. Movements were intentionally exaggerated to maximize potential movement-related artifacts.

Our purpose was two-fold. With the first recording, in the absence of sensory stimulation, we aimed at isolating the stereotyped artifacts and showing how we can estimate and selectively remove them from a heavily contaminated signal. In the second recording, we reproduced a minimal form of experimental design, contrasting a condition of listening while moving with a baseline condition of listening while sitting still. Having a baseline at our disposal, we could directly compare the outcome of our cleaning procedure with an uncontaminated experimental situation. Results will be shown on a selection of individual contaminated segments and on the grand-average ERPs for each condition. Any cognitive interpretation of the differences across conditions is beyond the scope of the present work.
Methods
Pre-processing and general parameter tuning.
All recordings were performed with a sampling rate of 1 kHz, so for a trial length of 3 seconds we have 3000 sampling points. The signal was high-pass and low-pass filtered using Butterworth filters (cut-offs at 0.5 and 30 Hz, orders 4 and 6, respectively). An additional notch filter was applied to suppress the 50 Hz power-line noise. For all the trials of the second recording, the filtered time series were baseline-corrected by subtracting the average of the −200 ms to 0 ms pre-stimulus interval from each time point.

Let $x_i(t_{ik})$ denote the EEG potentials for the signal component $i$ $(i = 1, \ldots, n)$ at time point $t_{ik}$ $(k = 1, \ldots, m)$ for each single trial; the FICA decompositions of the trials are conducted independently. In reconstructing the functional form of the sample paths, we sought a less smooth fitting, so that we could mimic the brain potential fluctuations. Accordingly, a basis of cubic B-spline functions of dimension $p = 230$ was fitted to all signal components, minimizing the mean squared error of the estimation for the B-spline coefficients to a negligible value.

Since we require a basis dimension greater than the number of signal components (sample size), a shrinkage covariance estimator (Schäfer and Strimmer, 2005) is considered to compute $\boldsymbol{\Sigma}_{\boldsymbol{A}}$ in the PCA algorithm. This method guarantees positive definiteness and, consequently, an estimate of the leading eigenvalues that is not biased upwards. The same strategy is used for baseline cross-validation. Recall the quadratic distances in (8). For the $i$-th trial, these distances in
terms of basis functions can be expressed as
$$\left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2 = \int_T \left[ \hat{X}_{i;\beta}^{(-i)}(t) - \hat{X}_{i;\beta+\lambda}^{(-i)}(t) \right]^2 dt = \int_T \left[ \sum_{j=1}^{q} z_{\beta,ij}^{(-i)} \sum_{l=1}^{p} b_{\beta,jl}^{(-i)} \phi_l(t) - \sum_{j=1}^{q} z_{\beta+\lambda,ij}^{(-i)} \sum_{l=1}^{p} b_{\beta+\lambda,jl}^{(-i)} \phi_l(t) \right]^2 dt = \int_T \left[ \sum_{l=1}^{p} d_{il} \phi_l(t) \right]^2 dt = \boldsymbol{d}_i^\top \boldsymbol{\Psi} \boldsymbol{d}_i,$$
where $\boldsymbol{d}_i = (d_{i1}, \ldots, d_{ip})^\top$. Next, the matrix $D = (d_{il}) \in \mathbb{R}^{n \times p}$ is reconstructed via shrinkage. That is, first we compute $\mathrm{cov}_s(D)$, where $\mathrm{cov}_s$ is the shrinkage covariance estimator; then we apply a Cholesky decomposition of the form $\boldsymbol{L}\boldsymbol{L}^\top = \mathrm{cov}_s(D)$. Finally, the basis coefficients of the reconstructed residual functions are given by $\tilde{\boldsymbol{d}}_i = \boldsymbol{L}^{-\top} \boldsymbol{d}_i$, and consequently now
$$\mathrm{BCV}_q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{X}_{i;\beta}^{(-i)} - \hat{X}_{i;\beta+\lambda}^{(-i)} \right\|^2 = \frac{1}{n} \sum_{i=1}^{n} \tilde{\boldsymbol{d}}_i^\top \boldsymbol{\Psi} \tilde{\boldsymbol{d}}_i.$$
The baseline cross-validation method combined with shrinkage estimators has proven to be sensitive at larger basis dimensions, and it also provides insights into the different levels of roughness of the sample curves at each $\lambda$. For the present application, we take the value that minimizes $\mathrm{BCV}_q(\lambda)$ for different values of $q$ and use $\mathrm{KD}$ in its optimization.

Artifact detection and removal.
The process of identifying artifactual functions has been conducted not only on the basis of their suggested shape, but also by using topographic maps that roughly represent patterns of eigenactivity related to the distribution of bio-electric energy along the scalp. These maps have been elaborated from the coordinates of the projection of $X_i$ onto the basis of independent weight functions,
$$\zeta_{ij,q} = \langle X_i, \psi_j \rangle, \quad j = 1, \ldots, q, \qquad (9)$$
whose resulting score vectors $\boldsymbol{\zeta}_{j,q} = (\zeta_{1j}, \ldots, \zeta_{nj})^\top$ associated with each eigenfunction are depicted in the spatial electrode domain. Thus, the points in these maps represent energy markers color-coded by their positive or negative eigenfunction contribution to each signal component $X_i$. Once the artifactual eigenfunctions have been identified, they can be manually selected, together with their corresponding projection coefficients, to reconstruct an expansion that can be subtracted from the original functional observations in order to remove the undesired artifactual curves. The projection coefficients on which the selection is made are obtained from (9). In order to simplify the burden of a manual selection, assume
$$X_{ia}(t) = \sum_{j=1}^{q} \zeta_{ij,q}\, \psi_j(t)$$
to be an expansion of components corresponding to their related artifactual eigenfunctions. Then the artifact subtraction in terms of basis expansions is given by
$$X_i(t) - X_{ia}(t) = \sum_{l=1}^{p} a_{il} \phi_l(t) - \sum_{j=1}^{q} \zeta_{ij,q} \sum_{l=1}^{p} (\boldsymbol{h}_j^\top \boldsymbol{f}_l)\, \phi_l(t) = \sum_{l=1}^{p} e_{il} \phi_l(t), \qquad (10)$$
where $e_{il}$ are the residual (or cleaned) coefficients, with $\boldsymbol{h}_j$ being the vector of coefficients of the independent weight function $\psi_j$ in terms of the principal eigenfunctions, and $\boldsymbol{f}_l$ being the vector of coefficients of the principal eigenfunctions in the basis $\phi_l$. This amounts to assuming that the full set of IC weight functions obtained from our model corresponds to a structure of underlying artifactual patterns of the EEG signal.
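In terms of basis coefficients, the subtraction in (10) is a plain matrix operation. A minimal sketch (our own naming; we pass the already-assembled products $\boldsymbol{h}_j^\top \boldsymbol{f}_l$ as the rows of a coefficient matrix):

```python
import numpy as np

def remove_artifacts(A, W_psi, zeta, art_idx):
    """Subtract the selected artifactual expansion from the basis
    coefficients of the observations, following eq. (10).

    A      : (n, p) basis coefficients a_il of the observed curves
    W_psi  : (q, p) basis coefficients of the IC weight functions psi_j,
             i.e. the products h_j^T f_l assembled row-wise
    zeta   : (n, q) projection scores zeta_ij from eq. (9)
    art_idx: indices of the components judged artifactual
    """
    art = np.asarray(art_idx)
    A_art = zeta[:, art] @ W_psi[art, :]   # coefficients of X_ia
    return A - A_art                       # cleaned coefficients e_il
```

With all ICs selected, as in our procedure, `art_idx = range(q)` subtracts the whole estimated artifactual space.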
Algorithm
KL-FICA estimation procedure for artifact reduction

1. Find a suitable B-spline representation of the sampled signal components. Estimate the matrix of coefficients $\boldsymbol{A}$ by least squares approximation.
2. Calculate the P-spline FPCA of $X$, or equivalently, the vector-valued PCA of $\boldsymbol{A}\boldsymbol{\Psi}\boldsymbol{L}^{-\top}$, to obtain the matrix of smooth principal components $\boldsymbol{Z}$ and the coefficients of the principal weight functions $\boldsymbol{f}_j$.
   (a) If $p > n$, consider shrinkage estimators to compute $\boldsymbol{\Sigma}_{\boldsymbol{A}}$.
   (b) Obtain a $\lambda$ for each $q$ using $\mathrm{BCV}$.
3. Perform the vector-valued ICA on the matrix $\boldsymbol{Z}$ for each $q$.
   (a) Whiten the matrix of principal components: $\tilde{\boldsymbol{Z}} = \boldsymbol{Z}\boldsymbol{\Sigma}_{\boldsymbol{Z}}^{-1/2}$.
   (b) Fix a fourth-order moment matrix of $\tilde{\boldsymbol{Z}}$ and diagonalize it. Obtain the distinct eigenvalues $\alpha_j$ and associated eigenvectors $\boldsymbol{u}_j$.
   (c) Calculate the components of interest by projecting the original observations on the basis of IC weight functions, $\zeta_{ij,q} = \langle X_i, \psi_j \rangle$, or simply $\boldsymbol{\zeta}_{j,q} = \boldsymbol{Z}\boldsymbol{u}_j$.
4. Choose an optimal $q$ using $\mathrm{KD}$. Select manually the artifactual components (we consider all ICs) and expand the artifactual space.
5. Subtract the artifactual coefficients using (10) to obtain the coordinates of the cleaned signal components. Repeat the procedure for each trial.

Stereotyped artifacts
We first present full results of the unpenalized and penalized KL-FICA decompositions on the single trials corresponding to motor-related artifacts in the absence of sensory stimulation. Baseline cross-validation was performed on a given grid of $\lambda$'s, selecting the value which minimizes $\mathrm{BCV}_q(\lambda)$ for $q = 1, \ldots, \nu$, where $\nu$ is defined as $\nu = \{ j : \max(\Delta\lambda_j) \}$, i.e., the index entry corresponding to the first relative maximum of the FPCA eigenvalues in first differences. Among all the results obtained, we selected the $q$ that, for its corresponding estimated $\lambda$, maximizes $\mathrm{KD}$.

Our preliminary results comparing both decompositions show that the smoothed KL-FICA provides more stylised functions revealing the latent form of the artifact. More important, however, is that all topographic maps of the chosen components reflect well-known spatial firings of the expected artifactual activity. A selection of eigenfunctions from each dataset and their associated projection scores depicted on a topographic map are presented in Figure 1. The third eigenfunction, for example, corresponds to continuous blinking, which is characterised by a high energy intensity in the prefrontal area. Physiological non-brain activity that occurs near the recording zone can be easily detected, as it becomes a source of contrasting amplitude with respect to brain activity or other artifacts. In this case, the smoothed KL-FICA presumably attenuates the high-frequency activity associated with brain patterns that interferes in the estimation. Such reasoning can be extended to the first and second artifactual eigenfunctions.
Notice that, when the artifact has low frequency (as the one captured by the first eigenfunction), $\lambda$ increases considerably, whereas other artifactual forms require less smoothing; see Table 1 for more details.

However, when artifacts are characterised by localised high-amplitude curves, as is the case for the fourth artifactual eigenfunction (chewing), smoothing is not able to mimic those curves and does not discriminate effectively the high-frequency activity produced by other potentials. As has been observed before, this happens because the weight function strongly depends on the noise provided by the fourth-order structure of the model to set up its underlying shape. Thus, these artifacts are quite sensitive to smoothing, which in turn can have the opposite effect by causing a loss of accuracy. In addition, we had to increase the range of $q$ set to compute $\mathrm{KD}$, leading to the selection among a larger number of eigenfunctions.
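Our reading of the rule defining $\nu$ above can be sketched as follows; the exact convention for locating the first relative maximum of the eigenvalue first differences is our assumption:

```python
import numpy as np

def nu_from_eigenvalues(eigvals):
    """Return the index (1-based) of the first relative maximum of the
    FPCA eigenvalues in first differences, used as the upper bound of
    the grid of q values scored by baseline cross-validation."""
    d = np.diff(np.asarray(eigvals, dtype=float))
    for j in range(1, len(d) - 1):
        if d[j] >= d[j - 1] and d[j] >= d[j + 1]:
            return j + 1
    return len(eigvals)      # fall back to the full sequence
```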
Table 1. Summary of different parameters and kurtosis of components after performing the P-spline KL-FICA on the different ERP trials. $\mathrm{KD}$ is estimated without penalization as well as for the $\lambda$ obtained with baseline cross-validation ($\mathrm{BCV}$). The shaded kurtosis values of the components are those related to the depicted eigenfunctions in Figure 1.

Trial      ν   q   KD (λ=0)   KD (λ_BCV)   λ_BCV   kurt(ζ1)   kurt(ζ2)   kurt(ζ3)   kurt(ζ4)   kurt(ζ5)
Nodding    6   5   17.80      45.86        4800    48.73      5.473      5.165      3.825      3.601
Arm mov.   4   4   4.000      4.518        1000    7.441      2.637      3.736      2.891
Blinks     4   3   4.128      3.606        300     6.567      3.115      2.476
Chewing

[Figure 1 panel titles: (1) Eigenfunction (nodding); (2) Eigenfunction (arm movement); (3) Eigenfunction (blinks); (4) Eigenfunction (chewing); time axis 0–3000 ms.]
Figure 1. (a) A selection of illustrative artifactual eigenfunctions from different single-trial ERP datasets using the unpenalized KL-FICA (grey) and P-spline KL-FICA (blue) decompositions. The eigenfunctions are ordered from low- to high-frequency characteristics. (b) The components obtained by projection of the depicted smooth eigenfunction on the original functional sample, distributed in the spatial electrode domain. (c) Comparison of peripheral channels before the extraction, correction and removal (shaded in beige) and after (non-shaded).
From these results it is clearly observed that the penalized KL-FICA provides a smooth separation of the most common artifacts. We also notice that the second eigenfunction, whose characterisation is of interest for the next section, successfully captures the arm movement in the stipulated time
course. This, however, is only the first half of the process to estimate the underlying brain activity. Our tests on such datasets have shown that a good approximation of brain sources can be obtained by subtracting the whole estimated artifactual space. It seems reasonable to conjecture that restricting $q$ to the first FPCA eigenvalues decreases the odds of obtaining spurious artifactual functions, as these correspond to the highest modes of variability of the FPCA decomposition. Moreover, by having selected an appropriate penalty parameter we may avoid the distortion of the underlying neural activity. Other methods could be considered to fine-tune the selection of artifacts (see Zima et al., 2012), although their application is not as straightforward for functional data.

Group-level ERP
To illustrate our method for a group of ERPs, we consider two functional datasets on the following experimental conditions: tapping to the sound involving an arm movement, and listening without movement, with 100 trials per condition. The P-spline KL-FICA is iteratively used to obtain brain estimates by subtracting the smooth artifactual curves in each trial. Here, the complexity of extracting artifacts increases, as we now assume a mixture of them as well as of other brain processes due to the cognitive task. The boxplots of the number of components obtained and of the $\mathrm{KD}$ measures using the P-spline KL-FICA for each condition are shown in Figure 3. As we expected, the number of components obtained in condition 1 (movement) is significantly higher than in condition 2 (no movement). In addition, note that $\mathrm{KD}$ takes larger values for condition 2; we believe this is a direct consequence of a homogeneous independent mixture of artifactual and cerebral activity.
[Figure 2 panels: tapping with arm movement vs. no movement; channels shown include Fp1, Fpz, AF8 and FT8; timeline of auditory stimuli and EEG recording.]
Figure 2. (a) Grand average across trials in some prefrontal channels where the artifactual activity is expected. Shaded curves show the raw data; non-shaded curves, the data after artifact removal. (b) A descriptive scheme of the arm movement and the corresponding time measures is provided at the bottom of the panels.
Figure 2 shows a contrast between conditions using the proposed approach. All curves from each channel were averaged across trials. The upper left panel displays frontal channels where the movement-related artifact is clearly visible before the subtraction. Further evidence of such artifactual content is given in the right panel, where the raw curves of the other condition are shown. Clearly, the accumulated artifacts across the trials have a different origin here. Finally,
the same panel shows the curves after the subtraction of the artifactual curves. We observe that our procedure notably reduces the movement-related artifactual activity and renders the signal more stationary. The same applies to the other condition, although the differences are smaller. We show that, in both conditions, our algorithm is still capable of reducing artifactual content while retaining brain activity. However, we may expect some loss of information or distortion (if estimates are oversmoothed) at trial level depending on the selected $\lambda$. We also observe that, as the response to the repeated stimulus is assumed to be invariant and small in terms of amplitude, averaging suppresses non-phase-locked activity and reveals the potential elicited by the stimulus (Tong and Thakor, 2009). Consequently, attenuating artifactual sources will lead to a better estimation of brain potentials after averaging than subtracting rough artifactual curves.
Figure 3.
Boxplots of the number of components computed using the iterative KL-PFICA on 100 trials per condition, and boxplots of the related $\mathrm{KD}$ measures.

In this paper, we proposed a smoothed FICA approach based on a discrete roughness penalty. The fourth-order blind identification is performed on the smoothed Karhunen-Loève expansion, extending the functional IC model introduced by Li et al. (2015) in combination with the smooth FPCA proposed by Aguilera and Aguilera-Morillo (2013b), which introduces the P-spline penalty in the orthonormality constraint between principal components. We have shown that conducting the multivariate ICA procedure on the coordinate vectors of the K-L expansion is equivalent to a particular functional ICA of the original B-spline expansions. This equivalence is inherited from the procedure proposed in Ocaña et al. (2007) to compute estimates of functional PCA under general settings, although now, since the eigenfunctions $f_j$ are orthogonal, the task is simplified. The asymptotic properties of Silverman's method of smooth FPCA have been studied by Lakraj and Ruymgaart (2017) using expansions of the perturbed eigensystem of a smoothed covariance operator; however, there is work to be done to ensure these desirable asymptotic properties for our kurtosis operator.

The goal of this paper was to develop the mathematical framework for the identification and removal of artifacts from functional EEG data, using smooth estimators to prevent the loss of brain activity. In our test experiments, the kurtosis operator has proven to work well in capturing artifactual forms with different frequency characteristics. One of the strengths of our model is the double regularisation, providing a more versatile approach which may be beneficial depending on the objectives of preprocessing contaminated EEG data.
In essence, the degree of separation is defined through the space dimension, from more dependent (first $q$ terms of the K-L) to more independent; thus $q$ acts as a regularization parameter in the sense of the PC eigendirections. An
additional penalization using the integrated $d$-order derivative might be considered to obtain more accurate estimations, especially when the obtained IC weight functions are assumed to contain traces of brain activity.

We also introduce some tools to approach $q$ and $\lambda$. The former is selected with the kurtosis distance, a measure that reflects the kurtosis excess in a given coordinate space; the latter, with baseline cross-validation, a novel approach that does not collapse information for $p > n$ by taking advantage of shrinkage estimators. To facilitate the identification of artifactual functions, the coordinate vectors associated with the IC weight functions of interest were represented in the spatial electrode domain on the scalp. The visual inspection of these topographical maps has revealed interesting insights into the composite spectra of the artifactual curves, helping to determine whether or not they should be subtracted.

In order to provide a more feasible instrument for the researcher, we have proposed a method for artifact removal for a group of contaminated functional ERPs. Although our method is quite flexible, it only allows assembling artifacts with certain characteristics, so it does not gather all the artifactual components in a single space, or at least, functions free of leaked neural activity. However, it is meant as a powerful tool for capturing and denoising low-frequency artifacts related to body movements and other physiological events. In this framework, a discussion is pending on the choice of the regularization parameters as well as on the physiological significance of smoothing. We observed that, in the trials where the subject was performing a motor and perceptual task, the values obtained by baseline cross-validation were more variable than in our tests. This is probably due in part to a more complex mixture of brain and artifactual activity.
Although our choice was to exploit the minimization of $\mathrm{BCV}$, there are clearly other possible approaches to consider. We see that our method paves the way for further developments in the field of neural signal processing: in the future, a review comparing performance with different FOBI estimators and extensions of other ICA methodologies would be interesting as well.
P-spline KL-FICA was conducted using a modified version of the pspline.ffobi function of the pfica R package (Vidal and Aguilera, 2020). The implemented code, together with a sample input dataset and documentation, is available on request from the corresponding author (M.V.).
Author Contributions
Marc Vidal: Conceptualization, Methodology, Software, Data collection, Writing – original draft.
Mattia Rosso: Experimental design, Data collection, Preprocessing, Writing – editing.
Ana M. Aguilera: Methodology, Writing – review and editing, Supervision.

Acknowledgments
We thank Daniel Gost for helping with figures and formatting. The research of Marc Vidal was supported by the Methusalem funding from the Flemish Government awarded to Marc Leman. The authors also acknowledge the support of the Spanish Ministry of Science, Innovation and Universities under project MTM2017-88708-P (also supported by the FEDER program) and of the research group FQM307 funded by the Government of Andalusia (Spain).
Con๏ฌict of Interest : None declared.
References
Aguilera AM , Aguilera-Morillo MC. Comparative study of di๏ฌerent B-spline approaches for functional data.Mathematical and Computer Modelling. 2013; 58:1568โ1579.
Aguilera AM, Aguilera-Morillo MC. Penalized PCA approaches for B-spline expansions of smooth functional data. Applied Mathematics and Computation. 2013; 219:7805–7819.
Akhtar MT , Mitsuhashi W, James CJ. Employing spatially constrained ICA and wavelet denoising, for automaticremoval of artifacts from multichannel EEG data. Signal Processing. 2012; 92:401โ416.
Ash RB , Gardner MF. Topics in stochastic processes. New York: Academic Press; 1975.
Bell AJ , Sejnowski TJ. An information-maximization approach to blind separation and blind deconvolution.Neural Computation. 1995; 7:1129โ1159.
Buzsรกki G . Rhythms of the Brain. Oxford University Press; 2006.
Cardoso JF. Source separation using higher order moments. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing; 1989. p. 2109–2112.
Cardoso JF, Souloumiac A. Blind beamforming for non-Gaussian signals. In: IEE Proceedings F – Radar and Signal Processing, vol. 140; 1993. p. 362–370.
Castellanos NP , Makarov VA. Recovering EEG brain signals: Artifact suppression with wavelet enhanced inde-pendent component analysis. Journal of Neuroscience Methods. 2006; 158:300 โ 312.
Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996; 11:89–121.
Ghanem R , Spanos P. Stochastic Finite Elements: A Spectral Approach. New York: Springer-Verlag; 1991.
Gramann K , An introduction to mobile brain/body imaging (MoBI); 2014.
Gutch H, Theis F. To infinity and beyond: On ICA over Hilbert spaces. In: Theis F, Cichocki A, Yeredor A, Zibulevsky M, editors. Latent Variable Analysis and Signal Separation; 2012. p. 180–187.
Hasenstab K , Sche๏ฌer A, Telesca D, Sugar CA, Jeste S, DiStefano C, Sentรผrk D. A multi-dimensional functionalprincipal components analysis of EEG data. Biometrics. 2017; 3(73):999โ1009.
Herault J, Jutten C. Space or time adaptive signal processing by neural models. In: Denker JS, editor. AIP Conference: Neural Networks for Computing, vol. 151. American Institute of Physics; 1986. p. 206–211.
Hyvรคrinen A , Oja E. A fast ๏ฌxed-point algorithm for independent component analysis. Neural Computation.1997; 9:1483โ1492.
Kelly JW , Siewiorek DP, Smailagic A, Collinger JL, Weber DJ, Wang W. Fully automated reduction of ocularartifacts in high-dimensional neural data. IEEE Transactions on Biomedical Engineering. 2011; 58:598โ606.
Kollo T . Multivariate skewness and kurtosis measures with an application in ICA. Journal of Multivariate Analysis.2008; 99(10):2328โ2338.
Lakraj GP, Ruymgaart F. Some asymptotic theory for Silverman's smoothed functional principal components in an abstract Hilbert space. Journal of Multivariate Analysis. 2017; 155:122–132.
Li B, Bever GV, Oja H, Sabolová R, Critchley F. Functional independent component analysis: an extension of the fourth-order blind identification; 2015, submitted.
Loper๏ฌdo N . A new kurtosis matrix, with statistical applications. Linear Algebra and its Applications. 2017;512:1โ17.
Mahajan R , Morshed BI. Unsupervised eye blink artifact denoising of EEG data with modi๏ฌed multiscale sampleentropy, kurtosis, and Wavelet-ICA. IEEE Journal of Biomedical and Health Informatics. 2015; 19:158โ165.
Muthukumaraswamy SD . High-frequency brain activity and muscle artifacts in MEG/EEG: a review and rec-ommendations. Frontiers in human neuroscience. 2013; 7:138.
Nie Y , Wang L, Liu B, Cao J. Supervised functional principal component analysis. Statistics and Computing.2018; 28:713โ723.
Nordhausen K , Virta J. An overview of properties and extensions of FOBI. Knowledge-Based Systems. 2019;173:113โ116.
Nunez PL , Srinivasan R. Electric ๏ฌelds of the brain: the neurophysics of EEG. Oxford: Oxford University Press;2006.
Ocaña FA, Aguilera AM, Escabias M. Computational considerations in functional principal component analysis. Computational Statistics. 2007; 22:449–465.
Peรฑa C , Prieto J, Rendรณn C. Independent components techniques based on kurtosis for functional data analysis.Universidad Carlos III de Madrid; 2014.
Pokora O , Kolacek J, Chiu T, Qiu W. Functional data analysis of single-trial auditory evoked potentials recordedin the awake rat. Biosystems. 2018; 161:67โ75.
Ramsay J , Silverman BW. Functional Data Analysis. New York: Springer; 2005.
Rice JA , Silverman BW. Estimating the Mean and Covariance Structure Nonparametrically When the Data areCurves. Journal of the Royal Statistical Society: Series B (Methodological). 1991; 53(1):233โ243.
Schรคfer J , Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications forfunctional genomics. Statistical Applications in Genetics and Molecular Biology. 2005; 4:1โ29.
Scheffler A, Telesca D, Li Q, Sugar CA, Distefano C, Jeste S, Şentürk D. Hybrid principal components analysis for region-referenced longitudinal functional EEG data. Biostatistics. 2018; 21:139–157.
Silverman BW . Smoothed functional principal components analysis by choice of norm. The Annals of Statistics.1996; 24:1โ24.
Singh YN , Singh SK, Ray AK. Bioelectrical signals as emerging biometrics: Issues and challenges. ISRN signalprocessing. 2012; 2012:1โ13.
Tian TS . Functional data analysis in brain imaging studies. Frontiers Psychology. 2010; 1:1โ11.
Tong S, Thakor NV. Quantitative EEG Analysis Methods and Clinical Applications; 2009.
Vidal M, Aguilera AM. pfica: Penalized Independent Component Analysis for Univariate Functional Data; 2020, https://CRAN.R-project.org/package=pfica, R package version 0.1.1.
Virta J , Li B, Nordhausen K, Oja H. Independent component analysis for multivariate functional data. Journalof Multivariate Analysis. 2020; 176:1โ19.
Wang X . Neurophysiological and computational principles of cortical rhythms in cognition. Physiological re-views. 2010; 90:1195โ1268.
Xiao L . Asymptotic theory of penalized splines. Electronic Journal of Statistics. 2019; 13:747โ794.
Yao F , Mรผller HG, Wang JL. Functional Data Analysis for Sparse Longitudinal Data. Journal of the AmericanStatistical Association. 2005; 100(470):577โ590.