Independent component analysis for multivariate functional data
Joni Virta, Bing Li, Klaus Nordhausen, and Hannu Oja

University of Turku, Finland; Pennsylvania State University, PA, U.S.A.; Vienna University of Technology, Austria
Abstract
We extend two methods of independent component analysis, fourth order blind identification and joint approximate diagonalization of eigenmatrices, to vector-valued functional data. Multivariate functional data occur naturally and frequently in modern applications, and extending independent component analysis to this setting allows us to distill important information from this type of data, going a step further than the functional principal component analysis. To allow the inversion of the covariance operator we make the assumption that the dependency between the component functions lies in a finite-dimensional subspace. In this subspace we define fourth cross-cumulant operators and use them to construct the two novel, Fisher consistent methods for solving the independent component problem for vector-valued functions. Both simulations and an application on a hand gesture data set show the usefulness and advantages of the proposed methods over functional principal component analysis.
Keywords:
Covariance operator; Dimension reduction; Functional principal component analysis; Fourth order blind identification; Hilbert space; Joint approximate diagonalization of eigenmatrices

∗ The research of Joni Virta and Hannu Oja was partially supported by the Academy of Finland Grant 268703. The research of Bing Li was supported in part by the U.S. National Science Foundation grants DMS-1407537 and DMS-1713078. The research of Klaus Nordhausen was supported by CRoNoS COST Action IC1408.

1 Introduction
Independent component analysis is a classical problem in multivariate statistics and signal processing where one assumes that the observed independent and identically distributed random vectors are linear mixtures of latent random vectors having independent marginal distributions. At its simplest this corresponds to presuming that, given the observed random vector x ∈ R^p, there exists a non-singular unmixing matrix Γ ∈ R^{p×p} such that

Γx = z,    (1)

where the random vector z ∈ R^p has independent marginals. In the independent component problem a random sample of x is observed and the objective is to estimate any matrix Γ such that (1) holds. We say "any matrix" as the formulation of the problem is clearly not well-defined: one can freely scale, permute and change the signs of the rows in (1) and the right-hand side still retains the independence of its components. As such the constraint cov(z) = I is usually introduced, freeing us of the scale invariance. Further assuming that at most one of the components of z is normally distributed, one can show that the vector z can be estimated up to marginal signs, order and location (Comon and Jutten, 2010).

Since its introduction in the 1980s a multitude of methods with varying approaches and assumptions have been proposed for solving the problem. These methods are generally based either on projection pursuit, decompositions of various matrices of cumulants, or maximum likelihood. The most well-known example belonging to the first class is FastICA (Hyvärinen and Oja, 1997), a projection pursuit method that extracts the independent components either sequentially or simultaneously by maximizing some measure of non-Gaussianity. Several different variations of FastICA exist; see for example Koldovsky et al. (2006); Miettinen et al. (2014, 2017). The second class includes classic methods like FOBI and JADE (see below) but also several newer ones such as Moreau (2001); Bonhomme and Robin (2009). For an example of likelihood-based methods, see e.g. Risk et al. (2015).

In this work we focus exclusively on two of the very first methods proposed for independent component analysis, fourth order blind identification (FOBI) (Cardoso, 1989) and joint approximate diagonalization of eigenmatrices (JADE) (Cardoso and Souloumiac, 1993), which are simply based on the diagonalization of various moment-based matrices. As such FOBI and JADE offer an easy starting point for various extensions of independent component analysis into the realms of non-standard data structures, some examples including versions specially tailored for time series data (Matilainen et al., 2015), tensor-valued data (Virta et al., 2017a,b) and univariate functional data (Li et al., 2015).

Before describing our contribution we first briefly review the key steps behind FOBI and JADE, both to motivate our exposition and to contrast the constructions in the later sections. Namely, in both methods we assume that the zero-mean random vector x ∈ R^p obeys the independent component model in (1), and additionally that the components of z have finite fourth moments, β_i = E(z_i^4) − 3 < ∞, i = 1, …, p. A basic result in independent component analysis then says that if Σ(x)^{-1/2} is the symmetric inverse square root of the covariance matrix of x, then there exists an orthogonal matrix U ∈ R^{p×p} such that the standardized random vector satisfies

x_st = Σ(x)^{-1/2} x = Uz.

For estimating the unknown matrix U both methods utilize fourth moments.
Define next the matrices

C_ij(x_st) = E{(x_st' e_i)(x_st' e_j) x_st x_st'} − δ_ij I − e_i e_j' − e_j e_i',    (2)

where e_i is the i-th member of the canonical basis of R^p (that is, e_i has all components equal to 0 except its i-th component, which is 1) and δ_ij is the Kronecker delta. The set C = {C_ij(x_st) | i, j = 1, …, p} collects every fourth cross-cumulant of the standardized random vector x_st. It can be shown that under the model the unknown orthogonal matrix U' diagonalizes all matrices in the set C, and JADE estimates U' by simultaneously (approximately) diagonalizing these matrices. FOBI can be viewed as a lighter version of JADE in that it only diagonalizes the single matrix Σ_{i=1}^p C_ii(x_st) = E(x_st x_st' x_st x_st') − (p + 2)I, which is the sum of a subset of members of C. By this heuristic it seems reasonable to speculate that JADE outperforms FOBI, which indeed is generally the case: see, for example, Miettinen et al. (2015). Additionally, for JADE to be Fisher consistent it is sufficient that at most one of the β_i's is zero, whereas for FOBI to be Fisher consistent we need the stronger condition that all β_i are distinct. However, JADE pays for its advantages by being computationally much heavier than FOBI, and when a quick application of an independent component analysis method is needed, FOBI is often the first choice.
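To fix ideas, the following minimal base-R sketch (an illustration under the definitions above, not the authors' implementation) carries out FOBI for an n × p data matrix; since subtracting (p + 2)I from the fourth-moment matrix only shifts its eigenvalues, the rotation U can be computed from the unshifted matrix.

```r
# Minimal FOBI sketch: whiten the data, then rotate by the eigenvectors of
# the fourth-moment matrix E(x_st x_st' x_st x_st').
fobi <- function(X) {
  X <- sweep(X, 2, colMeans(X))                  # center
  eig <- eigen(cov(X), symmetric = TRUE)
  W0 <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
  Z <- X %*% W0                                  # whitened data x_st
  C <- crossprod(Z * rowSums(Z^2), Z) / nrow(Z)  # fourth-moment matrix
  U <- eigen(C, symmetric = TRUE)$vectors
  list(W = t(U) %*% W0, scores = Z %*% U)        # unmixing matrix and components
}
```

Applied to data from model (1), the rows of the estimated unmixing matrix W recover Γ up to the order, signs and scalings discussed above.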
1.2 Independent component analysis and functional data

As the main contribution of this work we further extend the functional independent component analysis proposed in Li et al. (2015) by considering not real-valued functions but instead functions that take values in the p-dimensional Euclidean space. That is, for each observational unit we observe p functions, not necessarily residing in the same function space. Data of this form are increasingly common nowadays, and some areas of application include electroencephalography (EEG) data, socio-economic time series data observed for multiple areas/countries and three-dimensional location data measured for multiple observational units over time.

Although univariate functional data analysis is currently exceedingly popular, its multivariate counterpart has received relatively little attention in the literature. Some previous contributions to the field include the following: Ramsay and Silverman (2005); Berrendero et al. (2011); Sato (2013); Chiou et al. (2014); Jacques and Preda (2014); Happ and Greven (2017) discussed multivariate functional principal component analysis, Jacques and Preda (2014) using the extracted principal components to conduct clustering and Happ and Greven (2017) allowing different domains for the component functions; Tokushige et al. (2007); Ieva et al. (2011) developed multivariate functional clustering using k-means and Kayano et al. (2010) used orthonormalized Gaussian basis functions for the same purpose; Li and Song (2017b) developed sufficient dimension reduction methodology where both the predictor and the response can be multivariate functional data.

Consider next the conceptual and theoretical differences between multivariate-functional and univariate-functional extensions of independent component analysis. The two key aspects of independent component analysis are statistical independence and the notion of marginals. In a sense, the multivariate functional extension considered here is conceptually easier than the univariate functional extension developed in Li et al. (2015). As observed in that paper, unlike in the classical setting, univariate functional data do not have natural marginal random variables on which to perform independent component analysis. Li et al. (2015) tackled this issue by using the coefficients in the Karhunen-Loève expansion as the marginal random variables to prompt the process. Independent components are then defined in terms of these coefficients; see also Gutch and Theis (2012). For multivariate functional data, however, we can take a more straightforward route of simply treating the component functions as the marginals. In this context the independent component problem has the intuitively appealing objective of, given an observed multivariate random function, trying to extract another multivariate random function with independent component functions. These independent component functions can then be various latent processes, such as vital signs in the context of EEG data. A finite-dimensional analogue for our problem is independent subspace analysis (Cardoso, 1998), where we try to divide a larger space into a collection of smaller, independent subspaces. To sum up, the independent components in Li et al. (2015) are random variables, but the independent components in this paper are random functions. From this perspective, this paper is not an extension of Li et al. (2015), but instead an extension of the classical independent component analysis into a different direction.

In Section 2 we briefly go through the basics of functional analysis. The section also introduces the Cartesian product space H where our observed functions will reside and a natural subclass of linear operators therein. Section 3 equips the space H with a suitable probability structure and, having defined what we mean by a random multivariate function X ∈ H, defines the covariance matrix operator of X. The proposed methods of functional independent component analysis are described in Section 4 along with a proof of their Fisher consistency. In Section 5 we derive the coordinate representations for the sample versions of the methods and in Section 6 use them in a simulation study and in an application on the uWave hand gesture data set (Liu et al., 2009). Finally, we close in Section 7 with some discussion and prospective ideas. The simulation and real data example were conducted with R (R Core Team, 2016) using the packages fda (Ramsay et al., 2014), ggplot2 (Wickham, 2009), JADE (Miettinen et al., 2017), MASS (Venables and Ripley, 2002) and reshape2 (Wickham, 2007).

2 The space H of vector-valued functions

We next review the basics of functional analysis; see Conway (2013) for a standard treatment. Let T ⊂ R be an interval and (H_i, ⟨·,·⟩_i), i = 1, …, p, be separable Hilbert spaces of functions from T to R. Furthermore, let B_i be the Borel σ-field generated by the open sets in H_i with respect to the metric induced by ⟨·,·⟩_i. Let H be the direct sum of H_1, …, H_p; that is, H = ×_{i=1}^p H_i is the Cartesian product of the individual spaces and the inner product in H is defined by ⟨f, g⟩_H = ⟨f_1, g_1⟩_1 + ⋯ + ⟨f_p, g_p⟩_p, for any f = (f_1, …, f_p) ∈ H and g = (g_1, …, g_p) ∈ H. Denoting the norms induced by the inner products ⟨·,·⟩_H, ⟨·,·⟩_1, …, ⟨·,·⟩_p by ‖·‖_H, ‖·‖_1, …, ‖·‖_p, respectively, the relation ‖f‖²_H = ‖f_1‖²_1 + ⋯ + ‖f_p‖²_p is easily seen to hold for any f = (f_1, …, f_p) ∈ H.
Furthermore, a natural σ-field in H is the product σ-field B = B_1 × ⋯ × B_p generated by all measurable rectangles B_1 × ⋯ × B_p, where B_i ∈ B_i, i = 1, …, p.

Being separable, each H_i admits a countable orthonormal basis, E_i = {e_ik}_{k=1}^∞. Using the component bases we construct an orthonormal basis in H as follows. Let e⁺_ik denote the p-dimensional vector of functions whose components are 0 except for the i-th component, which is e_ik. Then {e⁺_ik : i = 1, …, p, k = 1, 2, …} is an orthonormal basis of H. This construction implies that the product space H is also separable. Throughout the paper any vector f ∈ H which has exactly one non-zero component will be called canonical, in relation to such a vector's resemblance to the canonical basis vectors in the Euclidean spaces.

Let L(H_j, H_i) be the set of all bounded linear operators from H_j to H_i. That is, a linear operator L_ij is in L(H_j, H_i) if and only if there exists a positive M_ij such that for all f_j ∈ H_j we have ‖L_ij f_j‖_i ≤ M_ij ‖f_j‖_j. Then for any i, j, (L(H_j, H_i), ‖·‖_OP,ij) is a Banach space, where the operator norm ‖·‖_OP,ij is defined as

‖L_ij‖_OP,ij = sup_{f_j ≠ 0} ( ‖L_ij f_j‖_i / ‖f_j‖_j ).

In the following we will use the notation ‖·‖_OP for all possible operator norms; the context will always make clear which operator norm we mean. Similarly, I will be used to denote the identity operator of all considered spaces, the context again making the intended use clear. Recall also that for all L_ij ∈ L(H_j, H_i) there exists the adjoint operator L*_ij, defined as the unique member of L(H_i, H_j) that satisfies ⟨L_ij f_j, f_i⟩_i = ⟨f_j, L*_ij f_i⟩_j for all f_i ∈ H_i and f_j ∈ H_j.

Finally, define the tensor product f_i ⊗ f_j of f_i ∈ H_i and f_j ∈ H_j as the linear operator from H_j to H_i having the action g_j ↦ ⟨f_j, g_j⟩_j f_i. Properties equivalent to those listed for tensor product operators from H to H in Lemma 2 of Li et al. (2015) can also be proven for the tensor product operators from H_j to H_i.

2.2 Matrices of bounded linear operators in H

We next consider a natural subset of the set of all bounded linear operators from H to H, constructed using bounded linear operators from the component spaces to each other. For a set of operators {L_ij ∈ L(H_j, H_i) : i, j = 1, …, p}, let L be the operator H → H,

(f_1, …, f_p) ∈ H ↦ ( Σ_{j=1}^p L_1j f_j, …, Σ_{j=1}^p L_pj f_j ).    (3)

Intuitively, we can identify L with the matrix of bounded linear operators

L ≡ [ L_11 ⋯ L_1p
       ⋮    ⋱    ⋮
      L_p1 ⋯ L_pp ],

so that the map in (3) can be formally regarded as matrix multiplication. We denote the class of all such operators as L(H) = ×_{i,j=1}^p L(H_j, H_i). The same construction was used in Li and Solea (2017); see also Sato (2013) and Li et al. (2014).
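As a worked instance of (3), added here for illustration, the case p = 2 reads

\[
Lf \;=\; \begin{pmatrix} L_{11} & L_{12}\\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} f_1\\ f_2 \end{pmatrix}
\;=\; \begin{pmatrix} L_{11}f_1 + L_{12}f_2\\ L_{21}f_1 + L_{22}f_2 \end{pmatrix},
\qquad f = (f_1, f_2) \in H,
\]

so each component operator acts exactly like an entry of a 2 × 2 matrix.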
Using (3), it is easy to check that an operator L ∈ L(H) is also linear; that is, L(af + bg) = a(Lf) + b(Lg) for all f, g ∈ H and a, b ∈ R. Furthermore, using the Cauchy-Schwarz inequality and the operator norm inequality one can show that, for all L ∈ L(H) and f ∈ H, we have

‖Lf‖_H ≤ ( Σ_{i,j=1}^p ‖L_ij‖²_OP )^{1/2} ‖f‖_H.

That is, an operator L ∈ L(H) is also bounded. Thus, an operator L ∈ L(H) inherits both linearity and boundedness from its component operators L_ij. Consequently, being a bounded linear operator, any L ∈ L(H) admits the adjoint operator L*. Using some algebra it is easily seen that the elements of the adjoint satisfy (L*)_ij = L*_ji, drawing an analogy to the Hermitian adjoint of a matrix in C^{p×p}.

Two useful subsets of L(H) are now readily defined. Call a member L ∈ L(H) a diagonal matrix of operators (or simply diagonal) if L_ij = 0 whenever i ≠ j and L*_ii = L_ii. The simplest diagonal operator is the identity operator, for which L_ii = I, i = 1, …, p. Diagonal operators later play a central role in estimating solutions to the functional independent component model, and as one of our key results we prove in Section 4 a connection between diagonal operators and canonical vectors. Finally, an element U ∈ L(H) is called unitary if U*U = UU* = I. Using the component representation it is easily seen that a sufficient and necessary condition for U to be unitary is

Σ_{k=1}^p U_ik U*_jk = Δ_ij, for all i, j = 1, …, p,

where Δ_ij is the zero operator if i ≠ j, and is the identity operator from H_i to H_i if i = j. This is a clear analogy for the orthonormality of the rows of a unitary matrix in C^{p×p}.

3 Random elements in H

Let (Ω, F, P) be a probability space. A random element in H_i is a function X_i: Ω → H_i that is F/B_i-measurable, i = 1, …, p. Similarly, a random element in H is a function X: Ω → H that is F/B-measurable. A random element X in H can thus be thought of as a random function X(·) = (X_1(·), …, X_p(·)), where X_i resides in H_i, i = 1, …, p. For the basic theory of random variables in function spaces, see Bosq (2012).

In the following, we denote the set of all m-th power integrable random elements in H by X_m(H), that is,

X_m(H) = {(X: Ω → H) : E(‖X‖^m_H) < ∞}.

It is easily seen that requiring X ∈ X_1(H) or X ∈ X_2(H) is equivalent to requiring the component functions to respectively satisfy X_i ∈ X_1(H_i) or X_i ∈ X_2(H_i), for all i = 1, …, p.

Next, define a random operator W_ij to be a mapping W_ij: Ω → L(H_j, H_i) that is F/B_OP-measurable, where B_OP is the Borel σ-field generated by the open sets of L(H_j, H_i) with respect to the metric induced by the operator norm ‖·‖_OP. If W_ij is a random operator with E(‖W_ij‖_OP) < ∞, then the bivariate map (f_i, f_j) ↦ E(⟨f_i, W_ij f_j⟩_i) is a bounded bilinear form and can be shown to induce a unique operator A_ij ∈ L(H_j, H_i) satisfying ⟨f_i, A_ij g_j⟩_i = E⟨f_i, W_ij g_j⟩_i for all f_i ∈ H_i and g_j ∈ H_j. We define the expected value of W_ij to be this operator, E(W_ij) = A_ij.

Using the previous definitions we are now sufficiently equipped to define the first two moments, the mean function and the covariance matrix operator, of a random element X ∈ H.

3.2 The covariance matrix operator Σ_XX
Assume next that X ∈ X_2(H). The expected values E(X_i) = μ_i ∈ H_i, i = 1, …, p, are readily defined as the Riesz representations of the bounded linear functionals

H_i → R, f_i ↦ E(⟨f_i, X_i⟩_i).

Using the component-wise expected values μ_i we further define the expected value of the random element X to be the function μ = (μ_1, …, μ_p) ∈ H. As we can always center our observed data, it is not restrictive to assume that μ = 0, as we will do for the remainder of this work.

Consider then the random operator (X_i ⊗ X_j). Using the Cauchy-Schwarz inequality we have

E(‖X_i ⊗ X_j‖_OP) ≤ {E(‖X_i‖²_i) E(‖X_j‖²_j)}^{1/2},

the right-hand side of which is finite due to our assumption of square integrability. The random operator (X_i ⊗ X_j) thus induces the unique, bounded linear operator Σ_{X_iX_j} = E(X_i ⊗ X_j), the cross-covariance operator (Baker, 1973) between X_i and X_j. Using the definition of the expected value of a random operator one can further show that the adjoint operator of Σ_{X_iX_j} is Σ*_{X_iX_j} = Σ_{X_jX_i}.

Using the p² bounded linear operators Σ_{X_iX_j} we next construct the covariance matrix operator Σ_XX ∈ L(H) as

Σ_XX ≡ [ Σ_{X_1X_1} ⋯ Σ_{X_1X_p}
            ⋮         ⋱         ⋮
          Σ_{X_pX_1} ⋯ Σ_{X_pX_p} ].

It is easily seen that, for f = (f_1, …, f_p) ∈ H, we have the equality f ⊗ f = (f_i ⊗ f_j)_{i,j=1}^p, and the covariance matrix operator Σ_XX can then be written compactly as E(X ⊗ X). This type of matrix of covariance operators was also used in Li and Song (2017a) and Song and Li (2017).

Remark 3.1.
For clarity we use two different notations for the covariance matrix operator of a random function X ∈ X_2(H): when it is understood as a bounded linear operator in H we use the notation Σ_XX; when it is understood as the mapping X_2(H) → L(H), X ↦ Σ_XX, we use the notation Σ.

Recall the following properties of the covariance matrix cov(x) of a square-integrable random vector x = (x_1, …, x_p)': i) self-adjointness (symmetry), cov(x) = cov(x)'; ii) positive semi-definiteness, for any a ∈ R^p we have a' cov(x) a ≥ 0;
iii) affine equivariance, for any invertible matrix A ∈ R^{p×p} the covariance matrix transforms as cov(Ax) = A cov(x) A'; and iv) the full independence property, if x_i and x_j are independent then cov(x)_ij = 0. Not surprisingly, it turns out that all of these properties are shared also by the covariance matrix operator Σ_XX, as described in the following lemma.

Lemma 3.1.
Assuming X ∈ X_2(H), the covariance matrix operator Σ_XX ∈ L(H) has the following properties:

i) It is a self-adjoint, non-negative, trace-class operator and as such admits a spectral decomposition with the associated orthonormal basis {φ_k}_{k=1}^∞.

ii) As a mapping Σ: X_2(H) → L(H), the covariance matrix operator is affine equivariant in the sense that Σ(AX) = A Σ(X) A* for any invertible bounded linear operator A ∈ L(H).

iii) If X_i and X_j are independent, then Σ_{X_iX_j} = 0.

These properties were established in Li et al. (2015) for the case of univariate X.

Remark 3.2.
A stronger version of the affine equivariance can be shown to hold. Let A = (A_ij), i = 1, …, k, j = 1, …, p, where A_ij is a linear operator from H_j to some suitable Hilbert space G_i, and k is any positive integer. Then we still have Σ(AX) = A Σ(X) A*, a property that in R^p is called full affine equivariance.

Part i) of Lemma 3.1 guarantees the existence of the spectral decomposition of Σ_XX into a sum of rank-1 operators:

Σ_XX = Σ_{k=1}^∞ λ_k (φ_k ⊗ φ_k),    (4)

where (φ_k, λ_k) are eigenvector-eigenvalue pairs, {φ_k}_{k=1}^∞ is an orthonormal basis of H and the eigenvalues satisfy λ_1 ≥ λ_2 ≥ ⋯ ≥ 0.
This representation will be used next to define the independent component model in H.

4 Independent component analysis in H

We say that X ∈ X_2(H) follows the H-valued independent component model if there exists a matrix of operators Γ = (Γ_ij)_{i,j=1}^p ∈ L(H) such that

Γ X = Z,    (5)

where Z = (Z_1, …, Z_p) is a random element in H having mutually independent component functions. We define two random elements X: Ω → G_1 and Y: Ω → G_2, not necessarily having values in the same space, to be independent if E{q_1(X) q_2(Y)'} = E{q_1(X)} E{q_2(Y)'} for all q_1(X) ∈ X_2(R^{p_1}), q_2(Y) ∈ X_2(R^{p_2}) with p_1, p_2 ∈ N. The objective in the H-valued independent component analysis is to estimate some unmixing operator Γ such that Γ X has independent component functions.

Like its vector-valued analogue in (1), the operator Γ in model (5) is not uniquely defined. If one applies to both sides of (5) any diagonal operator D ∈ L(H), the right-hand side still retains independent component functions. This implies that, without further assumptions, we cannot hope to find any unique functional form for the component functions. Indeed, as we show later in this section, our proposed methods actually estimate {B_j Z_j}_{j=1}^p, where B_j ∈ L(H_j), j = 1, …, p. However, this identifiability issue does not affect our goal of discovering independent components, as the resulting vector of functions has independent component functions regardless of the form of D.

We will next approach the problem by extending two methods of vector-valued independent component analysis, FOBI and JADE, to the case of vector-valued random functions.

The first step in vector-valued independent component analysis is the standardization of x by the inverse square root of the covariance matrix cov(x). However, like in Li et al. (2015), the fact that the inverses of compact operators are unbounded means that we must resort to additional assumptions. Let {φ_k}_{k=1}^∞ be the orthonormal basis of H consisting of the eigenvectors of Σ_XX in decreasing order according to the corresponding eigenvalues. For a fixed d ∈ N, let M_d = span({φ_k}_{k=1}^d) be the subspace of H spanned by the d first eigenvectors of Σ_XX. The simplifying assumption we make is the following.

Assumption 4.1.
The component functions of X are dependent only along the d orthogonal directions {φ_k}_{k=1}^d. That is, if R_d = Σ_{k=d+1}^∞ ⟨X, φ_k⟩_H φ_k, then the p components of R_d are independent.

In vector-valued independent component analysis this assumption is naturally always satisfied by simply picking d = p. One interpretation for the assumption in the current case is that the majority of the structure of the independent component functions is noise, meaning that the signal in the function Z is in some sense finite-dimensional.

Note that {φ_k} may not span the entire H. However, by definition they are guaranteed to span the closure of the range space of Σ_XX, cl ran(Σ_XX). Because Σ_XX is self-adjoint, ran(Σ_XX)^⊥ = ker(Σ_XX), the kernel space of Σ_XX. Meanwhile, for any f ∈ ker(Σ_XX), we have

⟨f, Σ_XX f⟩_H = var{⟨f, X⟩_H} = 0,

which implies that ⟨f, X⟩_H = constant almost surely. Since this holds for the special case X(ω) = 0, we have ⟨f, X⟩_H = 0 almost surely. This means f is orthogonal to the support of X. Since such functions are of no interest to us, we can, without loss of generality, reset H to be cl ran(Σ_XX), as we will do for the rest of the paper.

For an arbitrary subspace M ⊂ H, let P_M and Q_M denote the orthogonal projections onto M and M^⊥, respectively. Then Assumption 4.1 says that we can without loss of generality consider the projections X^(d) = P_{M_d} X instead of the original observations X. This simplifies the model (5) to the form

Γ X^(d) = Z^(d),    (6)

where X^(d), Z^(d) are random functions in M_d, the component functions of Z^(d) are independent and Γ ∈ L(M_d) is assumed to be invertible.

Remark 4.1.
Later in this section the proposed methods are shown to be Fisher consistent, meaning that under the model (5) and Assumption 4.1 the final independent component scores are invariant to injective transformations X ↦ (P_{M_d} A P_{M_d} + Q_{M_d}) X, where A ∈ L(M_d). However, as our estimation methods crucially depend on the existence of a random function Z, it is not meaningful to speak of affine equivariance outside the model in the same general sense that holds for both vector-valued FOBI and JADE; see Miettinen et al. (2015).

With this, we are now ready to present the first step towards the estimation of Z, an analogue of Lemma 3 in Li et al. (2015). In the following, for an A ∈ L(M_d), let A^{-1/2} denote the self-adjoint inverse square root of the self-adjoint linear operator A within M_d, that is, A^{-1/2} A A^{-1/2} = P_{M_d}.

Lemma 4.1.
Assume that X ∈ X_2(H) follows the model (6). Then

Σ(X^(d))^{-1/2} X^(d) = U Σ(Z^(d))^{-1/2} Z^(d),

for some unitary operator U ∈ L(M_d).

The standardized functions are in the following denoted by X̃ = Σ(X^(d))^{-1/2} X^(d) and Z̃ = Σ(Z^(d))^{-1/2} Z^(d), and they naturally satisfy Σ(X̃) = Σ(Z̃) = P_{M_d}. The next step towards finding Z is the estimation of the unknown unitary operator U in Lemma 4.1. As described in the introduction, both FOBI and JADE approach this via matrices of fourth cross-cumulants, and before continuing we first define operatorial counterparts for them.

4.2 The fourth cross-cumulant operators C_ij(X)

In this section we assume that the zero-mean random function X ∈ X_4(M_d) resides in the d-dimensional space M_d spanned by the fixed orthonormal basis {φ_k}_{k=1}^d. We define the (i, j)-th fourth cross-cumulant of X with respect to the basis {φ_k}_{k=1}^d to be

C_ij(X) = E{⟨X, φ_i⟩_H ⟨X, φ_j⟩_H (X ⊗ X)} − E{⟨X′, φ_i⟩_H ⟨X′, φ_j⟩_H (X ⊗ X)}    (7)
          − E{⟨X′, φ_i⟩_H ⟨X, φ_j⟩_H (X′ ⊗ X)} − E{⟨X′, φ_i⟩_H ⟨X, φ_j⟩_H (X ⊗ X′)},

where i, j = 1, …, d and the random function X′ is an independent copy of X. Repeated application of the Cauchy-Schwarz inequality shows that, for example, the first term in (7) satisfies

E{‖⟨X, φ_i⟩_H ⟨X, φ_j⟩_H (X ⊗ X)‖_OP} ≤ E(‖X‖⁴_H) < ∞,

implying that the first term of (7) exists as a uniquely defined bounded linear operator in M_d. Similar considerations for the other terms show that the operator C_ij(X) ∈ L(M_d) is then well-defined. Our main interest is in standardized random functions, Σ(X) = P_{M_d}, and the following lemma provides a simplified form for (7) in that case.

Lemma 4.2.
Let the zero-mean random function X ∈ X_4(M_d) satisfy Σ(X) = P_{M_d}. Then we have

C_ij(X) = E{⟨X, φ_i⟩_H ⟨X, φ_j⟩_H (X ⊗ X)} − δ_ij P_{M_d} − φ_i ⊗ φ_j − φ_j ⊗ φ_i.

The operator C_ij in Lemma 4.2 closely resembles the cross-cumulant matrix (2) for standardized random vectors and is next shown to serve similar purposes in constructing our versions of FOBI and JADE in H.
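In coordinates the simplified form of Lemma 4.2 is straightforward to estimate. The following base-R helper (a sketch with hypothetical argument names, taking φ_i and φ_j as the canonical basis vectors) computes the empirical counterpart from an n × d matrix Z of whitened basis coordinates:

```r
# Sketch: empirical C_ij of Lemma 4.2 for whitened coordinates Z (n x d).
C_ij_hat <- function(Z, i, j) {
  d <- ncol(Z)
  C <- crossprod(Z * (Z[, i] * Z[, j]), Z) / nrow(Z)  # E{<X,e_i><X,e_j>(X (x) X)}
  C <- C - (i == j) * diag(d)                         # - delta_ij P
  C[i, j] <- C[i, j] - 1                              # - e_i e_j'
  C[j, i] <- C[j, i] - 1                              # - e_j e_i'
  C
}
```

Note that for i = j the last two corrections both hit the (i, i) entry, matching the term −2 e_i e_i' of the operator form.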
Theorem 4.1.

Assume that Z ∈ X_4(M_d) has independent component functions and that Σ(Z) = P_{M_d}. Then we have, for any unitary matrix of operators U = (U_kl)_{k,l=1}^p ∈ L(M_d) and for any i, j = 1, …, d,

C_ij(UZ) = U D_ij U*,
where D_ij = D_ij(U, Z) is a diagonal matrix of operators with the diagonal operators

D_ij^kk = E{(Z_k ⊗ Z_k)(ξ_ik ⊗ ξ_jk)(Z_k ⊗ Z_k)} − ⟨ξ_ik, ξ_jk⟩_k P_k − (ξ_ik ⊗ ξ_jk) − (ξ_jk ⊗ ξ_ik),    (8)

for k = 1, …, p, where ξ_i = (ξ_i1, …, ξ_ip) = U*φ_i and P_k is the projection operator from the k-th component space of H to the k-th component space of M_d.

Theorem 4.1 essentially says that U* diagonalizes (in the sense of a diagonal operator) the operator C_ij(UZ) for every choice of i, j = 1, …, d, and these decompositions provide us a means of finding the missing unitary operator U. Our version of JADE will later utilize all of these operators, and for FOBI we use just a subset of them, captured by the FOBI-operator C(X) ∈ L(M_d),

C(X) = Σ_{i=1}^d C_ii(X).    (9)

The next theorem gives some useful properties of this operator.

Theorem 4.2.

Assume that Z ∈ X_4(M_d) has independent component functions and that Σ(Z) = P_{M_d}. Then, for any unitary matrix of operators U = (U_kl)_{k,l=1}^p ∈ L(M_d), the FOBI-operator (9) satisfies

C(UZ) = U C(Z) U* = U D U*,
where D = C(Z) is a diagonal matrix of operators with the diagonal entries

D_kk = E{(Z_k ⊗ Z_k)²} − (d_k + 2) P_k,

where d_k = dim[span({φ_mk}_{m=1}^d)], the dimension of the k-th component space, and P_k is the projection operator from the k-th component space of H to the k-th component space of M_d.

The first equality in Theorem 4.2 does not need the independence of the component functions of Z but actually holds for all standardized Z ∈ X_4(M_d), as long as the operator U is unitary. This property of the functional C: X_4(M_d) → L(M_d) is called unitary equivariance.

Recall from the introduction that in FOBI we diagonalize a single matrix and in JADE multiple matrices simultaneously. The functional analogue of the former is the spectral decomposition of C(X), and for the latter we next define the joint diagonalization of a set of operators. Namely, define the joint diagonalizer of a finite set of operators, S = {S_i | S_i ∈ L(M_d), i = 1, …, I}, to be the orthonormal basis {ψ_k}_{k=1}^d of M_d that maximizes the objective function

w(ψ_1, …, ψ_d) = Σ_{i=1}^I Σ_{k=1}^d ⟨ψ_k, S_i ψ_k⟩²_H.    (10)
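In coordinates, (10) is simply a sum of squared diagonal elements. The following small R function (an illustrative sketch with hypothetical names) evaluates it for a list of d × d matrices and a candidate orthonormal basis stored as the columns of Psi:

```r
# Sketch: the joint diagonalization objective (10) in coordinates.
jd_objective <- function(S_list, Psi) {
  sum(vapply(S_list,
             function(S) sum(diag(crossprod(Psi, S %*% Psi))^2),
             numeric(1)))
}
```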
In the previous paragraphs we have discussed two kinds of diagonality: the diagonality in the sense of diagonal operators in Theorems 4.1 and 4.2 and the diagonality in the sense of the spectral decomposition. The final tool we need for the estimation of the independent functions is a connection between these two concepts. Recall that by a canonical vector we mean any element of H which has at most one non-zero component. The needed connection is now provided by the next pair of lemmas, which show that (under suitable assumptions) the spectral decomposition and joint diagonalization of diagonal operators mimic the eigendecomposition and joint diagonalization of diagonal real matrices, in the sense that the spectral decompositions and the joint diagonalizer of a set of diagonal operators consist entirely of canonical vectors.

Lemma 4.3.

Let D ∈ L(H) be a diagonal matrix of operators with finite rank d and let its spectral decomposition be

D = Σ_{k=1}^d τ_k (ψ_k ⊗ ψ_k),

where the eigenvalues {τ_k}_{k=1}^d are distinct. Then the eigenvectors {ψ_k}_{k=1}^d are canonical.

Lemma 4.4.

Let S = {S_i}_{i=1}^I be a finite collection of bounded linear operators in M_d and let {ψ_k}_{k=1}^d be an orthonormal basis of M_d. Then we have

w(ψ_1, …, ψ_d) ≤ Σ_{i=1}^I ‖S_i‖²_HS,

where ‖·‖_HS is the Hilbert-Schmidt norm, and equality is reached if and only if each ψ_k is an eigenvector of each S_i, k = 1, …, d, i = 1, …, I. In particular, if all operators in S are diagonal and share an eigenbasis, then the elements of the joint diagonalizer are canonical.

4.3 Estimation of U

Using the previously defined fourth cross-cumulant operators we next formulate the functional counterparts for the steps taken in vector-valued FOBI and JADE to estimate the orthogonal matrix U.
Definition 4.1.

Let X ∈ X_4(H) follow the model (6). Then we define:

i) the FOBI-basis of X as the set {ψ^F_k}_{k=1}^d of eigenfunctions of the FOBI-operator C(X̃);

ii) the JADE-basis of X as the joint diagonalizer {ψ^J_k}_{k=1}^d of the set of operators C = {C_ij(X̃) : i, j = 1, …, d}.

In the next theorem Lemmas 4.3 and 4.4 are applied respectively to the FOBI-basis and the JADE-basis to find U. However, to guarantee consistency we need to make some additional assumptions which guarantee that the eigenbases are unique up to signs and order. For the FOBI-solution we need the following.

Assumption 4.2.

The eigenvalues of C(Z̃) are distinct.

One consequence of Assumption 4.2 is that FOBI cannot estimate two latent functions having the same distribution. For JADE the corresponding assumption is much more relaxed, but to use Lemma 4.4 we first need the additional assumption that all the diagonal operators in Theorem 4.1 share a common eigenbasis.
Assumption 4.3.
The operators D_ij(U, Z̃), i, j = 1, …, p, have a common eigenbasis.

While this sounds somewhat stringent, in Section 5, discussing the sample version of the method, we show that Assumption 4.3 is in fact not that strict and is satisfied under some general conditions and choices of d. The need for the next assumption, guaranteeing the uniqueness of the eigenbasis for JADE, now follows directly from the equality condition in Lemma 4.4.

Assumption 4.4.
For each pair (ψ^J_k, ψ^J_l), k, l = 1, …, d, there exists a pair (i, j), i, j = 1, …, d, such that the eigenvalues of D_ij(U, Z̃) related to ψ^J_k and ψ^J_l are distinct.

The next theorem finally proves the Fisher consistency of our approach by showing how the FOBI-basis and the JADE-basis can be used to estimate the independent component functions.
Theorem 4.3.
Let X ∈ X_4(H) follow the model (6) and let {ψ^F_k}_{k=1}^d and {ψ^J_k}_{k=1}^d be the FOBI-basis and the JADE-basis of X, respectively. Assume further that either Assumption 4.2 (FOBI) or Assumptions 4.3 and 4.4 (JADE) are satisfied. Then the FOBI and JADE estimators of the latent functions are respectively the d elements of X_4(H) given as

Ẑ^F_k = (ψ^F_k ⊗ ψ^F_k) X̃, k = 1, …, d, and Ẑ^J_k = (ψ^J_k ⊗ ψ^J_k) X̃, k = 1, …, d,

where each estimator Ẑ^•_k corresponds to exactly one latent function Z_j.

The proof of Theorem 4.3 shows that for each k = 1, …, d the procedure actually recovers the p-variate function Ẑ^•_k = ⟨ψ^•_k, X̃⟩_H ψ^•_k = ⟨h^•_k, Z̃⟩_H ψ^•_k, where the only dependency on the latent function Z is through the inner product ⟨ψ^•_k, X̃⟩_H = ⟨h^•_k, Z̃⟩_H. Furthermore, every h^•_k is canonical, meaning that each of the estimates Ẑ^•_k contains information on exactly one latent component Z_j and that this information is entirely contained in the single inner product ⟨ψ^•_k, X̃⟩_H. In the following we will refer to these inner products as the independent component scores. As more than one score can be related to a single latent function Z_j, the d-vector of independent component scores can further be divided into m mutually independent subvectors, Z^(l) ∈ R^{d_l}, Σ_{l=1}^m d_l = d, so that each subvector corresponds to a single latent function Z_j.

5 Sample versions

For deriving the sample version of the proposed methods we make the simplifying assumption that the component spaces are the same, H_1 = ⋯ = H_p. The generalization to the case of different component spaces follows easily. Let X^1, …, X^n be a random sample of X. Here, we use the superscript to represent the position in the sample, to differentiate from the subscript in X_i which represents the i-th component of X. Furthermore, let X^i_j represent the j-th component of X^i. Although our theory is based on infinite-dimensional spaces, our observations are always finite-dimensional, and so let X^i_j(t_m,ij) denote the value of the j-th component function of the i-th observation at the time points {t_m,ij : m = 1, …, M_ij, j = 1, …, p, i = 1, …, n}. We thus allow the measurement times and the numbers of measurements to differ across both observations and components. The underlying assumption in functional data analysis is that the observed values X^i_j(t_m,ij) correspond to latent (smooth) functions that we observe only at the discrete times t_m,ij. The first step in implementing the method is thus to express all the observations as functions using some suitable basis.

For approximating the space H_1, fix a K-element basis G = {g_k}_{k=1}^K, the span of which we denote by M. The functional approximations x̂_ij(t) = Σ_{k=1}^K ĉ_ijk g_k(t) of the observed curves in M can be found as

(ĉ_ij1, …, ĉ_ijK)' = argmin_{c_ijk} Σ_{m=1}^{M_ij} {X^i_j(t_m,ij) − Σ_{k=1}^K c_ijk g_k(t_m,ij)}²,

and we denote in the following the coordinate vector of the j-th component function of the i-th observation in the basis G as [X^i_j]_G = (ĉ_ij1, …, ĉ_ijK)' ∈ R^K.
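As an illustration, the least-squares coordinates of a single observed curve can be computed as follows (a sketch assuming the fda package's create.fourier.basis and eval.basis; the toy curve and the 50 measurement times are hypothetical):

```r
library(fda)
basis <- create.fourier.basis(rangeval = c(0, 1), nbasis = 11)
t_obs <- seq(0, 1, length.out = 50)                  # measurement times t_m
G_mat <- eval.basis(t_obs, basis)                    # 50 x 11 matrix of g_k(t_m)
x_obs <- sin(2 * pi * t_obs) + rnorm(50, sd = 0.1)   # toy observed curve values
c_hat <- solve(crossprod(G_mat), crossprod(G_mat, x_obs))  # LS coordinates
```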
Consider then the pK-dimensional product space M^p = M × ⋯ × M. The space M^p has the natural direct sum basis G^p = G ⊕ ⋯ ⊕ G. The stacked vector of the coordinates of all p component functions of the i-th observation in this basis is denoted by [X^i]_G = ([X^i_1]'_G, …, [X^i_p]'_G)' ∈ R^{pK} and the matrix of all coordinates of all observations by [X]_G = ([X^1]_G, …, [X^n]_G)' ∈ R^{n×pK}. We assume without loss of generality that the coordinate representations of the observations are centered, Σ_{i=1}^n [X^i_j]_G = 0, j = 1, …, p.

Let G_G = (⟨g_k, g_k′⟩)_{k,k′=1}^K denote the Gram matrix of a basis G = {g_k}_{k=1}^K. For orthonormal bases the Gram matrix equals the identity matrix, and if G^p is a direct sum basis G^p = G ⊕ ⋯ ⊕ G then clearly G_{G^p} = diag(G_G, …, G_G) = I_p ⊗ G_G, where G_G is the Gram matrix of the basis G and ⊗ is the Kronecker product between matrices. The next theorem now describes how the coordinate representations can be used to carry out the proposed methods in practice.

Theorem 5.1.
Let [Φ̂]_G ∈ R^{pK×d} contain the d first eigenvectors of the matrix (1/n)[X]'_G [X]_G (I_p ⊗ G_G) and let the diagonal matrix Λ_d ∈ R^{d×d} hold the corresponding eigenvalues as its diagonal elements. Then, let [X̃^i]_V = Λ_d^{-1/2} [Φ̂]'_G (I_p ⊗ G_G) [X^i]_G ∈ R^d, i = 1, …, n, contain the coordinates of the standardized observations in the eigenbasis. Finally, let

i) the columns of [Ψ̂_F]_V ∈ R^{d×d} be the eigenvectors of the matrix

(1/n) Σ_{i=1}^n [X̃^i]'_V [X̃^i]_V · [X̃^i]_V [X̃^i]'_V − (d + 2) I_d,

ii) the columns of [Ψ̂_J]_V = ([ψ̂^J_1]_V, …, [ψ̂^J_d]_V) ∈ R^{d×d} be the orthonormal set of vectors satisfying

[Ψ̂_J]_V = argmax_{[Ψ̂_J]'_V [Ψ̂_J]_V = I_d} Σ_{k=1}^d Σ_{l=1}^d Σ_{m=1}^d { [ψ̂^J_m]'_V [Ĉ_kl(X̃)]_V [ψ̂^J_m]_V }²,

where [Ĉ_kl(X̃)]_V = (1/n) Σ_{i=1}^n ([X̃^i]'_V e_k)([X̃^i]'_V e_l) · [X̃^i]_V [X̃^i]'_V − δ_kl I_d − e_k e_l' − e_l e_k'.

Then, choosing either the FOBI-solution [Ψ̂_F]_V or the JADE-solution [Ψ̂_J]_V, the independent component scores are given by

Ẑ^i = [Ψ̂_•]'_V Λ_d^{-1/2} [Φ̂]'_G (I_p ⊗ G_G) [X^i]_G.

The optimization problem required by the JADE-solution is easily solved with standard joint diagonalization techniques, e.g. the Jacobi angle algorithm; see Cardoso and Souloumiac (1996). An implementation of the algorithm can be found in the R package JADE (Nordhausen et al., 2015). Theorem 5.1 shows that the resulting vector of independent component scores is a linear transformation of the original vector of coordinates, Ẑ^i = A[X^i]_G for some d × pK matrix A. Consequently, we can get interpretations for the independent component scores by considering the elements of A and observing which of the original coordinates most influence each of the obtained scores. The same procedure is used in standard principal component analysis, where the elements of the matrix A are called loadings. An example of such an interpretation will be given in the real data example in Section 6.
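Collecting the steps of Theorem 5.1, the FOBI-solution can be sketched in a few lines of base R (an illustration assuming an orthonormal basis, so that G_G = I_K; the function name and arguments are hypothetical):

```r
# XG: n x pK matrix of centered basis coordinates [X]_G; d: reduced dimension.
functional_fobi <- function(XG, d) {
  n <- nrow(XG)
  eig <- eigen(crossprod(XG) / n, symmetric = TRUE)
  Phi <- eig$vectors[, 1:d, drop = FALSE]               # [Phi-hat]_G
  Xt <- XG %*% Phi %*% diag(1 / sqrt(eig$values[1:d]))  # [X-tilde]_V
  Cf <- crossprod(Xt * rowSums(Xt^2), Xt) / n - (d + 2) * diag(d)
  Psi <- eigen(Cf, symmetric = TRUE)$vectors            # [Psi-hat_F]_V
  Xt %*% Psi                                            # n x d matrix of scores
}
```

The JADE-solution replaces the final eigendecomposition by a joint diagonalization of the matrices [Ĉ_kl(X̃)]_V, for which the JADE package provides ready-made routines.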
5.2 On the choice of d

We next give some rough guidelines on choosing an appropriate reduced dimension d. Naturally, we can estimate independent component scores corresponding to each latent function Z_j only if d ≥ p. Moreover, even if we put d = p it could still happen that some of the component functions have too low variation and cannot fit amongst the d eigenvectors of Σ_XX with the highest eigenvalues. From this point of view it would thus make sense to increase d further to make sure we capture all the latent functions. However, doing this also increases the odds of introducing more and more of the non-dependent part of the model (noise) into the estimation.

A further complication is brought in by Assumption 4.3, which in the sample version requires that all the diagonal matrices in the JADE-decomposition share a single eigenbasis. It can be shown that a sufficient condition for this is that each of the subvectors Z^(l), l = 1, …, m, has either length one or an elliptical distribution. This condition is more likely to be fulfilled for small values of d, and since d = p is a natural meeting point for all these rules, allowing us to estimate all p latent functions in the best case, we advocate the use of the value d = p in practice. This rule of thumb will be used in the examples of the next section.

6 Examples

In this simulation study we compare the two proposed methods to the alternative of applying only the principal component analysis part of the algorithm, that is, only projecting the data onto the space spanned by the first d eigenfunctions of Σ_XX.

For our setting we used p = 4 and considered for all four component functions the same 11-element Fourier basis G. The leading coefficients in the coordinate vectors of the component functions were generated either as (u_1, g, χ, e) (Setting 1) or as (u_1, u_2, u_3, u_4) (Setting 2), where u_1, u_2, u_3, u_4 ∼ Uniform(0, 1), g is gamma-distributed with shape parameter 3, χ is chi-squared distributed and e ∼ Exp(1), and all the previous random variables were independent and standardized to have zero means and unit variances. The rest of the coordinates were independent standard normal. In the first setting all the "signal" components thus had distinct kurtoses and in the second setting they had identical kurtoses. We generated samples of sizes n = 1000, 2000, 4000, 8000, 16000, 32000 and mixed the latent functions Z^i = (Z^i_1, Z^i_2, Z^i_3, Z^i_4)' as [Z^i]_G ↦ [X^i]_G = Ω[Z^i]_G with a random mixing matrix Ω ∈ R^{44×44}. For simplicity, we considered estimation only in the true case d = 4.

To obtain Ω we first generated the matrix Ω_0 = diag{B^{1/2}, I_40}, where B = AA' + λ_0 I_4, the matrix A ∈ R^{4×4} has independent standard normal elements and the constant λ_0 was given five different values (see the legend of Fig. 1). The final Ω is then obtained by permuting the rows and columns of Ω_0 so that only the leading coefficients of the component functions are going to be mixed in the transformations [Z^i]_G ↦ Ω[Z^i]_G. This unorthodox procedure goes to ensure that the dependency between the four functions exists only in the directions given by the eigenvectors of Σ_XX with the eigenvalues a_j² + λ_0,
j = 1, …, 4, where a_j are the singular values of A.
Thus, if λ_0 > 1, the four largest eigenvalues always (on the population level) correspond to the directions of interest, meaning that the assumptions of our model are fulfilled and we always pick the correct four eigenvectors. A similar mixing scheme was used also in Li et al. (2015).
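The following sketch reproduces the essence of this data-generating mechanism in base R (the gamma rate, the chi-squared degrees of freedom and the value of λ_0 below are illustrative stand-ins, as only the standardized shapes matter):

```r
n <- 1000; lead <- seq(1, 44, by = 11)      # leading coefficient of each component
std <- function(x) (x - mean(x)) / sd(x)
ZG <- matrix(rnorm(n * 44), n, 44)          # non-leading coordinates are N(0, 1)
ZG[, lead] <- cbind(std(runif(n)), std(rgamma(n, shape = 3)),
                    std(rchisq(n, df = 4)), std(rexp(n)))   # Setting 1
A <- matrix(rnorm(16), 4, 4); lam0 <- 2
eB <- eigen(A %*% t(A) + lam0 * diag(4), symmetric = TRUE)
Bh <- eB$vectors %*% diag(sqrt(eB$values)) %*% t(eB$vectors)  # B^{1/2}
XG <- ZG; XG[, lead] <- ZG[, lead] %*% Bh   # mix only the leading coefficients
```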
[Figure 1 about here. Two panels, Setting 1 and Setting 2; x-axis: n (1000 to 32000); y-axis: minimum distance index; line color: method (PCA, FOBI, JADE); grouped by the value of λ_0.]
Figure 1: The average minimum distance indices across different methods and settings in the simulation study. A lower value of the index indicates better separation. The scale of the y-axis is logarithmic.

Subjecting the data to our proposed independent component methods, both of them estimate a matrix W = [Ψ̂_•]'_V Λ_d^{-1/2} [Φ̂]'_G (I_p ⊗ G_G) ∈ R^{d×pK} = R^{4×44} (see Section 5), while the principal component analysis uses only the matrix W = [Φ̂]'_G (I_p ⊗ G_G) ∈ R^{4×44}. The independent/principal component scores are then W[X^i]_G = WΩ[Z^i]_G, and for the methods to successfully separate the independent component functions each row of the gain matrix WΩ should pick from [Z^i]_G coefficients relating only to a single component function. For assessing the performance of a single replication we first squared the elements of the estimated gain matrix and then summed row-wise over each block of size 4 × 11, resulting in a 4 × 4 matrix R.
The closer the matrix R is to the set P of matrices with a single non-zero element in each row and column, the better the result of the estimation. To quantify this we use the minimum distance index (Ilmonen et al., 2010), D(R) ∈ [0, 1], which equals zero if and only if R ∈ P.

From the results we expect that the principal component analysis fails to estimate the sources under all settings, as the orthogonal transformation found by it is not enough to undo our mixing by the general matrix B^{1/2}. The theory behind standard FOBI, on which our coordinate representation was seen to be based, says that FOBI cannot estimate components with matching kurtosis values (Cardoso, 1989), as is the case with the identical uniform distributions in our Setting 2. On the other hand, both FOBI and JADE should be able to find the solution in Setting 1 with differing, non-zero kurtosis values, the latter most likely outmatching the former. The resulting mean minimum distance indices across 1000 replications for different settings and parameter values are shown in Fig. 1 and distinctly verify our preconceptions. As discussed earlier, the separation can fail on average when λ_0 ≤ 1 but succeeds as long as we have λ_0 > 1.

We consider the uWave gesture data set available from http://zhen-wang.appspot.com/rice/projects/uWave.html (Liu et al., 2009). At each day of the study the eight participants did ten repetitions of each of the eight gesture patterns in the Nokia gesture vocabulary (Kela et al., 2006) using a Wii® remote measuring the 3D-acceleration of the gesture. Each participant had a total of seven study days, making the total number of observed samples 4480. Of these we discarded two samples which had a measurement only for a single time point. Of the observed 3-variate curves (x-, y- and z-acceleration) we further took the subset corresponding to the three visually most similar gestures, a square, a clockwise circle and a counterclockwise circle, making our data a sample of multivariate functional data with n = 1679 and p = 3. A standard Fourier basis of 11 functions was fitted to all observations of each component function.

In pre-processing data, latent groups are most easily visually recognized from bivariate scatter plots, and our objective is thus to extract from the data a pair of components that best reveal the latent group memberships. To evaluate the methods' capabilities for this we used the following scheme. For each of the 1000 replications we randomly partitioned the data into a training set of 400 observations and a test set of 1279 observations. Next, for each value of d = 2, …, 10, the training set was subjected to either principal component analysis (conducted as in the previous example), FOBI or JADE.
As low kurtosis is often an indicator of a multimodal distribution, for the independent component analysis methods we chose from the resulting independent component scores the two having the lowest fourth moments, and for principal component analysis we considered two rules: taking the two scores with the highest variances or taking the two scores with the lowest fourth moments. Each chosen pair of scores was then used in quadratic discriminant analysis to create a classification rule and, finally, the proportion of correct classifications in the test set was computed for each rule.
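The selection-plus-classification step can be sketched as follows (assuming MASS's qda; the score matrices S_train and S_test and the gesture labels y_train and y_test are hypothetical placeholders for the objects produced above):

```r
library(MASS)
m4 <- apply(scale(S_train), 2, function(s) mean(s^4))  # fourth moments (kurtoses)
pick <- order(m4)[1:2]                                 # two lowest-kurtosis scores
fit <- qda(S_train[, pick], grouping = y_train)
mean(predict(fit, S_test[, pick])$class == y_test)     # test classification rate
```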
[Figure 2 about here. x-axis: d; y-axis: proportion of correctly classified; methods: PCA (variance), PCA (kurtosis), FOBI, JADE.]

Figure 2: The average proportions of correctly classified cases in the test set for different values of d.

The results are shown in Fig. 2, where the y-axis was cut from below for legibility and the best methods level off to a roughly constant proportion of correct classifications around d = 6. The main points of interest include the following. All methods perform equally well when d = 2, as then the chosen two components necessarily span the same space. Principal component analysis using variance as the criterion always chooses by definition the two first principal components regardless of the value of d, yielding a constant curve, and principal component analysis using kurtosis as the criterion clearly cannot find the relevant information at all. For d = 3, equal to our rule of thumb d = p, the independent component methods already separate the gestures well, and the choice d = p thus proved to be useful in this context.

[Figure 3 about here. Three scatter panels, PCA (variance), FOBI and JADE; axes: Component 1 vs Component 2; point type: gesture (square, clockwise circle, counterclockwise circle).]
Figure 3: Examples of the pairs of components found by the methods when d = 3.

Examples of the scatter plots of the pairs of components extracted from the training data by the three methods for d = 3 are given in Fig. 3, where the principal components have been scaled to better show the details. The figure shows that of the two components found by principal component analysis only the first one provides information on the separation of the group locations, while for FOBI and JADE both components carry location information. Interpretations for the FOBI independent component scores can now be obtained by examining the loading matrix reproduced in Table 1, where any loadings with absolute value greater than 0.6 have been shaded. For example, the final element of the second row tells the contribution of the 11th basis vector of the second observed function X_2 to the first estimated score Ẑ_1. We can now make two main observations. First, no separation information is carried by the basis elements of order six or higher. Since the higher index functions in Fourier bases control the finer, high-frequency properties of the resulting functions, this reveals that most of the classification information is expectedly contained in the large-scale properties of the movements and accelerations. Secondly, the y-acceleration hardly contributes to any of the scores, showing that only the x- and z-directions are relevant in the classification. Also this makes sense, assuming that the gestures are drawn in the air roughly vertically, occupying mostly the x-z plane. Similar explanations could also be produced for the JADE and principal component analysis solutions (not shown here).

Table 1: The loadings of the FOBI estimate. Ẑ_j refer to the estimated components and X_k to the original functions.

7 Discussion

We close the paper by discussing some directions for future research. First, while the provided rule of thumb of choosing d = p proved useful in the examples, the logical next step is to provide a more analytical approach, e.g. in the form of sequential hypothesis testing.

Second, Theorem 4.3 shows how the independent component scores are obtained but tells us nothing about the division of the scores into the independent subvectors. In our real data example this was not an issue as visual inspection already revealed the scores of interest, but in the case of less visual data some kind of testing procedure is called for. A similar problem was encountered in Nordhausen and Oja (2011), where an approach based on scatter matrices with the independence property was used to identify the independent subvectors, and a likewise procedure could possibly also be used here.

Third, in Section 5 it was shown that the extensions of both FOBI and JADE to multivariate functional data can be applied in practice by projecting the observed functions onto the space spanned by the first d eigenvectors of the covariance matrix operator and then subjecting the obtained standardized principal component coefficients to regular FOBI or JADE. This naturally begs the question whether also some other standard multivariate methods could be meaningfully extended to multivariate functional data simply by applying them to the principal component coefficients. Some preliminary testing shows that this is certainly the case for FastICA, a projection pursuit-based family of independent component methods (Hyvärinen and Oja, 1997).
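As a sketch of this last idea (using the fastICA package as a stand-in for any standard ICA routine; Xt denotes the matrix of standardized principal component coefficients from Theorem 5.1):

```r
library(fastICA)
ic <- fastICA(Xt, n.comp = ncol(Xt))   # regular ICA on the PC coefficients
scores <- ic$S                         # candidate independent component scores
```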
A Proofs of results

Proof of Lemma 3.1.
The self-adjointness of Σ_XX follows simply from the earlier discussion of the adjoints of the components Σ_{X_iX_j}. Furthermore, by expanding element-wise we have for any f ∈ H:

⟨Σ_XX f, f⟩_H = E{⟨(X ⊗ X)f, f⟩_H} = E(⟨X, f⟩²_H) ≥ 0,

showing that Σ_XX is non-negative.

Let {e_k}_{k=1}^∞ be an orthonormal basis of H. Using the same reasoning as above, the trace of the self-adjoint, non-negative operator Σ_XX is then

tr(Σ_XX) = Σ_{k=1}^∞ ⟨Σ_XX e_k, e_k⟩_H = E( Σ_{k=1}^∞ ⟨X, e_k⟩²_H ) = E(‖X‖²_H),

where the last equality uses Parseval's identity. Now, by our assumptions E(‖X‖²_H) is finite, making Σ_XX a trace-class operator.

To show that the affine equivariance holds, let A ∈ L(H) and write

Σ(AX) = E(AX ⊗ AX) = E{A(X ⊗ X)A*}.

Thus Σ(AX) is the unique operator C satisfying ⟨f, Cg⟩_H = E{⟨f, A(X ⊗ X)A* g⟩_H} for all f, g ∈ H. Using again the definition of the expected value of a random operator, the right-hand side is seen to equal ⟨f, A Σ(X) A* g⟩_H, showing that Σ(AX) = A Σ(X) A*.

Finally, the full independence property follows simply by assuming that X_i and X_j are independent and checking that we have

E{⟨f_i, (X_i ⊗ X_j) g_j⟩_i} = E(⟨X_i, f_i⟩_i) E(⟨X_j, g_j⟩_j) = 0,

for all f_i ∈ H_i and g_j ∈ H_j, and thus by definition E(X_i ⊗ X_j) = 0.

Proof of Lemma 4.1. Since Σ is affine equivariant, and since {φ_k}_{k=1}^∞ are eigenvectors of Σ(X), we have

Σ(X^(d)) = P_{M_d} Σ(X) P_{M_d} = Σ_{k=1}^d λ_k (φ_k ⊗ φ_k),

which further implies that Σ(X^(d))^{-1/2} = Σ_{k=1}^d λ_k^{-1/2} (φ_k ⊗ φ_k). Next, for Z^(d) = Γ X^(d) we have

Σ(Z^(d)) = Σ_{k=1}^d λ_k (Γφ_k ⊗ Γφ_k) = Γ Σ(X^(d)) Γ*.

As Γ is boundedly invertible, the inverse square root of Σ(Z^(d)) exists as a bounded operator, and we can write

Σ(Z^(d))^{-1/2} Z^(d) = {Σ(Z^(d))^{-1/2} Γ Σ(X^(d))^{1/2}} Σ(X^(d))^{-1/2} X^(d).    (11)

What remains is to prove that A = Σ(Z^(d))^{-1/2} Γ Σ(X^(d))^{1/2} is unitary, which follows by directly verifying

A A* = Σ(Z^(d))^{-1/2} Γ Σ(X^(d)) Γ* Σ(Z^(d))^{-1/2},

where Γ Σ(X^(d)) Γ* is equal to Σ(Z^(d)), showing that A A* = P_{M_d}. The operator A is thus unitary and consequently also A* A = P_{M_d}. Applying now A* from the left to both sides of (11) shows that U = A*, concluding the proof.

Proof of Lemma 4.2.
Proof of Lemma 4.2. We provide the proof for the second term in (7), the proofs for the third and fourth terms following similarly. Using the definition of the expected value of a random operator, the second term is the unique operator $A \in L(M_d)$ with
\[
\langle f, Ag \rangle_H = E\left( \langle X', \phi_i \rangle_H \langle X', \phi_j \rangle_H \langle X, f \rangle_H \langle X, g \rangle_H \right),
\]
for all $f, g \in M_d$. The independence of $X$ and $X'$ further implies that the right-hand side can be written in the form
\[
E\{\langle (X' \otimes X')\phi_i, \phi_j \rangle_H\}\, E\{\langle (X \otimes X) f, g \rangle_H\} = \langle \Sigma(X')\phi_i, \phi_j \rangle_H \langle \Sigma(X) f, g \rangle_H,
\]
which equals $\langle f, \delta_{ij} P_{M_d} g \rangle$ under our assumptions, concluding the proof.

Proof of Theorem 4.1. Consider only the first term in the expansion of $C_{ij}$ in Lemma 4.2. Plugging in $UZ$ we get
\[
U E\{\langle UZ, \phi_i \rangle_H \langle UZ, \phi_j \rangle_H (Z \otimes Z)\} U^* = U M U^*.
\]
The $(k, l)$ component operator of the expected value $M$ is defined as the operator $A_{kl} \in L(H_l, H_k)$ satisfying
\[
\langle f_k, A_{kl} g_l \rangle_k = \sum_{s,t=1}^{p} E\left( \langle Z_s, \xi_{is} \rangle_s \langle Z_t, \xi_{jt} \rangle_t \langle Z_k, f_k \rangle_k \langle Z_l, g_l \rangle_l \right), \tag{12}
\]
for all $f_k \in \operatorname{span}(\{\phi_{mk}\}_{m=1}^{d})$ and $g_l \in \operatorname{span}(\{\phi_{ml}\}_{m=1}^{d})$, where $\xi_i = (\xi_{i1}, \ldots, \xi_{ip}) = U^*\phi_i$.

Concentrate first on the off-diagonal case $k \neq l$. Then either $s = k, t = l$ or $s = l, t = k$, as otherwise the independence and zero means of the component functions reduce the sum to zero. Consider the first of these cases:
\[
E\{\langle f_k, (Z_k \otimes Z_k)\xi_{ik} \rangle_k\}\, E\{\langle g_l, (Z_l \otimes Z_l)\xi_{jl} \rangle_l\} = \langle f_k, (\xi_{ik} \otimes \xi_{jl}) g_l \rangle_k.
\]
The expected value of an arbitrary $(k, l)$th off-diagonal component operator of $M$ is thus $(\xi_{ik} \otimes \xi_{jl}) + (\xi_{jk} \otimes \xi_{il})$, which can be recognized to be also the $(k, l)$th component operator of $U^*\{(\phi_i \otimes \phi_j) + (\phi_j \otimes \phi_i)\}U$.

The general form of an arbitrary $(k, k)$th diagonal component operator of $M$ can be found in a similar manner. Notice first that if $k = l$ in (12), then it must be that $s = t$, as otherwise the sum is again zero by independence and zero means. The summation over $s$ can then be divided into two cases, $s = k$ and $s \neq k$. Manipulation similar to that done above then yields the expected value
\[
E\{(Z_k \otimes Z_k)(\xi_{ik} \otimes \xi_{jk})(Z_k \otimes Z_k)\} + (\delta_{ij} - \langle \xi_{ik}, \xi_{jk} \rangle_k) P_k
\]
for the $(k, k)$th diagonal operator, where the first summand comes from the former case and the second from the latter.

Putting now everything together into a matrix of operators shows that the last three terms in the alternative form for $C_{ij}$ in Lemma 4.2 cancel out, leaving us with the claimed result.
Proof of Theorem 4.2. For an arbitrary $X \in X(M_d)$ with $\Sigma(X) = P_{M_d}$ we have by Lemma 4.2
\[
C(X) = \sum_{i=1}^{d} C_{ii}(X) = E\left\{ \sum_{i=1}^{d} \langle X, \phi_i \rangle_H^2 (X \otimes X) \right\} - (d + 2) P_{M_d}, \tag{13}
\]
the argument of the expectation being further simplified by Parseval's identity to $\sum_{i=1}^{d} \langle X, \phi_i \rangle_H^2 (X \otimes X) = \|X\|_H^2 (X \otimes X) = (X \otimes X)^2$. The first claimed equality now follows from the form $C(X) = E\{(X \otimes X)^2\} - (d + 2) P_{M_d}$.

By Theorem 4.1, an arbitrary off-diagonal element of the operator $C(Z) = \sum_{i=1}^{d} D_{ii}$ is zero. The exact form of its diagonal elements $D_{kk}$ could also be derived from (8), but the seeming dependency of $D_{kk} = \sum_{i=1}^{d} D_{ii,kk}$ on the operator $U$ needlessly complicates things, and it is simpler to proceed straight from the form $C(Z) = E\{(Z \otimes Z)^2\} - (d + 2) P_{M_d}$. The $(k, k)$th diagonal operator of the first term is then defined as the unique operator $A_{kk} \in L(H_k, H_k)$ satisfying
\[
\langle f_k, A_{kk} g_k \rangle_k = E\left\{ \sum_{j=1}^{p} \langle Z_j, Z_j \rangle_j \langle Z_k, f_k \rangle_k \langle Z_k, g_k \rangle_k \right\},
\]
for all $f_k, g_k \in \operatorname{span}(\{\phi_{mk}\}_{m=1}^{d})$. Divide the summation over $j$ into two cases, $j = k$ and $j \neq k$. The former yields the term $E\{\langle f_k, \|Z_k\|_k^2 (Z_k \otimes Z_k) g_k \rangle_k\}$, contributing $E\{\|Z_k\|_k^2 (Z_k \otimes Z_k)\} = E\{(Z_k \otimes Z_k)^2\}$ to the final expected value. The latter yields the term
\[
\left\{ \sum_{j \neq k} E\left( \|Z_j\|_j^2 \right) \right\} E\{\langle f_k, (Z_k \otimes Z_k) g_k \rangle_k\} = \left\{ \sum_{j \neq k} E\left( \|Z_j\|_j^2 \right) \right\} \langle f_k, g_k \rangle_k,
\]
where the first multiplicand can be written as $E(\|Z\|_H^2) - E(\|Z_k\|_k^2)$, whose first term equals, by Parseval's identity,
\[
E\left( \|Z\|_H^2 \right) = \sum_{i=1}^{d} E\left( \langle Z, \phi_i \rangle_H^2 \right) = \sum_{i=1}^{d} \langle \phi_i, E(Z \otimes Z)\phi_i \rangle = d.
\]
Similarly, by choosing an orthonormal basis for the $k$th component space one can show that $E(\|Z_k\|_k^2) = d_k := \dim[\operatorname{span}(\{\phi_{mk}\}_{m=1}^{d})]$. The total contribution of the case $j \neq k$ to the expected value of the $(k, k)$th diagonal operator is thus $(d - d_k) P_k$. Finally, putting everything together with (13) yields the desired result.
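In the fully univariate case $d_k = 1$ the diagonal values produced by this construction reduce to the excess kurtoses of the components, which is easy to verify numerically. The following NumPy toy check is ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200000, 3
# A whitened vector with independent, unit-variance components
Z = np.column_stack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # excess kurtosis -1.2
    rng.laplace(0, 1 / np.sqrt(2), n),        # excess kurtosis  3.0
    rng.normal(0, 1, n),                      # excess kurtosis  0.0
])
# Sample version of C(Z) = E{(Z (x) Z)^2} - (d + 2) P
C = (Z * np.sum(Z**2, axis=1, keepdims=True)).T @ Z / n - (d + 2) * np.eye(d)
print(np.round(C, 1))   # approximately diag(-1.2, 3.0, 0.0)
```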
Proof of Lemma 4.3. Inspect without loss of generality the first eigenvector $\psi_1$ and assume that it is not canonical, $\psi_1 = (\psi_{11}, \ldots, \psi_{1p})$, where again without loss of generality we assume that $\psi_{11}$ and $\psi_{12}$ are both non-zero. Then the linearly independent vectors $(\psi_{11}, 0, 0, \ldots, 0)$ and $(0, \psi_{12}, 0, \ldots, 0)$ are both eigenvectors of $D$ associated with the same eigenvalue $\tau_1$, making the eigenspace associated with the eigenvalue $\tau_1$ have dimension at least 2, a contradiction, as the assumption on the distinctness of the eigenvalues implies unit rank. Thus only one of $\psi_{11}, \ldots, \psi_{1p}$ can be non-zero.
Proof of Lemma 4.4. By the Cauchy-Schwarz inequality and the unit length of $\psi_k$ we have
\[
w(\psi_1, \ldots, \psi_d) \leq \sum_{i=1}^{I} \sum_{k=1}^{d} \langle \psi_k, \psi_k \rangle_H \langle S_i\psi_k, S_i\psi_k \rangle_H = \sum_{i=1}^{I} \sum_{k=1}^{d} \langle S_i\psi_k, S_i\psi_k \rangle_H.
\]
Now, $\sum_{k=1}^{d} \langle S_i\psi_k, S_i\psi_k \rangle_H = \|S_i\|_{\mathrm{HS}}^2$ for any orthonormal basis $\{\psi_k\}_{k=1}^{d}$, and we have shown the first part of the claim. To see when the equality holds, recall that the Cauchy-Schwarz inequality preserves equality if and only if the two vectors in question are proportional. We must thus have $\psi_k = a_{ik} S_i\psi_k$ for some $a_{ik} \in \mathbb{R}$, for all $i = 1, \ldots, I$, $k = 1, \ldots, d$, which is equivalent to saying that each $\psi_k$ is an eigenvector of each $S_i$.
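In coordinates, the criterion $w$ of Lemma 4.4 is the usual joint diagonalization objective: the sum of the squared diagonal entries of the rotated matrices. The following small numerical check (NumPy; the helper name jd_criterion and the random test matrices are purely illustrative) confirms that for simultaneously diagonal matrices the canonical basis attains the Hilbert-Schmidt upper bound:

```python
import numpy as np

def jd_criterion(Psi, S_list):
    """w(psi_1, ..., psi_d) = sum over i and k of <psi_k, S_i psi_k>^2,
    with the candidate orthonormal basis given as the columns of Psi."""
    return sum(np.sum(np.diag(Psi.T @ S @ Psi) ** 2) for S in S_list)

rng = np.random.default_rng(1)
S_list = [np.diag(rng.normal(size=4)) for _ in range(3)]  # already diagonal
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))              # arbitrary orthonormal basis
bound = sum(np.sum(S**2) for S in S_list)                 # sum of ||S_i||_HS^2

print(np.isclose(jd_criterion(np.eye(4), S_list), bound))  # True: bound attained
print(jd_criterion(Q, S_list) <= bound + 1e-12)            # True: never exceeded
```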
Proof of Theorem 4.3. Recall first that by Lemma 4.1 we have $\tilde{X} = U\tilde{Z}$, where $\tilde{Z} = \Sigma(Z^{(d)})^{-1/2} Z^{(d)}$. By Lemma 3.1 the operator $\Sigma(Z^{(d)})$ is diagonal, and thus one possible choice for the inverse square root of the operator $\Sigma(Z^{(d)})$ is also a diagonal operator, namely the diagonal operator $G$ having as its diagonal elements some inverse square roots of the diagonal elements of $\Sigma(Z^{(d)})$. With this choice, $\Sigma(Z^{(d)})^{-1/2} = G$, also $\tilde{Z}$ has independent component functions. A reasoning similar to the one used in Remark 2.1 of Ilmonen et al. (2012) shows that all inverse square roots of $\Sigma(Z^{(d)})$ are of the form $VG$, where $V$ is unitary and can, by the unitary equivariance, be taken out of $C(\tilde{Z})$, "merging" it with $U$. We may thus without loss of generality assume that $\Sigma(Z^{(d)})^{-1/2}$ is a diagonal operator. Invoking then finally Theorem 4.2 shows that $C(\tilde{Z})$ is also a diagonal operator.

Let $\{h^F_k\}_{k=1}^{d}$ be the eigenvectors of $C(\tilde{Z})$. Then by Theorem 4.2 the FOBI-basis of $X$ is given by $\{\psi^F_k\}_{k=1}^{d} = \{U h^F_k\}_{k=1}^{d}$. Then, by the unitarity of $U$, we have
\[
\hat{Z}^F_k = U(h^F_k \otimes h^F_k)\tilde{Z}.
\]
As $C(\tilde{X})$ and $C(\tilde{Z})$ share the same eigenvalues, all the assumptions of Lemma 4.3 are satisfied and only the $l(k)$th element of $h^F_k$ is non-zero, $k = 1, \ldots, d$. Consequently,
\[
\hat{Z}_k = \langle h^F_{k, l(k)}, \tilde{Z}_{l(k)} \rangle_{l(k)}\, U h^F_k,
\]
showing that $\hat{Z}_k$ depends only on the $l(k)$th component of $Z$.

The result for the JADE-basis follows similarly. We first notice that by Theorem 4.1 the operators $C_{ij}$ are semi-unitary equivariant in the sense that we may again assume that $\Sigma(Z^{(d)})^{-1/2}$ is a diagonal operator and that the random function $\tilde{Z}$ has independent component functions. Let then $\{\psi^J_k\}_{k=1}^{d}$ be the joint diagonalizer of $C$. Now, again by Theorem 4.1, we have $C_{ij}(\tilde{X}) = U D_{ij} U^*$, where the $D_{ij} = D_{ij}(U, \tilde{Z})$ are diagonal operators, $i, j = 1, \ldots, d$. By Lemma 4.4 the joint diagonalizer of the set $\{D_{ij}\}_{i,j=1}^{d}$ is $\{h^J_k\}_{k=1}^{d}$, where each $h^J_k$ is canonical. Consequently, the joint diagonalizer of $C$ is $\{\psi^J_k\}_{k=1}^{d} = \{U h^J_k\}_{k=1}^{d}$, and the desired result follows as above with FOBI.
Proof of Theorem 5.1. First, our space being finite-dimensional, for every fixed pair of bases $B, G$ every linear operator $A$ in $M$ has associated with it the unique matrix ${}_G[A]_B \in \mathbb{R}^{pK \times pK}$ that satisfies
\[
[Af]_G = ({}_G[A]_B)[f]_B,
\]
for all $f \in M$. Furthermore, a function $f$ is an eigenfunction of the operator $A$ associated with the eigenvalue $\lambda$ if and only if $[f]_B$ is an eigenvector of the matrix ${}_B[A]_B$ associated with the same eigenvalue $\lambda$.

The inner product of two elements $f_1, f_2 \in M$ expressed in the same basis $G = \{g_k\}_{k=1}^{K}$ is given simply by
\[
\langle f_1, f_2 \rangle = \sum_{k=1}^{K} \sum_{k'=1}^{K} ([f_1]_G)_k ([f_2]_G)_{k'} \langle g_k, g_{k'} \rangle = [f_1]_G^T \mathbf{G}_G [f_2]_G,
\]
where $\mathbf{G}_G = (\langle g_k, g_{k'} \rangle)_{k,k'=1}^{K}$ is the Gram matrix of the basis $G$. The tensor product between two elements $f_1, f_2 \in M$ has the coordinate representation
\[
{}_G[f_1 \otimes f_2]_G = [f_1]_G [f_2]_G^T \mathbf{G}_G. \tag{14}
\]
These and more properties of the coordinate system were used and further developed in Li and Solea (2017).

We begin with the coordinate representation of the standardization step. An estimate for the covariance matrix operator is
\[
\hat{\Sigma}_{X_r X_s} = \frac{1}{n} \sum_{i=1}^{n} (X_{ir} \otimes X_{is}),
\]
and ${}_G[\hat{\Sigma}_{XX}]_G$ is the matrix $\{{}_G[\hat{\Sigma}_{X_r X_s}]_G\}_{r,s=1}^{p}$. By (14),
\[
{}_G[\hat{\Sigma}_{X_r X_s}]_G = \frac{1}{n} \sum_{i=1}^{n} [X_{ir}]_G [X_{is}]_G^T \mathbf{G}_G.
\]
Assembling these matrices together we obtain
\[
{}_G[\hat{\Sigma}_{XX}]_G = \frac{1}{n} [X]_G^T [X]_G (I_p \otimes \mathbf{G}_G),
\]
where $\otimes$ here denotes the Kronecker product between matrices.

We next fix the dimension $d \leq pK$ and estimate the coordinates $[\hat{\phi}_l]_G$ of the first $d$ eigenfunctions $\hat{\phi}_l$ of $\hat{\Sigma}_{XX}$. As shown in Li and Solea (2017), $\hat{\phi}_l$ is the $l$th eigenfunction of the operator $\hat{\Sigma}_{XX}$ if and only if $(I_p \otimes \mathbf{G}_G^{1/2})[\hat{\phi}_l]_G$ is the $l$th eigenvector of the matrix $(I_p \otimes \mathbf{G}_G^{1/2})({}_G[\hat{\Sigma}_{XX}]_G)(I_p \otimes \mathbf{G}_G^{-1/2})$. The orthogonal projection of $f \in M$ onto $\operatorname{span}(V)$, where $V = \{\hat{\phi}_l\}_{l=1}^{d}$, is then
\[
\sum_{l=1}^{d} (\hat{\phi}_l \otimes \hat{\phi}_l) f = \sum_{l=1}^{d} [\hat{\phi}_l]_G^T (I_p \otimes \mathbf{G}_G)[f]_G\, \hat{\phi}_l,
\]
and the coordinates of the observations in the eigenbasis $V$ are thus
\[
[X_{i(d)}]_V = [\hat{\Phi}]_G^T (I_p \otimes \mathbf{G}_G)[X_i]_G \in \mathbb{R}^d, \quad i = 1, \ldots, n,
\]
where $[\hat{\Phi}]_G = ([\hat{\phi}_1]_G, \ldots, [\hat{\phi}_d]_G)$. Let $[X_{(d)}]_V = ([X_{1(d)}]_V, \ldots, [X_{n(d)}]_V)^T$. Then the above equations can be written in matrix form as
\[
[X_{(d)}]_V = [X]_G (I_p \otimes \mathbf{G}_G)[\hat{\Phi}]_G.
\]
Since the principal component scores satisfy ${}_V[\hat{\Sigma}(X_{(d)})]_V = \Lambda_d$, where $\Lambda_d = \operatorname{diag}(\lambda_1, \ldots, \lambda_d)$ contains the eigenvalues of $\hat{\Sigma}_{XX}$, the coordinates of the standardized observations $\tilde{X}_i$ in the eigenbasis are
\[
[\tilde{X}_i]_V = ({}_V[\hat{\Sigma}(X_{(d)})]_V)^{-1/2}[X_{i(d)}]_V = \Lambda_d^{-1/2} [\hat{\Phi}]_G^T (I_p \otimes \mathbf{G}_G)[X_i]_G.
\]

Turning our attention to the fourth cross-cumulant operators, we have for fixed $k, l = 1, \ldots, d$ the estimate
\[
\hat{C}_{kl}(\tilde{X}) = \frac{1}{n} \sum_{i=1}^{n} \langle \tilde{X}_i, \hat{\phi}_k \rangle_V \langle \tilde{X}_i, \hat{\phi}_l \rangle_V (\tilde{X}_i \otimes \tilde{X}_i) - \delta_{kl} P_{M_d} - \hat{\phi}_k \otimes \hat{\phi}_l - \hat{\phi}_l \otimes \hat{\phi}_k,
\]
where the inner product $\langle \tilde{X}_i, \hat{\phi}_k \rangle_V$ just extracts the $k$th element of the coordinate vector $[\tilde{X}_i]_V$. Reasoning then as above with the covariance matrix operator, it is straightforward to obtain the coordinate representation
\[
{}_V[\hat{C}_{kl}(\tilde{X})]_V = \frac{1}{n} \sum_{i=1}^{n} ([\tilde{X}_i]_V^T e_k)([\tilde{X}_i]_V^T e_l)\, [\tilde{X}_i]_V [\tilde{X}_i]_V^T - \delta_{kl} I_d - e_k e_l^T - e_l e_k^T,
\]
where $e_k$ is the $k$th canonical basis vector of $\mathbb{R}^d$ and $I_d$ is the $d \times d$ identity matrix.
The similarity of this form to (2) already suggests that the functional independent component analysis solutions are found by performing regular FOBI or JADE on the coordinates $[\tilde{X}_i]_V$ of the standardized observations. The coordinate representation of the estimate of the FOBI-operator (9) is now simply
\[
{}_V[\hat{C}(\tilde{X})]_V = \frac{1}{n} \sum_{i=1}^{n} [\tilde{X}_i]_V^T [\tilde{X}_i]_V\, [\tilde{X}_i]_V [\tilde{X}_i]_V^T - (d + 2) I_d,
\]
and an estimate $U_F = \{\hat{\psi}^F_m\}_{m=1}^{d}$ for the FOBI-basis is found from its eigendecomposition. Letting $[\hat{\Psi}_F]_V = ([\hat{\psi}^F_1]_V, \ldots, [\hat{\psi}^F_d]_V) \in \mathbb{R}^{d \times d}$ be the coordinate representation of the eigenvectors of $\hat{C}(\tilde{X})$ in $V$, the vector of the FOBI independent component scores, $\langle \hat{\psi}^F_m, \tilde{X}_i \rangle = [\hat{\psi}^F_m]_V^T [\tilde{X}_i]_V$, $m = 1, \ldots, d$, is then finally obtained as
\[
[\hat{\Psi}_F]_V^T [\tilde{X}_i]_V = [\hat{\Psi}_F]_V^T \Lambda_d^{-1/2} [\hat{\Phi}]_G^T (I_p \otimes \mathbf{G}_G)[X_i]_G.
\]
For the JADE-solution, an estimate $U_J = \{\hat{\psi}^J_m\}_{m=1}^{d}$ for the JADE-basis, i.e. the joint diagonalizer of the set $\{\hat{C}_{kl}(\tilde{X})\}_{k,l=1}^{d}$, is found by maximizing the quantity (10), the maximization problem now having the form
\[
[\hat{\Psi}_J]_V = \underset{[\hat{\Psi}_J]_V^T [\hat{\Psi}_J]_V = I_d}{\operatorname{argmax}} \sum_{k=1}^{d} \sum_{l=1}^{d} \sum_{m=1}^{d} \left\{ [\hat{\psi}^J_m]_V^T ({}_V[\hat{C}_{kl}(\tilde{X})]_V)[\hat{\psi}^J_m]_V \right\}^2,
\]
where $[\hat{\Psi}_J]_V = ([\hat{\psi}^J_1]_V, \ldots, [\hat{\psi}^J_d]_V) \in \mathbb{R}^{d \times d}$. As with FOBI above, the vectors of the JADE independent component scores are then
\[
[\hat{\Psi}_J]_V^T [\tilde{X}_i]_V = [\hat{\Psi}_J]_V^T \Lambda_d^{-1/2} [\hat{\Phi}]_G^T (I_p \otimes \mathbf{G}_G)[X_i]_G.
\]
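In coordinates, the whole procedure is thus a handful of matrix operations. The following sketch (Python with NumPy/SciPy; the function name standardized_coordinates and the array layout are our illustrative choices, not the authors' implementation) computes the whitened coordinates $[\tilde{X}_i]_V$ from the raw basis coordinates and the Gram matrix, following the formulas above:

```python
import numpy as np
from scipy.linalg import sqrtm

def standardized_coordinates(XG, Gram, p, d):
    """Coordinate form of the standardization in Theorem 5.1:
    [X_i~]_V = Lambda_d^{-1/2} Phi^T (I_p (x) G_G) [X_i]_G.

    XG   : (n, p*K) matrix whose i-th row is the coordinate vector [X_i]_G.
    Gram : (K, K) Gram matrix G_G of the (possibly non-orthonormal) basis.
    """
    n = XG.shape[0]
    XG = XG - XG.mean(axis=0)
    IG = np.kron(np.eye(p), Gram)                    # I_p (x) G_G
    IGh = np.kron(np.eye(p), np.real(sqrtm(Gram)))   # I_p (x) G_G^{1/2}
    # Symmetrized eigenproblem sharing its eigenvalues with G[Sigma_hat]_G
    lam, W = np.linalg.eigh(IGh @ (XG.T @ XG / n) @ IGh)
    idx = np.argsort(lam)[::-1][:d]
    lam, W = lam[idx], W[:, idx]
    Phi = np.linalg.solve(IGh, W)                    # columns are [phi_l]_G
    return (XG @ IG @ Phi) / np.sqrt(lam)            # rows are [X_i~]_V
```

Feeding these rows to any standard FOBI or JADE implementation, such as the R package JADE of Nordhausen et al. (2015) or the fobi_scores sketch given in the discussion above, then yields the functional independent component scores.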
References

Baker, C. R. (1973). Joint measures and cross-covariance operators. Transactions of the American Mathematical Society 186, 273–289.

Berrendero, J. R., A. Justel, and M. Svarc (2011). Principal components for multivariate functional data. Computational Statistics & Data Analysis 55(9), 2619–2634.

Bonhomme, S. and J.-M. Robin (2009). Consistent noisy independent component analysis. Journal of Econometrics 149(1), 12–25.

Bosq, D. (2012). Linear Processes in Function Spaces: Theory and Applications, Volume 149. Springer Science & Business Media.

Cardoso, J.-F. (1989). Source separation using higher order moments. In International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 2109–2112.

Cardoso, J.-F. (1998). Multidimensional independent component analysis. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 4, pp. 1941–1944.

Cardoso, J.-F. and A. Souloumiac (1993). Blind beamforming for non-Gaussian signals. In IEE Proceedings F - Radar and Signal Processing, Volume 140, pp. 362–370.

Cardoso, J.-F. and A. Souloumiac (1996). Jacobi angles for simultaneous diagonalization. SIAM Journal on Matrix Analysis and Applications 17(1), 161–164.

Chiou, J.-M., Y.-T. Chen, and Y.-F. Yang (2014). Multivariate functional principal component analysis: A normalization approach. Statistica Sinica 24, 1571–1596.

Comon, P. and C. Jutten (2010). Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press.

Conway, J. B. (2013). A Course in Functional Analysis, Volume 96. Springer Science & Business Media.

Gutch, H. W. and F. J. Theis (2012). To infinity and beyond: On ICA over Hilbert spaces. In LVA/ICA, pp. 180–187. Springer.

Happ, C. and S. Greven (2017). Multivariate functional principal component analysis for data observed on different (dimensional) domains. Journal of the American Statistical Association (accepted).

Hyvärinen, A. and E. Oja (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation 9(7), 1483–1492.

Ieva, F., A. M. Paganoni, D. Pigoli, and V. Vitelli (2011). ECG signal reconstruction, landmark registration and functional classification.

Ilmonen, P., K. Nordhausen, H. Oja, and E. Ollila (2010). A new performance index for ICA: Properties, computation and asymptotic analysis. In International Conference on Latent Variable Analysis and Signal Separation, pp. 229–236. Springer.

Ilmonen, P., H. Oja, and R. Serfling (2012). On invariant coordinate system (ICS) functionals. International Statistical Review 80(1), 93–110.

Jacques, J. and C. Preda (2014). Model-based clustering for multivariate functional data. Computational Statistics & Data Analysis 71, 92–106.

Kayano, M., K. Dozono, and S. Konishi (2010). Functional cluster analysis via orthonormalized Gaussian basis expansions and its application. Journal of Classification 27(2), 211–230.

Kela, J., P. Korpipää, J. Mäntyjärvi, S. Kallio, G. Savino, L. Jozzo, and S. Di Marca (2006). Accelerometer-based gesture control for a design environment. Personal and Ubiquitous Computing 10(5), 285–299.

Koldovsky, Z., P. Tichavsky, and E. Oja (2006). Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound. IEEE Transactions on Neural Networks 17(5), 1265–1277.

Li, B., H. Chun, and H. Zhao (2014). On an additive semigraphoid model for statistical networks with application to pathway analysis. Journal of the American Statistical Association 109, 1188–1204.

Li, B. and E. Solea (2017). A nonparametric graphical model for functional data with application to brain networks based on fMRI. Journal of the American Statistical Association (accepted).

Li, B. and J. Song (2017a). Dimension reduction for functional data based on weak conditional moments. Unpublished manuscript.

Li, B. and J. Song (2017b). Nonlinear sufficient dimension reduction for functional data. The Annals of Statistics 45, 1059–1095.

Li, B., G. Van Bever, H. Oja, R. Sabolová, and F. Critchley (2015). Functional independent component analysis: An extension of the fourth-order blind identification. Submitted.

Liu, J., L. Zhong, J. Wickramasuriya, and V. Vasudevan (2009). uWave: Accelerometer-based personalized gesture recognition and its applications. Pervasive and Mobile Computing 5(6), 657–675.

Matilainen, M., K. Nordhausen, and H. Oja (2015). New independent component analysis tools for time series. Statistics & Probability Letters 105, 80–87.

Miettinen, J., K. Nordhausen, H. Oja, and S. Taskinen (2014). Deflation-based FastICA with adaptive choices of nonlinearities. IEEE Transactions on Signal Processing 62(21), 5716–5724.

Miettinen, J., K. Nordhausen, H. Oja, S. Taskinen, and J. Virta (2017). The squared symmetric FastICA estimator. Signal Processing 131, 402–411.

Miettinen, J., K. Nordhausen, and S. Taskinen (2017). Blind source separation based on joint diagonalization in R: The packages JADE and BSSasymp. Journal of Statistical Software 76(2), 1–31.

Miettinen, J., S. Taskinen, K. Nordhausen, and H. Oja (2015). Fourth moments and independent component analysis. Statistical Science 30(3), 372–390.

Moreau, E. (2001). A generalization of joint-diagonalization criteria for source separation. IEEE Transactions on Signal Processing 49(3), 530–541.

Nordhausen, K., J.-F. Cardoso, J. Miettinen, H. Oja, E. Ollila, and S. Taskinen (2015). JADE: Blind Source Separation Methods Based on Joint Diagonalization and Some BSS Performance Criteria. R package version 1.9-93.

Nordhausen, K. and H. Oja (2011). Independent subspace analysis using three scatter matrices. Austrian Journal of Statistics 40(1&2), 93–101.

R Core Team (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Ramsay, J. and B. Silverman (2005). Functional Data Analysis. Springer.

Ramsay, J. O., H. Wickham, S. Graves, and G. Hooker (2014). fda: Functional Data Analysis. R package version 2.4.4.

Risk, B. B., D. S. Matteson, and D. Ruppert (2015). Likelihood component analysis. arXiv preprint arXiv:1511.01609.

Sato, Y. (2013, August). Theoretical considerations for multivariate functional data analysis. In Proceedings 59th ISI World Statistics Congress, pp. 25–30.

Song, J. and B. Li (2017). On additive functional principal component analysis. Unpublished manuscript.

Tokushige, S., H. Yadohisa, and K. Inada (2007). Crisp and fuzzy k-means clustering algorithms for multivariate functional data. Computational Statistics 22(1), 1–16.

Venables, W. N. and B. D. Ripley (2002). Modern Applied Statistics with S (Fourth ed.). New York: Springer. ISBN 0-387-95457-0.

Virta, J., B. Li, K. Nordhausen, and H. Oja (2017a). Independent component analysis for tensor-valued data. Journal of Multivariate Analysis 162, 172–192.

Virta, J., B. Li, K. Nordhausen, and H. Oja (2017b). JADE for tensor-valued observations. Accepted to Journal of Computational and Graphical Statistics, preprint at arXiv:1603.05406.

Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software 21(12), 1–20.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer.