Group Linear non-Gaussian Component Analysis with Applications to Neuroimaging

Yuxuan Zhao^a, David S. Matteson^a, Mary Beth Nebel^{b,c}, Stewart H. Mostofsky^{b,c,d}, Benjamin Risk^e

a Department of Statistics and Data Science, Cornell University
b Center for Neurodevelopmental and Imaging Research, Kennedy Krieger Institute
c Department of Neurology, Johns Hopkins University School of Medicine
d Department of Psychiatry and Behavioral Science, Johns Hopkins University School of Medicine
e Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University
Abstract
Independent component analysis (ICA) is an unsupervised learning method popular in functional magnetic resonance imaging (fMRI). Group ICA has been used to search for biomarkers in neurological disorders including autism spectrum disorder and dementia. However, current methods use a principal component analysis (PCA) step that may remove low-variance features. Linear non-Gaussian component analysis (LNGCA) enables simultaneous dimension reduction and feature estimation, including low-variance features, in single-subject fMRI. We present a group LNGCA model to extract group components shared by more than one subject as well as subject-specific components. To determine the total number of components in each subject, we propose a parametric resampling test that samples spatially correlated Gaussian noise to match the spatial dependence observed in the data. In simulations, our estimated group components achieve higher accuracy compared to group ICA. We apply our method to a resting-state fMRI study on autism spectrum disorder in 342 children (252 typically developing, 90 with autism), where the group signals include resting-state networks. We find examples of group components that appear to exhibit different levels of temporal engagement in autism versus typically developing children, as revealed using group LNGCA. This novel approach to matrix decomposition is a promising direction for feature detection in neuroimaging.
Keywords: big data; functional magnetic resonance imaging (fMRI); group inference; independent component analysis (ICA); matrix decomposition; principal component analysis; resting-state fMRI
1. Introduction
Independent component analysis (ICA) is a popular unsupervised learning method to identify brain networks in functional magnetic resonance imaging (fMRI) studies (Beckmann,
Email address: [email protected] (Benjamin Risk)
Preprint submitted to Elsevier January 14, 2021
2. Methods
We propose a group LNGCA model for multi-subject fMRI data, which decomposes non-Gaussian (NG) signals into group signals and individual signals for each subject. Let $i = 1, \ldots, k$ index subjects, $t = 1, \ldots, T$ index time points, and $v = 1, \ldots, V$ index voxels (volumetric pixels). Let $x_i(v) \in \mathbb{R}^T$ be a data vector of observed fMRI data from subject $i$ at voxel $v$.

To extract group signals, we first decompose the observations into an orthogonal NG subspace and a Gaussian subspace for each subject. Specifically, the LNGCA model decomposes observation $x_i(v)$ as
$$x_i(v) = M_i^s s_i(v) + M_i^n n_i(v), \quad \text{for } v = 1, \ldots, V, \qquad (1)$$
where $s_i(v) \in \mathbb{R}^{q_i}$ is a vector of mutually independent NG signals with $1 \le q_i \le T$, and $n_i(v) \in \mathbb{R}^{T - q_i}$ is a Gaussian noise vector. The number of NG signals $q_i$ may vary across $i$. The mixing matrices $M_i^s \in \mathbb{R}^{T \times q_i}$ and $M_i^n \in \mathbb{R}^{T \times (T - q_i)}$ satisfy that $[M_i^s, M_i^n]$ has full rank for any $i$. Here, $\{x_i(v)\}_{v=1,\ldots,V}$ are observed while $\{s_i(v)\}_{v=1,\ldots,V}$ and $\{n_i(v)\}_{v=1,\ldots,V}$ are latent. We assume $\mathrm{E}\,s_i(v) = 0$ with $\mathrm{E}\,s_i(v) s_i(v)' = I$. Additionally, assume $\mathrm{E}\,n_i(v) = 0$ such that $\mathrm{E}\,x_i(v) = 0$, and that $n_i(v)$ has unit variance for identifiability. In practice, data are centered by their sample mean. We call $s_i(v)$ NG signals and $n_i(v)$ Gaussian noise.

PCA+ICA projects observations into a smaller-dimensional subspace spanned by the top principal components and extracts the top NG signals from that subspace. In contrast, subject-level LNGCA directly projects observations into a smaller subspace spanned by the top NG signals, which are ordered by a measure of non-Gaussianity. Thus LNGCA can capture NG signals with small variance that may be discarded in the PCA step of PCA+ICA. Note that the key difference between LNGCA and PCA+ICA is how they search for a low-rank space; they are equivalent if there is no dimension reduction. Assume there is at least one group signal.
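To make the contrast concrete, the following toy sketch (our own construction; the variable names, distributions, and dimensions are illustrative and not from the paper) builds one low-variance but strongly non-Gaussian source alongside three high-variance Gaussian sources. Ranking by variance (the PCA criterion) places the NG source last, while ranking by a simple non-Gaussianity measure such as excess kurtosis places it first:

```python
# Hypothetical illustration: a low-variance non-Gaussian source is ranked last by
# variance (the PCA criterion) but first by excess kurtosis (one choice of D(.)).
import numpy as np

rng = np.random.default_rng(0)
V = 20000                                    # number of "voxels" (samples)
ng = rng.exponential(1.0, V) - 1.0           # centered non-Gaussian source
gauss = rng.normal(0.0, 10.0, (3, V))        # three high-variance Gaussian sources
X = np.vstack([0.5 * ng, gauss])             # row 0 (the NG source) has the SMALLEST variance

# Variance ranking: the NG source is ranked last.
var_order = np.argsort(np.var(X, axis=1))[::-1]

def excess_kurtosis(y):
    """Sample excess kurtosis; zero for Gaussian data, ~6 for an exponential."""
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

# Non-Gaussianity ranking: the NG source is ranked first.
kurt_order = np.argsort([abs(excess_kurtosis(x)) for x in X])[::-1]

print(var_order[0], kurt_order[0])
```

Because kurtosis is scale-invariant, scaling the NG source down does not change its non-Gaussianity ranking, which is the intuition behind LNGCA retaining low-variance features.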
We further decompose $s_i(v)$ into orthogonal group signals $s_g(v)$ and individual signals $s_{I,i}(v)$:
$$M_i^s s_i(v) = M_i^g s_g(v) + M_i^I s_{I,i}(v), \quad \text{for } v = 1, \ldots, V, \qquad (2)$$
where $s_g(v) \in \mathbb{R}^{q_g}$ is a vector of mutually independent group signals with $1 \le q_g \le \min_i q_i$, and $s_{I,i}(v) \in \mathbb{R}^{q_{I,i}}$ is a vector of mutually independent individual signals with $q_{I,i} = q_i - q_g$. Group signals $s_g(v)$ are shared across all subjects. Combining (1) and (2), we have the complete decomposition, which we call the group linear non-Gaussian component analysis (group LNGCA) model:
$$x_i(v) = M_i^g s_g(v) + M_i^I s_{I,i}(v) + M_i^n n_i(v). \qquad (3)$$
In subject-level LNGCA, the NG signals and the mixing matrix in (1), i.e., $s_i(v) = [s_g(v)', s_{I,i}(v)']'$ and $M_i^s = [M_i^g, M_i^I]$, are identifiable up to sign and permutation (Risk et al., 2019). Consider the matrix of subject-level components for the $i$th subject: $S_i \in \mathbb{R}^{q_i \times V}$, with rows $[s_{ij}(1), \ldots, s_{ij}(V)]$ for $j = 1, \ldots, q_i$. Then the group components are matched rows from different subjects, and individual components are rows that are not equal. In practice, we will use a singular value decomposition of the concatenated subject-level NG components (subspaces of $\mathbb{R}^V$) to find the group subspace, where the size of the singular values characterizes the extent to which a direction of the non-Gaussian subspace is present across subjects; this is discussed in the next section. In applications, if a group component is prominent in one subgroup but reduced in another subgroup, then this can be captured with a group component in which the variances of the subject-specific time courses (columns of $M_i^g$) reflect differing roles, as explored in Section 4. We expect that subject-specific components will arise from 1) components that are unique to a subject, such as certain types of motion artifacts, and 2) spatial patterns of common resting-state networks that are unique to a subject.
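Since components are identifiable only up to sign and permutation, "matched rows" across subjects can be found by maximizing absolute correlation. A minimal sketch of such matching, under our own toy setup (the permutation, sign flips, and noise level below are illustrative assumptions):

```python
# Hypothetical sketch: match component rows of two subjects by maximizing total
# absolute correlation with the Hungarian algorithm (linear_sum_assignment).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
V, q = 5000, 4
S1 = rng.exponential(1.0, (q, V))                       # subject 1 components
perm = np.array([2, 0, 3, 1])                           # unknown permutation
signs = np.array([1, -1, 1, -1])[:, None]               # unknown sign flips
S2 = signs * S1[perm] + 0.05 * rng.normal(size=(q, V))  # subject 2: permuted, flipped, noisy

C = np.abs(np.corrcoef(S1, S2)[:q, q:])   # |correlation| between every pair of rows
row, col = linear_sum_assignment(-C)      # negate to maximize total |correlation|

print(col)                                # col is the inverse of `perm`
```

The same idea extends to matching estimated components against a template, as done later in the paper's real-data analysis with the Hungarian algorithm.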
For example, in fMRI, different subjects may have similar, but not identical, default mode networks (DMNs). Roughly speaking, the intersection of the resting-state networks corresponds to a group component, while subject-specific deviations can be allocated to the individual components.

In the population formulation of group LNGCA, the decomposition for each subject following (3) is a re-labeling of the subject components in (1) into group and individual NG components. In a sample of observations, the subject-specific LNGCAs will result in noisy estimates of the population signals, and we propose a scalable algorithm to extract the group components. We summarize our estimation procedure in Algorithm 1.

In Step 1, we estimate NG signals for each subject. In this paper, we use the logistic non-linearity with the fastICA algorithm (Hyvarinen, 1999), modified for LNGCA, because the logistic non-linearity is used in Infomax and performs well in fMRI (Correa et al., 2007; Plis et al., 2014). We find this algorithm works well for the super-Gaussian distributions found in fMRI (see simulations in Section 3), and it is computationally tractable and less sensitive to initializations. Our approach can be adapted to more flexible (but typically more computationally costly) non-linearities that also allow the estimation of sub-Gaussian densities, for example, ProDenICA (Hastie and Tibshirani, 2003), for applications with sub-Gaussian distributions.

Algorithm 1: Group LNGCA estimation algorithm
Input: pre-whitened $X_i \in \mathbb{R}^{T \times V}$ with $X_i \mathbf{1} = 0$ and $V^{-1} X_i X_i' = I$; NG subspace dimension $q_i$ for each subject $i$; group NG subspace dimension $q_g$. Output:
Group signals $\hat S_g \in \mathbb{R}^{q_g \times V}$; individual signals $\hat S_{I,i} \in \mathbb{R}^{(q_i - q_g) \times V}$ for each subject $i$.

1. Perform the subject-level LNGCA for each $i$: $X_i = \hat M_i \hat S_i + \hat E_i$, where $\hat S_i \in \mathbb{R}^{q_i \times V}$.
2. Concatenate the $\hat S_i$: $\hat S^k = [\hat S_1', \ldots, \hat S_k']' \in \mathbb{R}^{(\sum_i q_i) \times V}$. Conduct an SVD of $\hat S^k$ and retain the first $q_g$ right singular vectors of $\hat S^k$, denoted $\hat Y_g \in \mathbb{R}^{q_g \times V}$.
3. Perform noise-free ICA on the group signal subspace: $\hat Y_g = \hat M \hat S_g$, where $\hat S_g \in \mathbb{R}^{q_g \times V}$ contains the estimated $q_g$ group signals satisfying $V^{-1} \hat S_g \hat S_g' = I$.
4. Project into the individual non-Gaussian subspace and apply LNGCA: $\hat M_i \hat S_i (I - V^{-1} \hat S_g' \hat S_g) = \hat A_i$, $\hat A_i = \hat M_i^I \hat S_{I,i}$, where $\hat S_{I,i} \in \mathbb{R}^{(q_i - q_g) \times V}$ contains the estimated $(q_i - q_g)$ individual signals for subject $i$.

In Step 2, we construct the subspace spanned by all NG signals across subjects by concatenating the subject-level NG components. To estimate the group non-Gaussian subspace, we conduct an SVD and extract the first $q_g$ right singular vectors, which is equivalent to performing PCA on the concatenated subject NG components. Insight into this step can be gained by viewing the SVD as a principal angle analysis. For ease of notation in what follows, rescale the NG components to have norm equal to 1 prior to concatenation. Assume observations $v = 1, \ldots, V$ from $k$ subjects, and let $S_{gi,d} \in \mathbb{R}^{1 \times V}$ be the $d$th group component in the $i$th subject. In the concatenated NG components, this group component has multiplicity $k$, so the singular value corresponding to a group component shared by all subjects equals $\sqrt{k}$. In practice, this group subspace will be an average of the information shared across subspaces. For two subjects, the principal angles between their subspaces are equal to $\arccos(\sigma_d^2 - 1)$, $d = 1, \ldots, (q_1 + q_2)$, where $\sigma_d$ is the $d$th singular value and $q_i$ is the number of NG components in the $i$th subject.
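The multiplicity argument can be checked numerically. In this sketch (our own toy construction), each of $k$ subjects contributes one shared unit-norm component and one subject-specific component orthogonal to it; the leading singular value of the concatenated matrix is then exactly $\sqrt{k}$:

```python
# Toy check: a component shared by all k subjects, rescaled to unit norm,
# appears with multiplicity k in the concatenated matrix, so its singular
# value is sqrt(k), dominating the (roughly unit) individual directions.
import numpy as np

rng = np.random.default_rng(2)
k, V = 10, 2000
g = rng.exponential(1.0, V)
g /= np.linalg.norm(g)                  # shared group component, unit norm

stacked = []
for i in range(k):
    u = rng.normal(size=V)
    u -= (u @ g) * g                    # individual component, orthogonal to g
    u /= np.linalg.norm(u)
    stacked += [g, u]                   # each subject contributes [group, individual]

S = np.array(stacked)                   # concatenated components, shape (2k, V)
sv = np.linalg.svd(S, compute_uv=False)

print(round(sv[0], 3))                  # sqrt(10) ≈ 3.162: the group direction dominates
```

Because each individual component is orthogonalized against $g$, the direction $g$ is an exact eigenvector of $S'S$ with eigenvalue $k$, so the first right singular vector recovers the group direction.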
In the case of $q_1 = q_2 = 1$, the first right singular vector is the average of the two NG components, and $\sigma_1^2 - 1$ equals the cosine of the angle between them. We examine estimation of $\{q_i\}_{i=1,\ldots,k}$ in Section 3.2. Determining $q_g$ is beyond the scope of this work but is discussed in Section 5. Note that the choice of $q_g$ is more of a practical consideration for large, noisy fMRI data, since such data usually do not admit a clear gap between group and individual components. With small $q_g$, only a small portion of the NG signals can be estimated, but estimation is fast and accurate. With large $q_g$, the returned components are likely to contain all NG signals, although computation is slower and visual inspection is required to exclude noise components. We illustrate this phenomenon in the simulation study.

In Step 3, we search for the orthogonal transformation of the group subspace that maximizes non-Gaussianity using noise-free ICA. This produces an estimate of the group components.

In Step 4, we estimate the subject-specific signal subspace as the orthogonal complement of the group signal subspace within the estimated NG subspace for that subject. Then we apply LNGCA to the subject-specific subspace to extract subject-specific NG signals.

A major difference between group ICA and group LNGCA is that subject-level dimension reduction is conducted with LNGCA rather than PCA. In simulations, we examine how this impacts estimation under different signal variance regimes. Although both algorithms use PCA for the group-stage dimension reduction, it is performed on distinct subspaces. For group LNGCA, as long as the NG signals are extracted in many subjects, they will be among the top principal components of the concatenated NG components. For group ICA, there is no such guarantee: an NG signal is included in the top principal components only when it has relatively large variance and thus is not largely discarded in the subject-level PCA step.
Although one could keep a very large proportion of the variance in the subject-level PCA step to avoid this issue, doing so makes the group-stage PCA (usually computed with iterative methods) difficult to conduct accurately because the data are much larger than in group LNGCA. When there is no dimension reduction on the subject data, the group LNGCA algorithm is equivalent to the group ICA algorithm of Calhoun et al. (2001).

In addition to the difference in how the group subspace is defined, group LNGCA estimates individual signals, which is useful for examining non-Gaussian signal that is not captured in the group components. In contrast, group ICA discards all information not captured by the group components.
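The Step-4 projection onto the orthogonal complement of the group subspace can also be verified numerically. A toy sketch (our own construction, assuming the whitening convention $V^{-1} \hat S_g \hat S_g' = I$ used above): multiplying a mixture by $I - V^{-1} \hat S_g' \hat S_g$ removes the group directions and leaves only the individual signal.

```python
# Toy check of the Step-4 projection: (I - V^{-1} Sg' Sg) annihilates the
# group rows and leaves directions orthogonal to them untouched.
import numpy as np

rng = np.random.default_rng(4)
V, qg = 1000, 2
Sg = rng.normal(size=(qg, V))
# Orthonormalize rows so that V^{-1} Sg Sg' = I (the whitening convention):
Sg = np.linalg.qr(Sg.T, mode='reduced')[0].T * np.sqrt(V)

s_ind = rng.exponential(1.0, (1, V))      # an "individual" component
s_ind -= (s_ind @ Sg.T) @ Sg / V          # make it orthogonal to the group rows

X = 3.0 * Sg[0] + 2.0 * s_ind[0]          # mixture of a group and an individual signal
P = np.eye(V) - Sg.T @ Sg / V             # projection onto the orthogonal complement
residual = X @ P

ok = np.allclose(residual, 2.0 * s_ind[0])
print(ok)                                  # only the individual part remains
```

In the algorithm, LNGCA is then applied to this residual to extract the subject-specific NG signals.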
To estimate the NG subspace dimension $q_i$ for each subject, recently proposed methods (Nordhausen et al., 2017a; Jin et al., 2017) sequentially test the dimension of the NG subspace:
$$H_k: \text{there are at most } k \text{ NG signals} \quad \text{versus} \quad H_{kA}: \text{there are at least } k+1 \text{ NG signals},$$
for $0 \le k \le T - 1$. Suppose the true dimension is $k_0$. With increasing sample size, we expect a good test to perform as follows: (1) for $k < k_0$, the power of the test for $H_k$ approaches one; (2) for $k = k_0$, the size of the test for $H_k$ approaches a prespecified $\alpha$; and (3) for $k > k_0$, the rejection probability for $H_k$ tends to be smaller than $\alpha$.

Denote the p-value associated with $H_k$ by $p_k$. The estimate is $\hat k = \{k \mid p_k > \alpha,\ p_{k-1} \le \alpha\}$. It is very expensive to test $H_k$ for all possible $k$. However, using a binary search in $k$, we can expect no more than $\lceil \log_2 T \rceil$ tests for dimension $T$ (Jin et al., 2017). The estimate $\hat k$ then relies on multiple tests, which may be problematic for large $T$. One may consider adjusting the obtained p-values using a Bonferroni correction or FDR control. However, the sequential tests are highly dependent in that, for any $k < k'$, if $H_k$ holds then $H_{k'}$ must also hold. Adjusting p-values without accounting for this special dependence structure may substantially decrease the power of the sequential tests and damage the estimate $\hat k$. Thus we use the original p-values and show through simulations in Section 3.2 that we obtain accurate estimates of $k_0$.

Suppose $D(\cdot)$ is a non-Gaussianity measure for a signal, e.g., skewness, excess kurtosis, or the Jarque-Bera statistic. For a matrix $Y \in \mathbb{R}^{T \times V}$, sort $Y = [Y_{(1)}', \cdots, Y_{(T)}']'$ such that $D(Y_{(1)}) > \cdots > D(Y_{(T)})$, where $Y_{(t)} \in \mathbb{R}^V$ is a row vector of $Y$, for $t = 1, \ldots, T$. In this section, we drop the subject index, but in practice these tests are applied separately to each subject. We will use this sorting (rank) notation for other data matrices below. Motivated by Jin et al. (2017), we assume $Y_{(j)}$ is more non-Gaussian than $Y_{(j')}$ in terms of $D(\cdot)$ for $j < j'$. Consequently, under $H_k$, $[Y_{(1)}', \ldots, Y_{(k)}']'$ are NG signals and $[Y_{(k+1)}', \ldots, Y_{(T)}']'$ are Gaussian noise. Jin et al.
(2017) proposed an algorithm to test $H_k$ based on a max-min estimator, which maximizes the non-Gaussianity of $k$ components while minimizing the non-Gaussianity of the $(T - k)$ Gaussian components. The key idea is to extract the NG signals and Gaussian noise from the original data, then generate new data samples mixed from the extracted NG signals and newly sampled Gaussian noise. However, for every generated sample, the algorithm conducts a full-rank ICA-like algorithm on a matrix of size $V \times T$, which is very expensive. To overcome this computational burden, we choose to sample only the Gaussian noise rather than the whole data set. We propose an algorithm based on the sampling distribution of the maximum component-wise non-Gaussianity, i.e., $D(G_{(1)})$, when ICA is applied to $T - k$ Gaussian components $G$. We state our method in Algorithm 2.

Algorithm 2:
Our NG subspace dimension test of $H_k$. Input: pre-whitened $X \in \mathbb{R}^{T \times V}$ with $X\mathbf{1} = 0$ and $V^{-1} X X' = I$; number of samples $B$. Output:
Empirical p-value $\hat p$.

1. Estimate $\hat S = \hat W X$ with $\hat S \in \mathbb{R}^{T \times V}$, $\hat W \in \mathcal{O}^{T \times T}$.
2. Sort $\hat S = [\hat S_{(1)}', \cdots, \hat S_{(k+1)}', \ldots, \hat S_{(T)}']'$ to obtain $D(\hat S_{(k+1)})$.
3. Repeat for $b = 1, \cdots, B$:
   (a) Generate $(T - k)$ independent Gaussian components $G^{(b)} \in \mathbb{R}^{(T-k) \times V}$.
   (b) Estimate $\hat S_G^{(b)} = \hat W_G^{(b)} G^{(b)}$ with $\hat S_G^{(b)} \in \mathbb{R}^{(T-k) \times V}$, $\hat W_G^{(b)} \in \mathcal{O}^{(T-k) \times (T-k)}$.
   (c) Sort the rows of $\hat S_G^{(b)}$ to obtain $D(\hat S_{G,(1)}^{(b)})$.
4. Calculate the empirical p-value: $\hat p = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\{D(\hat S_{(k+1)}) \le D(\hat S_{G,(1)}^{(b)})\}$.

3. Simulations: spatio-temporal signals

We evaluate the performance of group LNGCA versus group ICA (Calhoun et al., 2001) in a simulation study with twenty subjects. We also compare the performance of our NG subspace dimension test with the tests proposed in Nordhausen et al. (2017a). All algorithms are repeated with 30 random initializations. Our code is available at https://github.com/yuxuanzhao2295/Group_LNGCA_for_Neuroimaging.

3.1. Data generation

For each of the twenty subjects, we used 3 group signals, 22 individual signals from gamma random fields, and 25 Gaussian noise components from Gaussian random fields. Example components are depicted in Fig. 1. The group signals have active pixels in the shape of a "1", "2 2", or "3 3 3", generated using the R package neuRosim. For both gamma and Gaussian random fields, the full-width at half maximum of the Gaussian kernel, which controls spatial correlation, was set to 9. For the marginal distributions of the individual signals, we selected gamma parameters (shape parameter 0.02, rate parameter $10^{-…}$) that resulted in individual signals with mean logistic non-linearity ranging from $-0.…$ to $-0.87$; the corresponding range found in our real data application is from $-0.39$ to $-0.82$. For the columns of the mixing matrix, we simulate AR(1) processes per subject with $\phi = 0.37$, estimated from our real data application (detailed in Web Appendix A.1).
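The AR(1) time courses for the mixing-matrix columns can be sketched as follows (a minimal illustration: only $\phi = 0.37$ is taken from the text; the dimensions and the unit-variance innovation scaling are our own assumptions):

```python
# Hypothetical sketch: simulate AR(1) time courses, x_t = phi * x_{t-1} + e_t,
# as candidate mixing-matrix columns; innovations are scaled so the stationary
# variance is 1.
import numpy as np

def simulate_ar1(T, phi, n_series, rng):
    """Simulate n_series independent AR(1) processes of length T."""
    x = np.zeros((T, n_series))
    scale = np.sqrt(1.0 - phi ** 2)          # innovation scale for unit stationary variance
    x[0] = rng.normal(size=n_series)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + scale * rng.normal(size=n_series)
    return x

rng = np.random.default_rng(3)
M = simulate_ar1(T=50, phi=0.37, n_series=25, rng=rng)   # columns of a candidate mixing matrix

# The lag-1 sample autocorrelation should be near phi on average:
r1 = np.mean([np.corrcoef(M[:-1, j], M[1:, j])[0, 1] for j in range(25)])
print(round(r1, 2))
```

Rescaling each column then sets the component variances to match the subspace variance ratios described next.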
The scaling of the columns was chosen to produce the variance proportions detailed below.

To control signal and noise strength, we define the subspace variance ratio (SVAR) among the group NG subspace, the individual NG subspace, and the Gaussian noise subspace. Fixing the subscript $i$, let $\lambda_1, \ldots, \lambda_{q_g}$ be the nonzero eigenvalues from the eigenvalue decomposition (EVD) of the covariance matrix of $M_i^g S_g$; $\mu_1, \ldots, \mu_{q_i - q_g}$ be the nonzero eigenvalues from the EVD of the covariance matrix of $M_i^I S_{I,i}$; and $\nu_1, \ldots, \nu_{T - q_i}$ be the nonzero eigenvalues from the EVD of the covariance matrix of $M_i^n N_i$. The SVAR is defined as $\sum_{l=1}^{q_g} \lambda_l : \sum_{l=1}^{q_i - q_g} \mu_l : \sum_{l=1}^{T - q_i} \nu_l$. We can also normalize these values by the total variance, which is then equivalent to the proportions of variance from the group signals, individual signals, and Gaussian noise. Here, we focus on the variance ratio of the group NG subspace, so we designed three simulation settings corresponding to a high SVAR (large $\sum_{l=1}^{q_g} \lambda_l$), a medium SVAR, and a low SVAR in the group NG subspace. In the discussion that follows, we refer to the variance of a component as the sum of squares of the corresponding column of the mixing matrix.

Figure 1: Simulated non-Gaussian and Gaussian components. The first row depicts the 3 group NG signals. The second row depicts 3 (of 22) individual NG signals. The last row depicts 3 (of 25) Gaussian noise components. Each component is a $33 \times 33$ image corresponding to $V = 1089$.

First, we describe how we allocated the variance in a given subspace to the components in that subspace. For all settings, we assumed the SVAR was equal for all 20 subjects. Then in the group subspace, we defined a low-variance component, a medium-variance component, and a high-variance component using the values 0.1, 0.5, and 0.9, resulting in a within-subspace variance ratio of 15.4% : 29.8% : 54.8%. For the high SVAR setting, $\sum_{l=1}^{q_g} \lambda_l : \sum_{l=1}^{q_i - q_g} \mu_l : \sum_{l=1}^{T - q_i} \nu_l = 33.5\% : 29.9\% : 36.6\%$. For the low SVAR setting, we again used the values 0.1, 0.5, and 0.9, with
$\sum_{l=1}^{q_g} \lambda_l : \sum_{l=1}^{q_i - q_g} \mu_l : \sum_{l=1}^{T - q_i} \nu_l = 1.7\% : 46\% : 52.3\%$. For the medium SVAR setting, we chose $\sum_{l=1}^{q_g} \lambda_l$ to be half of the sum of its values in the high and low SVAR scenarios. Thus we have for the medium SVAR setting $\sum_{l=1}^{q_g} \lambda_l : \sum_{l=1}^{q_i - q_g} \mu_l : \sum_{l=1}^{T - q_i} \nu_l = 17.6\% : 38.6\% : 43.8\%$.

3.2. Estimating the non-Gaussian subspace dimension

We implemented our test in Algorithm 2 with spatially correlated Gaussian noise, denoted FOBI-GRF, and compared its performance with the tests FOBIasymp and FOBIboot from the R package ICtest (Nordhausen et al., 2017b). For each test, we apply a binary search to estimate the NG subspace dimension. For each SVAR setting, one experiment contains 20 subjects and the experiment is repeated 40 times, thus 800 subjects in total. The true NG subspace dimension is 25 for all settings. We report the results under the medium SVAR setting in Fig. 2. The results under the high and low SVAR settings are very similar and appear in Web Appendix Fig. A.7.

Figure 2: Estimated non-Gaussian subspace dimension across 800 subjects under the medium SVAR setting. The significance level is $\alpha = 0.05$. The dashed line indicates the true dimension, 25. The most frequently selected dimension using our test, FOBI-GRF, corresponds to the true dimension. Although the test underestimated the dimension in many simulations, this was due to possibly missing individual components, while it always retained the group components, as described in Section 3.3.

FOBIasymp and FOBIboot do not account for the spatial correlation, and they clearly overestimate the dimension. FOBIboot is a parametric bootstrap method that simulates Gaussian noise with i.i.d. entries; we see that FOBIboot estimates almost all components as NG when this assumption is violated.
Our test FOBI-GRF performs the best in terms of accuracy and stability. Sometimes our test underestimates the number of NG components, but as we show in the next section, group LNGCA is still able to extract all group NG components. The underestimation can occur when "unmixed" Gaussian random field (RF) components exhibit higher non-Gaussianity than the gamma RF components. This is similar to the gap between signal and noise eigenvalues in principal component analysis (Zhu and Ghodsi, 2006). In our context, a smaller gap between the NG signals and transformations of the spatially correlated Gaussian signals can decrease the accuracy of our testing procedure. In applications, the individual components tend to correspond to structured artifacts or individual resting-state networks, which may be more distinct from Gaussian RFs than the gamma RFs used in our simulations. We chose gamma RFs here for computational scalability, since it is difficult to generate $22 \times 20$ individual NG components otherwise.

We note that estimating the NG subspace dimension is a challenging problem, and in particular is more difficult in higher dimensions. Suppose we have a fixed number of NG signals and linearly mix them with Gaussian components. As the number of Gaussian components increases, more components will display high non-Gaussianity among the extracted independent components. Bickel et al. (2018) showed that projections of a multivariate Gaussian distribution can approximate any non-Gaussian distribution for large $T$, in particular for $T > V$. Here, it appears that with low frequency some components estimated from Gaussian RFs can be more non-Gaussian than the NG signals. Thus, when estimating the NG subspace dimension $k$, one could consider using a higher value of $\alpha$ as the dimension $T$ increases, since NG components can be hidden among spurious components.
In practice, a visual inspection may suffice to identify the spurious, disc-like components that arise from maximizing the non-Gaussianity of Gaussian RFs.

3.3. Group signal estimation

Using the estimated subject NG subspace dimensions from our test as reported in Section 3.2, we implement group LNGCA as in Algorithm 1 to estimate the group signals. For group ICA, we conducted the subject-level PCA with the number of signals selected to retain at least 82% of the variance (selected to be consistent with the real data application, detailed in Section 4) for each subject. For both group LNGCA and group ICA, we examine performance with group NG subspace dimension $q_g = 2, 3, 4$.

Figure 3: Correlation between the three estimated and true group components. Under all settings, the correlations for group LNGCA concentrate at a high value with vanishing variance.

We report the results when $q_g = 3$: the correlations between matched components appear in Fig. 3, and the estimated signals from the repetition associated with the median matching error (with the true signals) in each setting are depicted in Fig. 4. When SVAR is high, both methods recover the three group signals with high accuracy. However, accuracy across methods diverges as SVAR decreases. Group ICA failed to recover the low-variance group signal in the medium SVAR setting and failed to extract any group signals under the low SVAR setting, while group LNGCA was highly accurate under all three SVAR settings.
Also notice that group ICA better estimates group signals with larger variance, consistent with the fact that it reduces the data dimension by signal variance and may lose low-variance signals.

Figure 4: The estimated group components from a representative simulation (median matching error) when $q_g = 3$. The left three columns display results from group LNGCA, while the right three columns display results from group ICA. For each method, the allocated variance increases from left to right among the three signals. The three rows represent the high, medium, and low SVAR settings, respectively, from top to bottom.

We show in the appendix that group LNGCA is robust to the specified number of group components: (1) when $q_g = 4$, all three group signals can still be recovered with high accuracy; (2) when $q_g = 2$, two of the three group signals can be recovered with high accuracy.

4. Resting-state fMRI data example

We applied group LNGCA and group ICA to resting-state fMRI data from 342 school-age children, ages 8 through 12 years, recruited at the Kennedy Krieger Institute (PI: S. Mostofsky), including 90 with autism spectrum disorder (ASD) and 252 typically developing (TD) controls. Resting-state fMRI was acquired during either a 5 min 20 s- or 6 min 30 s-long scan on a 3.0 T Philips scanner using a single-shot, partially parallel, gradient-recalled echo planar sequence (repetition time 2500 ms, echo time 30 ms, flip angle 75°, SENSE factor of 2, 3-mm axial slices with no slice gap, in-plane resolution of 3.… × ….15 mm, resulting in 84 × … × 47 voxels). Data were registered to MNI space and smoothed using a 6-mm FWHM Gaussian filter. Each participant's preprocessed data were mean-centered and variance-normalized on a voxel-wise basis. Additional details on preprocessing and motion exclusion criteria are provided in Web Appendix B.1.
We applied our test in Algorithm 2 to six participants. Gaussian RFs were generated using the estimated FWHM of the data from 3dFWHMx in AFNI. The estimated dimensions were 65, 38, 89, 47, 73, and 83, corresponding to participants with 128 time points for the first three participants and 156 for the last three (detailed in Web Appendix B.2). Thus the number of components varied by participant but in general indicated that the NG signal was contained in a lower-dimensional subspace. The test was applied to only six participants because it is computationally intensive on fMRI data, and an approximate dimension suffices here.

For the subject-level PCA in group ICA, we performed dimension estimation on each participant's preprocessed data using an information-theoretic approach implemented within GIFT (Li et al., 2007), which calculates the minimum description length (MDL) assuming a Gaussian model. The maximum of the estimated numbers of PCs from the 342 datasets was 59. Following recommendations from Erhardt et al. (2011), we conservatively chose to use 85 components for all subjects. This kept at least 82% of the variance in each subject's data. We note that this approach retains more variance than suggested by GIFT's dimensionality tests, and in this respect makes our results with group LNGCA more similar to group ICA. For group LNGCA, we also chose 85 NG components for all participants, to make a fair comparison between the two methods. Note that 85 is also near the maximum of our estimated NG dimensions from the six participants.

For the group-level PCA, we retain 59 components for both group LNGCA and GIFT, again for fair comparison. As mentioned above, 59 is the maximum estimated number of PCs across all subjects. For group ICA, participant-specific PCs were temporally concatenated and a second PCA was performed to extract group-level PCs.
Noise-free ICA was repeated on the group-level PCs 100 times using the Infomax algorithm (Bell and Sejnowski, 1995) and the ICASSO toolbox (Himberg et al., 2004) with randomized initial conditions in GIFT. Group components from GIFT were labeled using ICs from Allen et al. (2011). For group LNGCA, we used 40 randomized initializations for each subject-level LNGCA and 100 in the group-level ICA. We used the logistic (Infomax) non-linearity and matched the components in GIFT and group LNGCA using the Hungarian algorithm. The majority of components were very similar, with correlations greater than 0.8; twenty components had correlations less than 0.8, and of those twenty, five had correlations less than 0.….

Figure 5: Log-variance of subjects' time series for components 47, 57, and 59 from GIFT and group LNGCA, by group (autism vs. control). Most significant component: group LNGCA component 59 (p = 9e−5, corrected); its matched GIFT component 59 (p = 0.1) is not significant. Second most significant component: GIFT component 47 (p = 2e−…, corrected); the matched group LNGCA component (p = 0.01) is also significant. Third most significant component: group LNGCA component 57 (p = 3e−…, corrected).

Figure 6: Comparison of group LNGCA (left) and matched group ICA components (right) associated with the most significant ASD vs. TD differences in intrinsic engagement. Color indicates voxel contribution across all subjects, and color bars are based on the second and ninety-eighth percentiles of the positive and negative values of each component. (Top row) Group LNGCA component 59 characterizes the default mode network, while the matched group ICA component may be related to motion (high activation near the edges); a significant ASD vs. TD difference in engagement is observed only for the former (Fig. 5). For both methods, component 47 (middle row) is easily identified as the frontoparietal network and is significantly less engaged in ASD vs. TD children.
Component 57 (bottom row) is identified as the default mode network for both methods but is more left-lateralized for group LNGCA, and we observe a significant ASD vs. TD difference in engagement only for group LNGCA.

5. Discussion

We propose a method to extract group non-Gaussian components from hundreds of subjects. We demonstrate in simulations that our method can extract low-variance features that are discarded by group ICA. Our method involves a first-stage LNGCA for each subject. In this stage, we present a novel test of the number of non-Gaussian components in the presence of spatially correlated noise; it improves upon methods assuming uncorrelated noise, which dramatically overestimate the dimensionality. We apply group LNGCA to an rsfMRI study and discover components that exhibit different levels of activity in children with ASD as compared with typically developing children.

Group LNGCA divides each subject's components into group, individual, and Gaussian noise components. Current group ICA methods generally assume common group components with subject-specific temporal deviations, including back projection and dual regression (e.g., Erhardt et al., 2011) and hierarchical ICA (Guo and Tang, 2013). Our formulation can allow subject-specific deviations to be captured by individual components. These individual components can also correspond to artifacts, and the subject-level LNGCA step appears to separate artifacts from neuronally related components. Lower-dimensional PCA in PCA+ICA can mistakenly aggregate features, as examined in Risk et al. (2019), although here we have compared LNGCA to PCA+ICA with relatively high-dimensional PCA. In some respects, our formulation is more flexible since it can detect components unique to one subject. Our framework is related to Joint and Individual Variation Explained (Lock et al., 2013) and Common Orthogonal Basis Extraction (COBE) (Zhou et al., 2015).
Individual subspaces in rsfMRI improved the prediction of behavior (Kashyap et al., 2019). This is also a promising direction for future research in group LNGCA.

In GIFT, there is arguably no clear guidance on how to select the number of PCs at either the subject or the group level. At the subject level, the number is often set as large as the computation can afford, but not smaller than the estimate from the MDL-based method, which assumes Gaussianity. In group LNGCA, our proposed test can be used to determine the number of components retained at the subject level. Determining the number of group components is unresolved in the group ICA literature and is beyond the scope of this work. Methods from JIVE could be examined to determine whether subject-level non-Gaussian components are noisy estimates of group components (Lock et al., 2013; Feng et al., 2018).

Lastly, our application to resting-state fMRI may be a conservative illustration of the difference between group ICA and group LNGCA. We chose 85 PCs and 85 non-Gaussian components for group ICA and group LNGCA, respectively, which is higher than suggested by GIFT's criterion, and hence is higher than would be used in many neuroimaging studies. Even with this conservative approach, we find important differences in resting-state networks. Most notably, component 59 showed significantly greater activity in the ASD versus typically developing group with group LNGCA, but these differences were not detected using the matched component from group ICA.
We hypothesize that greater dimension reduction in the subject-level analyses would lead to larger differences in the estimated resting-state networks.

Acknowledgements

Work was supported in part by grants from Autism Speaks, the National Institute of Mental Health (R01 MH078160, U54 HD079123, K01 MH109766), the National Institute of Neurological Disorders and Stroke (R01 NS048527), the National Institutes of Health/National Center for Research Resources Clinical and Translational Science Award (UL1 TR 000424-06), and the National Institute of Biomedical Imaging and Bioengineering (P54 EB15909). Financial support from the Cornell University Institute of Biotechnology, the New York State Division of Science, Technology and Innovation (NYSTAR), a Xerox PARC Faculty Research Award, National Science Foundation Awards 1455172, 1934985, 1940124, and 1940276, USAID, and Cornell University's Atkinson Center for a Sustainable Future is gratefully acknowledged as well.

References

Allen, E. A., Erhardt, E. B., Damaraju, E., Gruner, W., Segall, J. M., Silva, R. F., Havlicek, M., Rachakonda, S., Fries, J., Kalyanam, R., et al. (2011). A baseline for the multivariate comparison of resting-state networks. Frontiers in Systems Neuroscience, 5:2.

Beckmann, C. F. (2012). Modelling with independent components. NeuroImage, 62(2):891–901.

Beckmann, C. F., Mackay, C. E., Filippini, N., and Smith, S. M. (2009). Group comparison of resting-state fMRI data using multi-subject ICA and dual regression. NeuroImage, 47(Suppl 1):S148.

Beckmann, C. F. and Smith, S. M. (2004). Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging, 23(2):137–152.

Bell, A. J. and Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159.

Bickel, P. J., Kur, G., and Nadler, B. (2018). Projection pursuit in high dimensions. Proceedings of the National Academy of Sciences, 115(37):9151–9156.

Calhoun, V. D., Adali, T., Pearlson, G. D., and Pekar, J. (2001). A method for making group inferences from functional MRI data using independent component analysis. Human Brain Mapping, 14(3):140–151.

Calhoun, V. D., Eichele, T., and Pearlson, G. (2009). Functional brain networks in schizophrenia: A review. Frontiers in Human Neuroscience, 3:17.

Cardoso, J.-F. (1989). Source separation using higher order moments. In International Conference on Acoustics, Speech, and Signal Processing, pages 2109–2112. IEEE.

Chen, Z. and Calhoun, V. (2018). Effect of spatial smoothing on task fMRI ICA and functional connectivity. Frontiers in Neuroscience, 12:15.

Coalson, T. S., Van Essen, D. C., and Glasser, M. F. (2018). The impact of traditional neuroimaging methods on the spatial localization of cortical areas. Proceedings of the National Academy of Sciences of the United States of America, 115(27):E6356–E6365.

Correa, N., Adalı, T., and Calhoun, V. D. (2007). Performance of blind source separation algorithms for fMRI analysis using a group ICA method. Magnetic Resonance Imaging, 25(5):684–694.

Du, Y. and Fan, Y. (2013). Group information guided ICA for fMRI data analysis. NeuroImage, 69:157–197.

Du, Y., Fu, Z., Sui, J., Gao, S., Xing, Y., Lin, D., Salman, M., Rahaman, M. A., Abrol, A., Chen, J., et al. (2019). NeuroMark: a fully automated ICA method to identify effective fMRI markers of brain disorders. medRxiv, page 19008631.

Du, Y., Pearlson, G. D., Lin, D., Sui, J., Chen, J., Salman, M., Tamminga, C. A., Ivleva, E. I., Sweeney, J. A., Keshavan, M. S., et al. (2017). Identifying dynamic functional connectivity biomarkers using GIG-ICA: Application to schizophrenia, schizoaffective disorder, and psychotic bipolar disorder. Human Brain Mapping, 38(5):2683–2708.

Eloyan, A., Crainiceanu, C. M., and Caffo, B. S. (2013). Likelihood-based population independent component analysis. Biostatistics, 14(3):514–527.

Erhardt, E. B., Rachakonda, S., Bedrick, E. J., Allen, E. A., Adali, T., and Calhoun, V. D. (2011). Comparison of multi-subject ICA methods for analysis of fMRI data. Human Brain Mapping, 32(12):2075–2095.

Feng, Q., Jiang, M., Hannig, J., and Marron, J. (2018). Angle-based joint and individual variation explained. Journal of Multivariate Analysis, 166:241–265.

Guo, Y. (2011). A general probabilistic model for group independent component analysis and its estimation methods. Biometrics, 67(4):1532–1542.

Guo, Y. and Tang, L. (2013). A hierarchical model for probabilistic independent component analysis of multi-subject fMRI studies. Biometrics, 69(4):970–981.

Hastie, T. and Tibshirani, R. (2003). Independent components analysis through product density estimation. Advances in Neural Information Processing Systems, 15:649–656.

Himberg, J., Hyvärinen, A., and Esposito, F. (2004). Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage, 22(3):1214–1222.

Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634.

Jin, Z., Risk, B. B., and Matteson, D. S. (2017). Optimization and testing in linear non-Gaussian component analysis. Statistical Analysis and Data Mining: The ASA Data Science Journal.

Kashyap, R., Kong, R., Bhattacharjee, S., Li, J., Zhou, J., and Yeo, B. T. (2019). Individual-specific fMRI-subspaces improve functional connectivity prediction of behavior. NeuroImage, 189:804–812.

Li, Y.-O., Adalı, T., and Calhoun, V. D. (2007). Estimating the number of independent components for functional magnetic resonance imaging data. Human Brain Mapping, 28(11):1251–1266.

Lock, E. F., Hoadley, K. A., Marron, J. S., and Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics, 7(1):523.

Mejia, A. F., Nebel, M. B., Wang, Y., Caffo, B. S., and Guo, Y. (2019). Template independent component analysis: Targeted and reliable estimation of subject-level brain networks using big data population priors. arXiv preprint arXiv:1906.07294.

Minka, T. P. (2001). Automatic choice of dimensionality for PCA. In Advances in Neural Information Processing Systems, pages 598–604.

Nordhausen, K., Oja, H., Tyler, D. E., and Virta, J. (2017a). Asymptotic and bootstrap tests for the dimension of the non-Gaussian subspace. IEEE Signal Processing Letters, 24(6):887–891.

Nordhausen, K., Oja, H., Tyler, D. E., and Virta, J. (2017b). ICtest: estimating and testing the number of interesting components in linear dimension reduction. URL https://CRAN.R-project.org/package=ICtest. R package version 0.3.

Padmanabhan, A., Lynch, C. J., Schaer, M., and Menon, V. (2017). The default mode network in autism. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2(6):476–486.

Plis, S. M., Hjelm, D. R., Salakhutdinov, R., Allen, E. A., Bockholt, H. J., Long, J. D., Johnson, H. J., Paulsen, J. S., Turner, J. A., and Calhoun, V. D. (2014). Deep learning for neuroimaging: a validation study. Frontiers in Neuroscience, 8:229.

Rachakonda, S., Silva, R. F., Liu, J., and Calhoun, V. D. (2016). Memory efficient PCA methods for large group ICA. Frontiers in Neuroscience, 10:17.

Risk, B. B., Matteson, D. S., and Ruppert, D. (2019). Linear non-Gaussian component analysis via maximum likelihood. Journal of the American Statistical Association, 114(525):332–343.

Risk, B. B., Matteson, D. S., Ruppert, D., Eloyan, A., and Caffo, B. S. (2014). An evaluation of independent component analyses with an application to resting-state fMRI. Biometrics, 70(1):224–236.

Welvaert, M., Durnez, J., Moerkerke, B., Verdoolaege, G., and Rosseel, Y. (2011). neuRosim: An R package for generating fMRI data. Journal of Statistical Software, 44(10):1–18.

Yerys, B. E., Tunç, B., Satterthwaite, T. D., Antezana, L., Mosner, M. G., Bertollo, J. R., Guy, L., Schultz, R. T., and Herrington, J. D. (2019). Functional connectivity of frontoparietal and salience/ventral attention networks have independent associations with co-occurring attention-deficit/hyperactivity disorder symptoms in children with autism. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 4(4):343–351.

Zhou, G., Cichocki, A., Zhang, Y., and Mandic, D. P. (2015). Group component analysis for multiblock data: Common and individual feature extraction. IEEE Transactions on Neural Networks and Learning Systems, 27(11):2426–2439.

Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930.

Web-based Supplementary Materials for "Group linear non-Gaussian component analysis with applications to neuroimaging"

Appendix A. Simulation details

Appendix A.1. Experimental setup

We used the time courses estimated from group LNGCA in the real data application to estimate a time series model for use in the simulations. For each of the 85 non-Gaussian components extracted for each of the 342 subjects, we first estimated the time courses by ordinary least squares and then fitted the estimated time courses using an AR(1) process. The median of the estimated coefficients is 0.37 over all component/subject combinations (85 × 342). We used this AR coefficient to simulate time courses following an AR(1) process in our simulations.

Appendix A.2. Details on dimension estimation

In this section, we provide the results of dimension estimation. In short, all three methods return very similar but not identical dimension estimates across the three SVAR settings, displayed in Fig. A.7. We conjecture the reason is that the variance does not drive the dimension estimation accuracy, but rather the difference between the non-Gaussianity of the non-Gaussian components and the non-Gaussianity achieved by rotating the spatially correlated Gaussian noise.

Appendix A.3.
Details on group components extraction

In this section, we provide the results of group LNGCA and group ICA for the group components in simulations when the number of group components is misspecified: q_G = 4 and q_G = 2.

Figure A.7: Estimated non-Gaussian subspace dimension across 800 subjects under the high and low SVAR settings. The significance level is α = 0.05. The dashed line indicates the true dimension, 25. The most frequently selected dimension using our test, FOBI-GRF, corresponds to the true dimension. Although the test underestimated the dimension in many simulations, this was due to possibly missing individual components, while it always retained the group components.

Figure A.8: Correlation between the three true group components and their matched estimated components when q_G = 4. The correlation under all settings for group LNGCA concentrates at a high value with vanishing variance.

First, when q_G = 4, both methods perform similarly to the case when q_G = 3: the three components matched to the true group components have similar correlations as when q_G = 3, shown in Fig. A.8. The estimated signals from the repetition associated with the median matching error (with the true signal) in each setting are depicted in Fig. A.9. When q_G = 2, both methods extract the two components with higher variance. The correlations between the true group components and their matched estimates are similar to the corresponding values when q_G = 3, shown in Fig. A.10. The estimated signals from the repetition associated with the median matching error (with the true signal) in each setting are depicted in Fig. A.11.
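The matching of estimated to true (or GIFT to group LNGCA) components described above uses the Hungarian algorithm on component correlations. A minimal sketch of this step, not the authors' implementation, is below; the function name and toy data are illustrative, and sign is ignored because components are identifiable only up to sign.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_components(true_comps, est_comps):
    """Match rows of est_comps to rows of true_comps by maximizing
    total absolute correlation (Hungarian algorithm).

    Both arrays are (n_components, n_voxels). Returns, for each true
    component, the index of its matched estimate and their |correlation|.
    """
    n = true_comps.shape[0]
    # Cross-correlation block: rows = true components, cols = estimates.
    corr = np.abs(np.corrcoef(true_comps, est_comps)[:n, n:])
    row, col = linear_sum_assignment(-corr)  # negate to maximize
    return col, corr[row, col]


# Toy usage: recover a known permutation despite sign flips and noise.
rng = np.random.default_rng(0)
truth = rng.standard_normal((3, 500))
est = -truth[[2, 0, 1]] + 0.05 * rng.standard_normal((3, 500))
perm, corrs = match_components(truth, est)
print(perm)          # permutation matching each true component
print(corrs.min())   # matched correlations are near 1
```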
Figure A.9: The estimated group components from a representative simulation (median matching error) when q_G = 4. The left four columns display results from group LNGCA, while the right four columns display results from group ICA. For each method, the allocated variance increases from left to right among the three signals, excluding the last estimated noise signal. The three rows represent the high, medium, and low SVAR settings, respectively, from top to bottom.

Appendix B. Details on the resting-state fMRI data example

Appendix B.1. Additional information on the data and preprocessing

All children completed a mock scan to acclimate to the scanning environment. Participants were instructed to relax, fixate on a cross-hair, and remain as still as possible. Functional data were preprocessed using SPM12 and custom MATLAB code (https://github.com/KKI-CNIR/CNIR-fmri_preproc_toolbox). Rs-fMRI scans were slice-time adjusted using the slice acquired in the middle of the TR as a reference, and rigid body realignment parameters were estimated to adjust for motion. The volume collected in the middle of the scan was non-linearly registered to Montreal Neurological Institute (MNI) space using the MNI EPI template. The estimated rigid body and nonlinear spatial transformations were applied to the functional data in one step, producing 2-mm isotropic voxels in MNI space. Voxel time series were linearly detrended. Data were excluded for between-volume translational movements > ….

Figure A.10: Correlation between the two estimated components and their matched true group components when q_G = 2. The correlation under all settings for group LNGCA concentrates at a high value with vanishing variance.
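The linear detrending step in the preprocessing above removes a least-squares linear trend from each voxel time series. A minimal sketch, assuming a simple intercept-plus-slope model per voxel (this is an illustration, not the toolbox's actual code):

```python
import numpy as np


def linear_detrend(ts):
    """Remove a least-squares linear trend (intercept + slope) from each
    column of ts, an (n_timepoints, n_voxels) array."""
    t = np.arange(ts.shape[0], dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # design: intercept, time
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta


# Usage: a purely linear signal detrends to (numerically) zero.
t = np.arange(100)
ts = np.column_stack([2.0 + 0.1 * t, -1.0 + 0.3 * t])
resid = linear_detrend(ts)
print(np.abs(resid).max() < 1e-8)  # True
```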
Figure A.11: The estimated group components from a representative simulation (median matching error) when q_G = 2. The left two columns display results from group LNGCA, while the right two columns display results from group ICA. For each method, the allocated variance increases from left to right. The three rows represent the high, medium, and low SVAR settings, respectively, from top to bottom.

Table B.1: Details on the dimension test for six subjects. The sequential hypothesis path is presented as dimension (p-value), with computation time in minutes: 75 (0.11), 70 (0.14), 67 (0.10), 66 (0.06); 36 (0.04); 90 (0.1); 48 (0.08).

Appendix B.2. Dimension estimation

We applied the NG subspace dimension test of Section 2.3 to six participants. The estimated dimensions were 65, 38, 89, 47, 73, and 83, for participants with 128 time points (the first three) and 156 time points (the last three). We report the sequential hypotheses applied, their associated p-values, and the computation time for each participant in Table B.1. For each participant, the sequential test starts from 85, and the estimated dimension is determined by the two bold test results. We also implemented a sequential test with FOBIasymp; the estimated dimensions were 126, 126, 126, 148, 151, and 153. FOBIasymp tends to overestimate the number of non-Gaussian components, as discussed in our simulations.

Appendix B.3. Subject-specific components

Example subject-specific components are depicted in Figure B.12.

Figure B.12: Example subject-specific components from four different participants. These components include artifacts. Activation near the brain edge, as in the first and third rows, is often indicative of a motion artifact.
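The AR(1) time-course model used in the simulations (Appendix A.1) can be sketched as follows. This is an illustrative sketch, not the authors' code: the coefficient is estimated by ordinary least squares on the lag-1 regression, and 0.37 is the median coefficient reported in the paper.

```python
import numpy as np


def fit_ar1(x):
    """OLS estimate of phi in x_t = phi * x_{t-1} + e_t (mean-centered)."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x[:-1] @ x[:-1]))


def simulate_ar1(phi, n, rng):
    """Simulate n points of a zero-mean AR(1) process with unit-variance
    Gaussian innovations."""
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


# Usage: simulate with the paper's median coefficient 0.37 and recover it.
rng = np.random.default_rng(1)
x = simulate_ar1(0.37, 20000, rng)
print(fit_ar1(x))  # close to 0.37
```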