Grouping effects of sparse CCA models in variable selection
Kefei Liu, Qi Long, Li Shen
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA. Email: [email protected].
Abstract—The sparse canonical correlation analysis (SCCA) is a bi-multivariate association model that finds sparse linear combinations of two sets of variables that are maximally correlated with each other. In addition to the standard SCCA model, a simplified SCCA criterion, which maximizes the cross-covariance between a pair of canonical variables instead of their cross-correlation, is widely used in the literature due to its computational simplicity. However, the behaviors and properties of the solutions of these two models remain unknown in theory. In this paper, we analyze the grouping effect of the standard and simplified SCCA models in variable selection. In high-dimensional settings, the variables often form groups with high within-group correlation and low between-group correlation. Our theoretical analysis shows that for grouped variable selection, the simplified SCCA jointly selects or deselects a group of variables together, while the standard SCCA randomly selects a few dominant variables from each relevant group of correlated variables. Empirical results on synthetic data and real imaging genetics data verify the finding of our theoretical analysis.
Index Terms—canonical correlation analysis (CCA), sparse CCA, grouped variables, dimensionality reduction, imaging genetics
I. INTRODUCTION
Canonical correlation analysis (CCA) [1], [2] is a multivariate statistical method which investigates the associations between two sets of variables. It has found applications in statistics [3], data mining and machine learning [2], [4], functional magnetic resonance imaging [5], [6], genomics [7] and other fields [8]. Given two data sets X ∈ R^{n×p} and Y ∈ R^{n×q} measured on the same set of n samples, CCA seeks linear combinations of the variables in X and those in Y that are maximally correlated with each other:

maximize_{u,v} u^T X^T Y v  subject to  u^T X^T X u ≤ 1, v^T Y^T Y v ≤ 1,

where X and Y are column-centered to zero mean.

Compared with multivariate multiple regression, CCA is "symmetric" and more flexible in finding variables from both X and Y that predict each other well. However, in the high-dimensional setting (n < p) such as linking imaging to genomics [9], [10], CCA breaks down because it has infinitely many solutions. In particular, the solution can have any support of cardinality greater than or equal to n, which means that CCA can select an arbitrary set of n or more variables. To handle this, sparse CCA (SCCA) [11], [12], [13], [14], [15] uses L1 sparsity regularization to select a subset of variables, which can improve interpretability, stability, and variable selection performance.

A main drawback of SCCA is that it is computationally expensive. To reduce the computational load, a common practice is to replace the covariance matrices X^T X and Y^T Y in the L2 constraints with diagonal matrices [16], [17], [18], [19], [20]. The resulting simplified SCCA model admits a closed-form solution for each subproblem (update of u with v fixed, or vice versa) and is thus computationally more efficient.

However, the fundamental difference between the standard and simplified SCCA in variable selection remains unclear, particularly regarding the theoretical properties of their solutions. In [17], [20], the use of the simplified SCCA model is justified only by the empirical observation that "in high-dimensional classification problems [21], [22], treating the covariance matrix as diagonal can yield good results". In this paper, we attempt to close this gap by investigating the properties of the solutions of the standard and simplified SCCA models.

Our main contributions are summarized as follows.

● The behaviors of the standard and simplified SCCA models in grouped variable selection are theoretically characterized. In high-dimension, small-sample-size problems, the variables often form groups of various sizes with high within-group correlation and low between-group correlation. We show that the simplified SCCA jointly selects or deselects a group of correlated variables together, while the standard SCCA tends to select a few dominant variables from each relevant group of correlated variables. This finding can help practitioners choose the proper method for their tasks.

● Lemma 2.2 of [17] is extended from c ∈ [√|S|, ∞) to c ∈ (0, ∞), where S = {i : i ∈ argmax_j |a_j|}. Lemma 2.2 of [17], which solves maximize_u a^T u subject to ‖u‖_2 ≤ 1, ‖u‖_1 ≤ c, is a key component of the simplified SCCA algorithm used to solve the subproblems at each iteration of the alternating optimization algorithm. However, that lemma fails to provide a solution to the above problem for c ∈ (0, √|S|).
● Greedy algorithms to sequentially compute multiple canonical components for standard and simplified SCCA are derived and presented. To the best of our knowledge, these algorithms are new.
Notation: Scalars are denoted by italic letters, column vectors by boldface lowercase letters, and matrices by boldface capitals. The j-th column vector of a matrix X is denoted as x_j. The superscript T stands for the transpose. ‖u‖_2 and ‖u‖_1 denote the Euclidean norm and ℓ1 norm of a vector u, respectively. σ_max(A) and λ_max(A) denote the largest singular value and largest eigenvalue of a matrix A, respectively. For a set S, its cardinality is denoted as |S|. The soft-thresholding operator is defined as

S(a, Δ) = a − Δ if a > Δ;  a + Δ if a < −Δ;  0 if −Δ ≤ a ≤ Δ,

where Δ is a non-negative constant.
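For reference, a direct NumPy rendering of this operator is given below; it is a minimal sketch, and the vectorized element-wise form is an assumption of convenience for later use.

```python
import numpy as np

def soft_threshold(a, delta):
    """Element-wise soft-thresholding S(a, delta), with delta >= 0."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)
```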
II. SPARSE CCA MODEL

Assume that X and Y are column-centered to zero mean. SCCA aims to find linear combinations of variables in X and Y that maximize their correlation [15], [13]:

maximize_{u,v} u^T X^T Y v
subject to u^T X^T X u ≤ 1, ‖u‖_1 ≤ c_1,
           v^T Y^T Y v ≤ 1, ‖v‖_1 ≤ c_2,    (1)

where Xu and Yv are the canonical variables, u and v are the canonical loadings/weights measuring the contribution of each feature in the identified association, and c_1 > 0, c_2 > 0 are the regularization parameters that control the sparsity of the solution.

Problem (1) is not convenient to solve due to the quadratic constraints. To save computational cost, it is common practice to treat the covariance matrices X^T X and Y^T Y as diagonal [16], [17], [18], [19], [20], [14], [23]. This yields the following simplified formulation of SCCA:

maximize_{u,v} u^T X^T Y v
subject to ‖u‖_2 ≤ 1, ‖u‖_1 ≤ c_1,
           ‖v‖_2 ≤ 1, ‖v‖_1 ≤ c_2,    (2)

where c_1, c_2 > 0.

In Section IV and Supplementary Materials Section B, we describe algorithms to fit the two models and explain how to obtain multiple canonical components.
III. GROUPING EFFECT ANALYSIS

In high-dimensional problems such as imaging genomics, grouped variables are common, and how to properly select them is an important research problem [10], [24], [25], [26]. For a sparse CCA model, we say it exhibits the grouping effect if it jointly selects or deselects each group of highly correlated variables together.

To gain initial insight, we start with the simplest case in which all p X variables are fully correlated with each other.
Lemma III.1.
Let x_1 = x_2 = ⋯ = x_p have unit L2 norm. The optimal solution u* to problem (1) is (i) any point on the segment of the line u_1 + u_2 + ⋯ + u_p = 1 that lies inside the L1 ball:

u*_1 + u*_2 + ⋯ + u*_p = 1    (3)
‖u*‖_1 ≤ c_1    (4)

when c_1 ≥ 1, and (ii) any u*_1 ≥ 0, u*_2 ≥ 0, ..., u*_p ≥ 0 that satisfy

u*_1 + u*_2 + ⋯ + u*_p = c_1    (5)

when 0 < c_1 < 1.

The optimal solution u* to problem (2) is (i) u*_1 = u*_2 = ⋯ = u*_p = 1/√p when c_1 ≥ √p, and (ii) any u*_1 ≥ 0, u*_2 ≥ 0, ..., u*_p ≥ 0 that satisfy

u*_1 + u*_2 + ⋯ + u*_p = c_1    (6)
u*_1² + u*_2² + ⋯ + u*_p² ≤ 1    (7)

when 1 ≤ c_1 < √p.

Proof. We first prove the result for problem (1), i.e., the SCCA model. When x_1 = x_2 = ⋯ = x_p ≜ x, problem (1) reduces to

maximize_{u,v} (u_1 + u_2 + ⋯ + u_p) x^T Y v
subject to |u_1 + u_2 + ⋯ + u_p| ≤ 1, ‖u‖_1 ≤ c_1,
           v^T Y^T Y v ≤ 1, ‖v‖_1 ≤ c_2,    (8)

where c_1 ≥ 0, c_2 ≥ 0.

Note that the optimal solution to problem (8) is not unique because the objective function remains the same after we reverse the signs of both u and v. To resolve this, we assume u_1 + u_2 + ⋯ + u_p ≥ 0. Note also that the optimal value of problem (8) is larger than zero when c_1 > 0, c_2 > 0. As a result, u and v can be independently optimized:

u* = argmax_u (u_1 + u_2 + ⋯ + u_p) subject to |u_1 + u_2 + ⋯ + u_p| ≤ 1, ‖u‖_1 ≤ c_1    (9)
v* = argmax_v x^T Y v subject to v^T Y^T Y v ≤ 1, ‖v‖_1 ≤ c_2.    (10)

Solving (9) yields the optimal solution u* shown in (3)-(5).

We next prove the result for problem (2), i.e., the simplified SCCA model. When x_1 = x_2 = ⋯ = x_p ≜ x, problem (2) reduces to

maximize_{u,v} (u_1 + u_2 + ⋯ + u_p) x^T Y v
subject to ‖u‖_2 ≤ 1, ‖u‖_1 ≤ c_1,
           ‖v‖_2 ≤ 1, ‖v‖_1 ≤ c_2,    (11)

where c_1 ≥ 0, c_2 ≥ 0.

To resolve the sign ambiguity, we assume u_1 + u_2 + ⋯ + u_p ≥ 0. Therefore, u and v can be independently optimized:

u* = argmax_u (u_1 + u_2 + ⋯ + u_p) subject to ‖u‖_2 ≤ 1, ‖u‖_1 ≤ c_1    (12)
v* = argmax_v x^T Y v subject to ‖v‖_2 ≤ 1, ‖v‖_1 ≤ c_2.    (13)

Solving (12) yields the optimal solution shown in Lemma III.1 (simplified SCCA part).

We next provide a formal proof of the grouping effect in variable selection for the simplified SCCA.
Theorem III.2.
Given data (X, Y), with columns standardized to zero mean and unit norm, and regularization parameters (c_1, c_2), let (u*, v*) be an optimal solution to problem (2). Assume that at (u*, v*) the L2 inequality constraint on u is strongly active. We have:

● when u*_i u*_j > 0,

|u*_i − u*_j| ≤ (2/α) min( σ_max(Y), c_2 √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}²) ) √((1 − r_ij)/2);    (14)

● when u*_i u*_j < 0,

|u*_i + u*_j| ≤ (2/α) min( σ_max(Y), c_2 √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}²) ) √((1 + r_ij)/2),    (15)

where r_ij = x_i^T x_j ∈ [−1, 1] is the Pearson correlation coefficient between x_i and x_j, and α > 0 is a constant that only depends on (X, Y, c_1, c_2).

Likewise, if at (u*, v*) the L2 inequality constraint on v is strongly active, we have:

● when v*_i v*_j > 0,

|v*_i − v*_j| ≤ (2/α) min( σ_max(X), c_1 √(Σ_{ℓ=1}^n max_{1≤i≤p} x_{ℓi}²) ) √((1 − r′_ij)/2);    (16)

● when v*_i v*_j < 0,

|v*_i + v*_j| ≤ (2/α) min( σ_max(X), c_1 √(Σ_{ℓ=1}^n max_{1≤i≤p} x_{ℓi}²) ) √((1 + r′_ij)/2),    (17)

where r′_ij = y_i^T y_j ∈ [−1, 1] is the Pearson correlation coefficient between y_i and y_j, and α > 0 is a constant that only depends on (X, Y, c_1, c_2).

Proof. Each subproblem (solve for u with v fixed, or solve for v with u fixed) is a convex optimization problem with differentiable objective and constraint functions (the L1 inequality constraint can be written as 2^p linear inequality constraints) and is strictly feasible (Slater's condition holds), so the KKT conditions provide necessary and sufficient conditions for optimality [27].

The KKT conditions for the optimality of u* consist of the following:

α u* + λ s = X^T Y v*,    (18)

where s_i = sign(u*_i) if u*_i ≠ 0 and s_i ∈ [−1, 1] otherwise, and

α ≥ 0, ‖u*‖_2 ≤ 1, α(‖u*‖_2 − 1) = 0,    (19)
λ ≥ 0, ‖u*‖_1 ≤ c_1, λ(‖u*‖_1 − c_1) = 0.    (20)

If u*_i u*_j > 0, then both u*_i and u*_j are non-zero with sign(u*_i) = sign(u*_j). From (18), it follows that

α u*_i + λ sign(u*_i) = x_i^T Y v*,    (21)
α u*_j + λ sign(u*_j) = x_j^T Y v*.    (22)

Subtracting (22) from (21) gives

α (u*_i − u*_j) = (x_i − x_j)^T Y v*.    (23)

Therefore, we have

|u*_i − u*_j| = (1/α) |(x_i − x_j)^T Y v*| ≤ (1/α) ‖x_i − x_j‖_2 ‖Y v*‖_2.    (24)

Since X is column standardized, we have

‖x_i − x_j‖_2 = √(‖x_i‖_2² + ‖x_j‖_2² − 2 x_i^T x_j) = √(2(1 − r_ij)),    (25)

where r_ij = x_i^T x_j is the sample Pearson correlation coefficient between x_i and x_j.

In the domain of problem (2), it holds that

‖Y v*‖_2 ≤ σ_max(Y) ‖v*‖_2 ≤ σ_max(Y)    (26)

and

‖Y v*‖_2 = √(Σ_{ℓ=1}^n (Σ_{j=1}^q y_{ℓj} v*_j)²) ≤ √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}² (Σ_{j=1}^q |v*_j|)²) = √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}²) ‖v*‖_1 ≤ c_2 √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}²),    (27)

where in (26) and (27) we have used the L2 and L1 constraints in problem (2), respectively.

Substituting (25)-(27) into (24), we arrive at

|u*_i − u*_j| ≤ (2/α) min( σ_max(Y), c_2 √(Σ_{ℓ=1}^n max_{1≤j≤q} y_{ℓj}²) ) √((1 − r_ij)/2).    (28)

Since the L2 inequality constraint on u is strongly active at (u*, v*), we have α > 0. Specifically, combining conditions (18)-(20) yields

α = ‖S(X^T Y v*, λ)‖_2,    (29)

where λ = 0 if this results in ‖X^T Y v*‖_1 / ‖X^T Y v*‖_2 ≤ c_1; otherwise, λ is the smallest positive number for which ‖S(X^T Y v*, λ)‖_1 / ‖S(X^T Y v*, λ)‖_2 = c_1. Thus we obtain (14).

Using a similar line of argument, we can prove (15) and (16)-(17).
Fig. 1. The optimal solution set u* with p = 2 identical variables: (a) the SCCA problem; (b)-(c) the simplified SCCA with two different values of c_1. The feasible set of points is shown lightly shaded. The optimal points are highlighted in orange.

Fig. 1 illustrates the optimal solution u* to problems (1) and (2) with p = 2 identical X variables. We see that for SCCA (Fig. 1(a)), the optimal solution set is a line segment that crosses the axes (i.e., it includes sparse solutions). For the simplified SCCA (Figs. 1(b)-1(c)), the optimal solution set does not intersect the axes (i.e., it does not include sparse or nearly sparse solutions); in particular, when the L2 constraint on u is strongly active at the optimal solution, i.e., when c_1 ≥ √2, the optimal solution set contains a single point with equal coordinates: (u*_1, u*_2) = (1/√2, 1/√2).
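To make the bound (14) concrete, the following sketch (Python/NumPy) evaluates its right-hand side for a pair of highly correlated, standardized columns. The synthetic data, the value of c_2, and the placeholder value of α are assumptions made only for this illustration; in practice α is the dual variable obtained from (29) for a fitted model.

```python
import numpy as np

# Illustrative evaluation of the grouping-effect bound (14); the data, c2 and
# the placeholder alpha below are assumptions made only for this sketch.
rng = np.random.default_rng(0)
n, q = 100, 20
base = rng.standard_normal(n)
x_i = base - base.mean()
x_i /= np.linalg.norm(x_i)
x_j = base + 0.05 * rng.standard_normal(n)
x_j -= x_j.mean()
x_j /= np.linalg.norm(x_j)                     # two highly correlated, standardized columns

Y = rng.standard_normal((n, q))
Y -= Y.mean(axis=0)
Y /= np.linalg.norm(Y, axis=0)                 # standardized Y columns

r_ij = float(x_i @ x_j)                        # sample Pearson correlation
c2, alpha = 2.0, 1.0                           # alpha would come from (29) in practice
sigma_max_Y = np.linalg.svd(Y, compute_uv=False)[0]
l1_term = c2 * np.sqrt((Y ** 2).max(axis=1).sum())
bound = (2.0 / alpha) * min(sigma_max_Y, l1_term) * np.sqrt((1.0 - r_ij) / 2.0)
print(f"r_ij = {r_ij:.4f}; bound on |u*_i - u*_j| = {bound:.4f}")
```

The closer r_ij is to 1, the smaller the bound, forcing the simplified SCCA weights of the two correlated variables to be nearly equal.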
IV. OPTIMIZATION ALGORITHMS

Both problems (1) and (2) are bi-convex, i.e., convex in u with v fixed and in v with u fixed, but not jointly convex in u and v. A standard method to solve the SCCA models is alternating optimization [28]: first update u while holding v fixed, then update v while holding u fixed, and repeat this process until convergence.

A. SCCA model (1)

The SCCA model fitting algorithm is shown in Algorithm 1.
Algorithm 1
SCCA algorithm
Initialize v;
repeat
  Update u with v fixed:
    maximize_u u^T X^T Y v subject to ‖Xu‖_2 ≤ 1, ‖u‖_1 ≤ c_1    (30)
  Update v with u fixed:
    maximize_v u^T X^T Y v subject to v^T Y^T Y v ≤ 1, ‖v‖_1 ≤ c_2    (31)
until convergence.

Both problems (30) and (31) are convex optimization problems, and in [15] the linearized alternating direction method of multipliers (ADMM) [29] was proposed to solve each of them. Since [15] uses a slightly different formulation (therein the L1 regularizer appears in the objective function), we present a new linearized ADMM algorithm to solve problem (30) in Supplementary Materials A.

B. Simplified SCCA model (2)

We first introduce the following lemma, which will be used as a building block in the simplified SCCA algorithm.
Lemma IV.1.
Consider the quadratically constrained linear program (QCLP):

maximize_u a^T u subject to ‖u‖_2 ≤ 1, ‖u‖_1 ≤ c,    (32)

where c > 0 is a constant. Define S = {i : i ∈ argmax_j |a_j|}. The optimal solution u* to (32) is as follows.

● Case 1: c < √|S|

[u*]_i = (c/|S|) sign(a_i) if i ∈ S, and 0 if i ∉ S.    (33)

● Case 2: c ≥ √|S|

u* = S(a, Δ) / ‖S(a, Δ)‖_2,    (34)

where Δ = 0 if this results in ‖u*‖_1 ≤ c; otherwise, Δ > 0 satisfies ‖u*‖_1 = c. Here the soft-thresholding S(a, Δ) is applied to a coordinate-wise.

The above lemma extends Lemma 2.2 of [17] from c ∈ [√|S|, ∞) to c ∈ (0, ∞). See Supplementary Materials Section A for the proof of Lemma IV.1 and for how it extends Lemma 2.2 of [17]. In Case 1, the solution is generally not unique. Specifically, the optimal solution has the form

[u*]_i = w_i sign(a_i) if i ∈ S, and 0 if i ∉ S,

where the w_i, i ∈ S, can be any non-negative numbers that satisfy Σ_{i∈S} w_i² ≤ 1 and Σ_{i∈S} w_i = c. The presented solution is the one that minimizes Σ_{i∈S} w_i².

For the simplified SCCA in (2), each subproblem (solving u with v fixed, or solving v with u fixed) is a QCLP of the form (32), which results in Algorithm 2.
Simplified SCCA algorithm
Initialize v;
repeat
  Update u according to Lemma IV.1, with a = X^T Y v and c = c_1;
  Update v according to Lemma IV.1, with a = Y^T X u and c = c_2;
until convergence.

Note that by repeatedly applying Algorithms 1 and 2, we can obtain multiple canonical components, as described in Section B of the Supplementary Materials.
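For illustration, a minimal sketch of Algorithm 2 in Python/NumPy is given below. The closed-form subproblem solver follows Lemma IV.1; the bisection tolerance, the SVD-based warm start for v, and the fixed iteration count are assumptions, and convergence checks are omitted.

```python
import numpy as np

def soft_threshold(a, delta):
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def qclp_max(a, c, tol=1e-8):
    """Maximize a'u subject to ||u||_2 <= 1, ||u||_1 <= c (sketch of Lemma IV.1; assumes a != 0)."""
    a = np.asarray(a, dtype=float)
    S = np.isclose(np.abs(a), np.abs(a).max())
    if c < np.sqrt(S.sum()):                    # Case 1: spread weight c equally over the ties
        u = np.zeros_like(a)
        u[S] = (c / S.sum()) * np.sign(a[S])
        return u
    u = a / np.linalg.norm(a)                   # Case 2 with Delta = 0
    if np.linalg.norm(u, 1) <= c:
        return u
    lo, hi = 0.0, np.abs(a).max()               # bisection for Delta such that ||u||_1 ~ c
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        u = soft_threshold(a, mid)
        u /= np.linalg.norm(u)
        if np.linalg.norm(u, 1) > c:
            lo = mid
        else:
            hi = mid
    return u

def simplified_scca(X, Y, c1, c2, n_iter=100):
    """Alternating updates of Algorithm 2 (illustrative sketch, no convergence check)."""
    v = np.linalg.svd(X.T @ Y, full_matrices=False)[2][0]   # warm start: top right singular vector
    u = None
    for _ in range(n_iter):
        u = qclp_max(X.T @ (Y @ v), c1)
        v = qclp_max(Y.T @ (X @ u), c2)
    return u, v
```

The bisection exploits the fact that the L1 norm of the normalized soft-thresholded vector decreases as Δ grows, so a Δ matching the budget c exists whenever c ≥ √|S|.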
V. EXPERIMENTAL RESULTS AND DISCUSSION

We perform a comparative study of the two SCCA models using both synthetic data and real imaging genetics data.
A. Simulation study on synthetic data
Assume the data X ∈ R^{n×p} and Y ∈ R^{n×q} collect n i.i.d. observations/samples of random vectors x ∈ R^{p×1} and y ∈ R^{q×1} (with slight abuse of notation), respectively, with p ≈ 2000 and q = 100. We consider two simulation setups, one with uncorrelated variables and the other with grouped variables. For simplicity, we focus on the simulation and analysis of the X variables only.
1) Setup 1: uncorrelated variables:
The random vector x is modeled as standard normal: x ∼ N(0, I_p). Define z as z = c^T x, where c ∈ R^{p×1} is a sparse vector. The random vector y is then modeled as

y = d z + σ n,    (35)

where d ∈ R^{q×1} is a sparse vector and n ∼ N(0, I_q) models random noise. We set σ to achieve a signal-to-noise ratio of 1.
2) Setup 2: grouped variables:
We assume that the variables in x form G = 20 non-overlapping groups:

x = [ x_1 ⋯ x_1 | x_2 ⋯ x_2 | ⋯ | x_G ⋯ x_G ]^T,

where the g-th block contains p_g identical copies of x_g. The group sizes p_g are drawn independently from a Poisson distribution with mean 100. The total number of variables in x is p = Σ_{g=1}^G p_g. For g = 1, 2, ..., G, x_g ∼ N(0, 1).

Define c ∈ R^{p×1} as a sparse vector collecting the weights of the variables in x. We assume that the elements of c are grouped in the same way as x. Five of the G = 20 groups of variables in x are randomly selected and their weights are set to 1 (alternating in sign group-wise for visual clarity), while the remaining groups of variables in x are uninformative and their weights are set to 0. The vector c is shown in the top row of Fig. 2(c). Define a random variable z as z = c^T x. The random vector y is modeled in the same way as described in Section V-A1.
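A minimal sketch of this generative process (Python/NumPy) follows; the sample size n, the support of d, and the exact SNR scaling are assumptions made only for illustration.

```python
import numpy as np

# Illustrative generation of the grouped-variable setup; n, the support of d and
# the SNR scaling below are assumptions, not the paper's exact values.
rng = np.random.default_rng(1)
n, q, G = 500, 100, 20
group_sizes = rng.poisson(100, size=G)
p = group_sizes.sum()

# Each group is one latent N(0,1) variable replicated p_g times (full within-group correlation).
latents = rng.standard_normal((n, G))
X = np.repeat(latents, group_sizes, axis=1)

# Sparse weight vector c: five informative groups with weights +/-1, the rest zero.
c = np.zeros(p)
starts = np.concatenate(([0], np.cumsum(group_sizes)[:-1]))
informative = rng.choice(G, size=5, replace=False)
for k, g in enumerate(informative):
    c[starts[g]:starts[g] + group_sizes[g]] = (-1.0) ** k

z = X @ c                                   # latent score z = c^T x per sample
d = np.zeros(q)
d[:10] = 1.0                                # assumed sparse y-side weights
signal = np.outer(z, d)
sigma = np.sqrt((signal ** 2).mean())       # rough SNR ~ 1 scaling (assumption)
Y = signal + sigma * rng.standard_normal((n, q))
```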
3) Hyperparameter tuning & performance estimation:
To tune the hyperparameters (c_1, c_2), we partition the data into training (50%), validation (25%), and testing (25%) sets. After fitting the SCCA model on the training data, the canonical correlation on the validation data is estimated over a two-dimensional grid of (c_1, c_2) values equally spaced on the log scale between c_{ℓ,min} and c_{ℓ,max}, ℓ = 1, 2, the minimum and maximum effective values of c_ℓ. The (c_1, c_2) pair yielding the maximum validation canonical correlation is selected. Then, we train the model with the selected regularization parameters on the full training data (training + validation) and report the canonical correlation on the testing set as the performance. For the simplified SCCA, the same procedure is used except that the canonical covariance is used as the metric for hyperparameter tuning. A more detailed description of the procedure to select c_1, c_2 and to assess performance, including how to determine c_{ℓ,min} and c_{ℓ,max}, ℓ = 1, 2, is provided in Supplementary Materials F.
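A schematic of this selection procedure is sketched below (Python/NumPy); the grid size and the `fit`/`score` callables are hypothetical placeholders, and the exact grid spacing used in the paper is not reproduced here.

```python
import numpy as np

# Hypothetical grid-search helper for (c1, c2); `fit` returns (u, v) for given data
# and parameters, `score` is the validation metric (correlation or covariance).
def log_grid(c_min, c_max, num=10):
    return np.logspace(np.log10(c_min), np.log10(c_max), num)

def select_params(fit, score, Xtr, Ytr, Xval, Yval, grid1, grid2):
    best, best_score = None, -np.inf
    for c1 in grid1:
        for c2 in grid2:
            u, v = fit(Xtr, Ytr, c1, c2)
            s = score(Xval @ u, Yval @ v)
            if s > best_score:
                best, best_score = (c1, c2), s
    return best, best_score
```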
4) Simulation study results:
Fig. 2 shows the canonical weight vectors estimated by SCCA and simplified SCCA on the entire training data. In Supplementary Materials Tables S2-S3, we also summarize the variable selection performance in terms of recall, precision, F1 score, accuracy (ACC), balanced accuracy (bACC), Matthews correlation coefficient (MCC), precision-recall area under the curve (PR AUC), and relative absolute error (RAE). The canonical correlation/covariance values on the training and testing sets are reported in Table I.

In Experimental setup 1, where the variables in x are uncorrelated, the standard SCCA consistently outperforms the simplified SCCA in both the selection of variables in x and the identification of strong canonical correlation.

In Experimental setup 2, where the variables in x form groups with full correlation within each group, the simplified SCCA always assigns the same weight to each group of variables in x. In contrast, for the standard SCCA, the weights of variables in x within the same group are randomly assigned, which leads to a few variables with large weights while the remaining variables have weights close to zero. Although the simplified SCCA can falsely detect variables group-wise, it outperforms the standard SCCA in the selection of variables in x. Note that, compared to the standard SCCA, the simplified SCCA has slightly lower canonical correlation but much higher canonical covariance. This is not surprising because the standard SCCA maximizes the canonical correlation while the simplified SCCA maximizes the canonical covariance.

Regarding the selection of variables in y, the simplified SCCA performs better than the standard SCCA in both experimental setups. This is expected considering that the variables in y in (35) are highly correlated.

B. Application to real imaging genetic data
We applied the two SCCA models to a real imaging genetics data set to compare their performances. The genotyping and baseline AV-45 PET data of 757 non-Hispanic Caucasian subjects (age 72.26 ± ...) were used.
Fig. 2. The actual and estimated canonical weight vectors u and v for (a-b) Experiment 1 and (c-d) Experiment 2. In each subfigure, the top row shows the actual weights used in the generative model, and the bottom two rows show the weights estimated by SCCA and simplified SCCA on all training data, respectively. To facilitate comparison, the estimated weight vectors are scaled to have the same Euclidean norm as the actual weight vector.

... APOE, PICALM and ABCA7. However, in each gene, the simplified SCCA selects a cluster of SNPs while the standard SCCA selects only one or very few SNPs which dominate. Together with the correlation among the SNPs within each gene (Fig. S4, middle), this verifies that the simplified SCCA has the grouping effect in feature selection while the standard SCCA does not.

For imaging feature selection (Figure S7 and Table S6), although high correlation is prevalent among the 116 imaging features (Fig. S4, right), the standard SCCA selects only about 20 features while the simplified SCCA selects more than 60 features, which confirms that the simplified SCCA is prone to selecting correlated features together.
TABLE I
PERFORMANCE COMPARISON ON CANONICAL CORRELATION COEFFICIENTS ON SYNTHETIC DATA.

Model              (c_opt1, c_opt2)   Cov@Val*   Corr@Val*   Cov**     Corr**   Cov (test)   Corr (test)
Experimental setup 1
Simplified SCCA    (11.763, 4.513)    3.304      —           7.396     0.893    4.104        0.749
Experimental setup 2
Simplified SCCA    (21.354, 4.154)    461.295    —           522.153   0.983    545.016      0.985

* Cov@Val/Corr@Val: canonical covariance/correlation on the validation data during the training (model selection) stage. The reported value is the maximum canonical covariance/correlation over all candidate (c_1, c_2) (i.e., at the optimal regularization parameters (c_opt1, c_opt2)).
** Cov/Corr: canonical covariance/correlation when the optimal model is fit to the combined training and validation data.

TABLE II
PERFORMANCE COMPARISON ON CANONICAL CORRELATION COEFFICIENTS ON REAL DATA.

Fold index                    (c_opt1, c_opt2)   Cov@Val*   Corr@Val*   Cov**    Corr**   Cov (test)   Corr (test)
SCCA, full data               (2, 4)             —          —           1.1274   0.6331   —            —
Simplified SCCA, full data    (4, 16)            —          —           7.1326   0.4248   —            —

* Cov@Val/Corr@Val: mean canonical covariance/correlation for the left-out folds in the inner cross-validation used to select the regularization parameters. The reported value is the maximum mean canonical covariance/correlation over all candidate (c_1, c_2) (i.e., at the optimal regularization parameters (c_opt1, c_opt2)).
** Cov/Corr: mean canonical covariance/correlation when the optimal model is fit to the whole training data.
VI. CONCLUSION
The sparse canonical correlation analysis (SCCA) is a bi-multivariate model that maximizes the multivariate correlation between two sets of variables. Since SCCA is computationally expensive, a simplified SCCA model, which maximizes the multivariate covariance, has been widely used as its surrogate. The fundamental properties of the solutions of these two models remain unknown. Through theoretical analysis, we show that these two models behave differently regarding the grouping effects in variable selection. The simplified SCCA jointly selects or deselects a group of correlated variables together, while the standard SCCA randomly selects one or a few representatives from a group of correlated variables. Empirical results on both synthetic and real data confirm our theoretical finding. This result can guide users to choose the right SCCA model in practice.

REFERENCES

[1] H. Hotelling, "Relations between two sets of variates,"
Biometrika , vol. 28,pp. 321–377, 1936.[2] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlationanalysis: An overview with application to learning methods,”
Neuralcomputation , vol. 16, no. 12, pp. 2639–2664, 2004.[3] A. Klami, S. Virtanen et al. , “Bayesian canonical correlation analysis,”
J. Mach. Learn. Res. , vol. 14, no. Apr, pp. 965–1003, 2013.[4] L. Sun, S. Ji, and J. Ye, “Canonical correlation analysis for multilabelclassification: A least-squares formulation, extensions, and analysis,”
IEEE Trans Pattern Anal Mach Intell , vol. 33, no. 1, pp. 194–200, 2010.[5] K. J. Worsley, J.-B. Poline, K. J. Friston, and A. Evans, “Characterizingthe response of PET and fMRI data using multivariate linear models,”
Neuroimage , vol. 6, no. 4, pp. 305–319, 1997.[6] O. Friman, J. Cedefamn et al. , “Detection of neural activity in functionalMRI using canonical correlation analysis,”
Magnetic Resonance inMedicine , vol. 45, no. 2, pp. 323–330, 2001.[7] Y. Yamanishi, J.-P. Vert et al. , “Extraction of correlated gene clustersfrom multiple genomic data by generalized kernel canonical correlationanalysis,”
Bioinformatics , vol. 19, no. suppl 1, pp. i323–i330, 2003.[8] J. Via, I. Santamaria, and J. P´erez, “Canonical correlation analysis (CCA)algorithms for multiple data sets: Application to blind SIMO equalization,”in
IEEE European Signal Proc. Conf.
IEEE, 2005, pp. 1–4.[9] A. R. Hariri and D. R. Weinberger, “Imaging genomics,”
British medicalbulletin , vol. 65, no. 1, pp. 259–270, 2003.[10] L. Shen and P. M. Thompson, “Brain imaging genomics: Integratedanalysis and machine learning,”
Proceedings of the IEEE , vol. 108, no. 1,pp. 125–162, Jan 2020.[11] S. Waaijenborg, P. C. V. de Witt Hamer, and A. H. Zwinderman,“Quantifying the association between gene expressions and DNA-markersby penalized canonical correlation analysis,”
Statistical applications ingenetics and molecular biology , vol. 7, no. 1, 2008.[12] D. R. Hardoon and J. Shawe-Taylor, “Sparse canonical correlationanalysis,”
Machine Learning , vol. 83, no. 3, pp. 331–353, 2011.[13] D. Chu, L.-Z. Liao, M. K. Ng, and X. Zhang, “Sparse canonicalcorrelation analysis: New formulation and algorithm,”
IEEE Trans PatternAnal Mach Intell , vol. 35, no. 12, pp. 3050–3065, 2013.[14] E. C. Chi, G. I. Allen et al. , “Imaging genetics via sparse canonicalcorrelation analysis,” in
IEEE 10th Int Sym on Biomedical Imaging (ISBI) ,San Francisco, CA, 2013, pp. 740–743.[15] X. Suo, V. Minden, B. Nelson, R. Tibshirani, and M. Saunders, “Sparsecanonical correlation analysis,” arXiv preprint arXiv:1705.10865 , 2017.[16] E. Parkhomenko, D. Tritchler, and J. Beyene, “Sparse canonical correla-tion analysis with application to genomic data integration,”
StatisticalApplications in Genetics and Molecular Biology , vol. 8, pp. 1–34, 2009.[17] D. Witten, R. Tibshirani, and T. Hastie, “A penalized matrix decompo-sition, with applications to sparse principal components and canonicalcorrelation analysis,”
Biostatistics , vol. 10, no. 3, pp. 515–34, 2009.[18] D. M. Witten and R. J. Tibshirani, “Extensions of sparse canonicalcorrelation analysis with applications to genomic data,”
Stat Appl GenetMol Biol , vol. 8, no. 1, pp. 1–27, 2009.[19] X. Chen, H. Liu, and J. G. Carbonell, “Structured sparse canonicalcorrelation analysis,” in
International Conference on Artificial Intelligenceand Statistics , vol. 12, La Palma, Canary Islands, 2012, pp. 199–207.[20] J. Chen, F. D. Bushman, J. D. Lewis, G. D. Wu, and H. Li, “Structure-constrained sparse canonical correlation analysis with an application tomicrobiome data analysis,”
Biostatistics , vol. 14, no. 2, pp. 244–258,2013.[21] S. Dudoit, J. Fridlyand, and T. P. Speed, “Comparison of discriminationmethods for the classification of tumors using gene expression data,”
J.Am. Stat. Assoc. , vol. 97, no. 457, pp. 77–87, 2002.[22] R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu, “Class predictionby nearest shrunken centroids, with applications to DNA microarrays,”
Statistical Science , pp. 104–117, 2003.[23] J. Fang, D. Lin et al. , “Joint sparse canonical correlation analysis fordetecting differential imaging genetics modules,”
Bioinformatics , vol. 32,no. 22, pp. 3480–3488, 2016.[24] C. B. MikeWest, H. Dressman et al. , “Predicting the clinical status ofhuman breast cancer using gene expression profiles,”
PNAS , 2001.[25] H. Zou and T. Hastie, “Regularization and variable selection via theelastic net,”
Journal of the royal statistical society: series B (statisticalmethodology) , vol. 67, no. 2, pp. 301–320, 2005.[26] P. M. Thompson, N. G. Martin, and M. J. Wright, “Imaging genomics,”
Curr Opin Neurol , vol. 23, no. 4, pp. 368–73, 2010.[27] S. Boyd and L. Vandenberghe,
Convex optimization . Cambridgeuniversity press, 2004. [28] J. C. Bezdek and R. J. Hathaway, “Some notes on alternating optimiza-tion,” in
AFSS International Conference on Fuzzy Systems . Berlin,Heidelberg: Springer, 2002, pp. 288–300.[29] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributedoptimization and statistical learning via the alternating direction methodof multipliers,”
Foundations and Trends® in Machine Learning , vol. 3,no. 1, pp. 1–122, 2011.[30] M. W. Weiner, D. P. Veitch et al. , “The Alzheimer’s disease neuroimag-ing initiative 3: Continued innovation for clinical trial improvement,”
Alzheimer’s & Dementia , vol. 13, no. 5, pp. 561–571, 2017.[31] C. Eckart and G. Young, “The approximation of one matrix by anotherof lower rank,”
Psychometrika, vol. 1, no. 3, pp. 211–218, 1936.

STUDY ON THE GROUPING EFFECTS OF TWO SPARSE CCA MODELS IN VARIABLE SELECTION

SUPPLEMENTARY MATERIALS

APPENDIX
We present how to solve problem (30) using the linearized alternating direction method of multipliers (ADMM) [29], [15]. Problem (31) can be solved in a similar manner.

First, we write problem (30) in the form

minimize_u −u^T X^T Y v + ι(‖Xu‖_2 ≤ 1) + ι(‖u‖_1 ≤ c_1),    (36)

where ι(·) is the indicator function defined as ι(x ∈ A) = 0 if x ∈ A and ∞ if x ∉ A.

To apply the ADMM, problem (36) is reformulated as

minimize_{u,z} −u^T X^T Y v + ι(‖z‖_2 ≤ 1) + ι(‖u‖_1 ≤ c_1) subject to Xu = z.    (37)

The augmented Lagrangian of problem (37) is

L_ρ(u, z, λ) = −u^T X^T Y v + ι(‖z‖_2 ≤ 1) + ι(‖u‖_1 ≤ c_1) + ⟨λ, Xu − z⟩ + (ρ/2) ‖Xu − z‖_2².    (38)

ADMM consists of the iterations

u^{ℓ+1} = argmin_u L_ρ(u, z^ℓ, λ^ℓ)    (39)
z^{ℓ+1} = argmin_z L_ρ(u^{ℓ+1}, z, λ^ℓ)    (40)
λ^{ℓ+1} = λ^ℓ + ρ (X u^{ℓ+1} − z^{ℓ+1}).    (41)

That is,

u^{ℓ+1} = argmin_u −u^T X^T Y v + ι(‖u‖_1 ≤ c_1) + ⟨λ^ℓ, Xu − z^ℓ⟩ + (ρ/2) ‖Xu − z^ℓ‖_2²    (42)
z^{ℓ+1} = argmin_z ι(‖z‖_2 ≤ 1) + ⟨λ^ℓ, X u^{ℓ+1} − z⟩ + (ρ/2) ‖X u^{ℓ+1} − z‖_2²    (43)
λ^{ℓ+1} = λ^ℓ + ρ (X u^{ℓ+1} − z^{ℓ+1}).    (44)

Problem (42) is not easy to solve due to the term (1/2)‖Xu − z^ℓ‖_2² ≜ f(u). To handle this, we construct a quadratic approximation of f(u) near the estimate u^ℓ of u from the previous iteration ℓ:

F(u) ≜ f(u^ℓ) + ⟨∇f(u^ℓ), u − u^ℓ⟩ + (L_X/2) ‖u − u^ℓ‖_2²
     = (1/2)‖X u^ℓ − z^ℓ‖_2² + ⟨X^T(X u^ℓ − z^ℓ), u − u^ℓ⟩ + (L_X/2) ‖u − u^ℓ‖_2²,    (45)

where L_X = λ_max(X^T X), with λ_max(·) the largest eigenvalue of its argument. Note that F(u) ≥ f(u) for any u ∈ R^{p×1} and F(u^ℓ) = f(u^ℓ).

The linearized ADMM solves the approximate version of problem (42) with the term f(u) = (1/2)‖Xu − z^ℓ‖_2² replaced by F(u):

u^{ℓ+1} = argmin_u −u^T X^T Y v + ι(‖u‖_1 ≤ c_1) + ⟨λ^ℓ, Xu − z^ℓ⟩ + ρ F(u)
        = argmin_u (ρ L_X / 2) ‖u − u^ℓ + (1/L_X) X^T (X u^ℓ − z^ℓ + (1/ρ) λ^ℓ − (1/ρ) Y v)‖_2² + ι(‖u‖_1 ≤ c_1)
        = prox_L( u^ℓ − (1/L_X) X^T (X u^ℓ − z^ℓ + (1/ρ) λ^ℓ − (1/ρ) Y v); c_1 ),    (46)

where the proximal operator prox_L(·; ·) is defined as

prox_L(a; c) = argmin_x (1/2)‖x − a‖_2² + ι(‖x‖_1 ≤ c) = a if ‖a‖_1 ≤ c, and S(a, Δ) if ‖a‖_1 > c,    (47)

where Δ is a positive constant that satisfies ‖S(a, Δ)‖_1 = c.

The update formula for problem (43) is

z^{ℓ+1} = X u^{ℓ+1} + λ^ℓ/ρ if ‖X u^{ℓ+1} + λ^ℓ/ρ‖_2 ≤ 1, and (X u^{ℓ+1} + λ^ℓ/ρ) / ‖X u^{ℓ+1} + λ^ℓ/ρ‖_2 otherwise.    (48)

Taken together, the updates at each ADMM iteration are

u^{ℓ+1} = prox_L( u^ℓ − (1/L_X) X^T (X u^ℓ − z^ℓ + ξ^ℓ − (1/ρ) Y v); c_1 )    (49)
z^{ℓ+1} = X u^{ℓ+1} + ξ^ℓ if ‖X u^{ℓ+1} + ξ^ℓ‖_2 ≤ 1, and (X u^{ℓ+1} + ξ^ℓ) / ‖X u^{ℓ+1} + ξ^ℓ‖_2 otherwise    (50)
ξ^{ℓ+1} = ξ^ℓ + X u^{ℓ+1} − z^{ℓ+1},    (51)

where ξ^ℓ = λ^ℓ/ρ.

The linearized ADMM algorithm for fitting the SCCA model is summarized in Algorithm 3.
Sparse CCA fitting algorithm: Linearized ADMM
Require: X ∈ R^{n×p}, Y ∈ R^{n×q}, with column-wise zero empirical mean (the sample mean of each column has been shifted to zero); regularization parameters c_1 and c_2.
Calculate Lipschitz constants: L_X = λ_max(X^T X), L_Y = λ_max(Y^T Y);
Initialization: u^(0) ∈ R^{p×1}, v^(0) ∈ R^{q×1};
Set the penalty parameter ρ [29];
k = 0;
repeat
  Update u: a = Y v^(k)
  Initialization: u^0 = u^(k) ∈ R^{p×1}, z^0 = ξ^0 = 0 ∈ R^{n×1};
  for ℓ = 0, 1, 2, ... do
    u^{ℓ+1} = prox_L( u^ℓ − (1/L_X) X^T (X u^ℓ − z^ℓ + ξ^ℓ − (1/ρ) a); c_1 )
    z^{ℓ+1} = X u^{ℓ+1} + ξ^ℓ if ‖X u^{ℓ+1} + ξ^ℓ‖_2 ≤ 1, else (X u^{ℓ+1} + ξ^ℓ) / ‖X u^{ℓ+1} + ξ^ℓ‖_2
    ξ^{ℓ+1} = ξ^ℓ + X u^{ℓ+1} − z^{ℓ+1}
  end for
  Output: u^(k+1) = u^{ℓ+1}.
  Update v: b = X u^(k+1)
  Initialization: v^0 = v^(k) ∈ R^{q×1}, ζ^0 = ψ^0 = 0 ∈ R^{n×1};
  for ℓ = 0, 1, 2, ... do
    v^{ℓ+1} = prox_L( v^ℓ − (1/L_Y) Y^T (Y v^ℓ − ζ^ℓ + ψ^ℓ − (1/ρ) b); c_2 )
    ζ^{ℓ+1} = Y v^{ℓ+1} + ψ^ℓ if ‖Y v^{ℓ+1} + ψ^ℓ‖_2 ≤ 1, else (Y v^{ℓ+1} + ψ^ℓ) / ‖Y v^{ℓ+1} + ψ^ℓ‖_2
    ψ^{ℓ+1} = ψ^ℓ + Y v^{ℓ+1} − ζ^{ℓ+1}
  end for
  Output: v^(k+1) = v^{ℓ+1}.
  k ← k + 1.
until convergence.

A. Proof of Lemma IV.1

Proof.
Since problem (32) is a convex optimization problem with differentiable objective and constraint functions (note that the L1 inequality constraint can be written as 2^p linear inequality constraints), and is strictly feasible (Slater's condition holds), the KKT conditions provide necessary and sufficient conditions for optimality [27].

The Lagrangian function is

L(u, α, Δ) = −a^T u + α(‖u‖_2 − 1) + Δ(‖u‖_1 − c),

where α and Δ are the Lagrange multipliers (dual variables) for the L2 and L1 constraints, respectively. Setting the differential of L(u, α, Δ) with respect to u equal to zero yields

α u + Δ s = a,    (52)

where s is the subgradient of ‖u‖_1 with respect to u, with s_i = sign(u_i) if u_i ≠ 0 and s_i ∈ [−1, 1] otherwise. The KKT conditions for optimality consist of (52) and

α ≥ 0, ‖u‖_2 ≤ 1, α(‖u‖_2 − 1) = 0,    (53)
Δ ≥ 0, ‖u‖_1 ≤ c, Δ(‖u‖_1 − c) = 0.    (54)

● Case 1: α = 0, Δ > 0. The KKT conditions (52)-(54) simplify to

Δ s = a, Δ > 0,    (55)
‖u‖_2 ≤ 1,    (56)
‖u‖_1 = c.    (57)

From (55), it follows that Δ = max_{1≤i≤p} |a_i| and u_i = 0 for any i ∉ S, where S = {i : |a_i| = Δ}. Therefore, an optimal solution can be written in the following form:

[u*]_i = w_i sign(a_i) if i ∈ S, and 0 if i ∉ S,    (58)

with w_i ≥ 0 satisfying Σ_{i∈S} w_i² ≤ 1 and Σ_{i∈S} w_i = c. When c ≤ √|S|, the set of solutions defined above is non-empty, and among them the solution with minimum Euclidean norm is shown in (33).

● Case 2: α > 0.

– Case 2.1: Δ = 0. The KKT conditions (52)-(54) simplify to

α u = a, α > 0,    (59)
‖u‖_2 = 1,    (60)
‖u‖_1 ≤ c.    (61)

From (59)-(60), it follows that u = a/‖a‖_2. When c ≥ ‖a‖_1/‖a‖_2, this u also satisfies (61) and is therefore the optimal solution.

– Case 2.2: Δ > 0. Conditions (53)-(54) become

α > 0, ‖u‖_2 = 1,    (62)
Δ > 0, ‖u‖_1 = c.    (63)

Combining conditions (52) and (62)-(63), we obtain the optimal solution shown in Eq. (34). This corresponds to the range √|S| ≤ c < ‖a‖_1/‖a‖_2.

Fig. S1. Two particular scenarios of Case 1 in Lemma IV.1 that the optimization problem of Lemma 2.2 in [17] does not cover. The dimension is p = 2. The shaded area shows the feasible set (defined by the L2 and L1 constraints) of problem (32). (a) c = 0.8 < 1; (b) c < √2 and a_1 = a_2 = a. In both cases, the optimal solution in (a) (the point u* = [0.8; 0]) and the optimal solution in (b) (any point on the chord of the circle) do not have the form shown in Lemma 2.2 of [17].
Fig. S1 illustrates two particular scenarios of Case 1 in Lemma IV.1 for p = 2 where Lemma 2.2 of [17] fails. Essentially, the expression presented in Lemma 2.2 of [17] is the solution to

maximize_u a^T u subject to ‖u‖_2 = 1, ‖u‖_1 ≤ c,    (64)

while in problem (32), which Lemma IV.1 solves, the L2 equality constraint is replaced by an L2 inequality constraint, resulting in a convex problem.

Note that in order for problem (64) to have an optimal solution of the form presented in Lemma 2.2 of [17], c must be larger than or equal to √|S|, where S = {i : i ∈ argmax_j |a_j|} (see the proof of Lemma IV.1). Otherwise,

● when 0 ≤ c < 1, problem (64) is infeasible (there are no feasible points that satisfy the constraints);
● when 1 ≤ c < √|S|, the optimal solution to problem (64) is

[u*]_i = w_i sign(a_i) if i ∈ S, and 0 if i ∉ S,    (65a)

where the w_i ≥ 0, i ∈ S, satisfy

Σ_{i∈S} w_i² = 1, Σ_{i∈S} w_i = c.    (65b)

Note that solution (65) cannot be written in the form shown in Lemma 2.2 of [17]. By contrast, problem (32) has an optimal solution for every c > 0.

In the following sections, we show how to apply the single-canonical-component SCCA algorithms to sequentially compute multiple canonical components of the standard and simplified SCCA models. Note that except for Algorithm 8, which was described in [17], all algorithms (Algorithms 4-5 for the standard SCCA model and Algorithm 9 for the simplified SCCA model) and their theoretical justifications in Sections C1 and D1 are new, to the best of our knowledge.
The SCCA model for computing R canonical components ismaximize U , V trace ( U T ˆΣ xy V ) subject to U T ˆΣ xx U = I R , ∥ u r ∥ ≤ c r , r = , , . . . , R V T ˆΣ yy V = I R , ∥ v r ∥ ≤ c r , r = , , . . . , R (66) where ˆΣ xy = n − X T Y (67) ˆΣ xx = n − X T X (68) ˆΣ yy = n − Y T Y (69)are the sample cross-covariance between random vectors x and y , sample auto-covariance matrix within random vector x andsample auto-covariance matrix within random vector y , respectively. Here we assume that the columns of X and Y have beencentered to zero mean.For clarity, we first present two algorithms (Algorithms 4 and 5) to sequentially compute multiple canonical components ofSCCA: one is based on deflation of the cross-covariance matrix, and the other one is based on deflation of the data matrices.Then we provide theoretical explanations of both algorithms in the subsequent sections. Algorithm 4
Sequential computation of R canonical components of SCCA via deflation of the cross-covariance matrix. Let ˆΣ xy = n − X T Y ∈ R p × q , ˆΣ xx = n − X T X ∈ R p × p and ˆΣ yy = n − Y T Y ∈ R q × q . for r = , , . . . , R do Find the r -th pair of canonical weight vectors ˆu r and ˆv r by applying the single-canonical-component SCCA algorithmto ( ˆΣ r − xy , ˆΣ xx , ˆΣ yy ) : maximize u r , v r u T r ˆΣ r − xy v r subject to u T r ˆΣ xx u r ≤ , ∥ u r ∥ ≤ c r v T r ˆΣ yy v r ≤ , ∥ v r ∥ ≤ c r ˆΣ r xy ← ˆΣ r − xy − ˆΣ xx ˆ d r ˆu r ˆv T r ˆΣ yy , where ˆ d r = ˆu T r ˆΣ r − xy ˆv r ˆu T r ˆΣ xx ˆu r ⋅ ˆv T r ˆΣ yy ˆv r . end forAlgorithm 5 Sequential computation of R canonical components of SCCA via deflation of the data matrices. Let X = X ∈ R n × p , Y = Y ∈ R n × q . for r = , , . . . , R do Find the r -th pair of canonical weight vectors ( ˆu r , ˆv r ) by applying Algorithm 3 to solvemaximize u , v n − u T r X r − Y r − v r subject to n − u T r X T Xu r ≤ , ∥ u r ∥ ≤ c r n − v T r Y T Yv r ≤ , ∥ v r ∥ ≤ c r Calculate the residual data: X r ← X r − − X r − ˆu r ˆu T r X T Xˆu T r X T Xˆu r (70) Y r ← Y r − − Y r − ˆv r ˆv T r Y T Yˆv T r Y T Yˆv r (71) end for Remark
A.1 . The deflated data in Eqs. (70)-(71) can also be interpreted as the residual matrix of linear least squaresregression: minimize z ∈ R n ∥ X r − ( X T X ) − / − z ⋅ [( X T X ) / ˆu r ] T ∥ and minimize ζ ∈ R n ∥ Y r − ( Y T Y ) − / − ζ ⋅ [( Y T Y ) / ˆv r ] T ∥ ,respectively.
1) Sequential calculation of multiple SCCA canonical components in the large-sample-size asymptotic regime:
To compute R canonical components sequentially/greedily, we consider the asymptotic regime of n → ∞ in which case model (66) becomesmaximize U , V trace ( U T Σ xy V ) subject to U T Σ xx U = I R V T Σ yy V = I R (72)where Σ xy , Σ xx and Σ yy are the population cross-covariance matrix between random vectors x and y , population auto-covariance matrix within random vector x and population auto-covariance matrix within random vector y , respectively. Notethat in model (72) we have dropped the L1 regularizers: since we have infinite amount of data available for use, the L1regularizations are no longer necessary.The Lagrangian function of problem (72) is defined as L ( U , V , Ψ , Φ ) = − U T Σ xy V + ⟨ Ψ , U T Σ xx U − I R ⟩ + ⟨ Φ , V T Σ yy V − I R ⟩ where Ψ ∈ R R × R is a symmetric matrix of Lagrange multipliers for the R ( R + )/ constraints on U in problem (72), and Φ ∈ R R × R is a symmetric matrix of Lagrange multipliers for the R ( R + )/ constraints on V .Denote the optimal primal and dual solutions of problem (72) as ( ˆU , ˆV ) and ( ˆΨ , ˆΦ ) , respectively. According to the KKTconditions, we have Σ xx ˆU ˆΨ = Σ xy ˆV (73) Σ yy ˆV ˆΦ = Σ T xy ˆU (74)Combining Eqs. (73)-(74) with the quadratic constraints in problem (72) yields ˆΨ = ˆU T Σ xy ˆV ˆΦ = ˆV T Σ T xy ˆU Note that problem (72) does not have a unique solution due to the rotational ambiguity: if ( ˆU , ˆV ) is an optimal solution ofproblem (72), then ( ˆˆU , ˆˆV ) = ( ˆUQ , ˆVQ ) for any orthogonal matrix Q ∈ R R × R is also an optimal solution. Since ˆΨ and thus ˆU T Σ xy ˆV is a symmetric matrix, we can choose the optimal solution ( ˆU , ˆV ) for which ˆU T Σ xy ˆV is a diagonal matrix. As aresult, ˆΨ = ˆΦ =∶ D is a diagonal matrix. Assuming both Σ xx and Σ yy are nonsingular, Eqs. (73)-(74) can be rewritten as Σ / xx ˆUD = Σ − / xx Σ xy Σ − / yy ⋅ Σ / yy ˆV (75) Σ / yy ˆVD = Σ − / yy Σ T xy Σ − / xx ⋅ Σ / xx ˆU (76)Note that the objective of problem (72) is to maximize trace ( D ) under the constraints that Σ / xx U and Σ / yy V both haveorthonormal columns. It follows that D contains the R largest singular values of Σ − / xx Σ xy Σ − / yy , and ˆE = Σ / xx ˆU and ˆF = Σ / yy ˆV contain the corresponding R left and right singular vectors, respectively. According to the Eckart-Young-Mirskytheorem [31], the columns of ˆU and ˆV can be obtained by successive rank-one SVDs of the residual covariance matrix.Specifically, let S = Σ − / xx Σ xy Σ − / yy ∈ R p × q . For r = , , . . . , R , we have ( ˆ d r , ˆu r , ˆv r ) = argmin d r , u r , v r ∥ Σ / xx u r ∥= ∥ Σ / yy v r ∥= ∥ S r − − Σ / xx d r u r v T r Σ / yy ∥ (77) S r = S r − − Σ / xx ˆ d r ˆu r ˆv T r Σ / yy (78)Suppose we have obtained the estimate of the r -th pair of canonical weight vectors ( ˆu r , ˆv r ) . We then estimate d r as ˆ d r = argmin d r ∥ S r − − Σ / xx d r ˆu r ˆv T r Σ / yy ∥ = ˆu T r Σ / xx S r − Σ / yy ˆv r ˆu T r Σ xx ˆu r ⋅ ˆv T r Σ yy ˆv r Taken all together, to compute multiple canonical components sequentially in the large-sample-size asymptotic regime, theresidual covariance matrix is updated as below: S = Σ − / xx Σ xy Σ − / yy (79) S r = S r − − Σ / xx ˆu r ˆu T r Σ / xx S r − Σ / yy ˆv r ˆv T r Σ / yy ˆu T r Σ xx ˆu r ⋅ ˆv T r Σ yy ˆv r , r = , , . . . , R (80) or equivalently Σ xy = Σ xy (81) Σ r xy = Σ r − xy − Σ xx ˆu r ˆu T r Σ r − xy ˆv r ˆv T r Σ yy ˆu T r Σ xx ˆu r ⋅ ˆv T r Σ yy ˆv r , r = , , . . . 
, R (82)which results in Algorithm 6.Let x ∈ R p × and y ∈ R q × be random vectors generating the X ∈ R n × p and Y ∈ R n × q , respectively. For notational simplicity,assume E [ x ] = , E [ y ] = . It can be shown that the residual covariance matrix update formulas (79)-(80) can be rewritten interms of random vectors x and y as x = x , y = y (83) x r = Σ / xx ⎛⎝ I p − Σ / xx ˆu r ˆu T r Σ / xx ˆu T r Σ xx ˆu r ⎞⎠ Σ − / xx x r − = x r − − Σ xx ˆu r ˆu T r ˆu T r Σ xx ˆu r x r − (84) y r = Σ / yy ⎛⎝ I q − Σ / yy ˆv r ˆv T r Σ / yy ˆv T r Σ yy ˆv r ⎞⎠ Σ − / yy y r − = y r − − Σ yy ˆv r ˆv T r ˆv T r Σ yy ˆv r y r − (85)which results in Algorithm 7. Algorithm 6
Sequential computation of R canonical components of SCCA in asymptotic regime via deflation of the populationcross-covariance matrix. Σ xy = E [ xy T ] , Σ xx = E [ xx T ] and Σ yy = E [ yy T ] . for r = , , . . . , R do Find the estimate of the r -th pair of canonical weight vectors ˆu r and ˆv r :maximize u r , v r u T r Σ r − xy v r subject to u T r Σ xx u r = v T r Σ yy v r = Σ r xy ← Σ r − xy − Σ xx ˆ d r ˆu r ˆv T r Σ yy , where ˆ d r = ˆu T r Σ r − xy ˆv r ˆu T r Σ xx ˆu r ⋅ ˆv T r Σ yy ˆv r . end forAlgorithm 7 Sequential computation of R canonical components of SCCA in asymptotic regime via deflation of randomvectors. Let x = x ∈ R p × , y = y ∈ R q × . for r = , , . . . , R do Find the estimate of the r -th pair of canonical weight vectors ˆu r and ˆv r :maximize u r , v r u T r E [ x r − y r − ] v r subject to u T r E [ xx T ] u r = v T r E [ yy T ] v r = Calculate the residual random vectors: x r ← x r − − Σ xx ˆu r ˆu T r ˆu T r Σ xx ˆu r x r − y r ← y r − − Σ yy ˆv r ˆv T r ˆv T r Σ yy ˆv r y r − end for The Algorithms 4 and 5 are implementations of Algorithms 6 and 7 in finite-sample settings, respectively.
D. Sequential calculation of multiple canonical components of simplified SCCA
The simplified SCCA model for computing R canonical components ismaximize U , V trace ( U T ˆΣ xy V ) subject to U T U = I R , ∥ u r ∥ ≤ c r , r = , , . . . , R V T V = I R , ∥ v r ∥ ≤ c r , r = , , . . . , R (86)where ˆΣ xy is the sample cross-covariance matrix between random vectors x and y .For clarity, we first present two algorithms (Algorithms 8 and 9) to sequentially compute multiple canonical components ofsimplified SCCA: one is based on deflation of the cross-covariance matrix, and the other one is based on deflation of the datamatrices. Then we provide theoretical explanations of both algorithms in the subsequent sections. Algorithm 8
Sequential computation of R canonical components of simplified SCCA via deflation of the cross-covariancematrix. Let ˆΣ xy = n − X T Y ∈ R p × q . for r = , , . . . , R do Find the r -th pair of canonical weight vectors ˆu r and ˆv r by applying Algorithm 2 to ˆΣ r − xy :maximize u r , v r u T r ˆΣ r − xy v r subject to ∥ u r ∥ ≤ , ∥ u r ∥ ≤ c r ∥ v r ∥ ≤ , ∥ v r ∥ ≤ c r ˆΣ r xy ← ˆΣ r − xy − ˆ d r ˆu r ˆv T r , where ˆ d r = ˆu T r ˆΣ r − xy ˆv r ∥ ˆu r ∥ ⋅∥ ˆv r ∥ . end forAlgorithm 9 Sequential computation of R canonical components of simplified SCCA via deflation of the data matrices. Let X = X ∈ R n × p , Y = Y ∈ R n × q . for r = , , . . . , R do Find the r -th pair of canonical weight vectors ( ˆu r , ˆv r ) by applying Algorithm 2:maximize u , v n − u T r X r − Y r − v r subject to ∥ u r ∥ ≤ , ∥ u ∥ ≤ c r ∥ v r ∥ ≤ , ∥ v ∥ ≤ c r Calculate the residual data: X r ← X r − ⎛⎝ I p − ˆu r ˆu T r ∥ ˆu r ∥ ⎞⎠ (87) Y r ← Y r − ⎛⎝ I q − ˆv r ˆv T r ∥ ˆv r ∥ ⎞⎠ (88) end for Remark
A.2 . The deflated data in Eqs. (87)-(88) can also be interpreted as the residual matrix of linear least squares regression:minimize z ∈ R n ∥ X r − − z ⋅ ˆu T r ∥ and minimize ζ ∈ R n ∥ Y r − − ζ ⋅ ˆv T r ∥ , respectively.
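For illustration, the data-deflation scheme of Eqs. (87)-(88) can be sketched as follows (Python/NumPy). Here `simplified_scca_single` is a hypothetical placeholder for any single-component solver such as Algorithm 2, and reusing one (c1, c2) pair across components is an assumption of the sketch (per-component parameters c_{1r}, c_{2r} are equally possible).

```python
import numpy as np

# Sketch of Algorithm 9: sequential components of simplified SCCA via data deflation.
def simplified_scca_multi(X, Y, c1, c2, R, simplified_scca_single):
    Xr, Yr = X.copy(), Y.copy()
    components = []
    for _ in range(R):
        u, v = simplified_scca_single(Xr, Yr, c1, c2)
        components.append((u, v))
        # Project the found directions out of the data (Eqs. (87)-(88)).
        Xr = Xr - np.outer(Xr @ u, u) / (u @ u)
        Yr = Yr - np.outer(Yr @ v, v) / (v @ v)
    return components
```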
1) Sequential calculation of multiple SCCA canonical components in the large-sample-size asymptotic regime:
To compute R canonical components sequentially/greedily, we consider the asymptotic regime of n → ∞ in which case model (86) becomesmaximize U , V trace ( U T Σ xy V ) subject to U T U = I R V T V = I R (89)where Σ xy is the population cross-covariance matrix between random vectors x and y . Note that in model (89) we havedropped the L1 regularizers: since we have infinite amount of data available for use, the L1 regularizations are no longernecessary. The Lagrangian function of problem (89) is defined as
L ( U , V , Ψ , Φ ) = − U T Σ xy V + ⟨ Ψ , U T U − I R ⟩ + ⟨ Φ , V T V − I R ⟩ where Ψ ∈ R R × R is a symmetric matrix of Lagrange multipliers for the R ( R + )/ constraints on U in problem (89), and Φ ∈ R R × R is a symmetric matrix of Lagrange multipliers for the R ( R + )/ constraints on V .Denote the optimal primal and dual solutions of problem (89) as ( ˆU , ˆV ) and ( ˆΨ , ˆΦ ) , respectively. According to the KKTconditions, we have ˆU ˆΨ = Σ xy ˆV (90) ˆV ˆΦ = Σ T xy ˆU (91)Combining Eqs. (90)-(91) with the quadratic constraints in problem (89) yields ˆΨ = ˆU T Σ xy ˆV ˆΦ = ˆV T Σ T xy ˆU Note that problem (89) does not have a unique solution due to the rotational ambiguity: if ( ˆU , ˆV ) is an optimal solution ofproblem (89), then ( ˆˆU , ˆˆV ) = ( ˆUQ , ˆVQ ) for any orthogonal matrix Q ∈ R R × R is also an optimal solution. Since ˆΨ and thus ˆU T Σ xy ˆV is a symmetric matrix, we can choose the optimal solution ( ˆU , ˆV ) for which ˆU T Σ xy ˆV is a diagonal matrix. As aresult, ˆΨ = ˆΦ =∶ D is a diagonal matrix. Assuming both Σ xx and Σ yy are nonsingular, Eqs. (90)-(91) can be rewritten as ˆUD = Σ xy ⋅ ˆV (92) ˆVD = Σ T xy ⋅ ˆU (93)Note that the objective of problem (89) is to maximize trace ( D ) under the constraints that U and V both have orthonormalcolumns. It follows that D contains the R largest singular values of Σ xy , and ˆU and ˆV contain the corresponding R left andright singular vectors, respectively. According to the Eckart-Young-Mirsky theorem [31], the columns of ˆU and ˆV can beobtained by successive rank-one SVDs of the residual covariance matrix. Specifically, let Σ xy = Σ xy ∈ R p × q . For r = , , . . . , R ,we have ( ˆ d r , ˆu r , ˆv r ) = argmin d r , u r , v r ∥ u r ∥= ∥ v r ∥= ∥ Σ r − xy − d r u r v T r ∥ (94) Σ r xy = Σ r − xy − ˆ d r ˆu r ˆv T r (95)Suppose we have obtained the estimate of the r -th pair of canonical weight vectors ( ˆu r , ˆv r ) . We then estimate d r as ˆ d r = argmin d r ∥ Σ r − xy − d r ˆu r ˆv T r ∥ = ˆu T r Σ r − xy ˆv r ∥ ˆu r ∥ ⋅ ∥ ˆv r ∥ Taken all together, to compute multiple canonical components sequentially in the large-sample-size asymptotic regime, theresidual covariance matrix is updated as below: Σ xy = Σ xy (96) Σ r xy = Σ r − xy − ˆu r ˆu T r Σ r − xy ˆv r ˆv T r ∥ ˆu r ∥ ⋅ ∥ ˆv r ∥ , r = , , . . . , R (97)This results in Algorithm 10.For notational simplicity, assume E [ x ] = , E [ y ] = . It can be shown that the residual covariance matrix update formulas(96)-(97) can be rewritten in terms of random vectors x and y as x = x , y = y (98) x r = ⎛⎝ I p − ˆu r ˆu T r ∥ ˆu r ∥ ⎞⎠ x r − (99) y r = ⎛⎝ I q − ˆv r ˆv T r ∥ ˆv r ∥ ⎞⎠ y r − (100)which results in Algorithm 11. Algorithm 10
Sequential computation of R canonical components of simplified SCCA in asymptotic regime via deflation ofthe population cross-covariance matrix. Let Σ xy = E [ xy T ] . for r = , , . . . , R do Solve for the r -th pair of canonical weight vectors ˆu r and ˆv r :maximize u r , v r u T r Σ r − xy v r subject to ∥ u r ∥ = ∥ v r ∥ = Σ r xy ← Σ r − xy − ˆ d r ˆu r ˆv T r , where ˆ d r = ˆu T r Σ r − xy ˆv r ∥ ˆu r ∥ ⋅∥ ˆv r ∥ . end forAlgorithm 11 Sequential computation of R canonical components of simplified SCCA in asymptotic regime via deflation ofrandom vectors. Let x = x ∈ R p × , y = y ∈ R q × . for r = , , . . . , R do Solve for the r -th pair of canonical weight vectors ˆu r and ˆv r :maximize u r , v r u T r E [ x r − y r − ] v r subject to ∥ u r ∥ = ∥ v r ∥ = Calculate the residual random vectors: x r ← ⎛⎝ I p − ˆu r ˆu T r ∥ ˆu r ∥ ⎞⎠ x r − y r ← ⎛⎝ I q − ˆv r ˆv T r ∥ ˆv r ∥ ⎞⎠ y r − end for In finite-sample settings, the covariance matrix deflation based Algorithm 10 becomes Algorithm 8 to sequentially compute R canonical components of simplified SCCA, while the random vector deflation based Algorithm 11 becomes Algorithm 9. E. Covariance structure of the synthetic data
The sample cross- and auto-covariance matrices among random vectors x and y are defined as ˆΣ xy = n X T Y (101) ˆΣ xx = n X T X (102) ˆΣ yy = n Y T Y (103)
1) Experimental setup 1: uncorrelated variables:
The population cross- and auto-covariance matrices among random vectors x and y are E [ x ] = , E [ y ] = xx = E [ xx T ] = I p (104) Σ yy = E [ yy T ] = ∥ c ∥ dd T + σ I q (105) Σ xy = E [ xy T ] = cd T (106) Sample covariances between X and Y variables
20 40 60 80 100
Y variable j X v a r i ab l e i -0.3-0.2-0.100.10.20.3 Population covariances between X and Y variables
20 40 60 80 100
Y variable j X v a r i ab l e i -0.3-0.2-0.100.10.20.3 Sample covariances among X variables
500 1000 1500 2000
X variable i X v a r i ab l e i Population covariances among X variables
500 1000 1500 2000
X variable i X v a r i ab l e i Sample covariances among Y variables
20 40 60 80 100
Y variable j Y v a r i ab l e j -3-2-101234 Population covariances among Y variables
20 40 60 80 100
Y variable j Y v a r i ab l e j -3-2-101234 Fig. S2. Experimental setup 1: Heatmaps showing the sample (left) and population (right) cross-covariances between X and Y variables (top), auto-covarianceswithin X variables (middle), and auto-covariances within Y variables (bottom).
2) Experimental setup 2: grouped variables:
The population cross- and auto-covariance matrices among the random vectors x and y are

E[x] = 0, E[y] = 0,
Σ_xx = E[xx^T] = blkdiag(Σ_1, ..., Σ_R, Σ_{R+1}, ..., Σ_G),    (107)
Σ_yy = E[yy^T] = σ_z² dd^T + σ² I_q,    (108)
Σ_xy = E[xy^T] = Σ_xx c d^T,    (109)

where Σ_g = (σ_{g,ij}) ∈ R^{p_g × p_g}, with σ_{g,ii} = 1 and σ_{g,ij} = ρ_{g,i} ρ_{g,j} for any i ≠ j and g = 1, 2, ..., G, and σ_z² := E[z²] = c^T Σ_xx c. Here R is the number of relevant/informative groups.

TABLE S1
GROUP SIZES OF VARIABLES IN x

Group ID:   G1  G2  G3  G4  G5  G6  G7  G8  G9  G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20
Group size: 89  112 92  88  88  99  130 103 94  91  99  91  90  112 96  100 96  91  103 100
20 40 60 80 100
Y variable j X v a r i ab l e i -10-50510 Population covariances between X and Y variables
20 40 60 80 100
Y variable j X v a r i ab l e i -10-50510 Sample covariances among X variables
200 400 600 800 1000 1200 1400 1600 1800
X variable i X v a r i ab l e i Population covariances among X variables
200 400 600 800 1000 1200 1400 1600 1800
X variable i X v a r i ab l e i Sample covariances among Y variables
20 40 60 80 100
Y variable j Y v a r i ab l e j -600-400-2000200400600800 Population covariances among Y variables
20 40 60 80 100
Y variable j Y v a r i ab l e j -600-400-2000200400600800 Fig. S3. Experimental setup 2: Heatmaps showing the sample (left) and population (right) cross-covariances between X and Y variables (top), auto-covarianceswithin X variables (middle), and auto-covariances within Y variables (bottom). F. Hyperparameter tuning and performance estimation
To select the regularization parameters (c_1, c_2) and estimate the generalization performance, we partition the data into training (50%, n_s samples), validation (25%, n_v samples), and testing (25%, n_t = n − n_s − n_v samples) data sets:

[X Y] = [X_train Y_train; X_val Y_val; X_test Y_test] ∈ R^{(n_s + n_v + n_t) × (p + q)}

The training and validation data are used to tune the regularization parameters (c_1, c_2), and the test data are used to estimate the performance.

To select the regularization parameters (c_1, c_2), we fit the (simplified) SCCA model on the training data using each candidate value of (c_1, c_2), where c_1 and c_2 are chosen from sequences of values equally spaced on the log scale between ⌊log c_{1,min}⌋ and ⌈log c_{1,max}⌉ and between ⌊log c_{2,min}⌋ and ⌈log c_{2,max}⌉, respectively. Here, c_{ℓ,min} and c_{ℓ,max}, ℓ = 1, 2, are the minimum and maximum values of c_ℓ, which are calculated for the standard and simplified SCCA models in Section F1.

Denote the solution of the model fitted with (c_1, c_2) as (û_train(c_1, c_2), v̂_train(c_1, c_2)). For the standard SCCA model, the optimal (c_1, c_2) are chosen as

(c_1^opt, c_2^opt) = argmax_{c_1, c_2} Corr(X_val û_train, Y_val v̂_train)                                           (110)
                   = argmax_{c_1, c_2} ⟨X_val û_train, Y_val v̂_train⟩ / (∥X_val û_train∥_2 ∥Y_val v̂_train∥_2)       (111)

For the simplified SCCA model, the optimal (c_1, c_2) are chosen as

(c_1^opt, c_2^opt) = argmax_{c_1, c_2} Cov(X_val û_train / ∥û_train∥_2, Y_val v̂_train / ∥v̂_train∥_2)                (112)
                   = argmax_{c_1, c_2} (1/n_v) ⟨X_val û_train, Y_val v̂_train⟩ / (∥û_train∥_2 ∥v̂_train∥_2)           (113)

Then, we refit the SCCA model with (c_1^opt, c_2^opt) on all training data (combined training and validation data) to get the solution (û_trainval, v̂_trainval). The canonical covariance and correlation on the test data are reported as the generalization performance:

Cov(X_test û_trainval, Y_test v̂_trainval) = (1/n_t) ⟨X_test û_trainval, Y_test v̂_trainval⟩ / (∥û_trainval∥_2 ∥v̂_trainval∥_2)   (114)
Corr(X_test û_trainval, Y_test v̂_trainval) = ⟨X_test û_trainval, Y_test v̂_trainval⟩ / (∥X_test û_trainval∥_2 ∥Y_test v̂_trainval∥_2)   (115)

The sample covariance in (114) has n_t rather than n_t − 1 in the denominator because the population mean is assumed to be known and equal to 0.
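The selection rules (110)-(113) amount to a plain grid search; the routine below is a minimal sketch of that search. It assumes a user-supplied fitting function fit_scca(X, Y, c1, c2) returning (û, v̂), which is hypothetical and not tied to any particular implementation.

```python
import numpy as np

def tune_scca(X_tr, Y_tr, X_val, Y_val, c1_grid, c2_grid, fit_scca, simplified=False):
    """Pick (c1, c2) by validation correlation (standard SCCA) or covariance (simplified SCCA)."""
    best, best_score = None, -np.inf
    n_val = X_val.shape[0]
    for c1 in c1_grid:
        for c2 in c2_grid:
            u, v = fit_scca(X_tr, Y_tr, c1, c2)   # hypothetical SCCA fitting routine
            a, b = X_val @ u, Y_val @ v
            if simplified:   # canonical covariance, as in (112)-(113)
                score = (a @ b) / (n_val * np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            else:            # canonical correlation, as in (110)-(111)
                score = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            if score > best_score:
                best, best_score = (c1, c2), score
    return best, best_score
```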
1) Effective range of c_1 and c_2: To determine the range of the parameters (c_1, c_2) for the standard SCCA model (1), we replace its L2 inequality constraints with L2 equality constraints:

maximize_{u, v}  u^T X^T Y v   subject to   u^T X^T X u = 1, ∥u∥_1 ≤ c_1;   v^T Y^T Y v = 1, ∥v∥_1 ≤ c_2   (116)

We note that for valid L1 regularization, the L1 inequality constraints need to be active (i.e., satisfied as equalities) at the optimal solution. This implies that

c_1 ≥ minimize_u ∥u∥_1 subject to ∥Xu∥_2 = 1   (117)
c_2 ≥ minimize_v ∥v∥_1 subject to ∥Yv∥_2 = 1   (118)

and

c_1 ≤ maximize_u ∥u∥_1 subject to ∥Xu∥_2 = 1   (119)
c_2 ≤ maximize_v ∥v∥_1 subject to ∥Yv∥_2 = 1   (120)

Simple analysis shows that a sufficient condition for (117)-(118) to hold is

c_1 ≥ max( 1/σ_max(X), 1/√(max_{1≤i≤p} Σ_{ℓ=1}^{n} x_{ℓi}^2) ) =: c_{1,min}   (121)
c_2 ≥ max( 1/σ_max(Y), 1/√(max_{1≤j≤q} Σ_{ℓ=1}^{n} y_{ℓj}^2) ) =: c_{2,min}   (122)

Note, however, that the objective in (119) (resp., (120)) is unbounded above when n < p (resp., n < q), and thus it cannot be used to find an effective maximum of c_1 (resp., c_2). To find effective maximum values of c_1 and c_2, we instead solve problem (116) in the absence of the L1 constraints:

maximize_{u, v}  u^T X^T Y v   subject to   u^T X^T X u = 1,  v^T Y^T Y v = 1   (123)

Denote the optimal solution of problem (123) as (u*, v*). We set c_{1,max} = ∥u*∥_1 and c_{2,max} = ∥v*∥_1. It can be shown that u* = (X^T X)^{-1/2} u_1 and v* = (Y^T Y)^{-1/2} v_1, where u_1 and v_1 are respectively the left and right singular vectors of (X^T X)^{-1/2} X^T Y (Y^T Y)^{-1/2} associated with the largest singular value. If X^T X is singular, we can approximate it by X^T X + εI_p; likewise for Y^T Y.

By a similar line of reasoning, to determine the range of the parameters (c_1, c_2) for the simplified SCCA model, consider

maximize_{u, v}  u^T X^T Y v   subject to   ∥u∥_2 = 1, ∥u∥_1 ≤ c_1;   ∥v∥_2 = 1, ∥v∥_1 ≤ c_2   (124)

We note that effective values of c_1 and c_2 should be such that the L1 inequality constraints are active (i.e., satisfied as equalities) at the optimal solution. To this end, they should satisfy

c_1 ≥ minimize_u ∥u∥_1 subject to ∥u∥_2 = 1   (125)
c_2 ≥ minimize_v ∥v∥_1 subject to ∥v∥_2 = 1   (126)

and

c_1 ≤ maximize_u ∥u∥_1 subject to ∥u∥_2 = 1   (127)
c_2 ≤ maximize_v ∥v∥_1 subject to ∥v∥_2 = 1   (128)

From (125)-(128), it follows that

c_{1,min} := 1 ≤ c_1 ≤ √p   (129)
c_{2,min} := 1 ≤ c_2 ≤ √q   (130)

The upper bounds √p for c_1 and √q for c_2 are too loose. To find tighter bounds, we instead solve problem (124) in the absence of the L1 constraints:

maximize_{u, v}  u^T X^T Y v   subject to   ∥u∥_2 = 1, ∥v∥_2 = 1   (131)

The optimal solution is u* = u_1, v* = v_1, where u_1 and v_1 are respectively the left and right singular vectors of X^T Y associated with the largest singular value. We set c_{1,max} = ∥u*∥_1 and c_{2,max} = ∥v*∥_1.
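A sketch of how these ranges could be computed in practice is given below. It follows (121)-(123) and (129)-(131), with the matrix inverse square root replaced by a Cholesky-based change of variables and a small ridge εI used to keep X^T X and Y^T Y invertible when n < p or n < q; this is an implementation choice for illustration, not the authors' code.

```python
import numpy as np

def c_ranges(X, Y, eps=1e-6, simplified=False):
    """Candidate (c_min, c_max) for the L1 radii c1 and c2 (a sketch of Section F1)."""
    n, p = X.shape
    q = Y.shape[1]
    C = X.T @ Y
    if simplified:
        c1_min, c2_min = 1.0, 1.0                          # (129)-(130)
        U, _, Vt = np.linalg.svd(C)                        # top singular vectors of X^T Y, as in (131)
        u_star, v_star = U[:, 0], Vt[0, :]
    else:
        # (121)-(122): 1 / (largest column norm), which also dominates 1/sigma_max.
        c1_min = 1.0 / np.sqrt((X**2).sum(axis=0).max())
        c2_min = 1.0 / np.sqrt((Y**2).sum(axis=0).max())
        # Whitened problem (123); the ridge eps*I keeps X^T X and Y^T Y positive definite.
        Lx = np.linalg.cholesky(X.T @ X + eps * np.eye(p))
        Ly = np.linalg.cholesky(Y.T @ Y + eps * np.eye(q))
        M = np.linalg.solve(Lx, C)                         # Lx^{-1} X^T Y
        M = np.linalg.solve(Ly, M.T).T                     # ... times Ly^{-T}
        U, _, Vt = np.linalg.svd(M)
        u_star = np.linalg.solve(Lx.T, U[:, 0])            # back-transform to the original variables
        v_star = np.linalg.solve(Ly.T, Vt[0, :])
    return (c1_min, np.abs(u_star).sum()), (c2_min, np.abs(v_star).sum())
```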
G. Variable selection performance

The balanced accuracy (bACC) and Matthews correlation coefficient (MCC) are defined as

bACC = (1/2) ( TP/(TP + FN) + TN/(TN + FP) ),   (132)
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),   (133)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The bACC and MCC are overall measures of variable selection accuracy, and a larger score indicates better variable selection performance. The relative absolute error (RAE) for the selection of the X variables is defined as

RAE = ∥û − u*∥_2 / ∥u*∥_2,   (134)

where u* and û denote the true and estimated canonical vectors, respectively. Our variable selection performance on the synthetic data is shown in Table S2 and Table S3.

TABLE S2
The X variable selection performance of the SCCA and simplified SCCA on the whole training data.

Model                 Recall  Precision  F1 score  ACC    bACC   MCC    PR AUC  RAE
Experimental setup 1
  SCCA
  Simp. SCCA          0.430   0.306      0.358     0.846  0.661  0.278  0.429   1.044
Experimental setup 2
  SCCA                                                                  0.998   0.320
  Simp. SCCA          1.000   0.602      0.751     0.846  0.899  0.693  0.800   0.110

TABLE S3
The Y variable selection performance of the SCCA and simplified SCCA on the whole training data.

Model                 Recall  Precision  F1 score  ACC    bACC   MCC    PR AUC  RAE
Experimental setup 1
  SCCA                                                                  0.896   0.823
  Simp. SCCA          1.000   0.566      0.723     0.770  0.836  0.616  0.957   0.030
Experimental setup 2
  SCCA                                                                  0.844   0.503
  Simp. SCCA          0.967   0.558      0.707     0.760  0.819  0.585  0.948   0.050
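For reference, a minimal sketch of these selection metrics is given below. Treating a variable as "selected" when its estimated weight is nonzero is an assumption, as is the requirement that û already be sign-aligned with u*.

```python
import numpy as np

def selection_metrics(u_true, u_hat):
    """bACC, MCC, and RAE for variable selection, as in (132)-(134).
    Assumes u_hat has been sign-aligned with u_true."""
    sel_true = u_true != 0
    sel_hat = u_hat != 0
    tp = np.sum(sel_hat & sel_true)
    tn = np.sum(~sel_hat & ~sel_true)
    fp = np.sum(sel_hat & ~sel_true)
    fn = np.sum(~sel_hat & sel_true)
    bacc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + 1e-12)
    rae = np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true)
    return bacc, mcc, rae
```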
H. Subject characteristics

Participant characteristics of our real imaging genetics data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort are shown in Table S4.

TABLE S4
Subject characteristics.

                          HC       SMC      EMCI      LMCI     AD
Num                       183      75       218       184      97
Gender (M/F)              89/94    29/46    113/105   96/88    54/43
Handedness (R/L)          163/20   65/10    195/23    165/19   89/8
Age (mean ± std)          73.96
Education (mean ± std)    16.44
I. Correlation structure of the real imaging genetic data
Correlation structure of the real ADNI imaging genetics data used in this study is shown in Fig. S4.

Fig. S4. Heatmaps showing the pairwise sample Pearson correlation coefficients between genetic and imaging features (left), within genetic features (middle), and within imaging features (right).
J. Hyperparameter tuning and generalization performance estimation
We employ a nested cross-validation method, which is an extension of the procedure described in Section F. We first randomly divide each category of subjects into five roughly equal-sized subgroups and combine the data across categories to form five outer folds. The first fold is used for testing and the remaining folds for training/validating the model. The test set is put aside, and the following steps are carried out on the training+validation data:

(1) We employ stratified cross-validation (CV) to choose (c_1, c_2). The samples/subjects from each category are randomly divided into five roughly equal-sized subgroups and then combined to form five folds I = ∪_{k=1}^{5} I_k. Denote by X_trainval[I_k] and Y_trainval[I_k], k = 1, 2, ..., 5, the submatrices formed by the rows of X_trainval and Y_trainval indexed by I_k.

(2) The SCCA model is fitted to the normalized (X_trainval[I∖I_1], Y_trainval[I∖I_1]) to obtain the solution (û_(−1), v̂_(−1)). Then, the performance on the validation fold is recorded as Corr(X_trainval[I_1] û_(−1), Y_trainval[I_1] v̂_(−1)). This process is repeated five times, with each fold of samples/subjects used once as the validation set.

(3) The cross-validation criterion used to select the regularization parameters is

(c_1^opt, c_2^opt) = argmax_{c_1, c_2} Σ_{k=1}^{5} Corr(X_trainval[I_k] û_(−k), Y_trainval[I_k] v̂_(−k))   (135)

where Corr(·, ·) is the correlation function and (û_(−k), v̂_(−k)) are the estimates of (u, v) obtained by the standard SCCA on the training+validation data (X_trainval[I∖I_k], Y_trainval[I∖I_k]) with (c_1, c_2) as regularization parameters.

(4) The SCCA model is then fitted to the entire training+validation set at (c_1^opt, c_2^opt) to estimate the canonical weights (û_opt, v̂_opt). The canonical correlation on the test data, Corr(X_test û_opt, Y_test v̂_opt), is reported as the generalization performance. For the simplified SCCA, the canonical covariance is used as the metric both to tune the regularization parameters and to measure performance.

This process is repeated five times, with each outer fold of samples/subjects used once as the testing set.
K. Genetic and Imaging Marker Selection

TABLE S5
Genetic features (ordered by absolute values of the estimated canonical weights) selected by SCCA and simplified SCCA. For each model, the columns are: SNP, closest gene, estimated weight û_i, and p-value. The closest genes of the selected SNPs include APOE, CR1, PICALM, ABCA7, BIN1, MS4A6A, SORL1, FERMT2, MEF2C, CLU, EPHA1, SLC24A4, RIN3, DSG2, CELF1, INPP5D, CD33, CASS4, NME8, ZCWPW1, and HLA-DRB1; the top-ranked SNP for the standard SCCA is rs4420638 (closest gene APOE).
TABLE S6
Imaging features (brain ROIs, ordered by absolute values of the estimated canonical weights) selected by SCCA and simplified SCCA.

Standard SCCA                               Simplified SCCA
brain ROI              v̂_j      p-value     brain ROI              v̂_j     p-value
Hippocampus L          -0.403   1.25e-08    Frontal Med Orb L      0.138   9.65e-26
Frontal Mid R           0.279   4.84e-18    Frontal Sup Medial L   0.135   8.66e-21
Frontal Mid L           0.261   1.67e-18    Cingulum Ant L         0.133   2.32e-19
Precentral L           -0.249   7.67e-07    Frontal Med Orb R      0.133   1.04e-24
Rolandic Oper L        -0.238   4.63e-10    Frontal Sup Medial R   0.132   4.47e-20
Frontal Sup Medial L    0.235   8.66e-21    Rectus L               0.132   3.33e-25
Cerebelum 6 R           0.219   5.71e-10    Frontal Mid R          0.130   4.84e-18
Calcarine R            -0.216   5.11e-13    Frontal Mid Orb R      0.129   5.09e-21
Insula R                0.206   1.67e-16    Frontal Mid L          0.129   1.67e-18
Cingulum Ant L          0.188   2.32e-19    Temporal Mid R         0.128   2.15e-20
Temporal Pole Mid R    -0.187   1.39e-06    Rectus R               0.128   4.53e-22
Caudate L               0.185   1.38e-01    Frontal Sup Orb R      0.128   3.23e-20
Precentral R           -0.179   1.25e-05    Insula R               0.127   1.67e-16
Vermis 8                0.171   9.35e-01    Temporal Inf R         0.127   6.79e-19
Temporal Inf R          0.169   6.79e-19    Frontal Sup Orb L      0.127   7.41e-20
Cuneus R               -0.165   9.59e-07    Frontal Mid Orb L      0.126   3.34e-21
Olfactory L             0.130   1.46e-13    Frontal Inf Orb R      0.126   2.57e-14
Heschl R                0.130   6.06e-17    Olfactory L            0.125   1.46e-13
Occipital Inf L         0.112   2.30e-13    Cingulum Mid L         0.125   8.86e-22
Cerebelum 9 L          -0.112   1.18e-03    Cingulum Mid R         0.125   2.12e-19
Thalamus R              0.108   9.22e-01    Frontal Inf Orb L      0.124   1.64e-17
Cerebelum 3 L          -0.107   2.41e-05    Cingulum Ant R         0.123   6.76e-15
Putamen L               0.105   2.11e-17    Frontal Sup R          0.123   1.35e-14
Frontal Med Orb L       0.098   9.65e-26    Temporal Sup R         0.123   2.06e-20
Temporal Mid R          0.097   2.15e-20    Temporal Mid L         0.121   1.94e-21
Occipital Mid L         0.090   1.55e-09    Precuneus L            0.121   8.66e-22
Frontal Inf Orb R       0.081   2.57e-14    Olfactory R            0.120   7.48e-11
Frontal Mid Orb R       0.073   5.09e-21    Precuneus R            0.120   8.93e-23
Olfactory R             0.069   7.48e-11    Frontal Inf Tri L      0.120   4.56e-16
Vermis 3                0.059   1.05e-01    Temporal Inf L         0.119   5.09e-19
Cerebelum 3 R          -0.053   1.33e-05    Temporal Sup L         0.119   8.89e-17
Cerebelum 4 5 R        -0.052   5.88e-09    Frontal Sup L          0.119   6.30e-15
Cuneus L               -0.052   5.56e-06    Parietal Inf L         0.119   2.94e-15
Frontal Sup R           0.051   1.35e-14    SupraMarginal R        0.118   7.04e-15
Cerebelum 10 L          0.044   1.02e-04    Frontal Inf Tri R      0.117   4.50e-13
Cerebelum 7b L          0.042   4.67e-07    Angular R              0.117   5.56e-16
Hippocampus R          -0.034   4.66e-08    Angular L              0.116   6.30e-17
Cerebelum 4 5 L        -0.033   1.29e-04    Parietal Inf R         0.115   7.79e-14
Standard SCCA Simplified SCCAbrain ROI ˆ v j p-value brain ROI ˆ v j p-valueCerebelum 6 L 0.032 1.69e-09 Insula L 0.115 4.36e-14Cingulum Mid R 0.023 2.12e-19 Heschl R 0.114 6.06e-17Supp Motor Area R -0.010 7.83e-15 SupraMarginal L 0.113 2.29e-11Cingulum Post R -0.009 3.18e-04 Frontal Inf Oper R 0.112 2.22e-14Fusiform R -0.009 1.94e-20 Rolandic Oper R 0.111 2.84e-13Postcentral L -0.009 2.91e-09 Supp Motor Area L 0.110 1.13e-15Frontal Mid Orb L 0.008 3.34e-21 Fusiform R 0.110 1.94e-20Postcentral R -0.008 1.85e-08 Cingulum Post L 0.108 5.66e-13Calcarine L -0.008 3.33e-17 Fusiform L 0.108 4.12e-19Frontal Inf Oper R -0.008 2.22e-14 Frontal Inf Oper L 0.106 4.89e-12Lingual L -0.008 2.68e-15 Putamen L 0.106 2.11e-17Cingulum Mid L 0.007 8.86e-22 Putamen R 0.106 4.16e-15Parietal Inf L 0.007 2.94e-15 Heschl L 0.105 1.05e-14Frontal Sup Medial R 0.007 4.47e-20 Temporal Pole Sup L 0.104 3.30e-08Temporal Sup L 0.007 8.89e-17 Temporal Pole Sup R 0.104 4.33e-09ParaHippocampal R -0.007 4.46e-01 Occipital Mid L 0.101 1.55e-09Temporal Sup R 0.006 2.06e-20 Occipital Inf L 0.100 2.30e-13Paracentral Lobule R -0.006 3.84e-12 Supp Motor Area R 0.098 7.83e-15Lingual R -0.006 7.85e-17 Rolandic Oper L 0.095 4.63e-10Temporal Pole Sup L 0.005 3.30e-08 Parietal Sup L 0.094 3.29e-10Paracentral Lobule L -0.005 5.85e-08 Temporal Pole Mid L 0.093 2.11e-07Vermis 1 2 0.005 6.11e-04 Occipital Mid R 0.092 1.56e-08Occipital Sup L -0.005 1.43e-02 Occipital Inf R 0.091 1.26e-09Occipital Inf R 0.005 1.26e-09 Calcarine L 0.091 3.33e-17Cingulum Post L -0.004 5.66e-13 Temporal Pole Mid R 0.088 1.39e-06Temporal Pole Mid L -0.004 2.11e-07 Postcentral R 0.087 1.85e-08Fusiform L -0.004 4.12e-19 Postcentral L 0.086 2.91e-09Pallidum R -0.004 3.78e-03 Paracentral Lobule R 0.084 3.84e-12Parietal Sup R -0.003 3.11e-05 Lingual R 0.081 7.85e-17Pallidum L -0.002 4.55e-02 Precentral L 0.078 7.67e-07Caudate R -0.002 5.42e-01 Lingual L 0.078 2.68e-15Vermis 9 0.002 9.51e-02 Amygdala L 0.078 8.42e-07Vermis 7 -0.002 4.63e-02 Cingulum Post R 0.076 3.18e-04Cerebelum Crus1 R -0.002 3.23e-03 Precentral R 0.074 1.25e-05Cerebelum 8 R -0.001 3.11e-06 Amygdala R 0.073 1.09e-02Cerebelum 7b R 0.001 6.70e-05 Calcarine R 0.072 5.11e-13Frontal Inf Oper L -0.001 4.89e-12 Parietal Sup R 0.071 3.11e-05Frontal Inf Orb L 0.001 1.64e-17 Cuneus L 0.071 5.56e-06ParaHippocampal L -0.001 4.57e-01 Paracentral Lobule L 0.068 5.85e-08Thalamus L 0.001 2.74e-01 Occipital Sup R 0.067 4.50e-05Supp Motor Area L -0.000 1.13e-15 ParaHippocampal R 0.064 4.46e-01Frontal Sup L -0.000 6.30e-15 Cerebelum 6 R 0.062 5.71e-10Frontal Sup Orb L -0.000 7.41e-20 Occipital Sup L 0.059 1.43e-02Frontal Sup Orb R -0.000 3.23e-20 Cuneus R 0.059 9.59e-07Frontal Inf Tri L -0.000 4.56e-16 Caudate R 0.057 5.42e-01Frontal Inf Tri R -0.000 4.50e-13 Pallidum R 0.056 3.78e-03Rolandic Oper R -0.000 2.84e-13 Caudate L 0.054 1.38e-01Frontal Med Orb R -0.000 1.04e-24 Cerebelum 6 L 0.051 1.69e-09Rectus L -0.000 3.33e-25 ParaHippocampal L 0.051 4.57e-01Rectus R -0.000 4.53e-22 Pallidum L 0.047 4.55e-02Insula L -0.000 4.36e-14 Cerebelum 3 L -0.044 2.41e-05Cingulum Ant R -0.000 6.76e-15 Cerebelum 8 R -0.042 3.11e-06Amygdala L -0.000 8.42e-07 Cerebelum 8 L -0.042 2.15e-05Amygdala R -0.000 1.09e-02 Cerebelum 4 5 R 0.041 5.88e-09Occipital Sup R -0.000 4.50e-05 Cerebelum 7b L -0.040 4.67e-07Occipital Mid R -0.000 1.56e-08 Cerebelum Crus2 L -0.039 3.35e-06Parietal Sup L -0.000 3.29e-10 Cerebelum 9 L -0.038 1.18e-03Parietal Inf R -0.000 7.79e-14 Cerebelum Crus2 R -0.038 8.42e-06SupraMarginal L -0.000 2.29e-11 Thalamus R 
0.035 9.22e-01SupraMarginal R -0.000 7.04e-15 Cerebelum 9 R -0.033 3.81e-04Angular L -0.000 6.30e-17 Cerebelum 3 R -0.032 1.33e-05Angular R -0.000 5.56e-16 Cerebelum 10 R -0.030 7.99e-05Precuneus L -0.000 8.66e-22 Thalamus L 0.029 2.74e-01Precuneus R -0.000 8.93e-23 Cerebelum 7b R -0.029 6.70e-05Putamen R -0.000 4.16e-15 Vermis 1 2 -0.023 6.11e-04Heschl L -0.000 1.05e-14 Vermis 7 -0.022 4.63e-02Temporal Pole Sup R -0.000 4.33e-09 Vermis 4 5 0.021 5.31e-04Temporal Mid L -0.000 1.94e-21 Cerebelum Crus1 R -0.021 3.23e-03Temporal Inf L -0.000 5.09e-19 Vermis 8 0.016 9.35e-01Cerebelum Crus1 L -0.000 7.15e-02 Cerebelum 10 L -0.016 1.02e-04Cerebelum Crus2 L -0.000 3.35e-06 Vermis 10 -0.014 9.29e-02Continued on next page TABLE S6 – continued from previous page
Standard SCCA                               Simplified SCCA
brain ROI              v̂_j      p-value     brain ROI              v̂_j     p-value
Cerebelum Crus2 R      -0.000   8.42e-06    Cerebelum Crus1 L     -0.012   7.15e-02
Cerebelum 8 L          -0.000   2.15e-05    Vermis 9               0.011   9.51e-02
Cerebelum 9 R          -0.000   3.81e-04    Vermis 3              -0.008   1.05e-01
Cerebelum 10 R         -0.000   7.99e-05    Hippocampus R          0.007   4.66e-08
Vermis 4 5             -0.000   5.31e-04    Vermis 6               0.003   1.60e-01
Vermis 6               -0.000   1.60e-01    Hippocampus L         -0.003   1.25e-08
Vermis 10              -0.000   9.29e-02    Cerebelum 4 5 L        0.000   1.29e-04
Fig. S5. Canonical genetic weights estimated by SCCA (top figure) and simplified SCCA (bottom figure). In each figure, the results on each of the four training folds (rows 1-4) and on the entire data (bottom row) are shown.
Fig. S6. Canonical imaging weights estimated by SCCA (top figure) and simplified SCCA (bottom figure). In each figure, the results on each of the four training folds (rows 1-4) and on the entire data (bottom row) are shown.