Generalized Liquid Association Analysis for Multimodal Data Integration
Lexin Li†, Jing Zeng‡, and Xin Zhang‡∗
†University of California at Berkeley and ‡Florida State University
Abstract
Multimodal data are now prevalent in scientific research. A central question in multimodal integrative analysis is to understand how two data modalities associate and interact with each other given another modality or demographic covariates. The problem can be formulated as studying the associations among three sets of random variables, a question that has received relatively little attention in the literature. In this article, we propose a novel generalized liquid association analysis method, which offers a new and unique angle to this important class of problems. We extend the notion of liquid association of Li (2002) from the univariate setting to the multivariate and high-dimensional setting. We establish a population dimension reduction model, transform the problem to sparse Tucker decomposition of a three-way tensor, and develop a higher-order singular value decomposition estimation algorithm. We derive the non-asymptotic error bound and asymptotic consistency of the proposed estimator, while allowing the variable dimensions to be larger than and diverge with the sample size. We demonstrate the efficacy of the method through both simulations and a multimodal neuroimaging application for Alzheimer's disease research.
Key Words:
Liquid association; Multimodal neuroimaging; Sufficient dimension reduction; Tensor analysis; Tucker tensor decomposition.

∗The authors contributed equally to this work and are listed in alphabetical order.

1 Introduction
Multimodal data are now prevalent in scientific research, where different types of data, based on different physical and physiological sensitivities of machines and technologies, are acquired for a common set of experimental subjects. One example is multi-omics, where different types of genetic information, such as gene expressions, copy number alterations, and methylation changes, are jointly collected for the same biological samples (Richardson et al., 2016). Another example is multimodal neuroimaging, where distinct brain characteristics, including brain structure, function, and chemical constituents, are simultaneously measured for the same study subjects (Uludag and Roebroeck, 2014). Integrative analysis of multimodal data aggregates such diverse and often complementary information, and consolidates knowledge across multiple data modalities.

In this article, we aim to address a question of central interest in multimodal integrative analysis: to understand how different data modalities associate and interact with each other given other modalities or demographic covariates.

1.1 A motivating example

Our motivation is a multimodal positron emission tomography (PET) study for Alzheimer's disease (AD) research. Amyloid-beta and tau are two hallmark proteins of AD, and both can be measured in vivo by PET imaging using different nuclear tracers. The two proteins are closely associated in terms of the spatial patterns of their accumulations, and such association patterns are believed to be highly affected by the subject's age (Braak and Braak, 1991). On the other hand, their specific age-dependent regional associations remain unclear. The data we study involve 81 elderly subjects, each receiving two PET scans that measure the depositions of amyloid-beta and tau, respectively. Each PET modality is represented by a vector of measurements recording the amount of the protein at a set of parcellated brain regions-of-interest. Brain parcellation is particularly useful for facilitating interpretation, and has been frequently employed in brain imaging analysis (Fornito et al., 2013; Kang et al., 2016). We focus on the brain regions known to have possible amyloid-beta and tau accumulations. This results in 60 regions for amyloid-beta, and 26 regions for tau. Our goal is to find how and where in the brain the associations of the two proteins are the most contrastive as age varies.

This problem can be formulated statistically as studying the associations of two sets of random variables X ∈ R^{p_1} and Y ∈ R^{p_2} conditional on a third set of random variables Z ∈ R^{p_3}. In our example, X denotes the amyloid-beta PET imaging with p_1 = 60, Y denotes the tau PET imaging with p_2 = 26, and Z denotes the subject's age with p_3 = 1. Meanwhile, in plenty of multimodal applications, X, Y, Z can all be high-dimensional, and their dimensions can be even larger than the sample size. For instance, in imaging genetics (Nathoo et al., 2017), X and Y can represent different imaging modalities, whose dimensions can be in the hundreds, and Z can denote the genetic information, whose dimension can be in the tens of thousands or more. In high-dimensional data analysis, it is common to postulate that the data information can be sufficiently captured by some low-dimensional representations, or most often, some linear combinations of the original high-dimensional variables (Cook, 2007). Adopting this view, our question can be formulated as seeking linear combinations of X and linear combinations of Y whose conditional associations given Z are the most contrastive.
In other words, we seek linear combinations of X and Y whose association changes the most as Z changes. We also note that, although motivated by a specific multimodal neuroimaging study, the problem we address is of broad scientific interest, and the method is applicable to other multimodal studies, e.g., multi-omics data analysis (Shen et al., 2013).

1.2 Related work

There is a rich statistical literature studying the associations between two sets of multivariate variables X and Y. A well studied and commonly used family of methods is canonical correlation analysis (CCA) and its variants (Witten et al., 2009; Gao et al., 2015; Li and Gaynanova, 2018; Shu et al., 2019; Mai and Zhang, 2019, among others). CCA explores the symmetric relations between X and Y, and looks for pairs of linear combinations that are most correlated. This goal, however, is different from ours, as the highly correlated linear combinations of X and Y are not necessarily the ones that are the most contrastive. For instance, a pair of linear combinations of X and Y can be highly correlated while this correlation remains constant as the value of Z varies; such a pair is not the target of our problem. We later numerically compare our method with CCA to further demonstrate their differences. Another popular family of methods is sufficient dimension reduction (SDR), which looks for linear combinations of X that capture the full regression information of Y given X; see the recent book of Li (2018) for a comprehensive review of this topic. Later we show that our proposed method is connected to several SDR methods, including principal Hessian directions (Li, 1992; Cook, 1998; Tang et al., 2020), and partial and groupwise sufficient dimension reduction (Chiaromonte et al., 2002; Li et al., 2010). However, the goals of the two are utterly different. Whereas SDR studies asymmetric relations of Y conditioning on X, we seek symmetric relations between X and Y conditioning on the third set of variables Z, in that the roles of X and Y are interchangeable, but not with the role of Z.

Compared to the setting of two sets of variables, there have been far fewer statistical methods studying the associations among three sets of multivariate variables in the form of X and Y given Z. In his groundbreaking work, Li (2002) proposed a novel three-way interaction metric, termed liquid association, that measures the extent to which the association of a pair of random variables depends on the value of a third variable. He showed that this metric is particularly useful in discovering co-expressed gene pairs that are regulated by another gene. However, Li (2002) only considered the scenario where all three variables X, Y, Z are one-dimensional. Li et al. (2004) extended the notion of liquid association to the scenario of a multivariate X and a scalar Z, and sought two linear combinations γ_1^T X and γ_2^T X such that corr(γ_1^T X, γ_2^T X | Z) varies the most with Z. Ho et al. (2011) and Yu (2018) developed some modified versions of liquid association, but still focused on the one-dimensional X, Y, Z scenario. Relatedly, Chen et al. (2011) proposed a bivariate conditional normal model to identify the variables that regulate the co-expression patterns between two genes. That corresponds to the scenario with a scalar X, a scalar Y, and a multivariate Z. Abid et al.
(2018) proposed contrastive principal component analysis for a multivariate X and a binary scalar Z, which seeks linear combinations of X that have the largest changes in the conditional variance given Z = 0 versus Z = 1. Moreover, Lock et al. (2013) and Li and Jung (2017) developed a class of matrix and tensor factorization methods, which aim to decompose the multimodal data into components that capture the joint variation shared across modalities, and components that characterize modality-specific variation. Their goal is again different from ours, as their methods do not target the conditional distribution of X, Y given Z. Finally, Xia et al. (2019) analyzed a dataset similar to our motivating example, but tackled a totally different problem. They studied hypothesis testing of the covariance between the two multivariate PET measurements, and worked on the residuals after regressing out the age effect, which involves no conditioning on any third set of variables.

1.3 Our proposal and contributions

In this article, we study the three-way associations among multivariate X, Y, Z, and seek a set of linear combinations of X and Y that has a varying association as Z varies. We generalize the notion of liquid association of Li (2002) from the univariate case to the multivariate case. However, such a generalization is far from trivial, leading to a completely different estimation method and the associated asymptotic theory. For the estimation, we transform the generalized liquid association analysis to the problem of sparse Tucker decomposition of a three-way tensor, and introduce sparsity in the linear combinations to improve interpretability. We then develop a higher-order singular value decomposition algorithm for parameter estimation. For the theory, we establish a population model that is essential for the study of statistical properties. We then derive the error bound and consistency, while allowing the variable dimensions p_1, p_2, p_3 to be larger than and to diverge to infinity along with the sample size n. As a result, our proposal makes useful contributions from both the scientific and statistical perspectives.

Scientifically, characterizing the associations between different modalities given other modalities or covariates is of crucial importance for multimodal integrative analysis. However, there is almost no existing statistical solution available for this type of problem, especially when all the modalities involved are high-dimensional. Our proposal offers a unique angle on this important class of problems. As an illustration, for our multimodal PET study, understanding the dynamic patterns between amyloid-beta and tau with respect to age would offer pivotal insight into how pathological proteins of Alzheimer's disease interact in the aging human brain. In turn, it could lead to better design and subject recruitment of clinical trials that aim to slow the spread of AD. For instance, clinical trials testing anti-amyloid-beta or anti-tau agents would need to know not only that participants have amyloid-beta and tau in the brain, but also how the relative levels of each pathological protein are spatially associated with each other given the participants' ages.

Statistically, our proposal of generalized liquid association analysis makes a useful addition to the toolbox of association analysis of more than two sets of variables. Moreover, our method involves sparse tensor decomposition, which is itself of independent interest.
Tensor data analysis has been gaining increasing attention in recent years (Kolda and Bader, 2009; Zhou et al., 2013; Bi et al., 2018; Tang et al., 2019; Zhang and Han, 2019, among others); see also Bi et al. (2020) for a review of tensor analysis in statistics. Nevertheless, our proposal differs in several ways. Particularly, our sparse tensor decomposition algorithm is related to some recent proposals of singular value decomposition (SVD) for matrix denoising (Yang et al., 2016) and tensor denoising (Zhang and Han, 2019), in that they share a similar iterative hard thresholding SVD scheme. However, our algorithm is tailored to tensor parameter estimation, with more flexible initialization and tuning. As a result, our theoretical analysis differs considerably from the denoising problems. Whereas both Yang et al. (2016) and Zhang and Han (2019) achieved minimax optimal estimation for their denoising problems, we establish dimension reduction subspace recovery consistency, variable selection consistency, and tensor parameter estimation consistency. Our rate of convergence matches the optimal rate in previous works, and all the consistency results are established in the ultra-high-dimensional setting of s log(p) = o(n), where s = s_1 s_2 s_3 and p = p_1 p_2 p_3 are the products of the numbers of nonzero entries and the dimensions, respectively, in X, Y and Z. Our theoretical development is highly non-trivial, and may be of independent interest for future research involving tensor parameter estimation in a statistical model with i.i.d. data. In a sense, our work further broadens the scope of higher-order sparse SVD and tensor analysis.

The rest of the article is organized as follows. Section 2 develops the concept of generalized liquid association, and the corresponding population model of generalized liquid association analysis. Section 3 introduces the estimation algorithm, and Section 4 establishes the theoretical guarantees. Section 5 presents the simulations, and Section 6 revisits the multimodal PET study. Section 7 concludes the paper with a discussion. The Supplementary Appendix collects all technical proofs and additional numerical results.

2 Generalized Liquid Association Analysis

We begin with a brief review of the concept of liquid association (LA) proposed by Li (2002) for the univariate case. We then extend this notion to the multivariate case.

Suppose
X, Y, Z are random variables with mean zero and variance one. Define g(z) = E(XY | Z = z): R → R. Li (2002) defined the liquid association of X and Y given Z as
LA(X, Y | Z) = E{dg(Z)/dZ} = E{(d/dZ) E(XY | Z)}.

When Z follows a standard normal distribution, by Stein's Lemma (Stein, 1981), we have

LA(X, Y | Z) = E{g(Z) Z} = E(XYZ).
Intuitively, LA(X, Y | Z) characterizes, through g(z), the change of the association of X and Y conditional on Z, and the normality condition connects this quantity with the simple unconditional expectation E(XYZ). In practice, the univariate Z is transformed to standard normal using the normal score transformation, and the LA measure is estimated by the sample mean of X_i Y_i Z_i.
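As a concrete illustration, the following is a minimal sketch of this sample estimator in Python; the function name and the rank-based implementation of the normal score transformation are our own choices, not part of the original proposal.

```python
import numpy as np
from scipy.stats import norm

def univariate_la(x, y, z):
    """Sample estimate of LA(X, Y | Z) in the sense of Li (2002).

    x, y, z: 1-d arrays of n observations; x and y are assumed to be
    standardized to mean zero and variance one.
    """
    n = len(z)
    # Normal score transformation: replace z by the standard normal
    # quantiles of its ranks, so that Stein's lemma applies.
    ranks = np.argsort(np.argsort(z)) + 1
    z_ns = norm.ppf(ranks / (n + 1.0))
    # By Stein's lemma, LA(X, Y | Z) = E(XYZ); estimate it by a sample mean.
    return float(np.mean(x * y * z_ns))
```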
Li et al. (2004) considered an extension of LA to a multivariate X ∈ R^p and a scalar Z, by looking for two linear combinations such that LA(γ_1^T X, γ_2^T X | Z) = γ_1^T E(X X^T Z) γ_2 is maximized. It has a closed-form solution: γ_1 = (v_1 + v_p)/√2 and γ_2 = (v_1 − v_p)/√2, where v_1 and v_p are the eigenvectors of the matrix E(X X^T Z) ∈ R^{p×p} with the largest and smallest eigenvalues.
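A sketch of this closed-form solution, under the assumption that z has already been normal-score transformed (function and variable names are ours):

```python
import numpy as np

def la_directions(X, z):
    """Closed form of Li et al. (2004): the pair (g1, g2) maximizing
    LA(g1'X, g2'X | Z) = g1' E(X X' Z) g2 over unit-norm vectors."""
    n = X.shape[0]
    M = (X * z[:, None]).T @ X / n      # sample version of E(X X' Z)
    M = (M + M.T) / 2.0                 # guard against floating-point asymmetry
    _, V = np.linalg.eigh(M)            # eigenvalues in ascending order
    v_min, v_max = V[:, 0], V[:, -1]    # smallest / largest eigenvalues
    g1 = (v_max + v_min) / np.sqrt(2.0)
    g2 = (v_max - v_min) / np.sqrt(2.0)
    return g1, g2
```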
We next extend the concept of liquid association to the multivariate case, where X ∈ R^{p_1}, Y ∈ R^{p_2}, and Z ∈ R^{p_3}. Without loss of generality, suppose each variable entry in X, Y, and Z is standardized with mean zero and variance one. Define g(z) = E(XY^T | Z = z): R^{p_3} → R^{p_1×p_2}. We then define the generalized liquid association measure, which is a three-way tensor, as follows.

Definition 1.
The generalized liquid association (GLA) of X and Y with respect to Z is

Φ = GLA(X, Y | Z) = E{(d/dZ) g(Z)} ∈ R^{p_1×p_2×p_3}.

When Z follows a multivariate normal distribution, by the multivariate version of Stein's lemma (Liu, 1994, Lemma 1), we have the following property regarding Φ.

Proposition 1. If Z ∼ Normal(0, Σ_Z), then Φ = E(X ∘ Y ∘ Z) ×_3 Σ_Z^{-1}, where ∘ denotes the outer product, and ×_3 denotes the mode-3 product between a tensor and a matrix.

This extension from univariate to multivariate variables is conceptually straightforward. However, we recognize that X, Y and Z can all be high-dimensional, with p_1, p_2, p_3 > n, and the dimension of Φ is the product p_1 p_2 p_3. Besides, it involves the inversion of a potentially high-dimensional covariance matrix Σ_Z, which makes direct calculation of, or any operation on, Φ difficult, if not completely infeasible. Next, we consider a dimension reduction model for Φ, which reduces the dimensionality, avoids Σ_Z^{-1}, and improves the interpretability of the results. We also discuss the normality assumption in more detail, and show that it is not absolutely necessary.
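To make Proposition 1 concrete, here is a naive plug-in sample version (function name and einsum implementation are ours). Note that it requires inverting the sample covariance of Z, which is exactly the step that the dimension reduction below is designed to avoid.

```python
import numpy as np

def gla_tensor(X, Y, Z):
    """Naive sample analogue of Proposition 1:
    Phi = E(X o Y o Z) x_3 Sigma_Z^{-1}, for column-standardized data.

    X: n x p1, Y: n x p2, Z: n x p3. Returns a p1 x p2 x p3 array.
    """
    n = X.shape[0]
    # Sample mean of the outer products X_i o Y_i o Z_i.
    Delta = np.einsum('ni,nj,nk->ijk', X, Y, Z) / n
    # Mode-3 product with the inverse sample covariance of Z:
    # (Delta x_3 A)_{ijl} = sum_k Delta_{ijk} A_{lk}.
    Sigma_Z_inv = np.linalg.inv(Z.T @ Z / n)
    return np.einsum('ijk,lk->ijl', Delta, Sigma_Z_inv)
```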
Our key idea of dimension reduction is to look for linear combinations of X and Y that change the most as Z, or some of its linear combinations, changes. Moreover, we show that Σ_Z^{-1} can be dropped when estimating the linear combinations of X and Y.

Specifically, we first postulate that the matrix g(z) = E(XY^T | Z = z) ∈ R^{p_1×p_2} varies within a low-dimensional subspace for all values of z, in that

g(z) = E(XY^T | Z = z) = Γ_1 f(z) Γ_2^T,

for some semi-orthogonal basis matrices Γ_k ∈ R^{p_k×r_k}, k = 1, 2, and some latent function f taking values in R^{r_1×r_2}. This implies that the linear combinations Γ_1^T X and Γ_2^T Y capture all the variation in the generalized liquid association measure. Next, we further assume that the generalized liquid association depends on Z only through a few linear combinations Γ_3^T Z of Z, for some semi-orthogonal basis matrix Γ_3 ∈ R^{p_3×r_3}. Putting these two dimension reduction structures together, we obtain our dimension reduction model for the generalized liquid association analysis,

g(z) = E(XY^T | Z = z) = Γ_1 f(Γ_3^T z) Γ_2^T. (1)

Given this model, we have the following result.

Proposition 2. Under model (1), if Γ_3^T Z is normally distributed, then

Φ = Φ ×_1 P_{Γ_1} ×_2 P_{Γ_2} ×_3 P_{Γ_3},

where P_{Γ_k} = Γ_k(Γ_k^T Γ_k)^{-1} Γ_k^T denotes the projection onto the subspace span(Γ_k) spanned by the columns of Γ_k, and ×_k denotes the mode-k product between a tensor and a matrix, k = 1, 2, 3.

An important implication of this proposition is that it points a way to estimate the linear combination coefficients Γ_k that we seek in the dimension reduction model (1), or more accurately, the subspaces span(Γ_k), k = 1, 2, 3. That is, following Proposition 2, we can re-formulate the Γ_k's as the solutions to the optimization problem

minimize_{G_1, G_2, G_3} ‖Φ − Φ ×_1 P_{G_1} ×_2 P_{G_2} ×_3 P_{G_3}‖_F^2, (2)

where G_k ∈ R^{p_k×r_k}, k = 1, 2, 3, are semi-orthogonal matrices. The optimization in (2) is actually the well-known tensor Tucker decomposition (Kolda and Bader, 2009).

Moreover, let (G̃_1, G̃_2, G̃_3) denote the minimizer of (2). Then (G̃_1, G̃_2, G̃_3) is also the solution to the maximization problem

maximize_{G_1, G_2, G_3} ‖Φ ×_1 P_{G_1} ×_2 P_{G_2} ×_3 P_{G_3}‖_F^2.
In other words, solving (2) finds the linear combinations of X and Y whose generalized liquid association, given some linear combination of Z, is maximized under model (1). In this sense, it achieves our goal of finding the most contrastive associations of X and Y given Z.

An issue with (2), however, is that the computation of the generalized liquid association measure Φ still involves the matrix Σ_Z^{-1}. Next, we show that we can essentially drop Σ_Z^{-1} in our generalized liquid association analysis. Toward that end, define ∆ = E(X ∘ Y ∘ Z) ∈ R^{p_1×p_2×p_3}. By Proposition 1, Φ = ∆ ×_3 Σ_Z^{-1}. We then consider the following optimization problem,

minimize_{G_1, G_2, G_3} ‖∆ − ∆ ×_1 P_{G_1} ×_2 P_{G_2} ×_3 P_{G_3}‖_F^2, (3)

where G_k ∈ R^{p_k×r_k}, k = 1, 2, 3, are semi-orthogonal matrices. The optimization in (3) is still a Tucker decomposition problem. However, ∆ is much easier to compute given the data, as it involves no matrix inversion. Let (Ĝ_1, Ĝ_2, Ĝ_3) denote the minimizer of (3). The next proposition connects the minimization problem (3) with (2).

Proposition 3.
Under model (1), span(Γ_k) = span(G̃_k) = span(Ĝ_k), k = 1, 2. If Γ_3^T Z is normally distributed, then span(Γ_3) = span(G̃_3) = Σ_Z^{-1} span(Ĝ_3).

A few remarks are in order. First of all, Proposition 3 suggests that minimizing (3) gives the same estimates of Γ_1 and Γ_2 in model (1) as minimizing (2), in the sense that they span the same subspaces. Meanwhile, the estimates of Γ_3 under the two minimization problems differ by a rotation. In the generalized liquid association analysis, our primary goal is to find linear combinations of X and Y that are the most contrastive given Z. As such, we are more interested in the estimation of Γ_1 and Γ_2, whereas the estimation of Γ_3 provides additional dimension reduction but is, relatively speaking, of less interest. This justifies focusing on the minimization problem (3) instead of (2) in our generalized liquid association analysis.

Second, we discuss the normality assumption in more detail. The normality of Z allows us to use Stein's Lemma to link the generalized liquid association measure Φ with the expectation tensor ∆, as shown in Proposition 1. This assumption is commonly imposed in the liquid association analysis literature, and is often achieved by a normal score transformation of the data (Li, 2002; Li et al., 2004; Yu, 2018). On the other hand, we argue that this assumption is not crucial for our proposed generalized liquid association analysis. This is because, as shown in Proposition 3, the estimation of Γ_1 and Γ_2 does not actually require the normality assumption. When it holds, we are able to connect the minimizer of (3) with the generalized liquid association measure. When it does not hold, we lose this interpretation; instead, we seek the linear combinations in the dimension reduction model (1) that target the function g(z), rather than the expectation of the derivative of g(z). In other words, the minimization problem in (3) remains meaningful and provides useful information, but admits a different interpretation. Actually, this is similar in spirit to one of the sufficient dimension reduction methods, principal Hessian directions (Li, 1992). It was also derived based on Stein's Lemma, but was shown to be useful for finding low-dimensional representations in graphics (Cook, 1998), and for detecting interaction terms in regressions (Tang et al., 2020), even without the normality assumption.

3 Estimation Algorithm

Tucker decomposition is usually solved by a higher-order singular value decomposition (HOSVD) algorithm or its variants. Next, we develop an iterative HOSVD algorithm to solve (3). We further introduce sparsity in this decomposition to improve the interpretability of the results.

For n i.i.d. data observations {X_i, Y_i, Z_i, i = 1, ..., n}, without loss of generality, we assume the data are centered, so that Σ_{i=1}^n Z_i = 0. The centering of X and Y is not required, but for simplicity, we assume Σ_{i=1}^n X_i = Σ_{i=1}^n Y_i = 0 as well. Then the sample estimator of ∆ is simply ∆̃ = n^{-1} Σ_{i=1}^n X_i ∘ Y_i ∘ Z_i ∈ R^{p_1×p_2×p_3}. Following the dimension reduction model (1) and the optimization problem (3), ∆̃ admits a Tucker tensor decomposition structure, which can be solved by some version of the higher-order singular value decomposition algorithm. Specifically, we simplify the STAT-SVD algorithm recently proposed by Zhang and Han (2019) for the tensor denoising problem, and tailor it to our generalized liquid association analysis problem to estimate Γ_k, k = 1, 2, 3.
It consists of two major components: SVD of a matrix, and hard thresholding to identify important variables. We summarize the estimation procedure in Algorithm 1, then discuss each step in detail. We also note that, in our formulation, we allow the number of variables p_k, k = 1, 2, 3, to be much larger than the sample size n.

We first introduce some notation. For an integer p, let [p] denote the set {1, ..., p}. For a matrix A ∈ R^{p×q} and index sets I ⊆ [p], J ⊆ [q], let A_{[I,J]} denote the corresponding |I| × |J| submatrix, where a whole index set [p] is abbreviated as ":"; e.g., A_{[[p],J]} = A_{[:,J]}. Let ∆̃_k and ∆_k denote the mode-k matricizations of the tensors ∆̃ and ∆, k = 1, 2, 3. Define Γ_{-1} = Γ_3 ⊗ Γ_2, Γ_{-2} = Γ_3 ⊗ Γ_1, and Γ_{-3} = Γ_2 ⊗ Γ_1, where ⊗ is the Kronecker product. Next, define the active sets of variables in the context of generalized liquid association analysis as

I_k = {j: (∆_k)_{[j,:]} ≠ 0, 1 ≤ j ≤ p_k} ⊆ [p_k], k = 1, 2, 3. (4)

As an example, the jth variable X_j in X corresponds to the jth row of Γ_1 ∈ R^{p_1×r_1}, j = 1, ..., p_1. Therefore, variable selection in X translates to row-wise sparsity in Γ_1, and correspondingly, row-wise sparsity in ∆_1 ∈ R^{p_1×p_2p_3}. Define the diagonal matrix D_{I_k} ∈ R^{p_k×p_k} that has one on the ith diagonal element if i ∈ I_k and zero elsewhere. This matrix represents variable selection along each mode, and is used repeatedly in our estimation algorithm. Define D_{I_{-1}} = D_{I_3} ⊗ D_{I_2}, where I_{-1} denotes the pair of subsets I_3 and I_2, and define D_{I_{-2}}, D_{I_{-3}}, I_{-2}, I_{-3} similarly. Also define Γ̂_{-k} and Î_k, k = 1, 2, 3, in a similar fashion.
Algorithm 1 Generalized liquid association analysis via sparse tensor decomposition.
Input: The Tucker ranks r_k ≤ p_k, and the sparsity parameters (η_k, η̃_k), k = 1, 2, 3.
Step 1, initialization: Compute the sample estimate ∆̃ = n^{-1} Σ_{i=1}^n X_i ∘ Y_i ∘ Z_i. Obtain the initial active sets and the initial basis matrices by
    Î_k^{(0)} = {j: ‖(∆̃_k)_{[j,:]}‖_max > η_k},  Γ̂_k^{(0)} = SVD_{r_k}{D_{Î_k^{(0)}} ∆̃_k D_{Î_{-k}^{(0)}}},  k = 1, 2, 3.
repeat
    Step 2a: Update the active sets: Î_k^{(t)} = {j: ‖(∆̃_k)_{[j,:]} Γ̂_{-k}^{(t-1)}‖_2 > η̃_k}, k = 1, 2, 3.
    Step 2b: Perform SVD: Γ̂_k^{(t)} = SVD_{r_k}{D_{Î_k^{(t)}} ∆̃_k Γ̂_{-k}^{(t-1)}} ∈ R^{p_k×r_k}, k = 1, 2, 3.
until the stopping criterion is met.
Output: The estimated basis matrices Γ̂_k, k = 1, 2, 3, and ∆̂ = ∆̃ ×_1 P_{Γ̂_1} ×_2 P_{Γ̂_2} ×_3 P_{Γ̂_3}.

Here SVD_r{M} returns the top r left singular vectors of the matrix M. We start the algorithm by computing the sample estimate ∆̃, then perform the initial selection of important variables and the initial SVD in Step 1 of Algorithm 1. From (4), we see that the selection of important variables can be achieved based on ‖(∆_k)_{[j,:]}‖ for some appropriate norm ‖·‖. In the initialization step, we employ the maximum norm, and achieve the selection by hard thresholding under the sparsity parameter η_k. The two diagonal matrices D_{Î_k^{(0)}} and D_{Î_{-k}^{(0)}} operate as subset selection operators within the SVD. Moreover, depending on the sparsity parameter η_k, we may keep all the variables in the active set, i.e., Î_k = [p_k].

Next, we iterate the algorithm by repeatedly selecting important variables and performing SVD in Step 2 of Algorithm 1. We continue the selection by hard thresholding, but with a
As such, the (cid:96) -norm is preferred to being able topick up potentially weaker signals and to refine the selection from the initialization. Moreover, adifferent thresholding parameter (cid:101) η k during the iterations gives more flexibility.We alternate Steps 2a and 2b until some termination criterion is met. That is, we terminatethe algorithm if the consecutive estimates are close enough, in that the difference between the (cid:96) -norm of the singular values of the two iterations is smaller than − , or the algorithm reachesthe maximum number of iterations pre-specified at 100. In our numerical experiments, we findthat the algorithm converges fast, usually within 10 to 20 iterations. We output the estimatedbasis matrices (cid:98) Γ k , k = 1 , , , along with the updated estimate (cid:98) ∆ = (cid:101) ∆ × P (cid:98) Γ × P (cid:98) Γ × P (cid:98) Γ that follows a Tucker decomposition.Our algorithm is related to the STAT-SVD algorithm of Zhang and Han (2019), in that weall use hard thresholding SVD iteratively. On the other hand, Zhang and Han (2019) targeteda tensor denoising problem involving identically distributed normal errors, and used a doublethresholding scheme with a theoretical thresholding value. Their algorithm, after obtaining thevariance of the errors, became tuning-free in terms of the thresholding parameter. By contrast, weaim to obtain a low-rank tensor estimator in the context of generalized liquid association analysis.The sample estimator does not have i.i.d. entries, and we use a single thresholding scheme withtwo data-driven tuning parameters. This leads to a more flexible tuning, and consequently anutterly different approach for the asymptotic analysis.The thresholding values η k and (cid:101) η k are treated as tuning parameters, and we propose a prediction-based approach for tuning. That is, we first randomly split the data into a training set and atesting set, and obtain the sample estimates (cid:101) ∆ train and (cid:101) ∆ test separately. We then apply Al-gorithm 1 to (cid:101) ∆ train to obtain (cid:98) Γ train k ( η ) , k = 1 , , , under a given set of tuning parameters η = ( η , η , η , (cid:101) η , (cid:101) η , (cid:101) η ) . We choose η such that the following discrepancy is minimized, L ( η ) = (cid:13)(cid:13)(cid:13) (cid:101) ∆ test − (cid:101) ∆ test × P (cid:98) Γ train1 ( η ) × P (cid:98) Γ train2 ( η ) × P (cid:98) Γ train3 ( η ) (cid:13)(cid:13)(cid:13) F . (5)12eanwhile, the ranks ( r , r , r ) take some pre-specified values. In practice, r k is often takenas 1 or 2 for exploratory analysis and data visualization. This is the same in spirt as canonicalcorrelation analysis. Actually, rank selection is still an open question in both CCA and matrix ortensor denoising, and we leave a full treatment of rank selection as future research. We establish the theoretical guarantees for the estimated subspace basis matrices (cid:98) Γ k , the es-timated Tucker tensor (cid:98) ∆ = (cid:101) ∆ × P (cid:98) Γ × P (cid:98) Γ × P (cid:98) Γ , and the estimated active sets (cid:98) I ( t ) k , k = 1 , , , from Algorithm 1. We first lay out the required regularity conditions, then derivethe non-asymptotic error bounds and the asymptotic consistency. In our theoretical analysis, weallow both the tensor dimension p = (cid:81) k =1 p k and the sparsity level s = (cid:81) k =1 s k to diverge withthe sample size n , while we fix the tensor rank r = (cid:81) k =1 r k . 
4 Theoretical Guarantees

We establish theoretical guarantees for the estimated subspace basis matrices Γ̂_k, the estimated Tucker tensor ∆̂ = ∆̃ ×_1 P_{Γ̂_1} ×_2 P_{Γ̂_2} ×_3 P_{Γ̂_3}, and the estimated active sets Î_k^{(t)}, k = 1, 2, 3, from Algorithm 1. We first lay out the required regularity conditions, then derive the non-asymptotic error bounds and the asymptotic consistency. In our theoretical analysis, we allow both the tensor dimension p = Π_{k=1}^3 p_k and the sparsity level s = Π_{k=1}^3 s_k to diverge with the sample size n, while we fix the tensor rank r = Π_{k=1}^3 r_k. Throughout this section, we use C and C′ to denote generic positive constants that may vary from context to context.

We begin with three technical assumptions.

(A1) (Marginal distribution). Assume |X_i Y_j| ≤ C for some constant C > 0, i = 1, ..., p_1, j = 1, ..., p_2. Assume Z_k is sub-Gaussian with a constant parameter σ > 0, k = 1, ..., p_3.

(A2) (Singular value). Let λ_k denote the smallest non-zero singular value of ∆_k, k = 1, 2, 3, and λ = min{λ_1, λ_2, λ_3}. Assume λ > max{C √(s log p/n), C′} for some constants C, C′ > 0.

(A3) (Signal strength). Let δ_min = min_{k∈{1,2,3}, i∈I_k} ‖(∆_k)_{[i,:]}‖_2 denote the minimal signal strength. Assume √(s log p/n) = o(δ_min).

We view these assumptions as generally mild and reasonable. Particularly, Assumption (A1) requires X and Y to be bounded and Z to be sub-Gaussian, which is necessary to establish the concentration of each element of ∆̃ around its population counterpart. The sub-Gaussian assumption is weaker than the normality assumption, and is widely used in high-dimensional non-asymptotic analysis (see, e.g., Rudelson and Vershynin, 2010; Wainwright, 2019). Besides, it assumes each individual Z_k to be sub-Gaussian, which is weaker than assuming the joint distribution of Z to be sub-Gaussian. Moreover, the constant σ > 0 does not require all Z_k to have the same variance. For instance, if Z_k is sub-Gaussian with parameter σ_k, k = 1, ..., p_3, then we can let σ = max_k σ_k
to have Assumption (A1) satisfied. Assumption (A2) ensures that there is a reasonable gap between the zero and nonzero singular values of ∆_k, under which the consistency of the estimator Γ̂_k is ensured. This type of assumption on the singular values is frequently used in the high-dimensional singular value decomposition literature (Yu et al., 2015; Yang et al., 2016; Zhang and Han, 2019). Assumption (A3) guarantees that the signal of the important variables is of a reasonable magnitude as n, p, s diverge, which in turn ensures the selection consistency. Note that δ_min is also the minimal Frobenius norm of the non-zero slices of ∆, i.e., the slices of ∆ corresponding to the variables i ∈ I_k, k = 1, 2, 3. This assumption is very mild: if the nonzero entries of ∆ are bounded away from zero, then it is satisfied.

Next, we derive the non-asymptotic error bounds of the generalized liquid association analysis estimators. Let ∆̂ and Γ̂_k, k = 1, 2, 3, denote the estimators returned by Algorithm 1, using the theoretical thresholding values η_k = √(α log p/n) and η̃_k = α s_{-k} log p/n, where α is a sufficiently large constant depending only on C and σ as defined in Assumption (A1). Moreover, since the basis matrix Γ̂_k is identifiable only up to an orthogonal rotation, we characterize its convergence in terms of the corresponding projection matrix P_{Γ̂_k}, k = 1, 2, 3.

Theorem 1 (Non-asymptotic error bound). Suppose Assumptions (A1) and (A2) hold. Then, with probability at least 1 − C max[p^{−γ}, p^{−{√(n(γ+1)/(2 log p)) − 1}}] for some constant γ > 0,

(a) ‖∆̂ − ∆‖_F ≤ C √(s log p/n);
(b) max_{k=1,2,3} ‖P_{Γ̂_k} − P_{Γ_k}‖_F ≤ C √(s log p/n).

When the sample size is sufficiently large, i.e., when n ≫ (γ + 1) log p, both statements in Theorem 1 hold with probability 1 − Cp^{−γ}. This probability goes to one as p diverges with n. We also briefly comment that, although we establish Theorem 1 for an order-3 tensor, the results can be straightforwardly extended to a tensor of general order.

Next, we establish the asymptotic consistency as n, p, s diverge to infinity.

Theorem 2 (Asymptotic consistency). Suppose Assumptions (A1) to (A3) hold. Then, with probability tending to one as n, p, s → ∞, we have

(a) ‖∆̂ − ∆‖_F → 0;
(b) ‖P_{Γ̂_k} − P_{Γ_k}‖_F → 0, k = 1, 2, 3;
(c) Î_k^{(t)} = I_k, k = 1, 2, 3, for all iterations t = 0, 1, ..., t_max.

Theorem 2 establishes the consistency of tensor parameter estimation, subspace estimation, as well as variable selection, under s log p = o(n), which allows each tensor dimension p_k to diverge faster than the growing sample size n. Moreover, the consistent variable selection is established for every iteration of Algorithm 1. Note that the maximum number of iterations t_max in Theorem 2(c) is treated as a constant regardless of the tensor dimension or sample size. In practice, we have observed that the algorithm often converges within 10 to 20 iterations. As such, we feel it reasonable to treat t_max as a constant in our theoretical analysis. On the other hand, we can extend the result by allowing t_max to diverge as well.
For instance, parallel to the arguments in Zhang and Han (2019), we can let t_max diverge at the rate of o(p).

Theorems 1 and 2 provide the theoretical guarantees for our proposed generalized liquid association analysis in the high-dimensional setting. At the same time, these results may be of independent interest for more general tensor estimation problems.

5 Simulations

We carry out simulations to investigate the empirical performance of the proposed generalized liquid association analysis (GLAA) method. We consider three scenarios. In the first scenario, we fix the dimension of Z at p_3 = 1, and increase the dimensions of X and Y as p_1 = p_2 ∈ {100, 200, 300, 400, 500}. In the second scenario, we fix p_1 = p_2 = 100, and increase p_3 ∈ {20, 40, 60, 80, 100}. In both cases, we fix the sample size at n = 500. In the third scenario, we fix p_1 = 100, p_2 = 25, p_3 = 1, and increase the sample size n ∈ {60, 80, 100, 120, 160}.

We generate the data in the following way. For i = 1, ..., n, we first generate Z_i from a normal distribution with mean zero and covariance I_{p_3}. We then generate (X_i, Y_i) jointly from a normal distribution with mean zero and covariance

cov(X, Y | Z = Z_i) = [ Σ_X, Γ_1 f(Γ_3^T Z_i) Γ_2^T ; Γ_2 f^T(Γ_3^T Z_i) Γ_1^T, Σ_Y ].

To ensure the positive-definiteness of this covariance matrix, we set Γ_1 = Σ_X^{1/2}(O_1^T, 0)^T and Γ_2 = Σ_Y^{1/2}(O_2^T, 0)^T, where O_1 = O_2 = O is a 5 × 2 matrix with two orthonormal columns, the first proportional to (1, 1, 1, 1, 1)^T. As a result, in this example, for X and Y, the ranks are r_1 = r_2 = 2, and the sparsity levels are s_1 = s_2 = 5. The marginal covariance matrix Σ_X is set as a block diagonal matrix, Σ_X = bdiag(Σ_{X,1}, Σ_{X,2}), where Σ_{X,1} ∈ R^{s_1×s_1} corresponds to the active variables in X and takes an AR(1) structure, with (i, j)th entry σ_{ij} = ρ^{|i−j|}, i, j = 1, ..., s_1, for a fixed 0 < ρ < 1, and Σ_{X,2} ∈ R^{(p_1−s_1)×(p_1−s_1)} is the identity matrix. The marginal covariance matrix Σ_Y is constructed in a similar fashion. The matrix f(Γ_3^T Z_i) is set as a diagonal matrix, diag{f_1(Γ_3^T Z_i), f_2(Γ_3^T Z_i)}, for two bounded smooth functions f_1 and f_2. In the Supplementary Appendix, we consider additional simulations using f(a; ρ, ξ) = ρ{2/(1 + e^{−ξa}) − 1} with different parameters 0 < ρ ≤ 1 and ξ > 0 that control the magnitude and speed of changes in cov(X, Y | Z). For the first and the third scenarios, p_3 = 1 and thus Γ_3 = 1. For the second scenario, where p_3 varies from 20 to 100, we set Γ_3 = (1, 1, 1, 1, 1, 0, ..., 0)^T/√5, with s_3 = 5.
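A sketch of one replicate of this generative scheme follows. The AR(1) parameter, the second column of O, and the specific choices of f_1 and f_2 (taken from the sigmoid family used in the Supplementary Appendix) are illustrative stand-ins of our own, since the paper's exact values are not reproduced here.

```python
import numpy as np
from scipy.linalg import block_diag, sqrtm

def f(a, rho=0.5, xi=1.0):
    """Sigmoid family f(a; rho, xi) = rho * (2 / (1 + exp(-xi a)) - 1)."""
    return rho * (2.0 / (1.0 + np.exp(-xi * a)) - 1.0)

def ar1(dim, rho=0.5):
    """AR(1) covariance with (i, j) entry rho^{|i-j|} (rho is a stand-in)."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n=500, p1=100, p2=100, s=5, seed=0):
    """One replicate of Scenario 1 (p3 = 1, so Gamma_3 = 1)."""
    rng = np.random.default_rng(seed)
    SigX = block_diag(ar1(s), np.eye(p1 - s))
    SigY = block_diag(ar1(s), np.eye(p2 - s))
    # O: two orthonormal columns on the first s = 5 coordinates; the
    # second column is our own orthonormal completion of (1,...,1)/sqrt(5).
    O = np.zeros((s, 2))
    O[:, 0] = 1.0 / np.sqrt(s)
    O[1:, 1] = np.array([1.0, 1.0, -1.0, -1.0]) / 2.0
    Gam1 = np.real(sqrtm(SigX)) @ np.vstack([O, np.zeros((p1 - s, 2))])
    Gam2 = np.real(sqrtm(SigY)) @ np.vstack([O, np.zeros((p2 - s, 2))])
    Z = rng.standard_normal((n, 1))
    X, Y = np.zeros((n, p1)), np.zeros((n, p2))
    for i in range(n):
        a = Z[i, 0]
        F = np.diag([f(a, 0.5, 1.0), f(a, 0.5, 2.0)])  # f1, f2 stand-ins
        C = Gam1 @ F @ Gam2.T                          # cross-covariance
        Sigma = np.block([[SigX, C], [C.T, SigY]])
        xy = rng.multivariate_normal(np.zeros(p1 + p2), Sigma)
        X[i], Y[i] = xy[:p1], xy[p1:]
    return X, Y, Z
```

With |f| bounded by 1/2 as above, the Schur complement of the joint covariance stays positive definite, which is the role of the (O^T, 0)^T construction in the text.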
When applying the proposed method, we choose η_1 and η_2, the thresholding parameters used only in the initialization step of the algorithm, so that about half of the variables in X and in Y are kept for the subsequent iterations. We then tune η̃_1 and η̃_2, the thresholding parameters used in the iterative sparse SVD, by cross-validation over a grid of candidate values. We choose the ranks r_1, r_2, i.e., the numbers of linear combinations for X and Y, to be one, which is also the most common choice in canonical correlation analysis.

There is no existing method designed to directly address our target problem. For the purpose of comparison, we consider three relevant solutions. The first is a naive and marginal extension of the univariate liquid association (ULA) method of Li (2002). That is, we construct a tensor estimator Φ̃, each entry of which is the sample univariate LA for the triplet (X_{i_1}, Y_{i_2}, Z_{i_3}) as defined in Li (2002), i_1 = 1, ..., p_1, i_2 = 1, ..., p_2, i_3 = 1, ..., p_3. We then perform the usual HOSVD on each matricization of Φ̃ under the given rank to obtain the estimates of the basis matrices, i.e., SVD{Φ̃_(k)}, k = 1, 2, 3. The second and third solutions are two different versions of canonical correlation analysis: the penalized matrix decomposition (PMD) method of Witten et al. (2009), and the sparse canonical correlation analysis (SCCA) method of Mai and Zhang (2019). We have chosen these two due to their computational simplicity and superior empirical performance. We note that CCA is not designed to incorporate the third set of variables Z; we thus simply take the first r_1 and r_2 directions of X and Y from CCA as the estimated basis matrices corresponding to Γ_1 and Γ_2. We evaluate the performance of each method in terms of variable selection accuracy and subspace estimation accuracy.

For variable selection, we record the true positive rate (TPR) and false positive rate (FPR) for each mode. Recall from (4) that the active set of variables is I_k, which is also the index set of the nonzero rows of Γ_k. Let Î_k denote the estimated active set corresponding to Γ̂_k; then TPR-k = |I_k ∩ Î_k|/s_k and FPR-k = |I_k^c ∩ Î_k|/(p_k − s_k), k = 1, 2. For GLAA, PMD and SCCA, we estimate the active set as Î_k = {i: there exist non-zero elements in the ith row of Γ̂_k}. ULA does not perform any variable selection; for the purpose of comparison, we calculate the ℓ_2-norm of each row of the kth matricization Φ̃_(k), arrange the row indices in descending order of these ℓ_2-norms, then select the first s_k rows for each mode, k = 1, 2, 3. Of course, the information about s_k is generally unknown in practice, and this solution utilizes such knowledge. Even so, as we show later, ULA is still far less effective than the proposed GLAA method.

For subspace estimation, we compute the average distance between the true and the estimated subspaces, D = Σ_{k=1}^{k̃} D(Γ_k, Γ̂_k)/k̃, where D(Γ_k, Γ̂_k) = ‖P_{Γ_k} − P_{Γ̂_k}‖_F/√(2 r_k). For GLAA and ULA, this distance is averaged over all three modes of X, Y, Z, so k̃ = 3. For PMD and SCCA, it is averaged over the first two modes X, Y, so k̃ = 2. By definition, this distance measure lies between 0 and 1, where 0 indicates a perfect subspace recovery.
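For concreteness, the projection-based distance can be computed with a small helper of our own:

```python
import numpy as np

def subspace_distance(G, G_hat):
    """D(G, G_hat) = ||P_G - P_{G_hat}||_F / sqrt(2 r), which lies in
    [0, 1], with 0 indicating perfect recovery of span(G)."""
    def proj(A):
        return A @ np.linalg.pinv(A.T @ A) @ A.T
    r = G.shape[1]
    return np.linalg.norm(proj(G) - proj(G_hat), 'fro') / np.sqrt(2.0 * r)
```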
Tables 1–3 summarize the simulation results over 100 data replications for the three scenarios.

Table 1 reports the accuracy of variable selection and subspace estimation when the dimensions p_1 = p_2 of X and Y increase. It is clearly seen that GLAA dominates all the competing solutions. For ULA, variable selection and subspace estimation are carried out separately. Even with oracle knowledge of the true sparsity level, the naive variable selection of ULA still performs worse, as it only utilizes the marginal information of each mode. Besides, its estimated subspace is far from the true subspace, especially when p_1, p_2 are large. Moreover, as p_1, p_2 increase, the performance of ULA degrades quickly, while GLAA remains competitive. For PMD and SCCA, both suffer large false positive rates in selection, while their estimated subspaces are almost orthogonal to the true subspace, as reflected by the distance measure D being almost one. This is because, by design, neither method takes into account the information in the conditioning variable Z when studying the association between X and Y.
p_1 = p_2  Method  TPR-1          FPR-1          TPR-2          FPR-2          D
100        GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.095 (0.002)
           ULA     0.998 (0.002)  0.000 (0.000)  0.996 (0.003)  0.000 (0.000)  0.776 (0.002)
           PMD     0.804 (0.028)  0.735 (0.029)  0.802 (0.029)  0.731 (0.028)  0.971 (0.002)
           SCCA    0.584 (0.030)  0.626 (0.027)  0.634 (0.030)  0.629 (0.027)  0.989 (0.001)
200        GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.100 (0.003)
           ULA     0.928 (0.010)  0.002 (0.000)  0.944 (0.009)  0.001 (0.000)  0.873 (0.002)
           PMD     0.766 (0.033)  0.731 (0.029)  0.762 (0.033)  0.727 (0.029)  0.989 (0.001)
           SCCA    0.596 (0.031)  0.611 (0.027)  0.626 (0.035)  0.611 (0.027)  0.993 (0.001)
300        GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.100 (0.003)
           ULA     0.808 (0.016)  0.003 (0.000)  0.806 (0.016)  0.003 (0.000)  0.945 (0.003)
           PMD     0.718 (0.032)  0.693 (0.030)  0.730 (0.034)  0.696 (0.029)  0.992 (0.001)
           SCCA    0.670 (0.028)  0.651 (0.019)  0.624 (0.028)  0.651 (0.018)  0.997 (0.000)
400        GLAA    0.932 (0.024)  0.008 (0.004)  0.930 (0.025)  0.008 (0.004)  0.186 (0.025)
           ULA     0.652 (0.018)  0.004 (0.000)  0.678 (0.019)  0.004 (0.000)  0.980 (0.002)
           PMD     0.798 (0.029)  0.766 (0.026)  0.800 (0.029)  0.762 (0.026)  0.994 (0.001)
           SCCA    0.594 (0.022)  0.601 (0.010)  0.590 (0.024)  0.602 (0.010)  0.997 (0.000)
500        GLAA    0.848 (0.032)  0.041 (0.010)  0.848 (0.033)  0.040 (0.009)  0.304 (0.037)
           ULA     0.526 (0.021)  0.005 (0.000)  0.526 (0.020)  0.005 (0.000)  0.989 (0.001)
           PMD     0.788 (0.031)  0.727 (0.027)  0.794 (0.030)  0.729 (0.028)  0.995 (0.001)
           SCCA    0.532 (0.024)  0.518 (0.008)  0.532 (0.024)  0.516 (0.008)  0.997 (0.000)
Table 1: Variable selection accuracy, measured by TPR and FPR, and subspace estimation accuracy, measured by D, for Scenario 1, where p_1 = p_2 varies. Reported are the average criteria, with standard errors in parentheses, over 100 data replications.

Table 2 reports the variable selection and subspace estimation accuracy when the dimension p_3 of Z increases. In this case, our goal is to estimate the subspaces spanned by Γ_1, Γ_2, Γ_3 accurately, and meanwhile to select the variables in X and Y accurately. It is seen that GLAA again outperforms all the other methods considerably. Besides, GLAA shows a competitive performance even with a relatively large dimensionality of Z. This also complements our real data example, where the dimension of Z is one.

Table 3 reports the variable selection and subspace estimation accuracy when the sample size n increases. The sizes of n we examine are comparable to that of our multimodal PET example. It is seen that GLAA performs the best, even under a relatively small sample size. Moreover, the performance of all methods improves as n increases. However, ULA suffers a poor subspace estimation accuracy, while both PMD and SCCA continue to suffer both high false positive rates and poor subspace estimation accuracy, even for a relatively large sample size.
p_3  Method  TPR-1          FPR-1          TPR-2          FPR-2          D
20   GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.132 (0.004)
     ULA     0.642 (0.019)  0.019 (0.001)  0.638 (0.019)  0.019 (0.001)  0.872 (0.003)
     PMD     0.774 (0.031)  0.733 (0.029)  0.776 (0.031)  0.732 (0.029)  0.978 (0.002)
     SCCA    0.518 (0.034)  0.534 (0.029)  0.536 (0.032)  0.532 (0.030)  0.989 (0.001)
40   GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.131 (0.004)
     ULA     0.488 (0.020)  0.027 (0.001)  0.498 (0.020)  0.026 (0.001)  0.888 (0.002)
     PMD     0.774 (0.031)  0.695 (0.030)  0.772 (0.032)  0.695 (0.029)  0.973 (0.002)
     SCCA    0.590 (0.034)  0.596 (0.031)  0.624 (0.033)  0.589 (0.031)  0.987 (0.001)
60   GLAA    1.000 (0.000)  0.000 (0.000)  1.000 (0.000)  0.000 (0.000)  0.136 (0.004)
     ULA     0.384 (0.020)  0.032 (0.001)  0.404 (0.020)  0.031 (0.001)  0.898 (0.002)
     PMD     0.748 (0.030)  0.694 (0.030)  0.754 (0.033)  0.701 (0.030)  0.973 (0.002)
     SCCA    0.564 (0.031)  0.537 (0.029)  0.572 (0.031)  0.541 (0.029)  0.985 (0.002)
80   GLAA    0.998 (0.002)  0.000 (0.000)  0.998 (0.002)  0.000 (0.000)  0.145 (0.005)
     ULA     0.378 (0.021)  0.033 (0.001)  0.368 (0.019)  0.033 (0.001)  0.903 (0.001)
     PMD     0.706 (0.033)  0.689 (0.031)  0.760 (0.032)  0.688 (0.031)  0.975 (0.002)
     SCCA    0.624 (0.031)  0.613 (0.027)  0.592 (0.033)  0.609 (0.027)  0.987 (0.002)
100  GLAA    0.996 (0.003)  0.005 (0.005)  0.996 (0.003)  0.005 (0.005)  0.158 (0.010)
     ULA     0.332 (0.020)  0.035 (0.001)  0.360 (0.018)  0.034 (0.001)  0.905 (0.001)
     PMD     0.712 (0.034)  0.633 (0.033)  0.662 (0.038)  0.633 (0.033)  0.974 (0.002)
     SCCA    0.560 (0.033)  0.565 (0.030)  0.586 (0.032)  0.567 (0.030)  0.989 (0.001)
Table 2: Variable selection accuracy, measured by TPR and FPR, and subspace estimation accuracy, measured by D, for Scenario 2, where p_3 varies. Reported are the average criteria, with standard errors in parentheses, over 100 data replications.

n    Method  TPR-1          FPR-1          TPR-2          FPR-2          D
60   GLAA    0.606 (0.025)  0.048 (0.008)  0.664 (0.027)  0.154 (0.021)  0.743 (0.015)
     ULA     0.410 (0.022)  0.031 (0.001)  0.362 (0.018)  0.160 (0.004)  0.920 (0.004)
     PMD     0.498 (0.040)  0.486 (0.036)  0.552 (0.039)  0.466 (0.036)  0.957 (0.004)
     SCCA    0.604 (0.023)  0.059 (0.012)  0.688 (0.023)  0.643 (0.017)  0.968 (0.002)
80   GLAA    0.638 (0.027)  0.018 (0.002)  0.738 (0.025)  0.137 (0.027)  0.662 (0.021)
     ULA     0.550 (0.021)  0.024 (0.001)  0.412 (0.018)  0.147 (0.005)  0.895 (0.004)
     PMD     0.494 (0.038)  0.441 (0.034)  0.502 (0.038)  0.429 (0.034)  0.953 (0.004)
     SCCA    0.672 (0.026)  0.665 (0.016)  0.716 (0.025)  0.718 (0.019)  0.966 (0.003)
100  GLAA    0.820 (0.022)  0.013 (0.004)  0.848 (0.020)  0.060 (0.016)  0.483 (0.024)
     ULA     0.686 (0.019)  0.017 (0.001)  0.534 (0.021)  0.117 (0.005)  0.870 (0.004)
     PMD     0.510 (0.035)  0.476 (0.032)  0.560 (0.036)  0.442 (0.033)  0.947 (0.004)
     SCCA    0.702 (0.027)  0.677 (0.021)  0.732 (0.025)  0.714 (0.024)  0.961 (0.003)
120  GLAA    0.896 (0.018)  0.004 (0.001)  0.914 (0.018)  0.010 (0.003)  0.350 (0.021)
     ULA     0.802 (0.017)  0.010 (0.001)  0.622 (0.018)  0.094 (0.004)  0.850 (0.003)
     PMD     0.576 (0.035)  0.564 (0.032)  0.662 (0.035)  0.530 (0.032)  0.941 (0.004)
     SCCA    0.640 (0.030)  0.657 (0.024)  0.708 (0.028)  0.705 (0.022)  0.970 (0.003)
160  GLAA    0.990 (0.004)  0.001 (0.000)  0.984 (0.005)  0.002 (0.001)  0.209 (0.012)
     ULA     0.920 (0.011)  0.004 (0.001)  0.732 (0.016)  0.067 (0.004)  0.812 (0.003)
     PMD     0.610 (0.037)  0.570 (0.034)  0.652 (0.038)  0.544 (0.035)  0.944 (0.004)
     SCCA    0.626 (0.031)  0.651 (0.024)  0.698 (0.029)  0.0691 (0.024)  0.969 (0.003)

Table 3: Variable selection accuracy, measured by TPR and FPR, and subspace estimation accuracy, measured by D, for Scenario 3, where n varies. Reported are the average criteria, with standard errors in parentheses, over 100 data replications.

6 Multimodal PET Study

We revisit the multimodal PET study introduced in Section 1.1. It is part of the ongoing Berkeley Aging Cohort Study that targets Alzheimer's disease (AD) as well as normal aging. AD is an irreversible neurodegenerative disorder and the leading form of dementia. It is characterized by progressive impairment of cognitive and memory functions, then loss of bodily functions, and ultimately death. AD and related dementias currently affect more than 10% of adults aged 65 or older, and the prevalence continues to grow. It has become an international imperative to understand, diagnose, and treat this disorder (Alzheimer's Association, 2020).

The data consist of n = 81 elderly subjects. For each subject, three types of neuroimages were acquired: a Pittsburgh Compound B (PiB) PET scan that measures amyloid-beta protein, an AV-1451 PET scan that measures tau protein, and a 1.5T structural MRI scan that was used for coregistration. Both PET images have been preprocessed. Specifically, for PiB PET, the images were attenuation corrected, and reconstructed using an ordered subset expectation maximization algorithm with weighted attenuation and scatter correction. Native-space voxel-wise distribution volume ratio images were generated using Logan graphical analysis, with 35-90 min post-injection and a cerebellar grey matter reference region (Logan et al., 1996). For AV-1451 PET, the images were synthesized and reconstructed in a similar fashion as the PiB imaging. Native-space standardized uptake value ratio images were generated, with 80-100 min post-injection and an inferior cerebellar grey matter reference region. MRI images were normalized to the FSL MNI152 2mm space template using Advanced Normalization Tools and a study-specific intermediate template (Lockhart et al., 2017). Native-space PET images were both coregistered to each participant's MRI image. Transformations were concatenated and applied to the coregistered AV-1451 and PiB PET images to generate MNI-space PET images. Moreover, we created and applied a mask representing voxels likely to accumulate cortical amyloid and tau pathology. To do this, we intersected a cortical brain mask from the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al., 2002) with a mask of high-probability grey matter voxels from the SPM12 tissue probability map. It was also masked to include only voxels where the coefficient of variation in the signal across participants in either PET modality was sufficiently small, and the mean signal across participants was sufficiently large. From a FreeSurfer segmentation of the MNI152 template structural MRI, a set of MNI-space testing regions was created that encompasses Braak I-IV stage regions for AV-1451, and Braak I-V stage regions for PiB, excluding the basal ganglia and thalamus (Schöll et al., 2016). These testing regions were designed to focus on areas of possible tau and amyloid-beta accumulations. MNI-space PiB and AV-1451 PET images were masked by this cortical mask before the data analysis. This results in p_1 = 60 regions-of-interest for amyloid-beta PET, and p_2 = 26 regions for tau PET.

One of the primary goals of this study is to identify brain regions where the association of amyloid-beta and tau changes the most as age varies, and to further understand this association change. We cast this problem in the framework of liquid association analysis. Let X ∈ R^{60} and Y ∈ R^{26} denote the amyloid-beta and tau accumulations in the various brain regions, respectively, and Z ∈ R denote the subject's age. We first log-transform each variable in X and Y, then standardize X, Y and Z marginally. We then apply the proposed generalized liquid association analysis (GLAA) method to this data.

After obtaining the two estimated linear combinations Γ̂_1^T X and Γ̂_2^T Y, we plot them as the value of Z changes. We divide the range of Z into six equal-sized intervals with overlaps, then draw the scatterplot of Γ̂_2^T Y versus Γ̂_1^T X within each interval.
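A sketch of this windowing step, assuming estimates Gamma1_hat and Gamma2_hat from the fitting routine of Section 3 are at hand; the window count and overlap fraction below are our own illustrative choices, as the paper does not specify the overlap.

```python
import numpy as np

def windowed_correlations(X, Y, z, Gamma1_hat, Gamma2_hat,
                          n_windows=6, overlap=0.5):
    """Correlation between the first estimated variates of X and Y within
    overlapping windows of z, as displayed in the trellis plots."""
    u = X @ Gamma1_hat[:, 0]
    v = Y @ Gamma2_hat[:, 0]
    # Equal-width windows covering the range of z, each overlapping its
    # neighbor by the given fraction.
    width = (z.max() - z.min()) / (n_windows * (1 - overlap) + overlap)
    step = width * (1 - overlap)
    results = []
    for w in range(n_windows):
        lo = z.min() + w * step
        idx = (z >= lo) & (z <= lo + width)
        if idx.sum() > 2:
            results.append((lo, lo + width,
                            np.corrcoef(u[idx], v[idx])[0, 1]))
    return results
```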
We also add a fitted linear regression line in each panel to reflect the correlation between Γ̂_1^T X and Γ̂_2^T Y. Figure 1 shows the trellis plots, where the stripe at the top of each panel represents the range of Z it covers.

Figure 1: Trellis plots of the estimated linear combinations Γ̂_2^T Y versus Γ̂_1^T X as Z varies. Each panel represents an interval of Z, with a linear regression line added. The methods under comparison are: generalized liquid association analysis (GLAA), univariate liquid association (ULA), penalized matrix decomposition (PMD), and sparse canonical correlation analysis (SCCA).

It is interesting to see from the GLAA estimation that the correlation between Γ_1^T X and Γ_2^T Y changes gradually from negative to positive as the age variable Z increases. This may be due to the different deposition patterns of amyloid-beta and tau. In particular, amyloid-beta plaques are detectable in the brain many years before dementia onset, while tau neurofibrillary tangles aggregate specifically in the medial temporal lobes in normal aging. The spread of tau out of the medial temporal lobes and into the surrounding isocortex at an older age coincides with cognitive impairment, and the process is hypothesized to be potentiated or accelerated by the presence of amyloid-beta (He et al., 2018; Vogel et al., 2020). The change from a negative association in early years to a positive association in later years between amyloid-beta and tau found by our GLAA method may offer some support to this hypothesis. In comparison, no clear changing pattern is observed for the other three estimation methods.

Next, we examine more closely the brain regions identified by GLAA that demonstrate dynamic association patterns. Figure 2 plots the loadings of the estimated Γ̂_1 and Γ̂_2, where the indices of the non-zero loadings correspond to the selected regions.
Next, we examine more closely the brain regions identified by GLAA that demonstrate dynamic association patterns. Figure 2 plots the loadings of the estimated $\widehat{\Gamma}_x$ and $\widehat{\Gamma}_y$, where the indices of the non-zero loadings correspond to the selected regions.

[Figure 2 near here: loadings of $\widehat{\Gamma}_x$ (left) and $\widehat{\Gamma}_y$ (right) plotted against the variable index, for GLAA, ULA, PMD, and SCCA.]

Figure 2: Estimated loadings in $\widehat{\Gamma}_x$ and $\widehat{\Gamma}_y$, along with the numbers of non-zero loading entries estimated by GLAA, ULA, PMD, and SCCA.

Note that the ULA method does not perform variable selection, and, for the real data, no information on the true sparsity level is available, so its estimated loadings are non-sparse. Moreover, the PMD method yields a large number of non-zero estimates, making the interpretation difficult. The SCCA method selects about the same number of non-zero regions as GLAA, but the selected regions are less meaningful and more difficult to interpret.

Table 4 reports the brain regions identified by GLAA for amyloid-beta and tau, respectively, while Figure 3 visualizes those regions on a template brain using BrainNet Viewer (Xia et al., 2013).

Modality      Identified regions
amyloid-beta  Entorhinal R, Entorhinal L, Hippocampus R, Hippocampus L, Amygdala R, Orbitofrontal L, Posterior Cingulate L, Middle Frontal R
tau           Entorhinal R, Entorhinal L, Hippocampus R, Parahippocampal R, Fusiform L, Middle Temporal R, Middle Temporal L, Insula L, Rostral Anterior Cingulate R
Table 4: Identified brain region names for amyloid-beta and tau by GLAA. Regions in the left hemisphere are denoted by "L", and regions in the right hemisphere are denoted by "R".

[Figure 3 near here: (a) amyloid-beta; (b) tau.]
Figure 3: Identified brain regions for amyloid-beta and tau by GLAA.

Many of these regions are known to be closely related to AD, and the dynamic associations between amyloid-beta and tau in these regions reveal interesting new insights. In particular, for both amyloid-beta and tau, the identified regions include the hippocampus and the entorhinal cortex. The hippocampus is a major component of the human brain located in the medial temporal lobe, and is functionally involved in response inhibition, episodic memory, and spatial cognition. It is one of the first brain regions to suffer damage from AD, and is the best established biomarker for AD (Jack et al., 2011). The entorhinal cortex is a brain area also located in the medial temporal lobe, and functions as a hub in a widespread network for memory, navigation, and the perception of time. The entorhinal cortex is the main interface between the hippocampus and the neocortex, and, together with the hippocampus, plays an important role in memory. Atrophy in the entorhinal cortex has been consistently reported in AD (Pini et al., 2016). Moreover, animal models have suggested that neurofibrillary tangles of tau first appear in the entorhinal cortex, then spread to the hippocampus (Cho et al., 2016). For amyloid-beta, other identified regions include the amygdala, the orbitofrontal cortex, the posterior cingulate cortex, and areas of the middle frontal cortex. The amygdala is located within the temporal lobes of the brain, and performs a primary role in the processing of memory, decision-making, and emotional responses. Amygdala atrophy is prominent in early AD (Poulin et al., 2011). The orbitofrontal cortex is located in the frontal lobes of the brain, and is involved in the cognitive process of decision-making, while the posterior cingulate cortex is part of the cingulate cortex, is one of the most metabolically active brain regions, and is linked to emotion and memory. Atrophy of both regions, as well as of the middle frontal cortices, has been found to be associated with AD (Van Hoesen et al., 2000; Minoshima et al., 1997; Pini et al., 2016). For tau, other identified regions include the parahippocampal gyrus, the middle temporal gyrus, the fusiform gyrus, the insula, and the rostral anterior cingulate cortex. The parahippocampal gyrus is a region surrounding the hippocampus, and plays an important role in memory encoding and retrieval. Atrophy in the parahippocampal gyrus has been identified as an early biomarker of AD (Echavarri et al., 2011). The middle temporal gyrus is located on the temporal lobe, and is connected with the processes of recognizing known faces and accessing word meaning while reading. The fusiform gyrus is located above the parahippocampal gyrus, and is linked with various neural pathways related to recognition. The insula is a portion of the cerebral cortex folded deep within the lateral sulcus, and is involved in consciousness and diverse functions linked to emotion. The rostral anterior cingulate cortex is the frontal part of the cingulate cortex, and is involved in higher-level functions such as attention allocation, decision-making, and emotion. There is evidence suggesting associations between these regions and AD (Convit et al., 2000; Pini et al., 2016).

In summary, GLAA identifies interesting dynamic association patterns between amyloid-beta and tau among a number of important brain regions as age increases. Moreover, GLAA provides a useful dimension reduction tool to help visualize such patterns.
In this article, we propose generalized liquid association analysis, which offers a new angle to study three-way associations among random variables, and is particularly useful for multimodal integrative data analysis. Next, we discuss some potential extensions.

First, consider the situation where $Z$ is univariate and categorical, whereas the analysis so far has concentrated on the case where each variable in $Z$ is continuous. In general, how to define liquid association for a categorical variable remains an open question, since the function $g(z)$ is no longer differentiable in a categorical $Z$. For a binary $Z \in \{0, 1\}$, we propose to replace the derivative of the conditional mean function with the absolute change in the conditional means across the two groups, i.e.,
$$\mathrm{LA}(X, Y \mid Z) = \big|\, \mathrm{E}(XY \mid Z = 1) - \mathrm{E}(XY \mid Z = 0) \,\big|,$$
where the absolute value is used because the class labels are interchangeable. This naturally fits the original interpretation of LA. Similarly, for a categorical or ordinal $Z \in \{1, \ldots, K\}$, we can use a weighted sum of the pairwise absolute mean differences between the pairs of groups; one possible form is sketched below. Accordingly, the liquid association of $X$ and $Y$ given $Z$ is defined as a $p_1 \times p_2$ matrix.
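To make the categorical proposal concrete, one possible weighted form, written entrywise as above, is
$$\mathrm{LA}(X, Y \mid Z) \;=\; \sum_{1 \le k < k' \le K} w_{kk'} \, \big|\, \mathrm{E}(XY \mid Z = k) - \mathrm{E}(XY \mid Z = k') \,\big|, \qquad \sum_{k < k'} w_{kk'} = 1,$$
where taking, for instance, $w_{kk'} \propto \mathrm{P}(Z = k)\,\mathrm{P}(Z = k')$ downweights rare group pairs; this particular choice of weights is ours for illustration and is not prescribed by the method. When $K = 2$, the sum contains a single term and the definition reduces to the binary case above.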
Next, for a multivariate mixed-type $Z$, we first organize $Z = (Z_1^{\top}, Z_2^{\top})^{\top}$ to separate the continuous variables, $Z_1 = (Z_1, \ldots, Z_q)^{\top} \in \mathbb{R}^q$, from the categorical variables, $Z_2 = (Z_{q+1}, \ldots, Z_{p_3})^{\top} \in \mathbb{R}^{p_3 - q}$. Directly imposing a low-dimensional structure on the entire $Z$ would lead to difficulty in interpretation. Alternatively, we propose a dimension reduction approach, by recognizing that the reduction on $Z$ in model (1) is indeed a sufficient dimension reduction model. Specifically, when $Z$ is continuous, model (1) gives $g(Z) = g(P_{\mathcal{S}} Z)$, where $\mathcal{S} = \mathrm{span}(\Gamma_z)$. Therefore, $g(Z) \perp\!\!\!\perp Z \mid P_{\mathcal{S}} Z$. This leads to a sufficient dimension reduction model of $Z$ for the conditional mean function $g(Z) = \mathrm{E}(XY^{\top} \mid Z)$, in the sense that all the mean information of the regression of the matrix response $XY^{\top}$ on the predictor vector $Z$ is fully captured by the linear combinations $\Gamma_z^{\top} Z$. As such, model (1) can be viewed as a generalization of the notion of sufficient mean reduction (Cook and Li, 2002). Now, for the mixed-type $Z = (Z_1^{\top}, Z_2^{\top})^{\top}$, we adopt the idea of partial dimension reduction (Chiaromonte et al., 2002), or groupwise dimension reduction (Li et al., 2010), and estimate the subspace $\mathcal{S}_1 \subseteq \mathbb{R}^q$ such that $g(Z) \perp\!\!\!\perp Z \mid (P_{\mathcal{S}_1} Z_1, Z_2)$.

Finally, from the simulation results in Table 2, we observe that the higher the dimension of $Z$, the more challenging the problem becomes. We believe it is possible to employ an alternative modeling strategy, such as that of Chen et al. (2011), when the dimension of $Z$ is ultrahigh. The same applies when the scientific interest lies in selecting important variables in $Z$, whereas our current interest concentrates on the selection of variables in $X$ and $Y$, but not in $Z$. We leave this line of research as future work.
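As a small illustration of the binary-$Z$ proposal, the following is a minimal sketch of a sample-level estimator of the resulting liquid association matrix; the function name, array layout, and plug-in sample means are our own choices rather than part of the GLAA implementation.

```python
# A minimal sketch of a sample-level binary-Z liquid association matrix,
# applying LA(X, Y | Z) = |E(XY' | Z=1) - E(XY' | Z=0)| entrywise.
# x is (n, p1), y is (n, p2), z is (n,) with values in {0, 1}; the data are
# assumed centered and standardized, and both groups are assumed non-empty.
import numpy as np

def binary_la_matrix(x, y, z):
    x1, y1 = x[z == 1], y[z == 1]
    x0, y0 = x[z == 0], y[z == 0]
    m1 = x1.T @ y1 / len(x1)   # sample mean of x y' within group z = 1
    m0 = x0.T @ y0 / len(x0)   # sample mean of x y' within group z = 0
    return np.abs(m1 - m0)     # absolute value: class labels are interchangeable
```

For a categorical $Z$ with $K$ levels, the same function could be applied to each pair of groups and the results combined with the weights $w_{kk'}$ above.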
References

Abid, A., Zhang, M. J., Bagaria, V. K., and Zou, J. (2018). Exploring patterns enriched in a dataset with contrastive principal component analysis. Nature Communications, 9(1):1–7.

Alzheimer's Association (2020). 2020 Alzheimer's disease facts and figures. Alzheimer's & Dementia, 16(3):391–460.

Bi, X., Qu, A., and Shen, X. (2018). Multilayer tensor factorization with applications to recommender systems. The Annals of Statistics, 46(6B):3308–3333.

Bi, X., Tang, X., Yuan, Y., Zhang, Y., and Qu, A. (2020). Tensors in statistics. Annual Review of Statistics and Its Application, to appear.

Braak, H. and Braak, E. (1991). Neuropathological stageing of Alzheimer-related changes. Acta Neuropathologica, 82(4):239–259.

Chen, J., Xie, J., and Li, H. (2011). A penalized likelihood approach for bivariate conditional normal models for dynamic co-expression analysis. Biometrics, 67(1):299–308.

Chiaromonte, F., Cook, R. D., and Li, B. (2002). Sufficient dimension reduction in regressions with categorical predictors. The Annals of Statistics, 30(2):475–497.

Cho, H., Choi, J. Y., Hwang, M. S., Kim, Y. J., Lee, H. M., Lee, H. S., Lee, J. H., Ryu, Y. H., Lee, M. S., and Lyoo, C. H. (2016). In vivo cortical spreading pattern of tau and amyloid in the Alzheimer disease spectrum. Annals of Neurology, 80(2):247–258.

Convit, A., de Asis, J., de Leon, M., Tarshish, C., De Santi, S., and Rusinek, H. (2000). Atrophy of the medial occipitotemporal, inferior, and middle temporal gyri in non-demented elderly predict decline to Alzheimer's disease. Neurobiology of Aging, 21(1):19–26.

Cook, R. D. (1998). Principal Hessian directions revisited. Journal of the American Statistical Association, 93(441):84–94.

Cook, R. D. (2007). Fisher lecture: Dimension reduction in regression. Statistical Science, 22(1):1–26.

Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. The Annals of Statistics, 30(2):455–474.

Echavarri, C., Aalten, P., Uylings, H., Jacobs, H., Visser, P., Gronenschild, E., Verhey, F., and Burgmans, S. (2011). Atrophy in the parahippocampal gyrus as an early biomarker of Alzheimer's disease. Brain Structure & Function, 215:265–271.

Fornito, A., Zalesky, A., and Breakspear, M. (2013). Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage, 80:426–444.

Gao, C., Ma, Z., Ren, Z., and Zhou, H. H. (2015). Minimax estimation in sparse canonical correlation analysis. The Annals of Statistics, 43:2168–2197.

He, Z., Guo, J. L., McBride, J. D., Narasimhan, S., Kim, H., Changolkar, L., Zhang, B., Gathagan, R. J., Yue, C., Dengler, C., Stieber, A., Nitla, M., Coulter, D. A., Abel, T., Brunden, K. R., Trojanowski, J. Q., and Lee, V. M.-Y. (2018). Amyloid-beta plaques enhance Alzheimer's brain tau-seeded pathologies by facilitating neuritic plaque tau aggregation. Nature Medicine, 24(1):29–38.

Ho, Y.-Y., Parmigiani, G., Louis, T. A., and Cope, L. M. (2011). Modeling liquid association. Biometrics, 67(1):133–141.

Jack, C. R., Barkhof, F., Bernstein, M. A., Cantillon, M., Cole, P. E., DeCarli, C., Dubois, B., Duchesne, S., Fox, N. C., Frisoni, G. B., Hampel, H., Hill, D. L., Johnson, K., Mangin, J.-F., Scheltens, P., Schwarz, A. J., Sperling, R., Suhy, J., Thompson, P. M., Weiner, M., and Foster, N. L. (2011). Steps to standardization and validation of hippocampal volumetry as a biomarker in clinical trials and diagnostic criterion for Alzheimer's disease. Alzheimer's & Dementia, 7(4):474–485.e4.

Kang, J., Bowman, F. D., Mayberg, H., and Liu, H. (2016). A depression network of functionally connected regions discovered via multi-attribute canonical correlation graphs. NeuroImage, 141:431–441.

Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3):455–500.

Li, B. (2018). Sufficient Dimension Reduction: Methods and Applications with R. Chapman & Hall/CRC Monographs on Statistics and Applied Probability. CRC Press.

Li, G. and Gaynanova, I. (2018). A general framework for association analysis of heterogeneous data. The Annals of Applied Statistics, 12(3):1700–1726.

Li, G. and Jung, S. (2017). Incorporating covariates into integrated factor analysis of multi-view data. Biometrics, 73(4):1433–1442.

Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. Journal of the American Statistical Association, 87(420):1025–1039.

Li, K.-C. (2002). Genome-wide coexpression dynamics: theory and application. Proceedings of the National Academy of Sciences, 99(26):16875–16880.

Li, K.-C., Liu, C.-T., Sun, W., Yuan, S., and Yu, T. (2004). A system for enhancing genome-wide coexpression dynamics study. Proceedings of the National Academy of Sciences, 101(44):15561–15566.

Li, L., Li, B., and Zhu, L.-X. (2010). Groupwise dimension reduction. Journal of the American Statistical Association, 105(491):1188–1201.

Liu, J. S. (1994). Siegel's formula via Stein's identities. Statistics & Probability Letters, 21(3):247–251.

Lock, E. F., Hoadley, K. A., Marron, J. S., and Nobel, A. B. (2013). Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics, 7(1):523–542.

Lockhart, S. N., Schöll, M., Baker, S. L., Ayakta, N., Swinnerton, K. N., Bell, R. K., Mellinger, T. J., Shah, V. D., O'Neil, J. P., Janabi, M., and Jagust, W. J. (2017). Amyloid and tau PET demonstrate region-specific associations in normal older people. NeuroImage, 150:191–199.

Logan, J., Fowler, J. S., Volkow, N. D., Wang, G.-J., Ding, Y.-S., and Alexoff, D. L. (1996). Distribution volume ratios without blood sampling from graphical analysis of PET data. Journal of Cerebral Blood Flow & Metabolism, 16:834–840.

Mai, Q. and Zhang, X. (2019). An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics, 75(3):734–744.

Minoshima, S., Giordani, B., Berent, S., Frey, K. A., Foster, N. L., and Kuhl, D. E. (1997). Metabolic reduction in the posterior cingulate cortex in very early Alzheimer's disease. Annals of Neurology, 42(1):85–94.

Nathoo, F. S., Kong, L., and Zhu, H. (2017). Inference on high-dimensional differential correlation matrix. arXiv preprint arXiv:1707.07332.

Pini, L., Pievani, M., Bocchetta, M., Altomare, D., Bosco, P., Cavedo, E., Galluzzi, S., Marizzoni, M., and Frisoni, G. B. (2016). Brain atrophy in Alzheimer's disease and aging. Ageing Research Reviews, 30:25–48.

Poulin, S. P., Dautoff, R., Morris, J. C., Barrett, L. F., and Dickerson, B. C. (2011). Amygdala atrophy is prominent in early Alzheimer's disease and relates to symptom severity. Psychiatry Research: Neuroimaging, 194(1):7–13.

Richardson, S., Tseng, G. C., and Sun, W. (2016). Statistical methods in integrative genomics. Annual Review of Statistics and Its Application, 3:181–209.

Rudelson, M. and Vershynin, R. (2010). Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), pages 1576–1602. World Scientific.

Schöll, M., Lockhart, S., Schonhaut, D., O'Neil, J., Janabi, M., Ossenkoppele, R., Baker, S., Vogel, J., Faria, J., Schwimmer, H., Rabinovici, G., and Jagust, W. (2016). PET imaging of tau deposition in the aging human brain. Neuron, 89:971–982.

Shen, R., Wang, S., and Mo, Q. (2013). Sparse integrative clustering of multiple omics data sets. The Annals of Applied Statistics, 7(1):269–294.

Shu, H., Wang, X., and Zhu, H. (2019). D-CCA: A decomposition-based canonical correlation analysis for high-dimensional datasets. Journal of the American Statistical Association, pages 1–29.

Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, pages 1135–1151.

Tang, C. Y., Fang, E. X., and Dong, Y. (2020). High-dimensional interactions detection with sparse principal Hessian matrix. Journal of Machine Learning Research, 21(19):1–25.

Tang, X., Bi, X., and Qu, A. (2019). Individualized multilayer tensor learning with an application in imaging analysis. Journal of the American Statistical Association, pages 1–26.

Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., and Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15:273–289.

Uludag, K. and Roebroeck, A. (2014). General overview on the merits of multimodal neuroimaging data fusion. NeuroImage, 102(Part 1):3–10.

Van Hoesen, G. W., Parvizi, J., and Chu, C.-C. (2000). Orbitofrontal cortex pathology in Alzheimer's disease. Cerebral Cortex, 10(3):243–251.

Vogel, J. W., Iturria-Medina, Y., et al. (2020). Spread of pathological tau proteins through communicating neurons in human Alzheimer's disease. Nature Communications, 11(1):2612.

Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48. Cambridge University Press.

Witten, D. M., Tibshirani, R., and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515–534.

Xia, M., Wang, J., and He, Y. (2013). BrainNet Viewer: A network visualization tool for human brain connectomics. PLOS ONE, 8(7):1–15.

Xia, Y., Li, L., Lockhart, S. N., and Jagust, W. J. (2019). Simultaneous covariance inference for multimodal integrative analysis. Journal of the American Statistical Association, pages 1–13.

Yang, D., Ma, Z., and Buja, A. (2016). Rate optimal denoising of simultaneously sparse and low rank matrices. Journal of Machine Learning Research, 17(1):3163–3189.

Yu, T. (2018). A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data. PLOS Computational Biology, 14:1–22.

Yu, Y., Wang, T., and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323.

Zhang, A. and Han, R. (2019). Optimal sparse singular value decomposition for high-dimensional high-order data. Journal of the American Statistical Association, pages 1–34.

Zhou, H., Li, L., and Zhu, H. (2013). Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association, 108(502):540–552.