Learning Signal Subgraphs from Longitudinal Brain Networks with Symmetric Bilinear Logistic Regression
Lu Wang (Department of Statistics, Central South University) and Zhengwu Zhang (Department of Biostatistics and Computational Biology, University of Rochester)

August 16, 2019
Abstract
Modern neuroimaging technologies, combined with state-of-the-art data processing pipelines, have made it possible to collect longitudinal observations of an individual's brain connectome at different ages. It is of substantial scientific interest to study how brain connectivity varies over time in relation to human cognitive traits. In brain connectomics, the structural brain network for an individual corresponds to a set of interconnections among brain regions. We propose a symmetric bilinear logistic regression to learn a set of small subgraphs relevant to a binary outcome from longitudinal brain networks, as well as to estimate the time effects of the subgraphs. We enforce the extracted signal subgraphs to have clique structure, which has appealing interpretations as they can be related to neurological circuits. The time effect of each signal subgraph reflects how its predictive effect on the outcome varies over time, which may improve our understanding of interactions between the aging of brain structure and neurological disorders. Application of this method to longitudinal brain connectomics and cognitive capacity data reveals relevant interconnections among a small set of brain regions in the frontal and temporal lobes, with better predictive performance than competitors.
Keywords: signal subgraph learning, longitudinal structural brain networks, symmetric bilinear logistic regression, age effect

1 Introduction
In this article, we study methods for predicting a binary outcome $y_i$ from a sequence of longitudinal network-valued variables $\{W_i^{(s)}: s = 1, \dots, T_i\}$ for $n$ subjects, where each observed network is undirected without self loops and hence each $W_i^{(s)}$ is a $V \times V$ symmetric matrix with zero diagonal. The motivation of this study comes from the increasing interest in understanding brain connectomics and its relation to brain functions. With advanced neuroimaging technologies, more and more large neuroscience studies have started collecting longitudinal brain scans, e.g., the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Jack Jr et al., 2008) and the recent Adolescent Brain Cognitive Development project (Casey et al., 2018). We extracted a subset of subjects in ADNI according to Lin et al.'s study (Lin et al., 2017) of successful cognitive aging, consisting of longitudinal structural brain networks over a 5-year span for 40 supernormals and 45 cognitively normal controls with matched age, gender and education. Supernormals refer to older adults who do not exhibit the expected reduction in cognition, but have superior cognitive performance during aging (Lin et al., 2017).

Using a state-of-the-art connectome extraction pipeline (Zhang et al., 2018), we can extract structural brain networks from diffusion MRI (dMRI) data. The structural brain network for an individual corresponds to a set of white matter connections among predefined brain regions. In our data, the number of observed brain networks for each individual ranges from 1 to 5, since not all the participants visited every year during the 5-year program.
Each brain network is characterized by a weighted adjacency matrix where each element denotes the connectivity strength of neural fibers between a pair of brain regions. The left panel of Figure 1 shows the process of recovering structural connectomes from dMRI data, and the right panel shows a profile of the dataset we used.

Our scientific goal is to study the relationship between successful cognitive aging and structural brain networks, in the hope of finding neurologically interpretable structural markers in the brain connectome predictive of supernormals. By neurologically interpretable markers we mean a set of small outcome-relevant subgraphs with certain structure in the brain network. For example, the connections within each subgraph should share some common
Figure 1: Panel (a) shows our structural brain network extraction pipeline and panel (b) shows the profile of longitudinal brain networks for each individual in the dataset.

nodes (brain regions) or form certain anatomical circuits. With age information of the subjects, we also aim to estimate the age effect of each identified signal subgraph, that is, the dynamic importance of each subgraph in preserving cognition, which may improve cognition diagnosis and help understand the role of aging in normal and diseased brains.

Signal subgraph learning is basically a variable selection problem where the number of predictors, corresponding to the number of connections in the longitudinal brain networks, could easily exceed the number of observations (subjects). One typical approach to this large-$p$-small-$n$ problem would be a generalized linear model with certain regularization, such as LASSO (Tibshirani, 1996), Elastic-Net (Zou and Hastie, 2005) and SCAD (Fan and Li, 2001). But these methods do not guarantee any structure among the selected connections in the brain network, making the results hard to interpret neurologically. In addition, these approaches require first transforming each adjacency matrix into a long vector, which could induce ultra-high dimensionality for big networks and ignores the non-Euclidean structure of networks.

The longitudinal networks can easily be formed into tensors, and thus tensor regression models (Zhou et al., 2013; Zhou and Li, 2014) provide an effective tool for our problem. But the weighted adjacency matrices of structural brain networks are symmetric, so additional symmetry constraints are required on the coefficient vectors in tensor regression. Wang et al. (2019) propose a symmetric bilinear regression for estimating a set of small clique signal subgraphs from a network predictor.
But they focus on a continuous response and do not consider the case of longitudinal network observations. In addition, the censoring of longitudinal networks prevents the direct application of existing tensor regression models.

We propose a symmetric bilinear logistic regression (SBLR) model with partially time-varying coefficients and an elastic-net penalty to learn a set of small clique subgraphs and estimate their age effects. The clique signal subgraphs selected by SBLR have more appealing interpretations than the results of unstructured variable selection tools, especially in neurological studies, as many complex cognitive processes are often the product of coordinated activities among several brain regions. The estimated age effects demonstrate the dynamic impact of the selected signal subgraphs on the outcome, which may improve cognition diagnosis and the development of treatment strategies for neurological disorders. SBLR is also very flexible in dealing with longitudinal network data, where each subject can have a different number of observations at different time points. To adapt to the brain network data, the model puts symmetry constraints on the coefficient vectors of a tensor regression, which makes the block relaxation algorithm (Zhou et al., 2013) not applicable. We therefore develop a coordinate descent algorithm for model estimation based on the success of the R package glmnet (Friedman et al., 2010), which involves solving a sequence of conditional convex optimizations.

The rest of the paper is organized as follows. The SBLR model with elastic-net penalty is proposed in Section 2. Section 3 describes the coordinate descent algorithm for model estimation and discusses model selection issues. Section 4 presents simulation studies demonstrating the advantages of our method over competitors. We apply the SBLR model to the longitudinal brain network data in Section 5. Section 6 concludes.
2 Symmetric Bilinear Logistic Regression

The data can be summarized as $\big(y_i, \{(W_i^{(s)}, g_{is}): s = 1, \dots, T_i\}\big)$ for subject $i$, $i = 1, \dots, n$. The binary response $y_i = 1$ denotes that subject $i$ is a supernormal and $y_i = 0$ a normal control. $T_i$ represents the number of network observations for subject $i$. $W_i^{(s)}$ denotes the structural brain network measured for subject $i$ at the $s$-th visit ($s = 1, \dots, T_i$), which is a $V \times V$ symmetric matrix with zero diagonal entries. $g_{is}$ is the age of subject $i$ when the network $W_i^{(s)}$ is observed. Our goal is to learn a set of small clique subgraphs from the brain network that are relevant to the outcome and to estimate their age effects.

We borrow ideas from the symmetric bilinear model (Wang et al., 2019) and adapt them to a binary response with longitudinal network predictors by adding a logit link and making some of the coefficients time-varying. This leads to the following symmetric bilinear logistic regression (SBLR):
$$y_i \mid \{(g_{is}, W_i^{(s)}): s = 1, \dots, T_i\} \overset{ind}{\sim} \mathrm{Bernoulli}(p_i), \quad i = 1, \dots, n,$$
$$\mathrm{logit}(p_i) = \alpha_0 + \sum_{h=1}^{K} \frac{1}{T_i} \sum_{s=1}^{T_i} \lambda_h(g_{is})\, \beta_h^\top W_i^{(s)} \beta_h, \qquad (1)$$
where $\beta_h \in \mathbb{R}^V$ and $\lambda_h(\cdot)$ is a function, $\lambda_h: [a, b] \mapsto \mathbb{R}$, mapping from an interval $[a, b]$ to a scalar. Model (1) assumes that the binary outcome $y_i$ of each individual follows an independent Bernoulli distribution given that individual's longitudinal network observations and the corresponding ages $\{(W_i^{(s)}, g_{is}): s = 1, \dots, T_i\}$.

The coefficients in model (1) are assumed to have $K$ components, where each component $\beta_h \beta_h^\top$ selects a signal subgraph. For ease of interpretation, the logit link of (1) can be written in matrix dot product form:
$$\mathrm{logit}(p_i) = \alpha_0 + \frac{1}{T_i} \sum_{s=1}^{T_i} \sum_{h=1}^{K} \big\langle \lambda_h(g_{is})\, \beta_h \beta_h^\top,\ W_i^{(s)} \big\rangle \qquad (2)$$
where $\langle B, W \rangle = \mathrm{trace}(B^\top W) = \mathrm{vec}(B)^\top \mathrm{vec}(W)$.
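For concreteness, the bilinear predictor in (1)-(2) is straightforward to evaluate numerically. The following is a minimal sketch of ours, not the authors' Matlab implementation; all function names are illustrative:

```python
import numpy as np

def sblr_logit(alpha0, betas, lambdas, W_list, g_list):
    """Evaluate logit(p_i) in model (1) for a single subject.

    alpha0  : intercept
    betas   : list of K coefficient vectors beta_h, each of length V
    lambdas : list of K callables lambda_h(g) giving the age effect
    W_list  : the subject's T_i symmetric V x V networks W_i^(s)
    g_list  : the subject's T_i observation ages g_is
    """
    T = len(W_list)
    out = alpha0
    for beta, lam in zip(betas, lambdas):
        # visit-averaged bilinear form: (1/T_i) sum_s lambda_h(g_is) beta^T W beta
        out += sum(lam(g) * (beta @ W @ beta) for W, g in zip(W_list, g_list)) / T
    return out

def sblr_prob(alpha0, betas, lambdas, W_list, g_list):
    """p_i under the logit link of (1)."""
    return 1.0 / (1.0 + np.exp(-sblr_logit(alpha0, betas, lambdas, W_list, g_list)))
```

Because each component enters only through the scalar $\beta_h^\top W \beta_h$, the cost per visit and per component is $O(V^2)$ and the full $V \times V$ matrix $\beta_h \beta_h^\top$ never needs to be formed.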
From (2), we can see that the nonzero entries in each component matrix $\beta_h \beta_h^\top$ locate an outcome-relevant clique subgraph in the network predictor. We do not let $\beta_h$ vary with age in (1) for two reasons. First, we assume the subgraphs and the related brain regions associated with $y$ are stable across time for healthy adults (without cognitive impairment). The connection strengths among these brain regions could change over time, and their dynamic contribution to the outcome is captured by $\lambda_h(g)$. Note that $\lambda_h(\cdot)$ is also necessary to avoid constraining the coefficient matrix in (2) to be positive semi-definite (p.s.d.). Second, we fix each $\beta_h$ over time for model simplicity. Otherwise a random function for each element of $\beta_h$ ($h = 1, \dots, K$) would lead to intractable model estimation and overfitting issues.

Although individuals have repeated measures of their structural brain networks at different time points in the dataset, we focus on estimating the common signal subgraphs predictive of $y$ as well as the age effects in model (1), instead of their individual-specific counterparts, due to the limited number of observations over time for each individual. In terms of the number of time points, longitudinal structural neuroimaging data are relatively rare, with most studies having fewer than three time points (King et al., 2018). In our motivating data, each subject has no more than 5 brain scans and around 87% of the subjects have only one or two network observations. Therefore there are insufficient data to reliably estimate the intra-individual signal subgraphs or individual trajectories of age effects. However, the longitudinal network observations $\{W_i^{(s)}: s = 1, \dots, T_i\}$ for each subject, though temporally correlated, can still contribute to inference of the common signal subgraphs $\{\beta_h \beta_h^\top\}$ and the population-level age effects $\{\lambda_h(g)\}$. So we utilize all the observed longitudinal network information for each individual in model (1).
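Model (1) enters each subject's networks only through visit averages of bilinear forms; when $\lambda_h(g)$ is polynomial in $g$, as adopted below, these collapse into a few weighted-average matrices per subject. A sketch of ours for the quadratic case (helper name is illustrative):

```python
import numpy as np

def collapsed_covariates(W_list, g_list):
    """Collapse one subject's visits into the three matrix covariates implied
    by the quadratic age effect: the visit-averaged network and its age- and
    squared-age-weighted averages, each divided by the number of visits T_i."""
    T = len(W_list)
    W_bar = sum(W_list) / T
    W_age = sum(g * W for W, g in zip(W_list, g_list)) / T
    W_age2 = sum(g**2 * W for W, g in zip(W_list, g_list)) / T
    return W_bar, W_age, W_age2
```

Once these three matrices are cached per subject, the raw longitudinal networks are no longer needed during estimation.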
And since not all the subjects have the same number of network observations in the data, we divide the bilinear part by $T_i$ in the logit of (1).

Suppose that $y_i$ is related to some subgraph corresponding to $\beta_h \beta_h^\top$, within which the connection strengths of older adults with normal aging tend to decrease with age, while those of supernormals tend to remain unchanged as they grow older. Then higher weight should be put on the term $\beta_h^\top W_i^{(s)} \beta_h$ observed at an older age in predicting $y$. In this case, $\lambda_h(g)$ would be expected to increase with age and thus we could let $\lambda_h(g) = \rho_h g + \alpha_h$. $\lambda_h(g)$ could also be assumed to be a higher-order polynomial function to enhance flexibility. In simulations and applications, we let $\lambda_h(g)$ be up to second order, i.e. $\lambda_h(g) = \gamma_h g^2 + \rho_h g + \alpha_h$, for interpretability and model simplicity, since higher-order terms of age add difficulty to interpretation and are prone to overfitting. In addition, we rarely observed nonzero coefficient estimates for the quadratic terms of age in real applications with the elastic-net penalty, as shown in Section 4.

Plugging the quadratic function $\lambda_h(g) = \gamma_h g^2 + \rho_h g + \alpha_h$ into the logit link of (1), we have
$$\mathrm{logit}(p_i) = \alpha_0 + \sum_{h=1}^{K} \left[ \Big\langle \alpha_h \beta_h \beta_h^\top,\ \frac{1}{T_i} \sum_{s=1}^{T_i} W_i^{(s)} \Big\rangle + \Big\langle \rho_h \beta_h \beta_h^\top,\ \frac{1}{T_i} \sum_{s=1}^{T_i} g_{is} W_i^{(s)} \Big\rangle + \Big\langle \gamma_h \beta_h \beta_h^\top,\ \frac{1}{T_i} \sum_{s=1}^{T_i} g_{is}^2 W_i^{(s)} \Big\rangle \right]. \qquad (3)$$
Equation (3) implies that under the quadratic assumption for $\lambda_h(g)$, the true covariates for each subject $i$ in model (1) are not the raw longitudinal network observations, but the average of the subject's networks and the weighted averages of the networks by age and by squared age separately.
To ensure both the identifiability of the model and the sparsity of the coefficient matrices $\{(\alpha_h \beta_h \beta_h^\top, \rho_h \beta_h \beta_h^\top, \gamma_h \beta_h \beta_h^\top): h = 1, \dots, K\}$, we penalize the magnitude of the lower-triangular entries of these coefficient matrices with the following elastic-net penalty:
$$\delta \sum_{h=1}^{K} \sum_{u=1}^{V} \sum_{v<u} \left[ \eta \big( |\alpha_h \beta_{hu} \beta_{hv}| + |\rho_h \beta_{hu} \beta_{hv}| + |\gamma_h \beta_{hu} \beta_{hv}| \big) + \frac{1-\eta}{2} \big( (\alpha_h \beta_{hu} \beta_{hv})^2 + (\rho_h \beta_{hu} \beta_{hv})^2 + (\gamma_h \beta_{hu} \beta_{hv})^2 \big) \right] \qquad (4)$$
where $\delta > 0$ is the overall penalty factor and $\eta \in [0, 1]$ controls the fraction of $L_1$ regularization. The entrywise regularization (4) ensures the identifiability of the coefficient matrices $\{(\alpha_h \beta_h \beta_h^\top, \rho_h \beta_h \beta_h^\top, \gamma_h \beta_h \beta_h^\top): h = 1, \dots, K\}$ in model (3), which cannot be achieved by simply penalizing the $L_1$ norms or $L_2$ norms of the parameters $\{(\alpha_h, \rho_h, \gamma_h, \beta_h): h = 1, \dots, K\}$. Refer to Wang et al. (2019) for a detailed discussion.

3 Model Estimation

The parameters in the SBLR model (1) are estimated by minimizing the loss function
$$\mathrm{Loss} = -\frac{1}{n} \sum_{i=1}^{n} ll_i + P_{\delta, \eta}, \qquad (5)$$
where $ll_i$ denotes the log-likelihood of subject $i$ under model (3) and $P_{\delta, \eta}$ is the entrywise elastic-net penalty (4).
3.1 Update for $\{\beta_h: h = 1, \dots, K\}$

From (9), (11) and (15) we know that $a_{hu}^{(t)} \geq 0$ and $e_{hu}^{(t)} \geq 0$. $a_{hu}^{(t)} + e_{hu}^{(t)} = 0$ implies that (i) $\alpha_h^{(t)} = \rho_h^{(t)} = \gamma_h^{(t)} = 0$ so that $\lambda_h^{(t)}(g) \equiv 0$, or (ii) $\beta_{hv}^{(t)} = 0$ for $v \neq u$, or (iii) $\eta = 1$ and $p_i^{(t)} = 0$ or $1$, $\forall i$. In the former two cases, the lower-triangular part of the component matrix $\lambda_h(g)^{(t)} \beta_h^{(t)} \beta_h^{(t)\top}$ becomes zero no matter what value $\beta_{hu}$ takes, so we set $\beta_{hu}^{(t+1)} = 0$. Regarding (iii), if $p_i^{(t)} \equiv y_i$, $\forall i$, (16) is minimized at $\beta_{hu} = 0$, because $b_{hu}^{(t)} = 0$ at this time according to (10) and (14), together with $a_{hu}^{(t)} = e_{hu}^{(t)} = 0$; the first derivative of (16) then becomes $d_{hu}^{(t)}\,(> 0)$ when $\beta_{hu} > 0$ and $-d_{hu}^{(t)}\,(< 0)$ when $\beta_{hu} < 0$. Otherwise, $p_i^{(t)} = 0$ or $1$, $\forall i$, may be due to bad initialization. For example, a large magnitude of the initial values of the parameters could easily make $p_i$ become 1 or 0, $\forall i$, through the logit link (3). In this case, setting $\beta_{hu} = 0$ prevents the divergence of the solution. In practice, we always normalize the entries in $\{W_i^{(s)}\}$ and the age terms $g_{is}$, $g_{is}^2$ as described at the beginning of Section 3 before applying SBLR. We then recommend initializing each parameter from $U(-1/K, 1/K)$ to avoid explosion on the logit scale.

The computational complexity of updating each entry $\beta_{hu}$ is $O(nV)$ per iteration, and thus that of updating $\{\beta_h\}_{h=1}^{K}$ is $O(nKV^2)$.

3.2 Update for $\{(\alpha_h, \rho_h, \gamma_h): h = 1, \dots, K\}$

Minimizing the loss function (5) with respect to $\alpha_h$ while fixing the others is equivalent to:
$$\min_{\alpha_h} L(\alpha_h) = -\frac{1}{n} \sum_{i=1}^{n} ll_i(\alpha_h) + d_h |\alpha_h| + \frac{e_h}{2} \alpha_h^2$$
where $d_h = \delta\eta \sum_{u=1}^{V} \sum_{v<u} \big|\beta_{hu}^{(t)} \beta_{hv}^{(t)}\big|$ and $e_h^{(t)} = \delta(1-\eta) \sum_{u=1}^{V} \sum_{v<u} \big(\beta_{hu}^{(t)} \beta_{hv}^{(t)}\big)^2 \geq 0$.
If $a_{\alpha,h}^{(t)} + e_h^{(t)} > 0$, $\alpha_h$ is then updated to the argmin of (19):
$$\alpha_h^{(t+1)} = \frac{\mathrm{sign}\big(a_{\alpha,h}^{(t)} \alpha_h^{(t)} - b_{\alpha,h}^{(t)}\big)}{a_{\alpha,h}^{(t)} + e_h^{(t)}} \Big( \big|a_{\alpha,h}^{(t)} \alpha_h^{(t)} - b_{\alpha,h}^{(t)}\big| - d_h^{(t)} \Big)_+. \qquad (21)$$
If $a_{\alpha,h}^{(t)} + e_h^{(t)} = 0$, then either the component matrix $\beta_h^{(t)} \beta_h^{(t)\top}$ is a zero matrix or $p_i^{(t)} = 0$ or $1$, $\forall i$. In either case, $\alpha_h^{(t+1)}$ is set to 0, following a similar discussion to Section 3.1.

Likewise, the update rules for each $\rho_h$ and $\gamma_h$ are as follows:
$$\rho_h^{(t+1)} = \begin{cases} \dfrac{\mathrm{sign}\big(a_{\rho,h}^{(t)} \rho_h^{(t)} - b_{\rho,h}^{(t)}\big)}{a_{\rho,h}^{(t)} + e_h^{(t)}} \Big( \big|a_{\rho,h}^{(t)} \rho_h^{(t)} - b_{\rho,h}^{(t)}\big| - d_h^{(t)} \Big)_+ & \text{if } a_{\rho,h}^{(t)} + e_h^{(t)} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (22)$$
where
$$b_{\rho,h}^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial ll_i(\rho_h^{(t)})}{\partial \rho_h} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - p_i^{(t)})\, \beta_h^{(t)\top} \Big( \frac{1}{T_i} \sum_{s=1}^{T_i} \tilde{g}_{is} \tilde{W}_i^{(s)} \Big) \beta_h^{(t)},$$
$$a_{\rho,h}^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 ll_i(\rho_h^{(t)})}{\partial \rho_h^2} = \frac{1}{n} \sum_{i=1}^{n} p_i^{(t)} (1 - p_i^{(t)}) \Big[ \beta_h^{(t)\top} \Big( \frac{1}{T_i} \sum_{s=1}^{T_i} \tilde{g}_{is} \tilde{W}_i^{(s)} \Big) \beta_h^{(t)} \Big]^2. \qquad (23)$$
And
$$\gamma_h^{(t+1)} = \begin{cases} \dfrac{\mathrm{sign}\big(a_{\gamma,h}^{(t)} \gamma_h^{(t)} - b_{\gamma,h}^{(t)}\big)}{a_{\gamma,h}^{(t)} + e_h^{(t)}} \Big( \big|a_{\gamma,h}^{(t)} \gamma_h^{(t)} - b_{\gamma,h}^{(t)}\big| - d_h^{(t)} \Big)_+ & \text{if } a_{\gamma,h}^{(t)} + e_h^{(t)} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (24)$$
where
$$b_{\gamma,h}^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial ll_i(\gamma_h^{(t)})}{\partial \gamma_h} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - p_i^{(t)})\, \beta_h^{(t)\top} \Big( \frac{1}{T_i} \sum_{s=1}^{T_i} \widetilde{g^2}_{is} \tilde{W}_i^{(s)} \Big) \beta_h^{(t)},$$
$$a_{\gamma,h}^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 ll_i(\gamma_h^{(t)})}{\partial \gamma_h^2} = \frac{1}{n} \sum_{i=1}^{n} p_i^{(t)} (1 - p_i^{(t)}) \Big[ \beta_h^{(t)\top} \Big( \frac{1}{T_i} \sum_{s=1}^{T_i} \widetilde{g^2}_{is} \tilde{W}_i^{(s)} \Big) \beta_h^{(t)} \Big]^2. \qquad (25)$$
The computational complexity of updating each $\alpha_h$, $\rho_h$ or $\gamma_h$ is $O(nV)$ per iteration, and hence that of updating $\{(\alpha_h, \rho_h, \gamma_h)\}_{h=1}^{K}$ is $O(nKV)$. This step requires storing the transformed matrix predictors for each subject, $\big\{\frac{1}{T_i}\sum_{s=1}^{T_i} \tilde{W}_i^{(s)}\big\}_{i=1}^{n}$, $\big\{\frac{1}{T_i}\sum_{s=1}^{T_i} \tilde{g}_{is} \tilde{W}_i^{(s)}\big\}_{i=1}^{n}$, $\big\{\frac{1}{T_i}\sum_{s=1}^{T_i} \widetilde{g^2}_{is} \tilde{W}_i^{(s)}\big\}_{i=1}^{n}$, and the intermediate results $\big\{\beta_h^{(t)\top}\big(\frac{1}{T_i}\sum_{s} \tilde{W}_i^{(s)}\big)\beta_h^{(t)}\big\}_{h=1}^{K}$, $\big\{\beta_h^{(t)\top}\big(\frac{1}{T_i}\sum_{s} \tilde{g}_{is}\tilde{W}_i^{(s)}\big)\beta_h^{(t)}\big\}_{h=1}^{K}$ and $\big\{\beta_h^{(t)\top}\big(\frac{1}{T_i}\sum_{s} \widetilde{g^2}_{is}\tilde{W}_i^{(s)}\big)\beta_h^{(t)}\big\}_{h=1}^{K}$ for each subject. Therefore the memory complexity is $O(nV^2 + nK)$.

3.3 Update for $\alpha_0$

The intercept $\alpha_0$ is also updated by maximizing the quadratic approximation to the joint log-likelihood at the current estimate $\alpha_0^{(t)}$, with the updating rule
$$\alpha_0^{(t+1)} = \begin{cases} \alpha_0^{(t)} - b^{(t)}/a^{(t)} & \text{if } a^{(t)} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (26)$$
where
$$b^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial ll_i(\alpha_0^{(t)})}{\partial \alpha_0} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - p_i^{(t)}), \qquad a^{(t)} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 ll_i(\alpha_0^{(t)})}{\partial \alpha_0^2} = \frac{1}{n} \sum_{i=1}^{n} p_i^{(t)} (1 - p_i^{(t)}). \qquad (27)$$
The computational complexity of this step is $O(n)$.

The above procedure is cycled through all the parameters until the relative change of the loss function (5) is smaller than a tolerance $\epsilon$, as summarized in Algorithm 1. We set $\epsilon = 10^{-4}$ in simulations and applications. Since the loss function (5) is lower bounded by 0 and each update always decreases the function value, the coordinate descent algorithm derived above is guaranteed to converge.

In general, the algorithm should be run from multiple initializations to locate a good local solution. Although the estimates for $\{\beta_h\}_{h=1}^{K}$ and $\{(\alpha_h, \rho_h, \gamma_h)\}_{h=1}^{K}$ will all be zero under a sufficiently large penalty factor $\delta$, we cannot initialize them at zero, because the results would then get stuck at zero: the updating rules (17)-(24) imply that each parameter will be set to 0 given the others being zero. In fact, we recommend initializing all the parameters at nonzero values in case some components degenerate unexpectedly at the beginning. In practice, we initialize each parameter from a uniform distribution $U(-1/K, 1/K)$, as discussed at the end of Section 3.1.

Algorithm 1 Coordinate descent for SBLR model with elastic-net penalty.

Input:
Normalized $V \times V$ symmetric matrix observations $\{\tilde{W}_i^{(s)}: s = 1, \dots, T_i\}$, standardized ages $\{\tilde{g}_{is}\}$ and squared ages $\{\widetilde{g^2}_{is}\}$, binary outcomes $y_i$, $i = 1, \dots, n$; rank $K$, overall penalty factor $\delta > 0$, $L_1$ fractional penalty factor $\eta \in (0, 1]$, tolerance $\epsilon > 0$.

Output:
Estimates of $\alpha_0$ and $\{(\alpha_h, \rho_h, \gamma_h, \beta_h): h = 1, \dots, K\}$.

Initialize: $\alpha_0$ and each parameter of $\{(\alpha_h, \rho_h, \gamma_h, \beta_h): h = 1, \dots, K\} \sim U(-1/K, 1/K)$.

repeat
  for h = 1 : K do
    for u = 1 : V do
      update $\beta_{hu}$ by (17)
    end for
  end for
  for h = 1 : K do
    update $\alpha_h$ by (21)
    update $\rho_h$ by (22)
    update $\gamma_h$ by (24)
  end for
  update $\alpha_0$ by (26)
until relative change of loss function (5) < $\epsilon$.

3.4 Model selection

The penalty factors $(\delta, \eta)$ in the regularization (4) can be tuned by cross validation (CV) over a grid of values on $[\delta_{\min}, \delta_{\max}] \times (0, 1]$, where $\delta_{\max}$ is roughly the smallest value for which all the parameters $\{(\beta_h, \alpha_h, \rho_h, \gamma_h)\}_{h=1}^{K}$ become zero when $\eta$ takes the smallest value, and $\delta_{\min}$ is a sufficiently small value that produces dense results when $\eta = 1$ (lasso). We record the deviance (minus twice the average log-likelihood) for each left-out fold rather than AUC or misclassification error, since deviance is smoother (Friedman et al., 2010). Then for each value of $\eta$, the optimal $\delta$ is collected such that the mean CV deviance is within one standard error of the minimum. Among these pairs of $(\delta, \eta)$, we pick the one that produces the smallest deviance. This strategy is adapted from the "one-standard-error" rule for lasso (Friedman et al., 2010), which prefers a more parsimonious model since the risk curves are estimated with errors. We also refer to it as the "one-standard-error" rule for selecting the best model under the elastic-net penalty in the remainder of this paper.

Our proposed model (1) assumes a known number of components $K$. In practice, we choose a large enough number for $K$ and allow the elastic-net penalty (4) to discard unnecessary components, leading to a data-driven estimate for the number of signal subgraphs. We assess the performance of our procedure and verify its lack of sensitivity to the chosen upper bound $K$ in Table 1.
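Every coordinate update cycled in Algorithm 1 shares the univariate soft-thresholding form of (17) and (21)-(24). A generic sketch of ours, where $a$, $b$, $d$, $e$ play the roles of the curvature, gradient, $L_1$ weight and $L_2$ weight at the current iterate:

```python
def soft_threshold_update(theta, a, b, d, e):
    """Generic coordinate-descent step of the shape used in (17) and (21)-(24):
    sign(a*theta - b) * (|a*theta - b| - d)_+ / (a + e),
    returning 0 in the degenerate case a + e = 0 (cf. Section 3.1)."""
    if a + e <= 0.0:
        return 0.0
    z = a * theta - b
    shrunk = abs(z) - d
    if shrunk <= 0.0:
        return 0.0  # the L1 term dominates: the coordinate is set exactly to zero
    return (shrunk / (a + e)) * (1.0 if z > 0 else -1.0)
```

The $(\cdot)_+$ truncation is what produces exact zeros, and hence the data-driven pruning of unnecessary components described above.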
4 Simulation Studies

In this section, we first conduct a number of simulation experiments to evaluate the empirical computational and memory complexity of the coordinate descent algorithm derived in Section 3. We then compare the inference results to competitors.
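A sketch of the kind of synthetic subject used in the timing experiments below. This is our Python re-creation, not the authors' Matlab code; the upper endpoint of the baseline-age distribution is garbled in the source, so the value 65 here is an assumption, as are the annual follow-ups:

```python
import numpy as np

def simulate_subject(V, T_max=5, rng=None):
    """One subject for the complexity experiments: T_i uniform on {1,...,T_max},
    symmetric zero-diagonal networks with N(0,1) off-diagonal entries, a baseline
    age from U(60, 65) (endpoint assumed), and y_i ~ Bernoulli(0.5)."""
    rng = np.random.default_rng() if rng is None else rng
    T = int(rng.integers(1, T_max + 1))
    W_list = []
    for _ in range(T):
        upper = np.triu(rng.standard_normal((V, V)), k=1)
        W_list.append(upper + upper.T)   # symmetric, zero diagonal
    g0 = rng.uniform(60.0, 65.0)         # assumed baseline age range
    ages = g0 + np.arange(T, dtype=float)
    y = int(rng.random() < 0.5)
    return y, W_list, ages
```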
4.1 Computational complexity

Algorithm 1 is implemented in Matlab (R2018a) and the code is publicly available on Github (https://github.com/wangronglu/Symmetric-Bilinear-Logistic-Regression). All the numerical experiments are conducted on a machine with six Intel Core i7 3.2 GHz processors and 64 GB of RAM. We simulated observations $\big(y_i, \{g_{is}, W_i^{(s)}: s = 1, \dots, T_i\}\big)$, $i = 1, \dots, n$, for different numbers of subjects $n$ and different numbers of nodes $V$, where $W_i^{(s)}[uv] = W_i^{(s)}[vu] \sim N(0, 1)$ ($u \neq v$), $g_i \sim U(60, \cdot)$, $T_i \sim U\{1, 2, \dots, 5\}$, and $y_i \sim \mathrm{Bernoulli}(0.5)$. Figures 2 and 3 display how the execution time and peak memory (maximum amount of memory in use) vary with different problem sizes.

Note that the computational time of Algorithm 1 also depends on the penalty factors $\delta$ and $\eta$, because smaller $\delta$ and $\eta$ lead to more nonzero estimates for the parameters. Figure 2 and the left plot of Figure 3 show that the runtime per iteration is a linear order
with $n$ and $K$, and a quadratic order with $V$ under small $\delta$ and $\eta$, which is in accordance with our analysis of computational complexity in Section 3, i.e. $O(nKV^2)$ in the worst-case scenario.

Figure 2: Average computational time (in seconds) per iteration for 30 runs versus the number of subjects $n$ (left) and the number of components $K$ (right).

The right plot of Figure 3 shows that the peak memory of our algorithm is a quadratic order with $V$, no matter what penalty factors are used. This is in accordance with our analysis of memory complexity in Section 3, which is $O(nV^2 + nK)$ and dominated by the quadratic term in $V$ in these cases.

Our algorithm is coded in the Matlab programming environment using no C or FORTRAN code. There are substantial margins for reducing the computational time if such code were used, as each iteration involves for-loops over the entries of the component vectors $\{\beta_h\}_{h=1}^{K}$, which are particularly slow in Matlab.

4.2 Comparison with competitors

In this experiment, we compare the performance of recovering true signal subgraphs and age effects among the SBLR model and the following methods:

1. Unstructured logistic regression (LR) with elastic-net penalty. This model does not
20 30 40 50 60 70 80 90 100 V pea k m e m o r y ( m b ) n=100, K=5 =0.001=0.1fitted quadratic curve Figure 3: Average computational time (in seconds) per iteration (left) and average peakmemory (in mb) in use (right) for 30 runs versus the number of nodes V .assume any structure on the coefficient matrices:logit( p i ) = α + (cid:42) B , T i T i (cid:88) s =1 W ( s ) i (cid:43) + (cid:42) B , T i T i (cid:88) s =1 g is W ( s ) i (cid:43) + (cid:42) B , T i T i (cid:88) s =1 g is W ( s ) i (cid:43) (28)where B , B and B are V × V symmetric coefficient matrices. In fact we onlyneed to enter the lower triangular entries of 1 T i (cid:80) T i s =1 W ( s ) i , 1 T i (cid:80) T i s =1 g is W ( s ) i and1 T i (cid:80) T i s =1 g is W ( s ) i in estimating (28). We write (28) in matrix dot product form forconcision and convenience of displaying results. Elastic-net penalty is put on theentries of the coefficient matrices B , B and B in model estimation.2. Naively symmetrized tensor regression (NSTR). According to Zhou et al. (2013), ageneric rank- K tensor regression for binary response on the matrix predictors in (3)has the form logit( p i ) = α + K (cid:88) h =1 (cid:68) β ( h )1 ◦ β ( h )2 ◦ β ( h )3 , X i (cid:69) (29)where ◦ denotes outer product, (cid:104)· , ·(cid:105) denotes tensor dot product (Kolda and Bader,2009), and β ( h )1 ∈ R V , β ( h )2 ∈ R V , β ( h )3 ∈ R , h = 1 , . . . , K . X i in (29) is a V × V × T i (cid:80) T i s =1 W ( s ) i , 1 T i (cid:80) T i s =1 g is W ( s ) i and1 T i (cid:80) T i s =1 g is W ( s ) i for subject i as shown in Figure 4.17igure 4: The 3D array X i constructed for each subject i in a tensor regression.Then the nonzero entries in each component matrix β ( h )1 ◦ β ( h )2 = β ( h )1 β ( h ) (cid:62) are sup-posed to locate an outcome-relevant subgraph. However, the partial symmetry in X i when fixing the 3rd dimension does not necessarily lead to the same estimates for β ( h )1 and β ( h )2 . 
A naive solution is to use the symmetrized component matrices
$$\Big\{ \big( \beta_1^{(h)} \beta_2^{(h)\top} + \beta_2^{(h)} \beta_1^{(h)\top} \big) / 2:\ h = 1, \dots, K \Big\} \qquad (30)$$
to locate signal subgraphs. Let $\beta_3^{(h)} = (\alpha_h, \rho_h, \gamma_h)^\top$. Then the age effect of the $h$-th signal subgraph in (29) is estimated as $\lambda_h(g) = \gamma_h g^2 + \rho_h g + \alpha_h$. Following Zhou et al. (2013), we add the following elastic-net penalty on the component vectors in (29) to encourage sparsity:
$$\rho \sum_{h=1}^{K} \Big[ (\lambda - 1) \Big( \big\|\beta_1^{(h)}\big\|_2^2 + \big\|\beta_2^{(h)}\big\|_2^2 + \big\|\beta_3^{(h)}\big\|_2^2 \Big) + (2 - \lambda) \Big( \big\|\beta_1^{(h)}\big\|_1 + \big\|\beta_2^{(h)}\big\|_1 + \big\|\beta_3^{(h)}\big\|_1 \Big) \Big], \quad \lambda \in [1, 2]. \qquad (31)$$

We simulate a synthetic dataset of 100 subjects. To mimic the real data, the number of network observations for each subject ranges from 1 to 5 randomly, and the age varies over a 5-year span accordingly with $g_i \sim U(60, \cdot)$. The longitudinal networks $\{W_i^{(s)}\}$ are generated as follows. The first network observation $W_i^{(1)}$ for each subject is generated from a set of basis subgraphs, to induce correlations among edges, with an individual loading vector as
$$W_i^{(1)} = \sum_{h=1}^{11} \lambda_{ih} q_h q_h^\top + \Delta_i \qquad (32)$$
where $q_h \in \{0, 1\}^V$ is a random binary vector with $\|q_h\|_0 = h + 1$, $h = 1, \dots,$
10, and $\|q_{11}\|_0 = 4$. The loadings $\{\lambda_{ih}\}$ in (32) are generated independently from $U(0, 1)$, and $\Delta_i$ adds 5% random noise to each connection strength in the network. Figure 5 visualizes the 11 basis subgraphs superimposed together, showing that the generating process (32) produces networks with complex correlation structure. The follow-up network observation $W_i^{(s)}$ for each subject is generated by adding $N(0, 1)$ percent of relative change to each connection strength in $W_i^{(s-1)}$, $s = 2, \dots, T_i$. Note that $T_i \leq T_{\max} = 5$, $\forall i$.

Figure 5: Overlay of the 11 basis subgraphs corresponding to $\{q_h q_h^\top\}_{h=1}^{11}$.

The binary response $y_i$ is generated from $\mathrm{Bernoulli}(p_i)$ independently with
$$\mathrm{logit}(p_i) = \sum_{h=1}^{2} \frac{1}{T_i} \sum_{s=1}^{T_i} \lambda_h(g_{is})\, \beta_h^\top \tilde{W}_i^{(s)} \beta_h \qquad (33)$$
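The first-visit networks of (32) can be sketched as follows. This is our re-creation with illustrative names; the noise term $\Delta_i$ is simplified here to an assumed reading of "5% random noise" as a multiplicative Gaussian perturbation of each entry:

```python
import numpy as np

def random_basis(V, sizes, rng=None):
    """Binary basis vectors q_h with prescribed numbers of active nodes."""
    rng = np.random.default_rng() if rng is None else rng
    basis = []
    for k in sizes:
        q = np.zeros(V)
        q[rng.choice(V, size=k, replace=False)] = 1.0
        basis.append(q)
    return basis

def first_visit_network(V, basis, rng=None):
    """Generate W_i^(1) as in (32): a U(0,1)-loading mixture of rank-1 basis
    subgraphs q_h q_h^T, then each entry perturbed by (1 + 0.05 * N(0,1)),
    an assumed implementation of the 5% noise Delta_i."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.zeros((V, V))
    for q in basis:
        W += rng.uniform() * np.outer(q, q)
    W *= 1.0 + 0.05 * rng.standard_normal((V, V))
    W = (W + W.T) / 2.0          # keep the perturbed matrix symmetric
    np.fill_diagonal(W, 0.0)     # no self loops
    return W
```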
Figure 6: True coefficient matrices $\beta_h \beta_h^\top$ for $h = 1$ (upper) and $h = 2$ (lower) in a simulated dataset, along with the corresponding signal subgraphs (middle) and age effects $\{\lambda_h(g)\}$ (right).

where $\beta_1 = q_3$ and $\beta_2 = q_{11}$ in (32), and the functions $\{\lambda_h(g): h = 1, 2\}$ are set as in the right column of Figure 6. The generating process (32)-(33) indicates that the true signal subgraphs relevant to $y$ correspond to two basis subgraphs $\{q_h q_h^\top: h = 3, 11\}$, each with 4 nodes and a linear or constant age effect, as displayed in Figure 6.

The SBLR model is estimated under $K = 5$ with 20 initializations. We use 5-fold cross validation (CV) to tune the penalty factors $\delta$ and $\eta$ in (4), as discussed in Section 3.4. Specifically, the grid of values for $\eta$ is picked as $\{0.1, 0.2, \dots, 1\}$. We set $\delta_{\min} = 0.01\, \delta_{\max}$ and choose a sequence of 20 equally spaced $\delta$ values on the logarithmic scale. The total runtime for 5-fold CV on this grid is around 16 minutes. Figure 7 displays the mean CV deviance on the left-out data and indicates the selected values for $(\delta, \eta)$ under the "one-standard-error" rule. The best local solution found at the chosen $(\delta, \eta)$ does not change when increasing to 50 initializations.

Figure 7: Mean deviances of SBLR 5-fold cross validation (CV) on simulated data. The grid of values for $\eta$ is $\{0.1, 0.2, \dots, 1\}$ and the red curve indicates the locations of the optimal $\delta$ for each value of $\eta$ such that the mean CV deviance is within one standard error of the minimum. The black line locates the selected values for $(\delta, \eta)$ corresponding to the smallest deviance along the red curve, referred to as the "one-standard-error" rule in simulations and applications.

Figure 8 displays the estimated results of SBLR at the values for $(\delta, \eta)$ selected in Figure 7. In this case, SBLR identifies one nonempty component, which partially recovers the first true signal subgraph in Figure 6 with no wrong edges.
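The selection along the red curve in Figure 7 can be sketched as follows. This is our illustration; it assumes the mean and standard error of the CV deviance have already been computed on an $(\eta, \delta)$ grid, with $\delta$ sorted in increasing order:

```python
import numpy as np

def one_standard_error_rule(deltas, etas, cv_mean, cv_se):
    """For each eta, take the largest delta whose mean CV deviance is within one
    standard error of that eta's minimum (the most parsimonious model); among
    those candidates, return the (delta, eta) with the smallest mean deviance.
    cv_mean and cv_se have shape (len(etas), len(deltas))."""
    best = None
    for j, eta in enumerate(etas):
        k_min = int(np.argmin(cv_mean[j]))
        threshold = cv_mean[j, k_min] + cv_se[j, k_min]
        # largest delta still within one SE of the minimum
        k_star = int(np.where(cv_mean[j] <= threshold)[0].max())
        cand = (cv_mean[j, k_star], deltas[k_star], eta)
        if best is None or cand[0] < best[0]:
            best = cand
    return best[1], best[2]
```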
SBLR also correctly estimates that the effect of this subgraph on the outcome is increasing with age, though it starts from a negative value. The profiles of the estimated entries in the coefficient matrices {(α_h β_h β_h^⊤, ρ_h β_h β_h^⊤, γ_h β_h β_h^⊤) : h = 1, ..., K} over iterations from SBLR are shown in Figure 9, which indicates the convergence of the sequences of component coefficients as the loss function (5) converges.
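The triples (α_h β_h β_h^⊤, ρ_h β_h β_h^⊤, γ_h β_h β_h^⊤) above suggest a quadratic age effect λ_h(g) = α_h + ρ_h g + γ_h g² per component. Under that reading, the SBLR linear predictor for a single network can be sketched as follows (a hypothetical reimplementation; the intercept and the full likelihood are omitted):

```python
import numpy as np

def sblr_linear_predictor(W, g, alpha, rho, gamma, B):
    """Linear predictor of the symmetric bilinear model for one network
    W (V x V, symmetric, zero diagonal) observed at age g.  B is a V x K
    matrix whose columns are the component vectors beta_h; the age effect
    of component h is lambda_h(g) = alpha_h + rho_h * g + gamma_h * g**2.
    Returns sum_h lambda_h(g) * <beta_h beta_h^T, W>."""
    lam = alpha + rho * g + gamma * g ** 2          # (K,) age effects
    quad = np.einsum('vh,vw,wh->h', B, W, B)        # beta_h^T W beta_h per h
    return float(lam @ quad)
```

The inner product ⟨β_h β_h^⊤, W⟩ equals β_h^⊤ W β_h, which is what the `einsum` call computes for every component at once.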
Figure 8: Estimated results of SBLR under K = 5, where the penalty factors are chosen by the one-standard-error rule. Left: the estimated nonzero component matrix β β^⊤; the matrix is normalized such that the off-diagonal element with the largest magnitude is 1. Middle: the corresponding selected subgraph, where the thickness of edges is proportional to the magnitude of their estimated coefficients in β β^⊤; black edges denote true signal edges and red ones falsely identified edges (not displayed). Right: the corresponding estimated age effect λ(g).

Figure 10 shows the estimated results from the unstructured LR model (28), where the elastic-net penalty factors are also chosen by the "one-standard-error" rule. Although it is easy to identify a triangle from the selected edges in this simple example, we have to analyze their age effects edge by edge. As shown in Figure 10, the selected edge (9, 14) has a spurious quadratic age effect, and there is a falsely identified edge (3, 4).

Figure 11 displays the estimated results of the NSTR model (29)–(31) under K = 5 at the optimal penalty factors selected by the "one-standard-error" rule. As can be seen, NSTR partially identifies the first true signal subgraph in Figure 6 while selecting 3 false edges. The corresponding estimated age effect looks similar to that of SBLR (compare Figure 11 to Figure 8).

The performance of SBLR can be improved by increasing the number of observations in the data. Figure 12 displays the estimated results of SBLR under K = 5 when the number of subjects is increased to 1000 under the same data generating process (32)–(33). This time SBLR partially recovers both true signal subgraphs.

Figure 9: Profiles of estimated entries in the coefficient matrices {(α_h β_h β_h^⊤, ρ_h β_h β_h^⊤, γ_h β_h β_h^⊤) : h = 1, ..., K} over iterations from SBLR at the chosen optimal penalty factors.

The first selected subgraph in Figure 12 corresponds to the second true signal subgraph in Figure 6, where SBLR correctly estimates that its effect on the outcome is negative and constant across age. The second selected subgraph in Figure 12 is part of the first true signal subgraph in Figure 6, and SBLR correctly estimates that its effect on the outcome is positive and increasing with age.

The procedure described above is repeated 30 times: each time we generate a synthetic dataset based on (32)–(33) and record the mean CV deviance at the optimal penalty factors under the "one-standard-error" rule for SBLR, LR and NSTR. We also record the true positive rate (TPR) for each method, representing the proportion of true signal edges that are correctly identified, and the false positive rate (FPR), representing the proportion of non-signal edges that are falsely identified. Table 1 displays the mean and standard deviation (SD) of the mean CV deviance, TPR and FPR for SBLR, LR and NSTR under different problem sizes.

Figure 10: Estimated results of unstructured logistic regression (28) with elastic-net penalty, where the penalty factors are chosen by the "one-standard-error" rule. Upper panel: estimated entries (lower triangular) versus true values (upper triangular) in B_1, B_2 and B_3. Lower panel: the selected edges in the network, where the thickness of edges is proportional to the magnitude of the corresponding coefficients; black edges denote true signal edges and red ones falsely identified edges.

The upper panel of Table 1, corresponding to n = 100, displays the results where each dataset consists of 100 subjects. As can be seen, the average of the mean CV deviance, TPR and FPR from SBLR under K = 10 are very similar to those under K = 5, implying that the performance of SBLR is robust to the chosen upper bound for the number of components.
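The TPR and FPR just defined can be computed directly from boolean edge masks of the identified and true signal subgraphs; a minimal sketch:

```python
import numpy as np

def edge_selection_rates(selected, truth):
    """TPR and FPR of edge selection on an undirected graph.
    `selected` and `truth` are boolean V x V symmetric masks of the
    identified and true signal edges; only the upper triangle
    (excluding the diagonal) is counted."""
    iu = np.triu_indices(truth.shape[0], k=1)
    sel, tru = selected[iu], truth[iu]
    tpr = sel[tru].mean() if tru.any() else 0.0        # share of signal edges found
    fpr = sel[~tru].mean() if (~tru).any() else 0.0    # share of non-signal edges picked
    return float(tpr), float(fpr)
```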
In this case, SBLR achieves competitive predictive performance with LR and better performance than NSTR in terms of out-of-sample deviance. Although SBLR has a smaller average TPR than LR or NSTR, it has the lowest average FPR.

The lower panel of Table 1 shows that increasing the number of observations can improve the performance of SBLR and LR, though not of NSTR, whose mean FPR increases considerably with n. When we increase the number of subjects in each dataset from n = 100 to n = 1000, the average CV deviance and FPR decrease significantly for SBLR and LR, while their average TPRs increase significantly; the standard deviation of each measure also decreases for SBLR and LR. In the case of n = 1000, Table 1 shows that both the average FPR and the mean CV deviance of SBLR are smaller than those of LR. We therefore consider SBLR a conservative method that tends to be more sparse in selecting signal subgraphs than these competitors.

Figure 11: Estimated results of the NSTR model (29)–(31) under K = 5, where the penalty factors are chosen by the "one-standard-error" rule. Left: the estimated nonzero component matrix (β_1^(1) β_2^(1)⊤ + β_2^(1) β_1^(1)⊤)/2; the matrix is normalized such that the off-diagonal element with the largest magnitude is 1. Middle: the corresponding selected subgraph, where the thickness of edges is proportional to the magnitude of their estimated coefficients in the component matrix; black edges denote true signal edges while red ones falsely identified edges. Right: the corresponding estimated age effect λ(g).

Cognitive aging is typically characterized by declines with age in processing speed and memory domains (Park and Reuter-Lorenz, 2009). However, some older adults do not exhibit the expected reduction in cognition, but have superior cognitive performance compared to age- and education-matched cognitively normal older adults (Lin et al., 2017), or even when compared to younger or middle-aged adults (Sun et al., 2016). These individuals exhibiting successful cognitive aging are thus called "supernormals" (Mapstone et al., 2017)
or "superagers" (Rogalski et al., 2013).

Figure 12: Estimated results of SBLR on a simulated dataset with n = 1000 subjects. Left: the estimated nonzero component matrices β_h β_h^⊤ for h = 3 (upper) and h = 4 (lower); the matrices are normalized such that the off-diagonal element with the largest magnitude is 1. Middle: the corresponding selected subgraphs, where the thickness of edges is proportional to the magnitude of their estimated coefficients in the component matrix; black edges denote true signal edges and red ones falsely identified edges (not displayed). Right: the corresponding estimated age effects {λ_h(g)}.

We applied our method to a subset of data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, as introduced in Section 1, to better understand the neural mechanism underlying successful cognitive aging. The dataset contains dMRI data over a 5-year span for 40 supernormals and 45 cognitively normal controls. A state-of-the-art DTI processing pipeline (Zhang et al., 2018) was applied to extract the structural brain networks of the subjects. More specifically, we first used a reproducible probabilistic tractography algorithm (Girard et al., 2014; Maier-Hein et al., 2017) to generate the whole-brain tractography data for each dMRI scan in the dataset. The method borrows anatomical information from high-resolution T1-weighted imaging to reduce bias in the reconstruction of tractography. We then used the popular Desikan-Killiany atlas (Desikan et al., 2006) to define ROIs corresponding to the nodes in the structural connectivity network. The Desikan-Killiany parcellation has 68 cortical surface regions, with 34 nodes in each hemisphere.

Table 1: Mean and SD of the mean CV deviance, TPR and FPR over 30 simulations.

              CV Deviance   TPR   FPR
  n = 100
    LR          1.3018
    (K = 5)     1.3200
    (K = 5)
    (K = 10)    1.3030
  n = 1000
    LR          1.1808
    (K = 5)     1.1849
    (K = 5)
Freesurfer software (Dale et al., 1999; Fischl et al., 2004) was used to perform brain registration and parcellation. With the parcellation of an individual brain, we extracted two white matter integrity measures, fractional anisotropy (FA) and mean diffusivity (MD), along each fiber tract, and then used the average value to describe the connection strength between two ROIs. Both FA and MD are diffusion-related features that characterize water diffusivity along white matter streamlines, and have been widely used in the literature for white matter structure analysis (Kraus et al., 2007; Jin et al., 2017).

Analysis of FA connectivity matrices

In this case, the weighted adjacency matrix W^(s)_i of each brain network consists of mean FA values along fiber tracts between each pair of brain regions. We first compare the predictive performance of SBLR and LR. The penalty factors of the elastic-net regularization for both methods are tuned by 5-fold CV as described in Section 3.4. The SBLR model is estimated under K = 5 with 50 initializations, which is sufficient since only one nonempty component is selected under the one-standard-error rule, as shown below, and the estimated results are robust when increasing to 100 initializations. The mean CV deviance at the chosen penalty factors of SBLR is 1.29, which is smaller than that of LR, 1.38, indicating better predictive performance.

The selected connections predictive of supernormals from LR and the corresponding estimated coefficients in B_1, B_2 and B_3 of (28) are displayed in Figure 13.
As can be seen, these connections are scattered across the brain network with no meaningful structure, and we have to analyze their age effects edge by edge.

Figure 13: Selected connections predictive of supernormals from LR with constant (left), linear (middle) and quadratic (right) age effect using the FA metric, where the thickness of edges is proportional to the magnitude of the corresponding coefficients, and the color goes from blue to red as the coefficient goes from negative to positive.

SBLR estimated one nonzero component λ̂(g) β̂ β̂^⊤ under K = 5. The nonzero coefficients in β̂ β̂^⊤ and the corresponding subgraph are displayed in the left plot of Figure 14, where the matrix β̂ β̂^⊤ is normalized such that the off-diagonal element with the largest magnitude is 1. We notice that the subnetwork SBLR identified contains a hub node, labeled 31r in Figure 14. In the Desikan-Killiany atlas, 31r is the frontal pole area, also considered as Brodmann area 10 (BA 10). The other nodes in this subnetwork include 27l, the superior frontal region, and 32r, the temporal pole. The superior frontal region usually controls the sensory system, and the temporal pole relates to episodic memories, emotion and socially relevant memory. This subnetwork with BA 10 as a hub plays an important role in high-level information integration from the visual, auditory and somatic sensory systems to achieve a conceptual interpretation of the environment, which is critical to maintaining cognition. The right plot of Figure 14 shows that the effect of the connection strengths within this subgraph on the outcome is constant across age. The red edges and the positive function λ̂(g) in Figure 14 indicate that the effects on the outcome of the connection strengths in the signal subgraph are all positive, implying that older adults with stronger neural connections among these brain regions are more likely to be supernormals.
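For concreteness, the construction of an FA-weighted adjacency matrix W^(s)_i described above can be sketched as follows; the input format and helper name are hypothetical, and the actual pipeline of Zhang et al. (2018) differs in detail:

```python
import numpy as np

def fa_adjacency(streamline_fa, V=68):
    """Build a symmetric FA-weighted adjacency matrix (a sketch of the
    connectome construction, with hypothetical input).  `streamline_fa`
    maps an ROI pair (i, j), i < j, to the list of mean-FA values of the
    streamlines connecting the two regions.  The connection strength is
    the average FA over those streamlines; the diagonal stays zero
    (no self loops)."""
    W = np.zeros((V, V))
    for (i, j), fas in streamline_fa.items():
        W[i, j] = W[j, i] = np.mean(fas)
    return W
```

With the Desikan-Killiany atlas, V = 68 and each W^(s)_i is a 68 × 68 symmetric matrix, matching the networks analyzed here.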
Figure 14: Left: the selected signal subgraph corresponding to the nonzero component β̂ β̂^⊤ of SBLR under K = 5 using FA measures, where the thickness of edges is proportional to the magnitude of their estimated coefficients in β̂ β̂^⊤, and the color goes from blue to red as the coefficient goes from negative to positive. Right: the corresponding estimated age effect λ̂(g).

Analysis of MD connectivity matrices

This time the weighted adjacency matrix W^(s)_i of each brain network consists of mean MD values between each pair of brain regions. The mean CV deviance at the chosen penalty factors of LR is 1.39, while that of SBLR is 1.31, again indicating superior predictive performance over the unstructured model. In this case, LR selected none of the connections predictive of supernormals under the one-standard-error rule.

The signal subgraph identified by SBLR and the corresponding age effect under K = 5 are displayed in Figure 15. As can be seen, the clique signal subgraph selected by SBLR with MD measures is nearly a subgraph of that in Figure 14. The estimated coefficients and age effect also look similar between the two figures. This shows that the estimated results from SBLR are robust to slightly different measures of connection strength in structural brain networks.
Figure 15: Left: the selected signal subgraph corresponding to the nonzero component β̂ β̂^⊤ from SBLR under K = 5 using MD measures, where the thickness of edges is proportional to the magnitude of their estimated coefficients in β̂ β̂^⊤, and the color goes from blue to red as the coefficient goes from negative to positive. Right: the corresponding estimated age effect λ̂(g).

The involved regions identified in Figures 14 and 15 are also in accordance with age-related studies in neuroscience. The frontal poles (31r, 31l) and the superior frontal gyrus (27l) are among the regions with the greatest age-related reduction in volume and surface area; the right temporal pole (32r) has significantly greater-than-average reduction in volume (Lemaitre et al., 2012). In addition, lesions in the entorhinal cortex (5r) are associated with impairment of episodic memory, and the decline in the volume of this cortex predicts progression from mild cognitive impairment to dementia (Rodrigue and Raz, 2004). The constant age effect of the selected subgraph from SBLR under both measures implies that there is no specific age between 60 and 95 that has a particularly large predictive effect on the outcome (supernormal or normal aging).

In summary, the symmetric bilinear logistic regression (SBLR) is a useful tool for analyzing the relationship between a binary outcome and a sequence of longitudinal network observations. SBLR contributes to an insightful understanding of the substructure of a network relevant to a binary outcome, as it produces much more interpretable results and a lower FPR than unstructured variable selection methods do, while maintaining competitive predictive performance. A coordinate descent algorithm is developed for estimating SBLR with the elastic-net penalty, which outputs reliable outcome-relevant subgraphs and their time effects.
Simulation studies show that the accuracy of SBLR estimates can be improved by increasing the number of subjects in the data. With a larger number of longitudinal network observations for each subject (e.g. ≥

References
Casey, B., Cannonier, T., Conley, M. I., Cohen, A. O., Barch, D. M., Heitzeg, M. M., Soules, M. E., Teslovich, T., Dellarco, D. V., Garavan, H., et al. (2018). The adolescent brain cognitive development (ABCD) study: imaging acquisition across 21 sites. Developmental Cognitive Neuroscience.

Dale, A. M., Fischl, B., and Sereno, M. I. (1999). Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage, 9(2):179–194.

Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner, R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., and Killiany, R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.

Fischl, B., Salat, D. H., van der Kouwe, A. J., Makris, N., Ségonne, F., Quinn, B. T., and Dale, A. M. (2004). Sequence-independent segmentation of magnetic resonance images. NeuroImage, 23(Supplement 1):S69–S84.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.

Girard, G., Whittingstall, K., Deriche, R., and Descoteaux, M. (2014). Towards quantitative connectivity analysis: reducing tractography biases. NeuroImage, 98:266–278.

Jack Jr, C. R., Bernstein, M. A., Fox, N. C., Thompson, P., Alexander, G., Harvey, D., Borowski, B., Britson, P. J., L. Whitwell, J., Ward, C., et al. (2008). The Alzheimer's disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging, 27(4):685–691.

Jin, Y., Huang, C., Daianu, M., Zhan, L., Dennis, E. L., Reid, R. I., Jack Jr, C. R., Zhu, H., Thompson, P. M., and Initiative, A. D. N. (2017). 3D tract-specific local and global analysis of white matter integrity in Alzheimer's disease. Human Brain Mapping, 38(3):1191–1207.

King, K. M., Littlefield, A. K., McCabe, C. J., Mills, K. L., Flournoy, J., and Chassin, L. (2018). Longitudinal modeling in developmental neuroimaging research: Common challenges, and solutions from developmental psychology. Developmental Cognitive Neuroscience, 33:54–72.

Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3):455–500.

Kraus, M. F., Susmaras, T., Caughlin, B. P., Walker, C. J., Sweeney, J. A., and Little, D. M. (2007). White matter integrity and cognition in chronic traumatic brain injury: a diffusion tensor imaging study. Brain, 130(10):2508–2519.

Lemaitre, H., Goldman, A. L., Sambataro, F., Verchinski, B. A., Meyer-Lindenberg, A., Weinberger, D. R., and Mattay, V. S. (2012). Normal age-related brain morphometric changes: nonuniformity across cortical thickness, surface area and gray matter volume? Neurobiology of Aging, 33(3):617.e1–617.e9.

Lin, F., Ren, P., Mapstone, M., Meyers, S. P., Porsteinsson, A., Baran, T. M., Initiative, A. D. N., et al. (2017). The cingulate cortex of older adults with excellent memory capacity. Cortex, 86:83–92.

Maier-Hein, K. H., Neher, P. F., Houde, J.-C., Côté, M.-A., Garyfallidis, E., Zhong, J., Chamberland, M., Yeh, F.-C., Lin, Y.-C., Ji, Q., et al. (2017). The challenge of mapping the human connectome based on diffusion tractography. Nature Communications, 8(1):1349.

Mapstone, M., Lin, F., Nalls, M. A., Cheema, A. K., Singleton, A. B., Fiandaca, M. S., and Federoff, H. J. (2017). What success can teach us about failure: the plasma metabolome of older adults with superior memory and lessons for Alzheimer's disease. Neurobiology of Aging, 51:148–155.

Park, D. C. and Reuter-Lorenz, P. (2009). The adaptive brain: aging and neurocognitive scaffolding. Annual Review of Psychology, 60:173–196.

Rodrigue, K. M. and Raz, N. (2004). Shrinkage of the entorhinal cortex over five years predicts memory performance in healthy adults. Journal of Neuroscience, 24(4):956–963.

Rogalski, E. J., Gefen, T., Shi, J., Samimi, M., Bigio, E., Weintraub, S., Geula, C., and Mesulam, M.-M. (2013). Youthful memory capacity in old brains: anatomic and genetic clues from the Northwestern SuperAging Project. Journal of Cognitive Neuroscience, 25(1):29–36.

Sun, F. W., Stepanovic, M. R., Andreano, J., Barrett, L. F., Touroutoglou, A., and Dickerson, B. C. (2016). Youthful brains in older adults: preserved neuroanatomy in the default mode and salience networks contributes to youthful memory in superaging. Journal of Neuroscience, 36(37):9659–9668.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 58(1):267–288.

Wang, L., Zhang, Z., and Dunson, D. B. (2019). Symmetric bilinear regression for signal subgraph estimation. IEEE Transactions on Signal Processing, 67:1929–1940.

Zhang, Z., Descoteaux, M., Zhang, J., Girard, G., Chamberland, M., Dunson, D., Srivastava, A., and Zhu, H. (2018). Mapping population-based structural connectomes. NeuroImage, 172:130–145.

Zhou, H. and Li, L. (2014). Regularized matrix regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):463–483.

Zhou, H., Li, L., and Zhu, H. (2013). Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association, 108(502):540–552.

Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net.