Nonparametric Matrix Response Regression with Application to Brain Imaging Data Analysis
Wei Hu, Department of Statistics, University of California, Irvine
Dehan Kong, Department of Statistical Sciences, University of Toronto
Weining Shen, Department of Statistics, University of California, Irvine

October 16, 2019
Abstract
With the rapid growth of neuroimaging technologies, a great deal of effort has recently been dedicated to investigating dynamic changes in brain activity. Examples include time-course calcium imaging and dynamic brain functional connectivity. In this paper, we propose a novel nonparametric matrix response regression model to characterize the nonlinear association between 2D image outcomes and predictors such as time and patient information. Our estimation procedure can be formulated as a nuclear norm regularization problem, which captures the underlying low-rank structure of the dynamic 2D images. We present a computationally efficient algorithm, derive the asymptotic theory, and show that the method outperforms other existing approaches in simulations. We then apply the proposed method to a calcium imaging study for estimating the change of fluorescent intensities of neurons, and to an electroencephalography study for comparing the dynamic connectivity covariance matrices between alcoholic and control individuals. In both studies, the method leads to a substantial improvement in prediction error.
Keywords:
Calcium imaging, Electroencephalography, Low rank, Matrix data, Nonparametric regression, Nuclear norm

Introduction
Neuroimaging techniques have been widely used in both scientific research and clinical applications, e.g., calcium imaging, electroencephalography (EEG), magnetic resonance imaging, and functional magnetic resonance imaging. These techniques provide a great opportunity for (1) the investigation of neuron activities at the cellular level, e.g., the quantitative estimation of intracellular calcium signals; and (2) a better understanding of the dynamic changes in brain activity as well as their associations with disease and other clinical information. Here we briefly discuss two relevant examples; more details about these examples will be given in Section 2. The first example concerns fluorescent calcium imaging, which has emerged over the past decade as a popular technique for observing the spiking activity of large neuronal populations. This technology allows scientists to observe the locations of neurons and the times at which they fire using two-photon microscopy [Helmchen and Denk, 2005, Petersen et al., 2018], resulting in a data set in the form of a video clip, i.e., a sequence of 2D images taken over time. The scientific questions of interest are to identify the major neurons and to model their spiking activity over time, and calcium imaging provides a unique opportunity to address these questions because of its ability to directly collect rich and accurate measurements at the cellular level. As shown in Figure 2(a), the images are quite noisy and structured-sparse (in the sense that the signal level is low near the boundary for most images), yet contain rich information (the video clip we study in this paper comprises 3,000 frames, with several prominent neurons located in the center of the image firing multiple times). It is thus important and non-trivial to develop an automatic yet flexible pipeline for analyzing such data sets. Our second example is brain functional connectivity analysis.
Functional connectivity refers to the coherence of the activities among distinct brain regions [Horwitz, 2003], and it provides novel insights into how distributed brain regions are functionally integrated [Biswal et al., 1995, 2010, Fox et al., 2005]. In general, studies of functional connectivity are based on the temporal correlation between spatially remote neurophysiological events [Friston, 1994], with an implicit assumption that the functional connectivity stays at the same level during the observation period. Recently, functional connectivity has been shown to fluctuate over time [Chang, Liu, Chen, Liu, and Duyn, 2013], implying that measures assuming stationarity over a full scan may be too simplistic to capture the full brain activity. Since these initial findings, researchers have investigated the so-called dynamic functional connectivity; see Calhoun et al. [2014], Calhoun and Adali [2016], Preti et al. [2017] for reviews to date. It then makes sense to represent the connectivity as a covariance matrix and model its change over time and other clinical factors of interest. As shown in Figure 5, we analyze an EEG data set to study the dynamic functional connectivity of alcoholic and non-alcoholic individuals; there is a marked difference in image pattern and temporal correlation between the two groups of participants.
Statistically speaking, it is natural to quantify the association between 2D image outcomes and predictors such as time, patient demographics, and other disease predictors through a regression model. Such a regression model also serves as a useful tool for smoothing and denoising. However, several challenges remain unaddressed in the literature. First, the neuroimaging data obtained from experiments, even after pre-processing, are often noisy and exhibit complex structure, such as spatial/temporal correlation, low-rankness, and sparsity. Such structural information provides useful scientific and clinical insight, but also imposes additional challenges for statistical analysis and computation. Second, existing regression approaches in the literature mainly focus on linear models and image predictors, where the linearity assumption, as much as it simplifies the computational and theoretical investigation, does not meet the need to accurately model the nonlinear pattern between predictors and image outcomes. For example, we performed a preliminary analysis of the calcium imaging data collected by Ilana Witten's lab at the Princeton Neuroscience Institute [Petersen et al., 2018]. Figure 2(b) shows a scatter plot of the changes of fluorescent intensities across time for a randomly selected pixel of the 2D image. The scatter plot shows a clear nonlinear pattern, which would be neglected by linear models. Note that classical nonlinear regression approaches such as the Nadaraya-Watson method cannot be directly used to model the matrix-valued image responses, since doing so is equivalent to vectorizing the 2D image data, which destroys the underlying spatial information of the image. In this paper, we propose a novel low-rank nonparametric regression approach that accounts for the matrix structure of the image data by solving a nuclear norm regularization problem.
Using the singular value thresholding algorithm [Cai et al., 2010], we show that our estimator has a closed-form solution for each fixed bandwidth and regularization parameter. To efficiently select these tuning parameters, we derive a Bayesian information criterion (BIC) based on our model and estimation procedure. For theoretical justification, we derive the risk bound for our nonparametric estimator and show that the rank of the true function can be consistently estimated as well.

Here we highlight our contributions. First, we propose a novel nonparametric matrix response regression model. There are related works on (generalized) linear models for matrix-valued data. For example, Zhou and Li [2014] proposed a class of regularized matrix linear regression models by treating matrix data as covariates; Wang and Zhu [2017] developed a generalized scalar-on-image regression model via total variation; Ding and Cook [2018] studied the matrix response linear regression model using envelope methods; and Kong et al. [2019] proposed a low-rank linear regression model with high-dimensional matrix response and high-dimensional scalar covariates. To the best of our knowledge, no work has been done on nonparametric models for matrix data analysis. Second, our nonparametric estimator is easy to derive and has a closed-form solution, which makes it computationally more efficient than the state-of-the-art multivariate varying coefficient model [Zhu et al., 2011, 2012]. The tuning process is also simple thanks to the derived analytic form of the BIC, which is not straightforward for the proposed nonparametric matrix response model. Third, we develop asymptotic theory for the method, including the risk bound and rank consistency, which directly connects to existing work in nonparametric statistics and hence provides a deeper understanding of such models.
Finally, as demonstrated in both real data examples, the proposed nonparametric model is simple, ready to use, fits the nonlinear changes in brain signals very well, and yields a substantial improvement in prediction error. The low-rankness assumption also appears reasonable in practice. We are currently developing a computational package for the proposed methodology, which we expect to be useful to the neuroimaging community.

The rest of the article proceeds as follows. Section 2 provides more details on the two motivating data examples. In Section 3, we introduce the novel nonparametric matrix response regression model, propose a fast algorithm for our low-rank regularized estimation procedure, and derive a BIC for choosing the tuning parameters. Section 4 investigates the theoretical properties of our method. We evaluate the finite-sample performance of our method in Section 5. Section 6 illustrates applications of our method to two real datasets from a calcium imaging study and an electroencephalography study.
Motivating Examples

In this section, we provide more details about the two motivating data examples and highlight their scientific merit and the associated statistical challenges.
Calcium ions generate versatile intracellular signals that control key functions in virtually all types of cells, including the control of heart muscle cell contraction and the regulation of vital aspects of the entire cell cycle. Fluorescent calcium imaging is a sensitive method for monitoring neuronal activity [Mao et al., 2001]. It exploits the fact that, in living cells, most depolarizing electrical signals are associated with calcium ion influx attributable to the activation of one or more of the numerous types of voltage-gated calcium channels abundantly expressed in the nervous system [Tsien and Tsien, 1990, Stosiek et al., 2003]. Imaging calcium in neurons is particularly important because calcium signals exert their highly specific functions in well-defined cellular subcompartments [Grienberger and Konnerth, 2012]. The advantage of calcium imaging is that it allows real-time analyses of individual cells and even subcellular compartments, while readily permitting simultaneous recordings from many individual cells [Stosiek et al., 2003]. Whenever a neuron fires, voltage-gated calcium channels in the axon terminal open and calcium floods the cell. Such changes in the concentration of calcium ions are detected by observing the fluorescence of calcium indicator molecules. Therefore, not surprisingly, intracellular calcium concentration has become an important surrogate marker for the spiking activity of neurons in the absence of an effective voltage imaging approach, and it is commonly used when analyzing local neuronal circuits in vivo and in vitro [Petersen et al., 2018, Grienberger and Konnerth, 2012]. However, the available experimental techniques still produce noisy and spatio-temporally subsampled observations of the true underlying signals [Pnevmatikakis et al., 2012]. The calcium trace is smeared, which restricts the extraction of quantities of interest such as the spike trains of individual neurons [Rahmati et al., 2016].
The spatio-temporal smoothing of the calcium profile remains a difficult problem due to the high dimensionality and complex structure of calcium imaging data. New methods that can characterize the dynamic changes in calcium imaging videos and perform spatio-temporal smoothing of the calcium data are much needed.
Functional connectivity measures the statistical dependence between time series signals obtained from different brain regions. Functional connectivity has been investigated in a wealth of literature with various analysis tools, and has applications in different imaging modalities such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG), and positron emission tomography (PET). Traditional functional connectivity analysis is built upon the assumption that the connectivity is stationary in nature. More recently, the dynamic behavior of functional connectivity has been revealed, suggesting that connectivity between different brain regions exhibits meaningful variation over time; it hence provides a more detailed representation of brain function than traditional static functional connectivity analysis. Several previous studies have suggested that dynamic functional connectivity can be used to characterize different psychiatric and neurodegenerative disorders, such as schizophrenia [Damaraju et al., 2014], Parkinson's disease [Engels et al., 2018, Zhu et al., 2019], and Alzheimer's disease [Niu et al., 2019]. One of the most common ways to quantify dynamic functional connectivity is to estimate dynamic covariance matrices using a sliding-window approach. The dynamic covariance matrices have proven very useful in revealing the changing patterns of brain networks over time. In addition, studying the association between those matrices and clinical factors or disease outcomes may provide useful insights for future clinical practice, such as disease prediction.
Due to the high level of noise in the original imaging data, accurate estimation and denoising procedures are much needed. By pooling the estimates of the covariance matrices obtained from different sliding windows together, and by taking the structure of the matrices into account, our newly proposed model can help improve the estimation accuracy of these dynamic covariance matrices.
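To make the sliding-window construction above concrete, here is a minimal sketch in Python; the function name, window length, and step size are our own illustrative choices, not part of the proposed method:

```python
import numpy as np

def sliding_window_covariances(ts, window, step=1):
    """Dynamic connectivity: one covariance matrix per sliding window.

    ts     : (T, d) array holding d-channel time series (e.g., EEG electrodes).
    window : number of time points per window.
    step   : shift between consecutive windows.
    Returns an array of shape (n_windows, d, d).
    """
    T, d = ts.shape
    starts = range(0, T - window + 1, step)
    # np.cov expects variables in rows, observations in columns
    return np.stack([np.cov(ts[s:s + window].T) for s in starts])

# toy example: 100 time points from 3 channels
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
covs = sliding_window_covariances(x, window=20, step=5)
print(covs.shape)  # (17, 3, 3)
```

The per-window covariance estimates would then play the role of the matrix responses in the regression model, with the window time as a covariate.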
Suppose we observe a set of 2D images and some scalar predictors from $n$ study subjects. Let $Y_i$ be a $p \times q$ matrix representing the 2D image from the $i$th subject, and let $X_i = (x_{i1}, \ldots, x_{is})^T$ be an $s \times 1$ vector denoting the scalar covariates of interest (e.g., time and disease predictors). We propose the following nonparametric matrix response model,
$$E(Y_i \mid X_i) = g(X_i), \qquad (1)$$
where $g(\cdot): \mathbb{R}^s \to \mathbb{R}^{p\times q}$ is a nonparametric matrix-valued function that quantifies the nonlinear relationship between (each pixel of) $Y_i$ and $X_i$. Since $g(x)$ is a $p\times q$ matrix for all values of $x$, we will also impose a structural constraint on $g$ for scientific interpretability and regularization purposes.

Our goal is to estimate the nonparametric function $g$. A commonly used estimator is the Nadaraya-Watson estimator for matrix data, which can be written as
$$\hat g_{nw}(x) = \frac{\sum_{i=1}^n K_H(x - X_i)\, Y_i}{\sum_{i=1}^n K_H(x - X_i)}, \qquad (2)$$
where $K_H(\cdot) = |H|^{-1} K(H^{-1}\cdot)$, $K(\cdot)$ is a kernel function, and $H = \mathrm{diag}(h_1, h_2, \ldots, h_s)$ is a bandwidth matrix. It is often assumed that $h_1 = \cdots = h_s = h$ for computational convenience. However, the Nadaraya-Watson estimator is a "naive" estimator in our case because it does not utilize the underlying structure of the matrix response $Y_i$. In particular, the estimator in (2) can also be obtained by vectorizing $Y_i$, applying the Nadaraya-Watson estimator to the vectorized data, and transforming the estimator back to a matrix.
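A minimal sketch of the Nadaraya-Watson estimator (2) for matrix responses, assuming a product Gaussian kernel with a common bandwidth $h$ (all names here are ours):

```python
import numpy as np

def nw_matrix_estimate(x, X, Y, h):
    """Nadaraya-Watson estimate of E(Y | X = x) for matrix-valued responses.

    x : (s,) evaluation point;  X : (n, s) covariates;
    Y : (n, p, q) stack of 2D images;  h : common bandwidth.
    """
    # product Gaussian kernel weights K_H(x - X_i); the normalizing
    # constant cancels in the weighted average below
    w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
    w = w / w.sum()
    # weighted average of the images; equivalent to pixelwise NW smoothing
    return np.tensordot(w, Y, axes=(0, 0))

# sanity check: with identical responses, the estimate is that response
X = np.linspace(0, 1, 50).reshape(-1, 1)
Y = np.tile(np.eye(4)[None, :, :3], (50, 1, 1))   # constant 4x3 responses
est = nw_matrix_estimate(np.array([0.3]), X, Y, h=0.1)
print(np.allclose(est, np.eye(4)[:, :3]))  # True
```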
To account for the matrix structure, we take another look at the estimator in (2), which can be obtained by solving the following optimization problem
$$\hat g_{nw}(x) = \operatorname*{argmin}_{Y}\; \sum_{i=1}^n K_H(x - X_i)\,\|Y_i - Y\|_F^2, \qquad (3)$$
where $\|\cdot\|_F$ is the Frobenius norm of a matrix. To further exploit the underlying structure of the 2D response, we introduce a penalty on $Y$ and propose to solve
$$\hat g(x) = \operatorname*{argmin}_{Y}\; \Big\{ \frac{1}{2n}\sum_{i=1}^n K_H(x - X_i)\,\|Y_i - Y\|_F^2 + \lambda_n \|Y\| \Big\}, \qquad (4)$$
where $\lambda_n$ is the tuning parameter and $\|\cdot\|$ is some norm of a matrix. Possible choices are nuclear norms, total variation norms, and their combinations; each of these norms has a different regularization effect on the image outcomes. In this paper, we mainly focus on the nuclear norm regularization for illustration, i.e., we take $\|Y\|$ to be $\|Y\|_*$, defined as the sum of all singular values of the matrix $Y$. The nuclear norm is very popular in 2D image denoising [Gu et al., 2014]. The underlying true 2D image is often of low rank or approximately low rank, and the nuclear norm regularization can help recover the low-rank structure from a noisy image [Chen et al., 2013]. In our case, $\hat g(x)$ can be regarded as an image estimate at the point $x$, and the penalty $\|Y\|_*$ therefore pushes for a low-rank representation of the image estimate.

It can be shown that solving (4) is equivalent to solving
$$\hat g(x) = \operatorname*{argmin}_{Y}\; \Big\{ \frac{1}{2}\,\|\hat g_{nw}(x) - Y\|_F^2 + \frac{n\lambda_n}{\sum_{i=1}^n K_H(x - X_i)}\, \|Y\|_* \Big\}. \qquad (5)$$
The optimization problem (5) can be solved using the following proposition, restated from Cai et al. [2010]. Proposition 1.
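The equivalence between the two formulations can be seen by completing the square in $Y$; a short derivation (our own):

```latex
\begin{align*}
\frac{1}{2n}\sum_{i=1}^{n} K_H(x-X_i)\,\|Y_i-Y\|_F^2
  &= \frac{1}{2n}\sum_{i=1}^{n} K_H(x-X_i)\,
     \bigl(\|Y_i\|_F^2 - 2\langle Y_i, Y\rangle + \|Y\|_F^2\bigr)\\
  &= \frac{\sum_{i=1}^{n} K_H(x-X_i)}{2n}\,
     \bigl\|Y-\hat g_{nw}(x)\bigr\|_F^2 + C(x),
\end{align*}
```

where $C(x)$ collects terms free of $Y$ and we used $\hat g_{nw}(x)=\sum_i K_H(x-X_i)Y_i/\sum_i K_H(x-X_i)$. Multiplying the penalized objective by $n/\sum_i K_H(x-X_i)$, which does not change the minimizer, yields a prox problem with threshold $\tau = n\lambda_n/\sum_{i=1}^{n} K_H(x-X_i)$.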
Consider the singular value decomposition of a matrix $Y \in \mathbb{R}^{p\times q}$ with rank $r$,
$$Y = U\Sigma V^*, \qquad \Sigma = \mathrm{diag}(\{\sigma_j\}_{1\le j\le r}),$$
where $U$ and $V$ are $p\times r$ and $q\times r$ matrices, respectively, with orthonormal columns, and the singular values $\sigma_j$ are positive. The soft-thresholding operator $D_\tau$ is defined as
$$D_\tau(Y) = U D_\tau(\Sigma)V^*, \qquad D_\tau(\Sigma) = \mathrm{diag}(\{(\sigma_j - \tau)_+\}_{1\le j\le r}), \qquad (6)$$
where $(\cdot)_+$ is the positive part of $(\cdot)$. Then $D_\tau(Y)$ satisfies
$$D_\tau(Y) = \operatorname*{argmin}_{X}\Big\{\frac{1}{2}\|Y - X\|_F^2 + \tau\|X\|_*\Big\},$$
where $\|X\|_*$ is the nuclear norm of the matrix $X$. By Proposition 1, our estimator in (4) can be obtained using the following algorithm.
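The soft-thresholding operator of Proposition 1 translates into a few lines of code. The sketch below (helper names are ours) also checks numerically that the soft-thresholded matrix attains a prox objective value no worse than a few arbitrary candidates:

```python
import numpy as np

def svt(Y, tau):
    """Singular value soft-thresholding operator D_tau(Y) of (6)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # multiplying the columns of U by the thresholded singular values
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_objective(X, Y, tau):
    # 0.5 * ||Y - X||_F^2 + tau * ||X||_*
    return (0.5 * np.linalg.norm(Y - X, 'fro') ** 2
            + tau * np.linalg.svd(X, compute_uv=False).sum())

rng = np.random.default_rng(1)
Y = rng.standard_normal((6, 5))
X_hat = svt(Y, tau=1.0)
# D_tau(Y) minimizes the prox objective, so no candidate should beat it
for cand in (Y, np.zeros_like(Y), 0.5 * Y):
    assert prox_objective(X_hat, Y, 1.0) <= prox_objective(cand, Y, 1.0) + 1e-9
print(np.linalg.matrix_rank(X_hat), np.linalg.matrix_rank(Y))
```

Thresholding can only zero out singular values, so the output rank never exceeds the input rank.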
Algorithm 1
Algorithm to solve the optimization problem (4).
Input: $\{(X_i, Y_i),\, 1\le i\le n\}$, $x$, $H$, $\lambda_n$.
Step 1: Perform the singular value decomposition
$$\hat g_{nw}(x) = \frac{\sum_{i=1}^n K_H(x-X_i)\,Y_i}{\sum_{i=1}^n K_H(x-X_i)} = U\Sigma V^*,$$
where the diagonal matrix $\Sigma = \mathrm{diag}(\{\sigma_j\}_{1\le j\le r})$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $r$ the rank of $\hat g_{nw}(x)$.
Step 2: Set $\tau = n\lambda_n/\sum_{i=1}^n K_H(x-X_i)$, and calculate the soft-thresholding operator $D_\tau(\Sigma) = \mathrm{diag}(\{(\sigma_j-\tau)_+\}_{1\le j\le r})$.
Step 3: Calculate $\hat g(x) = U D_\tau(\Sigma)V^*$.
Output: $\hat g(x)$.

The optimization problem (4) involves two tuning parameters, the bandwidth $h$ and the regularization parameter $\lambda_n$. The choices of these two parameters are critical, as they control the temporal smoothing level and the spatial low-rank level, respectively. In this paper, we derive a Bayesian information criterion (BIC) to select them. Define $\tilde\lambda = n\lambda_n/\sum_{i=1}^n K_H(x-X_i)$ and $\hat Y_i(\tilde\lambda) = \hat g(X_i)$. Without loss of generality, we assume $p \ge q$ and denote the singular values of $\hat Y_i(\tilde\lambda)$ by $b_{i1}(\tilde\lambda)\ge\cdots\ge b_{iq}(\tilde\lambda)\ge 0$. From Algorithm 1, it can be seen that the singular values of $\hat Y_i(\tilde\lambda)$ are the corresponding truncated singular values of $\hat g_{nw}(X_i)$. Since we are considering a least squares error loss in (4), the BIC can be defined as
$$\mathrm{BIC}(\tilde\lambda) = npq\,\log\Big(\frac{1}{npq}\sum_{i=1}^n \|Y_i - \hat Y_i(\tilde\lambda)\|_F^2\Big) + \log(npq)\,\mathrm{df}(\tilde\lambda), \qquad (7)$$
where $\mathrm{df}(\tilde\lambda)$ is given in the following proposition, whose proof is deferred to the Supplementary File. Proposition 2.
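Algorithm 1 is a one-shot computation: a kernel-weighted average followed by a single SVD and soft-thresholding. A self-contained sketch, with a Gaussian kernel and illustrative data as our own choices (the kernel's normalizing constant is dropped, which only rescales the effective $\lambda_n$):

```python
import numpy as np

def fit_low_rank(x, X, Y, h, lam):
    """Algorithm 1: evaluate the regularized estimator g_hat(x).

    x : (s,) point; X : (n, s) covariates; Y : (n, p, q) images;
    h : common bandwidth; lam : regularization parameter lambda_n.
    """
    n = X.shape[0]
    # Step 1: Nadaraya-Watson estimate and its SVD
    w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
    g_nw = np.tensordot(w / w.sum(), Y, axes=(0, 0))
    U, s, Vt = np.linalg.svd(g_nw, full_matrices=False)
    # Step 2: threshold level tau = n * lambda_n / sum_i K_H(x - X_i)
    tau = n * lam / w.sum()
    # Step 3: soft-threshold the singular values
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# a rank-1 signal plus noise; the fit returns a denoised image at x = 0.25
rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 1))
B = np.outer(np.ones(8), np.arange(8.0))
Y = np.stack([np.sin(2 * np.pi * xi) * B + 0.1 * rng.standard_normal((8, 8))
              for xi in X[:, 0]])
est = fit_low_rank(np.array([0.25]), X, Y, h=0.1, lam=0.05)
print(est.shape)  # (8, 8)
```

Taking `lam` very large drives the threshold above every singular value and returns the zero matrix, which is a quick way to sanity-check the thresholding step.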
Denote the singular values of $\hat g_{nw}(X_i)$ by $\sigma_{i1}\ge\sigma_{i2}\ge\cdots\ge\sigma_{ir_i}>0$, and set $\sigma_{ik}=0$ for $k>r_i$. An unbiased estimator of the degrees of freedom $\mathrm{df}(\tilde\lambda)$ is
$$\widehat{\mathrm{df}}(\tilde\lambda) = K_H(0)\sum_{i=1}^n \frac{\widehat{\mathrm{df}}_i(\tilde\lambda)}{\sum_{j=1}^n K_H(X_i - X_j)},$$
where
$$\widehat{\mathrm{df}}_i(\tilde\lambda) = \sum_{k=1}^q \mathbf{1}\{b_{ik}(\tilde\lambda)>0\}\Big\{1 + \sum_{\substack{1\le j\le p,\; j\ne k,\; k\le r_i}} \frac{\sigma_{ik}(\sigma_{ik}-\tilde\lambda)}{\sigma_{ik}^2-\sigma_{ij}^2} + \sum_{\substack{1\le j\le q,\; j\ne k,\; k\le r_i}} \frac{\sigma_{ik}(\sigma_{ik}-\tilde\lambda)}{\sigma_{ik}^2-\sigma_{ij}^2}\Big\}. \qquad (8)$$

Theoretical Properties

In this section, we present theoretical results for the estimation procedure in (4), including a risk bound for the regularized estimator and a rank consistency result. Denote the strength of regularization by $\lambda_n$ and the true response by $g(X)$ given covariates $X$. Assume $g(X)$ has unknown rank $r$, and denote the global minimizer of (4) by $\hat g(X)$. For any two sequences of real numbers $a_n$ and $b_n$, we write $a_n \asymp b_n$ if there exist universal positive constants $C_1$ and $C_2$ such that $C_1 b_n \le a_n \le C_2 b_n$. We define $a\vee b = \max(a,b)$ and $a\wedge b = \min(a,b)$ for any $a, b \in \mathbb{R}$. With a slight abuse of notation, we use $C$ to denote a universal constant whose value may change with the context but does not affect the results. For a matrix $A$ and a sequence of real numbers $a_n$, we write $A = O_p(a_n)$ or $A = o_p(a_n)$ if every element of $A$ is $O_p(a_n)$ or $o_p(a_n)$. Let $g_{jk}(x)$ be the $(j,k)$th component of $g(x)$. We make the following assumptions: Assumption 1.
We assume that $|g_{jk}(x) - g_{jk}(y)| < C\|x-y\|^{\alpha}$ with $\alpha > 0$, $1\le j\le p$, $1\le k\le q$, for any $\|x-y\| < \delta$ and some $C > 0$, when $\delta > 0$ is sufficiently small. Assumption 2.
Assume that $npqh^{2\alpha+s} \to \infty$, $nh^s \to \infty$, and $pqh^{2\alpha} \to 0$ as $n \to \infty$. Assumption 3. We assume that the kernel function $K(\cdot)$ is bounded on $\mathbb{R}^s$; in other words, there exists a constant $k_{\max}>0$ such that $K(x)\le k_{\max}$ and $K_H(x) \le h^{-s}k_{\max}$ for any $x \in \mathbb{R}^s$. Moreover, we assume that there exist constants $C_f, c_f > 0$ such that the density function of the covariate $x$ satisfies $c_f \le f(x) \le C_f$. Assumption 4.
Assume that $n^{-1/2}(p\wedge q) \to 0$, $(pq)^{1/2}h^{\alpha}(p\wedge q)^{1/2}\lambda_n^{-1} \to 0$, $\lambda_n(p\wedge q) \to 0$, and $pq(p\wedge q)/(n\lambda_n) \to 0$ as $n \to \infty$.

Assumption 1 imposes $\alpha$-smoothness on each element of the nonparametric function $g(x)$. This assumption is commonly used in the multivariate function estimation literature, such as Scott [2015]; it is possible to extend the results to the anisotropic case in future work based on the techniques developed in this paper. Assumptions 2 and 4 are required for estimation consistency and rank consistency. Assumption 3 is satisfied by most kernel density functions, and the boundedness condition on the density function of $x$ holds if $x$ is defined on a compact support. With these assumptions, we can state the two main theorems of this paper. Their proofs are given in the Supplementary File. Theorem 3.
Suppose that Assumptions 1–3 hold. We consider two cases for $p$ and $q$ (i.e., whether they diverge or not).

(1) If both $p$ and $q$ are fixed, let $h \asymp (\log n/n)^{\frac{1}{2\alpha+s} \wedge \frac{1}{s}}$ and $\lambda_n \asymp h^{\alpha}$; then
$$\|\hat g(x) - g(x)\|_F^2 \le C r \Big(\frac{\log n}{n}\Big)^{\frac{2\alpha}{2\alpha+s} \wedge \frac{2\alpha}{s}}.$$

(2) If $p \vee q \to \infty$ and $p \vee q = o\big(n^{\frac{\alpha}{2\alpha+s}} \wedge (n/\log n)^{\frac{\alpha}{s}}\big)$, then by letting $h \asymp n^{-\frac{1}{2\alpha+s}} \vee (\log n/n)^{\frac{1}{s}}$ and $\lambda_n = h^{\alpha}(pq)^{1/2}$, we have
$$\|\hat g(x) - g(x)\|_F^2 \le C\, pqr \Big(n^{-\frac{2\alpha}{2\alpha+s}} \vee \big(\log n/n\big)^{\frac{2\alpha}{s}}\Big).$$

Note that the risk bound involves two quantities, $n^{-2\alpha/(2\alpha+s)}$ and $(\log n/n)^{2\alpha/s}$. As the number of predictors $s$ increases, it becomes more difficult to estimate $g(x)$. Meanwhile, $h^s$ is involved when proving the strongly restricted convexity of the loss function; a larger value of $s$ indicates a smaller probability of the loss function being strongly restricted convex. In contrast, $\alpha$ describes the smoothness of $g(x)$; a larger $\alpha$ leads to a smaller risk bound and a faster convergence rate. Remark 4. If $p$ and $q$ are fixed and $s \le 2\alpha$, the optimal bandwidth $h$ can be chosen arbitrarily close to $n^{-1/(2\alpha+s)}$, which leads to the same convergence rate (up to an additional logarithmic factor) as for estimating an $\alpha$-smooth, $s$-dimensional function without regularization. Remark 5.
When $\max(p,q) \to \infty$, if we further assume $s \le 2\alpha$ and choose $nh^{2\alpha+s} \asymp (\sqrt p+\sqrt q)\,pq$ and $\lambda_n \asymp (pq)^{1/2}\big((\sqrt p+\sqrt q)/(npq)\big)^{\frac{2\alpha}{2\alpha+s}}$, then as we let $\lambda_n \to 0$ and $nh^s \to \infty$, we obtain $\max(p,q) = o(n^{\alpha/(2\alpha+s)})$. This is the necessary condition for $\hat g(x)$ to be consistent. The assumption $s \le 2\alpha$ rules out the case where there are too many covariates in the model.

Next, we present the rank consistency result. We consider three general cases for different values of $p$ and $q$ (i.e., whether they diverge or not) and discuss the corresponding choices of $\lambda_n$ and $h$ as follows:

(C1) If both $p, q$ are fixed, we can choose $h \asymp (\log n/n)^{\frac{1}{2\alpha+s} \wedge \frac{1}{s}}$ and $\lambda_n \asymp h^{\alpha}\log n$.
(C2) If $p \wedge q$ is finite and $p \vee q \to \infty$ satisfying $(\log n)(p\vee q) = o\big((n/\log n)^{\frac{\alpha}{s}} \wedge n^{\frac{\alpha}{2\alpha+s}}\big)$, then we choose $h = (\log n/n)^{\frac{1}{s}} \vee n^{-\frac{1}{2\alpha+s}}$ and $\lambda_n \asymp (p\vee q)^{1/2}h^{\alpha}\log n$.
(C3) If $p \asymp q$ and $p \to \infty$, then we let $h \asymp (\log n/n)^{\frac{1}{s}} \vee n^{-\frac{1}{2\alpha+s}}$ and $\lambda_n \asymp p\, h^{\alpha}(\log n)^2$. In addition, we assume $(\log n)\, p = o(h^{-\alpha/2})$. Theorem 6.
Suppose that Assumptions 1–4 hold and one of the cases (C1)–(C3) holds. Then $\hat g(x)$ is consistent and rank consistent, i.e.,
$$P\{\mathrm{rank}(\hat g(x)) = \mathrm{rank}(g(x))\} \to 1, \quad \text{as } n \to \infty.$$

It can be seen that rank consistency requires stronger assumptions on $p, q$ compared with those in Theorem 3. For instance, the required $\lambda_n$ is much larger than the one from the previous theorem. Meanwhile, $pq$ is not allowed to be greater than $n$ for rank consistency.

Simulation
In this section, we evaluate the empirical performance of our method and other competing methods. We consider both univariate and multivariate $X$, different nonparametric functions, and different correlation structures of the random error $E_i$, where $E_i = Y_i - g(x_i)$.

Setting I: We set the dimensions of the image to $p = q = 64$, and set the $(j,k)$th element of the nonparametric function to
$$g(x)_{jk} = \{\sin(10\pi x) + \cos(10\pi x) + 0.01(j+k)\}\cdot B_{jk}, \qquad 1\le j,k\le 64,$$
where $0\le x\le 1$ and $B_{jk}$ is the $(j,k)$th element of the true signal $B$. The true signal $B$ is generated from a 64-by-64 image, where we consider three shapes: a cross, a square, and a T-shape. We have plotted the true shapes in Figure 1(a)(b)(c), where we assign $B$ a value of 5 on black regions and 0 on white regions. The sample size is set at $n = 200$ or $500$. The covariates $\{x_i\}$, $i = 1, 2, \ldots, n$, are equally spaced on $[0,1]$. The response $Y_i$ is generated from $Y_i = g(x_i) + E_i$, where the $\mathrm{vec}(E_i)$'s are i.i.d. $N(0, I_{pq})$. The optimal bandwidth $h$ and $\lambda_n$ are selected by BIC. For the kernel function, we use the standard Gaussian kernel defined as $K(x) = \exp(-x^2/2)/\sqrt{2\pi}$. We compare our method with the naive Nadaraya-Watson estimator and the Lasso estimator, where the Lasso estimator is obtained by solving the following optimization problem
$$\hat g_{lasso}(x) = \operatorname*{argmin}_Y\; \Big\{\frac{1}{2n}\sum_{i=1}^n K_H(x-X_i)\,\|Y_i - Y\|_F^2 + \lambda_n \|Y\|_1\Big\}, \qquad (9)$$
where $\|Y\|_1$ is defined as the sum of the absolute values of all the elements of the matrix $Y$. We also use the BIC defined in (7) to choose the tuning parameter for the Lasso.
Here the degrees of freedom can be obtained by the chain rule as
$$\widehat{\mathrm{df}} = \sum_{i=1}^n \mathrm{tr}\Big(\frac{\partial\,\mathrm{vec}(\hat Y_i)}{\partial\,\mathrm{vec}(Y_i)}\Big) = \sum_{i=1}^n\sum_{j=1}^p\sum_{k=1}^q \frac{\partial \hat Y_{ijk}}{\partial \hat g_{ijk}^{(nw)}}\,\frac{\partial \hat g_{ijk}^{(nw)}}{\partial Y_{ijk}} = \sum_{i=1}^n \frac{K_H(0)}{\sum_{j=1}^n K_H(X_i - X_j)}\,\big\|\mathrm{sign}(\mathrm{vec}(\hat Y_i))\big\|_1,$$
where we have used the fact that $\partial \hat Y_{ijk}/\partial \hat g_{ijk}^{(nw)} = |\mathrm{sign}(\hat Y_{ijk})|$.

Figure 1: (a)-(c): true signals; (d)-(f): recovered signals.

In each Monte Carlo simulation, we generate $n$ samples as the training set and another 500 samples as the test set. We report the integrated error $\int_x \|\hat Y(x) - Y(x)\|_F^2\,dx$, which can be approximated by $\frac{1}{500}\sum_{i=1}^{500}\|\hat Y(x_i^{test}) - Y(x_i^{test})\|_F^2$. Table 1 shows the average integrated test error of our method, the naive Nadaraya-Watson estimator, and the Lasso estimator based on 100 Monte Carlo replicates. We also report the average rank selected by our method using BIC, defined as $\frac{1}{n}\sum_{i=1}^n \mathrm{rank}(\hat Y(x_i))$.

Table 1: Simulation results for Setting I: mean of integrated test error and associated standard errors obtained from our method, the NW estimator, and the Lasso; the average selected rank and the true rank are reported for the three shapes B. The results are based on 100 Monte Carlo replications.

          Shape    Our method   NW           Lasso        Selected rank  True rank
 n = 200  Cross    4458 (0.70)  6024 (0.96)  5148 (0.81)  3.53 (0.004)   4
          Square   4288 (0.82)  6107 (0.99)  4875 (0.83)  2.00 (0.000)   2
          T-shape  4472 (0.76)  6094 (0.98)  5193 (0.92)  3.22 (0.006)   4
 n = 500  Cross    4306 (0.41)  5009 (0.52)  4560 (0.48)  3.99 (0.000)   4
          Square   4186 (0.41)  4803 (0.48)  4440 (0.45)  2.01 (0.000)   2
          T-shape  4255 (0.60)  5042 (0.51)  4579 (0.49)  3.52 (0.007)   4

From the results, we can see that our method performs better than the Nadaraya-Watson estimator and the Lasso estimator in all cases. In addition, our method estimates the true rank of the image accurately.
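For illustration, a scaled-down sketch of Setting I (smaller images and sample size than in the paper, so it runs quickly; the constants here are illustrative) that compares the regularized estimator with the plain Nadaraya-Watson estimate at a single evaluation point:

```python
import numpy as np

def fit_low_rank(x0, X, Y, h, lam):
    # kernel-weighted average followed by singular value soft-thresholding
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)
    g_nw = np.tensordot(w / w.sum(), Y, axes=(0, 0))
    U, s, Vt = np.linalg.svd(g_nw, full_matrices=False)
    tau = X.shape[0] * lam / w.sum()
    return (U * np.maximum(s - tau, 0.0)) @ Vt, g_nw

rng = np.random.default_rng(3)
p = q = 16
B = np.zeros((p, q))
B[6:10, :] = 5.0
B[:, 6:10] = 5.0                                  # cross-shaped true signal
jk = np.add.outer(np.arange(1, p + 1), np.arange(1, q + 1))

def g(x):                                         # nonlinear mean image
    return (np.sin(10 * np.pi * x) + np.cos(10 * np.pi * x) + 0.01 * jk) * B

n = 100
X = np.linspace(0.0, 1.0, n)
Y = np.stack([g(x) + rng.standard_normal((p, q)) for x in X])

x0 = 0.37
est, nw = fit_low_rank(x0, X, Y, h=0.05, lam=0.02)
err_est = np.linalg.norm(est - g(x0))             # error of regularized estimate
err_nw = np.linalg.norm(nw - g(x0))               # error of plain NW estimate
print(est.shape)  # (16, 16)
```

In the full Monte Carlo study, these pointwise errors would be averaged over a held-out test grid to obtain the integrated test error reported in the tables.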
We have plotted the recovered signals from one randomly selected MonteCarlo study in Figure 1(d)(e)(f), and our method manages to recover the true signals very well. Setting II : In this setting, we consider the case where the errors vec ( E i ) are correlatedacross different subjects i ’s and the pixels within the same random error matrix E i are alsocorrelated. Define e = ( vec ( E i ) T , . . . , vec ( E n ) T ) T ∈ R pqn . We assume e ∼ N (0 , Σ) , where Σ = Σ ⊗ Σ ∈ R pqn × pqn . Here Σ is a n × n matrix representing the correlation betweendifferent subjects ≤ i ≤ n , Σ is a pq × pq matrix representing the correlation among differentpixels of the 2D image, and ⊗ is the Kronecker product. This decomposition of Σ is oftenreferred to as the separability of the covariance matrix, which was studied in the literaturesuch as De Munck et al. [2002], Dawid [1981]. For Σ , we assume it has a subject-wise 1Dautoregressive structure. In particular, we set the ( i , i ) -th element of Σ as . | i − i | for ≤ i , i ≤ n . For Σ , we assume it is incorporated with a pixel-wise 2D autoregressive structure.Specifically, we set the ( j + ( k − q, j + ( k − q ) -th element of Σ as . | j − j | + | k − k | for ≤ j , j ≤ p and ≤ k , k ≤ q . The average integrated test errors by three methods and theaverage selected rank of our method are summarized in Table 2. From the results, we can seethat our methods still outperforms Nadaraya-Watson estimator and Lasso estimator in all cases.Compared with independent error case, we may over select the rank a bit, possibly due to theerror correlations, however, the average integrated errors are still at the similar levels for bothcases. Setting III : We consider shapes of the image with the same pixel value as setting I. We set the ( j, k ) -th element of the nonparametric function g ( x ) jk = { sin(2 π (cid:107) x (cid:107) ) + cos(2 π (cid:107) x (cid:107) ) + 0 . 
j + k ) } ∗ B jk , x ∈ [0 , × [0 , , ≤ j, k ≤ , where we consider the same three shapes of thetrue image B and (cid:107) x (cid:107) is the l -norm of x . The random error vec ( E i ) ’s are i.i.d. N (0 , I pq ) . Thecovariates x i consist of a set of { x jk } , ≤ j ≤ , ≤ k ≤ , that are equally spaced on [0 , × [0 , . The sample sizes n = 200 , are considered, and the multivariate Gaussiankernel defined as K ( x ) = exp( −(cid:107) x (cid:107) / / π is used. In each Monte Carlo simulation, we15able 2: Simulation results for Setting II: mean of integrated test error and associated standarderrors obtained from our method, NW estimator, and Lasso, the average selected rank andtrue rank are reported for three different shapes B . The results are based on 100 Monte Carloreplications.Shape Our method NW Lasso Selected rank True rankCross 4656 (2.23) 6120 (3.47) 5528 (3.49) 6.64 (0.009) 4n = 200 Square 4463 (2.31) 5785 (3.27) 5169 (4.72) 4.42 (0.009) 2Tshape 4667 (2.49) 6017 (3.65) 5591 (3.74) 6.52 (0.008) 4Cross 4403 (1.23) 5240 (1.96) 4769 (1.71) 6.90 (0.008) 4n = 500 Square 4296 (1.10) 5036 (1.64) 4692 (1.68) 4.76 (0.000) 2Tshape 4404 (1.21) 5057 (1.60) 4797 (1.69) 6.75 (0.008) 4generate n samples as the training set and another 500 samples as the test set. We report theaverage integrated test error obtained by our method, the naive Nadaraya-Watson estimatorand Lasso estimator and the average selected rank of our method based on 100 Monte Carloreplicates in Table 3.Table 3: Simulation results for Setting III: mean of integrated test error and associated standarderrors obtained from our method, NW estimator, and Lasso, the average selected rank andtrue rank are reported for three different shapes B . 
The results are based on 100 Monte Carlo replications.

          Shape    Our method    NW            Lasso         Selected rank   True rank
n = 200   Cross    4711 (0.86)   7021 (1.17)   5759 (1.01)   4.03 (0.002)    4
          Square   4518 (0.80)   6723 (1.09)   5326 (1.05)   2.04 (0.002)    2
          Tshape   4719 (0.83)   7137 (1.11)   5829 (1.15)   4.34 (0.005)    4
n = 500   Cross    4647 (0.49)   5562 (0.62)   4843 (0.48)   4.84 (0.006)    4
          Square   4281 (0.42)   5506 (0.58)   4649 (0.45)   2.01 (0.001)    2
          Tshape   4376 (0.45)   5620 (0.59)   4875 (0.49)   4.12 (0.002)    4

Setting IV: We consider the same setting as Setting III except that the random errors vec(E_i) are correlated across different subjects i and the pixels within the same random error matrix E_i are also correlated. The random errors vec(E_i) are generated in the same way as in Setting II. The average integrated test errors of the three methods and the average selected rank of our method are summarized in Table 4.

Table 4: Simulation results for Setting IV: mean of integrated test error and associated standard errors obtained from our method, the NW estimator, and the Lasso, together with the average selected rank and the true rank, for three different shapes B. The results are based on 100 Monte Carlo replications.

          Shape    Our method    NW            Lasso         Selected rank   True rank
n = 200   Cross    4894 (2.58)   7164 (3.89)   5804 (4.13)   5.40 (0.008)    4
          Square   4642 (2.43)   6858 (3.57)   5360 (4.0)    3.14 (0.009)    2
          Tshape   4910 (2.66)   7283 (3.91)   5880 (4.36)   5.36 (0.008)    4
n = 500   Cross    4779 (1.73)   5687 (2.38)   5067 (1.86)   6.38 (0.008)    4
          Square   4574 (1.51)   5614 (2.22)   4815 (1.62)   4.12 (0.001)    2
          Tshape   4797 (1.55)   5745 (2.09)   5080 (4.72)   6.34 (0.008)    4

The findings in the multivariate case (Settings III and IV) are consistent with those in the univariate case. The simulation results in this section confirm the excellent performance of the proposed nonparametric estimation procedure.

Calcium imaging has been used as a promising technique to monitor the dynamic activity of neuronal populations.
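As an aside on the simulation designs, the separable error draw shared by Settings II and IV can be sketched in a few lines by exploiting the Kronecker structure, so the pqn × pqn covariance matrix is never formed. This is an illustrative sketch, not the authors' code: the function names are ours, and the AR parameter 0.5 matches the autoregressive structures described above.

```python
import numpy as np

def ar1_cov(m, rho):
    # AR(1) correlation matrix: the (i, j)-th entry is rho**|i - j|.
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sample_separable_errors(n, p, q, rho=0.5, rng=None):
    """Draw n error matrices E_i (p x q) whose stacked vectorization follows
    N(0, Sigma1 (x) Sigma2): Sigma1 is the subject-wise AR(1) matrix and
    Sigma2 gives pixel correlation rho**(|j1 - j2| + |k1 - k2|)."""
    rng = np.random.default_rng(rng)
    L1 = np.linalg.cholesky(ar1_cov(n, rho))  # subject factor
    Lp = np.linalg.cholesky(ar1_cov(p, rho))  # row factor of Sigma2
    Lq = np.linalg.cholesky(ar1_cov(q, rho))  # column factor of Sigma2
    Z = rng.standard_normal((n, p, q))
    E = np.einsum('ij,jpq->ipq', L1, Z)   # correlate across subjects
    E = np.einsum('rp,ipq->irq', Lp, E)   # correlate across image rows
    E = np.einsum('sq,irq->irs', Lq, E)   # correlate across image columns
    return E
```

Applying the three Cholesky factors along the three axes is equivalent to multiplying the stacked Gaussian vector by a Cholesky factor of Σ1 ⊗ Σ2, but costs far less memory.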
We apply the proposed method to a one-photon calcium imaging dataset collected by Ilana Witten's lab at the Princeton Neuroscience Institute [Petersen et al., 2018], which can be downloaded from https://ajpete.com/software . Calcium imaging is an important fluorescent microscopy technique for monitoring a great variety of neuronal processes simultaneously [Berridge, 1998, Andilla and Hamprecht, 2014]. The calcium imaging data can be viewed as a video clip (i.e., a collection of 2D images recorded over a period of time) that presents the location and time of neuron firing [Apthorpe et al., 2016, Petersen et al., 2018]. Each pixel in a frame is continuous-valued, and larger values indicate higher fluorescent intensities caused by greater calcium concentrations. The calcium imaging video we use consists of 3000 frames sampled at 10 Hz. An example frame randomly selected from the video is shown in Figure 4(a).

Figure 2: (a) A sequence of frames; (b) scatterplot for a fixed voxel at coordinate (200, 60) over frames.

Figure 3 gives the estimated fluorescent intensities versus the true values of two randomly selected pixels across the 3000 frames. The optimal bandwidth and regularization parameter are selected by the proposed BIC criterion. The average rank selected by our method across all frames has a standard error of about 1. We also plot the estimated image of a randomly selected frame by our method in Figure 4(b). From that figure, we can see that our method amplifies potential neuron signals, while weakening unclear and smaller neurons.

We further evaluate the prediction performance of our method by cross-validation. We compare our method with two nonparametric regression methods: the Nadaraya-Watson regression and the Lasso defined in (3) and (9), respectively. We also compare our method with the low-rank matrix response linear regression (L2RM) method [Kong et al., 2019]. As shown in Table 5, our method reaches the smallest leave-one-out cross-validation error among the four methods.
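The leave-one-out comparison above can be sketched concretely. The snippet below implements the naive Nadaraya-Watson baseline with a Gaussian kernel for matrix responses, together with the leave-one-out cross-validation error; it is a minimal illustration of the baseline, not the paper's nuclear-norm penalized estimator, and the function names are ours.

```python
import numpy as np

def nw_predict(x0, xs, Ys, h):
    """Naive Nadaraya-Watson estimate of a matrix response at covariate x0.

    xs: (n,) or (n, d) covariates; Ys: (n, p, q) matrix responses; h: bandwidth.
    The prediction is a kernel-weighted average of the observed images.
    """
    xs = np.asarray(xs, dtype=float).reshape(len(Ys), -1)
    d2 = np.sum((xs - np.ravel(x0)) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * h ** 2))
    w = w / w.sum()
    return np.tensordot(w, Ys, axes=1)   # weighted average image

def loocv_error(xs, Ys, h):
    """Leave-one-out CV: mean squared Frobenius error over held-out frames."""
    n = len(Ys)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        pred = nw_predict(xs[i], xs[keep], Ys[keep], h)
        errs.append(np.sum((Ys[i] - pred) ** 2))
    return float(np.mean(errs))
```

The same loop applies to any of the compared estimators by swapping out `nw_predict`, which is how the cross-validation errors of the four methods can be put on equal footing.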
For the Lasso estimator, the selected tuning parameter is always zero, which makes the Lasso estimator equivalent to the Nadaraya-Watson estimator. This suggests that the sparsity assumption is not plausible for the calcium imaging application, while low-rankness is a more reasonable assumption. The linear model L2RM has the highest prediction error, which supports the use of a nonparametric model for this data set.

Table 5: Leave-one-out cross-validation errors (Err) and associated standard deviations obtained by our method, the NW estimator, the Lasso, and L2RM for the calcium imaging data.

We also apply our method to an EEG dataset, which is available at https://archive.ics.uci.edu/ml/datasets/EEG+Database . The data was collected from 122 subjects by the Neurodynamics Laboratory to examine the EEG correlates of genetic predisposition to alcoholism. More details about the study can be found in Zhang et al. [1995]. Among the 122 subjects, 77 were alcoholic individuals and 45 were controls. The dataset includes voltage values from 64 electrodes placed on each subject's scalp, sampled at 256 Hz (3.9-msec epochs) for 1 second. Each subject was exposed to three stimuli: a single stimulus, two matched stimuli, and two unmatched stimuli. For each subject, we use the average of all trials under the single-stimulus condition, which results in a 64 × 256 matrix. Among those 122 subjects, we randomly select one alcoholic individual and one control, and analyze the dynamic functional connectivity among different electrodes across time. The simplest analytical strategy for investigating dynamic functional connectivity consists of segmenting the time courses from spatial locations into a set of temporal windows, inside which their pairwise connectivity is probed. By gathering functional connectivity descriptive measures over subsequent windows, fluctuations in connectivity can be captured. The basic sliding window framework has been applied by the neuroimaging community to understand how brain dynamics relate to our cognitive abilities [Kucyi and Davis, 2014, Elton and Gao, 2015, Madhyastha and Grabowski, 2014], are affected by brain disorders [Sakoğlu et al., 2010, Jones et al., 2012], or compare to other functional or structural brain measures [Leonardi et al., 2013, Tagliazucchi et al., 2012, Liégeois et al., 2016].
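The sliding-window construction can be sketched as follows. The window size of 100 and the 64 × 256 data matrix match the analysis described in the text; the function name and the use of `np.cov` are our choices for illustration.

```python
import numpy as np

def sliding_window_covariances(X, window=100):
    """Dynamic functional connectivity via sliding windows.

    X: electrodes x time array (here 64 x 256). Returns one covariance
    matrix per window position, i.e. T - window + 1 matrices, each of
    size electrodes x electrodes.
    """
    d, T = X.shape
    covs = []
    for start in range(T - window + 1):
        seg = X[:, start:start + window]   # one temporal window
        covs.append(np.cov(seg))           # pairwise connectivity in window
    return np.stack(covs)
```

With 256 time points and a window of 100, this yields 256 - 100 + 1 = 157 covariance matrices of size 64 × 64 per subject, and the resulting sequence can then be treated as a matrix-valued response indexed by time.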
More specifically, we use a moving window of size 100 to calculate a series of covariance matrices along the time dimension of length 256, resulting in 157 covariance matrices of size 64 × 64 for each individual.

We apply the proposed method to analyze the dynamic change of the covariance structure over time for both the alcoholic individual and the control. The optimal bandwidth and regularization parameter are selected by BIC. Figure 5 shows the estimated images of the 10th frame by our method for the alcoholic individual and the control, respectively. We observe a significant structural difference in their covariance matrices. Specifically, the alcoholic individual has a more complex covariance structure than the control. Moreover, the average selected rank for the alcoholic individual is 22.44 (SE = 0.93), compared with 6.82 (SE = 0.87) for the control. This can be explained by the drastic fluctuation across time in the EEG signals of the alcoholic individual, compared with the stable variation in the control. We further evaluate our method by the leave-one-out cross-validation error and compare it with the Nadaraya-Watson regression, the Lasso, and L2RM. As shown in Table 6, our method achieves the smallest leave-one-out cross-validation error among the four methods. For the Lasso estimator, the selected tuning parameter is always zero; in other words, the Lasso estimator is the same as the Nadaraya-Watson estimator for this data application, which implies that the low-rankness assumption is more reasonable than sparsity. We also notice that the linear L2RM has a much higher estimation error than the nonparametric methods.
This indicates a strong nonlinear pattern in the EEG signals of both the alcoholic and control subjects.

Table 6: Leave-one-out cross-validation errors (SE) by four methods for the EEG data.

            Our method     NW               Lasso            L2RM
Alcoholic   1.05 (1.20)    2.20 (2.84)      2.20 (2.84)      908.29 (623.54)
Control     9.57 (61.44)   21.87 (118.06)   21.87 (0.75)     22513.17 (16763.92)

Figure 5: (a) Estimated 10th frame for the alcoholic individual; (b) estimated 10th frame for the control.

Supplementary File
Additional technical details and proofs of the theorems are provided in the Supplementary File. Computational code is available at https://github.com/vivifox/Code .

References
Ferran Diego Andilla and Fred A Hamprecht. Sparse space-time deconvolution for calcium image analysis. In Advances in Neural Information Processing Systems, pages 64–72, 2014.

Noah Apthorpe, Alexander Riordan, Robert Aguilar, Jan Homann, Yi Gu, David Tank, and H Sebastian Seung. Automatic neuron detection in calcium imaging data using convolutional networks. In Advances in Neural Information Processing Systems, pages 3270–3278, 2016.

Michael J Berridge. Neuronal calcium signaling. Neuron, 21(1):13–26, 1998.

Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine, 34(4):537–541, 1995.

Bharat B Biswal, Maarten Mennes, Xi-Nian Zuo, Suril Gohel, Clare Kelly, Steve M Smith, Christian F Beckmann, Jonathan S Adelstein, Randy L Buckner, Stan Colcombe, et al. Toward discovery science of human brain function. Proceedings of the National Academy of Sciences, 107(10):4734–4739, 2010.

Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.

Vince D Calhoun and Tulay Adali. Time-varying brain connectivity in fMRI data: whole-brain data-driven approaches for capturing and characterizing dynamic states. IEEE Signal Processing Magazine, 33(3):52–66, 2016.

Vince D Calhoun, Robyn Miller, Godfrey Pearlson, and Tulay Adalı. The chronnectome: time-varying connectivity networks as the next frontier in fMRI data discovery. Neuron, 84(2):262–274, 2014.

Catie Chang, Zhongming Liu, Michael C Chen, Xiao Liu, and Jeff H Duyn. EEG correlates of time-varying BOLD functional connectivity. NeuroImage, 72:227–236, 2013.

Kun Chen, Hongbo Dong, and Kung-Sik Chan. Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4):901–920, 2013.

Eswar Damaraju, Elena A Allen, Aysenil Belger, Judith M Ford, S McEwen, DH Mathalon, BA Mueller, GD Pearlson, SG Potkin, A Preda, et al. Dynamic functional connectivity analysis reveals transient states of dysconnectivity in schizophrenia. NeuroImage: Clinical, 5:298–308, 2014.

A. P. Dawid. Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika, 68(1):265–274, 1981.

Jan Casper De Munck, Hilde M Huizenga, Lourens J Waldorp, and RA Heethaar. Estimating stationary dipoles from MEG/EEG data contaminated with spatially and temporally correlated background noise. IEEE Transactions on Signal Processing, 50(7):1565–1572, 2002.

Shanshan Ding and Dennis Cook. Matrix variate regressions and envelope models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):387–408, 2018.

Amanda Elton and Wei Gao. Task-related modulation of functional connectivity variability and its behavioral correlations. Human Brain Mapping, 36(8):3260–3272, 2015.

Gwenda Engels, Annemarie Vlaar, Brónagh McCoy, Erik Scherder, and Linda Douw. Dynamic functional connectivity and symptoms of Parkinson's disease: a resting-state fMRI study. Frontiers in Aging Neuroscience, 10:388, 2018.

Michael D Fox, Abraham Z Snyder, Justin L Vincent, Maurizio Corbetta, David C Van Essen, and Marcus E Raichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences, 102(27):9673–9678, 2005.

Karl J Friston. Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapping, 2(1-2):56–78, 1994.

Christine Grienberger and Arthur Konnerth. Imaging calcium in neurons. Neuron, 73(5):862–885, 2012.

Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.

Fritjof Helmchen and Winfried Denk. Deep tissue two-photon microscopy. Nature Methods, 2(12):932, 2005.

Barry Horwitz. The elusive concept of brain connectivity. NeuroImage, 19(2):466–470, 2003.

David T Jones, Prashanthi Vemuri, Matthew C Murphy, Jeffrey L Gunter, Matthew L Senjem, Mary M Machulda, Scott A Przybelski, Brian E Gregg, Kejal Kantarci, David S Knopman, et al. Non-stationarity in the "resting brain's" modular architecture. PLOS ONE, 7(6):e39731, 2012.

Dehan Kong, Baiguo An, Jingwen Zhang, and Hongtu Zhu. L2RM: Low-rank linear regression models for high-dimensional matrix responses. Journal of the American Statistical Association, to appear, 2019.

Aaron Kucyi and Karen D Davis. Dynamic functional connectivity of the default mode network tracks daydreaming. NeuroImage, 100:471–480, 2014.

Nora Leonardi, Jonas Richiardi, Markus Gschwind, Samanta Simioni, Jean-Marie Annoni, Myriam Schluep, Patrik Vuilleumier, and Dimitri Van De Ville. Principal components of functional connectivity: a new approach to study dynamic brain connectivity during rest. NeuroImage, 83:937–950, 2013.

Raphaël Liégeois, Erik Ziegler, Christophe Phillips, Pierre Geurts, Francisco Gómez, Mohamed Ali Bahri, BT Thomas Yeo, Andrea Soddu, Audrey Vanhaudenhuyse, Steven Laureys, et al. Cerebral functional connectivity periodically (de)synchronizes with anatomical constraints. Brain Structure and Function, 221(6):2985–2997, 2016.

Tara M Madhyastha and Thomas J Grabowski. Age-related differences in the dynamic architecture of intrinsic networks. Brain Connectivity, 4(4):231–241, 2014.

Bu-Qing Mao, Farid Hamzei-Sichani, Dmitriy Aronov, Robert C Froemke, and Rafael Yuste. Dynamics of spontaneous activity in neocortical slices. Neuron, 32(5):883–898, 2001.

Haijing Niu, Zhaojun Zhu, Mengjing Wang, Xuanyu Li, Zhen Yuan, Yu Sun, and Ying Han. Abnormal dynamic functional connectivity and brain states in Alzheimer's disease: functional near-infrared spectroscopy study. Neurophotonics, 6(2):025010, 2019.

Ashley Petersen, Noah Simon, and Daniela Witten. SCALPEL: extracting neurons from calcium imaging data. The Annals of Applied Statistics, 12(4):2430, 2018.

Eftychios A Pnevmatikakis, Keith Kelleher, Rebecca Chen, Petter Saggau, Kresimir Josic, and Liam Paninski. Fast spatiotemporal smoothing of calcium measurements in dendritic trees. PLoS Computational Biology, 8(6):e1002569, 2012.

Maria Giulia Preti, Thomas AW Bolton, and Dimitri Van De Ville. The dynamic functional connectome: state-of-the-art and perspectives. NeuroImage, 160:41–54, 2017.

Vahid Rahmati, Knut Kirmse, Dimitrije Marković, Knut Holthoff, and Stefan J Kiebel. Inferring neuronal dynamics from calcium imaging data using biophysical models and Bayesian inference. PLoS Computational Biology, 12(2):e1004736, 2016.

Ünal Sakoğlu, Godfrey D Pearlson, Kent A Kiehl, Y Michelle Wang, Andrew M Michael, and Vince D Calhoun. A method for evaluating dynamic functional network connectivity and task-modulation: application to schizophrenia. Magnetic Resonance Materials in Physics, Biology and Medicine, 23(5-6):351–366, 2010.

David W Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, 2015.

Christoph Stosiek, Olga Garaschuk, Knut Holthoff, and Arthur Konnerth. In vivo two-photon calcium imaging of neuronal networks. Proceedings of the National Academy of Sciences, 100(12):7319–7324, 2003.

Enzo Tagliazucchi, Frederic Von Wegner, Astrid Morzelewski, Verena Brodbeck, and Helmut Laufs. Dynamic BOLD functional connectivity in humans and its electrophysiological correlates. Frontiers in Human Neuroscience, 6, 2012.

Richard W Tsien and Roger Y Tsien. Calcium channels, stores, and oscillations. Annual Review of Cell Biology, 6(1):715–760, 1990.

Xiao Wang and Hongtu Zhu. Generalized scalar-on-image regression models via total variation. Journal of the American Statistical Association, 112(519):1156–1168, 2017.

X L Zhang, H Begleiter, B Porjesz, W Wang, and A Litke. Event related potentials during object recognition tasks. Brain Research Bulletin, 38:531–538, 1995.

Hua Zhou and Lexin Li. Regularized matrix regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):463–483, 2014.

Hong Zhu, Juan Huang, Lifu Deng, Naying He, Lin Cheng, Pin Shu, Fuhua Yan, Shanbao Tong, Junfeng Sun, and Huawei Ling. Abnormal dynamic functional connectivity associated with subcortical networks in Parkinson's disease: a temporal variability perspective. Frontiers in Neuroscience, 13:80, 2019.

Hongtu Zhu, Linglong Kong, Runze Li, Martin Styner, Guido Gerig, Weili Lin, and John H Gilmore. FADTTS: functional analysis of diffusion tensor tract statistics. NeuroImage, 56(3):1412–1425, 2011.

Hongtu Zhu, Runze Li, and Linglong Kong. Multivariate varying coefficient model for functional responses.