Multiple-shooting adjoint method for whole-brain dynamic causal modeling
Juntang Zhuang, Nicha Dvornek, Sekhar Tatikonda, Xenophon Papademetris, Pamela Ventola, James Duncan
Yale University
Abstract.
Dynamic causal modeling (DCM) is a Bayesian framework to infer directed connections between compartments, and has been used to describe the interactions between underlying neural populations based on functional neuroimaging data. DCM is typically analyzed with the expectation-maximization (EM) algorithm. However, because the inversion of a large-scale continuous system is difficult when noisy observations are present, DCM by EM is typically limited to a small number of compartments (<

https://jzkay12.github.io/TorchDiffEqPack

Keywords: multiple shooting, adjoint method, dynamic causal modeling
Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects both social behavior and mental health [11]. ASD is typically diagnosed with behavioral tests, and recently functional MRI (fMRI) has been applied to analyze the cause of ASD [3]. Connectome analysis in fMRI aims to elucidate neural connections in the brain and can be generally categorized into two types: the functional connectome (FC) [21] and the effective connectome (EC) [5]. The FC typically calculates the correlation between time-series of different regions-of-interest (ROIs) in the brain, and is typically robust and easy to compute; however, FC does not reveal the underlying dynamics. EC models the directed influence between ROIs, and is widely used in the analysis of EEG [8] and fMRI [19].

EC is typically estimated using dynamic causal modeling (DCM) [5]. DCM can be viewed as a Bayesian framework for parameter estimation in a dynamical system represented by an ordinary differential equation (ODE). A DCM model is typically optimized using the expectation-maximization (EM) algorithm [10]. Despite its wide application and good theoretical properties, a drawback is that the algorithm needs to be re-derived whenever the forward model changes, which limits its application. Furthermore, current DCM cannot handle large-scale systems, hence is unsuitable for whole-brain analysis. Recent works such as rDCM [4], spectral-DCM [17] and sparse-DCM [16] modify DCM for whole-brain analysis of resting-state fMRI, yet they are limited to a linear dynamical system and use the EM algorithm for optimization, hence cannot be used as off-the-shelf methods for different forward models.

In this project, we propose the Multiple-Shooting Adjoint (MSA) method for parameter estimation in DCM. Specifically, MSA uses the multiple-shooting method [1] for robust fitting of an ODE, and uses the adjoint method [15] for gradient estimation in the continuous case; after deriving the gradient, generic optimizers such as stochastic gradient descent (SGD) can be applied. Our contributions are: (1) MSA is implemented as an off-the-shelf method, and can be easily applied to generic non-linear cases by specifying the forward model without re-deriving the optimization algorithm. (2) In toy examples, we validate the accuracy of MSA in parameter estimation; we also validate its ability to handle large-scale systems. (3) We apply MSA to whole-brain dynamic causal modeling for fMRI; in a classification task of ASD vs. control, EC estimated by MSA achieves better performance than FC.
We first introduce the notations and the problem in Sec. 2.1, then introduce the mathematical methods in Sec. 2.3-2.4, and finally introduce DCM for fMRI in Sec. 2.5.
We summarize the notations here for ease of reading; they correspond to Fig. 1.

• $z(t)$, $\tilde{z}(t)$, $\bar{z}(t)$: $z(t)$ is the true time-series, $\tilde{z}(t)$ is the noisy observation, and $\bar{z}(t)$ is the estimation. If $p$ time-series are observed, then they are $p$-dimensional vectors for each time $t$.
• $(t_i, \hat{z}_i)_{i=0}^{N}$: $\{\hat{z}_i\}_{i=0}^{N}$ are the corresponding guesses of the states at the split time points $\{t_i\}_{i=0}^{N}$. See Fig. 1. The $\hat{z}_i$ are discrete points, while $\tilde{z}(t)$, $z(t)$, $\bar{z}(t)$ are trajectories.

Fig. 1:
Left: illustration of the shooting method. Right: illustration of the multiple-shooting method. Blue dots represent the guesses of the state at the split times $t_i$.

• $f_\eta$: the hidden state $z(t)$ follows the ODE $dz/dt = f(z, t)$, where $f$ is parameterized by $\eta$.
• $\theta$: $\theta = [\eta, \hat{z}_0, \dots, \hat{z}_N]$. We concatenate all optimizable parameters into one vector, denoted $\theta$, for ease of notation.
• $\lambda(t)$: the Lagrangian multiplier in the continuous case, used to derive the adjoint state equation.

The task of DCM can be viewed as a parameter estimation problem for a continuous dynamical system, and can be formulated as:

$$\operatorname*{argmin}_{\eta} \int \big( z(\tau) - \tilde{z}(\tau) \big)^2 \, d\tau \quad \text{s.t.} \quad \frac{dz(\tau)}{d\tau} = f_\eta\big(z(\tau), \tau\big) \qquad (1)$$

The goal is to estimate $\eta$ from the observations $\tilde{z}$. In the following sections, we first briefly introduce the multiple-shooting method, which is related to the numerical solution of a continuous dynamical system; next, we introduce the adjoint state method, which efficiently determines the gradient for parameters in continuous dynamical systems; next, we introduce the proposed MSA method, which combines multiple-shooting and the adjoint state method and can be applied with general forward models and gradient-based optimizers; finally, we introduce the DCM model and demonstrate the application of MSA.

The shooting method is commonly used to fit an ODE under noisy observations, which is crucial for parameter estimation in ODEs. In this section, we first introduce the shooting method, then explain its variant, the multiple-shooting method, for long time-series.
Shooting method
The shooting method typically reduces a boundary-value problem to an initial value problem [6]. An example is shown in Fig. 1: to find a correct initial condition (at $t = 0$) that reaches the target (at $t = 1$), the shooting algorithm first takes an initial guess (e.g. $\hat{z}(0)$), then integrates the curve to reach the point $(1, z(1))$; the error term $\mathrm{target} - z(1)$ is used to update the initial condition (e.g. $\hat{z}(0)$) so that the end-time value $z(1)$ is closer to the target. This process is repeated until convergence. Besides the initial condition, the shooting method can be applied to update other parameters.
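As a toy illustration (not the paper's implementation), the following PyTorch sketch applies the shooting idea to a scalar ODE $dz/dt = \eta z$ with an assumed, fixed $\eta$: the guess of the initial condition is repeatedly corrected so that the integrated end-time value hits a target. The Euler integrator and all constants are placeholder choices.

```python
import torch

def f(z, t, eta):
    # Right-hand side of the toy ODE dz/dt = eta * z (placeholder forward model)
    return eta * z

def integrate(z0, eta, t0=0.0, t1=1.0, steps=100):
    # Fixed-step Euler integration from t0 to t1 (stand-in for a proper ODE solver)
    z, dt = z0, (t1 - t0) / steps
    for k in range(steps):
        z = z + dt * f(z, t0 + k * dt, eta)
    return z

eta = torch.tensor(1.5)                       # assumed known dynamics parameter
target = torch.tensor(2.0)                    # desired end-time value z(1)
z0_hat = torch.zeros(1, requires_grad=True)   # initial guess of z(0)

opt = torch.optim.SGD([z0_hat], lr=0.02)
for _ in range(200):
    opt.zero_grad()
    z1 = integrate(z0_hat, eta)     # "shoot": integrate the guess forward to t = 1
    loss = (z1 - target) ** 2       # error between the end-time value and the target
    loss.backward()                 # gradient w.r.t. the initial condition
    opt.step()                      # move the guess so the shot lands closer to the target
```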
Multiple-shooting method

The multiple-shooting method [1] is an extension of the shooting method to long time-series; it splits a long time-series into chunks, and applies the shooting method to each chunk. Integration of a dynamical system over a long time is typically subject to noise and numerical error, while solving short time-series is generally easier and more robust.

As shown in the right subfigure of Fig. 1, a guess of the initial condition at time $t_0$ is denoted as $\hat{z}_0$, and we can use any ODE solver to get the estimated integral curve $\bar{z}(t),\ t \in [t_0, t_1]$. Similarly, we can guess the initial condition at time $t_1$ as $\hat{z}_1$, and get $\bar{z}(t),\ t \in [t_1, t_2]$ by integration as in Eq. 3. Note that each time chunk is shorter than the entire sequence ($|t_{i+1} - t_i| < |t_2 - t_0|,\ i \in \{0, 1\}$), hence easier to solve. The split causes another issue: the guess might not match the estimation at the boundary points (e.g. $\bar{z}(t_1) \neq \hat{z}_1$, $\bar{z}(t_2) \neq \hat{z}_2$). Therefore, we need to consider this mismatch error when updating the parameters; minimizing the mismatch error is typically easier than directly analyzing the entire sequence. The multiple-shooting method can be written as:

$$\operatorname*{argmin}_{\eta, \hat{z}_0, \dots, \hat{z}_N} J = \operatorname*{argmin}_{\eta, \hat{z}_0, \dots, \hat{z}_N} \sum_{i=0}^{N} \int_{t_i}^{t_{i+1}} \big( \bar{z}(\tau) - \tilde{z}(\tau) \big)^2 d\tau \;+\; \alpha \sum_{i=0}^{N} \big( \bar{z}(t_i) - \hat{z}_i \big)^2 \qquad (2)$$

$$\bar{z}(t) = \hat{z}_i + \int_{t_i}^{t} f_\eta\big( \bar{z}(\tau), \tau \big)\, d\tau, \quad t_i < t < t_{i+1}, \quad i \in \{0, 1, \dots, N\} \qquad (3)$$

where $N$ is the total number of chunks discretized at points $\{t_0, \dots, t_N\}$, with corresponding guesses $\{\hat{z}_0, \dots, \hat{z}_N\}$. We use $\bar{z}(t)$ to denote the estimated curve as in Eq. 3; suppose $t$ falls into the chunk $[t_i, t_{i+1}]$, then $\bar{z}(t)$ is determined by solving the ODE from $(\hat{z}_i, t_i)$, where $\hat{z}_i$ is the guess of the initial state at $t_i$. We use $\tilde{z}(t)$ to denote the observation. The first part in Eq. 2 corresponds to the difference between the estimation $\bar{z}(t)$ and the observation $\tilde{z}(t)$, while the second part corresponds to the mismatch between the estimation (orange square, $\bar{z}(t_i)$) and the guess (blue circle, $\hat{z}_i$) at the split time points $t_i$. The second part is weighted by a hyper-parameter $\alpha$. The ODE function $f$ is parameterized by $\eta$. The optimization goal is to find the best $\eta$ that minimizes the loss in Eq. 2; besides the model parameters $\eta$, we also need to optimize the guesses $\hat{z}_i$ for the state at time $t_i$, $i \in \{0, 1, \dots, N\}$. Note that though previous work typically limits $f$ to a linear form, we do not have such a limitation: multiple-shooting is generic for general $f$.
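A minimal sketch of the objective in Eqs. 2-3, assuming a toy scalar forward model and a fixed-step Euler integrator (both placeholders, not the paper's ACA-based implementation): each chunk is integrated from its own guess $\hat{z}_i$, the data term is accumulated along the chunk, and the mismatch penalty compares the end of each chunk with the guess that starts the next one.

```python
import torch

def f(z, t, eta):
    # Placeholder forward model dz/dt = eta * z; MSA only requires f to be specified
    return eta * z

def multiple_shooting_loss(eta, z_hats, z_obs, dt, chunk_len, alpha=1.0):
    # Eq. 2: data term along each chunk plus mismatch between chunk ends and the next guesses
    data_term, mismatch = 0.0, 0.0
    for i in range(len(z_hats) - 1):          # loop over chunks
        z = z_hats[i]                         # start chunk i from its own guess (Eq. 3)
        for k in range(chunk_len):
            idx = i * chunk_len + k
            data_term = data_term + (z - z_obs[idx]) ** 2 * dt
            z = z + dt * f(z, idx * dt, eta)  # one Euler step within the chunk
        mismatch = mismatch + (z - z_hats[i + 1]) ** 2   # estimate at t_{i+1} vs guess
    return data_term + alpha * mismatch

# Toy usage: noisy observations of dz/dt = 0.5 z with z(0) = 1
dt, chunk_len, num_chunks = 0.01, 50, 4
t = torch.arange(num_chunks * chunk_len) * dt
z_obs = torch.exp(0.5 * t) + 0.1 * torch.randn(num_chunks * chunk_len)

eta = torch.tensor(0.0, requires_grad=True)            # model parameter to estimate
z_hats = torch.cat([z_obs[::chunk_len], z_obs[-1:]]).clone().requires_grad_(True)
loss = multiple_shooting_loss(eta, z_hats, z_obs, dt, chunk_len)
loss.backward()   # gradients w.r.t. eta and every guess, ready for any optimizer
```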
Our goal is to minimize the loss function in Eq. 2. Let $\theta = [\eta, \hat{z}_0, \dots, \hat{z}_N]$ represent all learnable parameters. After fitting an ODE, we derive the gradient of the loss $L$ w.r.t. the parameters $\theta$ and the state guesses $\hat{z}_i$ for optimization.

Adjoint state equation

Note that, different from the discrete case, the gradient in the continuous case is slightly more complicated. We refer to the adjoint method [15,23,2]. Consider the following problem:

$$\frac{dz(t)}{dt} = f_\theta\big(z(t), t\big), \quad \text{s.t. } z(0) = x,\ t \in [0, T], \quad \theta = [\eta, \hat{z}_0, \dots, \hat{z}_N] \qquad (4)$$

$$\hat{y} = z(T), \quad J\big(\hat{y}, y\big) = J\Big(z(0) + \int_0^T f_\theta(z, t)\, dt,\ y\Big) \qquad (5)$$

where the initial condition $z(0)$ is specified by the input $x$, and the output is $\hat{y} = z(T)$. The loss function $J$ is applied to $\hat{y}$, with target $y$. Compared with Eq. 1 to Eq. 3, for simplicity, we use $\theta$ to denote both the model parameters $\eta$ and the guesses of the initial conditions $\{\hat{z}_i\}$. The Lagrangian is

$$L = J\big(z(T), y\big) + \int_0^T \lambda(t)^\top \Big[ \frac{dz(t)}{dt} - f_\theta\big(z(t), t\big) \Big] dt \qquad (6)$$

where $\lambda(t)$ is the continuous Lagrangian multiplier. Then we have the following:

$$\frac{\partial J}{\partial z(T)} + \lambda(T) = 0 \qquad (7)$$

$$\frac{d\lambda(t)}{dt} + \Big( \frac{\partial f_\theta(z(t), t)}{\partial z(t)} \Big)^\top \lambda(t) = 0 \quad \forall t \in (0, T) \qquad (8)$$

$$\frac{dL}{d\theta} = \int_0^T \lambda(t)^\top \frac{\partial f_\theta(z(t), t)}{\partial \theta}\, dt \qquad (9)$$

We skip the proof for simplicity. In general, the adjoint method determines the initial condition $\lambda(T)$ by Eq. 7, then solves Eq. 8 to get the trajectory of $\lambda(t)$, and finally integrates $\lambda(t)$ as in Eq. 9 to get the final gradient. Note that Eq. 7 to Eq. 9 are generic for a general $\theta$; in the case of Eq. 2 and Eq. 3, we have $\theta = [\eta, \hat{z}_0, \dots, \hat{z}_N]$ and $\nabla_\theta = [\frac{\partial L}{\partial \eta}, \frac{\partial L}{\partial \hat{z}_0}, \dots, \frac{\partial L}{\partial \hat{z}_N}]$. Note that we need to calculate $\frac{\partial f}{\partial z}$ and $\frac{\partial f}{\partial \theta}$, which can be easily computed by a single backward pass; we only need to specify the forward model without worrying about the backward pass, because automatic differentiation is supported in frameworks such as PyTorch and TensorFlow. After deriving the gradients of all parameters, we can update the parameters by general gradient descent methods.

Note that though $J(z(T), y)$ is defined on a single point in Eq. 6, it can also be defined in the integral form of Eq. 2, or as a sum of a single-point loss and an integral form. The key observation is that for any loss in the integral form, e.g. $\int_{t=0}^{T} loss(t)\, dt$, we can define an auxiliary variable $F$ such that $\frac{dF(t)}{dt} = loss(t),\ F(0) = 0$; then $F(T)$ is just the value of the integral. In this way, we can transform the integral form $\int_0^T loss(t)\, dt$ into the single-point form $F(T)$.
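To make the last observation concrete, here is a small sketch (with a made-up scalar system and observation function) of the augmentation trick: the state is extended with $F$, $dF/dt = loss(t)$, $F(0) = 0$, so that after integration $F(T)$ equals the integral-form loss and an adjoint that only handles an end-time loss still applies.

```python
import torch

def f(z, t):
    # Original dynamics (placeholder): dz/dt = -z
    return -z

def z_obs(t):
    # Hypothetical continuous-time observation the loss compares against
    return torch.exp(-t)

def augmented_f(state, t):
    # Augmented state [z, F]; dF/dt = (z - z_obs(t))^2 is the running loss
    z, F = state
    return torch.stack([f(z, t), (z - z_obs(t)) ** 2])

# Euler integration of the augmented system; F(T) equals the integral-form loss
state = torch.tensor([1.2, 0.0])      # z(0) = 1.2 (slightly off), F(0) = 0
dt, T = 0.01, 1.0
for k in range(int(T / dt)):
    t = torch.tensor(k * dt)
    state = state + dt * augmented_f(state, t)
print("integral-form loss F(T):", state[1].item())
```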
Algorithm 1:
Multiple-shooting adjoint method
Input:
Observation $\tilde{z}(t)$, number of chunks $N$, learning rate $lr$.
Initialize the model parameters $\eta$ and the states $\{\hat{z}_i\}_{i=0}^{N}$ at the discretized points $\{t_i\}_{i=0}^{N}$.
Repeat until convergence:
(1) Estimate the trajectory $\bar{z}(t)$ from the current parameters by the multiple-shooting method as in Eq. 3.
(2) Compute the loss $J$ in Eq. 2 and plug $J$ into Eq. 6. Derive the gradient by the adjoint method as in Eq. 7 to Eq. 9.
(3) Update the parameters: $\theta \leftarrow \theta - lr \times \nabla_\theta$
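A compact, self-contained sketch of Algorithm 1 on a toy scalar ODE (repeating the Eq. 2 objective from the earlier sketch so it runs on its own). The Euler solver stands in for ACA, and backpropagation through the solver stands in for the adjoint gradient of Eqs. 7-9; the optimizer, step sizes and data are illustrative only (the paper uses AdaBelief).

```python
import torch

def f(z, t, eta):
    # User-specified forward model; a scalar linear ODE as a placeholder
    return eta * z

def msa_loss(eta, z_hats, z_obs, dt, chunk_len, alpha=1.0):
    # Multiple-shooting objective of Eq. 2 with a fixed-step Euler solver
    data, mismatch = 0.0, 0.0
    for i in range(len(z_hats) - 1):
        z = z_hats[i]
        for k in range(chunk_len):
            idx = i * chunk_len + k
            data = data + (z - z_obs[idx]) ** 2 * dt
            z = z + dt * f(z, idx * dt, eta)
        mismatch = mismatch + (z - z_hats[i + 1]) ** 2
    return data + alpha * mismatch

# Synthetic noisy observations of dz/dt = 0.5 z
dt, chunk_len, num_chunks = 0.01, 50, 4
t = torch.arange(num_chunks * chunk_len) * dt
z_obs = torch.exp(0.5 * t) + 0.1 * torch.randn(num_chunks * chunk_len)

# Algorithm 1: initialize eta and the guesses at the split points, then iterate
eta = torch.tensor(0.0, requires_grad=True)
z_hats = torch.cat([z_obs[::chunk_len], z_obs[-1:]]).clone().requires_grad_(True)
opt = torch.optim.Adam([eta, z_hats], lr=0.05)          # any gradient-based optimizer
for step in range(300):
    opt.zero_grad()
    loss = msa_loss(eta, z_hats, z_obs, dt, chunk_len)  # steps (1)-(2): trajectory and loss
    loss.backward()                                     # gradient (adjoint/ACA in the paper)
    opt.step()                                          # step (3): update [eta, z_hats]
print("estimated eta:", eta.item())
```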
Adaptive checkpoint adjoint

Eq. 7 to Eq. 9 are the analytical form of the gradient in the continuous case, yet the numerical implementation is crucial for empirical performance. Note that $z(t)$ is solved in forward-time (0 to $T$), while $\lambda(t)$ is solved in reverse-time ($T$ to 0), yet the gradient in Eq. 9 requires both $z(t)$ and $\lambda(t)$ in the integrand. Memorizing a continuous trajectory $z(t)$ requires much memory; to save memory, most existing implementations forget the forward-time trajectory of $z(t)$, and instead only record the end-time states $z(T)$ and $\lambda(T)$ and solve Eq. 4 and Eq. 7 to Eq. 9 in reverse-time on-the-fly.

While the memory cost is low, existing implementations of the adjoint method typically suffer from numerical error: since the forward-time trajectory (denoted as $\overrightarrow{z}(t) = z(t)$) is deleted, and the reverse-time trajectory (denoted as $\overleftarrow{z}(t)$) is reconstructed from the end-time state $z(T)$ by solving Eq. 4 in reverse-time, $\overrightarrow{z}(t)$ and $\overleftarrow{z}(t)$ cannot accurately overlap due to inevitable errors of numerical ODE solvers. The error $\overrightarrow{z}(t) - \overleftarrow{z}(t)$ propagates to the gradient in Eq. 9 through the $\frac{\partial f(z,t)}{\partial z}$ term. Please see [23] for a detailed explanation. To solve this issue, the adaptive checkpoint adjoint (ACA) [23] records $\overrightarrow{z}(t)$ using a memory-efficient method to guarantee numerical accuracy. In this work, we use ACA for its accuracy.

MSA is a combination of the multiple-shooting method and the adjoint method, and is generic for various $f$. Details are summarized in Algo. 1. MSA iterates over the following steps until convergence: (1) estimate the trajectory based on the current parameters, using the multiple-shooting method for integration; (2) compute the loss and derive the gradient using the adjoint method; (3) update the parameters based on the gradient.

Advantages of MSA
Previous work has used the multiple-shooting method for parameter estimation in ODEs [13], yet MSA differs in the following aspects. (A) Suppose the parameters have $k$ dimensions. MSA uses an element-wise update, hence has only $O(k)$ computational cost in each step; the method in [13] requires the inversion of a $k \times k$ matrix, hence might be infeasible for large-scale systems. (B) The implementation of [13] does not tackle the mismatch between the forward-time and reverse-time trajectories, while we use ACA [23] for accurate gradient estimation in step (2) of Algo. 1. (C) From a practical perspective, our implementation is based on PyTorch, which supports automatic differentiation; therefore we only need to specify the forward model $f$ without manually computing the gradients $\frac{\partial f}{\partial z}$ and $\frac{\partial f}{\partial \theta}$. Hence, our method is off-the-shelf for general models, while the method of [13] needs to re-implement $\frac{\partial f}{\partial z}$ and $\frac{\partial f}{\partial \theta}$ for different $f$, and conventional DCM with EM needs to re-derive the entire algorithm when $f$ changes.

We briefly introduce dynamic causal modeling here. Suppose there are $p$ nodes (ROIs) and denote the observed fMRI time-series signal as $s(t)$, which is a $p$-dimensional vector at each time $t$. Denote the hidden neuronal state as $z(t)$; then $z(t)$ and $s(t)$ are $p$-dimensional vectors for each time point $t$. Denote the hemodynamic response function (HRF) [9] as $h(t)$, and denote the external stimulation as $u(t)$, which is an $n$-dimensional vector for each $t$. The forward model is:

$$f\big([z(t), D(t)]\big) = \begin{bmatrix} dz(t)/dt \\ dD(t)/dt \end{bmatrix} = \begin{bmatrix} D(t)z(t) + Cu(t) \\ Bu(t) \end{bmatrix}, \quad D(0) = A \qquad (10)$$

$$s(t) = \big(z(t) + \epsilon(t)\big) * h(t), \quad \tilde{z}(t) = z(t) + \epsilon(t) = \mathrm{Deconv}\big(s(t), h(t)\big) \qquad (11)$$

where $\epsilon(t)$ is the noise at time $t$, which is assumed to follow an independent Gaussian distribution, and $*$ represents the convolution operation. Note that a more general model would be $s(t) = \big(z(t) + \epsilon_1(t)\big) * h(t) + \epsilon_2(t)$, where $\epsilon_1(t)$ is the inherent noise in the neuronal state $z(t)$, and $\epsilon_2(t)$ is the measurement noise. We omit $\epsilon_2(t)$ for simplicity in this project; it is possible to model both noises, and even to model the HRF with learnable parameters, but this would result in a more complicated model and require more data for accurate parameter estimation.

$D(t)$ is a $p \times p$ matrix for each $t$, representing the effective connectome between nodes. $A$ is a matrix of shape $p \times p$, representing the interaction between ROIs. $B$ is a tensor of shape $p \times p \times n$, representing the effect of stimulation on the effective connectome. $C$ is a matrix of shape $p \times n$, representing the effect of stimulation on the neuronal state. An example with $n = 1,\ p = 3$ is shown in Fig. 2. The task is to estimate the parameters $A, B, C$ from the noisy observation $s(t)$. For simplicity, we assume $h(t)$ is fixed and use the empirical result from the Nitime project [18] in our experiments. By deconvolution of $s(t)$ with $h(t)$, we get a noisy observation of $z(t)$, denoted as $\tilde{z}(t)$; $z(t)$ follows the ODE defined in Eq. 10. By plugging $f$ into Eq. 1, and viewing $\eta$ as $[A, B, C]$, this problem turns into a parameter estimation problem for ODEs, which can be efficiently solved by Algo. 1. We emphasize that Algo. 1 is generic, and MSA can in fact be applied to any form of $f$;
the linear form of $f$ in Eq. 10 is simply a special case for a specific fMRI model.
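A rough sketch of how the forward model of Eq. 10 could be written as code for MSA, with toy sizes, made-up parameter values, and a simple Euler integrator (the HRF convolution/deconvolution of Eq. 11 and the actual estimation loop are omitted):

```python
import torch

p, n = 3, 1                       # number of ROIs and of stimulation channels (toy sizes)
A = torch.randn(p, p) * 0.1       # D(0): baseline effective connectome (illustrative values)
B = torch.zeros(p, p, n)          # effect of stimulation on the effective connectome
C = torch.randn(p, n) * 0.1       # effect of stimulation on the neuronal state

def u(t):
    # External stimulation u(t); an alternating block design as a placeholder
    return torch.tensor([1.0 if int(t) % 2 == 0 else 0.0])

def dcm_f(z, D, t):
    # Eq. 10:  dz/dt = D(t) z(t) + C u(t),   dD/dt = B u(t)
    ut = u(t)
    dz = D @ z + C @ ut
    dD = torch.einsum('ijk,k->ij', B, ut)   # contract the stimulation index of B
    return dz, dD

# Forward simulation from z(0) = 0, D(0) = A with Euler steps
z, D, dt = torch.zeros(p), A.clone(), 0.01
for k in range(500):
    dz, dD = dcm_f(z, D, k * dt)
    z, D = z + dt * dz, D + dt * dD
```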
Fig. 2:
Toy example of dynamic causal modeling with 3 nodes (labeled 1 to 3). $u$ is a 1-D stimulation signal, so $n = 1,\ p = 3$. $A, B, C$ are defined as in Eq. 10. For simplicity, though $A$ is a $3 \times 3$ matrix, only three of its elements are non-zero.

We first validate MSA on toy examples of linear dynamical systems, then validate its performance on large-scale systems and non-linear dynamical systems.

Fig. 3:
Results for the toy example of a linear dynamical system in Fig. 2. Left: errors in the estimated values of the three non-zero connections in $A$; all other parameters are set to 0 in the simulation. Right: from top to bottom, results for nodes 1, 2, 3 respectively. For each node, we plot the observation and the estimated curves from the MSA and EM methods. Note that the estimated curve is generated by integrating the ODE under the estimated parameters with only the initial condition known, not by smoothing the noisy observation.

A linear dynamical system with 3 nodes
We first start with a simple linear dynamical system with only 3 nodes. We further simplify the matrix $A$ as in Fig. 2, where only three elements of $A$ are non-zero. We set $B$ as a zero matrix, and $u(t)$ as a 1-dimensional signal. The dynamical system is linear:

$$\begin{bmatrix} dz(t)/dt \\ dD(t)/dt \end{bmatrix} = \begin{bmatrix} D(t)z(t) + Cu(t) \\ 0 \end{bmatrix}, \quad D(0) = A, \quad u(t) = \begin{cases} 1, & \mathrm{floor}(t)\,\%\,2 = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$

$$\tilde{z}(t) = z(t) + \epsilon(t), \quad \epsilon(t) \sim \mathcal{N}(0, \sigma^2) \qquad (13)$$

$u(t)$ is an alternating block function with a period of 2, taking values 0 or 1. The observed function $\tilde{z}(t)$ suffers from i.i.d. Gaussian noise $\epsilon(t)$ with zero mean and uniform variance $\sigma^2$.

We perform 10 independent simulations and parameter estimations. For the estimation of DCM with the EM algorithm, we use the SPM package [14], which is a widely used standard baseline. The estimation in MSA is implemented in PyTorch, using ACA [23] as the ODE solver. For MSA, we use the AdaBelief optimizer [25] to update the parameters with the gradient; though other optimizers such as SGD can be used, we found AdaBelief converges faster in practice.

For each of the non-zero elements in $A$, we show the boxplot of the estimation error in Fig. 3. Compared with EM, the error of MSA is significantly closer to 0 and has a smaller variance. An example of a noisy observation and the estimated curves is shown in Fig. 3, and the estimation by MSA is visually closer to the ground truth compared to the EM algorithm. We emphasize that the estimated curve is not a simple smoothing of the noisy observation; instead, after estimating the parameters of the ODE,
the estimated curve (for $t > 0$) is generated by solving the ODE using only the initial state. Therefore, the match between the estimated curve and the observation demonstrates that our method learns the underlying dynamics of the system.
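For illustration, toy data of the form of Eqs. 12-13 could be simulated roughly as follows (the non-zero entries of $A$, the stimulation weights $C$, the noise level, and the step size are all made-up values, not the settings used in the paper):

```python
import torch

p = 3
A = torch.zeros(p, p)
A[1, 0], A[2, 0], A[2, 1] = 0.4, -0.3, 0.5   # three non-zero connections (illustrative)
C = 0.5 * torch.ones(p, 1)                   # stimulation weights (illustrative)

def u(t):
    # Alternating block stimulation with period 2, taking values 0 or 1 (Eq. 12)
    return torch.tensor([1.0 if int(t) % 2 == 0 else 0.0])

# Simulate Eq. 12 with Euler steps (B = 0, so D(t) stays equal to A), then add noise (Eq. 13)
dt, T, noise_std = 0.05, 20.0, 0.5
z, traj = torch.zeros(p), []
for k in range(int(T / dt)):
    z = z + dt * (A @ z + C @ u(k * dt))
    traj.append(z.clone())
z_clean = torch.stack(traj)                                  # shape (num_steps, p)
z_noisy = z_clean + noise_std * torch.randn_like(z_clean)    # observations fed to MSA / EM
```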
Application to large-scale systems
After validation on a small system with only 3 nodes, we validate MSA on large-scale systems with more nodes. We use the same linear dynamical system as in Eq. 12, but with the node number $p$ ranging from 10 to 100. Note that the dimensions of $A$ and $B$ grow at a rate of $O(p^2)$, and the EM algorithm estimates a covariance matrix of size $O(p^4)$, hence the memory for the EM method grows extremely fast with $p$. For the various settings, the ground-truth parameters are randomly generated from a uniform distribution between -1 and 1, and the variance of the measurement noise is set as $\sigma^2 = 0.5$.
For each setting, we perform 5 independent runs, and report the mean squared error (MSE) between the estimated parameters and the ground truth.

As shown in Table 1, for small systems (number of nodes $\leq 20$), MSA consistently generates a lower MSE than the EM algorithm. For large-scale systems, since the memory cost of the EM algorithm is $O(p^4)$, the algorithm quickly runs out of memory. On the other hand, the memory cost of MSA is $O(p^2)$ because it only uses the first-order gradient. Hence, MSA is suitable for large-scale systems such as whole-brain fMRI analysis.

Application to general non-linear systems
Since neither the multiple-shooting method nor the adjoint state method requires the ODE $f$ to be linear, MSA can be applied to general non-linear systems. Furthermore, since our implementation is in PyTorch, which supports automatic differentiation, we only need to specify $f$ when fitting different models, and the gradient is calculated automatically. Therefore, MSA is an off-the-shelf method and is suitable for general non-linear ODEs both in theory and in implementation.

Table 1:
Mean squared error (lower is better) in the estimation of parameters for a linear dynamical system with different numbers of nodes. "OOM" represents "out of memory".

            10 Nodes     20 Nodes     50 Nodes     100 Nodes
EM          3.… ± …      … ± …        OOM          OOM
MSA         … ± …        … ± …        … ± …        … ± …

We validate MSA on the Lotka-Volterra (L-V) equations [22], a system of non-linear ODEs describing the dynamics of predator and prey populations. The L-V equations can be written as:

$$f\big([z_1(t), z_2(t)]\big) = \begin{bmatrix} dz_1(t)/dt \\ dz_2(t)/dt \end{bmatrix} = \begin{bmatrix} \zeta z_1(t) - \beta z_1(t) z_2(t) \\ \delta z_1(t) z_2(t) - \gamma z_2(t) \end{bmatrix}, \quad \begin{bmatrix} \tilde{z}_1(t) \\ \tilde{z}_2(t) \end{bmatrix} = \begin{bmatrix} z_1(t) + \epsilon_1(t) \\ z_2(t) + \epsilon_2(t) \end{bmatrix} \qquad (14)$$

where $\zeta, \beta, \delta, \gamma$ are the parameters to estimate, $\tilde{z}(t)$ is the noisy observation, and $\epsilon(t)$ is the independent noise. Note that there are non-linear terms $z_1(t)z_2(t)$ in the ODE, making the EM derivation difficult. Furthermore, the EM method needs to explicitly derive the posterior mean, hence needs to be re-derived for every different $f$; MSA is generic and hence does not require re-derivation. Besides the L-V model, we also consider a modified L-V model, defined as:

$$dz_1(t)/dt = \zeta z_1(t) - \beta\,\phi\big(z_1(t)\big)\, z_1(t) z_2(t) \qquad (15)$$

$$dz_2(t)/dt = \delta\,\phi\big(z_1(t)\big)\, z_1(t) z_2(t) - \gamma z_2(t) \qquad (16)$$

where $\phi(x) = 1/(1 + e^{-x})$ is the sigmoid function. We use this example to demonstrate the ability of MSA to fit highly non-linear ODEs.

We compare MSA with LMFIT [12], which is a well-known Python package for non-linear fitting. We use the L-BFGS solver in LMFIT, which generates better results than the other solvers. We did not compare with the original DCM with EM because it is unsuitable for general non-linear models. The estimated curves for $t > 0$ are shown in Fig. 4 (L-V model) and Fig. 5 (modified L-V model).

We apply MSA to whole-brain fMRI analysis with dynamic causal modeling. fMRI for 82 children with ASD and 48 age- and IQ-matched healthy controls were acquired. A biological motion perception task and a scrambled motion task [7] were presented in alternating blocks. The fMRI (BOLD, 132 volumes, TR = 2000 ms, TE = 25 ms, flip angle = 60°, voxel size 3.44 × … × … mm) was acquired on a Siemens MAGNETOM Trio 3T scanner.

Fig. 4:
Results for the L-V model.
Fig. 5:
Results for the modified L-V model.
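To make the "only specify $f$" point concrete for the non-linear examples above, below is a minimal PyTorch sketch of the L-V right-hand side of Eq. 14 with a short forward simulation; the parameter values, initial state and step size are arbitrary illustrations. Fitting the modified model of Eqs. 15-16 with MSA would only require editing this one function, with the rest of the pipeline unchanged.

```python
import torch

def lv_f(z, t, params):
    # Lotka-Volterra dynamics of Eq. 14; z = [prey, predator]
    zeta, beta, delta, gamma = params
    z1, z2 = z
    dz1 = zeta * z1 - beta * z1 * z2
    dz2 = delta * z1 * z2 - gamma * z2
    return torch.stack([dz1, dz2])

params = torch.tensor([0.6, 0.3, 0.2, 0.4], requires_grad=True)  # made-up initial values for (zeta, beta, delta, gamma)
z = torch.tensor([2.0, 1.0])                                     # initial prey and predator populations
dt = 0.01
for k in range(2000):               # forward simulation with Euler steps
    z = z + dt * lv_f(z, k * dt, params)
```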
Fig. 6:
An example of MSA for one subject. Left: effective connectome during task 1. Middle: effective connectome during task 2. Right: the top and bottom graphs represent the effective connectome for task 1 and task 2, respectively. Blue and red edges represent positive and negative connections, respectively. Only the top 5% strongest connections are visualized.
Estimation of EC
We use the AAL atlas [20] containing 116 ROIs. For each subject, the parameters of the dynamic causal model in Eq. 10 are estimated using MSA. An example snapshot of the effective connectome (EC) during the two tasks is shown in Fig. 6, showing that MSA captures the dynamic EC during different tasks.
Classification task
We conduct classification experiments for ASD vs. control using EC and FC as input, respectively. The EC estimated by MSA at each time point provides a data sample, and the classification of a subject is based on the majority vote of the predictions across all of its time points. The FC is computed using Pearson correlation. We experimented with a random forest model and InvNet [24]. Results for a 10-fold subject-wise cross-validation are shown in Fig. 7. For both models, using EC as input generates better accuracy, F1 score and AUC score (the threshold range is [0, 1]). This indicates that estimating the underlying dynamics of fMRI helps the identification of ASD.
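A rough sketch of this evaluation protocol on placeholder data (the array shapes, the random-forest settings, and the single train/test split below are assumptions; the paper uses 10-fold subject-wise cross-validation and additionally an InvNet classifier):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: one flattened EC matrix per time point per subject
n_subjects, n_timepoints, n_rois = 20, 132, 116
rng = np.random.default_rng(0)
X = rng.normal(size=(n_subjects, n_timepoints, n_rois * n_rois)).astype(np.float32)
y = rng.integers(0, 2, size=n_subjects)          # subject labels (ASD vs. control)

train, test = np.arange(15), np.arange(15, 20)   # one illustrative subject-wise split

# Every time point is a training sample and inherits its subject's label
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[train].reshape(-1, n_rois * n_rois), np.repeat(y[train], n_timepoints))

# Subject-level prediction: majority vote over the predictions of its time points
pred_t = clf.predict(X[test].reshape(-1, n_rois * n_rois)).reshape(len(test), n_timepoints)
pred_subject = (pred_t.mean(axis=1) > 0.5).astype(int)
print("subject-level accuracy:", (pred_subject == y[test]).mean())
```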
Fig. 7:
Classification results for ASD vs. control.
Conclusion

We propose the multiple-shooting adjoint (MSA) method for parameter estimation in ODEs, enabling whole-brain dynamic causal modeling. MSA has the following advantages: robustness to noisy observations, the ability to handle large-scale systems, and a general off-the-shelf framework for non-linear ODEs. We validate MSA in extensive toy examples and apply MSA to whole-brain fMRI analysis with DCM. To our knowledge, our work is the first to successfully apply whole-brain dynamic causal modeling in a classification task based on fMRI. Finally, MSA is generic and can be applied to other problems such as EEG and the modeling of biological processes.
References
1. Bock, H.G., Plitt, K.J.: A multiple shooting algorithm for direct solution of optimal control problems. IFAC Proceedings Volumes (1984)
2. Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations. Advances in Neural Information Processing Systems (2018)
3. Di Martino, A., O'Connor, D., Chen, B., Alaerts, K., Anderson, J.S., et al.: Enhancing studies of the connectome in autism using the autism brain imaging data exchange II. Scientific Data (2017)
4. Frässle, S., Harrison, S.J., Heinzle, J., Clementz, B.A., Tamminga, C.A., et al.: Regression dynamic causal modeling for resting-state fMRI. bioRxiv (2020)
5. Friston, K.J., Harrison, L.: Dynamic causal modelling. NeuroImage (2003)
6. Hildebrand, F.B.: Introduction to Numerical Analysis (1987)
7. Kaiser, M.D., Hudac, C.M., Shultz, S., Lee, S.M., Cheung, C., et al.: Neural signatures of autism. PNAS (2010)
8. Kiebel, S.J., Garrido, M.I., Moran, R.J., Friston, K.J.: Dynamic causal modelling for EEG and MEG. Cognitive Neurodynamics (2008)
9. Lindquist, M.A., Loh, J.M., Atlas, L.Y., Wager, T.D.: Modeling the hemodynamic response function in fMRI: efficiency, bias and mis-modeling. NeuroImage (2009)
10. Moon, T.K.: The expectation-maximization algorithm. IEEE Signal Processing Magazine (1996)
11. Nation, K., Clarke, P., Wright, B., Williams, C.: Patterns of reading ability in children with autism spectrum disorder. J Autism Dev Disord (2006)
12. Newville, M., Stensitzki, T., Allen, D.B., Rawlik, M., Ingargiola, A., Nelson, A.: LMFIT: Non-linear least-square minimization and curve-fitting for Python (2016)
13. Peifer, M., Timmer, J.: Parameter estimation in ordinary differential equations for biochemical processes using the method of multiple shooting (2007)
14. Penny, W.D., Friston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols, T.E.: Statistical Parametric Mapping: The Analysis of Functional Brain Images (2011)
15. Pontryagin, L.S.: Mathematical Theory of Optimal Processes (2018)
16. Prando, G., Zorzi, M., Bertoldo, A., Corbetta, M., Chiuso, A.: Sparse DCM for whole-brain effective connectivity from resting-state fMRI data. NeuroImage (2020)
17. Razi, A., Seghier, M.L., Zhou, Y., McColgan, P., Zeidman, P., Park, H.J., et al.: Large-scale DCMs for resting-state fMRI. Network Neuroscience (2017)
18. Rokem, A., Trumpis, M., Perez, F.: Nitime: time-series analysis for neuroimaging data. In: Proceedings of the 8th Python in Science Conference (2009)
19. Seghier, M.L., Zeidman, P., Leff, A.P., Price, C.: Identifying abnormal connectivity in patients using dynamic causal modelling of fMRI responses. Frontiers in Neuroscience (2010)
20. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., et al.: Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage (2002)
21. Van Den Heuvel, M.P., Pol, H.E.H.: Exploring the brain network: a review on resting-state fMRI functional connectivity. European Neuropsychopharmacology (2010)
22. Volterra, V.: Variations and fluctuations of the number of individuals in animal species living together. ICES Journal of Marine Science
23. Zhuang, J., et al.: Adaptive checkpoint adjoint method for gradient estimation in neural ODE. In: International Conference on Machine Learning (2020)
24. Zhuang, J., et al.: Invertible network for classification and biomarker selection for ASD. In: Medical Image Computing and Computer Assisted Intervention (2019)
25. Zhuang, J., et al.: AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients. Advances in Neural Information Processing Systems (2020)