A nonstationary nonparametric Bayesian approach to dynamically modeling effective connectivity in functional magnetic resonance imaging experiments
Institute of Mathematical Statistics, 2011
By Sourabh Bhattacharya and Ranjan Maitra
Indian Statistical Institute and Iowa State University
Effective connectivity analysis provides an understanding of the functional organization of the brain by studying how activated regions influence one another. We propose a nonparametric Bayesian approach to model effective connectivity assuming a dynamic nonstationary neuronal system. Our approach uses the Dirichlet process to specify an appropriate (most plausible according to our prior beliefs) dynamic model as the "expectation" of a set of plausible models upon which we assign a probability distribution. This addresses model uncertainty associated with dynamic effective connectivity. We derive a Gibbs sampling approach to sample from the joint (and marginal) posterior distributions of the unknowns. Results on simulation experiments demonstrate our model to be flexible and a better candidate in many situations. We also use our approach to analyze functional Magnetic Resonance Imaging (fMRI) data on a Stroop task: our analysis provides new insight into the mechanism by which an individual brain distinguishes and learns about shapes of objects.
1. Introduction.
Functional magnetic resonance imaging (fMRI) is a non-invasive technique for detecting regions in the brain that are activated by the application of a stimulus or the performance of a task. Although important neuronal activities are responsible for such activation, these are very subtle and cannot be detected directly. Instead, local changes during neuronal activity in the flow, volume, oxygen level and other characteristics of blood, called the blood oxygen level dependent (BOLD) response, form a proxy. Much research in fMRI has focused on identifying regions of cerebral activation in response to the activity of interest. There is, however, growing interest
Received October 2010; revised February 2011.
Supported in part by NSF CAREER Grant DMS-04-37555 and by the National Institutes of Health (NIH) Award DC-0006740.
Key words and phrases. Attentional control network, Bayesian analysis, Dirichlet process, effective connectivity analysis, fMRI, Gibbs sampling, temporal correlation.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2011, Vol. 5, No. 2B, 1183–1206. This reprint differs from the original in pagination and typographic detail.
in obtaining a better understanding of the interactions between different brain regions during the operation of the BOLD response. The study of how one neuronal system interacts with another is called effective connectivity analysis [Friston (1994); Nyberg and McIntosh (2001)]. We illustrate this in the context of obtaining greater insight into how an individual brain performs a Stroop task, which is also the main application studied in this paper.

1.1. Investigating the attentional control network in a Stroop task.
The human brain's information processing capability is limited, so it sifts out irrelevant details from task-relevant information using the cognitive function called attention. Specifically, task-relevant information is filtered out either because of intrinsic properties of the stimulus (bottom-up selection) or independently of them (top-down selection) [Frith (2001)]. The brain's preference for task-related information in top-down selection requires coordination of neural activity via an Attentional Control Network (ACN), which has systems to process task-relevant and irrelevant information and also a "higher-order executive control system" to modulate the frequency of neuronal firings in each [Banich et al. (2000)]. Thus, the higher-order system can execute top-down selection by increasing neuronal activity in the task-relevant processing system while suppressing it in its task-irrelevant counterpart. Many studies have empirically found the dorsal lateral prefrontal cortex (DLPFC) to be the main source of attentional control, while the task-relevant and irrelevant processing sites depend on whether the stimulus is visual, auditory or in some other form.

Jaensch (1929) and Stroop (1935) discovered that the brain is quicker at reading named color words (e.g., blue, yellow, green, etc.) when they are in the concordant color than if they are in a discordant color. Tasks structured along these lines are now called Stroop tasks. A much-studied two-phase experiment [Milham et al. (2002, 2003); Ho, Ombao and Shumway (2003, 2005); Milham, Banich and Barad (2003); Bhattacharya, Ho and Purkayastha (2006)] designed around such a task provided the dataset for our investigation. In the first phase, a subject was trained to associate each of three unfamiliar shapes with a unique color word ("Blue," "Yellow" and "Green") with 100% accuracy. The second (testing) phase involved alternating six times between blocks of eighteen interference and eighteen neutral trials.
The neutral trial consisted of printing the shape in a neutral color (white). The interference trial involved presenting the subject with one of the learned shapes, but printed in a color different from that learned to be represented by that shape in the learning phase. The subject's task was to subvocally name the shape's color as trained in the learning phase, ignoring the color presented in the testing phase. Each neutral or interference trial consisted of a 0.3-s fixation cross, a 1.2-s stimulus presentation stage and a 0.5-s waiting state until the next trial. fMRI images were acquired and
processed to obtain three activated regions, whose averaged post-processed time series are what we analyze further to investigate attentional control. These three regions, also denoted as Regions 1, 2 and 3 in this paper, were the lingual gyrus (LG), the middle occipital gyrus (MOG) and the DLPFC, and were chosen as representatives of the task-irrelevant, task-relevant and executive-control systems, respectively. The LG is a visual area for processing color information [Corbetta et al. (1991)], which in our context is task-irrelevant [Kelley et al. (1998)]. The MOG is another visual area but processes shape information, which is the task-related information (form of the shape) in the experiment. We refer to Bhattacharya, Ho and Purkayastha (2006) for further details on data collection and post-processing, noting here that, as in that and other preceding papers, the objective is to investigate and to understand the working of the ACN mechanism in performing a Stroop task.

1.2. Background and related work.
Structural equation modeling [McIntosh and Gonzalez-Lima (1994); Kirk et al. (2005); Penny et al. (2004)] and time-varying parameter regression [Büchel and Friston (1998)] are two early approaches that have been used to determine effective connectivity. In general, both approaches ignore dynamic modeling of the observed system, even though the latter accounts for temporal correlation in the analysis. There is, however, strong empirical evidence [Aertsen and Preißl (1991); Friston (1994); McIntosh and Gonzalez-Lima (1994); Büchel and Friston (1998); McIntosh (2000)] that effective connectivity is dynamic in nature, which means that the time-invariant model assumed by both approaches may not be appropriate. Ho, Ombao and Shumway (2005) overcame some of these limitations by modeling the data using a state-space approach, but did not account for the time-varying nature of the effective connectivity parameters. An initial attempt at explicitly incorporating the time-varying nature of effective connectivity, in addition to dynamic modeling of neuronal systems, was by Bhattacharya, Ho and Purkayastha (2006), who adopted a Bayesian approach to inference and developed and illustrated their methodology with specific regard to the ACN mechanism of the LG, MOG and DLPFC regions in conducting the Stroop task outlined above. We summarize their model, framing it within the context of more recent literature in dynamic modeling of effective connectivity, and discuss their findings and some limitations next. In doing so, we also introduce the setup followed throughout this paper.

1.2.1. Bayesian modeling of dynamic effective connectivity.
Let y_i(t) be the observed fMRI signal (or the measured BOLD response) corresponding to the i-th region at time t, for i = 1, 2, …, R and t = 1, 2, …, T. Specifically, y_i(t) is some voxel-wise summary (e.g., regional average) of the corresponding detrended time series in the i-th region. Following Bhattacharya, Ho and
Purkayastha (2006), let x_i(t) be the modeled BOLD response [as opposed to the measured BOLD response, y_i(t)], that is, the stimulus s(t) convolved with the hemodynamic response function (HRF) h_i(t) for the i-th region and time point t. In this paper, h_i(t) is assumed to be the very widely-used standard HRF model of Glover (1999), which differences two gamma functions and has some very appealing properties vis-a-vis other HRFs [Lu et al. (2006, 2007)]. Then the model for the observed fMRI signal can be hierarchically specified as

y_i(t) = α_i + x_i(t) β_i(t) + ε_i(t),   (1)

where α_i and β_i(t) are the baseline trend and activation coefficients for the i-th region, the latter at time t. The errors ε_i(t) are all assumed to be independent N(0, σ_i²), following Worsley et al. (2002). From Bhattacharya, Ho and Purkayastha [(2006), page 797], we assume that x_i(·) = x(·) for i = 1, …, R, that is, we use the same HRF h_i(·) = h(·) for each of the R regions. Note that, as argued in that paper, this homogeneity assumption on the x(·) is inconsequential because it is compensated by the β_i(t) that are associated with x(t), and are allowed to be inhomogeneous with respect to the different regions. Also, following Bhattacharya, Ho and Purkayastha [(2006), page 799], we assume that σ_i² = σ_ε² for i = 1, …, R. Actually, (1) is a generalization of a very standard model used extensively in the literature; see, for example, Lindquist [(2008), equation (9)] or Henson and Friston [(2007), page 179, equation (14.1)], who use the same model but with a constant time-invariant β(t) ≡ β.
(Indeed, as very helpfully pointed out by a reviewer, this last specification is also the general linear model commonly used to analyze fMRI data voxel-wise, such as in statistical parametric mapping and related conventional whole brain activation studies.) Our specific generalization incorporates a time-varying β(t) and follows Ho, Ombao and Shumway (2005), Bhattacharya, Ho and Purkayastha (2006) or Harrison, Stephan and Friston [(2007), cf. page 516, equation 38.18]; note, however, that the latter model β(t) as a random walk [see equation 38.19, page 516, of Harrison, Stephan and Friston (2007)]. We prefer allowing for time-varying activation β_i(t) in order to address the "learning" effect often reported in fMRI studies, whereby strong activation in the initial stages of the experiment dissipates over time [Gössl, Auer and Fahrmeir (2001); Milham et al. (2002, 2003); Milham, Banich and Barad (2003)]. Further modeling specifies the activation coefficient in the i-th region at the t-th time-point in terms of the noise-free BOLD signal in the other regions at the previous time-point. Thus,

β_i(t) = x(t − 1) Σ_{ℓ=1}^{R} γ_iℓ(t) β_ℓ(t − 1) + ω_i(t),   t = 2, …, T; i = 1, 2, …, R,   (2)

where the ω_i(t) are independent N(0, σ_ω²)-distributed errors and γ_ij(t) is the influence of the j-th region on the i-th region at time t. Under (2), functionally specified cerebral areas are not constrained to act independently but can interact with other regions. Our objective is to make inferences on the γ_ij(t) in order to understand the functional circuitry in the brain as it processes a certain (in this paper, Stroop) task.

Equations (1) and (2) together specify one of many Vector Autoregressive (VAR) models proposed by several authors [Harrison, Penny and Friston (2003); Goebel et al. (2003); Rykhlevskaia, Fabiani and Gratton (2006); Sato et al. (2007); Thompson and Siegle (2009); Patriota, Sato and Achic (2010)].
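The generative structure of (1) and (2) can be sketched with a short simulation. This is a minimal illustration, not the paper's implementation: the HRF kernel, block design, noise levels and connectivity values below are all arbitrary stand-ins.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
R, T = 3, 285                   # regions and time points, matching the Stroop data size
sig_eps, sig_om = 0.1, 0.1      # illustrative noise SDs, not the paper's values

# Stand-in for the modeled BOLD response x(t): a block stimulus s(t) convolved
# with a difference-of-gammas kernel mimicking the shape of the Glover (1999) HRF.
t = np.arange(T, dtype=float)
hrf = t**5 * np.exp(-t) / math.factorial(5) \
    - 0.35 * t**15 * np.exp(-t) / math.factorial(15)
s = (t % 36 < 18).astype(float)       # toy alternating 18-scan blocks
x = np.convolve(s, hrf)[:T]

alpha = rng.normal(0.0, 1.0, R)       # baseline trends alpha_i
beta = np.zeros((R, T))
beta[:, 0] = rng.normal(1.0, 0.1, R)  # initial activations beta_i(1)
Gam = rng.normal(0.0, 0.2, (R, R, T)) # effective connectivity gamma_ij(t)

for time in range(1, T):
    # Equation (2): beta_i(t) = x(t-1) * sum_l gamma_il(t) beta_l(t-1) + omega_i(t)
    beta[:, time] = x[time - 1] * Gam[:, :, time] @ beta[:, time - 1] \
        + rng.normal(0.0, sig_om, R)

# Equation (1): y_i(t) = alpha_i + x(t) beta_i(t) + eps_i(t)
y = alpha[:, None] + x[None, :] * beta + rng.normal(0.0, sig_eps, (R, T))
```

The resulting array y holds one synthetic R-region time series; inference in the paper runs in the opposite direction, recovering the γ_ij(t) from such data.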
To see this, note that from (1), β_i(t − 1) depends linearly upon y_i(t − 1) for i = 1, …, R. Substituting into (2) yields β_i(t) = g_i(y_1(t − 1), y_2(t − 1), …, y_R(t − 1)) for functions g_i which are linear in y_1(t − 1), y_2(t − 1), …, y_R(t − 1). Substituting this expression for β_i(t) in (1), we see that for each i = 1, …, R, y_i(t) is a linear function of y_1(t − 1), y_2(t − 1), …, y_R(t − 1), so that the vector y(t) = (y_1(t), …, y_R(t))′ is a linear function of the vector y(t − 1) = (y_1(t − 1), …, y_R(t − 1))′. As a result, our model is a first order VAR model from the viewpoint of the responses. It is of first order since y(t) depends upon y(t − 1) but not otherwise upon y(1), …, y(t − 2). Similarly, the β_i(t) are modeled as first order VAR; that is, the R-component vector (β_1(t), …, β_R(t))′ depends linearly upon (β_1(t − 1), …, β_R(t − 1))′.

VAR models provide an alternative or a substantial generalization [Friston (2009)] to the Dynamic Causal Modeling (DCM) approach proposed by Friston, Harrison and Penny (2003), at least in continuous time, to model the change of the neuronal state vector over time using stochastic differential equations. In DCM, the observed BOLD signal is modeled as y_i(t) = r_i(t) + β z_i(t) + ε_i(t), where z_i(t) denotes nuisance effects, and r_i(t) is a modeled BOLD response obtained by first using a bilinear differential (neural state) equation, parametrized in terms of effective connectivity parameters and involving s(t), then subsequently applying a "balloon model" transformation [Buxton, Wong and Frank (1998), or extensions Friston et al. (2000); Stephan et al. (2007)] to the solution of the bilinear differential equation. DCM thus uses both r_i(t) as well as the nuisance effects z_i(t) to model the observed BOLD response, with r_i(t) playing the same role as our x_i(t), with the exception that the latter is obtained using the more widely-used Glover (1999) HRF model. Further, DCM assumes a deterministic relationship between the different brain regions, unlike (2), which allows for noisy dynamics [Bhattacharya, Ho and Purkayastha (2006)].

Thompson and Siegle (2009) contend that VAR models have gained popularity in recent years because "the direction and valence of effective connectivity relationships do not need to be pre-specified." As such, these models have provided a useful framework for effective connectivity analysis.
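A small numerical check makes the VAR(1) reduction concrete. With the noise terms ε and ω set to zero, substituting (1) into (2) collapses the two equations to y(t) = α + x(t) Γ(t)(y(t − 1) − α); the values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 3
x = rng.uniform(0.5, 1.5, 2)        # x(t-1) and x(t), illustrative values
alpha = rng.normal(0.0, 1.0, R)     # baselines alpha_i
Gam = rng.normal(0.0, 0.3, (R, R))  # gamma_ij(t) at the current time point
y_prev = rng.normal(0.0, 1.0, R)    # observed vector y(t-1)

# Route 1: through beta, as in equations (1)-(2) with the noise set to zero.
beta_prev = (y_prev - alpha) / x[0]   # invert (1) at time t-1
beta_now = x[0] * Gam @ beta_prev     # equation (2)
y_now = alpha + x[1] * beta_now       # equation (1) at time t

# Route 2: the collapsed first order VAR form implied by the substitution.
y_var = alpha + x[1] * Gam @ (y_prev - alpha)

assert np.allclose(y_now, y_var)
```

The agreement of the two routes shows that y(t) is indeed a linear function of y(t − 1) alone, with a time-varying coefficient matrix x(t)Γ(t).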
Bhattacharya, Ho and Purkayastha (2006) proposed a symmetric random walk model for γ_ij(t):

γ_ij(t) = γ_ij(t − 1) + δ_ij(t) for i, j = 1, 2, …, R; t = 2, 3, …, T,   (3)

where the δ_ij(t) are independent N(0, σ_δ²)-distributed errors. In this paper we use M_RW to refer to the model specified by (1), (2) and (3). The effective connectivity parameters γ_ij(t), i, j = 1, …, R, also form a VAR model of the first order. To see this, let Γ(t) = (γ_ij(t); i, j = 1, …, R)′. Then it follows that Γ(t) = I Γ(t − 1) + δ(t), where I is the identity matrix of order R² and δ(t) = (δ_ij(t); i, j = 1, …, R)′, indicating that the γ_ij(t) are within the framework of a VAR model.

Bhattacharya, Ho and Purkayastha (2006) specified prior distributions on the parameters and hyperparameters of this model and used Gibbs sampling to learn the posterior distributions of the unknowns. We refer to that paper for details and for results on simulation experiments using M_RW, noting here only that their Bayesian-derived inference supported ACN theory and, more importantly, the notion that effective connectivity is indeed dynamic in the network. Further, they found that the restricted model with γ_31(t) = γ_32(t) ≡ 0 for all t was the best performer, implying no direct feedback from the two sites of control (LG and MOG) to the source (DLPFC). Interestingly, however, and perhaps surprisingly, their estimated γ_ij(t) (see Figure 6 in their paper) had very little relationship with the nature of the BOLD response (see Figure 1, bottom panel, in that paper). This is surprising because from (1), we have β_i(t) = (y_i(t) − α_i − ε_i(t))/x(t), and similarly for β_i(t − 1); both involve x(·). This means that the effective connectivity parameters γ_iℓ(t) determine β_i(t), the left-hand side of (2). Since β_i(t) is a function of x(t), it is reasonable to expect the γ_iℓ(t) to depend upon x(t), but such a relationship was not found in Bhattacharya, Ho and Purkayastha (2006). This perplexing finding led us to first investigate robustness of M_RW to even slight misspecifications.

1.2.2. Robustness of the random walk model.
We tested the effect of a slight departure from M_RW by simulating, instead of from (3), from the following stationary autoregressive model:

γ_ij(t) = ρ_0 γ_ij(t − 1) + δ_ij(t) for i, j = 1, 2, …, R; t = 2, 3, …, T,   (4)

where ρ_0 is a fixed autoregressive coefficient close to, but less than, one. We call this slightly modified model M_RW′. Here, T = 285 and R = 3 to match the details of the dataset of Section 1.1. We fit M_RW to data simulated from M_RW′. Figure 1 displays the estimated posterior distributions of the γ_ij(t). The marginal posterior distribution of each γ_ij(t) is represented here by eight quantiles, each containing 12.5% of the distribution: increased opacity in shading denotes denser regions. Solid lines represent true values.

Fig. 1. Posterior densities of γ_ij(t); t = 1, …, T; i, j = 1, 2, 3, under model M_RW on data simulated under model M_RW′. The opacity of shading in each region is proportional to the area under the density in that region. The solid line stands for the true values of γ_ij(t).

As seen, many parts of the posterior distribution have very little coverage of the true effective connectivity parameters: this finding is also supported by Table 1, which provides the proportion of true values included in the 95% highest posterior density (HPD) credible intervals [Berger (1985)] (these are the shortest intervals with posterior probability 0.95). Thus, performance degrades substantially even though M_RW′ is not all that different from M_RW. Hence, modeling the process by a random walk may be too restrictive, and a better approach may be needed.

Table 1. Proportion of true γ_ij(t) included in the 95% posterior credible intervals obtained using model M_RW on data simulated using M_RW′.

We do so in this paper by embedding an (asymptotically) stationary first order autoregressive AR(1) model in a larger class of models. Formally, we employ a Bayesian nonparametric framework using a Dirichlet Process (DP) prior whose base distribution is assumed to be that implied by an AR(1) model. The intuition behind this modeling style is that although one might expect the actual process to be stationary, the assumption might be too simplistic, and it is more logical to think of the stationary model as an "expected model," thus allowing for nonstationarity (quantified by the DP prior) in the actual model. Theoretical issues related to the construction of DP-based nonstationary processes are discussed in Section 2.1. In Section 2.2 we introduce our new modeling ideas using the developments in Section 2.1. The efficacy of the new model is compared with its competitors on some simulated datasets in Section 3. The new approach is applied in Section 4 to the dataset introduced in Section 1.1 to investigate effective connectivity between the LG, MOG and DLPFC regions. We conclude in Section 5 with some discussion. Additional derivations and further details on experiments and data analyses are provided in the supplement [Bhattacharya and Maitra (2011)], whose sections, figures and tables have the prefix "S-" when referred to in this paper.
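The gap between the random walk (3) and a stationary AR(1) alternative along the lines of (4) can be seen by simulating both priors forward. The coefficient 0.9 below is an assumed illustrative value, as are the noise level and replication count.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_rep, sig_delta = 285, 2000, 0.1
rho0 = 0.9   # assumed illustrative AR coefficient, standing in for the one in (4)

def simulate_paths(rho, T, n):
    """Simulate n independent paths of g(t) = rho * g(t-1) + delta(t), g(1) = 0."""
    g = np.zeros((n, T))
    for t in range(1, T):
        g[:, t] = rho * g[:, t - 1] + rng.normal(0.0, sig_delta, n)
    return g

rw = simulate_paths(1.0, T, n_rep)    # random walk prior, as in model (3)
ar = simulate_paths(rho0, T, n_rep)   # stationary AR(1) prior, as in model (4)

# The random-walk variance grows like t * sig_delta**2, while the AR(1)
# variance settles near sig_delta**2 / (1 - rho0**2): very different priors.
print(rw[:, -1].var(), ar[:, -1].var())
```

Even a coefficient close to one thus yields marginal distributions far tighter than the random walk's, consistent with the sensitivity seen in Figure 1 and Table 1.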
2. Modeling and methodology.
2.1. A nonstationary Dirichlet process model for time series observations.
A random probability measure G on the probability space (Γ, B_Γ), sampled from the Dirichlet Process (DP) denoted by DP(τ G_0), with known base distribution G_0 and precision parameter τ, can be represented almost surely, using the constructive method provided in Sethuraman (1994), as

G ≡ Σ_{k=1}^{∞} p_k δ_{γ*_k},   (5)

where p_1 = b_1 and p_k = b_k Π_{ℓ=1}^{k−1} (1 − b_ℓ), k = 2, 3, …, with the b_k being independent, identically distributed (henceforth i.i.d.) Beta(1, τ) random variables. The values γ*_k are i.i.d. realizations from G_0, for k = 1, 2, …, and are also independent of {b_1, b_2, …}. Note that (5) implies that G is discrete with probability one, and has expectation G_0. DPs thus provide ways to place priors on probability measures.

The dependent Dirichlet process (DDP) is an extension of the DP in the sense that it allows for a prior distribution to be specified on a set of random probability measures, rather than on a single random probability measure. In other words, the realizations γ*_k can be extended to accommodate an entire time-series domain T, such that Γ*_{k,T} = {γ*_{kt}; t ∈ T}. Following (5), the random process thus constructed can be represented as

G^(T) ≡ Σ_{k=1}^{∞} p_k δ_{Γ*_{k,T}},   (6)

with form similar to that used for spatial DP models [see Gelfand, Kottas and MacEachern (2005)]. Note that the Γ*_{k,T} in (6) are realizations of some stochastic process Γ_T = {γ_t; t ∈ T}, with distribution G^(T)_0, for k = 1, 2, …. Hence, Kolmogorov's consistency holds for Γ_T. That is, finite dimensional joint distributions {γ_t; t ∈ t_T}, for ordered time-points t_T = {t_1, …, t_T}, can be obtained from all finite but higher-dimensional joint distributions {γ_t; t ∈ t*_T ∪ t_T} (here t*_T is a finite set) specified by the process, by marginalizing over {γ_t; t ∈ t*_T}. Since (6) shows that G^(T) is specified completely by the process Γ_T and {p_k; k = 1, 2, …}, and since the latter are independent of t, it follows that Kolmogorov's consistency holds for G^(T), providing a formal setup of a stochastic process of random distributions. In particular, for any t ∈ T, G^({t}) ∼ DP(τ G^({t})_0) [and admits the representation G^({t}) ≡ Σ_{k=1}^{∞} p_k δ_{γ*_{kt}}]. The collection of random measures G^(T) is said to follow the DDP [see, e.g., MacEachern (2000); De Iorio et al. (2004); Gelfand, Kottas and MacEachern (2005)].

The process Γ_T may be a time series that is stationary or, as adopted in our application and more realistically, asymptotically so. Indeed, while asymptotic stationarity is a very slight departure from stationarity, Section 1.2.2 demonstrates that it can have quite a significant impact on inference. It is also important to observe that although the process may be stationary or asymptotically stationary under G^(T)_0, the same process when conditioned on G^(T) is not even asymptotically stationary. Specifically,

E(γ_t | G^(T)) = Σ_{k=1}^{∞} p_k γ*_{kt},

Var(γ_t | G^(T)) = Σ_{k=1}^{∞} p_k (γ*_{kt})² − (Σ_{k=1}^{∞} p_k γ*_{kt})²

and

Cov(γ_s, γ_t | G^(T)) = Σ_{k=1}^{∞} p_k γ*_{ks} γ*_{kt} − (Σ_{k=1}^{∞} p_k γ*_{ks})(Σ_{k=1}^{∞} p_k γ*_{kt}).

Thus, G^(T) is nonstationary, although under G^(T)_0, Γ_T may have a stationary model so that the mean is constant and the covariance depends upon time only through the time lag |t − s|. Thus, we have defined here a process G^(T) that is centered around a stationary process, but is actually nonstationary. For purposes of applications, we have, given (ordered) time-points (t_1, …, t_T), a T-variate distribution G^(T) on the space of all T-variate distributions of (γ_1, …, γ_T)′, with mean G^(T)_0 being the T-variate distribution implied by a standard time series.

The development of our nonstationary temporal process here technically resembles that of a similar spatial process in Gelfand, Kottas and MacEachern (2005), but differs from the latter in that it is actually embedded in the model for the observed fMRI signals. As a result, the full conditional distributions of the γ_ij(t) in our model are much more general and complicated than similar derivations following Gelfand, Kottas and MacEachern (2005). Another important difference between our approach and that of Gelfand, Kottas and MacEachern (2005) is that the latter had to introduce a pure error ("nugget") process to avoid discreteness of the distribution of their spatial data. Such discreteness of the distribution (of our temporal data) is naturally avoided here, however, owing to the embedding approach used in our modeling. Gelfand, Kottas and MacEachern (2005) also rely on the availability of replications of the spatial dataset: our embedding approach obviates this requirement by merely assuming the availability of replicated (unobserved) random processes. We now introduce our dynamic effective connectivity model.

2.2. A Dirichlet process-based dynamic effective connectivity model.
2.2.1. Hierarchical modeling.
For i, j = 1, 2, …, R, define the T-component vectors Γ_ij = (γ_ij(1), γ_ij(2), …, γ_ij(T))′. Further, let the Γ_ij be i.i.d. G^(T), where G^(T) ∼ DP(τ G^(T)_0), with τ denoting the scale parameter quantifying uncertainty in the base prior distribution G^(T)_0. Also, assume that under G^(T)_0, γ_ij(1) ∼ N(γ̄, σ_γ²) and, for t = 2, …, T, γ_ij(t) = ρ γ_ij(t − 1) + δ_ij(t), where |ρ| < 1 and the δ_ij(t) ∼ N(0, σ_δ²) are i.i.d. for i, j = 1, 2, …, R; t = 2, …, T. It follows that under G^(T)_0, Γ_ij ∼ N_T(γ̄ μ_T, Σ), where μ_T = (1, ρ, ρ², …, ρ^{T−1})′ and, for s ≤ t, Σ has (s, t)-th element

Σ_st = ρ^{s+t−2} σ_γ² + ρ^{t−s} σ_δ² (1 − ρ^{2(s−1)}) / (1 − ρ²).   (7)

Note that with G^(T)_0 as described above, the process is stationary if we choose γ̄ = 0 and σ_γ² = σ_δ²/(1 − ρ²); otherwise the process converges to stationarity for large s. In other words, under G^(T)_0, E(γ_ij(s)) = E(ρ^{s−1} γ_ij(1) + Σ_{r=0}^{s−2} ρ^r δ_ij(s − r)) = ρ^{s−1} γ̄, which converges to 0 as s → ∞, while from (7) it follows that, as s → ∞ with t − s < ∞, Σ_st → ρ^{t−s} σ_δ²/(1 − ρ²). The case for s > t is similar. Using the above developments, we specify our dynamic effective connectivity model hierarchically, by augmenting (1) and (2) with the following model for the γ_ij(t):

Γ_ij i.i.d. ∼ G^(T) for i, j = 1, 2, …, R, where G^(T) ∼ DP(τ G^(T)_0).

Distributional assumptions on the ε_i(t), ω_i(t) and δ_ij(t) are as in Section 1.2.1. We use M_DP to refer to this model: note also that as τ → ∞, our DP-based model converges to the AR(1) model, which we denote by M_AR. We note in closing that the effective connectivity parameters are AR(1), hence VAR, under the expected distribution of M_DP. Of course, they are trivially also so under M_AR. Note, however, that given a realization of a random distribution from the Dirichlet process, such a VAR representation does not hold.

2.2.2. Other prior distributions.
We specify independent prior distributions on each of σ_ε², σ_ω², σ_δ², ρ, τ, α_i, β_i(1) and γ_ij(1); i, j = 1, 2, …, R. Specifically, the α_i are assumed to be i.i.d. N(μ_i, σ_α²) for i = 1, 2, …, R, and the β_i(1) are assumed to be i.i.d. N(β̄, σ_β²) for i = 1, 2, …, R. Also, the γ_ij(1) are independently distributed with mean γ̄ and variance σ_γ², while ρ is uniformly distributed on (−1, 1), τ ∼ Γ(a_τ, b_τ), and σ_ε^{−2}, σ_ω^{−2} and σ_δ^{−2} are each i.i.d. Γ(a, b), with density having functional form proportional to x^{a−1} e^{−bx}. Here μ_i, σ_α², β̄, σ_β², γ̄, σ_γ², a, b, a_τ and b_τ are all hyperparameters. In our examples, we take a = b = 0, reflecting our ignorance of these unknown parameters. Although the Gamma priors with a = b = 0 are improper, they yielded proper posteriors in our case, vindicated by fast convergence of the corresponding marginal chains and resulting right-skewed posterior density estimates, which are expected of proper posteriors having positive support. For (a_τ, b_τ), we first fix the expected value of Γ(a_τ, b_τ) (given by a_τ/b_τ) to be such that, in the full conditional distribution of Γ_ij given by (8), the "expected" probability of simulating a new realization from the "prior" base measure approximately equals the probability of selecting realizations of Γ_{i′j′}, for some (i′, j′) ≠ (i, j). Hence, if there are R² nonzero Γ_ij in the model, then setting a_τ = c(R² − 1) and b_τ = c serves the purpose. The resulting prior distribution has variance equal to its expectation if c = 1. To achieve large variance, we set c = 0.1; the associated prior worked well in our examples. We also experimented with c = 0.01 and c = 0.001, and noted that, in terms of percentage of inclusion of the true γ_ij, all inclusion percentages, with the exception of two of the γ_ij, were quite robust with respect to c. The results corresponding to c = 0.01 and c = 0.001 were quite similar, while those corresponding to c = 0.1 differed slightly. The remaining hyperparameters were estimated empirically from the data, as in Bhattacharya, Ho and Purkayastha (2006), using Berger's (1985) ML-II approach.

2.2.3. Full conditional distributions.
The posterior distribution of theparameters are specified by their full conditionals, which are needed forGibbs sampling. The full conditional distributions of α i , β i ( t ), σ ε and σ ω areof standard form (see Section S-1.1), while those of the Γ ij ’s require somecareful derivation. To describe these, note that, on integrating out G ( T ) ,the prior conditional distribution of Γ ij given Γ kℓ for ( k, ℓ ) = ( i, j ) followsa Polya urn scheme, and is given by[ Γ ij | Γ kℓ ; ( k, ℓ ) = ( i, j )] ∼ τ G ( T )0 + P ( k,ℓ ) =( i,j ) δ Γ kℓ τ + { ( k, ℓ ) : ( k, ℓ ) = ( i, j ) } . (8)The above Polya urn scheme shows that marginalization with respect to G induces dependence among Γ ij in the form of clusterings, while maintain-ing the same stationary marginal G ( T )0 for each Γ ij . For Gibbs sampling weneed to combine (8) with the rest of the model to obtain the full condi-tional distribution given all the other parameters and the data. We obtainthe full conditionals by first defining, for i, j = 1 , , . . . , R , diagonal matri-ces A ij = σ − ω diag { , x (1) β j (1) , x (1) β j (2) , . . . , x ( T − β j ( T − } , wherediag lists the diagonal elements of the relevant matrix. We also define T -variate vectors B ij for i, j = 1 , , . . . , R with first element equal to zero. For t = 2 , . . . , T the t th element of B ij is B ij ( t ) = σ − ω [ β i ( t ) β j ( t − x ( t − − β j ( t − x ( t − P Rℓ =1 : ℓ = j γ iℓ ( t ) β ℓ ( t − Γ ij | · · · ] ∼ q ( ij )0 G ( T ) ij + X ( k,ℓ ) =( i,j ) q ( kℓ ) δ Γ kℓ , (9)where G ( T ) ij is the T -variate normal distribution with mean ( Σ − + A ij ) − (¯ γ Σ − × µ + B ij ) and variance ( Σ − + A ij ) − . 
Also, q ( ij )0 = C τ | I + ΣA ij | / × exp (cid:20) − { ¯ γ µ ′ T Σ − µ T (10) − (¯ γ Σ − µ T + B ij ) ′ ( Σ − + A ij ) − (¯ γ Σ − µ T + B ij ) } (cid:21) and q ( kℓ ) = C exp[ − ( Γ kℓ − A − ij B ij ) ′ A ij ( Γ kℓ − A − ij B ij ) − B ′ ij A − ij B ij ](11) YNAMIC EFFECTIVE CONNECTIVITY IN FMRI for ( k, ℓ ) = ( i, j ), with C chosen to satisfy q ( ij )0 + P ( k,ℓ ) =( i,j ) q ( kℓ ) = 1. Observethat unlike all DP-based approaches hitherto considered in the statisticsliterature, in our case G ( T ) ij , the conditional posterior base measure is notindependent of Γ i ′ j ′ for ( i ′ , j ′ ) = ( i, j ), which is a consequence of the factthat, thanks to (2), Γ i ′ j ′ are not conditionally independent of each other.Thus, our methodology generalizes other DP-based methods, including thatof Gelfand, Kottas and MacEachern (2005).Section S-1.2 presents an alternative algorithm to updating Γ ij using con-figuration indicators which are updated sequentially using themselves andonly the distinct Γ ij , given everything else. MacEachern (1994) has arguedthat such an updating procedure theoretically improves convergence prop-erties of the Markov chain: however, Section S-1.3 shows that in our case theassociated conditional distributions need to be obtained separately for eachof the 2 possible configuration indicators. This being infeasible, we recom-mend (9) for updating Γ ij . [We remark here that full conditionals are easilyobtained using configuration indicators in the case of Gelfand, Kottas andMacEachern (2005), thanks to the relative simplicity of their spatial prob-lem.] 
Also, (10) and (11) imply that as τ → ∞, the full conditional distribution (9) converges to G_ij^(T), which is actually the full conditional distribution of the entire T-dimensional parameter vector Γ_ij under the AR(1) model. In either case, we provide computationally efficient multivariate updates for our Gibbs steps: this makes our problem computationally tractable.

To obtain the full conditional of τ, define m = #{(i, j); i, j = 1, 2, ..., R} = R². Then, note that as in Escobar and West (1995), for a Γ(a_τ, b_τ) prior on τ, the full conditional distribution of the latter, given the number d of distinct Γ_ij and another continuous random variable η, is a mixture of two Gamma distributions, specifically π_η Γ(a_τ + d, b_τ − log(η)) + (1 − π_η) Γ(a_τ + d − 1, b_τ − log(η)), where π_η/(1 − π_η) = (a_τ + d − 1)/(m(b_τ − log(η))). Also, the full conditional of η is Beta(τ + 1, m). Finally, the full conditional distributions of σ²_δ and ρ are not very standard and need careful derivation. Section S-1.4 describes a Gibbs sampling approach using configuration sets for updating σ²_δ and ρ. For implementing this Gibbs step, one does not need to simulate the configuration indicators, as they can be determined after simulating the Γ_ij's using (9). Hence, this step is feasible. However, we failed to achieve sufficiently good convergence with this approach, and hence used a Metropolis–Hastings step. The acceptance ratio for the Metropolis–Hastings step is given by [Γ_1][Γ_2 | Γ_1][Γ_3 | Γ_1, Γ_2] ⋯ [Γ_m | Γ_1, ..., Γ_{m−1}], evaluated, respectively, at the new and the old values of the parameters (σ²_δ, ρ). In the above, [Γ_1] ∼ G_0^(T), and the other factors are Polya urn distributions, following easily from (8). Once again, note the use of multivariate updates in the MCMC steps, making our updating approach computationally feasible and easily implemented.
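The Escobar–West auxiliary-variable update for τ described above can be sketched as follows (the prior parameters a_τ, b_τ and the counts d, m are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)

def update_tau(tau, d, m, a_tau, b_tau):
    """One Escobar-West Gibbs update of the DP precision tau, given d
    distinct atoms among the m = R^2 parameters Gamma_ij.
    Step 1: eta | tau ~ Beta(tau + 1, m).
    Step 2: tau | eta, d ~ the two-component Gamma mixture in the text."""
    eta = rng.beta(tau + 1.0, m)
    rate = b_tau - np.log(eta)                 # positive, since log(eta) < 0
    odds = (a_tau + d - 1.0) / (m * rate)      # pi_eta / (1 - pi_eta)
    pi_eta = odds / (1.0 + odds)
    shape = a_tau + d if rng.uniform() < pi_eta else a_tau + d - 1.0
    return rng.gamma(shape, 1.0 / rate)        # numpy parameterizes by scale

tau = 1.0
for _ in range(100):                           # a short illustrative chain
    tau = update_tau(tau, d=4, m=9, a_tau=0.5, b_tau=0.5)
```

Note that numpy's `Generator.gamma` takes a scale argument, so the rate b_τ − log(η) is inverted before sampling.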
We conclude this section by noting that our model is structured to be identifiable. The priors of α_i, β_i(t) and γ_ij(t) are all different and informative. Further, (2) shows that β_i(t) is not permutation-invariant with respect to the indices of the Γ_ij's. Identifiability of our model is further supported by the results in this paper, which show all posteriors (based on MCMC) to be distinct and different. This is unlike the case of the usual Dirichlet process-based mixture models, which are permutation-invariant, as in Escobar and West (1995), where the parameters have the same posterior due to nonidentifiability. We now investigate the performance of our methodology.
3. Simulation studies.
We performed a range of simulation experiments to investigate the performance of our approach relative to its alternatives. Since there are 9 nonzero Γ_ij's in our model, we followed the recipe provided in Section 2 and put a Gamma prior on the DP scale parameter τ. We investigated fitting M_DP, M_AR and M_RW to the simulated data of Section 1.2.2 and also to data simulated from the M_RW and M_AR models, the latter with both ρ = 0.5 and ρ = 0.95 (where the model is not so clearly distinguished from nonstationarity, but more clearly distinguished than for the borderline data of Section 1.2.2). The procedure for fitting M_AR in our simulations was very similar to that for M_RW detailed in Bhattacharya, Ho and Purkayastha (2006): we omit details. For all experiments in this paper and in the supplement, we discarded the first 10,000 MCMC iterations as burn-in and stored the following 20,000 iterations for Bayesian inference. Our results are summarized here for want of space, but presented in detail in Section S-2, with performance evaluated graphically [in terms of the posterior densities of the γ_ij(t)'s] and numerically using coverage and average lengths of the 95% HPD credible intervals of the posterior predictive distributions (for details, see Section S-2).

The results of our experiments using the simulated data of Section 1.2.2 showed that M_AR performed better than M_RW, but model M_DP was the clear winner. Indeed, the supports of the posterior distributions of some of the γ_ij(t)'s using M_AR were much too wide to be of much use, but substantially narrower under M_DP. M_DP also outperformed the other two models in terms of the proportion of true γ_ij(t)'s included in the corresponding 95% HPD CIs. These CIs captured almost all of the true values of γ_ij(t) under M_DP, but far fewer using M_AR. M_DP also exhibited better predictive performance than M_AR and M_RW. All these findings favoring our DP-based model were implicitly a consequence of the fact that the true model in our experiment was approximately nonstationary, and modeled more flexibly by our nonstationary DP model than by the stationary AR(1) model. That this borderline between stationarity and nonstationarity of the true model is important was vindicated by the results of fitting M_RW, M_AR and M_DP on the dataset simulated using M_RW. Here, M_RW outperformed both M_DP and M_AR in terms of coverage of the true values of γ_ij(t), indicating that M_DP may under-perform relative to the true model, in terms of coverage of parameter values, when the true model can be clearly identified. In terms of predictive ability, however, M_DP was still the best performer, both in terms of coverage of the data points by the posterior predictive distribution and the lengths of the associated 95% CIs. This finding was not unexpected, since M_DP involves model averaging (see Section S-1.5), which improves predictive performance [see, e.g., Kass and Raftery (1995)]. For the dataset simulated from M_AR with ρ = 0.5, the true model (M_AR) outperformed M_DP marginally and M_RW substantially, but when ρ = 0.95, M_DP provided a much better fit than M_AR or M_RW. We have already mentioned that M_DP outperformed M_AR (and M_RW) in the borderline case; the experiment with ρ = 0.95 demonstrated good performance of M_DP even in relatively more distinguishable situations. At the same time, the experiment with ρ = 0.5 revealed a caveat for M_DP: for clearly stationary data, we are at least marginally better off replacing M_DP with a stationary model such as M_AR. In spite of this caveat for clearly stationary situations, our simulation experiments indicated that our DP-based approach is flexible enough to address stationary models as well as deviations from them. We now analyze the Stroop task dataset introduced in Section 1.1.
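The numerical evaluation criteria used throughout, coverage and average length of the 95% HPD intervals, can be computed from MCMC output along the following lines (a sketch with simulated draws; the sorted-window estimator of the HPD interval assumes a unimodal posterior):

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest (highest posterior density) interval containing `prob` mass,
    estimated as the narrowest window over the sorted MCMC draws."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(prob * n))
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def coverage_and_length(draws_by_t, truth):
    """Proportion of true values inside, and average length of, the 95% HPD
    intervals -- the two summaries used in the model comparisons above."""
    hits, lengths = [], []
    for draws, value in zip(draws_by_t, truth):
        lo, hi = hpd_interval(draws)
        hits.append(lo <= value <= hi)
        lengths.append(hi - lo)
    return float(np.mean(hits)), float(np.mean(lengths))

# toy check: posteriors centered on the true value of zero at each of 10 t's
rng = np.random.default_rng(3)
draws_by_t = [rng.normal(0.0, 1.0, 5000) for _ in range(10)]
cov, avg_len = coverage_and_length(draws_by_t, truth=np.zeros(10))
```

The same two summaries apply unchanged to the posterior predictive draws when `truth` is replaced by the observed data points.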
4. Application to Stroop task data.
The dataset was preprocessed following Ho, Ombao and Shumway (2005) and Bhattacharya, Ho and Purkayastha (2006), to which we refer for details, providing only a brief summary here. For each of the three regions (LG, MOG and DLPFC), a spherical region of 33 voxels was drawn around the location of peak activation. The voxel-wise time series of the selected voxels in each region were then subjected to higher-order (multi-linear) singular value decomposition (HOSVD) using methods in Lathauwer, Moor and Vandewalle (2000). The first mode of this HOSVD, after detrending with a running-line smoother as in Marchini and Ripley (2000), provided us with our detrended time series response y_i(t) for the i-th region [see Figure S-4 for the y_i(t)'s as well as x(t)]. We compared results obtained using M_DP with those using M_RW and M_AR. We refer to Bhattacharya, Ho and Purkayastha (2006) and the supplement for detailed results using M_RW and M_AR, respectively, only summarizing them here in comparison with results obtained using M_DP, which we also discuss in greater detail. Detailed studies on MCMC convergence are in Section S-3.2.

4.1. Results.
Figure 2 displays the Gibbs-estimated marginal posterior distributions of the γ_ij(t)'s for each time point t obtained using M_DP.

Fig. 2. Estimated posterior densities (means in solid lines) of the regional influences over time.

A striking feature of the marginal posterior densities of Figure 2 is the very strong oscillatory nature of these effective connectivity parameters with the modeled BOLD response x(t). This is quite different from the posterior distributions of the γ_ij(t)'s obtained using M_AR (see Figure S-7). Table 2 evaluates the performance of the two models in terms of the lengths of, and the proportion of observations contained in, the 95% HPD credible intervals of the posterior predictive distributions: the intervals obtained using M_DP have greater coverage but are also much narrower, making it by far the better choice among the models.

Table 2. Proportions of observed y_i included in, and average lengths of, the 95% credible intervals of the posterior predictive distributions under M_AR and M_DP for the Stroop task dataset (rows y_1, y_2, y_3; columns give the proportions and average lengths under each model).

Figure 2 also shows that three of the γ_ij(t)'s, and, to a lesser extent, two others, oscillate differently from the rest in that their amplitude is close to zero. We examined this issue further through Figure 3, which provides a map of the proportions of the cases for which each estimated marginal posterior density of γ_ij(t) has positive support at time t. The intensities are mapped via a red-blue diverging palette: darker hues of blue and red indicate high and low values, respectively, for the proportions, while lighter hues of red or blue indicate values in the middle. Clearly, very little of the marginal density is on either the positive or the negative part of the real line for three of the γ_ij(t)'s. We therefore investigated performance of models M_DP modified to exclude some or all of these regional influences.

Fig. 3. Proportions of estimated marginal posterior density of γ_ij(t) with positive support at t.
Investigating restricted submodels of M_DP. Bhattacharya, Ho and Purkayastha (2006) found that the model M_RW with two of the regional influences constrained to zero (henceforth M⁻_RW) provided better results than the unconstrained M_RW. Figure 2 also points to the possibility that models with some γ_ij(t) ≡ 0 may perform better. We compared such restricted submodels with M_AR and M_DP by considering the proportion of data contained in, and the average lengths of, the 95% HPD CIs of the corresponding posterior predictive distributions of y_i(t); i = 1, 2, 3, t = 1, ..., T. A systematic evaluation of all possible submodels is computationally very time-consuming, so we investigated models that set to zero combinations of those γ_ij(t)'s whose posterior distributions exhibited lesser amplitudes of oscillation as per Figure 2. Table 3 summarizes the performances of the top three submodels; others are in Tables S-11 and S-12. The top three performers were the following:

• M(1)_DP: M_DP but with one of these influences set to zero for all t.
• M(2)_DP: M_DP but with another of these influences set to zero for all t.
• M(3)_DP: M_DP but with both of these influences set to zero for all t.

Thus, M(1)_DP and M(2)_DP both beat M_DP (of Table 2). The average 95% posterior predictive length using M(2)_DP is about midway between those of M(1)_DP and the unrestricted DP-based model, so we report our final findings and conclusions only using M(1)_DP.

4.2. Summary of findings.
Figure 4(a)–(h) displays the posterior densities of the nonnull regional influences γ_ij(t) over time. These γ_ij(t)'s are very similar to those in Figure 2(a)–(h), with nonzero effective connectivity parameters again having a very pronounced oscillation synchronous with the modeled BOLD response: indeed, only the influence in Figure 4(f) has an oscillation slightly more damped than in Figure 2.

Table 3. Proportions of the observed data in, and mean lengths of, the 95% credible intervals of the posterior predictive distributions of y_1, y_2, y_3 for the top three candidate submodels M(1)_DP, M(2)_DP and M(3)_DP.

Fig. 4. (a)–(h) Estimated posterior densities (means in solid lines) of the nonnull regional influences over time using M(1)_DP. (i) Proportion of the posterior distribution of γ_ij(t) with positive support at time t.

Further, Figure 4(i) indicates that the estimated posterior densities put most of their mass either below zero [when x(t) is negative] or above zero [when x(t) is positive]. Indeed, these densities have substantial mass around zero only when x(t) is around zero. We also smoothed the modeled BOLD response x(t) to explore further its relationship with each of the estimated posterior mean γ_ij(t)'s from M(1)_DP. For each t, we specified x(t) = A cos(2πωt + φ) + ψ_t, where the ψ_t are i.i.d. N(0, σ²_ψ), A is the amplitude of the time series, ω is the oscillation frequency and φ is a phase shift. Equivalently, x(t) = β₁ cos(2πωt) + β₂ sin(2πωt) + ψ_t with β₁ = A cos(φ) and β₂ = −A sin(φ). We obtain ω̂ ≈ 0.02, so that each oscillation of x(t) has a length of about 50 time-points. A least squares fit yields β̂₁ = 0.27 and β̂₂ = −0.75; hence Â = 0.80 and φ̂ = 1.16. Figure S-8 shows that the smoothed BOLD response x̂(t) = β̂₁ cos(2πω̂t) + β̂₂ sin(2πω̂t) closely approximates the original time series x(t). The correlations of x̂(t) with the eight nonnull γ_ij(t)'s are 0.959, 0.909, 0.952, 0.950, 0.922, 0.874, 0.949 and 0.929, respectively. Thus, the γ_ij(t)'s are not completely linear in the BOLD response, but very close to being so with regard to its transformed version.

The results of our analysis indicate that the region LG, though centered around zero, exhibits very strong evidence of self-feedback, oscillatory with high amplitude and a period of about 50, matching the period of the modeled BOLD response x(t). Similar influences are exerted by both MOG and DLPFC on LG, and by the MOG region on itself. Indeed, Figure 4 indicates that these four inter- and intra-regional influences have, broadly, a similar pattern in terms of amplitude. The influences of LG on MOG and on DLPFC are smaller and similar to each other. Further, Figure 4(f) and (h) indicate that the feedback provided by DLPFC to MOG is similar to that in the reverse direction. Thus, there are three broad patterns in the way that inter- and intra-regional influences occur.

Our analysis also demonstrates the existence of the ACN and its mechanism while performing a Stroop task. Thus, the executive control system (DLPFC) provides instruction to both the task-irrelevant (LG) and task-relevant (MOG) processing sites, but gets similar levels of feedback from the task-relevant processor (MOG). LG, which sifts out the task-irrelevant color information, gets a lot of feedback in doing so from both itself and MOG. However, it provides far less feedback to the task-relevant shape-information-processing MOG and to the executive control DLPFC. MOG itself provides substantial self-feedback while processing shape information.
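The harmonic-regression smoothing of x(t) described above can be sketched as follows (a least-squares fit at a fixed frequency; the data and the frequency value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

def fit_harmonic(x, omega):
    """Least-squares fit of x(t) ~ b1*cos(2*pi*omega*t) + b2*sin(2*pi*omega*t),
    returning (b1, b2, amplitude A, phase phi) under the parameterization
    x(t) = A cos(2*pi*omega*t + phi), i.e. b1 = A cos(phi), b2 = -A sin(phi)."""
    t = np.arange(len(x))
    design = np.column_stack([np.cos(2 * np.pi * omega * t),
                              np.sin(2 * np.pi * omega * t)])
    (b1, b2), *_ = np.linalg.lstsq(design, x, rcond=None)
    amplitude = np.hypot(b1, b2)
    phase = np.arctan2(-b2, b1)    # invert b1 = A cos(phi), b2 = -A sin(phi)
    return b1, b2, amplitude, phase

# toy series with period 50 (omega = 0.02), mimicking the Stroop x(t)
t = np.arange(250)
x = 0.8 * np.cos(2 * np.pi * 0.02 * t + 1.2) + 0.05 * rng.standard_normal(t.size)
b1, b2, A, phi = fit_harmonic(x, omega=0.02)
```

In practice ω itself would be estimated, e.g. by minimizing the residual sum of squares of this fit over a grid of candidate frequencies.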
Finally, note that our results indicate higher amplitudes for inter-regional feedback involving LG than for that involving MOG; this is consistent with the established notion that processing shape information is a higher-level (more difficult) cognitive function than distinguishing color.

The results on the effective connectivity parameters using M_DP are very different from those obtained using M⁻_RW [see Figure 5 of Bhattacharya, Ho and Purkayastha (2006)] or M_AR. Using M⁻_RW, Bhattacharya, Ho and Purkayastha (2006) found some evidence of self-feedback only in LG: the 95% HPD BCRs contained zero unless t increased. Further, while the relationship of the posterior mean appeared somewhat linear in t, there was no relationship with the modeled BOLD response. Most γ_ij(t)'s, with one exception, were almost invariant with respect to time t, unlike the clear oscillatory nature of the time series obtained here using M(1)_DP (or even M_DP). The fact that the BOLD response had very little relationship with these effective connectivity parameters is perplexing, given that these regions were the ones found to be activated in the preprocessing of the fMRI dataset. The results on the γ_ij(t)'s using M_AR were also very surprising: while the posterior means oscillated synchronously with x(t) only for the task-irrelevant LG, with a correlation of 0.943, there was no evidence of nonzero values for all the other effective connectivity parameters (including those involving the task-relevant MOG), since their pointwise 95% HPD credible regions all contained zero for all time t. This is very unlike the results obtained using M(1)_DP, which also established the existence of the ACN theory in performing this task.
Indeed, among all the approaches considered in the literature and here on this dataset, only the DP-based analyses have been able to capture both the dynamic and the oscillatory nature of the effective connectivity parameters. In doing so, we also obtain further insight into how an individual brain performs a Stroop task.
5. Conclusions and future work.
Effective connectivity analysis provides an important approach to understanding the functional organization of the human brain. Bhattacharya, Ho and Purkayastha (2006) provide a coherent and elegant Bayesian approach to incorporating uncertainty in the analysis; we note, however, that their approach also brings with it some limitations. In this paper we therefore propose a nonstationary and nonparametric Bayesian approach using a DP-based model that embeds an AR(1) process in a class of many possible models. Heuristically, our suggestion has some connection with model averaging, where we have, a priori, an AR(1) model in mind for specifying dynamic effective connectivity: the DP provides a coherent way to formalize this intuition. We have also derived an easily implemented Gibbs sampling algorithm for learning about the posterior distributions of all the unknown quantities. Simulation studies show that our model is a better candidate for the analysis of effective connectivity in many cases, the advantage being more pronounced with increasing departures from stationarity in the true model. We also applied our methodology to investigate the feedback mechanisms between the task-irrelevant LG, the task-relevant MOG and the "executive control" DLPFC in the context of a single-subject Stroop task study. Our results showed strong self-feedback for LG and MOG, but not for DLPFC. Further, MOG and DLPFC influence LG strongly, but the reverse influences are rather mild. The influences of MOG on DLPFC and vice versa are very similar. All these discovered feedback mechanisms oscillate strongly in the manner of the BOLD signal and are supportive of the framework postulated by ACN theory. Our analysis also provides insight into the mechanism by which the brain performs a Stroop task. All these are novel findings not reported in the context of fMRI analysis in the literature.
Thus, adoption of our DP-based approach not only provided interpretable results but, as very kindly pointed out by a reviewer, yielded additional insights into the workings of the brain.

There are several aspects of our methodology and analysis that deserve further attention. For one, we have investigated ACN in the context of a Stroop task for a single male volunteer. It would be of interest to study other tasks and responses to other stimuli, to see how our results on a Stroop task translate to multiple subjects, and to investigate how these mechanisms differ from one person to another. Our modeling approach can easily be extended to incorporate such scenarios. Further, our methodology, while developed and evaluated in the context of modeling dynamic effective connectivity in fMRI datasets, can be applied to other settings as well, especially in situations where the actual models for the unknowns may be quite difficult to specify correctly. Thus, we note that while this paper has made an interesting contribution to analyzing dynamic effective connectivity in single-subject fMRI datasets, several interesting questions and extensions meriting further attention remain.

SUPPLEMENTARY MATERIAL
Contents (DOI: 10.1214/11-AOAS470SUPP; .pdf). Section S-1 contains additional details regarding our methodology, including explicit forms of the full conditional distributions of specific parameters, the configuration indicators and the distinct parameters associated with the Dirichlet process needed for Gibbs sampling. Detailed arguments showing the model averaging associated with our DP-based model M_DP are also presented there. Section S-2 provides additional information on our simulation experiments, including associated methodology and results. Section S-3 presents further details on the analysis of the Stroop task experiment, including display of the data, detailed assessment of convergence of our MCMC samplers when using M_DP and M(1)_DP, and MCMC-based posterior analysis using M_AR and other additional models obtained by setting some effective connectivity parameters to zero. Additional methodological details and results regarding the smoothing of the modeled BOLD signal x(·) are also presented there.

Acknowledgments.
The authors are very grateful to the Editor and two reviewers, whose very detailed and insightful comments on earlier versions of this manuscript greatly improved its content and presentation.

REFERENCES
Aertsen, A. and Preißl, H. (1991). Dynamics of activity and connectivity in physiological neuronal networks. In Non-linear Dynamics and Neuronal Networks (H. G. Schuster, ed.) 281–302. VCH, New York.
Banich, M. T., Milham, M. P., Atchley, R., Cohen, N. J., Webb, A., Wszalek, T., Kramer, A. F., Liang, Z. P., Wright, A., Shenker, J. and Magin, R. (2000). fMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection. Journal of Cognitive Neuroscience.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York. MR0804611
Bhattacharya, S., Ho, M. R. and Purkayastha, S. (2006). A Bayesian approach to modeling dynamic effective connectivity with fMRI data. NeuroImage.
Bhattacharya, S. and Maitra, R. (2011). Supplement to "A nonstationary nonparametric Bayesian approach to dynamically modeling effective connectivity in functional magnetic resonance imaging experiments." DOI: 10.1214/11-AOAS470SUPP.
Büchel, C. and Friston, K. J. (1998). Dynamic changes in effective connectivity characterized by variable parameter regression and Kalman filtering. Human Brain Mapping.
Buxton, R. B., Wong, E. C. and Frank, L. R. (1998). Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magnetic Resonance in Medicine.
Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L. and Petersen, S. E. (1991). Selective and divided attention during visual discrimination of shape, color and speed: Functional anatomy by positron emission tomography. Journal of Neuroscience.
De Iorio, M., Müller, P., Rosner, G. L. and MacEachern, S. N. (2004). An ANOVA model for dependent random measures. J. Amer. Statist. Assoc.
Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc.
Friston, K. (1994). Functional and effective connectivity in neuroimaging: A synthesis. Human Brain Mapping.
Friston, K. J. (2011). Dynamic causal modeling and Granger causality. Comments on: The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. NeuroImage. To appear.
Friston, K. J., Harrison, L. and Penny, W. (2003). Dynamic causal modeling. NeuroImage.
Friston, K. J., Mechelli, A., Turner, R. and Price, C. J. (2000). Nonlinear responses in fMRI: The Balloon model, Volterra kernels, and other hemodynamics. NeuroImage.
Frith, C. (2001). A framework for studying the neural basis of attention. Neuropsychologia.
Gelfand, A. E., Kottas, A. and MacEachern, S. N. (2005). Bayesian nonparametric spatial modeling with Dirichlet process mixing. J. Amer. Statist. Assoc.
Glover, G. (1999). Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage.
Goebel, R., Roebroeck, A., Kim, D. S. and Formisano, E. (2003). Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging.
Gössl, C., Auer, D. P. and Fahrmeir, L. (2001). Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics.
Harrison, L., Penny, W. D. and Friston, K. (2003). Multivariate autoregressive modeling of fMRI time series. NeuroImage.
Harrison, L., Stephan, K. E. and Friston, K. J. (2007). Effective connectivity. In Statistical Parametric Mapping: The Analysis of Functional Brain Images.
Henson, R. and Friston, K. J. (2007). Convolution models for fMRI. In Statistical Parametric Mapping: The Analysis of Functional Brain Images.
Ho, M. R., Ombao, H. and Shumway, R. (2003). Practice-related effects demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. NeuroImage.
Ho, M. R., Ombao, H. and Shumway, R. (2005). A state-space approach to modelling brain dynamics. Statist. Sinica.
Jaensch, E. R. (1929). Grundformen Menschlichen Seins. Otto Elsner, Berlin.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc.
Kelley, W. M., Miezin, F. M., McDermott, K. B., Buckner, R. L., Raichle, M. E., Cohen, N. J., Ollinger, J. M., Akbudak, E., Conturo, T. E., Snyder, A. Z. and Petersen, S. E. (1998). Hemispheric specialization in human dorsal frontal cortex and medial temporal lobe for verbal and nonverbal memory encoding. Neuron.
Kirk, E., Ho, M. R., Colcombe, S. J. and Kramer, A. F. (2005). A structural equation modeling analysis of attentional control: An event-related fMRI study. Cognitive Brain Research.
Lathauwer, L. D., Moor, B. D. and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl.
Lindquist, M. A. (2008). The statistical analysis of fMRI data. Statist. Sci.
Lu, Y., Bagshaw, A. P., Grova, C., Kobayashi, E., Dubeau, F. and Gotman, J. (2006). Using voxel-specific hemodynamic response function in EEG-fMRI data analysis. NeuroImage.
Lu, Y., Bagshaw, A. P., Grova, C., Kobayashi, E., Dubeau, F. and Gotman, J. (2007). Using voxel-specific hemodynamic response function in EEG-fMRI data analysis: An estimation and detection model. NeuroImage.
MacEachern, S. N. (1994). Estimating normal means with a conjugate-style Dirichlet process prior. Comm. Statist. Simulation Comput.
MacEachern, S. N. (2000). Dependent Dirichlet processes. Technical report, Dept. Statistics, Ohio State Univ., Columbus, OH.
Marchini, J. L. and Ripley, B. D. (2000). A new statistical approach to detecting significant activation in functional MRI. NeuroImage.
McIntosh, A. R. (2000). Towards a network theory of cognition. Neural Networks.
McIntosh, A. R. and Gonzalez-Lima, F. (1994). Structural equation modeling and its application to network analysis of functional brain imaging. Human Brain Mapping.
Milham, M. P., Banich, M. T. and Barad, V. (2003). Competition for priority in processing increases prefrontal cortex's involvement in top-down control: An event-related fMRI study of the Stroop task. Cognitive Brain Research.
Milham, M. P., Erickson, K. I., Banich, M. T., Kramer, A. F., Webb, A., Wszalek, T. and Cohen, N. J. (2002). Attentional control in the aging brain: Insights from an fMRI study of the Stroop task. Brain Cognition.
Milham, M. P., Banich, M. T., Claus, E. and Cohen, N. (2003). Practice-related effects demonstrate complementary role of anterior cingulate and prefrontal cortices in attentional control. NeuroImage.
Nyberg, L. and McIntosh, A. R. (2001). Functional neuroimaging: Network analysis. In Handbook of Functional Neuroimaging of Cognition (R. Cabeza and A. Kingstone, eds.) 49–72. MIT Press, Cambridge, MA.
Patriota, A. G., Sato, J. R. and Achic, B. G. B. (2010). Vector autoregressive models with measurement errors for testing Granger causality. Stat. Methodol.
Penny, W. D., Stephan, K. E., Mechelli, A. and Friston, K. J. (2004). Modeling functional integration: A comparison of structural equation and dynamic causal models. NeuroImage (Suppl. 1) 264–274.
Rykhlevskaia, E., Fabiani, M. and Gratton, G. (2006). Lagged covariance structure models for studying functional connectivity in the brain. NeuroImage.
Sato, J. R., Morettin, P. A., Arantes, P. R. and Amaro Jr., E. (2007). Wavelet-based time-varying vector autoregressive modeling. NeuroImage.
Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica.
Shumway, R. H. and Stoffer, D. S. (2006). Time Series Analysis and Its Applications: With R Examples. Springer, New York. MR2228626
Stephan, K. E., Weiskopf, N., Drysdale, P. M., Robinson, P. A. and Friston, K. J. (2007). Comparing hemodynamic models with DCM. NeuroImage.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology.
Thompson, W. K. and Siegle, G. (2009). A stimulus-locked vector autoregressive model. NeuroImage.
Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G. H., Morales, F. and Evans, A. C. (2002). A general statistical analysis for fMRI data. NeuroImage.

Bayesian and Interdisciplinary Research Unit
Indian Statistical Institute
203, B. T. Road
Kolkata 700108
India
E-mail: [email protected]