[PDF] Inferring the Type of Phase Transitions Undergone in Epileptic Seizures Using Random Graph Hidden Markov Models for Percolation in Noisy Dynamic Networks

Abstract

In clinical neuroscience, epileptic seizures have been associated with the sudden emergence of coupled activity across the brain. The resulting functional networks - in which edges indicate strong enough coupling between brain regions - are consistent with the notion of percolation, which is a phenomenon in complex networks corresponding to the sudden emergence of a giant connected component. Traditionally, work has concentrated on noise-free percolation with a monotonic process of network growth, but real-world networks are more complex. We develop a class of random graph hidden Markov models (RG-HMMs) for characterizing percolation regimes in noisy, dynamically evolving networks in the presence of edge birth and edge death, as well as noise. This class is used to understand the type of phase transitions undergone in a seizure, and in particular, distinguishing between different percolation regimes in epileptic seizures. We develop a hypothesis testing framework for inferring putative percolation mechanisms. As a necessary precursor, we present an EM algorithm for estimating parameters from a sequence of noisy networks only observed at a longitudinal subsampling of time points. Our results suggest that different types of percolation can occur in human seizures. The type inferred may suggest tailored treatment strategies and provide new insights into the fundamental science of epilepsy.

Full PDF

IInferring the Type of Phase TransitionsUndergone in Epileptic Seizures UsingRandom Graph Hidden Markov Models forPercolation in Noisy Dynamic Networks

Xiaojing Zhu ∗ , Heather Shappell ∗ , Mark A. Kramer,Catherine J. Chu, Eric D. Kolaczyk Abstract

In clinical neuroscience, epileptic seizures have been associated with the suddenemergence of coupled activity across the brain. The resulting functional networks –in which edges indicate strong enough coupling between brain regions – are consistentwith the notion of percolation, which is a phenomenon in complex networks corre-sponding to the sudden emergence of a giant connected component. Traditionally,work has concentrated on noise-free percolation with a monotonic process of networkgrowth, but real-world networks are more complex. We develop a class of randomgraph hidden Markov models (RG-HMMs) for characterizing percolation regimes innoisy, dynamically evolving networks in the presence of edge birth and edge death,as well as noise. This class is used to understand the type of phase transitions un-dergone in a seizure, and in particular, distinguishing between diﬀerent percolationregimes in epileptic seizures. We develop a hypothesis testing framework for inferringputative percolation mechanisms. As a necessary precursor, we present an EM algo-rithm for estimating parameters from a sequence of noisy networks only observed ata longitudinal subsampling of time points. Our results suggest that diﬀerent typesof percolation can occur in human seizures. The type inferred may suggest tailoredtreatment strategies and provide new insights into the fundamental science of epilepsy.

Keywords: Erdos-Renyi model, Achlioptas’ process, particle ﬁltering, data augmentation ∗ These authors contributed equally. Xiaojing Zhu ([email protected]), Eric D. Kolaczyk ([email protected]), Mark A. Kramer ([email protected]), Department of Mathematics and Statistics, BostonUniversity. Heather Shappell ([email protected]), Department of Biostatistics and Data Science,Wake Forest School of Medicine. Catherine J. Chu ([email protected]), Massachusetts General Hos-pital, Harvard Medical School. This research was supported by ARO award W911NF1810237 and NIHaward 1R01NS095369-01. a r X i v : . [ s t a t . A P ] F e b Introduction

Epilepsy is a common neurological syndrome, aﬀecting over 70 million people worldwide,and a major burden with respect to quality of life, morbidity, and risk of premature mor-tality (Thijs et al. (2019)). It is well known now that synchronization of neurons (and onthe macro-scale - groups of neurons that make up larger brain regions) play a pivotal rolein maintaining normal brain function. The abnormal synchronization of neurons, however,is a deﬁning characteristic of some neurological disorders, such as epilepsy.In recent years, it has become increasingly clear that epilepsy and seizures result notonly from isolated brain areas, but from networks of interacting brain regions (Kramer &Cash (2012)). More speciﬁcally, epileptic seizures have been associated with the suddenemergence of coupled/synchronized activity across the brain (Guye et al. (2006), Pontenet al. (2007), Schindler, Leung, Elger & Lehnertz (2007), Schindler, Elger & Lehnertz(2007), Schindler et al. (2010), Kramer et al. (2010), Martinet et al. (2020)). This synchro-nized activity may be deﬁned via functional brain networks, where the nodes of the networkrepresent distinct brain regions, and the edges between the nodes indicate strong couplingof brain activity (Rubinov & Sporns (2010)). Moreover, the emergence of coupled activity(or increased network edges), aligns with the notion of percolation – the sudden emergenceof a giant connected component (GCC) in a network ((Grimmett 2018, Chapter 3)).In percolation theory, the behavior of the GCC in a network is studied as a function ofits evolution over time. Two popular models of percolation are the Erdos-Renyi (ER) model(Erd˝os & R´enyi (1960)) and the Achlioptas’ process (Achlioptas et al. (2009)). They areboth random graph models assuming single-edge change over time, but they diﬀer in thechoice of which edge changes. The choice of edge addition in the ER model is uniform overall non-edges, while that in the Achlioptas’ process model is based on a ‘product rule’ (PR),which slows down the growth of the GCC by favoring the creation of edges between smaller2onnected components. Percolation in the ER model is considered a classical archetype,while that in the PR process is of a more rapid form. As such, these two percolation modelsrepresent prototypical extremes.Figure 1 depicts an example of functional brain network behavior during seizure onset.As can be seen in the ﬁgure, the size of the largest connected component of the brainnetwork increases dramatically following the clinically determined onset of the epilepticseizure. The behavior of the curve in this ﬁgure is qualitatively similar to that of a perco-lation curve, in which a network is undergoing a transition from a large collection of smallnetworks to a single large connected network.Increased interest in network percolation has recently been fueled by its relevance toepileptic seizures. While prior work has shown an explosive density increase (i.e. moreedges) in functional connectivity networks in epilepsy patients during seizure onset, align-ing with the notion of percolation, our work delves deeper to provide methods to uncoverthe underlying network evolution behavior behind the density increase. We aim to answerthe question: How can we distinguish between diﬀerent percolation regimes in practice?Understanding such phenomena, corresponding with the transition between normal brainfunction, seizure propagation, and seizure termination, may be critical for both the funda-mental science of epilepsy and developing improved strategies for treatment.In this paper, we propose a framework to distinguish between diﬀerent types of phasetransitions undergone in a seizure, and in particular, to distinguish between diﬀerent perco-lation regimes. Traditionally, work in this area has concentrated on noise-free percolationwith a monotonic process of network growth, but real-world networks are more complex.It is more realistic to consider both edge creation and dissolution in network evolution.Furthermore, we should expect observed networks to be contaminated by noise, such asmeasurement error. The presence of edge death and noise signiﬁcantly confounds thedistinction between percolation regimes and makes the two percolation models indistin-3uishable using heuristic statistics, e.g. the size of the largest component, as shown inFigure 2. Therefore, a framework for the statistical testing of competing hypotheses ofpercolation regimes, under these conditions, is needed.We develop a class of random graph hidden Markov models (RG-HMMs) and the neces-sary inferential methodologies, for characterizing percolation regimes in noisy, dynamicallyevolving networks in the presence of edge birth and edge death, as well as noise. Our modelclass builds on the framework proposed (with only preliminary inferential machinary) byViles et al. (2016), where the nonstationary process characterized by birth and death ofedges was modeled in a hidden/latent layer in discrete time, assuming the true underly-ing networks evolve by a single edge change per time step. We extend the model to thecontinuous-time setting where the process may stay in diﬀerent states for diﬀering (contin-uous) amounts of time. This is critical for making the framework applicable to epilepsy,as well as other real-world contexts, in which, even if observed at regular intervals, a dy-namically evolving network can almost never practically be observed at the resolution ofchanges in individual edge status.The remainder of this paper is organized as follows. In Section 2, we provide modeldeﬁnitions for continuous-time percolation models and our HMM set-up. An expectation-maximization (EM) algorithm for obtaining maximum likelihood (ML) estimates of themodel parameters, along with its asymptotic properties are described in Section 3. Section4 outlines a statistical testing framework for competing percolation regimes, and Section5 reports simulation results for our estimation and testing algorithms. An application forthe proposed framework to real epileptic seizure data is given in Section 6, followed by adiscussion in Section 7. 4igure 1: Proportion of nodes in thelargest component as a function of timefor a functional connectivity network de-duced from the electrocorticogram of asingle patient with epilepsy during aseizure. The black trace is a smoothedversion of this process. Dotted verticalline indicates seizure onset. Source: Vileset al. (2016) . . . . . . Time steps/Number of vertices P r opo r t i on o f V e r t i c e s i n G i an t C o m ponen t ER PercolationPR PercolationER with noisePR with noise

Figure 2: Proportion of nodes in thelargest component as a function of scaledtime for ER and PR birth and deathprocess with and without noise on net-work with 100 vertices. Observe that theprocesses are virtually indistinguishablewith noise.

We ﬁrst introduce two continuous-time percolation models allowing for both birth anddeath of edges: an Erdos-Renyi (ER) process and a product rule (PR) process, which buildoﬀ of the work of Viles et al. (2016) on discrete-time percolation processes. Both modelsrepresent continuous-time network evolution as the result of many one-edge changes overtime. We then overlay the hidden Markov model framework on top of the percolationmodels to capture the presence of noise in the observations, which gives us the randomgraph hidden Markov model (RG-HMM). Throughout the paper, we use capitals to denoterandom variables. Additionally, we only consider networks with a ﬁxed vertex set.5 .1 Continuous-time Percolation Models

The continuous-time birth and death Erdos-Renyi (ER) process is a network-valued bivariate continuous-time Markov chain { W ( t ) , G ( t ) , t ≥ } taking on values in aﬁnite set X . G ( t ) is an undirected-graph variable with N nodes. The state space of thisnetwork-valued variable is the space of all simple graphs on N nodes, which is of cardinality2( N ). W ( t ) is a binary variable with state space { , } , indicating a single edge is eitheradded ( W ( t ) = 1) or deleted ( W ( t ) = 0) to produce G ( t ) at the most recent transitiontime. This process has the following properties:(i) at each transition time τ with state { W ( τ ) = w, G ( τ ) = g } , the amount of time itspends in that state before making a transition into a diﬀerent state is exponentiallydistributed with rate parameter λ > τ < τ , the network G ( τ ) diﬀers from G ( τ ) bya single edge, and the edge is chosen to be added or deleted uniformly at random onnon-edge set or edge set of G ( τ ), the choice of which depends on the binary variable W ( τ ). The transition matrix for the binary variable, when G ( τ ) is neither emptynor complete, is shown in Table 1. p and q are respectively birth and death rates,and are not required to sum to 1. When G ( τ ) is a complete graph, W ( τ ) = 0 withprobability 1 (i.e. the growth of edges are not possible since all possible edges exist).When G ( τ ) is an empty graph, W ( τ ) = 1 with probability 1 (i.e. the death of edgesare not possible since none exist).The Continuous-time birth and death product rule (PR) process is analogousto the aforementioned birth and death ER process, except for the choice of which edgeis added or deleted at each transition time. In the case of the ER model, such choice isuniform over current non-edge set or edge set. However, in the case of the PR model, thechoice depends on the modular structure of the current network.6 ( τ ) = 0 W ( τ ) = 1 W ( τ ) = 0 1 − p pW ( τ ) = 1 q − q Table 1: Transition probability matrix for W ( t ) at two consecutive transition time ( τ , τ ), τ < τ , when G ( τ ) is neither empty nor complete.Let E t denote the edge set of current network G ( t ), and E Ct the non-edge set. Supposethat there exist m t = (cid:12)(cid:12) E t (cid:12)(cid:12) edges for network G ( t ) at time t . For the upcoming transitiontime τ , if W ( τ ) = 1, the choice of which edge to add is done in the following manner.(i) Uniformly choose two candidate vertex pairs among all edges in E Ct (i.e. among allnon-edges). Denote the two vertex pairs by e = ( v , v ) and e = ( v , v ).(ii) Evaluate the size of the connected components to which v , v , v , v belong, anddenote them by C , C , C , C , respectively.(iii) Apply the following product rule in Achlioptas et al. (2009): If (cid:12)(cid:12) C (cid:12)(cid:12)(cid:12)(cid:12) C (cid:12)(cid:12) < (cid:12)(cid:12) C (cid:12)(cid:12)(cid:12)(cid:12) C (cid:12)(cid:12) ,then add edge e . Otherwise, add edge e .The death of the an edge is handled in an analogous manner. If W ( τ ) = 0, we uniformlychoose two candidate vertex pairs among all edges in E t , and similarly evaluate the size ofthe connected components to which v , v , v , v would belong if their edges were absent.If (cid:12)(cid:12) C (cid:12)(cid:12)(cid:12)(cid:12) C (cid:12)(cid:12) < (cid:12)(cid:12) C (cid:12)(cid:12)(cid:12)(cid:12) C (cid:12)(cid:12) , then delete edge e . Otherwise, delete edge e . Note that theAchlioptas product rule slows down the growth of the GCC by favoring the creation ofedges between small connected components. 7 (t ), G(t ) W(t ), G(t ) W(t ), G(t ) W(τ ) W(τ ) W(τ ) G(τ ) G(τ ) G(τ ) W(τ ) W(τ ) G(τ ) G(τ ) G*(t ) G*(t ) G*(t ) HiddenObservedContinuous-time bivariate Markov chain

Figure 3: RG-HMM set-up. The unobserved hidden variables evolve according to acontinuous-time Markov process, and we observe networks at discrete observation timeswith error. The unobserved small changes occurring between the consecutive observationtimes are represented in gray.

We now consider observing the continuous-time network evolution at discrete time pointswith errors. Assume that we have M repeated network-valued observations g ? M = [ g ? ( t ), g ? ( t ) , · · · , g ? ( t M )], where t < t < ... < t M are M observation time points. Let G ? M =[ G ? ( t ) , · · · , G ? ( t M )] denote the random vector of observed networks, of which g ? M is arealization. We represent a true/hidden network variable underlying the observed networkat a particular observation time by G ( t m ), and the hidden binary variable at the observationtime by W ( t m ). We assume that(i) the latent networks evolve according to a percolation model. Then the true/hiddenvariables at discrete observation times { W ( t m ) , G ( t m ) , m = 1 , , ..., M } are embeddedin the continuous-time Markov chain (ER/PR process), thus constituting a discrete-time Markov chain, 8ii) there is a time-independent error process, which corrupts the observing process bytype-I error rate and type-II error rate, denoted respectively by α and β . Speciﬁcally,for any α, β ∈ [0 , P ( G ? ( e ) = 0 (cid:12)(cid:12) G ( e ) = 0) = 1 − α, P ( G ? ( e ) = 1 (cid:12)(cid:12) G ( e ) = 0) = αP ( G ? ( e ) = 1 (cid:12)(cid:12) G ( e ) = 1) = 1 − β, P ( G ? ( e ) = 1 (cid:12)(cid:12) G ( e ) = 0) = β, (1)where G ? ( e ) = 1 or 0 represent a speciﬁc edge is present or not in observed variable,and G ( e ) = 1 or 0 represent a speciﬁc edge is present or not in hidden variable,(iii) the ﬁrst observation G ? ( t ) is error-free.The last assumption is for convenience and standard. Combining the network-valuedlatent Markov chain with the error process, we obtain a random graph hidden Markovmodel (RG-HMM). The schematic representation is shown in Figure 3. Although our ultimate goal is testing for two competing hypotheses of percolation regimes,learning the RG-HMM from a sequence of noisy networks observed only at a longitudinalsubsampling of times is a necessary precursor and of independent interest.Throughout this section, we assume α < . β < .

5. Under such an assumption,the parameters in the model are identiﬁable. (A proof is provided in section 1 of theSupplement.) We present an Expectation-Maximization (EM) algorithm for estimatingthe parameters in an RG-HMM (i.e. p, q, γ, α, β ) with a given sequence of noisy networksobserved only from a longitudinal subsampling of time points, i.e. g ? M . We assume that thenetwork in the hidden layer changes faster than the observing rate so that there are likelymany non-observed small changes occurring between the consecutive observation times, asis typically the case for functional connectivity networks used in the study of epilepsy.9stimation is done conditional on the latent state W ( t ) = 1 , G ( t ) = g ? ( t ) at theﬁrst observation time t . As in Snijders et al. (2010), this has the advantage that noinitial distribution assumption is needed for the latent Markov chain, and the estimatedparameters refer exclusively to the dynamics of the network.The algorithm consists of an E-step, wherein we calculate the expected value of thecomplete data log-likelihood l ( α, β, γ, p, q ) = log f ( W M , G M , g ? M ), with respect to thedistribution of unknown latent variables W M , G M , given the observed networks g ? M and the current parameter estimates. We then maximize the expected log-likelihood in theM-step.The E-step is non-trivial in that there are two levels of unknown latent variables inour model: we do not know the true networks and binary variables W ( t m ) , G ( t m ) at theobservation times, nor do we know the intermediate path that connects one true state W ( t m − ) , G ( t m − ), to the next W ( t m ) , G ( t m ). EM is standard for parameter estimationin general HMMs. However, we face some unique challenges: (1) direct calculation ofthe expectation is computationally infeasible due to the enormously large state space ofsize 2( n ); (2) the observed-data likelihood is diﬃcult to calculate due to the unobservedchanges occurring between consecutive observation times. Particle ﬁltering and augmenteddata sampling using MCMC are used to tackle these challenges. For the complete data scenario, we assume we have samples w M , g M for the hiddenlayer. Then the complete data log-likelihood for the RG-HMM, conditional on W ( t ) =1 , G ( t ) = g ? ( t ), can be written as l ? ( α, β, p, q, γ ) = log f ( w M , g M , g ? M )= M X m =2 log f α,β ( g ?m (cid:12)(cid:12) g m ) + M X m =2 log f p,q,γ ( w m , g m (cid:12)(cid:12) w m − , g m − ) , (2)10here g m , g ?m is the abbreviation for g ( t m ) and g ? ( t m ), respectively, α is the type I error rate, β is the type II error rate, p and q are birth and death rate, and γ is the rate parameterfor the continuous-time percolation model in which the hidden chain in RG-HMM wasembedded.The ﬁrst term in (2) is based on the conditional distribution of the observed networkgiven the truth, which takes the form f α,β ( g ?m (cid:12)(cid:12) g m ) = α c m (1 − α ) d m β b m (1 − β ) a m , (3)where a m , b m , c m , d m are counts corresponding to type I and type II errors in the errorprocess from g m to g ?m at observation time t m . Details are shown in Table 2.True/Observed E E c E a bE c c d Table 2: Counts corresponding to type I and type II errorsThe second term in (2) is the log-likelihood of the embedded discrete-time Markov chainin the hidden layer, which cannot be calculated in closed from due to the non-observed smallchanges occurring between consecutive observation times in the hidden layer. Therefore, weconsider the log-likelihood for augmented data in order to obtain a more easily computedlikelihood. The details are as follows.

Augmented data

The data augmentation in the hidden layer can be done for each period ( t m − , t m ), m =2 , ..., M . Consider period t to t . We assume there are R transition time points between t and t , denoted as τ , ..., τ R , which are ordered increasingly. Setting τ = t , we have11 = τ < τ < τ , ..., < τ R ≤ t . The sample path from G ( t ) to G ( t ) is characterized by S = ( G ( τ ) , G ( τ ) , ..., G ( τ R )) V = ( W ( τ ) , W ( τ ) , ..., W ( τ R )) , (4)which specify the sequence of unobserved change that brings the network from G ( t ) , W ( t )to G ( t ) , W ( t ). Thus, the feasible sample path S , V from G ( t ) to G ( t ) should satisfytwo conditions: 1) G ( τ j ) , G ( τ j +1 ) diﬀer by only one edge for j = 1 , ..., R −

1; and 2) G ( τ R ) = G ( t ), W ( τ R ) = W ( t ). The details of the probability mass function of a samplepath S , V conditional on G ( t ) , W ( t ) can be found in section 2 of the Supplement. The goal is to ﬁnd arg max Γ f ( g ? M (cid:12)(cid:12) G = g ? , W = w ), where Γ = ( p, q, γ, α, β ). The expected value of the complete data log-likelihood with respect to the true networksand true binary variables G M , W M , given the observed networks g ? M and current pa-rameter estimates Γ ( k − is Q (Γ; Γ ( k − ) = E (cid:2) log f ( g ? M (cid:12)(cid:12) G M ) (cid:12)(cid:12) g ? M , Γ ( k − (cid:3) + E (cid:2) log f ( W M , G M ) (cid:12)(cid:12) g ? M , Γ ( k − (cid:3) . (5) The ﬁrst term in (5) can be easily maximized since it has a closed-form log-likelihood in(3), which yields the following: ˆ α = E [ C (cid:12)(cid:12) g ? M , Γ ( k − ] E [ C + D (cid:12)(cid:12) g ? M , Γ ( k − ]ˆ β = E [ B (cid:12)(cid:12) g ? M , Γ ( k − ] E [ A + B (cid:12)(cid:12) g ? M , Γ ( k − ] . (6)12he formula for ˆ α is the expected number of false edges in the observed network divided bythe expected number of non-edges in the true network, given the observed networks g ? M and current parameter estimates Γ ( k − . The formula for ˆ β is the expected number of falsenon-edges in the observed network divided by the expected number of edges in the truenetwork, given the observed networks g ? M and current parameter estimates Γ ( k − .The second term in (5) can be maximized by setting the score function to be zero, E (cid:2) S ( θ ; W M , G M ) (cid:12)(cid:12) g ? M , Γ ( k − (cid:3) = 0 , (7)where θ = [ p, q, γ ], S ( θ ; w M , g M ) = ∂∂ θ log f ( w M , g M ) is the partial data score func-tion, which cannot be calculated in closed form. Note that the score function for theaugmented data is in closed form and by the Missing Information Principal used in Sni-jders et al. (2010), we have that S ( θ ; w M , g M ) = E (cid:2) S ( θ ; w M , g M , S , V ) (cid:12)(cid:12) W M = w M , G M = g M (cid:3) . (8)In other words, the partial data score function can be calculated by taking the expectationof the total data score function. The proof is provided in section 4 of the Supplement.Therefore, solving (7) yields the following closed-form solution for γ , p and q ,ˆ γ = 1 t M − t E " E " M X m =2 R m − (cid:12)(cid:12) W M = w M , G M = g M g ? M , Γ ( k − (9) ˆ p = E h E hP Mm =2 P R m − r =1 I (cid:8) W ( τ m − r − ) = 0 , W ( τ m − r ) = 1 (cid:9)(cid:12)(cid:12) W M = w M , G M = g M i (cid:12)(cid:12)(cid:12) g ? M , Γ ( k − i E h E hP Mm =2 P R m − r =1 I (cid:8) W ( τ m − r − ) = 0 (cid:9)(cid:12)(cid:12) W M = w M , G M = g M i (cid:12)(cid:12)(cid:12) g ? M , Γ ( k − i (10) ˆ q = E h E hP Mm =2 P R m − r =1 I (cid:8) W ( τ m − r − ) = 1 , W ( τ m − r ) = 0 (cid:9)(cid:12)(cid:12) W M = w M , G M = g M i (cid:12)(cid:12)(cid:12) g ? M , Γ ( k − i E h E hP Mm =2 P R m − r =1 I (cid:8) W ( τ m − r − ) = 1 (cid:9)(cid:12)(cid:12) W M = w M , G M = g M i (cid:12)(cid:12)(cid:12) g ? M , Γ ( k − i . (11)where R m − is the total number of transition times between t m − and t m in the augmentedsample path and τ m − r is the r -th transition time between t m − and t m .13 .3 Particle Filtering The expectation step of our E-M algorithm is diﬃcult to calculate. Since the state space forlatent variables is prohibitively large, it is not feasible to directly calculate the conditionalexpectations in the aforementioned MLEs by summing over all possible network sequences,binary variable sequences and sample paths in the hidden layer. Instead, we propose toestimate those conditional expectations in (6), (9), (10), (11) by sampling from the statespace of latent variables.Note that there are two levels of the expectation in (9), (10), and (11). The ﬁrstis the outer expectation, which is taken with respect to the conditional distribution of W M , G M in the hidden layer given the observed networks g ? M , which we propose tosample from by a particle ﬁltering based sampling scheme described in Doucet et al. (2001)(i.e. a Sequential Monte Carlo method). This sampling scheme involves ﬁrst approximating f ( w ( t m ) , g ( t m ) | g ? m − , Γ ( k − ) with B particles (i.e. samples for the true network andbinary variable pair) at each observation time t m , then tracing the ancestral lines, or particlepaths, for the particles as samples from f ( w M , g M | g ? M , Γ ( k − ). The B particles ateach observation time reduces the space from which we will sample from for the particlepaths, thus allowing us to eﬀectively sample some number Ψ network and binary variablesequences that are more likely to generate the observed networks in the error process. Avisual representation for this procedure is provided in Figure 4. More details are providedin section 5 of the Supplement. The second level of the expectation in (9), (10), and (11) is the inner expectation, which istaken with respect to the conditional distribution of sample path S , V that connects thegiven true sequences w M , g M at the observation times in the hidden layer. We sample14igure 4: Particle ﬁltering sampling scheme. B particles are sampled at each observationmoment following a two-stage process: 1) particles are sampled at the previous momentwith probability proportional to the conditional distribution of the true network given theobserved network; 2) then a true network and binary variable at the current observationmoment, i.e. a new particle, is simulated starting from one of the sampled particle in step 1,according to ER/PR process. Ψ ancestral lines are obtained by tracing back the ancestrallines for Ψ particles at time t M , sampled with probability proportional to the conditionaldistribution of the observed network given the true network at time t M .from this conditional distribution by simulating paths using an MCMC method. We drawupon the work of Snijders et al. (2010) where draws are generated by the Metropolis-Hastings algorithm, using a proposal distribution consisting of small changes in a samplepath that brings one network to the next. More details are provided in section 6 of theSupplement. The EM algorithm resulting from the combination of all elements described in the previoussubsections, for maximum likelihood estimation of the parameters p , q , γ , α , β in our15G-HMM, is shown in Algorithm 1. Algorithm 1:

EM Algorithm for MLEs

Input:

Network snapshots: g ? = (cid:0) g ? ( t ) , g ? ( t ) , ..., g ? ( t M ) (cid:1) , B , H , Ψ Output: ˆΓ = [ˆ α, ˆ β, ˆ p, ˆ q, ˆ γ ] Initialization: Γ while ! stopping condition for EM do . At ( l + 1) th EM iteration with estimates Γ l = [ p l , q l , γ l , α l , β l ] ) . Sample B particles { ( g ( b ) ( t m ) , w ( b ) ( t m )) } Bb =1 at each t m (particle filtering) Set g ( b ) ( t ) = g ? ( t ), g ( b ) ( t ) = 1 for each b . for m = 1 , , ..., M − do for b = 1 , ..., B do Calculate the conditional probability p mb := P ( G ? ( t m ) = g ? ( t m ) | G ( t m ) = g ( b ) ( t m )) end Sample B index η , η , ...η B from set (1 , , ..., B ) with replacement with probability proportional to p m , p m , ..., p mB . for b = 1 , ..., B do Start from g ( η b ) ( t m ) , w ( η b ) ( t m ), simulate according to ER/PR up until time t m − t m − , then set g ( b ) ( t m +1 ) , w ( b ) ( t m +1 ) to be the current simulated network and binary variable. end end . Sample ancestral lines and simulate sample paths for h = 1 , , .., H do Sample an index η from 1 , , ..., B with probability proportional to p M , p M , ..., p MB , trace back theancestral line for the η -th particle at time t M , then obtain one sequence of networks and binaryvariables, g ( h )2: M , w ( h )2: M for d = 1 , , ..., D do Sample one path s ( d ) h , v ( d ) h from S , V | W = w ( h )2: M , G = g ( h )2: M by Metropolis–Hastings. Calculate and store the suﬃcient statistics. end end . Update parameters Update ˆ p l +1 , ˆ q l +1 , ˆ γ l +1 using formulas in (9) , (10) , (11) with sampled paths { s ( d ) h , v ( d ) h } Dd =1 thatinterpolate w ( h )2: M , g ( h )2: M , h = 1 , · · · , H . Draw Ψ ancestral lines, then update ˆ α l +1 , ˆ β l +1 using formulas in (6) with sampled network sequences { g ( ψ )2: M } Ψ ψ =1 . end || ˆ Γ l +1 − ˆ Γ l || || ˆ Γ l || < .

10. Based on ournumerical experience, the proposed EM algorithm usually converges within 5 iterationswhen all parameters are initialized as 0 .

5. Less iterations will be needed if parameters areinitialized somewhere close to the true values. The choice of B (i.e. the number of particlessampled at each observation moment), the choice of Ψ (i.e. the number of true networkseries to sample for the maximization of error rates α and β ) and the choice of H (i.e. thenumber of true network series to sample for the maximization of p, q, γ ) may vary dependingon the size of network and the number of observation time points. Our recommendedstarting point is to set B = 50 , ,

000 and H = 10. It is important to keepin mind that the maximization of p, q, γ involves a computationally expensive MCMCprocedure. Setting H to a small number helps eﬃciently reduce the computational burdenwithout compromising much on the accuracy of parameter estimates, in our experience.In our EM algorithm, time complexity is O ( BM ) for the particle ﬁltering step, O ( HM )for the MCMC step, and O (Ψ M ) for sampling Ψ number of ancestral lines. Although theestimation procedure scales linearly in B , M and H , it is still a computationally expensiveprocedure for large networks because simulating the true networks in the hidden layerinvolves sampling from edges or non-edges of a N -node network, which scales quadraticallyin N using standard techniques. We provide a convergence result for the MLE in our proposed RG-HMM in its stationaryperiod as time progresses. Let Γ and ˆΓ denote the true values of parameters and theMLE of RG-HMM, respectively. To simplify the notation, let X ( t m ) denote the pair( W ( t m ) , G ( t m )).The asymptotic behavior of ˆΓ in our RG-HMM can be established using results from17ickel et al. (1998) on asymptotic normality of the MLE for a general stationary HMM. Theorem 3.1.

Let π Γ ( x ) be the stationary distribution of a Markov chain in RG-HMM { X ( t m ) , G ? ( t m ) } and L be the limiting covariance matrix of m − / ∂∂ Γ log p Γ ( G ? m ) . As-sume (C1) that t m − t m − = t m +1 − t m = ∆ t, ∀ m ≥ , and stationarity is reached at t ,(C2) that ∀ x ∈ X , Γ → π Γ ( x ) has two continuous derivatives in some small neighborhoodof Γ , and (C3) that L is nonsingular. Then m / ( ˆΓ − Γ ) p −→ N (0 , L − ) as m → ∞ . The proof of Theorem 3.1 is given in section 7 of the Supplement. In the E-step ofour EM algorithm, we apply particle ﬁltering to approximate the expected log-likelihoodof the complete-data. Although a central limit theorem (Chopin et al. (2004)) exists forthe particle estimate of the likelihood, which assures the asymptotic unbiasedness of theapproximation in E-step as B → ∞ , it is still diﬃcult to theoretically investigate theconvergence property of our EM algorithm with the rather complicated likelihood functionin the RG-HMM. Therefore, we resort to simulation to evaluate the empirical performanceof our EM in the later section. In this section, we present a hypothesis testing framework using Bayes factors for dis-tinguishing between two percolation models, ER and PR. The testing problem can beformulated as a test of separate families of hypotheses, i.e. H ER : g ? ∼ f ( g ? , Γ ER ) vs H P R : g ? ∼ g ( g ? , Γ P R ) , (12)where g ? = [ g ? ( t ) , g ? ( t ) , ..., g ? ( t M )] is the observed sequence of networks, and f ( · , Γ ER )and g ( · , Γ P R ) are the probability functions of G ? under the ER process with parameters Γ ER and PR process with parameters Γ P R . H ER and H P R are the hypotheses that theobserved sequences g ? is from the ER process and PR process, respectively.18 .1 Bayes Factor Following Cox (1961) on tests of separate families of hypotheses, we adopt a Bayesianapproach to this testing problem. The posterior odds for H ER vs H P R are, by Bayes’stheorem, pr ( H ER | g ? ) pr ( H P R | g ? ) = ˜ ω f R f ( g ? , Γ ER ) p f ( Γ ER ) d Γ ER ˜ ω g R g ( g ? , Γ P R ) p g ( Γ P R ) d Γ P R , (13)where ˜ ω f and ˜ ω g are the prior probabilities of H ER and H P R being true, respectively. p f ( Γ ER ) is the prior p.d.f of Γ ER under H ER , and p g ( Γ P R ) is the prior p.d.f of Γ P R under H P R .The Bayes factor ( BF ) is the posterior odds for H ER vs H P R when the prior probabilitiesof the two hypotheses are equal, i.e. when ˜ ω f = ˜ ω g = 0 .

5. Exact analytic calculation ofthe Bayes factor for the testing problem in (12) is not tractable, so we resort to numericalmethods. Kass & Raftery (1995) proposed two large-sample approximations for Bayesfactors - Laplace’s Method and the Schwarz Criterion. The approximation from Laplace’smethod requires specifying the prior distributions p f ( Γ ER ) and p g ( Γ P R ) on parameters ofeach model. However, the choice of priors is not trivial for a percolation model. Therefore,we choose to use the approximation from the Schwarz criterion which has the beneﬁt ofnot requiring priors. Also note that because the dimensions of Γ ER and Γ P R are the same,the logarithm of the Bayes factor for our testing problem can be approximated by thelog-likelihood diﬀerence, i.e. log BF ≈ log f ( g ? , ˆ Γ ER ) − log g ( g ? , ˆ Γ P R ). The computation of the probabilities of the observed networks under the two percolationmodels, log f ( g ? , ˆ Γ ER ) and log g ( g ? , ˆ Γ P R ), is the key ingredient of the test. We now presenta forward algorithm with particle ﬁltering to approximate such quantities in our RG-HMM.Similar to the ML estimation, testing is also done conditional on the latent state W ( t ) =19 , G ( t ) = g ? ( t ) at the ﬁrst observation time t . To simplify the notation, let X m denotethe pair ( W ( t m ) , G ( t m )). Deﬁne Z m := P (cid:16) G ? m = g ? m | X = (cid:0) , g ? ( t ) (cid:1)(cid:17) for each m =2 , · · · M , and Z := 1. Also let q m ( x ) = P ( G ?m = g ?m | X m = x ), η m ( x ) = P ( X m = x | G ? m − = g ? m − , X = (1 , g ? ( t )) for each m = 2 , · · · M , with the convention η ( x ) = P ( X = x | X = (1 , g ? ( t )). To alleviate the notational burden, we omit the condition at t in the probability expression for the remainder of this section when no confusion is possible.With the recurrence relation (the derivation is provided in section 8 of the Supplement) Z m = Z m − X x q m ( x ) η m ( x ) = Z m − E ( q m ( X )) , X ∼ η m ( · ) (14)the probabilities Z M can be approximated recursively by replacing each η m ( · ) in (14) with η Bm ( · ), which is the particle approximation of the ﬁltering distribution η m ( · ). A forwardalgorithm with particle ﬁltering is provided in section 9 of the Supplement to compute Z BM , i.e. the particle approximation of the marginal probability of observations. The timecomplexity is O ( BM ). Thus the full complexity to compare the two percolation regimes is O (cid:0) ( B + H + Ψ) M (cid:1) . In this section, we conduct simulations to evaluate the empirical performance of the pro-posed estimators described in Section 3 and the test described in Section 4.

We ﬁrst investigate the ﬁnite sample performance of the estimation procedure for networkswith N = 20 nodes simulated at M = 50 observation time points, referred to as { t m } m =1 ,from the ER/PR percolation process with parameter setting p = 0 . , q = 0 . , γ = 2 anderror rates α = 0 . , β = 0 .

01. We set the observation rate, denoted by κ , to be 0 .

6, i.e. we20et t m = m/ .

6. The observation rate is set to be smaller than the actual rate of change tomimic the scenario in application. For each ER and PR process, we generate 100 networktime series with length 50 and obtain ML estimates of the model parameters from eachdata set. In the EM algorithm, we set B = 50 ,

000 and initialize all parameters as 0 . Birth Rate p Death Rate q Transition Rate γ Type-I Error α Type-II Error β Truth 0.70 0.30 2 0.03 0.01ER-HMM 0.680 (0.041) 0.299 (0.063) 1.746 (0.126) 0.112 (0.021) 0.037 (0.013)PR-HMM 0.684 (0.035) 0.289 (0.057) 1.770 (0.113) 0.118 (0.019) 0.035 (0.013)

Table 3: Mean and standard deviation of parameter estimates based on 100 simulationsfrom ER/PR process with B = 50000, N = 20, M = 50.are seeing bias in the estimates, especially for type-I and type-II error rates, which is notunexpected given that we are approximating the distribution on a very large dimensionalstate space with only B = 50 ,

000 particles. In principle, the larger B is, the more accurateestimates will be. However, it comes with the trade-oﬀ between computational time andaccuracy.We then perform a small simulation study for the ER process, as a very limited explo-ration of the performance of MLEs with respect to B (the number of particles sampled ateach observation moment), M (the number of observation time points), and N (networksize), under the same model parameter setting as in the previous simulation study. Figure 5shows the mean and standard deviation of parameter estimates from 20 simulated networksequences with N = 20 and M = 50 using an increasing number of particles. Figure 621hows the mean and standard deviation of parameter estimates based on 20 simulationsfrom an ER process with B = 50 ,

000 and (

N, M ) ∈ { , , } × { , , , } .Numerical results are provided in section 10 of the Supplement.The main empirical ﬁndings are the following. From Figure 5, we can see that in-creasing B could reduce bias in the estimates for type-I and type-II error rates, but suchimprovement is diminished as B increases. In Figure 6, the performance of type-I errorrate estimation improves as the size of the network increases, but it does not improve asthe length of the network time series (i.e. M ) increases, using a ﬁxed number of particles.It is not surprising that as the number of observed networks is increasing in M , more par-ticles are required to approximate the parameters, in order to reveal the true asymptoticbehavior as M increases. The type-II error rate estimate is not as sensitive to either M or N , but rather to B . The performance of parameter estimates of p , q and γ are improvedas M increases. Additionally, for estimating these parameters from larger networks, thealgorithm requires larger M , i.e. longer network time series with more observation timepoints, compared to that needed from smaller networks, in order to achieve similar perfor-mance. In summary, we can get more accurate estimates for error rates with larger N and B , with the caveat that more particles are needed for larger M . For the estimates of p , q and γ , improvement is seen as M and B increase, with the caveat that larger M is neededfor larger N . To assess the eﬀectiveness of our testing framework using Bayes factor for discriminatingbetween two percolation models, ER and P R , we choose the rate of detection as theperformance metric. For each simulated network sequence, we ﬁrst compute its Bayesfactor. We next identify it to be an ER sequence if the Bayes factor is greater than 1,otherwise we identify it as a PR sequence. Hence, the rate of detection for ER(PR) process22 .20.40.60.8 50000 250000 500000 1000000 B E s t i m a t e s Parameter pq 1.71.81.92.0 50000 250000 500000 1000000 B E s t i m a t e s Parameter g B E s t i m a t e s Parameter a B E s t i m a t e s Parameter b Figure 5: Mean and standard deviation of parameter estimates based on 20 simulationsfrom the ER process with varying B , N = 20, M = 50. M E s t i m a t e s N Parameter pq 1.61.82.0 50 100 150 200 M E s t i m a t e s Parameter g M E s t i m a t e s Parameter a M E s t i m a t e s Parameter b Figure 6: Mean and standard deviation of parameter estimates based on 20 simulationsfrom ER process with B = 50000, ( N, M ) ∈ { , , } × { , , , } .is the percentage of ER(PR) sequences that are correctly identiﬁed as such among allsimulated ER(PR) sequences.A simulation study is designed to evaluate the eﬀect of B ∈ { , , , } , N ∈ { , , } , and t M = { . , . , . , . } on the rate of detection for ER and PRusing Bayes factor, where t M is the normalized observation duration time, which represents23ow much of the percolation curve we take for testing. Speciﬁcally, we deﬁne normalizedobservation time points as t m := γt m N . Since observations are made with regular intervalsat rate κ , i.e. t m = m/κ for m = 1 , · · · , M , we have t m = mN ( κ/γ ) . Figure 7 illustratesthe four levels of progression for the percolation curve we are considering for testing viaarrowed lines. Each level of t M corresponds to a curve segment taken from the beginningof the curve to the gray line placed at the corresponding scaled time.Figure 7: 4 levels of t M : 0 . , . , . , . t M is the normalized observation duration time,which represents how much of the percolation curve we take for testing.For each combination of settings, we generate 100 ER sequences and 100 PR sequenceswith p = 0 . , q = 0 . , γ = 2 , α = 0 .

01, and β = 0 .

01. We set the rate of observation, κ , to be 1.5 and conduct hypothesis testing on each of the simulated network sequences.Rates of detection for ER and PR, based on 100 replicates, along with estimated standarderrors, are shown in Figure 8 (left). We then ﬁt a generalized linear model with mixedeﬀects on the testing results from the simulation study in order to analyze the relativecontribution of variation in the rate of detection for ER/PR due to the change of valuesin B , N and t M . We set the seed used for generating the networks to be a random eﬀect,since there might be some dependencies between the network sequences simulated usingthe same seeds. Additionally, those simulated from diﬀerent seeds may be of a diﬀerentlevel of diﬃculty to distinguish. The ﬁtted probability of successful detection, as well as24he estimates of the coeﬃcients for the GLMM is shown in Figure 8 (right) and Figure9, respectively. The ANOVA table associated with the ﬁtted model is provided in Table4. The results show that the rate of detection is signiﬁcantly aﬀected by B (number ofparticles), t M (how much percolation curve is observed for testing), but not N . And it’sslightly easier for ER sequences to be identiﬁed as such than for PR sequences.We also conduct a second simulation study to evaluate the eﬀect of observation rate κ on the rate of detection for ER and PR using Bayes factor, which shows that rate ofdetection signiﬁcantly increases by increasing κ from 0 . κ further increases. Details of this simulation study is provided in section 11 ofthe Supplement. ER PR scaled_t R a t e o f D e t e c t i on B ER PR scaled t P r obab ili t y o f S u cc e ss f u l D e t e c t i on B Fitted Probabilities of Successful Detection from GLMM

Figure 8: Results for the ﬁrst simulation study. Left: rates of detection for ER and PRbased on 100 replicates. Right: ﬁtted probability of successful detection in GLMM.In conclusion, our proposed testing framework can eﬀectively distinguish between diﬀer-ent percolation models using an observed sequence of noisy networks under proper settings.The size of the network does not aﬀect the ability to discern between the two percolationregimes and such ability increases as B (the number of particles) and κ (the observation25ate) increases. This coincides with our intuition that a better particle approximation forthe latent distribution, as well as a more informed signal, would help in distinguishingbetween percolation models. N 30N 20PRB 500000B 250000scaled t 2.4scaled t 1.86scaled t 1.34 0.2 0.5 1 2 5 10

Odds Ratios

Figure 9: Estimates of the coeﬃcients fromGLMM in the ﬁrst simulation study.

Chisq Df Pr( > Chisq)N 0.52 2 0.7728scaled t 18.88 3 0.0003process 7.68 1 0.0056B 14.29 2 0.0008

Table 4: ANOVA table for the ﬁrst simula-tion study

In this section, we show an example application of our methodology to electrocorticographydata recorded from a human subject during three seizures.

The data consist of invasive brain voltage recordings for 3 seizures from a patient withepilepsy. The entire array of electrodes consisted of 106 electrodes on the left hemisphereand 70 electrodes on the right hemisphere, with 1024 Hz sampling rate. For each seizure, weconstruct a time series of functional connectivity networks based on the voltage recordingsin the following manner. We ﬁrst band-pass ﬁlter the data between 4 and 50 Hz, andcompute a bipolar reference by subtracting the activity of neighboring electrodes. Thenwe divide the re-referenced data into 1 s windows and 0 . q = 0 . Before we apply our estimation and testing framework to the constructed networks, weneed ﬁrst choose segments of the network time series consistent with percolation. Previouswork (Kramer et al. (2010), Martinet et al. (2020)) suggests sudden increases in the sizeof connected functional network communities occur at multiple times during a seizure(e.g., near seizure onset and termination), making nontrivial the choice of an appropriatetime segment from the overall non-stationary signal. We developed a simple, automaticprocedure for this purpose, consisting of roughly the following steps: 1) Select intervals oftime that only contain network evolution in the regions of interest (ROIs) correspondingto periods of seizure evolution on clinical review of the data, as determined by a board-27ertiﬁed epileptologist (CJC) from the time intervals in which large dynamic communitiesappear (for details of the dynamic community detection procedure on network time seriessee Martinet et al. (2020)). 2) Identify segments in the ROIs where both the size of GCCand density ramp up over time and use them for testing. The details of this procedure areprovided in section 12 of the Supplement. Applying this procedure to the ﬁrst seizure, weidentify three segments, as shown in Figure 10 (bold curves). Results for the other twoseizures are shown in section 13 of the Supplement.

Having selected time segments of dynamic functional network evolution during the seizures,we now apply the statistical testing procedure to infer the percolation regime - ER or PR -for each segment. We take each chosen segment from the automatic segment ﬁnder as inputfor the statistical testing procedure under our RG-HMM framework, with B = 500 , .

3, 31 . .

0, respectively); all segments from seizure 2 (51 .

0, 105 . .

1, respectively); and3 of 4 segments from seizure 3 (10 .

9, 61 .

5, 15 . − .

0, respectively). We concludethat Erdos-Renyi (ER) percolation is better supported by the data for the entire courseof seizure 1 and seizure 2, as well as the early ictal stages of seizure 3, while product-rule(PR) percolation is better supported for the late ictal stage of seizure 3.These preliminary results suggest that both types of percolation can occur in the dy-namic functional networks of human seizures. For this patient, though each seizure beginsin the same focal location with the same recruitment patterns of type ER, the recruitment28igure 10: Example selection of analysis segment for one seizure. Vertical lines repre-sent clinically determined seizure onset and termination times. Top: Voltage time seriesrecorded at eight electrodes on left brain hemisphere. Second row: Community membership(indicated by shades of gray) for each node over time; the three large dynamic communitiesin darker shades manifest ROIs. Third row: Network density over time. Bottom: Propor-tion of nodes in the GCC over time; the bold curves in ROIs are chosen by automaticsegment ﬁnder for testing.patterns diﬀer preceding seizure termination, which are of type ER for seizure 1,2 and typePR for seizure 3. Clinically, seizure 3 diﬀers from the other two seizures; seizure 3 remainsspatially focal, while seizures 1 and 2 continue to propagate and only terminate after re-cruiting the entire brain. These preliminary results suggest a diﬀerent network dynamicfor the two clinical seizure types. Whether local PR percolation (here, in the left hemi-sphere) correlates with focal seizure termination and prevents generalization (here, to theright hemisphere) requires further investigation. We note that identiﬁcation of percolationtype may also suggest diﬀerent treatment strategies. For example, to prevent PR perco-29ation, the growing subnetworks that emerge could be separately targeted and preventedfrom joining. Alternatively, to prevent ER percolation, eﬀective treatment may requirepreventing the expansion of the emerging single connected component. ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.404 (0.112) 0.248 (0.075) 10.075 (1.423) 0.068 (0.0005) 0.435 (0.035) -5636.822 (5.942)PR 0.468 (0.156) 0.163 (0.072) 9.819 (1.512) 0.068 (0.0004) 0.464 (0.032) -5647.120 (6.850)log(BF) 10.297 (9.468))

We develop a class of random graph hidden Markov models (RG-HMMs) for characterizingpercolation regimes in noisy, dynamically evolving networks in the presence of edge birthand edge death, as well as noise, under which exponential waiting times are used to modelthe time between the addition or removal of an edge. We present an EM algorithm with aparticle ﬁltering and sample path simulation scheme for estimating model parameters from asequence of noisy networks observed only at a longitudinal subsampling of time points (i.e.,not at every time step of edge change). We also provide a framework for statistical testing ofcompeting hypotheses of percolation regimes, which involves calculation of the likelihoodsevaluated at optimal estimates under each hypothesis, given the observed networks.Furthermore, we establish the asymptotic property of the MLE for our RG-HMMs inits stationary period using asymptotic results for general HMMs with stationary Markovchains (Bickel et al. (1998)). Unfortunately, no proof is yet available for our model in its30onstationary period when networks evolve through the percolation phase transition. Themost relevant work on asymptotic theory for HMMs with nonstationary Markov chains isin Douc et al. (2001). This is an interesting subject for future research.In the proposed model, we consider two canonical random graph models i.e., Erdos-Renyi and product-rule, representing two prototypical extremes of percolation. Notably,however, there is a wide spectrum of other random graph models, displaying a varietyof percolation behaviors (e.g., Riordan & Warnke (2011), Fig. 1). Our work may beextended to these other percolation regimes, given that our proposed RG-HMM is a ﬂexibleframework which may subsume a variety of random graph models.Our experience in the reported simulations and in working with empirical data setsis that the algorithm converges well. However, it is time-consuming, especially for largernetworks requiring more particles for approximation. For example, when working with a 30node network with 30 observation time points and 50,000 particles, one testing procedure,including ML estimations under two hypotheses and the calculation of Bayes factor, takesapproximately 1 hour to run on a high performance Linux computing cluster using 4 coreswith appropriate use of parallelization techniques for speedups. If we multiply the numberof particles by 10, i.e. B = 500 , B is, the more accurate estimates will be. However, it comes withthe trade-oﬀ between computational time and accuracy. While it is feasible to run ouralgorithm on even larger networks under manageable time with a relatively small particlesize, we must interpret the results cautiously. In such cases, we recommend running multipletesting trials and taking an average of the approximated Bayes factor to reduce the highvariance caused by a relatively small number of particles.Given that recent work in network science, as it pertains to epilepsy, has shown anexplosive density increase in functional connectivity networks in epilepsy patients duringseizure onset, our work has the potential to be quite impactful for clinical neuroscience.31o demonstrate the application to epileptic seizures, we apply our framework to assess,at a minimum, whether ER or PR percolation is better supported by the data for theseparticular seizures. We note that epilepsy is a complex and still incompletely understooddisease. While it would be naive to expect that support for one type of percolation regimeover another gleaned from our functional connectivity networks would oﬀer signiﬁcantinsights into the etiology of this disease, we nevertheless are optimistic that results obtainedthrough the application proposed here will be suﬃciently suggestive as to inspire additionalconstructive thinking about the period of seizure onset, which may lead to improvementsin treatment and disease management.Several opportunities for extending the scope of this application remain. For example,we may apply our testing framework to seizure data from a group of patients and performa meta-analysis to aggregate and contrast the ﬁndings from single seizures to identifypatterns and interesting relationships that may come to light by comparing the test resultsagainst patient phenotype information. This phenotype information may include medicalclassiﬁcation of severity of the disease, putative locations of seizure emergence and otherclinically relevant measurements.Lastly, although we have chosen to focus our attention on epileptic seizures, our class ofmodels and our framework for parameter estimation and testing have the potential to beused outside of the realm of seizure data to better understand the emergence of organizedstructure in many other dynamic systems. With percolation theory omnipresent acrossa vast range of applications such as social networks, traﬃc networks, infectious diseasenetworks, amino acid networks, and even understanding the spread of forest ﬁres, computerviruses, etc. (Saberi (2015), Bianconi & Radicchi (2016), Stauﬀer & Aharony (2018), Islam& Hassan (2019)), our hope is that our methods may be used to better understand networkdynamics across a wide range of ﬁelds. 32 upplementary Materials Extended details can be found in the online Supplement.

References

Achlioptas, D., D’Souza, R. M. & Spencer, J. (2009), ‘Explosive percolation in randomnetworks’,

Science (5920), 1453–1455.Bianconi, G. & Radicchi, F. (2016), ‘Percolation in real multiplex networks’,

PhysicalReview E (6), 060301.Bickel, P. J., Ritov, Y., Ryden, T. et al. (1998), ‘Asymptotic normality of the maximum-likelihood estimator for general hidden markov models’, The Annals of Statistics (4), 1614–1635.Chopin, N. et al. (2004), ‘Central limit theorem for sequential monte carlo methods andits application to bayesian inference’, The Annals of Statistics (6), 2385–2411.Cox, D. R. (1961), Tests of separate families of hypotheses, in ‘Proceedings of the fourthBerkeley symposium on mathematical statistics and probability’, Vol. 1, p. 23.Douc, R., Matias, C. et al. (2001), ‘Asymptotics of the maximum likelihood estimator forgeneral hidden markov models’, Bernoulli (3), 381–420.Doucet, A., De Freitas, N. & Gordon, N. (2001), An introduction to sequential monte carlomethods, in ‘Sequential Monte Carlo methods in practice’, Springer, pp. 3–14.Erd˝os, P. & R´enyi, A. (1960), ‘On the evolution of random graphs’, Publ. Math. Inst. Hung.Acad. Sci (1), 17–60. 33rimmett, G. (2018), Probability on graphs: random processes on graphs and lattices , Vol. 8,Cambridge University Press.Guye, M., R´egis, J., Tamura, M., Wendling, F., Gonigal, A. M., Chauvel, P. & Bartolomei,F. (2006), ‘The role of corticothalamic coupling in human temporal lobe epilepsy’,

Brain (7), 1917–1928.Islam, M. H. E. & Hassan, M. (2019), ‘Universality class of explosive percolation inbarab´asi-albert networks’,

Scientiﬁc reports (1), 1–13.Kass, R. E. & Raftery, A. E. (1995), ‘Bayes factors’, Journal of the american statisticalassociation (430), 773–795.Kramer, M. A. & Cash, S. S. (2012), ‘Epilepsy as a disorder of cortical network organiza-tion’, The Neuroscientist (4), 360–372.Kramer, M. A., Eden, U. T., Cash, S. S. & Kolaczyk, E. D. (2009), ‘Network inferencewith conﬁdence from multivariate time series’, Physical Review E (6), 061916.Kramer, M. A., Eden, U. T., Kolaczyk, E. D., Zepeda, R., Eskandar, E. N. & Cash,S. S. (2010), ‘Coalescence and fragmentation of cortical networks during focal seizures’, Journal of Neuroscience (30), 10076–10085.Martinet, L.-E., Kramer, M., Viles, W., Perkins, L., Spencer, E., Chu, C., Cash, S. &Kolaczyk, E. (2020), ‘Robust dynamic community detection with applications to humanbrain functional networks’, Nature Communications (1), 1–13.Ponten, S., Bartolomei, F. & Stam, C. (2007), ‘Small-world networks and epilepsy: graphtheoretical analysis of intracerebrally recorded mesial temporal lobe seizures’, Clinicalneurophysiology (4), 918–927. 34iordan, O. & Warnke, L. (2011), ‘Explosive percolation is continuous’,

Science (6040), 322–324.Rubinov, M. & Sporns, O. (2010), ‘Complex network measures of brain connectivity: usesand interpretations’,

Neuroimage (3), 1059–1069.Saberi, A. A. (2015), ‘Recent advances in percolation theory and its applications’, PhysicsReports , 1–32.Schindler, K., Amor, F., Gast, H., M¨uller, M., Stibal, A., Mariani, L. & Rummel, C. (2010),‘Peri-ictal correlation dynamics of high-frequency (80–200 hz) intracranial eeg’,

Epilepsyresearch (1), 72–81.Schindler, K., Elger, C. E. & Lehnertz, K. (2007), ‘Increasing synchronization may pro-mote seizure termination: evidence from status epilepticus’, Clinical neurophysiology (9), 1955–1968.Schindler, K., Leung, H., Elger, C. E. & Lehnertz, K. (2007), ‘Assessing seizure dynamics byanalysing the correlation structure of multichannel intracranial eeg’,

Brain (1), 65–77.Snijders, T. A., Koskinen, J. & Schweinberger, M. (2010), ‘Maximum likelihood estimationfor social network dynamics’,

The Annals of Applied Statistics (2), 567.Stauﬀer, D. & Aharony, A. (2018), Introduction to percolation theory , CRC press.Thijs, R. D., Surges, R., O’Brien, T. J. & Sander, J. W. (2019), ‘Epilepsy in adults’,

TheLancet (10172), 689–701.Viles, W., Ginestet, C. E., Tang, A., Kramer, M. A. & Kolaczyk, E. D. (2016), ‘Percola-tion under noise: Detecting explosive percolation using the second-largest component’,

Physical Review E (5), 052301. 35 upplement to “Inferring the Type of PhaseTransitions Undergone in Epileptic Seizures UsingRandom Graph Hidden Markov Models forPercolation in Noisy Dynamic Networks” Xiaojing Zhu ∗ , Heather Shappell ∗ , Mark A. Kramer,Catherine J. Chu, Eric D. Kolaczyk In this section, we address the identiﬁability of RG-HMM. The model is identiﬁable ifdistinct values of Γ correspond to distinct probability distributions, i.e. if Γ = Γ , thenalso P Γ = P Γ . We show that f ( g ? ) is identiﬁable if type-I error α < . β < . Γ = ( p , q , γ , α , β ), and Γ = ( p , q , γ , α , β )where p = q , q = p , α = 1 − β , β = 1 − α will give the same probability distributionof the observed networks, written as f (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) . We will then assert that this isthe only parameterization that will cause non-identiﬁability, thus we can make the modelidentiﬁable by restricting the parameter space: α < . , β < . f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) = f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) , ∀ g ? ( t ) , · · · , g ? ( t M ).Let x ( t m ) = ( g ( t m ) , w ( t m )) denote the latent variables in the hidden layer, then conditionalon x ( t ) = ( g ? ( t ) , f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) = X x ( t ) , ··· , x ( t M ) f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) , x ( t ) , · · · , x ( t M ) (cid:1) = X x ( t ) , ··· , x ( t M ) f (cid:0) x ( t ) (cid:1) f α ,β (cid:0) g ? ( t ) | g ( t ) (cid:1) M Y m =2 f α ,β (cid:0) g ? ( t m ) (cid:12)(cid:12) g ( t m ) (cid:1) f p ,q ,γ (cid:0) x ( t m ) (cid:12)(cid:12) x ( t m − ) (cid:1) = X x ( t ) , ··· , x ( t M ) M Y m =2 f α ,β (cid:0) g ? ( t m ) (cid:12)(cid:12) g ( t m ) (cid:1) f p ,q ,γ (cid:0) x ( t m ) (cid:12)(cid:12) x ( t m − ) (cid:1) . (1) ∗ These authors contributed equally. a r X i v : . [ s t a t . A P ] F e b or any potential true network g ( t m ), there exists an ‘opposite’ network g ( t m ) obtainedby ﬂipping all edges of g ( t m ) to non-edges and all non-edges to edges. Hence, if given theobserved network g ? ( t m ), g ( t m ) contains a true edges, b false non-edges, c false edges and d true non-edges, then given g ? ( t m ), g ( t m ) will contain c true edges, d false non-edges, a false edges and b true non-edges. Note that f α ,β (cid:0) g ? ( t m ) | g ( t m ) (cid:1) = α c (1 − α ) d β b (1 − β ) a f α ,β (cid:0) g ? ( t m ) | g ( t m ) (cid:1) = α a (1 − α ) b β d (1 − β ) c , (2)then we have f α ,β (cid:0) g ? ( t m ) | g ( t m ) (cid:1) = f α ,β (cid:0) g ? ( t m ) | g ( t m ) (cid:1) if α = 1 − β , β = 1 − α .Deﬁne x ( t m ) = (cid:0) g ( t m ) , w ( t m ) (cid:1) , where w ( t m ) = 1 − w ( t m ). Next we show that f p ,q ,γ (cid:0) x ( t m ) (cid:12)(cid:12) x ( t m − ) (cid:1) = f p ,q ,γ (cid:0) x ( t m ) (cid:12)(cid:12) x ( t m − ) (cid:1) if p = q , p = q , γ = γ for ERprocess. Then we have f p ,q ,γ (cid:0) x ( t m ) (cid:12)(cid:12) x ( t m − ) (cid:1) = X u h e − γ ( t m − t m − ) ( γ ( t m − t m − ) R R ! · R Y r =1 f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) f (cid:0) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) (cid:1)i , (3)where the sum is over all possible sample paths connecting x ( t m − ) and x ( t m ). De-tails for the pmf of a sample path are discussed in section 2. Let u be the ‘oppo-site’ sample path of u , which can be obtained by ﬂipping all ( g ( τ r ) , w ( τ r )) in u into( g ( τ r ) , − w ( τ r )). Hence if u is a sample path connecting x ( t m − ) and x ( t m ), u willthen be a sample path connecting x ( t m − ) and x ( t m ). If we ﬁx the rate parameter γ = γ , then it is suﬃcient to show that f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) f (cid:0) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) = f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) f (cid:0) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) for each τ r . Assume that g ( τ r − ) con-tains a edges and b non-edges, then g ( τ r − ) will contain b edges and a non-edges. Notethat under ER process, f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) f (cid:0) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) =  p b , if w ( τ r ) = 1 , w ( τ r − ) = 0 − q b , if w ( τ r ) = 1 , w ( τ r − ) = 1 − p a , if w ( τ r ) = 0 , w ( τ r − ) = 0 q a , if w ( τ r ) = 0 , w ( τ r − ) = 1 (4) f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) f (cid:0) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) =  p a , if w ( τ r ) = 1 , w ( τ r − ) = 0 − q a , if w ( τ r ) = 1 , w ( τ r − ) = 1 − p b , if w ( τ r ) = 0 , w ( τ r − ) = 0 q b , if w ( τ r ) = 0 , w ( τ r − ) = 1 , (5)then (4) and (5) are equivalent if p = p and p = p . For PR process, it’s not necessarilytrue since f (cid:0) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:1) depends on the sizes of connected components of g ( τ r )and the relation between the connected components of g ( τ r ) and that of g ( τ r ) is intractable.2ince each summand in (1) under paramerterization Γ = ( p , q , γ , α , β ) will have oneand only one match to a summand under the paramerterization Γ = ( p = q , q = p , γ = γ , α = 1 − β , β = 1 − α ) by ﬂipping the latent variables { x ( t m ) } into { x ( t m ) } , then f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) = f Γ (cid:0) g ? ( t ) , · · · , g ? ( t M ) (cid:1) . Therefore, the probability distributionsof observed networks are the same under the two parameterization explained above, causingnon-identiﬁability. By restricting the parameter space α < . , β < .

5, we can make themodel identiﬁable.

The probability function of feasible sample path S , V conditional on G ( t ) , W ( t ) is p sp (cid:16) s , v (cid:12)(cid:12) g ( t ) , w ( t ))= P (cid:16) S = (cid:0) g ( τ ) , g ( τ , ..., g ( τ R ) (cid:1) , V = (cid:0) w ( τ ) , w ( τ , ..., w ( τ R ) (cid:1)(cid:12)(cid:12) G ( t ) = g ( t ) ,W ( t ) = w ( t ) (cid:17) · P ( τ R ≤ t < τ R +1 (cid:12)(cid:12) G ( t ) = g ( t ) , W ( t ) = w ( t ) , S = s , V = v )= R Y r =1 P (cid:16) G ( τ r ) = g ( τ r ) , W ( τ r ) = w ( τ r ) (cid:12)(cid:12) G ( τ r − ) = g ( τ r − ) , W ( τ r − ) = w ( τ r − ) (cid:17) · P ( τ R ≤ t < τ R +1 (cid:12)(cid:12) G ( t ) = g ( t ) , W ( t ) = w ( t ) , S = s , V = v )= R Y r =1 f (cid:16) g ( τ r ) , w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) (cid:17) · P ( τ R ≤ t < τ R +1 (cid:12)(cid:12) G ( t ) = g ( t ) , W ( t ) = w ( t ) , S = s , V = v )= R Y r =1 g (cid:16) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:17) h (cid:16) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) (cid:17) · P ( τ R ≤ t < τ R +1 (cid:12)(cid:12) G ( t ) = g ( t ) , W ( t ) = w ( t ) , S = s , V = v ) , (6)where P ( τ R ≤ t < τ R +1 (cid:12)(cid:12) G ( t ) = g ( t ) , W ( t ) = w ( t ) , S = s , V = v )= P ( R transitions between ( t , t ])= exp( − γ ( t − t )) ( γ ( t − t )) R R ! since R ∼ poisson ( γ ( t − t )) . (7)Therefore, p sp (cid:16) s , v (cid:12)(cid:12) g ( t ) , w ( t ))= exp( − γ ( t − t )) ( γ ( t − t )) R R ! · R Y r =1 g (cid:16) g ( τ r ) (cid:12)(cid:12) w ( τ r ) , g ( τ r − ) (cid:17) h (cid:16) w ( τ r ) (cid:12)(cid:12) g ( τ r − ) , w ( τ r − ) (cid:17) , (8)3here s = ( g ( τ ) , g ( τ ) , ..., g ( τ R )) , v = ( w ( τ ) , w ( τ ) , ..., w ( τ R )) , τ = t , g ( · (cid:12)(cid:12) · ) is the con-ditional pmf of r.v. G ( τ ) given (cid:0) W ( τ ) , G ( τ ) (cid:1) , and h ( · (cid:12)(cid:12) · ) is the conditional pmf of r.v. W ( τ ) given (cid:0) G ( τ ) , W ( τ ) (cid:1) . τ , τ are two arbitrary consecutive transition time points, τ < τ . More details on g ( · (cid:12)(cid:12) · ) and h ( · (cid:12)(cid:12) · )on are provided in section 3 of this Supplement.Generally, for m = 2 , , ..., M , the pmf of feasible sample path S m − , V m − that brings g ( t m − ) to g ( t m ) conditional on G ( t m − ) , W ( t m − ) is p sp (cid:16) s m − , v m − (cid:12)(cid:12) g ( t m − ) , w ( t m − ))= exp( − γ ( t m − t m − )) ( γ ( t m − t m − )) R m − R m − ! · R m − Y r =1 g (cid:16) g ( τ ( m − r ) (cid:12)(cid:12) w ( τ ( m − r ) , g ( τ ( m − r − ) (cid:17) h (cid:16) w ( τ ( m − r ) (cid:12)(cid:12) g ( τ ( m − r − ) , w ( τ ( m − r − ) (cid:17) , (9)where s m − = ( g ( τ ( m − ) , g ( τ ( m − ) , ..., g ( τ ( m − R m − )) , v m − = ( w ( τ ( m − ) , w ( τ ( m − ) , ..., w ( τ ( m − R m − ))is the sample path from g ( t m − ) to g ( t m ) between t m − , and t m ; ( τ ( m − , ..., τ ( m − R m − ) are the R m − transition time points between t m − and t m ; τ ( m − = t m − . Recall that X = { , } × { , , , · · · , N ) } , where N is the number of nodes in networks.The one-step transition probability for the network-valued Markov chain, { (cid:0) G ( t ) , W ( t ) (cid:1) ∈X , t ≤ t ≤ t M } , from ( g , w ) to ( g , w ) in a single step is P (cid:16)(cid:0) G ( τ ) , W ( τ ) (cid:1) = (cid:0) g , w (cid:1)(cid:12)(cid:12)(cid:0) G ( τ ) , W ( τ ) (cid:1) = (cid:0) g , w (cid:1)(cid:17) = P (cid:16) G ( τ ) = g , W ( τ ) = w (cid:12)(cid:12) G ( τ ) = g , W ( τ ) = w (cid:17) = P (cid:16) G ( τ ) = g (cid:12)(cid:12) W ( τ ) = w , G ( τ ) = g , W ( τ ) = w (cid:17) · P (cid:16) W ( τ ) = w (cid:12)(cid:12) G ( τ ) = g , W ( τ ) = w (cid:17) = P (cid:16) G ( τ ) = g (cid:12)(cid:12) W ( τ ) = w , G ( τ ) = g (cid:17) · P (cid:16) W ( τ ) = w (cid:12)(cid:12) G ( τ ) = g , W ( τ ) = w (cid:17) (since G ( τ ) ⊥⊥ W ( τ ) (cid:12)(cid:12) G ( τ ) , W ( τ )) , (10)where τ , τ are two arbitrary consecutive transition time points, τ < τ . The ﬁrst proba-bility is speciﬁed by the percolation model, speciﬁcally by the way of choosing edge to addor delete, and the second probability is given in Table 1.We can rewrite the one-step transition probability as f ( g , w (cid:12)(cid:12) g , w ) = g ( g (cid:12)(cid:12) w , g ) · h ( w (cid:12)(cid:12) g , w ) , (11)4here f ( · (cid:12)(cid:12) · ) is the conditional pmf of rvs (cid:0) G ( τ ) , W ( τ ) (cid:1) given (cid:0) G ( τ ) , W ( τ ) (cid:1) , g ( · (cid:12)(cid:12) · ) is theconditional pmf of rv. G ( τ ) given (cid:0) W ( τ ) , G ( τ ) (cid:1) , and h ( · (cid:12)(cid:12) · ) is the conditional pmf of rv. W ( τ ) given (cid:0) G ( τ ) , W ( τ ) (cid:1) . τ , τ are two arbitrary consecutive transition time points, τ < τ .Speciﬁcally, g ( g (cid:12)(cid:12) w , g ) is parameter-free, only depend on edge change rule in the per-colation model. h ( w (cid:12)(cid:12) g , w ) = h p,q ( w (cid:12)(cid:12) g , w )= w { g =Empty } + (1 − w ) { g =Complete } + (cid:0) p w (1 − p ) − w { w =0 } + (1 − q ) w q − w { w =1 } (cid:1) { g =Empty or Complete } . (12) Recall that θ = ( p, q, γ ), we prove below that S ( θ ; w M , g M ) = E (cid:2) S ( θ ; w M , g M , S , V ) (cid:12)(cid:12) W M = w M , G M = g M (cid:3) (13) Proof.

Right = X s , v ∂∂ θ log f θ ( w , g , s , v ) · f ( s , v | w , g )= X s , v ∂∂ θ (log f θ ( s , v | w , g ) + log f θ ( w , g )) · f ( s , v | w , g )= X s , v ∂∂ θ log f θ ( s , v | w , g ) · f ( s , v | w , g ) + X s , v ∂∂ θ log f θ ( w , g ) · f ( s , v | w , g )= 0 + ∂∂ θ log f θ ( w , g ) X s , v f ( s , v | w , g )= 0 + ∂∂ θ log f θ ( w , g ) · Lef t (14)Notice that P s , v ∂∂ θ log f θ ( s , v | w , g ) · f ( s , v | w , g ) = P s , v ∂∂ θ f θ ( s , v | w , g ) f θ ( s , v | w , g ) · f ( s , v | w , g ) = P s , v ∂∂ θ f θ ( s , v | w , g ) = ∂∂ θ P s , v f θ ( s , v | w , g ) = ∂∂ θ The particle ﬁltering procedure is as follows:1. Start from the ﬁrst observed network g ? ( t ). Set G ( t ) = g ? ( t ) and W ( t ) = 1.2. To create B number of particles at observation time t , denoted by { w b ( t ) , g b ( t ) } Bb =1 ,repeat the following for b = 1 , · · · , B . 5a) Start from W ( t ) = 1 , G ( t ) = g ? ( t ) and simulate according to a continuoustime ER/PR process up until time t − t . Then set w b ( t ) , g b ( t ) equal to thecurrent binary variable value and network in the simulated chain.3. For m = 2 , · · · , M −

1, repeat the following steps.(a) For each one of the simulated pairs in { ( w b ( t m ) , g b ( t m )) } Bb =1 , calculate the con-ditional probability p b := P ( G ? ( t m ) = g ? ( t m ) (cid:12)(cid:12) G ( t m ) = g b ( t m )) with currentparameter estimates.(b) To create B number of particles at time t m +1 , repeat the following for b =1 , · · · , B .i. Sample an index η from 1 , , ..., B with probability proportional to p , p , ..., p B .ii. Start from W ( t m ) = w η ( t m ) , G ( t m ) = g η ( t m ) and simulate according toa continuous time ER/PR process up until time t m +1 − t m . Then set w b ( t m +1 ) , g b ( t m +1 ) equal to the current binary variable value and networkin the simulated chain.4. To obtain Ψ samples from the conditional distribution of W M , G M given the ob-served networks g ? M , we sample Ψ ancestral lines, or particle paths, denoted by { w ψ M , g ψ M } Ψ ψ =1 with the following procedure. We ﬁrst calculate the conditionalprobability at the last observation time p b := P ( G ? ( t M ) = g ? ( t M ) (cid:12)(cid:12) G ( t M ) = g b ( t M ))with current parameter estimates, for b = 1 , · · · , B , then repeat the following for ψ = 1 , ..., Ψ.(a) Sample an index η M from 1 , , ..., B with probability proportional to p , p , ..., p B .(b) Trace back the ancestral line for the η M -th particle, i.e. ( w η M ( t M ) , g η M ( t M )),at time t M . In other words, identify the ancestral particle index for the η M -th particle at time t M , and call it η M − , then identify the ancestral particleindex for η M − -th particle at time t M − , and call it η M − , and continue to dothis backward for η m -th particle at time t m , m = M − , · · · ,

2. Note that ateach observation moment, each particle has exactly one parent. Then set theancestral line, i.e. { ( w η m ( t m ) , g η m ( t m )) } Mm =2 as one sample from the conditionaldistribution of W M , G M , given g ? M .The particle ﬁltering allows us to sample Ψ network and binary variable sequences that aremore likely to generate the observed networks in the error process. It generates B possiblelatent variables at each observation time, thus eﬀectively reducing the space from whichwe will sample from for the particle paths. It also allows us to sample true network andbinary sequences without actually having the transition probabilities in closed form.6 Simulating the Sample Path

The inner expectations in formulas for ˆ γ , ˆ p and ˆ q (equation (9)-(11) in the main papar)are taken with respect to the conditional distribution of sample path S , V that connectsthe given true sequences w M , g M at the observation times in the hidden layer. In orderto estimate those expectations, we propose to sample from the conditional distributionsby simulating paths using an MCMC method. We draw upon the work of Snijders et al.(2010) where draws are generated by the Metropolis-Hastings algorithm, using a proposaldistribution consisting of small changes in a sample path that brings one network to thenext.Speciﬁcally, for each m = 2 , · · · , M , we need to generate a sample path s m − , v m − that brings ( w ( t m − ) , g ( t m − )) to ( w ( t m ) , g ( t m )). The probability function of the samplepath, conditional on ( w ( t m − ) , g ( t m − )) is given in (9). To simplify the notation, denotethe sample path and its conditional probability function by u and p ( u ), respectively. Theacceptance ratio in the Metropolis-Hasting algorithm for a current state u and a proposedstate ˜ u is min { , p ( ˜ u ) q ( u | ˜ u ) p ( u ) q ( ˜ u | u ) } (15)where q ( ˜ u | u ) is the proposal distribution under which a new candidate sample path ˜ u is proposed by making small changes in the current sample path u . The small changesconsist of three operations: paired deletion, paired insertion and permutation. Paireddeletion involves removing a pair of same edge changes in sample path. For example, ifsample path u consists of adding an edge at one time and deleting the same edge after, thena new sample path ˜ u can be obtained by removing such pair of same edge changes from u since the two edge changes oﬀset each other producing no change on the network. Similarly,paired insertion involves adding a pair of same edge changes in a current sample path, andpermutation involves permuting all edge changes in a current sample path. All of theseoperations are done in such a way that the proposed new sample path ˜ u is still a feasiblepath with only one edge change at a time connecting the two networks at the consecutiveobservation times in the hidden layer. More discussion of this MCMC procedure can befound in Snijders et al. (2010). Theorem . Let π Γ ( x ) be the stationary distribution of a Markov chain in RG-HMM { X ( t m ) , G ? ( t m ) } and L be the limiting covariance matrix of m − / ∂∂ Γ log p Γ ( G ? m ). As-sume (C1) that t m − t m − = t m +1 − t m = ∆ t, ∀ m ≥

2, and stationarity is reached at t ,(C2) that ∀ x ∈ X , Γ → π Γ ( x ) has two continuous derivatives in some small neighborhoodof Γ , and (C3) that L is nonsingular. Then m / ( ˆΓ − Γ ) p −→ N (0 , L − ) as m → ∞ . Proof.

Bickel et al. (1998) provide a central limit theorem for the MLE of a general HMMwith stationary Markov chain under 6 regularity conditions (A1)-(A6). Our approach is toshow that (A1)-A(6) hold for our RG-HMM under assumptions (C1)-(C3).7et P ab ( t ) denote the probability that the Markov chain { X ( t ) , t ≥ } in our RG-HMMpresently in state a will be in state b after an additional time t , P ab denote the one-steptransition probability that the process will next enter state b when it leaves state a (detailsin Appendix 3), P nab denote the n -step transition probability.Under (C1), the M observation moments t < t < · · · < t M are evenly spaced atthe stationary period of Markov chain, then the transition probability for the discrete-time Markov chain { X ( t m ) , m ≥ } in our RG-HMM is P ( X ( t m ) = b | X ( t m − ) = a ) = P ab ( t m − t m − ) = P ab (∆ t ), ∀ m , which is independent of m . Hence, (C1) ensures that { X ( t m ) } is a stationary Markov chain with homogeneous transition probability matrix { α Γ ( a, b ) } = { P ab (∆ t ) } . We now restate (A1)-(A6) and show that they all hold for ourRG-HMM { X ( t m ) , G ? ( t m ) } .(A1) The transition probability matrix { α Γ ( a, b ) } is ergodic, that is, irreducible andaperiodic.(A2) For all a , b ∈ X and g ? ∈ { , , ..., N ) } , the maps Γ → α Γ ( a, b ), Γ → π Γ ( a ) and Γ → g Γ ( g ? | a ) have two continuous derivatives in some neighborhood of Γ .(A3) Write Γ = (Γ , · · · , Γ d ) ≡ ( p, q, γ, α, β ). There exists a δ > ≤ i ≤ d and all a , E Γ h sup | Γ − Γ | <δ | ∂∂ Γ i log g Γ ( G ? | a ) | i < ∞ ; (16)(ii) for all a ≤ i, j ≤ d and all a , E Γ h sup | Γ − Γ | <δ | ∂ ∂ Γ i ∂ Γ j log g Γ ( G ? | a ) | i < ∞ ; (17)(iii) for j = 1 ,

2, all 1 ≤ i l ≤ d , l = 1 , · · · , j and all a , Z sup | Γ − Γ | <δ | ∂ j ∂ Γ i ∂ Γ i j g Γ ( g ? | a ) | ν ( dg ? ) < ∞ . (18)(A4) There exists a δ > ρ ( g ? ) = sup | Γ − Γ | <δ max a,b ∈X g Γ ( g ? | a ) g Γ ( g ? | b ) , (19) P Γ ( ρ ( G ? ( t m )) = ∞| X ( t m ) = a ) < a .(A5) Γ is an interior point of the set Θ, which is the set to which Γ belong.(A6) The MLE is strongly consistent, which requires (i) (A1); (ii) for all a , b and g ? , the map Γ → α Γ ( a, b ) and the map Γ → g Γ ( g ? | a ) are continuous on Θ; (iii) foreach a , E Γ | log g Γ ( G ? | a ) | < ∞ ; (iv) for each a and Γ there is a δ > E Γ h sup | Γ − Γ | <δ (log g Γ ( G ? | a )) + i < ∞ ; (v) for each Γ such that the laws P Γ and P Γ agree, Γ = Γ .In our RG-HMM, all states in the Markov chain { X ( t m ) } communicate with eachother as any two networks can be connected by a network path that leads the one to the8ther, hence irreducible. Note that |X | < ∞ , then { X ( t m ) } is positive recurrent as everyirreducible Markov chain with a ﬁnite state space is positive recurrent. Also note that thetransition probability α Γ ( a, b ) = P ab (∆ t ) = ∞ X n =0 P nab · exp( − γ ∆ t ) ( γ ∆ t ) n n ! ∈ (0 , + ∞ ) , (20)for all a, b ∈ X , p, q ∈ (0 ,

1) and γ ∈ (0 , + ∞ ). Let α ( k ) Γ ( a, b ) denote the k -step transitionprobability of { X ( t m ) } , then the matrix { α ( k ) Γ ( a, b ) } can be calculated by multiplying thematrix { α Γ ( a, b ) } by itself k times, yielding positive diagonal elements, which means thateach state can visit back itself in any number of transition steps, hence it is aperiodic.Therefore, (A1) holds, which also assures the existence of unique stationary distributionfor { X ( t m ) } .Note that P ab is given in Appendix 3 (example with 3-node network is provided in theend of the proof), and the matrix { P nab } can be calculated by multiplying the matrix { P ab } by itself n times, thus we have P nab = k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i , (21)where { c ji } j =0 and k ( n ) are constant depending on the values of a , b and n . c i is the termcorresponding to the product of probabilities of random graph process, thus c i ≤ k ( n )can be interpreted as the number of feasible paths from a to b with length n , and each pathcontains c i + c i edge additions and c i + c i edge deletions, thus 0 ≤ c i , c i , c i , c i ≤ n .Also we have g Γ ( g ? | a ) = α c (1 − α ) d β b (1 − β ) a , (22)where a , b , c , d are constant depending on the true network a and observed network g ? .From the analytical form of α Γ ( a, b ) and g Γ ( g ? | a ) given in (20), (21) and (22), alongwith assumption (C2), identiﬁability addressed in Appendix 1 and ﬁnite state space X , wecan verify that when p, q ∈ (0 , γ > α, β ∈ (0 , . g Γ ( g ? | a ) w.r.t α and β are bounded in some neighborhood of Γ , andthe expectation and integral involve ﬁnite summation of ﬁnite terms, thus ﬁnite as well.(A4) holds since g Γ ( g ? | a ) = 0 with α, β ∈ (0 , . ρ ( g ? ) < ∞ for all g? and a . (A5)holds since Θ is an open set, Θ = { ( p, q, γ, α, β ) | p, q ∈ (0 , , γ ∈ (0 , ∞ ) , α, β ∈ (0 , . } .(A6)-(i)(iii)(iv)(v) all hold since g Γ ( g ? | a ) are bounded in some neighborhood of Γ , and themodel is identiﬁable under set Θ.As it is hard to verify analytically the continuity regarding Γ → π Γ ( a ), we make itthe assumption (C2). We can verify that (A2) and (A6)-(ii) also hold under (C2). To seethis is so, we show, as an example, that α Γ ( a, b ) is continuous in p ∈ (0 ,

1) and the ﬁrstand second derivatives of α Γ ( a, b ) w.r.t p are continuous in some neighborhood of p . Thecontinuity w.r.t other parameters q, γ, α, β can be veriﬁed using a similar argument. Deﬁne f n ( p ) := P nab · exp( − γ ∆ t ) ( γ ∆ t ) n n ! = exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i , (23)9 ( p ) := α Γ ( a, b ) = ∞ X n =0 f n ( p ) . (24)Note that for any γ , p and n , | f n ( p ) | ≤ M n := exp( − γ ∆ t ) ( γ ∆ t ) n n ! , and P ∞ n =1 M n = 1 < ∞ ,then by Weierstrass M-test P ∞ n =1 f n ( p ) converges uniformly to f ( p ) in p ∈ (0 , f n ( p ) is continuous, then f ( p ) is continuous in (0 , f ( p ) = ∞ X n =0 f n ( p ) = ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i ( c i p − c i − p ) . (25)For p ∈ ( p − (cid:15), p + (cid:15) ), we have ∞ X n =0 | f n ( p ) | ≤ ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i ( (cid:12)(cid:12)(cid:12)(cid:12) c i p (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) c i − p (cid:12)(cid:12)(cid:12)(cid:12) ) ≤ ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i ( (cid:12)(cid:12)(cid:12)(cid:12) np (cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) n − p (cid:12)(cid:12)(cid:12)(cid:12) ) ≤ ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · ( | np | + | n − p | ) P nab ≤ ( 1 | p | + 1 | − p | ) ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · n · < ∞ , (26)so again by Weierstrass M-test P ∞ n =1 f n ( p ) converges uniformly to f ( p ) in some neighbor-hood of p . Then we have for p ∈ ( p − (cid:15), p + (cid:15) ) f ( p ) = ∞ X n =0 f n ( p ) = ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i (cid:20) ( c i p − c i − p ) − c i p − c i (1 − p ) (cid:21) , (27)10 X n =0 | f n ( p ) | ≤ ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · k ( n ) X i =1 c i p c i q c i (1 − p ) c i (1 − q ) c i | ( c i p − c i − p ) − c i p − c i (1 − p ) |≤ ( 1 p + 1(1 − p ) + 2 p (1 − p ) ) ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · n +( 1 p + 1(1 − p ) ) ∞ X n =0 exp( − γ ∆ t ) ( γ ∆ t ) n n ! · n < ∞ , (28)hence P ∞ n =1 f n ( p ) converges uniformly to f ( p ) in some neighborhood of p . Since f n ( p )and f n ( p ) are continuous in some neighborhood of p , and they are all uniformly convergent,then f ( p ) and f ( p ) are continuous in some neighborhood of p . Example.

The one-step transition matrix { P ab } for networks with 3 nodes under ER modelis shown in (29), where the state space is X = { , } × { , , · · · , } ) \ { (0 , , (1 , } ascomplete network cannot be obtained by deleting an edge and empty network cannot beobtained by adding an edge. And the mapping between labels and networks is given inFigure 1. { P ab } = (0 ,

1) (0 ,

2) (0 ,

3) (0 ,

4) (0 ,

5) (0 ,

6) (0 ,

7) (1 ,

2) (1 ,

3) (1 ,

4) (1 ,

5) (1 ,

6) (1 ,

7) (1 , ,

1) 0 0 0 0 0 0 0

13 13 13 ,

2) 1 − p p p ,

3) 1 − p p p ,

4) 1 − p p p ,

5) 0 (1 − p ) (1 − p ) 0 0 0 0 0 0 0 0 0 0 p (0 ,

6) 0 0 (1 − p ) (1 − p ) 0 0 0 0 0 0 0 0 0 p (0 ,

7) 0 (1 − p ) 0 (1 − p ) 0 0 0 0 0 0 0 0 0 p (1 , q (1 − q ) 0 (1 − q ) 0(1 , q (1 − q ) (1 − q ) 0 0(1 , q (1 − q ) (1 − q ) 0(1 ,

5) 0 q q − q (1 ,

6) 0 0 q q − q (1 ,

7) 0 q q − q (1 ,

8) 0 0 0 0

13 13 13 (29)11

B C AB C AB C AB C

AB C AB C AB C AB C G ( t ) with N = 3 nodes indexed from 1 to 8 instate space. Deﬁne Z m := P (cid:16) G ? m = g ? m | X = (cid:0) , g ? ( t ) (cid:1)(cid:17) for each m = 2 , · · · M , and Z := 1. Alsolet q m ( x ) = P ( G ?m = g ?m | X m = x ), η m ( x ) = P ( X m = x | G ? m − = g ? m − , X = (1 , g ? ( t ))for each m = 2 , · · · M , with conventions η ( x ) = P ( X = x | X = (1 , g ? ( t )). To alleviatethe notational burden, we omit the condition at t in the probability expression when noconfusion is possible. Z m = P ( G ? m = g ? m ) = P ( G ? m − = g ? m − ) P ( G ?m = g ?m | G ? m − = g ? m − )= Z m − X x P ( G ?m = g ?m , X m = x | G ? m − = g ? m − )= Z m − X x P ( G ?m = g ?m | X m = x, G ? m − = g ? m − ) P ( X m = x | G ? m − = g ? m − )= Z m − X x P ( G ?m = g ?m | X m = x ) P ( X m = x | G ? m − = g ? m − )= Z m − X x q m ( x ) η m ( x )= Z m − E ( q m ( X )) , X ∼ η m ( · ) (30) We now present a forward algorithm with particle ﬁltering to compute Z BM , i.e. the particleapproximation of the marginal probability of observations. The time complexity is O ( BM ).Most of the computation is in the simulation of ( ? ), which can be done as in step 3, section5 of this Supplement. 12 lgorithm 1: Forward Algorithm with Particle Filtering

Input: g ? = (cid:0) g ? ( t ) , g ? ( t ) , ..., g ? ( t M ) (cid:1) , B Output: Z BM Set Z B = 1, sample ξ b iid ∼ η ( · ) = P (cid:0) X = · | X = (1 , g ( t )) (cid:1) , b = 1 , · · · , B , and set η B ( · ) = B P Bb =1 { ξ b = ·} For m = 2 , · · · M : set Z Bm ← Z Bm − X x q m ( x ) η Bm ( x ) = Z Bm − B B X b =1 q m ( ξ bm )Sample ξ bm +1 iid ∼ η Bm +1 ( · ) = B X b =1 q m ( ξ bm ) P ( X m +1 = · | X m = ξ bm ) P Bi =1 q m ( ξ bm ) , b = 1 , · · · , B ( ? )and set η Bm +1 ( · ) = P Bb =1 { ξ bm +1 = ·}

10 Numerical Simulation Results for Estimation

B ˆ p ˆ q ˆ γ ˆ α ˆ β Table 1: Numerical results for Figure 5 in the paper. Mean and standard deviation ofparameter estimates based on 20 simulations from the ER process with varying B , N = 20, M = 50. 13 M ˆ p ˆ q ˆ γ ˆ α ˆ β

30 50 0.637 (0.042) 0.335 (0.053) 1.647 (0.111) 0.089 (0.013) 0.034 (0.011)100 0.669 (0.029) 0.310 (0.044) 1.728 (0.066) 0.121 (0.017) 0.032 (0.008)150 0.684 (0.027) 0.307 (0.046) 1.802 (0.084) 0.143 (0.015) 0.029 (0.007)200 0.684 (0.024) 0.311 (0.034) 1.920 (0.096) 0.163 (0.023) 0.026 (0.005)50 50 0.572 (0.038) 0.431 (0.078) 1.518 (0.046) 0.063 (0.005) 0.029 (0.008)100 0.613 (0.029) 0.374 (0.053) 1.581 (0.054) 0.082 (0.007) 0.034 (0.007)150 0.644 (0.026) 0.371 (0.027) 1.656 (0.062) 0.096 (0.009) 0.034 (0.006)200 0.653 (0.023) 0.337 (0.026) 1.691 (0.046) 0.107 (0.010) 0.038 (0.005)70 50 0.526 (0.035) 0.537 (0.061) 1.526 (0.055) 0.050 (0.003) 0.024 (0.004)100 0.572 (0.030) 0.481 (0.060) 1.533 (0.051) 0.064 (0.004) 0.030 (0.007)150 0.602 (0.020) 0.444 (0.047) 1.535 (0.038) 0.075 (0.004) 0.033 (0.005)200 0.611 (0.025) 0.395 (0.037) 1.580 (0.037) 0.083 (0.005) 0.036 (0.006)

Table 2: Numerical results for Figure 6 in the paper. Mean and standard deviation ofparameter estimates based on 20 simulations from ER process with B = 50000, ( N, M ) ∈{ , , } × { , , , } .

11 Supplementary Simulation Results for Testing

A second simulation study is designed to evaluate the eﬀect of observation rate κ on therate of detection for ER and PR using Bayes factor. We set B = 500 , γ = 2). We let κ take on values in { . , , . } and N takeon values in { , , } . For each combination of settings, we generate 100 ER sequencesand 100 PR sequences. We then conduct hypothesis testing on each sequence. We also ﬁta GLMM to analyze the testing results. The simulation and GLMM results are shown inFigures 2, 3 and Table 3. We can see that the observation rate signiﬁcantly aﬀect the rateof detection. The rate of detection signiﬁcantly increases by increasing κ from 0 . κ further increases.14 observation rate R a t e o f D e t e c t i on process ERPR 60.0%65.0%70.0%75.0% 0.5 1 1.5 observation rate P r obab ili t y o f S u cc e ss f u l D e t e c t i on process ERPR

Fitted Probabilities of Successful Detection from GLMM

Figure 2: Results for the second simulation study. Left: rates of detection for ER and PRbased on 100 replicates. Right: ﬁtted probability of successful detection in GLMM.

PRobservation rate 1.5observation rate 1N 30N 20 0.2 0.5 1 2 5 10

Odds Ratios

Figure 3: Estimates of the coeﬃcients fromGLMM in the second simulation study.

Chisq Df Pr( > Chisq)N 1.14 2 0.5668process 0.36 1 0.5458observation rate 6.61 2 0.0368

Table 3: ANOVA table for the second simu-lation study

12 Automatic Segment Finder

Before we apply our estimation and testing framework to the constructed networks, we needﬁrst choose a segment of the network time series consistent with percolation in connectionwith diﬀerent stages of seizure, e.g. seizure onset and preceding termination. The choiceof an appropriate time series segment from the overall non-stationary signal is nontrivial.We developed a simple, automatic procedure for the purpose, shown in Algorithm 2. Thisprocedure consists of the following steps:1. Subset the time series by windowing so that it only contains network evolution in theregion of interest (ROI). 15. Split the time series into diﬀerent low-high-low segments based on GCC/density ofthe network time series. Each low-high-low segment starts when the size of theGCC/density is below low level and ends when it reaches low level for the ﬁrst timeafter reaching high level. The two levels are set to be the ﬁrst and third quartiles ofthe size of the GCC/density over time.3. For each low-high-low segment, trim oﬀ the tail with decreased GCC/density, whichis from when it reaches high level for the last time to the end.4. Use the identiﬁed low-high segments for testing.The purpose of step 2-3 is to automatically identify all segments where both the size ofGCC and density ramp up in the ROI. There are several ways proposed to choose the ROIsin a network time series. One way is to choose the segment from 50 s before seizure onsetto 50 s after seizure onset. The second way is via the subjective determination of an expert.The third way is through dynamic community detection on network time series (Martinetet al. (2020)), and to choose the region where the large dynamic community appears. Weused the third in our analysis since it is the most objective approach and yields reliableROIs assessed by epileptologist.

Algorithm 2:

Automatic Segment Finder

Input:

Network time series g ? M = [ g ∗ ( t ) , · · · , g ∗ ( t M )], ROI = [ t , t ], metric on networks: GCC or Density Output:

A network time series segment g ?a : b = [ g ∗ ( t a ) , g ∗ ( t a +1 ) · · · , g ∗ ( t b )] , ≤ a < b ≤ M Initialize: set = {} an empty set to store segmenting indices c M ∈ R M ← GCC( g ? M ) / Density( g ? M ) . Calculate size of GCC/density over time l, r ← arg max l,r ∈ M ( t r − t l ) s.t. t l , t r ∈ ROI . Take a subsection of time series Q ← ﬁrst quartile of c l : r Q ← third quartile of c l : r anchor ← l . Start segmenting time series with sliding window while c anchor > Q do anchor ← anchor + 1 . Anchor the left point of a potential segment end set = { set, anchor } while not ﬁnished segmenting c l : r do i ← while c anchor + i < Q & anchor + i < r do i ← i + 1 . Grow the segment until it reaches Q end while c anchor + i > Q & anchor + i < r do i ← i + 1 . Keep growing the segment until it reaches Q end anchor ← anchor + i . Anchor the right point of the grown segment set = { set, anchor } end a, b ← a pair of consecutive indices in set s.t. ˜ t ∈ [ a, b ] b ← b while c b < Q do b ← b − . Trim off the tail of segment end Note that instead of using either GCC or density to segment the network time series,we can also use both of them to ﬁnd the proper segment for testing. Speciﬁcally, we can16rst run the Automatic Segment Finder on the network time series with one metric. Wecan then run it again with the other metric, thus obtaining two segmented network timeseries. And then we can take, as input, the overlapping of the two segments for testing,which is what we do in this analysis.

13 Results for Seizure 1, 2 and 3 on the Left Hemi-sphere

Figure 4: Seizure 1: left hemisphere. Two vertical lines represent seizure onset and ter-minations times, respectively. Top: Voltage time series recorded at eight electrodes onleft brain hemisphere. Second row: Community membership (indicated by colors) for eachnode over time; the large dynamic communities manifest ROIs. Third row: Network den-sity over time. Bottom: Proportion of nodes in the GCC over time; the bold curves inROIs are chosen by automatic segment ﬁnder for testing. ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.404 (0.112) 0.248 (0.075) 10.075 (1.423) 0.068 (0.0005) 0.435 (0.035) -5636.822 (5.942)PR 0.468 (0.156) 0.163 (0.072) 9.819 (1.512) 0.068 (0.0004) 0.464 (0.032) -5647.120 (6.850)log(BF) 10.297 (9.468))

Table 4: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure1 on the left brain hemisphere (ﬁrst bold segment).17 p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.580 (0.111) 0.279 (0.110) 6.480 (0.823) 0.033 (0.0004) 0.273 (0.047) -4222.710 (15.563)PR 0.625 (0.092) 0.250 (0.134) 6.010 (0.348) 0.033 (0.0004) 0.336 (0.040) -4254.414 (11.118)log(BF) 31.704 (15.141)

Table 5: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure1 on the left brain hemisphere (second bold segment). ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.647 (0.051) 0.138 (0.055) 10.037 (0.631) 0.128 (0.0007) 0.214 (0.024) -16058.532 (24.298)PR 0.642 (0.037) 0.141 (0.078) 7.567 (1.209) 0.129 (0.0004) 0.228 (0.026) -16120.504 (26.966)log(BF) 61.971 (33.687)

Table 6: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure1 on the left brain hemisphere (third bold segment).18igure 5: Seizure 2: left hemisphere. Two vertical lines represent seizure onset and ter-minations times, respectively. Top: Voltage time series recorded at eight electrodes onleft brain hemisphere. Second row: Community membership (indicated by colors) for eachnode over time; the large dynamic communities manifest ROIs. Third row: Network den-sity over time. Bottom: Proportion of nodes in the GCC over time; the bold curves inROIs are chosen by automatic segment ﬁnder for testing. ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.573 (0.051) 0.269 (0.075) 6.041 (0.223) 0.047 (0.0004) 0.262 (0.019) -9226.559 (25.313)PR 0.587 (0.057) 0.288 (0.103) 5.732 (0.242) 0.047 (0.0001) 0.279 (0.020) -9277.567 (21.869)log(BF) 51.008 (31.198)

Table 7: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure2 on the left brain hemisphere (ﬁrst bold segment).19 p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.522 (0.049) 0.258 (0.067) 8.846 (0.596) 0.081 (0.0007) 0.354 (0.031) -18375.956 (52.389)PR 0.545 (0.068) 0.277 (0.088) 7.159 (0.852) 0.081 (0.0009) 0.410 (0.037) -18481.718 (40.650)log(BF) 105.762 (73.922)

Table 8: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure2 on the left brain hemisphere (second bold segment). ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.533 (0.026) 0.141(0.038) 8.292(0.548) 0.099 (0.0001) 0.150 (0.021) -16234.315 (13.354)PR 0.575 (0.079) 0.174 (0.082) 7.783 (0.568) 0.099 (0.0001) 0.177 (0.025) -16241.436 (15.226)log(BF) 7.119 (19.931)

Table 9: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure2 on the left brain hemisphere (third bold segment).20igure 6: Seizure 3: left hemisphere. Two vertical lines represent seizure onset and ter-minations times, respectively. Top: Voltage time series recorded at eight electrodes onleft brain hemisphere. Second row: Community membership (indicated by colors) for eachnode over time; the large dynamic communities manifest ROIs. Third row: Network den-sity over time. Bottom: Proportion of nodes in the GCC over time; the bold curves inROIs are chosen by automatic segment ﬁnder for testing. ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.651 (0.042) 0.166 (0.084) 6.367 (0.625) 0.055 (0.0003) 0.200 (0.027) -5388.566 (9.834)PR 0.665 (0.040) 0.128 (0.079) 5.722 (0.341) 0.055 (0.0002) 0.225 (0.036) -5399.512 (7.227)log(BF) 10.945 (12.786)

Table 10: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure3 on the left brain hemisphere (ﬁrst bold segment).21 p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.824 (0.064) 0.091 (0.036) 12.746 (1.522) 0.198 (0.0006) 0.268 (0.023) -18120.573 (25.416)PR 0.682 (0.112) 0.134 (0.077) 7.442 (2.783) 0.201 (0.0007) 0.219 (0.071) -18182.074 (22.634)log(BF) 61.503 (25.172)

Table 11: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure3 on the left brain hemisphere (second bold segment). ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.462 (0.078) 0.217(0.125 7.629 (0.643) 0.053 (0.0003) 0.185 (0.027) -4102.365 (17.419)PR 0.535 (0.126) 0.241(0.151) 6.961 (0.616) 0.053 (0.0004) 0.259 (0.070) -4117.433 (5.860)log(BF) 15.068 (20.097)

Table 12: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure3 on the left brain hemisphere (third bold segment). ˆ p ˆ q ˆ γ ˆ α ˆ β loglik ER 0.629 (0.091) 0.154 (0.070) 7.700 (0.586) 0.076 (0.0003) 0.179 (0.059) -6836.452 (6.823)PR 0.674 (0.035) 0.119 (0.051) 6.660 (0.569) 0.076 (0.0003) 0.248 (0.031) -6831.450 (7.429)log(BF) -5.002 (7.224)

Table 13: Mean and standard deviation of parameter estimates for ER/PR process, log-likelihood estimates and Bayes factor (log scale) estimate based oﬀ of 10 trials for seizure3 on the left brain hemisphere (fourth bold segment).

References

Bickel, P. J., Ritov, Y., Ryden, T. et al. (1998), ‘Asymptotic normality of the maximum-likelihood estimator for general hidden markov models’,

The Annals of Statistics (4), 1614–1635.Martinet, L.-E., Kramer, M., Viles, W., Perkins, L., Spencer, E., Chu, C., Cash, S. &Kolaczyk, E. (2020), ‘Robust dynamic community detection with applications to humanbrain functional networks’, Nature Communications (1), 1–13.Snijders, T. A., Koskinen, J. & Schweinberger, M. (2010), ‘Maximum likelihood estimationfor social network dynamics’, The Annals of Applied Statistics4