Physics-aware, deep probabilistic modeling of multiscale dynamics in the Small Data regime
Sebastian Kaltenbach, Phaedon-Stelios Koutsourelakis
Professorship of Continuum Mechanics, Technical University of Munich, Boltzmannstr. 15, 85748 Garching
{sebastian.kaltenbach; p.s.koutsourelakis}

Key words:
Bayesian machine learning, virtual observables, multiscale modeling, coarse-graining
Abstract:
The data-based discovery of effective, coarse-grained (CG) models of high-dimensional dynamical systems presents a unique challenge in computational physics, and particularly in the context of multiscale problems. The present paper offers a probabilistic perspective that simultaneously identifies predictive, lower-dimensional coarse-grained (CG) variables as well as their dynamics. We make use of the expressive ability of deep neural networks in order to represent the right-hand side of the CG evolution law. Furthermore, we demonstrate how domain knowledge that is very often available in the form of physical constraints (e.g. conservation laws) can be incorporated with the novel concept of virtual observables. Such constraints, apart from leading to physically realistic predictions, can significantly reduce the requisite amount of training data, which in turn reduces the number of required, computationally expensive multiscale simulations (Small Data regime). The proposed state-space model is trained using probabilistic inference tools and, in contrast to several other techniques, requires neither the prescription of a fine-to-coarse (restriction) projection nor time-derivatives of the state variables. The formulation adopted is capable of quantifying the predictive uncertainty as well as of reconstructing the evolution of the full, fine-scale system, which allows the quantities of interest to be selected a posteriori. We demonstrate the efficacy of the proposed framework in a high-dimensional system of moving particles.
The solution of high-dimensional, multiscale systems is challenging, as the required computational resources usually grow exponentially with the dimension of the state space as well as with the smallest time scale that needs to be resolved. As such systems are ubiquitous in applied physics and engineering, reduced/coarse-grained descriptions and models are necessary that are predictive of various observables of the high-dimensional system, but whose discretization time scales can be much larger than the inherent ones [1].

We adopt a data-based perspective [2, 3] that relies on data generated by simulations of a fine-grained (FG) system in order to learn a coarse-grained (CG) model. We nevertheless note that such coarse-graining tasks exhibit fundamental differences from large-scale machine learning tasks [4, 5], as the data involved is usually small due to the expensive data acquisition, and as information about the underlying physical structure of the problem is available. When this domain knowledge is incorporated into the CG model, it can improve its predictive ability [6, 7].

In contrast to other frameworks for reduced-order modeling (e.g. SINDy [8]), where the dynamics of the CG model is learned on the basis of a large vocabulary of feature functions, we employ a deep neural network for the CG dynamics in order to gain great flexibility and avoid restricting ourselves to an a priori chosen set of feature functions. This approach is similar to the ideas of Neural ODEs [9] and Neural SDEs [10], which also use neural networks to represent the dynamics.
Another possibility would be the use of Gaussian Processes [11], which would allow non-parametric, probabilistic modeling.

In this paper, we combine a generative, probabilistic machine learning framework [12] with virtual observables [6] and deep neural networks for the CG dynamics as well as for the mapping from the CG states to the FG states. In doing so, we propose a framework that can make use of the flexibility of neural nets while still obeying physical laws. We carry out the tasks of model estimation and dimensionality reduction simultaneously, and we identify the CG state variables, their dynamics, as well as a probabilistic coarse-to-fine map, based only on small amounts of FG simulation data.

In general, the subscript f or lower-case letters are used to denote variables associated with the (high-dimensional) fine-grained (FG) model, and the subscript c or upper-case letters are used for quantities of the (lower-dimensional) coarse-grained (CG) description. We also use a circumflex ˆ to denote observed/known variables.

The fine-grained system considered is a high-dimensional system with state variables $\boldsymbol{x}$ ($\boldsymbol{x} \in \mathcal{X}_f \subset \mathbb{R}^{d_f}$), whose dimension $d_f$ is very large ($d_f \gg 1$), and whose dynamics are governed by

$$\dot{\boldsymbol{x}}_t = \boldsymbol{f}(\boldsymbol{x}_t, t), \quad t > 0 \quad (1)$$

with initial conditions $\boldsymbol{x}_0$ that might be deterministic or drawn from a specified distribution. In this work, we want to coarse-grain such a system based only on simulated data, i.e. time sequences simulated from Equation (1) with a time-step $\delta t$.

Our goal is to simultaneously identify (unknown) CG state variables $\boldsymbol{X}$ with
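As an illustration of how FG training sequences can be generated from Equation (1), the following sketch uses a simple explicit-Euler integrator; the right-hand side `f` is a hypothetical stand-in, not the paper's actual FG system:

```python
import numpy as np

def simulate_fg(f, x0, dt, n_steps):
    """Explicit-Euler integration of x_dot = f(x, t) with micro time-step dt."""
    traj = [np.asarray(x0, dtype=float)]
    t = 0.0
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * f(traj[-1], t))
        t += dt
    return np.stack(traj)  # shape: (n_steps + 1, d_f)

# Hypothetical linear FG dynamics as a stand-in for f(x, t)
f = lambda x, t: -x
traj = simulate_fg(f, x0=np.ones(3), dt=0.01, n_steps=100)
```

Each row of `traj` is one snapshot of the FG state; a collection of such sequences (for different initial conditions) would form the raw training data.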
$\boldsymbol{X} \in \mathcal{X}_c \subset \mathbb{R}^{d_c}$, as well as the dynamics of those CG variables. The dimension $d_c$ of these CG state variables is intended to be much smaller than $d_f$. For the CG dynamics, a Markovian model is assumed in the form:

$$\dot{\boldsymbol{X}}_t = \boldsymbol{F}(\boldsymbol{X}_t, t), \quad t > 0 \quad (2)$$

In contrast to approaches based on the Mori-Zwanzig formalism [13, 14], which include a mapping from the FG system to the quantities of interest, we employ a probabilistic, generative coarse-to-fine map [15] from the CG state variables to the FG description. We denote the associated (conditional) density by:

$$p_{cf}(\boldsymbol{x}_t \mid \boldsymbol{X}_t; \boldsymbol{\theta}_{cf}) \quad (3)$$

where $\boldsymbol{\theta}_{cf}$ denotes the (unknown) parameters that we will try to learn from the data. This conditional density $p_{cf}$ can be endowed a priori with domain knowledge by adapting its form to the particulars of the problem, or it can be parametrized by deep neural networks to allow for maximum flexibility. Employing a probabilistic coarse-to-fine map instead of a deterministic restriction operator has many advantages, e.g. the reconstruction of the full FG system and probabilistic predictive estimates.

In the following, we consider discretized time with a fixed time-step $\Delta t$, and time-related subscripts refer to the number of time-steps. We model the CG dynamics with the help of a deep neural network in order to gain great flexibility and be able to express nonlinear functions. Therefore, we assume an explicit discretization of Equation (2) and model the right-hand side by the deep neural network $NN(\cdot)$
parametrized by $\boldsymbol{\theta}_{NN}$:

$$\boldsymbol{X}_{t+1} = \boldsymbol{X}_t + NN(\boldsymbol{X}_t; \boldsymbol{\theta}_{NN}) + \sigma_r \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) \quad (4)$$

where the parameter $\sigma_r \geq 0$ controls the stochasticity of the transition. This implies the transition density:

$$p(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_t, \boldsymbol{\theta}_{NN}, \sigma_r) = \mathcal{N}(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_t + NN(\boldsymbol{X}_t, \boldsymbol{\theta}_{NN}), \sigma_r^2 \boldsymbol{I}) \quad (5)$$

which effectively represents a discretized version of the neural stochastic ODEs of [10] and is more flexible than approaches in which the right-hand side consists of a restricted set of first- and second-order interactions of $\boldsymbol{X}_t$ [6].
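A minimal NumPy sketch of one stochastic transition of the form in Equation (4); the two-layer ReLU network and its weights are hypothetical stand-ins for the trained $NN(\cdot;\boldsymbol{\theta}_{NN})$:

```python
import numpy as np

rng = np.random.default_rng(0)
d_c = 4  # illustrative CG dimension

# Hypothetical two-layer MLP standing in for NN( . ; theta_NN)
W1, b1 = 0.1 * rng.normal(size=(8, d_c)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(d_c, 8)), np.zeros(d_c)

def nn_rhs(X):
    """Right-hand side of the CG evolution law, NN(X_t; theta_NN)."""
    h = np.maximum(W1 @ X + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

def cg_step(X, sigma_r):
    """One stochastic transition: X_{t+1} = X_t + NN(X_t) + sigma_r * eps."""
    eps = rng.standard_normal(d_c)
    return X + nn_rhs(X) + sigma_r * eps

X0 = np.ones(d_c)
X1 = cg_step(X0, sigma_r=0.01)
```

The residual ("X + NN(X)") form matches the explicit discretization above; setting `sigma_r=0` recovers a deterministic update.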
As the CG state variables $\boldsymbol{X}$ employed in multiscale modeling are usually given physical meaning, we employ the concept of virtual observables [6] in order to incorporate general physical principles such as conservation of mass, momentum or energy. Let these be expressed as equalities of the form, at each time-step $l$:

$$\boldsymbol{c}_l(\boldsymbol{X}_l) = \boldsymbol{0}, \quad l = 0, 1, \ldots \quad (6)$$

where $\boldsymbol{c}_l : \mathcal{X}_c \subset \mathbb{R}^{d_c} \to \mathbb{R}^{M_c}$. The only requirement we will impose is that of differentiability of $\boldsymbol{c}_l$ [6]. We define a new variable $\hat{\boldsymbol{c}}_l$ which relates to $\boldsymbol{c}_l$ as follows:

$$\hat{\boldsymbol{c}}_l = \boldsymbol{c}_l(\boldsymbol{X}_l) + \sigma_c \boldsymbol{\varepsilon}_c, \quad \boldsymbol{\varepsilon}_c \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) \quad (7)$$

Now, it is assumed that the $\hat{\boldsymbol{c}}_l$ have been virtually observed, and that this set of virtual observations is $\hat{\boldsymbol{c}}_l = \boldsymbol{0}$, which implies the likelihood:

$$p(\hat{\boldsymbol{c}}_l = \boldsymbol{0} \mid \boldsymbol{X}_l, \sigma_c) = \mathcal{N}(\boldsymbol{0} \mid \boldsymbol{c}_l(\boldsymbol{X}_l), \sigma_c^2 \boldsymbol{I}) \quad (8)$$

The "noise" parameter $\sigma_c$ can be used to account for the intensity of the enforcement of the virtual observations and represents the tolerance parameter with which the constraints would be enforced in a deterministic setting. We note that the concept of virtual observables is not restricted to physical constraints but could also be applied to residuals of temporal discretization schemes [6] or of PDEs [16]. In both of these cases, it is shown that the incorporation of virtual observables can reduce the amount of training data required and enable training in the Small Data regime.

Due to the introduction of virtual observables, we can adopt an enlarged definition of data, which we cumulatively denote by $\mathcal{D} = \{\hat{\boldsymbol{x}}^{(n)}_{0:T}, \hat{\boldsymbol{c}}^{(n)}_{0:T}\}$ and which encompasses:

• FG simulation data consisting of $n$ sequences of the FG state variables. These are denoted by $\hat{\boldsymbol{x}}^{(n)}_{0:T}$, as the likelihood model implied by the $p_{cf}$ in Equation (3) involves only the observables at each coarse time-step.

• Virtual observables $\hat{\boldsymbol{c}}^{(n)}_l$ relating to the CG states $\boldsymbol{X}_l$ at each time-step $l$, which relate to the physical constraints as in Equation (7). In the example they pertain to all time-steps from 0 to $T$ and are denoted by $\hat{\boldsymbol{c}}^{(n)}_{0:T}$.

We represent the latent (unobserved) variables of the model by the CG state variables $\boldsymbol{X}^{(n)}_{0:T}$ and relate them to the FG data through $p_{cf}$ (Equation (3)) and to the virtual observables through Equation (8). The parameters of the model are denoted cumulatively by $\boldsymbol{\theta}$ and consist of:

• $\boldsymbol{\theta}_{NN}$, which parametrize the neural network for the right-hand side of the CG evolution law (see Section 2.3),

• $\boldsymbol{\theta}_{cf}$, which parametrize the probabilistic coarse-to-fine map (Equation (3)),

• $\sigma_r$, involved in the stochasticity of the transition law (Equation (4)), and

• $\sigma_c$, involved in the enforcement of the virtual observables (Equation (7)).

If any of these parameters are prescribed, then they are omitted from $\boldsymbol{\theta}$.
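The virtual-observable likelihood of Equation (8) is just a Gaussian density evaluated at the constraint residual. A sketch, using a mass-conservation constraint of the kind employed later as an illustrative choice of $\boldsymbol{c}_l$:

```python
import numpy as np

def constraint(X):
    """Example constraint c_l(X_l): total mass should equal 1 (illustrative)."""
    return 1.0 - X.sum()

def log_virtual_obs(X, sigma_c):
    """log N(0 | c_l(X_l), sigma_c^2 I), i.e. the virtual-observable log-likelihood."""
    c = np.atleast_1d(constraint(X))
    return (-0.5 * np.sum(c ** 2) / sigma_c ** 2
            - c.size * np.log(sigma_c)
            - 0.5 * c.size * np.log(2.0 * np.pi))

X_ok = np.full(5, 0.20)    # satisfies the constraint exactly
X_bad = np.full(5, 0.25)   # violates it
lp_ok = log_virtual_obs(X_ok, sigma_c=1e-2)
lp_bad = log_virtual_obs(X_bad, sigma_c=1e-2)
```

States violating the constraint receive a heavily penalized log-likelihood, with the penalty scale set by the tolerance `sigma_c`.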
We follow a fully Bayesian formulation and express the posterior of the unknowns (i.e. latent variables and parameters) as follows:

$$p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{p(\mathcal{D})} \quad (9)$$

where $p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ denotes the prior on the latent variables and parameters.
The likelihood term $p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ can be decomposed into the product of two (conditionally) independent terms, one for the FG data and one for the virtual observables, i.e.:

$$p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = p(\hat{\boldsymbol{x}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\hat{\boldsymbol{c}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) \quad (10)$$

We further note that (from Equation (3)):

$$p(\hat{\boldsymbol{x}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{t=0}^{T} p_{cf}(\boldsymbol{x}^{(i)}_t \mid \boldsymbol{X}^{(i)}_t, \boldsymbol{\theta}_{cf}) \quad (11)$$

and (from Equation (8)):

$$p(\hat{\boldsymbol{c}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{l=0}^{T} \mathcal{N}\!\left(\boldsymbol{0} \,\middle|\, \boldsymbol{c}_l(\boldsymbol{X}^{(i)}_l), \sigma_c^2 \boldsymbol{I}\right) \propto \prod_{i=1}^{n} \prod_{l=0}^{T} \sigma_c^{-\dim(\boldsymbol{c})} \exp\left\{ -\frac{1}{2\sigma_c^2} \left| \boldsymbol{c}_l(\boldsymbol{X}^{(i)}_l) \right|^2 \right\} \quad (12)$$

The prior $p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ can be decomposed into the transition density of Equation (5), a prior for $\boldsymbol{X}_0$, and a prior for the parameters $\boldsymbol{\theta}$:

$$p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p(\boldsymbol{X}^{(i)}_0) \prod_{t=0}^{T-1} p(\boldsymbol{X}^{(i)}_{t+1} \mid \boldsymbol{X}^{(i)}_t, \boldsymbol{\theta}_{NN}, \sigma_r)\; p(\boldsymbol{\theta}) \quad (13)$$

We advocate the use of Stochastic Variational Inference [17] for computing an approximate posterior. We select a parameterized family of densities $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ and attempt to find the member that best approximates the posterior by minimizing their Kullback-Leibler divergence. It can be shown [18] that this optimal $q_{\boldsymbol{\phi}}$ maximizes the Evidence Lower Bound (ELBO) $\mathcal{F}(q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}))$:

$$\log p(\mathcal{D}) = \log \int p(\mathcal{D}, \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} = \log \int \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}\, q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} \geq \int \log \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}\, q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} = \mathcal{F}(q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})) \quad (14)$$
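The ELBO of Equation (14) is typically estimated by Monte Carlo with samples from $q_{\boldsymbol{\phi}}$. A minimal single-objective sketch, with toy stand-in log-densities in place of the actual likelihood and prior:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(X):     # stand-in for log p(D | X, theta)
    return -0.5 * np.sum((X - 1.0) ** 2)

def log_prior(X):   # stand-in for log p(X, theta)
    return -0.5 * np.sum(X ** 2)

def log_q(X, mu, log_sig):
    """Log-density of a diagonal-Gaussian q_phi."""
    sig = np.exp(log_sig)
    return np.sum(-0.5 * ((X - mu) / sig) ** 2 - log_sig
                  - 0.5 * np.log(2.0 * np.pi))

def elbo_estimate(mu, log_sig, n_samples=256):
    """Monte Carlo estimate of E_q[log p(D|X) + log p(X) - log q(X)]."""
    sig = np.exp(log_sig)
    vals = []
    for _ in range(n_samples):
        X = mu + sig * rng.standard_normal(mu.shape)  # reparametrized sample
        vals.append(log_lik(X) + log_prior(X) - log_q(X, mu, log_sig))
    return float(np.mean(vals))

F = elbo_estimate(mu=np.zeros(2), log_sig=np.zeros(2))
```

By Equation (14), such an estimate lower-bounds the log-evidence of the toy model (up to Monte Carlo noise); maximizing it over `mu` and `log_sig` is the essence of the variational approach.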
In the following illustrations, we postulate a mean-field decomposition:

$$q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T})\, p_{\boldsymbol{\phi}}(\boldsymbol{\theta}) = \left[\prod_{i=1}^{n} q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_{0:T})\right] \delta_{\boldsymbol{\phi}}(\boldsymbol{\theta}) \quad (15)$$

where we make use of the (conditional) independence of the time sequences in the likelihood. We further note that we employ Dirac $\delta$ functions for $q_{\boldsymbol{\phi}}(\boldsymbol{\theta})$ and therefore obtain MAP estimates $\boldsymbol{\theta}_{MAP}$ (i.e. $\boldsymbol{\phi}$ includes $\boldsymbol{\theta}_{MAP}$) for the unknown parameters.

Gradients of the ELBO with respect to the parameters $\boldsymbol{\phi}$ involve expectations with respect to $q_{\boldsymbol{\phi}}$. These are approximated with Monte Carlo estimates which employ the reparametrization trick [19], and stochastic optimization is carried out with the ADAM algorithm [20].

The proposed framework can produce probabilistic predictive estimates for a sequence which was observed up to time-step $T$, i.e. $\hat{\boldsymbol{x}}^{(i)}_{0:T}$. This predictive uncertainty reflects not only the information loss due to the coarse-graining process but also the epistemic uncertainty arising from finite (and small) datasets. In particular, if $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$ is the (marginal) posterior of the last, hidden CG state and $\boldsymbol{\theta}_{MAP}$ the MAP estimate of the model parameters, then we follow the steps described in Algorithm 1. This procedure generates samples of the full FG state evolution but does not necessarily guarantee the enforcement of the constraints for the CG states. We note that if we would also like to enforce the constraints $\boldsymbol{c}_l$ for future predictions, then these would need to be included in the posterior density defined in Equation (9). Consequently, future (FG or CG) states would need to be inferred from this augmented posterior, and an enlarged inference process would be required for predictions.
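As noted above, ELBO gradients rely on the reparametrization trick: a sample from $q_{\boldsymbol{\phi}}$ is written as a deterministic transform of parameter-free noise. For a diagonal lognormal family (the choice used in the numerical illustration later in the paper), this can be sketched as follows; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_lognormal_q(mu, log_sig, rng):
    """Reparametrized draw from a diagonal lognormal: X = exp(mu + sig * eps).

    Because eps ~ N(0, I) is the only source of randomness, the sample is a
    deterministic (and differentiable) function of the variational
    parameters (mu, log_sig), which is what makes gradient estimates of the
    ELBO with respect to those parameters possible.
    """
    eps = rng.standard_normal(mu.shape)
    return np.exp(mu + np.exp(log_sig) * eps)

mu = np.zeros(25)            # illustrative dimension
log_sig = np.full(25, -2.0)
X = sample_lognormal_q(mu, log_sig, rng)
```

The exponential guarantees strictly positive samples, which matters when the CG variables represent non-negative quantities such as densities.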
Algorithm 1: Prediction
Data: $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$, $\boldsymbol{\theta}_{MAP}$
Result: Sample of $\boldsymbol{x}^{(i)}_{T+P}$
Sample $\boldsymbol{X}^{(i)}_T$ from $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$;
while time-step $T+P$ not reached do
    Sample from the CG evolution law in Equation (4);
end
Sample from $p_{cf}(\boldsymbol{x}_{T+P} \mid \boldsymbol{X}_{T+P}, \boldsymbol{\theta}_{MAP})$
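The prediction procedure of Algorithm 1 can be sketched as follows; the CG right-hand side, the posterior over the last CG state, and the binned coarse-to-fine map are all hypothetical stand-ins for the trained quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
d_c, P, sigma_r = 4, 10, 0.01

def nn_rhs(X):
    """Hypothetical stand-in for the trained NN( . ; theta_MAP)."""
    return -0.05 * X

# Step 1: draw the last inferred CG state from q_phi
# (here: a stand-in Gaussian around a nominal state)
X = 1.0 + 0.1 * rng.standard_normal(d_c)

# Step 2: propagate with the stochastic CG evolution law (Eq. (4)) for P steps
for _ in range(P):
    X = X + nn_rhs(X) + sigma_r * rng.standard_normal(d_c)

# Step 3: push the final CG state through the coarse-to-fine map p_cf
# (a multinomial over bins, as in the particle example; sketched here)
probs = np.abs(X) / np.abs(X).sum()
x_fine = rng.multinomial(1000, probs)   # 1000 hypothetical particles
```

Repeating the three steps yields an ensemble of FG samples, from which predictive means and uncertainty bounds for any observable can be estimated.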
We demonstrate the capabilities of the proposed framework by applying it to a high-dimensional system of stochastically moving particles.
For the simulations presented in this section, we used $d_f$ particles, which, at each microscopic time step $\delta t$, performed random, non-interacting jumps of size $\delta s$, either to the left with probability $p_{left}$ or to the right with probability $p_{right}$, on the domain $[-1, 1]$ with periodic boundary conditions. It is well known [21] that in the limit $d_f \to \infty$ the particle density $\rho(s, t)$ can be described by an advection-diffusion PDE with diffusion constant $D = (p_{left} + p_{right})\, \delta s^2 / (2\, \delta t)$ and velocity $v = (p_{right} - p_{left})\, \delta s / \delta t$:

$$\frac{\partial \rho}{\partial t} + v \frac{\partial \rho}{\partial s} = D \frac{\partial^2 \rho}{\partial s^2}, \quad s \in (-1, 1). \quad (16)$$

The CG model relates to a discretization of the particle density into $d_c = 25$ equally-sized bins at each coarse time step. The nature of the CG variables
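The advection and diffusion coefficients of the limiting PDE can be checked against a direct Monte Carlo simulation of the random walk; the jump parameters below are illustrative choices, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative jump parameters (the paper's actual values differ)
p_left, p_right, ds, dt = 0.20, 0.22, 0.01, 1e-3
n_particles, n_steps = 20_000, 200

# Each particle jumps +ds w.p. p_right, -ds w.p. p_left, else stays put
u = rng.random((n_steps, n_particles))
jumps = np.where(u < p_right, ds, np.where(u < p_right + p_left, -ds, 0.0))
x = jumps.sum(axis=0)  # total displacement after n_steps

t = n_steps * dt
v_emp = x.mean() / t          # empirical drift velocity
D_emp = x.var() / (2.0 * t)   # empirical diffusion constant

v_th = (p_right - p_left) * ds / dt
D_th = (p_left + p_right) * ds ** 2 / (2.0 * dt)
```

For small drift, the empirical moments converge to the theoretical $v$ and $D$ of Equation (16) as the number of particles grows.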
$\boldsymbol{X}_t$ gives rise to a multinomial for the coarse-to-fine density $p_{cf}$ (Section 2.2), i.e.:

$$p_{cf}(\boldsymbol{x}_t \mid \boldsymbol{X}_t) = \frac{d_f!}{m_1(\boldsymbol{x}_t)!\, m_2(\boldsymbol{x}_t)! \cdots m_{d_c}(\boldsymbol{x}_t)!} \prod_{j=1}^{d_c} X_{t,j}^{\,m_j(\boldsymbol{x}_t)}, \quad (17)$$

where $m_j(\boldsymbol{x}_t)$ is the number of particles in bin $j$. We assume that, given the CG state $\boldsymbol{X}_t$, the coordinates of the particles $\boldsymbol{x}_t$ are conditionally independent. This does not imply that they move independently, nor that they cannot exhibit coherent behavior [22]. A consequence of Equation (17) is that, for this example, no parameters need to be learned for $p_{cf}$.

For the transition law (Section 2.3), we assume a coarse time step $\Delta t$ and model the right-hand side with a neural network $NN(\cdot)$ with ReLU activation functions. Each layer consisted of 25 neurons. We enforce conservation of mass using the following constraint at each time step $l$:

$$c_l(\boldsymbol{X}_l) = 1 - \sum_{j=1}^{d_c} X_{l,j} = 0, \quad l = 0, 1, \ldots \quad (18)$$

These are complemented by the virtual observables presented earlier, with the tolerance $\sigma_c$ of Equation (7).

For the family of variational distributions $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_{0:T})$, and since $X^{(i)}_{t,j} > 0\ \forall j, t$, we employed multivariate lognormals with diagonal covariance matrices, i.e. we assume the $X^{(i)}_{t,j}$ are a posteriori independent. The mean and covariance matrix of the underlying Gaussians for each sequence $i$ become part of the parameters $\boldsymbol{\phi}$ with respect to which the ELBO is maximized (see Section 2.5). We note that it would also be possible to use an amortized formulation and explicitly account for the dependence on the data values by employing a neural network for both mean and covariance, with the time sequence as an input.

Figure 1: Particle density: inferred and predicted posterior mean (bottom) in comparison with the ground truth (top). The red line divides inferred quantities from predicted ones.

We employed $n = 64$ time sequences with $T$ time-steps for training. The trained model is able to accurately track first-order statistics well into the future, for many more time steps than those contained in the training data. A more detailed view of the predictive estimates, with snapshots of the particle density at selected time instances, is presented in Figures 2 and 3, where the predictive posterior mean but also the associated uncertainty is displayed. Inferred as well as predicted particle densities match the ground truth accurately, and reasonable uncertainty bounds are computed.
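Sampling the multinomial coarse-to-fine map of Equation (17) can be sketched directly with NumPy; the sizes below and the uniform placement of particles within each bin are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d_c, d_f = 25, 2000   # illustrative sizes

# A normalized CG state: probability mass per bin (here: a random draw)
X = rng.dirichlet(np.ones(d_c))

# Coarse-to-fine: assign d_f particles to the bins via the multinomial
# of Eq. (17); m[j] is the number of particles landing in bin j
m = rng.multinomial(d_f, X)

# Fine-grained coordinates: place each particle uniformly within its bin
# of the domain [-1, 1) (an assumed convention for illustration)
edges = np.linspace(-1.0, 1.0, d_c + 1)
x_fine = np.concatenate([
    rng.uniform(edges[j], edges[j + 1], size=m[j]) for j in range(d_c)
])
```

Because the bin counts `m` are a sufficient statistic of the FG configuration under Equation (17), no additional parameters need to be learned for this $p_{cf}$, consistent with the remark above.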
Figure 2: Inferred particle density profiles at three time instances (from left to right); each panel shows the posterior mean, the reference, and ±1 standard deviation bounds.
Figure 3: Predicted particle density profiles at four time instances (from left to right and top to bottom); each panel shows the posterior mean, the reference, and ±1 standard deviation bounds.

Finally, in Figure 4, the mass constraint is depicted for inferred as well as predicted particle densities, and good agreement with the target value (= 1) is observed. This result is particularly important, as it demonstrates that the virtual observables were able to find CG state variables that agree with an a priori given physical constraint, and that, additionally, a transition law has been learned that is able to automatically satisfy the constraint in the future.

We combined a probabilistic generative model with physical constraints and deep neural networks in order to obtain a framework for the automated discovery of coarse-grained variables
Timesteps in t M a ss Inferred/Predicted Mean+/- 1 Standard Deviation
Figure 4: Mass based on inferred and predicted particle densities.and dynamics based on fine-grained simulation data. The FG simulation data are augmented ina fully Bayesian fashion by virtual observables that enable the incorporation of physical con-straints at the CG level. These could be for instance conservation laws that are available whenCG variables have physical meaning. Deviations from such conservation laws would invalidatepredictions. As a result of augmenting the training data with domain knowledge, the model pro-posed can learn from Small Data (i.e. shorter and fewer FG time-sequences) which is a crucialadvantage in multiscale settings where the simulation of the FG dynamics is computationallyvery expensive.Our approach learns simultaneously a coarse-to-fine mapping and a transition law for thecoarse-grained dynamics by employing probabilistic inference tools for the latent variables andmodel parameters. Deep neural networks can be used in both of these components in order toendow great expressiveness and flexibility.The model proposed was successfully tested on a coarse-graining task which involved stochas-tic particle dynamics. In the example presented, the method was able to accurately predict parti-cle densities at time steps not contained in the training data. Moreover, as it is able to reconstructthe entire FG state vector at any future time instant, it is capable of producing predictions ofany FG observable of interest as well as quantify the associated predictive uncertainty.A shortcoming of presented framework is that the CG dynamics are not fully interpretableand long-term stability is not guaranteed. These limitations have been addressed in [23] wherean additional layer of latent variables was employed that ensured the discovery of stable CG dy-namics but also promoted the identification of slow-varying processes that are most predictiveof the system’s long-term evolution.
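The virtual-observable mechanism for enforcing a conservation law can be sketched as follows. This is an illustration only, not the paper's implementation: the function names, the noise level sigma_v, and the test densities are our assumptions. The idea is that a constraint residual, here total mass minus one, is treated as a noisy observation of zero, contributing an extra Gaussian term to the log-likelihood during training:

```python
import numpy as np

def mass_residual(density, ds):
    """Constraint residual c = (integral of the density) - 1,
    evaluated with the trapezoidal rule on a uniform grid of spacing ds."""
    integral = ds * (0.5 * density[0] + density[1:-1].sum() + 0.5 * density[-1])
    return integral - 1.0

def virtual_loglik(density, ds, sigma_v=1e-3):
    """Gaussian log-likelihood of the 'virtual observation' c = 0.
    A small sigma_v enforces the mass constraint tightly; this term is
    simply added to the data log-likelihood of the state-space model."""
    c = mass_residual(density, ds)
    return -0.5 * (c / sigma_v) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma_v**2)

# A density that conserves mass scores far higher than one that violates it:
s = np.linspace(0.0, 1.0, 65)
ds = s[1] - s[0]
conserving = np.ones_like(s)         # integrates to 1
violating = 1.1 * np.ones_like(s)    # integrates to 1.1
```

Because the virtual observable enters the joint density like any other observation, standard inference tools (e.g. stochastic variational inference [17]) apply unchanged; the strength of the constraint is controlled through sigma_v.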
REFERENCES

[1] D. Givon, R. Kupferman and A. Stuart, Extracting macroscopic dynamics: model problems and algorithms, Nonlinearity, 2004.
[2] Z. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature, 2015.
[3] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature, 2015.
[4] P.-S. Koutsourelakis, N. Zabaras and M. Girolami, Big data and predictive computational modeling, Journal of Computational Physics, 2016.
[5] M. Alber, A. Tepole, W. Cannon, S. De, S. Dura-Bernal, K. Garikipati, G. Karniadakis, W. Lytton, P. Perdikaris, L. Petzold and E. Kuhl, Integrating machine learning and multiscale modeling - perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences, NPJ Digital Medicine, 2019.
[6] S. Kaltenbach and P.-S. Koutsourelakis, Incorporating physical constraints in a deep probabilistic machine learning framework for coarse-graining dynamical systems, Journal of Computational Physics, 2020.
[7] P. Stinis, T. Hagge, A. M. Tartakovsky and E. Yeung, Enforcing constraints for interpolation and extrapolation in generative adversarial networks, Journal of Computational Physics, 2019.
[8] S. L. Brunton, J. L. Proctor and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932-3937.
[9] R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems, 2018.
[10] X. Li, T.-K. Wong, R. T. Q. Chen and D. Duvenaud, Scalable gradients for stochastic differential equations, arXiv preprint 2001.01328, 2020.
[11] M. Raissi and G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, Journal of Computational Physics 357 (2018) 125-142.
[12] P.-S. Koutsourelakis and I. Bilionis, Scalable Bayesian reduced-order models for simulating high-dimensional multiscale dynamical systems, Multiscale Modeling and Simulation, 2011.
[13] H. Mori, Transport, collective motion, and Brownian motion, Progress of Theoretical Physics, 33(3):423-455, 1965.
[14] R. Zwanzig, Nonlinear generalized Langevin equations, Journal of Statistical Physics, 9(3):215-220, 1973.
[15] M. Schöberl, N. Zabaras and P.-S. Koutsourelakis, Predictive coarse-graining, Journal of Computational Physics 333 (2017) 49-77.
[16] M. Rixner and P.-S. Koutsourelakis, A probabilistic generative model for semi-supervised training of coarse-grained surrogates and enforcing physical constraints through virtual observables, arXiv preprint 2006.01789, 2020.
[17] M. Hoffman, D. Blei, C. Wang and J. Paisley, Stochastic variational inference, The Journal of Machine Learning Research, 14(1):1303-1347, 2013.
[18] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[19] D. Kingma and M. Welling, Auto-Encoding Variational Bayes, International Conference on Learning Representations, 2014.