Physics-aware, deep probabilistic modeling of multiscale dynamics in the Small Data regime
Sebastian Kaltenbach, Phaedon-Stelios Koutsourelakis
Professorship of Continuum Mechanics, Technical University of Munich, Boltzmannstr. 15, 85748 Garching
{sebastian.kaltenbach; p.s.koutsourelakis}

Key words:
Bayesian machine learning, virtual observables, multiscale modeling, coarse-graining
Abstract:
The data-based discovery of effective, coarse-grained (CG) models of high-dimensional dynamical systems presents a unique challenge in computational physics, and particularly in the context of multiscale problems. The present paper offers a probabilistic perspective that simultaneously identifies predictive, lower-dimensional coarse-grained (CG) variables as well as their dynamics. We make use of the expressive ability of deep neural networks in order to represent the right-hand side of the CG evolution law. Furthermore, we demonstrate how domain knowledge that is very often available in the form of physical constraints (e.g. conservation laws) can be incorporated with the novel concept of virtual observables. Such constraints, apart from leading to physically realistic predictions, can significantly reduce the requisite amount of training data, which in turn reduces the number of required, computationally expensive multiscale simulations (Small Data regime). The proposed state-space model is trained using probabilistic inference tools and, in contrast to several other techniques, requires neither the prescription of a fine-to-coarse (restriction) projection nor time-derivatives of the state variables. The formulation adopted is capable of quantifying the predictive uncertainty as well as of reconstructing the evolution of the full, fine-scale system, which allows the quantities of interest to be selected a posteriori. We demonstrate the efficacy of the proposed framework in a high-dimensional system of moving particles.
The solution of high-dimensional, multiscale systems is challenging, as the required computational resources usually grow exponentially with the dimension of the state space as well as with the smallest time scale that needs to be resolved. As such systems are ubiquitous in applied physics and engineering, reduced/coarse-grained descriptions and models are necessary that are predictive of various observables of the high-dimensional system, but whose discretization time scales can be much larger than the inherent ones [1].

We adopt a data-based perspective [2, 3] that relies on data generated by simulations of a fine-grained (FG) system in order to learn a coarse-grained (CG) model. We nevertheless note that such coarse-graining tasks exhibit fundamental differences from large-scale machine learning tasks [4, 5], as the data involved is usually small due to the expensive data acquisition, and as information about the underlying physical structure of the problem is available. When this domain knowledge is incorporated into the CG model, it can improve its predictive ability [6, 7].

In contrast to other frameworks for reduced-order modeling (e.g. SINDy [8]), where the dynamics of the CG model is learned on the basis of a large vocabulary of feature functions, we employ a deep neural network for the CG dynamics in order to gain great flexibility and avoid restricting ourselves to an a priori chosen set of feature functions. This approach is similar to the ideas of Neural ODEs [9] and Neural SDEs [10], which also use neural networks to represent the dynamics.
Another possibility would be the use of Gaussian Processes [11], which would allow non-parametric, probabilistic modeling.

In this paper, we combine a generative, probabilistic machine learning framework [12] with virtual observables [6] and deep neural networks for the CG dynamics as well as for the mapping from the CG states to the FG states. In doing so, we propose a framework that can make use of the flexibility of neural nets while still obeying physical laws. We carry out the tasks of model estimation and dimensionality reduction simultaneously, and we identify the CG state variables, their dynamics, as well as a probabilistic coarse-to-fine map, based only on small amounts of FG simulation data.

In general, the subscript f or lower-case letters are used to denote variables associated with the (high-dimensional) fine-grained (FG) model, and the subscript c or upper-case letters are used for quantities of the (lower-dimensional) coarse-grained (CG) description. We also use a circumflex ˆ to denote observed/known variables.

The fine-grained system considered is a high-dimensional system with state variables $\boldsymbol{x}$ ($\boldsymbol{x} \in \mathcal{X}_f \subset \mathbb{R}^{d_f}$), whose dimension $d_f$ is very large ($d_f \gg 1$), and whose dynamics are governed by

$$\dot{\boldsymbol{x}}_t = \boldsymbol{f}(\boldsymbol{x}_t, t), \quad t > 0 \quad (1)$$

with initial conditions $\boldsymbol{x}_0$ that might be deterministic or drawn from a specified distribution. In this work, we want to coarse-grain such a system based only on simulated data, i.e. time sequences simulated from Equation (1) with a time-step $\delta t$.

Our goal is to simultaneously identify (unknown) CG state variables $\boldsymbol{X}$ with
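As an illustration of how FG training sequences can be generated from Equation (1), the following sketch uses a simple explicit-Euler integrator; the right-hand side `f` is a hypothetical stand-in, not the paper's actual FG system:

```python
import numpy as np

def simulate_fg(f, x0, dt, n_steps):
    """Explicit-Euler integration of x_dot = f(x, t) with micro time-step dt."""
    traj = [np.asarray(x0, dtype=float)]
    t = 0.0
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * f(traj[-1], t))
        t += dt
    return np.stack(traj)  # shape: (n_steps + 1, d_f)

# Hypothetical linear FG dynamics as a stand-in for f(x, t)
f = lambda x, t: -x
traj = simulate_fg(f, x0=np.ones(3), dt=0.01, n_steps=100)
```

Each row of `traj` is one snapshot of the FG state; a collection of such sequences (for different initial conditions) would form the raw training data.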
$\boldsymbol{X} \in \mathcal{X}_c \subset \mathbb{R}^{d_c}$, as well as the dynamics of those CG variables. The dimension $d_c$ of these CG state variables is intended to be much smaller than $d_f$. For the CG dynamics, a Markovian model is assumed in the form:

$$\dot{\boldsymbol{X}}_t = \boldsymbol{F}(\boldsymbol{X}_t, t), \quad t > 0 \quad (2)$$

In contrast to approaches based on the Mori-Zwanzig formalism [13, 14], which include a mapping from the FG system to the quantities of interest, we employ a probabilistic, generative coarse-to-fine map [15] from the CG state variables to the FG description. We denote the associated (conditional) density by:

$$p_{cf}(\boldsymbol{x}_t \mid \boldsymbol{X}_t; \boldsymbol{\theta}_{cf}) \quad (3)$$

where $\boldsymbol{\theta}_{cf}$ denotes the (unknown) parameters that we will try to learn from the data. This conditional density $p_{cf}$ can be endowed a priori with domain knowledge by adapting its form to the particulars of the problem, or it can be parametrized by deep neural networks to allow for maximum flexibility. Employing a probabilistic coarse-to-fine map instead of a deterministic restriction operator has many advantages, e.g. the reconstruction of the full FG system and probabilistic predictive estimates.

In the following, we consider discretized time with a fixed time-step $\Delta t$, and time-related subscripts refer to the number of time-steps. We model the CG dynamics with the help of a deep neural network in order to gain great flexibility and be able to express nonlinear functions. Therefore, we assume an explicit discretization of Equation (2) and model the right-hand side by the deep neural network $NN(\cdot)$
parametrized by $\boldsymbol{\theta}_{NN}$:

$$\boldsymbol{X}_{t+1} = \boldsymbol{X}_t + NN(\boldsymbol{X}_t; \boldsymbol{\theta}_{NN}) + \sigma_r \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) \quad (4)$$

where the parameter $\sigma_r \geq 0$ controls the stochasticity of the transition. This implies the transition density:

$$p(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_t, \boldsymbol{\theta}_{NN}, \sigma_r) = \mathcal{N}(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_t + NN(\boldsymbol{X}_t, \boldsymbol{\theta}_{NN}), \sigma_r^2 \boldsymbol{I}) \quad (5)$$

which effectively represents a discretized version of the neural stochastic ODEs of [10] and is more flexible than approaches in which the right-hand side consists of a restricted set of first- and second-order interactions of $\boldsymbol{X}_t$ [6].
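A minimal NumPy sketch of one stochastic transition of the form in Equation (4); the two-layer ReLU network and its weights are hypothetical stand-ins for the trained $NN(\cdot;\boldsymbol{\theta}_{NN})$:

```python
import numpy as np

rng = np.random.default_rng(0)
d_c = 4  # illustrative CG dimension

# Hypothetical two-layer MLP standing in for NN( . ; theta_NN)
W1, b1 = 0.1 * rng.normal(size=(8, d_c)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(d_c, 8)), np.zeros(d_c)

def nn_rhs(X):
    """Right-hand side of the CG evolution law, NN(X_t; theta_NN)."""
    h = np.maximum(W1 @ X + b1, 0.0)  # ReLU hidden layer
    return W2 @ h + b2

def cg_step(X, sigma_r):
    """One stochastic transition: X_{t+1} = X_t + NN(X_t) + sigma_r * eps."""
    eps = rng.standard_normal(d_c)
    return X + nn_rhs(X) + sigma_r * eps

X0 = np.ones(d_c)
X1 = cg_step(X0, sigma_r=0.01)
```

The residual ("X + NN(X)") form matches the explicit discretization above; setting `sigma_r=0` recovers a deterministic update.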
As the CG state variables $\boldsymbol{X}$ employed in multiscale modeling are usually given physical meaning, we employ the concept of virtual observables [6] in order to incorporate general physical principles such as conservation of mass, momentum or energy. Let these be expressed as equalities of the form, at each time-step $l$:

$$\boldsymbol{c}_l(\boldsymbol{X}_l) = \boldsymbol{0}, \quad l = 0, 1, \ldots \quad (6)$$

where $\boldsymbol{c}_l : \mathcal{X}_c \subset \mathbb{R}^{d_c} \to \mathbb{R}^{M_c}$. The only requirement we will impose is that of differentiability of $\boldsymbol{c}_l$ [6]. We define a new variable $\hat{\boldsymbol{c}}_l$ which relates to $\boldsymbol{c}_l$ as follows:

$$\hat{\boldsymbol{c}}_l = \boldsymbol{c}_l(\boldsymbol{X}_l) + \sigma_c \boldsymbol{\varepsilon}_c, \quad \boldsymbol{\varepsilon}_c \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}) \quad (7)$$

Now, it is assumed that the $\hat{\boldsymbol{c}}_l$ have been virtually observed, and that this set of virtual observations is $\hat{\boldsymbol{c}}_l = \boldsymbol{0}$, which implies the likelihood:

$$p(\hat{\boldsymbol{c}}_l = \boldsymbol{0} \mid \boldsymbol{X}_l, \sigma_c) = \mathcal{N}(\boldsymbol{0} \mid \boldsymbol{c}_l(\boldsymbol{X}_l), \sigma_c^2 \boldsymbol{I}) \quad (8)$$

The "noise" parameter $\sigma_c$ can be used to account for the intensity of the enforcement of the virtual observations and represents the tolerance parameter with which the constraints would be enforced in a deterministic setting. We note that the concept of virtual observables is not restricted to physical constraints but could also be applied to residuals of temporal discretization schemes [6] or of PDEs [16]. In both of these cases, it is shown that the incorporation of virtual observables can reduce the amount of training data required and enable training in the Small Data regime.

Due to the introduction of virtual observables, we can adopt an enlarged definition of data, which we cumulatively denote by $\mathcal{D} = \{\hat{\boldsymbol{x}}^{(n)}_{0:T}, \hat{\boldsymbol{c}}^{(n)}_{0:T}\}$ and which encompasses:

• FG simulation data consisting of $n$ sequences of the FG state variables. These are denoted by $\hat{\boldsymbol{x}}^{(n)}_{0:T}$, as the likelihood model implied by the $p_{cf}$ in Equation (3) involves only the observables at each coarse time-step.

• Virtual observables $\hat{\boldsymbol{c}}^{(n)}_l$ relating to the CG states $\boldsymbol{X}_l$ at each time-step $l$, which relate to the physical constraints as in Equation (7). In the example they pertain to all time-steps from 0 to $T$ and are denoted by $\hat{\boldsymbol{c}}^{(n)}_{0:T}$.

We represent the latent (unobserved) variables of the model by the CG state variables $\boldsymbol{X}^{(n)}_{0:T}$ and relate them to the FG data through $p_{cf}$ (Equation (3)) and to the virtual observables through Equation (8). The parameters of the model are denoted cumulatively by $\boldsymbol{\theta}$ and consist of:

• $\boldsymbol{\theta}_{NN}$, which parametrize the neural network for the right-hand side of the CG evolution law (see Section 2.3),

• $\boldsymbol{\theta}_{cf}$, which parametrize the probabilistic coarse-to-fine map (Equation (3)),

• $\sigma_r$, involved in the stochasticity of the transition law (Equation (4)), and

• $\sigma_c$, involved in the enforcement of the virtual observables (Equation (7)).

If any of these parameters are prescribed, then they are omitted from $\boldsymbol{\theta}$.
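The virtual-observable likelihood of Equation (8) is just a Gaussian density evaluated at the constraint residual. A sketch, using a mass-conservation constraint of the kind employed later as an illustrative choice of $\boldsymbol{c}_l$:

```python
import numpy as np

def constraint(X):
    """Example constraint c_l(X_l): total mass should equal 1 (illustrative)."""
    return 1.0 - X.sum()

def log_virtual_obs(X, sigma_c):
    """log N(0 | c_l(X_l), sigma_c^2 I), i.e. the virtual-observable log-likelihood."""
    c = np.atleast_1d(constraint(X))
    return (-0.5 * np.sum(c ** 2) / sigma_c ** 2
            - c.size * np.log(sigma_c)
            - 0.5 * c.size * np.log(2.0 * np.pi))

X_ok = np.full(5, 0.20)    # satisfies the constraint exactly
X_bad = np.full(5, 0.25)   # violates it
lp_ok = log_virtual_obs(X_ok, sigma_c=1e-2)
lp_bad = log_virtual_obs(X_bad, sigma_c=1e-2)
```

States violating the constraint receive a heavily penalized log-likelihood, with the penalty scale set by the tolerance `sigma_c`.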
We follow a fully Bayesian formulation and express the posterior of the unknowns (i.e. latent variables and parameters) as follows:

$$p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{p(\mathcal{D})} \quad (9)$$

where $p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ denotes the prior on the latent variables and parameters.
The likelihood term $p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ can be decomposed into the product of two (conditionally) independent terms, one for the FG data and one for the virtual observables, i.e.:

$$p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = p(\hat{\boldsymbol{x}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\hat{\boldsymbol{c}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) \quad (10)$$

We further note that (from Equation (3)):

$$p(\hat{\boldsymbol{x}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{t=0}^{T} p_{cf}(\boldsymbol{x}^{(i)}_t \mid \boldsymbol{X}^{(i)}_t, \boldsymbol{\theta}_{cf}) \quad (11)$$

and (from Equation (8)):

$$p(\hat{\boldsymbol{c}}^{(n)}_{0:T} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{l=0}^{T} \mathcal{N}\!\left(\boldsymbol{0} \,\middle|\, \boldsymbol{c}_l(\boldsymbol{X}^{(i)}_l), \sigma_c^2 \boldsymbol{I}\right) \propto \prod_{i=1}^{n} \prod_{l=0}^{T} \sigma_c^{-\dim(\boldsymbol{c})} \exp\left\{ -\frac{1}{2\sigma_c^2} \left| \boldsymbol{c}_l(\boldsymbol{X}^{(i)}_l) \right|^2 \right\} \quad (12)$$

The prior $p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ can be decomposed into the transition density of Equation (5), a prior for $\boldsymbol{X}_0$, and a prior for the parameters $\boldsymbol{\theta}$:

$$p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p(\boldsymbol{X}^{(i)}_0) \prod_{t=0}^{T-1} p(\boldsymbol{X}^{(i)}_{t+1} \mid \boldsymbol{X}^{(i)}_t, \boldsymbol{\theta}_{NN}, \sigma_r)\; p(\boldsymbol{\theta}) \quad (13)$$

We advocate the use of Stochastic Variational Inference [17] for computing an approximate posterior. We select a parameterized family of densities $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})$ and attempt to find the member that best approximates the posterior by minimizing their Kullback-Leibler divergence. It can be shown [18] that this optimal $q_{\boldsymbol{\phi}}$ maximizes the Evidence Lower Bound (ELBO) $\mathcal{F}(q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}))$:

$$\log p(\mathcal{D}) = \log \int p(\mathcal{D}, \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} = \log \int \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}\, q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} \geq \int \log \frac{p(\mathcal{D} \mid \boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, p(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}{q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})}\, q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})\, d\boldsymbol{X}^{(n)}_{0:T}\, d\boldsymbol{\theta} = \mathcal{F}(q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta})) \quad (14)$$
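The ELBO of Equation (14) is typically estimated by Monte Carlo with samples from $q_{\boldsymbol{\phi}}$. A minimal single-objective sketch, with toy stand-in log-densities in place of the actual likelihood and prior:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(X):     # stand-in for log p(D | X, theta)
    return -0.5 * np.sum((X - 1.0) ** 2)

def log_prior(X):   # stand-in for log p(X, theta)
    return -0.5 * np.sum(X ** 2)

def log_q(X, mu, log_sig):
    """Log-density of a diagonal-Gaussian q_phi."""
    sig = np.exp(log_sig)
    return np.sum(-0.5 * ((X - mu) / sig) ** 2 - log_sig
                  - 0.5 * np.log(2.0 * np.pi))

def elbo_estimate(mu, log_sig, n_samples=256):
    """Monte Carlo estimate of E_q[log p(D|X) + log p(X) - log q(X)]."""
    sig = np.exp(log_sig)
    vals = []
    for _ in range(n_samples):
        X = mu + sig * rng.standard_normal(mu.shape)  # reparametrized sample
        vals.append(log_lik(X) + log_prior(X) - log_q(X, mu, log_sig))
    return float(np.mean(vals))

F = elbo_estimate(mu=np.zeros(2), log_sig=np.zeros(2))
```

By Equation (14), such an estimate lower-bounds the log-evidence of the toy model (up to Monte Carlo noise); maximizing it over `mu` and `log_sig` is the essence of the variational approach.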
In the following illustrations, we postulate a mean-field decomposition:

$$q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T}, \boldsymbol{\theta}) = q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(n)}_{0:T})\, p_{\boldsymbol{\phi}}(\boldsymbol{\theta}) = \left[\prod_{i=1}^{n} q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_{0:T})\right] \delta_{\boldsymbol{\phi}}(\boldsymbol{\theta}) \quad (15)$$

where we make use of the (conditional) independence of the time sequences in the likelihood. We further note that we employ Dirac $\delta$ functions for $q_{\boldsymbol{\phi}}(\boldsymbol{\theta})$ and therefore obtain MAP estimates $\boldsymbol{\theta}_{MAP}$ (i.e. $\boldsymbol{\phi}$ includes $\boldsymbol{\theta}_{MAP}$) for the unknown parameters.

Gradients of the ELBO with respect to the parameters $\boldsymbol{\phi}$ involve expectations with respect to $q_{\boldsymbol{\phi}}$. These are approximated with Monte Carlo estimates which employ the reparametrization trick [19], and stochastic optimization is carried out with the ADAM algorithm [20].

The proposed framework can produce probabilistic predictive estimates for a sequence which was observed up to time-step $T$, i.e. $\hat{\boldsymbol{x}}^{(i)}_{0:T}$. This predictive uncertainty reflects not only the information loss due to the coarse-graining process but also the epistemic uncertainty arising from finite (and small) datasets. In particular, if $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$ is the (marginal) posterior of the last, hidden CG state and $\boldsymbol{\theta}_{MAP}$ the MAP estimate of the model parameters, then we follow the steps described in Algorithm 1. This procedure generates samples of the full FG state evolution but does not necessarily guarantee the enforcement of the constraints for the CG states. We note that if we would also like to enforce the constraints $\boldsymbol{c}_l$ for future predictions, then these would need to be included in the posterior density defined in Equation (9). Consequently, future (FG or CG) states would need to be inferred from this augmented posterior, and an enlarged inference process would be required for predictions.
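As noted above, ELBO gradients rely on the reparametrization trick: a sample from $q_{\boldsymbol{\phi}}$ is written as a deterministic transform of parameter-free noise. For a diagonal lognormal family (the choice used in the numerical illustration later in the paper), this can be sketched as follows; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_lognormal_q(mu, log_sig, rng):
    """Reparametrized draw from a diagonal lognormal: X = exp(mu + sig * eps).

    Because eps ~ N(0, I) is the only source of randomness, the sample is a
    deterministic (and differentiable) function of the variational
    parameters (mu, log_sig), which is what makes gradient estimates of the
    ELBO with respect to those parameters possible.
    """
    eps = rng.standard_normal(mu.shape)
    return np.exp(mu + np.exp(log_sig) * eps)

mu = np.zeros(25)            # illustrative dimension
log_sig = np.full(25, -2.0)
X = sample_lognormal_q(mu, log_sig, rng)
```

The exponential guarantees strictly positive samples, which matters when the CG variables represent non-negative quantities such as densities.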
Algorithm 1: Prediction
Data: $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$, $\boldsymbol{\theta}_{MAP}$
Result: Sample of $\boldsymbol{x}^{(i)}_{T+P}$
Sample $\boldsymbol{X}^{(i)}_T$ from $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_T)$;
while time-step $T+P$ not reached do
    Sample from the CG evolution law in Equation (4);
end
Sample from $p_{cf}(\boldsymbol{x}_{T+P} \mid \boldsymbol{X}_{T+P}, \boldsymbol{\theta}_{MAP})$
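The prediction procedure of Algorithm 1 can be sketched as follows; the CG right-hand side, the posterior over the last CG state, and the binned coarse-to-fine map are all hypothetical stand-ins for the trained quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
d_c, P, sigma_r = 4, 10, 0.01

def nn_rhs(X):
    """Hypothetical stand-in for the trained NN( . ; theta_MAP)."""
    return -0.05 * X

# Step 1: draw the last inferred CG state from q_phi
# (here: a stand-in Gaussian around a nominal state)
X = 1.0 + 0.1 * rng.standard_normal(d_c)

# Step 2: propagate with the stochastic CG evolution law (Eq. (4)) for P steps
for _ in range(P):
    X = X + nn_rhs(X) + sigma_r * rng.standard_normal(d_c)

# Step 3: push the final CG state through the coarse-to-fine map p_cf
# (a multinomial over bins, as in the particle example; sketched here)
probs = np.abs(X) / np.abs(X).sum()
x_fine = rng.multinomial(1000, probs)   # 1000 hypothetical particles
```

Repeating the three steps yields an ensemble of FG samples, from which predictive means and uncertainty bounds for any observable can be estimated.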
We demonstrate the capabilities of the proposed framework by applying it to a high-dimensional system of stochastically moving particles.
For the simulations presented in this section, we used $d_f$ particles, which, at each microscopic time step $\delta t$, performed random, non-interacting jumps of size $\delta s$, either to the left with probability $p_{left}$ or to the right with probability $p_{right}$, on the domain $[-1, 1]$ with periodic boundary conditions. It is well known [21] that in the limit $d_f \to \infty$ the particle density $\rho(s, t)$ can be described by an advection-diffusion PDE with diffusion constant $D = (p_{left} + p_{right})\, \delta s^2 / (2\, \delta t)$ and velocity $v = (p_{right} - p_{left})\, \delta s / \delta t$:

$$\frac{\partial \rho}{\partial t} + v \frac{\partial \rho}{\partial s} = D \frac{\partial^2 \rho}{\partial s^2}, \quad s \in (-1, 1). \quad (16)$$

The CG model relates to a discretization of the particle density into $d_c = 25$ equally-sized bins at each coarse time step. The nature of the CG variables
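The advection and diffusion coefficients of the limiting PDE can be checked against a direct Monte Carlo simulation of the random walk; the jump parameters below are illustrative choices, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative jump parameters (the paper's actual values differ)
p_left, p_right, ds, dt = 0.20, 0.22, 0.01, 1e-3
n_particles, n_steps = 20_000, 200

# Each particle jumps +ds w.p. p_right, -ds w.p. p_left, else stays put
u = rng.random((n_steps, n_particles))
jumps = np.where(u < p_right, ds, np.where(u < p_right + p_left, -ds, 0.0))
x = jumps.sum(axis=0)  # total displacement after n_steps

t = n_steps * dt
v_emp = x.mean() / t          # empirical drift velocity
D_emp = x.var() / (2.0 * t)   # empirical diffusion constant

v_th = (p_right - p_left) * ds / dt
D_th = (p_left + p_right) * ds ** 2 / (2.0 * dt)
```

For small drift, the empirical moments converge to the theoretical $v$ and $D$ of Equation (16) as the number of particles grows.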
$\boldsymbol{X}_t$ gives rise to a multinomial for the coarse-to-fine density $p_{cf}$ (Section 2.2), i.e.:

$$p_{cf}(\boldsymbol{x}_t \mid \boldsymbol{X}_t) = \frac{d_f!}{m_1(\boldsymbol{x}_t)!\, m_2(\boldsymbol{x}_t)! \cdots m_{d_c}(\boldsymbol{x}_t)!} \prod_{j=1}^{d_c} X_{t,j}^{\,m_j(\boldsymbol{x}_t)}, \quad (17)$$

where $m_j(\boldsymbol{x}_t)$ is the number of particles in bin $j$. We assume that, given the CG state $\boldsymbol{X}_t$, the coordinates of the particles $\boldsymbol{x}_t$ are conditionally independent. This does not imply that they move independently, nor that they cannot exhibit coherent behavior [22]. A consequence of Equation (17) is that, for this example, no parameters need to be learned for $p_{cf}$.

For the transition law (Section 2.3), we assume a coarse time step $\Delta t$ and model the right-hand side with a neural network $NN(\cdot)$ with ReLU activation functions. Each layer consisted of 25 neurons. We enforce conservation of mass using the following constraint at each time step $l$:

$$c_l(\boldsymbol{X}_l) = 1 - \sum_{j=1}^{d_c} X_{l,j} = 0, \quad l = 0, 1, \ldots \quad (18)$$

These are complemented by the virtual observables presented earlier, with the tolerance $\sigma_c$ of Equation (7).

For the family of variational distributions $q_{\boldsymbol{\phi}}(\boldsymbol{X}^{(i)}_{0:T})$, and since $X^{(i)}_{t,j} > 0\ \forall j, t$, we employed multivariate lognormals with diagonal covariance matrices, i.e. we assume the $X^{(i)}_{t,j}$ are a posteriori independent. The mean and covariance matrix of the underlying Gaussians for each sequence $i$ become part of the parameters $\boldsymbol{\phi}$ with respect to which the ELBO is maximized (see Section 2.5). We note that it would also be possible to use an amortized formulation and explicitly account for the dependence on the data values by employing a neural network for both mean and covariance, with the time sequence as an input.

Figure 1: Particle density: inferred and predicted posterior mean (bottom) in comparison with the ground truth (top). The red line divides inferred quantities from predicted ones.

We employed $n = 64$ time sequences with $T$ time-steps for training. The trained model is able to accurately track first-order statistics well into the future, for many more time steps than those contained in the training data. A more detailed view of the predictive estimates, with snapshots of the particle density at selected time instances, is presented in Figures 2 and 3, where the predictive posterior mean but also the associated uncertainty is displayed. Inferred as well as predicted particle densities match the ground truth accurately, and reasonable uncertainty bounds are computed.
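Sampling the multinomial coarse-to-fine map of Equation (17) can be sketched directly with NumPy; the sizes below and the uniform placement of particles within each bin are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d_c, d_f = 25, 2000   # illustrative sizes

# A normalized CG state: probability mass per bin (here: a random draw)
X = rng.dirichlet(np.ones(d_c))

# Coarse-to-fine: assign d_f particles to the bins via the multinomial
# of Eq. (17); m[j] is the number of particles landing in bin j
m = rng.multinomial(d_f, X)

# Fine-grained coordinates: place each particle uniformly within its bin
# of the domain [-1, 1) (an assumed convention for illustration)
edges = np.linspace(-1.0, 1.0, d_c + 1)
x_fine = np.concatenate([
    rng.uniform(edges[j], edges[j + 1], size=m[j]) for j in range(d_c)
])
```

Because the bin counts `m` are a sufficient statistic of the FG configuration under Equation (17), no additional parameters need to be learned for this $p_{cf}$, consistent with the remark above.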
Figure 2: Inferred particle density profiles at three time instances (from left to right); each panel shows the posterior mean, the reference, and ±1 standard deviation bounds.
Figure 3: Predicted particle density profiles at four time instances (from left to right and top to bottom); each panel shows the posterior mean, the reference, and ±1 standard deviation bounds.

Finally, in Figure 4, the mass constraint is depicted for inferred as well as predicted particle densities, and good agreement with the target value (= 1) is observed. This result is particularly important, as it demonstrates that the virtual observables were able to find CG state variables that agree with an a priori given physical constraint, and that, additionally, a transition law has been learned that is able to automatically satisfy the constraint in the future.

We combined a probabilistic generative model with physical constraints and deep neural networks in order to obtain a framework for the automated discovery of coarse-grained variables
Timesteps in t M a ss Inferred/Predicted Mean+/- 1 Standard Deviation
Figure 4: Mass based on inferred and predicted particle densities.and dynamics based on fine-grained simulation data. The FG simulation data are augmented ina fully Bayesian fashion by virtual observables that enable the incorporation of physical con-straints at the CG level. These could be for instance conservation laws that are available whenCG variables have physical meaning. Deviations from such conservation laws would invalidatepredictions. As a result of augmenting the training data with domain knowledge, the model pro-posed can learn from Small Data (i.e. shorter and fewer FG time-sequences) which is a crucialadvantage in multiscale settings where the simulation of the FG dynamics is computationallyvery expensive.Our approach learns simultaneously a coarse-to-fine mapping and a transition law for thecoarse-grained dynamics by employing probabilistic inference tools for the latent variables andmodel parameters. Deep neural networks can be used in both of these components in order toendow great expressiveness and flexibility.The model proposed was successfully tested on a coarse-graining task which involved stochas-tic particle dynamics. In the example presented, the method was able to accurately predict parti-cle densities at time steps not contained in the training data. Moreover, as it is able to reconstructthe entire FG state vector at any future time instant, it is capable of producing predictions ofany FG observable of interest as well as quantify the associated predictive uncertainty.A shortcoming of presented framework is that the CG dynamics are not fully interpretableand long-term stability is not guaranteed. These limitations have been addressed in [23] wherean additional layer of latent variables was employed that ensured the discovery of stable CG dy-namics but also promoted the identification of slow-varying processes that are most predictiveof the system’s long-term evolution.
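The virtual-observable mechanism for enforcing a conservation law can be sketched as follows. This is an illustration only, not the paper's implementation: the function names, the noise level sigma_v, and the test densities are our assumptions. The idea is that a constraint residual, here total mass minus one, is treated as a noisy observation of zero, contributing an extra Gaussian term to the log-likelihood during training:

```python
import numpy as np

def mass_residual(density, ds):
    """Constraint residual c = (integral of the density) - 1,
    evaluated with the trapezoidal rule on a uniform grid of spacing ds."""
    integral = ds * (0.5 * density[0] + density[1:-1].sum() + 0.5 * density[-1])
    return integral - 1.0

def virtual_loglik(density, ds, sigma_v=1e-3):
    """Gaussian log-likelihood of the 'virtual observation' c = 0.
    A small sigma_v enforces the mass constraint tightly; this term is
    simply added to the data log-likelihood of the state-space model."""
    c = mass_residual(density, ds)
    return -0.5 * (c / sigma_v) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma_v**2)

# A density that conserves mass scores far higher than one that violates it:
s = np.linspace(0.0, 1.0, 65)
ds = s[1] - s[0]
conserving = np.ones_like(s)         # integrates to 1
violating = 1.1 * np.ones_like(s)    # integrates to 1.1
```

Because the virtual observable enters the joint density like any other observation, standard inference tools (e.g. stochastic variational inference [17]) apply unchanged; the strength of the constraint is controlled through sigma_v.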
REFERENCES

[1] D. Givon, R. Kupferman and A. Stuart, Extracting macroscopic dynamics: model problems and algorithms, Nonlinearity, 2004.
[2] Z. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature, 2015.
[3] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature, 2015.
[4] P.-S. Koutsourelakis, N. Zabaras and M. Girolami, Big data and predictive computational modeling, Journal of Computational Physics, 2016.
[5] M. Alber, A. Tepole, W. Cannon, S. De, S. Dura-Bernal, K. Garikipati, G. Karniadakis, W. Lytton, P. Perdikaris, L. Petzold and E. Kuhl, Integrating machine learning and multiscale modeling - perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences, NPJ Digital Medicine, 2019.
[6] S. Kaltenbach and P.-S. Koutsourelakis, Incorporating physical constraints in a deep probabilistic machine learning framework for coarse-graining dynamical systems, Journal of Computational Physics, 2020.
[7] P. Stinis, T. Hagge, A. M. Tartakovsky and E. Yeung, Enforcing constraints for interpolation and extrapolation in generative adversarial networks, Journal of Computational Physics, 2019.
[8] S. L. Brunton, J. L. Proctor and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932-3937.
[9] R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems, 2018.
[10] X. Li, T.-K. Wong, R. T. Q. Chen and D. Duvenaud, Scalable gradients for stochastic differential equations, arXiv preprint 2001.01328, 2020.
[11] M. Raissi and G. E. Karniadakis, Hidden physics models: Machine learning of nonlinear partial differential equations, Journal of Computational Physics 357 (2018) 125-142.
[12] P.-S. Koutsourelakis and I. Bilionis, Scalable Bayesian reduced-order models for simulating high-dimensional multiscale dynamical systems, Multiscale Modeling and Simulation, 2011.
[13] H. Mori, Transport, collective motion, and Brownian motion, Progress of Theoretical Physics, 33(3):423-455, 1965.
[14] R. Zwanzig, Nonlinear generalized Langevin equations, Journal of Statistical Physics, 9(3):215-220, 1973.
[15] M. Schöberl, N. Zabaras and P.-S. Koutsourelakis, Predictive coarse-graining, Journal of Computational Physics 333 (2017) 49-77.
[16] M. Rixner and P.-S. Koutsourelakis, A probabilistic generative model for semi-supervised training of coarse-grained surrogates and enforcing physical constraints through virtual observables, arXiv preprint 2006.01789, 2020.
[17] M. Hoffman, D. Blei, C. Wang and J. Paisley, Stochastic variational inference, The Journal of Machine Learning Research, 14(1):1303-1347, 2013.
[18] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[19] D. Kingma and M. Welling, Auto-Encoding Variational Bayes, International Conference on Learning Representations, 2014.