Gaussian processes for data fulfilling linear differential equations
Christopher G. Albert
Max-Planck-Institut für Plasmaphysik, Boltzmannstr. 2, 85748 Garching
[email protected]
September 10, 2019
Abstract
A method to reconstruct fields, source strengths and physical parameters based on Gaussian process regression is presented for the case where data are known to fulfill a given linear differential equation with localized sources. The approach is applicable to a wide range of data from physical measurements and numerical simulations. It is based on the well-known invariance of Gaussian processes under linear operators, in particular differentiation. Instead of using a generic covariance function to represent data from an unknown field, the space of possible covariance functions is restricted to allow only Gaussian random fields that fulfill the homogeneous differential equation. The resulting tailored kernel functions lead to more reliable regression compared to a generic kernel and make some hyperparameters directly interpretable. For differential equations representing laws of physics such a choice limits realizations of random fields to physically possible solutions. Source terms are added by superposition and their strength estimated in a probabilistic fashion, together with possibly unknown hyperparameters with physical meaning in the differential operator.
Introduction

The larger context of the present work is the goal to construct reduced complexity models as emulators or surrogates that retain mathematical and physical properties of the underlying system. Similar to usual numerical models, such methods aim to represent infinite systems by exploiting finite information in some optimal sense. In the spirit of structure-preserving numerics the aim here is to move errors to the "right place", in order to retain laws such as conservation of mass, energy or momentum.

This article deals with Gaussian process (GP) regression on data with additional information known in the form of linear, generally partial differential equations (PDEs). An illustrative application is the reconstruction of an acoustic sound pressure field and its sources from discrete microphone measurements. GPs, a special class of random fields, are used in a probabilistic rather than a stochastic sense: to approximate a fixed but unknown field from possibly noisy local measurements. Uncertainties in this reconstruction are modeled by a normal distribution. In the limit of zero measured data a prior has to be chosen whose realizations take values of the expected order of magnitude. An appropriate choice of a covariance function or kernel guarantees that all fields drawn from the GP at any stage fulfill the underlying PDE. This may require giving up stationarity of the process.

Techniques to fit GPs to data from PDEs have been known for some time, especially in the field of geostatistics [1]. A general analysis including a number of important properties is given by [2]. In these earlier works GPs are usually referred to as Kriging, and stationary covariance functions / kernels as covariograms. A number of more recent works from various fields [3, 4, 5] use the linear operator of the problem to obtain a new kernel function for the source field by applying it twice to a generic, usually squared exponential, kernel.
In contrast to the present approach, that method is best suited for source fields that are non-vanishing across the whole domain. In terms of deterministic numerical methods one could say that it corresponds to meshless variants of the finite element method (FEM). The approach in the present work instead represents a probabilistic variant of a procedure related to the boundary element method (BEM), also known as the method of fundamental solutions (MFS) or regularized BEM [6, 7, 8]. As in the BEM, the MFS builds on fundamental solutions, but allows placing sources outside the boundary rather than localizing them on a layer. Thus the MFS avoids singularities in boundary integrals of the BEM while retaining a similar ratio of numerical effort to accuracy for smooth solutions. To the author's knowledge the probabilistic variant of the MFS via GPs was first introduced by [9] to solve the boundary value problem of the Laplace equation and dubbed Bayesian boundary elements estimation method ((BE)²M). That work also provides a detailed treatment of kernels for the 2D Laplace equation. A more extensive and general treatment of the Bayesian context as well as kernels and their connection to fundamental solutions is available in [10] under the term probabilistic meshless methods (PMM).

While [9] is focused on boundary data of a single homogeneous equation, and [10] provides a detailed mathematical foundation, the present work aims to explore the topic further for application and extends the recent work in [11]. Starting from general notions, some regression techniques are introduced with emphasis on the role of localized sources. For this purpose the Poisson, Helmholtz and heat equations are considered and several kernels are derived and tested. To fit a GP to a homogeneous (source-free) PDE, kernels are built via the according fundamental solutions. Possible singularities (sources) are moved outside the domain of interest. In particular, boundary conditions on a finite domain can be either supplied or reconstructed in this fashion. In addition, contributions by internal sources are superimposed, using again fundamental solutions in the free field. For that part, boundary conditions of the actual problem are irrelevant. The specific approach taken here is most efficient for source-free regions with possibly few localized sources that are represented by monopoles or dipoles.

Gaussian processes (GPs) are a useful tool to represent and update incomplete information on scalar fields u(x), i.e. a real number u depending on a (multi-dimensional) independent variable x. A GP with mean m(x) and covariance function or kernel k(x, x′) is denoted as

u(x) ∼ 𝒢𝒫(m(x), k(x, x′)).   (1)

The choice of an appropriate kernel k(x, x′) restricts realizations of (1) to respect regularity properties of u(x) such as continuity or characteristic length scales. Often regularity of u does not appear by chance, but rather reflects an underlying law. We are going to exploit such laws in the construction and application of Gaussian processes describing u for the case of linear (partial) differential equations

L̂u(x) = q(x).   (2)

Here L̂ is a linear differential operator and q(x) an inhomogeneous source term. In physical laws, dimensions of x usually consist of space and/or time. Physical scalar fields u include e.g. pressure p, temperature T or the electrostatic potential φ_e. Corresponding laws, valid under certain conditions, include Gauss' law of electrostatics for φ_e with Laplacian L̂ = εΔ, frequency-domain acoustics for p with Helmholtz operator L̂ = Δ + k₀², or thermodynamics for T with heat/diffusion operator L̂ = ∂/∂t − DΔ. These operators contain free parameters, namely permittivity ε, wavenumber k₀ and diffusivity D, respectively. While ε may be absorbed inside q in a uniform material model of electrostatics, estimation of the parameters k₀ or D is useful for material characterization.

For the representation of PDE solutions the weight-space view of Gaussian process regression is useful. There the kernel k is represented via a tuple φ(x) = (φ₁(x), φ₂(x), …) of basis functions φ_i(x) that underlie a linear regression model

u(x) = φ(x)ᵀw = Σ_i φ_i(x) w_i.   (3)

Bayesian inference starting from a Gaussian prior with covariance matrix Σ_p for the weights w yields a Mercer kernel

k(x, x′) ≡ φᵀ(x) Σ_p φ(x′) = Σ_{i,j} φ_i(x) Σ_p^{ij} φ_j(x′).   (4)

The existence of such a representation is guaranteed by Mercer's theorem in the context of reproducing kernel Hilbert spaces (RKHS) [8].
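As a minimal numerical sketch of the weight-space construction (3)-(4) (the basis, its size and the diagonal prior below are arbitrary illustrative choices, not yet tied to any PDE), a kernel assembled from a finite basis and a weight prior is symmetric and positive semidefinite by construction:

```python
import numpy as np

def phi(x):
    """Hypothetical finite basis (phi_1, phi_2, phi_3) = (1, x, x^2)."""
    return np.stack([np.ones_like(x), x, x**2], axis=-1)  # shape (N, 3)

Sigma_p = np.diag([1.0, 0.5, 0.25])  # prior covariance of the weights w

def k(x, xp):
    """Mercer kernel k(x, x') = phi(x)^T Sigma_p phi(x'), cf. eq. (4)."""
    return phi(x) @ Sigma_p @ phi(xp).T

x = np.linspace(-1.0, 1.0, 20)
K = k(x, x)

# Symmetry and positive semidefiniteness hold by construction, up to round-off.
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-10)  # True True
```

Realizations of the corresponding GP are obtained by sampling w ∼ 𝒩(0, Σ_p) and forming u(x) = φ(x)ᵀw; restricting the φ_i to solutions of the homogeneous PDE is what the tailored kernels below exploit.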
More generally one can also define kernels on an uncountably infinite number of basis functions in analogy to (3) via

f(x) = φ̂[w(ζ)] = ⟨φ(x, ζ), w(ζ)⟩ = ∫ φ(x, ζ) w(ζ) dζ,   (5)

where φ̂ is a linear operator acting on elements w(ζ) of an infinite-dimensional weight space parametrized by an auxiliary index variable ζ that may be multi-dimensional. We represent φ̂ via an inner product ⟨φ(x, ζ), w(ζ)⟩ in the respective function space, given by an integral over ζ. The infinite-dimensional analogue to the prior covariance matrix is a prior covariance operator Σ̂_p that defines the kernel as a bilinear form

k(x, x′) ≡ ⟨φ(x, ζ), Σ̂_p φ(x′, ζ′)⟩ ≡ ∫∫ φ(x, ζ) Σ_p(ζ, ζ′) φ(x′, ζ′) dζ dζ′.   (6)

(The more general case of complex-valued fields and vector fields is left open for future investigations in this context.) Kernels of the form (6) are known as convolution kernels. Such a kernel is at least positive semidefinite, and positive definiteness follows in the case of linearly independent basis functions φ(x, ζ) [8].

For the treatment of PDEs, possible choices of index variables in (4) or (6) include separation constants of analytical solutions, or the frequency variable of an integral transform. In accordance with [10], using basis functions that satisfy the underlying PDE, a probabilistic meshless method (PMM) is constructed. In particular, if ζ parameterizes positions of sources, and φ(x, ζ) = G(x, ζ) in (6) is chosen to be a fundamental solution / Green's function G(x, ζ) of the PDE, one may call the resulting scheme a probabilistic method of fundamental solutions (pMFS). In [10] sources are placed across the whole computational domain, and the resulting kernel is called natural.
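A discrete sketch of such a pMFS kernel, as a finite-sum stand-in for (6) with φ(x, ζ) = G(x, ζ) (the number of sources, their ring radius and the unit diagonal prior are illustrative assumptions): with all sources on a circle of radius 2, the kernel is harmonic in each argument everywhere inside the unit disk.

```python
import numpy as np

def G(x, z):
    """Fundamental solution of the 2D Laplace equation, G(x, z) = -ln|x - z|/(2 pi)."""
    return -np.log(np.linalg.norm(x - z, axis=-1)) / (2 * np.pi)

# Discrete sources zeta_i on a circle of radius 2, outside the unit-disk domain.
angles = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
zeta = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=-1)

def k(x, xp):
    """pMFS kernel with unit diagonal prior: k(x, x') = sum_i G(x, zeta_i) G(x', zeta_i)."""
    return G(x, zeta) @ G(xp, zeta)

# Five-point stencil: the Laplacian of k in its first argument vanishes in the domain.
x0, xp0, h = np.array([0.3, -0.2]), np.array([-0.1, 0.4]), 1e-3
lap = (k(x0 + [h, 0.0], xp0) + k(x0 - [h, 0.0], xp0)
       + k(x0 + [0.0, h], xp0) + k(x0 - [0.0, h], xp0) - 4.0 * k(x0, xp0)) / h**2
print(abs(lap) < 1e-5)  # numerically harmonic inside the domain
```

Because every basis function G(·, ζ_i) is harmonic away from its source, every GP realization built from this kernel satisfies the homogeneous equation inside the domain.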
Here we will instead place sources in the exterior to fulfill the homogeneous interior problem, as in the classical MFS [6, 7, 8]. Technically, this is also achieved by setting Σ_p(ζ, ζ′) = 0 for ζ or ζ′ in the interior. For discrete sources localized at ζ = ζ_i one obtains again discrete basis functions φ_i(x) = G(x, ζ_i) for (4).

More generally, according to theorem 2 of [2], for linear PDE operators L̂ in (2), mean and kernel satisfy

L̂m(x) = q(x),   (7)
L̂k(x, x′) = 0.   (8)

Here L̂ acts on the first argument of k(x, x′). Sources affect only the mean m(x) of the Gaussian process, whereas the kernel k(x, x′) should be based on the homogeneous equation. This hints at the technique of [12], discussed in chapter 2.7 of [13], to treat m(x) via a linear model added on top of a zero-mean process for the homogeneous equation. In that case we consider the superposition

u(x) = u_h(x) + u_p(x),   (9)
u_h(x) ∼ 𝒢𝒫(0, k(x, x′)),   (10)
u_p(x) = hᵀ(x) b,   (11)
b ∼ 𝒩(b₀, B),   (12)

where hᵀ(x) b is a linear model for m(x) with Gaussian prior mean b₀ and covariance B for the model coefficients. The homogeneous part (10) corresponds to a random process u_h(x) where a source-free k is constructed according to (8). The inhomogeneous part (11) may be given by any particular solution u_p(x) for arbitrary boundary conditions. Using the limit of a vague prior with |B⁻¹| → 0, i.e. minimum information / infinite prior covariance [12, 13], the posteriors for mean ū and covariance matrix cov(u, u) based on given training data y = u(X) + ε with measurement noise variance σ_n² are

ū(X⋆) = K⋆ᵀ K_y⁻¹ (y − Hᵀb̄) + H⋆ᵀb̄ = K⋆ᵀ K_y⁻¹ y + Rᵀb̄,   (13)
cov(u(X⋆), u(X⋆)) = K⋆⋆ − K⋆ᵀ K_y⁻¹ K⋆ + Rᵀ (H K_y⁻¹ Hᵀ)⁻¹ R.   (14)

Here X = (x₁, x₂, … x_N) contains the training points and X⋆ = (x⋆₁, x⋆₂, …, x⋆_{N⋆}) the evaluation or test points. Functions of X and X⋆ are to be understood as vectors or matrices resulting from evaluation at the different positions, i.e. ū(X⋆) ≡ (ū(x⋆₁), ū(x⋆₂), …, ū(x⋆_{N⋆})) is a tuple of predicted expectation values. The matrix K ≡ k(X, X) is the kernel covariance of the training data with entries K_ij ≡ k(x_i, x_j), and cov(u(X⋆), u(X⋆))_ij ≡ cov(u(x⋆_i), u(x⋆_j)) are the entries of the predicted covariance matrix for u evaluated at the test points x⋆_i. Furthermore K_y ≡ k(X, X) + σ_n² I, K⋆ ≡ k(X, X⋆), K⋆⋆ ≡ k(X⋆, X⋆), R ≡ H⋆ − H K_y⁻¹ K⋆, the entries of H and H⋆ are H_ij ≡ h_i(x_j) and H⋆_ij ≡ h_i(x⋆_j), and b̄ ≡ (H K_y⁻¹ Hᵀ)⁻¹ H K_y⁻¹ y.

A linear model for m(x) fulfilling a PDE according to (8) follows directly from the source representation. Consider sources modeled as a linear superposition over basis functions

q(x) = Σ_i ϕ_i(x) q_i   (15)

with unknown source strength coefficients q = (q_i). To model the mean instead of the source functions themselves, one uses an according superposition

m(x) = Σ_i u_{p,i}(x) q_i   (16)

of particular solutions u_{p,i}(x) from the inhomogeneous equations

L̂u_{p,i}(x) = ϕ_i(x).   (17)

For the linear model (9) this means that b = q and h_i(x) = u_{p,i}(x). The posterior mean of the source strengths and their uncertainty are

q̄ = (H K_y⁻¹ Hᵀ)⁻¹ H K_y⁻¹ y,   (18)
cov(q, q) = (H K_y⁻¹ Hᵀ)⁻¹.   (19)

One can easily check that the predicted mean ū(x⋆) = ū_h(x⋆) + ū_p(x⋆) at a specific point x⋆ in (13) fulfills the linear differential equation (2). In the homogeneous part ū_h(x⋆) = k(x⋆, X) K_y⁻¹ (y − Hᵀq̄) sources are absent, with L̂ū_h(x⋆) = 0, where L̂ acts on x⋆. The particular solution ū_p(x⋆) = hᵀ(x⋆) q̄ = Σ_i u_{p,i}(x⋆) q̄_i adds source contributions q̄_i ϕ_i(x⋆) due to (17). For point monopole sources ϕ_i(x) = δ(x − x_{q,i}) placed at positions x_{q,i}, the particular solution u_{p,i}(x) equals the fundamental solution G(x, x_{q,i}) evaluated for the respective source. In the absence of sources the part described in this subsection is not modeled, R vanishes, and (13-14) reduce to the posteriors of a GP with prior mean m(x) = 0.

Here the general results described in the previous section are applied to specific equations. Regression is performed based on values measured at a set of sampling points x_i and may also include optimization of hyperparameters β appearing as auxiliary variables inside the kernel k(x, x′; β). The optimization step is usually performed in a maximum a posteriori (MAP) sense, choosing β_MAP as fixed rather than providing a joint probability distribution including β as random variables. We note that depending on the setting this choice may lead to underestimation of uncertainties in the reconstruction of u, in particular for sparse, low-quality measurements.

First we explore the construction of kernels in (10) for a purely homogeneous problem in a finite- and an infinite-dimensional index space, depending on the mode of separation. Consider Laplace's equation

Δu(x) = 0.   (20)

In contrast to the Helmholtz equation, Laplace's equation has no scale, i.e. it permits all length scales in the solution. In the 2D case using polar coordinates the Laplacian becomes

(1/r) ∂/∂r (r ∂u/∂r) + (1/r²) ∂²u/∂θ² = 0.   (21)

A well-known family of solutions for this problem based on separation of variables is

u = r^{±m} e^{±imθ},   (22)

leading to a family of real solutions

r^m cos(mθ), r^m sin(mθ), r^{−m} cos(mθ), r^{−m} sin(mθ).   (23)

Since our aim is to work in bounded regions, we discard the solutions with negative exponent that diverge at r = 0. Choosing a diagonal prior that weights sine and cosine terms equivalently [9] and introducing a length scale s as a free parameter, we obtain a kernel according to (4) with

k(x, x′; s) = Σ_{m=0}^∞ (rr′/s²)^m σ_m² (cos(mθ) cos(mθ′) + sin(mθ) sin(mθ′))
            = Σ_{m=0}^∞ (rr′/s²)^m σ_m² cos(m(θ − θ′)).   (24)

A flat prior σ_m² = 1, with the length scale s as a hyperparameter, yields

k(x, x′; s) = (1 − (rr′/s²) cos(θ − θ′)) / (1 − 2(rr′/s²) cos(θ − θ′) + (rr′)²/s⁴)
            = (1 − x·x′/s²) / (1 − 2x·x′/s² + |x|²|x′|²/s⁴).   (25)

This kernel is not stationary, but isotropic around a fixed coordinate origin. Introducing a mirror point x̄′ with polar angle θ̄′ = θ′ and radius r̄′ = s²/r′, we notice that (25) can be written as

k(x, x′; s) = (|x̄′|² − x·x̄′) / (x − x̄′)²,   (26)

making a dipole singularity apparent at x = x̄′. In addition k is normalized to 1 at x = 0. Choosing s > R larger than the radius R of a circle centered at the origin and enclosing the computational domain, we have r̄′ = s²/r′ > s²/s = s > R. Thus all mirror points and the according singularities are moved outside the domain. Choosing instead a slowly decaying σ_m² = 1/m, excluding m = 0, yields

k(x, x′; s) = −(1/2) ln(1 − 2x·x′/s² + |x|²|x′|²/s⁴) = −ln(|x − x̄′|/|x̄′|).   (27)

Instead of a dipole singularity that expression features a monopole singularity at x = x̄′ that is avoided as mentioned above.

Using instead Cartesian coordinates x, y to separate the Laplacian provides harmonic functions like

u = e^{±κx} e^{±iκy}.   (28)

Here all solutions yield finite values at x = 0, so we don't have to exclude any of them a priori. Introducing again a diagonal covariance operator in (6) and taking the real part yields

k(x, x′) = ∫ φ(x, κ) σ²(κ) φ(x′, κ) dκ = Re ∫_{−∞}^{∞} σ²(κ) e^{κ(x ± x′)} e^{iκ(y ± y′)} dκ.   (29)

Choosing a Gaussian σ²(κ) with a characteristic length scale s, together with a possible rotation angle θ of the coordinate frame, yields the kernel

k(x, x′; s, θ) = (1/2) Re exp((((x + x′) ± i(y − y′)) e^{iθ})²/s²).   (30)

Other sign combinations do not yield a positive definite kernel. Similar to the polar kernel (26) before, we could not obtain a fully stationary expression that depends only on differences between the coordinates of x and x′.

For demonstration purposes we consider an analytical solution to a boundary value problem of Laplace's equation on a square domain Ω with corners at (x, y) = (±1, ±1). The reference solution is

u_ref(x, y) = e^y cos x + e^x cos y   (31)

and depicted in the upper left of Fig. 1 together with its extension outside the boundaries. This figure also shows results from a GP fitted to data with artificial noise σ_n and length scale s = 2, as well as the reconstructed source density q = Δū of the prediction (bottom right). Inside Ω the solution is represented with errors below 5%. This is also reflected in the error predicted by the posterior variance of the GP, which remains small in the region enclosed by the measurement points. The analogy in classical analysis is the theorem that the solution of a homogeneous elliptic equation is fully determined by its boundary values.

In comparison, a reconstruction using a generic squared exponential kernel k ∝ exp(−(x − x′)²/(2s²)) yields a result of similar approximation quality in Fig. 2. The posterior covariance of that reconstruction is however not able to capture the vanishing error inside the enclosed domain due to the given boundary data. More severely, in contrast to the previous case, the posterior mean ū does not satisfy Laplace's equation Δū = 0 inside Ω, showing up in the difference to the reconstruction in Fig. 2. This kind of error is quantified by computing the reconstructed charge density q̄ = Δū. This is fine if data from Poisson's equation Δu = q with distributed charges should be fitted instead. However, to keep Δu = 0 inside Ω, one requires more specialized kernels such as (26).

To demonstrate the proposed method in full we now consider the Helmholtz equation with sources,

Δu(x) + k₀² u(x) = q(x).   (32)

Stationary kernels based on Bessel functions for the homogeneous equation have been presented in [11]. These functions provide smoothing regularization on the order of the wavelength λ = 2π/k₀ and have been demonstrated to produce excellent field reconstruction from point measurements. Here we consider the two-dimensional case. The method of source strength reconstruction is improved compared to [11], as it now constitutes a linear problem according to (18-19). Non-linear optimization is instead applied to the wavenumber k₀ as a free hyperparameter to be estimated during the GP regression.

[Fig. 3: source strengths q̄ with 95% confidence interval according to the posterior (18-19); negative log-likelihood (bottom right) with its optimum for the Bessel kernel [11] (solid line), with the actual value of k₀ shown as dotted line; the length scale of a squared exponential kernel (dashed line) is less peaked.]
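The source-strength estimate (18) and its covariance (19) can be exercised on a toy 1D example. Everything concrete here is a hedged stand-in: a generic squared exponential kernel instead of the Bessel kernel, and a hypothetical basis h(x) = (1, x) in place of particular solutions. When the data lie exactly in the span of the basis, (18) recovers the coefficients:

```python
import numpy as np

def k(x, xp):
    """Generic squared exponential kernel (stand-in for a PDE-tailored kernel)."""
    return np.exp(-0.5 * (x[:, None] - xp[None, :])**2)

def h(x):
    """Hypothetical explicit basis h_i(x) playing the role of particular solutions."""
    return np.stack([np.ones_like(x), x])  # shape (2, N)

X = np.linspace(-2.0, 2.0, 15)
b_true = np.array([2.0, 0.5])
y = h(X).T @ b_true          # noiseless data lying exactly in the span of h

sigma_n = 0.1                # noise level, here acting only as regularization
Ky = k(X, X) + sigma_n**2 * np.eye(X.size)
H = h(X)
A = H @ np.linalg.solve(Ky, H.T)
b_bar = np.linalg.solve(A, H @ np.linalg.solve(Ky, y))   # eq. (18)
cov_b = np.linalg.inv(A)                                 # eq. (19)
print(np.round(b_bar, 6))    # recovers the true coefficients (2.0, 0.5)
```

With noisy data, b_bar acquires the uncertainty (19), and the full posterior mean (13) adds the kernel part on top of the recovered linear model.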
The setup is the same as in [11]: a 2D cavity with various boundary conditions and two sound sources of strengths 0.5 and 1, respectively. Results for the sound pressure fulfilling (32) are normalized to have a maximum of p/p₀ = 1. Fig. 3 shows the error in the field reconstruction depending on the number of measurement positions. Here noise of σ_n = 0.01 has been added to the samples. The obtained negative log-likelihood depending on k₀ permits an accurate reconstruction of this quantity that has the physical meaning of a wavenumber. A generic squared exponential kernel k ∝ exp(−(x − x′)²/(2(2π/k₀)²)) leads to results of similar quality and a slightly less peaked spatial length scale hyperparameter without a direct physical interpretation.

Consider the homogeneous heat/diffusion equation

∂u/∂t − DΔu = 0   (33)

for (x, t) ∈ ℝ × ℝ⁺. Integrating the fundamental solution

G = 1/√(4πD(t − τ)) exp(−(x − ξ)²/(4D(t − τ)))

from ξ = −∞ to ∞ at τ = 0, i.e. placing sources everywhere at a single point in time, leads to the kernel

k_n(x − x′, t + t′; D) = 1/√(4πD(t + t′)) e^{−(x − x′)²/(4D(t + t′))}.   (34)

In terms of x this is a stationary squared exponential kernel and the natural kernel over the domain x ∈ ℝ. The kernel broadens with increasing t and t′. Non-stationarity in time can also be considered natural to the heat equation, since its solutions show a preferred time direction on each side of the singularity at t = 0. The only difference of (34) to the singular heat kernel is the positive sign between t and t′. If both of them are positive, k is guaranteed to take finite values.

As for the Laplace equation it is also convenient to define a spatially non-stationary kernel by cutting out a finite source-free domain. Evaluating the integral over the fundamental solution in ℝ \ (a, b), i.e. without our domain interval (a, b), we obtain

k_n(x, t, x′, t′) = k_n(x − x′, t + t′; D) [1 − (g(x, t, x′, t′; D, b) − g(x, t, x′, t′; D, a))/2],   (35)

where

g(x, t, x′, t′; D, s) ≡ erf( ((s − x)/t + (s − x′)/t′) / (2√D √(1/t + 1/t′)) ).   (36)

Incorporating the prior knowledge that there are no domain sources could potentially improve the reconstruction. Initial investigations on the initial-boundary value problem of the heat equation based on those kernels produce stable results showing natural regularization within the limits of the strongly ill-posed setting. Reconstruction of the diffusivity D has proven to be a difficult task and requires further investigation.

Summary and Outlook
A framework for the application of Gaussian process regression to data from an underlying partial differential equation has been presented. The method is based on Mercer kernels constructed from fundamental solutions and produces realizations that match the homogeneous problem exactly. Contributions from sources are superimposed via an additional linear model. Several examples of suitable kernels have been given for Laplace's equation, the Helmholtz equation and the heat equation. Regression performance has been shown to yield results of similar or higher quality compared to a squared exponential kernel in the considered application cases. Advantages of the specialized kernel approach are the possibility to represent the exact absence of sources as well as the physical interpretability of hyperparameters.

In a next step, reconstruction of vector fields via GPs could be formulated, taking laws such as Maxwell's equations or Hamilton's equations of motion into account. A starting point could be squared exponential kernels for divergence- and curl-free vector fields [14]. Such kernels have been used in [15] to perform statistical reconstruction, and [16] apply them to GPs for source identification in the Laplace/Poisson equation. In order to model Hamiltonian dynamics in phase space, vector-valued GPs could possibly be extended to represent not only volume-preserving (divergence-free) maps but retain full symplectic properties, thereby conserving integrals of motion such as energy or momentum.
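The divergence-free idea behind [14, 15] can be sketched without the matrix-valued kernel machinery (centers, weights and length scale below are illustrative assumptions): any 2D field built as the rotated gradient u = (∂ψ/∂y, −∂ψ/∂x) of a scalar squared exponential model is divergence-free by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
centers = rng.uniform(-1.0, 1.0, size=(8, 2))  # illustrative bump centers
alpha = rng.standard_normal(8)                 # illustrative weights
s2 = 0.25                                      # squared length scale (assumed)

def grad_psi(x):
    """Gradient of the potential psi(x) = sum_i alpha_i exp(-|x - c_i|^2/(2 s2))."""
    w = alpha * np.exp(-np.sum((x - centers)**2, axis=-1) / (2 * s2))
    return np.sum(w[:, None] * (-(x - centers) / s2), axis=0)

def u(x):
    """Divergence-free 2D field: rotated gradient u = (d psi/dy, -d psi/dx)."""
    g = grad_psi(x)
    return np.array([g[1], -g[0]])

# Central-difference check that div u vanishes.
x0, h = np.array([0.2, -0.3]), 1e-5
div = ((u(x0 + [h, 0.0])[0] - u(x0 - [h, 0.0])[0])
       + (u(x0 + [0.0, h])[1] - u(x0 - [0.0, h])[1])) / (2 * h)
print(abs(div) < 1e-6)  # divergence is numerically zero
```

Matrix-valued kernels encode the same constraint directly at the covariance level, so that every GP realization of the vector field is divergence-free; curl-free fields follow analogously from u = ∇ψ.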
Acknowledgments
I would like to thank Dirk Nille, Roland Preuss and Udo von Toussaint for insightful discussions. This study is a contribution to the Reduced Complexity Models grant, number ZT-I-0010, funded by the Helmholtz Association of German Research Centers.
References

[1] A. Dong, "Kriging variables that satisfy the partial differential equation ΔZ = Y," in Geostatistics, pp. 237–248, 1989.
[2] K. G. van den Boogaart, "Kriging for processes solving partial differential equations," in IAMG2001, Cancun, Mexico, pp. 1–21, 2001.
[3] T. Graepel, "Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations," in Proc. Int. Conf. Mach. Learn. (T. Fawcett and N. Mishra, eds.), pp. 234–241, 2003.
[4] S. Särkkä, "Linear operators and stochastic partial differential equations in Gaussian process regression," Lect. Notes Comput. Sci., vol. 6792, part 2, pp. 151–158, 2011.
[5] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Inferring solutions of differential equations using noisy multi-fidelity data," J. Comput. Phys., vol. 335, pp. 736–746, 2017.
[6] K. Lackner, "Computation of ideal MHD equilibria," Comput. Phys. Commun., vol. 12, no. 1, pp. 33–44, 1976.
[7] M. A. Golberg, "The method of fundamental solutions for Poisson's equation," Eng. Anal. Bound. Elem., vol. 16, no. 3, pp. 205–213, 1995.
[8] R. Schaback and H. Wendland, "Kernel techniques: From machine learning to meshless methods," Acta Numer., vol. 15, pp. 543–639, 2006.
[9] F. M. Mendes and E. A. da Costa Junior, "Bayesian inference in the numerical solution of Laplace's equation," AIP Conf. Proc., vol. 1443, pp. 72–79, 2012.
[10] J. Cockayne, C. Oates, T. Sullivan, and M. Girolami, "Probabilistic numerical methods for partial differential equations and Bayesian inverse problems," arXiv preprint, 2016.
[11] C. Albert, "Physics-informed transfer path analysis with parameter estimation using Gaussian processes," 2019.
[12] A. O'Hagan, "Curve fitting and optimal design for prediction," J. R. Stat. Soc. Ser. B, vol. 40, no. 1, pp. 1–24, 1978.
[13] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[14] F. J. Narcowich and J. D. Ward, "Generalized Hermite interpolation via matrix-valued conditionally positive definite functions," Math. Comput., vol. 63, no. 208, p. 661, 1994.
[15] I. Macêdo and R. Castro, "Learning divergence-free and curl-free vector fields with matrix-valued kernels," Tech. Rep., Inst. Nac. Mat. Pura e Apl., Brazil, 2008.
[16] A. D. Cobb, R. Everett, A. Markham, and S. J. Roberts, "Identifying sources and sinks in the presence of multiple agents with Gaussian process vector calculus," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD '18), pp. 1254–1262, 2018.