Field dynamics inference for local and causal interactions
Philipp Frank* Reimar Leike Torsten A. Enßlin
Ph. Frank, R. Leike, PD Dr. T. A. Enßlin
Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Straße 1, 85748 Garching, Germany
Ludwig-Maximilians-Universität München, Geschwister-Scholl-Platz 1, 80539 München, Germany
[email protected]

Keywords:
Information theory, Probability theory, Stochastic processes, Data analysis, Time series analysis, Bayesian methods
Inference of fields defined in space and time from observational data is a core discipline in many scientific areas. This work approaches the problem in a Bayesian framework. The proposed method is based on a Gaussian process prior for fields and demonstrates how to reconstruct the field together with its prior correlation structure from data. The prior model of the correlation structure is described in a non-parametric fashion and solely builds on fundamental physical assumptions such as space-time homogeneity, locality, and causality. These assumptions are sufficient to successfully infer the field and its prior correlation structure from noisy and incomplete data of a single realization of the process, as demonstrated via multiple numerical examples.
Modeling as well as inferring quantities defined in space and time on the basis of observational data thereof has always been at the very core of many scientific areas. In recent years, astrophysical imaging has begun to become sensitive to the temporal dimension, in addition to the spatial ones. This is due to the fact that although large astrophysical objects such as galaxies appear to be static on observational timescales, small objects such as stars and binary black holes exhibit periodic modulations of their emission on observable timescales.

In addition, the spatio-temporal correlation structure of non-astronomical systems plays a central role in the calibration of modern telescopes. In particular, the temporal variability of these systems is used to identify them and distinguish them from the typically static astronomical object of interest. As a prominent example, modern radio telescopes such as LOFAR [1] or the upcoming SKA are limited in resolution by the deflection of incoming radio signals by the ionosphere. The strength of these distortions ultimately depends on the electron density of the ionosphere. As this density is not known for all observed locations x at time t, it has to be inferred along with the incoming flux. Typically, the electron density is probed by observing a calibration target with known flux at location x′ and time t′. Therefore, it is also necessary to make a statement about the correlation structure of the electron density in order to extrapolate the information gained at (t′, x′) to the space-time location (t, x) where the actual observation is made.

In order to tackle these as well as other inference problems of this kind in space and time, we have to perform inference of continuous quantities, or fields, from a finite set of measurement data. This problem is in general ill-posed, as we aim to constrain infinitely many degrees of freedom (dofs) from finite, usually also noisy, measurements.
Consequently we rely on Bayesian inference, more precisely on Information Field Theory (IFT) [2], and use this language to encode prior knowledge about the system under consideration. Typically, this prior knowledge is incomplete, meaning that there exists a set of unknown hyper-parameters, such as the spatio-temporal prior correlation structure, which have to be inferred along with the field of interest. We will outline in this paper how physically motivated concepts such as spatio-temporal homogeneity, locality, and causality can be encoded into the prior correlation structure. Furthermore, we will demonstrate that the resulting prior is restrictive enough to perform inference on the basis of noisy and incomplete data of a single realization of the process, while still being flexible enough to capture complicated correlation structures.

Traditionally there exist two different pictures of random fields in space-time: one results in space and time being treated separately [3, 4], while the other models the field as defined over a single space, namely space-time [5, 6] (see e.g. [7] for an extensive discussion). In this work we rely on the latter picture. Consequently the corresponding inference problems can be regarded as the task of inferring a field defined on a single space, given a finite amount of measurements in this space.

To this end, in section 2 we start with a brief introduction to IFT and the notation used in this work. In section 3 we discuss how to encode our prior knowledge about the field and its correlation structure into a joint prior distribution thereof. This prior is then used in section 4 in order to solve the corresponding inference problem, and the performance of the resulting algorithm is demonstrated in section 5. Ultimately, in section 6, we conclude the paper with a brief summary of the proposed concepts.
In this work, we deal with fields s_x defined on some domain Ω with Ω ⊂ ℝ^{n+1}, where n ∈ ℕ denotes the number of spatial dimensions and x ≡ (t, x) labels the space-time coordinates of Ω. In order to construct a probability distribution for s we define a scalar product for fields as

    a† b ≡ ∫_Ω a*_x b_x dx .    (1)

Consequently, applications of linear operators are denoted as

    b_x = (Oa)_x = O_{xx′} a_{x′} = ∫_Ω O_{xx′} a_{x′} dx′ ,    (2)

where we also introduced the continuous version of the Einstein sum convention. This can be used, for example, to define a Gaussian distribution with mean m and covariance D for a field s as

    P(s) = G(s − m, D) ≡ |2πD|^{−1/2} e^{−½ (s−m)† D⁻¹ (s−m)} ,    (3)

where |·| denotes the functional determinant (for further details see e.g. [2]). These definitions are sufficient to define probability distributions for fields. To perform inference, however, we additionally need to introduce the concept of a measurement. Before we turn to the actual inference problem of reconstructing a field together with its correlation structure, we first outline the simpler problem of inferring a field given its correlation structure. For the sake of simplicity, in addition, we stick to linear measurements of fields throughout this work; a generalization to non-linear measurements is, however, possible (see e.g. [8]). Consider a linear measurement of s with additive Gaussian distributed noise n, defined as

    d_i = R_{ix} s_x + n_i .    (4)

Here the measurement response R maps from the continuous space Ω, called signal space, to the discrete space of the measurements d, called data space.
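On a discretized grid, the linear measurement of eq. (4) reduces to a plain matrix-vector product. A minimal NumPy sketch (the grid size, the mask-type response R, and the noise level are illustrative choices, not values from this work):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize a 1D signal domain into npix cells.
npix = 64
s = np.sin(np.linspace(0.0, 4.0 * np.pi, npix))  # stand-in signal field

# The response R picks out a few measured locations (linear and sparse).
measured = np.array([3, 17, 30, 44, 58])
R = np.zeros((measured.size, npix))
R[np.arange(measured.size), measured] = 1.0

# d_i = R_ix s_x + n_i with additive Gaussian noise n.
sigma_n = 0.1
d = R @ s + sigma_n * rng.normal(size=measured.size)
```

Any linear response (masking, convolution, integration along lines of sight) fits this template; only the matrix R changes.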
If we assume s to be a priori distributed as

    P(s) = G(s, S) ,    (5)

with zero mean and known covariance S, and furthermore denote P(n) as

    P(n) = G(n, N) ,    (6)

with known covariance N, the resulting inference problem can be solved analytically, as the posterior distribution is again a Gaussian,

    P(s|d) = G(s − m, D) ,    (7)

with

    D = (R† N⁻¹ R + S⁻¹)⁻¹ ,    (8)

    m = Dj = D R† N⁻¹ d .    (9)

This resembles the solution of the famous Wiener filter equations [9], applied to a field theoretical setting. Note that although the posterior solution is available in closed form analytically, for many interesting combinations of R, N, S the posterior covariance D cannot be calculated analytically, due to the operator inversion. Consequently the application of D is ultimately approximated numerically, on a discretized grid and via linear solvers such as the conjugate gradient method. All results in this work are implemented using the Python package NIFTy [10].

2.2 Useful coordinate transformation

From a theoretical perspective the above linear inference problem is solved completely. In practice, however, an equivalent reformulation typically turns out to be more efficient numerically, and it also provides a different perspective on the inference problem. Consider a generic coordinate transformation of the form

    s = F[ξ_s] ,    (10)

with F invertible. Note that this is a coordinate transformation in the infinite dimensional configuration space of s. Using the transformation rule for probabilities,

    P(s|d) Ds = P(ξ_s|d) Dξ_s ,    (11)

we get that

    P(ξ_s|d) = P(s|d)|_{s=F[ξ_s]} |∂F[ξ_s]/∂ξ_s| .    (12)
Furthermore, if we consider a linear transformation

    s = G ξ_s ,    (13)

with G such that

    S = GG† ,    (14)

we find after some calculation that the transformed posterior P(ξ_s|d) reads

    P(ξ_s|d) = G(ξ_s − m̃, D̃) ,    (15)

with

    D̃ = (G† R† N⁻¹ R G + 1)⁻¹ ,    (16)

    m̃ = D̃ j̃ = D̃ G† R† N⁻¹ d .    (17)

Furthermore, the transformed prior distribution P(ξ_s) turns out to be

    P(ξ_s) = G(ξ_s, 1) .    (18)

The first observation is that the spectrum of the transformed posterior covariance D̃ is bounded from above by 1. This renders numerical inversion of this operator faster and also typically more stable (see [11] for further details). In addition, the transformed dofs ξ_s are independent and identically distributed (i.i.d.), and eq. (13) defines a generative process for s. In a space-time setting, this allows for an interpretation of ξ_s as an external force (excitation) applied to a system, while G models the dynamic response (Green's function) of the system to these excitations. As our goal is to construct a physically motivated prior for (s, S), this formulation allows us to directly encode our prior belief about how a typical response of a dynamic system to excitations should look. We will therefore formulate a prior for (ξ_s, G) rather than for (s, S), as it appears to be the more convenient choice. Note that the original quantities (s, S) can at any point be recovered using eqs. (13) and (14).

Prior
In the introduction we proposed three different concepts we aim to encode into a prior for G in order to reduce its complexity. These are

• Statistical space-time homogeneity
• (Spatio-)temporal causality
• Locality.

The first concept implies that G (and consequently also S) is the same for all space-time locations, i.e.

    G_{xx′} = g(x − x′) .    (19)

From the Wiener-Khinchin theorem [9] we get that G is diagonal in Fourier space and its eigen-spectrum is given by the Fourier transform of g. This reads

    G_{kk′} ∝ δ(k − k′) (Fg)_k ,    (20)

with

    F_{kx} ≡ e^{ikx} ,    (21)

where k = (ω, k) denotes the coordinates of the Fourier space corresponding to Ω. This assumption greatly reduces the complexity of the inference problem, since we have to infer two fields, ξ_s and g, rather than one field and one linear operator.

The second concept, causality, is introduced via an additional constraint on G. As G should model the response of the system to the excitations ξ_s, it should not contain a response at times before an excitation happens. This can be formulated mathematically as

    G_{xx′} = Θ(t − t′) g(x − x′) ,    (22)

where Θ denotes a step function in time. This trivially ensures that no response happens before an excitation. In space-time, the physical concept of causality also implies a maximal finite propagation speed of interactions. This leads to propagation within a light cone, as depicted in figure 1. Therefore, given a maximal propagation speed c, we can restrict the reconstruction to Green's functions that are non-zero only within the light cone. We can implement this constraint by extending eq. (22) as

    G_{xx′} = Θ(t − t′) Θ(l) g(x − x′) ,    (23)

where

    l = (t − t′)² − (x − x′)† C⁻¹ (x − x′) .    (24)

This ensures that the propagator is only non-zero within the light cone.
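On a discretized space-time, the causal masks of eq. (23) are simple boolean arrays. A sketch of the support Θ(t − t′)Θ(l) for an isotropic, hypothetical propagation speed c (all grid sizes and parameter values are illustrative):

```python
import numpy as np

# Relative space-time coordinates (t - t', x - x') on a regular grid.
nt, nx = 64, 64
dt, dx = 0.1, 0.1
t = np.arange(nt)[:, None] * dt                 # temporal offsets t - t' >= 0
x = (np.arange(nx)[None, :] - nx // 2) * dx     # centered spatial offsets

# Isotropic case C = c^2: l = (t - t')^2 - (x - x')^2 / c^2, eq. (24).
c = 0.5
l = t**2 - (x / c) ** 2

# Support of the causal Green's function, eq. (23): Theta(t - t') Theta(l).
support = (t > 0) & (l > 0)
```

Multiplying a candidate Green's function by `support` enforces both the temporal ordering and the light cone at once.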
In general we might expect different maximal propagation speeds in different directions, and consequently C becomes a symmetric tensor. In case propagation is isotropic in space with speed c we have C_{ij} = c² δ_{ij}. Note that even in cases where we do not know C, for example if the medium in which the interaction is realized is unknown, such a constraint can still be useful if we elevate C to an additional unknown parameter that has to be inferred. Since C is assumed to be the same for all scales, an inference algorithm can use large scale information, where the signal to noise ratio (SNR) is usually higher, to determine C, which then effectively increases the SNR for smaller scales. A detailed description of how C enters the reconstruction can be found in appendix A.

In order to encode the last concept, locality, we first have to revisit some properties of Green's functions. Consider a system that can be described via

    L s = 0 ,    (25)

where L is a linear differential operator. The response G of such a system to an external force has to fulfill

    (LG)_{xx′} = δ(x − x′) .    (26)

In order to give rise to a homogeneous G, L also has to be homogeneous and therefore can be written as

    L_{xx′} = q(∂_{x^μ}) δ(x − x′) ,    (27)

where q(∂_{x^μ}) is a differential operator encoding function, and μ is an index of the coordinates x = (t, x). Its Fourier space representation takes the form

    L_{kk′} = f_k δ(k − k′) ,    (28)

where we also introduced the complex valued operator encoding function f_k ≡ q(ik). Consequently G takes the form

    G_{kk′} = δ(k − k′) / f_k ,    (29)

and therefore the eigen-spectrum of G reads

    g_k = 1 / f_k .    (30)

It turns out that locality can be encoded more intuitively in terms of a prior for f. As we can see in figure 2, the locality of a response of the system to an excitation is related to the order of the derivatives in the differential operator L.
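For a concrete q, the eigen-spectrum g_k = 1/f_k of eq. (30) can be evaluated directly. A sketch for a damped harmonic oscillator L = ∂_t² + γ∂_t + m², whose Fourier-space encoding function is f(ω) = q(iω) = −ω² + iγω + m² (the choice of dynamics and the parameter values are illustrative, not taken from this work):

```python
import numpy as np

# Made-up oscillator parameters and a 1D temporal grid.
m, gamma = 2.0, 0.3
nt, dt = 1024, 0.05
omega = 2.0 * np.pi * np.fft.fftfreq(nt, d=dt)

# f_k = q(ik) for L = d^2/dt^2 + gamma d/dt + m^2, and g_k = 1/f_k (eq. (30)).
f = -omega**2 + 1j * gamma * omega + m**2
g_k = 1.0 / f

# Back to position space: the response to a delta excitation at t = 0.
G_t = np.fft.ifft(g_k).real / dt
```

Both poles of 1/f lie in the upper half of the complex ω plane, so the resulting response is causal up to discretization effects: the second half of `G_t`, which corresponds to negative times, stays close to zero.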
Low order derivatives result in an almost instantaneous response of the system, while higher order derivatives lead to an apparently non-local response. Therefore we seek to formulate a prior for f such that lower order derivatives are a priori favoured. Nevertheless, it should be possible for f to deviate from this assumption if there is enough evidence in the data to support this. What we aim to do is to impose a certain degree of smoothness on f. We can construct a prior for f_k such that it punishes second and higher order derivatives with respect to k. One way to do so is to propose a Gaussian process prior

    P(f) = G(f, T) = |2πT|^{−1/2} e^{−½ f† T⁻¹ f} ,    (31)

where the exponential factor reads

    −½ f† T⁻¹ f = −(1/(2σ²)) ∫ |Δ_k f_k|² dk ,    (32)

where Δ_k denotes the Laplace operator with respect to k and σ is an overall scaling parameter used to steer the strength of this prior (for further details see [12, 13]). Consequently, an f that is a differentiable function of its arguments is a priori favoured, and therefore the most local configuration that is consistent with the data is the one selected during reconstruction. This is perfectly motivated physically, since we know that for direct (classical) interactions, locality is a fundamental concept. Consequently, apparent non-localities in the response indicate additional hidden layers within the system, since the excitation does not influence the observed quantity s directly, but couples to an additional unobserved quantity which then couples to s. Such a configuration is of course possible and perfectly realizable, but if the system can also be described via a more direct interaction, the corresponding model is favoured.

All in all, we can combine the above concepts to end up with a generative description of our prior, which takes the form

    s = G ξ_s = F⁻¹ ĝ F ξ_s ,    (33)

where ĝ denotes a diagonal operator in Fourier space with g_k on its diagonal.
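Eq. (33) defines a generative process that can be sampled directly: draw white excitations, color them with ĝ in Fourier space, and transform back. A one dimensional sketch with a made-up, illustrative dynamics encoding function (the causality masks are omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(42)

# Temporal grid and a made-up dynamics encoding function f (damped oscillator).
nt, dt = 512, 0.1
omega = 2.0 * np.pi * np.fft.fftfreq(nt, d=dt)
g_hat = 1.0 / (-omega**2 + 0.4j * omega + 1.0)   # g_k = 1/f_k

# s = G xi = F^-1 ghat F xi, eq. (33), with white Gaussian excitations xi.
xi = rng.normal(size=nt)
s = np.fft.ifft(g_hat * np.fft.fft(xi)).real
```

Replacing the Gaussian draw for `xi` by, e.g., inverse-gamma variates changes the prior on s without touching the response model, which is exactly the flexibility exploited in the applications below.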
Furthermore,

    g = F Θ̂(t) Θ̂(l) F⁻¹ (1/f) ,    (34)

with f being a priori distributed according to eq. (31), while here the Θ̂ are diagonal operators in position space.

3.1 Prior distributions for excitations

Since the set of Gaussian distributions is closed under linear maps, we get that if ξ_s is a priori Gaussian distributed, the prior for s is also a Gaussian. In particular, for

    P(ξ_s) = G(ξ_s, 1) ,    (35)

we get that P(s) is a Gaussian with zero mean and covariance S = GG† (see eq. (14)). From the perspective of s being the result of a dynamic response G to external excitations ξ_s, it is not necessary that ξ_s is Gaussian distributed. In fact, in many applications it might be more realistic to define a different prior for the excitations. In this case, the prior statistics of s is also non-Gaussian, but can be calculated from P(ξ_s). In order to demonstrate the implications of a non-Gaussian prior, in section 5 we will show results for the inference problem in cases where ξ_s is distributed according to an inverse-gamma distribution at each location. This prior is typically used as a sparsity prior, and in our case results in a system that is subject to sparse external excitations. Of course, many other prior distributions are also reasonable and important, but for the sake of simplicity we will stick to ξ_s being either Gaussian or inverse-gamma distributed in the examples.

Inference

In section 2.1 we already discussed the inference problem in the case of a linear measurement and a Gaussian prior with known covariance. Now, with the appropriate prior for the covariance at hand, we can set up the task of inferring the covariance together with the field, given observational data about the field. To this end, consider again a linear measurement as defined in eq. (4), which in the transformed coordinates (
G, ξ_s) takes the form

    d = R G_f ξ_s + n ,    (36)

where G_f denotes the Green's function as a function of f, with an eigen-spectrum as defined in eq. (34). If we again assume n to be Gaussian distributed with zero mean and covariance N, we get that the joint distribution of (d, ξ_s, f) reads

    P(d, ξ_s, f) = G(d − R s(f, ξ_s), N) G(f, T) P(ξ_s) ,    (37)

where we also introduced s(f, ξ_s) ≡ G_f ξ_s to indicate the dependency of s on the quantities of interest. The posterior P(ξ_s, f | d) is proportional to the joint distribution up to a factor that only depends on d, since

    P(d, ξ_s, f) = P(ξ_s, f | d) P(d) ∝ P(ξ_s, f | d) .    (38)

It turns out that this posterior is intractable, due to the non-linear dependency of s on (f, ξ_s). Consequently the corresponding inference problem cannot be solved analytically and we have to rely on a numerical approximation. In this work, we use a variational approximation algorithm called Metric Gaussian Variational Inference (MGVI) [14], where we approximate the true posterior with a Gaussian distribution in order to get an estimate for the mean and the covariance of the posterior.

4.1 Variational Inference

In general, variational inference can be described as the task of approximating one probability distribution P(x) for some quantity x with another distribution Q_σ(x) which is defined up to a set of parameters σ. Approximation is then achieved by minimizing the Kullback-Leibler divergence (KL) with respect to σ. The KL is defined as

    KL(Q_σ, P) ≡ ∫ Q_σ(x) log(Q_σ(x)/P(x)) dx = ⟨H_P⟩_{Q_σ} − ⟨H_{Q_σ}⟩_{Q_σ} ,    (39)

where we also introduced the so-called information Hamiltonian H, defined as

    H_P = −log(P) .    (40)

In our case, the approximate distribution Q is chosen to be a Gaussian distribution in a = (f, ξ_s) with mean m and covariance A.
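The two equivalent forms of eq. (39) can be checked numerically in one dimension: for Gaussian Q and P the KL is known in closed form, and the sample average of H_P − H_Q over draws from Q converges to it. A small self-check with made-up parameters:

```python
import numpy as np

# Two 1D Gaussians: Q = N(mu_q, s_q^2) (approximation) and P = N(mu_p, s_p^2).
mu_q, s_q = 0.3, 0.8
mu_p, s_p = 0.0, 1.0

# Closed-form KL(Q, P) for two Gaussians.
kl_exact = np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p) ** 2) / (2 * s_p**2) - 0.5

# Same quantity as <H_P>_Q - <H_Q>_Q, estimated from samples of Q (eq. (39)).
rng = np.random.default_rng(7)
xs = rng.normal(mu_q, s_q, size=200_000)
H_P = 0.5 * ((xs - mu_p) / s_p) ** 2 + np.log(s_p * np.sqrt(2.0 * np.pi))
H_Q = 0.5 * ((xs - mu_q) / s_q) ** 2 + np.log(s_q * np.sqrt(2.0 * np.pi))
kl_sampled = (H_P - H_Q).mean()
```

In the field-inference setting the ⟨H_P⟩_Q term is exactly the piece that has no closed form, which is why it is estimated from samples of Q in the same way.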
For many relevant inference problems, and also for the one studied in this work, this approximation cannot be performed analytically, as typically ⟨H_P⟩_{Q_σ} cannot be calculated analytically. Therefore, similar to the linear problem studied in section 2.1, we choose an appropriate discretization for the space-time domain Ω and perform a variational approximation of the corresponding discrete problem. It turns out that in order to achieve a reasonable resolution in space-time, the inference problems can become very high dimensional. As an example, in section 5 we show an application for a discretized 1+1 dimensional space-time with a resolution of 256 × 200 pixels. This yields that the number of dofs in a is 2 × 256 × 200 ≈ 10⁵, and consequently the number of entries in A is ∝ 10¹⁰, which renders an explicit representation of A on a computer inefficient in general. Therefore, following [14], we avoid an explicit representation by setting it equal to the inverse Fisher metric M⁻¹ [15] of the posterior, evaluated at m. For the posterior distribution as defined in eqs. (37) and (38), together with a Gaussian prior for ξ_s with zero mean and unit covariance, we get that M takes the form

    M = [ (∂s/∂ξ_s, ∂s/∂f)† N⁻¹ (∂s/∂ξ_s, ∂s/∂f) + diag(1, T⁻¹) ]_{(ξ_s, f) = m} .    (41)

Using the definition of s (eq. (33)) we get that

    ∂s/∂ξ_s = F⁻¹ ĝ F ,    (42)

    ∂s/∂f = F⁻¹ (Fξ_s)ˆ F Θ̂(t) Θ̂(l) F⁻¹ (−1/f²)ˆ ,    (43)

where the hat again denotes the corresponding diagonal operator. Therefore, together with the definition of T (eq. (32)), we see that M has an implicit representation, i.e. it can be applied solely using Fourier transformations and diagonal operations, avoiding the explicit storage of this matrix at any point. Consequently the application of A can be achieved via linear solvers. In addition, the structure of M allows for an efficient sampling of the approximate distribution Q. This is another important property, since the KL involves expectation values w.r.t. Q, which can be approximated via samples from Q. Ultimately, the Fisher metric is also a measure for the local curvature of the KL and therefore enables us to use second order optimization schemes to solve the corresponding optimization problem in m.

As discussed in the previous section, in some cases space-time causality can only be imposed if we also infer the propagation speed C. To do so, we notice that, given a prior for C, the above still applies with the extension that s is now also a function of C. Therefore the Fisher metric gets an additional entry with the same structure as in eq.
(41), for the corresponding gradient ∂s/∂C (see appendix A for further details).

All in all, the solution strategy as defined in MGVI, starting from a randomly initialized m, can be summarized as follows:

• Using M, evaluated at the current approximate mean m, draw a set of samples {a*} from the approximate distribution Q.
• With these samples, calculate an estimate of the current value of the KL and its gradient.
• Together with the metric M, perform a second order Newton minimization step in order to get a new estimate for m.
• Repeat this procedure, re-evaluating everything using the updated m, until convergence.

Application
In order to demonstrate the applicability of our method, we apply it to several synthetic data examples. First, we demonstrate the performance of the algorithm in a one dimensional setting, where we only aim to infer the Green's function of the system, given the excitations and data. In the second example we reconstruct the excitations as well as the Green's function from data alone, in a 1+1 dimensional setting. In the last example we add another level of complexity via non-Gaussian statistics in the excitations: we perform source detection, the task of inferring sparse excitations at various locations in space-time, in a case where the Green's function is also unknown. All examples were implemented using the NIFTy software package in its version 5 [16]. We included an implementation of the prior structure for the Green's function introduced here in this publicly available package.
In our first application, shown in figure 3, we aim to demonstrate the inference of the dynamics encoding field f alone, in a one dimensional setting where there is only a temporal evolution to be reconstructed. The excitation field ξ_s is known during inference. We generate synthetic data according to eq. (36), with R being the identity. The excitations were drawn from an inverse gamma distribution to model known, sparse excitations of the system (e.g. in a laboratory setting where the unknown system is driven via sparse excitations). The synthetic signal s as well as the corresponding data d is shown in the top-left panel of figure 3. The dynamic operator used to generate signal and data is of the form

    L = (∂_t² + m² + γ∂_t)(∂_t² + m̃² + γ̃∂_t) ,    (44)

with (m, γ, m̃, γ̃) = (0.…, …, …, …). We infer f given d and ξ_s. The results are shown in figure 3 as well. We see that the reconstruction of the Green's function behaves as expected and is in agreement with the ground truth in the temporal domain, within uncertainties. Due to the fact that the reconstructed dynamics is uncertain, the recovered signal also has uncertainty, although the excitations are known. The reconstructed Green's function indicates that it is indeed possible to reconstruct an apparently non-local response of the system (due to higher order derivatives in this setup), since the true propagator as well as its reconstruction show oscillations that grow in the beginning of the propagator before decaying exponentially. We also notice that there is relatively high uncertainty in the first timesteps of the response G_t. This is caused by the low initial response to excitations of the true propagator.
The initial part of the reconstructed propagator is purely dominated by noise and thus only constrained up to the noise level (the standard deviation of the noise is set to σ_n = 20 and the temporal domain is discretized via 512 equidistant pixels). We also notice that the posterior solution for the spectrum levels out for high-frequency modes (large ω) below the signal to noise ratio, while the true spectrum continues to decay (ultimately the true spectrum also levels out in the numerical example, due to the finite size of the considered space). The uncertainty increases in this region, but not enough to capture the true solution. Here we notice the limitations of the variational inference, which provides a local approximation of the posterior with a Gaussian. Consequently the true uncertainty might be underestimated, as in this case. However, this deviation occurs orders of magnitude below the peak of the spectrum and therefore has only a barely visible effect on the reconstruction of the signal. One way to allow for a better extrapolation to higher frequencies would be to provide a more restrictive prior for the dynamics encoding field, e.g. by defining it on a polynomial basis, which is a more suitable basis for this particular setup, as the true dynamics is also described in terms of a polynomial. However, we aim to provide a general and less restrictive approach here, capable of also reconstructing non-polynomial dynamics encoding functions, and consequently less able to extrapolate to regions where we have no information provided by data.

To further quantify the reconstruction error, we also investigate the residual as well as the corresponding posterior uncertainty for the signal s_t and the time representation of the Green's function G_t (see figure 4). In addition to σ_n = 20, we also show the residuals and uncertainties for various other noise levels. For a better comparison, no other changes were made during reconstruction.
In particular, the random number generator used to generate the synthetic data as well as the approximate posterior samples during reconstruction was seeded with the same random seed for all runs. We notice that the posterior uncertainty appears to be on a reasonable scale, as the residual is within the one or two sigma confidence interval in almost all cases. In addition, we notice that the posterior uncertainty of the signal s is particularly high in regions right after a strong excitation happened. This is due to the fact that the reconstruction of the Green's function G, which is the response to these excitations, is most uncertain for the first timesteps. This uncertainty propagates into the uncertainty of s.

5.2 Space-time evolution

In our second example, shown in figure 5, we aim to reconstruct the dynamics as well as the excitations in a spatio-temporal (1+1 dimensional) setting from incomplete and noisy observations. We aim to infer the dynamical field f, the propagation speed c, and the excitations ξ from noisy and incomplete measurements d of the field s alone. The dynamical system used for the generation of synthetic data is a product of a damped harmonic oscillator and an advection-diffusion generating term. This reads

    L = (∂_t² − c²∂_x² + m² + α∂_t − β∂_x²)(∂_t + m̃ − γ∂_x + β̃∂_x²) .    (45)

Furthermore, the excitations are Gaussian distributed with zero mean and unit covariance, from which a single realization was drawn and convolved with the synthetic Green's function corresponding to L. The signal s, the data d and the reconstruction of s are shown in figure 5 for a case with (c, m, α, β, m̃, γ, β̃) = (0.…, …, −…, −…, …, …, …) and a noise level of σ_n = 10. The space-time is discretized via a regular grid with 256 × 200 pixels.

The reconstruction algorithm is capable of reconstructing the signal in regions where we have observations thereof, while being relatively blind in unobserved regions. Consequently the posterior uncertainty is higher there. In addition, we notice that, unlike the posterior mean, posterior samples consistently fill unobserved regions. Although in these regions the samples deviate strongly from the true signal, they obey the same statistical properties, which is important for posterior analysis. The results indicate that the statistical properties (and ultimately the dynamics encoding field) are correctly reconstructed. As a validation, in figures 6 and 7 we compare the reconstructed propagator as well as the corresponding spectrum with the underlying ground truth. The spectrum is comparable to the ground truth in regions with sufficient SNR, while it levels out in regions where we have no information given via data. In addition, the reconstructed propagator also shows oscillations consistent with the true propagator. However, we notice that modes that propagate "downwards" are reconstructed well, while the weaker "upwards" propagating modes are not reconstructed, due to the fact that they are below the noise level. In addition, we see that deviations in posterior samples of the propagator only occur within a "cone" and remain zero outside. This is due to the fact that we also reconstruct the maximal propagation speed of the process, which is reconstructed to be c ≈ 0.45 (± 0.08), enclosing the correct value of c = 0.….

In our last example we aim to perform source detection in the excitation field, in a case where the dynamic response is also unknown. To demonstrate this scenario we again generate a 1+1 dimensional synthetic example, where in this case we assume that we are only able to measure the temporal evolution of the system at several locations. In particular, we measure the temporal evolution at 50 randomly selected locations of the space under consideration. This results in ≈ 80% of the discrete space-time being unobserved, as the resolution is the same as in the previous example. As before, we assume that the measurements are subject to additive Gaussian noise with σ_n = 0.…. The resulting data is shown in the top-left panel of figure 8. The unknown excitations are inverse gamma distributed, to model strong but sparse excitations. We infer those from measurements of the system at multiple locations, together with the dynamics encoding function f. The system used to generate the data exhibits damped traveling waves described by

    L = ∂_t² − c²∂_x² + m² + α∂_t − β∂_x² ,    (46)

with (c, m, α, β) = (1.…, …, −…, …).

Conclusion

In this work we considered the problem of reconstructing a Gaussian random field s, defined in space and time, together with its correlation structure S, from noisy and incomplete data d about s. We have shown that this inference problem can be reformulated to a (theoretically) equivalent problem where we aim to infer an excitation field ξ_s as well as the dynamic response G. Ultimately, the eigen-spectrum of G was encoded in the dynamics encoding field f and the propagation-speed encoding parameter C, which, together with ξ_s, denote the quantities of interest. Furthermore, we proposed a Gaussian process prior for f which gives rise to a non-parametric description of the dynamic response G. In addition, together with a Gaussian prior for ξ_s, this gives rise to a Gaussian process prior for s. As the proposed method is also applicable with non-Gaussian prior distributions for ξ_s, it can also give rise to a variety of other prior distributions for s. This flexibility is discussed for the example of an inverse gamma distribution for the excitations. To demonstrate its applicability, the proposed method is then applied to several synthetic data examples.
These include a one-dimensional example where the excitations were known and only G had to be inferred, a 1+1-dimensional example with unknown Gaussian-distributed excitations, as well as an example with inverse-gamma distributed ξ_s and a measurement response that is sparse in the spatial domain. As we restricted the prior assumptions for G to the physically motivated concepts of space-time homogeneity, locality, and causality, the method appears to be applicable to a wide range of problems. One particular strength is the non-parametric formulation of the Green's function. This becomes important in scenarios where physical models cannot yet provide a simple parametric description of the evolution. In addition, a probabilistic description of the excitations is sufficient for inference. Consequently, the method remains applicable in cases where external influences cannot be described completely. All in all, we believe that the proposed method is capable of dealing with current as well as upcoming inference problems involving fields defined over space and time, arising in the context of astrophysical imaging.

Light cone prior on a discretized space
As discussed in section 3, the concept of causality in space-time restricts propagation to within a light cone. In a continuous description, this restriction is realized via a convolution with a step function of the form

Θ(l) = Θ(t² − x†C⁻¹x) .   (47)

Since our calculations are ultimately performed on a finite grid, this definition is somewhat problematic, as it introduces boundary effects along the edges of the step function when realized on a discretized space. In addition, if we aim to elevate C to an unknown parameter of the problem that has to be inferred, gradient-based methods are no longer applicable, because the gradient is zero almost everywhere in space-time (and not defined on the boundary). We therefore seek to relax the sharp boundary introduced by the cone without losing its useful properties. To do so, we borrow an idea from quantum field theory, where such sharp boundaries turn out to be "smeared" out when considered on a quantum scale. In our case the "quantum" scale can be regarded as the resolution of the discrete representation of space-time, although this analogy is purely artificial.

To achieve this relaxation, consider the quantity

∆ = √(−l) = √(−(t² − x†C⁻¹x)) .   (48)

It has two useful properties: for causal (time-like as well as light-like) points, the real part of ∆, ℜ(∆), is zero, as the square root is taken of a negative number. For non-causal (space-like) points the real part becomes positive. Furthermore, for fixed t, ℜ(∆) is asymptotically linear in x. Therefore, if we consider a Gaussian in ℜ(∆),

G(∆) ≡ exp(−ℜ(∆)²/σ²) ,   (49)

we notice that this quantity remains one within the light cone, while it asymptotically falls off like a Gaussian in x for fixed t. Here σ is an optional scaling parameter which controls the width of the Gaussian.
In all applications of this paper it was set to a size of a few pixels.
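This relaxed window is straightforward to evaluate on a grid. As a minimal sketch (assuming, for illustration, a scalar propagation-speed parameter c such that x†C⁻¹x = x²/c²; the function and grid names are ours, not the paper's), the following NumPy snippet computes ℜ(∆) and the Gaussian window of eq. (49) on a discretized 1+1-dimensional space-time:

```python
import numpy as np

def smoothed_light_cone(t, x, c=1.0, sigma=3.0):
    """Relaxed light-cone window on a discretized 1+1-dimensional space-time.

    t, x : broadcastable grids of time and space coordinates (grid units).
    c    : scalar propagation speed (illustrative stand-in for C).
    sigma: width of the Gaussian fall-off, in grid units (a few pixels).
    """
    l = t**2 - x**2 / c**2                   # cone function; l >= 0 for causal points
    re_delta = np.sqrt(np.maximum(-l, 0.0))  # Re(sqrt(-l)); zero inside the cone
    return np.exp(-(re_delta / sigma)**2)    # eq. (49): one inside, Gaussian fall-off outside

# Evaluate on a small space-time grid via broadcasting.
t = np.arange(64)[:, None]        # time axis
x = np.arange(-32, 32)[None, :]   # space axis
w = smoothed_light_cone(t, x, c=0.5, sigma=3.0)
```

Because the window is exactly one inside the cone and smooth in the speed parameter outside of it, its gradient with respect to c no longer vanishes almost everywhere, which is what makes gradient-based inference of C feasible.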
Acknowledgements
We would like to thank Jakob Knollmüller, Philipp Arras and Margret Westerkamp for fruitful discussions, as well as Martin Reinecke for his contributions to NIFTy.
Conflict of interest
The authors declare no conflicts of interest.
References

[1] M. P. van Haarlem, M. W. Wise, A. W. Gunst, G. Heald, J. P. McKean, J. W. T. Hessels, A. G. de Bruyn, R. Nijboer, J. Swinbank, R. Fallows, et al., A&A 2013, 556, A2.
[2] T. A. Enßlin, Annalen der Physik 2019, 531, 3, 1800127.
[3] P. E. Protter, In Stochastic Integration and Differential Equations, 249–361, Springer, 2005.
[4] W. Grecksch, P. E. Kloeden, Bulletin of the Australian Mathematical Society 1996, 54, 1, 79.
[5] C. R. Doering, Physics Letters A 1987, 122, 3-4, 133.
[6] A. J. Roberts, ANZIAM Journal.
[8] T. A. Enßlin, M. Frommert, F. S. Kitaura, Phys. Rev. D 2009, 80, 10, 105005.
[9] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, with Engineering Applications, Technology Press of the Massachusetts Institute of Technology, 1949.
[10] T. Steininger, J. Dixit, P. Frank, M. Greiner, S. Hutschenreuter, J. Knollmüller, R. Leike, N. Porqueres, D. Pumpe, M. Reinecke, et al., Annalen der Physik 2019, 531, 3, 1800290.
[11] J. Knollmüller, T. A. Enßlin, arXiv e-prints 2018, arXiv:1812.04403.
[12] N. Oppermann, M. Selig, M. R. Bell, T. A. Enßlin, Phys. Rev. E 2013, 87, 3, 032136.
[13] P. Frank, T. Steininger, T. A. Enßlin, Phys. Rev. E 2017, 96, 5, 052104.
[14] J. Knollmüller, T. A. Enßlin, arXiv e-prints 2019, arXiv:1901.11033.
[15] S.-i. Amari, H. Nagaoka, Methods of Information Geometry, volume 191, American Mathematical Society, 2000.
[16] P. Arras, M. Baltac, T. A. Ensslin, P. Frank, S. Hutschenreuter, J. Knollmueller, R. Leike, M.-N. Newrzella, L. Platz, M. Reinecke, NIFTy5: Numerical Information Field Theory v5, 2019.

Figure 1:
Left: Causal response of a system defined as a combination of two damped harmonic oscillators with different masses. Right: Anti-causal response of the system, where one of the oscillations travels backwards in time.
Figure 2: Excitation ξ and responses of various systems defined in terms of powers of f_t, with f_t = ∂_t + γ and γ small. We see that higher-order derivatives result in apparently non-local responses. Since time derivatives are the generators of temporal translation, the case where the response is the exponential of −∆t f_t leads to translations by ∆t.

Figure 3:
Top: On the left we depict the signal (red line), the data (blue dots), as well as 50 over-plotted posterior samples (gray). The right panel shows the synthetic propagator (Green's function) in the temporal domain (red) as well as corresponding posterior samples. Bottom: Left: the excitation field used to generate the signal. Right: the natural logarithmic spectrum of the synthetic propagator (red) and corresponding posterior samples.
Figure 4:
Left: Residuals between the true signal s_t and the posterior mean of the reconstruction (blue line), as well as the one- and two-sigma confidence intervals of the corresponding posterior uncertainty. We show these for various noise standard deviations σ_n, from the highest noise level at the top to the lowest at the bottom. Right: Corresponding residuals and confidence intervals for the temporal representation of the dynamic Green's function G_t, again for various noise levels. In all inference runs, we seeded the random number generator used for data generation and during reconstruction with the same random seed, such that the only difference between these reconstructions is a different σ_n.
Figure 5:
Top: The left panel shows the spatio-temporally masked and noisy data, drawn from the synthetic signal (middle panel), and the corresponding signal reconstruction (right panel). Bottom: Natural logarithmic one-sigma posterior uncertainty (left panel), natural logarithmic residual between the true signal and the posterior mean (middle panel), as well as an approximate posterior sample (right panel).
Figure 6:
Top: True Green's function (left) and corresponding posterior mean (right). Bottom: Residual of the true Green's function and the reconstruction (left) and a posterior sample for the Green's function (right).
Figure 7: Natural logarithmic spectrum of the true Green's function (left) as well as the corresponding posterior mean (right).
Figure 8:
Top: The left panel shows the sparse and noisy measurement data, drawn from the synthetic signal (middle panel), and the corresponding reconstruction (right panel). Bottom: Natural logarithmic one-sigma posterior uncertainty (left panel), natural logarithmic residual of the true signal and the corresponding posterior mean (middle panel), as well as an approximate posterior sample (right panel).
Figure 9:
Top: True Green's function of the process defined in eq. 46 (left) and corresponding posterior mean (right). Bottom: Residual of the true Green's function and the reconstruction (left) and a posterior sample for the Green's function (right).
Figure 10: Natural logarithmic spectrum of the true Green's function (left) as well as the corresponding posterior mean (right).