Bayesian Reconstruction of Missing Observations
Shun Kataoka*, Muneki Yasuda†, and Kazuyuki Tanaka*

*Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan
†Graduate School of Science and Engineering, Yamagata University, Yonezawa 992-8510, Japan

Abstract:
We focus on an interpolation method referred to as Bayesian reconstruction in this paper. Whereas in standard interpolation methods missing data are interpolated deterministically, in Bayesian reconstruction, missing data are interpolated probabilistically using a Bayesian treatment. In this paper, we address the framework of Bayesian reconstruction and its application to the traffic data reconstruction problem in the field of traffic engineering. In the latter part of this paper, we describe the evaluation of the statistical performance of our Bayesian traffic reconstruction model using a statistical mechanical approach and clarify its statistical behavior.
Methods for interpolating missing data are important in various scientific fields. A standard interpolation method, such as spline interpolation, is a deterministic interpolation technique. An alternative, probabilistic, interpolation technique has been developed in the last few years. In this probabilistic interpolation technique, which is called Bayesian reconstruction, the Bayesian treatment is used to interpolate and reconstruct missing regions. To the best of our knowledge, Bayesian reconstruction was first implemented in the digital image inpainting filter. The digital image inpainting filter is used in the process of reconstructing lost or deteriorated parts of images [1] (see figure 1). Previously, two of the authors applied Bayesian reconstruction to the digital image inpainting filter [3, 4]. Bayesian reconstruction is now becoming the standard technique in the digital image inpainting filter [5].

Figure 1: Example of digital image inpainting filter. The left image is the original scratched image and the center image is masked by the black regions. We reconstructed the masked region using the digital image inpainting filter to restore the original damaged image. The right image is the reconstructed image obtained by using the digital image inpainting method proposed in reference [2].

It can be expected that the framework of Bayesian reconstruction will be utilized in various reconstruction problems, and therefore, its use should not be limited to image processing. Recently, the authors applied Bayesian reconstruction to the traffic data reconstruction problem [6]. Traffic data reconstruction is an important preprocessing step that precedes traffic prediction tasks such as travel time prediction, density prediction, and route planning. In order to provide accurate information to drivers, a broad-scale database of real-time vehicular traffic over an entire city is required.
However, in practice, it is difficult to collect the traffic data for an entire city, because traffic sensors are not installed on all roads. Therefore, the objective of traffic data reconstruction is to reconstruct the states of unobserved roads, on which traffic sensors are not installed, by using information from observed roads, on which traffic sensors are installed.

In the first part of this paper, we introduce the details of Bayesian reconstruction, and subsequently give an overview of the Bayesian traffic data reconstruction method proposed in reference [6], together with some new numerical results.

Corresponding author: [email protected]
In the Bayesian framework, we suppose observations (observed data) are probabilistically drawn from a specific probability distribution, referred to as the prior probability, because observations suffer from uncertainty whose origin is physical noise, incompleteness of some elements, and so on.

Suppose that there exists an $n$-dimensional observation $x = \{x_i \in \mathbb{R} \mid i \in V = \{1, 2, \ldots, n\}\}$, which is generated from the prior probability $P_{\mathrm{prior}}(x)$, and that we cannot observe a part of the elements in the observation for some reason. Since the observation is probabilistically generated, we can treat its elements as random variables. We define the set of labels of missing elements by $\mathcal{M} \subseteq V$ and the complementary set of $\mathcal{M}$ by notation $\mathcal{O}$, i.e., $\mathcal{O} := V \setminus \mathcal{M}$; therefore, $\mathcal{O}$ is the set of labels of observed elements. Given an observation $x$, we describe the values of the observed elements by notation $y$ to distinguish them from unobserved elements and collectively express the observed elements by $y = \{y_i \in \mathbb{R} \mid i \in \mathcal{O}\}$. The values of the elements in set $\mathcal{O}$ are fixed by the observation $y$. The Bayesian reconstruction considered in this paper consists of reconstructing the unobserved elements, $\mathcal{M}$, in the observation by using the observed elements, $\mathcal{O}$; in other words, the objective is to estimate the values of $x_{\mathcal{M}}$ by using $y$, where notation $x_A$ is the set of $x_i$ belonging to set $A \subseteq V$, i.e., $x_A := \{x_i \mid i \in A\}$.

In order to reconstruct the missing elements from the Bayesian point of view, we first formulate the posterior probability of the missing elements, $P(x_{\mathcal{M}} \mid y)$, by the Bayesian rule

$$P(x_{\mathcal{M}} \mid y) = \frac{P(x_{\mathcal{M}}, y)}{P(y)}, \quad (1)$$

where the value of $y$ is fixed by the observation. By using Dirac's delta, we have

$$P(x_{\mathcal{M}}, y) = \Big( \prod_{i \in \mathcal{O}} \delta(y_i - x_i) \Big) P(x_{\mathcal{M}}, x_{\mathcal{O}}) = \Big( \prod_{i \in \mathcal{O}} \delta(y_i - x_i) \Big) P_{\mathrm{prior}}(x). \quad (2)$$

It should be noted that, if $x$ are discrete variables, Dirac's delta is replaced by Kronecker's delta.
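To make the role of the delta-function likelihood in equation (2) concrete: it simply clamps the observed elements, so maximizing the posterior over $x_{\mathcal{M}}$ amounts to maximizing the prior with $x_{\mathcal{O}}$ fixed. The following minimal sketch illustrates this for three binary variables with an arbitrary made-up prior table; the numbers are illustrative assumptions, not from the paper.

```python
# Toy MAP reconstruction: with a delta-function likelihood, the
# posterior P(x_M | y) is proportional to the prior with the observed
# entries clamped, so we enumerate x_M and pick the maximizer.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 3
# unnormalized prior P_prior(x) over {0,1}^3 (arbitrary positive table)
prior = rng.random((2, 2, 2))

observed = {0: 1}                       # O = {0}, observed value y_0 = 1
missing = [i for i in range(n) if i not in observed]

best, best_p = None, -1.0
for xm in itertools.product([0, 1], repeat=len(missing)):
    x = [0] * n
    for i, yi in observed.items():
        x[i] = yi                       # clamp observed elements
    for i, v in zip(missing, xm):
        x[i] = v                        # candidate values for x_M
    p = prior[tuple(x)]                 # proportional to P(x_M | y)
    if p > best_p:
        best, best_p = xm, p

print("MAP estimate of x_M:", best)
```

For discrete variables the Kronecker delta plays the same clamping role, which is why the enumeration above never varies the observed entry.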
From equations (1) and (2), we have

$$P(x_{\mathcal{M}} \mid y) \propto P_{\mathrm{likelihood}}(y \mid x) P_{\mathrm{prior}}(x), \quad (3)$$

where

$$P_{\mathrm{likelihood}}(y \mid x) := \prod_{i \in \mathcal{O}} \delta(y_i - x_i).$$

Probability $P_{\mathrm{likelihood}}(y \mid x)$ is referred to as the likelihood in the Bayesian framework. In Bayesian reconstruction, we consider the suitable reconstructed values of the unobserved elements, $x_{\mathcal{M}}^*$, to be the values of $x_{\mathcal{M}}$ that maximize the posterior probability in equation (3), i.e.,

$$x_{\mathcal{M}}^* = \arg\max_{x_{\mathcal{M}}} P(x_{\mathcal{M}} \mid y) = \arg\max_{x_{\mathcal{M}}} P_{\mathrm{likelihood}}(y \mid x) P_{\mathrm{prior}}(x). \quad (4)$$

The above reconstruction scheme requires prior probabilities that describe the hidden probabilistic mechanisms of observations. However, unfortunately, in almost all situations we do not know the details of the prior probabilities. Therefore, in order to implement a Bayesian reconstruction system, we should model the unknown prior probabilities.

2.1 Prior Modeling based on Markov Random Fields

One of the models of prior probabilities of observed data that is presently available is the Markov random field (MRF). MRFs can easily treat complex spatial interactions among observational data points that create a variety of appearance patterns. Consider an undirected graph $G(V, E)$, where $V = \{1, 2, \ldots, n\}$ is the set of vertices and $E = \{(i, j)\}$ is the set of undirected edges between the vertices. An MRF is usually defined on such an undirected graph $G(V, E)$ by assigning each variable $x_i$ to the corresponding vertex $i$. Edge $(i, j)$ expresses a spatial interaction between variable $x_i$ and variable $x_j$. To construct a probabilistic model, we define the joint probability of $x$, $P_{\mathrm{model}}(x)$. On the undirected graph $G(V, E)$, if we assume a spatial Markov property among the random variables, $x$, and the positivity of the model, $P_{\mathrm{model}}(x) >$
$0$, then by the Hammersley-Clifford theorem the model can be expressed as

$$P_{\mathrm{model}}(x) = \frac{1}{Z} \exp\Big( \sum_{i \in V} \phi_i(x_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j) \Big), \quad (5)$$

without loss of generality. The first term in the exponent, $\phi_i(x_i)$, is a potential function on vertex $i$ that determines the characteristic of $x_i$, and the second term in the exponent, $\psi_{ij}(x_i, x_j)$, is a potential function between vertices $i$ and $j$ that determines the interaction between $x_i$ and $x_j$. $\sum_{(i,j) \in E}$ represents the summation running over all edges, and $Z$ denotes the normalization constant, sometimes referred to as the partition function, defined by

$$Z := \int_{-\infty}^{\infty} \exp\Big( \sum_{i \in V} \phi_i(x_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j) \Big) \, dx.$$

The model in equation (5) is the MRF that is most frequently used. In the MRF, let us consider the conditional probability of $x_i$ expressed as

$$P_{\mathrm{model}}(x_i \mid x_{-i}) = \frac{P_{\mathrm{model}}(x)}{\int_{-\infty}^{\infty} P_{\mathrm{model}}(x) \, dx_i}, \quad (6)$$

where $x_{-i}$ denotes the set of all variables except $x_i$: $x_{-i} = \{x_j \mid j \in V \setminus \{i\}\}$. Equations (5) and (6) lead to

$$P_{\mathrm{model}}(x_i \mid x_{-i}) = P_{\mathrm{model}}(x_i \mid x_{\partial(i)}), \quad (7)$$

where $\partial(i)$ denotes the set of vertices connected to vertex $i$ in the graph, and $x_{\partial(i)}$ denotes the set of variables on the vertices belonging to $\partial(i)$; that is, $x_{\partial(i)}$ is the set of nearest-neighbor variables of $x_i$: $x_{\partial(i)} = \{x_j \mid j \in \partial(i)\}$. Equation (7) states that the variable $x_i$ depends only on its nearest-neighbor variables in the conditional probability, and this constitutes the spatial Markov property of the MRF.

In order to implement the MRF in equation (5), the forms of the potential functions should be determined. This is one of the most important points in MRF modeling. Parametrically, we model the potential functions by certain parametric functions with parameter $\theta$,

$$P_{\mathrm{model}}(x \mid \theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_{i \in V} \phi_i(x_i \mid \theta_i) + \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j \mid \theta_{ij}) \Big). \quad (8)$$
Thus, we should find the optimal values of the parameters. The standard method for achieving this is provided by the field of machine learning theory, as described below.

The objective of MRF modeling is to model the unknown prior probability of observations, $P_{\mathrm{prior}}(x)$. Therefore, the optimal values of the parameters, $\theta^*$, should minimize some distance between the prior probability and our model $P_{\mathrm{model}}(x \mid \theta)$. The Kullback-Leibler divergence (KLD),

$$K(P_1 \| P_2) := \int_{-\infty}^{\infty} P_1(x) \ln \frac{P_1(x)}{P_2(x)} \, dx, \quad (9)$$

is often utilized as a measure of the difference between two distinct probabilities, $P_1(x)$ and $P_2(x)$. The value of the KLD is always non-negative and is zero when the two probabilities are equivalent. Thus, we consider that two distinct probabilities are close to each other when the value of the KLD is small. In terms of the KLD, we suppose the optimal values of the parameters are given by minimizing the value of the KLD between the prior probability and our model,

$$\theta^* = \arg\min_{\theta} K(P_{\mathrm{prior}} \| P_{\mathrm{model}}).$$

However, we cannot perform this minimization because we do not know the prior probability.

Since we do not know the prior probability, we suppose instead that we have many complete observations generated from the prior probability. We describe the observations by $D = \{y^{(\mu)} \in \mathbb{R}^n \mid \mu = 1, 2, \ldots, N\}$, and we define the empirical distribution of the $N$ complete observations by

$$Q_D(x) := \frac{1}{N} \sum_{\mu=1}^{N} \prod_{i \in V} \delta\big(y_i^{(\mu)} - x_i\big). \quad (10)$$

The empirical distribution is the frequency distribution of the $N$ complete observations. It should be noted that, if $x$ are discrete variables, Dirac's delta is again replaced by Kronecker's delta. We suppose the empirical distribution has some important properties of the prior probability, and we suppose that the optimal values of the parameters are approximately obtained by minimizing the value of the KLD between the empirical distribution and our model,

$$\theta^* = \arg\min_{\theta} K(P_{\mathrm{prior}} \| P_{\mathrm{model}}) \approx \arg\min_{\theta} K(Q_D \| P_{\mathrm{model}}). \quad (11)$$
This minimization can be performed if we have complete observations generated from the prior probability. (Here, "complete" means that each observation includes no missing points.) Equation (11) is rewritten as

$$\theta^* \approx \arg\min_{\theta} K(Q_D \| P_{\mathrm{model}}) = \arg\max_{\theta} \int_{-\infty}^{\infty} Q_D(x) \ln P_{\mathrm{model}}(x \mid \theta) \, dx.$$

This corresponds to the maximum likelihood estimation (MLE) in statistics. In the above scheme, we assumed there are no missing points in the $N$ observations used in the estimation of the parameters. If the observations include missing points, we use an alternative strategy and apply the expectation-maximization (EM) algorithm.

From the above arguments, the (parametric) Bayesian reconstruction system is summarized as follows. Before the reconstructions, we design the potential functions in our MRF model $P_{\mathrm{model}}(x \mid \theta)$ and estimate the optimal values of the parameters, $\theta^*$, in advance by using many complete observations and equation (11). Then, the reconstruction is approximately performed using the constructed model $P_{\mathrm{model}}(x \mid \theta^*)$ instead of the prior probability in equation (4), i.e.,

$$x_{\mathcal{M}}^* = \arg\max_{x_{\mathcal{M}}} P_{\mathrm{likelihood}}(y \mid x) P_{\mathrm{prior}}(x) \approx \arg\max_{x_{\mathcal{M}}} P_{\mathrm{likelihood}}(y \mid x) P_{\mathrm{model}}(x \mid \theta^*). \quad (12)$$

Since

$$\arg\max_{x_{\mathcal{M}}} P_{\mathrm{likelihood}}(y \mid x) P_{\mathrm{model}}(x \mid \theta^*) = \arg\max_{x_{\mathcal{M}}} P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}} = y, \theta^*),$$

the suitable reconstructed values of the unobserved missing points, $x_{\mathcal{M}}^*$, are the values that maximize the conditional probability of our model,

$$P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}}, \theta^*) = \frac{P_{\mathrm{model}}(x \mid \theta^*)}{\int_{-\infty}^{\infty} P_{\mathrm{model}}(x \mid \theta^*) \, dx_{\mathcal{M}}},$$

with $x_{\mathcal{O}}$ fixed by the observation $y$.

In this section, we give an overview of the application of the Bayesian reconstruction scheme presented in the previous section to the traffic data reconstruction problem proposed by the authors [6], together with some new numerical results.

3.1 MRF model for Bayesian Traffic Data Reconstruction

We applied the Bayesian reconstruction scheme to traffic data reconstruction.
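As a minimal illustration of the maximum-likelihood step in equation (11), consider the degenerate case with no interactions ($J = 0$ in the Gaussian model used later), where the model factorizes into independent Gaussians $N(h/\xi, 1/\xi)$ and the MLE has the closed form $\hat{\xi} = 1/\widehat{\mathrm{var}}$, $\hat{h} = \widehat{\mathrm{mean}} \cdot \hat{\xi}$. This is a toy assumption for demonstration; learning the full MRF requires gradient-based maximization of the log-likelihood.

```python
# Closed-form MLE for the interaction-free model
#   P(x | h, xi) ∝ exp(h x - xi x^2 / 2) = N(h/xi, 1/xi),
# fitted to synthetic "complete observations". The true parameter
# values below are illustrative.
import numpy as np

rng = np.random.default_rng(1)
h_true, xi_true = 2.0, 4.0
data = rng.normal(h_true / xi_true, np.sqrt(1.0 / xi_true), size=100_000)

xi_hat = 1.0 / data.var()        # MLE of the variance parameter
h_hat = data.mean() * xi_hat     # MLE of the bias parameter
print(f"h ≈ {h_hat:.2f}, xi ≈ {xi_hat:.2f}")  # close to (2.0, 4.0)
```

With enough complete observations the estimates concentrate around the true parameters, which is the empirical-distribution argument behind equation (11).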
Our goal is to reconstruct the states of roads, and therefore, the random variables $x$ are assigned to roads. In order to formulate an MRF for a road network, we construct an undirected graph $G(V, E)$ as follows: we assign a vertex to each road and draw an edge between two roads that are connected to each other at a traffic intersection (see figure 2).

Figure 2: Undirected graph representation for a road network. (a) Road network with six roads and two intersections. (b) Vertices are assigned to roads. (c) Edges are drawn between two roads that are connected to each other at intersections.

On the undirected graph, we define the MRF by

$$P_{\mathrm{model}}(x \mid \theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_{i \in V} h_i x_i - \frac{\xi}{2} \sum_{i \in V} x_i^2 - \frac{J}{2} \sum_{(i,j) \in E} (x_i - x_j)^2 \Big), \quad (13)$$

where $\theta = \{h, \xi, J\}$ are the parameters of the model. The variable $x_i \in \mathbb{R}$ expresses the state of road $i$. In this paper, we consider $x$ as traffic densities, according to the method in reference [6]. The traffic density on road $i$ is defined by the number of cars per unit area on road $i$; high densities tend to lead to traffic jams. This MRF is obtained by setting

$$\phi_i(x_i) = h_i x_i - \xi x_i^2 / 2, \qquad \psi_{ij}(x_i, x_j) = -J (x_i - x_j)^2 / 2$$

in equation (8). Here, $h_i$ is the bias that controls the level of the traffic density of road $i$, and parameters $\xi > 0$ and $J \geq 0$ determine the strengths of the on-site potential and the interaction between neighboring roads, respectively. We denote the set of unobserved roads by $\mathcal{M}$ and the set of observed roads by $\mathcal{O}$. After determining the values of the parameters by the machine learning method in equation (11), from equation (12), the reconstructed densities on the unobserved roads are obtained by

$$x_{\mathcal{M}}^* = \arg\max_{x_{\mathcal{M}}} P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}} = y, \theta^*), \quad (14)$$

where $y$ represents the densities on the observed roads. Since our model in equation (13) is a multi-dimensional Gaussian, the conditional probability is also a multi-dimensional Gaussian. Therefore, equation (14) is rewritten as

$$x_{\mathcal{M}}^* = \int_{-\infty}^{\infty} x_{\mathcal{M}} \, P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}} = y, \theta^*) \, dx_{\mathcal{M}}. \quad (15)$$

Hence, we find that the reconstructed densities, $x_{\mathcal{M}}^*$, are the expectations of $P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}} = y, \theta^*)$.
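Because the conditional distribution in equation (15) is Gaussian, those expectations can be computed exactly by one linear solve: equation (13) implies a precision matrix $A$ with $A_{ii} = \xi + |\partial(i)| J$ and $A_{ij} = -J$ on edges, and the conditional mean of the missing block satisfies $A_{\mathcal{MM}} x_{\mathcal{M}} = h_{\mathcal{M}} - A_{\mathcal{MO}} y$. The graph, parameters, and observations in the following sketch are illustrative assumptions, not taken from the Sendai network.

```python
# Exact Gaussian conditioning for the MRF of equation (13) on a tiny
# illustrative "road network" (a 4-node path graph).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]        # roads connected at intersections
n, xi, J = 4, 1.0, 0.5                  # made-up parameter values
h = np.array([0.2, 0.4, 0.4, 0.2])      # made-up biases

# Build the precision matrix implied by equation (13).
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = -J
    A[i, i] += J
    A[j, j] += J
A += xi * np.eye(n)

M, O = [1, 2], [0, 3]                   # missing and observed roads
y = np.array([0.30, 0.25])              # observed densities (assumed)

# Conditional mean: solve A_MM x_M = h_M - A_MO y.
x_M = np.linalg.solve(A[np.ix_(M, M)], h[M] - A[np.ix_(M, O)] @ y)
print("reconstructed densities:", x_M)
```

On a real network the same solve applies, with $A_{\mathcal{MM}}$ sparse and amenable to iterative solvers.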
The expectations are obtained by solving the simultaneous equations

$$x_i = \frac{1}{\xi + |\partial(i)| J} \Big( h_i + J \sum_{j \in \partial(i)} z_j \Big), \quad i \in \mathcal{M}, \quad (16)$$

(Footnote: Although this expression seems to differ slightly from the original model proposed in reference [6], it is essentially equivalent to the original model.)
by an iteration method, where $|A|$ denotes the number of elements in the assigned set $A$ and

$$z_j = \begin{cases} x_j & j \in \mathcal{M} \\ y_j & j \in \mathcal{O} \end{cases}.$$

Equation (16) is known as the mean-field equation, which is obtained by the naive mean-field approximation for $P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}} = y, \theta^*)$, and is also known as the Gauss-Seidel method. It is known that in Gaussian graphical models (GGMs) mean-field equations and Gauss-Seidel methods are equivalent in general, and that mean-field equations always provide exact expectations [7].

In this section, we describe the application of our Bayesian traffic data reconstruction to the road network of the city of Sendai (shown in figure 3) and show the performance of our model. Figure 3 shows the road network of Sendai, which consists of about ten thousand roads. According to our Bayesian traffic data reconstruction scheme, we first defined the MRF model shown in equation (13) for the road network. The structure of the MRF was constructed according to figure 2.

Figure 3: Road network of Sendai, Japan. This network consists of about ten thousand roads.

The parameters were determined by the maximum likelihood estimation shown in section 2.2 with regularization [6]. Regularizations are frequently used to avoid over-fitting to noise in training data. In the maximum likelihood estimation, we used $N = 359$ complete traffic data generated by a traffic simulator for the road network of Sendai. Although the traffic data were not real, they were presumed to represent typical behavior of traffic in Sendai. Using the MRF model, we reconstructed the traffic densities of Sendai.

Figure 4: True traffic density used in our numerical experiment. Each road is colored according to its traffic density. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the center of Sendai.

Figure 4 shows the traffic density data that were not used in the learning, namely, the test data.
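As a concrete illustration of the iterative solution of equation (16) discussed above, the following sketch runs the Gauss-Seidel (naive mean-field) sweeps on a small path graph; the graph and all numbers are illustrative assumptions, not taken from the Sendai network.

```python
# Gauss-Seidel / naive mean-field iteration of equation (16):
#   x_i = (h_i + J * sum_{j in ∂(i)} z_j) / (xi + |∂(i)| J),
# where z_j is the current estimate x_j for missing neighbors and the
# observed value y_j otherwise. All values below are made up.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # 4-road path graph
xi, J = 1.0, 0.5
h = {0: 0.2, 1: 0.4, 2: 0.4, 3: 0.2}
y = {0: 0.30, 3: 0.25}                  # observed roads O = {0, 3}
missing = [1, 2]                        # unobserved roads M = {1, 2}

x = {i: 0.0 for i in missing}           # initialization
for _ in range(100):                    # sweep until the fixed point
    for i in missing:
        s = sum(x[j] if j in x else y[j] for j in neighbors[i])
        x[i] = (h[i] + J * s) / (xi + len(neighbors[i]) * J)

print({i: round(v, 4) for i, v in x.items()})
```

Consistent with the equivalence noted above for GGMs, the fixed point of these sweeps coincides with the exact conditional expectations of the Gaussian model.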
In order to visually represent the densities, we quantized the densities into five stages at 0.03 intervals; each road is colored according to its quantized traffic density. The densities increase in the following order: black, blue, green, yellow, and red. Thus, a road colored black has a density in the interval [0, 0.03). In our experiment, each road was treated as unobserved with missing probability $p = 0.$
8. Figure 5 shows the positions of the unobserved roads, which are colored red. We reconstructed the densities on the unobserved roads using the densities on the observed roads, which are colored black in the figure. Our Bayesian reconstruction result is shown in figure 6.

Figure 5: Positions of unobserved roads, where the unobserved roads are colored red. About 80% of roads are unobserved. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the center of Sendai.

Figure 6: Reconstruction result obtained by using our Bayesian reconstruction, where each road is colored according to its traffic density. The network in the left panel is the entire network and the network in the right panel is an enlarged image of the central area of Sendai.

The mean square error (MSE) between the true densities shown in figure 4 and the reconstructed densities shown in figure 6 on the unobserved roads is approximately 0.001447, where the MSE is defined by

$$\mathrm{MSE} := \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \big( x_i^{\mathrm{data}} - x_i^{\mathrm{recon}} \big)^2, \quad (17)$$

where $x_i^{\mathrm{data}}$ is the true density on road $i$ and $x_i^{\mathrm{recon}}$ is the density on road $i$ reconstructed by our method. The scatter plot of this reconstruction is shown in figure 7. The correlation coefficient of this scatter plot is approximately 0.919, which is close to one. Thus, our simple MRF model can be expected to capture a static statistical property of the traffic data.

Next, we address the average performance of our reconstruction method versus the value of the missing probability $p$. Figure 8 shows the MSE versus the value of the missing probability $p$. Each point is the average value of the MSE over 100 trials using the leave-one-out cross-validation method. It can be seen that the error increases with the value of the missing probability $p$.
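The MSE of equation (17) and the correlation coefficient used for the scatter plot can be computed as follows; the arrays are random stand-ins for the true and reconstructed densities, so the printed numbers do not correspond to the experiment.

```python
# Evaluation metrics: MSE over the missing set (equation (17)) and the
# correlation coefficient of the true-vs-reconstructed scatter plot.
# Synthetic densities are used purely for illustration.
import numpy as np

rng = np.random.default_rng(2)
x_data = rng.uniform(0.0, 0.15, size=1000)           # "true" densities
x_recon = x_data + rng.normal(0.0, 0.02, size=1000)  # "reconstructed"

mse = np.mean((x_data - x_recon) ** 2)
corr = np.corrcoef(x_data, x_recon)[0, 1]
print(f"MSE = {mse:.4f}, correlation = {corr:.3f}")
```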
In this section, we clarify the relationship between the model parameters in the MRF model in equation (13) and the reconstruction performance from a statistical mechanical point of view.

Figure 7: Scatter plot of the true densities in figure 4 and the reconstructed densities in figure 6.

Figure 8: MSE versus the missing probability $p$. The solid curve is the Bezier interpolation of the points.

In our analysis, we assume that the prior probability of traffic data has the same form as our model in equation (13),

$$P_{\mathrm{prior}}(x \mid \theta_0) = \frac{1}{Z(\theta_0)} \exp\Big( \sum_{i \in V} h_i^0 x_i - \frac{\xi_0}{2} \sum_{i \in V} x_i^2 - \frac{J_0}{2n} \sum_{i<j} (x_i - x_j)^2 \Big), \quad (18)$$

where, for the convenience of the analysis, the interactions are defined over all pairs of variables, and $\theta_0 = \{h^0, \xi_0, J_0\}$ are the true parameters. The parameters $\theta = \{h, \xi, J\}$ of the reconstruction model may deviate from $\theta_0$; in particular, the errors in the biases, $\varepsilon_i := h_i - h_i^0$, are assumed to be drawn from a distribution $p_\varepsilon(\varepsilon)$ with mean $\mu_\varepsilon$ and variance $\sigma_\varepsilon^2$. The reconstruction performance is measured by the quantity $E$ defined in equation (27).

Figure 9 shows the plot of $E$ against the error $r$ when $\xi = \xi_0$, $\mu_\varepsilon = \sigma_\varepsilon = 0$, $p = 0.5$, and $J = J_0 + r$. It can be seen that, where $J$ is bigger than $J_0$, the performance level is relatively robust. In contrast, the performance level drastically decreases when $J$ is smaller than $J_0$.

Figure 9: Plot of $E$ against error $r$ when $\xi = \xi_0$, $\mu_\varepsilon = \sigma_\varepsilon = 0$, $p = 0.5$, and $J = J_0 + r$. The vertical axis is $E$ in equation (27). The inset is an enlarged plot around $r = 0$.

From equation (27), it can be seen that the dependency on the missing rate $p$ arises when model errors exist in the bias parameters or in the variance parameters $\xi$ and $\xi_0$. Next, we consider the case where model errors exist in the bias parameters and the variance parameters. Figure 10 shows the plot of $E$ against the missing rate $p$ when $\sigma_\varepsilon = 0$ and $J = J_0$, with nonzero model errors in $\xi$ and $\mu_\varepsilon$. The reconstruction performance level decreases with the increase in the value of $p$. This performance behavior seems to be qualitatively similar to the behavior of our numerical traffic density reconstruction in figure 8. From this result, we can presume that the behavior of the MSE in figure 8 is caused primarily by the model errors in either the biases or the variance, or both.

The parameters in our reconstruction model were determined by the training data as described in section 3.2.
However, in the training, it was not easy to find the truly optimal values of the parameters from the training data, because the number of training data was much smaller than the number of parameters. This could be one of the reasons why the model errors exist.

Figure 10: Plot of $E$ against missing rate $p$ when $\sigma_\varepsilon = 0$ and $J = J_0$, with nonzero model errors in $\xi$ and $\mu_\varepsilon$. The vertical axis is $E$ in equation (27).

In this paper, we introduced the Bayesian reconstruction framework, in which missing data are probabilistically interpolated, and overviewed the application of Bayesian reconstruction to the problem of traffic data reconstruction in the field of traffic engineering. Our traffic reconstruction model in equation (13) neglects some real traffic properties, for example, traffic lanes, contraflows, and so on. Nevertheless, the results of our reconstruction seem to be accurate. It can be expected that our simple GGM captures the static statistical properties of traffic data and that it can be used as an important base model for Bayesian traffic reconstruction. The extension of our model by taking real traffic properties into account should be addressed in a future study.

In the latter part of this paper, we evaluated the statistical performance of our reconstruction by using a statistical mechanical analysis, that is, a mean-field analysis. In our analysis, we used a simplified reconstruction model, which has no network structure, for the convenience of the calculations. However, since real road networks do have network structures, the result of our evaluation can only be a rough approximation. The authors have proposed a method based on belief propagation to evaluate the statistical properties of Bayesian reconstruction on structured networks in the context of image processing [8]. We expect that the method can be applied to Bayesian traffic reconstruction and can lead to more realistic evaluations.
A Mean-field Analysis for the Traffic Data Reconstruction Model

The partition function of the model in equation (18) is

$$Z(\theta) = \int_{-\infty}^{\infty} \exp\Big( \sum_{i \in V} h_i x_i - \frac{\xi}{2} \sum_{i \in V} x_i^2 - \frac{J}{2n} \sum_{i<j} (x_i - x_j)^2 \Big) \, dx.$$

Performing the Gaussian integral, the free energy per variable, $F(\theta) := -\frac{1}{n} \ln Z(\theta)$, is

$$F(\theta) = -\frac{1}{2} \ln \frac{2\pi}{\xi + J} - \frac{1}{2n(\xi + J)} \sum_{i \in V} h_i^2 - \frac{J}{2n^2(\xi + J)\xi} \Big( \sum_{i \in V} h_i \Big)^2, \quad (30)$$

so that the first- and second-order moments of the prior model are given by

$$\langle x_i \rangle_{\mathrm{prior}} = -n \frac{\partial F(\theta)}{\partial h_i} = \frac{h_i}{\xi + J} + \frac{J}{n(\xi + J)\xi} \sum_{j \in V} h_j \quad (31)$$

and

$$\langle x_i x_j \rangle_{\mathrm{prior}} = -n \frac{\partial^2 F(\theta)}{\partial h_i \partial h_j} + \langle x_i \rangle_{\mathrm{prior}} \langle x_j \rangle_{\mathrm{prior}} = \frac{\delta_{ij}}{\xi + J} + \frac{J}{n(\xi + J)\xi} + \langle x_i \rangle_{\mathrm{prior}} \langle x_j \rangle_{\mathrm{prior}}, \quad (32)$$

respectively, where $\langle \cdots \rangle_{\mathrm{prior}} = \int_{-\infty}^{\infty} (\cdots) P_{\mathrm{prior}}(x \mid \theta) \, dx$ and $\delta_{ij}$ is Kronecker's delta.

Next, we find the free energy for the conditional probability of the reconstruction model in equation (20). The conditional probability is expressed by

$$P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}}, \theta) = \frac{1}{Z(\theta, x_{\mathcal{O}})} \exp\Big( \sum_{i \in \mathcal{M}} \big(\beta_i + f(x_{\mathcal{O}})\big) x_i - \frac{1}{2}\Big(\xi + \frac{(n-1)J}{n}\Big) \sum_{i \in \mathcal{M}} x_i^2 + \frac{J}{n} \sum_{\substack{i<j \\ i,j \in \mathcal{M}}} x_i x_j \Big),$$

where $\beta_i$ is the bias of road $i$ in the reconstruction model and $f(x_{\mathcal{O}}) := (J/n) \sum_{j \in \mathcal{O}} x_j$. The corresponding free energy, $F(\theta, x_{\mathcal{O}}) := -\frac{1}{t} \ln Z(\theta, x_{\mathcal{O}})$, with $t := |\mathcal{M}|$ and missing rate $p := t/n$, is

$$F(\theta, x_{\mathcal{O}}) = -\frac{1}{2} \ln \frac{2\pi}{\xi + J} - \frac{1}{2t(\xi + J)} \sum_{i \in \mathcal{M}} \big(\beta_i + f(x_{\mathcal{O}})\big)^2 - \frac{pJ}{2t^2(\xi + J)\big(\xi + (1-p)J\big)} \Big( \sum_{i \in \mathcal{M}} \big(\beta_i + f(x_{\mathcal{O}})\big) \Big)^2, \quad (34)$$

so that the first-order moments of the conditional probability of the reconstruction model are given by

$$\langle x_i \rangle_{\mathcal{M}|\mathcal{O}} = -t \frac{\partial F(\theta, x_{\mathcal{O}})}{\partial \beta_i} = \frac{\beta_i + f(x_{\mathcal{O}})}{\xi + J} + \frac{pJ}{t(\xi + J)\big(\xi + (1-p)J\big)} \sum_{j \in \mathcal{M}} \big(\beta_j + f(x_{\mathcal{O}})\big) \quad (35)$$

for all $i \in \mathcal{M}$, where we define $\langle \cdots \rangle_{\mathcal{M}|\mathcal{O}} := \int_{-\infty}^{\infty} (\cdots) P_{\mathrm{model}}(x_{\mathcal{M}} \mid x_{\mathcal{O}}, \theta) \, dx_{\mathcal{M}}$.

We consider the thermodynamic limit by taking the limits $n, t \to \infty$ while $p$ is fixed at a finite constant.
In the thermodynamic limit, the free energies in equations (30) and (34) reduce to

$$F(\theta) = -\frac{1}{2} \ln \frac{2\pi}{\xi + J} - \frac{\sigma_h^2 + \mu_h^2}{2(\xi + J)} - \frac{J \mu_h^2}{2(\xi + J)\xi}$$

and

$$F(\theta, x_{\mathcal{O}}) = -\frac{1}{2} \ln \frac{2\pi}{\xi + J} - \frac{\sigma_h^2 + \sigma_\varepsilon^2 + \big(\mu_h + \mu_\varepsilon + f(x_{\mathcal{O}})\big)^2}{2(\xi + J)} - \frac{pJ \big(\mu_h + \mu_\varepsilon + f(x_{\mathcal{O}})\big)^2}{2(\xi + J)\big(\xi + (1-p)J\big)},$$

respectively, where $\mu_h$ and $\sigma_h^2$ are the average and the variance of $p_h(h)$, respectively, and $\mu_\varepsilon$ and $\sigma_\varepsilon^2$ are the average and the variance of $p_\varepsilon(\varepsilon)$, respectively. $\{\mu_h, \mu_\varepsilon, \sigma_h^2, \sigma_\varepsilon^2\}$ appear because of the law of large numbers. Similarly, the moments in equations (31), (32), and (35) reduce to

$$\langle x_i \rangle_{\mathrm{prior}} = \frac{h_i}{\xi + J} + \frac{J \mu_h}{(\xi + J)\xi}, \quad (36)$$

$$\langle x_i x_j \rangle_{\mathrm{prior}} = \frac{\delta_{ij}}{\xi + J} + \langle x_i \rangle_{\mathrm{prior}} \langle x_j \rangle_{\mathrm{prior}}, \quad (37)$$

and

$$\langle x_i \rangle_{\mathcal{M}|\mathcal{O}} = \frac{\beta_i + f(x_{\mathcal{O}})}{\xi + J} + \frac{pJ \big(\mu_h + \mu_\varepsilon + f(x_{\mathcal{O}})\big)}{(\xi + J)\big(\xi + (1-p)J\big)}, \quad (38)$$

respectively, in the thermodynamic limit.

Acknowledgment

The authors are very grateful to Professor Masao Kuwahara and Dr. Jinyoung Kim of the Graduate School of Information Sciences, Tohoku University, for providing road network data and traffic simulation data. This work was partially supported by grants-in-aid (nos. 25280089, 24700220, and 25-7259) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
References

[1] Bertalmío, M., Sapiro, G., Caselles, V., Ballester, C., "Image inpainting," Proceedings of SIGGRAPH 2000 (2000).
[2] Yasuda, M., Tanaka, K., "Mean field approximation for fields of experts," Interdisciplinary Information Sciences: 113–119 (2013).
[3] Yasuda, M., Ohkubo, J., Tanaka, K., "Digital image inpainting based on Markov random field," Proceedings of 2005 International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA'05): 747–752 (2005).
[4] Yasuda, M., Ohkubo, J., Tanaka, K., "Digital image inpainting algorithm by using Gaussian graphical model," Information Technology Letters (Proceedings of Forum of Information Technology (FIT) 2006): 225–228 (2006) (in Japanese).
[5] Roth, S., Black, M.J., "Fields of experts," International Journal of Computer Vision: 205–229 (2009).
[6] Kataoka, S., Yasuda, M., Furtlehner, C., Tanaka, K., "Traffic data reconstruction based on Markov random field modeling," Inverse Problems: 025003 (2014).
[7] Weiss, Y., Freeman, W.T., "What makes a good model of natural images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2007).
[8] Kataoka, S., Yasuda, M., Tanaka, K., "Statistical analysis of Gaussian image inpainting problems," Journal of the Physical Society of Japan, 81.