Random mesh projectors for inverse problems
Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić
Konik Kothari*, University of Illinois at Urbana-Champaign, [email protected]
Sidharth Gupta*, University of Illinois at Urbana-Champaign, [email protected]
Maarten V. de Hoop, Rice University, [email protected]
Ivan Dokmanić, University of Illinois at Urbana-Champaign, [email protected]

ABSTRACT
We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to directly learn the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and corruptions not seen during training than a directly learned inverse.
1 INTRODUCTION
A variety of imaging inverse problems can be discretized to a linear system y = Ax + η, where y ∈ R^M is the measured data, A ∈ R^{M×N} the imaging or forward operator, x ∈ X ⊂ R^N the object being probed by applying A (often called the model), and η the noise. Depending on the application, the set of plausible reconstructions X could model natural, seismic, or biomedical images. In many cases the resulting inverse problem is ill-posed, either because of the poor conditioning of A (a consequence of the underlying physics) or because M ≪ N.

A classical approach to solve ill-posed inverse problems is to minimize an objective functional regularized via a certain norm (e.g. ℓ1, ℓ2, total variation (TV) seminorm) of the model. These methods promote general properties such as sparsity or smoothness of reconstructions, sometimes in combination with learned synthesis or analysis operators, or dictionaries (Sprechmann et al. (2013)).

In this paper, we address situations with very sparse measurement data (M ≪ N) so that even a coarse reconstruction of the unknown model is hard to get with traditional regularization schemes. Unlike artifact-removal scenarios where applying a regularized pseudoinverse of the imaging operator already brings out considerable structure, we look at applications where standard techniques cannot produce a reasonable image (Figure 1). This highly unresolved regime is common in geophysics and requires alternative, more involved strategies (Galetti et al. (2017)).

An appealing alternative to classical regularizers is to use deep neural networks. For example, generative models (GANs) based on neural networks have recently achieved impressive results in regularization of inverse problems (Bora et al. (2018); Lunz et al. (2018)). However, a difficulty in geophysical applications is that there are very few examples of ground truth models available for training (sometimes none at all). Since GANs require many, they cannot be applied to such problems. This suggests looking for methods that are not very sensitive to the training dataset. Conversely, it means that the sought reconstructions are less detailed than what is expected in data-rich settings; for an example, see the reconstructions of the Tibetan plateau (Yao et al. (2006)).

∗ S. Gupta and K. Kothari contributed equally.
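The instability described above can be made concrete in a few lines of NumPy: for an ill-conditioned forward operator, a naive pseudoinverse amplifies even tiny measurement noise roughly by the condition number. This is a minimal synthetic sketch (a random ill-conditioned A, not the tomography operators considered later):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned forward operator: fast-decaying singular values.
N, M = 50, 50
U, _ = np.linalg.qr(rng.standard_normal((M, M)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = 10.0 ** -np.linspace(0, 6, N)         # condition number about 1e6
A = U @ np.diag(s) @ V.T

x = rng.standard_normal(N)                # ground-truth model
y = A @ x                                  # clean data
eta = 1e-4 * rng.standard_normal(M)        # small measurement noise

x_hat = np.linalg.pinv(A) @ (y + eta)      # naive pseudoinverse reconstruction

rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(f"relative error: {rel_err:.2e}")    # large despite the tiny noise
```

With noiseless data the pseudoinverse recovers x almost exactly; with noise of size 1e-4 the reconstruction error is of order one, because components associated with the smallest singular values are amplified.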
Figure 1: We reconstruct an image x from its tomographic measurements. Top (moderate): parallel beam tomography with 25 view angles; bottom (severe): traveltime tomography with 25 sensors and insufficient ground truth samples. In moderately ill-posed problems, conventional methods based on the pseudoinverse and regularized non-negative least squares (x ∈ [0,1]^N, N is the image dimension) give correct structural information. In fact, total variation (TV) approaches give very good results. A neural network (Jin et al. (2016)) can be trained to directly invert and remove the artifacts (NN). In a severely ill-posed problem on the other hand (explained in Figure 4) with insufficient ground truth training data, neither the classical techniques nor a neural network recover salient geometric features.

In this paper, we propose a two-stage method to solve ill-posed inverse problems using random low-dimensional projections and convolutional neural networks. We first decompose the inverse problem into a collection of simpler learning problems of estimating projections into random (but structured) low-dimensional subspaces of piecewise-constant images. Each projection is easier to learn in terms of generalization error (Cooper (1995)) thanks to its lower Lipschitz constant.

In the second stage, we solve a new linear inverse problem that combines the estimates from the different subspaces. We show that this converts the original problem with possibly non-local (often tomographic) measurements into an inverse problem with localized measurements, and that in fact, in expectation over random subspaces the problem becomes a deconvolution. Intuitively, projecting into piecewise-constant subspaces is equivalent to estimating local averages, a simpler problem than estimating individual pixel values. Combining the local estimates lets us recover the underlying structure. We believe that this technique is of independent interest in addressing inverse problems.

We test our method on linearized seismic traveltime tomography (Bording et al. (1987); Hole (1992)) with sparse measurements and show that it outperforms learned direct inversion in quality of achieved reconstructions, robustness to measurement errors, and (in)sensitivity to the training data. The latter is essential in domains with insufficient ground truth images.

2 RELATED WORK
Although neural networks have long been used to address inverse problems (Ogawa et al. (1998); Hoole (1993); Schiller and Doerffer (2010)), the past few years have seen the number of related deep learning papers grow exponentially. The majority address biomedical imaging (Güler and Übeylı (2005); Hudson and Cohen (2000)), with several special issues and review papers (Lucas et al. (2018); McCann et al. (2017)) dedicated to the topic. All these papers address reconstruction from subsampled or low-quality data, often motivated by reduced scanning time or lower radiation doses. Beyond biomedical imaging, machine learning techniques are emerging in geophysical imaging (Araya-Polo et al. (2017); Lewis et al. (2017); Bianco and Gertoft (2017)), though at a slower pace, perhaps partly due to the lack of standard open datasets.

Existing methods can be grouped into non-iterative methods that learn a feed-forward mapping from the measured data y (or some standard manipulation such as an adjoint or a pseudoinverse) to the model x (Jin et al. (2016); Pelt and Batenburg (2013); Zhu et al. (2018); Wang (2016); Antholzer et al. (2017); Han et al. (2016); Zhang et al. (2016)); and iterative energy minimization methods, with either the regularizer being a neural network (Li et al. (2018)), or neural networks replacing various iteration components such as gradients, projectors, or proximal mappings (Kelly et al. (2017); Adler and Öktem (2017b;a); Rick Chang et al. (2017)); see also the special issues in IEEE Transactions on Medical Imaging, May 2016 (Greenspan et al. (2016)) and IEEE Signal Processing Magazine, November 2017 and January 2018 (Porikli et al. (2017; 2018)). These are further related to the notion of plug-and-play regularization (Venkatakrishnan et al. (2013)), as well as early uses of neural nets to unroll and adapt standard sparse reconstruction algorithms (Gregor and LeCun (2010); Xin et al. (2016)). An advantage of the first group of methods is that they are fast; an advantage of the second group is that they are better at enforcing data consistency.

Figure 2: Regularization by Λ random projections: 1) each orthogonal projection is approximated by a convolutional neural network which maps a non-negative least squares reconstruction of an image to its projection onto a lower-dimensional subspace of Delaunay triangulations; 2) projections are combined to estimate the original image using regularized least squares.

Generative models. A rather different take was proposed in the context of compressed sensing where the reconstruction is constrained to lie in the range of a pretrained generative network (Bora et al. (2017; 2018)). Their scheme achieves impressive results on random sensing operators and comes with theoretical guarantees. However, training generative networks requires many examples of ground truth and the method is inherently subject to dataset bias. Here, we focus on a setting where ground-truth samples are very few or impossible to obtain.

There are connections between our work and sketching (Gribonval et al. (2017); Pilanci and Wainwright (2016)) where the learning problem is also simplified by random low-dimensional projections of some object, either the data or the unknown reconstruction itself (Yurtsever et al. (2017)). This also exposes natural connections with learning via random features (Rahimi and Recht (2008; 2009)).
3 REGULARIZATION BY RANDOM MESH PROJECTIONS
The two stages of our method are (i) decomposing the "hard" learning task of directly learning an unstable operator into an ensemble of "easy" tasks of estimating projections of the unknown model into low-dimensional subspaces; and (ii) combining these projection estimates to solve a reformulated inverse problem for x. The two stages are summarized in Figure 2. While our method is applicable to continuous and non-linear settings, we focus on linear finite-dimensional inverse problems.

3.1 DECOMPOSING THE LEARNING PROBLEM
Statistical learning theory tells us that the number of samples required to learn an M-variate L-Lipschitz function to a given sup-norm accuracy is O(L^M) (Cooper (1995)). While this result is proved for scalar-valued multivariate maps, it is reasonable to expect the same scaling in L to hold for vector-valued maps. This motivates us to study Lipschitz properties of the projected inverse maps.

We wish to reconstruct x, an N-pixel image from X ⊂ R^N where N is large (we think of x as a √N × √N discrete image). We assume that the map from x ∈ X to y ∈ R^M is injective so that it is invertible on its range, and that there exists an L-Lipschitz (generally non-linear) inverse G,

    ‖G(y₁) − G(y₂)‖ ≤ L ‖y₁ − y₂‖.

In order for the injectivity assumption to be reasonable, we assume that X is a low-dimensional manifold embedded in R^N of dimension at most M, where M is the number of measurements. Since we are in finite dimension, injectivity implies the existence of L (Stefanov and Uhlmann (2009)). Due to ill-posedness, L is typically large.

Consider now the map from the data y to a projection of the model x into some K-dimensional subspace S, where K ≪ N. Note that this map exists by construction (since A is injective on X), and that it must be non-linear. To see this, note that the only consistent linear map acting on y is an oblique, rather than an orthogonal projection onto S (cf. Section 2.4 in Vetterli et al. (2014)). We explain this in more detail in Appendix A. Denote the projection by P_S x and assume S ⊂ R^N is chosen uniformly at random.
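The key property used below is that for a uniformly random K-dimensional subspace, E[P_Sᵀ P_S] = (K/N) I_N, so a random projection shrinks distances by √(K/N) on average. A quick NumPy sanity check, building P_S = W W† from an iid Gaussian W (one construction of such a projector mentioned later in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, trials = 20, 4, 2000

acc = np.zeros((N, N))
for _ in range(trials):
    W = rng.standard_normal((N, K))
    P = W @ np.linalg.pinv(W)        # orthogonal projector onto range(W)
    acc += P.T @ P                    # for an orthogonal projector, P.T @ P == P
E_PtP = acc / trials

# Empirically, E[P^T P] is close to (K/N) I_N.
print(np.allclose(E_PtP, (K / N) * np.eye(N), atol=0.05))
```

Each realization has trace exactly K (a rank-K projector), while the expectation spreads that mass evenly over the N diagonal entries.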
We want to evaluate the expected Lipschitz constant of the map from y to P_S x, noting that it can be written as P_S ∘ G:

    E ‖P_S ∘ G(y₁) − P_S ∘ G(y₂)‖ ≤ √( E ‖P_S ∘ G(y₁) − P_S ∘ G(y₂)‖² ) ≤ √(K/N) L ‖y₁ − y₂‖,

where the first inequality is Jensen's inequality, and the second one follows from

    E ‖P_S x‖² = E xᵀ P_Sᵀ P_S x = xᵀ E(P_Sᵀ P_S) x

and the observation that E P_Sᵀ P_S = (K/N) I_N. In other words, random projections reduce the Lipschitz constant by a factor of √(K/N) on average. Since learning requires a number of samples exponential in the Lipschitz constant's scaling, this allows us to work with exponentially fewer samples and makes the learning task easier. Conversely, given a fixed training dataset, it gives more accurate estimates.

3.1.1 THE CASE FOR DELAUNAY TRIANGULATIONS
The above example uses unstructured random subspaces. In many inverse problems, such as inverse scattering (Beretta et al. (2013); Di Cristo and Rondi (2003)), a judicious choice of subspace family can give exponential improvements in Lipschitz stability. In particular, it is favorable to use piecewise-constant images: x = Σ_{k=1}^K x_k χ_k, with χ_k being indicator functions of some domain subsets. Motivated by this observation, we use piecewise-constant subspaces over random Delaunay triangle meshes. Delaunay triangulations enjoy a number of desirable learning-theoretic properties. For function learning it was shown that given a set of vertices, piecewise linear functions on Delaunay triangulations achieve the smallest sup-norm error among all triangulations (Omohundro (1989)).

We sample Λ sets of points in the image domain from a uniform-density Poisson process and construct Λ (discrete) Delaunay triangulations with those points as vertices. Let S = {S_λ | 1 ≤ λ ≤ Λ} be the collection of Λ subspaces of piecewise-constant functions on these triangulations. Let further G_λ be the map from y to the projection of the model into subspace S_λ, G_λ y = P_{S_λ} x. Instead of learning the "hard" inverse mapping G, we propose to learn an ensemble of simpler mappings {G_λ}_{λ=1}^Λ.

We approximate each G_λ by a convolutional neural network, Γ_θ(λ)(ỹ): R^N → R^N, parameterized by a set of trained weights θ(λ). Similar to Jin et al. (2016), we do not use the measured data y ∈ R^M directly, as this would require the network to first learn to map y back to the image domain; we rather warm-start the reconstruction by a non-negative least squares reconstruction, ỹ ∈ R^N, computed from y.
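Projecting onto one such subspace is just per-triangle averaging. A small sketch using SciPy (assuming a uniform random scatter of vertices rather than a true Poisson process, and letting pixels outside the triangulation pass through unchanged):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
n = 64                                    # image is n x n

# Random vertices; corners added so the mesh covers the whole square.
pts = np.vstack([rng.uniform(0, n - 1, size=(30, 2)),
                 [[0, 0], [0, n - 1], [n - 1, 0], [n - 1, n - 1]]])
tri = Delaunay(pts)

# Assign every pixel to the triangle containing it (-1 if outside the hull).
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
pix = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
labels = tri.find_simplex(pix)

def project(x):
    """Orthogonal projection onto piecewise-constant functions on the mesh:
    replace each pixel by the mean over its triangle."""
    x = np.asarray(x, dtype=float).ravel()
    out = x.copy()                        # unassigned pixels pass through
    for t in range(len(tri.simplices)):
        mask = labels == t
        if mask.any():
            out[mask] = x[mask].mean()
    return out.reshape(n, n)

x = rng.random((n, n))
px = project(x)
# The map is idempotent: projecting twice changes nothing.
print(np.allclose(project(px), px))
```

The stacked indicator functions of the triangles form the basis B_λ used in Section 3.2; here the projection is applied directly by group averaging instead of forming B_λ explicitly.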
The weights θ(λ) are chosen by minimizing the empirical risk

    θ(λ) = arg min_θ (1/J) Σ_{j=1}^J ‖Γ_θ(λ)(ỹ_j) − P_{S_λ} x_j‖²,    (1)

where {(x_j, ỹ_j)}_{j=1}^J is a set of J training models and their non-negative least squares reconstructions.

3.2 THE NEW INVERSE PROBLEM
∗ Consistent meaning that if x already lives in S, then the map should return x.
∗ One way to construct the corresponding projection matrix is P_S = W W†, where W ∈ R^{N×K} is a matrix with standard iid Gaussian entries.

By learning projections onto random subspaces, we transform our original problem into that of estimating x from {Γ_θ(λ)(ỹ)}_{λ=1}^Λ. To see how this can be done, ascribe to the columns of B_λ ∈ R^{N×K} a natural orthogonal basis for the subspace S_λ, B_λ = [χ_{λ,1}, ..., χ_{λ,K}], with χ_{λ,k} being the indicator function of the k-th triangle in mesh λ. Denote by q_λ := q_λ(y) the mapping from the data y to an estimate of the expansion coefficients of x in the basis for S_λ:

    q_λ(y) := B_λᵀ Γ_θ(λ)(ỹ).

Let B := [B₁ B₂ ... B_Λ] ∈ R^{N×KΛ}, and q := q(y) := [q₁ᵀ, q₂ᵀ, ..., q_Λᵀ]ᵀ ∈ R^{KΛ}; then we can estimate x using the following reformulated problem:

    q ≈ Bᵀ x,

and the corresponding regularized reconstruction:

    x̂ = Ĝ(y) := arg min_{x ∈ [0,1]^N} ‖q(y) − Bᵀ x‖² + λ φ(x),    (2)

with φ(x) chosen as the TV-seminorm ‖x‖_TV. The regularization is not essential: as we show experimentally, if KΛ is sufficiently large, φ(x) is not required. Note that solving the original problem directly with the ‖x‖_TV regularizer fails to recover the structure of the model (Figure 1).

3.3 STABILITY OF THE REFORMULATED PROBLEM AND "CONVOLUTIONALIZATION"

Since the true inverse map G has a large Lipschitz constant, it would seem reasonable that as the number of mesh subspaces Λ grows large (and their direct sum approaches the whole ambient space R^N), the Lipschitz properties of Ĝ should deteriorate as well. Denote the unregularized inverse mapping y ↦ x̂ in (2) by G.
Then we have the following estimate:

    ‖G(y₁) − G(y₂)‖ = ‖(Bᵀ)† q(y₁) − (Bᵀ)† q(y₂)‖ ≤ σ_min(B)⁻¹ √Λ L_K ‖y₁ − y₂‖,

with σ_min(B) the smallest (non-zero) singular value of B and L_K the Lipschitz constant of the stable projection mappings q_λ. Indeed, we observe empirically that σ_min(B)⁻¹ grows large as the number of subspaces increases, which reflects the fact that although individual projections are easier to learn, the full-resolution reconstruction remains ill-posed.

Estimates of individual subspace projections give correct local information. They convert possibly non-local measurements (e.g. integrals along curves in tomography) into local ones. The key is that these local averages (subspace projection coefficients) can be estimated accurately (see Section 4).

To further illustrate what we mean by correct local information, consider a simple numerical experiment with our reformulated problem, q = Bᵀx, where x is an all-zero image with a few pixels "on". For the sake of clarity we assume the coefficients q are perfect. Recall that B is a block matrix comprising Λ subspace bases stacked side by side. It is a random matrix because the subspaces are generated at random, and therefore the reconstruction x̂ = (Bᵀ)†q is also random. We approximate E x̂ by simulating a large number of Λ-tuples of meshes and averaging the obtained reconstructions. Results are shown in Figure 3 for different numbers of triangles per subspace, K, and subspaces per reconstruction, Λ. As Λ or K increase, the expected reconstruction becomes increasingly localized around the non-zero pixels. The following proposition (proved in Appendix B) tells us that this phenomenon can be modeled by convolution.

Proposition 1. Let x̂ be the solution to q = Bᵀx given as (Bᵀ)†q. Then there exists a kernel κ̃(u), with u a discrete index, such that E x̂ = x ∗ κ̃. Furthermore, κ̃(u) is isotropic.

While Figure 3 suggests that more triangles are better, we note that this increases the subspace dimension, which makes getting correct projection estimates harder. Instead we choose to stack more meshes with a smaller number of triangles. Intuitively, since every triangle average depends on many measurements, estimating each average is more robust to measurement corruptions, as evidenced in Section 4. Accurate estimates of local averages enable us to recover the geometric structure while being more robust to data errors.

∗ We note that this result requires adequate handling of boundary conditions; for the lack of space we omit the straightforward details.

Figure 3: Illustration of the expected kernel κ̃ with varying subspace dimension, K, and number of subspaces, Λ. Reconstruction of a sparse three-pixel image (left) and the cameraman image (right).

Figure 4: Linearized traveltime tomography illustration: on the left we show a sample model, with red crosses indicating sensor locations and dashed blue lines indicating linearized travel paths; on the right we show a reconstruction from (25 choose 2) = 300 measurements by non-negative least squares.

4 NUMERICAL RESULTS
4.1 APPLICATION: TRAVELTIME TOMOGRAPHY
To demonstrate our method's benefits we consider linearized traveltime tomography (Hole (1992); Bording et al. (1987)), but we note that the method applies to any inverse problem with scarce data. In traveltime tomography, we measure (N choose 2) wave travel times between N sensors, as in Figure 4. Travel times depend on the medium property called slowness (the inverse of speed), and the task is to reconstruct the spatial slowness map. Image intensities are a proxy for slowness maps: the lower the image intensity, the higher the slowness. In the straight-ray approximation, the problem data is modeled as integrals along line segments:

    y(s_i, s_j) = ∫₀¹ x(t s_i + (1 − t) s_j) dt,  ∀ s_i ≠ s_j,    (3)

where x: R² → R₊ is the continuous slowness map and s_i, s_j are sensor locations. In our experiments, we use a square pixel grid with 25 sensors (300 measurements) placed uniformly on an inscribed circle, and corrupt the measurements with zero-mean iid Gaussian noise.

4.1.1 ARCHITECTURES AND RECONSTRUCTION
We generate random Delaunay meshes, each with 50 triangles. The corresponding projector matrices compute the average intensity over each triangle to yield a piecewise-constant approximation P_{S_λ} x of x.

We test two distinct architectures: (i) ProjNet, tasked with estimating the projection into a single subspace; and (ii) SubNet, tasked with estimating the projection over multiple subspaces. The ProjNet architecture is inspired by the FBPConvNet (Jin et al. (2016)) and the U-Net (Ronneberger et al. (2015)), as shown in Figure 11a in the appendix. Crucially, we constrain the network output to live in S_λ by fixing the last layer of the network to be a projector, P_{S_λ} (Figure 11a). A similar trick in a different context was proposed in Sønderby et al. (2016).

∗ Code available at https://github.com/swing-research/deepmesh under the MIT License.
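The output constraint is simple to realize: whatever the convolutional layers produce, a final fixed (non-trainable) layer multiplies it by the projector P_{S_λ}, so the network's range is exactly the mesh subspace. A framework-agnostic NumPy sketch, with a generic orthonormal basis standing in for the (orthogonalized) triangle indicators; shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 256, 50                        # flattened image size, triangles per mesh

# B has one orthonormal column per basis function of S_lambda.
B, _ = np.linalg.qr(rng.standard_normal((N, K)))
P = B @ B.T                           # orthogonal projector onto S_lambda

def constrained_output(z):
    """Fixed last layer: project the backbone's output onto S_lambda."""
    return P @ z

z = rng.standard_normal(N)            # arbitrary backbone output
out = constrained_output(z)

# The output lies in S_lambda: projecting it again changes nothing.
print(np.allclose(P @ out, out))
```

In a deep learning framework this corresponds to a linear layer whose weight matrix is set to P and excluded from the optimizer.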
We combine projection estimates from many ProjNets by regularized linear least squares (2) to get the reconstructed model (cf. Figure 2), with the regularization parameter λ determined on five held-out images. A drawback of this approach is that a separate ProjNet must be trained for each subspace. This motivates the SubNet (shown in Figure 11b). Each input to SubNet is the concatenation of a non-negative least squares reconstruction and 50 basis functions, one for each triangle, forming a 51-channel input. This approach scales to any number of subspaces, which allows us to get visually smoother reconstructions without any further regularization as in (2). On the other hand, the projections are less precise, which can lead to slightly degraded performance.

As a quantitative figure of merit we use the signal-to-noise ratio (SNR). The input SNR is defined as 10 log₁₀(σ_signal/σ_noise), where σ_signal and σ_noise are the signal and noise variances; the output SNR is defined as sup_{a,b} 20 log₁₀(‖x‖ / ‖x − a x̂ − b‖), with x the ground truth and x̂ the reconstruction.

130 ProjNets are trained for 130 different meshes with measurements at various SNRs. Similarly, a single SubNet is trained with 350 different meshes and the same noise levels. We compare the ProjNet and SubNet reconstructions with a direct baseline: a U-Net convolutional neural network that reconstructs images from their non-negative least squares reconstructions. The direct baseline has the same architecture as SubNet except that the input is a single-channel non-negative least squares reconstruction, as in ProjNet, and the output is the target reconstruction. Such an architecture was proposed by Jin et al. (2016), is used as a baseline in recent learning-based inverse problem works (Lunz et al. (2018); Ye et al. (2018)), and is inspiring other architectures for inverse problems (Antholzer et al. (2017)). We pick the best performing baseline network from multiple networks which have a comparable number of trainable parameters to SubNet. We simulate the lack of training data by testing on a dataset that is different from that used for training.

Robustness to corruption.
To demonstrate that our method is robust against arbitrary assumptions made at training time, we consider two experiments. First, we corrupt the data with zero-mean iid Gaussian noise and reconstruct with networks trained at different input noise levels. In Figures 5a and 12, and Table 1, we summarize the results with reconstructions of geo images taken from the BP2004 dataset∗ and x-ray images of metal castings (Mery et al. (2015)). The direct baseline and SubNet are trained on a set of 20,000 images from the arbitrarily chosen LSUN bridges dataset (Yu et al. (2015)) and tested with the geophysics and x-ray images. ProjNets are trained with 10,000 images from the LSUN dataset. Our method reports better SNRs compared with the baseline. We note that direct reconstruction is unstable when trained on clean and tested on noisy measurements, as it often hallucinates details that are artifacts of the training data. For applications in geophysics it is important that our method correctly captures the shape of the cavities, unlike the direct inversion, which can produce sharp but wrong geometries (see outlines in Figure 5a).

Figure 5: a) Reconstructions for different combinations of training and testing input SNR. The output SNR is indicated for each reconstruction. Our method stands out when the training and testing noise levels do not match; b) reconstructions with erasures occurring with probability 1/8, 1/10 and 1/12. The reconstructions are obtained from networks which are trained with an input SNR of 10 dB. The direct network cannot produce a reasonable image in any of the cases.

Second, we consider a different corruption mechanism where traveltime measurements are erased (set to zero) independently with probability p ∈ {1/8, 1/10, 1/12}, and use networks trained with 10 dB input SNR on the LSUN dataset to reconstruct. Figure 5b and Table 2 summarize our findings. Unlike with Gaussian noise (Figure 5a), the direct method completely fails to recover coarse geometry in all test cases. In our entire test dataset of 102 x-ray images there is not a single example where the direct network captures a geometric feature that our method misses. This demonstrates the strengths of our approach. For more examples of x-ray images please see Appendix E.

∗ http://software.seg.org/datasets/2D/2004_BP_Vel_Benchmark/

                  Training SNR 10 dB            Training SNR ∞ dB
  Testing SNR     Direct  ProjNets  SubNet      Direct  ProjNets  SubNet
  10 dB           13.51   14.49     13.92       10.34   12.88     12.85
  ∞ dB            13.78   15.38     14.04       16.67   17.23     16.86

Table 1: Average reconstruction SNR for various training and testing SNR combinations.

                  p = 1/8   p = 1/10   p = 1/12
  Direct          9.03      9.62       10.06
  ProjNets        11.09     11.70      12.08
  SubNet          11.33     11.74      11.99

Table 2: Average SNR over 102 x-ray images for reconstructions from measurements with erasure probability p. All networks were trained for 10 dB noisy measurements on the LSUN bridges dataset. Refer to Appendix E for actual reconstructions.

Figure 6: Reconstructions from networks trained on different datasets (LSUN, CelebA and Shapes) with 10 dB training SNR.
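The output SNR above involves a supremum over an affine fit a x̂ + b; for a given reconstruction it is computed in closed form by least squares. A small helper written here for illustration (not code from the paper's repository):

```python
import numpy as np

def output_snr(x, x_hat):
    """Output SNR: 20 log10(||x|| / ||x - a*x_hat - b||), maximized over
    scalars a, b; the maximizing (a, b) is an ordinary least squares fit."""
    x, x_hat = np.ravel(x), np.ravel(x_hat)
    A = np.column_stack([x_hat, np.ones_like(x_hat)])
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    resid = x - a * x_hat - b
    return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(resid))

rng = np.random.default_rng(0)
x = rng.random((64, 64))
# A rescaled, shifted, slightly noisy copy still scores well,
# because the affine ambiguity is projected out by the fit.
x_hat = 0.5 * x + 0.2 + 0.01 * rng.standard_normal(x.shape)
print(round(output_snr(x, x_hat), 1))
```

Maximizing the ratio over (a, b) is equivalent to minimizing ‖x − a x̂ − b‖, hence the least squares solve.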
Robustness against dataset overfitting
Figure 6 illustrates the influence of the training data on reconstructions. Training with LSUN, CelebA (Liu et al. (2015)) and a synthetic dataset of random overlapping shapes (see Figure 15 in the appendix for examples) all give comparable reconstructions, a desirable property in applications where real ground truth is unavailable.

We complement our results with reconstructions of checkerboard phantoms (standard resolution tests) and x-rays of metal castings in Figure 7. We note that in addition to better SNR, our method produces more accurate geometry estimates, as per the annotations in the figure.
5 CONCLUSION
We proposed a new approach to regularize ill-posed inverse problems in imaging, the key idea being to decompose an unstable inverse mapping into a collection of stable mappings which only estimate low-dimensional projections of the model. By using piecewise-constant Delaunay subspaces, we showed that the projections can indeed be accurately estimated. Combining the projections leads to a deconvolution-like problem. Compared to directly learning the inverse map, our method is more robust against noise and corruptions. We also showed that regularizing via projections allows our method to generalize across training datasets. Our reconstructions are better both quantitatively in terms of SNR and qualitatively in the sense that they estimate correct geometric features even when measurements are corrupted in ways not seen at training time. Future work involves getting precise estimates of Lipschitz constants for various inverse problems, regularizing the reformulated problem using modern regularizers (Ulyanov et al. (2017)), studying extensions to non-linear problems and developing concentration bounds for the equivalent convolution kernel.

Figure 7: Reconstructions on checkerboards and x-rays with 10 dB measurement SNR, tested on 10 dB trained networks. Red annotations highlight where the direct net fails to reconstruct correct geometry.

ACKNOWLEDGEMENT
This work utilizes resources supported by the National Science Foundation's Major Research Instrumentation program, grant
REFERENCES
Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. arXiv preprint arXiv:1704.04058v2, April 2017a.

Jonas Adler and Ozan Öktem. Learned primal-dual reconstruction. arXiv preprint arXiv:1707.06474v1, July 2017b.

Stephan Antholzer, Markus Haltmeier, and Johannes Schwab. Deep learning for photoacoustic tomography from sparse data. arXiv preprint arXiv:1704.04587v2, April 2017.

Mauricio Araya-Polo, Joseph Jennings, Amir Adler, and Taylor Dahlke. Deep-learning tomography. The Leading Edge, December 2017.

Elena Beretta, Maarten V. de Hoop, and Lingyun Qiu. Lipschitz stability of an inverse boundary value problem for a Schrödinger-type equation. SIAM J. Math. Anal., 45(2):679–699, March 2013.

Michael Bianco and Peter Gertoft. Sparse travel time tomography with adaptive dictionaries. arXiv preprint arXiv:1712.08655, 2017.

Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. Compressed sensing using generative models. arXiv preprint arXiv:1703.03208, 2017.

Ashish Bora, Eric Price, and Alexandros G. Dimakis. AmbientGAN: Generative models from lossy measurements. In International Conference on Learning Representations (ICLR), 2018.

R. Phillip Bording, Adam Gersztenkorn, Larry R. Lines, John A. Scales, and Sven Treitel. Applications of seismic travel-time tomography. Geophysical Journal International, 90(2):285–303, 1987.

Duane A. Cooper. Learning Lipschitz functions. International Journal of Computer Mathematics, 59(1-2):15–26, 1995.

Michele Di Cristo and Luca Rondi. Examples of exponential instability for inverse inclusion and scattering problems. Inverse Problems, 19(3):685, 2003.

Erica Galetti, Andrew Curtis, Brian Baptie, David Jenkins, and Heather Nicolson. Transdimensional Love-wave tomography of the British Isles and shear-velocity structure of the East Irish Sea Basin from ambient-noise interferometry. Geophys. J. Int., 208(1):36–58, January 2017.

Hayit Greenspan, Bram van Ginneken, and Ronald M. Summers. Deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Trans. Med. Imag., 35(5):1153–1159, May 2016.

Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning, pages 399–406. Omnipress, 2010.

Rémi Gribonval, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. Compressive statistical learning with random feature moments. arXiv preprint arXiv:1706.07180, 2017.

İnan Güler and Elif Derya Übeylı. ECG beat classifier designed by combined neural network model. Pattern Recognition, 38(2):199–208, 2005.

Yo Seob Han, Jaejun Yoo, and Jong Chul Ye. Deep residual learning for compressed sensing CT reconstruction via persistent homology analysis. arXiv preprint arXiv:1611.06391, November 2016.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016a.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, pages 770–778. IEEE, 2016b.

John Hole. Nonlinear high-resolution three-dimensional seismic travel time tomography. Journal of Geophysical Research: Solid Earth, 97(B5):6553–6562, 1992.

S. R. H. Hoole. Artificial neural networks in the solution of inverse electromagnetic field problems. IEEE Trans. Magn., 29(2):1931–1934, March 1993.

Donna L. Hudson and Maurice E. Cohen. Neural networks and artificial intelligence for biomedical engineering. Wiley Online Library, 2000.

Kyong Hwan Jin, Michael T. McCann, Emmanuel Froustey, and Michael Unser. Deep convolutional neural network for inverse problems in imaging. arXiv preprint arXiv:1611.03679v1, November 2016.

Brendan Kelly, Thomas P. Matthews, and Mark A. Anastasio. Deep learning-guided image reconstruction from incomplete data. arXiv preprint arXiv:1709.00584, September 2017.

Winston Lewis, Denes Vigh, et al. Deep learning prior models from seismic images for full-waveform inversion. In SEG International Exposition and Annual Meeting. Society of Exploration Geophysicists, 2017.

Housen Li, Johannes Schwab, Stephan Antholzer, and Markus Haltmeier. NETT: Solving inverse problems with deep neural networks. arXiv preprint arXiv:1803.00092v1, February 2018.

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.

Alice Lucas, Michael Iliadis, Rafael Molina, and Aggelos K. Katsaggelos. Using deep neural networks for inverse problems in imaging: Beyond analytical methods.
IEEE Signal Process.Mag. , 35(1):20–36, 2018.Sebastian Lunz, Ozan Öktem, and Carola-Bibiane Schönlieb. Adversarial regularizers in inverseproblems. arXiv preprint arXiv:1805.11572 , 2018.Michael T McCann, Kyong Hwan Jin, and Michael Unser. Convolutional neural networks for inverseproblems in imaging: A review.
IEEE Signal Process. Mag. , 34(6):85–95, 2017.Domingo Mery, Vladimir Riffo, Uwe Zscherpel, Germán Mondragón, Iván Lillo, Irene Zuccar, HansLobel, and Miguel Carrasco. GDXray: The Database of X-ray Images for Nondestructive Testing.
Journal of Nondestructive Evaluation , 34, 11 2015.Takehiko Ogawa, Yukio Kosugi, and Hajime Kanada. Neural network based solution to inverseproblems. In
Neural Networks Proceedings, 1998. IEEE World Congress on ComputationalIntelligence. The 1998 IEEE International Joint Conference on , volume 3, pages 2471–2476. IEEE,1998.S M Omohundro. The Delaunay triangulation and function learning, 1989.Daniel Maria Pelt and Kees Joost Batenburg. Fast tomographic reconstruction from limited datausing artificial neural networks.
IEEE Trans. on Image Process. , 22(12):5238–5251, 2013.Mert Pilanci and Martin J Wainwright. Iterative hessian sketch: Fast and accurate solution approxima-tion for constrained least-squares.
The Journal of Machine Learning Research , 17(1):1842–1879,2016.Fatih Porikli, Shiguang Shan, Cees Snoek, Rahul Sukthankar, and Xiaogang Wang. Deep Learningfor Visual Understanding [From the Guest Editors].
IEEE Signal Process. Mag. , 34(6):24–25, Nov2017.Fatih Porikli, Shiguang Shan, Cees Snoek, Rahul Sukthankar, and Xiaogang Wang. Deep Learningfor Visual Understanding: Part 2 [From the Guest Editors].
IEEE Signal Process. Mag. , 35(1):17–19, Jan 2018.Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines.
Advances inNeural Information and Processing (NIPS) , 2008.11li Rahimi and Benjamin Recht. Weighted Sums of Random Kitchen Sinks: Replacing minimizationwith randomization in learning.
Advances in Neural Information and Processing (NIPS) , pages1313–1320, 2009.JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and Aswin C Sankara-narayanan. One Network to Solve Them All–Solving Linear Inverse Problems Using DeepProjection Models. In
Proceedings of the IEEE Conference on Computer Vision and PatternRecognition , pages 5888–5897, 2017.Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedicalimage segmentation. In
International Conference on Medical Image Computing and Computer-Assisted Intervention , pages 234–241. Springer, 2015.Helmut Schiller and Roland Doerffer. Neural network for emulation of an inverse model operationalderivation of Case II water properties from MERIS data.
International Journal of Remote Sensing ,November 2010.Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, and Ferenc Huszár. Amortisedmap inference for image super-resolution. arXiv preprint arXiv:1610.04490 , 2016.Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M Bronstein, and Guillermo Sapiro.Supervised sparse analysis and synthesis operators. In
Advances in Neural Information ProcessingSystems , pages 908–916, 2013.Plamen Stefanov and Gunther Uhlmann. Linearizing non-linear inverse problems and an applicationto inverse backscattering.
Journal of Functional Analysis , 256(9):2842–2866, 2009.Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Deep image prior. arXiv preprintarXiv:1711.10925 , 2017.Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priorsfor model based reconstruction. In
Global Conference on Signal and Information Processing(GlobalSIP), 2013 IEEE , pages 945–948. IEEE, 2013.Martin Vetterli, Jelena Kovaˇcevi´c, and Vivek K Goyal.
Foundations of signal processing . CambridgeUniversity Press, 2014.Ge Wang. A perspective on deep imaging.
IEEE Access , 4:8914–8924, 2016.Bo Xin, Yizhou Wang, Wen Gao, David Wipf, and Baoyuan Wang. Maximal sparsity with deepnetworks? In
Advances in Neural Information Processing Systems , pages 4340–4348, 2016.Huajian Yao, Robert D van Der Hilst, and Maarten V De Hoop. Surface-wave array tomography inse tibet from ambient seismic noise and two-station analysis—i. phase velocity maps.
GeophysicalJournal International , 166(2):732–744, 2006.Jong Chul Ye, Yoseob Han, and Eunju Cha. Deep convolutional framelets: A general deep learningframework for inverse problems.
SIAM Journal on Imaging Sciences , 11(2):991–1048, 2018.Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction ofa Large-scale Image Dataset using Deep Learning with Humans in the Loop. arXiv preprintarXiv:1506.03365 , 2015.Alp Yurtsever, Madeleine Udell, Joel A Tropp, and Volkan Cevher. Sketchy decisions: Convexlow-rank matrix optimization with optimal storage. arXiv preprint arXiv:1702.06838 , 2017.Hanming Zhang, Liang Li, Kai Qiao, Linyuan Wang, Bin Yan, Lei Li, and Guoen Hu. ImagePrediction for Limited-angle Tomography via Deep Learning with Convolutional Neural Network. arXiv preprint arXiv:1607.08707v1 , July 2016.Bo Zhu, Jeremiah Z Liu, Stephen F Cauley, Bruce R Rosen, and Matthew S Rosen. Image recon-struction by domain-transform manifold learning.
Nature , 555(7697):487, March 2018.12igure 8: Orthogonal vs. oblique projections. There is no linear operator acting on y or on theorthogonal projection (cid:101) y = P R ( A ∗ ) x = A † y that can compute the orthogonal projection into S . A N
EED FOR NON - LINEAR OPERATORS
We explain the need for non-linear operators, even in the absence of noise, with reference to Figure 8. Projecting x onto a given known subspace is a simple linear operation, so it may not be a priori clear why we use non-linear neural networks to estimate the projections. Alas, we do not know x and only have access to y. Suppose that there exists a linear operator (a matrix) F ∈ R^(N×M) which acts on y and computes the projection of x onto S_λ. A natural requirement on F is consistency: if x already lives in S_λ, then we would like to have FAx = x. This implies that for any x, not necessarily in S_λ, we require FAFAx = FAx, so that (FA)^2 = FA; that is, FA is an idempotent operator.

Letting the columns of B_λ be a basis for S_λ, it is easy to see that the least-squares minimizer for F is B_λ(AB_λ)†. However, because R(F) = S_λ ≠ R(A∗) (A∗ is the adjoint of A, simply a transpose for real matrices), in general it will not hold that (FA)∗ = FA. Thus FA is an oblique, rather than orthogonal, projection onto S_λ. In Figure 8 this corresponds to the point P^oblique_{S_λ} x, which can be arbitrarily far from the orthogonal projection P^ortho_{S_λ} x. The nullspace of this projection is precisely N(A) = R(A∗)⊥.

Thus, consistent linear operators can at best yield oblique projections, which can be far from the orthogonal one. One can also see this geometrically in Figure 8: as the angle between S_λ and R(A∗) increases to π/2, the oblique projection point travels to infinity (note that the oblique projection always happens along the nullspace of A, which is the line orthogonal to R(A∗)). Since our subspaces are chosen at random, in general they are not aligned with R(A∗). The only subspace onto which we can linearly compute an orthogonal projection from y is R(A∗) itself; this projection is given by the Moore-Penrose pseudoinverse. Therefore, to get orthogonal projections onto random subspaces, we must use non-linear operators. More generally, for any other ad hoc linear reconstruction operator W, the estimate Wy = WAx always lives in the column space of WA, which is a subspace of dimension at most the number of rows of A; however, we do not have any linear subspace model for x.

As shown in the right half of Figure 8, as soon as A is injective on X, the existence of this non-linear map is guaranteed by construction: since y determines x, it also determines P_{S_λ} x.

We show the results of numerical experiments in Figures 9 and 10, which further illustrate the performance difference between linear oblique projectors and our non-linear learned operator when estimating the projection of an image into a random subspace. We refer the reader to the captions below each figure for more details.

B PROOF OF PROPOSITION

Proof. The reconstruction of the new inverse problem can be written as x̂ = B̃B⊤x, where the columns of B̃ = (B⊤)† form a basis biorthogonal to the columns of B. Thus

x̂ = ∑_{p=1}^{K_Λ} ⟨x, b_p⟩ b̃_p.
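Both the oblique-projection argument in Appendix A and the biorthogonal expansion in the proof above are easy to sanity-check numerically. The following sketch (plain numpy with arbitrary dimensions and random operators; an illustration under these assumptions, not the paper's code) verifies that the consistent linear estimator F = B_λ(AB_λ)† yields an idempotent but non-symmetric, i.e. oblique, FA, and that synthesizing with the dual basis B̃ = (B⊤)† reproduces the orthogonal projection onto the span of the basis.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 60, 20, 10      # model size, number of measurements, subspace dimension

# --- Oblique vs. orthogonal projection (Appendix A) ----------------------
A = rng.standard_normal((M, N))                    # wide forward operator, M << N
B = np.linalg.qr(rng.standard_normal((N, K)))[0]   # orthonormal basis of a random S

F = B @ np.linalg.pinv(A @ B)                      # consistent linear estimator
FA = F @ A
assert np.allclose(FA @ FA, FA, atol=1e-8)         # idempotent: FA is a projector
asym = np.linalg.norm(FA - FA.T)                   # nonzero: oblique, not orthogonal

x = rng.standard_normal(N)
P_orth = B @ B.T                                   # orthogonal projector onto S
gap = np.linalg.norm(FA @ x - P_orth @ x)          # oblique estimate can be far off

xs = B @ rng.standard_normal(K)
assert np.allclose(FA @ xs, xs, atol=1e-8)         # consistency: FA x = x on S

# --- Dual-basis synthesis (proof above) ----------------------------------
C = rng.standard_normal((N, K))                    # a non-orthonormal basis
C_dual = np.linalg.pinv(C.T)                       # dual basis (C^T)^+
assert np.allclose(C_dual.T @ C, np.eye(K), atol=1e-8)   # biorthogonality

x_hat = C_dual @ (C.T @ x)                         # sum_p <x, c_p> c~_p
P = C @ np.linalg.solve(C.T @ C, C.T)              # orthogonal projector onto span(C)
assert np.allclose(x_hat, P @ x, atol=1e-8)

print(f"asymmetry of FA: {asym:.3f}, oblique-vs-orthogonal gap: {gap:.3f}")
```

With a random subspace, both the asymmetry of FA and its gap to the orthogonal projection are nonzero, matching the geometric picture in Figure 8; the gap grows as the subspace tilts away from R(A∗).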
[Figure 9 shows, for a subspace at a typical angle and a subspace at an atypical angle, the model x, its perfect orthogonal projection, and the ProjNet and oblique-projection estimates with no noise and with 10 dB measurement noise; the MSE of each estimate is overlaid.]

Figure 9: Comparison between the perfect orthogonal projection, ProjNet projections, and oblique projections. The projections of an image x (the same as in Figure 5) are obtained using ProjNet and the linear oblique projection method. The mean-squared errors (MSE) between the obtained projections and the perfect projections are stated. The subspaces used in this figure were also used in the ProjNet reconstructions.

[Figure 10 shows the model x alongside reconstructions by ProjNet (Fig. 5, 130 subspaces with TV and BC), SubNet (Fig. 5, 350 subspaces with BC), and the linear approach with 130 subspaces (TV and BC) and with 350 subspaces (BC).]
Figure 10: We try hard to get the best reconstruction from the linear approach. SNRs are indicated in the bottom-left of each reconstruction. In the linear approach, coefficients are obtained using the linear oblique projection method; once obtained, they are non-linearly reconstructed according to (2). Both linear-approach reconstructions use the box constraint (BC) mentioned in (2); for the 130-subspace reconstruction, total-variation (TV) regularization is also used. Thus, once the coefficients are obtained by the linear approach, the final image is reconstructed in exactly the same manner as ProjNet for 130 subspaces and SubNet for 350 subspaces. To give the linear approach the best chance, we also optimized hyperparameters such as the regularization parameter for the highest SNR.

Using the definition of the inner product and rearranging, we get

x̂(u) = ⟨ ∑_{p=1}^{K_Λ} b_p(·) b̃_p(u), x ⟩ =: ⟨κ(u, ·), x⟩,

where κ(u, v) := ∑_{p=1}^{K_Λ} b_p(v) b̃_p(u). Now, the probability distribution of triangles around any point u is both shift- and rotation-invariant, because a Poisson process in the plane is shift- and rotation-invariant. It follows that E κ(u, v) = κ̃(‖u − v‖) for some κ̃, meaning that

(E x̂)(u) = E ⟨κ(u, ·), x⟩ = ⟨κ̃(‖u − ·‖), x⟩ = (x ∗ κ̃)(u),

which is a convolution of the original model with a rotationally invariant (isotropic) kernel.

C NETWORK ARCHITECTURES
Figure 11 shows the network architectures used for ProjNet and SubNet. Each network consists of a sequence of downsampling layers followed by upsampling layers, with skip connections (He et al. (2016b;a)) between the downsampling and upsampling layers. Each ProjNet output is constrained to a single subspace by applying a subspace projection operator P_{S_λ}. We train one such network per subspace and reconstruct from the projection estimates using (2). SubNet is a single network trained over multiple subspaces. To do this, we change its input to be [ỹ B_λ], and we apply the same projection operator as in ProjNet to the SubNet output. SubNet is thus trained to give projection estimates over many random subspaces. This approach allows us to scale to any number of subspaces without training a new network for each, and allows us to build an over-constrained system q = Bx to solve. Even though SubNet has almost as many parameters as the direct net, reconstructing via the projection estimates allows SubNet to achieve higher SNR and, more importantly, better estimates of the coarse geometry than the direct inversion. All networks are trained with the Adam optimizer.
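When S_λ is spanned by (normalized) indicator functions of the cells of a mesh, applying P_{S_λ} to a network output reduces to averaging the pixels within each cell. The sketch below is a minimal numpy illustration of this operator; the random labeling of pixels is a hypothetical stand-in for an actual mesh, and `project_piecewise_constant` is our illustrative helper, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_cells = 64 * 64, 50

# Hypothetical partition of the pixels into cells (stand-in for a random mesh).
labels = rng.integers(0, n_cells, size=n_pix)

def project_piecewise_constant(img_flat, labels, n_cells):
    """Orthogonal projection onto images constant on each cell: cell-wise averaging."""
    sums = np.bincount(labels, weights=img_flat, minlength=n_cells)
    counts = np.bincount(labels, minlength=n_cells)
    means = sums / np.maximum(counts, 1)   # guard against empty cells
    return means[labels]

out = rng.standard_normal(n_pix)           # e.g. a raw network output
proj = project_piecewise_constant(out, labels, n_cells)

# Idempotency: projecting an already piecewise-constant image changes nothing.
assert np.allclose(project_piecewise_constant(proj, labels, n_cells), proj)
```

Equivalently, P_{S_λ} = B_λ B_λ⊤ when the columns of B_λ are the normalized cell indicators; the averaging form above avoids materializing the N × N projector.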
[Figure 11 layer widths. ProjNet: 128×128 input; encoder with 32, 64, 128, 256, and 512 channels at resolutions 128×128, 64×64, 32×32, 16×16, and 8×8; mirrored decoder with 256, 128, 64, and 32 channels back up to 128×128; single-channel output projected into the input subspace. SubNet: the same structure with 64, 128, 256, 512, and 1024 channels, with the subspace basis concatenated as one input.]

Figure 11: a) ProjNet architecture; b) SubNet architecture. In both cases, the input is a non-negative least squares reconstruction and the network is trained to reconstruct a projection into one subspace. In SubNet, the subspace basis is concatenated to the non-negative least squares reconstruction.
D FURTHER RECONSTRUCTIONS

We showcase more reconstructions on actual geophysics images taken from the BP2004 dataset in Figure 12. Note that all networks were trained on the LSUN bridges dataset.

[Figure 12 panels: Direct, ProjNets, and SubNet reconstructions, with training SNR of ∞ dB and testing SNRs of ∞ dB and 10 dB.]
Figure 12: Geophysics image patches taken from the BP2004 dataset. Our method recovers the correct global shapes with better accuracy, even when tested on noise levels different from those seen during training.
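Reconstruction quality in these figures is reported as SNR in decibels. The paper does not restate the formula in this appendix, so the helper below uses a common definition, 20 log10(‖x‖ / ‖x − x̂‖), as an assumption for illustration:

```python
import numpy as np

def snr_db(x, x_hat):
    """Reconstruction SNR in decibels: 20 * log10(||x|| / ||x - x_hat||)."""
    err = np.linalg.norm(x - x_hat)
    return np.inf if err == 0 else 20 * np.log10(np.linalg.norm(x) / err)

x = np.array([3.0, 4.0])        # ||x|| = 5
x_hat = np.array([3.0, 3.0])    # error norm = 1
print(snr_db(x, x_hat))         # 20 * log10(5) ≈ 13.98 dB
```

Under this definition, halving the error norm raises the SNR by about 6 dB, which is a convenient rule of thumb when comparing the reconstructions.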
[Figure 13 panels: Original, Direct, ProjNets, SubNet.]

Figure 13: Reconstructions from erasures on x-ray images at the largest erasure probability p considered.

E ERASURE RECONSTRUCTIONS
We show additional reconstructions for the largest corruption case p, for x-ray images (Figure 13) and geo images (Figure 14). Our method consistently achieves better SNR. More importantly, we note that there is not a single instance where the direct reconstruction recovers a feature that our methods do not, whereas the direct network frequently misses a feature of the image. This is highly undesirable in settings such as geophysical imaging.

[Figure 14 panels: Original, Direct, ProjNets, SubNet, with SNRs indicated on each reconstruction.]
Figure 14: Reconstructions from erasures on geo images at the largest erasure probability p considered.

Figure 15: Examples from the random shapes dataset which is used in Figure 6.

F S