Distributed optimization with tunable learned priors for robust ptycho-tomography
Selin Aslan, Zhengchun Liu, Viktor Nikitin, Tekin Bicer, Sven Leyffer, Doga Gursoy
Selin Aslan
X-ray Science Division, Argonne National Laboratory, Lemont, IL 60439
[email protected]

Zhengchun Liu
Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA

Viktor Nikitin
X-ray Science Division, Argonne National Laboratory, Lemont, IL 60439

Tekin Bicer
X-ray Science Division & Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439

Sven Leyffer
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439

Doga Gursoy
X-ray Science Division, Argonne National Laboratory, Lemont, IL 60439 and Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, Illinois 60208

September 22, 2020

Abstract
Joint ptycho-tomography is a powerful computational imaging framework to recover the refractive properties of a 3D object while relaxing the requirements for probe overlap that is common in conventional phase retrieval. We use an augmented Lagrangian scheme for formulating the constrained optimization problem and employ an alternating direction method of multipliers (ADMM) for the joint solution. ADMM allows the problem to be split into smaller and computationally more manageable subproblems: ptychographic phase retrieval, tomographic reconstruction, and regularization of the solution. We extend our ADMM framework with plug-and-play (PnP) denoisers by replacing the regularization subproblem with a general denoising operator based on machine learning. While the PnP framework enables integrating such learned priors as denoising operators, tuning of the denoiser prior remains challenging. To overcome this challenge, we propose a tuning parameter to control the effect of the denoiser and to accelerate the solution. In our simulations, we demonstrate that our proposed framework with parameter tuning and learned priors generates high-quality reconstructions under limited and noisy measurement data.

Keywords: ptychography · tomography · imaging · plug-and-play priors · learned priors

Ptychography [21] is a scanning-based coherent diffraction imaging technique that avoids the spatial resolution limitations of lens-based microscopy and, when data is acquired tomographically, can provide high-resolution imaging of volumetric samples in 3D [12]. In ptycho-tomography, a 3D object is scanned with a focused illumination beam to
collect a series of diffraction patterns through a pixelated far-field detector; see Fig. 1. The detector records the intensity images of the incoming wave on the detector plane; therefore, the phase of the wave needs to be recovered in order to recover the object from measured data through a computational procedure called phase retrieval. This scanning procedure can be repeated for different view angles of the 3D object around a common rotation axis in order to collect tomographic data and to recover the object in 3D.

Figure 1: Schematic of a ptycho-tomography experiment. A 3D object is scanned with a focused illumination beam at different scan points, while far-field diffraction images are collected with a pixelated detector. This process is repeated for each view of the object around a common rotation axis to collect tomographic data.

The conventional approach for reconstruction then consists of solving a 2D ptychographic phase retrieval problem independently for each angle, followed by a 3D tomographic reconstruction from the retrieved angular projections of the phase (and amplitude) of the object plane wave. Because phase retrieval algorithms require significant overlap (60% or more) between neighboring illuminations for a successful recovery, the sequential approach is not optimal and limits scanning large volumes within reasonable data collection times.

While the sequential approach, that is, first performing phase retrieval for each angle and then tomographically reconstructing the object, is still the method of choice in practice, recent efforts have focused on relaxing or avoiding the illumination overlap requirement. These methods pose the reconstruction problem in a joint fashion.
In other words, the phase retrieval problems for each rotation angle are solved simultaneously with the tomographic reconstruction through a joint optimization framework, resulting in a better-posed problem with those extra constraints and allowing for less stringent scanning requirements. Beginning with the first successful demonstration of the joint inversion concept through a numerical simulation [17], and later experimentally [24], recent efforts have focused on further relaxing these overlap constraints and finding new sparse scanning schemes for high-speed or dose-efficient implementations. Different optimizers, such as the Levenberg-Marquardt algorithm [32] and the Adam algorithm [13], have been used to successfully solve the joint problem. At the same time, an extensible and generic distributed optimization framework has been proposed [2] as a solution when additional experimental errors due to noise, motion blur, or other types of model mismatches need to be corrected. The framework is based on the alternating direction method of multipliers (ADMM) [5] and allows splitting the problem into smaller parts, where each subproblem can be solved with an independent optimizer. With this modular structure, the whole reconstruction procedure can be expanded by adding new subproblems that often emerge in practical experimental settings. For example, we can incorporate different types of prior knowledge to regularize the solution when data points are significantly reduced or when data is heavily corrupted with noise [30].

Choosing an appropriate prior for the model has been a challenging topic of many research papers in image processing. In order to tackle this challenge, several regularization methods have been introduced.
Some methods define priors explicitly in a regularized optimization framework, such as total variation (TV) [34], Tikhonov regularization [41], and other types of sparsity-based regularizations [42]; others, such as BM3D [9] and WNNM [16], do not have an explicit formulation as an optimization problem. More recently, learning-based denoisers have been popularized because of their success in improving the quality of low-dose images [23]. Unlike physics-based optimization methods, learned priors are based on training a mapping between noisy images and a desirable image, and they are often applied after the reconstruction step is completed [28] or, in some cases, before the reconstruction in order to improve the raw data [47]. Furthermore, with the aid of special hardware, the reconstruction times can be improved significantly. Incorporating learned priors into the ADMM framework is challenging, however, because the corresponding regularized optimization problem may not be explicitly defined. To overcome this challenge, Venkatakrishnan et al. [43] proposed the plug-and-play (PnP) framework, which enables integrating implicit priors for denoising, to complement model-based optimization methods.

Figure 2: The ground truth (left) and a representative ptycho-tomographic reconstruction of a slice of the 3D synthetic chip section (right) to demonstrate the effect of measurement noise on reconstructions. Because the high-frequency signals in the measurement are corrupted more than the low-frequency signals, the resulting effect in the reconstruction is a mix of strong blurring and weak speckle noise.

Although the PnP framework was originally proposed ad hoc, it has quickly been popularized in various inverse problems because of its performance [6, 18, 25, 33, 38, 40, 45, 48]. This success has also led to related studies; for example, the convergence of PnP has been discussed in [6, 7, 35, 39].

While the PnP framework provides flexible means to incorporate machine-learning-based denoising models into physics-based models, it has been used only for additive white Gaussian noise (AWGN) denoising of linear problems. In ptychography, however, or in phase retrieval problems in general, the problem is highly nonconvex and hard to solve optimally. In addition, the noise in the measurement data in ptychography differs from that in AWGN noise models in that it leads to image blurring; see Fig. 2 for a representative example. This is because the measurements are taken in Fourier space; thus, high-frequency signals dampen quickly and in turn are more corrupted than the low-frequency signals because of the Poisson measurement statistics. Therefore, state-of-the-art AWGN denoisers are not effective in addressing noise in ptycho-tomography. Moreover, ADMM can reach a modest accuracy even when the individual subproblems do not converge to optimal values [5], and this fact has been used for accelerating ptycho-tomography reconstruction [2]. When acceleration is used, however, the role of the denoiser for approximate solutions of the subproblems needs to be balanced for stabilizing the solution.
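The frequency dependence of Poisson noise described above can be reproduced with a small numerical experiment. The sketch below uses a toy 1-D object and a hypothetical photon budget (not the paper's simulation settings): the relative error of the simulated counts is small where the diffraction signal is strong (low frequencies) and large where it is weak (high frequencies).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D object with sharp edges, so its Fourier magnitude decays at high frequency.
n = 256
x = np.zeros(n)
x[n // 2 - 15 : n // 2 + 16] = 1.0

intensity = np.abs(np.fft.fft(x)) ** 2        # noise-free measurement |F x|^2
scale = 1e4 / intensity.max()                 # hypothetical photon budget
counts = rng.poisson(intensity * scale)       # Poisson-distributed detector counts

# Relative error per frequency bin: roughly 1/sqrt(mean counts), so it grows
# wherever the signal is weak, i.e., predominantly at high frequencies.
rel_err = np.abs(counts - intensity * scale) / (intensity * scale + 1e-12)

low = np.r_[0:8, n - 8 : n]                   # low-frequency bins (FFT ordering)
high = np.r_[n // 2 - 8 : n // 2 + 8]         # high-frequency bins
print(rel_err[low].mean() < rel_err[high].mean())
```

Because the corruption concentrates at high frequencies, the reconstruction error manifests as blurring rather than the white speckle assumed by AWGN denoisers, which is the motivation for the learned prior below.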
To this end, in addition to the learned priors that we propose for ptycho-tomography denoising, we introduce a tuning parameter that gives weight to the data fidelity term at earlier iterations and gradually increases the weight of the denoiser at later iterations as we get closer to the solution.

In this paper, we propose a distributed optimization framework for the ptycho-tomography problem using the PnP framework with learned priors. To accurately model our problem at low doses, we use the Poisson measurement process. We split the problem into three parts: ptychographic phase retrieval, tomographic reconstruction, and denoiser. We show the solution details of each subproblem. We also introduce and study the effects of a tuning parameter for stabilizing the subproblems when we solve each subproblem approximately to accelerate the solution of the problem. In the following, we summarize our contributions.

• Distributed optimization with GAN-based learned priors: We propose using a conditional coupled generative adversarial network (GAN) as a learned prior for filtering out the unique noise effects in the ptycho-tomography reconstruction. We implemented this approach on GPUs and validated its effectiveness with highly sparse data and noisy measurements by comparing with conventional offline denoising and TV regularization.

• Tuning parameter for acceleration of the solution: While PnP enables integrating denoising algorithms into the ADMM framework, the control of denoiser priors remains challenging in the existing PnP framework, especially when the subproblems are solved only approximately to accelerate the solution. To tackle this challenge, we propose a tuning parameter to control the effect of the learned priors on the reconstruction, and we present our analysis for effective selection of this parameter.
• Robust reconstruction with highly sparse sampled and noisy data: Parameter tuning and integration of learned priors into the optimization generate high-quality reconstructions with limited and noisy measurement data. Our results show that our optimizations can decrease the total number of required projections (with significantly fewer overlapped regions) by 75% compared with using adequately sampled data (based on Nyquist) while maintaining good image quality.

The remainder of this paper is organized as follows. In Section 2, we give an overview of the joint ptycho-tomography problem and its solution using the ADMM method. Section 3 describes the challenges in using the original PnP framework and how we tackle the problem. The training, network design, and other important implementation details of the framework are given in Section 4. In Section 5, we validate our proposed framework for the joint ptycho-tomography problem via simulated experiments. Discussion and conclusions are given in Section 6.
In this section, we formulate the ptycho-tomography forward and inverse problems and describe the ADMM scheme for the reconstruction.
In the ptycho-tomography problem, the model for reconstructing the complex refractive index of a 3D object, $x = \delta + i\beta$, is given by

$$\mathrm{Poisson}\{|\mathcal{G}\mathcal{H}x|^2\} = d. \quad (1)$$

Here we use a Poisson-based measurement model, which accurately captures photon-counting statistics in diffraction data in Fourier space. $\mathcal{G}$ is the ptychography operator, $\mathcal{H}$ is the tomography operator, $x$ is the unknown object, and $d$ is the measurement data. $\mathcal{G}$ is defined as $\mathcal{G}\psi = \mathcal{F}\mathcal{Q}\psi$, where $\psi = \mathcal{H}x$ is the object transmission function, $\mathcal{F}$ is the discrete Fourier transform operator, and $\mathcal{Q}$ is the illumination matrix. $\mathcal{H}$ is defined as $\mathcal{H}x = \exp(\imath k \mathcal{R}x)$, where $\imath = \sqrt{-1}$, $k$ is the wavenumber of the illumination beam, and $\mathcal{R}$ is the Radon transform [19].

Let $p(x \mid d)$ be the posterior conditional probability of having object $x$ given measurements $d$. Then, using Bayes's rule, the maximum a posteriori probability (MAP) estimate $x_{\mathrm{MAP}}$ is defined as follows:

$$x_{\mathrm{MAP}} = \arg\max_x \frac{p(d \mid x)\, p(x)}{p(d)} = \arg\min_x \; -\log\{p(d \mid x)\} - \log\{p(x)\}, \quad (2)$$

where $\log p(d \mid x)$ is the log-likelihood of the observation and $\log p(x)$ is the prior of $x$, also referred to as the regularization term. The MAP estimate (2) for the ptycho-tomography model (1) is given as

$$x_{\mathrm{MAP}} = \arg\min_x \sum_{j=1}^{n} \left( |\mathcal{G}\mathcal{H}x|_j^2 - d_j \log |\mathcal{G}\mathcal{H}x|_j^2 \right) + \phi N(x), \quad (3)$$

where $\phi N(x)$ is the regularization term used to stabilize or to constrain the solution. For simplicity of notation, $j$ indexes all measurement varieties, namely, detector pixel, rotation angle, and scan position. Next, we rewrite (3) in consensus form by using auxiliary variables $\psi$ and $\eta$:

$$\min_{\psi, \eta, x} \sum_{j=1}^{n} \left( |\mathcal{G}\psi|_j^2 - d_j \log |\mathcal{G}\psi|_j^2 \right) + \phi N(\eta), \quad \text{subject to} \quad \mathcal{H}x = \psi, \; x = \eta. \quad (4)$$

The objective function is a real-valued function of complex variables, and its augmented Lagrangian is a complex-valued function.
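The data-fidelity term in Eq. (3) can be evaluated directly once the intensity $|\mathcal{G}\mathcal{H}x|^2$ is available. A minimal sketch follows, with a hypothetical `intensity` array standing in for $|\mathcal{G}\mathcal{H}x|^2$ rather than the actual ptychography and tomography operators:

```python
import numpy as np

def poisson_nll(intensity, d, eps=1e-12):
    """Poisson negative log-likelihood sum_j (I_j - d_j log I_j), up to a
    constant independent of the object; I_j stands in for |GHx|^2_j."""
    return float(np.sum(intensity - d * np.log(intensity + eps)))

# The term-wise function I - d log I is minimized at I = d, so the objective
# is smallest when the modeled intensity matches the measured counts.
d = np.array([4.0, 9.0, 16.0])
print(poisson_nll(d, d) < poisson_nll(1.5 * d, d))
```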
We follow [27] and work with the following real-valued augmented Lagrangian:

$$L_{\rho,\tau}^{\lambda,\mu}(\psi, x, \eta) = \sum_{j=1}^{n} \left( |\mathcal{G}\psi|_j^2 - d_j \log |\mathcal{G}\psi|_j^2 \right) + \phi N(\eta) + 2\,\mathrm{Re}\{\lambda^H(\mathcal{H}x - \psi)\} + \rho \|\mathcal{H}x - \psi\|_2^2 + 2\,\mathrm{Re}\{\mu^H(x - \eta)\} + \tau \|x - \eta\|_2^2, \quad (5)$$

where $\rho > 0$ and $\tau > 0$ are penalty parameters, $\lambda$ and $\mu$ represent dual variables, and the superscript $H$ corresponds to the Hermitian conjugate. This augmented Lagrangian enables us to absorb the linear terms, $2\,\mathrm{Re}\{\lambda^H(\mathcal{H}x - \psi)\}$ and $2\,\mathrm{Re}\{\mu^H(x - \eta)\}$, into the corresponding L2 terms, $\rho\|\mathcal{H}x - \psi\|_2^2$ and $\tau\|x - \eta\|_2^2$.
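The completion-of-squares step mentioned above can be checked numerically: for any residual $r$ and dual variable $\lambda$, the linear and quadratic terms combine into a single shifted L2 term, $2\,\mathrm{Re}\{\lambda^H r\} + \rho\|r\|_2^2 = \rho\|r + \lambda/\rho\|_2^2 - \|\lambda\|_2^2/\rho$. A quick verification with random complex vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = rng.standard_normal(5) + 1j * rng.standard_normal(5)  # dual variable
r = rng.standard_normal(5) + 1j * rng.standard_normal(5)    # residual, e.g. Hx - psi
rho = 0.7                                                   # penalty parameter

# np.vdot conjugates its first argument, so vdot(lam, r) = lam^H r.
lhs = 2 * np.real(np.vdot(lam, r)) + rho * np.linalg.norm(r) ** 2
rhs = rho * np.linalg.norm(r + lam / rho) ** 2 - np.linalg.norm(lam) ** 2 / rho
print(np.isclose(lhs, rhs))  # → True
```

This identity is exactly why the ADMM subproblems below carry the shifted terms $\|\mathcal{H}x - \psi + \lambda/\rho\|_2^2$ and $\|x - \eta + \mu/\tau\|_2^2$.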
Minimization of (5) can be achieved by ADMM with the following iterations:

$$\psi^{k+1} = \arg\min_\psi \sum_{j=1}^{n} \left( |\mathcal{G}\psi|_j^2 - d_j \log |\mathcal{G}\psi|_j^2 \right) + \rho \left\| \mathcal{H}x^k - \psi + \lambda^k/\rho \right\|_2^2, \quad (6)$$

$$x^{k+1} = \arg\min_x \rho \left\| \mathcal{H}x - \psi^{k+1} + \lambda^k/\rho \right\|_2^2 + \tau \left\| x - \eta^k + \mu^k/\tau \right\|_2^2, \quad (7)$$

$$\eta^{k+1} = \arg\min_\eta \phi N(\eta) + \tau \left\| x^{k+1} - \eta + \mu^k/\tau \right\|_2^2, \quad (8)$$

which is then followed by the dual variable updates:

$$\lambda^{k+1} = \lambda^k + \rho \left( \mathcal{H}x^{k+1} - \psi^{k+1} \right), \quad (9)$$

$$\mu^{k+1} = \mu^k + \tau \left( x^{k+1} - \eta^{k+1} \right). \quad (10)$$

The dual variable updates promote the satisfaction of the constraints. Using the ADMM framework, we formulate the joint ptycho-tomography problem (2) in terms of three independently defined subproblems: ptychographic phase retrieval in Eq. (6), tomographic reconstruction in Eq. (7), and regularization in Eq. (8).

To analyze the convergence behavior, we monitor the optimality conditions for the ADMM problem, which are primal and dual feasibility. For our problem, the primal residuals for the two constraints at iteration $k+1$ are defined as follows:

$$r_1^{k+1} = \mathcal{H}x^{k+1} - \psi^{k+1}, \quad (11)$$

$$r_2^{k+1} = x^{k+1} - \eta^{k+1}, \quad (12)$$

which we call the first and second primal residuals, respectively. In addition, we define the residual for dual feasibility at iteration $k+1$ as follows:

$$s^{k+1} = \mathcal{H}x^{k+1} - \mathcal{H}x^k; \quad (13)$$

see [2, Section 2.3]. In Section 5, we present a numerical study of residual decays for our application. Next, we focus on the solutions of the subproblems. For the first subproblem, we minimize the following objective function:

$$F_P(\psi) = \sum_{j=1}^{n} \left( |\mathcal{G}\psi|_j^2 - d_j \log |\mathcal{G}\psi|_j^2 \right) + \rho \left\| \mathcal{H}x^k - \psi + \lambda^k/\rho \right\|_2^2. \quad (14)$$

The corresponding gradient is

$$\nabla_\psi F_P(\psi) = \mathcal{G}^H \left( \mathcal{G}\psi - \frac{d}{(\mathcal{G}\psi)^*} \right) - \rho \left( \mathcal{H}x^k - \psi + \lambda^k/\rho \right), \quad (15)$$

which is computed by using the Wirtinger calculus [22].
Here $^*$ denotes the complex conjugate. For the solution, we use the conjugate gradient (CG) method [31]:

$$\psi_{m+1} = \psi_m + \gamma_m \xi_m, \quad (16)$$

where $\gamma_m$ is a step length computed via a backtracking line search method and $\xi_m$ is the search direction. The first search direction is the steepest descent direction, $\xi_0 = -\nabla_\psi F_P(\psi_0)$. For the other iterations, $\xi_{m+1}$ is computed recursively by using the Dai-Yuan formula [10], which gives the fastest convergence in our simulations:

$$\xi_{m+1} = -\nabla_\psi F_P(\psi_{m+1}) + \frac{\|\nabla_\psi F_P(\psi_{m+1})\|_2^2}{y_m^H \xi_m}\, \xi_m, \quad (17)$$

where $y_m = \nabla_\psi F_P(\psi_{m+1}) - \nabla_\psi F_P(\psi_m)$.
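The CG iteration (16)-(17) can be sketched for real-valued problems as follows. This is a simplified stand-in with a generic objective, gradient, and Armijo backtracking line search, not the Wirtinger-calculus implementation used for the complex ptychography variables:

```python
import numpy as np

def dai_yuan_cg(f, grad, x0, iters=200):
    """Nonlinear CG with the Dai-Yuan formula and a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    xi = -g                                   # steepest descent on first iteration
    for _ in range(iters):
        slope = g @ xi
        if slope >= 0:                        # safeguard: restart if not a descent direction
            xi, slope = -g, -(g @ g)
        gamma, fx = 1.0, f(x)                 # Armijo backtracking for the step length
        while f(x + gamma * xi) > fx + 1e-4 * gamma * slope and gamma > 1e-12:
            gamma *= 0.5
        x = x + gamma * xi
        g_new = grad(x)
        y = g_new - g
        denom = y @ xi
        beta = (g_new @ g_new) / denom if abs(denom) > 1e-12 else 0.0  # Dai-Yuan formula
        xi = -g_new + beta * xi
        g = g_new
    return x

# Example: minimize a small convex quadratic 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
sol = dai_yuan_cg(lambda v: 0.5 * v @ A @ v - b @ v, lambda v: A @ v - b, np.zeros(2))
```

On this quadratic the iterate approaches the solution of $Ax = b$; in the paper's setting the same recursion is applied to $F_P$ and, below, to $F_T$.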
For solving the subproblem with respect to $x$ in Eq. (7), we transform the nonlinearity introduced by $\mathcal{H}x$ as in [30] and instead minimize the following objective function:

$$F_T(x) = \rho \|\mathcal{K}\mathcal{R}x - \zeta\|_2^2 + \tau \left\| x - \eta^k + \mu^k/\tau \right\|_2^2, \quad (18)$$

where the linear diagonal operator $\mathcal{K}$ is defined as

$$\mathcal{K}\mathcal{R}x = 2\pi\imath\nu \left( \psi^{k+1} - \lambda^k/\rho \right) \mathcal{R}x \quad (19)$$

and $\zeta$ is given by

$$\zeta = \left( \psi^{k+1} - \lambda^k/\rho \right) \log\left( \psi^{k+1} - \lambda^k/\rho \right). \quad (20)$$

Hence, we replace the objective function in Eq. (7) with Eq. (18). The gradient is given as follows:

$$\nabla_x F_T(x) = \rho \mathcal{R}^T \mathcal{K}^H \left( \mathcal{K}\mathcal{R}x - \zeta \right) + \tau \left( x - \eta^k + \mu^k/\tau \right). \quad (21)$$

As in the ptychography subproblem, we use the CG method with the Dai-Yuan formula; see (16) and (17).

While Eqs. (6)-(7) can be solved via well-known optimization methods, the solution of Eq. (8) depends on the choice of the image prior. The question of how to choose a prior, $-\log\{p(x)\} = \phi N(\eta)$, is a challenging topic in image processing. While one can choose an explicit image prior and measure its distance using the TV norm, we turn our attention to learning-based priors because of their flexibility and efficient implementation.

In this section, we discuss the solution of the denoising problem. We first rewrite Eq. (8) for some prior $N(\eta)$ as follows:

$$\eta^{k+1} = \arg\min_\eta N(\eta) + \frac{\tau}{\phi} \left\| \tilde{x}^{k+1} - \eta \right\|_2^2, \quad (22)$$

where $\tilde{x}^{k+1} = x^{k+1} + \mu^k/\tau$ and $\eta$ correspond to the noisy and noise-free images, respectively. Several state-of-the-art denoisers do not have closed-form expressions for the prior $N(\eta)$. Hence, integrating these denoisers into the joint ptycho-tomography problem is challenging. We use the PnP framework [43] to replace Eq. (22) with a general denoising operator as follows:

$$\eta^{k+1} = \mathrm{Denoiser}\left( \tilde{x}^{k+1} \right), \quad (23)$$

where an explicit definition of the image prior $N(\eta)$ is not necessarily known.
While PnP was originally proposed to remove AWGN with a noise variance set by the ratio $\tau/\phi$, the method has been extended to Poisson inverse problems [33]. In this work, we use a Poisson-based model to accurately capture photon-counting statistics in diffraction data. We still use the ADMM to solve Eq. (4): the first two subproblems, corresponding to ptychographic phase retrieval and tomography, remain the same, while the last subproblem, corresponding to regularization, is replaced with the denoising operator in Eq. (23). The PnP framework allows us to use state-of-the-art denoising algorithms, such as BM3D [9], K-SVD [14], and WNNM [16]. Although the PnP framework does not give a clear definition of the objective function because of the implicit regularization parameter, the method has shown empirical success in various image reconstruction problems [18, 25, 33, 45].

Alternatively, deep-learning-based denoisers have shown great success implementing the PnP framework; see, for example, [8, 29, 49]. In this work, we use our recently developed denoising technique based on GAN, whose implementation details are discussed in the following section.

We point out that the regularization parameter $\phi$ that tunes the regularization term in Eq. (3) is associated with the additive noise in the denoising operator through the ratio $\tau/\phi$. In our application, we have observed that replacing the regularizer problem, Eq. (22), by the denoising operator in Eq. (23) can lead to divergence of the overall ADMM scheme. In particular, it appears that the denoiser pushes early iterations to nonphysical solutions from which the ADMM cannot recover. This observation motivates the introduction of a denoising parameter, $\alpha_k \in [0, 1]$, that controls the influence of the denoising operator. In particular, we rewrite Eq. (23) as

$$\eta^{k+1} = \alpha_k\, \mathrm{Denoiser}(\tilde{x}^{k+1}) + (1 - \alpha_k)\, \tilde{x}^{k+1}, \quad (24)$$

which makes $\eta^{k+1}$ a convex combination of the denoised reconstruction, $\mathrm{Denoiser}(\tilde{x}^{k+1})$, and the noisy reconstruction, $\tilde{x}^{k+1}$.
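Eq. (24) amounts to a one-line blending step. The sketch below, with a simple moving-average filter standing in for the trained denoiser (not TomoGAN), illustrates the two extremes:

```python
import numpy as np

def tuned_denoise(x_tilde, denoiser, alpha):
    """Eq. (24): convex combination of the denoised and noisy reconstructions."""
    assert 0.0 <= alpha <= 1.0
    return alpha * denoiser(x_tilde) + (1.0 - alpha) * x_tilde

# A 3-point moving average as a toy stand-in denoiser.
box = lambda v: np.convolve(v, np.ones(3) / 3, mode="same")

x_tilde = np.array([0.0, 3.0, 0.0])
print(tuned_denoise(x_tilde, box, 0.0))  # alpha = 0: x_tilde passes through unchanged
print(tuned_denoise(x_tilde, box, 1.0))  # alpha = 1: full PnP denoising
```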
The extremes $\alpha_k = 0$ and $\alpha_k = 1$ correspond to the maximum likelihood (ML) estimate (i.e., no regularizer) and full denoising (i.e., the PnP denoiser), respectively. In our implementations, we tune $\alpha_k$ to provide fast convergence to good reconstructions. An alternative tuning operator has also been proposed in [46] via denoiser scaling. In Section 5, we discuss the effect of tuning parameters for the ptycho-tomography problem.

One challenge that arises from including Eq. (24) in the ADMM framework is that it does not directly correspond to an optimization problem (unless the denoiser can be written as a gradient) and therefore cannot directly be included in the augmented Lagrangian (5). This makes it harder to generalize the traditional augmented Lagrangian or ADMM convergence theory.
Figure 3: Model training pipeline (generator, discriminator, and pre-trained VGG networks coupled through a pixel-wise L2-norm (MSE), a Wasserstein-distance adversarial loss, and a perceptual loss, with back-propagation and weight updates driven by their weighted average). Once the model is trained, only the generator is used as the learned prior to advance the tomographic reconstructions.
In this section we discuss the implementation aspects of our approach; see Algorithm 1. For the ptychography and tomography subproblems, we use the same solvers (CG) as in our previous work; see Section 4 in [30]. Hence, we devote this section to the details of the denoising operator used in Eq. (24).

In our simulations, we use our recently developed denoiser, TomoGAN [28], an image-quality enhancement model based on generative adversarial networks [15] originally developed for low-dose X-ray imaging, as the learned prior. Figure 3 shows the training pipeline of the model, where two neural networks (i.e., generator and discriminator) contend with each other during the training until an equilibrium is reached. Specifically, the generative network generates noise-free images from noisy images while the discriminative network evaluates them; thus both networks are trained through the competition. VGG [37] is a neural network model with 16 convolutional neural network (CNN) layers followed by three fully connected layers for image classification. Here, the VGG was pretrained with the ImageNet dataset [11], and we keep only the CNN layers to work as a feature extractor for quantifying the difference between the denoised image and the true image in VGG's feature space. The generator model works as the learned prior (i.e., the denoiser in Eq. (24)) once trained by using this pipeline. That is, we can input a noisy image to the generator, and it outputs the corresponding enhanced image.

The TomoGAN generator network architecture is a variation of the U-Net architecture proposed for biomedical image segmentation by Shan et al. [36]. It comprises a down-sampling network followed by an up-sampling network. In the down-sampling process, three sets of two convolution kernels extract feature maps. Each set is followed by a pooling layer, whereby the feature map projections are distilled to their most essential elements by a signal-maximizing process, ultimately reducing the feature maps to a fraction of the original size; successful training should result in the 128 channels of this feature map retaining the important features. In the up-sampling process, bilinear interpolation is used to expand the feature maps. At each layer, high-resolution features from the down-sampling path are concatenated to the up-sampled output from the layer below to form a large number of feature channels. This structure allows the network to propagate context information to higher-resolution layers, so that the following convolution layer can learn to assemble a more precise output based on this information. The detailed TomoGAN generator architecture can be found in [28].

We implemented TomoGAN with TensorFlow [1] and used one NVIDIA Tesla V100 GPU card for training. The Adam algorithm [26] was used to train both the generator and the discriminator, with a batch size of 16 samples. As a data augmentation to avoid overfitting, each image in the batch is a 128 × 128 patch randomly cropped from a larger image.
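The down- and up-sampling steps described above can be sketched in plain NumPy. These are illustrative stand-ins for the framework ops, and the 2x factors and shapes below are assumptions for the sketch, not TomoGAN's exact configuration:

```python
import numpy as np

def max_pool2(a):
    """2x2 max pooling: keeps the strongest response in each block,
    halving both spatial dimensions (assumed even)."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def bilinear_up2(a):
    """2x bilinear up-sampling of a 2-D feature map (edge pixels extrapolated)."""
    h, w = a.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    a00, a01 = a[np.ix_(y0, x0)], a[np.ix_(y0, x0 + 1)]
    a10, a11 = a[np.ix_(y0 + 1, x0)], a[np.ix_(y0 + 1, x0 + 1)]
    return (1 - wy) * ((1 - wx) * a00 + wx * a01) + wy * ((1 - wx) * a10 + wx * a11)

fmap = np.arange(16.0).reshape(4, 4)
pooled = max_pool2(fmap)          # (4, 4) -> (2, 2)
upsampled = bilinear_up2(pooled)  # (2, 2) -> (4, 4)
# Skip connection: stack the high-resolution features with the up-sampled
# output along a (hypothetical) channel axis, as in the U-Net structure.
skip = np.stack([fmap, upsampled])
```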
Algorithm 1: Joint ptycho-tomography reconstruction with learned priors

Given $0 \le \alpha_k \le 1$, $\rho > 0$, $\tau > 0$
Initialize: $\psi^0, \eta^0, x^0, \lambda^0, \mu^0$
for $k = 0, 1, 2, \ldots$ do
  $\psi^{k+1} \leftarrow \arg\min_\psi \sum_{j=1}^{n} \left( |\mathcal{G}\psi|_j^2 - d_j \log |\mathcal{G}\psi|_j^2 \right) + \rho \left\| \mathcal{H}x^k - \psi + \lambda^k/\rho \right\|_2^2$
  $x^{k+1} \leftarrow \arg\min_x \rho \left\| \mathcal{H}x - \psi^{k+1} + \lambda^k/\rho \right\|_2^2 + \tau \left\| x - \eta^k + \mu^k/\tau \right\|_2^2$
  $\tilde{x}^{k+1} \leftarrow x^{k+1} + \mu^k/\tau$
  $\eta^{k+1} \leftarrow \alpha_k\, \mathrm{Denoiser}(\tilde{x}^{k+1}) + (1 - \alpha_k)\, \tilde{x}^{k+1}$
  $\lambda^{k+1} \leftarrow \lambda^k + \rho \left( \mathcal{H}x^{k+1} - \psi^{k+1} \right)$
  $\mu^{k+1} \leftarrow \mu^k + \tau \left( x^{k+1} - \eta^{k+1} \right)$
end for
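Algorithm 1 maps directly onto a short loop. The sketch below keeps the algorithmic structure but plugs in toy linear stand-ins (identity tomography operator, quadratic data fit in place of the Poisson term, identity denoiser) purely to exercise the mechanics; the paper's actual subproblem solvers are the CG iterations described above:

```python
import numpy as np

def admm_pnp(solve_psi, solve_x, denoiser, H, x0, alphas, rho=1.0, tau=1.0, iters=100):
    """Skeleton of Algorithm 1 with user-supplied (approximate) subproblem solvers."""
    x = x0.copy()
    psi, eta = H(x), x.copy()
    lam, mu = np.zeros_like(psi), np.zeros_like(x)
    for k in range(iters):
        psi = solve_psi(H(x) + lam / rho)                       # phase retrieval, Eq. (6)
        x = solve_x(psi - lam / rho, eta - mu / tau, rho, tau)  # tomography, Eq. (7)
        x_tilde = x + mu / tau
        a = alphas[min(k, len(alphas) - 1)]
        eta = a * denoiser(x_tilde) + (1 - a) * x_tilde         # tuned denoising, Eq. (24)
        lam = lam + rho * (H(x) - psi)                          # dual updates, Eqs. (9)-(10)
        mu = mu + tau * (x - eta)
    return x

# Toy check: with H = identity, a quadratic data fit ||psi - d||^2, and an
# identity denoiser, the iteration should recover the "measurement" d.
d = np.array([3.0, -1.0])
x_rec = admm_pnp(
    solve_psi=lambda b, rho=1.0: (d + rho * b) / (1 + rho),     # prox of ||psi - d||^2
    solve_x=lambda c1, c2, rho, tau: (rho * c1 + tau * c2) / (rho + tau),
    denoiser=lambda v: v,
    H=lambda v: v,
    x0=np.zeros(2),
    alphas=[0.5],
)
```

In the toy run, `x_rec` converges to `d`; in the real setting each callback is itself an approximate (early-terminated) CG solve.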
Figure 4: (Left) 3D object [30]. (Right) 2D slice of the 3D synthetic chip showing the real part of the object, δ.

Figure 5: Average photon counts at the detector for different noise levels. From left to right, the noise level increases.

In this section we demonstrate the effectiveness of applying the proposed framework for the reconstruction of a 3D simulated chip object; see Figure 4, left. First, we provide the simulation details. Next, we show reconstruction results and further analyze results for undersampled measurements. Then, we discuss the effect of the tuning parameters, α, on convergence and show the reconstruction quality for each parameter using the full reference metric, the SSIM index [44].

In our experiments the object is a simulated chip of size N × N × N with N = 512. The 3D simulated chip and its 2D slice are given in Figure 4. Our interest is to recover the object that is defined by its complex refractive index, x = δ + iβ. We use a flat-top Gaussian probe function, and the far-field diffraction patterns are recorded by a pixelated detector. The beam energy is chosen to simulate the refractive index values for the ptychographic data. We emulate a ptychographic experiment by simulating a 3D chip, where δ yields the main imaging contrast. We distort the data with Poisson noise. In Fig. 5 we demonstrate the effect of Poisson noise on the measured data for three average detector photon-count ranges, I₁, I₂, and I₃, with decreasing maximum counts. As the interval decreases, the simulations become noisier. Initially, the distance between adjacent probe center positions is set to 8 pixels, which corresponds to a large probe overlap. The object is rotated at regular intervals from 0 to π. Our main goal is to generate good-quality reconstructions with undersampled data.

Therefore, we further decrease the probe overlap and reduce the number of rotation angles over 0 to π, using only a quarter of the necessary projections.

For acceleration of the ADMM, we use 4 inner CG iterations for the ptychography and tomography problems, and the ADMM outer iteration limit is set to 250. Early termination is a common practice to accelerate the ADMM solution; see the review in [5] and a more detailed analysis for our application in [2]. Further accelerations are possible by varying the penalty parameters ρ and τ dynamically during the ADMM iterations [5, Eq. (3.13)].
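The residual-balancing rule referenced from [5, Eq. (3.13)] can be sketched as follows; the factors 10 and 2 are the commonly quoted defaults and are an assumption here, not values from this paper:

```python
def update_penalty(rho, r_norm, s_norm, mu=10.0, tau_incr=2.0, tau_decr=2.0):
    """Grow rho when the primal residual dominates the dual residual, shrink it
    when the dual residual dominates, and leave it unchanged otherwise.
    Note: when rho changes, the scaled dual variable must be rescaled to match."""
    if r_norm > mu * s_norm:
        return rho * tau_incr
    if s_norm > mu * r_norm:
        return rho / tau_decr
    return rho

print(update_penalty(1.0, 100.0, 1.0))  # primal residual dominates: rho grows
print(update_penalty(1.0, 1.0, 100.0))  # dual residual dominates: rho shrinks
```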
In this section we demonstrate the effect of learned priors for the joint ptycho-tomography problem. To quantify image quality degradation, we use the full reference metric, the SSIM index [44].

In the first simulation, the adjacent probe overlap and the number of projection angles are chosen to satisfy the Nyquist criterion. We refer to this case as well-sampled data. Next, we reduce the probe overlap and use only a quarter of the projection angles. We refer to this case as undersampled data. In Fig. 6 we report reconstruction results for the real part of the object, δ, using four different reconstruction approaches: (1) the ML estimate (i.e., no regularizer); (2) the MAP estimate with the TV prior; (3) the ML estimate followed by the TomoGAN postprocessor, denoted by ML+TomoGAN; and (4) the MAP estimate with learned priors, denoted by PNP+TomoGAN. The tuning parameters α_k are chosen according to the heuristic discussion in the next subsection. While the first row of Fig. 6 corresponds to the well-sampled data, the remaining rows correspond to the undersampled data at different probe intensity levels. In the first row, we observe that most of the features are recovered with well-sampled data. While ML+TomoGAN keeps some of the artifacts generated by ML, PNP+TomoGAN removes the artifacts with the help of iterative denoising.

In the remainder of this section, we focus on reconstructions with undersampled data in Fig. 6, Rows 2-4. While the features are sharper in the low-noise simulations, I₁, the loss of quality is clear as the noise level increases through I₂ and I₃. Without prior knowledge, reconstructions suffer from high noise levels, as observed from the low SSIM index values.
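For reference, the SSIM comparison used throughout this section has the following form. The sketch computes a simplified single-window (global) variant with the standard constants; library implementations, and presumably the metric of [44], use local sliding windows:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified single-window SSIM between two images x and y in [0, data_range]."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()  # covariance between the two images
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(2)
truth = rng.random((32, 32))
noisy = np.clip(truth + 0.2 * rng.standard_normal((32, 32)), 0.0, 1.0)
print(ssim_global(truth, truth), ssim_global(truth, noisy))
```

Identical images score 1, and additive noise lowers the score, which is the sense in which the SSIM values reported in Fig. 6 rank the reconstructions.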
On the other hand, even with a sparse regularizer such as TV, the blurring noise effect is visible in all reconstructions, because sparse priors do not deblur images.

While using TomoGAN as a postprocessor can generate good-quality reconstructions at lower noise levels, the reconstruction quality depends strongly on the noisy input image. Hence, the degradation in image quality is highest for I₃; see Fig. 6, last row. At high noise levels, the effect of learned priors is more drastic because the small features in the ML estimate are hardly separable from the background. Our proposed framework generates sharp images with significantly higher SSIM index values. Simulations show that the proposed method can decrease the total number of projections by 75% relative to Nyquist sampling while generating reconstructions of acceptable quality. Although small artifacts are introduced in the reconstructions with undersampled measurements, the SSIM index is still acceptably high and can be improved by extending the training data.

Our evaluation shows that our optimizations can decrease the total number of required projections (with significantly fewer overlapped regions) by 75% compared with using adequately sampled data (based on Nyquist) while keeping the SSIM index value greater than 0.85.

In this section we investigate the effect of the tuning parameters, α_k, on reconstruction quality and convergence behavior using six representative schemes. These schemes are referred to as α-schedules, and they are depicted in Fig. 7. In Fig. 8 we give the line profiles of the ground truth and the normalized reconstructions with the corresponding α_k sequences. To demonstrate reconstruction quality, we also provide the reconstruction results of a 2D slice of the simulated chip for each tuning parameter and report the SSIM index value [44] on each image.
In some cases, we observe that the MSE loss in TomoGAN causes some peak-amplitude information to be lost because it fits the average; however, this loss does not affect the image quality notably, as confirmed by the relatively high SSIM index values. To analyze the effect of the tuning parameter on convergence, we use the primal and dual residuals given in Eqs. (11)–(13) and demonstrate in Fig. 9 that poor choices of α lead to divergence.

Based on our observations, choosing the general PnP denoising operator in Eq. (23), α , leads to severe degradation of the reconstruction, as depicted in Fig. 8. Furthermore, the algorithm diverges quickly, as confirmed in Fig. 9. Overall, we observe that choosing large α values in the early iterations of the ADMM, or for several consecutive iterations, leads to divergence in Fig. 9; see α − α . Clearly, α_k ≈ 0 turns off the denoiser and generates noisy reconstructions; see the ML estimate reconstructions in Fig. 6.

To summarize, we obtain poor reconstructions in the early ADMM iterations for the joint ptycho-tomography problem; hence, reducing the denoiser effect is essential in the first few tens of outer iterations. Because of the early termination in the optimization of ψ and x, we set α = [10 − ] for the first 40 iterations. Furthermore, we observe that solving the ψ and x subproblems with a higher number of inner iterations does not improve the reconstruction quality in the early ADMM iterations, and the tuning parameter is still needed. Next, we set α = [0 . ] to apply the denoising operator only incrementally. Then, we set α = [1] to maximize the denoiser effect as a postprocessing step. This selection not only gives one of the highest SSIM index values but also the fastest convergence behavior, as can be confirmed in Fig. 9. While α also generates a good-quality reconstruction, we observe oscillation in the convergence behavior when α is set to 0.3 every 50 iterations.
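The staged schedule described above (a small α early, an intermediate value next, and α = 1 late) can be expressed as a piecewise-constant function of the outer-iteration index. The levels and break points below are illustrative placeholders chosen for the sketch, not the exact values used in our experiments.

```python
def alpha_schedule(k, levels=(1e-2, 0.5, 1.0), breaks=(40, 200)):
    """Piecewise-constant tuning parameter alpha_k for outer ADMM iteration k.

    A small alpha suppresses the denoiser while early iterates are still poor,
    an intermediate value applies it incrementally, and alpha = 1 applies the
    full denoiser toward the end, as a postprocessing-like step.
    """
    if k < breaks[0]:
        return levels[0]  # weak denoiser during the first iterations
    if k < breaks[1]:
        return levels[1]  # incremental denoising
    return levels[2]      # full denoiser effect

# One value per outer ADMM iteration, e.g., for a 250-iteration run.
schedule = [alpha_schedule(k) for k in range(250)]
```

Any of the six α-schedules in Fig. 7 can be encoded this way by changing the levels and break points (or by making α periodic, as in the oscillating schedule).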
Our observations show that an effective tuning parameter satisfies the convergence criteria and produces good-quality reconstructions. We point out that the choice of tuning parameter depends on the application, but our observations are widely applicable: the goal of this section is not to provide an optimal tuning parameter but to share observations that help in choosing an effective one.

Figure 6: Reconstruction results of a typical slice of the 3D synthetic chip, δ, for different probe intensity levels. The first row corresponds to reconstructions with probe overlap and 768 projection angles at a high noise level. The remaining rows correspond to reconstructions with probe overlap and 192 projection angles at various noise levels. The ADMM outer iteration count is 250, and the inner CG iterations are set to 4 for each ptychography and tomography subproblem, followed by TomoGAN. The tuning parameter is defined as α_k = {[10 − ], [0 . ], [1]}.

Figure 7: Representative α values for 250 outer ADMM iterations.

In this paper, we derive a generic reconstruction framework for solving the joint ptycho-tomography problem with learned priors. The framework splits the joint problem into three parts: ptychography, tomography, and a learned denoiser. The PnP framework is proposed as a flexible way to add state-of-the-art priors to the ADMM. For the joint ptycho-tomography problem, however, such denoisers are not effective because of the blurring effect in the reconstruction. To this end, we first use a Poisson process to accurately model our measurements; then, we improve reconstruction quality with deep-learning-based priors.

A popular way to speed up the ADMM is through early termination of the subproblems. In our previous work [2], we observed that solving only a few iterations of the ptychography and tomography subproblems still yields good-quality reconstructions. In this work, we showed that the general PnP framework leads to poor denoising, visible as large white blocks in the reconstructions; see α and α in Fig. 8. In our simulations, we discuss the importance of a regularization parameter and introduce a tuning parameter to control the denoiser. Currently, this parameter requires tuning and convergence analysis; a search for an optimal parameter is left to a forthcoming paper.

Another way to improve the time-to-solution performance of the ADMM is to use high-performance many-core architectures such as GPUs. Depending on the algorithm used, the solution of each subproblem defined in our framework can require significant computational throughput [3, 30]. In our work, we implemented the main solvers in CUDA and accelerated their computations on NVIDIA RTX 2080 GPUs. Similarly, we implemented TomoGAN in TensorFlow, which can be ported to and executed on a variety of GPUs, for efficient training and inference operations.
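The three-way ADMM split with a tunable learned prior can be sketched as follows. The subproblem solvers here are toy stand-ins (our framework uses CG-based ptychography and tomography solvers and TomoGAN), and the convex blend between the identity and the denoiser is one plausible way to realize the tuning parameter α_k; the exact update in our method may differ.

```python
import numpy as np

# Hypothetical stand-ins for the three subproblems of the split.
def solve_ptychography(psi, data):      # few inner iterations (early termination)
    return 0.5 * (psi + data)

def solve_tomography(x, psi, lam):      # few inner iterations
    return 0.5 * (x + psi - lam)

def denoise(x):                         # learned denoiser stand-in (e.g., TomoGAN)
    return np.clip(x, 0.0, 1.0)

def alpha_schedule(k):                  # illustrative staged schedule
    return 1e-2 if k < 40 else (0.5 if k < 200 else 1.0)

def pnp_admm(data, n_outer=250):
    """Sketch of the three-block PnP-ADMM loop with a tunable denoiser.

    alpha ~ 0 effectively turns the learned prior off; alpha = 1 applies
    the full denoiser, acting like a postprocessing step late in the run.
    """
    psi = data.copy()
    x = data.copy()
    lam = np.zeros_like(data)           # dual variable (multiplier)
    for k in range(n_outer):
        psi = solve_ptychography(psi, data)   # ptychography subproblem
        x = solve_tomography(x, psi, lam)     # tomography subproblem
        a = alpha_schedule(k)
        x = (1.0 - a) * x + a * denoise(x)    # tunable learned prior
        lam = lam + psi - x                   # multiplier update
    return x
```

With the real solvers, the primal and dual residuals of this loop are what Fig. 9 monitors; large α early makes the denoiser dominate before the iterates are meaningful, which is where the divergence originates.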
We plan to further improve the computational performance of our solvers and intermediate steps using the methods introduced in our previous works [4, 20] and to provide a comprehensive evaluation in future work.

In this work, we focused on undersampled and highly noisy measurements, reducing the probe overlap to and the projection angles to N/ ; hence, we decrease the total number of projections by 75% compared with well-sampled data based on Nyquist. Our simulations show that the benefit of our framework is most evident at high noise levels. While TomoGAN as a postprocessor improves reconstruction quality at lower noise levels, the degradation in image quality is substantial at high noise; our proposed framework, on the other hand, generates a reconstructed object with minimal loss in quality. While we observe additional artifacts in the reconstructions, we expect that the image quality can be improved further by extending the training data.

Acknowledgments
This research used resources of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.
Figure 8: Line profiles. The first row shows the line profiles of the normalized reconstructions for six different tuning parameters together with the ground-truth image. The reconstruction results for the corresponding tuning parameters are given below. Line profiles are highlighted with a yellow line on each image.

Figure 9: Convergence analysis for various tuning parameter selections. In our experiments, α and α lead to divergence in the primal and dual residual plots, depicted as PR and DR in the lower right figure.

References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16), pages 265–283, Berkeley, CA, USA, 2016. USENIX Association.
[2] S. Aslan, V. Nikitin, D. J. Ching, T. Bicer, S. Leyffer, and D. Gürsoy. Joint ptycho-tomography reconstruction through alternating direction method of multipliers. Optics Express, 27(6):9128–9143, 2019.
[3] T. Bicer, D. Gürsoy, V. D. Andrade, R. Kettimuthu, W. Scullin, F. D. Carlo, and I. T. Foster. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets. Advanced Structural and Chemical Imaging, 3(1):6, 2017.
[4] T. Bicer, D. Gürsoy, R. Kettimuthu, F. De Carlo, G. Agrawal, and I. T. Foster. Rapid tomographic image reconstruction via large-scale parallelization. In J. L. Träff, S. Hunold, and F. Versaci, editors, Euro-Par 2015: Parallel Processing, pages 289–302, Berlin, Heidelberg, 2015. Springer.
[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
[6] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman. Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium. SIAM Journal on Imaging Sciences, 11(3):2001–2020, 2018.
[7] S. H. Chan, X. Wang, and O. A. Elgendy. Plug-and-play ADMM for image restoration: Fixed-point convergence and applications. IEEE Transactions on Computational Imaging, 3(1):84–98, 2017.
[8] J. H. R. Chang, C.-L. Li, B. Poczos, B. V. K. V. Kumar, and A. C. Sankaranarayanan. One network to solve them all – solving linear inverse problems using deep projection models. In IEEE International Conference on Computer Vision (ICCV), pages 5888–5897, 2017.
[9] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16:2080–2095, 2007.
[10] Y. H. Dai and Y. Yuan. A nonlinear conjugate gradient method with a strong global convergence property. SIAM Journal on Optimization, 10(1):177–182, 1999.
[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[12] M. Dierolf, A. Menzel, P. Thibault, P. Schneider, C. M. Kewish, R. Wepf, O. Bunk, and F. Pfeiffer. Ptychographic X-ray computed tomography at the nanoscale. Nature, 467(7314):436–439, 2010.
[13] M. Du, Y. S. G. Nashed, S. Kandel, D. Gursoy, and C. Jacobsen. Three dimensions, two microscopes, one code: automatic differentiation for X-ray nanotomography beyond the depth of focus limit, 2019.
[14] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15:3736–3745, 2006.
[15] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[16] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
[17] D. Gürsoy. Direct coupling of tomography and ptychography. Optics Letters, 42(16):3169–3172, 2017.
[18] J. He, Y. Yang, Y. Wang, D. Zeng, Z. Bian, H. Zhang, J. Sun, Z. Xu, and J. Ma. Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction. IEEE Transactions on Medical Imaging, 38(2):371–381, 2019.
[19] S. Helgason. The Radon Transform, volume 2. Springer, 1999.
[20] M. Hidayetoğlu, T. Biçer, S. G. De Gonzalo, B. Ren, D. Gürsoy, R. Kettimuthu, I. T. Foster, and W.-m. W. Hwu. MemXCT: Memory-centric X-ray CT reconstruction with massive parallelization. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–56, 2019.
[21] W. Hoppe. Beugung im inhomogenen Primärstrahlwellenfeld, I: Prinzip einer Phasenmessung. Acta Crystallographica A, 25:495–501, 1969.
[22] R. Hunger. An Introduction to Complex Differentials and Complex Differentiability. Munich University of Technology, Institute for Circuit Theory and Signal Processing, 2007.
[23] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26(9):4509–4522, 2017.
[24] M. Kahnt, J. Becher, D. Brückner, Y. Fam, T. Sheppard, T. Weissenberger, F. Wittwer, J.-D. Grunwaldt, W. Schwieger, and C. G. Schroer. Coupled ptychography and tomography algorithm improves reconstruction of experimental data. Optica, 6(10):1282–1289, 2019.
[25] U. S. Kamilov, H. Mansour, and B. Wohlberg. A plug-and-play priors approach for solving nonlinear imaging inverse problems. IEEE Signal Processing Letters, 24(12):1872–1876, 2017.
[26] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, 2014.
[27] L. Li, X. Wang, and G. Wang. Alternating direction method of multipliers for separable convex optimization of real functions in complex variables. Mathematical Problems in Engineering, 2015:104531, 2015.
[28] Z. Liu, T. Bicer, R. Kettimuthu, D. Gursoy, F. De Carlo, and I. Foster. TomoGAN: low-dose synchrotron X-ray tomography with generative adversarial networks. Journal of the Optical Society of America A, 37(3):422–434, 2020.
[29] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In IEEE International Conference on Computer Vision (ICCV), pages 1781–1790, 2017.
[30] V. Nikitin, S. Aslan, Y. Yao, T. Bicer, S. Leyffer, R. Mokso, and D. Gürsoy. Photon-limited ptychography of 3D objects via Bayesian reconstruction. OSA Continuum, 2(10):2948–2968, 2019.
[31] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Science, second edition, 2006.
[32] T. Ramos, B. E. Grønager, M. S. Andersen, and J. W. Andreasen. Direct three-dimensional tomographic reconstruction and phase retrieval of far-field coherent diffraction patterns. Physical Review A, 99(2), 2019.
[33] A. Rond, R. Giryes, and M. Elad. Poisson inverse problems by the plug-and-play scheme. Journal of Visual Communication and Image Representation, 41:96–108, 2016.
[34] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.
[35] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin. Plug-and-play methods provably converge with properly trained denoisers. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 5546–5557, 2019.
[36] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang. 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Transactions on Medical Imaging, 37(6):1522–1534, 2018.
[37] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[38] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman. Plug-and-play priors for bright field electron tomography and sparse interpolation. IEEE Transactions on Computational Imaging, 2(4):408–423, 2016.
[39] Y. Sun, B. Wohlberg, and U. Kamilov. An online plug-and-play algorithm for regularized image reconstruction. IEEE Transactions on Computational Imaging, 5(3):395–408, 2019.
[40] Y. Sun, S. Xu, Y. Li, L. Tian, B. Wohlberg, and U. S. Kamilov. Regularized Fourier ptychography using an online plug-and-play algorithm. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 7665–7669, 2019.
[41] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. Wiley, New York, 1977.
[42] J. A. Tropp and S. J. Wright. Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE, 98(6):948–958, 2010.
[43] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg. Plug-and-play priors for model based reconstruction. In IEEE Global Conference on Signal and Information Processing, pages 945–948, 2013.
[44] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[45] K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, C.-B. Schönlieb, and H. Huang. Tuning-free plug-and-play proximal algorithm for inverse imaging problems. arXiv preprint arXiv:2002.09611, 2020.
[46] X. Xu, J. Liu, Y. Sun, B. Wohlberg, and U. S. Kamilov. Boosting the performance of plug-and-play priors via denoiser scaling. arXiv preprint arXiv:2002.11546, 2020.
[47] X. Yang, V. De Andrade, W. Scullin, E. L. Dyer, N. Kasthuri, F. De Carlo, and D. Gürsoy. Low-dose X-ray tomography through a deep convolutional neural network. Scientific Reports, 8(1):1–13, 2018.
[48] D. H. Ye, S. Srivastava, J. Thibault, K. Sauer, and C. Bouman. Deep residual learning for model-based iterative CT reconstruction using plug-and-play framework. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6668–6672, 2018.
[49] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.