Learning an optimal PSF-pair for ultra-dense 3D localization microscopy
Elias Nehme, Boris Ferdman, Lucien E. Weiss, Tal Naor, Daniel Freedman, Tomer Michaeli, Yoav Shechtman
The Erna and Andrew Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology, 3200003 Haifa, Israel; Biomedical Engineering Department and Lorry I. Lokey Center for Life Sciences and Engineering, Technion - Israel Institute of Technology, 3200003 Haifa, Israel; Russell Berrie Nanotechnology Institute, Technion - Israel Institute of Technology, 3200003 Haifa, Israel; Google Research, Haifa, Israel. Equal contribution. *Corresponding author: [email protected]
Abstract
A long-standing challenge in multiple-particle-tracking is the accurate and precise 3D localization of individual particles at close proximity. One established approach for snapshot 3D imaging is point-spread-function (PSF) engineering, in which the PSF is modified to encode the axial information. However, engineered PSFs are challenging to localize at high densities due to lateral PSF overlaps. Here we suggest using multiple PSFs simultaneously to help overcome this challenge, and investigate the problem of engineering multiple PSFs for dense 3D localization. We implement our approach using a bifurcated optical system that modifies two separate PSFs, and design the PSFs using three different approaches, including end-to-end learning. We demonstrate our approach experimentally by volumetric imaging of fluorescently labelled telomeres in cells.
In a conventional imaging system, the spatial resolution is bounded by Abbe's diffraction limit. In a high numerical aperture microscope, this corresponds to approximately half the optical wavelength, i.e. ≈200 nm for visible light. For cell-imaging applications, this obscures subcellular features of interest with dimensions on the nanoscale. Since 2006, Single-Molecule Localization Microscopy (SMLM) super-resolution techniques have revolutionized biological-structure imaging by circumventing the diffraction limit, namely, by using many low-density images of different sets of fluorescent emitters to generate a high-resolution reconstruction [1-3].

While biological structures are intrinsically 3D, attaining axial (z) information at super-resolution is not trivial. This is because the standard Point Spread Function (PSF) of the microscope is approximately symmetric about the focal plane and has only a thin axial range before the signal becomes very diffuse. Several approaches have been developed to capture 3D data in microscopy. For example, one can acquire multiple 2D datasets at different focal planes [4-6], or determine the axial positions of emitters from the images themselves. The latter can be enabled by PSF engineering, where the PSF is modified to encode the desired 3D information. This is typically done by inducing an intentional aberration in the imaging path, e.g. with a cylindrical lens [7], or with a phase mask at the Fourier plane of the microscope using an extended optical system [8-10]. Notably, while providing scan-free axial information, this approach poses a limitation on the maximum emitter densities suitable for imaging, due to the increased lateral size of the PSFs, and requires more complex image-analysis algorithms than 2D localization microscopy.

When imaging samples that are even just several microns thick, engineered PSFs spread the signal photons over a large lateral footprint relative to the in-focus PSF [11]. This poses a difficult localization challenge when the experimental objective of obtaining a super-resolution reconstruction necessitates that many molecules be localized in a densely labelled structure. Currently available software packages struggle to achieve good performance in this regime [12, 13]; however, recent work has shown that deep neural networks are well suited to the problem [14], enabling high-quality reconstructions from low emitter densities [15, 16], and increased-density processing [17-23].

Deep Learning (DL) has excelled in a variety of challenging computational-imaging problems in computer vision, computational photography, medical imaging, and microscopy [24, 25]. Within the realm of computational microscopy, DL has been deployed for tasks such as cell segmentation [26], image restoration [27-30], sample classification [31, 32], artificial labelling [33], phase imaging [34-36], optical tomography [37], lifetime imaging [38], single-molecule localization [15-23, 39-42], aberration correction [43-46], CryoEM [47], and more [48].

An exciting recent application enabled by deep learning is the end-to-end design of "computational cameras." Powered by differentiable imaging models and back-propagation, end-to-end learning jointly optimizes the sensing system alongside the data-processing algorithm, thus enabling both components to work harmoniously. This approach has quickly expanded within the computational-imaging community for numerous applications in computer vision and computational photography, for example, color sensing and demosaicing [49, 50], illumination design through scattering media [51], extended-depth-of-field imaging [52-54], monocular depth estimation [52, 53, 55, 56], high-dynamic-range imaging [57, 58], and hyper-spectral imaging [59, 60].
Fig. 1. The multi-PSF optical system. (a) A standard inverted microscope with laser illumination. (b-c) Two image planes, split by their polarization, employing two LC-SLMs placed in conjugate back focal planes to the objective lens. Each optical path can be modulated with a different phase mask (M_1 and M_2). (d) A comparison between the standard PSF (left) and a 4 µm Tetrapod PSF (right).

In computational microscopy, end-to-end learning has been utilized by our group and others to enhance various computational modalities such as sample classification [31, 32], single-molecule color sensing and 3D localization [21, 41], quantitative phase imaging [61], and multi-photon microscopy [62]. Here, to address the challenge of high-density 3D localization from a snapshot, we suggest the simultaneous use of multiple PSFs, as well as a method to design and implement the optimal phase masks. Specifically, we introduce a bifurcated optical system that modifies two separate PSFs with a pair of phase masks using Liquid-Crystal Spatial Light Modulators (LC-SLMs). First, we demonstrate that there is an advantage to splitting precious signal photons into two channels compared with a single-PSF system, even in moderately dense emitter conditions. For this task we utilize a PSF-pair that splits the 3D information into complementary channels, namely, for lateral and axial localization. To localize the emitters from the obtained pair of images we employ a convolutional neural network (CNN) architecture based on DeepSTORM3D [21]. Next, we revisit the problem of optimizing the information content of a single emitter in a pair of PSF measurements [9]. Lastly, we implement end-to-end learning to jointly design our localization algorithm and the PSF-pair. The resulting PSFs, which we call the Nebulae PSFs, achieve unprecedented performance in localizing volumes of dense emitters in 3D. We quantify and directly compare the performance of each approach by simulation and experimentally with volumetric imaging of fluorescently labelled telomeres in fixed cells. Finally, we demonstrate continuous, scan-free, live-cell tracking of >60 telomeres in a single cell's nucleus simultaneously, with ≈30 nm 3D precision and 100 ms temporal resolution, over an axial range of a few microns.

Dual-camera systems have been utilized in the past in microscopy for localizing single emitters in 3D [63-66]. Most recently, a dual-view scheme was utilized in DAISY [67] to combine astigmatism-based PSF engineering with Supercritical Angle Fluorescence (SAF) [68] to provide semi-isotropic 3D resolution over a micron-scale axial range. However, while these works proposed creative designs to combine the information in both channels, their objective was to enable precise and experimentally robust axial localization of single emitters. In addition, the proposed PSFs were hand-crafted based on desired properties and not fully optimized. Here we use a bifurcated optical system with two detection paths for the task of precise 3D localization of multiple emitters in ultra-dense samples.

The optical system used to implement the monocular PSF-pair is presented in Fig. 1. Briefly, our system is composed of an epifluorescence microscope extended with two identical detection paths. The fluorescent light emitted from the particles in the sample is split using a polarizing beam splitter into two 4f optical processing systems, each equipped with an LC-SLM placed in the Fourier plane. The LC-SLM is used to implement a phase modulation modifying the emission pattern to encode the 3D position onto the 2D captured measurements, which are then decoded jointly via further image processing. For a list of the specific components used in our implementation see supplementary section A.5.2.

We model our system using the scalar diffraction approximation, where the emitters are modeled as isotropic point sources [69]. Thus, the PSFs of our system can be efficiently computed by a Fast Fourier Transform (FFT). A full description of our imaging model is provided in supplementary section A.1.

Equipped with the system above, the question is what pair of PSFs is suited for the task of dense 3D localization. In the next sections we gradually answer this question.

For simplicity, we first consider the problem of designing an additional PSF while keeping the first PSF fixed to the 4 µm Tetrapod [9], which was optimized for the sparse case in a single channel. Given that Tetrapod PSFs encode depth at the cost of a large lateral footprint, we would like the complementary PSF to be compact and to help disentangle the approximate lateral positions in overlapping regions. Then, aided by this additional measurement, the overlapping Tetrapods can be decoded to recover the 3D positions. In other words, we are, broadly, separating the problem into an "axial localization" channel, encoded by the Tetrapod PSF, and a "lateral localization" channel, to be encoded by a different PSF.

For encoding lateral information we propose the use of an Extended-Depth-of-Field (EDOF) PSF, namely, a PSF that maintains its lateral shape over extended axial ranges. However, unlike traditional EDOF designs [70, 71], the desired PSF needs to be laterally compact and signal-efficient, because it should work for very dense samples. These requirements motivated us to design a novel EDOF suited for the task.
To design the desired EDOF PSF, we formulate the problem as a phase retrieval task. Specifically, given a desired axial range ∆z (e.g. 4 µm), we first generate a synthetic z-stack comprised of the approximate in-focus Airy disk PSF A(x, y) at 200 nm steps. Afterwards, we use stochastic gradient descent iterations with importance sampling [72] to recover the phase mask M associated with this PSF. Let D be the diffraction limit for the assumed optical setup. Then our cost function for this task is given by

L_{EDOF}(M) = \sum_{i=1}^{N_z} \left\| \left( \mathrm{PSF}(x, y; M, z_i) - A(x, y) \right) \cdot S(x, y) \right\|^2 ,    (1)

where PSF(x, y; M, z_i) is the on-axis PSF at depth z_i, N_z is the number of axial slices spanning ∆z, and S(x, y) is a weighting term added to quickly "squeeze" the signal photons into the diffraction-limited spot, given by

S(x, y) = \begin{cases} 1, & \text{if } \sqrt{x^2 + y^2} \le D \\ 25 \cdot \sqrt{x^2 + y^2}, & \text{otherwise.} \end{cases}    (2)

The resulting phase mask and PSF are presented in Fig. 2. This simple approach leads to a powerful EDOF, with very high signal efficiency and a small lateral footprint (Fig. 2b) compared to previous designs [70, 71] (see supplementary section A.2 for comparisons and implementation details). While we designed and implemented this EDOF to complement the Tetrapod information in emitter-dense regions, its potential applications extend far beyond our localization task.

Notably, recent end-to-end designs of EDOF PSFs have achieved quite compelling results [52-54]. In particular, the phase mask presented in [54] resembles the result of our approach. However, these data-driven approaches are ultimately dataset-dependent, and take hours of training to design for a new range, whereas our approach is independent of the dataset and converges in less than 2 minutes on a GPU.

Fig. 2. A small-footprint EDOF mask. (a) The evolution of the EDOF phase mask optimization over 400 iterations. (b) Comparison between the standard PSF (top) and the final EDOF PSF (bottom). (c) The XZ cross-sections of the standard (left) and EDOF (right) PSFs, respectively. The colorscale is normalized to the maximum intensity of the in-focus, unmodulated PSF.

In typical LC-SLM PSF engineering systems, half of the signal photons are discarded, since the LC-SLM can only modulate polarized light. Therefore, in our system the second PSF measurement comes at no additional photon cost, with the only caveat being the need for an additional detection path in the two-view setup. It should be noted that 4f systems that utilize a Diffractive Optical Element (DOE), instead of an LC-SLM, do not suffer from this photon loss; yet this comes at the cost of versatility. Now that we have designed a novel EDOF PSF for our task, we can test whether splitting the signal between two cameras is in fact beneficial compared with a DOE-based system.

Since neural networks are already established to be incredibly efficient for dense localization [17, 21], we modify our previously published fully convolutional architecture [21] to receive an image with two channels comprised of the two measurements. For training details and network architecture see supplementary section A.4. Our results in simulation (see supplementary Fig. S8) confirm that for the task of dense 3D localization, a split-signal, dual-view system is superior to a single measurement with a DOE, even when that measurement is sensed using an optimal end-to-end learned design [21].

Next, we validate our approach in cells. For this task, we imaged fluorescently labeled telomeres in fixed human osteosarcoma (U2OS) cells (for fixation and labeling see supplementary section A.5.2). We first chose fixed cells to enable the acquisition of a ground-truth approximation via axial scanning. The imaged cell line is hypertriploid, meaning that it has an unusually large number of telomeres (70-130), which facilitates testing our method in a dense environment. The experiment consisted of two parts: first, each cell was scanned in the axial direction using a piezo stage (100 nm steps), and the 3D ground-truth positions were approximated via fitting (see supplementary section A.5.1). Afterwards, we recorded 3 snapshot images: one with the Tetrapod PSF utilizing 100% of the signal (accomplished using a longer exposure time) and two more with the signal split 50%/50% between the Tetrapod PSF and the EDOF PSF (Fig. 3). In agreement with simulations, these results demonstrate that, at the imaged telomere density, the Tetrapod-EDOF pair is superior in localizing overlapping telomeres as measured by the Jaccard index [13, 21].

While the complementary PSF-pair is effective, this way of decoupling the 3D positional information into dedicated "lateral" and "axial" channels is unlikely to be the optimal solution. For example, beyond a certain density, the axial information in the Tetrapod PSF will be occluded completely by overlapping PSFs. Having a second measurement that is solely dedicated to encoding the lateral information (EDOF PSF) will not be beneficial for decoding z. This motivates us to revisit the task of designing a PSF-pair for dense 3D localization. For simplicity, we start with the single-emitter case, viewed from an estimation-theory perspective.
Fig. 3. Snapshot, dense-emitter, 3D localizations in fluorescently labelled cells. (a) A single frame recorded with a single-channel, 4 µm Tetrapod PSF (left) and the split-channel, dual-PSF approach (right). (b-c) Localizations are plotted with the ground truth measured by axially scanning the sample with the unmodulated PSF.
Optimal PSFs for two-channel localization of only a single emitter can be derived by minimizing the Cramér-Rao Lower Bound (CRLB) [9, 73, 74]. Considering the system in Fig. 1, we can jointly optimize the sensitivity of a PSF-pair with respect to a change in the 3D position of a single emitter. The CRLB then defines the lower bound on the precision of unbiased estimation of the 3D position from a noisy PSF-pair. Unlike the original Tetrapod optimization [9], here we employed a pixel-wise approach to explore aberrations not spanned by low-order Zernike polynomials. For a full derivation of the CRLB and the optimization objective see supplementary section A.3.

The CRLB-optimized PSF pair is given in Fig. 4. Notably, the CRLB of the PSF-pair is similar to the CRLB of a 4 µm Tetrapod PSF with twice the signal. Therefore, as can be expected, splitting the information does not improve precision in the single-emitter case, suggesting that a two-channel system is not justified for sparse localization.

The resulting PSF-pair combines the concepts of bi-plane imaging and PSF engineering in an elegant way to encode the 3D position in two measurements. Simulation results show that this PSF-pair outperforms the Tetrapod-EDOF pair described earlier (see supplementary sections A.6 and A.7); however, previous work demonstrates that end-to-end designs using deep neural networks can lead to superior performance [21], and this is the path we describe next.
Fig. 4. CRLB-optimized PSF pair. (a) Two phase masks were generated by CRLB optimization, namely by optimizing the estimation of the 3D position of a single emitter from a pair of images. Interestingly, each channel encodes a complementary part of the axial range. These PSFs have a smaller lateral footprint than the similar z-range 4 µm Tetrapod. The colorbar is normalized to the in-focus unmodulated PSF of the system. (b) The estimated CRLB lateral (upper) and axial (lower) precision as a function of emitter depth, for each PSF separately (red and orange), after combining both channels (green), and for the single-channel Tetrapod PSF (blue).
As shown previously [21], end-to-end designs lead to efficient PSF patterns that are highly suited for dense 3D imaging. Here, we extend the DeepSTORM3D approach to tackle the problem of designing a PSF-pair. This is achieved by designing the encoding stage to incorporate two disjoint and differentiable physical-simulation layers (Fig. 5a). Each layer is parameterized by its own phase mask (M_1 and M_2), dictating the respective PSF (see supplementary section A.1 for the imaging model). During training, we randomly simulate 3D positions (∪_i r_i) and feed them to the two physical layers. Each physical layer encodes the 3D positions into its simulated sensor image (I_1 and I_2, respectively). These images are concatenated and fed to the localization CNN (parameterized by W), which decodes them in order to recover the underlying 3D positions (∪_i r̂_i). The difference between the simulated and the recovered positions is quantified by our loss function (L) and back-propagated to jointly optimize the phase masks (M_1 and M_2) and the localization CNN parameters (W) end-to-end. This process is usually repeated for ≈30 epochs until convergence. For training details see supplementary section A.4.

Fig. 5. End-to-end learning of the dual-channel optical system. (a) Simulated 3D positions of emitters ∪_i r_i are fed into two physical layers, which differ only in the applied phase mask M_k, to simulate the acquired image pairs with the modulated PSFs, I_1 and I_2. Next, both images are fed through a convolutional neural network to recover the 3D positions in the simulation, ∪_i r̂_i. Afterwards, these reconstructed positions are compared to the ground truth with our loss function L, and the gradients are back-propagated through the layers (red lines) to jointly optimize the encoding masks M_1 and M_2 and the localization CNN parameters W. (b) Nebulae PSFs, which are the result of the end-to-end learning for a 4 µm axial range. The colorbar is normalized compared to the in-focus unmodulated PSF of the system.

The end-to-end learned phase masks and their respective PSFs, hereafter referred to as the Nebulae PSFs, are presented in Fig. 5. Two distinctive features stand out in this pair compared to the previous approaches described earlier. First, both channels encode 3D information in their individual intensity patterns, as well as in the relative position of the intensity centroids throughout the entire axial range, a trait previously conceived to be useful for 3D localization [63]. Second, in phase space, the learned phase masks are approximately rotated versions of one another, although our optimization was performed pixel-wise and our loss function did not include any constraints on the mutual information of the two measurements.

To evaluate the performance of the Nebulae PSFs, we first compare them in simulation to the Tetrapod-EDOF pair (section 3), as well as to a single-channel Tetrapod PSF with twice the signal (Fig. 6). The results indicate that the Nebulae PSFs achieve unprecedented performance in localizing dense 3D emitters over a large axial range of 4 µm, assuming our experimental telomere imaging conditions, i.e. ≈15K signal photons per emitter and ≈500 background photons per pixel.
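To make the training loop concrete, the following is a minimal PyTorch sketch of the joint optimization described above (Fig. 5a). The `PhysicalLayer`, `LocalizationCNN`, and the loss used here are highly simplified stand-ins introduced only for illustration, not the actual DeepSTORM3D-based implementation; a physically meaningful forward model would follow supplementary section A.1.

```python
import torch

class PhysicalLayer(torch.nn.Module):
    """Toy differentiable encoder: a learnable phase mask and a crude image
    formation; only the wiring (positions -> image, gradients -> mask) matters."""
    def __init__(self, mask_size=(64, 64)):
        super().__init__()
        self.mask = torch.nn.Parameter(torch.zeros(mask_size))  # phase mask [rad]

    def forward(self, xyz):
        img = torch.zeros(xyz.shape[0], 1, 64, 64)
        for b, emitters in enumerate(xyz):
            for x, y, z in emitters:              # place one spot per emitter
                img[b, 0, int(y) % 64, int(x) % 64] += 1.0
        return img * (1.0 + 0.01 * self.mask.mean())   # keeps mask in the graph

class LocalizationCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(2, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 80, 3, padding=1))   # 80 axial output slices

    def forward(self, pair):
        return self.net(pair)

phys1, phys2, cnn = PhysicalLayer(), PhysicalLayer(), LocalizationCNN()
params = list(phys1.parameters()) + list(phys2.parameters()) + list(cnn.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

for step in range(5):
    xyz = torch.rand(4, 10, 3) * 64               # random emitter positions
    target = torch.zeros(4, 80, 64, 64)           # target occupancy volume (toy)
    i1, i2 = phys1(xyz), phys2(xyz)               # two simulated sensor images
    pred = cnn(torch.cat([i1, i2], dim=1))        # concatenate channels and decode
    loss = torch.nn.functional.mse_loss(pred, target)   # placeholder loss
    opt.zero_grad(); loss.backward(); opt.step()  # updates both masks and the CNN
```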
Next, we applied the Nebulae PSFs in fixed cells, and compared their performance to the Tetrapod-EDOF pair experimentally (Fig. 7). Similar to section 3.3, we first found the emitter positions by axial scanning, for comparison to our snapshot images taken at a single focal plane: once with the Tetrapod-EDOF pair, and once with the Nebulae PSFs. The results show that, at the imaged telomere density, the Nebulae PSFs are superior in localizing overlapping telomeres as measured by the Jaccard index. The Nebulae PSFs were also found to have superior performance relative to the CRLB-optimized pair from section 4.1. For a head-to-head comparison in simulations as well as experiments see supplementary sections A.6 and A.7.

Fig. 6. Performance as a function of density. (a) Performance comparison of a single-channel Tetrapod (red), the Tetrapod-EDOF pair (blue), and the Nebulae PSFs (green). The Nebulae PSFs perform best both in detectability (Jaccard index) and in precision (lateral/axial RMSE). Emitters were simulated with ≈15K signal photons per emitter and ≈500 background photons per pixel. Matching of points was computed with a threshold distance of 100 nm using the Hungarian algorithm. Each data point is an average of n = 100 simulated images. The average standard deviation in the Jaccard index was ≈5%. (b-c) An example at high emitter density, alongside a 3D comparison of the recovered (red) and the ground truth (blue) positions.

Fig. 7. Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. (a) Example images showing the two proposed mask pairs: the Tetrapod + EDOF (left) and the end-to-end learned pair (right). (b) The single-frame 3D localizations with the ground truth (obtained via axial scanning) for the Tetrapod + EDOF and the learned pair, respectively.

Throughout this work we have imaged and localized 3D positions of telomeres in fixed cells to facilitate quantitative comparisons of the proposed solutions. However, more pertinent is the application of our method to multiple-particle-tracking in live cells, where axial scanning is inapplicable due to the motion of the objects. Here, our simultaneous multi-channel snapshot approach enables capturing the behavior of diffusing telomeres in living cells at an unprecedented combination of density, speed, and axial range [75].

Quantifying telomere dynamics in live cells is of paramount importance for answering fundamental questions under normal and disease conditions [75, 76], as tracking the 3D diffusion of telomeres unveils information on the chromatin environment and on DNA folding regulation. One challenge in observing chromatin in living cells is the intrinsic biological heterogeneity between diffusing telomeres [77]. Therefore, to fully characterize chromatin dynamics it is desirable to capture all single-telomere trajectories, including in emitter-dense regions.

Figure 8 demonstrates the full applicability of the Nebulae PSFs for volumetric tracking of ≈61 diffusing telomeres, spanning an axial range of a few microns in the nucleus of a living U2OS cell. The trained localization CNN is able to reliably track all of the labelled telomeres over the course of 500 frames (50 s), even those in close proximity and with a low signal-to-noise ratio. As evident in the resulting tracks (Fig. 8), the telomeres exhibit variable diffusion profiles (Fig. 8e), necessitating the individual processing facilitated by our approach.

Fig. 8. Dense-particle tracking of labelled telomeres in live cancer cells with the Nebulae PSFs. (a) A single time point showing the two PSF-modulated images. (b)-(c) 3D spatiotemporal trajectories for telomeres (b) and (c), exhibiting drastically different diffusion behaviors, in different regions of the nucleus. (d) 3D rendered cell with all the accumulated tracks, showing the motion of telomeres in 3D. Most telomeres were localized in all frames (≤5% missing localizations). (e) The ensemble MSD of all the estimated tracks obscures the dynamics of individual particles, such as tracks (b) and (c), which exhibit very different diffusion dynamics.

In computational imaging, the co-design of optics and image-processing algorithms has been introduced in various applications spanning the fields of computational photography and computational microscopy. In the realm of localization microscopy, this is the key concept in PSF engineering [7-9], and has been utilized to extend the imaging capabilities in SMLM [12, 78]. Until recently, however, the standard approach was to design the optical system to optimize a specific trait of the PSF that would facilitate its processing afterwards, e.g. an axial-displacement-induced rotation in the Double-Helix PSF [8, 79]. In addition to conceived physical properties, information-content-driven optimization was also used in PSF design; for example, in [80], where the PSF was optimized for depth discrimination. Similarly, for SMLM applications [9], the PSF has been optimized to minimize the variance of an unbiased estimator for localizing the 3D position of a point source. While the latter two identified theoretically optimal solutions to encode the information, in complex environments the decoding step often limits the problem as well.

Recently, powered by deep learning and differentiable physical models, end-to-end designs of physical elements and data-processing algorithms have been demonstrated by our group and others to facilitate efficient imaging modalities in microscopy [31, 32, 61, 62]. Specifically in SMLM, the efficiency of jointly designing PSFs and deep networks was demonstrated for multi-color 2D imaging [41] and snapshot dense 3D imaging [21].

In this work, we addressed the challenging task of multi-PSF engineering for dense 3D imaging. Specifically, we proposed three different PSF-pairs, each derived with a different set of considerations. For the first pair, we introduced an efficient and laterally compact EDOF PSF to complement the Tetrapod PSF at high emitter densities. Notably, this EDOF PSF has numerous applications in its own right for imaging in thick samples with little need for deconvolution [71]. In the second pair, we extended the CRLB design metric to optimize the sensitivity of a PSF-pair in the single-emitter case. Lastly, we presented the Nebulae PSFs, learned end-to-end to achieve reliable dense 3D localization from snapshot measurements. We validated each of the proposed designs numerically and experimentally. To demonstrate the applicability for dense 3D tracking in live cells, we tracked regions of dense telomeres using the Nebulae PSFs, enabling a statistical analysis of population heterogeneity and high-resolution 3D modelling of chromatin dynamics in single cells.

In contrast to standard CNN filters, a notable aspect of end-to-end learning with physical layers is our ability to visualize and interpret the designed physical elements. For example, for the Nebulae PSFs, the signal photons are compacted into a single lobe in each channel. This feature is understandably advantageous in the dense fields of emitters with limited SNR used in our simulations and experimental conditions. Moreover, the intensity patterns at each axial position combine elementary depth-encoding aberrations, such as astigmatism, rotation, and relative inter-channel single-lobe movement. What separates these PSFs from predetermined designs is the simultaneous deployment of multiple depth-encoding strategies, making full use of the decoding CNN capacity, and thereby optimizing dense 3D localization from noisy measurements.

Notably, our approach is not limited to particle tracking. By tweaking the physical-simulation layers, this method can be readily adapted to any point-source-sensing paradigm, including DAISY [67], MINFLUX [81, 82], multi-plane microscopy [4-6, 83], and more. In a concurrent work [84], similar ideas were pursued for multiplane PSF engineering, demonstrating promising results in simulations. In these, and for SMLM applications, it is likely that modifying the CNN architecture, initializations, training sets, and loss functions may further improve the performance, raising the question of how globally optimal the solution derived in our framework is. At this point, it is unclear how each optimization component affects the learning process, a question that will be addressed in future work. In particular, we anticipate that the emerging suite of tools developed to make deep learning more accessible to the community will assist in answering these critical questions [85-87].

To the best of our knowledge, this work reports the first end-to-end learning of multiple PSFs with experimental feasibility. Such multi-PSF designs may prove useful outside the realm of computational microscopy. For example, in computational photography, the design of coded-aperture pairs and their optimal combination with stereo imaging has been a long-standing question [88-93].
Most recently, Gil et al. [93] proposed to exploit identical phase-mask pairs for improved depth estimation and online stereo calibration. We believe this work paves the way for asymmetric strategies in the field of computational photography, with applications in stereo imaging and multi-shot monocular depth estimation. Depending on the specific task at hand, the optimal PSF-pair may vary; however, we believe that the approaches to PSF-pair optimization presented in this work will provide a useful initialization to the general problem.

Funding
The Israel Science Foundation (grant 852/17); the Technion Ollendorff Minerva Center; the Zuckerman Foundation; H2020 European Research Council, Horizon 2020 (802567); Google Faculty Research Award for Machine Perception; The Israel Science Foundation (grant 450/18).
Acknowledgments
We thank Rotem Mulayoff for insights and fruitful discussions with respect to the EDOF design. We also thank Romain F. Laine for his help with conceiving the name Nebulae. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan Xp and the Titan V GPUs used for this research. We thank Google for the cloud units provided to accelerate this research.
Disclosures
The authors declare no conflicts of interest.
A Appendix
A.1 Imaging model
In this section we briefly review the imaging model used throughout this work. Our system is composed of fluorescent emitters with an emission wavelength λ, suspended in water (refractive index n_water) and imaged through immersion oil (refractive index n_oil) by an objective with focal length f; their image is magnified onto the sensor with a microscope magnification M. Let M denote the phase mask placed in the conjugate back focal plane of an extended emission path with a 4f system (Fig. 1), and let (ρ, φ) denote the normalized radial coordinates in the Fourier plane, such that the maximal radius transmitted by the objective corresponds to ρ = NA/n_oil. Under the scalar approximation [69], the PSF of a point source located at (x_0, y_0, z_0) above a water-oil interface is given by

\mathrm{PSF}_{th}(x, y; M, x_0, y_0, z_0) \propto \left| \mathcal{F}\left( A(ρ, φ) \, e^{\, jM + \frac{2\pi j}{λ} Φ(x_0, y_0, z_0, f)} \right) \right|^2 ,    (S1)

where (x, y) are the coordinates at the image plane, \mathcal{F} is the two-dimensional Fourier transform, and A(ρ, φ) is the effective aperture of the compound system, limited by n_water for high-NA objectives,

A(ρ, φ) = \begin{cases} 1, & \text{if } ρ \le n_{water}/n_{oil} \\ 0, & \text{otherwise,} \end{cases}    (S2)

and Φ(x_0, y_0, z_0, f) is the accumulated phase due to the emitter 3D position and the focal-plane setting. This phase can be decomposed into lateral and axial components,

Φ(x_0, y_0, z_0, f) = Φ_{xy}(x_0, y_0) + Φ_z(z_0, f).    (S3)

The lateral component is assumed to be a linear phase (i.e. a shift-invariant convolution system), given by

Φ_{xy}(x_0, y_0) = \frac{M \cdot NA}{\sqrt{M^2 - NA^2}} \left( x_0 \, ρ \cos φ + y_0 \, ρ \sin φ \right).    (S4)

As for the axial component, it is split into two terms to account for the refractive-index mismatch [94]: the phase accumulated in water due to the emitter depth z_0, and the phase accumulated in oil due to a focus shift f from the coverslip,

Φ_z(z_0, f) = Φ_{water}(z_0) + Φ_{oil}(f),    (S5)

where

Φ_{water}(z_0) = z_0 \, n_{water} \sqrt{1 - \left( \frac{n_{oil}}{n_{water}} \right)^2 ρ^2}, \qquad Φ_{oil}(f) = -f \, n_{oil} \sqrt{1 - ρ^2}.    (S6)

Finally, the PSF in eq. (S1) is slightly smoothed in image space,

\mathrm{PSF}(x, y; M, x_0, y_0, z_0) = \mathrm{PSF}_{th}(x, y; M, x_0, y_0, z_0) \circledast G(x, y),    (S7)

where \circledast denotes convolution, and G(x, y) is a 2D Gaussian kernel with a standard deviation that is fit empirically to match experimental data (usually ≈70 nm). This blur accounts for the finite size of the emitter, its spectrum, and the inherent blur in the optical system, alleviating the need to explicitly model these effects. For a full derivation of the model, including the neglected dipole and near-field effects, the reader is referred to [10, 95].

The image V(x, y) of a set of emitters ∪_i r_i is given by the incoherent sum of their PSFs,

V(x, y; M, ∪_i r_i) = \sum_i \mathrm{PSF}(x, y; M, r_i),    (S8)

where r_i = (x_i, y_i, z_i) is the 3D position of the i-th emitter. The commonly used measurement model is given by data-dependent Poisson noise and additive Gaussian read noise,

I(x, y) \sim \mathcal{P}\left( V(x, y) + B(x, y) \right) + \mathcal{N}\left( µ, σ^2 \right),    (S9)

where \mathcal{P} is the Poisson distribution, B(x, y) is a per-pixel background, \mathcal{N} is the normal distribution, µ is a baseline count level, and σ^2 is the read-noise variance.

To make the measurement model differentiable, by the law of large numbers we can approximate the Poisson noise with a Gaussian using the central limit theorem,

\mathcal{P}\left( V(x, y) + B(x, y) \right) \approx \mathcal{N}\left( V(x, y) + B(x, y), \, V(x, y) + B(x, y) \right).    (S10)

The resulting data-dependent noise approximation is implemented using the reparameterization trick [96],

\mathcal{N}\left( V(x, y) + B(x, y), \, V(x, y) + B(x, y) \right) \sim V(x, y) + B(x, y) + \sqrt{V(x, y) + B(x, y)} \times ε,    (S11)

where ε is a realization of a standard normal distribution,

ε \sim \mathcal{N}(0, 1).    (S12)
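The following is a minimal PyTorch sketch of this forward model under stated simplifications: it implements the water-limited aperture of eq. (S2), the depth phase of eq. (S6) (omitting the lateral tilt of eq. (S4) and the oil defocus term), the FFT-based PSF of eq. (S1), and the reparameterized noise of eqs. (S10)-(S12). The wavelength, refractive indices, and grid size are placeholder values.

```python
import math
import torch

def psf_stack(mask, z_list, wavelength=0.58, n_water=1.33, n_oil=1.52, grid=128):
    """On-axis PSFs for a phase mask `mask` (radians, grid x grid) at the depths
    in z_list (µm). Optics values are illustrative; lateral shifts and the oil
    defocus term of eq. (S6) are omitted for brevity."""
    k = torch.linspace(-1.0, 1.0, grid)
    kx, ky = torch.meshgrid(k, k, indexing="xy")
    rho = torch.sqrt(kx ** 2 + ky ** 2)                  # normalized pupil radius
    aperture = (rho <= n_water / n_oil).float()          # eq. (S2): water-limited NA
    phi_water = n_water * torch.sqrt(
        torch.clamp(1 - (n_oil / n_water) ** 2 * rho ** 2, min=0.0))  # eq. (S6)
    psfs = []
    for z in z_list:
        phase = mask + (2 * math.pi / wavelength) * z * phi_water
        pupil = torch.polar(aperture, phase)             # A * exp(j*phase), eq. (S1)
        field = torch.fft.fftshift(torch.fft.fft2(torch.fft.ifftshift(pupil)))
        psfs.append(field.abs() ** 2)                    # intensity PSF
    psfs = torch.stack(psfs)
    return psfs / psfs.sum(dim=(-2, -1), keepdim=True)   # unit photon count per slice

def noisy_image(clean_photons, background=5.0):
    """Differentiable Gaussian approximation of Poisson noise, eqs. (S10)-(S12)."""
    lam = clean_photons + background
    eps = torch.randn_like(lam)                          # reparameterization trick
    return lam + torch.sqrt(lam) * eps

mask = torch.zeros(128, 128, requires_grad=True)         # start from a flat pupil
stack = psf_stack(mask, z_list=[-1.0, 0.0, 1.0])
frame = noisy_image(1000.0 * stack[1])                   # one noisy in-focus frame
```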
Now the measurement model is differentiable w.r.t. the phase mask M and is therefore suited for end-to-end learning.

A.2 EDOF PSF design

In this section we provide the implementation details for designing the EDOF PSF, and then compare our result with existing popular designs. There are several ways to implement an EDOF PSF, including placing an axicon in the optical path [97], using ring apertures, and reducing the numerical aperture of the system. Due to photon-efficiency considerations, in this work we focus on the implementation of an EDOF PSF using a phase mask. Our general strategy is to formulate the design problem as a phase-retrieval task, as detailed next.

First, we start by simulating the in-focus Airy disk PSF for the desired optical system. Afterwards, this PSF is thresholded to keep only the main lobe with diameter D, and the result is fitted with a 2D Gaussian A(x, y). This Gaussian is then replicated to generate a synthetic z-stack with 200 nm jumps between slices. A(x, y) is also used to define a weighting matrix S(x, y) that "squeezes" signal photons quickly into the diffraction-limited spot of diameter D. Let (x, y) be centered pixel coordinates in image space; the matrix S(x, y) is given by

S(x, y) = \begin{cases} 1, & \text{if } \sqrt{x^2 + y^2} \le D \\ α \cdot \sqrt{x^2 + y^2}, & \text{otherwise,} \end{cases}    (S13)

where in our implementation α = 25, and D (in nm) was determined empirically to achieve appealing results.

Given S(x, y), we try to retrieve the corresponding phase mask associated with the synthetic z-stack via phase retrieval [72]. This is implemented using Stochastic Gradient Descent (SGD) with importance sampling to minimize the following cost function:

L_{EDOF}(M) = \sum_{i=1}^{N_z} \left\| \left( \mathrm{PSF}(x, y; M, z_i) - A(x, y) \right) \cdot S(x, y) \right\|^2 ,    (S14)

where PSF(x, y; M, z_i) is the on-axis PSF at depth z_i, and N_z is the number of axial slices spanning ∆z. Let Z(M) denote the current PSF stack dictated by the phase mask M, such that Z_i(M) ≡ PSF(x, y; M, z_i). Our optimization is comprised of the following 3 steps (see the sketch after this list):

1. We compute the correlation C(z_i) of Z(M) with A(x, y) at each axial slice z_i,

C(z_i) = \langle A(x, y), Z_i(M) \rangle,    (S15)

and choose the three axial slices (z_1, z_2, z_3) with the lowest correlation.

2. To avoid overfitting the sampled 200 nm "knots" throughout the axial range, we perturb each of (z_1, z_2, z_3) locally with a random continuous shift δz drawn from a uniform distribution, while clipping out-of-range values.

3. We calculate the gradient of the cost in eq. (S14) sampled only at (z_1, z_2, z_3), and take a gradient step.

In the third step, we experimented with a few adaptive SGD optimizers [98-101], and ultimately chose Adam [98]. The process is repeated for 400 iterations, or until the loss function stagnates. Notably, the correlation in our implementation serves as side information [102], and is used to adaptively sample z slices and direct the SGD iterations. Compared with a stochastic sampling approach, this has the benefit of accelerating convergence, and empirically it led to better solutions.

Figure S1 compares the result to two common EDOF implementations: the cubic phase mask [71] and the randomly sampled Fresnel lenses phase mask [70]. The amplitude of the cubic phase mask was chosen such that the PSF is consistent over the FOV, but retains as much SNR as possible. Our proposed EDOF has three significant advantages over the classical designs: (1) its lateral extent is much smaller than that of the cubic phase mask PSF, matching our density requirements; (2) the SNR in the main spot is higher than in both other methods; and (3) the proposed phase mask is smooth compared to randomly sampled Fresnel lenses. This facilitates its implementation using LC-SLM devices, as these suffer from inter-pixel cross-talk [103].
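A minimal PyTorch sketch of one iteration of this importance-sampled SGD loop is given below; `psf_at_depth(mask, z)` stands for a differentiable forward model (for example, the one sketched in section A.1), and the jitter amplitude and optimizer settings are placeholders.

```python
import torch

def edof_design_step(mask, target, weight, z_knots, psf_at_depth, optimizer):
    """One iteration of the EDOF phase retrieval of eqs. (S13)-(S15).
    mask/target/weight: grid x grid tensors (phase mask, A(x,y), S(x,y));
    z_knots: 1D tensor of sampled depths; psf_at_depth: assumed forward model."""
    with torch.no_grad():
        # step 1: correlation of the current PSF with A(x,y) at every slice
        corr = torch.stack([(psf_at_depth(mask, z) * target).sum() for z in z_knots])
        worst = torch.topk(-corr, k=3).indices          # three least-correlated slices
        # step 2: jitter the selected depths (placeholder amplitude of +/- 0.05 µm)
        zs = [float(z_knots[i]) + 0.1 * (torch.rand(1).item() - 0.5) for i in worst]
    # step 3: gradient step of eq. (S14) evaluated only at the sampled depths
    optimizer.zero_grad()
    loss = sum((((psf_at_depth(mask, z) - target) * weight) ** 2).sum() for z in zs)
    loss.backward()
    optimizer.step()
    return loss.item()

# usage sketch:
# mask = torch.zeros(128, 128, requires_grad=True)
# opt = torch.optim.Adam([mask], lr=1e-1)
# for it in range(400):
#     edof_design_step(mask, target, weight, z_knots, psf_at_depth, opt)
```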
Fig. S1. Comparisons of EDOF PSFs in simulation. (a) Standard unmodulated PSF. (b) Cubic phase mask PSF. (c) Randomly sampled Fresnel lenses PSF. (d) Ours. The colorscale is normalized to the maximum intensity of the in-focus, unmodulated PSF.
A.3 CRLB optimization
In this section we derive the Cramér-Rao Lower Bound (CRLB) [9, 73, 74] of the system in Fig. 1. For simplicity, we start with the assumption that the measurement model is reduced to Poisson, data-dependent noise only. At the end, we also provide the expression for the extended case including the read noise.

First, let us start with some notation. Let θ = (x_0, y_0, z_0) denote the 3D position of a single emitter imaged with the system in Fig. 1, let u = (x, y) denote the concatenated coordinates at the image plane, and let P_θ(u; M_k) ≡ PSF(u; M_k; θ) denote the model PSF of the emitter in the detection path with phase modulation M_k. Assuming Poisson statistics for the source and background signals, the measured PSF I_k(u) is given by

I_k(u) \sim \mathcal{P}\left( P_θ(u; M_k) + B_k(u) \right),    (S16)

where B_k(u) is a per-pixel background. The log-likelihood function ℓ(I_k(u); θ) for the measurement in eq. (S16) is given by

ℓ(I_k(u); θ) = \sum_{u=1}^{N_u} I_k(u) \log\left( P_θ(u; M_k) \right) - P_θ(u; M_k) + C\left( I_k(u) \right),    (S17)

where N_u is the number of pixels in the image, and C(I_k(u)) is a function of the measurements that is independent of the unknown 3D position θ.

Given a log-likelihood function, the Fisher information matrix F(θ) is defined as [73]

[F(θ)]_{i,j} = \mathbb{E}\left[ \left( \frac{\partial}{\partial θ_i} ℓ(I_k(u); θ) \right) \cdot \left( \frac{\partial}{\partial θ_j} ℓ(I_k(u); θ) \right) \, \middle| \, θ \right].    (S18)

Substituting the log-likelihood from eq. (S17) we get

[F(θ; M_k)]_{i,j} = \sum_{u=1}^{N_u} \left( \frac{\partial P_θ(u; M_k)}{\partial θ_i} \right) \left( \frac{\partial P_θ(u; M_k)}{\partial θ_j} \right) \frac{1}{P_θ(u; M_k) + B_k(u)}.    (S19)

Assuming independent photon arrivals in each detection path, the measurements I_1(u), I_2(u) become independent. Therefore, the joint information of both PSFs is given by the sum of the individual information from each PSF. Formally, let F(θ; M_k) denote the information matrix of the measurement with phase modulation M_k. The joint Fisher information matrix F(θ; M_1, M_2) for the measurements I_1(u), I_2(u) is given by

F(θ; M_1, M_2) = F(θ; M_1) + F(θ; M_2).    (S20)

Let θ_i ∈ {x_0, y_0, z_0} denote a coordinate of the 3D position. Given F(θ; M_1, M_2), the CRLB for estimating θ_i is defined as [73]

\mathrm{CRLB}_i(θ; M_1, M_2) = \left[ F(θ; M_1, M_2)^{-1} \right]_{i,i},    (S21)

where F(θ; M_1, M_2)^{-1} denotes the inverse of the Fisher information matrix. Based on eq. (S21), to derive the phase masks M_1, M_2 optimizing the CRLB for all three estimated parameters θ̂ = (x̂, ŷ, ẑ), we minimize the following cost function:

L_{CRLB}(M_1, M_2) = \sum_{i \in \{x̂, ŷ, ẑ\}} \sum_{θ'} \sqrt{ \mathrm{CRLB}_i(θ'; M_1, M_2) }.    (S22)

In our implementation, CRLB_i(θ') is evaluated at on-axis positions θ' = (0, 0, z'), where z' is sampled every 250 nm throughout the desired axial range. We also simplify the per-pixel background term B_k(u) to a single scalar of 15 photons per pixel, and scale the PSFs to match realistic signal counts encountered in SMLM imaging, i.e. 2000 photons per emitter. Notably, different from our previous work [9], we optimized the CRLB using a per-pixel approach rather than constraining the solution to a subspace of Zernike polynomials. This was particularly important to efficiently navigate the wide variety of possible solutions.

Finally, in this work we focused our attention on SMLM experimental conditions; therefore, for our purposes the read-noise effects were negligible. However, the optimization is readily extended to the mixed Poisson-Gaussian case by revisiting eqs. (S16), (S17) and (S19). Specifically, assume the measured PSF I_k(u) is given by

I_k(u) \sim \mathcal{P}\left( P_θ(u; M_k) + B_k(u) \right) + \mathcal{N}\left( µ, σ^2 \right),    (S23)

where \mathcal{N} is the normal distribution, µ is a baseline, and σ^2 is the variance of the read noise.

We can approximate the Poisson noise by a Gaussian using eq. (S10):

I_k(u) \sim \mathcal{N}\left( P_θ(u; M_k) + B_k(u), \, P_θ(u; M_k) + B_k(u) \right) + \mathcal{N}\left( µ, σ^2 \right).    (S24)

Assuming both noise sources are independent, we get

I_k(u) \sim \mathcal{N}\left( P_θ(u; M_k) + B_k(u) + µ, \, P_θ(u; M_k) + B_k(u) + σ^2 \right).    (S25)

The resulting log-likelihood function ℓ(I_k(u); θ) for the measurement in eq. (S25) is given by

ℓ(I_k(u); θ) = - \sum_{u=1}^{N_u} \left[ \log\left( P_θ(u; M_k) + B_k(u) + σ^2 \right) + \frac{\left( I_k(u) - P_θ(u; M_k) \right)^2}{P_θ(u; M_k) + B_k(u) + σ^2} \right],    (S26)

where N_u is the number of pixels in the image. Substituting the log-likelihood from eq. (S26) into the definition from eq. (S18) we get

[F(θ; M_k)]_{i,j} = \sum_{u=1}^{N_u} \left( \frac{\partial P_θ(u; M_k)}{\partial θ_i} \right) \left( \frac{\partial P_θ(u; M_k)}{\partial θ_j} \right) \left[ \frac{1}{P_θ(u; M_k) + B_k(u) + σ^2} + \frac{1}{2\left( P_θ(u; M_k) + B_k(u) + σ^2 \right)^2} \right].    (S27)

Substituting eq. (S27) into eqs. (S20) to (S22) we get the desired cost function for the general case.
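As a sketch of how eqs. (S19)-(S22) can be evaluated numerically for the Poisson-only case, the snippet below builds the per-channel Fisher matrix by finite differences and sums the two channels; `psf(theta, mask)` is an assumed forward model returning the expected photon image of a single emitter, and the background and step sizes are placeholders.

```python
import numpy as np

def fisher_poisson(psf, theta, mask, background=15.0, step=1e-3):
    """3x3 Fisher information of eq. (S19) for one channel, via finite differences.
    psf(theta, mask) -> expected photon image for an emitter at 3D position theta."""
    p0 = psf(theta, mask)
    grads = []
    for i in range(3):
        d = np.zeros(3); d[i] = step
        grads.append((psf(theta + d, mask) - psf(theta - d, mask)) / (2 * step))
    F = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            F[i, j] = np.sum(grads[i] * grads[j] / (p0 + background))
    return F

def crlb_pair(psf, theta, mask1, mask2):
    """Joint CRLB of the two-channel system: Fisher matrices add (eq. S20), and
    the bound is the diagonal of the inverse (eq. S21)."""
    F = fisher_poisson(psf, theta, mask1) + fisher_poisson(psf, theta, mask2)
    return np.sqrt(np.diag(np.linalg.inv(F)))   # [sigma_x, sigma_y, sigma_z] bounds

# The design cost of eq. (S22) then sums these bounds over on-axis positions
# theta = (0, 0, z') sampled along the desired axial range.
```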
A.4 Learning details

A.4.1 CNN architecture
In this work, we adapt the CNN architecture previously proposed in DeepSTORM3D [21] to process an image with 2 channels (Fig. S2). Our architecture is relatively light. The maximal dilation rate d_max was set according to the PSFs' lateral footprint: d_max = 16 for the Tetrapod-EDOF pair, and a different value for the CRLB/Nebulae pairs (Fig. S2). To span the 4 µm axial range, we use 80 output channels, i.e. a voxel size of ≈50 nm in z. The final prediction is given by a 1×1 convolution layer whose output is constrained to the range [0, W], where W is set empirically to 800 to account for class imbalance (occupied vs. vacant voxels).

The spatial supports of all convolutional filters are 3×3. Each convolution block is followed by a Batch Normalization layer and a LeakyReLU non-linearity. To compile a list of localizations at test time, we threshold the voxel values and find local maxima in clustered components (details in section A.4.5). Lastly, to efficiently learn the phase masks with reduced computation, we modify the architecture in a similar fashion to that described in [21].

Fig. S2. CNN architecture. (a) The concatenated snapshot images I_1, I_2 are fed to a CNN composed of 3 modules as described in the text. Feature-map dimensions are depicted following [104] to reflect the operation of each module. The spatial supports of all convolutional filters are 3×3. The number of channels is fixed to 64 in both the multi-scale context-aggregation and the upsampling modules; it is then increased to 80 for the refinement module. Note that in the context-aggregation module the spatial support of all convolutional filters is 3×3, although their receptive field grows exponentially with the dilation rate. The blue square depicts the final receptive field for both choices of d_max. The output 3D high-resolution volume is translated to a list of 3D localizations through simple post-processing. Scale bars are 3 µm.

Notably, in this work we used the same encoder to process both images. In our implementation the image pair is first warped using a calibrated affine transform prior to CNN processing. However, in the case of severe inter-channel misalignment this is expected to be sub-optimal, and a "Y-net" structure with separate encoders should be considered. In particular, one of the encoders could potentially be swapped with a spatial transformer network [108] to alleviate the need for calibration.
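For orientation only, the following PyTorch skeleton illustrates the three-module structure described above (dilated-convolution context aggregation, upsampling, and refinement). The layer counts, dilation schedule, upsampling factors, LeakyReLU slope, and output clipping are illustrative assumptions and do not reproduce the released DeepSTORM3D architecture.

```python
import torch.nn as nn

def conv_block(c_in, c_out, dilation=1):
    # 3x3 conv -> BatchNorm -> LeakyReLU, as described in the text (slope assumed)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(c_out), nn.LeakyReLU(0.1))

class DualChannelLocNet(nn.Module):
    """Schematic 3-module decoder: context aggregation (dilated convs),
    upsampling, and refinement; channel counts follow the text (64 -> 80)."""
    def __init__(self, d_max=16, n_z=80):
        super().__init__()
        dilations, d = [1], 2
        while d <= d_max:                        # exponentially growing dilation rate
            dilations.append(d); d *= 2
        layers = [conv_block(2, 64)] + [conv_block(64, 64, dl) for dl in dilations]
        self.context = nn.Sequential(*layers)
        self.upsample = nn.Sequential(           # assumed x4 lateral upsampling
            nn.Upsample(scale_factor=2), conv_block(64, 64),
            nn.Upsample(scale_factor=2), conv_block(64, 64))
        self.refine = nn.Sequential(conv_block(64, 80), nn.Conv2d(80, n_z, 1),
                                    nn.Hardtanh(0, 800))   # output clipped to [0, W]

    def forward(self, image_pair):                # image_pair: (B, 2, H, W)
        return self.refine(self.upsample(self.context(image_pair)))
```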
A.4.2 Training set

To learn a localization CNN with predefined phase masks, we simulate a training set composed of 10K simulated image pairs and their corresponding labels, which are lists of emitter positions; 9K examples were used for training, with 1K examples held out for validation. Alternatively, to jointly learn the phase masks and the localization CNN parameters, the training set is composed solely of simulated emitter positions, as the respective image pairs change throughout the iterations according to the phase masks.

In our implementation the training positions are randomly drawn within the 3D cube of possible locations, in order for the method to generalize to arbitrary imaged structures. The Boolean grid used as a label in training is given by projecting the continuous positions onto the recovery grid (voxel size of 27.5 nm laterally and ≈50 nm axially).

Given a set of 3D locations, the expected model images are simulated using the measurement model in eq. (S9). To accurately model experimental data in our simulations, we image beads on the coverslip prior to the experiment and retrieve the aberrated pupil functions using VIPR [72]. To make our simulations realistic, we diversify the training conditions to include experimental variability; namely, we vary the emitter density, the signal-to-noise ratio, the amount of blur, and any additional expected experimental challenges (e.g. motion blur, laser fringes, etc.). For example, in telomere imaging we have observed a highly non-uniform per-pixel background, presumably resulting from the nucleus auto-fluorescence. To model this effect, we approximate the per-pixel background B(x, y) in eq. (S9) using a super-Gaussian,

B(u) = A_1 \, e^{-\left( (u - µ)^T Σ^{-1} (u - µ) \right)^P} + A_2,    (S28)

where u = (x, y) are the combined 2D coordinates in image space, A_1, A_2 are scaling parameters, µ is the 2D centroid, Σ is the covariance matrix, and P is the super-Gaussian power. These parameters are augmented in training to make the model robust to their variations.
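A minimal NumPy sketch of the super-Gaussian background of eq. (S28) is shown below; the amplitudes, centroid, covariance, and power are placeholder values that would be randomized during training.

```python
import numpy as np

def super_gaussian_bg(h, w, a1=300.0, a2=100.0, power=2.0, sigma=0.4):
    """Per-pixel background of eq. (S28); all parameter values are illustrative."""
    y, x = np.mgrid[0:h, 0:w]
    u = np.stack([x, y], axis=-1).astype(float)
    mu = np.array([w / 2, h / 2])                       # 2D centroid
    cov_inv = np.linalg.inv(np.diag([(sigma * w) ** 2, (sigma * h) ** 2]))
    d = u - mu
    quad = np.einsum("hwi,ij,hwj->hw", d, cov_inv, d)   # (u-mu)^T Sigma^-1 (u-mu)
    return a1 * np.exp(-quad ** power) + a2

bg = super_gaussian_bg(128, 128)   # added to the clean image V(x, y) in eq. (S9)
```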
Fig. S3. Overview of a typical experiment. Fluorescent beads are used to create 3D PSF scans of the two channels, which are then modelled using VIPR. The calibrated PSF models are used to train the localization net. The trained net can then localize experimental data and output the desired 3D positions from snapshot measurements. For fixed samples, where an experimental ground truth is available, the Jaccard index is calculated by matching the axial-scan results with the net output.

A.4.3 Loss function

Let x denote the ground-truth Boolean volume, and x̂ denote the network prediction. Our loss function for training the net, L_Net, is a combination of two terms,

L_{Net}(x, x̂) = L_{Heatmaps}(x, x̂) + L_{Overlap}(x, x̂).    (S29)

The first term, L_{Heatmaps}, is a 3D heatmap-matching loss, given by

L_{Heatmaps}(x, x̂) = \left\| x \circledast G_{3D} - x̂ \circledast G_{3D} \right\|_2^2,    (S30)

where G_{3D} is a 3D Gaussian kernel with a standard deviation of 1 voxel. This term measures the proximity of our prediction to the simulated ground truth by measuring the ℓ_2 distance between their respective heatmaps.

The second term, L_{Overlap}, is a measure of overlap, given by

L_{Overlap}(x, x̂) = - \frac{\sum_i x_i \, x̂_i}{\sum_i x_i \, x̂_i + \sum_i x_i}.    (S31)

This term provides a soft approximation of the true-positive rate in the prediction. Note that L_{Overlap} does not take into account false positives, and hence, if optimized alone, it would result in a predicted volume of all 1s; here, however, this is not a feasible solution, as it is not favored by L_{Heatmaps}. In our implementation we weight voxels containing emitters with a factor of W = 800 in order to balance out the contributions of vacant and occupied voxels. Hence, the CNN output is constrained to be in the range [0, 800]. This strategy makes optimization easier and prevents gradient clipping.
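A minimal PyTorch sketch of the loss in eqs. (S29)-(S31) follows; the 3D Gaussian kernel size, the small stabilizing constant in the overlap term, and the example tensors are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def make_gauss3d(sigma=1.0, size=7):
    # separable 3D Gaussian kernel with a standard deviation of 1 voxel (eq. S30)
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = torch.einsum("i,j,k->ijk", g, g, g)
    return (k / k.sum()).view(1, 1, size, size, size)

def net_loss(x, x_hat, kernel):
    """x: GT volume with occupied voxels weighted by W=800, x_hat: CNN output;
    both tensors of shape (B, 1, D, H, W)."""
    blur = lambda v: F.conv3d(v, kernel, padding=kernel.shape[-1] // 2)
    l_heatmaps = ((blur(x) - blur(x_hat)) ** 2).mean()                      # eq. (S30)
    inter = (x * x_hat).sum()
    l_overlap = -inter / (inter + x.sum() + 1e-6)                           # eq. (S31)
    return l_heatmaps + l_overlap

kernel = make_gauss3d()
x = torch.zeros(1, 1, 80, 64, 64); x[0, 0, 40, 32, 32] = 800.0   # one occupied voxel
x_hat = torch.rand(1, 1, 80, 64, 64) * 800.0
loss = net_loss(x, x_hat, kernel)
```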
A.4.4 Optimization and hyper-parameters
We used the Adam optimizer [98] with a fixed initial learning rate and its momentum parameters β_1, β_2, and ε. The batch size was 16 for learning a phase mask, and 4 for learning a recovery net (due to GPU memory). The learning rate was reduced by a factor of 10 when the loss plateaued for more than 5 epochs, and training was stopped if no improvement was observed for more than 7 epochs, or alternatively when a maximum number of 50 epochs was reached. The initial weights were sampled from a uniform distribution on the interval [-1/√k, 1/√k], where k = k_x × k_y × C_in, with k_x, k_y the filter spatial dimensions and C_in the number of input channels to the convolutional layer. Training and evaluation were run on a workstation equipped with 32 GB of memory, an Intel(R) Core(TM) CPU, and an NVIDIA Titan GPU; learning a phase mask took ≈25 h, and recovery-net training took ≈35 h. Our code is implemented using the PyTorch framework [109], and will soon be made publicly available at https://github.com/EliasNehme/DeepNebulae.

A.4.5 Post-processing
The fully convolutional architecture that we adopted in this work outputs a super-resolved 3D volume, where occupied voxels account for emitters. To compile a list of localizations, we first threshold this volume, keeping only voxels with a minimal confidence of 80 (the maximal output is 800). Afterwards, out of the remaining localizations we discard those which are not local maxima in their 3D vicinity. The radius used for grouping and local-maxima finding was ≈100 nm. Lastly, the recovered continuous 3D position is given by applying a 3D Center of Gravity (CoG) estimator to the vicinity of the local maxima in the prediction volume. While it is possible to use more sophisticated post-processing steps, we chose this simple and efficient strategy to keep our method as fast as possible. In our implementation we write these steps as a composition of pooling and convolution operations, making the calculations extremely efficient on GPU.
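A minimal PyTorch sketch of this post-processing (thresholding, 3D local-maxima detection via max-pooling, and center-of-gravity refinement) is given below; the neighborhood radius and the omission of the voxel-to-nanometer conversion are simplifications.

```python
import torch
import torch.nn.functional as F

def volume_to_localizations(vol, threshold=80.0, radius=1):
    """vol: (D, H, W) network output. Threshold confident voxels, keep 3D local
    maxima, and refine each to a center of gravity over its neighborhood."""
    v = vol.unsqueeze(0).unsqueeze(0)                          # (1, 1, D, H, W)
    v = torch.where(v >= threshold, v, torch.zeros_like(v))    # confidence threshold
    k = 2 * radius + 1
    is_max = (v == F.max_pool3d(v, k, stride=1, padding=radius)) & (v > 0)
    locs = []
    for idx in is_max[0, 0].nonzero():
        d, h, w = (int(i) for i in idx)
        d0, h0, w0 = max(d - radius, 0), max(h - radius, 0), max(w - radius, 0)
        sl = v[0, 0, d0:d + radius + 1, h0:h + radius + 1, w0:w + radius + 1]
        zz, yy, xx = torch.meshgrid(*[torch.arange(s, dtype=torch.float32)
                                      for s in sl.shape], indexing="ij")
        m = sl.sum()
        locs.append((float((sl * xx).sum() / m) + w0,          # x (voxel units)
                     float((sl * yy).sum() / m) + h0,          # y
                     float((sl * zz).sum() / m) + d0))         # z
    return locs
```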
Notably, while grouping and local-maxima finding potentially limit the maximal density, keep in mind that overlaps in 2D normally translate to non-overlapping "blobs" in 3D. Hence, this is hardly a limitation in common imaging conditions, as localization algorithms struggle considerably before reaching this limit.

In the telomere tracking experiment, the per-frame localizations were linked using DBSCAN clustering [110] applied directly to the 3D positions. The maximum distance allowed between points was set by the DBSCAN ε parameter (in µm), and the minimal number of emitters per cluster was minPts = 25. This resulted in filtering 83 localizations out of 24530 throughout the 500 frames, i.e. less than 0.3%. All tracks started within the first 6 frames and were relatively clustered in 3D, with no bifurcations observed. For more complicated tracking scenarios the reader is encouraged to link the CNN localizations by resorting to a more robust tracking software such as [111].
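For illustration, the following scikit-learn sketch links per-frame localizations into tracks with DBSCAN as described above; the ε value and the input format are assumptions, not the parameters used in the experiment.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def link_tracks(localizations, eps_um=0.5, min_pts=25):
    """localizations: list over frames of (N_t, 3) arrays of XYZ positions in µm.
    Each DBSCAN cluster is interpreted as one telomere track; label -1 marks
    filtered (outlier) localizations. The eps value here is a placeholder."""
    frames = np.concatenate([np.full(len(p), t) for t, p in enumerate(localizations)])
    xyz = np.concatenate(localizations, axis=0)
    labels = DBSCAN(eps=eps_um, min_samples=min_pts).fit_predict(xyz)
    tracks = {}
    for lab in np.unique(labels):
        if lab == -1:
            continue                        # discarded localizations (< 0.3% here)
        idx = labels == lab
        order = np.argsort(frames[idx])     # time-order the positions in the track
        tracks[int(lab)] = np.column_stack([frames[idx][order], xyz[idx][order]])
    return tracks                            # {track_id: (M, 4) array of [t, x, y, z]}
```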
A.5 Experimental implementation

This section details the full experimental procedure to localize emitters using snapshot measurements from the dual-view setup. An outline of a typical experiment is presented in Fig. S3. The following subsections detail each part of the experiment for completeness.
A.5.1 Dual channel calibration
The goal of this section is to describe the process of calibrating the proposed dual-camera system, such that simulated PSFs match the measured data and their positions correspond between the two images. The practice of aligning an optical 4f Fourier processing system, calibrating an LC-SLM, and creating a simulated model for a single channel has been meticulously explained in many previous works (e.g. [112]).

The proposed system consists of two identical optical paths which generate 3D PSF images. The acquired images are encoded simultaneously in the localization network, and thus pose some extra challenges in the calibration process, specifically with respect to their spatial alignment. In our work, post-processing corrections are not a viable option due to the density of PSFs, necessitating a good calibration of the 3D alignment. To this end, we created two calibration samples (sparse and dense), consisting of a water-covered glass coverslip (Fisher Scientific) with 40 nm fluorescent beads (FluoSpheres (580/605), ThermoFisher) adhered to the surface with 1% PVA. The dense sample was chosen such that the unmodulated PSFs cover the entire field of view (FOV), but each individual bead can still be fit using ThunderSTORM [113]. The localizations from each channel were used to estimate an affine transformation between the two cameras (Fig. S4). To prevent outliers from biasing the transformation, we implemented a Random Sample Consensus (RANSAC) procedure.

Next, the sparse sample is chosen such that each slice of the 3D PSFs (for both channels) can be imaged without any overlaps from neighboring emitters. An axial scan is performed to ensure that both channels measure corresponding PSFs at the same focal-plane positions, to account for any minor axial misalignment between the two cameras. The (lateral) point of reference was chosen as the center of gravity of the maximum projection in one of the channels. The reference point of the second channel was calculated using the aforementioned affine transformation. Next, we used VIPR [72] to generate a phase mask for each channel, as it provides a good model and accounts for the issues of wobble and near-field effects by implementing the vectorial diffraction model. Importantly, while the affine transformation is calculated using localizations and not images, ultimately the input to the localization network is an image pair. However, since a global affine transformation is not a shift-invariant operator, a fully convolutional model will struggle to learn this operator efficiently. Therefore, at test time, we warp the image of one camera to align with its counterpart, and feed the aligned, concatenated image pair to the network. The warping operation is implemented using cubic-spline interpolation.
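A minimal NumPy sketch of the affine-plus-RANSAC registration between the matched bead localizations of the two cameras is shown below; the iteration count and inlier threshold are placeholders, and in practice a library routine (e.g. scikit-image's `ransac`) could be used instead.

```python
import numpy as np

def fit_affine(src, dst):
    # least-squares 2D affine: dst ~ [src, 1] @ M, via homogeneous coordinates
    ones = np.ones((len(src), 1))
    M, *_ = np.linalg.lstsq(np.hstack([src, ones]), dst, rcond=None)
    return M                                   # (3, 2) affine matrix

def ransac_affine(src, dst, n_iter=500, inlier_px=1.0, seed=0):
    """src/dst: (N, 2) matched bead localizations from the two cameras (pixels)."""
    rng, best = np.random.default_rng(seed), None
    best_inl = 0
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample
        M = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((len(src), 1))]) @ M
        inl = np.linalg.norm(pred - dst, axis=1) < inlier_px
        if inl.sum() > best_inl:
            best_inl, best = inl.sum(), inl
    return fit_affine(src[best], dst[best])    # refit on all inliers

# mapping: a localization (x, y) in camera 1 maps to np.array([x, y, 1.0]) @ M in camera 2
```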
Fig. S4.
Channel registration. The estimated affine transformation for the Tetrapod-EDOF experiment (main text Fig. 3).
Fig. S5.
Effect of image misalignment. Numerical comparison (Jaccard index and lateral/axial RMSE as a function of emitter density) between networks trained with aligned images (blue), misaligned images (green), and approximately aligned images (up to 50 nm) obtained by warping (red).
To test the importance of image alignment, we trained three different models: (1) with perfectly aligned positions, (2) with randomly misaligned positions (achieved by sampling portions of the estimated transformation), and (3) with misaligned positions accompanied by a known transformation between channels (up to 50 nm) that is used to warp the images. Three conclusions can be drawn from the results (Fig. S5): (1) the model is unable to efficiently cope with a random global transform; (2) calibrating the affine transform up to 50 nm error and warping the images prior to localization improves performance; and (3) perfect alignment of the Tetrapod and the EDOF PSFs does not improve the axial localization precision. The latter is expected, because the axial information is decoded solely from the Tetrapod channel; it is therefore insensitive to the alignment with the EDOF PSF, which does not encode z.
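For illustration, misaligned training data of type (2) could be generated by sampling a "portion" of the calibrated affine transform. The sketch below uses a simple linear interpolation of the transform matrix between the identity and the full calibration; this interpolation scheme and the variable names are assumptions, not necessarily the sampling used here.

```python
import numpy as np
from skimage.transform import AffineTransform, warp

def random_partial_affine(full_tform, rng, max_fraction=1.0):
    """Sample a fraction of the calibrated channel-to-channel affine by linearly
    interpolating its 3x3 parameter matrix between the identity and the full transform."""
    alpha = rng.uniform(0.0, max_fraction)
    params = (1.0 - alpha) * np.eye(3) + alpha * full_tform.params
    return AffineTransform(matrix=params)

def misalign_image(img, full_tform, rng):
    """Apply a randomly sampled partial misalignment to one channel's image."""
    t = random_partial_affine(full_tform, rng)
    return warp(img, inverse_map=t.inverse, order=3, preserve_range=True)

# usage sketch:
# rng = np.random.default_rng(0)
# shifted = misalign_image(train_img_cam2, calibrated_tform, rng)
```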
A.5.2 Optical components
The imaging system in Fig. 1 consists of a Nikon Eclipse-Ti inverted fluorescence microscope with a 100X/1.49 NA Nikon objective (CFI SR HP Apo TIRF 100XC). A polarizing beam splitter was placed after the first achromatic doublet lens (f = 15 cm) to split the emission path. Both paths consisted of three additional achromatic doublet lenses to image the back focal plane onto an LC-SLM (Pluto-VIS020, Holoeye, in the first path, and a 1920×1152 liquid-crystal-on-silicon device, Meadowlark, in the second). After a last image-forming lens, the modulated images were recorded by two sCMOS cameras (Prime 95B, Photometrics). For full synchronization, the first camera triggered the second camera (in a leader-follower configuration), which in turn triggered the 561 nm illumination laser (iChrome MLE, Toptica).
A.5.3 Biological sample preparation
For cell experiments, U2OS cells were prepared as described previously in [21]. In brief, cells were grown under standard conditions: 37 °C, 5% CO2 in Dulbecco's Modified Eagle Medium (DMEM; without phenol red for the live-cell experiment) with 1 g/L D-glucose (low glucose), supplemented with 10% fetal bovine serum, 1% penicillin-streptomycin, and glutamine. To fluorescently label the telomeres, cells were transfected with a plasmid encoding the fluorescently tagged telomeric repeat binding factor 1 (DsRed-hTRF1) using Lipofectamine 3000 (Thermo Fisher Scientific). After 20-24 hours, cells were either fixed with 4% paraformaldehyde for 20 min, washed three times with PBS, and mounted to a slide (22 × 22 mm, 170 µm thick) with mounting medium; or imaged live in a temperature-, humidity-, and gas-mixture-controlled imaging chamber mounted to the microscope (Okolab) on a glass-bottom culture dish (15 mm, 180 µm thick).
A.5.4 Ground truth estimation
In fixed-cell experiments, the experimental ground truth 3D positions were approximated via axial scanning with the unmodulated PSF (Fig. S6). The scan consisted of 100 nm steps over a range of 4-5 µm. The resulting z-stack was then processed in the following manner: first, detection and lateral position estimation were performed with ThunderSTORM [113]. Next, the in-focus position of emitters was estimated by fitting a second-order polynomial to the mean intensity across focal slices. The mean intensity was calculated as the mean number of counts in the central 5 ×
Fig. S6.
Experimental ground truth approximation. (a) A focal sweep is performed with an unmodulated imaging path. (b) Max projection of the focal sweep, showing the density of labelled telomeres in the U2OS cell. (c) Axial fit of the mean intensity to determine the in-focus position of an emitter.
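The in-focus estimation described above amounts to a simple quadratic fit; a minimal sketch (with assumed variable names) is:

```python
import numpy as np

def in_focus_position(z_positions_um, mean_intensity):
    """Estimate the in-focus axial position of an emitter by fitting a second-order
    polynomial to its mean intensity across focal slices and returning the location
    of the parabola's maximum."""
    c2, c1, c0 = np.polyfit(z_positions_um, mean_intensity, deg=2)
    if c2 >= 0:
        # No well-defined intensity peak; fall back to the brightest slice.
        return z_positions_um[np.argmax(mean_intensity)]
    return -c1 / (2.0 * c2)  # vertex of the fitted parabola
```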
A.6 Additional simulation results
In this section we present further numerical simulation results which support the conclusions from the main text and the choice of the PSF pair. The first result, presented in Figs. S7 and S8, is a numerical comparison between single-channel and dual-channel setups in terms of their detection (measured by the Jaccard index) and their average precision (measured by the lateral/axial RMSE). We compare the Tetrapod-EDOF pair (blue) to the commonly used biplane method (cyan) [4, 5] and to two single-channel approaches with double signal, namely the Tetrapod PSF (red) and the single-channel end-to-end optimized phase mask (orange) adopted from DeepSTORM3D [21]. The numerical results show that the Tetrapod-EDOF pair is the best in detection. In terms of lateral RMSE at high densities, the biplane approach is better, as the in-focus PSF is more photon efficient than the EDOF. The axial RMSE result shows that the proposed pair is surpassed only by the end-to-end encoding of a single channel. This is likely because the axial position is mainly encoded in the Tetrapod path, and is thus limited by the axial localization performance of the single-channel Tetrapod at high densities. These results reinforce the decision to explore other solutions which mutually encode all parameters in both channels and are optimal for detection and localization.
The second result, in Fig. S9, is a comparison between the three PSF pairs proposed in this manuscript. Both the detection and the average precision support our claim that the Nebulae PSFs (green) are better than the CRLB (black) and Tetrapod-EDOF (blue) pairs. A similar conclusion was drawn from the experimental comparison in Section A.7.
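For reference, the detection and precision metrics used in these comparisons can be computed as in the sketch below, which matches recovered emitters to ground-truth emitters by nearest neighbour within a tolerance radius. The 100 nm matching radius and the greedy matching scheme are assumptions, not necessarily the exact evaluation protocol used here.

```python
import numpy as np
from scipy.spatial import cKDTree

def jaccard_and_rmse(gt_xyz, rec_xyz, match_radius_um=0.1):
    """Greedily match recovered emitters to ground-truth emitters within
    match_radius_um, then report the Jaccard index and the lateral/axial RMSE
    over the matched (true-positive) pairs. Positions are (N, 3) arrays in um."""
    tree = cKDTree(gt_xyz)
    dists, idx = tree.query(rec_xyz, distance_upper_bound=match_radius_um)
    used, pairs = set(), []
    for j in np.argsort(dists):                 # closest candidate pairs first
        if np.isfinite(dists[j]) and idx[j] not in used:
            used.add(idx[j])
            pairs.append((idx[j], j))           # (ground-truth index, recovered index)
    tp = len(pairs)
    fp = len(rec_xyz) - tp
    fn = len(gt_xyz) - tp
    jaccard = tp / (tp + fp + fn)
    if tp == 0:
        return jaccard, np.nan, np.nan
    g = gt_xyz[[p[0] for p in pairs]]
    r = rec_xyz[[p[1] for p in pairs]]
    lateral_rmse = np.sqrt(np.mean(np.sum((g[:, :2] - r[:, :2]) ** 2, axis=1)))
    axial_rmse = np.sqrt(np.mean((g[:, 2] - r[:, 2]) ** 2))
    return jaccard, lateral_rmse, axial_rmse
```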
Fig. S7.
PSFs for single and dual channel comparisons. Phase masks which were used in the single-channel vs. dual-channel comparison: (top to bottom) Tetrapod, end-to-end encoding for a single channel, EDOF, and biplane.
Fig. S8.
Single-channel vs. dual-channel systems. Detection (left) and localization precision (lateral/axial RMSE) over a range of simulated densities of sources. Compared methods: Tetrapod with ×2 signal, learned single-channel mask with ×2 signal, Tetrapod + EDOF, and biplane. Emitters were simulated with ≈ 15K signal photons per emitter and ≈ 500 background photons per pixel. Each data point is an average of n = 100 simulated images. Average standard deviation in Jaccard index was ≈ 5% and in precision was ≈ .
Fig. S9.
Performance as a function of density for the three proposed PSF pairs (Nebulae, CRLB, and Tetrapod + EDOF). The methods are tested in detection (left) and localization precision (lateral/axial RMSE). Emitters were simulated with ≈ 15K signal photons per emitter and ≈ 500 background photons per pixel. Average standard deviation in Jaccard index was ≈ 5% and in precision was ≈ .
A.7 Additional experimental results
This section presents more experimental results on fixed-cell data. Figure S10 explores the false negatives presented in Fig. 3. All of the experimentally undetected points had very low signal. While the EDOF performs well in 2D, it is not as signal efficient as the in-focus unmodulated PSF. Thus, emitters which are slightly above the noise limit (without a phase mask) can be detected in the axial scan but are invisible to the EDOF and Tetrapod PSFs. This was improved in the subsequent PSF pairs, which complement each other more efficiently.
To validate our conclusions from simulation regarding the Nebulae PSFs being the optimal pair, we have shown in Fig. S11 that the Nebulae PSFs outperform the Tetrapod-EDOF pair. For completeness, we show in Fig. S11 the results including the CRLB pair for the same cell. As predicted in simulations, the CRLB pair performs slightly worse than the Nebulae PSFs but better than the Tetrapod-EDOF pair. To verify reproducibility, we present in Fig. S12 similar experimental results for a bigger cell, which exhibits a staggering number of 142 emitters. The reconstruction results are improved for all PSF pairs, as this cell exhibits fewer overlaps, yet they are consistent with the previous conclusions on PSF-pair performance.
Fig. S10.
Experimental false negatives for the Tetrapod-EDOF pair. (a) U2OS cell experimental snapshot with the Tetrapod PSF (Fig. 3). (b) Reconstructed image obtained by rendering the positions recovered by the network with the Tetrapod PSF. Asterisks mark true (green) and false (blue) positives. (c) Paired experimental EDOF snapshot. (d) Zoom-ins on undetected emitters (false negatives).
References
[1] E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,”
Science , vol. 313,no. 5793, pp. 1642–1645, 2006.[2] S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-highresolution imaging by fluorescence photoactivation local-ization microscopy,”
Biophysical journal , vol. 91, no. 11,pp. 4258–4272, 2006.[3] M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limitimaging by stochastic optical reconstruction microscopy(storm),”
Nature methods , vol. 3, no. 10, pp. 793–796, 2006.[4] S. Ram, P. Prabhat, J. Chao, E. S. Ward, and R. J. Ober,“High accuracy 3d quantum dot tracking with multifocalplane microscopy for the study of fast intracellular dy-namics in live cells,”
Biophysical journal , vol. 95, no. 12,pp. 6025–6043, 2008.[5] M. F. Juette, T. J. Gould, M. D. Lessard, M. J. Mlodzianoski,B. S. Nagpure, B. T. Bennett, S. T. Hess, and J. Bewersdorf,“Three-dimensional sub–100 nm resolution fluorescencemicroscopy of thick samples,”
Nature methods , vol. 5, no. 6,pp. 527–529, 2008.[6] B. Louis, R. Camacho, R. Bresolí-Obach, S. Abakumov,J. Vandaele, T. Kudo, H. Masuhara, I. G. Scheblykin,J. Hofkens, and S. Rocha, “Fast-tracking of single emit-ters in large volumes with nanometer precision,”
OpticsExpress , vol. 28, no. 19, pp. 28656–28671, 2020.[7] B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic opti-cal reconstruction microscopy,”
Science , vol. 319, no. 5864,pp. 810–813, 2008.[8] S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord,N. Liu, R. J. Twieg, R. Piestun, and W. Moerner, “Three-dimensional, single-molecule fluorescence imaging be-yond the diffraction limit by using a double-helix pointspread function,”
Proceedings of the National Academy ofSciences , vol. 106, no. 9, pp. 2995–2999, 2009.[9] Y. Shechtman, S. J. Sahl, A. S. Backer, and W. Moerner,“Optimal point spread function design for 3d imaging,”
Physical review letters , vol. 113, no. 13, p. 133902, 2014.[10] A. S. Backer and W. Moerner, “Extending single-moleculemicroscopy using optical fourier processing,”
The Journalof Physical Chemistry B , vol. 118, no. 28, pp. 8313–8329,2014.[11] Y. Shechtman, L. E. Weiss, A. S. Backer, S. J. Sahl, andW. Moerner, “Precise three-dimensional scan-free multiple-particle tracking over large axial ranges with tetrapodpoint spread functions,”
Nano letters , vol. 15, no. 6,pp. 4194–4199, 2015.[12] A. Aristov, B. Lelandais, E. Rensen, and C. Zimmer, “Zola-3d allows flexible 3d localization microscopy over an ad-justable axial range,”
Nature communications , vol. 9, no. 1,p. 2409, 2018.[13] D. Sage, T.-A. Pham, H. Babcock, T. Lukes, T. Pengo,J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Co-labrese, et al. , “Super-resolution fight club: assessment of2d and 3d single-molecule localization microscopy soft-ware,”
Nature methods , vol. 16, no. 5, p. 387, 2019.[14] L. Möckl, A. R. Roy, and W. Moerner, “Deep learning insingle-molecule microscopy: fundamentals, caveats, andrecent developments,”
Biomedical Optics Express, vol. 11,
Fig. S11.
Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. Example images showing the two proposed mask image pairs and subsequent 3D reconstruction plotted over the approximated ground truth; markers denote ground truth, true positives, false positives, and false negatives. (a) Tetrapod-EDOF pair with J = , J = , J = .
Fig. S12.
Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. Example images showing the two proposed mask image pairs and subsequent 3D reconstruction plotted over the approximated ground truth; markers denote ground truth, true positives, false positives, and false negatives. (a) Tetrapod-EDOF pair with J = , J = , J = .
[15] Nature Biotechnology, 2018.
[16] S. K. Gaire, Y. Zhang, H. Li, R. Yu, H. F. Zhang, and L. Ying, “Accelerating multicolor spectroscopic single-molecule localization microscopy using deep learning,”
BiomedicalOptics Express , vol. 11, no. 5, pp. 2705–2721, 2020.[17] E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shecht-man, “Deep-storm: super-resolution single-molecule mi-croscopy by deep learning,”
Optica , vol. 5, no. 4, pp. 458–464, 2018.[18] N. Boyd, E. Jonas, H. P. Babcock, and B. Recht, “Deeploco:Fast 3d localization microscopy using neural networks,”
BioRxiv , p. 267096, 2018.[19] J. M. Newby, A. M. Schaefer, P. T. Lee, M. G. Forest, andS. K. Lai, “Convolutional neural networks automate de-tection for tracking of submicron-scale particles in 2d and3d,”
Proceedings of the National Academy of Sciences , vol. 115,no. 36, pp. 9026–9031, 2018.[20] B. Diederich, P. Then, A. Jügler, R. Förster, and R. Heintz-mann, “cellstorm—cost-effective super-resolution on acellphone using dstorm,”
PloS one , vol. 14, no. 1,p. e0209827, 2019.[21] E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E.Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, andY. Shechtman, “Deepstorm3d: dense 3d localization mi-croscopy and psf design by deep learning,”
Nature Meth-ods , vol. 17, no. 7, pp. 734–740, 2020.[22] A. Speiser, S. C. Turaga, and J. H. Macke, “Teaching deepneural networks to localize sources in super-resolutionmicroscopy by combining simulation-based learning andunsupervised learning,” arXiv preprint arXiv:1907.00770 ,2019.[23] R. Barth, K. Bystricky, and H. Shaban, “Coupling chro-matin structure and dynamics by live super-resolutionimaging,” bioRxiv , p. 777482, 2019.[24] G. Barbastathis, A. Ozcan, and G. Situ, “On the use ofdeep learning for computational imaging,”
Optica , vol. 6,no. 8, pp. 921–943, 2019.[25] G. Ongie, A. Jalal, C. A. M. R. G. Baraniuk, A. G. Dimakis,and R. Willett, “Deep learning techniques for inverse prob-lems in imaging,”
IEEE Journal on Selected Areas in Informa-tion Theory , 2020.[26] T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Mar-rakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, et al. ,“U-net: deep learning for cell counting, detection, andmorphometry,”
Nature methods , vol. 16, no. 1, pp. 67–70,2019.[27] Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang,and A. Ozcan, “Deep learning microscopy,”
Optica , vol. 4,no. 11, pp. 1437–1443, 2017.[28] M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov,A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, et al. , “Content-aware image restoration: pushing the lim-its of fluorescence microscopy,”
Nature methods , vol. 15,no. 12, p. 1090, 2018.[29] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2void-learningdenoising from single noisy images,” in
Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition ,pp. 2129–2137, 2019.[30] S. Lim, H. Park, S.-E. Lee, S. Chang, B. Sim, and J. C.Ye, “Cyclegan with a blur kernel for deconvolution mi- croscopy: Optimal transport geometry,”
IEEE Transactionson Computational Imaging , vol. 6, pp. 1127–1138, 2020.[31] R. Horstmeyer, R. Y. Chen, B. Kappes, and B. Judkewitz,“Convolutional neural networks that teach microscopeshow to image,” arXiv preprint arXiv:1709.07223 , 2017.[32] A. Muthumbi, A. Chaware, K. Kim, K. C. Zhou, P. C.Konda, R. Chen, B. Judkewitz, A. Erdmann, B. Kappes,and R. Horstmeyer, “Learned sensing: jointly optimizedmicroscope hardware for accurate image classification,”
Biomedical Optics Express , vol. 10, no. 12, pp. 6351–6369,2019.[33] C. Ounkomol, S. Seshamani, M. M. Maleckar, F. Coll-man, and G. R. Johnson, “Label-free prediction of three-dimensional fluorescence images from transmitted-lightmicroscopy,”
Nature methods , vol. 15, no. 11, pp. 917–920,2018.[34] Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Oz-can, “Phase recovery and holographic image reconstruc-tion using deep learning in neural networks,”
Light: Sci-ence & Applications , vol. 7, no. 2, p. 17141, 2018.[35] T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah,“Deep learning approach for fourier ptychography mi-croscopy,”
Optics express , vol. 26, no. 20, pp. 26470–26484,2018.[36] Y. Xue, S. Cheng, Y. Li, and L. Tian, “Reliable deep-learning-based phase imaging with uncertainty quantifi-cation,”
Optica , vol. 6, no. 5, pp. 618–629, 2019.[37] Z. Wu, Y. Sun, A. Matlock, J. Liu, L. Tian, and U. S.Kamilov, “Simba: scalable inversion in optical tomogra-phy using deep denoising priors,”
IEEE Journal of SelectedTopics in Signal Processing , 2020.[38] J. T. Smith, R. Yao, N. Sinsuebphon, A. Rudkouskaya,J. Mazurkiewicz, M. Barroso, P. Yan, and X. Intes, “Ultra-fast fit-free analysis of complex fluorescence lifetime imag-ing via deep learning,” bioRxiv , p. 523928, 2019.[39] P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. Schütz, andA. Jesacher, “Three-dimensional localization microscopyusing deep learning,”
Optics express , vol. 26, no. 25,pp. 33166–33179, 2018.[40] P. Zhang, S. Liu, A. Chaurasia, D. Ma, M. J. Mlodzianoski,E. Culurciello, and F. Huang, “Analyzing complex single-molecule emission patterns with deep learning,”
Naturemethods , vol. 15, no. 11, p. 913, 2018.[41] E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman,“Multicolor localization microscopy and point-spread-function engineering by deep learning,”
Optics express ,vol. 27, no. 5, pp. 6158–6183, 2019.[42] G. Dardikman-Yoffe and Y. C. Eldar, “Learned spar-com: Unfolded deep super-resolution microscopy,” arXivpreprint arXiv:2004.09270 , 2020.[43] L. Möckl, P. N. Petrov, and W. Moerner, “Accurate phaseretrieval of complex 3d point spread functions with deepresidual neural networks,”
Applied Physics Letters , vol. 115,no. 25, p. 251106, 2019.[44] L. Möckl, A. R. Roy, P. N. Petrov, and W. Moerner, “Accu-rate and rapid background estimation in single-moleculelocalization microscopy using the deep neural networkbgnet,”
Proceedings of the National Academy of Sciences ,vol. 117, no. 1, pp. 60–67, 2020.[45] D. Saha, U. Schmidt, Q. Zhang, A. Barbotin, Q. Hu, N. Ji,M. J. Booth, M. Weigert, and E. W. Myers, “Practical sen-sorless aberration estimation for 3d microscopy with deeplearning,”
Optics Express, vol. 28, no. 20, pp. 29044–29053, 2020. [46] A. Shajkofci and M. Liebling, “Spatially-variant cnn-based point spread function estimation for blind deconvolution and depth estimation in optical microscopy,”
IEEE Trans-actions on Image Processing , vol. 29, pp. 5848–5861, 2020.[47] H. Gupta, M. T. McCann, L. Donati, and M. Unser, “Cryo-gan: A new reconstruction paradigm for single-particlecryo-em via deep adversarial learning,”
BioRxiv , 2020.[48] C. Belthangady and L. A. Royer, “Applications, promises,and pitfalls of deep learning for fluorescence image recon-struction,”
Nature methods , pp. 1–11, 2019.[49] A. Chakrabarti, “Learning sensor multiplexing designthrough back-propagation,” in
Advances in Neural Informa-tion Processing Systems , pp. 3081–3089, 2016.[50] E. Schwartz, R. Giryes, and A. M. Bronstein, “Deepisp:Toward learning an end-to-end image processing pipeline,”
IEEE Transactions on Image Processing , vol. 28, no. 2, pp. 912–923, 2018.[51] A. Turpin, I. Vishniakou, and J. d Seelig, “Light scatteringcontrol in transmission and reflection with neural net-works,”
Optics express , vol. 26, no. 23, pp. 30911–30929,2018.[52] S. Elmalem, R. Giryes, and E. Marom, “Learned phasecoded aperture for the benefit of depth of field extension,”
Optics express , vol. 26, no. 12, pp. 15316–15331, 2018.[53] V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd,W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end op-timization of optics and image processing for achromaticextended depth of field and super-resolution imaging,”
ACM Transactions on Graphics (TOG) , vol. 37, no. 4, p. 114,2018.[54] U. Akpinar, E. Sahin, and A. Gotchev, “Learning wave-front coding for extended depth of field imaging,” arXivpreprint arXiv:1912.13423 , 2019.[55] Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan,and A. Veeraraghavan, “Phasecam3d—learning phasemasks for passive single view depth estimation,” in , pp. 1–12, IEEE, 2019.[56] J. Chang and G. Wetzstein, “Deep optics for monoculardepth estimation and 3d object detection,” in
Proceedingsof the IEEE International Conference on Computer Vision ,pp. 10193–10202, 2019.[57] C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deepoptics for single-shot high-dynamic-range imaging,” in
Proceedings of the IEEE/CVF Conference on Computer Visionand Pattern Recognition , pp. 1375–1385, 2020.[58] Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learn-ing rank-1 diffractive optics for single-shot high dynamicrange imaging,” in
Proceedings of the IEEE/CVF Conferenceon Computer Vision and Pattern Recognition , pp. 1386–1396,2020.[59] X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, andY. Peng, “Learned rotationally symmetric diffractive achro-mat for full-spectrum computational imaging,”
Optica ,vol. 7, no. 8, pp. 913–922, 2020.[60] S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wet-zstein, and M. H. Kim, “End-to-end hyperspectral-depthimaging with learned diffractive optics,” arXiv preprintarXiv:2009.00463 , 2020.[61] M. Kellman, E. Bostan, M. Chen, and L. Waller, “Data-driven design for fourier ptychographic microscopy,” in , pp. 1–8, IEEE, 2019.[62] H. Pinkard, H. Baghdassarian, A. Mujal, E. Roberts, K. H.Hu, D. H. Friedman, I. Malenica, T. Shagam, A. Fries,K. Corbin, et al. , “Learned adaptive multiphoton illumina-tion microscopy,” bioRxiv , 2020.[63] M. D. Lew, S. F. Lee, M. Badieirostami, and W. Mo-erner, “Corkscrew point spread function for far-field three-dimensional nanoscale localization of pointlike objects,”
Optics letters , vol. 36, no. 2, pp. 202–204, 2011.[64] M. P. Backlund, M. D. Lew, A. S. Backer, S. J. Sahl,G. Grover, A. Agrawal, R. Piestun, and W. Moerner, “Si-multaneous, accurate measurement of the 3d position andorientation of single molecules,”
Proceedings of the NationalAcademy of Sciences , vol. 109, no. 47, pp. 19087–19092, 2012.[65] C. Roider, A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Ax-ial super-localisation using rotating point spread functionsshaped by polarisation-dependent phase modulation,”
Op-tics express , vol. 22, no. 4, pp. 4029–4037, 2014.[66] J. Min, S. J. Holden, L. Carlini, M. Unser, S. Manley, andJ. C. Ye, “3d high-density localization microscopy using hy-brid astigmatic/biplane imaging and sparse image recon-struction,”
Biomedical optics express , vol. 5, no. 11, pp. 3935–3948, 2014.[67] C. Cabriel, N. Bourg, P. Jouchet, G. Dupuis, C. Leterrier,A. Baron, M.-A. Badet-Denisot, B. Vauzeilles, E. Fort, andS. Leveque-Fort, “Combining 3d single molecule local-ization strategies for reproducible bioimaging,”
Naturecommunications , vol. 10, no. 1, pp. 1–10, 2019.[68] N. Bourg, C. Mayet, G. Dupuis, T. Barroca, P. Bon, S. Lécart,E. Fort, and S. Lévêque-Fort, “Direct optical nanoscopywith axially localized detection,”
Nature Photonics , vol. 9,no. 9, pp. 587–593, 2015.[69] J. W. Goodman,
Introduction to Fourier optics . Roberts andCompany Publishers, 2005.[70] E. Ben-Eliezer, E. Marom, N. Konforti, and Z. Zalevsky,“Experimental realization of an imaging system with anextended depth of field,”
Applied Optics , vol. 44, no. 14,pp. 2792–2798, 2005.[71] E. R. Dowski and W. T. Cathey, “Extended depth of fieldthrough wave-front coding,”
Applied optics , vol. 34, no. 11,pp. 1859–1866, 1995.[72] B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf,and Y. Shechtman, “Vipr: Vectorial implementation ofphase retrieval for fast and accurate microscopic pixel-wise pupil estimation,”
Optics Express , vol. 28, no. 7,pp. 10179–10198, 2020.[73] S. M. Kay,
Fundamentals of statistical signal processing . Pren-tice Hall PTR, 1993.[74] R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracyin single-molecule microscopy,”
Biophysical journal , vol. 86,no. 2, pp. 1185–1200, 2004.[75] L. E. Weiss, T. Naor, and Y. Shechtman, “Observing dnain live cells,”
Biochemical Society Transactions , vol. 46, no. 3,pp. 729–740, 2018.[76] I. Bronshtein, E. Kepten, I. Kanter, S. Berezin, M. Lindner,A. B. Redwood, S. Mai, S. Gonzalo, R. Foisner, Y. Shav-Tal, et al. , “Loss of lamin a function increases chromatindynamics in the nuclear interior,”
Nature communications ,vol. 6, p. 8044, 2015.[77] L. E. Weiss, Y. S. Ezra, S. Goldberg, B. Ferdman, O. Adir,A. Schroeder, O. Alalouf, and Y. Shechtman, “Three-dimensional localization microscopy in live flowing cells,”
Nature Nanotechnology, pp. 1–7, 2020. [78] Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. Moerner, “Multicolour localization microscopy by point-spread-function engineering,”
Nature photonics ,vol. 10, no. 9, p. 590, 2016.[79] Y. Y. Schechner, R. Piestun, and J. Shamir, “Wave propaga-tion with rotating intensity distributions,”
Physical ReviewE , vol. 54, no. 1, p. R50, 1996.[80] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Imageand depth from a conventional camera with a coded aper-ture,”
ACM transactions on graphics (TOG) , vol. 26, no. 3,pp. 70–es, 2007.[81] F. Balzarotti, Y. Eilers, K. C. Gwosch, A. H. Gynnå, V. West-phal, F. D. Stefani, J. Elf, and S. W. Hell, “Nanometerresolution imaging and tracking of fluorescent moleculeswith minimal photon fluxes,”
Science , vol. 355, no. 6325,pp. 606–612, 2017.[82] K. C. Gwosch, J. K. Pape, F. Balzarotti, P. Hoess, J. Ellen-berg, J. Ries, and S. W. Hell, “Minflux nanoscopy delivers3d multicolor nanometer resolution in cells,”
Nature meth-ods , vol. 17, no. 2, pp. 217–224, 2020.[83] M. J. Amin, S. Petry, J. W. Shaevitz, and H. Yang, “Local-ization precision in chromatic multifocal imaging,” arXivpreprint arXiv:2008.10488 , 2020.[84] H. Ikoma, Y. Peng, M. Broxton, and G. Wetzstein, “Snap-shot multi-psf 3d single-molecule localization microscopyusing deep learning,” in
Computational Optical Sensing andImaging , pp. CW3B–3, Optical Society of America, 2020.[85] W. Ouyang, F. Mueller, M. Hjelmare, E. Lundberg, andC. Zimmer, “Imjoy: an open-source computational plat-form for the deep learning era,”
Nature methods , vol. 16,no. 12, pp. 1199–1200, 2019.[86] E. Gómez-de Mariscal, C. García-López-de Haro, L. Do-nati, M. Unser, A. Muñoz-Barrutia, and D. Sage, “Deepim-agej: A user-friendly plugin to run deep learning modelsin imagej,” bioRxiv , p. 799270, 2019.[87] L. Von Chamier, J. Jukkala, C. Spahn, M. Lerche,S. Hernández-Pérez, P. Mattila, E. Karinou, S. Holden,A. C. Solak, A. Krull, et al. , “Zerocostdl4mic: an openplatform to simplify access and use of deep-learning inmicroscopy,”
BioRxiv , 2020.[88] C. Zhou, S. Lin, and S. Nayar, “Coded aperture pairs fordepth from defocus,” in , pp. 325–332, IEEE, 2009.[89] C. Zhou and S. Nayar, “What are good apertures for defo-cus deblurring?,” in , pp. 1–8, IEEE, 2009.[90] C. Zhou, S. Lin, and S. K. Nayar, “Coded aperture pairs fordepth from defocus and defocus deblurring,”
Internationaljournal of computer vision , vol. 93, no. 1, pp. 53–72, 2011.[91] A. Levin, “Analyzing depth from coded aperture sets,”in
European Conference on Computer Vision , pp. 214–227,Springer, 2010.[92] Y. Takeda, S. Hiura, and K. Sato, “Fusing depth from defo-cus and stereo with coded apertures,” in
Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition ,pp. 209–216, 2013.[93] Y. Gil, S. Elmalem, H. Haim, E. Marom, and R. Giryes,“Monster: Awakening the mono in stereo,” arXiv preprintarXiv:1910.13708 , 2019.[94] S. Hell, G. Reiner, C. Cremer, and E. H. Stelzer, “Aberra-tions in confocal fluorescence microscopy induced by mis-matches in refractive index,”
Journal of microscopy , vol. 169,no. 3, pp. 391–405, 1993. [95] D. Axelrod, “Fluorescence excitation and imaging of sin-gle molecules near dielectric-coated and bare surfaces: atheoretical study,”
Journal of microscopy , vol. 247, no. 2,pp. 147–160, 2012.[96] D. P. Kingma and M. Welling, “Auto-encoding variationalbayes,” arXiv preprint arXiv:1312.6114 , 2013.[97] P. Dufour, M. Piché, Y. De Koninck, and N. McCarthy,“Two-photon excitation fluorescence microscopy with ahigh depth of field using an axicon,”
Applied optics , vol. 45,no. 36, pp. 9246–9252, 2006.[98] D. P. Kingma and J. Ba, “Adam: A method for stochasticoptimization,” arXiv preprint arXiv:1412.6980 , 2014.[99] L. Xiao and T. Zhang, “A proximal stochastic gradientmethod with progressive variance reduction,”
SIAM Jour-nal on Optimization , vol. 24, no. 4, pp. 2057–2075, 2014.[100] T. Dozat, “Incorporating nesterov momentum into adam,”2016.[101] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han,“On the variance of the adaptive learning rate and beyond,” arXiv preprint arXiv:1908.03265 , 2019.[102] S. Gopal, “Adaptive sampling for sgd by exploiting sideinformation,” in
International Conference on Machine Learn-ing , pp. 364–372, 2016.[103] S. Moser, M. Ritsch-Marte, and G. Thalhammer, “Model-based compensation of pixel crosstalk in liquid crystalspatial light modulators,”
Optics express , vol. 27, no. 18,pp. 25046–25063, 2019.[104] A. LeNail, “Nn-svg: Publication-ready neural networkarchitecture schematics,”
Journal of Open Source Software ,vol. 4, no. 33, p. 747, 2019.[105] F. Yu and V. Koltun, “Multi-scale context aggregation by di-lated convolutions,” arXiv preprint arXiv:1511.07122 , 2015.[106] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learn-ing for image recognition,” in
Proceedings of the IEEE confer-ence on computer vision and pattern recognition , pp. 770–778,2016.[107] A. Odena, V. Dumoulin, and C. Olah, “Deconvolution andcheckerboard artifacts,”
Distill , vol. 1, no. 10, p. e3, 2016.[108] M. Jaderberg, K. Simonyan, A. Zisserman, et al. , “Spatialtransformer networks,” in
Advances in neural informationprocessing systems , pp. 2017–2025, 2015.[109] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. De-Vito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Au-tomatic differentiation in pytorch,” 2017.[110] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. , “A density-based algorithm for discovering clusters in large spatialdatabases with noise.,” in
Kdd , vol. 96, pp. 226–231, 1996.[111] J.-Y. Tinevez, N. Perry, J. Schindelin, G. M. Hoopes, G. D.Reynolds, E. Laplantine, S. Y. Bednarek, S. L. Shorte, andK. W. Eliceiri, “Trackmate: An open and extensible plat-form for single-particle tracking,”
Methods , vol. 115, pp. 80–90, 2017.[112] M. Siemons, C. Hulleman, R. Thorsen, C. Smith, andS. Stallinga, “High precision wavefront control in pointspread function engineering for single emitter localiza-tion,”
Optics express , vol. 26, no. 7, pp. 8397–8416, 2018.[113] M. Ovesn`y, P. Kˇrížek, J. Borkovec, Z. Švindrych, and G. M.Hagen, “Thunderstorm: a comprehensive imagej plug-infor palm and storm data analysis and super-resolutionimaging,”