Artefact removal in ground truth and noise model deficient sub-cellular nanoscopy images using auto-encoder deep learning

Suyog Jadhav, Sebastian Acuña, Krishna Agarwal, and Dilip K. Prasad

Indian Institute of Technology (Indian School of Mines), Dhanbad 826004, India
Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
*[email protected]

Abstract:
Image denoising or artefact removal using deep learning is possible in the availability of supervised training datasets acquired in real experiments or synthesized using known noise models. Neither of these conditions can be fulfilled for nanoscopy (super-resolution optical microscopy) images that are generated from microscopy videos through statistical analysis techniques. Due to several physical constraints, a supervised dataset cannot be measured. Due to non-linear spatio-temporal mixing of data and the valuable statistics of fluctuations from fluorescent molecules, which compete with noise statistics, noise or artefact models in nanoscopy images cannot be explicitly learnt. Therefore, such a problem poses unprecedented challenges to deep learning. Here, we propose a robust and versatile simulation-supervised training approach of deep learning auto-encoder architectures for the highly challenging nanoscopy images of sub-cellular structures inside biological samples. We show a proof of concept for one nanoscopy method and investigate the scope of generalizability across structures, noise models, and nanoscopy algorithms not included during simulation-supervised training. We also investigate a variety of loss functions and learning models and discuss the limitations of existing performance metrics for nanoscopy images. We generate valuable insights for this highly challenging and unsolved problem in nanoscopy, and set the foundation for the application of deep learning in nanoscopy for life sciences.
1. Introduction
This article addresses the problem of artefact removal, a form of denoising problem, in nanoscopy (i.e. super-resolved optical microscopy) images obtained using computational nanoscopy techniques [1–8] used for sub-cellular imaging of biological cells. These techniques take in a high speed fluorescence microscopy video (also called the raw microscopy data or raw microscopy image stack) comprising 10s to 100s of frames, which are fast enough to capture the fluctuations arising from the photokinetic nature of fluorescent emitters (referred to as emitters for simplicity) [9, 10] used to label a sample, and perform statistical analysis of these fluctuations to construct super-resolved images. We refer to these methods as fluctuations based nanoscopy methods (FNMs). As reported in [11], artefacts appear to be an unavoidable feature of FNMs because of the nonlinear statistical analysis tools used as the backbone. Fig. 1 shows examples of two different sub-cellular structures, the average image of the raw stack (referred to as the diffraction limited image), and the nanoscopy images generated by the multiple signal classification algorithm (MUSICAL [3], an example FNM) using noisy and noise-free raw microscopy data. Being less phototoxic and more live-cell compatible, FNMs are highly desirable for conducting nanoscale studies in life sciences, but the artefacts in nanoscopy images can interfere in deriving correct inferences. Therefore, suppressing artefacts in them is an endeavour of significant impact. Artefact removal in the nanoscopy images can be considered as a version of the denoising problem in the sense that artefacts are associated with the noise characteristics and have to be removed from the image, similar to the need for removal of noise in denoising. An important deviation must be noted though. Artefacts may not be completely stochastic in nature, as opposed to general noise distributions, since they encode the stochastic parameters of noise and photokinetics as well as the systematic distortion introduced by the microscope or algorithm. Nonetheless, for simplicity of reference, we use the terms noisy, noise-free, and denoising for artefact-ridden, artefact-free, and artefact removal respectively. Also, unless specified otherwise, these terms apply in the context of the processed nanoscopy images rather than the raw microscopy image stacks used for generating the nanoscopy images.

Fig. 1. A side by side view of noisy (SBR 3) and ideal MUSICAL reconstructions (simulated) is presented. The top row shows an example of mitochondria while the bottom row shows an example of vesicles. In some cases, as shown in (c,d), artefacts can suppress resolvability of features in addition to contributing background debris. In other cases, they may compromise the sharpness of certain structures (yellow arrows in g) and reduce optical sectioning by reconstructing out-of-focus structures (as shown with red arrows in g). Number of frames used to generate nanoscopy images: 200. Diffraction limited image (mean image of all frames) is abbreviated as Diff. Lim. Scale bar: 500 nm in 3D plot and 1 µm in [b-d] and [f-h].

Deep learning based denoising of signals and images has gained quite some traction in recent times [12–17], even for denoising microscopy images [18]. The availability of a large supervised training dataset is an inherent assumption in deep learning, which is often difficult to fulfil since generating pairs of noisy and noise-free images using the same sensor is not possible. Therefore, the noise model is assumed to be known and is used to create synthetic supervised training datasets. Additive white Gaussian noise is often assumed [12, 13], which provided state-of-the-art denoising performance when released.
However, another contemporary study indicated that traditional methods work better in most real scenarios of noisy images [14]. This is a prime reason for using traditional approaches such as feature based reconstruction for microscopy images even in recent times [19–23]. This led to the appreciation that synthesizing the right noise model is a key to quality denoising [15, 16, 21].

In microscopy data, there are two major sources of stochastic noise, namely the shot noise (Poisson distribution) arising from the photon scattering behaviour and the electronic noise of cameras, whose noise models depend on the type of scientific camera [20, 23]. There are other systematic sources of artefacts, such as camera drift, microscope aberrations, and occasionally dead and hot pixels. Systematic artefacts due to these sources can be greatly reduced or even completely removed by changing or upgrading the microscope hardware. In some cases, the effect of electronic noise and shot noise can be reduced by using extremely high light doses for non-fluorescent microscopes, such as used for creating a supervised microscopy dataset in [24]. However, neither creating a supervised training dataset nor modeling the noise or artefacts is an option in FNMs, due to multiple reasons, as explained next. For creating an experimentally measured supervised training dataset, a pair of identical raw microscopy data should be measured, except that one is noisy while the other is noise-free. This is possible to some extent for individual fluorescence microscopy images [25, 26], but impossible for videos due to the following:

• Fluorescence bleaching effect:
Unlike [24, 25], where a high dose of light was used to generate low noise images for emulating noise-free raw data, using a high light dose for getting the equivalent raw noise-free videos bleaches the fluorescent samples and instead introduces a fast decreasing intensity effect which does not match the low light dose image.

• Impossible to replicate the fluctuation statistics: The emission of photons from fluorescent molecules is a stochastic process. Therefore, it is impossible to replicate the temporally precise series of emissions between the two sets of measurements. Further, the averaging approach over multiple frames considered in [26] cannot be used, since it modifies the manifestation of fluctuations in the averaged stack and also does not match the temporal rate of the noisy image stack.

Further, generating a noise or artefact model is not possible for FNMs. Each image in the raw microscopy image stack itself is a linear map of the microscope's transfer function (conventionally called the point spread function in microscopy) convolved with the emitter locations and weighted by the number of photons emitted by these emitters during that frame as a consequence of their photokinetics. The point spread function (PSF) and the image characteristics encode optical properties such as numerical aperture, wavelength of fluorescence, and camera pixel size. Known camera-specific noise models also apply to individual images. But, as FNMs perform spatio-temporal mixing through statistical analysis and generate non-linear functions that indicate the presence of emitters, the fluctuations, noise, and microscope parameters all get non-linearly mapped into the nanoscopy images. These mappings have a strong dependence upon the spatio-temporal density of photon emissions, the light dose and its temporally non-linear effect on both photokinetics and noise level, the PSF, the statistical techniques, and the control parameters used for the FNMs, as discussed for example for the super-resolution optical fluctuation imaging (SOFI) method [27], often resulting in custom artefacts of a complicated nature. For example, in Fig.
1, the artefacts for the mitochondria example appear in the form of background debris and compromised resolvability of the foreground structures, while the artefacts in the vesicles example are in the form of blurred edges, loss of certain details, and non-negligible visibility of out-of-focus structures. Furthermore, a wide range of fluctuation statistics, densities of emitters, and microscopes may be encountered in practice, making it difficult to learn a generally valid artefact model for the chosen FNM. Therefore, learning or generating noise models is also not practically feasible. In brief, this particular artefact removal problem is not only ground truth deficient but also deficient of a noise or artefact model. Therefore, there is no appropriate guide for creating experimental or synthetic supervised training datasets in the conventional sense. At the same time, the complex nature of the artefacts indicates the need for black box deep learning approaches such as autoencoders [28], which mandate supervised training datasets. One potential solution is to first denoise the individual microscopy images using a suitable microscopy image denoising approach and then pass the denoised raw microscopy data to the FNM. However, such an operation introduces non-linear computational distortion in the raw microscopy data, both to the noise components and the fluorescence fluctuations component. This distortion renders the denoised microscopy image stack unsuitable for FNMs.

Fig. 2. The proposed approach of artefact removal is illustrated here. The asterisk (*) shown in block C1 indicates the low-dimensional latent feature space of the autoencoder, suitable for representing feature-deficient microscopy and nanoscopy images. Relevant details of the labeled blocks appear in section 2.

In view of these compound challenges, we take an unconventional route towards deep learning, namely a simulation-supervised training dataset. We create the pair of noisy and noise-free nanoscopy images by simulating two exactly identical raw microscopy stacks, except that one is passed through a noise engine and the other is not. A key to the success of simulation-supervised deep learning is the ability of the simulation engine to simulate the real scenarios and the corresponding ground truth. It becomes further paramount in problems that have implications on scientific inference. Therefore, each physical phenomenon underlying the raw microscopy video generation has been simulated to the needed accuracy. Examples include simulating a practical range of photokinetics and even including the glass coverslip used to cover the sample in our simulation engine. In order to obtain an artefact removal approach that is robust and versatile across a wide variety of situations, structures, and microscope parameters, we consider a range of simulation parameters. We note that simulation-supervised approaches are not necessarily new in the microscopy domain [29–31], but the development of simulation engines that can create customized problem- and microscopy-modality-specific as well as physically loyal data is a fairly recent practice. Specifically, a simulation-supervised deep learning approach for 'nanoscopy' denoising for sub-cellular structures is used for the first time. Here, we demonstrate artefact removal for one candidate FNM, namely the multiple signal classification algorithm (MUSICAL) [3], although the concept is generalizable to any FNM. Besides showing versatility in denoising images with sub-cellular structures and structural densities for which training has not been performed, we also show that the method is robust to noise models not simulated in the raw microscopy data. We attribute this robustness both to the diversity of other conditions in the simulated dataset and to the fact that the artefacts created by FNMs are related to the noise and other stochastic parameters in a complex manner, such that the manifestation of only the distribution of noise cannot be singled out.
The highlight of our results is the quality of artefact removal on actual experimental public data of sub-cellular structures, including in fixed and living cells.

The outline of the paper is as follows. Section 2 presents the proposed method while section 3 presents diverse validation and experimental results. Discussion and insights are presented in section 4. Section 5 concludes the work.

2. Proposed approach
Our approach is shown in Fig. 2. We first create a simulated training dataset as illustrated in blocks A and B of Fig. 2. For this, two sets of raw microscopy image stacks are first simulated using precisely the same physical characteristics, with the exception that one raw microscopy image stack is noisy, since it is generated by passing the noise-free raw microscopy image stack through a noise simulator. Both raw microscopy image stacks are individually processed using MUSICAL to obtain corresponding noisy and noise-free nanoscopy images. Since the simulated dataset is used for training, it is imperative for it to emulate the relevant aspects of reality as closely as possible while retaining enough diversity across the simulated conditions. Several thousands of such pairs are generated and used as the supervised training dataset for the autoencoder. Then, through a good choice of autoencoder architecture and loss function, the autoencoder is trained for denoising the nanoscopy images. In the test phase or actual field use, a raw microscopy image stack obtained from a real microscopy experiment is processed through the MUSICAL algorithm to obtain a noisy nanoscopy image, which is then passed through the trained autoencoder to generate the corresponding noise-free nanoscopy image. We discuss the details of the various blocks in the subsequent sub-sections.
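The pairing logic of blocks A and B can be written as a minimal sketch; this is not the authors' code, and the function name and the stand-in `add_noise`/`musical` callables are purely illustrative:

```python
def make_training_pair(clean_stack, add_noise, musical):
    """Blocks A-B sketch: one simulated raw stack yields one (noisy, noise-free)
    nanoscopy image pair. The noisy member comes from the same clean stack
    passed through the noise engine; both stacks go through the same
    reconstruction (`musical` is a stand-in for the MUSICAL algorithm)."""
    noisy_stack = add_noise(clean_stack)
    return musical(noisy_stack), musical(clean_stack)  # (input, target)

# Toy usage with trivial stand-ins: "noise" adds 1 to every frame value and
# "MUSICAL" is replaced by a simple average over frames.
pair = make_training_pair([1.0, 2.0, 3.0],
                          lambda s: [x + 1.0 for x in s],
                          lambda s: sum(s) / len(s))
```

The essential point encoded here is that the noise engine is applied to the raw stack before reconstruction, never to the reconstructed nanoscopy image.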
A1: Sample simulator − The concept is that the shape and size hypotheses created by prior studies are used to simulate sample geometries. In this work, we consider three types of sub-cellular structures, namely actin filaments [32–34], mitochondria [35–37], and vesicles [38–40]. However, the setup is easily scalable to include other types of sub-cellular structures. First, the 3D geometries of the structures are simulated. Then, the positions of fluorescent molecules (called emitters for simplicity) are stochastically generated as labeling the structures.

For simulating an actin filament, a 3D smooth curve is created by selecting a certain number of spline control points and then fitting a spline through them. The number of control points is selected randomly from the range [ , ]. The maximum length allowed for a filament is kept at 5 µm. The emitters are placed randomly along the length of the spline curve with a linear density of 100 emitters/µm. This is based on two assumptions. First, the periodicity of binding sites in actin is 5-7 nm. Second, the labeling efficiency is never 100%. Assuming 30-50% labeling efficiency, the selected emitter density is reasonable.

For simulating a single mitochondrion, first a spline similar to the actin filament is considered. Then, a curvilinear cylinder of radius 150 nm is fit over it by convolving a cylinder of the chosen radius and height 1 nm over the spline. The selected diameter of 300 nm is close to the diffraction limit of most microscopes, so its outer membrane label is not distinguishable in raw microscopy data with noise, but is expected to be reconstructed as an outer boundary by a nanoscopy method. Further, as seen in Fig. 1, under significant noise, the membrane boundary may not be explicitly reconstructed. So, we consider this radius as a borderline situation of failure of MUSICAL under noise. However, other ranges of diameters may be included in the future.
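The emitter placement along a simulated filament can be sketched as follows. This is an illustrative assumption-laden sketch, not the paper's implementation: a polyline stands in for the fitted spline, and the emitter count is drawn from a Poisson law with mean equal to density times curve length.

```python
import numpy as np

def place_emitters_on_curve(points, density_per_um, rng=None):
    """Scatter emitters along a 3D curve (e.g., a simulated actin filament).

    points: (K, 3) array of curve vertices in micrometres (polyline stand-in
            for the spline used in the paper).
    density_per_um: linear labeling density, e.g. 100 emitters/um.
    Returns an (N, 3) array of emitter positions, uniform in arc length.
    """
    rng = np.random.default_rng() if rng is None else rng
    seg = np.diff(points, axis=0)                    # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    length = seg_len.sum()
    n = max(1, rng.poisson(density_per_um * length))  # emitter count ~ Poisson
    # Draw arc-length positions, then map each onto its segment.
    s = np.sort(rng.uniform(0.0, length, n))
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    idx = np.clip(np.searchsorted(cum, s, side="right") - 1, 0, len(seg) - 1)
    frac = (s - cum[idx]) / seg_len[idx]
    return points[idx] + frac[:, None] * seg[idx]

# A 2 um long L-shaped "filament" labeled at 100 emitters/um.
curve = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
emitters = place_emitters_on_curve(curve, density_per_um=100,
                                   rng=np.random.default_rng(0))
```

Surface labeling for mitochondria and vesicles follows the same pattern, with the Poisson mean taken as surface density times area and the positions drawn uniformly on the surface instead of along the curve.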
After constructing the geometry, the emitters are distributed randomly on the surface of the geometry with an emitter surface density of 500 emitters/µm². This emulates the outer membrane label of mitochondria. The emitter density is chosen heuristically based on expert input.

A vesicle is simulated as a sphere of radius randomly chosen from the range [25,500] nm. The emitters are distributed on its surface with an emitter density of 2000 emitters/µm², chosen heuristically. The surface labeling emulates the membrane of vesicles.

There may be multiple instances of a structure in an image region; however, only one type of structure is expected in one fluorescent color channel. Therefore, we simulate multiple actin filaments, multiple mitochondria, or multiple vesicles in each example. The number of them in a single image is chosen randomly from the range [3,10], [1,4], and [10,30] for actin filaments, mitochondria, and vesicles respectively. We impose some boundaries on the 3D space in which the sample may be present. These are x, y ∈ [ , ] µm and z ∈ [ , ] nm.

A2: Photokinetics simulator − In reality, there are multiple distributions associated with the emission of photons when the fluorescent molecule is active, the fluorescent molecule entering, dwelling, and exiting the dark states, photobleaching, etc. [41–43]. However, at image acquisition rates of milliseconds to seconds, the need for knowing and simulating individual distributions is obviated, and simpler probability distributions can be used to represent the macro-behaviour of fluctuations in photon emissions arising from photokinetics. This simplification may not apply if specific dyes with long dark states are used, but this is neither a requirement of fluctuations based nanoscopy techniques nor the regime in which they provide a particular advantage over localization based methods [44, 45]. Therefore, we use the simpler photokinetic model based on the implementation of [46].
In this model, a single emitter is characterized with a 2-state model. The states are simply called on, in which the molecule is producing photons, and off, in which case no photons are emitted. The time the emitter stays in each state is modeled with an exponential distribution controlled by two parameters called τ_on and τ_off. These correspond to the mean time the emitter spends in each state. The emission rate of photons is considered constant and, therefore, the number of photons emitted while the emitter is in the on state is simply the rate multiplied by the total time. As a result, the duty cycle is τ_on/(τ_on + τ_off). All emitters are considered identical and, therefore, all of them in a sample share the same values of τ_on and τ_off.

In order to emulate a range of photokinetic behaviour, we choose the values of τ_on and τ_off as integers taken from the ranges [ , ] and [ , ], respectively. It is of interest to observe that the pair (τ_on, τ_off) having a value (5,1) indicates extremely dense fluctuations, i.e. an extremely challenging condition for fluctuation based nanoscopy techniques where they do not provide significant resolution enhancement. On the other hand, the pair having value ( , ) is a conducive regime for such techniques.

A3: Microscope simulator − The imaging function of the microscope is simulated using the Gibson-Lanni model of the point spread function (PSF) [47]. We use a fast implementation of the Gibson-Lanni PSF reported in [48]. The PSF simulates the blurring introduced by the optics of the microscope as the light passes through the coverslip and microscope optics to the image region where the camera is placed.

Among the various parameters needed for simulating the Gibson-Lanni PSF, the following were kept constant for the setup. The sample is assumed to be mounted on a glass surface (such as a slide) and present in water medium. A glass coverslip of 170 µm is assumed to be present between the sample and the microscope optics.
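Stepping back to block A2, the two-state photokinetic model can be sketched as below. This is a simplified illustration under our own assumptions (exponential dwell times, constant on-state photon rate, deterministic photon count per on-interval); the frame-binning details and all names are illustrative, not the implementation of [46]:

```python
import numpy as np

def emitter_photons_per_frame(n_frames, frame_time, tau_on, tau_off,
                              photon_rate, rng=None):
    """Two-state (on/off) photokinetics sketch.

    Dwell times in each state are exponential with means tau_on / tau_off
    (same time units as frame_time). While 'on', photons are emitted at a
    constant rate, so the photons in a frame are rate times the on-time
    overlapping that frame. Returns a length-n_frames array of photon counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = n_frames * frame_time
    photons = np.zeros(n_frames)
    t = 0.0
    # Initial state drawn with probability equal to the duty cycle.
    on = rng.random() < tau_on / (tau_on + tau_off)
    while t < total:
        dwell = rng.exponential(tau_on if on else tau_off)
        if on:
            a, b = t, min(t + dwell, total)
            first = int(a // frame_time)
            last = min(int(b // frame_time), n_frames - 1)
            for f in range(first, last + 1):
                overlap = min(b, (f + 1) * frame_time) - max(a, f * frame_time)
                photons[f] += photon_rate * max(overlap, 0.0)
        t += dwell
        on = not on
    return photons

trace = emitter_photons_per_frame(200, 0.05, tau_on=1.0, tau_off=1.0,
                                  photon_rate=1e4, rng=np.random.default_rng(2))
```

With tau_on = tau_off the duty cycle is 0.5; no frame can exceed photon_rate × frame_time photons, since an emitter cannot be on for longer than the frame itself.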
The numerical aperture (NA) of the system is selected randomly from the range [ , ]. For simulation purposes, the emission wavelength of the emitters is assumed to be 660 nm. In practice, the emission wavelength is a characteristic of the fluorescent dye chosen for the experiment for a particular type of structure and is generally in the range [ , ] nm for visible range fluorescent dyes. However, the manifestation of the wavelength is in terms of achievable resolution and the spread of the PSF. The same effect can be achieved by varying the NA of the microscope. Therefore, choosing a fixed wavelength but a sufficiently large span of NA allows us to simultaneously consider a variety of microscopes and dyes without loss of generalization. Since the PSF is computed in the image region to construct the image of an emitter, the camera's pixel size is also needed. The camera pixel size in terms of the sample dimensions is computed by dividing the actual hardware pixel size of the camera by the magnification of the microscope. We consider pixel sizes in sample dimensions directly and select candidate values most popularly encountered in high NA microscopy systems. Four different pixel sizes were considered for simulation (65, 80, 108 and 120 nm), each pixel size setting used for exactly one quarter of the total number of samples simulated for each type of structure.

A4: Noise simulator − The noise simulation approach is taken from [3, 49]. There are two main sources of noise. The first is the camera's electronic noise, which contributes a noisy background in the image. The second is photon noise, which is based on Poisson statistics of the arrival of photons at the expected location. Let the simulated microscopy image, scaled to span [0,1], be denoted as I. Moreover, let the signal to noise ratio be SNR and the measured background value in the camera with closed shutter be b.
First, a microscopy image of the expected signal strength (such as observed in the microscopy data) and having a constant background b is simulated as Î = b(SNR − 1)I + b. Then, the noisy microscopy image Ĩ is generated such that each pixel in Ĩ is drawn from a Poisson distribution with mean equal to the corresponding pixel in Î.

With the ∼ ms exposure times used in FNMs, the electronic noise is significantly stronger than the photon noise. In such a situation, the signal to background ratio (SBR) is a practical measure of noise. The original article of MUSICAL reports super-resolution for SBR ≥ 3. Therefore, we simulate our dataset with the lowest SBR (i.e. the highest level of noise) recommended for MUSICAL. Furthermore, we noted that a large variety of cameras have background noise in the range [ , ] on a 16 bit intensity scale, depending upon the type of camera, the imaging speed, the cooling system, and other usage factors. We used a constant value b = 100 in our simulations.

A total of 3000 pairs of noise-free and noisy image stacks were created, each stack containing 200 frames. Among them, 1000 pairs were simulated each for actin filaments, mitochondria, and vesicles. 75% of the pairs were used for training and 25% were used for testing. The selection was performed randomly.
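The noise engine of block A4 can be sketched as below: scale the clean image so that its peak sits at SBR times the constant background b, then draw each pixel from a Poisson distribution. The function interface is illustrative, and we write the scaling with SBR in place of SNR since the text treats SBR as the practical measure here:

```python
import numpy as np

def add_noise(image, sbr, background, rng=None):
    """Noise engine sketch.

    image: clean frame (assumed non-empty); it is rescaled to span [0, 1].
    sbr: target signal-to-background ratio (peak / background).
    background: constant camera background level b.
    Each pixel of the returned frame is Poisson with mean b*(SBR-1)*I + b.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = image / image.max()                               # span [0, 1]
    expected = background * (sbr - 1.0) * image + background  # Î = b(SBR-1)I + b
    return rng.poisson(expected).astype(float)

# One bright emitter pixel on an otherwise dark frame, at SBR 3 and b = 100:
clean = np.zeros((8, 8))
clean[4, 4] = 1.0
noisy = add_noise(clean, sbr=3.0, background=100.0, rng=np.random.default_rng(0))
```

The background pixels then fluctuate around 100 counts while the emitter pixel fluctuates around 300 counts, i.e. the SBR-3 regime used for the training data.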
For each pair of raw microscopy data, MUSICAL is applied independently on the noise-free and noisy raw microscopy image stacks to obtain one pair of training data. Here, we explain MUSICAL and the MUSICAL parameters.

MUSICAL achieves super-resolution by performing spatio-temporal analysis of the fluctuations in the measured image stack, exploiting the fact that the noise is stochastic while the fluctuations arising from photokinetics are modulated through the PSF in the microscopy images. MUSICAL decomposes the image stack using singular value decomposition or eigenvalue decomposition [3, 50, 51] into an orthogonal set of vectors called eigenimages, with eigenvalues uniquely associated to them. In particular, the eigenimages with high eigenvalues are associated with the actual emitters and, therefore, are strongly related to the PSF of the system. Specifically, eigenimages associated with the actual structure are expected to be linear combinations of the PSFs at emitter locations. These eigenimages (the ones with high eigenvalues) are grouped into one set, called the signal subspace, since they span the images measured in the stack. Notably, only a subset of all the eigenimages belongs to this set. The ones that do not are grouped into another set called the noise subspace. The key property exploited by MUSICAL is as follows. The signal and noise subspaces are orthogonal, and the signal subspace is given by the linear combinations of PSFs at the emitters. Therefore, the PSFs at emitter locations are also orthogonal to the noise subspace. As a result, a test point at an emitter location will have a large projection in the signal subspace and a small one in the noise subspace. On the other hand, if a test point is far from an actual structure, it has a small projection in the signal subspace and a large projection in the noise subspace. These two situations are combined in a so-called 'indicator function' that takes the ratio of the projection in the signal and noise subspaces.
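The projection ratio just described can be sketched for a single test point as below. This is a bare-bones illustration, not the full MUSICAL implementation: the stack is flattened to a pixels-by-frames matrix, the subspaces are split by a singular-value threshold, and the contrast exponent is written as `alpha`:

```python
import numpy as np

def musical_indicator(stack, psf_at_point, threshold, alpha=4):
    """Minimal sketch of MUSICAL's indicator function for one test point.

    stack: (n_pixels, n_frames) matrix, each column a vectorized frame.
    psf_at_point: length-n_pixels vector, the PSF centred at the test point.
    threshold: singular-value cut between signal and noise subspaces.
    Returns (projection on signal / projection on noise) ** alpha.
    """
    u, s, _ = np.linalg.svd(stack, full_matrices=False)
    signal = u[:, s >= threshold]       # eigenimages spanning the structure
    noise = u[:, s < threshold]         # remaining (noise) subspace
    g = psf_at_point / np.linalg.norm(psf_at_point)
    ps = np.linalg.norm(signal.T @ g)   # projection onto signal subspace
    pn = np.linalg.norm(noise.T @ g)    # projection onto noise subspace
    return (ps / pn) ** alpha
```

On a toy stack built from one fluctuating Gaussian "PSF" plus weak noise, the indicator is large when the test point coincides with the emitter and small elsewhere, which is the behaviour the surrounding text describes.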
As a result, the function is high for test points at emitter locations and low otherwise.

MUSICAL needs the following knowledge about the microscopy data: the emission wavelength of the fluorophore (or equivalently the collection wavelength of the microscope), the pixel size of the camera as scaled for sample dimensions, and the NA of the microscope. In addition, MUSICAL needs three algorithmic control parameters: (a) a threshold for assigning eigenimages to the signal and noise subspaces, (b) a contrast parameter α, and (c) the level of subpixelation, which determines the fineness of the grid and the pixel size in the nanoscopy image. We used a recent work on automatic soft thresholding for the first parameter, which obviates the need for a user-specified threshold []. Further, α has been set to 4 following the recommendation of [3], and the subpixelation to 10, since this subpixelation gives a pixel size well below the smallest structures we have considered.

Fig. 3. Block diagrams of the autoencoder architectures explored in this work: (a) U-Net, (b) Feature pyramid network (FPN).

Autoencoder architectures
As noted in [49, 52, 53], microscopy and nanoscopy data pose several challenges compared to normal computer vision data because of the absence of color, texture, and edge features. However, the small latent space of autoencoders (such as shown in block C1 of Fig. 2) is an efficient way of exploiting the sparsity and lack of feature variety which is characteristic of microscopy and nanoscopy images.

We tried two different architectures for this task. The first one is the U-Net [54] model. It was designed specifically for biomedical images and is known for good performance in the medical imaging domain. The inspiration behind using U-Net is primarily the similarity of the application domain. The second model is the Feature Pyramid Network (FPN) [55]. Although FPN was originally designed for object detection tasks, many have successfully utilised the architecture for image-to-image tasks like semantic segmentation and instance segmentation [56, 57]. The inspiration for this choice was to see if the impressive performance seen with an image-to-image task like segmentation can transfer to a denoising task like ours. The architectures are shown in Fig. 3. For each model, we considered two options for convolutional layer architectures, namely R-34 and R-50, where R stands for residual network. This was done to explore both deep and deeper architectures. We found that the FPN with R-50 model often did not converge while training. So, we drop discussion of this combination hereon.

We note that some changes in the input images and the architectures were made to accommodate the special case of the chosen nanoscopy algorithm, as described next. The simulated input images had 32-bit floating point pixel values. Both the input and output images were normalized using max normalization. Without the normalization, the neural network has to deal with an ill-defined problem, as the actual dynamic range of the data may be much smaller than 32 bit for the noisy nanoscopy image.
This is a consequence of MUSICAL's indicator function. Further, learning the intensity span of the actual 32 bit image for the noise-free nanoscopy image for each case is more challenging than defining the intensity in the output image to be in the range [0,1]. Therefore, the max normalization makes the input and output intensity ranges better-defined and the mapping more learnable. At the same time, losing the actual intensity values in the output is not considered a problem, since the quality of the output image is unaffected and its interpretability unaltered. This is because MUSICAL and several other FNMs are qualitative reconstruction techniques in the sense that the intensity values generated by them indicate the statistical significance of the presence of emitters but not values of physical quantities. The only exception, to the best of our knowledge, is balanced super-resolution optical fluctuation imaging (b-SOFI) [58]. The selected architectures are then modified to fit the new output format of 32 bit floating point images with intensity range [0,1]. To do so, a rectified linear unit (ReLU) activation layer is added at the output to force the lower limit of the output image to be greater than zero. This layer is followed by a max normalization step to limit the intensity values in the output image between 0 and 1.

Choice of the Loss Function
The choice of the loss function determines the nature and quality of learning. Since nanoscopy image denoising for FNMs is new, we experimented with a variety of loss functions, presented below. We use the following notation. The denoised image (the output of the autoencoder) is denoted as Î, while the corresponding noise-free image (the target or ground truth for the autoencoder) is denoted by I. The pixel indices are specified by n and the total number of pixels is N. Therefore, the intensity in the denoised image at the n-th pixel is denoted as Î_n, and similarly for the noise-free image.

L1 loss:
The pixel-wise mean absolute error between the output and the ground truth image is:

L_{L1}(\hat{I}, I) = \frac{1}{N} \sum_{n=1}^{N} \left| \hat{I}_n - I_n \right| \qquad (1)

L2 loss:
The pixel-wise mean squared error between the output and the ground truth images:

L_{L2}(\hat{I}, I) = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{I}_n - I_n \right)^2 \qquad (2)

SSIM loss:
The SSIM metric comprises three perceptual components, namely luminance l(Î, I), contrast c(Î, I), and structure s(Î, I), as shown below:

\text{SSIM}(\hat{I}, I) = l(\hat{I}, I) \cdot c(\hat{I}, I) \cdot s(\hat{I}, I) \qquad (3)

The detailed expressions and further insights into SSIM are available in [59, 60]. It is remarkable that two images should be similar to each other in terms of the overall luminance, contrast, and structure for the SSIM value to be large, while trends at the level of individual pixels are not considered too important. The SSIM values are limited to the range [0,1] using ReLU; thus, we simply subtract the SSIM value from 1 to obtain the SSIM loss function as L_SSIM(Î, I) = 1 − SSIM(Î, I).

MS-SSIM loss:
For calculating MS-SSIM [61], the image pair is iteratively scaled down by a factor of 2, $M$ times in total. Let $\hat{I}_m$ and $I_m$ denote the denoised and noise-free images after the $m$-th scale down. $c(\hat{I}_m, I_m)$ and $s(\hat{I}_m, I_m)$ are calculated for all values of $m \in [1, M]$, while $l(\hat{I}_M, I_M)$ is calculated only for the $M$-th scaled-down version. Then, MS-SSIM is computed as:
$$\mathrm{MS\text{-}SSIM}(\hat{I}, I) = \left[ l(\hat{I}_M, I_M) \right]^{\alpha_M} \cdot \prod_{m=1}^{M} \left[ c(\hat{I}_m, I_m) \right]^{\beta_m} \left[ s(\hat{I}_m, I_m) \right]^{\gamma_m} \qquad (4)$$
where $\alpha_M$, $\beta_m$, and $\gamma_m$ are the powers imparted to the luminance, contrast, and structure terms at the relevant scales. In the original article [61], their values are set to 1, and we have used the same. The MS-SSIM values are also limited to the range [0, 1] using ReLU. We therefore subtract the MS-SSIM value from 1 to obtain the MS-SSIM loss function: $\mathcal{L}_{\mathrm{MS\text{-}SSIM}}(\hat{I}, I) = 1 - \mathrm{MS\text{-}SSIM}(\hat{I}, I)$.

Perceptual or VGG loss:
The perceptual loss is calculated by comparing the high-level representations obtained by feeding the images to a pretrained benchmark convolutional network, such as VGG-16 [62] (hence the name VGG loss). The activations obtained from the 4th, 9th, 16th, and 23rd layers of the VGG-16 model by passing the denoised and noise-free images as inputs are used for the comparison. Let $\hat{A}_l$ and $A_l$ denote the activation maps obtained from the $l$-th layer of VGG-16 for the denoised and the noise-free images, i.e. $\hat{I}$ and $I$, respectively. Then the VGG loss is given as:
$$\mathcal{L}_{\mathrm{VGG}}(\hat{I}, I) = \sum_{l \in \{4, 9, 16, 23\}} \left| \hat{A}_l - A_l \right| \qquad (5)$$

Weighted combination:
Apart from the loss functions described above, further loss functions were devised by using a weighted sum of two loss functions:
$$\mathcal{L}_{\mathrm{combo}}(\hat{I}, I) = (1 - \beta)\, \mathcal{L}_i(\hat{I}, I) + \beta\, \mathcal{L}_j(\hat{I}, I) \qquad (6)$$
Two such combinations are explored: a combination of the MS-SSIM and L1 loss functions, and a combination of the SSIM and L1 loss functions, where the value of $\beta$ was determined empirically.

Training algorithm:
For training, the Adam optimizer was used with a learning rate of 0.001. The models were trained for 60 epochs. The PyTorch library was used for designing and training the models.
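The pieces above can be sketched compactly. The following is a minimal, illustrative NumPy sketch of the pixel-wise losses of Eqs. (1), (2), and (6), a *simplified* global-statistics variant of SSIM (the standard formulation [59] applies the terms over local sliding windows), a single Adam update step with the learning rate stated above, and the PSNR metric used for evaluation in the next section. All function names are our own and this is not the actual PyTorch training code.

```python
import numpy as np

def l1_loss(pred, target):
    """Pixel-wise mean absolute error, Eq. (1)."""
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    """Pixel-wise mean squared error, Eq. (2)."""
    return np.mean((pred - target) ** 2)

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified SSIM from global image statistics: luminance times a
    combined contrast/structure term. Illustration only."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2 * cov + c2) / (x.var() + y.var() + c2)
    return lum * cs

def ssim_loss(pred, target):
    """1 - SSIM, with the SSIM value clamped to [0, 1] as with ReLU."""
    return 1.0 - min(max(global_ssim(pred, target), 0.0), 1.0)

def combo_loss(loss_i, loss_j, beta):
    """Weighted combination of two loss functions, Eq. (6)."""
    return lambda p, t: (1.0 - beta) * loss_i(p, t) + beta * loss_j(p, t)

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam parameter update (bias-corrected first/second moments m, v;
    t is the 1-based step count) with the learning rate used in the text."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

For example, `combo_loss(ssim_loss, l1_loss, beta)` corresponds to the SSIM + L1 combination used later, up to the simplified SSIM definition.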
3. Results
We perform validation of our approach using both simulated and actual experimental data. The results and insights are presented below.
A test set was created comprising 250 image pairs each of actin filaments, mitochondria, and vesicles using the raw microscopy data simulator discussed in section 2.1.
Quantitative comparison of different methods
We perform a quantitative comparison of the different models and loss functions using the peak signal-to-noise ratio (PSNR), a prominent quantitative metric for gauging denoising performance. For simplicity, we refer to a combination of a loss function and a model as a method. Therefore, essentially, we compare 21 different denoising methods in Table 1 using PSNR. The test results are separated by structure to appreciate whether the geometry has a bearing on the achievable denoising. It is noted in Table 1 that UNet (R-50) together with VGG loss performs the best for vesicles and mitochondria and the second best for actin filaments.

Qualitative analysis:
Single-valued quantitative metrics such as PSNR are often unsuitable for representing the quality of images, specifically in the case of low-contrast microscopy images. An illustration of this point is given in Fig. 4. The output produced by the method labeled M-3 is much cleaner, with the least background debris, while the one produced by FPN (R-34) | L2 has visible artefacts right along the edges of each of the strands. Despite this, the PSNR metric values the former at 37.48 dB while the latter is valued at a much higher 40.02 dB. In contrast, the SSIM metric values the denoising output from FPN (R-34) | L2 at a lower score of 0.944 while M-3 is valued at a higher score of 0.95. This is truer to reality than the PSNR score. However, there will be other cases where SSIM is not a good indicator of quality. Therefore, we perform a qualitative analysis of artefact suppression. We consider the following three methods for qualitative comparison:

• M-1: UNet (R-50) trained with VGG loss (superior performance in terms of PSNR)
• M-2: UNet (R-50) trained with SSIM + L1 combination loss
• M-3: UNet (R-50) trained with L1 loss

The results for mitochondria are presented in Fig. 5. It is seen that M-1 (Fig. 5c) and M-3 (Fig. 5e) restore the resolution and reconstruct the boundary of the membrane. M-2 (Fig. 5d) also appears
Table 1. Quantitative analysis in terms of PSNR for different combinations of models and loss functions. The method with the best PSNR value is highlighted in bold. Further, the three methods M-1 to M-3 used for qualitative comparison are underlined.
Actin filaments:

| Model | L2 | L1 | SSIM | MS-SSIM | VGG | MS-SSIM + L1 | SSIM + L1 | Best loss function |
|---|---|---|---|---|---|---|---|---|
| UNet (R-34) | 38.27 | 37.64 | 35.89 | 36.27 | 37.69 | 37.21 | 37.11 | L2 |
| UNet (R-50) | 37.94 | 36.61 | 37.58 | 36.74 | 38.60 | 37.43 | 37.99 | VGG |
| FPN (R-34) | | | | | | | | |
| Best model | FPN (R-34) | UNet (R-34) | UNet (R-50) | FPN (R-34) | UNet (R-50) | UNet (R-50) | UNet (R-50) | |

Vesicles:

| Model | L2 | L1 | SSIM | MS-SSIM | VGG | MS-SSIM + L1 | SSIM + L1 | Best loss function |
|---|---|---|---|---|---|---|---|---|
| UNet (R-34) | 39.29 | 38.96 | 38.66 | 37.62 | 39.78 | 37.89 | 38.63 | VGG |
| UNet (R-50) | 39.64 | 39.18 | 38.62 | 37.28 | | | | |
| FPN (R-34) | | | | | | | | |
| Best model | UNet (R-50) | UNet (R-50) | UNet (R-34) | FPN (R-34) | UNet (R-50) | UNet (R-50) | UNet (R-50) | |

Mitochondria:

| Model | L2 | L1 | SSIM | MS-SSIM | VGG | MS-SSIM + L1 | SSIM + L1 | Best loss function |
|---|---|---|---|---|---|---|---|---|
| UNet (R-34) | 38.13 | 37.19 | 38.05 | 35.46 | 38.97 | 38.63 | 38.21 | VGG |
| UNet (R-50) | 36.59 | 36.73 | 39.20 | 38.69 | | | | |
| FPN (R-34) | | | | | | | | |
| Best model | FPN (R-34) | FPN (R-34) | UNet (R-50) | UNet (R-50) | UNet (R-50) | FPN (R-34) | UNet (R-50) | |

to perform well, unless the intensities at a line section (shown as a yellow line in Fig. 5a-e) are observed (Fig. 5f), where M-2 shows a jittery intensity profile between the two peaks, which may be mistaken as resolving further small features. Here, we have shown only one line section, but we observed a similar effect along multiple other sections. Another observation is that all the methods seem to suppress out-of-focus structures (bottom-left tail in a-e), but not as effectively as the noise-free image (Fig. 5b). The contrast-stretched and over-saturated versions of the images (Fig. 5g-k) show that the out-of-focus structures are present in all the images, including the noise-free one, however with significantly lower intensity, as seen in Fig. 5b. In this sense, the better optical sectioning supported by the noise-free image is still not achieved by the denoised images, although M-3 works the best in this regard. Lastly, from Fig. 5g-k, it is seen that M-2 and M-3 are significantly more effective in suppressing background debris artefacts. We noted similar observations for actin filaments, i.e. M-3 produces the thinnest filaments and M-2 and M-3 suppress the background debris well. Further, M-3 performs better in suppressing the out-of-focus structures.

Fig. 4. The results of denoising the noisy image (a) using the method with the best PSNR (c) and another method, M-3 (d), listed in section 3.1, and quality comparison with the noise-free image (b). The PSNR and SSIM values for the denoised images are indicated. Scale bar 1 µm.

Fig. 5. A qualitative comparison for mitochondria where artefact suppression restores resolution. In a-e, the contrast is adjusted manually for the best visualization of resolution restoration. The intensities along the yellow line shown in (a-e) are plotted in f. g-k show saturated versions of a-e, where the out-of-focus regions and background debris are also visible. Scale bar 500 nm.
The results are not reported due to space constraints.

It was indicated in Fig. 1 using red and yellow arrows how the noisy nanoscopy image created background debris due to out-of-focus structures and suffered reduced sharpness in the features. We show the denoising results for the same example in Fig. 6. The pseudocolor rendering and different contrasts in Fig. 6a-b help in observing these effects more clearly. The yellow line section helps in investigating both effects simultaneously. The log-intensities at the yellow line sections in Fig. 6a,e-g are shown in Fig. 6d. It is seen that M-1 and M-3 follow quite similar trends with each other and with the noise-free image. Both show lower valleys in the background region. M-2 generally follows the trend well in the high-intensity zones, but may introduce peaks of small intensity in the background.

Overall, M-2 and M-3 are more effective in suppressing background, and M-1 and M-3 are better at improving the sharpness of the image. Generally, for simulated examples, M-3 presents the best qualitative results.

Fig. 6. Qualitative comparison for an example of a vesicles sample. (a,b) show the same noise-free image rendered with two contrast stretches. The contrast c1 in (a) is set so that the structure marked with the yellow triangle can be seen in both the noisy and noise-free images. The contrast c2 in (b) is set so that the appearance and visual thickness of the bright spot marked with the red triangle appear similar to the noise-free image. The intensities along the line sections shown in (a,b-f) are compared in (g). Scale bar 500 nm.

Fig. 7. Testing of the M-3 method on nanoscopy images generated by other algorithms. Scale bar 500 nm. a. SOFI order 2. b. bSOFI. c. SRRF with ring radius 2.
Testing nanoscopy images from other nanoscopy algorithms
Here, we consider whether our trained models can be directly applied to nanoscopy images generated by other FNMs. For the same vesicles example as shown in Fig. 1, we take the noisy raw microscopy image stacks and process them with three different methods, namely SOFI [1], bSOFI [58], and super-resolution radial fluctuations (SRRF) [2], to obtain noisy nanoscopy images. These are then processed using M-3 to generate denoised nanoscopy images. The results are presented in Fig. 7. It is seen that M-3 does not denoise the SOFI and bSOFI images well, but seems to perform well for SRRF. Whether it works well on a wider variety of SRRF data is still an open question. Therefore, we conclude that even if some transferability may be present across methods that generate similar types of features for certain structures (such as seen here for SRRF on vesicles), such an assumption is not generally applicable across FNMs, and either fresh training or retraining on data created using the specific FNM should be undertaken. At the same time, we note that the concept of the proposed method is generalizable, but not the trained models themselves.

Fig. 8. Artefact suppression in a nanoscopy image generated using raw microscopy data with the speckle noise model. The top row (a-e) and bottom row (f-j) correspond to signal-to-noise ratios of 10 and 5, respectively. (d,e,i,j) show the intensity in log scale for (b,c,g,h), respectively.

Fig. 9. Artefact suppression in a nanoscopy image generated using raw microscopy data with the Gaussian noise model. The top row (a-e) and bottom row (f-j) correspond to signal-to-noise ratios of 100 and 10, respectively. (d,e,i,j) show the intensity in log scale for (b,c,g,h), respectively.
Testing nanoscopy images with a raw data noise model other than that used for simulation
We conduct this study in order to examine the nature of the artefacts when the noise model differs from that simulated in the raw microscopy data, and to assess the generalizability of the denoising approach to such noise models in the raw data. We modeled noisy raw data with a speckle noise model, assuming variances of 0.1 and 0.2 relative to the maximum intensity in the noise-free raw data for the two different simulations. These correspond to effective signal-to-noise ratios of 10 and 5, respectively. The results are shown in Fig. 8. As expected with the speckle noise model [64], the noise affects the foreground in the raw microscopy data (see Fig. 8a,f). Yet, the noisy nanoscopy image can resolve the boundary of the mitochondrion (Fig. 8b,g) but contains significant debris in the background (see the log-scale nanoscopy images in Fig. 8d,i). We performed denoising using the method M-3. The results indicate that, on one hand, denoising makes the boundaries of the mitochondrion sharper (Fig. 8c,h), while on the other hand, the debris is suppressed only poorly by M-3 (Fig. 8e,j).

We repeat this experiment with a Gaussian noise model. We consider two values of variance, 0.01 and 0.1 relative to the maximum intensity in the noise-free raw data, which correspond to signal-to-noise ratios of 100 and 10, respectively. The results are presented in Fig. 9. It is seen that the Gaussian noise afflicts the raw microscopy as well as the nanoscopy more severely than in the case of speckle noise (Fig. 8). Even for a signal-to-noise ratio of 100, the resolvability of the boundaries of the mitochondrion is marginally compromised (Fig. 9b) and the background debris visible in the log scale is quite significant in Fig. 9d. Nonetheless, denoising using M-3 restores the boundary of the mitochondrion (Fig. 9c) as well as suppresses the background debris effectively (Fig. 9e). However, in the case of signal-to-noise ratio 10, the resolvability of the boundaries cannot be restored by denoising (Fig.
9g), even though the background debris is significantly reduced (Fig. 9j).

Therefore, it is evident that different noise distributions in the raw data result in different natures of artefacts and show significant variation in the prominence of artefacts for a given signal-to-noise ratio. We also note that a denoising method supervised on nanoscopy data generated from raw microscopy with one noise model may be partially effective in reducing artefacts arising from another noise model, for example in terms of resolvability or background debris. In other words, we noted only partial generalizability across raw data noise models.
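The two raw-data noise models studied above can be sketched as follows. The paper does not spell out the exact parameterization of its simulator, so the definitions below (noise variance taken relative to the maximum noise-free intensity, speckle applied multiplicatively) are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, rel_var):
    """Additive Gaussian noise; rel_var (e.g. 0.01 or 0.1) is the noise
    variance relative to the maximum noise-free intensity (assumed)."""
    sigma = np.sqrt(rel_var) * img.max()
    return img + rng.normal(0.0, sigma, img.shape)

def add_speckle_noise(img, rel_var):
    """Multiplicative speckle noise I * (1 + n) (assumed form), so the
    bright foreground is affected far more than the dark background,
    consistent with the behaviour described for Fig. 8a,f."""
    sigma = np.sqrt(rel_var) * img.max()
    return img * (1.0 + rng.normal(0.0, sigma, img.shape))
```

The multiplicative form explains why, in the speckle case, the background debris in the nanoscopy image is driven mainly by the bright structures rather than by the empty regions.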
We performed denoising experiments on real microscopy data of actin filaments (in vitro preformed), microtubules in fixed cells (thick fiber-like structures not included in our training data), liposomes (lab-fabricated, agarose-stabilized artificial small vesicles), and mitochondria in living cells. The results for them are presented in Figs. 10 −
13, respectively. The experimental details and discussion of the results for each dataset are presented below.
In vitro preformed actin filaments, Fig. 10
This data is taken from the publicly available data of [3]. We use the first 500 frames. The relevant imaging parameters are: total internal reflection microscopy with NA 1.49, pixel size 65 nm, and emission wavelength 590 nm. The detailed protocol can be found in [3]. The denoising results for a sample of actin filaments are shown in Fig. 10. We choose two regions, shown in the green and yellow boxes in Fig. 10a, to consider regions with different local SBRs. The SBRs for the green and yellow boxes are 3.2 and 3.63, despite the peak intensity in the green box being significantly higher. This is because the density of structures in the green box is significantly larger than in the yellow box. We see that all the denoising methods perform similarly, with minor differences. It is seen that the portion at the top with a loop, which appears saturated in the noisy nanoscopy image (Fig. 10b), gets its intensity better distributed after denoising (Fig. 10c-e). This indicates better contrast distribution close to junctions. The log-scale versions of the nanoscopy images for the green box (Fig. 10d-g) clearly indicate that the background region is suppressed well by all the methods. However, it is noted that M-3 restores the continuity of some low-intensity strands, a feature that is missed by M-1 and M-2. For the sparser region (yellow box), the denoised results in Fig. 10i-k appear similar and are effective in restoring the visibility of the strands. When seen in the log scale (Fig. 10l-o), it is evident that M-3 is better at restoring continuity but M-1 suppresses the background faster than M-2 and M-3.
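The local SBR values quoted above can be estimated in several ways; the paper does not give its exact definition, so the sketch below uses one plausible convention (mean intensity of foreground pixels over mean intensity of background pixels, split by a threshold), purely as an illustration of how region-wise SBR could be computed.

```python
import numpy as np

def local_sbr(region, threshold):
    """Estimate the signal-to-background ratio of an image region as the
    mean of above-threshold (foreground) pixels over the mean of the
    remaining (background) pixels. Both the definition and the
    thresholding rule are illustrative assumptions, not the paper's
    actual procedure."""
    fg = region[region > threshold]
    bg = region[region <= threshold]
    if fg.size == 0 or bg.size == 0 or bg.mean() == 0:
        return float("nan")
    return float(fg.mean() / bg.mean())
```

Under such a definition, a densely structured region can have a lower SBR than a sparser region even if its peak intensity is higher, as observed for the green and yellow boxes above, because the dense structure also raises the local background estimate.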
Microtubules in fixed cell, Fig. 11
Fig. 10. Results of artefact removal from the nanoscopy result of in vitro preformed actin filaments. The second and fourth rows show results in log scale. The contrast in log scale is adjusted such that the elliptic blob in the top portion of the green ROI and the fork in the bottom right of the yellow ROI appear visually similar across the row. Scale bar 2 µm in a, 500 nm in b-o.

We consider an example of microtubules in fixed cells taken from [65] as another challenging case. This is because a microtubule has geometric similarity with actin filaments and mitochondria in the sense of tubularity, but is significantly different in terms of radius. The radii of microtubules are in the range 25-30 nm while those of actin filaments are in the range 5-7 nm. The detailed protocol of the considered example can be found in [65]. The first 500 frames of the second example of microtubules in fixed cells are used. This data is also publicly available. The relevant imaging parameters are: inverted epifluorescence system with NA 1.49, 108 nm pixel size, and emission wavelength 667 nm. The SBR of the selected region is ∼4. Since the sample and illumination are 3D, out-of-focus light is also a problem.

The results are shown in Fig. 11. The sample has a dense structure with a number of thin strands, not previously encountered in the simulated data. We can see from Fig. 11(b-e) that M-1 to M-3 all manage to enhance the continuity of the strands while also suppressing the low-intensity, out-of-focus strands. On a qualitative front, M-1 seems to perform the best, with a good amount of clarity in the individual strands. This example illustrates some potential for generalization of the model to untrained structures and structural densities, on structures geometrically and optically similar to those simulated for the training dataset.
Liposomes stabilised in agarose, Fig. 12
This is one of the challenging samples, with structures having radii of 125 ± 30 nm. This data is also taken from a publicly released dataset of [11]. The imaging parameters of relevance are: epifluorescence microscope with NA 1.42, pixel size 80 nm, and emission wavelength 537 nm. The liposomes were lab fabricated with an average diameter of 250 nm. They emulate vesicles with membrane labels. They were stabilized in agarose, which is likely to contribute to the background through autofluorescence. The fragile nature of the liposome assembly also means that there may have been debris from liposomes that disintegrated before the fixation in agarose. These sources of extra background reduce the SBR at the bright spot seen in Fig. 12a, despite the simplicity of the structures, the fixation, and the relatively favorable sparsity of the liposome distribution. Further, the diffraction-limited resolution for the microscope parameters is approximately 190 nm for the noise-free case. Therefore, the liposomes are comparable in size to the resolution limit.

The denoising results are shown in Fig. 12c-d, while Fig. 12b shows the noisy nanoscopy image. While the denoising or artefact suppression effect is not evident in the denoised images at first glance, the contrast enhancement and the visibility of liposomes other than the two clearly defined ones are witnessed. A further insight is obtained from the log-scaled images shown in Fig. 12g-k, where the background suppression by M-1 to M-3 is easily noticeable.

We note that the methods were trained on images with multiple vesicles of radii distributed uniformly in the range [25, 500] nm, the radius being selected independently for each vesicle. Since the intensity of vesicles in the raw microscopy and the nanoscopy images is proportional to their size, smaller objects produce dimmer signals and are not trained for well. This is particularly important for sub-diffraction structures, where the resolution-limited image displays intensity proportional to their sizes. Since MUSICAL introduces non-linearities in order to achieve super-resolution, it will inherently reduce the contribution and therefore the appearance of dimmer objects. As a result, the training set implicitly adds a bias toward larger structures. However, the structures in the experimental data have a narrow distribution around the resolution limit, which explains why the results seem different from the ones obtained for the simulations. Therefore, there exists a margin for customization where the training set contains a narrower distribution of diameters.

Fig. 11. Results of artefact removal from nanoscopy images of microtubules, which were neither simulated nor included in the training. The bottom row shows results in log scale. Scale bar 1 µm.

Fig. 12. Results of artefact removal from the nanoscopy result of liposomes. f-i show results in log scale. The contrast in log scale is adjusted such that the elliptic blob to the left of the color bar appears visually similar. Scale bar 500 nm.

Fig. 13. Results of artefact removal from the nanoscopy result of mitochondria in living cells. (f-i) show results in log scale. Scale bar 2 µm.

Mitochondria in a living cell, Fig. 13
This data was measured in our laboratory on living cardiomyocytes, in which the mitochondria were labeled using the live-cell compatible MitoTracker Green dye. Two hundred frames were acquired at a frame rate of 40 frames per second, but with an exposure time of 3 ms. The other relevant microscopy parameters are: epifluorescence microscope with NA 1.42, 80 nm pixel size, and emission wavelength 520 nm. The SBR of the image stack at the brightest point, shown in the green box (labeled k) in Fig. 13a, is ∼
4. Discussion
Here we present our observations and comments on various points of interest.
Generalizability and scalability
Apart from the visually better results obtained on structures that the models were trained on, we observe the models performing generally well on new, weakly-related structures that the models were not trained on. We also note a general restoration of the resolvability of structures and a reduction in the background debris. Similar observations extended to noise distributions in the raw data which were not considered in training, where at least partial generalizability of the denoising approach was witnessed. However, since different cameras may have different noise distributions or characteristics, it might be judicious to include a range of noise models in the simulated training dataset.

We deliberately train for an SBR value which is considered quite poor, in the hope that the training generalizes to data with better SBR as well. This is clearly witnessed in our results on actual experimental microscopy data. The results also verify that the unconventional approach of simulation-supervised deep learning works well for this problem and helps in circumventing the ground-truth absence problem. The random selection of the values of a variety of parameters ensures that a diversity of situations is included without introducing significant bias. Nonetheless, some of the quantities are at present fixed, either for simplicity of simulations or for limiting the size of the dataset (and thereby the time needed for creating it). In the future, the same dataset may be expanded for a larger variety of conditions, or more independent datasets may be created for the exploration of transfer learning across FNMs, structures, and microscopes, and for other sources of artefacts.
Models and loss functions
Coming to specific methods, we see M-1 performing really well on a variety of structures, including structures of varying thicknesses and even densely-packed structures like microtubules. M-2 and M-3 lag slightly behind but still work really well at suppressing the background debris. In summary, M-1 comes across as the most generalizable model, producing good results across a variety of structures with appreciable resolution improvement and significant background noise reduction. It is also the method that generally resulted in the leading PSNR values on our test data. From our results, it appears that the VGG-based perceptual loss function used in M-1 provides good qualitative as well as quantitative performance. It is possibly due to the use of activation maps of an abstract nature at multiple depths that the VGG loss function is able to learn a sophisticated artefact suppression model. On the other hand, we think that the combination of SSIM and L1, such as used in M-2, provides a good balance between perceptual quality and pixel-wise match.
Metrics and the value of quantitative analysis.
The training procedure of deep learning methods needs loss functions and therefore inherently uses some form of quantitative indicator of the quality of denoising. Nonetheless, as exemplified through Table 1 and Fig. 4, a single-valued quantitative metric may fail to be an absolute hallmark of quality assessment, especially for microscopy images in general and nanoscopy images in particular. It might be interesting in the future to design quality metrics customized for this field of science.
5. Conclusion
In this work, artefact removal for a selected fluctuation-based nanoscopy method is reported. Artefacts in such nanoscopy methods are attributed to the noise, the photokinetics, as well as the computational treatment of the data. A fundamental impediment in the practical artefact removal problem is that it is impossible to experimentally curate a supervised training dataset or to synthesize noise-model-based datasets. The problem of ground truth absence is effectively dealt with through simulations that realistically mimic every aspect of the measurement. It is seen that autoencoder deep learning with a simulation-supervised training dataset is quite effective in suppressing artefacts arising from photokinetics, raw microscopy, and nanoscopy-algorithm-induced non-linear data distortions. Our approach is also observed to be generalizable across multiple different structures, different noise models, and nanoscopy algorithms not used during the training process and thus previously unseen by any of the models. Nonetheless, scaling the dataset to a larger variety of conditions can be easily incorporated, or transfer learning can be explored. In the future, we wish to add more versatility to the simulation-supervised training dataset and explore the design of suitable metrics for quality analysis of nanoscopy images.
Acknowledgements
The authors acknowledge Ida Sundvor Opstad for generating the data for mitochondria in living cells, Balpreet Singh Ahluwalia for providing the microscope, and Åsa Birna Birgisdottir for preparing and providing the cardiomyoblast cells. The providers of the public datasets in section 3.1 and the teams that contributed to the experimental data in these datasets are also acknowledged.
Author contributions
KA and DKP conceived the idea. SJ generated the simulation-supervised dataset under the guidance of SA and using the codes provided by him. SJ also performed the deep learning and analysis, with DKP providing insights on the topic. SJ, SA, and KA made the figures. All the authors contributed to the writing.
Disclosures
We declare no conflicts of interest.
Data availability
The deep learning models, the simulated raw microscopy data, and the data of mitochondria in living cells will be made public after the manuscript is accepted. The codes will be shared at https://github.com/IAmSuyogJadhav/Nanoscopy-Artefact-Suppresion .

References
1. T. Dertinger, R. Colyer, G. Iyer, S. Weiss, and J. Enderlein, “Fast, background-free, 3d super-resolution optical fluctuation imaging (sofi),” Proc. Natl. Acad. Sci., 22287–22292 (2009).
2. N. Gustafsson, S. Culley, G. Ashdown, D. M. Owen, P. M. Pereira, and R. Henriques, “Fast live-cell conventional fluorophore nanoscopy with imagej through super-resolution radial fluctuations,” Nat. Commun., 12471 (2016).
3. K. Agarwal and R. Macháň, “Multiple signal classification algorithm for super-resolution fluorescence microscopy,” Nat. Commun., 1–9 (2016).
4. I. Yahiatene, S. Hennig, M. Müller, and T. Huser, “Entropy-based super-resolution imaging (esi): From disorder to fine detail,” ACS Photonics, 1049–1056 (2015).
5. Y. S. Hu, X. Nan, P. Sengupta, J. Lippincott-Schwartz, and H. Cang, “Accelerating 3b single-molecule super-resolution microscopy with cloud computing,” Nat. Methods, 96–97 (2013).
6. Y. Deng, M. Sun, P.-H. Lin, J. Ma, and J. W. Shaevitz, “Spatial covariance reconstructive (score) super-resolution fluorescence microscopy,” PLoS One, 1–9 (2014).
7. W. Zhao, J. W. Liu, C. Kong, Y. Zhao, C. Guo, C. Liu, X. Ding, X. Ding, J. Tan, and H. Li, “Faster super-resolution imaging with auto-correlation two-step deconvolution,” arXiv: Opt. (2018).
8. O. Solomon, M. Mutzafi, M. Segev, and Y. C. Eldar, “Sparsity-based super-resolution microscopy from correlation information,” Opt. Express, 18238–18269 (2018).
9. G. T. Dempsey, J. C. Vaughan, K. H. Chen, M. Bates, and X. Zhuang, “Evaluation of fluorophores for optimal performance in localization-based super-resolution imaging,” Nat. Methods, 1027 (2011).
10. G. C. Rollins, J. Y. Shin, C. Bustamante, and S. Pressé, “Stochastic approach to the molecular counting problem in superresolution microscopy,” Proc. Natl. Acad. Sci., E110–E118 (2015).
11. I. S. Opstad, S. Acuña, L. E. V. Hernandez, J. Cauzzo, N. Škalko-Basnet, B. S. Ahluwalia, and K.
Agarwal, “Fluorescence fluctuations-based super-resolution microscopy techniques: an experimental comparative study,” arXiv:2008.09195 (2020).
12. H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in IEEE Conference on Computer Vision and Pattern Recognition, (2012), pp. 2392–2399.
13. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Process., 3142–3155 (2017).
14. T. Plotz and S. Roth, “Benchmarking denoising algorithms with real photographs,” in IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 1586–1595.
15. S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Toward convolutional blind denoising of real photographs,” in
IEEE Conference on Computer Vision and Pattern Recognition, (2019), pp. 1712–1722.
16. T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, “Unprocessing images for learned raw denoising,” in
IEEE Conference on Computer Vision and Pattern Recognition, (2019), pp. 11036–11045.
17. A. Singh, A. Bhave, and D. K. Prasad, “Single image dehazing for a variety of haze scenarios using back projected pyramid network,” in
European Conference on Computer Vision Workshops, (2020).
18. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydın, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods, 103–110 (2019).
19. S. A. Haider, A. Cameron, P. Siva, D. Lui, M. J. Shafiee, A. Boroomand, N. Haider, and A. Wong, “Fluorescence microscopy image noise reduction using a stochastically-connected random field model,” Sci. Reports, 20640 (2016).
20. S. Liu, M. J. Mlodzianoski, Z. Hu, Y. Ren, K. McElmurry, D. M. Suter, and F. Huang, “scmos noise-correction algorithm for microscopy images,” Nat. Methods, 760–761 (2017).
21. W. Meiniel, J.-C. Olivo-Marin, and E. D. Angelini, “Denoising of microscopy images: a review of the state-of-the-art, and a new sparsity-based method,” IEEE Transactions on Image Process., 3842–3856 (2018).
22. S. K. Maji and H. Yahia, “A feature based reconstruction model for fluorescence microscopy image denoising,” Sci. Reports, 7725 (2019).
23. B. Mandracchia, X. Hua, C. Guo, J. Son, T. Urner, and S. Jia, “Fast and accurate scmos noise correction for fluorescence microscopy,” Nat. Commun., 1–12 (2020).
24. B. Manifold, E. Thomas, A. T. Francis, A. H. Hill, and D. Fu, “Denoising of stimulated raman scattering microscopy images via deep learning,” Biomed. Opt. Express, 3860–3874 (2019).
25. T.-A. Nguyen, G. M. Hagen, and J. Ventura, “Deep learning for denoising of fluorescence microscopy images,” (2018).
26. Y. Zhang, Y. Zhu, E. Nichols, Q. Wang, S. Zhang, C. Smith, and S. Howard, “A poisson-gaussian denoising dataset with real fluorescence microscopy images,” in IEEE Conference on Computer Vision and Pattern Recognition, (2019), pp. 11710–11718.
27. X. Yi and S. Weiss, “Cusp-artifacts in high order superresolution optical fluctuation imaging,” Biomed. Opt. Express, 554–570 (2020).
28. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, and L. Bottou, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.” J. Mach. Learn. Res. (2010).
29. A. A. Sekh, I. S. Opstad, R. Agarwal, A. B. Birgisdottir, T. Myrmel, B. S. Ahluwalia, K. Agarwal, and D. K.
Prasad,“Simulation-supervised deep learning for analysing organelles states and behaviour in living cells,” (2020).30. L. Yao, Z. Ou, B. Luo, C. Xu, and Q. Chen, “Machine learning to reveal nanoparticle dynamics from liquid-phasetem videos,” ACS Cent. Sci. (2020).31. H. Gupta, M. T. McCann, L. Donati, and M. Unser, “Cryogan: A new reconstruction paradigm for single-particlecryo-em via deep adversarial learning,” BioRxiv (2020).32. W. Chiu, A. McGough, M. B. Sherman, and M. F. Schmid, “High-resolution electron cryomicroscopy of macro-molecular assemblies,” Trends Cell Biol. , 154–159 (1999).33. V. E. Galkin, A. Orlova, G. F. Schröder, and E. H. Egelman, “Structural polymorphism in f-actin,” Nat. structural &Mol. Biol. , 1318 (2010).34. E. Egelman, N. Francis, and D. DeRosier, “F-actin is a helix with a random variable twist,” Nature , 131–135(1982).35. D. W. Fawcett, “An atlas of fine structure: the cell, its organelles, and inclusions.” Tech. rep. (1966).36. T. Stephan, A. Roesch, D. Riedel, and S. Jakobs, “Live-cell sted nanoscopy of mitochondrial cristae,” Sci. Reports ,1–6 (2019).37. S. M. Rafelski, “Mitochondrial network morphology: building an integrative, geometrical view,” BMC Biol. , 1–9(2013).38. C. Huang, D. Quinn, Y. Sadovsky, S. Suresh, and K. J. Hsia, “Formation and size distribution of self-assembledvesicles,” Proc. Natl. Acad. Sci. , 2910–2915 (2017).39. M. E. de Araujo, G. Liebscher, M. W. Hess, and L. A. Huber, “Lysosomal size matters,” Traffic , 60–75 (2020).40. J. Huotari and A. Helenius, “Endosome maturation,” The EMBO J. , 3481–3500 (2011).41. T. Ha and P. Tinnefeld, “Photophysics of fluorescent probes for single-molecule biophysics and super-resolutionimaging,” Annu. Rev. Phys. Chem. , 595–617 (2012).42. R. M. Dickson, A. B. Cubitt, R. Y. Tsien, and W. E. Moerner, “On/off blinking and switching behaviour of singlemolecules of green fluorescent protein,” Nature , 355–358 (1997).43. S. Cox, E. Rosten, J. Monypenny, T. Jovanovic-Talisman, D. T. 
Burnette, J. Lippincott-Schwartz, G. E. Jones, andR. Heintzmann, “Bayesian localization microscopy reveals nanoscale podosome dynamics,” Nat. Methods , 195–200(2012).44. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy(storm),” Nat. Methods , 793 (2006).45. J. Schnitzbauer, M. T. Strauss, T. Schlichthaerle, F. Schueder, and R. Jungmann, “Super-resolution microscopy withDNA-PAINT,” Nat. Protoc. , 1198–1228 (2017).46. A. Girsault, T. Lukes, A. Sharipov, S. Geissbuehler, M. Leutenegger, W. Vandenberg, P. Dedecker, J. Hofkens, andT. Lasser, “Sofi simulation tool: A software package for simulating and testing super-resolution optical fluctuationimaging,” PLOS ONE , 1–13 (2016).47. S. F. Gibson and F. Lanni, “Experimental test of an analytical model of aberration in an oil-immersion objective lensused in three-dimensional light microscopy,” J. Opt. Soc. Am. A , 154–166 (1992).48. J. Li, F. Xue, and T. Blu, “Fast and accurate three-dimensional point spread function computation for fluorescencemicroscopy,” J. Opt. Soc. Am. A , 1029–1034 (2017).49. A. Sekh, I.-S. Opstad, A. Birgisdottir, T. Myrmel, B. Ahluwalia, K. Agarwal, and D. K. Prasad, “Learning nanoscalemotion patterns of vesicles in living cells,” in IEEE Conference on Computer Vision and Pattern Recognition, (2020), p. 1–10.50. K. Agarwal and D. K. Prasad, “Eigen-analysis reveals components supporting super-resolution imaging of blinkingfluorophores,” Sci. Reports , 4445 (2017).51. S. A. AcuÃśa-Maldonado, “Multiple Signal Classification Algorithm: computational time reduction and patternrecognition applications,” Master’s thesis, UiT The Arctic University of Norway, TromsÃÿ, Norway (2019).52. T. Pärnamaa and L. Parts, “Accurate classification of protein subcellular localization from high-throughput microscopyimages using deep learning,” G3: Genes, Genomes, Genet. , 1385–1392 (2017).53. E. A. Hay and R. 
Parthasarathy, “Performance of convolutional neural networks for identification of bacteria in 3dmicroscopy datasets,” PLoS Comput. Biol. , e1006628 (2018).54. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp.234–241.55. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for objectdetection,” in
IEEE Conference on Computer Vision and Pattern Recognition, (2017), pp. 2117–2125.56. S. S. Seferbekov, V. Iglovikov, A. Buslaev, and A. Shvets, “Feature pyramid network for multi-class land segmentation.”in
IEEE Conference on Computer Vision and Pattern Recognition Workshops, (2018), pp. 272–275.57. A. Kirillov, R. Girshick, K. He, and P. Dollár, “Panoptic feature pyramid networks,” in
IEEE Conference on ComputerVision and Pattern Recognition, (2019), pp. 6399–6408.58. S. Geissbuehler, N. L. Bocchio, C. Dellagiacoma, C. Berclaz, M. Leutenegger, and T. Lasser, “Mapping molecularstatistics with balanced super-resolution optical fluctuation imaging (bsofi),” Opt. Nanoscopy , 4 (2012).59. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility tostructural similarity,” IEEE Transactions on Image Process. , 600–612 (2004).60. D. M. Rouse and S. S. Hemami, “Understanding and simplifying the structural similarity metric,” in IEEE InternationalConference on Image Processing, (2008), pp. 1188–1191.61. Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in
TheThrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2 (2003), pp. 1398–1402.62. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprintarXiv:1409.1556 (2014).63. H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEETransactions on Comput. Imaging , 47–57 (2016).64. R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital image processing using MATLAB (Pearson Education India,2004).65. K. Agarwal, R. Macháň, and D. K. Prasad, “Non-heuristic automatic techniques for overcoming low signal-to-noise-ratio bias of localization microscopy and multiple signal classification algorithm,” Sci. Reports , 1–14(2018)., 1–14(2018).