Noise Reduction in X-ray Photon Correlation Spectroscopy with Convolutional Neural Networks Encoder-Decoder Models
Tatiana Konstantinova, Lutz Wiegart, Maksim Rakitin, Anthony M. DeGennaro, Andi M. Barbour
Brookhaven National Laboratory, NSLS-II, Upton, NY 11973, USA
+ [email protected]
* [email protected]

ABSTRACT
Like other experimental techniques, X-ray Photon Correlation Spectroscopy is subject to various kinds of noise. Random and correlated fluctuations and heterogeneities can be present in a two-time correlation function and obscure the information about the intrinsic dynamics of a sample. Simultaneously addressing the disparate origins of noise in the experimental data is challenging. We propose a computational approach for improving the signal-to-noise ratio in two-time correlation functions that is based on Convolutional Neural Network Encoder-Decoder (CNN-ED) models. Such models extract features from an image via convolutional layers, project them to a low-dimensional space and then reconstruct a clean image from this reduced representation via transposed convolutional layers. Not only are ED models a general tool for random noise removal, but their application to low signal-to-noise data can enhance the data's quantitative usage, since they are able to learn the functional form of the signal. We demonstrate that CNN-ED models trained on real-world experimental data help to effectively extract equilibrium dynamics parameters from two-time correlation functions containing statistical noise and dynamic heterogeneities. Strategies for optimizing the models' performance and their applicability limits are discussed.
Introduction
Noise reduction in experiments facilitates reliable extraction of useful information from a smaller amount of data. This allows for more efficient use of experimental and analytical resources as well as enables the study of systems with intrinsically limited measurement time, e.g. cases with sample damage or out-of-equilibrium dynamics. While instrumentation development and optimization of experimental protocols are crucial in noise reduction, there are situations where computational methods can advance the improvements even further.

X-ray Photon Correlation Spectroscopy (XPCS) is a statistics-based technique that extracts information about a sample's dynamics through spatial and temporal analysis of intensity correlations between sequential images (frames) of a speckle pattern collected from a coherent X-ray beam scattered from the sample. The two-time intensity-intensity correlation function (2TCF) is a matrix calculated as:

C(\mathbf{q}, t_1, t_2) = \frac{\langle I(\mathbf{q}, t_1)\, I(\mathbf{q}, t_2) \rangle}{\langle I(\mathbf{q}, t_1) \rangle \, \langle I(\mathbf{q}, t_2) \rangle}   (1)

where I(\mathbf{q}, t) is the intensity of a detector pixel corresponding to the wave vector q at time t. The average is taken over pixels with equivalent q values. An example of a 2TCF is shown in Fig. 1. The dimensions of the matrix are N × N, where N is the number of frames in the experimental series. The dynamics can be traced along the lag times δt = |t₁ − t₂|. In the case of equilibrium dynamics, information from a 2TCF can be 'condensed' to a single dimension by integrating along the (1,1) diagonal, producing a time-averaged one-time photon correlation function (1TCF):

C(\mathbf{q}, \delta t) = C_\infty + \beta\, |f(\mathbf{q}, \delta t)|^2   (2)

where f(\mathbf{q}, \delta t) is the intermediate scattering function at lag time δt, β is the optical contrast and C∞ is the baseline, which equals 1 for ergodic samples. While the 1TCF can be directly obtained from raw data, calculating the 2TCF as an intermediate step is beneficial even for presumably equilibrium cases. The 2TCF contains time-resolved information about both the sample's intrinsic dynamics and fluctuations of the experimental conditions, which enables one to distinguish between stationary and non-stationary dynamics and to judge whether or not the time-averaged 1TCF is a valid representation of the scattering series. Investigation of the 2TCF helps to identify single-frame events, such as cosmic-ray detection, and beam-induced dynamics, where timescales might vary with the accumulation of X-ray dose absorbed by the sample during the acquisition of the dataset.

XPCS experiments can suffer from various sources of noise and artifacts: the probabilistic nature of photon scattering, detector shot noise, and instrumental instabilities. Significant progress in reducing the noise involved in photon detection and counting has been made by developing single-photon counting devices and by employing the 'droplet' algorithm or pixel binning. Efforts have been dedicated to integrating feedback loops into instrumentation controls to reduce the impact of instabilities. Despite the current advances of experimental setups and methods for data analysis in reducing noise and instability effects, achieving a high signal-to-noise ratio is still a practical challenge in many XPCS experiments. The need to suppress high-frequency fluctuations leads to extended data collection times – an approach that itself can introduce additional errors, for instance slow changes in experimental conditions.
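For concreteness, the following minimal sketch (not part of the original analysis pipeline; the function and variable names are illustrative) shows how Eqs. (1) and (2) can be evaluated numerically for a stack of detector intensities belonging to a single q-bin.

```python
import numpy as np

def two_time_cf(intensities):
    """Two-time correlation function, Eq. (1), for one q-bin.

    intensities: array of shape (N_frames, N_pixels) with the photon counts
    of all detector pixels assigned to the same wave vector q.
    Returns the (N_frames, N_frames) matrix C(q, t1, t2).
    """
    mean_t = intensities.mean(axis=1)                          # <I(q, t)> over pixels
    num = intensities @ intensities.T / intensities.shape[1]   # <I(q, t1) I(q, t2)>
    return num / np.outer(mean_t, mean_t)

def one_time_cf(c2):
    """Time-averaged 1TCF: average of C(t1, t2) along the (1,1) diagonal."""
    return np.array([np.diagonal(c2, offset=k).mean() for k in range(c2.shape[0])])
```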
Limited experimental resources may not allow for multiple repeated measurements for systems with very slow dynamics. Besides, a sample's intrinsic properties can limit the time range within which the dynamics can be considered equilibrium and thus quantitatively evaluated with Eq. 2. A tool that helps to accurately extract parameters of a system's equilibrium dynamics from a limited amount of noisy data would be useful, but no generally applicable, out-of-the-box tool exists for XPCS results.

Solutions based on artificial neural networks are attractive candidates, as they are broadly used for application-specific noise removal. Among such solutions are extensions of autoencoder models, which are unsupervised algorithms for learning a condensed representation of an input signal. The principle behind an autoencoder is that the information about significant non-random variations in data is contained in a much smaller number of variables than the dimensionality of the data. An autoencoder model consists of two modules: an encoder and a decoder. The encoder transforms the input signal to a set of unique variables called the latent space. The decoder part then attempts to transform the encoded variables back to the original input. As the number of components in the latent space is generally much smaller than the number of components in the original input, the nonessential information, i.e. random noise, is lost during such transformations. Thus, an autoencoder model on its own can be used as an effective noise reduction tool. However, in the scope of this work we employ a broader idea of noise. We treat all dynamic heterogeneities due to changes in a sample configuration caused by stress or diffusive processes, as well as correlated noise in the 2TCF, as an unwanted signal. An autoencoder model can be modified to address the removal of deterministic, application-specific noise by replacing its targets with 'noise-free' versions of the input signals. In the case of an image-like input, such as an XPCS 2TCF, convolutional neural networks (CNNs) are the obvious choice for the encoder and decoder modules. CNN-based encoder-decoder (CNN-ED) models have been successfully implemented for noise removal and restoration of impaired signals in audio applications and images.

Here, we demonstrate an approach for noise reduction in 2TCFs by means of CNN-ED models. An ensemble of such models, trained on real experimental data, shows noticeable suppression of noise while preserving the functional form of a system's equilibrium dynamics and the temporal resolution of the signal. Addressing noise removal in the 2TCF instead of the scattering signal at the detector makes the approach agnostic to the type of the registering device, the size of the selected area, the shape of the speckles, the intensity of the scattering signal and the exposure time, enabling the models' application to a wide range of XPCS experiments.

Results
Data Processing.
We train our models using data from measurements of the equilibrium dynamics of nanoparticle-filled polymer systems conducted at the Coherent Hard X-ray Scattering (CHX) beamline at NSLS-II. For the nanoparticles' dynamics Eq. 2 takes the form:

C(\mathbf{q}, \delta t) = C_\infty + \beta\, e^{-2(\Gamma \delta t)^\alpha}   (3)

where Γ is the rate of the dynamics and α is the compression constant. The baseline C∞ is nearly 1 in the considered cases. A typical experiment contains a series of 250–1000 frames. Multiple detector regions of interest (ROIs) corresponding to equivalent wave vectors are analyzed for each series and 2TCFs are calculated for each ROI. For each model datum, or example, the input image is obtained by cropping a 50x50-pixel part from a 2TCF with its center on the (1,1) diagonal, starting at the lower left corner, as shown in Fig. 1(A). Each next datum is obtained by shifting the center of the cropped image along the diagonal. The target image for each example is the average of all the cropped inputs extracted from the same 2TCF. Thus, groups of 5 to 20 inputs have the same target. While the target images still contain noise, its level is significantly reduced with respect to that of the input images. Here, the size of 50x50 pixels is chosen because for the majority of the examples in our dataset the dynamics' parameters can be inferred from the first 50 frames. However, any size can be selected to train a model with no modification to its architecture, if enough data are available.

The diagonal (lag=0) 2TCF values of the raw data reflect a normalized variance of the photon count. Such values are vastly different between experiments and detector ROIs. They can by far exceed the values of photon correlation between frames (typically on a scale between 1 and 1.5) and are usually excluded from the traditional XPCS analysis. To prevent the influence of the high diagonal 2TCF values on the model cost function, we fill the pixels along the diagonal with values randomly drawn from the distribution of 2TCF values at lag=1. In doing so, we avoid artificial discontinuities in the images.

For a proper model training process all the input data should be brought to the same scale. However, commonly applied standard scaling is not suitable for the present case, as the level of noise may affect the values of essential parameters such as the baseline and contrast. Instead, each example is normalized using the contrast β: the contrast, or speckle visibility, is calculated for each frame and averaged among all frames in the series, and the input is rescaled so that its contrast becomes unity. After processing, the data are split into the training, validation and test sets as shown in Table 1. The splitting is done in such a way that no two inputs from different sets have the same target.
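A minimal sketch of this preparation step is given below, assuming a single 2TCF array as input. The non-overlapping 50-frame blocks reproduce the '5 to 20 inputs per target' for series of 250–1000 frames; writing the normalization as (input − 1)/β is one plausible reading of the scaling described above, and all names are illustrative.

```python
import numpy as np

def make_examples(c2, size=50, rng=np.random.default_rng(0)):
    """Cut non-overlapping size x size blocks centered on the (1,1) diagonal.

    Returns (inputs, target): the individual crops and their common target,
    i.e. the average of all crops taken from the same 2TCF.
    """
    crops = []
    for start in range(0, c2.shape[0] - size + 1, size):
        block = c2[start:start + size, start:start + size].copy()
        # The lag=0 diagonal reflects the normalized variance of the photon
        # count; replace it with values drawn from the lag=1 distribution.
        lag1 = np.diagonal(block, offset=1)
        np.fill_diagonal(block, rng.choice(lag1, size=size))
        crops.append(block)
    inputs = np.stack(crops)
    return inputs, inputs.mean(axis=0)

def normalize(example, beta):
    """Rescale an example by the contrast beta (speckle visibility) so that the
    scaled contrast is unity; the form (input - 1) / beta is an assumption."""
    return (example - 1.0) / beta
```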
Model Training.

The ED model architecture used in this work is shown in Fig. 2. The encoder part consists of two convolutional layers with kernel size 1 × 1. Larger kernel sizes do not further improve the performance of the model, which is expected since the input images do not have sharp edges or distinct features. The first convolutional layer has 10 channels and the second layer has 20 channels, with a rectified linear unit (ReLU) activation function applied to the output of each channel. No pooling layers were introduced, to prevent information loss at the encoding stage. The output of the convolutional layers contains 50,000 features. A linear transformation converts them to a latent space of much smaller dimension. The decoder part consists of two transposed convolutional layers that convert the latent space back to a 50 × 50 image.
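The following PyTorch sketch mirrors this description (two 1 × 1 convolutions with 10 and 20 channels, no pooling, a linear map between the 50,000 convolutional features and the latent space, and two transposed convolutions back to a single-channel 50 × 50 image). The decoder channel widths and other unspecified details are assumptions for illustration rather than the exact configuration used here.

```python
import torch
import torch.nn as nn

class CNNED(nn.Module):
    """Sketch of the CNN encoder-decoder described in the text."""

    def __init__(self, latent_dim=200, side=50):
        super().__init__()
        self.side = side
        self.encoder_conv = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=1), nn.ReLU(),
            nn.Conv2d(10, 20, kernel_size=1), nn.ReLU(),
        )
        self.to_latent = nn.Linear(20 * side * side, latent_dim)
        self.from_latent = nn.Linear(latent_dim, 20 * side * side)
        self.decoder_conv = nn.Sequential(
            nn.ConvTranspose2d(20, 10, kernel_size=1), nn.ReLU(),
            nn.ConvTranspose2d(10, 1, kernel_size=1),
        )

    def forward(self, x):                      # x: (batch, 1, 50, 50)
        z = self.encoder_conv(x)               # -> (batch, 20, 50, 50)
        z = self.to_latent(z.flatten(start_dim=1))
        y = self.from_latent(z).view(-1, 20, self.side, self.side)
        return self.decoder_conv(y)            # -> (batch, 1, 50, 50)
```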
The cost function used for optimizing the model weights is the mean squared error (MSE) between the model's output and the target, which has been shown to be useful for image denoising even when some noise is still present in the target. To avoid over-fitting, early stopping based on the cost function calculated for the validation set is adopted.

However, the MSE of the validation set is not the only parameter to consider when selecting the optimal parameters for the model. For example, the MSE of the validation set does not have an obvious minimum for models with latent space dimensions between 2 and 200. During traditional XPCS data processing, not only is the visual representation of the 2TCF important, but the values of the dynamics' parameters, such as β, Γ, α and C∞, are essential to drawing scientific conclusions. An efficient model would precisely recover these parameters for a broad set of observations. We thus select the optimal dimension based on the generalizability of the model, i.e. the size of the range of dynamics' parameters for which the application of the model allows one to accurately extract the parameters. Here, the rate of the dynamics Γ is the most important parameter to consider, since the variation of β is taken care of by the pre-processing normalization and the variations in α and C∞ are naturally very small in the considered applications. Since only a limited number of validation examples from the real experiments is available, we generate 2,000 examples of 2TCFs from random parameters of Γ, C∞, α and the noise level to test the performance of the model over a broad parameter space (a sketch of such a generator is shown below). The dynamics parameters and the amplitude of the Gaussian noise are uniformly distributed over the range of values that can be encountered in the experimental data. The contrast β is set to unity.

To reduce the variance associated with the randomness of the initial weights, we train 10 models with different random initializations for each latent space dimension. For each of the 2,000 generated examples, the output of each model is converted to a 1TCF, the results of models with the same latent space dimension are averaged and then fit to Eq. 3. By comparing the extracted fit parameters with the respective values used to generate the examples, we identify the span of Γ values where the models perform reasonably. The results for ensembles of 10 models with different latent space dimensions are shown in Fig. 3. The convenience of such a comparison is that it allows one to distinguish between models with high bias and high variance. The models with larger latent space dimensions have less bias and perform well for more diverse sample dynamics than models with fewer latent variables. For quantitative analysis of the models' performance see Methods. The ensemble with the latent space dimension of 200 was selected because it demonstrates the smallest bias for the broadest range of Γ. Models with larger latent space dimensions show no apparent increase of the applicability range. To address the high variance of the CNN-ED with the latent space dimension of 200, we train 76 such models with different random initializations and select among them the 10 best performing models based on the MSE of the validation set. Selecting only a limited number of the best performing models instead of combining all trained models also optimizes the use of storage memory and computational resources.
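The sketch below illustrates such a generator, building equilibrium 2TCFs directly from Eq. 3 with β = 1 and adding pixel-wise Gaussian noise; the parameter ranges are illustrative placeholders, not the bounds actually used.

```python
import numpy as np

def synthetic_2tcf(n=50, gamma=0.05, alpha=1.0, baseline=1.0, noise=0.1,
                   rng=np.random.default_rng()):
    """Equilibrium 2TCF from Eq. (3) with beta = 1 plus Gaussian noise."""
    t = np.arange(n)
    lag = np.abs(t[:, None] - t[None, :])
    clean = baseline + np.exp(-2.0 * (gamma * lag) ** alpha)
    return clean + rng.normal(scale=noise, size=(n, n))

# 2,000 random examples for the generalizability check (illustrative ranges).
rng = np.random.default_rng(1)
examples = [synthetic_2tcf(gamma=rng.uniform(0.005, 0.3),
                           alpha=rng.uniform(0.8, 1.6),
                           baseline=rng.uniform(0.98, 1.02),
                           noise=rng.uniform(0.02, 0.3),
                           rng=rng)
            for _ in range(2000)]
```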
Model Testing.

The performance of the final ensemble of models is evaluated with the test set. An example of noise removal from a test datum is shown in Fig. 4. Reduction of the noise is especially important for larger lag times, where fewer scattering frames are available for calculating the correlations.

As mentioned above, despite the MSE cost function working well for determining the optimal weights for a model, it is not sufficient to assess the reliability of the model output for quantitative analysis of the materials' dynamics. We assess the performance of the ensemble by comparing the fits with Eq. 3 for the 1TCFs calculated from the cropped 50 × 50 raw data (inputs), the corresponding denoised ensemble outputs and the full-range raw data (ground truth target) (see the Methods section; a numerical sketch of this fitting step is given at the end of this subsection). From the results of the test set, noise removal from the raw cropped 2TCFs with the CNN-ED ensemble noticeably improves the precision of the dynamics' parameters, compared with fitting the raw cropped 2TCFs, in a wide range of cases (i.e. when the contrast drops by half within the first 8-50% of the frames). The results of the ensemble allow one to get reasonable estimates even in cases when the low signal-to-noise ratio of the cropped data prevents convergent fits within the parameter boundaries. For faster dynamics, the results of the models generally perform no worse than the raw data. Note that the precision of the models' results depends on the 2TCF noise level and the accuracy of identifying the optical contrast from the speckle visibility.
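A minimal sketch of this step is given below, assuming the denoised output has already been averaged over the ensemble and reduced to a 1TCF by diagonal averaging (e.g. with one_time_cf above); the starting values and parameter bounds other than the Γ range are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched_exp(lag, gamma, alpha, beta, baseline):
    """Eq. (3): C(lag) = C_inf + beta * exp(-2 * (gamma * lag)**alpha)."""
    return baseline + beta * np.exp(-2.0 * (gamma * lag) ** alpha)

def fit_1tcf(c1):
    """Fit a 1TCF (the lag = 0 point is excluded) and return the parameters."""
    lag = np.arange(len(c1))
    p0 = (0.05, 1.0, 1.0, 1.0)                       # Gamma, alpha, beta, C_inf
    bounds = ([0.0, 0.2, 0.0, 0.9], [0.5, 2.5, 2.0, 1.1])
    popt, _ = curve_fit(stretched_exp, lag[1:], c1[1:], p0=p0, bounds=bounds)
    return dict(zip(("gamma", "alpha", "beta", "baseline"), popt))
```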
Comparison to Other Techniques.

We compare the performance of our approach to several off-the-shelf solutions for noise reduction in images: linear principal-component-based, Gaussian, median and total variation denoising (Chambolle's projection) filters. The application of these techniques to the same test example as in Fig. 4 is compared in Fig. 5. Principal component filters follow the same idea as the ED model – preserving only the information from a few essential components of the original data. In fact, an autoencoder is a type of non-linear principal component generator. As one would expect, a filter based on linear principal components underperforms compared to the case of non-linear components, due to the larger bias of the procedure for extracting the components. Gaussian and median filters are based on smoothing the intensity fluctuations between neighboring pixels, and total variation denoising is a regularized minimization of additive normally distributed noise. While these approaches help to reduce pixel-by-pixel intensity variations, unlike the CNN-ED models demonstrated here, they do not learn the functional form of the equilibrium 2TCF images and are not improved by having a larger training set. Moreover, noise removal with the above filters can introduce false trends in the 1TCF, which makes them unsuitable for quantitative XPCS data analysis. Application of the CNN-ED models removes the dynamics' heterogeneities from the raw 2TCF and does not introduce nonequilibrium artifacts in the data.
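For reference, the smoothing filters named above can be applied with standard scipy/scikit-image routines, as in the sketch below; the filter parameters are illustrative and the linear principal-component filter, which operates on a collection of examples rather than a single image, is omitted.

```python
from scipy.ndimage import gaussian_filter, median_filter
from skimage.restoration import denoise_tv_chambolle

def classical_denoisers(c2):
    """Off-the-shelf filters compared in Fig. 5 (illustrative settings)."""
    return {
        "gaussian": gaussian_filter(c2, sigma=2),
        "median": median_filter(c2, size=5),
        "tv_chambolle": denoise_tv_chambolle(c2, weight=0.1),
    }
```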
Discussion

The CNN-ED approach to noise removal in XPCS shows a reasonable improvement in the quality of the signal, allowing for quantification of a sample's dynamics from a limited amount of data, avoiding extensive data collection and giving access to narrow quasi-equilibrium regions of nonequilibrium dynamics. The CNN-ED models go beyond a simple smoothing of intensity between neighboring pixels and empirically learn the structural form of the 2TCF. The models are fast to train and do not require an extensive amount of training data. The accuracy of the models is fairly robust with respect to the choice of hyperparameters, such as the convolutional kernel size and the latent space dimension. The computational resources required for the application of the ensemble of 10 models (excluding the calculation of the speckle visibility) are smaller than one would need to calculate 2TCFs for the number of frames typically required to achieve the same signal-to-noise ratio.

There are several limitations to keep in mind when applying CNN-ED models to a 2TCF. The testing results show that the models may not reliably remove the noise for the cases of very fast and very slow dynamics (see the schematic diagram in Fig. 6). The limiting values of the dynamics' rates depend on the size N_in of the input 2TCF (here, an input size of 50 × 50 frames is considered) and the level of noise in the raw data. Besides, before relying on extracting information from the first N_in frames, one should have prior knowledge of the type of the system's dynamics. Some essential dynamics can lie outside of the first N_in frames even when a substantial drop of speckle visibility is observed there. Examples include periodic modulations of correlation functions during film growth, advective motion in sedimenting suspensions and shear flow of colloidal particles.

In this work, only equilibrium dynamics described by stretched exponents with a baseline of 1 were considered. However, the demonstrated approach to noise removal can be expanded to other types of dynamics – static cases, sample aging – given a sufficient amount of data for training. Even in the absence of proper denoised target data, the autoencoder version of the model can significantly reduce the random noise. Besides, a CNN-ED model can be trained to correct for specific types of artifacts, such as the impact of instrumental instabilities or outlier frames, leading to more efficient use of experimental user facilities. In a broader scope, modifications of the CNN-ED models presented here have potential applications in automated XPCS data collection and processing pipelines. Similarly to other fields, the autoencoder models can be used for identifying unusual observations in the stream of XPCS data. Additionally, the encoded low-dimensional representation of the 2TCF can be used for classification, regression and clustering tasks related to samples' dynamics.

Methods
Model Training Details.
The cost function used for training the models is the mean squared error (MSE) between the target 2TCF and the models' output:

\mathrm{cost} = \frac{1}{m} \sum_{k=1}^{m} \lVert x_k^{\mathrm{out}} - x_k^{\mathrm{target}} \rVert^2   (4)

where x_k^out is the model output for the k-th training example, x_k^target is the corresponding target, m is the number of examples, and ‖·‖ denotes the Euclidean norm over the image pixels.

At every training epoch, batches of size 8 are processed. The Adam optimizer with an initial learning rate of 0.001 is used. The learning rate is reduced by a factor of 0.9995 at every epoch. Initial weights in the convolutional and linear layers are assigned according to Xavier uniform initialization. The models are trained with an Nvidia GeForce RTX 2070 Super GPU accelerator. Typical training time per epoch is 3.2–6.7 seconds and depends on the dimensionality of the latent space and the kernel size. For the best selected CNN-ED configuration, the average training time is 5.8 seconds per epoch, with 7–29 epochs necessary to train a model.
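A sketch of this training loop is shown below; it follows the stated settings (batch size 8, Adam with learning rate 0.001, a 0.9995 per-epoch learning-rate factor, Xavier uniform initialization, early stopping on the validation MSE), while the maximum epoch count, early-stopping patience and data-handling details are assumptions.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader

def init_weights(module):
    """Xavier uniform initialization for convolutional and linear layers."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)

def train(model, train_set, val_inputs, val_targets,
          max_epochs=50, patience=3, device="cuda"):
    model.to(device).apply(init_weights)
    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = ExponentialLR(optimizer, gamma=0.9995)     # per-epoch decay
    loss_fn = nn.MSELoss()
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x.to(device)), target.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()
        model.eval()
        with torch.no_grad():
            val_mse = loss_fn(model(val_inputs.to(device)),
                              val_targets.to(device)).item()
        # Early stopping on the validation MSE (patience is an assumption).
        best, stale = (val_mse, 0) if val_mse < best else (best, stale + 1)
        if stale >= patience:
            break
    return model
```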
Selecting Optimal Latent Space Dimension.

As described in the main text, the latent space dimension for the models is selected based on how accurately one can recover the dynamics' parameters from the denoised data. A quantitative measure of the bias in recovering Γ is the absolute mean error (AME):

\mathrm{AME}_\Gamma = \left| \mathrm{avg}\left( \Gamma_{\mathrm{fit}} - \Gamma_{\mathrm{target}} \right) \right|   (5)

Since the parameter Γ can be over-estimated (positive mean error) or under-estimated (negative mean error), the absolute value of the mean error needs to be considered. We look at the values of AME_Γ, calculated with Eq. 5, in different regions of Γ. Results where the optimal fit is outside of the Γ bounds [0, 0.5] frames⁻¹ are excluded from calculating the errors. Figure 7 shows that AME_Γ is significantly reduced for models with 18 or more variables in the latent space. For the majority of Γ regions, the models with a latent space dimension of 200 have the smallest AME_Γ, indicating low bias of these models.

The mean squared error of Γ, calculated as

\mathrm{MSE}_\Gamma = \mathrm{avg}\left( \left( \Gamma_{\mathrm{fit}} - \Gamma_{\mathrm{target}} \right)^2 \right)   (6)

reflects both the bias and the variance of the model's results. Figure 7 shows that generally MSE_Γ drops for the models with 18 latent variables and then flattens, except for cases with very fast dynamics.

Due to the advantageous performance of the models with 200 latent variables across a broad range of the dynamics' parameters, we select such models for our final result. To further reduce the variance of the models, an ensemble of the 10 best performing CNN-EDs (out of 76) is constructed.
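A compact sketch of Eqs. (5) and (6), with the exclusion of fits that land outside the allowed Γ range; the names are illustrative.

```python
import numpy as np

def gamma_errors(gamma_fit, gamma_target, bounds=(0.0, 0.5)):
    """AME (Eq. 5) and MSE (Eq. 6) of the recovered rate Gamma.

    Fits outside the allowed Gamma range [0, 0.5] frames^-1 are excluded.
    """
    gamma_fit, gamma_target = np.asarray(gamma_fit), np.asarray(gamma_target)
    keep = (gamma_fit >= bounds[0]) & (gamma_fit <= bounds[1])
    diff = gamma_fit[keep] - gamma_target[keep]
    return abs(diff.mean()), (diff ** 2).mean()
```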
Finding Limiting Cases.

We test the performance of the best ensemble (10 models, latent space dimension 200) by comparing the dynamics' parameters (Eq. 3) extracted from the raw model inputs, i.e. cropped parts of the experimental 2TCFs, and from the results of denoising with the ensemble of CNN-EDs, to the parameters extracted from the full-range 2TCFs (the ground truth values).

One can see from Fig. 8 that for the validation set the rate Γ is extracted from the ensemble output with good precision as long as the dynamics are not too fast (the contrast drops by half in 4 or more frames). For faster dynamics, the variance of the Γ extracted from denoised data is similar to that of the Γ extracted from the raw data. Other dynamics' parameters are generally extracted with better precision from the denoised data than from the raw data. Note, the precision of β largely depends on the accuracy of extracting the speckle visibility at lag=0 from the photon distribution, which serves as the normalization parameter. A similar situation is observed for the test set (Fig. 9 and Fig. 10), indicating good generalizability of the model. Besides, the results for the test set establish a lower boundary on Γ, above which an output of the CNN-ED ensemble generally leads to more precise dynamics parameters than the raw 2TCF. Below this boundary, only an insignificant portion of the dynamics is complete by the 50th frame (the contrast drops by half in 23 or more frames) and the dynamics' parameters usually cannot be accurately identified from the available data. For the cases within the constraints on Γ, poor accuracy in identifying dynamics' parameters is observed for inputs with a very high noise level and/or the presence of well pronounced dynamical heterogeneities.
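As a reading aid for the 'contrast drops by half in N frames' statements above (this conversion follows from Eq. 3 rather than being stated explicitly in the text), the half-decay lag is

t_{1/2} = \frac{1}{\Gamma}\left(\frac{\ln 2}{2}\right)^{1/\alpha} \approx \frac{0.35}{\Gamma} \quad (\alpha = 1),

so, for example, a rate of Γ = 0.035 frames⁻¹ with α = 1 corresponds to the contrast dropping by half within roughly 10 frames.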
The authors thank A. Fluerasu and M. Fukuto for fruitful discussions. This research used the CHX and CSX beamlines and resources of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory (BNL) under Contract No. DE-SC0012704 and under the BNL Laboratory Directed Research and Development (LDRD) project 20-038 "Machine Learning for Real-Time Data Fidelity, Healing, and Analysis for Coherent X-ray Synchrotron Data".
References

1. Madsen, A., Fluerasu, A. & Ruta, B. Structural Dynamics of Materials Probed by X-Ray Photon Correlation Spectroscopy, 1617–1641 (Springer International Publishing, 2016).
2. Shpyrko, O. G. X-ray photon correlation spectroscopy. J. Synchrotron Radiat., 1057–1064 (2014).
3. Sinha, S. K., Jiang, Z. & Lurio, L. B. X-ray photon correlation spectroscopy studies of surfaces and thin films. Adv. Mater., 7764–7785 (2014).
4. Brown, G., Rikvold, P. A., Sutton, M. & Grant, M. Speckle from phase-ordering systems. Phys. Rev. E, 6601 (1997).
5. Madsen, A., Leheny, R. L., Guo, H., Sprung, M. & Czakkel, O. Beyond simple exponential correlation functions and equilibrium dynamics in x-ray photon correlation spectroscopy. New J. Phys., 055001 (2010).
6. Li, L. et al. Photon statistics and speckle visibility spectroscopy with partially coherent X-rays. J. Synchrotron Radiat., 1288–1295, DOI: 10.1107/S1600577514015847 (2014).
7. Lumma, D., Lurio, L. B., Mochrie, S. G. J. & Sutton, M. Area detector based photon correlation in the regime of short data batches: Data reduction for dynamic x-ray scattering. Rev. Sci. Instrum., 3274–3289, DOI: 10.1063/1.1287637 (2000).
8. Grybos, P., Kmon, P., Maj, P. & Szczygiel, R. 32k channels readout IC for single photon counting detectors with 75 µm pitch, ENC of 123 e- rms, 9 e- rms offset spread and 2% rms gain spread. 1–4, DOI: 10.1109/BioCAS.2015.7348438 (2015).
9. Llopart, X., Campbell, M., Dinapoli, R., San Segundo, D. & Pernigotti, E. Medipix2: A 64-k pixel readout chip with 55-µm square elements working in single photon counting mode. IEEE Trans. Nucl. Sci., 2279–2283 (2002).
10. Livet, F. et al. Using direct illumination CCDs as high-resolution area detectors for x-ray scattering. Nucl. Instrum. Methods Phys. Res. Sect. A, 596–609 (2000).
11. Falus, P., Lurio, L. & Mochrie, S. Optimizing the signal-to-noise ratio for x-ray photon correlation spectroscopy. J. Synchrotron Radiat., 253–259 (2006).
12. Kongtawong, S. et al. Recent improvements in beam orbit feedback at NSLS-II. Nucl. Instrum. Methods Phys. Res. Sect. A.
13. Strocov, V. et al. High-resolution soft x-ray beamline ADRESS at the Swiss Light Source for resonant inelastic x-ray scattering and angle-resolved photoelectron spectroscopies. J. Synchrotron Radiat., 631–643 (2010).
14. Kramer, M. A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J., 233–243 (1991).
15. Grais, E. M. & Plumbley, M. D. Single channel audio source separation using convolutional denoising autoencoders. 1265–1269 (IEEE, 2017).
16. Park, S. R. & Lee, J. A fully convolutional neural network for speech enhancement. arXiv preprint arXiv:1609.07132 (2016).
17. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. & Efros, A. A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2536–2544 (2016).
18. Mao, X.-J., Shen, C. & Yang, Y.-B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. arXiv preprint arXiv:1603.09056 (2016).
19. Lehtinen, J. et al. Noise2Noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189 (2018).
20. Duran, J., Coll, B. & Sbert, C. Chambolle's projection algorithm for total variation denoising. Image Processing On Line, 311–331 (2013).
21. Ju, G. et al. Coherent x-ray spectroscopy reveals the persistence of island arrangements during layer-by-layer growth. Nat. Phys., 589–594 (2019).
22. Möller, J. & Narayanan, T. Velocity fluctuations in sedimenting Brownian particles. Phys. Rev. Lett., 198001 (2017).
23. Burghardt, W. R., Sikorski, M., Sandy, A. R. & Narayanan, S. X-ray photon correlation spectroscopy during homogenous shear flow. Phys. Rev. E, 021402 (2012).
24. Campbell, S. et al. Outlook for artificial intelligence and machine learning at the NSLS-II. Mach. Learn.: Sci. Technol. (2020).
25. Baur, C., Wiestler, B., Albarqouni, S. & Navab, N. Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 161–169 (Springer International Publishing, Cham, 2019).
26. Chong, Y. S. & Tay, Y. H. Abnormal event detection in videos using spatiotemporal autoencoder. In Advances in Neural Networks – ISNN 2017, 189–196 (Springer International Publishing, Cham, 2017).
27. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
28. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256 (2010).
Figure 1. Data for the model. (A) 2TCF for an experimental series consisting of 400 frames. Red squares show examples of regions selected for the model training. The yellow arrow shows the temporal direction t of the system's dynamics. The yellow solid line shows the 1TCF along t, calculated from the 2TCF. (B) Example of a 50 × 50 2TCF passed as an input to the model. (C) Example of the target data for the model, obtained by averaging multiple 50 × 50 diagonal sections of the 2TCF. All images have the same intensity scale.

Table 1. Distribution of examples between training, validation and test sets.

                  Training   Validation   Test
Unique Inputs       4692        1076       972
Unique Targets       373         109        95

Figure 2. Architecture of the CNN-ED model. The input and the output images have the same intensity scale.

Figure 3. Selection of the latent space dimension using the set of 2,000 artificially generated data. The rates of the dynamics, Γ_fit, extracted from the outputs of the ensembles of ten CNN-ED models with latent space dimensionality 2, 10, 80 and 200, are plotted versus the corresponding rates Γ_real used to generate the inputs for the models.

Figure 4. Example of 2TCF denoising with the CNN-ED models. (A) From left to right: the raw 2TCF obtained from the data; the 2TCF averaged among all frames in the dataset; the result of denoising the raw 2TCF with the ensemble of 10 CNN-ED models. (B) 1TCF calculated from each 2TCF in (A). The dashed line corresponds to a baseline C∞ = 1.

Figure 5. Comparison of various noise removal techniques applied to an example from the test set. Top row: results of applying the filters to the raw 2TCF; middle row: 1TCFs calculated from the 2TCFs for the raw input (blue dashed line), the results of the respective filters (green solid line) and the target (solid orange line); bottom row: residuals of the 1TCF calculated from the example after denoising with the respective filters.

Figure 6. Schematics of the model performance depending on the rate Γ and the noise level in the 2TCF.

Figure 7. Measures of accuracy of determining the dynamics' parameter Γ from denoised data. The AMEs (A) and the MSEs (B) for different regions of Γ (in units of inverse frame count).

Figure 8. Extracting dynamics parameters from raw (blue) and denoised (red) 50 × 50 2TCFs from the validation set. Green triangles correspond to cases where the fit for the raw data did not converge within the parameters' boundaries. The horizontal axis corresponds to the values extracted from full-sized 2TCFs. Under-fitting of α is observed for examples with a very high noise level and the presence of dynamics' heterogeneity. Γ is given in units of frames⁻¹ and can be converted to appropriate inverse time units.

Figure 9. Extracting dynamics parameters from raw (blue) and denoised (red) 50 × 50 2TCFs from the test set. Green triangles correspond to cases where the fit for the raw data did not converge within the parameters' boundaries. The horizontal axis corresponds to the values extracted from full-sized 2TCFs. Γ is given in units of frames⁻¹ and can be converted to appropriate inverse time units.

Figure 10. Extracting dynamics parameters from raw (blue) and denoised (red) 50 × 50 2TCFs from the test set. Green triangles correspond to cases where the fit for the raw data did not converge within the parameters' boundaries. The horizontal axis corresponds to the values extracted from full-sized 2TCFs. Only cases with Γ_real above the lower applicability boundary are shown, to highlight the higher precision of the extracted parameters for these cases. Γ is given in units of frames⁻¹ and can be converted to appropriate inverse time units.