Adaptive deep learning for time-varying systems with hidden parameters: Predicting changing input beam distributions of compact particle accelerators
Alexander Scheinker, Frederick Cropp, Sergio Paiagua, Daniele Filippetto
DDemonstration of adaptive machine learning-based distribution tracking on a compact accelerator: Towards enabling model-based 6D non-invasive beam diagnostics
Alexander Scheinker,
Frederick Cropp,
Sergio Paiagua, and Daniele Filippetto * Corresponding author email: [email protected] Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, New Mexico 87545, USA Department of Physics and Astronomy, University of California Los Angeles, Los Angeles, California 90095, USA Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California 94720, USA (Dated: February 8, 2021)
Abstract : Intense charged particle beam dynamics are dominated by complex nonlinear collective effects such as space charge forces and coherent synchrotron radiation in a 6 dimensional (6D) phase space (x,y,z,p x ,p y ,p z ). Non-invasive 6D phase space diagnostics would be of great benefit to all particle accelerators. Physics models can potentially serve as non-invasive beam diagnostics, however the main challenges they face are uncertain parameters and beam distributions to be used for initial conditions which drift with time requiring lengthy measurements that interrupt accelerator operations. We present a fast adaptive machine learning method to automatically track time varying distributions and quantum efficiency maps and demonstrate the potential to predict 6D phase space. Particle accelerators are powerful tools for scientific research such as imaging of nuclear motion by ultrafast electron diffraction (UED) [1], imaging dynamic phenomenon at free electron lasers (FEL) [2], and generating GV/m accelerating fields in plasma wakefield accelerators (PWFA) [3]. Major challenges faced by advanced accelerators are the ability to quickly and automatically control the phase space distributions of accelerated beams such as custom current profiles, and switching between different experiments with order of magnitude changes in beam properties, which can require hours of hands-on tuning. The difficulties are due to limited non-invasive diagnostics, time variation of accelerator parameters, and complex time varying intense bunches. Intense bunches of 10 -10 particles are complex objects whose 6D phase space (x,y,z,p x ,p y ,p z ) dynamics are coupled through collective effects such as space charge forces and coherent synchrotron radiation [4]. Accelerators generate extremely short bunches: 30 fs bunches at the SwissFEL X-ray FEL [5], sub 100 fs bunches for UEDs [6], and picosecond bunch trains for UEDs and multicolor XFELs [7]. The High Repetition-rate Electron Scattering apparatus (HiRES, shown in Fig. 1) at Lawrence Berkeley National Laboratory (LBNL), accelerates pC-class, sub-picosecond long electron bunches up to one million times a second (MHz), providing some of the most dense 6D phase space among accelerators at unique repetition rates, making it an ideal test bed for advanced algorithm development [8,9]. Short intense bunches require advanced phase space manipulation techniques: laser shaping for uniform 3D ellipsoidal charge distributions [10], plasma wakefields to control longitudinal phase space (LPS) [11], and complex, 3D alternating-phase focusing optics for optimizing injection into dielectric-laser acceleration, increasing beam energy while stabilizing 6D phase space [12]. Such techniques rely on accurate beam dynamics models. Advances in modeling have demonstrated the potential of simulations to serve as non-invasive diagnostics for accelerator beams [13-19]. Models have been used to discover new physics such as stabilization of the hose instability in PWFA [20], and the prediction of the details of the final PS for a 180 pC electron bunch in the LCLS XFEL in the presence of microbunching instability [21]. However, even the most powerful simulation tools cannot accurately predict beam dynamics if they do not start with the correct initial conditions, which drift because particle accelerators are coupled to a changing environment. Full 6D measurements using conventional methods require hours [22], making it difficult to provide real-time phase space diagnostics for adaptive phase space tuning. Figure 1: Overview of the HiRES UED adapted from [8].
In this Letter, we propose a new method for real time adaptive tracking of time-varying accelerator beam phase space distributions, photocathode quantum efficiency (QE), and accelerator parameters based on measured data along the accelerator. We demonstrate our method on the HiRES UED beamline. Our general approach can be used at accelerators around the world for adaptively reconstructing beam distributions at various locations based on sufficiently rich measurements such as energy spread spectra or transverse deflecting cavity (TCAV) LPS measurements [23], which can be used to guide adaptive feedback. Our method has the potential to enable adaptive model-based virtual 6D phase space diagnostics for controlling particle accelerator beams at unprecedented speed and accuracy, optimizing accelerator applications and expanding scientific reach. Machine learning (ML) tools can learn complex relationships in dynamic systems. Recent ML applications include quantum Monte Carlo simulations [24], many-body contact interactions [25], and adaptive ML for LPS control with fs resolution [26]. In the present work we utilize an adaptive convolutional neural network (CNN) approach for mapping measurements directly to the learned principal components that represent our physical system. The first step was to collect data to learn a basis with which to represent a distribution of interest. In our case we were interested in representing the time-varying input beam distribution of the HiRES. We started by collecting 5 laser spot images on the HiRES photocathode over a period of 5 days of small transverse size of roughly 300×300 μm , an area small enough so that significant quantum efficiency variations were not expected and we could relate laser intensity directly to an initial electron bunch charge distribution. The five images were then each rotated through 360 deg at steps of 1 deg generating a total of 1800 52×52 pixel images with pixel resolution of 6.45 μm. These images were then stretched out as 1×52 dimensional vectors and stacked in a 2704×1800 data matrix. Considering this as a collection of 1800 vectors from a 2705 dimensional space, we then carried out a principal component analysis (PCA) [27], a method closely related to singular value decomposition for finding lower dimensional representations of higher dimensional spaces, such as the method for finding the most important components of a large complex beam line [28]. e could then generate input beam distributions I i as linear combinations of N pca PCA components I i,Npca as 𝐼 𝑖,𝑁 𝑃𝐶𝐴 = ∑ 𝛼 𝑖,𝑛 × PC 𝑛 . (1) 𝑁 𝑝𝑐𝑎 𝑛=1 Although hundreds of PCA components were required to represent >95\% of the variance in our data, we wanted to significantly reduce the dimensionality of our problem. The first 30 principal components (PC ,…,PC ) are shown in Fig. 2 and qualitatively the most dominant modes are the first 15. To quantify the accuracy of reconstruction as a function of the number of PCA components used, we defined the error 𝑒𝑟𝑟𝑜𝑟 = 100 × ∑ |𝐼 𝑖 − 𝐼 𝑖,𝑁 𝑝𝑐𝑎 | / 𝑥,𝑦 ∑ 𝐼 𝑖𝑥,𝑦 . (2) We then represented the 5 measured input beam distributions according to (1) and found that the error quickly dropped and leveled off at ~8 % for N pca =15 after which it decayed very slowly. We decided on a 15 dimensional representation with the first 15 images shown in Fig. 2 serving as the basis vectors for representing arbitrary input beam distributions for HiRES.
Figure 2: The first 30 principal components of the input electron beam distribution are shown. The most dominant modes are n=1 to n=15, after which the components are seen to represent very fine details.
Next we calibrated a General Particle Tracer (GPT) model with the aid of a generative neural network surrogate model (SM) to quickly (ms) map parameter settings to output beam distributions [15,16]. The SM enabled a high dimensional parameter search (including the beam's average (x p ,y p ,z,E) values) to match model predictions to measured beam data. The calibrated GPT model was then used to generate 51 thousand pairs (X i , y i ), with each y i = (α i,1 ,…,α i,15 ) representing 15 PCA coefficients sampled from a normal distribution to generate a random input beam distribution Î i as in (1), and X i being the 51×51 image of the (x,y) output beam distribution at the first view screen of HiRES, generated by the GPT model using Î i as input. We then trained a convolutional neural network (CNN) (using 98 % of the image/PCA component pairs for training) to map output beam distributions to the PCA components of their associated input beam distributions, as shown in Figure 3 (A, B). The CNN was tested on 1000 unseen pairs and had an average error as defined in (2) of 1.88 % with a standard deviation of 0.56 %. Next, we used an experimentally measured output beam distribution as the input to the CNN and were able to predict the corresponding measured input beam distribution with an accuracy of 14.05 %. inally, the CNN's prediction was fine-tuned using an iterative adaptive feedback approach with each new input distribution fed into GPT giving a new output distribution, as shown in Fig. 3 (C, D, E, B), resulting in prediction accuracy of 9.69 %. A detailed view of adaptive CNN predictions is shown in Figure 4. Figure 3: A: The measured beam output is fed into the CNN of convolutional layers with LeakyReLU activation functions followed by dense layers with ReLU activations. The CNN outputs principal component coefficients (a ,...,a N ) which generate an input beam distribution (B) used as an input to the simulation (C). D: Simulation output is compared to the measured beam output to adaptively fine tune both the principal components (E) and model parameters (F). The adaptive method used was extremum seeking (ES) for high dimensional dynamic systems of the form 𝒙̇ = 𝒇(𝒙(𝒑, 𝑡), 𝒑, 𝑡), 𝑦̂ = 𝑦(𝒙(𝒑, 𝑡), 𝒑, 𝑡) + 𝑛(𝑡), (3) where f is an analytically unknown time-varying nonlinear function, x are measurable system values, p are adjustable parameters, and ŷ is a noisy measurement of an analytically unknown function y that can be maximized by adjusting the parameters p according to 𝑑𝑝 𝑗 𝑑𝑡 = √2𝛼𝜔 𝑗 cos(𝜔 𝑗 𝑡 + 𝑘𝑦̂), (4) where ω i ≠ ω j for i ≠ j. The term α > 0 is the dithering amplitude and can be increased to escape local minima and k > 0 is feedback gain. For large ω j , the dynamics of (4) are, on average d p /dt = -kα ∇ p y, a gradient descent (for k>0) or ascent (for k<0), with respect to p , of the actual, analytically unknown function y although feedback is based only on the noisy measurements ŷ(t), for details and proofs see [29, 30]. In our approach the dynamic system was HiRES and its GPT-based simulation, the parameters being tuned, p , were the PCA coefficients defining the input beam distribution, and the function being maximized was the structural similarity index (SSIM) between a measured beam distribution ρ m (x,y), and the GPT simulation-based output beam distribution ρ s (x,y), both of which were smoothed with a Gaussian filter of variance 2 and normalized to a range of [0,1]. SSIM is defined 𝑆𝑆𝐼𝑀(𝜌 𝑚 , 𝜌 𝑠 ) = (2𝜇 𝑚 𝜇 𝑠 + 𝑐 )(2𝜎 𝑚𝑠 + 𝑐 )(𝜇 𝑚2 + 𝜇 𝑠2 + 𝑐 )(𝜎 𝑚2 + 𝜎 𝑠2 + 𝑐 ) , (5) here μ m and μ s are mean values of the measured and simulated distributions, σ and σ are their variance, σ ms is the covariance, and c = c = 10 -4 [31]. The SSIM value lies within the range [-1.0, 1.0] with a value of 1.0 for images that are exactly the same. Figure 4: A-E: Comparing input distributions to the CNN predictions reconstructed from the predicted PCA component coefficients. The 5 examples are chosen from the 1000 test distributions which the CNN did not see during training and are showing the CNN’s best prediction (A), worst prediction (E), and a uniformly spaced range between (B-D). F: The CNN’s prediction is shown based on an experimentally measured beam output and compared to the experimentally measured beam input. G: Results of the prediction from F are fine tuned via ES.
It is worth pointing out that due to the choice of using only the first 15 PCA components to represent the input beam, the lowest error our method can achieve when using a real distribution is ~8 % error. Exact reproduction would require all 1800 PCA components. To demonstrate the robust adaptive capability of our approach to simultaneously track both changing accelerator parameters and time-varying initial beam distributions, we performed a study running three sets of GPT simulations. In the first simulation, representing HiRES, an input beam was defined and its PCA components as well as the current of solenoid S1 were changed over time, resulting in a large variation of both the input and output beam. Based on that output beam of the first simulation, the CNN alone was used to predict the PCA components which generated an input to a second simulation to predict the output beam. This second simulation quickly diverged from the first because the system changed from the one that had been used to train the CNN. For a third simulation, the adaptive feedback algorithm (4) was also applied with y as in (5), which adaptively tuned both the PCA components and the current of solenoid S1 in order to track the time-varying output distribution of the first simulation, as shown in Fig. 3 (B-F). Fig. 5 (A) shows the error of the PCA component predictions and demonstrates that the CNN + ES setup was able to simultaneously track both the time-varying solenoid current and the PCA omponents. Fig. 5 (B) compares the SSIM of the output beams with and without ES for tracking, (C) shows the percent error (2), and Fig. 6 (A, B) compares the target output beam and input beam with and without using ES. Using the adaptively tracked input distribution and S1 setting we simulated a 1.6 pC bunch with 3D space charge and compared the 6D phase space to that which would have been generated by the exact correct values as well as to the CNN alone in Fig. 6C, showing an almost exact match of the 6D phase space.
Figure 5: A: PCA component tracking error shown with CNN alone and with CNN+ES, in the latter case the solenoid strength is also tracked. B: CNN+ES tracks the output beam. C: Output and input beam distribution errors continuously minimized.
Finally, to demonstrate the ability of the approach in Fig. 5 to track time-varying beamline paramters, we generated an artificial quantum efficiency map
QE(x,y) , and used the measured laser intensity
I(x,y) to construct a beam distribution ρ = I × QE which was used as input to the GPT model. The output was fed into the CNN whose initial guess of ρ was then adaptively fine tuned. The QE of the cathode was then reconstructed as QE(x,y) = ρ(x,y) / I(x,y) , as shown in Fig. 7.
Figure 6: A: The output beam distribution prediction adaptively tracked by tuning solenoid current and PCA components. B: Input beam distribution is predicted with high accuracy. C: Resulting accurate 6D phase space prediction.
Figure 7: A: Schematic of laser image, quantum efficiency, and the resulting beam density. B, C: Target QE and beam density compared to adaptive reconstructions. e have demonstrated a method for adaptively tracking the time varying input beam distribution of a particle accelerator and the associated QE when a virtual laser image measurement is available. Our general approach can be applied at accelerators around the world to enable physics models to accurately map distributions between various locations to enable adaptive tuning. For example TCAV-based LPS measurements can be used to reconstruct the initial (z,E) LPS. Such model-based non-invasive diagnostics are possible because the rich physics in models are strong constraints on the dynamic maps between accelerator distributions, and therefore given sufficiently rich measurements it is very unlikely to achieve a close match at one section of an accelerator without uniquely reproducing the correct distribution of another. Our approach can be applied at accelerators around the world to map distributions between various locations to enable adaptive tuning. For example TCAV-based LPS measurements can be used to reconstruct the initial (z,E) LPS. Our method can be extended to 3D distributions ρ ( r ) using 3D PCA components, spherical harmonics defining star-convex distribution boundaries 𝜕ρ (θ,φ), or general distributions from radial basis functions: ∂𝜌(θ, φ) = ∑ 𝛼 𝑙𝑚 𝑌 𝑙𝑚 (θ, φ), ρ(𝒓) = ∑ 𝛼 𝑛 𝑒 −(𝒓−𝒓 𝑐,𝑛 ) /𝜎 𝑛2 𝑛 . (6) 𝑙,𝑚 This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics under contract number 89233218CNA000001 and DE-AC02-05CH11231. F.C. acknowledge support from NSF PHY-1549132, Center for Bright Beams. D.F and S.P. acknowledge support for Machine learning studies at HiRES by the Laboratory Directed Research and Development program of LBNL under U.S. DOE Contract DE-AC02-05CH11231.