Autonomous Materials Discovery Driven by Gaussian Process Regression with Inhomogeneous Measurement Noise and Anisotropic Kernels
Marcus M. Noack, Gregory S. Doerk, Ruipeng Li, Jason K. Streit, Richard A. Vaia, Kevin G. Yager, Masafumi Fukuto
The Center for Advanced Mathematics for Energy Research Applications (CAMERA), Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, NY 11973
National Synchrotron Light Source II, Brookhaven National Laboratory, Upton, NY 11973
Materials and Manufacturing Directorate, Air Force Research Laboratories, WPAFB, OH 45433
Corresponding author: Marcus M. Noack ([email protected])
June 5, 2020
Abstract
A majority of experimental disciplines face the challenge of exploring large and high-dimensional parameter spaces in search of new scientific discoveries. Materials science is no exception; the wide variety of synthesis, processing, and environmental conditions that influence material properties gives rise to particularly vast parameter spaces. Recent advances have increased the efficiency of materials discovery by increasingly automating the exploration processes. Methods for autonomous experimentation have become more sophisticated recently, allowing multi-dimensional parameter spaces to be explored efficiently and with minimal human intervention, thereby liberating scientists to focus on interpretation and big-picture decisions. Gaussian process regression (GPR) techniques have emerged as the method of choice for steering many classes of experiments. We have recently demonstrated the positive impact of GPR-driven decision-making algorithms on autonomously steered experiments at a synchrotron beamline. However, due to the complexity of the experiments, GPR often cannot be used in its most basic form, but rather has to be tuned to account for the special requirements of the experiments. Two requirements seem to be of particular importance, namely inhomogeneous measurement noise (input-dependent or non-i.i.d.) and anisotropic kernel functions, which are the two concepts we tackle in this paper. Our synthetic and experimental tests demonstrate the importance of both concepts for experiments in materials science and the benefits that result from including them in the autonomous decision-making process.
Artificial intelligence and machine learning are transforming many areas of experimental science. While most techniques focus on analyzing "big data" sets, which are comprised of redundant information, collecting smaller but information-rich data sets has become equally important. Brute-force data collection leads to tremendous inefficiencies in the utilization of experimental facilities and instruments, and in data analysis and data storage; large experimental facilities around the globe are running at 10 to 20 percent utilization and are still spending millions of dollars each year to keep up with the increase in the amount of data storage needed [16, 14, 1, 35]. In addition, conventional experiments require scientists to prepare samples and directly control experiments, which leads to highly trained researchers spending significant effort on micromanaging experimental tasks rather than thinking about scientific meaning. To avoid this problem, autonomously steered experiments are emerging in many disciplines. These techniques place measurements only where they can contribute optimally to the overall knowledge gain; measurements that collect redundant information are avoided. These autonomous approaches minimize the number of measurements needed to reach a certain model confidence, thus optimizing the utilization of experimental, computing, and data-storage facilities.

A universal goal in materials science is to explore the characteristics of a given material across the set of all conceivable combinations of experimental parameters, which can be thought of as a parameter space defining that class of materials. The experimental parameters can be the characteristics of material components, their composition, processing or synthesis parameters, and environmental conditions on which the experimental outcomes depend [29, 19]. Successful exploration of the parameter space amounts to being able to define a high-confidence map, i.e.,
a surrogate model function, of experimental outcomes across all elements of the set. For two-dimensional parameter spaces, this is traditionally achieved by "scanning" the space, often on a simple Cartesian grid. Selecting a scanning strategy implies picking a scan resolution without knowing the model function. When the parameter space is high-dimensional, an approach based on intuition is often used, i.e., manually selecting measurements, assessing trends and patterns in the data, and selecting follow-up measurements. With increasing dimensionality of the parameter space, this method quickly fails to explore the space efficiently and becomes prone to bias. Needless to say, the human brain is generally poorly equipped for high-dimensional pattern recognition.

What is needed are methods that decouple the human from the measurement-selection process. This need motivated the establishment of a research field called design of experiment (DOE) [9], which can be traced back as far as the late 1800s. DOE methods are largely geometrical, independent of the measurement outcomes, and concerned with efficiently exploring the entire parameter space. The Latin hypercube method is the prime example of this class of methods [26, 11]. Most recent approaches to steering experiments belong to a field called active learning, which is based on machine-learning techniques [29, 32, 15, 2]. Others have used deep neural networks to make data acquisition cheaper [6]. Many techniques originated from image analysis [15, 24], but, as images are traditionally two- or three-dimensional, these methods rarely scale efficiently to high-dimensional spaces. A useful collection of methods can be found in Santner et al. [31] and Forrester et al. [12].

Gaussian process regression (GPR) is a particularly successful technique for steering experiments autonomously [27, 17].
The success of GPR in steering experiments is due to its non-parametric nature; simply speaking, the more data that is gathered, the more complicated the model function can become. The number of parameters of the function, and therefore its complexity, does not have to be defined a priori. This is in contrast to neural networks, which need a specification of an architecture (number of layers, layer width, activation function) beforehand. GPR also naturally includes uncertainty quantification, which is an absolute necessity in the experimental sciences. However, traditional GPR has mostly been derived and applied under an assumption of independent and identically distributed noise (i.i.d. noise) [17, 37, 18, 33, 25, 34, 3], i.e., noise that follows the same probability density function at each measurement point. Since we are exclusively dealing with Gaussian statistics, this means that all measurements have the same variance. In Kriging, the geo-statistical analogue of GPR, this concept is called the nugget effect, named after gold nuggets in the sub-surface. In early geo-statistical computations, the gold nuggets led to seemingly random errors, which were assumed to be constant across the domain. However, for materials-discovery experiments the assumption of i.i.d. noise is an unacceptable simplification. The variance of real experimental measurements varies greatly across the parameter space, and this has to be reflected in the steering process as well as in the final model creation. For instance, in x-ray scattering experiments, the variance of a raw measurement depends strongly on the exposure time, computed quantities can have wildly different variances depending on the data in that part of the space (e.g., fit quality will not be uniform), and material heterogeneity will depend strongly on location within the parameter space.
These inhomogeneities in the measurement noise need to be actively included in the final model to avoid interpolation mistakes and, consequently, erroneous models. Fortunately, non-i.i.d. noise can easily be included in the GPR framework [5, 20]. Large variances have to be countered with more measurements in the respective areas until a desired uncertainty threshold is reached. When creating the final model, the algorithm has to incorporate the fact that the model function does not have to explain data points exactly if there is an associated variance; the model function does not have to pass through every data point. After correct tuning, GPR is perfectly equipped for this situation, since it keeps track of a probability distribution over all possible model functions; conditioning will then produce the most likely model function, incorporating all measurement variances optimally.

Another effect that has a significant impact on autonomous experiments is anisotropy of the parameter space, which is introduced either by differing parameter ranges or by different model variability in different parameter-space directions. In isotropic GPR one finds a single characteristic length scale for the data set. This was again motivated by early geo-statistical surveys, in which isotropy was a good assumption. However, when one of the parameters is of significantly different magnitude, for instance a spatial position in mm versus a temperature in °C, we should find different length scales for different directions of the parameter space. Also, there might be different differentiability characteristics in different directions. It is therefore vitally important to give the model the flexibility to account for those varying features. This can be done either by using an altered Euclidean norm or by employing different norms that provide more flexibility of distance measures in different directions.
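Before formalizing these ideas, the basic uncertainty-driven steering loop can be sketched in a few lines. The following is a minimal 1-D illustration; the latent function, kernel, and all hyperparameter values are invented for the demo and are not the settings used in our experiments:

```python
import numpy as np

def latent(x):
    # Hidden "ground truth" that the simulated instrument probes.
    return np.sin(6.0 * x)

def kern(a, b, l=0.15):
    # Isotropic squared-exponential kernel with unit signal variance.
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / l**2)

def posterior(X, y, Xs, noise=1e-4):
    # Standard GP posterior mean and variance with i.i.d. noise.
    K = kern(X, X) + noise * np.eye(len(X))
    ks = kern(X, Xs)
    mean = ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ij->j', np.linalg.solve(K, ks), ks)
    return mean, var

grid = np.linspace(0.0, 1.0, 201)      # candidate measurement positions
X = np.array([0.5])                    # one seed measurement
y = latent(X)
for _ in range(15):
    # Measure next where the model is least certain.
    _, var = posterior(X, y, grid)
    x_next = grid[np.argmax(var)]
    X = np.append(X, x_next)
    y = np.append(y, latent(x_next))
mean, var = posterior(X, y, grid)
```

The greedy max-variance rule used here is only the simplest possible objective function; the rest of this paper is about making the model it relies on reflect the real noise structure and anisotropy of the experiment.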
The general idea, including the concepts proposed in this paper, is visualized in Figure 1.

This paper is organized as follows: First, we introduce the traditional theory of Gaussian process regression with i.i.d. noise and standard isotropic kernel functions. Second, we make formal changes to the theory to include non-i.i.d. noise and anisotropy. Third, we demonstrate the impact of the two concepts on synthetic experiments. Fourth, we present a synchrotron beamline experiment that exploited both concepts in autonomous control.

We define the parameter space
$\mathcal{X} \subset \mathbb{R}^n$, which serves as the index set or input space in the scope of Gaussian process regression, with elements $\mathbf{x} \in \mathcal{X}$. We define four functions over $\mathcal{X}$. First, the latent function $f = f(\mathbf{x})$ can be interpreted as the inaccessible ground truth. Second, the often noisy measurements are described by $y = y(\mathbf{x}): \mathcal{X} \to \mathbb{R}^d$. To simplify the derivation, we assume $d = 1$; allowing for $d > 1$ is a straightforward extension. Third, the surrogate model function is defined as $\rho = \rho(\mathbf{x}): \mathcal{X} \to \mathbb{R}$. Fourth, the posterior mean function $m(\mathbf{x})$, which is often assumed to equal the surrogate model, i.e., $m(\mathbf{x}) = \rho(\mathbf{x})$, but this is not necessarily the case. We also define a second space, a Hilbert space $\mathcal{H} \subset \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^J$, with elements $[\mathbf{f}\;\mathbf{y}\;\mathbf{f}_0]^T$, where $N$ is the number of data points, $J$ is the number of points at which we want to predict the model function value, $\mathbf{y}$ are the measurement values, $\mathbf{f}$ is the vector of unknown latent function evaluations, and $\mathbf{f}_0$ is the vector of predicted function values at a set of positions. Note that scalar functions over $\mathcal{X}$, e.g. $f(\mathbf{x})$, are vectors (bold typeface) in the Hilbert space $\mathcal{H}$, e.g. $\mathbf{f}$. We also define a function $p$ over our Hilbert space, which is just the function value of the Gaussian probability density functions involved. For more explanation of the distinction between the two spaces and the functions involved, see Figure 2.

Defining a GP regression model from data $D = \{(\mathbf{x}_1, y_1), ..., (\mathbf{x}_N, y_N)\}$, where $y_i = f(\mathbf{x}_i) + \epsilon(\mathbf{x}_i)$, is accomplished in a GP regression framework by defining a Gaussian probability density function, called the prior,

$$p(\mathbf{f}) = \frac{1}{\sqrt{(2\pi)^{\dim}|K|}} \exp\left[-\frac{1}{2}(\mathbf{f}-\boldsymbol{\mu})^T K^{-1} (\mathbf{f}-\boldsymbol{\mu})\right], \qquad (1)$$

and a likelihood

$$p(\mathbf{y}|\mathbf{f}) = \frac{1}{\sqrt{(2\pi)^{\dim}\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{f})^T(\mathbf{y}-\mathbf{f})\right], \qquad (2)$$

where $\boldsymbol{\mu} = [\mu(\mathbf{x}_1), ..., \mu(\mathbf{x}_N)]^T$ is the mean of the prior Gaussian probability density function (not to be confused with the posterior mean function $m(\mathbf{x})$). The prior mean can be understood as the position of the Gaussian. $\mathbf{f} = [f(\mathbf{x}_1), ..., f(\mathbf{x}_N)]^T$; $K_{ij} = k(\phi, \mathbf{x}_i, \mathbf{x}_j)$, $\mathbf{x} \in \mathcal{X}$, is the covariance of the Gaussian process, with its covariance function, often referred to as the kernel, $k(\phi, \mathbf{x}_i, \mathbf{x}_j)$, where $\phi$ are the hyperparameters, and where $\sigma^2$ is the variance of the i.i.d. observation noise. The problem here is that, in practice, the i.i.d.-noise restriction rarely holds in the experimental sciences, which is one of the issues to be addressed in this paper. The kernel $k$ is a symmetric and positive semi-definite function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. As a reminder, $\mathcal{X}$ is our parameter space, often referred to as the index set or input space in the literature. A well-known choice [37] is the Matérn kernel class, defined by

$$k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_s^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,r}{l}\right)^{\nu} B_{\nu}\!\left(\frac{\sqrt{2\nu}\,r}{l}\right), \qquad (3)$$

where $B_\nu$ is the modified Bessel function of the second kind, $\Gamma$ is the gamma function, $\sigma_s^2$ is the signal variance, $l$ is the length scale, $r = ||\mathbf{x}_i - \mathbf{x}_j||_2$ is the Euclidean distance between input points, and $\nu$ is a parameter that controls the differentiability characteristics of the kernel and therefore of the final model function. The well-known exponential and squared-exponential kernels are special cases of the Matérn kernels.

Figure 1: Schematic of an autonomous experiment. The data acquisition device in this example is a beamline at a synchrotron light source. The measurement result depends on parameters $\mathbf{x}$. The raw data is then sent through an automated data processing and analysis pipeline. From the analyzed data, the autonomous-experiment algorithm creates a surrogate model and an uncertainty function whose maxima represent points of high-value measurements; they are found by employing function-optimization tools. The new measurement parameters $\mathbf{x}$ are then communicated to the data acquisition device and the loop starts over. The main contribution of the present work is that the model computation and uncertainty quantification account for the anisotropic nature of the model function and the input-dependent (non-i.i.d.) measurement noise. The surrogate model (bottom) shows how the model function evolves as the experiment is steered and more data ($N$) is collected. The red dots indicate the positions of the measurements, and their size represents the varying associated measurement variances. The numbers $l_x$ and $l_y$ indicate the anisotropic correlation lengths that the algorithm finds by maximizing a log-likelihood function; the ellipses visualize the found anisotropy. The take-home message for the practitioner is that the method will find the most likely model function given all collected data with their variances. The model function will not pass directly through the points but will find the most likely shape given all available information.

Figure 2: Figure emphasizing the distinction between the spaces and functions involved in the derivation. (a) A function over $\mathcal{X}$. This can be the surrogate model $\rho(\mathbf{x})$, the latent function $f(\mathbf{x})$ to be approximated through an experiment, the function describing the measurements $y(\mathbf{x})$, or the predictive mean function $m(\mathbf{x})$. $x_1$ and $x_2$ are two experimentally controlled parameters (e.g., synthesis, processing, or environmental conditions) that the measurement outcomes potentially depend on. (b) The Gaussian probability density function over $\mathcal{H}$, which gives GPR its name. For noise-free measurements, $\mathbf{y} = \mathbf{f}$ at measurement points, meaning that we can directly observe the model function. Generally this is not the case, and the observations $\mathbf{y}$ are corrupted by input-dependent (non-i.i.d.) noise.

The signal variance $\sigma_s^2$ and the length scale $l$ are hyperparameters ($\phi$) that are found by maximizing the log-likelihood, i.e., solving

$$\underset{\phi,\,\mu}{\arg\max}\ \log(L(D; \phi, \mu)), \qquad (4)$$

where

$$\log(L(D; \phi, \mu(\mathbf{x}))) = -\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^T (K(\phi) + \sigma^2 I)^{-1} (\mathbf{y}-\boldsymbol{\mu}) - \frac{1}{2}\log(|K(\phi) + \sigma^2 I|) - \frac{\dim(\mathbf{y})}{2}\log(2\pi), \qquad (5)$$

and $I$ is the identity matrix. In the isotropic case, we only have to optimize for one signal variance and one length scale (per kernel function). The mean function $\mu(\mathbf{x})$ is often assumed to be constant and therefore does not have to be part of the optimization. The mean function assigns the location of the prior in $\mathcal{H}$ to any $\mathbf{x} \in \mathcal{X}$; it can therefore be used to communicate prior knowledge (for instance, physics knowledge) to the Gaussian process. Given some hyperparameters, the joint prior is

$$p(\mathbf{f}, \mathbf{f}_0) = \frac{1}{\sqrt{(2\pi)^{\dim}|\Sigma|}} \exp\left[-\frac{1}{2} \begin{bmatrix} \mathbf{f}-\boldsymbol{\mu} \\ \mathbf{f}_0-\boldsymbol{\mu}_0 \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} \mathbf{f}-\boldsymbol{\mu} \\ \mathbf{f}_0-\boldsymbol{\mu}_0 \end{bmatrix}\right], \qquad (6)$$

where

$$\Sigma = \begin{pmatrix} K & \boldsymbol{\kappa} \\ \boldsymbol{\kappa}^T & \mathcal{K} \end{pmatrix}, \qquad (7)$$

where $\boldsymbol{\kappa}_i = k(\phi, \mathbf{x}_0, \mathbf{x}_i)$, $\mathcal{K} = k(\phi, \mathbf{x}_0, \mathbf{x}_0)$, and, as a reminder, $K_{ij} = k(\phi, \mathbf{x}_i, \mathbf{x}_j)$. Intuitively speaking, $\Sigma$, $K$, $\boldsymbol{\kappa}$ and $k$ are all measures of similarity between measurement results $y(\mathbf{x})$ of the input space. While $K$ stores this similarity between all data points, $\Sigma$ stores the similarity between all data points and all unknown points of interest, and $\boldsymbol{\kappa}$ contains the similarity between the data points and the unknown $y(\mathbf{x}_0)$ of interest; $k$ contains the instruction on how to calculate this similarity. The reader might wonder: "how do we find the similarity between unknown points of interest?" The answer lies in the formulation of the kernels, which calculate the similarity just by knowing locations $\mathbf{x} \in \mathcal{X}$ and not the function evaluations $y(\mathbf{x})$. $\mathbf{x}_0$ is the point where we want to estimate the mean and the variance.
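As a concrete illustration, a minimal numpy sketch of the exponential kernel (the Matérn kernel of eq. (3) with $\nu = 1/2$) and the marginal log-likelihood of eq. (5) might look as follows; the toy data and hyperparameter values are invented for the demo:

```python
import numpy as np

def exp_kernel(X1, X2, sigma_s, l):
    # Matern kernel with nu = 1/2, i.e. the exponential kernel
    # k(r) = sigma_s^2 exp(-r / l), for 1-D inputs.
    r = np.abs(X1[:, None] - X2[None, :])
    return sigma_s**2 * np.exp(-r / l)

def log_likelihood(X, y, sigma_s, l, noise_var):
    # Marginal log-likelihood of eq. (5) with i.i.d. noise sigma^2 I
    # and a zero prior mean.
    K = exp_kernel(X, X, sigma_s, l) + noise_var * np.eye(len(X))
    sign, logdet = np.linalg.slogdet(K)
    return (-0.5 * y @ np.linalg.solve(K, y)
            - 0.5 * logdet
            - 0.5 * len(y) * np.log(2.0 * np.pi))

# Toy data; in practice the hyperparameters are found by maximizing
# this function, i.e. solving eq. (4) with a generic optimizer.
X = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * X)
ll = log_likelihood(X, y, sigma_s=1.0, l=0.3, noise_var=0.01)
```

Well-chosen hyperparameters yield a larger log-likelihood than badly mis-scaled ones, which is exactly what the optimization in eq. (4) exploits.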
Note that, with only slight adaptation of the equations, we are able to compute the mean and variance for several points of interest at once. The predictive distribution is defined as

$$p(\mathbf{f}_0|\mathbf{y}) = \int_{\mathbb{R}^N} p(\mathbf{f}_0|\mathbf{f}, \mathbf{y})\, p(\mathbf{f}, \mathbf{y})\, d\mathbf{f} \propto \mathcal{N}\big(\boldsymbol{\mu}_0 + \boldsymbol{\kappa}^T (K+\sigma^2 I)^{-1} (\mathbf{y}-\boldsymbol{\mu}),\; \mathcal{K} - \boldsymbol{\kappa}^T (K+\sigma^2 I)^{-1} \boldsymbol{\kappa}\big), \qquad (8)$$

and the predictive mean and the predictive variance are therefore respectively defined as

$$m(\mathbf{x}_0) = \boldsymbol{\mu}_0 + \boldsymbol{\kappa}^T (K+\sigma^2 I)^{-1} (\mathbf{y}-\boldsymbol{\mu}), \qquad (9)$$
$$\sigma^2(\mathbf{x}_0) = k(\mathbf{x}_0, \mathbf{x}_0) - \boldsymbol{\kappa}^T (K+\sigma^2 I)^{-1} \boldsymbol{\kappa}, \qquad (10)$$

which are the posterior mean and variance at $\mathbf{x}_0$, respectively. $\mathcal{N}(\cdot,\cdot)$ stands for the normal (Gaussian) distribution with a given mean and covariance.

To incorporate non-i.i.d. observation noise, one can redefine the likelihood (2) as

$$p(\mathbf{y}|\mathbf{f}) = \frac{1}{\sqrt{(2\pi)^{\dim}|V|}} \exp\left[-\frac{1}{2}(\mathbf{y}-\mathbf{f})^T V^{-1} (\mathbf{y}-\mathbf{f})\right], \qquad (11)$$

where $V$ is a diagonal matrix containing the respective measurement variances. The matrix $V$ can also have non-diagonal entries if the measurement noise happens to be correlated; we will only discuss uncorrelated measurement noise.

From equations (6) and (11), we can calculate equation (8), i.e., the predictive probability distribution for a measurement outcome at $\mathbf{x}_0$, given the data set. The mean and variance of this distribution are

$$m(\mathbf{x}_0) = \boldsymbol{\mu}_0 + \boldsymbol{\kappa}^T (K+V)^{-1} (\mathbf{y}-\boldsymbol{\mu}), \qquad (12)$$
$$\sigma^2(\mathbf{x}_0) = k(\mathbf{x}_0, \mathbf{x}_0) - \boldsymbol{\kappa}^T (K+V)^{-1} \boldsymbol{\kappa}, \qquad (13)$$

respectively. Note that the matrix of the measurement errors $V$ replaces the matrix $\sigma^2 I$ in equations (9) and (10). However, this does not follow from a simple substitution, but from a significantly different derivation. The log-likelihood (5) changes accordingly, yielding

$$\log(L(D; \phi, \mu(\mathbf{x}))) = -\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^T (K(\phi) + V)^{-1} (\mathbf{y}-\boldsymbol{\mu}) - \frac{1}{2}\log(|K(\phi) + V|) - \frac{\dim(\mathbf{y})}{2}\log(2\pi). \qquad (14)$$

This concludes the derivation of GPR with non-i.i.d. observation noise. Figure 3 illustrates the effect of different kinds of noise on a one-dimensional model function. As we can see, while some details of the derivation change when we account for inhomogeneous (also known as input-dependent or non-i.i.d.) noise, the resulting equations are very similar and the computation incurs no extra cost.
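A direct implementation of eqs. (12) and (13) only requires replacing $\sigma^2 I$ with the diagonal matrix $V$. The following numpy sketch uses a squared-exponential kernel, a zero prior mean, and invented data values for the demo:

```python
import numpy as np

def sqexp(X1, X2, sigma_s=1.0, l=0.2):
    # Isotropic squared-exponential kernel on 1-D inputs.
    r2 = (X1[:, None] - X2[None, :])**2
    return sigma_s**2 * np.exp(-0.5 * r2 / l**2)

def gp_posterior(X, y, V_diag, Xstar, mu=0.0):
    # Posterior mean and variance, eqs. (12)-(13): V replaces sigma^2 I.
    K = sqexp(X, X) + np.diag(V_diag)
    kappa = sqexp(X, Xstar)                      # N x J cross-covariances
    Kinv_kappa = np.linalg.solve(K, kappa)
    mean = mu + kappa.T @ np.linalg.solve(K, y - mu)
    var = np.diag(sqexp(Xstar, Xstar)) - np.einsum('ij,ij->j', Kinv_kappa, kappa)
    return mean, var

X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
V = np.array([1e-6, 1e-6, 0.5, 1e-6, 1e-6])     # one very noisy measurement
m, v = gp_posterior(X, y, V, X)
```

The posterior is pinned to the precise measurements but is free to deviate from the noisy one, and the posterior variance stays elevated where the data are uncertain, which is exactly the behavior shown in Figure 3(c).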
Figure 3: Three one-dimensional examples with (a) no noise, (b) i.i.d. noise, and (c) non-i.i.d. noise, respectively. For the no-noise case, the model has to explain the data exactly. In the i.i.d.-noise case, the algorithm is free to choose a model that does not explain the data exactly but allows for a constant measurement variance. In the non-i.i.d.-noise case, the algorithm finds the most likely model given varying variances across the data set. Note the vertical axis labels; $y(\mathbf{x})$ are the measurement outcomes, $m(\mathbf{x})$ is the mean function, i.e., the most likely model, $\rho(\mathbf{x})$ is the surrogate model, often assumed to equal the mean function, and $f(\mathbf{x})$ is the "ground truth" latent function.

For parameter spaces $\mathcal{X}$ that are anisotropic, i.e., where different directions have different characteristic correlation lengths, we can redefine the kernel function to incorporate different length scales in different directions. One way of doing this for axial anisotropy is by choosing the $l_1$ norm as the distance measure and redefining the kernel function as

$$k(\mathbf{x}^m, \mathbf{x}^n) = \sigma_s^2 \prod_{i=1}^{d} k_i(x_i^m - x_i^n; \phi_i), \qquad (15)$$

where the superscripts $m, n$ are point labels, the subscript $i$ indexes the different directions in $\mathcal{X}$, and $d = \dim(\mathcal{X})$. Defining a kernel per direction gives us the flexibility to enforce different orders of differentiability in different directions of $\mathcal{X}$. The main benefit, however, is the possibility of defining different length scales in different directions of $\mathcal{X}$ (see Figure 4).
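A sketch of the product form of eq. (15), with an exponential kernel in one direction and a squared-exponential kernel in the other; the length scales are arbitrary demo values:

```python
import numpy as np

def product_kernel(X1, X2, l=(0.3, 0.3), sigma_s=1.0):
    # Eq. (15): one 1-D kernel per direction, multiplied together.
    # Direction 0 uses an exponential kernel (non-differentiable model),
    # direction 1 a squared-exponential kernel (smooth model).
    d0 = np.abs(X1[:, None, 0] - X2[None, :, 0])
    d1 = X1[:, None, 1] - X2[None, :, 1]
    return sigma_s**2 * np.exp(-d0 / l[0]) * np.exp(-0.5 * d1**2 / l[1]**2)

X = np.array([[0.0, 0.0],
              [0.3, 0.0],    # step along the rough direction
              [0.0, 0.3]])   # same-size step along the smooth direction
K = product_kernel(X, X)
```

With equal length scales and equal separations, the exponential factor decorrelates faster than the squared-exponential one ($K_{02} = e^{-0.5} > K_{01} = e^{-1}$), so each direction can carry its own smoothness assumption.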
Unfortunately, the choice of the $l_1$ norm can lead to a very recognizable checkerboard pattern in the surrogate model, but the predictive power of the associated variance function is significantly improved compared to the isotropic case.

A second way, which avoids the checkerboard pattern in the model but does not allow different kernels in different directions, is to redefine the distances in $\mathcal{X}$ as

$$r = \sqrt{\mathbf{x}^T M \mathbf{x}}, \qquad (16)$$

where $M$ is any symmetric positive semi-definite matrix playing the role of a metric tensor [36]. This is just the Euclidean distance in a transformed metric space. In the actual kernel functions, any $r/l$ can then be replaced by the new equation for the metric. We will here only consider axis-aligned anisotropy, which means the matrix $M$ is a diagonal matrix with the inverse (squared) length scales on its diagonal. The extension to general forms of anisotropy is straightforward but needs a more costly likelihood optimization, since more hyperparameters have to be found. The rest of the theoretical treatment, however, remains unchanged. The mean function $\mu(\mathbf{x})$, the hyperparameters $\phi_i$, and the signal variance $\sigma_s^2$ are again found by maximizing the marginal log-likelihood (5). The associated optimization tries to find a maximum of a function defined over $\mathbb{R}^{d+1}$, if we ignore the mean function, as is commonly done. We therefore have to find $d+1$ parameters, which adds a significant computational cost. If $M$ is not diagonal, the log-likelihood has to be maximized over all independent entries of the symmetric matrix $M$ (plus the signal variance), a much larger search space. However, the optimization can be performed in parallel with computing the posterior variance, which can hide the computational effort. It is important to note that accounting for anisotropy can make the training of the algorithm, i.e., the optimization of the log-likelihood, significantly more costly; the extent of this depends on the kind of anisotropy considered.
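A sketch of eq. (16) for the axis-aligned case, where $M = \mathrm{diag}(1/l_i^2)$; the length-scale values here are demo choices:

```python
import numpy as np

def aniso_sqexp(X1, X2, length_scales, sigma_s=1.0):
    # Squared-exponential kernel with the metric of eq. (16):
    # r^2 = (x - x')^T M (x - x'), axis-aligned case M = diag(1 / l_i^2).
    M = np.diag(1.0 / np.asarray(length_scales)**2)
    d = X1[:, None, :] - X2[None, :, :]
    r2 = np.einsum('ijk,kl,ijl->ij', d, M, d)
    return sigma_s**2 * np.exp(-0.5 * r2)

# A unit step along the short-length-scale axis decorrelates far more than
# the same step along the long-length-scale axis.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = aniso_sqexp(X, X, length_scales=[0.5, 5.0])
```

Passing a full (non-diagonal) positive semi-definite `M` into the same `einsum` would realize the general anisotropic case discussed above, at the price of the larger hyperparameter search.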
As we shall see, taking anisotropy into account leads to more efficient steering and a higher-quality final result, and is thus generally worth the additional computational cost.

Our synthetic tests are carefully chosen to demonstrate the benefits of the two concepts under discussion, namely non-i.i.d. observation noise and anisotropic kernels. To demonstrate the importance of including non-i.i.d. observation noise in the analysis, we consider a synthetic test based on actual physics, which we used in previous work to showcase the functionality of past algorithms [27]. We choose an example given in closed form because it provides a noise-free "ground truth" to compare against, whereas experimental data would inevitably include unknown errors. To showcase the importance of anisotropic kernels as part of the analysis, we provide a high-dimensional example based on a simulation of a material that is subjected to a varying thermal history.

Figure 4: Model function with different length scales and different orders of differentiability in different directions. In the $x_1$ direction we have assumed that the model function is not differentiable and have therefore used the exponential kernel. In the $x_2$ direction, the model can be differentiated an infinite number of times; we therefore chose the squared-exponential kernel. For other orders of differentiability, other kernels can be used. Fixing the order of differentiability also gives the user the ability to incorporate domain knowledge into the experiment.

The synthetic tests shown explore spaces of very different dimensionality. There is no theoretical limitation on the dimensionality of the parameter space. Indeed, the autonomous methods described herein are most advantageous when operating in high-dimensional spaces, since this is where simpler methods, and human intuition, typically fail to yield meaningful searches.
For this test, we define a physical "ground truth" model $f(\mathbf{x})$, whose correct function value at $\mathbf{x}$ is inaccessible due to non-i.i.d. measurement noise, but which can be probed by our simulated experiment through $y(\mathbf{x})$. In this case, we assume that the measurements are subject to Gaussian noise whose standard deviation is a fraction of the function value at $\mathbf{x}$. The ground-truth model function is defined to be the diffusion coefficient $D = D(r, T, C_m)$ for the Brownian motion of nanoparticles in a viscous liquid consisting of a binary mixture of water and glycerol:

$$D = \frac{k_B T}{6 \pi \mu r}, \qquad (17)$$

where $k_B$ is Boltzmann's constant, $r$ is the nanoparticle radius, $T$ is the temperature, and $\mu = \mu(T, C_m)$ is the viscosity as given by [8], where $C_m$ is the glycerol mass fraction. This model was used in [27] to show the functionality of Kriging-based autonomous experiments. The experimental device has no direct access to the ground-truth model, but adds an unavoidable noise level, i.e.,

$$D = \frac{k_B T}{6 \pi \mu r} + \epsilon(T, C_m, r). \qquad (18)$$

To demonstrate the importance of the noise model, we first ignore the noise $\epsilon$, then approximate it assuming i.i.d. noise, and finally model it allowing for non-i.i.d. noise. Figure 5 shows the results after 500 measurements, and a comparison to the (inaccessible) ground truth. Figure 6 compares, between the three different types of noise, the decrease in the error, in the form of the Euclidean distance between the models and the ground truth, with increasing number of measurements $N$.

The results show that treating noise as i.i.d. or even as non-existent can lead to artifacts in the surrogate model. Additionally, the discrepancy between the ground truth and the surrogate model is reduced far more efficiently if non-i.i.d. noise is accounted for.

Figure 5: The result of the diffusion-coefficient example on a three-dimensional input space. The figure shows the result of the GP approximation after 500 measurements for three different nanoparticle radii. While the measurement results are always subject to differing noise, the model can take noise into account in different ways. Most commonly, noise is ignored (left column). If noise is included, it is common to approximate it by i.i.d. noise (middle column). The proposed method models the noise as what it is, which is non-i.i.d. noise (right column). The iso-lines of the approximation are shown in white, while the iso-lines of the ground truth are shown in red. Observe how the no-noise and the i.i.d.-noise approximations create localized artifacts. The non-i.i.d. approximation does a far better job of creating a smooth model that explains all data, including noise.
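The simulated measurement of eqs. (17) and (18) can be sketched as follows. The 5% relative noise level and the fixed water viscosity are illustrative assumptions (as is ignoring the glycerol dependence of $\mu$); the exact settings of the original study are not reproduced here:

```python
import numpy as np

KB = 1.380649e-23  # Boltzmann constant, J/K

def diffusion_coefficient(T_kelvin, mu, r):
    # Stokes-Einstein relation, eq. (17): D = k_B T / (6 pi mu r).
    return KB * T_kelvin / (6.0 * np.pi * mu * r)

def measure(T_kelvin, mu, r, rel_sigma=0.05, rng=None):
    # Simulated experiment, eq. (18): ground truth plus Gaussian noise whose
    # standard deviation scales with the function value (5% assumed here).
    rng = rng or np.random.default_rng(0)
    d = diffusion_coefficient(T_kelvin, mu, r)
    noise_sd = rel_sigma * d
    return d + rng.normal(0.0, noise_sd), noise_sd**2  # value and its variance

# 10 nm-radius particle in pure water (mu ~ 1e-3 Pa s) at 25 C
D = diffusion_coefficient(298.15, 1.0e-3, 10e-9)
```

For these values $D \approx 2 \times 10^{-11}\ \mathrm{m^2/s}$, and the per-point variance returned by `measure` is exactly what enters the diagonal of $V$ in eqs. (12) and (13).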
Allowing anisotropy can increase the efficiency of autonomous experiments significantly for any dimensionality of the underlying parameter space. However, as the dimensionality of the parameter space increases, the importance of anisotropy increases substantially, purely due to the number of directions in which anisotropy can occur. To demonstrate this link, we simulated an experiment in which a material is subjected to a varying thermal history. That is, the experiment consists of repeatedly changing the temperature and taking measurements along this time-series of different temperatures. The temperature at each time step can be thought of as one dimension of the parameter space. The full set of possible applied thermal histories thus becomes a set of points in the high-dimensional parameter space of temperatures.
Figure 6: The approximation errors of the surrogate model during the diffusion-coefficient example (Figure 5), for the three different noise models noted in the legend. The bands around each line represent the standard deviation of this error metric, computed by running repeated synthetic experiments.

In particular, we consider the ordering of a block copolymer, which is a self-assembling material that spontaneously organizes into a well-defined morphology when thermally annealed [10]. The material organizes into a defined unit cell locally, with ordered grains subsequently growing in size as defects annihilate [23]. We use a simple model to describe this grain-coarsening process, where the grain size $\xi$ increases with time according to a power law

$$\xi = k t^{\alpha}, \qquad (19)$$

where $\alpha$ is a scaling exponent (set to a fixed value in our simulations) and the prefactor $k$ captures the temperature-dependent kinetics

$$k = A e^{-E_a / k_B T}. \qquad (20)$$

Here, $E_a$ is an activation energy for coarsening (we select a typical value of $E_a = 100\ \mathrm{kJ/mol}$), and the prefactor $A$ sets the overall scale of the kinetics. From these equations we construct an instantaneous growth rate of the form

$$\frac{d\xi}{dt} = \alpha\, k^{1/\alpha}\, \xi^{1 - 1/\alpha}. \qquad (21)$$

Block copolymers are known to have an order-disorder transition temperature ($T_{\mathrm{ODT}}$) above which thermal energy overcomes the material's segregation strength, and thus the nanoscale morphology disappears in favor of a homogeneous disordered phase. Heating beyond $T_{\mathrm{ODT}}$ thus implies driving $\xi$ to zero. We describe this 'grain dissolution' process using the ad-hoc form

$$\frac{d\xi}{dt} = -k_{\mathrm{diss}} (T - T_{\mathrm{ODT}}), \qquad (22)$$

where we set $k_{\mathrm{diss}}$ to a fixed rate constant and $T_{\mathrm{ODT}} = 350\ ^{\circ}\mathrm{C}$. We also apply ad-hoc suppression of kinetics near $T_{\mathrm{ODT}}$ and when grain sizes are very large, to account for experimentally observed effects.
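A minimal integration of this coarsening model over a thermal history can be sketched as follows. $E_a = 100\ \mathrm{kJ/mol}$ and $T_{\mathrm{ODT}} = 350\ ^{\circ}\mathrm{C}$ (623.15 K) follow the text, but $\alpha$, $A$, and $k_{\mathrm{diss}}$ are illustrative stand-ins, and the ad-hoc suppression terms near $T_{\mathrm{ODT}}$ are omitted:

```python
import numpy as np

R = 8.314e-3  # gas constant in kJ/(mol K); E_a is given per mole

def growth_rate(xi, T, alpha=0.5, A=1.0e9, Ea=100.0,
                T_odt=623.15, k_diss=0.01):
    # Instantaneous grain-size change in nm/s. alpha, A and k_diss
    # are invented demo values, not the paper's settings.
    if T >= T_odt:
        return -k_diss * (T - T_odt)        # grain dissolution, eq. (22)
    k = A * np.exp(-Ea / (R * T))           # Arrhenius kinetics, eq. (20)
    return alpha * k**(1.0 / alpha) * xi**(1.0 - 1.0 / alpha)  # eq. (21)

def anneal(T_history, step_seconds=60.0, xi0=1.0, dt=1.0):
    # Forward-Euler integration of the grain size along a thermal history;
    # each temperature in T_history is held for step_seconds.
    xi = xi0
    for T in T_history:
        for _ in range(int(step_seconds / dt)):
            xi = max(xi + growth_rate(xi, T) * dt, 1e-6)
    return xi
```

A history held just below $T_{\mathrm{ODT}}$ coarsens grains steadily, while a final excursion above $T_{\mathrm{ODT}}$ dissolves most of the accumulated order; each such history is one point of the high-dimensional input space explored in this section.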
Overall, this simple model describes a system wherein grains coarsen with time and temperature, but shrink in size if the temperature is raised too high. The parameter space defined by a sequence of temperatures will thus exhibit regions of high or low grain size depending on the thermal history described by each point; moreover, there is non-trivial coupling between these parameters, since the grain size obtained for a given step of the annealing (i.e., a given direction in the parameter space) sets the starting point for coarsening in the next step (i.e., the next direction of the parameter space).

We select thermal histories consisting of 11 temperature selections (the temperature is updated at regular intervals), which thus defines an 11-dimensional parameter space for exploration. Each temperature history defines a point (x ∈ X) within the 11-dimensional input space. As can be seen in Figure 7(a), the majority of thermal histories one might select terminate in a relatively small grain size (blue lines in the figure). This is easily understood, since a randomly selected annealing protocol will use temperatures that are either too low (slow coarsening) or too high (T > T_ODT drives the system into the disordered state). Only a subset of possible histories terminate with a large grain size (dark, less transparent lines in Figure 7), corresponding to the judicious choice of an annealing history that uses high temperatures without crossing T_ODT. While this conclusion is obvious in retrospect, in the exploration of a new material system (e.g., one for which the values of material properties like T_ODT are not known), identifying such trends is non-trivial. Representative slices through the 11-dimensional parameter space (Fig. 7(b) and (c)) further emphasize the complexity of the search problem, and in particular its anisotropy. That is, different steps in the annealing protocol have different effects on coarsening; correspondingly, the different directions in the parameter space have different characteristic length scales that must be correctly modeled (even though every direction is conceptually similar, in that it describes a thermal annealing step).

Autonomous exploration of this parameter space enables the construction of a model for this coarsening process. Moreover, the inclusion of anisotropy markedly improves the search efficiency, reducing the model error more rapidly than a simpler isotropic kernel does (Fig. 7(d)). As the dimensionality of the problem and the complexity of the physical model increase, the utility of an anisotropic kernel increases further still.
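The direction-dependent length scales discussed above are typically handled with an automatic-relevance-determination (ARD) kernel, which assigns each input dimension its own length scale. A minimal sketch in NumPy; the squared-exponential form is an illustrative choice, not necessarily the kernel used in our experiments:

```python
import numpy as np

def ard_sq_exp(X1, X2, signal_var, length_scales):
    """Anisotropic (ARD) squared-exponential kernel.

    Each input dimension gets its own length scale, so the GP can learn
    that some directions of the parameter space vary faster than others.
    X1 has shape (n, d), X2 has shape (m, d); the result has shape (n, m).
    """
    L = np.asarray(length_scales, dtype=float)       # shape (d,)
    diff = (X1[:, None, :] - X2[None, :, :]) / L     # per-dimension scaling
    return signal_var * np.exp(-0.5 * (diff ** 2).sum(axis=-1))
```

With length_scales = [0.1, 10.0], a unit step along the first axis decorrelates the model almost completely, while the same step along the second axis barely matters; the isotropic kernel (all length scales equal) is recovered as a special case.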
The proposed GP-driven decision-making algorithm that takes into account non-i.i.d. observation noise and anisotropy has been used successfully in autonomous synchrotron experiments. Here we present, as an illustrative example, the results of an autonomous x-ray scattering experiment on a polymer-grafted gold-nanorod thin film, where a combinatorial sample library was used to explore the effects of film-fabrication parameters on self-assembled nanoscale structure.

Unlike traditional particles coated with short ligands, polymer-grafted nanoparticles (PGNs) are stabilized by high-molecular-weight polymers at relatively low grafting densities. As a result, PGNs behave as soft colloids, possessing the favorable processing behavior of polymer systems while retaining the ability to pack into ordered assemblies [7]. Although this makes PGNs well suited to traditional approaches for thin-film fabrication, the nanoscale assembly of these materials is inherently complex, depending on a number of variables including, but not limited to, particle-particle interactions, particle-substrate interactions, and process methodology.

The combinatorial PGN film sample was fabricated at the Air Force Research Laboratory. A flow-coating method [7] was used to deposit a thin PGN film on a surface-treated substrate, where gradients in coating velocity and substrate surface energy were imposed along two orthogonal directions over the film surface. A 250 nM toluene solution of 53 kDa polystyrene-grafted gold nanorods (94% polystyrene by volume), with nanorod length and diameter characterized by TEM analysis, was cast onto a functionalized glass coverslip using a motorized coating blade. The resulting film covered a rectangular area of 50 mm × 60 mm. The surface-energy gradient on the glass coverslip was generated through the vapor deposition of phenylsilane [13]. The substrate surface energy varied linearly along the x direction, from 30.5 mN/m (hydrophobic) at one edge of the film (x = 0) to 70.2 mN/m (hydrophilic) at the other edge (x = 50 mm). Along the y direction, the film-casting speed increased from 0 mm/s (at y = 0) to 0.5 mm/s (y = 60 mm) at a constant acceleration of 0.002 mm/s². The film-casting condition corresponds to the evaporative regime, in which solvent evaporation occurs on timescales similar to that of solid-film formation [4]. In this regime, solvent evaporation at the meniscus induces a convective flow, driving the PGNs to concentrate and assemble at the contact line. The film thickness decreased with increasing coating speed, resulting in transitions from multilayers through a monolayer to a sub-monolayer with increasing y. This was verified by optical-microscopy observations of the boundaries between multilayer, bilayer, monolayer, and sub-monolayer regions, the last of which were identified by the presence of holes in the film, typically 1 µm or larger, in the optical images.

The autonomous small-angle x-ray scattering (SAXS) experiment was performed at the Complex Materials Scattering (11-BM CMS) beamline at the National Synchrotron Light Source II (NSLS-II), Brookhaven National Laboratory. As described previously [27, 28], experimental control was coordinated by combining three Python software processes: bluesky [22] for automated sample translations and data collection, SciAnalysis [21] for real-time analysis of newly collected SAXS images, and the above GPR-based optimization algorithms for decision-making. The incident x-ray beam was set to a wavelength of 0.918 Å (13.5 keV x-ray energy) and a beam size of 0.2 mm, with the sample positioned using motorized xy translation stages.
Transmission SAXS patterns were collected on an area detector (DECTRIS Pilatus 2M) located 5.1 m downstream of the sample, with an exposure time of 10 s per image. The SAXS results indicate that the polymer-grafted nanorods tend to form ordered domains in which the nanorods lie flat and parallel to the surface and align with their neighbors. The fitting of SAXS intensity profiles via real-time analysis allowed for the extraction of quantities such as: the scattering-vector position q of the diffraction peak corresponding to the in-plane inter-nanorod spacing d = 2π/q; the degree of anisotropy η ∈ [0, 1] of the in-plane inter-nanorod alignment, where η = 0 corresponds to random orientations and η = 1 to perfect alignment [30]; the azimuthal angle χ, or the factor cos(2χ), for the in-plane orientation of the inter-nanorod alignment; and the grain size ξ of the nanoscale ordered domains, which is inversely proportional to the diffraction-peak width and provides a measure of the extent of in-plane positional correlations between aligned nanorods. The analysis-derived best-fit values and associated variances for these parameters were passed to the GPR decision algorithms.

The three analysis-derived quantities ξ, η, and cos(2χ) were used as signals to steer the SAXS measurements as a function of the surface coordinates (x, y). For the initial part of the experiment (the first 4 h), where N denotes the number of measurements completed up to a given point in the experiment, the autonomous steering utilized the exploration mode based on model-uncertainty maxima [28] for ξ, η, and cos(2χ). For the latter part of the experiment (the next 11 h), the feature-maximization mode [28] was used for η, while keeping ξ and cos(2χ) in the exploration mode. We found that the nanorods in the ordered domains tended to orient with their long axes aligned along the x direction, i.e., perpendicular to the coating direction, and that ξ and η are strongly coupled.
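The conversion from fitted peak parameters to the real-space quantities described above can be sketched as follows; `peak_to_realspace` is a hypothetical helper, and the proportionality constant relating grain size to peak width is an illustrative (Scherrer-like) assumption rather than the exact factor used by SciAnalysis:

```python
import math

def peak_to_realspace(q_peak, delta_q):
    """Convert a fitted diffraction peak (position q_peak and width delta_q,
    both in inverse-length units) into real-space quantities.

    d = 2*pi/q is the inter-nanorod spacing; the grain size xi is taken
    inversely proportional to the peak width, with proportionality constant
    2*pi as an illustrative assumption.
    """
    d = 2.0 * math.pi / q_peak
    xi = 2.0 * math.pi / delta_q
    return d, xi
```

For a peak at q = 0.1 Å⁻¹ with a width of 0.01 Å⁻¹, this gives a spacing of about 63 Å and a grain size an order of magnitude larger; it is these best-fit values, together with their variances, that the GPR decision algorithm consumes.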
Figure 8A (top panels) shows the N-dependent evolution of the model for the grain-size distribution ξ over the film surface. It should be noted that the entire experiment took 15 h, and that the GPR-based autonomous algorithms identified the highly ordered region in a band of y values (between the red lines in Fig. 8A), corresponding to the uniform monolayer region, within the first few hours. By contrast, grid-based scanning-probe transmission SAXS measurements would not be able to identify large regions of interest at these resolutions in such a short amount of time.

The collected data are corrupted by non-i.i.d. measurement noise. While all signals are affected, we draw attention to the peak position q, because it shows the most obvious correlation between non-i.i.d. measurement noise and model certainty. The green circles in Figure 8B (middle panel) and C (right panel) highlight the areas where the measurement noise significantly affects the Gaussian-process predictive variance. Note that we did not use q for steering in this case, but the general principle we want to illustrate remains unchanged across all experimental results. Figure 8A shows the time evolution of the exploration of the model and the impact of non-i.i.d. noise on the model as well as on the uncertainty. If q had been used for steering without taking non-i.i.d. noise into account, the autonomous experiment would have been misled, because predictive uncertainty due to high noise levels would not have been considered. Figure 8 shows that the next suggested measurement depends strongly on the noise. We remind the reader at this point that the next optimal measurement happens at the maximum of the GP predictive variance. The locations of the optima (Figure 8C) are clearly different when non-i.i.d. noise is taken into account. The objective function without measurement noise (Fig. 8C, left panel) shows no preference for regions of high noise (green circles in Fig. 8B, middle panel), where preference means higher function values of the GP predictive variance. In contrast, the variance function that takes measurement noise into account (Fig. 8C, right panel) gives preference to regions (green circles) where the measurement noise of the data is high. This is a significant advantage, and it can only be accomplished by taking non-i.i.d. measurement noise into account. In conclusion, the model that assumes no noise looks better resolved, but it communicates a wrong level of confidence and misguides the steering. The model that takes non-i.i.d. noise into account finds the correct most-likely model and the corresponding uncertainty. The algorithm also took advantage of anisotropy by learning a slightly longer length scale in the x direction, which increased the overall model certainty. Note that the algorithm used an objective-function formulation that put emphasis on high-amplitude regions of the parameter space, which led to a higher resolution in areas of interest.

The above autonomous SAXS experiment revealed interesting features from the material-fabrication perspective as well. First, a somewhat surprising result is that the grain size is not observed to change significantly with surface energy (Figure 8A). Previous work on the assembly of polystyrene-grafted spherical gold nanoparticles [7] demonstrated a significant decrease in nanoparticle ordering when fabricating films on lower-surface-energy substrates (greater polymer-substrate interactions). Although the surface energies used in this study are similar, a different silane was used to modify the glass surface (phenylsilane vs. octyltrichlorosilane), which may differ in its interaction with polystyrene. We also note that PGN-substrate interactions are sensitive to the molecular orientation of the functional groups, which is known to be highly dependent on the functionalization procedure [13].
Second, an unexpected well-ordered band was identified (between the blue lines in Figure 8A), corresponding to the sub-monolayer region within an intermediate surface-energy range. We believe that this effect arises from instabilities associated with the solution meniscus near the middle of the coating blade. Rapid solvent evaporation often leads to undesirable effects, including the generation of surface-tension gradients, Marangoni flows, and subsequent contact-line instabilities. This can result in the formation of non-uniform morphologies, as demonstrated by the irregular region of larger grain size centered in the middle of the film and spanning the entire velocity range. Further investigations into these issues are currently in progress.

In this paper, we have demonstrated the importance of including inhomogeneous (i.e., non-i.i.d.) observation noise and anisotropy in Gaussian-process-driven autonomous materials-discovery experiments. It is very common in the scientific community to rely on Gaussian processes that ignore measurement noise or only include homogeneous noise, i.e., noise that is constant for every measurement. In the experimental sciences, and especially in experimental materials science, strong inhomogeneity in measurement noise can be present; accounting only for homogeneous (i.i.d.) measurement noise is therefore insufficient and leads to inaccurate models and, in the worst case, wrong interpretations and missed scientific discoveries. We have shown that it is straightforward to include non-i.i.d. noise in the steering and modeling process. Figure 5 clearly shows the benefit of including non-i.i.d. measurement noise in the Gaussian-process analysis, and Figure 6 supports this conclusion quantitatively by showing a faster decline of the approximation error.

The case for allowing anisotropy in the input space can be made whenever there is reason to believe that the data varies much more strongly in certain directions than in others.
This is often the case when the directions have fundamentally different physical meanings. For instance, one direction may represent a temperature while another defines a physical distance. In such cases, accounting for anisotropy can be vastly beneficial, since the Gaussian process will learn the different length scales and use them to lower the overall uncertainty. Figure 7 shows how common anisotropy is, even in cases where it would normally not be expected, and how including it decreases the approximation error of the Gaussian-process posterior mean. In our example, all axes carry the unit of temperature; even so, anisotropy is present, and accounting for it has a significant impact on the approximation error.

In our autonomous synchrotron x-ray experiment, we have seen how misleading the no-measurement-noise assumption can be. While the Gaussian-process posterior mean that assumes no noise is much more detailed in Figure 8, it is not supported by the data, which are subject to non-i.i.d. noise. In addition, we have seen that the steering accounts for the measurement noise when it is included, which leads to a much smarter decision algorithm that knows where data are of poor quality and have to be substantiated. We showed that, without accounting for non-i.i.d. noise, this behavior would not arise; measurements would be placed sub-optimally, wasting device access, staff time, and other resources.

It is important to discuss the computational costs that come with accounting for non-i.i.d. noise and anisotropy. While non-i.i.d. noise can be included at no additional computational cost, anisotropy potentially comes at a price. The more complex the anisotropy, the more hyperparameters have to be found. The number of hyperparameters translates directly into the dimensionality of the space over which the likelihood is defined. The training process to find the hyperparameters will therefore take longer the more of them there are.
However, the cost per function evaluation will not change significantly. Therefore, instead of avoiding valuable anisotropy, we should make use of modern, efficient optimization methods. While our results have shown that accounting for non-i.i.d. noise and anisotropy is highly valuable for the efficiency of an autonomously steered experiment, we have only scratched the surface of the possibilities. Both proposed improvements can be seen as part of a larger theme commonly referred to as kernel design. The possibilities for improving and tailoring Gaussian-process-driven steering of experiments are vast. Well-designed kernels have the power to extract sub-spaces of the Hilbert space of functions, which means we can put constraints on the functions we want to consider as our model. We will look into the impact of advanced kernel designs on autonomous data acquisition in the near future.
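As noted above, non-i.i.d. noise enters the Gaussian process at essentially no extra cost: the per-measurement variances are simply added to the diagonal of the covariance matrix before it is inverted. A minimal sketch, assuming an isotropic squared-exponential kernel (not the production implementation used at the beamline):

```python
import numpy as np

def gp_posterior_variance(X, noise_var, X_star, length_scale=1.0, signal_var=1.0):
    """GP predictive variance at the points X_star, given measurements at X
    whose individual (non-i.i.d.) noise variances are noise_var.

    The inhomogeneous noise enters in exactly one place: the diagonal of
    the data covariance matrix.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return signal_var * np.exp(-0.5 * d2 / length_scale**2)

    K = k(X, X) + np.diag(noise_var)   # per-point noise on the diagonal
    K_s = k(X, X_star)                 # cross-covariances, shape (n, m)
    v = np.linalg.solve(K, K_s)
    return signal_var - np.einsum('ij,ij->j', K_s, v)
```

Running this with one precise and one noisy measurement shows that the predictive variance stays high near the noisy point, so a variance-maximizing acquisition function will revisit noisy regions, which is exactly the behavior seen in Figure 8C.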
The work was partially funded through the Center for Advanced Mathematics for Energy Research Applications (CAMERA), which is jointly funded by the Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) programs within the Department of Energy's Office of Science, under Contract No. DE-AC02-05CH11231. This work was conducted at Lawrence Berkeley National Laboratory and Brookhaven National Laboratory. This research used resources of the Center for Functional Nanomaterials and the National Synchrotron Light Source II, which are U.S. DOE Office of Science Facilities, at Brookhaven National Laboratory under Contract No. DE-SC0012704. Partial funding was supplied by the Air Force Research Laboratory Materials and Manufacturing Directorate and the Air Force Office of Scientific Research.
Author Contributions Statement
M.N., K.G.Y., and M.F. developed the key ideas. M.N. devised the necessary algorithm, formulated therequired mathematics, and implemented the computer codes. M.F. and K.G.Y. designed the x-ray scatteringexperiment. R.A.V. conceived the material and process design. J.K.S. prepared the samples and performedpreliminary characterizations. M.N., K.G.Y., M.F., G.D., and R.L. performed the autonomous experiments.K.G.Y. analyzed the experimental data. M.N. analyzed the algorithm performance and wrote the first draftof the manuscript. M.F. and K.G.Y. supervised the work. All authors discussed the results and commentedon the manuscript.
Additional Information
Competing Interests
The authors declare no competing interests.
Figure 7: Visualization of the grain size as a function of temperature history for a simple model of block-copolymer grain coarsening. The figure demonstrates that when describing physical systems in high-dimensional spaces, strong anisotropy is frequently observed; only by taking this into account when estimating errors will experimental guidance be optimal. (a) Simulated temperature histories and their corresponding grain sizes, represented by color. The majority of histories terminate in a small grain size (blue lines); a small, select set of histories yield large grain sizes (dark red lines). (b) Example two-dimensional slice through the 11-dimensional parameter space; the anisotropy is clearly visible. (c) A different two-dimensional slice with no significant anisotropy present. (d) The estimated maximum standard deviation across the 11-dimensional domain as a function of the number of measurements during a synthetic autonomous experiment.

Figure 8: (top row, A) Results of an autonomous SAXS experiment probing the distribution of grain size (ξ) in a combinatorial nanocomposite sample, as a function of coordinates (x, y) representing a two-dimensional sample-processing parameter space, for an increasing number of measurements (N). The sample consisted of a flow-coated film of polymer-grafted nanorods on a surface-treated substrate, where the substrate surface energy increased linearly from 30.5 mN/m (hydrophobic) at x = 0 to 70.2 mN/m (hydrophilic) at x ≈ 50 mm, and the coating speed increased at constant acceleration (0.002 mm/s²) from 0 mm/s (thicker film) at y = 0 to 0.45 mm/s (thinner film) at y ≈ 50 mm. The autonomous experiment successfully identified a well-ordered region (between the red lines) that corresponded to uniform monolayer domains. Blue lines mark the region of solution-meniscus instability (see text). The points show the locations of measured data points; the same axes and orientation are used in the subsequent plots of this figure. (middle row, B, from the left) An exact Gaussian-process interpolation of the complete measured data set for the peak position q. The data are corrupted by measurement errors, which corrupt the model if standard, exact interpolation techniques (including GPR) are used. The green circles mark the regions of the largest variances in the model and the corresponding high errors (measurement variances) recorded during the experiment. On the right is the Gaussian-process model of q that takes the non-i.i.d. measurement variances into account; this model does not show any of the artifacts visible in the exact GPR interpolation. (bottom row, C) The final objective functions for no noise and non-i.i.d. noise in q, which have to be maximized to determine the next optimal measurement. If the experiment had been steered using the posterior variances in q without accounting for non-i.i.d. observation noise, the autonomous experiment would have been misled significantly.

References

[1] Ann Almgren, Phil DeMar, Jeffrey Vetter, Katherine Riley, Katie Antypas, Deborah Bard, Richard Coffey, Eli Dart, Sudip Dosanjh, Richard Gerber, et al. Advanced Scientific Computing Research exascale requirements review. An Office of Science review sponsored by Advanced Scientific Computing Research, September 27-29, 2016, Rockville, Maryland. Technical report, Argonne National Laboratory, Argonne, IL, 2017.
[2] Prasanna V. Balachandran, Dezhen Xue, James Theiler, John Hogden, and Turab Lookman. Adaptive strategies for materials design using uncertainties.
Scientific Reports, 6:19660, 2016.
[3] Cristiano Ballabio, Emanuele Lugato, Oihane Fernández-Ugalde, Alberto Orgiazzi, Arwyn Jones, Pasquale Borrelli, Luca Montanarella, and Panos Panagos. Mapping LUCAS topsoil chemical properties at European scale using Gaussian process regression. Geoderma, 355:113912, 2019.
[4] Xiaodan Bao, Leo Shaw, Kevin Gu, Michael F. Toney, and Zhenan Bao. The meniscus-guided deposition of semiconducting polymers. Nature Communications, 9:534, 2018.
[5] H. Bijl. Gaussian Process Regression Techniques with Applications to Wind Turbines. PhD thesis, Delft University of Technology, 2016.
[6] Ruijin Cang, Hechao Li, Hope Yao, Yang Jiao, and Yi Ren. Improving direct physical properties prediction of heterogeneous materials from imaging data via convolutional neural network and a morphology-aware generative model. Computational Materials Science, 150:212–221, 2018.
[7] Justin Che, Kyoungweon Park, Christopher A. Grabowski, Ali Jawaid, John Kelley, Hilmar Koerner, and Richard A. Vaia. Preparation of ordered monolayers of polymer grafted nanoparticles: Impact of architecture, concentration, and substrate surface energy. Macromolecules, 49:1834–1847, 2016.
[8] Nian-Sheng Cheng. Formula for the viscosity of a glycerol-water mixture. Industrial & Engineering Chemistry Research, 47(9):3285–3288, 2008.
[9] Edwin B. Dean. Design of experiments, 2000.
[10] Gregory S. Doerk and Kevin G. Yager. Beyond native block copolymer morphologies. Molecular Systems Design & Engineering, 2(5):518–538, 2017.
[11] Ronald A. Fisher. The arrangement of field experiments. In Breakthroughs in Statistics, pages 82–91. Springer, 1992.
[12] Alexander Forrester, Andras Sobester, and Andy Keane. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons, 2008.
[13] Jan Genzer, Kirill Efimenko, and Daniel A. Fischer. Molecular orientation and grafting density in semifluorinated self-assembled monolayers of mono-, di-, and trichloro silanes on silica substrates. Langmuir, 18(24):9307–9311, 2002.
[14] Richard Gerber, James Hack, Katherine Riley, Katie Antypas, Richard Coffey, Eli Dart, Tjerk Straatsma, Jack Wells, Deborah Bard, Sudip Dosanjh, et al. Crosscut report: Exascale requirements reviews, March 9-10, 2017, Tysons Corner, Virginia. An Office of Science review sponsored by Advanced Scientific Computing Research, Basic Energy Sciences, Biological and Environmental Research, Fusion Energy Sciences, High Energy Physics, and Nuclear Physics. Technical report, Oak Ridge National Laboratory, Oak Ridge, TN, 2018.
[15] G. M. Godaliyadda, Dong Hye Ye, Michael D. Uchic, Michael A. Groeber, Gregery T. Buzzard, and Charles A. Bouman. A supervised learning approach for dynamic sampling. Electronic Imaging, 2016(19):1–8, 2016.
[16] Salman Habib, Robert Roser, Richard Gerber, Katie Antypas, Katherine Riley, Tim Williams, Jack Wells, Tjerk Straatsma, A. Almgren, J. Amundson, et al. ASCR/HEP exascale requirements review report. arXiv preprint arXiv:1603.09303, 2016.
[17] A. Hanuka, J. Duris, J. Shtalenkova, D. Kennedy, A. Edelen, D. Ratner, and X. Huang. Online tuning and light source control using a physics-informed Gaussian process. arXiv preprint arXiv:1911.01538, 2019.
[18] Tzu-Kuo Huang et al. A technical introduction to Gaussian process regression. 2006.
[19] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013.
[20] Malte Kuss. Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. PhD thesis, Technische Universität Darmstadt, 2006.
[21] Brookhaven National Laboratory. SciAnalysis. https://github.com/CFN-softbio/SciAnalysis, 2015.
[22] Brookhaven National Laboratory. Bluesky. https://github.com/NSLS-II/bluesky, 2015.
[23] Pawel W. Majewski and Kevin G. Yager. Rapid ordering of block copolymer thin films. Journal of Physics: Condensed Matter, 28(40):403002, 2016.
[24] Alondra Martínez, Jennifer Martínez, Hebert Pérez-Rosés, and Ricardo Quirós. Image processing using Voronoi diagrams. In IPCV, pages 485–491, 2007.
[25] Andrew McHutchon and Carl E. Rasmussen. Gaussian process training with input noise. In Advances in Neural Information Processing Systems, pages 1341–1349, 2011.
[26] Michael D. McKay, Richard J. Beckman, and William J. Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979.
[27] Marcus M. Noack, Kevin G. Yager, Masafumi Fukuto, Gregory S. Doerk, Ruipeng Li, and James A. Sethian. A kriging-based approach to autonomous experimentation with applications to x-ray scattering. Scientific Reports, 9:11809, 2019.
[28] Marcus M. Noack, Gregory S. Doerk, Ruipeng Li, Masafumi Fukuto, and Kevin G. Yager. Advances in kriging-based autonomous x-ray scattering experiments. Scientific Reports, 10:1325, 2020.
[29] Ghanshyam Pilania, Chenchen Wang, Xun Jiang, Sanguthevar Rajasekaran, and Ramamurthy Ramprasad. Accelerating materials property predictions using machine learning. Scientific Reports, 3:2810, 2013.
[30] Wilhelm Ruland and Bernd Smarsly. SAXS of self-assembled oriented lamellar nanocomposite films: an advanced method of evaluation. Journal of Applied Crystallography, 37:575–584, 2004.
[31] Thomas J. Santner, Brian J. Williams, and William Notz. The Design and Analysis of Computer Experiments, volume 1. Springer, 2003.
[32] Nicole M. Scarborough, G. M. Dilshan P. Godaliyadda, Dong Hye Ye, David J. Kissick, Shijie Zhang, Justin A. Newman, Michael J. Sheedlo, Azhad U. Chowdhury, Robert F. Fischetti, Chittaranjan Das, et al. Dynamic x-ray diffraction sampling for protein crystal positioning. Journal of Synchrotron Radiation, 24(1):188–195, 2017.
[33] Eric Schulz, Maarten Speekenbrink, and Andreas Krause. A tutorial on Gaussian process regression with a focus on exploration-exploitation scenarios. bioRxiv, page 095190, 2017.
[34] Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, and Karsten Borgwardt. Efficient inference in matrix-variate Gaussian models with iid observation noise. In Advances in Neural Information Processing Systems, pages 630–638, 2011.
[35] Jana Thayer, Daniel Damiani, Mikhail Dubrovin, Christopher Ford, Wilko Kroeger, Christopher Paul O'Grady, Amedeo Perazzo, Murali Shankar, Matt Weaver, Clemens Weninger, et al. Data processing at the Linac Coherent Light Source. Pages 32–37. IEEE, 2019.
[36] Francesco Vivarelli and Christopher K. I. Williams. Discovering hidden features with Gaussian processes regression. In Advances in Neural Information Processing Systems, pages 613–619, 1999.
[37] Christopher K. I. Williams and Carl Edward Rasmussen.