Machine-Learning Prediction for Quasi-PDF Matrix Elements
MMSUHEP-19-021
Rui Zhang, Zhouyou Fan, Ruizi Li, Huey-Wen Lin, and Boram Yoon
Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824
Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824
Computer, Computational, and Statistical Sciences CCS-7, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
There have been rapid developments in the direct calculation in lattice QCD (LQCD) of the Bjorken-$x$ dependence of hadron structure through large-momentum effective theory (LaMET). LaMET overcomes the previous limitation of LQCD to moments (that is, integrals over Bjorken-$x$) of hadron structure, allowing LQCD to directly provide the kinematic regions where the experimental values are least known. LaMET requires large-momentum hadron states to minimize its systematics and allow us to reach small $x$ reliably. This means that very fine lattice spacing to minimize lattice artifacts at order $(P_z a)^n$ will become crucial for next-generation LaMET-like structure calculations. Furthermore, such calculations require operators with long Wilson-link displacements, especially in finer lattice units, increasing the communication costs relative to that of the propagator inversion. In this work, we explore whether machine-learning (ML) algorithms can make predictions of correlators to reduce the computational cost of these LQCD calculations. We consider two algorithms, gradient-boosting decision tree and linear models, applied to LaMET data, the matrix elements needed to determine the kaon and $\eta_s$ unpolarized parton distribution functions (PDFs), meson distribution amplitude (DA), and the nucleon gluon PDF. We find that both algorithms can reliably predict the target observables with different prediction accuracy and systematic errors. The predictions from smaller displacement $z$ to larger ones work better than those for momentum $p$ due to the higher correlation among the data.

I. INTRODUCTION
In the early days, probing hadron structure with lattice QCD (LQCD) was limited to only the first few moments, due to complications arising from the breaking of rotational symmetry by the discretized Euclidean spacetime. The nonzero lattice spacing breaks the symmetry group of Euclidean spacetime from $O(4)$ to the discrete hypercubic subgroup $H(4)$. Due to the reduced symmetry, the required operators are more complicated and often either suffer from divergences or mix with other operators under renormalization. This is treatable but complicated. As a result, even with increasing computational resources becoming available to the lattice-QCD community, LQCD hadronic structure calculations were limited to the lowest few moments (see Refs. [1, 2] and references within for more details). Although modeling the $x$ dependence to reproduce the calculated lattice moments, and thereby gain information on the $x$ dependence [3], was attempted, this only gives the combinations of the difference between quark and antiquark contributions rather than individual (anti)quark contributions. Experiments such as E665 at FNAL can probe nucleon sea flavor asymmetry, meaning that lattice QCD would be excluded if it could only apply traditional moment calculations. Similarly, STAR at RHIC is probing the polarized (anti)quark structure of the nucleon. The future electron-ion collider (EIC) will further study sea structure. Facing these challenges, LQCD required a new computationally friendly approach to extend its applicability to calculations of PDFs and catch up with ongoing experimental efforts.

Large-momentum effective theory (LaMET) [4] is one of the most widely adopted new methods for calculating the full $x$ dependence of hadron structure. In the LaMET framework, we take an operator containing an integral of gluonic field strength along a line and boost the nucleon momentum toward the speed of light, tilting the spacelike line segment toward the light-cone direction.
The time-independent, nonlocal (in space) correlators at finite $P_z$ can be directly evaluated on the lattice. For example, the quark unpolarized distribution of a hadron can be calculated via
$$ q_\text{lat}(x,\mu,P_z) = \int \frac{dz}{4\pi}\, e^{izk} \left\langle \vec{P} \right| \bar\psi(z)\,\Gamma \left(\prod_n U_z(n\hat{z})\right) \psi(0) \left| \vec{P} \right\rangle, \tag{1} $$
where $U_z$ is a discrete gauge link in the $z$ direction, $\Gamma = \gamma_t$, $x = k/P_z$, $\mu$ is the renormalization scale and $\vec{P}$ is the momentum of the hadron, taken such that $P_z \to \infty$. The function $q_\text{lat}(x,\mu,P_z)$, often called the "quasi-PDF" [5], is related to the light-cone PDF through a factorization theorem, where the former can be factorized into a perturbative matching coefficient and the latter, up to power corrections suppressed by the nucleon momentum. This factorization theorem is founded in LaMET [4, 6–9], where the matching coefficient can be calculated exactly in perturbation theory. Lattice-QCD results using LaMET already include the isovector quark PDF of the nucleon [10–14], the pion generalized parton distribution (GPD) [15], the meson DAs [16, 17] and the nonperturbative renormalization in the regularization-independent momentum subtraction (RI/MOM) scheme [18, 19]. Certain technical issues regarding the nonperturbative renormalization were raised and addressed in Refs. [14, 18–23]. The finite-volume effect in the nucleon quasi-PDF was studied in [24].

Even with these promising results published and efforts ongoing, much work remains to be done. For example, most work so far has been limited to a single ensemble; more detailed studies incorporating the systematic errors from lattice artifacts, such as finite volume and lattice spacing, are necessary to reach precision LQCD PDFs. Larger boost momentum in the hadron is important to suppress finite-momentum corrections, as well as getting the antiquark distribution and small-$x$ quark distribution corrections.
Ensembles with smaller lattice spacing ( a − > P z , a smaller lattice spacing is needed to control the $(P_z a)^n$ lattice artifacts, similar to how heavy-quark studies must control the heavy-quark mass artifacts at order $(m_q a)^n$. Likely, more than O (100 , $M_\pi L \approx 4$. More communication costs will be incurred transporting the Wilson link from one side of the lattice to the other, which can easily become a dominating cost for the calculation. Although optimizing communication efficiency may address the latter problem, we are hoping to find a method that will work for both the large-momentum and Wilson-link displacement issues that are characteristic of LaMET and similar approaches.

Recently, the authors of Ref. [25] introduced a machine-learning (ML) approach predicting observables by taking advantage of the correlations between lattice QCD observables. Two types of data with high-statistics measurements, O (100 , ×
192 and above). Although this paper focuses on the discussion with quasi-PDF, what we learn here also applies to the pseudo-PDF [27–30] correlators since the building blocks of matrix elements are the same.

The structure of this paper is as follows: In Sec. II we briefly describe the two ML algorithms used in this work. Section III demonstrates the application of both algorithms to LaMET-type observables, including the correlators from the meson distribution amplitude, kaon and $\eta_s$ parton distribution functions and nucleon gluon parton distribution function. We compare results of the ML predictions. We summarize the conclusions and future prospects of this work in Sec. IV.

II. MACHINE-LEARNING ALGORITHM
ML works by optimizing a prediction model mapping between input and output data, creating a function approximating the relationship between them, inferred from data. The model is built from a set of data whose label (output) is known, and it is applied to make predictions of the labels for a new set of data whose label is unknown, assuming that there exists a consistent mapping function between the input and output data. In this study, we use regression algorithms, a class of ML approaches, to make quantitative predictions of lattice-QCD measurements. Specifically, the supervised ML regression algorithms we use are simple linear regression and the gradient boosting tree (GBT) algorithm [31] implemented in the Python scikit-learn package [32]. Although all the data we use in this test have labels, we divide them into labeled and unlabeled sets by hiding the labels of the unlabeled dataset. Then, the labeled dataset is used for the training and bias-correction procedures, while the unlabeled dataset is used to test the trained regression algorithm.

[Footnote: Another paper [26] applied the neural network (NN) algorithm to the inversion problem to reconstruct PDF from pseudo-PDF matrix elements, though the model was trained and tested on mock datasets instead of real lattice data.]

Gradient boosting is one of the techniques creating a strong model from an ensemble of weak prediction models [33, 34]. For GBT, shallow decision trees are used as the weak learners (nested mappings as elements of a more complicated function approximation), added in series to build the model:
$$ f_k(x) = \sum_{i=1}^{k} r_i h_i(x), \qquad f(x) = f_{N_\text{est}}(x), \tag{2} $$
where $N_\text{est}$ is the number of estimators, $r_i$ is the learning rate, and $h_i(x)$ is the function used in the base decision tree to minimize the loss function $L$:
$$ h_i(x) = \arg\min_h \sum_j L\big(y_j, f_{i-1}(x_j) + h(x_j)\big), \tag{3} $$
where the subscript $j$ iterates over the training-data samples.
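As an illustration of Eqs. (2) and (3), the following is a minimal sketch of gradient boosting for the mean-squared-error loss, written in plain Python rather than with scikit-learn. It is not the implementation used in this work: the weak learners here are depth-1 stumps on a single input (the study uses depth-3 trees), the model starts from zero, and the toy data, `fit_stump` and `gradient_boost` are hypothetical names introduced only for this example. For the squared-error loss, minimizing Eq. (3) over stumps amounts to a least-squares fit of the stump to the current residuals.

```python
import random

def fit_stump(xs, residuals):
    """Least-squares depth-1 stump: one threshold, one constant per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - mean_l) ** 2 for r in left) + sum((r - mean_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    _, threshold, mean_l, mean_r = best
    return lambda x: mean_l if x <= threshold else mean_r

def gradient_boost(xs, ys, n_est=150, rate=0.1):
    """Build f(x) = sum_i rate * h_i(x); each h_i is fit to the residuals of f_{i-1}."""
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(n_est):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: sum(rate * h(x) for h in learners)

# Toy data (an assumption for illustration): a noisy nonlinear relation.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [x ** 2 + rng.gauss(0.0, 0.05) for x in xs]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

The two arguments `n_est` and `rate` play the roles of $N_\text{est}$ and $r$: lowering `rate` shrinks each step, so more estimators are needed to reach the same training error.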
In this work, the loss function is chosen to be the mean squared error, and the depth of the decision tree is fixed at 3. To optimize the ML predictions, we must choose model parameters in Eq. (2) within the proper range. Two parameters are tuned explicitly in this process: the learning rate $r$, and the number of estimators $N_\text{est}$.

The prediction accuracy of GBT is compared with that of the linear regression model
$$ f_\text{lin}(\vec{x}) = \theta_0 + \vec{\theta}\cdot\vec{x} \tag{4} $$
for the same set of data. Quantitatively, the quality of the prediction accuracy of the regression models is represented by the fit variance $F_v$ defined as
$$ F_v = 1 - \left( \left\langle (C_\text{ul} - C_\text{pred})^2 \right\rangle - \left\langle C_\text{ul} - C_\text{pred} \right\rangle^2 \right) / \sigma^2, \tag{5} $$
where $C_\text{ul}$ and $C_\text{pred}$ are the observed and predicted measurements on the unlabeled dataset, respectively, and $\sigma^2$ is the variance of the observed measurements. A higher value of $F_v$ indicates better fit quality, and the maximum value of $F_v$ is 1, which corresponds to a perfect prediction, $C_\text{pred} = C_\text{ul}$. In practical calculations, $F_v$ can be calculated on the bias-correction dataset of the labeled dataset, which is described below. However, we use the unlabeled dataset for the calculation of $F_v$, because $C_\text{ul}$ are available in this test study.

Prediction from an ML algorithm may have bias due to prediction error. We follow the bias-correction strategy introduced in Ref. [25] to remove the bias in our estimate and define the bias-corrected prediction as
$$ \langle C_\text{pred,BC} \rangle = \langle C_\text{pred} \rangle_\text{ul} + \langle C_\text{BC} - C_\text{pred} \rangle_\text{BC}, \tag{6} $$
where the brackets with subscripts "ul" and "BC" denote averages over the unlabeled and bias-correction datasets, respectively. After bias correction, the expectation value of the prediction becomes the same as the expectation value of the ground truth, and its statistical error includes the systematic error due to inaccurate predictions.
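The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Eq. (5) is read as one minus the variance of the prediction error divided by the variance of the observed values (population variances are used here), and the function names and toy numbers are hypothetical.

```python
from statistics import fmean, pvariance

def fit_variance(observed, predicted):
    """F_v = 1 - Var(observed - predicted) / Var(observed), following Eq. (5)."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    return 1.0 - pvariance(diffs) / pvariance(observed)

def bias_corrected_mean(pred_ul, obs_bc, pred_bc):
    """<C_pred>_ul + <C_BC - C_pred>_BC, following Eq. (6)."""
    correction = fmean(o - p for o, p in zip(obs_bc, pred_bc))
    return fmean(pred_ul) + correction

# Toy case: a predictor that is perfectly correlated but shifted by +0.5.
observed_ul = [1.0, 2.0, 3.0, 4.0]
predicted_ul = [o + 0.5 for o in observed_ul]
fv = fit_variance(observed_ul, predicted_ul)

obs_bc = [2.0, 3.0]
pred_bc = [o + 0.5 for o in obs_bc]
corrected = bias_corrected_mean(predicted_ul, obs_bc, pred_bc)
```

In this toy case the constant shift does not lower $F_v$, since only the fluctuation of the prediction error enters Eq. (5); the shift is instead removed by the bias-correction term, so `corrected` agrees with the mean of `observed_ul`.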
After the bias correction, therefore, our main concern is reducing the statistical error of the final estimate.

We normalize the labeled data so that the standard deviation of each input measurement becomes 1 before we pass it to the ML algorithms. Each subset of the data (training, bias-correction, and unlabeled datasets) described in Ref. [25] is chosen such that the configurations are evenly distributed. The convention of notations throughout this work is:

Subscript   Convention
in          input to the model
pred        prediction of the model
pred,BC     bias-corrected prediction
tr          labeled training data
BC          labeled bias correction data
lb          all labeled data
ul          unlabeled data

TABLE I. The convention for the subscripts we use in this work.
The errors of the predictions are estimated using the bootstrap method. We randomly pick bootstrap samples of the labeled and unlabeled datasets, and partition each labeled sample into training and BC datasets. We train the model and estimate the bias correction on each bootstrap sample of labeled data. We then make predictions on the corresponding sample of unlabeled data and calculate the average of the results for the unlabeled data. The error is then estimated over all bootstrap samples.
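The bootstrap procedure can be sketched as follows. This is a simplified illustration, not the analysis code of this work: it uses a one-input least-squares line in place of the regression models, synthetic data, and hypothetical names (`fit_line`, `bootstrap_prediction`); the split into training and bias-correction parts is done by slicing each resampled labeled set.

```python
import random
from statistics import fmean, stdev

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x for a single input variable."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def bootstrap_prediction(labeled, unlabeled_x, n_train, n_boot=200, seed=1):
    """Return the mean and bootstrap error of the bias-corrected prediction."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        lab = rng.choices(labeled, k=len(labeled))          # resample labeled (x, y) pairs
        ul = rng.choices(unlabeled_x, k=len(unlabeled_x))   # resample unlabeled inputs
        train, bc = lab[:n_train], lab[n_train:]            # training / bias-correction split
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        pred_ul = fmean(a + b * x for x in ul)
        correction = fmean(y - (a + b * x) for x, y in bc)
        estimates.append(pred_ul + correction)
    return fmean(estimates), stdev(estimates)

# Synthetic data (an assumption): target y is strongly correlated with input x.
rng = random.Random(0)
def sample(n):
    xs = [rng.gauss(1.0, 0.2) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.05)) for x in xs]

labeled = sample(100)
unlabeled_x = [x for x, _ in sample(300)]
mean, err = bootstrap_prediction(labeled, unlabeled_x, n_train=50)
```

Because the model is refit and the bias correction re-estimated inside every bootstrap sample, the spread of `estimates` reflects both the statistical fluctuation of the inputs and the uncertainty of the prediction itself.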
III. APPLICATION TO LATTICE QUASI-PDF MATRIX ELEMENTS

A. Predictions of meson quasi-DA measurements
Meson distribution amplitudes (DAs) $\phi_M$ are important universal quantities appearing in many factorization theorems, which allow for the description of exclusive processes at large momentum transfers $Q \gg \Lambda$ [35, 36]. Such quantities can be calculated using large-momentum effective theory (LaMET) [4, 8] by calculating the time-independent spatial correlators (the quasi-DA) on the lattice, followed by a matching procedure with corrections suppressed by the hadron momentum. The light-cone meson DA
$$ \phi_M(x,\mu) = \frac{i}{f_M} \int \frac{d\xi}{2\pi}\, e^{i(x-1)\xi n\cdot P} \langle M(P) | \bar\psi_1(0)\, n\cdot\gamma\,\gamma_5\, U(0,\xi n)\, \psi_2(\xi n) | 0 \rangle \tag{7} $$
can be extracted from the quasi-DA
$$ \tilde\phi_M(x,\mu_R,P_z) = \frac{i}{f_M} \int \frac{dz}{2\pi}\, e^{i(x-1) z P_z} \langle M(P) | \bar\psi_1(0)\, \gamma_z\gamma_5 \prod_{x=0}^{z-1} U_z(x,t)\, \psi_2(z) | 0 \rangle \tag{8} $$
through the matching [37]
$$ \tilde\phi_M(x,\mu_R,P_z) = \int dy\, Z_\phi(x,y,\mu,\mu_R,P_z)\, \phi_M(y,\mu) + \mathcal{O}\left( \frac{\Lambda_\text{QCD}^2}{P_z^2}, \frac{m_M^2}{P_z^2} \right) \tag{9} $$
according to LaMET. The quasi-DA can be obtained by computing the following correlators for $K^-$ and $\eta_s$, as presented in Refs. [16, 17]:
$$ C(z,P,t) = \langle 0 | \int d^3y\, e^{i\vec{P}\cdot\vec{y}}\, \bar\psi_1(\vec{y},t)\, \gamma_z\gamma_5 \prod_{x=0}^{z-1} U_z(\vec{y}+x\hat{z},t)\, \psi_2(\vec{y}+z\hat{z},t)\; \bar\psi_2(0,0)\gamma_5\psi_1(0,0) | 0 \rangle, \tag{10} $$
where $\{\psi_1, \psi_2\}$ are $\{u, s\}$ for $K^-$ and $\{s, s\}$ for $\eta_s$, and $U(\vec{x}, \vec{x}+z)$ is the Wilson line connecting lattice site $\vec{x}$ to $\vec{x}+z\hat{z}$.

We perform a calculation using gauge ensembles with clover valence fermions on a $48^3 \times$
144 lattice with 2+1+1 flavors (degenerate up and down, strange, and charm degrees of freedom) of highly improved staggered quarks (HISQ) [38] generated by the MILC Collaboration [39]. The lattice spacing is $a \approx 0.06$ fm, and $m_\pi^\text{sea} = 310$ MeV. Hypercubic (HYP) smearing [40] is applied to the configurations. The bare quark masses and clover parameters are tuned to recover the lowest pion mass of the staggered quarks in the sea. Correlators are calculated from momentum-smearing sources [41] using 20 source locations on each of the 95 configurations (1900 measurements in total).

We make two predictions using the ML algorithm. One is to predict the correlators at larger link length $z_\text{pred}$ from the correlators at $z_\text{in} < z_\text{pred}$. The other is to predict the correlators of larger momentum $p_\text{pred}$ from the correlators of $p_\text{in} < p_\text{pred}$.

To determine what input data to use for these predictions, we first check the correlations among datasets with different momenta, link lengths and timeslices. The results are shown in Fig. 1. Here, we set the target data to be the 2-point quasi-DA correlators at $p_\text{pred} = 5$, $z_\text{pred} = 4$ with input data $p_\text{in} = 4$, $z_\text{in} = 4$ for $p$-prediction and $p_\text{in} = 5$, $z_\text{in} < z_\text{pred}$ for $z$-prediction. We select the timeslice $t_\text{pred} = 7$ to check the correlations. Despite the larger error, larger timeslices have a weaker correlation with the target data. This suggests that we should use input data close to the timeslice of the target data. On the other hand, we should be able to extend the range of momentum or links of the input.

In the training process, we scanned seven values of the learning rate and five values of the number of estimators. The corresponding fit variances are plotted as heatmaps in Fig. 2. For both $p$-predictions and $z$-predictions, we selected the parameters $r = 0.1$, $N_\text{est} = 150$ as having the highest fit quality in both cases; these will be used for further meson-DA predictions.

The datasets were evenly distributed into three parts: training data, bias-correction data, and unlabeled test data. In practice, we want to minimize the labeled data size without sacrificing much prediction quality.
We varied the amount of training data and bias-correction data from 300 to 500, while keeping the number of unlabeled test data $N_\text{ul} = 900$ fixed, to look for the best trade-off between reduced data size and prediction quality. The results are shown in Fig. 3. When the correlation is strong, a small number of training and bias-correction data provides a precise estimate that is very close to the true observations for the unlabeled dataset. When the correlation is weak, the prediction becomes more precise as one increases the size of the training or the bias-correction datasets. Based on the plot, we picked $N_\text{tr} = 400$, $N_\text{BC} = 500$ for further estimations.

To further check the consistency of our predictions with the observations, we calculate the effective mass from $C$ and compare the results. The effective mass is defined as
$$ E(t) = \ln \frac{C(t)}{C(t+1)}. \tag{11} $$

FIG. 1. Correlations between target $\eta_s$ DA $C$ data at $z_\text{pred} = 4$, $p_\text{pred} = 5$, $t_\text{pred} = 7$ with input data at a different link length (momentum) and timeslice for $z$-prediction (left, axes $z_\text{in}$ and $t_\text{in}$) and $p$-prediction (right, axes $p_\text{in}$ and $t_\text{in}$). The correlation decays quickly, especially at larger $t$.
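The effective mass of Eq. (11) can be evaluated directly from a list of correlator values. The short sketch below uses a synthetic single-exponential correlator (an assumption chosen so the answer is known) and a hypothetical helper name `effective_mass`.

```python
import math

def effective_mass(correlator):
    """E(t) = ln[C(t) / C(t+1)] for every t at which C(t+1) is available."""
    return [math.log(c_t / c_next) for c_t, c_next in zip(correlator, correlator[1:])]

# Synthetic correlator C(t) = A * exp(-E * t) with a single state of energy E.
energy = 0.69
correlator = [2.5 * math.exp(-energy * t) for t in range(8)]
masses = effective_mass(correlator)
```

For a single exponential every entry of `masses` equals the input energy; with real data, excited-state contamination at small $t$ and noise at large $t$ make $E(t)$ depend on the timeslice.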
FIG. 2. Fit variance $F_v$ of the unlabeled $\eta_s$ DA data for the $p_\text{pred} = 5$, $z_\text{pred} = 4$ prediction at $t_\text{pred} = 4$ from $z_\text{in} = 3$ (left) or $p_\text{in} \in [3, 4]$ (right), as a function of the number of estimators $N_\text{est}$ and the learning rate $r$. $N_\text{tr} = 400$, $N_\text{ul} = 1000$. It is clear that more estimators are needed for smaller learning rate. Increasing $N_\text{est}$ without worsening the prediction indicates that the model is robust to overfitting.

Type    Input                                Method  E_tr       E_pred      E_pred,BC   E_ul        F_v
p-pred  p_in ∈ [3, 4], z_in = 4, t_in = 7    GBT     0.679(11)  0.684(13)   0.683(14)   0.6923(80)  0.50(13)
                                             linear  0.679(11)  0.6960(86)  0.6961(91)  0.6920(74)  0.911(43)
z-pred  p_in = 5, z_in ∈ [0, 3], t_in = 7    GBT     0.679(11)  0.694(13)   0.692(12)   0.6923(80)  0.62(14)
                                             linear  0.679(11)  0.6913(76)  0.6912(75)  0.6920(74)  0.99935(40)

TABLE II. Effective mass calculated from the prediction of $\eta_s$ DA $C$ at $p_\text{pred} = 5$, $z_\text{pred} = 4$, $t_\text{pred} = 7$ with different models and different inputs. Models are trained with $N_\text{tr} = 400$, $N_\text{BC} = 500$, $N_\text{ul} = 900$, $N_\text{est} = 150$, $r = 0.1$. The linear model is more accurate than GBT. Both models have better performance for $z$-prediction than $p$-prediction.

Then, we compared different input data to be used for $z$-prediction in Table II. The bias correction makes the prediction noisier by converting the systematic error into statistical error, which improves the accuracy of the prediction for most cases.

For small datasets, such as what we have for the quasi-DA data, it can be difficult for the GBT model to extract the nonlinear pattern of the training dataset. As a consequence, the fit quality of the GBT model for the test data is poor. Instead, the simpler linear regression shows better performance. Sometimes, however, when the input dataset is noisy (e.g., larger-$t$ data), the linear regression fails with poor prediction quality, as shown in Table III, while GBT is able to capture the correlation and make predictions. Using cleaner and more correlated data, such as the closest timeslice, momentum and link, can significantly improve the fit quality for linear regression.

FIG. 3. The observed and $z$-predicted $\eta_s$ DA effective mass of $p_\text{pred} = 5$, $z_\text{pred} = 4$ at $t_\text{pred} = 4$ with input $p_\text{in} = 5$, $z_\text{in} \in [0, 3]$, $t_\text{in} \in [3, 5]$ for different choices of training data counts and bias-correction data counts. The left (right) plot is the prediction of the GBT (linear) model. The horizontal axis is $N_\text{tr} + 0.1 N_\text{BC}$, with $N_\text{ul} = 900$ fixed. The GBT parameters are $N_\text{est} = 150$, $r = 0.1$. The blue points are predictions with bias correction for the unlabeled test data, and the brown points are observations for unlabeled test data.

Type    Input                                      Method  E_tr       E_pred     E_pred,BC  E_ul       F_v
p-pred  p_in ∈ [3, 4], z_in = 4, t_in = 10         GBT     0.686(43)  0.678(45)  0.683(40)  0.675(26)  0.36(19)
                                                   linear  0.683(37)  0.692(39)  0.695(39)  0.676(27)  0.72(13)
p-pred  p_in ∈ [3, 4], z_in = 4, t_in ∈ [7, 13]    GBT     0.686(43)  0.677(51)  0.676(43)  0.675(26)  0.25(27)
                                                   linear  0.683(37)  0.675(88)  0.677(77)  0.676(27)  -0.13(85)

TABLE III. Effective mass calculated from the prediction of $\eta_s$ DA $C$ at $p_\text{pred} = 5$, $z_\text{pred} = 4$, $t_\text{pred} = 10$ with different models and different input timeslices. Models are trained with $N_\text{tr} = 400$, $N_\text{BC} = 500$, $N_\text{ul} = 900$, $N_\text{est} = 150$, $r = 0.1$. The linear model has better performance on correlated, cleaner data but fails when more uncorrelated, noisy input data are included. The GBT is more stable and less sensitive to these inputs.
After determining the parameters, we run the ML program and show the effective mass of our predictions along with the observed datasets for both $p_\text{pred}$ and $z_\text{pred}$ predictions in Fig. 4. The linear model works well for $z$-prediction, but the GBT model and $p$-predictions still need to be improved.

B. Predictions of kaon quasi-PDFs
As Nambu-Goldstone bosons associated with dynamical chiral SU(3) symmetry breaking, the pion and kaon serve as a fundamental test ground for our understanding of QCD theory at the hadronic scale. The ab initio calculation of hadron PDFs from lattice QCD provides theoretical background for particle-discovery experiments and Standard-Model (SM) tests at colliders [42]. After decades of theoretical and experimental efforts, the precision required in PDFs for more stringent tests of the SM has increased significantly. In Ref. [43], we presented the first direct lattice calculation of the valence-quark distribution in the pion, using the MILC HISQ coarse ensemble with $M_\pi \approx 330$ MeV. Since the computational cost of quasi-PDF measurements on an ensemble at lighter pion mass or reduced lattice spacing would increase significantly, in this work we investigate a ML algorithm to reduce the computational cost.

FIG. 4. The observed/predicted $\eta_s$ DA effective mass at $p_\text{pred} = 5$, $z_\text{pred} = 4$ from $p_\text{in} = 5$, $z_\text{in} \in [0, 3]$ (left) and $p_\text{in} \in [3, 4]$, $z_\text{in} = 4$ (right). Each panel shows $E_\text{tr}$, $E_\text{pred}$, $E_\text{pred,BC}$ and $E_\text{ul}$. The top (bottom) plots are obtained by using the GBT (linear) model with $N_\text{tr} = 400$, $N_\text{BC} = 500$, $N_\text{ul} = 900$. The GBT parameters are $N_\text{est} = 150$, $r = 0.1$. The linear model shows better consistency with the unlabeled data, while at some timeslices the GBT model fails to give a good prediction.
We test on the meson unpolarized quasi-PDF measurements on the lattice:
$$ C_\text{3pt}(z,t) = \langle 0 | \int d^3y\, e^{-iy\cdot P}\, M_\text{ps}(\vec{y}, t_\text{sep})\, \bar{s}(z,t)\, \gamma_t \prod_{x=0}^{z-1} U_z(x,t)\, s(0,t)\, \bar{M}_\text{ps}(\vec{0}, 0) | 0 \rangle, \tag{12} $$
$$ C_\text{2pt}(t_\text{sep}) = \langle 0 | \int d^3y\, e^{-iy\cdot P}\, M_\text{ps}(\vec{y}, t_\text{sep})\, \bar{M}_\text{ps}(\vec{0}, 0) | 0 \rangle, \tag{13} $$
where $C_\text{3pt}$ is the three-point correlator, $C_\text{2pt}$ is the two-point correlator, $M_\text{ps} = \bar{q}_1\gamma_5 q_2$ is the pseudoscalar meson operator, $z$ is the length of the Wilson link, $U_\mu(x,t)$ is the gauge link, and $\gamma_i$ are Dirac spinor matrices. For this study we use Wilson clover valence quarks on a MILC HISQ ensemble. The lattice spacing is $a \approx 0.12$ fm, the lattice volume is $V = 40^3 \times 64$, and the pion mass is $M_\pi^\text{sea} \approx 220$ MeV. The valence quark masses are tuned to match the valence pion to the sea pion mass. We adopt Gaussian momentum smearing [41] to generate quark sources, to enhance the ground-state signal at nonzero momentum near 1.55 GeV. The Gaussian smearing width is chosen to be 3, with 50 iterations, and the momentum parameter is $k = 4.82$. Measurements are done on 495 configurations, using 4 quark-source locations per configuration, making 1960 measurements in total. Measurements are averaged over these quark sources before being passed to the ML algorithm, as this has been shown to provide predictions with smaller statistical errors. The ratio of the three-point correlator ($C_\text{3pt}$) to the two-point correlator ($C_\text{2pt}$) is a useful way to extract the matrix elements:
$$ R(t) = C_\text{3pt}(t) / C_\text{2pt}(t_\text{sep}), \tag{14} $$
where $R(t)$ is the ratio at the operator insertion time $t$, and $t_\text{sep}$ is the meson source and sink temporal separation.
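The source averaging and the ratio of Eq. (14) can be sketched as below. The data layout is an assumption made only for this example (measurements stored consecutively, one group of sources per configuration), as are the helper names; the ratio is formed from ensemble averages, which is one common choice and not necessarily the exact procedure of this work.

```python
from statistics import fmean

def average_over_sources(measurements, n_sources):
    """Average consecutive groups of n_sources values (one group per configuration)."""
    return [fmean(measurements[i:i + n_sources])
            for i in range(0, len(measurements), n_sources)]

def ratio(c3pt_by_t, c2pt_at_tsep):
    """R(t) = <C_3pt(t)> / <C_2pt(t_sep)>, with <...> the ensemble average."""
    c2 = fmean(c2pt_at_tsep)
    return [fmean(c3_t) / c2 for c3_t in c3pt_by_t]

# Toy numbers: two configurations with two sources each.
c2pt_raw = [2.0, 2.0, 4.0, 4.0]
c2pt = average_over_sources(c2pt_raw, n_sources=2)   # one value per configuration
c3pt_by_t = [[0.6, 0.9], [0.75, 0.75]]               # rows: insertion times, columns: configurations
r_of_t = ratio(c3pt_by_t, c2pt)
```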
1. Kaon quasi-PDF results
For the kaon quasi-PDF, the meson operator is K = ¯ uγ s . We first check the correlation for three-point correlatorswith insertion operator γ . Generally, the correlations are better than for the DA case. The correlations betweendifferent time are shown in Fig. 5. The correlation is insensitive to insertion time, but sensitive to the differencebetween two-point timeslice and three-point source-sink time separation. Because the correlators are similar fordifferent insertion times, we can use all the insertion timeslices as input in the same procedure. An anomaly isobserved in the momentum correlation in Fig. 6, which may due to the different number of measurements for p ∈ { , } and p ∈ { , } , since we had an extra run for p ∈ { , } with different source locations. The link correlation is thendisplayed in Fig. 7. The correlation decays slowly in the z -direction, suggesting that we may use more data at differentlinks as inputs. t t s e p , p t t t p t FIG. 5. Correlations between C and C of kaon quasi-PDF at different time separations (left, with insertion time t = t sep /
2) and at different insertion times (right, with t sep = 6). The correlation is insensitive to the insertion time. Again, we compare the parameters used for the GBT model. Fig. 8 shows the fit variance estimate F v from boththe z -prediction (with p in = p pred and z in < z pred ), and the p -prediction (with p in < p pred and z in = z pred ) usingthe GBT model trained on 400 measurements. The horizontal axis shows the number of estimators N est , and thevertical axis shows the learning rate r . The target measurement is at p pred = 4, z pred = 4, t sep = 5, and t = 2. Foreach prediction we used both C and the C . Thus, in either case a set of fit parameters can be chosen as, e.g., N est = 150, r = 0 .
1. As expected, with reduced learning rate, one needs more estimators to achieve a similar fitvariance. With fixed learning rate, the fit variance becomes stable when we keep increasing N est , indicating that themodel is robust to overfitting.Fig. 9 compares the final predictions among various training and bias-correction measurements: N tr and N BC areselected from { , , , , } , and the number of unlabeled measurements is fixed to N ul = 1180. The fitparameters are adopted as above. We observe a reduced error size of final predictions with increased N tr and N BC .Using p -prediction on kaon quasi-PDFs can reduce the computational cost, because calculating C at differentmomenta requires the calculation of different propagators from different sequential sources. The effective computa-tional savings of the ML calculation can be derived by considering the number of propagators needed to achieve thesame precision as in a calculation without ML. In our case, to use p in = 3 to predict p pred = 4, we need to calculate N in propagators at p in = 3 and N BC + N tr propagators at p pred = 4 for the ML setup. Then, we can use the modelto obtain the N ul predictions at p pred = 4. This amount of data is equivalent to a non-ML calculation with N in propagators at p in = 3 and N ul × σ ( R ul ) /σ ( R comb ) propagators at p pred = 4. The cost with ML can be quantifiedby: Cost = N in + N BC + N tr N in + N ul (cid:68) σ ( R ul ) σ ( R comb ) (cid:69) t (15) p p p t p in3pt p p r e d p t FIG. 6. Correlations between the kaon quasi-PDF three-point correlators and two-point correlators (left) or three-point corre-lators (right) at different momenta. The three-point correlation seem to be clustered; p ∈ { , } and p ∈ { , } are correlatedseparately. Thus, the prediction of p pred = 5 from smaller momentum has bad quality. -4 -3 -2 -1 0 1 2 3 4 z in - - - - z p r e d FIG. 7. 
Correlations between the kaon quasi-PDF three-point correlators with p in = p pred = 3 at different link lengths. Shorterlink lengths have better correlation, and the correlation decay in the z direction is slow. where N BC , N tr and N ul are the numbers of propagator calculations (which represent the computational cost) neededto obtain the corresponding datasets (bias-correction, training and unlabeled), and N in is that of the input data. Theratio σ ( R ul ) /σ ( R comb ) is the scaling factor of the effective number of measurements we can obtain by employingthe ML prediction, accounting for the increase of statistical error due to prediction error. We assume that the errorsof observables scale as 1 / √ N as the number of measurements increases. For the cost estimate, we use an averagevalue of the ratios over different insertion timeslices. We calculate the R comb = C comb3pt /C here from each bootstrapsample by taking the weighted average of the measurements on labeled data and BC predictions on unlabeled datain each sample: C comb3pt = ¯ C pred,BC3pt /σ ( C pred,BC3pt ) + ¯ C lb3pt /σ ( C lb3pt )1 /σ ( C pred,BC3pt ) + 1 /σ ( C lb3pt ) (16)while the error σ ( R comb ) is estimated from all bootstrap results. A smaller cost indicates higher prediction efficiency,so we vary N tr and N BC to find the optimal cost reduction, as shown in Fig. 10. By choosing optimal N tr and N BC ,0
100 150 200 250 300 N est r N est r FIG. 8. Estimates of the fit variance F v ( t = 2) as a function of learning rate r and number of estimators N est from the kaonquasi-PDF measurements at p pred = 4, z pred = 4, t sep = 5. N tr = 400 and N ul = 1180 are used. The left (right) plot shows theresults from the z -prediction ( p -prediction). The z -prediction has a much better fit variance because of the good correlationsbetween close links.
100 200 300 400 N tr + 0.1 N BC R ( t = )
we can obtain about a 20% reduction in computational cost. Figure 11 shows this set of fitted results from both the $z$-prediction and the $p$-prediction at $N_{\rm tr} = 240$, $N_{\rm BC} = 240$, while Table IV compares several sets of $p$- and $z$-predictions with the observations. The last column of the table shows the fit quality.

We compare the predicted ratios for these models in Fig. 11. The $z$-predictions are consistent with the unlabeled data for both models, but the $p$-predictions still need to be improved.

FIG. 9. Observations and predictions of the ratio $R(t=2)$ of kaon quasi-PDF correlators at $p_{\rm pred} = 4$, $z_{\rm pred} = 4$, $t_{\rm sep} = 5$ from input data at $p_{\rm in} = 4$, $z_{\rm in} \in [0, 3]$, $t_{\rm sep} = 5$. The left and right sides show the results from the GBT and linear models, respectively. We use $N_{\rm est} = 150$ and $r = 0.[\ldots]$; the horizontal axis is $N_{\rm tr} + 0.1\,N_{\rm BC}$, and the number of unlabeled measurements is fixed to 1180. Points in blue are predictions with bias correction, and points in orange are observations.

FIG. 10. Observations and predictions of the ratio $R(t=2)$ of kaon quasi-PDF correlators at $p_{\rm pred} = 4$, $z_{\rm pred} = 4$, $t_{\rm sep} = 5$ from input data at $p_{\rm in} = 3$, $z_{\rm in} = 4$, $t_{\rm sep} = 5$. The red line shows the effective cost (in %) averaged over $R(t)$ for $t \in [1, [\ldots]]$. We use $N_{\rm est} = 150$ and $r = 0.[\ldots]$; the horizontal axis is $N_{\rm tr} + 0.1\,N_{\rm BC}$, and the number of unlabeled measurements is fixed to 1180. Points in blue are predictions with bias correction, and points in orange are observations.

FIG. 11. The ratio $R(t)$ of the kaon quasi-PDF correlators at $z_{\rm pred} = 4$, $p_{\rm pred} = 4$ from direct measurements ($R_{\rm tr}$, $R_{\rm ul}$) and from the predictions of the two models ($R_{\rm pred}$, $R_{\rm pred,BC}$). The top (bottom) row is the GBT (linear) model; $N_{\rm est} = 150$, $r = 0.[\ldots]$, $N_{\rm tr} = N_{\rm BC} = 240$ and $N_{\rm ul} = 1180$ are used. The left column uses $z_{\rm in} \in [0, 3]$, $p_{\rm in} = 4$ as inputs, while the right column uses $z_{\rm in} = 4$, $p_{\rm in} = 3$. The $z$-predictions are better than the $p$-predictions.

TABLE IV. Observations and predictions of the ratio $R(t=2)$ of the kaon quasi-PDF correlators at $p_{\rm pred} = 4$, $z_{\rm pred} = 4$, $t_{\rm sep} = 5$ from different models and inputs. We use $N_{\rm est} = 150$, $r = 0.[\ldots]$

Type    Input                      Method  R_tr        R_pred      R_pred,BC   R_comb      R_ul        F_v
p-pred  p_in = 3, z_in = 4         GBT     0.2441(70)  0.2430(60)  0.2439(56)  0.2435(51)  0.2471(35)  0.692(41)
p-pred  p_in = 3, z_in = 4         linear  0.2441(70)  0.2479(63)  0.2480(58)  0.2472(54)  0.2471(35)  0.772(29)
z-pred  p_in = 4, z_in ∈ [0, 3]    GBT     0.2441(70)  0.2458(40)  0.2455(41)  0.2456(32)  0.2471(35)  0.890(26)
z-pred  p_in = 4, z_in ∈ [0, 3]    linear  0.2441(70)  0.2470(36)  0.2473(36)  0.2466(32)  0.2471(35)  0.998(1)

$\eta_s$ quasi-PDF results

For the $\eta_s$ quasi-PDF data, the meson operator is $\eta_s = \bar{s}\gamma_5 s$. The $\eta_s$ data have better signals, and the correlations among the $\eta_s$ data show the same patterns as those of the kaon. Therefore, we select the same parameters for the model training: $N_{\rm est} = 150$, $r = 0.[\ldots]$, $N_{\rm tr} = N_{\rm BC} = 240$, $N_{\rm ul} = 1180$. By comparing Fig. 12 and Fig. 8, we can see that the fit quality is slightly improved by the cleaner dataset. We infer that with more labeled kaon quasi-PDF data available for model training, the kaon model would show better performance as well. The predictions are compared with the observations in Fig. 14. Both the $z$-predictions and the $p$-predictions are more precise than in the kaon case. Figure 13 shows the cost for different choices of $N_{\rm tr}$ and $N_{\rm BC}$; the linear model reaches a better optimal reduction than in the kaon case. Overall, the cost reductions are 12%–18% at optimal choices of the sizes of the training and bias-correction datasets.
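The roles of the training, bias-correction and unlabeled subsets can be illustrated with a small sketch. The Python snippet below is not the analysis code of this work: it uses synthetic Gaussian data, a one-feature least-squares fit standing in for the linear regressor, and hypothetical helper names (`fit_linear`, `bias_corrected_mean`, `sample`). It assumes the bias correction is the mean residual on labeled data not used for training, in the spirit of the procedure of Ref. [25].

```python
import random


def fit_linear(x, y):
    """Least-squares fit y ~ intercept + slope * x for a single input feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bias_corrected_mean(x_tr, y_tr, x_bc, y_bc, x_ul):
    """Return (raw, bias-corrected) estimates of the mean target on x_ul."""
    intercept, slope = fit_linear(x_tr, y_tr)

    def predict(x):
        return intercept + slope * x

    raw = sum(predict(x) for x in x_ul) / len(x_ul)
    # Mean residual on labeled data that was not used for training.
    bias = sum(y - predict(x) for x, y in zip(x_bc, y_bc)) / len(x_bc)
    return raw, raw + bias


# Synthetic stand-in for (input, target) pairs; every number here is invented.
rng = random.Random(0)


def sample(n):
    x = [rng.gauss(0.25, 0.05) for _ in range(n)]
    y = [0.02 + 0.9 * xi + rng.gauss(0.0, 0.01) for xi in x]
    return x, y


n_tr, n_bc, n_ul = 240, 240, 1180  # subset sizes quoted in the text
x_tr, y_tr = sample(n_tr)
x_bc, y_bc = sample(n_bc)
x_ul, y_ul = sample(n_ul)  # y_ul is kept only to check the sketch

raw, corrected = bias_corrected_mean(x_tr, y_tr, x_bc, y_bc, x_ul)
observed = sum(y_ul) / n_ul
print(f"raw={raw:.4f} corrected={corrected:.4f} observed={observed:.4f}")
```

In a real analysis the targets on the unlabeled subset would not be measured at all, which is where the cost saving comes from; here they are generated only so that the corrected estimate can be compared with the direct average.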
FIG. 12. Estimates of the fit variance $F_v$ as a function of the learning rate $r$ and the number of estimators $N_{\rm est}$ from the $\eta_s$ quasi-PDF measurements at $p_{\rm pred} = 4$, $z_{\rm pred} = 4$, $t_{\rm sep} = 5$, and $t_{\rm pred} = 2$. $N_{\rm tr} = 400$ and $N_{\rm ul} = 1180$ are used. The left (right) side shows the results from the $z$-prediction ($p$-prediction). The performance is better than that of the model trained on the kaon data.

C. Gluon Quasi-PDF Matrix Elements
FIG. 13. Observations and predictions of the ratio $R(t=2)$ of $\eta_s$ quasi-PDF correlators at $p_{\rm pred} = 4$, $z_{\rm pred} = 4$, $t_{\rm sep} = 5$ from input data at $p_{\rm in} = 3$, $z_{\rm in} = 4$, $t_{\rm sep} = 5$. The red line is the effective cost (in %) averaged over $R(t)$ for $t \in [1, [\ldots]]$. We use $N_{\rm est} = 150$ and $r = 0.[\ldots]$; the horizontal axis is $N_{\rm tr} + 0.1\,N_{\rm BC}$, and the number of unlabeled measurements is fixed to 1180. Points in blue are predictions with bias correction, and points in orange are observations.

The gluon PDF contributes at next-to-leading order to deep inelastic scattering (DIS) cross sections, and it enters at leading order in jet production. Global fits have combined the data from both DIS and jet-production cross sections, and constraints on the gluon PDF from the experimental side are improving. However, on the theoretical side the gluon PDF is poorly known. PDFs cannot be calculated using perturbative QCD. Recently, it has been found that they can be calculated directly in lattice QCD using large-momentum effective theory. The gluon unpolarized quasi-PDF matrix elements are computed on the lattice using

$$C_3(z; t_{\rm sep}, t) = \langle 0 | \Gamma \int d^3y \, e^{-i y \cdot P} \, \chi(\vec{y}, t_{\rm sep}) \, F^{\mu t}(z, t) \Big[ \prod_{x=0}^{z-1} U(x, t) \Big] F^{z}{}_{\mu}(0, t) \, \chi(\vec{0}, 0) | 0 \rangle, \qquad (17)$$

$$C_2(z; t_{\rm sep}) = \langle 0 | \Gamma \int d^3y \, e^{-i y \cdot P} \, \chi(\vec{y}, t_{\rm sep}) \, \chi(\vec{0}, 0) | 0 \rangle, \qquad (18)$$

where $C_3$ is the three-point correlator, $C_2$ is the two-point correlator, $O(z, t)$ is the gluon operator, $\chi = \epsilon^{abc} [u^{aT}(x) \, i\gamma_{[\ldots]}\gamma_{[\ldots]}\gamma_{[\ldots]} \, d^{b}(x)] u^{c}(x)$ is the nucleon interpolation field, $\{a, b, c\}$ are color indices, $\Gamma = (1 + \gamma_{[\ldots]})$, and the field tensor $F_{\mu\nu}$ is defined by

$$F_{\mu\nu} = \frac{i}{8 a^2 g} \left( P_{[\mu,\nu]} + P_{[\nu,-\mu]} + P_{[-\mu,-\nu]} + P_{[-\nu,\mu]} \right), \qquad (19)$$

where the plaquette is $P_{\mu,\nu} = U_\mu(x) U_\nu(x + a\hat{\mu}) U^\dagger_\mu(x + a\hat{\nu}) U^\dagger_\nu(x)$ and $P_{[\mu,\nu]} = P_{\mu,\nu} - P_{\nu,\mu}$.

To improve the signal, we studied 1, 3, 5, and 10 steps of hypercubic (HYP) smearing [40] on the gluon momentum fraction $\langle x \rangle_g$ in Eq. (3) of Ref. [44]. After applying the renormalization to the bare matrix elements, the results from different numbers of HYP-smearing steps are consistent with each other and with the phenomenological result 0.42(2) within uncertainties [44], with the exception of the 10-step case. Therefore, we apply 5 steps of HYP smearing to the gluon quasi-PDF operators in this work. The ratio $R$ of the three-point correlator to the two-point correlator follows the same definition as in Eq. (14).

We use valence overlap fermions on RBC gauge configurations [45] with 2+1 flavors of domain-wall fermions (DWF), lattice volume $L^3 \times T = 24^3 \times 64$, lattice spacing $a = 0.[\ldots]$, and $m^{\rm sea}_\pi = 330$ MeV. We also compute with clover valence quarks on the MILC $N_f = 2+1+1$ HISQ configurations [46] with $L^3 \times T = 32^3 \times [\ldots]$, $a = 0.[\ldots]$, and $m^{\rm sea}_\pi = 313$ MeV. For the nucleon two-point function, considering all time slices and independent smeared point sources, the number of measurements is $200 \times [\ldots] \times [\ldots] = [\ldots]{,}800$ on the RBC-24I lattices and $300 \times [\ldots] \times [\ldots] = [\ldots]{,}800$ on the MILC-a09m310 lattices.
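As a quick cross-check of the sample sizes, the short Python sketch below adds up the training, bias-correction and unlabeled subset sizes quoted later in this section for the two ensembles. It assumes that these three subsets partition the full set of measurements, which the text does not state explicitly, and it uses $N_{\rm ul} = 143360$ for the overlap case as quoted in the text and in Fig. 17. The dictionary keys are labels chosen here for illustration.

```python
# Subset sizes quoted in the text for each ensemble (labels are illustrative).
subsets = {
    "RBC-24I (overlap)": {"N_tr": 30720, "N_BC": 30720, "N_ul": 143360},
    "MILC-a09m310 (clover)": {"N_tr": 2880, "N_BC": 2880, "N_ul": 23040},
}

# Assumption: the three subsets together make up all measurements.
totals = {name: sum(sizes.values()) for name, sizes in subsets.items()}

# Fraction of measurements for which the target must actually be computed.
labeled_fraction = {
    name: (sizes["N_tr"] + sizes["N_BC"]) / totals[name]
    for name, sizes in subsets.items()
}

for name in subsets:
    print(name, totals[name], labeled_fraction[name])
```

Under this assumption both totals end in 800, in line with the measurement counts above, and the target observable is needed on only 30% (overlap) and 20% (clover) of the measurements.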
1. Predictions of the gluon correlators with the overlap valence fermions
To make $z$- and $p$-predictions of correlators from smaller $z$ or $p$ values, we first check the correlations among correlators with different momenta and link lengths. In Fig. 15, we show the correlations between the three-point correlation function at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$ and the same three-point correlation function at various momenta $p_{\rm in} = \{[\ldots]\}$ and link lengths $z_{\rm in} = \{[\ldots]\}$. The source-sink time separation is fixed to $t_{\rm sep} = 8$. The correlations between different momenta are weaker than those between different link lengths in this case, which results in a relatively low $p$-prediction fit variance, as shown in Fig. 16.

The fit variances $F_v$ from the $p$-prediction and $z$-prediction are shown in Fig. 16 for different learning rates $r \in \{[\ldots]\}$ and different numbers of estimators $N_{\rm est} \in \{100, 150, 200, 250, 300\}$. The target measurement uses $p_{\rm in} \in [0, [\ldots]]$, $p_{\rm pred} = 2$, $z_{\rm in} = z_{\rm pred} = 3$, $t_{\rm sep} = 8$, and $t = 4$. We used both $C_3$ and $C_2$ for prediction. Considering the $F_v$ for the $z$- and $p$-predictions shown in Fig. 16, we choose $r = 0.02$ and $N_{\rm est} = 150$ as the parameter set for the rest of this analysis.

For the $p$-prediction, we varied the number of training data and bias-correction data from 15360 to 30720, while keeping the number of unlabeled test data fixed at $N_{\rm ul} = 143360$, to compare their performance. The results are shown in Fig. 17. We use $N_{\rm tr} = 30720$, $N_{\rm BC} = 30720$ in the following $p$- and $z$-predictions.

With the ML model parameters and the dataset obtained from the overlap-fermion ensemble, we show our predictions along with the observed data for both the $p$- and $z$-predictions in Fig. 18. Any $p_{\rm in} < p_{\rm pred}$ or $z_{\rm in} < z_{\rm pred}$ can be used as input. In Table V, two-point and three-point correlator data at $p_{\rm in} = 1$, $z_{\rm in} = 3$, $t_{\rm sep} = 8$ or at $p_{\rm in} = 2$, $z_{\rm in} = 2$, $t_{\rm sep} = 8$ are used to predict the ratio at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$. The data for insertion time $t = 4$ are shown. The table shows that the $p$-predictions are poor for both models, because the correlations are weak, as shown in Fig. 18. The $z$-predictions are better than the $p$-predictions, and the linear model performs better than GBT.

TABLE V. Observations and predictions of gluon-correlator ratios for the overlap valence fermions at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$, $t = 4$, using $N_{\rm tr} = 30720$, $N_{\rm BC} = 30720$, $N_{\rm ul} = 1433600$, $r = 0.02$, and $N_{\rm est} = 150$. For the $z$-predictions, the linear model shows a better fit variance than GBT. The $p$-predictions are poor for both models, because the correlations are poor, as shown in Fig. 18.

Type    Input               Method  R_tr       R_pred     R_pred,BC  R_ul       F_v
p-pred  p_in = 1, z_in = 3  GBT     0.184(34)  0.178(33)  0.177(29)  0.171(14)  0.07(18)
p-pred  p_in = 1, z_in = 3  linear  0.184(34)  0.179(35)  0.177(35)  0.171(14)  -0.05(38)
z-pred  p_in = 2, z_in = 2  GBT     0.184(34)  0.185(28)  0.189(22)  0.171(14)  0.53(12)
z-pred  p_in = 2, z_in = 2  linear  0.184(34)  0.177(21)  0.176(22)  0.171(14)  0.665(79)

FIG. 14. The ratio $R(t)$ of $\eta_s$ quasi-PDF correlators at $z_{\rm pred} = 4$, $p_{\rm pred} = 4$ from direct measurements ($R_{\rm tr}$, $R_{\rm ul}$) and from the predictions of the two models ($R_{\rm pred}$, $R_{\rm pred,BC}$). The top (bottom) row is the GBT (linear) model. The left column uses $z_{\rm in} \in [0, 3]$, $p_{\rm in} = 4$; the right column uses $z_{\rm in} = 4$, $p_{\rm in} = 3$. The model performs better on these cleaner datasets. $N_{\rm tr} = N_{\rm BC} = 240$, $N_{\rm ul} = 1180$, $N_{\rm est} = 150$, and $r = 0.[\ldots]$

FIG. 15. Correlation coefficient between the three-point correlation function at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$ and at various choices of $p$ and $z$, calculated using the overlap valence fermions. Different $z$ at the same $p$ show higher correlation than different $p$ at the same $z$.
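The correlation check described above can be sketched as follows. This is a minimal Python illustration on synthetic data, not the analysis code: the per-measurement values, the noise levels, and the names `near_z` and `near_p` are invented so that the "neighboring $z$" input shares more of its fluctuation with the target than the "neighboring $p$" input, mimicking the pattern reported in Fig. 15. Only the Pearson coefficient itself is standard.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
n_meas = 2000

# A common fluctuation shared by all observables on the same measurement.
common = [rng.gauss(0.0, 1.0) for _ in range(n_meas)]

# Invented noise levels: small for the target and the neighboring-z input,
# large for the neighboring-p input.
target = [c + rng.gauss(0.0, 0.3) for c in common]
near_z = [c + rng.gauss(0.0, 0.3) for c in common]
near_p = [c + rng.gauss(0.0, 2.0) for c in common]

corr_z = pearson(near_z, target)
corr_p = pearson(near_p, target)
print(f"corr(z input, target) = {corr_z:.2f}")
print(f"corr(p input, target) = {corr_p:.2f}")
```

An input that is only weakly correlated with the target carries little information for any regressor, which is consistent with the low fit variance of the $p$-predictions in the overlap case.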
FIG. 16. Gluon-correlator ratio fit variance, as a function of $N_{\rm est}$ and $r$, for the $z$-prediction (left) and $p$-prediction (right) for the overlap valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$ at $t = 4$ from $p_{\rm in} = 5$, $z_{\rm in} = 2$, $t_{\rm sep} = 8$ and $p_{\rm in} = 4$, $z_{\rm in} = 3$, $t_{\rm sep} = 8$ using $N_{\rm tr} = 61440$, $N_{\rm BC} = 61440$, and $N_{\rm ul} = 81920$. Fit variance is closely related to the correlations between the input data and unlabeled data. $z$-prediction works much better than $p$-prediction.
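A scan over the learning rate $r$ and the number of estimators $N_{\rm est}$ of the kind shown in Fig. 16 can be sketched in a few lines. The Python example below is a toy, not the model used in this work: it implements gradient boosting for squared loss with one-split decision stumps on a single synthetic input, and it scores each setting with $1 - SS_{\rm res}/SS_{\rm tot}$ on held-out data. That score is only assumed to behave like $F_v$ (equal to 1 for a perfect prediction and possibly negative). All function names and data are hypothetical, and the learning-rate values are chosen here for illustration.

```python
import random


def fit_stump(xs, res):
    """Best one-threshold split of inputs xs (sorted ascending) for residuals res."""
    n = len(xs)
    total = sum(res)
    best_gain = float("-inf")
    best = (xs[0], total / n, total / n)  # fallback: constant prediction
    left_sum = 0.0
    for i in range(1, n):
        left_sum += res[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal inputs
        right_sum = total - left_sum
        # Maximising this gain is equivalent to minimising the squared error.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if gain > best_gain:
            best_gain = gain
            best = (0.5 * (xs[i - 1] + xs[i]), left_sum / i, right_sum / (n - i))
    return best


def fit_gbt(x, y, n_est, rate):
    """Boost n_est stumps, each fitted to the current residuals and shrunk by rate."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    base = sum(ys) / len(ys)
    pred = [base] * len(ys)
    stumps = []
    for _ in range(n_est):
        res = [yi - pi for yi, pi in zip(ys, pred)]
        thr, left, right = fit_stump(xs, res)
        stumps.append((thr, left, right))
        pred = [pi + rate * (left if xi <= thr else right) for pi, xi in zip(pred, xs)]
    return base, rate, stumps


def predict_gbt(model, x):
    base, rate, stumps = model
    return base + rate * sum(left if x <= thr else right for thr, left, right in stumps)


def score(y_true, y_pred):
    """1 - SS_res / SS_tot; equals 1 for a perfect prediction."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(2)


def sample(n):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.8 * xi + rng.gauss(0.0, 0.4) for xi in x]
    return x, y


x_train, y_train = sample(200)
x_test, y_test = sample(300)

scores = {}
for rate in (0.02, 0.1, 0.5):
    for n_est in (100, 150, 200, 250, 300):
        model = fit_gbt(x_train, y_train, n_est, rate)
        y_pred = [predict_gbt(model, xi) for xi in x_test]
        scores[(rate, n_est)] = score(y_test, y_pred)

best = max(scores, key=scores.get)
print("best (r, N_est):", best, "score:", round(scores[best], 3))
```

In practice a library implementation such as the scikit-learn package cited in Ref. [32] would replace the hand-written stumps; the toy version only makes explicit what the two scanned parameters control.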
2. Predictions of the gluon correlators for clover valence fermions
We repeat the procedure established for the overlap valence fermions with the clover fermions, first checking the correlations among correlators with different momenta and link lengths. In Fig. 19, we show the correlations between the three-point correlation function at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$ and those at various values of $p_{\rm in} = \{[\ldots]\}$, $z_{\rm in} = \{[\ldots]\}$. The source-sink time separation is fixed to $t_{\rm sep} = 8$. The correlations between different momenta are much stronger than in the overlap case, which leads to a much higher $p$-prediction fit variance, as shown in Fig. 20. The correlations are stronger in the clover case than in the overlap case because the sources of the proton correlators are constructed differently. In the overlap case, we use a grid spatial source, which needs gauge averaging to give consistent correlators and therefore leads to weak correlations. The clover case does not have this problem, because it uses one spatial location per time source.

We use the same fit-variance $F_v$ estimation as in the overlap case. The target measurement is $p_{\rm in} = 4$, $p_{\rm pred} = 5$, $z_{\rm in} = 2$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$, and $t = 4$. From Fig. 20, we obtain $r = 0.2$ and $N_{\rm est} = 200$ as the parameters used in the following. These two figures indicate that stronger correlations between input and target data are needed to obtain good results for the fit variance.

Again, to compare their performance we varied the number of training data and bias-correction data from 1440 to $[\ldots]$, with $N_{\rm ul} = 23040$ fixed. The observed and the $p$- and $z$-predicted ratios of the gluon correlators $C_3$ and $C_2$ for the clover valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$ are shown in Fig. 21. Based on these results, we use $N_{\rm tr} = 2880$, $N_{\rm BC} = 2880$ in the following $p$- and $z$-predictions.

The observed and predicted gluon-correlator ratios for the clover valence fermions from the GBT and linear-regressor models at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$ are shown in Fig. 22. The linear model gives slightly better results. In Table VI, two-point and three-point correlator data at $p_{\rm in} = 4$, $z_{\rm in} = 2$, $t_{\rm sep} = 8$ are used for predicting the $p_{\rm pred} = 2$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$ correlator. The data at insertion time $t = 4$ are shown. Compared with the overlap-fermion result in Table V, the fit variance is much higher, because the input data have stronger correlations with the target data.

TABLE VI. Observations and predictions of the gluon-correlator ratio for the clover valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$, $t = 4$ using $N_{\rm tr} = 2880$, $N_{\rm BC} = 2880$, $N_{\rm ul} = 23040$, $r = 0.2$, and $N_{\rm est} = 200$. The linear model shows a better fit variance than GBT.

Type    Input               Method  R_tr      R_pred     R_pred,BC  R_ul       F_v
p-pred  p_in = 4, z_in = 3  GBT     0.26(19)  0.300(91)  0.296(92)  0.307(72)  0.733(62)
p-pred  p_in = 4, z_in = 3  linear  0.26(19)  0.28(11)   0.279(98)  0.307(72)  0.845(60)
z-pred  p_in = 5, z_in = 2  GBT     0.26(19)  0.26(11)   0.27(11)   0.307(72)  0.704(62)
z-pred  p_in = 5, z_in = 2  linear  0.26(19)  0.27(11)   0.29(10)   0.307(72)  0.819(51)

FIG. 17. The GBT (left) and linear-regressor (right) results. The observed and the $p$- and $z$-predicted gluon-correlator ratios $R(t)$ are for the overlap valence fermions at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$, using $r = 0.02$ and $N_{\rm est} = 150$ for different counts of training data and bias-correction data. The horizontal axis is $N_{\rm tr} + 0.1\,N_{\rm BC}$, with $N_{\rm ul} = 143360$ fixed. The blue points are predictions with bias correction for the unlabeled test data, and the orange points are observations for the unlabeled test data.

FIG. 18. The observed and predicted ratios of the gluon correlators $C_3$ and $C_2$ ($R_{\rm tr}$, $R_{\rm pred}$, $R_{\rm pred,BC}$, $R_{\rm ul}$) on the overlap valence fermion ensemble at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$ from $p_{\rm in} = 1$, $z_{\rm in} = 3$ (upper) and at $p_{\rm pred} = 2$, $z_{\rm pred} = 3$ from $p_{\rm in} = 2$, $z_{\rm in} = 2$ (lower), using $N_{\rm tr} = 30720$, $N_{\rm BC} = 30720$, $N_{\rm ul} = 1433600$, $r = 0.02$, and $N_{\rm est} = 150$. The GBT and linear-regressor results are shown on the left and right, respectively. The predictions with bias correction do not improve much over the raw predictions.

IV. SUMMARY
In this article, we applied ML techniques to quasi-DA and quasi-PDF correlators. Using both the GBT model and the linear model, we predicted the two-point correlator $C_2$ for meson quasi-DAs and the three-point correlator $C_3$ for meson and gluon quasi-PDFs at larger momenta and link lengths, which are noisier and need more computational resources. By predicting these from computationally less expensive data, we are able to reduce the computational cost. Systematic uncertainties from the ML prediction errors are converted to statistical uncertainties by the bias-correction procedure. With full bootstrap resampling, we estimated and compared the errors of the different model predictions.

Table VII summarizes the best fit variances $F_v$ of all predictions we investigated. For the meson datasets, the data from different link lengths are more correlated than those from different momenta. Consequently, the $z$-predictions for both models work much better than the $p$-predictions. The ML approach is very precise for the $z$-prediction of quasi-DAs and meson quasi-PDFs, while the $p$-predictions and the predictions for gluon quasi-PDFs show relatively worse precision. Comparing the two ML regression models, we find that the linear model is preferred on cleaner datasets when the correlations between input data and target data are good enough, such as the $z$-prediction of meson DAs and meson PDFs. On the other hand, the GBT model is more robust to noisy and less obviously correlated inputs. For the $p$-prediction of meson quasi-PDFs, both models are able to give a computational cost reduction of 16%.

FIG. 19. Correlation coefficients between the three-point correlation functions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$ and at various values of $p_{\rm in} = \{[\ldots]\}$, $z_{\rm in} = \{[\ldots]\}$, calculated using the clover valence fermions. Different $z$ at the same $p$ show higher correlation than different $p$ at the same $z$.
FIG. 20. Gluon-correlator ratio fit variance, as a function of $N_{\rm est}$ and $r$, for the $z$-prediction (left) and $p$-prediction (right) for the clover valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$ at $t = 4$ from $p_{\rm in} = 5$, $z_{\rm in} = 2$, $t_{\rm sep} = 8$ and $p_{\rm in} = 4$, $z_{\rm in} = 3$, $t_{\rm sep} = 8$ using $N_{\rm tr} = 2880$, $N_{\rm BC} = 2880$, and $N_{\rm ul} = 23040$. With a stronger correlation between input and target data, a smaller learning rate and number of estimators are needed to obtain a good fit-variance score.

ACKNOWLEDGMENTS
We thank the MILC Collaboration and RBC Collaboration for sharing the lattices used to perform this study. The LQCD calculations were performed using the Chroma software suite [47]. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 through ERCAP; facilities of the USQCD Collaboration, which are funded by the Office of Science of the U.S. Department of Energy, and supported in part by Michigan State University through computational resources provided by the Institute for Cyber-Enabled Research. RL, ZF, HL and RZ are supported by the US National Science Foundation under grant PHY 1653405 "CAREER: Constraining Parton Distribution Functions for New-Physics Searches". BY is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics under Contract No. 89233218CNA000001 and by the Los Alamos National Laboratory (LANL) LDRD program. BY also acknowledges support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of Nuclear Physics, Scientific Discovery through Advanced Computing (SciDAC) program.

FIG. 21. The observed and predicted ratios $R(t)$ of the gluon correlators $C_3$ and $C_2$ for the clover valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$, $t_{\rm sep} = 8$, using $r = 0.2$ and $N_{\rm est} = 200$ for different counts of training data and bias-correction data. The horizontal axis is $N_{\rm tr} + 0.1\,N_{\rm BC}$, with $N_{\rm ul} = 23040$ fixed. The GBT and linear-regressor results are shown on the left and right, respectively. The blue points are predictions with bias correction for the unlabeled test data, and the orange points are observations for the unlabeled test data.

TABLE VII. The best fit variances $F_v$ for all the cases we investigate. A larger value of $F_v$ indicates a better prediction, and a perfect prediction yields $F_v = 1.0$. In general, $z$-predictions work better than $p$-predictions, and the linear model shows better performance than GBT on our dataset.

Type    Method  η_s DA       kaon PDF   overlap gluon PDF  clover gluon PDF
z-pred  GBT     0.62(14)     0.890(26)  0.53(12)           0.704(62)
z-pred  linear  0.99935(40)  0.998(1)   0.665(79)          0.819(51)
p-pred  GBT     0.50(13)     0.692(14)  0.07(18)           0.733(62)
p-pred  linear  0.911(43)    0.772(29)  -0.05(38)          0.845(60)

FIG. 22. The observed and predicted gluon-correlator ratios ($R_{\rm tr}$, $R_{\rm pred}$, $R_{\rm pred,BC}$, $R_{\rm ul}$) for the clover valence fermions at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$ from $p_{\rm in} = 4$, $z_{\rm in} = 3$ (upper) and at $p_{\rm pred} = 5$, $z_{\rm pred} = 3$ from $p_{\rm in} = 5$, $z_{\rm in} = 2$ (lower), using $N_{\rm tr} = 2880$, $N_{\rm BC} = 2880$, $N_{\rm ul} = 230400$, $r = 0.2$, and $N_{\rm est} = 200$. The GBT and linear-regressor results are shown in the left and right columns, respectively. The predictions with bias correction do not improve much over the raw predictions.

[1] S. Aoki et al. (Flavour Lattice Averaging Group), (2019), arXiv:1902.08191 [hep-lat].
[2] H.-W. Lin et al., Prog. Part. Nucl. Phys. , 107 (2018), arXiv:1711.07916 [hep-ph].
[3] W. Detmold, W. Melnitchouk, and A. W. Thomas, Eur. Phys. J.direct , 13 (2001), arXiv:hep-lat/0108002 [hep-lat].
[4] X. Ji, Phys. Rev. Lett. , 262002 (2013), arXiv:1305.1539 [hep-ph].
[5] W. Detmold, R. G. Edwards, J. J. Dudek, M. Engelhardt, H.-W. Lin, S. Meinel, K. Orginos, and P. Shanahan (USQCD), Eur. Phys. J. A55, 193 (2019), arXiv:1904.09512 [hep-lat].
[6] X. Ji, J.-H. Zhang, and Y. Zhao, Phys. Rev. Lett. , 112002 (2013), arXiv:1304.6708 [hep-ph].
[7] Y. Hatta, X. Ji, and Y. Zhao, Phys. Rev.
D89, 085030 (2014), arXiv:1310.4263 [hep-ph].
[8] X. Ji, Sci. China Phys. Mech. Astron. , 1407 (2014), arXiv:1404.6680 [hep-ph].
[9] X. Ji, J.-H. Zhang, and Y. Zhao, Phys. Lett. B743, 180 (2015), arXiv:1409.6329 [hep-ph].
[10] H.-W. Lin, J.-W. Chen, S. D. Cohen, and X. Ji, Phys. Rev. D91, 054510 (2015), arXiv:1402.1462 [hep-ph].
[11] C. Alexandrou, K. Cichy, V. Drach, E. Garcia-Ramos, K. Hadjiyiannakou, K. Jansen, F. Steffens, and C. Wiese, Phys. Rev. D92, 014502 (2015), arXiv:1504.07455 [hep-lat].
[12] J.-W. Chen, S. D. Cohen, X. Ji, H.-W. Lin, and J.-H. Zhang, Nucl. Phys. B911, 246 (2016), arXiv:1603.06664 [hep-ph].
[13] C. Alexandrou, K. Cichy, M. Constantinou, K. Hadjiyiannakou, K. Jansen, F. Steffens, and C. Wiese, Phys. Rev. D96, 014513 (2017), arXiv:1610.03689 [hep-lat].
[14] H.-W. Lin, J.-W. Chen, T. Ishikawa, and J.-H. Zhang (LP3), Phys. Rev. D98, 054504 (2018), arXiv:1708.05301 [hep-lat].
[15] J.-W. Chen, H.-W. Lin, and J.-H. Zhang, (2019), arXiv:1904.12376 [hep-lat].
[16] J.-H. Zhang, J.-W. Chen, X. Ji, L. Jin, and H.-W. Lin, Phys. Rev. D95, 094514 (2017), arXiv:1702.00008 [hep-lat].
[17] J.-W. Chen, L. Jin, H.-W. Lin, A. Schäfer, P. Sun, Y.-B. Yang, J.-H. Zhang, R. Zhang, and Y. Zhao, (2017), arXiv:1712.10025 [hep-ph].
[18] J.-W. Chen, T. Ishikawa, L. Jin, H.-W. Lin, Y.-B. Yang, J.-H. Zhang, and Y. Zhao, Phys. Rev. D97, 014505 (2018), arXiv:1706.01295 [hep-lat].
[19] C. Alexandrou, K. Cichy, M. Constantinou, K. Hadjiyiannakou, K. Jansen, H. Panagopoulos, and F. Steffens, Nucl. Phys. B923, 394 (2017), arXiv:1706.00265 [hep-lat].
[20] M. Constantinou and H. Panagopoulos, Phys. Rev. D96, 054506 (2017), arXiv:1705.11193 [hep-lat].
[21] J. Green, K. Jansen, and F. Steffens, Phys. Rev. Lett. , 022004 (2018), arXiv:1707.07152 [hep-lat].
[22] J.-W. Chen, T. Ishikawa, L. Jin, H.-W. Lin, Y.-B. Yang, J.-H. Zhang, and Y. Zhao, (2017), arXiv:1710.01089 [hep-lat].
[23] J.-W. Chen, T. Ishikawa, L. Jin, H.-W. Lin, A. Schäfer, Y.-B. Yang, J.-H. Zhang, and Y. Zhao, (2017), arXiv:1711.07858 [hep-ph].
[24] H.-W. Lin and R. Zhang, Phys. Rev. D100, 074502 (2019).
[25] B. Yoon, T. Bhattacharya, and R. Gupta, Phys. Rev. D100, 014504 (2019), arXiv:1807.05971 [hep-lat].
[26] J. Karpie, K. Orginos, A. Rothkopf, and S. Zafeiropoulos, JHEP , 057 (2019), arXiv:1901.05408 [hep-lat].
[27] K. Orginos, A. Radyushkin, J. Karpie, and S. Zafeiropoulos, Phys. Rev. D96, 094503 (2017), arXiv:1706.05373 [hep-ph].
[28] A. V. Radyushkin, Phys. Rev. D96, 034025 (2017), arXiv:1705.01488 [hep-ph].
[29] B. Joó, J. Karpie, K. Orginos, A. V. Radyushkin, D. G. Richards, R. S. Sufian, and S. Zafeiropoulos, (2019), arXiv:1909.08517 [hep-lat].
[30] B. Joó, J. Karpie, K. Orginos, A. Radyushkin, D. Richards, and S. Zafeiropoulos, (2019), arXiv:1908.09771 [hep-lat].
[31] A. Natekin and A. Knoll, Frontiers in neurorobotics , 21 (2013).
[32] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Journal of Machine Learning Research , 2825 (2011).
[33] J. H. Friedman, Annals of Statistics , 1189 (2000).
[34] J. H. Friedman, Comput. Stat. Data Anal. , 367 (2002).
[35] M. Beneke, G. Buchalla, M. Neubert, and C. T. Sachrajda, Phys. Rev. Lett. , 1914 (1999), arXiv:hep-ph/9905312 [hep-ph].
[36] M. Beneke, G. Buchalla, M. Neubert, and C. T. Sachrajda, Nucl. Phys. B606, 245 (2001), arXiv:hep-ph/0104110 [hep-ph].
[37] X. Ji, A. Schäfer, X. Xiong, and J.-H. Zhang, Phys. Rev. D92, 014039 (2015), arXiv:1506.00248 [hep-ph].
[38] E. Follana, Q. Mason, C. Davies, K. Hornbostel, G. P. Lepage, J. Shigemitsu, H. Trottier, and K. Wong (HPQCD, UKQCD), Phys. Rev. D75, 054502 (2007), arXiv:hep-lat/0610092 [hep-lat].
[39] A. Bazavov et al. (MILC), Phys. Rev. D87, 054505 (2013), arXiv:1212.4768 [hep-lat].
[40] A. Hasenfratz and F. Knechtli, Phys. Rev. D64, 034504 (2001), arXiv:hep-lat/0103029 [hep-lat].
[41] G. S. Bali, B. Lang, B. U. Musch, and A. Schäfer, Phys. Rev. D93, 094515 (2016), arXiv:1602.05525 [hep-lat].
[42] A. C. Aguilar et al., Eur. Phys. J. A55, 190 (2019), arXiv:1907.08218 [nucl-ex].
[43] J.-W. Chen, L. Jin, H.-W. Lin, Y.-S. Liu, A. Schäfer, Y.-B. Yang, J.-H. Zhang, and Y. Zhao, (2018), arXiv:1804.01483 [hep-lat].
[44] Z.-Y. Fan, Y.-B. Yang, A. Anthony, H.-W. Lin, and K.-F. Liu, Phys. Rev. Lett. , 242001 (2018), arXiv:1808.02077 [hep-lat].
[45] T. Blum, P. Boyle, N. Christ, J. Frison, N. Garron, R. Hudspith, T. Izubuchi, T. Janowski, C. Jung, A. Jüttner, et al., Physical Review D , 074505 (2016).
[46] A. Bazavov, C. Bernard, J. Komijani, C. DeTar, L. Levkova, W. Freeman, S. Gottlieb, R. Zhou, U. Heller, J. Hetrick, et al., Physical Review D , 054505 (2013).
[47] R. G. Edwards and B. Joo (SciDAC, LHPC, UKQCD), Lattice field theory. Proceedings, 22nd International Symposium, Lattice 2004, Batavia, USA, June 21-26, 2004, Nucl. Phys. Proc. Suppl. 140