On feasibility of azimuthal flow studies with Principal Component Analysis
Igor Altsybeev
Saint-Petersburg State University, Universitetskaya nab. 7/9, St. Petersburg, 199034, Russia
(Dated: July 27, 2020)

It is shown that Principal Component Analysis applied to azimuthal single-particle distributions allows one to perform flow analyses analogous to the traditional approaches based on multi-particle correlations. In particular, symmetric cumulants are considered. It is also demonstrated that statistical fluctuations due to the finite number of particles per event play practically no role for higher-order PCA-based cumulants.

I. INTRODUCTION
Principal Component Analysis (PCA) is a method for decorrelating multivariate data. PCA finds the optimal basis for a given problem and thus reduces its dimensionality. Recently, it was suggested to apply PCA to heavy-ion collision data on two-particle azimuthal correlations [1] in order to reveal hidden patterns of the collective behaviour of the hadronic medium. In [2] it was shown that PCA applied directly to single-particle azimuthal ($\varphi$) distributions in A–A collisions reveals Fourier harmonics as the natural and optimal basis.

In the latter approach, technically, one takes the distributions of particles in $M$ bins in each of $N$ events, normalizes them, subtracts the mean, and then applies PCA to the resulting $N \times M$ matrix (see more details in [2, 3]). The output of PCA is a set of orthonormal eigenvectors $e_i$ ($i = 1, ..., M$) and a set of coefficients $\alpha_i^{(k)}$ ($k = 1, ..., N$) of the PCA decomposition, such that the particle distribution in the $k$-th event (denoted $x^{(k)}$, a vector with $M$ elements) can be written as

$$ x^{(k)} = \sum_{i=1}^{M} \alpha_i^{(k)} e_i . \tag{1} $$

By construction, the first $K$ components ($K < M$) contain most of the total variance of the dataset.

The azimuthal flow in heavy-ion collisions is typically studied by expanding the azimuthal probability density of particles in a series:

$$ f(\varphi) = \frac{1}{2\pi} \Big[ 1 + 2 \sum_{n=1}^{\infty} v_n \cos\big( n (\varphi - \Psi_n) \big) \Big] , \tag{2} $$

where $v_n$ are the flow coefficients and $\Psi_n$ are the corresponding symmetry-plane angles. As mentioned above, PCA applied to event-by-event azimuthal single-particle distributions reveals the Fourier basis, and thus the coefficients of the PCA decomposition gain a definite meaning.
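The procedure described above can be sketched numerically. The following Python/NumPy toy is an illustration only and not the analysis code behind this paper: the event generator (multinomial sampling of bin counts), the function names `simulate_binned_events` and `pca_decompose`, the values $v_2 = 0.10$, $v_3 = 0.05$ and 1000 particles per event, and the choice to normalize each event to unit sum are all assumptions made here. With this particular normalization the per-event amplitude of the leading harmonic is $\sqrt{M/2}\,\sqrt{\alpha_1^2 + \alpha_2^2}$; another normalization convention would change the prefactor.

```python
import numpy as np

def simulate_binned_events(n_events, n_particles, n_bins, v2, v3, rng):
    """Toy events: bin counts sampled from 1 + 2*v2*cos(2(phi-Psi2)) + 2*v3*cos(3(phi-Psi3))."""
    phi = (np.arange(n_bins) + 0.5) * 2.0 * np.pi / n_bins   # bin centres
    counts = np.empty((n_events, n_bins))
    for k in range(n_events):
        psi2, psi3 = rng.uniform(0.0, 2.0 * np.pi, size=2)   # random symmetry planes
        p = (1.0 + 2.0 * v2 * np.cos(2.0 * (phi - psi2))
                 + 2.0 * v3 * np.cos(3.0 * (phi - psi3)))
        counts[k] = rng.multinomial(n_particles, p / p.sum())
    return phi, counts

def pca_decompose(counts):
    """Normalize each event, subtract the event-averaged distribution, apply PCA via SVD."""
    x = counts / counts.sum(axis=1, keepdims=True)      # each event sums to one (assumption)
    x = x - x.mean(axis=0)                              # subtract the mean over events
    u, s, vt = np.linalg.svd(x, full_matrices=False)    # rows of vt: orthonormal eigenvectors e_i
    alpha = u * s                                       # alpha[k, i]: coefficient of e_i in event k
    return x, vt, alpha

rng = np.random.default_rng(1)
M = 48
phi, counts = simulate_binned_events(2000, 1000, M, 0.10, 0.05, rng)
x, e, alpha = pca_decompose(counts)

# Eq. (1): every event is recovered from its coefficients and the eigenvectors.
reconstruction_ok = np.allclose(x, alpha @ e)

# Length of the projection of the two leading eigenvectors onto the
# orthonormal pair {cos(2 phi), sin(2 phi)}; 1.0 means "inside that subspace".
fourier2 = np.sqrt(2.0 / M) * np.array([np.cos(2.0 * phi), np.sin(2.0 * phi)])
overlap = np.linalg.norm(e[:2] @ fourier2.T, axis=1)

# Per-event observed amplitude of the second harmonic (prefactor valid for
# the unit-sum normalization chosen in pca_decompose).
v2_hat = np.sqrt(M / 2.0) * np.hypot(alpha[:, 0], alpha[:, 1])

print("reconstruction exact:", reconstruction_ok)
print("overlap of e_1, e_2 with the n=2 Fourier pair:", overlap)
print("<v2_hat^2> =", np.mean(v2_hat**2))
```

In this toy the two leading eigenvectors are an arbitrary rotation of the cosine and sine of the dominant harmonic, so only the combination $\alpha_1^2 + \alpha_2^2$ is meaningful. The event-averaged squared amplitude comes out above the input $v_2^2$ because of the finite number of particles per event.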
If the decomposition (2) is applied event-by-event, the values $\hat v_n$ observed in the $k$-th event are related to the PCA coefficients (assuming that the elliptic flow dominates and the triangular flow is next) up to a normalization factor that depends on $M$: $\hat v_2^{(k)} \propto \sqrt{(\alpha_1^{(k)})^2 + (\alpha_2^{(k)})^2}$, $\hat v_3^{(k)} \propto \sqrt{(\alpha_3^{(k)})^2 + (\alpha_4^{(k)})^2}$, and so on.

In this paper, it is demonstrated that the coefficients of the PCA decomposition can be combined into expressions that are equivalent to the multi-particle cumulants used in flow studies. The first observable, considered in Section II, is the flow amplitude calculated via two- and four-particle correlations. The so-called symmetric cumulants, which measure correlations between the amplitudes of flow harmonics of different orders, are studied in Section III. It is also estimated how statistical fluctuations contribute to the values extracted using PCA.

II. HIGHER-ORDER CUMULANTS FROM PCA
In flow studies, the simplest way to estimate the amplitudes $v_n$ is to use the two-particle cumulants $c_n\{2\}$:

$$ v_n\{2\} = \sqrt{c_n\{2\}} = \sqrt{\langle v_n^2 \rangle} . \tag{3} $$

It is well known that this quantity suffers from so-called non-flow contributions coming from, e.g., resonance decays and jets. The non-flow is typically suppressed by using multi-particle cumulants. For example, the amplitude of the $n$-th harmonic can be estimated from the fourth-order cumulant $c_n\{4\}$ [4] as

$$ v_n\{4\} = \sqrt[4]{-c_n\{4\}} = \sqrt[4]{2 \langle v_n^2 \rangle^2 - \langle v_n^4 \rangle} . \tag{4} $$

We may try to adapt (3) and (4) for flow studies with PCA. When one deals with event-by-event particle distributions (as in the present application of PCA), it is essential to investigate the influence of statistical fluctuations due to the limited number of particles per event. Following the approach used, for instance, in [5, 6], we denote the true amplitude of the $n$-th Fourier harmonic in a given event as $v_n$ and the amplitude of the statistical noise as $a_n$. If $\hat v_n$ is the observed amplitude extracted by PCA in a given event, the projections of the corresponding flow vector on the $x$ and $y$ axes in the transverse plane are

$$ \hat v_{n,x} = v_{n,x} + a_{n,x} , \qquad \hat v_{n,y} = v_{n,y} + a_{n,y} . \tag{5} $$

The squares of $v_n$, $a_n$ and $\hat v_n$ are

$$ v_n^2 = v_{n,x}^2 + v_{n,y}^2 , \tag{6} $$

$$ a_n^2 = a_{n,x}^2 + a_{n,y}^2 , \tag{7} $$

$$ \hat v_n^2 = \hat v_{n,x}^2 + \hat v_{n,y}^2 = v_n^2 + a_n^2 + 2 ( v_{n,x} a_{n,x} + v_{n,y} a_{n,y} ) . \tag{8} $$

After averaging (8) over events, we get

$$ \langle \hat v_n^2 \rangle = \langle v_n^2 \rangle + \langle a_n^2 \rangle + 2 \langle v_{n,x} a_{n,x} \rangle + 2 \langle v_{n,y} a_{n,y} \rangle . \tag{9} $$

If we assume that the signal and the statistical noise are uncorrelated, the last two terms factorize: $\langle v_{n,x} a_{n,x} \rangle = \langle v_{n,x} \rangle \langle a_{n,x} \rangle$ and $\langle v_{n,y} a_{n,y} \rangle = \langle v_{n,y} \rangle \langle a_{n,y} \rangle$.
Since the event-averaged values of the $x$- and $y$-components of $v_n$ and $a_n$ are zero, (9) becomes

$$ \langle \hat v_n^2 \rangle = \langle v_n^2 \rangle + \langle a_n^2 \rangle , \tag{10} $$

and the true value $\langle v_n^2 \rangle$ is found by inverting (10):

$$ \langle v_n^2 \rangle = \langle \hat v_n^2 \rangle - \langle a_n^2 \rangle . \tag{11} $$

This result was obtained in [2] and can be used to estimate the Fourier amplitudes $v_n\{2\}$ using (3). The values $\langle a_n^2 \rangle$ measure the statistical fluctuations. They can be calculated by applying PCA to the same events, but with randomized $\varphi$-angles.

The effect of the statistical noise correction on $v_2$ is shown in Figure 1 for Pb-Pb events simulated with the AMPT generator ($2.8 \times 10^{\ldots}$ events). The uncorrected raw $\hat v_2$ values (upper gray diamonds), extracted directly from PCA, become the blue open circles after the correction. These circles lie on top of the values obtained with the traditional two-particle cumulant method (full circles). The effect of the correction is more pronounced for peripheral events, where the number of particles per event is lower.

For the fourth power of the observed magnitude, we can use (8) again:

$$ \hat v_n^4 = \big[ v_n^2 + a_n^2 + 2 ( v_{n,x} a_{n,x} + v_{n,y} a_{n,y} ) \big]^2 . \tag{12} $$

Averaging (12) over events and taking into account that the $x$ and $y$ components of the noise are independent and have equal variances, $\langle a_{n,x} a_{n,y} \rangle = \langle a_{n,x} \rangle \langle a_{n,y} \rangle$ and $\langle a_{n,x}^2 \rangle = \langle a_{n,y}^2 \rangle$, we get

$$ \langle \hat v_n^4 \rangle = \langle v_n^4 \rangle + \langle a_n^4 \rangle + 4 \langle v_n^2 \rangle \langle a_n^2 \rangle . \tag{13} $$

Here the product term $2 v_n^2 a_n^2$ and the squared cross term, $4 \langle ( v_{n,x} a_{n,x} + v_{n,y} a_{n,y} )^2 \rangle = 2 \langle v_n^2 \rangle \langle a_n^2 \rangle$, each contribute $2 \langle v_n^2 \rangle \langle a_n^2 \rangle$, while the terms linear in $v_{n,x} a_{n,x}$ or $v_{n,y} a_{n,y}$ average to zero.

From (11) and (13),

$$ 2 \langle \hat v_n^2 \rangle^2 - \langle \hat v_n^4 \rangle = 2 \langle v_n^2 \rangle^2 - \langle v_n^4 \rangle + 2 \langle a_n^2 \rangle^2 - \langle a_n^4 \rangle , \tag{14} $$

and, inverting (14) and substituting into (4), we get the following estimate for $v_n$:

$$ v_n\{4\} = \sqrt[4]{2 \langle \hat v_n^2 \rangle^2 - \langle \hat v_n^4 \rangle - \big( 2 \langle a_n^2 \rangle^2 - \langle a_n^4 \rangle \big)} . \tag{15} $$

From this expression, one may note that when $v_n \gg a_n$, the values of $v_n\{4\}$ are remarkably insensitive to the statistical noise. Indeed, for a hypothetical case of constant magnitudes of the flow and the statistical noise, $v_n\{4\} = \sqrt[4]{\hat v_n^4 - a_n^4}$, while $v_n\{2\} = \sqrt{\hat v_n^2 - a_n^2}$.

In Figure 1, open squares correspond to estimates of $v_n\{4\}$ based on PCA coefficients, formula (15). They match the small closed squares that stand for $v_n\{4\}$ calculated with the traditional approach using four-particle correlations. At the same time, it can be seen that the raw values $\hat v_n\{4\}$ (diamonds), calculated by (15) without taking into account the statistical term, are almost on top of the squares even for peripheral events. This indicates that the correction for statistical fluctuations is almost irrelevant for $v_n\{4\}$ as soon as the number of particles in events is large enough. Toy studies showed that this is the case when the flow magnitude is $v_n \sim \ldots \gtrsim \ldots$.
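The noise corrections (11) and (15) can be checked with a simple Monte Carlo toy. The sketch below is not the AMPT analysis of this paper. It assumes a Gaussian-fluctuating flow magnitude (mean 0.08, width 0.02), Gaussian noise components with $\langle a_n^2 \rangle = 1/N_{\rm part}$ for $N_{\rm part} = 500$, and an independent noise-only sample that stands in for the events with randomized $\varphi$-angles. The function name `cumulant_amplitudes` is hypothetical.

```python
import numpy as np

def cumulant_amplitudes(v_hat_sq, a_sq):
    """v_n{2} and v_n{4} from observed and noise-only squared amplitudes, Eqs. (11) and (15).

    Passing an array of zeros as a_sq switches the noise correction off.
    """
    m2, m4 = np.mean(v_hat_sq), np.mean(v_hat_sq**2)   # <v_hat^2>, <v_hat^4>
    n2, n4 = np.mean(a_sq), np.mean(a_sq**2)           # <a^2>, <a^4>
    v_two = np.sqrt(m2 - n2)
    v_four = (2.0 * m2**2 - m4 - (2.0 * n2**2 - n4)) ** 0.25
    return v_two, v_four

rng = np.random.default_rng(7)
n_events, n_particles = 400_000, 500

# True flow vector: fluctuating magnitude, random orientation (toy assumption).
v = np.abs(rng.normal(0.08, 0.02, n_events))
psi = rng.uniform(0.0, 2.0 * np.pi, n_events)
v_x, v_y = v * np.cos(psi), v * np.sin(psi)

# Statistical noise: independent Gaussian x and y components, <a^2> = 1/n_particles.
sigma = np.sqrt(0.5 / n_particles)
a_x, a_y = rng.normal(0.0, sigma, (2, n_events))
v_hat_sq = (v_x + a_x) ** 2 + (v_y + a_y) ** 2         # per-event observed value, Eq. (8)

# Independent noise-only sample (stand-in for events with randomized angles).
b_x, b_y = rng.normal(0.0, sigma, (2, n_events))
a_sq = b_x**2 + b_y**2

truth = cumulant_amplitudes(v_x**2 + v_y**2, np.zeros(n_events))
raw = cumulant_amplitudes(v_hat_sq, np.zeros(n_events))
corrected = cumulant_amplitudes(v_hat_sq, a_sq)

print("v{2}: truth %.4f  raw %.4f  corrected %.4f" % (truth[0], raw[0], corrected[0]))
print("v{4}: truth %.4f  raw %.4f  corrected %.4f" % (truth[1], raw[1], corrected[1]))
```

For Gaussian noise components $\langle a_n^4 \rangle = 2 \langle a_n^2 \rangle^2$, so the noise term in (15) vanishes in this toy. The raw and corrected $v_n\{4\}$ therefore agree within statistical precision, whereas the raw $v_n\{2\}$ is visibly biased.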
FIG. 1. Amplitudes $v_2$ of the second Fourier harmonic as a function of centrality in Pb-Pb collisions at 5 TeV in AMPT ($2.8 \times 10^{\ldots}$ events). Values extracted by PCA are shown by open markers: circles for $v_2\{2\}$, squares for $v_2\{4\}$. Small full markers show calculations using the standard 2- and 4-particle cumulant methods. Gray diamonds denote PCA values before the correction for the statistical noise. The analysis is performed for charged particles with pseudorapidity $|\eta| < \ldots$; the transverse momentum ($p_T$) range is 0.2–5 GeV/$c$. The number of $\varphi$ bins used for PCA is $M = 48$.

III. SYMMETRIC CUMULANTS WITH PCA
Following the same strategy, we can investigate the feasibility of studying the so-called symmetric cumulants [7] with PCA. This observable measures correlations between the amplitudes of the $n$-th and $m$-th Fourier harmonics and is defined as

$$ \mathrm{SC}(n, m) = \langle v_n^2 v_m^2 \rangle - \langle v_n^2 \rangle \langle v_m^2 \rangle . \tag{16} $$

A previous attempt to study the symmetric cumulants with PCA was made in [3]. Since the PCA basis obtained in [3] was somewhat distorted (i.e. not identical to the Fourier harmonics), the SC values extracted in that paper did not match the "truth" values.

We start with the single-event raw quantity:

$$ \begin{aligned} \hat v_n^2 \hat v_m^2 &= ( \hat v_{n,x}^2 + \hat v_{n,y}^2 ) ( \hat v_{m,x}^2 + \hat v_{m,y}^2 ) \\ &= \big[ ( v_{n,x} + a_{n,x} )^2 + ( v_{n,y} + a_{n,y} )^2 \big] \big[ ( v_{m,x} + a_{m,x} )^2 + ( v_{m,y} + a_{m,y} )^2 \big] \\ &= \big[ v_n^2 + a_n^2 + 2 ( v_{n,x} a_{n,x} + v_{n,y} a_{n,y} ) \big] \big[ v_m^2 + a_m^2 + 2 ( v_{m,x} a_{m,x} + v_{m,y} a_{m,y} ) \big] . \end{aligned} \tag{17} $$

Averaging over events and taking into account that $\langle v_{n,x} \rangle = \langle v_{n,y} \rangle = \langle a_{n,x} \rangle = \langle a_{n,y} \rangle = 0$ (and the same for the $m$-th harmonic), as well as the factorization of the noise harmonics, $\langle a_n^2 a_m^2 \rangle = \langle a_n^2 \rangle \langle a_m^2 \rangle$, and of the noise and the signal, $\langle v_n^2 a_m^2 \rangle = \langle v_n^2 \rangle \langle a_m^2 \rangle$, we obtain

$$ \langle \hat v_n^2 \hat v_m^2 \rangle = \langle v_n^2 v_m^2 \rangle + \langle v_n^2 \rangle \langle a_m^2 \rangle + \langle v_m^2 \rangle \langle a_n^2 \rangle + \langle a_n^2 \rangle \langle a_m^2 \rangle . \tag{18} $$

Using (11) and (18), the desired term $\langle v_n^2 v_m^2 \rangle$ can be expressed via "measurable" quantities as

$$ \langle v_n^2 v_m^2 \rangle = \langle \hat v_n^2 \hat v_m^2 \rangle - \langle \hat v_n^2 \rangle \langle a_m^2 \rangle - \langle \hat v_m^2 \rangle \langle a_n^2 \rangle + \langle a_n^2 \rangle \langle a_m^2 \rangle . \tag{19} $$

The final expression for $\mathrm{SC}(n, m)$ is thus

$$ \mathrm{SC}(n, m) = \langle v_n^2 v_m^2 \rangle - \langle v_n^2 \rangle \langle v_m^2 \rangle = \langle \hat v_n^2 \hat v_m^2 \rangle - \langle \hat v_n^2 \rangle \langle \hat v_m^2 \rangle . \tag{20} $$

It is remarkable that all terms related to the statistical noise cancel.

Figure 2 shows the centrality dependence of the symmetric cumulants SC(3,2) and SC(4,2) in Pb-Pb collisions from AMPT. It can be seen that the values extracted from PCA (open markers) match the calculations using multi-particle correlations (closed markers). For SC(4,2), there are slight deviations in peripheral centrality classes; a possible reason is the event-plane correlation of these two harmonics. A detailed investigation of this is beyond the scope of this article.
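The cancellation of the noise terms in (20) can be illustrated with a toy Monte Carlo. This is a sketch under stated assumptions, not the AMPT analysis: the magnitudes of the two harmonics are anti-correlated by construction, their orientations and noise vectors are drawn independently (so event-plane correlations are neglected), the noise components are Gaussian with $\langle a^2 \rangle = 1/N_{\rm part}$ for $N_{\rm part} = 1000$, and the function names `symmetric_cumulant` and `observed_sq` are hypothetical.

```python
import numpy as np

def symmetric_cumulant(vn_sq, vm_sq):
    """SC(n, m) = <v_n^2 v_m^2> - <v_n^2><v_m^2>, Eq. (16)."""
    return np.mean(vn_sq * vm_sq) - np.mean(vn_sq) * np.mean(vm_sq)

def observed_sq(v, sigma, rng):
    """Squared observed amplitude |v + a|^2 for a randomly oriented flow vector."""
    psi = rng.uniform(0.0, 2.0 * np.pi, v.size)
    a_x, a_y = rng.normal(0.0, sigma, (2, v.size))      # independent noise components
    return (v * np.cos(psi) + a_x) ** 2 + (v * np.sin(psi) + a_y) ** 2

rng = np.random.default_rng(11)
n_events, n_particles = 1_000_000, 1000
sigma = np.sqrt(0.5 / n_particles)                      # gives <a^2> = 1/n_particles

# Anti-correlated true magnitudes of the two harmonics (toy assumption).
delta = rng.normal(0.0, 0.02, n_events)
v2 = np.abs(0.08 + delta)
v3 = np.abs(0.03 - 0.25 * delta + rng.normal(0.0, 0.004, n_events))

v2_hat_sq = observed_sq(v2, sigma, rng)
v3_hat_sq = observed_sq(v3, sigma, rng)

sc_true = symmetric_cumulant(v3**2, v2**2)
sc_observed = symmetric_cumulant(v3_hat_sq, v2_hat_sq)  # Eq. (20): no noise subtraction
bias_v3_sq = np.mean(v3_hat_sq) - np.mean(v3**2)        # noise bias of <v_3^2>, Eq. (10)

print("SC(3,2): truth %.3e  observed %.3e" % (sc_true, sc_observed))
print("bias of <v3_hat^2>: %.3e (expected about %.3e)" % (bias_v3_sq, 1.0 / n_particles))
```

In this toy the second moment of the observed amplitude is shifted by $\langle a^2 \rangle$, as in (10), while the symmetric cumulant built from the raw values reproduces the true one without any correction.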
[Figure 2: SC(m, n) versus centrality (%), vertical axis scale 1e-6; AMPT Pb-Pb at 5 TeV; series: SC(3,2) PCA, SC(3,2) trad., SC(4,2) PCA, SC(4,2) trad.]
FIG. 2. Centrality dependence of the symmetric cumulants SC(3,2) and SC(4,2) in Pb-Pb collisions from AMPT. Open markers: values extracted from PCA; closed markers: values from the traditional method of multi-particle correlations. Particles within $|\eta| < \ldots$, $p_T$ range 0.2–5 GeV/$c$.

IV. SUMMARY
It was shown that Principal Component Analysis applied to event-by-event azimuthal single-particle distributions allows one to perform flow analyses that are analogous to the traditional approaches based on multi-particle correlations. As the first example, flow amplitudes based on the fourth-order cumulant were considered. As the second case, correlations between flow amplitudes in terms of symmetric cumulants were calculated. Using realistic events from the AMPT generator, PCA results were directly compared to calculations using the traditional techniques, and good agreement was obtained. It was also demonstrated that the contribution of statistical fluctuations due to the finite number of particles per event to the higher-order PCA-based cumulants is small.
ACKNOWLEDGEMENTS
This study is supported by the Russian Science Foundation, grant 17-72-20045.

[1] R. S. Bhalerao, J.-Y. Ollitrault, S. Pal, and D. Teaney, Phys. Rev. Lett. , 152301 (2015), arXiv:1410.7739 [nucl-th].
[2] I. Altsybeev, Phys. Part. Nucl. , 314 (2020), arXiv:1909.03979 [nucl-th].
[3] Z. Liu, W. Zhao, and H. Song, Eur. Phys. J. C , 870 (2019), arXiv:1903.09833 [nucl-th].
[4] N. Borghini, P. M. Dinh, and J.-Y. Ollitrault, Phys. Rev. C , 054901 (2001), arXiv:nucl-th/0105040.
[5] J. Jia and P. Huo, Phys. Rev. C , 034905 (2014), arXiv:1402.6680 [nucl-th].
[6] R. He, J. Qian, and L. Huo, (2017), arXiv:1702.03137 [nucl-th].
[7] A. Bilandzic, C. H. Christensen, K. Gulbrandsen, A. Hansen, and Y. Zhou, Phys. Rev. C89