How Many Elements Matter?

Abstract

Some studies of stars' multi-element abundance distributions suggest at least 5-7 significant dimensions, but other studies show that the abundances of many elements can be predicted to high accuracy from [Fe/H] and [Mg/Fe] alone (or from [Fe/H] and age). We show that both propositions can be, and are, simultaneously true. We adopt a technique known as normalizing flow to reconstruct the probability distribution of Milky Way disk stars in the space of 15 elemental abundances measured by APOGEE. Conditioning on stellar parameters T_{\rm eff} and \log g minimizes the differential systematics. After conditioning on [Fe/H] and [Mg/H], the residual scatter for the best measured APOGEE elements is \sigma_{[X/{\rm H}]} \lesssim 0.02 dex, consistent with APOGEE's reported statistical uncertainties of \sim 0.01 - 0.015 dex and intrinsic scatter of 0.01-0.02 dex. Despite the small scatter, residual abundances display clear correlations between elements, which are too large to be explained by measurement uncertainties or by the statistical noise from our finite sample size. We must condition on at least seven elements (e.g., Fe, Mg, O, Si, Ni, Ca, Al) to reduce residual correlations to a level consistent with observational uncertainties, and higher measurement precision for other elements would likely reveal additional dimensions. Our results demonstrate that cross-element correlations are a much more sensitive and robust probe of hidden structure than dispersion alone, and they can be measured precisely in a large sample even if star-by-star measurement noise is comparable to the intrinsic scatter. We conclude that many elements have an independent story to tell, even for a "mundane" sample of disk stars and elements produced mainly by core-collapse and Type Ia supernovae. The only way to learn these lessons is to measure the abundances directly, and not merely infer them.


Draft version February 10, 2021

Typeset using LaTeX twocolumn style in AASTeX63


Yuan-Sen Ting (丁源森) and David H. Weinberg

Institute for Advanced Study, Princeton, NJ 08540, USA
Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08540, USA
Observatories of the Carnegie Institution of Washington, 813 Santa Barbara Street, Pasadena, CA 91101, USA
Research School of Astronomy & Astrophysics, Australian National University, Cotter Rd., Weston, ACT 2611, Australia ∗
Department of Physics, Department of Astronomy, and Center for Cosmology and Astro-Particle Physics, The Ohio State University, Columbus, OH 43210, USA

Submitted to ApJ


1. Introduction

Ambitious Galactic spectroscopic surveys such as Gaia-ESO (Gilmore et al. 2012), APOGEE (Majewski et al. 2017), and GALAH (Buder et al. 2020) have obtained high-resolution, high signal-to-noise ratio (SNR) spectra of hundreds of thousands of stars, spanning large swaths of the Milky Way disk, bulge, and halo and some nearby satellites such as the Sgr dwarf and the Magellanic Clouds. Other surveys including SEGUE (Yanny et al. 2009), RAVE (Steinmetz et al. 2006), and LAMOST (Luo et al. 2015) have obtained lower resolution spectra of even larger stellar samples. The high-resolution surveys provide detailed chemical fingerprints for each program star, typically measuring 15-30 elements per star. This is further complemented by the lower resolution surveys, which measure bulk metallicity and other abundance ratios (Ting et al. 2017b; Xiang et al. 2019; Wheeler et al. 2020). In concert with precise distances and proper motions from Gaia (Gaia Collaboration et al. 2018), and with asteroseismic calibration of stellar ages (Pinsonneault et al. 2018; Miglio et al. 2020), these surveys afford an increasingly detailed picture of the Milky Way's stellar populations and dynamics, far beyond that available as recently as a decade ago.

(Corresponding author: Yuan-Sen Ting, [email protected]. ∗ Hubble Fellow.)

It has long been recognized that the ratio of α-elements (produced mainly by core-collapse supernovae) to iron peak elements (which are additionally produced by SNIa on a longer timescale) is an important dimension of stellar abundance variation in addition to overall metallicity characterized by [Fe/H] (e.g., Fuhrmann 1998; Bensby et al. 2003; Hayden et al. 2015). However, the evidence on variations beyond [Fe/H] and [α/Fe] is mixed. On the one hand, Ness et al. (2019) found that a combination of [Fe/H] and stellar age is sufficient to predict the value of other APOGEE [X/Fe] abundance ratios with precision comparable to the measurement uncertainties. In a related vein, Weinberg et al. (2019) and Griffith et al. (2020) found that an empirical "two-process" model fit to median abundance trends can predict the APOGEE [X/Mg] ratios for most disk and bulge stars to surprisingly high precision from [Mg/H] and [Mg/Fe] alone. On the other hand, by applying principal component analysis (PCA) to a much smaller, pre-APOGEE data set, Ting et al. (2012) found that five to seven components were needed to describe the multi-element abundances of solar neighborhood stars. Andrews et al. (2017) reached compatible conclusions with a different abundance sample. Working directly with spectra, Price-Jones & Bovy (2018) found that 10 components are needed to explain the diversity of APOGEE H-band spectra.

One goal of this paper is to reconcile these seemingly disparate conclusions and demonstrate that most elements contain critical information that cannot be simply inferred from the metallicity and α-enhancement alone. Our approach is based on a powerful machine-learning technique called normalizing flow, which we use to create an accurate model of the probability distribution function (PDF) of 15 abundances measured in APOGEE disk stars, along with the stellar parameters $T_{\rm eff}$ and $\log g$. In particular, we can condition on the values of $T_{\rm eff}$, $\log g$, and a subset of elements, then evaluate the joint distribution of the remaining elements.
The residual abundances (star-by-star deviations from the conditional mean) display significant cross-element correlations, revealing underlying structures. Residual correlations only approach what we would expect from the observational uncertainties after conditioning on seven elements, signaling that most elements carry independent information.

Drawing on these results, we address the related question: is it still worth measuring many elemental abundances for large samples, even if [Fe/H] and [α/Fe] can already predict these abundances with ∼0.02-dex root mean square (rms) dispersion? Our answer is an emphatic yes. Correlations can be measured at high significance in large samples even when the observational uncertainties are comparable to, or larger than, the star-to-star intrinsic dispersion of individual elements. This residual correlation structure from multi-element abundance trends may then provide critical diagnostics about myriad astrophysical processes, including stellar yields, ISM mixing, and merger history.

Conditioning the abundance distribution on Mg and Fe has some similarities to fitting a two-process model like that of Weinberg et al. (2019). A forthcoming paper (D. Weinberg et al., in prep.) will generalize this model-fitting approach and examine star-by-star correlated deviations from two-process predictions. While there is some overlap between these two papers, the two approaches have different underlying principles and complementary practical advantages. The normalizing flow method used here opens a novel route to minimizing observational systematics by conditioning on $T_{\rm eff}$ and $\log g$. It is conceptually and practically straightforward to condition on additional elements and thereby assess the independent information encoded in their abundances.
The future paper aims to provide more physical insights about the emergence of these correlated deviations.

In this paper's next section we discuss why statistical correlations can reveal "hidden" degrees of freedom that might be buried in the dispersion about the conditional mean predictions. Section 3 introduces the normalizing flow technique for describing arbitrary high-dimensional distributions. In §4 we apply this technique to a sample of disk red giants from APOGEE Data Release 16 (DR16; Ahumada et al. 2020; Jönsson et al. 2020). In §5 we discuss the implications of our methodology and results for the dimensionality of the stellar abundance distribution, for methods of abundance determination, for chemical tagging of co-natal stellar populations, and for the design of stellar spectroscopic surveys. In §6 we summarize our findings and identify avenues for further application of these techniques.

2. Variance, correlation, and dimensionality

As noted above, previous studies have shown that by conditioning on two elements representing core-collapse supernovae and SNIa, such as Mg and Fe, one can predict other elemental abundances to impressive accuracy. This does not necessarily imply that other elemental abundances are redundant. For example, if there are other residual correlations among groups of elements, it would mean that the elemental abundance space contains information beyond the amplitude of these two processes. These correlations, even if small, would imply that there are other hidden degrees of freedom (or "dimensions") in the Milky Way's chemical evolution. In this section we lay out the key ideas related to the measurement of such correlations before turning to the specifics of our method in §3 and §4.

Consider $N$ random variables, $X_1, \cdots, X_N$, that represent $N$ elemental abundances, and let the correlation between any two of the variables be $\rho_{jk}$. By definition, the covariances of these $N$ variables are

$$C_{jk} = \rho_{jk}\,\sigma_j\,\sigma_k , \tag{1}$$

where $\sigma_k$ is the standard deviation of the $k$-th variable. Suppose these $N$ variables represent elemental abundances after subtracting the mean abundances conditioned on Fe and Mg. For simplicity, we assume that these $N$ variables approach a multivariate Gaussian distribution. We would like to know if there are correlations among these residual abundances, indicating underlying physical structures in the abundance distribution.

One way to search for residual correlation is by investigating the change in variance after conditioning on an additional variable. However, this method proves to be rather insensitive in practice. To understand this, we start with the simpler illustration shown in Fig. 1. In the middle panels, we show two bivariate Gaussians with moderate strength correlations, $\rho = 0.2$ and $0.4$.
In the right panels we compare the variance of the second variable evaluated from the marginal distribution with the variance after conditioning on the first variable. Intuitively, if two variables are correlated, one would expect conditioning on the first to reduce the variance of the second, by taking advantage of the information provided by the first variable. However, Fig. 1 shows that a correlation that is easily detected visually in the middle panels is barely discernible when looking at the change of the variance as shown in the right panels.

This intuition can be formulated more rigorously. Assume $X_1$ to be the variable that we condition on. For a multivariate Gaussian, the new covariance matrix after conditioning on $X_1$ can be analytically calculated to be

$$C'_{[2,N],[2,N]} = C_{[2,N],[2,N]} - C_{[2,N],1}\, C_{11}^{-1}\, C_{1,[2,N]} . \tag{2}$$

Each of these terms represents individual submatrices of the original covariance matrix $C_{jk}$, and we use the subscript $[2,N]$ to represent the $(N-1) \times (N-1)$ matrix for the second to the $N$-th variable. The diagonal entries of the new matrix $C'_{[2,N],[2,N]}$ are the variances upon the conditioning. By evaluating Eq. 2, one can obtain an analytic expression for these new dispersion terms. The variance of the $k$-th variable can be written as

$$\sigma_k'^{\,2} = \sigma_k^2 \left(1 - \rho_{1k}^2\right) . \tag{3}$$

This implies that the fractional change of the dispersion is

$$1 - \frac{\sigma'_k}{\sigma_k} = 1 - \sqrt{1 - \rho_{1k}^2} . \tag{4}$$

For $\rho_{1k} \ll 1$, the fractional change becomes $\simeq \rho_{1k}^2/2$. The upper left panel of Fig. 1 illustrates this relation. For the specific correlation values of 0.2 and 0.4, the fractional changes of the dispersion are 2% and 8%, respectively. For an abundance intrinsic dispersion of 0.01 dex, typical of what we find for well measured APOGEE elements (see §4), one would be looking for changes in the dispersion of $\sim 0.0008$ dex or less.
In this study, we will use the word "dispersion" to refer to the standard deviation of a variable, equal to the square root of its variance.
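
The algebra of Eqs. 2-4 can be checked with a few lines of code. The following is a minimal sketch in plain Python, not code from this work: the helper names `conditional_covariance` and `fractional_dispersion_change` are hypothetical, and the 0.01 dex dispersion is simply the illustrative value quoted in the text. It conditions a covariance matrix on its first variable and compares the resulting dispersion with the closed form of Eq. 4.

```python
import math


def conditional_covariance(cov):
    """Covariance of variables 2..N after conditioning on variable 1 (Eq. 2).

    `cov` is the full N x N covariance matrix as a list of lists.
    """
    n = len(cov)
    c11 = cov[0][0]
    return [
        [cov[j][k] - cov[j][0] * cov[0][k] / c11 for k in range(1, n)]
        for j in range(1, n)
    ]


def fractional_dispersion_change(rho):
    """Fractional reduction of the dispersion, 1 - sqrt(1 - rho^2) (Eq. 4)."""
    return 1.0 - math.sqrt(1.0 - rho * rho)


sigma = 0.01  # dex; illustrative intrinsic dispersion taken from the text
for rho in (0.2, 0.4):
    # Bivariate covariance built from Eq. 1 with equal dispersions.
    cov = [[sigma**2, rho * sigma**2],
           [rho * sigma**2, sigma**2]]
    sigma_cond = math.sqrt(conditional_covariance(cov)[0][0])
    frac = fractional_dispersion_change(rho)
    # The matrix route (Eq. 2) and the closed form (Eq. 4) should agree.
    assert abs((1.0 - sigma_cond / sigma) - frac) < 1e-12
    print(f"rho={rho}: dispersion drops by {100 * frac:.1f}% "
          f"({sigma * frac:.5f} dex)")
```

For both correlation values the change in dispersion stays below 0.001 dex, which is why the variance alone is such an insensitive diagnostic.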

Although a reduction of 0.0008 dex might seem impossible to measure, we emphasize that, in principle, the residual variance of elemental abundances can be measured precisely in a large sample even if the observational uncertainties for individual stars are comparable to the intrinsic dispersion of the abundances. The "variance of the variance" is not the variance itself.
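
A short simulation illustrates this statement. The sketch below is an illustration under stated assumptions rather than part of the analysis: the function names, the seed, the number of trials, and the choice of equal intrinsic and observational dispersions (0.01 dex each) are all hypothetical. It draws noisy residual abundances, estimates the total variance, and measures how much that estimate scatters from sample to sample. The rms fractional error should shrink as $N_{\rm sample}^{-1/2}$. For exactly Gaussian draws it should come out close to $(2/N_{\rm sample})^{1/2}$, a little below the $(3/N_{\rm sample})^{1/2}$ of Eq. 10, which can therefore be read as mildly conservative.

```python
import math
import random


def estimated_variance(n_sample, sigma_int, sigma_obs, rng):
    """Estimate the total variance s^2 from n_sample noisy residual abundances."""
    # Each measured residual = intrinsic deviation + independent Gaussian noise.
    values = [rng.gauss(0.0, sigma_int) + rng.gauss(0.0, sigma_obs)
              for _ in range(n_sample)]
    mean = sum(values) / n_sample
    return sum((v - mean) ** 2 for v in values) / n_sample


def rms_fractional_error(n_sample, sigma_int, sigma_obs, n_trials, rng):
    """rms of (s^2 / sigma_tot^2 - 1) over many simulated samples."""
    true_var = sigma_int**2 + sigma_obs**2
    squared = [
        (estimated_variance(n_sample, sigma_int, sigma_obs, rng) / true_var - 1.0) ** 2
        for _ in range(n_trials)
    ]
    return math.sqrt(sum(squared) / n_trials)


rng = random.Random(42)       # fixed seed so the sketch is reproducible
sigma_int = sigma_obs = 0.01  # dex; noise comparable to the intrinsic scatter
results = {n: rms_fractional_error(n, sigma_int, sigma_obs, 300, rng)
           for n in (100, 1600)}
for n, err in results.items():
    print(f"N_sample={n}: rms fractional error of the variance = {err:.3f}")
```

Increasing the sample size by a factor of 16 should reduce the fractional error by about a factor of four, even though the noise on each individual star is unchanged.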

Mathematically, suppose that we estimate the total variance for a given elemental abundance as

$$s^2 = \frac{1}{N_{\rm sample}} \sum_{i=1}^{N_{\rm sample}} \left(X_{j,i} - \bar{X}_j\right)^2 , \tag{5}$$

where $\bar{X}_j$ is the conditional mean abundance. (We take $N_{\rm sample}$ large enough to ignore the slight bias caused by using the sample mean instead of the true mean.) The mean-squared uncertainty of this estimate, the variance of the variance, is

\begin{align}
\left\langle \left(s^2 - \sigma_{j,{\rm tot}}^2\right)^2 \right\rangle
&= \left\langle (s^2)^2 \right\rangle - 2 \left\langle s^2 \sigma_{j,{\rm tot}}^2 \right\rangle + \left\langle \sigma_{j,{\rm tot}}^4 \right\rangle \tag{6} \\
&= \left\langle (s^2)^2 \right\rangle - \sigma_{j,{\rm tot}}^4 \tag{7} \\
&= \frac{1}{N_{\rm sample}} \left\langle \left(X_{j,i} - \bar{X}_j\right)^4 \right\rangle \tag{8} \\
&= \frac{3}{N_{\rm sample}}\, \sigma_{j,{\rm tot}}^4 , \tag{9}
\end{align}

where the third equality assumes that the star-to-star uncertainties are independent and the fourth assumes that they are Gaussian (with a kurtosis of three). We use the symbol $\sigma_{j,{\rm tot}}^2$ to represent the true total variance, and $s^2$ to represent the estimate of the total variance. The rms fractional uncertainty in the variance is

$$\frac{\left\langle \left(s^2 - \sigma_{j,{\rm tot}}^2\right)^2 \right\rangle^{1/2}}{\sigma_{j,{\rm tot}}^2} = \left(\frac{3}{N_{\rm sample}}\right)^{1/2} . \tag{10}$$

The equation shows that the variance of the variance can be small even if the variance itself is large.

In this study, we will mostly focus on the total measured variance without distinguishing its sources, but we note that the total variance of an elemental abundance is the sum of the intrinsic variance and the observational uncertainty,

$$\sigma_{j,{\rm tot}}^2 = \sigma_{j,{\rm int}}^2 + \sigma_{j,{\rm obs}}^2 . \tag{11}$$

Since the variance can be measured with a small fractional uncertainty in a large sample, as a corollary, the value of the intrinsic variance can be determined even if the observational uncertainty dominates the star-to-star dispersion. (Deriving the intrinsic dispersion does require accurate knowledge of the observational dispersion, and systematic uncertainties in the magnitude of this observational dispersion may dominate over statistical uncertainties. For example, if $\sigma_{j,{\rm int}} = \sigma_{j,{\rm obs}}$, then a systematic uncertainty of a factor of $\sqrt{2}$ in the value of $\sigma_{j,{\rm obs}}$ would be enough to explain all of the observed variance as a consequence of measurement dispersion.)

Figure 1. Measuring abundance correlation structure is more powerful than measuring the dispersion about the conditional mean. Left: Fractional reduction in the dispersion of a random variable after conditioning on the value of a correlated variable, as a function of the correlation coefficient (Eq. 4). Middle: Density plot of random draws from a bivariate Gaussian distribution with dispersion of 0.01 dex for each variable (a typical intrinsic dispersion for APOGEE elemental abundances) and correlation coefficients $\rho = 0.2$ (top) and $0.4$ (bottom). Right: Distribution of variable 2 before (blue) and after (orange) conditioning on the value of variable 1. The changes of dispersion for these two values of $\rho$ are 2% and 8%, respectively, or less than 0.0008 dex. Correlations that are readily visible in the 2-d distributions of the middle panels are barely discernible in the change of dispersion.

While we can in principle detect the reduction of variance from conditioning on a correlated variable with a large enough sample, it remains challenging in practice. For the examples in Fig. 1, with $\rho = 0.2$ and $0.4$, the reductions in variance are 4% and 16% (twice the fractional reductions in the dispersion), and detecting them with $2\sigma$ statistical significance requires $N_{\rm sample} \simeq 7{,}500$ and $N_{\rm sample} \simeq 470$, respectively. Although these numbers might appear within reach of large surveys, one must account for the fact that the effective number of stars after conditioning at a point in elemental abundance space is much smaller than the total size of the sample, a point we return to in §4 below. Because of that, as we will see, such a signal is often not statistically significant with the current data.

More generally, for $\rho \ll 1$ the fractional reduction of the variance is $\rho^2$, and, from Eq. 10, detecting this reduction at a significance of $\nu\sigma$ requires

$$N_{\rm sample} \geq \frac{3\nu^2}{\rho^4} . \tag{12}$$

While measuring the reduction of variance can be challenging, measuring non-zero correlations is much easier. For a multivariate Gaussian distribution, the correlation estimate's uncertainty due to finite sampling has a neat analytic approximation, known as the Fisher transformation. In particular, let the true correlation be $\rho$, let $\hat{\rho}$ be its sample estimate, and let $z_{\rm fisher} = \tanh^{-1}(\hat{\rho})$. It can be shown (Fisher 1921) that, with a sample size $N_{\rm sample}$, the variable $z_{\rm fisher}$ is approximately normally distributed with mean equal to $\frac{1}{2}\ln[(1+\rho)/(1-\rho)]$ and standard deviation of $1/\sqrt{N_{\rm sample} - 3}$. In the null hypothesis with correlation $\rho = 0$, we have $z_{\rm fisher} \simeq \hat{\rho}$, and the mean of the distribution approaches 0 and the standard deviation $\simeq 1/\sqrt{N_{\rm sample}}$.

This result echoes Eq. (10), where the fractional uncertainty of the variance is $\sqrt{3/N_{\rm sample}}$. Detecting a non-zero correlation $\rho$ at significance of $\nu\sigma$ requires

$$N \geq \frac{\nu^2}{\rho^2} . \tag{13}$$

Comparing Eqs. (13) and (12), in addition to a factor of three gain, the key difference is that the denominator is $\rho^2$ instead of $\rho^4$. Therefore, for $\rho \ll 1$, it is far easier to detect a correlation of elements directly than to detect it through reduction of variance. For example, to detect a signal from $\rho = 0.4$ at $2\sigma$ significance only requires an effective sample of 25 stars, and for $\rho = 0.2$, 100 stars. We will see an empirical demonstration of this point in our APOGEE analysis.

A second advantage of focusing on cross-element correlations is that observational uncertainty is unlikely to produce artificial correlations at a significant level. We discuss this point further for the specific case of APOGEE abundance measurements in §4.4.1 below, and we add a significant caveat in §4.4.3. Nonetheless, the measurement and interpretation of correlations is not immune to observational uncertainty. The generalization of Eq. (11) is

$$C_{jk,{\rm tot}} = C_{jk,{\rm int}} + C_{jk,{\rm obs}} . \tag{14}$$

If the observational covariance is diagonal, i.e., if the observational uncertainties are uncorrelated from one element to another, then off-diagonal elements of the total covariance are just $C_{jk,{\rm tot}} = C_{jk,{\rm int}} = \rho_{jk,{\rm int}}\, \sigma_{j,{\rm int}}\, \sigma_{k,{\rm int}}$. For $j \neq k$, therefore, the relation between total and intrinsic correlations is

\begin{align}
\rho_{jk,{\rm tot}} &= \frac{C_{jk,{\rm tot}}}{\sigma_{j,{\rm tot}}\, \sigma_{k,{\rm tot}}} \tag{15} \\
&= \rho_{jk,{\rm int}}\, \frac{\sigma_{j,{\rm int}}\, \sigma_{k,{\rm int}}}{\sigma_{j,{\rm tot}}\, \sigma_{k,{\rm tot}}} \tag{16} \\
&= \rho_{jk,{\rm int}} \left(1 + \frac{\sigma_{j,{\rm obs}}^2}{\sigma_{j,{\rm int}}^2}\right)^{-1/2} \left(1 + \frac{\sigma_{k,{\rm obs}}^2}{\sigma_{k,{\rm int}}^2}\right)^{-1/2} . \tag{17}
\end{align}

Thus, the measured correlations are always smaller in magnitude than the intrinsic correlations, by a factor that depends on the ratio of observational variance to intrinsic variance; for $\sigma_{j,{\rm obs}} \simeq \sigma_{j,{\rm int}}$ (for both elements) the reduction is about a factor of two. This reduction makes it more difficult to detect a given level of intrinsic correlation, especially correlations involving elements with large observational dispersion. In short, the measured correlation is a conservative limit, and if the measured correlation is significant then the detection is significant. (Similar to the variance, systematic uncertainties in the magnitude of $\sigma_{j,{\rm obs}}$ may limit our ability to infer the true values of the intrinsic correlations. However, these systematic uncertainties cannot make a non-zero measured correlation consistent with a zero intrinsic correlation.)

The third advantage of measuring cross-element correlations is that their information content is richer than that of excess variance alone, with informative clues to the physical origin of the residual abundance variations. Even if the magnitude of intrinsic correlation coefficients is uncertain, non-zero values of these coefficients are unambiguous evidence of remaining structure in the element distribution.
Thus, the focus of our observational investigation will be to ask how many conditioning elements must be considered in the PDF before the residual correlations of the remaining elements are consistent with the observational uncertainties. For purposes of this paper, we take this number of conditioning elements to be our operative definition of the "dimensionality" of the disk stellar population in the space of APOGEE elemental abundances.
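
The sample-size arguments of this section (Eqs. 12, 13, and 17) are simple enough to evaluate directly. The sketch below is a hedged illustration in plain Python. The function names, the seed, and the number of Monte Carlo trials are hypothetical choices, not taken from this work. It tabulates the sample sizes needed to detect a correlation through the variance versus directly, evaluates the attenuation of Eq. 17, and checks the Fisher-transformation scatter $1/\sqrt{N_{\rm sample}-3}$ against simulated bivariate Gaussian draws.

```python
import math
import random


def n_for_variance_reduction(rho, nu):
    """Sample size needed to see the variance reduction (Eq. 12)."""
    return 3.0 * nu**2 / rho**4


def n_for_correlation(rho, nu):
    """Sample size needed to detect the correlation directly (Eq. 13)."""
    return nu**2 / rho**2


def attenuated_correlation(rho_int, ratio_j, ratio_k):
    """Measured correlation from Eq. 17; ratio = sigma_obs / sigma_int."""
    return rho_int / math.sqrt((1.0 + ratio_j**2) * (1.0 + ratio_k**2))


def sample_correlation(rho, n, rng):
    """Pearson r of n draws from a unit bivariate Gaussian with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(a)
        ys.append(rho * a + math.sqrt(1.0 - rho**2) * b)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


for r in (0.2, 0.4):
    print(f"rho={r}: N(variance) ~ {n_for_variance_reduction(r, 2):.0f}, "
          f"N(correlation) ~ {n_for_correlation(r, 2):.0f}")

# Monte Carlo check of the Fisher transformation for rho = 0.2, N = 100.
rng = random.Random(0)  # fixed seed for reproducibility
rho, n, trials = 0.2, 100, 2000
z = [math.atanh(sample_correlation(rho, n, rng)) for _ in range(trials)]
z_mean = sum(z) / trials
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (trials - 1))
print(f"simulated std of z = {z_std:.4f}, "
      f"Fisher prediction = {1.0 / math.sqrt(n - 3):.4f}")
```

With equal observational and intrinsic dispersions for both elements, `attenuated_correlation(0.5, 1.0, 1.0)` returns 0.25, the factor-of-two reduction noted above.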

The abundances of elements in the atmosphere of a star are estimated by fitting a model to the observed spectrum in which the value of the abundance is a free parameter. The modeling is complex, as it must include corrections for instrumental effects and telluric lines, a model of the stellar atmosphere, and a calculation of a synthetic spectrum from that atmosphere, which in turn depends on an adopted list of wavelengths and oscillator strengths for lines of different elements. Analyses of small sets of high-resolution spectra may be done "by hand," but large surveys typically rely on codes that do automated spectral fitting and parameter optimization. For the giant stars in APOGEE DR16 that we will adopt in this study (see Jönsson et al. 2020), the APOGEE Stellar Parameters and Chemical Abundances Pipeline (ASPCAP; García Pérez et al. 2016; Holtzman et al. 2015) assumes MARCS model atmospheres (Gustafsson et al. 2008; Jönsson et al. 2020) and first fits a seven-parameter model to the continuum-normalized H-band spectrum, then infers the abundances of other elements by varying [M/H] and fitting the spectrum in wavelength windows in which lines of that element dominate, with all other parameters held fixed. (The seven parameters are $T_{\rm eff}$, $\log g$, the microturbulence $v_{\rm micro}$, and the abundance ratios [M/H], [α/M], [N/M], and [C/M], where M is an overall metallicity scaling all elements together and [α/M] scales the α-elements O, Ne, Mg, Si, S, Ar, Ca, and Ti.)

There are two main sources of uncertainty in such estimates. The first is the statistical uncertainty arising from photon noise. The second is the systematic uncertainty arising because the models and calibrations used to infer abundances are imperfect. These systematic effects can include departures from LTE or plane-parallel geometry, incomplete or inaccurate line lists, and data-related effects such as inaccurate line-spread functions or incomplete removal of telluric contamination. In an absolute sense (the difference between a star's estimated and true abundances), the systematic uncertainties are often larger than the statistical uncertainties in a high-SNR spectroscopic survey. For this reason, the small statistical uncertainties of abundances derived from high-SNR spectra are often dismissed as unrealistic. However, to the extent that the systematic uncertainties are the same for all sample stars, they do not add dispersion to the measured abundances. Observational contributions to element dispersion arise from photon noise, and they can also arise from differential systematic uncertainties within the sample.

The latter contribution can be minimized by examining stars within a narrow range of parameters such as $T_{\rm eff}$, $\log g$, and [Fe/H], or, with the method implemented in this paper, by examining distributions conditioned on these parameters. One of our study's innovations is the ability to minimize the systematic uncertainties through flexible modelling of the elemental abundance space, conditioned on variables that contribute to the systematic uncertainties. This innovation allows the observational dispersion to approach the photon noise limit, enabling a statistically significant detection of the residual abundance correlations.

Accurate characterization of statistical abundance uncertainties due to photon noise is therefore important even if the absolute abundance uncertainty is dominated by systematic uncertainties. The impact of photon noise on abundance measurements is primarily diagonal, i.e., adding observational uncertainties that are statistically independent for different elements.
However, photon noise uncertainties are not entirely diagonal because uncertainties in some parameters (especially $T_{\rm eff}$, $\log g$, and overall metallicity) affect the models used to infer all abundances, and additionally because some abundances may be measured from blended spectral features or from molecular lines that involve two elements. We estimate these off-diagonal uncertainties for APOGEE abundances in §4.4.1 below.

APOGEE reports statistical uncertainties based on repeat observations of a subset of stars, which are used to derive an empirical formula relating the standard deviation of repeat observations to a star's $T_{\rm eff}$ and [M/H] and the SNR of its spectrum (Jönsson et al. 2020). These uncertainties are usually larger than those derived from the $\chi^2$-fitting of the spectrum, which implies that variations of observing conditions are contributing to the observational uncertainties in addition to Poisson fluctuations in the number of photons per pixel. Nonetheless, we will generally refer to these statistical measurement uncertainties with the shorthand phrase "photon noise."

3. Describing Distributions with Normalizing Flows

Elemental abundance space correlations can reveal many subtle properties about stellar yield processes and ISM mixing. In particular, if we can recover the multi-dimensional distribution $p([X/{\rm H}])$ spanned by the elemental abundances of stars, we can then calculate moments of the distribution and correlations between elemental abundances. In practice, the data that we collect (e.g., from APOGEE) is only an ensemble of realizations drawn from the PDF, $\{[X/{\rm H}]_i\}$, where each realization $i$ consists of the measured abundances of an individual star. Therefore, the first step to study the elemental abundance space requires tools that can faithfully recover the PDF $p([X/{\rm H}])$ from the ensemble of realizations $\{[X/{\rm H}]_i\}$, which we will elaborate on in this section.

One way of recovering a distribution from an ensemble of realizations is to conjecture a functional form for the distribution and then maximize the likelihood of the parameters. Mathematically, let $\vec{x}$ be an $N$-dimensional random variable, and let $\{\vec{x}_i\}$ be the ensemble of the realizations. If we assume $f_\theta(\vec{x})$ to be the functional form of the normalized PDF, characterized by $\theta$, finding the distribution that best describes the data translates into a simple question of finding the $\theta^*$ that optimizes the likelihood,

$$\theta^* = \operatorname*{argmax}_{\theta} \Big[ \sum_i \ln f_\theta(\vec{x}_i) \Big] . \tag{18}$$

However, for an arbitrary distribution, our human heuristic on the functional form $f_\theta$ can be quite limiting (see Fig. 2). Especially for a high-dimensional and irregular distribution like that of stars in elemental abundance space, any ad hoc functional form might not fully capture all the distribution details. This is where the idea of normalizing flow, a machine learning tool that is rapidly growing in applications, can play an important role.

The basic idea of normalizing flow is to use neural networks, characterized by the neural network coefficients $\psi$, as a change of variables.
More specifically, the goal is to transform the multi-dimensional random variable $\vec{x}$ to a new random variable $\vec{z} = f_\psi(\vec{x})$ of the same dimension, such that $p(\vec{z})$ is a much simpler and more recognizable distribution than $p(\vec{x})$. We call the distribution $p(\vec{z})$ the "base distribution." In this study, the base distribution is chosen to be a unit-multivariate Gaussian distribution with a zero mean and an identity covariance matrix.

For a change of random variables, one needs to take into account the change of measure $f'_\psi(\vec{x})$, known as the Jacobian. More precisely,

$$p(\vec{x}) = p_{\vec{z}}\big(f_\psi(\vec{x})\big)\, \big|f'_\psi(\vec{x})\big| , \tag{19}$$

where $|f'_\psi(\vec{x})|$ denotes the absolute value of the Jacobian determinant. Therefore, in order to evaluate the likelihood through the $\vec{z}$ space, we need to ensure that the neural networks' Jacobian $|f'_\psi(\vec{x})|$ is easily and analytically calculable. With that premise, we can then optimize for the neural network coefficients $\psi$, such that the neural network transforms the original ensemble $\{\vec{x}_i\}$ to $\{\vec{z}_i\}$ and that the ensemble of $\{\vec{z}_i\}$ approaches a multivariate Gaussian distribution. Mathematically, we optimize for the neural network coefficients $\psi$ through the standard back-propagation technique with a rectified ADAM optimizer (Liu et al. 2019), and find $\psi^*$ such that

$$\psi^* = \operatorname*{argmax}_{\psi} \Big[ \sum_i \ln \Big( p_{\vec{z}}\big(f_\psi(\vec{x}_i)\big)\, \big|f'_\psi(\vec{x}_i)\big| \Big) \Big] . \tag{20}$$

In other words, we maximize the likelihood such that the ensemble of $\vec{z}$ approaches a unit-Gaussian distribution. We emphasize that the condition that the Jacobian of the neural networks be analytically calculable is crucial, since it would be otherwise prohibitively expensive to perform the optimization.

Figure 2. Normalizing flow can faithfully describe challenging distributions. As an illustration, we generate a 2D moon-shaped PDF and plot its density distribution (via a sample of random draws), as shown in the left panel. The two middle panels show the recovery of the distribution assuming 4-component and 10-component Gaussian Mixture Models. These fail to describe the distribution, even in 2D. On the right, we demonstrate a much better recovery with a normalizing flow, as illustrated further in Fig. 3.

Besides the Jacobian criterion, another criterion is equally important. Note that $f_\psi$ transforms the random variable from $\vec{x}$ to $\vec{z}$, with which we calculate the likelihood. Yet, the multivariate Gaussian distribution $p(\vec{z})$ is the one from which we can easily sample. Therefore, if we want to sample $p(\vec{x})$ effectively, the neural network $f_\psi$ has to be analytically invertible so that we can first sample $\vec{z}$ from $p(\vec{z})$, then evaluate $f_\psi^{-1}(\vec{z})$ to attain an ensemble of $\vec{x}$.

In short, to use neural networks as a flexible change of random variables, the networks need to satisfy two criteria, namely (a) the Jacobian of the network is analytic and (b) the neural network is analytically invertible.
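
The two criteria can be made concrete with a toy example. The sketch below is an illustration under explicit assumptions, not the Neural Spline Flow used in this work: it implements a single two-dimensional affine transformation in plain Python in which the first variable passes through unchanged and the second is rescaled and shifted as a function of the first. The functions `log_scale` and `shift` are hypothetical, hand-picked stand-ins for what would be trained neural networks. Because the Jacobian matrix is triangular, its log-determinant is available in closed form, and the inverse can be written down exactly.

```python
import math


def log_scale(x1):
    """Hypothetical log-scale function s(x1); a neural network in a real flow."""
    return 0.5 * math.tanh(x1)


def shift(x1):
    """Hypothetical shift function t(x1); also a neural network in a real flow."""
    return x1**2


def forward(x):
    """Map x -> z and return (z, log|det Jacobian|).

    x1 is left unchanged and x2 is transformed using x1 only, so the
    Jacobian is triangular with determinant exp(s(x1)).
    """
    x1, x2 = x
    z = (x1, x2 * math.exp(log_scale(x1)) + shift(x1))
    return z, log_scale(x1)


def inverse(z):
    """Closed-form inverse of `forward` (criterion b)."""
    z1, z2 = z
    return (z1, (z2 - shift(z1)) * math.exp(-log_scale(z1)))


def log_prob(x):
    """log p(x) from the change of variables (Eq. 19) with a unit-Gaussian base."""
    z, log_det = forward(x)
    log_base = -0.5 * (z[0] ** 2 + z[1] ** 2) - math.log(2.0 * math.pi)
    return log_base + log_det


def numerical_log_det(x, eps=1e-6):
    """Finite-difference Jacobian determinant, used only as a sanity check."""
    x1, x2 = x
    zp1, _ = forward((x1 + eps, x2))
    zm1, _ = forward((x1 - eps, x2))
    zp2, _ = forward((x1, x2 + eps))
    zm2, _ = forward((x1, x2 - eps))
    j11 = (zp1[0] - zm1[0]) / (2 * eps)
    j21 = (zp1[1] - zm1[1]) / (2 * eps)
    j12 = (zp2[0] - zm2[0]) / (2 * eps)
    j22 = (zp2[1] - zm2[1]) / (2 * eps)
    return math.log(abs(j11 * j22 - j12 * j21))


x = (0.3, -1.2)
z, log_det = forward(x)
print("round trip:", inverse(z))
print("analytic vs numerical log|det J|:", log_det, numerical_log_det(x))
print("log p(x):", log_prob(x))
```

A practical flow stacks many such units and alternates which variables are held fixed, so that every variable is eventually transformed while each unit keeps an analytic Jacobian and inverse.
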
The subset of neural networks that satisfy these two criteria are known as “normalizing flows.” They can transform any random variable from a complex distribution into a unit-Gaussian (“normal”) distribution (see Fig. 3).

The idea of normalizing flow has inspired the machine learning community since its inception (e.g., Jimenez Rezende & Mohamed 2015). The ability to describe high-dimensional PDFs given any ensemble also makes it one of the most flexible machine learning tools to apply to the physical sciences. However, normalizing flow has had a slow start in astronomy, with only a handful of applications to date. For example, Cranmer et al. (2019) applied normalizing flow to describe the color-magnitude diagram of the Gaia data, and Reiman et al. (2020) used normalizing flow to model quasar continua. More recently, Green & Ting (2020) adopted normalizing flows to describe phase-space distributions and solve for Galactic dynamics.

In this study, we adopt normalizing flows to describe the elemental abundance distribution of stars. We assume a normalizing flow similar to that adopted by Green & Ting (2020). More specifically, we adopt eight units of “Neural Spline Flow” coupled with the “Conv1x1” operation (or “GLOW” in the machine learning lingo). Each Neural Spline Flow unit consists of three layers of densely connected networks with 16 neurons. We refer interested readers to the original articles (Kingma & Dhariwal 2018; Durkan et al. 2019). Here we give a brief overview.

Roughly, a Neural Spline Flow performs an invertible spline transformation whose Jacobian is analytically calculable. The Conv1x1 operation, on the other hand, performs a linear transformation of the variables. As in most normalizing flows, the trick to ensure an analytic Jacobian is the idea of “coupling”, i.e., performing a change of variable for a subset of variables at each transformation unit.
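The coupling idea can be sketched in two dimensions. The example below swaps in an affine coupling layer rather than the spline coupling used in this study, and `scale_and_shift` is a hypothetical stand-in for the small neural network. It shows that leaving one variable untouched makes the Jacobian triangular, so that its log-determinant is simply the log of the diagonal scale, and that the layer inverts in closed form.

```python
import math

def scale_and_shift(x1):
    """Hypothetical stand-in for the neural network inside a coupling layer."""
    log_s = 0.5 * math.tanh(x1)
    t = math.sin(x1)
    return log_s, t

def coupling_forward(x1, x2):
    """x1 passes through unchanged; x2 is rescaled and shifted using x1 only."""
    log_s, t = scale_and_shift(x1)
    z1 = x1
    z2 = x2 * math.exp(log_s) + t
    # Jacobian = [[1, 0], [dz2/dx1, exp(log_s)]] is triangular, so
    # ln|det J| is the sum of the logs of the diagonal entries: log_s.
    return z1, z2, log_s

def coupling_inverse(z1, z2):
    """Exact inverse: recompute the same scale and shift from z1 = x1."""
    log_s, t = scale_and_shift(z1)
    return z1, (z2 - t) * math.exp(-log_s)

def numerical_log_det(x1, x2, eps=1e-6):
    """Finite-difference check of ln|det J| for the 2x2 Jacobian."""
    a1, a2, _ = coupling_forward(x1 + eps, x2)
    b1, b2, _ = coupling_forward(x1 - eps, x2)
    c1, c2, _ = coupling_forward(x1, x2 + eps)
    d1, d2, _ = coupling_forward(x1, x2 - eps)
    j11 = (a1 - b1) / (2 * eps)
    j21 = (a2 - b2) / (2 * eps)
    j12 = (c1 - d1) / (2 * eps)
    j22 = (c2 - d2) / (2 * eps)
    return math.log(abs(j11 * j22 - j12 * j21))

x1, x2 = 0.7, -1.3
z1, z2, analytic_log_det = coupling_forward(x1, x2)
x1_back, x2_back = coupling_inverse(z1, z2)
print("analytic ln|det J| :", analytic_log_det)
print("numerical ln|det J|:", numerical_log_det(x1, x2))
print("round trip          :", (x1_back, x2_back))
```

Stacking several such layers, with a linear mixing of the variables in between so that every variable is eventually transformed, is the general pattern behind the eight-unit flow described here.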
By only changing a subset of the variables each time, we can ensure that the $f'_\psi$ matrix is triangular, which allows for a more straightforward calculation of the Jacobian (the determinant of $f'_\psi$). When many transformation units (eight in total, in our study) are applied together, and each unit transforms a subset of variables in a tractable way, we can gradually change the complex multivariate random variables into a simpler Gaussian distribution (see Fig. 3).

Since the idea of normalizing flow is rather new, well-tested public codes are limited. Through our own extensive exploration, we found that most public packages do not seem to perform as well as hand-crafted codes. We therefore use our own codes adapted from the GitHub repository karpathy/pytorch-normalizing-flows. A similar code was also used in Green & Ting (2020) and is made public on GitHub (https://github.com/tingyuansen/deep-potential).

Figure 3. Structure of the normalizing flow used in Fig. 2 and this study. Normalizing flows adapt neural networks as a change of variables. The goal is to transform the distribution from a complicated distribution (top left) to a simple base distribution (bottom right) through a series of gradual transformations. Here we assume a unit multivariate Gaussian as the base distribution. This study adopts a normalizing flow that comprises eight units of Conv1x1 and Neural Spline Flow coupling. The Conv1x1, as illustrated in the even columns, performs linear transformations on the variables, while Neural Spline Flow, as shown in the odd columns, transforms the variables through spline interpolations parametrized by neural networks. The odd rows demonstrate the transformations, and the even rows show the results of these gradual transformations.

To demonstrate the power of normalizing flow, in Fig. 2, we present a case study with a simple double moon-shaped distribution. We chose this toy example as it loosely resembles the elemental abundance space distribution of the high-α population versus the low-α population. The left panel shows the ensemble of realizations drawn from the moon-shaped distribution. The two middle panels illustrate the cases where we attempt to describe the distribution with Gaussian Mixture Models, with four and ten components, respectively. Even in this simple example of 2D variables, Gaussian Mixture Models fail to represent the actual distribution faithfully. The right panel shows the results fitted with a normalizing flow, which clearly outperforms the Gaussian Mixture Models in depicting the double moon-shaped distribution.

We further demonstrate how normalizing flow works in Fig. 3. From left to right, and from top to bottom, Fig. 3 shows how normalizing flow gradually transforms the complicated double moon-shaped distribution to the simple base distribution (a 2D Gaussian) through a series of operations. The alternate panels show the Neural Spline Flow operation and the Conv1x1 operation. For each step, the top panel shows the transformation, and the bottom shows the result of the transformation.

Depicting a conditional distribution $p(\vec{x}\,|\,\vec{y})$ with normalizing flow only requires a minor modification of what we have discussed thus far. Instead of the usual change of variable, characterized by a neural network $f_\psi$, it suffices to ensure that the neural network coefficients also depend (continuously) on the conditioning variable $\vec{y}$. In other words, instead of $f_\psi$, we have $f_{\psi(\vec{y})}$.
Operatively, for any function $h(\vec{x})$ in a Neural Spline Flow, where $h: \mathbb{R}^k \to \mathbb{R}^k$ is a function that transforms some part of the random variables with $k$ dimensions, to build in the dependency on the $m$-dimensional variable $\vec{y}$ we consider $h(\vec{x}, \vec{y}) = h_1(\vec{x})\, h_2(\vec{y})$, where $h_1: \mathbb{R}^k \to \mathbb{R}^k$ is the mapping that does the original transformation, and $h_2: \mathbb{R}^m \to \mathbb{R}^k$ is an additional function that governs how the mapping $h_1$ should be modified with different $\vec{y}$. Both $h_1$ and $h_2$ are represented by a densely connected network with three hidden layers and 16 neurons each. To put it simply, for different conditioning variables $\vec{y}$, the neural network depicts a different change of variable $f_{\psi(\vec{y})}$ that maps the variables $\vec{x}$ (with the same $\vec{y}$) to variables $\vec{z}$ that form a unit-Gaussian distribution. In principle we could compute a conditional PDF by numerically integrating the joint PDF, but training a new normalizing flow by this modified technique is a more efficient way to attain an accurate conditional PDF and to evaluate the conditional likelihood.

In this study, we adopt the conditional normalizing flow to model the distribution of the APOGEE elemental abundances. As discussed earlier, the conditioning variables $\vec{y}$ include $T_{\rm eff}$ and $\log g$ of the stars, as well as a subset of elemental abundances (such as Fe and Mg). The independent variable $\vec{x}$ comprises the other elemental abundances that we do not condition on. When we condition on a new set of variables, we retrain the normalizing flow each time. Recall that normalizing flow allows us to sample $p(\vec{x}\,|\,\vec{y})$ through the inverse mapping $f^{-1}_{\psi(\vec{y})}: (\vec{z}, \vec{y}) \to \vec{x}$. Therefore, for any specific reference value of the conditioning variable $\vec{y}$, we can evaluate the correlation of the independent elemental abundances $\vec{x}$ by drawing samples from $p(\vec{x}\,|\,\vec{y})$. As we will see in the next section, the ability to draw samples and evaluate the correlation matrix at any values of the conditioning variables will come in handy for several aspects of this study.
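The sampling step described above can be sketched with a toy conditional flow. Everything in this example is an assumption made for illustration: `conditional_inverse_flow` is a hypothetical hand-written inverse map for two residual abundances, with coefficients that depend on a single scalar conditioning variable `y`, whereas the flow in this study is a trained neural network with many more dimensions.

```python
import math
import random

def conditional_inverse_flow(z, y):
    """Hypothetical inverse map (z, y) -> x for two residual abundances.

    The mean and the mixing coefficients depend on y, which is what makes
    the flow conditional.
    """
    a, c = 0.02, 0.02        # intrinsic scatter of each abundance (dex)
    b = 0.01 * y             # coupling between the two abundances grows with y
    x1 = 0.10 * y + a * z[0]
    x2 = -0.05 * y + b * z[0] + c * z[1]
    return x1, x2

def sample_conditional(y, n, rng):
    """Draw z from the unit-Gaussian base and push it through the inverse map."""
    draws = [conditional_inverse_flow((rng.gauss(0, 1), rng.gauss(0, 1)), y)
             for _ in range(n)]
    return [d[0] for d in draws], [d[1] for d in draws]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

rng = random.Random(42)
rho_at_0 = pearson(*sample_conditional(0.0, 20000, rng))
rho_at_1 = pearson(*sample_conditional(1.0, 20000, rng))

# Analytic value for this toy map: rho = b / sqrt(b^2 + c^2)
rho_expected = 0.01 / math.sqrt(0.01 ** 2 + 0.02 ** 2)
print("rho at y=0:", rho_at_0)
print("rho at y=1:", rho_at_1, "expected:", rho_expected)
```

The design point is that the correlation is evaluated at a chosen reference value of the conditioning variable, without having to bin the data around that value.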

4. How many elements matter?

We select Milky Way disk stars from APOGEE DR16 with the following selection criteria:
• Galactocentric radius ≤ R ≤ ,
• Midplane distance |Z| ≤ ,
• ≤ log g < . ,
• ≤ $T_{\rm eff}$ < ,
• − . ≤ [Mg/H] < .
• SNR > for [Mg/H] ≥ − . ; SNR > for [Mg/H] < − . .

Training a normalizing flow with ∼ $T_{\rm eff}$, and log g), any improvement from the GPU acceleration appears to be minimal.

We eliminate stars with STAR_BAD, EXTRATARG, or NO_ASPCAP_RESULT flags set, or with flagged values of [Fe/H] or [Mg/Fe]. Our geometric cuts are based on distances from AstroNN (Leung & Bovy 2019a,b), which are publicly available as a value-added catalog for APOGEE DR16.

The log g range selects luminous giants, allowing us to sample the full range of Galactocentric radii. The lower $T_{\rm eff}$ cut eliminates cool stars for which ASPCAP abundances may be less reliable. Further, many stars with < K in the parent sample have flagged values of Mn and Cu. The upper $T_{\rm eff}$ cut eliminates red clump stars (see fig. 3 of Vincenzo et al. 2021), which should have reliable abundances but could be offset from giant branch stars. More generally, our use of restricted log g and $T_{\rm eff}$ ranges is intended to reduce the impact of differential systematic uncertainties (see §4.2); we further mitigate the differential systematics by conditioning on $T_{\rm eff}$ and log g when training the normalizing flows. Our high SNR threshold is intended to select stars with the most reliable abundances. We relax this threshold at low [Mg/H] to maintain an adequate sample size in this range when studying the lower metallicity populations. However, since our study mostly focuses on the Solar metallicity stellar populations, this choice is not critical.

Our criteria are similar to those used by Weinberg et al. (2019), but here we use DR16 instead of DR14. Also, they used a somewhat narrower log g range ( log g = 1 − ) and had no separate $T_{\rm eff}$ cut. The number of stars passing our cuts is , . The 15 elements that we use in this study are the α-elements Mg, O, Si, S, Ca, the light odd-Z elements Na, Al, and K, the iron-peak elements V, Cr, Mn, Fe, Co, Ni, and the “iron-cliff” element Cu. (We adopt stellar parameters and elemental abundances from the “named tags” attributes of the DR16 catalog, i.e., fits.TEFF, fits.LOGG, fits.MG_FE. This further leads to a more restricted and cleaner sample, as some combinations of $T_{\rm eff}$ and metallicity are automatically excluded; priv. comm. H. Jönsson.) Although APOGEE measures C and N, we do not use them here because their atmospheric abundances in giant branch stars are affected by internal mixing and do not reflect the stars' birth abundances. We define [X/H] abundances from the reported APOGEE measurements as [X/H] = [X/Fe] + [Fe/H]. Some stars that pass our global flag cuts have flagged values of [X/Fe] for individual elements. To keep our analysis straightforward, we further restrict the sample to stars with unflagged values for all 15 abundances and − . < [X/H] < . for all elements, reducing it to , stars.
Figure 4. Abundance distribution in [X/Fe] versus [Fe/H] of the APOGEE training set. We adopt the training set to measure the conditional PDF and pairwise correlations of the 15 elemental abundances. The training set comprises , APOGEE disk stars with stellar parameters < $T_{\rm eff}$ < and < log g < . . We restrict the parameter range of our training set to minimize differential uncertainties in the abundances, and we also restrict the sample to SNR > for [Mg/H] ≥ − . ; SNR > for [Mg/H] < − . to ensure that the abundance measurements are robust.

In Fig. 4, we illustrate the abundance distribution of our APOGEE training set in the [X/Fe] and [Fe/H] plane. All elemental abundances show a well-defined locus. There is larger dispersion for Na and V, and to some extent K and Cu, but this larger dispersion is not surprising, as these elements only have weak or singular features in the APOGEE H-band spectra. The statistical uncertainties reported for these elements are also larger than for other elements.

To demonstrate the ability of the normalizing flow to emulate this distribution, we first train a normalizing flow with this training set and fit for the joint distribution of all 15 elemental abundances, p([X/H]). We then sample from this 15D joint distribution. The density contours in Fig. 5 show the sample drawn from the fitted normalizing flow. Comparing Fig. 4 and Fig. 5 showcases the remarkable ability of the normalizing flow to represent the APOGEE abundance distribution. From here onward, unless stated otherwise, we focus on the conditional distribution, i.e., the joint distribution of some elements conditioned on the values of $T_{\rm eff}$, log g, and two or more elements.

We first describe the abundance distribution of 13 elements, training a normalizing flow to describe p([X/H] | [Fe/H], [Mg/Fe], $T_{\rm eff}$, log g).
We include $T_{\rm eff}$ and log g as conditioning variables because a star's elemental spectral features depend on these atmospheric parameters as well as on the abundances themselves. Because the spectral models are imperfect, this dependence often translates into different measurement systematics for different stars. Conditioning on them allows us to study the abundances differentially, pushing the measurement uncertainties toward those due only to photon noise. (Conditioning on [Fe/H] and [Mg/Fe] is equivalent to conditioning on [Fe/H] and [Mg/H], since the value of [Mg/Fe] at fixed [Fe/H] and [Mg/H] is just [Mg/Fe] = [Mg/H] − [Fe/H].) Our normalizing flow models also allow us to choose different reference points in $T_{\rm eff}$ and log g to evaluate the dispersions and the correlation matrices. Comparing results at different reference points allows us to test whether they are affected by systematic uncertainties within the range of our sample. Small differences could arise in principle because stars of different log g and $T_{\rm eff}$ have different luminosities, can have different SNR (hence photon noise), and sample the disk differently. However, the fact that median abundance trends are nearly independent of location within the disk or bulge (Weinberg et al. 2019; Griffith et al. 2020) suggests that any genuine trends with disk sampling would be small.

Figure 5. Distributions of [X/Fe] vs. [Fe/H] from a sample of “stars” drawn from the joint PDF constructed by applying normalizing flow to the APOGEE training set shown in Fig. 4. Comparison with Fig. 4 demonstrates that the normalizing flow describes the APOGEE abundance PDF faithfully. The distributions in this figure are smoother because of the much larger number of drawn samples.

As a baseline model, we also condition on Fe and Mg, which serve as representative elements for two critical enrichment processes, core-collapse supernovae and Type Ia supernovae. These two elements provide informative diagnostics for the contribution of these two processes to a star's abundances because (a) they are well measured by APOGEE, (b) Mg is expected to come almost exclusively from core-collapse supernovae, and (c) Fe has a large additional contribution from SNIa. By conditioning on these two elements we remove two dimensions that are known to be important in the Milky Way abundances, allowing us to study the residuals in finer detail.

Before studying the residual correlations, we first examine the diagonal entries of the covariance matrix.
Recall that normalizing flows allow us to draw samples from the conditional distribution p([X/H] | $T_{\rm eff}$, log g, [Fe/H], [Mg/Fe]), with which we can evaluate the dispersion by drawing samples ( in our case) from the conditional distribution. In Fig. 6, we show the dispersion of the conditional PDF, conditioning only on these two elements. The figure illustrates, given a star's [Fe/H], [Mg/Fe], $T_{\rm eff}$, and log g measurements in APOGEE, how well the conditional mean abundance predicts the other elemental abundances. The blue, orange, and green lines show the results for different reference values of the conditioning variables. We estimate the finite sampling uncertainty in the dispersion by constructing 60 bootstrap realizations of our , APOGEE stars and repeating our entire procedure, training the normalizing flow for the conditional PDF on each bootstrap realization of the data. Unless stated otherwise, all results in this study adopt 60 bootstrap realizations to calculate the finite sampling uncertainty. Furthermore, since the normalizing flow training itself can be noisy, we train 60 normalizing flows without bootstrapping and take the median of the covariances of these realizations as our best estimates.

Figure 6. Dispersion of APOGEE elemental abundances about the conditional mean predicted at fixed stellar parameters, [Fe/H], and [Mg/Fe], computed from the conditional PDF of our normalizing flow model trained on the APOGEE disk star sample. We define the dispersion $\sigma_{[X/{\rm H}]}$ to be half the difference between the 16th- and 84th-percentile abundance values in the marginal PDF for each element. In the left panel, blue, green, and orange lines show the dispersion at three different reference points of $T_{\rm eff}$ and log g as labeled, all for [Fe/H] = [Mg/Fe] = 0, and bands indicate the finite sampling uncertainty inferred from bootstrap realizations. The dashed line shows the mean photon noise uncertainty reported by ASPCAP for each element. In the right panel, green and orange lines show the dispersion at two other reference points of [Fe/H] and [Mg/Fe] as labeled, all for $T_{\rm eff}$ = 4500 K, log g = 2 . .

On the left, we illustrate the dispersion for stars with different $T_{\rm eff}$ and log g, assuming Solar metallicity (by which we mean both [Fe/H] = 0 and [Mg/Fe] = 0). We evaluate the dispersion of a given element about the conditional mean, denoted $\sigma_{[X/{\rm H}]}$, as half the difference between the 16th- and 84th-percentile values in the marginal PDF. Elements are listed in increasing order of this dispersion (blue line) for the $T_{\rm eff}$ = 4500 K, log g = 2 . conditional PDF. The dashed line shows the mean value of the reported ASPCAP [X/Fe] uncertainty for all stars in our sample. The total dispersion is a nearly monotonic function of this estimated photon noise, but it is consistently higher, implying, if the ASPCAP noise estimates are accurate, that there is residual intrinsic dispersion in the abundances. If we estimate this intrinsic dispersion as the quadrature difference between the total dispersion and the photon noise, we find values of 0.01-0.02 dex for most elements (0.007 dex for O and Co). The inferred intrinsic dispersion is larger (0.035-0.05 dex) for K, V, and Na. While these elements could truly have larger intrinsic dispersion, they are also three of the elements that are most difficult to measure with APOGEE spectra, so we suspect that this difference is a consequence of observational dispersion in excess of the estimated noise.

If we define $\sigma_{[X/{\rm H}]}$ as the rms deviation about the conditional mean instead of using the difference of percentile values, we get dispersions (not shown) that are slightly higher (5-10%) for the best measured elements on the left side of the plot, but 25-70% higher for the elements with the largest dispersion (Cr, Cu, K, V, Na). The larger rms values for these elements are driven by outliers on the tails of the PDF. These outlier values could be real and might be astrophysically interesting, but we suspect that they are primarily non-Gaussian observational errors because they occur for the abundances that are most difficult to measure in the first place. If we used the rms deviation to infer the intrinsic dispersion, we would get larger values (0.03-0.08 dex) for these elements.

Green and orange lines show the dispersion at two other choices of $T_{\rm eff}$ and log g, corresponding to successively cooler and more luminous stars. The residual dispersion is similar to that found for our fiducial $T_{\rm eff}$ and log g point, demonstrating the robustness of our results, irrespective of the chosen stellar parameters. There are minor differences, but those could be due to different photon noise at different reference points.
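The two estimators used in this discussion, the percentile-based dispersion and the quadrature difference, can be written down in a few lines. The sketch below uses mock numbers chosen only for illustration (a 0.02 dex Gaussian core, 1% outliers, and a 0.012 dex noise level); they are assumptions and not measurements from this study.

```python
import math
import random
import statistics

def percentile_dispersion(values):
    """Half the difference between the 84th and 16th percentiles."""
    q = statistics.quantiles(values, n=100, method="inclusive")
    return 0.5 * (q[83] - q[15])   # q[i - 1] is the i-th percentile

def rms_dispersion(values):
    """Root-mean-square deviation about the sample mean."""
    return statistics.pstdev(values)

def intrinsic_dispersion(total, noise):
    """Quadrature difference between total dispersion and measurement noise."""
    return math.sqrt(max(total ** 2 - noise ** 2, 0.0))

rng = random.Random(7)
# Mock residual abundances: a Gaussian core plus a small fraction of outliers.
core = [rng.gauss(0.0, 0.02) for _ in range(19800)]
outliers = [rng.gauss(0.0, 0.2) for _ in range(200)]
residuals = core + outliers

sigma_pct = percentile_dispersion(residuals)
sigma_rms = rms_dispersion(residuals)
print("percentile dispersion:", sigma_pct)   # tracks the core
print("rms dispersion       :", sigma_rms)   # inflated by the tails
print("intrinsic (0.02 total, 0.012 noise):", intrinsic_dispersion(0.02, 0.012))
```

The percentile estimator responds to the core of the distribution, which is why it is the more natural quantity to compare with a reported photon-noise uncertainty.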
We find that residual dispersions at fixed Fe and Mg are only slightly larger even if we do not condition on $T_{\rm eff}$ and log g, which indicates that our parameter range is already narrow enough to limit the contribution of differential systematics. We nonetheless retain $T_{\rm eff}$, log g conditioning for our default analysis, since (as argued in §2) these parameters could be correlated with abundances, which complicates the interpretation of the correlations, even if conditioning on them makes only a tiny difference to the residual dispersion.

Figure 7. Dispersion of abundances about the conditional mean after conditioning on two, three, or four elements (in addition to $T_{\rm eff}$ = 4500 K and log g = 2 . ). Left and right panels show two different conditional locations in metallicity and α-enhancement, as labeled. Lower panels show the change in dispersion in units of the finite sampling uncertainty inferred from the standard deviation of bootstrap samples of the data set. The reduction of dispersion from adding conditional elements beyond Fe and Mg is statistically insignificant in this data sample, but the reductions in residual correlations are much more detectable, as shown below in Fig. 10 (see also theoretical arguments in §2 for better intuition).

The right panel shows the residual dispersion for different metallicity and α-enhancement. We investigate three different reference points, representing the Solar metallicity population and the low-α and high-α branches at low metallicity ([Fe/H] = − . with [α/Fe] = 0 . , 0.25). The current APOGEE disk star sample has too few low metallicity stars to reliably investigate the abundance PDF below [Fe/H] = − . . Fig. 6 demonstrates that the dispersion about the conditional mean is qualitatively similar for these different populations.
Some elements show larger dispersion at low metallicity, but these are mostly elements with weak spectral features in the APOGEE H-band, so the larger dispersion could be a consequence of larger observational uncertainties at low metallicity.

Strictly speaking, our use of the percentile range rather than rms deviation to define $\sigma_{[X/{\rm H}]}$ means that the $\sigma_{[X/{\rm H}]}$ are technically not the diagonal elements of the abundance covariance matrix. However, culling the outliers with the percentile range probably constitutes a better comparison with the reported photon noise uncertainty from ASPCAP. We will ignore this terminological distinction below and use the terms diagonal covariance and dispersion to refer to the dispersion estimated by this percentile method, which responds to the core of the distribution rather than the tails.

Fig. 7 shows the residual dispersion of abundances after conditioning on two elements (the baseline case discussed previously), three elements (Fe, Mg, and O), or four elements (Fe, Mg, O, and Ni). We adopt the reference point $T_{\rm eff}$ = 4500 K and log g = 2 . throughout. The left panel shows Solar metallicity stars and the right panel shows lower metallicity, α-enhanced stars. Uncertainties in the residual dispersions are estimated from bootstrap resampling as before.

No reduction in dispersion is detectable at a statistically significant level. This result is unsurprising in light of our discussion in §2.1. Even if an element has correlations with other elements as strong as ρ = 0 . , conditioning on that element only reduces the dispersion by ∼ on average, which requires an effective sample of ∼ for a 2σ detection, within the finite sampling fluctuations. While the high quality APOGEE sample adopted here has , stars, the effective sample at a reference [Fe/H], [Mg/Fe], and stellar parameters is ≲ (see the discussion on the effective sample size in §4.4.2). Stacking the signals at multiple reference points only moderately reduces the sampling noise, unless one takes such a large range of [Fe/H] and [Mg/Fe] that the results become more difficult to interpret. Using a large sample of lower SNR spectra would not improve the signal either; in this case, the correlations become weaker due to the larger observational dispersions (Eq. 17), which in turn would require an even larger effective sample (Eq. 12) to measure the reduction of dispersion at high significance. Thus, from Fig. 7 alone, we might erroneously conclude that Fe and Mg contain all of the information concealed in APOGEE abundances.

As we will show in the following section, the elemental abundance space has many more hidden dimensions that manifest themselves through the correlations, but these dimensions are simply not visible in dispersion with the current limited sample size of APOGEE.
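The weak sensitivity of dispersion to correlation can be made concrete with the standard bivariate-Gaussian result: conditioning on a variable with correlation coefficient ρ shrinks the dispersion of the other variable by a factor $\sqrt{1 - \rho^2}$. The sketch below assumes Gaussian residuals, and the ρ values in the loop are illustrative choices rather than the values quoted in the text.

```python
import math

def conditional_dispersion(sigma, rho):
    """Dispersion of one Gaussian variable after conditioning on another
    variable with which it has correlation coefficient rho."""
    return sigma * math.sqrt(1.0 - rho ** 2)

def fractional_reduction(rho):
    """Fractional decrease in dispersion caused by the conditioning."""
    return 1.0 - math.sqrt(1.0 - rho ** 2)

# Illustrative correlation strengths (assumed, not taken from the paper).
for rho in (0.1, 0.3, 0.6):
    print(f"rho = {rho:.1f}: dispersion drops by "
          f"{100.0 * fractional_reduction(rho):.2f}%")
```

Because the reduction scales as ρ²/2 for small ρ, a correlation that is easy to detect directly produces only a percent-level change in dispersion.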
With larger samples in the near future (e.g., with SDSS-V, 4MOST, WEAVE), measuring other independent dimensions through the reduction in dispersion should become possible, though it remains highly inefficient compared to measuring correlations directly. Such results would validate the theoretical arguments laid out in §2. Finally, although not shown, we also tested that conditioning on any other combination of elements instead of O and Ni does not change the results.

Besides dispersion, the sample drawn from the conditional distribution p([X/H] | $T_{\rm eff}$, log g, [Fe/H], [Mg/Fe]) also allows us to estimate the off-diagonal entries of the covariance matrix, and hence the correlation among the elemental abundances. The correlation matrix of the APOGEE data (assuming Solar metallicity, $T_{\rm eff}$ = 4500 K, and log g = 2 . ) is shown in the left panel of Fig. 8. The figure shows that even after removing the mean abundance trends predicted by Fe and Mg, a non-trivial correlation between elements remains, implying higher dimensionality of the abundance distribution that would be missed if we considered only the residual dispersion. The right panel of Fig. 8 shows the trivial correlation matrix that we would obtain if other elemental abundances were perfectly determined by the observed Fe and Mg, leading to an identity correlation matrix. Comparing the two panels of Fig. 8 makes the obvious point that inferring elemental abundances from Fe and Mg (right) is not the same as measuring them (left), even if they are statistically indistinguishable in terms of their dispersions (Fig. 7). In Appendix A, we further demonstrate that such correlation structure also shows up across different choices of $T_{\rm eff}$ and log g, and is largely independent of the choice of stellar parameters.

However, a critical question remains: do the measured correlations reflect astrophysics, or could they be artificially induced by observational uncertainties? Three potential uncertainties could generate artificial correlations among elemental abundances; we will estimate each of these in turn and show that they are too small to explain the APOGEE signal. Fig. 9 summarizes this comparison.

Statistical covariances from the ASPCAP measurements

The first potential source of uncertainty is the correlated measurement uncertainty from ASPCAP. For individual element abundances, ASPCAP reports statistical uncertainties but it does not report the covariance of these uncertainties. Therefore, in the following, we provide our own estimate of the ASPCAP correlation.

If we condition on $T_{\rm eff}$, log g, [Fe/H], and [Mg/H], we expect to remove most differential systematic uncertainties as a source of dispersion or artificial correlations, and measurement uncertainties should approach the photon noise limit, as borne out in Fig. 6. In this case, probing the ASPCAP measurement covariance reduces to the question of understanding the Fisher matrix. For simplicity, we make the approximation that all pixels in the APOGEE spectrum have the same noise, and that the noise in different pixels is uncorrelated. We refer interested readers to Ting et al. (2017a) for details behind the calculations presented here. With these assumptions, it can be shown that the statistical covariance due to the photon noise (or what is known as the Cramér-Rao bound) is proportional to $(G \cdot G^T)^{-1}$, where $G$ is an $N_{\rm labels} \times N_{\rm pixel}$ matrix that collects all the gradient spectra. Each row in $G$ measures how an APOGEE spectrum would vary as we vary individual stellar labels (stellar parameters and elemental abundances).
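The Cramér-Rao argument can be sketched for the smallest non-trivial case of two labels. The gradient spectra below are made-up four-pixel vectors, not APOGEE gradients, and the function handles only two labels so that the matrix inverse can be written by hand; the full calculation would invert the complete $G \cdot G^T$ matrix.

```python
import math

def dot(u, v):
    """Inner product of two gradient spectra over pixels."""
    return sum(a * b for a, b in zip(u, v))

def cramer_rao_2label(g1, g2, pixel_noise):
    """Covariance bound sigma^2 (G G^T)^{-1} for two labels.

    g1, g2: gradient spectra (d flux / d label) on the same pixel grid.
    pixel_noise: uniform, uncorrelated per-pixel flux uncertainty.
    """
    f11, f22, f12 = dot(g1, g1), dot(g2, g2), dot(g1, g2)
    det = f11 * f22 - f12 * f12
    scale = pixel_noise ** 2 / det
    # Inverse of the symmetric 2x2 matrix [[f11, f12], [f12, f22]]
    return [[scale * f22, -scale * f12],
            [-scale * f12, scale * f11]]

def correlation(cov):
    """Correlation coefficient implied by a 2x2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

# Two elements measured in disjoint spectral windows: no shared pixels.
rho_disjoint = correlation(cramer_rao_2label([1, 2, 0, 0], [0, 0, 3, 1], 0.01))
# Two elements whose features overlap (blended) in two pixels.
rho_blended_high_snr = correlation(cramer_rao_2label([1, 2, 1, 0], [0, 1, 3, 1], 0.01))
rho_blended_low_snr = correlation(cramer_rao_2label([1, 2, 1, 0], [0, 1, 3, 1], 0.05))

print("disjoint windows:", rho_disjoint)
print("blended, high SNR:", rho_blended_high_snr)
print("blended, low SNR :", rho_blended_low_snr)
```

Zeroing the gradient outside each element's window, as done for the ASPCAP windows, drives the off-diagonal terms of $G \cdot G^T$ toward zero, and the correlation coefficient does not change with the noise level even though the covariance does.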

Figure 8. (Left) Correlation matrix of 13 elemental abundances evaluated from the normalizing flow PDF conditioned on $T_{\rm eff}$ = 4500 K, log g = 2 . , and Solar [Fe/H] and [Mg/Fe]. Symbol areas are proportional to the magnitude of correlations, with blue filled circles for positively correlated element pairs and red open circles for negatively correlated pairs. Diagonal entries of the correlation matrix have a value of 1.0 by definition. Even after removing the mean trends tracked by Fe and Mg, the APOGEE elemental abundance space has many “hidden” dimensions that only manifest themselves through statistical correlations of abundances. (Right) The trivial correlation matrix expected if we infer the 13 elemental abundances from the observed Fe and Mg abundances. Inferring abundances is not the same as measuring the abundances; only the latter reveals the information concealed in the subtle correlations, which we can now measure at high statistical significance from the extensive APOGEE data.

To evaluate the gradient spectra, we adopt the Kurucz spectral models (Kurucz 1993, 2005, 2013) through the ATLAS/SYNTHE synthesizer. Since ASPCAP measures abundances with spectral windows, we adopt the spectral windows from García Pérez et al. (2016) as well as additional spectral windows in DR16 for Cr, Co, and Cu (priv. comm. J. Holtzman) and zero out the gradient for any pixels that are not in the ASPCAP window for the corresponding element. When calculating the statistical covariance matrix, besides the elemental abundances in this study, we also include gradient spectra from $T_{\rm eff}$, log g, $v_{\rm micro}$, [C/H], and [N/H]. These are stellar labels that ASPCAP also derives, and their measurement uncertainties could indirectly create artificial correlations among the elemental abundance uncertainties. (If we were also to include $v_{\rm macro}$ in the fit, it would increase the median artificial correlation to ρ ≃ . , instead of ρ ≃ . , but the effect would still be negligible for this study.)

For ease of comparison, the top left panel of Fig. 9 repeats the measured APOGEE correlations shown previously in Fig. 8. The top right panel shows the expected correlations from photon noise uncertainties, adopting the same reference point of $T_{\rm eff}$ = 4500 K, log g = 2 . , and Solar metallicity. The covariance of photon noise abundance uncertainties for an individual star would be the product of these correlations with the individual element dispersions, and it would scale with the SNR of the spectrum. However, the correlation coefficients themselves are independent of the SNR. The figure shows that the correlations among elemental abundances expected from photon noise are minimal, with typical pairwise values ρ ≃ . , much weaker than the empirical signals; the APOGEE correlation signals are of the order of ρ = 0 . − . . Our results echo those in figure 17 of Ting et al. (2017a), who studied the correlations of abundance measurements at various resolutions and found that for the APOGEE resolution and wavelength coverage, most abundance measurements are uncorrelated even when blended features are included.
Since ASPCAP chose only to measure individual abundances through spectral windows without blended features, correlations between abundance measurements are even further reduced.

Figure 9. Comparison of the measured APOGEE correlations conditioned on Mg and Fe (the upper left panel), repeated from Fig. 8, to three sources of observational uncertainty discussed in §4.4.1-4.4.3. The upper right panel shows the correlations expected from correlated uncertainties in the ASPCAP measurements, which are not perfectly diagonal because values of some stellar parameters and blended features affect the inference of multiple elements. Off-diagonal entries in the bottom left panel show the magnitude of random uncertainties in correlation coefficients expected from finite sampling, estimated by bootstrap resampling of our APOGEE data set. Signs of these coefficients have been randomly chosen to emphasize that sampling uncertainty can be positive or negative. The bottom right panel shows the correlations expected from “measurement aberration” induced by conditioning on a star’s measured values of [Fe/H] and [Mg/H] instead of the unknown true values. None of these sources of uncertainty is large enough to explain the correlation signals measured in APOGEE, indicating that the detection of the residual correlation structure in the data is statistically significant and astrophysically relevant.

For completeness, we note that there are a few approximations that we have made for this calculation. For example, we adopt the Kurucz models instead of the MARCS/TURBOSPECTRUM models adopted in ASPCAP, as we do not have easy access to the latter. The difference in atomic data is likely to modify the derived element values slightly but have minimal influence on the correlations due to the photon noise. Similarly, we expect that the assumption of uncorrelated pixels and homogeneous pixel noise might change the absolute scale of the covariance, but not the correlation by much. As we discuss in the following, the other two sources of correlated uncertainties are more important, and any artificial correlations due to the ASPCAP measurements can be neglected for our purposes. Another caveat is that empirical uncertainties from repeat spectra exceed those from χ² fitting (see §2.3 and Jönsson et al. 2020), which implies that some variation in observational conditions (e.g., small changes in the spectral line spread function) contributes to statistical measurement uncertainties in addition to pure photon noise. Here we assume that any additional correlation arising from these additional random errors is minimal.
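The statement that photon-noise correlation coefficients do not depend on SNR can be illustrated with a short numerical sketch. It assumes a linearized χ² fit with uncorrelated, homogeneous pixel noise, so that the label covariance is the inverse of G Gᵀ/σ², where G holds the gradient spectra. The gradient spectra below are random placeholders rather than Kurucz or APOGEE models, and the function names are hypothetical.

```python
import numpy as np

def label_covariance(gradients, pixel_sigma):
    """Covariance of fitted labels for a linearized chi^2 fit.

    gradients: (n_labels, n_pixels) array of d(flux)/d(label).
    pixel_sigma: per-pixel noise, assumed identical and uncorrelated.
    """
    fisher = gradients @ gradients.T / pixel_sigma**2
    return np.linalg.inv(fisher)

def to_correlation(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

rng = np.random.default_rng(0)
n_labels, n_pixels = 4, 500
gradients = rng.normal(size=(n_labels, n_pixels))  # placeholder gradient spectra

# Same gradients, two noise levels (lower sigma means higher SNR).
cov_low_snr = label_covariance(gradients, pixel_sigma=0.02)
cov_high_snr = label_covariance(gradients, pixel_sigma=0.005)

corr_low = to_correlation(cov_low_snr)
corr_high = to_correlation(cov_high_snr)
```

In this sketch the covariance scales with the square of the pixel noise, while `corr_low` and `corr_high` agree to floating-point precision, because the noise level cancels when the covariance is normalized by the dispersions.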

4.4.2. Correlation estimation uncertainty from finite sampling

Another source of uncertainty that could generate artificial correlations is finite sampling. In the ideal scenario where we have infinite realizations drawn from the PDF, we should recover the PDF exactly. However, finite sampling implies that the estimation of the conditional PDF itself, and subsequently the correlations, must be noisy to some extent. As derived in §2.2, the uncertainty of correlation due to finite sampling is ≃ 1/√N_sample. Quantitatively, the standard deviation of the correlation due to sampling uncertainty is 0.1 for a sample size of 100 and 0.01 for a sample size of 10^4.

Although we adopt a training set of , stars in this study, not all stars contribute to any single reference point. Since we study the smooth variation of the conditional distribution and its correlation, it can be challenging to estimate the effective N_sample contributing to a given reference point. To do so, we repeat our entire analysis procedure for 60 bootstrap resamplings of the full , star data set and take the standard deviation of the derived correlation coefficients. The bottom left panel of Fig. 9 shows these sampling uncertainties, assuming the reference point at T_eff = 4500 K, log g = 2 . , and Solar metallicity. The sign (positive or negative) of the fluctuation is randomly assigned to highlight that the sampling uncertainties can perturb the correlation estimates in either direction. The panel shows that the finite sampling uncertainties are typically ρ ≃ . , small compared to many of the non-zero correlations that we measure from APOGEE. As a result, the measured APOGEE correlations cannot be entirely caused by the random fluctuations due to the finite size of the stellar sample.

Recalling that the statistical uncertainty is ≃ 1/√N_sample for large N and weak correlations, we infer that the effective sample size at our chosen fiducial reference point is N_sample ≃ . In principle, we could “stack” the correlation signals at different reference points to increase the effective sample. However, through numerical experiments we found that stacking the signals over different T_eff-log g of our training sample (T_eff = 4100 − K) only reduces the sampling uncertainty slightly (from ρ = 0 . to . ). The effective sample is much smaller than the parent sample because of the conditioning on [Fe/H] and [Mg/Fe], not T_eff and log g. As we will see in §4.5, stellar populations with different metallicities and α-enhancements exhibit subtle differences in the correlations. Therefore, although stacking the signals along [Fe/H] and [Mg/Fe] could in principle reduce the sampling uncertainty, it would come at the cost of interpretability. Moreover, at least for the case of conditioning on two elements, there is a larger source of uncertainty that we will discuss below. This uncertainty cannot be reduced by increasing the effective sample size but rather depends on the abundances’ measurement precision. Therefore, for simplicity and to keep any residual systematic uncertainties under better control, we choose not to stack the results from different reference points.
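The bootstrap logic of this subsection can be sketched on a toy sample. The snippet below is only an illustration under simple assumptions: two uncorrelated Gaussian variables stand in for a pair of residual abundances, a plain Pearson coefficient stands in for the full normalizing-flow analysis, and the function name and sample size are hypothetical. It uses 200 resamplings (the text uses 60) to keep the Monte Carlo noise of the toy estimate small.

```python
import numpy as np

def bootstrap_corr_std(x, y, n_boot=200, seed=0):
    """Standard deviation of the Pearson correlation over bootstrap resamplings."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rhos = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample stars with replacement
        rhos[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return rhos.std(ddof=1)

rng = np.random.default_rng(1)
n_sample = 2500
x = rng.normal(size=n_sample)  # stand-in residual abundance of element 1
y = rng.normal(size=n_sample)  # stand-in residual abundance of element 2 (uncorrelated)

sigma_rho = bootstrap_corr_std(x, y)   # empirical sampling uncertainty
expected = 1.0 / np.sqrt(n_sample)     # weak-correlation expectation, 1/sqrt(N)
n_eff = 1.0 / sigma_rho**2             # invert the relation for an effective N
```

Inverting the measured scatter, as in the last line, mirrors how an effective sample size at a reference point can be inferred from the bootstrap dispersion when the number of stars that actually constrain that point is not known in advance.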

4.4.3. Abundance measurement aberration

The origin of the third source of uncertainties is more subtle, but it is the dominant source of artificial correlation for this study. Recall that, in the baseline model, we condition on Fe and Mg and study the residual covariances. However, even without any astrophysical correlation, the residual covariance will only approach the ASPCAP measurement uncertainty plus sampling uncertainty if we condition on true abundance values of Fe and Mg. When we train the conditional normalizing flow, we can only condition on the measured values from APOGEE, not the true values; this limitation itself can generate some artificial correlations. For example, if we consider a set of elements that are strongly correlated with Fe, then in a star whose measured Fe abundance fluctuates low because of uncertainty, all of those elements will tend to appear high, in a correlated way, relative to the conditional mean. We refer to this effect as “measurement aberration,” by loose analogy to the phenomenon of aberration of starlight. It is an uncertainty that arises because we are “standing in the wrong place,” predicting a star’s conditional mean abundances based on its measured abundances of Fe and Mg instead of their true values.

We estimate this effect through numerical experiments. In particular, we adopt the empirical conditional distribution p([X/H] | T_eff, log g, [Fe/H], [Mg/Fe]) and its corresponding covariance matrix as shown in Fig. 8. We then draw a mock sample that has the same T_eff, log g, [Fe/H], and [Mg/Fe] values as the stars in our APOGEE training set. Instead of drawing [X/H] from the joint distribution, we draw each element independently from its own marginal distribution, generating a test sample that follows the same empirical dispersion as the APOGEE data but without the correlation. The elemental abundance space spanned by the mock data is strictly two-dimensional by construction, as Fe and Mg determine all abundances without any residual correlation. To study the aberration effect, we then add observational uncertainty to [Fe/H] and [Mg/H], assuming the mean ASPCAP reported uncertainties for our sample, ∆[Fe/H] = 0 . dex and ∆[Mg/H] = 0 . dex.
Since we draw the other 13 elemental abundances from their own marginal distributions, their observational dispersions as shown in Fig. 6 are already automatically included. We refit a new conditional normalizing flow and study the correlation of this mock sample. To minimize the sampling uncertainty in this aberration estimate (because we have to draw the conditioning variables from the APOGEE data set, which is finite), we run the experiment 60 times, each time drawing new randomly perturbed values of the conditioning variables from the APOGEE data. For individual correlation coefficients, we take the median of these 60 realizations as our best estimates.

The bottom right panel of Fig. 9 shows the artificial correlations due to this effect. As we will elaborate more later with Fig. 11, this source of artificial correlations is not negligible. Even though the artificial correlations peak at ρ = 0 . , they have a long tail extending to ρ = 0 . − . . Nevertheless, this effect is not sufficient to explain the strongest APOGEE correlations (ρ = 0 . − . ). Furthermore, the correlations induced by measurement aberration can be straightforwardly predicted by the numerical experiment conducted here under the “null hypothesis” that all abundances are determined by Fe and Mg. Deviations from this predicted structure are therefore evidence against the null hypothesis. As we condition on more elements (§4.5), the aberration effect changes and diminishes because the random uncertainty in any one abundance matters less, so we redo the aberration prediction for each new null hypothesis (see Fig. 11 below).

The main uncertainty in predicting the aberration effect is that we rely on the ASPCAP value of the photon noise uncertainty. In Appendix B, we show that generating artificial correlations as strong as ρ = 0 . − . , the largest values we find for the APOGEE data, would require that ASPCAP has underestimated the statistical uncertainties for Fe and Mg by a factor of ∼ , with ∆[Fe/H] = ∆[Mg/H] = 0 . dex. However, in this case the structure of the correlations would be radically different, with all elements positively correlated. This also goes against the fact that the observed total dispersions (including intrinsic dispersions) for some elements with even less spectral information in APOGEE, such as O and Si, are close to 0.01 dex (Fig. 6 and Fig. 7).
This contradiction is itself indirect evidence that ASPCAP is indeed achieving differential metallicity precision at the 0.01 dex level for Mg and Fe (as well as O and Si), consistent with the reported photon noise uncertainties. Reaching such exquisite precision in a large survey with mass production “pipeline abundances” is a remarkable achievement. While absolute or differential systematic uncertainties are a limiting factor for some investigations, the high numerical precision achieved by APOGEE can be harnessed for many applications with proper statistical modeling.

To sum up, through an exhaustive search for false positive signals, we conclude that the observed APOGEE correlations are real and statistically significant. They cannot be explained away by measurement uncertainties.
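The measurement aberration effect can be reproduced in a deliberately simplified mock. The sketch below is not the experiment of this section: it conditions on a single element (Fe) rather than two, uses straight-line regression as a stand-in for the conditional normalizing flow, and adopts assumed values for the spread (0.2 dex), the measurement noise (0.01 dex), and the intrinsic scatter (0.02 dex). All names are hypothetical.

```python
import numpy as np

def residual_corr(cond, x1, x2):
    """Correlation of x1 and x2 residuals about straight-line fits in `cond`."""
    r1 = x1 - np.polyval(np.polyfit(cond, x1, 1), cond)
    r2 = x2 - np.polyval(np.polyfit(cond, x2, 1), cond)
    return np.corrcoef(r1, r2)[0, 1]

rng = np.random.default_rng(42)
n_star = 50_000
fe_true = rng.normal(0.0, 0.2, n_star)             # true [Fe/H], assumed spread
fe_obs = fe_true + rng.normal(0.0, 0.01, n_star)   # measured [Fe/H], assumed noise

# Two mock elements that follow Fe exactly plus independent 0.02 dex scatter,
# so the mock contains no residual correlation by construction.
x1 = fe_true + rng.normal(0.0, 0.02, n_star)
x2 = fe_true + rng.normal(0.0, 0.02, n_star)

rho_true = residual_corr(fe_true, x1, x2)  # conditioning on true Fe
rho_obs = residual_corr(fe_obs, x1, x2)    # conditioning on measured Fe
```

Conditioning on the true Fe values leaves a residual correlation consistent with zero, while conditioning on the measured values leaves a positive correlation of about 0.2 for these assumed numbers, even though the mock contains none. The two elements share the same error in the predicted conditional mean, which is the essence of the aberration effect.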

After demonstrating that the APOGEE residual correlation structure is statistically significant, we turn to the question that we are the most interested in: How many APOGEE elements carry independent information? In other words, starting from the baseline model conditioned on Fe and Mg, which other elements should we condition on to reduce residual correlations to a level consistent with observational uncertainties? Because of the measurement uncertainties and finite sample size, our results will be a lower limit to the number of elements with intrinsically significant information content. Furthermore, observational uncertainty suppresses correlations (Eq. 17), so our ability to detect correlations is reduced for the elements with the largest uncertainties.

Fig. 10 presents an overview of our principal results, which we will elaborate more quantitatively in Fig. 11. Successive panels show the residual correlations after conditioning on two, three, four, five, six, or seven elemental abundances, always at the reference point T_eff = 4500 K, log g = 2 . , [Fe/H] = [X/Fe] = 0. Shaded blocks highlight groups of elements that show strong internal correlations, and each new conditioning element is chosen to target one of these blocks. The strong correlations among O, Si, and S in the top left panel are reduced by conditioning on O. Further conditioning on Ni reduces correlations among iron peak elements that remain after conditioning on Fe, Mg, and O. Conditioning on Si reduces several remaining correlations among α-elements and the light odd-Z elements Na and Al. Significant correlations remain between Ca and Al, which are reduced by conditioning on Ca. Finally, although it is hard to see from Fig. 10, there is a statistically significant (Fig. 11) anti-correlation between S and Al, which is reduced by conditioning on Al.

Fig. 11 shows the statistical significance of these correlations.
In the top panel, the light blue histogram shows the distribution of the magnitudes of the correlation coefficients after conditioning on Fe and Mg, i.e., of the off-diagonal elements of the matrix in the top left of Fig. 10. The dark blue histogram shows the correlation coefficients involving O, which include several of the largest values in the distribution. In the inset panel, the band shows the O correlations element by element, with finite sampling uncertainties computed from the 16-84% ( σ , dark blue) and 5-95% ( σ , light blue) range of the 60 bootstrap resamplings of the data set (§4.4.2). The green dot-dashed line shows our estimate of the correlations from photon noise in the ASPCAP abundance measurements (§4.4.1), which are small enough that we can neglect them relative to other sources of correlation. The black dashed line shows the correlations expected from measurement aberration (bottom right panel of Fig. 9), computed as described in §4.4.3. This line represents the prediction of the “null hypothesis”, computed from 60 realizations in which we add random uncertainty to [Fe/H] and [Mg/H] values in a model that determines all abundances from Fe and Mg by construction (§4.4.3). The O-Si correlation is highly inconsistent with this null hypothesis, and the O-S, O-Ca, and O-Cr correlations are all inconsistent at well over σ . Although not shown, we also tested that even if we were to include the σ uncertainty range from the estimate of the aberration (from the 60 independent numerical experiments), instead of just taking the median prediction for our null hypothesis, the detection signals are still > σ .

Figure 10. Correlation of residual abundances after conditioning on two, three, four, five, six, or seven elements as labeled. Shaded regions show blocks of correlations to guide the eye. Correlations within these blocks are reduced after conditioning on one of their constituent elements. All correlations are evaluated at T_eff = 4500 K, log g = 2 . , and [Fe/H] = [X/Fe] = 0. The strong correlations present after conditioning on Fe and Mg alone (top left) are reduced to a level consistent with observational uncertainties by conditioning on Fe, Mg, O, Ni, Si, Ca, and Al (bottom center). See Fig. 11 for a more quantitative assessment.

Returning to the main panel, the green dot-dashed and red dashed curves show the distribution of correlation coefficients from ASPCAP uncertainties (Fig. 9, top right) and finite sampling (Fig. 9, bottom left), respectively. The black dashed curve shows the distribution of correlations induced by measurement aberration. The solid black curve shows the combined effect of measurement aberration and sampling uncertainty, obtained by adding random draws from the sampling uncertainty distribution for a given coefficient to the median measurement aberration for the same coefficient. Many of the correlations measured from the APOGEE data are well beyond the tail of the distribution expected from measurement aberration and sampling uncertainty alone.

The second row shows the same quantities after conditioning on Fe, Mg, and O. Now the inset panel shows correlation coefficients for Ni, which has several of the largest values. Both Ni-Ca and Ni-Co deviate from the measurement aberration prediction by > σ , and Ni-Al, Ni-Mn, and Ni-Cu are at the σ level.
We emphasize that the sampling uncertainty and measurement aberration must be recomputed each time a new conditioning element is added. The measurement aberration effect gets gradually smaller as more conditioning elements are included because the random uncertainty in any one abundance measurement has less impact and is less likely to generate correlated aberration. The sampling uncertainty distribution changes slightly because the uncertainty for individual coefficients depends on the strength of the correlation. The ASPCAP uncertainty matrix does not need to be recomputed as more conditioning elements are included, but rows or columns including those elements are omitted.

After adding Ni as a conditioning element, the Si-Na, Si-Al, and Si-Co correlations show the most significant deviations from the measurement aberration prediction (third row). After adding Si as well, the most significant deviations are Ca-Al and Ca-Mn (fourth row). In this five-element case, the overall distribution of |ρ| is consistent with the combination of measurement aberration and sampling uncertainty (main panel), but the specific Ca-Al and Ca-Mn correlations are not (inset) because the measurement aberration value for the Ca-Al coefficient is small, and the value for Ca-Mn is opposite in sign from the observed correlation. Finally, in the six-element case, the Al-S and Al-Co correlations remain significant, which we further reduce by conditioning on Al. All of these detections are more significant than σ .

Figure 11. Statistical significance of residual correlations after conditioning on two to seven elements (top to bottom). In each panel light blue histograms show the distribution of |ρ|, the magnitudes of off-diagonal correlations in the corresponding panel of Fig. 10. Green, red, and dashed black curves show the distribution of correlations expected from correlated observational uncertainties, finite sampling uncertainty, and measurement aberration, respectively. Solid black curves show the combined effect of sampling uncertainty and aberration. Inset panels show the correlation coefficients for the indicated element, with σ (dark) and σ (light) ranges estimated from the percentiles of 60 bootstrap resamplings of the APOGEE data set. For the first four panels, band colors are those of the corresponding element blocks in Fig. 10, and these correlations are highlighted in the main panel histogram. Green dot-dashed and black dashed lines in the insets show the correlations predicted from statistical uncertainties and measurement aberration, respectively. Conditioning on seven elements is required to reduce the measured correlations to values that are all individually consistent at σ with the “null hypothesis” of no further residual correlations.

After adding Al as a seventh conditioning element, the largest residual correlations are all consistent with measurement aberration + sampling uncertainty at the σ level (sixth row).
We therefore do not claim convincing evidence of residual correlations beyond seven elements.

There is some judgment involved in deciding the order in which to add conditioning elements. Here we have made these choices based on both the magnitude of the residual correlations and the statistical and systematic uncertainties in the abundance measurements, skipping over some elements for which APOGEE measurements are less robust (e.g., Na). We have checked that alternative orderings lead to the same conclusion about the number of elements required to reduce residual correlations to a level consistent with observational uncertainty, though the choice and order of which seven elements to condition on is not unique. The seven elements that most clearly demonstrate residual correlations are also among those with the smallest ASPCAP measurement uncertainties and the smallest total dispersion (see Fig. 6). We suspect that improving the photon noise uncertainty of the abundance measurements would show that even more elements contain significant independent information.

Finally, stars with different [Fe/H] and [Mg/Fe] sample stellar populations that have experienced different enrichment histories and potentially different degrees of stochasticity in their chemical evolution. Fig. 12 compares the residual correlations for Solar metallicity stars (left) to those for [Fe/H] = − . and [Mg/Fe] = 0 . (middle) or 0.25 (right), always with T_eff = 4500 K and log g = 2 . . The residual correlations for metal-poor stars are comparable in magnitude and similar in pattern to those for Solar metallicity stars, but with some differences. Most noticeably, correlations involving Ca are stronger and consistently positive for the metal-poor stars. For the α-enhanced stars the correlations among the α-elements are somewhat stronger and those among the iron peak elements somewhat weaker. These differences are not surprising given the greater relative contribution of core-collapse supernova enrichment to the high-α population, though we caution that the residual correlations after conditioning on Fe and Mg need not follow the average contribution of individual enrichment processes in a simple way (see §5.1). Importantly, as for Solar metallicity stars, the residual correlations reveal structure in the abundance distributions that would be buried if we were to study only the dispersion (Fig. 7). The bottom panels show that conditioning on seven elements again removes most of the large correlations, though we have not investigated the significance of correlations as exhaustively for these low metallicity populations.
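The idea of adding conditioning elements until residual correlations vanish has a closed-form analogue for a multivariate Gaussian, where the conditional covariance is a Schur complement. The sketch below is only that analogue, not the normalizing-flow analysis, which makes no Gaussian assumption. The five element labels, the three-factor loading matrix, and the 0.01 dex scatter are invented for illustration, and the three conditioning elements are taken to be noise-free so that no aberration enters.

```python
import numpy as np

def conditional_corr(cov, cond_idx):
    """Correlation matrix of the remaining variables after conditioning a
    Gaussian with covariance `cov` on the variables in `cond_idx`."""
    keep = [i for i in range(cov.shape[0]) if i not in cond_idx]
    c_kk = cov[np.ix_(keep, keep)]
    c_kc = cov[np.ix_(keep, cond_idx)]
    c_cc = cov[np.ix_(cond_idx, cond_idx)]
    resid = c_kk - c_kc @ np.linalg.solve(c_cc, c_kc.T)  # Schur complement
    sd = np.sqrt(np.diag(resid))
    return keep, resid / np.outer(sd, sd)

# Hypothetical toy: five elements driven by three latent enrichment factors.
names = ["Fe", "Mg", "O", "Si", "Ca"]
loadings = 0.05 * np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.8],
    [0.6, 0.3, 0.7],
])
scatter = np.diag([0.0, 0.0, 0.0, 0.01**2, 0.01**2])  # independent scatter in Si, Ca
cov = loadings @ loadings.T + scatter

keep2, corr2 = conditional_corr(cov, [0, 1])      # condition on Fe, Mg
keep3, corr3 = conditional_corr(cov, [0, 1, 2])   # condition on Fe, Mg, O
rho_si_ca_2 = corr2[keep2.index(3), keep2.index(4)]
rho_si_ca_3 = corr3[keep3.index(3), keep3.index(4)]
```

With two conditioning elements the third latent factor is unconstrained and the Si-Ca residual correlation stays strong; adding the third element removes it and leaves only the independent scatter. In real data the conditioning abundances are noisy, so the residual correlation approaches the aberration prediction rather than zero.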

5. Discussion

The analysis in §4.5 shows that one must consider at least seven elements (Fe, Mg, O, Si, Ca, Ni, and Al) to remove residual correlations in the conditional PDF of APOGEE abundances. These elements are also among the most precisely measured in APOGEE data, and they display the smallest total dispersion after conditioning on Fe and Mg (all but Al have dispersions < . dex, and Al has a dispersion of 0.027 dex, see Fig. 6). It is likely that most or all elemental abundances would show significant residual correlation structures in data with still higher measurement precision. Crucially, these correlations can only be discovered if abundances are measured individually, not inferred based on the abundance of other elements.

Our correlation measurements are made possible by a new statistical technique, powered by the latest technology in machine learning, to model the high-dimensional and irregular distribution of stars in elemental abundance space. The technique allows us to mitigate systematic uncertainties by conditioning on stellar parameters (T_eff, log g) that affect abundance measurements. Further conditioning on Fe and Mg reduces residual dispersion about the conditional mean to 0.01-0.04 dex for most elements, and it reveals the sub-0.02-dex intrinsic dispersion after subtracting the reported ASPCAP statistical uncertainties in quadrature. Our discussion in §2 explains why detecting hidden dimensions through reduction of residual dispersion is statistically difficult, and our analysis in §4 bears out this expectation in practice.

Instead, these complex but critical signatures of chemical enrichment processes can be revealed by directly measuring cross-element correlations in conditional PDFs.
Conditioning on Fe and Mg alone leaves residual correlations that clearly exceed the levels expected from correlated measurement uncertainties, from statistical fluctuations due to finite sample size, or from the measurement aberration caused by random uncertainties in the conditioning abundances. We need to condition on Fe, Mg, O, Ni, Si, Ca, and Al to reduce the residual correlations for Solar metallicity stars to a level that is arguably consistent with the observational uncertainties.

In this section we discuss the implications of our results for the dimensionality of the stellar distribution in elemental abundance space, for characterization of observational uncertainties in abundance measurements, for chemical tagging, and for design of Galactic spectroscopic surveys.

Figure 12. Correlation matrices for Solar metallicity stars (left) compared to those of [Fe/H] = − . stars with [Mg/Fe] = 0 . (middle) or [Mg/Fe] = 0 . (right). Top and bottom rows show conditioning on two and seven elements, respectively. Residual correlation patterns in the top row are similar but not identical for different populations, and they are reduced to a similar level by seven-element conditioning as shown in the bottom panels.

Our findings help to resolve the tension identified in §1 between studies showing that disk star abundances can be accurately predicted from [Fe/H] and age (Ness et al. 2019) or from [Mg/H] and [Mg/Fe] (Weinberg et al. 2019) and PCA analyses showing that 5-10 principal components are required to explain the diversity of stellar abundance patterns (Ting et al. 2012; Andrews et al. 2012) or APOGEE spectra (Price-Jones & Bovy 2018). Conditioning on Fe and Mg does indeed reduce residual dispersion to a level that only moderately exceeds that expected from photon noise. However, cross-element correlations clearly demonstrate the presence of underlying residual structure in the abundance patterns beyond the star-by-star dispersion.

The question of how many elements are needed to remove residual correlations is closely connected to the more general question of the dimensionality of the stellar distribution in elemental abundance space. If we have measurements of M abundances for every star, then these measurements define an M-dimensional space, but the stars may lie along a one-dimensional curve, a two-dimensional surface, a three-dimensional hypersurface, etc. We expect the number of dimensions to be connected to the number of distinct astrophysical processes that contribute to the elements being considered. However, the connection is indirect.
For example, in a one-zone model with a fully mixed interstellar medium (ISM), stellar abundances depend on a single parameter (time), even if the star formation history is complex and there are many processes contributing to the elements (core-collapse supernovae, Type Ia supernovae, AGB stars, neutron star mergers, etc.). While the relative contribution of these processes changes with time, all stars of the same age have the same integrated contributions.

Adding dimensionality to the abundance distribution thus requires mixing stellar populations that have experienced different enrichment histories. Radial migration of stars is one such mixing mechanism: star formation, accretion, and outflow histories within the Galactic disk change systematically with radius, and the stars present at a given radius today were born in a range of annuli with a variety of chemical evolution tracks (e.g., Schönrich & Binney 2009; Minchev et al. 2013; Frankel et al. 2020). Incomplete mixing of the ISM, in azimuth at a given radius or even within a single star-forming complex, allows stars to be born with a variety of abundance patterns at nearly the same location and time (Krumholz & Ting 2018). Bursts of star formation produce sharp excursions of [α/Fe] ratios and complex evolutionary tracks for elements produced by AGB stars on a timescale separate from that of core-collapse supernovae or Type Ia supernovae (e.g., Johnson & Weinberg 2020). Mergers are another mechanism for mixing stellar populations with different histories, though the thinness of the Galactic disk implies that the fraction of disk stars that originated in a distinct satellite should be small (e.g., Toth & Ostriker 1992; Ting & Rix 2019).

Our analysis here implies a lower limit of seven for the effective dimensionality of the APOGEE disk abundance distribution. We note that this number is clearly a conservative limit.
The elements studied in this paper are expected to come predominantly from core-collapse and Type Ia supernovae (see Andrews et al. 2017; Rybizki et al. 2017; Weinberg et al. 2019). But even within the most “boring” elements of disk stars at Solar metallicity, our study reveals a plethora of abundance structures. An analysis including elements produced by other processes (e.g., from optical surveys like GALAH or Gaia-ESO) will undoubtedly exhibit an even richer structure.

We have not attempted to interpret the correlations measured here. Interpreting residual abundance correlations will ultimately require forward modeling of both the astrophysical sources of different elements and the stochastic effects in Galactic chemical evolution that drive variations in the relative importance of these sources. Models that incorporate these effects have been little explored to date. What is clear is that correlation measurements like those presented here are a powerful diagnostic, presenting enormous new opportunities for understanding the subtle processes that drive the chemical evolution of the Milky Way.
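The argument that a one-zone model is one-dimensional, and that mixing populations adds dimensions, can be illustrated with a toy calculation. The functional forms below (core-collapse yield linear in time, Type Ia yield quadratic in time, two zones that differ only in their Type Ia weight) are invented for illustration and are not a chemical evolution model; all names are hypothetical.

```python
import numpy as np

def toy_track(t, ia_weight):
    """Toy enrichment track: ([Fe/H]-like, [Mg/H]-like) values at time t."""
    cc = t                   # core-collapse contribution (assumed linear in time)
    ia = ia_weight * t**2    # delayed Type Ia contribution (assumed quadratic)
    return np.log10(cc + ia), np.log10(cc)

def local_scatter(fe, mg):
    """Scatter in Mg between stars adjacent in Fe, a proxy for the
    dispersion of Mg at fixed Fe."""
    order = np.argsort(fe)
    return np.std(np.diff(mg[order])) / np.sqrt(2.0)

rng = np.random.default_rng(7)
t = rng.uniform(0.05, 1.0, 4000)  # formation times of the mock stars

# One zone: every abundance is a function of time alone.
fe_one, mg_one = toy_track(t, ia_weight=2.0)

# Mixed sample: about half the stars come from a zone with a different history.
in_zone_b = rng.random(t.size) < 0.5
fe_b, mg_b = toy_track(t, ia_weight=0.5)
fe_mix = np.where(in_zone_b, fe_b, fe_one)
mg_mix = np.where(in_zone_b, mg_b, mg_one)

scatter_one = local_scatter(fe_one, mg_one)
scatter_mix = local_scatter(fe_mix, mg_mix)
```

In the one-zone sample the Mg scatter at fixed Fe is limited only by the spacing between stars, even though two enrichment processes contribute. In the mixed sample stars with the same Fe come from different tracks, so a second coordinate is needed to predict Mg.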

In §2.3 we emphasized the distinction between “absolute” abundance uncertainties that are the same for all stars in a sample, differential systematic uncertainties that depend on stellar parameters such as T_eff and log g that vary across the sample, and statistical abundance uncertainties from photon noise. The first may dominate the difference between stars’ measured and true abundances, but it does not contribute dispersion to abundances. Differential systematics can contribute dispersion, but they can be mitigated by conditioning on stellar parameters, a powerful feature of the normalizing flow method.

By mitigating any differential systematics, we demonstrated (Fig. 6) that the small statistical uncertainties reported by the ASPCAP pipeline are an accurate representation of photon noise, with total dispersion including an intrinsic contribution of . − . dex for well measured elements. We have also seen that even photon noise that is uncorrelated from pixel to pixel can cause correlated abundance errors because the abundances of multiple elements may be affected by the same uncertain stellar parameter, and because some abundances are estimated from blended features or molecular lines. We have estimated these correlated measurement uncertainties for ASPCAP in §4.4.1 and find that they are small compared to statistical uncertainties in correlation coefficients from finite sampling (§4.4.2). The largest source of artificial correlations is the “measurement aberration” arising because we can only condition on measured values of abundances rather than true values (§4.4.3).

Although the photon noise alone might have limited impact on our ability to detect correlations at high significance, characterizing it is still enormously important. Fundamentally, we do not have direct access to the intrinsic variance and correlations, only to measured values that include observational contributions.
Interpreting these measurements and tracing them back to astrophysical phenomena (§5.1) requires a robust characterization of the intrinsic correlations, and hence an accurate determination of the observational uncertainties. While smaller statistical uncertainties are preferable, the most important thing is to understand them well enough so that their effects (including that of measurement aberration) can be removed.

This challenging goal is within reach as we approach a more ab initio way to perform full-spectral modelling. Full-spectral fitting is sometimes presented as a way to extract more information from blended features. Such an argument can be misleading because for high-resolution spectra this gain is minimal, as the information per spectral feature only adds in quadrature (Ting et al. 2017a; Sandford et al. 2020). However, in terms of extracting intrinsic covariance, performing full spectral fitting with all stellar labels simultaneously (e.g., with THE PAYNE, Ting et al. 2019, or CYCLE-STARNET, O’Briain et al. 2020) may be advantageous compared to a multi-step approach like ASPCAP’s; the statistical covariance matrix from χ²-minimization from full-spectral fitting represents the full correlated uncertainties from photon noise. In the same vein, classical fitting techniques have advantages compared to deep learning inferences like ASTRONN (Leung & Bovy 2019a) or STARNET (Fabbro et al. 2018), as the latter do not have easy access to evaluate the observational covariance from first principles.

Footnote: Another source of correlation comes from the fact that some elements (e.g., essential electron donors) can substantially alter the stellar atmosphere. As a result, they would modify other elements’ spectral features even if those spectral features are not associated with the elements in question (see the appendix in Ting et al. 2016a).

Figure 13. Residual correlation structures derived with different spectral analysis pipelines, all evaluated at our standard reference point T_eff = 4500 K, log g = 2 . , [Fe/H] = [Mg/Fe] = 0. The left panel uses the ASPCAP abundances (this study). The middle panel uses abundances for the same data set derived by THE PAYNE with atmospheric models improved through CYCLE-STARNET. The right panel uses a DR14 sample with abundances from THE CANNON. In all three cases, we adopt the common subset of 13,207 stars that have THE CANNON DR14 abundances as the training set for a more robust comparison. The two ab initio modeling pipelines yield similar though not identical correlation patterns, while correlations from THE CANNON are weaker on average and different in structure for some elements. Weaker correlations could arise if THE CANNON’s abundances are partly affected by the fact that data-driven models partially infer abundances through astrophysical correlations instead of measuring them (see Fig. 8).

Alternatively, the repeated spectrum technique already used to infer
ASPCAP’s statistical uncertainties (§5.4 of Jönsson et al. 2020) could be extended with larger repeat samples to estimate abundance uncertainty correlations and to better capture effects that lead to non-Gaussian deviations. Our study highlights the importance of collecting repeated spectra to characterize the photon noise uncertainty. We caution that while the dispersion of abundances in star clusters is sometimes used as an empirical estimate of observational uncertainties, such an estimate is inadequate for modeling the photon noise uncertainty. The dispersion from open clusters includes both photon noise uncertainties and differential systematics, and for some elements the photon noise uncertainty could be further inflated by dispersion of intrinsic abundances within the cluster.

One of the important outcomes of large multi-element data sets has been the emergence of data-driven abundance pipelines such as THE CANNON (Ness et al. 2015), ASTRONN (Leung & Bovy 2019a), and STARNET (Fabbro et al. 2018), which use labels determined from a subset of spectra to train a model that infers these labels from other spectra. This approach is especially powerful for cross-calibrating surveys with different wavelength ranges or spectral resolution (Ness et al. 2015; Nandakumar et al. 2020), and it may achieve smaller statistical uncertainties than a strictly forward modeling approach. However, a data-driven method carries the risk of “learning” the astrophysical correlations of abundances in stars, so that the abundance of an element is to some degree inferred from the abundances of other elements rather than from true spectral features of that element (see, e.g., fig. 19 in Wheeler et al. 2020, and fig. 3 in Ting et al. 2017b). Various approaches have been attempted to mitigate this effect with L1 regularization (Casey et al. 2016), censoring pixels, or imposing theoretical priors (Ting et al. 2017b; Xiang et al. 2019), but these mitigations are not perfect.

For the problem of measuring residual abundance correlations, the focus of this paper, a data-driven pipeline may be less well suited than a traditional forward modeling pipeline. If abundance values are partly inferred through astrophysical correlations with conditioning elements, residual correlations with these elements will be suppressed (see Fig. 8), and correlations among non-conditioning elements could be artificially enhanced. We investigate this issue in Fig. 13. We crossmatch our training set with the APOGEE DR14
CANNON catalog (APOGEE does not have an official DR16 CANNON catalog). This cross-matching leaves us with 13,207 stars as the training set. As a comparison, we also run the same analysis using THE PAYNE. Since the DR14 PAYNE catalog (Ting et al. 2019) did not measure a few elements compared to ASPCAP, we re-run THE PAYNE on DR16 using an improved set of Kurucz models (Kurucz 1993, 2005, 2013), which are auto-calibrated with a machine learning technique known as domain adaptation in CYCLE-STARNET (O’Briain et al. 2020). In order to have a more robust comparison, we train conditional normalizing flows for the ASPCAP, CANNON, and PAYNE/CYCLE-STARNET abundances with this common subset of 13,207 stars.

Fig. 13 compares the results. Since THE CANNON did not measure Cu, and THE PAYNE/CYCLE-STARNET cannot provide a robust measurement for K, we omit these two elements in this comparison. The figure shows that ab initio fitting techniques (ASPCAP and THE PAYNE) give qualitatively similar correlation structures and strong correlations ( ρ = 0 . − . ), though there are a few notable differences. For example, THE PAYNE abundances seem to show smaller residual correlations associated with O and stronger correlations associated with V. Some differences are not unexpected due to the different spectral models adopted.

THE CANNON abundances, by contrast, show visibly weaker residual correlations, with some qualitatively different correlation structures, despite the fact that THE CANNON was trained on a higher quality subset of ASPCAP abundances and therefore should have inherited the same model systematics. The differences are particularly prominent for entries that are far from the diagonal. These differences tentatively suggest that THE CANNON is artificially damping some residual correlations because its abundance values are affected by “learned” correlations with the conditioning elements Fe and Mg. More generally, the impact of correlated measurement uncertainties on observed correlation patterns may be more difficult to evaluate for a data-driven pipeline than for an ab initio forward modeling pipeline. We plan to investigate this issue more fully in a forthcoming paper (V. Chandra, in prep.).

Describing the elemental abundances of stars with the conditional normalizing flow and minimizing systematics through parameter conditioning offers exciting new prospects in stellar population and Galactic evolution studies. One such opportunity is a more sensitive and robust approach to finding chemical outliers, stars whose unusual abundances may reveal rare astrophysical processes or Galactic events. Very metal-poor or metal-rich stars are already rare in the local disk, but this in itself does not imply that their abundance patterns are unusual. The conditional PDF allows us to ask whether a star’s abundances are unusual relative to stars of the same overall metallicity and α-enhancement, and conditioning on T_eff and log g means that these unusual abundances are unlikely to be caused by differential systematic uncertainties.

Footnote: As already seen in Ting et al. (2019), oxygen abundances from THE PAYNE appear to follow an [O/Fe] − [Fe/H] trend closer to the one from the optical surveys (see their fig. 12), in which [O/Fe] continues to increase for lower [Fe/H], whereas the [O/Fe] from ASPCAP plateaus at low metallicity (Fig. 4).

Several recent studies have searched for groups of chemical outliers with similar, distinctive abundance patterns. For example, Price-Jones et al. (2020) applied DBSCAN to the 8D elemental abundance space in APOGEE and found 21 candidate (disrupted) clusters that have more than 15 members with the same ages. Ting et al. (2016b) constrained the maximum mass of Milky Way star clusters through the non-detection of large, chemically homogeneous groups in APOGEE DR12. Hogg et al. (2016) applied a k-means partitioning of 15-element abundance space to APOGEE abundances derived by THE CANNON, recovering multiple known star clusters through a blind search of abundance space. Ratcliffe et al. (2020) applied a hierarchical clustering algorithm to a set of 19 abundances for red clump stars in APOGEE DR14, finding that groups defined by abundances are spatially separated as a function of age. However, to date most searches have found rather obvious structures, such as globular clusters and the metal-poor high-α population (but see Price-Jones et al. 2020), or have reconstructed tidal streams of known star clusters (e.g. Kos et al. 2018; Simpson et al. 2020). A critical obstacle is that the sampling density of stars changes drastically with [Fe/H] and [α/Fe], and these changes can overwhelm more subtle signals in a clustering algorithm. Searches in the residual abundances after conditioning on Fe and Mg will be a more sensitive probe of disrupted structures with distinctive abundance patterns.

The most ambitious extension of this approach to Galactic Archaeology is the idea of “chemical tagging,” that we can identify pairs or groups of stars that were born in the same cluster because they have nearly identical abundances, even if they are now widely separated in phase space (Freeman & Bland-Hawthorn 2002). Since a star’s atmospheric abundances do not change much in its lifetime, they can serve as the “DNA fingerprints” of the star’s origins. Critical to the viability of this program is the ability to identify “chemical siblings” that have the same intrinsic abundances in the presence of “chemical doppelgangers” that are unrelated but have abundances that are consistent within observational uncertainties. Some recent analyses of this challenge have painted a rather bleak outlook for “strong chemical tagging” (Ting et al. 2015; Ness et al. 2018). Our methods and findings have numerous implications for the prospects of chemical tagging. We briefly address several of them here, reserving a detailed discussion for future work.

Footnote: Unfortunately, the term “chemical tagging” has been used in other contexts nowadays and therefore can be confusing at times. For example, “weak” chemical tagging often refers to inferring the stars’ birth Galactocentric radii from their elemental abundances (and ages). Here we are referring to chemical tagging in its original form, i.e., reconstructing disrupted star clusters.

Figure 14. Reassessing chemical tagging and the chemical doppelganger rate. (Left) The blue histogram shows the probability that a pair of co-natal stars with identical intrinsic abundance has log-likelihood ln L_same (Eq. 21), assuming observational uncertainty equal to the reported ASPCAP values. The orange histogram shows the log-likelihood distribution for pairs of stars drawn randomly from N(0, C_jk,tot), i.e., from a Gaussian approximation to the residual abundance PDF conditioned on [Fe/H] = [Mg/Fe] = 0. (Right) The solid black curve shows the completeness of selecting co-natal pairs as a function of the adopted threshold in ln L_same. The dashed blue curve shows the corresponding contamination by unrelated random pairs, assuming a field-to-co-natal ratio f = 1000. Orange and green dashed curves show the contamination if the observational dispersion is reduced by a factor of two or three, respectively; in each case we have added a constant to ln L_same so that the completeness (black curve) stays the same. Chemical tagging with APOGEE abundances is difficult given current abundance precision and f ≃ 1000, but factors of 2-3 improvement in precision would make it possible to recover co-natal pairs with high fidelity.

First, we note the obvious point that if all abundances were in fact predictable from [Fe/H] and [Mg/Fe] then the task of chemical tagging would be hopeless, as the information encoded by elemental fingerprints would be too limited. Our detection of numerous significant correlations at the measurement precision already achieved by APOGEE is an encouraging demonstration that the information content of multi-element abundances is rich. Second, our analysis shows that APOGEE really is achieving the high precision implied by repeat stellar observations, with uncertainty of 0.01-0.02 dex for eight or more elements (Fig. 6). Larger observational dispersion suggested by some previous studies may be caused by differential systematic uncertainties across the sample. Third, the normalizing flow technique may be a powerful tool for chemical tagging because it can mitigate differential systematics through conditioning and because it offers a well defined way of assessing the probability of observing a given set of abundances for a star or pair of stars, including non-Gaussian features of the PDF. Finally, the correlations revealed by our APOGEE analysis alter the efficiency of distinguishing siblings from doppelgangers, an effect not previously accounted for in these estimates (Ness et al. 2018).
To calculate this last effect, we start with the two-element conditional PDF p([X/H] | T_eff, log g, [Fe/H], [Mg/Fe]), evaluated at our usual reference point of Solar metallicity, T_eff = 4500 K, log g = 2 . . Recall that the residual variances and correlations are largely independent of this choice. As discussed in §2.2, the total covariance matrix C_jk,tot, which we can evaluate from the conditional normalizing flow, is the sum of the observational dispersion C_jk,obs and the intrinsic covariance C_jk,int. For this calculation we approximate C_jk,obs as a diagonal matrix with entries given by the reported ASPCAP photon noise (dashed line in Fig. 6). The difference of C_jk,tot and C_jk,obs gives us the intrinsic covariance C_jk,int.

Suppose that we observe a pair of stars whose intrinsic abundances are identical. If the observed abundance vectors \vec{x}_1 and \vec{x}_2 have Gaussian observational uncertainties, then the PDF of the abundance differences \vec{x}_1 − \vec{x}_2 follows a normal distribution N(0, C_jk,obs), with a log-likelihood

ln L_same = ln N(0, C_jk,obs).    (21)

For a pair of unrelated stars, the PDF of \vec{x}_1 − \vec{x}_2 instead follows N(0, C_jk,tot). Recall that we have already conditioned on Fe and Mg, so we are considering pairs of stars for which these two abundances are indistinguishable. The normalizing flow gives us the full and potentially non-Gaussian p(\vec{x}_1 − \vec{x}_2), but here we approximate the residual abundance distribution as Gaussian to facilitate our comparisons of alternative cases below.

The left panel of Fig. 14 plots the distribution of ln L_same (Eq. 21) for pairs of stars that are co-natal and thus have identical abundances, and for pairs of stars drawn randomly from the conditional PDF. As expected, co-natal stars are much more likely to have small \vec{x}_1 − \vec{x}_2 and thus high ln L_same.
For chemical tagging one wants to find a threshold in ln L_same that selects most co-natal pairs while rejecting most random “doppelganger” pairs.

Whether or not the chemical doppelgangers will overwhelm the chemical siblings depends critically on the field-to-co-natal ratio f, i.e., the prior knowledge of whether a pair of stars is random as opposed to being co-natal. As discussed at length by Ting et al. (2015), for a given star-forming event with cluster mass M_c, this ratio can be approximated to be ≃ M_*/M_c, where M_* is the integrated star formation rate (SFR) of a certain Galactocentric annulus that produces stars with similar Fe and Mg to the cluster. If we further assume that the total integrated SFR at a given Galactic annulus is about × M_⊙ (e.g., Ting et al. 2015; Bovy & Rix 2013; Bland-Hawthorn & Gerhard 2016), and approximate that we can grid the Fe and Mg abundance space to 1000 bins (assuming Fe and Mg photon noise precision of 0.01 dex with 0.5 dex span in [Fe/H], and 0.2 dex span in [Mg/Fe]), then f = M c / (5 × M_⊙ ). For a large cluster with M_c = 5 × M_⊙, like Westerlund 1, we will have f ≃ 1000, the value adopted below.

The solid black line in the right panel of Fig. 14 shows the completeness of selecting co-natal pairs as a function of the log-likelihood threshold, i.e., if we only consider pairs with ln L_same larger than the value on the x-axis. The blue dashed line shows the contamination rate, defined as the ratio of random field pairs above the threshold to the total number of pairs above the threshold, assuming a field-to-co-natal ratio of f = 1000. With the current ASPCAP photon noise illustrated in Fig. 6, which corresponds to ∆[Fe/H] ≃ . dex and similar or larger values for other elements, it remains challenging to identify co-natal stars.
For any threshold that has reasonable completeness, the contamination rate is nearly 100%, i.e., doppelgangers far outnumber siblings.

Footnote: Note that the integrated SFR is not exactly the current stellar mass, but this ratio is roughly compensated if we only count stars within a certain Galactocentric annulus like the Solar neighbourhood (for details, see Ting et al. 2015).

However, moderate improvements in measurement precision can dramatically change this picture. Orange and green dashed curves show forecasts in which we reduce the observational dispersion by a factor of two or a factor of three (equivalently, reduce the variance by a factor of four and nine), setting C_jk,tot = C_jk,int + C_jk,obs/4 and C_jk,int + C_jk,obs/9. We add a constant to the log-likelihood such that, in all three cases, the completeness results (solid black line) coincide with each other. With a factor two reduction of dispersion, one can choose a threshold that yields 80% completeness with ∼ contamination, enough to yield strong statistical conclusions even if any given pair has a significant chance of being random. For a factor of three reduction of dispersion, one can choose a threshold that yields high completeness and minimal contamination. In the SNR-limited regime, these reductions would require factors of four or nine increase of observing time per star. More importantly, the ASPCAP photon noise estimates, based on repeat spectra of stars, are usually larger than those estimated from χ²-fitting (Jönsson et al. 2020) by a factor of a few, which implies that an abundance extraction pipeline that corrected for additional observational effects might achieve significantly higher abundance precision even with the existing APOGEE spectra. Conversely, without such corrections, simply increasing exposure times may not improve precision as rapidly as σ ∼ 1/SNR ∼ 1/√t.
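The completeness and contamination curves described above can be reproduced in outline with a short Monte Carlo. The following is a minimal sketch under assumptions that are not taken from the paper: a hypothetical set of 12 residual elements with an intrinsic scatter of 0.015 dex, a uniform intrinsic correlation of 0.3, and uncorrelated photon noise of 0.012 dex. It follows the convention of Eq. 21 in drawing abundance differences directly from N(0, C_obs) for co-natal pairs and from N(0, C_tot) for field pairs, and it compares contamination at a fixed 80% completeness, which plays the same role as shifting ln L_same so that the completeness curves coincide.

```python
import numpy as np


def gaussian_logpdf(x, cov):
    """Return ln N(x | 0, cov) for each row of x."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", x, np.linalg.inv(cov), x)
    return -0.5 * (maha + logdet + n * np.log(2.0 * np.pi))


def contamination_at_completeness(c_int, c_obs, completeness=0.8, f=1000.0,
                                  n_pairs=200_000, seed=0):
    """Contamination by field pairs at a threshold giving the requested completeness."""
    rng = np.random.default_rng(seed)
    zero = np.zeros(c_obs.shape[0])
    # Abundance differences for co-natal pairs (noise only) and random field pairs.
    d_sib = rng.multivariate_normal(zero, c_obs, size=n_pairs)
    d_fld = rng.multivariate_normal(zero, c_int + c_obs, size=n_pairs)
    # ln L_same is always evaluated with the observational covariance (Eq. 21).
    lnl_sib = gaussian_logpdf(d_sib, c_obs)
    lnl_fld = gaussian_logpdf(d_fld, c_obs)
    threshold = np.quantile(lnl_sib, 1.0 - completeness)
    frac_sib = np.mean(lnl_sib > threshold)
    frac_fld = np.mean(lnl_fld > threshold)
    # f field pairs are assumed for every co-natal pair.
    return f * frac_fld / (f * frac_fld + frac_sib)


# Hypothetical toy covariances (illustrative values, not the measured APOGEE ones).
n_elem, sigma_int, sigma_obs, rho_int = 12, 0.015, 0.012, 0.3
corr = np.full((n_elem, n_elem), rho_int)
np.fill_diagonal(corr, 1.0)
C_INT = sigma_int**2 * corr
C_OBS = sigma_obs**2 * np.eye(n_elem)

# Reduce the observational dispersion by factors of 1, 2, 3 (variance by 1, 4, 9).
contamination = [contamination_at_completeness(C_INT, C_OBS / k**2) for k in (1, 2, 3)]
for k, c in zip((1, 2, 3), contamination):
    print(f"dispersion / {k}: contamination at 80% completeness = {c:.3f}")
```

In this toy setup the contamination at fixed completeness falls steeply as the observational dispersion is reduced, in qualitative agreement with the forecast; the specific values depend entirely on the assumed covariances.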
Achieving sub-0.01 dex precision, even differentially, requires an even better understanding of the abundance extraction pipeline, which is no doubt challenging.

We also caution that our forecast has uncertainties because our decomposition of the observed C_jk,tot into C_jk,obs and C_jk,int can be affected by systematic uncertainties in the characterization of C_jk,obs. A more careful investigation of the photon noise characterization is critical (§5.2). Regardless, the forecast in this study is undoubtedly a conservative estimate: supplementing our current set of APOGEE elements with elements expected to have a larger contribution from AGB stars or other neutron-capture processes could substantially improve chemical tagging by adding more chemical variations. Chemical tagging may also be feasible in lower density regions of the residual abundance space, even if doppelgangers overwhelm siblings in the core of the distribution. Finally, besides characterizing C_jk,obs, measuring a robust C_jk,tot as we have done in this study is also important. These correlations change the chemical tagging effectiveness because they “tilt” C_jk,int relative to the nearly diagonal C_jk,obs. We repeated our √C_jk,obs / forecast after setting the off-diagonal elements of C_jk,int to zero, and we found that ignoring the correlations can artificially alter the expected contamination rate (by 10-20%).

The design of a Galactic spectroscopic survey involves trade-offs among numerous competing considerations, including number of targets, types of targets, spatial coverage, wavelength range, spectral resolution, and SNR. Design decisions depend partly on the instrumentation, telescope facilities, and observing time available, on synergy with other data sets, and on the prioritization of the survey’s science goals. Here we discuss implications of our results for a survey that places high priority on mapping the correlations among elemental abundances, to understand nucleosynthesis and the astrophysical origin of the elements, to trace distinct stellar populations through space and time, and to probe complexities of Galactic chemical evolution. The key design considerations are to minimize systematic uncertainties in artificial correlations and to maximize the signal-to-noise ratio for detecting intrinsic correlations.

Figure 15. Influence of measurement precision and sample size on ability to detect low-amplitude correlations of residual abundance variations. (Top) Distribution of correlation coefficients produced by measurement aberration in our data set (black curve), with abundance uncertainty of ∆[Fe/H] = 0.008 dex and ∆[Mg/H] = 0.011 dex, and in data sets with abundance uncertainties lower (red) or higher (blue) by a factor of two. (Bottom) Distribution of the finite sampling uncertainties in correlation coefficients, as estimated from bootstrap resamplings of our APOGEE data set (black curve) and numerical experiment with samples of , , or , stars (blue, orange, green) from the same distribution. Although our APOGEE sample has ≃ , stars, the sampling uncertainty after conditioning on T_eff, log g, [Fe/H], and [Mg/H] corresponds to an effective sample size N_eff = 1/ρ² ≃ . Detecting correlations at the level of ρ ≃ . − . requires high abundance precision and samples that are either large or targeted in stellar parameters so that N_eff approaches N_sample.

We first note that our analysis in §4.3 suggests an intrinsic dispersion of 0.01-0.02 dex for the residual abundances of many APOGEE elements after conditioning on Fe and Mg.
This is a useful reference value when thinking about the statistical precision of abundance measurements. Recall that from Eq. 17, observed correlations are suppressed relative to intrinsic correlations by a factor that depends on the ratio of observational variance to intrinsic variance. Consequently, it will generally be difficult to measure low-amplitude intrinsic correlations for elements whose photon noise uncertainties are much larger than the intrinsic dispersion, or 0.02 dex. The fact that six out of the seven conditioning elements in this study have dispersions of < . dex vividly illustrates this constraint. Nonetheless, we emphasize that measuring these correlations is still possible in a large sample if all sources of spurious correlations are sufficiently well characterized.

We have already discussed the issue of correlated statistical uncertainties in abundance measurements (§5.2). This consideration favors higher spectral resolution to reduce the impact of blended features that drive stronger correlations. In principle, the correlations from statistical uncertainty can be predicted and subtracted, but this requires accurate knowledge of their magnitude. The larger they are, the more likely it is that systematic uncertainty in their level will be a limiting factor. Ideally one would like to derive the observational covariance “theoretically” from χ² fitting and empirically from repeat spectra of stars, and demonstrate consistency between them. If these estimates are inconsistent, as they currently are for APOGEE (Jönsson et al. 2020), it means that the statistical uncertainties are not fully understood, and it indicates that improved data reduction and modeling might be able to extract higher precision abundances from the existing spectra.
We emphasize yet again that it is valuable to minimize and fully characterize the photon noise in abundance measurements even if the absolute uncertainty is dominated by imperfections in the atmospheric and spectral synthesis models. These modeling systematics generally do not add dispersion or artificial correlations to the abundances derived for stars with similar properties. The latter can be attained by modeling the data with normalizing flows.

Higher measurement precision is desirable both to boost the expected correlation signal (assuming that the intrinsic correlation ρ_jk,int is fixed by the underlying astrophysics) and to reduce the artificial correlations caused by measurement aberrations. In the top panel of Fig. 15, the black solid curve shows the distribution of correlation coefficients produced by measurement aberration in our analysis, computed as described in §4.4.3. This calculation assumes random uncertainty in [Fe/H] and [Mg/H] at the level of the mean ASPCAP uncertainties for our sample, 0.008 dex and 0.011 dex, respectively. The red curve shows the predicted distribution if the random uncertainties for both elements are reduced by a factor of two, which drastically reduces the number of aberration-induced correlations above |ρ| ≃ . . Conversely, doubling the [Fe/H] and [Mg/H] uncertainties (blue curve) leads to a much broader distribution of aberration correlations. In principle, the aberration correlations are a predictable mean signal, not an uncertainty, so one should be able to detect true correlations even at levels within this distribution. However, systematic uncertainty in the exact level of the photon noise uncertainty makes the predicted aberration signal uncertain. This systematic uncertainty in the aberration correlations is a major reason that we do not push beyond seven conditioning elements in our current analysis.

The other limiting factor in detecting weak correlations is sampling uncertainty, which scales as N^{-1/2}.
Although not shown, we tested this theoretical scaling extensively with numerical experiments. For N_sample smaller than the current sample size, we subsampled the APOGEE sample. For N_sample larger than the current sample size, we drew mock samples from the emulated APOGEE joint distribution (Fig. 5). We found that the scaling is robust for < N_sample < , . We did not test beyond N_sample = 100 , . The exact scaling also indirectly demonstrates that our normalizing flow parameterization adequately describes the distribution and only incurs negligible uncertainty. The results from some of these numerical experiments are shown in the bottom panel of Fig. 15. The black curve shows the distribution of sampling uncertainty amplitudes estimated from bootstrap resamplings of our APOGEE training sample (§4.4.2). The uncertainties are typically ρ ≃ . , with some variation depending on the element correlation in question. Sampling uncertainty sets a statistical limit on the detectability of low-amplitude correlations for a given sample size. Green, orange, and blue curves show the distribution from our numerical experiments with N_sample = 100 , , , or , respectively.

As previously discussed, the sampling uncertainty we find in our bootstrap analysis implies that the effective size of our sample after conditioning on T_eff, log g, [Fe/H], and [Mg/H] is only N_eff = 1/ρ² ≃ , even though our full sample is N_sample ≃ , . Because our measured correlation signal is fairly consistent from one reference point to another (Appendix A), we could average results from multiple reference points to reduce sampling uncertainty. We have experimented with this approach and obtained some reduction of sampling uncertainty when integrating over T_eff and log g. However, since the training sample already spans a small range of T_eff − log g, we found that the reduction is minimal (from ρ = 0 . to . ).
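The N^{-1/2} scaling and the conversion from a sampling uncertainty to an effective sample size, N_eff = 1/ρ², can be checked with a small numerical experiment. The sketch below is an illustration only: it uses repeated draws from an uncorrelated two-dimensional Gaussian rather than bootstrap resamplings of APOGEE data, and the sample sizes and number of trials are arbitrary choices.

```python
import numpy as np


def correlation_scatter(n_sample, rho_true=0.0, n_trials=500, seed=0):
    """Standard deviation of the sample correlation coefficient over repeated draws."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho_true], [rho_true, 1.0]])
    estimates = np.empty(n_trials)
    for i in range(n_trials):
        x = rng.multivariate_normal(np.zeros(2), cov, size=n_sample)
        estimates[i] = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    return estimates.std()


sizes = [100, 400, 1600]
scatter = {n: correlation_scatter(n) for n in sizes}
# Invert the scaling: an uncertainty rho in the correlation implies N_eff = 1 / rho^2.
n_eff = {n: 1.0 / s**2 for n, s in scatter.items()}

for n in sizes:
    print(f"N = {n:5d}: scatter in rho = {scatter[n]:.4f}, implied N_eff = {n_eff[n]:.0f}")
```

For weak correlations the scatter is close to 1/√N, so quadrupling the sample roughly halves the uncertainty, and the implied N_eff recovers the number of stars that actually constrain the correlation at a given reference point.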
Expanding the sample beyond this T_eff − log g range will risk distorting or diluting correlations due to the differential systematics of the abundances. We found that we can approach the theoretical ρ ≃ . corresponding to N_eff ≃ if we further integrate over [Fe/H] and [Mg/H], but as demonstrated in Fig. 12, different populations have subtle differences in the residual correlation structures, and hence we chose not to integrate the signals. The bottom line is that the same survey strategy as APOGEE, as assumed in the bottom panel of Fig. 15, is not the most effective approach for this particular study. With a more careful selection (e.g., through Gaia’s color-magnitude diagram) of “stellar twins,” with similar T_eff, log g, [Fe/H], and [Mg/Fe], at any given reference point in this 4D-distribution, we could achieve what APOGEE enabled in this study with a sample of only O(1000) stars.

While normalizing flows could mitigate some of these limitations, collectively these considerations nonetheless suggest that an effective strategy for mapping out multi-element abundance correlations might be to target moderate-sized samples (N_sample ≃ O(10 )) of stars pre-selected in narrow ranges of T_eff, log g at multiple reference points in [Fe/H] and [Mg/Fe], obtaining high SNR at high spectral resolution, analogous to solar twin studies (Ramírez et al. 2009; Nissen 2015; Bedell et al. 2018). Choosing narrow ranges of stellar parameters at each reference point mitigates differential systematics as a source of observational dispersion, and it minimizes the difference between N_sample and N_eff for setting sampling uncertainty. High SNR mitigates the dilution of intrinsic correlations by observational dispersion, reduces measurement aberration, and reduces any systematic uncertainty from correlated statistical uncertainties.

Within a program such as GALAH or the SDSS-V Milky Way Mapper (Kollmeier et al. 2017), this goal could be achieved by targeting small subsets (e.g., O(1)%) of the full (≳ star) samples for repeated observations to build high SNR. Coverage of a wide range of elements could be achieved by observing these stars in common in both optical and infrared surveys. With well-characterized correlations at various locations in [Fe/H] and [Mg/Fe], the much larger full samples could be used to search for outlier stars and chemically distinct groups and to apply these correlations to chemical tagging.
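The two noise effects named above, dilution of intrinsic correlations and measurement aberration, can be illustrated with a toy model. The sketch below rests on assumptions that are not the paper's method: two hypothetical elements track [Fe/H] with unit slope, conditioning is done with a straight-line fit to the measured [Fe/H] instead of a normalizing flow, and all dispersions are illustrative. With uncorrelated, equal-variance noise, the expected dilution is ρ_obs = ρ_int σ_int² / (σ_int² + σ_obs²), a standard result that is used here in place of the exact form of Eq. 17.

```python
import numpy as np


def residual_correlation(sigma_fe_obs, sigma_obs, rho_int, n_star=200_000,
                         seed=0, sigma_fe=0.2, sigma_int=0.015):
    """Residual correlation of two toy elements after a linear fit to measured [Fe/H]."""
    rng = np.random.default_rng(seed)
    fe_true = rng.normal(0.0, sigma_fe, n_star)
    cov_int = sigma_int**2 * np.array([[1.0, rho_int], [rho_int, 1.0]])
    intrinsic = rng.multivariate_normal(np.zeros(2), cov_int, size=n_star)
    # Both elements follow the true [Fe/H], plus intrinsic residuals and photon noise.
    xy_obs = fe_true[:, None] + intrinsic + rng.normal(0.0, sigma_obs, (n_star, 2))
    fe_obs = fe_true + rng.normal(0.0, sigma_fe_obs, n_star)
    # "Condition" on the measured (not true) [Fe/H] with a straight-line fit.
    design = np.column_stack([fe_obs, np.ones(n_star)])
    coef, *_ = np.linalg.lstsq(design, xy_obs, rcond=None)
    resid = xy_obs - design @ coef
    return np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]


# Dilution: perfect [Fe/H], intrinsic correlation 0.5, photon noise 0.01 dex.
diluted = residual_correlation(sigma_fe_obs=0.0, sigma_obs=0.01, rho_int=0.5)
expected = 0.5 * 0.015**2 / (0.015**2 + 0.01**2)
print(f"diluted correlation: {diluted:.3f} (analytic expectation {expected:.3f})")

# Aberration: no intrinsic correlation, but noisy [Fe/H] used for conditioning.
aberration = {s: residual_correlation(sigma_fe_obs=s, sigma_obs=0.01, rho_int=0.0)
              for s in (0.004, 0.008, 0.016)}
for s, rho in aberration.items():
    print(f"[Fe/H] uncertainty {s:.3f} dex: aberration-induced correlation = {rho:.3f}")
```

In this toy model the spurious correlation grows rapidly with the uncertainty of the conditioning abundance, and the intrinsic correlation is recovered at reduced amplitude when photon noise is comparable to the intrinsic scatter, which is why high SNR helps on both counts.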

6. Conclusions

High-resolution, highly multiplexed spectroscopic surveys are currently measuring 10-30 elemental abundances in samples of more than stars. Embedded in this multi-dimensional elemental abundance space are clues to the astrophysical sources of the elements and archaeological information about the Milky Way’s history. However, the tools that we currently have to decipher the irregular distribution of stars in this high-dimensional space remain rather rudimentary. In this study, we have proposed a new method based on a machine learning technique known as normalizing flow to depict the distribution of Milky Way disk stars in the abundance space spanned by 15 elements measured by the SDSS APOGEE survey. Our key findings are summarized as follows.

• Conditional normalizing flow allows us to minimize the impact of differential systematic uncertainties on observational dispersion in abundance measurements. After conditioning on T_eff, log g, [Fe/H], and [Mg/Fe], the residual APOGEE abundances have a total dispersion of 0.01-0.02 dex for O, Si, Ca, Ni, . − . dex for Mn, Al, Co, S, Cr, Cu, and . − . dex for K, V, and Na. These dispersions are typically 1.5-2 times higher than the photon noise uncertainties reported by ASPCAP for our SNR > sample, and the difference is plausibly explained by intrinsic dispersion with a typical amplitude of 0.01-0.02 dex. Differentially, the observational dispersion of APOGEE’s [Fe/H] and [Mg/Fe] measurements is ≃ . dex or better.

• We have argued theoretically and demonstrated empirically that abundance correlations can be measured robustly in a large data set even when the statistical uncertainties of individual abundance measurements are comparable to the intrinsic dispersion. Studying only the dispersions about the conditional means could miss many hidden dimensions in the elemental abundance space. Abundance correlations are much more effective for measuring this subtle information.

• Although knowledge of a star’s [Fe/H] and [Mg/Fe] is sufficient to predict most of its APOGEE abundances at the ∼ . dex level, the residual abundances show cross-element correlations at high significance. These correlations cannot be discovered if the abundances are inferred from Fe and Mg, only if they are measured independently.

• Even for Solar metallicity disk stars and a set of elements expected to come mainly from core-collapse and Type Ia supernovae, we must condition on at least seven elements (e.g., Fe, Mg, O, Ni, Si, Ca, and Al) to reduce residual correlations to a level consistent with observational uncertainties. Correlation patterns for [Fe/H] = − . stars, with [Mg/Fe] = 0 . or [Mg/Fe] = 0 . , are similar to those found at Solar metallicity. Our results reconcile previous findings that [Fe/H] and [Mg/Fe] accurately predict many other abundances with other studies finding 5-10 significant dimensions to the stellar distribution in abundance space: both conclusions are true. However, since the dispersion is much less sensitive than cross-correlations of elemental abundances, the lack of reduction in dispersion does not imply that elemental abundances are redundant.

We have discussed implications of our results for survey design and analysis in §5.
The robust statistical modeling of the elemental space mitigates differential systematics, improves the prospects for chemical tagging of co-natal stars, and puts the concept of chemical tagging onto a firmer statistical footing. However, chemical tagging remains challenging with current abundance precision. For detecting and characterizing the full network of correlations, we advocate high-SNR observations of intermediate-sized samples (\sim stars), concentrated in stellar parameters and at specific locations in [Fe/H] and [Mg/Fe], measuring elements that probe a range of astrophysical processes. These correlations encode critical information about nucleosynthetic sources and subtleties of chemical evolution. Once accurately measured, they can be applied to larger, lower SNR samples to map temporal and spatial variations and select stars with common histories.

Normalizing flows are a powerful new technique for describing complex, high-dimensional distributions. This technique has many potential applications in Galactic Archaeology, including identification of outlier stars and distinctive clusters in abundance space. In cosmology, the advent of enormous data sets drove the development of sophisticated statistical techniques to interpret them. As Galactic Archaeology surveys grow to many elements for vast numbers of stars, a similar revolution in analysis techniques is needed to exploit the rich, multi-faceted constraints they provide on stellar astrophysics, nucleosynthesis, and the history of the Milky Way.
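
The claim that cross-element correlations stay measurable when star-by-star noise is comparable to the intrinsic scatter can be illustrated with a toy Monte Carlo. The sketch below is not the analysis performed in this study: the function name, the two-element Gaussian model, and every numerical value (rho_true, sigma_int, sigma_noise, the sample sizes) are assumptions chosen for illustration. It draws correlated intrinsic residuals, adds independent measurement noise, and reports the measured correlation together with its significance under the Fisher z approximation.

```python
import numpy as np


def simulate_residual_correlation(n_stars, rho_true=0.3, sigma_int=0.015,
                                  sigma_noise=0.015, seed=0):
    """Toy model: two residual abundances with intrinsic correlation rho_true.

    All parameter values are illustrative assumptions, not measured quantities.
    Returns (measured correlation, expected attenuated correlation, significance).
    """
    rng = np.random.default_rng(seed)

    # Intrinsic residuals: bivariate Gaussian with equal scatter in both elements.
    cov = sigma_int**2 * np.array([[1.0, rho_true], [rho_true, 1.0]])
    intrinsic = rng.multivariate_normal([0.0, 0.0], cov, size=n_stars)

    # Independent measurement noise, comparable to the intrinsic scatter.
    observed = intrinsic + rng.normal(0.0, sigma_noise, size=(n_stars, 2))

    rho_obs = np.corrcoef(observed, rowvar=False)[0, 1]

    # Uncorrelated noise inflates the variances but not the covariance,
    # so the observed correlation is diluted by this factor.
    rho_expected = rho_true * sigma_int**2 / (sigma_int**2 + sigma_noise**2)

    # Fisher z-transform: arctanh(rho) has standard error ~ 1 / sqrt(N - 3).
    significance = np.arctanh(rho_obs) * np.sqrt(n_stars - 3)
    return rho_obs, rho_expected, significance


for n in (100, 1000, 20000):
    rho_obs, rho_expected, significance = simulate_residual_correlation(n)
    print(f"N = {n:6d}  rho_obs = {rho_obs:+.3f}  "
          f"expected = {rho_expected:.3f}  significance = {significance:.1f} sigma")
```

In this toy model the noise halves the observed correlation but does not erase it, and the significance of the detection grows roughly as the square root of the sample size. The total dispersion of either element, by contrast, carries no information about the shared component.
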

Acknowledgments

We thank Gregory Green for sharing his codes to visualize normalizing flows (Fig. 3). We thank Jon Holtzman, Charlie Conroy and Henrik Jönsson for illuminating discussions. YST is grateful to be supported by the NASA Hubble Fellowship grant HST-HF2-51425.001 awarded by the Space Telescope Science Institute. DHW acknowledges support from NSF grant AST-1909841 and from the W.M. Keck Foundation and the Hendricks Foundation at the Institute for Advanced Study. The APOGEE/Sloan Digital Sky Survey IV is funded by the Alfred P. Sloan Foundation, the U.S. Department of Energy Office of Science, and the Participating Institutions and acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. We thank Robert Kurucz for developing and maintaining the spectral synthesis programs and databases and Fiorella Castelli for allowing us to use her Linux versions of the programs.

Appendix A Conditioning on different T_{\rm eff} and \log g

In this study, we focus on the reference point T_{\rm eff} = 4500 K and \log g = 2 . , but our results are not sensitive to this choice. Fig. A1 demonstrates that the residual correlation structures are almost identical across different reference T_{\rm eff} and \log g within the stellar parameters spanned by the training set.

Besides mitigating any T_{\rm eff}-abundance correlations, we found that conditioning on T_{\rm eff} and \log g is also crucial for studying the dispersions in this study. Conditioning on T_{\rm eff} and \log g reduced the measured dispersions (Fig. 6). Without conditioning on T_{\rm eff} = 4500 K and \log g = 2 . , the Mn abundances demonstrate a 70% larger dispersion (0.037 dex instead of 0.022 dex). The dispersions for Ca, Ni, Cr, Cu, and K also increase by 10-17%, indicating that for some elements, abundance measurements from ASPCAP have residual differential systematics even over the restricted range of T_{\rm eff} in this study.

Appendix B Can mischaracterization of the ASPCAP photon noise explain the correlation signals?

In §4.4, we have demonstrated that the ASPCAP measurement correlations and finite sampling uncertainty cannot explain the APOGEE residual correlation structures. The measurement aberration effect, however, is sensitive to the assumed ASPCAP photon noise. In this study, we adopt ∆[Fe / H] = 0 . dex and ∆[Mg / H] = 0 . dex as reported in ASPCAP. This is likely a robust assumption because we have seen that even for elements with generally less spectral information than Fe and Mg, such as O and Si, the total dispersion (the quadrature sum of the photon noise and intrinsic dispersion) is only ∼ . dex (Fig. 6).

Here we further verify this assumption by estimating the measurement aberration if ASPCAP had mischaracterized the photon noise. We follow the same procedure as was done in Fig. 9. The top right panel of Fig. A2 demonstrates the measurement aberration effect if the photon noise is two times smaller than what was reported, and the bottom right panel two times larger. The top left panel illustrates the APOGEE residual correlations as a reference, and the bottom left panel the measurement aberration in this study. Note that, unlike the bottom right panel of Fig. 9, here we plot the aberration effect perturbed by the finite sampling uncertainty to visualize the combined effect of the two. Recall that the ASPCAP correlation uncertainty is minimal. Therefore, the combined effect here serves as a conservative limit. Without the sampling noise, the difference between the measurement aberrations and the APOGEE correlations would be even more apparent.

The figure demonstrates that, in order to generate correlations as strong as those we measured from the APOGEE data ( ρ = 0 . − . ), the photon noise of ASPCAP would need to be two times larger than what was reported, which is at odds with the total dispersions for the other elements. More importantly, even in this case, as shown in the bottom right panel, the aberration effect would become so strong that all elemental abundances would be highly and positively correlated. As shown in the top left panel, the APOGEE residual correlations exhibit more structure and do not simply show strong correlations among all elemental abundances. Our numerical experiment suggests that the correlations that we measured in APOGEE cannot be explained simply by the mischaracterization of the ASPCAP photon noise. This experiment also indirectly demonstrates that the photon noise uncertainty reported by ASPCAP is robust. Differentially, APOGEE has indeed measured [Fe / H] to a precision of 0.01 dex or better.
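
The aberration argument in Appendix B can be mimicked with a toy experiment. The sketch below is a simplified stand-in, not the procedure used for Fig. 9 or Fig. A2: it conditions on a single noisy quantity ([Fe/H] only) with a linear fit rather than on two elements with a normalizing flow, and the function names and all numerical values (sigma_feh, sigma_reported, sigma_element, the unit slopes) are assumptions made for illustration. The elements have no intrinsic residual correlation by construction, so any correlation in the output is spurious and comes from the noise in the conditioning variable.

```python
import numpy as np


def aberration_correlations(noise_scale, n_stars=20000, n_elements=4,
                            sigma_feh=0.05, sigma_reported=0.01,
                            sigma_element=0.02, seed=0):
    """Residual correlation matrix produced purely by noise in the conditioner.

    noise_scale multiplies the (assumed) reported [Fe/H] photon noise.
    All numerical defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # True [Fe/H] spread of the sample around the reference point.
    feh_true = rng.normal(0.0, sigma_feh, n_stars)

    # Each element tracks the true [Fe/H] with unit slope, plus scatter that is
    # independent between elements: no intrinsic residual correlation exists.
    abundances = feh_true[:, None] + rng.normal(
        0.0, sigma_element, (n_stars, n_elements))

    # Observed [Fe/H] carries photon noise scaled relative to the reported value.
    feh_obs = feh_true + rng.normal(0.0, noise_scale * sigma_reported, n_stars)

    # Condition on the observed [Fe/H] with a least-squares linear fit.
    feh_c = feh_obs - feh_obs.mean()
    abund_c = abundances - abundances.mean(axis=0)
    slopes = (feh_c @ abund_c) / (feh_c @ feh_c)
    residuals = abund_c - np.outer(feh_c, slopes)

    return np.corrcoef(residuals, rowvar=False)


def mean_off_diagonal(corr):
    """Average of the off-diagonal entries of a square correlation matrix."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return corr[mask].mean()


for scale in (0.5, 1.0, 2.0):
    corr = aberration_correlations(scale)
    print(f"noise x {scale}: mean spurious correlation = "
          f"{mean_off_diagonal(corr):.3f}")
```

In this toy setup, inflating the noise in the conditioning variable raises every pairwise correlation together and with the same sign, which is the qualitative signature described for the bottom right panel of Fig. A2. The amplitudes depend entirely on the assumed parameters and are not meant to reproduce the figure.
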

Figure A1. The abundance correlation structures at different reference T_{\rm eff}, \log g when conditioning on two elements (Fe and Mg). The correlation structures, as shown in the left panel of Fig. 8, are largely invariant within the stellar parameter range spanned by the APOGEE training data in this study.

References

Ahumada, R., Prieto, C. A., Almeida, A., et al. 2020, ApJS, 249, 3
Andrews, B. H., Weinberg, D. H., Johnson, J. A., Bensby, T., & Feltzing, S. 2012, AcA, 62, 269. https://arxiv.org/abs/1205.4715
Andrews, B. H., Weinberg, D. H., Schönrich, R., & Johnson, J. A. 2017, ApJ, 835, 224
Bedell, M., Bean, J. L., Meléndez, J., et al. 2018, ApJ, 865, 68
Bensby, T., Feltzing, S., & Lundström, I. 2003, A&A, 410, 527
Bland-Hawthorn, J., & Gerhard, O. 2016, ARA&A, 54, 529
Bovy, J., & Rix, H.-W. 2013, ApJ, 779, 115
Buder, S., Sharma, S., Kos, J., et al. 2020, arXiv e-prints, arXiv:2011.02505. https://arxiv.org/abs/2011.02505
Casey, A. R., Hogg, D. W., Ness, M., et al. 2016, arXiv e-prints, arXiv:1603.03040. https://arxiv.org/abs/1603.03040
Cranmer, M. D., Galvez, R., Anderson, L., Spergel, D. N., & Ho, S. 2019, arXiv e-prints, arXiv:1908.08045. https://arxiv.org/abs/1908.08045
Durkan, C., Bekasov, A., Murray, I., & Papamakarios, G. 2019, arXiv e-prints, arXiv:1906.04032. https://arxiv.org/abs/1906.04032
Fabbro, S., Venn, K. A., O'Briain, T., et al. 2018, MNRAS, 475, 2978
Fisher, R. A. 1921, Metron, 1, 1
Frankel, N., Sanders, J., Ting, Y.-S., & Rix, H.-W. 2020, ApJ, 896, 15
Freeman, K., & Bland-Hawthorn, J. 2002, ARA&A, 40, 487
Fuhrmann, K. 1998, A&A, 338, 161
Gaia Collaboration, Brown, A. G. A., Vallenari, A., et al. 2018, A&A, 616, A1
García Pérez, A. E., Allende Prieto, C., Holtzman, J. A., et al. 2016, AJ, 151, 144
Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25
Green, G. M., & Ting, Y.-S. 2020, arXiv e-prints, arXiv:2011.04673. https://arxiv.org/abs/2011.04673
Griffith, E., Weinberg, D. H., Johnson, J. A., et al. 2020, arXiv e-prints, arXiv:2009.05063. https://arxiv.org/abs/2009.05063
Gustafsson, B., Edvardsson, B., Eriksson, K., et al. 2008, A&A, 486, 951
Hayden, M. R., Bovy, J., Holtzman, J. A., et al. 2015, ApJ, 808, 132
Hogg, D. W., Casey, A. R., Ness, M., et al. 2016, ApJ, 833, 262
Holtzman, J. A., Shetrone, M., Johnson, J. A., et al. 2015, AJ, 150, 148
Jimenez Rezende, D., & Mohamed, S. 2015, arXiv e-prints, arXiv:1505.05770. https://arxiv.org/abs/1505.05770
Johnson, J. W., & Weinberg, D. H. 2020, MNRAS, 498, 1364
Jönsson, H., Holtzman, J. A., Allende Prieto, C., et al. 2020, AJ, 160, 120
Kingma, D. P., & Dhariwal, P. 2018, arXiv e-prints, arXiv:1807.03039. https://arxiv.org/abs/1807.03039
Kollmeier, J. A., Zasowski, G., Rix, H.-W., et al. 2017, arXiv e-prints, arXiv:1711.03234. https://arxiv.org/abs/1711.03234
Kos, J., Bland-Hawthorn, J., Freeman, K., et al. 2018, MNRAS, 473, 4612
Krumholz, M. R., & Ting, Y.-S. 2018, MNRAS, 475, 2236
Kurucz, R. L. 1993, SYNTHE spectrum synthesis programs and line data, ed. Kurucz, R. L.
Kurucz, R. L. 2005, Memorie della Societa Astronomica Italiana Supplementi, 8, 14
Kurucz, R. L. 2013, ATLAS12: Opacity sampling model atmosphere program, Astrophysics Source Code Library. http://ascl.net/1303.024
Leung, H. W., & Bovy, J. 2019a, MNRAS, 483, 3255
Leung, H. W., & Bovy, J. 2019b, MNRAS, 489, 2079
Liu, L., Jiang, H., He, P., et al. 2019, arXiv e-prints, arXiv:1908.03265. https://arxiv.org/abs/1908.03265
Luo, A. L., Zhao, Y.-H., Zhao, G., et al. 2015, Research in Astronomy and Astrophysics, 15, 1095

Figure A2. The measurement aberration effect, assuming different photon noise levels for ASPCAP. The top left panel shows the measured APOGEE correlations (when conditioning on two elements, Fe and Mg) as a reference. The other panels demonstrate different degrees of spurious correlations due to measurement aberration, adopting different photon noise uncertainties for [Fe / H] and [Mg / H]. We perturb the aberration effect with the finite sampling uncertainty in the APOGEE data to better visualize the combined effect. In this study (Fig. 9), we assume the ASPCAP reported values, ∆[Fe / H] = 0 . and ∆[Mg / H] = 0 . , and the aberration effect is shown in the bottom left panel. The top right panel shows the aberration effect if the ASPCAP photon noise uncertainties were two times smaller than what had been reported in DR16, and the bottom right panel two times larger. A two times larger photon noise could mimic the amplitudes of the measured signals in terms of the residual abundance correlations. However, in this case, almost all elemental abundances will be highly and positively correlated, which is at odds with the measured APOGEE abundance correlations.
