Marginal likelihoods of distances and extinctions to stars: computation and compact representation
MMon. Not. R. Astron. Soc. , 000–000 (0000) Printed Thursday 8 th November, 2018 (MN L A TEX style file v2.2)
Marginal likelihoods of distances and extinctions to stars:computation and compact representation
S. E. Sale and J. Magorrian , Rudolf Peierls Centre for Theoretical Physics, Keble Road, Oxford OX1 3NP, UK Institut d’Astrophysique de Paris, 98bis Boulevard Arago, 75014, Paris, France
Received .........., Accepted...........
ABSTRACT
We present a method for obtaining the likelihood function of distance and extinction to astar given its photometry. The other properties of the star (its mass, age, metallicity and soon) are marginalised assuming a simple Galaxy model. We demonstrate that the resultingmarginalised likelihood function can be described faithfully and compactly using a Gaussianmixture model. For dust mapping applications we strongly advocate using monochromaticover bandpass extinctions, and provide tables for converting from the former to the latter fordifferent stellar types.
Key words: dust, extinction – methods: statistical – stars: distances
Our present lack of knowledge of the three-dimensional distribu-tion of interstellar dust is a significant barrier to building a completepicture of our Galaxy. Like the sun, most of the Galaxy’s stars lieclose to the plane, which means that their light is subject to sig-nificant extinction before it reaches us. Therefore any attempt toconstruct a complete model of the Galaxy’s stellar density distribu-tion must include a three-dimensional model of the extincting dustdistribution. Dust is also interesting in its own right as a tracer ofthe densest parts of the interstellar medium (ISM). Consequentlythere is now a growing industry devoted to understanding and map-ping extinction, with a number of authors either presenting methodsfor mapping extinction (e.g Majewski, Zasowski & Nidever 2011;Sale 2012; Hanson & Bailer-Jones 2014; Green et al. 2014) or con-structing actual maps of extinction in two or three dimensions (e.g.Marshall et al. 2006; Planck Collaboration et al. 2014; Lallementet al. 2014; Sale et al. 2014).A superficially attractive and straightforward way of produc-ing a 3D extinction map is by first using a method such as Berryet al. (2012) and Hanson & Bailer-Jones (2014) to calculate pos-terior expectations for the distances and extinctions to large num-bers of stars individually, then binning the results spatially to pro-duce a map. Unfortunately, this produces maps that are biased ina complicated manner. There are three principal sources of bias.First, almost any catalogue of stars will itself not be an unbiasedsample of the stars in the Galaxy. Most catalogues are magnitudelimited, which biases them towards less extinguished, and there-fore brighter, stars. Dealing effectively with such selection effectsis not trivial and is the subject of Sale (in prep.). Second, we expectthat extinction along two nearby sightlines should be correlated:two-dimensional projected dust maps exhibit correlations on scalesranging from less than 1 pc (e.g. di Francesco et al. 2010) up to that of spiral arms. Third, the posteriors distributions of the distancesand extinctions to individual stars are frequently extended and ex-hibit complicated forms. As a result the posterior expectations ofdistance and extinction will not transmit the full range of uncer-tainties nor the complex correlations that exist between distanceand extinction.In Sale & Magorrian (2014) we presented a new method formapping extinction from star counts that avoids these problems.Building on earlier work in Vergely et al. (2001) and Sale (2012),we used a simple physical model of Kolmogorov turbulence to im-pose spatial correlations on the density map, which prevents theformation of non-physical ‘fingers of God’ as found in the maps inMarshall et al. (2006) and Sale et al. (2014). The method avoidsthe need for spatial binning, producing (realisations of) extinctionmaps whose resolution is set naturally by the available data.Most of the three-dimensional extinction mapping proceduresmentioned above, including that of Sale & Magorrian (2014), sharethe requirement that one have some way of calculating the marginallikelihoods of distances and extinctions to individual stars. Thatis, having some observations yyy (photometry and/or spectroscopyand/or astrometry) of a single star at Galactic coordinates ( l , b ) ,they need the likelihood p ( yyy | s , l , b , A , α , β ) of the distance s and ex-tinction A to the star, in which the details of the star’s mass, age,metallicity and so on have been marginalised out assuming somegalaxy model β and set of extinction laws and isochrones α . Thepresent paper provides one way of calculating such marginalisedlikelihoods. We begin though by considering the problem of howbest to parametrize the extinction law included in α and how tocalculate the effects of extinction in a range of popular photo-metric passbands. This is the subject of section 2; tables givingthe results of our calculation are available online. Then in sec- c (cid:13) a r X i v : . [ a s t r o - ph . GA ] J a n Sale and Magorrian tion 3 we present a method for calculating the marginal likelihood p ( yyy | s , l , b , A , R , α , β ) and constructing compact, accurate fits to itsdependence on ( s , A ) . Section 4 sums up. We start by defining extinction and its relationship to the columnof dust between us and a star. Much of what we discuss in this sec-tion has previously appeared by various authors, including Golay(1974), McCall (2004), Sale et al. (2009), Stead & Hoare (2009),Bailer-Jones (2011) and Casagrande & VandenBerg (2014). Nonethe less, we repeat it here for completeness and clarity.Historically, extinction has usually been estimated by lookingat the broad-band colours of stars. If one has a star of known spec-tral type, then by comparing, say, the measured B − V colour of thestar to its expected intrinsic colour, one obtains the colour excess E ( B − V ) ≡ ( B − V ) measured − ( B − V ) intrinsic , (1)which is a direct estimate of the difference A B − A V between the B -and V -band extinctions to the star. Typically (e.g. Cardelli, Clayton& Mathis 1989; Fitzpatrick 2004) the shape of the extinction lawat optical and near–infrared wavelengths (see Figure 1 below) isassumed to depend on a single parameter R V , defined through R V ≡ A V A B − A V = A V E ( B − V ) . (2)Estimates of A V , A B then follow directly from E ( B − V ) given anassumed R V . The procedure for other bands ( X , Y ) is similar: mea-sure a colour excess E ( X − Y ) , then use an assumed extinction lawto obtain the broad-band extinctions A X and A Y .Such broad band extinctions are less than ideal for mappingdust, however. To see this, recall that the extinction in a band X toa distance s along a single line of sight is given by A X ( s ) = − . (cid:32) (cid:82) ∞ d λ F ( λ ) T X ( λ ) e − (cid:82) s ds (cid:48) κ λ ( s (cid:48) ) ρ ( s (cid:48) ) (cid:82) ∞ d λ F ( λ ) T X ( λ ) (cid:33) , (3)where ρ ( s ) is the density of dust along the line of sight, κ λ ( s ) isits wavelength-dependent opacity, F ( λ ) is the SED of the observedstar and T X ( λ ) the combination of the transmission of the filter X ,the transmission of the atmosphere, the transmission of the rest ofthe telescope, the detector efficiency and a function that charac-terises how the detector responds to incident flux . It is obviousfrom this equation that passband-based measurements of extinc-tion and reddening, such as A V and E ( B − V ) , depend not only onthe dust column between us and a star (i.e. ρ and κ ), but also on thestar’s SED (see also e.g McCall 2004; Sale et al. 2009; Bailer-Jones2011). Consequently it is possible to observe two stars of differentspectral types behind the exact same dust column and obtain dif-ferent measurements of, e.g., A V from each. It is perhaps less im-mediately obvious from equation (3) that the relationship between We note that ‘marginal likelihood’ can refer to a likelihood with some or all of the parameters of the model employed marginalised out. However, thisterminology is most frequently used in the case where all parameters havebeen marginalised, in which case the ‘marginal likelihood’ is sometimesalso called the ‘evidence’ and is employed in model selection applications.Our use of the term is distinct to this case as we only marginalise someparameters. To keep this and subsequent expressions readable, we suppress the de-pendence on the line of sight ( l , b ) in this and subsequent expressions. For most modern detectors this is a factor proportional to λ since thedetector counts incident photons (Bessell 2005). λ/ ˚ A A λ / A Figure 1.
Fitzpatrick (2004) extinction laws for ‘ R V = .
1’ (black), ‘ R V = .
6’ (red), ‘ R V = .
1’ (blue), ‘ R V = .
6’ (green), ‘ R V = .
1’ (magenta),‘ R V = .
6’ (cyan) and ‘ R V = .
1’ (yellow). All have been normalised to5495 ˚A. column density and the broad-band extinction A X is not linear, evenwhen the opacity κ λ is independent of position. Therefore, althoughquantities such as E ( B − V ) and A V are conveniently close to obser-vation, they mix together the effects of the dust column and stars’SEDs in a way that is non-trivial to disentangle: by consideringpassband-based measurements of extinction one obscures the truephysics of the ISM behind a layer of obfuscating variables.An alternative to passband-based measurements of extinctionis to consider monochromatic measurements (e.g., McCall 2004;Sale et al. 2009; Bailer-Jones 2011). The monochromatic extinctionat wavelength λ is given by A λ ( s ) = − . (cid:16) e − (cid:82) s ds κ λ ( s (cid:48) ) ρ ( s (cid:48) ) (cid:17) (4) = . (cid:90) s ds (cid:48) κ ( s (cid:48) , λ ) ρ ( s (cid:48) ) , (5)which follows directly from (3) on adopting a Dirac delta functionfor the transmission filter T X ( λ ) . It is immediately apparent that this A λ ( s ) does not depend on the SED of the observed star and thatits derivative d A λ / d s is linear in κ λ ρ . Therefore monochromaticextinction offers a much more direct view on the distribution ofdust, mediated only by variations in dust opacity.It might appear that using monochromatic extinctions wouldbe significantly more complicated than employing band-basedmeasurements. But, if working within a Bayesian framework, or in-deed with any methodology that employs a forward model, buildingin a monochromatic measure of extinction is essentially trivial: onesimply requires a model for how variations in monochromatic ex-tinction will alter observed apparent magnitudes. We will developthis model in section 2.3. The normalised form of the opacity ( κ ) dependence on wavelengthis typically referred to as the reddening or extinction law, the shapeof which depends on the dust grain size distribution, with a greaternumber of larger grains leading to a greyer extinction law (Wein-gartner & Draine 2001). There exist a number of parametrizationsof extinction laws, inferred from a range of sightlines within theGalaxy (e.g. Cardelli, Clayton & Mathis 1989; O’Donnell 1994;Fitzpatrick 2004). In Fig. 1 we show a number of those given byFitzpatrick (2004).Typically (e.g. Cardelli, Clayton & Mathis 1989; Fitzpatrick c (cid:13)000
1’ (yellow). All have been normalised to5495 ˚A. column density and the broad-band extinction A X is not linear, evenwhen the opacity κ λ is independent of position. Therefore, althoughquantities such as E ( B − V ) and A V are conveniently close to obser-vation, they mix together the effects of the dust column and stars’SEDs in a way that is non-trivial to disentangle: by consideringpassband-based measurements of extinction one obscures the truephysics of the ISM behind a layer of obfuscating variables.An alternative to passband-based measurements of extinctionis to consider monochromatic measurements (e.g., McCall 2004;Sale et al. 2009; Bailer-Jones 2011). The monochromatic extinctionat wavelength λ is given by A λ ( s ) = − . (cid:16) e − (cid:82) s ds κ λ ( s (cid:48) ) ρ ( s (cid:48) ) (cid:17) (4) = . (cid:90) s ds (cid:48) κ ( s (cid:48) , λ ) ρ ( s (cid:48) ) , (5)which follows directly from (3) on adopting a Dirac delta functionfor the transmission filter T X ( λ ) . It is immediately apparent that this A λ ( s ) does not depend on the SED of the observed star and thatits derivative d A λ / d s is linear in κ λ ρ . Therefore monochromaticextinction offers a much more direct view on the distribution ofdust, mediated only by variations in dust opacity.It might appear that using monochromatic extinctions wouldbe significantly more complicated than employing band-basedmeasurements. But, if working within a Bayesian framework, or in-deed with any methodology that employs a forward model, buildingin a monochromatic measure of extinction is essentially trivial: onesimply requires a model for how variations in monochromatic ex-tinction will alter observed apparent magnitudes. We will developthis model in section 2.3. The normalised form of the opacity ( κ ) dependence on wavelengthis typically referred to as the reddening or extinction law, the shapeof which depends on the dust grain size distribution, with a greaternumber of larger grains leading to a greyer extinction law (Wein-gartner & Draine 2001). There exist a number of parametrizationsof extinction laws, inferred from a range of sightlines within theGalaxy (e.g. Cardelli, Clayton & Mathis 1989; O’Donnell 1994;Fitzpatrick 2004). In Fig. 1 we show a number of those given byFitzpatrick (2004).Typically (e.g. Cardelli, Clayton & Mathis 1989; Fitzpatrick c (cid:13)000 , 000–000 arginal likelihoods of distances and extinctions to stars T eff / K . . . . R V Figure 2.
The R V s implied by column densities of R = .
056 dust asa function of effective temperature along the main sequence defined byStraizys & Kuriliene (1981). Different colours correspond to different ex-tinctions: A = R V de-fined in equation (2) above. This is not ideal, as the broadband ex-tinctions A B and A V that define R V depend on the SED of the starbeing observed. Consequently, as shown in Fig. 2 and by McCall(2004), one can place different sources behind the same dust col-umn and still obtain significantly different measurements of R V .Moreover, as the broadband A V and A B do not depend linearly ondust column density, the inferred R v depends also on the depth ofthe dust column in front of the star.Ideally we would like to have a dust extinction law that de-pends only on the dust’s intrinsic opacity κ λ . One possibility wouldbe to use the value of R V that one would measure if a vanishinglysmall amount of the dust were placed in front of a standard star(McCall 2004), but this is unnecessarily complicated. Instead wefollow Maiz Apell´aniz (2013) and adopt the more straightforwardquantity R ≡ A A − A , (6)where A and A are the monochromatic extinctions at5495 ˚A and 4405 ˚A respectively . This is designed to be similarto R V , but, as R is defined using monochromatic extinctions,it does not depend on the SED of the star observed and will varyalong a line of sight only if the grain size distribution and therefore κ varies.With this choice of monochromatic wavelengths, the val-ues of R for the Fitzpatrick (2004) selection of extinctionlaws are similar to the R V s they quote. For example, their ‘ R V = . , . , .
1’ curves give R = . , . , .
034 respectively.In Fig. 2 we plot the R V implied by the R = .
056 (‘ R V = . R V along the main sequence, in addition to smaller variations inresponse to increasing extinction. We note that this procedure typ-ically gives R V = . A is sometimes denoted as A (e.g Bailer-Jones 2011; Sale et al.2014), following its use in Cardelli, Clayton & Mathis (1989). given that Fitzpatrick (2004) used a sample of O,B and A stars todetermine their extinction laws Now that we have defined our extinction law, we can easily trans-form monochromatic extinction given at one wavelength to anyother wavelength. Therefore, we are free to choose the refer-ence wavelength at which monochromatic extinctions are defined.When monochromatic extinctions have been used in earlier work,the choice of wavelength has generally been made for reasonsof convenience. For example, Hanson & Bailer-Jones (2014) andSale (2012) followed Bailer-Jones (2011) in adopting “ A ”, themonochromatic extinction at 5495 ˚A, chosen to enable easy useof the Cardelli, Clayton & Mathis (1989) extinction laws, whichare anchored at this wavelength. In contrast, Sale et al. (2009) used A , the monochromatic extinction at 6250 ˚A. As this wavelengthlies near the centre of the IPHAS r band used in that study, the re-sulting measurements were less affected by variations in R .From (5) we have that dA λ ds ( s ) = . κ λ ( s ) ρ ( s ) . (7)If κ did not change along a sightline, it would be trivial to obtainthe dust column density from A λ if R , and therefore κ λ , wereknown. In reality, however, we expect that the grain size distribu-tion, and consequently R and κ , will vary along lines of sightas well as between them. Instead we can look for a wavelengthwhere κ is approximately independent of R . Examination ofthe Draine (2003) models indicates that κ λ varies only weakly withchanging dust grain distribution at around λ = A ( s ) (cid:39) . κ (cid:90) s ρ ( s (cid:48) ) ds (cid:48) , (8)where κ is the opacity at 4000 ˚A that Draine (2003) quotes as3 . × − m kg − for his R V = . λ = R = . A V (cid:39) . A + . A , (9) R V (cid:39) . R at 4000 ˚A, but is nearlyindependent of R at approximately 8000 ˚A. Therefore we ad-vise that it may prove necessary to renormalise to monochromaticextinction to a different wavelength in the future, should improveddust models that contradict those of Draine (2003) become avail-able. Changing this reference wavelength would have no effect onthe methods we propose below; the key point is that we choosea wavelength that minimises opacity variations, the actual wave-length itself is not directly important. We now examine how passband extinctions A X depend on the ex-tinction law (i.e., R ) and stellar type. We use a method similar to c (cid:13) , 000–000 Sale and Magorrian R . . . . . . c ( ) V R . . . . . . c ( ) r R . . . . . . c ( ) K R − . − . − . − . c ( ) V / c ( ) V R − . − . − . − . − . − . c ( ) r / c ( ) r R − . − . − . − . − . c ( ) K / c ( ) K Figure 3.
Examples of the dependence of c ( ) X and c ( ) X (equation 11) on R for an approximate A0V star with T eff = g = .
07 and solarmetallicity. The values in the left column are for the Johnson-Cousins V band, as defined by Bessell (1990). The centre column is for the IPHAS r -band andthe right column for the UKIDSS K -band as defined by Hewett et al. (2006). T eff . . . . c ( ) V T eff . . . . c ( ) r T eff . . . . c ( ) K T eff − . − . − . − . − . − . c ( ) V / c ( ) V T eff − . − . − . − . c ( ) r / c ( ) r T eff − . − . − . − . − . c ( ) K / c ( ) K Figure 4.
Examples of the dependence of c ( ) X and c ( ) X on T eff along the main sequence, as defined in ( T eff , log g ) by Straizys & Kuriliene (1981) for starshaving solar metallicity. The columns are the same as Fig. 3. log g . . . . c ( ) V log g . . . . c ( ) r log g . . . . c ( ) K +5 . × − Figure 5.
Examples of the dependence of c ( ) X and c ( ) X on log g . We fix T eff = (cid:13)000
Examples of the dependence of c ( ) X and c ( ) X on log g . We fix T eff = (cid:13)000 , 000–000 arginal likelihoods of distances and extinctions to stars . . . . . E ( B − V ) − . − . − . . A g − A tr u e g A tr u e g . . . . . E ( B − V ) − . − . . . . A r − A tr u e r A tr u e r . . . . . E ( B − V ) − . − . . . . . A z − A tr u e z A tr u e z . . . . . E ( B − V ) − . − . − . − . . A g − A tr u e g A tr u e g . . . . . E ( B − V ) − . − . − . . A r − A tr u e r A tr u e r . . . . . E ( B − V ) . . . A i − A tr u e z A tr u e z Figure 6.
Comparison of responses to extinction in different SDSS bands using stars of type A0V (solid curves) and M3V (dashed) as examples. Upper row:fractional errors in A g (left), A r (middle) and A z (right) obtained assuming the A X / E ( B − V ) ratios given by Schlegel, Finkbeiner & Davis (1998) (red), using A X / E ( B − V ) ratios from Schlafly & Finkbeiner (2011) (green) and by using our fit (11) (black). The solid (dashed) curves plot results for A0V (M3V) stars.Lower row: zoomed-in views of the fractional errors produced by our fits (11). that employed in Sale et al. (2009). For stars cooler than 12000 Kwe adopt the closest match from the Phoenix library of syntheticspectra (Husser et al. 2013) to the star’s SED. The Phoenix librarydoes not cover hotter stars, so, for T eff > A X , with κ λ given by the appropriate Fitz-patrick (2004) extinction law for the assumed R .Clearly one does not want to repeat this procedure every timeone seeks to determine the model colours and apparent magnitudeof some star. Casagrande & VandenBerg (2014) have suggestedprecomputing the implied passband extinctions for all combina-tions of model stars and extinction laws for a range of A . How-ever, storing all these extinctions for a reasonably dense isochronelibrary and for a decent range of R and A would be imprac-tical. Our approach is instead to use quadratic relations A X = c ( ) X A + c ( ) X A (11)to fit the dependence of A X on A in the range 0 (cid:54) A < ( T eff , log g , [ Fe / H ]) and extinctionlaws R .As one might expect, the correction coefficients c ( ) X and c ( ) X depend sensitively on the parameter R that sets the shape of theextinction law (see, e.g., Fig. 3). Varying the effective temperatureof the star is also significant, a fact that is often overlooked (e.g.,Fig. 4). The extent and form of the temperature dependence variesfrom band to band, and is governed by the position of the bandrelative to the peak of the star’s spectrum.Varying log g , [ Fe / H ] or [ α / Fe ] affects A X much less than vari-ations in T eff . For example, Fig. 5 plots the dependence of c ( ) X and c ( ) X on log g , showing that variations in the latter produce changesthat are more than an order of magnitude smaller than those causedby varying T eff . The response to changes in [ Fe / H ] or [ α / Fe ] is sig-nificantly smaller still; only in u or similar bands, where variationsin [ Fe / H ] most strongly affect spectra, can the chemical composi- tion of the star measurably affect the effect of reddening and thentypically only at a level comparable to log g .Therefore, in general we approximate c ( ) X and c ( ) X as func-tions of R and T eff only, neglecting the dependence on log g , [ Fe / H ] and [ α / Fe ] . In appendix A we provide tables of p and q for a variety of different popular photometric systems for a rangeof T eff and R . We encourage readers to adopt this calibrationwhen considering extinction for stars within our Galaxy, since itaccounts for SED and extinction law variation as well as the non-linear response of extinction in any given photometric band due toincreasing dust column.In the absence of a proper calibration, such as that in ap-pendix A, many have turned to table 6 of Schlegel, Finkbeiner &Davis (1998) to convert between extinctions in different bands. Wewarn that this table of relative extinctions was not intended to beused for stars within the Galaxy, but rather to deredden photometryof galaxies behind the relatively sparse dust columns that charac-terise high Galactic latitudes. As such their table was calibratedusing an elliptical galaxy as a source in the limit of small extinc-tions. Therefore, the calculated ratios are not suitable for detaileduse when considering stars subject to significant extinction.In Fig. 6 we show the fractional errors that arise due to as-suming parametrizations of extinction given here and those thatresult from assuming the A X / E ( B − V ) ratios given by Schlegel,Finkbeiner & Davis (1998) and Schlafly & Finkbeiner (2011). Theerrors due to the calibration we propose here are extremely small,generally less than 0.1%. In contrast, the error that arises by fol-lowing the Schlegel, Finkbeiner & Davis (1998) or Schlafly &Finkbeiner (2011) ratios is frequently on the order of 5%. Having discussed how best to model the effects of extinction, wenow turn to the problem of estimating distances and extinctionsto individual stars in situations where we are not interested in the c (cid:13) , 000–000 Sale and Magorrian details of each star’s spectral type. This problem occurs when con-structing three-dimensional extinction maps from stellar catalogues(e.g., Sale & Magorrian 2014).It is convenient to replace the distance s by the distance mod-ulus µ ≡ ( s / ) (12)and the extinction A by its logarithm, a ≡ ln A . (13)As s > A >
0, it is sensible to consider thelogarithms of both values, since both µ and a span the entire realline. In addition, the use of the distance modulus is sensible, as un-certainties on it are often approximately Gaussian. Meanwhile, theuse of a is largely motivated by the fact that in Sale & Magor-rian (2014) we place a Gaussian random field prior on it to createan extinction map.The particular problem we address is the following. Given aset of observations yyy of some star, we would like to compute themarginal likelihoodp ( yyy | µ , a , R , α , β ) = (cid:90) p ( yyy | µ , a , R , xxx , α ) p ( xxx | µ , β ) d xxx (14)of the distance modulus µ and log-extinction a to the starby marginalising the star’s intrinsic parameters xxx , which includeits mass, age, metallicity and so on. We assume a mix of stellarpopulations β , which specifies the (possibly position-dependent)prior p ( xxx | µ , β ) on xxx . When the observations yyy = yyy phot are lim-ited to the star’s photometric apparent magnitudes the likelihoodp ( yyy phot | µ , a , R , xxx , α ) is straightforward to calculate usingthe extinction model α described in Section 2. If one has indepen-dent additional data yyy other , such as a spectroscopic metallicity or atrigonometric parallax, then the likelihood becomesp ( yyy | µ , a , R , α , β )= p ( yyy phot | µ , a , R , α , β ) p ( yyy other | µ , β ) , (15)in which p ( yyy other | µ , β ) is (usually) independent of extinction and isrelatively easy to treat. In the following we ignore p ( yyy other | µ , β ) andassume that yyy = yyy phot only.The dependence of the marginalised likelihoodp ( yyy phot | µ , a , R , α , β ) on ( µ , a , R ) is usually dif-ficult to predict directly from the observed yyy phot . We mightreasonably expect that it will typically have two maxima – onethat corresponds to the star being on the main sequence, the otherto the giant branch – but the locations ( µ , a , R ) and extentof these maxima cannot be found without some exploration ofthe ( µ , a , R ) space. We use an MCMC algorithm to carryout this exploration, and then fit a simple mixture model to the ( µ , a , R ) dependence of the likelihood function.The marginal likelihood depends on a calibration α , that in-cludes the extinction calibration discussed in section 2 in additionto a set of isochrones rendered in the appropriate filter system.In this paper we employ Padova isochrones (Bressan et al. 2012),which use bolometric corrections calculated from ATLAS9 (Castelli& Kurucz 2003) model spectra. To investigate the extent of the sys-tematic error stemming from the use of
ATLAS9 derived bolometric Recall that p ( xxx | µ , β ) is actually p ( xxx | µ , l , b , β ) , as we have suppressed thedependence on the line of sight ( l , b ) . corrections, we have repeated the tests of section 3.2 but with bolo-metric corrections derived from Phoenix model spectra. We foundthat the posterior expectations of distance modulus and extinctiontypically change by (cid:46) .
02. We caution that, although this uncer-tainty may appear small, as it is systematic it will potentially havea measurable impact on extinction maps.
Although the marginalised likelihood (14) is a function of ( µ , a , R ) , it is not a probability density in these parametersand therefore cannot directly be sampled using MCMC methods.So, we begin by using Bayes’ theorem to express it asp ( yyy | µ , a , R , α , β )= p ( yyy | α , β ) (cid:82) p ( µ , a , R , xxx | yyy , α , β ) d xxx p ( µ , a , R | β ) , (16)in which the prior on ( µ , a , R ) that appears in the denomi-nator is given byp ( µ , a , R | β ) = (cid:90) p ( µ , a , R , xxx | β ) d xxx , (17)which is completely determined by our choice of galaxy model β .Now the posteriorp ( µ , a , R , xxx | yyy , α , β )= p ( yyy | µ , a , R , xxx , α , β ) p ( µ , a , R , xxx | β ) p ( yyy | α , β ) (18)that appears within the integral in the numerator of (16) can besampled using any convenient MCMC method. The likelihoodp ( yyy | µ , a , R , xxx , α , β ) is easy to calculate and the normalisingfactor p ( yyy | α , β ) is important only if we want to compare modelswith different population mixes β or dust properties α . ( s , A , R , xxx | β ) In the examples that follow we adopt a prior p ( xxx | µ , β ) on the stel-lar parameters that comes from an intentionally simple Galacticmodel β . We include in β a Salpeter IMF and a constant star for-mation history. We model metallicity variations as being Gaussianwith a standard deviation of 0.2 dex and a mean that declines by0.06 dex kpc − with Galactocentric radius, following Luck & Lam-bert (2011), normalised to solar metallicity at the solar circle. Wecould also impose a vertical metallicity gradient, but opt not to doso here since the stars we use as examples later in this section lievery close to the Galactic mid-plane.In the light of (16), we are free to use any convenient prioron the parameters µ , a and R provided only that it is suf-ficiently broad to cover plausible regions of parameter space. Weadopt a flat prior in all three parameters, but with R limited tothe range 2 . (cid:46) R (cid:46) . ( µ , a , R , xxx | β ) ∝ p ( xxx | µ , β ) p ( µ , a , R ) , p ( xxx | µ , β ) ∝ M − . N ([ M / H ] | . ( R (cid:12) − R ) , . ) , p ( µ , a , R ) ∝ (cid:40) , if 2 . (cid:54) R < . , , otherwise , (19)where M is the initial mass of the star, [ M / H ] its metallicity, R the c (cid:13) , 000–000 arginal likelihoods of distances and extinctions to stars Galactocentric radius of the star implied by the distance modulus µ and Galactic coordinates ( l , b ) , R (cid:12) the Galactocentric radius of thesun, and N ( ·|· , · ) denotes a normal distribution. There are many MCMC algorithms that could be used to samplethe pdf (18). We use the affine invariant ensemble sampler of Good-man & Weare (2010) as implemented in
EMCEE (Foreman-Mackeyet al. 2013). This algorithm employs a collection of ‘walkers’ thatexplore the parameter space. At each iteration each walker attemptsto move some distance along the vector towards another randomlychosen walker. We initialise the sampler with an array of 100 walk-ers positioned along the main sequence and red giant branch in ( T eff , log g ) space, with [ M / H ] drawn from the prior distribution, R drawn from a uniform distribution on the full range of Fitz-patrick (2004) reddening laws and µ and a chosen to be themaximum likelihood values given all the other parameters. How-ever, when running this algorithm it became clear that groups ofwalkers would occasionally become stuck in islands of low proba-bility, with the relatively high dimensionality making it difficult forthem to transition out to higher probability regions. We thereforeadopt a similar approach to Hou et al. (2012) and ‘prune’ the set ofwalkers at the end of burn-in, moving some walkers when a dispro-portionally large number are stuck in islands of low probability.Our general schema then consists of using 100 walkers, with aburn-in of 1000 iterations, of which the last 100, thinned by a fac-tor of 10, are used to facilitate the pruning. After the pruning, wethen iterate for a further 9000 iterations to obtain our final MCMCchain, thinning the chain by a factor of 10. The thinned chain typ-ically has an autocorrelation length of around 1, implying an au-tocorrelation length of roughly 10 for the unthinned chain, with atotal sample size of 90,000 and an effective sample size of roughly45,000. Although the moments of the distribution could be foundto sufficient precision with a much smaller effective sample size,capturing some of the detail in the posterior requires such a largesample. We can then perform the integration in (16) by simply ig-noring all parameters other than µ , a and R in the MCMCchain. Dividing the result by our (trivial) prior gives the desiredmarginal likelihood p ( yyy | µ , a , R , α , β ) .We demonstrate how we obtain the marginal likelihoods us-ing photometry from IPHAS (INT/WFC photometric H α sur-vey of the northern Galactic plane; Drew et al. 2005; Barentsenet al. 2014) and UKIDSS-GPS (UKIRT infrared deep sky sur-vey - Galactic plane survey; Lucas et al. 2008). In particu-lar we use a crossmatched catalogue that covers 5 (cid:48) × (cid:48) cen-tred on ( l , b ) = ( . , − . ) . We use a 1 arcsec matching ra-dius and only stars flagged as stellar in both surveys are in-cluded. We show In Fig. 7 colour-colour plots of this cata-logue. From this catalogue we select three stars to concentrate on:IPHAS2 J211210.70+482106.8, IPHAS2 J211225.40+481927.6and IPHAS2 J211223.10+481656.4, which we label as stars A,Band C respectively. These span a range of colours and apparentmagnitudes. We add, in quadrature, to the stated photometric un-certainties an additional factor of 2% to account for systematic un-certainties, such as those on the photometric zero points. This ad-ditional factor dominates the uncertainty budget for stars B and Cand makes an important contribution for star A.Two-dimensional histograms of the marginal likelihoods ob-tained for the three stars are displayed in Fig. 8. It is apparent that,as in Green et al. (2014), some exhibit complicated shapes, largelydue to to the irregular shape of the stellar locus in colour–magnitude space. In particular, star A could be either a main sequence star oron the red giant branch: from its photometry alone we are unableto make a distinction. On the other hand, qualitative examinationof the colour–magnitude diagram in Fig. 7 indicates that the starshould be on the giant branch, due to its position in a redder se-quence (Sale et al. 2009). However, this qualitative analysis hasbeen implicitly conditioned upon the photometry of all the stars inthe catalogue – we would not have been able to identify a red se-quence if we only had the photometry of star A. In contrast, thelikelihood in Fig. 8 is conditioned upon only the photometry of starA. In order to condition it on the entire photometric catalogue werequire a method such as that of Sale & Magorrian (2014) (see inparticular their equation 19), in which case the construction of theextinction map would break the degeneracy between the main se-quence and red giant branch. Both stars B and C appear too hot tobe on the red giant branch.Although we have assumed a flat prior on R , the combi-nation of optical and near-infrared photometry has enabled us tonarrow the range of possible extinction laws (see also Berry et al.2012). If our data had not constrained R our uncertainties onboth µ and a would have been increased, since R is covari-ant with both. We note that the uncertainties on R depend, toa large degree, on the number and wavelength range of the photo-metric bands employed: if, as in Berry et al. (2012), we had usedSDSS data in place of IPHAS we would have 8 bands instead of 6and the bluer coverage of the u and g bands and so should be ableto achieve more precise estimates of R . Having explored the ( µ , a , R , xxx ) posterior, we carry out themarginalisation of xxx in the numerator of (16) by simply ignoring the xxx values returned by the sampler and focusing only on the distribu-tion of the ( µ , a , R ) samples. These samples are drawn fromthe marginal likelihood function (14) weighted by the prior (17) ofour assumed galaxy model β .Although it would be possible to use the full set of samplesfrom the MCMC chain (re-weighted to account for the prior) as ourdescription of the marginalised likelihood, this is far from ideal.The chains are long. Therefore the cost of storing them is high,all the more so when one considers that, when constructing coher-ent maps of extinction or stellar density, one will typically wantto use marginalised likelihoods for many stars simultaneously. Thedata volumes can be reduced by thinning the MCMC chain (i.e.,by removing all but every n th entry). Drastic thinning would enablethe data volumes to be manageable, even with a large catalogue ofstars. The cost associated with thinning, however, is that it reducesthe ability of the chain to represent the true underlying marginalisedlikelihood function, particularly in the relatively low likelihood re-gions. This can be a key problem if the likelihood is then fed intoa hierarchical model, as in Sale & Magorrian (2014), where datafrom other stars suggests that the distance or extinction to this starmay lie in such a low probability region. For example, if the rangeof possible extinctions to a particular star were constrained by othernearby stars to a region that is only sampled by a single point in thethinned MCMC chain, the resultant marginal posterior distributionof extinction to this star will take the form of a delta function andthe uncertainty would therefore be drastically underestimated.At the other extreme, the most common solution to this prob-lem is to simply report the mean and covariance matrix of the like-lihood function. But doing this does not pass on any detailed infor- c (cid:13) , 000–000 Sale and Magorrian . . . . . . ( r − i ) r AB C . . . . . . ( r − i ) . . . . . . ( r − H α ) AB C .
00 0 .
15 0 .
30 0 .
45 0 . ( H − K ) . . . . . . ( J − H ) AB C
Figure 7.
Colour-colour diagrams in the IPHAS and UKIDSS-GPS systems of our sample catalogue. The three stars studied in detail are marked withred crosses, labelled with their corresponding letter, where: A is IPHAS2 J211210.70+482106.8, B is IPHAS2 J211225.40+481927.6 and C is IPHAS2J211223.10+481656.4, Solid lines show unreddened main sequences. Only stars that appear in all six bands and are flagged as stellar in both surveys areshown. µ . . . . . a µ . . . . . . R
10 12 14 µ − . . . . . . a
10 12 14 µ . . . . . R
11 12 13 14 µ . . . . . . a
11 12 13 14 µ . . . . . R . . . c o un t / m a x . c o un t Figure 8.
Example marginal likelihoods shown for stars A (left), B (centre) and C (right). We have binned the MCMC samples only for the purposes ofproducing histograms. µ . . . . . a µ . . . . . . R
10 12 14 µ − . . . . . . a
10 12 14 µ . . . . . R
11 12 13 14 µ . . . . . . a
11 12 13 14 µ . . . . . R . . . c o un t / m a x . c o un t Figure 9.
Examples of the Gaussian mixture approximations, plotted in histograms to match Fig. 8. The coloured ellipses show the 2- σ contours of each ofthe Gaussian mixture model components, with the width of the ellipses’ curves linearly increasing with the weight of the corresponding component in theGaussian mixture model. As in Fig. 8 star A is in the left hand column, B in the centre and C on the left. c (cid:13)000
Examples of the Gaussian mixture approximations, plotted in histograms to match Fig. 8. The coloured ellipses show the 2- σ contours of each ofthe Gaussian mixture model components, with the width of the ellipses’ curves linearly increasing with the weight of the corresponding component in theGaussian mixture model. As in Fig. 8 star A is in the left hand column, B in the centre and C on the left. c (cid:13)000 , 000–000 arginal likelihoods of distances and extinctions to stars µ c o un t s .
80 1 .
95 2 .
10 2 . a c o un t s . . . . . . R c o un t s . . . . . µ c o un t s − . . . . . . a c o un t s . . . . . . . R c o un t s
10 11 12 13 14 15 µ c o un t s .
05 1 .
20 1 .
35 1 .
50 1 . a c o un t s . . . . . . R c o un t s Figure 10.
Examples of 1D marginal likelihoods for each star shown fitted with Gaussian mixtures. The black solid line shows a histogram of the MCMCsamples. As in Fig. 8 we have only performed the binning of the MCMC samples to produce the plotted histograms. The coloured lines show the contributionof each of the components in the Gaussian mixture model, whilst the dashed black line shows the total one dimensional marginal likelihood implied by theGaussian mixture model. Once again star A is in the left hand column, B in the centre and C on the left. mation about the shape of the likelihood function, which, as demon-strated by Fig. 8, may well be somewhat irregular. In particular itwill not reveal multimodality, as might be the case if there are twopeaks in the likelihood corresponding to the observed star being onthe main sequence or on the giant branch.An alternative is to describe the likelihood function usingsome mixture of simple distributions. For example, Carrasco Kind& Brunner (2014) depict the posterior distributions of photomet-ric redshifts to galaxies using a mixture of Gaussians and Voightprofiles. We instead fit a mixture of trivariate Gaussians to themarginalised posterior p ( µ , a , R | yyy , α , β ) . As we assume aflat prior on ( µ , a , R ) , this is equivalent to fitting Gaussiansto the marginalised likelihood function p ( yyy | µ , a , R , α , β ) .So, writing θ ≡ ( µ , a , R ) , our goal is to fit a functionp ( θ | yyy , β , α ) ≈ K ∑ k = w k N ( θ | mmm k , C k ) , (20)to our MCMC sample ( θ ,..., θ N ) by adjusting the weights w k ,means mmm k and covariances C k of the Gaussians on the right-handside, along with their number K .Before explaining our procedure for carrying out the fitting,we note that using Gaussians here has the key advantage that onecan often carry out further marginalisation analytically. An exam-ple of this is given in Section 4.2 of Sale & Magorrian (2014),in which the distances and extinctions to individual stars weremarginalised in order to obtain the pdf of the parameters describ-ing the large-scale extinction distribution. Similarly, having fit thetrivariate Gaussian mixture model above, one could later decide to take the prior p ( R | β ) to be Gaussian with a mean and stan-dard deviation from e.g. Fitzpatrick & Massa (2007) and thenmarginalise R analytically to obtain the marginal likelihoodp ( yyy | µ , a , α , β ) . This new, two-dimensional marginal likelihoodwould still be expressed as a sum of Gaussians. One way of addressing the problem of fitting the Gaussian mixturemodel (20) to the MCMC sample would be by modelling the latteras a Dirichlet process mixture of Gaussians. Our goal here thoughis not to consider all possible Gaussian-mixture descriptions of theMCMC chain, but instead to obtain a single , compact, “best” de-scription of the marginalised likelihood. Generally a single Gaus-sian will not describe the marginalised posterior well, but a mixtureof two or more Gaussians will do better.A simple and robust approach is to iterate of a range of pos-sible K . For each K we use the expectation–maximization (EM)algorithm, as implemented in scikit-learn (Pedregosa et al.2011), to find the parameters ( w k , mmm k , C k ) of each of the K Gaus-sians that maximise the likelihood L K ≡ N ∏ n = K ∑ k = w k N ( θ n | mmm k , C k ) , (21)subject to the constraint that ∑ k w k =
1. The EM algorithm func-tions by introducing N × K new latent variables { z nk } that allowthe awkward product of sums in this likelihood to be rewritten as c (cid:13) , 000–000 Sale and Magorrian the easier-to-handle sum of products L K ≡ ∑ { z nk } N ∏ n = K ∏ k = [ w k N ( θ n | mmm k , C k )] z nk . (22)The new variables z nk indicate the probability that MCMC sam-ple n was drawn from Gaussian k . The algorithm proceeds by al-ternately updating the latent membership probabilities { z nk } hold-ing { w k , mmm k , C k } fixed, then, for this choice of { z nk } , finding the { w k , mmm k , C k } that maximise the likelihood. We initialise the EM runwith the means of the components given by the mean of the MCMCsample and with diagonal covariance matrices with the variance foreach parameter being the corresponding variance from the MCMCsample. The EM algorithm is then run for 100 iterations to findoptimal values of { w k , mmm k , C k } . Having obtained maximum likelihoods for K = , , ,... the ques-tion then becomes one of deciding how many Gaussians are ac-tually justified. For example, if we chose K (cid:62) N (i.e., there areas many Gaussians as there are MCMC samples), then the likeli-hood would be unbounded: simply centre one Gaussian on eachpoint from the MCMC sample and let its covariance shrink to zero.We would like to avoid fitting the shot noise in our MCMC sam-ples like this, or, more practically, requiring such a large number ofGaussians that they cause data volume problems.A natural way of comparing models with different compo-nents would be to adopt uninformative priors on { w k , mmm k , C k } andto marginalise the likelihoods L k to obtain the marginal likeli-hoods p ( { θ }| K ) for each K . These p ( { θ }| K ) could be estimatedby a variational Bayes method (see, e.g., Appendix C of Magorrian2014), but doing so would be overkill for our present purposes. Asa straightforward alternative, we instead employ the Bayesian In-formation Criterion (BIC, Schwarz et al. 1978)BIC = − L K + ( K − ) ln N , (23)where ˆ L K is the maximum likelihood of the K -component Gaus-sian mixture model, as found by the EM algorithm. The secondterm in this expression acts as a penalty on the number of compo-nents, with the ( K − ) factor accounting for the number of freeparameters in a K -component trivariate Gaussian mixture model:3 K numbers are needed to specify the means mmm k , 6 K for the sym-metric covariance matrices C k , and K − w k . Ourfavoured model is simply the one that minimises the value of BIC.We find that this minimum is typically achieved for mixtures hav-ing K ∼ R takes on values close to the cut-offsimposed by the range covered by the Fitzpatrick (2004) extinctioncurves; our marginalised likelihoods fall sharply to zero at theseextreme values, a behaviour which the Gaussian mixture has diffi-culty reproducing. However, we note that the impact of such issueswill be dramatically reduced by the imposition of any sensible prioron R . For example, one could place a simple Gaussian prioron R with a mean and variance taken from e.g. Fitzpatrick & Massa (2007). Under such a prior the probability of the problem-atic extreme values of R would be very low and so the issuesrelated to the fit would become essentially irrelevant. One of our primary goals in this paper is to find a compact de-scription of the marginal likelihood p ( yyy | µ , a , R , α , β ) . It isnatural then to ask whether our Gaussian mixture model provides amore compact summary of this function than, say, a thinned sampleof points from an MCMC chain. In the following we consider twodifferent measures of how well such fits reproduce the true marginallikelihood. One way of quantifying the fidelity of different descriptions of themarginal likelihood is by using the Kullback-Leibler (KL) diver-gence. Let P be the true marginalised likelihood and Q a fit fromeither the Gaussian mixture or the thinned MCMC sample. The di-vergence of the fit Q from the true function P is given by D KL ( P (cid:107) Q ) = (cid:90) ∞ − ∞ d θ P ( θ ) log (cid:18) P ( θ ) Q ( θ ) (cid:19) . (24)This can be recognised as the entropy of P relative to Q , a measureof how much more information there is in P than in the fitted Q .One problem with applying this is that we do not know the truemarginal likelihood function P ( θ ) : we only have discrete samples ( θ ,..., θ N ) of it from the MCMC chain. So, to construct our refer-ence P we take a very long chain of N ∼ samples and then, asa simple kernel density estimator, we replace each sample point θ n (which has density δ ( θ − θ n ) ) by a narrow Gaussian kernel centredon θ n . Then the value of the function P at any point θ is given bythe sum of the contributions from all N ∼ kernels at that point.We set the kernel width using ten-fold cross validation. Thatis, each point from the MCMC chain is assigned at random to oneof ten subsamples. Then, for a given trial kernel width, we constructa kernel density estimate using nine of the ten subsamples. We usethis kernel density estimate to calculate the log-likelihood of thepoints in the remaining sub-sample. This is then repeated for allten subsamples and the average log-likelihood found. By consider-ing a range of kernel widths we can choose an optimum value bymaximising the mean log-likelihood. Typically the kernel widthsfound by this procedure are small – on the order of 0.01 in µ , forexample – and much smaller than the bin sizes adopted in Fig. 8.Consequently, if one applies a binning to the kernel density esti-mate of the marginal likelihood to match that employed in Fig. 8,one would obtain a distribution that will closely resemble the his-tograms in Fig. 8.We use a similar procedure to reconstruct Q ( θ ) for the thinnedMCMC chains. We do so by thinning the main MCMC chain andreapplying the cross-validation procedure to each thinned chain. Fi-nally we estimate the integral in (24) using Monte Carlo integrationwith 10,000 samples drawn from P ( θ ) .We are interested in how the KL-divergences from P ( θ ) ofGaussian mixture fits and of the thinned MCMC chains scale withthe number of parameters needed to describe each fit. As discussedin section 3.2 we require ( K − ) parameters to describe a K com-ponent Gaussian mixture model, whilst the number of parametersneeded to describe a thinned MCMC chain is the dimensionality(i.e. three) multiplied by the number of samples in the chain. c (cid:13) , 000–000 arginal likelihoods of distances and extinctions to stars
10 100 1000 N D K L
10 100 1000 N D K L
10 100 1000 N D K L Figure 11.
Kullback-Leibler divergences as a function of the number of parameters required. Smaller divergences indicate a greater degree of similaritybetween the two distributions and so a more successful approximation. In black we plot D KL for thinned MCMC chains relative to the long unthinned chain,using kernel density estimates of both and in red we plot values the D KL between the Gaussian mixture approximation and the kernel density estimate of theunthinned chain, with crosses indicating different values of K running from K = K =
20. In the left plot we show values for star A, in the middle star Band star C on the right.
10 100 1000 N − − D P
10 100 1000 N − − D P
10 100 1000 N − − D P Figure 12.
Peacock (1983) distances as a function of the number of parameters required. Smaller distances indicate a greater degree of similarity between thetwo distributions and so a more successful approximation. In black we plot distances for thinned MCMC chains relative to the unthinned chain and in red weplot values the distances between the Gaussian mixture approximation and the unthinned chain. In the left plot we show values for star A, in the middle star Band star C on the right.
In Fig. 11 we compare the D KL found using the Gaussian mix-ture model approximation to those obtained using thinned MCMCchains as a function of the number of parameters required. Toachieve a given D KL , the Gaussian mixture model requires an orderof magnitude fewer points than the thinned MCMC chain.We note that D KL for the Gaussian mixture does not pass be-low ∼ .
02 for any of the three stars shown. There are a numberof sources of error and noise that will prevent a perfect agreementbetween Q and P , and so D KL = K (cid:54)
20 Gaussian components. In particularthe marginal likelihood for star A takes a more complicated formthan that of B or C (Fig. 8), which is reflected in correspondinglylarge values of D KL . In addition, we do not actually know the exactmarginal likelihood. Instead we have only a noisy kernel-densityestimate of it, which limits our ability to fit smooth functions, suchas Gaussian mixtures to it. Also, we limit the EM algorithm thatfits the Gaussian mixture to a maximum number of iterations. Con-sequently it will generally not achieve the absolute best fit. Theresulting error will be manifested in a small contribution to themeasured D KL . Despite these shortcomings of our D KL tests, weneverthless believe that it is evident that our Gaussian-mixture fitsproduce very good descriptions of the marginal likelihoods. The D KL (cid:39) .
02 achieved for stars B and C indicate that our Gaussianmixture fit differs from the marginal likelihood by, at most, ∼ With one dimensional data it is common to compare samples and/ordistributions using the the Kolmogorov-Smirnov distance, D KS ( P , Q ) ≡ sup x | P ( x ) − Q ( x ) | , (25)where P and Q are cumulative distributions, either derived directlyfrom a probability distribution, or found empirically from a sampleof points. The Kolmogorov-Smirnov distance is not immediatelyapplicable to our situation, as it is defined only for one dimensionaldistributions P ( x ) and Q ( x ) . In this one-dimensional case there areonly two possible cumulative distribution functions, either p ( x < X ) or p ( x > X ) , each of which is uniquely defined by the other, becausep ( x (cid:54) X ) = − p ( x > X ) . (26)In p (cid:62) ( x < X , x < X ,..., x p < X p ) or p ( x > X , x < X ,..., x p > X p ) and so on. For each of our p = p = P and Q , then taking the maximum such dis-tance as our measure of the “similarity” of the two functions. The multidimensional analogue of equation (26) means that any one ofthese CDFs is completely determined by the other 2 p −
1. So, there are2 p − = (cid:13) , 000–000 Sale and Magorrian
An advantage of this scheme over the KL divergence is that itcan be applied directly to the samples from MCMC chains: it avoidsthe need for kernel density estimates of either P or, in the case ofthinned MCMC chains, Q . Fig. 12 shows the results. As with theKL-divergences, for a given number of parameters the Gaussianmixture model provides a far better approximation to the marginallikelihood than a thinned MCMC chain can. We have considered how one should measure the distance and ex-tinction to individual stars for use in constructing extinction mapsof the whole galaxy. We advocate the use of monochromatic extinc-tions, since, unlike band pass measures such as A V and E ( B − V ) ,monochromatic extinctions are linear functions of the dust columndensity and are independent of the source SED. In particular wesuggest the use of A , the monochromatic extinction at 4000 ˚Abecause of its insensitivity to the dust grain size distribution.We have developed one way of calculating the marginal likeli-hood p ( ˜ yyy | µ , a , R , α , β ) by marginalising the (unknown and,for our purposes, uninteresting) fundamental parameters of the starin order to estimate the marginal likelihood. As this integration isnot possible analytically, we suggest a scheme for doing so usingMCMC methods, specifically the affine invariant ensemble samplerof Goodman & Weare (2010)We find that the resulting marginal likelihood function can bedescribed very well using a Gaussian mixture model composed ofonly K (cid:39) ( ˜ yyy | µ , a , R α , β ) is vitalwhen one is constructing maps from large catalogues of stars. An-other advantage of expressing the marginal likelihood as a sum ofGaussians is that it makes further marginalisation of any or all ofthe parameters ( µ , a , R ) straightforward. This is particularlyimportant if one models the dust density distribution as a Gaussianrandom field (Sale & Magorrian 2014).In common with Green et al. (2014), the approach adoptedin Sale & Magorrian (2014) is to split the production of three-dimensional dust maps into two distinct steps. First we estimate themarginal likelihood p ( ˜ yyy | µ , a , R , α , β ) of distance modulus µ and (log) extinction a to each star in the catalogue. Then weconstruct maps from these distances and extinctions. The methodwe present in this paper for carrrying out the first of these two stepsis very similar to the method Green et al. (2014) use for calculat-ing their posterior pdf p ( µ , A | ˜ yyy ) . The most important differences arethat we use monochromatic extinctions and we return the result ina compact multi-Gaussian form. Our sample of Galactic plane starsmeant that we could reasonably use a simple prior β on stellar dis-tances and intrinsic parameters xxx : this is easy to change for moreextended samples.The alternative to these two-step approaches would be to infersimultaneously the distance–extinction relationship and the proper-ties of all the stars that trace it (Sale 2012). The benefit of this isthat MCMC schemes operating in the extended space of the stars’intrinsic parameters and their ( µ , a , R ) would tend to avoidregions of ( µ , a , R ) that are a posteriori unlikely, reducingthe computing load. The downside is that parallelization becomesvery difficult, making it infeasible to scale up to large datasets. Incontrast, in the two-step procedure one has no way of knowing whatportions of ( µ , a , R ) parameter space are going to be impor- tant, and so it has to be explored thoroughly. But this is a smallprice to pay for the trivial parallelization opportunities.The software libraries used to obtain the results in this pa-per are available online, including a library for manipulatingisochrones and the code used to sample the marginal likelihoodand fit it with a Gaussian mixture model . ACKNOWLEDGEMENTS
The research leading to the results presented here was supportedby the United Kingdom Science Technology and Facilities Coun-cil (STFC, ST/K00106X/1), the European Research Council underthe European Unions Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 321067. JM thanks the Institutd’Astrophysique de Paris for their hospitality and Ville de Parisfor support through their “Research in Paris” programme.
REFERENCES
Bailer-Jones C. A. L., 2011, MNRAS, 411, 435Barentsen G. et al., 2014, MNRAS, 444, 3230Berry M. et al., 2012, ApJ, 757, 166Bessell M., Bloxham G., Schmidt B., Keller S., Tisserand P., Fran-cis P., 2011, PASP, 123, 789Bessell M. S., 1990, PASP, 102, 1181—, 2005, ARA&A, 43, 293Bressan A., Marigo P., Girardi L., Salasnich B., Dal Cero C.,Rubele S., Nanni A., 2012, MNRAS, 427, 127Cardelli J. A., Clayton G. C., Mathis J. S., 1989, ApJ, 345, 245Carrasco Kind M., Brunner R. J., 2014, MNRAS, 441, 3550Casagrande L., VandenBerg D. A., 2014, MNRAS, 444, 392Castelli F., Kurucz R. L., 2003, in IAU Symposium, Piskunov N.,Weiss W. W., Gray D. F., eds., pp. 20P–+Cohen M., Wheaton W. A., Megeath S. T., 2003, AJ, 126, 1090Crawford D. L., Barnes J. V., 1970, AJ, 75, 978Crawford D. L., Mander J., 1966, AJ, 71, 114di Francesco J. et al., 2010, A&A, 518, L91Draine B. T., 2003, ARA&A, 41, 241Drew J. E. et al., 2014, MNRAS, 440, 2036—, 2005, MNRAS, 362, 753Fitzpatrick E. L., 2004, in Astronomical Society of the PacificConference Series, Vol. 309, Astrophysics of Dust, Witt A. N.,Clayton G. C., Draine B. T., eds., p. 33Fitzpatrick E. L., Massa D., 2007, ApJ, 663, 320Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013,PASP, 125, 306Golay M., ed., 1974, Astrophysics and Space Science Library,Vol. 41, Introduction to astronomical photometryGoodman J., Weare J., 2010, Communications in Applied Mathe-matics and Computational Science, 5, 65Green G. M. et al., 2014, ApJ, 783, 114Hanson R. J., Bailer-Jones C. A. L., 2014, MNRAS, 438, 2938Hewett P. C., Warren S. J., Leggett S. K., Hodgkin S. T., 2006,MNRAS, 367, 454Hou F., Goodman J., Hogg D. W., Weare J., Schwab C., 2012,ApJ, 745, 198 https://github.com/stuartsale/iso_lib https://github.com/stuartsale/marg_iso c (cid:13) , 000–000 arginal likelihoods of distances and extinctions to stars Husser T.-O., Wende-von Berg S., Dreizler S., Homeier D., Rein-ers A., Barman T., Hauschildt P. H., 2013, A&A, 553, A6Lallement R., Vergely J.-L., Valette B., Puspitarini L., Eyer L.,Casagrande L., 2014, A&A, 561, A91Lucas P. W. et al., 2008, MNRAS, 391, 136Luck R. E., Lambert D. L., 2011, AJ, 142, 136Magorrian J., 2014, MNRAS, 437, 2230Maiz Apell´aniz J., 2013, in Highlights of Spanish AstrophysicsVII, Guirado J. C., Lara L. M., Quilis V., Gorgas J., eds., pp.583–589Majewski S. R., Zasowski G., Nidever D. L., 2011, ApJ, 739, 25Marshall D. J., Robin A. C., Reyl´e C., Schultheis M., Picaud S.,2006, A&A, 453, 635McCall M. L., 2004, AJ, 128, 2144Munari U., Sordo R., Castelli F., Zwitter T., 2005, A&A, 442,1127O’Donnell J. E., 1994, ApJ, 422, 158Patat F. et al., 2011, A&A, 527, A91Peacock J. A., 1983, MNRAS, 202, 615Pedregosa F. et al., 2011, Journal of Machine Learning Research,12, 2825Planck Collaboration et al., 2014, A&A, 571, A11Sale S. E., 2012, MNRAS, 427, 2119Sale S. E. et al., 2014, MNRAS, 443, 2907—, 2009, MNRAS, 392, 497Sale S. E., Magorrian J., 2014, MNRAS, 445, 256Schlafly E. F., Finkbeiner D. P., 2011, ApJ, 737, 103Schlegel D. J., Finkbeiner D. P., Davis M., 1998, ApJ, 500, 525Schwarz G., et al., 1978, The annals of statistics, 6, 461Stead J. J., Hoare M. G., 2009, MNRAS, 400, 731Straizys V., Kuriliene G., 1981, AP&SS, 80, 353Stubbs C. W., Doherty P., Cramer C., Narayan G., Brown Y. J.,Lykke K. R., Woodward J. T., Tonry J. L., 2010, ApJS, 191, 376Vergely J.-L., Freire Ferrero R., Siebert A., Valette B., 2001,A&A, 366, 1016Weingartner J. C., Draine B. T., 2001, ApJ, 548, 296
APPENDIX A: TABULATED RESPONSE TO A FOR AVARIETY OF PHOTOMETRIC BANDS
We make available with this paper a tabulation of the the coeffi-cients c ( ) X and c ( ) X of (11) for a variety of filters and the full rangeof Fitzpatrick (2004) reddening laws. We do this for SEDs along asolar metallicity main sequence, defined in ( T eff , log g ) by Straizys& Kuriliene (1981). We also include a Rayleigh-Jeans spectrum, todemonstrate the limiting behaviour for extremely hot stars.In table A1 we list the photometric systems and their con-stituent filters that we employ. For all the survey filter sets wealso employ the detector quantum efficiency curve and atmospherictransmission for the instrument and site used.We include two ‘standard’ filter sets: the Bessell (1990) UBV RI set and a Str¨omgren filter set with the uvby transmissionstaken from Crawford & Barnes (1970) and H β wide and H β narrow from Crawford & Mander (1966). We present results for these fil-ters using the INT/WFC CCD quantum efficiency and the Patatet al. (2011) model for atmospheric absorption at Cerro Paranal.In addition, for reference purposes, we also provide results for the https://github.com/stuartsale/A4000_coeffs Bessell (1990) filter set with no atmospheric absorption and a 100%efficient detector.A sample of the table of values of c ( ) X and c ( ) X is given intable A2. c (cid:13) , 000–000 Sale and Magorrian
System Filters SourceBessell
UBV RI
Bessell (1990)Str¨omgren ubvy H β narrow H β wide Crawford & Barnes (1970), Crawford & Mander (1966)2MASS
JHK s Cohen, Wheaton & Megeath (2003)Gaia G INT (IPHAS/UVEX)
Ugri H α PAN-STARRS grizy
Stubbs et al. (2010)SDSS ugriz
Skymapper uvgriz
Bessell et al. (2011)UKIDSS
ZY JHK
Hewett et al. (2006)VISTA
ZY JHK s VST ugri H α , Drew et al. (2014) Table A1.
A list of the systems and filters for which we tabulate the response to extinction. R T eff log g c ( ) J c ( ) J c ( ) H c ( ) H c ( ) Ks c ( ) Ks c ( ) U c ( ) U c ( ) B c ( ) B c ( ) V c ( ) V Table A2.
An extract from the compilation of the coefficients c ( ) X and c ( ) X of (11) for the filters listed in table A1 and the Fitzpatrick (2004) reddening laws.c (cid:13)000