Galaxy Formation: Bayesian History Matching for the Observable Universe
Statistical Science © Institute of Mathematical Statistics, 2014
Ian Vernon, Michael Goldstein and Richard Bower
Abstract.
Cosmologists at the Institute of Computational Cosmology, Durham University, have developed a state-of-the-art model of galaxy formation known as Galform, intended to contribute to our understanding of the formation, growth and subsequent evolution of galaxies in the presence of dark matter. Galform requires the specification of many input parameters and takes a significant time to complete one simulation, making comparison between the model's output and real observations of the Universe extremely challenging. This paper concerns the analysis of this problem using Bayesian emulation within an iterative history matching strategy, and represents the most detailed uncertainty analysis of a galaxy formation simulation yet performed.
Key words and phrases:
Computer models, Bayesian statistics, history matching, Bayes linear, emulation, galaxy formation.
1. INTRODUCTION
Understanding the evolution of the universe from the Big Bang to the current day is the fundamental goal of cosmology. A major part of this is the problem of structure formation: understanding the formation, growth and subsequent evolution of galaxies in the presence of dark matter. The world-leading Galform group, based at the Institute of Computational Cosmology, Durham University, has developed a state-of-the-art model of galaxy formation known as Galform. However, they face a critical problem. Galform requires the specification of many input parameters and takes a significant time to complete one simulation, making comparison between the model's output and real observations of the universe extremely challenging.

Ian Vernon is Lecturer and Michael Goldstein is Professor, Department of Mathematical Sciences, Durham University, Science Laboratories, South Road, Durham, DH1 3LE, United Kingdom, e-mail: [email protected]. Richard Bower is Professor, ICC/Department of Physics, Durham University, Science Laboratories, South Road, Durham, DH1 3LE, United Kingdom.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2014, Vol. 29, No. 1, 81–90. This reprint differs from the original in pagination and typographic detail.

Here we describe the analysis of this problem using Bayesian history matching methodology, highlighting why the problem itself can only be sensibly formulated within a subjective Bayesian context, and demonstrating the use of Bayesian emulators within an iterative history matching strategy. This work represents the most detailed uncertainty analysis of a galaxy formation simulation yet performed and, to our knowledge, the most detailed history match, with the largest number of iterations completed, for any model in the scientific literature. The methodology is widely applicable across any scientific discipline that uses computer simulations of complex physical processes.

We discuss galaxy formation in Section 2, the Bayesian history matching methodology in Section 3 and the application and results in Section 4. For a more detailed account of this ongoing project see Vernon, Goldstein and Bower (2010).
2. GALAXY FORMATION

2.1 A Universe Full of Galaxies
The night sky is full of stars. Yet the stars that are visible to the human eye are only an unimaginably tiny fraction of the stars in the universe as a whole. Equipped with telescopes, we discover that at great distances beyond our own galaxy lie millions of millions of other galaxies, each with their own populations of stars. Moreover, galaxies come in a great variety of shapes and forms. Our own Milky Way galaxy is one of the larger spiral-type galaxies. Spiral galaxies are dominated by a flat disk of stars, often with prominent spiral arms (Figure 1). With modern telescopes, it has become possible to study galaxies at greater and greater distances from Earth. Because of the finite speed of light, such distant galaxies are seen when the universe was much younger. Astronomers can use this time delay to observe the buildup and formation of galaxies.

Fig. 1. Left: the Andromeda galaxy (NASA), the closest large galaxy to our own, contains approximately 1 trillion stars. Typical output from a galaxy formation simulation showing the configuration of dark matter (middle) and baryonic stars and gas (right) (Eagle collaboration).

These observations have revealed some, at first sight, puzzling results. Explaining the tension between the prima facie theoretical expectation and the observational evidence was one of the key motivations for developing the theoretical model discussed below. The problem for current theories of galaxy formation is not so much to understand why galaxies form, but to understand why they are relatively small and few. The basic ingredients are clear (the force of gravity and radiative cooling of baryonic matter), but we are only now beginning to understand how the formation of galaxies is regulated. The surprising result is that black holes (the densest objects in the universe) appear to play a key role in this.

So how do galaxies form? Why is the universe filled with such objects? In principle, it is a straightforward consequence of the dominance of the gravitational force.
Since all matter makes a positive contribution to the gravitational force, the clumping of the universe's mass is a runaway process. As the condensations of matter become denser, they become more effective as attractors. These matter concentrations are referred to as haloes. The observational evidence shows, however, that most of this mass is not normal, "baryonic," matter (that you and I are made from) and that the universe is dominated by "cold dark matter" (CDM): massive particles that interact very weakly (possibly associated with supersymmetric extensions of the standard model of particle physics).

The CDM particles explain the collapse and growth of the gravitating dark matter haloes, but to populate these haloes with luminous galaxies, we must turn to the astrophysics of the baryonic matter. As the baryons are pulled together by the collapse of the dark matter halo, they heat up and start to resist further compression. The baryonic gas (but not the collisionless dark matter) radiates this energy and cools, leading to a runaway contraction that is only stopped by the conservation of angular momentum. The baryons form a thin, cold, spinning disk of gas. Further condensation leads to the formation of stars and black holes. In this scenario, most haloes are able to convert almost all their baryonic component into stars, but this is in direct conflict with the observed 10% baryonic conversion. The origin of this discrepancy is a key cosmological puzzle, and astronomers appeal to "feedback" to resolve it: somehow the formation of stars and black holes must inject energy that prevents further gas cooling. One of the key aims of the Galform project is to identify the feedback schemes that are needed to account for the observed universe.
Feedback greatly complicates an otherwise almost straightforward problem. In order to solve the problem from ab initio principles, we would need to model the formation of individual stars and black holes. Fortunately, we can make progress by parameterising our lack of knowledge as uncertain coefficients in formulae that summarise macroscopic effects, and then by adjusting these coefficients to provide the best description of the observed universe. For example, although we cannot derive the rate of star formation from first principles, we can include a parameter that describes the rate at which cold gas is converted to stars and then attempt to determine its plausible range of values through comparison with observations.

Table 1. The 17 input parameters that make up the vector x and associated ranges (which were converted to −1 to 1 for the analysis). Input parameters are grouped by the physical process modelled.

Input parameter x   Min     Max     Process modelled
V_hot,disk          100     550     SNe feedback
V_hot,burst         100     550
α_cool               –       –
ε_Edd               0.004    –
α_hot                –       –
f_df                 –       –
α_reheat             –       –
f_ellip              –       –
ε_⋆                  10     1000    Star formation
f_burst             0.01     –
α_⋆                  –       –
F_bh                0.001    –
p_yield             0.02     –
v_cut                20      50     Reionisation
t_disk               –       –
z_cut                –       –
f_stab              0.65    0.95    Disk stability

The Galform code used in this project represents the state of the art in this approach. It has been used to establish a very plausible model for the formation of galaxies (Bower et al. (2006)) that describes many of the observed properties of the galaxy population, as diverse as the abundance of galaxies of different masses and the history of the growth of their black holes. It also makes well-tested predictions for properties of the gas that is left over from galaxies (Bower et al. (2012)). The model combines many physical ingredients, including modules to track: the gravitational collapse and buildup of dark matter haloes; the cooling and accretion of gas; the formation of stars, stellar evolution and "feedback" from supernova explosions; galaxy mergers and instabilities in stellar disks; and the formation of black holes and the associated feedback. The modules link together to form a network of nonlinear equations that are integrated in time to trace the evolving properties of the galaxy population (see Figure 1).
Each module has associated parameters. For example, these might specify the rate at which cold gas is converted into stars, ε_⋆, or the energy generated in supernova feedback and its dependence on galaxy mass, V_hot,disk and V_hot,burst. Galform requires a total of 17 such input parameters, shown in Table 1 along with appropriate ranges elicited from the cosmologists and the physical module each parameter relates to. Exploring this 17-dimensional space is vital but extremely challenging, as Galform takes approximately 20 hours to complete a single evaluation. It also requires a detailed forcing function, the specification of the dark matter content of the universe at all times (Figure 1), provided by the Millennium simulation: a dark matter simulation that took 3 months on a supercomputer and that is not easily repeated. For this project we had access to 256 processors and could parallelise the Galform calculation into 40 sub-volumes.

Of all the outputs produced by Galform, we focus our analysis on by far the most important: the bj and K luminosity functions, which give the log number of blue or red (i.e., young or old) galaxies, respectively, per unit volume, binned by luminosity (Norberg et al. (2002)). These observed luminosity functions, shown as the black points in Figure 2, are considered to be the benchmark by which models of galaxy formation are judged. Models will be discarded if they do not match these luminosity functions alone, and determining the set of input parameters that give rise to such matches is of inherent scientific worth, as it will be highly informative regarding the various physical processes involved in galaxy formation. Determining if any matches even exist and, if so, obtaining a large set of runs that match this data for use in future analysis are major goals of the project.
Fig. 2. The bj (left) and K (right) luminosity functions giving the (log) number of galaxies per unit volume, binned by luminosity. Black points: observed data, along with 2σ intervals representing all relevant uncertainties identified in Section 4.1. The coloured lines are the Galform outputs from 993 wave 1 runs of the model, none of which were found to be acceptable. The vertical lines show the 7 outputs f(x) chosen for emulation (see Section 4.2 and Table 2).
3. BAYESIAN HISTORY MATCHING
This study concerns Bayesian history matching, to identify a collection of input parameter choices for Galform which give acceptable matches to certain measurements on the universe. History matching is a common term in the oil industry, where it is used to describe the adjustment of a model of a reservoir, by modifying the input parameter choices, until it closely reproduces the historical production and pressure profiles recorded in that reservoir. In Durham, we have developed a general Bayesian approach to this problem for oil reservoirs, expanding the use of the term from finding a single match to searching for all such matches. A good description of this work can be found in Craig et al. (1997).

This history matching methodology is part of the general Bayesian treatment of uncertainty in physical systems modelled by complex computer simulators. A good reference for this area is the website for the Managing Uncertainty in Complex Models (MUCM) project. Here, we focus on those aspects of the general methodology that are most relevant to history matching.

We want to use the Galform simulator to reproduce the observed history of the physical system. Therefore, we need to consider how good the match should be in order to be acceptable. We must recognise the limitations of the simulator as a representation of the physical system. Our models approximate and simplify both the properties of the system and the physical principles used to generate the corresponding system behaviour. Even so, the mathematical implementation is still too complex for precise solution, and so is further simplified and approximated. Add to this our uncertainty about initial conditions, boundary conditions and forcing functions for the system, and it is clear that we must assess the structural discrepancy between model outcomes, even if well chosen, and the actual physical behaviour of the system.
Our judgements about structural discrepancy determine our views about the quality of the match that we may achieve.

The general structure of the problem is as follows. We represent the simulator as a vector function, taking inputs x which represent system properties, and returning outputs f(x) which are intended to correspond to certain properties, y, of the physical system. We have observations z on y. We represent the difference between z and y by the relation

    z = y + e,    (1)

where e is the vector of random observational errors, taken to be independent of y and, typically, of each other. If f(x) was a perfect representation of the system, then we would only accept a choice x∗ as representing the system if f(x∗) = y. Because we can only compare f(x∗) with z, we would therefore require the match between f(x∗) and z to be probabilistically consistent with the relation z = f(x∗) + e.

However, because of structural discrepancy, even if we had evaluated an appropriate choice f(x∗), we would still be uncertain about the true system value, y. If we represent this residual uncertainty by the random structural discrepancy vector ε and consider ε to be independent of f(x∗), then we can write

    y = f(x∗) + ε,    (2)

where, for example, the variance of each element of ε expresses our judgement about how well the corresponding element of f(x∗) is expected to reproduce that element of the system, and the correlation between two elements of ε expresses our judgements about the similarities of the issues relating to each component of the discrepancy. We may view ε as a way of expressing the sense that we are prepared to tolerate a less than perfect match, and explore the effect of different choices for this tolerance on the space of acceptable parameter matches. [For a much more detailed treatment of the concept of model discrepancy, see Goldstein and Rougier (2009) and the accompanying discussion.]
Specification of beliefs for ε may partly be carried out by experiments on the simulator itself [e.g., by exploring the effect of perturbing the forcing function or adding some internal randomness to an internal state vector propagated over time by the model; see, e.g., Goldstein, Seheult and Vernon (2013)]. However, a large component of such specification comes from the scientifically grounded but subjective judgements of the expert.

Combining (1) and (2), we therefore consider the match acceptable if it is probabilistically consistent with the relation

    z = f(x∗) + ε + e.    (3)

Our aim is to identify the collection, χ(z), of all choices of x∗ which would give acceptable fits to historical data or, at the least, to identify a wide range of elements of χ(z). If our input parameter space is low-dimensional, and the function is very fast to evaluate, then we can find χ(z) by evaluating the function everywhere and identifying the collection of all choices x∗ consistent with (3). However, for most complex physical models, it is infeasible to evaluate the simulator at enough choices to search the input space exhaustively. Therefore, we must construct a representation of our uncertainty about the value of the simulator at each input choice for which we have not yet evaluated the simulator. This representation is termed an emulator. The emulator both suggests an approximation to the function and also contains an assessment of the likely magnitude of the error of the approximation.
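Relation (3) can be made concrete with a small numerical sketch. The toy one-dimensional function below stands in for a single simulator output (it is not Galform), and the discrepancy and error variances are assumed illustrative values; the sketch generates a synthetic observation according to (3) and checks probabilistic consistency with a three-standard-deviation criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical cheap toy "simulator" standing in for one model output.
    return 2.0 * x + 0.5 * x**2

x_star = 0.3        # a candidate acceptable input x*
var_eps = 0.05**2   # assumed structural-discrepancy variance Var(eps)
var_e = 0.02**2     # assumed observational-error variance Var(e)

# Synthetic observation generated according to z = f(x*) + eps + e.
z = f(x_star) + rng.normal(0.0, np.sqrt(var_eps)) + rng.normal(0.0, np.sqrt(var_e))

# Probabilistic consistency: |z - f(x*)| should be small relative to the
# combined uncertainty sqrt(Var(eps) + Var(e)); here a 3-sd criterion.
consistent = abs(z - f(x_star)) <= 3.0 * np.sqrt(var_eps + var_e)
```

The key point is that the tolerance is set by the sum Var(ε) + Var(e), not by the observational error alone.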
A common choice of form for emulation of component f_i is

    f_i(x) = Σ_j β_ij g_ij(x_{A_i}) + u_i(x_{A_i}) + w_i(x),    (4)

where the active variables x_{A_i} are subsets of the 17 inputs, B = {β_ij} are unknown scalars, g_ij are known deterministic functions of x_{A_i}, for example, polynomials, u_i(x_{A_i}) is a Gaussian process or, in a less fully specified version, a weakly second-order stationary stochastic process, with, for example, correlation function

    Corr(u_i(x_{A_i}), u_i(x′_{A_i})) = exp(−‖x_{A_i} − x′_{A_i}‖/θ_i),    (5)

and w_i(x) is an uncorrelated nugget. Bg(x) expresses global variation in f, while u(x) expresses local variation in f. We fit the emulators, given a collection of carefully chosen simulator evaluations, using our favourite statistical tools, guided by expert judgement. We use detailed diagnostics to check emulator validity. A good introduction to function emulation is given by O'Hagan (2006).

Using the emulator, we can obtain, for each choice of inputs x, the mean and variance, E(f(x)) and Var(f(x)). Applying relation (3), for x ∈ χ(z), gives Var(z_i − E(f_i(x))) = Var(f_i(x)) + Var(ε_i) + Var(e_i). We can therefore calculate, for each output f_i(x), the "implausibility" if we consider the value x to be a member of χ(z). This is the standardised distance between z_i and E(f_i(x)), which is

    I_(i)(x) = |z_i − E(f_i(x))| / [Var(f_i(x)) + Var(ε_i) + Var(e_i)]^{1/2}.    (6)

Large values of I_(i)(x) suggest that it is implausible that x ∈ χ(z). The implausibility calculation can be performed univariately, or by multivariate calculation over sub-vectors. The implausibilities are then combined, such as by using I_M(x) = max_i I_(i)(x), and can then be used to identify regions of x with large I_M(x) as implausible.
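To make the emulation and implausibility machinery concrete, the sketch below builds a simple one-output, one-dimensional emulator of the form (4), with a linear mean, an exponentially correlated residual as in (5) and a tiny nugget, and then evaluates the implausibility (6) over a grid. The toy function, design size, correlation length θ, residual scale σ_u and the discrepancy/error variances are all illustrative assumptions, not values from the Galform study:

```python
import numpy as np

def simulator(x):
    # Hypothetical cheap toy function standing in for one output f_i.
    return np.sin(3.0 * x) + 0.3 * x

# A space-filling design of runs over the (rescaled) input range [-1, 1].
X = np.linspace(-1.0, 1.0, 8)
F = simulator(X)

# Emulator of form (4): mean Bg(x) with g(x) = (1, x), plus residual u(x)
# with exponential correlation (5); theta and sigma_u are assumed values.
theta, sigma_u = 0.4, 0.3
G = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(G, F, rcond=None)
resid = F - G @ beta

def corr(a, b):
    return np.exp(-np.abs(a[:, None] - b[None, :]) / theta)

K = sigma_u**2 * corr(X, X) + 1e-8 * np.eye(len(X))  # tiny nugget w(x)

def emulate(x):
    """Return E(f(x)) and Var(f(x)) at new inputs x (1-D array)."""
    k = sigma_u**2 * corr(x, X)
    mean = np.column_stack([np.ones_like(x), x]) @ beta + k @ np.linalg.solve(K, resid)
    var = sigma_u**2 - np.einsum("ij,ji->i", k, np.linalg.solve(K, k.T))
    return mean, np.maximum(var, 0.0)

# Implausibility (6) against an observation z, with assumed model-discrepancy
# and observation-error variances.
z, var_eps, var_e = simulator(0.2), 0.05**2, 0.02**2
xs = np.linspace(-1.0, 1.0, 201)
mean, var = emulate(xs)
I = np.abs(z - mean) / np.sqrt(var + var_eps + var_e)

# Nonimplausible inputs under a conventional cutoff I(x) < 3.
nonimplausible = xs[I < 3.0]
```

With several outputs, one would compute an I_(i)(x) per output in this way and then combine them, for example as I_M(x) = max_i I_(i)(x).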
With this information, we can then refocus our analysis on the "nonimplausible" regions of the input space, by making more simulator runs and refitting our emulators over such subregions, and iteratively repeating the analysis. This is a form of iterative global search aimed at finding all choices of x which would give acceptable fits to historical data. We may find χ(z) is empty, which is a strong warning of problems with our simulator or with our data.

History matching may be compared with model calibration, which aims to identify the one "true" value of the input parameters x∗. Often, we will prefer to carry out a history match because either we do not believe in a unique true input value for the model or we are unsure as to whether any good choices of input parameters exist. Further, full probabilistic calibration analysis may be difficult, as, typically, χ(z) will comprise a tiny volume of the original parameter space. Therefore, even if there is an eventual intention to carry out a full probabilistic calibration, it is often good practice to history match first, in order to check the simulator and to reduce the original parameter space down to χ(z).

Finally, a note on the methods used in this study. We may carry out a full Bayes analysis, with complete joint probabilistic specification of all of the uncertain quantities in the problem. Alternatively, we may carry out a Bayes linear analysis, based just on a prior specification of the means, variances and covariances of all quantities of interest. Probability is the most common choice, but there are advantages in working with expectations, as the uncertainty specification is simpler and the analysis is much more technically straightforward.
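The refocusing loop itself can be sketched schematically. Here a crude nearest-run predictor with a distance-based variance stands in for the refitted wave emulators, purely to keep the sketch self-contained; the toy function, run budget and cutoff are assumptions, but the pattern — run, emulate, compute implausibility, cut space, repeat — is the history matching iteration:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulator(x):
    # Hypothetical cheap 2-D toy function standing in for one model output.
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2

z = 0.5              # the observation to be matched
var_other = 0.05**2  # assumed combined Var(eps) + Var(e)

# Candidate points representing the current nonimplausible region chi(z).
candidates = rng.uniform(-1.0, 1.0, size=(20000, 2))

for wave in range(3):
    # Design a set of runs inside the current nonimplausible region.
    n_runs = min(100, len(candidates))
    design = candidates[rng.choice(len(candidates), n_runs, replace=False)]
    runs = simulator(design)

    # Crude stand-in for a wave emulator: nearest-run prediction, with an
    # "emulator variance" that grows with distance from the nearest run.
    d = np.linalg.norm(candidates[:, None, :] - design[None, :, :], axis=2)
    mean = runs[d.argmin(axis=1)]
    var_em = d.min(axis=1) ** 2

    # Implausibility and refocusing: keep only the nonimplausible points.
    I = np.abs(z - mean) / np.sqrt(var_em + var_other)
    candidates = candidates[I < 3.0]
```

Because each new wave's runs concentrate in a smaller region, the stand-in emulator variance shrinks and the cuts tighten, mirroring why later waves can discard more space.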
Bayes linear analysis [for a detailed account, see Goldstein and Wooff (2007)] is based around the following updating equations for the mean and variance:

    E_z[y] = E(y) + Cov(y, z) Var(z)^− (z − E(z)),    (7)

    Var_z[y] = Var(y) − Cov(y, z) Var(z)^− Cov(z, y),    (8)

where Var(z)^− denotes a generalised inverse of Var(z). History matching fits naturally with this approach, and the Galform study has been analysed using Bayes linear methods. There are natural probabilistic counterparts, which we expect could have found similar history matches to those we discovered, but with considerably more effort in prior specification and computation.
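Equations (7) and (8) are straightforward to implement directly. The sketch below adjusts beliefs about a scalar y by an observed z, with illustrative prior moments (not values from the Galform study); a pseudoinverse plays the role of the generalised inverse Var(z)^−:

```python
import numpy as np

def bayes_linear_adjust(E_y, Var_y, E_z, Var_z, Cov_yz, z):
    """Adjusted expectation (7) and variance (8) of y given data z."""
    W = Cov_yz @ np.linalg.pinv(Var_z)   # Cov(y, z) Var(z)^-
    E_adj = E_y + W @ (z - E_z)
    Var_adj = Var_y - W @ Cov_yz.T       # using Cov(z, y) = Cov(y, z)^T
    return E_adj, Var_adj

# Illustrative second-order prior specification (assumed numbers).
E_y, Var_y = np.array([1.0]), np.array([[4.0]])
E_z, Var_z = np.array([0.8]), np.array([[1.5]])
Cov_yz = np.array([[1.0]])

E_adj, Var_adj = bayes_linear_adjust(E_y, Var_y, E_z, Var_z, Cov_yz, np.array([1.4]))
```

Note that only means, variances and covariances are specified: no distributional shapes are needed, which is what makes the analysis so much lighter than a full probabilistic one.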
4. APPLICATION TO A GALAXY FORMATION SIMULATION

4.1 Sources of Uncertainty
We now describe the application of the methodology introduced in Section 3 to the Galform model described in Section 2. In order to determine the meaning of an acceptable match, it is essential that we identify all sources of uncertainty that lie between the model output f(x) and reality y. Note that the majority of these uncertainties have been neglected or ignored in even the most detailed of previous analyses. As discussed in Section 3, the uncertainties separate into two classes: the model discrepancy ε and the observation errors e. In the case of Galform, the model discrepancy was decomposed into three uncorrelated contributions, ε = Φ_IA + Φ_DM + Φ_E, where:

Φ_IA Inactive variable uncertainty: due to coding issues, for the first three waves we could not vary all 17 parameters simultaneously. The 9 least active inputs were fixed and their effects represented by this term.

Φ_DM Dark matter uncertainty: the unknown configuration of dark matter in the universe, assessed from computer model experiments on the 40 sub-volumes.

Φ_E Subjective expert assessment of the model discrepancy between the full Galform model (all 17 inputs and correct dark matter) and the real universe. Using a detailed elicitation tool, the cosmologist was able to specify a multivariate covariance structure for Φ_E representing beliefs about the known deficiencies of the model (over/under abundance of matter and ageing rates of red/blue galaxies), leading to positive correlations between the discrepancy for bj outputs and smaller positive correlations across bj and K outputs. Our methods also incorporate a sensitivity analysis regarding the expert's assessment of Φ_E (Goldstein and Vernon (2009)).

The four contributions to the observation errors e are as follows:

Luminosity zero point error: correlated uncertainty across luminosity outputs due to the difficulty of defining a galaxy of "zero" brightness.
The k + e error: a highly correlated error on all output points due to (i) galaxies being so far away that it takes light billions of years to reach us and (ii) galaxies moving away from us so quickly that their light is redshifted.

Normalisation error: a correction for over/under population of galaxies in the local universe, using theoretical considerations of the universe on large scales.
Galaxy production error: an uncertain theoretical correction due to bright/faint galaxies being measured up to relatively large/short distances from Earth.

Note that the observations represent theory-laden data, having been heavily preprocessed prior to our analysis; hence, it would be dangerous to neglect any one of the above observational errors. All the above uncertainties are shown for the full bj luminosity function in Figure 3 (left panel), plotted as one standard deviation against luminosity [the x-axis is the same as Figure 2 (left panel)].

Fig. 3. Left panel: the standard deviation of each contribution from the various sources of uncertainty across the full range of the bj luminosity function (the x-axis is the same as Figure 2). The vertical lines represent the three bj outputs chosen for emulation in wave 1. Green line: the total uncertainty due to all contributions; this value is used for the error bars in Figure 2. The K luminosity results are similar. Top right panel: the residual standard deviation σ for waves 1 to 3; bottom right panel: the adjusted R² for waves 1 to 3, for the polynomial part Bg(x) of each emulator [equation (4)]. We fit high-dimensional cubic polynomials due to having large run numbers. The first 6 connected points are the bj outputs chosen for emulation, the later 5 the K outputs (shown as vertical lines in Figure 6, left and right panels, respectively). See Vernon, Goldstein and Bower (2010).

We proceed to emulate in iterations, or waves, as described in Section 3. In each wave we design a space-filling set of runs, choose a subset of viable outputs f_i(x) for emulation, for each output choose a subset of active inputs x_A, and then construct a Bayes linear emulator for f_i(x) using equations (4) and (5). The emulators are combined with the subjectively assessed model discrepancy and the observation errors to produce an implausibility measure I_(i)(x) for each output [equation (6)]. We then discard regions of input space x that do not satisfy cutoffs on I_M(x), I_2M(x) and I_3M(x) [the first, second and third highest I_(i)(x)]. Table 2 summarises the 4 waves that were performed. For example, in wave 1 we emulated only 7 outputs (shown as vertical dotted lines in Figure 2) and used only 5 active variables for each emulator, imposing cautious implausibility constraints on only I_2M(x) and I_3M(x) [as I_M(x) can be sensitive to inaccuracies in the emulators]. At each wave we performed 200 diagnostic runs to check emulator performance.

Table 2. Summary of the 4 waves of emulation. Col. 2: the number of model runs used to construct the emulators; col. 3: the number of outputs emulated; col. 4: the number of active variables; cols. 5–8: the implausibility thresholds; col. 9: the percentage of the parameter space deemed nonimplausible.

Wave | Runs | Outputs emul. | Active inputs | I_M | I_2M | I_3M | I_MV | % Space

Fig. 4. The top three panels give the waves 1, 2 and 3 minimised implausibility projection plots in the V_hot,disk–α_cool plane, representing I_M(x) minimised across the remaining 15 inputs. The red regions indicate high implausibility, where input points will be discarded; green/yellow: nonimplausible points. The bottom three panels give the optical depth plots, showing the fraction of the hidden 15-dimensional volume that satisfies the implausibility cutoff at that grid point.

In each new wave we perform more runs, the emulators become more accurate and the implausibility measures more informative; hence, we are able to discard more space as implausible than in the previous wave. Explicit improvement in the emulators over the first three waves is shown in Figure 3 (top right and bottom right panels). We expect this emulator improvement, as at each wave (a) there is a higher density of runs, which improves the Gaussian process part of the emulator, (b) we can choose more active inputs x_A, (c) we are emulating a smoother function, since it is defined over a smaller volume, and (d) we can hence choose more outputs to emulate. The iterative nature of the space reduction process is the main reason the history matching approach is so powerful, and is shown in Figure 4 for waves 1 to 3. The percentage of input space remaining is given in Table 2.

We performed 4 waves of history matching in order to identify the set of input parameters consistent with the luminosity function observations.
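Since the contributions to ε and e listed in Section 4.1 are treated as uncorrelated, their standard deviations combine in quadrature to give the total uncertainty curve (the green line in Figure 3). A minimal sketch, with purely illustrative per-output standard deviations (the real values are elicited or estimated per output):

```python
import numpy as np

# Assumed standard deviations for each uncorrelated contribution at one
# output point (illustrative numbers only, not values from the study).
sd = {
    "inactive_variables": 0.06,  # Phi_IA
    "dark_matter":        0.04,  # Phi_DM
    "expert_discrepancy": 0.10,  # Phi_E
    "zero_point":         0.02,
    "k_plus_e":           0.03,
    "normalisation":      0.02,
    "galaxy_production":  0.01,
}

# Uncorrelated contributions add in variance, so the total standard
# deviation is the root-sum-square of the individual ones.
total_sd = np.sqrt(sum(s**2 for s in sd.values()))
```

This total is what enters the denominator of the implausibility (6), output by output.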
Various 2D projections of the nonimplausible set of inputs at wave 4 are shown in Figure 5 (left panel), where the projections are onto the subspaces of pairs of 7 of the most interesting input parameters, out of the full 17 given in Table 1. These projections, along with higher-dimensional equivalents, provide the cosmologists with detailed insight into the behaviour of the Galform model: indeed, there was much initial surprise as to the extent of the nonimplausible region in some directions, despite it occupying only a tiny percentage of the original input space.

Fig. 5.
Left: wave 4 minimised implausibility (below diagonal) and optical depth (above diagonal) projections (see Figure 4). Right: the wave 5 runs coloured by implausibility, consistent with the left panel, now with no emulator uncertainty, confirming the wave 4 predictions.

A fifth wave of runs was then performed to obtain a large set of acceptable runs for use by the cosmologists, a major goal of the project. These wave 5 runs are shown in Figure 5 (right panel) with the same implausibility colour scale as in the left panel, but now without any emulator uncertainty. Large numbers of acceptable runs were found, and 306 runs were found to satisfy a stricter cutoff on I_M(x), superior to any matches previously found by the cosmologists. The outputs of these acceptable runs, along with those of previous waves, are shown in Figure 6. Note that the acceptable runs are good matches across all luminosities, not just at the 11 outputs chosen for emulation.
Fig. 6. Left: the bj luminosity function output for the first 500 runs of waves 1, 2 and 3, and the wave 5 acceptable runs that satisfy the strict cutoff on I_M(x). Right: the K luminosity function. The disparity at luminosity ≤ 19 between the K luminosity data and the wave 5 runs is due to the limited resolution of the dark matter simulation [see Bower et al. (2006)] and so is not considered of interest.

5. CONCLUSION

The task of finding matches between complex galaxy formation simulation output and observations of the real universe represents a fundamental challenge within cosmology. Even to define what we mean by an acceptable match requires an assessment of model discrepancy, which can only come through a, necessarily subjective, scientific judgement based on many years of experience in constructing such simulations. Therefore, this problem fits naturally into a Bayesian framework, in which we treat all of the uncertainties arising from properties of the simulator or of the data in a unified manner.

The resulting problem, of identifying matches consistent with our uncertainty measures, is extremely challenging, involving understanding the simulator's behaviour over a high-dimensional input parameter space. It is difficult to see how to proceed without the use of carefully constructed Bayesian emulators that represent our beliefs about the behaviour of the deterministic function at all points in the input space and which are fast to evaluate. These emulators are used within an iterative history matching strategy that seeks only to emulate in detail the most interesting parts of the input space, and thus provides a global search algorithm which gives a practical and tractable Bayesian solution to the problem.

We have demonstrated this solution for the galaxy formation simulator. Specifically, we have identified the regions of input space of interest, occupying only a tiny fraction of the original space.

ACKNOWLEDGEMENTS
Supported by the EPSRC, the STFC and the MUCM Basic Technology initiative.

REFERENCES
Bower, R. G., Benson, A. J. et al. (2006). The broken hierarchy of galaxy formation. Mon. Not. Roy. Astron. Soc.

Bower, R. G., Benson, A. J. et al. (2012). What shapes the galaxy mass function? Exploring the roles of supernova-driven winds and AGN. Mon. Not. Roy. Astron. Soc.

Bower, R. G., Vernon, I., Goldstein, M. et al. (2010). The parameter space of galaxy formation. Mon. Not. Roy. Astron. Soc.

Craig, P. S., Goldstein, M., Seheult, A. H. and Smith, J. A. (1997). Pressure matching for hydrocarbon reservoirs: A case study in the use of Bayes linear strategies for large computer experiments. In Case Studies in Bayesian Statistics (C. Gatsonis, J. S. Hodges, R. E. Kass, R. McCulloch, P. Rossi and N. D. Singpurwalla, eds.).

Goldstein, M. and Rougier, J. (2009). Reified Bayesian modelling and inference for physical systems (with discussion). J. Statist. Plann. Inference.

Goldstein, M., Seheult, A. and Vernon, I. (2013). Assessing model adequacy. In Environmental Modelling: Finding Simplicity in Complexity. Wiley, New York.

Goldstein, M. and Vernon, I. (2009). Bayes linear analysis of imprecision in computer models, with application to understanding the Universe. In ISIPTA'09: Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications.

Goldstein, M. and Wooff, D. (2007). Bayes Linear Statistics: Theory and Methods. Wiley, Chichester. MR2335584

Norberg, P., Cole, S. et al. (2002). The 2dF galaxy redshift survey: The bJ-band galaxy luminosity function and survey selection function. Mon. Not. Roy. Astron. Soc.

O'Hagan, A. (2006). Bayesian analysis of computer code outputs: A tutorial. Reliability Engineering and System Safety.

Vernon, I., Goldstein, M. and Bower, R. G. (2010). Galaxy formation: A Bayesian uncertainty analysis (with discussion). Bayesian Anal. 5.