Draft version May 23, 2019
Typeset using LaTeX twocolumn style in AASTeX62

Comparison of Observed Galaxy Properties with Semianalytic Model Predictions using Machine Learning

Melanie Simet,¹,² Nima Chartab Soltani, Yu Lu, and Bahram Mobasher

¹ University of California Riverside, 900 University Ave, Riverside, CA 92521, USA
² Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA
The Observatories, The Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, CA 91101, USA
ABSTRACT

With large-scale galaxy surveys, we can observe hundreds of thousands of galaxies or more, up to billions with upcoming experiments such as WFIRST, Euclid and LSST. While such surveys cannot obtain spectra for all observed galaxies, we have access to the galaxy magnitudes in color filters. This data set behaves like a high-dimensional nonlinear surface, making it an excellent target for machine learning methods. In this work, we use a lightcone of semianalytic galaxies tuned to match CANDELS observations from Lu et al. (2014) to train a set of neural networks on a set of galaxy properties (stellar mass, metallicity, and average star formation rate) using the truth values from the semianalytic catalogs. We also demonstrate the effect of adding simulated observational noise to the simulated magnitudes, and then use neural networks trained on the noisy data to predict stellar masses, metallicities, and average star formation rates for real CANDELS galaxies, comparing our results to the physical parameters obtained from SED fitting. On semianalytic galaxies alone, we are nearly competitive with template-fitting methods. For the observed CANDELS data, our results are not as accurate, with indications that this inaccuracy is due to a combination of different assumptions in template fitting and differences between the semianalytic models and the observed galaxies, particularly in the noise properties. Our results show that stellar mass, metallicity, and star formation rate can in principle be measured with neural networks at a competitive degree of accuracy and precision relative to physically motivated template-fitting methods if an appropriate training set can be obtained.
Keywords: galaxies: statistics, galaxies: fundamental parameters, methods: numerical

Corresponding author: Melanie Simet ([email protected])

1. INTRODUCTION

Over the last decade, a number of wide-area spectroscopic surveys have been performed, including WiggleZ (Drinkwater et al. 2010), BOSS (Dawson et al. 2013), GAMA (Driver et al. 2012) and zCOSMOS (Lilly et al. 2007). These were used to measure spectral line diagnostics of physical conditions in galaxies to study the evolution of galaxies with look-back time. The results from these studies were often limited by Poisson noise due to the small number of galaxies when divided in terms of their redshifts or in physical parameter bins. In recent years the scale of such surveys has increased by many orders of magnitude in terms of both depth and area coverage. The on-going Dark Energy Survey (DES; Drlica-Wagner et al. 2018) and future Large Synoptic Survey Telescope (LSST; LSST Science Collaboration et al. 2009) will generate billions of galaxies with multi-waveband photometric data. These will soon be joined by space-based surveys like Euclid (Laureijs et al. 2011) and WFIRST (Spergel et al. 2015). Given their size, it is impossible to perform spectroscopic observations of individual galaxies. Therefore, techniques must be developed to measure the physical parameters associated with galaxies in these surveys, as needed for the science programs they have been designed for.

The physical parameters of galaxies (redshift, stellar mass, star formation rate, extinction) are conventionally estimated by fitting their observed Spectral Energy Distributions (SEDs) to template SEDs that correspond to the type and class of galaxies in the observed sample. By shifting the model SEDs in redshift space, they are fitted to the observed SED. The template that provides the best fit to observations is then chosen, with its associated physical parameters assigned to the galaxy in question (Bolzonella et al. 2000; Brammer et al. 2008). A serious problem in using template-fitting methods is that the results depend on the types of galaxies used in the template models. For example, if the galaxy in question is not represented in the model templates, there will be serious uncertainties in the estimated parameters for that galaxy. Furthermore, apart from redshifts, for which a true estimate can be made (in the form of a spectroscopic redshift) and used to measure the uncertainty, measuring a true value for other physical parameters is difficult and the results often do not directly correspond to the predicted value. Moreover, template fits suffer from photometric uncertainties, the absence of full photometric coverage, and degeneracies between different parameters (Hildebrandt et al. 2010; Abdalla et al. 2011; Sánchez et al. 2014).

Recently, Machine Learning (ML) techniques have emerged as independent alternatives for measuring the physical parameters of galaxies (Sadeh et al. 2016; Ball et al. 2008; Hogan et al. 2015; Masters et al. 2015). Using a training sample with known physical parameters, they generate statistical models to predict the distribution of those parameters in a target data set. ML techniques are only applicable within the limits of the training data; any extrapolation to a different redshift or mass regime would lead to errors in the final estimate. However, a distinct advantage of ML is that one can incorporate extra information (e.g. morphology, galaxy light profile) within the algorithm. ML techniques are divided into two categories: supervised and unsupervised. In supervised ML, the input attributes (magnitudes and colors) are provided along with the output (redshift) and directly used for training in the learning process (Lima et al. 2008; Freeman et al. 2009; Gerdes et al. 2010). Here, the learning process is supervised by the input parameters. Unsupervised techniques do not use the desired output values (e.g. spectroscopic redshifts) during training; only the input attributes are used. Due to its limitations, the unsupervised approach is not frequently used.

The use of ML methods has so far been limited mainly to measuring photometric redshifts of galaxies. This is because ML algorithms can be more easily trained using spectroscopic redshifts, whereas for other parameters such true and unambiguous estimates are hard to obtain. Detailed comparisons have been performed between different methods and algorithms measuring photometric redshifts (Carrasco Kind & Brunner 2013; Dahlen et al. 2013) and masses (Mobasher et al. 2015) of galaxies. However, except in a few cases where ML techniques were used (MCC and neural net), these were mostly based on different variants of template fitting using observed or model SEDs. In particular, very few studies have performed a comparison of different ML techniques. A recent study used a Bayesian combination of photometric redshift Probability Density Functions (PDFs) from different ML methods to improve estimates of photometric redshifts (Carrasco Kind & Brunner 2014). However, no such study has been performed for other physical parameters of galaxies.

In this paper we use different ML techniques to measure the stellar masses and star formation rates of galaxies with available photometric data and photometric redshifts, using a training set of semianalytic galaxies from Lu et al. (2014). The test sample in this study is from the Hubble Space Telescope (HST) Cosmic Assembly Near-infrared Deep Legacy Survey (CANDELS) GOODS-South field. This field has imaging data in optical (ACS) and near-infrared (WFC3) wavelengths as well as Spitzer Space Telescope mid-infrared (IRAC) bands. We describe the CANDELS data and semianalytic galaxies in section 2, give an overview of the neural network ML algorithm we use in section 3, test our neural networks with a semianalytic test set in section 4, and finally apply our networks to CANDELS data in section 5, with summary and conclusions in section 6. Throughout the paper, we assume the cosmology of the simulation used to generate the mock catalog and semianalytic galaxies: Ω_M = 0.27 and Ω_Λ = 0.73, with the remaining parameters (Ω_b, h, n_s, σ_8) set to the values of the underlying Bolshoi simulation.

2. DATA

2.1. CANDELS galaxies
The Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS; Grogin et al. 2011; Koekemoer et al. 2011) is a ∼800 arcmin² survey performed using the Wide Field Camera 3 (WFC3) and Advanced Camera for Surveys (ACS) on the Hubble Space Telescope. The survey consists of five fields (GOODS-S, GOODS-N, COSMOS, EGS, and UDS). Multi-waveband photometric imaging observations were performed spanning the wavelength range from UV to mid-infrared. In two of these fields (GOODS-S and GOODS-N), deeper photometric observations over smaller areas were performed.

In this study, we use data from the GOODS-S deep field (Guo et al. 2013) covering an area of 170 arcmin². The combination of CANDELS pointings with supplementary data sets (Giavalisco et al. 2004; Riess et al. 2007; Bouwens et al. 2010; Windhorst et al. 2011) results in a multi-waveband catalog consisting of three WFC3 filters (F105W, F125W and F160W) and five ACS filters (F435W, F606W, F775W, F814W and F850LP). Any galaxy that was not detected in any of the above filters, or which was not covered by imaging in any of the filters, was excluded from the catalog. The final data set consists of 20,512 objects out of an initial 34,930.

We use the spectral energy distribution (SED) fitting code LePhare (Arnouts et al. 1999; Ilbert et al. 2006), combined with the Bruzual & Charlot (2003) stellar population synthesis models, to derive the physical properties of each galaxy (stellar mass and star formation rate). We assume an exponentially declining star formation history with nine different e-folding times τ up to 30 Gyr. The dust properties are modeled with E(B − V) varying between 0 and 1.1, assuming the Calzetti et al. (2000) dust attenuation curve. We also take into account the nebular emission line contribution as described in Ilbert et al. (2009) and assume a Chabrier (2003a) initial mass function, with lower and upper mass limits of 0.01 M_sun and 100 M_sun respectively. We consider three different metallicity values, the highest being Z = 0.02.

LePhare produces a marginalized likelihood for the stellar mass, and we use the median value of this likelihood as our stellar mass estimate. To measure the star formation rate (SFR), we use the rest-frame UV luminosity, which traces timescales of ∼100 Myr associated with continuum emission from massive, short-lived O and B type stars. We adopt the Salim et al. (2007) SFR(UV) calibration:

    log SFR = −0.4 M_UV,AB − 7.53    (1)

where M_UV,AB is the dust-corrected monochromatic absolute UV magnitude in the AB system. We measure the observed UV magnitude (M_UV,observed) using the 1600 Å flux density from the best-fit SED. The UV spectral slope (β_UV) is measured by fitting a power law of the form f_λ ∝ λ^β_UV to the rest-frame UV continuum redward of 1300 Å, with f_λ being the wavelength-dependent flux density. We dust-correct the observed M_UV assuming the Meurer et al. (1999) calibration:

    M_UV = M_UV,observed − (1.99 β_UV + 4.43)    (2)

where M_UV is the dust-corrected UV magnitude.

2.2. Semianalytic galaxies
We use mock catalogs generated from semianalytic models to mimic the CANDELS observations, as presented in Lu et al. (2014). In short, a semianalytic model takes a cosmological dark-matter-only simulation and adds baryonic components with recipes for their evolution through cosmic time, depending on the evolving properties of the dark matter host halo. This baryonic component can consist of stars, cold gas, hot gas, or a black hole, with various physical processes transferring mass from one component into another (e.g., stars can form from cold gas).

The mock catalog we use was presented in Lu et al. (2014) as the "Lu" model. It assumes heating of the gas by reionization which, in turn, limits the fraction of the baryons that collapse into the halos (Gnedin 2000; Kravtsov et al. 2004). Radiative cooling is estimated assuming the Croton et al. (2006) model, which collapses a fraction of the hot gas onto central (but not satellite) galaxies depending on the cooling timescale of the halo. As in other semianalytic models, the cold gas is assumed to be distributed in an exponential disk where stars are formed. A particular feature of the model we use here is that the star formation rate depends on the circular velocity of the host halo in addition to the more typical dependences on star-forming gas mass and disk dynamical time scale. Supernova feedback reheats the cold gas and ejects both cold and hot gas. No explicit model for black hole accretion or AGN feedback is assumed, but a halo quenching model is implemented that switches off radiative cooling above a critical halo mass. Galaxy mergers are handled by following subhalo information in the merger tree and assuming that even an unresolved subhalo will remain intact for some fraction of the dynamical friction time. A fraction of the mass in new stars is assumed to be converted into metals and is instantly recycled back into the disk (and from there into the cold and hot gas surrounding the disk, according to the above prescriptions), parameterized using a Chabrier initial mass function (Chabrier 2003b).

The merger trees for our mock catalogs were drawn from the Bolshoi N-body cosmological simulation (Klypin et al. 2011), with a volume of (250 Mpc/h)³, using 8 billion particles with a mass resolution of 1.35 × 10⁸ M_⊙/h, and 180 stored time steps. Halo finding was performed with the Rockstar code (Behroozi et al. 2013a) and merger trees were constructed using the Consistent Trees algorithm (Behroozi et al. 2013b). Lightcone halo mock catalogs are extracted from the simulation box. These lightcone catalogs mimic the five CANDELS fields, and cover the redshift range from z = 0 to z = 10. Eight realizations are generated for each of the fields. Each dark matter halo in a lightcone catalog has a unique ID; the corresponding dark matter halo merger tree branch is found in the simulation box and rooted on the halo.

The model parameters used in Lu et al. (2014) are tuned to match calibration data. The main calibration set is the stellar mass function of local galaxies from Moustakas et al. (2013). Parameters were tuned using Markov Chain Monte Carlo chains with the differential evolution algorithm (Braak 2006) and a likelihood based on a weighted χ², with a parameterization to account for incompleteness in the data at low mass. The best-fit model is then applied to each merger tree of the lightcone mocks to predict the star formation history of each galaxy hosted by every halo in the mock. The semianalytic models contain simulated magnitudes for the eight CANDELS bands described above.

We only use galaxies from the mock catalog at redshifts below z = 6. We also impose a cut such that the H-band AB magnitude is < 30, to avoid a small population of high-mass low-magnitude galaxies that are separated in parameter space from the rest of the models, and are significantly fainter than them.

In the real world, the data contain observational uncertainties. To examine how much observational error degrades the best-case performance of our neural networks, we also analyze the simulations with pseudo-"observational error" applied. We incorporate errors in the mock catalogs using the observational errors associated with CANDELS galaxies. We measure the median multiplicative flux error in bins of magnitude with width 0.5. These are then linearly interpolated to obtain a typical multiplicative uncertainty for a given magnitude in the data. We then draw a random Gaussian-distributed number δF with a scale length given by this multiplicative uncertainty, and add the term η = −2.5 log₁₀(1 + δF) to the magnitude to simulate observational errors. The simulation data perturbed in this way look similar to the CANDELS data. This pseudo-observational error is added when the catalogs are read into our neural network software, before any colors are computed or any splitting into training and validation data sets is performed. We also explore a case where we mimic the noise properties of only well-resolved galaxies, defined as galaxies with flux signal-to-noise ratios ≥ 5 in each band.

2.3. Comparison of semianalytic models with CANDELS observations
The semianalytic models have been tuned to match observations (Lu et al. 2014). Here, we compare the galaxy magnitudes in both the noise-free and noisy cases, to demonstrate the feasibility of using the semianalytic galaxies to train a neural network that is then applied to the CANDELS data. In Fig. 1, we show color-color plots, with heatmaps showing the location of both the semianalytic galaxies (with and without simulated noise) and the observed CANDELS galaxies.

Figure 1. Color-color plots for one projection of the 8-dimensional color space. The left panel contains semianalytic galaxies without noise, the center panel CANDELS measurements, and the right panel semianalytic galaxies with simulated photometric noise. The spanned color space is very similar; the vertical feature in the semianalytic plots missing from the CANDELS distribution consists of faint high-redshift galaxies below the CANDELS completeness limit.

We chose this projection of the higher-dimensional color space as a good representative of the general trend: the semianalytic galaxies without noise lie on thin manifolds, with the CANDELS galaxies being consistent with their overall trend but significantly broadened by observational noise, and the semianalytic galaxies with added noise looking similar to the CANDELS galaxies, but with some galaxies smeared outside the boundaries of the semianalytic and CANDELS distributions by the noise. In the first panel, the vertical manifold at negative i − z corresponds to the faint high-redshift galaxies described in the caption of Fig. 1.

3. MACHINE LEARNING PROCEDURE

Machine learning techniques have recently been adapted to problems in astronomy including photometric redshifts (e.g. Sánchez et al. 2014; Bilicki et al. 2017; Tanaka et al. 2018), large-scale structure (Aragon-Calvo 2018), galaxy morphologies (Domínguez Sánchez et al. 2018), and calibration factors for weak lensing measurement algorithms (Tewes et al. 2018). In this work, we use the technique known as a neural network.

The idea of a neural network is simple. The underlying motivation is that we have a data vector (say, galaxy colors), as well as a desired output (stellar mass, for example) that is a nonlinear function of that data vector, and we want to learn how to estimate the output given a new data vector. We do not know the optimal form of the function, however, and even if we did, it is likely that its complicated form would make determining the parameter values difficult. Neural networks computationally determine the important combinations of the data points, and the appropriate coefficients for those combinations, by breaking the prediction process down into a series of linear combinations of the data points, combined with nonlinear transformations of those sums to reproduce nonlinear behavior. As illustrated in Fig. 2 for a single data point, in the usual neural network setup, a set of vectors is fed to a layer of computational units called nodes.
Each node computes a weighted sum of the coordinates of each vector, and optionally applies a simple nonlinear function to the sum (which might clip negative values, for example). The output of the layer (one value for each node for each input data point) can then act as a new set of vectors, which is fed to a new layer of nodes. The final layer has only one node, and the value it computes for each data point is the prediction, y_p, that corresponds to the true value of y for that data point. The free parameters of the model produced by the network are the weights used to compute the weighted sum in each node. To compute the optimal values of these parameters (a process called training the network), iterative adjustments are made based on a comparison of the predictions y_p to the known true values y for the input data points. The iterations proceed until either the desired accuracy is reached or the network stops improving its accuracy. A more detailed overview of this technique, with additional complications, can be found in Goodfellow et al. (2016). Error estimation can be performed by training multiple networks (e.g. Bilicki et al. 2017), but here we use a simple point estimate from a single network. In its simplest form, a neural network consists of a series of matrix multiplications with a simple function applied to the outputs: for an input data point x with M features fed to a layer of N nodes, the output O_i of the i-th layer is simply

    O_i = f(W_i x + C_i)    (3)

where C_i is a scalar or vector of offsets and W_i is a matrix. The function f(x) is chosen to fit the problem at hand, while the values of the matrices W_i are numerically trained to fit the training sample as described above. By convention, this function is called a "response function" in machine learning terminology. Again, the response function is usually nonlinear: if it is linear, then the whole network produces a simple linear combination of the input data.
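Applied layer by layer, Eq. (3) is the entire forward pass. A minimal numpy sketch, with a relu-style clipping function and arbitrary placeholder weights (not trained values):

```python
import numpy as np

def relu(x):
    # Simple nonlinear response function: clip negative values to zero.
    return np.maximum(x, 0.0)

def layer(x, W, c):
    # Eq. (3): O = f(W x + C), one layer of the network.
    return relu(W @ x + c)

# Toy example: 4 input colors fed through two hidden layers and a
# final single-node linear output (all weights are placeholders).
x = np.array([0.3, -0.1, 0.8, 0.2])
W1, c1 = np.full((5, 4), 0.1), np.zeros(5)   # layer 1: 5 nodes
W2, c2 = np.full((3, 5), 0.1), np.zeros(3)   # layer 2: 3 nodes
w_out = np.full(3, 0.1)                      # final single node
y_pred = w_out @ layer(layer(x, W1, c1), W2, c2)
```

With these uniform weights every node in a layer computes the same sum, so the prediction can be checked by hand: 0.1 × 1.2 = 0.12 per node in layer 1, 0.06 per node in layer 2, and y_p = 0.018.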
The computational difficulty of neural networks comes from the complexity of training the many values of the W matrices, and particularly the difficulty of training more than one or two layers (Goodfellow et al. 2016; Lanusse et al. 2018, and references therein). The initial values of W are typically set near 1 with small random offsets in each element (Abadi et al. 2015), meaning that (unless the random number generator is seeded in the exact same way) the same neural network architecture will produce slightly different predictions for the same training set.

The necessary choices to set up a neural network include:
• The number of layers
• The number of nodes in each layer
• The response functions f_i
• The method used to train the network (i.e., to alter the weight matrices W_i after each iteration)
• The function that computes a metric distance between the predicted points and the true values (called a loss function)

We use the Google package TensorFlow (Abadi et al. 2015) to implement our neural network. TensorFlow is a highly-optimized framework designed to enable fast implementation of neural networks and other machine-learning problems. The package runs mostly compiled code to increase speed of execution, but the user interface is in Python. For this work, we use the high-level "Estimator" API for TensorFlow, which automates most of the bookkeeping necessary to set up and train a neural network. Our network has a set of three 20-node layers; we found this architecture to be complex enough to reproduce high-dimensional nonlinear manifolds in color space, while still being simple enough that the neural network training algorithm could converge on a good solution.

Figure 2. A schematic for a simple neural network. On the left, the inputs consist of N galaxy colors shown in blue boxes, plus a constant value to provide offsets. This vector of data (everything in the red rectangle labeled x) is passed to a layer of n nodes (the green circles); each node performs a weighted sum, and then performs a nonlinear transformation of the sum via f(x). The outputs of this layer of nodes, along with another constant value, form a new vector, x₁, which is itself fed to another layer of n nodes. This process repeats until we have fed the data through m layers. Finally, the outputs of the last layer, x_m, are fed to a single node that performs a weighted sum, and the value of this weighted sum is our prediction, y_p.

The response functions f_i in our network are the "relu", or rectified linear unit, function (Glorot et al. 2011):

    f(x) = x if x ≥ 0; 0 if x < 0.    (4)

This satisfies the requirement that f(x) is nonlinear in x to reproduce nonlinear behavior, while still being computationally efficient.

Our loss function is a simple squared distance between the labels y from our catalog and the predictions y_p(x) from our neural network, summed over input training points k:

    L = Σ_k (y_k − y_p(x_k))²    (5)

This is, of course, related to the χ² statistic that is more commonly used. In this case, we do not normalize by the expected values y_k or expected uncertainty; we found that doing so has the effect of focusing the training first on small values of y_k, in practice making it more difficult to find a manifold that works well for the entire parameter range.

3.1. Training the network
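The training procedure described in this subsection (a 70-30 train/validation split and mini-batch stochastic gradient descent on the squared loss of Eq. 5) can be sketched end-to-end in numpy. This toy stand-in uses plain gradient descent rather than the Adam optimizer, and a tiny synthetic data set rather than the TensorFlow Estimator machinery we actually used; all sizes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TinyNet:
    """Fully connected network: relu hidden layers, linear output node."""
    def __init__(self, sizes):                     # e.g. [4, 20, 20, 20, 1]
        self.W = [rng.normal(0.0, np.sqrt(2.0 / m), (m, n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, X):
        """Return all layer activations, input included."""
        acts = [X]
        for W, b in zip(self.W[:-1], self.b[:-1]):
            acts.append(relu(acts[-1] @ W + b))
        acts.append(acts[-1] @ self.W[-1] + self.b[-1])   # linear output
        return acts

    def sgd_step(self, X, y, lr=0.01):
        """One gradient-descent update on the mean squared loss (cf. Eq. 5)."""
        acts = self.forward(X)
        delta = (2.0 * (acts[-1][:, 0] - y) / len(y))[:, None]
        for i in range(len(self.W) - 1, -1, -1):
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                               # backpropagate through relu
                delta = (delta @ self.W[i].T) * (acts[i] > 0)
            self.W[i] -= lr * gW
            self.b[i] -= lr * gb

# Toy demonstration: 70-30 train/validation split, mini-batch training.
X = rng.normal(size=(2000, 4))
y = X[:, 0] + 0.5 * X[:, 1] ** 2                    # synthetic nonlinear target
n_train = int(0.7 * len(X))
net = TinyNet([4, 20, 20, 20, 1])

def val_loss():
    return float(np.mean((net.forward(X[n_train:])[-1][:, 0] - y[n_train:]) ** 2))

loss_before = val_loss()
for step in range(500):
    batch = rng.integers(0, n_train, size=32)       # random training subset
    net.sgd_step(X[batch], y[batch])
loss_after = val_loss()                             # should drop below loss_before
```

Evaluating the loss on the held-out 30% is what guards against overfitting: the training loss can keep decreasing while the validation loss stalls, which signals that the network has begun fitting the noise realization of the training set.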
To train the network, we split the data into two sets: a set used to train the coefficients of the network, and a set used to evaluate how well the network is reproducing the data. The two sets must be different in order to avoid overfitting, the problem where the network reproduces not only the average trends in the data, but the specific noise realization of the data set used for training (Goodfellow et al. 2016). We use a 70-30 split, with 70% of the data used to train the coefficients (the "training set") and 30% used to check its performance at intervals (the "validation set").

We use stochastic gradient descent to train the network, in which we iteratively compute the predictions of the network on a random subset of our training data (Goodfellow et al. 2016). In our case, we find that using 10,000 points per step works well, with a check against the validation set every 500 steps (5 × 10⁶ total training data points passed through the network). We use the Adam optimizer (Kingma & Ba 2014) to update the coefficients after every step. Briefly, the Adam algorithm involves an adaptive learning rate computed from moments of derivatives of the loss function, Eq. (5), so that the coefficients change quickly when the gradient of the loss function is high, and only slowly as the loss function approaches a (local) minimum. We use the default parameter settings for the Adam optimizer implementation within TensorFlow. We explored changing the learning rate α, which controls how fast or slow the coefficients change for a given value of the loss function, but accuracy and precision decreased as we moved away from the default value.

4. APPLICATION TO SEMIANALYTIC GALAXIES

In our semianalytic catalog, each mock galaxy is associated with four physical parameters: stellar mass M_star, redshift z, stellar metallicity Z_star, and average star formation rate SFR. There has been good progress in using machine learning techniques to measure photometric redshifts (Salvato et al. 2018), so we concentrate on measuring the other parameters here.

We train networks using 5 possible sets of input data columns. We will use the bold letters in parentheses as shorthand in tables throughout the paper.
• Galaxy magnitudes (m)
• Galaxy pairwise colors (B − V, V − i, etc.) (c)
• Galaxy magnitudes and pairwise colors (mc)
• All galaxy colors (B − i in addition to B − V and V − i, etc.) (C)
• Galaxy magnitudes and all galaxy colors (mC)

Machine learning algorithms generally respond to information in different ways than the deterministic model-fitting methods more commonly used in astronomy.
If the output is sensitive to a particular combination of data points (such as a color formed from two non-adjacent filters), then it is often more numerically efficient to give the algorithm this combination, even if the magnitudes are available to the algorithm to be subtracted internally. To some extent this is also dependent upon the training we are doing: we are, in some sense, optimizing the inputs for the architecture of our network,¹ in addition to optimizing the network that uses those inputs. A different set of layer and node parameters, or a different learning rate, for example, might respond differently to the choice of input values.

¹ Other quantities in the catalog, for example the dark matter halo mass, will correlate with galaxy light properties because they correlate with, for example, the stellar mass; for this work, we consider only the direct correlations, not such indirect correlations, which pick up both additional parameters like the stellar-halo mass connection and additional noise.

Figure 3. The loss function for the test set over the first 100 training iterations (500 steps per iteration) in neural networks trained to reproduce stellar mass. Letter codes are different input column choices: m, magnitudes; c, pairwise colors; C, all colors. Training the network is a numerical optimization problem and does not always proceed monotonically.

To quantify how well a network performs, in addition to the loss function, we report the mean bias, uncertainty, and 3σ outlier rate, computed from the parameter

    b_k = y_k − y_p(x_k)    (6)

where, as above, y_k is the true value of the galaxy property for data point k, and y_p(x_k) is the predicted value for the galaxy property computed using the vector x_k. We report ⟨b⟩, σ_b = (⟨b²⟩ − ⟨b⟩²)^(1/2), and the fraction of data points with |(b_k − ⟨b⟩)/σ_b| > 3.

4.1. Basic predictions
4.1.1. Stellar mass
Table 1 shows that we achieve the best performance when our input data include galaxy magnitudes as well as the set of all colors (not just pairwise colors). The fact that using all colors optimizes the relation may be specific to our neural network setup: the non-pairwise colors can be computed from the pairwise colors, and another architecture might discover this relation.
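The summary statistics reported in our tables (Eq. 6 and the definitions in section 4) are straightforward to compute. A sketch, exercised here on a synthetic Gaussian toy catalog rather than our actual network output:

```python
import numpy as np

def performance_metrics(y_true, y_pred):
    """Mean bias, scatter, and 3-sigma outlier fraction, following Eq. (6)."""
    b = y_true - y_pred                                   # per-galaxy bias b_k
    mean_bias = b.mean()                                  # <b>
    sigma_b = np.sqrt(np.mean(b**2) - mean_bias**2)       # (<b^2> - <b>^2)^(1/2)
    outlier_frac = np.mean(np.abs((b - mean_bias) / sigma_b) > 3)
    return mean_bias, sigma_b, outlier_frac

# Toy check: predictions offset by -0.02 dex with 0.1 dex Gaussian scatter
# should recover bias ~ 0.02, uncertainty ~ 0.1, and a ~0.3% outlier rate.
rng = np.random.default_rng(1)
truth = rng.uniform(8.0, 12.0, 100_000)
pred = truth - 0.02 + 0.1 * rng.normal(size=truth.size)
bias, sigma, frac = performance_metrics(truth, pred)
```

Note that the outlier rate is defined relative to the measured scatter σ_b, so for purely Gaussian errors it approaches the fixed Gaussian expectation of about 0.27% regardless of how noisy the predictions are; larger values indicate heavier-than-Gaussian tails.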
Noise level   Input columns   Number of steps   Loss    Average bias   Uncertainty   3σ outlier rate
None          m               100000            0.157   −0.055         0.147         0.013
None          c               100000            0.115    0.003         0.115         0.016
None          C               100000            0.120   −0.021         0.119         0.018
None          mc              100000            0.071   −0.025         0.066         0.0147
None          mC              100000            0.057   −0.011         0.056         …
5σ            mC              10000             0.124    0.007         0.124         0.012
Full noise    mC              10000             0.204   −0.019         0.203         0.018

Table 1. Summary of the performance of the neural networks for log M_star. Input column codes are: m, magnitudes; c, pairwise colors; C, all colors. The best-performing set of input columns for the noise-free case is the mC row. See section 4 for more details on column choices and performance metrics.

In Fig. 3, we show the value of the loss function as a function of the number of training steps for the different sets of input columns. The numerical nature of the algorithm is obvious from the non-monotonic behavior and sharp jumps in some of the lines. The magnitude-only and color-only networks, which did not perform as well for stellar mass, asymptote to relatively high values of the loss relatively early on; 100 iterations in, however, the magnitude-plus-color networks are still improving.

In the absence of noise, our performance on log M_star is reasonable for individual galaxies, with overall biases of a few percent or less and uncertainty of the order of 10 per cent, as shown in Table 1. The overall distribution is well reproduced by eye, and likely reproduced well enough for scientific applications, although distribution similarity tests such as the K-S test indicate that the distributions are statistically different. The error as a function of galaxy parameter in Fig. 4 demonstrates that we can train the networks well where there is a high density of points, but we have increased errors towards the tails of the distribution.

We now compare our results to stellar mass measurements of mock catalogs. Mobasher et al. (2015) performed a comprehensive study to estimate uncertainties and sources of bias in measuring physical parameters of galaxies. A number of different tests were performed, based on different initial parameters and codes. Here we adopt TEST-2A as our benchmark, as it is closest to our sample.
TEST-2A used semianalytic models with a diversity of star formation histories and other parameters, and, as with our semianalytic models, used measurements in CANDELS filter bands as their data set. We note that TEST-2A used more CANDELS bands than our measurement here: 13 instead of 8, including U-band, K-band, and Spitzer infrared data in addition to the Hubble optical and near-infrared bands we use, and excluding the F814W (ACS) filter that we use. Comparing to the distribution of biases and uncertainties returned by the individual methods in Mobasher et al. (2015), we find that we easily improve on the bias and have competitive uncertainty in our measurements. However, we expect to have lower bias: different assumptions about initial mass functions, star formation rates, etc. were found by Mobasher et al. (2015) to be a source of systematic offsets between different codes, adding a constant bias to every galaxy, whereas we implicitly assume the same initial mass function and star formation rate since our samples are drawn from the same mock catalog. Our templates also have varying amounts of dust and varying metallicities, the lack of which has also been found to bias stellar mass estimates (Mitchell et al. 2013).

We repeat the analysis with pseudo-observational error, either matched to the full uncertainty distribution or matched to the uncertainty distribution of galaxies which have at least a 5σ-detected flux measurement in each band. We show only the best-performing input column set from the noise-free case, after checking that this column choice still performs best when error is added. This additional simulated observational noise causes an increase in bias and a large increase in uncertainty, visible in the right-hand plots of Fig. 4. However, the average bias of −0.019 dex is still smaller by an order of magnitude than any of the biases from Mobasher et al. (2015) (with the same caveat about correct implicit assumptions), and the uncertainty is only a little worse than the maximum uncertainty from that comparison: 0.203 dex instead of 0.183 dex.

4.1.2. Metallicity
As with stellar mass, we find good performance from our metallicity-predicting neural networks, with small biases and order 10 per cent uncertainty, as shown in Table 2 and Fig. 5. The overall predicted distribution is somewhat more skewed relative to the original distribution than in the stellar mass case, though again we perform well where the density of training points is high.

Figure 4. Neural network performance reproducing stellar mass on the validation set of the simulation galaxies. Left panels are noise-free, right panels have simulated observational noise. Top panels: prediction error relative to the true values as a function of predicted stellar mass. Performance is shown in bins of size ∆ log M_star = 0.1, and the median and the 1-, 2-, and 3σ percentile contours are shown as the dark blue line and the three lighter blue regions, respectively. Outliers are shown as light-blue crosses. The white line is zero bias. The bias is low in the intermediate range of stellar masses, where most of the training points are, but increases at low and high stellar mass. Bottom panels: the histograms of predicted and actual stellar mass, which are similar in shape for the noise-free case, though the predicted stellar mass peaks at a slightly lower value than the truth. The predicted distribution is significantly skewed in the noisy case relative to the expected distribution.

Noise level | Input columns | Number of steps | Average loss | Bias | Uncertainty | 3σ outlier rate
None | m | 100000 | 0.092 | 0.002 | 0.092 | 0.013
None | c | 100000 | 0.075 | 0.010 | 0.074 | 0.015
None | C | 100000 | 0.… | −0.001 | 0.064 | 0.…
None | mc | 100000 | 0.070 | 0.007 | 0.069 | 0.014
None | mC | 100000 | 0.065 | 0.017 | 0.062 | 0.013
5σ | C | 10000 | 0.149 | −0.008 | 0.149 | 0.011
Full noise | C | 10000 | 0.204 | −0.035 | 0.201 | 0.015

Table 2. Summary of the performance of the neural networks for log Z_star. Input column codes are: m, magnitudes; c, pairwise colors; C, all colors. The best-performing metric for the noise-free case is highlighted in bold text. See Section 4 for more details on column choices and performance metrics.

As with stellar mass, we perform worse when noise is included, but not by a significant amount: an increase in bias by a factor of 2 and in uncertainty by a factor of 3. This uncertainty, 0.2 dex in the full-noise case, is larger than that obtained by the convolutional neural network machinery of Wu & Boada (2018), who achieved 0.08 dex uncertainty, albeit using substantially more data (3-color 128x128 pixel cutouts, not eight measured magnitudes) and brighter, lower-redshift galaxies (brighter than 25th magnitude). We obtain a result similar to theirs in that low-metallicity galaxies have systematically high metallicity predictions in the presence of noise.

This good performance on metallicity requires some discussion. Metallicity is typically the hardest of these properties to measure by traditional methods, so it is surprising that we are able to do well. The dominant component of our performance is driven by the relationship between metallicity and stellar mass. There is a strong relationship between these quantities in data and in simulation (Lu et al. 2014, and references therein), and in practice the fact that we can fit stellar mass well means that most of our prediction of metallicity comes from the ability of the network to learn the M_star-Z_star connection and to predict M_star from the photometry, as can be seen from the similar shapes of the bias-parameter curves in the noisy right-hand columns of Fig. 4 and Fig. 5.
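Most of this mass-driven component can be emulated by a baseline that predicts Z_star from M_star alone: a quadratic fit to the log M_star-log Z_star relation, of the kind used below in the Fig. 6 comparison. The sketch here is a minimal pure-Python version of such a fit; the function names and any example values are our own illustrative choices, not the paper's actual catalog or code.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    # Sums of powers of x (x^0 through x^4) fill the normal-equation matrix
    # A = X^T X for the design matrix with columns [x^2, x, 1].
    s = [sum(xi ** k for xi in x) for k in range(5)]
    A = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    v = [sum((xi ** 2) * yi for xi, yi in zip(x, y)),   # X^T y
         sum(xi * yi for xi, yi in zip(x, y)),
         sum(y)]
    # Gaussian elimination with partial pivoting on the 3x3 system A * coeffs = v.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        v[i], v[p] = v[p], v[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 3):
                A[r][c] -= f * A[i][c]
            v[r] -= f * v[i]
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):   # back substitution
        coeffs[i] = (v[i] - sum(A[i][c] * coeffs[c] for c in range(i + 1, 3))) / A[i][i]
    return coeffs  # [a, b, c]

def predict_z_from_mass(log_mstar, coeffs):
    """Baseline metallicity prediction from the mean mass-metallicity relation."""
    a, b, c = coeffs
    return a * log_mstar ** 2 + b * log_mstar + c
```

Subtracting such a baseline prediction from both the network output and the truth gives the per-galaxy improvement plotted in Fig. 6.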
That cannot be the full picture (metallicity performs best when we use colors alone, while stellar mass performs best when we use both magnitudes and colors), but it is a significant contributor to our good results.

Figure 5.
Neural network performance reproducing stellar metallicity on the validation set of the simulation galaxies. Left panels are noise-free, right panels have simulated observational noise. Top panels: prediction error relative to the true values as a function of predicted stellar metallicity. Performance is shown in bins of size ∆ log Z_star = 0.1, and the median and the 1-, 2-, and 3σ percentile contours are shown as the dark blue line and the three lighter blue regions, respectively. Outliers are shown as light-blue crosses. The white line is zero bias. As with stellar mass, we perform well for metallicities in the middle of the metallicity range, where the density of training points is high, but we see increased bias and, here, increased uncertainty for points with low or high metallicity. Bottom panels: the histograms of predicted and actual stellar metallicity. Here, it is clear that the predicted metallicities have a narrower distribution than the true metallicities even in the noise-free case.

Figure 6.
A 2-D distribution showing the improvement in metallicity estimation made by our neural network, relative to a simple model that uses the mean M_star-Z_star relation. Points above the white line at 0 have a neural network prediction that is closer to the true value of Z_star than the simple model prediction. The bulk of the points are above 0, indicating that the neural network is improving on the simple model, but the amplitude of the improvement indicates that the bulk of the correlation is being driven by the M_star-Z_star relation and not by an independent measurement of metallicity.

To illustrate this, we fit a simple quadratic equation to the log M_star-log Z_star relationship in the simulations to predict a metallicity for each galaxy assuming no noise, and then we check how much improvement we get from the noise-free neural network when compared to this simple M_star-based prediction. Fig. 6 shows a 2-D histogram of this improvement, with points above 0 being an improvement on the polynomial fit and points below 0 having an increased bias. The bulk of our neural network predictions do improve on the simple model, by 1-2σ, but given that our data span more than 2 orders of magnitude, an improvement of 0.1-0.2 dex indicates that the simple stellar mass-metallicity relationship can explain a good fraction of our predicted metallicities. The additional improvement is also suggested by the fact that the metallicity network performs better without magnitudes, while the stellar mass network performs better with them, meaning that the metallicity network must be learning different information than stellar mass alone. Still, this effect suggests that, for any method, if metallicity is not well constrained by the data, using a stellar mass-based prior with appropriate uncertainty may produce adequate results.

4.1.3. Star formation rate
For the star formation rate, we have slightly higher bias and uncertainty. The bias of −0.053 dex corresponds to a 12 percent error in linear space, visible in the histogram of Fig. 7 as the blue prediction histogram peaking slightly higher than the true distribution in red. Much of the uncertainty is contributed by points with very low star formation rates. Fig. 7 shows the performance as a function of predicted star formation rate as well as a histogram of true and predicted values.

4.2. Redshift effects

Noise level | Input columns | Number of steps | Average loss | Bias | Uncertainty | 3σ outlier rate
None | m | 100000 | 0.297 | −0.063 | 0.291 | 0.016
None | c | 100000 | 0.178 | −0.014 | 0.178 | 0.015
None | C | 100000 | 0.179 | −0.004 | 0.179 | 0.016
None | mc | 100000 | 0.142 | 0.019 | 0.141 | 0.014
None | mC | 100000 | 0.150 | 0.053 | 0.141 | 0.…
5σ | mC | 10000 | 0.213 | −0.043 | 0.227 | 0.018
Full noise | mC | 10000 | 0.296 | −0.088 | 0.304 | 0.023

Table 3. Summary of the performance of the neural networks for log SFR. Input column codes are: m, magnitudes; c, pairwise colors; C, all colors. The best-performing metric for the noise-free case is highlighted in bold text. See Section 4 for more details on column choices and performance metrics.

Figure 7.
Neural network performance reproducing average star formation rate on the validation set of the simulation galaxies. Left panels are noise-free, right panels have simulated observational noise. Top panels: prediction error relative to the true values as a function of predicted average star formation rate. As in the above figures, the median and the 1-, 2-, and 3σ percentile contours are shown as the dark blue line and the three lighter blue regions, respectively. Outliers are shown as light-blue crosses. The white line is zero bias. The large uncertainty in average star formation rate can be seen clearly, with an order of magnitude of uncertainty for the least star-forming galaxies. A bias is visible even in the well-constrained regions near the peak of the star formation rate distribution. Bottom panels: the histograms of predicted and actual star formation rate. The histograms are not markedly different, although the difference is significant given the number of objects in each bin.

Property | Input columns | Number of steps | Average loss | Bias | Uncertainty | ∆|bias| | ∆uncertainty
M_star | mCz | 100000 | 0.046 | −0.010 | 0.045 | 0.001 | 0.011
Z_star | Cz | 100000 | 0.078 | 0.009 | 0.077 | −0.… | −0.…
SFR | mCz | 100000 | 0.136 | −0.004 | 0.136 | −0.049 | 0.…

Table 4. Summary of the performance of the neural network for different predictions when redshift is included as an input column. Input column codes are as in Table 3. We show only the best-performing input column set from the case without using redshifts, to examine the improvement redshift provides. See Section 4 for more details on column choices and performance metrics.

We note that the value of bias for star formation rate, −0.…

5. PERFORMANCE ON CANDELS DATA

We need to explore whether networks trained on semianalytic models can be applied to observed galaxy data. We use the CANDELS data from the GOODS-S field described in Section 2.1, with the physical parameters measured through template fitting. From Mobasher et al. (2015), we know that different template-fitting methods can result in differences in the stellar masses of 0.2 dex. This gives an estimate of the accuracy we should aim for in our estimate of the stellar mass by this neural network technique.

We show the performance of the networks for all three noise cases (no noise, noise similar to galaxies with >5σ detections, and noise similar to the full CANDELS data set) in Table 5. The neural networks trained on semianalytic data without observational noise do not reproduce the data well. This is not surprising: the power of neural networks is that they reproduce high-dimensional nonlinear structure, so perturbing the data by some error has nonlinear effects on the prediction. As we add noise, the performance on the data improves, even though the performance on the mock catalogs worsens. Interestingly, we sometimes obtain lower bias or uncertainty when the method is applied to CANDELS data than we did on the simulations. This is because the simulations contain a larger population of faint galaxies than the CANDELS data, and the increased relative noise in fainter galaxies decreases the accuracy and precision of our predictions.
We show the performance of the full-noise case for two quantities (M_star and SFR, as the template-fitting outputs did not include metallicity) in Fig. 8. The fact that the networks perform better for noisier data indicates that some subtle difference between the simulation and the real galaxies is a likely culprit for some of our bias and uncertainty in the real data. Adding simulated noise, as we have done here, is not a perfect solution, since selection cuts to match the CANDELS data were applied to the noise-free data.

The differences between the neural network predictions and the template-fitting predictions for the stellar mass are within our expectation of 0.2 dex for differences between template-fitting methods with different assumptions. However, the star formation rate numbers are significantly more discrepant, although even that difference is likely explained by a combination of different assumptions and differences in the data, rather than by differences in the data alone. The large uncertainty in both measurements is likely also due to a combination of different template-fitting assumptions and different data; we cannot tell which effect is dominant from our measurements alone.

6. SUMMARY AND DISCUSSION

We train neural networks on semianalytic models to predict from photometric data three galaxy properties of scientific interest: stellar mass, stellar metallicity, and average star formation rate. In the absence of noise (the best-case scenario), we achieve excellent accuracy and precision on all properties, indicating that the mapping between galaxy properties and galaxy photometry has low enough noise and few enough degeneracies to be a good target for neural networks.
Injecting artificial photometric noise degrades the performance on a reserved sample of semianalytic galaxies, but allows the algorithm to perform better on real data from CANDELS, which contains noise.

Our accuracy and precision show that semianalytic galaxies can be used to train neural networks that can produce photometric stellar masses, metallicities, and star formation rates, and that these galaxy properties are accessible targets for machine learning. However, our performance on noisy simulated data is not yet fully competitive with mature template-fitting and machine-learning methods from the literature. There are several likely reasons for this discrepancy. First, many of these methods have been under development for years and have reached a level of complexity not matched by our simple implementation here. Second, because our model includes many more free parameters, we may be trading off precision (a narrower set of outputs generated by a template) for accuracy: our mean bias is lower for stellar mass, as we mentioned above, but our uncertainty is higher. Third, as mentioned in Section 4.1.2, we have used a wide range of fluxes and redshifts, a calibration set that may be more ambitious than some of the data sets we are comparing to. Still, as a first-step implementation of neural networks on this data, we have achieved good accuracy in predicting all galaxy properties.
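The artificial-noise step referred to above can be sketched as follows: perturb each noise-free model magnitude by a Gaussian draw whose width is itself drawn from an empirical per-band uncertainty distribution, so that the mock photometry inherits the error properties of the observed catalog. Drawing the error bar uniformly from a list of sample error bars is an illustrative assumption on our part; the paper matches either the full CANDELS uncertainty distribution or that of 5σ-detected galaxies.

```python
import random

def add_simulated_noise(magnitudes, uncertainty_samples, rng=random):
    """Perturb noise-free magnitudes with Gaussian errors whose widths are
    drawn from an empirical uncertainty distribution (one list per band)."""
    noisy = []
    for mag, band_sigmas in zip(magnitudes, uncertainty_samples):
        sigma = rng.choice(band_sigmas)        # a plausible error bar for this band
        noisy.append(mag + rng.gauss(0.0, sigma))  # perturb by that error
    return noisy

# Toy demonstration with 8 bands, mimicking the 8 CANDELS filters used here.
random.seed(42)
clean = [24.1, 23.8, 23.5, 23.3, 23.2, 23.0, 22.9, 22.8]  # hypothetical magnitudes
sigmas = [[0.05, 0.1, 0.2]] * 8                           # hypothetical error-bar samples
noisy = add_simulated_noise(clean, sigmas)
```

Training on many such noisy realizations, rather than on the noise-free magnitudes, is what allows the network to tolerate the error properties of the real catalog.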
Property | Input columns | Number of steps | Noise property | Average bias | Uncertainty | 3σ outlier rate
M_star | mC | 10000 | No noise | 0.270 | 0.950 | 0.014
M_star | mC | 10000 | 5σ cut | 0.208 | 0.581 | 0.029
M_star | mC | 10000 | Full noise | 0.149 | 0.559 | 0.023
SFR | mC | 10000 | No noise | −0.06 | 2.14 | 0.019
SFR | mC | 10000 | 5σ cut | −0.591 | 1.43 | 0.029
SFR | mC | 10000 | Full noise | −0.605 | 1.37 | 0.031

Table 5. Summary of the performance of the neural network trained on semianalytic galaxies with simulated photometric noise and applied to CANDELS data for different predictions. Input column codes are as in Table 3.
Figure 8.
Summary statistics of performance of the neural network trained on semianalytic galaxies with simulated photometric noise and applied to CANDELS data. In each subfigure, the top panel shows the prediction error relative to the true values as a function of the predicted value for the galaxy property, and the bottom panel shows a histogram of the distributions of the galaxy property in the data and in our predictions. As before, in the top panel, the blue bars are the 1-, 2-, and 3σ percentile contours, while the median is in dark blue and the white line is zero error. The left-hand pair is stellar mass, and the right-hand pair is average star formation rate. Because the distributions are not symmetric about the median, the median biases shown in this figure are not equal to the mean biases from Table 5.

The performance on real CANDELS data is not as accurate or precise as comparison data sets from the literature (Mobasher et al. 2015). Some of that is driven by our decreased accuracy in the presence of noise even for the semianalytic galaxies, although the fact that the galaxies in CANDELS are brighter, on average, than our semianalytic galaxies means that in some cases we do better on the average CANDELS galaxy than we do on the average semianalytic galaxy. Also, we note that the photometry in the semianalytic catalog was designed to match the CANDELS observed distributions, but since the CANDELS data are noisy and the semianalytic galaxies are not, we expect some mismatch between the locations of the underlying noise-free manifolds. Additionally, we do not expect perfect agreement, since the template-fitting results rely on other parameters (such as redshift and metallicity) that may differ from the values in our semianalytic catalogue; these differences in assumptions explain some of our observed discrepancies.

Interestingly, our results show that adding redshift information to our training sample results in either minor improvement or in degradation of accuracy and precision, indicating that the networks are capable of learning the relevant mapping between color, redshift, and other galaxy properties without explicit redshift information. This indicates that competitive accuracy and precision can be achieved even for galaxies that lack spectroscopic redshifts, greatly increasing the available sample sizes for future studies.

We are able to obtain good accuracy and precision for stellar metallicities, which is traditionally the most difficult to measure of the galaxy properties. This accuracy and precision is driven by a strong relationship between metallicity and stellar mass. We improve on the accuracy obtained by simply relying on a stellar mass-metallicity relation, but the improvement is of order 0.1 dex, compared to the more than two orders of magnitude range in the parameter space, indicating that the predictive accuracy is dominated by the stellar mass-metallicity relation. This suggests the use of a strong stellar mass-metallicity prior when trying to obtain stellar metallicities from low-resolution data.

Future work will be needed to develop the machine learning architecture to a higher level of complexity and precision in order to be competitive in accuracy with currently existing methods.
However, our work shows that simple machine learning methods that do not know about the physics of galaxy formation and evolution can in principle reproduce galaxy property measurements with high accuracy, as long as the training set and the data of interest are consistent with one another and the machine learning is carefully trained to handle degeneracies and noise in the data. This allows for not only the use of semianalytic models as a training set for galaxy property measurement, but also a multiplication of template-fitting efforts: for example, if template-fitted stellar masses are available for a representative spectroscopic subset of a large photometric survey, then machine learning is a computationally efficient way to extend those template-fitting results to the larger photometric data set. As we have shown here, machine learning methods can learn even complicated relationships between galaxy properties and photometry measurements drawn from a small number of filters, indicating that scientifically interesting galaxy properties can be measured with reasonable computational time on future datasets from large-scale photometric surveys.

Support for this work was provided by the University of California Riverside Office of Research and Economic Development through the FIELDS NASA-MIRO program. A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. We would like to acknowledge Dritan Kodra for the updated CANDELS photometric redshift catalog.

REFERENCES

…, preprint (arXiv:1804.00816)
Arnouts S., Cristiani S., Moscardini L., Matarrese S., Lucchin F., Fontana A., Giallongo E., 1999, MNRAS, 310, 540
Ball N. M., Brunner R. J., Myers A. D., Strand N. E., Alberts S. L., Tcheng D., 2008, ApJ, 683, 12
Behroozi P. S., Wechsler R. H., Wu H.-Y., 2013a, ApJ, 762, 109
Behroozi P. S., Wechsler R. H., Wu H.-Y., Busha M. T., Klypin A. A., Primack J. R., 2013b, ApJ, 763, 18
Bilicki M., et al., 2017, preprint (arXiv:1709.04205)
Bolzonella M., Miralles J.-M., Pelló R., 2000, A&A, 363, 476
Bouwens R. J., et al., 2010, ApJL, 709, L133
Braak C. J. F. T., 2006, Statistics and Computing, 16, 239
Brammer G. B., van Dokkum P. G., Coppi P., 2008, ApJ, 686, 1503
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Calzetti D., Armus L., Bohlin R. C., Kinney A. L., Koornneef J., Storchi-Bergmann T., 2000, ApJ, 533, 682
Carrasco Kind M., Brunner R. J., 2013, MNRAS, 432, 1483
Carrasco Kind M., Brunner R. J., 2014, MNRAS, 442, 3380
Chabrier G., 2003a, PASP, 115, 763
Chabrier G., 2003b, PASP, 115, 763
Croton D. J., et al., 2006, MNRAS, 365, 11
Dahlen T., et al., 2013, ApJ, 775, 93
Dawson K. S., et al., 2013, AJ, 145, 10
Domínguez Sánchez H., Huertas-Company M., Bernardi M., Tuccillo D., Fischer J. L., 2018, MNRAS, 476, 3661
Drinkwater M. J., et al., 2010, MNRAS, 401, 1429
Driver S. P., et al., 2012, MNRAS, 427, 3244
Drlica-Wagner A., et al., 2018, ApJS, 235, 33
Freeman P. E., Newman J. A., Lee A. B., Richards J. W., Schafer C. M., 2009, MNRAS, 398, 2012
Gerdes D. W., Sypniewski A. J., McKay T. A., Hao J., Weis M. R., Wechsler R. H., Busha M. T., 2010, ApJ, 715, 823
Giavalisco M., et al., 2004, ApJL, 600, L93
Glorot X., Bordes A., Bengio Y., 2011, in Gordon G., Dunson D., Dudík M., eds, Proceedings of Machine Learning Research Vol. 15, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. PMLR, Fort Lauderdale, FL, USA, pp 315-323, http://proceedings.mlr.press/v15/glorot11a.html
Gnedin N. Y., 2000, ApJ, 542, 535
Goodfellow I., Bengio Y., Courville A., 2016, Deep Learning. MIT Press
Grogin N. A., et al., 2011, ApJS, 197, 35
Guo Y., et al., 2013, ApJS, 207, 24
Hildebrandt H., et al., 2010, A&A, 523, A31
Hogan R., Fairbairn M., Seeburn N., 2015, MNRAS, 449, 2040
Ilbert O., et al., 2006, A&A, 457, 841
Ilbert O., et al., 2009, ApJ, 690, 1236
Kingma D. P., Ba J., 2014, CoRR, abs/1412.6980
Klypin A. A., Trujillo-Gomez S., Primack J., 2011, ApJ, 740, 102
Koekemoer A. M., et al., 2011, ApJS, 197, 36
Kravtsov A. V., Gnedin O. Y., Klypin A. A., 2004, ApJ, 609, 482
LSST Science Collaboration et al., 2009, preprint (arXiv:0912.0201)
Lanusse F., Ma Q., Li N., Collett T. E., Li C.-L., Ravanbakhsh S., Mandelbaum R., Póczos B., 2018, MNRAS, 473, 3895
Laureijs R., et al., 2011, preprint (arXiv:1110.3193)
Lilly S. J., et al., 2007, ApJS, 172, 70
Lima M., Cunha C. E., Oyaizu H., Frieman J., Lin H., Sheldon E. S., 2008, MNRAS, 390, 118
Lu Y., et al., 2014, ApJ, 795, 123
Masters D., et al., 2015, ApJ, 813, 53
Meurer G. R., Heckman T. M., Calzetti D., 1999, ApJ, 521, 64
Mitchell P. D., Lacey C. G., Baugh C. M., Cole S., 2013, MNRAS, 435, 87
Mobasher B., et al., 2015, ApJ, 808, 101
Moustakas J., et al., 2013, ApJ, 767, 50
Riess A. G., et al., 2007, ApJ, 659, 98
Sadeh I., Abdalla F. B., Lahav O., 2016, PASP, 128, 104502
Salim S., et al., 2007, ApJS, 173, 267
Salvato M., Ilbert O., Hoyle B., 2018, Nature Astronomy
Sánchez C., et al., 2014, MNRAS, 445, 1482
Spergel D., et al., 2015, preprint (arXiv:1503.03757)
Tanaka M., et al., 2018, PASJ, 70, S9
Tewes M., Kuntzer T., Nakajima R., Courbin F., Hildebrandt H., Schrabback T., 2018, preprint (arXiv:1807.02120)
Windhorst R. A., et al., 2011, ApJS, 193, 27
Wu J. F., Boada S., 2018, preprint (arXiv:1810.12913)