Uncertain Photometric Redshifts with Deep Learning Methods
arXiv [astro-ph.IM] — Astroinformatics, Proceedings IAU Symposium No. 325, 2016, A.C. Editor, B.D. Editor & C.E. Editor, eds.
A. D’Isanto†
Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
email: [email protected]
Abstract.
The need for accurate photometric redshift estimation is a topic of fundamental importance in Astronomy, due to the necessity of efficiently obtaining redshift information without the need of spectroscopic analysis. We propose a method for determining accurate multi-modal photo-z probability density functions (PDFs) using Mixture Density Networks (MDN) and Deep Convolutional Networks (DCN). A comparison with a Random Forest (RF) is performed.
Keywords. techniques: photometric, galaxies: distances and redshifts, methods: data analysis, surveys, (galaxies:) quasars: general
1. Introduction
The determination of distances to astronomical objects through redshift has acquired in recent years an increasing importance, playing a fundamental role in cosmological research. In fact, it is well known that redshift is a fundamental step of the cosmic distance ladder. Redshift is traditionally obtained through spectroscopic analysis, but due to long integration times and costly instrumentation requirements, it is not possible to measure it for all objects. Therefore, a convenient alternative is the estimation of photometric redshifts, i.e. estimates based on photometry alone. However, the uncertainty of such an approach is much higher than the measurement errors obtained from spectroscopy. For this reason, the astronomical community has focused on the uncertainty quantification of redshift estimates through probability density functions (PDFs), instead of using simple point estimates. In this work we propose two neural network models based on Mixture Density Networks (MDN) (Bishop 1994). The first architecture is a deep MDN, designed to use photometric features as inputs and to generate PDFs. The second architecture combines a Deep Convolutional Network (DCN) (LeCun et al. 1998) with an MDN, with the purpose of obtaining photo-z PDFs based on images as input. We will show that this approach achieves better predictions due to its use of image data, which, in contrast to pre-defined features, allows capturing more details of the objects. We compare the results with a commonly used tool in the related literature, the Random Forest (RF) (Breiman 2001).
2. Deep learning algorithms
In the next two subsections we give a description of the deep learning algorithms used for the experiments.
† The author gratefully acknowledges the support of the Klaus Tschira Foundation.
Table 1. DCMDN architecture.
2.1. Mixture Density Network
A Mixture Density Network (Bishop 1994) is the combination of a feed-forward neural network and a Gaussian mixture model. The outputs of the network parametrize the Gaussian mixture $p(\theta \mid x) = \sum_{j=1}^{n} \omega_j \, \mathcal{N}(\mu_j, \sigma_j)$, i.e. they define the means, variances, and weights. Thus the MDN produces a multi-modal PDF, flexible enough to represent the multi-modal behavior expected for photo-z. The means, variances, and weights are obtained from the outputs $z$ of the network:

$$\mu_j = z_j^{\mu}, \qquad \sigma_j = \exp\left(z_j^{\sigma}\right), \qquad \omega_j = \frac{\exp\left(z_j^{\omega}\right)}{\sum_{i=1}^{n} \exp\left(z_i^{\omega}\right)}. \eqno(2.1)$$

Normally the MDN uses the negative log-likelihood as loss function, but in this work we use the continuous ranked probability score (CRPS) (Gneiting et al. 2005) instead. This yields a trained MDN which produces PDFs that are both well calibrated and sharp, as measured by the CRPS and explained in detail in Polsterer et al. (2016).

2.2. Deep Convolutional Network
A Deep Convolutional Network is a model in which several convolutional and sub-sampling layers are coupled with a fully-connected network. This architecture is particularly meant to learn from raw image data. In our case, we want to estimate redshifts directly from images, without the need to extract photometric features, so we couple a DCN with an MDN in order to produce photo-z PDFs directly from SDSS images. We alternate convolutional and pooling layers to generate feature maps and a hierarchically compressed representation of the input data. Thereby the extraction of the feature maps is done automatically by the network. The obtained feature maps are then taken as inputs for the fully-connected part, which produces a multi-modal predictive density for photo-z. We choose a modified version of the LeNet-5 architecture (LeCun et al. 1998), properly coupled with the presented MDN (see Section 2.1), obtaining what we call a Deep Convolutional Mixture Density Network (DCMDN). Tab. 1 shows the architecture of the DCMDN used for the experiments, designed to run on GPUs, using a cluster equipped with Nvidia Titan X cards.
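The output-layer transformation of eq. (2.1), shared by the MDN and the DCMDN, can be sketched in a few lines of numpy. This is a minimal illustration; the function names and the two-component toy values are ours, not part of the published code:

```python
import numpy as np

def mixture_params(z_mu, z_sigma, z_omega):
    """Map raw network outputs to Gaussian-mixture parameters as in eq. (2.1):
    means are taken directly, sigmas via exp() to ensure positivity,
    and weights via a softmax so they are positive and sum to one."""
    mu = z_mu
    sigma = np.exp(z_sigma)
    e = np.exp(z_omega - np.max(z_omega))  # shift for numerical stability
    omega = e / e.sum()
    return mu, sigma, omega

def mixture_pdf(theta, mu, sigma, omega):
    """Evaluate the multi-modal predictive density p(theta | x)."""
    comps = np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return float(np.sum(omega * comps))

# Toy two-component example: raw network outputs for one object
mu, sigma, omega = mixture_params(np.array([0.5, 2.0]),
                                  np.array([0.0, -1.0]),
                                  np.array([1.0, 0.0]))
```

The exponential and softmax guarantee valid mixture parameters for any raw network outputs, which is what makes the parametrization trainable by gradient descent.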
3. Experiments and Analysis
The data we use for the experiments are taken from the Sloan Digital Sky Survey Quasar Catalog V (Richards et al. 2010), based on the 7th data release of the Sloan Digital Sky Survey (SDSS), consisting of 105,783 spectroscopically confirmed quasars in a redshift range between 0.065 and 5.46. For the experiments we use a random subsample of 50,000 patterns. For each pattern we take the five ugriz magnitudes as input features and the respective images in the same bands. Finally, we compare the performances of MDN and DCMDN with the widely used RF.

The RF, in its original architecture, is not meant to produce PDFs. In order to obtain a distribution, we first collect the predictions $z_{t,n}$ of each individual decision tree $t$ in the forest, for every $n$-th data item. We take $T = 256$ trees in the forest and define the PDF for the RF by fitting a mixture of 5 Gaussian components to the outputs, $p(\theta \mid x) = \sum_{j=1}^{5} \omega_j \, \mathcal{N}(\theta \mid \mu_j, \sigma_j)$, as described in Section 2.1 for the MDN.

[Figure 1: three panels, for the Random Forest (CRPS = 0.1992), the Mixture Density Network (CRPS = 0.2143), and the DCMDN (CRPS = 0.1812); each shows estimated vs. spectroscopic redshift as a summed probability density, the PIT histogram, and the histogram of individual CRPS values.]
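The RF-to-PDF construction described above can be sketched with scikit-learn. This is an assumption-laden sketch: the toy data are synthetic stand-ins for the 15-dimensional feature vectors, and `rf_pdf` is our hypothetical helper name, not the paper's code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the 15-dimensional features and the target redshift
X = rng.normal(size=(500, 15))
z = np.abs(X[:, 0] + 0.3 * rng.normal(size=500))

# T = 256 trees, as in the text
forest = RandomForestRegressor(n_estimators=256, random_state=0).fit(X, z)

def rf_pdf(x, n_components=5):
    """Collect the individual tree predictions z_{t,n} for one object and
    fit a 5-component Gaussian mixture to them, defining the RF's PDF."""
    preds = np.array([t.predict(x.reshape(1, -1))[0] for t in forest.estimators_])
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(preds.reshape(-1, 1))
    return gmm  # gmm.means_, gmm.covariances_, gmm.weights_ define the mixture

pdf = rf_pdf(X[0])
```

Fitting a mixture rather than, say, a histogram puts the RF output in the same functional form as the MDN and DCMDN predictions, so the three models can be scored with the same CRPS machinery.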
Figure 1.
Results of the predictions obtained with the MDN and the DCMDN, compared with the RF results. For each experiment, three plots are given. The upper plots compare the spectroscopic redshift with the predictive distributions produced by the models, where the color indicates the summed probability density of the distributions. In the two lower plots, the histogram of the PIT values and the histogram of the individual CRPS values are shown. The mean CRPS value is also given.
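The two diagnostics in Fig. 1 can be estimated directly from samples drawn from a predictive PDF. A minimal sketch, using the standard sample-based CRPS estimator (the function names are ours):

```python
import numpy as np

def pit_value(samples, z_true):
    """PIT: the predictive CDF evaluated at the true redshift.
    A flat histogram of PIT values over the test set indicates calibration."""
    return float(np.mean(samples <= z_true))

def crps_sample(samples, z_true):
    """Sample-based CRPS estimator: E|X - z| - 0.5 * E|X - X'|,
    where X, X' are independent draws from the predictive PDF."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - z_true))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term1 - term2)
```

Lower CRPS is better; it rewards predictive distributions that are both sharp and centered on the true redshift, which is why it also serves as the MDN training loss in Section 2.1.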
For the RF and the MDN we use as input the 5 magnitudes plus all the possible color combinations, obtaining a 15-dimensional feature vector. The generated training and test sets both contain 25,000 patterns. The DCMDN is trained on the images, which are obtained using the Hierarchical Progressive Surveys data partitioning format (Fernique et al. 2015) and performing a proper cutout on the client side, in order to obtain the desired dimensions (28x28 pixels). Each pattern is originally a stack of 5 images in the ugriz filters, where every pixel is converted from flux units to luptitudes (Lupton et al. 1999). As done with the usual features, we additionally form the color images from the ugriz images by taking all possible pairwise differences, thus obtaining a stack of 15 images; every object/pattern is then represented by a tensor of dimensions 15x28x28. In order to have a rotationally invariant network, we perform data augmentation, taking rotations of each image at 0, 90, 180 and 270 degrees. By doing so, we obtain a training set of 100,000 images, a validation set of 50,000 images and a test set of 50,000 images. Dropout is also applied to limit overfitting.

The results of the experiments are reported in Fig. 1. Following Polsterer et al. (2016), we use two statistical tools: the CRPS as a score function, and the probability integral transform (PIT) histogram (Gneiting et al. 2005), in order to obtain a visual estimation of the quality of the produced PDFs. In the RF experiment, the model reaches a CRPS of 0.20 and the PIT shows a bit of overdispersion. The performance of the MDN is a bit worse than the RF in terms of the CRPS, with a score of 0.21, but it exhibits a better calibrated PIT. Using the DCMDN architecture we achieve the best results in terms of the CRPS, with a score of 0.19. The resulting PIT is acceptable, although it still shows some underdispersion. The reason for the better overall performance of the DCMDN is that the feature-based approach uses only a fraction of the available information: in the process of feature extraction a lot of information gets lost. Instead, using images, the DCMDN is able to automatically determine thousands of features, leading to a better prediction of the photo-z PDFs.
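The construction of the 15-channel input tensors and the rotational data augmentation described above can be sketched in numpy (shapes as in the text; the helper names are ours):

```python
import numpy as np

def color_stack(ugriz):
    """From a 5x28x28 stack of ugriz images (in luptitudes), form all 10
    pairwise difference ('color') images and append them, giving 15x28x28."""
    bands, h, w = ugriz.shape
    colors = [ugriz[i] - ugriz[j] for i in range(bands) for j in range(i + 1, bands)]
    return np.concatenate([ugriz, np.stack(colors)], axis=0)

def augment_rotations(pattern):
    """Rotate a 15x28x28 pattern by 0, 90, 180 and 270 degrees
    around the spatial axes, quadrupling the data set."""
    return [np.rot90(pattern, k, axes=(1, 2)) for k in range(4)]

pattern = color_stack(np.random.rand(5, 28, 28))
augmented = augment_rotations(pattern)
```

Since the differences are taken pixel-wise on luptitude images, the 10 extra channels play the same role for the DCMDN that the 10 color features play for the MDN and the RF.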
4. Conclusions
The main purpose of this work is to show a method to produce photo-z PDFs using deep learning architectures. We generate very good probabilistic predictions based on features or images as input, producing a Gaussian mixture model as output. Our proposed architectures show better performance in the comparison carried out with an RF-based method. In particular, we show that the proposed DCMDN gives the best performance, as it is able to use the entire information contained in the images. As shown by the PIT analysis, some optimization with respect to calibration can still be done, in order to deal with some dispersion phenomena. We believe that the presented method needs only little improvement to become a standard in predicting photo-z PDFs. As regression problems are very common in Astronomy, this approach can easily be applied to many other scientific topics.
References
Bishop, C. M. 1994, Mixture Density Networks, Technical Report
Breiman, L. 2001, Mach. Learn., 45(1), 5-32
Fernique, P., Allen, M. G., et al. 2015, Hierarchical progressive surveys. Multi-resolution HEALPix data structures for astronomical images, catalogues, and 3-dimensional data cubes
Gneiting, T., Raftery, A. E., Westveld, A. H., & Goldman, T. 2005, Monthly Weather Review, 133, 1098
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. 1998, Proceedings of the IEEE, 86(11), 2278-2324
Lupton, R. H., Gunn, J. E., & Szalay, A. S. 1999, AJ, 118, 1406
Polsterer, K. L., D’Isanto, A., & Gieseke, F. 2016, Uncertain photometric redshifts
Richards, G. T., Hall, P. B., Schneider, D. P., et al. 2010, VizieR Online Data Catalog: The SDSS-DR7 quasar catalog (Schneider+, 2010), VizieR Online Data Catalog, 7260