Photometric Catalogue of Quasars and Other Point Sources in the Sloan Digital Sky Survey
Sheelu Abraham, Ninan Sajeeth Philip, Ajit Kembhavi, Yogesh G Wadadekar, Rita Sinha
arXiv: [astro-ph.IM] Aug. Mon. Not. R. Astron. Soc. 000, 000–000 (0000). Printed 26 September 2018 (MN LaTeX style file v2.2)
Sheelu Abraham,⋆ Ninan Sajeeth Philip,† Ajit Kembhavi,‡ Yogesh G Wadadekar§ and Rita Sinha¶
St. Thomas College, Kozhencheri 689641, India
Inter-University Centre for Astronomy and Astrophysics, Post Bag 4, Ganeshkhind, Pune 411007, India
National Centre for Radio Astrophysics, TIFR, Post Bag 3, Ganeshkhind, Pune 411007, India
Elviraland 194, 2591 GM The Hague, The Netherlands; formerly with [3]
26 September 2018
ABSTRACT
We present a catalogue of about 6 million unresolved photometric detections in the Sloan Digital Sky Survey Seventh Data Release, classifying them into stars, galaxies and quasars. We use a machine learning classifier trained on a subset of spectroscopically confirmed objects from 14th to 22nd magnitude in the SDSS i-band. Our catalogue consists of 2,430,625 quasars, 3,544,036 stars and 63,586 unresolved galaxies from 14th to 24th magnitude in the SDSS i-band. Our algorithm recovers 99.96% of spectroscopically confirmed quasars and 99.51% of stars to i ∼ … i = 21.
Key words: astronomical data bases: miscellaneous – catalogues – techniques: photometric – methods: statistical – surveys.
There has been a surge in the number of large astronomical surveys trying to map the deep sky in terms of its constituents, their number density and their evolution since the early epochs. The Sloan Digital Sky Survey (SDSS, York et al. 2000) is one such survey, covering about a quarter of the sky and providing photometry in five optical bands for ∼357 million objects in its seventh and final SDSS-II data release (DR7, Abazajian et al. 2009). As less than one per cent of these have been spectroscopically observed as part of the survey, the exact nature of an overwhelming number of objects in the survey remains unconfirmed. This situation will also prevail with future large surveys, where the gap between imaging and spectroscopy is only expected to widen. It is, therefore, necessary to develop techniques to identify objects reliably on the basis of their photometric data, using the colours of spectroscopically identified objects as a guide. In this paper, we describe the use of a machine learning classifier to classify unresolved objects from DR7 into three categories (quasars, stars and galaxies), using a sample of spectroscopically confirmed objects for training the classifier.

The SDSS images objects in five bands with its imaging camera (Gunn et al. 1998, 2006), allowing one to view each object in four independent colours. The colours can be used to identify the spectral classes of the objects. Stoughton et al. (2002) describe this in detail using the spectroscopically confirmed objects of the SDSS early data release. The same method is used in the SDSS spectroscopic quasar candidate selection pipeline described by Richards et al. (2002). This candidate selection procedure, along with follow-up spectroscopy, has been used to produce a catalogue of over 100,000 spectroscopically confirmed quasars for SDSS DR7, the largest available spectroscopic quasar catalogue. SDSS has undergone several improvements in its photometric quality (Adelman-McCarthy et al. 2008), and other researchers have also used colour as a selection tool for preparing catalogues of different objects of interest. These include the photometric identification of galaxies (Oyaizu et al. 2008), quasars (Richards et al. 2004, 2009a) and stars (Covey et al. 2007), and the estimation of photometric redshifts (Niemack et al. 2009; Richards et al. 2009b), with reliable accuracy.

The colours of quasars and stars are sufficiently distinct for them to be reasonably well separated in a two-colour diagram, say U-B against B-V, but there is significant overlap between the two classes and a more reliable separation requires a larger number of colours. Fig. 1 shows a typical colour distribution (u-g against g-r) of spectroscopically confirmed unresolved objects in the SDSS. It is apparent that the relative numbers of quasars, stars and galaxies vary widely over different regions of the colour plane. This reflects the fact that objects with intrinsically different spectra occupy different regions of the multi-colour space. However, there is substantial overlap between the different types. The black box encloses the region of u-g and g-r colour space containing the highest density of SDSS confirmed quasars. The region has almost equal numbers of low redshift quasars (z ≈ … ∼ …
Figure 1.
Different spectral classes of spectroscopically confirmed unresolved objects in the u-g versus g-r colour-colour space are shown in the upper panel. The black box shows the 2D projection of the region used in the present study. Although it might appear to be dominated by low redshift quasars (blue), this region also has a large number of main sequence stars (green). The SDSS spectroscopic target selection algorithm had efficiently filtered out many stars, which is why relatively few of them appear in the plot. In addition to these, the region is occupied by high redshift quasars (red), a few late type stars (orange) and galaxies (pink). About 84 per cent of all known SDSS and 2dF quasars (blue) are within this colour window (black), covering the full redshift range up to z ∼ …

… magnitudes will confirm the actual class of those objects. We also briefly discuss how machine learning tools can be used to identify outliers and errors in the data.

The organization of the paper is as follows. Section 2 briefly describes the dataset and the colour selection criteria. The classifier is described in Section 3. The construction of the training and test data, the test results and their analysis are presented in Section 4. The catalogue, its format, the results of cross-matching, the quasar number density and completeness are described in Section 5. Finally, we summarise the results in Section 6.

The DR7 catalogue contains spectra of 930,000 galaxies, 120,000 quasars and 460,000 stars (DR7, Abazajian et al. 2009). The SDSS has five filters, namely u, g, r, i and z, that give spectral coverage from ∼ … for the training part of this study. 'SpecPhoto' consists of the SDSS spectroscopically confirmed detections with clean spectra.
The spectral classification in this table is labelled 'SpecClass' and is given numerical labels ranging from 1 to 6 to represent the different spectroscopic types.

The objects used in this study are unresolved point sources which have an SDSS i-band point spread function (psf) magnitude ranging from 14 to 24 and which occupy the colour window defined by the colour cuts in Table 1. This region has 106,466 unresolved spectroscopically classified objects that we used for training and testing our classifier. This population contains similar numbers of stars and quasars, with a small number of faint unresolved galaxies scattered over the region. The data for the photometric catalogue that we describe later were also taken from the same region. This gave us three groups of data. The first two groups, namely the training and test data, have spectroscopic confirmation of their identity. The training data are used to adjust the parameters of our classifier during the training process, and the quality of training achieved is assessed using the test data. In the testing round, the classifier predicts the identity of each object based on what it learned from the training data. Since the spectroscopic identities of the test data are available, this allows us to determine the completeness and contamination in the predicted classes. All unresolved objects from the region that had spectroscopic confirmation were included in the test data, while a smaller subset of about 10 per cent of them was used as the training data. The third dataset, referred to as the prediction data, was the larger dataset that included all unresolved point sources in the region irrespective of whether or not they had spectroscopic confirmation. The predictions made on these data are compared with 29 publicly available catalogues to determine the accuracy of our predictions at brighter magnitudes. We describe the details of the training and testing procedure in Section 4.
All magnitudes used in this paper are übercalibrated psf magnitudes described by Padmanabhan et al. (2008). The übercalibration, first introduced with the DR6 data, improves the photometric fidelity of SDSS data, and übercalibrated magnitudes represent the most robust SDSS photometric measurements. SDSS reports the photometric measurements in asinh magnitudes (Lupton et al. 1999). All magnitudes used have been corrected for Galactic extinction.
http://cas.sdss.org/dr7

We used a Difference Boosting Neural Network (DBNN) (Philip & Joseph 2000) classifier, which is a Bayesian supervised learning algorithm. The DBNN has been used in the past for successful star-galaxy classification (Philip et al. 2002), galaxy morphology classification (Odewahn et al. 2002; Goderya et al. 2004) and quasar candidate identification (Sinha et al. 2007). Bayes' theorem allows one to compute the probability for an event to occur based on some prior belief and a likelihood for the event to be related to an observation. The prior belief is the domain knowledge about the event and is usually the most difficult quantity to estimate correctly. The estimation of the likelihood can also be difficult when there is conditional dependence between the observations. For example, saying only that the colour of an object is red does not allow one to say what the object is. Some additional information, combined through a logical AND operation, has to be associated with the colour to make the statement meaningful. We refer to this as conditional dependence. In such situations, the likelihood has to be computed taking all the associated conditions into account, which makes Bayesian estimation computationally intensive.

The DBNN approach suggests that binning can be used to enforce conditional independence of the observations. This might appear to be an unrealistic constraint; however, it is not. Classification inherently demands uniqueness in the observations. If we bin an observed feature sufficiently narrowly (like a histogram with small bin sizes), each bin captures the likelihood for the feature to occupy that bin location. If we also make separate bins for each class, these represent the likelihood for the feature to be in a given bin for each class. Now, the value of one feature will always impose some constraint on the possible values the other features can have. The method works if there is a sufficiently large number of events (counts) compared to the number of bins to give a faithful estimate of the likelihood. For this study, our training sample has 14,356 objects and we use 61 bins for each feature. This is not a critical number, and the results would not have changed if we had used a somewhat different number. The procedure is to start with a small number of bins and gradually increase it until the different classes and their diversities are adequately captured by the learning algorithm. Since there is no quantitative measure of the adequate capture of diversities, the best number of bins is often determined by trial and error as the one that maximises the prediction accuracy of the classifier. A great advantage of the binning scheme is that conditional independence allows the posterior Bayesian probability to be computed as the product of the individual probabilities, which significantly reduces the computational overheads.
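The binning argument above can be sketched in a few lines. This is a minimal illustration, assuming equal-width bins over fixed per-feature ranges and a flat prior; it is not the DBNN implementation, and the function names (`bin_index`, `fit_binned_likelihoods`, `posterior`) are hypothetical. Each class gets one histogram per feature, and the posterior is the normalised product of the per-feature bin likelihoods.

```python
from collections import defaultdict

N_BINS = 61  # bins per feature, the value quoted in the text


def bin_index(value, lo, hi, n_bins=N_BINS):
    """Map a feature value to an equal-width bin in [0, n_bins - 1], clipping out-of-range values."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def fit_binned_likelihoods(X, y, ranges, n_bins=N_BINS):
    """Histogram each feature separately for each class and normalise by the class size."""
    counts = defaultdict(lambda: [[0] * n_bins for _ in ranges])
    totals = defaultdict(int)
    for x, label in zip(X, y):
        totals[label] += 1
        for f, (value, (lo, hi)) in enumerate(zip(x, ranges)):
            counts[label][f][bin_index(value, lo, hi, n_bins)] += 1
    # likelihood[c][f][b] = P(feature f falls in bin b | class c)
    return {c: [[n / totals[c] for n in row] for row in counts[c]] for c in totals}


def posterior(x, likelihoods, ranges, n_bins=N_BINS):
    """Normalised product of per-feature bin likelihoods under a flat prior.

    Returns None when every class has zero likelihood, i.e. the object would be 'rejected'.
    """
    scores = {}
    for c, per_feature in likelihoods.items():
        p = 1.0
        for f, (value, (lo, hi)) in enumerate(zip(x, ranges)):
            p *= per_feature[f][bin_index(value, lo, hi, n_bins)]
        scores[c] = p
    total = sum(scores.values())
    if total == 0.0:
        return None
    return {c: s / total for c, s in scores.items()}
```

Because each feature is histogrammed on its own, the model stores a fixed number of values (classes × features × bins) however large the training set becomes, and scoring an object costs one table lookup per feature per class.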
In addition, the binning scheme gives the classifier some of the advantages of non-parametric classifiers while retaining one advantage of a parametric classifier: the effective number of parameters does not grow with the size of the dataset.

A second issue that often affects the Bayesian computation is the uncertainty of the prior. The binning scheme further complicates the estimation of the prior, since we now need to know the prior for the likelihood of each bin of the input feature to compute the posterior. DBNN resolves this issue by computing the prior from the data. In the Bayes formula, the prior appears as a multiplicative term. The DBNN initially assumes a flat prior for all the bins. In the training phase, when a set of features is given as input, DBNN makes a prediction about its class based on its current prior and likelihood. If the prediction is wrong, it updates its prior, which is called the weight, using a gradient descent algorithm. The important point is that the gradient descent is computed from the differences in the estimated probabilities for the predicted class and the true class of the sample, and hence the computation is largely insensitive to fluctuations due to outliers.

In principle the classifier is able to compute the probability for the sample to be a member of each of the classes. In practice, however, the most likely class will have the highest confidence and the second one might have a good share of the remaining confidence. In our software implementation, the classifier makes two predictions along with the associated confidence in each. This information can be used to identify outliers, reduce contamination and, to some extent, evaluate the limitations of the features.

The Bayesian classifier we use can help us to identify unrepresented and rare examples existing in the data.
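The prior-update idea can be illustrated with a toy loop. This is a hypothetical sketch: per-bin multiplicative weights start flat at 1.0 and are increased for the true class only when a training example is misclassified. The step rule `1 + lr * (1 - p_true)` is a simple stand-in chosen for illustration, not the published DBNN gradient-descent rule, and the likelihood tables are assumed to be precomputed dictionaries mapping a bin index to a probability.

```python
def train_bin_weights(X_binned, y, likelihoods, classes, epochs=10, lr=0.5):
    """Toy weight ('prior') training over binned features; returns the weights and a scorer."""
    n_features = len(X_binned[0])
    # weights[c][f][b] is a multiplicative per-bin prior; a missing entry means the flat value 1.0
    weights = {c: [{} for _ in range(n_features)] for c in classes}

    def class_probs(xb):
        """Normalised product of weight * likelihood over features, for every class."""
        scores = {}
        for c in classes:
            p = 1.0
            for f, b in enumerate(xb):
                p *= weights[c][f].get(b, 1.0) * likelihoods[c][f].get(b, 0.0)
            scores[c] = p
        total = sum(scores.values())
        return {c: (s / total if total else 0.0) for c, s in scores.items()}

    for _ in range(epochs):
        errors = 0
        for xb, true_c in zip(X_binned, y):
            p = class_probs(xb)
            if max(p, key=p.get) == true_c:
                continue  # correct predictions leave the weights untouched
            errors += 1
            # Larger step when the true class is given a lower probability (illustrative rule)
            step = 1.0 + lr * (1.0 - p[true_c])
            for f, b in enumerate(xb):
                weights[true_c][f][b] = weights[true_c][f].get(b, 1.0) * step
        if errors == 0:
            break
    return weights, class_probs
```

In a small two-class example where the raw likelihoods favour the wrong class in each bin, the weights for the true classes grow over a few epochs until both training examples are predicted correctly; the returned `class_probs` always reflects the latest weights.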
According to Bayes' theorem, the posterior probability of an outcome is the normalised product of its likelihood and its prior. The likelihood is the probability with which similar events have appeared in the past. Since the likelihood for an unseen event is zero, the classifier flags it as 'rejected' and does not classify it. Objects flagged as 'rejected' can be studied individually and subsequently added to the training sample to efficiently identify completely new classes of objects.

The posterior probability will be high when both the likelihood and the prior are high. It is therefore often referred to as the 'confidence' in a prediction. A high confidence usually means that the object occupies a location in feature space that is well within the boundary of the cluster formed by its class. However, this does not guarantee that the object is correctly classified. It may happen that a negligibly small fraction of objects within that cluster belongs to a different class. Because of their small number, the likelihood for them tends to zero, and such objects may never be correctly identified. On the other hand, this helps the classifier to learn the boundaries of a class efficiently even when there are outliers in the data. Since our classifier bins the data and computes the likelihood separately for each bin, one can optimise the bin width to improve the sensitivity of the likelihood estimates in favour of the marginally represented samples in the data.

The training and testing procedures for our classifier are explained on the DBNN home page.
…∼nspp/dbnn.html

As mentioned earlier, photometric correlations of colours with the spectral class of objects are well established in the literature. The SDSS CAS server has a 'SpecPhoto' table that provides the five SDSS magnitudes and the spectroscopic classification of all the primary objects selected for spectroscopy by the survey. These spectroscopic classifications are automated in the SDSS pipeline, and thus for a small fraction of the objects the classifications are in error. A follow-up visual verification of the classification has therefore been carried out for quasars, and this is available separately on the SDSS web site as the final SDSS quasar catalogue. In addition to our own visual examination of the spectra, we incorporated the corrections in the SDSS DR7 final quasar catalogue (Schneider et al. 2010) in our training data.

The five bands of SDSS give four independent colours. Sinha et al. (2007) had shown that good accuracy in quasar classification is possible with DBNN using the four independent colours and one pivot magnitude, and our work is an extension of their study. While their classification accuracy on the test data was about 97 per cent, we find that using all ten colours (5C2 = 10) that can be formed from the five SDSS magnitudes, together with one magnitude, improves the accuracy to about 99 per cent on the spectroscopically confirmed test data. The newly added correlated colours carry no additional information. However, finer details of the probability distribution function (pdf) that are unresolved when only the independent colours are used become distinct when all the colours are used. A Bayesian classifier differentiates objects based on a likelihood that is estimated from the pdf, and hence the resolution of the pdf plays a significant role in classification. We illustrate this in Fig. 2 by selecting objects from a narrow region of one feature, 0.… < u-g < ….26, and plotting the distributions of the same objects in the remaining nine features. Although some of the colours are correlated, the probability distribution functions look different in each representation. The Bayesian estimator in our classifier makes use of this, which would be missing if we used only the independent colours, to efficiently separate the overlapping features of objects belonging to different classes.

The resolution in the feature space increases when the bin width is decreased. However, narrowing the bins to improve resolution in colour space requires a correspondingly larger number of observations in each bin so that the pdfs can still be estimated reliably. We therefore restrict our analysis to regions of the feature space where the largest number of spectroscopic classifications is available. This was the first criterion for selecting the particular window region for our study. Although it is only a small region of the colour space, it contains over 46 per cent of the available SDSS spectroscopy of unresolved detections. The colour cuts used by us to define this region are given in Table 1.

Table 1. Colour cuts used for preparing training data.
Colour | Lower limit | Upper limit
u-g | -0.25 | 1.00
g-r | -0.25 | 0.75
r-i | -0.30 | 0.50
i-z | -0.30 | 0.50

As noted above, we use all ten colours, u-g, u-r, u-i, u-z, g-r, g-i, g-z, r-i, r-z and i-z, plus the u-band psf magnitude as input features for our classifier. During the training process, by definition, the classifier learns the correlation between colour and spectroscopic type in the training data. This is stored, and when similar features are presented to the classifier at a later time, it is used to predict the likely spectroscopic classification of an object. To make comparison easy, the predictions were assigned the same labels as used by 'SpecClass' in the 'SpecPhoto' table of SDSS (see Section 2).

Figure 2. Distribution in various colours of spectroscopically identified objects from the region 0.… < u-g < ….26 in the ten-dimensional colour space. The colour code is red for quasars, blue for stars and black for galaxies.
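The feature construction described above (ten pairwise colours plus the u-band psf magnitude) and the Table 1 colour window can be sketched as follows. The dictionary layout and the choice of inclusive bounds at the window edges are assumptions; the text does not say how objects lying exactly on a cut were treated.

```python
from itertools import combinations

BANDS = ("u", "g", "r", "i", "z")

# Table 1 colour cuts: colour -> (lower limit, upper limit)
COLOUR_CUTS = {
    "u-g": (-0.25, 1.00),
    "g-r": (-0.25, 0.75),
    "r-i": (-0.30, 0.50),
    "i-z": (-0.30, 0.50),
}


def ten_colours(mags):
    """All 5C2 = 10 pairwise colours, in the order u-g, u-r, u-i, u-z, g-r, ..., i-z."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in combinations(BANDS, 2)}


def in_colour_window(colours):
    """True if the object passes every Table 1 cut (edges treated as inclusive: an assumption)."""
    return all(lo <= colours[name] <= hi for name, (lo, hi) in COLOUR_CUTS.items())


def feature_vector(mags):
    """The 11 classifier inputs described in the text: ten colours plus the u-band psf magnitude."""
    return list(ten_colours(mags).values()) + [mags["u"]]
```

`itertools.combinations` emits the band pairs in the same order as the colour list in the text, so the feature vector has a fixed, reproducible layout. Only four of the ten colours are independent; the other six are linear combinations, which is why they add resolution to the binned pdfs rather than new information.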
To extend the classification to fainter levels, we assume that within the window region of the selected colour space, the likely variations in colour at fainter levels can be learned by the classifier from the colour dispersions observed at brighter magnitudes. However, a fainter object may be at a different redshift from a bright object of its kind, so that its observed spectrum, and hence its colour, is altered by the redshift. In the case of quasars, it is known that the redshift distribution at brighter magnitudes (J < ….2) is similar to that at fainter magnitudes (J > ….2) (Koo & Kron 1988). Hence, if we have spectroscopically confirmed bright quasars from all redshift ranges in our training data, the classifier can learn the intrinsic variations due to redshift differences and then reliably extend this information to classify quasars at fainter magnitudes. This is another reason for restricting our study to a small region of colour space that contains the largest number of spectroscopically confirmed quasars.

The colour cut we used is selected so that it avoids most of the late type stars and faint galaxies that could enter our catalogue as contaminants. However, to include faint stars and galaxies that might have colours different from those of their brighter counterparts and thus might have entered the colour window, we took a few representative faint objects that have spectroscopic confirmation of their class in 2dF and included them in our training sample. These gave us representative training samples with spectroscopic confirmation to 22nd magnitude in the SDSS i-band. Our final training data thus had 14,356 unresolved spectroscopically confirmed objects.

For preparing the catalogue, we selected objects that have the flag BINNED1 set and excluded objects with the flags EDGE, NOPROFILE, PEAKCENTER, SATURATED, NOTCHECKED, PSF_FLUX_INTERP, DEBLEND_NOPEAK, BAD_COUNTS_ERROR or INTERP_CENTER. At fainter levels, the SDSS magnitude error estimate becomes unreliable. Since SDSS uses the same exposure time for the entire frame, fewer photons are collected from faint objects. This significantly degrades the signal-to-noise ratio and limits the faintness to which colour information can reliably be extracted. Though this is not fully compensatory, we restricted the upper limit for magnitude errors to i ∼ … g ∼ … i ∼ ….

The selected region has 106,466 spectroscopically confirmed unresolved objects. For preparing a training sample, we initially took a random set of ∼10 per cent of these data. We found that random sampling did not give a good representation of the sparsely represented examples, such as late type stars, of which there are only 301, or galaxies, of which there are only 852 in our spectroscopically confirmed data. We used the following strategy to handle this issue. When a feature vector is presented to the classifier during the testing round, it checks whether it has seen a similar example before. If the check fails, the classifier flags that object as 'rejected' without attempting classification. We grouped such flagged objects and added representative samples from them to the training data so that the classifier would be able to classify them. As a result, all the under-represented examples were included in the training data.

Another issue is caused by redshift, which can make the colours of an object appear similar to those of objects belonging to another class. Since this makes the outcomes for the two classes nearly equally likely, the Bayesian probability estimated by the classifier for objects in such regions will be lower. Our classifier uses this information to find regions that require extra training samples, so that the minor differences in the features can be learned and the classes separated efficiently. Thus our training data have more examples from regions of the colour space that are occupied by objects from different classes.

Another problem we observed with random sample selection was that the classes with higher representation in the data always dominated the training sample. This biases the classifier in favour of the dominant class. In such a case one has to either remove the excess examples from the training data or add more examples from the under-represented class as a compensatory measure.
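The rejection-driven strategy for adding under-represented examples can be sketched as below. This is a hypothetical illustration under a naive product-of-likelihoods model: an object is 'rejected' when, for every class, at least one of its feature bins contains no training examples, so every class likelihood is zero. The `binner` function, the `per_class` sample size and the object dictionaries are illustrative assumptions, not part of the published pipeline.

```python
import random


def class_bins(train, binner):
    """For each class, the set of (feature index, bin) cells that contain training examples."""
    cells = {}
    for obj in train:
        cells.setdefault(obj["label"], set()).update(enumerate(binner(obj["x"])))
    return cells


def is_rejected(x, cells, binner):
    """Rejected when no class has training counts in *all* of the object's bins,
    i.e. the product of per-feature likelihoods is zero for every class."""
    bins = list(enumerate(binner(x)))
    return not any(all(cell in seen for cell in bins) for seen in cells.values())


def augment_training(train, test, binner, per_class=5, seed=0):
    """Group rejected test objects by their known label and move a few of each into training."""
    cells = class_bins(train, binner)
    rejected = {}
    for obj in test:
        if is_rejected(obj["x"], cells, binner):
            rejected.setdefault(obj["label"], []).append(obj)
    rng = random.Random(seed)
    added = [o for objs in rejected.values() for o in rng.sample(objs, min(per_class, len(objs)))]
    return train + added, rejected
```

With a coarse toy binner such as `lambda x: tuple(int(v * 10) for v in x)`, a test object whose bins were never populated during training is returned under its spectroscopic label, and once it has been added to the training set the same object is no longer rejected. Repeating this until no test object is rejected mirrors the iterative procedure described above.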
Together, these requirements gave us 14,356 objects with spectroscopic confirmation as our training sample. This is composed of 3,968 stars, 9,236 quasars, 851 galaxies and all 301 late-type stars. Of the 14,356 objects in the training data, 2,806 are from 2dF; these include 1,025 stars, 1,470 quasars, 278 galaxies and 33 white dwarfs with i-band psf magnitudes from 17 to 22 mag. As mentioned earlier, the 2dF objects were added to improve the prediction accuracy of our classifier at the fainter magnitudes where SDSS spectroscopy was not available. Since the training data were constructed from the SDSS and 2dF spectroscopic data, the object types for all of the data are known.

During the training process, the Bayesian likelihood for each training example is computed and the related information is stored by the classifier in its runtime file. This information is used later when new data are presented for classification.

To evaluate the performance of the classifier, the trained classifier is used to predict the classes of the test data. Since the classes of the objects in the test sample are known, the predictions can easily be checked. We find that the classifier correctly predicted 99.51 per cent of stars and 99.96 per cent of quasars. In Table 2, we summarise the actual number of objects in the test data set, the predicted numbers in each class and the accuracy of prediction.
Table 2.
The accuracy of our classifier compared with the SDSS DR7 spectroscopic classification of the test sample. Rows give the spectroscopic type; the Star to Star-Late columns give the DBNN predictions.
Object type | Star | Galaxy | Quasar | Star-Late | Completeness | Contamination
Star | 18,337 | 0 | 90 | 0 | 99.51% | 0.47%
Galaxy | 27 | 705 | 120 | 0 | 82.74% | 0.00%
Quasar | 34 | 0 | 86,852 | 0 | 99.96% | 0.28%
Star-Late | 25 | 0 | 34 | 242 | 80.40% | 0.00%
Total | 18,423 | 705 | 87,096 | 242 | 99.69% | 0.31%
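The completeness and contamination columns of Table 2 follow directly from the confusion matrix: completeness is the diagonal entry divided by its row sum, and contamination is the off-diagonal part of a column divided by the column sum. A short check with the Table 2 counts (the helper names are illustrative):

```python
CLASSES = ["Star", "Galaxy", "Quasar", "Star-Late"]

# Rows: spectroscopic (true) class; columns: DBNN prediction, as in Table 2
CONFUSION = [
    [18337, 0, 90, 0],
    [27, 705, 120, 0],
    [34, 0, 86852, 0],
    [25, 0, 34, 242],
]


def completeness(m, k):
    """Per cent of true class-k objects that were predicted as class k."""
    return 100.0 * m[k][k] / sum(m[k])


def contamination(m, k):
    """Per cent of objects predicted as class k whose true class is different."""
    column = [row[k] for row in m]
    return 100.0 * (sum(column) - m[k][k]) / sum(column)


def overall_accuracy(m):
    """Per cent of all test objects on the diagonal."""
    return 100.0 * sum(m[k][k] for k in range(len(m))) / sum(map(sum, m))


for k, name in enumerate(CLASSES):
    print(f"{name:10s} completeness {completeness(CONFUSION, k):6.2f}%  "
          f"contamination {contamination(CONFUSION, k):5.2f}%")
print(f"overall    accuracy     {overall_accuracy(CONFUSION):6.2f}%")
```

This reproduces the quoted figures to within rounding. The galaxy completeness evaluates to 82.75 per cent (705/852), so the 82.74 per cent in the table appears to be truncated rather than rounded.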
In addition to the likely spectral type, the classifier also returns the computed Bayesian posterior estimate for the prediction, which is a measure of the confidence the classifier has in the prediction. Usually an object predicted with high confidence is predicted correctly. Occasionally, however, an object is predicted with high confidence to a wrong class. This can happen when the colour of the object is similar to the colour of objects in another class that densely populate that region of the colour space. The other possibility is that the assigned object label is incorrect. The latter enabled us to find incorrect spectroscopic labelling of quasars as galaxies in SDSS data, which we describe in the next section.
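Finding mislabelled spectra in this way amounts to listing objects whose prediction disagrees with the spectroscopic label despite a high posterior. A minimal sketch; the 99 per cent threshold and the field names (`predicted`, `spec_class`, `confidence`) are hypothetical choices for illustration:

```python
def suspicious_labels(objects, min_confidence=99.0):
    """Objects whose predicted class disagrees with the spectroscopic label even though the
    classifier's confidence (in per cent) is at least min_confidence. These are candidates
    for a visual re-inspection of the spectrum."""
    return [
        o for o in objects
        if o["predicted"] != o["spec_class"] and o["confidence"] >= min_confidence
    ]
```

Low-confidence disagreements are deliberately left out: they are more likely to be genuine colour overlaps between classes than labelling errors.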
The photometric classification of objects based on colours appears to be straightforward, but it has been observed that the colours of different object types sometimes overlap for various reasons. In our test data there were 34 quasars that were incorrectly classified as stars by our classifier. The locus of these failed cases (black * marks) in the colour feature space in Fig. 3 shows that the colours of these objects lie mostly along the boundary between stars and quasars. For clarity, we did not include galaxies in the plot. What causes this overlap? Fig. 4 shows that most of the failures cluster around specific patches of redshift at which the apparent colours of quasars are similar to those of some dominant stellar populations. Some of these populations, like that near z ∼ …

[Figure 3 panel: r − z against u − g]
Figure 3. A two-dimensional projection of the feature space of quasars (blue) and stars (green), along with quasars mistakenly identified as stars and stars mistakenly identified as quasars (black * marks). The failures lie at the colour boundary between quasars and stars in the ten-dimensional feature space.

… catalogue (DR7Q). We therefore updated our training sample with the DR7Q classifications and used it to produce the photometric catalogue. However, the corrections in the training data resulted in only very minor changes to our catalogue, less than 0.5 per cent of its previous estimates. This is because changing a few labels in the data need not bring about considerable changes in the estimated probability distribution function used by the Bayesian classifier. This property of our method helps the classifier to handle robustly the unseen outliers that inevitably exist in any data.
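Why a few corrected labels barely change the classifier can be seen from the binned likelihood itself: if k of the n examples of a class are relabelled away from it, no bin probability of that class can change by more than k/(n − k). The sketch below checks this on synthetic u-g values (not SDSS data), using 61 bins and the Table 1 u-g range as assumed settings; the helper names are hypothetical.

```python
import random
from collections import Counter


def bin_likelihood(values, n_bins=61, lo=-0.25, hi=1.0):
    """Normalised histogram (per-bin likelihood) of one feature for one class; edges are clipped."""
    counts = Counter(min(max(int((v - lo) / (hi - lo) * n_bins), 0), n_bins - 1) for v in values)
    n = len(values)
    return [counts.get(b, 0) / n for b in range(n_bins)]


def max_abs_change(p, q):
    """Largest per-bin difference between two binned likelihoods."""
    return max(abs(a - b) for a, b in zip(p, q))


rng = random.Random(42)
qso_ug = [rng.gauss(0.2, 0.15) for _ in range(10000)]  # synthetic 'quasar' u-g colours
relabelled = qso_ug[20:]  # pretend 20 of them were relabelled to another class
change = max_abs_change(bin_likelihood(qso_ug), bin_likelihood(relabelled))
print(f"largest per-bin change: {change:.4f} (bound {20 / (10000 - 20):.4f})")
```

The bound follows from |c/n − (c − r)/(n − k)| = |kc − nr| / (n(n − k)) ≤ k/(n − k), since the bin count c is at most n and the number r removed from that bin is at most k.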
The catalogue is created by applying our trained classifier to the prediction data described earlier. The prediction data are similar to the training and test data, the only difference being that they have no known class label.

There are 6,038,247 rows (object detections) in our catalogue. These are classified by our classifier into 2,430,625 quasars, 3,544,036 stars and 63,586 galaxies. The distribution of the i magnitude of the objects in our catalogue is shown in Fig. 5. According to the classifier predictions, the distribution of stars peaks at i ∼ 20, followed by quasars and galaxies at ∼ … i magnitude of 20.2, which is the limiting magnitude of SDSS quasar spectroscopy (Richards et al. 2002), and of these 1,99,690 are predicted as quasars, 1,352,871 as stars and 60,190 as faint galaxies.

Figure 4. The SDSS colours of stars and quasars are indistinguishable in a few patches of redshift. In the upper panel, the orange histogram shows the correctly predicted quasars and the black histogram shows the distribution of quasars that were incorrectly classified as stars (failures). The bottom panel shows how the confidence of the classifier falls in these patches; the black dots represent the confidence of the failed objects. Despite the reduced confidence of the classifier in the region shown, most of the quasars were correctly identified. The combined failures from galaxies and stars together are only ∼ …

We find that 69 per cent of the objects in the catalogue are predicted with 100 per cent confidence, while only 4 per cent have less than 90 per cent and the remaining 27 per cent have confidence between 90 and 100 per cent. This means that approximately 69 per cent of the objects are well resolved in the colour space, while the remaining 31 per cent share the colour space with objects from different classes. Only 1 per cent of objects were predicted with confidence below 70 per cent; these are from regions where there is substantial overlap between different classes. As stated earlier, a lower posterior prediction probability occurs when the outcomes are nearly equally likely, for instance when the colour of one type of object overlaps with that of another in the data. A plot of the normalised cumulative histogram of these probabilities in the catalogue is shown in Fig. 6.

Figure 5. The overall magnitude distribution (upper panel) of quasars, stars and galaxies in our catalogue is shown in black, extending from 14th to 24th magnitude in the SDSS i-band. The individual distributions show how the surface density of each type changes: first the stars (green), then the quasars (blue) and the galaxies (pink) peak as we move towards fainter magnitudes. The lower panel shows a 3D colour cube [u-g, g-r, r-i] of the 6 million predictions in our catalogue, colour coded as galaxies (larger yellow points are used to make their locus visible), stars (green) and quasars (blue).

Objects with the same confidence value within a small hypercube of the feature space are the most similar objects in the entire data set. One can therefore build subsets of objects grouped by their confidence values for follow-up studies. For example, to study objects of a specific kind, the confidence given in our catalogue for that kind can be used as an additional dimension alongside the usual identifiers such as colour.
Table 4.
Sample rows from our photometric catalogue based on SDSS DR7 (see text for column descriptions).
SDSS ID | R.A. | DEC. | i mag | Most Probable | Confidence 1 (%) | Second Most Probable | Confidence 2 (%)
587732772667326484 | 185.24782587 | 10.87881822 | 18.59683 | 3 | 100.0000 | 1 | 0.00000
587732772667326517 | 185.18802403 | 10.93259451 | 19.817682 | 1 | 99.99999 | 4 | 0.00001
587732771042623616 | 152.67642584 | 8.17636451 | 19.76019 | 2 | 100.0000 | 3 | 0.00000
587732771039346749 | 145.17463514 | 7.29821365 | 18.57126 | 3 | 100.0000 | 4 | 0.00000
587732771039871127 | 146.37899605 | 7.54401918 | 19.97955 | 4 | 98.93777 | 3 | 1.06223
587732771043016884 | 153.62477833 | 8.34608604 | 19.97215 | 1 | 100.0000 | 3 | 0.00000
587732771042296010 | 151.99741353 | 8.11097072 | 19.46754 | 2 | 88.48774 | 3 | 11.51226
587732771039608895 | 145.81816028 | 7.50764781 | 18.41648 | 3 | 100.0000 | 1 | 0.00000
587732771047866444 | 164.72707514 | 9.09094786 | 19.79277 | 4 | 99.99086 | 3 | 0.00914
587732771043410105 | 154.58767573 | 8.27519667 | 19.55149 | 3 | 95.53498 | 1 | 4.46502
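Reading catalogue rows such as those in Table 4 is straightforward. The sketch below assumes a whitespace-separated text layout with the eight columns in the order listed, and maps the class codes using the SDSS DR7 SpecClass convention (1 star, 2 galaxy, 3 quasar, 4 high-redshift quasar, 6 late-type star). The file layout and the `CatalogueRow`/`parse_row` names are assumptions; the text gives only the column order.

```python
from dataclasses import dataclass

# Class codes as used in the catalogue (SDSS DR7 SpecClass convention)
SPEC_CLASS = {1: "STAR", 2: "GALAXY", 3: "QSO", 4: "HIZ_QSO", 6: "STAR_LATE"}


@dataclass
class CatalogueRow:
    objid: int
    ra: float          # decimal degrees (J2000)
    dec: float         # decimal degrees (J2000)
    i_mag: float       # i-band psf übercalibrated asinh magnitude
    best_class: int
    best_conf: float   # per cent
    second_class: int
    second_conf: float  # per cent

    @property
    def best_name(self):
        return SPEC_CLASS.get(self.best_class, "UNKNOWN")


def parse_row(line):
    """Parse one whitespace-separated, 8-column catalogue row."""
    objid, ra, dec, imag, c1, p1, c2, p2 = line.split()
    return CatalogueRow(int(objid), float(ra), float(dec), float(imag),
                        int(c1), float(p1), int(c2), float(p2))
```

The object ID is kept as a Python `int`, which has arbitrary precision, so the 18-digit SDSS identifiers are not rounded as they would be in a double-precision float.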
Figure 6.
Plot of the normalised cumulative histogram of the predicted Bayesian posterior probability for quasars (blue), stars (green) and galaxies (pink) in the catalogue. The value attached to each individual prediction can be regarded as the confidence the classifier has in that prediction. This information may be used to group similar objects into subsets for follow-up studies.
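A normalised cumulative histogram like the one in Fig. 6 can be built directly from the per-object confidence values. The sketch below is illustrative only: it takes (class, confidence) pairs, here invented values rather than catalogue entries, and returns for each class the fraction of objects whose confidence is at or below each threshold. The function name `cumulative_by_class` is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def cumulative_by_class(predictions, thresholds):
    """predictions: iterable of (class_name, confidence_percent) pairs.
    Returns {class_name: [fraction of objects with confidence <= t for t in thresholds]}."""
    by_class = defaultdict(list)
    for cls, conf in predictions:
        by_class[cls].append(conf)
    curves = {}
    for cls, confs in by_class.items():
        confs.sort()  # sorted values let bisect_right count entries <= t
        n = len(confs)
        curves[cls] = [bisect_right(confs, t) / n for t in thresholds]
    return curves


# Hypothetical predictions, not taken from the catalogue.
preds = [("QSO", 99.9), ("QSO", 100.0), ("QSO", 65.0), ("QSO", 88.5), ("QSO", 100.0),
         ("STAR", 100.0), ("STAR", 55.0)]
curves = cumulative_by_class(preds, [70.0, 90.0, 100.0])
print(curves)
```

Each curve rises to 1 at 100 per cent by construction; a class whose curve stays low until close to 100 per cent has most of its objects predicted with high confidence.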
Table 3.
The distribution of objects in our photometric catalogue.
Class                  Predicted No.
Main Sequence Stars    3,540,337
Galaxy                 63,586
Low Z Quasars          2,257,905
High Z Quasars         172,720
Late type Stars        3,699
Total                  6,038,247
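The three broad classes quoted in the abstract follow from Table 3 by merging the two stellar labels and the two quasar labels. A small Python check, with the grouping written out explicitly (the dictionary and function names are illustrative):

```python
# Counts copied from Table 3.
TABLE3 = {
    "Main Sequence Stars": 3_540_337,
    "Galaxy": 63_586,
    "Low Z Quasars": 2_257_905,
    "High Z Quasars": 172_720,
    "Late type Stars": 3_699,
}

# Grouping of the five catalogue labels into three broad classes.
BROAD = {
    "Main Sequence Stars": "stars",
    "Late type Stars": "stars",
    "Galaxy": "galaxies",
    "Low Z Quasars": "quasars",
    "High Z Quasars": "quasars",
}


def broad_totals(counts):
    """Sum the per-label counts into the broad classes defined in BROAD."""
    totals = {}
    for label, n in counts.items():
        key = BROAD[label]
        totals[key] = totals.get(key, 0) + n
    return totals


totals = broad_totals(TABLE3)
print(totals, sum(totals.values()))
```

The totals (2,430,625 quasars, 3,544,036 stars and 63,586 galaxies) match the numbers given in the abstract and sum to 6,038,247.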
The full catalogue contains 8 columns for 6,038,247 SDSS sources and uses the same numeric labels as the SDSS tables to classify objects into stars, galaxies, low-redshift quasars, high-redshift quasars (HiZ QSO) and late-type stars. The quasars in the catalogue include BAL quasars, BL Lacertae objects and other AGNs. Although we assign separate labels to high- and low-redshift quasars, in this first release we do not classify the different types of AGN and their subclasses, and group them all under the single name quasars. A summary of the objects predicted by the classifier is given in Table 3, and a sample of 10 entries in the catalogue is given in Table 4. The content of each column is as follows.
(i) SDSS ID: SDSS photometric object ID.
(ii) R.A.: Right ascension in decimal degrees (J2000).
(iii) DEC.: Declination in decimal degrees (J2000).
(iv) i mag: SDSS i-band PSF übercalibrated asinh magnitude.
(v) Most Probable: Most probable class of the object, represented by integers with the same meaning as assigned by SDSS in their tables.
(vi) Confidence: The confidence (per cent) the classifier has in the most probable class.
(vii) Second Most Probable: The second most probable class of the object, in the same format as column (v).
(viii) Confidence: The confidence (per cent) the classifier has in the second most probable class.
We carry out cross matching between our catalogue and several other catalogues that contain a subset of the objects in our catalogue. The purpose of the cross matching is twofold:
(i) To identify potential limitations of the catalogue based on the available data. This is important because the whole exercise is based on a single survey and has not considered any of the inherent constraints in the survey.
In many surveys, the target selection algorithms are optimised to detect a particular category of candidates, and training data derived from such a survey need not be representative of the kinds of objects observed by other surveys. Cross matching can reveal such biases if they exist.
(ii) To estimate the quality of classification at brighter magnitudes, where existing spectroscopy can provide a representative sample. This is significant because, even with modern technology, spectroscopic confirmation of all bright objects is impossible, and one has to adopt other methods to determine the number densities of the various types. Since different surveys cover different sets of objects depending on their objectives, the robustness of a method can be estimated by cross matching its predictions with the spectroscopic confirmations made by different surveys.
Table 5.
Summary of the matching of our catalogue predictions with some existing catalogues.
                DBNN Predictions        Failures as per catalogue
Cat. Code       Quasar  Galaxy  Star    Quasar  Galaxy  Star    Accuracy  i mag Range   Ref
2DF             5976    238     1535    122     0       52      98%       17.0 - 22.0   1
XBH             212     0       0       0       0       0       100%      15.8 - 20.5   2
ASFS            1088    12      31      0       12      31      96%       14.5 - 22.1   3
BATCS           21      0       0       3       0       0       86%       18.1 - 20.5   4
CGRBS           265     1       0       0       1       0       100%      14.7 - 21.5   5
DLyaQ           21      0       1       0       0       1       95%       16.5 - 19.4   6
F2QZ            186     1       3       0       1       3       98%       16.6 - 21.0   7
KFQS            144     2       13      3       1       7       94%       16.8 - 20.6   8
LQAC            61504   17      267     0       17      267     100%      14.7 - 22.3   9
LQRF            60280   14      219     0       14      219     100%      14.7 - 21.7   10
BZC             249     4       2       0       4       2       98%       15.0 - 21.0   11
PCS             53      0       2       0       0       2       96%       15.1 - 18.5   12
ROSA            1134    0       1       0       0       1       100%      15.5 - 20.5   13
SQ13            65223   55      395     0       55      395     99%       14.7 - 22.8   14
SQR13           7       0       21      7       0       0       75%       16.3 - 20.3   14
DR7Q            79140   17      341     0       17      341     100%      14.9 - 21.8   15
SSSC            82      2       1171    82      2       0       93%       14.9 - 21.5   16
SSA13           5       0       1       0       0       0       83%       17.8 - 20.8   17
XMMSS           37      0       5       1       0       2       93%       14.9 - 20.7   18
SDSS/XMM        580     0       0       0       0       0       100%      15.2 - 20.5   19
RASS/2MASS      6       0       0       0       0       0       100%      15.5 - 18.4   20
CAIXA           16      0       0       0       0       0       100%      15.1 - 17.8   21
WDMB            20      0       106     20      0       0       84%       15.3 - 20.5   22
PMS             639     6       19596   639     6       0       97%       14.8 - 20.2   23
GLQ             2       0       0       0       0       0       100%      18.8 - 19.1   24
(1) Croom et al. 2009a; (2) Kelly et al. 2008; (3) Healey et al. 2007; (4) Zhang et al. 2004; (5) Healey et al. 2008; (6) Curran et al. 2002; (7) Cirasuolo et al. 2005; (8) Maddox et al. 2008; (9) Souchay et al. 2009; (10) Andrei et al. 2009; (11) Massaro et al. 2009; (12) Kuraszkiewicz et al. 2004; (13) Suchkov et al. 2006; (14) Véron-Cetty & Véron 2010; (15) Schneider et al. 2010; (16) Skiff 2009; (17) Fomalont et al. 2006; (18) Watson et al. 2009; (19) Young et al. 2009; (20) Haakonsen & Rutledge 2009; (21) Bianchi et al. 2009; (22) Heller et al. 2009; (23) Gould & Kollmeier 2004; (24) Oguri et al. 2008.
To this end, we match our predictions with several other catalogues which contain some of the objects in our catalogue, and summarise the results in Table 5.
The tables also include the magnitude range covered by the matched objects in each catalogue. The list of unconfirmed predictions is given in Table 6, and a detailed discussion of the cross-matching results is given in Table 9. Multi-wavelength catalogues covering X-ray, optical and radio wavelengths, with spectroscopically confirmed objects, were given preference in selecting catalogues for cross matching.
All cross matching is done on celestial coordinates: an object is taken to be a match if its position lies within 1 arcsec of the position of an object in our catalogue. Since the selected region is rich in quasars, we selected a few spectroscopic surveys with confirmed quasars as reference data for our catalogue. The surveys that we used for this are marked with ** in Table 9. This resulted in the identification of 90,249 spectroscopically confirmed quasars among the objects in our catalogue. We find that we had correctly classified 89,549 (∼99 per cent) of these objects as quasars. These included 10,230 (∼11 per cent) spectroscopically confirmed non-SDSS quasars, of which 9,887 (97 per cent) were correctly classified, while 48 were labelled as galaxies and 295 as stars. Of these non-SDSS quasars, 5,167 were fainter than the SDSS quasar spectroscopic limit of i = 20.2, and 5,036 (97 per cent) of them were correctly predicted as quasars by our classifier. Comparing our catalogue with spectroscopically identified stars from 2dF, Skiff's catalogue of Stellar Spectral Classifications, the Proper Motion Catalogue from SDSS ∩ USNO-B (details and references in Table 9) and the SDSS DR7 catalogue resulted in 36,645 stars in the colour window selected by us. Of these, 35,830 (∼98 per cent) were correctly identified by our classifier. Comparison with 746 spectroscopically confirmed galaxies from SDSS DR7 and 2dF resulted in the correct identification of 546 galaxies.
Comparison with DR7Q gave 79,498 quasars, of which 79,140 were correctly identified by our classifier. The 341 quasars that were predicted as stars by our classifier come from a few redshift patches that include z ∼ […] resulted in 30,261 objects.
Table 6.
Unconfirmed predictions in the catalogue.
                DBNN Predictions
Cat. Code       Quasar  Galaxy  Star    Ref
CNDWF           365     4       23      25
ARC             38      0       0       26
XMM2iS          3176    27      287     27
ROSAT-FSC       24      0       9       28
XMMCOSMOS       88      0       8       29
(25) Brand et al. 2006; (26) Aslan et al. 2010; (27) XMM-Newton Survey Science Centre, C. 2008, VizieR Online Data Catalog, 9040, 0; (28) Véron-Cetty et al. 2004; (29) Cappelluti et al. 2009.
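The positional matching behind Tables 5 and 6 accepts a counterpart when it lies within 1 arcsec. A minimal sketch of such a match is shown below, assuming simple (id, R.A., Dec.) tuples in degrees and using the haversine formula for the angular separation. It is a brute-force illustration, not the authors' pipeline; for millions of sources one would normally use a spatial index. The offset test positions are hypothetical, and the function names are illustrative.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h))) * ARCSEC_PER_RAD


def crossmatch(ours, theirs, radius_arcsec=1.0):
    """For each (id, ra, dec) in `theirs`, return the id of the nearest object in
    `ours` within radius_arcsec, or None. Brute force: O(len(ours) * len(theirs))."""
    matches = {}
    for tid, tra, tdec in theirs:
        best_id, best_sep = None, radius_arcsec
        for oid, ora, odec in ours:
            sep = angular_sep_arcsec(tra, tdec, ora, odec)
            if sep <= best_sep:
                best_id, best_sep = oid, sep
        matches[tid] = best_id
    return matches


# One position from Table 4, and two hypothetical external detections near it.
ours = [(587732771042296010, 151.99741353, 8.11097072)]
theirs = [("A", 151.99741353 + 0.5 / 3600, 8.11097072),   # ~0.5 arcsec away in R.A.
          ("B", 151.99741353, 8.11097072 + 2.0 / 3600)]   # 2 arcsec away in Dec.
m = crossmatch(ours, theirs)
print(m)
```

Because an R.A. offset shrinks by cos(Dec.) on the sky, the separation is computed on the sphere rather than from raw coordinate differences.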
Figure 7.
Histogram of quasars in our catalogue that are fainter than the SDSS spectroscopic magnitude limit in the i-band and have spectroscopic confirmation from other surveys. The black histogram shows the objects correctly identified by our classifier and the red histogram the failed ones. The counts on the y-axis are shown on a logarithmic scale for clarity.
Of these, only 7,293 objects have spectroscopic confirmation from 2dF, and 97.8 per cent of those were correctly predicted. The correctly identified quasars in these data include two gravitationally lensed quasars, SDSS J1216+3529 and SDSS J0832+0404, from Oguri et al. (2008), and 20 damped Lyman alpha quasars from Ellison et al. (2009). The distribution of the correctly identified and failed faint quasars in our catalogue that have spectroscopic confirmation is shown in Fig. 7.
A photometric catalogue (hereafter Richards+2009) of about 1.2 million quasars was constructed by Richards et al. (2009a) using an 8-dimensional photometric classification scheme. They used a Bayesian kernel density estimator to identify quasar candidates from SDSS DR6 to a limiting magnitude of i = 21.3, with an expected completeness of ∼ […]
Figure 8.
The magnitude histogram of quasars from the photometric catalogue of Richards et al. (2009a) for objects with flag good > […]
Figure 9.
A sample colour-colour plot of the distribution of objects in the photometric catalogue of Richards et al. (2009a) for objects with flag good > […] Tables 7 and 8. These tables show that our method gives better accuracy and lower contamination in its predictions. Given that the data are already categorised as possible quasar candidates, a marginally higher contamination rate for stars and galaxies is understandable. This also explains why the number of stars in the tables is much smaller than the number of quasars.
Our catalogue has about twice as many quasars as Richards+2009. Does this mean that we are overestimating the quasar luminosity function? One way to evaluate the reliability of the catalogue is to compare the predicted quasar number density in our catalogue with what has been observed. Our catalogue of unresolved objects from i ∼ […]
Table 7.
A comparison of our predictions for objects with good > […]
Table 8.
Completeness and contamination details of predictions by DBNN as per Table 7.
Object Type   Completeness (%)   Contamination (%)
Quasars       99.56              0.39
Stars         79.15              26.19
Galaxies      65.02              21.05
[…] the colour estimates. Croom et al. (2009b) have estimated the quasar luminosity function at 0.4 < z < 2.6. […] (0.4 < z < 2.1) from the 2SLAQ survey (after applying corrections for coverage, photometric and spectroscopic completeness) for the SGP (blue) and NGP (red) strips, respectively, are shown. It may be noted that the counts agree at brighter magnitudes, whereas at fainter levels our catalogue produces marginally larger counts, because our redshift window is ∼50 per cent wider, extending to z ∼ […].6. We assume that our completeness for redshift < […] z […].6) as per our catalogue is ∼ […] deg⁻² at a limiting magnitude of g ∼ 22, and falls to ∼54 and ∼18 quasars deg⁻² at g = 21 and 20, respectively. It is also noted that our number count, which most likely consists of quasars with z < […].6, remains less than or equal to the redshift-unbounded number counts reported by Koo & Kron (1988) at the respective magnitudes. Thus the contribution of objects at fainter magnitudes is one factor in the excess of quasars in our catalogue as compared to Richards+2009, which is limited to i ∼ 21, and as such the excess does not contradict earlier estimates of the quasar luminosity function.
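Surface densities such as the ∼54 and ∼18 quasars deg⁻² quoted above are object counts divided by the survey area. The sketch below computes cumulative counts N(<g) and differential counts in bins of width Δg, assuming a list of g magnitudes and a known area in square degrees. The magnitudes and the 0.5 deg² area are hypothetical, no completeness correction is applied, and the function names are illustrative.

```python
def cumulative_counts_per_deg2(mags, area_deg2, limits):
    """N(< m_lim) per square degree for each limiting magnitude in `limits`."""
    return [sum(1 for m in mags if m < lim) / area_deg2 for lim in limits]


def differential_counts_per_deg2(mags, area_deg2, m_min, m_max, dm):
    """Counts per square degree in bins [m, m + dm), from m_min up to m_max."""
    nbins = int(round((m_max - m_min) / dm))
    counts = [0] * nbins
    for m in mags:
        k = int((m - m_min) // dm)  # index of the bin containing m
        if 0 <= k < nbins:
            counts[k] += 1
    return [c / area_deg2 for c in counts]


# Hypothetical g-band magnitudes of quasars in a hypothetical 0.5 deg^2 field.
g_mags = [19.5, 20.2, 20.7, 21.4, 21.9, 21.95]
cumulative = cumulative_counts_per_deg2(g_mags, 0.5, [20.0, 21.0, 22.0])
differential = differential_counts_per_deg2(g_mags, 0.5, 19.0, 22.0, 1.0)
print(cumulative, differential)
```

The cumulative form is the one used when quoting a density "at g = 21"; the differential form corresponds to binned counts such as those plotted per Δg interval.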
Completeness is defined as the ratio N_cor/N_obj, where N_cor is the number of correctly predicted objects of a given class (say quasars) and N_obj is the actual number of objects in that class. Likewise, contamination is defined as the ratio of the number of incorrectly labelled objects to the total number of objects assigned to that class. It may also be defined as 1 − accuracy.
Figure 10.
The upper panel shows quasar number counts (∆g = 0.[…], ∼[…] < z < ∼[…].6) over the full magnitude range of our catalogue. In the lower panel, our predictions (black) are compared with the observed quasar number counts in the redshift range 0.4 to 2.1 from the 2SLAQ survey SGP (blue) and NGP (red) strips. Both show close agreement with our catalogue. The marginal deviations may be ignored, considering that our redshift window is ∼50 per cent wider than the redshift coverage of the other two (z < […], […] < z < ∼[…].6).
An ideal situation would be one where the completeness is 100 per cent and the contamination is zero. However, this is not possible in practice. We have discussed the difficulties in classifying some of the objects; a reasonable compromise is a sample that is 'complete' above a certain threshold while the contamination is kept to a minimum. The classifier assigns a confidence value to every prediction it makes, which may be meaningfully used for this purpose.
An important point to consider here is the scientific goal of the classification. If a high level of completeness is required, a low cut-off should be placed on the confidence, which can take any value between 1/N and 1, N being the number of predefined classes. However, if we want to target quasars for spectroscopy, we would want quasar candidates that have a high chance of being quasars. Choosing a suitable lower cut-off on the confidence value allows one to do this. As may be noted from the cumulative confidence plot (Fig. 6), a confidence of 55 per cent could be regarded as a good cut-off for most purposes. However, this may also result in the loss of quasars in some specific redshift patches, where the confidence drops because the colours of different classes merge. This is the inevitable trade-off between completeness and contamination.
Photometric estimation of redshifts will significantly increase the usability of our catalogue by adding another dimension to the distributions. Secondly, the colour cuts that we used might have left out many interesting objects. Can we meaningfully classify objects from regions where there are fewer training samples?
One option is to use multi-band observations with a classifier capable of handling missing attributes. Many such objects are of specific interest for target selection in astronomical observations, for example in variability studies. These and related issues are now being investigated.
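The completeness and contamination definitions given earlier, and the effect of a confidence cut-off on them, can be illustrated with a short Python sketch. It assumes records of (true class, predicted class, confidence in per cent) for objects of known class, treats contamination as the fraction of selected objects that are not of the target class (the 1 − accuracy reading), and uses invented records; the function name is hypothetical.

```python
def completeness_contamination(records, target, cutoff):
    """Completeness and contamination for class `target` at a confidence cut-off.

    records: iterable of (true_class, predicted_class, confidence_percent).
    An object is 'selected' if it is predicted as `target` with confidence >= cutoff.
    completeness  = N_cor / N_obj            (N_obj: objects truly of class `target`)
    contamination = (N_sel - N_cor) / N_sel  (fraction of selected objects that are wrong)
    """
    n_obj = n_sel = n_cor = 0
    for true_cls, pred_cls, conf in records:
        if true_cls == target:
            n_obj += 1
        if pred_cls == target and conf >= cutoff:
            n_sel += 1
            if true_cls == target:
                n_cor += 1
    completeness = n_cor / n_obj if n_obj else float("nan")
    contamination = (n_sel - n_cor) / n_sel if n_sel else float("nan")
    return completeness, contamination


# Invented records with known classes (not catalogue data).
records = [
    ("QSO", "QSO", 99.9),
    ("QSO", "QSO", 52.0),
    ("QSO", "STAR", 60.0),
    ("STAR", "QSO", 51.0),
    ("STAR", "STAR", 100.0),
]
for cut in (0.0, 55.0):
    comp, cont = completeness_contamination(records, "QSO", cut)
    print(f"cut-off {cut:4.0f}%: completeness {comp:.2f}, contamination {cont:.2f}")
```

In this toy example, raising the cut-off from 0 to 55 per cent removes the misclassified star from the quasar selection (contamination 0.33 to 0.00) but also discards the genuine quasar predicted with 52 per cent confidence (completeness 0.67 to 0.33), which is the trade-off described in the text.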
In this paper, we develop a machine learning algorithm based on Bayes' theorem and train it on the colours of spectroscopically confirmed objects from SDSS to produce a catalogue of over 6 million unresolved photometric detections in SDSS DR7, classifying them into stars, galaxies and quasars. These objects come from a small region of the SDSS colour space that contains 106,466 spectroscopically confirmed point sources and about 6 million photometric detections without spectroscopy, dominated by quasars and main sequence stars. We go to the limiting magnitudes of SDSS photometry and predict the class of the objects with a set of logically derived constraints. Our predictions are compared with other deep sky surveys in the X-ray, optical, infrared and radio wavebands, and the results indicate that the method produces a reliable classification of faint objects using only the five SDSS magnitudes. The full catalogue and the data are available in electronic form. The high accuracy of matching and the ability to reach fainter magnitudes are expected to make our classifier a valuable addition to photometric classification and candidate identification for some of the upcoming deep sky surveys.
The catalogue is limited to the colour window used for this study, and hence the completeness of the catalogue refers only to objects within this window. We have noted that many of the failures occur at specific redshift patches, in agreement with the literature. However, we wish to note that the true nature of objects beyond i ∼ […]
ACKNOWLEDGEMENTS
The SQL query used to download the data from SDSS:
SELECT
    s.objID AS ObjID,
    s.ra AS Ra,
    s.dec AS Dec,
    (u.psfMag_u - s.extinction_u) AS psfMag_u,
    (u.psfMag_g - s.extinction_g) AS psfMag_g,
    (u.psfMag_r - s.extinction_r) AS psfMag_r,
    (u.psfMag_i - s.extinction_i) AS psfMag_i,
    (u.psfMag_z - s.extinction_z) AS psfMag_z,
    s.type AS type,
    s.z AS RedShift,
    s.SpecClass AS SpecClass
INTO SpecUberMag
FROM UberCal u, SpecPhoto s
WHERE u.objID = s.objID
    AND s.type = 6  -- PhotoType 6: unresolved (point) sources
    -- colour window on extinction-corrected (dereddened) PSF magnitudes
    AND ((u.psfMag_u - s.extinction_u) - (u.psfMag_g - s.extinction_g)) BETWEEN -0.25 AND 1.00
    AND ((u.psfMag_g - s.extinction_g) - (u.psfMag_r - s.extinction_r)) BETWEEN -0.25 AND 0.75
    AND ((u.psfMag_r - s.extinction_r) - (u.psfMag_i - s.extinction_i)) BETWEEN -0.30 AND 0.50
    AND ((u.psfMag_i - s.extinction_i) - (u.psfMag_z - s.extinction_z)) BETWEEN -0.30 AND 0.50
    -- photometric quality cuts on the photo flags
    AND ((flags & 0x10000000) != 0)
    AND ((flags & 0x8100000c00a4) = 0)
    AND (((flags & 0x400000000000) = 0) OR (psfmagerr_g <= 0.2))
    AND (((flags & 0x100000000000) = 0) OR ((flags & 0x1000) = 0))
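For users who already have PSF magnitudes and extinctions in hand, the colour window of the query above can be applied client-side. The Python sketch below copies the four colour limits from the WHERE clause and treats the bounds as inclusive, as SQL BETWEEN does. It deliberately does not reproduce the flag cuts, the example magnitudes are made up, and the names are hypothetical.

```python
# Colour limits copied from the WHERE clause of the query; magnitudes are
# dereddened by subtracting the extinction in each band, as in the query.
COLOUR_WINDOW = {
    ("u", "g"): (-0.25, 1.00),
    ("g", "r"): (-0.25, 0.75),
    ("r", "i"): (-0.30, 0.50),
    ("i", "z"): (-0.30, 0.50),
}


def in_colour_window(psf_mag, extinction):
    """True if all four dereddened colours lie inside the window (inclusive, like BETWEEN).
    psf_mag and extinction are dicts keyed by band name 'u', 'g', 'r', 'i', 'z'."""
    dered = {band: psf_mag[band] - extinction[band] for band in "ugriz"}
    return all(lo <= dered[a] - dered[b] <= hi
               for (a, b), (lo, hi) in COLOUR_WINDOW.items())


no_ext = dict.fromkeys("ugriz", 0.0)
blue_source = {"u": 19.3, "g": 19.0, "r": 18.9, "i": 18.8, "z": 18.7}
red_source = {"u": 22.0, "g": 19.5, "r": 18.9, "i": 18.6, "z": 18.4}
print(in_colour_window(blue_source, no_ext), in_colour_window(red_source, no_ext))
```

The second hypothetical source fails only because its u−g colour (2.5) is far redder than the window allows.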
Table 9. Detailed description of the cross-matching results and the catalogues used.
Cat. Code    Remarks
2DF    The 2dF-SDSS LRG and QSO survey (Croom et al. 2009a) is a spectroscopic quasar catalogue covering an area of 191.9 deg². There are 30,261 objects in the 2dF catalogue that overlap with the colour space selected for our study. Of these, only 5510 2dF objects have spectroscopic confirmation. Among them, 4247 objects were predicted as quasars, 1113 as stars and the remaining 238 as galaxies by our algorithm. Thus the overall accuracy is 98 per cent.
XBH**    This is the catalogue of 318 radio-quiet, X-ray emitting quasars (RQQ) studied by Kelly et al. (2008). Of the 318 detections, 212 are in the colour region investigated by us. All of them were correctly predicted by our classifier.
ASFS    CRATES, the All-Sky Survey of Flat-Spectrum Radio Sources (Healey et al. 2007), has 14,467 sources, of which 1131 overlap with the region of our analysis. These objects are characterised by flat radio spectra, high optical variability, significant polarisation and bimodal synchrotron/Compton spectral energy distributions. Hence all of them are believed to be AGNs viewed 'pole-on'. The classifier correctly identified 1088 of them as quasars, while it predicted 31 as stars and 12 as galaxies.
BATCS    Zhang et al. (2004) list the optical counterparts of 157 X-ray sources selected using multicolour CCD imaging observations made by the Beijing-Arizona-Taiwan-Connecticut Sky Survey. Of these, 21 fall in the region of our catalogue and all are predicted as quasars, although three of these objects have been identified as starburst galaxies.
CGRBS**    CGRaBS is an all-sky survey of gamma-ray blazar candidates (Healey et al. 2008) selected by their flat radio spectra. Of the 1625 targets, 266 are in our catalogue; 265 of these were correctly classified as quasars, while one was classified as a galaxy.
DLyaQ**    Curran et al. (2002) give a catalogue of 322 damped Lyman alpha absorbers. Of these, 22 appear in our catalogue; 21 of them are correctly identified as quasars, while one was misclassified as a star.
F2QZ**    Cirasuolo et al. (2005) give a sample of faint radio-loud quasars from FIRST. The sample has 238 detections, of which 190 appear in our catalogue. 186 of them are correctly identified as quasars, while 3 were misclassified as stars and one as a galaxy.
KFQS    Maddox et al. (2008) have created a catalogue of 3154 K-band detections of possible quasar candidates together with their spectroscopic classifications. 159 of these objects appear in our catalogue, three of which have no spectral class. 144 of them were labelled as quasars, two as galaxies and 13 as stars.
LQAC    Souchay et al. (2009) have constructed a large quasar astrometric catalogue of 113,666 quasars. Of these, 61,788 overlap with our catalogue. Our classifier correctly detected 61,504 of them, while predicting 267 as stars and 17 as galaxies.
LQRF    Andrei et al. (2009) have constructed a large quasar reference frame of 100,165 quasars observed in different surveys. Of these, 60,513 fall in the region of our catalogue. The classifier correctly identified 60,280 of these objects as quasars, while 219 were labelled as stars and 14 as galaxies.
BZC**    Roma-BZCAT (Massaro et al. 2009) is a catalogue of 2837 blazars, of which 255 have photometric detections by SDSS in the colour space of our catalogue. The classifier correctly detected 249 of them, while 4 were incorrectly labelled as galaxies and 2 as stars.
PCS**    Kuraszkiewicz et al. (2004) provide a catalogue of 220 spectroscopically confirmed AGNs observed with the Faint Object Spectrograph on the Hubble Space Telescope. Of these, 55 i-band bright objects appear in our catalogue, 53 of which are correctly identified as quasars, while 2 were incorrectly labelled as stars.
ROSA**    Suchkov et al. (2006) give 1744 type 1 AGNs that have X-ray observations with the ROSAT PSPC. Of these, 1135 are present in our catalogue. All of these except one were correctly classified as quasars.
SQ13**    Quasars and Active Galactic Nuclei (13th Ed.) (Véron-Cetty & Véron 2010) is a catalogue of 168,941 AGNs (all known prior to July 1st, 2009). 65,673 of these objects have entries in our catalogue, of which 65,223 were correctly identified as quasars, while 395 were labelled as stars and 55 as galaxies.
SQR13    Rejected Quasars and Active Galactic Nuclei (13th Ed.) (Véron-Cetty & Véron 2010) has 178 entries that were previously considered to be quasars and were rejected, mostly as stars. Of these, 28 objects have entries in our catalogue. Our algorithm incorrectly classified 7 of them as quasars, while 21 were correctly labelled as stars.
DR7Q**    Schneider et al. (2010) give the fifth edition of the SDSS quasar catalogue, consisting of 105,783 spectroscopically confirmed quasars. Of these, the colour space covered by our catalogue contains 79,498 quasars. Our classifier correctly identified 79,140 quasars, while it labelled 17 as galaxies and 341 as stars.
SSSC    The Skiff catalogue of Stellar Spectral Classifications (Skiff 2009) is a compilation of 423,055 stellar objects from the literature. Of these, 1255 have entries in our catalogue. Our classifier labelled 82 objects as quasars, of which 2 have been identified as quasars in the SDSS DR7 quasar catalogue. Two objects were labelled as galaxies, while the remaining 1171 were correctly identified as stars.
SSA13    Fomalont et al. (2006) prepared a radio/optical catalogue of the SSA 13 field that has 878 radio sources, of which only 6 have entries in our catalogue. All the objects except one were correctly picked by our classifier.
XMMSS    The second XMM-Newton Serendipitous Source catalogue (Watson et al. 2009) has 3504 point sources, and 42 of these are in our catalogue. 37 of them were predicted as quasars (one of which is an emission-line galaxy) and 5 as stars.
SDSS/XMM    Young et al. (2009) give the optical quasar candidates of 792 X-ray sources observed serendipitously with XMM-Newton. 580 of these objects appear in our catalogue, and all of them were correctly identified by the classifier.
RASS/2MASS    Haakonsen & Rutledge (2009) give a catalogue of 18,806 X-ray sources in the RASS/BSC associated with near-infrared counterparts from the 2MASS/PSC. Of these, 6 objects appear in our catalogue, and all are labelled as quasars by our classifier.
CAIXA    The catalogue of AGN in the XMM-Newton archive (Bianchi et al. 2009) has 156 radio-quiet, X-ray unobscured AGNs, of which 16 appear in our catalogue. All of them were classified as quasars.
WDMB    Heller et al. (2009) give 857 white dwarf + M dwarf binaries from SDSS DR6, of which 126 are present in our catalogue. 106 of them were labelled as stars, while 20 were labelled as quasars. One object predicted as a quasar is a confirmed quasar from the SDSS DR7 quasar catalogue. White dwarfs are known contaminants in photometric quasar catalogues, which explains the relatively low classification accuracy of 84 per cent in this case.
PMS    Gould & Kollmeier (2004) prepared a catalogue of proper motions of 390,476 stars from SDSS and USNO-B observations. 20,241 objects from it are present in our catalogue. Our classifier labelled 19,596 of the objects as stars, 639 as quasars and 6 as galaxies. Out of the 623 quasars predicted, 18 are confirmed quasars.
GLQ    Oguri et al. (2008) give 4 gravitationally lensed quasars from the SDSS Quasar Lens Search, a systematic survey of lensed quasars among the SDSS spectroscopic quasars. Of these, two fall in the region of our catalogue, and both are correctly predicted by the classifier.
CNDWF    The Chandra XBoötes Survey optical counterpart catalogue (Brand et al. 2006) has 5318 point sources, and 392 of them are in our catalogue. The true class of these objects is not known. The classifier labelled 365 as quasars, 23 as stars and 4 as galaxies.
ARC    Astrometric positions of radio sources (Aslan et al. 2010) gives the positions of about 300 extragalactic radio sources. Of these, 38 are in our catalogue. All of them were classified as quasars.
XMM2iS    The XMM-Newton Second Incremental Source Catalogue contains 221,012 X-ray sources. Of these, 3,490 overlap with our catalogue. Our classifier identified 3,176 objects as quasars, 287 as stars and 27 as galaxies.
ROSAT-FSC    Véron-Cetty et al. (2004) give optically selected bright AGN samples from the ROSAT Faint Source Catalogue. Of the 103 quasar candidates in this sample, 33 are present in our catalogue. 24 objects were predicted as quasars by the classifier and 9 as stars.
XMMCOSMOS    The XMM-Newton wide-field survey in the COSMOS field (Cappelluti et al. 2009) gives 1887 point-like X-ray sources. 96 objects are present in our catalogue. Of these, 88 were predicted as quasars and 8 as stars.
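The accuracy column of Table 5 appears to correspond to (matched objects − failures) / matched objects. This definition is an assumption inferred from the table rather than stated in the text. The sketch below applies it to four rows of Table 5 and reproduces their quoted percentages after rounding; a few other rows (for example KFQS) do not reproduce exactly under this simple reading. The names are illustrative.

```python
# Counts copied from Table 5: ((quasar, galaxy, star) predictions, (quasar, galaxy, star) failures).
TABLE5 = {
    "2DF":   ((5976, 238, 1535), (122, 0, 52)),
    "SQR13": ((7, 0, 21), (7, 0, 0)),
    "WDMB":  ((20, 0, 106), (20, 0, 0)),
    "PMS":   ((639, 6, 19596), (639, 6, 0)),
}


def accuracy_percent(predicted, failures):
    """ASSUMED definition: (matched objects - failures) / matched objects, in per cent."""
    n_matched = sum(predicted)
    return 100.0 * (n_matched - sum(failures)) / n_matched


accuracies = {code: round(accuracy_percent(*counts)) for code, counts in TABLE5.items()}
print(accuracies)
```

For instance, the 2DF row gives 7575 correct out of 7749 matched objects, or 97.75 per cent, which rounds to the 98 per cent listed.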
REFERENCES
Abazajian, K. N., et al. 2009, ApJS, 182, 543
Adelman-McCarthy, J. K., et al. 2008, ApJS, 175, 297
Andrei, A. H., et al. 2009, A&A, 505, 385
Aslan, Z., Gumerov, R., Jin, W., Khamitov, I., Maigurova, N., Pinigin, G., Tang, Z., & Wang, S. 2010, A&A, 510, A10
Bianchi, S., Guainazzi, M., Matt, G., Fonseca Bonilla, N., & Ponti, G. 2009, A&A, 495, 421
Brand, K., et al. 2006, ApJ, 641, 140
Cappelluti, N., et al. 2009, A&A, 497, 635
Cirasuolo, M., Magliocchetti, M., & Celotti, A. 2005, MNRAS, 357, 1267
Covey, K. R., et al. 2007, AJ, 134, 2398
Croom, S. M., et al. 2009a, MNRAS, 392, 19
Croom, S. M., et al. 2009b, MNRAS, 399, 1755
Curran, S. J., Webb, J. K., Murphy, M. T., Bandiera, R., Corbelli, E., & Flambaum, V. V. 2002, PASA, 19, 455
Ellison, S. L., Murphy, M. T., & Dessauges-Zavadsky, M. 2009, MNRAS, 392, 998
Fomalont, E. B., Kellermann, K. I., Cowie, L. L., Capak, P., Barger, A. J., Partridge, R. B., Windhorst, R. A., & Richards, E. A. 2006, ApJS, 167, 103
Fukugita, M., Ichikawa, T., Gunn, J. E., Doi, M., Shimasaku, K., & Schneider, D. P. 1996, AJ, 111, 1748
Goderya, S., Andreasen, J. D., & Philip, N. S. 2004, in Proceedings of Astronomical Data Analysis Software and Systems (ADASS) XIII, 314, 617
Gould, A., & Kollmeier, J. A. 2004, ApJS, 152, 103
Gunn, J. E., et al. 1998, AJ, 116, 3040
Gunn, J. E., et al. 2006, AJ, 131, 2332
Haakonsen, C. B., & Rutledge, R. E. 2009, ApJS, 184, 138
Healey, S. E., Romani, R. W., Taylor, G. B., Sadler, E. M., Ricci, R., Murphy, T., Ulvestad, J. S., & Winn, J. N. 2007, ApJS, 171, 61
Healey, S. E., et al. 2008, ApJS, 175, 97
Heller, R., Homeier, D., Dreizler, S., & Østensen, R. 2009, A&A, 496, 191
Kelly, B. C., Bechtold, J., Trump, J. R., Vestergaard, M., & Siemiginowska, A. 2008, ApJS, 176, 355
Koo, D. C., & Kron, R. G. 1988, ApJ, 325, 92
Kuraszkiewicz, J. K., Green, P. J., Crenshaw, D. M., Dunn, J., Forster, K., Vestergaard, M., & Aldcroft, T. L. 2004, ApJS, 150, 165
Lupton, R. H., Gunn, J. E., & Szalay, A. S. 1999, AJ, 118, 1406
Maddox, N., Hewett, P. C., Warren, S. J., & Croom, S. M. 2008, MNRAS, 386, 1605
Massaro, E., Giommi, P., Leto, C., Marchegiani, P., Maselli, A., Perri, M., Piranomonte, S., & Sclavi, S. 2009, A&A, 495, 691
Niemack, M. D., et al. 2009, ApJ, 690, 89
Odewahn, S. C., Cohen, S. H., Windhorst, R. A., & Philip, N. S. 2002, ApJ, 568, 539
Oguri, M., et al. 2008, AJ, 135, 520
Oyaizu, H., et al. 2008, ApJ, 674, 768
Padmanabhan, N., et al. 2008, ApJ, 674, 1217
Philip, N. S., & Joseph, K. B. 2000, Journal of Intelligent Data Analysis, 4, 463
Philip, N. S., Wadadekar, Y., Kembhavi, A., & Joseph, K. B. 2002, A&A, 385, 1119
Richards, G. T., et al. 2002, AJ, 123, 2945
Richards, G. T., et al. 2004, ApJS, 155, 257
Richards, G. T., et al. 2009, ApJS, 180, 67
Richards, G. T., et al. 2009, AJ, 137, 3884
Schneider, D. P., et al. 2007, AJ, 134, 102
Schneider, D. P., et al. 2010, AJ, 139, 2360
Sinha, R. P., Philip, N. S., Kembhavi, A. K., & Mahabal, A. A. 2007, Highlights of Astronomy, 14, 609
Skiff, B. A. 2009, VizieR Online Data Catalog, 1, 2023
Souchay, J., et al. 2009, A&A, 494, 799
Stoughton, C., et al. 2002, AJ, 123, 485
Suchkov, A. A., Hanisch, R. J., Voges, W., & Heckman, T. M. 2006, AJ, 132, 1475
Véron-Cetty, M.-P., et al. 2004, A&A, 414, 487
Véron-Cetty, M.-P., & Véron, P. 2010, VizieR Online Data Catalog, 7258, 0
Watson, M. G., et al. 2009, A&A, 493, 339
York, D. G., et al. 2000, AJ, 120, 1579
Young, M., Elvis, M., & Risaliti, G. 2009, ApJS, 183, 17
Zhang, H., et al. 2004, AJ, 127, 2579