Finding Quasars behind the Galactic Plane. I. Candidate Selections with Transfer Learning

Abstract

Quasars behind the Galactic plane (GPQs) are important astrometric references and useful probes of Milky Way gas. However, the search for GPQs is difficult due to large extinctions and high source densities in the Galactic plane. Existing selection methods for quasars developed using high Galactic latitude (high-b) data cannot be applied to the Galactic plane directly because the photometric data obtained from high-b regions and the Galactic plane follow different probability distributions. To alleviate this dataset shift problem for quasar candidate selection, we adopt a Transfer Learning Framework at both data and algorithm levels. At the data level, to make a training set in which dataset shift is modeled, we synthesize quasars and galaxies behind the Galactic plane based on SDSS sources and a Galactic dust map. At the algorithm level, to reduce the effect of class imbalance, we transform the three-class classification problem for stars, galaxies, and quasars into two binary classification tasks. We apply the XGBoost algorithm to Pan-STARRS1 (PS1) and AllWISE photometry for classification, and an additional cut on Gaia proper motion to remove stellar contaminants. We obtain a reliable GPQ candidate catalog with 160,946 sources located at |b|\leq 20^{\circ} in the PS1-AllWISE footprint. Photometric redshifts of GPQ candidates obtained with the XGBoost regression algorithm show that our selection method can identify quasars in a wide redshift range (0<z\lesssim5). This study extends the systematic searches for quasars to the dense stellar fields and shows the feasibility of using astronomical knowledge to improve data mining under complex conditions in the Big Data era.


Draft version February 22, 2021

Typeset using LaTeX default style in AASTeX63

Finding Quasars behind the Galactic Plane. I. Candidate Selections with Transfer Learning

Yuming Fu (1, 2), Xue-Bing Wu (1, 2), Qian Yang, Anthony G. A. Brown, Xiaotong Feng (1, 2), Qinchun Ma (1, 2), and Shuyan Li

1 Department of Astronomy, School of Physics, Peking University, Beijing 100871, China
2 Kavli Institute for Astronomy and Astrophysics, Peking University, Beijing 100871, China
3 Department of Astronomy, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
4 Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA, Leiden, The Netherlands

(Received 10 Sep, 2020; Revised 30 Jan, 2021; Accepted 18 Feb, 2021)

Submitted to ApJS

Keywords:

Active galactic nuclei (16), Astrostatistics techniques (1886), Catalogs (205), Classification (1907), Galactic and extragalactic astronomy (563), Quasars (1319)

1. INTRODUCTION

The Galactic plane has long been the “zone of avoidance” for extragalactic astronomy, including quasar surveys. The Half Million Quasar (HMQ; Flesch 2015) catalog contains a total of 510,764 objects, but only 35,105 are located at |b| ≤ 30° (half of the whole sky area), 3,730 at |b| ≤ …°, and 255 at |b| ≤ …°. Although it is difficult to search for quasars behind the Galactic plane (hereafter GPQs), such quasars are important references for astrometry and useful probes of Milky Way gas.

Quasars are used as astrometric references due to their small parallaxes and proper motions. GPQs enable the accurate measurement of positions, distances, and proper motions of stars in the Galactic disk, which is key to understanding our own Galaxy. The high-precision astrometry provided by the Gaia mission defines a celestial reference frame through the positions of 556,869 candidate quasars; however, only a tiny fraction of these quasars are located at |b| ≤ …° (Gaia Collaboration et al. 2018a). A large sample of GPQs will help build a better reference frame in the optical, through direct coverage of the sky in the Galactic plane, and will help to better understand the systematic astrometry errors of Gaia in the Galactic plane region (Arenou et al. 2018).

Line-of-sight absorption towards quasars can probe gas structures of the Milky Way. While quasars at high Galactic latitude have been useful in studying the Milky Way halo gas (e.g. Savage et al. 1993, 2000; Ben Bekhti et al. 2008, 2012), GPQs allow absorption line studies of gaseous structures in the Galactic plane (e.g. Anti-Center Shell, H Complex; see Westmeier 2018).
Moreover, a high-density sample of GPQs can map the gas distribution with a higher angular resolution than is possible with the 21 cm surveys.

Another application of GPQs is adaptive-optics observation of quasar host galaxies, which is made possible by their proximity to nearby bright stars that serve as natural guide stars (Im et al. 2007; Fischer et al. 2019). For adaptive optics, natural guide stars should be located within a few arcseconds of the science target, which rarely occurs outside of the Galactic plane but is more common in the plane.

The difficulty of finding quasars behind the Galactic plane is caused by several challenges, including:

• In comparison to objects at high Galactic latitude (high-b), sources in the Galactic plane suffer from higher extinction and reddening. As a result, many sources (especially extragalactic sources) cannot be detected within the survey detection limit. For other detectable sources, their colors are different from those at high Galactic latitude.
• The source density in the Galactic plane is high. The quality of photometry can be worse in dense regions, because sources can be easily contaminated by visible or unseen neighbors.
• A lot of “unusual” stars are located within the Galactic plane, including some white dwarfs, M/L/T dwarfs, and Young Stellar Objects (YSOs), that share many similar observational properties with quasars. These sources can be contaminants for quasars at different redshifts (e.g. Kirkpatrick et al. 1997; Vennes et al. 2002; Chiu et al. 2006; Kozłowski & Kochanek 2009).

Since the first identification of a quasar (3C 273; Schmidt 1963), many methods for quasar candidate selection have been developed, including ultraviolet excess (e.g. Sandage 1965; Green et al. 1986), radio sources (e.g. Gregg et al. 1996; White et al. 2000; Becker et al. 2001), X-ray sources (e.g. Pounds 1979; Grazian et al. 2000), optical/near-infrared (near-IR) colors (e.g. Richards et al. 2002; Fan et al. 2001; Wu & Jia 2010), mid-IR colors (e.g.
Lacy et al. 2004; Stern et al. 2005, 2012; Mateos et al. 2012; Wu et al. 2012; Yan et al. 2013), and quasar variability (e.g. Dobrzycki et al. 2003; Palanque-Delabrouille et al. 2011). In addition, tools based on statistical machine learning (e.g. Richards et al. 2004; Bovy et al. 2011) and deep learning (e.g. Yèche et al. 2010; Pasquet-Itam & Pasquet 2018) have also been established to find quasars with various data that are available.

A few studies have focused on finding quasars/AGNs behind dense stellar fields such as the Galactic plane, the Magellanic Clouds, and the M31 and M33 galaxies. Most of these studies used infrared selection methods to efficiently find quasars. For example, Im et al. (2007) discovered 40 bright quasars at |b| ≤ …° by applying the combination of a near-IR color cut of J − K > … […] Gaia Data Release 2 (Gaia DR2; Gaia Collaboration et al. 2016, 2018b) as stars, quasars, and galaxies with a Gaussian Mixture Model and addressed the problem of class imbalance in Gaia DR2.

However, the studies listed above either treated sources in the Galactic plane and at high Galactic latitude as the same, or removed the Galactic plane from consideration. Selection methods for quasars at high Galactic latitude are not generic and cannot be applied to the Galactic plane directly, because data (e.g. PS1 and AllWISE photometry) at high-b and low-b follow different probability distributions. For example, apparent colors of quasars (stars) vary from high-b to low-b regions, and so does the source density of quasars (stars). Such behavior of data is a kind of non-stationarity called dataset shift (Quiñonero-Candela et al. 2009), which leads to significant estimation bias in supervised machine learning algorithms. The color cuts for quasar selection can also be regarded as simple decision tree models in the machine learning regime. Previous color cuts obtained from high Galactic latitude regions fail in the Galactic plane due to the dataset shift.

To deal with these dataset shift problems, transfer learning (Pan & Yang 2009) has been proposed and studied extensively by data scientists. The idea of transfer learning is to use knowledge gained in one problem and apply it to a different but related problem. Although spectroscopically identified (i.e. “labeled”) samples of extragalactic objects are inadequate in the Galactic plane, such labeled samples are available at high Galactic latitude. The labeled data make it possible to build a good selection method for GPQs, once the knowledge transfer from high Galactic latitude to low Galactic latitude is successful.

This paper is the first of this series on finding GPQs. In this paper we present a transfer learning method for quasar selection, as well as a GPQ candidate catalog with 160,946 sources. In Section 2, we introduce the archival data used for this study. In Section 3, we describe the algorithm design for GPQ selection. In Section 4, we synthesize quasars and galaxies behind the Galactic plane with extragalactic objects at high Galactic latitude from the Sloan Digital Sky Survey (SDSS; York et al. 2000), to make a training set in which dataset shift is modeled. In Section 5, we transform the three-class classification problem for stars, galaxies and quasars into two binary classification tasks (stars versus extragalactic objects, and quasars versus galaxies) to reduce the class imbalance and class-balance change. In Section 6, we calculate the photometric redshifts for GPQ candidates. In Section 7, we present the GPQ candidate catalog and some statistical properties of the sample. We summarize the results in Section 8. Throughout this paper, we use AB magnitudes for PS1 photometry and Vega magnitudes for AllWISE photometry unless otherwise mentioned.

2. DATA

We make use of optical and infrared photometric data from PS1 and AllWISE, and astrometric data from Gaia DR2. We also retrieve samples of spectroscopically identified objects from SDSS and LAMOST.

2.1. PS1 DR1 photometry

Pan-STARRS1 (PS1; Chambers et al. 2016) has carried out a set of synoptic imaging sky surveys including the 3π Steradian Survey and the Medium Deep Survey in 5 bands (grizy_P1). The mean 5σ point source limiting sensitivities in the stacked 3π Steradian Survey in (grizy_P1) are (23.3, 23.2, 23.1, 22.3, 21.4) and the single epoch 5σ depths in (grizy_P1) are (22.0, 21.8, 21.5, 20.9, 19.7). For better astrometry in the crowded Galactic plane field, we use mean coordinates from the PS1 MeanObject table. Mean PSF magnitudes are used for all bands (grizy_P1), and mean Kron magnitudes (Kron 1980) are used for the i_P1 and z_P1 bands. The Galactic extinction coefficients for (grizy_P1) are R_g, R_r, R_i, R_z, R_y = …, …, …, …, … […] R_λ = A_λ/A_V × R_V, where A_λ/A_V is the relative extinction value for band λ given by a new optical to mid-IR extinction law (Wang & Chen 2019), and R_V = 3.1.

[…] (grizy_P1 > 0) and significantly detected in i_P1 (error in PSF magnitude of the i_P1 band i_err < …; i_P1-band SNR larger than 5); (ii) not too bright in i_P1 to avoid possible saturation (i > …); […] i_P1 and z_P1 bands (i_Kron > …, z_Kron > …). […] we use (g, r, i, z, y) to represent the PSF magnitudes of the PS1 bands (grizy_P1) in color indexes (e.g. g − r, g − W1) and derived quantities i − i_Kron and z − z_Kron. The z_P1 PSF magnitude does not appear alone and will not be confused with the redshift symbol z.

2.2. AllWISE photometry for point-like sources

The AllWISE catalog is built upon the work of the Wide-field Infrared Survey Explorer mission (WISE; Wright et al. 2010) by combining data from the WISE cryogenic and NEOWISE (Mainzer et al. 2011) post-cryogenic surveys. WISE has 4 bands at 3.4, 4.6, 12, and 22 µm (W1, W2, W3, and W4). The 5σ limiting magnitudes of the AllWISE catalog in the W1, W2, W3, and W4 bands are 19.6, 19.3, 16.7, and 14.6 mag. The Galactic extinction coefficients for W1, W2, W3 used in this study are R_W1, R_W2, R_W3 = …, …, … […] A_λ/A_V values from Wang & Chen (2019).
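As a short worked step (a sketch, assuming only the standard relation A_V = R_V E(B−V) together with the definition R_λ = A_λ/A_V × R_V given in Section 2.1), the extinction in any PS1 or AllWISE band follows directly from the reddening at a source position:

```latex
\begin{aligned}
R_\lambda &= \frac{A_\lambda}{A_V}\, R_V, \qquad A_V = R_V\, E(B-V), \\
A_\lambda &= \frac{A_\lambda}{A_V}\, A_V
           = \frac{A_\lambda}{A_V}\, R_V\, E(B-V)
           = R_\lambda\, E(B-V).
\end{aligned}
```

With this form, a dust-map value of E(B−V) and the tabulated coefficient R_λ are enough to redden or deredden a magnitude in band λ.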

We cross-match the PS1 sources with AllWISE using a radius of 1″ to avoid source confusion in the dense fields of the Galactic plane. We also set a few constraints on the AllWISE data. All sources should be: (i) AllWISE point sources (ext_flg = 0); (ii) not too bright to avoid possible saturation (W… > …, W… > …); […] (W…snr > …, W…snr > …); […] (cc_flags = “0000”); (v) unblended with nearby detections, so that only one component is used in each profile-fitting for each source (nb = 1).

2.3. Gaia DR2 astrometry

Gaia DR2 (Gaia Collaboration et al. 2016, 2018b) contains celestial positions and the apparent brightness in the G band for approximately 1.… […] G_BP (330–680 nm) and G_RP (630–1050 nm) bands are available for 1.… […] Gaia DR2 catalog (columns pmra, pmra_error, pmdec, and pmdec_error) to find quasars.

2.4. SDSS Quasar Catalog: the fourteenth data release

SDSS (York et al. 2000) has mapped the high Galactic latitude northern sky and obtained imaging as well as spectroscopy data for millions of objects including stars, galaxies, and quasars. The 14th data release of the SDSS Quasar Catalog (SDSS DR14Q; Pâris et al. 2018) contains 526,356 quasars. We cross-match the DR14Q catalog with PS1 and AllWISE, both with a radius of 1″. To ensure the data quality, we use the same constraints as in Section 2.1 and Section 2.2 to retrieve a subset of DR14Q. This subset has 289,271 sources and is denoted as GoodQSO hereafter. As can be seen from the HEALPix (Górski et al. 2005) density map of GoodQSO (Figure 1), very few sources of GoodQSO are located at |b| ≤ …°.

[Figure 1 image: density plot of GoodQSO sources (N_side = 64); color bar: Density [deg⁻²].]

Figure 1. HEALPix density map of GoodQSO sources from SDSS DR14Q (in Galactic coordinate system) with a median density of 20.3 deg⁻². The HEALPix parameter N_side = 64 and the sky area per pixel is 0.839 deg².

2.5. SDSS spectroscopically identified stars and galaxies

In order to compare high-b sources with the Galactic plane sources that we use, a sample of stars and a sample of galaxies are extracted from the SpecPhotoAll table of SDSS Data Release 15 (Blanton et al. 2017; Aguado et al. 2019). We cross-match both the star and galaxy samples with PS1 and AllWISE with a radius of 1″. The SDSS star sample has 23,693 sources. We also apply the quality constraints in Section 2.1 and Section 2.2 to select a galaxy subset with good photometry for later use. The resulting subset of galaxies (denoted as GoodGal hereafter) has 1,635,053 sources. Most SDSS stars and galaxies are located at high Galactic latitude (|b| > …°).

2.6. Stars from LAMOST general catalog

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST, also called the Guoshoujing Telescope) is a special reflecting Schmidt telescope, the design of which allows both a large effective aperture of 3.6 m–4.9 m and a […]° (Wang et al. 1996; Su & Cui 2004; Cui et al. 2012). The LAMOST spectral survey (Zhao et al. 2012; Luo et al. 2012; Luo et al. 2015) consists of two major components, i.e. the LAMOST Experiment for Galactic Understanding and Exploration (LEGUE; Deng et al. 2012), and the LAMOST ExtraGAlactic Survey (LEGAS). The LEGUE observes stars in different sky regions with different magnitude ranges, including the Galactic halo with r < … mag at |b| > …°, the Galactic anti-center with 14.… < r < … mag at …° ≤ l ≤ …° and |b| < …° (Yuan et al. 2015), as well as the Galactic disk with r ≲ 16 mag at |b| ≤ …° with uniform coverage along Galactic longitude. The LEGAS mainly identifies galaxies and quasars that are within the SDSS footprint but complementary to the SDSS spectroscopic samples (e.g. Shen et al. 2016; Yao et al. 2019). Nevertheless, extragalactic objects in the LEGUE plates are also targets of the LEGAS. The LAMOST spectral survey has obtained the largest stellar spectra sample to date. We retrieve the star sample from the LAMOST general catalog from DR1 to DR7v0. A total of 3,940,076 LAMOST stars meet the same constraints as in Section 2.1 and Section 2.2. From this LAMOST star sample, we select 1,334,577 Galactic plane stars with |b| ≤ 20° (denoted as T_Star hereafter). Most T_Star sources are from the LEGUE survey and are brighter than 18 mag in the i_P1 band.

2.7. The Million Quasars (Milliquas) Catalog

The Million Quasars (Milliquas) Catalog (Flesch 2019) is a compilation of quasars and quasar candidates from the literature. The Milliquas v6.4c update includes 758,908 type-I QSOs and AGN up to 31 December 2019. We use this catalog to extract the extant GPQ sample within the PS1 footprint. There are 4,344 quasars (with the “Q” label in the “Descrip” column) located at |b| ≤ 20° in Milliquas v6.4c. Cross-matching these 4,344 known GPQs with PS1 and AllWISE, both with a radius of 1″, gives 2,757 sources. After applying the same constraints as in Section 2.1 and Section 2.2, we get a subset of 1,853 sources. This Galactic plane subset of Milliquas quasars, denoted as MLQSUB, will be used later for candidate validation.

3. DESIGN OF THE TRANSFER LEARNING FRAMEWORK

3.1. Dataset shift problem in the Galactic plane

The task of quasar selection can be described as a classification problem in machine learning. Here we look into the three-class classification for stars, galaxies, and quasars with photometric data. The learning process requires two independent datasets for model training and model validation, respectively. Training and validation sets can be two nonoverlapping subsets from a common parent sample with both features (colors and/or magnitudes) and class labels (star, galaxy, and quasar). Usually the class labels are given by spectroscopic identifications. The classification algorithm learns a mapping relation from features to class labels with the training set. Often, the trained classification model (classifier) is applied to another dataset without class labels (i.e. no spectroscopic identifications), which is called the application set or test set. The classifier takes features from the test set as inputs X (a.k.a. covariates) and gives class labels as outputs Y.

A basic assumption for traditional machine learning is that training and test data follow the same probability distribution (Bishop 2006; Hastie et al. 2009; Vapnik 2013). However, this assumption no longer holds if we use high-b data for model training and low-b data for application, because the joint distribution of inputs and outputs P(X, Y) differs between training and test data (i.e. dataset shift; Quiñonero-Candela et al. 2009).

For our GPQ selections, the dataset shift includes changes in both source colors and prior probabilities of different classes. Sources in the Galactic plane become fainter and redder than those at high Galactic latitude due to greater reddening, which changes the distribution of input features and the conditional probability of the output labels given the inputs P(Y|X) (i.e. covariate shift; Shimodaira 2000; Sugiyama & Kawanabe 2012).
Prior probabilities of stars are much higher than those of quasars (and galaxies) in the Galactic plane, which means the marginal probability P(Y) differs from that at high Galactic latitude (i.e. class-balance change; Saerens et al. 2002; Du Plessis & Sugiyama 2014). Moreover, the class ratio between extragalactic objects and stars may vary significantly from one place to another in the Galactic plane, which we refer to as “internal” class-balance change of the test data.

Transfer learning can be applied to improve the learning performance under dataset shift from a source domain to the target domain (see a review in Pan & Yang 2009), where a domain is a set D that consists of a feature space 𝒳 and a marginal probability distribution P(X), D = {𝒳, P(X)}. For our classification task, the source domain data are from high Galactic latitude (|b| > 20°) and the target domain data are from the Galactic plane (|b| ≤ 20°). In this study we only consider areas at δ > −30° due to the limit of the PS1 survey coverage. A comparison of some properties of the source and target domains is given in Table 1.
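The kinds of shift named in this section can be written compactly. The block below is a sketch in which the subscripts S and T (source and target domain) are notation introduced here for illustration, not symbols from the text:

```latex
\begin{aligned}
P(X, Y) &= P(Y \mid X)\, P(X) = P(X \mid Y)\, P(Y), \\
\text{dataset shift:}\quad & P_{\mathrm{S}}(X, Y) \neq P_{\mathrm{T}}(X, Y), \\
\text{change in the input distribution:}\quad & P_{\mathrm{S}}(X) \neq P_{\mathrm{T}}(X), \\
\text{class-balance change:}\quad & P_{\mathrm{S}}(Y) \neq P_{\mathrm{T}}(Y).
\end{aligned}
```

The first line is the usual factorization of the joint distribution; the remaining lines state which factor differs between the high-b (source) and Galactic plane (target) data for each kind of shift discussed above.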

Table 1. Comparison of the two domains of learning

Domains of learning   Location    Labels of stars   Labels of quasars/galaxies   Internal class-balance change
Source domain         |b| > 20°   Available         Available                    Moderate
Target domain         |b| ≤ 20°   Available         Unavailable                  Severe

As large numbers of stars, quasars and galaxies have been spectroscopically identified at |b| > 20°, labels for these three classes are available in the source domain. Since spectroscopically identified samples of quasars and galaxies are significantly lacking at |b| ≤ 20°, labels for these two classes are unavailable in the target domain. Nevertheless, labels of many stars in the target domain are available with the help of the LAMOST spectroscopic survey.

According to the classification scheme for different settings of transfer learning by Pan & Yang (2009), the set-up of classification in the Galactic plane can be categorized as Transductive Transfer Learning, where source domain labels are available and target domain labels are unavailable. A popular approach to Transductive Transfer Learning is Feature-based Transfer (e.g. Blitzer et al. 2006; Argyriou et al. 2006), which reduces the difference between the source and target domains through feature transformation in either one or both of the domains.

To solve the dataset shift problem of classification in the Galactic plane, we borrow the idea of Feature-based Transfer Learning. Using the mapping relation between the features of high-b and low-b objects, we can generate mock samples of quasars and galaxies in the Galactic plane to simulate the covariate change of their colors and magnitudes. The LAMOST Galactic plane stars also contribute to a more accurate probability distribution of data in the target domain. To reduce the effect of class-balance change, we manually go through two binary classification steps rather than running a three-class classification algorithm only once.

3.2. Modelling covariate change with mock samples

As data of LAMOST Galactic plane stars are available, we only focus on reducing the differences in features of extragalactic objects between training and test data. For our classification problem, all features will be constructed with photometric data from PS1 and AllWISE. We assume that the differences in photometric properties between extragalactic objects in the Galactic plane and those off the plane are only caused by different extinctions/reddening along their sight-lines. In this way, we can simply generate mock extragalactic objects behind the Galactic plane with data obtained at high Galactic latitude, using the mapping relation determined by the Galactic extinction law and the Galactic dust map.

The covariate change can then be shown as the color change of extragalactic objects on a set of color-color diagrams. Our classification will perform better by adding mock samples of quasars and galaxies behind the Galactic plane into the training set.

3.3. Dealing with class imbalance and class-balance change in machine learning

With the data-level improvements above, the covariate change can be reduced. Efforts on the algorithm level are required to handle the class imbalance and class-balance change. During the GPQ selections, instead of performing a star-quasar binary classification, we additionally take galaxies into account and perform a three-class classification.

Many machine learning software packages support multi-class classification jobs by transforming the task into multiple binary classification problems. However, the built-in treatment is often inflexible and sometimes destructive when dealing with class imbalance problems. For example, in the scenario of using the one-vs-rest (also known as one-vs-all) strategy for multi-class classification, at some stages, samples of one class are regarded as the positive samples while all samples of other classes are regarded as negative samples. Even if all the classes in the training set have the same sample size, the binary classification situation is imbalanced as the positive class (the “one”) has fewer samples than the negative class (the “rest”). In our case, the GPQ set (in both the training and test sets) has significantly fewer samples than the sets of galaxies and stars, thus severe class imbalance will happen.

To reduce the disadvantage of the one-vs-rest strategy, which is commonly used in machine learning algorithms, we convert this three-class classification problem into two binary classification problems manually. In the first step, the Galactic plane sources are classified into two classes: stars and extragalactic objects. Extragalactic objects are then classified into quasars and galaxies in the second step.
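The two-step scheme and the one-vs-rest imbalance described above can be illustrated with a minimal Python sketch. It is not the classifier used in this work: the two scorers are hypothetical callables standing in for any trained binary classifier, and the 0.5 probability threshold is an assumption made only for the example.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: a binary scorer maps a feature vector to the
# probability of its positive class (any trained binary classifier would do).
BinaryScorer = Callable[[Sequence[float]], float]


def classify_two_step(
    features: Sequence[float],
    p_extragalactic: BinaryScorer,
    p_quasar: BinaryScorer,
    threshold: float = 0.5,  # assumed cut, not a value from the text
) -> str:
    """Step 1: star vs. extragalactic object; step 2: quasar vs. galaxy."""
    if p_extragalactic(features) < threshold:
        return "star"
    if p_quasar(features) >= threshold:
        return "quasar"
    return "galaxy"


def one_vs_rest_ratio(class_sizes: Dict[str, int], positive: str) -> float:
    """Positive-to-negative sample ratio when `positive` is the 'one' class."""
    n_pos = class_sizes[positive]
    n_neg = sum(n for name, n in class_sizes.items() if name != positive)
    return n_pos / n_neg


# Toy usage: the two "classifiers" simply read off hypothetical feature values.
toy_star_vs_exgal = lambda f: f[0]
toy_quasar_vs_gal = lambda f: f[1]
print(classify_two_step([0.9, 0.8], toy_star_vs_exgal, toy_quasar_vs_gal))
print(one_vs_rest_ratio({"star": 100, "galaxy": 100, "quasar": 100}, "quasar"))
```

The second helper makes the imbalance argument concrete: with three equally sized classes, the "one" class is outnumbered two to one by the "rest", and the ratio only gets worse when the positive class is itself the smallest.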
By combining the two minority classes of quasar and galaxy into […]

4. MOCK CATALOGS FOR QUASARS AND GALAXIES BEHIND THE GALACTIC PLANE

In order to construct training samples for extragalactic objects, as well as to understand their covariate shift from high Galactic latitude to the Galactic plane, we synthesize quasars and galaxies behind the Galactic plane using the GoodQSO and GoodGal samples. The synthesis is plausible if we assume the distribution of quasars on the celestial sphere is homogeneous and isotropic on large scales, just as the cosmological principle suggests. In this modeling process, we not only observe the changes in colors of quasars and galaxies as they are placed at low Galactic latitudes, but also get a rough estimate of the sky distributions of the sources that could be detected by a certain sky survey.

4.1. Synthesizing procedures

Let E be a set of extragalactic objects (E can be GoodQSO or GoodGal). The synthesis process consists of the following steps.

1. Correcting for extinctions. Extinctions of objects in set E are corrected according to a two-dimensional dust map provided by Planck Collaboration et al. (2014, hereafter Planck14), and the optical to mid-IR extinction law from Wang & Chen (2019) with R_V = 3.1. The E(B−V) values are retrieved using a Python module, dustmaps (Green 2018).

2. Assigning new locations. We generate a random sample of points that are uniformly distributed on the sky with |b| ≤ 20°. The number of these random points is equal to the sample size of E. Coordinates of these points are randomly assigned to objects of E as their new locations. Now we get a new set E_m (MockGPQ, MockGal) without line-of-sight extinctions.

3. Adding new extinctions. We add extinctions to the E_m sample using the Planck14 dust map based on their new (mock) locations.

4. Setting limiting magnitudes. We obtain a subset of E_m by choosing sources brighter than the PS1 single epoch 5σ depths in all PS1 passbands: (grizy_P1) < (22.0, 21.8, 21.5, 20.9, 19.7). This subset, E_gm (GoodMockGPQ, GoodMockGal), represents the “good” mock sample that can be detected by the PS1 survey in all bands. However, we do not apply similar constraints to the AllWISE bands, as the magnitude that corresponds to a 5σ sensitivity varies with location. Also, this extinction-selection effect relies more on the optical survey depth than on the IR survey depth. Factors such as observation strategies and source confusion in dense fields are not taken into consideration in this step. Therefore we may overestimate the detection rate of GPQs (and galaxies) through this synthesis. We select sources within the PS1 footprint (i.e. δ ≥ −30°) and obtain the set E_gm-PS1 (GoodMockGPQ-PS1, GoodMockGal-PS1).

5. Constructing training sets with mock and real data. For mock quasars that are not included in GoodMockGPQ-PS1, their original counterparts (high-b quasars in the input set GoodQSO; denoted as C_QSO) are also added to the training and validation sets along with GoodMockGPQ-PS1. For mock galaxies that are not included in GoodMockGal-PS1, 25% of their original counterparts (high-b galaxies in the input set GoodGal; denoted as C_Gal) are added to the training and validation sets. The resulting quasar and galaxy samples for training and validation are denoted as T_QSO and T_Gal, respectively. T_QSO, T_Gal, and the LAMOST Galactic plane star sample T_Star form the training and validation sets for machine learning classification.

By adding good mock samples and real data (C_QSO and C_Gal) together instead of using only good mock samples as training data for quasars or galaxies, we increase the data diversity as well as the sample size of the training set. This data diversification ensures that the training set can provide more discriminative information for the machine learning model (Gong et al. 2019). In addition, more training data can help reduce overfitting.

The flowchart of the synthesizing procedures is displayed in Figure 2.

In the synthesizing process, we adopt the Planck14 dust map because it detects dust at a greater depth and better estimates the 2D extinctions in the Galactic plane than do the dust maps constructed with stellar photometry (e.g. Green et al. 2018, 2019). We assume a uniform R_V = 3.1; R_V varies slightly in the Galaxy with a dispersion of about 0.18 (Schlafly et al. 2016). Such minor variations in R_V can lead to small uncertainties in the magnitudes and colors of individual mock quasars (galaxies), but have limited impact on the statistical properties of the training sample because mock sources with large extinctions (and thus large uncertainties caused by R_V variations) are removed by the magnitude limits, as we shall see in Section 4.2.

[Figure 2 flowchart boxes: GoodQSO (SDSS DR14Q), GoodGal (SDSS Galaxy); Correcting for extinctions (Planck dust map); Assigning random coordinates with |b| < 20°; Adding new extinctions (Planck dust map); decision (grizy_P1) < (22.0, 21.8, 21.5, 20.9, 19.7) (yes/no); GoodMockGPQ, GoodMockGal; Cross-id; C_QSO, C_Gal; decision dec > −30° (yes/no); GoodMockGPQ-PS1, GoodMockGal-PS1; T_QSO, T_Gal.]

Figure 2. Flowchart of synthesizing procedures for the mock catalogs.

4.2. Synthesizing results and dataset shift

We define the extinction-based selection rate in the Galactic plane as R = |E_gm| / |E|, where |E| is the cardinality, i.e. the number of elements/sources of set E. The source numbers of the input samples are |GoodQSO| = 289,271 and |GoodGal| = 1,635,053. […] |GoodMockGPQ| = 101,482 and |GoodMockGal| = 771,… […] R_GPQ = |GoodMockGPQ| / |GoodQSO| = 0.35 and R_Gal = |GoodMockGal| / |GoodGal| = 0.47, respectively. The selection rate of galaxies is higher than that of GPQs because the input galaxies are on average brighter than the input quasars.

With Step 2 in Section 4.1, sources of MockGPQ and MockGal are randomly and evenly distributed in the Galactic plane (|b| < 20°). But after Step 4, the densities of the remaining sources (GoodMockGPQ and GoodMockGal) are inversely related to the dust map (Figure 3): more extragalactic sources remain detectable in regions with smaller E(B−V), and voids of detection are present in regions with large E(B−V). A sky survey deeper than PS1 might help make up some fraction of the gap in the middle of the Galactic plane. The GoodMockGPQ sample is sparser compared to GoodMockGal, simply because the input quasars are fewer than the input galaxies. Most sources of GoodMockGPQ and GoodMockGal have a line-of-sight color excess of E(B−V) < ….5, which corresponds to an extinction of A_V < … […] R_V = 3.1. The medians of the line-of-sight E(B−V) of GoodMockGPQ-PS1 and GoodQSO are 0.21 and 0.03, respectively. In general, the GoodMockGPQ-PS1 sample has significantly larger E(B−V) compared to GoodQSO (see Figure 4 (a)). Therefore the covariate change for color indexes from high-b to low-b regions cannot be ignored. In addition, GoodMockGPQ-PS1 sources are fainter than GoodQSO sources (Figure 4 (b)).

GoodMockGPQ l b Sky density of

GoodMockGal E ( BV ) D e n s it y [ d e g ] D e n s it y [ d e g ] Figure 3.

Dust extinction map along the Galactic plane retrived from Planck14 (top panel), sky density of

GoodMockGP Q (middle panel) and

GoodMockGal (bottom panel). E ( B V ) P r ob a b ilit y d e n s it y GoodMockGPQ-PS1GoodQSO (SDSS DR14Q) (a)

16 18 20 i P [AB mag] P r ob a b ilit y d e n s it y GoodMockGPQ-PS1GoodQSO (SDSS DR14Q) (b)

Figure 4.

Histograms of (a) line-of-sight E ( B − V ) and (b) i P band magnitudes of GoodMockGP Q -PS1 and

GoodQSO . The i P band magnitudes are not corrected for extinction. A series of color-color diagrams for

GoodQSO and

GoodM ockGP Q -PS1 along with SDSS stars and Galactic planepoint sources are shown in Figure 5 and Figure 6. In Figure 5, from the left to the middle panels, the covariate changeof colors of quasars from high Galactic laititude to the Galactic plane can be directly observed. The Galactic reddeningmakes the cluster of GPQs in a color-color plane extend towards redder colors (to the upper right along the reddeningvector) and scatter more than high- b quasars. The scattering is greater in color indexes of bluer bands, while less atredder bands. This trend is also observable in the quasar evolutionary tracks with E ( B − V ) = 0 , . , .

5. From thetop to the bottom panels in Figure 5, the distance between two quasar evolutionary tracks with diﬀerent reddeningdecreases.The covariate change of stellar colors is also evident from the color-color diagrams. The stellar loci are simple andclear for high- b (SDSS) stars (see Figure 5 (1a, 2a, 3a)). However, additional spikes along the direction of increasing E ( B − V ) appear in the stellar loci of Galactic plane stars due to reddening, as can be seen from Figure 5 (1b, 1c,0 Fu et al. g r [AB mag]0.50.00.51.01.52.0 r i [ A B m a g ] z =0 z =1 z =2 z =3 z =4Quasar evolutionary track E ( B V )=0 01234 R e d s h i f t (1a) g r [AB mag]0.50.00.51.01.52.0 r i [ A B m a g ] z =0 z =1 z =2 z =3 z =4Quasar evolutionary tracks 01234 R e d s h i f t (1b) g r [AB mag]0.50.00.51.01.52.0 r i [ A B m a g ] (1c) r i [AB mag]0.50.00.51.01.5 i z [ A B m a g ] z =0 z =1 z =2 z =3 z =4Quasar evolutionary track E ( B V )=0 01234 R e d s h i f t (2a) r i [AB mag]0.50.00.51.01.5 i z [ A B m a g ] z =0 z =1 z =2 z =3 z =4Quasar evolutionary tracks 01234 R e d s h i f t (2b) r i [AB mag]0.50.00.51.01.5 i z [ A B m a g ] (2c) i z [AB mag]1.00.50.00.51.01.52.0 z y [ A B m a g ] z =0 z =1 z =1.5Quasar evolutionary track E ( B V )=0 01234 R e d s h i f t (3a) i z [AB mag]1.00.50.00.51.01.52.0 z y [ A B m a g ] z =0 z =1 z =1.5Quasar evolutionary tracks 01234 R e d s h i f t (3b) i z [AB mag]1.00.50.00.51.01.52.0 z y [ A B m a g ] (3c) Figure 5.

Color-color diagrams of (1a, 2a, 3a) reddening-corrected

GoodQSO (color coded dots) and SDSS stars (black dots);(1b, 2b, 3b)

GoodMockGP Q -PS1 (color coded dots) and a random sample of PS1-AllWISE point sources (black dots) in Galacticplane ( | b | ≤ ◦ ); and (1c, 2c, 3c) the same sample of PS1-AllWISE point sources in Galactic plane. For panel on the left (1a,2a, 3a), quasar evolutionary tracks from redshift 0 to 4 without Galactic reddening ( E ( B − V ) = 0) are shown in red. Whilefor panel in the middle (1b, 2b, 3b), quasar evolutionary tracks from redshift 0 to 4 with E ( B − V ) = 0 , . , . E ( B − V ). Yellow crossesdenote points on the quasar evolutionary tracks without Galactic reddening with z = 0 , , , , z = 0 , , . b SDSS stars in the training set.Since mid-IR bands are less sensitive to extinction and reddening, the covariate change of AllWISE colors are lessobvious than that of PS1 colors. For instance, in Figure 6 (a), the quasar evolutionary tracks with E ( B − V ) = uasars behind the Galactic plane , . , . W − W W − W GoodM ockGP Q -PS1sources.The AllWISE color-color diagram also gives “hardness” information on the classiﬁcation problems that separatingquasars from galaxies is harder than separating quasars from stars. In general, quasars have redder W − W W − W W − W ≈

0; on the lower-left) and galaxy locus ( W − W ≈ . , W − W ≈ . W − W W − W W W W W [ V e g a m a g ] z = 1 z = 7 . . . . . Quasar evolutionary tracksReference contour lineMean quasar color at z <0.1 r e d s h i f t (a) W W W W [ V e g a m a g ] Reference contour line 0 1 2

PDF P D F / (b) Figure 6. (a) The W − W W − W GoodMockGP Q -PS1 (color coded dots) and PS1-AllWISE point sources (black dots) in Galactic plane ( | b | ≤ ◦ ), and (b) The W − W W − W W − W W − W GoodMockGP Q -PS1sample is plotted, and a reference contour line (in magenta) with density of 0.02 is speciﬁed. Quasar evolutionary tracks thatbegin at z = 0 . z = 7 with E ( B − V ) = 0 , . , . E ( B − V ). Gold cross marks denotes points onthe quasar evolutionary tracks without Galactic reddening with z = 1 , · · · ,

7. The mean color of quasars with z < . GoodMockGP Q -PS1 sample from panel (a) is shown in reddashed line; the lowest W − W W − W Some comparisons between mock quasars (

GoodM ockGP Q -PS1) and mock galaxies (

GoodM ockGal -PS1) behindthe Galactic plane are also shown in Figure 7. In the color-color diagrams of PS1 bands, galaxies largely overlap withquasars (Figure 7 (a, b, c)); while on the W − W W − W i PSF − i Kron at the faint end (Figure 7 (e)), as hasbeen pointed out by Yang et al. (2017). Among all the ∼ . GoodGal sources, ∼

200 are point sources with i PSF <

18 and i PSF − i Kron <

0, which can also be seen from Figure 7 (e). These point sources include a few quasars with "galaxy" labels and may also include some stars that are misclassified as galaxies. We do not pay attention to these point sources because they contribute only a tiny fraction of the whole galaxy sample.

[Figure 7: panels (a), (b), (c) are PS1 color-color diagrams with axes g−r and r−i, r−i and i−z, and i−z and z−y (AB mag); panel (d) is an AllWISE color-color diagram (Vega mag); panel (e) shows i_PSF − i_Kron versus i_PSF (AB mag). Each panel compares GoodMockGPQ-PS1 and GoodMockGal-PS1.]

Figure 7. Color-color diagrams of GoodMockGPQ-PS1 and GoodMockGal-PS1 (a, b, c, d), and i_PSF − i_Kron versus i_PSF plot for the two samples (e). Orange circles represent GoodMockGPQ-PS1 sources while blue circles represent GoodMockGal-PS1 sources.

To sum up, we examine the properties of GoodMockGPQ, GoodMockGal and PS1-AllWISE point-like sources in the color-color spaces. For quasar candidate selection, contamination from both stars and galaxies should be taken care of. Simple PS1 color cuts are only capable of selecting quasars that are away from the stellar loci. Using a series of PS1 colors in high-dimensional space might help reduce the overlap between the stellar loci and the clusters of quasars and galaxies. Moreover, with AllWISE colors, quasars can be better separated from stars and galaxies. Therefore we expect that the combination of PS1 and AllWISE data will make quasar selection more efficient.

4.3. A rough estimation on the lower limit to the sky density of GPQs

An estimation of the sky density of GPQs will be useful for evaluating the final GPQ candidate sample and the selection method. However, the GoodMockGPQ sky distribution in the middle panel of Figure 3 does not reflect the true density of GPQs for two reasons: (i) the synthesizing process does not consider the source crowdedness and its effects on the photometric data quality; (ii) the source number of GoodMockGPQ only depends on the size of the input GoodQSO sample when the dust extinction map is fixed.

Let the density of GoodMockGPQ be D_old; then the relative density of quasars with good photometry across the Galactic plane is:

$$ D'_{\rm new} = D_{\rm old} \times \frac{D_{\rm goodph}}{D_{\rm all}} \qquad (1) $$

where D_all is the sky density of all PS1-AllWISE sources in the Galactic plane, and D_goodph is the sky density of sources with good photometry as defined in Section 2.1 and Section 2.2. The fraction D_goodph/D_all roughly quantifies the effects of source crowdedness on the photometric quality. We expect that the median sky density of GPQs is no higher than that of GoodQSO (Median(D_new) ≤ Median(D_GoodQSO)), therefore the lower limit "absolute" sky density of GPQs can be computed as:

$$ D_{\rm new} \geq D'_{\rm new} \times \frac{\mathrm{Median}(D_{\rm GoodQSO})}{\mathrm{Median}(D'_{\rm new})} \qquad (2) $$

Here Median(D_GoodQSO) = 20.[...] deg^−2, and Median(D'_new) = 2.[...] deg^−2. Figure 8 shows the sky distribution of D_all, D_goodph, D_goodph/D_all and D_new. The estimated D_new has a median of 20.[...] deg^−2 and a maximum of 66.[...] deg^−2.

[Figure 8: four sky maps along the Galactic plane in Galactic coordinates (l, b): (a) sky density of PS1-AllWISE Galactic plane sources (D_all); (b) sky density of PS1-AllWISE Galactic plane sources with good photometry (D_goodph); (c) fraction of sources with good photometry (D_goodph/D_all); (d) estimated lower limit to the sky density of GPQs (D_new).]

Figure 8. Sky density of all PS1-AllWISE Galactic plane sources (a) and its subset with good photometry (b), fraction of sources with good photometry in the PS1-AllWISE sample (c), and estimated lower limit to the sky density of GPQs (d).
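The two-step rescaling of Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the per-pixel list representation, and the toy density values are hypothetical, and every pixel is assumed to have a non-zero D_all.

```python
from statistics import median

def gpq_density_lower_limit(d_old, d_goodph, d_all, median_d_goodqso):
    """Per-pixel lower limit D_new following Eqs. (1) and (2).

    All inputs are sky densities in deg^-2; the three lists hold one value
    per sky pixel, and every d_all entry is assumed to be non-zero.
    """
    # Eq. (1): scale the mock density by the good-photometry fraction.
    d_rel = [old * good / total
             for old, good, total in zip(d_old, d_goodph, d_all)]
    # Eq. (2): rescale so the median matches the high-b GoodQSO median.
    scale = median_d_goodqso / median(d_rel)
    return [value * scale for value in d_rel]

# Hypothetical toy values for three sky pixels (not taken from the paper).
d_new = gpq_density_lower_limit(
    d_old=[4.0, 2.0, 1.0],
    d_goodph=[500.0, 400.0, 100.0],
    d_all=[1000.0, 1000.0, 1000.0],
    median_d_goodqso=20.0,
)
```

By construction the median of the rescaled map equals the adopted GoodQSO median, which matches the statement that both medians are about 20 deg^−2.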

The predicted marginal probability of GPQs in the PS1-AllWISE sample with good photometry is D_new/D_goodph, which ranges from 2 × 10^−[...] to 0.17 with a median of 3 × 10^−[...]. The maximum value of 0.17 is not reliable because it is located at the edge of the HEALPix map (δ ∼ −30°), where the source count in a pixel does not correspond to the true number in the sky region.

5. GPQ CANDIDATE SELECTIONS WITH XGBOOST

We use XGBoost (Chen & Guestrin 2016), a scalable tree boosting system, to perform machine learning classification for GPQ selection. XGBoost is an implementation of the original gradient boosting framework (Friedman et al. 2000; Friedman 2001), known for high efficiency and outstanding performance in machine learning competitions (Chen & Guestrin 2016). Compared to traditional Gradient Boosting Machines (GBM), XGBoost has made a few improvements at the algorithm level. For example, XGBoost includes regularization terms in the objective function to control the model complexity, and can therefore reduce overfitting and improve model generalization; XGBoost is optimized for sparse input data, i.e. data with missing values; and, in addition to the greedy algorithm of Friedman (2001), XGBoost supports a weighted quantile sketch algorithm that can find the optimal split points more effectively. Moreover, system enhancements for parallelization, tree pruning, and cache optimization have been integrated into XGBoost. Recently, XGBoost has been applied to astronomy and has shown its capability in handling astronomical problems, including identifying Galactic candidates among unassociated sources from the Third Fermi Large Area Telescope (LAT) catalog (3FGL; Acero et al. 2015) (e.g. Mirabal et al. 2016), distinguishing M giants from M dwarfs for spectral surveys (e.g. Yi et al. 2019), and selecting quasar candidates with photometric data (e.g. Jin et al. 2019).

In order to obtain the optimal models, we use optuna (Akiba et al. 2019), a hyperparameter optimization framework, to tune the learning hyperparameters. As has been mentioned in Section 3.3, we transform the three-class classification problem into two binary classification problems (stars versus extragalactic objects, and galaxies versus quasars). Under this setting, hyperparameters can be fine-tuned separately for the two classification steps. After classifying the Galactic plane sources with the two classifiers, we may use necessary additional criteria to ensure the purity of GPQ candidates. The classification scheme is shown in Figure 9.

A few evaluation metrics are used in the machine learning process: Accuracy, Precision, Recall, F_1, MCC (Matthews correlation coefficient) and AUCPR (Area Under the Precision-Recall Curve). With true positive denoted as TP, true negative as TN, false positive as FP and false negative as FN, the first five metrics are defined as:

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3) $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \qquad (4) $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5) $$

$$ F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (6) $$

$$ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}. \qquad (7) $$

The Precision-Recall (PR) curve can be constructed by plotting precision-recall pairs (operating points) that are obtained using different thresholds on a probabilistic or other continuous-output classifier (Boyd et al. 2013). The AUCPR can then be calculated with numerical integration methods.

Among the six metrics, Accuracy, Precision, Recall and F_1 are commonly used. However, the Accuracy and F_1 metrics fail to measure the classification performance correctly under class-imbalanced situations, because they will be heavily biased towards the majority class. For example, given a sample with 95 from the negative class and 5 from the positive class, simply classifying all instances as negative produces Accuracy = 0.95 and F_1 = 0.[...] Accuracy and F_1 are high. The last two metrics, MCC and AUCPR, are considered better evaluation measures in class-imbalanced cases. The MCC takes the four confusion matrix categories (TP, TN, FP, FN) into account, and it is high only if the classifier makes good predictions on both positive and negative classes, independently of their ratios in the overall dataset (Chicco & Jurman 2020). Studies also suggest that the PR curve is more informative than the better-known Receiver Operator Characteristic (ROC) curve (first recommended by Provost et al. 1998), especially on imbalanced datasets (Davis & Goadrich 2006; Saito & Rehmsmeier 2015).
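As a rough illustration of Equations (3) to (7), the five count-based metrics can be computed from the confusion-matrix counts in plain Python. The function name and the example counts below are hypothetical, and reporting 0 when a denominator vanishes is a convention assumed here rather than one stated in the text.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, F1 and MCC from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is reported as 0.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical imbalanced sample: 95 positives among 1,000 sources.
scores = classification_metrics(tp=90, tn=895, fp=10, fn=5)
```

AUCPR is not included because it needs the full list of predicted probabilities rather than a single confusion matrix.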

AU CP R is useful as a measure of theoverall performance of the model.A total of 13 features are chosen for the two classiﬁcation steps and the later photometric redshift regression,including 11 colors: g − r , r − i , i − z , z − y , g − W r − W i − W z − W y − W W − W

2, and W − W i − i Kron and z − z Kron . As has been discussed in Section 4.2, using a set of PS1colors ( g − r , r − i , i − z , and z − y ) can help reduce the overlap between clusters of quasars and stellar loci ontwo-dimensional diagrams. Quasars have redder W − W W − W i − W y − W

1, and z − W

2) can be used to eﬃciently distinguish quasars from stars and improve the performanceof XGBoost classiﬁcation. We construct similar colors as features by combining all PS1 bands and W g − W r − W i − W z − W y − W W g − r ) do. The diﬀerence between PSF magnitude and Kronmagnitude in i P and z P bands ( i − i Kron and z − z Kron ) are used as morphological features to separate point sources(stars and quasars) from extended sources (galaxies). We convert Vega magnitude to AB magnitude for AllWISE datawhen constructing all the features. As we don’t set constraints on W W snr , some sources may havepoor or missing W W − W

3) data. Nevertheless, the use of W − W SN R are more informative than missing values.5.1.

Binary classification for stars and extragalactic objects

[Figure 9 flowchart: PS1-AllWISE Galactic plane sources; XGBoost CLF-1 (star vs. extragalactic object) separates Galactic plane stars from extragalactic objects; XGBoost CLF-2 (galaxy vs. QSO) separates galaxies from GPQ candidates; Gaia cut log(f_PM0) ≥ -4 (yes: XGBoost photo-z regressor, giving reliable GPQ candidates with photo-z; no: reject).]

Figure 9. Flowchart of GPQ selection and photometric redshift calculation.
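The decision flow of Figure 9 can be sketched as a small cascade. This is a minimal sketch, not the authors' code: the function name and the returned labels are hypothetical, the probabilities are assumed to come from the two trained classifiers, the thresholds 0.99 and 0.95 are the cuts on p_EXT and p_QSO described in Sections 5.1 and 5.2, and the value 99 is the default log(f_PM0) assigned to sources without Gaia proper motions.

```python
P_EXT_MIN = 0.99        # CLF-1 cut on p_EXT (Section 5.1)
P_QSO_MIN = 0.95        # CLF-2 cut on p_QSO (Section 5.2)
LOG_FPM0_MIN = -4.0     # Gaia proper-motion cut shown in Figure 9
NO_GAIA_DEFAULT = 99.0  # default log(f_PM0) when no Gaia proper motion exists

def select_gpq(p_ext, p_qso, log_fpm0=None):
    """Return the outcome of the selection cascade for one source."""
    if p_ext <= P_EXT_MIN:
        return "STAR"
    if p_qso <= P_QSO_MIN:
        return "not a GPQ candidate"
    if log_fpm0 is None:
        # Sources without Gaia records pass the cut through the default value.
        log_fpm0 = NO_GAIA_DEFAULT
    if log_fpm0 < LOG_FPM0_MIN:
        return "rejected by proper motion"
    return "reliable GPQ candidate"
```

For example, `select_gpq(0.999, 0.99)` keeps a source that has no Gaia match, while `select_gpq(0.999, 0.99, -10.0)` rejects one whose proper motion is inconsistent with zero.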

In the first classification step, the input data for training and validation consist of the synthetic quasar sample T_QSO, the synthetic galaxy sample T_Gal (see Section 4.1), and the LAMOST Galactic plane star sample T_Star (Section 2.6). The input data have more than 3 million rows. For binarization, we assign the label EXT (extragalactic object) to all T_QSO and T_Gal instances, and keep the label for T_Star as STAR. Here we regard extragalactic objects as the positive class and stars as the negative class.

We first apply five-fold cross validations with optuna to find the optimal setting of hyperparameters that minimizes the log loss among 500 trials. For a binary classification problem with a true label y ∈ {0, 1} and a probability estimate p = Pr(y = 1), the log loss per sample is the negative log-likelihood of the classifier given the true label:

$$ \mathrm{log\,loss}(y, p) = -\log \Pr(y \mid p) \qquad (8) $$

$$ = -\left( y \log(p) + (1 - y) \log(1 - p) \right). \qquad (9) $$

Then we randomly split the whole input data into a training set and a validation set according to a 4:1 ratio and calculate scores of the six metrics with the validation set. This 4:1 split ratio is consistent with that of the five-fold cross validations. The large sample size of the input data also ensures that both training and validation sets have enough samples.

Some fixed parameters in our programs are: objective = binary:logistic; booster = gbtree; tree_method = hist. For hyperparameters that are tuned, the default values, optimal values found by the cross validations, and corresponding metric scores of these parameters are listed in Table 2. The number of boosting rounds (num_boost_round, a.k.a. n_estimators in the scikit-learn API of XGBoost) is fixed to 100 and not tuned together with eta (a.k.a. learning_rate), because the effect of increasing num_boost_round can cancel that of decreasing eta, and vice versa. In the training process, we need to lower the learning rate eta and increase num_boost_round to reduce the generalization error.

The Classifier No.1 (CLF-1) is trained using eta = 0.[...] and num_boost_round = 1,200 with the other optimal parameters in Table 2.
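The per-sample log loss of Equations (8) and (9) is short enough to write out directly. The sketch below is only illustrative: the function names are hypothetical, and clipping p away from 0 and 1 with a small eps is an assumption added here to keep the logarithm finite, not a step described in the text.

```python
import math

def log_loss(y, p, eps=1e-15):
    """Negative log-likelihood of one label y in {0, 1} given p = Pr(y = 1)."""
    # Assumed safeguard: clip p so that log(0) never occurs.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_log_loss(labels, probabilities):
    """Average log loss over a sample, the quantity minimized during tuning."""
    losses = [log_loss(y, p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)
```

A confident correct prediction costs little, e.g. `log_loss(1, 0.99)` is about 0.01, whereas an uninformative `log_loss(1, 0.5)` equals ln 2.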

Table 2. Default and optimal hyperparameter settings for CLF-1 (star versus extragalactic object classification)

Hyperparameter            Default      Optimal
eta (learning_rate)       [...]        [...]
lambda (reg_lambda)       [...]        [...]
alpha (reg_alpha)         [...]        [...]
max_depth                 [...]        [...]
gamma (min_split_loss)    [...]        [...]
grow_policy               depthwise    depthwise
min_child_weight          [...]        [...]
subsample                 [...]        [...]
colsample_bytree          [...]        [...]
max_delta_step            [...]        [...]
Accuracy                  [...]        [...]
Precision                 [...]        [...]
Recall                    [...]        [...]
F_1                       [...]        [...]
MCC                       [...]        [...]
AUCPR                     [...]        [...]

We then classify the PS1-AllWISE point-like sources with CLF-1. To exclude as many stars as possible, we adopt a high threshold on p_EXT (the model-predicted probability of a source being extragalactic) to select extragalactic candidates. Sources with p_EXT > 0.99 are labeled as EXT, and the others are labeled as STAR and removed.

5.2. Binary classification for galaxies and quasars

We use the T_QSO and T_Gal samples as input data for training and validation in the second classification step. Here we regard quasars as the positive class and galaxies as the negative class.

The same processes of parameter tuning and training as those of CLF-1 are applied to build CLF-2. We keep some parameters unchanged: objective = binary:logistic; booster = gbtree; tree_method = hist. For hyperparameters that are tuned, the default values, optimal values found by the cross validations, and corresponding metric scores of these parameters are listed in Table 3. CLF-2 is trained using eta = 0.[...] and num_boost_round = 1,500 with the other optimal parameters in Table 3.

The optimal scores of the six metrics in Table 3 are all lower than those in Table 2, indicating that the quasar-galaxy classification "hardness" is higher than that of the star-extragalactic problem. Here we also use a high probability threshold to select sources of our target class. We classify the sources labeled as EXT with CLF-2. Sources with p_QSO > 0.95 are kept as GPQ candidates, where p_QSO is the probability of a source being a quasar predicted by the XGBoost model.

5.3. Additional cut based on Gaia proper motion to remove stellar contaminants

In the first classification process, we classify all PS1-AllWISE point-like sources into stars and extragalactic objects. We ignore stars in the second classification step. Although the metrics of CLF-1 are high (Table 2), some stars can be misclassified as extragalactic objects, and then be classified either as quasars or galaxies. Faint stars are more likely to be misclassified than bright stars because stars in the training sample (T_Star) are biased towards the bright end.

Table 3. Default and optimal hyperparameter settings for CLF-2 (quasar versus galaxy classification)

Hyperparameter            Default      Optimal
eta (learning_rate)       [...]        [...]
lambda (reg_lambda)       [...]        [...]
alpha (reg_alpha)         [...]        [...]
max_depth                 [...]        [...]
gamma (min_split_loss)    [...]        [...]
grow_policy               depthwise    depthwise
min_child_weight          [...]        [...]
subsample                 [...]        [...]
colsample_bytree          [...]        [...]
max_delta_step            [...]        [...]
Accuracy                  [...]        [...]
Precision                 [...]        [...]
Recall                    [...]        [...]
F_1                       [...]        [...]
MCC                       [...]        [...]
AUCPR                     [...]        [...]

When using optical and near-IR colors for candidate selection, white dwarfs are major contaminants for low redshift quasars, and M/L/T dwarfs are typical contaminants for high redshift quasars (e.g. Kirkpatrick et al. 1997; Vennes et al. 2002; Chiu et al. 2006). In the mid-IR regime, potential stellar contaminants for quasars are Young Stellar Objects (YSO), Asymptotic Giant Branch (AGB) stars, and Planetary Nebulae (PNe) (Kozłowski & Kochanek 2009; Koenig & Leisawitz 2014; Assef et al. 2018).

The YSOs are stars at the early stages of evolution, and are often divided into four subclasses (Lada 1987): Class I, Class II, Flat spectrum, and Class III. Among them, Class II and Flat spectrum YSOs are the most likely contaminants since they have optical and mid-IR SEDs similar to those of quasars. Since we require both optical and mid-IR detections for classification, optically faint Class I YSOs are eliminated in the first place. As has been studied by Koenig & Leisawitz (2014), Class III YSOs are clustered around W − W W − W W − W > .

25 and 1 . < W − W < . W − W W − W Gaia proper motion, because the proper motion distribution of quasarsis diﬀerent from that of Milky Way stars. Although quasars should have negligible transverse motions, non-zero propermotions of them are measured by

Gaia due to various effects, such as photocenter variability of quasars (see Bachchan et al. 2016, and references therein). In addition, proper motions with large uncertainties are not reliable. Therefore we need a probabilistic cut instead of a cut on the total proper motion. We define the probability density of zero proper motion (f_PM0) of a source based on the bivariate normal distribution of proper motion measurements of the source as:

$$ f_{\rm PM0} = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x}{\sigma_x}\right)^2 - \frac{2\rho x y}{\sigma_x\sigma_y} + \left(\frac{y}{\sigma_y}\right)^2 \right] \right\} \qquad (10) $$

where x = pmra, y = pmdec, and ρ = pmra_pmdec_corr (correlation coefficient between pmra and pmdec) are obtained from the Gaia DR2 catalog, while σ_x and σ_y are the true external proper motion uncertainties calculated with the method suggested by Lindegren et al. (2018a,b). The external proper motion uncertainty can be expressed as $\sigma_{\rm ext} = \sqrt{k^2\sigma_i^2 + \sigma_s^2}$, where σ_ext can be σ_x or σ_y, k = 1.08 is a multiplicative factor, σ_i is the catalog uncertainty (pmra_error or pmdec_error), and σ_s is the systematic error. For bright sources (G < 13), σ_s = 0.032 mas/yr; for faint sources (G > 13), σ_s = 0.066 mas/yr. Under the same uncertainty level, sources with smaller proper motions will have higher f_PM0 by definition.

We take the logarithm of f_PM0 for better comparison between samples. Figure 10 shows distributions of log(f_PM0) of stars, galaxies and quasars used in this study. For stellar samples, in addition to T_Star (LAMOST Galactic plane star sample), a subsample of the SDSS Stripe 82 Standard Star Catalog (hereafter S82 star; Ivezić et al. 2007) that meets the same constraints in Section 2.1 and Section 2.2 is also included for comparison. We choose a log(f_PM0) ≥ −4 [...]

[Figure 10: probability density histograms of log(f_PM0) for T_Star (LAMOST Galactic plane star), GoodGal (SDSS galaxy), GoodQSO (SDSS DR14Q), and S82 standard star.]

Figure 10. Histograms of log(f_PM0) of T_Star (LAMOST Galactic plane star), GoodGal (from SDSS galaxy), GoodQSO (from SDSS DR14Q), and sources from the SDSS Stripe 82 Standard Star Catalog. Because f_PM0 is a probability density, which can be greater than 1 (the integral of the probability density function over the entire space is equal to 1), log(f_PM0) can have positive values.
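A minimal sketch of the zero-proper-motion density is given below, using the bivariate normal form of Equation (10) and the inflated uncertainties described above. Several details are assumptions made for illustration only: the function names are hypothetical, the logarithm is taken to be base 10, sources with G exactly at the assumed boundary of 13 are treated as faint, and the expression is evaluated in log space to avoid numerical underflow for large proper motions.

```python
import math

K = 1.08            # multiplicative factor applied to the catalog uncertainty
G_BOUNDARY = 13.0   # assumed bright/faint boundary in G magnitude

def external_uncertainty(sigma_catalog, g_mag):
    """Inflate a catalog proper-motion uncertainty (mas/yr)."""
    sigma_sys = 0.032 if g_mag < G_BOUNDARY else 0.066  # systematic term, mas/yr
    return math.sqrt((K * sigma_catalog) ** 2 + sigma_sys ** 2)

def log_fpm0(pmra, pmdec, pmra_error, pmdec_error, corr, g_mag):
    """log10 of the bivariate normal density of Eq. (10); requires |corr| < 1."""
    sx = external_uncertainty(pmra_error, g_mag)
    sy = external_uncertainty(pmdec_error, g_mag)
    one_minus_rho2 = 1.0 - corr ** 2
    quad = ((pmra / sx) ** 2
            - 2.0 * corr * pmra * pmdec / (sx * sy)
            + (pmdec / sy) ** 2)
    ln_f = (-math.log(2.0 * math.pi * sx * sy * math.sqrt(one_minus_rho2))
            - quad / (2.0 * one_minus_rho2))
    return ln_f / math.log(10.0)
```

With 1 mas/yr catalog errors, a faint source with zero measured proper motion gives log(f_PM0) of roughly −0.87, while a source moving at 10 mas/yr in each coordinate falls far below a −4 cut.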

We calculate log(f_PM0) for GPQ candidates after cross-matching them with Gaia DR2. For candidates without Gaia DR2 proper motion records, we assign a default value of 99 for log(f_PM0). Sources with log(f_PM0) ≥ −4 [...]

6. PHOTOMETRIC REDSHIFT ESTIMATION FOR GPQ CANDIDATES

Measuring redshifts is an important step for quasar surveys. For quasar candidates, photometric redshift (photo-z) estimation is key to follow-up studies. Many different approaches have been proposed for calculating photo-zs of quasars, including quasar template fitting (e.g. Budavári et al. 2001; Babbedge et al. 2004; Salvato et al. 2009), the empirical color-redshift relation (e.g. Richards et al. 2001; Weinstein et al. 2004; Wu et al. 2004; Wu & Jia 2010; Wu et al. 2012), machine learning (e.g. Yèche et al. 2010; Laurino et al. 2011; Brescia et al. 2013; Zhang et al. 2013; Pasquet-Itam & Pasquet 2018), the XDQSOz method (Bovy et al. 2012), and the Skew-QSO method (Yang et al. 2017). As the photo-z estimation problem can be well described by the regression problem in machine learning, we also use XGBoost to train the regression model and predict photo-zs for our reliable GPQ candidates.

To build the training set and validation set, we randomly split the de-reddened GoodQSO sample with a ratio of 4:1. Our application set (reliable GPQ candidates) is also de-reddened. The same 13 features as those in Section 5 are used for photo-z regression: g−r, r−i, i−z, z−y, g−W1, r−W1, i−W1, z−W1, y−W1, W1−W2, W2−W3, i−i_Kron and z−z_Kron. The morphological features i−i_Kron and z−z_Kron are included as they may help distinguish quasars at different cosmological distances. To obtain the optimal model, we also tune the parameters with five-fold cross-validations using optuna.

[Figure 11: density plot of z_phot versus z_spec, annotated with RMS error = 0.35.]

Figure 11.

Photometric redshift obtained with XGBoost regression model against spectral redshift of de-reddened validationset with 57,855 quasars. The red dashed line denotes z phot = z spec and the blue dotted lines mark the margin within one RMSEfrom the red dashed line. The performance of the XGBoost photo- z regression model on the test set can be examined in z phot - z spec (pho-tometric redshift versus spectral redshift) plot (Figure 11) or with two quantities: the root-mean-square error(RMSE) and photo- z accuracy. For a validation set with sample size n , the root-mean-square error is RMSE = (cid:112)(cid:80) ni =1 ( z phot − z spec ) /n . On our validation set with a sample size of 57,855, the RMSE is 0.35. The photo- z accu-racy R . is deﬁned as the fraction of quasars with | ∆ z | ≤ .

1, where | ∆ z | = | z spec − z phot | / (1 + z spec ). Our XGBoostregression model yields a photo- z accuracy of 74% on the validation set, which is comparable to that of Yang et al.(2017) on PS1 and WISE data (79%). Yang et al. (2017) adopted a multivariate Skew-t model and prior probabilitiesfrom the quasar luminosity function (QLF) to achieve the high photo- z accuracy. Figure 12 shows the photo- z accuracy R . as a function of spectral redshift (left panel) and de-reddened i P -band magnitude (right panel) respectively. R . has maximum values at z ≈ . z ≈

4, and reaches a minimum at z ≈

3. Most z ≈ α emission line enters g P band at z ≈ . r P band at z ≈ .

5, whichleads to large excess in g P magnitudes and hence similar PS1 colors of quasars within 2 . (cid:46) z (cid:46) .

5. This kind ofdegeneracy can be alleviated if SDSS u -band data are available to characterize the Lyman limit systems (see Section4 of Yang et al. 2017). The photo- z accuracy is improved at z (cid:38) .

5, because the Lyman limit enters g P band. R . also drops at low redshift ( z < THE GPQ CANDIDATE CATALOG7.1.

Validation of the GPQ candidates with Simbad, Milliquas, and SDSS DR16Q

With our Transfer Learning Framework and the aforementioned additional selection criteria, we obtain a reliable GPQ candidate sample with 161,532 sources from PS1 and AllWISE. We cross-match the GPQ candidates with the Simbad database (Wenger et al. 2000), and find 2,786 matches. The object types and summary are shown in Table 4.

We categorize all matched sources into four groups: AGN/QSO, star (including PNe and PN candidates), galaxy, and other-type objects. Among all the matches, 53.98% (1,504) are recorded as AGN/QSO (including candidates), 8.97% (250) are recorded as star (including candidates), 4.02% (112) are recorded as galaxy, and 33.02% (920) are other-type objects labeled according to the detection properties (e.g. wavelength). Those other-type objects have higher probabilities to be AGN/QSO than stars, as most (728+27) of them are radio sources, and 64 are X-ray sources (see Table 4). For the 4.02% of sources labeled as galaxy, a number of them may also host AGN/QSO, as we have applied careful selection criteria to remove possible galaxy contaminants. Among the 250 sources labeled as star, 40 are candidates, and the other 210 are known stars. Of the known stars, 101 were once selected as QSO candidates using SDSS photometry, and then identified as stars by the 2dF-SDSS LRG and QSO Survey (Croom et al. 2009). From these analyses, we can conclude that the purity of our GPQ candidates on the small subset of 2,786 Simbad matches can be as high as ∼[...]

[Figure 12: photo-z accuracy versus redshift (left panel) and versus i_P [AB mag] (right panel).]

Figure 12. Photo-z accuracy R_0.1 (the fraction of quasars with |Δz| ≤ 0.1, where |Δz| = |z_spec − z_phot| / (1 + z_spec)) as a function of redshift (left panel) and magnitude (de-reddened; right panel).

Table 4. Matching results of GPQ candidates and Simbad database

Group        Object type                              Number
AGN/QSO      QSO                                      1121
AGN/QSO      AGN candidate                            150
AGN/QSO      BL Lac object                            143
AGN/QSO      Seyfert 1 galaxy                         38
AGN/QSO      AGN                                      31
AGN/QSO      Other subclasses                         21
AGN/QSO      Total                                    1504 (53.98%)
Star         Star                                     175
Star         YSO Candidate                            22
Star         YSO                                      13
Star         Cataclysmic binary candidate             7
Star         Planetary Nebula (PN)                    5
Star         Other subclasses                         28
Star         Total                                    250 (8.97%)
Galaxy       Galaxy                                   106
Galaxy       Radio galaxy                             3
Galaxy       Brightest galaxy in a cluster (BCG)      2
Galaxy       Cluster of galaxies                      1
Galaxy       Total                                    112 (4.02%)
Other types  Radio source                             728
Other types  IR source                                77
Other types  X-ray source                             64
Other types  Centimetric radio source                 27
Other types  Blue object                              22
Other types  Far-IR source (λ ≥ [...] µm)             2
Other types  Total                                    920 (33.02%)

As another test on stellar contamination, we cross-match our GPQ candidates with the LAMOST Galactic plane star sample. This match identifies 29 LAMOST stars, none of which are recorded in Simbad. Therefore the total number of known stars in the GPQ candidates is 239.

We also examine the fraction of known GPQs that can be recovered with our candidates table. The known GPQ sample is MLQSUB with 1,853 sources, which is retrieved from the Milliquas catalog and described in Section 2.7. MLQSUB is selected with the same constraints as those on our application PS1-AllWISE data, to get a consistent analysis result. Cross-matching MLQSUB with our GPQ candidates results in 1,763 matches, meaning that 95.14% of GPQs from Milliquas can be selected with our methods, under the same quality constraints on the photometric data. The recent sixteenth data release of the SDSS Quasar Catalog (DR16Q; Lyke et al. 2020) has a total of 750,414 sources, of which 3,727 are located at |b| < 20°. Only 1,320 of these SDSS GPQs meet the photometric quality constraints in Section 2.1 and Section 2.2. Cross-matching our GPQ candidates with SDSS DR16Q gives 1,292 matches, which [...]

7.2. Description of the GPQ candidate catalog

We remove 239 known stars (see Section 7.1) from our GPQ candidate sample. We then match the remaining GPQ candidates by coordinates with TOPCAT internally, and find 347 close pairs within 0.[...]″. These pairs are very likely duplicated sources, because the PS1 survey cannot resolve two sources within an angular distance of 0.[...]″. The median image quality for the PS1 3π survey is FWHM = (1.31, 1.19, 1.11, 1.07, 1.02) arcseconds for (g, r, i, z, y)_P1 (Magnier et al. 2016). Therefore we only keep one source for each close pair, and obtain the final GPQ candidate sample with 160,946 sources. The GPQ candidate catalog is compiled based on this sample, with photometric data from PS1 DR1 and AllWISE, and astrometric data from Gaia DR2. The descriptions for the catalog are displayed in Table 5.
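The duplicate-removal step can be illustrated with a minimal pure-Python sketch. It is not the TOPCAT internal match used for the catalog: the function names, the brute-force loop, and the demo coordinates are all illustrative assumptions, and `radius_arcsec` is left as a free parameter. The last lines only check the source counts quoted in the text.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def drop_close_pairs(sources, radius_arcsec):
    """Keep the first source of each group of sources closer than radius_arcsec.

    sources: list of (ra_deg, dec_deg) tuples.
    Brute force, O(N^2): fine for a demonstration, too slow for ~10^5 sources,
    where a spatial index (as in TOPCAT) would be needed.
    """
    kept = []
    for ra, dec in sources:
        if all(angular_sep_arcsec(ra, dec, kra, kdec) > radius_arcsec
               for kra, kdec in kept):
            kept.append((ra, dec))
    return kept

# Hypothetical demo: the first two positions are 0.2 arcsec apart.
demo = [(10.0, 5.0), (10.0, 5.0 + 0.2 / 3600.0), (10.1, 5.0)]
unique = drop_close_pairs(demo, radius_arcsec=0.5)  # illustrative radius only

# Bookkeeping from the text: initial sample, known stars, duplicated sources.
n_final = 161_532 - 239 - 347
```

In this sketch the first member of each close group is the one that survives; the text does not say which member of a pair is retained in the catalog, so that choice is an assumption.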

Table 5. Contents of the GPQ candidate catalog

| Column | Units | Label | Explanations |
|---|---|---|---|
| 1 | - | Designation | Catalog designation hhmmss.ss+ddmmss.s (J2000) based on Pan-STARRS1 (PS1) coordinates |
| 2 | deg | ra | PS1 right ascension in decimal degrees (J2000) (weighted mean) at mean epoch |
| 3 | deg | dec | PS1 declination in decimal degrees (J2000) (weighted mean) at mean epoch |
| 4 | deg | l | Galactic longitude in decimal degrees |
| 5 | deg | b | Galactic latitude in decimal degrees |
| 6 | - | photoz | Photometric redshift predicted with XGBoost regressor |
| 7 | - | p_star | Probability of the object to be a star, predicted by the first XGBoost classifier, a.k.a. p_star (p_star + p_ext = 1) |
| 8 | - | p_ext | Probability of the object to be an extragalactic object, predicted by the first XGBoost classifier, a.k.a. p_ext (p_star + p_ext = 1) |
| 9 | - | p2_gal | Probability of the object to be a galaxy, predicted by the second XGBoost classifier, a.k.a. p_gal (p2_gal + p2_qso = 1) |
| 10 | - | p2_qso | Probability of the object to be a quasar, predicted by the second XGBoost classifier, a.k.a. p_QSO (p2_gal + p2_qso = 1) |
| 11 | - | fpm0 | Probability density of zero proper motion (f_PM0) of the source |
| 12 | - | log_fpm0 | The logarithm of fpm0 (log(f_PM0)) |
| 13 | mag | ebv | Line-of-sight E(B − V) given by the Planck14 dust map |
| 14 | - | PS_objID | Pan-STARRS1 (PS1) unique object identifier |
| 15 | mag | gmag | Mean PSF AB magnitude from PS1 g filter detections |
| 16 | mag | e_gmag | Error in gmag |
| 17 | mag | gKmag | Mean Kron AB magnitude from PS1 g filter detections |
| 18 | mag | e_gKmag | Error in gKmag |
| 19 | mag | rmag | Mean PSF AB magnitude from PS1 r filter detections |
| 20 | mag | e_rmag | Error in rmag |
| 21 | mag | rKmag | Mean Kron AB magnitude from PS1 r filter detections |
| 22 | mag | e_rKmag | Error in rKmag |
| 23 | mag | imag | Mean PSF AB magnitude from PS1 i filter detections |
| 24 | mag | e_imag | Error in imag |
| 25 | mag | iKmag | Mean Kron AB magnitude from PS1 i filter detections |
| 26 | mag | e_iKmag | Error in iKmag |
| 27 | mag | zmag | Mean PSF AB magnitude from PS1 z filter detections |
| 28 | mag | e_zmag | Error in zmag |
| 29 | mag | zKmag | Mean Kron AB magnitude from PS1 z filter detections |
| 30 | mag | e_zKmag | Error in zKmag |
| 31 | mag | ymag | Mean PSF AB magnitude from PS1 y filter detections |
| 32 | mag | e_ymag | Error in ymag |
| 33 | mag | yKmag | Mean Kron AB magnitude from PS1 y filter detections |
| 34 | mag | e_yKmag | Error in yKmag |
| 35 | - | AllWISE_ID | AllWISE unique source ID |
| 36 | mag | W1mag | W1 (Vega) magnitude (3.35 µm) |
| 37 | mag | e_W1mag | Mean error on W1 magnitude |
| 38 | mag | W2mag | W2 (Vega) magnitude (4.6 µm) |
| 39 | mag | e_W2mag | Mean error on W2 magnitude |
| 40 | mag | W3mag | W3 (Vega) magnitude (11.6 µm) |
| 41 | mag | e_W3mag | Mean error on W3 magnitude |
| 42 | mag | W4mag | W4 (Vega) magnitude (22.1 µm) |
| 43 | mag | e_W4mag | Mean error on W4 magnitude |
| 44 | mag | Jmag | 2MASS J (Vega) magnitude (1.25 µm) |
| 45 | mag | e_Jmag | Mean error on J magnitude |
| 46 | mag | Hmag | 2MASS H (Vega) magnitude (1.65 µm) |
| 47 | mag | e_Hmag | Mean error on H magnitude |
| 48 | mag | Kmag | 2MASS Ks (Vega) magnitude (2.17 µm) |
| 49 | mag | e_Kmag | Mean error on Ks magnitude |
| 50 | - | Gaia_source_id | Gaia DR2 unique source identifier |
| 51 | mas | parallax | Gaia DR2 parallax |
| 52 | mas | parallax_error | Standard error of parallax |
| 53 | mas/yr | pmra | Gaia DR2 proper motion in right ascension direction |
| 54 | mas/yr | pmra_error | Standard error of pmra |
| 55 | mas/yr | pmdec | Gaia DR2 proper motion in declination direction |
| 56 | mas/yr | pmdec_error | Standard error of pmdec |
| 57 | - | pmdec_pmdec_corr | Correlation between pmra and pmdec |
| 58 | mas/yr | pmra_error_ext | True external uncertainty of pmra |
| 59 | mas/yr | pmdec_error_ext | True external uncertainty of pmdec |
| 60 | - | sb_main_id | Main identifier for an object in Simbad database |
| 61 | - | sb_main_type | Main object type for an object in Simbad database |
| 62 | - | sb_redshift | Redshift of an object recorded in Simbad database |

Note —This table is published in its entirety in the machine-readable format.
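As a rough illustration of how the probability and proper-motion columns of Table 5 relate to the selection criteria quoted in the summary (p_EXT > 0.99, p_QSO > 0.95, and log(f_PM0) ≥ −4), the following sketch re-applies those cuts to rows stored as dictionaries. The column spellings with underscores, the function name, and the demo rows are assumptions made for illustration; the values are invented and not taken from the catalog. Sources without a Gaia proper motion are not handled here.

```python
def passes_gpq_cuts(row, p_ext_min=0.99, p_qso_min=0.95, log_fpm0_min=-4.0):
    """Return True if a catalog row satisfies the three selection cuts.

    row: dict with keys 'p_ext', 'p2_qso' and 'log_fpm0' (assumed spellings
    of the Table 5 labels). Thresholds default to the values quoted in the
    summary section.
    """
    return (row["p_ext"] > p_ext_min
            and row["p2_qso"] > p_qso_min
            and row["log_fpm0"] >= log_fpm0_min)

# Invented demo rows, for illustration only.
rows = [
    {"p_star": 0.002, "p_ext": 0.998, "p2_gal": 0.01, "p2_qso": 0.99, "log_fpm0": -1.2},
    {"p_star": 0.050, "p_ext": 0.950, "p2_gal": 0.01, "p2_qso": 0.99, "log_fpm0": -1.0},
    {"p_star": 0.001, "p_ext": 0.999, "p2_gal": 0.02, "p2_qso": 0.98, "log_fpm0": -6.5},
]

selected = [passes_gpq_cuts(r) for r in rows]

# Each binary classifier outputs two complementary probabilities.
sums_ok = all(abs(r["p_star"] + r["p_ext"] - 1.0) < 1e-9
              and abs(r["p2_gal"] + r["p2_qso"] - 1.0) < 1e-9 for r in rows)
```

Because the published catalog is already the result of these cuts, running such a filter on it would serve as a consistency check rather than a new selection.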

The sky density of sources from the GPQ candidate catalog is shown in Figure 13. In general, the sky distribution of the GPQ candidates is consistent with the prediction in Section 4.3. The highest sky density of the candidates is 72.7 deg⁻², which is slightly higher than the estimation (66.7 deg⁻²). The median density is 16.7 deg⁻², which is comparable to but lower than the estimated value (or the median density of GoodQSO). As can be seen from Figure 13, the sky densities of GPQ candidates at |b| ≲ [...]° are lower than those of the estimation in Figure 8 (d), which indicates that the modelling process overestimates the sky density of GPQs at lower Galactic latitudes. The region with δ ≲ −[...]° (0° ≲ l ≲ [...]°) is blank, because it is not covered by the PS1 3π survey.

Figure 13. Sky density plot of GPQ candidates in Galactic coordinates.

The distributions of de-reddened i_P magnitudes and photometric redshifts of our GPQ candidates are displayed in Figure 14 (a). The lowest and highest photometric redshifts are z_phot = 0.016 and z_phot = 4.[...] z estimations, the actual highest redshift of the GPQs can be up to 5. Five peaks appear in the histogram of photometric redshift (Figure 14 (a)) at z_phot ≈ (0.8, 1.2, 1.7, 2.1, 2.4), which are caused by selection effects and sample bias of the training set. Quasars with these redshifts have higher chances to be selected with PS1 photometry: (i) when z ≈ 0.8, the Mg II emission line enters the g_P1 band; (ii) when z ≈ 1.2, Mg II enters the r_P1 band; (iii) when z ≈ 1.7, C III] enters the g_P1 band, and Mg II enters the i_P1 band; (iv) when z ≈ 2.1, both the Si IV and C IV lines enter the g_P1 band, and C III] enters the r_P1 band; and (v) when z ≈ 2.4, Lyα and Si IV enter the g_P1 band.

Figure 14. (a) De-reddened i_P magnitude and photometric redshift distribution of GPQ candidates. (b) De-reddened i_P magnitude and redshift distribution of the GoodQSO sample (SDSS DR14Q). The i_P magnitude is de-reddened according to the Planck14 dust map.

The distributions of de-reddened i_P magnitudes and spectroscopic redshifts of the GoodQSO sample from SDSS DR14Q are also shown in Figure 14 (b) for comparison. The GoodQSO sample and the sample of GPQ candidates have similar redshift distributions with some subtle differences. The magnitude distributions are also similar to each other, except that GoodQSO has a larger fraction of bright sources (i_P < 19) than the GPQ candidates. Such differences in both redshift and magnitude distributions of these two samples are mainly due to their different target selection strategies. Our GPQ candidates are selected from one single parent sample, while SDSS DR14Q includes many quasar samples in various redshift and magnitude ranges (see Section 2.2 of Pâris et al. 2018).

The color-color properties of sources from the GPQ candidate catalog are shown in Figure 15. In general, GPQ candidates have color-color distributions that are well matched to those of GoodMockQSO-PS1 (see Figures 5 and 6). The unimodal structures seen from both AllWISE and PS1 colors imply a low level of contamination from stars and galaxies. However, contamination from stars can be recognized from the i − z versus r − i diagram, where some sources are concentrated along the stellar locus (see the slightly contaminated region “SC” in Figure 15 (c)). We apply no cut on the “SC” region because it only contains 7,892 sources (4.90% of the whole catalog) and any cut is likely to also remove reddened quasars (see Figure 5).

Figure 15. AllWISE (a) and PS1 (b, c, d) color-color diagrams of sources from the GPQ candidate catalog. Contour lines based on 2d KDE are displayed on the density plots. The red dashed lines and “SC” in subplot (c) mark the region which is slightly contaminated by stars (r − i > [...], i − z > [...]).

SUMMARY AND CONCLUSIONS

We present a Transfer Learning Framework for quasar selection, and its application to finding GPQs. We construct mock samples of quasars and galaxies behind the Galactic plane, by assigning new locations and extinction values to the extinction-corrected high-b SDSS extragalactic sources. We use PS1 limiting magnitudes to select good mock sources, and compare them with high-b sources in color-color spaces. We show that the covariate change of source colors is significant from high-b regions to the Galactic plane. We synthesize training and validation data for machine learning with: (i) good mock samples, (ii) SDSS extragalactic sources which do not have counterparts in the good mock samples, and (iii) the real LAMOST Galactic plane star sample.

We apply the XGBoost algorithm for machine learning in this study. To help reduce the effects of class imbalance and class-balance change, we turn the three-class classification task (star, galaxy, and quasar) into two binary classification problems. A total of 13 features are used for the two classification steps: g − r, r − i, i − z, z − y, g − W1, r − W1, i − W1, z − W1, y − W1, W1 − W2, W2 − W3, i − i_Kron and z − z_Kron. In order to remove star and galaxy contaminants, we use high thresholds of model-predicted probabilities (p_EXT > 0.99 & p_QSO > 0.95) to select extragalactic and quasar candidates. We perform an additional cut on the probability density of zero proper motion (log(f_PM0) ≥ −4) based on Gaia DR2 data to further reduce stellar contamination. Using the extinction-corrected SDSS DR14Q sources, we build the photometric redshift estimator with RMSE = 0.35 on the validation set.

Our GPQ candidate sample is validated with the Simbad database and the Milliquas catalog. The purity of quasars is ∼[...] and the lowest marginal probability is ∼[...]. We compile the GPQ candidate catalog after removing known stars in Simbad and LAMOST, and some duplicated sources. The GPQ candidate catalog consists of 160,946 sources. In addition to our machine learning predictions, we include PS1 and AllWISE photometry, as well as Gaia DR2 astrometry in the table. The GPQ candidate sample has a broad redshift coverage (0 < z ≲ 5).

[...] T_Star) are bright ones, identifying and removing faint stars can be challenging for the XGBoost classification model (CLF-1). Using colors instead of magnitudes as features helps to lessen the effects of such training sample bias. The strict log(f_PM0) ≥ −4 [...] log(f_PM0) distribution with quasars. The use of AllWISE colors (W1 − W2, W2 − W3) and morphological separators (i − i_Kron and z − z_Kron) largely aids the galaxy-quasar classification. For future GPQ candidate selections, we expect to improve the machine learning performance by compiling a Galactic plane star training sample with more stars at the faint end.

We have been carrying out a series of spectroscopic identifications for GPQ candidates since 2018, using optical telescopes including two-meter telescopes based at Lijiang and Xinglong in China and Siding Spring in Australia, and the 200-inch Hale Telescope in the US. The success rate of identifying new GPQs is ∼90% in our spectroscopic campaign, which is consistent with the estimated reliability of the GPQ candidate catalog. We have also been exploring the LAMOST spectral data to find new GPQs. All these efforts yield promising results that will be presented in the next paper of this series.

ACKNOWLEDGMENTS

We acknowledge the support from the National Key R&D Program of China (2016YFA0400703) and the National Science Foundation of China (11533001, 11721303 & 11927804). We thank the referee very much for constructive and helpful suggestions to improve this paper. We thank Prof. Yanxia Zhang (NAOC) for helping us cross-match the PS1 and AllWISE catalogs. We thank Dr. Jinyi Yang and Dr. Feige Wang from Steward Observatory for useful suggestions. YF thanks Dr. Hassen Yesuf (KIAA-PKU), Dr. Bojin Zhuang (Ping An Technology), Hongdong Zheng (PKU) and Dinghuai Zhang (Mila/PKU) for helpful discussions on machine learning. YF thanks Prof. Gregory J. Herczeg, Bitao Wang, Mingyang Zhuang, Yun Zheng, Yuanhang Ning and Niankun Yu from PKU for helping edit the draft.

This publication makes use of data from the Pan-STARRS1 Surveys. The Pan-STARRS1 Surveys (PS1) and the PS1 public science archive have been made possible through contributions by the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, the Queen’s University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation Grant No. AST-1238877, the University of Maryland, Eotvos Lorand University (ELTE), the Los Alamos National Laboratory, and the Gordon and Betty Moore Foundation.

This publication makes use of data products from the Wide-field Infrared Survey Explorer, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, funded by the National Aeronautics and Space Administration.

This work has made use of data from the European Space Agency (ESA) mission Gaia [...] Multilateral Agreement.

The Guoshoujing Telescope (the Large Sky Area Multi-object Fiber Spectroscopic Telescope LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the [...]


Software: astropy (Astropy Collaboration et al. 2013; Price-Whelan et al. 2018), dustmaps (Green 2018), GNU Parallel (Tange 2011), healpy (Zonca et al. 2019), HEALPix (Górski et al. 2005), optuna (Akiba et al. 2019), scikit-learn (Pedregosa et al. 2011), TOPCAT (Taylor 2005), XGBoost (Chen & Guestrin 2016).

REFERENCES