QSO photometric redshifts using machine learning and neural networks

Abstract

The scientific value of the next generation of large continuum surveys would be greatly increased if the redshifts of the newly detected sources could be rapidly and reliably estimated. Given the observational expense of obtaining spectroscopic redshifts for the large number of new detections expected, there has been substantial recent work on using machine learning techniques to obtain photometric redshifts. Here we compare the accuracy of the predicted photometric redshifts obtained from Deep Learning (DL) with the k-Nearest Neighbour (kNN) and the Decision Tree Regression (DTR) algorithms. We find using a combination of near-infrared, visible and ultraviolet magnitudes, trained upon a sample of SDSS QSOs, that the kNN and DL algorithms produce the best self-validation result with a standard deviation of σ = 0.24. Testing on various sub-samples, we find that the DL algorithm generally has lower values of σ, in addition to exhibiting a better performance in other measures. Our DL method, which uses an easy to implement off-the-shelf algorithm with no filtering or removal of outliers, performs similarly to other, more complex, algorithms, resulting in an accuracy of Δz < 0.1 up to z ~ 2.5. Applying the DL algorithm trained on our 70,000 strong sample to other independent (radio-selected) datasets, we find σ < 0.36 over a wide range of radio flux densities. This indicates much potential in using this method to determine photometric redshifts of quasars detected with the Square Kilometre Array.


arXiv [astro-ph.CO]. Mon. Not. R. Astron. Soc., 1–13 (2021). Printed 19 February 2021 (MN LaTeX style file v2.2)


S. J. Curran, J. P. Moss and Y. C. Perrott
School of Chemical and Physical Sciences, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand

Accepted —. Received —; in original form —

Key words: techniques: photometric – methods: statistical – galaxies: active – galaxies: photometry – infrared: galaxies – ultraviolet: galaxies

Continuum surveys on the next generation of telescopes, e.g. the Square Kilometre Array (SKA), are expected to yield a large number of sources for which the redshifts are unknown. Even the Evolutionary Map of the Universe (EMU, Norris et al. 2011) on the Australian Square Kilometre Array Pathfinder, an SKA precursor, is expected to yield 70 million distant radio sources. Given the observational expense of high quality spectroscopic data, there is currently much activity in developing reliable photometry-based redshifts for distant sources (Luken et al. 2019 and references therein).

The need for optical spectroscopy also causes a bias towards the most luminous sources, leaving the more obscured, gas-rich objects undetected (Curran et al. 2006; Curran et al. 2011). For these, the redshifts would ideally be obtained from the radio photometric properties, although this is proving difficult (Majic & Curran 2015; Norris et al. 2019), due to their relatively featureless spectral energy distributions (SEDs) and the limited number of radio sources for which redshifts are available (Morganti et al. 2015).

The optical band photometry methods generally use machine learning models trained on the u − g, g − r, r − i & i − z colours of the Sloan Digital Sky Survey (SDSS), upon which they are self-validated (e.g. Richards et al. 2001; Ball et al. 2008). Due to the large redshift ranges being explored, Curran & Moss (2019) noted the importance of the bands beyond the visible and, including the WISE (Wide-Field Infrared Survey Explorer, Wright et al. 2010) and GALEX (Galaxy Evolution Explorer data release GR6/7, Bianchi et al. 2017) colours as features in the k-Nearest Neighbour algorithm, Curran (2020) found a significant increase in the accuracy over using the SDSS colours alone, bringing the standard deviation in Δz ≡ z_spec − z_phot down from σ_Δz[data] = 0.[…] to 0.314.

The FUV − NUV, NUV − u, u − g, g − r, r − i, i − z, z − W1 & W1 − W2 colours of the SDSS sample also provided a suitable training set for another independent dataset of radio-loud sources (i.e. quasars, as well as the optically selected QSOs). In this paper, we use the full set of colours to compare other machine learning methods to the kNN, specifically Decision Tree Regression and Deep Learning, and again explore their transferability in training data for other, radio-selected, surveys, thus testing their potential in yielding reliable photometric redshifts for sources detected in forthcoming radio continuum surveys.

© 2021 RAS

Figure 1. The mean magnitudes of the SDSS QSOs at redshifts of z < […], […] < z < […], […] < z < […], […] < z < […] and […] < z < […]. The positions of the mid-infrared to ultra-violet photometric bands are indicated along the top and bottom axes. The error bars show ±σ from the mean. The inversion of the trend at ν ≳ […] Hz, where the ultra-violet fluxes are expected to be generally undetectable, confirms our suspicion that there is confusion arising at the magnitude extremes.

We extracted the first 100 000 QSOs with accurate spectroscopic redshifts (δz/z < […]) from the SDSS Data Release 12 (DR12, Alam et al. 2015). These were then matched to sources in the NASA/IPAC Extragalactic Database (NED), with the NED names being used to scrape the Wide-Field Infrared Survey Explorer (WISE), the Two Micron All Sky Survey (2MASS, Skrutskie et al. 2006) and GALEX databases (in addition to yielding any radio photometry, see Sect. 3.2.5). As per Curran (2020), in order to ensure a uniform magnitude measure between the SDSS and other samples (Sect. 3.2), for each QSO the PSF flux densities associated with the AB magnitudes which fell within Δ log ν = ±[…] of the central frequency of the band were added (for GALEX this was via M = −[…] S_ν − […], where S_ν is the specific flux density in Jy, http://galex.stsci.edu/gr6/). Within each band range the fluxes were then averaged before being converted to a magnitude. This binning has the advantage of being applicable to other samples for which the SDSS photometry may not be directly available, but where there is other nearby photometry in other databases, that is, when using the SDSS to train other independent data sets (see Sect. 3.2). The mean magnitudes close to redshifts of z = 0, 1, 2, 3 and 4 of the sample are shown in Fig. 1. Retaining the sources detected in all nine bands (FUV, NUV, u, g, r, i, z, W1, W2) left a sample of 71 267 QSOs, which is 71% of the original data.

As previously (Curran 2020), we use the FUV − NUV, NUV − u, u − g, g − r, r − i, i − z, z − W1 & W1 − W2 colours, in addition to the r magnitude, as features. As per typical practice, we train on 80% of the data and validate on the remaining 20%, quantifying the result via the standard deviation of the photometric redshifts from the spectroscopic. That is,

$$\sigma_{\Delta z} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\Delta z_i^{2}},$$

which we give for both the data and the Gaussian fit, since the latter is quoted in some of the literature (generally, σ_Δz[fit] < σ_Δz[data]). Also quoted is the normalised standard deviation (e.g. D'Isanto & Polsterer 2018; Luken et al. 2019), which is obtained from

$$\Delta z({\rm norm}) \equiv \frac{z_{\rm spec} - z_{\rm phot}}{z_{\rm spec} + 1},$$

giving σ_Δz(norm) < σ_Δz for the same data. Finally, we also give the median absolute deviation (MAD),

$$\sigma_{\rm MAD} \equiv [\ldots] \times {\rm median}\,\left|z_{\rm spec} - z_{\rm phot}\right|,$$

and the normalised median absolute deviation (NMAD),

$$\sigma_{\rm NMAD} \equiv [\ldots] \times {\rm median}\,\left|\frac{z_{\rm spec} - z_{\rm phot}}{z_{\rm spec} + 1}\right|,$$

e.g. Brescia et al. (2013) and Ananna et al. (2017), respectively. Note that, due to the randomisation of the training and validation data, the values of σ exhibit some slight variation about the values quoted.

k-Nearest Neighbour

The kNN algorithm, which compares the Euclidean distance between a datum and its k nearest neighbours in a feature space, has had some success in predicting photometric redshifts. However, the standard practice of using the u − g, g − r, r − i and i − z colours alone results in poor predictive power at z ≳ […] and non-Gaussian distributions of Δz (Richards et al. 2001; Weinstein et al. 2004; Maddox et al. 2012; Han et al. 2016), with Curran (2020) obtaining σ_Δz[data] = 0.[…].
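As a rough illustration of the kNN idea and of the four scatter measures defined above, the sketch below uses only the Python standard library rather than the sklearn KNeighborsRegressor used for the results in this paper. The function names, the default k = 5 and the toy numbers are assumptions made for illustration. In particular, the numerical factor in σ_MAD and σ_NMAD is taken to be 1.48, the conventional Gaussian-consistency factor, which is an assumption here and is therefore exposed as a parameter.

```python
import math
import statistics


def knn_predict(train_features, train_z, query, k=5):
    """Mean redshift of the k training points closest to `query`.

    Distances are Euclidean in feature space; all neighbours are
    weighted equally (an assumed, simplest-case choice).
    """
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(train_features[i], query))
    nearest = order[:k]
    return sum(train_z[i] for i in nearest) / len(nearest)


def scatter_metrics(z_spec, z_phot, mad_factor=1.48):
    """Return sigma, normalised sigma, MAD and NMAD of z_spec - z_phot.

    mad_factor = 1.48 is an assumed conventional value.
    """
    dz = [s - p for s, p in zip(z_spec, z_phot)]
    dz_norm = [d / (s + 1.0) for d, s in zip(dz, z_spec)]

    def rms(values):
        # sqrt of the mean of the squared residuals
        return math.sqrt(sum(v * v for v in values) / len(values))

    return {
        "sigma": rms(dz),
        "sigma_norm": rms(dz_norm),
        "mad": mad_factor * statistics.median(abs(d) for d in dz),
        "nmad": mad_factor * statistics.median(abs(d) for d in dz_norm),
    }


# Toy usage: one feature per source, two neighbours.
toy_features = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_z = [0.0, 1.0, 2.0, 3.0]
z_guess = knn_predict(toy_features, toy_z, (1.1,), k=2)
metrics = scatter_metrics([1.0, 3.0], [0.5, 3.5])
```

Dividing each residual by z_spec + 1 can only shrink it for z_spec > 0, which is why the normalised scatter comes out smaller than the unnormalised one for the same data.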
By including the GALEX colours as features in the KNeighborsRegressor function of sklearn (https://scikit-learn.org/stable/), this was reduced to σ_Δz[data] = 0.[…], with the addition of the near-infrared (NIR) W1 & W2 bands bringing this down to σ_Δz[data] = 0.[…] (Curran 2020). Note that the addition of the mid-infrared (MIR) W3 & W4 bands did not have any noticeable benefit, while reducing the fraction of sources with detections in all of the bands from 79% to 57% of the SDSS sample.

The addition of the NIR bands tightened the z_phot − z_spec correlation over all redshifts, probably due to these spanning the λ ∼ […] μm inflection in the SEDs (see Sect. 3.3.1), and inclusion of the UV bands had the most profound effect at low redshifts, probably due to sampling of the Lyman-break (see Sect. 3.3.3). Inclusion of other bands had been previously explored, giving similar results (e.g. Ball et al. 2008; Bovy et al. 2012; Brescia et al. 2013; Yang et al. 2017; Duncan et al. 2018; Salvato et al. 2019).

Re-testing the kNN algorithm on our larger dataset, 71 267 cf. 26 301, we find significant improvement with σ_Δz[data] = 0.[…] (Fig. 2, left), although, in addition to the larger sample, some of this will be due to the 80:20 training to validation ratio, cf. 50:50 (Curran 2020).

Another common method used to determine photometric redshifts is the DTR algorithm, which builds a regression model in the form of a tree structure, branching a large dataset into smaller subsets starting from the entire dataset (the top node). This branches into two child nodes, based on a predefined decision boundary, with one node containing data above the decision boundary and one below (Ivezić et al. 2014). This decision-based bifurcation continues recursively until a predefined stopping criterion is reached. In this case, a certain value of Δz.
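The recursive bifurcation described above can be sketched for a single feature as follows. This is a minimal standard-library illustration, not the sklearn DecisionTreeRegressor used for the results. The function names are hypothetical, the split is chosen by minimising the summed squared error of the two child nodes (sklearn's default criterion), and the stopping criterion is assumed to be a maximum depth.

```python
def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, z):
    """Return (threshold, cost) of the best single split, or (None, inf)."""
    pairs = sorted(zip(x, z))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        cost = (_sse([zz for _, zz in pairs[:i]])
                + _sse([zz for _, zz in pairs[i:]]))
        if cost < best[1]:
            best = (threshold, cost)
    return best


def build_tree(x, z, max_depth):
    """Nested (threshold, below, above) tuples; a leaf is the mean redshift."""
    threshold = best_split(x, z)[0] if max_depth > 0 else None
    if threshold is None:
        return sum(z) / len(z)
    below = [(a, b) for a, b in zip(x, z) if a <= threshold]
    above = [(a, b) for a, b in zip(x, z) if a > threshold]
    return (threshold,
            build_tree([a for a, _ in below], [b for _, b in below], max_depth - 1),
            build_tree([a for a, _ in above], [b for _, b in above], max_depth - 1))


def predict(tree, value):
    """Walk from the top node down to a leaf."""
    while isinstance(tree, tuple):
        threshold, below, above = tree
        tree = below if value <= threshold else above
    return tree


# Toy usage: one colour-like feature and two redshift groups.
toy_tree = build_tree([0.1, 0.2, 0.8, 0.9], [0.5, 0.5, 2.0, 2.0], max_depth=1)
```

Raising max_depth lets the tree keep splitting until each leaf holds very few sources, which is the over-fitting regime.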
Using the full set of colours as features for the DecisionTreeRegressor function of sklearn, we find a maximum tree depth of ≈ […] to be optimal. Values greater than this result in over-fitting, making the Δz distribution less Gaussian with a wider spread in σ_Δz[data] (although σ_Δz[fit] narrows considerably). With σ_Δz[data] = 0.[…] (Fig. 2, middle), the DTR algorithm does not appear to perform as well as the kNN.

The concept of deep learning is to configure a computer architecture based upon the natural neurons found in biological brains, thus being very flexible, non-linear and sensitive to patterns in multi-dimensional data. The artificial neuron ("perceptron") is the functional unit of the DL algorithm and consists of several components: an input x_i; weights and biases, w_i, defining the importance of the inputs; and an activation function which transforms the weighted input. This is usually a non-linear function, such as the hyperbolic tangent (tanh), a sigmoid function, or the more step-wise Rectified Linear Unit (ReLU) function. Each perceptron of the input layer may be connected to one, more or all of the other perceptrons in the adjacent layers, and the inputs are multiplied by the weights. The output then becomes

$$y_k = g\left(\sum_{j=0}^{M} w^{(N-1)}_{k,j}\, z^{(N-1)}_{j}\right),$$

where z_j is the output of the j-th perceptron in the N − 1 level (Tagliaferri et al. 2003).

We use TensorFlow, an off-the-shelf deep learning library. Since our output layer is an estimate of the redshift, the learning is supervised and, after testing various combinations of hyperparameters, we found a simple model with two ReLU layers and one tanh layer comprising 200 neurons each (giving 82 601 trainable parameters) to be the most effective, with larger and additional layers being of little benefit while considerably increasing the computation time. Since we are performing a regression, we employed the keras RMSprop optimizer, using early stopping with a patience of 10 to ensure against over-fitting.

As with the kNN and DTR methods, we use the standard 80:20 training–validation split, shuffling the data. Using the same colour combinations as the other algorithms, we find that the DL algorithm provides predictions of similar accuracy to the kNN method with σ_Δz[data] = 0.[…] (Fig. 2, right). We note that using the magnitudes as features in the DL algorithm, rather than the colours, gives a very similar result. Brescia et al. (2013), Pasquet-Itam & Pasquet (2018) and Beck et al. (2021) also use magnitudes as features for their algorithms and we interpret this as the neural net being free to decide how best to use this more elemental data.

In Fig. 3, we show the performance of each of the algorithms for various sample sizes, using the same 80:20 training-validation split. From the standard deviation, we see that the DL algorithm generally performs best and, like the kNN, exhibits the expected anti-correlation between σ_Δz and sample size. There is some scatter, most notably the DL point at σ_Δz ≈ […] & n = 10 000, although this is to be expected due to the shuffling of the training and validation data in these single experiments. For the DTR algorithm, there is no apparent relationship between σ_Δz and n, which also appears to be the case for all other measures. The superior performance of the DL method is confirmed by the other measures, where this generally has lower values of σ_Δz(norm), σ_MAD and σ_NMAD, while having higher values of the regression coefficient and a gradient which is generally closer to unity than the kNN and DTR algorithms (the gradients tend to be "dragged" down from unity by the excess of z_spec ≫ z_phot points at z_spec ≳ […], see Sect. 3.3.2).

Testing the three popular methods, we find deep learning to be the best in terms of self-validation. Comparing with other DL photometric redshift estimates of QSOs, Laurino et al. (2011) used the Weak Gated Experts method, which utilises a feature space (z_phot) which is partitioned by FUV − NUV, NUV − u, u − g, g − r, r − i, i − z colour, and an expert maps each pattern of feature space to a target (z_spec), with the output of the experts defining a new feature space. This resulted in a sample of ≈ 27 000 from the original 105 783 QSOs of the SDSS DR7 (Schneider et al. 2010). Using a 60:40 training:validation split, the gate network uses a softmax activation function to map extracted patterns from the new feature space to the target space, which is an extension of the original feature space, with the added expert predictions. With this method they obtain σ_Δz[fit] = 0.[…] and σ_NMAD = 0.[…] compared to our 0.083 and 0.042, respectively.

Brescia et al. (2013) used the u, g, r, i, z, NUV, FUV, W1, W2, W3, W4, in addition to the UKIDSS DR9 Y, J, H, K near-infrared bands in a Multilayer Perceptron Quasi-Newton Algorithm, in which the Hessian of the error function finds the local minima and maxima of the activation function's scalar error field. They use a 60:40 training:testing split, where 10-fold cross validation is performed on the training set. Using all of the SDSS, WISE, GALEX & UKIDSS bands, they obtain σ_Δz = 0.[…] and σ_MAD = 0.[…], outperforming our model (Fig. 2, right). However, this does require six additional bands, giving just 13% of the full sample, cf. 71% for ours. This would also lead to similar reductions in sample size when applied to other sources (see Sect. 3.2).

D'Isanto & Polsterer (2018) combined a Mixture Density Network and a Deep Convolution Network into a hybrid architecture, which they term a Deep Convolutional Mixture Density Network. Like us, they used a rectified linear unit applied to multiple layers, although they extracted the features from the SDSS DR7 & DR9 images. These comprise all five SDSS magnitudes and their colour combinations, although we find no extra benefit by combining both magnitudes and colours for the SDSS-W1-W2-GALEX data. The image-based algorithm automatically selects the best features, which required cluster computing to run. Using 185 000 sources from the SDSS DR7 and DR9 (Schneider et al. 2010; Pâris et al. 2012), from their 54:46 training:validation ratio, they obtained σ_Δz[fit] = 0.[…], σ_Δz(norm)[fit] = 0.[…] and σ_NMAD = 0.[…].

Pasquet-Itam & Pasquet (2018) used a Convolutional Neural Network (CNN) to detect and classify QSOs from the light curves in SDSS data (Stripe 82). Their architecture was based upon a three-layer CNN, which convolved the raw input with either a temporal convolution (time dependent magnitudes) or a filter convolution (integrated magnitudes). They used a hyperbolic tangent activation function applied to three layers and classified with an 80:20 training:validation ratio. Their architecture gives the probability of a signal being in a particular redshift range and, using the normalised root mean square, they found 80.4% of the photometric redshifts within Δz(norm) < […], 87.1% within Δz(norm) < […] and 91.8% within Δz(norm) < […], compared to our 91.0%, 96.6% and 98.0%, respectively. Comparing their results with the kNN, support vector machine, a random forest classifier and a Gaussian process classifier, they found that their method compared most favourably to the kNN, especially for z < […]. The low number of quasars at higher redshifts means that their method was not trained well at z ≳ […].

Figure 2. The predictions based on the colours and r magnitude from the different methods (kNN – left, DTR – middle, DL – right) using the SDSS sample, from an 80:20 training-validation split. In the scatter plot r shows the regression coefficient and the numbers to the right show the number of sources within each grey-scale pixel. The histogram shows the Δz distribution, where the mean and standard deviation are given for both the distribution and the Gaussian fit.

Figure 3. The total sample size versus the performance of each algorithm, as measured by the standard deviation, σ_Δz, the normalised standard deviation, σ_Δz(norm), the median absolute deviation, σ_MAD, the normalised median absolute deviation, σ_NMAD, the regression coefficient and the gradient of the linear regression between z_phot and z_spec. The kNN algorithm is represented by unfilled squares, the DTR by unfilled circles and the DL by filled stars. The DL algorithm has the best performance as judged by all indicators.

Figure 4. The redshift distribution for the SDSS (filled), OCARS (unfilled) and LARGESS (hatched histogram) QSOs/quasars.
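As a quick arithmetic check on the 82 601 trainable parameters quoted for our DL model (two ReLU layers and one tanh layer of 200 neurons each), the sketch below counts the weights and biases of a fully connected network. It assumes dense layers with one bias per neuron, nine input features (the eight colours plus the r magnitude) and a single output neuron for the redshift. The function name is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    Each layer contributes (inputs + 1 bias) * outputs parameters.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed layout: 9 features -> 200 -> 200 -> 200 -> 1 redshift estimate.
n_parameters = dense_parameter_count([9, 200, 200, 200, 1])
print(n_parameters)  # 82601
```

The count is independent of the activation functions, so it only reproduces the quoted figure when the input has nine features, which is consistent with the colour set and r magnitude used here.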

Similar to us, Beck et al. (2021) use TensorFlow with ReLU activation functions, but with all three layers using 512 neurons and the Adam optimizer, upon the Pan-STARRS1 g, r, i, z, y magnitudes. Using an 80:20 training:validation split, they achieve a simulated Δz(norm)[fit] = 0.[…] and σ_NMAD = 0.[…] over a limited z_spec ≈ […] − […], acknowledging that this does not fully capture the variance in the data and includes galaxies. This redshift limitation permits reasonable predictions over such a small number of bands, due to minimisation of the shift in the rest-frame bands (Sect. 3.3.1), and is likely the reason for the more accurate predictions of galaxy, cf. QSO, redshifts (e.g. Ball et al. 2008; Laurino et al. 2011; Brescia et al. 2014; D'Isanto & Polsterer 2018; Ansari et al. 2019).

We summarise these in Table 1, where we see that our DL method produces similar results to the literature. However, unlike these other methods, which use magnitudes directly and validate on the same database, our scraping of the photometry allows us to apply the SDSS training to other (radio selected) samples, giving comparable results (Sect. 3.2).

Our goal is to obtain statistical redshift estimates for the radio continuum source surveys to be undertaken with the SKA and its pathfinders. Given this, we are interested in whether our methods are transferable to other (radio-selected) data, although, as outlined in Curran & Moss (2019), finding such datasets of sufficient size for which redshifts are available is a challenge. One dataset is the Optical Characteristics of Astrometric Radio Sources (OCARS) catalogue of Very Long Baseline Interferometry astrometry sources (Malkin 2018), a sample of flat spectrum radio sources with S-band (2–4 GHz) flux densities ranging from 15 mJy to 4.0 Jy (Ma et al. 2009).

Applying the kNN training of the full set of colours to the OCARS sources gave σ_Δz[data] = 0.[…] for the 739 of the 3663 sources which had all of the required photometry (Curran 2020). Here, we perform the same test, but using the apparently superior DL model, in addition to testing this on the Large Area Radio Galaxy Evolution Spectroscopic Survey (LARGESS). Although both OCARS and LARGESS are independent, we see that the redshift distributions are well matched (Fig. 4).

We also test a sample from the Faint Images of the Radio Sky at Twenty-Centimeters (FIRST, Becker et al. 1995; White et al. 1997) survey, from which the redshifts have been obtained by matching with SDSS DR14 QSOs (Pâris et al. 2018). Although arguably not as independent a sample as OCARS or LARGESS, this provides 18 273 radio sources with redshifts with which to test the model.

Applying the DL training to the OCARS data, we find σ_Δz[data] = 0.[…] (Fig. 5, left), which is similar to the standard deviation obtained previously, using the kNN training. From the SEDs (Fig. 6), we see a similar pattern to the SDSS (Fig. 1), although the OCARS sources are generally brighter, across all magnitudes, especially at z ∼ […]. We also note that the mean NIR band for the OCARS sources is relatively featureless.

LARGESS contains 10 856 optical redshifts for 12 329 radio sources (Ching et al. 2017) and comprises a mix of galaxies and quasars. Of the latter, there are 1608 for which the redshift reliability flag is q > […] and 409 where q = 5. Of these, there are 1046 and 292 QSOs with all nine magnitudes, respectively. From Fig. 5 (middle), we see that the DL training on the SDSS sample provides an excellent prediction of the photometric redshifts, even for the q < […] sources with σ_Δz[data] = 0.[…]. Limiting this to the q = 5 quasars only improves the prediction somewhat, giving σ_Δz[data] = 0.[…], σ_Δz(norm)[data] = 0.[…], σ_MAD = 0.[…] and σ_NMAD = 0.[…]. The SEDs bear more similarity to the SDSS QSOs than the OCARS quasars (Fig. 7). This could be due to the LARGESS sources having been matched with SDSS counterparts (Ching et al. 2017).

The FIRST sample comprises 18 273 quasars common to the SDSS DR14. Although the full photometry is available (Pâris et al. 2018), we obtain the data as per the other samples, in order to ensure its compilation in a consistent manner. In particular, the GALEX data compiled in Pâris et al. have been force-photometered, recovering low signal-to-noise measurements not detected by GALEX. This, of course, leads to some matches not present in the GALEX GR6/7 catalogue and some extremely high magnitudes, reaching NUV = 35.[…] and FUV = 37.[…]. From our matching, 6129 of the sample have all nine magnitudes, and, as may be expected, the quality of the photometric redshifts is close to that of the SDSS validation data (Fig. 2, right).
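The requirement that a source has all nine magnitudes, and the construction of the features from them, can be sketched as follows. This is an illustrative standard-library example. The dictionary layout, the band keys and the function name are assumptions, and the magnitudes are made-up values chosen only to show the arithmetic.

```python
# Bands ordered from ultraviolet to infrared, so that adjacent pairs give
# FUV-NUV, NUV-u, u-g, g-r, r-i, i-z, z-W1 and W1-W2.
BANDS = ["FUV", "NUV", "u", "g", "r", "i", "z", "W1", "W2"]


def colour_features(magnitudes):
    """Return the eight colours plus the r magnitude, or None if incomplete.

    `magnitudes` maps a band name to a magnitude; a missing key or a value
    of None marks a band without a detection.
    """
    if any(magnitudes.get(band) is None for band in BANDS):
        return None
    colours = [magnitudes[blue] - magnitudes[red]
               for blue, red in zip(BANDS, BANDS[1:])]
    return colours + [magnitudes["r"]]


# Made-up source with detections in all nine bands.
example = {"FUV": 21.0, "NUV": 20.5, "u": 20.0, "g": 19.75, "r": 19.5,
           "i": 19.25, "z": 19.0, "W1": 16.0, "W2": 15.0}
features = colour_features(example)

# The same source without a FUV detection is rejected.
no_fuv = {band: mag for band, mag in example.items() if band != "FUV"}
rejected = colour_features(no_fuv)
```

Sources for which the function returns None are the ones that drop out of a sample, which is why the usable fraction of each radio catalogue is well below unity.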

Footnotes:
The redshift reliability flags range from q = 0 − […] by increasing quality, where, for example, q = 0 designates a "poor-quality (or missing) spectrum", q = 3 "a reasonably confident redshift" and q = 5 an "extremely reliable redshift from a good-quality spectrum".
Of the 219 708 sources, 12 487 have NUV > […] and 34 872 have […].

Upon considering the scarcity of a large catalogue of radio sources with spectroscopic redshifts (cf. Drinkwater et al. 1997; Jackson et al. 2002; Brookes et al. 2008; Callingham et al. 2017), Curran & Moss (2019) tested their combination of magnitudes on the "21-cm sample", a compilation of radio sources searched for associated H I […] z […]. Of this, only 262 are quasars, of which only 28 have the full magnitude complement. Nevertheless, we apply the DL training on the SDSS sample to predict the redshifts (Fig. 8) and see that the photometric redshifts are reasonable for this small sample.

Table 1. Comparison with other deep learning methods applied to QSOs. n gives the total number of sources (test + validation). Note that σ_Δz(norm)[data] is only quoted by Brescia et al. (2013) (σ_Δz(norm)[data] = 0.[…]). The radio samples for this paper utilise the SDSS DR12 DL model (Sect. 3.2).

Reference                      Sample                Photometry               z-range  n        σ_NMAD  σ_Δz(norm)  σ_Δz    σ_Δz(norm)
                                                                                                [data]  [data]      [fit]   [fit]
Laurino et al. (2011)          SDSS DR7              FUV, NUV, u, g, r, i, z  ≲ […]    27 000   0.029   -           0.198   -
Brescia et al. (2013)          SDSS DR7              15 bands                 […]      14 000   -       0.069       -       -
D'Isanto & Polsterer (2018)    SDSS DR9              u, g, r, i, z            ≲ […]    185 000  0.026   -           0.217   0.095
Pasquet-Itam & Pasquet (2018)  SDSS DR7 (Stripe 82)  u, g, r, i, z            ≲ […]    […]      […]     […]         […]     […]
Beck et al. (2021)             […]                   g, r, i, z, y            ≲ […]    540 244  > […]   -           -       > […]
This paper                     SDSS DR12             9 bands                  […]      71 267   0.042   0.110       0.083   0.038
                               OCARS                 ...                      […]      649      0.063   0.170       0.107   0.050
                               LARGESS               ...                      […]      […]      […]     […]         […]     […]
                               21-cm                 […]                      […]      28       0.124   0.140       0.165   0.093

Figure 5. The predictions for the other samples trained using the colours from the DL training on SDSS data and validation on the OCARS data (left), the LARGESS q = 3 − […] data (middle) and the FIRST data (right). Using the FUV, NUV, u, g, r, i, z, W1, W2 magnitudes directly gives similar results.

Given the improved sensitivities, many of the sources detected by the SKA and its pathfinders are expected to have flux densities many times lower than the radio sources discussed here. For instance, EMU is expected to be sensitive to fluxes of S_radio ≲ […] mJy over large areas of the sky (Norris et al. 2011), whereas, being VLBI calibration sources, OCARS sources have fluxes S_radio ≳ […] Jy. As mentioned above, the requirement of a catalogue of radio sources with spectroscopic redshifts limits the number of independent test samples. However, we should bear in mind that the training sample itself, although optically selected, contains some radio sources. Of these, 3505 have at least one radio detection with the others either being unsearched or with fluxes below the sensitivity of current surveys.

Our mining of the source photometry (Sect. 2.1) also includes radio data and, in order to quantify the radio fluxes, we extract the ν < […] GHz values. Where there is more than one flux for a given source we average these. Showing the flux distributions in Fig. 9, we do see that the OCARS sources are much brighter than the SDSS, which in turn have a similar distribution to the LARGESS quasars, which are considered radio sources (Ching et al. 2017). We note that the OCARS distribution appears more symmetrical than those of the SDSS and LARGESS, which is suggestive of the weaker sources being truncated by the flux limited nature of the surveys. Due to small numbers, it is difficult to comment for the 21-cm quasars, although it is clear that the fluxes are about two orders of magnitude greater than the SDSS and LARGESS QSOs. Given the results of testing on both OCARS and LARGESS (Fig. 5), as well as the 21-cm sample (Fig. 8), we are confident that our algorithm can yield accurate photometric redshifts for sources with a wide range of radio flux densities.
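The per-source averaging of the radio fluxes described above can be sketched as below. This is an illustrative standard-library example only. The function name, the input layout of (frequency, flux density) pairs and the cut-off frequency passed in the usage lines are hypothetical, with the cut-off left as a parameter rather than fixed to any particular value.

```python
def mean_radio_flux(sources, nu_max_hz):
    """Average the flux densities below nu_max_hz for each source.

    `sources` maps a source name to a list of (frequency_hz, flux_jy)
    pairs. Sources with no measurement below the cut-off are omitted,
    mirroring sources that are unsearched or undetected.
    """
    averaged = {}
    for name, measurements in sources.items():
        selected = [flux for nu, flux in measurements if nu < nu_max_hz]
        if selected:
            averaged[name] = sum(selected) / len(selected)
    return averaged


# Made-up photometry for two sources; the second has no low-frequency data.
toy_sources = {
    "source_a": [(1.4e9, 0.2), (4.8e9, 0.1), (1.5e11, 0.05)],
    "source_b": [(2.3e11, 0.3)],
}
toy_means = mean_radio_flux(toy_sources, nu_max_hz=1.0e10)
```

Averaging in flux density rather than in magnitude keeps the result linear in the measurements, which matters when the individual values span a wide range.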

Figure 6. The mean magnitudes of the OCARS quasars at redshifts of z < […], […] < z < […], […] < z < […], […] < z < […] and […] < z < […]. The error bars show ±σ from the mean. Again, we see the same inversion for the high redshift FUV magnitudes, indicating their unreliability (cf. Fig. 1).

Figure 7. The mean magnitudes of the LARGESS q = 3 − […] quasars at redshifts of z < […], […] < z < […], […] < z < […], […] < z < […] and z > […] (due to a lower number of high redshift sources, Fig. 4). The error bars show ±σ from the mean. Again, we see the same inversion for the high redshift FUV magnitudes, indicating their unreliability (cf. Figs. 1 & 6).

In addition to the differences in radio flux, it is known that the mean optical and IR SEDs can differ between radio-loud and radio-quiet sources (Elvis et al. 1994). In Fig. 10, we show the mean SEDs, from which we see that the LARGESS and SDSS are very similar, which is not surprising given their similarity in mean radio flux densities. As expected, the much more radio-loud OCARS and 21-cm quasars exhibit a substantial difference from the radio-quiet QSOs, particularly in the optical band. The SEDs do, however, overlap with the SDSS σ uncertainties, which is in line with the considerable variation between individual sources of similar loudness noted by Elvis et al. (1994). Also, the fact that the SDSS training applied to the other samples returns reasonable redshift predictions suggests that the differences are small enough not to deviate significantly from the trained model, although they could contribute to the larger fraction of poor predictions for the OCARS data.

Figure 8. The photometric redshift prediction for the 21-cm sample. The dotted line shows z_phot = z_spec and the broken lines ±σ_Δz[data].

Figure 9. The mean radio flux densities of the QSOs/quasars in the SDSS, OCARS, LARGESS (q = 3 − […]), FIRST and 21-cm samples with measured […] GHz flux densities. The legend shows the mean value of each distribution. For consistency, the fluxes for the FIRST sources are obtained in the same way as the other samples and the binning is twice as fine as for the other samples in order to fit within a similar range. The 1.4 GHz flux densities given in the FIRST catalogue range from 0.75 mJy to 14.8 Jy for this sample.

While the above does indicate that training the algorithm on the SDSS sources can provide reliable photometric redshifts for the radio-selected samples, the requirement of the u, g, r, i, z, W , W , NUV, FUV magnitudes does mean that for a significant fraction of sources we cannot obtain a photometric redshift to the same reliability. Of the 1817 of the 3033 OCARS quasars which have at least one SDSS magnitude, 1136 have all five magnitudes, with 1034 of these also having W & W , and 649 (21%) having all of the required photometry. For LARGESS, 1046 out of 1608 q > quasars have the full magnitude complement, giving a matching coincidence of 65%. The limited OCARS–SDSS coordinate overlap (Fig. 11) highlights an important issue: given that the SKA, and two of its pathfinders, will operate in the south, SkyMapper (http://skymapper.anu.edu.au; Wolf et al. 2018) magnitudes should provide the optical magnitudes for the training of southern sources, upon the data becoming available.

Figure 10. The mean SEDs of the various sub-samples. The error bars show the ± σ uncertainty in the binned SDSS luminosities.

Figure 11. Right ascension and declination of the SDSS (northern sky – yellow), OCARS (all sky – red) and LARGESS ( − ◦ < δ < ◦ – blue) sources.

The improvement in the redshift prediction obtained by the inclusion of the infrared and ultra-violet photometry has previously been noted by Bovy et al. (2012) and Brescia et al. (2013). They attribute this improvement to the breaking of the redshift degeneracy which arises when using the u, g, r, i, z photometry alone (e.g. Richards et al. 2001; Maddox et al. 2012). From our own data, without the full photometry we see that the results degrade significantly (Fig. 12), confirming the importance of the IR and UV data.

Figure 12. The Δz distributions using the SDSS colours only, the SDSS+WISE ( W , W ) and the full SDSS+WISE ( W , W )+GALEX colour set.

Curran & Moss (2019) found an empirical relationship between the redshift and the ratio of the U − K and W − FUV colours in the rest-frame of the source. They hypothesised that this was due to the UV–NIR colours tracing the luminosity of the active galactic nucleus and, as seen from Fig. 13, the observed colours will depend strongly upon the redshift of the source. For instance, to measure the FUV and i luminosities at z = 3 requires the r and W magnitudes in the observed-frame. As noted by Curran (2020), inclusion of the mid-infrared W & W magnitudes to the SDSS training has little effect on σ_Δz, which we confirm is also the case for the DL algorithm (their inclusion reduces the validation sample size from 14 253 to 11 045, but with no reduction in σ_Δz).

Figure 13. The dependence of source-frame wavelength with redshift for our photometric bands. The dotted lines on the left-hand ordinate show the observed-frame values and the curves branching off from these the rest-frame wavelength as a function of source redshift.

Curran (2020) also noted the λ ∼ µm inflection in the SED (e.g. Edelson & Malkan 1986; Barvainis 1987; Elvis et al. 1994), which was attributed to λ ≳ µm NIR emission from heated dust. An alternative explanation is that this is due to Hα (λ = 656 nm) emission, which, at λ = 656 nm, would be apparent between the r and i bands at z = 0, shifting to W at z ≳ . However, as per Curran (2020), we see little evidence of the shifting of this feature in either of the SEDs (Figs. 1, 6 & 7), although there do appear to be inflections at ≈ µm and ≈ . µm in the SDSS and related (LARGESS & FIRST) spectra (Fig. 10). A ≈ µm feature has been previously noted in type-2 objects (Hickox et al. 2017) and so we attribute the ≈ . µm inflection to the rise of the “big blue bump”, due to thermal emission from the accretion disk.

Any inflection in Figs. 1, 6 & 7 could, of course, be somewhat masked by the limited J, H, K photometry of our sample (see Sect. 3.3.2), although the W − W colour does exhibit a clear redshift dependence for all samples (Fig. 14, top), where the peak at z ∼ . corresponds to the rest-frame J, H, K bands. We also see that, despite our reservation of the applicability of the SDSS training to radio-loud sources (Sect. 3.2.5), both OCARS and LARGESS follow a similar evolution to the SDSS QSOs. This also applies at the other wavelength extreme, where the evolution of the NUV − u colour is also similar between the three main samples (Fig. 14, bottom), all exhibiting a steep evolution in NUV − u at z ≳ due to the Lyman-break. Given our suspicions about the FUV photometry, especially at high redshift (Figs. 1, 6 & 7), we choose this colour rather than FUV − NUV; for the evolution of the constituent magnitudes see Fig. 19.

Figure 14. Top: The W − W colour versus redshift (cf. Assef et al. 2010; Reed et al. 2015). Bottom: The NUV − u colour versus redshift. In both panels the points show the SDSS QSOs of the training sample and the traces show the average values of the three main samples.

In the case of continuum emission from heated dust, we would expect an increase in redshift (and thus, luminosity) to cause an increase in the peak frequency of the modified black body emission (Curran & Duchesne 2018). This increase would counteract any redshift in the peak of the profile, curtailing any apparent shift in the NIR peak, perhaps holding the observed NIR peaks at close to λ ∼ µm in these relatively low resolution SEDs.

Lastly, the W − W colour is hypothesised to trace hot dust, from AGN activity, and the W − W colour warm dust, from star formation (Jarrett et al. 2011; Donoso et al. 2012). The fact that the W − W colour exhibits a tighter correlation with source redshift (Curran & Duchesne 2018) suggests that, due to the Malmquist bias, higher AGN activity is being detected at higher redshift, thus contributing to the determination of the photometric redshifts (Curran & Moss 2019). Therefore, the MIR magnitudes should also be useful in tracing rest-frame NIR ( W & W ). However, this would not take effect until z ≳ .

In order to investigate the source of the inaccuracies in z_phot, we perform a 10-dimensional linear regression of the magnitudes and spectroscopic redshift versus Δz (Table 2). Although most of the parameters are significant (|t| ≳ σ), as discussed above, the NIR ( W ) and UV ( NUV ) appear to be extremely important.

Table 2. Results of the 10-dimensional linear regression of the standardised (µ = 0, σ = 1) magnitudes and z_spec versus Δz for the SDSS test data. m is the gradient, followed by the standard error. The intercept has a value of c = 0 . ± . . The final column gives the t-statistic.

Parameter    m          σ_m/√n    t [σ]
FUV
NUV          -0.1490    0.003     -44.995
u            -0.0728    0.005     -15.964
g
r
i
z            -0.1595    0.008     -19.877
W            -0.1643    0.007     -25.177
W
z_spec

We also note the high significance of z_spec and, plotting this against Δz (Fig. 15), we see that the positive Δz values at z_spec ≳ are more extreme than the negative ones at z_spec ≲ . Referring to Fig. 13, at z_spec ∼ only the bluer rest-frame bands (FUV to u) are observed (over g to z), with no coverage until z in the rest-frame ( W observed). That is, the large gap between the observed z and W bands manifests in a loss of crucial data (g, r, i) in the rest-frame, leading to the drop in accuracy at z_spec ≳ .

Figure 15. Δz versus z_spec for the SDSS test data binned into bins of 250 QSOs each (leaving 3 remaining). The error bars show the standard error, σ/√n, and the dashed and dotted lines show the mean and standard deviation of the binned values, respectively.

This gap could potentially be covered with the inclusion of the 2MASS J, H, K bands (see Fig. 1), which were included in our mining of the photometry (Sect. 2.1), cf. the UKIDSS Y, J, H, K bands included by Brescia et al. (2013), Sect. 3.1. However, the inclusion of these bands reduces the sample to just 19 077 QSOs. Using an 80:20 training:testing split, we obtain very similar results to our standard u, g, r, i, z, W , W , NUV, FUV model, but with the validation sample reduced from 14 253 to 3815. Note that the implementation of a similarly wide array of bands has been used to yield photometric redshifts through template fitting, specifically by Ananna et al. (2017) who obtain σ_NMAD = 0 . for 5961 X-ray sources in Stripe 82.

Figure 16. The redshift distribution for the SDSS QSOs with all nine magnitudes (filled – 71 267 sources) and those with g, r, i, z, W , W magnitudes (unfilled – 95 351 sources). It is seen that the additional 24 084 sources mostly have redshifts z ≳ .

Given the disadvantage of requiring all nine magnitudes and the fact that the “bluer” bands (u, NUV, FUV) are more likely to be undetected, particularly at high redshift (Fig. 16), we investigate the effect of removing these magnitudes from the DL model. Excluding these gives a matching coincidence of 95%, cf. 71%, of the original sample (Sect. 2.1). As before, we use an 80:20 training–validation split on the SDSS data, giving a validation sample of 19 070 QSOs with the g, r, i, z, W , W magnitudes. From Fig. 17, we see a considerable spread in Δz (σ_Δz[data] = 0 . ) compared to the full FUV, NUV, u, g, r, i, z, W , W complement (σ_Δz[data] = 0 . ). Since the UV magnitudes may be important in tracing the AGN activity at low redshift (Sect. 3.3.1), we expect that most of the uncertainty, due to the omission of the bluer magnitudes, would occur at z ∼ . This is confirmed in Fig. 18, where the previous inaccuracy at z ≳ appears also to be exacerbated (cf. Fig. 15).
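The band-shifting arguments of this section (cf. Fig. 13) follow from λ_rest = λ_obs/(1 + z). As a minimal sketch, the Python below finds which observed band samples a given rest-frame band at a given redshift, and the redshift at which the 1216 Å Lyman-α wavelength reaches each ultraviolet band. The effective wavelengths are illustrative round numbers rather than values taken from the paper, the two WISE bands are assumed to be W1 and W2 (the subscripts are not recoverable from the text), and the function names are hypothetical.

```python
import math

# Approximate effective wavelengths in nm (illustrative round numbers, assumed here).
# "W1" and "W2" are an assumption about which WISE bands are meant.
BANDS_NM = {
    "FUV": 153.0, "NUV": 227.0,
    "u": 354.0, "g": 477.0, "r": 623.0, "i": 763.0, "z": 913.0,
    "W1": 3400.0, "W2": 4600.0,
}
LYMAN_ALPHA_NM = 121.6  # rest-frame 1216 Angstrom


def rest_wavelength(observed_nm, redshift):
    """Rest-frame wavelength sampled by an observed wavelength at a given redshift."""
    return observed_nm / (1.0 + redshift)


def observed_band_for(rest_band, redshift):
    """Observed band closest (in log wavelength) to the redshifted rest-frame band."""
    target = BANDS_NM[rest_band] * (1.0 + redshift)
    return min(BANDS_NM, key=lambda b: abs(math.log(BANDS_NM[b] / target)))


def redshift_where_line_enters(band, line_nm=LYMAN_ALPHA_NM):
    """Redshift at which a rest-frame line is shifted to the band's effective wavelength."""
    return BANDS_NM[band] / line_nm - 1.0


if __name__ == "__main__":
    for band in ("FUV", "i"):
        print(band, "at z = 3 is sampled by", observed_band_for(band, 3.0))
    for band in ("FUV", "NUV", "u"):
        print(f"Lyman-alpha reaches {band} at z = {redshift_where_line_enters(band):.2f}")
```

With these assumed wavelengths, the rest-frame FUV and i bands at z = 3 map to the observed r and W1 bands, in line with the example given in the text.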
In summary, although the exclusion of the colours which use the u, NUV, FUV magnitudes does increase the number of high redshift detections, this is at the expense of the low redshift accuracy, with the additional high redshift estimates gained being of relatively low quality.

The multi-dimensional linear regression of the magnitudes versus Δz (Table 2) suggests that both the NUV and u magnitudes are especially crucial in obtaining a reliable photometric redshift. Examining the redshift distribution of the magnitudes (Fig. 19), we see that the NUV magnitude climbs rapidly at z ≳ and the u at z ≳ , which is consistent with the rest-frame λ = 1216 Å Lyman-break. The FUV magnitude also exhibits this at the expected z ≳ . , although as established from Figs. 1, 6 & 7, this band is not expected to be reliable at z ≳ .

The poorer performance resulting from the exclusion of the bluer magnitudes suggests that the algorithm relies heavily upon the Lyman-break in estimating the redshift. The fact that both of these magnitudes peter out (at NUV ≳ & u ≳ ) is most likely responsible for the poor photometric redshift estimates at high redshift. Improving upon this would require an automated and reliable method of including the non-detections or targeted longer integrations upon the sources of interest. Nevertheless, from Fig. 15, it appears that the photometric redshifts are statistically accurate up to z ∼ . , which, at a look-back time of 11 Gyr, covers the past 80% of cosmic history.

Figure 17. As Fig. 2 (right) – the predictions based on the DL method on the SDSS sample, but excluding the colours which use u and GALEX magnitudes.

Figure 18. As Fig. 15, but excluding the “bluer” (u, NUV, FUV) magnitudes and binned into bins of 205 QSOs each (leaving 5 remaining). The error bars show the standard error σ/√n and dashed and dotted lines show the mean and standard deviation of the binned values, respectively.

Figure 19. The magnitude versus redshift for the SDSS validation sample in 50 bins of 151 QSOs (leaving one remaining). As mentioned above, the high redshift FUV data are probably unreliable.

Forthcoming radio continuum surveys on the next generation of telescopes are expected to yield vast numbers of sources for which the redshifts will be unknown. Since these are observationally expensive to obtain, as well as introducing a bias towards the most optically bright objects, a rapid method of estimating accurate redshifts from the photometry would vastly increase the scientific value of these surveys.

Given the large datasets involved, machine learning is the most promising means. While there has been success training the algorithms on optical (SDSS) photometry alone, we have previously shown that the combination of SDSS, WISE W & W and GALEX colours leads to a significant increase in accuracy (Curran 2020). Here we compare our previous method, the k-Nearest Neighbour, with the Decision Tree Regression and Deep Learning methods. Mining the FUV, NUV, u, g, r, i, z, W , W photometry of the 100 000 QSOs selected from the SDSS DR12 gives 71 267 sources which are detected in all nine bands. Testing the full sample and various sub-samples shows the DL algorithm to perform the best, as measured by the (normalised) standard deviation, the (normalised) median absolute deviation, the regression coefficient and the gradient of the linear fit between the predicted and measured redshifts. Training the DL algorithm on 80% of the SDSS sample and validating on the other 20% yields an accuracy of Δz < . up to z ∼ . which corresponds to look-back times of 11 Gyr.

In order to determine the suitability of the DL model in predicting redshifts for other, radio-selected, samples, as for the SDSS sample, we scrape the photometry from various databases and bin these into the appropriate bands. We then use our full SDSS DR12 sample to train the model and validate this on the other catalogues. We find this cross-training to be successful, yielding photometric redshifts up to z ∼ , with a standard deviation in Δz of σ_Δz[data] ≈ . − . .
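The performance measures listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: it assumes Δz = z_spec − z_phot, normalisation by (1 + z_spec), a population standard deviation, and the common 1.48 × MAD scaling for the normalised median absolute deviation. The function name photoz_metrics and the dictionary keys are hypothetical.

```python
import math
import statistics


def photoz_metrics(z_spec, z_phot):
    """Return summary statistics comparing photometric and spectroscopic redshifts."""
    # Assumed sign convention: positive when the predicted redshift is too low.
    dz = [zs - zp for zs, zp in zip(z_spec, z_phot)]
    # Normalised residuals, Δz / (1 + z_spec).
    dz_norm = [d / (1.0 + zs) for d, zs in zip(dz, z_spec)]

    def mad(values):
        """Median absolute deviation about the median."""
        centre = statistics.median(values)
        return statistics.median([abs(v - centre) for v in values])

    # Least-squares fit of z_phot against z_spec.
    mean_s = statistics.mean(z_spec)
    mean_p = statistics.mean(z_phot)
    s_xx = sum((zs - mean_s) ** 2 for zs in z_spec)
    s_yy = sum((zp - mean_p) ** 2 for zp in z_phot)
    s_xy = sum((zs - mean_s) * (zp - mean_p) for zs, zp in zip(z_spec, z_phot))

    return {
        "sigma": statistics.pstdev(dz),            # standard deviation of Δz
        "sigma_norm": statistics.pstdev(dz_norm),  # normalised standard deviation
        "mad": mad(dz),                            # median absolute deviation
        "nmad": 1.48 * mad(dz_norm),               # normalised MAD (assumed scaling)
        "gradient": s_xy / s_xx,                   # slope of the linear fit
        "r": s_xy / math.sqrt(s_xx * s_yy),        # regression (correlation) coefficient
    }


if __name__ == "__main__":
    toy_spec = [0.5, 1.0, 1.5, 2.0]
    toy_phot = [0.6, 0.9, 1.5, 2.2]
    for name, value in photoz_metrics(toy_spec, toy_phot).items():
        print(f"{name}: {value:.4f}")
```

A perfect predictor would return zero for the four scatter measures and unity for both the gradient and the regression coefficient.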
The cross-training remains successful despite the mean radio flux densities of the samples differing by two orders of magnitude from the training sample, as well as there being a clear difference in the mean optical SEDs (cf. Elvis et al. 1994).

As per the kNN and other methods (Bovy et al. 2012; Brescia et al. 2013), the accuracy of the photometric redshift predictions depends heavily upon the addition of the infrared and ultra-violet photometry to the standard u, g, r, i, z (Richards et al. 2001; Weinstein et al. 2004; Ball et al. 2008; Maddox et al. 2012; Han et al. 2016). As noted by Curran & Moss (2019), over a large range of redshifts, a given rest-frame magnitude could occur in several other bands. This suggests that the mid-infrared W & W bands may be important at z ≳ , where the data are currently sparse, and that the current loss in accuracy at z ≳ is due to the large gap between the SDSS and WISE bands. This could be somewhat remedied with the inclusion of the NIR J, H, K bands (Brescia et al. 2013), although the large reduction in sample size prevents this being the case for our sample.

Our data scraping and binning of the photometry has the advantage over other methods which use the SDSS magnitudes directly in that it produces a model that can be used to train other samples which may have little SDSS, but other optical photometry available (e.g. B, G, V, R). Our DL method has the advantage over similar applications of neural networks in that it utilises an off-the-shelf deep learning library with basic hyperparameters – two ReLU layers and one tanh layer each comprising 200 neurons. It also runs rapidly on a standard laptop computer, compared to the DCMDN method (D’Isanto & Polsterer 2018), which requires a cluster, and the MLPQNA method (Brescia et al. 2013), which requires the prior “pruning” of features in order to speed up the computation. The main disadvantage, common to these other methods, is the requirement of the detection of the source over nine different observing bands, particularly the GALEX bands which are the most restrictive. Furthermore, the requirement of SDSS magnitudes to validate the all-sky surveys halves the number of sources which can be used. This highlights the need for southern sky training data (e.g. using SkyMapper), if the aim is to obtain photometric redshifts for continuum sources detected with the SKA.
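The network described above is small enough to write out explicitly. The sketch below is a plain-Python forward pass through two ReLU layers and one tanh layer of 200 neurons each, followed by a single linear output for the redshift. It is not the authors' implementation: the layer order, the nine-value input, the linear output neuron and the random (untrained) weights are all assumptions made for illustration, and in practice an off-the-shelf library would supply both the layers and the training loop.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[j] holds the inputs to neuron j."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by its activation function."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def build_network(n_inputs=9, width=200, seed=0):
    """Two ReLU layers, one tanh layer, then a linear output (assumed layout)."""
    rng = random.Random(seed)
    return [
        (init_layer(n_inputs, width, rng), relu),
        (init_layer(width, width, rng), relu),
        (init_layer(width, width, rng), math.tanh),
        (init_layer(width, 1, rng), identity),  # single output: predicted redshift
    ]


def predict(network, features):
    """Forward pass for one source; features are e.g. nine standardised magnitudes."""
    x = list(features)
    for layer, activation in network:
        x = dense(x, layer, activation)
    return x[0]


def count_parameters(network):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for (w, b), _ in network)


if __name__ == "__main__":
    net = build_network()
    print("trainable parameters:", count_parameters(net))
    print("untrained output:", predict(net, [0.0] * 9))
```

The parameter count is dominated by the two 200 × 200 hidden-layer connections, which is consistent with the model being light enough to train on a laptop.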

ACKNOWLEDGEMENTS

We wish to thank the referee for their very helpful comments. This research has made use of the NASA/IPAC Extragalactic Database (NED) which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration and NASA’s Astrophysics Data System Bibliographic Service. This research has also made use of NASA’s Astrophysics Data System Bibliographic Service. Funding for the SDSS has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. This publication makes use of data products from the Wide-field Infrared Survey Explorer, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, funded by the National Aeronautics and Space Administration. This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation. GALEX is operated for NASA by the California Institute of Technology under NASA contract NAS5-98034.

DATA AVAILABILITY

Data and SDSS TensorFlow training model available on request.

REFERENCES

Alam S. et al., 2015, ApJS, 219, 12
Ananna T. T. et al., 2017, ApJ, 850, 66
Ansari R. et al., 2019, A&A, 623, A76
Assef R. J. et al., 2010, ApJ, 713, 970
Ball N. M., Brunner R. J., Myers A. D., Strand N. E., Alberts S. L., Tcheng D., 2008, ApJ, 683, 12
Barvainis R., 1987, ApJ, 320, 537
Beck R., Szapudi I., Flewelling H., Holmberg C., Magnier E., 2021, MNRAS, 500, 1633
Becker R. H., White R. L., Helfand D. J., 1995, ApJ, 450, 559
Bianchi L., Shiao B., Thilker D., 2017, ApJS, 230, 24
Bovy J. et al., 2012, ApJ, 749, 41
Brescia M., Cavuoti S., D’Abrusco R., Longo G., Mercurio A., 2013, ApJ, 772, 140
Brescia M., Cavuoti S., Longo G., De Stefano V., 2014, A&A, 568, A126
Brookes M. H., Best P. N., Peacock J. A., Röttgering H. J. A., Dunlop J. S., 2008, MNRAS, 385, 1297
Callingham J. R. et al., 2017, ApJ, 836, 174
Ching J. H. Y. et al., 2017, MNRAS, 464, 1306
Curran S. J., 2020, MNRAS, 493, L70
Curran S. J., Duchesne S. W., 2018, MNRAS, 476, 3580
Curran S. J., Hunstead R. W., Johnston H. M., Whiting M. T., Sadler E. M., Allison J. R., Athreya R., 2019, MNRAS, 484, 1182
Curran S. J., Moss J. P., 2019, A&A, 629, A56
Curran S. J. et al., 2011, MNRAS, 416, 2143
Curran S. J., Whiting M. T., Murphy M. T., Webb J. K., Longmore S. N., Pihlström Y. M., Athreya R., Blake C., 2006, MNRAS, 371, 431
D’Isanto A., Polsterer K. L., 2018, A&A, 609, 111
Donoso E. et al., 2012, ApJ, 748, 80
Drinkwater M. J. et al., 1997, MNRAS, 284, 85
Duncan K. J. et al., 2018, MNRAS, 473, 2655
Edelson R., Malkan M., 1986, ApJ, 308, 59
Elvis M. et al., 1994, ApJS, 95, 1
Han B., Ding H.-P., Zhang Y.-X., Zhao Y.-H., 2016, Research in Astronomy and Astrophysics, 16, 74
Hickox R. C., Myers A. D., Greene J. E., Hainline K. N., Zakamska N. L., DiPompeo M. A., 2017, ApJ, 849, 53
Ivezić Ž., Connolly A., VanderPlas J., Gray A., 2014, Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data. Princeton University Press
Jackson C. A., Wall J. V., Shaver P. A., Kellermann K. I., Hook I. M., Hawkins M. R. S., 2002, A&A, 386, 97
Jarrett T. H. et al., 2011, ApJ, 735, 112
Laurino O., D’Abrusco R., Longo G., Riccio G., 2011, MNRAS, 418, 2165
Luken K. J., Norris R. P., Park L. A. F., 2019, PASP, 131, 108003
Ma C. et al., 2009, IERS Technical Note, 35, 1
Maddox N., Hewett P. C., Péroux C., Nestor D. B., Wisotzki L., 2012, MNRAS, 424, 2876
Majic R. A. M., Curran S. J., 2015, Radio Photometric Redshifts: Estimating radio source redshifts from their spectral energy distributions. Tech. rep., Victoria University of Wellington
Malkin Z., 2018, ApJS, 239, 20
Morganti R., Sadler E. M., Curran S., 2015, Advancing Astrophysics with the Square Kilometre Array (AASKA14), 134
Norris R. P. et al., 2011, PASA, 28, 215
Norris R. P. et al., 2019, PASP, 131, 108004
Pâris I. et al., 2012, A&A, 548, A66
Pâris I. et al., 2018, A&A, 613, A51
Pasquet-Itam J., Pasquet J., 2018, A&A, 611, A97
Reed S. L. et al., 2015, MNRAS, 454, 3952
Richards G. T. et al., 2001, AJ, 122, 1151
Salvato M., Ilbert O., Hoyle B., 2019, Nature Astronomy, 3, 212
Schneider D. P. et al., 2010, AJ, 139, 2360
Skrutskie M. F. et al., 2006, AJ, 131, 1163
Tagliaferri R. et al., 2003, Neural Networks, 16, 297
Weinstein M. A. et al., 2004, ApJS, 155, 243
White R. L., Becker R. H., Helfand D. J., Gregg M. D., 1997, ApJ, 475, 479
Wolf C. et al., 2018, PASA, 35, 10