Non-Parametric Cell-Based Photometric Proxies for Galaxy Morphology: Methodology and Application to the Morphologically-Defined Star Formation -- Stellar Mass Relation of Spiral Galaxies in the Local Universe

Abstract

(Abridged) We present a non-parametric cell-based method of selecting highly pure and largely complete samples of spiral galaxies using photometric and structural parameters as provided by standard photometric pipelines and simple shape fitting algorithms, demonstrably superior to commonly used proxies. Furthermore, we find structural parameters derived using passbands longwards of the $g$ band and linked to older stellar populations, especially the stellar mass surface density $\mu_*$ and the $r$ band effective radius $r_e$, to perform at least equally well as parameters more traditionally linked to the identification of spirals by means of their young stellar populations. In particular the distinct bimodality in the parameter $\mu_*$, consistent with expectations of different evolutionary paths for spirals and ellipticals, represents an often overlooked yet powerful parameter in differentiating between spiral and non-spiral/elliptical galaxies. We investigate the intrinsic specific star-formation rate - stellar mass relation ($\psi_* - M_*$) for a morphologically defined volume limited sample of local universe spiral galaxies, defined using the cell-based method with an appropriate parameter combination. The relation is found to be well described by $\psi_* \propto M_*^{-0.5}$ over the range of $10^{9.5} M_\odot \le M_* \le 10^{11} M_\odot$ with a mean interquartile range of 0.4 dex. This is somewhat steeper than previous determinations based on colour-selected samples of star-forming galaxies, primarily due to the inclusion in the sample of red quiescent disks.


arXiv [astro-ph.CO], Nov
Mon. Not. R. Astron. Soc., 1–39 (2002). Printed 18 September 2018 (MN LaTeX style file v2.2)

Non-Parametric Cell-Based Photometric Proxies for Galaxy Morphology: Methodology and Application to the Morphologically-Defined Star Formation – Stellar Mass Relation of Spiral Galaxies in the Local Universe

M. W. Grootes ⋆, R. J. Tuffs, C. C. Popescu, A. S. G. Robotham, M. Seibert, L. S. Kelvin

Max-Planck Institut für Kernphysik, Saupfercheckweg 1, 69117 Heidelberg, Germany
Jeremiah Horrocks Institute, University of Central Lancashire, Preston PR1 2HE, UK
ICRAR, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
SUPA School of Physics & Astronomy, University of St. Andrews, North Haugh, St. Andrews KY16 9SS, UK
Observatories of the Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, CA 91101, USA
Institut für Astro- und Teilchenphysik, Universität Innsbruck, Technikerstrasse 25, 6020 Innsbruck, Austria

Accepted ???? Received ????; in original form ????

ABSTRACT

We present a non-parametric cell-based method of selecting highly pure and largely complete samples of spiral galaxies using photometric and structural parameters as provided by standard photometric pipelines and simple shape fitting algorithms. The performance of the method is quantified for different parameter combinations, using purely human-based classifications as a benchmark. The discretization of the parameter space allows a selection markedly superior to that of commonly used proxies relying on a fixed curve or surface of separation. Moreover, we find structural parameters derived using passbands longwards of the $g$ band and linked to older stellar populations, especially the stellar mass surface density $\mu_*$ and the $r$ band effective radius $r_e$, to perform at least equally well as parameters more traditionally linked to the identification of spirals by means of their young stellar populations, e.g. UV/optical colours. In particular the distinct bimodality in the parameter $\mu_*$, consistent with expectations of different evolutionary paths for spirals and ellipticals, represents an often overlooked yet powerful parameter in differentiating between spiral and non-spiral/elliptical galaxies. We use the cell-based method for the optical parameter set including $r_e$ in combination with the Sérsic index $n$ and the $i$-band magnitude to investigate the intrinsic specific star-formation rate - stellar mass relation ($\psi_* - M_*$) for a morphologically defined volume limited sample of local universe spiral galaxies. The relation is found to be well described by $\psi_* \propto M_*^{-0.5}$ over the range of $10^{9.5} M_\odot \le M_* \le 10^{11} M_\odot$ with a mean interquartile range of 0.4 dex.

Key words: galaxies: fundamental parameters – galaxies: spiral – galaxies: structure – galaxies: photometry.

With the advent of large optical photometric ground and space-based surveys which are ongoing, commencing (The Sloan Digital Sky Survey (SDSS; York et al. 2000; Abazajian et al. 2009), The Galaxy And Mass Assembly Survey (GAMA; Driver et al. 2011), SKYMAPPER (Keller et al. 2007), The VST Atlas, The Kilo Degree Survey (KiDS; de Jong et al. 2012), The Dark Energy Survey (DES; The DES collaboration 2005)) or scheduled to commence in the next years (e.g., EUCLID; Laureijs et al. 2011), the number of extragalactic sources with reliable, uniform data is increasing dramatically, further opening the door to statistical studies of the population of galaxies, both at local and intermediate redshifts.

⋆ E-mail: [email protected]

To first order, the visible matter distributions of galaxies may be classified as being best described either as an exponential disk, i.e. a largely rotationally-supported system, or a spheroid, i.e. a largely pressure-supported system. This dichotomy forms the basis of the standard morphological categorization of galaxies into late-types/spirals and early-types/ellipticals, introduced by Hubble (1926) and in widespread use ever since. This basic morphological bimodality of the galaxy population appears to be mirrored in a range of physical properties, with late-type/spiral galaxies having blue UV/optical colours and showing evidence of star formation, on average, while early-type/elliptical galaxies appear red on average, and mostly only display a low level of star formation, if any at all (e.g. Strateva et al. 2001; Baldry et al. 2004; Balogh et al. 2004). However, a wide variety of exceptions to this rule exist.
For example, spiral galaxies may appear red due to the attenuation of their emission by dust in their disks, or a spiral may truly have very low star formation and red colours whilst maintaining its morphological identity, while, on the other hand, an elliptical galaxy may appear blue due to a localized recent burst of star formation.

It is assumed that different modes of assembly of the stellar populations of these galaxy categories are responsible for the distinction. This, in turn, necessitates the ability to reliably identify and distinguish between the types of galaxies when investigating the physical processes determining galaxy formation and evolution on the basis of large statistical samples of galaxies. Furthermore, it is clear that in any investigation of galaxy properties for a given morphological class, the classification itself should not introduce a bias into the property being investigated. For example, a pure sample of spiral galaxies used to investigate star-formation as a function of galaxy environment must include the population of red, passively star-forming spiral galaxies.

Visual classifications of galaxy morphology by professional astronomers therefore remain the method of choice and the benchmark for robustly identifying the morphology of a galaxy. However, such classifications may suffer from biases arising from the individual performing the classification, and the uncertainty/robustness of the classification is difficult to quantify. Furthermore, in the case of marginally resolved data, even the ability of the human eye to identify morphological structure may be limited, so that the decreasing linear resolution as a function of redshift may introduce systematic biases. In such cases, quantitative photometric measures of the light-profile may be at least as reliable as human classifications.
The overriding fact which immediately stymies the visual classification by professionals of all sources in modern imaging surveys such as SDSS, however, is the size of the galaxy samples provided by the surveys, and accordingly the time required for classification. Thus, one is forced to develop alternative schemes for obtaining morphological classifications of large samples of galaxies.

Recently, in an attempt to circumvent the limitations in sample size, reduce the possibility of bias, and provide an objective measure of robustness, Lintott et al. (2008) have enlisted the help of 'citizen scientists' in visually classifying a large fraction of SDSS DR7 galaxies in the GALAXY ZOO project (Lintott et al. 2008, 2011), releasing a catalogue of probability-weighted visual classifications into spirals and ellipticals. Although demonstrably feasible, such an approach is nevertheless very time consuming, especially on large data-sets.

The often adopted alternative is to attempt an automatic classification of galaxies based on some proxy for a galaxy's morphology. These automatic classification schemes can be roughly divided into three categories: i) those relying on a detailed analysis of the full imaging products, ii) those using a wide variety of photometric and spectroscopic proxies, in combination with a sophisticated algorithmic decision process, and iii) those using one or two simple, usually photometric, parameters and a fixed or simply parameterized separator. Of course, hybrids between these categories also exist.

Examples of the first category include the Concentration, Asymmetry, and Clumpiness (CAS, Conselice 2003) parameters, derived directly from the data reduction and model fitting of the imaging data, as well as the Gini coefficient (Gini 1912; Abraham, van den Bergh & Nair 2003; Lotz, Primack, & Madau 2004) and the $M_{20}$ coefficient (Lotz, Primack, & Madau 2004). Forming a hybrid between this and the second category, Scarlata et al. (2007) have introduced the Zurich Estimator of Structural Types (ZEST) based on a principal component analysis of these and other model-independent quantities, which has been applied to various data sets. Examples of the second category are given by classification schemes based on neural networks (e.g. Banerji et al. 2010) and making use of support vector machines (Huertas-Company et al. 2011). Finally, the third category, which finds widespread use, includes, for example, the concentration index (Strateva et al. 2001; Stoughton et al. 2002; Kauffmann et al. 2003), the location in colour-magnitude space (Baldry et al. 2004), the Sérsic index (Blanton et al. 2003; Bell et al. 2004; Jogee et al. 2004; Ravindranath et al. 2004; Barden et al. 2005), the location in the $NUV - r$ (respectively $u - r$) vs. $\log(n)$ plane (Kelvin et al. 2012; Driver et al. 2012), the location in the space defined by the SDSS $f_{\mathrm{deV}}$ parameter (i.e., the fraction of a galaxy's flux which is fit by the de Vaucouleurs profile (de Vaucouleurs 1948) in the best fit linear combination of a de Vaucouleurs and an exponential profile) and the axis ratio of the best fit exponential profile, $q_{\mathrm{exp}}$ (Tempel et al. 2011), and, in the case of high-z galaxies, the location in the $(U - V)$ - $(V - J)$ restframe colour-colour plane (Patel et al. 2012).

Overall, the advantages and disadvantages of the automatic schemes can also be categorized in a similar manner. Schemes in category i) ideally require well resolved imaging, which may be difficult to obtain for faint galaxies in wide field imaging surveys, even in the local universe. Furthermore they require detailed imaging products, often including intermediate data reduction products which are not archived, making an independent morphological classification very time consuming and/or computationally expensive, especially for large data sets. Schemes in category ii), on the other hand, require the implementation of a complex analysis algorithm in addition to the existence of a training set of objects with known morphologies, and may require assumptions about the nature of the statistical distribution of the parameters considered. Finally, for the third category, the simple parameterization must limit either the degree to which the selection recovers all members of a given morphological category, or the level at which the classification is robust against contamination, even for proxies which make use of structural information.

Furthermore, it should be noted that the majority of the methods considered make use of parameters linked directly to ongoing star-formation, and as such may introduce a bias into the star-formation properties of a selected galaxy sample.
For example, in category i), the clumpiness parameter in the CAS scheme traces localized current star formation in spirals, while in category ii) both the methods of Banerji et al. (2010) and Huertas-Company et al. (2011) make use of galaxy colours, and Banerji et al. (2010) use the texture of the imaging as well. Finally, in category iii) a range of simple proxies make use of the colour bimodality, linked to star-formation, of the galaxy population.

In the following we present a non-parametric method for selecting spirals based on the combinations of two and three photometric and simple structural parameters. The method is based on a discretization of the parameter space spanned by the parameter combination, performed using an adaptive grid which increases the resolution in regions of high galaxy parameter space density. The division of the discretized parameter space into a spiral and a non-spiral subvolume is calibrated using the morphological classifications of GALAXY ZOO Data Release 1 (DR1; Lintott et al. 2011). We quantify the performance of each parameter combination in terms of completeness and purity, identifying those with the best performance, also investigating parameter combinations which make no use of properties directly linked to ongoing star-formation. This approach can be considered formally analogous to the classifications of stars in discrete spectral classes as discussed in the review of Morgan & Keenan (1973).

We describe the data used in Sect. 2 and the method in Sect. 3. We then investigate the performance of the parameter combinations in Sect. 4 and compare the performance of our selection with other methods in Sect. 5. We discuss our results and the applicability of the method in Sect. 6, and apply the selection method to obtain a reliable sample of spirals as a basis for investigating the intrinsic scatter in the stellar mass - specific star-formation rate relation of this class of galaxies in Sect. 7. Finally, we close by summarizing our results in Sect. 8.
Throughout the paper we assume an $\Omega_M = 0.3$, $\Omega_\lambda = 0.7$, $H_0 = 70\,\mathrm{km\,s^{-1}\,Mpc^{-1}}$ cosmology.

Within this work we aim to investigate the efficacy and performance as proxies of various combinations of UV/optical photometric parameters for the morphological selection of spiral galaxies. To facilitate this comparison and broaden the range of possible proxies we have endeavored to create an unbiased sample of galaxies with as much available data as possible. We have selected all spectroscopic objects with SpecClass = 2 (Galaxies) from the seventh data release (DR7) of SDSS (Abazajian et al. 2009) which lie within the GALEX MIS depth (1500 s; Martin et al. 2005; Morrissey et al. 2007) footprint. We have matched this sample to the catalogue of the MPA/JHU analysis of SDSS DR7 spectra (providing emission line fluxes) and to the catalogue of single Sérsic fits recently published by Simard et al. (2011) using the SDSS unique identifiers, and to the preliminary NUV GALEX MIS depth unique NUV source galaxy catalogue GCAT MSC (Seibert et al., 2013 in prep.) using a 4 arcsec matching radius. (We note that the GCAT MSC includes a cut on S/N > […].) Given the uncertainties involved with flux redistribution (e.g., Robotham & Driver 2011), we have chosen to treat only one-to-one matches between SDSS and GALEX as possessing reliable UV data.

Where multiple spectra are available for a single photometric object, we have used the spectrum corresponding to the MPA/JHU entry. Where multiple spectra from the MPA/JHU reductions are available, we have chosen the spectrum with the smallest redshift error. In order to obtain a reliable benchmark morphological classification, we have matched the sample to the GALAXY ZOO data release 1 (DR1) (Lintott et al. 2008, 2011) catalogue of visual, redshift debiased morphological classifications (Bamford et al. 2009; Lintott et al. 2011) using the photometric SDSS ObjId, limiting ourselves to local universe sources (redshift $z \lesssim$ […]). […] (referred to as the OPTICALsample), with a subsample of 114047 NUV detected, uniquely matched sources (referred to as the NUVsample). Finally we have cross-matched these samples to the catalogue of ∼14k bright SDSS DR4 (Adelman-McCarthy et al. 2006) galaxies with detailed morphological classifications of Nair & Abraham (2010). This results in a subsample of 6220 sources with two independent morphological classifications (which we refer to as the NAIRsample). 4470 sources in the NAIRsample have NUV detections, and we refer to this subsample as the NUVNAIRsample.

We have retrieved Petrosian magnitudes, the foreground extinction, the $f_{\mathrm{deV}}$ and $q_{\mathrm{exp}}$ parameters from the SDSS photometric pipeline, and the Petrosian 50th ($R_{50}$) and 90th ($R_{90}$) percentile radii in the $u$, $g$, $r$, and $i$ pass-bands from the SDSS database using CasJobs. To obtain total (Sérsic) magnitudes we use the algorithms for converting SDSS Petrosian magnitudes to total Sérsic magnitudes derived by Graham et al. (2005). The obtained magnitudes have been corrected for foreground extinction using the extinction values supplied by SDSS (derived from the Schlegel, Finkbeiner, & Davis (1998) dust maps). K-corrections to $z = 0$ have been performed using kcorrect_v4.2 (Blanton & Roweis 2007).

GALEX sources with NUV artifact flag indicating window or dichroic reflections have been removed from the sample. The FUV and NUV magnitudes of the matched GALEX sources have been corrected for foreground extinction using the Schlegel, Finkbeiner, & Davis (1998) dust maps and $A_{FUV} = 8.24\,E(B - V)$ and $A_{NUV} = 8.2\,E(B - V)$ following Wyder et al. (2007).

Photometric stellar mass estimates have been calculated from the extinction and k-corrected magnitudes using the $g - i$ colour and the $i$-band absolute magnitude $M_i$ as

$$\log(M_*) = -0.68 + 0.7 \cdot (g - i) - 0.4\,M_i + 0.4 \cdot 4.58, \qquad (1)$$

where the factor 4.58 is identified as the solar $i$-band magnitude, following the prescription provided by Taylor et al. (2011).

We make use of the emission line fluxes from the Hα, Hβ, [NII]6584, and [OIII]5007 emission lines, and of the underlying continuum flux for the Hα emission line. Using these data we calculate the Hα equivalent width and the Balmer Decrement. We use the Hα equivalent width as an independent observable in the investigation of possible biases in the morphological proxies for spiral galaxies, and the Balmer Decrement in the correction of observed UV photometry for the effects of attenuation due to dust using the prescription of Calzetti et al. (2000) (cf. section 7). The ratios of Hα to [NII]6584 and Hβ to [OIII]5007 are used to identify galaxies hosting an AGN following the prescription of Kewley et al. (2006). The emission line data are taken from the MPA/JHU analysis of the SDSS DR7 spectra (performed by Stephane Charlot, Guinevere Kauffmann, Simon White, Tim Heckman, Christy Tremonti, and Jarle Brinchmann). We calculate the Hα EQWs as the ratio of emission line to continuum flux. As the listed uncertainties are formal we multiply the uncertainties on the emission line fluxes by the factors listed on the website, in particular by 2.473 for Hα, 2.039 for [NII]6584, 1.882 for Hβ, and 1.566 for [OIII]5007. These factors have been determined by the MPA/JHU group using comparisons of duplicate spectra of objects within the sample. For sources with S/N < […]
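As a rough illustration of the derived quantities described above, the sketch below evaluates the stellar mass estimate of Eq. (1) and an Hα equivalent width with the rescaled line-flux uncertainty. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the function names are hypothetical, the coefficients of Eq. (1) are taken in the Taylor et al. (2011) form, and the uncertainty on the continuum flux is simply neglected.

```python
# Solar i-band absolute magnitude quoted in the text for Eq. (1).
M_SUN_I = 4.58

# Scaling factors quoted in the text for the formal MPA/JHU line-flux errors.
LINE_ERROR_SCALE = {
    "Halpha": 2.473,
    "NII6584": 2.039,
    "Hbeta": 1.882,
    "OIII5007": 1.566,
}


def log_stellar_mass(g_minus_i, abs_mag_i):
    """log10(M*/Msun) from the g-i colour and the i-band absolute magnitude."""
    return -0.68 + 0.7 * g_minus_i - 0.4 * abs_mag_i + 0.4 * M_SUN_I


def scaled_flux_error(line, formal_error):
    """Multiply a formal line-flux error by the factor quoted for that line."""
    return LINE_ERROR_SCALE[line] * formal_error


def halpha_equivalent_width(line_flux, formal_error, continuum_flux):
    """Equivalent width as the ratio of line flux to continuum flux.

    Assumption of this sketch: only the rescaled line-flux error is
    propagated; the continuum is treated as exact.
    """
    ew = line_flux / continuum_flux
    ew_err = scaled_flux_error("Halpha", formal_error) / continuum_flux
    return ew, ew_err
```

For example, a galaxy with $g - i = 1.0$ and $M_i = -21$ gives `log_stellar_mass(1.0, -21.0)` of about 10.25, i.e. a stellar mass of roughly $1.8 \times 10^{10} M_\odot$.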

In constructing the parameter combinations for use as proxies, we have made use of the structural information supplied by the simultaneous fits in the $g$- and $r$-band of single Sérsic profiles to SDSS photometry made available by Simard et al. (2011), performed using GIM2D. In particular, we have used the Sérsic index $n$, the single Sérsic effective radius $r_e$ (half-light semi-major axis) in the $r$-band, and the ellipticity $e$. Simard et al. (2011) find that multiple component fits are not justified for most SDSS sources given the resolution of the imaging, and similar issues will afflict other surveys as well. Therefore we have chosen to use the largely robust single Sérsic profile fits in this work. We note, however, that Bernardi et al. (2012) have recently argued that for the brightest sources two component fits are preferable over single Sérsic fits and that for these sources the sizes derived by Simard et al. (2011) are systematically too small. This will not affect the analysis presented here, as these sources form a minority of the population considered and the effect will be accounted for in the calibration of the proxies.

The GALAXY ZOO DR1 (Lintott et al. 2008; Bamford et al. 2009; Lintott et al. 2011) represents the largest and faintest sample of galaxies with morphological classifications based on visual inspection. We have employed these morphological classifications, specifically those of the sources with redshift debiased classifications as provided by Bamford et al. (2009), as a benchmark morphological classification. Such a debiased estimate is only possible for sources with spectroscopic redshifts. Rather than a binary classification, GALAXY ZOO DR1 provides a probability for the source being an elliptical ($P_{E,DB}$) or a spiral ($P_{CS,DB}$; CS denotes the combined spiral class, i.e. summed over the sub-classes available in GALAXY ZOO DR1, i.e. clockwise spiral, anti-clockwise spiral, spiral edge-on/other), based on the outcome of all classifications of the object. It is then up to the user to decide where to place the threshold for assuming a classification is reliable. After eyeballing a selection of galaxies we have chosen to treat a debiased probability of 0.7 or greater as being a reliable classification in the context of this work.
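The thresholding of the debiased GALAXY ZOO probabilities can be sketched as follows. This is an illustrative sketch only: the function name `gz_label` is hypothetical, and the handling of a source for which both probabilities reach the threshold (possible in principle, since the debiased probabilities need not sum to unity) is an assumption made here, not something stated in the text.

```python
def gz_label(p_cs_db, p_e_db, threshold=0.7):
    """Label a source from its debiased GALAXY ZOO probabilities.

    p_cs_db: debiased probability of the combined spiral class.
    p_e_db:  debiased probability of the elliptical class.
    A probability of `threshold` or greater counts as a reliable
    classification; everything else is left 'undefined'.
    """
    is_spiral = p_cs_db >= threshold
    is_elliptical = p_e_db >= threshold
    if is_spiral and not is_elliptical:
        return "spiral"
    if is_elliptical and not is_spiral:
        return "elliptical"
    # Neither class is reliable, or (assumption of this sketch) both are.
    return "undefined"
```

Raising `threshold` trades sample size for reliability of the benchmark labels, which is the choice the text leaves to the user.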
Such a choice results in three populations: i) spirals, ii) ellipticals, and iii) undefined. We will show that this choice leads to highly pure samples of spirals. (It should be noted that due to the debiasing procedure, $P_{CS,DB} + P_{E,DB}$ for a given galaxy is not necessarily equal to unity.)

Nair & Abraham (2010) have provided detailed visual morphological classifications of 14,034 galaxies in the SDSS DR4 (Adelman-McCarthy et al. 2006), with $0.\ldots < z < \ldots$ and $g' \lesssim \ldots$ […]

In order to obtain reliable morphological selections of galaxies based upon photometric parameters, the parameter chosen must ideally display a distinct separation into two populations corresponding to the different morphological categories. Prominent examples of such one parameter separation criteria are the concentration index $C_{\mathrm{idx}} = R_{90}/R_{50}$ (e.g., Strateva et al. 2001) and the Sérsic index $n$ (e.g., Blanton et al. 2003).

Other schemes make use of combinations of two or more parameters such as the $u - r$ colour and $r$-band absolute magnitude (Baldry et al. 2004), or the $q_{\mathrm{exp}}$ and $f_{\mathrm{deV}}$ parameters, possibly in combination with $u - r$ colour information (Tempel et al. 2011). Recently, Kelvin et al. (2012) and Driver et al. (2012) have suggested the use of a UV/optical colour ($u - r$, respectively $NUV - r$) and the Sérsic index $n$ in separating spiral and elliptical galaxies, and a variant of the $NUV - r$, $n$ selection has been used by Grootes et al. (2013) to select spiral galaxies for the purpose of a radiation transfer analysis and has proven to be efficient.

Common to all these approaches is the difficulty of selecting a curve/surface of separation between the two populations, which includes as large a fraction of the desired category as possible, whilst simultaneously keeping the level of contamination as low as feasible.
In addition, this choice may be influenced by further requirements upon the recovery fraction and purity of the sample, which can be envisioned to vary with application.

The functional form of the curve or hypersurface providing the optimal separation of the two populations is not known a priori, and an appropriate choice can be non-trivial, even if the population of spiral galaxies is easily separable from the non-spiral population by eye. Furthermore, the sharp division between the two is generally not exhibited by the galaxy populations, which show a more gradual transition. Accordingly, sharp transitions in combination with simple parameterizations where the functional form may be ill-suited can give rise to large contaminations.

Rather than making assumptions about the functional form of the separation, we discretize the space spanned by the parameters used into individual cells. For each cell we can, using the GALAXY ZOO classifications, measure the fraction of the galaxies residing therein which are spirals (i.e. $P_{CS,DB} > 0.7$), $F_{\mathrm{sp}}$, as

$$F_{\mathrm{sp}} = \frac{N_{\mathrm{GZ,sp}}}{N_{\mathrm{cell}}}, \qquad (2)$$

where $N_{\mathrm{GZ,sp}}$ is the number of GALAXY ZOO spirals (i.e., $P_{CS,DB} > 0.7$) in the cell and $N_{\mathrm{cell}}$ is the total number of galaxies in the cell. The associated relative error $\Delta F_{\mathrm{sp,rel}}$ is calculated using Poisson statistics and error propagation. We then define those cells with $F_{\mathrm{sp}} > \mathcal{F}_{\mathrm{sp}}$ (where $\mathcal{F}_{\mathrm{sp}}$ is the threshold spiral fraction) and $\Delta F_{\mathrm{sp,rel}} \lesssim$ […] to be spiral cells, i.e., we treat every object in the cell as a spiral galaxy, and thus obtain a decomposition of the parameter space into a spiral and a non-spiral subvolume. The choice of the limit on $\Delta F_{\mathrm{sp,rel}}$ has little effect in terms of the total population, as large values of $\Delta F_{\mathrm{sp,rel}}$ correspond to scarcely populated cells. The population is obviously more sensitive to the choice of the limiting fraction $\mathcal{F}_{\mathrm{sp}}$, with lower values leading to larger recovery fractions but lower purity. Here we have experimented with different values of $\mathcal{F}_{\mathrm{sp}}$ and find $\mathcal{F}_{\mathrm{sp}} = 0.$… […] $\mathcal{F}_{\mathrm{sp}} = 0.5$; we note, however, that if a larger recovery fraction or an even greater purity is desired this choice can be altered.

In this work we focus on combinations of two and three parameters. While the approach is theoretically applicable to higher dimensional parameter spaces, the requirements on resolution and cell population impose an effective limit of three dimensions for the calibration sample available. We provide a decomposition of the parameter space for three combinations of three parameters in appendix A, which also provides the values of $F_{\mathrm{sp}}$ and $\Delta F_{\mathrm{sp,rel}}$ for all cells. We emphasize that any reader wanting to use the discretizations provided must check for systematic differences between his/her data/parameters and those used in this work, and refer the reader to Sect. 6.3 for a further discussion of the application of the results presented here to other surveys.

In order to provide a robust and reliable decomposition of the parameter space, the calibration sample must adequately sample the parameter space and the galaxy population, i.e. it must contain sufficient galaxies to achieve the required level of resolution and to sufficiently populate the individual cells, as well as be representative of the galaxy population as a whole. On the other hand, as the calibration sample must be visually classified, it is desirable to understand how the performance of the method relies on the size of the calibration sample. In particular, it is of interest how the purity, completeness, and contamination by ellipticals of the sample depend on the size of the calibration sample.

Figure 1. Cell grid obtained for the parameter combination ($\log(n)$, $\log(r_e)$, $\log(\mu_*)$) using a calibration sample of 10000 galaxies. The 10k galaxies of the calibration sample are overplotted with colour-coding according to the probability of being a spiral (blue: spiral, red: non-spiral).

We define the purity fraction $P_{\mathrm{pure}}$ as

$$P_{\mathrm{pure}} = \frac{N_{\mathrm{sel,SP}}}{N_{\mathrm{sel}}}, \qquad (3)$$

where $N_{\mathrm{sel}}$ is the number of galaxies selected as spirals by the cell-based method, and $N_{\mathrm{sel,SP}}$ is the number of those galaxies which are visually classified as being spiral galaxies. Analogously the contamination fraction $P_{\mathrm{cont}}$ is defined as the fraction of the selected galaxies which are visually classified as ellipticals, i.e.

$$P_{\mathrm{cont}} = \frac{N_{\mathrm{sel,E}}}{N_{\mathrm{sel}}}. \qquad (4)$$

The completeness fraction of the sample $P_{\mathrm{comp}}$ is defined as

$$P_{\mathrm{comp}} = \frac{N_{\mathrm{sel,SP}}}{N_{\mathrm{SP}}}, \qquad (5)$$

where $N_{\mathrm{SP}}$ is the total number of visually classified spirals in the sample being classified by the cell-based method.

Fig. 2 shows the fractional purity, completeness, and contamination by elliptical galaxies for samples selected using a combination of the parameters Sérsic index ($\log(n)$), effective radius in the $r$-band ($\log(r_e)$), and stellar mass surface density ($\log(\mu_*)$), as a function of the size of the calibration sample (this parameter combination is found to perform well in selecting simultaneously pure and complete samples of spirals; for further details on the parameters, the parameter combinations, and their performance we refer the reader to Sect. 4). The values at each sample size correspond to the mean obtained from 5 random realizations of a calibration sample of that size, with the error bars corresponding to the 1-σ standard deviation.
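The calibration of spiral cells via the cell spiral fraction (Eq. 2) and the evaluation of a selection via purity, contamination, and completeness (Eqs. 3 to 5) can be sketched as below. This is a simplified sketch, not the method as implemented in the paper: it uses a fixed regular grid in place of the adaptive grid, all function and parameter names are hypothetical, the relative error assumes independent Poisson errors on both counts, and the default `max_rel_err` is a placeholder because the limit adopted in the paper is not recoverable from the text.

```python
import math
from collections import defaultdict


def cell_index(point, grid):
    """Map a parameter tuple to bin indices; grid holds (lower_edge, width) per axis."""
    return tuple(math.floor((x - lo) / width) for x, (lo, width) in zip(point, grid))


def calibrate_cells(points, is_spiral, grid, f_threshold=0.5, max_rel_err=1.0):
    """Return the set of cells treated as 'spiral cells'."""
    n_cell = defaultdict(int)
    n_sp = defaultdict(int)
    for point, spiral in zip(points, is_spiral):
        cell = cell_index(point, grid)
        n_cell[cell] += 1
        if spiral:
            n_sp[cell] += 1
    spiral_cells = set()
    for cell, n in n_cell.items():
        k = n_sp[cell]
        if k == 0:
            continue  # no spirals: F_sp = 0, never a spiral cell
        f_sp = k / n  # Eq. (2)
        rel_err = math.sqrt(1.0 / k + 1.0 / n)  # assumed Poisson propagation
        if f_sp > f_threshold and rel_err <= max_rel_err:
            spiral_cells.add(cell)
    return spiral_cells


def select_spirals(points, grid, spiral_cells):
    """Every object falling in a spiral cell is treated as a spiral."""
    return [cell_index(p, grid) in spiral_cells for p in points]


def selection_metrics(selected, is_spiral, is_elliptical):
    """Purity, contamination and completeness of a selection (Eqs. 3-5)."""
    n_sel = sum(selected)
    n_sel_sp = sum(s and v for s, v in zip(selected, is_spiral))
    n_sel_e = sum(s and v for s, v in zip(selected, is_elliptical))
    n_sp = sum(is_spiral)
    nan = float("nan")
    return {
        "purity": n_sel_sp / n_sel if n_sel else nan,
        "contamination": n_sel_e / n_sel if n_sel else nan,
        "completeness": n_sel_sp / n_sp if n_sp else nan,
    }
```

In use, `points` would hold tuples such as ($\log(n)$, $\log(r_e)$, $\log(\mu_*)$), with `is_spiral` and `is_elliptical` derived from the visual classifications. Lowering `f_threshold` admits more cells, which tends to raise completeness at the cost of purity, as described in the text.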
In each case, the calibration sampleis drawn from the whole of the GALAXY ZOO sample.The ﬁgure shows the performance in classifying three testsamples: i) the entire optical galaxy sample using the visualclassiﬁcations of spirals provided by GALAXY ZOO (solid),ii) the optical galaxy sample with independent morpho-logical classiﬁcations provided by Nair & Abraham (2010)making use of these to deﬁne which galaxies really are spi-rals (dash-dotted), and iii) the optical galaxy sample withmorphological classiﬁcations provided by Nair & Abraham(2010), but making use of the visual classiﬁcations providedby GALAXY ZOO (dashed). When calculating the contam-ination by ellipticals for GALAXY ZOO-based deﬁnitionswe assume all sources with P E , DB > . c (cid:13)000

50k galaxies no longer lead to a largeimprovement of the performance. The improvement in per-formance with increasing size of the calibration sample isparticularly striking for the optical sample matched to thebright galaxy sample of Nair & Abraham (2010). The in-creasing sample size enables a higher resolution, thus in-creasing purity and decreasing contamination by allowingregions of parameter space to be excluded, while simultane-ously allowing the full extent of the parameter space occu-pied by spiral galaxies to be suﬃciently sampled, increasingcompleteness by including other sections of the parameterspace.Even for the smallest sample sizes the performance of themethod does not appear to depend strongly on the speciﬁcrealization of the calibration sample, as shown by the er-rorbars. However, there is nevertheless a notable decrease inthe 1- σ uncertainty around the mean with increasing samplesize from ∼ − .

5% to . . z . In the context of this work we focus on a suite of directly observed and derived parameters for the purpose of identifying spiral galaxies. It consists of a UV/optical colour (u − r, or NUV − r for the NUV matched sample), the Sérsic index n, the effective radius r_e (half-light semi-major axis), the i-band absolute magnitude M_i, the ellipticity e, the stellar mass M∗, and the stellar mass surface density μ∗ calculated as

$$\mu_* = \frac{M_*}{2 \pi r_e^2}. \qquad (6)$$

The usefulness of the u − r colour and of the Sérsic index in selecting spirals is well documented (e.g., Baldry et al. 2004 and Barden et al. 2005, respectively). Similarly, as spiral galaxies are often assumed to be largely star-forming, the NUV − r colour may be assumed to be of use. We have chosen to include the i-band magnitude M_i (a directly observable tracer of stellar mass) and the derived parameter stellar mass M∗, as early-type galaxies are, on average, more massive than late-types. Furthermore, at a given stellar mass, it appears likely that a rotationally-supported spiral will be more radially extended than a pressure-supported early-type galaxy, hence we make use of the effective radius. This also implies that the stellar mass surface density of sources may be useful in separating spirals from non-spirals. While for a spiral the value of μ∗ derived using Eq. 6 is readily interpretable in a physical sense, the value derived in this manner for a true ellipsoid will tend to underestimate the actual surface density of the object, as the approximation of the surface area using r_e as in Eq. 6 will tend to overestimate the projected surface area. Hence, any observed separation of the spiral and non-spiral populations in this parameter will represent a lower limit to the actual separation. Finally we have included the observed ellipticity e, as the objects on the sky which appear most elliptical are likely to be spirals observed at a more edge-on orientation. We note, however, that the use of ellipticity as a parameter will bias any selection of spirals towards sources seen edge-on.

Footnote: As a spiral galaxy can be assumed to be circular to first order, the effective radius can be used to derive a reasonable estimate of the surface area and consequently of the stellar mass surface density.

Figure 2. Fractional purity (top), fractional completeness (middle), and fractional contamination by ellipticals (bottom) for a selection of spirals obtained using the Sérsic index (i.e. log(n)), the effective radius in the r-band (i.e. log(r_e)), and the stellar mass surface density (i.e. log(μ∗)), as a function of the size N of the calibration sample. The solid line corresponds to the results obtained when classifying the optical sample (i.e. without the requirement of an NUV detection), while the dash-dotted line corresponds to the results obtained when classifying the optical sample with morphological classifications by Nair & Abraham (2010), defining spirals using these detailed classifications, and the dashed line corresponds to the optical sample matched to the Nair & Abraham (2010) catalogue but using the GALAXY ZOO visual classifications. The data points correspond to the mean of 5 random realizations of the calibration sample drawn from the optical galaxy sample, with the error bars corresponding to the 1-σ standard deviation about the mean.

Our goal is to identify (multiple) optimal sets of parameters which can be used as morphological proxies in the selection of highly pure and largely complete samples of spiral galaxies. As NUV data is only available for a subset of the total sample we perform the investigations in parallel both for the OPTICALsample and for the NUVsample.

For the OPTICALsample we perform the discretization of the parameter space using a sample of 50k galaxies randomly drawn from the OPTICALsample (the same sample is used for all parameter combinations) and classify the performance using the OPTICALsample and the NAIRsample (i.e. the subsample with morphological classifications from Nair & Abraham (2010)). For the NUV preselected sample (the NUVsample) we perform the discretizations using a sample of 30k galaxies with NUV detections (randomly sampled from the sample of 50k galaxies used for the OPTICALsample), and in this case classify the performance using the entire NUVsample and the NUVNAIRsample (i.e., the subsample of galaxies with morphological classifications from Nair & Abraham (2010) and NUV detections).

Fig. 3 shows the distributions of the parameters for the entire

OPTICALsample (dashed), as well as for the randomly selected subset of 50k galaxies in the calibration subsample (solid). As expected, the distributions for the two samples are so similar as to be indistinguishable in Fig. 3, with the differences being smaller than the line width. The figure shows the distributions for the galaxies in the samples classified as spirals (P_CS,DB > 0.7, blue), ellipticals (P_E,DB > 0.7, red), non-spirals ( P_CS,DB < . P_CS,DB < . P_E,DB < . μ∗ notably also displays a distinct separation of the two populations, and even shows a separation between the spiral and undefined populations. The parameters stellar mass, effective radius, and i-band absolute magnitude show the expected trends in the populations as previously discussed. The distribution of ellipticities, however, is noteworthy. As expected, the spiral sample dominates the largest values of ellipticity and displays a separation from the undefined population at high ellipticity. However, at intermediate and lower values of e there is considerable overlap with the other populations. Furthermore, the population of spirals as defined by GALAXY ZOO appears biased towards high values of ellipticity, i.e. galaxies seen edge-on. As a consequence, a discretization of parameter space using this calibration sample and e in the parameter combination will also be biased towards high values of ellipticity (even more so than due to the intrinsic overlap of the spiral and non-spiral sample at low and intermediate values of e). However, the bias will not affect the discretization of the parameter space for combinations of parameters which are, to first order, independent of the orientations of the galaxies with respect to the observer (e.g. log(r_e), log(M∗), log(μ∗), M_i, log(n)). In such cases, the distribution of ellipticities of spiral galaxies in each of the cells may be expected to be similar to that of the entire calibration sample, hence the bias towards edge-on systems will have no effect.

Footnote: The similarity of the two distributions is quantitatively supported by the fact that Kolmogorov-Smirnov tests (and two-sample χ²-tests for similarity for the discrete distributions in e, n, and r_e) support the null hypothesis that the samples have the same distribution ( p > . ).

The bias of the GALAXY ZOO spiral sample must also be taken into account when quantifying the performance of different combinations of parameters.
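The expectation that an orientation-unbiased sample of discs shows no preference for high ellipticity can be illustrated with a small Monte Carlo sketch. It rests on assumptions that are not taken from the text: galaxies are idealised as infinitely thin circular discs, orientations are isotropic, and ellipticity is defined as e = 1 − b/a. The function names are hypothetical.

```python
import random


def simulate_ellipticities(n_galaxies, seed=0):
    """Apparent ellipticities e = 1 - b/a of randomly oriented thin discs (assumed model)."""
    rng = random.Random(seed)
    ellipticities = []
    for _ in range(n_galaxies):
        # Isotropic orientations: cos(inclination) is uniform on [0, 1).
        cos_i = rng.random()
        # A thin circular disc projects to an ellipse with axis ratio b/a = cos(i).
        ellipticities.append(1.0 - cos_i)
    return ellipticities


def histogram_fractions(values, n_bins=5):
    """Fraction of values falling in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for value in values:
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]


fractions = histogram_fractions(simulate_ellipticities(100_000))
print(fractions)  # each bin holds close to one fifth of the sample
```

Under these assumptions every ellipticity bin is equally populated, so an excess of spirals at high e in a visually classified sample points to a classification bias rather than to geometry. Real discs have finite thickness, which this sketch ignores.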
When using samples relying on the GALAXY ZOO classifications as test samples, the bias in e can give rise to spuriously complete samples when e is used as a selection parameter. In spite of this bias, we nevertheless choose to use the GALAXY ZOO sample for calibration and testing purposes, as it represents the only large and faint sample of visually classified galaxies with a wide range of homogeneous ancillary data available. We check for effects arising from the ellipticity bias using the bright subsample of galaxies with independent visual classifications by Nair & Abraham (2010), which does not display an ellipticity bias.

Footnote: For an unbiased sample one would expect a flat distribution in ellipticity.

Footnote: A bias in ellipticity can potentially give rise to a slight bias towards redder UV/optical colours, as edge-on spirals appear redder on average. However, we have found no significant evidence of such a bias. Recent work by Pastrav et al. (2013a) has also found that fully resolved dust-rich galaxies seen edge-on may appear larger than when seen face-on; however, the strength of this effect remains to be quantified for marginally resolved sources.

Fig. 4 shows the same for the parameter distributions of the NUVsample and the randomly selected subset of 30k galaxies constituting the NUV calibration sample.

Comparing the parameter distributions between the OPTICALsample and the NUVsample shown in Figs. 3 & 4, the samples appear remarkably similar. Nevertheless, Kolmogorov-Smirnov and χ² tests indicate that, in spite of their similar appearance, the null hypothesis that the parameter distributions in these samples are the same has low probability ( p . OPTICALsample and the NUVsample ( p > . u − r and NUV − r colours ( p · − ), indicating that the NUV pre-selection mainly affects the undefined population and its size relative to the spiral and elliptical populations. Despite these differences, overall, the use of UV preselection has only a small effect on the parameter distributions in comparison with the large shift in the distributions between the morphological categories. This qualitative impression is confirmed for the optical properties of spirals and ellipticals, the null hypothesis being supported with p > 0.37. As might be expected, the null hypothesis is, however, rejected for the NUV − r and u − r colours ( p · − ). The NUV pre-selection also appears to affect the undefined population and its size relative to the spiral and elliptical distributions, even in the optical parameters, the null hypothesis being rejected for this class for all parameters.

Footnote: This statement is valid for the combination of UV and optical photometric depths in the dataset used in this work. We caution that for different datasets this may not necessarily be true.

Our goal in this work is to identify parameter combinations which provide a pure, but also largely complete sample of spiral galaxies. As such, an additional important figure of merit in quantifying the performance of the different parameter combinations is the bijective discrimination power P_bij, which we define as the product of P_pure and P_comp as defined in Eqs. 3 and 5, i.e.

$$P_{\rm bij} = P_{\rm pure} \cdot P_{\rm comp}. \qquad (7)$$

This provides a measure of the efficacy of the parameter combination at simultaneously selecting a pure and complete sample of spirals from the test samples. P_bij can take on values between 0 and 1, with 1 corresponding to a perfectly pure and complete sample. As a reference, a selected sample with P_pure = 0.75 and P_comp = 0 . P_bij = 0 . P_pure = 0.984 and P_comp = 0 . P_bij = 0 . P_CS,DB > . P_cont as defined in Eq. 4, where we define all sources with P_E,DB > .

In the following we investigate the performance of selections using parameters which can be applied to samples without the requirement of UV data, i.e. u − r colour, log(n), log(r_e), log(M∗), log(μ∗), M_i, and e. The figures of merit involving completeness, P_comp and P_bij, are given in relation to the OPTICALsample and the NAIRsample.

Tables 1 and 2 show the figures of merit achieved when testing using the OPTICALsample and the NAIRsample, respectively, for all 21 unique combinations of two parameters drawn from the suite applicable to optical samples.
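As a small worked illustration of Eq. 7 and of the combination counts quoted in the text, the following sketch enumerates the parameter combinations and evaluates P_bij for one entry of Table 1. The parameter labels are plain-text stand-ins chosen here and the function name is hypothetical; only the numerical values are taken from the tables.

```python
from itertools import combinations

# Plain-text labels for the seven parameters applicable to optical samples.
OPTICAL_PARAMETERS = ["u-r", "log(n)", "log(r_e)", "log(M*)", "log(mu*)", "M_i", "e"]


def bijective_power(p_pure, p_comp):
    """Eq. 7: product of fractional purity and fractional completeness."""
    if not (0.0 <= p_pure <= 1.0 and 0.0 <= p_comp <= 1.0):
        raise ValueError("purity and completeness must lie in [0, 1]")
    return p_pure * p_comp


pairs = list(combinations(OPTICAL_PARAMETERS, 2))
triples = list(combinations(OPTICAL_PARAMETERS, 3))

# Table 1 lists P_pure = 0.724 and P_comp = 0.731 for (log(n), log(r_e)).
p_bij = bijective_power(0.724, 0.731)
print(len(pairs), len(triples), round(p_bij, 3))  # 21 35 0.529
```

The rounded product reproduces the tabulated P_bij = 0.529 for this combination.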

Testing the performance of different parameter combinations using the OPTICALsample, we find that the parameters log(μ∗) and log(r_e) are efficient at selecting complete samples, with all samples with P_comp > . P_pure > 0.7. In concert with either log(μ∗) or log(r_e), the parameter log(n) also leads to pure and complete samples of spirals (in particular (log(n), log(r_e)) attains the highest value of P_bij = 0 . e in parameter combinations leads to selections which are highly pure on average ( P_pure ≳ . P_comp < . μ∗), e) with P_pure = 0 . P_comp = 0 . P_bij = 0 . P_bij overall. However, this may be influenced by the ellipticity bias in the test sample (see the previous discussion in Sect. 4).

Interestingly, use of the u − r colour does not of itself lead to very pure samples, as the purity of, e.g., the combinations (u − r, log(M∗)) and (u − r, M_i) is only ∼ 0.6, while similar combinations (e.g., (log(r_e), log(M∗))) attain much greater values. In addition, the completeness attained by using the u − r colour is strongly dependent upon the second parameter used. If the second parameter is more bimodal, e.g. log(μ∗), the combination provides good purity and completeness, while the completeness drops for parameters with less separation of the populations (e.g. M_i). Similarly, the Sérsic index is less efficient than expected, as the bijective discrimination power of the combinations of log(n) with log(M∗) and M_i (but also u − r) is low compared to that attained in combination with log(r_e) and log(μ∗).

Overall, the combination (log(n), log(r_e)) has the greatest bijective discrimination power ( P_bij = 0 . μ∗), e) with ( P_bij = 0 . r_e), log(M∗)), (log(r_e), log(μ∗)), and (log(n), log(μ∗)) all with P_bij ≈ 0.5. Amongst these combinations (log(n), log(r_e)) and (log(n), log(μ∗)) have the lowest values of contamination by ellipticals with P_cont . NAIRsample, using both the independent morphological classifications of Nair & Abraham (2010) and the GALAXY ZOO visual classifications.

Overall, the purity of the selections obtained when testing the parameter combinations using the NAIRsample with GALAXY ZOO visual classifications is greater than for the OPTICALsample with values of P_pure ∼ . − 0.9, indicating that some of the 'impurities' in the selections from the OPTICALsample are very likely unreliably classified spirals. On the other hand, the fractional completeness of the selections is of order 0 . − . OPTICALsample . Exceptions to this are the combinations including e, for which the fractional completeness is ∼ . e in the OPTICALsample which is not present in the

NAIRsample. As for the OPTICALsample, the parameter combination with the greatest bijective discrimination power is (log(n), log(r_e)). Unlike for the OPTICALsample, however, the combination with the second largest value of P_bij is (log(n), log(μ∗)), which also attains the lowest value of contamination by ellipticals, rather than (log(μ∗), e) (likely due to the removal of the ellipticity bias as previously discussed). As for the OPTICALsample, the 5 combinations with the highest values of P_bij ((log(n), log(r_e)), (log(n), log(μ∗)), (u − r, log(μ∗)), (log(r_e), log(M∗)), (log(μ∗), M_i)) all include either log(r_e) or log(μ∗). Furthermore, log(n) again leads to very pure and complete selections in combination with log(r_e) or log(μ∗). In addition, its efficiency in combination with other parameters is also increased (e.g., (log(n), M_i)).

Testing using the NAIRsample with the independent classifications of Nair & Abraham (2010) leads to very similar results. However, the fractional purity of the selections is even larger, further underscoring the conclusion that a large contribution to the 'impurity' of the selections is due to unreliably classified spirals. which also has amongst the lowest contamination by ellipticals. The combinations with the highest bijective discrimination power again include either log(r_e), log(μ∗), and/or log(n), supporting the previous findings.

Overall, the parameters log(μ∗), log(r_e), and log(n) appear to be most efficient at selecting pure and complete samples of spirals.

Figure 3. Distribution of the parameters in the entire OPTICALsample (dashed) and the calibration sample as defined in Sect. 3.2 (solid) for the population of spirals (blue), ellipticals (red), non-spirals (green), and undefined (orange). The distributions of the whole sample and the calibration subsample are nearly indistinguishable as differences are smaller than the line width.

Figure 4. Distribution of the parameters in the NUVsample (dashed) and the NUV calibration sample as defined in Sect. 3.2 (solid) for the population of spirals (blue), ellipticals (red), non-spirals (green), and undefined (orange). The distributions of the whole sample and the calibration sample are nearly indistinguishable.

Table 1. N_sel, P_pure, P_comp, P_bij, P_cont for combinations of two parameters applied to the OPTICALsample.

Parameter combination     N_sel   P_pure  P_comp  P_bij   P_cont
(u − r, log(n))           67436   0.617   0.655   0.404   0.060
(u − r, log(r_e))         57168   0.710   0.639   0.453   0.054
(u − r, log(M∗))          63194   0.580   0.577   0.334   0.084
(u − r, log(μ∗))          65254   0.690   0.709   0.489   0.054
(u − r, M_i)              61275   0.584   0.563   0.329   0.079
(u − r, e)                47567   0.719   0.538   0.387   0.042
(log(n), log(r_e))        64179   0.724   0.731   0.529   0.032
(log(n), log(M∗))         67304   0.623   0.660   0.412   0.055
(log(n), log(μ∗))         67026   0.688   0.726   0.499   0.027
(log(n), M_i)
(log(n), e)               55547   0.685   0.599   0.410   0.038
(log(r_e), log(M∗))       63985   0.711   0.716   0.509   0.048
(log(r_e), log(μ∗))       61678   0.721   0.700   0.504   0.048
(log(r_e), M_i)           61263   0.699   0.674   0.471   0.071
(log(r_e), e)             44938   0.760   0.538   0.409   0.051
(log(M∗), log(μ∗))        60231   0.724   0.686   0.496   0.040
(log(M∗), M_i)            45243   0.578   0.412   0.238   0.069
(log(M∗), e)              34862   0.737   0.405   0.298   0.062
(log(μ∗), M_i)            65086   0.697   0.714   0.497   0.049
(log(μ∗), e)              66627   0.710   0.744   0.528   0.035
(M_i, e)                  35006   0.730   0.402   0.293   0.072

While the performance of selections using only two parameters is already encouraging, it seems likely that the purity and completeness, and hence the bijective discrimination power, as well as the fractional contamination, can be improved by using more information in the selection, i.e. by using a third parameter.

Tables 3 and 4 show the figures of merit achieved when testing using the

OPTICALsample and the NAIRsample, respectively, for all 35 unique combinations of three parameters drawn from the suite applicable to optical samples.

Testing the performance of different combinations of three parameters using the OPTICALsample, we find that both the purity and completeness attained are greater, on average, than for combinations of two parameters, as shown in Table 3. In most cases, the use of additional information in the form of a third parameter leads to a simultaneous increase in purity and completeness. In some cases, however, the deprojection along the additional third axis can lead to the inclusion of more parameter space, causing an increase of completeness at the cost of a decrease in purity or, vice versa, to the exclusion of parameter space, increasing purity at the expense of completeness (e.g., (log(r_e), log(M∗)) with P_pure = 0.711 & P_comp = 0.716 and (log(r_e), log(M∗), M_i) with P_pure = 0.707 & P_comp = 0 . n), M_i) with P_pure = 0.615 & P_comp = 0.694 and (log(n), M_i, e) with P_pure = 0.708 & P_comp = 0 . e attain high values of purity (13/15 with P_pure > 0.7, and 6/15 with P_pure > . r_e) and log(μ∗)) also attain very high values of completeness ( ≳ . P_bij (of the 10 combinations with the highest values of P_bij, the first 6 include e). However, as for the combinations of two parameters, these high values of completeness are partially due to the ellipticity bias of the OPTICALsample. We will discuss the performance of these combinations on the basis of tests using the NAIRsample below. However, we note that all six combinations with the highest values of P_bij include log(r_e) and/or log(μ∗). The remaining four parameter combinations of the 10 with the highest values of P_bij are (in descending order) (log(n), log(r_e), M_i) with P_bij = 0 . n), log(r_e), log(μ∗)) with P_bij = 0 . n), log(M∗), log(μ∗)) with P_bij = 0 . n), log(r_e), log(M∗)) with P_bij = 0 . r_e) and/or log(μ∗) in addition to log(n), indicating the potential of these parameters to select pure and complete samples of spirals. In addition these four combinations exhibit the lowest contamination by ellipticals with P_cont ≲ 0.02. As for combinations of two parameters, however, log(n) is only efficient in combination with another efficient parameter. The same is true for the u − r colour. Finally, the parameters M_i and log(M∗) are efficient in combination with combinations of log(r_e), log(μ∗), and log(n).

Testing the performance of three-parameter combinations using the NAIRsample with GALAXY ZOO visual classifications (Table 4), we again find that the values of P_pure and P_comp are greater than for combinations of two parameters. Comparison of the values of purity with those obtained for the OPTICALsample also again indicates that a fraction of the 'impurity' arises from the unreliable classification of spirals.

Of the 10 combinations with the highest values of P_bij none include e, indicating that the high values attained for the OPTICALsample are, at least partially, due to the ellipticity bias. In descending order, the combinations with the greatest bijective discrimination power are (log(n), log(r_e), log(μ∗)), (log(n), log(M∗), log(μ∗)), (log(n), log(μ∗), M_i), (log(n), log(r_e), M_i), and (log(n), log(r_e), log(M∗)), supporting the results obtained using the OPTICALsample.

Testing using the NAIRsample with the independent classifications of Nair & Abraham (2010) again leads to very similar results. In terms of choice of the most effective parameters, the 5 parameter combinations with the greatest values of P_bij are the same as found when using the GALAXY ZOO visual classifications, although the combination with the overall greatest bijective discrimination power is (log(n), log(μ∗), M_i) rather than (log(n), log(r_e), log(μ∗)).
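The ranking described above can be checked directly against the tabulated values. The sketch below copies the P_bij entries of the five leading three-parameter combinations from Table 4 and sorts them for each set of classifications; the dictionary layout, labels and function name are illustrative choices and not part of the method.

```python
# P_bij for three-parameter selections tested on the NAIRsample (values from Table 4).
# Each entry maps a combination to
# (P_bij with GALAXY ZOO classifications, P_bij with Nair & Abraham classifications).
TABLE4_P_BIJ = {
    ("log(n)", "log(r_e)", "log(mu*)"): (0.629, 0.624),
    ("log(n)", "log(M*)", "log(mu*)"): (0.626, 0.621),
    ("log(n)", "log(mu*)", "M_i"): (0.625, 0.631),
    ("log(n)", "log(r_e)", "M_i"): (0.621, 0.610),
    ("log(n)", "log(r_e)", "log(M*)"): (0.613, 0.608),
}


def rank_by(column):
    """Combinations sorted by descending P_bij in the chosen column (0 or 1)."""
    return sorted(TABLE4_P_BIJ, key=lambda combo: TABLE4_P_BIJ[combo][column], reverse=True)


ranking_galaxy_zoo = rank_by(0)
ranking_nair_abraham = rank_by(1)

print(ranking_galaxy_zoo[0])    # ('log(n)', 'log(r_e)', 'log(mu*)')
print(ranking_nair_abraham[0])  # ('log(n)', 'log(mu*)', 'M_i')
```

The same five combinations lead in both cases; only their internal order changes with the choice of visual classification.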

Table 2. N_sel, P_pure, P_comp, P_bij, and P_cont for combinations of two parameters applied to the NAIRsample using the GALAXY ZOO visual classifications (columns 3-6) and the independent classifications of Nair & Abraham (2010, columns 7-9). In the case of the independent classifications the contamination fraction is taken to be the complement of the purity (i.e. this includes sources with T-type = 99).

                                  GALAXY ZOO                       Nair & Abraham (2010)
Parameter combination     N_sel   P_pure  P_comp  P_bij   P_cont   P_pure  P_comp  P_bij
(u − r, log(n))           2104    0.839   0.601   0.505   0.048    0.923   0.575   0.530
(u − r, log(r_e))         1828    0.882   0.549   0.485   0.040    0.9234  0.496   0.458
(u − r, log(M∗))          1856    0.799   0.505   0.403   0.075    0.883   0.481   0.425
(u − r, log(μ∗))          2053    0.884   0.618   0.546   0.030    0.950   0.572   0.544
(u − r, M_i)              1815    0.803   0.496   0.398   0.068    0.888   0.473   0.420
(u − r, e)                1111    0.832   0.315   0.262   0.038    0.926   0.302   0.280
(log(n), log(r_e))        2479    0.821   0.693   0.569   0.086    0.874   0.641   0.560
(log(n), log(M∗))         2173    0.824   0.609   0.502   0.055    0.904   0.581   0.525
(log(n), log(μ∗))         2124    0.873   0.631   0.551   0.023    0.950   0.597   0.567
(log(n), M_i)
(log(n), e)               1435    0.833   0.407   0.339   0.033    0.929   0.394   0.366
(log(r_e), log(M∗))       2006    0.893   0.610   0.545   0.026    0.947   0.558   0.528
(log(r_e), log(μ∗))       1948    0.901   0.598   0.538   0.024    0.956   0.546   0.523
(log(r_e), M_i)           1868    0.866   0.551   0.477   0.050    0.926   0.507   0.469
(log(r_e), e)             1354    0.792   0.365   0.289   0.091    0.854   0.339   0.290
(log(M∗), log(μ∗))        1858    0.906   0.573   0.519   0.021    0.959   0.523   0.502
(log(M∗), M_i)            1351    0.827   0.380   0.314   0.057    0.899   0.356   0.320
(log(M∗), e)              798     0.786   0.213   0.168   0.056    0.905   0.212   0.192
(log(μ∗), M_i)            2012    0.891   0.610   0.543   0.027    0.953   0.562   0.535
(log(μ∗), e)              1880    0.874   0.559   0.489   0.023    0.950   0.522   0.497
(M_i, e)                  793     0.784   0.212   0.166   0.067    0.898   0.209   0.187

Overall we find that the optimum results in terms of purity and simultaneous completeness for optical samples are obtained by combinations of three parameters including log(r_e), log(μ∗), log(n), and log(M∗) or M_i, notably (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(μ∗), M_i).

Spirals are very often found to be systems with on-going star formation, consequently possessing a younger stellar population emitting in the UV (FUV and NUV) and displaying blue UV/optical colours. Early-type galaxies, on the other hand, are generally found to be more quiescent and redder. Where available, the use of UV properties of sources may thus prove efficient in the selection of spiral galaxies. Similarly, a pre-selection on UV emission will enhance the purity of a sample of star-forming spiral galaxies, at the expense of removing UV-faint, quiescent spirals. In the following we investigate the performance of selections using parameters which can be applied to samples preselected on the availability of NUV data (the

NUVsample and NUVNAIRsample in this case), i.e. NUV − r colour, log(n), log(r_e), log(M∗), log(μ∗), M_i, and e. The figures of merit involving completeness, P_comp and P_bij, are given in relation to the NUV preselected samples (P_comp,n and P_bij,n) and to the optical samples for comparison (P_comp,o and P_bij,o).

Tables 5 and 6 show the figures of merit for all 21 unique combinations of two parameters applied to the NUV preselected samples.

Testing using the NUVsample, the combinations with the greatest values of P_bij,n are (log(μ∗), e) with P_bij,n = 0.542 (although the completeness may be influenced by the ellipticity bias), (log(r_e), log(M∗)) with P_bij,n = 0 . n), log(r_e)) with P_bij,n = 0 . r_e), log(μ∗)) with P_bij,n = 0 . r_e), M_i) with P_bij,n = 0 . r_e) and log(μ∗) again result in the most simultaneously pure and complete samples, particularly in combination with log(M∗), M_i, or log(n). In particular log(μ∗) leads to selections with high purity (4/5 with P_pure > . P_pure > . NUV − r colour and Sérsic index are less efficient at selecting pure and complete samples than expected, only attaining values of P_pure ≳ . NUV − r colour does, however, predominantly lead to samples with high completeness ( ≳ . M∗) and M_i.

Making use of the NUVNAIRsample with GALAXY ZOO visual classifications we find that the combinations with the greatest bijective discrimination power are (NUV − r, log(r_e)) with P_bij,n = 0 . NUV − r, log(M∗)) with P_bij,n = 0.612 and (NUV − r, M_i) with P_bij,n = 0 . n), log(r_e)) with P_bij,n = 0.568 and (log(n), log(μ∗)) with P_bij,n = 0 . NUV − r and a marginally efficient parameter applied to the NUV preselected sample leads to highly complete samples ( P_comp,n ∼ . NUV − r in combination with efficient parameters leads to pure samples (e.g. (NUV − r, log(μ∗)) with P_pure = 0 . μ∗) all result in very pure samples with P_pure > . usually, however, at the cost of completeness.

Table 3. N_sel, P_pure, P_comp, P_bij, and P_cont for combinations of three parameters applied to the OPTICALsample.

Parameter combination            N_sel   P_pure  P_comp  P_bij   P_cont
(u − r, log(n), log(r_e))        65154   0.724   0.743   0.539   0.024
(u − r, log(n), log(M∗))         69906   0.625   0.688   0.430   0.058
(u − r, log(n), log(μ∗))         66453   0.709   0.741   0.526   0.033
(u − r, log(n), M_i)             70880   0.623   0.695   0.433   0.058
(u − r, log(n), e)               60259   0.682   0.647   0.442   0.042
(u − r, log(r_e), log(M∗))       65727   0.713   0.737   0.525   0.038
(u − r, log(r_e), log(μ∗))       63633   0.720   0.721   0.520   0.042
(u − r, log(r_e), M_i)           67015   0.710   0.749   0.532   0.047
(u − r, log(r_e), e)             63993   0.764   0.770   0.588   0.022
(u − r, log(M∗), log(μ∗))        62888   0.719   0.712   0.512   0.039
(u − r, log(M∗), M_i)            64714   0.582   0.593   0.345   0.082
(u − r, log(M∗), e)              56811   0.701   0.626   0.439   0.045
(u − r, log(μ∗), M_i)            62289   0.720   0.706   0.508   0.037
(u − r, log(μ∗), e)              66140   0.735   0.766   0.563   0.023
(u − r, M_i, e)                  56083   0.713   0.629   0.449   0.045
(log(n), log(r_e), log(M∗))      65708   0.738   0.764   0.564   0.018
(log(n), log(r_e), log(μ∗))      66581   0.739   0.774   0.572   0.017
(log(n), log(r_e), M_i)          66937   0.740   0.779   0.576   0.021
(log(n), log(r_e), e)            60988   0.776   0.745   0.577   0.019
(log(n), log(M∗), log(μ∗))       67149   0.731   0.773   0.565   0.019
(log(n), log(M∗), M_i)           68977   0.624   0.678   0.423   0.052
(log(n), log(M∗), e)             58955   0.692   0.643   0.445   0.042
(log(n), log(μ∗), M_i)           68151   0.716   0.768   0.549   0.018
(log(n), log(μ∗), e)             67837   0.715   0.763   0.546   0.020
(log(n), M_i, e)                 57541   0.708   0.641   0.454   0.036
(log(r_e), log(M∗), log(μ∗))     63189   0.717   0.713   0.511   0.044
(log(r_e), log(M∗), M_i)         66491   0.706   0.739   0.521   0.052
(log(r_e), log(M∗), e)           64608   0.754   0.767   0.579   0.027
(log(r_e), log(μ∗), M_i)         66374   0.707   0.739   0.523   0.055
(log(r_e), log(μ∗), e)           65079   0.759   0.777   0.590   0.026
(log(r_e), M_i, e)               58887   0.753   0.698   0.525   0.038
(log(M∗), log(μ∗), M_i)          63574   0.713   0.713   0.509   0.045
(log(M∗), log(μ∗), e)            65408   0.754   0.776   0.585   0.027
(log(M∗), M_i, e)                49084   0.686   0.530   0.363   0.061
(log(μ∗), M_i, e)                66104   0.745   0.775   0.577   0.033

Using the independent morphological classifications of Nair & Abraham (2010) we obtain very similar results, with the most bijectively powerful combinations including NUV − r with M_i, log(M∗), or log(r_e), followed by those combining log(n), log(r_e), and log(μ∗).

For the bright subsample of Nair & Abraham (2010) NUV − r efficiently selects pure and complete samples of spirals; however, the efficiency of the parameters log(M∗) and log(r_e) also remains high.

Overall, the parameters log(n), log(r_e), and log(μ∗) appear efficient in selecting pure and complete samples of spirals, as for optical samples. In addition, the NUV − r colour in combination with NUV preselection is also efficient in this respect.

A comparison of the figures of merit of the selections applied to the NUV pre-selected samples with those of comparable parameter combinations applied to the optical samples indicates that the use of such a preselection enhances the ability of the method to select pure and complete samples of spirals, with P_bij,n being, on average, greater than P_bij for comparable parameter combinations applied to the optical samples. This is due to the NUV pre-selection removing non-spiral contaminants, thus enlarging the spiral subvolume by making spirals more dominant and increasing the purity of spiral cells.
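The mechanism invoked here, in which the removal of contaminants turns additional cells into spiral cells, can be made concrete with a toy example. Everything in the sketch below is an assumption made for illustration: the cell counts are invented, the cells are simply labelled A to C, and a cell is taken to be a spiral cell when at least half of its calibration galaxies are spirals, which need not match the criterion used in the paper.

```python
SPIRAL_FRACTION_THRESHOLD = 0.5  # assumed criterion, not taken from the text


def spiral_cells(cells, threshold=SPIRAL_FRACTION_THRESHOLD):
    """Ids of cells whose spiral fraction reaches the threshold."""
    selected = set()
    for cell_id, (n_spiral, n_other) in cells.items():
        total = n_spiral + n_other
        if total > 0 and n_spiral / total >= threshold:
            selected.add(cell_id)
    return selected


def purity(cells, selected):
    """Fraction of galaxies in the selected cells that are spirals."""
    n_spiral = sum(cells[c][0] for c in selected)
    n_total = sum(cells[c][0] + cells[c][1] for c in selected)
    return n_spiral / n_total if n_total else 0.0


def completeness(cells, selected):
    """Fraction of all spirals in the sample that lie in the selected cells."""
    n_selected = sum(cells[c][0] for c in selected)
    n_all = sum(n_spiral for n_spiral, _ in cells.values())
    return n_selected / n_all if n_all else 0.0


# Invented (spiral, non-spiral) counts per cell before any preselection.
optical_cells = {"A": (90, 10), "B": (40, 60), "C": (5, 95)}
# Invented counts after an NUV preselection that removes mostly non-spirals.
nuv_cells = {"A": (85, 4), "B": (36, 20), "C": (2, 30)}

optical_selected = spiral_cells(optical_cells)  # {"A"}
nuv_selected = spiral_cells(nuv_cells)          # {"A", "B"}
```

In this toy case cell B only becomes a spiral cell after preselection, so the completeness with respect to the preselected sample rises while the purity drops slightly, because B is less pure than A. Whether purity rises or falls in practice depends on the actual cell populations.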
In many cases both the completeness and the purity of the selections increase (e.g., (log(r_e), log(M∗))). However, in some cases the increase in completeness is accompanied by a (slight) decrease in the purity, indicating that the enlargement of parameter space is the dominant effect.

Nevertheless, it must be borne in mind that these samples are complete with respect to the preselected sample and may be biased against intrinsically UV-faint spiral galaxies, as well as strongly attenuated spirals seen edge-on, if these sources lie below the NUV detection threshold.

Application of combinations of three parameters to the NUV preselected samples has much the same effect as for the optical samples, i.e. the purity and completeness, and consequently the bijective discrimination power, increase with respect to selections based on two parameters. The same processes as discussed in Sect. 4.1.2 apply. Tables 7 and 8 show the figures of merit for combinations of three parameters applied to the NUVsample and NUVNAIRsample.

Table 4. N_sel, P_pure, P_comp, P_bij, and P_cont for combinations of three parameters applied to the NAIRsample using the GALAXY ZOO visual classifications (columns 3-6) and the independent classifications of Nair & Abraham (2010, columns 7-9). In the case of the independent classifications the contamination fraction is taken to be the complement of the purity (i.e. this includes sources with T-type = 99).

                                         GALAXY ZOO                       Nair & Abraham (2010)
Parameter combination            N_sel   P_pure  P_comp  P_bij   P_cont   P_pure  P_comp  P_bij
(u − r, log(n), log(r_e))        2339    0.867   0.690   0.598   0.041    0.925   0.640   0.592
(u − r, log(n), log(M∗))         2280    0.829   0.643   0.533   0.053    0.910   0.614   0.559
(u − r, log(n), log(μ∗))         2270    0.872   0.674   0.588   0.033    0.941   0.632   0.595
(u − r, log(n), M_i)             2353    0.826   0.662   0.546   0.052    0.909   0.633   0.576
(u − r, log(n), e)               1627    0.846   0.469   0.396   0.030    0.930   0.448   0.416
(u − r, log(r_e), log(M∗))       2100    0.897   0.641   0.575   0.020    0.951   0.587   0.558
(u − r, log(r_e), log(μ∗))       2068    0.894   0.630   0.563   0.024    0.951   0.577   0.549
(u − r, log(r_e), M_i)           2059    0.888   0.622   0.553   0.030    0.944   0.571   0.538
(u − r, log(r_e), e)             1872    0.888   0.566   0.502   0.017    0.947   0.521   0.493
(u − r, log(M∗), log(μ∗))        1995    0.896   0.609   0.546   0.022    0.956   0.560   0.535
(u − r, log(M∗), M_i)            2066    0.809   0.569   0.460   0.071    0.886   0.537   0.476
(u − r, log(M∗), e)              1375    0.834   0.391   0.326   0.038    0.919   0.371   0.341
(u − r, log(μ∗), M_i)            1992    0.896   0.608   0.545   0.020    0.958   0.560   0.536
(u − r, log(μ∗), e)              1932    0.893   0.587   0.524   0.019    0.962   0.546   0.525
(u − r, M_i, e)                  1452    0.842   0.416   0.351   0.035    0.915   0.390   0.356
(log(n), log(r_e), log(M∗))      2319    0.881   0.696   0.613   0.024    0.941   0.646   0.608
(log(n), log(r_e), log(μ∗))      2364    0.884   0.712   0.629   0.024    0.945   0.660   0.624
(log(n), log(r_e), M_i)          2360    0.879   0.706   0.621   0.032    0.935   0.652   0.610
(log(n), log(r_e), e)            2142    0.867   0.632   0.548   0.045    0.920   0.582   0.536
(log(n), log(M∗), log(μ∗))       2347    0.885   0.707   0.626   0.024    0.946   0.657   0.621
(log(n), log(M∗), M_i)           2283    0.833   0.647   0.539   0.049    0.908   0.613   0.557
(log(n), log(M∗), e)             1703    0.847   0.491   0.416   0.039    0.926   0.466   0.432
(log(n), log(μ∗), M_i)           2363    0.881   0.709   0.625   0.020    0.950   0.664   0.631
(log(n), log(μ∗), e)             1989    0.873   0.591   0.516   0.019    0.953   0.560   0.534
(log(n), M_i, e)                 1686    0.856   0.492   0.421   0.035    0.921   0.459   0.422
(log(r_e), log(M∗), log(μ∗))     1983    0.901   0.608   0.548   0.023    0.955   0.556   0.531
(log(r_e), log(M∗), M_i)         2098    0.884   0.631   0.558   0.032    0.939   0.578   0.543
(log(r_e), log(M∗), e)           1888    0.895   0.575   0.514   0.019    0.953   0.528   0.504
(log(r_e), log(μ∗), M_i)         2091    0.885   0.630   0.557   0.035    0.940   0.577   0.542
(log(r_e), log(μ∗), e)           1908    0.899   0.584   0.525   0.018    0.958   0.536   0.514
(log(r_e), M_i, e)               1731    0.870   0.513   0.446   0.034    0.932   0.473   0.441
(log(M∗), log(μ∗), M_i)          1980    0.893   0.602   0.538   0.028    0.952   0.552   0.526
(log(M∗), log(μ∗), e)            1926    0.899   0.590   0.530   0.017    0.958   0.541   0.518
(log(M∗), M_i, e)                1447    0.838   0.413   0.346   0.048    0.909   0.430   0.391
(log(μ∗), M_i, e)                1922    0.900   0.589   0.530   0.017    0.957   0.539   0.516

The combination of three parameters with the highest value of P_bij when applied to the NUVsample is (NUV − r, log(r_e), e) with P_bij,n = 0.617 ( P_pure = 0 . P_comp,n = 0 . e (and are likely affected by the ellipticity bias). However, all 10 combinations include log(r_e), log(μ∗) and/or log(n). The three most efficient parameter combinations not including e are (log(n), log(r_e), log(μ∗)) ( P_pure = 0 . P_comp,n = 0 . n), log(r_e), M_i) ( P_pure = 0 . P_comp,n = 0 . NUV − r, log(r_e), M_i) ( P_pure = 0 . P_comp,n = 0 . NUVsample leads to very complete selections. Of the combinations not including e P_comp,n > 0.7, 6 of which have P_comp,n > . NUV − r in combination with at least one efficient parameter leads to very complete selections with P_comp,n ≳ . NUVNAIRsample with GALAXY ZOO visual classifications the most bijectively powerful combination is (

NUV − r ,log( r e ), e ) with P bij , n = 0 . P pure = 0 . P comp , n = 0 . e ).However, of the ten most eﬃcient combinations, this isthe only one including e . The following 5 combinationswith the highest values of P bij , n are (in descending or-der): ( NUV − r ,log( n ),log( r e )), ( NUV − r ),log( r e ), M i ),(log( n ),log( r e ),log( M ∗ )), ( NUV − r ,log( n ),log( M ∗ )), and( NUV − r ,log( n ),log( µ ∗ )). Clearly NUV − r applied incombination with another eﬃcient parameter and NUVpreselection leads to very pure and complete selectionsrecovered from the bright subsample. Similar purity, butat the cost of completeness is also achieved by the param-eter log( µ ∗ ), even without the parameter NUV − r (e.g.(log( r e ),log( M ∗ ),log( µ ∗ )). c (cid:13) , 1–39 hotometric Proxies for Selecting Spirals Table 5.

Purity, completeness, bijective discrimination power, and contamination for combinations of two parameters applied to the NUVsample. Completeness and bijective discrimination power are listed w.r.t. the OPTICALsample (P_comp,o and P_bij,o) and the NUVsample (P_comp,n and P_bij,n).

Parameter combination N_sel  P_pure    P_comp,n  P_bij,n   P_cont    P_comp,o  P_bij,o
(NUV − r, log(n))     53285  0.603     0.678     0.408     0.069     0.506     0.305
(NUV − r, log(r_e))   46791  0.722     0.713     0.514     0.042     0.532     0.384
(NUV − r, log(M∗))    56682  0.581     0.695     0.404     0.082     0.518     0.301
(NUV − r, log(µ∗))    47516  0.717     0.719     0.516     0.031     0.536     0.385
(NUV − r, M_i)        55825  0.582     0.685     0.399     0.081     0.511     0.298
(NUV − r, e)          40000  0.714     0.603     0.431     0.041     0.450     0.321
(log(n), log(r_e))    46867  0.731     0.723     0.529     0.033     0.540     0.395
(log(n), log(M∗))     53124  0.608     0.681     0.414     0.063     0.508     0.309
(log(n), log(µ∗))     51284  0.688     0.744     0.512     0.032     0.555     0.382
(log(n), M_i)         [...]
(log(n), e)           37343  0.705     0.556     0.392     0.044     0.415     0.293
(log(r_e), log(M∗))   47184  0.731     0.727     0.532     0.039     0.543     0.397
(log(r_e), log(µ∗))   45305  0.741     0.708     0.525     0.036     0.529     0.392
(log(r_e), M_i)       49531  0.707     0.739     0.523     0.070     0.552     0.390
(log(r_e), e)         40215  0.734     0.623     0.457     0.083     0.465     0.341
(log(M∗), log(µ∗))    44472  0.742     0.696     0.517     0.032     0.520     0.386
(log(M∗), M_i)        38529  0.567     0.461     0.262     0.097     0.344     0.195
(log(M∗), e)          28449  0.731     0.439     0.321     0.075     0.327     0.239
(log(µ∗), M_i)        47342  0.718     0.717     0.515     0.037     0.535     0.384
(log(µ∗), e)          49323  0.721     0.751     0.542     0.030     0.560     0.404
(M_i, e)              24399  0.767     0.395     0.302     0.061     0.294     0.226

Table 6.

Purity, completeness, bijective discrimination power, and contamination for combinations of two parameters applied to the NUVNAIRsample using the GALAXY ZOO visual classifications (columns 3-8) and the independent classifications of Nair & Abraham (2010, columns 9-13). Completeness and bijective discrimination power are listed w.r.t. the OPTICALsample (P_comp,o and P_bij,o) and the NUVsample (P_comp,n and P_bij,n). In the case of the independent classifications the contamination fraction is taken to be the complement of the purity (i.e. this includes sources with T-type = 99).

                             GALAXY ZOO                                                  Nair & Abraham (2010)
Parameter combination N_sel  P_pure    P_comp,n  P_bij,n   P_cont    P_comp,o  P_bij,o   P_pure    P_comp,n  P_bij,n   P_comp,o  P_bij,o
(NUV − r, log(n))     1551   0.853     0.607     0.518     0.053     0.450     0.384     0.919     0.565     0.519     0.418     0.384
(NUV − r, log(r_e))   1801   0.869     0.719     0.624     0.044     0.533     0.463     0.914     0.650     0.594     0.483     0.441
(NUV − r, log(M∗))    1970   0.822     0.744     0.612     0.064     0.552     0.454     0.895     0.695     0.622     0.517     0.463
(NUV − r, log(µ∗))    1497   0.888     0.611     0.543     0.030     0.453     0.402     0.948     0.560     0.531     0.416     0.394
(NUV − r, M_i)        1950   0.824     0.738     0.608     0.064     0.547     0.451     0.896     0.689     0.617     0.512     0.459
(NUV − r, e)          1127   0.859     0.444     0.382     0.031     0.330     0.283     0.933     0.415     0.387     0.308     0.287
(log(n), log(r_e))    1790   0.831     0.683     0.568     0.084     0.507     0.421     0.879     0.623     0.548     0.461     0.405
(log(n), log(M∗))     1591   0.813     0.594     0.482     0.069     0.440     0.358     0.894     0.564     0.504     0.417     0.373
(log(n), log(µ∗))     1616   0.873     0.648     0.566     0.032     0.480     0.419     0.942     0.603     0.568     0.446     0.421
(log(n), M_i)         [...]
(log(n), e)           944    0.815     0.353     0.288     0.049     0.262     0.213     0.915     0.342     0.313     0.253     0.232
(log(r_e), log(M∗))   1512   0.900     0.625     0.562     0.026     0.463     0.417     0.950     0.567     0.539     0.421     0.400
(log(r_e), log(µ∗))   1447   0.902     0.599     0.540     0.025     0.444     0.401     0.956     0.546     0.522     0.405     0.388
(log(r_e), M_i)       1630   0.842     0.630     0.531     0.075     0.467     0.394     0.890     0.572     0.509     0.425     0.378
(log(r_e), e)         1488   0.728     0.498     0.363     0.160     0.369     0.269     0.776     0.456     0.354     0.339     0.263
(log(M∗), log(µ∗))    1387   0.906     0.577     0.523     0.021     0.428     0.388     0.960     0.525     0.504     0.390     0.374
(log(M∗), M_i)        1263   0.792     0.459     0.364     0.097     0.340     0.270     0.859     0.428     0.368     0.318     0.273
(log(M∗), e)          728    0.731     0.244     0.178     0.092     0.181     0.132     0.865     0.249     0.215     0.185     0.160
(log(µ∗), M_i)        1488   0.898     0.613     0.551     0.026     0.455     0.408     0.953     0.559     0.533     0.416     0.396
(log(µ∗), e)          1397   0.886     0.568     0.504     0.022     0.422     0.374     0.953     0.525     0.500     0.390     0.372
(M_i, e)              631    0.751     0.218     0.163     0.094     0.161     0.121     0.876     0.218     0.191     0.162     0.142

Testing using the NUVNAIRsample with the independent morphological classifications of Nair & Abraham (2010) supports the importance of NUV − r as a parameter for selecting pure and complete samples of spirals under NUV preselection. The combinations with the largest bijective discrimination power are (NUV − r, log(n), log(M∗)), (NUV − r, log(n), log(r_e)), and (NUV − r, log(r_e), e), with the use of NUV − r leading to very complete samples, as visible in the comparison of (NUV − r, log(n), log(r_e)) with (log(n), log(r_e), log(µ∗)), or (log(n), log(r_e), M_i).

To summarize, we find that for NUV preselected samples the use of NUV − r as a parameter leads to very complete, and in the case of the bright subsample of Nair & Abraham (2010) also pure, selections of spiral galaxies. This is particularly the case in combination with log(r_e) and log(n), while combinations with log(µ∗) are also efficient, but mostly improve the purity of selections at the expense of completeness. A comparison of the figures of merit for comparable parameter combinations applied to the optical and NUV samples shows, as for the combinations of two parameters, that the use of NUV preselection increases both purity and completeness on average. We again note, however, that the values of completeness are with respect to the NUV samples, and will be biased against UV-faint sources (these may be intrinsically UV faint or UV faint due to being seen edge-on and experiencing severe attenuation due to dust).

Overall, the parameters log(r_e), log(µ∗), and log(n) appear efficient at selecting pure and complete samples of spirals, as for the optical samples. Under NUV preselection, however, the NUV − r colour becomes efficient at selecting complete and pure spiral samples, much more so than the u − r colour for the optical samples. The most efficient combinations include (NUV − r, log(r_e), e), (NUV − r, log(n), log(r_e)), and (log(n), log(r_e), log(µ∗)).

As shown in Sect. 4.2.2, the use of NUV preselection results, on average, in samples with greater completeness and often also greater purity for comparable combinations of selection parameters. Under NUV preselection the parameter NUV − r leads to efficient selections of complete samples of spirals, while attaining high values of purity for the bright subsample. As spiral galaxies are often star forming systems, this result is unsurprising. However, as discussed, NUV preselection will bias samples of spirals against intrinsically UV-faint systems, as well as against systems which are UV-faint due to severe attenuation (e.g. on account of being seen edge-on).

Overall, the efficiency of the considered parameter combinations in selecting pure and complete (under the aforementioned caveat) samples is enhanced by NUV preselection, with larger volumes of the parameter space being included in the spiral volume than for the whole sample, as indicated by increases in completeness accompanied by slight reductions in purity when using comparable parameter combinations with and without preselection. In addition, especially for combinations of three parameters, NUV preselection can also lead to an increase in purity accompanied by a decrease in completeness, as regions marginally dominated by spirals in the whole sample are excluded. On average, however, in both cases the value of P_bij,n is larger than P_bij for a comparable parameter combination applied to the OPTICALsample. Thus, depending upon the science goal of the selection, UV information could be a valuable asset in selecting samples of spirals. However, we caution that, in addition to the biases previously discussed, if the depth of the UV coverage is not such that it matches the depth of the optical data and encompasses the entire (realistic) colour range, UV preselection will strongly suppress the completeness attainable and introduce biases into any selections.

In light of these effects, the greater completeness of using only optical parameters applied to optical samples, as evidenced by the values of P_comp,o in, for example, Table 7, and the robustness against bias will likely outweigh the gain in purity achievable by NUV preselection for most applications.
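The figures of merit quoted throughout this section can be made concrete with a short sketch. It assumes the usual definitions (purity as the fraction of selected galaxies that are reliable spirals, completeness as the fraction of reliable spirals that are selected, contamination as the fraction of selected galaxies that are reliable non-spirals) and takes the bijective discrimination power to be the product of purity and completeness, which is consistent with the tabulated values (e.g. 0.777 × 0.794 ≈ 0.617 for (NUV − r, log(r_e), e) in Table 7). The function name, the label strings, and the `n_spirals_reference` argument are illustrative and not taken from the paper; the latter mimics quoting completeness with respect to a larger parent sample, as is done for P_comp,o.

```python
from dataclasses import dataclass


@dataclass
class FiguresOfMerit:
    purity: float
    completeness: float
    bijective_power: float
    contamination: float


def figures_of_merit(labels, selected, n_spirals_reference=None):
    """Compute selection figures of merit (illustrative sketch).

    labels   : visual classification per galaxy, one of
               "spiral", "elliptical", "uncertain" (hypothetical names).
    selected : booleans, True if the galaxy lies in a cell assigned to spirals.
    n_spirals_reference : number of reliable spirals in the sample with respect
               to which completeness is quoted; defaults to the spirals in `labels`.
    """
    if len(labels) != len(selected):
        raise ValueError("labels and selected must have the same length")

    n_selected = sum(1 for s in selected if s)
    n_spirals = sum(1 for lab in labels if lab == "spiral")
    if n_spirals_reference is None:
        n_spirals_reference = n_spirals
    if n_selected == 0 or n_spirals_reference == 0:
        raise ValueError("need at least one selected galaxy and one reference spiral")

    n_selected_spirals = sum(
        1 for lab, s in zip(labels, selected) if s and lab == "spiral"
    )
    n_selected_ellipticals = sum(
        1 for lab, s in zip(labels, selected) if s and lab == "elliptical"
    )

    purity = n_selected_spirals / n_selected
    completeness = n_selected_spirals / n_spirals_reference
    return FiguresOfMerit(
        purity=purity,
        completeness=completeness,
        # Assumption: P_bij = P_pure * P_comp, as suggested by the tabulated values.
        bijective_power=purity * completeness,
        contamination=n_selected_ellipticals / n_selected,
    )


# Toy sample: 4 spirals, 2 ellipticals, 2 galaxies without a reliable class.
labels = ["spiral"] * 4 + ["elliptical"] * 2 + ["uncertain"] * 2
selected = [True, True, True, False, True, False, True, False]

fom_n = figures_of_merit(labels, selected)  # completeness w.r.t. this sample
fom_o = figures_of_merit(labels, selected, n_spirals_reference=6)  # w.r.t. a larger parent sample
print(fom_n)
print(fom_o)
```

Under these assumptions purity is independent of the reference sample while completeness is not, so the "o" and "n" values of the bijective discrimination power for a given selection differ by the same factor as the two completeness values.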
Based on the figures of purity, completeness, and bijective discrimination power it is readily apparent that the use of combinations of three parameters generally leads to purer and simultaneously more complete samples of spirals than using only two parameters. Furthermore, the most important parameters appear to be log(r_e) and log(µ∗), which provide the most efficient selection when complemented by log(n) and/or M_i. Applying an NUV preselection appears to further improve the attainable purity, and makes NUV − r a further important selection parameter. However, although the purity, completeness, and bijective discrimination power are good indicators of a selection’s performance, they provide little information about possible biases in the selections. While the cell-based method allows for a flexible surface of separation, any boundary in parameter space used in classifying objects entails that reliable spirals with strongly outlying values in the selection parameters may be missed, and that the selection may not be fully representative of the actual population of spirals.

In the following we will investigate the potential biases caused by the selection on the basis of four different representative combinations of three parameters ((u − r, log(r_e), e) resp. (NUV − r, log(r_e), e), (log(n), log(r_e), log(µ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(µ∗))), chosen to be amongst the most bijectively powerful. We will consider the distributions of the suite of parameters investigated for these selections, as well as the distributions of the Hα equivalent width as an independent observable and the T-type classification given by Nair & Abraham (2010) to investigate possible biases in the selections of spiral galaxies. Finally, we will investigate the redshift dependence of the selections of spiral galaxies.

Figs. 5 & 6 show the normalized distributions of all eight parameters in the suite investigated, after selection by four different representative combinations of three parameters ((u − r, log(r_e), e) resp. (NUV − r, log(r_e), e) in red, (log(n), log(r_e), log(µ∗)) in green, (log(n), log(r_e), M_i) in blue, and (log(n), log(M∗), log(µ∗)) in orange), chosen to be amongst the most bijectively powerful, applied to both the OPTICALsample (Fig. 5) and to the NAIRsample (Fig. 6). For comparison the parameter’s distribution for reliable spirals in the respective sample as defined by GALAXY ZOO is shown as a dash-dotted black line. Finally, the parameter’s distribution for reliable spirals as defined by the independent morphological classifications of Nair & Abraham (2010), i.e. in the NAIRsample, is shown as a grey dash-dotted line.

Table 7. Purity, completeness, bijective discrimination power, and contamination for combinations of three parameters applied to the NUVsample. Completeness and bijective discrimination power are listed w.r.t. the OPTICALsample (P_comp,o and P_bij,o) and the NUVsample (P_comp,n and P_bij,n).

Parameter combination         N_sel  P_pure    P_comp,n  P_bij,n   P_cont    P_comp,o  P_bij,o
(NUV − r, log(n), log(r_e))   50514  0.726     0.774     0.562     0.028     0.577     0.419
(NUV − r, log(n), log(M∗))    56380  0.617     0.733     0.452     0.064     0.547     0.337
(NUV − r, log(n), log(µ∗))    48707  0.716     0.736     0.527     0.032     0.549     0.39
(NUV − r, log(n), M_i)        56496  0.616     0.734     0.452     0.064     0.548     0.337
(NUV − r, log(n), e)          43708  0.695     0.641     0.445     0.044     0.478     0.332
(NUV − r, log(r_e), log(M∗))  48885  0.736     0.759     0.559     0.029     0.567     0.417
(NUV − r, log(r_e), log(µ∗))  49163  0.737     0.765     0.564     0.029     0.571     0.421
(NUV − r, log(r_e), M_i)      51151  0.731     0.789     0.577     0.033     0.589     0.430
(NUV − r, log(r_e), e)        48396  0.777     0.794     0.617     0.014     0.592     0.460
(NUV − r, log(M∗), log(µ∗))   46269  0.746     0.728     0.543     0.029     0.543     0.405
(NUV − r, log(M∗), M_i)       56066  0.582     0.689     0.401     0.085     0.514     0.299
(NUV − r, log(M∗), e)         43874  0.730     0.676     0.493     0.035     0.504     0.368
(NUV − r, log(µ∗), M_i)       48991  0.730     0.755     0.551     0.030     0.563     0.411
(NUV − r, log(µ∗), e)         49430  0.748     0.780     0.583     0.015     0.582     0.435
(NUV − r, M_i, e)             44092  0.734     0.683     0.501     0.033     0.509     0.374
(log(n), log(r_e), log(M∗))   49304  0.744     0.773     0.575     0.020     0.577     0.429
(log(n), log(r_e), log(µ∗))   49665  0.744     0.780     0.580     0.022     0.582     0.433
(log(n), log(r_e), M_i)       49054  0.749     0.775     0.580     0.023     0.578     0.433
(log(n), log(r_e), e)         47441  0.765     0.766     0.586     0.029     0.571     0.437
(log(n), log(M∗), log(µ∗))    49945  0.736     0.775     0.571     0.020     0.579     0.426
(log(n), log(M∗), M_i)        53302  0.611     0.687     0.420     0.062     0.513     0.313
(log(n), log(M∗), e)          41242  0.702     0.611     0.429     0.044     0.456     0.320
(log(n), log(µ∗), M_i)        50378  0.719     0.764     0.550     0.019     0.570     0.410
(log(n), log(µ∗), e)          51054  0.715     0.770     0.551     0.026     0.575     0.411
(log(n), M_i, e)              42160  0.705     0.627     0.443     0.046     0.468     0.330
(log(r_e), log(M∗), log(µ∗))  46264  0.738     0.721     0.532     0.033     0.538     0.397
(log(r_e), log(M∗), M_i)      48838  0.727     0.749     0.545     0.042     0.559     0.407
(log(r_e), log(M∗), e)        48793  0.764     0.786     0.600     0.028     0.586     0.448
(log(r_e), log(µ∗), M_i)      48671  0.729     0.749     0.546     0.045     0.559     0.407
(log(r_e), log(µ∗), e)        49571  0.762     0.797     0.607     0.027     0.595     0.453
(log(r_e), M_i, e)            46084  0.757     0.736     0.556     0.043     0.549     0.415
(log(M∗), log(µ∗), M_i)       47355  0.729     0.729     0.531     0.039     0.544     0.397
(log(M∗), log(µ∗), e)         49250  0.762     0.791     0.603     0.028     0.590     0.450
(log(M∗), M_i, e)             40952  0.698     0.603     0.421     0.065     0.450     0.314
(log(µ∗), M_i, e)             49331  0.757     0.787     0.596     0.031     0.588     0.445

Overall, the distributions of the parameters derived from the selections applied to the OPTICALsample (Fig. 5) coincide well with those of the GALAXY ZOO defined sample, indicating that the non-parametric method using three parameters is neither heavily influencing the parameter ranges available to the sample, nor is itself introducing large biases. Similarly, the parameter distributions for the selections applied to the NAIRsample also agree well with the parameter distributions as defined by the GALAXY ZOO and Nair & Abraham (2010) visual classifications. Nevertheless, the effect of the individual choice of parameter combinations is visible in the distributions, with this being more pronounced for the application to the NAIRsample. For example, all combinations involving log(n) are biased towards lower values of this parameter than the visually defined samples, while the combination (u − r, log(r_e), e) traces them with higher fidelity. The discontinuous steep fall-off towards redder u − r colours of the selection determined by (u − r, log(r_e), e) (most pronounced in the NAIRsample) is also an example of the effects of the discretization.

The largest differences, both between the selections and the visually-defined samples, as well as between the selections themselves, are visible, however, in the distributions of ellipticity. While the distribution of e is more or less flat in the NAIRsample, as is to be expected for an unbiased sample, the GALAXY ZOO-defined spiral subsample of the OPTICALsample displays a bias towards high values of e. Using e as selection parameter, as in the combination (u − r, log(r_e), e), gives rise to a bias in the distribution of e for the selected sample as visible in Fig. 6, causing the selection provided by (u − r, log(r_e), e) to largely coincide with the GALAXY ZOO defined spiral sample for the OPTICALsample. This bias may also give rise to the agreement between the NUV − r colour distributions of the GALAXY ZOO defined sample and the (u − r, log(r_e), e) selection in Fig. 5 (i.e. for the OPTICALsample), which extend to redder colours than the other selections, as NUV emission from highly inclined galaxies will be strongly attenuated, more so than in optical bands (e.g., Tuffs et al. 2004). In contrast to the selection using (u − r, log(r_e), e), the other investigated parameter combinations show distributions which are more or less flat in e, also justifying the use of the GALAXY ZOO sample as a calibration sample.

Table 8. Purity, completeness, bijective discrimination power, and contamination for combinations of three parameters applied to the NUVNAIRsample using the GALAXY ZOO visual classifications (columns 3-8) and the independent classifications of Nair & Abraham (2010, columns 9-13). Completeness and bijective discrimination power are listed w.r.t. the NAIRsample (P_comp,o and P_bij,o) and the NUVNAIRsample (P_comp,n and P_bij,n). In the case of the independent classifications the contamination fraction is taken to be the complement of the purity (i.e. this includes sources with T-type = 99).

                                     GALAXY ZOO                                                  Nair & Abraham (2010)
Parameter combination         N_sel  P_pure    P_comp,n  P_bij,n   P_cont    P_comp,o  P_bij,o   P_pure    P_comp,n  P_bij,n   P_comp,o  P_bij,o
(NUV − r, log(n), log(r_e))   1879   0.864     0.745     0.644     0.047     0.553     0.477     0.915     0.681     0.623     0.504     0.461
(NUV − r, log(n), log(M∗))    1934   0.841     0.747     0.628     0.055     0.554     0.466     0.906     0.694     0.629     0.514     0.465
(NUV − r, log(n), log(µ∗))    1564   0.878     0.630     0.553     0.033     0.467     0.410     0.943     0.584     0.551     0.432     0.408
(NUV − r, log(n), M_i)        1906   0.839     0.735     0.617     0.055     0.545     0.457     0.902     0.681     0.615     0.504     0.455
(NUV − r, log(n), e)          1299   0.856     0.511     0.437     0.038     0.379     0.324     0.928     0.478     0.443     0.354     0.328
(NUV − r, log(r_e), log(M∗))  1687   0.893     0.691     0.617     0.027     0.513     0.458     0.942     0.627     0.591     0.466     0.439
(NUV − r, log(r_e), log(µ∗))  1713   0.891     0.701     0.624     0.025     0.520     0.463     0.941     0.636     0.599     0.473     0.445
(NUV − r, log(r_e), M_i)      1770   0.884     0.718     0.635     0.034     0.533     0.471     0.928     0.648     0.602     0.482     0.447
(NUV − r, log(r_e), e)        1705   0.908     0.711     0.645     0.014     0.527     0.479     0.956     0.643     0.615     0.478     0.457
(NUV − r, log(M∗), log(µ∗))   1594   0.897     0.657     0.589     0.025     0.487     0.437     0.946     0.595     0.563     0.442     0.418
(NUV − r, log(M∗), M_i)       1970   0.815     0.737     0.601     0.069     0.547     0.446     0.887     0.690     0.612     0.512     0.455
(NUV − r, log(M∗), e)         1478   0.884     0.600     0.531     0.020     0.445     0.394     0.941     0.549     0.516     0.408     0.384
(NUV − r, log(µ∗), M_i)       1647   0.888     0.672     0.597     0.029     0.498     0.442     0.943     0.613     0.578     0.455     0.429
(NUV − r, log(µ∗), e)         1494   0.908     0.623     0.566     0.017     0.462     0.420     0.967     0.570     0.551     0.424     0.410
(NUV − r, M_i, e)             1467   0.883     0.595     0.526     0.022     0.441     0.390     0.938     0.543     0.509     0.403     0.378
(log(n), log(r_e), log(M∗))   1745   0.886     0.710     0.629     0.028     0.526     0.466     0.940     0.650     0.611     0.481     0.452
(log(n), log(r_e), log(µ∗))   1736   0.885     0.705     0.624     0.028     0.523     0.463     0.940     0.646     0.607     0.478     0.449
(log(n), log(r_e), M_i)       1757   0.874     0.705     0.617     0.042     0.523     0.457     0.923     0.642     0.593     0.475     0.438
(log(n), log(r_e), e)         1754   0.831     0.669     0.556     0.078     0.496     0.412     0.884     0.615     0.543     0.455     0.402
(log(n), log(M∗), log(µ∗))    1698   0.894     0.697     0.623     0.025     0.517     0.462     0.948     0.638     0.605     0.472     0.448
(log(n), log(M∗), M_i)        1695   0.820     0.638     0.523     0.069     0.473     0.388     0.895     0.601     0.538     0.445     0.398
(log(n), log(M∗), e)          1189   0.834     0.455     0.380     0.049     0.338     0.282     0.918     0.432     0.396     0.320     0.293
(log(n), log(µ∗), M_i)        1694   0.888     0.691     0.614     0.021     0.512     0.455     0.950     0.638     0.606     0.472     0.449
(log(n), log(µ∗), e)          1545   0.869     0.617     0.536     0.029     0.457     0.397     0.939     0.575     0.540     0.425     0.400
(log(n), M_i, e)              1307   0.828     0.497     0.411     0.060     0.368     0.305     0.896     0.464     0.416     0.343     0.308
(log(r_e), log(M∗), log(µ∗))  1465   0.903     0.607     0.549     0.024     0.450     0.407     0.954     0.552     0.526     0.410     0.391
(log(r_e), log(M∗), M_i)      1567   0.886     0.637     0.564     0.036     0.473     0.419     0.936     0.579     0.542     0.430     0.403
(log(r_e), log(M∗), e)        1528   0.889     0.624     0.554     0.026     0.462     0.411     0.944     0.569     0.537     0.423     0.399
(log(r_e), log(µ∗), M_i)      1567   0.880     0.633     0.557     0.041     0.470     0.413     0.934     0.577     0.539     0.429     0.400
(log(r_e), log(µ∗), e)        1536   0.896     0.632     0.566     0.022     0.469     0.420     0.951     0.577     0.548     0.428     0.407
(log(r_e), M_i, e)            1450   0.870     0.579     0.504     0.044     0.430     0.374     0.916     0.524     0.480     0.389     0.357
(log(M∗), log(µ∗), M_i)       1516   0.888     0.618     0.549     0.032     0.458     0.407     0.942     0.563     0.531     0.419     0.394
(log(M∗), log(µ∗), e)         1556   0.894     0.639     0.571     0.021     0.474     0.423     0.951     0.584     0.555     0.434     0.413
(log(M∗), M_i, e)             1154   0.792     0.420     0.332     0.074     0.311     0.246     0.885     0.403     0.356     0.299     0.265
(log(µ∗), M_i, e)             1548   0.897     0.637     0.571     0.023     0.473     0.424     0.946     0.578     0.547     0.429     0.406

Comparison of the distribution of the parameters in the selections applied to the OPTICALsample with those of the galaxies classified as spirals in the NAIRsample using the classifications of Nair & Abraham (2010) shows a systematic difference in the distributions of the parameters between these samples. Overall, the spiral galaxies in the NAIRsample are more weighted towards redder NUV − r and u − r colours, as well as towards larger values of log(M∗) and log(µ∗), and brighter i-band absolute magnitudes. Furthermore, the distributions of log(n) and log(r_e) are weighted towards larger values of n and lower values of r_e, respectively. The observable differences are largely consistent with the bright NAIRsample (g′-band mag < 16) being more weighted towards large spirals which, on average, are more massive and redder than lower mass spiral galaxies. Furthermore, they often also have more dominant bulges, increasing the values of n and decreasing those of r_e, while simultaneously decreasing the value of e, in agreement with the observed distributions. However, the differences may also be due, in part, to the fact that the cell-based selection misses regions of parameter space which are sparsely populated by spirals and in which they do not represent the dominant galaxy population. Nevertheless, Fig. 6 shows that the selections using combinations of three parameters trained on the GALAXY ZOO visual classifications of the OPTICALsample perform well at recovering the NAIRsample.

Fig. 7 shows the parameter distributions for the combinations applied to the NUVsample (we make use of (NUV − r, log(r_e), e) instead of (u − r, log(r_e), e)). The results of applying the combinations to the NUVsample are nearly identical to those obtained for the OPTICALsample. However, the use of NUV preselection does bias the selected galaxy populations towards bluer objects, as can be seen in the shift of the distributions of the u − r and, to a lesser extent, the NUV − r colour between Figs. 5 & 7. The use of NUV preselection and NUV − r colour also slightly lessens the bias against sources with low values of e selected using the combination (NUV − r, log(r_e), e), rendering the distribution in e of this selection flatter than that of the GALAXY ZOO defined sample. The overall similarity to the results obtained for the optical samples shows that the requirement of an NUV detection itself is only mildly influencing the selections.

[...] Hα equivalent width as independent observables

Although the agreement between the parameter distributions of the visually defined samples and the selections is very good, the fact that a bias towards bluer u − r and NUV − r colours is discernible, and that the selections slightly favour lower values of log(n) and log(µ∗) and higher values of log(r_e), raises the possibility that the selections may nevertheless be biased against a subclass of spirals.

T-type distributions of the NAIRsample

In order to investigate to what extent such a bias may be present, we first make use of the distributions of the T-type classifications of Nair & Abraham (2010). Fig. 8 shows the normalized distributions of the T-type values for the four selections, compared with the distributions of the visually classified spiral samples (GALAXY ZOO: black, Nair & Abraham (2010): grey). The distribution of the T-types of galaxies classified as spirals by the selection is shown in green, while the magenta line shows the T-type distributions of the GALAXY ZOO defined reliable spirals located in spiral cells following the selection. For the NAIRsample the GALAXY ZOO classifications (black solid line) appear moderately biased against early type spirals (mainly against Sa, and less against Sa/b). The selections based on the combinations of three parameters (green line) display a similar, but more pronounced bias, favoring spiral galaxies of type Sa/b, Sb and later, underscored by the stronger bias against early type spirals of GALAXY ZOO spirals in spiral cells (magenta line). Overall, the parameter based selections recover relatively more earlier type spirals than the GALAXY ZOO classifications, in line with the findings that a large fraction of the ’impurity’ arises from spiral galaxies which fail to meet the P_CS,DB > [...]

T-type distributions of the NUVNAIRsample
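The inset panels of Fig. 8 compare each T-type distribution with that of the Nair & Abraham (2010) classifications through a difference in relative frequency. A minimal sketch of that comparison follows; the function names and the toy T-type values are illustrative only, and the normalization (each distribution divided by its own sample size) is an assumption about how the relative frequencies are formed.

```python
from collections import Counter


def relative_frequencies(t_types, bins):
    """Fraction of the sample falling in each T-type bin."""
    if not t_types:
        raise ValueError("sample must not be empty")
    counts = Counter(t_types)
    # Counter returns 0 for T-types that do not occur in the sample.
    return {t: counts[t] / len(t_types) for t in bins}


def frequency_difference(sample, reference, bins):
    """Delta f per T-type: sample relative frequency minus reference relative frequency."""
    f_sample = relative_frequencies(sample, bins)
    f_reference = relative_frequencies(reference, bins)
    return {t: f_sample[t] - f_reference[t] for t in bins}


# Toy T-types (1 ~ Sa, 3 ~ Sb, 5 ~ Sc); these are not data from the paper.
reference_spirals = [1, 1, 3, 3, 5, 5]
selected_spirals = [3, 5, 5, 5]

delta_f = frequency_difference(selected_spirals, reference_spirals, bins=[1, 3, 5])
for t_type, diff in delta_f.items():
    print(f"T-type {t_type}: delta f = {diff:+.3f}")
```

In this toy case the difference is negative at the earliest type and positive at the latest, which is the signature of a selection weighted towards later type spirals.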

Fig. 9 shows the resultant distributions of T-types for the selections applied to the NUVNAIRsample (using NUV − r rather than u − r). Overall, the results are very similar, with both the GALAXY ZOO classified spirals and the spirals selected by the parameter combinations being more weighted towards later type galaxies than the classifications of Nair & Abraham (2010). We note the fact that the NUVNAIRsample is more weighted towards earlier type spirals than the NAIRsample.

[Figure 8: four panels, one per parameter combination: (u − r, log(r_e), e), (log(n), log(r_e), log(µ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(µ∗)); x-axis: T-Type (Nair & Abraham, 2010); y-axes: relative frequency f and Δf.]

Figure 8. Distribution of T-types for galaxies in the NAIRsample classified as spirals based on the classifications of Nair & Abraham (2010) (gray), GALAXY ZOO (black), and the parameter combination listed top left (green). The T-type distribution of galaxies with P_CS,DB > [...] located in cells associated with spiral galaxies is shown in magenta. The inset panel below each distribution shows the distribution of the difference in relative frequency for this galaxy type relative to those of the Nair & Abraham (2010) classifications.

Hα Equivalent Width Distribution of the NAIRsample & NUVNAIRsample

A similar investigation of the possible bias against subclasses of spiral galaxies for the OPTICALsample, respectively for the NUVsample, is not possible, as these lack independent visual classifications and T-types. However, to at least gain a qualitative insight into the possible biases for these larger samples, we make use of the distributions of Hα equivalent width (EQW), an observable used neither in our classification nor in that supplied by GALAXY ZOO.

Based on Hα EQW, galaxies are often divided into two main populations, ’line-emitting’ galaxies (i.e. galaxies with non-negligible Balmer line emission, usually actively star forming) and passive galaxies (very little/no line emission, usually quiescent). In general, spirals tend to exhibit Hα line emission (although a non-negligible fraction has very small Hα EQWs indicative of passive systems), while early-types are predominantly passive. Similarly, earlier type spirals often have smaller values of Hα EQW than later types (see e.g., Robotham et al. 2013 for a detailed discussion).

Figs. 10 & 11 show the distributions of Hα EQW for the NAIRsample and NUVNAIRsample, respectively. The distribution of the samples defined using the classifications of Nair & Abraham (2010) is again shown in gray, with that of the sample defined by GALAXY ZOO in black. In both cases the GALAXY ZOO defined sample is weighted more towards intermediate values of Hα EQW with respect to the classifications of Nair & Abraham (2010), showing evidence of a bias against low values of Hα EQW as well as, to a lesser extent, against the highest values. The distributions of Hα EQW of the samples defined by the selections (green) all display a similar, yet more pronounced bias against low values of Hα EQW. The selections, with the exception of (u − r, log(r_e), e), all also appear weighted against the highest values of Hα EQW. These biases against low values of Hα EQW may be considered to be consistent with the distributions of the T-types in the samples, with the selections favoring later type spirals.

In summary, we find that the GALAXY ZOO classifications display a simultaneous mild bias against early type spirals and systems with low values of Hα EQW for the NAIRsample and NUVNAIRsample, and that this bias is slightly more pronounced for the parameter combination based selections.

[Figure 5: panels of normalized parameter distributions; recoverable axis labels: log(r_e / kpc), log(M∗ / M⊙), log(µ∗ / M⊙ kpc^−2), M_i.]

Figure 5. Normalized distribution of the suite of 8 parameters as recovered for all GALAXY ZOO reliable spirals in the OPTICALsample (black dashed) and the selections defined using (u − r, log(r_e), e) (red), (log(n), log(r_e), log(µ∗)) (green), (log(n), log(r_e), M_i) (blue), and (log(n), log(M∗), log(µ∗)) (orange), applied to the OPTICALsample. The parameter distribution of spirals as defined by the classifications of Nair & Abraham (2010) in the NAIRsample is shown as a grey dash-dotted line.

[Figure 6: panels of normalized parameter distributions; recoverable axis labels: log(r_e / kpc), log(M∗ / M⊙), log(µ∗ / M⊙ kpc^−2), M_i.]

Figure 6. As Fig. 5 but for the NAIRsample.

[Figure 7: panels of normalized parameter distributions; recoverable axis labels: log(r_e / kpc), log(M∗ / M⊙), log(µ∗ / M⊙ kpc^−2), M_i.]

Figure 7. Normalized distribution of the suite of 8 parameters as recovered for all GALAXY ZOO reliable spirals in the NUVsample (black dashed) and the selections defined using (NUV − r, log(r_e), e) (red), (log(n), log(r_e), log(µ∗)) (green), (log(n), log(r_e), M_i) (blue), and (log(n), log(M∗), log(µ∗)) (orange), applied to the NUVsample. The parameter distribution of spirals as defined by the classifications of Nair & Abraham (2010) in the NUVNAIRsample is shown as a grey dash-dotted line.

Hα Equivalent Width Distribution of the OPTICALsample & NUVsample

Bearing this mild simultaneous bias in mind, we consider the distributions of Hα EQW for parameter combinations as applied to the OPTICALsample and the NUVsample, shown in Figs. 12 & 13, respectively.

The samples selected by the same parameter combinations as previously applied to the NAIRsample display a bias against low values of Hα EQW when applied to the OPTICALsample, similar to that observed for their application to the NAIRsample. Overall, all the considered parameter combinations recover the peak in the Hα EQW corresponding to star-forming galaxies well, with high values of Hα EQW being only minimally favored with respect to the GALAXY ZOO defined sample. However, all selections display a bias against very low values of Hα EQW, least so for the combination (u − r, log(r_e), e). The general trends in the distributions of Hα EQW appear very similar to those identified for the selections applied to the NAIRsample, hence we expect that the selections applied to the OPTICALsample will also exhibit a similar bias towards later type spirals.

It is important to note the very good agreement between the Hα EQW distributions of all reliable spirals in the OPTICALsample (black) and NUVsample (gray) shown in the panels of Fig. 13. This indicates that the NUV preselection itself is not introducing a strong bias. Nevertheless, NUV preselection does appear to lead to a slight bias against systems with low Hα EQW, favoring high Hα EQW systems. As for the OPTICALsample, the selections applied to the NUVsample display a bias against low values of Hα EQW, although the bias is reduced under NUV preselection. However, the parameter combinations are slightly more weighted towards high values of Hα EQW than for the OPTICALsample. Overall, the trends in the Hα EQW distributions are similar to those observed in the selections drawn from the OPTICALsample, the NAIRsample, and the NUVNAIRsample. Accordingly, we expect that the parameter based selections will be, to some extent, biased against early type spirals.

A final avenue of possible bias we address here is the dependence of the performance of the selection on the distance/redshift of the sources. This is of particular interest, as the parameters with the best performance are largely structural or related parameters, e.g. log(n), log(r_e), log(μ∗), and as such may depend on the resolution of the images in terms of physical sizes.

Over the time span corresponding to the redshift range of z = 0 − 0.13 we do not expect the distribution of galaxy morphologies to evolve in a significant manner (e.g. Bamford et al. 2009), hence the fraction of spirals should be approximately constant. However, as massive bright galaxies are less likely to be spirals than less massive, fainter galaxies, this will only be the case for volume-limited samples. In Fig. 14 we show the fraction of galaxies classified as spirals by the parameter combinations (u − r, log(r_e), e), resp. (NUV − r, log(r_e), e) in the case of NUV preselection, (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) for different volume-limited samples of galaxies. At top left we show the spiral fractions as a function of z for a volume-limited subsample of the NAIRsample extending to z = 0.07 (i.e. M_g < 16 − D(z = 0.07), where D(z) is the distance modulus and M_g is the absolute magnitude in the g band). We find that the spiral selections recovered by the parameter combinations (with the exception of (u − r, log(r_e), e)) are flat in z, and are in good agreement with the z dependence of the spiral selection for this sample defined by the visual classifications of Nair & Abraham (2010) (black dash-dotted line). The middle left panel shows that the distribution of spirals selected from a volume-limited subsample of the OPTICALsample extending to z = 0.09 (i.e. M_r < 17.7 − D(z = 0.09)) is flat in z for the selections not using colour as a parameter, while the bottom left panel shows a similar result for a volume-limited subsample of the OPTICALsample extending to z = 0.13 (i.e. M_r < 17.7 − D(z = 0.13)). In the latter two panels, the dash-dotted black line indicates the z dependence of the spiral fraction as defined by the GALAXY ZOO visual classifications. The decline in the spiral fraction is largely due to the certainty of the classifications decreasing with increasing z. If the assumption of a constant spiral fraction as a function of z is valid, these results may be seen to imply that for marginally resolved sources, the automatic cell-based non-parametric classification schemes may be superior to the GALAXY ZOO DR1 classifications.

The right hand panels of Fig. 14 show the results of applying the parameter combinations to NUV preselected samples, taking into account the UV sensitivity limits (i.e. with the additional requirement on the samples that M_NUV < 23 − D(z_sel), where z_sel is the limiting redshift of the sample). For the volume-limited subsample of the NUVNAIRsample we find, as for the NAIRsample, that the spiral fraction is flat in z. For the other volume-limited samples, although the selections are largely flat in z, there is nevertheless an increase with increasing redshift. Notably, the spiral fraction of selections which only depend on parameters determined at long wavelengths (e.g. (log(n), log(r_e), M_i)), and which have spiral distributions which are flat in z without the requirement of an NUV detection, also displays an increase with z under NUV preselection. This can most readily be understood in the context of an evolution in the UV properties of the volume-limited samples of spirals considered, with an increasing fraction of spiral galaxies with NUV emission as a function of increasing redshift z. Such a scenario is consistent with the observed decline in star-formation rate density from z [...] M∗ ≳ [...] M⊙ over this redshift range (Moustakas et al. 2013, and references therein). The volume-limited samples considered will be dominated by galaxies in this mass range and be accordingly sensitive to such evolutionary effects.

We note that as the redshift range spans over a Gyr in lookback time, some evolution in the spiral fraction may be expected, linked to a slight decline in the fraction of spirals with decreasing z, i.e. we do not expect a perfectly constant fraction of spirals. Nevertheless, the lack of any major dependence of the spiral fraction on redshift implies that no major redshift dependent biases are introduced into the selection when using combinations of three parameters with the non-parametric cell-based method, and that the method may even prove to be more reliable than visual classifications.

[Figure 9 panels: (u − r, log(r_e), e); (log(n), log(r_e), log(μ∗)); (log(n), log(r_e), M_i); (log(n), log(M∗), log(μ∗)); axes: relative frequency f and Δf vs. T-type (Nair & Abraham 2010)]

Figure 9. As Fig. 8 but for galaxies in the NUVNAIRsample.

[Figure 10 panels: (u − r, log(r_e), e); (log(n), log(r_e), log(μ∗)); (log(n), log(r_e), M_i); (log(n), log(M∗), log(μ∗)); axes: relative frequency f and Δf vs. log(Hα EQW [Angstrom])]

Figure 10. Distribution of Hα EQW for galaxies in the NAIRsample classified as spirals based on the classifications of Nair & Abraham (2010) (gray), GALAXY ZOO (black), and the parameter combination listed top left (green). The Hα EQW distribution of galaxies with P_CS,DB > [...] located in cells associated with spiral galaxies is shown in magenta. The inset panel below each distribution shows the distribution of the difference in relative frequency for each bin in Hα EQW relative to that of the Nair & Abraham (2010) classifications.

[Figure 11 panels: (NUV − r, log(r_e), e); (log(n), log(r_e), log(μ∗)); (log(n), log(r_e), M_i); (log(n), log(M∗), log(μ∗)); axes: relative frequency f and Δf vs. log(Hα EQW [Angstrom])]

Figure 11. As Fig. 10 but for galaxies in the NUVNAIRsample.

[Figure 12 panels: (u − r, log(r_e), e); (log(n), log(r_e), log(μ∗)); (log(n), log(r_e), M_i); (log(n), log(M∗), log(μ∗)); axes: relative frequency f and Δf vs. log(Hα EQW [Angstrom])]

Figure 12. Distribution of Hα EQW for galaxies in the OPTICALsample classified as spirals by GALAXY ZOO (black), and the parameter combination listed top left (green). The Hα EQW distribution of galaxies with P_CS,DB > [...] located in cells associated with spiral galaxies is shown in magenta. The inset panel below each distribution shows the distribution of the difference in relative frequency for each bin in Hα EQW relative to that of the GALAXY ZOO classifications.
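The inset panels described in the captions of Figs. 10 to 13 show a per-bin difference in relative frequency between a selection and a reference classification. A minimal sketch of that quantity is given below; the function names, the toy values, and the sign convention (selection minus reference) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def relative_frequency(values, bin_edges):
    """Fraction of the sample in each bin (sums to 1 if all values lie inside the edges)."""
    counts, _ = np.histogram(values, bins=bin_edges)
    total = counts.sum()
    return counts / total if total > 0 else np.zeros(len(counts))

def delta_f(selection, reference, bin_edges):
    """Per-bin difference in relative frequency; sign convention is an assumption."""
    return relative_frequency(selection, bin_edges) - relative_frequency(reference, bin_edges)

# Toy log(H-alpha EQW) values, purely illustrative.
edges = np.linspace(-1.0, 3.0, 5)  # bin edges at -1, 0, 1, 2, 3
reference = np.array([-0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5])
selection = np.array([0.5, 1.5, 1.5, 2.5])

print(delta_f(selection, reference, edges))
```

A negative entry in the lowest bin corresponds to the kind of bias against low Hα EQW discussed in the text: the selection contains relatively fewer such sources than the reference.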

Using the cell-based method presented in Sect. 3 we have identified combinations of parameters including log(r_e), log(μ∗), log(n), log(M∗), and M_i, in particular (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)), to result in simultaneously pure and complete samples of spirals. These selections appear to be robust against redshift dependent biases, and to be largely unbiased in their parameter distributions, only displaying a slight bias against early type spirals. Accordingly, the cell-based method using these combinations appears well suited to selecting samples of spiral galaxies. In the following we investigate the contribution of the cell-based method to the demonstrable success, and compare its performance to a selection of widely used morphological proxies, as well as to a novel algorithmic approach based on support vector machines (Huertas-Company et al. 2011).

[Figure 13 panels: (NUV − r, log(r_e), e); (log(n), log(r_e), log(μ∗)); (log(n), log(r_e), M_i); (log(n), log(M∗), log(μ∗)); axes: relative frequency f and Δf vs. log(Hα EQW [Angstrom])]

Figure 13. As Fig. 12 but for galaxies in the NUVsample.

While the use of the parameter combinations in concert with the cell-based method presented in Sect. 3 can lead to simultaneously pure and complete samples of spiral galaxies, the use of the cell-based method requires a training sample, ideally of ≳ 30k galaxies (cf. Fig. 2). In contrast to this, the advantage of simple hard cuts on parameters is that they require no (or much smaller) such calibration samples. In our investigations we have made use of a suite of parameters including ones traditionally used in the morphological classification of spirals (e.g. n), as well as novel parameters such as μ∗. In order to investigate to what extent the demonstrable success is due to the parameters used, and what the effect of the cell-based algorithm is, we have applied the combinations (u − r, log(r_e), e), (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) to the OPTICALsample and the NAIRsample using fixed boundaries derived by eye from the parameter distributions shown in Fig. 3. In this context we have chosen to treat galaxies with u − r .

1, log( r e ) . e > .

3, log( n ) . µ ∗ ) .

3, log( M ∗ ) .

7, and M i > −

22 as spirals. The results tabulated in Table 9 show that the bijective discrimination power of the selections using fixed boundaries is much lower than when the same parameter combinations are used with the cell-based method. It is clear that the use of fixed boundaries entails a strong trade-off between purity and completeness. Although the parameter combinations (u − r, log(r_e), e), (log(n), log(r_e), log(μ∗)), and (log(n), log(r_e), M_i) all attain high values of purity (even ∼0.05 greater than with the cell-based method), they are highly incomplete, with completeness values of ∼[...]−[...].

[Figure 14 panels: spiral fraction vs. z. Left: volume-limited subsamples with M_g < 16 − D(z=0.07), M_r < 17.7 − D(z=0.09), and M_r < 17.7 − D(z=0.13). Right: the same with the additional requirement M_NUV < 23 − D(z) at the respective limiting redshift]

Figure 14. Spiral fraction as a function of redshift z in bins of width 0.01 for selections defined using (u − r, log(r_e), e) resp. (NUV − r, log(r_e), e) (red), (log(n), log(r_e), log(μ∗)) (green), (log(n), log(r_e), M_i) (blue), and (log(n), log(M∗), log(μ∗)) (orange), respectively. The top left panel shows the results for the combinations applied to a volume-limited subsample of the NAIRsample (the selection criteria are indicated in each panel). The redshift dependence of the spiral fraction defined by the classifications of Nair & Abraham (2010) in the considered subsample is shown as a black dash-dotted line. Error bars indicate Poisson 1-σ uncertainties. The top right panel shows the same, but applied to a subsample of the NUVNAIRsample as defined in the panel. The middle and bottom left panels show the redshift dependence of the spiral fraction for the selection applied to two volume-limited subsamples of the OPTICALsample, with the GALAXY ZOO defined reliable spiral fraction shown as a black dash-dotted line. The middle and bottom right panels show the same for the NUVsample.

The parameter combination (log(n), log(M∗), log(μ∗)), on the other hand, attains a completeness only ∼0.07 less than the cell-based method, but with the purity of the selection reduced by ∼0.1. The high values of completeness, attained simultaneously with the high values of purity when making use of the parameter combinations together with the cell-based method, thus appear largely due to the flexibility of the boundaries given by the cell-based method.
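The trade-off between purity and completeness under fixed boundaries can be illustrated with a short sketch. It assumes that purity is the fraction of selected galaxies that are reference spirals and that completeness is the fraction of reference spirals recovered; the tabulated values in Tables 9 and 10 are consistent with the bijective discrimination power being the product of the two. The cut values, array names, and toy data below are hypothetical and are not the boundaries used in the paper.

```python
import numpy as np

def selection_metrics(selected, is_spiral):
    """Return (purity, completeness, bijective discrimination power).

    Assumed definitions: purity = true spirals among the selected galaxies,
    completeness = selected fraction of all reference spirals,
    P_bij = purity * completeness.
    """
    selected = np.asarray(selected, dtype=bool)
    is_spiral = np.asarray(is_spiral, dtype=bool)
    n_selected = selected.sum()
    n_spirals = is_spiral.sum()
    n_true = (selected & is_spiral).sum()
    purity = n_true / n_selected if n_selected else 0.0
    completeness = n_true / n_spirals if n_spirals else 0.0
    return purity, completeness, purity * completeness

# Toy catalogue; values and thresholds are hypothetical.
log_n = np.array([0.0, 0.1, 0.5, 0.6, 0.2, 0.7])
log_mu = np.array([8.0, 8.5, 8.1, 9.0, 8.2, 8.9])
is_spiral = np.array([True, True, True, False, False, False])

# Fixed (hard) boundaries applied simultaneously to both parameters.
selected = (log_n < 0.4) & (log_mu < 8.3)

purity, completeness, p_bij = selection_metrics(selected, is_spiral)
print(purity, completeness, p_bij)
```

Tightening either hypothetical boundary can only shrink the selected set, which illustrates why hard cuts tend to raise purity at the expense of completeness.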

Table 9. Purity, completeness, bijective discrimination power, and contamination for the combinations (u − r, log(r_e), e), (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) using fixed boundaries, applied to the OPTICALsample (columns 2-5) and the NAIRsample using the GALAXY ZOO visual classifications (columns 6-9) as well as the independent classifications of Nair & Abraham (2010, columns 10-12).

                              OPTICALsample                 NAIRsample (GALAXY ZOO)       NAIRsample (Nair & Abraham 2010)
Parameter combination         P_pure P_comp P_bij  P_cont   P_pure P_comp P_bij  P_cont   P_pure P_comp P_bij
(u − r, log(r_e), e)          0.793  0.398  0.316  0.015    0.911  0.257  0.234  0.006    0.961  0.236  0.227
(log(n), log(r_e), log(μ∗))   0.794  0.567  0.450  0.006    0.934  0.487  0.455  0.007    0.976  0.442  0.431
(log(n), log(r_e), M_i)       0.782  0.507  0.396  0.007    0.922  0.372  0.343  0.013    0.965  0.339  0.327
(log(n), log(M∗), log(μ∗))    0.654  0.700  0.458  0.028    0.861  0.573  0.493  0.023    0.946  0.547  0.517

Having identified the cell-based method used with combinations of three parameters including log(r_e), log(μ∗), log(n), log(M∗), and M_i, in particular (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)), as a method to select simultaneously pure and complete samples of spirals, we compare its performance to that of a selection of widely used morphological proxies, as well as to that of a novel algorithmic approach based on support vector machines (Huertas-Company et al. 2011).

Two well-known proxies for the general morphological type of a galaxy are the concentration index in the r band, defined as C_r = R_90,r / R_50,r, where R_90,r and R_50,r are the radii within which 90 resp. 50 per cent of the galaxy's (Petrosian) flux are contained, and the Sérsic index n, i.e., the index obtained for the best fit of a Sérsic profile (Sérsic 1968) to the galaxy's light distribution. Strateva et al. (2001) suggest the use of the concentration index as a proxy for morphological classification with galaxies with C_r < 2.6 [...] n < 2.5 [...] u − r colour vs. absolute r magnitude diagram, with the separator parameterized by a combination of a constant and a tanh function dependent on the absolute r band magnitude (their Eq. 11).

A different approach, also making use of two parameters, has been adopted by Tempel et al. (2011).
They define a subvolume in the two dimensional space spanned by the SDSS parameters f_deV (i.e., the fraction of a galaxy's flux which is fit by the de Vaucouleurs profile (de Vaucouleurs 1948) in the best fit linear combination of a de Vaucouleurs and an exponential profile) and q_exp (the axis ratio of the SDSS best fit exponential profile) associated with spiral galaxies and calibrated on visual classifications of SDSS galaxies in the Sloan Great Wall region (Einasto et al. 2010) and GALAXY ZOO.

Recently Huertas-Company et al. (2011) have published a catalogue of morphological classifications of SDSS DR7 spectroscopic galaxies based on support vector machines, which compare well with GALAXY ZOO classifications of the same sample. Similarly to GALAXY ZOO, Huertas-Company et al. (2011) assign probabilities to the possible galaxy classes, so that for the purposes of our comparison we have chosen to treat objects with a probability greater than 70 per cent of being a spiral as a spiral, analogously to our treatment of the GALAXY ZOO sample.

Footnote: Huertas-Company et al. (2011) provide probabilistic morphological classifications for all but 311 of the sources in our sample.

Table 10 shows the purity, completeness, and bijective discrimination power for the five morphological proxies discussed above as well as the three parameter combinations applied to the OPTICALsample and the NAIRsample. All morphological proxies, with the exception of that proposed by Tempel et al. (2011), attain values of completeness similar to, or larger than, that of the cell-based method when applied to the OPTICALsample, although only the classification of Huertas-Company et al. (2011) achieves a completeness notably exceeding that of the cell-based method (P_comp = 0.903). [...] OPTICALsample, much lower than the value of ≈ 75 per cent achieved by the cell-based method, the exception again being the method of Tempel et al. (2011). As a result, the bijective discrimination power of these selections is lower than that achieved by the optimal combinations of three parameters using the cell-based method, with only the method of Huertas-Company et al. (2011) attaining a comparable value of P_bij. However, the contamination by ellipticals introduced by the proxies considered is at least a factor 3 greater than that resulting from the cell-based method.

Applied to the brighter NAIRsample the purity of the considered proxies increases notably, while the completeness slightly decreases. The purity of the selections resulting from the use of the considered proxies remains significantly lower than that achieved by the parameter combinations, both when using the GALAXY ZOO visual classifications and when using those of Nair & Abraham (2010), as can also be seen in the distributions of the T-types in the samples selected by the considered proxies (Fig. 15). The completeness, on the other hand, is greater than for the parameter based selections, so that the bijective discrimination power of the considered proxies is comparable to that of the parameter based selections when applied to the NAIRsample.

As can be seen in Fig. 15, the T-type distributions of the considered proxies display a bias towards later type spirals very similar to that of the cell-based selections. However, the bias against Sa and Sa/b galaxies appears to be slightly less pronounced, with the relative frequency of early type galaxies being marginally higher for the samples recovered by the proxies than by the cell-based selections. On the other hand, the T-type distributions in Fig. 15 also show the considerably larger contamination by ellipticals not present in the cell-based selections.

Considering the distributions of Hα EQW for the samples obtained by these proxies applied to the NAIRsample as shown in Fig. 16, one finds that the samples recovered by the proxies (with the exception of the methods of Huertas-Company et al. 2011 and Tempel et al. 2011) display a bias towards sources with large values of Hα EQW, considerably more so than the cell-based selections, with ∼10% more of the sample consisting of high Hα EQW sources than in the samples recovered by the cell-based method. This result is most pronounced for the samples selected by the concentration index, the Sérsic index and the method of Baldry et al. (2004). Similar but more pronounced results are obtained if one considers the distributions of Hα EQW for the samples obtained by these proxies applied to the OPTICALsample, as shown in Fig. 17. In contrast, the selections based on the method of Tempel et al. (2011) and Huertas-Company et al. (2011) appear to be weighted more towards high and low values of Hα EQW than the GALAXY ZOO reference and the selections based on the parameter combinations used in concert with the cell-based method.

Overall, we find that the selections resulting from the proxies are similar to, or more biased than, the selections based on the cell-based method, and are clearly more contaminated.

In conclusion, we thus find that for the purpose of selecting a pure, yet nevertheless largely complete, sample of spiral galaxies, not limited to the brightest galaxies, the use of the cell-based method presented in combination with one of the optimal parameter combinations is preferable over the investigated well-established proxies, and at least comparable to the sophisticated approach of Huertas-Company et al. (2011).
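The two simplest proxies compared in this section, the r-band concentration index and the Sérsic index, reduce to one-parameter hard cuts. The sketch below assumes the thresholds C_r < 2.6 and n < 2.5 as they appear in the figure panel labels; the function and argument names are hypothetical, and the input radii are taken to be the Petrosian R_90 and R_50 in the same units.

```python
import numpy as np

def concentration_index(r90, r50):
    """C_r = R_90,r / R_50,r for radii given in the same units."""
    return np.asarray(r90, dtype=float) / np.asarray(r50, dtype=float)

def spiral_by_concentration(r90, r50, c_max=2.6):
    """Flag galaxies as spiral candidates when C_r lies below c_max."""
    return concentration_index(r90, r50) < c_max

def spiral_by_sersic(n, n_max=2.5):
    """Flag galaxies as spiral candidates when the Sersic index lies below n_max."""
    return np.asarray(n, dtype=float) < n_max

# Toy values: one disk-like and one concentrated object.
r90 = [10.0, 9.0]
r50 = [5.0, 3.0]
sersic_n = [1.0, 4.0]

print(spiral_by_concentration(r90, r50))
print(spiral_by_sersic(sersic_n))
```

Because each proxy uses a single fixed boundary, it cannot adapt to the joint parameter distribution in the way the cell-based method does, which is in line with the higher contamination reported for the proxies in Table 10.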

Using the non-parametric cell-based method presented, we have successfully identified several combinations of three parameters which allow for an efficient and rapid selection of pure and simultaneously complete, largely unbiased samples of spiral galaxies. When applied to parent samples not limited to the brightest galaxies, these are superior in performance, in terms of bijective discrimination power and bias (e.g. in Hα EQW), to the widely established simple morphological proxies investigated, such as the concentration index C_r, the Sérsic index n, and the division into red and blue galaxies. Furthermore, they are at least comparable in performance to the algorithmic approach using SVMs of Huertas-Company et al. (2011).

However, depending upon the effort required to obtain a given parameter, either in terms of data processing or acquisition, the 'cost' of parameters, and hence of parameter combinations, will vary. For example, a parameter combination including only quantities such as r_e, M_i, u − r, and e which can, at least for reasonably resolved sources, often be measured directly by SExtractor (Bertin & Arnouts 1996) is 'cheaper' than a combination involving parameters which require additional data reduction such as fitting Sérsic profiles using, e.g., GIM2D (Simard et al. 2002) or GALFIT (Peng et al. 2002). Similarly the relative 'cost' of additional NUV data is much higher than that of relying solely on optical passbands, as it involves the use of additional observational facilities.

Footnote: Where high resolution imaging is available these codes themselves present a different method of automatic morphological classification, as they can perform multiple component fits which can be used to determine the morphological type of a galaxy. However, the requirements on resolution are severe and fitting multiple components is often not justified (Simard et al. 2011).

[Figure 15 panels: n < 2.5; Baldry+2004; Huertas-Company+2011; Tempel+2010; C_r < 2.6; axes: relative frequency f and Δf vs. T-type (Nair & Abraham 2010)]

Figure 15. T-type distributions of the discussed selection methods applied to the NAIRsample indicated top left in each panel. The distribution of GALAXY ZOO spirals with P_CS,DB > [...] P_CS,DB > [...]

Table 10. Purity, completeness, bijective discrimination power, and contamination for other widely used morphological proxies, applied to the OPTICALsample (columns 2-5) and the NAIRsample using the GALAXY ZOO visual classifications (columns 6-9) as well as the independent classifications of Nair & Abraham (2010, columns 10-12). The values attained by the combinations (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) are shown for comparison.

                              OPTICALsample                 NAIRsample (GALAXY ZOO)       NAIRsample (Nair & Abraham 2010)
Method                        P_pure P_comp P_bij  P_cont   P_pure P_comp P_bij  P_cont   P_pure P_comp P_bij
(log(n), log(r_e), log(μ∗))   0.739  0.774  0.572  0.017    0.884  0.712  0.629  0.024    0.945  0.660  0.624
(log(n), log(r_e), M_i)       0.740  0.779  0.576  0.021    0.879  0.706  0.621  0.032    0.935  0.652  0.610
(log(n), log(M∗), log(μ∗))    0.731  0.773  0.565  0.019    0.885  0.707  0.626  0.024    0.946  0.657  0.621
Huertas-Company et al., 2011  0.588  0.903  0.531  0.077    0.806  0.836  0.673  0.054    0.898  0.802  0.720
Baldry et al., 2004           0.522  0.802  0.419  0.081    0.745  0.747  0.557  0.115    0.834  0.721  0.601
Tempel et al., 2011           0.648  0.411  0.266  0.078    0.786  0.387  0.304  0.064    0.896  0.380  0.340
n < 2.5                       [...]
C_r < 2.6                     [...]

Encouragingly, we find that various parameter combinations perform similarly well, allowing for a choice of parameter combination informed by both the envisioned science application and the relative 'expense' of the parameters used.

Overall, the most important parameters in selecting a sample of spiral galaxies are the effective radius log(r_e), the stellar mass surface density log(μ∗), and the Sérsic index log(n). These parameters perform especially well in combination with the stellar mass or a tracer thereof (e.g. M_i). We find the combinations (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) to be those with the greatest bijective discrimination power when applied to the OPTICALsample. These are also amongst the most powerful under NUV preselection, although the combination (NUV − r, log(r_e), M_i) is comparably powerful. In the latter case, however, the selection appears to be driven by the parameters M_i and, in particular, log(r_e). In terms of relative 'expense' the combinations requiring NUV preselection are more 'expensive' than those applicable to the whole sample. Although the best-performing combinations all require Sérsic profiles to be fit, the cost is strongly ameliorated by the fact that only single Sérsic profiles are required.

Unsurprisingly, the ellipticity e proves to be an effective parameter, as only spirals seen edge-on appear strongly elliptical. In this sense, it even counters the bias against edge-on spirals, which can be introduced by using UV-optical colours as selection parameters, as dusty edge-on spirals may drop out of a colour selection due to attenuation of their UV emission. However, selections using e as a parameter are strongly biased against any spirals seen approximately face-on, respectively not edge-on. Thus, while the observed ellipticity represents a powerful criterion for selecting a pure sample of spirals and has a low relative cost, it leads to generally less complete samples, which are strongly biased towards edge-on systems.

Although our results indicate that simple structural parameters derived at longer wavelengths are efficient at selecting spirals, the combinations (NUV − r, log(r_e), M_i), and to a lesser extent (u − r, log(n), log(r_e)), indicate that UV/optical colours linked to younger stellar populations do provide valuable information for selecting spiral galaxies. As mentioned above, however, use of UV-optical colour as a parameter can lead to biases in the selection. Dust in spirals will cause galaxies seen edge-on to appear very red, hence the use of a UV-optical colour can bias the selection against these systems. Furthermore, UV-optical colour selection can introduce a bias against any spirals which appear intrinsically red due to lack of star formation. This is the case both for the u − r and NUV − r colours. Finally, when using a colour as a parameter (in particular a UV colour) the possibility of different depths of photometry must be accounted for, i.e., the photometry in both bands must be deep enough to ensure that the entire range of colour normally attributed to the galaxy population is covered over the entire redshift range of the sample. Failure to do so will give rise to both additional incompleteness and a colour bias in the resulting sample.

Depending on the science application for which the sample is intended, and on the availability of data, different combinations may be optimal in selecting spiral galaxies. For example, using the combination (log(n), log(r_e), M_i) would be appropriate to obtain a selection of spiral galaxies for a project aiming at investigating the total star formation rates of a large sample of spiral galaxies as derived from the UV. Such a selection would avoid a bias against quiescent systems, as would be introduced by using a NUV preselection or a UV-optical colour, while also guarding against any orientation biases which could arise if e was used as a selection parameter. Accordingly such a sample would be largely unbiased with respect to star formation characteristics.
Another suitable combination for such an application would be (log(n), log(r_e), log(μ∗)), which is also largely independent of UV-optical colours.

Footnote: The stellar mass estimate used in deriving μ∗ does depend on an optical colour, i.e. the g − i colour; however, this colour is linked mainly to intermediate age and old stellar populations. Given photometry of sufficient depth, the g − i colour does not present a direct selection criterion but is only used in calculating the stellar mass, such that M∗ and μ∗ can be considered unbiased in terms of star-formation properties. Furthermore, the stellar mass M∗ derived in this manner is largely independent of dust attenuation (Bell & de Jong 2001; Nicol et al. 2011; Taylor et al. 2011).

Conversely, however, a sample which required the greatest achievable purity should include both NUV preselection and e as a parameter. Thus, the selection can and should be adapted to the science case at hand, although the absence of any requirement for UV data allows the method to be easily applied to very large samples with minimum requirements on wavelength coverage.

Figure 16. Hα EQW distributions of the discussed selection methods indicated top left in each panel applied to the NAIRsample. The distribution of GALAXY ZOO spirals with P_CS,DB > . P_CS,DB > . α EQW relative to that of the Nair & Abraham (2010) classifications.
[Panels: n < 2.5; Baldry+2004; Huertas-Company+2011; Tempel+2010; C_r < 2.6. Horizontal axis: log(Hα EQW [Angstrom]); vertical axes: relative frequency f and Δf.]

Figure 17. Hα EQW distributions of the discussed selection methods indicated top left in each panel applied to the OPTICALsample. The distribution of GALAXY ZOO spirals with P_CS,DB > . P_CS,DB > . α EQW relative to that of the GALAXY ZOO classifications.
[Panels: n < 2.5; Baldry+2004; Huertas-Company+2011; Tempel+2010; C_r < 2.6. Horizontal axis: log(Hα EQW [Angstrom]); vertical axes: relative frequency f and Δf.]

As discussed in Sect. 6.1, we find that the most important parameters in selecting spirals are the effective radius log(r_e), the stellar mass surface density log(μ∗), and the Sérsic index log(n) in combination with the stellar mass or a tracer thereof (e.g. M_i). In addition, e leads to very pure if incomplete selections. All these properties are derived in pass-bands normally associated with older stellar populations (g, r, and i), rather than with recent star formation. The success achieved by using parameters not obviously directly related to the young stellar population is remarkable and implies that the spiral and non-spiral populations are more or less distinct in these parameters. While the success of e is based on the appearance in projection of spiral galaxies, that of log(r_e) and log(μ∗), on the other hand, entails that the radial extent and in particular the ratio of mass to size of the old stellar population is distinctly different in spirals and ellipticals. Rotationally supported systems (i.e.
spirals) appear to be significantly more extended than pressure supported systems (i.e. spheroidals/ellipticals) at a given stellar mass. This is consistent with the notion that the stellar populations evolve via distinct evolutionary tracks for disks and spheroids, with the evolution of present day spirals thought to involve a smooth infall of gas and inside-out star formation, with merger activity restricted to minor mergers. In contrast, ellipticals are thought to be the products of major mergers in which angular momentum is redistributed, making the central system more compact (e.g., Bournaud, Jog, & Combes 2007, and references therein).

Footnote: This size dichotomy can be boosted further by the presence of dust in the disks, which can increase the apparent size of disks relative to the intrinsic size (Möllenhoff, Popescu, & Tuffs 2006; Pastrav et al. 2013a).

In light of our results we emphasize that parameters linked to the old stellar population of galaxies, normally not employed in the classification of spirals, may provide valuable information on the morphology of a galaxy. In particular the stellar mass surface density and/or the radial extent (together with another parameter, e.g. M_i) may be powerful due to the physically motivated characterization parameters.

We have shown the cell-based method to work well for SDSS galaxies, in particular a subset of the SDSS spectroscopic sample. Hence we expect the method to be applicable to samples of similar depth and similar angular resolution, and thus be applicable to upcoming surveys similar to SDSS, e.g. SKYMAPPER (The Skymapper Southern Sky Survey; Keller et al. 2007). Many upcoming surveys (DES, VST ATLAS, KiDS, and GAMA (Galaxy And Mass Assembly; Driver et al. 2011)), as well as SDSS itself, however, extend to greater photometric depths than the sample used here. To answer the question of how applicable the method is to other, deeper surveys we have used a sample consisting of the 50k r-band brightest galaxies in the OPTICALsample (i.e. m_r < . 48) as a calibration sample and have subsequently classified the faintest 50k galaxies (m_r > . n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)). The results are shown in Table 11, where we have included the results obtained using the calibration sample employed in sect. 4, as well as the results obtained using the widely used proxies discussed in sect. 5 for comparison. Using the bright subsample to classify the faint subsample we find that the selections are very complete, yet appear to be less pure than when classifying the entire OPTICALsample. However, this is largely due to a decrease in the certainty of the GALAXY ZOO classifications for sources which appear fainter, as they predominantly lie at greater redshifts and are smaller and less resolved. This is underscored by the very low values of contamination achieved for the different combinations. The performance of the cell-based method remains easily superior to that of the simple proxies, achieving much greater purity and similar completeness. These results suggest that galaxy samples extending faintwards of the SDSS spectroscopic limit can also be classified using the method presented (cf. also Sect. 4.3.3).

Penultimately, the increased angular resolution and sensitivity of the upcoming surveys with respect to SDSS may allow the method to be extended to sources at higher redshifts than the current very local sample. A somewhat similar approach defining subspaces associated with early- and late-type galaxies using U − V and V − J restframe colours, calibrated using HST ACS imaging, has recently been proposed by Patel et al. (2012) for galaxies at z ∼ .

Table 11. Purity, completeness, bijective discrimination power, and contamination for the combinations (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) and the proxies discussed in sect. 5 applied to the faintest 50k galaxies in the OPTICALsample, i.e. m_r > . 24. The results are presented for calibrations of the cell based method using the brightest 50k galaxies in the OPTICALsample (m_r < .

Selection                       P_pure  P_comp  P_bij   P_cont    P_pure  P_comp  P_bij   P_cont
(log(n), log(r_e), log(μ∗))     0.596   0.860   0.513   0.009     0.657   0.787   0.517   0.005
(log(n), log(r_e), M_i)         0.607   0.861   0.523   0.009     0.664   0.799   0.530   0.006
(log(n), log(M∗), log(μ∗))      0.602   0.844   0.508   0.009     0.647   0.781   0.506   0.006
Huertas-Company et al., 2011    0.477   0.934   0.446   0.078
Baldry et al., 2004             0.434   0.825   0.358   0.098
Tempel et al., 2011             0.549   0.551   0.302   0.071
n < .
C_r < .

The cell-based method presented here could, in principle, be adapted to identifying reliable samples of elliptical galaxies in an analogous fashion to that described for the identification of spirals. A certain population of the cells, dependent upon the requirements imposed, will not be assignable to either the spiral or the elliptical subvolume and will remain undefined. However, it is by no means clear that the parameter combinations which perform best at selecting a pure and complete population of spirals will do the same for ellipticals. As our focus has been to identify a method of reliably selecting spirals, we do not further discuss the selection of ellipticals. We note, however, that it would be straightforward to implement and optimize such a method. We have also supplied the elliptical fractions and relative errors for the three discretizations supplied in appendix A.
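The definitions of the figures of merit are given earlier in the paper and are not restated in this section. The complete rows of Table 11 are, however, consistent to within rounding with the bijective discrimination power being the product of purity and completeness. The short check below assumes that relation; the function name `bijective_power` is purely illustrative and the numbers are transcribed from the table.

```python
# Rows of Table 11 that list purity, completeness and bijective power.
# Each tuple is (P_pure, P_comp, P_bij) as printed in the table.
TABLE_11 = [
    (0.596, 0.860, 0.513), (0.657, 0.787, 0.517),  # (log n, log r_e, log mu*)
    (0.607, 0.861, 0.523), (0.664, 0.799, 0.530),  # (log n, log r_e, M_i)
    (0.602, 0.844, 0.508), (0.647, 0.781, 0.506),  # (log n, log M*, log mu*)
    (0.477, 0.934, 0.446),                         # Huertas-Company et al., 2011
    (0.434, 0.825, 0.358),                         # Baldry et al., 2004
    (0.549, 0.551, 0.302),                         # Tempel et al., 2011
]


def bijective_power(purity, completeness):
    """Assumed relation: P_bij = P_pure * P_comp (inferred from the table)."""
    return purity * completeness


# Largest difference between the assumed product and the tabulated P_bij.
max_residual = max(abs(bijective_power(p, c) - b) for p, c, b in TABLE_11)
```

The residual stays below the rounding precision of the three-decimal table entries, which is what one would expect if the product relation holds.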

The use of parametric methods, such as linear discriminant analysis for example, in classifying galaxies is attractive, as these methods are capable of assigning a probabilistic classification to the morphology of a galaxy, rather than a binary one such as that presented here, which will suffer from contamination due to quantization effects. Furthermore, as also discussed in section 3, calibrating the cell based method requires substantial samples of galaxies with visual classifications, while the training sets for parametric methods can be smaller. However, the applicability of such a parametric method depends on the probability distributions of galaxy properties conforming to the assumed parameterization, which may not be the case. Obviously, a strength of the non-parametric method presented in this work is that it removes such biases arising from assumptions about the correct parameterization.

We suggest that the non-parametric method presented here can also be used to investigate the performance of parametric methods. If the results of both approaches are in reasonable agreement it may be possible to confidently employ the parametric method to select samples, relaxing the required size of a putative calibration sample. A further investigation into the performance of multi-parameter morphological classifications using linear discriminant analysis and the cell-based method presented here as a comparison will be presented in a companion paper (Robotham et al., in prep.).
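To make the binary, quantized nature of the cell-based classification concrete, the following is a minimal sketch of such a scheme, not the implementation used in this work. It assumes a three-parameter space divided into rectangular cells, a calibration sample with visual classifications, and a cell being assigned to the spiral subvolume when its spiral fraction is high enough and its relative error small enough. The function names, the Poisson-style error estimate and both threshold values are illustrative assumptions.

```python
import numpy as np


def calibrate_cells(params, is_spiral, edges, min_fraction=0.5, max_rel_err=1.0):
    """Flag cells of the discretized parameter space as spiral cells.

    params    : (N, 3) array of parameters for the calibration sample
    is_spiral : (N,) boolean array of visual classifications
    edges     : list of three 1-D arrays of cell edges
    The thresholds are placeholders, not the values adopted in the paper.
    """
    total, _ = np.histogramdd(params, bins=edges)
    spirals, _ = np.histogramdd(params[is_spiral], bins=edges)
    # Spiral fraction per cell; empty cells are given a fraction of zero.
    frac = np.divide(spirals, total, out=np.zeros_like(total), where=total > 0)
    # Assumed Poisson-style relative error on the fraction (infinite if empty).
    inv_s = np.divide(1.0, spirals, out=np.full_like(total, np.inf), where=spirals > 0)
    inv_t = np.divide(1.0, total, out=np.full_like(total, np.inf), where=total > 0)
    rel_err = np.sqrt(inv_s + inv_t)
    return (frac >= min_fraction) & (rel_err <= max_rel_err)


def classify(params, spiral_cells, edges):
    """Return True for galaxies that fall inside a spiral cell."""
    inside = np.ones(len(params), dtype=bool)
    index = []
    for k, e in enumerate(edges):
        i = np.searchsorted(e, params[:, k], side="right") - 1  # cell index
        inside &= (i >= 0) & (i < len(e) - 1)                    # within the grid
        index.append(np.clip(i, 0, len(e) - 2))
    return inside & spiral_cells[tuple(index)]


# Synthetic calibration data: two well separated clumps in parameter space.
rng = np.random.default_rng(0)
params = np.vstack([rng.uniform(0.1, 0.9, size=(200, 3)),
                    rng.uniform(1.1, 1.9, size=(200, 3))])
is_spiral = np.arange(400) < 200
edges = [np.array([0.0, 1.0, 2.0])] * 3
cells = calibrate_cells(params, is_spiral, edges)
selected = classify(params, cells, edges)
```

Cells that are empty or poorly populated fail the error criterion and remain unassigned, which mirrors the statement above that some cells cannot be attributed to either subvolume.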

As an application of the cell-based technique for selecting spiral galaxies we use it to rederive the empirical scaling relation between the specific star-formation rate and the stellar mass (the ψ∗ − M∗ relation) for this class of objects. Previous derivations of the ψ∗ − M∗ relation have used galaxy samples sensitive to star formation properties in their definition, thus potentially biasing the obtained results. A further factor influencing the derivation of the ψ∗ − M∗ relation is the attenuation of stellar emission from the galaxy due to its dust content, which introduces a large component of scatter, as well as potentially of bias, into the relation. Here we capitalize on the selection of a relatively pure sample of galaxies of known disk-like geometry, by applying a radiation transfer technique to correct for the attenuation of stellar emission by dust, utilizing the geometrical information (effective radii & axis ratio) of each galaxy. To this end, we utilize the method of Grootes et al. (2013), who have presented a method to obtain highly accurate radiation transfer based attenuation corrections on an object-by-object basis, using only broadband optical photometric observables not directly linked to star formation, in particular the stellar mass surface density. The method of Grootes et al. (2013), however, critically relies on the underlying radiation transfer model of Popescu et al. (2011) being applicable to the galaxies considered, and thus requires a clean sample of galaxies with disk geometry not hosting AGN.

ψ∗ − M∗ relation for morphologically-selected spiral galaxies in the local universe

Starting from the OPTICALsample we define a sample of spirals using the cell-based method and the parameter combination (log(n), log(r_e), M_i) and impose a redshift limit of z = 0.05. As shown by e.g. Taylor et al. (2011), the SDSS with a limiting depth of r_petro, = 17.77 is ≳ 80 % complete for M∗ > . M⊙ to this redshift. The sample considered thus represents a volume-limited sample for this mass range. The sample is further limited to objects with an NUV detection as well as those for which there is no UV counterpart to the SDSS galaxy in the preliminary GCAT MSC (Seibert et al., 2013 in prep.), excluding ambiguous multiple matches which would require flux redistribution. For the sources lacking an NUV counterpart, 3-σ upper limits have been calculated. Finally, objects defined as AGN following the prescription of Kewley et al. (2006) using the ratios of [NII] to Hα and [OIII] to Hβ have been excluded. This results in a total of 9885 galaxies, 536 of which have no counterpart in the preliminary GCAT MSC. A visual inspection of a random selection of these non-detected sources finds that a large fraction (∼ 50 %) of these non-detections lie in the vicinity of bright stars or at the very edge of GALEX tiles, so may actually have an NUV counterpart. In the following, we therefore proceed by considering two samples: i) the entire selected sample of spiral galaxies, treating all non-detections as real non-detections, and ii) only the subset of spirals with an NUV counterpart, implicitly assuming that all non-detections actually possess an NUV counterpart, and can thus be discarded. By comparing the ψ∗ − M∗ relation for the two samples, we will show that the effect of the NUV non-detections is negligible on the derivation of the ψ∗ − M∗ relation.

For all spiral galaxies, we have corrected the observed UV photometry (detections and upper limits) for the effects of attenuation by dust using the radiation-transfer based method presented in Grootes et al. (2013), and have derived values of ψ∗ from the de-attenuated UV photometry using the conversion factors given in Kennicutt (1998), scaled from a Salpeter (1955) IMF to a Chabrier (2003) IMF as in Treyer et al. (2007) and Salim et al. (2007). The required stellar masses have been derived as detailed in Sect. 2. Inclinations (required for the attenuation corrections alongside the effective radii) have been derived from the observed ellipticity as i = arccos(1 − e) and subsequently corrected for the effects of finite disk thickness as detailed in Sect. 3 of Driver et al. (2007), using an assumed intrinsic ratio of scale-height to semi-major axis of 0.12.

Fig. 18 shows the values of ψ∗ as a function of M∗ before and after correction for dust attenuation (middle and top panel, respectively), with the median in bins of 0 . M∗ shown as large filled circles with the errorbars indicating the interquartile range in logarithmic scatter in each bin. Without attenuation corrections, the ψ∗ − M∗ relation displays a mean logarithmic scatter of 0.70 dex (0.63 dex considering only NUV-detected sources) for the volume-limited sample. A pure power-law fit to the median distribution of the uncorrected sample finds an index of γ ≈ − . 8, but also shows that a pure power-law is only marginally suited to describing the distribution.

Footnote: The mean logarithmic scatter is calculated as the difference between the quartiles of the distribution in ψ∗, averaged over 15 equal sized bins in M∗ spanning 10 . M⊙ M∗ M⊙, and weighted by the number of galaxies in each bin.

After applying attenuation corrections, we find that the mean logarithmic scatter is reduced to 0.48 dex (0.43 dex considering only NUV-detected sources). In addition to this large reduction in scatter, we find that the median ψ∗ − M∗ relation for the volume-limited corrected sample is well represented by a pure power-law with an index of γ ≈ − . M∗, and that this power-law also provides a good parameterization of the relation at least down to M∗ = 10 M⊙. The exact value of the power-law index found using a linear regression analysis of the binwise median of ψ∗ as shown in Fig. 18 is γ = − . ± . M∗ considered, despite the use of a sample incorporating red, quiescent spirals not considered in previous studies.

Both for the corrected and uncorrected samples the median ψ∗ − M∗ relation is largely invariant between the whole sample and the subsample considering only NUV-detected galaxies, indicating that the true distribution of NUV detections and upper limits would provide similar results.

As the selection of spiral galaxies is purely morphologically based, the sample is capable of including very red and potentially passive spiral galaxies and should have a low contamination rate by ellipticals (∼ ψ∗, which might affect the ψ∗ − M∗ relation. To investigate to what extent the visible population of passive spirals is in fact a population of misclassified ellipticals, we have visually inspected a random sample of galaxies with NUV detections, M∗ > . M⊙, and log(ψ∗ / yr −) − 11 after correction for dust. Fifteen randomly selected such galaxies are shown in Fig. 19. All but two galaxies (top right panel and middle panel of second row) are clearly disk dominated spirals, showing that the large majority of the considered population appear to be disk-like galaxies. This serves as further validation of the cell-based selection technique, and implies that the derived ψ∗ − M∗ relation is not biased by a large contamination of elliptical galaxies.

Conversely, even for the combination (log(n), log(r_e), M_i) a slight bias against early type spirals remains, which could potentially affect the ψ∗ − M∗ relation, in particular if a large fraction of the massive, red spiral population were missed by the cell-based selection method. To investigate this potential effect we begin by considering the early-type spirals in the NAIRsample (i.e. T-type > NAIRsample using the cell-based method to be red (u − r > . 2) and massive (M∗ > . M⊙), compared to 38 % red and massive galaxies amongst the

Figure 18. Specific star formation rate (ψ∗) versus stellar mass (M∗) for a sample of spiral galaxies selected using the cell-based method and the parameter combination (log(n), log(r_e), M_i) and not hosting an AGN following the prescription of Kewley et al. (2006), within the redshift limit of z = 0.05. Individual sources are plotted as filled circles with the grayscale colour indicating the relative source density at their position in the ψ∗ − M∗ plane. Values of ψ∗ have been derived from NUV photometry as described in Sect. 7.1. Galaxies without an NUV counterpart in the GCAT MSC (Seibert et al., 2013 in prep.) are shown as 3-σ upper limits. The limiting stellar mass of M∗ > . M⊙ above which the sample can be considered volume limited is indicated by a vertical dash-dotted line. The median value in bins of 0 . M∗ is shown as large filled circles, with errorbars depicting the interquartile range in each bin. The medians and scatter for the whole sample are shown in black, while those of the sample considering only sources with NUV counterparts are shown in red. The top panel shows the distribution and median relations after radiation transfer based attenuation corrections following Grootes et al. (2013) have been applied, while the middle panel shows the uncorrected distribution and median relations. The black and red dashed lines in the top and middle panels show power-law fits to the median relation in the mass range 10 . M⊙ M∗ M⊙, corresponding to the volume-limited sample. The bottom panel shows the corrected (circles) and uncorrected (stars) relations to facilitate a direct comparison of the slope and scatter before and after correction for dust attenuation. Spirals found to host an AGN following the prescription of Kewley et al. (2006) are shown by blue stars in the middle panel. The relations found using the prescription of Baldry et al. (2004) and a simple Sérsic index cut are shown in azure and orange respectively, with the dashed line showing the relation as determined from all galaxies considered, and the dash-dotted line indicating the relation as recovered using only the detected sources.
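The binned medians, interquartile ranges and power-law fits described in the caption of Fig. 18 can be sketched as follows. This is an illustration on synthetic data, not the pipeline used for the figure. It assumes 15 equal bins in log(M∗) between 9.5 and 11 (the mass range quoted in the abstract), an unweighted least-squares fit to the binwise medians, and a number-weighted mean of the per-bin interquartile ranges. The function names and the synthetic slope and scatter are hypothetical.

```python
import numpy as np

# Interquartile range of a normal distribution in units of its sigma.
IQR_TO_SIGMA = 1.349


def binned_relation(log_mstar, log_ssfr, edges):
    """Median and interquartile range of log(psi*) in bins of log(M*)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (log_mstar >= lo) & (log_mstar < hi)
        if not in_bin.any():
            continue  # skip empty bins
        q25, q50, q75 = np.percentile(log_ssfr[in_bin], [25, 50, 75])
        rows.append((0.5 * (lo + hi), q50, q75 - q25, in_bin.sum()))
    centre, median, iqr, count = np.array(rows).T
    return centre, median, iqr, count


def summarize(log_mstar, log_ssfr, edges):
    """Power-law index, intercept and number-weighted mean scatter."""
    centre, median, iqr, count = binned_relation(log_mstar, log_ssfr, edges)
    gamma, intercept = np.polyfit(centre, median, 1)  # slope in log-log space
    mean_iqr = np.average(iqr, weights=count)
    return gamma, intercept, mean_iqr


# Synthetic sample with an input slope of -0.5 and Gaussian scatter.
rng = np.random.default_rng(1)
log_mstar = rng.uniform(9.5, 11.0, 5000)
log_ssfr = -10.0 - 0.5 * (log_mstar - 10.0) + rng.normal(0.0, 0.3, 5000)
edges = np.linspace(9.5, 11.0, 16)  # 15 equal bins
gamma, intercept, mean_iqr = summarize(log_mstar, log_ssfr, edges)
sigma_equivalent = mean_iqr / IQR_TO_SIGMA
```

Dividing an interquartile range by 1.349 gives the equivalent 1-σ value for a normal distribution, which reproduces the conversion of 0.42 dex to 0.31 dex quoted in the comparison with previous determinations.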
early type-spirals NOT recovered by the cell-based method, implying that the early-type galaxies not recovered are not strongly weighted more towards massive red objects than those recovered. To judge the impact of the bias against early-type spirals on the ψ∗ − M∗ relation, however, it is necessary to consider not only the early-type galaxies, but the entire populations of spiral galaxies in the NAIRsample recovered, respectively not recovered, by the cell-based method. Overall, one finds that for galaxies classified as spirals by Nair & Abraham (2010) and recovered by the cell-based method with the parameter combination (log(n), log(r_e), M_i) massive red galaxies constitute 15 % of the sample, while massive red galaxies constitute 27 % of the spirals not recovered by the cell-based method. This relatively small shift in weight at the massive red end (∼ 12 %) combined with the high completeness fraction (> 65 %) attained by the cell-based selection implies that the results obtained for the ψ∗ − M∗ relation for spiral galaxies in the local universe are robust. Thus, although it is possible that the actual ψ∗ − M∗ relation may still be slightly steeper, this further steepening will be small compared to the steepening to the ψ∗ ∝ M∗^−0.5 law found for the cell-based sample.

Finally, Fig. 18 shows the location of spiral galaxies hosting an AGN on the ψ∗ − M∗ relation. Although the interpretation of the NUV emission of such sources as being indicative of their SFR is by no means secure, since the AGN can also significantly contribute to the NUV emission, we find that the ratio of NUV emission to stellar mass of spiral galaxies hosting optically identified AGN is not readily distinguishable from that of similar galaxies without an AGN. AGN host galaxies do, however, appear to be more massive than ∼ M⊙ as a rule, and display a larger scatter. Fig. 21 shows the locations of optically identified AGN in a sample of galaxies with the additional requirement of Hα and Hβ lines with S/N >

ψ∗ − M∗ relations for colour-selected and Sérsic-index selected samples

We have previously argued and demonstrated that the cell-based method of selecting pure and complete samples of spiral galaxies is capable of including quiescent spirals and is therefore well-suited to investigating the ψ∗ − M∗ relation for a morphologically defined sample of spiral galaxies. In order to illustrate the effect that the choice of classification method has on the results derived for the ψ∗ − M∗ relation and demonstrate the necessity of an adequate selection method, Fig. 20 shows the relation for galaxy samples drawn from the OPTICALsample, limited to the redshift limit of z = 0.05, and selected using the prescription of Baldry et al. (2004) (left) and the Sérsic index (right). Attenuation corrections have been applied using the method of Grootes et al. (2013) as previously described. The derived relations have also been overplotted in Fig. 18 for comparison. For the sample selected following the method of Baldry et al. (2004) we find a power law index of γ = − . ± . γ = − . ± . 11 after applying attenuation corrections. Both before and after correction a single power-law appears to be an adequate representation of the ψ∗ − M∗ relation for this sample. Considering the scatter in the ψ∗ − M∗ relation we find that the relation is tight both before and after applying attenuation corrections, with values of 0.52 dex interquartile and 0.40 dex, respectively.

Figure 19. SDSS DR7 5 band images of a random selection of 15 spiral galaxies from the sample considered with an NUV counterpart in the GCAT MSC, M∗ > . M⊙, and log(ψ∗ /M⊙ kpc −) − 11 after attenuation corrections have been applied. All but two of the sources (top right and second row middle) display a disk-like morphology. The images have been retrieved using the SDSS Explore tool.

Using the Sérsic index to select a sample of spiral galaxies, we find a power-law index of γ = − . ± . γ = − . ± . 14 after applying attenuation corrections. The ψ∗ − M∗ relation before correction, however, is not well described by a single power-law. For the sample selected in this manner, the ψ∗ − M∗ relation displays a scatter of 0.89 dex interquartile before applying attenuation corrections which is reduced to 0.59 dex interquartile by applying attenuation corrections.

For both these sample selection methods, by Sérsic index and by colour, the power-law indices recovered are indicative of a shallower relation than for the cell-based selection. Given the similarity of the relations at lower stellar masses (∼ . M⊙) this appears to be largely due to a difference in the samples in the high stellar mass range, with the cell-based selection recovering more quiescent spirals. This is in line with the finding that the samples selected by these widely used proxies are more strongly biased towards sources with large values of Hα equivalent width. It is particularly noteworthy that the colour based selection of Baldry et al. (2004) leads to a much shallower slope and a very low scatter, most likely due to the exclusion of quiescent galaxies.

This comparison demonstrates the care necessary in constructing galaxy samples for the purpose of statistical investigations and illustrates the suitability of the cell-based method of morphological classification for the investigation of the star formation properties of morphologically selected samples of spiral galaxies. A further discussion of the effects of sample construction on the ψ∗ − M∗ relation is given in Sect. 7.3.

In deriving the intrinsic ψ∗ − M∗ relation for spiral galaxies in the local universe we have made use of the prescription for obtaining attenuation corrections given by Grootes et al. (2013) and the radiation transfer model of Popescu et al. (2011), as empirically calibrated on a sample of nearby spirals (see Xilouris et al. 1999; Popescu et al. 2000, 2004; Misiriotis et al. 2001) and incorporating corrections for the effects of dust on the perceived effective radii of disks by Pastrav et al. (2013b). In order to investigate to what extent the results obtained depend on the chosen method of deriving attenuation corrections, we compare the results obtained using the prescription of Calzetti et al.
(2000) with those obtained using the method of Grootes et al. (2013). These two correction methods, while both being empirically based, have a very different basis. Whereas the method of Grootes et al. (2013) is calibrated on a sample of local universe spirals with FIR-UV detections, the method of Calzetti et al. (2000) is calibrated on a sample of distant starburst galaxies, utilizing measurements of emission line fluxes. Furthermore, whereas, by virtue of its radiation transfer treatment, the method of Grootes et al. (2013) does not assume a fixed attenuation law in the UV/optical, this is the case for the method of Calzetti et al. (2000). This is potentially a critical factor when correcting for dust attenuation in spiral galaxies which lie on the transition between optically thick and thin systems, for which one expects a large range in the shape of the attenuation curve. Because of the requirement of emission line fluxes, the comparison must be based on a different sample, this time incorporating galaxies with Hα and Hβ line fluxes measured at > σ, which effectively removes the population of red, quiescent galaxies. Thus, we select a sample of spiral galaxies with NUV counterparts, selected using the cell-based method with the parameter combination (log(n), log(r_e), M_i), within the redshift limit of z = 0.05, not hosting an AGN, and with Hα and Hβ line fluxes measured at > σ as the basis for the following comparison. We emphasize that the requirements on the spectroscopic information serve only to facilitate the comparison with the corrections obtained using the prescription of Calzetti et al. (2000).

Figure 20. Specific star formation rate (ψ∗) versus stellar mass (M∗) for a sample of spiral galaxies selected using the method of Baldry et al. (2004) (left top and bottom) and a simple Sérsic index cut (right top and bottom) and not hosting an AGN following the prescription of Kewley et al. (2006), within the redshift limit of z = 0.05. Individual sources are plotted as filled circles with the grayscale colour indicating the relative source density at their position in the ψ∗ − M∗ plane. Values of ψ∗ have been derived as previously detailed. Galaxies without an NUV counterpart in the GCAT MSC (Seibert et al., 2013 in prep.) are shown as 3-σ upper limits. The median values of ψ∗ in bins of 0 . M∗ are shown as large symbols, with errorbars depicting the interquartile range in each bin. The medians and scatter for the whole sample are shown in filled symbols and colour, while the medians of the sample considering only sources with NUV counterparts are shown as black outlines. The top panels show the distribution and median relations after radiation transfer based attenuation corrections following Grootes et al. (2013) have been applied, while the bottom panels show the uncorrected distribution and median relations. The dashed and dash-dotted lines in the top and bottom panels show power-law fits to the median relations in the mass range 10 . M⊙ M∗ M⊙, corresponding to the volume-limited samples, with the dashed line showing the relation derived for the entire sample and the dash-dotted line showing the relation as derived only for the detected sources. Spiral galaxies found to host an AGN following the prescription of Kewley et al. (2006) are shown by blue stars in the bottom panels.

Fig. 21 shows the distributions of ψ∗ as a function of M∗ without corrections for dust (top left) and with corrections obtained using the radiation-transfer based method of Grootes et al. (2013) (bottom left). The ψ∗ − M∗ relation obtained using the method detailed in Calzetti et al. (2000) for correcting dust attenuation is shown in the top right panel. As in the case for the full sample incorporating red disks, the radiation transfer based corrections lead to a significant tightening of the relation, in this case reducing the mean logarithmic scatter from 0.58 dex to 0.37 dex. This lends confidence that the radiation transfer method also has the ability to predict the correct overall shift in the relation (see also discussion in Grootes et al. 2013, Sects. 5 & 6). By contrast, under the application of the corrections based on the Balmer decrement the scatter remains at 0.49 dex. Nevertheless, the overall shift in the relation towards larger values of ψ∗ by 0 . ψ∗ on M∗ than found for the uncorrected relation, with the slope of the relation obtained using the prescription of Calzetti et al. (2000) being slightly shallower than that of the relation obtained by applying the method of Grootes et al. (2013). The power-law index found under both corrections is close to γ ≈ − . 4. The flattening compared to the power-law index of γ ≈ − . M∗ ≈ . M⊙, not found when using the Grootes et al. (2013) attenuation corrections. The fact that the Grootes et al. attenuation corrections significantly reduce the overall scatter in the relation may imply that the break is actually not physical in nature, but rather may be an artifact of the application of the Calzetti et al. (2000) corrections to high mass spiral galaxies.

ψ∗ − M∗ relation with previous determinations

Previous determinations of the ψ∗ − M∗ relation have generally necessarily been restricted to galaxy samples encompassing the complete population of galaxies (e.g. Salim et al. 2007; Elbaz et al. 2007; Noeske et al. 2007), or to samples selected on the basis of colour or star-formation activity (e.g. Peng et al. 2010; Whitaker et al. 2012). As such, the ψ∗ − M∗ relation has been defined in terms of a blue sequence, or more generally a sequence of star-forming galaxies, and has been contrasted with a red sequence, or more generally a sequence of non-star-forming galaxies (Peng et al. 2010, respectively Noeske et al. 2007; Whitaker et al. 2012). However, the more fundamental distinction may be the morphology of the galaxy. This is because, while rotationally supported galaxies can support an extended cold ISM which can support distributed star-formation, any extended ISM in a spheroid must be hot and tenuous if it is in virial equilibrium with the total mass distribution as traced by the stars, in which case it would be expected to be inefficient in forming stars. To constrain processes driving star-formation in galaxies, it is therefore instructive to establish the ψ∗ − M∗ relation for a pure disk sample.

We have found this relation to be a relatively tight (0.42 dex mean logarithmic interquartile range, corresponding to 0.31 dex 1-σ for a normal distribution) power-law with an index of γ = − . ± . 12, with no indication of a cut-off at high stellar mass. This result shows that the phenomenon of down-sizing is also exhibited by a morphologically pure sample of disk galaxies, and is not just due to an increasing fraction of spheroids with increasing stellar mass in the general galaxy population.

Footnote: Down-sizing describes the phenomenon that star-formation in the current epoch is biased towards low mass structures, in contrast to the sequence of growth in dark matter structures, which progresses from low mass to high mass.

The lack of an obvious turn-off in the ψ∗ − M∗ relation for spirals, despite the inclusion of red quiescent spirals, suggests that if a mechanism exists to restrict the growth of spiral galaxies beyond the stellar mass range probed, such a mechanism must be accompanied by an abrupt transformation of galaxy morphology.

As outlined above, previous works addressing the ψ∗ − M∗ relation have concentrated on the sequence of star-forming galaxies rather than a morphologically defined sample. For example, Peng et al. (2010) make use of a U − B colour selection (their Eq. 2) akin to that of Baldry et al. (2004) investigated in Sect. 7.1 of this paper, applying it to a sample of SDSS galaxies with star formation rates derived from Hα line measurements as provided by Brinchmann et al. (2004). These authors find a power law index of γ = − . 1, much shallower than the relation found in this work. Similarly, Whitaker et al. (2012) find that for local universe star-forming galaxies selected using U − V & V − J restframe colours, selecting a blue subset of these galaxies results in a shallow slope similar to that of Peng et al. (2010). However, considering their full sample of star-forming galaxies Whitaker et al. (2012) find a steeper slope of γ ≈ − . 4. Finally, Noeske et al. (2007) find a slope of γ = − . ± . 08 for local universe galaxies with indications of on-going star-formation either in form of 24 μm emission and/or Hα emission.

The fact that these previously determined values of γ are all shallower than the relation found for a morphologically selected sample of spirals presented in this work can be readily understood. By selecting actively star-forming systems, quiescent galaxies of similar morphology are excluded from the samples. As passive spirals tend to be more massive, on average, this leads to a flattening of the ψ∗ − M∗ relation with respect to a morphologically defined sample, as similarly argued by Whitaker et al. (2012) in the context of the result of Peng et al. (2010). Indeed, for the sample of spirals selected using the cell-based method with the combination (log(n), log(r_e), M_i) and the additional requirement of Hα and Hβ detections, as used in Sect. 7.2, we find the ψ∗ − M∗ relation to be well described by a single power-law with an index of γ = − . ± . 09 and a scatter of 0.37 dex interquartile (0.27 dex 1-σ, assuming a normal distribution), very similar to the results for star-forming galaxies as obtained by other authors as previously discussed.

Overall, we thus find that the ψ∗ − M∗ relation for a morphologically selected sample of spiral galaxies with an index of γ = − . γ = − . · · · − .

We have presented a non-parametric cell-based method of selecting robust, pure, complete, and largely unbiased samples of spirals using combinations of three parameters derived from (UV/)optical photometry. We find that the parameters log(r_e), log(μ∗), log(n), and M_i perform well in selecting simultaneously pure and complete samples, while the use of the ellipticity e leads to pure yet incomplete samples. These parameters, which are linked to older stellar populations, perform at least as well as selections using the u − r colour or the NUV − r colour after NUV preselection. The remarkable success/importance of these seldom utilized parameters is consistent with the expected contrast in the structural properties of rotationally supported systems (spirals) and pressure supported systems (ellipticals), in agreement with different evolutionary tracks for spiral and elliptical galaxies.

For a selection of combinations of three parameters, the cell-based method is superior to a range of (widely used) photometric morphological proxies, and comparable to the

algorithmic classification approach using support vector machines presented by Huertas-Company et al. (2011) in selecting pure and complete samples of spirals from faint galaxy surveys.

The optimum combinations for use with the method may vary according to the science application for which the sample is being constructed.

Figure 21. Specific star formation rate ψ∗ versus stellar mass M∗ for a subsample of spiral galaxies drawn from the OPTICALsample using the cell-based method and the parameter combination (log(n), log(r_e), M_i) with z . 05, NUV detections and Hα and Hβ fluxes at > σ, not hosting an AGN. The linear grayscale indicates the relative galaxy density in the ψ∗ − M∗ plane at the position of the galaxy. The same scale has been applied to all panels. The vertical dashed-dotted line indicates the stellar mass limit above which the sample can be considered complete. The sources are binned in bins of equal size in M∗, with the bars showing the interquartile range and the filled symbols (stars, inverted triangles and circles) showing the median value of ψ∗ in each bin. The dashed line in the top panels and the bottom left panel shows a single power-law fit to the binwise median values in the mass range 10 . M⊙ M∗ M⊙. The bottom right panel shows the median relations to facilitate comparison. The uncorrected relation is shown as inverted triangles and a dash-dotted line. The relation corrected for dust attenuation following Grootes et al. (2013) is shown as circles and a solid line, while the relation corrected for dust attenuation following Calzetti et al. (2000) is shown as stars and a dashed line. The bin centers have been offset by 0.01 in log(M∗) for improved legibility. The scatter in the relation due to the scatter in the NUV is significantly reduced for the corrections based on the radiation transfer model, while the Balmer decrement based corrections have no discernible effect on the scatter. In both cases the intrinsic values of ψ∗ are shifted upwards w.r.t. the uncorrected values. Spiral galaxies fulfilling the criteria of the sample but hosting an AGN have been overplotted as blue stars in the top left panel.
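The binning and fitting procedure described in the caption of Figure 21 (median and interquartile range of ψ∗ in bins of equal size in log(M∗), followed by a single power-law fit to the binwise medians) can be illustrated with a minimal Python sketch. This is not the code used for the paper: the function names, the unweighted least-squares fit via NumPy, and the mock data are assumptions made purely for illustration. The sketch also shows the conversion between an interquartile range and the equivalent 1-σ width of a normal distribution (IQR ≈ 1.349 σ), which relates the quoted pairs 0.42 dex / 0.31 dex and 0.37 dex / 0.27 dex.

```python
import numpy as np

# For a normal distribution the interquartile range is 2 * 0.67449 * sigma.
IQR_PER_SIGMA = 1.3489795


def binned_medians(log_mass, log_ssfr, edges):
    """Median and interquartile range of log_ssfr in bins of log_mass."""
    centers, medians, iqrs = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (log_mass >= lo) & (log_mass < hi)
        if not np.any(in_bin):
            continue  # skip empty bins
        q25, q50, q75 = np.percentile(log_ssfr[in_bin], [25, 50, 75])
        centers.append(0.5 * (lo + hi))
        medians.append(q50)
        iqrs.append(q75 - q25)
    return np.array(centers), np.array(medians), np.array(iqrs)


def fit_power_law(centers, medians):
    """Unweighted straight-line fit in log-log space.

    Returns (gamma, intercept) for log(psi) = gamma * log(M) + intercept.
    """
    gamma, intercept = np.polyfit(centers, medians, 1)
    return gamma, intercept


def iqr_to_sigma(iqr):
    """Equivalent 1-sigma width of a normal distribution with this IQR."""
    return iqr / IQR_PER_SIGMA


# Mock data (hypothetical, NOT the sample used in the paper): a power law
# with index -0.5 and a Gaussian scatter of 0.3 dex.
rng = np.random.default_rng(0)
log_mass = rng.uniform(9.5, 11.0, 5000)
log_ssfr = -0.5 * (log_mass - 10.0) - 10.0 + rng.normal(0.0, 0.3, log_mass.size)

centers, medians, iqrs = binned_medians(log_mass, log_ssfr, np.linspace(9.5, 11.0, 7))
gamma, intercept = fit_power_law(centers, medians)
print("fitted index:", gamma)
print("mean IQR [dex]:", iqrs.mean(), "-> sigma [dex]:", iqr_to_sigma(iqrs.mean()))
```

Fitting the binwise medians rather than the individual sources mirrors the caption; whether the original fit was weighted is not stated there, so the unweighted fit is a simplifying assumption.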
For application to optically defined galaxy samples comparable in depth or deeper than SDSS we identify the combinations (log(n), log(r_e), log(μ∗)), (log(n), log(r_e), M_i), and (log(n), log(M∗), log(μ∗)) to be the most efficient in selecting a sample of spirals balanced between purity and completeness.

While using NUV data can lead to purer samples, it poses the possibility of a bias against UV faint sources and edge-on systems. Furthermore, we caution that making use of UV/optical colours additionally poses stringent requirements on the depths of the samples used in order to provide complete and unbiased samples.

In this paper, we have used the cell-based classification scheme with the parameter combination (log(n), log(r_e), M_i) to investigate the specific star-formation rate - stellar mass (ψ∗ − M∗) relation for a purely morphologically defined sample of spiral galaxies. Using this approach, which is unbiased in terms of star-formation properties and includes red, quiescent spiral galaxies, we find that the intrinsic, i.e. dust corrected, ψ∗ − M∗ relation for spiral galaxies can be represented as a single continuous power-law with an index of −0.5 over the range of 10^9.5 M⊙ ≤ M∗ ≤ 10^11 M⊙, likely even extending to 10 M⊙ M∗. Despite the inclusion of quiescent galaxies, the relation is also found to be very tight, with a mean interquartile range of 0.4 dex.

(log(n), log(r_e), M_i), as used in the investigation of the ψ∗ − M∗ relation, as well as for the combinations (log(n), log(r_e), log(μ∗)) and (log(n), log(M∗), log(μ∗)) in Appendix A, together with a brief instruction on their use.

Immediate future work will focus on using the method presented to test the performance of linear discriminant analysis using multiple parameters in the morphological classification of galaxies (Robotham et al., in prep.), as well as on defining samples of spirals for use in applications of radiation transfer modelling techniques (Popescu et al. 2011), which critically rely on the existence of the appropriate geometry (in this case spiral disk geometry), to derive self-consistent corrections of the attenuation of UV/optical light by dust in these objects.

ACKNOWLEDGEMENTS

We thank Ted Wyder for his assistance in compiling the sample. Some of the results in this paper have been derived using the HEALPix package (http://healpix.jpl.nasa.gov).

REFERENCES

Abazajian K. N., et al., 2007, ApJS, 182, 543
Abraham R. G., van den Bergh S., Nair P., 2003, ApJ, 588, 218
Adelman-McCarthy J. K., et al., 2006, ApJS, 162, 38
Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
Balogh M. L., et al., 2004, ApJ, 615, L101
Bamford S., et al., 2009, MNRAS, 393, 1324
Banerji M., et al., 2010, MNRAS, 406, 342
Barden M., et al., 2005, ApJ, 635, 959
Bell E., de Jong R. S., 2001, ApJ, 550, 212
Bell E., et al., 2004, ApJ, 600, L11
Bernardi M., et al., 2012, MNRAS, in press, arXiv:1211.6122
Bertin E., Arnouts S., 1996, A&AS, 112, 393
Blanton M. R., et al., 2003, ApJ, 594, 186
Blanton M., Roweis S., 2007, AJ, 133, 734
Bournaud F., Jog C. J., Combes F., 2007, A&A, 476, 1179
Brinchmann J., et al., 2004, MNRAS, 351, 1151
Calzetti D., et al., 2000, ApJ, 533, 682
Chabrier G., 2003, PASP, 115, 763
Conselice C. J., 2003, ApJS, 147, 1
de Jong J. T. A., Verdoes Kleijn G. A., Kuijken K. H., Valentijn E. A., 2012, ExA, in press, arXiv:1206.1254
de Vaucouleurs G., 1964, AnAp, 11, 247
Driver S. P., Popescu C. C., Tuffs R. J., Liske J., Graham A. W., Allen P. D., de Propris R., 2007, MNRAS, 379, 1022
Driver S. P., et al., 2011, MNRAS, 413, 971
Driver S. P., et al., 2012, MNRAS, 427, 3244
Einasto M., et al., 2010, A&A, 522, 92
Elbaz D., et al., 2007, A&A, 468, 33
Gini C., 1912, reprinted in Memorie di Metodologia Statistica, ed. E. Pizetti & T. Salvemini (1955; Rome: Libreria Eredi Virigilio Veschi)
Górski K. M., et al., 2005, ApJ, 622, 759
Graham A. W., Driver S. P., Petrosian V., Conselice C. J., Bershady M. A., Crawford S. M., Goto T., 2005, AJ, 130, 1535
Grootes M. W., et al., 2013, ApJ, 766, 59
Hopkins A. M., McClure-Griffiths N. M., Gaensler B. M., 2008, ApJ, 682, L13
Hubble E. P., 1926, ApJ, 64, 321
Huertas-Company M., Rouan D., Tasca L., Soucail G., Le Fèvre O., 2008, A&A, 478, 971
Huertas-Company M., Aguerri J. A. L., Bernardi M., Mei S., Sánchez Almeida J., 2011, A&A, 525, 157
Jogee S., et al., 2004, ApJ, 615, L105
Kauffmann G., et al., 2003, MNRAS, 341, 54
Keller S. C., et al., 2007, PASA, 24, 1
Kelvin L. S., et al., 2012, MNRAS, 421, 1007
Kennicutt R. C., 1998, ARA&A, 36, 189
Kewley L. J., Groves B., Kauffmann G., Heckman T., 2006, MNRAS, 372, 961
Laureijs R., et al., 2011, arXiv:1110.2193v1
Lintott C. J., et al., 2008, MNRAS, 389, 1179
Lintott C. J., et al., 2011, MNRAS, 410, 166
Lotz J. M., Primack J., Madau P., 2004, AJ, 128, 163
Martin D. C., et al., 2005, ApJL, 619, L1
Morrissey P., et al., 2007, ApJS, 173, 682
Morgan W. W., Keenan P. C., 1973, ARA&A, 11, 29
Misiriotis A., Popescu C. C., Tuffs R. J., Kylafis N. D., 2001, A&A, 372, 775
Möllenhoff C., Popescu C. C., Tuffs R. J., 2006, A&A, 456, 941
Moustakas J., et al., 2013, ApJ, 767, 50
Patel S. G., Holden B. D., Kelson D. D., Franx M., van der Wel A., Illingworth G. D., 2012, ApJ, 748, L27
Nair P. B., Abraham R. G., 2010, ApJS, 186, 427
Nicol M.-H., Meisenheimer K., Wolf C., Tapken C., 2011, ApJ, 727, 51
Noeske K. G., et al., 2007, ApJ, 660, L43
Pastrav B. A., Popescu C. C., Tuffs R. J., Sansom A., 2013, A&A, 553A, 80
Pastrav B. A., Popescu C. C., Tuffs R. J., Sansom A., 2013, A&A, 557A, 137
Peng C. Y., Ho L. C., Impey C. D., Rix H.-W., 2002, AJ, 124, 266
Peng Y.-j., et al., 2010, ApJ, 721, 193
Popescu C. C., Misiriotis A., Kylafis N. D., Tuffs R. J., Fischera J., 2000, A&A, 362, 138
Popescu C. C., Tuffs R. J., Kylafis N. D., Madore B. F., 2004, A&A, 414, 45
Popescu C. C., Tuffs R. J., Dopita M. A., Fischera J., Kylafis N. D., Madore B. F., 2011, A&A, 527, 109
Ravindranath S., et al., 2004, ApJ, 604, L9
Robotham A. S. G., Driver S. P., 2011, MNRAS, 413, 2570
Robotham A. S. G., et al., 2013, MNRAS, in press, arXiv:1301.7129
Salim S., et al., 2007, ApJS, 173, 267
Salpeter E. E., 1955, ApJ, 121, 161
Scarlata C., et al., 2007, ApJS, 172, 406
Schlegel D. J., Finkbeiner D. P., Davis M., 1998, ApJ, 500, 525
Sérsic J.-L., 1968, Atlas de Galaxias Australes (Cordoba: Observatorio Astronomico)
Simard L., et al., 2002, ApJS, 142, 1
Simard L., Mendel J. T., Patton D. R., Ellison S. L., McConnachie A. W., 2011, ApJS, 196, 11
Stoughton C., et al., 2002, AJ, 123, 485
Strateva I., et al., 2001, AJ, 122, 1861
Taylor E. N., et al., 2011, MNRAS, 418, 1587
Tempel E., Saar E., Liivamägi L. J., Tamm A., Einasto J., Einasto M., Müller V., 2011, A&A, 529, A53
The DES collaboration, 2005, 2005astro.ph.10346T
Treyer M., et al., 2007, ApJS, 173, 256
Tuffs R. J., Popescu C. C., Völk H. J., Kylafis N. D., Dopita M. A., 2004, A&A, 419, 821
Whitaker K. E., van Dokkum P. G., Brammer G., Franx M., 2012, ApJ, 754, L29
Wyder T., et al., 2007, ApJS, 173, 293
Xilouris E. M., Byun Y. I., Kylafis N. D., Paleologou E. V., Papamastorakis J., 1999, A&A, 344, 868
York D. G., et al., 2000, AJ, 120, 1579

APPENDIX A: CELL DECOMPOSITIONS OF PARAMETER SPACE

We have found the parameter combinations (log(n), log(r_e), M_i), (log(n), log(r_e), log(μ∗)), and (log(n), log(M∗), log(μ∗)) to be most efficient in retrieving a simultaneously pure and complete, largely unbiased sample of spiral galaxies when applied to the optically defined galaxy sample used in this work. In addition to the high values of purity and completeness, these selections require a minimal amount of spectral coverage, and hence can readily be applied to various samples of galaxies.

Tabs. A1, A2, & A3 provide the decompositions of the parameter space spanned for the combinations (log(n), log(r_e), M_i), (log(n), log(r_e), log(μ∗)), and (log(n), log(M∗), log(μ∗)), respectively. These discretizations have been performed using the entire OPTICALsample as a calibration sample to maximize the purity and completeness. The full tables are available in the online version of the paper and in machine readable form from the VizieR Service at the CDS (via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/). Rather than supply a binary classification into spiral and non-spiral cells, we supply the spiral fraction and its relative error for each cell, allowing readers to adapt the classification to their purposes. We do, however, note that the underlying definition of a reliable spiral (P_CS,DB > 0.7) is fixed.

In addition, we have chosen to provide the elliptical fraction for each cell and its relative error, where ellipticals are, analogously to spirals, defined as sources with P_EL,DB > . F_sp, its relative error ΔF_sp,rel, the elliptical fraction F_el, its relative error ΔF_el,rel, and the resolution level the cell belongs to (1: 1 division per axis; 2: 4 divisions per axis; 3: 8 divisions per axis; 4: 16 divisions per axis). With this information the entire grid can, if desired, be reconstructed. For classifying galaxies the tables can be used as follows:

• select criteria for being a spiral (or elliptical) cell in terms of F_sp and ΔF_sp,rel (respectively F_el and ΔF_el,rel)
• for each source identify the nearest grid point to its forward lower left
• assign the values of F_sp and ΔF_sp,rel from the corresponding cell to the source in question
• after completion for all sources select those corresponding to the selection criteria determined

Table A1. Excerpt of cell grid for the combination (log(n), log(r_e), M_i). For cells with a spiral (elliptical) population of 0 the relative error is set to 1e6.
Columns: resolution | corner coordinates: log(n), log(r_e), M_i | cell dimensions: dlog(n), dlog(r_e), dM_i | spiral fractions: F_sp, ΔF_sp,rel | elliptical fractions: F_el, ΔF_el,rel

Table A2. Excerpt of cell grid for the combination (log(n), log(r_e), log(μ∗)). For cells with a spiral (elliptical) population of 0 the relative error is set to 1e6.
Columns: resolution | corner coordinates: log(n), log(r_e), log(μ∗) | cell dimensions: dlog(n), dlog(r_e), dlog(μ∗) | spiral fractions: F_sp, ΔF_sp,rel | elliptical fractions: F_el, ΔF_el,rel

Table A3. Excerpt of cell grid for the combination (log(n), log(M∗), log(μ∗)). For cells with a spiral (elliptical) population of 0 the relative error is set to 1e6.
Columns: resolution | corner coordinates: log(n), log(M∗), log(μ∗) | cell dimensions: dlog(n), dlog(M∗), dlog(μ∗) | spiral fractions: F_sp, ΔF_sp,rel | elliptical fractions: F_el, ΔF_el,rel

This paper has been typeset from a TeX/LaTeX file prepared by the author.
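The classification steps listed in Appendix A (choose criteria on F_sp and ΔF_sp,rel, locate the cell of each source from the corner coordinates and cell dimensions, assign the cell values, select the sources) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the brute-force cell search, the default thresholds `min_f_sp` and `max_rel_err`, and the two-cell toy grid are all hypothetical, and the real cell values must be taken from Tabs. A1 to A3.

```python
import numpy as np


def assign_cells(points, corners, dims):
    """Index of the cell containing each point, or -1 if no cell contains it.

    A cell spans [corner, corner + dims) along each of the three axes,
    with `corner` being its lower-left grid point.
    """
    points = np.asarray(points, dtype=float)
    corners = np.asarray(corners, dtype=float)
    dims = np.asarray(dims, dtype=float)
    # inside[i, j] is True if point i lies within cell j along every axis.
    lower = points[:, None, :] >= corners[None, :, :]
    upper = points[:, None, :] < (corners + dims)[None, :, :]
    inside = np.all(lower & upper, axis=2)
    idx = np.argmax(inside, axis=1)
    idx[~inside.any(axis=1)] = -1
    return idx


def select_spirals(points, corners, dims, f_sp, df_sp_rel,
                   min_f_sp=0.5, max_rel_err=1.0):
    """Boolean mask of sources lying in cells that satisfy the chosen criteria.

    min_f_sp and max_rel_err are hypothetical example thresholds; the paper
    leaves the choice of criteria to the user.
    """
    f_sp = np.asarray(f_sp, dtype=float)
    df_sp_rel = np.asarray(df_sp_rel, dtype=float)
    idx = assign_cells(points, corners, dims)
    found = idx >= 0
    selected = np.zeros(len(idx), dtype=bool)
    selected[found] = ((f_sp[idx[found]] >= min_f_sp)
                       & (df_sp_rel[idx[found]] <= max_rel_err))
    return selected


# Toy grid with two cells (made-up values, NOT taken from Tabs. A1 to A3).
corners = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
dims = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
f_sp = [0.9, 0.0]
df_sp_rel = [0.1, 1e6]  # relative error is set to 1e6 for empty populations

sources = [[0.5, 0.5, 0.5], [1.5, 0.2, 0.2], [5.0, 5.0, 5.0]]
print(select_spirals(sources, corners, dims, f_sp, df_sp_rel))
```

The brute-force comparison of every source with every cell keeps the sketch short; for large catalogues a lookup that exploits the hierarchical resolution levels would be more economical.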