Synergies between low- and intermediate-redshift galaxy populations revealed with unsupervised machine learning

Abstract

The colour bimodality of galaxies provides an empirical basis for theories of galaxy evolution. However, the balance of processes that begets this bimodality has not yet been constrained. A more detailed view of the galaxy population is needed, which we achieve in this paper by using unsupervised machine learning to combine multi-dimensional data at two different epochs. We aim to understand the cosmic evolution of galaxy subpopulations by uncovering substructures within the colour bimodality. We choose a clustering algorithm that models clusters using only the most discriminative data available, and apply it to two galaxy samples: one from the second edition of the GALEX-SDSS-WISE Legacy Catalogue (GSWLC-2; $z \sim 0.06$), and the other from the VIMOS Public Extragalactic Redshift Survey (VIPERS; $z \sim 0.65$). We cluster within a nine-dimensional feature space defined purely by rest-frame ultraviolet-through-near-infrared colours. Both samples are similarly partitioned into seven clusters, breaking down into four of mostly star-forming galaxies (including the vast majority of green valley galaxies) and three of mostly passive galaxies. The separation between these two families of clusters suggests differences in the evolution of their galaxies, and that these differences are strongly expressed in their colours alone. The samples are closely related, with star-forming/green-valley clusters at both epochs forming morphological sequences, capturing the gradual internally-driven growth of galaxy bulges. At high stellar masses, this growth is linked with quenching. However, it is only in our low-redshift sample that additional, environmental processes appear to be involved in the evolution of low-mass passive galaxies.


MNRAS, 1–22 (2021) Preprint Thursday 11th February, 2021 Compiled using MNRAS LaTeX style file v3.0

Synergies between low- and intermediate-redshift galaxy populations revealed with unsupervised machine learning

Sebastian Turner, Małgorzata Siudek, Samir Salim, Ivan K. Baldry, Agnieszka Pollo, Steven N. Longmore, Katarzyna Małek, Chris A. Collins, Paulo J. Lisboa, Janusz Krywult, Thibaud Moutard, Daniela Vergani, and Alexander Fritz

Astrophysics Research Institute, Liverpool John Moores University, 146 Brownlow Hill, Liverpool, L3 5RF, UK
Institut de Física d’Altes Energies, The Barcelona Institute of Science and Technology, 08193 Bellaterra, Spain
National Centre for Nuclear Research, ul. Hoza 69, 00-681 Warsaw, Poland
Department of Astronomy, Indiana University, Bloomington, IN 47405, USA
Astronomical Observatory of the Jagiellonian University, ul. Orla 171, 30-244 Kraków, Poland
Aix Marseille Univ. CNRS, CNES, LAM Marseille, France
Department of Applied Mathematics, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
Institute of Physics, Jan Kochanowski University, ul. Swietokrzyska 15, 25-406, Kielce, Poland
Department of Astronomy & Physics and the Institute for Computational Astrophysics, Saint Mary’s University, 923 Robie Street, Halifax, Nova Scotia, B3H 3C3, Canada
INAF - OAS Bologna, Via P. Gobetti 93, I-40129 Bologna, Italy
Lichtenbergstraße 8, D-85748 Garching, Germany

Accepted XXX. Received YYY; in original form ZZZ


The composition of a galaxy is subject to the influence of an ever-changing balance of astrophysical and cosmological processes acting upon it. Hence, chronicling the evolutionary history of a galaxy requires a precise knowledge of its present contents. A galaxy expresses its contents (stars, gas, dust, etc.) in its spectral energy distribution (SED). Therefore, taking an inventory of the composition of a galaxy requires measurement of the radiation that it emits as a function of wavelength (Conroy 2013). It is impractical to measure full galaxy spectra that span large wavelength ranges (e.g. ultraviolet-through-infrared), especially for the large number of galaxies needed for a robust statistical study of galaxy evolution. Instead, their SEDs must be inferred from curtailed, summary measurements.

Colours are the simplest such measurements. Optical colours have been used to probe the contents of galaxies since the infancy of extragalactic astrophysics (Roberts 1963). Early studies matched sums of individual stellar spectra (i.e. synthetic composite spectra) to the observed optical colours of galaxies in order to discern their stellar content (e.g. Spinrad 1962; Spinrad & Taylor 1971; Faber 1972).

This method was superseded by stellar population synthesis (SPS), which uses theoretical models of stellar evolution to set astrophysical constraints upon these synthetic composite spectra (e.g. Bruzual & Charlot 2003; Maraston 2005). The advancement of the scope of SPS out to ultraviolet wavelengths and the incorporation of infrared emission models have facilitated the estimation of the full ultraviolet-through-infrared SEDs of galaxies from their observed colours (e.g. Ilbert et al. 2006; Da Cunha et al. 2008; Boquien et al. 2019). SEDs spanning these wavelength regimes are governed in their shapes chiefly by stellar emission, and by attenuation (in the ultraviolet and optical) and re-emission (in the infrared) of stellar emission by interstellar dust.

The discovery of a bimodality in the two-dimensional optical colour distribution of galaxies (Strateva et al. 2001; Baldry et al. 2004) has begotten a simple empirical paradigm of galaxy evolution. Galaxies generally go from being blue and star-forming to being red and passive. This change in their colours (and quenching of their star formation) is accompanied for the most part by a change in their morphologies from disc- (‘late-type’) to spheroid-dominated (‘early-type’) and an increase in their local environmental densities (Baldry et al. 2006; Bamford et al. 2009). A variety of processes have been proposed as drivers of galaxy evolution (see reviews by Kormendy & Kennicutt 2004 and Boselli & Gavazzi 2006) but their interplay is poorly understood. Furthermore, exceptions to this paradigm (Schawinski et al. 2009; Masters et al. 2010) complicate the issue. Studies aiming to disentangle the interplay of evolutionary processes have focused on galaxies between the two peaks of the colour bimodality (Fritz et al. 2014; Schawinski et al. 2014; Smethurst et al. 2015; Moutard et al. 2016b; Gu et al. 2018; Manzoni et al. 2019; Krywult et al. in prep), a region called the ‘green valley’ (Martin et al. 2007).
As galaxies under the direct influence of evolutionary processes, they are ideally poised to enable an understanding of how galaxies transition from blue to red.

Bimodalities of galaxies have since also been observed in colours involving ultraviolet and near-infrared magnitudes (Wyder et al. 2007; Williams et al. 2009; Arnouts et al. 2013). Different colours, though, yield slightly different bimodalities; for example, galaxies occupying the blue peak of the $g - r$ bimodality may instead occupy the green valley of the $NUV - r$ bimodality (Salim 2014), because optical-optical colours probe star formation over longer timescales than ultraviolet-optical colours do. Hence it is clear that, for a complete description of the evolution of galaxies in the context of the bimodality and the green valley, several colours spanning the ultraviolet-through-near-infrared wavelength regime must be considered simultaneously. Machine learning techniques, which can parse multiple features at once, are well suited to the task. Exploration of the multi-dimensional ultraviolet-through-near-infrared colour distribution of galaxies may overcome degeneracies that exist in two-dimensional colour distributions, uncover substructures to the established bimodality, and reveal the extent to which the ultraviolet-through-near-infrared colours of galaxies express their evolution and assembly histories.

The adoption of machine learning techniques within astronomy and astrophysics was primarily a response to the enormous data volumes anticipated from forthcoming surveys (e.g. 20 TB per night from the Legacy Survey of Space and Time; Ivezić et al. 2019). While fulfilling the demand for automated data analysis methods, these techniques also invite a renewed examination of our understanding of astrophysics due to their ability to distill interpretable models from complex, multi-dimensional input data that may be difficult to fully visualise. Supervised techniques are useful for mapping existing domain knowledge onto new data.
A supervised classification algorithm, for example, may assign labels to previously unseen observations after being trained on prelabelled observations. Unsupervised techniques, on the other hand, demonstrate substantial promise for exploration and discovery because they are less reliant on prior knowledge than supervised techniques. An unsupervised clustering algorithm, for example, assigns labels to observations in accordance with their intrinsic similarity to one another (i.e. the distances between observations in terms of the features used to represent them). Unsupervised techniques, then, construct models that are driven purely by the structure of input data, and require no training. They may therefore be said to express the ‘natural’ structure of the input data rather than expressing structures imposed upon it by assumptions that are explicitly built into the use of supervised techniques. The use of unsupervised techniques does, though, incorporate implicit assumptions, and the precise definition of similarity can vary between techniques. Ensuring the astrophysical utility of these models hence requires carefully considered choices of algorithm and features.

A growing literature has emerged in recent years, reporting the results of the application of unsupervised techniques to various astrophysical contexts (see Baron 2019 and Ball & Brunner 2010 for comprehensive reviews). Clustering has been used, for example, to partition galaxies on the basis of their pixel data (Hocking et al. 2017, 2018; Martin et al. 2020), their spectra (Sánchez Almeida et al. 2010; de Souza et al. 2017), their SEDs (Siudek et al. 2018b,a), and their derived astrophysical features (Barchi et al. 2016; Turner et al. 2019). Dimensionality reduction, which can extract important or discriminative information from large ensembles of input features, has been used, for example, to produce simplified projections of galaxy samples based on their multi-wavelength photometry (Steinhardt et al. 2020) and their estimated SEDs (Davidzon et al. 2019; Hemmati et al. 2019), and to classify their spectra (Yip et al. 2004; Marchetti et al. 2013).

In this paper, we describe work that builds on that of Siudek et al. (2018b,a). They applied a clustering algorithm to partition galaxies observed by the VIMOS Public Extragalactic Redshift Survey (VIPERS; Scodeggio et al. 2018). They chose the Fisher Expectation-Maximisation (FEM) algorithm, which implements a clustering approach called the ‘Discriminative Latent Mixture’ (DLM) model. The algorithm incorporates dimensionality reduction as it iterates rather than as a part of any preparation of the input data ahead of clustering. This ensures that improvements to the estimated parameters of the model are adaptive, and that the clustering uses only the most important information available from the input features. They aimed to establish the ability of FEM to determine a naturally defined, astrophysically meaningful partition in a feature space of high dimensionality (i.e. containing more potentially discriminative information than lower dimensionalities). Their feature space was defined by spectroscopic redshifts and 12 rest-frame ultraviolet-through-near-infrared colours. The 12 clusters that they determined revealed substructure to the established colour bimodality of galaxies, distinguishing subpopulations of galaxies that overlapped in two-dimensional colour distributions. In addition, their clusters correlated with a variety of astrophysical features including stellar masses, morphologies, and emission-line strengths.

We adapt the approach of Siudek et al. (2018b,a) to compare samples of galaxies at two different redshifts. Our aim is to use clustering to characterise the structures of the samples in a common feature space of high dimensionality, to examine similarities and differences between these structures at the two cosmic epochs, and to interpret these similarities and differences in the context of theories of galaxy evolution. While each cluster will constitute a class of galaxies that are intrinsically similar to one another, connections between clusters will chart the evolution of galaxies through the feature space. Hence, we also aim to establish how strongly the evolutionary histories of galaxies, which are ordinarily inferred using a combination of various types of features (e.g. photometric, spectroscopic, morphological), are encoded in just their ultraviolet-through-infrared colours. Our sample of galaxies at low redshift ($z \sim 0.06$) is drawn from the second edition of the GALEX-SDSS-WISE Legacy Catalogue (GSWLC-2; Salim et al. 2018), and our sample of galaxies at intermediate redshift ($z \sim 0.65$) is based on the VIPERS sample of Siudek et al. (2018b). We prepare our samples carefully to ensure a fair comparison of galaxies from different cosmic epochs and different surveys, and to mitigate methodological influences on the clustering outcomes. We also adjust the input features, defining nine neighbouring rest-frame colours that, together, represent the shapes of the ultraviolet-through-near-infrared SEDs of the galaxies in our samples, and thus enable insight into their evolution.

The remainder of this paper proceeds as follows. In Section 2, we introduce our samples, the data we use to represent and analyse the galaxies that they contain (including the estimation of their SEDs), and the measures that we take to ensure a fair comparison between them. In Section 3, we explain the DLM model and how the FEM algorithm implements it, and we describe the feature space within which we cluster our samples. In Section 4, we present the outcomes of the clustering, and in Section 5, we offer our interpretation thereof. Finally, in Section 6, we summarise, make concluding statements, and suggest future directions for our work. Where required, we assume a $(H_0, \Omega_m, \Omega_\Lambda) = (70\ \mathrm{km\,s^{-1}\,Mpc^{-1}}, 0.3, 0.7)$ cosmology for our calculations.
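To illustrate how the assumed cosmology enters a typical calculation, the following is a minimal sketch (not code used for this work) of the comoving and luminosity distances in a flat $\Lambda$CDM universe, evaluated by simple trapezoidal integration. The function names and the number of integration steps are our own choices for illustration, and radiation and curvature are neglected.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, steps=1000):
    """Line-of-sight comoving distance (Mpc) in a flat Lambda-CDM universe.

    Integrates c / H(z') from 0 to z with the trapezoidal rule.
    Illustrative only: radiation and curvature are ignored.
    """
    def inv_e(zp):
        # 1 / E(z), where H(z) = H0 * E(z)
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        total += inv_e(i * dz)
    return (C_KM_S / h0) * total * dz


def luminosity_distance_mpc(z, **kwargs):
    """Luminosity distance (Mpc); equals (1 + z) times the comoving distance when flat."""
    return (1.0 + z) * comoving_distance_mpc(z, **kwargs)


# Approximate redshifts of the two samples discussed in the text
d_low = luminosity_distance_mpc(0.06)
d_mid = luminosity_distance_mpc(0.65)
```

A dedicated cosmology library would normally be preferred for production work; the sketch only makes the dependence on $H_0$, $\Omega_m$, and $\Omega_\Lambda$ explicit.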

The second edition of the GALEX-SDSS-WISE Legacy Catalogue (GSWLC-2; Salim et al. 2016, 2018) was assembled using Data Release 10 (DR10; Ahn et al. 2014) of the Sloan Digital Sky Survey (SDSS; York et al. 2000). GSWLC-2 aimed to characterise the star formation activity and dust content of galaxies in the local Universe. It contains all SDSS DR10 galaxies that meet the following criteria:
• have apparent $r$-band Petrosian magnitudes <
• have spectroscopic redshifts within the range 0. < $z$ < .
• lie within the Galaxy Evolution Explorer (GALEX; Martin et al. 2005; Morrissey et al. 2007) observation footprint, whether they were detected by GALEX or not.

The lower redshift limit was imposed to exclude foreground stars, and particularly close galaxies with potentially unreliable photometry and/or distance estimates. Retaining galaxies that were not actually detected by GALEX itself preserves the optical selection of SDSS. In all, these criteria select 659,229 SDSS DR10 galaxies.

$u$-, $g$-, $r$-, $i$-, and $z$-band optical photometry for galaxies in GSWLC-2 was drawn from SDSS. modelMag magnitudes, which are based on profile fits, were selected due to the accuracy of their colours. These modelMag magnitudes were corrected for extinction due to Milky Way dust using the empirical Yuan et al. (2013) coefficients.

The SDSS optical photometry was supplemented with near- ($NUV$) and far-ultraviolet ($FUV$) photometry from GALEX’s final data release (GR6/7). GALEX conducted surveys at varying depths: an All-sky Imaging Survey (which observed several targets per orbit), a Medium Imaging Survey (one target per orbit), and a Deep Imaging Survey (several orbits per target). These surveys were nested, such that it is possible for a galaxy to have been observed at more than one depth (although an observation of a galaxy at a given depth does not guarantee an observation of the same galaxy at shallower depths). Here we use the UV photometry for galaxies in GSWLC-2 based on the deepest available observation of each galaxy (catalogue GSWLC-X2). Salim et al. (2016) applied corrections to mitigate systematic offsets between the SDSS and GALEX photometry, which arose mostly due to the blending of sources in GALEX’s low-resolution images. Peek & Schiminovich (2013) corrections for extinction due to Milky Way dust were applied to the UV photometry. UV photometry in at least one of GALEX’s two bands (almost always $NUV$ if just one) is available for 65 per cent of GSWLC-2 galaxies, and for 80 per cent of the galaxies in our final GSWLC-2 sample (Section 2.1.2).

Wide-field Infrared Survey Explorer (WISE; Wright et al. 2010) observations at 12 and 22 $\mu$m (channels W3 and W4 respectively) were used to provide mid-infrared (MIR) photometry for GSWLC-2 galaxies. Salim et al. (2018) opted for unWISE (Lang et al. 2016) forced photometry, which was based directly on SDSS source positions and profiles. MIR photometry in at least one of channels W3 and W4 is available for 78 per cent of GSWLC-2 galaxies, and for 87 per cent of the galaxies in our final GSWLC-2 sample (Section 2.1.2).

The rest-frame SEDs of GSWLC-2 galaxies were estimated using the Code Investigating GALaxy Emission (CIGALE; Noll et al. 2009; Boquien et al. 2019). Synthetic spectra generated by CIGALE were validated against the available observed UV-through-optical photometry in order to constrain the SEDs. Details of this fitting procedure are described at length in Salim et al. (2016, 2018); here, we offer a brief summary.

Synthetic spectra were generated using Bruzual & Charlot (2003) simple stellar population templates, based on a Chabrier (2003) initial mass function and with metallicities of log(Z) = − . , − . , − . (∼ $Z_\odot$), or − .3. These templates were combined with Myr-resolution star formation histories (SFHs) consisting of two exponentially declining episodes of star formation, producing an old and a young population. Absorption of stellar emission by dust was implemented via a Noll et al. (2009) generalisation of the Calzetti et al. (2000) attenuation curve, modified to allow its slope to vary and to add a UV bump (see section 3.4 of Salim et al. 2018).

The SED estimation was additionally constrained by the galaxies’ total IR luminosities (i.e. matching the energy absorbed by the dust within galaxies with the energy it re-emits; see section 3.2 of Salim et al. 2018). Total IR luminosities were derived from the 22 $\mu$m WISE photometry (if available, 12 $\mu$m if not) using Chary & Elbaz (2001) templates, further corrected based on Herschel (Valiante et al. 2016) IR photometry (see section 3.1 of Salim et al. 2018). The overall quality of fit was measured by its reduced chi-squared value ($\chi^2_r$).

Astrophysical features including rest-frame absolute magnitudes, colour excesses [$E(B-V)$], stellar masses ($M_*$), stellar metallicities ($Z$), mass-weighted stellar ages ($MWSA$), and specific star formation rates [$sSFR$(SED)] were derived from the full ensemble of possible synthetic spectra via a Bayesian approach (Salim et al. 2007). The likelihood of the fit of each synthetic spectrum to the photometry of each galaxy was used to generate a probability density function for each feature, with the likelihood-weighted means of the functions being quoted as the best estimates of the features, and the likelihood-weighted standard deviations as the errors.
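The likelihood-weighted estimates described above can be sketched as follows. This is an illustration only, not the CIGALE implementation: we assume Gaussian photometric errors, so that the likelihood of each synthetic spectrum scales as $\exp(-\chi^2/2)$, and the function name and example numbers are hypothetical.

```python
import math


def bayesian_estimate(values, chi2):
    """Likelihood-weighted mean and standard deviation of one feature.

    values: feature value of each synthetic spectrum (e.g. log stellar mass)
    chi2:   chi-squared of each spectrum's fit to the observed photometry
    Assumes likelihood ~ exp(-chi2 / 2), i.e. Gaussian errors.
    """
    # Subtract the minimum chi-squared before exponentiating for numerical
    # stability; the common factor cancels when the weights are normalised.
    chi2_min = min(chi2)
    weights = [math.exp(-0.5 * (c - chi2_min)) for c in chi2]
    norm = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / norm
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / norm
    return mean, math.sqrt(var)


# Three hypothetical synthetic spectra with made-up feature values and fits
best, err = bayesian_estimate([10.2, 10.5, 10.9], [3.1, 1.0, 6.4])
```

Because every synthetic spectrum contributes in proportion to its likelihood, the quoted value reflects the whole ensemble rather than the single best-fitting spectrum.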

Our final GSWLC-2 sample is subject to the following selections. Firstly, we only retain galaxies whose best-fitting CIGALE SEDs produce $\chi^2_r \leq$ .07 (i.e. the mean plus two standard deviations of the logarithmic GSWLC-2 distribution in $\chi^2_r$), in order to omit particularly poorly constrained fits. Spectroscopic redshifts are limited to the range 0. < $z$ < 0.08, and stellar masses (as estimated via Bayesian analysis of the synthetic CIGALE spectra) to > . M$_\odot$. These two restrictions ensure completeness above the imposed stellar mass limit. Finally, broad-line active galactic nuclei are removed by asserting flag_sed = 0. Our final GSWLC-2 sample has a median redshift of 0.06 and contains 177,362 galaxies.

As additional, CIGALE-independent indicators of the stellar populations in GSWLC-2 galaxies, we invoke Brinchmann et al. (2004) specific star formation rates [$sSFR$(ind.)] and 4000 Å break strengths [$D(4000)$]. The SFRs sum two components: a spectroscopic fibre SFR, and a photometric SFR outside the fibre, given by an optical SED fit (Salim et al. 2007). The fibre SFR is given by either a H$\alpha$ calibration (Charlot & Longhetti 2001) or, in the case of spectra that have a contribution from an active galactic nucleus, a $D(4000)$-based estimate (itself calibrated on the emission lines of pure star-forming galaxies). These SFRs are then normalised by photometrically-determined stellar masses to give $sSFR$(ind.). The timescale probed by $sSFR$(ind.) lies between the 10 Myr timescale of the H$\alpha$-calibrated fibre SFRs, and the 1 Gyr timescale of optical SED-based SFRs (Salim et al. 2016). The $D(4000)$ measurements apply to the fibre region only. Both of these features are available for 97 per cent of the galaxies in our GSWLC-2 sample.

We obtain Sérsic indices ($n_g$) and circularised half-light radii ($R_{1/2}$) for the galaxies in our GSWLC-2 sample from catalogues assembled by Simard et al. (2011). Both were derived from fits of single Sérsic (1963, 1968) profiles to $r$-band images of galaxies in SDSS. The Sérsic indices have minimum and maximum allowed values of 0. . $r$-band bulge-to-total ratios ($B/T_r$) for these galaxies, which were based on fits consisting of two components: a Sérsic bulge (fixed at an index of 4) and an exponential disc. Local environmental densities, available for 92. , 000 km s$^{-1}$ along the line of sight. We calculate local overdensities ($\delta$) using $\delta = (\Sigma - \bar{\Sigma})/\bar{\Sigma}$, where $\Sigma$ is the local surface density and $\bar{\Sigma}$ the average surface density of the sample.

The VIMOS Public Extragalactic Redshift Survey (VIPERS; Guzzo et al. 2014; Garilli et al. 2014; Scodeggio et al. 2018) aimed to match the statistical fidelity of low-redshift surveys like SDSS, but at intermediate redshifts ($z \sim$ . , with objects qualifying for VIPERS if they had extinction-corrected $i$-band magnitudes $i_{AB} <$ .5. An additional $ugri$ colour cut was applied to remove low-redshift ($z \lesssim 0.5$) galaxies from the survey (Guzzo et al. 2014). PDR2, the second and final public data release of VIPERS, comprises spectroscopy for 97,414 objects (Scodeggio et al. 2018). 52,114 of these objects (51,522 galaxies and 592 broad-line active galactic nuclei) have ‘secure’ (> 99 per cent confidence) redshifts. This secure-redshift sample was the subject of the Siudek et al. (2018b) study, and is the basis of our present VIPERS sample.

Photometry for this sample was taken from a catalogue prepared by Moutard et al. (2016a). The CFHTLS-Wide photometric catalogue (i.e. the basis of the targeting for VIPERS) provided optical photometry for this sample in $u^*$, $g$, $r$, $i$, and $z$ bands. Moutard et al. (2016a) derived total magnitudes for the galaxies in this sample by rescaling their isophotal magnitudes. These isophotal magnitudes were chosen for the accuracy of their colours with a view to photometric redshift estimation; this choice now benefits our SED estimation as well.

As for our GSWLC-2 sample, UV photometry came from GALEX. Moutard et al. (2016a) supplemented existing Deep Imaging Survey observations of VIPERS galaxies with deep GALEX observations of their own in order to improve UV coverage within the VIPERS footprint. Coverage is complete in the W1 field of VIPERS, but not in the W4 field (see figure 1 of Moutard et al. 2016a). UV photometry was then measured using a Bayesian approach with the $u^*$-band profiles of galaxies as priors (Conseil et al. 2011), which mitigated the confusion of sources due to their blended UV profiles. UV photometry in at least one of GALEX’s two bands (almost always $NUV$ if just one) is available for 52 per cent of galaxies in the Siudek et al. (2018b) sample and in our final VIPERS sample (Section 2.2.2).

Near-infrared (NIR) $K_s$-band photometry came from a dedicated CFHT WIRCam (Puget et al. 2004) follow-up survey of VIPERS galaxies (Moutard et al. 2016a). This $K_s$-band photometry was validated against NIR photometry from the VISTA Deep Extragalactic Observations (VIDEO) survey (Jarvis et al. 2013), exhibiting good agreement. We also take VIDEO survey $Z$, $Y$, $J$, $H$, and $K_s$ NIR photometry for our sample where available (11 per cent of the Siudek et al. 2018b sample, 10 per cent of our final VIPERS sample; Section 2.2.2).
CFHT $K_s$-band photometry is available for 91 per cent of galaxies in the Siudek et al. (2018b) sample, and for 93 per cent of galaxies in our final VIPERS sample (Section 2.2.2).

(The use of secure redshifts is recommended by Garilli et al. (2014) and Scodeggio et al. (2018) for scientific analyses. Approximately 75 per cent of all VIPERS galaxies within and throughout the redshift range of our final VIPERS sample (see Section 2.2.2) have secure redshifts; see figure 9 of Scodeggio et al. 2018.)

The SEDs of VIPERS galaxies are estimated via a full fit of synthetic CIGALE spectra to the available UV-through-NIR photometry. This differs slightly from the method used for GSWLC-2, whose NIR SEDs were constrained not by their shapes but simply by their total IR luminosities (Section 2.1.1). While we use the same stellar templates (Bruzual & Charlot 2003, with Chabrier 2003 initial mass functions and metallicities of 0. , . , 0.02, or 0.05) for VIPERS as were used for GSWLC-2, the SFHs are adjusted to reflect the change in cosmic epoch between samples and to account for the possibility of very recent bursts of star formation. (Consequences of this adjustment are discussed in Section 4.4.2; the properties of most VIPERS galaxies appear accurate, except for those of a subpopulation of passive VIPERS galaxies.) Astrophysical features are derived for VIPERS galaxies using the same Bayesian approach as for GSWLC-2 galaxies (see Section 2.1.1).

We make the following selections to yield our final VIPERS sample. Galaxies are kept if the $\chi^2_r$ of their best-fitting CIGALE SED has a value less than or equal to the mean plus two standard deviations (= .85) of the overall logarithmic VIPERS distribution. Spectroscopic redshifts are restricted to being within the range 0. < $z$ < . > . M$_\odot$ with a view to mass completeness (though see Sections 4.4.2 and 5.2, where we discuss shortcomings). Broad-line active galactic nuclei and serendipitous secondary spectral sources are removed using zflag < 10. Ultimately, this gives us a final VIPERS sample consisting of 31, . CIGALE SED estimation, were calculated from the [OII] $\lambda\lambda$ , 537 of the galaxies in our VIPERS sample, and they probe short timescales of star formation ($\sim 10$ Myr). We normalise these [OII] SFRs by our CIGALE stellar masses to yield specific star formation rates [$sSFR$(ind.)]. (Our use of stellar masses given by CIGALE means that these $sSFR$(ind.) estimates are not entirely independent of CIGALE; however, we expect that CIGALE’s stellar masses would be consistent with those estimated via other methods, given that stellar mass estimates are generally quite robust; Bell & de Jong 2001.) $D(4000)$ was measured from VIPERS spectra by Garilli et al. (2014), using the same Balogh et al. (1999) method as was used for SDSS (Brinchmann et al. 2004). Sérsic indices and circularised half-light radii for the galaxies in our VIPERS sample are given by Krywult et al. (2017), who fitted the $i$-band light distributions of galaxies with single Sérsic (1963, 1968) profiles. These features are available for 96 . . .

We apply the Fisher Expectation-Maximisation algorithm, which estimates the parameters of the Discriminative Latent Mixture model. Bouveyron & Brunet (2012) offer full, rigorous, mathematical derivations of both the Discriminative Latent Mixture model and the Fisher Expectation-Maximisation algorithm in their paper; here, we offer brief summaries of the model (Section 3.1), and of its implementation via the algorithm (Section 3.2). In Section 3.3, we discuss some additional practicalities relevant to the use of the model and algorithm, and in Section 3.4, we describe the shared feature space within which we cluster our two samples.

Figure 1. A simple demonstration of the principles behind subspace clustering. Here, a sample consisting of two clusters (represented by the two blue ellipses) is represented in a two-dimensional full space defined by features $f_1$ and $f_2$. Matrix $M$ enables the transformation of the sample to a one-dimensional subspace, defined by latent feature $f_l$, in which the two clusters are easily discriminated.

The Discriminative Latent Mixture (DLM) model is a clustering approach that incorporates dimensionality reduction on the fly to determine a frugal fit to the structure of an input sample, which is assumed to consist of $k$ clusters. Selection of the value of $k$ is discussed in Section 3.3.

The key premise of the DLM model is thus: a sample represented in a $D$-dimensional space that is defined by observed features actually occupies an intrinsic $d$-dimensional subspace ($d < D$; the ‘empty space phenomenon’; Scott & Thompson 1983) that is defined by unobserved, latent features. Hence, the clustering structure of the sample should be fitted in this intrinsic subspace.

The subspace has two important properties in the context of the DLM model. Firstly, of all possible $d$-dimensional subspaces, it is the one that best discriminates the $k$ clusters in the sample. The model assumes $1 \leq d \leq k - 1$: that $k$ clusters may be distinguished in $k - 1$ dimensions. Secondly, the subspace is related to the full $D$-dimensional space, such that the unobserved, latent features are linear combinations of the observed features. Hence there exists a matrix $M$, common to all of the $k$ clusters, that enables the transformation of the sample between the full space and the subspace. This transformation matrix is constrained by the condition that the basis vectors of the subspace must be orthonormal. Estimation of the transformation matrix $M$ is explained in Section 3.2. Selection of the value of $d$ is explained in Section 3.3. Fig. 1 demonstrates these two important properties of the subspace.

The DLM model assumes that the sample is distributed among a mixture of $k$ Gaussian density functions within the discriminative latent subspace. The functions, each of which corresponds to a cluster, are defined by three parameters: a mean vector ($\mu_k$), a covariance matrix ($\Sigma_k$), and a scalar relative mixture proportion ($\pi_k$). The matrix $M$ enables the transformation of these parameters back to the full space. For the covariances, this includes the addition of Gaussian ‘noise’ ($\delta_k$; unique to each of the clusters), which is defined as non-discriminative structure that exists in the full space but not in the subspace. While $\Sigma_k$ captures the cluster covariances inside the discriminative latent subspace, $\delta_k$ captures the cluster covariances outside the subspace. Full space covariances are the sum of both. Estimation of the cluster means, covariances, and noise terms is discussed in Section 3.2.
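For the two-dimensional case of Fig. 1, the relation between subspace and full-space covariances can be sketched as below. This is a toy illustration, not the implementation of Bouveyron & Brunet (2012). We assume that the full-space covariance of a cluster takes the form $M \Sigma_k M^T + \delta_k (I - M M^T)$, i.e. that the noise is isotropic in the directions orthogonal to the subspace. The function names and the parametrisation of $M$ by an angle are hypothetical.

```python
import math


def full_space_covariance(theta, sigma2, delta):
    """2x2 full-space covariance of one cluster for a one-dimensional subspace.

    theta:  orientation of the latent axis in the (f1, f2) plane, in radians
    sigma2: variance of the cluster along the latent axis (inside the subspace)
    delta:  noise variance perpendicular to the latent axis (outside the subspace)
    Assumed form: M Sigma M^T + delta * (I - M M^T).
    """
    m = (math.cos(theta), math.sin(theta))  # the single orthonormal column of M
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            identity = 1.0 if i == j else 0.0
            inside = sigma2 * m[i] * m[j]               # M Sigma M^T
            outside = delta * (identity - m[i] * m[j])  # delta (I - M M^T)
            cov[i][j] = inside + outside
    return cov


def project(point, theta):
    """Latent feature f_l of a point (f1, f2): its coordinate along the latent axis."""
    return point[0] * math.cos(theta) + point[1] * math.sin(theta)


# A latent axis at 45 degrees, with more variance inside the subspace than outside
cov = full_space_covariance(math.pi / 4.0, sigma2=4.0, delta=1.0)
```

In this form the trace of the full-space covariance is the sum of the variance inside the subspace and the noise outside it, which is one way of reading the statement that full-space covariances are the sum of both.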


Implementation of the DLM model hence requires the estimation of the following parameters:
• $k - 1$ parameters for the mixture proportions ($\pi_k$; the proportions sum to 1, so one of them is fixed by the others);
• $kd$ parameters for the mean vectors ($\mu_k$) in the subspace;
• $kd(d + 1)/2$ parameters for the covariance matrices ($\Sigma_k$) in the subspace (fewer than $kd^2$ parameters because covariance matrices are symmetric);
• $d(D - (d + 1)/2)$ parameters for the transformation matrix $M$ (the number of free parameters, given the constraint that the basis vectors of the subspace must be orthonormal);
• $k$ noise terms ($\delta_k$; given that this non-discriminative structure is Gaussian and spherical, and may therefore be parametrised by a single value in reference to the Gaussian density function estimated for each cluster).

The total number of parameters ($q_{DLM}$) is most strongly influenced by the value of $d$. The maximum $q_{DLM}$ at a certain combination of $D$ and $k$ is given by setting $d$ to its maximum value of $k - 1$ ($k$ clusters may be distinguished in $k - 1$ dimensions). $q_{DLM}$ is smaller than the number of parameters that must be estimated for a Gaussian Mixture Model in the full space ($q_{GMM}$), especially if $d \ll D$ ($q_{GMM}$ is given by the sum of $k - 1$ parameters for the mixture proportions, $kD$ parameters for the mean vectors, and $kD(D + 1)/2$ parameters for the covariance matrices).

$q_{DLM}$ may be further reduced by imposing additional constraints upon the DLM model. For example, the covariance matrices ($\Sigma_k$) may be assumed to be the same for all Gaussians ($\Sigma$; the Gaussians all have the same shape). Alternatively, they may be assumed to be diagonal ($\alpha_{k,j}$, where the subscript $j$ indicates a different variance in each dimension of the subspace), meaning the latent features that define the subspace are uncorrelated. These diagonal covariance matrices may then also be assumed to be isotropic ($\alpha_k$; spherical Gaussians in the subspace), the same for all Gaussians ($\alpha_j$), or both ($\alpha$). The noise terms ($\delta_k$) may be assumed to be the same for all Gaussians ($\delta$) as well. Constraints like these may be imposed to speed up the clustering, in anticipation of a particular clustering structure, or (as in our case) to compare fits of models of varying complexities (see also Section 3.3). The various combinations of these constraints on the covariance matrices and noise terms yield 11 submodels of the full $\Sigma_k, \delta_k$ DLM model. They are listed in full in table 1 of Bouveyron & Brunet (2012) (and listed partially in Table 1 of this paper).
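The parameter counts itemised above can be tallied with a short sketch. It is an illustration only: the function names are hypothetical, the count applies to the unconstrained $\Sigma_k, \delta_k$ model, and the example values ($D = 9$, $k = 7$, $d = 6$) are chosen simply because they match the dimensionalities discussed in this paper.

```python
def q_dlm(D, k, d):
    """Free parameters of the full (Sigma_k, delta_k) DLM model."""
    proportions = k - 1                  # mixture proportions sum to 1
    means = k * d                        # mean vectors in the subspace
    covariances = k * d * (d + 1) // 2   # symmetric d x d matrices
    # d * (D - (d + 1) / 2), rearranged to stay in integer arithmetic
    transformation = d * (2 * D - d - 1) // 2
    noise = k                            # one spherical noise term per cluster
    return proportions + means + covariances + transformation + noise


def q_gmm(D, k):
    """Free parameters of a full-covariance Gaussian mixture in the full space."""
    return (k - 1) + k * D + k * D * (D + 1) // 2


print(q_dlm(D=9, k=7, d=6), q_gmm(D=9, k=7))  # prints: 235 384
```

For these values the subspace model needs 235 parameters against 384 for the full-space mixture, and the gap widens as $d$ becomes small relative to $D$.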

The Fisher Expectation-Maximisation algorithm (Fisher-EM or, as we will call it in this paper, FEM) estimates the parameters ($\pi_k$, $\mu_k$, $\Sigma_k$, $M$, $\delta_k$) of the DLM model, fitting a sample of $N$ observations, observed in a $D$-dimensional space (the ‘full’ space, defined by $D$ observed features), with $k$ Gaussian density functions in a $d$-dimensional discriminative latent subspace ($1 \le d \le k - 1$). FEM comprises the following steps:
(0) Initialisation: $k$ starting points are selected within the extent of the sample in the full space;
(1) Expectation (E): transform the parameters of the mixture of Gaussians to the full space, and calculate the probability of each observation having originated from each Gaussian;
(2) Fisher (F; based on discriminant analysis): using the observation probabilities, find the subspace that best separates the Gaussians;
(3) Maximisation (M): update the parameters of the mixture of Gaussians (including non-discriminative structure, termed ‘noise’) within the subspace.

The Expectation, Fisher, and Maximisation steps are iterated such that FEM improves its estimates of the DLM model parameters as it proceeds.

FEM is slow to run on our large samples and, unlike traditional expectation-maximisation algorithms, does not always converge perfectly (such that there are no changes between successive iterations; due to the Fisher step). We therefore terminate FEM at the completion of 25 iterations; changes between iterations become negligible well before this number (see Appendix A). The final output of FEM is a series of $k$ probabilities for each of the observations: probabilities of each observation having originated from each of the $k$ Gaussians. Final cluster labels are given by assigning each observation to the Gaussian from which it has the highest probability of having originated.

While successive iterations of FEM improve its estimates of the DLM model parameters, these estimates improve only towards local maxima of the likelihood function. FEM is hence run with varying initialisations, which may intuitively be considered as ‘exploring the surface’ of the likelihood function of the model parameters. This encourages optimisation towards different local maxima and, hopefully among these, the global maximum, corresponding to the very best estimate of the DLM model parameters.

Initialisation techniques may be as simple as a uniform random selection of $k$ observations from the sample. We opt to use the k-means algorithm (MacQueen 1967; Lloyd 1982), which implements a simple centroid-based clustering approach, to generate initialisations for FEM. k-means is an expectation-maximisation algorithm and, like FEM, only optimises to local maxima. We therefore initialise k-means itself 100 times in the hope of encouraging optimisation towards the global maximum of its objective function (which measures how separated the clusters are). Use of varying initialisations provided by a heuristic like k-means leads to ‘pre-optimisation’ of FEM because the separated centroids are likely to span the full extent of the sample in its full space. This facilitates improvement of FEM’s estimates of the DLM model parameters towards the global maximum of their likelihood function. Following this initialisation, FEM proceeds to the Fisher step, in which it finds the subspace that best separates the final k-means clusters, and to the Maximisation step, in which it fits the observations with a mixture of Gaussians within this subspace.
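The idea of restarting k-means many times and keeping the best outcome can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used for the results in this paper: the function names, the random-observation seeding, and the toy data are all assumptions, and the objective is expressed as the within-cluster sum of squares (to be minimised), which is one common way of measuring cluster separation.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from k randomly selected observations."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return inertia, labels, centroids


def best_kmeans(points, k, n_init=100, seed=0):
    """Repeat k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_init))
    return min(runs, key=lambda run: run[0])


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertia, labels, centroids = best_kmeans(toy_points, k=2, n_init=10)
```

The labels of the best run would then define the starting partition that is handed to the Fisher step.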

FEM then loops back around to the Expectation step and begins iterating proper.

The Expectation step uses the parameters estimated in the Maximisation step ($\pi_k$, $\mu_k$, $\Sigma_k$, $\delta_k$) to calculate the conditional probability of each observation having originated from each of the $k$ Gaussians. These parameters are transformed from the subspace, within which they are estimated in the Maximisation step, to the full space using matrix $M$, found in the Fisher step.

The Fisher step finds the $d$-dimensional discriminative latent subspace that best separates the new partition calculated in the Expectation step. Bouveyron & Brunet (2012) base this step on discriminant analysis, which finds the linear combination of the input features that maximises the ratio of the scatter between clusters to the scatter within clusters. Similar principles have been applied for the visualisation of multi-dimensional clusters as well (e.g. Lisboa et al. 2008). These scatters are weighted by the probabilities calculated in the Expectation step. A constraint of the DLM model is that the $d$ basis vectors that define the subspace must be orthonormal, which is not necessarily a property of the $d$ basis vectors that linear discriminant analysis (LDA) provides. Bouveyron & Brunet (2012) assert this constraint by applying the orthonormal discriminant vector method (ODV; Okada & Tomita 1985). ODV uses LDA to find the $d$ basis vectors in succession while also ensuring the orthonormality of each new basis vector with respect to all of those that have already been calculated. The first basis vector, which is free of this constraint, is given by the direct application of LDA to the sample in the full space. The $d$ orthonormal basis vectors constitute the columns of $M$, the matrix that enables the transformation of the sample between the full space and the subspace.

The Maximisation step updates the estimates of the means, covariances, and relative mixture proportions ($\pi_k$, $\mu_k$, $\Sigma_k$) of the $k$ Gaussians in order to maximise the likelihood of the fit. These estimates are measured within the subspace found in the Fisher step, and are weighted by the probabilities calculated in the Expectation step. This step also updates the estimates of the noise terms ($\delta_k$), which are given by the differences between the full-space variances (again weighted by the probabilities calculated in the Expectation step) and the newly updated subspace variances.

We do not presume a DLM submodel or value of $k$ with which to fit our samples. Instead, we conduct a search over all of the DLM submodels and over a range of values of $k$ to determine the best-fitting combination. Three of the DLM submodels ($\alpha_j, \delta_k$; $\alpha_j, \delta$; $\alpha, \delta_k$) are not available for use in the version of FEM that we use for our fitting. This reduces the total number of available submodels from 12 (including the full $\Sigma_k, \delta_k$ model) to nine.

We identify the best-fitting combination of DLM submodel and value of $k$ by using the Integrated Completed Likelihood criterion (ICL; Biernacki et al. 2000):

$$\mathrm{ICL} = \ln(L) - \frac{q_{DLM}}{2}\ln(N) - \left[-\sum_{i=1}^{N}\sum_{l=1}^{k} z_{i,l}\ln(p_{i,l})\right], \qquad (1)$$

where $L$ is the likelihood of the fit, $p_{i,l}$ is the probability of observation $i$ belonging to cluster $l$, and $z_{i,l}$ denotes cluster membership, taking a value of 1 when $p_{i,l} = \max(p_{i,:})$ and a value of 0 otherwise. The ICL is related to the popular Bayesian Information Criterion (BIC; Schwarz 1978).
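As a rough illustration of how such a score may be evaluated, the sketch below computes an ICL value from a log-likelihood, a parameter count, and a table of membership probabilities. It assumes the standard BIC-style penalty of $(q/2)\ln(N)$; the function name and the toy inputs are hypothetical and are not taken from our fits.

```python
import math


def icl_score(log_likelihood, n_params, probabilities):
    """ICL = ln(L) - (q / 2) ln(N) - [-sum_i sum_l z_il ln(p_il)].

    probabilities: one row per observation, one column per cluster.
    z_il is 1 for the most probable cluster of observation i and 0 otherwise,
    so only the largest probability in each row contributes.
    """
    n_obs = len(probabilities)
    penalty = 0.5 * n_params * math.log(n_obs)
    entropy = -sum(math.log(max(row)) for row in probabilities)
    return log_likelihood - penalty - entropy


# Toy comparison: identical likelihood and complexity, different separation
well_separated = [[1.0, 0.0]] * 4
poorly_separated = [[0.5, 0.5]] * 4
print(icl_score(-10.0, 2, well_separated) > icl_score(-10.0, 2, poorly_separated))
```

With the likelihood and the number of parameters held fixed, the partition with more confident memberships receives the higher score, which is the behaviour that distinguishes the ICL from the BIC.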
While both the BIC and ICL criteria penalise the likelihood using the number of model parameters (to avoid overfitting), the ICL criterion also rewards separated clusters (a general aim of clustering). The combination of submodel and $k$ that returns the highest ICL score is deemed the best fit.

The dimensionality of the discriminative latent subspace is constrained by the number of clusters being fitted: $1 \le d \le k - 1$. The maximal $d = k -$ $k$ cluster centres so that the full-space vectors to each of the remaining $k -$ FEM, we hold $d$ at its maximum value of $k - 1$. This is recommended by Bouveyron & Brunet (2012) to avoid omitting any discriminative structure from the subspace and to ease convergence of FEM (which may become unstable or fail to converge if $d$ is too small in comparison with $k$ and/or $D$). Hence, the maximum value of $k$ in our model selection search is 9 (set by $d = 8$, given $D = 9$). The version of FEM that we use is 1.5.1, for the R statistical computing environment.

The fitting of the clustering structures of both of our samples is conducted within a nine-dimensional feature space defined by UV-through-NIR colours. We opt for colours because of their widespread use in studies of galaxy evolution, and because of the relative ease with which they may be measured. While clustering in terms of derived astrophysical features may facilitate a more direct interpretation of resultant clusters in terms of theories of galaxy evolution, their derivation is much more model-dependent than that of colours. Clustering in terms of colours ensures the generalisability of our outcomes.

The colours that we use are calculated not from the observed photometry that is used as input to the SED fitting, but from rest-frame magnitudes estimated by CIGALE. This ensures homogeneity among the input features, and that the feature space is defined by rest-frame colours (which is more difficult to ensure using colours calculated directly from observed photometry). In addition, the SED estimation can infer the rest-frame magnitudes of galaxies in bands for which there is no observed photometry. The full list of rest-frame colours used for the clustering is: $FUV - NUV$, $NUV - u$, $u - g$, $g - i$, $i - r$, $r - z$, $z - J$, $J - H$, and $H - K_s$. These rest-frame colours are intended to represent the shape of each galaxy’s UV-through-NIR SED, and to remove the influence of the intrinsic brightnesses of the galaxies on the clustering outcomes. The rest-frame magnitudes of GSWLC-2 galaxies (but not VIPERS galaxies) are subject to some smoothing (see Appendix B). In addition, the rest-frame NIR colours of GSWLC-2 galaxies were inferred from UV and optical photometry (given the lack of input NIR photometry). Use of the term ‘colour’ from this point forward in this paper is intended in reference to these rest-frame colours, as estimated by CIGALE.

These colours differ from those used by Siudek et al. (2018b); they used rest-frame colours defined with reference to the rest-frame $i$-band magnitudes of galaxies ($FUV - i$, $NUV - i$, etc.), also with the aim of removing the influence of galaxy intrinsic brightnesses on their clustering outcomes. However, their UV colours, defined across the largest distances in wavelength among their features, exhibited large spreads (up to a factor of 10 larger than the spreads of other colours) and dictated much of their clustering. Preliminary tests of clustering with these $i$-band based colours for our present, carefully prepared samples confirmed this. The $\alpha_{k,j}$ and $\alpha_{k,j}$ submodels achieved the highest ICL scores for these $i$-band colours, but gave only relatively crude segmentations of our samples (see also Appendix C). Our colours, defined using magnitudes in filters at neighbouring effective wavelengths, mitigate this effect and encourage FEM to converge to more detailed partitions (although, as shown in Fig. 2, bluer colours are still most important).
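The construction of the nine-colour feature vector from rest-frame magnitudes can be sketched as below. The band pairs follow the list given above; the function name, the dictionary layout, and the magnitudes themselves are hypothetical values used only to show that a uniform change in brightness leaves the colours unchanged.

```python
# Band pairs exactly as listed in the text (first band minus second band)
COLOUR_PAIRS = [
    ("FUV", "NUV"), ("NUV", "u"), ("u", "g"), ("g", "i"), ("i", "r"),
    ("r", "z"), ("z", "J"), ("J", "H"), ("H", "Ks"),
]


def rest_frame_colours(magnitudes):
    """Return the nine colours for one galaxy from its rest-frame magnitudes."""
    return [magnitudes[first] - magnitudes[second] for first, second in COLOUR_PAIRS]


# Hypothetical rest-frame magnitudes for a single galaxy
galaxy = {"FUV": 21.5, "NUV": 20.75, "u": 19.5, "g": 18.5, "r": 18.0,
          "i": 17.75, "z": 17.5, "J": 17.25, "H": 17.0, "Ks": 16.75}

# The same galaxy made 2 mag brighter in every band
brighter = {band: mag - 2.0 for band, mag in galaxy.items()}

print(rest_frame_colours(galaxy) == rest_frame_colours(brighter))
```

Because every colour is a difference of two magnitudes, the nine features encode the shape of the SED rather than its normalisation.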

FEM submodel selection

As outlined in Section 3.3, we conduct a search for the best-fitting FEM submodel and number of clusters for our samples. We identify the best-fitting combination using the ICL criterion (Equation 1), which penalises the number of parameters of the submodel while favouring separated clusters. Table 1 lists ICL scores reported for both samples. The uncertainties on these scores, which span the full variation (i.e. from minimum to maximum) over 100 initialisations, show that FEM is extremely stable and self-consistent, robustly converging to highly similar outcomes over successive runs that use the same combination of submodel and number of clusters. The best-fitting combinations for each sample are highlighted using bold text. We briefly describe patterns of behaviour of the various submodels and explain the large range in ICL scores in Appendix C. Despite it registering the highest score for the GSWLC-2 sample, we reject the $k =$ $\Sigma, \delta$ combination due to its inclusion of empty clusters (explained further also in Appendix C).

Table 1. Integrated Completed Likelihood (ICL) scores reported by our search over all possible combinations of submodel (see Section 3.1 for further explanation) and $k$ for our samples. The uncertainties span the full range of ICL scores (i.e. from minimum to maximum) registered over 100 initialisations for each combination. As mentioned in Section 3.3, only nine of the 12 submodels are available in the version of FEM that we use for our fitting. The score of the best-fitting combination is highlighted using bold text. While submodel $\Sigma, \delta$ produces the highest score for our GSWLC-2 sample (at $k =$ FEM did not converge (see Appendix C). The entries listed in this table are subject to the multipliers at the right-hand side of each section. The ICL scores for our GSWLC-2 sample are systematically higher than those for our VIPERS sample because it contains more galaxies.
Columns (submodels): $\Sigma_k, \delta_k$; $\Sigma_k, \delta$; $\Sigma, \delta_k$; $\Sigma, \delta$; $\alpha_{k,j}, \delta_k$; $\alpha_{k,j}, \delta$; $\alpha_k, \delta_k$; $\alpha_k, \delta$; $\alpha, \delta$. Rows: values of $k$, in separate sections for the GSWLC-2 and VIPERS samples.

Both samples are best partitioned into seven clusters, within a six-dimensional discriminative latent subspace. The Gaussian density functions representing the clusters are each described by their own unique, full covariance matrices ($\Sigma_k$); the clusters each have different shapes, and the use of full covariance matrices indicates correlations (as expected) among the input features within the subspaces. While the best-fitting submodel for the GSWLC-2 sample uses unique noise terms for each cluster ($\delta_k$), the best-fitting submodel for the VIPERS sample does not ($\delta$), owing to the smoother distribution of the VIPERS sample in the feature space (see e.g. Fig. 3). Submodels $\Sigma_k, \delta_k$ and $\Sigma_k, \delta$ report similar ICL scores and produce similar clustering structures in general and may therefore readily be compared with one another (see also Appendix C). That FEM has converged to highlighting these closely related submodels as being optimal for describing both samples is encouraging, and gives us confidence that we are conducting a fair comparison.
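The selection logic described in this section (take the highest ICL score, but reject outcomes that contain empty clusters) can be sketched as follows. The record format, the submodel labels, and the scores are hypothetical placeholders for illustration; they are not the values of Table 1.

```python
def select_best_fit(fits):
    """Return the fit with the highest ICL score among those without empty clusters.

    Each fit is a dict with keys 'submodel', 'k', 'icl', and 'cluster_sizes'
    (a hypothetical record format used only for this illustration).
    """
    valid = [fit for fit in fits if min(fit["cluster_sizes"]) > 0]
    if not valid:
        raise ValueError("every candidate fit contains an empty cluster")
    return max(valid, key=lambda fit: fit["icl"])


# Toy candidates: 'B' scores highest but leaves two clusters empty
toy_fits = [
    {"submodel": "A", "k": 3, "icl": -120.0, "cluster_sizes": [40, 35, 25]},
    {"submodel": "B", "k": 4, "icl": -100.0, "cluster_sizes": [60, 40, 0, 0]},
    {"submodel": "C", "k": 3, "icl": -110.0, "cluster_sizes": [50, 30, 20]},
]

best = select_best_fit(toy_fits)
print(best["submodel"], best["k"])  # prints: C 3
```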

In Fig. 2, we show the relative importance of each input feature to the clustering. Specifically, we calculate the mutual information ($MI$) between each input feature and the output cluster labels:

$$MI(f, l) = D_{KL}\left(p_{f,l} \,\|\, p_f\, p_l\right). \qquad (2)$$

Here, $D_{KL}$ is the Kullback-Leibler divergence (Kullback & Leibler 1951; also known as the relative entropy) between the joint probability distribution of input feature $f$ and output label $l$, and their independent distributions. For Fig. 2, $MI(f, l)$ is normalised by its sum across all input features to give a relative value.
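Equation 2 and the normalisation used for Fig. 2 can be illustrated with a small sketch. It assumes that each colour has already been discretised into bins, which is one simple way of estimating the probability distributions and not necessarily the estimator used for Fig. 2; the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_bins, labels):
    """MI(f, l) = D_KL(p_{f,l} || p_f p_l) for a binned feature and cluster labels."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    count_f = Counter(feature_bins)
    count_l = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_joint = count / n
        p_independent = (count_f[f] / n) * (count_l[l] / n)
        mi += p_joint * math.log(p_joint / p_independent)
    return mi


def relative_importance(binned_features, labels):
    """Normalise each feature's MI by the sum across all features."""
    scores = {name: mutual_information(bins, labels)
              for name, bins in binned_features.items()}
    total = sum(scores.values())  # assumes at least one informative feature
    return {name: score / total for name, score in scores.items()}


# Toy example: one feature tracks the labels exactly, the other not at all
toy_labels = [0, 0, 1, 1]
toy_features = {"informative": [0, 0, 1, 1], "uninformative": [0, 1, 0, 1]}
print(relative_importance(toy_features, toy_labels))
```

In this toy case all of the relative importance falls on the feature that tracks the labels, while the feature that is independent of them contributes nothing.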

Figure 2. The relative importance of each of the input features to the clustering. ‘F’ stands for FUV, and ‘N’ for NUV. The mutual information (see Section 4.2 and Equation 2) of each of the input features with respect to the cluster labels has been normalised by the sum across all of the input features for each sample.

The lines in Fig. 2 are broadly similar, indicating that, on the whole, FEM uses the nine features in a similar way to determine its best partitions. This is further confirmed by noting that the subspaces within which FEM determined these best partitions have the same dimensionality (6) for both samples. The lines are especially consistent among the optical colours, which is expected given that optical photometry is ubiquitously available for galaxies in both samples. Altogether, the optical regime is the most important to the clustering. Individually, colours from the UV region of the SEDs of the galaxies in both samples are most strongly related to the output cluster labels. This highlights, as expected, the star formation activity and the dust content of galaxies as major influences on the shapes of their UV-through-NIR SEDs.

UV colours are slightly more important for the clustering in our GSWLC-2 sample, which reflects the increased UV coverage of its galaxies by GALEX (80 per cent, as opposed to 52 per cent for our VIPERS sample). NIR colours are less important for distinguishing clusters within our GSWLC-2 sample than within our VIPERS sample, which is likely due to their having been inferred purely from UV and optical input photometry. This is in contrast with the galaxies in our VIPERS sample, whose NIR SEDs (more important to the clustering) were instead constrained by $K_s$-band photometry. For galaxies with incomplete photometry, the array of templates and synthetic spectra with which CIGALE may fit them is reduced, leading to reduced variation in the shapes of their SEDs. In addition, the rest-frame magnitudes (and hence, rest-frame colours) that CIGALE must infer from photometry at other wavelengths have larger uncertainties. Hence, availability of photometry with which to constrain the SEDs of galaxies is advantageous to the clustering. Nevertheless, Fig. 2 shows that, for the most part, FEM uses the features similarly to model both samples despite slight differences in this availability, which is driven mostly by the ubiquitous availability of optical photometry for both samples.

Table 2 profiles the clusters determined within both samples. Features are derived from the same SEDs as the colours used for the clustering (see Sections 2.1.1 and 2.2.1) as well as from ancillary sources (see Sections 2.1.2 and 2.2.2). Clusters are named using two-part notation that will be used throughout the remainder of this paper; prefixes ‘G’ or ‘V’ denote clusters determined within the GSWLC-2 and VIPERS samples respectively. Cluster names have been ordered by their mean $NUV - r$ colours for ease of reference.

Fig. 3 shows projections of our samples onto the two principal dimensions of their respective six-dimensional discriminative subspaces. These projections, which offer direct views of the structures of the clustering outcomes, are determined uniquely for each sample by FEM: the axes of the two plots do not correspond exactly to one another. Nevertheless, these projections are broadly similar in terms of the shapes of the overall samples within them. Both samples exhibit continua in these projections, running from the lower right to the upper left of each plot, which have been segmented by FEM. That this segmentation is robustly reproducible over successive runs of FEM (Table 1) indicates that FEM has captured astrophysically meaningful structures in the samples. In addition, both samples exhibit a cluster which extends into the sparser region to the upper right of each plot. This overall similarity suggests that the evolution of galaxies at the epochs of the two samples is mostly similar. It also gives us confidence in the success of the measures taken to ensure a fair comparison between samples at different redshifts and from different surveys (see Sections 2.1.2 and 2.2.2), and reinforces our conclusion that FEM has overall used the input features similarly for both samples in spite of slight differences in the availability of photometry between them (see Section 4.2). The subtler differences between clusters in these projections are subject to the distributions of galaxies within the shapes of their respective samples. We comment on these differences where relevant in Section 4.4. Cluster colours in the plots in this paper, like their names, are assigned based on their mean $NUV - r$ colours.

We break down the analysis of our clusters by using the two-dimensional colour bimodality of galaxies as a simple framing device. The colour bimodality is a steady property of the galaxy population throughout cosmic time, having been observed among galaxies with redshifts as high as 4 (Wuyts et al. 2007; Williams et al. 2009; Ilbert et al. 2010, 2013). Hence, we may use it to separate clusters that are more strongly associated with the blue peak (containing mostly star-forming galaxies) from clusters that are more strongly associated with the red peak (containing mostly passive galaxies) in a way that is independent of redshift.

While the Two Micron All-Sky Survey (Skrutskie et al. 2006) has NIR photometry for ∼50 per cent of GSWLC-2 galaxies, it is shallow and would not have provided strong constraints upon their NIR SEDs.

This two-dimensional separation is marked by the black lines in Fig. 4. The $NUV - r - K_s$ colour-colour plane (Arnouts et al. 2013; Moutard et al. 2016b) is a useful tool with which to probe galaxy subpopulations due to its ability to separate star-forming (low $NUV - r$), passive (high $NUV - r$), and also dusty (high $r - K_s$) galaxies. It has been applied in several studies of galaxy evolution using data from VIPERS (e.g. Fritz et al. 2014; Davidzon et al. 2016; Moutard et al. 2016b; Siudek et al. 2017, 2018b; Vergani et al. 2018). The form of the black lines is inspired by Fritz et al. (2014) and Moutard et al. (2016b); they are placed independently in each panel, without reference to the positions of the clusters, to simply demarcate the star-forming and passive regions of the $NUV - r - K_s$ plane. Clusters whose means then lie below the black line in each plot are selected as ‘blue’, ‘star-forming’ clusters, and clusters whose means then lie above the black lines are selected as ‘red’, ‘passive’ clusters. As a result, both samples break down into four blue clusters and three red clusters. Deviations of the structures of the clusters from this simple blue/red (star-forming/passive) division that we enforce (e.g. clusters that overlap or span this division) will highlight limitations of a purely two-dimensional view of the galaxy population and its bimodality. The separation between these two main families of clusters suggests differences in the evolution processes influencing the galaxies that they contain.

The blue peak of the bimodality corresponds closely with the star-forming main sequence (SFMS; Noeske et al. 2007; Salim et al. 2007), which is the tight correlation between the SFRs and the stellar masses of actively star-forming galaxies. The SFMS, like the bimodality, is ubiquitous throughout cosmic time (Speagle et al. 2014). It has a lower normalisation with decreasing redshift; this cosmological decline of star formation (Madau et al. 1996; Madau & Dickinson 2014; Driver et al. 2018) is visible as a vertical offset between the samples in Fig. 4. In this paper, the terms ‘blue peak’ and ‘SFMS’ are synonymous, and we use them interchangeably.

The stronger $NUV - r$ split between star-forming and passive VIPERS clusters in comparison with those of GSWLC-2 (Figs. 4 and 3) is likely to result from two factors. First is the difference in the rest-frame wavelength coverage of GALEX photometry for the two samples; some rest-frame UV emission is redshifted out of the bandwidths of GALEX’s filters at $z \sim 0.65$. Second is the difference in the completeness of UV photometry for each sample. GALEX observations exist for ∼80 per cent of galaxies in clusters G1-4. This proportion falls to ∼55 per cent in clusters G5-7, but this is expected given that these galaxies would be fainter in the UV regime. Meanwhile, ∼65 per cent of V1, V2, and V4 galaxies were observed by GALEX. Interestingly, only ∼20 per cent of galaxies in V3 have observed UV photometry, which may explain its separation from the other star-forming VIPERS clusters. Passive VIPERS clusters are ∼25 per cent complete in observed UV photometry. Together, these factors mean we are likely to miss low levels of UV emission from more evolved VIPERS galaxies with more intermediate colours. On the other hand, Fig. 2 shows that rest-frame $NUV - u$ colours are similarly important to the clustering structures of both samples, with $NUV$ emission expected to be a particularly accurate tracer of star formation (Salim 2014).


Table 2. Profiles, in terms of averages, of the clusters determined within each of our samples. See the main text for an explanation of the cluster naming scheme. We list cluster means in columns $NUV - r$ and $r - K_s$. For the remaining features, which are less directly linked to the clustering, we opt for medians to mitigate the potential influence of outliers on the cluster profiles. Column ‘%’ lists the percentage of galaxies contained within each cluster for each sample. The data in the next seven columns [$NUV - r$ to $\log(sSFR / \mathrm{yr}^{-1})$ (SED)] originate from the same CIGALE SEDs as the rest-frame colours that were used as inputs to the clustering. Features listed in this table include colour excesses [$E(B - V)$], stellar masses ($M_*$), stellar metallicities ($Z$), mass-weighted stellar ages ($MWSA$), and specific star formation rates ($sSFR$). We list sSFRs both determined by CIGALE (SED; averaged over 100 Myr timescales) and determined from galaxy spectra (and hence independent of CIGALE; ind.; see Sections 2.1.2 and 2.2.2). Medians marked with asterisks have unexpected values given their corresponding $NUV - r$ colour and are discussed in Section 4.4.2.

Cluster  %     NUV−r  r−Ks  E(B−V)  log(M*/M⊙)  log(Z)   log(MWSA/Myr)  log(sSFR/yr⁻¹) (SED)  log(sSFR/yr⁻¹) (ind.)
G1       24.?  ?.39   0.42  0.11    9.?         −?.22    3.?            −?.?                  −?.?
G2       ?.?   ?.29   0.91  0.20    10.?        −?.81    3.?            −?.?                  −?.?
G3       ?.?   ?.51   0.78  0.14    10.?        −?.11    3.?            −?.?                  −?.?
G4       ?.?   ?.31   1.16  0.13    10.?        −?.75    3.?            −?.?                  −?.?
G5       ?.?   ?.07   0.67  0.22    10.?        −?.30    3.?            −?.?                  −?.?
G6       ?.?   ?.24   0.78  0.08    10.?        −?.11    3.?            −?.?                  −?.?
G7       ?.?   ?.27   0.73  0.11    10.?        −?.20    3.?            −?.?                  −?.?
V1       ?.?   ?.86   0.25  0.01    9.?         −?.12    3.?            −?.?                  −?.?
V2       ?.?   ?.17   0.60  0.02    10.?        −?.90    3.?            −?.?                  −?.?
V3       ?.?   ?.62   0.75  0.05    10.?        −?.40    3.?            −?.?                  −?.?
V4       ?.?   ?.26   1.05  0.12    10.?        −?.80    3.?            −?.?                  −?.?
V5       ?.?   ?.75   0.91  *0.15   10.61       *−?.51   *3.52          *−?.?                 −?.?
V6       ?.?   ?.81   0.90  *0.15   10.69       *−?.86   *3.61          *−?.?                 −?.?
V7       ?.?   ?.86   0.96  0.02    10.?        −?.05    3.?            −?.?                  −?.?

Figure 3. Projections of both samples onto the two dimensions that best separate their clusters. The axes of each plot are determined by FEM and are unique to each sample (as indicated by their labels; e.g. 𝑆 𝐺 .

Our $NUV - r - K_s$ cut (Section 4.3) yields the following blue clusters: G1, G2, G3, and G4 for the GSWLC-2 sample; and V1, V2, V3, and V4 for the VIPERS sample. While dominated by blue galaxies, clusters G4 and V4 also contain a significant number of galaxies with green or red $NUV - r$ colours (including the vast majority of green valley galaxies). Fig. 5 shows that the SEDs of G4 galaxies are generally more similar to those of actively star-forming galaxies (e.g. G3 galaxies), being flatter in the UV regime, than to those of typically passive galaxies (e.g. G5 galaxies). Hence, in terms of the influence of their evolutionary histories on the shapes of their SEDs, G4 galaxies appear more closely related to G1-3 galaxies than to G5-7 galaxies, despite some G4 galaxies occupying the passive region of the $NUV - r - K_s$ plane in Fig. 4. Similarly, the SEDs of V4 galaxies more closely resemble those of V1-3 galaxies than those of V5-7 galaxies (not shown).

Figure 4. Colour-colour plots of our samples. Colours are derived from CIGALE SED estimation. The distributions of clusters are shown using coloured, filled contours (drawn at a relative density of 0.

Table 3. Profiles, in terms of averages of ancillary features, of the clusters determined within each of our samples. See the main text for an explanation of the cluster naming scheme. We list the median values of the galaxies that the clusters contain for each of the features. Column ‘%’ lists the percentage of galaxies contained within each cluster for each sample. Features listed in this table include Sérsic indices ($n_g$), half-light radii ($R_{1/2}$), and environmental overdensities ($\delta$). The data are drawn from ancillary sources (see Sections 2.1.2 and 2.2.2).

Cluster  %     n_g   log(R_1/2 / kpc)  log(1 + δ)
G1       24.?  ?.04  0.57              0.?
G2       ?.?   ?.34  0.50              0.?
G3       ?.?   ?.57  0.55              0.?
G4       ?.?   ?.38  0.61              0.?
G5       ?.?   ?.09  0.40              0.?
G6       ?.?   ?.18  0.45              0.?
G7       ?.?   ?.25  0.44              0.?
V1       ?.?   ?.92  0.49              0.?
V2       ?.?   ?.95  0.48              0.?
V3       ?.?   ?.11  0.50              0.?
V4       ?.?   ?.53  0.55              0.?
V5       ?.?   ?.31  0.42              0.?
V6       ?.?   ?.29  0.40              0.?
V7       ?.?   ?.40  0.43              0.?

Given that the SFMS is a smooth continuum, it is important where possible to establish why FEM has distinguished clusters within it, and to interpret the significance of these distinctions in terms of galaxy evolution. The position of a galaxy along the $NUV - r - K_s$ SFMS (Fig. 4) is governed by a combination of its stellar mass and its dust content (Moutard et al. 2016a,b). The lobe at high $r - K_s$, which preferentially consists of edge-on galaxies, is known to capture the excess reddening of high-mass star-forming galaxies (Arnouts et al. 2013), but it is more difficult to disentangle this combination of stellar mass and dust elsewhere within the SFMS. Hence, we see an overlap of star-forming clusters in Fig. 4. In Fig. 3, though, these clusters are more clearly separated.

G1 and V1 capture equivalent subpopulations of galaxies. Both clusters contain the galaxies with the bluest colours and the lowest masses (Fig. 4, Table 2) within their respective samples: star-forming galaxies at relatively early stages of their evolution. The remaining star-forming clusters have higher masses and lie further along the SFMSs of each sample.

Clusters G2 and G3 overlap with one another in the left-hand panel of Fig. 4, as do clusters V2 and V3 in the right-hand panel of the same figure. Fig. 3 shows that G2 and V3 both extend away from the main continua within the subspace projections of their respective samples. The feature vector projections in Fig. 3 show that the galaxies in these clusters have particularly red $FUV - NUV$ colours in comparison with other SFMS clusters. However, the astrophysical meaning behind this is unclear. CIGALE alternately attributes this reddening to high colour excesses for galaxies in G2 and to higher metallicities for galaxies in V3 (Table 2), suggesting that it has not fully resolved the degeneracy between the influences of dust and metallicity upon the colours of these galaxies. However, CIGALE is consistent in assigning G2 and V3 galaxies similar stellar masses and mass-weighted stellar ages to G3 and V2 galaxies (Table 2), which occupy similar regions of the $NUV - r - K_s$ plane. Stellar mass estimates are not strongly affected by an inability to resolve this degeneracy between the influences of dust and metallicity (e.g. Bell & de Jong 2001). Clusters G3 and V2, lying on the main continua in Fig. 3, seem to be intermediate between clusters G1 and G4, and V1 and V4 respectively.

The star-forming clusters along the SFMS of our GSWLC-2 sample exhibit a gradient in their star formation activity. Taking their increasing average stellar masses as a point of reference, clusters G1-4 exhibit a corresponding increase in their average $NUV - r$ colours (Table 2, Fig. 4), decrease in their average sSFRs (both SED and ind.; Table 2), and increase in their average $D(4000)$ (Fig. 6). High-mass galaxies in our GSWLC-2 sample do not form stars as readily as low-mass galaxies. This gradient is weaker for clusters V1-3 (particularly with regard to their median sSFRs; Table 2), though we note that clusters V2 and V3 have lower average stellar masses than G2 and G3. It is only in V4 that we see a rise in average stellar mass accompanied by a decrease in average $sSFR$, and an increase in $D(4000)$.

Figure 5. A comparison of the shapes of the mean (± standard deviation) estimated SEDs of galaxies in clusters G3, G4, and G5. Clusters G3 and G5 are chosen as they neighbour G4 in terms of their average $NUV - r$ colour. The estimated SEDs of individual galaxies are normalised by their $r$-band magnitudes (the effective wavelength of which is marked by a dashed black line) before the mean estimated SEDs are calculated. The y-axis applies to the mean SED of G5; those of G3 and G4 are vertically offset by − −

Figure 6. Smoothed kernel density estimates in $D(4000)$ (logarithmically distributed) for each of the clusters from both outcomes. Here, $D(4000)$ was measured from the spectra of galaxies (Brinchmann et al. 2004; Garilli et al. 2014) using a method introduced by Balogh et al. (1999), and is hence independent of CIGALE’s estimated SEDs.

The large median sizes and low-to-intermediate median Sérsic indices of star-forming clusters from both samples indicate that they are dominated by disc galaxies (Table 3). Clusters G1-4 exhibit a rise in their median $n_g$ to intermediate values along their SFMSs, indicating increasingly concentrated morphologies among their galaxies. In Fig. 7, these clusters form morphological sequences that are separate from the distributions of passive clusters in the same plane. The sequence of V1-4 is not as strong as that of G1-4; again, it is only in V4 that we see a significant change, with the higher stellar masses of its galaxies met with intermediate Sérsic indices.

While there are slight trends in the median local environmental overdensities of the star-forming clusters in both samples (Table 3), Fig. 8 shows that the distributions of these overdensities have very large spreads and exhibit a great deal of overlap with the distributions of other SFMS clusters from the same sample. Therefore, we cannot attribute the reduction in the star formation activity of SFMS galaxies at higher masses to mainly environmental causes for either sample.

Our red clusters, selected in two dimensions using the $NUV - r - K_s$ plots in Fig. 4, are: G5, G6, and G7 for our GSWLC-2 sample, and V5, V6, and V7 for our VIPERS sample. The colour that best separates the passive clusters in both samples is $FUV - NUV$. For G5-7, this separation corresponds with the higher sSFRs and lower masses of G5 galaxies, and differences in the metallicities of G6 and G7 galaxies (Table 2). V7 has been distinguished due to the high masses and low sSFRs of its galaxies. However,

CIGALE’s estimation of the astrophysical properties of V5 and V6 galaxies is less reliable (see below). In general, galaxies in the passive clusters are offset to redder NUV − u colours than those in the SFMS clusters (Section 4.4.1).

Galaxies in clusters G6, G7, and V7 are alike with respect to most features. They share high stellar masses, low sSFRs, large D(4000) (Fig. 6), and early-type morphologies (Table 2), all of which are typical of canonically passive galaxies. CIGALE attributes the difference in the FUV − NUV colours of G6 and G7 galaxies (i.e. the feature that best separates these clusters) to their metallicity distributions. While G6 peaks strongly at Z ∼ − . 1, G7 is split evenly between peaks at Z ∼ . Z ∼ − . 4. The metallicities of passive GSWLC-2 galaxies are discretised, owing both to the input Bruzual & Charlot (2003) grid and to a lack of any input NIR photometry during their SED estimation (see Appendix B); with more precise metallicities, their distributions might overlap more. V7 also has low metallicities in comparison with other clusters determined in its sample. We note that these sub-solar metallicities are unexpected for high-mass passive galaxies (e.g. Gallazzi et al. 2006), indicating difficulties of breaking the age-dust-metallicity degeneracy with photometry alone, and suggesting that these metallicities are not entirely reliable. Altogether though, these clusters contain the oldest, most evolved galaxies among their respective samples: a subpopulation that is in place at the epoch of our VIPERS sample.

Galaxies in cluster G5, while also passive and early-type, have lower stellar masses than those in clusters G6 and G7. We also note a difference in the G5 median sSFRs as reported by CIGALE (SED) and by the Brinchmann et al. (2004) calibration (ind.; Table 2). G5 may contain post-starburst galaxies (PSBs; Wild et al. 2009), with this difference in sSFRs possibly arising due to the different timescales probed by these two measures (see Section 7 of Salim et al. 2016). While the fibre component of sSFR (ind.) is a more instantaneous measure of star formation activity (∼ 10 Myr, based on Hα emission), CIGALE averages star formation over a longer period of time (100 Myr, matching the timescale traced by UV emission). Hence, even if the tail of a declining central burst of star formation activity is not captured by sSFR (ind.), it may still be captured by sSFR (SED). The spheroidal morphologies (Fig. 7, Table 3) and enhanced local environmental densities of G5 galaxies suggest an external influence upon their evolution (see Section 5.2), which is consistent with previous studies which link PSBs with mergers (Zabludoff et al. 1996; Yang et al. 2008; Almaini et al. 2017).

Clusters V5 and V6 present conflicting identities in terms of features estimated by
CIGALE (Table 2). While their galaxies have very similar stellar masses and morphologies to those in V7 (Table 3), they have unusually high colour excesses, metallicities, and sSFR (SED). This is in contrast with the sSFR (ind.) and observed D(4000) values of these galaxies (Table 2, Fig. 6), which show that they are indeed passive. The large spread in D(4000) of V5 may be due to some minor contamination of the cluster by star-forming galaxies; its NUV − r − K_s contour extends below the black line in Fig. 4, into the region containing dusty star-forming galaxies. This may also drive its median E(B − V) to a higher value.

Figure 7. Sérsic index versus stellar mass for the galaxies in our samples. Sérsic indices were determined by Simard et al. (2011) for our GSWLC-2 sample, and Krywult et al. (2017) for our VIPERS sample. The distributions of clusters are shown using coloured, filled contours.

Figure 8. Smoothed kernel density estimates in local environmental overdensity (δ) for each of the clusters from both outcomes. For both samples, these overdensities are based on fifth-nearest neighbour surface densities (Baldry et al. 2006; Cucciati et al. 2017).

The inability of CIGALE to properly resolve the age-dust-metallicity degeneracy for V5 and V6 galaxies is due to the UV regions of their SEDs. Fig. 9 shows that V5 and V6 have steeper average UV SEDs than V7. To explain the red UV colours (especially FUV − NUV) of their galaxies,
CIGALE invokes high colour excesses and metallicities rather than low sSFR (SED). This appears to be a consequence of CIGALE’s two-burst SFHs, which may not be a realistic description of the SFHs of most passive VIPERS galaxies. These SFHs were adjusted for the epoch of our VIPERS sample by setting the formation time of the old population to 6 . < 50 Myr). However, a trial of the use of a gradual 1 Gyr quenching episode instead led to improvements in the quality of fit of passive SEDs (with low sSFR) to the photometry of the majority of V5 and V6 galaxies. Hence, it seems that further adjustments to CIGALE’s SFH prescription are required when applying it at higher redshifts.

LePhare (Ilbert et al. 2006) SED estimation for the same galaxies (Moutard et al. 2016b; Siudek et al. 2018b) used single exponentials for its SFHs and reported lower colour excesses, metallicities, and sSFR.

Figure 9. A comparison of the shapes of the mean (± standard deviation) estimated SEDs of galaxies in clusters V5, V6, and V7. The estimated SEDs of individual galaxies are normalised by their r-band magnitudes (the effective wavelength of which is marked by a dashed black line) before the mean estimated SEDs are calculated. The y-axis applies to the mean SED of V7; those of V5 and V6 are vertically offset.

Galaxies contained within the passive clusters of our VIPERS sample tend to have higher stellar masses than those contained within the passive clusters of our GSWLC-2 sample (Table 2). This is likely to be driven by stellar mass incompleteness of our VIPERS sample. Davidzon et al. (2013) show that, even at its lower redshift limit of z = 0.5, our VIPERS sample is incomplete in passive galaxies below ∼ M⊙. Furthermore, their completeness threshold increases with redshift to 10 . M⊙ at our upper limit of z = 0.8, and thus skews our clusters of passive VIPERS galaxies towards higher stellar masses. Hence, where the GSWLC-2 sample has two lobes of passive galaxies in Fig. 3 (see also Appendix B), which differ in average stellar mass by ∼ . FEM to model them with a dedicated cluster (i.e. like G5).

Passive clusters in both samples have high Sérsic indices and compact sizes (Table 3), indicating spheroid-dominated morphologies. They occupy separate regions of the plots in Fig. 7 to their respective SFMS clusters. Fig. 7 also shows that the n_g distributions for passive clusters are highly consistent with one another. While the passive clusters in our GSWLC-2 sample exhibit a slight offset to higher density environments in comparison with star-forming GSWLC-2 clusters, the environments of passive VIPERS clusters are consistent with those of star-forming VIPERS clusters. This difference between the two samples is, in part, expected, due to the emergence of environments of especially high densities over cosmic time (e.g. Marinoni et al. 2008; Kovač et al. 2010; Fossati et al. 2017). However, factors such as spectroscopic fibre collisions and the aforementioned incompleteness of passive VIPERS galaxies may also reduce the completeness of VIPERS at high densities. This incompleteness does not appear to have strongly affected clusters elsewhere in the feature space (Fig. 3).

Our clusters have been determined on the basis of the rest-frame colours of galaxies alone. In this section, we aim to discern what the trends of these purely colour-based clusters with other, ancillary features (see Sections 4.4.1 and 4.4.2) tell us about how strongly the SEDs of their constituent galaxies encode their evolution.

Alongside being closely related in terms of the shapes of their SEDs (see Section 4.4.1), both sets of star-forming clusters (G1-4 and V1-4) form clear morphological sequences in Fig. 7. In Fig. 10, we examine the bulge-to-total ratios of GSWLC-2 galaxies using two-component Simard et al. (2011) fits (no such data exist for VIPERS). The G1-4 sequence is apparent here as well, capturing the rising prominences of the bulges of their galaxies. It does not extend to the highest B/T_r values, despite G4 also containing some quenching and quenched galaxies. This indicates that G1-4 galaxies retain their discs as they evolve and that some G1-4 galaxies become passive without fully transforming their morphologies. The changing bulge-disc balance appears to be captured also in the large spread in D(4000) of G4 galaxies in particular (Fig. 6). The overlapping environmental distributions of star-forming clusters in both samples (Fig. 8) suggest that these morphological sequences of gradual bulge growth are more likely to be due to internal processes (i.e. that act in all environments). We assume that our interpretation in this paragraph applies to galaxies in V1-4 as well.

Our ‘secure’ redshift criterion (Section 2.2) may contribute slightly to the stellar mass incompleteness of our VIPERS sample (i.e. by selecting against faint, passive VIPERS galaxies that lack emission lines or strong absorption lines). However, Davidzon et al. (2013) used a more relaxed criterion, so we do not expect our use of this criterion to significantly influence stellar mass completeness.

Star-forming galaxies and clusters are affected by this incompleteness to a much lesser degree.

Figure 10. Bulge-to-total ratio (B/T_r) versus stellar mass for the galaxies in our GSWLC-2 sample. Here, the subscript ‘r’ denotes the r-band photometry from which the ratios were derived (Simard et al. 2011; based on two-component fits). The distributions of clusters are shown using coloured, filled contours.

Bar-driven inflows of star-forming gas (Sheth et al. 2005), an internal process that acts over long timescales, constitute a likely candidate process. These inflows are commonly invoked to explain the formation of dynamically cold ‘pseudobulges’ (n_cl ≲ 2) rather than the dynamically hot ‘classical’ (n_cl ≳
2) bulges that the Simard et al. (2011) two-component fits assume (Kormendy & Kennicutt 2004; Fisher & Drory 2008; Mishra et al. 2017). However, an increase in the prominence of pseudobulges would nonetheless be expected to be captured by the single-component fits which yield the Sérsic indices in Table 3 and Fig. 7. We do not rule out that SFMS galaxies may have undergone major and/or minor mergers or clump migration (a faster, more violent internal process; Elmegreen et al. 2008; Bournaud et al. 2011; Tonini et al. 2016) in their pasts; some have high total n_g values, which may capture classical bulges formed as a result of these processes. Instead, we proffer that these processes do not contribute to the gradual growth of the bulges of these galaxies. It has been shown, for example, that the remnant of a gas-rich merger can reform a disc and continue to form stars, thus rejoining the SFMS (Hopkins et al. 2009a,b).

The falling sSFRs of galaxies along the sequences G1-4 and V1-4 suggest that their morphologies are also linked with their quenching. This could be due to morphological quenching (i.e. the gravitational influence of the morphological components of galaxies upon star formation; Martig et al. 2009). It is more likely, though, that the prominences of the bulges among these galaxies are a marker of nuclear activity. More massive bulges host more massive black holes at their centres (Häring & Rix 2004), which supply more feedback energy to their surrounding galaxies. This feedback can inhibit further star formation by ejecting star-forming gas (Croton et al. 2006; Gabor et al. 2011; Vergani et al. 2018) or by preventing the cooling of newly accreted gas (above the ‘transition mass’, ∼ . M⊙ at z ∼ sSFR in comparison with G1-3 galaxies (Table 2), supporting the suggestion that supermassive black holes are involved in their quenching.

That the sSFRs of V1-3 galaxies do not decline as strongly as those of G1-3 galaxies may be tied to their morphologies; all three also have very low median n_g. This suggests that their bulges and/or supermassive black holes have not yet grown to the extent that they can effectively inhibit star formation. This would be consistent with Fang et al. (2013) and Bluck et al. (2014), who find that bulges must exceed a threshold in mass or central density before they become associated with quenching. For V4 galaxies, the reduction in sSFR is met with a rise to intermediate median n_g, suggesting that this threshold bulge mass has been achieved in some V4 galaxies.

Altogether, G1-4 and V1-4 galaxies (which include the vast majority of green valley galaxies) appear to evolve slowly and secularly (Schawinski et al. 2014; Ilbert et al. 2015; Moutard et al. 2016b; Pacifici et al. 2016). This is reflected in the similarity of their SEDs, which all feature relatively flat UV regions that suggest a gradual reduction in their star formation over time. It is also reflected in the morphological sequences that their clusters exhibit. The rising bulge prominences and declining star formation rates of these galaxies suggest that nuclear feedback, fuelled by bar-driven inflows, is the main mechanism driving their evolution (Gabor et al. 2011; Moutard et al. 2020). While this mechanism appears to act at the epochs of both samples, the connection between morphologies and star formation is stronger at lower redshifts. This may be linked with the long timescales over which these internal processes act, such that the gradual evolution of V1-4 galaxies may eventually lead to the more evolved distribution of galaxies given by clusters G1-4, which we assume to be their descendants. Hence, the rising prevalence of bulges grown by internal processes over cosmic time (e.g.
Bruce et al. 2012; Gu et al. 2019) would appear to be linked to the cosmic decline of star formation activity. This connection between the bulges and the star formation of SFMS galaxies has previously been established (Cheung et al. 2012; Fang et al. 2013; Bluck et al. 2014; Cano-Díaz et al. 2019; McPartland et al. 2019), but in our case it emerges purely from our clustering of galaxy colours, with morphologies invoked post-clustering for interpretation. Our clustering also appears to demonstrate that the SFMS is a two-dimensional projection of this pathway which, in the full nine-dimensional colour space, extends continuously to also include high-mass passive galaxies (as revealed by G4 in particular) that retain their discs, but are degenerate with other passive galaxies in two dimensions.

Figure 11. A diagram for the classification of emission-line galaxies (Lamareille 2010) in our GSWLC-2 sample. Different regions, labelled and demarcated by black lines, correspond to different types of galaxy: ‘Sy2’ to type II Seyfert galaxies, ‘SF’ to purely star-forming galaxies, ‘SF/Sy2’ to a mixture of type II Seyfert and star-forming galaxies, ‘LINERs’ to galaxies containing low-ionisation nuclear emission-line regions, and ‘Comp.’ to a mixture of LINERs and star-forming galaxies. The distributions of clusters are shown using coloured, filled contours (drawn at a relative density of 0.4), and the coloured, circular markers show their medians.
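The local environmental overdensities (δ) invoked throughout this discussion are based on fifth-nearest-neighbour surface densities (Fig. 8). As a minimal illustration of that kind of estimate, and not the procedure of Baldry et al. (2006) or Cucciati et al. (2017), the sketch below computes Σ_n = n / (π d_n²) from projected positions and converts it to an overdensity relative to the sample mean. The function name, the use of flat projected coordinates, and the normalisation by the sample mean are all assumptions made here for illustration; the published estimates additionally involve redshift slices and edge corrections that are not reproduced.

```python
import math


def nth_neighbour_overdensity(positions, n=5):
    """Illustrative nth-nearest-neighbour surface density and overdensity.

    positions : list of (x, y) projected coordinates in any consistent
                length unit (hypothetical input; flat geometry assumed).
    Returns (sigma, delta), where sigma[i] = n / (pi * d_n**2) and
    delta[i] = sigma[i] / mean(sigma) - 1.
    """
    if len(positions) <= n:
        raise ValueError("need more than n galaxies")
    sigma = []
    for i, p in enumerate(positions):
        # Brute-force distances to every other galaxy, sorted ascending.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(positions) if j != i
        )
        d_n = dists[n - 1]  # distance to the nth nearest neighbour
        if d_n == 0.0:
            raise ValueError("duplicate positions give an infinite density")
        sigma.append(n / (math.pi * d_n ** 2))
    mean_sigma = sum(sigma) / len(sigma)
    delta = [s / mean_sigma - 1.0 for s in sigma]
    return sigma, delta
```

The brute-force search scales as the square of the sample size, so for catalogues of realistic size a spatial index such as `scipy.spatial.cKDTree` would be the more practical choice.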

The uniformly red NUV − u colours and the uniformly high Sérsic indices of galaxies in clusters G5-7 and V5-7 imply a strong link between their passiveness and their concentrated morphologies. At high masses, this link may be obfuscated by a contribution from the internally-driven evolutionary pathway that we propose in Section 5.1. We note that cluster V7 in particular, containing VIPERS galaxies with the highest masses, seems to align well with the sequence of clusters V1-4 in Fig. 7, such that it could partially be an extension of this evolutionary pathway consisting of the oldest galaxies with the most prominent bulges. This is in agreement with previous studies which find that the inner stellar density of a galaxy is a successful predictor of its having been quenched (Cheung et al. 2012; Fang et al. 2013; Bluck et al. 2014).

However, other passive clusters are separated from their respective sequences of star-forming clusters in Fig. 7. Clusters G7, G6, and especially G5 (the latter containing the lowest-mass passive galaxies in our GSWLC-2 sample) have high median n_g in comparison with other clusters centred at similar stellar masses (G2, G3). This separation invites the interpretation that their galaxies are subject to alternative or additional evolutionary processes. That these clusters contain those GSWLC-2 galaxies that occupy the highest-density environments (Fig. 8) suggests an additional influence of external processes. Hence, we suspect that a significant proportion of galaxies among G5-7 are satellite galaxies (occupying the halos of more massive central galaxies; Ilbert et al. 2010; Muzzin et al. 2013; Moutard et al. 2018). There is a weaker morphological separation for V5-7, and no environmental offset, which we attribute mostly to the incompleteness of low-mass passive galaxies in our VIPERS sample; these would also be expected to trace high-density environments.
Hence, our following discussion on the influence of external processes upon satellite galaxies is conducted with reference to G5-7 only. Fully establishing whether external processes influence the evolution of low-mass passive galaxies at z ∼ 0.65 in the same way requires a more complete sample.

Major and minor mergers (Toomre 1977; Barnes 1988, 1992; Walker et al. 1996) and harassment (Moore et al. 1996; Smith et al. 2015), more common in environments of higher densities (Renzini 1999; Tonini et al. 2016), are external processes which can increase the Sérsic indices of galaxies by transforming their morphologies from disc- to spheroid-dominated (Naab & Trujillo 2006; Aceves et al. 2006; Fisher & Drory 2008). Fig. 10 shows a range of bulge-to-total ratios among galaxies in G5-7, which may be capturing the varying degrees to which these processes disrupt their morphologies. While most G5-7 galaxies are strongly spheroid-dominated, others (while still having high Sérsic indices) retain a disc component (with B/T_r values as low as ∼ . by the gravitational influence of that environment as a whole (‘strangulation’ or ‘starvation’; Larson et al. 1980; Peng et al. 2015). The galaxy then quenches slowly by exhausting any remaining gas in its disc. The balance of these processes is not yet known (Bahé & McCarthy 2015; Peng et al. 2015; Smethurst et al. 2017), but recent studies advocate for a general ‘delayed-then-rapid’ quenching pathway (Wetzel et al. 2012, 2013; Muzzin et al. 2014; Moutard et al. 2018). Galaxies initially quench slowly at the outskirts of the environment, then quickly as they approach its core, where the conditions for the aforementioned hydrodynamical interactions are expected. This delay could also explain the large spreads in the environmental distributions among all of our clusters in Fig. 8. These quenching processes are, in turn, unlikely to transform the morphologies of low-mass passive galaxies (Bekki et al. 2002; Boselli et al. 2009; Zinger et al. 2018).

In all, the separation of clusters G5-7 from G1-4 in terms of both their galaxies’ colours (i.e.
those used as inputs to the clustering, in particular their NUV − u and NUV − r colours) and morphologies (i.e. their higher Sérsic indices) implies that their galaxies are subject to additional evolutionary processes. Hence, we suggest that the strong overlap between the passivity and the morphologies of G5-7 galaxies is a product of different sets of environmental processes, which drive their quenching and morphological transformation separately (Poggianti et al. 1999; Kelkar et al. 2019). In addition, it implies that the quenching of galaxies precedes, or is at least simultaneous with, their morphological transformation (Schawinski et al. 2014; Woo et al. 2017). While the merger of two gas-rich, star-forming galaxies may produce a rejuvenated remnant, mergers between passive progenitors will invariably produce passive remnants with increasingly spheroidal morphologies, ranging from lenticular galaxies with classical bulges (Mishra et al. 2017, 2018, 2019) through to pure spheroids.

Fig. 12 shows the size-mass distribution of the clusters in each of our samples. The stellar masses originate from the same CIGALE SEDs that were used to generate the colours with which we represent the galaxies for the clustering, and the half-light radii from fits of single Sérsic profiles (see Sections 2.1.2 and 2.2.2). The size of a galaxy, in the context of its stellar mass and its morphology, is another important record of its assembly history. The positions and distributions of both sets of clusters in these plots match well with broader blue versus red, and early- versus late-type distinctions made in the same (or similar) plane(s) by other studies (Shen et al. 2003; van der Wel et al. 2014; Lange et al. 2015). This result again demonstrates that FEM, via just the nine input colours, is able to identify subpopulations that are degenerate in two dimensions and that are ordinarily distinguished using a combination of photometric and morphological information.

The most significant difference between the two plots in Fig. 12 is the absence of compact massive galaxies in our GSWLC-2 sample in comparison with our VIPERS sample. The canonical explanation for the growth of these galaxies is ongoing minor merger activity and accretion (Naab et al. 2009; Hopkins et al. 2010). The resultant shift between the passive VIPERS clusters and the passive GSWLC-2 clusters is approximately in accordance with the expected redshift evolution of the size-mass relation for early-type, passive galaxies (van Dokkum et al. 2015), though the mass-incompleteness of passive VIPERS galaxies means that we are unlikely to have precisely captured this shift in this paper. The large overlap of G4 and V4 with their respective passive clusters in Fig. 12 seems to support the additional ‘late-track’ (late with respect to cosmic time rather than to morphology) of galaxy evolution proposed by Barro et al. (2013) to yield disc-dominated passive galaxies (Ilbert et al. 2010; Carollo et al. 2013; Schawinski et al. 2014). Both sets of SFMS clusters are similarly distributed, capturing the minimal evolution of the sizes of star-forming galaxies between their two redshifts (Lilly et al. 1998; van der Wel et al. 2014).
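The sizes compared in Fig. 12 are circularised half-light radii. As a hedged illustration of what circularisation usually means, the sketch below applies the common convention r_circ = r_maj √(b/a) to a semi-major-axis half-light radius and an axis ratio, then summarises sizes per cluster. The function names, the inputs, and the use of this particular convention are assumptions for illustration; the exact definitions adopted by Simard et al. (2011) and Krywult et al. (2017) should be taken from those works.

```python
import math
import statistics


def circularised_radius(r_major, axis_ratio):
    """Circularised half-light radius, r_circ = r_major * sqrt(b/a).

    r_major    : half-light radius along the semi-major axis
    axis_ratio : b/a of the fitted Sersic profile, in (0, 1]
    """
    if not 0.0 < axis_ratio <= 1.0:
        raise ValueError("axis_ratio b/a must lie in (0, 1]")
    return r_major * math.sqrt(axis_ratio)


def median_size_by_cluster(radii, labels):
    """Median circularised radius for each cluster label (e.g. 'G1')."""
    grouped = {}
    for r, lab in zip(radii, labels):
        grouped.setdefault(lab, []).append(r)
    return {lab: statistics.median(rs) for lab, rs in grouped.items()}
```

For example, a galaxy with a semi-major-axis radius of 4 kpc and b/a = 0.25 has a circularised radius of 2 kpc, so highly inclined discs appear substantially smaller after circularisation than their semi-major-axis radii would suggest.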

Figure 12. Half-light radius versus stellar mass for the galaxies in our samples. Circularised half-light radii are calculated from single Sérsic fits by Simard et al. (2011) for our GSWLC-2 sample, and Krywult et al. (2017) for our VIPERS sample. The distributions of clusters are shown using coloured, filled contours.

We present results from the application of the FEM clustering algorithm to samples of galaxies at low (z ∼ 0.06, from GSWLC-2) and intermediate (z ∼ 0.65, from VIPERS) redshifts. Galaxies are represented using nine UV-through-NIR broadband rest-frame colours, derived from fits of ensembles of synthetic spectra to observed photometry with CIGALE. By using unsupervised machine learning to characterise the structures of our samples in this nine-dimensional feature space, we aimed (following Siudek et al. 2018b) to understand the evolution of subpopulations of galaxies in terms of these colours over cosmic time, and to establish how strongly these colours alone encode the assembly histories of galaxies. An advantage of FEM is its incorporation of dimensionality reduction on the fly, which ensures that it determines clusters using only the most important and discriminative information available among the input features. We summarise our results as follows:

(1) Our cluster evaluation search reveals that both of our samples are best partitioned into seven clusters (Table 1). In addition, the best-fitting submodels to each of our samples, identified independently, are closely related, both allowing variation in the shapes of clusters and differing only in their treatment of ‘noise’ among the input features. For both samples, these seven clusters break down into four ‘blue’ clusters containing mostly star-forming galaxies (and the vast majority of green valley galaxies), and three ‘red’ clusters containing mostly passive galaxies (Fig. 4). These two families of clusters are clearly separable, both in terms of the input colours to the clustering and in terms of ancillary features, which suggests differences in the evolution of their galaxies. Clustering outcomes in general are highly robust and reproducible.

(2) Overall,
FEM uses the nine rest-frame colours similarly to determine the partitions (Fig. 2), reducing the dimensionality of the feature space to 6 in both cases. Altogether, optical colours are most important to the clustering; individually, UV colours are. The availability of photometry with which to constrain the SEDs of galaxies is advantageous to the clustering. UV colours are slightly more important to the clustering in our GSWLC-2 sample, which has more GALEX coverage than our VIPERS sample. Similarly, the lack of any NIR coverage for our GSWLC-2 sample means that NIR colours are less important to its clustering. However, given the broader overall similarity between the clustering structures of the samples (Fig. 3), it appears that clustering (a statistical method) combined with SED estimation (which can infer rest-frame magnitudes from incomplete photometry) has enabled us to partially ‘fill the gaps’ of missing data in our samples.

(3) Blue clusters (containing mostly star-forming galaxies and the vast majority of green valley galaxies) in both samples form clear morphological sequences (Fig. 7). The correlation between their median Sérsic indices and their median stellar masses captures the growth of the bulges of their galaxies along the SFMS (Fig. 10). At the highest masses, this growth corresponds with a drop in specific star formation rates. Hence, the quenching of high-mass galaxies is influenced by their inner stellar densities, above a certain threshold, which appears to be linked with nuclear activity (Fig. 11). The retention of discs by the highest-mass galaxies along this morphological sequence indicates that some galaxies quench without fully transforming their morphologies. The lack of a strong trend of these clusters with local environmental overdensity (Fig. 8) suggests that this evolutionary pathway is dominated by internal processes. This pathway, prominent at the epochs of both samples, appears consistent with ‘mass quenching’, as proposed by Peng et al. (2010). In addition, the SFMS appears to be a two-dimensional projection of this pathway which, in nine dimensions, extends all of the way to high-mass passive galaxies that retain their discs. We expect that the long timescales involved would ultimately lead the VIPERS star-forming clusters to resemble the GSWLC-2 star-forming clusters by the present day.

(4) Red clusters (containing mostly passive galaxies) are clearly separate from their corresponding sequences of blue clusters. Galaxies in red clusters in both samples have uniformly high Sérsic indices, indicating a fundamental link between centrally-concentrated morphologies and passiveness (Fig. 7). Passive clusters in our low-redshift sample are separated from their respective sequence of star-forming clusters, particularly towards lower stellar masses (Figs. 7 and 10). We assume that this separation originates from the influence of alternative or additional processes to those that dictate the evolution of actively star-forming galaxies. Invoking the offset of these low-redshift passive clusters to high local environmental overdensities (Fig. 8), we suggest that some of their galaxies are satellites, and subject to external processes. The homogeneity of their early-type morphologies implies that their quenching precedes, or is at least simultaneous with, their morphological transformation. In all, this pathway appears consistent with ‘environment quenching’ (Peng et al. 2010). This morphological separation is not as apparent for the passive clusters in our VIPERS sample (Fig. 7), which is mainly due to incompleteness of low-mass passive galaxies (which would also be expected to trace high-density environments). Hence, we are prohibited from commenting on the prevalence of this evolutionary pathway at intermediate redshifts.

Our study appears to confirm the existence of two distinct evolutionary pathways of galaxies through the green valley (Poggianti et al. 1999; Faber et al. 2007; Peng et al. 2010; Barro et al. 2013; Fritz et al. 2014; Schawinski et al. 2014; Moutard et al. 2016b). We re-emphasise that while much of our interpretation involves the use of ancillary features (and especially morphological information), the separation of the clusters into two main families of blue/green and red clusters originates in the colours used as inputs to the clustering. Hence, these pathways appear to be strongly encoded within the SEDs of galaxies.
Our results invite further investigation into the extent to which a galaxy’s assembly history may be discerned purely from its SED.

The use of further ancillary features would be instrumental in further substantiating and constraining these pathways. A wealth of such features is available for our GSWLC-2 sample, due to its basis in SDSS. Examples include Galaxy Zoo 2 morphologies (Willett et al. 2013), which include bar and merger classifications, and Yang et al. (2007) group memberships to enable a distinction between central and satellite galaxies. A more detailed analysis of our low-redshift sample in this manner is reserved for a future study. We note that the Galaxy And Mass Assembly project (Driver et al. 2009) could provide an alternative low-redshift sample, given its panchromatic data release (Driver et al. 2016) and its rich library of value-added catalogues (Baldry et al. 2018). The upcoming Deep Extragalactic VIsible Legacy Survey (DEVILS; Davies et al. 2018), which aims to improve completeness at 0 . < z < . 0, could be the basis for an improved intermediate-redshift sample upon its completion. Furthermore, the Legacy Survey of Space and Time (Ivezić et al. 2019), which will provide galaxy colours and morphologies together, constitutes a particularly promising foundation for a future follow-up study.

The incompleteness of low-mass passive galaxies at intermediate redshifts would be alleviated by moving to deeper surveys such as G10-COSMOS (Andrews et al. 2017) and 3D-HST (Momcheva et al. 2016), both of which also have panchromatic photometric data releases. This would enable an examination of environment quenching at earlier epochs, and of its proposed increase in prevalence at lower redshifts (Fossati et al. 2017; Moutard et al. 2018; Papovich et al. 2018). Surveys like these could also extend our comparison to redshifts as high as z ∼ 2, thus facilitating the constraint of the changing balance of evolutionary pathways, informed by clustering of rest-frame colours, over a greater extent of cosmic time.

DATA AVAILABILITY

The data underlying this article will be made available upon reasonable request to the corresponding authors.

ACKNOWLEDGEMENTS

We thank Steven Bamford, Olga Cucciati, Marie Martig, Tsutomu T. Takeuchi, and our referee for feedback which improved the quality of our manuscript.

The work presented in this paper was conducted using the following software: the scikit-learn (Pedregosa et al. 2011), matplotlib (Hunter 2007), scipy (Jones et al. 2001), and numpy (Oliphant 2006; Harris et al. 2020) packages for the Python 3 programming language ( ); the Fisher-EM subspace clustering package (Bouveyron & Brunet 2012; known in this paper as FEM) for the R statistical computing environment (R Core Team 2019); and the Starlink Tool for OPerations on Catalogues And Tables (TOPCAT; Taylor 2005).

The construction of the GALEX-SDSS-WISE Legacy Catalogue (GSWLC) was funded through NASA award NNX12AE06G.

The VIMOS Public Extragalactic Redshift Survey (VIPERS) was performed using the ESO Very Large Telescope, under the ‘Large Programme’ 182.A-0886. The participating institutions and funding agencies are listed at http://vipers.inaf.it .

Funding for SDSS-III was provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III website is .

SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

ST has been supported by a United Kingdom Science and Technology Facilities Council postgraduate studentship. MS has been supported by the European Union’s Horizon 2020 Research and Innovation programme under the Maria Skłodowska-Curie grant agreement (No. 754510), the Polish National Science Centre (UMO-2016/23/N/ST9/02963), and the Spanish Ministry of Science and Innovation through the Juan de la Cierva-formacion programme (FJC2018-038792-I). KM has been supported by the Polish National Science Centre (UMO-2018/30/E/ST9/00082). This research has been supported by the Polish National Science Centre (UMO-2018/30/M/ST9/00757) and the Polish Ministry of Science and Higher Education (DIR/WK/2018/12).

REFERENCES


Aceves H., Velázquez H., Cruz F., 2006, MNRAS, 373, 632
Ahn C. P., et al., 2014, ApJS, 211, 17
Almaini O., et al., 2017, MNRAS, 472, 1401
Andrews S. K., Driver S. P., Davies L. J. M., Kafle P. R., Robotham A. S. G., Wright A. H., 2017, MNRAS, 464, 1569
Arnouts S., et al., 2013, A&A, 558, A67
Bahé Y. M., McCarthy I. G., 2015, MNRAS, 447, 969
Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
Baldry I. K., Balogh M. L., Bower R. G., Glazebrook K., Nichol R. C., Bamford S. P., Budavari T., 2006, MNRAS, 373, 469
Baldry I. K., et al., 2018, MNRAS, 474, 3875
Ball N. M., Brunner R. J., 2010, International Journal of Modern Physics D, 19, 1049
Balogh M. L., Morris S. L., Yee H. K. C., Carlberg R. G., Ellingson E., 1999, ApJ, 527, 54
Bamford S. P., et al., 2009, MNRAS, 393, 1324
Barchi P. H., da Costa F. G., Sautter R., Moura T. C., Stalder D. H., Rosa R. R., de Carvalho R. R., 2016, Journal of Computational Interdisciplinary Sciences, 7, 114
Barnes J. E., 1988, ApJ, 331, 699
Barnes J. E., 1992, ApJ, 393, 484
Barnes J. E., 2002, MNRAS, 333, 481
Baron D., 2019, Machine Learning in Astronomy: a practical overview
Barro G., et al., 2013, ApJ, 765, 104
Bekki K., Couch W. J., Shioya Y., 2002, ApJ, 577, 651
Bell E. F., de Jong R. S., 2001, ApJ, 550, 212
Biernacki C., Celeux G., Govaert G., 2000, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719
Bluck A. F. L., Mendel J. T., Ellison S. L., Moreno J., Simard L., Patton D. R., Starkenburg E., 2014, MNRAS, 441, 599
Boquien M., Burgarella D., Roehlly Y., Buat V., Ciesla L., Corre D., Inoue A. K., Salas H., 2019, A&A, 622, A103
Boselli A., Gavazzi G., 2006, PASP, 118, 517
Boselli A., Boissier S., Cortese L., Gavazzi G., 2009, Astronomische Nachrichten, 330, 904
Bournaud F., Dekel A., Teyssier R., Cacciato M., Daddi E., Juneau S., Shankar F., 2011, ApJ, 741, L33
Bouveyron C., Brunet C., 2012, Statistics and Computing, 22, 301
Brinchmann J., Charlot S., White S. D. M., Tremonti C., Kauffmann G., Heckman T., Brinkmann J., 2004, MNRAS, 351, 1151
Bruce V. A., et al., 2012, MNRAS, 427, 1666
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Calzetti D., Armus L., Bohlin R. C., Kinney A. L., Koornneef J., Storchi-Bergmann T., 2000, ApJ, 533, 682
Cano-Díaz M., Ávila-Reese V., Sánchez S. F., Hernández-Toledo H. M., Rodríguez-Puebla A., Boquien M., Ibarra-Medel H., 2019, MNRAS, 488, 3929
Carollo C. M., et al., 2013, ApJ, 773, 112
Chabrier G., 2003, PASP, 115, 763
Charlot S., Longhetti M., 2001, MNRAS, 323, 887
Chary R., Elbaz D., 2001, ApJ, 556, 562
Cheung E., et al., 2012, ApJ, 760, 131
Conroy C., 2013, ARA&A, 51, 393
Conseil S., Vibert D., Amouts S., Milliard B., Zamojski M., Liebaria A., Guillaume M., 2011, EMphot — Photometric Software with Bayesian Priors: Application to GALEX. p. 107
Cowie L. L., Songaila A., 1977, Nature, 266, 501
Croton D. J., et al., 2006, MNRAS, 365, 11
Cucciati O., et al., 2017, A&A, 602, A15
Da Cunha E., Charlot S., Elbaz D., 2008, MNRAS, 388, 1595
Davidzon I., et al., 2013, A&A, 558, A23
Davidzon I., et al., 2016, A&A, 586, A23
Davidzon I., et al., 2019, MNRAS, 489, 4817
Davies L. J. M., et al., 2018, MNRAS, 480, 768
Dekel A., Birnboim Y., 2006, MNRAS, 368, 2
Di Matteo T., Springel V., Hernquist L., 2005, Nature, 433, 604
Driver S. P., et al., 2009, Astronomy and Geophysics, 50, 5.12
Driver S. P., et al., 2016, MNRAS, 455, 3911
Driver S. P., et al., 2018, MNRAS, 475, 2891
Elmegreen B. G., Bournaud F., Elmegreen D. M., 2008, ApJ, 688, 67
Faber S. M., 1972, A&A, 20, 361
Faber S. M., et al., 2007, ApJ, 665, 265
Fang J. J., Faber S. M., Koo D. C., Dekel A., 2013, ApJ, 776, 63
Fisher D. B., Drory N., 2008, AJ, 136, 773
Fossati M., et al., 2017, ApJ, 835, 153
Fritz A., et al., 2014, A&A, 563, A92
Gabor J. M., Davé R., Oppenheimer B. D., Finlator K., 2011, MNRAS, 417, 2676
Gallazzi A., Charlot S., Brinchmann J., White S. D. M., 2006, MNRAS, 370, 1106
Garilli B., et al., 2014, A&A, 562, A23
Gilbank D. G., Baldry I. K., Balogh M. L., Glazebrook K., Bower R. G., 2010, MNRAS, 405, 2594
Gilbank D. G., Baldry I. K., Balogh M. L., Glazebrook K., Bower R. G., 2011a, MNRAS, 412, 2111
Gilbank D. G., et al., 2011b, MNRAS, 414, 304
Gu Y., Fang G., Yuan Q., Cai Z., Wang T., 2018, ApJ, 855, 10
Gu Y., Fang G., Yuan Q., Lu S., Li F., Cai Z.-Y., Kong X., Wang T., 2019, ApJ, 884, 172
Gunn J. E., Gott III J. R., 1972, ApJ, 176, 1
Guzzo L., et al., 2014, A&A, 566, A108
Häring N., Rix H.-W., 2004, ApJ, 604, L89
Harris C. R., et al., 2020, Nature, 585, 357
Heckman T. M., 1980, A&A, 500, 187
Hemmati S., et al., 2019, ApJ, 881, L14
Hocking A., Geach J. E., Davey N., Sun Y., 2017, in 2017 International Joint Conference on Neural Networks. p. 4179
Hocking A., Geach J. E., Sun Y., Davey N., 2018, MNRAS, 473, 1108
Hopkins P. F., et al., 2009a, MNRAS, 397, 802
Hopkins P. F., Cox T. J., Younger J. D., Hernquist L., 2009b, ApJ, 691, 1168
Hopkins P. F., et al., 2010, ApJ, 715, 202
Hunter J. D., 2007, Computing In Science & Engineering, 9, 90
Ilbert O., et al., 2006, A&A, 457, 841
Ilbert O., et al., 2010, ApJ, 709, 644
Ilbert O., et al., 2013, A&A, 556, A55
Ilbert O., et al., 2015, A&A, 579, A2
Ivezić Ž., et al., 2019, ApJ, 873, 111
Jarvis M. J., et al., 2013, MNRAS, 428, 1281
Jones E., Oliphant T., Peterson P., et al., 2001, SciPy: Open source scientific tools for Python.

Kauffmann G., et al., 2003, MNRAS, 346, 1055
Kelkar K., Gray M. E., Aragón-Salamanca A., Rudnick G., Jaffé Y. L., Jablonka P., Moustakas J., Milvang-Jensen B., 2019, MNRAS, 486, 868
Kereš D., Katz N., Fardal M., Davé R., Weinberg D. H., 2009, MNRAS, 395, 160
Kormendy J., Kennicutt Robert C. J., 2004, ARA&A, 42, 603
Kovač K., et al., 2010, ApJ, 708, 505
Kraft R. P., et al., 2017, ApJ, 848, 27
Krywult J., et al., 2017, A&A, 598, A120
Kullback S., Leibler R. A., 1951, Ann. Math. Statist., 22, 79
Lamareille F., 2010, A&A, 509, A53
Lang D., Hogg D. W., Schlegel D. J., 2016, AJ, 151, 36
Lange R., et al., 2015, MNRAS, 447, 2603
Larson R. B., Tinsley B. M., Caldwell C. N., 1980, ApJ, 237, 692
Le Fèvre O., et al., 2003, in Iye M., Moorwood A. F. M., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 4841, Instrument Design and Performance for Optical/Infrared Ground-based Telescopes. p. 1670
Lilly S., et al., 1998, ApJ, 500, 75
Lisboa P., Ellis I., Green A., Ambrogi F., Dias M., 2008, Pattern Recognition Letters, 29, 1814
Lloyd S., 1982, IEEE Transactions on Information Theory, 28, 129
MacQueen J., 1967, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. p. 281
Madau P., Dickinson M., 2014, ARA&A, 52, 415
Madau P., Ferguson H. C., Dickinson M. E., Giavalisco M., Steidel C. C., Fruchter A., 1996, MNRAS, 283, 1388

Manzoni G., et al., 2019, arXiv e-prints, p. arXiv:1911.02445
Maraston C., 2005, MNRAS, 362, 799
Marchetti A., et al., 2013, MNRAS, 428, 1424
Marinoni C., et al., 2008, A&A, 487, 7
Martig M., Bournaud F., Teyssier R., Dekel A., 2009, ApJ, 707, 250
Martin D. C., et al., 2005, ApJ, 619, L1
Martin D. C., et al., 2007, ApJS, 173, 342
Martin G., Kaviraj S., Hocking A., Read S. C., Geach J. E., 2020, MNRAS, 491, 1408
Masters K. L., et al., 2010, MNRAS, 405, 783
McCarthy I. G., Frenk C. S., Font A. S., Lacey C. G., Bower R. G., Mitchell N. L., Balogh M. L., Theuns T., 2008, MNRAS, 383, 593
McPartland C., Sanders D. B., Kewley L. J., Leslie S. K., 2019, MNRAS, 482, L129
Mihos J. C., Hernquist L., 1994a, ApJ, 425, L13
Mihos J. C., Hernquist L., 1994b, ApJ, 431, L9
Mihos J. C., Hernquist L., 1996, ApJ, 464, 641
Mishra P. K., Wadadekar Y., Barway S., 2017, MNRAS, 467, 2384
Mishra P. K., Wadadekar Y., Barway S., 2018, MNRAS, 478, 351
Mishra P. K., Wadadekar Y., Barway S., 2019, MNRAS, 487, 5572
Momcheva I. G., et al., 2016, ApJS, 225, 27
Moore B., Katz N., Lake G., Dressler A., Oemler A., 1996, Nature, 379, 613
Morrissey P., et al., 2007, ApJS, 173, 682
Moutard T., et al., 2016a, A&A, 590, A102
Moutard T., et al., 2016b, A&A, 590, A103
Moutard T., Sawicki M., Arnouts S., Golob A., Malavasi N., Adami C., Coupon J., Ilbert O., 2018, MNRAS, 479, 2147
Moutard T., Malavasi N., Sawicki M., Arnouts S., Tripathi S., 2020, MNRAS, 495, 4237
Muzzin A., et al., 2013, ApJ, 777, 18
Muzzin A., et al., 2014, ApJ, 796, 65
Naab T., Trujillo I., 2006, MNRAS, 369, 625
Naab T., Johansson P. H., Ostriker J. P., 2009, ApJ, 699, L178
Nipoti C., Binney J., 2007, MNRAS, 382, 1481
Noeske K. G., et al., 2007, ApJ, 660, L43
Noll S., Burgarella D., Giovannoli E., Buat V., Marcillac D., Muñoz-Mateos J. C., 2009, A&A, 507, 1793
Nulsen P. E. J., 1982, MNRAS, 198, 1007
Okada T., Tomita S., 1985, Pattern Recognition, 18, 139
Oliphant T., 2006, A guide to NumPy. Trelgol Publishing
Pacifici C., et al., 2016, ApJ, 832, 79
Papovich C., et al., 2018, ApJ, 854, 30
Pedregosa F., et al., 2011, Journal of Machine Learning Research, 12, 2825
Peek J. E. G., Schiminovich D., 2013, ApJ, 771, 68
Peng Y.-j., et al., 2010, ApJ, 721, 193
Peng Y., Maiolino R., Cochrane R., 2015, Nature, 521, 192
Poggianti B. M., Smail I., Dressler A., Couch W. J., Barger A. J., Butcher H., Ellis R. S., Oemler Augustus J., 1999, ApJ, 518, 576
Puget P., et al., 2004, WIRCam: the infrared wide-field camera for the Canada-France-Hawaii Telescope. p. 978
R Core Team 2019, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria

Renzini A., 1999, in Carollo C. M., Ferguson H. C., Wyse R. F. G., eds, The Formation of Galactic Bulges. p. 9
Roberts M. S., 1963, ARA&A, 1, 149
Salim S., 2014, Serbian Astronomical Journal, 189, 1
Salim S., Rich R. M., 2010, ApJ, 714, L290
Salim S., et al., 2007, ApJS, 173, 267
Salim S., et al., 2016, ApJS, 227, 2
Salim S., Boquien M., Lee J. C., 2018, ApJ, 859, 11
Sánchez Almeida J., Aguerri J. A. L., Muñoz-Tuñón C., de Vicente A., 2010, ApJ, 714, 487
Schawinski K., et al., 2009, MNRAS, 396, 818
Schawinski K., et al., 2014, MNRAS, 440, 889
Schwarz G., 1978, Ann. Statist., 6, 461
Scodeggio M., et al., 2018, A&A, 609, A84
Scott D., Thompson J., 1983, Computer Science and Statistics: Proceedings of the Fifteenth Symposium on the Interface
Sérsic J. L., 1963, Boletin de la Asociacion Argentina de Astronomia La Plata Argentina, 6, 41
Sérsic J. L., 1968, Atlas de Galaxias Australes
Shen S., Mo H. J., White S. D. M., Blanton M. R., Kauffmann G., Voges W., Brinkmann J., Csabai I., 2003, MNRAS, 343, 978
Sheth K., Vogel S. N., Regan M. W., Thornley M. D., Teuben P. J., 2005, ApJ, 632, 217
Simard L., Mendel J. T., Patton D. R., Ellison S. L., McConnachie A. W., 2011, ApJS, 196, 11
Siudek M., et al., 2017, A&A, 597, A107
Siudek M., et al., 2018a, arXiv e-prints, p. arXiv:1805.09905
Siudek M., et al., 2018b, A&A, 617, A70
Skrutskie M. F., et al., 2006, AJ, 131, 1163
Smethurst R. J., et al., 2015, MNRAS, 450, 435
Smethurst R. J., Lintott C. J., Bamford S. P., Hart R. E., Kruk S. J., Masters K. L., Nichol R. C., Simmons B. D., 2017, MNRAS, 469, 3670
Smith R., et al., 2015, MNRAS, 454, 2502
Speagle J. S., Steinhardt C. L., Capak P. L., Silverman J. D., 2014, ApJS, 214, 15
Spinrad H., 1962, ApJ, 135, 715
Spinrad H., Taylor B. J., 1971, ApJS, 22, 445
Springel V., Di Matteo T., Hernquist L., 2005a, MNRAS, 361, 776
Springel V., Di Matteo T., Hernquist L., 2005b, ApJ, 620, L79
Steinhardt C. L., Weaver J. R., Maxfield J., Davidzon I., Faisst A. L., Masters D., Schemel M., Toft S., 2020, ApJ, 891, 136
Strateva I., et al., 2001, AJ, 122, 1861
Taylor M. B., 2005, in Shopbell P., Britton M., Ebert R., eds, Astronomical Society of the Pacific Conference Series Vol. 347, Astronomical Data Analysis Software and Systems XIV. p. 29
Tonini C., Mutch S. J., Croton D. J., Wyithe J. S. B., 2016, MNRAS, 459, 4109
Toomre A., 1977, in Tinsley B. M., Larson Richard B. Gehret D. C., eds, Evolution of Galaxies and Stellar Populations. p. 401
Turner S., et al., 2019, MNRAS, 482, 126
Valiante E., et al., 2016, MNRAS, 462, 3146
Vergani D., et al., 2018, A&A, 620, A193
Walker I. R., Mihos J. C., Hernquist L., 1996, ApJ, 460, 121
Weigel A. K., et al., 2017, ApJ, 845, 145
Wetzel A. R., Tinker J. L., Conroy C., 2012, MNRAS, 424, 232
Wetzel A. R., Tinker J. L., Conroy C., van den Bosch F. C., 2013, MNRAS, 432, 336
Wild V., Walcher C. J., Johansson P. H., Tresse L., Charlot S., Pollo A., Le Fèvre O., de Ravel L., 2009, MNRAS, 395, 144
Willett K. W., et al., 2013, MNRAS, 435, 2835
Williams R. J., Quadri R. F., Franx M., van Dokkum P., Labbé I., 2009, ApJ, 691, 1879
Woo J., Carollo C. M., Faber S. M., Dekel A., Tacchella S., 2017, MNRAS, 464, 1077
Wright E. L., et al., 2010, AJ, 140, 1868
Wuyts S., et al., 2007, ApJ, 655, 51
Wyder T. K., et al., 2007, ApJS, 173, 293
Yang X., Mo H. J., van den Bosch F. C., Pasquali A., Li C., Barden M., 2007, ApJ, 671, 153
Yang Y., Zabludoff A. I., Zaritsky D., Mihos J. C., 2008, ApJ, 688, 945
Yip C. W., et al., 2004, AJ, 128, 585
York D. G., et al., 2000, AJ, 120, 1579
Yuan H. B., Liu X. W., Xiang M. S., 2013, MNRAS, 430, 2188
Zabludoff A. I., Zaritsky D., Lin H., Tucker D., Hashimoto Y., Shectman S. A., Oemler A., Kirshner R. P., 1996, ApJ, 466, 104
Zinger E., Dekel A., Kravtsov A. V., Nagai D., 2018, MNRAS, 475, 3654
de Souza R. S., et al., 2017, MNRAS, 472, 2808
van Dokkum P. G., et al., 2015, ApJ, 813, 23
van der Wel A., et al., 2014, ApJ, 788, 28

Figure A1. ICL scores reported at iterations 1 through 25 by various combinations of submodel and $k$ for our GSWLC-2 sample. For each submodel, we show the value of $k$ which yields the highest ICL score. These iteration profiles are generally quite flat, indicating that FEM quickly converges to a stable outcome. The large changes exhibited by $\Sigma, \delta_k$, $k =$ […]

APPENDIX A: ITERATIONS OF FEM

In Fig. A1, we show ICL scores reported at each of up to 25 iterations by various combinations of submodel and $k$ for our GSWLC-2 sample. These ‘iteration profiles’ are mostly quite flat; hence, 25 iterations are more than sufficient for allowing FEM to stabilise to an outcome. In addition, the bulk of the clustering structure appears to be determined during the k-means initialisation step, which spreads the cluster centres out ahead of the first iteration. The ICL criterion rewards separated clusters, so k-means initialisations are particularly well suited to yielding useful clustering outcomes. Trials of the use of uniform random initialisations resulted in more combinations of submodels and $k$ failing to converge.

Variations in the ICL values reported by individual combinations of submodel and $k$ over successive iterations arise due to the Fisher step of FEM, in which the subspace within which the clusters are to be modelled is found. Because of this intermediate step, the updating of the model parameters during the Maximisation step is only indirectly related to the probabilities calculated in the Expectation step. For traditional EM algorithms, these steps are directly related and thereby guarantee convergence. The large changes between successive iterations exhibited by some combinations (e.g. $\Sigma, \delta_k$, $k = 9$) are most often due to the emptying of clusters; a reduction in the number of clusters used by FEM leads, in these cases, to a sudden increase in ICL.
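As a rough illustration of why the ICL criterion rewards separated clusters, the sketch below evaluates one common form of the criterion (after Biernacki et al. 2000), written in the higher-is-better convention used in this paper. It is not FEM's internal implementation, which may differ in detail; the function name `icl_score`, its arguments, and the toy numbers are assumptions made purely for illustration.

```python
import math


def icl_score(log_likelihood, n_free_params, responsibilities):
    """Approximate ICL (higher is better) for a fitted mixture model.

    responsibilities[i][c] is the posterior probability that object i
    belongs to cluster c; each row is assumed to sum to one.
    """
    n_objects = len(responsibilities)
    # BIC-like part: fit quality minus a penalty for model complexity.
    bic_like = log_likelihood - 0.5 * n_free_params * math.log(n_objects)
    # Classification part: log-probability of each object's most likely
    # cluster. It is zero for crisp assignments and negative otherwise,
    # so overlapping clusters lower the score.
    classification = sum(math.log(max(row)) for row in responsibilities)
    return bic_like + classification


# Toy comparison (made-up numbers): identical fit quality and complexity,
# but different degrees of overlap between two clusters.
crisp = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.55, 0.45], [0.40, 0.60], [0.52, 0.48], [0.45, 0.55]]
icl_crisp = icl_score(-10.0, 5, crisp)
icl_fuzzy = icl_score(-10.0, 5, fuzzy)
print(icl_crisp > icl_fuzzy)
```

With the log-likelihood and number of free parameters held fixed, the well-separated partition scores higher because its classification term is closer to zero.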

APPENDIX B: SMOOTHING OF FEATURE DATA FOR OUR GSWLC-2 SAMPLE
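The smoothing procedure described in this appendix can be sketched in a few lines. This is a minimal sketch and not the code used for the paper: it assumes that ‘smoothing’ means perturbing each magnitude by a Gaussian deviate whose standard deviation is the galaxy's winsorised error, and that the ‘mean value of the logarithmic distribution of errors’ is the mean of the log10 errors (i.e. their geometric mean). The function names and toy values are hypothetical.

```python
import math
import random
import statistics


def winsorise_errors(errors):
    """Cap errors at the mean of their logarithmic distribution."""
    # Assumption: the cap is 10**mean(log10(error)), i.e. the geometric mean.
    cap = 10.0 ** statistics.fmean([math.log10(e) for e in errors])
    return [min(e, cap) for e in errors], cap


def smooth_magnitudes(magnitudes, errors, seed=None):
    """Perturb each magnitude by a Gaussian with its winsorised error."""
    rng = random.Random(seed)
    capped, _ = winsorise_errors(errors)
    return [m + rng.gauss(0.0, s) for m, s in zip(magnitudes, capped)]


# Toy example with made-up absolute magnitudes and Bayesian errors
# for a single band.
mags = [-18.2, -19.5, -20.1, -17.8]
errs = [0.01, 0.1, 1.0, 0.1]
capped_errs, cap = winsorise_errors(errs)
smoothed = smooth_magnitudes(mags, errs, seed=42)
```

Capping the errors before drawing the perturbations keeps the smoothing scale small even for galaxies with poorly constrained magnitudes.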

Preliminary tests revealed that a truncated, bimodal substructure among passive galaxies within the nine-dimensional colour space representing our GSWLC-2 sample (see the left-hand plot of Fig. B1; also visible in Fig. 3) led to an inability of FEM to converge for the majority of submodels and values of $k$. This truncated bimodal substructure is due to the lack of input NIR photometry to the CIGALE SED estimation of GSWLC-2 galaxies, such that their NIR SEDs must be inferred from UV and optical photometry. This, in turn, leads to poorly constrained, discretised metallicities: galaxies at $r - K_s \lesssim$ […].67 peak strongly at $\log(Z) \sim -$[…].4, and those at $r - K_s \gtrsim$ […].67 at $\log(Z) \sim -$[…].1. The NIR SEDs of VIPERS galaxies, on the other hand, are constrained by $K_s$-band photometry and hence have slightly more freedom to vary. This smooths their colour and metallicity distributions.

We hence opt to apply a small level of Gaussian smoothing to the GSWLC-2 distributions of the rest-frame absolute magnitudes reported by CIGALE. The smoothing scale for the rest-frame absolute magnitude of a given galaxy is given by its Bayesian error. These errors are winsorised at the mean value of the logarithmic distribution of errors (i.e. errors larger than the mean value are set to the mean value). This winsorisation ensures that the smoothing scale is kept small enough to avoid the potential loss of astrophysically meaningful substructures, while still enabling FEM to converge more readily. The absolute rest-frame magnitude most affected by this smoothing is $FUV$, whose errors are winsorised at a maximum value of 0.25 (all other magnitudes have a maximum error $<$ […]).

Figure B1. The effect of our smoothing on the distribution of GSWLC-2 galaxies in the passive region of the $NUV - r - K_s$ colour-colour plane. Substructures in the distribution of galaxies within this region are preserved post-smoothing.

APPENDIX C: BEHAVIOUR OF THE VARIOUS SUBMODELS OF FEM FOR OUR SAMPLES

Our model selection approach considers ICL scores for 72 different combinations of submodel and $k$ for each of our samples. The comparison of these 72 combinations is simplified greatly by the realisation that several submodels exhibit consistent patterns of behaviour across all values of $k$.

FEM is unable to converge to an outcome for several combinations of submodel and $k$. The most common diagnosis made by FEM in the case of non-convergence is that a cluster has become empty (i.e. that it no longer contains galaxies). Table 1 shows that several submodels are unable to converge beyond a maximum value of $k$, suggesting a limit to their ability to properly partition the samples. Alternatively, submodels that converge at $k$, but fail to converge at $k -$ […] $k +$ […]

[…] $\Sigma_k, \delta_k$ and $\Sigma_k, \delta$ submodels offer the greatest promise among all of the FEM submodels for yielding detailed and astrophysically meaningful partitions of our samples. The outcomes they produce are similar; they exhibit near-identical trends in their ICL scores for $k =$ […] $k =$ […] $k$ generally consist of splits of clusters present in outcomes at lower values of $k$.

Submodels featuring non-unique covariance matrices for the Gaussian density functions representing the clusters (i.e. submodels with $\Sigma$ and $\alpha$, such that they all have the same shape) consistently produce clusters with highly disparate sizes. Some clusters are large, containing 30 to 60 per cent of the galaxies in our samples each (and each often spanning both blue and red galaxies); others are empty or nearly empty, containing $\lesssim$ […] FEM registers a valid ICL score for these outcomes when they include empty clusters (often cited as a cause for the failure of FEM; see above), it is clear that these submodels are too crude to return more than a very broad partition of our samples, and that their outcomes are limited in their capacity for astrophysical interpretation. All of this is also true for the $\Sigma, \delta$ clustering outcome at $k =$ […]

[…] $\alpha_{k,j}$, $\alpha_k$) for the Gaussian density functions within the discriminative latent subspace is that they segment our samples principally along a single dimension. Several representative examples of their clustering structures are shown in Fig. C1, revealing that this single dimension is most strongly associated with the UV colours among our nine input features, with little-to-no distinction made between galaxies based on their NIR colours. We note that these submodels scored highest when we tested clustering of our samples using $i$-band magnitudes of galaxies as a reference point for defining colours (as in Siudek et al. 2018b; see also Section 3.4), producing the same striping pattern within the $NUV - r - K_s$ plane. While this simple segmentation does correspond broadly with incremental changes in the star formation activity of galaxies within our samples, other submodels (with $\Sigma_k$) return more detailed partitions and achieve higher ICL scores anyway.

The large spread in the ICL scores reported in Table 1 arises directly from a large spread in the log-likelihood values of the fits. This large spread in the log-likelihood values arises, in turn, primarily from a $1/\delta_k$ coefficient in the log-likelihood function of the DLM model (which may be seen in full in appendix 2 of Bouveyron & Brunet 2012). Submodels which yield very large but negative log-likelihood (and hence, ICL) values tend to have very small $\delta_k$ values for most (if not all) of their clusters; usually 0.[…] FEM imposes upon the value of $\delta_k$. Very small values of $\delta_k$ produce very large, positive values of $1/\delta_k$, and (via a $-$[…]
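The comparison of ICL scores across combinations of submodel and $k$ described in this appendix can be sketched as a simple selection over converged runs. This is a hedged illustration rather than the procedure actually coded for the paper: the function names, the use of `None` to mark non-convergence, and the numerical values are all hypothetical, and the scores below are not those of Table 1.

```python
def best_k_per_submodel(icl_scores):
    """Return {submodel: (k, ICL)} using only converged runs.

    icl_scores maps submodel -> {k: ICL or None}; None marks a run that
    failed to converge (e.g. because a cluster became empty).
    """
    best = {}
    for submodel, by_k in icl_scores.items():
        converged = {k: icl for k, icl in by_k.items() if icl is not None}
        if converged:
            k_best = max(converged, key=converged.get)
            best[submodel] = (k_best, converged[k_best])
    return best


def best_overall(icl_scores):
    """Return (submodel, k, ICL) for the highest ICL among converged runs."""
    best = best_k_per_submodel(icl_scores)
    submodel = max(best, key=lambda name: best[name][1])
    return (submodel,) + best[submodel]


# Made-up ICL values for illustration; these are not the values in Table 1.
scores = {
    "Sigma_k, delta_k": {5: -4.1e5, 6: -3.9e5, 7: -3.8e5, 8: None},
    "Sigma_k, delta": {5: -4.2e5, 6: -4.0e5, 7: -3.85e5, 8: None},
    "Sigma, delta": {5: -9.0e5, 6: None, 7: None, 8: None},
}
print(best_overall(scores))
```

Submodels for which no value of $k$ converges are simply dropped from the comparison, which mirrors the way non-converged combinations carry no ICL score.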

Figure C1. Examples of the clustering structures determined by $\alpha_{k,j}$ and $\alpha_k$ submodels for our GSWLC-2 sample, shown in the $NUV - r - K_s$ colour-colour plane. The combination of submodel and $k$ for each outcome is shown to the lower-right of each plot. Individual galaxies are coloured in accordance with the cluster to which they belong. The choice of colours in this figure is not intended to imply any trends within or between plots. The horizontal striping pattern exhibited by these examples in these plots, which is a general property of $\alpha_{k,j}$- and $\alpha_k$-based outcomes, indicates segmentation mainly along a single axis.