[PDF] Accurate Identification of Galaxy Mergers with Stellar Kinematics

Abstract

To determine the importance of merging galaxies to galaxy evolution, it is necessary to design classification tools that can identify different types and stages of merging galaxies. Previously, using GADGET-3/SUNRISE simulations of merging galaxies and linear discriminant analysis (LDA), we created an accurate merging galaxy classifier from imaging predictors. Here, we develop a complementary tool based on stellar kinematic predictors derived from the same simulation suite. We design mock stellar velocity and velocity dispersion maps to mimic the specifications of the Mapping Nearby Galaxies at Apache Point (MaNGA) integral field spectroscopy (IFS) survey and utilize an LDA to create a classification based on a linear combination of 11 kinematic predictors. The classification varies significantly with mass ratio; the major (minor) merger classifications have a mean statistical accuracy of 80% (70%), a precision of 90% (85%), and a recall of 75% (60%). The major mergers are best identified by predictors that trace global kinematic features, while the minor mergers rely on local features that trace a secondary stellar component. While the kinematic classification is less accurate than the imaging classification, the kinematic predictors are better at identifying post-coalescence mergers. A combined imaging + kinematic classification has the potential to reveal more complete merger samples from imaging and IFS surveys like MaNGA. We note that since the suite of simulations used to train the classifier covers a limited range of galaxy properties (i.e., the galaxies are intermediate mass and disk-dominated), the results may not be applicable to all MaNGA galaxies.

Full PDF

DD RAFT VERSION F EBRUARY

5, 2021Typeset using L A TEX twocolumn style in AASTeX62

Accurate Identiﬁcation of Galaxy Mergers with Stellar Kinematics

R. N

EVIN , L. B

LECHA , J. C

OMERFORD , J. E. G

REENE , D. R. L AW , D. V. S

TARK , K. B. W

ESTFALL , J. A. V

AZQUEZ -M ATA , R. S

METHURST , M. A

RGUDO -F ERNÁNDEZ , J. R. B

ROWNSTEIN , AND

N. D

RORY

Center for Astrophysics | Harvard & Smithsonian, 60 Garden St., Cambridge, MA 02138, USA Department of Physics, University of Florida, Gainesville, FL 32611, USA Department of Astrophysical and Planetary Sciences, University of Colorado, Boulder, CO 80309, USA Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA Department of Physics and Astronomy, Haverford College, 370 Lancaster Ave, Haverford, PA 19041, USA University of California Observatories, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95064, USA Instituto de Astronomía, Universidad Nacional Autónoma de México, A.P. 70-264, 04510, CDMX, México Oxford Astrophysics, Department of Physics, University of Oxford, Denys Wilkinson Building, Keble Road, Oxford, OX1 3RH, UK Instituto de Física, Pontiﬁcia Universidad Católica de Valparaíso, Casilla 4059, Valparaíso, Chile Department of Physics and Astronomy, University of Utah, 115 S. 1400 E., Salt Lake City, UT 84112, USA McDonald Observatory, The University of Texas at Austin, 1 University Station, Austin, TX 78712, USA (Accepted February 2, 2021)

ABSTRACTTo determine the importance of merging galaxies to galaxy evolution, it is necessary to design classiﬁcationtools that can identify different types and stages of merging galaxies. Previously, using

GADGET-3/SUNRISE simulations of merging galaxies and linear discriminant analysis (LDA), we created an accurate merging galaxyclassiﬁer from imaging predictors. Here, we develop a complementary tool based on stellar kinematic predictorsderived from the same simulation suite. We design mock stellar velocity and velocity dispersion maps to mimicthe speciﬁcations of the Mapping Nearby Galaxies at Apache Point (MaNGA) integral ﬁeld spectroscopy (IFS)survey and utilize an LDA to create a classiﬁcation based on a linear combination of 11 kinematic predictors.The classiﬁcation varies signiﬁcantly with mass ratio; the major (minor) merger classiﬁcations have a meanstatistical accuracy of 80% (70%), a precision of 90% (85%), and a recall of 75% (60%). The major mergersare best identiﬁed by predictors that trace global kinematic features, while the minor mergers rely on localfeatures that trace a secondary stellar component. While the kinematic classiﬁcation is less accurate than theimaging classiﬁcation, the kinematic predictors are better at identifying post-coalescence mergers. A combinedimaging + kinematic classiﬁcation has the potential to reveal more complete merger samples from imaging andIFS surveys like MaNGA. We note that since the suite of simulations used to train the classiﬁer covers a limitedrange of galaxy properties (i.e., the galaxies are intermediate mass and disk-dominated), the results may not beapplicable to all MaNGA galaxies.

Keywords:

Galaxy evolution, Galaxy interactions, Galaxy kinematics, Galaxy mergers, Stellar kinematics INTRODUCTIONObservations of galaxies reveal that they evolve over cos-mic time from smaller, bluer, more irregular star-forminggalaxies at higher redshifts to larger, redder, more ellipti-cal galaxies in the local universe (e.g., Glazebrook et al.1995; Lilly et al. 1995; Giavalisco et al. 1996). Addition-

Corresponding author: R. [email protected] ally, the bimodality of galaxy properties such as color, mass,and star formation rate at low redshift implies that galaxiesare quenching, or shutting down their star formation, in thelocal universe as well (e.g., Schawinski et al. 2009; Mas-ters et al. 2010; Weigel et al. 2017). Galaxy evolution, orchanges in the size, structures, and star formation propertiesof galaxies, in lower mass (log ( M ∗ / M (cid:12) ) < .

5) galaxiesis largely driven by the accretion of gas and/or the preven-tion of this gas from forming stars (Robotham et al. 2014).Many different processes can drive this evolution, from in- a r X i v : . [ a s t r o - ph . GA ] F e b ternal processes that are dependent on the galaxy properties,to external processes, which are related to the surroundingsof the galaxy. Examples of internal processes include feed-back from active galactic nuclei (AGNs; Croton et al. 2006;Fabian 2012; Heckman & Best 2014), star formation drivenoutﬂows (Rupke 2018 and references therein), and morpho-logical quenching due to structures such as bars or stellarbulges (Sheth et al. 2005). External processes include galaxyinteractions with the hot intracluster medium that remove orheat gas (Gunn & Gott 1972), ‘cold ﬂow’ accretion from thecosmic web (Dekel et al. 2009), and galaxy mergers (Silk &Rees 1998; Di Matteo et al. 2005; Kaviraj 2013).While the current Λ CDM framework for structural forma-tion in the universe points to the importance of mergers forassembling dark matter halos (White & Rees 1978; White& Frenk 1991; Cole et al. 2008), the relative contributionof mergers to galaxy evolution through processes such asstar formation, AGN activity, and/or morphological trans-formation remains unclear. This disagreement stems largelyfrom the difﬁculty of building large, unambiguous samplesof merging galaxies. Galaxy mergers are inherently difﬁcultto identify; they persist for ∼ few Gyr and they have a diver-sity of identifying characteristics that vary with merger stage,mass ratio, gas fraction, orbital parameters, and other mergerinitial conditions.The difﬁculty of identifying merging galaxies also con-tributes to the uncertainty in the merger rate ( R merg ), whichis a key measurement for quantifying the role of mergers ingalaxy evolution and comparing observations to simulations(e.g., López-Sanjuan et al. 2008). The merger rate can eitherbe measured directly from simulations or empirically, usingthe observed merger fraction and assuming a merger ‘observ-ability’ timescale (Lotz et al. 2011). Both techniques showlarge scatter between different estimates of the merger rate;semi-analytic models and hydrodynamical simulations havediscrepancies of about an order of magnitude (see Hopkinset al. (2010) and references therein), while observations havealso not converged, due to uncertainties in merger timescalesand the completeness of the different methodologies. Re-cently, however, Mantha et al. (2018) have demonstratedthat different empirical estimates of the merger rate can bebrought into agreement. This work demonstrates the impor-tance of careful calibration of the completeness of the mergeridentiﬁcation methodology and the observability timescale.Clean and complete samples of merging galaxies are there-fore needed to address the contributions of mergers to evolu-tionary processes in galaxies and to reduce systematic uncer-tainties in the galaxy merger rate. This in turn necessitates athorough understanding of the limitations and observabilitytimescale of the technique used to identify merging galax-ies. A variety of imaging techniques exist to identify merg-ing galaxies, all of which are susceptible to their own biases. These often rely upon individual imaging tools, or predictors,such as the Gini − M methodology, or the asymmetry of thegalaxy light in imaging. One approach to overcome these bi-ases is to utilize simulations of merging galaxies to betterunderstand the shortcomings of individual tools and to char-acterize the observability timescales of these methods. Forexample, Lotz et al. (2008); Lotz et al. (2010b,a) use simula-tions of merging galaxies to measure the length of time that amajor merger is observable by the Gini − M and asymme-try metric. They ﬁnd 0 . − . ∼ few Gyrduration of the merger. Another strategy is to combine thepredictors to create a single classiﬁcation tool that dramati-cally lengthens the observability timescale by capitalizing onthe strengths of the individual methods (e.g., Goulding et al.2018; Snyder et al. 2019).In Nevin et al. (2019) (henceforth N19), we pursue bothof these approaches and utilize GADGET-3/SUNRISE sim-ulated galaxies to build a merger identiﬁcation technique.This technique combines seven imaging predictors to cre-ate one more accurate and precise classiﬁer that incorpo-rates the strength of all of these predictors, lengthening theobservability timescale to > and dark matter (Glazebrook 2013). Disturbances in the stel-lar kinematics are dynamically long-lived and can identifya merger long after the imaging signatures have faded. Forinstance, morphological disturbances like tidal tails can fadeon a ∼

500 Myr timescale following ﬁnal coalescence andare faint compared to the light of the galaxy (e.g., Hung et al.2014; Wen & Zheng 2016) whereas kinematic disturbance inthe stars of a galaxy can persist for longer (up to ∼ Gyr afterﬁnal coalescence; Hung et al. 2016).Kinematic predictors may additionally clear up ambigu-ities in imaging. For instance, some clumpy star-forminggalaxies appear to be mergers in imaging due to their dis-turbed morphologies (Miralles-Caballero et al. 2011; Pettyet al. 2014), yet galaxies with clumpy morphologies canactually be nonmerging spiral galaxies with clumps of starformation in their centers or in their spiral arms (Alonso-Herrero et al. 2006; Haan et al. 2011). Kinematics haveshown promise as an additional tool to determine if a star-forming galaxy is disk-like (e.g., White et al. 2017).This type of clumpy star-forming galaxy is even moreabundant at intermediate and high redshifts, where a higherfraction of galaxies are expected to be actively merging,yet many isolated (non-merging) galaxies are also inherentlyclumpy (e.g., Guo et al. 2015). In addition to their clumpyand rapidly evolving morphologies, high redshift galaxiesalso have distinct kinematic features such as high velocitydispersions regardless of whether they are actively mergingor isolated (e.g., Law et al. 2012b,a). The decreasing spatialresolution and surface brightness dimming of high redshiftgalaxies also confound the identiﬁcation of mergers. Sincehigh redshift galaxies present a host of additional compli-cations, in this work we focus on local galaxies in order todevelop the groundwork for a method that could eventuallybe extended to the more distant universe.Like every other merger identiﬁcation tool, kinematic pre-dictors also have their own set of ambiguities and limitations.For instance, in gas-rich mergers, disks are able to survive themerger and these recently-merged galaxies can masqueradeas isolated disk galaxies (e.g., Robertson et al. 2006). Hunget al. (2015) ﬁnd that relying upon kinematics alone to clas-sify a sample of ULIRGs identiﬁes many merging galaxiesas isolated disks and would provide a false-negative mergeridentiﬁcation for up to 50% of ULIRGs. Additionally, theidentiﬁcation technique depends strongly on the merger stageand the choice of kinematic predictor. Other work conﬁrmsthat some mergers with highly disturbed visual morphologyexhibit a distinct lack of disturbance in the stellar kinemat-ics (Bellocchi et al. 2013; Hung et al. 2016). It is thereforeimportant to probe the kinematics of merging galaxies usingsimulations in order to understand the biases and limitationsof these tools before applying them to real galaxies.There is currently a wealth of work dedicated to the imag-ing approach to identifying merging galaxies from large sur-veys. While there are many detailed case studies of thekinematics of individual local mergers (e.g., Dasyra et al.2006; Piqueras López et al. 2012), there is a lack of de-tailed statistical-sized kinematic studies of local mergers. Re-cent years have brought a revolution in more and more ca-pable IFS surveys, creating opportunities to identify merg-ing galaxies using kinematic signatures. Surveys such asATLAS-3D (Cappellari et al. 2011), CALIFA (Sánchez et al.2012), SAMI (Croom et al. 2012), MaNGA (Bundy et al.2015), and HECTOR (Bryant et al. 2016) offer a promis-ing avenue to study the spatially-resolved spectral proper-ties of an astounding number of galaxies. Here, we fo-cus on the nearing-completion Mapping Nearby Galaxies atApache Point Observatory (MaNGA) survey. MaNGA is an IFS survey of > z ∼ .

03) with a spectral resolution of R ∼ − > GADGET-3/SUNRISE simulations from N19, describe theprocess of creating mock stellar kinematic maps from thesynthetic spectra of the galaxy merger simulations, introducethe kinematic predictors, and review the linear discriminantanalysis (LDA) technique used in N19 and in this work. In§3 we describe the results of the LDA classiﬁcation, includ-ing the coefﬁcients of the LDA, the observability timescales,and the accuracy and precision of the method. In §4 we de-scribe the LDA coefﬁcients in the context of previous workon mergers, how the classiﬁcation changes with mass ratio,and examine the performance of the kinematic classiﬁcationtechnique in the context of other tools and statistical meth-ods. We present our conclusions in §5. In this work we fo-cus on creating the kinematic classiﬁcation from simulatedgalaxies. In future work we plan to apply the classiﬁcation togalaxies in the MaNGA survey. A cosmology with Ω m = . Ω Λ = .

7, and h = . METHODSIn order to construct a merger identiﬁcation frameworkfrom the kinematics of simulated galaxies, we follow adetailed procedure to best mimic observations from theMaNGA survey. We introduce the galaxy merger simula-tions in §2.1 and describe the process for preparing mockkinematic maps from the simulated galaxies in §2.2. Finally,we introduce the kinematic predictors that we utilize in thekinematic classiﬁcation in §2.3.We dedicate several appendices to discussions that are in-formative but ancillary to the goals of this paper. We makethe deliberate choice to extract the stellar kinematics fromthe

SUNRISE spectra, as opposed to relying directly on par-ticle velocities. We discuss the implications of this choiceand compare the extracted stellar velocity and velocity dis-persion maps to the inherent velocity of the simulation par-ticles in §A. In §A we also discuss the effects of dust on thesimulated observations. In §B we include more details aboutadding noise to the mock spectra. In §C we address AGNcontamination and how we extract stellar kinematics fromgalaxies that host AGN.2.1.

GADGET-3/SUNRISE

Overview

As in N19, we utilize

GADGET-3/SUNRISE simulationsof merging galaxies.

GADGET-3 (Springel & Hernquist2003; Springel 2005) is a smoothed particle hydrodynamics(SPH) and N-body code that models processes such as ra-diative heating, radiative cooling, star formation, supernovafeedback, and the multi-phase interstellar medium (ISM) us-ing sub-resolution prescriptions.

GADGET-3 also includesSMBH accretion as well as AGN feedback (this is achievedby coupling 5% of the accreted luminosity to the gas as ther-mal energy).

GADGET has been used for many differentastrophysical applications, including wide use in studies ofmerging galaxies (e.g., Di Matteo et al. 2005; Cox et al. 2006;Hopkins et al. 2006; Robertson et al. 2006; Hopkins et al.2008; Blecha et al. 2011, 2013b; Hopkins et al. 2013a,b).We present the ﬁve galaxy merger simulations and thematched isolated simulations in Table 1. The framework forthese simulations is established in Blecha et al. (2018), andthe simulations themselves are presented in N19. Three ofthe simulations are major mergers (where the mass ratio, q ,of the progenitors is greater than q = .

25, or 1:4, Rodriguez-Gomez et al. 2015; Nevin et al. 2019) and two of the simula-tions are minor mergers. The major mergers have mass ratiosof 1:2, 1:3, and 1:3. We deﬁne the gas fraction of these sim-ulations as f gas = M gas , disk / ( M gas , disk + M ∗ , disk ) . The 1:2 and1:3 mass ratio major mergers have a relatively high gas frac-tion of 0.3 and one of the 1:3 mass ratio major mergers has arelatively low gas fraction of 0.1.We verify that the different gas fractions of the simula-tions (0.1 and 0.3) cover the full range in gas fractions ofthe MaNGA galaxies. The mean gas fraction in MaNGA isdeﬁned by Barrera-Ballesteros et al. (2018) as: µ gas = σ gas σ gas + σ ∗ where σ gas is the gas mass density and σ ∗ is the stellar massdensity. The gas mass and stellar mass densities are derivedfrom the Balmer decrement and stellar template ﬁtting, re-spectively.Barrera-Ballesteros et al. (2018) ﬁnd that the mean gasfraction for the MaNGA sample ranges from 0.16 to 0.32.A f gas value of 0.1 is therefore below the mean for MaNGAgalaxies and thus relatively gas poor, while f gas = . > . > . GADGET-3 with

SUNRISE in order to di-rectly compare the simulated galaxies with observations.

SUNRISE is a 3D polychromatic Monte-Carlo dust radia-tive transfer (RT) code (Jonsson 2006; Jonsson et al. 2010)that is used to model a wide range of isolated and merginggalaxies (e.g., Narayanan et al. 2010; Hayward et al. 2011;Blecha et al. 2013a; Snyder et al. 2013; Hayward et al. 2014).The full details of the

SUNRISE prescription are presentedin Blecha et al. (2013b, 2018) and N19. Brieﬂy,

SUNRISE performs Monte Carlo radiative transfer on a 3D adaptively-reﬁned grid to compute the emission from stars, HII regions,and AGN.

SUNRISE uses the STARBURST99 stellar pop-ulation synthesis models (Leitherer et al. 1999) to calculatethe age- and metallicity-dependent spectral energy distribu-tions for each star particle. The treatment for dust includesdust self-absorption and thermal re-emission as well as poly-cyclic aromatic hydrocarbon (PAH) absorption and emission.We additionally include kinematic (doppler) effects, whichrequires very high spectral resolution. Ultimately,

SUNRISE calculates the emergent, attenuated resolved UV-to-IR spec-tra (3300 − ∆ λ = . SUNRISE optical syntheticspectra from the seven isotropically positioned viewpointsfrom each merger snapshot to produce the mock datacubes.In N19, a ‘snapshot’ is the

SUNRISE image; in this work,we use the term ‘snapshot’ to refer to the full datacube froma speciﬁc point in time. These snapshots occur at 50-100Myr intervals during each merger, and we refer to them asearly-stage, late-stage, and post-coalescence stage snapshots.We deﬁne these stages using the r − band images from N19.The early-stage mergers occur after ﬁrst pericentric passageand have (view-point) average stellar bulge separations ∆ x ≥

10 kpc, late-stage mergers have separations 1 kpc < ∆ x < Simulation Mass Ratio Gas Fraction Stellar Mass of Primary Matched Isolated Galaxies[10 M (cid:12) ]q0.5_fg0.3 1:2 0.3 3.9 m0.5_fg0.3, m1_fg0.3q0.333_fg0.3 1:3 0.3 3.9 m1_fg0.3q0.333_fg0.1 1:3 0.1 4.7 m0.333_fg0.1, m1_fg0.1q0.2_fg0.3_BT0.2 1:5 0.3 4.2 m1_fg0.3_BT0.2q0.1_fg0.3_BT0.2 1:10 0.3 4.2 m1_fg0.3_BT0.2 Table 1.

Key simulation parameters and matched isolated galaxies. The simulations are named for the mass ratio, gas fraction and bulge-to-totalmass ratio of the merging galaxies. For instance, q0.5_fg0.3 is a 1:2 mass ratio merger where each progenitor galaxy has a gas fraction of 0.3and an initial B/T ratio of 0. The stellar mass of the primary (more massive) galaxy is 3 . × M (cid:12) . The matched isolated galaxies aremass-matched to the merging galaxies and are named for which merging galaxy they are matched to (i.e., m0.5_fg0.3 is matched to the smallerof the two merging galaxies in the q0.5_fg0.3 merger).

10 kpc, and post-coalescence mergers are no longer resolv-able with separations ∆ x ≤ ∼

20 snapshots per simulation and seven viewpoints persnapshot, which amounts to 100-200 observations per mergersimulation.We further discuss the importance of running RT and incor-porating dust attenuation and scattering on the merger snap-shots in Appendix A; brieﬂy, the stellar kinematic maps areaffected by both the presence of dust and dust scattering. Theimplication is that for this type of kinematic analysis, it is im-portant to use velocities derived directly from the RT product(synthetic spectra) as opposed to the original SPH particlevelocities.2.2.

Preparing Mock MaNGA Kinematic Maps

To produce stellar kinematics for our sample of simulatedgalaxies, we use the speciﬁcations of MaNGA to create a dat-acube of spectra and then we mimic the MaNGA Data Anal-ysis Pipeline (DAP) to extract stellar kinematics to use in ourkinematic classiﬁcation. Examples of ﬁnalized ‘MaNGA-ized’ stellar velocity and stellar velocity dispersion maps arepresented in Figure 1. In this section, we describe how wemimic the speciﬁcations of MaNGA. This involves reducingthe spatial and spectral resolution of the simulations to cre-ate a MaNGA-ized datacube, placing an appropriately sizedﬁber bundle over each galaxy, and ﬁtting each spaxel with ppxf (a penalized pixel-ﬁtting method from Cappellari &Emsellem 2004; Cappellari 2017) to obtain the velocity andvelocity dispersion of the stars at each spatial position.SDSS-IV/MaNGA is an IFS survey that targets a sampleof > R ∼ Figure 1.

Snapshots of images (left column), stellar velocity maps(middle column), and stellar velocity dispersion maps (right col-umn) from different epochs of the q0.5_fg0.3 simulation. The r − band image is the log-scaled full-resolution simulation prior tothe mock-up process in order to show all of the features of themerger. The colorbar for the middle and right columns is in kms − . The spatial position for all panels is in arcsec and the stellarvelocity and stellar velocity dispersion columns have the same spa-tial coverage. We include a snapshot that is an early-stage merger(ﬁrst row), a late-stage merger (second row), and a post-coalescencemerger (third row). The stellar kinematics change over the courseof the merger. For instance, the stellar velocity map is distorteddue to the superposition of two merging galaxies, while the velocitydispersion map undergoes a global enhancement with time. equipped with 19, 37, 61, 91, and 127 ﬁbers (the largestﬁber bundle is known as the ‘frankenbundle’); each individ-ual ﬁber has a 2 . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) r − band effective radius( R e ) and the secondary sample has coverage to 2.5 R e (Yanet al. 2016a; Wake et al. 2017). The redshift range of theMaNGA survey is 0 . ≤ z ≤ . SUNRISE synthetic spectra, which we extract at the median redshift ofthe MaNGA survey ( z = . SUNRISE

SDSS r − and g − band images to construct the mock dat-acubes, since they are essential for certain steps of the pro-cess. We follow this procedure (which mirrors the MaNGADAP whenever possible):1. Convolve the datacube with the 2 . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) R ∼ g − band image that is rebinned to the 0 . (cid:48)(cid:48) g − band S/N cutoff value of 1.Follow the procedure from N19 to convolve, rebin, andintroduce noise characteristic of SDSS imaging to themock g − band images to match the 0 . (cid:48)(cid:48) g − bandS/N per spaxel and mask all spaxels that fall below aS/N cutoff value of 1. This procedure directly followsthe MaNGA DAP (Westfall et al. 2019), which masksall spaxels using the same g − band S/N cutoff.4. Use the MaNGA procedure to select which sized ﬁberbundle to use for each mock datacube and mask thespaxels that are external to this hexagonal footprint.We use statmorph (Rodriguez-Gomez et al. 2019)to measure the effective radius of the mock r − band im-ages from N19. We then determine the smallest ﬁberbundle needed to cover each galaxy to 1.5 R e (this ishow MaNGA’s primary sample is deﬁned). We selectthe smallest ﬁber bundle if the total angular extent ofthe galaxy (2 × e ) is smaller than 12 . (cid:48)(cid:48) . (cid:48)(cid:48) g − band S/N value for each spaxel.The end result is a sqrt(variance), or error spectrum,which we use to introduce random noise to each spaxelin the datacube. The noisy spectra and the accompany-ing error spectra are the inputs to ppxf . More detailsof this process can be found in Appendix B.To verify that the S/N of the simulated spectra are rep-resentative of the MaNGA sample, we use the peak g − band S/N as a comparison statistic, which is themaximum value of the g − band S/N (per pixel) froma single galaxy observation. The peak g − band S/N ra-tio of a sample of MaNGA galaxies that span the fullrange of sizes, surface brightnesses, and stellar massesof the MaNGA sample ranges from 10 - 60, with a me-dian of 25. The same statistic for the simulation suiteranges from 10 - 100, with a median of 30. In §3.9, weexperiment with changing the S/N of simulated spec-tra, and investigate how this affects the classiﬁcation.Since the MaNGA datacubes oversample the effectivePSF, they also contain signiﬁcance covariance in theerrors between adjacent spaxels such that the S/N ratioof binned spectra does not increase as √ N . This co-variance is irrelevant for the ﬁtting of individual spec-tra, but we account for it in our Voronoi binning by fol-lowing the analytic approximation given by Law et al.(2016), as we discuss below.6. After completing the masking steps, we further ex-clude regions that are background dominated.At this stage, we notice that the datacubes have‘patchy’ outskirts, or regions of low S/N data thatare surrounded by masked regions. The MaNGA dat-acubes do not have this feature; instead, they excluderegions that can be characterized as ‘background dom-inated’. This patchiness does not affect the resultsof the classiﬁcation, instead we choose to correct itfor cosmetic purposes. To do this, we mask spaxelswhere the g − band signal is less than 3 σ above thebackground value, where σ is the standard deviationof the noise given above. This produces the desiredeffect, where the mask has a sharper cutoff, matchingthe appearance of the MaNGA cubes.7. Rebin spatially using a Voronoi binning scheme with g − band S/N of 10 (Cappellari & Copin 2003).We create spatial bins that have a g − band S/N of 10,reproducing the procedure described in Westfall et al.(2019). When a Voronoi bin contains more than onespaxel, the new spectrum is the masked average of allconstituent spectra while the error spectrum for thatbin is determined by co-adding the error spectra. Itis important to account for covariance between neigh-boring spaxels in our Voronoi bin calculation. In or-der to avoid the computational cost of calculating thecovariance matrix for all spaxels, we instead use thecorrection from Law et al. (2016). The correction isan analytic function of the number of spaxels in a bin( N bins ): n measured / n no covar = + . × log ( N bins ) where n measured is the corrected noise level after the cor-rection is applied to the co-added error where covari-ance is not considered ( n no covar ) and N bins is the num-ber of spaxels in a bin.The ﬁnal step of the creation of mock kinematic maps is topass the Voronoi binned spectra through ppxf (Cappellari& Emsellem 2004; Cappellari 2017). ppxf is a penalizedpixel-ﬁtting method which assumes that a galaxy spectrumis a combination of stellar templates that are convolved withthe line-of-sight velocity distribution (LOSVD). To run ppxf , we follow these steps from the DAP:• Normalize the ﬂux data so that the mean over all tem-plates is unity.• Mask the spectra to match the wavelength range of the MILES-HC library (3600-7400 Å).• Mask the emission lines using the DAP module Stel-larContinuumBitMask().• Use the 42 template

MILES-HC spectral library toglobally ﬁt each datacube.The templates are ﬁrst convolved to the spectral reso-lution of MaNGA. • We use the ‘ NZT ’, or non-zero template iteration modeto ﬁt all bins with ppxf .In this mode (which is also used in the DAP), we ﬁrstﬁt the masked average of all spectra in the datacubeand use this global ﬁt to isolate the subset of templatesallocated non-zero weights. This template subset isthen used to individually ﬁt each bin.• Each ﬁt iteration of ppxf uses an additive eight-orderLegendre polynomial and a Gaussian line of sight ve-locity dispersion (LOSVD) with two moments. As inthe DAP, due to limited spectral resolution, we do notsolve for the higher order moments h and h (Westfallet al. 2019).The ﬁnal product of our MaNGA-izing procedure is theﬁrst two moments of the LOSVD, or a stellar velocity mapand a stellar dispersion map, both with associated error mapsfrom the ﬁt to the stellar continuum.2.3. Preparing Kinematic Predictors

Here we deﬁne and describe the predictors extracted fromthe stellar kinematic maps. The goal is to create a set ofkinematic predictors that adequately describe the differenttypes of merger-induced kinematics in the velocity and ve-locity dispersion maps.To develop this kinematic identiﬁcation tool, we use thestellar kinematics instead of the warm ionized gas kinemat-ics (henceforth, ‘gas kinematics’). The stellar and gas kine-matics trace different physical regions and processes in themerging galaxies. We select the stellar kinematics becausethey directly trace the assembly history of a galaxy’s stellarpopulation. On the other hand, the gas kinematics can besubject to a number of non-gravitational forces. The stellar This is a departure from the DAP. However, as noted in Westfall et al.(2019), there is no mathematical difference between our approach and latersubtracting the difference in resolution in quadrature from the ppxf result. kinematics and the gas kinematics diverge in the presence ofshocks, inﬂows, and/or outﬂows, all of which are processesthat are not limited to merging galaxies. An analysis built ongas kinematics is a compelling direction for future work butis beyond the scope of this paper (i.e., see Khim et al. 2020). The kinematic predictors are based on previous work toidentify merging galaxies from the stellar kinematics of ob-served and simulated galaxies. All of these predictors aresensitive to different orientations, merger stages, mass ratios,and/or gas fractions of merging galaxies. Our goal is to com-bine them into one LDA classiﬁcation to best identify a va-riety of different types and epochs of merging galaxies. Intotal, we extract the following predictors (which are all in-troduced in Table 2): A , A , ∆ PA, v asym , σ asym , resid, λ R e , (cid:15) , ∆ x V , ∆ x σ , µ , V , µ , σ , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ . We include a brief deﬁnition for all predictors in Ta-ble 2 but focus the remainder of this section on the kinematicpredictors that were selected by the random forest term selec-tion technique described in §3.3: A , ∆ PA, resid, λ R e , µ , V , µ , σ , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ . These termsare the most informative for identifying the merging galaxiesand we discuss them throughout the rest of the paper. We fur-ther describe the kinematic predictors that were not selectedin Appendix D.To deﬁne the asymmetry in the kinematic position an-gle ( A ), we utilize the Radon Transform from Stark et al.(2018). We transform the velocity maps into circular coor-dinates ( ρ , θ ) where ρ is the distance from the spaxel to thecenter of the velocity map, which is the kinematic center (de-ﬁned below), and θ is the angle between the positive x-axisand the line segment from the kinematic center to the spaxel.The angle θ ranges from 0 to 180 in the CCW direction. Pos-itive values of ρ are regions of the velocity map above thex-axis and negative values of ρ are below the positive x-axis.The Radon Transform is deﬁned as: R ( ρ, θ ) = (cid:90) L v ( x , y ) dl (1)where the velocity is summed along line integrals that arecentered on the point ( ρ , θ ) and perpendicular to the kine-matic center of the galaxy. The Radon Transform is a 2Darray that is calculated at all values of ρ and θ .We then calculate the bounded Absolute Radon Transform, R AB , which is integrated over a distance R e and is the absolutevalue of the difference between the velocity at each point andthe mean value along the line segment. Gas kinematics are not available for many MaNGA galaxies (sincemany are non-starforming), but are easier to obtain than stellar kinematicsfor many high redshift galaxies and could be more compelling direction topursue in this context.

We present the bounded Absolute Radon Transform andthe Radon proﬁle in Figure 2. The Radon proﬁle is com-puted by determining the minimum value of θ ( ˆ θ , where thehat operator denotes an estimated value) for each value of ρ from the bounded Absolute Radon Transform. The value of ˆ θ traces the direction of maximal rotation in the stellar velocitymaps at each radial position.We follow the procedure from Stark et al. (2018) to de-termine the galaxy’s kinematic center, which we describe inmore detail in Appendix D.We quantify the asymmetry of the Radon proﬁles using thekinematic predictor A , from Stark et al. (2018): A = (cid:88) i δ ˆ θσ δ ˆ θ , i (2)where δ ˆ θ is the absolute magnitude of the difference between θ i on one side of the Radon proﬁle to the other (same ρ , dif-ferent sign), σ δ ˆ θ is the uncertainty on δ ˆ θ , and the expressionis summed over the i values of ˆ θ .The A predictor incorporates the absolute magnitude ofthe difference between the measured kinematic PA on oneside of the galaxy to the other. We therefore expect that A will be enhanced for merging galaxies, since mergers cancause warps in the stars in a galaxy (e.g., Shapiro et al. 2008).We use kinemetry to measure both ∆ PA and residfrom the LOSVD (Krajnovic et al. 2006). Functionally, kinemetry measures the kinematic asymmetry from theline of sight velocity maps by dividing them into a set ofnested elliptical rings. The best ﬁt model at each radius isdetermined using a ring deﬁned by the kinematic PA and theﬂattening factor q f = 1- e , where e is the ellipticity of the ringin the plane of the sky. These models use a decomposition ofthe moment maps into harmonic Fourier coefﬁcients in po-lar coordinates. For instance, a velocity map, K ( r , ψ ) can beexpanded into a ﬁnite number of harmonic frequencies: K ( r , ψ ) = A ( r ) + N (cid:88) n = A n ( r ) sin ( n ψ ) + B n ( r ) cos ( n ψ ) (3)where r is the semimajor axis of the ellipse, ψ is the az-imuthal angle, A ( r ) is the systemic velocity, N is the numberof ellipses ﬁt, and A n and B n are the coefﬁcients of the har-monic expansion. The best-ﬁtting ellipses are obtained byminimizing χ for the linear combination of the A n and B n coefﬁcients.An ideal rotating disk can be described using only the B term, which represents the cosine term for the circular veloc-ity of a galaxy’s rotating disk: V ( r , ψ ) = V c ( r ) sin i cos ψ (4)where r is the radius in the plane of the galaxy, ψ is the az-imuthal angle, V c ( r ) is the circular velocity, and i is the incli-nation of the galaxy disk. Predictor Name Description Derivation A The weighted asymmetry in the position angle, A = (cid:80) i δ ˆ θ i N i , j w i , j which is calculated from the Radon proﬁle where ˆ θ is the best ﬁt kinematic position angle A The error-weighted asymmetry in the position angle A = (cid:80) i δ ˆ θ i σ δ ˆ θ , i ∆ PA The difference between the global kinematic and ∆ PA = | PA kin − PA img | photometric position angles, which are measuredfrom kinemetry and the g − band image σ asym Describes the degree of smoothness of the σ asym = < (cid:80) n = k n , v / A , v > r velocity dispersion map which is the sum of the higherorder coefﬁcients from kinemetry v asym The deviation of the velocity dispersion map v asym = < (cid:80) = k n , v / , v > r from ordered rotation same as above but excluding the k , v termresid The residual between the best ﬁt kinemetry resid = (cid:80) Ni , j | V ∗ − V model | N model and the velocity map λ R e The approximate spin parameter λ R e = (cid:80) Nn = F n R n | V n | (cid:80) Nn = F n R n √ V n + σ n (cid:15) Galaxy ellipticity Measured using statmorph from the r − band imaging ∆ x V The spatial distance between the center of the The imaging center is measured from the r − band image andvelocity map and the imaging center in kpc the kinematic center is from the Radon Transform ∆ x σ Same as above, but for the velocity dispersion map The center of the velocity dispersion map is determinedusing a low pass ﬁlter µ , V and µ , σ The mean of the distribution of the The distributions for each map are createdvelocity and velocity dispersion maps by collecting the values from all spaxels µ , V and µ , σ The variance of the distributions | µ , V | and | µ , σ | The skewness of the distributions µ , V and µ , σ The kurtosis of the distributions

Table 2.

Synthesis of all of the kinematic predictors measured in this paper. We highlight the predictors that are selected as important in purple.We include a brief description and derivation for each predictor. For more details, see §2.3 for the predictors that are selected as important andAppendix D for the predictors that are not used in the classiﬁcation.

To determine the best ﬁt Fourier coefﬁcients, we run kinemetry multiple times. We ﬁrst allow the best ﬁt kine-matic PA and value of q f to vary for each radius. We deﬁnethe kinematic position angle (PA kin ) to be the median valueof the best ﬁt kinematic PAs. We then allow the value of q f tovary and determine the median value. After determining theglobal values for kinematic PA and q f , we do a ﬁnal run todetermine the values of the higher order kinematic moments and therefore the best ﬁt disk model. We then compare PA kin to the imaging major axis (PA img , which is measured using statmorph from the r − band imaging) to create the predic-tor ∆ PA. Since ∆ PA traces the recent global misalignmentsof stars, it should be elevated for the merging galaxies thathave misaligned stellar disks.We use the global kinematic position angle from kinemetry to measure ∆ PA instead of the median of the kinematic posi-0

Figure 2.

Stellar velocity map (left), bounded Absolute Radon Transform (middle), and Radon proﬁle (right) for a snapshot during the latestages of the q0.5_fg0.3 merger. In this case, the primary galaxy is blueshifted at the center of the velocity map (systemic velocity is ∼ -100km s − and the secondary galaxy approaches from the right and is redshifted relative to the primary galaxy. To compute the Radon Transform,the velocity ﬁeld is transformed into θ and ρ coordinates, where θ ⊆ [0,180] and ρ ⊆ [- ∞ ,+ ∞ ], where θ is measured CCW from the top ofthe map. The bounded Absolute Radon Transform is then calculated by creating line integrals over a grid of ( ρ , θ ) positions, where the lineis perpendicular to the kinematic center of the map. It is ‘bounded’ because the line integral is limited to the length R e . In the left panel,the kinematic center is a yellow star and the magenta and purple line segments demonstrate the calculation of the Absolute Radon Transformat θ ∼

45 for positive and negative ρ values, respectively. The magenta and purple regions in the middle panel have large and small values,respectively, which demonstrates that the value of the Absolute Radon Transform is smaller in the regions where the spaxel velocities vary lessalong the line integral. We ﬁnd the minima (shown in lighter yellow) of R AB at each value of ρ , to measure the Radon proﬁle (right), which isused to calculate the error-weighted asymmetry in the kinematic position angle, A . tion angles from the Radon Transform. The main motivationfor this choice is that kinemetry uses an adaptive binningscheme; at each step outwards, the ellipses are larger, whichgives less weight to the kinematic confusion at the outskirtsof the galaxy. The Radon Transform, however, is equallysampled in ρ (see Figure 2), so it can be more inﬂuenced bythe measurement of the kinematic PA at the outskirts of thegalaxy. In most cases, the two measurements agree withinerror, but in cases where the kinematic maps are disturbed,the global kinematic PA from kinemetry more closelymatches our by-eye assessment of the kinematic PA.We also extract ‘resid’, or the kinemetry residuals be-tween the best ﬁt rotating disk and the velocity map. Thispredictor is deﬁned as:resid = (cid:80) Ni , j | V ∗ − V model ( r , Ψ) | N (5)where V ∗ is the observed velocity map, V model ( r , Ψ) the circu-lar velocity model from kinemetry , and N is the numberof spaxels ﬁt. We include this normalization factor in orderto penalize the ﬁts that converge to a very inclined galaxy.For these galaxies, the ﬁt is attempting to avoid ﬁtting dis-ordered kinematics in the exterior regions of the galaxy byﬁtting a smaller region. We show an example of a simu-lated galaxy snapshot from the q0.5_fg0.3 simulation ﬁt with kinemetry and its velocity residuals in Figure 3.We measure λ R e , the approximate spin parameter, from thestellar velocity and velocity dispersion maps, which is de- ﬁned by Emsellem et al. (2007): λ R e = (cid:80) Nn = F n R n | V n | (cid:80) Nn = F n R n (cid:112) V n + σ n (6)where F n is the ( r − band) ﬂux of a spaxel, R n is the distancefrom the kinematic center, V n is the stellar velocity, and σ n isthe stellar velocity dispersion. We measure λ R e to the r − bandeffective radius. Since the ﬁber bundles are designed to pro-vide coverage of each galaxy to 1.5 R e , if a secondary nucleifalls towards the outside edge of the hexagonal FOV, it is ex-cluded from the measurement of λ R e . This effect is more rel-evant for the minor mergers, where the secondary componentcovers a smaller effective area of the hexagonal FOV.We measure the ellipticity of a galaxy, (cid:15) , from the r − bandphotometry using statmorph . It is distinct from the ellip-ticity parameter used by kinemetry to ﬁt rotation curves.We do not use (cid:15) as a kinematic predictor. Instead, we use it toconstruct the λ R e - (cid:15) diagnostic diagram in §4.1, where the di-vision between fast and slow rotators is deﬁned by Cappellari(2016): λ R e = . + (cid:15)/ λ R e - (cid:15) diagram, λ R e is the more predictive of the twoaxes; it decreases dramatically for the ‘slow-rotating’ pop-ulation of galaxies, which are dynamically disordered anddispersion-dominated. We predict that λ R e will decrease formerging galaxies since mergers are kinematically disordered1 Figure 3.

An example kinemetry ﬁt to a snapshot of the q0.5_fg0.3 merger simulation with observed stellar velocity map (left), best ﬁt kinemetry model (middle), and the model velocity subtracted from the stellar velocity map (right). Note that this is the same snapshot shownin Figure 2. The color bars show the velocity in km s − . In the left panel we overplot the contours from the r − band imaging and the imagingposition angle. The kinematic position angle (from kinemetry ) is the straight line in the middle panel. We utilize the normalized residualsas a predictor (right), which we refer to as ‘resid’, which is the sum over all spaxels of the absolute value of the difference of the stellar velocityand the model velocity, normalized by the number of spaxels in the model. Figure 4.

Distribution of the velocity dispersion values (in km s − )taken from each spaxel in the velocity dispersion map (inset, veloc-ity dispersion bar is in km s − and the spatial axis is in arcsec). Thissnapshot is also showcased in Figures 2 and 3. We also include themeasured values of the mean ( µ , σ ), dispersion ( µ , σ ), skew ( | µ , σ | ),and kurtosis ( µ , σ ) of this distribution. A distribution with a largerskew is asymmetric about the mean. A distribution with a positivekurtosis has a high degree of peakedness relative to a normal distri-bution. and can contribute to bulge-growth, which is associated withenhanced velocity dispersion.In addition to kinematic predictors that were utilized inprevious work, we deﬁne a new set of predictors based onthe distributions of values in the velocity and velocity dis-persion maps. These predictors include µ , V / µ , σ , µ , V / µ , σ , | µ , V | / | µ , σ | , and µ , V / µ , σ , which are the standardized mo-ments of the stellar velocity/velocity dispersion maps. Thesepredictors are similar to the formulation from Sweet et al.(2020), which calculates the moments of PDF( s ), where s isthe normalized speciﬁc angular momentum.To determine the values of these predictors, we measurethe four standardized moments of the distribution; mean ( µ ), variance (standard deviation; µ ), skewness ( µ ), and kur-tosis ( µ ). This produces eight different predictors (foureach from the velocity and velocity dispersion distributions).These quantities are different from the higher order moments h and h , which are typically measured by ppxf . We showan example of these predictors measured from a velocity dis-persion map in Figure 4.We expect to see an offset in the mean velocity ( µ , V ) fromsystemic for the merging systems and an enhanced mean ve-locity dispersion ( µ , σ ). The spread in the velocity distribu-tion ( µ , V ) and the dispersion of the velocity dispersion dis-tribution ( µ , σ ) could identify superpositions of dynamicallydistinct stellar components. This could include a secondarymerging galaxy or features like a stellar bulge.The higher order moments could be useful for identifyingsubtler features of mergers, beyond bulk shifts in µ , V , for ex-ample. The skewness of a distribution is sensitive to the tails;we take the absolute value to treat positive and negative skewidentically. A skewed velocity or velocity dispersion distri-bution ( | µ , V | and | µ , σ | ) could have a faint secondary sourcein the ﬁeld of view, where the distribution is actually a com-bination of two galaxy rotation curves. Kurtosis measureshow peaked a distribution is relative to normal; a ﬂatter dis-tribution has a negative kurtosis and a more peaked distribu-tion has a higher peak. A smoothly rotating velocity disper-sion ﬁeld has a normally shaped distribution whereas a dis-turbed ﬁeld may have a negative (ﬂatter) kurtosis ( µ , σ ). Onthe other hand, post-coalescence mergers with recent bulgegrowth could have a positive kurtosis in the velocity disper-sion distribution.To summarize, we extract the following kinematic predic-tors: A , A , ∆ PA, v asym , σ asym , resid, λ R e , (cid:15) , ∆ x V , ∆ x σ , µ , V , µ , σ , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ . We then use thetechniques described in the following sections (§3.1, §3.2,and §3.3) to select the most informative of these predictors.We ultimately use the following predictors in the LDA clas-2siﬁcation: A , ∆ PA, resid, λ R e , µ , V , µ , σ , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ . RESULTSAfter creating mock MaNGA datacubes from the ﬁve sim-ulations of merging galaxies (and matched isolated galaxies),we extract the kinematic predictors introduced in §2.3. Wethen prepare the input data, select the predictors that are mostinformative, and create and assess the classiﬁcation itself.In §3.1, we describe the LDA technique. We then providean overview of our process for preparing the data and exam-ining it in the context of the assumptions made by the LDA in§3.2. Prior to running the LDA classiﬁcation, we perform aninitial term selection using a random forest regressor, whichwe describe in §3.3. We present the classiﬁcation results in§3.4 and measure performance statistics in §3.5. We presentthe LDA observability time in §3.6. Then, in §3.7 we ex-plore some failure modes of the classiﬁcation. Finally, weanalyze how the classiﬁcation changes with redshift and de-creasing signal-to-noise (S/N) in §3.9. More details of theclassiﬁcation are discussed in the appendices, where we ana-lyze possible biases of the classiﬁcation in §E.3.1.

Linear Discriminant Analysis

The classiﬁcation in this work relies upon an LDA tech-nique that separates nonmerging galaxies from merginggalaxies based upon a combination of the input predictors(for a review of LDA, see James et al. 2013). This approachwas ﬁrst presented in N19 for imaging predictors; here, weuse this approach for kinematic predictors.LDA is one of many statistical learning tools that per-form classiﬁcation tasks. Using pre-deﬁned features (pre-dictors) as inputs, LDA solves for the hyperplane in multi-dimensional predictor space that maximizes the separationbetween different classes of objects (i.e., mergers and non-mergers). The solution is a linear combination of the inputpredictors; the classiﬁcation is therefore relatively easy to in-terpret because its complexity is low. Recent work has em-ployed other techniques to identify merging galaxies, such asrandom forest regressors (e.g., Snyder et al. 2019; Gouldinget al. 2018) and convolution neural networks (CNNs; e.g.,Bottrell et al. 2019). These techniques have various advan-tages and disadvantages based upon the dataset at hand andthe goals of the work. Since we aim to optimize the inter-pretability of the method, we select LDA over an approachlike a CNN. CNNs might increase the number of correct clas-siﬁcations, but they achieve this using complex non-linearfeatures, which are not easily interpreted.In this work, we have made several important changes tothe technique from N19. We ﬁrst recap the relevant detailsfrom the LDA in N19, and then we discuss the changes. Rel-evant details of the LDA technique from N19 include: • All predictors are linearly standardized prior to theLDA technique, meaning that predictors with large nu-merical values (such as A ) do not have an outsizedeffect on the analysis.• We utilize priors on the relative fraction of mergingand nonmerging galaxies in nature versus in the sim-ulations. This accounts for the fact that we havemore merging galaxy snapshots (relative to nonmerg-ing snapshots) for each simulation. We use the samepriors from N19; f merg = . f merg = . σ errors, where σ is the stan-dard deviation on the F1 statistic measured from each k − fold cross-validation set).• We use k -fold cross-validation to obtain 1 σ errors onthe predictor coefﬁcients. At each step of the forwardstepwise selection process, we divide the sample into k subsets. We then train the LDA on the ﬁrst k − k timesfor all combinations of subsamples and the variation inpredictor coefﬁcient values from the cross-validationsubsamples is the 1 σ error.For complete details, including the full mathematical for-mulation for LDA, see N19.We make several changes to the technique motivated bythe additional challenges of the kinematic data:• Due to the number of predictor terms in this work, in-stead of including all of the kinematic predictors in theﬁnal classiﬁcation, we ﬁrst utilize the RFR techniqueas a selection technique to eliminate predictors that areuninformative from the analysis (see §3.3).• We adjust the model optimization statistic. In N19, weminimized the number of misclassiﬁcations in order to3both select the predictors and determine their coefﬁ-cients. Here, we utilize the F1 statistic deﬁned in §3.5instead; it does a better job of balancing the numberof false negatives and false positives in each classiﬁca-tion.• We also adjust the k − fold cross-validation, Instead ofusing k =

10, we ﬁnd that a value of k = Data Preparation and LDA Assumptions

Prior to term selection and classiﬁcation, we examine thedistributions of predictor values. We screen for outliers andexamine the data in the context of the assumptions made bythe LDA. The goal is to gain an understanding of the proper-ties of the data prior to classiﬁcation.First, we remove outliers by transforming the distributionof each predictor into log space. We deﬁne data points thatfall more than 5 σ above or below the mean of the distributionfor each predictor as outliers. The combination of the logtransformation and 5 σ cutoff allow us to identify outliers thatare caused by errors in the creation of the mock maps andnot simply related to very disturbed kinematics. There are ∼ Random Forest Regressor Term Selection

In N19, we used a forward stepwise selection techniquewithin the LDA to select informative predictors. Here, moti-vated by both an increase in the number of initial terms anda decrease in the predictive power of these terms, we modifythe term selection procedure. We introduce a random forestregressor (RFR) into the methodology to select a subset ofpredictors for each simulation, which will then be presentedto the LDA classiﬁer.An RFR (Ho 1995) is an ensemble learning technique thataggregates the result of many individual decision trees run inparallel. We speciﬁcally utilize the scikit-learn imple-mentation of RFR (Pedregosa et al. 2011). In an RFR, thenumber of features that can be used to split at each node ofthe decision tree is limited to a percentage of the total numberof features, ensuring that the ensemble model does not relytoo heavily on any one feature. This means that the RFR isable to combine all potentially predictive variables in a fairway. It is also able to incorporate non-linear features to cap-ture some higher-order interaction terms. In practice, we ﬁndthat the RFR is an efﬁcient method to initially identify theuseful features in the dataset from the extensive list of kine-matic predictors. In order to select the informative terms from the RFR, weinclude an additional predictor. This predictor is assigneda random number for each galaxy snapshot and thereforeshows no signiﬁcant difference between the nonmerging andmerging galaxies. We use this technique to eliminate all ofthe terms that have a feature importance less than the randomterm for all simulations. In this step we eliminate the v asym , σ asym , A , ∆ x V , and ∆ x σ predictors. Then, for each individualsimulation, we additionally eliminate predictors that have animportance less than the random value prior to initiating the This is partially due to a dearth of historically utilized kinematic pre-dictors, so we initially introduce many more terms to determine which areinformative. We do not use it as the main classiﬁcation technique because the featurescan be highly nonlinear and more opaque to interpretation. Additionally, thistechnique is designed to directly complement the LDA technique in N19 forcomparison’s sake.

Classiﬁcation Results

After using the RFR term selection to narrow the numberof kinematic terms down to 11 ( ∆ PA, resid, λ R e , A , µ , V , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ ), we run the LDAclassiﬁcation for each simulation individually. We also com-bine the three major mergers into a combined major mergerclassiﬁcation and the two minor mergers into a combined mi-nor merger classiﬁcation. We run the LDA with interactionterms; the result is a linear combination of selected predic-tors and coefﬁcients which is unique for each simulation. Wepresent the term coefﬁcients and standard errors for the fourmost important terms and the intercept term in Table 3. Fi-nally, we brieﬂy discuss the main results of the LDA classi-ﬁcation for each simulation, which we will examine in moredetail in §4.We ﬁrst introduce the mechanics of the classiﬁcation. LD1,which is the ﬁrst linear discriminant axis, is formed from thelinear combination of coefﬁcients multiplied by the standard-ized predictors and an intercept term:LD1 = C ∗ X + B where C is the matrix of coefﬁcients, X is the standardizedvalues of the selected predictors, and B is the intercept term.LD1 is the hyperplane that best separates the populationsof merging and nonmerging galaxies for each simulation. Weuse the result from the major merger classiﬁcation as an illus-trative example of how to interpret the LDA results. The LD1for the major merger combined run (truncated after seventerms) is:LD1 all major = − . λ R e + . | µ , σ | + . µ , σ ∗ λ R e − . µ , σ ∗ | µ , σ | − . µ , σ ∗ resid + . µ , V ∗ λ R e + . µ , σ ∗ µ , V + ... − . p merg value of 0.5; all galaxies with an LD1 value greaterthan zero would be classiﬁed as merging using a threshold value of 0.5 . We ﬁnd that the classiﬁcation is better able toseparate the merging and nonmerging classes for the majormerger simulations and this is reﬂected in Figure 5.There are important nuances to the interpretation of the se-lected predictors and their coefﬁcients in Equation (8) be-cause the interaction terms complicate the analysis. For in-stance, in Equation (8), the ﬁrst selected predictor is λ R e ,which has a negative coefﬁcient. Ignoring the rest of theequation, this means that if the λ R e value is large, then theprobability that a given galaxy is merging will decrease.However, there are other λ R e terms in the equation that arecoupled with other predictors in interaction terms. Thismeans that tweaking the value of λ R e will not linearly changethe value of LD1. While the interaction terms complicate theanalysis, they are an integral part of the classiﬁcation. Manyof the most important terms for LD1 in Table 3 are interactionterms, and including them signiﬁcantly improves the perfor-mance of the LDA. As we discuss in more detail in §4.4,these terms are able to capture the non-monotonic movementof mergers through predictor parameter space.While it may be difﬁcult to untangle many of the contribut-ing terms, we can use Table 2 to determine which predictorsare most prevalent and therefore informative for each simula-tion. For instance, the µ , σ and µ , σ predictors are selected aseither primary or interaction terms for all simulations. Theyare therefore universally useful kinematic predictors (for afull discussion of why these terms are important see §4.2.2).The selected predictors from the q0.333_fg0.1 simulation aresimilar to the q0.333_fg0.3 simulation. These two simula-tions are matched for mass ratio but not for gas fraction.However, the difference between a gas fraction of 0.1 and0.3 is insigniﬁcant so we hesitate to make any conclusionsabout the impact of gas fraction on the stellar kinematics.On the other hand, the minor merger simulations differ fromthe major merger simulations in the selected predictors. Weﬁnd that λ R e and | µ , σ | are important for the major mergerswhile the minor mergers rely more on some of the higherorder terms like µ , V and µ , σ . We explore the implicationsof these ﬁndings for the physical nature of the kinematics ofmergers in the discussion (§4).3.5. Performance Statistics and Hyperparameter Tuning

Here, we deﬁne and measure the accuracy, precision, re-call, and F1 statistic of the simulations (for a review, seeFawcett 2006). We present these results using a confusionmatrix for the major and minor combined simulations in Fig-ure 6, which shows the relative fraction of known mergersand nonmergers in the cross-validation samples that are clas-siﬁed by the LDA as merging and nonmerging. These quanti- This decision boundary can be moved either before the creation of theLDA or after to be more or less tolerant of false negatives and false positives. Table 3.

The ﬁnal LD1 predictor coefﬁcients ( ˆ (cid:126) w ) with 1 σ conﬁdence intervals after term selection for the ﬁrst four most important terms andthe intersect ( ˆ w ) for all simulations. ˆ (cid:126) w ˆ w Simulation 1 2 3 4All Major -6.76 ± λ R e ± | µ , σ | ± µ , σ * λ R e -4.44 ± µ , σ * | µ , σ | -1.21 ± ± µ , σ -4.97 ± µ , σ * µ , V ± µ , V * µ , σ ± µ , V -0.76 ± ± µ , σ * | µ , σ | -6.7 ± µ , σ * µ , σ ± µ , σ ± µ , σ -2.57 ± ± µ , σ -7.84 ± µ , σ * µ , σ ± µ , σ ± | µ , σ | -0.77 ± ± µ , σ * µ , σ ± µ , σ ± µ , σ – -0.26 ± ± µ , σ -6.2 ± µ , σ * λ R e -5.75 ± A ± µ , σ * λ R e -0.79 ± ± µ , σ * µ , V ± µ , σ -12.88 ± µ , V ± µ , σ -1.06 ± R e l a t i v e C oun t Major Combined

NonmergerMerger10.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5LD10.000.25 R e l a t i v e C oun t Minor Combined

NonmergerMerger

Figure 5.

Histograms of LD1 for the populations of merging and nonmerging galaxies for the combined major merger (top) simulation andthe combined minor merger (bottom) simulation. The blue nonmerging samples include both the stand-alone isolated galaxies and the pre- andpost-merger isolated galaxies. The nonmerging galaxies in the top and bottom plots span different ranges in LD1 because they are composed ofdifferent samples of nonmerging galaxies and because the selected linear combination of predictors is different for the major and minor mergercombined simulations. The vertical black line is the decision boundary; it is the midway point between the mean of the nonmerger and mergerpopulations. If the LD1 value of a galaxy falls above this line, the galaxy is more likely to be a merger. ties are derived by taking the mean of the performance statis-tics measured on each of the cross-validation samples. Wequantify the accuracy, precision, recall, and F1 score for allsimulations in Table 4.The accuracy for a given simulation is deﬁned as the num-ber of correct classiﬁcations of mergers as mergers (true pos-itives) and the number of correct classiﬁcations of nonmerg-ers as nonmergers (true negatives) divided by the number oftotal classiﬁcations: A = T P + T NT P + T N + FP + FN (9)where FP is the number of false positives, or nonmerginggalaxies that are classiﬁed as mergers, and FN is the num-ber of false negatives, or mergers that are classiﬁed as non-merging. A classiﬁer has a higher accuracy when it is able Simulation Accuracy Precision Recall F1All Major 0.81 0.95 0.76 0.84All Minor 0.69 0.87 0.51 0.64q0.5_fg0.3 0.81 0.92 0.60 0.73q0.333_fg0.3 0.80 0.90 0.79 0.84q0.333_fg0.1 0.80 0.93 0.80 0.86q0.2_fg0.3_BT0.2 0.73 0.83 0.62 0.71q0.1_fg0.3_BT0.2 0.81 0.83 0.69 0.75

Table 4.

Accuracy, precision, recall, and F1 score for all LDA runs.We deﬁne these statistics in Equations 9, 10, 11, and 12, respec-tively. The recall value is much lower than precision in all cases be-cause there is a much higher fraction of false negatives, or mergersthat are missed by the method, yet there is a low value of contam-inants, or false positives. The performance statistics of the majormerger classiﬁcations are ∼

10% higher than the minor merger clas-siﬁcations. Figure 6.

Confusion matrices with the number of true negatives(upper left quadrant), false positives (upper right), false negatives(lower left), and true positives (lower right) for the major merger(top) and minor merger (bottom) combined simulations. These ma-trices show the mean number of galaxy snapshots in each categoryfrom the ﬁve ( k =

5) different CV samples. to increase the number of true classiﬁcations relative to falseclassiﬁcations.Precision is deﬁned as the number of true positive classiﬁ-cations over the total number of positive classiﬁcations: P = T PT P + FP (10)A precise classiﬁer maximizes the fraction of true positiveclassiﬁcations relative to false positives. Precision is alsoknown as the ‘positive predictive value’. In this work, weseek to eliminate false positives from the sample, or non-merging galaxies that are incorrectly classiﬁed as mergers.Recall is deﬁned as the number of true positive classiﬁca-tions over the total number of known mergers: R = T PT P + FN (11)A classiﬁer with high recall is also known as ‘complete’ be-cause it correctly identiﬁes the majority of mergers as such. Finally, we measure the F1 score or the F1 statistic, whichis the harmonic mean of the recall and precision: F = P ∗ RP + R (12)F1 ranges in value from 0 to 1 and is strongly penalized ifeither precision ( P ) or recall ( R ) is small. We maximize theF1 statistic within the LDA during cross-validation in orderto select the predictor terms that we use in the classiﬁcation.Figure 6 presents the number of true negatives, false pos-itives, false negatives, and true positives (left to right, top tobottom) for the combined major and minor merger simula-tions. It also quantiﬁes the accuracy, precision, recall, and F1score. The major merger classiﬁcation performs better, withan accuracy/precision/recall values of 0.81/0.95/0.76 whilethe minor mergers have values 0.69/0.87/0.51. The imbal-ance between precision and recall is due to the priors uti-lized in the classiﬁcation (we use the priors from N19 where f merg = . p merg ) or by changing the priors. This could be a directionto pursue in future work if we ﬁnd that we are no longertolerant of false negatives or if we wish to adjust the priorson f merg . As a test, we adjust the priors so that f merg = . f merg = . , . ∼

10% lower on all7statistics than the major mergers. We discuss the implicationsof the performance of the kinematic predictors in comparisonto the imaging predictors and the performance of the majorversus minor merger classiﬁcations and in §4.8.3.6.

Observability Timescale

The LDA observability timescale is deﬁned as the sum ofall consecutive snapshots where the viewpoint-averaged LD1value for a given snapshot is greater than zero. We presentthe observability timescales for all of the simulations in Ta-ble 5 along with the total merger duration for each simulationand the fraction observability, or fraction of the duration ofthe merger that it is observable by the LDA technique. Weexclude the combined major and minor mergers from this ta-ble since they are built from mergers that progress at differentrates.All of the simulations have a relatively long timescale ofobservability (2 − σ conﬁdence interval on this value andthe total range. We also plot the decision boundary for eachsimulation, which falls at an LD1 value of zero (horizontalline). The minor mergers do not fall signiﬁcantly above thisline; even though the viewpoint-averaged LD1 values for theq0.2_fg0.3_BT0.2 simulation fall above the decision bound-ary, they overlap the decision boundary to 1 σ conﬁdence atalmost all points in time. This means that not all viewpointsare signiﬁcantly above this boundary. On the other hand, themajor merger simulations are signiﬁcantly above this bound-ary for the majority of their duration. While this is not seenin Figure 7, the q0.5_fg0.3 simulation is an outlier when itcomes to the LDA observability time.Figure 7 also demonstrates how sensitive the LDA clas-siﬁcation is to the merger stage. For instance, there are anumber of false negatives from the early stage of the mergerswhen the galaxies are more disk-like. Another key ﬁndingis that the sensitivity of the technique decays slowly duringthe post-coalescence and post-merger stages. We discuss theimplications of the different observability timescales and thevariations with time in more depth in §4.5, 4.6, and 4.7.3.7. Where and why does the LDA fail?

Here we summarize the factors that are most likely to leadto false classiﬁcations (false positives and false negatives) forthe different simulations. Our goal is to identify the primary failure modes of the classiﬁcation and assess if it is makingreasonable choices. In other words, we should be concernedif our by-eye classiﬁcation disagrees with the majority of thefalse classiﬁcations.We present a visual version of a confusion matrix for theq0.5_fg0.3 and q0.2_fg0.3_BT0.2 classiﬁcations in Figures8 and 9, respectively. These simulations are representative ofthe results from the major and minor mergers, respectively.We generate example velocity and velocity dispersion mapsfor each classiﬁcation category (in rows, top to bottom: TP,TN, FN, FP) by combining the results of each iteration of the k − fold cross-validation and then randomly selecting exam-ple snapshots from each category.In Figure 8, after a by-eye examination, it makes sense thatmany of the false negatives and false positives are incorrectlyclassiﬁed in the q0.5_fg0.3 simulation. The false negatives(third row) are orderly rotating with relatively low velocitydispersions. These look like the pre-merger isolated popu-lation shown in the true negatives row. The majority of thefalse positives (fourth row) are post-merger snapshots thathave kinematic disturbances.The incorrect classiﬁcations for the major mergers canmostly be attributed to two factors. First, the false nega-tives are due to a lack of disturbed features, meaning thatit is difﬁcult to correctly classify many of these snapshotsas mergers. Despite these limitations, the classiﬁcation doescorrectly identify the majority of early stage snapshots asmergers, meaning that it is out-performing the by-eye assess-ment in many cases. Second, kinematic disturbances that areinduced by the merger persist into the post-merger stages,producing a number of false positives. These kinematic fea-tures are very similar to the features in the post-coalescencestages so it makes sense that these are commonly classiﬁedas false positives. As we discuss in more detail in §4.6, ourdeﬁnition of the ‘end’ of the merger (the dividing line be-tween post-coalescence and post-merger stages) is somewhatarbitrarily deﬁned, and results in a number of classiﬁcationsfrom these two stages.In Figure 9 we ﬁnd that it is more challenging to cor-rectly classify the nonmerging and merging galaxies in theq0.2_fg0.3_BT0.2 simulation using a by-eye assessment. Forinstance, the false negatives are very visually similar to thetrue negatives in their kinematic features. The same is truefor the false positives, which are similar to the example truepositives. The exception is a number of obvious merger snap-shots (we show an example of one such snapshot in the uppermiddle pair of panels) where the kinematics are dramaticallyaffected. However, these disturbances are short-lived, so themajority of the merging snapshots appear like the examplein the upper right corner of the diagram. This diagram illus-trates the crux of the problem for the minor mergers. Whilethe LDA is able to pick up on a number of subtle features8 Simulation LDA Observability Time [Gyr] Total Merger Time [Gyr] Observability Fractionq0.5_fg0.3 0.9 2.2 0.4q0.333_fg0.3 2.2 2.6 0.8q0.333_fg0.1 2.4 2.8 0.9q0.2_fg0.3_BT0.2 3.0 3.5 0.9q0.1_fg0.3_BT0.2 6.6 9.2 0.7

Table 5.

The duration of the merger, LDA observability time, total merger time, and observability fraction (LDA observability time/total mergertime) for each simulation.

Merger Timeline [Gyr]

LD1 q = 0.5, f g = 0.3 Early Late Post CoalescenceFirst pericentricpassage CoalescenceStand-aloneisolated galaxiesPre-mergerisolatedgalaxies Post-mergerisolated galaxies Merger Timeline [Gyr]

LD1 q = 0.2, f g = 0.3, B/T = 0.2 Early Late Post CoalescenceFirst pericentricpassage CoalescenceStand-aloneisolated galaxiesPre-mergerisolated galaxies Post-mergerisolated galaxies Figure 7.

LD1 sensitivity with time for the q0.5_fg0.3 (top) and q0.2_fg0.3_BT0.2 simulations (bottom). These two simulations are chosenbecause they are representative of the major and minor merger simulations, respectively. The points are the viewpoint-averaged value of LD1for each snapshot in time along with the shaded 1 σ conﬁdence intervals (darker shade) based on the scatter of the LD1 values for each snapshot.We also include the full range of values for each snapshot (lighter shade). We divide each plot into the early, late and post-coalescence stagesof the merger. The blue lines and shaded 1 σ conﬁdence intervals are associated with the isolated galaxies for each simulation. This includesthe pre- and post-merger isolated galaxies (circles) and the stand-alone isolated galaxies (squares). The horizontal black line is the decisionboundary, which marks the divide between the merging and nonmerging galaxies, or p merg = .

5. This ﬁgure demonstrates that the majormergers have little to no overlap with the isolated galaxies, which produces a more accurate and complete classiﬁcation (see §3.5). The LD1sensitivity plots for all of the simulations will be available in an interactive ﬁgure. (such as stellar bulge enhancements), it ultimately struggleswith a number of limitations related to the lack of identiﬁabil-ity of all of the stages of the merger as such. These challengescontribute to a minor merger classiﬁcation that has lower per-formance statistics than the major merger classiﬁcation.Overall, the LDA is not misclassifying obvious (by-eye)mergers or nonmergers. The lack of identiﬁability of merg-ers/nonmergers given their kinematic maps is therefore thelargest challenge for this technique. Other work highlights this same challenge with kinematic predictors; Hung et al.(2016) ﬁnd that a signiﬁcant fraction of merging galaxy kine-matics remain indistinguishable by-eye relative to the non-merging kinematics. This indicates something fundamentalabout galaxy kinematics: namely, that we are not missingobvious features and instead that merging galaxies are oftenindistinguishable from nonmergers.3.8.

The role of viewing angle in the classiﬁcation Figure 8.

Correct and incorrect classiﬁcations from the cross-validation sets for the q0.5_fg0.3 simulation, which is representative of the majormerger simulations. The correct classiﬁcations include true positives (ﬁrst row) and true negatives (second row) and the incorrect classiﬁcationsinclude false negatives (third row) and false positives (fourth row). We include the number of (non-repeated) galaxies in each category andthree examples per row of galaxies from the cross-validation sample. The velocity and velocity dispersion maps for each example galaxy covertwo consecutive panels, which is shown with alternating white and grey backgrounds.

It is well known that many kinematic predictors, such as λ R e , are correlated with galaxy inclination (e.g., Cappellariet al. 2007; Emsellem et al. 2011; Harborne et al. 2019). Inthis section we examine how the viewing angle, which is aproxy for inclination, affects the kinematic predictors and ul-timately, the LDA.As described in §2.1, there are seven isotropically dis-tributed viewpoints (0-6) at each snapshot. Critically, the in-clinations are not an exact match between the stand-aloneisolated galaxies and the merging galaxies. For instance,viewpoints 0 and 4 are the most face-on viewpoints for themerging galaxies, but viewpoints 4, 5, and 6 are the mostface-on for the stand-alone isolated galaxies.We ﬁrst explore how inclination affects the λ R e predictorin Figure 10, where λ R e increases as the galaxy inclinationincreases. For instance, viewing angles 0 and 4 are the mostface-on and they also have the lowest values of λ R e . Whenthe 1 σ errorbars are taken into consideration, the differencein λ R e values is marginally signiﬁcant. This is fully con-sistent with the results from Emsellem et al. (2011), whopredict that the measured λ R e and (cid:15) values of an axisym-metric rotating oblate spheroid vary with viewing angle(see Figure 3 of Emsellem et al. 2011). These errorbars are the standard deviation in λ R e values from all of the differentmoments in time of the merger. We observe a larger differ-ence in the λ R e values as a function of time (§4.2.1) than as afunction of viewpoint.We next investigate how the LDA classiﬁcation changesas a function of viewpoint. To visualize this, we plot thedistribution of LD1 values in Figure 11. We include thehistograms of the LD1 value for the nonmerging (blue) andmerging (red and orange) galaxies from both the q0.5_fg0.3simulation (left panel) and the q0.2_fg0.3_BT0.2 simulation(right panel). We then overplot the mean and standard de-viation of the LD1 values for all snapshots of each speciﬁcviewpoint. Focusing on the mean LD1 values for the mergingsample, we can determine if the LDA is varying as a functionof viewpoint.Focusing ﬁrst on the left panel of this ﬁgure, which is forthe q0.5_fg0.3 simulation, the means for both the nonmerg-ing and merging galaxies are not signiﬁcantly different asa function of viewing angle. In fact, the mean LD1 valuesare more similar than the variation we observe in LD1 as afunction of time in Figure 7. The implication is that the ma-jor merger LDAs are fairly robust to viewing angle. For theq0.2_fg0.3_BT0.2 simulation, we observe slightly more vari-0 Figure 9.

Same as Figure 8 but for the q0.2_fg0.3_BT0.2 simulation, which is representative of the minor merger simulations. ation in the LD1 distribution as a function of viewpoint, andthe most face-on viewpoints (0 and 4) have lower LD1 val-ues, which result in more false negative detections at theseviewpoints.To further quantify if the LDA is changing as a functionof viewpoint, we iteratively drop the merging galaxies ateach viewpoint from the analysis and rerun the classiﬁca-tion for the q0.5_fg0.3 and q0.2_fg0.3_BT0.2 simulations.If the classiﬁcation changes, this could indicate that a givenviewing angle and/or inclination is signiﬁcantly more or lessaccurate than the other viewpoints, which would point to in-clination itself being the primary driver of this difference.From here on, we determine that the LDA is ‘signiﬁcantlydifferent’ from the ﬁducial run if either of the following cri-teria are met: ﬁrst, the majority of the selected predictors inthe top four selected terms must change or second, the perfor-mance statistics in Table 4 must change by more than 10% onaverage. This quantiﬁcation of a signiﬁcantly different clas-siﬁcation applies to this section, where we explore the roleof different viewing angles, and also to §3.9 where we exper-iment with changes in the data reduction (i.e., changing theS/N or redshift of the simulated galaxies).When we rerun the LDA classiﬁcation for the q0.5_fg0.3simulation iteratively without each viewpoint, the LDAs arenot signiﬁcantly different than the ﬁducial run. This conﬁrmsour ﬁndings in Figure 11. For the q0.2_fg0.3_BT0.2 classi- ﬁcation, we ﬁnd that when viewpoints 2, 5, and 6 are absent,the classiﬁcation is signiﬁcantly different with lower perfor-mance statistics and different selected predictors. Our inter-pretation is that the minor mergers are best identiﬁed whenthe secondary nuclei is within the ﬁeld of view, which hap-pens most often in viewpoints 2, 5, and 6. Therefore, thesigniﬁcant changes to the classiﬁcation as a function of view-point both in Figure 11 and in the rerun of the LDA withoutthese viewpoints can be attributed to the chance positioningof the secondary galaxy as a function of viewpoint. Thismeans that inclination-related effects on the intrinsic kine-matic properties of the primary galaxy are not primarily re-sponsible for the differences in the q0.2_fg0.3_BT0.2 LDAas a function of viewpoint. As a ﬁnal note, in Appendix Ein our discussion of between-class biases, we introduce theinclination itself as a predictor in the LDA. We ultimately de-termine that (cid:15) , which we use as a proxy for inclination, is notan important predictor. This further supports the ﬁnding thatchanges in the kinematic predictors purely due to inclinationeffects are not biasing the LDA classiﬁcation itself.To conclude, we have determined that while the kinematicpredictors themselves can vary as a function of viewing an-gle and/or galaxy inclination, the LDA classiﬁcation is onlysensitive to viewpoint in the sense of the relative positioningof the secondary nucleus relative to the line of sight vector.1

Figure 10.

Distribution of the mean values of λ R e and (cid:15) as a function of viewpoint (top) and the full resolution r − band images (middle) andstellar velocity maps (bottom) for all of the different viewpoints from a snapshot in time for the q0.5_fg0.3 simulation. The more face-onviewpoints (i.e., 4), tend towards lower values of λ R e and (cid:15) , while the more edge-on viewpoints (i.e., 5) tend to have a larger λ R e value. Wealso include error bars to demonstrate the standard deviation of the spread at each viewpoint from all of the different moments in time of thissimulation. While there is a relationship between inclination and λ R e , the trend is borderline signiﬁcant. This is consistent with the trend ofvarying λ R e and (cid:15) values with viewing angle from Emsellem et al. (2011) . Figure 11.

As in Figure 5, histograms of the LD1 value for the q0.5_fg0.3 simulation (left) and the q0.2_fg0.3_BT0.2 simulation (right).We also overplot the time-averaged LD1 values for each viewpoint and errorbars to demonstrate the 1 σ variation amongst these values for allmoments in time. There is less variation as a function of viewpoint than as a function of time (shown in Figure 7). Figure 12.

The g − band S/N (left), stellar velocity (middle), andvelocity dispersion (right) maps from a snapshot of the q0.5_fg0.3simulation. We have decreased the S/N by a factor of two (secondrow) and redshifted the galaxy to z = . z = .

1, the classiﬁcation begins to change, as this is the point atwhich the larger-scale kinematic features are distorted by the largespaxel size.

Limitations of the technique in z and S/N

As with the imaging identiﬁcation technique in N19, thekinematic technique is sensitive to both S/N and resolution,meaning that as the S/N decreases in the spectra and/or as theredshift of the galaxy increases, the technique will undergosigniﬁcant changes. We test the sensitivity by decreasing theS/N and by moving the mock galaxies to higher redshift.To test how sensitive the classiﬁcation is to decreased S/N,we decrease the average S/N of q0.5_fg0.3 simulation by fac-tors of 1.5 and 2. In Figure 12 we compare a snapshot withS/N that has been decreased by a factor of two to the samesnapshot from the ﬁducial run. When the S/N is decreasedby a factor of 2, the classiﬁcation is signiﬁcantly different.While many of the predictors stay the same, the performancestatistics decrease overall and there is an increase in the num-ber of false negatives during the early stages of the merger.When the S/N is decreased, the Voronoi bins increase in sizein the exterior regions of the galaxy. This obscures the large-scale kinematic features, which lowers the performance ofthe classiﬁcation. We predict that MaNGA galaxies with lowS/N may therefore be more likely to be misclassiﬁed.There are a couple of approaches we plan to explore whenclassifying MaNGA galaxies with different S/N ratios. Oneoption is to implement a S/N cut when we apply the ﬁducial

Figure 13.

Same as Figure 12 but for a sample of MaNGA galaxiesthat span a range in surface brightness, redshift, and stellar mass.At low S/N (i.e., ﬁrst and third rows) the velocity maps have largeVoronoi bins and some kinematic predictors will be difﬁcult to mea-sure. As discussed in the text, the higher redshift galaxies (ﬁrst andfourth row) in MaNGA also tend to be larger and more massive. classiﬁcation to the MaNGA galaxies. However, MaNGAgalaxies with lower S/N need not be excluded from classi-ﬁcation. Instead, another option is to use the classiﬁcationand completeness correction from the lower S/N ratio LDArun to classify these galaxies separately. We predict that weshould be able to classify the majority of the MaNGA samplesince the ﬁducial (non-decreased) S/N of the simulation suiteis representative of the MaNGA sample.For comparison’s sake, in Figure 13 we present a sample ofMaNGA galxies that span a range in surface brightness, red-shift, and stellar mass. The approximate stellar mass is fromthe NSA catalog and is estimated using the kcorrect code(Blanton & Roweis 2007). The simulated galaxies, whichhave stellar masses log M ∗ ∼ z = 0.03,which is the median redshift of galaxies observed byMaNGA. In order to understand the limitations of the iden-tiﬁcation over the full range of redshifts for the MaNGAsurvey (0.01 < z < 0.15), we experiment with increasing theredshift of the mock galaxies. To do this, we increase thespaxel size from 0 . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48) . (cid:48)(cid:48)

5. This mimics the effects of movingthe simulated galaxies to a redshift of z = .

07 and z = . z = .

03. Thisis because we want to understand the effects of the apparentsize of galaxies independently of S/N effects.The classiﬁcation does not change signiﬁcantly when thegalaxies are placed at a redshift of z = .

07. At z = .

1, theclassiﬁcation is signiﬁcantly different; the number of mis-classiﬁcations increases and the selected terms are different.However, these results are based on a galaxy-galaxy mergerwhere each galaxy has a stellar mass on order 10 M (cid:12) . Whileit is a valid conclusion that the technique will struggle onan intermediate mass galaxy at z = .

1, the MaNGA sam-ple does not tend to include this type of galaxy. Instead,MaNGA is designed to maintain roughly uniform coverage inlog M ∗ and radial coverage, meaning that higher mass galax-ies ( > M (cid:12) ), which are more luminous and have largerangular sizes (i.e., the fourth row of Figure 13), are observedprimarily at higher redshift, somewhat alleviating this con-cern (Bundy et al. 2015; Wake et al. 2017).3.10. Limitations of the technique in stellar mass and B/T

The simulation suite of merging galaxies is limited in stel-lar mass and B/T ratio. The mergers can all be character-ized as intermediate-mass disk-dominated galaxies that spana range of 3 . − . × M (cid:12) in stellar mass and 0 - 0.2in initial B/T ratio. These limitations are especially impor-tant given that many of the leading kinematic predictors ( λ R e , µ , V , µ , σ , µ , V , and µ , σ ) are related to the intrinsic kine-matic properties of galaxies. For instance, µ , σ is a proxyfor stellar mass, so we are skeptical if this classiﬁcation canbe reliably applied to galaxies that differ in properties, i.e.,bulge-dominated elliptical galaxies.One possible approach to circumvent these concerns is toremove these predictors from the classiﬁcation. We rerun theLDA for all simulations without the λ R e , µ , V , µ , σ , µ , V , and µ , σ predictors and ﬁnd that the performance signiﬁcantlydecreases. Speciﬁcally, the accuracy, recall, and F1 scoreof the major mergers decrease by 20 − λ R e , µ , V , µ , σ , µ , V , µ , σ ).Interestingly, when the intrinsic kinematic predictors are ex- cluded, the performance of the minor merger simulations isnot signiﬁcantly affected. In fact, the performance of the ma-jor merger simulations is comparable or worse than that ofthe minor merger simulations. This highlights that the ma-jor mergers undergo a more dramatic global transformationduring the merging process, which is reﬂected in the intrinsickinematic properties of the remnant galaxy.Since removing these predictors signiﬁcantly decreases theperformance of the classiﬁcation, we choose to include allpredictors in this work and to attach the following caveat tothis paper: Since this analysis focuses on kinematic predic-tors that are sensitive to intrinsic galaxy properties, we ad-vise against applying this classiﬁcation to all galaxy types inMaNGA. In §4.9, we discuss possible strategies for carefullyapplying the classiﬁcation to MaNGA galaxies. DISCUSSIONIn the discussion portion of this paper, we consider the im-plications of the kinematic LDA classiﬁcations for merginggalaxies. We focus on the individual LD1 coefﬁcients in §4.1where we examine why some of the kinematic predictors thathave been useful in the past are not informative in this tech-nique. We then examine the most important kinematic pre-dictors in §4.2. In §4.3 we explore the impact of mass ratioon the stellar kinematics of mergers. We consider the phys-ical meaning of the interaction terms and their importanceto the classiﬁcation in §4.4. We examine the observabilitytimescale of the kinematic LDA technique and how the ob-servability of a merger varies with time in §4.5. We speciﬁ-cally focus on the deﬁnition of the ‘end’ of a merger in §4.6and the kinematics of the merger remnants in §4.7. In §4.8we compare the performance of the imaging classiﬁcationsto the kinematic classiﬁcations. Finally, we end with a noteon applying this technique to MaNGA IFS observations in§4.9.4.1.

Why are some traditionally utilized kinematicpredictors not useful in this classiﬁcation?

Some of the kinematic predictors that are traditionally uti-lized to identify merging galaxies are uninformative in thisanalysis. In this case, ‘uninformative’ means that the predic-tor is discarded during the RFR term selection steps or that ithas a small relative coefﬁcient value in the LDA. The unin-formative predictors are ∆ PA, v asym , σ asym , ∆ x V , and ∆ x σ .4.1.1. The misalignment between the kinematic PA and imagingPA ( ∆ PA) is most sensitive to galaxy inclination

A small fraction of the most dramatic mergers have sig-niﬁcantly disturbed stellar kinematic maps. However, thisdoes not translate to a global kinematic PA that is misalignedfrom the PA in imaging due to two factors. First, many ofthe warps seen in the stellar kinematic disks are symmetric,4 asym v a s y m Time [Gyr]01234560123456 asym v a s y m Time [Gyr]0123401234

Figure 14.

Time evolution of the merging (red and orange) andmatched nonmerging (blue) galaxies for the q0.5_fg0.3 (top, red)and q0.2_fg0.3_BT0.2 (bottom, orange) on the v asym - σ asym diagram.The blue squares indicate the matched isolated sample of galaxies,while the blue circles are the pre- and post-mergers. Here, we showthe pre-standardized predictor values. The time is zero as the simu-lations begin and progresses in Gyr. The K asym = .

15 (dotted) and0.135 (dashed) threshold lines are included from Hung et al. (2016)and Bellocchi et al. (2012), respectively, where galaxies above thediagnostic lines are classiﬁed as merging. which can produce a global kinematic PA that is not mis-aligned. Second, both the PA from the kinematics and thePA from imaging are not well determined during the mostdisturbed stages of the merger. This contributes to randomdeviations around a low ∆ PA value.4.1.2.

The asymmetry in the velocity and velocity dispersion maps( v asym and σ asym ) are only sensitive to the most disturbedtimes in the major mergers Previous work with the gas kinematics of simulated andobserved mergers ﬁnds that merging galaxies have enhancedvalues of both v asym and σ asym (e.g., Shapiro et al. 2008; Bel-locchi et al. 2012; Hung et al. 2016; Bloom et al. 2017).These studies deﬁne a threshold value in K asym to identifymerging galaxies, where K asym = (cid:113) v + σ . For in-stance, Bellocchi et al. (2012) study local luminous infraredgalaxies (LIRGs) and ﬁnd a threshold value of K asym > . K asym threshold of 0.15 for the galaxies in their work, which is calculated from the veloci-ties of gas particles from simulated SUNRISE mergers.The σ asym and v asym predictors are ultimately unimportantin our analysis. We present the viewpoint-averaged valuesof σ asym -v asym in Figure 14 for the q0.2_fg0.3_BT0.2 andthe q0.5_fg0.3 simulations. We include the K asym diagnos-tic lines used to identify mergers from Bellocchi et al. (2012)and Hung et al. (2016). The top panel of the plot demon-strates that there is minimal time evolution for the predictorvalues for the minor mergers. The predictor values for themajor mergers are only slightly enhanced, falling above thediagnostic line from Hung et al. (2016) for a few snapshotsduring the late stages of the merger. While Figure 14 presentsthe predictor values in log space, the standardized predictorvalues of σ asym and v asym used to construct the classiﬁcationalso have minimal separation between the merging and non-merging populations. The insensitivity of any of the mergersto enhancement in the values of these predictors results intheir exclusion during the RFR selection step.Ultimately, the v asym and σ asym predictors are unimportantin this work because v asym is only elevated for speciﬁc pointsin time during the late phases of merging. They are moreuseful in studies such as Bellocchi et al. (2012), which fo-cus on LIRGs, which are prototypical gas-rich major merg-ers, and Hung et al. (2016), where the simulated galaxies in-clude gas-rich major mergers. Additionally of interest, Hunget al. (2016) ﬁnd that the mergers in their sample exceed the K asym value only during the ‘strong interaction’ or late stageof merging.4.1.3. The offset between the center of the velocity and velocitydispersion maps and the imaging center ( ∆ x V and ∆ x σ )are not very sensitive to mergers We design the ∆ x V and the ∆ x σ statistics to identify galax-ies that have offsets in their kinematic centers. We ﬁnd thatthese values are most elevated during the late stages of themerger, where there are two visible nuclei. However, sincethe kinematic maps are disky throughout the merger and notdramatically disturbed at most stages, these two statistics arenot noticeably elevated for the duration of the merger and aretherefore relatively unimportant.Statistics like these have been used in the past for galaxiessuch as NGC 4473, which is a ‘double sigma (2 σ )’ galaxy,meaning that it has two peaks in its 2D stellar velocity dis-persion map that are aligned with the photometric major axisof the galaxy (Krajnovi´c et al. 2011). This type of veloc-ity dispersion map is rare in observations (e.g., Krajnovi´cet al. 2011) and is associated with a co-addition of a counter-rotating stellar disk, produced by retrograde 1:1 mass ratiomergers in simulations (e.g., Jesseit et al. 2007; Bois et al.2011). Therefore, these statistics may not be as useful foridentifying the more typical types of mergers, which are of-5ten more unequal in mass ratio and do not occur under ideal-ized conditions.4.2. What can we learn from the most important LDApredictors about the kinematics of stars in mergers?

Here we examine the most important predictors in the LDAfor all simulations and make connections between these pre-dictors and the dynamical evolution of the stars during amerger. We split the discussion by predictor and for brevitywe focus only on the leading predictors presented in Table 3.The top predictors include λ R e , µ , σ , µ , σ , | µ , σ | , and µ , V .4.2.1. The approximate spin parameter tracks a global‘slow-down’ in the velocity maps of the major mergers

The approximate spin parameter ( λ R e ) is a key predictor forall merger simulations and is especially important for the ma-jor mergers. The angular momentum of the merging galaxiesis therefore signiﬁcantly different than that of the nonmerg-ing population; this effect is more apparent for the majormergers. In this section we examine how λ R e changes withtime for the various simulations and how this compares toprevious work.We ﬁrst directly examine the pre-standardized values of λ R e and (cid:15) for the q0.2_fg0.3_BT0.2 and q0.5_fg0.3 simula-tions in Figure 15. On this diagram, we indicate the ‘slow-rotator’ territory, which is in the lower left corner of thisdiagram. The λ R e - (cid:15) diagram is often used to kinematicallydistinguish the slow-rotator population of early-type galaxiesfrom the fast-rotating population. This fast-slow rotator dis-tinction probes the evolutionary histories of galaxies throughdisk assembly (see Cappellari 2016 for a review).Much recent work has focused both on examining the ob-served populations of fast and slow-rotators and on makingpredictions for how merging galaxies move through this ter-ritory. For instance, Naab et al. (2014) utilize cosmologicalmerger tree simulations to show that major mergers signiﬁ-cantly affect the angular momentum content of a galaxy; theycan either spin up or spin down the remnant. In our case, allof the simulated galaxies begin with a λ R e value of ∼ . λ R e value dramatically decreases,to the boundary of the slow-rotator region. This conﬁrms thatmajor mergers can dramatically affect the kinematic proper-ties of the remnant, kinematically transforming the galaxyfrom one that can be described as disk-dominated to one thatis still rotating but is dispersion-dominated.4.2.2. The mean and dispersion of the velocity dispersiondistribution ( µ , σ and µ , σ ) track the growth of a stellarbulge component The overall importance of the µ , σ and µ , σ predictors re-ﬂects the fact that the velocity dispersion map is informativefor identifying mergers. We examine how these two predic-tors evolve with time during a merger in Figure 16 for the R e Time [Gyr]012345601234560.00 0.25 0.50 0.750.00.20.40.60.8 R e Time [Gyr]0123401234

Figure 15.

Same as Figure 14 but for the time evolution of themerging (red and orange) and nonmerging (blue) galaxies for theq0.5_fg0.3 (top, red) and q0.2_fg0.3_BT0.2 (bottom, orange) onthe λ R e - (cid:15) diagram. As the merger progresses (red points), the galax-ies evolve towards decreased values of λ R e , which corresponds toincreasing levels of disorder in the kinematic maps. Slow rotators,deﬁned by Cappellari (2016), fall below the dashed line on theseplots. q0.5_fg0.3 and q0.2_fg0.3_BT0.2 mergers. Here we presentthe average value for all of the viewpoints of a given snap-shot. We also include representative velocity dispersion mapsfor a handful of informative snapshots. For all merger simulations, the µ , σ value increasesthroughout a merger, tracing the assembly of a stellar bulgecomponent. This increase is more dramatic for the majormergers, which increase to a µ , σ value of ∼

200 km s − .Even for the minor mergers, the merger incites growth ofthe central velocity dispersion with time. This enhancementis still present 0.5 Gyr after coalescence, so the isolatedpost-coalescence stages are mixed with the merger snapshotsalong the µ , σ axis in Figure 16. The signatures of the bulgegrowth are therefore dynamically long-lived as opposed toimaging features that fade quickly with time following amerger (i.e., in N19 the imaging predictors fade within 0.5Gyr of ﬁnal coalescence).

The µ , σ predictor serves different roles in major versusminor mergers, which is reﬂected in the different evolution ofthe µ , σ values with time. An increase of µ , σ with time for6 Figure 16.

Time evolution of the mean values of µ , σ and µ , σ for the q0.5_fg0.3 (left plot) and q0.2_fg0.3_BT0.2 simulations (right plot).We show the time-evolution of the merging galaxies with the red and orange points and of the nonmerging galaxies with the blue points. Thestand-alone isolated galaxies are squares and the pre- and post-merger isolated galaxies are circles. Above each plot, we additionally includerepresentative velocity dispersion maps from key snapshots. We ﬁnd that the major mergers (left plot) tend to show a consistent evolutionin µ , σ and µ , σ with time; both values increase as the stellar bulge component is built. We include three example velocity dispersion maps(above each plot) that belong to the early (left), late (middle), and post-coalescence (right) stages of the merger. The late and post-coalescencesnapshots have elevated values of µ , σ and µ , σ ; during the late stage the area between the two nuclei has an enhanced velocity dispersion(middle) and during the post-coalescence stage (right), the center of the galaxy has a larger velocity dispersion value. On the other hand, theminor mergers (right plot) show an increase in µ , σ with time but there is not a signiﬁcant change to µ , σ with time. While both types of mergersare contributing to a stellar bulge component, the change to the dispersion maps of the major mergers is more dramatic and global. the major mergers traces the presence of two kinematic com-ponents by capturing the ‘bridge’ of higher velocity disper-sion values between two merging galaxies. This is formed bytwo overlapping counter-rotating features. Additionally, thepost-coalescence major mergers have more signiﬁcant bulgegrowth, which is reﬂected both in the enhancement in µ , σ and in an increase in µ , σ , since the entire distribution isbroadened in this process.The minor mergers show less change in the value of µ , σ with time. While µ , σ is still informative (because it in-creases during the late stages of the merger to track the bridgeof higher velocity dispersion), it does not continue to increaseinto the post-coalescence stages. This could indicate that asmaller fraction of the stars are involved in the buildup of thestellar bulge in the case of the minor mergers.4.2.3. The skewness of the velocity dispersion distribution ( | µ , σ | )identiﬁes secondary kinematic components The skewness in the velocity dispersion distribution, | µ , σ | ,is an important predictor for the major mergers; it is sensitiveto secondary kinematic components and disturbances in thevelocity dispersion maps. For instance, the stellar bulge re- gion has a higher dispersion, which manifests itself as a smallwing on the velocity dispersion distribution (the main contri-bution to this distribution is the disk rotation component). Inthis case, the skewness predictor identiﬁes similar features tothe µ , σ and µ , σ predictors.The | µ , σ | predictor is additionally important for identify-ing early-stage mergers. These snapshots tend to have lowvalues of µ , σ and µ , σ . Since they have undergone ﬁrst peri-centric passage, they have a slight enhancement in the veloc-ity dispersion map in the area of the primary galaxy that isperturbed by the merger. An example of such a snapshot withthis type of velocity dispersion enhancement is the leftmostgalaxy in the q0.5_fg0.3 panel in Figure 16. This galaxy isclassiﬁed as merging by the classiﬁcation and the most im-portant predictor leading to this decision is | µ , σ | .4.2.4. The kurtosis of the velocity distribution ( µ , V ) identiﬁes thesuperposition of two merging galaxies The kurtosis of the velocity distribution, µ , V , is importantfor both of the individual minor mergers and the combinedminor merger classiﬁcation. This predictor is sensitive to per-turbations in the velocity ﬁeld, speciﬁcally cases where there7are high velocities in the velocity distribution. When thereare extreme velocities in the velocity distribution (due to thesuperposition of two merging nuclei), the kurtosis becomesmore negative due to the ﬂattening of the distribution. The µ , V predictor is signiﬁcant because it is able to track smallerchanges in the velocity distribution, as opposed to global dis-ruptions as in the case of the major mergers. It is a signiﬁcantpredictor for the minor mergers because the velocity disper-sion distributions are not dramatically changing during a mi-nor merger. Instead, the LDA must rely upon the extremevelocities caused by the superposition of a secondary nuclei.4.3. The classiﬁcation changes with mass ratio

Past studies have investigated how the properties of sim-ulated mergers affect the kinematic predictors. Hung et al.(2016) have investigated this for a set of simulated merg-ing galaxies with mass ratios 1:1 and 1:4. They ﬁnd thatthe merger signatures in kinematics are most apparent for the1:1 major merger, where they can be visible for up to twiceas long as for the 1:4 major merger. While there are signif-icant differences between the work in this paper and Hunget al. (2016) (i.e., we perform a full RT and create mock IFSmaps while Hung et al. (2016) use the

GADGET-3 particlevelocities to create velocity maps), we also ﬁnd that the clas-siﬁcation differs signiﬁcantly with mass ratio.The major and minor merger classiﬁcations are different inseveral ways. The minor merger classiﬁcations have perfor-mance statistics that are ∼ µ , σ and µ , σ are impor-tant for all classiﬁcations, λ R e is more important for the ma-jor mergers and µ , V is more important for the minor mergers.As we discuss in §4.2, both major and minor mergers demon-strate bulge growth, which leads to an enhancement in µ , σ ,but the change is more apparent in the major mergers. Theglobal kinematic properties of the major mergers are moresigniﬁcantly impacted; this includes the λ R e predictor, whichtraces a global slow-down in the velocity maps of the majormergers. On the other hand, the minor mergers are most sen-sitive to smaller-scale changes in the kinematic maps, whichcan be traced by predictors like µ , V , which track the super-position of the secondary stellar nuclei. 4.4. The predictors evolve non-linearly with time; the LDAincorporates this behavior with interaction terms

Many of the kinematic predictors in this analysis evolvewith time throughout the merger. In most cases, this evolu-tion is non-monotonic, meaning that merging galaxies evolveback and forth in predictor space as a function of mergertime. The LDA technique accounts for this behavior usinginteraction terms.An example of an interaction term in action is the µ σ ∗ σ σ term, which has a negative coefﬁcient for several of the ma-jor merger simulations. This means that if the value of µ , σ is relatively large, then µ , σ must be relatively small for themerger probability to increase. The opposite is also true: if µ , σ is relatively small, then µ , σ must increase. To be clear,‘relatively large’ and ‘relatively small’ refer to the standard-ized values of these predictors, which are measured relativeto the distribution of values for the entire merger. So if aterm is relatively small, the predictor value will be negative.For instance, if µ , σ is large and µ , σ is small, then the termbecomes: (- coefﬁcient) * (positive) * (negative) = positivevalue = increase in LD1.Consider the µ σ − σ σ diagram for the q0.5_fg0.3 mergerpresented in the left panel of Figure 16. The µ , σ and µ , σ predictors are correlated, so they occupy the diagonal spaceof this diagram. This correlation and the interaction termfunction so that the center of the µ σ − σ σ diagram is the‘merger territory’. This picture is somewhat complicatedby the fact that there are other terms in the LDA, but if the µ σ ∗ σ σ interaction term has a large coefﬁcient, then this in-terpretation generally applies.The interaction terms account for the non-monotonic evo-lution of the predictor values with time. When we create aclassiﬁer with only the primary predictors, it is fundamen-tally different from the LDA with interaction terms. The lin-ear classiﬁer is inaccurate, classifying many of the isolatedpost-merger snapshots as mergers. When the LDA is forcedto generalize to one direction of movement across these dia-grams in the monotonic case, it loses key information.When the interaction terms are included, the LDA becomessensitive to the values of the other predictors and is there-fore able to create ‘if-then’ cases for the predictor space.For example, if the galaxy is in the later stages of merging(which are characterized by signiﬁcant bulge growth and alarge value of µ , σ ), then to avoid ambiguity with the post-merger isolated galaxies, the classiﬁcation would like to see arelatively negative standardized value of µ , σ in order to clas-sify the galaxy as merging. On the other hand, if the galaxyhas a relatively small value of µ , σ , then µ , σ should be largefor the galaxy to be classiﬁed as merging. An example ofthis type of galaxy is the early stages of the merger, wherethe bulge growth has not yet begun so µ , σ is small. How-ever, the galaxy has undergone its ﬁrst pericentric passage8and is experiencing an enhancement in the velocity disper-sion which leads to an increase in µ , σ . An example of thistype of galaxy is shown in the left panel of Figure 16.We conclude that it is critical to include the interactionterms in the LDA classiﬁcation. Not only do they improvethe performance of the technique, but they are physically mo-tivated by the non-monotonic evolution in kinematic predic-tors over the course of a merger lifetime.4.5. The observability of a merger varies with time; mergersare missed during the early stages when the kinematicsare disk-like

In §3.6, we present the observability timescales of the vari-ous simulated mergers. We conclude that the kinematic LDAtechnique lengthens the observability timescale of the sim-ulated mergers over that measured from the individual pre-dictors. Here we focus speciﬁcally on how the observabilityof a merger changes with the merger stage, and we refer thereader to Figure 7 for a useful visualization of the mean valueof LD1 with time in all of the simulated mergers.When we examine the observability of the mergers in termsof the pre-deﬁned early-, late-, and post-coalescence stages,we notice several differences with stage. First, during theearly stages of merging, some simulations show a larger stan-dard deviation in the LD1 values. For all simulations, thevalue of LD1 tends to fall below the decision boundary forthese early stages. Hung et al. (2015) ﬁnd that using kine-matic predictors to identify mergers results in a signiﬁcantfraction of false negatives from epochs where the merger isindistinguishable from a rotating disk. We also ﬁnd that asigniﬁcant fraction of false negatives occur during the earlystages where the rotation is indeed disk-like.During the late stage of the merger, the minor mergers donot show much variation from one snapshot to the next; theLD1 values are relatively ﬂat. On the other hand, the majormergers, in particular q0.5_fg0.3, show variation between thelate-stage snapshots. For q0.5_fg0.3, this is what contributesto the relatively short observability timescale. These changesin LD1 values are signiﬁcantly greater than the variance dueto different viewpoints.

The kinematic features are there-fore changing signiﬁcantly and rapidly with time for someof the major mergers during the late stage.

We also ﬁndthat most simulations have relatively high LD1 values dur-ing the late stage of the merger. The late stage of the mergeris therefore characterized by short-lived dramatic kinematicfeatures. This is consistent with Hung et al. (2015), who ﬁndthat kinematic tracers of mergers tend to be most informa-tive during the late stage of the merger, which is when theimaging predictors are also most useful.As the merger progresses into the post-coalescence andpost-merger isolated stages, we ﬁnd that the LD1 value ismore stable, which is characteristic of longer-lived kinematic features. The LD1 value does not signiﬁcantly decline dur-ing the post-merger stages. We focus on the kinematics of thepost-coalescence and post-merger stages in §4.6 and §4.7.4.6.

When does a merger end? The kinematic disturbancesdue to mergers are long-lived

In N19, we deﬁne the end of the merger as 0.5 Gyr af-ter ﬁnal coalescence. This cutoff was selected so that theimaging predictors and therefore the value of LD1 decayedsmoothly until the end of the merger. The imaging techniqueis therefore very accurate and precise during the transitionfrom a post-coalescence merger to an isolated post-mergergalaxy. In contrast, in this work the kinematic predictors andtherefore the LD1 value remain elevated during the period ofpost-merger isolated snapshots. There are visually apparentwarps in the velocity maps and the velocity dispersion mapshave elevated values of µ , σ and µ , σ for the post-merger iso-lated galaxies. Hung et al. (2016) also ﬁnd a persistence ofkinematic merger signatures up to ∼ Gyrs following coales-cence. We ﬁnd that the kinematic disturbances fade 2 − . Instead of changing the deﬁnition of the end of the merger,a more relevant task could be to deﬁne the merger morespeciﬁcally by stage.

For instance, if it is a priority to dis-tinguish post-coalescence from post-merger isolated galax-ies (post-coalescence snapshots occur immediately followingcoalescence until 0.5 Gyr after coalescence and post-mergersnapshots occur > The kinematics of the post-merger stages track thegrowth of a stellar bulge and a kinematicallydecoupled core for the 1:2 mass ratio merger

Here we focus on the kinematics of the post-coalescenceand post-merger stages. During these stages we observe thebuild-up of a central component in the stellar velocity dis-persion maps. The change is more dramatic for the majormergers and can best be explained as tracing the growth ofa stellar bulge component. Hopkins et al. (2009) investigatethe effect of mass ratio on the merger remnant and ﬁnd thatthe fraction of the primary stellar disk that is relaxed into thebulge is directly proportional to the mass ratio of the merger.This supports the hypothesis that a stellar bulge is built in the9

Figure 17.

Evolution of the long-lived kinematically decoupled fea-ture in the r − band image (left), stellar velocity map (middle), andstellar velocity dispersion map (right) of the q0.5_fg0.3 merger. Thetop panel is the last snapshot before coalescence. At 2.54 Gyr thegalaxy is in the post-coalescence stage and at 2.79 and 5.18 Gyrsthe galaxy is in the post-merger isolated stages. The central kine-matically decoupled component appears in the velocity map around2.54 Gyr, which is ∼ ∼ post-merger stages, since we would expect major mergers tocontribute a larger fraction of stars to the bulge component.The q0.5_fg0.3 merger has unique kinematic features inthe stellar velocity map during the post-merger stages, so wefocus the remainder of our discussion on this merger. Wepresent several post-coalescence and post-merger snapshotsfrom the q0.5_fg0.3 merger in Figure 17. The stellar velocitymaps are particularly intriguing because they have a distinctcentral component that appears in the post-coalescence phaseand persists into the post-merger phase. This central featurein the stellar velocity maps is spatially coincident with thebulge-like feature in the stellar velocity dispersion maps. Itis not fully counter-rotating, but is misaligned from the main stellar disk. We therefore hypothesize that we have discov-ered a decoupled kinematic component.Previous theoretical work has predicted that major merg-ers with mass ratios of 1:1 or 1:2 can produce this typeof intriguing kinematic component (e.g., Bendo & Barnes2000; Jesseit et al. 2007; Crocker et al. 2009). For instance,Bendo & Barnes (2000) and Jesseit et al. (2007) ﬁnd thatequal mass simulated mergers display a much wider rangeof kinematic features, including counter-rotating cores andglobal misalignments while the unequal mass mergers tend tohave disk-like kinematics. The merger remnants with com-plex kinematics are intriguing because these decoupled cen-tral components have been discovered in observational stud-ies as well. For example, the ATLAS survey ﬁnds thata signiﬁcant fraction of slow-rotating ETGs have decoupledkinematic components (e.g., Emsellem et al. 2011).Our ﬁnding of a decoupled kinematic core in the remnantof the 1:2 ( q = .

5) mass ratio merger supports these pastﬁndings that mergers with a large mass ratio can produce dra-matic central features in the kinematic maps. Furthermore,this result suggests that selecting galaxies with kinematicallydecoupled cores would produce a sample of post-coalescencemajor mergers with a mass ratio q (cid:38) . The kinematic LDA is not as good at identifyingmerging galaxies as the imaging technique

The kinematic LDA technique has a signiﬁcant number offalse negatives, which drives down the accuracy and recallof the technique. This is partially a result of the chosen pri-ors which skew the classiﬁcation towards minimizing falsepositives (see §3.5 for a full discussion). As we discuss in§3.7, this is also due to the lack of identiﬁability of certainsnapshots as mergers, meaning that they are indistinguishablefrom nonmerging disks and/or post-merger remnants both vi-sually and using their kinematic predictors.The imaging LDA performs better for all runs, withimprovements on all performance statistics. This meansthat fewer total galaxies will be correctly classiﬁed bythe kinematic technique due to the shortcomings listedabove. While the major merger combined run is slightlyimproved when run with imaging predictors (this improvesthe accuracy/recall/F1 score by 9%/11%/7%), the minormerger combined run experiences signiﬁcant improvement(16%/41%/25%). The kinematic minor merger combinedclassiﬁcation therefore scores lower on all performancestatistics relative to both the imaging minor merger classi-ﬁcation and the kinematic major merger classiﬁcation. Thisreﬂects the particular inability of the kinematic classiﬁcationto identify minor mergers.The imaging LDA also has longer observability times onaverage, ranging from observability times of 2 . − . . − . . − . . − . Applying the technique to MaNGA IFS in future work

In §3.10, we discuss the implication of creating a kine-matic classiﬁcation from a suite of simulations with a narrowrange in stellar mass (3 . × < M (cid:12) < . × ) andin initial B/T ratio (0 − . < M (cid:12) < )and morphology.At present, it is unclear if this will be a concern for justthe extreme cases, i.e., the most massive bulge-dominatedETGs, or if it will also cause concern in systems with amix of rotation and dispersion support in their kinematics.We have preliminarily investigated the distribution of bulge- versus disk-dominated galaxies in MaNGA; while Wanget al. (2020) ﬁnd that MaNGA galaxies are predominantlydisk-dominated, Graham et al. (2018) ﬁnd that a signiﬁcantfraction of MaNGA galaxies (across all masses) are bulge-dominated.Since the MaNGA sample includes a diversity of differentgalaxy types, we plan to tread carefully when we apply theclassiﬁcation. One option could be to select MaNGA galax-ies that have high values of λ R e to include in the classiﬁca-tion; in this way, we could exclude bulge-dominated galaxies.Another option could be to de-emphasize certain domains ofpredictor space in the classiﬁcation, or to remove the kine-matic predictors that are most sensitive to intrinsic galaxyproperties altogether. The details of this approach will be de-veloped in future work, since they are beyond the scope ofthis paper, which focuses mostly on the creation of the tech-nique. CONCLUSIONSIn this work, we build on the stand-alone imaging mergerclassiﬁer in N19 to create a parallel LDA classiﬁer that uti-lizes kinematic predictors to identify merging galaxies. Toproduce the classiﬁcation, we use

SUNRISE synthetic spec-tra from

GADGET-3 simulated merging galaxies to createmock ‘MaNGA-ized’ datacubes. We convolve and rebinthe synthetic spectra to the spatial and spectral resolution ofMaNGA, introduce noise, and implement the Voronoi bin-ning scheme used for the MaNGA datacubes. With ppxf ,we extract stellar velocity and stellar velocity dispersionmaps from each datacube.We then measure a number of kinematic predictors fromthe velocity and velocity dispersion maps. We use a randomforest regressor (RFR) followed by the LDA classiﬁer to se-lect the most informative kinematic predictors and to carryout the classiﬁcation. The selected predictors are: the differ-ence between the kinematic PA and the imaging PA ( ∆ PA),the kinemetry residuals (resid), the approximate spin pa-rameter ( λ R e ), the asymmetry in the Radon proﬁle ( A ), andthe moments of the velocity and velocity dispersion distribu-tions ( µ , V , µ , σ , µ , V , µ , σ , | µ , V | , | µ , σ | , µ , V , and µ , σ ). Wethen run the LDA as a classiﬁer for all simulations individu-ally as well as for the combined major merger simulation andthe combined minor merger simulation.We ﬁrst use the LDA classiﬁcation as an agnostic approachto determine the most useful kinematic predictors for identi-fying different types of mergers. Our main conclusions are:• Many kinematic predictors that are used in previouswork to identify mergers are not as useful in this work(i.e., the deviation of the velocity and velocity disper-sion maps from ordered rotation (v asym and σ asym , re-spectively, and ∆ PA). These predictors are sensitive tospeciﬁc stages of equal mass ratio mergers and are not1as sensitive to the full range of merger parameters andstages used in this simulation suite (§4.1).• The mean and variance of the values in the velocity dis-persion maps ( µ , σ and µ , σ , respectively) are the mostuseful predictors for identifying mergers across allsimulations, because they are sensitive to the growthof a stellar bulge component during the merger (§4.2).• The selected predictors differ as a function of mass ra-tio. The major mergers exhibit large-scale kinematicchanges (i.e., a global slow-down of the rotation), sothey rely more on predictors like λ R e . The minor merg-ers are identiﬁed using predictors like µ , V which tracethe superposition of a secondary stellar nucleus (§4.3).We also examine the performance of the LDA classiﬁcation,which is measured using the four performance statistics (ac-curacy, precision, recall, and F1 score) as well as the observ-ability timescale. Our main ﬁndings are:• The LDA performance signiﬁcantly improves whenthe interaction terms are included. These terms are ca-pable of accounting for the non-monotonic evolutionof the kinematic predictors with time (§4.4).• By combining many different kinematic predictors,we create a classiﬁcation where the observabilitytimescale is a large fraction of the overall merger time(40-90%). This corresponds to mergers that are ob-servable for 0.9-6.6 Gyr and is an improvement onthe observability timescale from any of the individualkinematic predictors (§4.5).• The sensitivity of the LDA technique varies with epochduring the mergers. We ﬁnd that there are more missedmergers (i.e., false negatives) during the early stage ofthe merger, where the stellar kinematics are disk-like.The mergers are most detectable during the late andpost-coalescence stages (§4.5).• The kinematic predictors (and the LD1 value) are long-lived and remain elevated for ∼ ∼

15% increase in accuracy, recall, and F1 score) forthe minor mergers. The kinematic LDA can be im-proved by adding imaging predictors (§4.8). • The kinematic predictors add unique informationabout merging galaxies to the toolkit; for instance, thekinematic classiﬁcation is better at identifying post-coalescence and post-merger galaxies relative to theimaging classiﬁcation (§4.8).• The kinematic classiﬁcation is created from a suite ofsimulations that are limited in their scope (i.e., the sim-ulated galaxies are disk-dominated and span a range of3 . − . × M (cid:12) in stellar mass). We conclude thatthe results may not be applied to all MaNGA galaxies(which have a range of morphologies and an approxi-mately ﬂat stellar mass distribution 10 < M (cid:12) < ).We plan to further address this concern in future work(§4.9).In Nevin et al. (2021, in prep) we will combine the kine-matic classiﬁcation with the imaging classiﬁcation presentedin N19 and apply the classiﬁer to MaNGA galaxies. Atthis point, we will release the python tools for implementingthese classiﬁcations. These tools are designed to be adapt-able to the speciﬁcations of other imaging and/or IFS sur-veys, with the goal of applying the classiﬁcation to other IFSsurveys - i.e., SAMI, CALIFA, HECTOR.In Nevin et al. (2021, in prep) we will further investi-gate whether various kinematic parameters enhance the exist-ing imaging classiﬁer and why, and revisit the hyperparam-eter tuning to determine the optimal location of the decisionboundary. We also plan to investigate the possibility of split-ting the classiﬁcation by merger stage. Our scientiﬁc goalsinclude identifying how the star formation histories, metal-licities, and AGN activity change for these different stages aswell as for different mass ratios of merging galaxies. ACKNOWLEDGEMENTSWe thank the anonymous referee for their thorough andthoughtful comments that have improved the quality and clar-ity of this paper.R. N. and J. M. C. are supported by NSF AST-1714503.L. B. acknowledges support by NSF award

Software: astropy (The Astropy Collaboration et al.2013, 2018), matplotlib (Hunter 2007), mangadap (West-fall et al. 2019; Belﬁore et al. 2019), numpy (Van Der Waltet al. 2011), openmpi (Gabriel et al. 2004), scikit-learn (Pe-dregosa et al. 2011), scipy (Virtanen et al. 2020), seaborn(Waskom et al. 2014), sdss-marvin (Cherinka et al. 2019),pandas (McKinney et al. 2010)REFERENCES

Aguado, D. S., Ahumada, R., Almeida, A., et al. 2019, ApJS, 240,23Alonso-Herrero, A., Rieke, G. H., Rieke, M. J., et al. 2006, ApJ,650, 835Barrera-Ballesteros, J. K., Heckman, T., Sánchez, S. F., et al. 2018,ApJ, 852, 74Belﬁore, F., Westfall, K. B., Schaefer, A., et al. 2019, AJ, 158, 160Bellocchi, E., Arribas, S., & Colina, L. 2012, A&A, 542, A54—. 2016, A&A, 591, A85Bellocchi, E., Arribas, S., Colina, L., & Miralles-Caballero, D.2013, AAP, 557, A59Bendo, G. J., & Barnes, J. E. 2000, MNRAS, 316, 315Blanton, M. R., & Roweis, S. 2007, AJ, 133, 734Blanton, M. R., Bershady, M. A., Abolfathi, B., et al. 2017, AJ,154, 28Blecha, L., Civano, F., Elvis, M., & Loeb, A. 2013a, MNRAS, 428,1341Blecha, L., Cox, T. J., Loeb, A., & Hernquist, L. 2011, MNRAS,412, 2154Blecha, L., Loeb, A., & Narayan, R. 2013b, MNRAS, 429, 2594Blecha, L., Snyder, G. F., Satyapal, S., & Ellison, S. L. 2018,MNRAS, 478, 3056Bloom, J. V., Fogarty, L. M. R., Croom, S. M., et al. 2017,MNRAS, 465, 123Bois, M., Emsellem, E., Bournaud, F., et al. 2011, MNRAS, 416,1654 Bottrell, C., Hani, M. H., Teimoorinia, H., et al. 2019, MNRAS,490, 5390Bryant, J. J., Bland-Hawthorn, J., Lawrence, J., et al. 2016, inSociety of Photo-Optical Instrumentation Engineers (SPIE)Conference Series, Vol. 9908, Ground-based and AirborneInstrumentation for Astronomy VI, 99081FBundy, K., Bershady, M. A., Law, D. R., et al. 2015, ApJ, 798, 7Cappellari, M. 2016, ARA&A, 54, 597—. 2017, MNRAS, 466, 798Cappellari, M., & Copin, Y. 2003, MNRAS, 342, 345Cappellari, M., & Emsellem, E. 2004, PASP, 116, 138Cappellari, M., Emsellem, E., Bacon, R., et al. 2007, MNRAS,379, 418Cappellari, M., Emsellem, E., Krajnovi´c, D., et al. 2011, MNRAS,413, 813Cherinka, B., Andrews, B. H., Sánchez-Gallego, J., et al. 2019, AJ,158, 74Cole, S., Helly, J., Frenk, C. S., & Parkinson, H. 2008, MNRAS,383, 546Conselice, C. J., Yang, C., & Bluck, A. F. L. 2009, MNRAS, 394,1956Cox, T. J., Dutta, S. N., Di Matteo, T., et al. 2006, ApJ, 650, 791Crocker, A. F., Jeong, H., Komugi, S., et al. 2009, MNRAS, 393,1255Croom, S. M., Lawrence, J. S., Bland-Hawthorn, J., et al. 2012,MNRAS, 421, 872 Croton, D. J., Springel, V., White, S. D. M., et al. 2006, MNRAS,365, 11Dasyra, K. M., Tacconi, L. J., Davies, R. I., et al. 2006, ApJ, 651,835Dekel, A., Birnboim, Y., Engel, G., et al. 2009, Nature, 457, 451Di Matteo, T., Springel, V., & Hernquist, L. 2005, Nature, 433, 604Drory, N., MacDonald, N., Bershady, M. A., et al. 2015, AJ, 149,77Duda, R. O., Hart, P. E., & Stork, D. G. 2001, Pattern Classiﬁcation(2nd Ed) (Wiley)Emsellem, E., Cappellari, M., Krajnovi´c, D., et al. 2007, MNRAS,379, 401—. 2011, MNRAS, 414, 888Fabian, A. C. 2012, ARA&A, 50, 455Fawcett, T. 2006, Pattern Recognit. Lett., 27, 861Gabriel, E., Fagg, G. E., Bosilca, G., et al. 2004, in Proceedings,11th European PVM/MPI Users’ Group Meeting, Budapest,Hungary, 97–104Giavalisco, M., Steidel, C. C., & Macchetto, F. D. 1996, ApJ, 470,189Glazebrook, K. 2013, PASA, 30, e056Glazebrook, K., Ellis, R., Santiago, B., & Grifﬁths, R. 1995,MNRAS, 275, L19Goulding, A. D., Greene, J. E., Bezanson, R., et al. 2018, PASJ, 70,S37Graham, M. T., Cappellari, M., Li, H., et al. 2018, MNRAS, 477,4711Gunn, J. E., & Gott, J. Richard, I. 1972, ApJ, 176, 1Gunn, J. E., Siegmund, W. A., Mannery, E. J., et al. 2006, AJ, 131,2332Guo, Y., Ferguson, H. C., Bell, E. F., et al. 2015, ApJ, 800, 39Haan, S., Surace, J. A., Armus, L., et al. 2011, AJ, 141, 100Harborne, K. E., Power, C., Robotham, A. S. G., Cortese, L., &Taranu, D. S. 2019, MNRAS, 483, 249Hayward, C. C., Kereš, D., Jonsson, P., et al. 2011, ApJ, 743, 159Hayward, C. C., Torrey, P., Springel, V., Hernquist, L., &Vogelsberger, M. 2014, MNRAS, 442, 1992Heckman, T. M., & Best, P. N. 2014, ARA&A, 52, 589Ho, T. K. 1995, in Proceedings of the Third InternationalConference on Document Analysis and Recognition (Volume 1)- Volume 1, ICDAR ’95 (USA: IEEE Computer Society), 278Hopkins, P. F., Cox, T. J., Hernquist, L., et al. 2013a, MNRAS,430, 1901Hopkins, P. F., Cox, T. J., Kereš, D., & Hernquist, L. 2008, ApJS,175, 390Hopkins, P. F., Cox, T. J., Younger, J. D., & Hernquist, L. 2009,ApJ, 691, 1168Hopkins, P. F., Hernquist, L., Cox, T. J., et al. 2006, ApJS, 163, 1Hopkins, P. F., Kereš, D., Murray, N., et al. 2013b, MNRAS, 433,78 Hopkins, P. F., Croton, D., Bundy, K., et al. 2010, ApJ, 724, 915Hung, C.-L., Hayward, C. C., Smith, H. A., et al. 2016, ApJ, 816,99Hung, C.-L., Sanders, D. B., Casey, C. M., et al. 2014, ApJ, 791, 63Hung, C.-L., Rich, J. A., Yuan, T., et al. 2015, ApJ, 803, 62Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90James, G., Witten, D., Hastie, T., & Tibshirani, R. 2013, AnIntroduction to Statistical Learning with Applications in R(Springer)Jesseit, R., Naab, T., Peletier, R. F., & Burkert, A. 2007, MNRAS,376, 997Jonsson, P. 2006, MNRAS, 372, 2Jonsson, P., Groves, B. A., & Cox, T. J. 2010, MNRAS, 403, 17Kaviraj, S. 2013, MNRAS, 437, 1Khim, D. J., Yi, S. K., Dubois, Y., et al. 2020, ApJ, 894, 106Krajnovic, D., Cappellari, M., Zeeuw, P. T. D., Copin, Y., & Lyon,D. 2006, MNRAS, 802, 787Krajnovi´c, D., Emsellem, E., Cappellari, M., et al. 2011, MNRAS,414, 2923Law, D. R., Steidel, C. C., Shapley, A. E., et al. 2012a, ApJ, 759,29—. 2012b, ApJ, 745, 85Law, D. R., Yan, R., Bershady, M. A., et al. 2015, AJ, 150, 19Law, D. R., Cherinka, B., Yan, R., et al. 2016, AJ, 152, 83Leitherer, C., Schaerer, D., Goldader, J. D., et al. 1999, ApJS, 123,3Li, T., Zhu, S., & Ogihara, M. 2006, Knowl. Inf. Syst., 10, 453Lilly, S. J., Tresse, L., Hammer, F., Crampton, D., & Le Fevre, O.1995, ApJ, 455, 108López-Sanjuan, C., Balcells, M., Pérez-González, P. G., et al.2009, A&A, 501, 505López-Sanjuan, C., García-Dabó, C. E., & Balcells, M. 2008,PASP, 120, 571Lotz, J. M., Jonsson, P., Cox, T. J., et al. 2011, ApJ, 742, 103Lotz, J. M., Jonsson, P., Cox, T. J., & Primack, J. R. 2010a,MNRAS, 404, 590—. 2010b, MNRAS, 404, 575Lotz, J. M., Patrik, J., Cox, T. J., et al. 2008, MNRAS, 391, 3Mantha, K. B., McIntosh, D. H., Brennan, R., et al. 2018, MNRAS,475, 1549Masters, K. L., Mosleh, M., Romer, A. K., et al. 2010, MNRAS,405, 783McKinney, W., et al. 2010, in Proceedings of the 9th Python inScience Conference, Vol. 445, Austin, TX, 51–56Miralles-Caballero, D., Colina, L., Arribas, S., & Duc, P.-A. 2011,AJ, 142, 79Naab, T., Oser, L., Emsellem, E., et al. 2014, MNRAS, 444, 3357Narayanan, D., Dey, A., Hayward, C. C., et al. 2010, MNRAS,407, 1701 Nevin, R., Blecha, L., Comerford, J., & Greene, J. 2019, ApJ, 872,76Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal ofMachine Learning Research, 12, 2825Petty, S. M., Armus, L., Charmandaris, V., et al. 2014, AJ, 148, 111Piqueras López, J., Colina, L., Arribas, S., Alonso-Herrero, A., &Bedregal, A. G. 2012, A&A, 546, A64Robertson, B., Bullock, J. S., Cox, T. J., et al. 2006, ApJ, 645, 986Robotham, A. S. G., Driver, S. P., Davies, L. J. M., et al. 2014,MNRAS, 444, 3986Rodriguez-Gomez, V., Genel, S., Vogelsberger, M., et al. 2015,MNRAS, 449, 49Rodriguez-Gomez, V., Snyder, G. F., Lotz, J. M., et al. 2019,MNRAS, 483, 4140Rupke, D. 2018, Galaxies, 6, 138Sánchez, S. F., Kennicutt, R. C., Gil de Paz, A., et al. 2012, A&A,538, A8Schawinski, K., Lintott, C., Thomas, D., et al. 2009, MNRAS, 396,818Shapiro, K. L., Genzel, R., Förster Schreiber, N. M., et al. 2008,ApJ, 682, 231Sheth, K., Vogel, S. N., Regan, M. W., Thornley, M. D., & Teuben,P. J. 2005, ApJ, 632, 217Shi, Y., Rieke, G., Lotz, J., & Perez-Gonzalez, P. G. 2009, ApJ,697, 1764Silk, J., & Rees, M. J. 1998, A&A, 331, L1Smee, S. A., Gunn, J. E., Uomoto, A., et al. 2013, AJ, 146, 32Snyder, G. F., Hayward, C. C., Sajina, A., et al. 2013, ApJ, 768,168Snyder, G. F., Rodriguez-Gomez, V., Lotz, J. M., et al. 2019,MNRAS, 486, 3702Springel, V. 2005, MNRAS, 364, 1105Springel, V., & Hernquist, L. 2003, MNRAS, 339, 289 Stark, D. V., Bundy, K. A., Westfall, K., et al. 2018, MNRAS, 480,2217Stickley, N. R., & Canalizo, G. 2016, arXiv e-prints,arXiv:1610.02601Sweet, S. M., Glazebrook, K., Obreschkow, D., et al. 2020,MNRAS, 494, 5421Tacconi, L. J., Genzel, R., Smail, I., et al. 2008, ApJ, 680, 246The Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al.2013, A&A, 558, A33The Astropy Collaboration, Price-Whelan, A. M., Sip˝ocz, B. M.,et al. 2018, AJ, 156, 123Van Der Walt, S., Colbert, S. C., & Varoquaux, G. 2011,Computing in Science & Engineering, 13, 22Veilleux, S., Kim, D.-C., & Sanders, D. B. 2002, ApJS, 143, 315Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, NatureMethods, 17, 261Wake, D. A., Bundy, K., Diamond-Stanic, A. M., et al. 2017, AJ,154, 86Wang, B., Cappellari, M., Peng, Y., & Graham, M. 2020, MNRAS,495, 1958Waskom, M., Botvinnik, O., Hobson, P., et al. 2014, Seaborn:V0.5.0 (November 2014), doi:10.5281/zenodo.12710Weigel, A. K., Schawinski, K., Caplar, N., et al. 2017, ApJ, 845,134Wen, Z. Z., & Zheng, X. Z. 2016, ApJ, 832, 90Westfall, K. B., Cappellari, M., Bershady, M. A., et al. 2019, arXive-prints, arXiv:1901.00856White, H. A., Fisher, D. B., Murray, N., et al. 2017, ApJ, 846, 35White, S. D. M., & Frenk, C. S. 1991, ApJ, 379, 52White, S. D. M., & Rees, M. J. 1978, MNRAS, 183, 341Yan, R., Bundy, K., Law, D. R., et al. 2016a, AJ, 152, 197Yan, R., Tremonti, C., Bershady, M. A., et al. 2016b, AJ, 151, 8 Figure 18.

A comparison of the stellar velocity (top row) and velocity dispersion maps (bottom row) for an example galaxy snapshot from theq0.5_fg0.3 merger. We include the r − band image for this snapshot (without observational effects, top left panel). The GADGET-3 particlevelocity and velocity dispersion map are in the middle left column, the kinematic maps derived from the NONSCATTER

SUNRISE spectraare in the middle right column, and the kinematic maps from the SCATTER

SUNRISE spectra are in the right column. The NONSCATTERand SCATTER kinematic have undergone all steps to achieve mock MaNGA observations, while the

GADGET-3 derived maps have onlybeen convolved and rebinned. Additionally, the

GADGET-3 maps are a mass-weighted quantity, while all others are ﬂux-weighted. Thedifferences between the maps are therefore due to these effects in addition to the effects of radiative transfer and the proper treatment of dust.The NONSCATTER and SCATTER velocity and velocity dispersion maps have signiﬁcantly larger values, which is especially apparent in thecenter of the velocity dispersion maps. The SCATTER maps have larger values at all locations relative to the NONSCATTER maps; this pointsto the importance of the proper treatment of dust when creating mock stellar kinematic maps.

APPENDIX A. THE IMPORTANCE OF RADIATIVE TRANSFER AND DUST SCATTERING AND ABSORPTIONA central goal of this paper is to design a process to produce synthetic kinematic maps from simulated galaxies that are as closeto full realism as possible. However, we acknowledge that running full radiative transfer (RT) and ﬁtting the absorption linesof the resultant datacube is complex and computationally expensive. Bottrell et al. (2019) explore the concept of observationalrealism and which steps are necessary to produce a CNN classiﬁer for merging galaxies that achieves high accuracy. They ﬁndthat the RT step is unnecessary and that it can be avoided in order to achieve signiﬁcant gains in computational time. Since thegoal of our approach is to produce observations that closely mimic MaNGA kinematic maps, we carry out a full RT (‘full’ meansthat we include the effects of dust scattering and absorption) and we additionally apply observational realism, which is detailedin §2.2. Here we focus on the importance of radiative transfer and the proper treatment of dust.We ﬁrst explore the differences between stellar kinematics derived from the

GADGET-3 particles (no RT) and the stellarkinematics derived from the

SUNRISE spectra (RT) without dust scattering and absorption. From here on,

GADGET-3 derivedkinematics are the ‘particle’ kinematics, the

SUNRISE spectra without dust effects are the ‘NONSCATTER’ spectra, and the

SUNRISE spectra that incorporate dust scattering and absorption are the ‘SCATTER’ spectra . An important caveat to ouranalysis on the differences between these kinematic maps is the differences in observational effects. We apply all observationaleffects to the NONSCATTER and SCATTER spectra, convolving, rebinning, and incorporating noise, but we create the particlekinematic maps without noise effects. Additionally, the particle kinematics are mass-weighted, so the convolution kernel thatwe apply to the particle velocity and velocity dispersion maps is a rough approximation of the convolution of a ﬂux-weighted When we initially began this investigation, we found that there was a small error in the public version of

SUNRISE that affected the shape of computedemission and absorption lines (private communication, Chris Hayward and Raymond Simons). This error in

SUNRISE occurs when kinematic shifts are appliedto emission and absorption lines during the dust scattering stage, and therefore applies only to SCATTER spectra generated from dust RT with kinematics. Wewere able to locate and ﬁx the bug prior to running our production-run

SUNRISE simulations for this paper. Figure 19.

Difference maps for the stellar velocity (ﬁrst and second panels) and stellar velocity dispersion maps (third and fourth panels).We show the difference between the particle and NONSCATTER maps and the SCATTER and NONSCATTER maps. The particle velocitymap has a faster rotating core (ﬁrst panel) and a lower central velocity dispersion (third panel) relative to the NONSCATTER maps, which areboth characteristic beam-smearing effects. The SCATTER maps are faster rotating and have a larger velocity dispersion across the galaxy disk,which could be due to dust preferentially obscuring dynamically young stars. quantity (i.e., the NONSCATTER and SCATTER spectra). We apply this convolution kernel as a ﬁnal step after the kinematicmaps are generated.In Figure 18 we present the velocity maps (top row) and the velocity dispersion maps (bottom row) produced at various stagesof realism for a snapshot from the q0.5_fg0.3 merger. We begin with the kinematic maps derived from the mass-weighted particlevelocities (second column), then the kinematic maps from the NONSCATTER spectra (third column), which are produced usingRT, and ﬁnally the kinematic maps from the SCATTER spectra (fourth column), which then additionally incorporates the effectsof dust scattering and absorption. To ﬁrst order, we ﬁnd that the velocity and velocity dispersion maps are similar enough to verifythat our methodology for creating stellar velocity and velocity dispersion maps from mock

SUNRISE spectra is not failing. Morespeciﬁcally, signiﬁcant differences between the

SUNRISE -derived maps and the particle-derived maps could indicate bugs in

SUNRISE , failures in the steps that produce mock spectral datacubes, or problems with the ppxf absorption line ﬁtting thatproduces the velocity and velocity dispersion maps.We present the difference maps between the particle and NONSCATTER and SCATTER and NONSCATTER maps in Figure19. We ﬁrst examine the differences between the particle kinematic maps and the NONSCATTER kinematics maps, which arethe ﬁrst and third panel of this ﬁgure, for the stellar velocity and velocity dispersion maps, respectively. Radiative transfer,observational effects, and the fact that the particle maps are derived from mass-weighted velocities can cause differences betweenthese maps. For instance, the NONSCATTER maps have larger bins towards the exterior of the galaxy and additionally showmore variation on a spaxel-to-spaxel basis due to noise. The NONSCATTER maps have signiﬁcantly different velocity andvelocity dispersion values, especially at the center of the galaxy. There is a faster rotating feature at the center of the particlevelocity map, which manifests as a rotating feature in the leftmost panel of Figure 19. Additionally, the NONSCATTER-deriveddispersion is signiﬁcantly higher at this location, which is apparent as an over subtraction in the third panel. Since the centralfeature is slower rotating and has a higher velocity dispersion, it is most likely related to beam-smearing. The differences in theimplementation of convolution in the particle maps, which are convolved as a last step, and the NONSCATTER maps, whichare convolved prior to deriving the velocity and velocity dispersion values, could contribute to the differences in these centralregions.We next evaluate the role of dust in the kinematic maps by comparing the NONSCATTER kinematic maps to the SCATTERversion in the second and fourth panels of Figure 19, for the stellar velocity and stellar velocity dispersion maps, respectively.The effects of dust on the spectra of galaxies are especially important in the context of merging galaxies. For instance, nearlyall ultra-luminous infrared galaxies (ULIRGs, L IR > L (cid:12) ) are dust-dominated major mergers with buried starburst or AGNs(e.g., Veilleux et al. 2002; Tacconi et al. 2008). While the galaxies in this simulation suite are not ULIRGs, they still showcasekinematic effects due to the presence of dust. For instance, the NONSCATTER maps tend to underestimate the velocity dispersionrelative to that of the SCATTER maps. This is apparent in the fourth panel of Figure 19, which has a signiﬁcant number of spaxelswith negative values, indicating that the SCATTER dispersion map has larger values. In addition, the difference is apparent forthe majority of spaxels in the galaxy, which could indicate that dust is important on galaxy scales. The NONSCATTER andSCATTER velocity maps also differ. The stars in the disk have higher rotational velocities for the SCATTER extension, meaningthat dust may be obscuring the slower-rotating dynamically young population. We run the classiﬁcation with both SCATTER7 Å ]0.00.51.01.52.0 N o r m a li z ed / Average

Figure 20.

Trend of inverse noise (1/ σ , or √ ivar ) with wavelength for the central spaxel of 20 MaNGA galaxies (color lines). The normalizedmedian trend is black. The largest variation happens galaxy to galaxy as opposed to spaxel to spaxel, so we take the average 1/ σ spectrum(black) to use as our characteristic 1/ σ , which we scale to the ﬂux of each simulated spaxel to produce an error spectrum. and NONSCATTER extensions and ﬁnd that both the individual predictors and the classiﬁcation change signiﬁcantly (in terms ofboth the selected predictors and their coefﬁcients), so we conclude that the effects of dust are important for the stellar kinematics.Previous work has investigated the effects of dust in SUNRISE spectra in depth. Stickley & Canalizo (2016) compare measure-ments of the stellar velocity dispersion ( m σ ∗ , from the GADGET-3 particle velocities) to the SCATTER version of the

SUNRISE spectra ( f σ ∗ ), and ﬁnd that the offset between the two measurements can be as large as 20-30% for isolated galaxies and as largeas 100% for extreme cases such as actively merging systems with disturbed morphologies. They conclude that when comparingobservations to simulations, it is important to measure ﬂux-weighted kinematics and to incorporate dust. The dust acts to pref-erentially obscure younger stars, elevating measurements of f σ ∗ relative to m σ ∗ . Stickley & Canalizo (2016) also ﬁnd that thedistribution of the dust is more important than the total attenuation due to dust. For the case of merging galaxies, this message isespecially relevant; the effects of dust on the merger kinematics cannot be ignored.We conclude with two points: 1) The kinematic maps derived from the particle velocities are similar to the SUNRISE -derivedkinematic maps, indicating that the technique for creating mock stellar kinematic maps is not failing, and 2) The kinematic mapshave important differences that occur over the course of RT, the proper treatment of dust, and the subsequent mock observationalsteps. While we do not fully propagate the particle kinematic maps through the LDA, we ﬁnd that they are signiﬁcantly differentin appearance than the RT’ed maps and that the derived kinematic predictors are also affected. This veriﬁes that in order toapply the classiﬁcation to observed galaxies, we need to train the classiﬁcation using galaxies that have undergone RT, are ﬂux-weighted, and incorporate observational effects. It is beyond the scope of this paper to conduct a full analysis into which steps ofthe RT and subsequent ppxf derivation of the kinematic maps are most important for creating the realistic galaxies and why. B. CREATING A MOCK NOISE SPECTRUMIn order to create realistic mock datacubes we also create a mock noise datacube for each snapshot. For an IFS datacube, thetypical noise trends with wavelength and so we ﬁrst measure the median inverse noise (1/ σ , or √ ivar ) trend for the central spaxelof 20 MaNGA galaxies (Figure 20). These 20 galaxies are composed of four randomly selected galaxies from each of the ﬁve8 Figure 21.

Stellar velocity (left) and stellar velocity dispersion (right) maps for a snapshot of the q0.5_fg0.3 simulation with an AGN. Thisspeciﬁc viewpoint and snapshot has the largest contribution from AGN emission and therefore showcases the most extreme example of theeffects of contamination in the stellar kinematic maps by AGN emission. We present the kinematics from the snapshot with the AGN turned offin the top row, followed by the AGN turned on in the middle row, and the AGN turned on with our modiﬁed masking procedure in the bottomrow. When the AGN light is not removed and no emission line mask is used (middle row), ppxf is unable to accurately ﬁt the absorption linesin areas that are contaminated by the AGN light for both velocity and velocity dispersion. These snapshots are not reliable for the extraction ofthe kinematic predictors. When we mask the emission lines (bottom row), the kinematic predictors are slightly affected. However, this effect isonly signiﬁcant for a few snapshots so the LDA classiﬁcation is unaffected by the slight difference between the masked emission lines and therun with the AGN emission turned off. ﬁber bundles. We test how the 1/ σ spectrum changes for the location of the spaxel relative to the center of the galaxy and ﬁndthat there is more variation (in the shape of the 1/ σ spectrum) between different galaxies than with location in the galaxy. Wetherefore compute the median 1/ σ spectrum using the central spaxel for all 20 MaNGA galaxies.To compute the noise spectrum for each spaxel, we multiply the normalized 1/ σ trend with wavelength by the g − band S/N ofthat spaxel to determine how S/N trends with wavelength. We then divide the mock spectra for each spaxel by the S/N trend withwavelength for that spaxel to get the error (or σ ) spectrum. Finally, we multiply the error spectrum by a random normal Gaussianwith a mean of zero and a standard deviation of one. This is the ‘realization’ of the noise for that spaxel. We add the realizationof the noise to the spectrum and use the error spectrum as an input to ppxf . To verify that the randomly selected galaxies are a fair representation of the MaNGA sample, we compare the average ivar spectrum in Figure 20 with theaverage ivar spectra computed as a function of stellar mass. We ﬁnd that the shape of the ivar spectrum varies little below a stellar mass of log M ∗ ∼

11 and thatthe average spectrum presented in Figure 20 is representative of the ivar spectra of the majority of MaNGA galaxies below this mass cutoff. C. AGN CONTAMINATIONThe framework for the simulation suite was originally created in Blecha et al. (2018) to determine the timeline of the activationand fueling of AGNs during mergers. It therefore contains broad line AGNs, which can complicate the ﬁtting of the stellarkinematics. While all of the simulations include AGNs, the q0.5_fg0.3 simulation hosts the brightest AGN, so we focus on thissimulation to describe the effects of a BL AGN on the stellar kinematics. The AGN in the q0.5_fg0.3 simulation achieves amaximum bolometric luminosity of L AGN = . erg s − , which is 90% of the total luminosity. The duty cycle is short formoderate and high luminosity AGNs; the AGN dominates the spectra for <

50 Myr. Therefore, even in the q0.5_fg0.3 simulation,the AGN luminosity is subdominant for the majority of snapshots, even during the late stage of the merger.Most MaNGA galaxies do not host AGNs, so the MaNGA DAP is not equipped to mask broad emission lines and thereforefails to ﬁt the stellar absorption lines. Bright AGN in MaNGA galaxies are often masked by the DAP ﬁtting procedure, leaving ahole at the center of the galaxy in the stellar kinematic maps. We demonstrate this type of failure in the middle row of Figure 21for the snapshot of the q0.5_fg0.3 simulation that has the largest contribution from AGN emission. When the AGN continuumis strong and the emission lines are broad, which is the case for 3-4 (out of 20 total) snapshots of the q0.5_fg0.3 simulation,the DAP produces velocity and velocity dispersion values of 1000 km s − , which is the artiﬁcial convergence of ppxf on themaximum allowed values. These values are then masked.Since our eventual science goals include investigating AGN fueling in mergers, we aim to include galaxies with broad lineAGN in the analysis. Therefore, we explore multiple different options for removing the AGN contamination. One approach toremove the AGN contamination is to determine the PSF of the AGN and subtract this from the datacube and then ﬁt the stellarkinematics. A different approach is to modify ppxf to ﬁt broader emission lines. We ﬁnd that determining the AGN PSF isnontrivial and can lead to over-subtraction of continuum light, so we pursue the other approach, modifying ppxf to ﬁt broademission lines. This approach could also be applied to MaNGA to ﬁt the stellar kinematics of broad line AGNs.We modify our ppxf ﬁtting procedure to include a step where we use the ‘ELPEXT’ emission line mask to mask all of theemission lines. We experiment with ﬁtting these emission line masks to galaxies with no broad lines and we ﬁnd that the returnedvelocity maps match that of maps without an emission line mask, so we use this procedure for all mock datacubes, not justthe ones that are dominated by AGN light. We rerun the q0.5_fg0.3 simulation with the AGN turned off in order to verify thesuccess of our approach and we present the results in Figure 21. We ‘turn the AGN off’ by removing the AGN contributionto the spectrum and re-running the RT (we still include dust absorption and scattering). The GADGET-3 output is unchanged,meaning that we do not remove BH accretion and feedback from the simulated galaxy. The modiﬁed version of ppxf producessimilar velocity and velocity dispersion maps to the simulation with the AGN turned off. In Figure 21 we demonstrate that fora few snapshots where the AGN emission is strongest, the velocity dispersion is elevated at the location of the AGN. A fullmultiwavelength subtraction of the PSF would be necessary to totally eliminate this effect.The kinematic predictors measured from the maps where the AGN is off are consistent in most cases with those measuredwith the modiﬁed emission line mask, which means that the classiﬁcation does not change signiﬁcantly with and without AGNemission. The exception is the kinematic predictors that trace the properties of the velocity dispersion map, such as µ , σ , µ , σ , and | µ , σ | . These predictors all increase because the velocity dispersion distribution has higher values from the AGN region, whichresults in a larger mean, variance, and skew in the velocity dispersion distribution. The λ R e value also increases in the presence ofa bright face-on AGN because λ R e is a ﬂux-weighted measurement that is much more sensitive to the enhanced velocities in theregion of the BL AGN. These predictors are signiﬁcantly elevated in the presence of an AGN only in the case where the AGN isvery bright and face-on. This occurs for ∼ r − band concentration of thegas rich major mergers. However, the most important predictors for the gas rich major mergers are asymmetry-based, whichindicates that the classiﬁcation relies more on the lower surface brightness features of the major mergers and are not dominatedby the central AGN light.0 D. PREDICTORS THAT WERE NOT SELECTED BY THE RFRHere we continue the description of kinematic predictors from §2.3 that are not selected by the LDA and therefore not intro-duced previously. In other words, here we describe the predictors that are not as important for identifying mergers: A , v asym , σ asym , ∆ x V , and ∆ x σ .In §2.3 we deﬁne the Radon Transform and Radon proﬁle. Here we describe how we determine the kinematic center of thegalaxy using these as tools and we deﬁne the A predictor. We follow the procedure from Stark et al. (2018) to determine thegalaxy’s kinematic center. We use the photometric center as the initial input, but ﬁnd that the photometric center is not always thesame as the kinematic center. Since an incorrect kinematic center can cause variations in the calculation of the Radon Transform,we ﬁrst determine the kinematic center using the Radon proﬁle and then extract the kinematic predictors using this kinematiccenter. The kinematic center is the location where the weighted kinematic asymmetry ( A ) is minimized, which is the asymmetryin the estimated ˆ θ values, which are measured from the bounded Absolute Radon Transform. The A predictor is deﬁned as: A = (cid:80) | ˆ θ − ˆ θ f lip | N i , j w i , j (D1)where ˆ θ is reversed to make ˆ θ ﬂip , N i , j is the number of values in the ˆ θ array at the current ‘center’, and w i , j is a weight factor: w i , j = N , N i , j (D2)where N , is the number of values at the photometric center.We iteratively measure A in a 3 × A value as our A predictor for each snapshot. If the spaxel with the lowestvalue of A is consistent within errors with the photometric center, we select the photometric center as the kinematic center. If thespaxel with the lowest value is on the edge of the spaxel grid, we expand the grid by a factor of two and rerun the determinationof the kinematic center.If the kinematic center is again at the edge of the grid, we do not expand the grid, but take the photometric center as thekinematic center. In this case, the kinematic center is not well-determined, often due to a disorganized velocity map, so thephotometric center is a fair guess for the kinematic center. In Stark et al. (2018), the galaxies where the kinematic center is notwell-determined are eliminated from the analysis, but in this case, a large fraction of galaxies have disordered kinematics, so weinclude them in the analysis, using the photometric center as the kinematic center.We ultimately ﬁnd that A is highly correlated with A and is therefore superﬂuous to the analysis. As a result, it is rejectedduring the RFR step.We measure the v asym and σ asym predictors with kinemetry , which we previously introduced in §2.3. Our goal with thesepredictors is to quantify the chaotic velocity patterns expected for interacting systems in the velocity and velocity dispersionmaps using the degree of kinematic asymmetries from the kinemetry output. The kinemetry output includes the A n and B n coefﬁcients from the harmonic expansion of the best ﬁtting ellipses for each ellipse. From these, we calculate the amplitude andphase coefﬁcients ( k n and φ n ) : k n = (cid:113) A n + B n (D3) φ n = arctan (cid:16) A n B n (cid:17) (D4)The higher order A n and B n terms represent deviations from an ideal rotating disk. We quantify these perturbations with v asym and σ asym , which are measured from the k n amplitude coefﬁcients of the model velocity and velocity dispersion maps, respectively(Krajnovic et al. 2006; Shapiro et al. 2008; Hung et al. 2016; Bellocchi et al. 2016):v asym = < (cid:80) = k n , v / , v > r , σ asym = < (cid:80) = k n , v / , v > r (D5) We do not explicitly apply a ﬂux weighting for the calculation of the phase coefﬁcients, as in Krajnovi´c et al. (2011). However, we do use an uncertaintyweighting in the kinemetry ﬁtting, which indirectly incorporates a ﬂux weighting. Additionally, kinemetry uses adaptively sized annuli (which increase insize with radius), which already emphasizes the high ﬂux inner regions of the galaxy. r . The amplitude coefﬁcients, k n , v , are summed and averaged for the n higherorder moments. We exclude the k , v term from the calculation of v asym since it represents radial outﬂow, which is not associatedwith stars (Shapiro et al. 2008). We normalize v asym by the circular velocity term and σ asym by the A term, which is the amplitudeof the velocity dispersion maps, as in Krajnovic et al. (2006). The v asym and σ asym predictors increase for disordered velocitiesand velocity dispersions.We deﬁne ∆ x σ as the difference in spatial position between the centroid of the galaxy in r − band imaging and the centroidof the galaxy’s velocity dispersion map. We determine the centroid (from both the r − band image and the velocity dispersion)by applying a 10 ×

10 pixel low pass ﬁlter, thresholding, and then identifying the center of the brightest contour. We refer to thephysical distance (in kpc) between the centroid of the r − band image and the centroid of the velocity dispersion map as ∆ x σ . Weuse a physical distance as opposed to a spaxel distance in order to avoid a sensitivity to galaxy redshift. We also measure ∆ x V ,which is the difference (in kpc) between the centroid of the galaxy in r − band imaging and the kinematic center. E. LDA STABILITY: IS THE TRAINING SET BIASED?Here we interrogate the biases of our simulation suite. We are speciﬁcally interested in two questions:1. Are we justiﬁed in directly comparing the classiﬁcations from the different simulations of mergers?2. Is the classiﬁcation cheating? In other words, are the merging and nonmerging galaxies biased in a way that allows theclassiﬁcation to identify mergers using nuisance parameters?First, we tackle the validity of comparing the different simulations. We ﬁnd that the LDA differs signiﬁcantly for the dif-ferent merger simulations both in performance and in the selected predictors. We want to ensure that this is a reﬂection of thephysical properties of the stellar kinematics in these different mergers and not merely a reﬂection of irrelevant differences in thedataset. For instance, the classiﬁcation from the q0.2_fg0.3_BT0.2 simulation has a signiﬁcantly lower accuracy than any othersimulation, including the q0.1_fg0.3_BT0.2 simulation, which we would naively expect to have the lowest accuracy. Could thisbe a result of the q0.2_fg0.3_BT0.2 simulation having fewer datapoints than any other simulation? We randomly discard datafrom all simulations so that they are limited to the same amount of data as the q0.2_fg0.3_BT0.2 simulation. We ﬁnd that boththe performance and the selected predictors for all runs are stable when we decrease the number of simulated snapshots, whichindicates that the number of snapshots is not biasing the classiﬁcation.Second, we carefully examine the properties of the isolated galaxies relative to the merging galaxies for each simulation. Thisis known as the between-class bias of the classiﬁcation. Our concern is that differences between the merging and nonmergingpopulations (i.e., trends with inclination or size) may affect the kinematic predictors. For instance, we ﬁnd that the isolatedsnapshots tend to be larger in size and slightly more face-on relative to the nonmerging population for all simulations. Sinceit is known that properties like inclination can signiﬁcantly affect the observed stellar kinematics, we want to ensure that theclassiﬁcation is not relying upon the underlying properties of the galaxies to cheat in its classiﬁcation of merging galaxies. Thiscould happen if, for instance, the nonmerging galaxies are more face-on and therefore the kinematic predictors are picking upon this property as opposed to features that are related to the mergers themselves. We have chosen a relatively transparentclassiﬁcation method, so we can use our interpretation of the individual predictors to diagnose if between-class bias is a concern.The ﬁrst line of logic we use to prove that the between-class bias is not signiﬁcant is the interpretation of the importantpredictors throughout §4. The most important predictors track features that can be associated with merging galaxies. Additionally,the sensitivity of the method changes with time, meaning that the kinematic features are not long-lived like one could expect froma classiﬁcation that relies mainly on inclination or size effects, which do not change dramatically over time in a merger.We also investigate the between-class bias by directly introducing suspicious galaxy properties into the analysis. For instance,we expect that if the classiﬁcation is instead tied to inclination or size, the kinematic predictors would be correlated with thesequantities and therefore the LDA would depend most strongly on inclination or size. To test if this is the case, we introduceseveral parameters that quantify size and inclination into the LDA as ‘nuisance’ predictors to determine if they are important. Weuse the galaxy ellipticity ( (cid:15) ), the number of spaxels in the velocity map, and the effective radius R ee