A self-supervised, physics-aware, Bayesian neural network architecture for modelling galaxy emission-line kinematics

James M. Dawson★, Timothy A. Davis, Edward L. Gomez, and Justus Schock
Cardiff University, School of Physics and Astronomy, The Parade, Cardiff CF24 3AA, UK
Las Cumbres Observatory, Suite 102, 6740 Cortona Dr, Goleta, CA 93117, USA
RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany

MNRAS, 1–12 (2021). Preprint 11 February 2021. Compiled using MNRAS LaTeX style file v3.0
Accepted 2021 February 10. Received 2021 February 9; in original form 2020 October 19
ABSTRACT
In the upcoming decades large facilities, such as the SKA, will provide resolved observations of the kinematics of millions of galaxies. In order to assist in the timely exploitation of these vast datasets we explore the use of a self-supervised, physics-aware neural network capable of Bayesian kinematic modelling of galaxies. We demonstrate the network's ability to model the kinematics of cold gas in galaxies, with an emphasis on recovering physical parameters and accompanying modelling errors. The model is able to recover rotation curves, inclinations, and disc scale lengths for both CO and Hi data which match well with those found in the literature. The model is also able to provide modelling errors over learned parameters thanks to the application of quasi-Bayesian Monte-Carlo dropout. This work shows the promising use of machine learning, and in particular self-supervised neural networks, in the context of kinematically modelling galaxies. It represents the first steps in applying such models for kinematic fitting, and we propose that variants of our model would be especially suitable for enabling emission-line science from upcoming surveys with e.g. the SKA, allowing fast exploitation of these large datasets.
Key words: galaxies: kinematics and dynamics – methods: data analysis – techniques: image processing
1 INTRODUCTION

In studying galaxy evolution, astronomers often use the atomic Hydrogen (Hi) 21-cm line to trace the outermost regions of galactic discs (e.g. Warren et al. 2004; Begum et al. 2005; Sancisi et al. 2008; Heald et al. 2011; Koribalski et al. 2018). This region can mark the continuous boundary between galaxies and their surrounding environments, including the dark matter halos within which galaxies are thought to reside. The rotation curves of extended Hi discs can be used to begin probing the properties of dark matter halos, as well as allow the detailed modelling of galaxies' mass distributions when coupled with ancillary observations (e.g. van Albada et al. 1985; de Blok et al. 2008). In the local Universe, Hi discs are useful in determining the gaseous content of a galaxy, as well as allowing astronomers to probe kinematic substructures such as bars, warps, counter-rotating discs, and spiral arms (e.g. Józsa et al. 2007; Spekkens & Sellwood 2007; Kamphuis et al. 2015; Di Teodoro & Fraternali 2015). Molecular gas observations (typically of the CO molecule) can provide a complementary view of these regions at high resolution, revealing the interplay between these gas phases. Hi is typically more extended than molecular gas, however, allowing it to trace environmental properties such as extended tidal features and the existence of dwarf companions (Hibbard et al. 2001; Sancisi et al. 2008; Heald et al. 2011; Serra et al. 2013; Bosma 2016; Koribalski et al. 2018).

★ E-mail: dawsonj5@cardiff.ac.uk

The evolution of Hi gives astronomers insight into the method by which galaxies accrete material from surrounding environments and how the mass of galaxies builds and evolves through star formation. The next generation of Hi survey instruments (e.g. the Square Kilometre Array, Dewdney et al. 2009; the Australian Square Kilometre Array Pathfinder, Johnston et al. 2007, 2008; the South African MeerKAT (Karoo Array Telescope), Jonas & MeerKAT Team 2016; the Chinese Five-hundred metre Aperture Spherical Telescope, Li & Pan 2016) are poised to collect observations spanning a large look-back time, advancing our Hi-driven science as well as pushing this field of astronomy firmly into the
Big Data era.

Currently it is estimated that the Square Kilometre Array (SKA) will collect data on the order of hundreds of petabytes per year. That amount of data is not only too much to fully exploit by hand but also too large to store, so astronomers should be looking to develop real-time models that can perform efficient science on incoming data. In an ideal world, physical information would be extracted from incoming data automatically, leaving the work of unravelling the prevailing science to astronomers. However, with such large data volumes and time-intensive techniques, how are astronomers to begin moving in a direction in which we can fully exploit the data quality promised by the SKA?

In previous work we sought to begin addressing this challenge via the application of machine learning (Dawson et al. 2019), and in particular neural networks, to extract kinematic properties of cold gas in galaxies. Models and tools exist to do this kind of work already. With the upcoming data releases from surveys such as the Widefield ASKAP L-Band Legacy All-Sky Blind Survey (WALLABY), it comes as no surprise that kinematic modelling tools (e.g. 3D Barolo, Di Teodoro & Fraternali 2015; 2DBAT, Oh et al. 2017; FAT, Kamphuis et al. 2015; and KinMS, Davis et al. 2013; Davis et al. 2020) have been in use and ongoing development for some time. Yet these models typically require several minutes or more to provide a full kinematic model of a single object, and longer if errors are required, which may prove problematic for kinematic analyses at SKA survey speeds.

In the past decade machine learning (ML) has become a popular solution to many
Big Data challenges in galaxy evolution studies (e.g. Dieleman et al. 2015; Domínguez Sánchez et al. 2018a,b; Ackermann et al. 2018; Bekki 2019), but remains an under-utilised resource among the galaxy kinematics community. Computer vision, which often utilises ML techniques, has been successfully applied to kinematic characterisation (e.g. Stark et al. 2018). Yet there is a distinct absence of works directly exploiting ML (with the notable exception of a few recent efforts, e.g. Shen & Bekki 2020). Recently our group has made attempts to exploit the use of ML in this field, featuring the use of convolutional autoencoders to identify disturbed cold gas in galaxies using data from both simulations and observations (see Dawson et al. 2019). We still have a long way to go in fully exploring the application of ML to galaxy kinematic characterisation, but it appears to be a promising avenue of research and one which we explore further in this work.

While conventional ML models are capable of high empirical accuracy and low testing time (e.g. Breiman 2001; Krizhevsky et al. 2012), they are often highlighted for their slow training times (Lim et al. 2000) and, in some cases, reluctance to generalise to unseen datasets (Dinh et al. 2017; Kawaguchi et al. 2017). These qualities are unsuitable for survey tasks proposed for the SKA, and therefore we are required to look at alternative methods that incorporate the benefits of ML without the drawbacks associated with standard ML practice.

Such an approach may exist in the form of self-supervised learning (Liu et al. 2020), whereby models train themselves without the need for an isolated training set. This has huge benefits in that one does not require long training times on a throw-away dataset, essentially eliminating data wastage. As with all machine learning approaches, self-supervised learning does have its disadvantages, including requiring fixed analytical functions to perform training, as well as results which change depending on when one wishes to evaluate test data throughout the model training procedure. Few pilot tests of these networks exist in astronomy (and even fewer utilising physics-aware capabilities, e.g. Aragon-Calvo 2019) and none exist in the modelling of galaxy kinematics. In this paper we present the current results from our first attempts at creating a self-supervised neural network with the primary goal of inferring the kinematic properties of gas discs in galaxies, and an emphasis on extracting (simplistic) characteristics of their rotation curves.

The paper is divided into 3 main sections. §2 gives an in-depth description of the model architecture used throughout this work, with emphasis lying on the decoder subnet described in §2.4. §3 presents the results from testing the network using synthetic and real interferometric observations, and §4 summarises the main outcomes of the work presented in this paper as well as proposed avenues for future work.

https://editeodoro.github.io/BBarolo/
https://github.com/seheonoh/2dbat
https://github.com/PeterKamphuis/FAT
https://github.com/TimothyADavis/KinMSpy

Figure 1.
A simplified pictorial representation of the neural network used throughout this work. The model features two convolutional encoder subnets which concatenate learned features before passing them to a decoder subnet. The model receives moment maps as inputs and minimises the loss between decoder-generated moment map outputs and the inputs throughout training. In the diagram grey squares indicate convolutional layers, blue rectangles depict linearly connected layers, and the grey cube represents the auxiliary 3D cube containing the coordinate axes passed into the network.
A typical interferometric observation returns visibilities in a complex plane, from which one can obtain a 3D datacube consisting of 2D spatial flux observations separated into discrete channels corresponding to observed frequency. It is this channelisation that allows astronomers to measure the line-of-sight velocities and hence the kinematic properties of galaxies' gas reservoirs. In practice, one can collapse these datacubes further to create 2D maps that reflect the mean properties of the gas in galaxies. A moment zero (integrated intensity) map is simply a summation along a cube's frequency/velocity dimension:

\mathrm{Moment\ zero} = \int I_v \, \mathrm{d}v = \sum_v I_v \, , \quad (1)

and a moment one (velocity) map is an intensity-weighted averaging of the line-of-sight velocities:

\mathrm{Moment\ one} = \frac{\int v \, I_v \, \mathrm{d}v}{\int I_v \, \mathrm{d}v} = \frac{\sum_v v \, I_v}{\sum_v I_v} \, . \quad (2)

Working directly with the datacubes, or in fact the complex visibilities, would be optimal for any fast pipeline kinematic modelling tool. However, we have chosen to work with moment maps in this work as a first step, and to avoid the problems associated with channelised inputs, as explained further in §4. It should be noted that, because of our choice to use moment maps, the models described in this work are also suitable to analyse optical IFU maps, as these will be handled similarly by the model described in this work and have been shown to encode kinematic information which can be extracted using both analytical and ML approaches (e.g. Stark et al. 2018; Hansen et al. 2020). This will be explored further in future work (Dawson et al., in prep).

It should be noted that in this work we are not making attempts to mitigate the effects of "beam smearing" (Swaters 1999; Blais-Ouellette et al. 1999). During the recovery of datacubes from complex visibilities, the raw observational datacubes are convolved with a restoring beam which effectively encodes the complex visibility plane coverage and is, in some ways, analogous to resolution.
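Equations 1 and 2 amount to a direct collapse of the datacube along its velocity axis. A minimal sketch, using a toy cube rather than survey data:

```python
# Sketch of Equations 1 and 2: collapsing a datacube of shape
# (n_channels, ny, nx) into moment zero and moment one maps.
# The cube and channel velocities below are synthetic placeholders.
import numpy as np

def moment_maps(cube, velocities):
    """cube: (n_chan, ny, nx) intensities; velocities: (n_chan,) channel velocities."""
    mom0 = cube.sum(axis=0)                    # Eq. 1: sum of I_v over channels
    weighted = velocities[:, None, None] * cube
    with np.errstate(invalid="ignore", divide="ignore"):
        mom1 = weighted.sum(axis=0) / mom0     # Eq. 2: intensity-weighted velocity
    return mom0, mom1

# Toy example: two channels at ±100 km/s with equal intensity,
# so the mean line-of-sight velocity is zero everywhere.
cube = np.ones((2, 4, 4))
v = np.array([-100.0, 100.0])
m0, m1 = moment_maps(cube, v)
```

Pixels with zero integrated intensity produce NaNs in the moment one map, which in practice would be masked before being passed to the network.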
It is this convolution step which gives rise to "beam smearing", the effects of which are discussed further in §3.1.2, along with implications for interpreting the model results discussed in this work. Counteracting "beam smearing" will need to be tackled in future work to maximise the effectiveness of models of this type.

An autoencoder (Rumelhart et al. 1986) is a model composed of two subnets, an encoder and a decoder. In an undercomplete autoencoder the encoder subnet extracts features and reduces input images to a constrained number of nodes. This so-called bottleneck forces the network to embed useful information about the input images into a nonlinear manifold, from which the decoder subnet reconstructs the input images; the reconstruction is scored against the input image using a loss function.

The aim of the model used in this work is to extract semantically meaningful information from observational data. Typical approaches using a convolutional autoencoder (CAE, Masci et al. 2011), such as that presented in Dawson et al. (2019), are powerful for extracting arbitrary (hyperparametric) features that define dataset characteristics. During training, a CAE learns to minimise the difference between input and output tensors rather than the difference between an output and a target label (whether this be a continuous or categorical set of target classes). A CAE works similarly to a powerful nonlinear generalisation of principal component analysis (PCA, Plaut 2018), whereby it finds a continuous nonlinear latent surface on which input data best lies. In this work, however, we would like to extract semantically meaningful parameters of observed systems. In order to achieve this we have combined a convolutional autoencoder with a set of analytical, gradient-trackable functions which approximate the functional forms of the observed kinematics of galaxies.

The model, known as a semantic autoencoder (SAE, Kodirov et al. 2017), is a modified CAE created using PyTorch (http://pytorch.org/). This imposes a constraint on the CAE by forcing the network to generate a semantic encoding of the input images. As highlighted by Aragon-Calvo (2019), the decoder function can take any possible form, no matter how representative of the true underlying functions being modelled. In this way, we can be assured that the encoders are learning semantically meaningful properties of the input images and are no longer tied to traditional training methods, instead allowing the network to train on all available data (including test data) in a self-supervised manner. An SAE becomes physics-aware once the assumption is made that the decoder function can be used to reveal physically meaningful information about the input. In this paper, the physics-awareness of the model refers to our main focus of approximating parameterisations for rotation curves and intensity profiles, and recovering galaxy inclinations (see §2.4).

For a more in-depth background to the use of autoencoders we refer the reader to Bourlard & Kamp (1988) and Hinton & Salakhutdinov (2006). For both a concise and thorough introduction to the use of self-supervised, physics-aware neural networks in astronomy we recommend Aragon-Calvo (2019).

Table 1. The SAE encoder subnet architecture used throughout this paper. The first column lists the name of each layer/operation, the second column describes the type of layer/operation, and the third column shows the dimensions of each layer's output tensors (hence the input shape to the next layer). The dimensions follow the PyTorch convention (batch size, number of channels, height, width). The filter column shows the dimensions (height, width) of kernels used to perform the convolution and pooling operations. The convolutional and linearly connected layer groups are separated by a blank row for clarity.

Name    Layer/Operation        Dimensions       Filter
Input   –                      (64,1,64,64)     –
Conv    2D Convolution         (64,16,64,64)    (3,3)
Pool    2D Max Pooling         (64,16,32,32)    (2,2)
Conv    2D Convolution         (64,32,32,32)    (3,3)
ReLU    ReLU                   –                –
Pool    2D Max Pooling         (64,32,16,16)    (2,2)
Conv    2D Convolution         (64,64,16,16)    (3,3)
ReLU    ReLU                   –                –
Pool    2D Max Pooling         (64,64,8,8)      (2,2)
Conv    2D Convolution         (64,128,8,8)     (3,3)
ReLU    ReLU                   –                –
Pool    2D Max Pooling         (64,128,4,4)     (2,2)

Lc1     Linear                 (64,1,1,2048)    –
ReLU    ReLU                   –                –
Drop    Dropout (p=0.1)        –                –
Lc2     Linear                 (64,1,1,256)     –
Htanh   Hard tanh activation   –                –
Output  –                      (64,1,1,2)       –
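The Table 1 stack could be sketched in PyTorch as follows. This is an approximation rather than the authors' code: the flatten step and the final 256 → 2 linear map (implied by the Output row) are inferred from the listed tensor shapes.

```python
# A PyTorch sketch of the encoder subnet in Table 1. Layer widths follow
# the table; the flatten step and the final 256 -> 2 linear layer are
# inferred from the Output row, so treat this as an illustrative guess.
import torch
import torch.nn as nn

class EncoderSubnet(nn.Module):
    def __init__(self, out_params: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.MaxPool2d(2),                  # (16, 32, 32)
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),      # (32, 16, 16)
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),      # (64, 8, 8)
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),     # (128, 4, 4)
        )
        self.head = nn.Sequential(
            nn.Flatten(),                        # 128 * 4 * 4 = 2048 (Lc1 width)
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Dropout(p=0.1),                   # reused later for MC dropout
            nn.Linear(2048, 256), nn.Hardtanh(),
            nn.Linear(256, out_params),          # inferred from the Output row
        )

    def forward(self, x):
        return self.head(self.features(x))

enc = EncoderSubnet()
out = enc(torch.zeros(64, 1, 64, 64))   # a batch of 64 single-channel moment maps
```

Two such subnets (one per moment map) would then have their outputs concatenated before the decoder.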
Within the network, the encoders are two convolutional-classifier-like subnets. Each comprises a series of 4 convolutional and 2 fully connected layers, interspersed with pooling layers and activation functions. The encoders are used to extract and dimensionally reduce features from input images. The two subnets independently receive a moment zero map (a 2D intensity profile, normalised in the range 0–1) and a moment one map (a 2D velocity profile, normalised into the range −1–1) respectively. Throughout this work, we ensure that the input maps have a size of 64 × 64 pixels. All input maps whose sizes are larger or smaller, like those discussed in §3.2 and §3.3, are subsequently up/down-sampled to a size of 64 × 64 using PyTorch's torch.nn.Upsample class, in bilinear mode. Each moment map carries valuable information for the decoder functions as described in §2.4. With this in mind, the outputs of the encoders are two vectors which are concatenated before passing to the decoder subnet. For an in-depth look at the encoder subnet structure see Table 1.

The encoders learn the following properties: subnet 1: observed galaxy inclination (i) and the free parameters of the intensity profile,
which make up ξ₁ in Figure 1; subnet 2: the parameters of the velocity profile of the galaxy, which make up ξ₂ in Figure 1.

Here we detail the functions required for reconstructing the moment zero and moment one input maps from the concatenated feature representations ξ₁ and ξ₂, as shown in Figure 1. In recovering the moment maps, we are primarily interested in modelling two profiles. Firstly, the intensity:

I(r) = I_0 \exp\left(-\frac{r_{x,y}}{r_{\mathrm{scale}}}\right) \exp\left(-\frac{z}{r_{z\text{-}\mathrm{scale}}}\right) , \quad (3)

where I_0 is the intensity normalisation factor (set to 1 throughout, due to the global normalisation described above), r_{x,y} is the radius in the xy plane, in arcseconds, r_scale is the intensity scale length in the xy plane, z is the position on the z axis, and r_z-scale is the intensity scale length in the z axis, set to a value of 1 spaxel throughout this work to emulate a thin disc. Intensity values are determined by combining the integrals of Equation 3 across each spaxel in the xy and z planes. Secondly, the rotational velocity:

V(r) = \frac{2 V_{\mathrm{max}}}{\pi} \arctan\left(\frac{r}{r_{\mathrm{turn}}}\right) , \quad (4)

where V_max is the asymptotic line-of-sight velocity, r is the radius in arcseconds, and r_turn is the velocity profile scale length.

Here, our choice of exponential intensity profile and arctan velocity profile is entirely arbitrary (i.e. not driven by any physical theory), but these are choices motivated by some of the simplest forms that can approximately fit the typical discs and rotation curves found in the Universe. Clearly, objects that do not follow these functional forms will not be appropriately fit by this network, and we discuss this further in §3.5. However, it should be noted that this analytical-style decoder implementation would be equally valid for other functional forms. For example, one could choose to fit bulge-disc models with such an architecture, or include the influence of central point masses or the effects of dark matter halos.
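Because the decoder must remain differentiable, Equations 3 and 4 can be written as simple tensor functions through which gradients flow back to the encoder outputs. A minimal sketch (the function names are illustrative, and |z| is assumed for symmetry about the disc mid-plane):

```python
# Differentiable sketches of the decoder's analytic profiles
# (Equations 3 and 4). Symbols mirror the text: r_scale and r_turn are
# scale lengths, v_max the asymptotic velocity.
import math
import torch

def intensity(r_xy, z, r_scale, r_z_scale=1.0, i0=1.0):
    """Eq. 3: exponential disc in the xy plane and (assumed |z|) z direction."""
    return i0 * torch.exp(-r_xy / r_scale) * torch.exp(-torch.abs(z) / r_z_scale)

def velocity(r, v_max, r_turn):
    """Eq. 4: arctan rotation curve, V(r) -> v_max as r -> infinity."""
    return (2.0 * v_max / math.pi) * torch.atan(r / r_turn)

# Both functions are differentiable, so the reconstruction loss on the
# generated moment maps can train the encoders end-to-end.
r = torch.linspace(0.0, 20.0, 5, requires_grad=True)
v = velocity(r, v_max=200.0, r_turn=2.0)
```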
These more realistic networks will be explored in future works.

An auxiliary 3D tensor of radii (labelled r in Figure 1) is passed into the network, cloned, and evaluated using Equations 3 and 4. The 2D moment maps are then created using Equations 1 and 2. The velocity profile is later converted into a line-of-sight velocity map via an inclination projection and a velocity weighting based on the pixel angles about the line-of-sight axis.

The network is trained with minimal optimisation of hyperparameters in order to demonstrate the simple nature of this architecture. At all times the network utilises
PyTorch's MSELoss function, which computes the mean squared error

L = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i) - y_i \right)^2 , \quad (5)

between the model outputs, y_i, and inputs, x_i, for every forward pass of a batch of size N. In this case, this is the squared difference between the moment zero and moment one inputs and the decoder-generated outputs. It is worth noting here that all synthetically generated moment maps have the same position angle, and consequently any observational data used for training and testing have been de-rotated using published position angle measurements. We do this as position angle is a non-physical parameter which we can easily account for in pre-processing (with e.g. the fit_kinematic_pa routine of Krajnović et al. 2006).

We use an adaptive Adam learning rate optimiser (Kingma & Ba 2014), with the learning rate reduced via multiplication by 0.975 every 2 epochs. We find that the model converges well after 300 epochs for all training runs presented in this paper. Where synthetic training data are used, the network receives batches of 64 input moment map pairs. Initial tests showed the network to be largely unaffected by batch size, and so 64 is arbitrarily chosen to increase training speed. The models and Python training scripts used for the work presented in this paper are publicly available on GitHub (https://github.com/SpaceMeerkat/Corellia/).

Testing the network can be done in three distinct ways, depending on the situation at hand.
In order to test data, one can choose whether to train the network on the test data alone (we call this testing procedure solo testing), to train on the test data alongside other examples (we call this testing procedure combined), or to use the network in full test mode having only trained on examples not including those data that we wish to test (called blind testing).

One can imagine the case where sufficient training data has been passed through the network in a survey, such that in order to return rapid kinematic modelling of new observations one simply passes the new observations through the network with no prior exposure to the training procedure. This blind testing has the advantage of rapid testing speed, but at the potential cost of lowered predictive accuracy, in an epistemic-uncertainty-dominated regime. One can also imagine the case whereby initial survey data has been collected and some sample of the dataset the network used to train is also in need of testing. As the network has seen these data during the training procedure, combined testing has the advantage of potentially higher accuracy at the expense of the time needed to train the model. It should come as no surprise that the ideal testing scenario for this network is combined, with a sufficiently large training set in an aleatoric-uncertainty-dominated regime. However, there are cases (such as at first light of a survey) where the only test data available is that which the network was trained on. It is in this scenario that solo testing will occur, and although this testing regime lacks the benefits afforded by combined testing, it has the potential advantage of predictions not being influenced by anomalous data, whose population increases with training set size.
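The training configuration described above (Equation 5 with Adam and a 0.975 multiplicative decay every 2 epochs) can be sketched as follows. The stand-in model, batch, and initial learning rate are placeholders, not the authors' values:

```python
# Sketch of the training loop described in the text: MSE loss between
# input and reconstructed tensors, Adam, and a learning rate multiplied
# by 0.975 every 2 epochs via StepLR. The tiny model and the initial
# learning rate of 1e-4 are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Linear(8, 8)                    # stand-in for the full SAE
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimiser, step_size=2, gamma=0.975)
loss_fn = nn.MSELoss()                     # Equation 5

for epoch in range(4):                     # the paper trains for ~300 epochs
    inputs = torch.randn(64, 8)            # one synthetic batch of 64
    optimiser.zero_grad()
    loss = loss_fn(model(inputs), inputs)  # self-supervised: target == input
    loss.backward()
    optimiser.step()
    scheduler.step()                       # applies the every-2-epochs decay
```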
Monte Carlo dropout
In this section we summarise the use of Monte Carlo dropout (henceforth MC dropout; Gal & Ghahramani 2016) to provide quasi-statistical modelling uncertainties over learned parameters within the model.

In conventional neural network training circumstances, dropout may be interpreted as permuting a trained model (Srivastava et al. 2014) via the probabilistic zeroing of weights in linearly connected layers. Traditionally, dropout layers are used throughout training in order to force the network to behave as an ensemble of architectures
with increased testing accuracy and generalisation power. In the case of MC dropout, after training, dropout is reapplied to the network in evaluation mode and inputs are passed through the model many times, effectively sampling a posterior where the model architecture is marginalised out. Gal (2016) first proposed the idea of approximating distributions over parameters learned in neural networks in this way, and it has since been used in astronomy (e.g. for the probabilistic labelling of galaxy morphologies, Walmsley et al. 2019).

Figure 2. A randomly selected synthesised galaxy, created using KinMS and evaluated using the network in blind testing mode. The panels show the input moment maps ("Inputs"), the decoder reconstructions ("Outputs"), and the predicted intensity and V sin(i) profiles ("Predicted profiles"). The black dashed lines and grey areas show the mean and 1σ modelling uncertainties respectively for profiles predicted by the neural network model. The blue dashed lines show the target profiles which were used to create the input maps. The galaxy was created with the following known parameters: i = …°, r_scale = …″, r_turn = …″, and V_max sin(i) = … km s⁻¹. The network-predicted parameters are shown as text in the upper-middle, upper-right, and lower-right subplots (i = 37.8 ± 1.1, r_scale = 10.184 ± 0.004, r_turn = 1.90 ± 0.02, and V sin(i) = 177.7 ± 3.8 km s⁻¹).

For an input x (comprised of a moment zero and a moment one map), training data D, model weights w_t at forward pass t, T forward-pass evaluations, and encoder output k, the predicted parameter means and standard deviations are given by Equations 6 and 7 respectively:

\hat{k} = \frac{1}{T} \sum_{t=1}^{T} P(k \mid x, w_t) , \quad (6)

\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \left| \hat{k} - k_t \right|^2 . \quad (7)

For a comprehensive derivation of Equations 6 and 7, as well as the implications of using an arbitrary dropout probability, we refer the reader to Walmsley et al. (2019). Examples of the posterior distributions, p(k|w, D), over learned parameters using MC dropout for a randomly selected synthesised galaxy are described further in §3.1.1.

It should be noted that, as the network does not use dropout to zero weights in the convolutional layers, σ does not represent a complete error over learned parameters.
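The sampling behind Equations 6 and 7 amounts to keeping dropout active at evaluation time and aggregating T forward passes. A minimal sketch with a toy model and T = 100 (both illustrative only):

```python
# MC dropout sketch (Equations 6 and 7): dropout layers stay stochastic
# in evaluation mode, and repeated forward passes sample the predictive
# mean and scatter. The tiny model here is a placeholder.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(8, 2))
model.eval()                               # evaluation mode for everything...
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()                          # ...except dropout, kept sampling

x = torch.randn(1, 4)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # T = 100 passes
k_hat = samples.mean(dim=0)                # Eq. 6: predictive mean
sigma = samples.std(dim=0)                 # Eq. 7: modelling scatter
```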
Instead, one should consider σ as a lower-limit error over parameters, whose use becomes immediately obvious for pipeline flagging purposes or for generating relative errors within a test set. The errors produced through this technique are strictly errors due to the modelling technique, and will underestimate the true error in any parameter, which arises due to both modelling and observational uncertainties.

Table 2. Parameter values and ranges for all synthetically generated galaxies using the KinMS package. The units for r_scale and r_turn are absent due to both quantities being fractions of the input map size. The position angle of each galaxy is fixed at 0 as it is not a physically meaningful parameter. Throughout model training, parameters are drawn uniformly in the ranges listed.

Parameter        Size/range   Units
Position angle   0            deg
Inclination      10–90        deg
r_scale          …            –
r_turn           …            –
V_max sin(i)     50–500       km s⁻¹

In this section we present exemplar test results for highly spatially resolved galaxy observations. In each case we have trained new networks using the procedures described in §2.5.
In order to explore the limitations of the network, we tested the model using synthetic galaxies generated using the Python-based kinematic
Figure 3.
Corner plot showing the level of covariance between learned parameters for one randomly generated, synthesised galaxy (discussed further in §3.1). The accompanying histograms represent quasi-probabilistic distributions thanks to the use of Monte Carlo dropout. This galaxy was passed through the network in test mode 10 000 times in order to build the distributions. We observe well-constrained learned parameters with Gaussian-like profiles, allowing for quasi-probabilistic modelling errors for the parameters. The only strong covariance observed is that between the maximum line-of-sight velocity and the velocity profile scale length, which is entirely expected and present in traditional kinematic analyses.
KinMS (KINematic Molecular Simulation, Davis et al.2013; Davis et al. 2020). Figure 2 shows the inputs and outputsas well as both known and predicted profiles for a galaxy generatedusing the same analytical functions described in §2.4 with inclination,maximum velocity, and scale lengths drawn randomly in the rangesshown in Table 2, and a fixed beam size of 2 resolution elements.It is clear that the model is able to recover the galaxy’s rotationcurve (and other parameters) well in blind testing mode, whereby themodel has not yet trained on the test data. The quasi-probabilisticdistributions for each learned parameter for this galaxy are shownin Figure 3, highlighting the Gaussian-like nature of the learnedparameter distributions as well as an expected covariance betweenr turn and V max sin(i).As seen in Figure 4 the model is able to recover the desired physicalparameters of synthesised galaxies well, heuristically. For the 1739 https://github . com/TimothyADavis/KinMSpy test galaxies shown in Figure 4 we measure the average deviationof parameters: i, r scale , r turn , and V max sin(i), from the 1:1 line as 𝜎 i = . ◦ , 𝜎 r scale = 0.003, 𝜎 V max sin(i) = .
48 km s − , and 𝜎 r turn =0.017 respectively.It is clear from Figure 4 that the error estimates do not represent thetotal errors over the parameters and only encode the modelling error.This makes the presented errors strictly lower limit estimates, andmostly useful for comparing reliability within the dataset, rather thanexternal use. This can be seen by the fact that on average only ∼ ∼ MNRAS000
[Figure 4: four panels of true versus predicted values — i vs i_pred, r_scale vs r_scale,pred, V_max sin(i) vs V_max,pred sin(i), and r_turn vs r_turn,pred.]

Figure 4. True versus predicted plots for each learnable parameter in the network. Black markers and error bars pertain to the tested galaxies, and the red dashed line indicates the 1:1 line on which perfect predictions should ideally lie. This model was trained using purely synthetic data with a restoring beam size of 2 resolution elements, including only well-resolved examples as discussed in §3.1. Galaxies whose projected r_turn fell below 1.5 times the restoring beam size were removed in order to mimic the automated flagging of poorly resolved galaxies at high inclination in a survey. Of the 2000 synthesised galaxies tested, 261 (13%) were removed using this cut.
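The resolution cut described in the caption can be expressed as a simple pipeline filter. A sketch, assuming r_turn projects onto the minor axis as r_turn cos(i) — the function name and projection convention are illustrative, not taken from the paper's code:

```python
import numpy as np

def is_poorly_resolved(r_turn, inc_deg, beam_size, n_beams=1.5):
    """Return True when the projected turnover radius falls below
    `n_beams` restoring beams, mimicking the automated flagging of
    poorly resolved, highly inclined galaxies described in the text.
    r_turn and beam_size must share units (e.g. pixels)."""
    projected_r_turn = r_turn * np.cos(np.radians(inc_deg))
    return bool(projected_r_turn < n_beams * beam_size)

# A nearly edge-on disc fails the cut; the same disc face-on passes.
flag_edge_on = is_poorly_resolved(r_turn=10.0, inc_deg=85.0, beam_size=2.0)
flag_face_on = is_poorly_resolved(r_turn=10.0, inc_deg=10.0, beam_size=2.0)
```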
One expects r_scale,pred to artificially increase with beam size for a fixed r_scale. However, r_scale is not recoverable for observations of galaxies whose r_scale values fall below some fraction of the beam size. We see this effect happening, in a non-complex manner, as shown in Figure A1. We therefore recommend enforcing flagging based on inclination, which appears to be strongly linked with those galaxies whose r_scale is under-predicted (along the minor axis). In the edge-on case, the minor axis is no longer well resolved, resulting in poor recovery of the intensity profile. This is, however, a well-known issue in moment-based kinematic modelling, in which the intensity profiles and kinematics can never be fully derived for edge-on galaxies due to line-of-sight effects.

As we have included no method for mitigating the changes induced by varying beam size, it comes as no surprise that the network behaves differently given a sufficiently large ratio of beam size to galaxy extent. Without a mechanism for dealing with "beam smearing" in the current network architecture, we expect to see its influence lowering the apparent line-of-sight velocities close to the centres of galaxies, where the iso-velocity contours are closest together. To minimise the effects of varying beam size we recommend convolving the 3D spatial cube 𝑟 (see Figure 1), evaluated using Equation 3, with the restoring beam before creating the output maps. The advantage of this approach is that the restoring beam is often included in data-product header units, and so should be readily available for creating kernels with which to perform the aforementioned convolution. We consider this approach beyond the scope of the work presented in this paper, but it will be included in future work focusing on retrieving the properties of marginally resolved galaxies.

In previous work we showed that the fill factor (i.e. the number of zeroed pixels) in a velocity map's field of view impacts the behaviour of NN models which take such maps as inputs (Dawson et al. 2019). With the NN model presented in this work, we have seen little evidence that this affects the galaxies' predicted parameters. We attribute this behaviour to the nature of the training procedure, whereby in combined and solo testing the network does not rely solely upon inference of unseen data.
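The recommended mitigation above amounts to smoothing each spatial plane of the model cube with the restoring beam before forming moment maps. A minimal sketch assuming a circular Gaussian beam and scipy (real interferometric beams are elliptical, with BMAJ/BMIN/BPA in the FITS header; this is not the paper's implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def convolve_with_beam(plane, beam_fwhm_pix):
    """Convolve one spatial plane with a circular Gaussian restoring
    beam whose FWHM is given in pixels. The FWHM-to-sigma conversion
    is the standard 2*sqrt(2*ln 2) factor."""
    sigma = beam_fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return gaussian_filter(plane, sigma=sigma)

# A point source: the convolution spreads flux but conserves its total.
plane = np.zeros((33, 33))
plane[16, 16] = 1.0
smoothed = convolve_with_beam(plane, beam_fwhm_pix=3.0)
```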
[Figure 5: input and output maps (64 × 64 pixels) and predicted intensity and rotation-curve profiles for NGC 2403; network values i = 51.5 ± 0.4°, r_scale = 310.12 ± 2.53, r_turn = 87.92 ± 24.89, V_max = 140.1 ± 5.9 km s⁻¹.]

Figure 5. An example galaxy, NGC 2403, observed in Hi and evaluated using the network in combined testing mode. Maps in the left and middle columns share x- and y-axis sizes of 64 × 64 pixels; in this way we directly observe the input and output maps of the model. The right column has undergone an x-axis rescaling to match observational scales found in the literature. The black dashed lines and grey areas show, respectively, the mean and 1σ modelling errors for profiles predicted by the neural network. The blue dashed line shows a major-axis cut of the input intensity map. The red dashed line and filled area show the best fit and associated errors modelled using BBarolo on the datacube. In order to make a direct comparison between the network's and BBarolo's derived rotation curves, the network's velocity profile has been corrected by the predicted inclination term. The network-predicted parameters are shown as text in the upper-middle, upper-right, and lower-right subplots. We see that this galaxy has a velocity profile which can be roughly approximated by an arctan function, meaning the kinematic parameters are well recovered by the model.
The primary goal of developing a network like that presented in this paper is to demonstrate the applicability of machine learning to SKA science. As such, in this section we show that the network performs well with Hi observational data. To do this we present two example test galaxies, NGC 2403 and NGC 3198, observed using the Very Large Array (VLA) as part of The Hi Nearby Galaxy Survey (THINGS; Walter et al. 2008), which show a diversity of rotation curve shapes. These galaxies are two of 17 THINGS galaxies used for mixed training and testing with the network, chosen heuristically for the appearance of their well-defined rotating Hi discs. The names and publications for the galaxies in this sample are given in Table A1.

Figure 5 shows the derived intensity profile and rotation curve for NGC 2403. We include the rotation curve modelled using BBarolo (Di Teodoro & Fraternali 2015) on the datacube (Di Teodoro & Lelli, private communication). In comparison, we see that the neural network's predicted rotation curve matches closely, and so we are convinced that the network is able to recover physical information well. Although the galaxy's intensity profile does not strictly exhibit an exponential form, this has little impact on the recovery of the rotation curve, which is the network's primary objective.

Figure 6 shows the derived intensity profile and rotation curve for NGC 3198. This galaxy exhibits a mild warp and a flat rotation curve (Gentile et al. 2013), with a slight rise at ∼ ′′. Warped Hi discs are not uncommon in the outer regions of galaxies. At present our network architecture is not set up to model these (although one could easily extend the model to do so). Again, we include the rotation curve modelled using BBarolo on the datacube (Di Teodoro & Lelli, private communication) in Figure 6. Crucially, although this warping behaviour is not included in our model, in this case the network still returns reasonable parameter estimations, showing that it could still be usable for parameter estimation across a broadly diverse population of galaxies.
In order to demonstrate the flexibility of this network architecture, we trained a model to recover the kinematic properties of galaxies observed in the CO line using the Atacama Large Millimeter/submillimeter Array (ALMA). Our samples are drawn from the mm-Wave Interferometric Survey of Dark Object Masses (WISDOM) project (see Table A2 for more information) and have high spatial resolution. As these objects are targeted for evidence of black hole influence on the gas kinematics, we expect to see small values of a_V for the sample. As seen in Figure 7, this effect is clearly visible, highlighting the predictable behavioural nature of the network. It is also clear in Figure 7 that NGC 1387 (FCC 184; Zabel et al. 2020, Boyce et al., in prep), an exemplar galaxy from the WISDOM sample, exhibits an exponential intensity profile which the network can easily recover.

Such an example demonstrates the transferable nature of this network architecture and training style, but without the difficulties often associated with traditional transfer learning tasks. This means that such architectures and training styles can be applied to a multitude
of different datasets, with the possibility of architectural modifications suiting other types of data outside of interferometry and even astronomy.

[Figure 6: input and output maps (64 × 64 pixels) and predicted intensity and rotation-curve profiles for NGC 3198; network values i = 67.2 ± 0.6°, r_scale = 203.91 ± 2.38, r_turn = 15.69 ± 9.71, V_max = 149.9 ± 4.7 km s⁻¹.]

Figure 6. An example galaxy, NGC 3198, observed in Hi and evaluated using the network in combined testing mode. Maps in the left and middle columns share x- and y-axis sizes of 64 × 64 pixels; in this way we directly observe the input and output maps of the model. The right column has undergone an x-axis rescaling to match observational scales found in the literature. The black dashed lines and grey areas show, respectively, the mean and 1σ modelling errors for profiles predicted by the neural network. The blue dashed line shows a major-axis cut of the input intensity map. The red dashed line and filled area show the best fit and associated errors modelled using BBarolo on the datacube. In order to make a direct comparison between the network's and BBarolo's derived rotation curves, the network's velocity profile has been corrected by the predicted inclination term. The network-predicted parameters are shown as text in the upper-middle, upper-right, and lower-right subplots. We see that this galaxy has a velocity profile which can be roughly approximated by an arctan function, meaning the kinematic parameters are well recovered by the model.
The network can retrieve a mean-field approximation for all learnable parameters of a single galaxy observation in 0.0025 seconds on a single Intel(R) Core(TM) i7-6700 CPU core. This time scales linearly with the number of MC dropout samples one wishes to collect in order to generate pseudo-probabilistic distributions (e.g. a typical test on an individual galaxy with 1000 MC dropout samples would take 2.5 seconds). However, as the batch throughput size is limited only by the available device memory, it is possible to retrieve values for learnable parameters, and hence MC dropout samples, in the same time frames as listed above for multiple observations. This means that one could potentially return hundreds to thousands of parameterisations and associated pseudo-errors in a matter of seconds.
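The MC dropout sampling described above amounts to keeping dropout active at test time and running many stochastic forward passes, batching them where memory allows. A toy numpy sketch of the idea, with a one-layer regressor standing in for the paper's network (all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "trained" single-hidden-layer regressor standing in for the network.
W = rng.normal(size=(8, 32))   # input -> hidden weights
w_out = rng.normal(size=32)    # hidden -> output weights

def mc_dropout_samples(x, n_samples=1000, p_drop=0.2):
    """Run `n_samples` stochastic forward passes with dropout left ON,
    as in Gal & Ghahramani (2016). The sample mean is the prediction
    and the sample std acts as the quasi-Bayesian modelling error."""
    preds = np.empty(n_samples)
    for k in range(n_samples):
        keep = rng.random(32) >= p_drop                  # fresh dropout mask
        h = np.maximum(x @ W, 0.0) * keep / (1.0 - p_drop)  # inverted dropout
        preds[k] = h @ w_out
    return preds.mean(), preds.std()

mean, err = mc_dropout_samples(np.ones(8))
```

Because each pass is independent, the loop can be replaced by one batched matrix operation over all masks at once, which is why the per-galaxy cost scales linearly with the sample count but parallelises freely across observations.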
There are a few caveats pertaining to the use of the model described in this work. These caveats may impact the way in which users handle the network and the confidence levels associated with parameter estimations.

A key factor in recovering sensible parameterisations with the network is the choice of decoder functions (see §2.4). In this work we have used simple, general functions in the form of an exponential (see Equation 3) and an arctan (see Equation 4). However, should one wish to model specific emission-line components of galaxies, it would be prudent to adopt more tailored functional forms. For example, it has been shown that Hi discs can display central depressions in their intensities, typically filled by molecular gas (Wong & Blitz 2002), for which a truncated Gaussian intensity profile (Martinsson et al. 2013) would be more appropriate when reconstructing the intensity maps. Additionally, when modelling the very outer regions of Hi discs, one might consider adopting a more complex multi-parameter function capable of encoding the sharpness of the turnover at r_turn and the behaviour of the curve after this point (e.g. Rix et al. 1997), or even declining velocities in the central regions (Lelli et al. 2016). A declining rotation curve would be challenging for the current model to fit (and impossible to fully retrieve). However, due to the nature of the loss function chosen in this work (see Equation 5), the network will prioritise fitting the higher-velocity regions of galaxies.

As described in §3.1.2, the resolution of input images impacts the ability of the network to correctly predict r_scale, particularly in the high-inclination regime. This places constraints on the user's confidence in parameter estimations when working in the combined large-beam and high-inclination case. Additionally, we can see in Figure 4 that the network struggles to accurately recover inclinations at the very low end of the inclination range. This is a predictable effect caused by the loss of line-of-sight velocity information for face-on discs but, again, in the case of survey pipelines these low-inclination galaxies will require additional flagging.
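The simple decoder forms referred to above — an exponential intensity profile and an arctan rotation curve — can be written down directly. A sketch of the assumed functional forms; the paper's Equations 3 and 4 may carry different normalisations or unit conventions:

```python
import numpy as np

def intensity_profile(r, r_scale):
    """Exponential surface-brightness profile, the general form the
    decoder uses when reconstructing intensity maps (normalisation
    omitted here)."""
    return np.exp(-r / r_scale)

def rotation_curve(r, v_max, r_turn):
    """Arctan rotation curve. The 2/pi prefactor is an assumed
    convention chosen so that V -> v_max as r -> infinity, with
    r_turn setting where the curve turns over."""
    return v_max * (2.0 / np.pi) * np.arctan(r / r_turn)
```

Swapping these for, e.g., a truncated Gaussian intensity profile or a multi-parameter rotation curve only changes the decoder, leaving the rest of the architecture untouched.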
In both of the aforementioned caveat cases, it is worth noting that traditional kinematic modelling methods also struggle to accurately estimate parameters, particularly when working with moment maps. Extensions of the network's framework presented here to kinematically model datacubes may alleviate these issues and will be explored in future work.

[Figure 7: input and output maps (64 × 64 pixels) and predicted intensity and rotation-curve profiles for NGC 1387; network values i = 10.0 ± 0.1°, r_scale = 4.04 ± 0.06, r_turn = 0.33 ± 0.24, V_max = 69.4 ± 6.8 km s⁻¹.]

Figure 7. An example WISDOM galaxy, NGC 1387, observed in CO and evaluated using the network in combined testing mode. The left and middle columns share x- and y-axis sizes of 64 × 64 pixels; in this way we directly observe the input and output maps of the model. The right column has undergone an x-axis rescaling to match observational scales found in the literature. The black dashed lines and grey areas show, respectively, the mean and standard deviation for profiles predicted by the neural network model. The blue dashed line shows a major-axis cut of the input intensity map. The red dashed line shows the KinMS reconstructed rotation curve. The network-predicted parameters are shown as text in the upper-middle, upper-right, and lower-right subplots. We easily see that this galaxy has an intensity profile and a velocity profile which can be roughly approximated by an exponential and an arctan function respectively, meaning the kinematic parameters are well recovered by the model.
We have demonstrated the performance of a neural network model architecture which can be used to recover the rotation curves of galaxies from their kinematics. The model was tested on synthetically generated galaxies as well as on observations in both the Hi and CO emission lines.

Testing on synthetically generated galaxies has highlighted the powerful performance of the network, as well as areas where its performance is suboptimal. For the latter we have discussed solutions, including an additional convolution with the restoring beam to counteract the effects of "beam smearing", and flagging data in the combined large-beam and high-inclination regime.

Testing observational Hi data from THINGS has shown that this style of network is well suited to data like that expected from the SKA in the near future. We have shown that the network is capable of estimating velocity curves for discs exhibiting a variety of profiles. To do this, we have directly compared the rotation curves estimated by the network to those modelled directly from the cubes using kinematic modelling tools. The network is able to perform adequate recovery of parameters even in cases where it would not be possible to reproduce the true rotation curves. These promising results give us confidence that adopting more flexible decoder functions will extend the applicability of the model to more specific use cases, should one wish to model Hi discs exclusively.

Testing observational CO data from the WISDOM project has shown that the network is suitable for a range of emission-line observations.
Unlike traditional ML models, the network architecture and training styles outlined in this work avoid the need for transfer learning, which is often time consuming and fraught with ungainly challenges associated with systematic properties of training sets. We have shown that the model outlined in this work can recover rotation curves which heuristically match rotation curves extracted from ALMA observations using more time-consuming approaches.

As previously stated, improvements to the model architecture in this work include, but are not limited to: adapting the model to use more complex intensity and velocity profiles in the decoder subnet; automatically accounting for large-beam effects such as beam smearing and information loss, either via systematic offsets in model predictions or via the incorporation of an extra convolutional layer in the decoder subnet; and reintroducing a position angle estimation step. An idealised improvement on the model would be to work directly with interferometric datacubes themselves, or even visibilities, without the need to generate moment maps prior to training and testing. However, we have found that the discretised nature of channels in interferometric datacubes presents a non-gradient-trackable step in the decoder's reconstruction of datacube inputs. This discontinuity in the gradient tree prevents back-propagation via gradient descent and consequently halts model training. We propose adapting this self-supervised approach to work with datacubes as a lucrative avenue of research for challenging current kinematic modelling tools in preparation for the SKA and other upcoming large facilities.

ACKNOWLEDGEMENTS
This paper has received funding from the Science and Technology Facilities Council as part of the Cardiff, Swansea & Bristol Centre for Doctoral Training.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan Xp GPU used for this research. We also thank the anonymous reviewer whose comments and suggestions helped improve this manuscript.

JMD wishes to gratefully acknowledge the help of Dr Federico Lelli for providing rotation curves for the THINGS sample galaxies and valuable insight, both of which have contributed towards improving this paper.

TAD acknowledges support from the UK Science and Technology Facilities Council through grant ST/S00033X/1.

This research made use of Astropy, a community-developed Python package for astronomy (Astropy Collaboration et al. 2013, 2018), NumPy, an open source numerical computation library (Harris et al. 2020), and pandas, a data manipulation software library (pandas development team 2020).

This paper makes use of data obtained using the Jansky Very Large Array, a component of the National Radio Astronomy Observatory (NRAO). The NRAO is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.

This paper makes use of ALMA data. ALMA is a partnership of the ESO (representing its member states), NSF (USA), and NINS (Japan), together with the NRC (Canada), NSC and ASIAA (Taiwan), and KASI (Republic of Korea), in cooperation with the Republic of Chile. The Joint ALMA Observatory is operated by the ESO, AUI/NRAO, and NAOJ.

REFERENCES
Ackermann S., Schawinski K., Zhang C., Weigel A. K., Turp M. D., 2018, MNRAS, 479, 415
Aragon-Calvo M. A., 2019, arXiv e-prints, p. arXiv:1907.03957
Astropy Collaboration et al., 2013, A&A, 558, A33
Astropy Collaboration et al., 2018, AJ, 156, 123
Begum A., Chengalur J. N., Karachentsev I. D., 2005, A&A, 433, L1
Bekki K., 2019, MNRAS, 485, 1924
Blais-Ouellette S., Carignan C., Amram P., Côté S., 1999, AJ, 118, 2123
Bosma A., 2016, Proceedings of the International Astronomical Union, 11, 220
Bourlard H., Kamp Y., 1988, Biological Cybernetics, 59, 291
Breiman L., 2001, Machine Learning, 45, 5
Davis T. A., Bureau M., Cappellari M., Sarzi M., Blitz L., 2013, Nature, 494, 328
Davis T. A., Bureau M., Onishi K., Cappellari M., Iguchi S., Sarzi M., 2017a, MNRAS, 468, 4675
Davis T. A., et al., 2017b, MNRAS, 473, 3818
Davis T. A., Zabel N., Dawson J. M., 2020, KinMS: Three-dimensional kinematic modelling of arbitrary gas distributions (ascl:2006.003)
Dawson J. M., Davis T. A., Gomez E. L., Schock J., Zabel N., Williams T. G., 2019, MNRAS, 491, 2506
Dewdney P. E., Hall P. J., Schilizzi R. T., Lazio T. J. L. W., 2009, IEEE Proceedings, 97, 1482
Di Teodoro E. M., Fraternali F., 2015, MNRAS, 451, 3021
Dieleman S., Willett K. W., Dambre J., 2015, MNRAS, 450, 1441
Dinh L., Pascanu R., Bengio S., Bengio Y., 2017, arXiv e-prints, p. arXiv:1703.04933
Domínguez Sánchez H., Huertas-Company M., Bernardi M., Tuccillo D., Fischer J. L., 2018a, MNRAS, 476, 3661
Domínguez Sánchez H., et al., 2018b, MNRAS, 484, 93
Gal Y., 2016, PhD thesis, University of Cambridge
Gal Y., Ghahramani Z., 2016, in Lee D. D., Sugiyama M., Luxburg U. V., Guyon I., Garnett R., eds, Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp 1019–1027
Gentile et al., 2013, A&A, 554, A125
Hansen S., Conselice C. J., Fraser-McKelvie A., Ferreira L., 2020, Research Notes of the American Astronomical Society, 4, 185
Harris C. R., et al., 2020, Nature, 585, 357
Heald G., et al., 2011, A&A, 526, A118
Hibbard J. E., van der Hulst J. M., Barnes J. E., Rich R. M., 2001, AJ, 122, 2969
Hinton G. E., Salakhutdinov R. R., 2006, Science, 313, 504
Johnston S., et al., 2007, Publ. Astron. Soc. Australia, 24, 174
Johnston S., et al., 2008, Experimental Astronomy, 22, 151
Jonas J., MeerKAT Team, 2016, in MeerKAT Science: On the Pathway to the SKA. p. 1
Józsa G. I. G., Kenn F., Klein U., Oosterloo T. A., 2007, A&A, 468, 731
Kamphuis P., Józsa G. I. G., Oh S.-H., Spekkens K., Urbancic N., Serra P., Koribalski B. S., Dettmar R.-J., 2015, MNRAS, 452, 3139
Kawaguchi K., Pack Kaelbling L., Bengio Y., 2017, arXiv e-prints, p. arXiv:1710.05468
Kingma D. P., Ba J., 2014, arXiv e-prints, p. arXiv:1412.6980
Kodirov E., Xiang T., Gong S., 2017, arXiv e-prints, p. arXiv:1704.08345
Koribalski B. S., et al., 2018, MNRAS, 478, 1611
Krajnović D., Cappellari M., de Zeeuw P. T., Copin Y., 2006, MNRAS, 366, 787
Krizhevsky A., Sutskever I., Hinton G. E., 2012, in Pereira F., Burges C. J. C., Bottou L., Weinberger K. Q., eds, Advances in Neural Information Processing Systems 25. Curran Associates, Inc., pp 1097–1105
Lelli F., McGaugh S. S., Schombert J. M., 2016, AJ, 152, 157
Li D., Pan Z., 2016, Radio Science, 51, 1060
Lim T.-S., Loh W.-Y., Shih Y.-S., 2000, Machine Learning, 40, 203
Liu X., Zhang F., Hou Z., Wang Z., Mian L., Zhang J., Tang J., 2020, arXiv e-prints, p. arXiv:2006.08218
Martinsson T. P. K., Verheijen M. A. W., Westfall K. B., Bershady M. A., Andersen D. R., Swaters R. A., 2013, A&A, 557, A131
Masci J., Meier U., Cireşan D., Schmidhuber J., 2011, in Honkela T., Duch W., Girolami M., Kaski S., eds, Artificial Neural Networks and Machine Learning – ICANN 2011. Springer, Berlin, Heidelberg, pp 52–59
North E. V., et al., 2019, MNRAS, 490, 319
Oh S.-H., Staveley-Smith L., Spekkens K., Kamphuis P., Koribalski B. S., 2017, MNRAS, 473, 3256
Onishi K., Iguchi S., Davis T. A., Bureau M., Cappellari M., Sarzi M., Blitz L., 2017, MNRAS, 468, 4663
Paszke A., et al., 2017, in NIPS-W
Plaut E., 2018, arXiv e-prints, p. arXiv:1804.10253
Rix H.-W., Guhathakurta P., Colless M., Ing K., 1997, MNRAS, 285, 779
Rumelhart D. E., Hinton G. E., Williams R. J., 1986, Nature, 323, 533
Sancisi R., Fraternali F., Oosterloo T., van der Hulst T., 2008, A&ARv, 15, 189
Serra P., et al., 2013, MNRAS, 428, 370
Shen A. X., Bekki K., 2020, MNRAS, 497, 5090
Smith M. D., et al., 2019, MNRAS, 485, 4359
Spekkens K., Sellwood J. A., 2007, ApJ, 664, 204
Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R., 2014, Journal of Machine Learning Research, 15, 1929
Stark D. V., et al., 2018, MNRAS, 480, 2217
Swaters R. A., 1999, PhD thesis
Walmsley M., et al., 2019, MNRAS, 491, 1554
Walter F., Brinks E., de Blok W., Bigiel F., Kennicutt R., Thornley M., Leroy A., 2008, AJ, 136, 2563
Warren B. E., Jerjen H., Koribalski B. S., 2004, AJ, 128, 1152
Wong T., Blitz L., 2002, ApJ, 569, 157
Zabel N., et al., 2020, MNRAS, 496, 2155
de Blok W. J. G., Walter F., Brinks E., Trachternach C., Oh S. H., Kennicutt R. C. J., 2008, AJ, 136, 2648
pandas development team, 2020, pandas-dev/pandas: Pandas, doi:10.5281/zenodo.3509134
van Albada T. S., Bahcall J. N., Begeman K., Sancisi R., 1985, ApJ, 295, 305

DATA AVAILABILITY

The data and scripts underlying this article are available via GitHub, at https://github.com/SpaceMeerkat/Corellia.

APPENDIX A: EXTRA MATERIAL

Table A1. Information regarding the THINGS sample galaxies used throughout this work. Columns give the following information: Object, the target name as given in THINGS project publications; Publication, the relevant publication in which the THINGS targets appear.

Object: DDO 53, NGC 925, NGC 2403, NGC 2841, NGC 2903, NGC 3184, NGC 3198, NGC 3351, NGC 3521, NGC 3621, NGC 4736, NGC 4826, NGC 5055, NGC 5236, NGC 6946, NGC 7331, NGC 7793
Publication: all from Walter et al. (2008); de Blok et al. (2008)
This paper has been typeset from a TEX/LATEX file prepared by the author.

[Figure A1: median offset med(r_scale,pred − r_scale) versus r_scale, for beam sizes 4–10.]
Figure A1. The effects of varying the ratio of beam size to galaxy extent. It is clear that an increased beam size results in an artificial lengthening of the intensity profile scale length. It can also be seen that the spread in median offset increases with r_scale, which occurs due to information loss as the convolved flux is "smeared" out beyond the field of view. The value of r_scale at which this effect begins to take hold is clearly inversely proportional to the beam size.
Table A2. Information regarding the WISDOM project sample used throughout this work. Table columns give the following information: Object, the target name as given in WISDOM project publications; Observation type, the emission line ALMA observed for the target; Publication, the relevant publication in which ALMA observations of the targets appear.

Object    Observation type  Publication
NGC 3665  CO(2-1)           Onishi et al. (2017)
NGC 0383  CO(2-1)           North et al. (2019)
NGC 0524  CO(2-1)           Smith et al. (2019)
NGC 1387  CO(2-0)           Zabel et al. (2020), Boyce et al. (in prep)
NGC 4429  CO(3-2)           Davis et al. (2017b)
NGC 4697  CO(2-1)           Davis et al. (2017a)