DeepGalaxy: Deducing the Properties of Galaxy Mergers from Images Using Deep Neural Networks
Maxwell X. Cai, Jeroen Bédorf, Vikram A. Saletore, Valeriu Codreanu, Damian Podareanu, Adel Chaibi, Penny X. Qian
Maxwell X. Cai (SURF Cooperative, Amsterdam, The Netherlands) [email protected]
Jeroen Bédorf (minds.ai, Santa Cruz, CA, USA) [email protected]
Vikram A. Saletore (Intel Corporation, Hillsboro, OR, USA) [email protected]
Valeriu Codreanu (SURF Cooperative, Amsterdam, The Netherlands) [email protected]
Damian Podareanu (SURF Cooperative, Amsterdam, The Netherlands) [email protected]
Adel Chaibi (Intel Corporation, Paris, France) [email protected]
Penny X. Qian (Leiden University, Leiden, The Netherlands) [email protected]
Abstract—Galaxy mergers, the dynamical processes during which two galaxies collide, are among the most spectacular phenomena in the Universe. During this process, the two colliding galaxies are tidally disrupted, producing significant visual features that evolve as a function of time. These visual features contain valuable clues for deducing the physical properties of the galaxy mergers. In this work, we propose DeepGalaxy, a visual analysis framework trained to predict the physical properties of galaxy mergers based on their morphology. Built on an encoder-decoder architecture, DeepGalaxy encodes the input images into a compressed latent space z, and determines the similarity of images according to their latent-space distance. DeepGalaxy consists of a fully convolutional autoencoder (FCAE), which generates activation maps in its 3D latent space; a variational autoencoder (VAE), which compresses the activation maps into a 1D vector; and a classifier that generates labels from the activation maps. The backbone of the FCAE can be fully customized according to the complexity of the images. DeepGalaxy demonstrates excellent scaling performance on parallel machines: on the Endeavour supercomputer, the scaling efficiency exceeds 0.93 when trained with 128 workers, and it remains above 0.73 when trained with 512 workers. Without having to carry out expensive numerical simulations, DeepGalaxy infers the physical properties of galaxy mergers directly from images, and thereby achieves a speedup of several orders of magnitude.

Index Terms—image classification, image searching, high-performance computing, astrophysics, galaxy mergers.
I. INTRODUCTION
Recent research suggests that there are up to ~2 × 10^12 galaxies in the observable Universe [1], [2], each containing anywhere from millions to hundreds of billions of stars. Galaxies gravitationally interact with each other during their formation and evolution histories [3], [4]. As a consequence, two interacting galaxies are shaped by each other's tidal field, resulting in apparent distortions of their appearance. One of the most violent interaction events is the physical collision of two galaxies, namely a galaxy merger. These spectacular events are crucial to the evolution history of galaxies, as they profoundly impact the kinematics and star formation efficiency of galaxies. While the galaxy merger rate is expected to have been higher in the early epoch of the Universe [5], galaxy mergers do still happen in the present-day Universe. For example, the Milky Way galaxy is expected to collide with the Andromeda galaxy within a timescale of a few Gyr [6].

The galaxy merger rate over cosmic time is one of the fundamental measures of the evolution of galaxies [7]. Galaxy merger rates have been estimated through several approaches, including (semi-)analytical models [8], observational data [7], [9], [10], and cosmological simulation data [11]. Recently, state-of-the-art surveys have offered a wealth of observational galaxy images. Should these images be properly classified according to their morphology, a decent estimate of merger rates and galactic statistical properties could be obtained. However, the large number of images (typically of order 10^8) renders it impossible for staff scientists to complete the classification themselves. For example, the GALAXY ZOO dataset consists of nearly 900,000 galaxies imaged by the Sloan Digital Sky Survey. To address this challenge, the GALAXY ZOO project publishes its images online, inviting citizen scientists to contribute to the classification of the galaxy images [12].
The citizen science approach has proven that an astrophysical problem can be translated into a pattern recognition problem, which is then efficiently tackled through the efforts of hundreds of thousands of volunteers. In recent years, the field of image processing and pattern recognition has been boosted by the promising development of deep learning, in particular deep convolutional neural networks (CNNs). Indeed, with the support of high-performance computing hardware, state-of-the-art CNNs such as ResNet [13] and EfficientNet [14] can deliver robust results even when the images have low signal-to-noise ratios, obscured backgrounds, and/or distortions [15]-[19]. As such, it is feasible to automate the astronomical image classification task, which has traditionally been done by thousands of human volunteers, using deep neural networks (DNNs). The existing classification results from human volunteers can then be used as training datasets for building an appropriate DNN. When properly designed and trained, a DNN can deliver a speedup of many orders of magnitude compared to human classifiers, allowing astronomers to process much more data and thereby obtain better statistics.

Roughly speaking, most galaxy image classification tasks (e.g., [20]-[22]) fall into two categories: first, the binary classification task of determining whether a galaxy image shows a galaxy merger or a single galaxy; and second, for single galaxies, the multi-class classification problem of determining the galaxy type in the Hubble sequence based on its morphology. In this project, we propose DeepGalaxy, an extension of the first category. Instead of merely indicating whether a galaxy image represents a galaxy merger, we aim to predict the timescale at which the two merging galaxies will collide. This paper is organized as follows: The implementation of
DeepGalaxy is presented in Section II; the performance results are shown in Section III; finally, the conclusions are summarized in Section IV.

II. IMPLEMENTATION
A. Training Data
DeepGalaxy is a general-purpose galaxy image-processing framework, so astronomers are free to choose the datasets of interest to feed into it. In our case, the training data consist of two datasets: simulated galaxy merger images with strong labels, and observation images without labels (to the best of our knowledge, no dataset of observation images with dynamical-timescale labels is available to the community). The simulated galaxy merger images and their corresponding labels are obtained through high-resolution N-body simulations. We use Bonsai [23], [24], a GPU-accelerated Barnes-Hut tree code [25], to carry out simulations of galaxy mergers with various initial conditions. The Bonsai simulations take into account both baryon particles and dark matter particles. The density profile of the galaxies is sampled from the Navarro-Frenk-White (NFW) profile [26]. Each galaxy is resolved with a large number of particles, and Bonsai builds a Barnes-Hut tree with 19 levels according to the particle density. The simulations conserve energy rather well. We survey a grid of initial conditions with four different mass ratios of the two galaxies. The two interacting galaxies are initially placed many times their radii away from each other, such that their mutual tidal force is negligible. The two galaxies, however, are given velocity vectors that will lead to a collision. As they approach each other, their mutual tidal forces become increasingly stronger, which in turn shapes their morphology. At each visualization time step, the galaxy merger is visualized, and the visualization is captured by virtual cameras from 14 different positions. These virtual cameras generate images of 2048 x 2048 pixels, large enough to resolve minute details of the merger. The dynamical timescale of the merger is encoded with 71 classes, with class 0 representing the galaxies being the furthest away from each other (and therefore requiring the longest time to merge), and class 70 representing the state in which the two galaxies are merged and completely virialized (i.e., having reached a dynamical equilibrium state). A gallery of sample images is shown in Fig. 1. This dataset consists of 35,784 images from 36 N-body simulations with different initial conditions.
The simulations are carried out using SiMon [27], an open-source simulation monitor for automating the execution of astrophysical N-body simulations. The dataset generated through these simulations is balanced. We randomly sample 80% of the dataset as training data, and reserve the remaining 20% for validation.

The dataset with observation images is derived from the GALAXY ZOO data. While that dataset contains labels about galaxy morphology type, the labels are not directly usable for supervised training because they do not contain information about the dynamical timescale of the merger process. Therefore, we process this dataset with unsupervised training.

B. Neural Network Architecture
Having taken into account that most observation images are unlabeled, we develop a DNN architecture that supports both supervised and unsupervised training. The architecture is shown in Fig. 2. The DNN consists of three parts:

1) Supervised learning: the input images X generated from N-body simulations pass through an encoder E, which generates feature maps M. The feature maps are then flattened and passed through a few fully connected (FC) layers. Eventually, labels Y are generated through a softmax activation function.
2) Unsupervised learning of images: a fully convolutional autoencoder (FCAE) that aims to minimize the difference between X and the generated images X̂. Since there is no flatten or fully connected layer, the FCAE is invariant to image resolution.
3) Unsupervised learning of feature maps: a variational autoencoder (VAE) [28] that encodes the flattened feature maps M′ into 1D compressed latent vectors.

These three parts share weights, but each of them can be trained separately. If the training data are dominated by the labeled simulated galaxy merger images, then DeepGalaxy becomes a standard CNN classifier, where auxiliary unlabeled images (if any) can be used as a regularization on the feature maps M. The backbone of the convolutional encoder E and convolutional decoder D can be either a simple network or a state-of-the-art one with residual blocks (e.g., ResNet and
EfficientNet). The choice of the backbone depends on the complexity of the input data. On the other hand, if the data is dominated by unlabeled observational data,
DeepGalaxy can be used as an unsupervised clustering algorithm, andthe labeled images can be considered as a regularization of M . If comparing the image similarity is the main interest, These 36 simulations are merely used for demo purposes. It is unlikely thatthey cover a full range of parameter space needed to study galaxy mergers.For actual astrophysical projects, more data is likely needed. https://github.com/maxwelltsai/SiMon =0 C=5 C=10 C=15 C=20 C=25 C=30C=35 C=40 C=45 C=50 C=55 C=60 C=65 Fig. 1. A gallery with a few sample images from the simulation-generated training dataset. The class labels (marked in the title of the subplots) are obtaineddirectly from the N -body simulation. It indicates the timescale at which two galaxies will collide. At C = 0 , the two galaxies are still far away from eachother; at C = 20 , collision is imminent; at C = 70 , the two galaxies are fully merged into an elliptical galaxy and new dynamical equilibrium has beenestablished. then DeepGalaxy can be considered as an image searchframework, where images are compared according to theirdistances in the latent space z . By training the VAE withthe high-level representation M instead of the pixel-level rawimages X , the VAE has access to the global features of X ,making it more robust. The encoded latent vector z is a 1Drepresentation with a small size (typically with the length of 16or 32). As such, comparing images via z is computationallymuch more affordable than comparing directly at the pixellevel.The current implementation of DeepGalaxy is based onTensorFlow.
C. Parallelization
As mentioned above, the backbone of E and D can be chosen according to the actual complexity of the images. In scenarios where processing high-resolution (observational) images is the main interest, it is usually necessary to use a larger and deeper convolutional network as the backbone. Notably, the newly developed EfficientNet has fewer parameters but higher accuracy than ResNet50. Even so, using the largest variant of EfficientNet, EfficientNetB7, to process images with a resolution of 2048 x 2048 pixels requires roughly 66 million parameters and a memory footprint of 150 GB (with a batch size of 1). Consequently, in this regime, parallelization becomes particularly important, if not required. We implement the parallelization using Horovod, a data-parallelism framework built on top of popular deep learning frameworks (such as TensorFlow and PyTorch) and high-performance collective communication frameworks (such as MPI and NCCL). As such, DeepGalaxy can easily be scaled up in a multi-node, multi-worker high-performance computing environment.
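Conceptually, Horovod's data parallelism boils down to each worker computing gradients on its local mini-batch and then averaging them across all workers with an allreduce before the weight update. A dependency-free sketch of that averaging semantics (computed serially here purely for illustration; Horovod performs it as a distributed ring-allreduce):

```python
import numpy as np

def allreduce_average(worker_grads):
    """Average per-worker gradient arrays, as Horovod's allreduce does.

    Each entry of worker_grads is one worker's gradient for the same tensor.
    After this step, every worker holds the identical averaged gradient, so
    the update is equivalent to a single step over the global batch
    (local batch size x number of workers).
    """
    return np.mean(np.stack(worker_grads), axis=0)
```

This is why the global batch size in our runs is the local batch size multiplied by the number of workers (e.g., 8 x 512 = 4096).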
D. Hardware Environment
We carry out the training and benchmarking of DeepGalaxy on the Endeavour HPC supercomputer hosted at Intel Corporation. Since the memory footprint required for the training is usually larger than the available memory of the latest GPUs, the training is carried out on CPUs. We use up to 256 nodes on Endeavour, each node with two Intel(R) Xeon(R) Scalable Processor 8260 CPUs and 192 GB of memory. The compute nodes are interconnected with the Intel(R) Omni-Path Architecture.

III. RESULTS
A. Classification Accuracy
For the convolutional classifier, we measure the classification accuracy on an EfficientNetB4 network, which consists of roughly 18 million parameters (comparable to ResNet50, which has about 23 million parameters). The network is initialized with ImageNet weights, and is subsequently trained on simulated galaxy merger images of 512 x 512 pixels (while the original resolution of the simulation-generated images is 2048 x 2048 pixels, we down-sampled them to 512 x 512 pixels to speed up the training). We use Optuna (https://optuna.org/), a hyper-parameter optimization tool, to obtain the optimal learning rate. With the AdaDelta optimizer and a learning rate of 8.0, the network converges within about 12 epochs, as shown in Fig. 3. In comparison, without the ImageNet weight initialization, it takes about 40 epochs of training to converge. The local batch size is 8, and with 512 workers the global batch size is 4096. This setting results in a memory footprint of 16 GB per worker. With 256 computing nodes on the Endeavour supercomputer, convergence is reached within 15 minutes.

Fig. 2. The architecture of DeepGalaxy. The DNN consists of three parts: (1) a fully convolutional autoencoder that aims to minimize the difference between the input images (denoted X) and the generated images (denoted X̂); the encoded feature maps are denoted M. (2) A fully connected network (FC) that associates the feature maps M with the labels Y. (3) A variational autoencoder that encodes the flattened feature maps M′ into a 1D latent vector z (sampled as z = μ + σ · N(0, 1)) as it reproduces M′.

In addition, to study how the image resolution affects convergence, we train another EfficientNetB4 network on images of 1024 x 1024 pixels. Using the same hardware configuration, with a local batch size of 8 (global batch size 4096), convergence is reached within 16 epochs, as shown in Fig. 4. This configuration has a memory footprint of 43 GB per worker, which is too large to fit into GPU memory. With 256 computing nodes on the Endeavour supercomputer, convergence is reached within 15 minutes.
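At its core, the learning-rate search that a tool like Optuna automates samples candidate rates (typically log-uniformly) and keeps the best-scoring trial. A dependency-free sketch of that loop, with a stand-in objective rather than our actual training run:

```python
import math
import random

def tune_learning_rate(objective, n_trials=20, low=1e-3, high=10.0, seed=0):
    """Log-uniform random search over the learning rate.

    Returns the (learning_rate, score) pair of the best trial. Optuna's
    TPE sampler is smarter than pure random search, but the trial loop and
    the log-uniform search space are the same idea.
    """
    rng = random.Random(seed)
    best_lr, best_score = None, -math.inf
    for _ in range(n_trials):
        lr = math.exp(rng.uniform(math.log(low), math.log(high)))
        score = objective(lr)  # e.g., validation accuracy after a few epochs
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score
```

In practice the objective would train the network for a few epochs at the candidate learning rate and return the validation accuracy.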
B. Unsupervised Learning Performance
As shown in Fig. 5, the VAE manages to learn an embedding in its latent space z. As such, when interpolating z, a grid of smoothly varying images is generated. Therefore, image similarity can be measured by calculating ||z_q - z_i||, where z_q is the latent vector of a query image and z_i is the latent vector of a training image. By retrieving the k-nearest-neighbor (kNN) vectors around the query vector z_q, the latent model essentially suggests k similar images, and the physical properties of the query image can therefore be inferred from those of similar images. However, this algorithm requires defining the number of neighbors, which could be a limitation. In order to get fully unsupervised recommendations for an arbitrary query, we extend the previous algorithm and build the kNN graph based on the distances between samples. We then calculate the discrete graph Laplacian, obtaining a matrix that measures to what extent the graph differs at one vertex (a sample) from nearby vertices (other samples). This allows us to explore the connected components and the spectrum of the graph. Finally, we perform a spectral clustering step in order to obtain the final recommendations. The advantage of exploring the graph structure in this way is that the initial choice of k becomes less important. When running queries through the synthetic dataset, we validate the recommendation quality against the structural similarity index measure (SSIM). Fig. 6 demonstrates the effectiveness of this approach: the recommendations generated by the VAE yield a high degree of consistency with the SSIM score.

Fig. 3. The loss and accuracy of the training dataset and the validation dataset as a function of epochs. The loss is measured with the mean squared error (MSE). The image resolution is 512 x 512 pixels. The CNN backbone is EfficientNetB4, initialized with ImageNet weights. The training is carried out on 256 nodes, each with 2 workers (thus 512 workers in total). We use a momentum of 0.9 in the batch normalization layers.

Fig. 4. Same as Fig. 3, but for images of 1024 x 1024 pixels.

Fig. 5. Galaxy merger images generated by the VAE when interpolating the latent vector z.

C. Scaling Performance
We perform the training on different numbers of nodes on the Endeavour cluster, thereby obtaining the scaling performance. As shown in Fig. 7, DeepGalaxy is configured with an EfficientNetB4 backbone network and is trained on the 1024 x 1024 pixel images. With the number of workers N_P ranging from 4 to 1024 (corresponding to 1 to 256 nodes, each node with 4 workers), the throughput scales nearly linearly as a function of N_P for N_P <= 128; the scaling efficiency at N_P = 128 exceeds 0.93. The communication overhead starts to pick up for N_P > 128. Nevertheless, the scaling efficiency remains significant even at N_P = 512, reaching a value of 0.73. This demonstrates that DeepGalaxy is able to handle very large and complex datasets when scaling up on massively parallel computers.
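The scaling-efficiency figures quoted above follow directly from the measured throughput. A short sketch of the bookkeeping, using hypothetical throughput values rather than our actual measurements:

```python
def scaling_metrics(throughput, base=4):
    """Speed-up S and efficiency from measured throughput (images/sec).

    throughput maps the worker count N_P to images/sec. S is normalized so
    that ideal scaling gives S = N_P (as in Fig. 7), and the efficiency is
    S / N_P, a value in [0, 1] with 1 meaning perfect scaling.
    """
    speedup = {n: t / throughput[base] * base for n, t in throughput.items()}
    efficiency = {n: speedup[n] / n for n in throughput}
    return speedup, efficiency
```

For example, hypothetical throughputs of 100, 3000, and 9350 images/sec at N_P = 4, 128, and 512 would give efficiencies of 1.0, about 0.94, and about 0.73, respectively.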
D. Inference Speed-Up
When using FP32 precision, the trained model performs an inference classification within seconds. In contrast, it would take approximately 1-2 days with the traditional method of carrying out a numerical simulation of the galaxy merger and predicting its properties from it (depending on the initial conditions and the resolution of the simulation). As such, a speedup factor of roughly four to five orders of magnitude is achieved when using DeepGalaxy in place of traditional numerical simulations. Greater speedups can be achieved with reduced-precision inference. Such a speedup over simulation is extremely promising for tackling the very large amount of galaxy merger data, as it allows astrophysicists to quickly obtain the statistical properties of large numbers of samples.

IV. CONCLUSIONS
Big Data is a constant challenge in modern astrophysical research. With terabytes of images generated by telescopes every night, astronomers usually have to depend on the help of thousands of volunteers to process the data. In recent years, image processing and pattern recognition have advanced significantly thanks to the promising development of deep neural networks (DNNs). With state-of-the-art DNN architectures and high-performance computing technologies, it is increasingly feasible to use DNNs to construct highly efficient and robust astrophysical imagery data-processing pipelines.

In this project, we develop a suite of DNNs, namely DeepGalaxy, to perform classification and unsupervised analysis (e.g., morphology matching) of galaxy images. We present an example of using DeepGalaxy to predict the dynamical timescales of major galaxy mergers from their images. Reconstructing the intrinsic properties or initial conditions of a physical system from observations is a typical inverse problem, which traditionally requires carrying out extensive astrophysical N-body simulations. DeepGalaxy aims to bypass the need for such simulations. Trained with both simulation data and observational data, DeepGalaxy aims to extract the information directly from the morphology of galaxies. It can be used as a convolutional classifier, an unsupervised clustering algorithm, and an image-pairing algorithm. Without having to carry out numerical simulations, this approach achieves a speedup of several orders of magnitude.

As a general-purpose galaxy image-processing framework, DeepGalaxy (https://github.com/maxwelltsai/DeepGalaxy) is designed for a wide range of galaxy image datasets. Depending on the complexity of the input image data, DeepGalaxy allows users to freely choose the architecture of the convolutional backbone as needed. Furthermore, the training can be scaled up efficiently on massively parallel machines (tested on up to 256 nodes, 1024 workers), either on CPUs or on GPUs. On the Endeavour supercomputer, we obtain a scaling efficiency of 0.93 when training with 128 workers, and an efficiency of 0.73 when training with 512 workers.

Fig. 6. The latent-variable model (consisting of a convolutional feature encoder, a variational autoencoder, and a convolutional feature decoder) is able to generate a few suggestions based on the query images (plotted on the left). The suggestions are sorted by descending similarity, where 1 indicates a high degree of similarity and 6 indicates a low-to-moderate degree of similarity. The similarity of images is quantified with the structural similarity index measure (SSIM), in which a score of 1.0 means identical and a score of 0 means completely different.

Fig. 7. The scaling performance of DeepGalaxy as a function of the number of workers N_P. Left y-axis: the speedup ratio S quantifies the throughput ratio between N_P workers and one node. In the ideal case, S = N_P (dashed diagonal line), but the actual measurement (thick black curve) is smaller than N_P due to gradient communication and other overheads. Right y-axis: the throughput (Tput, orange curve) quantifies the number of images trained per second by the whole system. Grey numbers along the thick black curve: the scaling efficiency as a function of N_P (normalized to N_P = 4), a value in the range [0, 1]; a scaling efficiency of 1 indicates perfect scaling. This benchmark is carried out with the 1024 x 1024 pixel images and the EfficientNetB4 network on the Endeavour supercomputer. For the case of N_P = 512, the training and validation accuracy/loss are shown in Fig. 4.

ACKNOWLEDGMENT
We thank the anonymous referees for their constructive comments, which helped to improve the paper. We thank Malavika Vasist, Bruno Alves, Simon Portegies Zwart, Reinout van Weeren, and Hans Pabst for the numerous insightful discussions that we have had so far.

REFERENCES

[1] J. R. Gott III, M. Jurić, D. Schlegel, F. Hoyle, M. Vogeley, M. Tegmark, N. Bahcall, and J. Brinkmann, "A Map of the Universe,"
ApJ, vol. 624, pp. 463-484, May 2005.
[2] C. J. Conselice, A. Wilkinson, K. Duncan, and A. Mortlock, "The Evolution of Galaxy Number Density at z < 8 and Its Implications," ApJ, vol. 830, p. 83, Oct 2016.
[3] A. Toomre, "Mergers and Some Consequences," in Evolution of Galaxies and Stellar Populations (B. M. Tinsley and R. B. Larson, eds.), p. 401, Jan 1977.
[4] J. E. Barnes and L. Hernquist, "Dynamics of interacting galaxies," ARAA, vol. 30, pp. 705-742, Jan 1992.
[5] G. F. Snyder, V. Rodriguez-Gomez, J. M. Lotz, P. Torrey, A. C. N. Quirk, L. Hernquist, M. Vogelsberger, and P. E. Freeman, "Automated distant galaxy merger classifications from Space Telescope images using the Illustris simulation," MNRAS, vol. 486, pp. 3702-3720, Jul 2019.
[6] S. T. Sohn, J. Anderson, and R. P. van der Marel, "The M31 Velocity Vector. I. Hubble Space Telescope Proper-motion Measurements," ApJ, vol. 753, p. 7, Jul 2012.
[7] J. M. Lotz, P. Jonsson, T. J. Cox, D. Croton, J. R. Primack, R. S. Somerville, and K. Stewart, "The Major and Minor Galaxy Merger Rates at z < 1.5," ApJ, vol. 742, p. 103, Dec 2011.
[8] C. Lacey and S. Cole, "Merger rates in hierarchical models of galaxy formation," MNRAS, vol. 262, pp. 627-649, Jun 1993.
[9] E. F. Bell, S. Phleps, R. S. Somerville, C. Wolf, A. Borch, and K. Meisenheimer, "The Merger Rate of Massive Galaxies," ApJ, vol. 652, pp. 270-276, Nov 2006.
[10] D. W. Darg, S. Kaviraj, C. J. Lintott, K. Schawinski, M. Sarzi, S. Bamford, J. Silk, R. Proctor, D. Andreescu, P. Murray, R. C. Nichol, M. J. Raddick, A. Slosar, A. S. Szalay, D. Thomas, and J. Vandenberg, "Galaxy Zoo: the fraction of merging galaxies in the SDSS and their morphologies," MNRAS, vol. 401, pp. 1043-1056, Jan 2010.
[11] V. Rodriguez-Gomez, S. Genel, M. Vogelsberger, D. Sijacki, A. Pillepich, L. V. Sales, P. Torrey, G. Snyder, D. Nelson, V. Springel, C.-P. Ma, and L. Hernquist, "The merger rate of galaxies in the Illustris simulation: a comparison with observations and semi-empirical models," MNRAS, vol. 449, pp. 49-64, May 2015.
[12] C. Lintott, K. Schawinski, S. Bamford, A. Slosar, K. Land, D. Thomas, E. Edmondson, K. Masters, R. C. Nichol, M. J. Raddick, A. Szalay, D. Andreescu, P. Murray, and J. Vandenberg, "Galaxy Zoo 1: data release of morphological classifications for nearly 900 000 galaxies," MNRAS, vol. 410, pp. 166-178, Jan 2011.
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2015.
[14] M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," arXiv e-prints, p. arXiv:1905.11946, May 2019.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 1097-1105, Curran Associates, Inc., 2012.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper with Convolutions," arXiv e-prints, p. arXiv:1409.4842, Sep 2014.
[17] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv e-prints, p. arXiv:1409.1556, Sep 2014.
[18] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv e-prints, p. arXiv:1512.03385, Dec 2015.
[19] L. Ferreira, C. J. Conselice, K. Duncan, T.-Y. Cheng, A. Griffiths, and A. Whitney, "Galaxy Merger Rates up to z ~ 3 Using a Bayesian Deep Learning Model," ApJ, vol. 895, p. 115, June 2020.
[20] S. Ackermann, K. Schawinski, C. Zhang, A. K. Weigel, and M. D. Turp, "Using transfer learning to detect galaxy mergers," MNRAS, vol. 479, pp. 415-425, Sep 2018.
[21] M. Walmsley, L. Smith, C. Lintott, Y. Gal, S. Bamford, H. Dickinson, L. Fortson, S. Kruk, K. Masters, C. Scarlata, B. Simmons, R. Smethurst, and D. Wright, "Galaxy Zoo: Probabilistic Morphology through Bayesian CNNs and Active Learning," MNRAS, p. 2414, Oct 2019.
[22] N. O. Ralph, R. P. Norris, G. Fang, L. A. F. Park, T. J. Galvin, M. J. Alger, H. Andernach, C. Lintott, L. Rudnick, S. Shabala, and O. I. Wong, "Radio Galaxy Zoo: Unsupervised Clustering of Convolutionally Auto-encoded Radio-astronomical Images," PASP, vol. 131, p. 108011, Oct 2019.
[23] J. Bédorf, E. Gaburov, M. S. Fujii, K. Nitadori, T. Ishiyama, and S. Portegies Zwart, "24.77 Pflops on a gravitational tree-code to simulate the Milky Way Galaxy with 18600 GPUs," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '14, (Piscataway, NJ, USA), pp. 54-65, IEEE Press, 2014.
[24] J. Bédorf, E. Gaburov, and S. Portegies Zwart, "A sparse octree gravitational N-body code that runs entirely on the GPU processor," Journal of Computational Physics, vol. 231, pp. 2825-2839, Apr. 2012.
[25] J. Barnes and P. Hut, "A hierarchical O(N log N) force-calculation algorithm," Nature, vol. 324, pp. 446-449, Dec 1986.
[26] J. F. Navarro, C. S. Frenk, and S. D. M. White, "The Structure of Cold Dark Matter Halos," ApJ, vol. 462, p. 563, May 1996.
[27] P. X. Qian, M. X. Cai, S. Portegies Zwart, and M. Zhu, "SiMon: Simulation Monitor for Computational Astrophysics," PASP, vol. 129, p. 094503, Sept. 2017.
[28] D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," arXiv e-prints, p. arXiv:1312.6114, Dec 2013.