[PDF] Analyzing the Galactic pulsar distribution with machine learning

Abstract

We explore the possibility of inferring the properties of the Galactic neutron star population through machine learning. In particular, in this paper we focus on their dynamical characteristics and show that an artificial neural network is able to estimate with high accuracy the parameters which control the current positions of a mock population of pulsars. For this purpose, we implement a simplified population-synthesis framework (where selection biases are neglected at this stage) and concentrate on the natal kick-velocity distribution and the distribution of birth distances from the Galactic plane. By varying these and evolving the pulsar trajectories in time, we generate a series of simulations that are used to train and validate a suitably structured convolutional neural network. We demonstrate that our network is able to recover the parameters governing the kick-velocity and Galactic height distribution with a mean relative error of about 10^{-2}. We discuss the limitations of our idealized approach and study a toy problem to introduce selection effects in a phenomenological way by incorporating the observed proper motions of 216 isolated pulsars. Our analysis highlights that increasing the sample of pulsars with accurate proper motion measurements by a factor of \sim10, one of the future breakthroughs of the Square Kilometer Array, we might succeed in constraining the birth spatial and kick-velocity distribution of the neutron stars in the Milky Way with high precision through machine learning.

Full PDF

MMNRAS , 1–27 (2021) Preprint 18 January 2021 Compiled using MNRAS L A TEX style ﬁle v3.0

Analyzing the Galactic pulsar distribution with machine learning

Michele Ronchi, , ★ Vanessa Graber, , † Alberto Garcia-Garcia, , , ‡ José A. Pons, Nanda Rea , Institute of Space Sciences (ICE, CSIC), Campus UAB, Carrer de Can Magrans s/n, 08193 Barcelona, Spain Institut d’Estudis Espacials de Catalunya (IEEC), 08034 Barcelona, Spain Departament de Física Aplicada, Universitat d’Alacant, 03690 Alicante, Spain

Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT

We explore the possibility of inferring the properties of the Galactic neutron star populationthrough machine learning. In particular, in this paper we focus on their dynamical charac-teristics and show that an artiﬁcial neural network is able to estimate with high accuracythe parameters which control the current positions of a mock population of pulsars. For thispurpose, we implement a simpliﬁed population-synthesis framework (where selection biasesare neglected at this stage) and concentrate on the natal kick-velocity distribution and the dis-tribution of birth distances from the Galactic plane. By varying these and evolving the pulsartrajectories in time, we generate a series of simulations that are used to train and validatea suitably structured convolutional neural network. We demonstrate that our network is ableto recover the parameters governing the kick-velocity and Galactic height distribution with amean relative error of about 10 − . We discuss the limitations of our idealized approach andstudy a toy problem to introduce selection eﬀects in a phenomenological way by incorporatingthe observed proper motions of 216 isolated pulsars. Our analysis highlights that increasingthe sample of pulsars with accurate proper motion measurements by a factor of ∼

10, one ofthe future breakthroughs of the Square Kilometer Array, we might succeed in constrainingthe birth spatial and kick-velocity distribution of the neutron stars in the Milky Way with highprecision through machine learning.

Key words: stars: magnetic ﬁeld – stars: neutron – pulsars: general – methods: observational

Neutron stars have been observed to travel through the Galaxy withtypical velocities of around several hundreds of kilometers per sec-ond, reaching more than a thousand kilometers per second in someextreme cases (Chatterjee et al. 2005; Hobbs et al. 2005; Hui &Becker 2006; Pavan et al. 2014). Accurate information on neu-tron star positions and velocities in the Milky Way usually comesfrom radio timing and interferometric observations (see Chapters 8and 9 in Lorimer & Kramer 2004; Liu et al. 2020, and referencestherein) or high spatial resolution X-ray observations with, e.g., the

Chandra

X-ray observatory (Motch et al. 2009). These observationsprovide measurements of the pulsars’ angular positions in the skyand their proper motions projected onto the celestial sphere. In somecases, the radio pulse dispersion measure ( 𝐷𝑀 ) or the X-ray ab-sorption density ( 𝑁 H ) together with Galactic free electron-densityand hydrogen-density models (Balucinska-Church & McCammon ★ E-mail: [email protected] † E-mail: [email protected] ‡ E-mail: [email protected] © a r X i v : . [ a s t r o - ph . H E ] J a n Ronchi et al. distribution from current observational data is not straightforward.Most pulsars, especially those with very high velocities, have movedfar away from their birth places, and their proper motions have beenmodiﬁed by the Galactic gravitational potential. Thus, the currentvelocity of a pulsar may diﬀer substantially from its velocity at birth.Knowing the exact pulsar age and its current 3D spatial velocity,we are in principle able to recover the initial conditions by inte-grating the pulsar’s orbit back in time. However, in general we lackinformation about the pulsar’s line-of-sight velocity, and accurateknowledge about its age, since the characteristic age estimated fromthe pulsar period and its derivative can diﬀer signiﬁcantly from thetrue age (see e.g. Kaspi et al. 2001; Viganò et al. 2013). Further-more, estimates of pulsar distances have typically large associatederrors due to uncertainties in the underlying density models usedto convert pulsar 𝐷𝑀 or 𝑁 H into distance estimates (Lorimer et al.2006; He et al. 2013; Deller et al. 2019).Reconstruction of the three-dimensional initial position andvelocity distribution of pulsars, and comparison with the observedGalactic neutron star population is therefore a complicated taskthat requires careful simulations as well as detailed estimates of theobservational biases of multi-band surveys. Several studies have per-formed statistical and population synthesis analyses to recover thedistributions of important neutron star parameters from the observedpopulation (see e.g. Arzoumanian et al. 2002; Brisken et al. 2003;Hobbs et al. 2005; Faucher-Giguère & Kaspi 2006; Gullón et al.2014; Verbunt et al. 2017; Cieślar et al. 2020). While these modelsare broadly able to explain the observational data, high degrees ofdegeneracy between the diﬀerent input parameters make it diﬃcultto exactly pin down the distributions that control the pulsars’ birthproperties, such as their natal kick velocities. Nonetheless, disen-tangling the birth properties of the isolated neutron star populationin our Galaxy is crucial as it has important implications for severallines of research, including formation mechanisms of these com-pact stars, the evolution of massive stars, as well as extreme eventssuch as Gamma-Ray Bursts (GRBs), Fast Radio Bursts (FRBs) andpeculiar types of supernovae.Constraining the birth properties of isolated neutron stars fromobservational data is the main motivator for our work. Instead offollowing earlier approaches that have employed standard statisti-cal techniques, we focus on characterizing initial pulsar propertiesusing machine learning techniques which have seen increasing inter-est in the astronomy and astrophysics communities in recent years.This paper, which will be the ﬁrst in a series, is dedicated to thetechnical aspects of these eﬀorts and aims to show that a machine-learning framework can be used to estimate parameters with highaccuracy. For this feasibility study, we restrict ourselves to a simpli-ﬁed approach, where selection eﬀects and observational biases areneglected and reduced physical models are suﬃcient. In particular,we focus on the dynamical properties of the pulsar population andexplore the possibility of inferring the parameters that control agiven Galactic pulsar kick-velocity and scale-height distribution atbirth (the two quantities that largely control the spatial distributionof pulsars in the Milky Way) through neural networks. For this pur-pose, we implement a basic population synthesis code in Python and simulate the dynamical evolution of a synthetic population ofisolated neutron stars for a variety of diﬀerent birth position and na-tal kick distributions. These evolved mock populations are then fedinto a suitably structured machine-learning pipeline with the aimof recovering the underlying parameters controlling the distribu-tions. We show that this procedure is successful at estimating birthcharacteristics. Additionally, we link our framework to the observedsample of pulsars with measured proper motion in a phenomeno- logical way and discuss implications for future pulsar survey eﬀorts,i.e., with the Square Kilometer Array (SKA).Our paper is structured as follows: In §2 we describe the meth-ods used to simulate and evolve a mock neutron star population intime. §3 contains a description of the machine-learning framework,including the generation of our data-sets (§3.1), the employed net-work architectures (§3.2) as well as details of the training process(§3.3). In §4 we present our experiments, which are discussed in de-tail and connected with observational data in §5. Finally, we providea summary and outlook in §6.

A widely used approach to investigate the properties of the ob-served neutron star population is through population synthesis (seee.g. Narayan & Ostriker 1990; Faucher-Giguère & Kaspi 2006;Gonthier et al. 2007; Kiel et al. 2008; Kiel & Hurley 2009; Os-łowski et al. 2011; Levin et al. 2013; Gullón et al. 2014; Bateset al. 2014; Cieślar et al. 2020). These frameworks aim to simulatethe evolution of a population of neutron stars from birth until to-day. The resulting mock population is then compared with the realobserved population in order to constrain and validate the physi-cal model assumptions that entered the simulation. In particular,the population synthesis approach relies on assumptions about thedistributions of the birth properties of the mock neutron stars, andtypically takes advantage of Monte–Carlo methods to construct theinitial parameters of each simulated star. Starting from these initialconditions the mock population is then evolved over time accordingto some evolutionary prescriptions, and eventually contrasted withreal data. For the development of our population synthesis frame-work we largely follow Faucher-Giguère & Kaspi (2006), Gullónet al. (2014) and Cieślar et al. (2020). The necessary ingredients arebrieﬂy summarized in the following.

The age 𝑡 age of each neutron star is randomly drawn from a uniformprobability distribution between 1 and 10 yr. By choosing a uni-form distribution, we assume that the birth rate of neutron stars isconstant in the chosen time range. For all simulations of the syntheticneutron star population we choose an average neutron star birth rateof 1 neutron star per century, compatible with the core-collapse su-pernova rate in the Galaxy (Rozwadowska et al. 2021). This yieldsa total of 10 simulated neutron stars for each synthetic population,whose evolution we can compute within reasonable timescales. To deﬁne the initial positions we use both a Cartesian referenceframe ( 𝑥, 𝑦, 𝑧 ) and a cylindrical reference frame ( 𝑟, 𝜙, 𝑧 ) , whoseorigins are located at the center of the Galaxy, where 𝑟 representsthe distance in kiloparsec from the Galactic center and 𝜙 is theazimuthal angle in radians. The two coordinate systems are relatedby the transformation:  𝑥 = 𝑟 cos 𝜙,𝑦 = 𝑟 sin 𝜙,𝑧 = 𝑧. (1)We assume that the Sun is located at the coordinates 𝑥 = 𝑦 = 𝑅 (cid:12) , 𝑧 = 𝑧 (cid:12) , where 𝑅 (cid:12) = . 𝑧 (cid:12) = .

02 kpc. We

MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML Table 1.

Parameters of the Milky Way spiral arm structure: the windingconstant 𝑘 , the inner radius 𝑟 and the inner angle 𝜙 (adapted from Table1 in Yao et al. (2017); see §2.2 for more details).Arm number Name 𝑘 𝑟 𝜙 [rad] [kpc] [rad]1 Norma 4.95 3.35 0.772 Carina-Sagittarius 5.46 3.56 3.823 Perseus 5.77 3.71 2.094 Crux-Scutum 5.37 3.67 5.76 calculate the initial position at birth of each neutron star in bothcylindrical and Cartesian galactocentric reference frames. To do so,we execute the following steps: i) First, we draw a random distance 𝑟 from the Galactic centerfor each neutron star ranging between 10 − and 20 kpc accordingto a stellar radial density distribution 𝑃 ( 𝑟 ) . In particular, we followthe Milky Way’s pulsar surface density 𝜌 ( 𝑟 ) deﬁned by Eq. (15)in Yusifov & Küçük (2004) to determine the probability densityfunction for the radial distance: 𝑃 ( 𝑟 ) = 𝜋𝑟 𝜌 ( 𝑟 ) , (2)with 𝜌 ( 𝑟 ) = 𝐴 (cid:18) 𝑟 + 𝑅 𝑅 (cid:12) + 𝑅 (cid:19) 𝑎 exp (cid:20) − 𝑏 (cid:18) 𝑟 − 𝑅 (cid:12) 𝑅 (cid:12) + 𝑅 (cid:19)(cid:21) , (3)where 𝐴 = . − , 𝑎 = . 𝑏 = . 𝑅 = .

55 kpc and 𝑅 (cid:12) = . 𝑅 (cid:12) value assumed above, we choose tokeep 𝑅 (cid:12) = . ii) Neutron stars are born mainly within the Galactic spiral arms,as these regions are rich in massive OB stars (Chen et al. 2019).We implement a model for the galactic spiral structure that in-cludes four arms with a logarithmic shape function which givesthe azimuthal coordinate 𝜙 as a function of the distance from theGalactic center: 𝜙 ( 𝑟 ) = 𝑘 ln (cid:18) 𝑟𝑟 (cid:19) + 𝜙 . (4)Our values of the model parameters, i.e., the winding constant 𝑘 ,the inner radius 𝑟 and the inner angle 𝜙 are reported in Table 1and evaluated from Table 1 in Yao et al. (2017) in order to matchthe same functional form as deﬁned in Eq. (4). For our analysis,we follow Faucher-Giguère & Kaspi (2006) and do not include theLocal arm, whose density is much smaller than that of the fourmajor arms (Cordes & Lazio 2002; Yao et al. 2017).For each star we randomly select one of the four spiral arms withequal probability, and evaluate the angular coordinate 𝜙 for its given 𝑟 according to Eq. (4).The spiral pattern of the Galaxy is not static and as a ﬁrst ap-proximation can be considered as a rigid structure which rotateswith an approximated period 𝑇 =

250 Myr (Vallée 2017; Skowron et al. 2019). Knowing the age of an object and assuming a rotationalangular velocity of

Ω = 𝜋 / 𝑇 for the spiral structure, we can de-rive the angular position at birth of each neutron star. Note that theGalaxy rotates in the clockwise direction, i.e., toward decreasing 𝜙 angles.After obtaining the corresponding angular coordinate for eachneutron star birth position, we add noise to both coordinates 𝑟 and 𝜙 to smear out the distribution and avoid artiﬁcial featuresnear the Galactic center. For this purpose we add a correction 𝜙 corr = 𝜙 rand exp (− . 𝑟 / kpc ) to the 𝜙 coordinate, where 𝜙 rand israndomly drawn from a uniform distribution in the interval [ , 𝜋 ) ,and to the 𝑟 coordinate a correction 𝑟 corr randomly drawn from anormal distribution centered at 0 with standard deviation 𝜎 = . 𝑟 .Although this prescription was introduced by Faucher-Giguère &Kaspi (2006) (see their Section 3.2.1) in a somewhat arbitrary man-ner, the resulting stellar distribution broadly agrees with that ob-served for very young high-mass stars shown in Reid et al. (2019).Then the birth position in polar coordinates of each neutron staris given by ( 𝑟 + 𝑟 corr , 𝜙 ( 𝑟 ) + Ω 𝑡 age + 𝜙 corr ) with units [kpc, rad]. iii) To determine the height 𝑧 in kiloparsec from the Galactic planeof each neutron star, we adopt an exponential disk model as givenby Wainscoat et al. (1992). It is shaped by the characteristic scale-height parameter ℎ c : 𝑃 ( 𝑧 ) = ℎ c exp (cid:18) − | 𝑧 | ℎ c (cid:19) . (5)For our machine-learning experiments, we will vary the scale heightin the range [ . , ] kpc to simulate neutron star populations withdiﬀerent spread in galactic height. This range encompasses the value ℎ c = .

18 kpc, which was adopted by Gullón et al. (2014) to matchradio pulsar observations, and which is also compatible with thepopulation of young massive stars in the Galactic disk (Li et al.2019). We will consider ℎ c = .

18 kpc below, whenever a ﬁducialscale height is required for our synthetic pulsar population. Thecoordinate 𝑧 of each neutron star is then randomly drawn accordingto this height probability distribution in a range of 10 − to 5 kpc.Subsequently, for each star we randomly choose if 𝑧 is positiveor negative determining in this way a position above or below theGalactic plane. We assume that the initial velocity of the neutron stars in the Galaxyis given by two contributions: the progenitor velocity in the Galacticgravitational potential and a kick speed imparted onto the neutronstars as a result of the supernova explosion. We consider a progenitorcircular orbital speed given by the following relation: 𝑣 orb = √︂ 𝑟 𝜕 Φ MW ( 𝑟, 𝑧 ) 𝜕𝑟 , (6)where Φ MW is the Milky Way gravitational potential discussedbelow. We assume that each neutron star has an initial kick velocity v k , whose 3D magnitude 𝑣 k is randomly drawn from a Maxwelldistribution, shaped by the dispersion parameter 𝜎 k : 𝑃 ( 𝑣 k ) = √︂ 𝜋 𝑣 𝜎 exp (cid:32) − 𝑣 𝜎 (cid:33) . (7)For our machine-learning purposes, we will vary 𝜎 k in the range [ , ] km s − and randomly draw 3D velocity magnitudes fromthe resulting distribution in the range [ , ] km s − . This spreadallows us to easily accommodate the fastest observed neutron stars MNRAS000

Ronchi et al.

Table 2.

Parameters of the Milky Way gravitational potential taken fromTable 1 in Marchetti et al. (2019), see also Bovy (2015).Component Parametersnucleus (n) 𝑀 n = . × 𝑀 (cid:12) 𝑟 n = .

07 kpcbulge (b) 𝑀 b = . × 𝑀 (cid:12) 𝑟 b = . 𝑀 d = . × 𝑀 (cid:12) 𝑎 d = .

00 kpc 𝑏 d = .

28 kpchalo (h) 𝑀 h = . × 𝑀 (cid:12) 𝑟 h = .

62 kpc whose velocities have been estimated to surpass 1000 km s − (seefor example Chatterjee et al. 2005; Hui & Becker 2006; Pavan et al.2014). Based on pulsar timing measurements, Hobbs et al. (2005)have suggested that 𝜎 k =

265 km s − provides a viable explanationfor the proper motions of observed neutron stars. We will use this asa ﬁducial value below. For a given kick velocity magnitude we thenassociate a random direction to this velocity in order to evaluatethe three components ( 𝑣 k ,𝑟 , 𝑣 k ,𝜙 , 𝑣 k ,𝑧 ) in galactocentric cylindricalcoordinates. Therefore, the three components of the total initialvelocity of each neutron star in the galactocentric reference frameare computed as ( 𝑣 k ,𝑟 , 𝑣 orb + 𝑣 k ,𝜙 , 𝑣 k ,𝑧 ) . As typical for spiral galaxies like the Milky Way, we assume anaxisymmetric Galactic potential Φ MW (Carlberg & Innanen 1987;Bovy 2015) that does not incorporate the impact of the spiral armsthemselves. We speciﬁcally consider a four-component Galacticpotential model consisting of a nucleus ( Φ n ), a bulge ( Φ b ), a disk( Φ d ) and a halo ( Φ h ) as discussed in Marchetti et al. (2019): Φ MW = Φ n + Φ b + Φ d + Φ h . (8)The nucleus and the bulge are described by a Hernquist potential(Hernquist 1990): Φ n = − 𝐺 𝑀 n 𝑟 n + 𝑟 , (9) Φ b = − 𝐺 𝑀 b 𝑟 b + 𝑟 . (10)The disk has a Miyamoto-Nagai disk potential (Miyamoto & Nagai1975): Φ d = − 𝐺 𝑀 d √ 𝐾 + 𝑟 , (11)where 𝐾 = 𝑎 d + √︃ 𝑧 + 𝑏 is the shape parameter with 𝑎 d as thescale length and 𝑏 d the scale height of the disk. The halo has aNavarro-Frenk-White potential (Navarro et al. 1996): Φ h = − 𝐺 𝑀 h 𝑟 ln (cid:18) + 𝑟𝑟 h (cid:19) . (12)The parameters of this model are reported in Table 2 and werederived by Bovy (2015) through a ﬁt of the mass proﬁle of the MilkyWay. We assume that these contributions to the galactic potentialare stationary in time, i.e., they do not evolve over the time span weconsider for the dynamical evolution. Given the initial conditions deﬁned above, i.e., the initial position,initial velocity and the Galactic gravitational potential, we can solvethe equations of motion to determine the neutron stars’ dynamicalevolution. The system of dynamical equations that requires solvingto determine the orbits of the neutron stars in the galactic potentialis given by the Newtonian equations of motion: (cid:165) r = − ∇ Φ MW . Incylindrical galactocentric coordinates the three components of thisvector equation take the form:  (cid:165) 𝑟 − 𝑟 (cid:164) 𝜙 = − 𝜕 Φ MW 𝜕𝑟 , (cid:164) 𝑟 (cid:164) 𝜙 + 𝑟 (cid:165) 𝜙 = − 𝑟 𝜕 Φ MW 𝜕𝜙 , (cid:165) 𝑧 = − 𝜕 Φ MW 𝜕𝑧 . (13)For each neutron star we numerically integrate the above equationsin time from 𝑡 = 𝑡 = 𝑡 age using a discrete time step. We usethe Python package scipy.integrate.odeint and to speed upthe computational time also employ the module jit (“just in time")from the Numba package ( https://numba.pydata.org/ , Lamet al. 2015) . To asses the performance of our integration methodwe check that the potential plus kinetic energy of all the stars in oursimulation is conserved. We ﬁnd that this is the case with a relativeerror of (cid:46) − . The output of the dynamical evolution consistsof the position and velocity of each neutron star computed in bothgalactocentric (GC) and equatorial ICRS (International CelestialReference System) frames. To transform between diﬀerent coor-dinate systems we employed the method coordinates from thePython library Astropy (Astropy Collaboration et al. 2013, 2018).

In the past decade, the accumulation of extensive and heavy data-setshas been almost ubiquitous in astronomy and astrophysics. In orderto take full advantage of these data and perform data-driven sciencethat complements ongoing theoretical modeling eﬀorts, new tech-niques and analysis pipelines that can handle these large amounts ofdata (and do so in an automated way) are required. Machine learning(ML) has played an important role in developing such new algo-rithms (Ball & Brunner 2010; Allen et al. 2019; Baron 2019; Fluke& Jacobs 2020). For compact-object related science, ML algorithmshave for example been developed to classify new pulsars candidates(Balakrishnan et al. 2020; Lin et al. 2020) as well as transient ra-dio events such as FRBs (Agarwal et al. 2020). Other approacheshave aimed at forecasting and analyzing gravitational-wave signalsin real time (Cabero et al. 2020; Wei & Huerta 2020; Skliris et al.2020; Gerosa et al. 2020), interpreting gravitational wave-eventsin light of population synthesis (Wong & Gerosa 2019), or recon-structing the neutron star equation of state from observed quantities(Morawski & Bejger 2020).For our analysis, we will focus on artiﬁcial neural networks(ANNs). ANNs are algorithms inspired by the structure of biologicalbrains that can be thought of as nets of interconnected neuronsthat exchange information from one to another. When the networkreceives an input, it is able to process it to produce an output, justlike a biological brain responds to external stimulation. In ANNs,neurons are usually organized in a stack of layers. Each neuronin a layer receives input signals (typically real numbers) from the Numba translates Python functions into optimized machine code at run-time, which allows us to achieve a speed-up by about a factor of 6.MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML N u m b e r o f N S s | v r | [ k m s ] | v | [ k m s ] | v z | [ k m s ] Figure 1.

Examples of 128 ×

128 resolution maps in the galactocentric 𝑥𝑦 -plane extending from −

20 to 20 kpc in both 𝑥 and 𝑦 direction and showing (inorder from left to right) the density of simulated neutron stars, the distribution of average values of the 𝑣 𝑟 , 𝑣 𝜙 and 𝑣 𝑧 velocity components for a population ofneutron stars simulated with ℎ c = .

18 kpc and 𝜎 k =

Examples of 128 ×

64 resolution maps in the equatorial ICRS frame extending from 0 ◦ to 360 ◦ in RA and from − ◦ to 90 ◦ in DEC and showing (inorder from left to right) the density of simulated neutron stars, the distribution of average values for the 𝜇 RA , 𝜇 DEC proper-motion components for a populationof neutron stars simulated with ℎ c = .

18 kpc and 𝜎 k =

265 km s − . For visualization purposes, we represent the data using a colormap to highlight theresulting structures; red regions are characterized by a higher density of stars or a higher average magnitude of the velocity components, respectively, while blueareas correspond to lower densities and lower velocity magnitudes. Note that the Galactic silhouette is visible as a stream in the position map with an enhancedstellar density close to the Galactic Center. Due to low-number statistics, the regions outside the Galactic stream in the proper-motion maps are dominated bystatistical ﬂuctuations, i.e., the corresponding high-velocity regions are attributed to a small number of high proper-motion stars that have escaped the disk. Asa result, the disk itself is dominated by stars with lower proper motion. neurons in the previous layer and produces an output signal byapplying a non-linear activation function to a linear combinationof the input signals according to certain weights and a bias. Theoutput is then passed to the neurons in the following layer, and soon until the ﬁnal layer is reached and the output is generated. In ourparticular case, we will focus on supervised learning, where trainingthe neural network consists of making it produce a speciﬁc targetoutput when a particular input is passed through it. This is achievedby (i) labeling the input samples in the training data-set with a labelindicating the property that the network has to learn (the so-calledground truth) and (ii) iteratively adjusting the values of weightsand biases, also called network parameters, in order to minimizea speciﬁc loss function which measures the distance between thenetwork output prediction and the target ground truth.ANNs have been primarily applied in classiﬁcation and regres-sion tasks. In classiﬁcation problems, the network is trained in orderto distinguish various input data between diﬀerent classes. On theother hand, in regression problems, the network has to infer contin-uous variables for the given input data. The latter one is the kindof problem we are after since we want our network to infer certainvalues given the evolved neutron star population. In the remainderof this section, we discuss (i) the simulation data we create for ourML experiments, then (ii) focus on the speciﬁc network architec-ture employed, and ﬁnally (iii) describe the details of our trainingprocess. The goal of our ML approach is to predict the parameters that con-trol the dynamical properties of an evolved neutron star population.In particular, we focus on predicting the kick-velocity parameter 𝜎 k and the scale-height parameter ℎ c , which predominantly aﬀect thedistribution of pulsars in the Milky Way. To extract these from anevolved population, and follow a supervised learning approach, weﬁrst need to train a neural network on a series of simulated popula-tions (created by exploring the ranges for 𝜎 k and ℎ c ). Following theprescription described in §2, we perform the following simulationruns: S1 We generate 10 data-sets with an increasing number of samples(speciﬁcally 4, 8, 16, 32, 64, 128, 256, 512, 1024 and 20000 sim-ulated populations) by uniformly varying the parameter 𝜎 k of thekick velocity distribution in the range [1, 700] km s − . For thesesimulations, we keep the characteristic scale of the 𝑧 -distributionﬁxed to its ﬁducial value ℎ c = .

18 kpc. S2 We ﬁx the kick-velocity parameter to its ﬁducial value 𝜎 k =

265 km s − and generate a data-set of 20000 samples of simulatedpopulations by uniformly varying the scale-height parameter ℎ c inthe range [0.02, 2] kpc. To generate our simulation data, we partially employ the package

Hydra ( https://hydra.cc/ , Yadan 2019), which allows us to easily sweep entireparameter ranges.MNRAS , 1–27 (2021) Ronchi et al. S3 We generate 6 data-sets, where we uniformly vary the kick-velocity parameter 𝜎 k parameter in the range [1, 700] km s − aswell as the characteristic scale of the 𝑧 -distribution ℎ c in the range[0.02, 2] kpc. We choose the data-set sizes 16 = ×

4, 64 = × = ×

16, 1024 = ×

32, 4096 = ×

64 and 16384 = ×

128 given by all the combinations of 𝜎 k and ℎ c values.As an example: the 16 populations in the ﬁrst set are obtained bycombining each of the 4 values of the 𝜎 k parameter with all 4 valuesof the ℎ c parameter.As addressed in detail in §4, the smaller simulation data-sets willbe used to explore the network behavior, while the largest data-setscontaining 20000 and 16384 samples, respectively, will be used toassess the actual network accuracy in generalization scenarios.After the runs have been performed, we transform the outputof the simulation into a representation that can be interpreted bya ML pipeline. Since ANNs require the use of structured data, werepresent the position and velocity output of the simulations in theform of 2D binned density and velocity maps both in the galactocen-tric and ICRS reference frames. The density maps give informationabout the density of neutron stars in the Galaxy by providing thenumber count of stars in each spatial bin. On the other hand, ve-locity maps contain information about the kinematic properties ofthe neutron stars by providing the average magnitude of the stellarvelocity components in each spatial bin. In the galactocentric mapsthe Galaxy is represented face on and projected onto the 𝑥𝑦 -planeof the Cartesian galactocentric frame, extending from −

20 kpc to20 kpc in 𝑥 and 𝑦 direction. The ICRS maps instead extend from 0 ◦ to 360 ◦ in right ascension (RA) and from − ◦ to 90 ◦ in declination(DEC). To each map we apply a smoothing Gaussian ﬁlter (with ra-dius 4 . · 𝜎 + . 𝜎 =

1) in order to add some blurring and avoidsharp features. By doing so, we reduce noisy high-frequency fea-tures and thus make the training more stable, presumably resultingin better generalization capabilities. Therefore, for each simulatedpopulation we have: • • 𝑣 𝑟 , 𝑣 𝜙 and 𝑣 𝑧 in [km s − ]. • • 𝜇 RA and 𝜇 DEC in[mas yr − ].This set of maps for each population is labeled with the correspond-ing values of the parameters 𝜎 k and ℎ c used to simulate that speciﬁcpopulation.We will test the ML performance on three diﬀerent map res-olutions. Therefore, we generate each of the above data-sets withresolutions 32 ×

32, 128 ×

128 and 512 ×

512 square bins. Notethat in the ICRS maps the DEC coordinate axis range is half thatof the RA coordinate axis. Hence, these maps have half the binsalong the DEC coordinate and their resolution is 32 ×

16, 128 × ×

256 square bins. For brevity hereafter we will refer to thethree resolutions as 32, 128 and 512 resolutions for both galacto-centric and ICRS maps, respectively. An example of the maps withresolution 128 for a simulation with ﬁducial values ℎ c = .

18 kpcand 𝜎 k =

265 km s − is shown in Figs. 1 and 2.Before loading the maps into our ML pipeline, they are nor-malized so that each bin contains a continuous value between 0 and1. The same applies to the related labels so that their values rangecontinuously between 0 and 1. The aim of normalization is to speedup the training process and make convergence easier since all inputs will provide signals of similar magnitude to the loss-function min-imization. This is useful especially for multi-parameter and multi-channel training, that is when we train a network to predict morethan one parameter or use channels that have diﬀerent absolute mag-nitudes. In these cases, training without normalization might leadto slower, worse or even no convergence at all. For the implementation of our ML pipeline we use

PyTorch (Paszkeet al. 2019), an optimized tensor library for deep learning usingGPUs and CPUs, that is written in

Python . The simplest neuralnetwork that can be used for this task is a fully connected neuralnetwork also referred to as a multi-layer perceptron (MLP) withonly a single input and a single output layer of neurons. As this isthe starting point to develop more complicated and advanced archi-tecture models, we ﬁrst test how this simple conﬁguration behaves.The number of neurons for the input layer is equal to the total num-ber of input features, i.e., 𝐶 × 𝑊 × 𝐻 . Here, 𝐶 is the number of inputchannels (corresponding to the total number of maps used), while 𝑊 and 𝐻 are the number of bins in width and height, respectively.The number of neurons in the output layer is equal to the numberof regression parameters that we would like to predict, i.e., 1 or 2in our experiments. For the activation function we use the RectiﬁedLinear Unit (ReLU) deﬁned as ReLU ( 𝑥 ) = max ( , 𝑥 ) .A more sophisticated model architecture is represented by aconvolutional neural network (CNN). CNNs are a particular typeof deep neural network that have proven to be very successful inregression and classiﬁcation tasks when applied to structured andmatrix-like 2D inputs (see Rawat & Wang 2017, for a review).The basic structure of CNNs consists of convolutional, pooling,and fully connected layers. Convolutional layers are multi-channelﬁlters that slide along the 2D input maps and are able to extractfeature maps. The role of the pooling layer is to down-sample theoutput of a convolutional layer. This inevitably causes a loss ofinformation but in general helps to improve the training eﬃciencyby increasing the size of the receptive ﬁeld (i.e., the region of theinput that produces the feature for each neuron) and reducing thenumber of trainable parameters. The fully connected layers collectall the output features from the convolution layers into a 1D inputand return the ﬁnal output prediction.The detailed structure of the CNN we built for our case studycan be found in Table 3 and a schematic representation is given inFig. 3. It consists of the following layers: • A 2D convolution layer with kernel size 3 × 𝐶 input channels,32 output channels, stride 1 and no padding. • A 2D Max pooling layer of size 2 × • A 2D convolution ﬁlter with kernel size 3 ×

3, 32 input channels,64 output channels, stride 1 and no padding. • A 2D Max pooling layer of size 2 × • A fully connected linear layer with ﬂattened input from theconvolutional modules output and 64 output neurons. • A fully connected linear layer with 64 input neurons and 1 or 2output neurons (depending on the number of parameters we wouldlike to predict).For the convolutional and pooling layers the stride parameter regu-lates the amount of displacement in bins that the ﬁlter moves overthe map at each step. Padding adds one or more bins at the borderof the 2D maps, so that the ﬁlters can move and cover the whole

MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML × × Conv2D MaxPool+ ReLU + ReLU

Flatten

Linear × × × ×

32 63 × × × × ×

64 30 × × σ k ,h c Linear + ReLUConv2D × × MaxPool × Figure 3.

Schematic representation of our CNN architecture for an input with resolution 128 ×

128 and 4 channels (1 density map plus 3 velocity maps). Amodule formed by two blocks that each contain a convolution layer and max pooling layer is followed by a fully connected linear network with one hiddenlayer. The ﬁnal network output is either a single parameter or two parameters depending on the experiment speciﬁcs.

Table 3.

CNN architecture. The total number of input and output featuresis reported. 𝐶 is the number of used channels, while 𝑊 and 𝐻 representthe number of bins (i.e. the resolution) in width and height of the densityand velocity maps, respectively. Input and output feature numbers have beenrounded down to the lower integer. Layer Input OutputConv2d + ReLU 𝐶 × 𝑊 × 𝐻 × ( 𝑊 − ) × ( 𝐻 − ) MaxPool2d 32 × ( 𝑊 − ) × ( 𝐻 − ) × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) Conv2d + ReLU 32 × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) MaxPool2d 64 × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) Linear + ReLU 64 × (cid:16) 𝑊 − (cid:17) × (cid:16) 𝐻 − (cid:17) map without leaving any bins out. We use a padding of 0 becausethe borders of the maps do not contain relevant information.The choice of this architecture was found by trial and errorexperiments where we started from a very simple structure and pro-gressively increased the complexity, adding more and more layersto acquire the desired accuracy in predicting the input parameters. For the training of the network, we use the root-mean-square er-ror (RMSE) both for the loss function and validation metric, i.e.,to compute the distance between the network predictions and theground truths of the ℎ c and 𝜎 k parameters. In general validationoccurs at the same time as training and consists of testing the net-work over a data-set diﬀerent from the training set. This is neededto asses the ability of the network to generalize what it is learning toan unknown data-set. The minimization of the loss function occursthrough gradient descent and backpropagation, i.e., computation ofthe loss-function gradients with respect to all network parameters(weights and biases). These gradients taken with a negative sign in-dicate the directions towards which the network parameters shouldbe updated so that the loss is reduced, and hence the network pre-dictions move closer to the true, expected labels. In this regard, acrucial aspect to ensure the best performance of a neural networkis to properly initialize the weights and biases. For this purpose weuse the Kaiming initialization method (He et al. 2015) in order toavoid exploding or vanishing gradients during the training.The training process itself is regulated by several hyperparam-eters. The ﬁrst one is the learning rate, which is a coeﬃcient for theweight updates. In general, a larger learning rate results in updatesof larger magnitude, which could in turn lead to faster convergence,but might also reduce the stability of the training process and thusincrease the risk of overshooting the minima of the loss landscape. A second hyperparameter is the batch size, which deﬁnes the num-ber of samples that are packed together and passed through thenetwork before an optimization step is performed. In general, forbigger training data-sets, a larger batch size helps to increase theeﬃciency and stability of the training process, since the gradient-descent steps are averaged over many samples and noise is reduced.For the gradient-descent optimizer we use the Adaptive MomentEstimation (Adam) (Kingma & Ba 2014). As its name suggests,Adam adds an adaptive momentum term to the gradient descent toautomatically modify the learning rate and accelerate the conver-gence. When using the Adam optimizer the chosen initial value ofthe learning rate represents only an upper limit.We ﬁx the maximum number of learning epochs to 1024. Everyepoch the network performs a series of optimization steps by goingthrough the whole training data-set once. Then epoch-averaged lossand validation metric values are computed. If the validation metricvalue has improved with respect to the previous epoch the currentstatus of the optimized network is saved. We also set an early stopof 128 epochs, so that if the validation metric does not improveover this epoch span, the training process automatically stops andthe weights of the best epoch are stored. This prevents the networkfrom overﬁtting the training samples, which would reduce its abilityto generalize over unknown data. As a ﬁrst step we perform several tests to analyze which conﬁgura-tion of the input feature maps provides the best training experienceand which proposed neural network architecture, either the MLPor the CNN, behaves better. Regarding the input conﬁguration wewould like to understand (i) what the best type of maps is (galac-tocentric vs equatorial ICRS); (ii) how many input channels areneeded to obtain good results (do density maps provide enough in-formation alone or does the addition of velocity maps improve theresults?); (iii) which resolution of the input maps provides the bestresult. To do so, we ﬁrst compare the behavior of the MLP and theCNN when trained to predict a single parameter. We explore diﬀer-ent types of input signals by varying the resolution of the densityand velocity maps as well as the number of input channels. Oncewe ﬁnd the optimal conﬁguration for the input maps and the bestperforming network, we proceed to test its generalization power. Asimilar strategy is then followed for the two-parameter prediction.

MNRAS000

MNRAS000 , 1–27 (2021)

Ronchi et al.

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] GC position

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] GC position + velocity

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] ICRS position

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] ICRS position + velocity

Figure 4.

MLP best training RMSE values for the training process on the single parameter 𝜎 k of the Maxwell kick-velocity distribution, as a function of thetraining data-set size and the resolution ( red , blue , and orange curves for 32, 128 and 512 respectively) using the four diﬀerent input conﬁgurations T1 (GCposition), T2 (GC position + velocity), T3 (ICRS position) and T4 (ICRS position + velocity).

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] GC position

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] GC position + velocity

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] ICRS position

10 100 1000Sample number0510152025 R M S E l o ss [ k m s ] ICRS position + velocity

Figure 5.

CNN best training RMSE values for the training process on the single parameter 𝜎 k of the Maxwell kick-velocity distribution, as a function of thetraining data-set size and the resolution ( red , blue , and orange curves for 32, 128 and 512 respectively) using the four diﬀerent input conﬁgurations T1 (GCposition), T2 (GC position + velocity), T3 (ICRS position) and T4 (ICRS position + velocity). We focus on predicting the parameter 𝜎 k of the Maxwell kick ve-locity function and employ the density and velocity maps generatedfrom simulation run S1. We keep aside the data-set with 20000 sam-ples as it will be used to asses the generalization power of the bestperforming network (see §4.1.2). Thus, we have training sets withincreasing number of samples (from 4 to 1024) and increasing mapresolution (32, 128, 512). For each of these training sets we comparethe performance on four diﬀerent kinds of input information: T1 Galactocentric position information: 1 density map in galacto-centric coordinates (1 channel). T2 Galactocentric position and velocity information: 1 densitymap plus 3 velocity maps in galactocentric coordinates (4 channels). T3 ICRS position information: 1 density map in equatorial ICRScoordinates (1 channel). T4 ICRS position and velocity information: 1 density map plus 2proper motion maps in equatorial ICRS coordinates (3 channels).We ﬁx the network architectures and the training set-up as describedin §3.2 and §3.3. For these initial experiments we do not incorporatevalidation but only focus on the training results for diﬀerent types ofdata-sets, as we are not yet interested in evaluating the generalizationpower of the networks. To assess the convergence of the trainingruns we set a threshold for the 𝜎 k RMSE training loss to 10 km s − .If the ﬁnal RMSE value is higher than this threshold the training isrepeated up to a maximum of 8 times. If convergence is not reachedafter 8 trials we take the trained model with the lowest ﬁnal RMSE. For each training experiment we also monitor the computationaltime needed to perform a single optimization step on a single databatch (see Appendix B for more details).Initially we perform a search for the starting value of the learn-ing rate that provides the best results. In particular for the MLP weﬁnd that to ensure a decaying RMSE value during training, the ini-tial learning rate needs to be decreased as the input-map resolutionincreases. Therefore, after several tests we set the initial learningrate to 10 − , 10 − and 10 − , respectively, for the 32, 128 and 512resolution maps. The CNN instead is more ﬂexible and stable andall three initial learning rates are suitable for every resolution. Inthis case we set it to 10 − to obtain training convergence in thesmallest number of epochs.Next, we tune the batch size to make the learning process morestable and eﬃcient as the number of training samples increases. Ingeneral, we keep the batch size to 1 for the data-set sizes 4, 8, 16, 32and progressively increase it to 4, 8, 16, 32, 64 as the data-set sizeincreases to 64, 128, 256, 512, 1024, respectively. Only when testingthe performance of the MLP on the T2 input conﬁguration we needto further ﬁne tune the batch size in order to reach acceptable valuesof the RMSE. In all other cases the batch sizes mentioned abovework well.The results of our experiments using the MLP and the CNNon the training data-sets from S1 with conﬁguration T1, T2, T3 andT4 are shown in Figs. 4 and 5, respectively (see Appendix B for thetiming results), highlighting the best converged RMSE values as afunction of the training data-set size and the resolution.First of all, we note that for both model architectures, ICRSmaps allow us to obtain slightly better results in terms of the best MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML k R M S E l o ss [ k m s ] TrainingValidation k,T [km s ]20020 k , P k , T [ k m s ] k,T [km s ]10 | ( k , P k , T ) / k , T | h c R M S E l o ss [ kp c ] TrainingValidation h c,T [kpc]0.050.000.05 h c , P h c , T [ kp c ] h c,T [kpc]10 | ( h c , P h c , T ) / h c , T | Figure 6.

Results of the CNN single-parameter prediction for the kick-velocity parameter 𝜎 k and scale height ℎ c for the corresponding validation sets. Top(bottom) left panel : evolution of the RMSE training ( red ) and validation ( blue ) losses as a function of the training/validation epoch for the 𝜎 k ( ℎ c ) parameters. Top (bottom) central panel : residuals of the prediction as a function of the target 𝜎 k ( ℎ c ) value; the subscripts P and T refer to the predicted and target values,respectively. The red dashed lines delimit the 68% uncertainty region corresponding to a RMSE = . − (0 .

017 kpc) computed over the whole range [ , ] km s − ( [ . , ] kpc). The orange region delimits the 68% uncertainty region computed as a running RMSE for which we have divided the full 𝜎 k ( ℎ c ) range into 50 bins of equal size. The blue line shows the trend of the average residuals which are well centered around the value 0. Top (bottom) rightpanel : relative error of the prediction as a function of the target 𝜎 k ( ℎ c ) value. The red dashed line corresponds to a MRE = .

014 (0 . [ , ] km s − ( [ . , ] kpc). The blue line shows the trend of the running MRE computed over 50 bins of equal size into which we havedivided the full 𝜎 k ( ℎ c ) range. RMSE values. An explanation for this could be that only the ICRSmaps contain 3D information of the Galaxy, i.e., they encode thestellar height distribution with respect to the galactic disk whichcorrelates with the kick velocity magnitude. In fact the higher thekick velocity of the newborn neutron stars the more spread outtheir distribution in galactic height 𝑧 at the end of the dynamicalevolution. On the other hand galactocentric maps show the Galaxyrepresented face on and the information on the height 𝑧 of the starswith respect to the Galactic plane is hidden. Therefore, it is likelyeasier for the networks to distinguish populations with diﬀerent 𝜎 k by processing the ICRS maps.We also note that the addition of the velocity or proper motioninformation has opposite eﬀects on the two model architectures.In particular, on one side it worsens the MLP performance, as thebest RMSE almost doubles. On the other side, it helps the CNN inreducing the overall RMSE. We interpret this as an indication that,as the complexity of the input data increases, a deeper (with morelayers) and more sophisticated model architecture like the CNNis more suitable to process the input data and extract meaningfulfeatures to perform the regression task. However, the improvementis not dramatic, which could suggest that the density maps alreadyprovide enough information to distinguish, with suﬃcient precision,populations simulated with diﬀerent 𝜎 k values.Concerning the resolution, we observe that higher resolutionsallow us to reach slightly lower RMSE values with both the MLP and the CNN but at the expense of longer computation time (seeFigs. A1 and A2 in Appendix B). We therefore consider the smalldiﬀerences between the best RMSE values obtained with 128 and512 resolutions not suﬃcient to justify the choice of the higherresolution. Hence, the use of the 128 resolution appears to be a goodcompromise to ensure good accuracy and reasonably fast training.In light of these results for the single-parameter estimation,we conclude that the optimal representation to be used for trainingis composed of the ICRS density plus proper motion maps with128 resolution. Moreover, as the CNN obtains the best results andappears more stable and ﬂexible when compared to the simple MLP(especially for the multi-channel input features), we employ ourCNN architecture in the following experiments, which should alsobe less prone to overﬁtting. As the next step we separately train the CNN to predict the 𝜎 k and ℎ c parameters by using the two big data-sets with 20000 simulationseach (see S1 and S2). As input features we use the 3-channel ICRSrepresentation with one density map and two proper motion mapswith 128 resolution that ensure the best results as suggested by ourearlier experiments. We randomly split both data-sets into train-ing and validation subsets with a relative percentage of 80 / MNRAS000

017 kpc,computed over the whole range [ . , ] kpc (see bottom centralpanel of Fig. 6). Note that in both experiments, the residuals spreadout as the target values increase. We visualize this by computing arunning RMSE with increasing target values. As is shown by theorange regions in the residuals plots, the running RMSE increasesfrom 1 . . − for 𝜎 k and from 0 .

013 to 0 .

028 kpc for ℎ c ,respectively. However, the average residuals are consistent with 0over the entire target value ranges as marked by the blue line in theresidual plots, showing no anomalous trends in the predicted values.In the right panels of Fig. 6 we also show the relative errorto highlight the precision with which the network is able to predictthe 𝜎 k and ℎ c parameters for a given target value. We observe thatthe precision of the predictions improves with increasing 𝜎 k and ℎ c and eventually stabilizes to a relative error of around 0 .

01. Thisis highlighted by the blue lines, which show the trend of the meanrelative error (MRE) as a function of the target parameters. The reddashed lines correspond instead to the MRE computed on the wholeparameter ranges and are equal to 0 .

014 and 0 .

024 for 𝜎 k and ℎ c ,respectively. The fact that the precision of the predictions decreasesat the lower end of the target ranges suggests that (for our chosennetwork set-up) the input maps become harder to distinguish as theneutron stars’ initial kick velocities and their galactic birth heightsdecrease in magnitude. To see how the CNN behaves when two parameters, i.e., the kick-velocity parameter 𝜎 k and the characteristic scale height ℎ c , are in-ferred simultaneously, we use the data-set of maps generated fromsimulation run S3 (see §3.1). In this case, we have training setswith increasing number of samples, 16 = 4x4, 64 = 8x8, 256 =16x16, 1024 = 32x32, 4096 = 64x64. We leave aside the largestdata-set with 16384 = 128x128 simulations for our ﬁnal generaliza-tion experiment. Given the results of the single-parameter trainingexperiments, we choose the 128 resolution maps and compare theCNN’s performance on the four kinds of input information T1, T2,T3 and T4 as we did for the single-parameter case (see §4.1.1). Weﬁx the CNN architecture and the training set-up as described in §3.2and §3.3, respectively. As for the single-parameter comparison, weonly focus on the training behavior for this initial comparison. Wekeep track of the RMSE training losses for both parameters sepa-rately, but training proceeds by minimizing the total loss computedon both parameters. We set the convergence threshold for 𝜎 k to

100 1000Sample number051015 k R M S E l o ss [ k m s ] GC posGC pos+velICRS posICRS pos+vel

100 1000Sample number0.00.20.40.6 h c R M S E l o ss [ kp c ] GC posGC pos+velICRS posICRS pos+vel

Figure 7.

CNN best training RMSE values for the training process on thetwo parameters 𝜎 k of the Maxwell kick-velocity distribution ( left panel )and characteristic scale height ℎ c ( right panel ) as a function of the trainingdata-set size for the four diﬀerent input conﬁgurations T1 (GC position),T2 (GC position + velocity), T3 (ICRS position) and T4 (ICRS position+velocity).

10 km s − and for ℎ c to 0 . − , while the batch size is changedaccording to the data-set size. In particular, we use a batch size of1, 4, 16, 64, 128 for the data-set sizes 16, 64, 256, 1024 and 4096,respectively.The results of our experiments using the CNN for the two-parameter prediction on the training data-sets from S3, with conﬁg-uration T1, T2, T3 and T4 are summarized in Fig. 7 (see AppendixB for the timing results). As in the single-parameter case we ﬁndthat the best results are provided by the ICRS maps, which allowus to reach the lowest training RMSE losses for both parameters. Inparticular, for the ICRS 3-channel input, the CNN is able to reacha training RMSE loss (cid:46) − for the 𝜎 k parameter, comparablewith the single-parameter case. For ℎ c , the CNN reaches a trainingRMSE (cid:46) . 𝜎 k parameter when only the ICRS density maps are used. As alreadymentioned in §4.1.1, information on the stars’ 𝑧 -coordinate is en-coded in the ICRS maps. As we simultaneously vary the initial kickvelocities and the galactic heights of the pulsars’ birth places, thedegeneracy between the eﬀects of these two parameters becomesrelevant. This makes a distinction of the impact of one parame-ter over the other more diﬃcult for the network, when only ICRSdensity maps are provided. Adding two extra channels that containinformation about the stars’ proper motion thus helps to improve theaccuracy on the kick-velocity parameter estimation. The results ofthese initial explorations are promising and indicate that our simpleCNN architecture has good predictive power for both parameters, ifprovided with all three input channels in the ICRS representation. As for the single-parameter analysis, we assess the actual perfor-mance of the network when simultaneously predicting 𝜎 k and ℎ c bytraining the CNN on the biggest data-set with 16384 = × / MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML k R M S E l o ss [ k m s ] TrainingValidation k,T [km s ]50050 k , P k , T [ k m s ] k,T [km s ]10 | ( k , P k , T ) / k , T | h c R M S E l o ss [ kp c ] TrainingValidation h c,T [kpc]0.20.10.00.10.2 h c , P h c , T [ kp c ] h c,T [kpc]10 | ( h c , P h c , T ) / h c , T | Figure 8.

Results of the CNN two-parameter prediction for the kick-velocity parameter 𝜎 k and scale height ℎ c for the corresponding validation sets. Top(bottom) left panel : evolution of the RMSE training ( red ) and validation ( blue ) losses as a function of the training/validation epoch for the 𝜎 k ( ℎ c ) parameters. Top (bottom) central panel : residuals of the prediction as a function of the target 𝜎 k ( ℎ c ) value; the subscripts P and T refer to the predicted and target values,respectively. The red dashed lines delimit the 68% uncertainty region corresponding to a RMSE = . − (0 .

038 kpc) computed over the whole range [ , ] km s − ( [ . , ] kpc). The orange region delimits the 68% uncertainty region computed as a running RMSE for which we have divided the full 𝜎 k ( ℎ c ) range into 50 bins of equal size. The blue line shows the trend of the average residuals which are well centered around the value 0. Top (bottom) rightpanel : relative error of the prediction as a function of the target 𝜎 k ( ℎ c ) value. The red dashed line corresponds to a MRE = .

039 (0 . [ , ] km s − ( [ . , ] kpc). The blue line shows the trend of the running MRE computed over 50 bins of equal size into which we havedivided the full 𝜎 k ( ℎ c ) range. ﬁguration as in §4.1.2. The evolution of the individual training andvalidation losses is shown in the left panels of Fig. 8.We then test the trained network on the 3277 samples in the val-idation set. The results for the network’s prediction on both param-eters are shown in Fig. 8 and summarized in Table 4. The networkis able to predict 𝜎 k and ℎ c with an average RMSE uncertainty of8 . − and 0 .

038 kpc, respectively, which is approximately dou-bled compared to the single-parameter experiment. These RMSEvalues are computed over the full target ranges and represented bythe red dashed lines in the residuals plots (see central panels ofFig. 8). As in the single-parameter case, we observe that the RMSEuncertainties increase with increasing target parameters as indicatedby the orange regions in the residuals plots. The relative errors rep-resented in the right panels of Fig. 8 show the same decreasingtrend with the target value as for the single-parameter predictions,albeit with larger relative errors. When computing the MRE overthe whole range of the two target parameters, we obtain 0 .

039 and0 .

061 for 𝜎 k and ℎ c , respectively. These values are highlighted bythe red dashed lines in the right panels of Fig. 8.In general, we ﬁnd that the CNN trained to simultaneouslypredict 𝜎 k and ℎ c is not able to reach the same level of accuracyas in the experiments where a single parameter was predicted at atime. This could be due to three distinct causes: (i) either our neuralnetwork is not sophisticated enough to discern between the eﬀectsof both parameters on the simulation outcomes represented in the ICRS maps, (ii) our choice of ICRS maps as input does not pro-vide suﬃcient information for the network to distinguish betweenboth parameters, or (iii) this is a physical (real) degeneracy, andthere is a limit to what we can measure. To investigate this issuewe train the CNN to predict only the parameter ℎ c using the sametwo-parameter data-set used above where both 𝜎 k and ℎ c are var-ied. After predicting on the validation data-set we obtain a RMSEaccuracy of 0 .

038 kpc which is equal to the result obtained abovefor the two-parameter prediction. This suggests that the networkcomplexity is suitable to predict either one or two parameters si-multaneously. Limitations in performance are therefore either dueto an inadequate input representation or a physical degeneracy thatimposes a natural accuracy threshold. While we cannot distinguishthese two with our current simulation and ML pipeline, we can il-lustrate the underlying problem in the following way: Fig. 9 showsthe residuals of the scale-height parameter ℎ c versus the residuals ofthe kick-velocity parameter 𝜎 k . The overall negative slope indicatesthat the network tends to overpredict ℎ c in those simulations where 𝜎 k is underestimated and vice versa; i.e., large (small) ℎ c valueshave the same overall eﬀect on the phenomenology of the pulsarpopulation as large (small) 𝜎 k values and the network struggles todistinguish these cases. This highlights the degeneracy between thetwo parameters already discussed above, which might be broken ifthe data itself were represented in a diﬀerent way or additional input MNRAS000

50 0 50Residuals k [km s ]0.20.10.00.10.2 R e s i du a l s h c [ kp c ] Figure 9.

Scatter plot of the residuals of the predicted scale-height parameter ℎ c versus residuals of the predicted kick-velocity parameter 𝜎 k for our two-parameter generalization experiment. An anticorrelation can be observed. Table 4.

Summary of the CNN generalization results in the single-parameterand two-parameter cases.1-par. generalization 2-par. generalizationParameter RMSE MRE RMSE MRE 𝜎 k . − . − ℎ c .

017 kpc 0.024 0 .

038 kpc 0.061 information about each neutron star (beyond position and velocity)were provided.

In this paper we have studied the potential of an artiﬁcial neural net-work to estimate with high accuracy the dynamical characteristicsof a mock population of isolated pulsars. Implementing a simpliﬁedpopulation-synthesis framework we focused on the pulsar natal kick-velocity distribution and the distribution of birth distances from theGalactic plane. Taking into account the Galaxy gravitational poten-tial and evolving the pulsar motions in time, we generate a series ofsimulations that are used to train and validate a suitably structuredconvolutional neural network.The generalized results presented in the previous sections areobtained in a very idealized and simpliﬁed scenario, implying thatcaution is required when the uncertainties for the prediction of thekick-velocity dispersion 𝜎 k and birth scale height ℎ c are taken atface value and conclusions for the real pulsar population are drawn.In particular, our simulations assume that the distribution of neutronstar progenitors in Galactic height is represented by the exponen-tial thin-disk model, and that the kick-velocity magnitudes follow aMaxwellian distribution. While the choice of an exponentially thindisk is commonly adopted (Wainscoat et al. 1992; Polido et al. 2013;Li et al. 2019) and can be justiﬁed theoretically as the outcome ofa self-gravitating isothermal disk (Spitzer 1942), the choice of aMaxwellian model for the kick-velocity distribution is diﬃcult tomotivate from a theoretical standpoint. The Maxwellian model hasfound empirical support as it has been shown to well describe theproper motions of observed pulsars (Hobbs et al. 2005). It is for thisreason, and its rather simple mathematical form, that the Maxwellkick-velocity distribution is often adopted in population synthesis studies of compact stars (Sartore et al. 2010; Cieślar et al. 2020).However, the real functional form of the kick-velocity distributionis still unknown and debated. Several authors have studied the kick-velocity problem and concluded that other models explain observedproper motions of neutron stars equally well. For example, by us-ing maximum-likelihood methods Arzoumanian et al. (2002) foundthat a bimodal Gaussian distribution with a low-velocity and a high-velocity component is the preferred model to describe the observedproper motion of a sample of 79 radio pulsars. Faucher-Giguère &Kaspi (2006) studied the velocity component along the Galactic lon-gitude for a sample of 34 pulsars observed with interferometric tech-niques (Brisken et al. 2002, 2003). After testing a two-componentGaussian model as well as a variety of single-component models,they opted for a single-component description with an exponentialshape, although a two-component model could not be ruled out dueto the poor statistics of their sample. More recently, Verbunt et al.(2017) and Igoshev (2020) analyzed a sample of isolated young pul-sars and found that a two-component Maxwellian model explainedthe observed sample best.In general, the presence of a low-velocity and a high-velocitycomponent could indicate diﬀerent progenitor properties as well asbirth scenarios for the pulsar population. Numerical simulations ofsupernova explosions have for example suggested that neutron starswith lower kick velocities could be generated in the core-collapse su-pernovae of progenitors with small iron cores or in electron-capturesupernovae (Podsiadlowski et al. 2004; Mandel & Müller 2020).While also possible scenarios for isolated systems (Janka 2017),such conditions might generally aﬀect those neutron stars born in bi-naries (Giacobbo & Mapelli 2020), where mass-loss episodes couldstrip their progenitors oﬀ their hydrogen envelopes. This might fa-vor the formation of small iron cores or accretion-induced electron-capture supernovae, resulting in weaker natal kicks (Schwab et al.2010; Tauris et al. 2013). Only for the strongest kicks can the bina-ries be disrupted by the supernova and both companions expelled;otherwise the two stars remain gravitationally bound. Such eﬀectsare neglected in our model but could in principle generate an im-print on the observed neutron star population. The ability of MLalgorithms to recognize features and patterns in the processed datacould help to better constrain the bi-modality of the kick-velocitydistribution. In future work, we will investigate these aspects inmore detail and explore ways to implement a ML framework toreconstruct the shape of the kick-velocity distribution underlying anevolved population of pulsars.Up to this point, we have not considered any kind of selec-tion eﬀects or observational biases and eﬀectively assumed that allthe neutron stars in our simulation are detectable. While this pro-vides direct insight into how the various initial conditions aﬀectthe evolved population of neutron stars, a direct comparison withobservational data in principle requires a careful treatment of bi-ases. For example, due to beaming eﬀects not all Galactic radiopulsars are visible from Earth (Tauris & Manchester 1998; Mel-rose 2017), while survey sensitivity thresholds and instrumentallimitations might hamper the detection of faint or far away sources(Manchester et al. 2001; Johnston et al. 2008; Stovall et al. 2014;Coenen et al. 2014; Good et al. 2020). Additionally, timing noisecan signiﬁcantly limit the sensitivity and precision in the detectionof pulsar proper motions via timing analysis techniques (Hobbset al. 2004; Lentati et al. 2016; Parthasarathy et al. 2019). Withthe aim of obtaining a rough and simpliﬁed idea on how selectioneﬀects and biases could potentially inﬂuence a future comparisonwith observational we perform the following experiment. We ﬁrstcollect those neutron stars that have measured proper motions. As MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML D E C [ d e g ] d [ kp c ] Figure 10.

Proper motions of the selected 216 neutron stars in the sky represented in the ICRS reference frame. The current locations of the neutron stars areindicated by the colored circles, whereas the tracks indicate their motion for the past 0 . 𝑑 (cid:12) of the neutron stars. The corresponding data is provided in Table C. In the background, weshow in gray all non-binary pulsars in the ATNF catalogue (those in the Small and Large Magellanic Clouds as well as those in globular clusters are included).The red star highlights the position of the Galactic Center. the main resource we use the ATNF pulsar catalogue (Manchesteret al. 2005), but in some cases we provide proper motion resultsfrom more recent analyses (see Appendix C for details). We ﬁnda total of 417 neutron stars whose angular positions, proper mo-tions in ICRS coordinates, spin periods, spin-period derivatives, 𝐷𝑀 values and distance estimates are reported in Table C. Out ofthese objects, we remove those stars that belong to the MagellanicClouds, are associated with globular clusters or have a binary com-panion. We further select only those neutron stars with a spin-periodderivative (cid:164) 𝑃 > − to exclude those that have potentially beenrecycled. Finally, we consider only those for which an estimate ofthe heliocentric distance 𝑑 (cid:12) is available; in the case of radio pul-sars we quote values that are derived from their respective 𝐷𝑀 susing the free-electron density model of Yao et al. (2017) (YMW16model hereafter). As some neutron stars have a 𝐷𝑀 that exceeds themaximum Galactic 𝐷𝑀 allowed by the YMW16 model, these casesare assigned a default distance of 25 kpc. We exclude those casesunless an alternative distance measurement is available. Applyingthese ﬁlters we obtain a sample of 216 Galactic, likely isolatedneutron stars, whose positions and proper motions are illustrated inFig. 10.In the left panel of Fig. 11, the gray histogram shows the dis-tribution of their heliocentric distances. Even if subject to some un-certainties due to imprecisions in the YMW16 model, the distancedistribution peaks around 1 kpc, followed by a sharp exponentialdecrease. For a realistic pulsar distribution and in the absence ofselection eﬀects, we would expect the number of neutron stars to in-crease with distance due to an increase in the explored volume, untilreaching a maximum at a distance of about 10 kpc, which comprisesthe region around the Galactic Center (see the red histogram in theleft panel of Fig. 11). Thus, the shape of the gray distribution inFig. 11, as expected, indicates that our observed sample of neutron stars with measured proper motions is incomplete in distance andsubject to selection biases. By looking at the right panel of Fig. 11,we also note that a selection bias on distance is also reﬂected in thedistribution of the total proper motion magnitudes (gray histogram),computed as 𝜇 tot ≡ √︃ 𝜇 + 𝜇 . Indeed, since the nearest starsare also characterized by larger angular proper motions, there is atendency to detect high proper-motion stars with higher probability.In this ﬁrst empirical approach, instead of attempting to iden-tify these underlying biases (which will be the subject of a subse-quent paper in preparation), we follow a more agnostic approachto introduce a comparable selection eﬀect in our simulated popu-lations. Speciﬁcally, we use a weighted random-sampling routineto select 𝑛 pulsars from our mock populations, where each simu-lated star is assigned a weight according to a function 𝑓 ( 𝑑 (cid:12) ) ofits heliocentric distance. This weight function has to assign largerweights to closer neutron stars in order to ensure their higher de-tection probabilities and has to be chosen such that we recoverthe observed distance distribution with suﬃcient accuracy. To ﬁnd 𝑓 ( 𝑑 (cid:12) ) we simulate a mock population with the ﬁducial values ofthe kick velocity and scale height, that is 𝜎 k =

265 km s − and ℎ c = .

18 kpc, respectively. After using a given 𝑓 ( 𝑑 (cid:12) ) to weightthe simulated neutron stars we sample 216 mock stars and comparetheir distance and proper-motion distributions (shown as blue his-tograms in Fig. 11) with those of the observed sample by performingtwo-sample Kolmogorov-Smirnov (KS) tests. After testing variousfunctional forms we ﬁnd that 𝑓 ( 𝑑 (cid:12) ) = 𝑑 − (cid:12) exp (− . 𝑑 (cid:12) ) is able toreproduce the observed distributions with a good level of accuracy.More precisely, for this choice of 𝑓 ( 𝑑 (cid:12) ) the KS tests performedover 1000 distinct comparisons give average p-values of ∼ . ∼ . > .

05 when

MNRAS000

MNRAS000 , 1–27 (2021) Ronchi et al. d [kpc]10 N u m b e r o f N S s ObservedSimulated allSimulated+bias tot [mas yr ]10 N u m b e r o f N S s ObservedSimulated allSimulated+bias

Figure 11.

Distribution of heliocentric distances 𝑑 (cid:12) ( left panel ) and proper-motion magnitudes 𝜇 tot ( right panel ) for the 216 neutron stars with measuredproper motion ( gray histograms ). For comparison, we also show the distances and proper-motion magnitudes for 216 simulated neutron stars ( blue histograms )randomly sampled from a mock population generated with the ﬁducial parameters 𝜎 k =

265 km s − and ℎ c = .

18 kpc ( red histograms ). For the weightedsampling, we use the weight function 𝑓 ( 𝑑 (cid:12) ) = 𝑑 − (cid:12) exp (− . 𝑑 (cid:12) ) . comparing the observed sample with the simulated populations forreasonable values of 𝜎 k and ℎ c . Only in the cases where 𝜎 k and ℎ c assume extreme values (near the edges of their respective ranges)the p-values might drop below 0 .

05. However, these cases are as-sociated with simulations with extreme initial conditions that areunlikely to reproduce the observations. For our basic experiment,we thus adopt the above weight function to emulate all the selectionbiases, further assuming that it is the same for every number 𝑛 ofsampled neutron stars. Although one might intuitively attribute theexponential factor in 𝑓 ( 𝑑 (cid:12) ) to scattering in the interstellar mediumat larger distances, the underlying nature of our weight functionis certainly more complicated, and it encompasses a series of ef-fects due to the physics of the interstellar medium and the pulsaremission itself, as well as selection eﬀects of pulsar surveys andpulsar searches. We reserve disentangling the diﬀerent eﬀects thatcontribute to 𝑓 ( 𝑑 (cid:12) ) to future work.We then analyze how the predictive power of the CNN evolvesas a function of the number 𝑛 of neutron stars, sampled with theabove weight function 𝑓 ( 𝑑 (cid:12) ) . To do so, we vary 𝑛 from 200 to2000 in steps of 200, and in each case re-sample the 16384 simu-lated populations from run S3, where both 𝜎 k and ℎ c are varied.After applying a 80 /

20% training-validation split, we retrain theCNN on each of the down-sampled simulations. As before, we usethe 3-channel ICRS input maps (i.e., density plus proper motioninformation) but instead opt for a resolution of 32 ×

16 bins to ac-commodate the smaller number of stars represented in our maps.We have veriﬁed that a higher resolution of 128 ×

64 bins does notaﬀect the training results signiﬁcantly, but slows down the trainingprocess; we therefore choose the lower resolution. We use the sametraining hyperparameters as in §4.1.2, that is an initial learning rateof 10 − , a batch size of 64 and an early stop at 128 epochs. Oncetrained for each 𝑛 value, we apply the CNN to the validation setsand compute the root-mean-square error (RMSE) and mean relativeerror (MRE) for the 𝜎 k and ℎ c predictions as a function of 𝑛 .In the left panel of Fig. 12, we show how the RMSE uncer-tainties for the predictions of the two parameters 𝜎 k ( blue ) and ℎ c ( red ) diminish with increasing number of neutron stars 𝑛 sampledfrom the simulations. We observe that both curves (with the appro-priate rescaling) follow very similar trends. On the right, we showinstead how the MREs evolve with 𝑛 , indicating how the precision of the two-parameter prediction improves with the number of de-tected neutron stars. This plot shows that, under the assumptionsthat selection eﬀects are unaltered and the underlying kick-velocityand birth-height distributions have a Maxwellian and exponentialshape, respectively, our trained CNN is able to predict 𝜎 k and ℎ c with a relative error of ∼ .

35 for a sample of 2000 stars.These results highlight that the number of neutron stars playsa crucial role for the level of accuracy that the CNN can reach.As expected, a larger number of stars provides more information,which allows the CNN to pinpoint diﬀerences between populationsevolved from diﬀerent initial conditions, also when selection ef-fects are introduced. Future observational eﬀorts aimed at detectingand characterizing new pulsars will thus play an important role inconstraining the birth properties of neutron stars. Speciﬁcally, theadvent of the Square Kilometer Array (SKA) will represent an im-portant step forward into this direction. Due to its large sensitivity (afactor of 10 better than other radio telescopes) and its long baseline(up to 3000 km), SKA has the potential to increase the number ofdiscovered pulsars by a factor 10 (Smits et al. 2009, 2011). This willallow more precise timing and astrometric measurements of pulsarpositions as well as distances and proper motions. A larger and moreprecise data-set could also help to better constrain the shape of thekick-velocity distribution and diﬀerentiate between models that tryto explain the origin of the natal kicks (Tauris et al. 2015).

In this work we have analyzed the possibility of using machine-learning (ML) techniques to reconstruct the dynamical birth prop-erties of an evolved population of isolated neutron stars. For thispurpose, we developed a simplistic population-synthesis pipelineto simulate the dynamical evolution of Galactic neutron stars andsubsequently used these simulations to train and validate two dif-ferent neural networks. We speciﬁcally focused on their ability topredict two parameters that strongly impact on the phenomenol-ogy of the evolved population: the dispersion 𝜎 k of a Maxwellkick-velocity distribution and the scale height ℎ c of an exponen-tial distribution for the Galactic birth heights. This was achievedby providing the networks with two-dimensional stellar density and MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML

500 1000 1500 2000Number of NSs708090100110120130 k R M S E [ k m s ] h c R M S E [ kp c ] k h c

500 1000 1500 2000Number of NSs0.30.40.50.60.70.80.9 M R E k h c Figure 12.

Trend of the RMSE ( left panel ) and MRE ( right panel ) uncertainties on the prediction of the two parameters 𝜎 k ( blue ) and ℎ c ( red ) as a functionof the number of neutron stars 𝑛 in the resampled simulations in the validation sets. To train and validate the CNN we use the 16384 simulated populationsfrom simulation run S3 (with a 80 /

20% training/validation split), where both 𝜎 k and ℎ c are varied, and which are resampled with increasing number of stars 𝑛 according to the weight function 𝑓 ( 𝑑 (cid:12) ) = 𝑑 − (cid:12) exp (− . 𝑑 (cid:12) ) . velocity maps in galactocentric and equatorial (ICRS) coordinateframes. We found that a convolutional neural network (CNN) isable to estimate the physical parameters with high accuracy whenmultiple input channels, i.e., position and velocity information, areprovided. In particular, when simultaneously predicting 𝜎 k and ℎ c from ICRS maps, the network is able to reach absolute uncertaintieslower than 10 km s − and 0 .

05 kpc, respectively, corresponding to arelative error of around 10 − for both parameters. Albeit obtainedunder simpliﬁed assumptions, our feasibility study, the main focusof this paper, has thus demonstrated that ML techniques are indeedsuitable to infer information about the pulsar population. Our phe-nomenological analysis incorporating proper-motion measurements(an attempt at including observational biases in an agnostic way)has further highlighted that increasing the sample of known pulsarsand accurately measuring their current characteristics with futuretelescopes is crucial to tightly constrain the birth properties of theneutron stars in the Milky Way. In particular, our trained CNN isable to predict 𝜎 k and ℎ c with a relative error of ∼ .

35 for a sampleof 2000 pulsars with measured proper motions.We also demonstrated that one of the main factors in limitingthe accuracy of the CNN’s predictions in our simpliﬁed set-up isthe degeneracy between 𝜎 k and ℎ c ; as they both aﬀect the evolvedpopulations in a similar way, the network struggles to disentangletheir eﬀects. This limitation is a direct consequence of our simpliﬁedpopulation-synthesis framework. In future works, we will go beyondmodeling the dynamical evolution and focus on incorporating addi-tional physics such as magneto-thermal and spin-period evolution.We will further model their emission in diﬀerent electromagneticbands and study corresponding detectability limits by addressingselection eﬀects as well as observational survey biases. Such ad-ditional input information could potentially break the degeneracybetween the kick-velocity and the Galactic height distributions andprovide more accurate model constraints on 𝜎 k and ℎ c as well asother input parameters.The ultimate goal will be to use multi-wavelength observationsof the Galactic neutron star population and take advantage of ML,combined with population synthesis, to recover their birth proper-ties, such as the natal kick-velocity, spin-period or magnetic-ﬁelddistribution. The potential to learn complex patterns from multi-dimensional data and the power of generalizing to unseen data-sets, makes ML algorithms a powerful tool to tackle such multi-parameteroptimization studies. ACKNOWLEDGEMENTS

We would like to thank Alice Borghese and Francesco Coti Zelatifor their help with the collection of the observed pulsar proper mo-tions, Daniele Viganò for providing comments on the manuscript,and Alessandro Patruno for helpful conversations related to this pa-per. MR, VG, AG and NR acknowledge support from the H2020ERC Consolidator Grant “MAGNESIA” under grant agreementNr. 817661 (PI: Rea), and grants SGR2017-1383 and PGC2018-095512-BI00. JAP acknowledges support by the Generalitat Valen-ciana grant (PROMETEO/2019/071 and by AEI grant PGC2018-095984-B-I00. This work has also been partially supported by thePHAROS COST Action (CA16214). The data production, process-ing and analysis tools for this paper have been developed, imple-mented and operated in collaboration with the Port d’InformacióCientíﬁca (PIC) data center. PIC is maintained through a consor-tium of the Institut de Física d’Altes Energies (IFAE) and the Centrode Investigaciones Energéticas, Medioambientales y Tecnológicas(Ciemat). We particularly thank Christian Neissner, Ricard Cruz,Carles Acosta, Gonzalo Merino, and Pau Tallada for their supportat PIC. Finally, we acknowledge the use of the following software:

Astropy (Astropy Collaboration et al. 2013, 2018),

Hydra (Yadan2019),

IPython (Perez & Granger 2007),

Matplotlib (Hunter2007),

Numba (Lam et al. 2015),

NumPy (Oliphant 2006; van derWalt et al. 2011; Harris et al. 2020),

Pandas (McKinney 2010),

PyTorch (Paszke et al. 2019), and

SciPy (Jones et al. 2001; Virta-nen et al. 2020).

APPENDIX A: HARDWARE AND SOFTWARE

Our test machine features an Intel(R) Xeon(R) Gold 6230R CPUat 2.10GHz with a single NVIDIA GTX 2080 Ti GPU, 16 GiBRAM, and SSD drives. The system is running CentOS Linux release7.8.2003 (Core) with PyTorch 1.2.0, CUDA toolkit 10.0.130 andGPU driver 455.32.00.

MNRAS000

MNRAS000 , 1–27 (2021) Ronchi et al. T i m e [ m s ] GC position T i m e [ m s ] GC position + velocity T i m e [ m s ] ICRS position T i m e [ m s ] ICRS position + velocity

Figure A1.

MLP forward ( solid lines ) and backward ( dashed lines ) pass times per sample in ms for the training process on the single parameter 𝜎 k of theMaxwell kick-velocity distribution, as a function of the batch size and the resolution ( red , blue , and orange curves for 32, 128 and 512 respectively) using thefour diﬀerent input conﬁgurations T1 (GC position), T2 (GC position + velocity), T3 (ICRS position) and T4 (ICRS position + velocity). T i m e [ m s ] GC position T i m e [ m s ] GC position + velocity T i m e [ m s ] ICRS position T i m e [ m s ] ICRS position + velocity

Figure A2.

CNN forward ( solid lines ) and backward ( dashed lines ) pass times per sample in ms for the training process on the single parameter 𝜎 k of theMaxwell kick-velocity distribution, as a function of the batch size and the resolution ( red , blue , and orange curves for 32, 128 and 512 respectively) using thefour diﬀerent input conﬁgurations T1 (GC position), T2 (GC position + velocity), T3 (ICRS position) and T4 (ICRS position + velocity). APPENDIX B: TIMING TESTSB1 Timing for Single-parameter Predictions

We report here the timing benchmarks for the MLP and CNN duringthe single-parameter training experiments discussed in §4.1.1. Werun our experiments on the test machine and individually record theforward pass time (the time needed to go through the samples in abatch and compute a prediction) and the backward pass time (thetime to compute all the gradients and perform a single optimizationstep) as a function of the batch size and resolution. Our benchmarksfor the training data-sets from simulation run S1 using the fourtraining conﬁgurations T1, T2, T3 and T4 are shown in Figs. A1and A2, respectively.The timing benchmark shows that the MLP is slightly fasterin performing an optimization step. This is expected due to fewertrainable parameters when compared to the more complex CNN. Wecan also see that the forward and backward pass times per sampledecrease with increasing batch size for both the MLP and CNN. For alarger batch size, several input samples are transferred from the CPUto the GPU in one step, reducing the overall number of calls betweenthe two. Thus, on average, the processing time for an individualsample reduces when the batch size is increased. Moreover, a higherresolution generally implies an increase in computational cost, albeitbeing more pronounced in the case of the CNN than the MLP. Thenumber of input channels itself has very little eﬀect on our timings.Finally, we note that ICRS maps are slightly faster to process (inparticular for the higher resolutions), due to the fact that their sizeis smaller compared to the galactocentric maps (they have half theheight in bins). T i m e [ m s ] GC posGC pos+velICRS posICRS pos+vel

Figure B1.

CNN forward pass ( solid line ) and backward pass time ( dashedline ) per sample for the two-parameter experiments as a function of thebatch size for the four diﬀerent input conﬁgurations T1 (GC position), T2(GC position + velocity), T3 (ICRS position) and T4 (ICRS position +velocity).

B2 Timing for Two-parameter Predictions

Following the results for the single-parameter experiments, we re-strict our two-parameter predictions to the CNN model only and ﬁxthe resolution of the input maps to 128. The results of our timingbenchmarks using the training data-sets from simulation run S3 withthe four conﬁgurations T1, T2, T3 and T4 are shown in Fig. B1. Weagain report the timings for the forward and backward passes persample as a function of the batch size and the type of input channelsprovided. As for the single-parameter case, we conclude that usingICRS maps ensures the lowest forward and backward pass times.

MNRAS , 1–27 (2021) nalyzing the Galactic pulsar distribution with ML APPENDIX C: NEUTRON STARS WITH MEASUREDPROPER MOTION

In Table C, we report the properties of 417 neutron stars with mea-sured proper motions in RA and DEC. Data for these neutron starsare primarily collected from the ATNF catalogue (Manchester et al.2005). In some cases, updated estimates are available and those val-ues quoted and the corresponding references speciﬁed. Note that inthose cases, where multiple proper-motion estimates are available,we choose the ones with the lowest absolute error. The columnsreport in order: (i) the object name based on J2000 coordinates;(ii) the right ascension (RA) in hour angle and (iii) declination(DEC) in degrees with the last digit uncertainty given in parenthe-ses; the proper motion in (iv) RA and (v) DEC in milliarcsecondsper year with corresponding uncertainties; (vi) the parallax mea-sured in milliarcseconds with uncertainty where available; (vii) theposition epoch in modiﬁed Julian days; (viii) the spin period in sec-onds; (ix) the spin-period derivative in seconds over seconds; (x)the dispersion measure in [ pc cm − ] with the last digit uncertaintygiven in parentheses; (xi) the heliocentric distance derived fromthe 𝐷𝑀 using the YMW16 free-electron density model (for someobjects the 𝐷𝑀 exceeds the maximum Galactic 𝐷𝑀 allowed bythe YMW16 model, which assigns a default value of 25 kpc; whenavailable, we quote other distance estimates; * indicates a distancederived from other techniques especially for X-rays and gamma-ray sources, which have no measured 𝐷𝑀 ); (xii) the classiﬁcationof the object, i.e., radio pulsar (PSR), binary pulsar (binary PSR),gamma-/X-ray pulsar (Gamma-/X-ray PSR), magnetar (MAG), X-ray dim isolated neutron star (XDINS); if the object is associatedwith a globular cluster (GC) or the Small Magellanic Cloud (SMC)this is reported in brackets; and (xiii) the reference for the propermotion measurements, indicated only if diﬀerent from the ATNFcatalogue, i.e., [1] Motch et al. (2009), [2] Eisenbeiss et al. (2010),[3] Walter et al. (2010), [4] Stovall et al. (2014), [5] Jennings et al.(2018), [6] Perera et al. (2019), [7] Dang et al. (2020), [8] Danilenkoet al. (2020). REFERENCES

Agarwal D., Aggarwal K., Burke-Spolaor S., Lorimer D. R., Garver-DanielsN., 2020, MNRAS, 497, 1661Allen G., et al., 2019, preprint, p. arXiv:1902.00522Arzoumanian Z., Chernoﬀ D. F., Cordes J. M., 2002, ApJ, 568, 289Astropy Collaboration et al., 2013, A&A, 558, A33Astropy Collaboration et al., 2018, AJ, 156, 123Balakrishnan V., Champion D., Barr E., 2020, preprint, p. arXiv:2010.07457Ball N. M., Brunner R. J., 2010, Int. J. Mod. Phys. D, 19, 1049Balucinska-Church M., McCammon D., 1992, ApJ, 400, 699Baron D., 2019, preprint, p. arXiv:1904.07248Bates S. D., Lorimer D. R., Rane A., Swiggum J., 2014, MNRAS, 439, 2893Bisnovatyi-Kogan G. S., 1993, Astronomical and Astrophysical Transac-tions, 3, 287Bovy J., 2015, ApJS, 216, 29Brisken W. F., Benson J. M., Goss W. M., Thorsett S. E., 2002, ApJ, 571,906Brisken W. F., Fruchter A. S., Goss W. M., Herrnstein R. M., Thorsett S. E.,2003, AJ, 126, 3090Cabero M., Mahabal A., McIver J., 2020, ApJ, 904, L9Carlberg R. G., Innanen K. A., 1987, AJ, 94, 666Chatterjee S., et al., 2005, ApJ, 630, L61Chen B. Q., et al., 2019, MNRAS, 487, 1400 Cieślar M., Bulik T., Osłowski S., 2020, MNRAS, 492, 4043Coenen T., et al., 2014, A&A, 570, A60Cordes J. M., Lazio T. J. W., 2002, preprint, pp arXiv:astro–ph/0207156Dang S. J., et al., 2020, ApJ, 896, 140Danilenko A., Karpova A., Ofengeim D., Shibanov Y., Zyuzin D., 2020,MNRAS, 493, 1874Deller A. T., Tingay S. J., Bailes M., Reynolds J. E., 2009, ApJ, 701, 1243Deller A. T., et al., 2019, ApJ, 875, 100Dewey R. J., Cordes J. M., 1987, ApJ, 321, 780Eisenbeiss T., Ginski C., Hohle M. M., Hambaryan V. V., Neuhäuser R.,Schmidt T. O. B., 2010, AN, 331, 243Faucher-Giguère C.-A., Kaspi V. M., 2006, ApJ, 643, 332Fluke C. J., Jacobs C., 2020, WIREs Data Mining and Knowledge Discovery,10, e1349Fryer C. L., Kusenko A., 2006, ApJS, 163, 335Gerosa D., Pratten G., Vecchio A., 2020, Phys. Rev. D, 102, 103020Giacobbo N., Mapelli M., 2020, ApJ, 891, 141Gonthier P. L., Story S. A., Clow B. D., Harding A. K., 2007, Ap&SS, 309,245Good D. C., et al., 2020, preprint, p. arXiv:2012.02320Gullón M., Miralles J. A., Viganò D., Pons J. A., 2014, MNRAS, 443, 1891Hansen B. M. S., Phinney E. S., 1997, MNRAS, 291, 569Harris C. R., et al., 2020, Nature, 585, 357He C., Ng C. Y., Kaspi V. M., 2013, ApJ, 768, 64He K., Zhang X., Ren S., Sun J., 2015, preprint, p. arXiv:1502.01852Hernquist L., 1990, ApJ, 356, 359Hobbs G., Lyne A. G., Kramer M., Martin C. E., Jordan C., 2004, MNRAS,353, 1311Hobbs G., Lorimer D. R., Lyne A. G., Kramer M., 2005, MNRAS, 360, 974Hui C. Y., Becker W., 2006, A&A, 457, L33Hunter J. D., 2007, Comput. Sci. Eng., 9, 90Igoshev A. P., 2020, MNRAS, 494, 3663Janka H.-T., 2017, ApJ, 837, 84Jennings R. J., Kaplan D. L., Chatterjee S., Cordes J. M., Deller A. T., 2018,ApJ, 864, 26Johnston S., et al., 2008, Experimental Astronomy, 22, 151Jones E., Oliphant T. E., Peterson P., et al., 2001, SciPy: Open sourcescientiﬁc tools for Python,

Kaspi V. M., Roberts M. E., Vasisht G., Gotthelf E. V., Pivovaroﬀ M., KawaiN., 2001, ApJ, 560, 371Kiel P. D., Hurley J. R., 2009, MNRAS, 395, 2326Kiel P. D., Hurley J. R., Bailes M., Murray J. R., 2008, MNRAS, 388, 393Kingma D. P., Ba J., 2014, preprint, p. arXiv:1412.6980Lai D., Chernoﬀ D. F., Cordes J. M., 2001, ApJ, 549, 1111Lam S. K., Pitrou A., Seibert S., 2015, in Proceedings of the SecondWorkshop on the LLVM Compiler Infrastructure in HPC. LLVM’15. Association for Computing Machinery, New York, NY, USA, https://doi.org/10.1145/2833157.2833162

Lentati L., et al., 2016, MNRAS, 458, 2161Levin L., et al., 2013, MNRAS, 434, 1387Li C., Zhao G., Jia Y., Liao S., Yang C., Wang Q., 2019, ApJ, 871, 208Lin H., Li X., Luo Z., 2020, MNRAS, 493, 1842Liu J., Yan Z., Shen Z.-Q., Huang Z.-P., Zhao R.-S., Wu Y.-J., Yuan J.-P.,Wu X.-J., 2020, PASJ, 72, 70Lorimer D. R., Kramer M., 2004, Handbook of Pulsar Astronomy. Cam-bridge University PressLorimer D. R., et al., 2006, MNRAS, 372, 777Manchester R. N., et al., 2001, Monthly Notices of the Royal AstronomicalSociety, 328, 17Manchester R. N., Hobbs G. B., Teoh A., Hobbs M., 2005, AJ, 129, 1993Mandel I., Müller B., 2020, MNRAS, 499, 3214Marchetti T., Rossi E. M., Brown A. G. A., 2019, MNRAS, 490, 157Matthews A. M., et al., 2016, ApJ, 818, 92McKinney W., 2010, in van der Walt S., Millman J., eds, Proceedings of the9th Python in Science Conference. pp 51–56, http://conference.scipy.org/proceedings/scipy2010/mckinney.html

Melrose D. B., 2017, Reviews of Modern Plasma Physics, 1, 5Miyamoto M., Nagai R., 1975, PASJ, 27, 533MNRAS000

Melrose D. B., 2017, Reviews of Modern Plasma Physics, 1, 5Miyamoto M., Nagai R., 1975, PASJ, 27, 533MNRAS000 , 1–27 (2021) Ronchi et al.

Morawski F., Bejger M., 2020, A&A, 642, A78Motch C., Pires A. M., Haberl F., Schwope A., Zavlin V. E., 2009, A&A,497, 423Nagakura H., Sumiyoshi K., Yamada S., 2019, ApJ, 880, L28Narayan R., Ostriker J. P., 1990, ApJ, 352, 222Navarro J. F., Frenk C. S., White S. D. M., 1996, ApJ, 462, 563Oliphant T. E., 2006, A guide to NumPy. USA: Trelgol PublishingOsłowski S., Bulik T., Gondek-Rosińska D., Belczyński K., 2011, MNRAS,413, 461Parthasarathy A., et al., 2019, MNRAS, 489, 3810Paszke A., et al., 2019, preprint, p. arXiv:1912.01703Pavan L., et al., 2014, A&A, 562, A122Perera B. B. P., et al., 2019, MNRAS, 490, 4666Perez F., Granger B. E., 2007, Comput. Sci. Eng., 9, 21Podsiadlowski P., Langer N., Poelarends A. J. T., Rappaport S., Heger A.,Pfahl E., 2004, ApJ, 612, 1044Polido P., Jablonski F., Lépine J. R. D., 2013, ApJ, 778, 32Rawat W., Wang Z., 2017, Neural Computation, 29, 1Reid M. J., et al., 2019, ApJ, 885, 131Rozwadowska K., Vissani F., Cappellaro E., 2021, New Astron., 83, 101498Sartore N., Ripamonti E., Treves A., Turolla R., 2010, A&A, 510, A23Schwab J., Podsiadlowski P., Rappaport S., 2010, ApJ, 719, 722Shklovskii I. S., 1970, Soviet Ast., 13, 562Skliris V., Norman M. R. K., Sutton P. J., 2020, preprint, p.arXiv:2009.14611Skowron D. M., et al., 2019, Science, 365, 478Smits R., Kramer M., Stappers B., Lorimer D. R., Cordes J., Faulkner A.,2009, A&A, 493, 1161Smits R., Tingay S. J., Wex N., Kramer M., Stappers B., 2011, A&A, 528,A108Spitzer Lyman J., 1942, ApJ, 95, 329Stovall K., et al., 2014, ApJ, 791, 67Tamborra I., Hanke F., Janka H.-T., Müller B., Raﬀelt G. G., Marek A.,2014, ApJ, 792, 96Tauris T. M., Manchester R. N., 1998, MNRAS, 298, 625Tauris T. M., Langer N., Moriya T. J., Podsiadlowski P., Yoon S. C., BlinnikovS. I., 2013, ApJ, 778, L23Tauris T. M., et al., 2015, in Advancing Astrophysics with the Square Kilo-metre Array (AASKA14). p. 39 ( arXiv:1501.00005 )Taylor J. H., Cordes J. M., 1993, ApJ, 411, 674Vallée J. P., 2017, Astron. Rev., 13, 113Verbunt F., Igoshev A., Cator E., 2017, A&A, 608, A57Viganò D., Rea N., Pons J. A., Perna R., Aguilera D. N., Miralles J. A.,2013, MNRAS, 434, 123Virtanen P., et al., 2020, Nature Methods, 17, 261Wainscoat R. J., Cohen M., Volk K., Walker H. J., Schwartz D. E., 1992,ApJS, 83, 111Walter F. M., Eisenbeiß T., Lattimer J. M., Kim B., Hambaryan V., NeuhäuserR., 2010, ApJ, 724, 669Wang J. B., et al., 2017, MNRAS, 469, 425Wei W., Huerta E. A., 2020, preprint, p. arXiv:2010.09751Wong K. W. K., Gerosa D., 2019, Phys. Rev. D, 100, 083015Yadan O., 2019, Hydra - A framework for elegantly conﬁguring complexapplications, Github, https://github.com/facebookresearch/hydra

Yao J. M., Manchester R. N., Wang N., 2017, ApJ, 835, 29Yusifov I., Küçük I., 2004, A&A, 422, 545van der Walt S., Colbert S. C., Varoquaux G., 2011, Comput. Sci. Eng., 13,22 MNRAS , 1–27 (2021) na l y z i ng t h e G a l a c ti c pu l s a r d i s t r i bu ti on w it h M L Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J0014+4746 00:14:17.75(4) +47:46:33.4(3) . ± . − . ± . _ 49664 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) ± . − ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 51600 . . × − . ( ) . ± . − . ± . _ 50000 . − . × − . ( ) . ± . − . ± . _ 54000 . . × − . ( ) . ± . − . ± . _ 51600 . − . × − . ( ) ± ± _ 51600 . − . × − . ( ) − . ± . . ± . . ± . . . × − . ( ) . ± . − . ± . _ 55000 . . × − . ( ) . ± . − . ± .

16 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) . ± . − . ± .

099 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) ± − ± _ 55520 . . × − . ( ) . ± .

07 1 . ± .

15 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) − . ± .

03 35 . ± .

04 0 . ± . . . × − . ( ) ± − ± _ 54428 . . × − . ( )

25 PSRJ0139+5814 01:39:19.7401(12) +58:14:31.819(17) − . ± . − . ± .

07 0 . ± . . . × − . ( ) − . ± . ± _ 49031 . . × − _ 3.6* MAGJ0147+5922 01:47:44.6434(1) +59:22:03.284(1) − . ± .

092 3 . ± .

075 0 . ± . . . × − . ( ) . ± . − . ± .

069 0 . ± . . . × − . ( )

25 PSRJ0152-1637 01:52:10.8539(1) -16:37:53.641(2) . ± . − . ± .

368 0 . ± . . . × − . ( ) . ± . − . ± . _ 56900 . . × − . ( ) . ± .

06 44 . ± .

04 0 . ± . . . × − . ( ) − . ± .

16 0 . ± . _ 55487 . . × − . ( ) ± ± _ 57600 . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) ±

10 48 ± _ 54000 . . × − ( )

2* PSRJ0255-5304 02:55:56.2925(9) -53:04:21.275(9) ± ± _ 57600 . . × − . ( ) ± − ± _ 49289 . . × − . ( ) . ± . − . ± .

02 1 . ± . . . × − . ( ) . ± . − . ± .

05 0 . ± . . . × − . ( ) − . ± . − . ± .

12 0 . ± . . . × − . ( ) . ± . − . ± .

42 0 . ± . . . × − . ( ) ± . − . ± . . ± . . . × − . ( ) . ± .

16 3 . ± . _ 56000 . . × − . ( ) ±

20 115 ± _ 54946 . . × − _ 0.83* Gamma-/X-ray PSR M N R A S , ( ) R on c h i e t a l . Table C1:

Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J0357+5236 03:57:44.8403(2) +52:36:57.493(1) . ± . − . ± .

078 0 . ± . . . × − . ( ) . ± . . ± . . ± . . . × − . ( ) . ± . . ± .

08 0 . ± . . . × − . ( ) . ± . − . ± .

002 6 . ± . . . × − . ( ) . ± . ± . ± . . . × − . ( ) − . ± . − ± . _ 56400 . . × − . ( ) . ± . − . ± .

14 0 . ± . . . × − . ( ) − ± ± _ 48717 . . × − . ( ) . ± . − . ± . _ 57384 . . × − . ( ) . ± . − . ± . _ 53623.16 . . × − . ( ) ±

12 19 ± _ 51216 . . × − . ( ) ± − ± _ 48262 . . × − . ( ) − ±

19 7 ± _ 53100 . . × − . ( ) − . ± . ± . _ 40706 . . × − . ( ) ±

40 65 ± _ 57600 . . × − . ( ) − . ± . . ± . . ± . . . × − . ( ) ± ± _ 55580 . . × − . ( ) − . ± . − . ± .

09 0 . ± . . . × − . ( ) . ± .

08 16 . ± . _ 55000 . . × − . ( ) . ± . − . ± .

01 0 . ± . . . × − . ( ) − . ± . − . ± .

04 0 . ± . . . × − . ( ) . ± . − . ± .

12 1 . ± . . . × − . ( ) . ± . − . ± . _ 55000 . . × − . ( ) . ± . − . ± .

08 0 . ± . . . × − . ( ) − . ± .

99 21 . ± .

52 3 . ± . . . × − . ( ) − ± _ ± _ _ _ . . × − _ 1.35* Gamma-/X-ray PSR [8]J0633+1746 06:33:54.1530(28) +17:46:12.909(4) ± ± . ± . . . × − . ( ) . ± . ± . . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) ± − ± _ 48712 . . × − . ( ) . ± . − . ± .

29 3 . ± . . . × − . ( ) − . ± . − . ± . _ 56983 . . × − . ( ) − . ± .

02 14 . ± . _ 54500 . . × − . ( ) − . ± . . ± . . ± . . . × − _ 0.4* XDINS [2]J0729-1836 07:29:32.3369(1) -18:36:42.244(2) − . ± .

11 13 . ± .

44 0 . ± . . . × − . ( ) − . ± .

62 2 . ± .

23 0 . ± . . . × − . ( ) − . ± .

62 2 . ± .

23 0 . ± . . . × − . ( ) − ± . ± _ 57600 . . × − . ( ) − ± ± _ 49326 . . × − . ( ) − ±

10 50 ± _ 55129 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − ± ± _ 48725 . . × − . ( ) − ± ± _ 49896 . . × − . ( ) − ± ± _ 57600 . . × − . ( ) . ± . − ± . . ± . . . × − . ( ) . ± . − . ± .

05 0 . ± . . . × − . ( ) − . ± . − . ± _ 53454 . . × − _ 2.2* Gamma-/X-ray PSRJ0823+0159 08:23:09.7651(1) +01:59:12.469(1) − . ± .

244 0 . ± .

256 0 . ± . . . × − . ( ) − . ± . − . ± . _ 56600 . . × − . ( ) ± . − . ± .

07 2 . ± . . . × − . ( ) − . ± .

06 29 . ± . . ± . . . × − . ( ) ± ± _ 48721 . . × − . ( ) ± − ± _ 57600 . . × − . ( ) ± − ± _ 57600 . . × − . ( ) − . ± .

05 2 . ± .

06 0 . ± . . . × − . ( ) M N R A S , ( ) na l y z i ng t h e G a l a c ti c pu l s a r d i s t r i bu ti on w it h M L Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J0908-1739 09:08:38.1822(11) -17:39:37.67(3) ± − ± _ 48737 . . × − . ( ) . ± . . ± . . ± . . . × − . ( ) − ± ± _ 57600 . . × − . ( ) ±

16 9 ± _ 48865 . . × − . ( ) − ± − ± _ 49337 . . × − . ( ) − ± − ± _ 48483 . . × − . ( ) − . ± .

08 29 . ± .

07 3 . ± . . . × − . ( ) . ± . − . ± .

01 0 . ± . . . × − . ( ) − . ± .

06 6 . ± .

05 3 . ± . . . × − . ( ) − . ± .

041 5 . ± .

034 1 . ± . . . × − . ( ) . ± . − . ± .

04 0 . ± . . . × − . ( ) − . ± . − . ± .

04 0 . ± . . . × − . ( ) − ± ± _ 55716 . . × − . ( ) − ± ± _ 48738 . . × − . ( ) − . ± .

06 5 . ± . . ± . . . × − . ±

54 34 ± _ 51019 . . × − . ( ) − . ± − . ± . . ± . . . × − . ( ) ± − ± _ 57600 . . × − . ( ) ± − ± _ 49334 . . × − . ( ) − ± ± _ 51360 . . × − . ( ) − ± ± _ 57600 . . × − . ( ) − ± . . ± . _ 55126.3 . . × − . ( ) − . ± .

02 366 . ± .

06 2 . ± . . . × − . ( ) ±

17 43 ± _ 57600 . . × − . ( ) . ± . ± . _ 54637 . . × − . ( ) − ± − ± _ 56315 . . × − . ( ) − ± ± _ 57600 . . × − . ( ) ± ± _ 55191 . . × − . ( ) − . ± . . ± .

11 0 . ± . . . × − . ( ) − . ± .

26 6 . ± . _ 55100 . . × − . ( ) − . ± . . ± . . ± . . . × − . ( ) ± ± _ 49020 . . × − . ( ) − . ± .

122 12 . ± .

114 0 . ± . . . × − . ( )

25 PSRJ1300+1240 13:00:03.5767(1) +12:40:56.4721(3) . ± . − . ± .

07 1 . ± . . . × − . ( ) − . ± . − . ± .

044 0 . ± . . . × − . ( ) − ±

20 84 ± _ 53415.68 . . × − _ _ XDINS [1]J1311-1228 13:11:52.656(9) -12:28:01.1(3) − ± − ± _ 54371 . . × − . ( )

25 PSR [7]J1312+0051 13:12:46.64232(9) +00:51:00.104(3) − . ± . − . ± . _ 55600 . . × − . ( ) ± ± _ 57600 . . × − . ( ) − . ± .

087 32 . ± .

126 0 . ± . . . × − . ( ) ±

18 95 ± _ 55131 . . × − . ( )

25 PSRJ1328-4357 13:28:06.432(1) -43:57:44.12(1) ± ± _ 51360 . . × − . ( ) − ± ± _ 55692.7 . . × − . ( ) − ± − ± _ 57617 . . × − . ( ) − . ± . − . ± .

87 0 . ± . . _ . ( ) − ± ± _ 57600 . . × − . ( ) − ± − ± _ 57600 . . × − . ( ) − . ± .

33 14 . ± .

26 0 . ± . . . × − . ( ) − ± . − ± . _ 55647.8 . . × − . ( ) − ± − . ± . _ 55433 . . × − . ( ) . ± . − . ± .

09 0 . ± . . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − ± − ± _ 55421.2 . . × − . ( ) − . ± . − . ± .

09 0 . ± . . . × − . ( ) − . ± . − ± _ 55520 . . × − . ( ) M N R A S , ( ) R on c h i e t a l . Table C1:

Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J1518+0204C 15:18:32.788893(21) +02:04:47.8153(8) . ± . − . ± . _ 52850 . . × − . ( ) − . ± . − . ± . _ 52000 . . × − . ( ) − ± − ± _ 54217 . . × − ( ) . ± .

104 18 . ± .

111 0 . ± . . . × − . ( ) − ± − ± _ 57600 . . × − . ( ) . ± . − . ± .

012 0 . ± . . . × − . ( ) ± − ± _ 55408 . . × − . ( ) − . ± . − . ± .

131 0 . ± . . . × − . ( ) − . ± . − . ± .

07 0 . ± . . . × − . ( ) − . ± . − ± _ 55522 . . × − . ( ) . ± . − . ± . _ 54795 . . × − ( ) − ± − ± _ 54033 . . × − . ( ) − ± ± _ 48017 . . × − . ( ) . ± .

14 13 . ± .

05 0 . ± . . . × − . ( ) − . ± . − . ± .

05 0 . ± . . . × − . ( ) − ± − ± _ 57600 . . × − . ( ) − ± ± _ 50719 . . × − . ( ) − . ± . − . ± .

04 1 . ± . . . × − . ( ) − ± ± _ 57600 . . × − . ( ) − ±

16 142 ± _ 52111 . _ _ _ XDINSJ1607-0032 16:07:12.0598(2) -00:32:41.527(2) − . ± . − . ± .

211 0 . ± . . . × − . ( ) − ±

12 58 ± _ 54192 . . × − . ( ) . ± . − ± . ± . . . × − . ( ) − . ± . . ± . _ 56769 . . × − . − ± ± _ 55253.1 . . × − . ( ) − . ± .

126 23 . ± .

118 0 . ± . . . × − . ( ) − . ± − ± _ 48725 . . × − . ( ) − . ± − . ± .

82 1 . ± . . _ . . ± . − . ± . _ 56136 . . × − . ( ) . ± . − . ± .

02 0 . ± . . . × − . ( ) − ± ± _ 56425 . . × − . ( ) . ± .

03 4 . ± . . ± . . . × − . − . ± .

027 20 . ± .

176 0 . ± . . . × − . ( ) − . ± . − . ± . _ 50862 . . × − . ( ) . ± . . ± . _ 55520 . . × − . ( ) − . ± .

08 16 . ± .

19 0 . ± . . . × − . ( ) − ± − ± _ 48733 . . × − . ( ) − ± ± _ 54565 . . × − . ( ) − . ± . − ± _ 55132.9 . . × − . ( ) − ± ± _ 46993 . . × − . ( ) − . ± . − . ± . _ 51145 . . × − . ( ) − . ± . − . ± . _ 56850 . . × − . ( ) . ± . − . ± .

002 0 . ± . . . × − . ( ) . ± . − ± _ 55235.5 . . × − . ( ) − ± − ± _ 49429 . . × − . ( ) ± − ± _ 55000 . . × − . ( ) − ± − ± _ 57600 . . × − . ( ) − ± − ± _ 53882 . . × − . ( ) − . ± . − . ± .

062 1 . ± . . . × − . ( ) . ± . ± _ 54723 . . × − . ( ) . ± . − ± . ± . . . × − . − . ± . − ± _ 55215.1 . . × − . ( ) ± − ± _ 53000 . . × − . ( ) − . ± . − . ± . _ 54500 . . × − . ± ± _ 55359 . . × − . ( ) M N R A S , ( ) na l y z i ng t h e G a l a c ti c pu l s a r d i s t r i bu ti on w it h M L Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J1735-0724 17:35:04.9730(1) -07:24:52.130(1) . ± .

06 20 . ± .

06 0 . ± . . . × − . ( ) . ± .

04 5 ± . . ± . . . × − . ( ) ± ± _ 54598 . . × − . ( ) − ± − ± _ 48262 . . × − . ( ) . ± . . ± .

09 0 . ± . . . × − . ( ) − . ± . − . ± .

04 0 . ± . . . × − . ( ) − ± − ± _ 54933 . . × − . ( ) ± − ± _ 57600 . . × − . ( ) − ± − ± _ 50241 . . × − . ( ) . ± . − . ± .

02 2 . ± . . . × − . − ± − ± _ 55716 . . × − _ _ Gamma-/X-ray PSRJ1745-0952 17:45:09.1348(2) -09:52:39.682(15) − . ± . ± _ 53200 . . × − . ( ) ± − ± _ 55400 . . × − . ( ) . ± .

32 5 . ± . _ 56899 . . × − ( ) ± ± _ 55276 . . × − . ( ) − . ± . − ± . ± . . . × − . ( ) − . ± . − ± _ 55000 . . × − . ( ) − ± − ± _ 57600 . . × − . ( ) − ± − ± _ 46483 . . × − . ( ) − . ± .

05 1 . ± .

07 0 . ± . . . × − . ( ) − . ± .

08 0 ±

20 1 . ± . . . × − . ( ) − . ± . − ± _ 55000 . . × − . ( ) − ± − ± _ 53348 . . × − . ( ) − ± − ± _ 55001.9 . − . × − . ( ) − . ± . − ± . ± . . . × − . ( ) . ± . . ± . _ 51544 . . × − . ( ) . ± . − ± _ 55000 . . × − . ( ) − ± ± _ 48244 . . × − . ( ) − . ± . − . ± _ 53254 . . × − _ 13* MAGJ1809-1917 18:09:43.136(2) -19:17:38.1(5) − ± ± _ 55366 . . × − . ( ) − . ± . − . ± _ 55444 . . × − ( ) ± − ± _ 55770 . . × − _ 0.88* Gamma-/X-ray PSRJ1810+1744 18:10:37.28038(1) 17:44:37.4719(10) . ± . − . ± . ± . . _ . ± ± _ 53200 . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) − . ± . − ± _ 54058 . . × − . ( ) − . ± . − . ± .

33 0 . ± . . . × − . ( ) ± − ± _ 57600 . . × − . ( ) − . ± .

07 15 . ± .

08 0 . ± . . . × − . ( ) − ±

16 21 ± _ 54607 . . × − . ( ) ± − ± _ 48713 . . × − . ( ) . ± . − . ± . _ 55049 . . × − . ( ) ± ± _ 50093 . . × − . ( ) . ± ± _ 55291 . . × − . ( ) − . ± . − ± _ 54500 . . × − . − ± − ± _ 57600 . . × − . ( ) . ± . − . ± . _ 55314 . . × − . ( ) ± . − . ± . _ 52400 . . × − . ( ) ± − ± _ 49878 . . × − . ( ) − ± ± _ 55397 . . × − . ( ) − . ± .

09 15 ± . . ± . . . × − . ( ) ± ± _ 54369 . . × − . ( ) ±

46 56 ± _ 55429 . . × − . ( ) − ± − ± _ 49532 . . × − . ( ) ±

65 12 ± _ 48880 . . × − . ( ) M N R A S , ( ) R on c h i e t a l . Table C1:

Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J1839-0643 18:39:09.788(7) -06:43:44.5(4) ± − ± _ 54603 . . × − . ( ) − . ± . − . ± .

06 0 . ± . . . × − . ( ) − ± ± _ 48266 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) . ± .

19 12 ± _ 53934 . . × − . ( ) − ±

10 45 ± _ 49362 . . × − . ( ) − ± ± _ 57600 . . × − . ( ) ± − ± _ 53868 . . × − . ( ) − . ± . − . ± .

04 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − _ 0.16* XDINS [3]J1857+0526 18:57:15.851(2) +05:26:28.64(2) ± − ± _ 54619 . . × − . ( ) − . ± . − . ± .

006 0 . ± . . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − ± − ± _ 54336 . . × − . ( ) ± − ± _ 54537 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − ± − ± _ 54288 . . × − . ( ) − . ± . − . ± . _ 55520 . . × − . ( ) ± − ± _ 57600 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − . ± . − ± _ 56526 . . × − . ( ) ± − ± _ 54377 . . × − . ( ) − . ± . − . ± . _ 53700 . . × − . ( ) − ± − ± _ 49466 . . × − . ( ) − ± − ± _ 58100 . . × − . ( ) . ± . . ± . _ 51052 . . × − _ _ MAGJ1907+4002 19:07:34.656(8) +40:02:05.71(11) ± ± _ 48713 . . × − . ( ) ± − ± _ 48740 . . × − . ( ) − . ± . − . ± .

006 0 . ± . . . × − . . ± . − . ± .

05 0 . ± . . . × − . ( ) − . ± . − . ± . _ 51920 . . × − . ( ) − . ± . − . ± . _ 51910 . . × − . ( ) − . ± . − . ± . _ 55000 . . × − . ( ) . ± . − . ± .

05 0 . ± . . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) ± − ± _ 46634 . . × − . ( ) − . ± . − . ± .

05 0 . ± . . . × − . ( ) ± − ± _ 57600 . . × − . ( ) − . ± . − . ± . _ 52984 . . × − . ( ) − ± − ± _ 54587 . . × − . ( ) − . ± .

05 3 . ± .

06 0 . ± . . . × − . ( ) − . ± . − . ± .

05 0 . ± . . . × − . ( ) ± − ± _ 49690 . . × − . ( ) . ± . − . ± .

09 0 . ± . . . × − . ( ) − . ± . − ± _ 58137 . . × − . ( ) ± ± _ 48999 . . × − . ( ) − . ± . − . ± . − . ± . . . × − . ( ) ± − ± _ 54606 . . × − . ( ) . ± .

11 42 . ± .

16 2 . ± . . . × − . ( ) − . ± .

07 10 . ± . _ 53000 . . × − . ( ) . ± . − . ± .

15 0 . ± . . . × − . ( ) − . ± . − ± _ 57516 . . × − . ( ) − . ± . − . ± .

04 0 . ± . . . × − . ( ) . ± . − . ± .

003 0 . ± . . . × − . ( ) ± − ± _ 50076 . . × − . ( ) M N R A S , ( ) na l y z i ng t h e G a l a c ti c pu l s a r d i s t r i bu ti on w it h M L Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J1943-1237 19:43:25.480(4) -12:37:43.10(3) − ± − ± _ 54561 . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) ± − ± _ 48790 . . × − . ( ) ± − ± _ 48736 . . × − . ( ) − . ± .

23 4 . ± . _ 56819 . . × − . ( ) − . ± . . ± . _ 49449 . . × − . ( ) − ± − ± − ± . . × − . ( ) − . ± . − . ± . _ 56000 . . × − . ( ) − ± ± _ 55813 . . × − . ( ) − . ± . − . ± . _ 52793 . . × − . ( ) ± − ± _ 48719 . . × − . ( ) − . ± . − . ± . _ 54800 . . × − . ( ) − . ± . − . ± . − ± . . × − . ( ) − ± ± _ 48741 . . × − . ( ) − ± ± _ 56600 . . × − . ( ) − . ± − . ± . . ± . . . × − . ( ) − ± . − . ± . _ 48196 . . × − . ( ) − ± − ± _ 54539 . . × − . ( ) − . ± . − . ± .

15 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) − . ± . − ± _ 57949 . . × − . ( ) − . ± . − ± _ 49718 . . × − . ( ) . ± .

26 2 . ± . _ 53000 . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − . ± . − . ± . _ 48900 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − . ± .

17 11 . ± . . ± . . . × − . ( ) ± − ± _ 54150 . . × − . ( ) − . ± . − . ± .

055 0 . ± . . . × − . ( ) − . ± . − ± . _ 56945 . . × − . ( ) . ± . − . ± . _ 56520 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − . ± . − . ± . _ 56855 . . × − . ( ) . ± . − . ± .

22 0 . ± . . . × − . ( ) − . ± .

06 0 . ± .

06 0 . ± . . . × − . ( ) . ± . − . ± .

28 1 . ± . . . × − . ( ) . ± .

04 2 . ± . _ 55655 . . × − . ( ) − ± . . ± . _ 53400 . . × − . ( ) − . ± . − . ± . _ 56917.5915 . . × − . ( ) ± − ± _ 54546 . . × − . ( ) . ± . − . ± .

13 0 . ± . . . × − . ( ) . ± .

03 0 . ± . _ 57900 . . × − . ( ) . ± . . ± . _ 48736 . . × − . ( ) − . ± . − . ± .

07 0 . ± . . . × − . ( ) . ± . . ± .

08 0 . ± . . . × − . ( ) ± − ± _ 48471 . . × − . ( )

25 PSRJ2124-3358 21:24:43.849372(1) -33:58:44.8500(3) − . ± . − . ± .

07 2 . ± . . . × − . . ± .

15 10 . ± .

14 0 . ± . . _ . − . ± . − . ± . _ 55146 . − . × − . ( ) . ± . − . ± . _ 50000 . . × − . ( ) − . ± . − . ± _ 50000 . . × − . ( ) . ± . − . ± .

03 0 . ± . . . × − . − . ± . − . ± .

54 6 . ± . . . × − . ( ) − . ± . − . ± .

07 1 . ± . . . × − . M N R A S , ( ) R on c h i e t a l . Table C1:

Up-to-date list of 417 neutron stars with measured proper motions in RA and DEC.

JName RA DEC 𝜇 RA 𝜇 DEC parallax Pos. Epoch 𝑃 (cid:164) 𝑃 𝐷𝑀 𝑑 (cid:12)

Class(Assoc.) Ref.[hms] [ ◦ ’ ”] [mas yr − ] [mas yr − ] [mas] [MJD] [s] [s s − ] [pc cm − ] [kpc]J2149+6329 21:49:58.7033(2) +63:29:44.277(2) . ± .

11 11 . ± .

19 0 . ± . . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) . ± . . ± .

12 0 . ± . . . × − . ( ) ± − ± _ 56805 . . × − . ( ) − . ± . − . ± .

11 0 . ± . . . × − . ( )

25 PSRJ2214+3000 22:14:38.8525(57) +30:00:38.193(97) . ± . − . ± . . ± . . . × − . ( ) . ± .

54 1 . ± . . ± . . . × − . ( ) − ± − ± _ 46599 . . × − . ( ) . ± . − . ± .

06 3 . ± . . . × − . . ± .

23 126 . ± . . ± . . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) . ± .

02 9 . ± .

05 1 . ± . . . × − . ( ) ± ± _ 49250 . . × − . ( ) − . ± . − ± . _ 54000 . . × − . ( ) . ± . − . ± . _ 55044.2 . . × − . ( ) − . ± . − . ± .

19 0 . ± . . . × − . ( )

25 PSRJ2301+5852 23:01:08.29(33) +58:52:44.45(60) − . ± . − . ± . _ 54852 . . × − _ 3.3* MAGJ2302+4442 23:02:46.9783(78) +44:42:22.085(95) − ± − ± ± . . × − . ( ) − . ± . − . ± .

11 0 . ± . . . × − . ( ) − ± ± _ 48717 . . × − . ( ) . ± . . ± .

13 0 . ± . . . × − . ( ) − . ± . . ± . . ± . . . × − . ( ) . ± .

07 0 . ± .

14 0 . ± . . . × − . ( ) − ± − ± _ 49303 . . × − . ( ) − . ± . − . ± . _ 55000 . . × − . ( ) − . ± . − . ± . . ± . . . × − . ( ) − . ± . − . ± .

19 0 . ± . . . × − . ( ) . ± . ± _ 49878 . . × − . ( ) − ± − ± _ 53100 . . × − . ( ) . ± . − . ± . . ± . . . × − . ( ) . ± . − . ± .

09 0 . ± . . . × − . ( ) . ± .

05 4 . ± .

025 0 . ± . . . × − . ( ) M N R A S , ( ) nalyzing the Galactic pulsar distribution with ML This paper has been typeset from a TEX/L A TEX ﬁle prepared by the author.MNRAS000