Nested Sampling with Normalising Flows for Gravitational-Wave Inference
Michael J. Williams, John Veitch, and Chris Messenger
SUPA, School of Physics and Astronomy, University of Glasgow, Glasgow G12 8QQ, United Kingdom
(Dated: February 23, 2021)

We present a novel method for sampling iso-likelihood contours in nested sampling using a type of machine learning algorithm known as normalising flows and incorporate it into our sampler Nessai. Nessai is designed for problems where computing the likelihood is computationally expensive and therefore the cost of training a normalising flow is offset by the overall reduction in the number of likelihood evaluations. We validate our sampler on 128 simulated gravitational-wave signals from compact binary coalescence and show that it produces unbiased estimates of the system parameters. Subsequently, we compare our results to those obtained with dynesty and find good agreement between the computed log-evidences whilst requiring 2.32 times fewer likelihood evaluations. We also highlight how the likelihood evaluation can be parallelised in Nessai without any modifications to the algorithm. Finally, we outline the diagnostics included in Nessai and how these can be used to tune the sampler's settings.
I. INTRODUCTION
Gravitational-wave astronomy has contributed to our understanding of physics and astrophysics from the atomic scale up to the scale of the Universe [1–3]. This is set to continue as the detectors of the LIGO-Virgo-KAGRA (LVK) Collaboration continue to improve in sensitivity [4] and new detectors come online [5], increasing the potential of detecting previously unseen types of sources. Prior to the recent third observing run there were 11 confirmed detections of gravitational waves (GWs) from compact binary coalescence (CBC) [6], including the first multi-messenger detection [7]. The first half of the last year-long observing run has resulted in a further 39 candidate detections [8]. Each of the candidate events requires extensive analysis to determine whether it is astrophysical in origin and, consequently, to allow us to understand the astrophysics that produces such phenomena. The nature of these analyses and the ever increasing number of candidates necessitate more efficient analysis techniques.

Our understanding of the sources that produce the detected GWs hinges on the ability to infer the parameters that describe them. This inference is carried out in a Bayesian framework [9, 10] centred around Bayes' theorem, which allows prior knowledge to be updated with observations to obtain posterior distributions, or more explicitly:

$p(\theta | d, H) = \frac{p(d | \theta, H)\, p(\theta | H)}{p(d | H)}$, (1)

where θ are the parameters that describe the model H, d is the observed data, p(θ|d, H) is the posterior, p(d|θ, H) is the likelihood, p(θ|H) is the prior and p(d|H) is the evidence. Whilst computing the posterior is trivial in lower dimensions, the gravitational-wave signals from compact binary coalescence are described by a minimum of 15 parameters and the resulting parameter space can be multi-modal and highly correlated [11]. Typically, this requires applying stochastic sampling techniques such as Markov chain Monte Carlo (MCMC) [12] and nested sampling [13]. These techniques are computationally expensive and their cost is directly related to the cost of evaluating the likelihood and the amount of data being analysed. There have been various efforts to reduce this cost by means such as re-parameterisations [14], reduced order methods [15, 16], parallel sampling methods [17–19] and problem-specific sampling algorithms [11, 20–22].

Machine learning algorithms have successfully been applied to various aspects of gravitational-wave data analysis [23], including data-quality improvement, waveform modelling, searches and, most relevant to this work, parameter estimation [24–27]. These algorithms, once trained, greatly reduce the cost of computing posterior distributions. However, they are still in their infancy and there are various challenges that have yet to be solved; for example, they are constrained by the distribution of training data and cannot reliably be applied to data that is outside that distribution, e.g. with different detector noise curves or longer duration signals.

Normalising flows are a type of generative machine learning algorithm that have recently been applied to a wide variety of problems including generation, inference and representation learning [28, 29]. The fundamental idea behind normalising flows is to model a complex probability distribution as the transformation of a simple distribution. This transformation is constructed to be highly flexible but maintain an explicit mathematical description with a tractable Jacobian.
This makes them well suited to applications in the physical sciences, where the apparent opacity of other machine learning algorithms can hinder their widespread adoption.

In this paper we propose a modified nested sampling algorithm that incorporates a novel proposal method using normalising flows. During sampling, the normalising flow is trained such that the resulting distribution can be re-sampled according to the prior to produce new live points within a given iso-likelihood contour which are independent of the current worst point. This is akin to other nested sampling algorithms which sample from the constrained prior, thus avoiding the need to evolve points with a random walk. These algorithms typically require fewer evaluations of the likelihood compared to those that use random walks, which reduces the computational cost. However, the challenge when implementing these algorithms is determining the constrained prior, which often requires defining multiple bounding distributions. Using a normalising flow allows us to use a single bounding distribution, but comes at the cost of training during sampling. This additional cost is easily offset in problems where there is significant computational cost associated with computing the likelihood. As such, we choose to apply our algorithm to gravitational-wave inference.

This paper is structured as follows: in section II we outline the background theory for gravitational-wave inference, nested sampling and normalising flows. We then introduce our method in section III and discuss related work in section IV. In section V we present results for gravitational-wave parameter estimation, compare to a commonly used nested sampler, and discuss the implications of our results. Finally, in section VI we summarise our results and draw conclusions.

II. BACKGROUND

A. Gravitational-wave likelihood for compact binary coalescence
The strain data from gravitational-wave interferometers are modelled as a signal with additive noise in the time domain [11]. The standard definition of the likelihood for compact binary coalescence then assumes that the noise is Gaussian and stationary over short observation times and that the power spectral density $S_n$ is independent of the model parameters. This allows us to define the likelihood as an independent product in the frequency domain:

$p(d | \theta, H) \propto \exp\left(\sum_i \left[-\frac{2\,|\tilde{h}_i(\theta) - \tilde{d}_i|^2}{T\, S_n(f_i)}\right]\right)$, (2)

where $\tilde{h}_i(\theta)$ and $\tilde{d}_i$ are the waveform model and data in the frequency domain and T is the duration of the data. There are various different waveform models h(θ), which are characterised by sets of parameters. The exact number of parameters depends on the physics that the model describes; if both compact objects are assumed to be black holes, then typically 15 parameters are required [11]. Different parameterisations have proven to improve sampling efficiency and convergence, most notably reparameterising the component masses in terms of the chirp mass $\mathcal{M} = (m_1 m_2)^{3/5}(m_1 + m_2)^{-1/5}$ and the asymmetric mass ratio $q = m_2/m_1$ [11], and using the system frame when describing the orientation of the binary [14].
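To make eq. (2) concrete, the following is a minimal sketch of the log-likelihood, up to a θ-independent constant; the array names and the exact frequency-domain conventions are illustrative assumptions rather than the implementation used in this work.

```python
import numpy as np

def log_likelihood(h_f, d_f, psd, duration):
    """Whittle-type log-likelihood of eq. (2), up to a constant.

    h_f, d_f : complex frequency-domain waveform and data arrays
    psd      : one-sided power spectral density S_n(f_i) at the same bins
    duration : observation time T in seconds
    """
    residual = np.abs(h_f - d_f) ** 2
    return -2.0 * np.sum(residual / psd) / duration
```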
B. Nested sampling

Nested sampling is a stochastic algorithm for Bayesian inference that was proposed by Skilling in [13] and computes the Bayesian evidence $Z = p(d|H) = \int p(d|\theta, H)\, p(\theta|H)\, \mathrm{d}\theta$ in eq. (1). In nested sampling the evidence integral is simplified by considering the total prior volume X contained within a given iso-likelihood contour $L^* = p(d|\theta, H)$:

$X(L^*) = \int_{p(d|\theta,H) > L^*} p(\theta | H)\, \mathrm{d}\theta$, (3)

where $\mathrm{d}X = p(\theta|H)\, \mathrm{d}\theta$. The evidence integral can then be re-written using the likelihood L by inverting eq. (3):

$Z = \int_0^1 L(X)\, \mathrm{d}X$. (4)

Since the integrand is positive and decreasing, the function is well behaved and the integral can therefore be approximated by considering an ordered sequence of M points of decreasing prior volume $X_i$, evaluating the likelihood at each point $L_i = L(X_i)$ and finally, for example, using the trapezoid rule:
$Z \approx \sum_{i=1}^{M} \tfrac{1}{2}\left(X_{i-1} - X_{i+1}\right) L_i$. (5)

The complete algorithm is detailed in [13] but, in short, it requires first sampling a set of points, known as live points, from the prior distribution. The point with the lowest likelihood L* is then removed and replaced by another sample drawn from within the likelihood contour defined by the original point and according to the prior distribution. This replacement process is then continued until a stopping criterion is met [13]. In gravitational-wave inference this is typically chosen to be when the fractional change in log-evidence at a given iteration, ∆ ln Z_i, caused by replacing another point is less than 0.1, where $\Delta \ln Z_i \equiv \ln(Z_i + \Delta Z_i) - \ln Z_i$ and $\Delta Z_i \approx L^* X_i$. Once the algorithm has terminated, the final evidence is computed including the final set of live points, and posterior samples can be produced from the set of discarded and final live points [13].

The challenge when implementing this algorithm is efficiently proposing the replacement samples within the current iso-likelihood contour, since these samples must be independently and identically distributed (i.i.d.) according to the prior. One commonly used method draws new points based on the location of existing points, which are then evolved using a random walk to ensure they are i.i.d. [11, 18]. The efficiency of this method depends on the length of the MCMC chain required, since each step in the chain requires evaluating the likelihood; for GW inference, proposing a single new sample often requires of order 10 likelihood evaluations. Other approaches avoid using a random walk by sampling from the constrained prior at a given iteration. This requires determining the distribution of the current live points and constructing one or more bounding distributions, for example, using multidimensional ellipsoids as described in [30]. In this case the challenge is efficiently constructing the bounding distributions and sampling them without over-constraining the prior and consequently under-sampling a region in parameter space. What is more, these two approaches are sometimes used in conjunction [31] to tackle particularly challenging problems, such as gravitational-wave inference [32], where the individual methods can be inefficient. In section III we propose a method for directly sampling from a constrained prior using normalising flows to learn the iso-likelihood contours.
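As an illustration of eqs. (3)–(5), a minimal sketch of the evidence sum is given below; it assumes the standard deterministic estimate $X_i = \exp(-i/K)$ for the prior volumes and ignores the statistical uncertainty on $X_i$ and the contribution of the final live points that a full implementation would include.

```python
import numpy as np
from scipy.special import logsumexp

def log_evidence(log_l, n_live):
    """Trapezoid-rule estimate of ln Z (eq. (5)) from the ordered
    log-likelihoods of the M discarded points, using X_i = exp(-i / n_live)."""
    m = len(log_l)
    x = np.exp(-np.arange(m + 2) / n_live)   # X_0 = 1, ..., X_{M+1}
    log_w = np.log(0.5 * (x[:-2] - x[2:]))   # (X_{i-1} - X_{i+1}) / 2
    return logsumexp(log_w + np.asarray(log_l))
```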
C. Normalising flows

Normalising flows [33] are a type of generative machine learning algorithm that map samples x in the physical space X to samples z in a latent space Z, such that the samples z are distributed according to a prior probability distribution $p_Z$, known as the latent prior. (We use the term physical space to distinguish it from the strain data; in the literature, X is often referred to as the data space.) If we assume this mapping to be bijective and denote it $f: X \to Z$, then using the change of variables formula we can define a probability distribution in the physical space, $p_X$:

$p_X(x) = p_Z(f(x)) \left|\det\left(\frac{\partial f(x)}{\partial x^T}\right)\right|$, (6)

where $\partial f(x)/\partial x^T$ is the Jacobian of f at x. Samples can then be drawn from $p_X$ by sampling the latent space $z \sim p_Z$ and applying the inverse mapping $f^{-1}$. This requires the bijective mapping f to have a tractable Jacobian, which for functions with multidimensional domains and codomains is often computationally expensive. These mappings are therefore carefully constructed to meet these criteria and are usually parameterised by a neural network, which allows for the mapping to be learned.

Normalising flows generally fall into two categories depending on how the mappings are defined: autoregressive flows [34, 35] and flows based on coupling transforms [36–38]. Each has distinct advantages and disadvantages; most notably, autoregressive flows are often more flexible but have a greater computational cost and are not always analytically invertible, whereas coupling-based flows tend to be less flexible but computationally cheaper to evaluate and analytically invertible. In this work we use the latter, since we aim to minimise computational cost and later rely on the invertibility of normalising flows.

Flows based on coupling transforms map an n-dimensional input vector x to an n-dimensional output vector y by splitting the input vector into two parts, $x = [x_{1:m}, x_{m+1:n}]$ where $m < n$, and then transforming one part conditioned on the other, unchanged part. The choice of splitting depends on the specific coupling transform, though typically an alternating pattern is used. For example, the coupling transform proposed in [36] is

$y_{1:m} = x_{1:m}$, (7a)
$y_{m+1:n} = x_{m+1:n} \odot \exp[s(x_{1:m})] + t(x_{1:m})$, (7b)

where $s(x_{1:m})$ and $t(x_{1:m})$ are the outputs of a neural network and ⊙ denotes the element-wise product. We show a schematic of eq. (7) in fig. 1. Since $y_{1:m} = x_{1:m}$, the Jacobian matrix of these transforms is lower triangular and the Jacobian determinant is simply the product of the diagonal entries. This transform can easily be inverted by using $y_{1:m}$ as the input to s and t:

$x_{1:m} = y_{1:m}$, (8a)
$x_{m+1:n} = [y_{m+1:n} - t(y_{1:m})] \odot \exp[-s(y_{1:m})]$. (8b)

As with eq. (7), the Jacobian determinant is trivial to compute. Coupling transforms can also be stacked by alternating which part of x is updated, $x_{1:m}$ or $x_{m+1:n}$, to produce more flexible transforms. It is also common practice to include permutations between coupling transforms; these permute the inputs to the transforms, making the mapping more expressive [36]. More recently this has been generalised to linear transforms in which the permutation is learnt during training [37]. In fig. 1 we show a schematic of how coupling transforms can be stacked with permutations. For a more detailed description of normalising flows, the different types and their applications, we point the reader to [28, 29].
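A minimal PyTorch sketch of the affine coupling transform in eqs. (7) and (8) is shown below; the network architecture and sizes are arbitrary choices for illustration and differ from the residual networks used in Nessai.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """A single affine coupling transform (eq. (7)): the first m inputs
    pass through unchanged and condition the scale s and translation t
    applied to the remaining n - m inputs."""

    def __init__(self, n, m, hidden=64):
        super().__init__()
        self.m = m
        self.net = nn.Sequential(
            nn.Linear(m, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (n - m)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.m], x[:, self.m:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t
        # The Jacobian is lower triangular, so log|det J| is the sum of s
        return torch.cat([x1, y2], dim=-1), s.sum(dim=-1)

    def inverse(self, y):
        y1, y2 = y[:, :self.m], y[:, self.m:]
        s, t = self.net(y1).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-s)   # eq. (8)
        return torch.cat([y1, x2], dim=-1), -s.sum(dim=-1)
```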
III. METHOD

We present a novel method for sampling within a given iso-likelihood contour using a normalising flow. The normalising flow learns the distribution of a set of live points and is constructed such that the learnt distribution can be sampled from analytically. We introduce additional steps to ensure that the samples are distributed according to the sampling prior (we use sampling prior to denote the prior used for nested sampling and to distinguish it from the latent prior used in normalising flows) and bounded by the iso-likelihood contour. This eliminates the need to evolve new samples and allows us to efficiently draw new samples within a complex iso-likelihood contour.
FIG. 1. Diagram of a normalising flow f(x) composed of four coupling transforms which maps an n-dimensional input vector x to an n-dimensional latent vector z. Each transform splits x into two parts, $[x_{1:m}, x_{m+1:n}]$, and updates one part conditioned on the other. In the first and third transforms $x_{1:m}$ is used as the input to a neural network (NN), which produces the scale s and translation t vectors of length n − m. The element-wise product (⊙) is then computed between $x_{m+1:n}$ and exp(s), followed by the sum of the output and t. This is shown in the left transform. In the second and fourth transforms $x_{1:m}$ is updated conditioned on $x_{m+1:n}$, as shown in the right transform.

A more efficient proposal equates to few rejected points and, since the likelihood must be computed before a point can be rejected, this is also equivalent to a reduction in the number of likelihood evaluations. We first describe our method in the isolated case of a single set of live points at a given iteration and then present a nested sampling algorithm which incorporates it.

A. Sampling within an iso-likelihood contour
At any given point in the nested sampling algorithm there is a current set of live points which, by definition, is contained within the iso-likelihood contour defined by the worst point with likelihood L*. We describe the implementation in our algorithm in terms of four steps:

a. How to define an iso-likelihood contour using a normalising flow. If we treat the sampling space as the physical space X and the live points as the data x, we can train a normalising flow to approximate the distribution of the live points to within some error and use it to draw new samples in X. This requires sampling from the latent prior $p_Z$, which we choose to be an n-dimensional Gaussian, and then applying the inverse mapping learned by the flow, $f^{-1}$. However, since our choice of latent prior $p_Z$ has an infinite domain, the resulting distribution of samples $p_X$ will not be bounded by the iso-likelihood contour or distributed according to the sampling prior. We therefore examine the notion of an iso-likelihood contour in the context of the normalising flow with an n-dimensional Gaussian latent prior.

b. How to determine the contour given the current set of live points. Once trained, the normalising flow can be used to map the current worst point in the physical space, x*, to the latent space Z as z*. This point has a likelihood in the latent space, $L^*_Z$, given by $p_Z(z^*)$ and, since $p_Z$ is an n-dimensional Gaussian, points of equal likelihood lie on the (n − 1)-sphere of radius r* given by z*. If we assume a perfect mapping, then this iso-likelihood contour in the latent space can be mapped to an iso-likelihood contour in the physical space. We can therefore sample within the contour in the latent space and use the inverse mapping to produce samples within the contour in the data space.

c. How to sample within the contour. We use two approaches for drawing K new samples $z_i$ in the latent space given a radius r*; these produce normally and uniformly distributed samples respectively.
FIG. 2. Example of how a normalising flow trained on a set of live points can produce samples within the current iso-likelihood contour for a simple two-dimensional parameter space. Top: example of training samples in the physical space X and the learned mapping to the latent space Z, with the iso-likelihood contour for the current worst point shown in orange. Middle: samples drawn from a truncated Gaussian within the iso-likelihood contour in Z and mapped to X using the inverse mapping. Bottom: pool of accepted samples after applying rejection sampling until 1000 points are obtained, shown in both Z and X.
Both approaches start by drawing K samples on the (n − 1)-sphere: K n-dimensional vectors $y_i$ are drawn from an n-dimensional unit Gaussian and then normalised using the Euclidean norm. These samples can then be rescaled to obtain samples within the n-ball,

$z_i = \rho_i \frac{y_i}{||y_i||}$, (9)

where the choice of distribution for $\rho_i$ determines how the resulting samples $z_i$ are distributed in Z. For uniformly distributed samples $\rho_i = u_i^{1/n}$ where $u_i \sim U(0, (r^*)^n)$, and for normally distributed samples $\rho_i \sim \chi(n)$, where χ(n) is a chi-distribution with n degrees of freedom truncated at r*. The inverse mapping of the normalising flow can then be applied to $z_i$ to obtain samples in the physical space. We consider two approaches because sampling from a truncated Gaussian in a high-dimensional space can become inefficient for large values of r*.

d. How to ensure new samples are drawn according to the prior. The samples obtained in the previous step must be re-sampled such that they are distributed according to the sampling prior. We use rejection sampling and compute a weight $\alpha_i$ for each sample,

$\alpha_i = \frac{p(x_i)}{q(x_i)}$, (10)

where $x_i = f^{-1}(z_i)$, p(x) is the sampling prior and q(x) is the proposal probability, which is computed using the inverse of eq. (6):

$q(x) = q(f^{-1}(z)) = p_Z(z)\left|\det\left(\frac{\partial f^{-1}(z)}{\partial z^T}\right)\right|^{-1}$. (11)

The choice of latent prior $p_Z$ will depend on which method was used to draw the samples in the latent space. The weights in eq. (10) are then rescaled such that their maximum value is one. We then draw N samples $u_i \sim U[0, 1]$ and accept the samples for which $\alpha_i / u_i > 1$. In fig. 2 we show an example of this process for a simple two-dimensional case.
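The following sketch illustrates steps c and d: uniform sampling within the latent n-ball (written here in the equivalent form $\rho = r^* u^{1/n}$ with $u \sim U(0, 1)$) followed by the rejection sampling of eq. (10). The callables flow_inverse, log_prior and log_q are hypothetical placeholders for the flow's inverse mapping, the sampling prior and the proposal probability of eq. (11).

```python
import numpy as np

rng = np.random.default_rng()

def sample_in_ball(k, n, r):
    """Draw k points uniformly within the n-ball of radius r (eq. (9)):
    Gaussian directions normalised to the unit sphere, radii r * u**(1/n)."""
    y = rng.standard_normal((k, n))
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    rho = r * rng.uniform(size=k) ** (1.0 / n)
    return rho[:, None] * y

def rejection_sample(z, flow_inverse, log_prior, log_q):
    """Accept samples with probability alpha_i = p(x_i) / q(x_i) (eq. (10)),
    after rescaling so that the largest weight is one."""
    x = flow_inverse(z)
    log_alpha = log_prior(x) - log_q(x, z)
    log_alpha -= log_alpha.max()
    u = rng.uniform(size=len(log_alpha))
    return x[np.log(u) < log_alpha]
```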
B. Algorithm details
We identify three key stages of the sampling approach detailed in section III A that we then incorporate into the nested sampling algorithm:

• Training: a normalising flow is trained on the current set of K live points by minimising a Monte Carlo approximation of the KL divergence between the target distribution and $p_X(x)$ as defined in eq. (6). The explicit loss function is derived in appendix A.

• Population: once the normalising flow is trained, samples are drawn within the n-ball of radius r defined by the worst point. These samples are then mapped to the data space X, re-sampled according to the prior and finally stored in the pool of new samples.

• Proposal: once the pool of new samples has been populated, new live points are drawn at random from the pool and then removed, until the pool is empty or the normalising flow is retrained.
The standard nested sampling algorithm is modified to include these stages; we call the resulting algorithm Nessai. The start of the algorithm remains unchanged: K live points are drawn from the prior distribution and their log-likelihoods computed. We then start the iterative process of determining the worst live point, with log-likelihood L*, and drawing a replacement live point that lies within the iso-likelihood contour. In our modified algorithm we use standard rejection sampling from the prior for the first M points (typically 2K) or until it becomes inefficient. The normalising flow is then trained on the current K live points, allowing us to map the worst point to the latent space to obtain the worst latent point $z_w$ and the radius of the corresponding n-ball, r. This radius is then used for the population stage when drawing samples in the latent space. Once populated, a replacement point is drawn from the pool, its log-likelihood is computed and, if it is greater than L*, the point is accepted; if not, more points are drawn until one is accepted. The proposal stage is then repeated for subsequent worst live points until one of four criteria is met (a schematic sketch of the full loop is given at the end of this section):

• the proposal pool is depleted: the normalising flow is retrained using the current live points, the worst live point is used to compute a new radius and the population stage is repeated,

• the acceptance falls below a user-defined threshold: the current proposal pool is discarded and the normalising flow is retrained,

• the criterion for retraining the normalising flow is met: the normalising flow is retrained with the current K live points; by default this happens every K iterations,

• the nested sampling convergence criterion is met: the algorithm terminates.

Since this algorithm relies on the normalising flows' ability to approximate the distribution of live points at various stages throughout sampling, we include a series of reparameterisations of X to reduce the complexity of the data space and remove certain features. We denote this reparameterised space X′ and include the Jacobian for each reparameterisation in eq. (11). These reparameterisations are:

• Rescaling: we add the option to rescale the input data according to either the sampling priors or the current minimum and maximum values, such that all of the parameters in X′ are defined over the same domain. As a default we use [−1, 1].

• Boundary inversion: we observe that asymmetric distributions with high density regions near the prior bounds are often under-sampled. To mitigate this effect we add the option to mirror the live points around such bounds and train on the resulting symmetric distribution. Further details are provided in appendix B.

We also introduce additional settings which help with convergence and sampling efficiency; some of these are discussed in section V E and a comprehensive list can be found in the online documentation for our sampler [41].
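The following pseudocode-style sketch summarises how the three stages slot into the nested sampling loop. All names (flow.train, flow.latent_radius, flow.populate and the convergence test) are schematic placeholders rather than the Nessai API, and bookkeeping such as the initial prior-sampling phase, evidence accumulation and the acceptance-based retraining criterion is omitted.

```python
def nested_sampling_with_flow(live_points, flow, converged, train_interval):
    """Schematic loop for the modified nested sampling algorithm."""
    pool, since_trained = [], 0
    while not converged(live_points):
        worst = min(live_points, key=lambda p: p.log_likelihood)
        # Training and population stages
        if not pool or since_trained >= train_interval:
            flow.train(live_points)
            pool = flow.populate(flow.latent_radius(worst))
            since_trained = 0
        # Proposal stage: draw until a point beats the worst likelihood
        while True:
            candidate = pool.pop()
            if candidate.log_likelihood > worst.log_likelihood:
                break
            if not pool:
                pool = flow.populate(flow.latent_radius(worst))
        live_points.remove(worst)
        live_points.append(candidate)
        since_trained += 1
    return live_points
```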
C. Gravitational-wave reparameterisations
The gravitational-wave parameter space is typically 15-dimensional and contains various degeneracies between parameters, such as the masses, inclination and luminosity distance, which can make sampling inefficient. Previous work has shown that certain reparameterisations can improve sampling efficiency [11]. We use two of these: the chirp mass M and asymmetric mass ratio q replace the component masses, and we use the system-frame parameterisation in place of the radiation frame to describe the orientation of the binary [14].

More than half of the parameters to sample are angles, and we note that the periodicity of these angles is not encoded in the mapping learned by the normalising flow, since the latent space Z is continuous and unbounded. We therefore include a further reparameterisation specifically for the angular parameters $\theta_i$. We assume that each angle has a corresponding radial component $\rho_{\theta_i}$ and together they describe a position in a two-dimensional plane. We can therefore use standard transformations to express this position in Cartesian coordinates $(x_{\theta_i}, y_{\theta_i})$:

$x_{\theta_i} = \rho_{\theta_i}\cos\theta_i, \quad y_{\theta_i} = \rho_{\theta_i}\sin\theta_i$. (12)

If we choose the distribution of the radial components such that $\rho_{\theta_i} \in [0, \infty)$, then $x_{\theta_i}, y_{\theta_i} \in (-\infty, \infty)$. Since we are using a Gaussian latent prior $p_Z$, we sample $\rho_{\theta_i}$ from a chi-distribution with two degrees of freedom such that, if the angle is uniformly distributed on [0, 2π], the resulting distribution of $(x_{\theta_i}, y_{\theta_i})$ is Gaussian. We use this treatment for the phase, inclination, polarisation and all four spin angles; for the polarisation we rescale the angle to [0, 2π] before applying the transformation to Cartesian coordinates. This reparameterisation also naturally includes periodic boundary conditions for the angles with uniform priors.

The sky location is described by a further two angles, right ascension α and declination δ. For these angles we extend the previous treatment from two-dimensional to three-dimensional Cartesian coordinates (x, y, z) and draw the radial component ρ from a chi-distribution with three degrees of freedom. For the standard priors, p(α) ∼ U[0, 2π] and p(δ) ∝ cos δ, the resulting distribution of (x, y, z) is again Gaussian.

The spin magnitudes χ₁ and χ₂ also require a specific treatment. They are typically defined on [0, 0.99]
with uniform priors and, importantly, the posterior distributions are often broad and span the entire prior range. We considered applying the boundary inversion to both bounds, but in practice found this ineffective. We instead opt to map $\chi_i$ into a two-dimensional plane with positions described using Cartesian coordinates $x_{\chi_i}$ and $y_{\chi_i}$. We achieve this by first defining a rescaled magnitude $\hat{\chi}_i \in [0, 1]$,
which is obtained using the corresponding priors. Then we consider the angle defined by $\hat{\chi}_i \pi$ and, again, introduce a radial component $\rho_{\chi_i} \sim \chi(2)$. The corresponding Cartesian coordinates $(x_{\chi_i}, y_{\chi_i})$ are defined on (−∞, ∞) and [0, ∞) respectively. However, we know that the coupling transforms we have chosen to use are better suited to unbounded domains. To avoid the bounded domain, we introduce a random variable k which is drawn from a Rademacher distribution and include it in the Cartesian coordinate transform:

$x_{\chi_i} = \rho_{\chi_i}\cos(\hat{\chi}_i\pi), \quad y_{\chi_i} = k\,\rho_{\chi_i}\sin(\hat{\chi}_i\pi)$. (13)

As a result of including k, $(x_{\chi_i}, y_{\chi_i}) \in (-\infty, \infty)$ and the hard boundary at $y_{\chi_i} = 0$ has been avoided.

We choose to reparameterise the luminosity distance $d_L$ such that the prior for the resulting parameter $d_U$ is uniform. The exact reparameterisation therefore depends on the prior used for $d_L$. In this work we choose to use a prior on $d_L$ that is uniform in co-moving volume, so we first convert the luminosity distance to a co-moving distance $d_C$ and then the uniform parameter is simply $d_U = d_C^3$. Similar reparameterisations can be determined for other commonly used distance priors, such as a power law. Additionally, we allow boundary inversion as described in section III B, but limit it to only the upper bound, since in practice the luminosity distance posterior will not rail against the lower bound.

The remaining parameters are the chirp mass M and mass ratio q, for which we use the reparameterisations from X to X′ mentioned in section III B, allowing boundary inversion for q.
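A minimal sketch of the angular reparameterisation in eq. (12) is given below; a full implementation would also track the Jacobian of the transformation for eq. (11), which is omitted here.

```python
import numpy as np

rng = np.random.default_rng()

def angle_to_cartesian(theta):
    """Map angles in [0, 2*pi) to Cartesian coordinates (eq. (12)) using a
    chi(2)-distributed radial component, so that uniformly distributed
    angles map to a standard bivariate Gaussian."""
    rho = np.sqrt(rng.chisquare(2, size=np.shape(theta)))  # chi(2) samples
    return rho * np.cos(theta), rho * np.sin(theta)
```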
D. Implementation

We use the implementation of normalising flows in
PyTorch [42], available in nflows [43], which allows for a wide variety of normalising flows to be used. However, we choose to use coupling transforms [36] because of the tractable Jacobian and the ease of computing the inverse mapping. As suggested in [37, 44], we include invertible linear transforms that randomly permute the parameters before each coupling transform, allowing all of the parameters to interact with each other. We also include batch normalisation [45] after each coupling transform, as described in [36]. We use a residual neural network [46, 47] for computing the parameters of each transform. We train the normalising flows with the Adam optimiser [48]. In appendix D we detail the specific parameters used for the results presented in section V.

Our sampler,
Nessai (Nested Sampling with Artificial Intelligence) is available as an open source package [49] and documentation is also available online [41].
IV. RELATED WORK
Different frameworks and samplers have been developed for gravitational-wave inference. LALInference [11] implements nested sampling and MCMC with specific proposal methods for the gravitational-wave parameter space and has been used extensively for analyses of the first gravitational-wave detections [50] and GWTC-1 [6]. More recently, the Python package Bilby [22] has been developed to use off-the-shelf samplers, such as dynesty [31], and has been shown to achieve results comparable to LALInference on GWTC-1 [32].

Machine learning has previously been incorporated into stochastic sampling algorithms; in [51] the likelihood function is approximated with a neural network, and in [52] neural networks are used to generalise Hamiltonian Monte Carlo. More closely related to our work, normalising flows have been used to improve the efficiency of MCMC methods by reparameterising the sampling space [53], and a similar approach has also been extended to the MCMC sampling used within nested sampling in [54].

Recent work has shown that likelihood-free inference using conditional variational autoencoders [24, 25] and normalising flows [26, 27] can produce posterior distributions for compact binary coalescences of binary black holes. These approaches promise to drastically reduce the cost of producing posterior samples when compared to traditional stochastic sampling methods. However, they require large amounts of training data and they currently lack the flexibility to deal with, for example, different PSDs, high sampling frequencies and long duration signals.
V. RESULTS
We chose to evaluate the performance of our sampling algorithm, Nessai, with simulated gravitational-wave signals from CBC. The parameter space is multi-dimensional and various parameters are correlated, which can prove challenging when sampling. The likelihood, as defined in section II A, is also typically computationally costly to evaluate, making the problem well suited to our sampler; however, this depends on the waveform approximant used and the length of the observation. We first use probability-probability plots to check the consistency of our sampler and then compare our results to those obtained using dynesty. We then highlight how the likelihood computation can be parallelised in Nessai before finally discussing various diagnostics that can be used to identify problems during sampling and tune the sampler settings.

We use bilby_pipe and Bilby [22] to simulate 128 injections using IMRPhenomPv2 [55, 56], sampled at 2048 Hz with a 4-second observing time, in a three-detector network comprising AdLIGO Hanford, AdLIGO Livingston and AdVirgo at design sensitivity [4]. We choose uniform priors on the chirp mass
M ∼ U[25, 35] M_⊙ and the asymmetric mass ratio q, and a prior on the luminosity distance that is uniform in co-moving volume with a lower bound of 100 Mpc. The remaining priors are the defaults used in Bilby [32]; see appendix C for a complete list.

FIG. 3. Probability-probability (P-P) plots showing the confidence interval versus the fraction of events within that confidence interval for the posterior distributions obtained using Nessai for 128 simulated compact binary coalescence signals produced with Bilby and bilby_pipe: (a) with phase marginalisation (combined p-value 0.3394) and (b) with phase and distance marginalisation (combined p-value 0.6818). The 1-, 2- and 3-σ confidence intervals are indicated by the shaded regions; p-values are shown for each of the parameters and the combined p-value is also shown.

We analyse the injections with our sampling algorithm,
Nessai, outlined in section III B, including the specific reparameterisations for gravitational-wave analyses described in section III C. We choose to analyse each injection twice: once with just phase marginalisation and once with both phase and distance marginalisation. Further details of the exact settings used for
Nessai are provided in appendix D.
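For context, an analysis along these lines can be set up through Bilby's standard interface; the snippet below is a hypothetical, self-contained toy example (a one-parameter Gaussian rather than a CBC signal) and assumes that the installed Bilby version exposes Nessai as a sampler backend. The keyword arguments shown are illustrative, not the exact settings of appendix D.

```python
import bilby
import numpy as np

# Toy problem: recover the mean of Gaussian data with the Nessai backend.
data = np.random.normal(3.0, 1.0, size=100)

class GaussianLikelihood(bilby.Likelihood):
    def __init__(self, data):
        super().__init__(parameters={"mu": None})
        self.data = data

    def log_likelihood(self):
        mu = self.parameters["mu"]
        n = len(self.data)
        return -0.5 * np.sum((self.data - mu) ** 2) - 0.5 * n * np.log(2 * np.pi)

priors = {"mu": bilby.core.prior.Uniform(0, 5, "mu")}

result = bilby.run_sampler(
    likelihood=GaussianLikelihood(data),
    priors=priors,
    sampler="nessai",   # assumes the Nessai sampler plugin is available
    nlive=1000,
    outdir="outdir",
    label="toy",
)
```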
A. Result validation
Probability-probability (P-P) plots are a standard method of verifying the performance of sampling algorithms [57, 58]. They test whether the correct proportion of injected values is recovered at a given confidence interval for a specific prior distribution. These tests are particularly useful when using a Gaussian likelihood, such as eq. (2), since the fraction of events within a given confidence interval should be uniformly distributed and we can therefore compute p-values for each parameter and a combined p-value for all of the parameters. We produce P-P plots for both of our analyses using Bilby and present the results in fig. 3. For an idealised sampler, p% of the events should be recovered within the p% confidence interval; this would correspond to a diagonal line. In practice we expect to see deviations from the diagonal; as such, the 1-, 2- and 3-σ confidence intervals are also shown in fig. 3. These results show that Nessai consistently recovers the posteriors for the 128 injections, but also indicate that the luminosity distance is consistently harder to sample. The combined p-values of 0.3394 and 0.6818 for our analyses without and with distance marginalisation serve as further verification.

B. Comparison to dynesty
To further validate our results we compare them to those obtained with dynesty [31], another nested sampling algorithm commonly used in gravitational-wave inference [22, 32]. We use the configuration described in [32] but increase the number of live points to 2000 and run on a single thread to ensure as direct a comparison with
Nessai as possible. With these settings dynesty passes the P-P test (see appendix E), but we note that these settings are the minimum required to produce reliable results and in practice more conservative settings are often used. Additionally, several injections required a second analysis with a different sampling seed in order to reach convergence.

FIG. 4. Difference between the log-evidences, ∆ ln Z, obtained using dynesty and Nessai for all 128 injections, with distance marginalisation (dashed line) and without distance marginalisation (solid line).

The results obtained with dynesty allow us to verify the log-evidences returned by
Nessai, since these cannot be computed analytically, and provide a point of reference when considering the number of likelihood evaluations and the total computational time.

In fig. 4 we compare the log-evidences returned by dynesty and
Nessai. If
Nessai were consistently over- or under-estimating the log-evidence when compared to dynesty, this would indicate a potential problem during sampling, such as over- or under-constraining, which would lead to biased results. The results in fig. 4 show no such bias. However, since sampling is a stochastic process, there is an error associated with the computed log-evidence. The theoretical error can be approximated using the information H and the number of live points K as $\delta \ln Z \approx \sqrt{H/K}$. To quantify this error we repeat the analysis on a single injection with 50 different sampling seeds and compute an approximate error δ ln Z ≈ 0.
11; this is consistent with previous analyses which determined that there are additional sources of uncertainty [59].
Nessai is designed with the aim of improving the efficiency of drawing replacement live points, at the cost of repeatedly training a normalising flow and populating a pool of live points. An improvement in the efficiency translates to a reduction in the total number of likelihood evaluations, since the likelihood must be computed for each rejected point. We therefore compare the total number of likelihood evaluations required to reach convergence for each sampler in fig. 5, with and without distance marginalisation.
Nessai requires a median of 5.× and 7.× likelihood evaluations to converge without and with distance marginalisation respectively, and dynesty requires 10.× and 9.×. In contrast to dynesty, sampling with Nessai is more efficient without distance marginalisation; we attribute this to a combination of the reparameterisation used for the luminosity distance and the sampler settings we converged on for Nessai.
FIG. 5. Distribution of the total number of likelihood evaluations required to reach convergence and the total run time for dynesty (blue) and Nessai (orange) when applied to the 128 simulated signals from compact binary coalescence with the priors and sampler settings described in section V and appendix D. The results with distance marginalisation disabled are shown with solid lines and those with distance marginalisation enabled are shown with dashed lines.
This, however, does not directly translate into a decrease in the total run-time, since it does not factor in the cost of training and population. In fig. 5 we also show the total run-time for each sampler; comparing the median run-times, we observe that Nessai is 2.07 times faster than dynesty without distance marginalisation and 1.34 times faster with it. We also examine the proportion of the run-time spent on training and population and find that, on average, population accounts for approximately 39.6% of the total run-time and training accounts for a further 8.1%. This explains why, despite a 2.32 times reduction in likelihood evaluations, we do not observe a 2.32 times reduction in sampling time compared to dynesty. We also note that the cost of training and population does not depend on the cost of evaluating the likelihood; as such, this fraction of the total run-time will decrease as the cost of evaluating the likelihood increases.

For each injection we can also compare the posterior distributions produced by each sampler. These allow us to quickly identify discrepancies between samplers for specific injections or regions of the parameter space. However, these differences are not easily quantified and the correlations between more than two parameters are not clearly represented. We show an example of such a comparison in appendix F.
C. Parallelisation of the likelihood computation
Our sampler is designed such that candidate live points are drawn simultaneously in the population stage. This allows for simple parallelisation of the likelihood computation, since the pool of candidate live points can be distributed over a number of threads and the likelihood values computed and stored until needed for the proposal stage.
FIG. 6. Comparison of the total time (in hours) spent on each stage of the algorithm for an increasing number of threads for a single injection with a fixed noise seed. The time spent evaluating the likelihood decreases as the number of threads increases; the theoretical reduction is shown in black. The training and population stages remain approximately constant and as such act as a lower bound on the minimum run-time. The sum of the time spent on likelihood evaluation, training and population is approximately equal to the total time spent sampling, indicating minimal overhead.

In fig. 6 we compare the run-time and the time spent evaluating the likelihood for the same injection using an increasing number of threads for the likelihood computation. We use an additional thread for the main sampling process. The time spent evaluating the likelihood is inversely proportional to the number of threads allocated, although the overall run-time is not. With a single thread, likelihood evaluation accounts for 53.9% of the total run-time, and this decreases to 9.0% when using 16 threads. As mentioned previously, there is a cost associated with the population stage and a further, smaller cost associated with training; for this injection these are 36.4% and 9.3% respectively. These remain approximately constant when increasing the number of threads available and act as a lower limit on the theoretical minimum run-time, as shown in fig. 6.
The remaining <1% of the run-time is general overhead associated with running the sampler.
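A sketch of this pattern is shown below, assuming a picklable likelihood function; it uses a process pool for illustration, whereas the text refers to threads, and the function names are placeholders rather than the Nessai implementation.

```python
from multiprocessing import Pool

def evaluate_pool_likelihoods(candidates, log_likelihood, n_workers):
    """Evaluate the likelihoods of a batch of candidate live points in
    parallel during the population stage; the values are cached and
    consumed one at a time by the proposal stage."""
    with Pool(processes=n_workers) as pool:
        return pool.map(log_likelihood, candidates)
```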
D. Diagnostics
As mentioned previously, there are various challenges when implementing a sampling algorithm.
Nessai is designed to sample from within the constrained prior; in this case, care must be taken to ensure that the prior is not over-constrained, since this would lead to regions of parameter space being under-sampled, which in turn would bias the results. There are also specific problems that arise from the nature of the parameter space, such as multi-modality and correlations.
FIG. 7. Example of the distribution of insertion indices for two nested sampling runs with 2000 live points. Uniformly distributed indices indicate no under- or over-constraining and deviations from uniformity indicate the opposite. The result with an orange dashed line shows over-constraining and the result with a solid blue line shows a correctly converged run. The shaded region indicates the 2-σ errors on the expected distribution.

We use a series of diagnostics to identify possible problems during sampling. In section V E we also discuss how some of these diagnostics can be used to tune the sampler settings described in section III B and appendix D.

We use the cross-checks proposed in [60] as a heuristic for determining if the nested sampling algorithm has converged without over- or under-constraining the posterior distributions. These checks rely on order statistics and the assumption that new live points should be inserted uniformly among the existing live points, which allows a p-value to be computed using the Kolmogorov-Smirnov statistic [61], with the additional consideration that the underlying distribution is discrete [62] (a sketch of this check is given at the end of this section). In fig. 7 we show an example of the distribution of the indices of newly inserted live points and in fig. 8 we show the p-values computed every K iterations. The histogram shows the final distribution of insertion indices for all the nested samples; this may not highlight specific problematic regions of the parameter space, but if it is not uniform it is a clear indication that the sampler is consistently over- or under-constraining. If the distribution of p-values in fig. 8 is non-uniform, then this is another clear indication of problems during sampling.

The acceptance is another important statistic to monitor during sampling, since we aim to develop a more efficient sampler. There are two acceptances we can monitor in Nessai: the proposal acceptance and the population acceptance. The first has a direct effect on the number of likelihood evaluations whilst the second only affects the total run-time; both quantities are shown in fig. 8. This figure also highlights how periodically retraining the normalising flow leads to an increase in the proposal efficiency. It also shows how the population process is
typically inefficient, which explains why, on average, 32 percent of the total run-time is spent on the population stage.

FIG. 8. Example of the statistics that are tracked in our sampler as a function of sampling iteration: (a) minimum (blue solid) and maximum (red dashed) log-likelihood, (b) cumulative number of likelihood evaluations, (c) log-evidence ln Z (blue solid) and fractional change in evidence dZ (red dashed), (d) proposal (blue solid) and population (red dashed) acceptance and (e) p-value for the cross-checks of every K live points. The iterations at which the normalising flow is trained are indicated with vertical lines.

We also track the minimum and maximum log-likelihoods, the number of log-likelihood evaluations, the log-evidence and the fractional change in evidence. The combination of these statistics allows the user to quickly understand the current state of the sampler and identify potential issues such as plateaus in the likelihood space and regions which are inefficient to sample. The complete set of statistics is shown in fig. 8.
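A minimal sketch of the insertion-index cross-check is given below; it applies a naive one-sample KS test against the discrete uniform CDF and ignores the discreteness correction of [62], so it should be read as an illustration of the idea rather than the exact test used in Nessai.

```python
import numpy as np
from scipy import stats

def insertion_index_pvalue(indices, n_live):
    """p-value for the hypothesis that the insertion indices of new live
    points are uniform on {0, ..., n_live - 1}."""
    cdf = lambda x: np.clip((np.floor(x) + 1) / n_live, 0.0, 1.0)
    return stats.kstest(np.asarray(indices), cdf).pvalue
```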
E. Tuning Nessai
Nessai includes various settings; a comprehensive list and a description of each can be found in the documentation [41]. In practice we find that a small subset of the settings predominantly determines whether the algorithm converges without any bias. We use the validation method described in section V A and the diagnostics from section V D to understand how these settings affect convergence.

As expected, the number of live points K is an important setting, but it is even more crucial in Nessai since it limits the amount of training data available. We find that a minimum of 1000 live points is required and, for more complex problems such as gravitational-wave inference, at least 2000 live points should be used.

There is a large number of settings which relate to the complexity of the normalising flow. Whilst tuning the sampler we found that the number of coupling transforms greatly affected convergence. If too many transforms were used, the algorithm was prone to over-constraining the posterior distribution. We attribute this to the complexity of the iso-likelihood contour learnt by the flow: if the flow has too many trainable parameters, it can over-fit the distribution and exclude regions of the parameter space which should be sampled. At the other extreme, if the model is too simple, then the resulting contour can "smooth" over fine details and more samples are drawn outside of the initial likelihood constraint. These will not be accepted and the sampling process is therefore less efficient. We use a similar logic for the number of neurons and layers in the neural network that parameterises the flow, but we find that these parameters predominantly affect training time, with a lesser effect on overall convergence. Another parameter that is important to consider is the batch size: during sampling the normalising flow can be trained upwards of 100 times, hence a larger batch size is recommended since it can greatly reduce training time. We also recommend increasing the batch size when using reparameterisations that increase the amount of training data, such as the boundary inversion described in section III B and appendix B.

We note that the size of the pool of new samples affects the efficiency of the algorithm and the total run-time. If the pool-size is small, then the normalising flow is frequently retrained; in extreme cases where the proposal is inefficient, due to, for example, the complexity of the parameter space, the normalising flow can be retrained multiple times during a single iteration. Conversely, if the pool-size is large, then either the flow is forcibly retrained and a number of points are discarded or, if the flow is only retrained once the pool is empty, the rejection sampling becomes inefficient, since a large fraction of the potential new points will lie outside the likelihood bound. We instead opt to inversely scale the pool-size with the mean acceptance of the sampler since the last iteration at which the flow was trained. We recommend setting the base pool-size to the number of live points, only retraining the model when the pool is empty and setting the maximum pool-size to be ten times the base pool-size.

As mentioned previously, approximately 32% of the run-time is spent on populating the pool of new samples. This is directly attributable to the efficiency of the rejection sampling required to ensure samples are distributed according to the prior. In section III A we propose two methods for drawing samples within the contour in the latent space; these produce uniformly and normally distributed samples respectively.
In practice we find the two methods comparable in most cases, with the exception of when the latent radius lies in the tail of the chi-distribution that corresponds to the latent prior $p_Z$. In this case, using the uniform distribution results in lower population and proposal acceptances, which leads to longer run-times.

VI. CONCLUSIONS
We have proposed a novel method for sampling within a given iso-likelihood contour according to the prior that can be incorporated into the standard nested sampling algorithm. Our method employs normalising flows to learn the density of the current set of live points which, once trained, allows us to produce points within the contour by sampling from a simple distribution and using rejection sampling. The use of normalising flows allows us to avoid using multiple bounding distributions and, since new samples are independent of the previous samples, we eliminate the need to use a random walk. We implement this proposal method in our sampler,
Nessai, and conduct a series of tests to verify that it recovers the correct Bayesian posteriors, and then compare our results to those obtained with another sampler to determine if our design does in fact result in a more efficient sampler.

We apply our sampler to 128 four-second duration simulated signals from the coalescence of binary black hole systems sampled at 2048 Hz, and we run two separate analyses, one with distance marginalisation and another without. The resulting P-P plots (fig. 3) show that our sampler more reliably recovers the posterior distributions with distance marginalisation than without; however, both pass the P-P test. This indicates that our proposal method does not introduce any inherent biases. We use dynesty for the comparison, which has been shown to produce results consistent with those used in previous LVK analyses [32]. We find that our sampler returns evidences consistent with dynesty, which serves as further verification of our results. Since we aim to produce a more efficient sampler, we also compare the likelihood evaluations required to reach convergence. When not using distance marginalisation we find that
Nessai requires 5.× likelihood evaluations, 2.32 times fewer than dynesty. When distance marginalisation is enabled, Nessai requires 7.×, which, whilst still 1.40 times fewer than dynesty, is more than with the marginalisation disabled. As such, we recommend using Nessai without distance marginalisation for gravitational-wave inference.

However, this reduction in likelihood evaluations does not translate to a direct reduction in the total computation time because of the cost associated with training the normalising flow and populating the pool of new samples. We find that the fraction of the time spent on each process changes when using distance marginalisation. Without the marginalisation, on average, 8.1% of the total computation time is spent on training and a further 39.6% on population. When using distance marginalisation this changes to 4.9% spent on training and 42.2% on population. We attribute the difference in population time to the efficiency of the rejection sampling, which is improved when including the reparameterisation for distance discussed in section III C. We find that without distance marginalisation the median run time for
Nessai is 2.07 times faster than dynesty. However, when distance marginalisation is enabled we observe that, on average,
Nessai is only 1.34 times faster than dynesty. This further reinforces our recommendation to use
Nessai with distance marginalisation disabled.

We also show how our sampler can make use of parallelised likelihood functions by evaluating the likelihood of new live points during the population stage. We repeat the previous analysis for a single injection without distance marginalisation and parallelise the likelihood computation with an increasing number of threads, up to 16. We observe that the reduction in time spent evaluating the likelihood does not quite match the theoretical values, indicating that there is a small overhead associated with it. This also highlights how the limiting factor is the time spent training the normalising flow and populating the pool of new live points.

To aid in diagnosing potential biases during sampling, we include a series of diagnostics in our sampler which allow us to easily identify under- and over-constraining. These diagnostics also help to tune the sampler settings and highlight how periodically re-training the normalising flow prevents the proposal from becoming inefficient during sampling.

We find that our algorithm is susceptible to under-sampling regions of the parameter space which are close to the prior bounds. We consequently introduce the previously described reparameterisations to mitigate this and a series of diagnostics to aid in diagnosing biases and correctly tuning the settings. We aim to address this in further work with changes to the design of the normalising flows we have used.

It is natural to compare this work to [24–27], which use variational autoencoders and normalising flows to produce posterior distributions. Our approach differs from these in that it requires no prior computation, since training occurs during sampling, and we do not introduce any assumptions about the data other than those necessary to apply a nested sampling algorithm.
Nessai is therefore a drop-in replacement for existing sampling algorithms that does not require changes to existing pipelines.

In future work we aim to evaluate our sampler using more expensive waveform models, including those for longer duration signals, such as those from binary neutron star or neutron star-black hole systems, and models which include higher-order modes. We will also investigate the suitability of other types of normalising flow transforms, such as the spline-based transforms from [63] and flows which allow for specifying a manifold [64].

In summary, we have proposed a novel variation of the standard nested sampling algorithm that incorporates normalising flows, specifically designed for inference with computationally expensive likelihood functions. We have applied our sampler to the problem of gravitational-wave inference and shown that it consistently recovers the Bayesian posterior distributions and evidences with 2.32 times fewer total likelihood evaluations than dynesty, another commonly used sampler, which translates to a 2.07 times reduction in computation time. Our sampler therefore serves as a more efficient drop-in replacement for existing samplers.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the Science and Technology Facilities Council of the United Kingdom. MJW is supported by the Science and Technology Facilities Council [2285031]. JV and CM are supported by the Science and Technology Research Council [ST/L000946/1]. CM is also supported by the European Cooperation in Science and Technology (COST) action [CA17137]. The authors are grateful for computational resources provided by Cardiff University, funded by an STFC grant supporting UK involvement in the operation of Advanced LIGO.
Software: Nessai was initially developed using cpnest [18] with permission from the authors and still shares a similar interface and some core code. Nessai is implemented in Python and uses NumPy [65], SciPy [66], pandas [67, 68], nflows [43], PyTorch [42], matplotlib [69] and seaborn [70]. Gravitational-wave injections were generated using Bilby and bilby_pipe [22]. Figures were prepared using matplotlib [69], seaborn [70], Bilby [22] and corner [71].
Appendix A: Loss function
A normalising flow applies a mapping f : X → Z conditioned on its parameters θ, which are typically the trainable parameters of a neural network. In this context the goal of training a normalising flow is to approximate a target distribution p*_X(x). The KL divergence between the target distribution and the distribution of the flow p_X(x|θ) can be written as [29]:

\begin{align}
\mathcal{L}(\theta) &= D_{\mathrm{KL}}\left[ p^{*}_{X}(x) \,\|\, p_{X}(x|\theta) \right] \nonumber \\
&= -\mathbb{E}_{p^{*}_{X}(x)}\left[ \ln p_{X}(x|\theta) \right] + \mathrm{const.} \nonumber \\
&= -\mathbb{E}_{p^{*}_{X}(x)}\left[ \ln p_{Z}(f(x|\theta)) + \ln \left| \det \frac{\partial f(x|\theta)}{\partial x^{T}} \right| \right] + \mathrm{const.}
\tag{A1}
\end{align}

Then, assuming the set of K samples used for training is drawn from p*_X(x), the expectation value can be approximated as:

\begin{equation}
\mathcal{L}(\theta) \approx -\frac{1}{K} \sum_{k=1}^{K} \left[ \ln p_{Z}(f(x_{k}|\theta)) + \ln \left| \det \frac{\partial f(x_{k}|\theta)}{\partial x^{T}} \right| \right].
\tag{A2}
\end{equation}
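In practice this loss can be implemented directly with the libraries listed in the software acknowledgements. The following is a minimal sketch of minimising Eq. (A2) with PyTorch [42] and nflows [43]; the flow architecture, dimensionality and training data are illustrative placeholders rather than the configuration used by Nessai.

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

n_dim = 4  # illustrative dimensionality

# f(x|theta): a stack of simple invertible transforms mapping X -> Z
transform = CompositeTransform([
    MaskedAffineAutoregressiveTransform(features=n_dim, hidden_features=32)
    for _ in range(4)
])
flow = Flow(transform, StandardNormal(shape=[n_dim]))

# Stand-in for the K training samples drawn from the target p*_X(x)
x = torch.randn(1000, n_dim)

optimiser = torch.optim.Adam(flow.parameters(), lr=1e-3)
for _ in range(50):
    optimiser.zero_grad()
    # flow.log_prob combines ln p_Z(f(x|theta)) and the log-Jacobian term,
    # so its negative mean is the Monte Carlo estimate in Eq. (A2)
    loss = -flow.log_prob(x).mean()
    loss.backward()
    optimiser.step()
```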
Appendix B: Boundary inversion

In section III B we describe boundary inversion, which we introduce to avoid under-sampling regions which are close to the prior bounds. The user defines which parameters the inversion can be applied to and, before training, the sampler determines whether it should be applied to each parameter using the following steps:

1. Compute the density of samples over the specified range and find the maximum value.
2. Compute the fraction of the density that lies within the initial and final m% of the bounds, i.e. [0, 0.1] and [0.9, 1.0] if the parameter is defined on [0, 1].
3. Apply the inversion if this fraction is greater than n% of the maximum density and the density at the bound is non-zero.

Based on our testing, the percentages m and n default to 10% and 50% respectively, but they can be changed by the user. We consider two methods for applying the inversion:

• duplication: duplicate the set of points and apply the inversion to the duplicates,
• splitting: randomly select half of the points to apply the inversion to.

We find that duplication generally provides more consistent results but at the cost of increased training time. As such, we recommend using splitting when inversion is applied to more than two parameters.
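The decision procedure above can be summarised in a few lines of NumPy. This is an illustrative sketch under the stated assumptions, not Nessai's exact implementation; the function name, bin count and histogram-based density estimate are our own choices.

```python
import numpy as np

def should_invert(samples, lower_bound, upper_bound, m=0.1, n=0.5):
    """Illustrative check for whether to apply boundary inversion."""
    # 1. Estimate the density of samples over the prior range and
    #    find its maximum value.
    density, edges = np.histogram(
        samples, bins=50, range=(lower_bound, upper_bound), density=True
    )
    max_density = density.max()
    width = upper_bound - lower_bound
    centres = 0.5 * (edges[:-1] + edges[1:])
    # 2. Density within the initial and final m-fraction of the bounds,
    #    e.g. [0, 0.1] and [0.9, 1.0] for a parameter defined on [0, 1].
    near_lower = density[centres < lower_bound + m * width]
    near_upper = density[centres > upper_bound - m * width]
    # 3. Invert if the density near either bound exceeds n% of the
    #    maximum and the density at the bound itself is non-zero.
    invert_lower = near_lower.max() > n * max_density and density[0] > 0
    invert_upper = near_upper.max() > n * max_density and density[-1] > 0
    return invert_lower or invert_upper
```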
Appendix C: Gravitational-wave priors

TABLE I. Prior distributions used for each parameter for gravitational-wave parameter estimation. Their corresponding labels and the lower and upper bounds are included where applicable.

Parameter | Label | Prior | Bounds
Chirp mass | M | Uniform | [25, 100] M⊙
Asymmetric mass ratio | q | Uniform | [0.125, 1.0]
Luminosity distance | d_L | Uniform in co-moving volume | [100, 5000] Mpc
Right ascension | α | Uniform | [0, 2π]
Declination | δ | Cosine | -
Reference time at geocentre | t_c | Uniform | [-0.1, 0.1] s
Inclination | θ_JN | Sine | -
Polarisation | ψ | Uniform | [0, π]
Phase | φ_c | Uniform | [0, 2π]
Dimensionless spin magnitudes | χ_i | Uniform | [0, 0.99]
Spin tilt angles | θ_i | Sine | -
Difference between the azimuthal angles of each spin vector relative to the orbital angular momentum | φ_12 | Uniform | [0, 2π]
Difference between the azimuthal angles of the total and orbital angular momentum | φ_JL | Uniform | [0, 2π]
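In practice, prior sets such as Table I are specified programmatically. The following is a hedged sketch of how these priors could be written with bilby's prior classes [22]; the parameter names follow bilby's conventions and the bounds are those listed above, but this is an illustration rather than the exact configuration used in this work.

```python
import numpy as np
from bilby.core.prior import Cosine, PriorDict, Sine, Uniform
from bilby.gw.prior import UniformComovingVolume

priors = PriorDict()
priors["chirp_mass"] = Uniform(25, 100, name="chirp_mass")
priors["mass_ratio"] = Uniform(0.125, 1.0, name="mass_ratio")
priors["luminosity_distance"] = UniformComovingVolume(
    100, 5000, name="luminosity_distance"
)
priors["ra"] = Uniform(0, 2 * np.pi, name="ra")
priors["dec"] = Cosine(name="dec")
# Reference time defined relative to the trigger time
priors["geocent_time"] = Uniform(-0.1, 0.1, name="geocent_time")
priors["theta_jn"] = Sine(name="theta_jn")
priors["psi"] = Uniform(0, np.pi, name="psi")
priors["phase"] = Uniform(0, 2 * np.pi, name="phase")
for i in (1, 2):
    priors[f"a_{i}"] = Uniform(0, 0.99, name=f"a_{i}")
    priors[f"tilt_{i}"] = Sine(name=f"tilt_{i}")
priors["phi_12"] = Uniform(0, 2 * np.pi, name="phi_12")
priors["phi_jl"] = Uniform(0, 2 * np.pi, name="phi_jl")
```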
Appendix D: Nessai sampling settings

TABLE II. Settings used for Nessai for gravitational-wave inference. These are split into three categories: general settings, which control aspects of the sampler such as the choice of latent prior or pool-size; flow hyper-parameters, which determine the configuration of the normalising flow; and flow training settings, which control the training process. For a complete description of each see the documentation [41].

Nessai settings
General settings | Flow hyper-parameters | Flow training settings
Training frequency: None | Coupling transformations: - | Optimiser: Adam
Cooldown: - | Linear transformation: LU | Learning rate: -
Base pool-size: - | Network type: ResNet | Batch size: -
Update pool-size: True | Layers per network: - | Max. epochs: -
Drawsize: - | Neurons per layer: - | Patience: -
Train on empty: True | Activation: ReLU |
Weights reset: False | Batch-Normalisation: Intra-transforms |
Latent prior: Truncated Gaussian | |
Rescale: True | |
Update bounds: True | |
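For context, settings such as these are passed to the sampler when it is constructed. Below is a minimal, hypothetical example of running Nessai on a toy likelihood, following the interface described in the documentation [41]; the model, bounds and keyword values are illustrative, and settings from Table II would be supplied as additional keyword arguments.

```python
import numpy as np
from nessai.flowsampler import FlowSampler
from nessai.model import Model

class GaussianModel(Model):
    """Toy two-parameter model with a unit-Gaussian likelihood."""

    def __init__(self):
        self.names = ["x", "y"]
        self.bounds = {"x": [-10.0, 10.0], "y": [-10.0, 10.0]}

    def log_prior(self, x):
        # Uniform prior over the bounds defined above
        log_p = np.log(self.in_bounds(x), dtype=float)
        for name in self.names:
            log_p -= np.log(self.bounds[name][1] - self.bounds[name][0])
        return log_p

    def log_likelihood(self, x):
        return -0.5 * (x["x"] ** 2 + x["y"] ** 2)

# Illustrative set-up: 2000 live points, matching the analyses in this work
sampler = FlowSampler(GaussianModel(), output="outdir", nlive=2000)
sampler.run()
```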
Appendix E: P-P tests for dynesty

FIG. 9. Probability-probability (P-P) plot showing the confidence interval versus the fraction of the events within that confidence interval for the posterior distributions obtained using dynesty for 128 simulated compact binary coalescence signals produced with Bilby and bilby_pipe. The 1-, 2- and 3-σ confidence intervals are indicated by the shaded regions; p-values are shown for each of the parameters and the combined p-value is also shown. Panel (a), with phase marginalisation: N=128, combined p-value 0.3401; per-parameter p-values χ_1 (0.380), χ_2 (0.453), M (0.850), δ (0.478), t_c (0.628), d_L (0.078), q (0.889), φ_12 (0.759), φ_JL (0.918), ψ (0.635), α (0.019), θ_JN (0.504), θ_1 (0.108), θ_2 (0.175). Panel (b), with phase and distance marginalisation: N=128, combined p-value 0.6549; per-parameter p-values χ_1 (0.419), χ_2 (0.432), M (0.628), δ (0.664), t_c (0.451), q (0.894), φ_12 (0.621), φ_JL (0.806), ψ (0.680), α (0.053), θ_JN (0.460), θ_1 (0.134), θ_2 (0.362). We use the settings described in [32] with the exception of the number of live points, which we increase to 2000.
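For reference, the per-parameter p-values in Fig. 9 come from comparing the credible level at which each injected value is recovered against the uniform distribution expected for unbiased posteriors [57], using a Kolmogorov-Smirnov test [61]. Below is a minimal sketch of this check; the function name and implementation details are our own, not bilby's exact code.

```python
import numpy as np
from scipy import stats

def pp_test(posteriors, injected_values):
    """Illustrative P-P test for a single parameter.

    posteriors: list of 1-D arrays of posterior samples, one per injection.
    injected_values: the corresponding true (injected) values.
    """
    # Credible level at which each injected value is found; for an
    # unbiased sampler these should be uniformly distributed on [0, 1].
    credible_levels = np.array([
        np.mean(samples < truth)
        for samples, truth in zip(posteriors, injected_values)
    ])
    # KS test of the credible levels against Uniform(0, 1)
    return stats.kstest(credible_levels, "uniform").pvalue
```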
Appendix F: Example corner plot

FIG. 10. Corner plot comparing the posterior distributions produced with dynesty (blue) and our sampler Nessai (red). The phase is marginalised and the remaining 14 parameters (q, χ_1, χ_2, θ_1, θ_2, Δφ, φ_JL, d_L, δ, α, θ_JN, ψ, M and t_c) are shown; see appendix C for details on the parameters. The respective 16% and 84% quantiles are also shown in the 1-dimensional marginalised posteriors.

[1] B. Abbott et al. (LIGO Scientific, Virgo, Fermi GBM, INTEGRAL, IceCube, AstroSat Cadmium Zinc Telluride Imager Team, IPN, Insight-Hxmt, ANTARES, Swift, AGILE Team, 1M2H Team, Dark Energy Camera GW-EM, DES, DLT40, GRAWITA, Fermi-LAT, ATCA, ASKAP, Las Cumbres Observatory Group, OzGrav, DWF (Deeper Wider Faster Program), AST3, CAASTRO, VINROUGE, MASTER, J-GEM, GROWTH, JAGWAR, CaltechNRAO, TTU-NRAO, NuSTAR, Pan-STARRS, MAXI Team, TZAC Consortium, KU, Nordic Optical Telescope, ePESSTO, GROND, Texas Tech University, SALT Group, TOROS, BOOTES, MWA, CALET, IKI-GW Follow-up, H.E.S.S., LOFAR, LWA, HAWC, Pierre Auger, ALMA, Euro VLBI Team, Pi of Sky, Chandra Team at McGill University, DFN, ATLAS Telescopes, High Time Resolution Universe Survey, RIMAS, RATIR, SKA South Africa/MeerKAT), Multi-messenger Observations of a Binary Neutron Star Merger, Astrophys. J. Lett., L12 (2017), arXiv:1710.05833 [astro-ph.HE].
[2] B. Abbott et al. (LIGO Scientific, Virgo), GW170817: Measurements of neutron star radii and equation of state, Phys. Rev. Lett., 161101 (2018), arXiv:1805.11581 [gr-qc].
[3] B. Abbott et al. (LIGO Scientific, Virgo), A gravitational-wave measurement of the Hubble constant following the second observing run of Advanced LIGO and Virgo, arXiv e-prints (2019), arXiv:1908.06060 [astro-ph.CO].
[4] B. Abbott et al. (KAGRA, LIGO Scientific, VIRGO), Prospects for Observing and Localizing Gravitational-Wave Transients with Advanced LIGO, Advanced Virgo and KAGRA, Living Reviews in Relativity, 10.1007/s41114-018-0012-9 (2020), arXiv:1304.0670 [gr-qc].
[5] B. Iyer, T. Souradeep, C. Unnikrishnan, S. Dhurandhar, S. Raja, and A. Sengupta, LIGO-India, Proposal of the Consortium for Indian Initiative in Gravitational-wave Observations (IndIGO), Tech. Rep. (LIGO Scientific Collaboration, 2011).
[6] B. Abbott et al. (LIGO Scientific, Virgo), GWTC-1: A Gravitational-Wave Transient Catalog of Compact Binary Mergers Observed by LIGO and Virgo during the First and Second Observing Runs, Phys. Rev. X, 031040 (2019), arXiv:1811.12907 [astro-ph.HE].
[7] B. Abbott et al. (LIGO Scientific, Virgo), GW170817: Observation of Gravitational Waves from a Binary Neutron Star Inspiral, Phys. Rev. Lett., 161101 (2017), arXiv:1710.05832 [gr-qc].
[8] R. Abbott et al. (LIGO Scientific, Virgo), GWTC-2: Compact Binary Coalescences Observed by LIGO and Virgo During the First Half of the Third Observing Run, arXiv e-prints, arXiv:2010.14527 (2020), arXiv:2010.14527 [gr-qc].
[9] J. Aasi et al. (LIGO Scientific, VIRGO), Parameter estimation for compact binary coalescence signals with the first generation gravitational-wave detector network, Phys. Rev. D, 062001 (2013), arXiv:1304.1775 [gr-qc].
[10] B. Abbott et al. (LIGO Scientific, Virgo), Properties of the Binary Black Hole Merger GW150914, Phys. Rev. Lett., 241102 (2016), arXiv:1602.03840 [gr-qc].
[11] J. Veitch, V. Raymond, B. Farr, W. Farr, P. Graff, S. Vitale, B. Aylott, K. Blackburn, N. Christensen, M. Coughlin, et al., Parameter estimation for compact binaries with ground-based gravitational-wave observations using the LALInference software library, Phys. Rev. D, 042003 (2015).
[12] S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov chain Monte Carlo (CRC Press, 2011).
[13] J. Skilling, Nested sampling for general Bayesian computation, Bayesian Analysis, 833 (2006).
[14] B. Farr, E. Ochsner, W. M. Farr, and R. O'Shaughnessy, A more effective coordinate system for parameter estimation of precessing compact binaries from gravitational waves, Phys. Rev. D, 024018 (2014).
[15] M. Pürrer, Frequency domain reduced order models for gravitational waves from aligned-spin compact binaries, Class. Quant. Grav., 195010 (2014), arXiv:1402.4146 [gr-qc].
[16] R. Smith, S. E. Field, K. Blackburn, C.-J. Haster, M. Pürrer, V. Raymond, and P. Schmidt, Fast and accurate inference on gravitational waves from precessing compact binaries, Phys. Rev. D, 044031 (2016).
[17] W. J. Handley, M. P. Hobson, and A. N. Lasenby, polychord: next-generation nested sampling, Monthly Notices of the Royal Astronomical Society, 4384 (2015).
[18] J. Veitch, W. D. Pozzo, M. Williams, C. Talbot, M. Pitkin, G. Ashton, Cody, M. Hübner, A. Nitz, D. Macleod, G. Carullo, G. Davies, and Tony, johnveitch/cpnest: Version 0.11 (2021).
[19] R. J. E. Smith, G. Ashton, A. Vajpeyi, and C. Talbot, Massively parallel Bayesian inference for transient gravitational-wave astronomy, Monthly Notices of the Royal Astronomical Society, 4492 (2020).
[20] J. Lange, R. O'Shaughnessy, and M. Rizzo, Rapid and accurate parameter inference for coalescing, precessing compact binaries, arXiv e-prints, arXiv:1805.10457 (2018), arXiv:1805.10457 [gr-qc].
[21] C. M. Biwer, C. D. Capano, S. De, M. Cabero, D. A. Brown, A. H. Nitz, and V. Raymond, PyCBC Inference: A Python-based Parameter Estimation Toolkit for Compact Binary Coalescence Signals, PASP, 024503 (2019), arXiv:1807.10312 [astro-ph.IM].
[22] G. Ashton, M. Hübner, P. D. Lasky, C. Talbot, K. Ackley, S. Biscoveanu, Q. Chu, A. Divakarla, P. J. Easter, B. Goncharov, F. Hernandez Vivanco, J. Harms, M. E. Lower, G. D. Meadors, D. Melchor, E. Payne, M. D. Pitkin, J. Powell, N. Sarin, R. J. E. Smith, and E. Thrane, BILBY: A User-friendly Bayesian Inference Library for Gravitational-wave Astronomy, ApJS, 27 (2019), arXiv:1811.02042 [astro-ph.IM].
[23] E. Cuoco, J. Powell, M. Cavaglià, K. Ackley, M. Bejger, C. Chatterjee, M. Coughlin, S. Coughlin, P. Easter, R. Essick, H. Gabbard, T. Gebhard, S. Ghosh, L. Haegel, A. Iess, D. Keitel, Z. Marka, S. Marka, F. Morawski, T. Nguyen, R. Ormiston, M. Puerrer, M. Razzano, K. Staats, G. Vajente, and D. Williams, Enhancing Gravitational-Wave Science with Machine Learning, arXiv e-prints, arXiv:2005.03745 (2020), arXiv:2005.03745 [astro-ph.HE].
[24] H. Gabbard, C. Messenger, I. S. Heng, F. Tonolini, and R. Murray-Smith, Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy, arXiv e-prints, arXiv:1909.06296 (2019), arXiv:1909.06296 [astro-ph.IM].
[25] A. J. K. Chua and M. Vallisneri, Learning Bayesian Posteriors with Neural Networks for Gravitational-Wave Inference, Phys. Rev. Lett., 041102 (2020), arXiv:1909.05966 [gr-qc].
[26] S. R. Green, C. Simpson, and J. Gair, Gravitational-wave parameter estimation with autoregressive neural network flows, arXiv e-prints, arXiv:2002.07656 (2020), arXiv:2002.07656 [astro-ph.IM].
[27] S. R. Green and J. Gair, Complete parameter inference for GW150914 using deep learning, arXiv e-prints, arXiv:2008.03312 (2020), arXiv:2008.03312 [astro-ph.IM].
[28] I. Kobyzev, S. J. D. Prince, and M. A. Brubaker, Normalizing Flows: An Introduction and Review of Current Methods, arXiv e-prints, arXiv:1908.09257 (2019), arXiv:1908.09257 [stat.ML].
[29] G. Papamakarios, E. Nalisnick, D. Jimenez Rezende, S. Mohamed, and B. Lakshminarayanan, Normalizing Flows for Probabilistic Modeling and Inference, arXiv e-prints, arXiv:1912.02762 (2019), arXiv:1912.02762 [stat.ML].
[30] F. Feroz, M. P. Hobson, and M. Bridges, MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics, MNRAS, 1601 (2009), arXiv:0809.3437 [astro-ph].
[31] J. S. Speagle, DYNESTY: a dynamic nested sampling package for estimating Bayesian posteriors and evidences, MNRAS, 3132 (2020), arXiv:1904.02180 [astro-ph.IM].
[32] I. M. Romero-Shaw, C. Talbot, S. Biscoveanu, V. D'Emilio, G. Ashton, C. P. L. Berry, S. Coughlin, S. Galaudage, C. Hoy, M. Huebner, K. S. Phukon, M. Pitkin, M. Rizzo, N. Sarin, R. Smith, S. Stevenson, A. Vajpeyi, M. Arene, K. Athar, S. Banagiri, N. Bose, M. Carney, K. Chatziioannou, R. Cotesta, B. Edelman, C. Garcia-Quiros, A. Ghosh, R. Green, C. J. Haster, A. X. Kim, F. Hernandez-Vivanco, I. Magana Hernandez, C. Karathanasis, P. D. Lasky, N. De Lillo, M. E. Lower, D. Macleod, M. Mateu-Lucena, A. Miller, M. Millhouse, S. Morisaki, S. H. Oh, S. Ossokine, E. Payne, J. Powell, M. Puerrer, A. Ramos-Buades, V. Raymond, E. Thrane, J. Veitch, D. Williams, M. J. Williams, and L. Xiao, Bayesian inference for compact binary coalescences with BILBY: Validation and application to the first LIGO-Virgo gravitational-wave transient catalogue, arXiv e-prints, arXiv:2006.00714 (2020), arXiv:2006.00714 [astro-ph.IM].
[33] D. Jimenez Rezende and S. Mohamed, Variational Inference with Normalizing Flows, arXiv e-prints, arXiv:1505.05770 (2015), arXiv:1505.05770 [stat.ML].
[34] G. Papamakarios, T. Pavlakou, and I. Murray, Masked Autoregressive Flow for Density Estimation, arXiv e-prints, arXiv:1705.07057 (2017), arXiv:1705.07057 [stat.ML].
[35] C.-W. Huang, D. Krueger, A. Lacoste, and A. Courville, Neural Autoregressive Flows, arXiv e-prints, arXiv:1804.00779 (2018), arXiv:1804.00779 [cs.LG].
[36] L. Dinh, J. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, arXiv e-prints, arXiv:1605.08803 (2016), arXiv:1605.08803 [cs.LG].
[37] D. P. Kingma and P. Dhariwal, Glow: Generative Flow with Invertible 1x1 Convolutions, arXiv e-prints, arXiv:1807.03039 (2018), arXiv:1807.03039 [stat.ML].
[38] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, Neural Spline Flows, arXiv e-prints, arXiv:1906.04032 (2019), arXiv:1906.04032 [stat.ML].
[39] M. E. Muller, A note on a method for generating points uniformly on n-dimensional spheres, Commun. ACM, 19 (1959).
[40] G. Marsaglia, Choosing a point from the surface of a sphere, Ann. Math. Statist., 645 (1972).
[41] M. J. Williams, Nessai documentation, https://nessai.readthedocs.io/ (2021).
[42] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019) pp. 8024-8035.
[43] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, nflows: normalizing flows in PyTorch (2020).
[44] L. Dinh, D. Krueger, and Y. Bengio, NICE: Non-linear Independent Components Estimation, arXiv e-prints, arXiv:1410.8516 (2014), arXiv:1410.8516 [cs.LG].
[45] S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv e-prints, arXiv:1502.03167 (2015), arXiv:1502.03167 [cs.LG].
[46] K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, arXiv e-prints, arXiv:1512.03385 (2015), arXiv:1512.03385 [cs.CV].
[47] K. He, X. Zhang, S. Ren, and J. Sun, Identity Mappings in Deep Residual Networks, arXiv e-prints, arXiv:1603.05027 (2016), arXiv:1603.05027 [cs.CV].
[48] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[49] M. J. Williams, Nessai: Nested sampling with artificial intelligence (2021).
[50] B. Abbott et al. (LIGO Scientific, Virgo), Observation of Gravitational Waves from a Binary Black Hole Merger, Phys. Rev. Lett., 061102 (2016), arXiv:1602.03837 [gr-qc].
[51] P. Graff and F. Feroz, BAMBI: Blind Accelerated Multimodal Bayesian Inference (2013), ascl:1312.008.
[52] D. Levy, M. D. Hoffman, and J. Sohl-Dickstein, Generalizing Hamiltonian Monte Carlo with Neural Networks, arXiv e-prints, arXiv:1711.09268 (2017), arXiv:1711.09268 [stat.ML].
[53] M. Hoffman, P. Sountsov, J. V. Dillon, I. Langmore, D. Tran, and S. Vasudevan, NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport, arXiv e-prints, arXiv:1903.03704 (2019), arXiv:1903.03704 [stat.CO].
[54] A. Moss, Accelerated Bayesian inference using deep learning, arXiv e-prints, arXiv:1903.10860 (2019), arXiv:1903.10860 [astro-ph.CO].
[55] P. Schmidt, M. Hannam, and S. Husa, Towards models of gravitational waveforms from generic binaries: A simple approximate mapping between precessing and non-precessing inspiral signals, Phys. Rev. D, 104063 (2012).
[56] S. Khan, K. Chatziioannou, M. Hannam, and F. Ohme, Phenomenological model for the gravitational-wave signal from precessing binary black holes with two-spin effects, Phys. Rev. D, 024059 (2019), arXiv:1809.10113 [gr-qc].
[57] S. R. Cook, A. Gelman, and D. B. Rubin, Validation of software for Bayesian models using posterior quantiles, Journal of Computational and Graphical Statistics, 675 (2006).
[58] S. Talts, M. Betancourt, D. Simpson, A. Vehtari, and A. Gelman, Validating Bayesian Inference Algorithms with Simulation-Based Calibration, arXiv e-prints, arXiv:1804.06788 (2018), arXiv:1804.06788 [stat.ME].
[59] J. Veitch and A. Vecchio, Bayesian coherent analysis of in-spiral gravitational wave signals with a detector network, Phys. Rev. D, 062003 (2010), arXiv:0911.3820 [astro-ph.CO].
[60] A. Fowlie, W. Handley, and L. Su, Nested sampling cross-checks using order statistics, arXiv e-prints, arXiv:2006.03371 (2020), arXiv:2006.03371 [stat.CO].
[61] N. Smirnov, Table for estimating the goodness of fit of empirical distributions, The Annals of Mathematical Statistics, 279 (1948).
[62] T. B. Arnold and J. W. Emerson, Nonparametric goodness-of-fit tests for discrete null distributions, R Journal (2011).
[63] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, Neural Spline Flows, arXiv e-prints, arXiv:1906.04032 (2019), arXiv:1906.04032 [stat.ML].
[64] J. Brehmer and K. Cranmer, Flows for simultaneous manifold learning and density estimation, arXiv e-prints, arXiv:2003.13913 (2020), arXiv:2003.13913 [stat.ML].
[65] S. van der Walt, S. C. Colbert, and G. Varoquaux, The NumPy array: A structure for efficient numerical computation, Computing in Science & Engineering, 22 (2011).
[66] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. Jarrod Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods, 261 (2020).
[67] The pandas development team, pandas-dev/pandas: Pandas (2020).
[68] W. McKinney, Data Structures for Statistical Computing in Python, in Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman (2010) pp. 56-61.
[69] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering, 90 (2007).
[70] M. Waskom and the seaborn development team, mwaskom/seaborn (2020).
[71] D. Foreman-Mackey, corner.py: Scatterplot matrices in Python, The Journal of Open Source Software 1 (2016).