Orbital dynamics of binary black hole systems can be learned from gravitational wave measurements
OOrbital dynamics of binary black hole systems can be learned fromgravitational wave measurements
Brendan Keith ∗ Center for Applied Scientific Computing, Lawrence Livermore National Laboratory andInstitute for Computational and Experimental Research in Mathematics (ICERM), Brown University
Akshay Khadse
Department of Physics and Astronomy, The University of Mississippi, University, MS 38677, USA
Scott E. Field
Department of Mathematics, University of Massachusetts, Dartmouth, MA 02747, USA andCenter for Scientific Computing and Visualization Research,University of Massachusetts, Dartmouth, MA 02747, USA
We introduce a gravitational waveform inversion strategy that discovers mechanical models ofbinary black hole (BBH) systems. We show that only a single time series of (possibly noisy) waveformdata is necessary to construct the equations of motion for a BBH system. Starting with a class ofuniversal differential equations parameterized by feed-forward neural networks, our strategy involvesthe construction of a space of plausible mechanical models and a physics-informed constrainedoptimization within that space to minimize the waveform error. We apply our method to variousBBH systems including extreme and comparable mass ratio systems in eccentric and non-eccentricorbits. We show the resulting differential equations apply to time durations longer than the traininginterval, and relativistic effects, such as perihelion precession, radiation reaction, and orbital plunge,are automatically accounted for. The methods outlined here provide a new, data-driven approachto studying the dynamics of binary black hole systems.
I. INTRODUCTION
Classical physical theories begin with scientific laws asans¨atze, which are validated by repeated scientific exper-iment. From these laws, one derives a set of equations(usually differential equations) which can be solved, ei-ther completely or partially, to deduce various conclu-sions about the physical system under consideration. Inthis paper, we follow a different approach to learningphysical equations: we solve an optimization problemwhich isolates the most likely physical model (differentialequations) that would deliver certain physical measure-ments (data). This approach is aligned with a growingtrend in data-driven science; see, e.g., [1–11] and refer-ences therein.The focus of this paper is on learning mechanical mod-els for binary black hole (BBH) systems through gravi-tational wave measurements. As the black holes orbitone another, the motion of these massive objects gener-ate gravitational waves that radiate away to the far-fieldwhere they can be observed by an international-networkof detectors [12]. Complicated partial differential equa-tions (PDEs) govern the entire process, and in particularconnect the near-field dynamics to the far-field gravita-tional radiation. Traditionally, black hole orbital dynam-ics and gravitational waves have been computed by ex-pensive simulation codes [13] or approximations to gen-eral relativity such as the post-Newtonian formalism [14]. ∗ Corresponding author: brendan [email protected]
Our principal contribution is to show that two-body rela-tivistic orbital models can be deduced from gravitationalwave measurements. This is accomplished by solving aninverse problem [15, 16] where the control variable is thevector of weights and biases in a neural network.
A. Universal differential equations
In this work, we rely on the following general classof dynamical system models referred to in [8] as (au-tonomous) universal differential equations (UDEs),˙ x = f ( x , F ( x )) , x (0) = x , (1)where x ( t ) is the solution vector and x specifies the ini-tial conditions. Here, F ( x ) is neural network and theoverdot symbol “ ˙ ” denotes differentiation with respectto time t ∈ [0 , T ]. For example, if F ( x ) = F ( x ; ξ ) is afeed-forward deep neural network with two hidden layers,it can be written as F ( x ; ξ ) = W σ ( W σ ( W ( x ) + b ) + b ) + b , (2)with ξ = ( W , W , W , b , b , b ). Here, W i are matrices(weights), b i are vectors (biases), and σ i are the chosenactivation functions.We note that the UDE paradigm permits a immensevariety of different parameterized functions F ( x ), notonly neural network-based parameterizations. This flex-ibility provides numerous advantages, for example, priorscientific knowledge of the solution may be incorporatedinto the choice of function parametrization. We choose a r X i v : . [ g r- q c ] F e b to use feed-forward neural networks, such as (2), becauseof the ease with which their parametrization can be de-termined using existing software [8]. B. BBH modeling
Our task is to define a family of physical models thatcan be used to describe relativistic orbital dynamics oftwo spherical objects of mass m and m . That is, weask that our model provides the position of object 1, r ( t ), and object 2, r ( t ), which are the solutions to thedynamical system model. In Newtonian physics, this isthe familiar two-body problem of Kepler whose solutionhas been known since the earliest days of classical me-chanics. In relativistic physics, the field of computationalrelativity is largely devoted to providing numerical solu-tions to this problem by solving the equations of generalrelativity (a nonlinear, coupled, hyperbolic-elliptic PDEsystem) on large supercomputers [13]. In order to moti-vate a dynamical system model, we consider special caseswhereby the dynamical motion described by PDEs can beapproximately reduced to ordinary differential equations(ODEs). Such approximations have been well developedover many decades, and we refer the reader to [14, 17, 18]and references therein. Throughout this paper, we usegeometric units where both the speed of light, c , and thegravitational constant, G , are set to unity.First, for slow moving objects ( v (cid:28) c ), a powerfulformalism known as the post-Newtonian approximationprovides a systematic framework for adding relativisticcorrections in powers of v/c to the Keplerian equationsof motion. The post-Newtonian framework provides uswith justification for treating the two-body problem as aneffective one-body problem. Here, the relevant equationsthat govern the separation vector, r = r − r , can beused to reconstruct the two-body motion through therelations, r = ( m /M ) r , r = − ( m /M ) r , (3)where M = m + m . We call this an effective one-bodyproblem as one often views there to be an “effective”body located at r relative to the system’s center-of-mass.A different technique, known as the post-Minkowskianapproximation, informs us that the dominant contribu-tion to the gravitational radiation field can be computedaccording to the quadrupole formula, first derived by Ein-stein. Collectively, the two-body-to-one-body map andthe quadrupole formula neglect a substantial amount ofphysics including terms proportional to v/c , as well asmany more higher-order terms, some of which have beencomputed and others that remain unknown.Second, in the limit of m (cid:29) m , the two-body prob-lem reduces to a simpler setup whereby the larger objectis fixed at the coordinate system’s origin. The smallerorbiting object’s motion is then described by a geodesicpath in the Schwarzschild geometry set by the largerblack hole. Recent numerical evidence suggests that the geodesic equations of motion (11) with self-force correc-tions may work unreasonably well even for near-equalmass systems [19–27]. More importantly for our pur-poses, we know from blackhole perturbation theory re-sults that using geodesic equations of motion to describethe two-body problem neglects a substantial amount ofphysics including terms proportional to m /m .Having motivated some of the physics behind the prob-lem, we now outline our strategy to write down a modelinspired by a combination of Newtonian and relativisticphysics. Our orbital model will omit a significant amountof important physics that will be accounted for by deepneural networks trained on gravitational waveform data.We write the two-body problem as an effective one-body one, and associate the orbital separation vector, r ,with the location of the fictitious effective body orbiting afixed, spherically-symmetric central object. Owing to thespherical symmetry of the central object, we may assume,without loss of generality, that the effective object’s tra-jectory lies in the equatorial plane, which we take to bethe plane perpendicular to the z-axis and where the an-gle φ is between r and the x-axis. Orbits are specified bythe orbital parameters eccentricity e ( t ) and semi–latusrectum p ( t ); in the Newtonian case these are constantswhile in the general relativistic case they are often in-terpreted as time-dependent functions. Finally, we use awell-known parameterization for the Euclidean norm of r , r ( t ) = p ( t ) M/ (1 + e ( t ) cos χ ( t )) , (4)and evolve the anomaly χ ( t ) instead of r ( t ) because theanomaly increases monotonically through radial turningpoints. To summarize, we assume the equations of mo-tion for the effective object can be described by four time-dependent variables φ , χ , p , and e . The effective object’strajectory is provided by φ, χ whereas p, e parameterizethe orbital configuration.Upon denoting x = ( φ, χ, p, e ), we propose the follow-ing family of UDEs to describe the two-body relativisticdynamics:˙ φ = (1 + e cos( χ )) M p / (cid:0) F (cos( χ ) , p, e ) (cid:1) , (5a)˙ χ = (1 + e cos( χ )) M p / (cid:0) F (cos( χ ) , p, e ) (cid:1) , (5b)˙ p = F ( p, e ) , (5c)˙ e = F ( p, e ) , (5d)with x (0) = ( φ , χ , p , e ). Note that the functionalform of Eqs. (5) have been inspired by Eq. (11), whichare the geodesic equations of motion for an infinitesimallysmall “particle” orbiting a super-massive blackhole. Inparticular, Eqs. (5) are rotationally invariant because theright-hand side omits the φ -variable. Moreover, when F = F = 0, orbital energy, E ( p, e ), and orbital angularmomentum, L ( p, e ), are conserved:˙ E = ∂E∂p ˙ p + ∂E∂e ˙ e, ˙ L = ∂L∂p ˙ p + ∂L∂e ˙ e. (6) FIG. 1. Flowchart of algorithm to solve the inverse problem (10).
Due to the emission of gravitational waves (so-calledradiation-reaction), we have that both ˙ E, ˙ L < F j = 0, we recover Newtonian orbits.Eqs. (5a) through (5d) define a family of trajectories x ( t ) = x ( t ; ξ ). Through Eq. (3), these trajectories deter-mine the black hole orbits, r ( t ) = r ( t ) m M (cos( φ ( t )) , sin( φ ( t )) , , (7a) r ( t ) = − r ( t ) m M (cos( φ ( t )) , sin( φ ( t )) , . (7b)Gravitational waves are generated by orbiting blackholes, and so the waves encode detailed informationabout the dynamical variables r ( t ) and r ( t ). Generalrelativity tells us that the dynamics and waves are con-nected through PDEs, which is a familiar scenario in themodeling of waves. In the next section we summarizehow to learn F j from gravitational wave measurements. C. Quadrupole formula, the loss function, andmodel discovery
Very far from a BBH system, where gravitational wavedetectors are located, the gravitational radiation field isan outgoing spherical wavefront. On a sufficiently largesphere we can expand the radiation field into a completebasis of (tensorial) spherical harmonics labeled by ( (cid:96), m )harmonic indices.In this paper, we consider only the dominant ( (cid:96), m ) =(2 , w = ( r/M ) · Re { h } , where h ( t ) = 1 r (cid:114) π (cid:16) ¨ I xx − i ¨ I xy − ¨ I yy (cid:17) (8)and the trace-free mass quadrupole tensors I xx , I xy , and I yy are defined in Eq. (25) (see also [28, Eqs. 54–56]).Eq. (8) is the well-known quadrupole formula which ex-presses the measurable waveform, h , in terms of the or-bits r = ( x , y ,
0) and r = ( x , y , F j , butis sufficient for our purpose.We assume that our waveform measurements appearas ordered pairs ( t k , w k ), where w k denotes the value ofthe waveform data at time t k ∈ [0 , T ]. In this setting, wedefine the mean-squared waveform error J ( x ) = (cid:104) J ( x , · ) (cid:105) := 1 T (cid:90) T J ( x , t ) d t, (9)where J ( x , t ) = (cid:80) k (cid:0) w k − w ( t ) (cid:1) δ ( t − t k ) and bracketnotation, (cid:104)·(cid:105) , denotes denotes averaging over the time in-terval. Accordingly, we choose to solve the inverse prob-lem: min ξ J ( x ) subject to (5a) through (5d) . (10)In some situations, convergence to the solution of (10)can be improved by adding well-chosen, physics-informedpenalty and regularization to (9); cf. Section II B.We note that the exclusive use of gravitational-wavedata in the loss function is motivated by the consider-ation that in experimental settings only gravitational-wave observations will be available and never a directview of black hole orbits. Even in computational relativ-ity simulations, the numerical measurement of black holetrajectories are complicated by coordinate ambiguities ofgeneral relativity that make it difficult to assign physicalsignificance to their values. Waveforms computed fromcomputational relativity simulations, on the other hand,are well-defined and physically meaningful.The ODE-constrained optimization problem (10) de-livers the calibrated dynamical system model ˙ x = f ( x , F ( x ; ξ (cid:63) )), where ξ (cid:63) denotes the optimizer found bysolving (10). This inverse problem can be solved witha number of standard methods. We choose to use aBFGS algorithm with backtracking line search [29] andan adjoint-based (implicit differentiation/adjoint sensi-tivity method) calculation of gradients [30] implementedwith the Julia [31] software package DiffEqFlux [8]. Thealgorithm is described by the flowchart in Figure 1. Ourcode is available for download at [32]. II. RESULTS
In this section, we present results with three differentexamples. The first demonstrates the ability of (10) torecover known orbital equations. The second two show-case the discovery of new equations of motion for equalmass binary black hole mergers.
A. Extreme mass ratio systems
As our first motivating example, we consider a specialcase of the relativistic two-body problem where the ex-act solution is known. We show that from short-durationgravitational wave observations we are able to discoverdifferential equations that are valid over much longertime-scales.In the regime of m (cid:29) m , formally the limit m → M , m is a “test particle” whose motion obeys [18, 33–35]˙ φ = ( p − − e cos χ )(1 + e cos χ ) M p / (cid:2) ( p − − e (cid:3) / , (11a)˙ χ = ( p − − e cos χ )(1 + e cos χ ) (cid:2) p − − e cos χ (cid:3) / M p (cid:2) ( p − − e (cid:3) / , (11b)while r = (0 , ,
0) and ˙ e = ˙ p = 0. We shall be inter-ested in the parameter restriction 0 ≤ e <
1, for whichthe radial motion occurs between two turning points, pM/ (1 + e ) and pM/ (1 − e ) and the orbit is bounded.When e = 0, the orbit is circular. We let F = F = 0and provide values for the initial conditions φ = 0 and χ = π . In our example, we set e = 0 . p = 100, and m = 1, although the results we show remain largelythe same for other parameter values we have tested. Forsimplicity, in this first example we provide known valuesfor e , p while in Sec. II B we show how our approachperforms when these parameters are also learned.To prepare our ground-truth data, we numericallysolve Eq. (11) on a dense time grid, thereby gener-ating the black hole trajectory r ( t ). We then applythe quadropole formula Eq. (8) to generate a gravita-tional waveform sampled at 250 equally-spaced pointsspanning the time interval [0 , . · ], and shown inFig. 2 (bottom panel; black dots). Note that in theextreme mass ratio limit, m →
0, and the waveform h ∝ m /m goes to zero. Therefore, in this example,we use w = ( m /m ) · Re { rh } as gravitational-wavedata; w is now independent of m .Using the procedure summarized in Sec. I C, we re-cover the governing equations by optimizing for F and F . In this setting, both abstract functions only dependon cos χ . We exploit this periodic structure by defin-ing F and F with cosine activation functions, σ j = cos.We then construct F and F as feed-forward neural net-works with two hidden layers each; see, e.g., Eq. (2). Theexact network architecture and numerical discretizationcan be found in the file EMR.jl in [32]. Finally, we learnthe corresponding neural network weights and biases byoptimizing (10).This process delivers the red trajectory and waveformpresented in Figure 2. Not only do both waveforms andtrajectories match over the training interval consisting ofabout about 6 orbits, they continue to agree when thelearned dynamics are extrapolated to about 31 orbits,after which the orbit’s perihelion precession has under-gone a full cycle. In Figure 2 we compare the true wave-forms/trajectories to the learned waveforms/trajectoriesover the extended time interval [0 , · ]. To compare thelearned model (5) to the exact model (11), we also com-pute the error in ˙ φ and ˙ χ over the extended time interval.Evidently, not only do the waveforms and trajectoriesmatch upon visual inspection, the learned model matchesthe true mechanical model to about two orders of mag-nitude. The learned model also recovers important rela-tivistic effects, notably perihelion precession, from just afew gravitational-wave cycles. Finally, we note that oncethe dynamical model is known, it can be used to gen-erate very long orbits and gravitational wave signals byintegrating the ODE (5) and post-processing the solutionwith the quadrupole formula (8).This experiment demonstrates the potential power ofwaveform inversion in a simple setting with a known so-lution. It also demonstrates how the information con-tent in the original waveform can be used to infer UDEs.Nevertheless, this system is conservative ( ˙ e = ˙ p = 0),the quadrupole formula is prescribed exactly, and thelearnable dynamics depend only on the χ -variable. Thefollowing examples are more challenging because none ofthe aforementioned simplifications hold. True trajectory Learned trajectory Training interval · · . · · . · · − . . W a v e f o r m True waveformLearned waveformTraining data
FIG. 2. Summary of our first experiment, where we have used gravitational-wave observations (black dots; bottom panel)to learn the underlying two-dimensional dynamical system model governing the relativistic two-body problem in the extrememass ratio limit with orbital parameters p = 100 , e = 0 .
5. Top left: Learned (dashed red) and exact (solid blue) trajectoriesextrapolated 4 × the training interval. We also show the portion of the orbit (black) corresponding to the gravitational-wavetraining window, although no orbital data was used to learn the dynamics. Top right: Relative error between the learnedmodel (5) and the exact model (11). Bottom: Learned (dashed red) and exact (solid blue) waveforms extrapolated 4 × thetraining interval. B. General relativistic orbital dynamics of binaryblack holes
In this pair of examples, we consider numerically gen-erated waveform measurements from equal mass m = m = 0 . Merger
Time − , − , −
500 0
Time before merger S e m i - l a t u s r ec t u m E cce n tr i c i t y − , − , −
500 0 − − − − − Time before merger R e l a t i v e d i s ag r ee m e n t k ˆ r − r k / k ˆ r k − , − , − , − , − − − −
200 0 − . − . . . Time before merger W a v e f o r m DataFit
FIG. 3. Summary of our second experiment, where have used gravitational-wave observations (black dots; bottom figure) tolearn the underlying dynamical system model governing the relativistic two-body problem for two equal mass black holes inquasi-circular orbit. Top left: Trajectories of the centers of mass of the black hole system
SXS:BBH:0217 (black lines), takenfrom the SXS Gravitational Waveform Database [36]. We also show the orbit computed from our learned dynamical system(blue and red lines). In the upper right panel, we show the evolution of the eccentricity and semi-latus rectum from ourlearned-dynamical system. The middle right panel shows the disagreement between the NR trajectories and the ones computedfrom the learned dynamical system. We caution the reader that this figure should not be understood as a relative error becausethe numerical relativity black hole trajectories and our learned model are expressed in different coordinate systems that areimpossible to relate. Bottom: Learned (red line) and computational relativity (black dots) waveform data corresponding tothe real part of the h mode. black holes are coordinate-dependent they can still beused to compare with the trajectories obtained from ourmodel. However, such comparisons should no longer beunderstood as model error since the coordinate systemused for the data and model are necessarily different.The simulations for this work were performed usingthe Spectral Einstein Code (SpEC) [36, 39–44] developedby the Simulating eXterme Spacetimes (SXS) collabora-tion [39] and made publicly available through the Grav-itational Waveform Database [36]. From now on, we augment the loss function J in (9)with non-negative penalty and regularization terms, mo-tivated below: J ( x , ξ ) = (cid:104) J ( x , · ) (cid:105) + P ( x ) + P ( x ) + R ( ξ ) . (12)In this new expression, we define P ( x ) = γ (cid:104) ( ˙ p ) (cid:105) + γ (cid:104) (¨ p ) (cid:105) , (13)where ( f ( t )) + = max { f ( t ) , } , P ( x ) = γ (cid:104) ( − e ) (cid:105) + γ (cid:104) ( e − e ) · { p> e } (cid:105) , (14)where Ω denotes the indicator function on the set Ω ⊂ [0 , T ], and finally R ( ξ ) = γ (cid:107) ξ (cid:107) , (15)where (cid:107) ξ (cid:107) denotes the (cid:96) -norm of the expanded param-eter vector ξ . We do not make a concerted attempt tooptimize the penalty and regularization coefficients. Inboth of the coming experiments, we fix γ = 10 , γ =10 , γ = 10 , and γ = 10 − . In the first experiment,we take γ = 1, while in the second experiment we use γ = 0.The physical motivation for the first two terms in (13)relies on Eqs. (7). From these equations, we have thatthe distance between the two black holes r is proportionalto p . Due to energy loss from the emitted gravitationalwaves, r ( t ) converges to zero at a rate that increasesthroughout the system’s evolution. The penalty terms (cid:104) ( ˙ p ) (cid:105) and (cid:104) (¨ p ) (cid:105) have been chosen to encourage the se-lection of solutions with this physical behavior. The firstterm in Eq. (14) encourages the selection of a positiveeccentricity function e ( t ) for all time t . On the otherhand, the final term in this definition is motivated bythe stability condition p ≥ e for bound orbits. It iswidely accepted that e decays in this range [33], and thisterm helps to direct the solution toward models with thisproperty.Clearly, if ˙ p, ¨ p, − e, ( e − e ) · { p> e } ≤
0, then P ( x ) = P ( x ) = 0. Our experiments appear to in-dicate that optimal solutions p ( t ) and e ( t ) satisfy eachof these bounds, therefore, the penalty terms only actas guardrails throughout the optimization process. TheTikhonov–Phillips regularization term (15) helps conver-gence by ensuring continuous dependence between thedata and the solution [45]. The Tichonov regularizationterm (cid:107) ξ (cid:107) can also help with model degeneracies. Forexample, when the orbit is circular ( e = 0) the model isdegenerate in χ and, therefore, it is also degenerated inthe weights and biases defining F . Other penalty andregularization terms could be considered in future stud-ies.In the following pair of examples, we construct feed-forward neural network parameterizations of F j , j =1 , . . . ,
4, with tanh activation functions. The exact net-work architecture we use can be found in files
SXS1.jl and
SXS2.jl [32].
1. Near-circular orbits from clean GW observations
For this experiment, we consider a binary black holesystem with negligible eccentricity during the initial in-spiral. For inspection, the center of mass-corrected tra-jectories of the binary black hole system are depicted inthe top left-hand corner of Figure 3 (solid black lines),with the associated 1000 equally-spaced waveform datapoints in the bottom panel (black dots). From now on,we let [0 , T ] denote the time interval between the first ( t = 0) and final ( t = T ) measurement, where the finalmeasurement occurs shortly before merger.As in the previous experiment, we adopt the initialconditions φ = 0 and χ = π and assume that r isknown. Using Eq. (4), these assumptions provide us withan explicit expression for p , M p = r · (1 + e cos( χ )).Due to the nearly-zero eccentricity of the initial trajec-tories, we opt for the simple initial condition e = 0. Inthe next and final experiment, we treat the more realisticcase where both e and χ are unknown.In order to avoid local minima, we solve (10) on asequence of increasing time intervals [0 , T ] (cid:40) [0 , T ] (cid:40) · · · (cid:40) [0 , T ], using the optimal parameters ξ (cid:63) from eachpreceding optimization problem (plus a small amount ofGaussian noise) as initial data for the subsequent prob-lem. Using this incremental procedure, we are able to re-cover the overwhelming majority of the black hole trajec-tories, as indicated by the visual agreement between thelearned (red and blue) and NR (black) trajectories shownin the top left-hand panel of Figure 3. We also depict therelative disagreement between the NR trajectory of thefirst black hole ˆ r and the learned trajectory r . Themodel also recovers important general relativistic effects,notably the learned functions F and F cause a runawayinspiral process that drives the black holes to merge. Thisprocess is seen most clearly by monitoring the behaviorof p ( t ) in the upper right-hand panel of Figure 3. We alsoobserve that our model is able to naturally include boththe inspiral ( p >
6) and plunge ( p <
6) orbital regimes.Note the upper right-hand panel of Figure 3 shows thatnear this transition region the eccentricity quickly grows,which is at odds with our physical expectation for stableorbits and indicates a very different dynamical regime.
2. Eccentric orbits from noisy GW measurements
This experiment proceeds in much the same way asthe previous one. Here, however, we learn the dynamicsof an eccentric binary black hole system whose trajecto-ries are depicted in the top left-hand corner of Figure 4(solid black lines) with the associated waveform data inthe bottom panel (black dots). Unlike the previous ex-periments, we do not assume known values for the initialconditions e , p , or χ but instead make these part of thelearning process. We continue to adopt the initial condi-tions φ = 0. As can be seen in Figure 4, we introduceadditive Gaussian noise to the waveform data of the form w ( t i ) + n ( t i ), where n ( t i ) is draw from a normal distribu-tion of mean 0 and standard deviation of σ = 10 − . Asthe typical waveform amplitude is ∼ .
1, this correspondsto a coefficient of variation of around σ/ . . e or χ in (10), in addition to the neuralnetwork parameters ξ , and deducing the associated valueof p directly from Eq. (4). The model also recovers im-portant general relativistic effects, notably, as in the pre-vious example, radiation-reaction effects. In this case,the tendency for the orbit to circularize, e ( t ) →
0, is seenin the upper right-hand panel of Figure 4. As before,the eccentricity has an inflection point around p ≈ III. DISCUSSION
We have presented a new, data-driven gravitationalwaveform inversion strategy which generates mechanicalmodels of binary black hole systems. We start with astructurally very simple set of universal differential equa-tions and parameterize the space of models with feed-forward neural networks. Our differential equations aretrained by solving a physics-informed constrained opti-mization that seeks to minimize the waveform error. Wetested our method on various BBH systems including ex-treme and comparable mass ratio systems in eccentricand non-eccentric orbits, and train on portions of thewaveform corresponding to orbital plunge right up tothe time of merger. We find that the resulting differen-tial equations agree remarkably well with the black holetrajectories computed through purely numerical means.Our models can be extrapolated in time and recover var-ious known relativistic effects despite these being pre-viously unknown to the universal differential equations.The main contribution of our work is to show that two-body relativistic models can be deduced from gravita-tional wave measurements.Our framework for learning the dynamics of binaryblack holes is quite general, and we expect that it can beapplied to a variety of cases we have not considered in-cluding unequal masses, aligned-spins systems, and pre-cessing systems. We believe the largest source of system-atic error in our method is the use of the quadrupole for-mula for generating waveforms from orbits, and any high-accuracy study should seek to use better maps from or-bits to waveforms. Our method should be especially use-ful for discovering equations of motion for systems wheretraditional approaches are less well-developed includ-ing eccentric binaries, the highly relativistic late-inspiraland plunge dynamical regimes, and beyond-GR theories.When combined with other techniques, the methods de-scribed here may assist in building high-accuracy gravi-tational wave models, many of which require computedtrajectories, to analyze gravitational-wave observations.We leave these explorations to future work.
IV. METHODS
In this section, we elaborate on the technical elementsof our work which are necessary for replication of the re-sults. We open with a brief overview of the adjoint sensi- tivity method [30] which we used to compute derivativesof J (see Eqs. (9) and (12)) with respect to ξ and, in turn,facilitate solving problem (10). The section then closeswith an overview of the quadrupole formula we have usedto model the gravitational waveform (8). Because bothof these topics are well-known in specific (but mainly dis-joint) scientific communities, we keep the exposition briefbut include numerous references to the literature. A. Calculation of derivatives
The ODE-constrained optimization problem (10) de-livers a calibrated dynamical system model˙ x = F ( x ; ξ ) , x (0) = x , (16)where F ( x ; ξ ) = f ( x , F ( x ; ξ )). Because x = x ( ξ ) de-pends implicitly on ξ through Eq. (16), the main tech-nical difficulty with solving such optimization problemslies in computing total derivatives of the functional J ( x , ξ ) = 1 T (cid:90) T J ( x , ξ ) d t, (17)with respect to ξ .Indeed, assume that we are working with the defini-tion of J given in Eq. (9). Here, any gradient-based op-timization algorithm requires the calculation of the totalderivative d ξ J = 1 T (cid:90) T ∂ x J d ξ x + ∂ ξ J d t. (18)One may easily note that, in the specific setting given tous through Eq. (9), we have J = J ( x ) and, therefore,the partial derivative ∂ ξ J vanishes. In more general sit-uations, the term ∂ ξ J is routine to derive. On the otherhand, the calculation of d ξ x is problematic due to the factthat x ( ξ ) is not available in closed form. One approachto computing d ξ J involves directly estimating d ξ x via fi-nite differences, however, the cost of this approach scaleslinearly with the size of ξ . This makes it prohibitive formost practical problems. We choose to compute thesegradients using in an alternative way often referred to asthe adjoint sensitivity method [30].The adjoint sensitivity method has been used exten-sively in engineering design [46, 47] and, more recently,in machine learning research [4, 8, 48]. It involves the in-tegration of two ODEs over the time interval [0 , T ]: theoriginal governing ODE (16) and an adjoint ODE (inte-grated backwards in time).The derivation of this method usually begins with theLagrangian L = 1 T (cid:90) T J ( x ( t ) , ξ ) − λ ( t ) (cid:62) (cid:0) ˙ x ( t ) − F ( x ( t ); ξ )) (cid:1) d t − µ (cid:62) (cid:0) x (0) − x (cid:1) , (19) Merger
Time − , − , −
500 0
Time before merger S e m i - l a t u s r ec t u m E cce n tr i c i t y − , − , −
500 0 − − − − − Time before merger R e l a t i v e d i s ag r ee m e n t k ˆ r − r k / k ˆ r k − , − , − , − − − −
200 0 − . − . . . Time before merger W a v e f o r m Real Imag.Data ReferenceFit Learned
FIG. 4. Summary of our third experiment, where have used noisy gravitational-wave observations (black dots; bottom figure)to learn the underlying dynamical system model governing the relativistic two-body problem for two equal mass black holes inan eccentric orbit. Top left: Trajectories of the centers of mass of a black hole system
SXS:BBH:1356 (black lines), taken fromthe SXS Gravitational Waveform Database [36]. We also show the orbit computed from our learned dynamical system (blue andred lines). In the upper right panel, we show the evolution of the eccentricity and semi-latus rectum from our learned-dynamicalsystem. The middle right panel shows the disagreement between the NR trajectories and the ones computed from the learneddynamical system. We caution the reader that this figure should not be understood as a relative error because the numericalrelativity black hole trajectories and our learned model are expressed in different coordinate systems that are impossible torelate. Bottom: Learned (red line) and computational relativity (black dots) waveform data corresponding to the real part ofgravitational waveforms. Here, we also include the learned imaginary part of the waveform (blue line), reconstructed with thequadrupole formula, and compare it with the reference imaginary part taken from the SXS database (black line). which comes from writing the ξ -minimization of (17),constrained by solutions of (16), as a saddle-point prob-lem [49]. This functional is clearly designed such thatd ξ J = d ξ L for any x satisfying (16). In addition, one observes thatd ξ L = ∂ x L d ξ x + ∂ λ L d ξ λ + ∂ µ L d ξ µ + ∂ ξ L (20)= 1 T (cid:90) T ∂ ξ J ( x ) − λ (cid:62) ∂ ξ F ( x ; ξ ) d t, (21)0if ∂ λ L = ∂ µ L = ∂ x L = 0. A straightforward calculationshows that the first two of these equations are equiva-lent to the original dynamical system (16). On the otherhand, ∂ x L = 0 is equivalent to the adjoint equation − ˙ λ = [ ∂ x F ( x ; ξ )] (cid:62) λ + ∂ x J ( x , ξ ) , λ ( T ) = . (22)Evidently, the ODE system (22) depends on the solu-tion to (16), x ( t ). Therefore, the algorithm for computingd ξ J must follow a specific order:1. Integrate ˙ x = F ( x ; ξ ) from t = 0 to T with theinitial condition x (0) = x .2. Integrate − ˙ λ = [ ∂ x F ( x ; ξ )] (cid:62) λ + ∂ x J ( x , ξ ) from t = T to 0 with the initial condition λ ( T ) = .3. Compute d ξ J = T (cid:82) T ∂ ξ J ( x ) − λ (cid:62) ∂ ξ F ( x ; ξ ) d t .Extension of this algorithm to the scenario where theinitial condition x is also optimized for (as consideredin, e.g., Section II B 2) is straightforward. For thor-ough accounts of the numerical implementation of theadjoint sensitivity method for UDEs, we refer the inter-ested reader to [8, 50]. B. Gravitational waves from an orbit
In the context of general relativity, gravitational wavesare associated with the outgoing, radiative parts of thespacetime metric and are solutions to Einstein field equa-tion. The motion of massive objects produce gravita-tional waves, and our model outlined in Sec. I B providesthe equations of motion for object 1 of mass m and po-sition r ( t ), and object 2 of mass m and position r ( t ).The quadrupole formula provides one simple method ofobtaining the gravitational radiation from these orbitaltrajectories. In this framework, we assume both blackholes to be “point sources” (i.e. Dirac delta functions).The Newtonian mass density of two objects orbiting inthe x-y plane is ρ ( t, x, y, z ) = m δ ( z ) δ ( x − x ( t )) δ ( y − y ( t ))+ m δ ( z ) δ ( x − x ( t )) δ ( y − y ( t )) , (23)and note that in the special case m (cid:29) m we have r ( t ) = (0 , ,
0) and r ( t ) = r ( t ). Given the den-sity, the quadrupole formula tells us that the dominantquadrupole mode of the gravitational radiation field ten-sor in the transverse-traceless gauge is rH ab = 2 ∂ ∂t I ab , (24)where the trace-free mass quadrupole tensor is I ab = I ab − δ ab δ cd I cd , (25) δ ab is the Kronecker delta. In Cartesian coordinates, theindicies take on values of “x”, “y”, and “z”. For example, we have H xx , H yy , H xy , etc. The components of themass quadrupole tensor, I ab , that are relevant to Eq. (24)(non-zero temporal derivatives) are I xx = (cid:90) d xρx = m x ( t ) + m x ( t ) , (26a) I yy = (cid:90) d xρy = m y ( t ) + m y ( t ) , (26b) I xy = (cid:90) d xρxy = m x ( t ) y ( t ) + m x ( t ) y ( t ) , (26c)and by symmetry I xy = I yx .This framework computes the gravitational wave asperturbations, H ab , of the background spacetime met-ric. However, numerical simulations instead provide theplus, h + and cross, h × , gravitational-wave polarizationsdefined on a “sphere at infinity”, which are obtained af-ter contracting H ab with the polarization tensors [28].On this sphere, it is common to define a complexifiedgravitational wave h ( t, θ, φ ) = h + ( t, θ, φ ) − i h × ( t, θ, φ ) (27)= ∞ (cid:88) (cid:96) =2 l (cid:88) m = − l h (cid:96)m ( t ) − Y (cid:96)m ( θ, φ ) , (28)and subsequently expand h into a complete basis ofspin = − (cid:96), m )harmonic indices, − Y (cid:96)m . Here θ and φ are the polarand azimuthal angles. For example, the SXS (2 ,
2) modegravitational waveform data, h , was used extensivelyin this paper. Given the orbital trajectories, the (2 , h ( t ) = 1 r (cid:114) π (cid:16) ¨ I xx − i ¨ I xy − ¨ I yy (cid:17) , (29)can be computed directly from the trace-free massquadrupole tensor [28, Eqs. 54–56].To summarize, from the orbital motion we computethe three components of the mass quadrupole tensorEq. (26), compute the trace-free mass quadrupole ten-sor, compute the time derivatives using finite differences,and then finally assemble the (2,2)-multipolar componentof the outgoing gravitational waves, h , from Eq. (29).While a full discussion is outside the scope of thisappendix, we point out that the quadrupole formulais the simplest possible one and, consequently, ignoresa lot of the relevant physics. Future work couldconsider incorporating more physics into the gravita-tional waveform model, including relativistic definitionsof the density, higher-order post-Minkowskian correc-tions, subdominant harmonic modes, or near-field-to-far-field transport maps. Nevertheless, some of missingphysics might already be accounted for through the or-bital model, where the deep networks may try to accountfor missing physics in the waveform model by modifyingthe orbital dynamics model.1 ACKNOWLEDGMENTS
We thank Gaurav Khanna, Tom O’Leary-Roseberry,and Harald Pfeiffer for helpful discussions. S.E.F. is sup-ported by NSF grants No. PHY-1806665 and No. DMS-1912716. This manuscript was written while the au-thors were in residence at the Institute for Computationaland Experimental Research in Mathematics (ICERM) inProvidence, RI, during the Advances in ComputationalRelativity program, supported by the National ScienceFoundation under Grant No. DMS-1439786.This work was performed under the auspices of theU.S. Department of Energy by Lawrence Livermore Na-tional Laboratory under Contract DE-AC52-07NA27344,LLNL-JRNL-819108. This document was prepared as anaccount of work sponsored by an agency of the UnitedStates government. Neither the United States govern- ment nor Lawrence Livermore National Security, LLC,nor any of their employees makes any warranty, expressedor implied, or assumes any legal liability or responsibil-ity for the accuracy, completeness, or usefulness of anyinformation, apparatus, product, or process disclosed, orrepresents that its use would not infringe privately ownedrights. Reference herein to any specific commercial prod-uct, process, or service by trade name, trademark, man-ufacturer, or otherwise does not necessarily constituteor imply its endorsement, recommendation, or favoringby the United States government or Lawrence LivermoreNational Security, LLC. The views and opinions of au-thors expressed herein do not necessarily state or reflectthose of the United States government or Lawrence Liv-ermore National Security, LLC, and shall not be used foradvertising or product endorsement purposes. [1] J. P. Crutchfield and B. McNamara, Equations of motionfrom a data series, Complex systems , 121 (1987).[2] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discoveringgoverning equations from data by sparse identification ofnonlinear dynamical systems, Proceedings of the nationalacademy of sciences , 3932 (2016).[3] M. Raissi and G. E. Karniadakis, Hidden physics models:Machine learning of nonlinear partial differential equa-tions, Journal of Computational Physics , 125 (2018).[4] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, andD. K. Duvenaud, Neural ordinary differential equations,in Advances in Neural Information Processing Systems ,Vol. 31, edited by S. Bengio, H. Wallach, H. Larochelle,K. Grauman, N. Cesa-Bianchi, and R. Garnett (CurranAssociates, Inc., 2018) pp. 6571–6583.[5] Z. Long, Y. Lu, X. Ma, and B. Dong, PDE-net: LearningPDEs from data, in
International Conference on MachineLearning (PMLR, 2018) pp. 3208–3216.[6] J. Han, C. Ma, Z. Ma, and E. Weinan, Uniformly ac-curate machine learning-based hydrodynamic models forkinetic equations, Proceedings of the National Academyof Sciences , 21983 (2019).[7] K. Wu and D. Xiu, Numerical aspects for approximat-ing governing equations using data, Journal of Compu-tational Physics , 200 (2019).[8] C. Rackauckas, Y. Ma, J. Martensen, C. Warner,K. Zubov, R. Supekar, D. Skinner, and A. Ramad-han, Universal differential equations for scientific ma-chine learning, arXiv preprint arXiv:2001.04385 (2020).[9] D. Z. Huang, K. Xu, C. Farhat, and E. Darve, Learn-ing constitutive relations from indirect observations usingdeep neural networks, Journal of Computational Physics , 109491 (2020).[10] Y. Sun, L. Zhang, and H. Schaeffer, NeuPDE: Neuralnetwork based ordinary and partial differential equa-tions for modeling time-dependent data, in
Proceedings ofThe First Mathematical and Scientific Machine LearningConference , Proceedings of Machine Learning Research,Vol. 107, edited by J. Lu and R. Ward (PMLR, PrincetonUniversity, Princeton, NJ, USA, 2020) pp. 352–372.[11] R. Dandekar, V. Dixit, M. Tarek, A. Garcia-Valadez, and C. Rackauckas, Bayesian neural ordinary differentialequations, arXiv preprint arXiv:2012.07244 (2020).[12] R. Abbott, T. Abbott, S. Abraham, F. Acernese, K. Ack-ley, A. Adams, C. Adams, R. Adhikari, V. Adya, C. Af-feldt, et al. , Gwtc-2: Compact binary coalescences ob-served by ligo and virgo during the first half of the thirdobserving run, arXiv preprint arXiv:2010.14527 (2020).[13] L. Lehner and F. Pretorius, Numerical relativity and as-trophysics, arXiv preprint arXiv:1405.4840 (2014).[14] L. Blanchet, Gravitational radiation from post-newtonian sources and inspiralling compact binaries,Living Reviews in Relativity , 1 (2014).[15] C. R. Vogel, Computational methods for inverse prob-lems , Frontiers in applied mathematics (Society for In-dustrial and Applied Mathematics, 2002).[16] S. Arridge, P. Maass, O. ¨Oktem, and C.-B. Sch¨onlieb,Solving inverse problems using data-driven models, ActaNumerica , 1–174 (2019).[17] E. Poisson and C. M. Will, Gravity: Newtonian, post-newtonian, relativistic (Cambridge University Press,2014).[18] S. Chandrasekhar,
The mathematical theory of blackholes , Vol. 69 (Oxford University Press, 1998).[19] A. G. Lewis, A. Zimmerman, and H. P. Pfeiffer, Fun-damental frequencies and resonances from eccentric andprecessing binary black hole inspirals, Classical andQuantum Gravity , 124001 (2017).[20] A. Zimmerman, A. G. Lewis, and H. P. Pfeiffer, Red-shift factor and the first law of binary black hole me-chanics in numerical simulations, Physical review letters , 191101 (2016).[21] A. Le Tiec, A. H. Mroue, L. Barack, A. Buonanno,H. P. Pfeiffer, N. Sago, and A. Taracchini, Periastron ad-vance in black-hole binaries, Physical review letters ,141101 (2011).[22] A. Le Tiec, E. Barausse, and A. Buonanno, Gravitationalself-force correction to the binding energy of compact bi-nary systems, Physical review letters , 131103 (2012).[23] A. Le Tiec, The overlap of numerical relativity, per-turbation theory and post-newtonian theory in the bi-nary black hole problem, International Journal of Mod- ern Physics D , 1430022 (2014).[24] A. Le Tiec, A. Buonanno, A. H. Mroue, H. P. Pfeif-fer, D. A. Hemberger, G. Lovelace, L. E. Kidder, M. A.Scheel, B. Szil´agyi, N. W. Taylor, et al. , Periastron ad-vance in spinning black hole binaries: Gravitational self-force from numerical relativity, Physical Review D ,124027 (2013).[25] N. E. Rifat, S. E. Field, G. Khanna, and V. Varma, Sur-rogate model for gravitational wave signals from compa-rable and large-mass-ratio black hole binaries, PhysicalReview D , 081502 (2020).[26] A. Pound, B. Wardell, N. Warburton, and J. Miller,Second-order self-force calculation of gravitational bind-ing energy (2019), arXiv:1908.07419 [gr-qc].[27] M. van de Meent and H. P. Pfeiffer, Intermediate mass-ratio black hole binaries: Applicability of small mass-ratio perturbation theory, Physical Review Letters ,181101 (2020).[28] N. T. Bishop and L. Rezzolla, Extraction of gravitationalwaves in numerical relativity, Living reviews in relativity , 2 (2016).[29] J. Nocedal and S. Wright, Numerical optimization (Springer Science & Business Media, 2006).[30] V. Boltyanskiy, R. V. Gamkrelidze, Y. Mishchenko, andL. Pontryagin,
Matematicheskaya teoriya optimal’nykhprotsessov (Mathematical Theory of Optimal Processes) (Nauka, 1961).[31] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah,Julia: A fresh approach to numerical computing, SIAMreview , 65 (2017).[32] B. Keith, A. Khadse, and S. E. Field, Three examplesof learning orbital dynamics of binary black hole systemsfrom gravitational wave measurements with Julia (2021), http://10.5281/zenodo.4477649 .[33] C. Cutler, D. Kennefick, and E. Poisson, Gravita-tional radiation reaction for bound motion around aschwarzschild black hole, Physical Review D , 3816(1994).[34] K. Martel, Gravitational waveforms from a point particleorbiting a schwarzschild black hole, Physical Review D , 044025 (2004).[35] S. E. Field, J. S. Hesthaven, and S. R. Lau, Discontinuousgalerkin method for computing gravitational waveformsfrom extreme mass ratio binaries, Classical and QuantumGravity , 165010 (2009).[36] M. Boyle, D. Hemberger, D. A. Iozzo, G. Lovelace, S. Os-sokine, H. P. Pfeiffer, M. A. Scheel, L. C. Stein, C. J.Woodford, A. B. Zimmerman, et al. , The SXS collabo- ration catalog of binary black hole simulations, Classicaland Quantum Gravity , 195006 (2019).[37] A. Buonanno and T. Damour, Effective one-body ap-proach to general relativistic two-body dynamics, Physi-cal Review D , 084006 (1999).[38] E. Poisson, A. Pound, and I. Vega, The motion of pointparticles in curved spacetime, Living Reviews in Relativ-ity , 1 (2011).[39] The Spectral Einstein Code, .[40] J. W. York, Jr., Conformal ’thin sandwich’ data for theinitial-value problem, Phys. Rev. Lett. , 1350 (1999),arXiv:gr-qc/9810051 [gr-qc].[41] H. P. Pfeiffer and J. W. York, Jr., Extrinsic curvatureand the Einstein constraints, Phys. Rev. D67 , 044022(2003), arXiv:gr-qc/0207095 [gr-qc].[42] L. Lindblom, M. A. Scheel, L. E. Kidder, R. Owen,and O. Rinne, A New generalized harmonic evolutionsystem, Class. Quant. Grav. , S447 (2006), arXiv:gr-qc/0512093 [gr-qc].[43] O. Rinne, L. T. Buchman, M. A. Scheel, and H. P. Pfeif-fer, Implementation of higher-order absorbing bound-ary conditions for the Einstein equations, Class. Quant.Grav. , 075009 (2009), arXiv:0811.3593 [gr-qc].[44] SXS Collaboration, The SXS collaboration catalogof gravitational waveforms, .[45] M. Benning and M. Burger, Modern regularization meth-ods for inverse problems, Acta Numerica , 1–111(2018).[46] A. Jameson, Aerodynamic design via control theory,Journal of scientific computing , 233 (1988).[47] M. B. Giles and N. A. Pierce, An introduction to theadjoint approach to design, Flow, turbulence and com-bustion , 393 (2000).[48] F. d. A. Belbute-Peres, T. Economon, and Z. Kolter,Combining differentiable PDE solvers and graph neuralnetworks for fluid flow prediction, in International Con-ference on Machine Learning (PMLR, 2020) pp. 2402–2411.[49] I. Ekeland and R. Temam,
Convex analysis and varia-tional problems , Classics in applied mathematics (Societyfor Industrial and Applied Mathematics, 1999).[50] A. Gholami, K. Keutzer, and G. Biros, ANODE: Uncon-ditionally accurate memory-efficient gradients for neuralODEs, in