[PDF] Towards fast machine-learning-assisted Bayesian posterior inference of realistic microseismic events

Abstract

Bayesian inference applied to microseismic activity monitoring allows for principled estimation of the coordinates of microseismic events from recorded seismograms, and their associated uncertainties. However, forward modelling of these microseismic events, necessary to perform Bayesian source inversion, can be prohibitively expensive in terms of computational resources. A viable solution is to train a surrogate model based on machine learning techniques, to emulate the forward model and thus accelerate Bayesian inference. In this paper, we improve on previous work, which considered only sources with isotropic moment tensor. We train a machine learning algorithm on the power spectrum of the recorded pressure wave and show that the trained emulator allows for the complete and fast retrieval of the event coordinates for \textit{any} source mechanism. Moreover, we show that our approach is computationally inexpensive, as it can be run in less than 1 hour on a commercial laptop, while yielding accurate results using less than 10^4 training seismograms. We additionally demonstrate how the trained emulators can be used to identify the source mechanism through the estimation of the Bayesian evidence. This work lays the foundations for the efficient localisation and characterisation of any recorded seismogram, thus helping to quantify human impact on seismic activity and mitigate seismic hazard.

Full PDF

GGeophys. J. Int. (0000) , 000–000

Towards fast machine-learning-assisted Bayesian posterior inferenceof realistic microseismic events

Davide Piras (cid:63) , Alessio Spurio Mancini , , , Benjamin Joachimi , Michael P. Hobson Department of Physics and Astronomy, University College London, Gower Street, London, WC1E 6BT, UK Mullard Space Science Laboratory, University College London, Holmbury St. Mary, Dorking, Surrey, RH5 6NT, UK Astrophysics Group, Cavendish Laboratory, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK

14 January 2021

SUMMARY

Bayesian inference applied to microseismic activity monitoring allows for principled estima-tion of the coordinates of microseismic events from recorded seismograms, and their associateduncertainties. However, forward modelling of these microseismic events, necessary to performBayesian source inversion, can be prohibitively expensive in terms of computational resources.A viable solution is to train a surrogate model based on machine learning techniques, to em-ulate the forward model and thus accelerate Bayesian inference. In this paper, we improve onprevious work, which considered only sources with isotropic moment tensor. We train a ma-chine learning algorithm on the power spectrum of the recorded pressure wave and show thatthe trained emulator allows for the complete and fast retrieval of the event coordinates for any source mechanism. Moreover, we show that our approach is computationally inexpensive, as itcan be run in less than 1 hour on a commercial laptop, while yielding accurate results using lessthan training seismograms. We additionally demonstrate how the trained emulators can beused to identify the source mechanism through the estimation of the Bayesian evidence. Thiswork lays the foundations for the efﬁcient localisation and characterisation of any recordedseismogram, thus helping to quantify human impact on seismic activity and mitigate seismichazard. Key words: seismograms – microseismicity – statistical methods – seismic modelling – artiﬁ-cial intelligence.

Underground human activity, including ﬂuid injection in rocks andmining operations, can cause seismic events with much smaller am-plitude than large-scale earthquakes (Majer et al., 2007; Ellsworth,2013). These induced seismic signals are usually referred to as mi-croseismic events, and their monitoring is critical in understand-ing the impact of human activity on seismic hazard (Brueckl et al.,2008; Shapiro et al., 2010; Mukuhira et al., 2016; Das et al., 2017,and references therein). Microseismic events are usually trackedby placing geophones on the land surface or at the seabed (Panahiet al., 2005; Fertitta et al., 2010). Besides measuring the amplitudeof the seismograms, these sensors allow for a spatial and temporallocalisation of the event. In particular, once a seismogram has beenrecorded, we wish to infer its source location in the subsurface, inorder to mitigate the seismic hazard and forecast seismic risk.Various methods for microseismic event - and, more gener-ally, earthquake - location are available in the literature, datingback to the work of Geiger (1910), and up to today (see e.g. Vascoet al., 2019, and references therein, for a recent review). One of (cid:63) [email protected] the most common approaches relies on estimating the ﬁrst arrivaltime by means of the eikonal equation (see e.g. Noack and Clark,2017; Smith et al., 2020), which allows for the direct comparisonof travel times through a direct grid search or more sophisticatedtechniques (Wuestefeld et al., 2018). More accurate predictions canbe obtained exploiting methods that take into consideration the fullwaveform, even though they generally require more computationalresources (Angus et al., 2014; Vasco et al., 2019).A Bayesian approach can be adopted to solve the inverseproblem as well (Lomax et al., 2000; Tarantola, 2005; St¨ahler andSigloch, 2014, 2016; Pugh et al., 2016). We assume a model whereeach microseismic event is uniquely identiﬁed by a set of coordi-nates, which are the model parameters. Then, given a seismogram,we wish to obtain the parameters’ posterior distribution, whichis used to provide an estimate of the model parameters and theirassociated uncertainty. Markov Chain Monte Carlo (MCMC; seee.g. Craiu and Rosenthal, 2014, for a review) and nested sampling(Skilling, 2006) techniques are among those employed to samplethe posterior distribution. However, this approach becomes pro-hibitive when dealing with a high number of parameters, or whenthe forward model is computationally expensive to simulate (seee.g. Rajaratnam and Sparks, 2015; Conrad et al., 2016; Alsing a r X i v : . [ phy s i c s . g e o - ph ] J a n D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson et al., 2018). For these reasons, being able to cheaply and accu-rately simulate a microseismic event given its coordinates has be-come paramount in recent years. In order to obtain forward simula-tions of microseismic events, one must solve the elastic wave equa-tion given a 3-D heterogenous density and velocity model for thepropagating medium (Das et al., 2017), which can be prohibitivelyexpensive unless sophisticated techniques are implemented.Machine-learning generative models have gained considerableattention in recent years, with applications to many ﬁelds rangingfrom computer vision (Goodfellow et al., 2014; Gulrajani et al.,2016) to astrophysics (Auld et al., 2007, 2008), as well as climatescience, nuclear physics and drug selection (see e.g. Kasim et al.,2020; Chenthamarakshan et al., 2020). These advances have beenenabled by both an increased accessibility to speciﬁc computationalresources, as well as a signiﬁcant growth in the amount of availabledata.In geophysics, Das et al. (2017) developed an optimised ap-proach to microseismic events generation that, for each set of co-ordinates and given a physical model for the propagating medium,produces the corresponding seismogram in O (1 h) using a Teslagraphics processing unit (GPU) and the software K - WAVE (Treebyet al., 2014). Subsequently, Das et al. (2018) and Spurio Manciniet al. (2020) (D18 and SM20 hereafter, respectively) showed thelimitations of this direct approach, and presented an alternativewhere the mapping is learnt using machine learning techniques. Inparticular, D18 showed how Gaussian Processes (GPs, Rasmussenand Williams, 2005) can be used to learn an accurate surrogatemodel, while SM20 demonstrated the effectiveness of a variety ofmachine learning algorithms as emulators, and showed how theirsurrogate model yields an accurate estimate of the posterior distri-bution of an event’s coordinates in a fraction of the time requiredby the D18 method.D18 and SM20, however, only applied their methodologiesto isotropic microseismic events. It is well known that any sourcemechanism can be mathematically decomposed into three compo-nents: isotropic (ISO), double couple (DC), and compensated lin-ear vector dipole (CLVD) (see e.g. Knopoff and Randall, 1970;Vavryˇcuk, 2001, 2005; Vavryˇcuk, 2015). The pure ISO source isassociated with implosive or explosive force, while the pure DCsource is caused by shear faulting. The CLVD source is coupled tothe ISO source, but compressive stress is exerted along one direc-tion, while tensile stress is exerted along the other two (Li et al.,2015).In this paper, we present an approach that aims at learningthe direct mapping between coordinates and seismograms for anymicroseismic source type (ISO, DC and CLVD). We show that itis sufﬁcient to consider the power spectrum of the recorded pres-sure waves in order to distinguish different seismograms, and wetrain a simple machine learning algorithm to learn this mapping ef-ﬁciently. Moreover, we demonstrate how our method allows for theaccurate inference of the posterior distribution of the coordinates ofa single source in O (0 . on a commercial laptop, thus paving theway for the fast and inexpensive localisation of any microseismicevent. Finally, we show how we can use the trained emulators toidentify the source mechanism through Bayesian evidence estima-tion, thus demonstrating the versatility of our Bayesian approach.We structure the paper as follows. In Sect. 2 we describe thedata we consider in this work. In Sect. 3 we explain what prepro-cessing steps we perform and brieﬂy recall the details of the gener-ative method we employ. We show its performance at both trainingand inference time in Sect. 4. Finally, in Sect. 5 we discuss ourresults and provide an outlook on possible extensions of this work. In this work, we consider the same data framework as D18 andSM20, starting from 3-D heterogenous density and velocity modelsfor the propagating medium, which we show in Fig. 1. The model,which is discretised on a 3-D grid of voxels, speciﬁes the values ofthe density ρ of the propagating medium, as well as the propaga-tion velocities for P- and S-waves ( V p , V s ). We assume that sen-sors are placed at the seabed, and that they record both pressureand three-component particle velocity of the propagating medium(even though we will use only the former, as we explain later on).As anticipated, our aim is to apply our method to any source mech-anism, so we will consider a more general generation procedurethan previous work. Unlike D18 and SM20, who only consideredisotropic sources, we take the microseismic moment tensor to beone of three types, which we denote as M ISO , M DC and M CLVD .Following Vavryˇcuk (2005) and Li et al. (2015), we deﬁne thesequantities as: M ISO =  M M

00 0 M  , (1) M DC =  M M  , (2) M CLVD =  M M

00 0 − M  , (3)where each M ij represents a different couple of forces. We addi-tionally assume M = M = M = M = M = 1 MPa,which is a realistic assumption following Collettini and Barchi(2002). In practice, isotropic (ISO) events are characterised by asingle (explosive or implosive) P-wave, while double couple (DC)events are linked to shear stress and are characterised by both aP- and S-wave, with comparable amplitudes. Similarly to isotropicevents, compensated linear vector dipole (CLVD) events display anoften dominant principal wave, whose amplitude is however muchsmaller than the isotropic one, and a non-negligible S-wave.We generate forward simulations using the GPU implementa-tion of Das et al. (2017), separately using 4 types of GPUs: TeslaK20, Tesla C2075, GeForce RTX 2080 Ti and GeForce GTX 1080Ti. For each source type, we produce 10000 events correspondingto different source locations, which are randomly sampled usingLatin Hypercube Sampling on a 3-D grid of 81 × ×

301 points,corresponding to a real geological model of size 1 km × × . ms (2 kHz), and the total length of each seismicevent is s. After generation, all seismograms are downsampledto a time resolution of . ms (400 Hz) to reduce computationalstorage; this is done by keeping only the ﬁrst of each group of 5components. In this way, each seismic trace is ultimately a timeseries composed of N t = 2001 time components: we show anexample for each source mechanism in the left column of Fig. 3. ast seismic inference using machine learning Figure 1.

P-wave velocity ( V p ), S-wave velocity ( V s ) and density ( ρ ) models of the simulated domain we consider in this work. The models are speciﬁedas 3-D grids of voxels, with size 81 × ×

301 points, corresponding to a real geological model of size 1 km × × x [km] y [ k m ] Figure 2.

Projection of the positions of the 23 receivers on the x − y plane; z = 2 . km corresponds to the seabed, where all sensors lie. Each receiverrecords the acoustic pressure wave and particle velocity generated by a mi-croseismic event below the seabed. The red crosses indicate all 23 receivers,while green circles and blue squares (9 and 5 receivers, respectively) refer tosubsets of receivers we used to test the robustness of our method, as shownin Sect. 4.2. Finally, note that we consider the seismograms to be noiseless attraining time, while some noise is added to the simulated recordedevent when performing inference on the coordinates’ posterior dis-tribution. We will discuss and show the effect of the noise level inSect. 3.1 and Sect. 4.2, respectively.

Learning a mapping between coordinates and seismograms directlywould be hard for at least two reasons. First, each signal has fea-tures with different amplitudes: this means that e.g. a neural net-work (which will be described in detail in Sect. 3.2) would likelyjust focus on the main peak and ignore the other components, thuslosing useful information for the localisation purpose. Secondly,given the complexity of the seismograms and the high number offeatures, the amount of data required to train an accurate emulatorwithout overﬁtting would be at least an order of magnitude higherthan what we consider in this work (see e.g. Bishop, 2006; Zhuet al., 2015, and references therein).In this sense, we have to preprocess the seismograms in orderto extract only the relevant information that is needed to localisean event, while discarding all the noisy or redundant features of thesignal. Both D18 and SM20 showed the importance of preprocess-ing, employing GPs in order to select only the components of eachseismogram that are essential for inference. However, their meth-ods fail on more complicated sources like the ones we consider inthis work. While it could be argued that employing more trainingdata could improve the results, it is also well-known that GPs donot scale well with the number of training points (Liu et al., 2018),thus it is likely that the D18 method would struggle to generaliseto more complicated source mechanisms. Additionally, the methodproposed in SM20 is applied directly to the complicated seismictraces like the ones in the left panel of Fig. 3, thus making it moredifﬁcult for any algorithm to capture the most useful features forlocalisation. Therefore, we follow a different preprocessing proce-dure, based on translating the data to the Fourier domain, and weoutline the steps in the next paragraphs.The left panel of Fig. 4 shows the mean and standard deviationof all seismograms in the training set (8000 seismic traces - seeSect. 3.2 for more details): based on these distributions, we keep allcomponents of the DC and CLVD signals, and only keep the ﬁrsthalf of the ISO traces. In other words, we keep only N keep = 1000 time components for the ISO traces, and N keep = N t = 2001 timecomponents for the other mechanisms. D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson A m p li t u d e [ A U ] Isotropic (ISO) P o w e r s p e c t r u m [ A U ] ISO noiseless A m p li t u d e [ A U ] Double couple (DC) P o w e r s p e c t r u m [ A U ] DC noiseless

Time [s] A m p li t u d e [ A U ] Compensated linear vector dipole (CLVD)

Frequency [Hz] P o w e r s p e c t r u m [ A U ] CLVD noiseless

Figure 3.

Left column:

Example acoustic pressure wave for each different type of moment tensor: isotropic (ISO), double couple (DC) and compensatedlinear vector dipole (CLVD). These seismograms correspond to a source location of ( x, y, z ) = (0 .

55 km , .

73 km , . as recorded by a receiver in ( x, y, z ) = (0 .

13 km , .

38 km , .

43 km) . The seismograms’ amplitude is measured in arbitrary units of pressure. Note the different scales for each sourcemechanism. The dashed black line in the top panel indicates a cut we perform for the isotropic sources only, based on the left panel of Fig. 4.

Right column:

The corresponding power spectra, calculated as described in Sect. 3.1. No noise is added when training the emulator, while some noise is introduced whendoing inference, as described in Sec. 3.3. The dashed lines indicate a frequency cut we perform to further reduce the number of features and to be robust tonoise, based on the right panel of Fig. 4.

The next preprocessing step we implement is applying theone-dimensional discrete Fourier Transform (Cooley and Tukey,1965) to each seismogram, using the version of N UM P Y † . Since theamplitudes at each time component are real numbers, the FourierTransform returns ( (cid:98) N keep / (cid:99) + 1) frequency components: thismeans that for DC and CLVD sources we are left with 1001 compo-nents (501 in the ISO case) in the Fourier domain. We then take thesquare of the absolute value of these complex numbers: this is usu-ally referred to as a power spectrum. In the right column of Fig. 3we report the power spectra corresponding to the seismograms inthe left column of the same ﬁgure. We further take the decimal log-arithm of the power spectra at each frequency value, and refer to itas logarithmic power spectra in the rest of the paper.As anticipated, we shall add some noise to the observed seis-mogram whose coordinates will be inferred. In the right panel ofFig. 4 we show the mean power spectra for each source mecha-nism with and without noise, which will be described in detail inSect. 3.3. We calculate the ratio between the noiseless and the noisysignals, and ﬁlter out the frequencies for which this ratio is lessthan 0.99. We experimented with different thresholds, and chose0.99 as a good balance between retaining enough features to locatea seismogram and being insensitive to noise. In other words, we ad-ditionally cut each power spectrum in the ranges [1 Hz, . Hz ] , [0 Hz, . Hz ] and [0 Hz, . Hz ] for ISO, DC and CLVD, re-spectively, to keep the parts of each signal that are less affected bynoise. We observe that translating the seismograms to the Fourierdomain has allowed us to obtain smoother signals, as well as to † https://numpy.org/doc/stable/reference/routines.fft.html reduce the number of features by a factor of . Moreover, this al-lows our proposed method to be robust to noise: any effect due tonoise can be translated into some information of the noise power,and hence accounted for in the analysis. We will discuss this furtherin Sect. 3.3.To further reduce the number of features, we apply PrincipalComponent Analysis (PCA). PCA is a standard linear compressiontechnique where the data is projected along the eigenvalues of thedata covariance matrix. Considering only the components that carrymore variance (the so-called ‘principal components’, correspond-ing to the largest eigenvalues), it is possible to reduce the numberof features while maintaining the relevant information for infer-ence. We ﬁt PCA to the training data, and use it to compress thewhole dataset. After applying PCA to the logarithmic power spec-tra, we retain 10 principal components for each signal. We veriﬁedexperimentally that varying the number of retained PCA compo-nents does not impact the ﬁnal results signiﬁcantly. The following step consists of learning the mapping between co-ordinates and principal components by means of a neural network(NN). A neural network is a set of subsequent layers, each madeof a certain number of neurons, that allow for the parametrisa-tion of any measurable function between ﬁnite dimensional spaces(Hornik et al., 1989). Each neuron is associated to a weight, andeach layer is additionally associated to a bias (i.e. an offset):weights and biases constitute the parameters that we wish to learn.Additionally, activation functions can be introduced after each layerto model non-linear mappings. Training the neural network consists ast seismic inference using machine learning A m p li t u d e [ A U ] Isotropic (ISO) P o w e r s p e c t r u m [ A U ] ISO noiselessISO with noise A m p li t u d e [ A U ] Double couple (DC) P o w e r s p e c t r u m [ A U ] DC noiselessDC with noise

Time [s] A m p li t u d e [ A U ] Compensated linear vector dipole (CLVD)

Frequency [Hz] P o w e r s p e c t r u m [ A U ] CLVD noiselessCLVD with noise

Figure 4.

Left column:

Mean (blue line) and standard deviation (grey area) of all the seismic traces in the training set, for each source mechanism: isotropic(ISO), double couple (DC) and compensated linear vector dipole (CLVD). The seismograms’ amplitude is measured in arbitrary units of pressure. The trainingset is made of 8000 traces for each source mechanism. We cut the isotropic sources at . s, as indicated by the dashed black line. Right column:

Noiseless(blue) and noisy (coral) mean of the corresponding power spectra, calculated as described in Sect. 3.1. We consider a signal-to-noise ratio of 33 dB, asdescribed in Sect. 3.3. We ﬁlter the power spectra between the dashed lines selecting only the frequencies where the ratio between the noiseless and noisysignals is more than 99 % . of feeding some data through all the layers, and then updating thevalue of the parameters in order to optimise a chosen loss function.Our goal is to learn a mapping between coordinates and prin-cipal components. We employ a neural network made of threelayers with 256 neurons each, to provide enough ﬂexibility tothe parametrisation without consuming too much memory. We setLeaky ReLU (Maas et al., 2013) as the activation function for alllayers except the last one, where we keep a linear activation func-tion. We recall here that Leaky ReLU acts on the output of eachlayer O as follows: LeakyReLU( O ) = (cid:26) O if O > α O otherwise , (4)where we set the hyperparameter α = 0 . ; Leaky ReLU is usuallypreferred over the standard ReLU (rectiﬁed linear unit) because ofnon-vanishing gradients (Kolen and Kremer, 2001). We choose theMean Squared Error (MSE) between the network output and theprincipal components of the training data as our loss function tominimise. We note that the preprocessing steps applied to our log-arithmic power spectra correspond to the method ‘PCA+NN’ inSM20.For each source mechanism and each receiver, we train theemulator using 8000 traces; we reserve 1000 seismograms for val-idation purposes and 1000 seismograms for testing purposes. Totrain the neural network, we use the Adam optimiser (Kingma andBa, 2014) with default parameters; moreover, we choose a learningrate of . and a batch size of : the former controls the stepsize of the parameters’ update, while the latter indicates the numberof training points that are fed through the network at each iteration.We additionally set a patience of to early-stop (Yao et al., 2007)based on the validation loss: this means that if the loss calculatedon the validation set has not decreased in the last 50 epochs, we stop training and take the model corresponding to the minimumvalidation loss as the best model. The test set is used to randomlysample events on which we perform our Bayesian analysis. We alsoexplore the behaviour of the posterior distribution as a function ofthe number of training events, number of receivers and noise scale,which we show in Sect. 4.2. However, we do not perform a full gridsearch amongst the hyperparameters (e.g. number of layers, num-ber of neurons, activation function and learning rate), as we observethe results are not signiﬁcantly affected by them; we defer a morecomplete grid search to future work. Once the emulator has been trained, it can be used as the forwardmodel to perform Bayesian inference on the source location of amicroseismic event. We recall here the basic assumptions of ourBayesian analysis.Given a set of parameters θ (the coordinates, in our case), theirposterior distribution given some data D and some hypothesis H can be written using Bayes’ theorem (see e.g. Bishop, 2006): Pr ( θ | D , H ) = Pr ( D | θ , H ) Pr ( θ |H )Pr ( D |H ) , (5)which expresses the posterior distribution Pr ( θ | D , H ) as theproduct of the likelihood Pr ( D | θ , H ) and the parameters’ prior Pr ( θ |H ) , divided by the evidence Pr ( D |H ) . For the purposes ofinference, we will ignore this last term, as it is just a normalisationfactor independent of θ ; however, in Sect. 3.4 we show how theevidence can be used to perform model selection, which is anotheradvantage of working in a Bayesian framework. In order to sam-ple the posterior distribution of the source coordinates, we employ D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson A m p li t u d e [ A U ] ISO noiselessISO with noise A m p li t u d e [ A U ] DC noiselessDC with noise

Time [s] A m p li t u d e [ A U ] CLVD noiselessCLVD with noise

Figure 5.

Comparison of the signal without (blue) and with (coral) noise foreach source mechanism: isotropic (ISO), double couple (DC) and compen-sated linear vector dipole (CLVD). As explained in Sect. 3.3, after train-ing the emulator on the noiseless traces, we add Gaussian noise to theobserved signal to infer its coordinates. In this case, the seismogram cor-responds to event 3 in Table 1 as recorded by a receiver in ( x, y, z ) =(0 .

13 km , .

38 km , .

43 km) , and the signal-to-noise ratio (SNR) is 33dB; we explore higher and lower SNR values in Sect. 4.2 and Fig. 9.The dashed black line in the top panel indicates a cut we perform for theisotropic sources only, based on the left panel of Fig. 4. nested sampling (Skilling, 2006), as implemented in P Y M ULTI -N EST ‡ (Buchner et al., 2014), the Python interface to M ULTI N EST (Feroz et al., 2009). We choose nested sampling over Metropolis-Hastings sampling or other MCMC techniques as it generally con-verges faster (Allison and Dunkley, 2014) and provides an estimateof the evidence.We ﬁrst randomly choose an event’s coordinates from the testset. For this set of coordinates, we simulate the observation of amicroseismic event for each source mechanism, and generate thenoiseless trace as it would be recorded by each of the 23 receivers.We add random Gaussian noise to each component of the noiselesstrace. We deﬁne the signal-to-noise ratio (SNR) as (Li et al., 2018;Zhang et al., 2020):

SNR = 10 log (cid:80) N i (cid:80) N keep j s ij (cid:80) N i (cid:80) N keep j ( s ij − ˜ s ij ) , (6)where s ij refers to the j -th component of the i -th trace and ˜ s ij tothe corresponding noisy trace, N = 8000 is the number of trainingdata and N keep = 2001 (1000 in the ISO case) is the number oftime components. Following Li et al. (2018), we set SNR = 33 dB,which corresponds to a standard deviation of the Gaussian noiseof σ = 10.0, 0.3 and 3.5 for ISO, DC and CLVD, respectively, inthe same arbitrary units as the seismograms’ amplitude. We showexamples of noiseless and noisy signals in Fig. 5. ‡ https://github.com/JohannesBuchner/PyMultiNest We note that the choice of Gaussian noise has a quantiﬁableconsequence on the power spectra of the signals, as it is knownthat additive white noise has an expected constant power in Fourierspace (see e.g. Haykin, 2001; Papoulis et al., 2002). This is re-ﬂected on the right-hand side of Fig. 4, where the mean noisy signalis shifted up due to the noise addition. We argue that in general anyinformation about the noise power (even if more complicated thanGaussian noise) can be easily accounted for when preprocessingthe data, and this makes our proposed approach robust to noise, aswe will show in Sect. 4.2. Moreover, our approach lends itself tothe extension to coloured noise, which is most likely in realisticseismic data (Liu et al., 2017).The noisy seismogram is further preprocessed as described inSect. 3.1: ﬁrst, after retaining N keep components, it is translated tothe Fourier domain, and the power spectrum is calculated. Then,only the frequencies in the chosen windows are considered, andthe decimal logarithm of the power spectrum at each frequency iscomputed - this is what we refer to as the logarithmic power spec-trum. At each likelihood evaluation of P Y M ULTI N EST , the pro-posed coordinates are mapped to the principal components, andthen transformed to the corresponding logarithmic power spectrum.By evaluating the likelihood in multiple points of the prior space,P Y M ULTI N EST can sample from the posterior distribution of thecoordinates, thus yielding the required credibility regions in pa-rameter space. Note that, similarly to D18 and SM20, we assume aGaussian likelihood, without loss of generality as our method canbe easily extended to more complicated likelihood models. It isworth stressing that adding Gaussian noise to the seismograms doesnot necessarily imply that the distribution of the logarithmic powerspectra will also be Gaussian; however, we veriﬁed experimentallythat the distribution of each logarithmic power spectrum compo-nent is unimodal and symmetric, thus supporting our assumption.

As detailed in Sect. 3.3, we ignore the denominator in Eq. 5 wheninferring the coordinates of a microseismic event. However, wecan use the evidence

Pr ( D |H ) to perform model selection, thusshowing another advantage of our proposed Bayesian approach(see e.g. Knuth et al., 2015). The quantity Pr ( D |H ) can be in-terpreted as the likelihood of a given signal under a certain hy-pothesis, with the constraint that the hypotheses form a set of n hyp pairwise disjoint events whose union is the entire possibility space- i.e. Pr ( H i ∩ H j ) = 0 for i (cid:54) = j, ∀ i, j = { , . . . , n hyp } , and (cid:80) n hyp i =1 Pr ( H i ) = 1 .To compare two hypotheses, and thus perform model selec-tion, we deﬁne the Bayes factor BF as: BF = Pr ( D |H i )Pr ( D |H j ) = Pr ( H i | D ) Pr ( H j )Pr ( H j | D ) Pr ( H i ) , (7)where the second equality has been obtained using Bayes’ theo-rem, Pr ( H i ) is the prior distribution of the hypothesis H i , and Pr ( D |H i ) is the posterior distribution of hypothesis H i given D . If the hypotheses are equiprobable a priori , we can write Pr ( H i ) = Pr ( H j ) , which allows us to express the Bayes factoras the ratio of the posterior distribution of one hypothesis over theother. Hence, if the Bayes factor as deﬁned in Eq. 7 is greater than1, we can interpret it as hypothesis H i being more favoured thanhypothesis H j under the observed data D (Knuth et al., 2015).Translating this into practice, after training the emulators,given an observed signal D as described in Sect. 3.3, we can com- ast seismic inference using machine learning pare the three following equiprobable hypotheses: the source mech-anism is isotropic ( H ISO ), the source mechanism is double cou-ple ( H DC ), or the source mechanism is compensated linear vec-tor dipole ( H CLVD ). The advantage of using nested sampling isthat the evidence is calculated while sampling the posterior distri-bution. Consequently, by feeding the observation D to each em-ulator it is straightforward to obtain the evidences Pr ( D |H ISO ) , Pr ( D |H DC ) and Pr ( D |H CLVD ) . By looking at the hypothesisthat maximises the evidence, we can select the model that best de-scribes the given observation, thus identifying the source type for agiven observation. We show the results in Sect. 4.3 and Table 2. We ﬁrst report on the speed performance of our method. We recallhere that if we were to solve the elastic wave equation at each like-lihood evaluation, inference would be severely compromised, as asingle event’s source inversion would take thousands of hours on aHigh Performance Computing (HPC) cluster, if at all possible. Incontrast, our method requires only ∼ simulations to be pro-duced once - an overhead of O (100 h) - and then it allows for thecomplete source inversion of any event in O (1 h) on a commer-cial laptop. More importantly, perhaps, most of this time is spenttraining the emulator, which needs to be done only once as well,provided the density and velocity models remain unchanged § ; afterthe training is complete, performing inference on a given recordedseismogram takes O (0 . on the same commercial laptop. We then turn our attention to the accuracy of the inferred posteriordistribution of the coordinates. For each source mechanism, we re-port the inference results for 3 different coordinates in Figs. 6, 7and 8: each shows the posterior contour plots obtained with ourmethodology, considering all 23 receivers, SNR=33 dB and using8000 seismograms as the training set for the emulator. The numer-ical results are summarised in Table 1, reporting the prior rangesand marginalised mean and 68 percent credibility interval on thecoordinates. We note that with our method we can accurately re-trieve the correct value of the coordinates across the prior parame-ter space, as our results always match the events’ coordinates withinerror bars. Additionally, we note that the x and y coordinates areusually less constrained than the z coordinate for ISO and CLVD,while this behaviour is less prominent in the DC case, for whichall coordinates are always tightly constrained. We attribute this ef-fect to two possible causes. On one hand, it can be related to thespeciﬁc density model we are considering in this work, which hasa layered structure. On the other hand, we note that when translat-ing the seismic traces to the Fourier domain we ignored the phasesignal (since we only considered the power spectra), thus possiblylosing useful information for the localisation purpose (Ferreira andWoodhouse, 2007). We additionally note that retaining the phaseinformation would yield a generative model, as by combining thepower spectra with the phase one could in principle reconstruct a § In general, the stability of a given velocity model is not well known - seee.g. Thornton (2013); Usher et al. (2013); Gesret et al. (2013, 2014); Daset al. (2018) and references therein for a discussion on the uncertainties ofvelocity models, and their consequences on location errors.

ISO . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]DC . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]CLVD . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km] Figure 6.

Marginalised 68 and 95 per cent credibility contoursobtained with our method for a source located at ( x, y, z ) =(0 .

71 km , .

25 km , .

10 km) , indicated by the black dashed lines. Wecompare 3 source mechanisms: isotropic (ISO), double couple (DC), andcompensated linear vector dipole (CLVD). The event corresponds to event1 in Table 1. Note that we are considering 23 receivers, the signal-to-noiseratio is 33 dB and the emulator was trained on 8000 training points.

D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson

Event Coordinate Prior range [km] Ground truth [km] ISO [km] DC [km] CLVD [km]1 x [0 ,

1] 0 .

71 0 . +0 . − . . +0 . − . . +0 . − . y [0 ,

1] 0 .

25 0 . +0 . − . . +0 . − . . +0 . − . z [0 , .

43] 2 .

10 2 . +0 . − . . +0 . − . . +0 . − . x [0 ,

1] 0 .

46 0 . +0 . − . . +0 . − . . +0 . − . y [0 ,

1] 0 .

34 0 . +0 . − . . +0 . − . . +0 . − . z [0 , .

43] 1 .

48 1 . +0 . − . . +0 . − . . +0 . − . x [0 ,

1] 0 .

20 0 . +0 . − . . +0 . − . . +0 . − . y [0 ,

1] 0 .

43 0 . +0 . − . . +0 . − . . +0 . − . z [0 , .

43] 0 .

99 0 . +0 . − . . +0 . − . . +0 . − . Table 1.

Prior range and marginalised mean and 68 percent credibility intervals on the coordinates ( x, y, z ) , for each source mechanism - isotropic (ISO),double couple (DC) and compensated linear vector dipole (CLVD). The three events are randomly sampled from the test set. These results are obtained byconsidering all 23 receivers, and training on 8000 simulated events. The noise level is set to 10.0, 0.3 and 3.5 respectively, which corresponds to a signal-to-noise ratio of dB. Event type ln Pr ( D |H ISO ) ln Pr ( D |H DC ) ln Pr ( D |H CLVD ) ISO − − − − − − − − Natural logarithm of the evidence for a source located at ( x, y, z ) = (0 .

71 km , .

25 km , .

10 km) and source mechanismisotropic (ISO), double couple (DC) and compensated linear vector dipole(CLVD), as described in Sect. 3.4 and Sect. 4.3. The hypotheses correspondto an ISO ( H ISO ), DC ( H DC ) and CLVD ( H CLVD ) source mechanism, re-spectively. We highlighted in bold the highest evidence in each line, whichcorrectly corresponds to the known source mechanism. Note that the nat-ural logarithm of the evidence is returned by P Y M ULTI N EST , and in thisinstance we ignored its associated error (which is very small). full seismogram from the coordinates (after training the emulator).We defer the study of phase information to future work, as we an-ticipate that given the oscillatory behaviour of the phase signals itwill be harder to train an emulator on them.We then study the dependence of the posterior contours on theSNR, the number of training data and the number of receivers used.In Fig. 9, we ﬁrst show the effect of different noise levels; in partic-ular, for each source mechanism we vary the SNR from 13 dB to 54dB. We note that, in order to increase the robustness to noise, we cutdifferent frequency windows based on the noise levels: given theGaussian noise model we assume in this work, a smaller SNR cor-responds to a higher noise power, and hence to a smaller number ofretained power spectrum components. We observe that while highernoise levels can lead to small biases in some of the 1D marginalisedcoordinates’ distributions, in general no signiﬁcant variations in theshape of the 2-D posterior contours are present.In Fig. 10 we show the posterior contours when training theemulator with 2000, 5000 and 8000 data points. Again, we observeno signiﬁcant differences for the ISO and CLVD sources, whilevery small deviations appear when using fewer training data in theDC case. In general, O (10 ) training data are enough to obtainaccurate posterior contours for all source mechanisms. Finally, inFig. 11 we vary the geometry of the receivers used for recording themicroseismic traces. While we are not interested in a full study ofthe optimal geometry of the receivers, we observe that using fewerreceivers leads to broader posterior contours, while still allowingfor the accurate localisation of the event for all source mechanisms.In summary, our results are very robust to the noise injected into the observed seismogram. Additionally, very few receivers - O (10) - are needed to obtain accurate results, and a number oftraining points of order O (10 ) is sufﬁcient to localise any event. In general, a real event can be described as a linear combination ofthe three source mechanisms described in Sect. 2 (Vavryˇcuk, 2015).While we considered the three sources separately in this work, weshow how the proposed Bayesian approach additionally allows forthe identiﬁcation of a source type given an observed seismogram.We consider event 1 as reported in Table 1, and produce an obser-vation for each source mechanism (ISO, DC and CLVD). As de-scribed in Sect. 3.4, we can run nested sampling for each event andfor every trained emulator, and obtain 9 evidence values in total,whose logarithm we report in Table 2. As expected, the evidenceis maximal in correspondence of the source type that generatedthe given event, which indicates that we are capable of correctlyidentifying the source type for a given observation. What is more,this selection is also very fast, as after training the emulators eachevidence calculation takes less than 10 minutes on a commerciallaptop.

In this paper, we proposed a method that allows for the fast andaccurate retrieval of the coordinates for any microseismic sourcemechanism: isotropic (ISO), double couple (DC), and compensatedlinear vector dipole (CLVD). This offers an efﬁcient technique toboth localise an event and identify its source type, exploiting thepower of machine learning and Bayesian tools to extract the infor-mation contained in the seismic waveforms.Our proposed method is based on a physically motivated pre-processing of the raw signals, using Fourier analysis and principalcomponent compression, followed by the use of a neural network tolearn the mapping between coordinates and principal components.Using the learnt forward model in combination with Bayesian tech-niques, we showed that we can retrieve an accurate estimate of anymicroseismic event coordinates, for any source mechanism, in lessthan hour on a commercial laptop. Therefore, we demonstratedfor the ﬁrst time that machine learning techniques allow for a fastand accurate Bayesian analysis on microseismic traces, yielding ast seismic inference using machine learning ISO . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]DC . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]CLVD . . . y [ k m ] . . . x [km] z [ k m ] . . . y [km] z [km] Figure 7.

Same as Fig. 6 for a source located at ( x, y, z ) =(0 .

46 km , .

34 km , .

48 km) . The event corresponds to event 2 in Ta-ble 1.

ISO . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]DC . . . y [ k m ] . . . x [km] . . z [ k m ] . . . y [km] . . z [km]CLVD . . . y [ k m ] . . . x [km] . . . z [ k m ] . . . y [km] . . . z [km] Figure 8.

Same as Fig. 6 for a source located at ( x, y, z ) =(0 .

20 km , .

43 km , .

99 km) . The event corresponds to event 3 in Ta-ble 1. D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson

ISO DC CLVD

Figure 9.

Marginalised 68 per cent credibility contours obtained with our method for a source located at ( x, y, z ) = (0 .

71 km , .

25 km , .

10 km) , indicatedby the black dashed lines, comparing different levels of noise. In particular, in each panel we show signal-to-noise ratios of 13 dB, 33 dB and 54 dB for the 3source mechanisms: isotropic (ISO), double couple (DC), and compensated linear vector dipole (CLVD). Note that we are considering 8000 training data forthe emulator and 23 receivers.

ISO DC CLVD

Figure 10.

Marginalised 68 per cent credibility contours obtained with our method for a source located at ( x, y, z ) = (0 .

20 km , .

43 km , .

99 km) ,indicated by the black dashed lines, comparing different numbers of training data for the emulator (2000, 5000 and 8000). We show these results for 3source mechanisms: isotropic (ISO), double couple (DC), and compensated linear vector dipole (CLVD). Note that we are considering 23 receivers and asignal-to-noise ratio of 33 dB.

ISO DC CLVD

Figure 11.

Marginalised 68 per cent credibility contours obtained with our method for a source located at ( x, y, z ) = (0 .

46 km , .

34 km , .

48 km) , indicatedby the black dashed lines, comparing different dispositions of the receivers. In particular, 5 receivers refer to the blue squares in Fig. 2, and 9 receivers referto the green circles in Fig. 2. We show these results for 3 source mechanisms: isotropic (ISO), double couple (DC), and compensated linear vector dipole(CLVD). Note that we are considering 8000 training data for the emulator and a signal-to-noise ratio of 33 dB. ast seismic inference using machine learning competitive results on ISO sources and state-of-the-art results onDC and CLVD sources.We showed that O (10 ) events for each source mechanism areenough to train a representative emulator, when using the data com-ing from O (10) receivers placed at the seabed as indicated in Fig. 2.We also explored the effect of the noise level, and how the numberof receivers and the number of training data for the emulator impactthe accuracy of the coordinates’ posterior distribution, demonstrat-ing the robustness of our approach. Finally, we demonstrated theutility of our Bayesian approach by calculating the Bayesian ev-idence for a given observation and three hypotheses, and showedthat this correctly identiﬁes the source type of any given event.In conclusion, our work lays the foundations for the fast andreliable localisation of any microseismic event, given a minimalamount of computing resources. Some straightforward extensionsof our method include the following three points. First, we note thatin order for this method to be deployed in a realistic scenario, thenoise associated with the recorded sesimograms should be mod-elled more carefully: a lower SNR may have to be considered,a more complicated likelihood distribution might have to be im-plemented, or a “likelihood-free” approach should be investigated(Sunn˚aker et al., 2013). Second, the errors in the 3-D density andvelocity models should be incorporated into the analysis, in orderto account for all sources of uncertainty (Gesret et al., 2013, 2014).Last, a real microseismic event is in general described by a linearmixture of components of the moment tensor, which we consideredseparately in this work. This increases the total number of parame-ters to infer, which we anticipate will require a larger dataset to trainthe emulator. However, we note that our proposed method scaleswell with the number of training data, and therefore we speculatethat performing a Bayesian analysis with a larger parameter spaceis an attainable goal using our approach. This will be addressed infuture work, while the code to reproduce this work will be madepublicly available in the GitHub repository at this link ¶ upon ac-ceptance of the paper. ACKNOWLEDGMENTS

We thank Saptarshi Das and Ana Ferreira for useful discussions.DP is supported by the STFC UCL Centre for Doctoral Trainingin Data Intensive Science. Generation of the synthetic data used inthis work has been performed in part on the Wilkes High Perfor-mance GPU computer cluster at the University of Cambridge, andin part on the Beaker cluster at UCL. This work has been partiallyenabled by funding from Royal Dutch Shell plc and the UCL Cos-moparticle Initiative. We acknowledge the use of N UM P Y (Harriset al., 2020), M ATPLOTLIB (Hunter, 2007), T

ENSOR F LOW (Abadiet al., 2015), S CI P Y (Virtanen et al., 2020), and C HAIN C ONSUMER (Hinton, 2016).

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro,C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S.,Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefow-icz, R., Kaiser, L., Kudlur, M., Levenberg, J., Man´e, D., Monga,R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., ¶ https://github.com/alessiospuriomancini/seismoML Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V.,Vasudevan, V., Vi´egas, F., Vinyals, O., Warden, P., Wattenberg,M., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow:Large-scale machine learning on heterogeneous systems. Soft-ware available from tensorﬂow.org.Allison, R. and Dunkley, J. (2014). Comparison of sampling tech-niques for Bayesian parameter estimation.

Mon. Not. Roy. As-tron. Soc. , 437(4):3918–3928.Alsing, J., Wandelt, B., and Feeney, S. (2018). Massive optimaldata compression and density estimation for scalable, likelihood-free inference in cosmology.

Mon. Not. Roy. Astron. Soc. ,477(3):2874–2885.Angus, D., Aljaafari, A., Usher, P., and Verdon, J. (2014). Seismicwaveforms and velocity model heterogeneity: Towards a full-waveform microseismic location algorithm.

Journal of AppliedGeophysics , 111:228 – 233.Auld, T., Bridges, M., Hobson, M., and Gull, S. (2007). Fast cos-mological parameter estimation using neural networks.

Mon.Not. Roy. Astron. Soc. , 376:L11–L15.Auld, T., Bridges, M., and Hobson, M. P. (2008). cosmonet: fastcosmological parameter estimation in non-ﬂat models using neu-ral networks.

Monthly Notices of the Royal Astronomical Soci-ety , 387(4):1575–1582.Bishop, C. M. (2006).

Pattern Recognition and Machine Learning(Information Science and Statistics) . Springer-Verlag, Berlin,Heidelberg.Brueckl, E., Binder, D., Hausmann, H., and Mertl, S. (2008). Haz-ard Estimation of Deep Seated Mass Movements by Microseis-mic Monitoring. In

International Strategy for Disaster Reduc-tion .Buchner, J., Georgakakis, A., Nandra, K., Hsu, L., Rangel, C.,Brightman, M., Merloni, A., Salvato, M., Donley, J., and Ko-cevski, D. (2014). X-ray spectral modelling of the AGN obscur-ing region in the CDFS: Bayesian model selection and catalogue.AAP, 564:A125.Chenthamarakshan, V., Das, P., Hoffman, S. C., Strobelt, H.,Padhi, I., Lim, K. W., Hoover, B., Manica, M., Born, J., Laino,T., and Mojsilovic, A. (2020). CogMol: Target-Speciﬁc andSelective Drug Design for COVID-19 Using Deep GenerativeModels. arXiv e-prints , page arXiv:2004.01215.Collettini, C. and Barchi, M. R. (2002). A low-angle normal faultin the Umbria region (Central Italy): a mechanical model for therelated microseismicity.

Tectonophysics , 359(1):97 – 115.Conrad, P. R., Marzouk, Y. M., Pillai, N. S., and Smith, A. (2016).Accelerating Asymptotically Exact MCMC for ComputationallyIntensive Models via Local Approximations.

Journal of theAmerican Statistical Association , 111(516):1591–1607.Cooley, J. W. and Tukey, J. W. (1965). An Algorithm for theMachine Calculation of Complex Fourier Series.

Math. Comput. ,19:297–301.Craiu, R. V. and Rosenthal, J. S. (2014). Bayesian Computationvia Markov Chain Monte Carlo.

Annual Review of Statistics andIts Application , 1(1):179–201.Das, S., Chen, X., and Hobson, M. P. (2017). Fast GPU-BasedSeismogram Simulation From Microseismic Events in MarineEnvironments Using Heterogeneous Velocity Models.

IEEETransactions on Computational Imaging , 3(2):316–329.Das, S., Chen, X., Hobson, M. P., Phadke, S., van Beest, B.,Goudswaard, J., and Hohl, D. (2018). Surrogate regression mod-elling for fast seismogram generation and detection of micro-seismic events in heterogeneous velocity models.

GeophysicalJournal International , 215(2):1257–1290. D. Piras, A. Spurio Mancini, B. Joachimi, M. Hobson

Ellsworth, W. L. (2013). Injection-induced earthquakes.

Science ,341(6142).Feroz, F., Hobson, M., and Bridges, M. (2009). MultiNest: anefﬁcient and robust Bayesian inference tool for cosmology andparticle physics.

Mon. Not. Roy. Astron. Soc. , 398:1601–1614.Ferreira, A. M. G. and Woodhouse, J. H. (2007). Source, path andreceiver effects on seismic surface waves.

Geophysical JournalInternational , 168(1):109–132.Fertitta, G., Stefano, A. D., Fiscelli, G., and Giaconia, G. C.(2010). A low power and high resolution data logger for sub-marine seismic monitoring.

Microprocessors and Microsystems ,34(2):63 – 72.Geiger, L. (1910). Herdbestimmung bei Erdbeben aus den Ankun-ftszeiten, Nachrichten der K. Gesellschaft der Wissenschaften zuGottingen.

Math.-Phys. Klasse , pages 331–349.Gesret, A., Desassis, N., Noble, M., Romary, T., and Maisons,C. (2014). Propagation of the velocity model uncertainties tothe seismic event location.

Geophysical Journal International ,200(1):52–66.Gesret, A., Noble, M., Desassis, N., and Romary, T. (2013). Mi-croseismic Monitoring - Consequences of Velocity Model Un-certainties on Event Location Uncertainties.

Proceedings of theThird Passive Seismic Workshop, Eur. Ass. of Geoscientists andEngineers, Athens - Greece (2011) .Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014).Generative Adversarial Nets. In Ghahramani, Z., Welling, M.,Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors,

Advances in Neural Information Processing Systems 27 , pages2672–2680. Curran Associates, Inc.Gulrajani, I., Kumar, K., Ahmed, F., Taiga, A. A., Visin, F.,Vazquez, D., and Courville, A. (2016). PixelVAE: A La-tent Variable Model for Natural Images. arXiv e-prints , pagearXiv:1611.05013.Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R.,Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S.,Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H.,Brett, M., Haldane, A., del R’ıo, J. F., Wiebe, M., Peterson, P.,G’erard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W.,Abbasi, H., Gohlke, C., and Oliphant, T. E. (2020). Array pro-gramming with NumPy.

Nature , 585(7825):357–362.Haykin, S. (2001).

Communication Systems . Wiley.Hinton, S. R. (2016). ChainConsumer.

The Journal of OpenSource Software , 1:00045.Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayerfeedforward networks are universal approximators.

Neural Net-works , 2(5):359 – 366.Hunter, J. D. (2007). Matplotlib: A 2d graphics environment.

Computing in Science & Engineering , 9(3):90–95.Kasim, M. F., Watson-Parris, D., Deaconu, L., Oliver, S., Hatﬁeld,P., Froula, D. H., Gregori, G., Jarvis, M., Khatiwala, S., Kore-naga, J., Topp-Mugglestone, J., Viezzer, E., and Vinko, S. M.(2020). Building high accuracy emulators for scientiﬁc simula-tions with deep neural architecture search.Kingma, D. P. and Ba, J. (2014). Adam: A Method for StochasticOptimization.Knopoff, L. and Randall, M. J. (1970). The compensated linear-vector dipole: A possible mechanism for deep earthquakes.

Jour-nal of Geophysical Research (1896-1977) , 75(26):4957–4963.Knuth, K. H., Habeck, M., Malakar, N. K., Mubeen, A. M., andPlacek, B. (2015). Bayesian evidence and model selection.

Dig-ital Signal Processing , 47:50 – 67. Special Issue in Honour of William J. (Bill) Fitzgerald.Kolen, J. F. and Kremer, S. C. (2001).

Gradient Flow in RecurrentNets: The Difﬁculty of Learning LongTerm Dependencies , pages237–243. Wiley-IEEE Press.Li, H., Wang, R., and Cao, S. (2015). Microseismic forward mod-eling based on different focal mechanisms used by the seismicmoment tensor and elastic wave equation.

Journal of Geophysicsand Engineering , 12(2):155–166.Li, J., Ji, S., Li, Y., Qian, Z., and Lu, W. (2018). Downholemicroseismic signal-to-noise ratio enhancement via strip match-ing shearlet transform.

Journal of Geophysics and Engineering ,15(2):330–337.Liu, E., Zhu, L., Govinda Raj, A., McClellan, J. H., Al-Shuhail,A., Kaka, S. I., and Iqbal, N. (2017). Microseismic events en-hancement and detection in sensor arrays using autocorrelation-based ﬁltering.

Geophysical Prospecting , 65(6):1496–1509.Liu, H., Ong, Y.-S., Shen, X., and Cai, J. (2018). When GaussianProcess Meets Big Data: A Review of Scalable GPs. arXiv e-prints , page arXiv:1807.01065.Lomax, A., Virieux, J., Volant, P., and Berge-Thierry, C. (2000).

Probabilistic Earthquake Location in 3D and Layered Models ,pages 101–134. Springer Netherlands, Dordrecht.Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013). Rectiﬁer non-linearities improve neural network acoustic models. In in ICMLWorkshop on Deep Learning for Audio, Speech and LanguageProcessing .Majer, E. L., Baria, R., Stark, M., Oates, S., Bommer, J., Smith,B., and Asanuma, H. (2007). Induced seismicity associated withEnhanced Geothermal Systems.

Geothermics , 36(3):185 – 222.Mukuhira, Y., Asanuma, H., Ito, T., and H¨aring, M. O. (2016).Physics-based seismic evaluation method: Evaluating possibleseismic moment based on microseismic information due to ﬂuidstimulation.

GEOPHYSICS , 81(6):KS195–KS205.Noack, M. M. and Clark, S. (2017). Acoustic wave and eikonalequations in a transformed metric space for various types ofanisotropy.

Heliyon , 3(3):e00260.Panahi, S. S., Ventosa, S., Cadena, J., Manuel, A., Bermudez, T.,Sallares, V., Piera, J., and Del Rio, J. (2005). A Low PowerDatalogger based on Compactﬂash memory for Ocean BottomSeismometers (obs). In , volume 2, pages1278–1281.Papoulis, A., Pillai, S., and Pillai, S. (2002).

Probability, RandomVariables, and Stochastic Processes . McGraw-Hill electrical andelectronic engineering series. McGraw-Hill.Pugh, D. J., White, R. S., and Christie, P. A. F. (2016). A Bayesianmethod for microseismic source inversion.

Geophysical JournalInternational , 206(2):1009–1038.Rajaratnam, B. and Sparks, D. (2015). MCMC-Based Inferencein the Era of Big Data: A Fundamental Analysis of the Conver-gence Complexity of High-Dimensional Chains. arXiv e-prints ,page arXiv:1508.00947.Rasmussen, C. E. and Williams, C. K. I. (2005).

Gaussian Pro-cesses for Machine Learning (Adaptive Computation and Ma-chine Learning) . The MIT Press.Shapiro, S. A., Dinske, C., Langenbruch, C., and Wenzel, F.(2010). Seismogenic index and magnitude probability of earth-quakes induced during reservoir ﬂuid stimulations.

The LeadingEdge , 29(3):S. 304–309.Skilling, J. (2006). Nested sampling for general Bayesian compu-tation.

Bayesian Analysis , 1(4):833–859.Smith, J. D., Azizzadenesheli, K., and Ross, Z. E. (2020). ast seismic inference using machine learning EikoNet: Solving the Eikonal equation with Deep Neural Net-works. arXiv e-prints , page arXiv:2004.00361.Spurio Mancini, A., Piras, D., Hobson, M. P., and Joachimi, B.(2020). Deep generative models for accelerated Bayesian pos-terior inference of microseismic events. arXiv e-prints , pagearXiv:2009.06758.St¨ahler, S. C. and Sigloch, K. (2014). Fully probabilistic seismicsource inversion – part 1: Efﬁcient parameterisation.

Solid Earth ,5(2):1055–1069.St¨ahler, S. C. and Sigloch, K. (2016). Fully probabilistic seismicsource inversion – part 2: Modelling errors and station covari-ances.

Solid Earth , 7(6):1521–1536.Sunn˚aker, M., Busetto, A. G., Numminen, E., Corander, J., Foll,M., and Dessimoz, C. (2013). Approximate Bayesian Computa-tion.

PLOS Computational Biology , 9(1):1–10.Tarantola, A. (2005).

Inverse Problem Theory and Methods forModel Parameter Estimation . Society for Industrial and AppliedMathematics.Thornton, M. (2013). Velocity uncertainties in surface and down-hole monitoring.

Conference Proceedings, 4th EAGE PassiveSeismic Workshop .Treeby, B. E., Jaros, J., Rohrbach, D., and Cox, B. T. (2014).Modelling elastic wave propagation using the k-Wave MATLABToolbox. In ,pages 146–149.Usher, P., Angus, D., and Verdon, J. (2013). Inﬂuence of a velocitymodel and source frequency on microseismic waveforms: someimplications for microseismic locations.

Geophysical Prospect-ing , 61(s1):334–345.Vasco, D. W., Nakagawa, S., Petrov, P., and Newman, G. (2019).Rapid estimation of earthquake locations using waveform trav-eltimes.

Geophysical Journal International , 217(3):1727–1741.Vavryˇcuk, V. (2015). Moment tensor decompositions revisited.

Journal of Seismology , 19(1):231–252.Vavryˇcuk, V. (2001). Inversion for parameters of tensileearthquakes.

Journal of Geophysical Research: Solid Earth ,106(B8):16339–16355.Vavryˇcuk, V. (2005). Focal mechanisms in anisotropic media.

Geophysical Journal International , 161(2):334–346.Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy,T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W.,Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman,K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson,E., Carey, C. J., Polat, ˙I., Feng, Y., Moore, E. W., VanderPlas,J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quin-tero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pe-dregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors (2020).SciPy 1.0: Fundamental Algorithms for Scientiﬁc Computing inPython.

Nature Methods , 17:261–272.Wuestefeld, A., Greve, S. M., N¨asholm, S. P., and Oye, V. (2018).Benchmarking earthquake location algorithms: A synthetic com-parison.

Geophysics , 83(4):KS35–KS47.Yao, Y., Rosasco, L., and Caponnetto, A. (2007). On Early Stop-ping in Gradient Descent Learning.

Constructive Approximation ,26:289–315.Zhang, J., Dong, L., and Xu, N. (2020). Noise Suppression ofMicroseismic Signals via Adaptive Variational Mode Decom-position and Akaike Information Criterion.

Applied Sciences ,10(11):3790.Zhu, X., Vondrick, C., Fowlkes, C., and Ramanan, D. (2015).Do We Need More Training Data? arXiv e-printsarXiv e-prints