A Novel Approach for Earthquake Early Warning System Design using Deep Learning Techniques

Abstract

Earthquake signals are non-stationary, so it is difficult to identify and classify events in real time with classical approaches such as peak ground displacement and peak ground velocity. Even the popular STA/LTA algorithm requires extensive research to determine the basic thresholding parameters that trigger an alarm. In addition, human error or unavoidable natural factors such as thunder strikes or landslides can cause the algorithm to raise a false alarm. This work focuses on detecting earthquakes by converting seismograph-recorded data into corresponding audio signals for better perception, and then uses the popular speech recognition techniques of filter bank coefficients and Mel Frequency Cepstral Coefficients (MFCC) to extract features. These features were then used to train a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) network. The proposed method can overcome the above-mentioned problems and help in detecting earthquakes automatically from the waveforms without much human intervention. For the 1000 Hz audio data set, the CNN model showed a testing accuracy of 91.1% for a 0.2-second sample window length, while the LSTM model showed 93.99% for the same. A total of 610 sounds, consisting of 310 earthquake sounds and 300 non-earthquake sounds, were used to train the models. While testing, the total time required for generating the alarm was 1.68 seconds, which included the individual times for data collection, processing, and prediction. Taking into consideration the processing and prediction delays, the total time is thus considered to be approximately 2 seconds. This shows the effectiveness of the proposed method for Earthquake Early Warning (EEW) applications. Since the input of the method is only the waveform, it is suitable for real-time processing; the models can therefore also be used as an onsite EEW system requiring a minimum amount of preparation time and workload.


Tonumoy Mukherjee

Advanced Technology Development Center, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

[email protected]

Chandrani Singh

Geology & Geophysics Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

[email protected]

Prabir Kumar Biswas

Electronics & Electrical Communication Department, Indian Institute of Technology Kharagpur, Kharagpur 721302, India

[email protected]

PREPRINT - January 19, 2021

Earthquakes have been an integral part of planet Earth since time immemorial. Any movement in the tectonic plates of the Earth's crust releases massive amounts of energy, which passes through the Earth's surface as seismic waves and results in mild to tremendous shaking. Although this whole process may seem long, it happens within a few minutes, which leaves very little time to respond. Various scales of 1-10 have been adopted as the magnitude indicator of earthquakes. For addressing a wider range of earthquake sizes, the moment magnitude scale, abbreviated $M_W$, is preferred and is applicable globally [1]. Earthquake magnitude scales are logarithmic. To put it simply, for each whole number that we go up on a magnitude scale, the amplitude of the ground motion recorded by a seismograph goes up by a factor of 10 [2]. Thus, using this scale as a reference, the level of ground shaking caused by a magnitude 2 earthquake would be ten times that of a magnitude 1 earthquake (and 32 times as much energy would be released). To put that into context, if a magnitude 1 earthquake releases as much energy as blowing up 6 ounces of TNT, a magnitude 8 earthquake would release as much energy as detonating 6 million tons of TNT.

Major earthquakes can cause significant damage to life as well as property. Preventing an earthquake from occurring is not possible, so we need to focus on how to mitigate the devastation caused by these events. For that, we need robust and reliable prediction methods. There have been a few successful predictions. The Haicheng earthquake (1975) is a prime example of a successful prediction. Precursors for this event included a foreshock sequence, peculiar animal behavior, and anomalies such as geodetic deformation and groundwater level differences [3]. But these abnormalities were not present in other major earthquakes; hence there is no generic precursor for earthquakes. This is where modern approaches of machine learning and deep learning come into play. Although research in seismology using machine learning and deep learning is quite limited, the huge amount of data accessible to researchers has enabled good work such as foreshock identification in real time using deep learning [4], earthquake detection and location using a convolutional neural network [5], and machine learning seismic wave discrimination [6]. Having an idea about the occurrence of an earthquake as early as possible will thus make it easier to stay alert and respond to such events.

Unlike conventional methods of detecting an earthquake, such as the STA/LTA (Short-Term Average/Long-Term Average) algorithm shown in Figure: 1 [7], this research explores another possible way of detecting and classifying earthquakes, using their sounds. Sound is an effect produced by anything physical, including earthquakes [8], and is one of the most common effects reported during or immediately after the felt tremors caused by them [9]. An earthquake of magnitude 5 will have a sound or vibration different from that of an earthquake of magnitude 4 [10]. Also, earthquakes of similar magnitude should have similar vibrations or sounds that are indistinguishable to us but can be picked up very efficiently by neural networks.

In this paper, two types of machine learning models were trained on data consisting of earthquake and non-earthquake sounds: a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) model. The former showed a testing accuracy of 91.102% for a 0.2-second audio data sample and the latter showed a testing accuracy of 93.999% for the same.

Figure 1: A 6.8 magnitude earthquake acceleration signal. The red line marked by the letter 'P' represents the first arrival of the P-wave detected by the STA/LTA algorithm, and the S-wave arrival is marked by the letter 'S'.
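For readers unfamiliar with the STA/LTA trigger referred to above, the following is a minimal sketch of the classic energy-ratio form of the algorithm. It is not the implementation used in this work: the window lengths, the threshold of 4.0, and the synthetic trace are illustrative assumptions, and the function names are hypothetical.

```python
def sta_lta_ratio(signal, n_sta, n_lta):
    """Return the STA/LTA energy ratio per sample (0.0 until the long window is full)."""
    energy = [x * x for x in signal]
    # Prefix sums make every window average an O(1) lookup.
    prefix = [0.0]
    for e in energy:
        prefix.append(prefix[-1] + e)
    ratios = [0.0] * len(signal)
    for i in range(n_lta, len(signal) + 1):
        sta = (prefix[i] - prefix[i - n_sta]) / n_sta  # short-term average
        lta = (prefix[i] - prefix[i - n_lta]) / n_lta  # long-term average
        ratios[i - 1] = sta / lta if lta > 0 else 0.0
    return ratios


def first_trigger(ratios, threshold):
    """Index of the first sample whose ratio reaches the threshold, or None."""
    for i, r in enumerate(ratios):
        if r >= threshold:
            return i
    return None


# Synthetic trace (assumption): low-amplitude noise, then a strong arrival at sample 300.
trace = [0.01 * ((-1) ** k) for k in range(300)] + [1.0 * ((-1) ** k) for k in range(100)]
ratios = sta_lta_ratio(trace, n_sta=10, n_lta=100)
onset = first_trigger(ratios, threshold=4.0)
```

The need to pick `n_sta`, `n_lta`, and `threshold` by hand is exactly the tuning burden the text attributes to STA/LTA.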

With the rapid increase in the quantity of seismic data, modern seismology faces major challenges in data analysis and processing techniques. Many of the popular techniques used in major data centers date back to a time when the amount of seismic data was small and computational power was limited.

Today, with the advancements in machine learning and deep learning, scientists and researchers can extract useful information from voluminous data much more easily, as these fields provide a large collection of tools to work with. Once trained with sufficient data, deep learning models, just like humans, can acquire their knowledge by extracting features from raw data [11] to recognize natural objects and make expert-level decisions in various disciplines. Besides, the high computational costs for training such networks are balanced by their low-cost online operation [5]. Advantages like these make deep learning suitable for applications in real-time seismology and earthquake early warning (EEW).

This paper looks into two commonly used and widely known deep learning models, CNN and LSTM, and the accuracy achieved by them in detecting and classifying earthquakes from their sounds. The architecture of both networks is described in detail in the following sections.

Figure 2: CNN Block Diagram with First Convolutional Layer Filters. (a) Block diagram of the CNN model used: input signal features, Conv Layer 1 (16 filters, 3x3), Conv Layer 2 (32 filters, 3x3), Conv Layer 3 (64 filters, 3x3), Conv Layer 4 (128 filters, 3x3), maxpool layer, dense layers of 128, 64, and 2 neurons, and the Class 1 and Class 2 probabilities. (b) 16 filters of the first convolutional layer arranged in a 4x4 matrix.

The proposed architecture for the CNN model has 4 convolutional layers, 1 maxpool layer, and 3 dense layers. Figure: 2b shows the first of the 4 convolutional layers, consisting of 16 filters built with 3x3 convolution, 'Relu' as the activation function, and a 1x1 stride. All the parameters for the second, third, and fourth layers remain the same, except that the number of filters in each layer is twice the number in the previous layer: 16 filters in the first layer, 32 in the second, 64 in the third, and 128 in the final layer (shown in Figure: 2a). The idea behind increasing the number of filters in each layer is to be more specific about the features as the data convolves through each layer. A kernel of 2x2 has been used for the maxpool layer. The three dense layers after the maxpool layer consist of 128, 64, and 2 neurons so as to pool down the features for the final 2-class classification. The first two dense layers use 'Relu' as their activation function, whereas the last dense layer uses 'Softmax', since categorical cross-entropy is used as the loss for classification. 'adam' is used as the optimizer.
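To make the layer stack above concrete, the sketch below walks the described architecture and tallies output shapes and trainable parameters in plain Python. Several details are assumptions rather than statements from the paper: a single-channel 9 x 13 input (the feature vector dimensions listed in the data tables), 'same' padding in the convolutions, and a 2x2 maxpool with stride 2 that floors odd dimensions. The helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def cnn_summary(height=9, width=13, channels=1):
    """List (name, output_shape, parameter_count) for each layer."""
    layers = []
    in_ch = channels
    for filters in (16, 32, 64, 128):
        # 'same' padding (assumption) keeps height and width unchanged.
        layers.append((f"conv3x3-{filters}", (height, width, filters),
                       conv_params(3, in_ch, filters)))
        in_ch = filters
    # 2x2 maxpool with stride 2 (assumption) halves each spatial dimension.
    height, width = height // 2, width // 2
    layers.append(("maxpool2x2", (height, width, in_ch), 0))
    n_in = height * width * in_ch  # flattened feature vector
    for units in (128, 64, 2):
        layers.append((f"dense-{units}", (units,), dense_params(n_in, units)))
        n_in = units
    return layers


summary = cnn_summary()
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:<14} {str(shape):<16} {params}")
```

Under these assumptions the first dense layer dominates the parameter count, which is typical when a flattened feature map feeds a wide dense layer.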

The proposed architecture for the LSTM model (shown in Figure: 3) has 2 LSTM layers consisting of 128 neurons each. Four time-distributed fully connected layers of 64, 32, 16, and 8 neurons respectively are added after the 2 LSTM layers, with 'relu' as their activation function. Lastly, a dense layer consisting of 2 neurons is added for the final 2-class classification, with 'softmax' as its activation function and 'adam' as the optimizer.

Figure 3: Block Diagram of the LSTM Model used (input signal features, LSTM 1, LSTM 2, four time-distributed layers, Dense Layer 1, and the Class 1 and Class 2 probabilities).
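A rough parameter count for the LSTM stack described above can be sketched the same way. The count uses the standard LSTM formulation with four gates, each having input weights, recurrent weights, and a bias. The input layout (9 time steps of 13 features) and the flattening of the last time-distributed output before the final dense layer are assumptions made for illustration; the paper does not state them.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def lstm_summary(time_steps=9, features=13):
    """List (name, parameter_count) for each layer of the described stack."""
    layers = [("lstm-128 (1)", lstm_params(features, 128)),
              ("lstm-128 (2)", lstm_params(128, 128))]
    n_in = 128
    for units in (64, 32, 16, 8):
        # A time-distributed dense layer shares one set of weights across time steps.
        layers.append((f"time-distributed-dense-{units}", dense_params(n_in, units)))
        n_in = units
    # Assumption: the (time_steps x 8) output is flattened before the 2-neuron layer.
    layers.append(("dense-2", dense_params(time_steps * n_in, 2)))
    return layers


lstm_layers = lstm_summary()
lstm_total = sum(params for _, params in lstm_layers)
```

With these assumptions the two recurrent layers account for most of the weights, in contrast to the CNN, where the first dense layer does.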

Indian earthquake acceleration data of the past 10 years, with magnitudes approximately between 2 and 8 $M_W$, was collected from PESMOS (Program for Excellence in Strong Motion Studies, IIT Roorkee) [12]. EGMB (Eastern Ghats Mobile Belt) data of earthquakes with magnitudes ranging between 2 and 5 $M_W$, collected at the Geology and Geophysics Lab, IIT Kharagpur, was also used.

Table 1 and Table 2 show the train-test data split for both deep learning models.

Table 1: Training Data for Deep Learning Models (CNN & LSTM)

| Data Source | Data Type & Format | Feature Vector Dimensions | No. of Earthquake Data | No. of Non-Earthquake Data |
| --- | --- | --- | --- | --- |
| PESMOS (IIT Roorkee) | Audio, .wav | 9 x 13 | 212 | 0 |
| EGMB (IIT Kharagpur) | Audio, .wav | 9 x 13 | 68 | 280 |

Table 2: Testing Data for Deep Learning Models (CNN & LSTM)

| Data Source | Data Type & Format | Feature Vector Dimensions | No. of Earthquake Data | No. of Non-Earthquake Data |
| --- | --- | --- | --- | --- |
| PESMOS (IIT Roorkee) | Audio, .wav | 9 x 13 | 11 | 0 |
| EGMB (IIT Kharagpur) | Audio, .wav | 9 x 13 | 21 | 18 |

Two copies of the collected dataset were made. The first copy was converted into corresponding audio signals in .wav format, keeping the original sensor sampling rate of 200 Hz, and the second copy was converted into audio signals of the same format whose sampling rate was increased to 1000 Hz using upsampling techniques. The sampling rate of the upsampled audio signals was chosen by trial and error so that the change of the signal with respect to time could be heard and clearly noticed. A total of 610 sounds, consisting of 310 earthquake sounds and 300 non-earthquake sounds, were used to train the models. Both the 200 Hz and the 1000 Hz audio signal datasets were fed as inputs to both of the above models.
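The paper does not say which upsampling technique was used to go from 200 Hz to 1000 Hz. As one simple possibility, the sketch below uses linear interpolation by an integer factor of 5; the function name and the toy input are hypothetical, and a band-limited resampler would be an equally plausible choice.

```python
def upsample_linear(samples, factor):
    """Insert factor-1 linearly interpolated points between neighbouring samples."""
    if factor < 1 or not samples:
        raise ValueError("need a positive integer factor and a non-empty signal")
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


original = [0.0, 1.0, 0.0, -1.0]                    # four samples at 200 Hz (toy data)
upsampled = upsample_linear(original, 1000 // 200)  # same span sampled at 1000 Hz
```

Every original sample is preserved at every fifth position of the output, so the waveform shape is unchanged and only the time resolution of the stored signal increases.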

Before training the models, the audio data was cleaned and the essential features were then extracted from it using the popular speech processing methodologies of Filter Banks and Mel-Frequency Cepstral Coefficients (MFCCs).

The very first step of data processing was filtering the data with a pre-emphasis filter. This was mainly done to amplify the high frequencies. Apart from amplification, the filter helped in balancing the frequency spectrum, since higher frequencies usually have smaller magnitudes than lower frequencies. The filter was also able to improve the Signal-to-Noise Ratio (SNR). The first-order filter represented by the equation

$$y(t) = x(t) - \alpha x(t-1) \tag{1}$$

was used to apply the pre-emphasis filter over the range of audio signal data. From the typical values of 0.95 and 0.97, the former was used for the filter coefficient $\alpha$.

After pre-emphasis filtering, the data was divided into short-time frames so that the frequency contours of the signal over time are not lost when performing the Fourier transform. A good approximation of the frequency contours of the signal was achieved by applying the Fourier transform over those short-time frames and concatenating adjacent frames. Popular settings of 25 ms for the frame size and 10 ms for the stride were used for framing the data. A Hamming window function (shown in Eq. 2) was applied after the signal was sliced into frames.

$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{2}$$

where $N$ represents the window length, i.e. the number of samples in a frame.

After dividing the signal into frames, an N-point FFT was performed on each frame to calculate the frequency spectrum, which is also known as the Short-Time Fourier Transform (STFT); here N denotes the FFT size, which is typically 256 or 512 (256 in this case).

The rationale behind using filter banks was to separate the input signal into multiple components such that each component carries a single frequency sub-band of the original signal. Triangular filters, typically 26, were applied on a Mel scale to the power spectrum of the short-time frames to extract the frequency bands, as shown in Figure: 4.
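The pre-emphasis, framing, and windowing steps of Eq. (1) and Eq. (2) can be sketched in a few lines of plain Python. The 1000 Hz sampling rate, 25 ms frame, 10 ms stride, and α = 0.95 come from the text; the 5 Hz test tone, the decision to drop a trailing partial frame, and the function names are assumptions made for illustration.

```python
import math


def pre_emphasis(x, alpha=0.95):
    """Eq. (1): y(t) = x(t) - alpha * x(t - 1); the first sample is passed through."""
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]


def frame_signal(x, frame_len, stride):
    """Slice x into overlapping frames; a tail that does not fill a frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, stride)]


def hamming(n_points):
    """Eq. (2): w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for 0 <= n <= N - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


sample_rate = 1000                      # Hz, the upsampled audio rate
frame_len = round(0.025 * sample_rate)  # 25 ms -> 25 samples
stride = round(0.010 * sample_rate)     # 10 ms -> 10 samples

# Toy input (assumption): 0.2 s of a 5 Hz tone.
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(200)]
emphasized = pre_emphasis(signal, alpha=0.95)
frames = frame_signal(emphasized, frame_len, stride)
window = hamming(frame_len)
windowed = [[s * w for s, w in zip(frame, window)] for frame in frames]
```

Each windowed frame would then be passed to the FFT; the window tapers both ends of a frame towards 0.08 so that the frame edges contribute less spectral leakage.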

The formula for converting from frequency to the Mel scale is given by:

$$M(f) = 1125 \times \ln\left(1 + \frac{f}{700}\right) \tag{3}$$

To go back from Mels to frequency, the formula used is given by:

$$M^{-1}(m) = 700 \times \left(e^{m/1125} - 1\right) \tag{4}$$

Figure 4: Features extracted from the signal for training the models after applying the filter bank to the power spectrums of the short-time frames of the signal. Panels: Earthquake Sound and Non-Earthquake Sound; axes: time (s) versus Frequency (KHz).
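Equations (3) and (4) are inverses of each other, which is easy to check numerically. The sketch below also spaces 26 triangular-filter centre frequencies evenly on the Mel scale. The band limits of 0 Hz and 500 Hz (the Nyquist frequency of the 1000 Hz audio) are an assumption, as are the function names.

```python
import math


def hz_to_mel(f):
    """Eq. (3): M(f) = 1125 * ln(1 + f / 700)."""
    return 1125.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Eq. (4): M^-1(m) = 700 * (exp(m / 1125) - 1)."""
    return 700.0 * (math.exp(m / 1125.0) - 1.0)


def filter_centres(n_filters, f_low, f_high):
    """Centre frequencies (Hz) of n_filters triangular filters spaced evenly in Mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


centres = filter_centres(26, 0.0, 500.0)   # band limits are an assumption
round_trip = mel_to_hz(hz_to_mel(440.0))   # should recover 440 Hz
```

Because the Mel scale is logarithmic in frequency, the centres are packed more densely at low frequencies and spread out towards the upper band edge.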

Figure 5: Mel-Frequency Cepstral Coefficients (MFCCs) as extracted from the short frames into which the signal was divided. Panels: Earthquake Sound and Non-Earthquake Sound; axes: time (s) versus MFCC Coefficients.

MFCC is a biologically inspired feature and by far the most successful and most widely used one in the area of speech processing [13]. The algorithm has also been used in volcano classification [14]. For speech signals, the mean and the variance change continuously with time, which makes the signal non-stationary [15]. Like speech, earthquake signals are also non-stationary [16], as each of them has different arrivals of P, S, and surface waves. Therefore, normal signal processing techniques like the Fourier transform cannot be directly applied to them. But if the signal is observed in a very small duration window (say 25 ms), the frequency content in that small duration appears to be more or less stationary. This opened up the possibility of short-time processing of the earthquake sound signals. The small duration window is called a frame, discussed in section 4.3. For processing the whole sound segment, the window was moved from the beginning to the end of the segment in equal steps, called the shift or stride. Based on the frame size and frame stride, this gave M frames. For each of the frames, MFCC coefficients were computed, as depicted in Figure: 5.

Moreover, the filter bank energies computed were highly correlated, since all the filter banks were overlapping. This is a problem for most machine learning algorithms. To decorrelate the filter bank coefficients and get a compressed representation of the filter banks, a Discrete Cosine Transform (DCT) was applied to the filter bank energies. This also allowed the use of diagonal covariance matrices to model the features for training. Also, 13 of the 26 DCT coefficients were kept and the rest were discarded, because the higher DCT coefficients represent fast changes in the filter bank energies. These fast changes degraded the model performance, so a small improvement was observed by dropping them.

The overall system representation can be better understood from Figure: 6.

Figure 6: Block diagram of the entire System
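The DCT step described above, which turns 26 filter bank energies into 13 cepstral coefficients, can be sketched as follows. This uses an unnormalised DCT-II written out directly; the scaling convention and the toy energy values are assumptions, and the function names are hypothetical.

```python
import math


def dct2(values):
    """Unnormalised DCT-II: X[k] = sum_i x[i] * cos(pi * k * (2i + 1) / (2n))."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def mfcc_from_energies(energies, n_keep=13):
    """Apply the DCT to one frame of filter bank energies and keep the first n_keep terms."""
    return dct2(energies)[:n_keep]


# Toy frame (assumption): 26 filter bank energies with a slow repeating pattern.
energies = [float(i % 5) for i in range(26)]
mfcc = mfcc_from_energies(energies)
```

Coefficient 0 captures the overall level of the frame and successive coefficients capture progressively faster variation across the filters, which is why truncating to the first 13 discards the fast changes mentioned in the text.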

The CNN and the LSTM models performed almost identically for the 200 Hz audio data set, but significant improvements in the train-test accuracy percentages were observed for the 1000 Hz data set. For the 1000 Hz audio data set, the CNN model showed a testing accuracy of 91.102% for a 0.2-second sample window length, while the LSTM model showed 93.999% for the same (shown in Figure: 7). This observation is consistent with the fact that LSTMs perform better for sequential or time-series data classification [17].

The Kappa statistic, generally used for comparing an observed accuracy with an expected accuracy (random chance), was used for validating the model accuracies for both data sets (shown in Figure: 8).

For both data classes, the activations produced by 5 randomly chosen filters out of the 16 filters of the first layer of the CNN model, along with their inputs, are shown in Figure: 9 and Figure: 10 respectively.

The time required for generating the alarm by the CNN model includes 1.28 seconds to gather enough data to start the MFCC computations, 8 ms for processing, and a prediction time of 0.2 seconds. The reported total is 1.68 seconds. Taking into consideration the processing and prediction delays, the total time for the CNN model is thus considered to be ≃ 2 seconds.

Figure 7: Testing and Training Accuracy comparisons of CNN and LSTM Models over several sampling window time frames for the audio signal data sets of 1000 Hz & 200 Hz sampling frequencies

Figure 8: Testing Accuracy and Cohen Kappa Score comparisons of the CNN and the LSTM Models over several sampling window time frames for the audio signal data sets of 1000 Hz and 200 Hz sampling frequencies. Panels: Cohen Kappa Score versus Sample Window Lengths (Sec), and Testing Accuracy Percentages versus Sample Window Lengths (Sec), each with curves for the CNN and LSTM models at 1000 Fs and 200 Fs.
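As a reminder of how the Cohen Kappa score relates observed accuracy to chance agreement, the sketch below computes it from a confusion matrix. The counts are hypothetical and chosen only so that the row totals match the 32 earthquake and 18 non-earthquake test samples in Table 2; they are not results from this work.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true class, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts, for illustration only.
hypothetical = [[30, 2],   # true earthquake: 30 detected, 2 missed
                [1, 17]]   # true non-earthquake: 1 false alarm, 17 correct
kappa = cohen_kappa(hypothetical)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the score penalises a classifier that reaches high accuracy merely by favouring the majority class.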

Figure 9: 1st Convolutional layer activations for Earthquake Data with corresponding Filters

Figure 10: 1st Convolutional layer activations for Non-Earthquake Data with the same Filters as used for Earthquake data

LSTMs being computationally a bit more expensive, the processing and prediction times were 10 ms and 0.5 seconds respectively, giving a reported total of 2.19 seconds. Taking into consideration the processing and prediction delays, the total time for the LSTM model is thus considered to be ≃ 2.5 seconds.

Table 3: Overall Comparison between the standard STA/LTA algorithm and the proposed algorithms

| Algorithm | Data Source | Data Sampling Frequency | Time to Alarm | Accuracy | Prerequisites |
| --- | --- | --- | --- | --- | --- |
| STA/LTA | PESMOS (IIT Roorkee) & EGMB (IIT Kharagpur) | 200 Hz | 3 seconds after P-wave arrival | 95.43% (heavily dependent on prerequisites) | 1. Proper user-defined thresholds; 2. Different thresholds for different regions; 3. Threshold kept high for strong motion events (more earthquakes missed, fewer false alarms); 4. Threshold kept low for weak motion events (fewer earthquakes missed, more false alarms) |
| MFCC with CNN Model (Proposed Method) | PESMOS (IIT Roorkee) & EGMB (IIT Kharagpur) | 1000 Hz | 2 seconds after P-wave arrival | 91.1% | None |
| MFCC with LSTM Model (Proposed Method) | PESMOS (IIT Roorkee) & EGMB (IIT Kharagpur) | 1000 Hz | 2.5 seconds after P-wave arrival | 93.99% | None |

It was also observed that increasing the sampling rate of the audio signals increased the training and testing accuracies of both the CNN and the LSTM model. This is one of the very first applications of machine learning to earthquake detection using MFCCs and filter bank coefficients, which are generally used in the field of speech recognition. An earthquake is basically understood through three types of waves, namely the P-wave, the S-wave, and the surface wave. The interaction of these waves with the surrounding medium gives an earthquake its intensity. Any wave is a vibration, and any vibration has some sound associated with it. It might be inaudible to human ears, but the sound remains. In real time, it is difficult to identify and classify events based on classical approaches like peak ground displacement, peak ground velocity, or even the widely recognized STA/LTA algorithm, as they require extensive research to determine the basic thresholding parameters that trigger an alarm. Many times, due to human error or other unavoidable natural factors such as thunder strikes or landslides, the conventional algorithms may end up raising a false alarm (shown in Figure: 11).

Figure 11: Three subplots. The first plot shows incorrect P-wave arrival detection by STA/LTA, marked by the red line, due to poor threshold parameters, leading to late detection of the P-wave and in turn to late alarm generation. The second plot shows the spectrogram with almost appropriate detection of the P-wave first arrival, marked by the green line, and the third plot shows the same spectrogram with a -50 dB threshold applied, so as to get a clear view of the first P-wave arrival.

Table 3 shows a detailed comparison between the standard STA/LTA algorithm and the proposed deep learning models. The main disadvantage of the STA/LTA algorithm is that it has to be tweaked differently for different types of event detection. The user-defined threshold, which acts as the trigger, varies from region to region due to its direct dependence on the geographical and topological features of a particular region. Also, the threshold is kept high for strong motion events and low for weak motion events. This results in serious ambiguity, as both settings cannot be used together. If the threshold is high, more earthquakes are missed but fewer false alarms are generated; if the threshold is low, fewer earthquakes are missed but more false alarms are generated. The proposed method can overcome these problems, as it can extract the essential features of raw sound or vibrational data without manual feature engineering and does not require any knowledge or expertise in the relevant field. It is also invariant to small variations in occurrence in time or position and can learn representations of data with multiple levels of abstraction. Since the input of the method is only the waveform, it is suitable for real-time processing; the models can therefore also be used as an onsite earthquake early warning system requiring a minimum amount of preparation time and workload.

Until now, the use of earthquake early warning systems for earthquake disasters has been mainly limited by false alarm generation and delay in detection. With the suggested approach, these problems can be overcome, leading to automatic, fast, and accurate detection of earthquake seismic signals.

The main reason for developing the sensor hardware prototype was to test the validity and robustness of the proposed models. It was also crucial to understand and evaluate the complexities and solve the challenges at a system level, as the system is intended for deployment in earthquake-prone areas. The prototype was also essential for comparing the system results with existing commercial solutions.

The hardware was developed with the help of commercial equipment to mimic the vibrations from an actual earthquake. Figure: 12 depicts a schematic of the prototype sensor hardware used. Figure: 13a shows the actual PCB, with boxes marking the function of each section. On the far left is the power management block, which supplies a steady source of power to the circuit (3.3 V, 80 mA). On the top right is the accelerometer sensor, which acts as the principal detection element (seismometer), with an output dynamic range of 0.1 V to 2.5 V and a sensitivity of 300 mV/g. Between the power management and accelerometer sensor blocks are the microcontroller and Wi-Fi communication blocks, which receive and transmit the sensor data wirelessly. The Wi-Fi block has a bandwidth of 100 mHz to 10 Hz and an RF carrier frequency of 2.4 GHz, with a data transmission rate of up to 256 Kbps. The processor has a 16-bit RISC architecture with an operating frequency of 16 MHz.

Figure 12: Schematic of an accelerometer with front-end amplifiers (box in black), and a microcontroller with wireless transceiver (box in red).

Figure: 13b and Figure: 13c show the 3D printed packaging used to house the sensor board and the exact specification details of the components used to design the sensor PCB, respectively. After the deep learning models were validated and finalized, an experimental setup was built to test the effectiveness and robustness of the models. The idea was to mimic the vibrations from an actual earthquake signal, sense them with the custom-designed sensor PCB, and transmit the data wirelessly to a remote computer, where the data is plotted, processed, and stored in real time. As soon as the deep learning model detects the required data, it activates and predicts the label, classifying the recorded data as an earthquake or a non-earthquake sound. The experimental setup can be visualized from Figure: 14.

The test setup and the overall system included the following components:

• An Arbitrary Waveform Generator (AWG). Real earthquake data was stored in it so that it could generate the output from that data.
• A 4 ohm, 40 Watt speaker. The output of the AWG was connected to the speaker so as to make the speaker vibrate according to the output waveform and thus mimic an earthquake.
• The prototype circuit board, consisting of the accelerometer sensor to sense the speaker vibrations, the microcontroller, and an ESP8266 Wi-Fi communication block (the transmitter is built into the PCB and the receiver is connected to the remote PC via a USB to UART cable).
• MATLAB code running on the remote PC to receive, process, plot, and store the data in real time.
• Python code running alongside the MATLAB code on the remote PC, monitoring the stored data directory so as to activate the deep learning model for prediction as soon as it detects any data in that directory.

The test run of the system was a success, as it produced the desired results by accurately predicting the class of the vibration signals from the speaker received by the wireless ESP8266 Wi-Fi module.
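The last component in the list above, a Python process that watches the stored-data directory and triggers the model, could be sketched with simple polling as shown below. This is not the code used in the experiment: the function names, the polling interval, and the `classify` callback standing in for the trained model are all hypothetical.

```python
import os
import time


def new_files(directory, seen):
    """Return the files in directory that are not yet in seen, and record them."""
    current = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    fresh = [name for name in current if name not in seen]
    seen.update(fresh)
    return fresh


def watch(directory, classify, poll_seconds=0.1, max_polls=None):
    """Poll directory and call classify(path) once for every newly stored file."""
    seen = set()
    results = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in new_files(directory, seen):
            # classify is a placeholder for feature extraction plus model prediction.
            results[name] = classify(os.path.join(directory, name))
        polls += 1
        time.sleep(poll_seconds)
    return results
```

A call such as `watch("recordings", classify=my_model_predict)` would run until interrupted, where both the directory name and `my_model_predict` are placeholders. Polling keeps the sketch dependency-free; an event-based file watcher would reduce the added latency at the cost of an extra library.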

Figure 13: Prototype Sensor PCB with its 3D printed packaging and Specifications. (a) Photo of the prototype circuit board for implementing a wireless seismic sensor node. (b) A 3D printed packaging scheme for the wireless seismic sensor prototype. (c) Specifications of the seismic sensor prototype.

Figure 14: Test Setup for the entire System.

In this paper, a new way of automatically classifying earthquake signals is presented, based on CNN and LSTM models that use only MFCC features extracted from the waveform. The performance of this algorithm has been tested by applying it to regional and local earthquake events selected from the PESMOS (Program for Excellence in Strong Motion Studies, IIT Roorkee) and EGMB (Eastern Ghats Mobile Belt, Geology and Geophysics Lab, IIT Kharagpur) data sets. Using optimal parameters, for the 1000 Hz audio data set the CNN model showed a testing accuracy of 91.102% for a 0.2-second sample window, while the LSTM model obtained an accuracy of 93.999% for the same. This brings the standard alarm generation time down to approximately 2 seconds after P-wave arrival. The results compare favourably with the conventional STA/LTA algorithm when classification accuracy is weighed against classification speed and prerequisite requirements. While the models were mainly aimed at classifying earthquake events as quickly as possible, they can also be adapted to give early estimates of magnitudes, ground velocities, shaking intensity, and many more useful parameters.

Also, the experimental setup helped in creating real-time earthquake simulations in a cheap, precise, and effective way.

Using MFCC for earthquake detection is like putting our ears to the ground to listen to the sounds inside the earth that we otherwise could not hear, thus treating the earth's sounds as the earth's speech signals. The most interesting and effective part of this model is that it can be trained on various classes of sounds other than earthquake sounds alone, so as to classify every signal that the sensor detects using its sound signature. This can also be of enormous help in military applications. If trained with human movement sounds, the model could be deployed at borders and in other high-security areas to provide instant information about trespassing and other such unlawful activities, relieving security officials and soldiers of some of the burden. This could be a solution not just for earthquake detection but for many other such applications.

Acknowledgments

This work is funded by the Ministry of Human Resource Development (MHRD), Government of India. The authors are thankful to the Department of Earthquake Engineering, IIT Roorkee, and the Department of Geology and Geophysics, IIT Kharagpur, for providing the Program for Excellence in Strong Motion Studies (PESMOS) and Eastern Ghats Mobile Belt (EGMB) earthquake datasets respectively for the research. This work would not have been possible without the research facilities provided by IIT Kharagpur. Finally, the authors are very thankful to all the members of the Image Processing and Computer Vision Laboratory, Department of Electronics and Electrical Communication Engineering, and the Computational Laboratory, Department of Geology and Geophysics, Indian Institute of Technology Kharagpur, for their kind help and support.

References

[1] Thomas C Hanks and Hiroo Kanamori. A moment magnitude scale. Journal of Geophysical Research: Solid Earth, 84(B5):2348–2350, 1979.

[2] Hiroo Kanamori. Quantification of earthquakes. Nature, 271(5644):411–414, 1978.

[3] Anshu Jin and Keiiti Aki. Temporal change in coda q before the tangshan earthquake of 1976 and the haicheng earthquake of 1975. Journal of Geophysical Research: Solid Earth, 91(B1):665–673, 1986.

[4] K Vikraman. A deep neural network to identify foreshocks in real time. arXiv preprint arXiv:1611.08655, 2016.

[5] Thibaut Perol, Michaël Gharbi, and Marine Denolle. Convolutional neural network for earthquake detection and location. Science Advances, 4(2):e1700578, 2018.

[6] Zefeng Li, Men-Andrin Meier, Egill Hauksson, Zhongwen Zhan, and Jennifer Andrews. Machine learning seismic wave discrimination: Application to earthquake early warning. Geophysical Research Letters, 45(10):4773–4779, 2018.

[7] Wu Yih-Min, Hiroo Kanamori, Richard M Allen, and Egill Hauksson. Determination of earthquake early warning parameters, τc and pd, for southern california. Geophysical Journal International, 170(2):711–717, 2007.

[8] PL Bragato, M Sugan, P Augliera, M Massa, A Vuan, and A Saraò. Moho reflection effects in the po plain (northern italy) observed from instrumental and intensity data. Bulletin of the Seismological Society of America, 101(5):2142–2152, 2011.

[9] Patrizia Tosi, Valerio De Rubeis, Andrea Tertulliani, and Calvino Gasparini. Spatial patterns of earthquake sounds and seismic source geometry. Geophysical Research Letters, 27(17):2749–2752, 2000.

[10] Patrizia Tosi, Paola Sbarra, and Valerio De Rubeis. Earthquake sound perception. Geophysical Research Letters, 39(24), 2012.

[11] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

[12] Himanshu Mittal, Ashok Kumar, and Rebecca Ramhmachhuani. Indian national strong motion instrumentation network and site characterization of its stations. International Journal of Geosciences, 3(06):1151, 2012.

[13] Md Afzal Hossan, Sheeraz Memon, and Mark A Gregory. A novel approach for mfcc feature extraction. In , pages 1–5. IEEE, 2010.

[14] I. Álvarez, G. Cortés, A. de la Torre, C. Benítez, L. García, P. Lesage, R. Arámbula, and M. González. Improving feature extraction in the automatic classification of seismic events. application to colima and arenal volcanoes. In , volume 4, pages IV–526–IV–529, 2009.

[15] Mark G Frei and Ivan Osorio. Intrinsic time-scale decomposition: time–frequency–energy analysis and real-time filtering of non-stationary signals. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2078):321–342, 2006.

[16] JK Hammond and PR White. The analysis of non-stationary signals using time-frequency methods. Journal of Sound and Vibration, 190(3):419–447, 1996.

[17] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.