A Multi-Task Learning Model for Estimating Fractional Inspiratory Time from Electrocardiography
Extraction of Fractional Inspiratory Time and Respiration Rate from Electrocardiogram Signals
Maria Nyamukuru, Kofi Odame
Thayer School of Engineering, Dartmouth College, 14 Engineering Drive, Hanover, NH [email protected], kofi[email protected]
Abstract
At-home monitoring of lung health enables the early detection and treatment of respiratory diseases like asthma and chronic obstructive pulmonary disease (COPD). To allow for discreet continuous monitoring, various approaches have been proposed to estimate the respiratory rate from an electrocardiogram (ECG) signal. Unfortunately, respiratory rate can only provide a non-specific, incomplete picture of lung health. This paper introduces an algorithm to extract more respiratory information from the ECG signal: in addition to respiratory rate, the algorithm also derives the fractional inspiratory time (FIT), which is a direct measure of airway obstruction. The algorithm is based on a gated recurrent neural network that infers vital respiratory information from a two-lead ECG signal. The network is trained and tested on different subjects and achieves normalized root mean squared errors as low as 0.099 for FIT and 0.243 for respiration rate.
Keywords: electrocardiogram (ECG), bio-signals, respiration, inspiration-expiration (I:E) ratio, Ti/Ttot, fractional inspiratory time (FIT), lung performance, neural network, gated recurrent unit, ECG-derived respiration (EDR)
There are 251 million people worldwide who suffer from chronic obstructive pulmonary disease (COPD) and risk experiencing a sudden flare-up in their symptoms. Early detection and treatment of COPD exacerbations would increase patients' quality of life, reduce mortality, and avoid the majority of COPD-related healthcare costs.

One of the most predictive signs of an impending COPD exacerbation is an increase in lung airway obstruction, measured via spirometry. But there is no way to perform spirometry reliably at home, due to low patient adherence and poor patient technique.

An alternative to home spirometry is to monitor lung health via an electrocardiogram-derived respiratory (EDR) signal: it requires no special patient technique, and it is integrable into a smartwatch, thus encouraging patient adherence. Unfortunately, EDR is of limited use because it primarily measures respiratory rate (Charlton et al. 2018), which provides an incomplete, non-specific picture of lung health.
US Patent Application No.: 63/120,085. All rights reserved.
Figure 1: Continuous lung function monitoring from ECG signals (Dominguez 2020), (Apple 2020)

To address the limitations of EDR, this paper introduces an algorithm to extract more respiratory information from the electrocardiogram signal: in addition to respiratory rate, the algorithm also derives the fractional inspiratory time (FIT), which is a direct measure of airway obstruction. The remainder of the paper describes how FIT can be extracted from ECG signals.
Breathing affects the ECG signal in three main ways: ECG frequency modulation, ECG amplitude modulation, and ECG baseline wander, all of which can be used to extract the respiratory signal.

ECG frequency modulation, also known as respiratory sinus arrhythmia (RSA), refers to the heart rate variability associated with respiration (Caiani et al. 1999). During inspiration, the heart rate increases because the diaphragm contracts downwards, creating a negative pressure in the thoracic cavity that increases the pull of blood into the heart's veins and increases the amount of blood in the heart. Expiration, on the other hand, causes a drop in heart rate because the pressure in the thoracic cavity is less negative, so less blood is pulled into the heart. These variations in heart rate over the breathing cycle are observed as changes in the interval between adjacent R peaks of the ECG (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology 1996).

ECG amplitude modulation results from the change in the electric heart vector with reference to the electrodes due to the rise and fall motion of the lungs during breathing (Travaglini et al. 1998). During inspiration, the diaphragm contracts and shifts downwards, and during expiration, the diaphragm relaxes and shifts upwards. The movement of the diaphragm changes the position of the heart, shifting it towards the abdomen as the diaphragm descends and back towards the chest as it rises. This displacement of the heart changes the electric heart vector and in turn modulates the ECG signal, particularly the QRS complexes (Arunachalam and Brown 2009), which can be demodulated to extract the respiratory signal as shown in Fig. 3.

Less commonly, the respiratory signal is extracted from the ECG signal's baseline wander. The expansion and contraction of the chest causes the electrodes placed on the chest to move closer to and further away from the heart, producing the low-frequency component of the ECG signal known as the baseline wander (Ji et al. 2008). ECG frequency modulation, ECG amplitude modulation, and baseline wander excel at extracting the frequency component of the respiration signal, making them well suited to determining the respiratory rate. They fall short, however, in accurately extracting the phase information of the respiratory signal.

In order to extract both frequency and phase information, different techniques have been proposed, such as principal component analysis (PCA) (Langley, Bowers, and Murray 2010) and kernel principal component analysis (kPCA) (Widjaja et al. 2012). These techniques require the use of handcrafted features like P-peak, Q-peak, R-peak, S-peak and T-peak amplitudes, and the RR-interval (Gao et al. 2017), to extract a more accurate EDR signal. Another technique to extract the respiratory signal from the ECG utilizes Hilbert transformations to break down the ECG into different sinusoids. This technique then utilizes Hilbert vibration decomposition to filter the sinusoids whose amplitudes are of a lower order (Sharma and Sharma 2018). Using the Hilbert transform avoids the need for handcrafted features, but it utilizes only a small portion of the ECG signal to extract the respiratory signal.

In order to extract both the frequency and phase information of the respiratory signal with high accuracy, without handcrafted features, and while using the majority of the ECG signal, we propose a novel approach that performs an end-to-end extraction of respiratory signals from ECG signals using a gated recurrent unit neural network. To accurately extract respiratory frequency information, we focus on the respiratory rate, and to accurately extract phase information, we focus on the FIT. We formulate the problem to maximize the accuracy of predicting the respiratory rate and FIT of the respiratory signal, as discussed in Section IV.
The computation of the FIT and respiratory rate depends on correctly identifying the inspiratory phase and expiratory phase of the respiration signal, as shown in Fig. 2b. This identification relies on accurately locating the peaks and valleys, as shown in Fig. 2a.

Figure 2: a. Respiratory signal (top), b. Inspiratory and Expiratory Phase Classification (bottom)

Figure 3: a. R-Peak Detection on ECG Lead I Signal (top), b. EDR vs Respiratory signal (bottom)

One approach is to reconstruct the respiratory signal from the ECG signal as discussed in Sections 2 and 3. For example, Fig. 3 shows the reconstructed respiratory signal based on amplitude modulation and spline interpolation. This EDR signal is used to deduce the inspiratory peaks and expiratory valleys.

Figure 4: Block diagram showing novel approach to extract FIT and respiratory rate from ECG signals using GRU network to classify underlying inspiratory and expiratory phases, a median filter to filter the noisy classification output and a duty cycle calculation to compute FIT and respiratory rate

To avoid the use of spline interpolation or any handcrafted features, and to utilize the entire ECG signal data, respiratory signal reconstruction could be set up as a regression problem. This requires temporally aligned ECG signal data and respiratory signal data to train on. The regression model would output a respiration signal, and a peak/valley detection algorithm would then be applied to locate the inspiratory peaks and expiratory valleys. This approach, however, requires two algorithms to extract the FIT, the regression algorithm and the peak/valley algorithm, making it susceptible to compounding errors.

In an effort to combine these two algorithms and circumvent their drawbacks, we formulate the problem as a two-class classification problem; the two classes are the inspiratory phase and the expiratory phase. The inspiratory peak and the expiratory valley mark the transitions from inspiration to expiration and from expiration to inspiration, respectively, as shown in Fig. 2b. This formulation simplifies the learning task and emphasizes the importance of correctly identifying peaks and valleys, which is essential in calculating FIT and determining the respiratory rate. Additionally, since the peak/valley detection is integrated into the classification formulation, a separate peak/valley algorithm is not required. This approach consists of three main stages, as shown in Fig. 4: the neural network stage that classifies inspiratory and expiratory phases from input ECG signals, the median filter that removes noise in the classified output, and the duty cycle calculation that computes the FIT and the respiratory rate.

The neural network stage is the most crucial stage of this approach, making the choice of neural network architecture essential. The gated recurrent unit (GRU) (Cho et al. 2014) is explored to solve the two-class classification problem. The GRU is a variant of the recurrent neural network (RNN), a neural network design that models time dependencies. The GRU contains reset and update gates, which provide inbuilt memory to better capture long-term temporal dependencies over large time steps while mitigating vanishing gradients during back-propagation. This trait is responsible for the GRU's excellent performance on sequence-to-sequence labeling tasks, like speech recognition, music modeling, and natural language processing. GRU networks are also more robust to noisy sequential data. A similar RNN variant to the GRU is the LSTM. The LSTM has an additional gate, the forget gate. The LSTM is not used in this problem because it has significantly more parameters than the GRU, and the GRU has better performance for smaller and less frequent datasets (Chung et al. 2014). These characteristics make the GRU well suited to the extraction and classification of inspiratory and expiratory phases from ECG signals.
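The link between the two-class formulation and peak/valley detection can be illustrated with a small sketch. It assumes class 1 denotes inspiration and class 0 denotes expiration, and the function name `find_transitions` is hypothetical rather than taken from the authors' implementation. A 1-to-0 transition marks an inspiratory peak and a 0-to-1 transition marks an expiratory valley.

```python
def find_transitions(labels):
    """Return (peaks, valleys) as sample indices from a 0/1 phase sequence.

    Assumes labels[i] == 1 means inspiration and 0 means expiration.
    A peak is reported at the first expiratory sample after an inspiratory
    run (1 -> 0); a valley at the first inspiratory sample after an
    expiratory run (0 -> 1).
    """
    peaks, valleys = [], []
    for i in range(1, len(labels)):
        prev, curr = labels[i - 1], labels[i]
        if prev == 1 and curr == 0:
            peaks.append(i)
        elif prev == 0 and curr == 1:
            valleys.append(i)
    return peaks, valleys


# Toy phase sequence: two inspiratory runs separated by expiration.
phases = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
peaks, valleys = find_transitions(phases)
print("peaks:", peaks, "valleys:", valleys)
```

Because the transitions fall out of the class sequence directly, no separate peak/valley detector is needed once the phases are classified.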
The Fantasia database (Iyengar et al. 1996) is a common test set for evaluating EDR algorithms. However, it does not have temporally aligned ECG and respiratory signals (Nyamukuru and Mark 2020) and is therefore unsuitable for our application. The data used in this study is from the Combined measurement of ECG, Breathing, and Seismocardiograms (CEBS) database (García-González et al. 2013), which is publicly available on PhysioNet (Goldberger et al. 2000). This database contains recordings of the respiratory signal, ECG Lead I signal, ECG Lead II signal, and seismocardiogram (SCG) signal. The ECG Lead I, ECG Lead II, and respiratory signals are the signals of interest. These signals were collected simultaneously and are, therefore, temporally aligned. García-González et al. (2013) detail how this data was acquired.

We reviewed the available data and selected the measurements of six subjects whose respiratory signal profiles were closest to a typical respiratory signal profile. This selection formed a six-hour total dataset used for training and testing the neural network.
During preprocessing, the dataset is down-sampled from 5000 Hz to 300 Hz to reduce power consumption and memory usage. 300 Hz is chosen as the new sampling frequency because it is the lowest sampling frequency that satisfies the Nyquist sampling theorem, given that the maximum bandwidth of the ECG signal channels is 150 Hz. A third-order median filter is then applied to the respiratory signal to attenuate the noise. All the input signals, that is, the ECG Lead I, ECG Lead II, and respiratory signals, are normalized to the 0 to 1 range. Each normalized ECG lead signal is used as one of the features in the two-dimensional input signal, x, to the neural network.

Each sample point in the respiratory signal time series is labeled using the following semi-automatic procedure. First, the peaks and troughs of the respiratory signal are labeled manually. Then, all points between a given trough and the next peak in the time series are automatically labeled class 1 (inspiratory phase). All points between a given peak and the subsequent trough are labeled class 0 (expiratory phase), as shown in Fig. 2b. The resulting time series of class labels, y, is used as the target output to train the neural network.

For neural network training, both the input signal, x, and the class labels, y, are partitioned into 25-second windows of 7500 samples.

The GRU network consists of three 16-unit hidden GRU layers, one 16-unit fully connected layer, and a single-unit output layer, as shown in Fig. 5. The network processes 25-second-long (7500-sample) sequences and returns an output for each sample in the sequence. The GRUs' ability to capture long-term dependencies makes them ideal for extracting temporal and spatial features from the 2-feature ECG input signals.

Figure 5: GRU network architecture that takes in 2 features at a time (one sample), the ECG Lead I and Lead II, and outputs respiratory phase classes for each sample for a 25-second window (7500 samples), with no overlap

The hidden layers feed into a fully connected (dense) layer that consists of 16 rectified linear units (ReLU). The fully connected layer feeds into the classification layer, whose sigmoid function outputs the class label of each sample in the sequence.
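The normalization, labeling, and windowing steps described in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the peak and trough indices stand in for the manual annotations, and samples that fall outside any trough-to-peak span (for instance before the first annotated trough) are assigned class 0 here, a choice the text does not specify.

```python
WINDOW_SECONDS = 25
SAMPLING_RATE_HZ = 300
WINDOW_LEN = WINDOW_SECONDS * SAMPLING_RATE_HZ  # 7500 samples, as in the text


def min_max_normalize(signal):
    """Scale a sequence of numbers into the 0 to 1 range."""
    lo, hi = min(signal), max(signal)
    if hi == lo:  # flat signal: avoid dividing by zero
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def label_phases(n_samples, troughs, peaks):
    """Label samples from each trough up to the next peak as 1, all others 0.

    Assumption: samples outside any trough-to-peak span default to class 0.
    """
    labels = [0] * n_samples
    sorted_peaks = sorted(peaks)
    for trough in sorted(troughs):
        next_peak = next((p for p in sorted_peaks if p > trough), None)
        if next_peak is not None:
            for i in range(trough, next_peak):
                labels[i] = 1
    return labels


def make_windows(x, y, window_len=WINDOW_LEN):
    """Cut x and y into aligned, non-overlapping windows; drop the remainder."""
    n_windows = min(len(x), len(y)) // window_len
    return [
        (x[k * window_len:(k + 1) * window_len],
         y[k * window_len:(k + 1) * window_len])
        for k in range(n_windows)
    ]


# Toy example with a 4-sample window instead of 7500.
lead_i = min_max_normalize([2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0])
lead_ii = min_max_normalize([1.0, 3.0, 5.0, 3.0, 1.0, 3.0, 5.0, 3.0])
x = list(zip(lead_i, lead_ii))  # two features per sample
y = label_phases(len(x), troughs=[0, 4], peaks=[2, 6])
windows = make_windows(x, y, window_len=4)
print(y)
print(len(windows), "windows")
```

Each window pairs a sequence of two-feature samples with a label sequence of the same length, which matches the per-sample output of the network.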
The noise from the input ECG signal that propagates through the neural network is mitigated by applying a rd order median filter to every predicted output window. This noise filtering removes spurious inspiratory and expiratory phases of less than ms, thereby improving the predicted output accuracy. The median filtered output is used to compute FIT.

Before calculating the FIT, the algorithm identifies the number of complete cycles $N_c$. A complete cycle includes an inspiratory phase and an expiratory phase. The algorithm computes the FIT for each complete respiration cycle as shown in Equation 1, where $N_i$ is the number of samples in the inspiratory phase and $N_{tot}$ is the total number of samples in the complete cycle, which is the sum of the samples in the inspiratory and expiratory phases.

\[ \mathrm{FIT} = \frac{N_i}{N_{tot}} \tag{1} \]

The respiratory rate is computed for each 25-second window as shown in Equation 2, where $F_s$ is the sampling frequency, $N_c$ is the total number of complete cycles in each window, and $S_{tot}$ is the total number of samples in the complete cycles.

\[ \text{respiratory rate} = \frac{F_s \cdot 60 \cdot N_c}{S_{tot}} \tag{2} \]

The neural network shown in Fig. 5 is designed and implemented using TensorFlow Keras v2.2 (Chollet et al. 2015). The hyper-parameters of the network were initially determined using hyperas (Pumperla 2020), a hyper-parameter optimization library, and then tweaked manually to reduce the number of parameters whilst achieving similar performance. The hyper-parameters used to train this network are shown in Table 1.

Table 1: Neural Network Hyper-parameters
Parameter               Value
Epochs used             500
Batch size              32
Training optimizer      Adam (Kingma and Ba 2015)
Learning rate           0.005
Number of parameters    4530

During training, a 20% cross-validation training split allows the model to train on multiple train-validation splits. In order to guarantee that the algorithm uses the best performing model, a model checkpoint is implemented to save and retrieve the model with the highest validation accuracy. However, since the expiratory phase of a normal person lasts longer than the inspiratory phase, the expiratory phase has a higher representation in the dataset, which biases the algorithm towards the expiratory phase. To address this bias, we used sample weights to normalize the binary cross-entropy loss function. The application of sample weights to the loss function results in an evenly distributed prediction accuracy for both the inspiratory phases and the expiratory phases. The neural network training and testing occur in two stages.
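One way to weight the loss so that both phases contribute equally is sketched below. The text does not state the exact weighting scheme, so the inverse-frequency ("balanced") weights used here are an assumption, and the function names are hypothetical.

```python
import math


def balanced_sample_weights(labels):
    """Give each class the same total weight (inverse class frequency).

    Assumed scheme: weight = n / (2 * count of that class).
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:  # only one class present: nothing to balance
        return [1.0] * n
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    return [w_pos if y == 1 else w_neg for y in labels]


def weighted_bce(labels, probs, weights, eps=1e-7):
    """Binary cross-entropy where each sample's term is scaled by its weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Toy example: expiration (0) is three times as frequent as inspiration (1).
labels = [1, 0, 0, 0]
weights = balanced_sample_weights(labels)
probs = [0.6, 0.2, 0.3, 0.1]
print(weights)
print(weighted_bce(labels, probs, weights))
```

With these weights the single inspiratory sample carries as much total weight as the three expiratory samples combined, so misclassifying the minority phase is no longer cheap.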
Initially, the neural network is trained on five of the six subjects and evaluated on the test subject whose data is not included in the training set. This is done for all combinations of the six subjects, I, II, III, IV, V and VI, resulting in six base models, one for each test subject. This case shows how well the network works without seeing data from the test subject. The base model weights are then used as the starting point for fine-tuning the neural network model with new training data.
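The six train/test combinations of this first stage follow a leave-one-subject-out pattern, which can be sketched as below. The generator name is hypothetical and the subject labels simply mirror those used in the text.

```python
SUBJECTS = ["I", "II", "III", "IV", "V", "VI"]


def leave_one_subject_out(subjects):
    """Yield (train_subjects, test_subject) with each subject held out once."""
    for test_subject in subjects:
        train_subjects = [s for s in subjects if s != test_subject]
        yield train_subjects, test_subject


splits = list(leave_one_subject_out(SUBJECTS))
for train_subjects, test_subject in splits:
    print("train on", train_subjects, "-> test on", test_subject)
```

Each split corresponds to one base model, so the held-out subject never contributes data to the model that is evaluated on it.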
The second stage consists of fine-tuning the base model on data from the test subject that was initially excluded from the training set. The model is fine-tuned on incremental amounts of data, in 5-minute steps from 5 minutes up to 20 minutes. Forty minutes of data from the test subject is reserved for evaluating both the base model and the fine-tuned models.
The trained base models and fine-tuned models are evaluated on the same test sets. An exemplary plot of the accuracy and loss curves during the training and validation of the base model on Subject VI is shown in Fig. 6.

Figure 6: Accuracy and Loss Training Curves for Subject VI

The median filter removes the noise in the neural network outputs, and the test accuracy, precision, recall, and F1-score of the filtered output are recorded. Table 2 shows the results for Subject VI for all neural network models, that is, the base model and the models fine-tuned with different amounts of data. The test accuracy results from all subjects for all models are shown in Fig. 7.

Table 2: Evaluation of base and fine-tuned GRU models (Subject VI evaluation results in %; columns: Test Acc., F1-Score, Precision, Recall; * denotes the base model)

Figure 7: Filtered classification output test set accuracy of all subjects

Figure 8: NRMSE of the FIT for all subjects

Figure 9: NRMSE of the respiratory rate for all subjects

Figure 10: a. Plot of true vs predicted classes output of one window for Subject VI (top) and b. Plot of true vs predicted FIT of one window for Subject VI (bottom)

Figure 11: a. Plot of true vs predicted classes output of one window for Subject V (top) and b. Plot of true vs predicted FIT of one window for Subject V (bottom)

The FIT is computed for every complete cycle in each 25-second window of the filtered predicted class output of the test sets, and is compared to the FIT from the true classes, as shown by the exemplary plots in Fig. 10 for Subject III - 10 minutes and Fig. 11 for Subject V - 15 minutes. The average FIT for each 25-second window in the test set is computed and used to calculate the normalized root mean squared error (NRMSE) of the estimated FIT and respiratory rate obtained from this novel approach, as shown in Fig. 8 and Fig. 9.

The NRMSE is computed as shown in Equation 3, where $\hat{y}_i$ is the estimated average FIT or respiratory rate for one window, $y_i$ is the true average FIT or respiratory rate for one window, and $n$ is the total number of windows in the test set.

\[ \mathrm{NRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}}{\frac{1}{n}\sum_{i=1}^{n} y_i} \tag{3} \]

Additionally, Bland-Altman plots for the estimated FIT and respiratory rate for all subjects are computed. These plots are used to determine how well the predictions from this approach compare to the ground truth and whether this technique is interchangeable with the ground truth technique. Two exemplary plots are shown in Fig. 12 and Fig. 13 for Subjects V and VI.

The lowest classification accuracy and the highest NRMSE for both the FIT and respiratory rate are observed for the base models, when data from the test set subject is not included in the training (0 minutes), as shown in Fig. 7, Fig. 8 and Fig. 9. This low performance is attributed to the small dataset, which does not allow for a diverse representation of respiratory signals in the training set. The base model performs worst for Subject VI, which has the lowest base model accuracy at 40%, as shown in Fig. 7.
Subject VI has an inspiratory phase that is on average longer than the expiratory phase, and therefore an FIT that is on average above 0.5, as shown in Fig. 10. This is unlike most of the other subjects, whose true average FIT is less than 0.5, for example Subject V shown in the top plot of Fig. 11. Therefore, for the base model, where Subject VI is excluded from the training set and used solely as the test set, the neural network model performs poorly. This is rectified by fine-tuning the neural network model on example data from the test subject.

Fine-tuning the model on example data from the test subject results in up to 50% higher classification accuracy, and reduces the FIT NRMSE by up to 0.4 and the respiratory rate NRMSE by up to 2.6. This results in a minimum FIT NRMSE of 0.0990, reported on Subject VI with 20 minutes of fine-tuning data as shown in Fig. 8, and a minimum respiratory rate NRMSE of 0.2428, also reported on Subject VI with 20 minutes of fine-tuning data as shown in Fig. 9. The increase in classification accuracy and the decrease in FIT and respiratory rate NRMSE when fine-tuned on data from the test subject is attributed to the increase in the data's diversity, which allows the model to generalize better. From Figs. 7, 8, and 9, increasing the amount of fine-tuning data generally improves accuracy and reduces the FIT NRMSE and respiratory rate NRMSE. The gain in performance starts to plateau at 10 minutes, which suggests that at least 10 minutes of data should be collected to fine-tune the model.

This improved performance is evident in the Bland-Altman plots for different subjects. The FIT and respiratory rate Bland-Altman plots of the base models for all the subjects have negative proportional bias. This negative proportional bias diminishes, and tends towards a fixed bias, with increased data in the fine-tuned models, as shown in Fig. 12 and Fig. 13.

The noise seen in the predicted classifications in Fig. 11(b) is due to noisy input signals, as shown in Fig. 11(a). These noisy signals are included in the training set and affect the accuracy of the detected FIT. Fig. 10(a) has much cleaner data than Fig. 11(a), and also reports a much more accurate FIT estimation, as shown in Fig. 10(b). This indicates that the algorithm achieves higher test set accuracy and lower FIT and respiratory rate NRMSE for subjects whose fine-tuning data is cleaner, with fewer false peaks and valleys.
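The two agreement measures used in these results, the NRMSE of Equation 3 and the Bland-Altman statistics, can be sketched as follows. The function names are hypothetical, the numbers are made up for illustration, and the limits of agreement use the conventional bias ± 1.96 standard deviations, which the text does not spell out.

```python
import math
from statistics import mean, stdev


def nrmse(estimates, truths):
    """Equation 3: RMSE of the estimates divided by the mean of the true values."""
    n = len(truths)
    if n == 0 or n != len(estimates):
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)
    return rmse / (sum(truths) / n)


def bland_altman(estimates, truths):
    """Return (bias, lower limit, upper limit) of agreement.

    Assumed convention: limits are bias +/- 1.96 sample standard deviations
    of the differences; at least two windows are required.
    """
    diffs = [e - t for e, t in zip(estimates, truths)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Illustrative per-window average FIT values (not results from the paper).
true_fit = [0.40, 0.40, 0.45, 0.42]
estimated_fit = [0.50, 0.30, 0.45, 0.42]
print(nrmse(estimated_fit, true_fit))
print(bland_altman(estimated_fit, true_fit))
```

Dividing by the mean of the true values makes the error comparable between FIT, which lies between 0 and 1, and respiratory rate, which is an order of magnitude larger.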
This paper proposes a GRU neural network architecture to extract FIT and respiratory rate from an ECG signal. The GRU neural network is demonstrated to achieve high classification accuracy and, as a result, accurate FIT and respiratory rate estimation when fine-tuned on at least 10 minutes of clean subject data. The architecture could be implemented on wearable devices that collect ECG signal data. This technique would increase the utility of ECG signals and motivate continuous monitoring of pulmonary parameters from prevalent wrist-worn wearable devices with built-in ECG sensors.
References
, 5681–5684.
Caiani, E. G.; Porta, A.; Terrani, M.; Guzzetti, S.; Malliani, A.; and Cerutti, S. 1999. Minimal adaptive notch filter for respiratory frequency tracking. In Computers in Cardiology 1999. Vol. 26 (Cat. No.99CH37004), 511–514.
Charlton, P. H.; Birrenkott, D. A.; Bonnici, T.; Pimentel, M. A. F.; Johnson, A. E. W.; Alastruey, J.; Tarassenko, L.; Watkinson, P. J.; Beale, R.; and Clifton, D. A. 2018. Breathing Rate Estimation From the Electrocardiogram and Photoplethysmogram: A Review. IEEE Reviews in Biomedical Engineering 11: 2–20.
Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).
Chollet, F.; et al. 2015. Keras. https://keras.io.
Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014.
Dominguez, M. 2020. Pressure-Volume Curve - Respiratory - Medbullets Step 1. URL https://step1.medbullets.com/respiratory/117003/pressure-volume-curve.
Gao, Y.; Yan, H.; Xu, Z.; Xiao, M.; and Song, J. 2017. A principal component analysis based data fusion method for ECG-derived respiration from single-lead ECG. Australasian Physical & Engineering Sciences in Medicine
Computing in Cardiology 2013, 461–464.
Goldberger, A. L.; Amaral, L. A. N.; Glass, L.; Hausdorff, J. M.; Ivanov, P. C.; Mark, R. G.; Mietus, J. E.; Moody, G. B.; Peng, C.-K.; and Stanley, H. E. 2000. PhysioBank, PhysioToolkit, and PhysioNet. Circulation
American Journal of Physiology-Regulatory, Integrative and Comparative Physiology
Electronics Letters
CoRR abs/1412.6980.
Langley, P.; Bowers, E. J.; and Murray, A. 2010. Principal Component Analysis as a Tool for Analyzing Beat-to-Beat Changes in ECG Features: Application to ECG-Derived Respiration. IEEE Transactions on Biomedical Engineering
th Circulation
Australasian Physical & Engineering Sciences in Medicine 41: 429–443.
Travaglini, A.; Lamberti, C.; DeBie, J.; and Ferri, M. 1998. Respiratory signal derived from eight-lead ECG. In Computers in Cardiology 1998. Vol. 25 (Cat. No.98CH36292), 65–68.
Widjaja, D.; Varon, C.; Dorado, A.; Suykens, J. A. K.; and Van Huffel, S. 2012. Application of Kernel Principal Component Analysis for Single-Lead-ECG-Derived Respiration.