Automatic Detection of ECG Abnormalities by using an Ensemble of Deep Residual Networks with Attention
Yang Liu, Runnan He, Kuanquan Wang, Qince Li, Qiang Sun, Na Zhao, Henggui Zhang
School of Computer Science and Technology, Harbin Institute of Technology (HIT), 150001 Harbin, China
School of Physics and Astronomy, The University of Manchester, Manchester M13 9PL, UK
SPACEnter Space Science and Technology Institute, Shenzhen 518117, China
International Laboratory for Smart Systems and Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang, 110004, China
The Department of Pharmacology, Beijing Electric Power Hospital, Beijing, 100073, China
+ Joint first author
* [email protected]
Abstract.
Heart disease is one of the most common causes of morbidity and mortality. The electrocardiogram (ECG) is widely used for diagnosing heart diseases because it is simple and non-invasive. Automatic ECG analysis technologies are expected to reduce human workload and increase diagnostic efficacy. However, several challenges remain before this goal can be achieved. In this study, we develop an algorithm to identify multiple abnormalities from 12-lead ECG recordings. In the algorithm pipeline, several preprocessing methods are first applied to the ECG data for denoising, augmentation and balancing the numbers of recordings across classes. For efficiency and consistency of data length, the recordings are padded or truncated to a medium length, where the padding/truncating time windows are selected randomly to suppress overfitting. Then, the ECGs are used to train deep neural network (DNN) models with a novel structure that combines a deep residual network with an attention mechanism. Finally, an ensemble model is built from these trained models to make predictions on the test data set. Our method is evaluated on the test set of the First China ECG Intelligent Competition dataset using the F1 metric, the harmonic mean of precision and recall. The resulting overall F1 score of the algorithm is 0.875, showing promising performance and potential for practical use.
Keywords: heart disease, electrocardiogram, automatic diagnosis, deep neural networks
Introduction
Heart diseases, mainly manifested as disordered patterns of atrial and ventricular electrical excitation, are regarded as the leading cause of morbidity and mortality. The electrocardiogram (ECG) is a common and noninvasive tool for diagnosing heart conditions. However, analyzing ECGs manually is time-consuming and error-prone in practice, so computer-aided algorithms may offer a promising way to improve the efficiency and accuracy of ECG analysis.
Algorithms for ECG analysis typically contain three steps: preprocessing, feature extraction and classification [1-5]. Among these, feature extraction is a critical step, for which many methods have been proposed, such as morphology information [2], temporal and frequency features [3], high order statistical features [4] and wavelet features [5]. However, these algorithms still fall short of good performance in detecting abnormalities in ECGs. Recently, deep neural networks (DNNs) have shown outstanding performance in automatic feature extraction, leading to a dramatic breakthrough in a range of fields associated with computer vision [6]. Therefore, in the field of ECG analysis, many studies have applied DNNs, such as the convolutional neural network (CNN) [7], the deep residual network [6] and the recurrent neural network (RNN) [8], to the detection of heart diseases, and have achieved impressive results. However, because of the long recording length, low signal quality and pathological diversity of ECG recordings, accurate feature extraction remains a challenge.
In this study, we propose a novel deep residual neural network with an attention mechanism to detect a series of abnormalities from 12-lead ECG recordings.
The deep residual neural network learns local features from the ECG waveforms, while the attention mechanism determines the relevance of the features from each part and summarizes them into a single feature vector that is used for classification. This combination proves effective for feature learning in long ECG recordings and robust when the signal is partially corrupted.
Materials
The First China ECG Intelligent Competition dataset contains about 15,000 12-lead ECG recordings, of which 6,500 are for training and 8,500 for testing. The recordings vary in length from 9 to 90 seconds and are sampled at 500 Hz (Fs = 500 Hz). Each recording is labeled with one or more of nine types: normal, atrial fibrillation (AF), first degree atrioventricular block (FDAVB), complete right bundle branch block (CRBBB), left anterior fascicular block (LAFB), premature ventricular contraction (PVC), premature atrial contraction (PAC), early repolarization pattern changes (ER) and T-wave changes (TWC).
Method
Preprocessing
Baseline Wander Removal.
Baseline wander results from low-frequency noise in the ECG signal. It can influence the diagnosis of many diseases that manifest as low-frequency changes in the ECG signal, e.g., S-T segment changes. We remove the baseline wander by first estimating it and then subtracting it from the original signal. The estimate is based on a moving average, which is a windowed low-pass filter with the cut-off frequency calculated by
$$f_{co} = 0.443 \times \frac{f_s}{N} \qquad (1)$$
where $f_{co}$ indicates the cut-off frequency, $f_s$ indicates the sampling frequency, and $N$ indicates the window size. Generally, the cut-off frequency should not exceed the slowest heart rate, which is typically 40 beats/minute, i.e., 0.67 Hz. Considering the fluctuation of the heart rate, the cut-off frequency should be slightly lower, approximately 0.5 Hz.
Powerline Interference and Muscle Noise Removal.
Powerline interference generally results from the alternating current (AC) in the environment. Its frequency is therefore usually 50 or 60 Hz, depending on the standard of the AC power supply system. Muscle noise, i.e., electromyographic noise, is caused by the electrical activity produced by muscle contraction. Unlike powerline interference, muscle noise is much more irregular because of the randomness of muscle activity. The frequency components of muscle noise overlap widely with those of the ECG and can be even higher. In this work, we remove both kinds of noise by wavelet denoising based on a 5-level "db4" wavelet transform and soft-thresholding [9].
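Returning to the baseline wander step, the estimate-and-subtract procedure and Eq. (1) can be illustrated with a minimal pure-Python sketch. The function names are hypothetical, and the centred window with shortened edges is an assumption, since the text does not say how the signal borders are handled.

```python
def moving_average_cutoff(fs, window):
    """Approximate cut-off frequency (Hz) of a moving-average filter, Eq. (1)."""
    return 0.443 * fs / window


def window_for_cutoff(fs, cutoff):
    """Window size N (in samples) whose Eq. (1) cut-off is closest to `cutoff`."""
    return max(1, round(0.443 * fs / cutoff))


def remove_baseline(signal, window):
    """Subtract a centred moving-average baseline estimate from `signal`.

    Assumption: the window spans `window // 2` samples on each side of the
    current sample and is simply shortened at the borders of the recording.
    """
    n = len(signal)
    half = window // 2
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    cleaned = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        baseline = (prefix[hi] - prefix[lo]) / (hi - lo)
        cleaned.append(signal[i] - baseline)
    return cleaned
```

With Fs = 500 Hz, `window_for_cutoff(500, 0.5)` returns a window of 443 samples for a cut-off near 0.5 Hz, while a 250-sample window corresponds to a cut-off of about 0.886 Hz under Eq. (1).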
Padding or Truncating Signals to the Same Length.
The lengths of the ECG recordings in the dataset vary widely, ranging from 9 s to 90 s. For batch processing by a DNN model, the recordings in a batch should have the same length. There are three ways to address this problem:
• Padding all recordings to the longest length. This method avoids the loss of information during the length-unifying process. However, for most of the recordings, the new length is several times longer than the original length, which significantly increases the processing time of a DNN model.
• Truncating signals to the shortest length. In contrast to the padding method, this method reduces the processing time significantly. However, truncation inevitably leads to a loss of information, especially when the truncated part makes up a large proportion of the original signal.
• Grouping the recordings that have the same length. Compared with the two methods above, this method does not change the original signals and thus causes no loss or distortion of their information. However, as the distribution of recording lengths is extremely uneven, a group may contain just one or two recordings, resulting in widely varying batch sizes. Besides, there is some uncertainty in the prediction of a DNN model when it receives a recording with an unseen length.
In this work, we make a trade-off between recording integrity and computational complexity by padding or truncating the signals to a medium length. As more than 90% of the recordings in the dataset are no longer than 30 s, we choose 30 s (i.e., 15,000 sampling points) as the target length. To minimize the effect of padding on the subsequent interpretation, the padding value is set to zero, which equals the baseline of the ECG signals. The obvious difference between the padded part and the original part helps a machine learning model to distinguish the original part within the whole recording. However, the padding and truncating operations can be applied at different positions, which may affect the interpretation by a DNN model in different ways. We discuss this in the following section on data augmentation and balancing.
Redistribution of Signal Lengths.
Even though the recordings are padded or truncated to the same length, differences in the original length distributions between classes can still bias the discrimination of a DNN model. For example, recordings longer than 20 s account for less than 1% of the normal class but more than 10% of the PAC class. As a result, a DNN model may treat the padding length as a feature for distinguishing between these classes, which is clearly unreasonable. Therefore, we propose a method that gives all classes the same distribution of recording lengths. We first compute the distribution of recording lengths over the whole dataset. Because of the truncating operation in our pipeline, recordings longer than 30 s are all counted as 30 s. Then, the global distribution is used as the target distribution, and the recordings in each class are augmented to follow it. For lengths that exist in the target distribution but not in the original distribution of a class, we truncate longer recordings of that class to create recordings with these lengths.
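The bookkeeping behind this redistribution can be sketched as follows. This is only an illustration under stated assumptions: lengths are binned to whole seconds (the text does not give the bin width), and the function names and the per-class target size are hypothetical.

```python
from collections import Counter

TARGET_SECONDS = 30  # recordings longer than this are counted as 30 s


def clip_length(seconds, cap=TARGET_SECONDS):
    """Bin a recording length to whole seconds (assumed) and cap it at `cap`."""
    return min(int(round(seconds)), cap)


def length_distribution(lengths_s, cap=TARGET_SECONDS):
    """Fraction of recordings at each capped length over the whole dataset."""
    counts = Counter(clip_length(s, cap) for s in lengths_s)
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}


def target_counts(distribution, class_size):
    """Recordings a class needs at each length to follow `distribution`."""
    return {length: round(p * class_size) for length, p in distribution.items()}


def lengths_to_create(distribution, class_lengths_s, cap=TARGET_SECONDS):
    """Lengths in the target distribution that a class does not yet contain.

    These are the lengths that would be produced by truncating longer
    recordings of the same class.
    """
    present = {clip_length(s, cap) for s in class_lengths_s}
    return [length for length in distribution if length not in present]
```

For instance, a dataset with lengths 10 s, 10 s, 20 s and 45 s yields the target distribution {10: 0.5, 20: 0.25, 30: 0.25}; a class holding only 10 s and 40 s recordings would then need truncated 20 s recordings.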
Data Augmentation and Balancing.
As discussed above, a recording can be padded or truncated in different time windows. To make a DNN model insensitive to the position of the padding or truncation, we pad or truncate each recording at different positions to create more samples for training. In other words, we augment the dataset by varying the way recordings are padded and truncated, and the way is selected randomly in our study. A recording shorter than the target length can be padded at both ends, with various schemes for choosing the padding length at each end. This introduces more randomness into the positions of the padding, which helps a DNN model learn to ignore the padded parts and focus on the original parts of the recordings. For truncated recordings, the method generates more training samples from different parts of the original recordings, which makes better use of the limited data and enhances the model's discrimination ability. Besides, the numbers of recordings are very imbalanced between classes, and the data augmentation method can also be used to balance the dataset. Generally speaking, recordings from classes with fewer recordings are augmented more times than those from classes with more recordings, so that each class has approximately the same number of recordings.
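A minimal sketch of the random padding/truncating augmentation described above, for a single lead stored as a Python list. The function names are hypothetical; drawing the offsets uniformly and balancing towards the size of the largest class are assumptions, since the text only states that the selection is random and that class sizes end up approximately equal. For a 12-lead recording, the same offset would be applied to every lead.

```python
import random

FS = 500               # sampling rate in Hz, from the dataset description
TARGET_LEN = 30 * FS   # 30 s target length, i.e. 15000 samples


def random_fit(signal, target_len=TARGET_LEN, rng=random):
    """Zero-pad or truncate `signal` to `target_len` at a random position."""
    n = len(signal)
    if n >= target_len:
        # Truncate: keep a randomly placed window of target_len samples.
        start = rng.randint(0, n - target_len)
        return list(signal[start:start + target_len])
    # Pad: split the missing samples randomly between the two ends.
    missing = target_len - n
    left = rng.randint(0, missing)
    return [0.0] * left + list(signal) + [0.0] * (missing - left)


def copies_per_recording(class_sizes):
    """Augmentation factor per class (assumed target: the largest class size)."""
    largest = max(class_sizes.values())
    return {name: max(1, round(largest / size))
            for name, size in class_sizes.items()}
```

Calling `random_fit` several times on the same recording yields differently positioned samples, and `copies_per_recording` indicates how many such samples to draw per recording of each class.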
Model Architecture
Owing to their automatic feature-learning ability, DNNs can reduce the human workload of extracting features from raw ECGs. A DNN model is expected to learn a compact, robust but comprehensive representation of a raw ECG recording. In this work, we propose a novel DNN architecture that combines a residual convolutional network and an attention mechanism, as shown in Fig. 1.
Fig. 1. The architecture of the proposed deep residual network with attention mechanism. $L_i$ indicates the i-th local feature vector. $\alpha_i$ indicates the attention value for the i-th local feature vector. $V$ indicates the global feature vector.
In our proposed architecture, the feature learning process can be divided into two stages: the local feature learning stage and the global feature learning stage. A local feature vector, learned by a stack of residual convolutional modules, characterizes a short fragment of an ECG recording, while the global feature vector, learned by an attention mechanism, is a summary of the sequence of local feature vectors. After feature learning, the global feature vector is input into a fully-connected layer to predict the probabilities that a recording belongs to each class. We give more details about this architecture below.
In the local feature learning stage, a raw ECG recording is first input into a 1D convolutional layer. The output feature map is then processed sequentially by 7 residual convolutional modules, which are considered to have good properties for avoiding the degradation problem in DNNs [6]. Each residual module is constructed from 9 layers: 2 batch normalization layers, 2 dropout layers, 2 ReLU activation layers, 2 1D convolutional layers and an addition-based merging layer, in the order shown in Fig. 1. The kernel size of each convolutional layer is 16. The kernel number in the first convolutional layer is 16, and it grows by 16 for every two residual modules. There is also a max-pooling layer following each residual module for compression of the intermediate feature maps. As a result, the length of a feature map output by the local feature learning part will be 1/2 of the input length.
In the global feature learning stage, an attention mechanism is utilized to learn an attention distribution over the sequence of local features.
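The relation between the local feature vectors $L_i$, the attention values $\alpha_i$ and the global feature vector $V$ in Fig. 1 can be sketched in a few lines of pure Python. How the attention values are computed is not specified in the text, so the scoring by a dot product with a learned vector `w` and the softmax normalization are assumptions made only for illustration, as are the function names.

```python
import math


def softmax(scores):
    """Normalize raw scores into a distribution that sums to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(local_features, w):
    """Summarize a sequence of local feature vectors into one global vector.

    local_features: list of n vectors L_1..L_n, each of the same dimension.
    w: scoring vector of that dimension (assumed form of the attention).
    Returns (alphas, V), where V is the sum of alpha_i * L_i over i.
    """
    scores = [sum(wk * xk for wk, xk in zip(w, feat)) for feat in local_features]
    alphas = softmax(scores)
    dim = len(local_features[0])
    global_vec = [sum(a * feat[k] for a, feat in zip(alphas, local_features))
                  for k in range(dim)]
    return alphas, global_vec
```

With a zero scoring vector the attention is uniform and $V$ reduces to the mean of the local features; a strongly peaked attention makes $V$ close to a single $L_i$, which is how padded or noisy segments can be down-weighted.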
Because diseases may be paroxysmal and recordings contain padded parts and noise, only a few episodes in an ECG recording may be relevant for the diagnosis. In view of this, the attention distribution is supposed to manifest the relevance of each part of the ECG recording for the classification. Then, the local features are summed, weighted by the attention, into a single feature vector. Finally, a fully-connected layer is used to learn a classifier based on the global features. This layer contains 9 cells corresponding to the 9 categories. As a recording may belong to more than one category, the output of each cell is processed by a sigmoid activation function so that each prediction is made independently.
Model Training
Based on the architecture described above, we train a series of models with different procedures. As shown in Fig. 2, 4 different pipelines (labeled with numbers) are used in our research for model training. Most of the differences between the pipelines lie in the preprocessing steps, including data normalization, denoising, data augmentation and balancing between classes. The window size for baseline wander removal is 250 samples, with the cut-off frequency $f_{co}$ given by Eq. (1).
Fig. 2.
The model training workflow that combines models from different pipelines to make an ensemble model.
[Fig. 2 block labels: ECG Training set; Normalization; Denoising; Data Augmentation; Balancing between Classes; Mixture with CPSC 2018 Dataset; Weights Transfer; 10-fold Cross Validation; Ensemble Model]
Results and Discussion
The metrics are designed for the evaluation of multi-label classification, where a single recording may belong to more than one class. The predictive accuracy of the algorithm for each class is measured by the F1 score. In addition, an overall F1 score is calculated by averaging the per-class scores. The formulas for these metrics are given below. For each class $j$, four values count the samples with different prediction results, namely true positive (TP), false positive (FP), true negative (TN) and false negative (FN):
$$TP_j = |\{x_i \mid y_j \in Y_i,\ y_j \in P(x_i),\ 1 \le i \le N\}| \qquad (2)$$
$$FP_j = |\{x_i \mid y_j \notin Y_i,\ y_j \in P(x_i),\ 1 \le i \le N\}| \qquad (3)$$
$$TN_j = |\{x_i \mid y_j \notin Y_i,\ y_j \notin P(x_i),\ 1 \le i \le N\}| \qquad (4)$$
$$FN_j = |\{x_i \mid y_j \in Y_i,\ y_j \notin P(x_i),\ 1 \le i \le N\}| \qquad (5)$$
where $x_i$ indicates a sample for prediction, $y_j$ is the label for class $j$, $Y_i$ is the annotated label set of $x_i$, and $P(x_i)$ is the predicted label set of $x_i$. The precision, recall and F1 of each class can be calculated by
$$precision_j = \frac{TP_j}{TP_j + FP_j} \qquad (6)$$
$$recall_j = \frac{TP_j}{TP_j + FN_j} \qquad (7)$$
$$F1_j = \frac{2 \cdot precision_j \cdot recall_j}{precision_j + recall_j} \qquad (8)$$
The overall F1 score is the arithmetic mean of the scores of the nine classes:
$$F1 = \frac{1}{9} \sum_{j=1}^{9} F1_j \qquad (9)$$
The results show that the overall F1 score of the proposed classifier on the hidden test set is 0.875, with the detailed scores shown in Table 1.
Table 1.
Results of the ECG abnormalities classification on the entire test set.
Normal  AF  FDAVB  CRBBB  LAFB  PVC  PAC  ER  TWC  Total
F1
The results in Table 1 show that the classifier achieves good performance for AF, FDAVB, CRBBB, PVC and PAC, with F1 scores all above 0.9. However, the identification of LAFB, ER and TWC is weaker, with F1 scores just over 0.7, which we attribute to the relatively small amount of data for these classes.
Conclusions
In this paper, two contributions have been made to automatic ECG classification. (1) The random padding/truncating method is a simple strategy that not only helps to balance processing efficiency against recording integrity, but also provides ways to augment and balance the dataset. Furthermore, the randomness this method introduces into the padding/truncating positions can prevent a DNN from overfitting to the padding/truncating manner. (2) The proposed workflow, which combines different model training pipelines into an ensemble model, achieved better results than each of the separate pipelines. The classification results show that the proposed algorithm may provide a potential means of computer-aided diagnosis for clinical applications.
Acknowledgements
The work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61572152 (to HZ), 61571165 (to KW), 61601143 (to QL) and 81770328 (to QL), the Science Technology and Innovation Commission of Shenzhen Municipality under Grant Nos. JSGG20160229125049615 and JCYJ20151029173639477 (to HZ), and the China Postdoctoral Science Foundation under Grant No. 2015M581448 (to QL).
References
1. Ari, S., Das, M.K., Chacko, A.: ECG signal enhancement using S-Transform. IEEE Trans. Biomed. Eng. 43 (6), 649–660 (2013).
2. De Chazal, P., Reilly, R.B.: A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. IEEE Trans. Biomed. Eng. 53 (12), 2535–2543 (2006).
3. Yang, H., Kan, C., Liu, G., Chen, Y.: Spatiotemporal differentiation of myocardial infarctions. IEEE Trans. Autom. Sci. Eng. 10 (4), 938–947 (2013).
4. Dima, S.-M., Panagiotou, C., Mazomenos, E.B., Rosengarten, J.A., Maharatna, K., Gialelis, J.V., Curzen, N., Morgan, J.: On the detection of myocardial scar based on ECG/VCG analysis. IEEE Trans. Biomed. Eng. 60 (12), 3399–3409 (2013).
5. Yu, S.N., Chou, K.T.: Integration of independent component analysis and neural networks for ECG beat classification. Expert Syst. Appl. 34, 2841–2846 (2008).
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE, Las Vegas, NV, USA (2016).
7. He, R., Wang, K., Zhao, N., Liu, Y., Yuan, Y., Li, Q., Zhang, H.: Automatic detection of atrial fibrillation based on continuous wavelet transform and 2D convolutional neural networks. Frontiers in Physiology 9, 1206 (2018).
8. Oh, S.L., Ng, E.Y., San Tan, R., Acharya, U.R.: Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Computers in Biology and Medicine 102, 278–287 (2018).
9.