Automatic Feature Extraction for Phonocardiogram Heartbeat Anomaly Detection using WaveNetVAE

Robert-George Colt, Csongor-Huba Várady, Riccardo Volpi, and Luigi Malagò

Romanian Institute of Science and Technology, Cluj-Napoca, Romania
Babes-Bolyai University, Cluj-Napoca, Romania
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
{robert.colt,varady,volpi,malago}@rist.ro

Abstract.
We focus on automatic feature extraction for raw audio heartbeat sounds, aimed at anomaly detection applications in healthcare. We learn features with the help of an autoencoder composed of a 1D non-causal convolutional encoder and a WaveNet decoder, trained with a modified objective based on variational inference, employing the Maximum Mean Discrepancy (MMD). Moreover, we model the latent distribution using a Gaussian chain graphical model to capture the temporal correlations which characterize the encoded signals. After training the autoencoder on the reconstruction task in an unsupervised manner, we test the significance of the learned latent representations by training an SVM to predict anomalies. We evaluate the methods on a problem proposed by the PASCAL Classifying Heart Sounds Challenge and compare with results in the literature.
Keywords: Heartbeats · Autoencoder · WaveNet · Latent Representations · Anomaly Detection.
Presented at the PharML 2020 Workshop, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD).

Introduction

Anomaly detection is usually characterized by a class imbalance between normal and anomalous data, i.e., outliers differing from the majority of the data. In healthcare this problem is particularly relevant: early diagnoses can trigger expedited emergency responses in time-critical situations, with the potential to improve quality of life [27,16,24]. Cardiovascular diseases are the leading cause of death worldwide [16], and the problem of anomaly detection in heartbeats has been extensively studied in the literature [24,10,13,14]. Detecting irregularities in ECG signals can be approached efficiently, and with considerable success, by machine learning methods [24,16], thanks to the low level of noise in these signals, which additionally allows for a low sampling rate. Classification of heartbeat anomalies from PhonoCardioGram (PCG) audio signals is a considerably more difficult task. On the other hand, PCG data are easier to obtain, and successful anomaly detection algorithms based on PCG can be more pervasive in society due to the wide availability of audio recording devices.

The PASCAL Classifying Heart Sounds Challenge 2011 [3] introduced the problem of identifying unhealthy heartbeat sounds in PCG signals. Several approaches to this problem are fully supervised and rely on expert knowledge [8,6,18,1], with ad-hoc feature design, using specific wavelet transformations to identify anomalous frequencies [2], or decision trees based on expert heuristics [5]. Malik et al. [17] exploited the periodicity and average heartbeat length to highlight anomalous behavior.

In this paper we follow a different perspective, in which features are automatically extracted through an autoencoder trained on the reconstruction task. Wang et al. [28] use CNNs and autoencoders to perform anomaly detection on time-series physiological data; Pereira et al. [20] perform unsupervised LSTM-based representation learning and anomaly detection in ECG sequences. Rushe et al. [23] introduce an anomaly detection algorithm for raw audio data, based on the ability of WaveNet [19] to predict the next sample of a normal signal. We aim at combining both approaches: in particular, we would like to have a well-behaved latent representation and at the same time leverage the expressivity of a WaveNet autoregressive model. Given the importance of interpretable models in medicine [4], we aim at learning a set of expressive and compact features in the latent space by Variational Inference [9].
We demonstrate that relevant features can be automatically extracted through the reconstruction task, paving the way towards semi-supervised approaches.
Method

WaveNetAE [7] has been proven capable of learning to reconstruct high-fidelity natural sounds such as music or human speech. It consists of an encoder employing non-causal convolutional layers with skip connections, and of a conditional WaveNet [19] decoder. Training minimizes the negative log-likelihood. Unlike plain autoencoders, Variational AutoEncoders (VAE) [12,22], based on Variational Inference (VI) [9], exhibit advantageous properties: by using a multivariate Gaussian in the latent space, they regularize it and are able to learn compact representations.

Usually the Gaussian distribution is chosen with independent priors, i.e., with a diagonal covariance matrix. In order to model a temporal correlation between the latent variables, we introduce a Gaussian graphical model [15] characterized by a chain structure over the time dimension in the latent space of the encoder, cf. [21]. The overall architecture is shown in Figure 1b and consists of an encoder-decoder pair. The decoder is a WaveNet model, while the distribution for the approximate posterior in the latent space is either a Gaussian Independent model (GI) or a Gaussian Chain model (GC).
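As a concrete illustration, a sample from such a chain-structured posterior can be drawn by ancestral sampling with the reparameterization trick. The first-order parameterization below (per-step means `mu`, scales `sigma`, and coupling coefficients `a`) is a hypothetical choice for exposition; the paper states only that the model is a Gaussian chain with tridiagonal precision, not this exact factorization.

```python
import numpy as np

def sample_gaussian_chain(mu, sigma, a, rng):
    """Draw one sample from a first-order Gaussian chain over time:
        z_0 ~ N(mu_0, sigma_0^2)
        z_t | z_{t-1} ~ N(mu_t + a_t * z_{t-1}, sigma_t^2)
    whose joint precision matrix is (block-)tridiagonal.
    mu, sigma, a: hypothetical encoder outputs of shape (T, D); a[0] is unused.
    """
    T, D = mu.shape
    eps = rng.standard_normal((T, D))   # reparameterization noise
    z = np.empty((T, D))
    z[0] = mu[0] + sigma[0] * eps[0]
    for t in range(1, T):               # chain dependency along the time axis
        z[t] = mu[t] + a[t] * z[t - 1] + sigma[t] * eps[t]
    return z
```

With all coupling coefficients set to zero the chain reduces to the Gaussian Independent posterior, so under this parameterization the GC model strictly generalizes the GI model.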
A known problem when using a probability distribution in the latent space of a Variational Autoencoder with a powerful autoregressive decoder such as WaveNet is that the KL term in the Evidence Lower Bound (ELBO) objective

L(x, \theta, \phi) = E_{q_\theta(z|x)}[\ln p_\phi(x|z)] - D_{KL}(q_\theta(z|x) \| p(z))    (1)

might lead the optimization towards posterior collapse, as also reported in [7]. To solve this problem we propose replacing the KL divergence in the ELBO objective with the Maximum Mean Discrepancy (MMD) [25], a dissimilarity measure between the aggregate posterior and the prior distribution [30,26]:

D_{MMD}(q \| p) = E_{p(z), p(z')}[k(z, z')] - 2 E_{q(z), p(z')}[k(z, z')] + E_{q(z), q(z')}[k(z, z')],
k_{gaussian}(z, z') = e^{-\|z - z'\|^2 / \sigma^2},    k_{module}(z, z') = \|z - z'\| - \|z\| - \|z'\|.    (2)

In our experiments we used the Gaussian kernel (k_gaussian) for the GI models and for the GC models trained only on normal data. For the GC models trained on both normal and anomalous data, the best results were obtained using the module kernel (k_module) to compute the MMD.

Fig. 1: (a) Example of heartbeat signals. (b) Illustration of a Gaussian chain model with tridiagonal precision matrix.

We train models in two ways: with all the samples (all) or only with normal samples (n). We classify the data using a supervised SVM on the frozen latent space of the pretrained WaveNetAE (abbreviated as AE), GI and GC models, which are used as feature extractors.
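A minimal NumPy sketch of the MMD dissimilarity introduced above, using the Gaussian kernel and the plug-in estimator over mini-batches of posterior and prior samples; the bandwidth σ and the batch shapes are hypothetical.

```python
import numpy as np

def gaussian_kernel(z1, z2, sigma=1.0):
    # k(z, z') = exp(-||z - z'||^2 / sigma^2), evaluated pairwise
    sq_dists = ((z1[:, None, :] - z2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

def mmd(z_q, z_p, kernel=gaussian_kernel):
    """Plug-in estimate of D_MMD(q || p) from samples
    z_q ~ q(z|x) (encoder outputs) and z_p ~ p(z) (prior draws)."""
    return (kernel(z_p, z_p).mean()
            - 2.0 * kernel(z_q, z_p).mean()
            + kernel(z_q, z_q).mean())
```

Since the estimate only needs samples from the two distributions, the term can replace the KL divergence in Eq. (1) even when the posterior density itself is awkward to evaluate, which is the motivation given above.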
Experiments

We evaluate our methods on Dataset B of the PASCAL Classifying Heart Sounds Challenge, comprising 507 recordings collected at a 4,000 Hz sampling frequency and divided into three categories: Normal, Murmur and Extrasystole, see Fig. 1a. Our preprocessing consists of 3 steps: 1) we clip the signal at the 99th percentile of its amplitude; 2) we apply the filtering recommended by the challenge [3], to smooth out the clipping and to remove high-frequency noise; finally 3) we rescale the data to [−1, 1]. The MMD term in the training objective is weighted by a coefficient β.

In order to classify the latent representations, we experiment with different crop lengths (6,144; 9,216; 12,288 samples) and numbers of random crops of the raw signal. We report best results with a crop length of 12,288 and 10 crops per signal. We employ majority voting between the crops of a single signal in order to assign a label. We trained the SVM classifiers both on the raw latent representations and on the Fast Fourier Transforms taken on each channel separately. The best results are obtained with the latter method and are reported in Table 1. We report averages over 3 training checkpoints (spaced by 5k steps for VAEs and by 7k steps for AEs) around the step obtaining the best classification performance on validation. We perform a 3-class classification and compute Total Precision (TP) as the sum of the precisions of the 3 classes, while the other metrics are computed on a binary classification task, Murmur and Extrasystole taken together being the positive class and Normal the negative class, as specified by the challenge [3]. 'all' specifies that the model has been trained with both anomalous and normal samples; 'n' indicates that the model was trained only with normal heartbeats. The models ending in 'bn' use batch normalization in their encoder. VAE models trained on all samples are more effective than models trained only on normal data, most likely due to the limited dataset size. However, the AE models seem to perform better when trained only on normal data, obtaining good overall performance. GC models trained on all samples exhibit good overall performance as well, similarly to AE. The chain model in the temporal dimension (GC vs GI) improves the latent space representation, which proves more meaningful for the anomaly detection task. Overall, the results obtained with the proposed methods are better than those of other works dealing with this particular challenge, see Table 1.

Table 1: Anomaly detection for the different models described in the paper; results on the test set averaged over the last 3 saved models. C is the regularization coefficient of an SVM with Gaussian kernel, chosen based on validation. Abbreviations: YI - Youden's Index, TP - Total Precision, Spec. - Specificity of heart problem, Sens. - Sensitivity of heart problem, DP - Discriminant Power. Second best results are highlighted in bold while best results are also underlined.

Model | C | YI | TP | Spec. | Sens. | DP | AUC
AE-all | 0.55 | 0.27 | … [remaining numeric entries lost in extraction]

Conclusions

We demonstrate how relevant features for PCG audio signals can be automatically extracted through WaveNet autoencoders. We introduce a WaveNetVAE model, trained using MMD in the latent space, and we show how the introduced regularization produces a benefit in terms of SVM classification in the latent space. Additionally, we found that Batch Normalization in the encoder improves the latent representations of the WaveNetVAE models. We obtained better results than other works dealing with the PASCAL Classifying Heart Sounds Challenge 2011, evaluated with several metrics of interest for the challenge. We show how a VAE or AE model can be used to automatically extract features relevant to the anomaly detection task, without the need for expert domain knowledge. We chose a simple method (an SVM) to classify the frozen latent space of the heartbeats, in order to probe the latent representations learned by the autoencoders. The approach presented paves the way towards semi-supervised/self-supervised training for detecting anomalies in audio signals.
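The classification protocol described above (per-channel FFT features of the frozen latent codes, an SVM trained on individual crops, then majority voting over the crops of each signal) can be sketched as follows. The latent shapes, the synthetic toy data, and all helper names are hypothetical; only the pipeline structure mirrors the text.

```python
import numpy as np
from sklearn.svm import SVC

def fft_features(crops):
    """crops: (n_crops, n_channels, n_steps) frozen latent codes.
    FFT magnitude per channel, flattened to one feature vector per crop."""
    mag = np.abs(np.fft.rfft(crops, axis=-1))
    return mag.reshape(len(crops), -1)

def majority_vote(crop_labels):
    # assign a single label per signal from its per-crop predictions
    values, counts = np.unique(crop_labels, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical toy latents: the "anomalous" class carries an oscillation.
rng = np.random.default_rng(0)

def toy_signal(label, n_crops=10, channels=4, steps=64):
    x = rng.standard_normal((n_crops, channels, steps))
    if label == 1:
        x += 3.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, steps))
    return x

train = [(toy_signal(y), y) for y in [0, 1] * 20]
X = np.concatenate([fft_features(s) for s, _ in train])
y = np.repeat([lab for _, lab in train], 10)

# Gaussian-kernel SVM on the crop-level FFT features; C chosen on validation
svm = SVC(kernel="rbf", C=0.55).fit(X, y)

# per-signal label: majority vote over the 10 crop predictions
pred = majority_vote(svm.predict(fft_features(toy_signal(1))))
```

The vote makes the signal-level decision robust to occasional misclassified crops, which matters here because random crops may miss the anomalous portion of a heartbeat recording.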
Acknowledgments

This work was supported by the DeepRiemann project, co-funded by the European Regional Development Fund and the Romanian Government through the Competitiveness Operational Program 2014-2020, Action 1.1.4, project ID P 37 714, contract no. 136/27.09.2016.
References
1. Avendano-Valencia, L., Godino-Llorente, J., Blanco-Velasco, M., Castellanos-Dominguez, G.: Feature extraction from parametric time–frequency representations for heart murmur detection. Annals of Biomedical Engineering, 168–193 (2018)
5. Chakir, F., Jilbab, A., Nacir, C., Hammouch, A.: Phonocardiogram signals processing approach for PASCAL classifying heart sounds challenge. Signal, Image and Video Processing (6), 1149–1155 (2018)
6. Deng, Y., Bentley, P.J.: A robust heart sound segmentation and classification algorithm using wavelet decomposition and spectrogram. In: Workshop Classifying Heart Sounds, La Palma, Canary Islands. pp. 1–6 (2012)
7. Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., Simonyan, K.: Neural audio synthesis of musical notes with WaveNet autoencoders. In: International Conference on Machine Learning. pp. 1068–1077 (2017)
8. Gomes, E.F., Pereira, E.: Classifying heart sounds using peak location for segmentation and feature construction. In: Workshop Classifying Heart Sounds, La Palma, Canary Islands. pp. 480–92 (2012)
9. Graves, A.: Practical variational inference for neural networks. In: Advances in Neural Information Processing Systems. pp. 2348–2356 (2011)
10. Ismail, S., Siddiqi, I., Akram, U.: Localization and classification of heart beats in phonocardiography signals—a comprehensive review. EURASIP Journal on Advances in Signal Processing (1), 26 (2018)
11. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
12. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
13. Krishnan, P.T., Balasubramanian, P., Umapathy, S.: Automated heart sound classification system from unsegmented phonocardiogram (PCG) using deep neural network. Physical and Engineering Sciences in Medicine, pp. 1–11 (2020)
14. Latif, S., Usman, M., Rana, R., Qadir, J.: Phonocardiographic sensing using deep learning for abnormal heartbeat detection. IEEE Sensors Journal (22), 9393–9400 (2018)
15. Lauritzen, S.L.: Graphical Models, vol. 17. Clarendon Press (1996)
16. Li, H., Boulanger, P.: A survey of heart anomaly detection using ambulatory electrocardiogram (ECG). Sensors (5), 1461 (2020)
17. Malik, S.I., Akram, M.U., Siddiqi, I.: Localization and classification of heartbeats using robust adaptive algorithm. Biomedical Signal Processing and Control, 57–77 (2019)
18. Oliveira, S.C., Gomes, E.F., Jorge, A.M.: Heart sounds classification using motif based segmentation. In: Proceedings of the 18th International Database Engineering & Applications Symposium. pp. 370–371 (2014)
19. Oord, A.v.d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K.: WaveNet: A generative model for raw audio. arXiv:1609.03499 [cs] (Sep 2016), http://arxiv.org/abs/1609.03499
20. Pereira, J., Silveira, M.: Unsupervised representation learning and anomaly detection in ECG sequences. International Journal of Data Mining and Bioinformatics (4), 389–407 (2019)
21. Peste, A., Malagò, L.: Towards the use of Gaussian graphical models in variational autoencoders. ICML Workshop (2017)
22. Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082 (2014)
23. Rushe, E., Mac Namee, B.: Anomaly detection in raw audio using deep autoregressive networks. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 3597–3601. IEEE (2019)
24. Šabić, E., Keeley, D., Henderson, B., Nannemann, S.: Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data. AI & Society (2020)
25. Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.: Hilbert space embeddings and metrics on probability measures. The Journal of Machine Learning Research, 1517–1561 (2010)
26. Tolstikhin, I., Bousquet, O., Gelly, S., Schoelkopf, B.: Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558 (2017)
27. Ukil, A., Bandyoapdhyay, S., Puri, C., Pal, A.: IoT healthcare analytics: The importance of anomaly detection. In: 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA). pp. 994–997. IEEE (2016)
28. Wang, K., Zhao, Y., Xiong, Q., Fan, M., Sun, G., Ma, L., Liu, T.: Research on healthy anomaly detection model based on deep learning from multiple time-series physiological signals. Scientific Programming (2016)
29. Zhang, W., Han, J., Deng, S.: Heart sound classification based on scaled spectrogram and tensor decomposition. Expert Systems with Applications 84