Deep Neural Network based Cough Detection using Bed-mounted Accelerometer Measurements
Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler
© Copyright 2021 IEEE. Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), scheduled for 6-11 June 2021 in Toronto, Ontario, Canada. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
† Department of Electrical and Electronic Engineering, University of Stellenbosch, South Africa
⊕ TASK Applied Science, Cape Town, South Africa
† {mpahar, ahd, trn}@sun.ac.za; ⊕ [email protected]

ABSTRACT
We have performed cough detection based on measurements from an accelerometer attached to the patient’s bed. This form of monitoring is less intrusive than body-attached accelerometer sensors, and sidesteps privacy concerns encountered when using audio for cough detection. For our experiments, we have compiled a manually-annotated dataset containing the acceleration signals of approximately 6000 cough and 68000 non-cough events from 14 adult male patients in a tuberculosis clinic. As classifiers, we have considered convolutional neural networks (CNN), long short-term memory (LSTM) networks, and a residual neural network (Resnet50). We find that all classifiers are able to distinguish with high accuracy between the acceleration signals due to coughing and those due to other activities, including sneezing, throat-clearing and movement in the bed. The Resnet50 performs the best, achieving an area under the ROC curve (AUC) exceeding 0.98 in cross-validation experiments. We conclude that high-accuracy cough monitoring based only on measurements from the accelerometer in a consumer smartphone is possible. Since the need to gather audio is avoided and privacy is therefore inherently protected, and since the accelerometer is attached to the bed and not worn, this form of monitoring may represent a more convenient and readily accepted method of long-term patient cough monitoring.
Index Terms — accelerometer, cough detection, Resnet, CNN, LSTM
1. INTRODUCTION
Coughing is the forceful expulsion of air to clear up the airway, and a common symptom of respiratory disease [1]. It can be distinctive in nature and is an important indicator used by physicians for clinical diagnosis and health monitoring in more than 100 respiratory diseases [2], including tuberculosis (TB) [3], asthma [4] and pertussis [5].

∗ We would like to thank the South African Centre for High Performance Computing (CHPC) for providing computational resources on their Lengau cluster for this research.
Automatic cough detection and classification is possible by applying machine learning algorithms to features extracted from cough sounds [6]. It has also been shown to be possible when using the signals from an accelerometer placed on the patient’s body [7]. Since the accelerometer is insensitive to environmental and background noise, it can be used in conjunction with other sensors such as microphones, ECG and thermistors [8].

A cough monitoring system using a contact microphone and an accelerometer attached to the participant’s suprasternal (jugular) notch has been considered by [9]. This system allows participants to move around within their homes while the cough audio and vibration are recorded. In related work, an ambulatory cough monitoring system using an accelerometer attached to the skin of the participant’s suprasternal notch with a bioclusive transparent dressing was developed in [10]. Here the recorded signal is transmitted to a receiver carried in a pocket or attached to a belt.

Throat-mounted accelerometers have been used successfully to detect coughing in [11] and [12], and an accelerometer placed at the laryngeal prominence (Adam’s apple) in [7]. Two accelerometers, one placed on the abdomen and the second on a belt wrapped around the dorsal region, have been used to measure cough rate in the research carried out by [13]. Abdominal placement (between the navel and sternal notch) of the accelerometer was also investigated in [14], where it was applied to child patients. Finally, multiple sensors, including ECG, thermistor, chest belt, accelerometer and audio microphones, were used for cough detection in [15].

Attaching an accelerometer to the patient’s body is, however, inconvenient and intrusive.
Thus, we propose the monitoring of coughing based on the signals obtained from the accelerometer built into an inexpensive consumer smartphone firmly attached to the patient’s bed, as shown in Figure 1, thereby eliminating the need to wear measuring equipment. We have trained and evaluated deep neural network (DNN) classifiers, namely convolutional neural networks (CNN), long short-term memory (LSTM) networks and a residual neural network (Resnet50) [16] architecture, using leave-one-out cross-validation on a dataset prepared for this purpose, which consists of cough and non-cough events such as sneezing, throat-clearing and getting in and out of the bed. The Resnet50 produces the highest AUC of 0.9888 after 50 epochs, with a corresponding accuracy of 96.71% and sensitivity of 99%, for 32-sample (320 ms) frames grouped into 10 segments. This shows that it is possible to discriminate between cough events and other non-cough events by using state-of-the-art classifiers such as a Resnet50 architecture, with an accelerometer that is no longer attached to the patient’s body but instead built into an inexpensive consumer smartphone attached to the headboard of the patient’s bed.
2. DATASET PREPARATION
Data collection was performed at a small 24 h TB clinic near Cape Town, South Africa, which can accommodate approximately 10 staff and 30 patients. Each ward has four beds and can thus accommodate up to four patients at one time. The overall motivation of our work is to develop a practical method of automatic cough monitoring for the patients in this clinic, in order to assist with the monitoring of recovery.

The recording setup is shown in Figure 1. An enclosure housing an inexpensive consumer smartphone is firmly attached to the back of the headboard of each bed in a ward. A data-gathering Android smartphone application, developed for this study, continuously monitors the accelerometer and audio signals from an external microphone (also shown in Figure 1). The 3-axis accelerometer has a sampling frequency of 100 Hz, and only the acceleration magnitudes were recorded. Audio and acceleration activity was captured whenever the signal energy exceeded a threshold, and this energy-threshold-based detection results in a large volume of data being captured. In addition, continuous video recordings were made using ceiling-mounted cameras in order to assist with data annotation.

This work considers automatic classification of the acceleration signals. The audio signals and video recordings were used only during the manual annotation process, in order to unambiguously confirm the presence or absence of a cough. We define an ‘event’ to be any detected accelerometer or audio activity. Examples of the accelerometer magnitudes captured for a cough and for a non-cough event are shown in Figure 2. Annotation was performed using the multimedia software tool ELAN [17], which allowed easy consolidation of the audio and video for accurate manual labelling.

Our final dataset, summarised in Table 1, contains approximately 6000 coughs and 68000 non-coughs from 14 adult male patients, totalling 3.16 and 32.20 hours of data respectively. Due to ethical constraints, no other patient information was recorded. This dataset was used to train and evaluate the classifiers within a cross-validation framework.

Table 1 shows that coughs are underrepresented in our dataset. To compensate for this imbalance, which can detrimentally affect machine learning [18, 19], we have applied SMOTE data balancing during training [20, 21]. This technique oversamples the minority class by generating synthetic
Fig. 1. Recording process: a plastic enclosure housing an inexpensive smartphone running data-gathering software is attached behind the headboard of each bed. A Samsung Galaxy J4 smartphone with an inbuilt accelerometer is connected to an external BOYA BY-MM1 cardioid microphone by a 3.5 mm audio jack. The audio signal from the external microphone is also monitored, but only used for the purpose of annotation of the acceleration signal.
Fig. 2. Example accelerometer magnitudes for a cough event (red) and a non-cough event (blue). In this case, the non-cough event was the patient moving in the bed.

samples (instead of, for example, random oversampling). SMOTE has previously been successfully applied to cough detection and classification based on audio recordings [22].
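To make the core idea behind SMOTE concrete, the following is a minimal NumPy sketch of its interpolation step: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority-class neighbours. This is a toy stand-in for the imbalanced-learn implementation cited in [21]; the function name and the toy data are our own, not from the paper.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples (SMOTE core idea):
    each synthetic point lies between a random minority sample and one of
    its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per sample
    base = rng.integers(0, n, n_new)             # random base samples
    nb = nbrs[base, rng.integers(0, k, n_new)]   # random neighbour of each base
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# toy example: cough/non-cough style imbalance, in miniature
X_cough = np.random.default_rng(1).normal(size=(60, 4))
X_synth = smote_oversample(X_cough, n_new=560)
print(X_synth.shape)  # (560, 4)
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled set stays within the span of the original minority class, unlike simple duplication.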
3. FEATURE EXTRACTION
Power spectra [23], root mean square (RMS) value, kurtosis, moving average and crest factor are extracted from the acceleration signals as features for classification. The frame length (Ψ) and the number of segments (C) are used as the feature extraction hyperparameters, shown in Tables 2 and 4. The input feature matrix, shown in Figures 3 and 4, has dimension (C, Ψ/2 + 5), of which the power spectra contribute (Ψ/2 + 1) coefficients.

Table 1. Ground truth dataset summary. ‘COUGHS’: number of confirmed cough events; ‘NON-COUGHS’: number of confirmed events that are not coughs; ‘COUGH TIME’: total duration (in sec) of cough events; ‘NON-COUGH TIME’: total duration (in sec) of non-cough events.

PATIENT      COUGHS   NON-COUGHS   COUGH TIME   NON-COUGH TIME
Patient 1        88          973       169.16          1660.67
Patient 2        63         1111       117.67          1891.92
Patient 3       469        11025       893.91         18797.32
Patient 4       109         9151       204.06         15596.71
Patient 5        97         7826       188.26         13344.98
Patient 6       192        12437       360.72         21197.35
Patient 7       436        14053       825.23         23953.15
Patient 8       368         2977       702.05          5077.89
Patient 9      2816         3856      5345.27          6569.32
Patient 10      649         2579      1236.84          4400.42
Patient 11      205          527       391.42           901.38
Patient 12      213          323       402.61           547.62
Patient 13      213          712       401.61          1211.75
Patient 14       82          455       158.77           777.64
TOTAL          6000        68005     11397.60        115928.12

Table 2. Feature extraction hyperparameters. Frame lengths (16, 32 or 64 samples, i.e. 160, 320 and 640 ms) overlap in such a way that the number of segments (5 or 10) is the same for all events in the dataset.

Hyperparameter        Description                                    Range
Frame length (Ψ)      Size in samples of the frames into which       2^k, k = 4, 5, 6
                      each event is segmented
No. of segments (C)   Number of segments into which the frames       5, 10
                      are grouped

The frame length (Ψ) used to extract features from the acceleration signal is shorter than those generally used to extract features from audio [24], because the accelerometer in the smartphone (shown in Figure 1) has a lower sampling rate of 100 Hz, and longer frames lead to deteriorated performance since the signal properties can no longer be assumed to be stationary.
4. CLASSIFIER TRAINING
Our dataset contains 14 patients, and a leave-one-out cross-validation scheme [25] has been used to train and evaluate our three DNN classifiers: CNN, LSTM and Resnet50.

Fig. 3. CNN classifier, trained and evaluated using leave-one-out cross-validation [25]: the input feature matrix passes through 2D convolutional layers, max-pooling with dropout, flattening with dropout, and dense layers reduced to 8 units followed by a 2-unit softmax. This classifier produces the results shown in Table 4 for the feature extraction hyperparameters shown in Table 2.

Fig. 4. LSTM classifier, trained and evaluated using leave-one-out cross-validation [25]: the input feature matrix passes through β1 LSTM units, flattening with dropout, and dense layers reduced to 8 units followed by a 2-unit softmax. This classifier produces the results shown in Table 4 for the feature extraction hyperparameters shown in Table 2.

The CNN classifier, shown in Figure 3, has been set up with α1 2D convolutional layers with kernel size α2 and rectified linear units (ReLU) as activation functions. A dropout rate α3 has been applied along with max-pooling, followed by a dense layer of α4 units with ReLU activations, followed by a further dense layer of 8 units, also with ReLU activations.

The LSTM classifier, shown in Figure 4, has been set up with β1 LSTM units with ReLU activation functions and a dropout rate α3. A dense layer of α4 units with ReLU activations has then been applied, followed by a further dense layer of 8 units, also with ReLU activations.

For both the LSTM and CNN classifiers, a final softmax produces one output for a cough event (i.e. 1) and another for a non-cough event (i.e. 0), as shown in Figures 3 and 4. Features are fed to these two classifiers in batches of size ξ1 for ξ2 epochs.

The residual network (Resnet) architecture we trained and evaluated has 50 layers and has been found to deliver state-of-the-art performance in image recognition. We have replicated the 50-layer architecture of Table 1 in [16] in our experiments. Table 3 lists the classifier hyperparameters that were optimised during leave-one-out cross-validation.

Table 3. CNN & LSTM classifier hyperparameters, optimised using leave-one-out cross-validation and shown in Figures 3 and 4.
Hyperparameter             Classifier    Range
Batch size (ξ1)            CNN & LSTM    2^k, k = 6, …
No. of epochs (ξ2)         CNN & LSTM    10 to 200 in steps of 20
No. of conv. filters (α1)  CNN           3 × 2^k, k = 3, …
Kernel size (α2)           CNN           2 and 3
Dropout rate (α3)          CNN & LSTM    0.1 to 0.5 in steps of 0.2
Dense layer size (α4)      CNN & LSTM    2^k, k = 4, …
LSTM units (β1)            LSTM          2^k, k = 6, …
Learning rate (β2)         LSTM          10^k, k = −…

Fig. 5. Mean ROC curves for cough detection, for the best-performing DNN classifiers in Table 4. AUC values are averaged over the 14 leave-one-patient-out cross-validation folds during hyperparameter optimisation. The Resnet50 outperforms the LSTM and CNN over a wide range of operating points and achieves the highest accuracy of 96.71%.
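The leave-one-patient-out protocol itself can be illustrated independently of the DNN architectures. The sketch below uses scikit-learn's LeaveOneGroupOut with a logistic-regression stand-in on synthetic data, producing one fold per held-out "patient"; it is not the paper's classifier, only an illustration of the evaluation scheme.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# toy stand-in data: 14 "patients", separable cough / non-cough features
n = 700
patient = rng.integers(0, 14, n)                  # patient id per event
y = rng.integers(0, 2, n)                         # 1 = cough, 0 = non-cough
X = rng.normal(size=(n, 8)) + 2.0 * y[:, None]    # class-dependent feature shift

aucs = []
logo = LeaveOneGroupOut()                         # one fold per held-out patient
for train, test in logo.split(X, y, groups=patient):
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))

print(len(aucs))  # 14 folds, one per patient
```

Holding out all events from one patient per fold, rather than a random split, ensures that the reported AUC reflects generalisation to unseen patients.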
5. RESULTS
Table 4 lists the performance achieved by our three DNN classifiers for the hyperparameters listed in Table 2. These results are averages over the 14 leave-one-patient-out testing partitions during hyperparameter optimisation. Table 4 shows that the best-performing CNN uses 64-sample (640 ms) frames and 10 segments to achieve an AUC of 0.9499. The optimal LSTM classifier achieves a slightly higher AUC of 0.9572 when using a frame length of 32 samples (320 ms) and 10 segments. However, the best performance is achieved by the Resnet50 architecture, with an AUC of 0.9888 after 50 epochs using 32-sample (320 ms) frames and 10 segments. Figure 5 shows the mean ROC curves for the optimal CNN, LSTM and Resnet50 configurations of Table 4, where the means were calculated over the 14 cross-validation folds. The Resnet50 classifier is superior to the other two classifiers over a wide range of operating points.
Table 4. Leave-one-out cross-validation results for the DNN classifiers. The values are averaged over the 14 cross-validation folds.

Frame (Ψ)   Seg (C)   Classifier   Mean Spec   Mean Sens   Mean Accuracy   Mean AUC
16          5         CNN          83%         86%         84.55%          0.9243
16          10        CNN          87%         84%         85.66%          0.9358
32          5         CNN          76%         93%         84.47%          0.9272
32          10        CNN          84%         86%         85.25%          0.9324
64          5         CNN          85%         87%         86.31%          0.9339
64          10        CNN          91%         80%         85.82%          0.9499
16          5         LSTM         84%         91%         87.58%          0.9444
16          10        LSTM         85%         92%         88.32%          0.9504
32          5         LSTM         79%         95%         87.10%          0.9457
32          10        LSTM         86%         93%         89.21%          0.9572
64          5         LSTM         84%         93%         88.68%          0.9540
64          10        LSTM         86%         89%         87.66%          0.9489
16          5         Resnet50     93%         98%         95.43%          0.9802
16          10        Resnet50     94%         99%         96.35%          0.9812
32          5         Resnet50     94%         99%         96.54%          0.9810
32          10        Resnet50     94%         99%         96.71%          0.9888
64          5         Resnet50     94%         99%         96.35%          0.9854
64          10        Resnet50     95%         98%         96.46%          0.9884
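Fold-averaged ROC curves, such as those reported above, are typically computed by interpolating each fold's true-positive rate onto a common false-positive-rate grid and averaging. A sketch of that averaging (with synthetic scores standing in for real classifier outputs; the grid size and function name are our choices) might look as follows:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def mean_roc(fold_scores, fold_labels, grid_points=101):
    """Average per-fold ROC curves onto a common FPR grid, as done when
    plotting a mean ROC over leave-one-patient-out folds."""
    fpr_grid = np.linspace(0.0, 1.0, grid_points)
    tprs = []
    for y, s in zip(fold_labels, fold_scores):
        fpr, tpr, _ = roc_curve(y, s)
        t = np.interp(fpr_grid, fpr, tpr)  # interpolate TPR onto the grid
        t[0] = 0.0                         # ROC starts at (0, 0)
        tprs.append(t)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0                     # and ends at (1, 1)
    return fpr_grid, mean_tpr, auc(fpr_grid, mean_tpr)

rng = np.random.default_rng(0)
labels, scores = [], []
for _ in range(14):                        # 14 pseudo-folds
    y = rng.integers(0, 2, 200)
    s = y + rng.normal(scale=0.7, size=200)  # noisy scores correlated with labels
    labels.append(y)
    scores.append(s)
fpr, tpr, mean_auc = mean_roc(scores, labels)
print(round(float(mean_auc), 2))
```

The AUC of the averaged curve gives a single fold-averaged summary, comparable across the classifier configurations in the table above.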
6. CONCLUSION AND FUTURE WORK
A deep neural network based cough detector is able to accurately discriminate between accelerometer measurements due to coughing and those due to other movements, as captured by a consumer smartphone attached to a patient’s bed. The best system, using the Resnet50 architecture, achieves an AUC of 0.9888. These experimental results are based on a specially-compiled corpus of manually-annotated acceleration measurements, including approximately 6000 cough and 68000 non-cough events, gathered from 14 patients in a small TB clinic. Although accelerometer-based detection of coughing has been considered before, it has always made use of sensors worn by the patient, which is in some respects intrusive and can be inconvenient. We have shown that excellent discrimination is also possible when the sensor is attached to the patient’s bed. This presents a less intrusive method of cough monitoring, which can be of practical use in monitoring the recovery process of patients, for example in the clinic where the data was collected. Acceleration-based monitoring also has important privacy advantages over the detection of cough sounds by audio-based monitoring, which often raises privacy concerns, and we have found patients to be uncomfortable in the presence of audio-based monitoring equipment. Accelerometer-based monitoring, using a bed-mounted inexpensive consumer smartphone, represents a more discreet and also cost-effective alternative.

In ongoing work, we are attempting to optimise some of the Resnet50 metaparameters and also to incorporate audio [26] along with the accelerometer measurements, with a view to further improving the cough detection results.

7. REFERENCES

[1] J. Korpáš, J. Sadloňová, and M. Vrabec, “Analysis of the cough sound: an overview,”
Pulmonary Pharmacology, vol. 9, no. 5-6, pp. 261–268, 1996.
[2] J. Knocikova, J. Korpas, M. Vrabec, and M. Javorka, “Wavelet analysis of voluntary cough sound in patients with respiratory diseases,” Journal of Physiology and Pharmacology, vol. 59, suppl. 6, pp. 331–340, 2008.
[3] G. H. R. Botha, G. Theron, R. M. Warren, M. Klopper, K. Dheda, P. D. van Helden, and T. R. Niesler, “Detection of tuberculosis by automatic cough sound analysis,” Physiological Measurement, vol. 39, no. 4, p. 045005, 2018.
[4] M. Al-khassaweneh and R. Bani Abdelrahman, “A signal processing approach for the diagnosis of asthma from cough sounds,” Journal of Medical Engineering & Technology, vol. 37, no. 3, pp. 165–171, 2013.
[5] R. X. A. Pramono, S. A. Imtiaz, and E. Rodriguez-Villegas, “A cough-based algorithm for automatic diagnosis of pertussis,” PLoS ONE, vol. 11, no. 9, p. e0162128, 2016.
[6] I. D. S. Miranda, A. H. Diacon, and T. R. Niesler, “A comparative study of features for acoustic cough detection using deep architectures,” IEEE, 2019, pp. 2601–2605.
[7] H. Mohammadi, A.-A. Samadani, C. Steele, and T. Chau, “Automatic discrimination between cough and non-cough accelerometry signal artefacts,” Biomedical Signal Processing and Control, vol. 52, pp. 394–402, 2019.
[8] P. Munyard, C. Busst, R. Logan-Sinclair, and A. Bush, “A new device for ambulatory cough recording,” Pediatric Pulmonology, vol. 18, no. 3, pp. 178–186, 1994.
[9] L. Pavesi, S. Subburaj, and K. Porter-Shaw, “Application and validation of a computerized cough acquisition system for objective monitoring of acute cough: a meta-analysis,” Chest, vol. 120, no. 4, pp. 1121–1128, 2001.
[10] I. M. Paul, K. Wai, S. J. Jewell, M. L. Shaffer, and V. V. Varadan, “Evaluation of a new self-contained, ambulatory, objective cough monitor,” Cough, vol. 2, no. 1, p. 7, 2006.
[11] M. Coyle, P. A. Derchak, and L. J. Myers, “Systems and methods for monitoring cough,” US Patent 7,727,161, June 1, 2010.
[12] J. Fan, G. Comina, R. Gilman, J. Lopez, and B. H. Tracey, “Cough monitoring for pulmonary tuberculosis using combined microphone/accelerometer measurements,” The Journal of the Acoustical Society of America, vol. 135, no. 4, pp. 2268–2268, 2014.
[13] J. Y. M. Chan, S. A. Tunnell, and J. A. L. Jacobs, “Systems, methods and kits for measuring cough and respiratory rate using an accelerometer,” US Patent App. 13/783,257, Sept. 4, 2014.
[14] K. Hirai, H. Tabata, M. Hirayama, T. Kobayashi, Y. Oh, and H. Mochizuki, “A new method for objectively evaluating childhood nocturnal cough,” Pediatric Pulmonology, vol. 50, no. 5, pp. 460–468, 2015.
[15] T. Drugman, J. Urbain, N. Bauwens, R. Chessini, C. Valderrama, P. Lebecque, and T. Dutoit, “Objective study of sensor relevance for automatic cough detection,” IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 3, pp. 699–707, 2013.
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[17] P. Wittenburg, H. Brugman, A. Russel, A. Klassmann, and H. Sloetjes, “ELAN: a professional framework for multimodality research,” 2006.
[18] J. Van Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental perspectives on learning from imbalanced data,” in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 935–942.
[19] B. Krawczyk, “Learning from imbalanced data: open challenges and future directions,” Progress in Artificial Intelligence, vol. 5, no. 4, pp. 221–232, 2016.
[20] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[21] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 559–563, 2017.
[22] A. Windmon, M. Minakshi, P. Bharti, S. Chellappan, M. Johansson, B. A. Jenkins, and P. R. Athilingam, “TussisWatch: A smart-phone system to identify cough episodes as early symptoms of chronic obstructive pulmonary disease and congestive heart failure,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1566–1573, 2018.
[23] B. Liang, S. D. Iwnicki, and Y. Zhao, “Application of power spectrum, cepstrum, higher order spectrum and neural network analyses for induction motor fault diagnosis,” Mechanical Systems and Signal Processing, vol. 39, no. 1-2, pp. 342–360, 2013.
[24] G. Takahashi, T. Yamada, S. Makino, and N. Ono, “Acoustic scene classification using deep neural network and frame-concatenated acoustic feature,” Detection and Classification of Acoustic Scenes and Events, 2016.
[25] C. Sammut and G. I. Webb, “Leave-one-out cross-validation,” Encyclopedia of Machine Learning, pp. 600–601, 2010.
[26] M. Pahar and L. S. Smith, “Coding and decoding speech using a biologically inspired coding system,” in 2020 IEEE Symposium Series on Computational Intelligence (SSCI).