A Lightweight CNN Model for Detecting Respiratory Diseases from Lung Auscultation Sounds using EMD-CWT-based Hybrid Scalogram
Samiul Based Shuvo, Shams Nafisa Ali, Soham Irtiza Swapnil, Taufiq Hasan, Mohammed Imamul Hassan Bhuiyan
Abstract—Listening to lung sounds through auscultation is vital in examining the respiratory system for abnormalities. Automated analysis of lung auscultation sounds can be beneficial to health systems in low-resource settings where there is a lack of skilled physicians. In this work, we propose a lightweight convolutional neural network (CNN) architecture to classify respiratory diseases using hybrid scalogram-based features of lung sounds. The hybrid scalogram features utilize the empirical mode decomposition (EMD) and the continuous wavelet transform (CWT). The proposed scheme's performance is studied using a patient-independent train-validation set from the publicly available ICBHI 2017 lung sound dataset. Employing the proposed framework, weighted accuracy scores of 99.20% for ternary chronic classification and 99.05% for six-class pathological classification are achieved, which outperform the well-known and much larger VGG16 in terms of accuracy by 0.52% and 1.77%, respectively. The proposed CNN model also outperforms other contemporary lightweight models while being computationally comparable.
Index Terms—Lung auscultation sound, respiratory disease detection, lightweight convolutional neural networks, empirical mode decomposition, continuous wavelet transform, scalogram.
Samiul Based Shuvo, Shams Nafisa Ali, Soham Irtiza Swapnil and Taufiq Hasan are with the Department of Biomedical Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka-1205, Bangladesh. Email: {sbshuvo.bme.buet, snafisa.bme.buet, swapnil.buetbme}@gmail.com, taufi[email protected]. Mohammed Imamul Hassan Bhuiyan is with the Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh. Email: [email protected]. The authors share first authorship on, and contributed equally to, this work. Manuscript received August XX, 20XX; revised September XX, 20XX.

I. INTRODUCTION

Lung diseases are the third largest cause of death in the world [1]. According to the World Health Organization (WHO), the five major respiratory diseases [2], namely chronic obstructive pulmonary disease (COPD), tuberculosis, acute lower respiratory tract infection (LRTI), asthma, and lung cancer, cause the death of more than 3 million people each year worldwide [3], [4]. These respiratory diseases severely burden the overall healthcare system and adversely affect the lives of the general population. Prevention, early diagnosis, and treatment are considered key factors for limiting the negative impact of these deadly diseases.

Auscultation of the lung using a stethoscope is the traditional diagnostic method used by specialists and general practitioners for the initial investigation of the respiratory system. Although physicians use various other investigation strategies such as plethysmography, spirometry, and arterial blood gas analysis, lung sound auscultation remains a vital tool for physicians due to its simplicity and low cost [5]. The primary classification of these non-periodic and non-stationary sounds consists of two groups: normal (vesicular) and abnormal (adventitious) [6]. The first group is observed when there are no respiratory diseases, while the latter group indicates complications in the lungs or airways [7]. Crackle, wheeze, rhonchus, squawk, stridor, and pleural rub are the commonly known abnormal lung sounds. These anomalies can be differentiated from the normal lung sounds on the basis of frequency, pitch, energy, intensity, timbre, and musicality [8], [9]. Therefore, lung sounds are of particular importance for recognizing specific respiratory diseases and assessing their chronic or non-chronic characteristics. However, distinguishing the subtle differences between some of the adventitious lung sound classes can be a strenuous task even for a specialist and may introduce subjectivity in the diagnostic interpretation [10]. In this scenario, artificial intelligence (AI)-powered algorithms can be of benefit in automatically interpreting lung sounds, especially in underdeveloped regions of the world with a scarcity of skilled physicians.

In the past decade, a number of research approaches have been considered and evaluated for the automatic identification of respiratory anomalies from lung auscultation sounds. Numerous feature extraction techniques, including statistical features [11], entropy-based features [12], wavelet coefficients [13], Mel-frequency cepstral coefficients (MFCC) [10], spectrograms [14], scalograms [15], etc., have been adopted in conjunction with a diverse set of machine learning (ML) algorithms [10]–[24]. With the advent of deep learning (DL), new developments have been made in recent times, demonstrating highly promising results in diversified applications, including biomedical engineering and clinical diagnostics [25]–[30].
With the ability of automatic feature learning, deep learning (DL) approaches are more generic and can mitigate the limitations of traditional ML-based methods. In the same vein, DL-based paradigms employed in recent years for the identification of respiratory anomalies and pathologies from lung auscultation data have exhibited highly promising performance [5], [31]–[42]. However, for attaining proper functionality, deep networks require an extensive training scheme with a large training dataset, which subsequently calls for a considerable amount of time and powerful computational resources. As a result, it becomes quite challenging to incorporate deep learning frameworks into currently available wearable devices and mobile platforms. In order to reduce the number of parameters of these networks, various methods have been investigated, including weight quantization [36], lightweight networks [43], and low-precision computation [44].

Fig. 1. A graphical overview of the proposed framework. After several generic preprocessing steps, the lung sound signals are converted into scalograms using both conventional and hybrid approaches. The resulting images are further augmented and fed into the proposed lightweight CNN model to carry out a two-way classification of respiratory diseases: (i) chronic and (ii) pathological.

While constructing AI-assisted automated medical diagnosis frameworks, patient specificity in the train and validation datasets should be considered a salient factor to produce reliable results for unseen patient data, especially for chronic diseases [45], [46]. Due to the sparse nature of the available medical data, this factor is often neglected in the existing literature. The random adoption of an 80%-20% or any other train-validation split of the dataset corrupts most of the works with intra-patient dependency, and ultimately the obtained results are not consistent and generalizable for a new patient [36]. Although a patient-independent division requires additional time and effort, the achieved results are more generalizable and representative of real-world scenarios.

In this work, a lightweight CNN architecture is proposed to perform respiratory disease classification (ternary chronic classification and six-class pathology classification) utilizing the ICBHI 2017 scientific challenge respiratory sound database [47] while maintaining a patient-independent train-validation splitting strategy. A hybrid approach for obtaining scalograms from respiratory sound signals is presented, wherein the continuous wavelet transform (CWT) is performed only on the maximally correlated intrinsic mode function (IMF) of the empirically decomposed (EMD) respiratory sound signals. The class discrimination capability of the hybrid scalogram is evaluated with respect to the CWT-based conventional scalogram. Subsequently, along with the proposed CNN model, complex CNN models such as VGG16 [48] and AlexNet [49] and several contemporary lightweight architectures, including MobileNet V2 [50], NASNet [51] and ShuffleNet V2 [52], are used for classifying the scalogram images to detect respiratory diseases in different categories. A comparative study among the proposed CNN model and the others is presented in terms of detection performance and lightweight characteristics.

The rest of the paper is organized as follows. Previous studies related to lung sound classification using different ML-based approaches are discussed in Section II. Section III describes the dataset and the feature extraction process, and Section IV presents the proposed lightweight CNN model. The experimental setup, results, and comparison with other works are discussed in Section V. Finally, the concluding remarks are provided in Section VI.
II. RELATED WORKS
Many research works employing machine learning and deep learning have been reported on developing automated systems for respiratory sound classification. However, the majority of the works have focused on respiratory anomaly prediction, basically classifying the lung sounds as wheeze or crackles [10]–[24], [31]–[36], rather than directly predicting respiratory diseases from lung auscultation recordings. The few works geared towards pathology classification are very recent and mostly involve elaborate processing or dedicated CNN and RNN frameworks due to the inherent complexity of the signal [37]–[42]. At the pathology level, the classification task has so far been investigated at three different resolutions: binary classification (healthy, pathological) [37], [38], ternary chronic classification (healthy, chronic disease, non-chronic disease) [38], [42], and multi-class distinct disease classification [39], [42]. Among the diseases, upper and lower respiratory tract infection (URTI and LRTI), bronchiolitis and pneumonia have been included in the non-chronic disease class, while COPD, asthma and bronchiectasis have been combined to form the chronic class [38].

In [37], a novel CNN-based ternary classification approach has been implemented and performed considerably well with 82% accuracy and an 88% ICBHI score. Later, the same authors proposed a Mel-frequency cepstral coefficient (MFCC) and long short-term memory (LSTM) based framework capable of conducting both binary and ternary classification of respiratory diseases [38], which demonstrated excellent performance with 99% and 98% accuracy, respectively. Another work involving a complex RNN architecture and extensive preprocessing has reported an accuracy of 95.67 ± 0.77% in predicting six pathology-driven disease classes [39]. However, by employing a CRNN network with a CNN-Mixture-of-Experts (MoE) baseline to learn both spatial and time-sequential features
from the spectrograms, a recent work has achieved a specificity of 83% and a sensitivity of 96% in ternary respiratory disease classification [40]. For binary classification, the same work has reported specificity and sensitivity of 83% and 99%, respectively. As an extension of [40], a separate study involving robust Teacher-Student learning schemes with knowledge distillation has been conducted, which resulted in a substantially reduced specificity while maintaining the sensitivity [41].

Since the existing heavily imbalanced datasets of lung auscultations further exacerbate the task of respiratory disease classification, a contemporary study has dealt with this issue by experimenting with several data augmentation techniques, such as SMOTE, the Adaptive Synthetic Sampling Method (ADASYN) and the variational autoencoder (VAE) [42]. Among these methods, the VAE-based Mel-spectrogram augmentation strategy, in conjunction with a CNN model, has achieved the best results with 98.5% sensitivity and 99.0% specificity in ternary chronic classification. The strategy has also exhibited an equally sophisticated performance with 98.8% sensitivity and 98.6% specificity in the case of six-class respiratory disease classification [42].

Although the scope of DL-based frameworks with a spectrogram-based feature extraction strategy has been investigated in several works for direct classification of respiratory diseases from lung auscultations [40]–[42], to the best of the authors' knowledge, scalogram-based approaches have not been explored in this domain. Additionally, no dedicated lightweight, efficient CNN framework has been developed and investigated for the respiratory disease classification task. Furthermore, none of the studies consider the issue of intra-patient dependency in the train-validation split. Inspired by all of these factors, a scalogram-based approach in conjunction with a lightweight CNN is proposed in this paper for the prediction of respiratory diseases from lung auscultations, maintaining patient independence. The proposed framework is schematically represented in Fig. 1.
III. MATERIALS AND METHODS
A. ICBHI 2017 Dataset
The ICBHI (International Conference on Biomedical Health Informatics) 2017 database is a publicly available benchmark dataset of lung auscultations [47]. It was collected by two independent research teams from Portugal and Greece. The dataset contains 5.5 hours of audio recordings sampled at different frequencies (4 kHz, 10 kHz, and 44.1 kHz) and ranging in length from 10 s to 90 s, in 920 audio samples of 126 subjects, recorded from different anatomical positions with heterogeneous equipment [53]. The samples are professionally annotated considering two schemes: 1) according to the corresponding patient's pathological condition, i.e., healthy and seven distinct disease classes, namely pneumonia, bronchiectasis, COPD, URTI, LRTI, bronchiolitis, and asthma, and 2) according to the presence of respiratory anomalies, i.e., crackles and wheezes, in each respiratory cycle. Further details about the dataset and the data collection methods can be found in [53].
B. Data Preprocessing

1) Noise filtering:
Since 50 Hz to 2500 Hz is the acknowledged frequency range of lung auscultation signals [7], the recorded audio signals are filtered with a 6th-order Butterworth bandpass filter, thus retaining the 50 Hz to 2500 Hz frequency components. Subsequently, all the sample signals are resampled to 22050 Hz for ensuring consistency and normalized to the range [-1, 1] for attaining device homogeneity.
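A minimal Python sketch of these preprocessing steps is given below. The paper's own pipeline is MATLAB-based, so soundfile and scipy are assumed stand-ins rather than the authors' exact code; the 6th-order Butterworth filter, the 50–2500 Hz band, the 22050 Hz target rate, and the [-1, 1] normalization follow the description above.

```python
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess_recording(path, low_hz=50.0, high_hz=2500.0, target_fs=22050):
    """Band-pass filter, resample, and normalize one lung sound recording.

    A sketch of the steps in Section III-B, not the authors' exact code.
    """
    x, fs = sf.read(path)
    if x.ndim > 1:
        x = x.mean(axis=1)                 # collapse stereo to mono

    # 6th-order Butterworth band-pass keeping the 50-2500 Hz lung sound band.
    # For 4 kHz recordings the 2500 Hz edge exceeds Nyquist, so it is capped.
    high = min(high_hz, 0.45 * fs)
    sos = butter(6, [low_hz, high], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)

    # Resample every recording to a common 22,050 Hz rate
    x = resample_poly(x, target_fs, int(fs))

    # Normalize amplitudes to [-1, 1] for device homogeneity
    x = x / (np.max(np.abs(x)) + 1e-12)
    return x, target_fs
```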
2) Segmentation of the sound data:
Each of the audio recordings is segmented according to the annotated respiratory cycle timing, with a 6 s duration for each segment. Samples with a minimum respiratory cycle duration of 3 s are taken into account to obtain useful respiratory sound information [40]. After performing this procedure, two of the disease classes, namely asthma and LRTI, are found to have inadequate segmented samples for meaningful feature extraction; therefore, these two classes are not considered in our study. After these procedures, lung auscultation sounds from 87 out of 120 independent patients are usable. Table I presents the data distribution at several levels of processing corresponding to the disease classes considered in this study.
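The segmentation step can be sketched as follows, assuming the respiratory-cycle start and end times have already been parsed from the ICBHI annotation files; the 6 s window, the 3 s minimum cycle duration, and zero-padding of short tails are interpretations of the text, and the exact windowing used by the authors may differ.

```python
import numpy as np

def segment_cycles(x, fs, cycles, seg_len_s=6.0, min_cycle_s=3.0):
    """Cut fixed 6 s segments around annotated respiratory cycles.

    `cycles` is a list of (start_s, end_s) pairs from an ICBHI annotation
    file; cycles shorter than 3 s are skipped, following Section III-B.
    """
    seg_len = int(seg_len_s * fs)
    segments = []
    for start_s, end_s in cycles:
        if end_s - start_s < min_cycle_s:
            continue                              # too short to be informative
        start = int(start_s * fs)
        seg = x[start:start + seg_len]
        if len(seg) < seg_len:                    # pad the tail with zeros
            seg = np.pad(seg, (0, seg_len - len(seg)))
        segments.append(seg)
    return np.stack(segments) if segments else np.empty((0, seg_len))
```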
TABLE I
DISTRIBUTION OF DATA AT DIFFERENT PROCESSING LEVELS CORRESPONDING TO THE DISEASE CLASSES

Disease Name    | No. of Unsegmented Sound Files | No. of Segmented and Filtered Samples | No. of Unique Patients | No. of Generalized Augmented Images
Pneumonia       | 37                             | 41                                    | 3                      | 164
Bronchiectasis  | 16                             | 55                                    | 6                      | 220
COPD            | 793                            | 1,963                                 | 51                     | 1,963
Healthy         | 35                             | 42                                    | 13                     | 168
URTI            | 23                             | 21                                    | 8                      | 84
Bronchiolitis   | 13                             | 65                                    | 6                      | 260
Total           | 917                            | 2,187                                 | 87                     | 2,859
C. Feature Extraction

1) Empirical Mode Decomposition (EMD):
EMD is a powerful self-adaptive signal decomposition method, especially in the time-scale and energy-distribution aspects, and is highly suitable for the analysis and processing of non-linear and non-stationary signals such as lung sounds and heart sounds [54]. It decomposes a given signal x(t) into a finite set of N intrinsic mode functions, IMF_1(t), IMF_2(t), ..., IMF_N(t), depending on the local characteristic time scale of the signal, with a view to expressing the original signal as the sum of all its IMFs plus a final trend, either monotonic or constant, called the residue r(t) [55]:

x(t) = \sum_{i=1}^{N} \mathrm{IMF}_i(t) + r(t).

An IMF is a simple oscillatory function with an equal number of extrema and zero crossings, and its envelopes must be symmetric with respect to zero. Thus, the EMD detrends a signal and elicits underlying spectral patterns [54].
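A short sketch of this decomposition is shown below. The paper uses MATLAB's emd function; PyEMD is an assumed Python substitute, and the 9-IMF limit follows the hybrid scalogram description later in this section.

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal; a stand-in for MATLAB's emd()

def decompose_emd(x, max_imfs=9):
    """Decompose a lung sound segment into IMFs and a residue.

    Returns an (N, len(x)) array of IMFs and the residue r(t), so that
    x(t) = sum_i IMF_i(t) + r(t) as in Section III-C.
    """
    imfs = EMD()(x, max_imf=max_imfs)
    residue = x - imfs.sum(axis=0)        # whatever is left after the IMFs
    return imfs, residue
```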
2) Continuous Wavelet Transform (CWT):
The wavelet transform is defined as a signal processing method that can decompose a signal into an orthonormal wavelet basis or into a set of independent frequency channels [15], [29]. Using a basis function, i.e., the mother wavelet g(t), and its scaled and dilated versions, the continuous wavelet transform (CWT) can be used to decompose a finite-energy signal x(t) as [30]:

Z(a, b) = \int x(t) \, g^{*}\!\left(\frac{t - b}{a}\right) dt,    (1)

where b denotes the time location and a is the scale factor. Larger scale values reveal low-frequency information, while smaller scale values reveal high-frequency information [29]. The squared modulus of the CWT coefficients Z is known as the scalogram [15].

D. Scalogram Representations

1) Conventional Scalogram:
The scalogram is defined as the time-frequency representation of a signal that depicts the energy density obtained using the CWT [5], [56]. The segmented and filtered lung sound samples are decomposed into the corresponding wavelet coefficients in MATLAB 2020a using the Morse analytic wavelet. Scalogram plots are generated with a resolution of 224 × 224 using these coefficients. Fig. 2 shows the scalograms of lung sounds in different disease categories.
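The sketch below renders such a scalogram image in Python. The paper uses MATLAB's Morse analytic wavelet; PyWavelets does not provide Morse wavelets, so a complex Morlet ('cmor') is used here as an assumed stand-in, and the scale grid is illustrative.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

def save_scalogram(x, fs, out_path, n_scales=128, wavelet="cmor1.5-1.0", cmap="jet"):
    """Save the CWT scalogram |Z(a, b)|^2 of a segment as a 224 x 224 image.

    The Morse wavelet of the paper is replaced by a complex Morlet here,
    which is an assumption rather than the authors' exact configuration.
    """
    scales = np.geomspace(2, 256, num=n_scales)                # coarse-to-fine scales
    coeffs, _ = pywt.cwt(x, scales, wavelet, sampling_period=1.0 / fs)
    scalogram = np.abs(coeffs) ** 2                            # squared modulus of Z(a, b)

    fig = plt.figure(figsize=(2.24, 2.24), dpi=100)            # 224 x 224 pixels
    ax = fig.add_axes([0, 0, 1, 1])                            # no margins or axes
    ax.set_axis_off()
    ax.imshow(scalogram, aspect="auto", cmap=cmap)
    fig.savefig(out_path, dpi=100)
    plt.close(fig)
```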
2) Hybrid Approach for Scalogram:
For each segmented and filtered sample under each pathological class, 9 IMFs are generated using the EMD function in MATLAB 2020a. Based on the cross-correlation between the source signal and the IMFs, the most physically significant IMF, i.e., the one with the highest correlation coefficient, is determined [57], [58]. Subsequently, the squared modulus of the CWT of the corresponding IMF is calculated to obtain the scalogram.

The diverse frequency bands, varying from the maximum to the minimum range, give the IMFs the capability to extract the temporal and spectral information effectively [55]. Hence, when this IMF-based scheme is combined with the CWT-oriented scalogram representation, the newly formed hybrid scalograms can exhibit more discriminative and significant features. Thus, they have the potential to provide better classification performance with a CNN model. The box plots of the scalograms of lung sounds for various respiratory diseases are shown in Fig. 3. The distinction among the plots is more evident when using the hybrid approach than for the conventional scalograms obtained using only the CWT.

It should be mentioned that the proposed scalogram is distinctly different from those of [5], [58] in that the CWT modulus is computed from the maximally correlated IMF, thus providing a better representation of the underlying information. Note that the works of [5] and [58] are on detecting respiratory anomalies such as crackle and wheeze, and on the analysis and segmentation of heart sounds, respectively, whereas our objective is to detect respiratory diseases from the lung auscultations.
Fig. 2. Scalograms of the lung auscultation sounds for 6 disease classes: lung sound recordings (1st row), conventional scalograms (2nd row) and scalograms using the proposed hybrid approach (3rd row).
Fig. 3. Box plots. (a) Scalogram using the conventional CWT approach; (b) scalogram using the hybrid approach.
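Combining the two previous sketches, the hybrid scalogram construction described above can be outlined as follows. The choice of the absolute Pearson correlation coefficient as the "cross-correlation" measure is an interpretation of the text, and the helpers decompose_emd and save_scalogram are the illustrative functions sketched earlier in this section.

```python
import numpy as np

def select_dominant_imf(x, imfs):
    """Return the IMF most correlated with the source signal x(t).

    The absolute Pearson correlation is used here; the exact correlation
    measure behind "maximally correlated IMF" is an assumption.
    """
    corrs = [abs(np.corrcoef(x, imf)[0, 1]) for imf in imfs]
    return imfs[int(np.argmax(corrs))]

def hybrid_scalogram(x, fs, out_path):
    """EMD-CWT hybrid scalogram: the CWT is applied only to the dominant IMF."""
    imfs, _ = decompose_emd(x)               # EMD sketch from Section III-C
    best_imf = select_dominant_imf(x, imfs)
    save_scalogram(best_imf, fs, out_path)   # CWT scalogram sketch from above
```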
E. Augmentation
The ICBHI 2017 dataset is highly imbalanced, with around 86% of the data belonging to COPD. Image augmentation using different color mapping schemes is employed to oversample the less represented classes and address the data imbalance issue [59]. Colormaps are three-column arrays containing RGB triplets, where each row defines a distinct color. Scalogram representation using different colormaps helps generalize the produced images.

From each of the audio samples of the less represented data classes, four scalograms are generated for each segmented sample using four different color mapping schemes available in MATLAB 2020a: Parula, HSV, Jet, and Hot. For the most represented class, COPD, only one image is produced from each audio sample. Nevertheless, to ensure generalization and homogeneity of the augmented data, all four color mapping schemes are utilized randomly for COPD. A summary of the segmented audio files and the final augmented scalogram images with the corresponding disease classes is presented in Table I.
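A small sketch of this colormap-based oversampling is shown below. MATLAB's Parula colormap has no exact matplotlib equivalent, so 'viridis' is used as a rough stand-in; HSV, Jet, and Hot are available under the same names.

```python
import random
import matplotlib.pyplot as plt

# Parula is MATLAB-specific; 'viridis' is used here as an assumed stand-in.
COLORMAPS = ["viridis", "hsv", "jet", "hot"]

def augment_with_colormaps(scalogram, out_prefix, minority=True):
    """Save a scalogram under several colormaps to oversample minority classes.

    Minority classes get one image per colormap (4x oversampling); the
    majority class (COPD) gets one image under a randomly chosen colormap.
    """
    cmaps = COLORMAPS if minority else [random.choice(COLORMAPS)]
    for cmap in cmaps:
        plt.imsave(f"{out_prefix}_{cmap}.png", scalogram, cmap=cmap)
```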
IV. PROPOSED LIGHTWEIGHT CNN ARCHITECTURE
CNN has become a popular approach for classifying image data, and recently there have been several works using CNNs to classify images produced from sounds [5], [31], [33]. However, due to memory constraints, a regular deep CNN model is computationally expensive with its large number of learnable parameters and arithmetic operations. Thus, it is not suitable for embedded devices, as they cannot afford the processing complexity and the storage space for the parameters and filter weights [36]. Cloud computing methodology requires higher RAM for this computationally intensive training, and hence the training is outsourced [60]. For this reason, lightweight CNN models are gaining popularity among researchers for their faster performance and compact size, without compromising much of the accuracy achieved by the well-known deep learning networks [61].

The architecture of the proposed CNN model consists of an input layer corresponding to a 3-channel input of 224 × 224 images. The architecture of the proposed model is illustrated in Fig. 4.
Fig. 4. The detailed architecture of the proposed lightweight CNN model.
The first convolutional layer uses 64 output filters with a 5 × 5 kernel, followed by a 2 × 2 max-pooling layer. Three additional convolutional layers are stacked over the first layer, each having a 3 × 3 kernel with 64, 96, and 96 filters sequentially, together with corresponding batch-normalization and max-pooling layers using a 2 × 2 pooling window. The outputs from these layers are flattened and connected to five pairs of fully connected (FC) and dropout layers, followed by a SoftMax output layer with one probability node for each class. The ReLU activation function is applied within the convolutional and fully connected layers to introduce nonlinearity and reduce the convergence time; it is not activated for negative values. Max pooling is used after the ReLU activation; it reduces the spatial dimensionality of the extracted feature maps, extracts the most important features, and is unaffected by locational bias [35]. To cope with the diverse data variance, a batch normalization layer after every convolutional layer normalizes the extracted features. It gives the network representative power with a small number of parameters and faster training capability by reducing the variance.
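A Keras sketch of the described topology is given below. The filter counts, kernel sizes, and pooling windows follow the text and Fig. 4; the fully connected layer widths, the dropout rate, and the 'same' padding are not specified here and are therefore illustrative assumptions, not the authors' exact values.

```python
from tensorflow.keras import layers, models

def build_lightweight_cnn(n_classes, fc_units=(128, 64, 64, 32, 32), dropout=0.3):
    """Sketch of the proposed lightweight CNN (Section IV, Fig. 4).

    Dense-layer widths and dropout rate are illustrative assumptions.
    """
    model = models.Sequential()
    model.add(layers.Input(shape=(224, 224, 3)))       # 3-channel scalogram image

    # First convolutional block: 64 filters, 5x5 kernel, 2x2 max pooling
    model.add(layers.Conv2D(64, (5, 5), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))

    # Three additional 3x3 convolutional blocks with 64, 96 and 96 filters
    for filters in (64, 96, 96):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))

    model.add(layers.Flatten())

    # Five fully connected / dropout pairs followed by a softmax output
    for units in fc_units:
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model
```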
V. EXPERIMENTAL RESULTS
A. Evaluation Criteria
The augmented image sets are divided into 80% training and 20% validation parts for training and fine-tuning the model hyperparameters. Patient uniqueness, a critical aspect in real-world applications, is maintained while dividing the data into training and validation parts, as speaker dependency results in biased accuracy [46]. The classifier models' performance is evaluated based on the well-known evaluation metrics, namely accuracy, recall (sensitivity), precision, and F1-score. Additionally, specificity and the ICBHI score [38], [53], a dedicated metric involving both sensitivity and specificity for assessing frameworks on the ICBHI dataset, are used to evaluate the performance of our method.
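These metrics can be computed from the validation predictions as sketched below. The ICBHI score is taken here as the average of sensitivity and specificity, and sensitivity/specificity are computed over the non-healthy/healthy samples respectively; both are assumed readings of the metric definitions in [38], [53] rather than a verified re-implementation.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def evaluate(y_true, y_pred, healthy_label=0):
    """Accuracy, precision, recall, F1, sensitivity, specificity, ICBHI score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = np.mean(y_true == y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)

    sick = y_true != healthy_label
    sensitivity = np.mean(y_true[sick] == y_pred[sick])       # diseased correctly classified
    specificity = np.mean(y_pred[~sick] == healthy_label)     # healthy correctly kept healthy
    icbhi = (sensitivity + specificity) / 2.0                  # assumed ICBHI score definition
    return dict(accuracy=acc, precision=prec, recall=rec, f1=f1,
                sensitivity=sensitivity, specificity=specificity, icbhi=icbhi)
```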
B. Experimental Setup
The proposed CNN model is constructed using Keras with the TensorFlow backend, and trained using NVidia K80 GPUs provided by Kaggle notebooks. A mini-batch training scheme is employed while feeding the image data into the model to tackle the class imbalance issue. This technique oversamples the scarce classes while randomly undersampling the majority class. This strategy ensures that the CNN model takes an equal number of samples from each class during each of the training epochs and thereby forms a balanced training set [26].

The adaptive learning rate optimizer (Adam) with a learning rate of 0.00001 is used for compiling the model. The batch size needs to be a multiple of 6, since an equal number of samples from each of the 3 or 6 data classes is taken in each training and validation batch [26]. In this study, a batch size of 6 has been used for training and validation in both classification schemes.

As stated earlier, both the ternary chronic classification (chronic, non-chronic, healthy) and the six-class (bronchiectasis, bronchiolitis, COPD, healthy, pneumonia, and URTI) pathological classification are carried out in this work. The classification performance of the proposed CNN model is compared with that of VGG16, a well-known CNN architecture for image classification [48], in both classification schemes. It should be noted that the experiments are performed using both the conventional CWT-based scalogram and the hybrid scalogram images. In addition, the performance of our proposed CNN model is compared with a number of well-known DL architectures, such as VGG16 and AlexNet, and several lightweight networks in terms of computational complexity and accuracy.
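A sketch of the balanced mini-batch scheme and the stated optimizer settings is shown below; the generator logic is an illustrative assumption about how equal per-class sampling could be realized, and build_lightweight_cnn refers to the architecture sketch in Section IV.

```python
import numpy as np
import tensorflow as tf

def balanced_batches(images, labels, n_classes, per_class=1):
    """Yield batches containing an equal number of samples from every class.

    With per_class=1 and six classes the batch size is 6, matching the text;
    minority classes are oversampled and the majority class undersampled.
    """
    by_class = [np.where(labels == c)[0] for c in range(n_classes)]
    while True:
        idx = np.concatenate([np.random.choice(ix, per_class) for ix in by_class])
        np.random.shuffle(idx)
        yield images[idx], tf.keras.utils.to_categorical(labels[idx], n_classes)

# Compile with the Adam optimizer and the learning rate stated in the text
model = build_lightweight_cnn(n_classes=6)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(balanced_batches(X_train, y_train, 6), steps_per_epoch=..., epochs=...)
```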
C. Classification Performance of the Proposed Framework

1) Chronic Classification:
From Table II, it can be seen that using the hybrid scalogram method in conjunction with the proposed CNN model classifier yields the best accuracy, 99.21%. However, the corresponding accuracy obtained using VGG16 is quite close (98.89%). Despite being a heavier model, the comparatively lower accuracy of VGG16 can be attributed to over-fitting caused by the limited number of images in the different classes. When comparing the conventional CWT scalogram to the proposed hybrid scalogram, a considerable improvement in accuracy is evident for the latter using both VGG16 and our proposed CNN model (9.5%-11.4%). The corresponding confusion matrices for both models' best results are illustrated in Fig. 5. The results depict that the proposed method is better in ternary chronic classification than VGG16.
TABLE II
SUMMARY OF THE CLASSIFICATION PERFORMANCE. THE RED AND BLUE MARKED VALUES REPRESENT THE HIGHEST ACCURACY OBTAINED WITH OUR PROPOSED CNN MODEL AND VGG16.
(CWT: scalogram using CWT; EMD+CWT: scalogram using EMD and CWT)

                 | Chronic Classification                                    | Pathological Classification
Network          | CWT                        | EMD+CWT                      | CWT                        | EMD+CWT
                 | Prec.  Recall  Acc.  F1    | Prec.  Recall  Acc.   F1     | Prec.  Recall  Acc.  F1    | Prec.  Recall  Acc.   F1
Proposed model   | 95.00  91.00  90.58  92.00 | 99.25  99.20  99.20  99.22   | 87.00  86.00  86.31  86.00 | 99.12  99.05  99.05  98.96
VGG16            | 92.00  89.00  88.58  90.00 | 99.03  98.69  98.69  99.11   | 85.00  85.00  85.80  85.00 | 97.80  97.32  97.32  97.01

TABLE III
COMPARISON OF THE PROPOSED FRAMEWORK WITH EXISTING WORKS USING THE ICBHI 2017 DATASET

Processing                        | Type of Training Network | Number of Prediction Classes        | Acc.   | Spec. | Sen.   | ICBHI Score
Gammatone Spectrogram [40], [41]  | C-RNN                    | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.57  | 0.94   | 0.76
                                  | CNN-MoE                  | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.86  | 0.96   | 0.91
                                  | Ensemble                 | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.71  | 0.95   | 0.83
MFCC [38]                         | CNN                      | 3 (Healthy, Chronic, Non-chronic)   | 0.82   | 0.76  | 0.89   | 0.83
                                  | LSTM                     | 3 (Healthy, Chronic, Non-chronic)   | 0.98   | 0.82  | 0.98   | 0.90
MFCC [39]                         | RNN                      | 6 classes (excluding Asthma, LRTI)  | 0.9567 | -     | 0.9567 | -
Mel-spectrogram + VAE [42]        | CNN                      | 3 (Healthy, Chronic, Non-chronic)   | 0.99   | 0.990 | 0.985  | 0.988
                                  |                          | 6 classes (excluding Asthma, LRTI)  | 0.99   | 0.986 | 0.988  | 0.987
Hybrid Scalogram (Proposed)       | Lightweight CNN          | 3 (Healthy, Chronic, Non-chronic)   |        |       |        |
2) Pathological Classification:
For the six-class pathological classification, the proposed method involving the hybrid scalogram and the proposed CNN model classifier yields the best accuracy, 99.05%, as seen in Table II. Similar to the ternary chronic classification scheme, the accuracy of VGG16 is slightly lower. However, since the dataset becomes more segregated when divided into six different disease classes, the accuracy drop is larger here. The proposed hybrid scalogram outperforms the conventional CWT scalogram by a larger margin (13.4%-14.7%) for both VGG16 and our proposed model. In general, the proposed method gives a better performance, which is apparent from the best confusion matrices shown in Fig. 6.
Fig. 5. Confusion matrices for the best results obtained in ternary chronic classification. (a) Proposed CNN model with batch size 6; (b) VGG16 with batch size 6.

D. Comparison with Other Works

1) Respiratory Disease Classification:
As discussed in Section II, none of the existing works for respiratory disease classification explore the domain of patient-specific prediction. Some of the studies address the issues regarding class imbalance [39], [42]. Nevertheless, the extensive preprocessing, coupled with the ambiguous undersampling of the COPD disease class while oversampling all other disease classes, can complicate the reproducibility of [39]. Furthermore, in [42], the FFT is applied to the entire respiratory sound signals, whereas our work focuses on segmented breath sounds. In our work, complete patient independence has been maintained between the train and validation sets, which is not possible while using the entire lung auscultation signal due to the low number of samples. Therefore, our work aims to overcome the drawbacks present in the existing methods. A comparison among the various methods, including the proposed method, is provided in Table III. It is observed that our proposed CNN model with the hybrid scalogram can perform on par with the existing state-of-the-art CNN and RNN models for both classification cases while maintaining a patient-independent train-validation scheme.
2) Computational Performance as a Lightweight Network:
A detailed comparison is presented in Table IV among VGG16 [48], our proposed CNN model, AlexNet [49], and the existing state-of-the-art lightweight models, such as MobileNetV2 [50], NASNet [51], and ShuffleNetV2 [52], in terms of size, trainable parameters, the number of operations measured by multiply-adds (MAdd), and accuracy for both the chronic and pathological approaches. In terms of accuracy, our proposed CNN model shows better results than VGG16 while requiring only 3% of its parameters. The proposed CNN model also outperforms the contemporary lightweight models ShuffleNet V2, MobileNet V2, and NASNet relatively by 0.16%, 0.32%, and 0.80%, respectively, while obtaining a better trade-off in the number of parameters and requiring significantly lower storage space and computational power. This makes our proposed lightweight model more suitable for real-time wearable devices with faster and less resource-intensive training.

We have calculated the time required for the end-to-end classification of an auscultation sound using our framework. For this experiment, we only performed the preprocessing and inference steps using all of our test data and calculated the mean and standard deviation of the required CPU time. We found that the preprocessing time for EMD+CWT is … s ± … s, while CWT only takes … s ± … s. These processes are run on a Core i7-7500 processor with a 2.70-2.90 GHz speed. The time required for the classification of a scalogram using the proposed network is … s ± … s, while MobileNetV2 takes … s ± … s. Thus, the proposed CNN is faster in classifying a sound image as compared to MobileNetV2.
TABLE IV
COMPARISONS AMONG SEVERAL MODELS FROM A LIGHTWEIGHT PERSPECTIVE

Parameter              | VGG16   | AlexNet  | Proposed model | MobileNet (v2) | ShuffleNet (v2) | NASNet
Size (after training)  | 1.5 GB  | 294 MB   | 44.85 MB       | 49 MB          | 46.9 MB         | 64 MB
Trainable parameters   | 138M    | 25.704M  | 3.7674M        | 4.2M           | 5.4M            | 4.2M
MAdd                   | 154.7G  | 725M     | 371.93M        | 575M           | 564M            | 567M
Accuracy (6 classes)   | 97.60%  | 98.237%  | 99.05%         | 98.89%         | 98.27%          | 98.73%
Accuracy (3 classes)   | 97.60%  | 99.519%  | 99.21%         | 98.72%         | 99.06%          | 98.42%
Fig. 6. Confusion matrices for the best results obtained in six-class pathological classification. (a) Proposed CNN model with batch size 6; (b) VGG16 with batch size 6.

VI. CONCLUSION
In this work, we have proposed a lightweight CNN model to classify respiratory diseases using scalogram images of lung sounds. A hybrid approach employing both EMD and CWT is presented to generate the scalogram images. The publicly available ICBHI 2017 challenge dataset has been used for the chronic and pathological classification of respiratory diseases. The proposed method has provided a considerable accuracy of 99.21% for ternary chronic classification. In pathological classification among six disease classes, an accuracy of 99.05% is achieved. The obtained accuracies are higher than those of VGG16, which is a much larger network. In addition, for both classification cases, the proposed framework provides better or comparable performance with respect to the existing state-of-the-art methods in terms of precision, recall, F1-score, sensitivity, specificity, and ICBHI score. It is worthwhile to mention that, unlike most of these methods, the classification performance of the proposed technique has been assessed keeping the training and testing data independent in terms of patients. The proposed classifier's computational complexity has also been compared with a number of well-known CNN models and state-of-the-art lightweight networks. It has been shown to achieve high classification accuracy while being a lightweight deep architecture. We believe that these attributes can enable the development of automatic classification of respiratory diseases from lung auscultations in real-world clinical applications.

REFERENCES
[2] Global Surveillance, Prevention and Control of Chronic Respiratory Diseases: A Comprehensive Approach. World Health Organization, 2007.
[3] C. D. Mathers and D. Loncar, "Projections of global mortality and burden of disease from 2002 to 2030," PLoS Medicine.
[5] Artificial Intelligence in Medicine, vol. 103, p. 101809, 2020.
[6] A. Abbas and A. Fahim, "An automated computerized auscultation and diagnostic system for pulmonary diseases," Journal of Medical Systems, vol. 34, no. 6, pp. 1149–1155, 2010.
[7] S. Reichert, R. Gass, C. Brandt, and E. Andrès, "Analysis of respiratory sounds: state of the art," Clinical Medicine. Circulatory, Respiratory and Pulmonary Medicine, vol. 2, pp. CCRPM–S530, 2008.
[8] M. Sarkar, I. Madabhavi, N. Niranjan, and M. Dogra, "Auscultation of the respiratory system," Annals of Thoracic Medicine, vol. 10, no. 3, p. 158, 2015.
[9] A. Bohadana, G. Izbicki, and S. S. Kraman, "Fundamentals of lung auscultation," New England Journal of Medicine, vol. 370, no. 8, pp. 744–751, 2014.
[10] M. Bahoura and C. Pelletier, "New parameters for respiratory sound classification," in Canadian Conference on Electrical and Computer Engineering, vol. 3. IEEE, 2003, pp. 1457–1460.
[11] R. Palaniappan, K. Sundaraj, and N. U. Ahamed, "Machine learning in lung sound analysis: a systematic review," Biocybernetics and Biomedical Engineering, vol. 33, no. 3, pp. 129–135, 2013.
[12] J. Zhang, W. Ser, J. Yu, and T. Zhang, "A novel wheeze detection method for wearable monitoring systems," IEEE, 2009, pp. 331–334.
[13] M. Bahoura, "Pattern recognition methods applied to respiratory sounds classification into normal and wheeze classes," Computers in Biology and Medicine, vol. 39, no. 9, pp. 824–843, 2009.
[14] J. Acharya, A. Basu, and W. Ser, "Feature extraction techniques for low-power ambulatory wheeze detection wearables," IEEE, 2017, pp. 4574–4577.
[15] N. Gautam and S. B. Pokle, "Wavelet scalogram analysis of phonopulmonographic signals," International Journal of Medical Engineering and Informatics, vol. 5, no. 3, pp. 245–252, 2013.
[16] G. Serbes, C. O. Sakar, Y. P. Kahya, and N. Aydin, "Pulmonary crackle detection using time–frequency and time–scale analysis," Digital Signal Processing, vol. 23, no. 3, pp. 1012–1021, 2013.
[17] S. İçer and Ş. Gengeç, "Classification and analysis of non-stationary characteristics of crackle and rhonchus lung adventitious sounds," Digital Signal Processing, vol. 28, pp. 18–27, 2014.
[18] F. Jin, F. Sattar, and D. Y. Goh, "New approaches for spectro-temporal feature extraction with applications to respiratory sound classification," Neurocomputing, vol. 123, pp. 362–371, 2014.
[19] P. Bokov, B. Mahut, P. Flaud, and C. Delclaux, "Wheezing recognition algorithm using recordings of respiratory sounds at the mouth in a pediatric population," Computers in Biology and Medicine, vol. 70, pp. 40–50, 2016.
[20] P. Mayorga, C. Druzgalski, R. Morelos, O. Gonzalez, and J. Vidales, "Acoustics based assessment of respiratory diseases using gmm classification," IEEE, 2010, pp. 6312–6316.
[21] T. R. Fenton, H. Pasterkamp, A. Tal, and V. Chernick, "Automated spectral characterization of wheezing in asthmatic children," IEEE Transactions on Biomedical Engineering, vol. 32, no. 1, pp. 50–55, 1985.
[22] H. Pasterkamp, S. S. Kraman, and G. R. Wodicka, "Respiratory sounds: advances beyond the stethoscope," American Journal of Respiratory and Critical Care Medicine, vol. 156, no. 3, pp. 974–987, 1997.
[23] Z. Dokur, "Respiratory sound classification by using an incremental supervised neural network," Pattern Analysis and Applications, vol. 12, no. 4, p. 309, 2009.
[24] S. Rietveld, M. Oud, and E. H. Dooijes, "Classification of asthmatic breath sounds: preliminary results of the classifying capacity of human examiners versus artificial neural networks," Computers and Biomedical Research, vol. 32, no. 5, pp. 440–448, 1999.
[25] B. Bozkurt, I. Germanakis, and Y. Stylianou, "A study of time-frequency features for cnn-based automatic heart sound classification for pathology detection," Computers in Biology and Medicine, vol. 100, pp. 132–143, 2018.
[26] A. I. Humayun, S. Ghaffarzadegan, M. I. Ansari, Z. Feng, and T. Hasan, "Towards domain invariant heart sound abnormality detection using learnable filterbanks," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 8, pp. 2189–2198, 2020.
[27] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, and H. Adeli, "Deep convolutional neural network for the automated detection and diagnosis of seizure using eeg signals," Computers in Biology and Medicine, vol. 100, pp. 270–278, 2018.
[28] S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, et al., "Cnn architectures for large-scale audio classification," IEEE, 2017, pp. 131–135.
[29] S. Debbal and F. Bereksi-Reguig, "Analysis of the second heart sound using continuous wavelet transform," Journal of Medical Engineering & Technology, vol. 28, no. 4, pp. 151–156, 2004.
[30] A. Meintjes, A. Lowe, and M. Legget, "Fundamental heart sound classification using the continuous wavelet transform and convolutional neural networks," IEEE, 2018, pp. 409–412.
[31] K. Minami, H. Lu, H. Kim, S. Mabu, Y. Hirano, and S. Kido, "Automatic classification of large-scale respiratory sound dataset based on convolutional neural network," IEEE, 2019, pp. 804–807.
[32] M. Aykanat, Ö. Kılıç, B. Kurt, and S. Saryal, "Classification of lung sounds using convolutional neural networks," EURASIP Journal on Image and Video Processing, vol. 2017, no. 1, p. 65, 2017.
[33] F. Demir, A. Sengur, and V. Bajaj, "Convolutional neural networks based efficient approach for classification of lung diseases," Health Information Science and Systems, vol. 8, no. 1, p. 4, 2020.
[34] R. Liu, S. Cai, K. Zhang, and N. Hu, "Detection of adventitious respiratory sounds based on convolutional neural network," IEEE, 2019, pp. 298–303.
[35] D. Bardou, K. Zhang, and S. M. Ahmad, "Lung sounds classification using convolutional neural networks," Artificial Intelligence in Medicine, vol. 88, pp. 58–69, 2018.
[36] J. Acharya and A. Basu, "Deep neural network for respiratory sound classification in wearable devices enabled by patient specific model tuning," IEEE Transactions on Biomedical Circuits and Systems, vol. 14, no. 3, pp. 535–544, 2020.
[37] D. Perna, "Convolutional neural networks learning from respiratory data," IEEE, 2018, pp. 2109–2113.
[38] D. Perna and A. Tagarelli, "Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks," IEEE, 2019, pp. 50–55.
[39] V. Basu and S. Rana, "Respiratory diseases recognition through respiratory sound with the help of deep neural network," IEEE, 2020, pp. 1–6.
[40] L. Pham, I. McLoughlin, H. Phan, M. Tran, T. Nguyen, and R. Palaniappan, "Robust deep learning framework for predicting respiratory anomalies and diseases," arXiv preprint arXiv:2002.03894, 2020.
[41] L. Pham, "Predicting respiratory anomalies and diseases using deep learning models," arXiv preprint arXiv:2004.04072, 2020.
[42] M. T. García-Ordás, J. A. Benítez-Andrades, I. García-Rodríguez, C. Benavides, and H. Alaiz-Moretón, "Detecting respiratory pathologies using convolutional neural networks and variational autoencoders for unbalancing data," Sensors, vol. 20, no. 4, p. 1214, 2020.
[43] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[44] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869–6898, 2017.
[45] S. Kiranyaz, T. Ince, R. Hamila, and M. Gabbouj, "Convolutional neural networks for patient-specific ecg classification," IEEE, 2015, pp. 2608–2611.
[46] N. U. Maheswari, A. Kabilan, and R. Venkatesh, "Speaker independent speech recognition system based on phoneme identification," IEEE, 2008, pp. 1–6.
[47] "ICBHI 2017 Challenge," 2017. [Online]. Available: https://bhichallenge.med.auth.gr/
[48] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[49] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[50] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[51] X. Qin and Z. Wang, "Nasnet: A neuron attention stage-by-stage net for single image deraining," arXiv preprint arXiv:1912.03151, 2019.
[52] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "Shufflenet v2: Practical guidelines for efficient cnn architecture design," in European Conference on Computer Vision (ECCV), 2018, pp. 116–131.
[53] B. Rocha, D. Filos, L. Mendes, I. Vogiatzis, E. Perantoni, E. Kaimakamis, P. Natsiavas, A. Oliveira, C. Jácome, A. Marques, et al., "A respiratory sound database for the development of automated classification," in International Conference on Biomedical and Health Informatics. Springer, 2017, pp. 33–37.
[54] N. Ibtehaz, M. S. Rahman, and M. S. Rahman, "Vfpred: A fusion of signal processing and machine learning techniques in detecting ventricular fibrillation from ecg signals," Biomedical Signal Processing and Control, vol. 49, pp. 349–359, 2019.
[55] M. Altuve, L. Suárez, and J. Ardila, "Fundamental heart sounds analysis using improved complete ensemble emd with adaptive noise," Biocybernetics and Biomedical Engineering, vol. 40, no. 1, pp. 426–439, 2020.
[56] Z. Ren, K. Qian, Y. Wang, Z. Zhang, V. Pandit, A. Baird, and B. Schuller, "Deep scalogram representations for acoustic scene classification," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 3, pp. 662–669, 2018.
[57] R. Fontugne, J. Ortiz, D. Culler, and H. Esaki, "Empirical mode decomposition for intrinsic-relationship extraction in large sensor deployments," in Workshop on Internet of Things Applications, IoT-App, vol. 12, 2012.
[58] D. Boutana, M. Benidir, and B. Barkat, "Segmentation and time-frequency analysis of pathological heart sound signals using the emd method," IEEE, 2014, pp. 1437–1441.
[59] F. Y. Shih and H. Patel, "Deep learning classification on optical coherence tomography retina images," International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 08, p. 2052002, 2019.
[60] S. Y. Nikouei, Y. Chen, S. Song, R. Xu, B.-Y. Choi, and T. R. Faughnan, "Real-time human detection as an edge service enabled by a lightweight cnn," IEEE, 2018, pp. 125–129.
[61] B. Lim, B. Yang, and H. Kim, "Real-time lightweight cnn for detecting road object of various size," in 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).