A Lightweight CNN Model for Detecting Respiratory Diseases from Lung Auscultation Sounds using EMD-CWT-based Hybrid Scalogram
Samiul Based Shuvo, Shams Nafisa Ali, Soham Irtiza Swapnil, Taufiq Hasan, Mohammed Imamul Hassan Bhuiyan
Abstract—Listening to lung sounds through auscultation is vital in examining the respiratory system for abnormalities. Automated analysis of lung auscultation sounds can be beneficial to health systems in low-resource settings where there is a lack of skilled physicians. In this work, we propose a lightweight convolutional neural network (CNN) architecture to classify respiratory diseases using hybrid scalogram-based features of lung sounds. The hybrid scalogram features utilize the empirical mode decomposition (EMD) and the continuous wavelet transform (CWT). The proposed scheme's performance is studied using a patient-independent train-validation set from the publicly available ICBHI 2017 lung sound dataset. Employing the proposed framework, weighted accuracy scores of 99.20% for ternary chronic classification and 99.05% for six-class pathological classification are achieved, which outperform the well-known and much larger VGG16 in terms of accuracy by 0.52% and 1.77%, respectively. The proposed CNN model also outperforms other contemporary lightweight models while being computationally comparable.
Index Terms—Lung auscultation sound, respiratory disease detection, lightweight convolutional neural networks, empirical mode decomposition, continuous wavelet transform, scalogram.
Samiul Based Shuvo, Shams Nafisa Ali, Soham Irtiza Swapnil and Taufiq Hasan are with the Department of Biomedical Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka-1205, Bangladesh. Email: {sbshuvo.bme.buet, snafisa.bme.buet, swapnil.buetbme}@gmail.com, taufi[email protected]. Mohammed Imamul Hassan Bhuiyan is with the Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh. Email: [email protected]. The authors share first authorship on, and contributed equally to, this work. Manuscript received August XX, 20XX; revised September XX, 20XX.

I. INTRODUCTION

Lung diseases are the third largest cause of death in the world [1]. According to the World Health Organization (WHO), the five major respiratory diseases [2], namely chronic obstructive pulmonary disease (COPD), tuberculosis, acute lower respiratory tract infection (LRTI), asthma, and lung cancer, cause the death of more than 3 million people each year worldwide [3], [4]. These respiratory diseases severely burden the overall healthcare system and adversely affect the lives of the general population. Prevention, early diagnosis, and treatment are considered key factors for limiting the negative impact of these deadly diseases.

Auscultation of the lung using a stethoscope is the traditional diagnostic method used by specialists and general practitioners for the initial investigation of the respiratory system. Although physicians use various other investigation strategies such as plethysmography, spirometry, and arterial blood gas analysis, lung sound auscultation remains a vital tool for physicians due to its simplicity and low cost [5]. The primary classification of these non-periodic and non-stationary sounds consists of two groups: normal (vesicular) and abnormal (adventitious) [6]. The first group is observed when there are no respiratory diseases, while the latter group indicates complications in the lungs or airways [7]. Crackle, wheeze, rhonchus, squawk, stridor, and pleural rub are the commonly known abnormal lung sounds. These anomalies can be differentiated from the normal lung sounds on the basis of frequency, pitch, energy, intensity, timbre, and musicality [8], [9]. Therefore, lung sounds are of particular importance for recognizing specific respiratory diseases and assessing their chronic or non-chronic characteristics. However, distinguishing the subtle differences between some of the adventitious lung sound classes can be a strenuous task even for a specialist and may introduce subjectivity in the diagnostic interpretation [10]. In this scenario, artificial intelligence (AI)-powered algorithms can be of benefit in automatically interpreting lung sounds, especially in underdeveloped regions of the world with a scarcity of skilled physicians.

In the past decade, a number of research approaches have been considered and evaluated for the automatic identification of respiratory anomalies from lung auscultation sounds. Numerous feature extraction techniques, including statistical features [11], entropy-based features [12], wavelet coefficients [13], Mel-frequency cepstral coefficients (MFCC) [10], spectrograms [14], scalograms [15], etc., have been adopted in conjunction with a diverse set of machine learning (ML) algorithms [10]–[24]. With the advent of deep learning (DL), new developments have been made in recent times, demonstrating highly promising results in diversified applications, including biomedical engineering and clinical diagnostics [25]–[30].
With the ability of automatic feature learning, deep learning (DL) approaches are more generic and can mitigate the limitations of traditional ML-based methods. In the same vein, DL-based paradigms employed in recent years for the identification of respiratory anomalies and pathologies from lung auscultation data have exhibited highly promising performance [5], [31]–[42]. However, for attaining proper functionality, deep networks require an extensive training scheme with a large training dataset, which subsequently calls for a considerable amount of time and powerful computational resources. As a result, it becomes quite challenging to incorporate deep learning frameworks into currently available wearable devices and mobile platforms. In order to reduce the number of parameters of these networks, various methods have been investigated, including weight quantization [36], lightweight networks [43], and low-precision computation [44].

Fig. 1. A graphical overview of the proposed framework. After several generic preprocessing steps, the lung sound signals are converted into scalograms using both conventional and hybrid approaches. The resulting images are further augmented and fed into the proposed lightweight CNN model to carry out a two-way classification of respiratory diseases: (i) chronic and (ii) pathological.

While constructing AI-assisted automated medical diagnosis frameworks, patient specificity in the train and validation datasets should be considered a salient factor to produce reliable results for unseen patient data, especially for chronic diseases [45], [46]. Due to the sparse nature of the available medical data, this factor is often neglected in the existing literature. The random adoption of an 80%-20% or any other train-validation split of the dataset corrupts most of the works with intra-patient dependency, and ultimately the obtained results are not consistent and generalizable for a new patient [36]. Although a patient-independent division requires additional time and effort, the achieved results are more generalizable and representative of real-world scenarios.

In this work, a lightweight CNN architecture is proposed to perform respiratory disease classification (ternary chronic classification and six-class pathology classification) utilizing the ICBHI 2017 scientific challenge respiratory sound database [47] while maintaining a patient-independent train-validation splitting strategy. A hybrid approach for obtaining scalograms from respiratory sound signals is presented, wherein the continuous wavelet transform (CWT) is performed only on the maximally correlated intrinsic mode function (IMF) of the empirically decomposed (EMD) respiratory sound signals. The class discrimination capability of the hybrid scalogram is evaluated with respect to the CWT-based conventional scalogram. Subsequently, along with the proposed CNN model, complex CNN models such as VGG16 [48] and AlexNet [49] and several contemporary lightweight architectures, including MobileNet V2 [50], NASNet [51] and ShuffleNet V2 [52], are used for classifying the scalogram images to detect respiratory diseases in different categories. A comparative study among the proposed CNN model and the others is presented in terms of detection performance and lightweight characteristics.

The rest of the paper is organized as follows. Previous studies related to lung sound classification using different ML-based approaches are discussed in Section II. Section III describes the dataset and the feature extraction process, and Section IV presents the proposed lightweight CNN model. The experimental setup, results, and comparison with other works are discussed in Section V. Finally, the concluding remarks are provided in Section VI.
II. RELATED WORKS
Many research works employing machine learning and deep learning have been reported on developing automated systems for respiratory sound classification. However, the majority of the works have focused on respiratory anomaly prediction, basically classifying the lung sounds as wheeze or crackles [10]–[24], [31]–[36], rather than directly predicting respiratory diseases from lung auscultation recordings. The few works geared towards pathology classification are very recent and mostly involve elaborate processing or dedicated CNN and RNN frameworks due to the inherent complexity of the signal [37]–[42]. At the pathology level, the classification task has so far been investigated at three different resolutions: binary classification (healthy, pathological) [37], [38], ternary chronic classification (healthy, chronic disease, non-chronic disease) [38], [42], and multi-class distinct disease classification [39], [42]. Among the diseases, upper and lower respiratory tract infection (URTI and LRTI), bronchiolitis and pneumonia have been included in the non-chronic disease class, while COPD, asthma and bronchiectasis have been combined to form the chronic class [38].

In [37], a novel CNN-based ternary classification approach has been implemented and performed considerably well with 82% accuracy and an 88% ICBHI score. Later, the same authors proposed a Mel-frequency cepstral coefficient (MFCC) and long short-term memory (LSTM) based framework capable of conducting both binary and ternary classification of respiratory diseases [38], which demonstrated excellent performance with 99% and 98% accuracy, respectively. Another work involving a complex RNN architecture and extensive preprocessing has reported an accuracy of 95.67 ± 0.77% in predicting six pathology-driven disease classes [39]. However, by employing a CRNN network with a CNN-Mixture-of-Experts (MoE) baseline to learn both spatial and time-sequential features
from the spectrograms, a recent work has achieved a specificity of 83% and a sensitivity of 96% in ternary respiratory disease classification [40]. For binary classification, the same work has reported specificity and sensitivity of 83% and 99%, respectively. As an extension of [40], a separate study involving robust Teacher-Student learning schemes with knowledge distillation has been conducted, which resulted in a substantially reduced specificity while maintaining the sensitivity [41].

Since the existing heavily imbalanced datasets of lung auscultations further exacerbate the task of respiratory disease classification, a contemporary study has dealt with this issue by experimenting with several data augmentation techniques, such as SMOTE, the Adaptive Synthetic Sampling Method (ADASYN) and the variational autoencoder (VAE) [42]. Among these methods, the VAE-based Mel-spectrogram augmentation strategy, in conjunction with a CNN model, has achieved the best results with 98.5% sensitivity and 99.0% specificity in ternary chronic classification. The strategy has also exhibited an equally sophisticated performance with 98.8% sensitivity and 98.6% specificity in the case of six-class respiratory disease classification [42].

Although the scope of DL-based frameworks with a spectrogram-based feature extraction strategy has been investigated in several works for direct classification of respiratory diseases from lung auscultations [40]–[42], to the best of the authors' knowledge, scalogram-based approaches have not been explored in this domain. Additionally, no dedicated lightweight, efficient CNN framework has been developed and investigated for the respiratory disease classification task. Furthermore, none of the studies consider the issue of intra-patient dependency in the train-validation split. Inspired by all of these factors, a scalogram-based approach in conjunction with a lightweight CNN is proposed in this paper for the prediction of respiratory diseases from lung auscultations, maintaining patient independence. The proposed framework is schematically represented in Fig. 1.
III. MATERIALS AND METHODS
A. ICBHI 2017 Dataset
The ICBHI (International Conference on Biomedical Health Informatics) 2017 database is a publicly available benchmark dataset of lung auscultations [47]. It was collected by two independent research teams from Portugal and Greece. The dataset contains 5.5 hours of audio recordings sampled at different frequencies (4 kHz, 10 kHz, and 44.1 kHz) and ranging in length from 10 s to 90 s, in 920 audio samples of 126 subjects, recorded from different anatomical positions with heterogeneous equipment [53]. The samples are professionally annotated considering two schemes: 1) according to the corresponding patient's pathological condition, i.e., healthy and seven distinct disease classes, namely pneumonia, bronchiectasis, COPD, URTI, LRTI, bronchiolitis, and asthma, and 2) according to the presence of respiratory anomalies, i.e., crackles and wheezes, in each respiratory cycle. Further details about the dataset and the data collection methods can be found in [53].
B. Data Preprocessing

1) Noise filtering:
Since 50 Hz to 2500 Hz is the acknowledged frequency range of lung auscultation signals [7], the recorded audio signals are filtered with a 6th-order Butterworth bandpass filter, thus retaining the 50 Hz to 2500 Hz frequency components. Subsequently, all the sample signals are resampled to 22050 Hz for ensuring consistency and normalized to the range [-1, 1] for attaining device homogeneity.
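A minimal Python sketch of these preprocessing steps is given below. The paper's own pipeline is MATLAB-based, so soundfile and scipy are assumed stand-ins rather than the authors' exact code; the 6th-order Butterworth filter, the 50–2500 Hz band, the 22050 Hz target rate, and the [-1, 1] normalization follow the description above.

```python
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess_recording(path, low_hz=50.0, high_hz=2500.0, target_fs=22050):
    """Band-pass filter, resample, and normalize one lung sound recording.

    A sketch of the steps in Section III-B, not the authors' exact code.
    """
    x, fs = sf.read(path)
    if x.ndim > 1:
        x = x.mean(axis=1)                 # collapse stereo to mono

    # 6th-order Butterworth band-pass keeping the 50-2500 Hz lung sound band.
    # For 4 kHz recordings the 2500 Hz edge exceeds Nyquist, so it is capped.
    high = min(high_hz, 0.45 * fs)
    sos = butter(6, [low_hz, high], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)

    # Resample every recording to a common 22,050 Hz rate
    x = resample_poly(x, target_fs, int(fs))

    # Normalize amplitudes to [-1, 1] for device homogeneity
    x = x / (np.max(np.abs(x)) + 1e-12)
    return x, target_fs
```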
2) Segmentation of the sound data:
Each of the audio recordings is segmented according to the annotated respiratory cycle timing, with a 6 s duration for each segment. Samples with a minimum respiratory cycle duration of 3 s are taken into account to obtain useful respiratory sound information [40]. After performing this procedure, two of the disease classes, namely asthma and LRTI, are found to have inadequate segmented samples for meaningful feature extraction; therefore, these two classes are not considered in our study. After these procedures, lung auscultation sounds from 87 out of 120 independent patients are usable. Table I presents the data distribution at several levels of processing corresponding to the disease classes considered in this study.
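The segmentation step can be sketched as follows, assuming the respiratory-cycle start and end times have already been parsed from the ICBHI annotation files; the 6 s window, the 3 s minimum cycle duration, and zero-padding of short tails are interpretations of the text, and the exact windowing used by the authors may differ.

```python
import numpy as np

def segment_cycles(x, fs, cycles, seg_len_s=6.0, min_cycle_s=3.0):
    """Cut fixed 6 s segments around annotated respiratory cycles.

    `cycles` is a list of (start_s, end_s) pairs from an ICBHI annotation
    file; cycles shorter than 3 s are skipped, following Section III-B.
    """
    seg_len = int(seg_len_s * fs)
    segments = []
    for start_s, end_s in cycles:
        if end_s - start_s < min_cycle_s:
            continue                              # too short to be informative
        start = int(start_s * fs)
        seg = x[start:start + seg_len]
        if len(seg) < seg_len:                    # pad the tail with zeros
            seg = np.pad(seg, (0, seg_len - len(seg)))
        segments.append(seg)
    return np.stack(segments) if segments else np.empty((0, seg_len))
```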
TABLE I
DISTRIBUTION OF DATA AT DIFFERENT PROCESSING LEVELS CORRESPONDING TO THE DISEASE CLASSES

Disease Name    | No. of Unsegmented Sound Files | No. of Segmented and Filtered Samples | No. of Unique Patients | No. of Generalized Augmented Images
Pneumonia       | 37                             | 41                                    | 3                      | 164
Bronchiectasis  | 16                             | 55                                    | 6                      | 220
COPD            | 793                            | 1,963                                 | 51                     | 1,963
Healthy         | 35                             | 42                                    | 13                     | 168
URTI            | 23                             | 21                                    | 8                      | 84
Bronchiolitis   | 13                             | 65                                    | 6                      | 260
Total           | 917                            | 2,187                                 | 87                     | 2,859
C. Feature Extraction

1) Empirical Mode Decomposition (EMD):
EMD is a powerful self-adaptive signal decomposition method, especially in the time-scale and energy-distribution aspects, and is highly suitable for the analysis and processing of non-linear and non-stationary signals such as lung sounds and heart sounds [54]. It decomposes a given signal x(t) into a finite set of N intrinsic mode functions, IMF_1(t), IMF_2(t), ..., IMF_N(t), depending on the local characteristic time scale of the signal, with a view to expressing the original signal as the sum of all its IMFs plus a final trend, either monotonic or constant, called the residue r(t) [55]:

x(t) = \sum_{i=1}^{N} \mathrm{IMF}_i(t) + r(t).

An IMF is a simple oscillatory function with an equal number of extrema and zero crossings, and its envelopes must be symmetric with respect to zero. Thus, the EMD detrends a signal and elicits underlying spectral patterns [54].
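A short sketch of this decomposition is shown below. The paper uses MATLAB's emd function; PyEMD is an assumed Python substitute, and the 9-IMF limit follows the hybrid scalogram description later in this section.

```python
import numpy as np
from PyEMD import EMD   # pip install EMD-signal; a stand-in for MATLAB's emd()

def decompose_emd(x, max_imfs=9):
    """Decompose a lung sound segment into IMFs and a residue.

    Returns an (N, len(x)) array of IMFs and the residue r(t), so that
    x(t) = sum_i IMF_i(t) + r(t) as in Section III-C.
    """
    imfs = EMD()(x, max_imf=max_imfs)
    residue = x - imfs.sum(axis=0)        # whatever is left after the IMFs
    return imfs, residue
```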
2) Continuous Wavelet Transform (CWT):
The wavelet transform is defined as a signal processing method that can decompose a signal into an orthonormal wavelet basis or into a set of independent frequency channels [15], [29]. Using a basis function, i.e., the mother wavelet g(t), and its scaled and dilated versions, the continuous wavelet transform (CWT) can be used to decompose a finite-energy signal x(t) as [30]:

Z(a, b) = \int x(t) \, g^{*}\!\left(\frac{t - b}{a}\right) dt,    (1)

where b denotes the time location and a is the scale factor. Larger scale values reveal low-frequency information, while smaller scale values reveal high-frequency information [29]. The squared modulus of the CWT coefficients Z is known as the scalogram [15].

D. Scalogram Representations

1) Conventional Scalogram:
The scalogram is defined as the time-frequency representation of a signal that depicts the energy density obtained using the CWT [5], [56]. The segmented and filtered lung sound samples are decomposed into the corresponding wavelet coefficients in MATLAB 2020a using the Morse analytic wavelet. Scalogram plots are generated with a resolution of 224 × 224 using these coefficients. Fig. 2 shows the scalograms of lung sounds in different disease categories.
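The sketch below renders such a scalogram image in Python. The paper uses MATLAB's Morse analytic wavelet; PyWavelets does not provide Morse wavelets, so a complex Morlet ('cmor') is used here as an assumed stand-in, and the scale grid is illustrative.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

def save_scalogram(x, fs, out_path, n_scales=128, wavelet="cmor1.5-1.0", cmap="jet"):
    """Save the CWT scalogram |Z(a, b)|^2 of a segment as a 224 x 224 image.

    The Morse wavelet of the paper is replaced by a complex Morlet here,
    which is an assumption rather than the authors' exact configuration.
    """
    scales = np.geomspace(2, 256, num=n_scales)                # coarse-to-fine scales
    coeffs, _ = pywt.cwt(x, scales, wavelet, sampling_period=1.0 / fs)
    scalogram = np.abs(coeffs) ** 2                            # squared modulus of Z(a, b)

    fig = plt.figure(figsize=(2.24, 2.24), dpi=100)            # 224 x 224 pixels
    ax = fig.add_axes([0, 0, 1, 1])                            # no margins or axes
    ax.set_axis_off()
    ax.imshow(scalogram, aspect="auto", cmap=cmap)
    fig.savefig(out_path, dpi=100)
    plt.close(fig)
```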
2) Hybrid Approach for Scalogram:
For each segmented and filtered sample under each pathological class, 9 IMFs are generated using the EMD function in MATLAB 2020a. Based on the cross-correlation between the source signal and the IMFs, the most physically significant IMF, i.e., the one with the highest correlation coefficient, is determined [57], [58]. Subsequently, the squared modulus of the CWT of the corresponding IMF is calculated to obtain the scalogram.

The diverse frequency bands, varying from the maximum to the minimum range, give the IMFs the capability to extract the temporal and spectral information effectively [55]. Hence, when this IMF-based scheme is combined with the CWT-oriented scalogram representation, the newly formed hybrid scalograms can exhibit more discriminative and significant features. Thus, they have the potential to provide better classification performance with a CNN model. The box plots of the scalograms of lung sounds for various respiratory diseases are shown in Fig. 3. The distinction among the plots is more evident when using the hybrid approach than for the conventional scalograms obtained using only the CWT.

It should be mentioned that the proposed scalogram is distinctly different from those of [5], [58] in that the CWT modulus is computed from the maximally correlated IMF, thus providing a better representation of the underlying information. Note that the works of [5] and [58] are on detecting respiratory anomalies such as crackle and wheeze, and on the analysis and segmentation of heart sounds, respectively, whereas our objective is to detect respiratory diseases from the lung auscultations.
Fig. 2. Scalograms of the lung auscultation sounds for 6 disease classes: lung sound recordings (1st row), conventional scalograms (2nd row) and scalograms using the proposed hybrid approach (3rd row).
Fig. 3. Box plots. (a) Scalogram using the conventional CWT approach; (b) scalogram using the hybrid approach.
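Combining the two previous sketches, the hybrid scalogram construction described above can be outlined as follows. The choice of the absolute Pearson correlation coefficient as the "cross-correlation" measure is an interpretation of the text, and the helpers decompose_emd and save_scalogram are the illustrative functions sketched earlier in this section.

```python
import numpy as np

def select_dominant_imf(x, imfs):
    """Return the IMF most correlated with the source signal x(t).

    The absolute Pearson correlation is used here; the exact correlation
    measure behind "maximally correlated IMF" is an assumption.
    """
    corrs = [abs(np.corrcoef(x, imf)[0, 1]) for imf in imfs]
    return imfs[int(np.argmax(corrs))]

def hybrid_scalogram(x, fs, out_path):
    """EMD-CWT hybrid scalogram: the CWT is applied only to the dominant IMF."""
    imfs, _ = decompose_emd(x)               # EMD sketch from Section III-C
    best_imf = select_dominant_imf(x, imfs)
    save_scalogram(best_imf, fs, out_path)   # CWT scalogram sketch from above
```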
E. Augmentation
The ICBHI 2017 dataset is highly imbalanced, with around 86% of the data belonging to COPD. Image augmentation using different color mapping schemes is employed to oversample the less represented classes and address the data imbalance issue [59]. Colormaps are three-column arrays containing RGB triplets, where each row defines a distinct color. Scalogram representation using different colormaps helps generalize the produced images.

From each of the audio samples of the less represented data classes, four scalograms are generated for each segmented sample using four different color mapping schemes available in MATLAB 2020a: Parula, HSV, Jet, and Hot. For the most represented class, COPD, only one image is produced from each audio sample. Nevertheless, to ensure generalization and homogeneity of the augmented data, all four color mapping schemes are utilized randomly for COPD. A summary of the segmented audio files and the final augmented scalogram images with the corresponding disease classes is presented in Table I.
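A small sketch of this colormap-based oversampling is shown below. MATLAB's Parula colormap has no exact matplotlib equivalent, so 'viridis' is used as a rough stand-in; HSV, Jet, and Hot are available under the same names.

```python
import random
import matplotlib.pyplot as plt

# Parula is MATLAB-specific; 'viridis' is used here as an assumed stand-in.
COLORMAPS = ["viridis", "hsv", "jet", "hot"]

def augment_with_colormaps(scalogram, out_prefix, minority=True):
    """Save a scalogram under several colormaps to oversample minority classes.

    Minority classes get one image per colormap (4x oversampling); the
    majority class (COPD) gets one image under a randomly chosen colormap.
    """
    cmaps = COLORMAPS if minority else [random.choice(COLORMAPS)]
    for cmap in cmaps:
        plt.imsave(f"{out_prefix}_{cmap}.png", scalogram, cmap=cmap)
```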
IV. PROPOSED LIGHTWEIGHT CNN ARCHITECTURE
CNN has become a popular approach for classifying image data, and recently there have been several works using CNNs to classify images produced from sounds [5], [31], [33]. However, due to memory constraints, a regular deep CNN model is computationally expensive with its large number of learnable parameters and arithmetic operations. Thus, it is not suitable for embedded devices, as they cannot afford the processing complexity and the storage space for the parameters and filter weights [36]. Cloud computing methodology requires higher RAM for this computationally intensive training, and hence the training is outsourced [60]. For this reason, lightweight CNN models are gaining popularity among researchers for their faster performance and compact size, without compromising much of the accuracy achieved by the well-known deep learning networks [61].

The architecture of the proposed CNN model consists of an input layer corresponding to a 3-channel input of 224 × 224 images. The architecture of the proposed model is illustrated in Fig. 4.
Fig. 4. The detailed architecture of the proposed lightweight CNN model.
The first convolutional layer uses 64 output filters with a 5 × 5 kernel, followed by a 2 × 2 max-pooling layer. Three additional convolutional layers are stacked over the first layer, each having a 3 × 3 kernel with 64, 96, and 96 filters sequentially, together with corresponding batch-normalization and max-pooling layers using a 2 × 2 pooling window. The outputs from these layers are flattened and connected to five pairs of fully connected (FC) and dropout layers, followed by a SoftMax output layer with one probability node for each class. The ReLU activation function is applied within the convolutional and fully connected layers to introduce nonlinearity and reduce the convergence time; it is not activated for negative values. Max pooling is used after the ReLU activation; it reduces the spatial dimensionality of the extracted feature maps, extracts the most important features, and is unaffected by locational bias [35]. To cope with the diverse data variance, a batch normalization layer after every convolutional layer normalizes the extracted features. It gives the network representative power with a small number of parameters and faster training capability by reducing the variance.
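A Keras sketch of the described topology is given below. The filter counts, kernel sizes, and pooling windows follow the text and Fig. 4; the fully connected layer widths, the dropout rate, and the 'same' padding are not specified here and are therefore illustrative assumptions, not the authors' exact values.

```python
from tensorflow.keras import layers, models

def build_lightweight_cnn(n_classes, fc_units=(128, 64, 64, 32, 32), dropout=0.3):
    """Sketch of the proposed lightweight CNN (Section IV, Fig. 4).

    Dense-layer widths and dropout rate are illustrative assumptions.
    """
    model = models.Sequential()
    model.add(layers.Input(shape=(224, 224, 3)))       # 3-channel scalogram image

    # First convolutional block: 64 filters, 5x5 kernel, 2x2 max pooling
    model.add(layers.Conv2D(64, (5, 5), activation="relu", padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))

    # Three additional 3x3 convolutional blocks with 64, 96 and 96 filters
    for filters in (64, 96, 96):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))

    model.add(layers.Flatten())

    # Five fully connected / dropout pairs followed by a softmax output
    for units in fc_units:
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model
```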
V. EXPERIMENTAL RESULTS
A. Evaluation Criteria
The augmented image sets are divided into 80% training and 20% validation parts for training and fine-tuning the model hyperparameters. Patient uniqueness, a critical aspect in real-world applications, is maintained while dividing the data into training and validation parts, as speaker dependency results in biased accuracy [46]. The classifier models' performance is evaluated based on the well-known evaluation metrics, namely accuracy, recall (sensitivity), precision, and F1-score. Additionally, specificity and the ICBHI score [38], [53], a dedicated metric involving both sensitivity and specificity for assessing frameworks on the ICBHI dataset, are used to evaluate the performance of our method.
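These metrics can be computed from the validation predictions as sketched below. The ICBHI score is taken here as the average of sensitivity and specificity, and sensitivity/specificity are computed over the non-healthy/healthy samples respectively; both are assumed readings of the metric definitions in [38], [53] rather than a verified re-implementation.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def evaluate(y_true, y_pred, healthy_label=0):
    """Accuracy, precision, recall, F1, sensitivity, specificity, ICBHI score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = np.mean(y_true == y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)

    sick = y_true != healthy_label
    sensitivity = np.mean(y_true[sick] == y_pred[sick])       # diseased correctly classified
    specificity = np.mean(y_pred[~sick] == healthy_label)     # healthy correctly kept healthy
    icbhi = (sensitivity + specificity) / 2.0                  # assumed ICBHI score definition
    return dict(accuracy=acc, precision=prec, recall=rec, f1=f1,
                sensitivity=sensitivity, specificity=specificity, icbhi=icbhi)
```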
B. Experimental Setup
The proposed CNN model is constructed using Keras with the TensorFlow backend, and trained using NVidia K80 GPUs provided by Kaggle notebooks. A mini-batch training scheme is employed while feeding the image data into the model to tackle the class imbalance issue. This technique oversamples the scarce classes while randomly undersampling the majority class. This strategy ensures that the CNN model takes an equal number of samples from each class during each of the training epochs and thereby forms a balanced training set [26].

The adaptive learning rate optimizer (Adam) with a learning rate of 0.00001 is used for compiling the model. The batch size needs to be a multiple of 6, since an equal number of samples from each of the 3 or 6 data classes is taken in each training and validation batch [26]. In this study, a batch size of 6 has been used for training and validation in both classification schemes.

As stated earlier, both the ternary chronic classification (chronic, non-chronic, healthy) and the six-class (bronchiectasis, bronchiolitis, COPD, healthy, pneumonia, and URTI) pathological classification are carried out in this work. The classification performance of the proposed CNN model is compared with that of VGG16, a well-known CNN architecture for image classification [48], in both classification schemes. It should be noted that the experiments are performed using both the conventional CWT-based scalogram and the hybrid scalogram images. In addition, the performance of our proposed CNN model is compared with a number of well-known DL architectures, such as VGG16 and AlexNet, and several lightweight networks in terms of computational complexity and accuracy.
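A sketch of the balanced mini-batch scheme and the stated optimizer settings is shown below; the generator logic is an illustrative assumption about how equal per-class sampling could be realized, and build_lightweight_cnn refers to the architecture sketch in Section IV.

```python
import numpy as np
import tensorflow as tf

def balanced_batches(images, labels, n_classes, per_class=1):
    """Yield batches containing an equal number of samples from every class.

    With per_class=1 and six classes the batch size is 6, matching the text;
    minority classes are oversampled and the majority class undersampled.
    """
    by_class = [np.where(labels == c)[0] for c in range(n_classes)]
    while True:
        idx = np.concatenate([np.random.choice(ix, per_class) for ix in by_class])
        np.random.shuffle(idx)
        yield images[idx], tf.keras.utils.to_categorical(labels[idx], n_classes)

# Compile with the Adam optimizer and the learning rate stated in the text
model = build_lightweight_cnn(n_classes=6)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(balanced_batches(X_train, y_train, 6), steps_per_epoch=..., epochs=...)
```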
C. Classification Performance of the Proposed Framework

1) Chronic Classification:
From Table II, it can be seen that using the hybrid scalogram method in conjunction with the proposed CNN model classifier yields the best accuracy, 99.21%. However, the corresponding accuracy obtained using VGG16 is quite close (98.89%). Despite being a heavier model, the comparatively lower accuracy of VGG16 can be attributed to over-fitting caused by the limited number of images in the different classes. When comparing the conventional CWT scalogram to the proposed hybrid scalogram, a considerable improvement in accuracy is evident for the latter using both VGG16 and our proposed CNN model (9.5%-11.4%). The corresponding confusion matrices for both models' best results are illustrated in Fig. 5. The results depict that the proposed method is better in ternary chronic classification than VGG16.
TABLE II
SUMMARY OF THE CLASSIFICATION PERFORMANCE. THE RED AND BLUE MARKED VALUES REPRESENT THE HIGHEST ACCURACY OBTAINED WITH OUR PROPOSED CNN MODEL AND VGG16.
(CWT: scalogram using CWT; EMD+CWT: scalogram using EMD and CWT)

                 | Chronic Classification                                    | Pathological Classification
Network          | CWT                        | EMD+CWT                      | CWT                        | EMD+CWT
                 | Prec.  Recall  Acc.  F1    | Prec.  Recall  Acc.   F1     | Prec.  Recall  Acc.  F1    | Prec.  Recall  Acc.   F1
Proposed model   | 95.00  91.00  90.58  92.00 | 99.25  99.20  99.20  99.22   | 87.00  86.00  86.31  86.00 | 99.12  99.05  99.05  98.96
VGG16            | 92.00  89.00  88.58  90.00 | 99.03  98.69  98.69  99.11   | 85.00  85.00  85.80  85.00 | 97.80  97.32  97.32  97.01

TABLE III
COMPARISON OF THE PROPOSED FRAMEWORK WITH EXISTING WORKS USING THE ICBHI 2017 DATASET

Processing                        | Type of Training Network | Number of Prediction Classes        | Acc.   | Spec. | Sen.   | ICBHI Score
Gammatone Spectrogram [40], [41]  | C-RNN                    | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.57  | 0.94   | 0.76
                                  | CNN-MoE                  | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.86  | 0.96   | 0.91
                                  | Ensemble                 | 3 (Healthy, Chronic, Non-chronic)   | -      | 0.71  | 0.95   | 0.83
MFCC [38]                         | CNN                      | 3 (Healthy, Chronic, Non-chronic)   | 0.82   | 0.76  | 0.89   | 0.83
                                  | LSTM                     | 3 (Healthy, Chronic, Non-chronic)   | 0.98   | 0.82  | 0.98   | 0.90
MFCC [39]                         | RNN                      | 6 classes (excluding Asthma, LRTI)  | 0.9567 | -     | 0.9567 | -
Mel-spectrogram + VAE [42]        | CNN                      | 3 (Healthy, Chronic, Non-chronic)   | 0.99   | 0.990 | 0.985  | 0.988
                                  |                          | 6 classes (excluding Asthma, LRTI)  | 0.99   | 0.986 | 0.988  | 0.987
Hybrid Scalogram (Proposed)       | Lightweight CNN          | 3 (Healthy, Chronic, Non-chronic)   |        |       |        |
2) Pathological Classification:
For the six-class pathological classification, the proposed method involving the hybrid scalogram and the proposed CNN model classifier yields the best accuracy, 99.05%, as seen in Table II. Similar to the ternary chronic classification scheme, the accuracy of VGG16 is slightly lower. However, since the dataset becomes more segregated when divided into six different disease classes, the accuracy drop is larger here. The proposed hybrid scalogram outperforms the conventional CWT scalogram by a larger margin (13.4%-14.7%) for both VGG16 and our proposed model. In general, the proposed method gives a better performance, which is apparent from the best confusion matrices shown in Fig. 6.
Fig. 5. Confusion matrices for the best results obtained in ternary chronic classification. (a) Proposed CNN model with batch size 6; (b) VGG16 with batch size 6.

D. Comparison with Other Works

1) Respiratory Disease Classification:
As discussed in Section II, none of the existing works for respiratory disease classification explore the domain of patient-specific prediction. Some of the studies address the issues regarding class imbalance [39], [42]. Nevertheless, the extensive preprocessing, coupled with the ambiguous undersampling of the COPD disease class while oversampling all other disease classes, can complicate the reproducibility of [39]. Furthermore, in [42], the FFT is applied to the entire respiratory sound signals, whereas our work focuses on segmented breath sounds. In our work, complete patient independence has been maintained between the train and validation sets, which is not possible while using the entire lung auscultation signal due to the low number of samples. Therefore, our work aims to overcome the drawbacks present in the existing methods. A comparison among the various methods, including the proposed method, is provided in Table III. It is observed that our proposed CNN model with the hybrid scalogram can perform on par with the existing state-of-the-art CNN and RNN models for both classification cases while maintaining a patient-independent train-validation scheme.
2) Computational Performance as a Lightweight Network:
A detailed comparison is presented in Table IV among VGG16 [48], our proposed CNN model, AlexNet [49], and the existing state-of-the-art lightweight models, such as MobileNetV2 [50], NASNet [51], and ShuffleNetV2 [52], in terms of size, trainable parameters, the number of operations measured by multiply-adds (MAdd), and accuracy for both the chronic and pathological approaches. In terms of accuracy, our proposed CNN model shows better results than VGG16 while requiring only 3% of its parameters. The proposed CNN model also outperforms the contemporary lightweight models ShuffleNet V2, MobileNet V2, and NASNet relatively by 0.16%, 0.32%, and 0.80%, respectively, while obtaining a better trade-off in the number of parameters and requiring significantly lower storage space and computational power. This makes our proposed lightweight model more suitable for real-time wearable devices with faster and less resource-intensive training.

We have calculated the time required for the end-to-end classification of an auscultation sound using our framework. For this experiment, we only performed the preprocessing and inference steps using all of our test data and calculated the mean and standard deviation of the required CPU time. We found that the preprocessing time for EMD+CWT is … s ± … s, while CWT only takes … s ± … s. These processes are run on a Core i7-7500 processor with a 2.70-2.90 GHz speed. The time required for the classification of a scalogram using the proposed network is … s ± … s, while MobileNetV2 takes … s ± … s. Thus, the proposed CNN is faster in classifying a sound image as compared to MobileNetV2.
TABLE IV
COMPARISONS AMONG SEVERAL MODELS FROM A LIGHTWEIGHT PERSPECTIVE

Parameter              | VGG16   | AlexNet  | Proposed model | MobileNet (v2) | ShuffleNet (v2) | NASNet
Size (after training)  | 1.5 GB  | 294 MB   | 44.85 MB       | 49 MB          | 46.9 MB         | 64 MB
Trainable parameters   | 138M    | 25.704M  | 3.7674M        | 4.2M           | 5.4M            | 4.2M
MAdd                   | 154.7G  | 725M     | 371.93M        | 575M           | 564M            | 567M
Accuracy (6 classes)   | 97.60%  | 98.237%  | 99.05%         | 98.89%         | 98.27%          | 98.73%
Accuracy (3 classes)   | 97.60%  | 99.519%  | 99.21%         | 98.72%         | 99.06%          | 98.42%
Fig. 6. Confusion matrices for the best results obtained in six-class pathological classification. (a) Proposed CNN model with batch size 6; (b) VGG16 with batch size 6.

VI. CONCLUSION
In this work, we have proposed a lightweight CNN model to classify respiratory diseases using scalogram images of lung sounds. A hybrid approach employing both EMD and CWT is presented to generate the scalogram images. The publicly available ICBHI 2017 challenge dataset has been used for the chronic and pathological classification of respiratory diseases. The proposed method has provided a considerable accuracy of 99.21% for ternary chronic classification. In pathological classification among six disease classes, an accuracy of 99.05% is achieved. The obtained accuracies are higher than those of VGG16, which is a much larger network. In addition, for both classification cases, the proposed framework provides better or comparable performance with respect to the existing state-of-the-art methods in terms of precision, recall, F1-score, sensitivity, specificity, and ICBHI score. It is worthwhile to mention that, unlike most of these methods, the classification performance of the proposed technique has been assessed keeping the training and testing data independent in terms of patients. The proposed classifier's computational complexity has also been compared with a number of well-known CNN models and state-of-the-art lightweight networks. It has been shown to achieve high classification accuracy while being a lightweight deep architecture. We believe that these attributes can enable the development of automatic classification of respiratory diseases from lung auscultations in real-world clinical applications.

REFERENCES
[2] Global Surveillance, Prevention and Control of Chronic Respiratory Diseases: A Comprehensive Approach. World Health Organization, 2007.
[3] C. D. Mathers and D. Loncar, "Projections of global mortality and burden of disease from 2002 to 2030," PLoS Medicine.
[5] Artificial Intelligence in Medicine, vol. 103, p. 101809, 2020.
[6] A. Abbas and A. Fahim, "An automated computerized auscultation and diagnostic system for pulmonary diseases," Journal of Medical Systems, vol. 34, no. 6, pp. 1149–1155, 2010.
[7] S. Reichert, R. Gass, C. Brandt, and E. Andrès, "Analysis of respiratory sounds: state of the art," Clinical Medicine. Circulatory, Respiratory and Pulmonary Medicine, vol. 2, pp. CCRPM–S530, 2008.
[8] M. Sarkar, I. Madabhavi, N. Niranjan, and M. Dogra, "Auscultation of the respiratory system," Annals of Thoracic Medicine, vol. 10, no. 3, p. 158, 2015.
[9] A. Bohadana, G. Izbicki, and S. S. Kraman, "Fundamentals of lung auscultation," New England Journal of Medicine, vol. 370, no. 8, pp. 744–751, 2014.
[10] M. Bahoura and C. Pelletier, "New parameters for respiratory sound classification," in Canadian Conference on Electrical and Computer Engineering, vol. 3. IEEE, 2003, pp. 1457–1460.
[11] R. Palaniappan, K. Sundaraj, and N. U. Ahamed, "Machine learning in lung sound analysis: a systematic review," Biocybernetics and Biomedical Engineering, vol. 33, no. 3, pp. 129–135, 2013.
[12] J. Zhang, W. Ser, J. Yu, and T. Zhang, "A novel wheeze detection method for wearable monitoring systems," IEEE, 2009, pp. 331–334.
[13] M. Bahoura, "Pattern recognition methods applied to respiratory sounds classification into normal and wheeze classes," Computers in Biology and Medicine, vol. 39, no. 9, pp. 824–843, 2009.
[14] J. Acharya, A. Basu, and W. Ser, "Feature extraction techniques for low-power ambulatory wheeze detection wearables," IEEE, 2017, pp. 4574–4577.
[15] N. Gautam and S. B. Pokle, "Wavelet scalogram analysis of phonopulmonographic signals," International Journal of Medical Engineering and Informatics, vol. 5, no. 3, pp. 245–252, 2013.
[16] G. Serbes, C. O. Sakar, Y. P. Kahya, and N. Aydin, "Pulmonary crackle detection using time–frequency and time–scale analysis," Digital Signal Processing, vol. 23, no. 3, pp. 1012–1021, 2013.
[17] S. İçer and Ş. Gengeç, "Classification and analysis of non-stationary characteristics of crackle and rhonchus lung adventitious sounds," Digital Signal Processing, vol. 28, pp. 18–27, 2014.
[18] F. Jin, F. Sattar, and D. Y. Goh, "New approaches for spectro-temporal feature extraction with applications to respiratory sound classification," Neurocomputing, vol. 123, pp. 362–371, 2014.
[19] P. Bokov, B. Mahut, P. Flaud, and C. Delclaux, "Wheezing recognition algorithm using recordings of respiratory sounds at the mouth in a pediatric population," Computers in Biology and Medicine, vol. 70, pp. 40–50, 2016.
[20] P. Mayorga, C. Druzgalski, R. Morelos, O. Gonzalez, and J. Vidales, "Acoustics based assessment of respiratory diseases using gmm classification," IEEE, 2010, pp. 6312–6316.
[21] T. R. Fenton, H. Pasterkamp, A. Tal, and V. Chernick, "Automated spectral characterization of wheezing in asthmatic children," IEEE Transactions on Biomedical Engineering, vol. 32, no. 1, pp. 50–55, 1985.
[22] H. Pasterkamp, S. S. Kraman, and G. R. Wodicka, "Respiratory sounds: advances beyond the stethoscope," American Journal of Respiratory and Critical Care Medicine, vol. 156, no. 3, pp. 974–987, 1997.
[23] Z. Dokur, "Respiratory sound classification by using an incremental supervised neural network," Pattern Analysis and Applications, vol. 12, no. 4, p. 309, 2009.
[24] S. Rietveld, M. Oud, and E. H. Dooijes, "Classification of asthmatic breath sounds: preliminary results of the classifying capacity of human examiners versus artificial neural networks," Computers and Biomedical Research, vol. 32, no. 5, pp. 440–448, 1999.
[25] B. Bozkurt, I. Germanakis, and Y. Stylianou, "A study of time-frequency features for cnn-based automatic heart sound classification for pathology detection," Computers in Biology and Medicine, vol. 100, pp. 132–143, 2018.
[26] A. I. Humayun, S. Ghaffarzadegan, M. I. Ansari, Z. Feng, and T. Hasan, "Towards domain invariant heart sound abnormality detection using learnable filterbanks," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 8, pp. 2189–2198, 2020.
[27] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, and H. Adeli, "Deep convolutional neural network for the automated detection and diagnosis of seizure using eeg signals," Computers in Biology and Medicine, vol. 100, pp. 270–278, 2018.
[28] S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, et al., "Cnn architectures for large-scale audio classification," IEEE, 2017, pp. 131–135.
[29] S. Debbal and F. Bereksi-Reguig, "Analysis of the second heart sound using continuous wavelet transform," Journal of Medical Engineering & Technology, vol. 28, no. 4, pp. 151–156, 2004.
[30] A. Meintjes, A. Lowe, and M. Legget, "Fundamental heart sound classification using the continuous wavelet transform and convolutional neural networks," IEEE, 2018, pp. 409–412.
[31] K. Minami, H. Lu, H. Kim, S. Mabu, Y. Hirano, and S. Kido, "Automatic classification of large-scale respiratory sound dataset based on convolutional neural network," IEEE, 2019, pp. 804–807.
[32] M. Aykanat, Ö. Kılıç, B. Kurt, and S. Saryal, "Classification of lung sounds using convolutional neural networks," EURASIP Journal on Image and Video Processing, vol. 2017, no. 1, p. 65, 2017.
[33] F. Demir, A. Sengur, and V. Bajaj, "Convolutional neural networks based efficient approach for classification of lung diseases," Health Information Science and Systems, vol. 8, no. 1, p. 4, 2020.
[34] R. Liu, S. Cai, K. Zhang, and N. Hu, "Detection of adventitious respiratory sounds based on convolutional neural network," IEEE, 2019, pp. 298–303.
[35] D. Bardou, K. Zhang, and S. M. Ahmad, "Lung sounds classification using convolutional neural networks," Artificial Intelligence in Medicine, vol. 88, pp. 58–69, 2018.
[36] J. Acharya and A. Basu, "Deep neural network for respiratory sound classification in wearable devices enabled by patient specific model tuning," IEEE Transactions on Biomedical Circuits and Systems, vol. 14, no. 3, pp. 535–544, 2020.
[37] D. Perna, "Convolutional neural networks learning from respiratory data," IEEE, 2018, pp. 2109–2113.
[38] D. Perna and A. Tagarelli, "Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks," IEEE, 2019, pp. 50–55.
[39] V. Basu and S. Rana, "Respiratory diseases recognition through respiratory sound with the help of deep neural network," IEEE, 2020, pp. 1–6.
[40] L. Pham, I. McLoughlin, H. Phan, M. Tran, T. Nguyen, and R. Palaniappan, "Robust deep learning framework for predicting respiratory anomalies and diseases," arXiv preprint arXiv:2002.03894, 2020.
[41] L. Pham, "Predicting respiratory anomalies and diseases using deep learning models," arXiv preprint arXiv:2004.04072, 2020.
[42] M. T. García-Ordás, J. A. Benítez-Andrades, I. García-Rodríguez, C. Benavides, and H. Alaiz-Moretón, "Detecting respiratory pathologies using convolutional neural networks and variational autoencoders for unbalancing data," Sensors, vol. 20, no. 4, p. 1214, 2020.
[43] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[44] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869–6898, 2017.
[45] S. Kiranyaz, T. Ince, R. Hamila, and M. Gabbouj, "Convolutional neural networks for patient-specific ecg classification," IEEE, 2015, pp. 2608–2611.
[46] N. U. Maheswari, A. Kabilan, and R. Venkatesh, "Speaker independent speech recognition system based on phoneme identification," IEEE, 2008, pp. 1–6.
[47] "ICBHI 2017 Challenge," 2017. [Online]. Available: https://bhichallenge.med.auth.gr/
[48] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[49] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[50] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[51] X. Qin and Z. Wang, "Nasnet: A neuron attention stage-by-stage net for single image deraining," arXiv preprint arXiv:1912.03151, 2019.
[52] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "Shufflenet v2: Practical guidelines for efficient cnn architecture design," in European Conference on Computer Vision (ECCV), 2018, pp. 116–131.
[53] B. Rocha, D. Filos, L. Mendes, I. Vogiatzis, E. Perantoni, E. Kaimakamis, P. Natsiavas, A. Oliveira, C. Jácome, A. Marques, et al., "A respiratory sound database for the development of automated classification," in International Conference on Biomedical and Health Informatics. Springer, 2017, pp. 33–37.
[54] N. Ibtehaz, M. S. Rahman, and M. S. Rahman, "Vfpred: A fusion of signal processing and machine learning techniques in detecting ventricular fibrillation from ecg signals," Biomedical Signal Processing and Control, vol. 49, pp. 349–359, 2019.
[55] M. Altuve, L. Suárez, and J. Ardila, "Fundamental heart sounds analysis using improved complete ensemble emd with adaptive noise," Biocybernetics and Biomedical Engineering, vol. 40, no. 1, pp. 426–439, 2020.
[56] Z. Ren, K. Qian, Y. Wang, Z. Zhang, V. Pandit, A. Baird, and B. Schuller, "Deep scalogram representations for acoustic scene classification," IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 3, pp. 662–669, 2018.
[57] R. Fontugne, J. Ortiz, D. Culler, and H. Esaki, "Empirical mode decomposition for intrinsic-relationship extraction in large sensor deployments," in Workshop on Internet of Things Applications, IoT-App, vol. 12, 2012.
[58] D. Boutana, M. Benidir, and B. Barkat, "Segmentation and time-frequency analysis of pathological heart sound signals using the emd method," IEEE, 2014, pp. 1437–1441.
[59] F. Y. Shih and H. Patel, "Deep learning classification on optical coherence tomography retina images," International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 08, p. 2052002, 2019.
[60] S. Y. Nikouei, Y. Chen, S. Song, R. Xu, B.-Y. Choi, and T. R. Faughnan, "Real-time human detection as an edge service enabled by a lightweight cnn," IEEE, 2018, pp. 125–129.
[61] B. Lim, B. Yang, and H. Kim, "Real-time lightweight cnn for detecting road object of various size," in 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).