Multi-Feature Multi-Scale CNN-Derived COVID-19 Classification from Lung Ultrasound Data
Hui Che, Jared Radbel, Jag Sunderram, John L. Nosher, Vishal M. Patel, Ilker Hacihaliloglu
Abstract — The global pandemic of novel coronavirus disease 2019 (COVID-19) has put tremendous pressure on medical systems. Imaging plays a complementary role in the management of patients with COVID-19, with computed tomography (CT) and chest X-ray (CXR) the two dominant screening tools. However, the difficulty of eliminating the risk of disease transmission, radiation exposure, and cost are challenges for CT and CXR imaging. These limitations motivate the use of lung ultrasound (LUS) for evaluating COVID-19, owing to its practical advantages of noninvasiveness, repeatability, and bedside availability. In this paper, we use a deep learning model to classify COVID-19 from LUS data, which can provide objective diagnostic information for clinicians. Specifically, all LUS images are processed to obtain their corresponding local phase filtered images and radial symmetry transformed images before being fed into a multi-scale residual convolutional neural network (CNN). Combinations of these images are then used as network input to exploit rich and reliable features, and feature-fusion strategies at different levels are adopted to investigate the relationship between the depth of feature aggregation and classification accuracy. Our proposed method is evaluated on the point-of-care US (POCUS) dataset together with the Italian COVID-19 Lung US database (ICLUS-DB) and shows promising performance for COVID-19 prediction.
Hui Che is with the Department of Biomedical Engineering, Rutgers University, NJ, USA. Jared Radbel and Jag Sunderram are with the Department of Medicine, Rutgers Robert Wood Johnson Medical School, NJ, USA. John L. Nosher is with the Department of Radiology, Rutgers Robert Wood Johnson Medical School, NJ, USA. Vishal M. Patel is with the Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA. Ilker Hacihaliloglu is with the Department of Biomedical Engineering, Rutgers University, and the Department of Radiology, Rutgers Robert Wood Johnson Medical School, NJ, USA (corresponding author; e-mail: [email protected]).

I. INTRODUCTION

The COVID-19 pandemic has increased the burden of excess morbidity and mortality worldwide. The high transmissibility and long incubation time of the SARS-CoV-2 virus increase the difficulty of containing viral spread. Rapid diagnosis and severity classification in the early stage of the disease can significantly reduce the risk of further infections and help mitigate the excess morbidity and mortality of critically ill patients. At present, the main detection methods for COVID-19 infection are the genetic test (reverse transcription polymerase chain reaction (RT-PCR)) [1] and the serology test. Radiological assessment, based on CT and CXR, has been incorporated to improve the management of COVID-19 disease. However, the difficulty of eliminating the risk of disease transmission, radiation exposure, and cost are challenges for CT and CXR imaging [2]. CT scans also cannot be performed at the bedside, limiting their use in intensive care unit (ICU) settings. Lung ultrasound (LUS) is non-invasive, rapid, repeatable, and provides bedside imaging, offering a safer alternative to CXR and CT. As such, LUS use for rapid assessment of the severity of COVID-19 pneumonia has been reported [1]-[3]. However, early lesions or less obvious tissue changes are difficult to distinguish with the human eye. Furthermore, differences in medical pathology across regions and the varied LUS experience of clinicians can result in misdiagnosis. Thus, developing standardized systems to report and interpret disease findings remains a challenge for LUS [2].

Artificial intelligence (AI)-based solutions in medical imaging have demonstrated the potential to establish objective and unified interpretation standards. In [1], a new convolutional neural network (CNN) architecture, termed POCOVID-Net, was proposed: a VGG-16 architecture was used as the backbone and fine-tuned during network training, and the reported average 3-class classification accuracy was 89% [1]. In [4], a multi-task CNN architecture was proposed. The network achieved an F1 score of 61%, a precision of 70%, and a recall of 60% for risk prediction. Despite these promising early results, CNN-based methods for processing B-mode US data are affected by the image acquisition settings and the quality of the collected data [5]. Finally, the limited availability of COVID-19 LUS data is another bottleneck.

To address the above problems, we propose a multi-feature multi-scale CNN-based approach to achieve more accurate COVID-19 classification. Given that incorporating local-phase image tissue features can improve the accuracy of CNNs for processing B-mode US data [5], local phase US image-based COVID-19 signatures are extracted for diverse and robust representations. We then adopt a feature-fusion strategy so that the different features complement one another. To enlarge the network's perception dimensions and capture more discriminative features of the input images, extra convolutional layers with kernels of different sizes are used in our CNN architecture. Our proposed approach is evaluated on 1752 scans obtained from 76 subjects.

II. METHODS

Our method consists of two main parts: local phase feature extraction and binary classification based on multi-feature multi-scale CNNs. In this work, the use of local phase information aims to enhance the appearance of lung tissue and recover pertinent tissue structure from LUS data. The extraction of local phase image features also increases the dataset size available for training. The model applied to the classification task is based on a multi-scale two-dimensional (2D) residual neural network (ResNet) architecture similar to the one reported in [6]. Three different fusion architectures are investigated in this work.

A. Local Phase Image Features
Image phase information is a key component in the interpretation of a scene and has been used in various applications for processing US data [5], [7]. We first obtain the local phase energy feature image, denoted as LPE(x, y), which is extracted using:

LPE(x, y) = Σ_sc ( |US_M1(x, y)| − √(US_M2(x, y)² + US_M3(x, y)²) )   (1)

In the above formula, sc represents the number of filter scales (set to 2 throughout the experimental evaluation), and US_M is the group of monogenic signal images computed by applying the vector-valued odd filter (Riesz filter) [7] to the band-pass filtered LUS image, denoted US_B(x, y), as follows:

US_M(x, y) = [US_M1(x, y), US_M2(x, y), US_M3(x, y)]
           = [US_B(x, y), US_B(x, y) ⊗ h1(x, y), US_B(x, y) ⊗ h2(x, y)]   (2)

where ⊗ represents the convolution operation and h1(x, y), h2(x, y) are the two components of the Riesz filter. For band-pass filtering, α-scale space derivative quadrature (ASSD) filters [7] are used to obtain US_B(x, y).

The US signal transmission map is then modelled with scattering and attenuation information to produce an enhanced image US_E(x, y) with maximized visibility of high-intensity LPE(x, y) features inside a local region. A linear interpolation model combines the two interactions:

LPE(x, y) = US_A(x, y) US_E(x, y) + (1 − US_A(x, y)) β   (3)

Here, LPE(x, y) is the local phase energy image, US_A(x, y) is the signal transmission map, and US_E(x, y) is the enhanced image. β is a constant representative of the tissue echogenicity in the local region. Our aim is US_E(x, y), and two different enhancement results are produced with different β settings (60% and 90% of the maximum intensity value of LPE(x, y)). Once the signal transmission map US_A(x, y) is obtained using the well-established Beer-Lambert law, US_E(x, y) can be calculated according to Equation (4):

US_E(x, y) = (LPE(x, y) − β) / [max(US_A(x, y), ε)]^δ + β   (4)

δ is related to the tissue attenuation coefficient η, and ε is a small constant to avoid division by zero; fixed values of η and ε are used throughout the experimental evaluation. The two resulting enhanced images, denoted US_E1(x, y) and US_E2(x, y), are shown in Fig. 1.
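As a concrete illustration, the monogenic-signal computation of Eqs. (1)-(2) and the enhancement step of Eq. (4) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: a log-Gabor band-pass stands in for the ASSD filters, and the scale wavelengths, `delta`, and `eps` defaults are illustrative assumptions.

```python
import numpy as np

def local_phase_energy(img, scales=2):
    """Local phase energy (Eqs. (1)-(2)): per scale, band-pass the image,
    apply the Riesz (vector-valued odd) filter to obtain the monogenic
    signal, then accumulate |even| - sqrt(odd1^2 + odd2^2)."""
    rows, cols = img.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                      # avoid divide-by-zero at DC
    H1 = -1j * u / radius                   # Riesz filter component h1
    H2 = -1j * v / radius                   # Riesz filter component h2
    F = np.fft.fft2(img.astype(np.float64))
    lpe = np.zeros((rows, cols))
    for s in range(scales):
        # Log-Gabor band-pass per scale (stand-in for the ASSD filters;
        # the wavelengths are illustrative, not the paper's settings).
        f0 = 1.0 / (16.0 * 2.0 ** s)
        bp = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(0.55) ** 2))
        bp[0, 0] = 0.0
        us_b = np.real(np.fft.ifft2(F * bp))        # US_M1: even component
        odd1 = np.real(np.fft.ifft2(F * bp * H1))   # US_M2: US_B ⊗ h1
        odd2 = np.real(np.fft.ifft2(F * bp * H2))   # US_M3: US_B ⊗ h2
        lpe += np.abs(us_b) - np.sqrt(odd1 ** 2 + odd2 ** 2)
    return lpe

def enhance(lpe, us_a, beta, delta=2.0, eps=1e-6):
    """Eq. (4): recover US_E from LPE and the signal transmission map
    US_A; delta (attenuation-related) and eps are assumed values."""
    return (lpe - beta) / np.maximum(us_a, eps) ** delta + beta
```

Running `enhance` with two β settings (e.g. 60% and 90% of the maximum of the LPE image) would yield the two enhanced images US_E1(x, y) and US_E2(x, y).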
These two enhanced images are used as the input for radial symmetry-based tissue extraction. The fast radial symmetry transform algorithm is applied to the local phase images to detect points of interest [7]. Fig. 1 shows the radial symmetry images S1(x, y) and S2(x, y) corresponding to the local phase enhanced images US_E1(x, y) and US_E2(x, y); the transformation highlights points of interest characterized by radial symmetry as well as high contrast.

Fig. 1. Qualitative results of local phase and radial symmetry-based image enhancement and feature extraction methods. Top row: a regular lung. Middle row: a bacterial pneumonia infected lung. Bottom row: a COVID-19 infected lung. Each row, left to right: LUS image US(x, y), local phase enhanced images US_E1(x, y) and US_E2(x, y), radial symmetry transformed images S1(x, y) and S2(x, y).

B. Network Architecture
The multi-scale 2D ResNet is a lightweight classification network that nevertheless captures features from multiple receptive fields simultaneously. The network is composed of three functional parts: 1) one convolutional layer for primary feature map extraction, 2) multiple residual blocks with multi-scale convolutional layers, and 3) a fully connected layer with a softmax activation function acting as a classifier. All input images are resized to 512 × 512 before being fed into the network. We investigate three different fusion architectures with the US(x, y), US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) images as input.

Fig. 2 illustrates the various network architectures. Since the scale of the objects in the image varies, we adopt multi-scale receptive fields to focus on information at different scales. As shown in Fig. 2, three ResCNN blocks are the basic components in all designs for extracting multi-scale features. They have different receptive kernels, with sizes set to 3 × 3, 5 × 5, and 7 × 7. Every ResCNN block contains three sub-blocks, and each sub-block contains two convolutional layers. A skip connection is added in each sub-block to avoid the degradation problem [5]. An average pooling layer follows the convolution operations to output the final feature map. At the end of the network, a fully connected layer with an activation function acts as the classifier, taking the concatenation of the final feature maps as input.

The feature-fusion function is applied at different levels of the CNN model to construct early-, mid-, and late-fusion structures [4]. To achieve early fusion, all the images are concatenated at the pixel level to form an input with more channels. In the mid-fusion model, the input images are fed to the network separately and processed by the initial convolutional layer to obtain the corresponding primary feature maps; these are then concatenated for deeper feature extraction. Late fusion is performed before the fully connected layer, fusing the final feature maps from each input image.

Fig. 2. The various CNN architectures. Each convolutional layer has three parameters: kernel size, depth, and stride. (a) The early-fusion CNN: US(x, y), US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) images are fused at the pixel level before entering the network. (b) The mid-fusion CNN: the five input images are separately processed by the initial convolutional block to output the corresponding primary feature maps, whose concatenation is processed through the rest of the network. (c) The late-fusion CNN: the five input images are separately processed through the whole network up to the average pooling layer, and all final feature maps are fused before the fully connected layer.

C. Data
The dataset used in this work was obtained from [1] and [3] and consisted of 1276 COVID-19 LUS scans from 51 subjects, 254 bacterial pneumonia LUS scans from 13 subjects, and 222 LUS scans from 12 healthy subjects. All images selected from the two released datasets were acquired with a convex probe. Bacterial pneumonia and healthy LUS scans are combined into a non-COVID-19 class, so our task is binary classification. Before image enhancement, all data is cropped to 334 × 334 squares to remove non-relevant information.
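The multi-scale fusion design described in Section II-B can be sketched in PyTorch (the framework used in this work). This is a minimal early-fusion sketch under stated assumptions: the channel widths, batch normalization layers, and the absence of striding between sub-blocks are illustrative choices not specified above.

```python
import torch
import torch.nn as nn

class SubBlock(nn.Module):
    """Residual sub-block: two convolutional layers plus a skip connection."""
    def __init__(self, ch, k):
        super().__init__()
        p = k // 2  # same-size padding
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, k, padding=p), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, k, padding=p), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(x + self.body(x))

class ResCNNBlock(nn.Module):
    """Three residual sub-blocks sharing one kernel size (3, 5, or 7)."""
    def __init__(self, ch, k):
        super().__init__()
        self.blocks = nn.Sequential(*[SubBlock(ch, k) for _ in range(3)])
    def forward(self, x):
        return self.blocks(x)

class EarlyFusionMultiScaleNet(nn.Module):
    """Early-fusion variant: the five images are stacked as channels,
    a first conv extracts primary features, three parallel ResCNN
    blocks with 3x3/5x5/7x7 kernels capture multi-scale features, and
    their average-pooled outputs are concatenated for the classifier."""
    def __init__(self, in_ch=5, ch=16, n_classes=2):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.branches = nn.ModuleList(
            [ResCNNBlock(ch, k) for k in (3, 5, 7)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3 * ch, n_classes)
    def forward(self, x):
        f = self.stem(x)
        feats = [self.pool(b(f)).flatten(1) for b in self.branches]
        return self.fc(torch.cat(feats, dim=1))  # logits; softmax at inference
```

The mid- and late-fusion variants differ only in where concatenation happens: after the stem convolutions of five separate streams, or after the pooled features of five complete streams.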
D. Experiment Implementation
We perform 5-fold cross-validation to evaluate the performance of our proposed method. During evaluation, data from the same patient was never included in both the training and testing sets. The reported final results are the mean over the 5 folds. All datasets maintain the same data distribution, including the US(x, y) dataset and the US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) datasets.

All CNN models are trained using the cross-entropy loss function and the Adam optimizer with a learning rate of 1e-5. Classification performance is measured by four metrics: accuracy, precision, recall, and F1 score. To evaluate the effectiveness of the image processing methods and feature-fusion strategies, we compare the results of using only the US(x, y) image as input against the two groups of processed images (US_E1(x, y) with US_E2(x, y), and S1(x, y) with S2(x, y)). Furthermore, we investigate the accuracy of the model when all five kinds of images are used as input. Experiments are implemented in the PyTorch framework on a machine with an Intel Core CPU at 3.70 GHz and an NVIDIA GeForce GTX 1080Ti GPU.

TABLE I
CLASSIFICATION PERFORMANCE SUMMARY. BEST RESULT IS SHOWN IN BOLD. ("–" marks a value not available.)

Method                               | Fusion | Accuracy | Precision (COVID/Non) | Recall (COVID/Non) | F1 (COVID/Non)
US(x,y) (single-feature CNN)         |   –    | 89.94%   | 92.48%/82.49%         | 93.98%/78.70%      | 93.21%/80.46%
US_E1 + US_E2                        | Early  | 91.93%   | –/84.65%              | 93.76%/–           | –
                                     | Mid    | 90.91%   | 94.94%/81.69%         | –                  | –
                                     | Late   | 88.96%   | 92.12%/81.29%         | –                  | –
S1 + S2                              | Early  | 90.68%   | 93.09%/84.35%         | 94.33%/80.85%      | 93.65%/82.17%
                                     | Mid    | 86.53%   | 87.35%/84.92%         | 95.85%/61.54%      | 91.27%/69.50%
                                     | Late   | 87.52%   | 89.70%/83.38%         | 93.62%/70.46%      | 91.48%/75.56%
US + US_E1 + US_E2                   | Early  | 88.60%   | –/86.05%              | 92.09%/79.27%      | –
                                     | Mid    | 92.80%   | 95.14%/86.03%         | –                  | –
                                     | Late   | 95.11%   | –                     | –                  | 96.70%/90.48%
US + US_E1 + US_E2 + S1 + S2         | Early  | 90.57%   | 93.51%/84.21%         | 93.89%/82.74%      | 93.54%/82.50%
                                     | Mid    | 88.79%   | 92.37%/79.50%         | 92.55%/79.20%      | 92.37%/78.80%
                                     | Late   | 89.33%   | 92.26%/82.15%         | 93.25%/79.05%      | 92.69%/80.16%

III. RESULTS

Quantitative results of our proposed method are presented in Table I. Our proposed multi-scale network achieves an average classification accuracy of 89.94% when using only LUS data (US(x, y)). The average accuracy increases to 91.93% when using the enhanced images US_E1,2(x, y), and to 90.68% when using the radial symmetry images S1,2(x, y) (Table I). The best performance was obtained when combining the LUS US(x, y) images with the enhanced images US_E1,2(x, y), where an average accuracy of 95.11% was obtained.
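The patient-wise 5-fold split described in Section II-D (the same patient never appears in both training and testing data) can be sketched with scikit-learn's GroupKFold; the subject ids and labels below are synthetic placeholders, not the actual dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# GroupKFold guarantees all scans from one subject fall in the same
# fold, so no patient leaks between training and test sets.
rng = np.random.default_rng(0)
n_scans, n_subjects = 1752, 76
subjects = rng.integers(0, n_subjects, size=n_scans)  # subject id per scan
labels = rng.integers(0, 2, size=n_scans)             # COVID-19 vs non

folds = []
gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(np.zeros((n_scans, 1)), labels,
                                     groups=subjects):
    folds.append((train_idx, test_idx))
    # sanity check: no subject overlap between train and test
    assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```

Fold-wise metrics would then be averaged to produce the mean results reported in Table I.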
The comparison among the results of the first three sets of experiments demonstrates that local phase features are beneficial for enhancing tissue characteristics for network learning, especially when feature fusion is performed at an early stage. As seen in Table I, when US(x, y) is combined with the enhanced images, the late-fusion design obtains the highest accuracy (95.11%) and F1 score (96.70%), significantly outperforming the other fusion operations in these two metrics. When all the image features were combined, the early-fusion architecture obtained the best results compared to the other fusion networks investigated. We further observe that using local phase image features also improves the performance of the network for classifying non-COVID-19 data (F1 score of 90.48% vs 80.46%).

IV. CONCLUSION

We proposed a novel CNN-based method to achieve accurate COVID-19 prediction from LUS. Quantitative and qualitative results confirm that the use of local phase information and a multi-feature multi-scale CNN contributes to improved COVID-19 classification performance on LUS data. Fusing LUS features and local phase features at a late stage gives the highest accuracy, reaching 95.11%, while the other metrics indicate a balanced classification capability of the model. In most other cases, the early-fusion strategy shows better classification performance. Our future work will include the evaluation of the proposed method on a larger-scale dataset. We would also like to extend our network to multi-class classification for differentiating regular pneumonia from COVID-19. Finally, optimization of the local phase image filter parameters based on CNN performance is another avenue for future work.