Multi-Feature Multi-Scale CNN-Derived COVID-19 Classification from Lung Ultrasound Data
Hui Che, Jared Radbel, Jag Sunderram, John L. Nosher, Vishal M. Patel, Ilker Hacihaliloglu
Abstract — The global pandemic of novel coronavirus disease 2019 (COVID-19) has put tremendous pressure on medical systems. Imaging plays a complementary role in the management of patients with COVID-19, with computed tomography (CT) and chest X-ray (CXR) the two dominant screening tools. However, the difficulty of eliminating the risk of disease transmission, radiation exposure, and cost are challenges for CT and CXR imaging. These limitations motivate the use of lung ultrasound (LUS) for evaluating COVID-19, owing to its practical advantages of noninvasiveness, repeatability, and bedside availability. In this paper, we use a deep learning model to classify COVID-19 from LUS data, which can provide objective diagnostic information for clinicians. Specifically, all LUS images are processed to obtain their corresponding local phase filtered images and radial symmetry transformed images before being fed into a multi-scale residual convolutional neural network (CNN). Combinations of these images are then used as network input to exploit rich and reliable features, and feature-fusion strategies at different levels are adopted to investigate the relationship between the depth of feature aggregation and classification accuracy. Our proposed method is evaluated on the point-of-care US (POCUS) dataset together with the Italian COVID-19 Lung US database (ICLUS-DB) and shows promising performance for COVID-19 prediction.
Hui Che is with the Department of Biomedical Engineering, Rutgers University, NJ, USA. Jared Radbel and Jag Sunderram are with the Department of Medicine, Rutgers Robert Wood Johnson Medical School, NJ, USA. John L. Nosher is with the Department of Radiology, Rutgers Robert Wood Johnson Medical School, NJ, USA. Vishal M. Patel is with the Department of Electrical and Computer Engineering, Johns Hopkins University, MD, USA. Ilker Hacihaliloglu is with the Department of Biomedical Engineering, Rutgers University, and the Department of Radiology, Rutgers Robert Wood Johnson Medical School, NJ, USA (corresponding author; e-mail: [email protected]).

I. INTRODUCTION

The COVID-19 pandemic has increased the burden of excess morbidity and mortality worldwide. The high transmissibility and long incubation time of the SARS-CoV-2 virus increase the difficulty of containing viral spread. Rapid diagnosis and severity classification in the early stage of the disease can significantly reduce the risk of further infections and help mitigate the excess morbidity and mortality of critically ill patients. At present, the main detection methods for COVID-19 infection are the genetic test (reverse transcription polymerase chain reaction (RT-PCR)) [1] and the serology test. Radiological assessment, based on CT and CXR, has been incorporated to improve the management of COVID-19 disease. However, the difficulty of eliminating the risk of disease transmission, radiation exposure, and cost are challenges for CT and CXR imaging [2]. CT scans also cannot be performed at the bedside, limiting their use in intensive care unit (ICU) settings. Lung ultrasound (LUS) is non-invasive, rapid, repeatable, and provides bedside imaging, offering a safer alternative to CXR and CT. As such, LUS use for rapid assessment of the severity of COVID-19 pneumonia has been reported [1]-[3]. However, early lesions or less obvious tissue changes are difficult to distinguish with the human eye. Furthermore, differences in medical pathology across regions and the varied LUS experience of clinicians can result in misdiagnosis. Thus, developing standardized systems to report and interpret disease findings remains a challenge for LUS [2].

Artificial intelligence (AI)-based solutions in medical imaging have demonstrated the potential to establish objective and unified interpretation standards. In [1], a new convolutional neural network (CNN) architecture, termed POCOVID-Net, was proposed: a VGG-16 architecture was used as the backbone and fine-tuned during network training, and the reported average 3-class classification accuracy was 89% [1]. In [4], a multi-task CNN architecture was proposed. The network achieved an F1 score of 61%, a precision of 70%, and a recall of 60% for risk prediction. Despite these promising early results, CNN-based methods for processing B-mode US data are affected by the image acquisition settings and the quality of the collected data [5]. Finally, the limited availability of COVID-19 LUS data is another bottleneck.

To address the above problems, we propose a multi-feature multi-scale CNN-based approach to achieve more accurate COVID-19 classification. Given that incorporating local-phase image tissue features can improve the accuracy of CNNs for processing B-mode US data [5], local phase US image-based COVID-19 signatures are extracted for diverse and robust representations. We then adopt a feature-fusion strategy so that the different features complement one another. To enlarge the network's perception dimensions and capture more discriminative features of the input images, extra convolutional layers with kernels of different sizes are used in our CNN architecture. Our proposed approach is evaluated on 1752 scans obtained from 76 subjects.

II. METHODS

Our method consists of two main parts: local phase feature extraction and binary classification based on multi-feature multi-scale CNNs. In this work, the use of local phase information aims to enhance the appearance of lung tissue and recover pertinent tissue structure from LUS data. The extraction of local phase image features also increases the dataset size available for training. The model applied to the classification task is based on a multi-scale two-dimensional (2D) residual neural network (ResNet) architecture similar to the one reported in [6]. Three different fusion architectures are investigated in this work.

A. Local Phase Image Features
Image phase information is a key component in the interpretation of a scene and has been used in various applications for processing US data [5], [7]. We first obtain the local phase energy feature image, denoted as LPE(x, y), which is extracted using:

LPE(x, y) = Σ_sc ( |US_M1(x, y)| − √(US_M2(x, y)² + US_M3(x, y)²) )   (1)

In the above formula, sc represents the number of filter scales (set to 2 throughout the experimental evaluation), and US_M is the group of monogenic signal images computed by applying the vector-valued odd filter (Riesz filter) [7] to the band-pass filtered LUS image, denoted US_B(x, y), as follows:

US_M(x, y) = [US_M1(x, y), US_M2(x, y), US_M3(x, y)]
           = [US_B(x, y), US_B(x, y) ⊗ h1(x, y), US_B(x, y) ⊗ h2(x, y)]   (2)

where ⊗ represents the convolution operation and h1(x, y), h2(x, y) are the two components of the Riesz filter. For band-pass filtering, α-scale space derivative quadrature (ASSD) filters [7] are used to obtain US_B(x, y).

The US signal transmission map is then modelled with scattering and attenuation information to produce an enhanced image US_E(x, y) with maximized visibility of high-intensity LPE(x, y) features inside a local region. A linear interpolation model combines the two interactions:

LPE(x, y) = US_A(x, y) US_E(x, y) + (1 − US_A(x, y)) β   (3)

Here, LPE(x, y) is the local phase energy image, US_A(x, y) is the signal transmission map, and US_E(x, y) is the enhanced image. β is a constant representative of the tissue echogenicity in the local region. Our aim is US_E(x, y), and two different enhancement results are produced with different β settings (60% and 90% of the maximum intensity value of LPE(x, y)). Once the signal transmission map US_A(x, y) is obtained using the well-established Beer-Lambert law, US_E(x, y) can be calculated according to Equation (4):

US_E(x, y) = (LPE(x, y) − β) / [max(US_A(x, y), ε)]^δ + β   (4)

δ is related to the tissue attenuation coefficient η, and ε is a small constant to avoid division by zero; fixed values of η and ε are used throughout the experimental evaluation. The two resulting enhanced images, denoted US_E1(x, y) and US_E2(x, y), are shown in Fig. 1.
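As a concrete illustration, the monogenic-signal computation of Eqs. (1)-(2) and the enhancement step of Eq. (4) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: a log-Gabor band-pass stands in for the ASSD filters, and the scale wavelengths, `delta`, and `eps` defaults are illustrative assumptions.

```python
import numpy as np

def local_phase_energy(img, scales=2):
    """Local phase energy (Eqs. (1)-(2)): per scale, band-pass the image,
    apply the Riesz (vector-valued odd) filter to obtain the monogenic
    signal, then accumulate |even| - sqrt(odd1^2 + odd2^2)."""
    rows, cols = img.shape
    u = np.fft.fftfreq(cols)[None, :]
    v = np.fft.fftfreq(rows)[:, None]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                      # avoid divide-by-zero at DC
    H1 = -1j * u / radius                   # Riesz filter component h1
    H2 = -1j * v / radius                   # Riesz filter component h2
    F = np.fft.fft2(img.astype(np.float64))
    lpe = np.zeros((rows, cols))
    for s in range(scales):
        # Log-Gabor band-pass per scale (stand-in for the ASSD filters;
        # the wavelengths are illustrative, not the paper's settings).
        f0 = 1.0 / (16.0 * 2.0 ** s)
        bp = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(0.55) ** 2))
        bp[0, 0] = 0.0
        us_b = np.real(np.fft.ifft2(F * bp))        # US_M1: even component
        odd1 = np.real(np.fft.ifft2(F * bp * H1))   # US_M2: US_B ⊗ h1
        odd2 = np.real(np.fft.ifft2(F * bp * H2))   # US_M3: US_B ⊗ h2
        lpe += np.abs(us_b) - np.sqrt(odd1 ** 2 + odd2 ** 2)
    return lpe

def enhance(lpe, us_a, beta, delta=2.0, eps=1e-6):
    """Eq. (4): recover US_E from LPE and the signal transmission map
    US_A; delta (attenuation-related) and eps are assumed values."""
    return (lpe - beta) / np.maximum(us_a, eps) ** delta + beta
```

Running `enhance` with two β settings (e.g. 60% and 90% of the maximum of the LPE image) would yield the two enhanced images US_E1(x, y) and US_E2(x, y).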
These two enhanced images are used as the input for radial symmetry-based tissue extraction. The fast radial symmetry transform algorithm is applied to the local phase images to detect points of interest [7]. Fig. 1 shows the radial symmetry images S1(x, y) and S2(x, y) corresponding to the local phase enhanced images US_E1(x, y) and US_E2(x, y); the transformation highlights points of interest characterized by radial symmetry as well as high contrast.

Fig. 1. Qualitative results of local phase and radial symmetry-based image enhancement and feature extraction methods. Top row: a regular lung. Middle row: a bacterial pneumonia infected lung. Bottom row: a COVID-19 infected lung. Each row, left to right: LUS image US(x, y), local phase enhanced images US_E1(x, y) and US_E2(x, y), radial symmetry transformed images S1(x, y) and S2(x, y).

B. Network Architecture
The multi-scale 2D ResNet is a lightweight classification network that nevertheless captures features from multiple receptive fields simultaneously. The network is composed of three functional parts: 1) one convolutional layer for primary feature map extraction, 2) multiple residual blocks with multi-scale convolutional layers, and 3) a fully connected layer with a softmax activation function acting as a classifier. All input images are resized to 512 × 512 before being fed into the network. We investigate three different fusion architectures with the US(x, y), US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) images as input.

Fig. 2 illustrates the various network architectures. Since the scale of the objects in the image varies, we adopt multi-scale receptive fields to focus on information at different scales. As shown in Fig. 2, three ResCNN blocks are the basic components in all designs for extracting multi-scale features. They have different receptive kernels, with sizes set to 3 × 3, 5 × 5, and 7 × 7. Every ResCNN block contains three sub-blocks, and each sub-block contains two convolutional layers. A skip connection is added in each sub-block to avoid the degradation problem [5]. An average pooling layer follows the convolution operations to output the final feature map. At the end of the network, a fully connected layer with an activation function acts as the classifier, taking the concatenation of the final feature maps as input.

The feature-fusion function is applied at different levels of the CNN model to construct early-, mid-, and late-fusion structures [4]. To achieve early fusion, all the images are concatenated at the pixel level to form an input with more channels. In the mid-fusion model, the input images are fed to the network separately and processed by the initial convolutional layer to obtain the corresponding primary feature maps; these are then concatenated for deeper feature extraction. Late fusion is performed before the fully connected layer, fusing the final feature maps from each input image.

Fig. 2. The various CNN architectures. Each convolutional layer has three parameters: kernel size, depth, and stride. (a) The early-fusion CNN: US(x, y), US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) images are fused at the pixel level before entering the network. (b) The mid-fusion CNN: the five input images are separately processed by the initial convolutional block to output the corresponding primary feature maps, whose concatenation is processed through the rest of the network. (c) The late-fusion CNN: the five input images are separately processed through the whole network up to the average pooling layer, and all final feature maps are fused before the fully connected layer.

C. Data
The dataset used in this work was obtained from [1] and [3] and consisted of 1276 COVID-19 LUS scans from 51 subjects, 254 bacterial pneumonia LUS scans from 13 subjects, and 222 LUS scans from 12 healthy subjects. All images selected from the two released datasets were acquired with a convex probe. Bacterial pneumonia and healthy LUS scans are combined into a non-COVID-19 class, so our task is binary classification. Before image enhancement, all data is cropped to 334 × 334 squares to remove non-relevant information.
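The multi-scale fusion design described in Section II-B can be sketched in PyTorch (the framework used in this work). This is a minimal early-fusion sketch under stated assumptions: the channel widths, batch normalization layers, and the absence of striding between sub-blocks are illustrative choices not specified above.

```python
import torch
import torch.nn as nn

class SubBlock(nn.Module):
    """Residual sub-block: two convolutional layers plus a skip connection."""
    def __init__(self, ch, k):
        super().__init__()
        p = k // 2  # same-size padding
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, k, padding=p), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, k, padding=p), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(x + self.body(x))

class ResCNNBlock(nn.Module):
    """Three residual sub-blocks sharing one kernel size (3, 5, or 7)."""
    def __init__(self, ch, k):
        super().__init__()
        self.blocks = nn.Sequential(*[SubBlock(ch, k) for _ in range(3)])
    def forward(self, x):
        return self.blocks(x)

class EarlyFusionMultiScaleNet(nn.Module):
    """Early-fusion variant: the five images are stacked as channels,
    a first conv extracts primary features, three parallel ResCNN
    blocks with 3x3/5x5/7x7 kernels capture multi-scale features, and
    their average-pooled outputs are concatenated for the classifier."""
    def __init__(self, in_ch=5, ch=16, n_classes=2):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.branches = nn.ModuleList(
            [ResCNNBlock(ch, k) for k in (3, 5, 7)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3 * ch, n_classes)
    def forward(self, x):
        f = self.stem(x)
        feats = [self.pool(b(f)).flatten(1) for b in self.branches]
        return self.fc(torch.cat(feats, dim=1))  # logits; softmax at inference
```

The mid- and late-fusion variants differ only in where concatenation happens: after the stem convolutions of five separate streams, or after the pooled features of five complete streams.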
D. Experiment Implementation
We perform 5-fold cross-validation to evaluate the performance of our proposed method. During evaluation, data from the same patient was never included in both the training and testing sets. The reported final results are the mean over the 5 folds. All datasets maintain the same data distribution, including the US(x, y) dataset and the US_E1(x, y), US_E2(x, y), S1(x, y), and S2(x, y) datasets.

All CNN models are trained using the cross-entropy loss function and the Adam optimizer with a learning rate of 1e-5. Classification performance is measured by four metrics: accuracy, precision, recall, and F1 score. To evaluate the effectiveness of the image processing methods and feature-fusion strategies, we compare the results of using only the US(x, y) image as input against the two groups of processed images (US_E1(x, y) with US_E2(x, y), and S1(x, y) with S2(x, y)). Furthermore, we investigate the accuracy of the model when all five kinds of images are used as input. Experiments are implemented in the PyTorch framework on a machine with an Intel Core CPU at 3.70 GHz and an NVIDIA GeForce GTX 1080Ti GPU.

TABLE I
CLASSIFICATION PERFORMANCE SUMMARY. BEST RESULT IS SHOWN IN BOLD. ("–" marks a value not available.)

Method                               | Fusion | Accuracy | Precision (COVID/Non) | Recall (COVID/Non) | F1 (COVID/Non)
US(x,y) (single-feature CNN)         |   –    | 89.94%   | 92.48%/82.49%         | 93.98%/78.70%      | 93.21%/80.46%
US_E1 + US_E2                        | Early  | 91.93%   | –/84.65%              | 93.76%/–           | –
                                     | Mid    | 90.91%   | 94.94%/81.69%         | –                  | –
                                     | Late   | 88.96%   | 92.12%/81.29%         | –                  | –
S1 + S2                              | Early  | 90.68%   | 93.09%/84.35%         | 94.33%/80.85%      | 93.65%/82.17%
                                     | Mid    | 86.53%   | 87.35%/84.92%         | 95.85%/61.54%      | 91.27%/69.50%
                                     | Late   | 87.52%   | 89.70%/83.38%         | 93.62%/70.46%      | 91.48%/75.56%
US + US_E1 + US_E2                   | Early  | 88.60%   | –/86.05%              | 92.09%/79.27%      | –
                                     | Mid    | 92.80%   | 95.14%/86.03%         | –                  | –
                                     | Late   | 95.11%   | –                     | –                  | 96.70%/90.48%
US + US_E1 + US_E2 + S1 + S2         | Early  | 90.57%   | 93.51%/84.21%         | 93.89%/82.74%      | 93.54%/82.50%
                                     | Mid    | 88.79%   | 92.37%/79.50%         | 92.55%/79.20%      | 92.37%/78.80%
                                     | Late   | 89.33%   | 92.26%/82.15%         | 93.25%/79.05%      | 92.69%/80.16%

III. RESULTS

Quantitative results of our proposed method are presented in Table I. Our proposed multi-scale network achieves an average classification accuracy of 89.94% when using only LUS data (US(x, y)). The average accuracy increases to 91.93% when using the enhanced images US_E1,2(x, y), and to 90.68% when using the radial symmetry images S1,2(x, y) (Table I). The best performance was obtained when combining the LUS US(x, y) images with the enhanced images US_E1,2(x, y), where an average accuracy of 95.11% was obtained.
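The patient-wise 5-fold split described in Section II-D (the same patient never appears in both training and testing data) can be sketched with scikit-learn's GroupKFold; the subject ids and labels below are synthetic placeholders, not the actual dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# GroupKFold guarantees all scans from one subject fall in the same
# fold, so no patient leaks between training and test sets.
rng = np.random.default_rng(0)
n_scans, n_subjects = 1752, 76
subjects = rng.integers(0, n_subjects, size=n_scans)  # subject id per scan
labels = rng.integers(0, 2, size=n_scans)             # COVID-19 vs non

folds = []
gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(np.zeros((n_scans, 1)), labels,
                                     groups=subjects):
    folds.append((train_idx, test_idx))
    # sanity check: no subject overlap between train and test
    assert set(subjects[train_idx]).isdisjoint(subjects[test_idx])
```

Fold-wise metrics would then be averaged to produce the mean results reported in Table I.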
The comparison among the results of the first three sets of experiments demonstrates that local phase features are beneficial for enhancing tissue characteristics for network learning, especially when feature fusion is performed at an early stage. As seen in Table I, when US(x, y) is combined with the enhanced images, the late-fusion design obtains the highest accuracy (95.11%) and F1 score (96.70%), significantly outperforming the other fusion operations in these two metrics. When all the image features were combined, the early-fusion architecture obtained the best results compared to the other fusion networks investigated. We further observe that using local phase image features also improves the performance of the network for classifying non-COVID-19 data (F1 score of 90.48% vs 80.46%).

IV. CONCLUSION

We proposed a novel CNN-based method to achieve accurate COVID-19 prediction from LUS. Quantitative and qualitative results confirm that the use of local phase information and a multi-feature multi-scale CNN contributes to improved COVID-19 classification performance on LUS data. Fusing LUS features and local phase features at a late stage gives the highest accuracy, reaching 95.11%, while the other metrics indicate a balanced classification capability of the model. In most other cases, the early-fusion strategy shows better classification performance. Our future work will include the evaluation of the proposed method on a larger-scale dataset. We would also like to extend our network to multi-class classification for differentiating regular pneumonia from COVID-19. Finally, optimization of the local phase image filter parameters based on CNN performance is another avenue for future work.