Boosting Segmentation Performance across datasets using histogram specification with application to pelvic bone segmentation
Prabhakara Subramanya Jois, Aniketh Manjunath, Thomas Fevens
Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
Department of Computer Science, University of Southern California, Los Angeles, USA
Email: {sp.subramanya, v.m.aniketh}@gmail.com, [email protected]

ABSTRACT
Accurate segmentation of pelvic CTs is crucial for the clinical diagnosis of pelvic bone diseases and for planning patient-specific hip surgeries. With the emergence and advancement of deep learning for digital healthcare, several methodologies have been proposed for such segmentation tasks. But in a low-data scenario, the lack of abundant data needed to train a deep neural network is a significant bottleneck. In this work, we propose a methodology based on the modulation of image tonal distributions and deep learning to boost the performance of networks trained on limited data. The strategy involves pre-processing of test data through histogram specification. This simple yet effective approach can be viewed as a style transfer methodology. The segmentation task uses a U-Net configuration with an EfficientNet-B0 backbone, optimized using an augmented BCE-IoU loss function. This configuration is validated on a total of 284 images taken from two publicly available CT datasets, TCIA (a cancer imaging archive) and the Visible Human Project. The average performance measures for the Dice coefficient and Intersection over Union, 95.7% and 91.9% respectively, give strong evidence for the effectiveness of the approach, which is highly competitive with state-of-the-art methodologies.
Index Terms — Pelvic bone segmentation, data pre-processing, histogram specification, U-Net, fine-tuning.
1. INTRODUCTION
In recent years, due to the increase in the incidence of pelvic injuries from traffic-related accidents [1], pelvic bone diseases within the aging population, and sufficient access to computed tomography (CT) imaging, automated pelvic bone segmentation in CT has gained considerable prominence. The segmentation results assist physicians in the early detection of pelvic injury, help expedite surgical planning, and reduce the complications caused by pelvic fractures [2]. In CT data, structures like the bone marrow and bone surface appear as dark and bright regions due to their low and high densities compared to the surrounding tissues. However, given the variations in image quality between different CT datasets, distinguishing bone structures from the image background becomes cumbersome and leads to erroneous segmentation outputs. These issues indicate the need for a novel solution: a simple yet effective methodology for the accurate segmentation of pelvic bones from varying CT data.
Contribution of this paper:
The key novelties of this work are as follows:

† This work was supported by Mitacs Accelerate Project IT20604 and NSERC Grants RGPIN 04929 and RGPIN 06785, Canada.
Fig. 1: (a1) and (b1) illustrate the segmentation outputs for input images from TCIA [3] and VHBD [4], respectively.

1. introduction of an encoder-decoder network, trained on limited data, for high-accuracy segmentation of pelvic bones
2. boosting model performance on unseen data by employing histogram specification

The exact details of the approach are deferred until Sec. 3.3. Fig. 1 illustrates the results of the proposed method.
2. PRIOR ART
Recent literature has seen many applications for the segmentation of the pelvis from CT imaging data. Traditional methods such as thresholding and region growing [5], deformable surface models [6], and others have been commonly used to perform bone segmentation. However, these approaches often suffer from low accuracy due to varying image properties such as intensity and contrast, and the inherent variations between the texture of the bone structures (bone marrow and surface boundary) and the surrounding tissues. To overcome these challenges, supervised methods such as statistical shape models (SSMs) and atlas-based deep learning (DL) methods have made significant contributions to segmentation tasks. Wang et al. [7, 8]
Fig. 2: Workflow of the U-Net architecture with a pre-trained backbone, detailing pelvic bone segmentation.

suggested using a multi-atlas segmentation with joint label fusion for detecting regions of interest from CT images. Yokota et al. [9] showcased a combination of hierarchical and conditional SSMs for the automated segmentation of diseased hips from CT data. Chu et al. [10] presented a multi-atlas-based method for accurately segmenting the femur and pelvis. Zeng et al. [11] proposed a supervised 3D U-Net with multi-level supervision for segmenting the femur in 3D MRI. Chen et al. [12] showcased a 3D feature-enhanced network for quickly segmenting femurs from CT data. Chang et al. [13] proposed patch-based refinement on top of a conditional random field model for fine segmentation of healthy and diseased hips. Liu et al. [14] used 3D U-Nets in two stages (trained on approximately 270K images) with a signed distance function for producing bone fragments from image stacks. In the following section, we discuss a new technique addressing accurate segmentation of the pelvis from CT images of varying quality.
3. PROPOSED METHODOLOGY
The efficacy of using encoder-decoder architectures for designing high-accuracy segmentation models for biomedical applications has been showcased in recent literature [11, 14, 15]. We employ a similar architecture, with various encoder modules for feature extraction and a decoder module for semantic segmentation. The details of the encoder and decoder modules are explained in the following.
3.1. Encoder module

In simple terms, an encoder takes the input image and generates a high-dimensional feature vector aggregated over multiple levels. We deploy a choice of the following well-known architectures as the encoder module:
Residual networks (ResNet) introduced residual mappings to solve the vanishing gradient problem in deep neural networks [16]. ResNets are easy to optimize and gain accuracy even with deeper models.
Inception networks are computationally efficient architectures, both in terms of the model parameters and their memory usage. Adapting the Inception network for different applications while ensuring that changes do not impede its computational efficiency is difficult. Inception V3 introduced various strategies for optimizing the network with ease of model adaptation [17].
Conventional methods make use of scaling to increase the accuracy of the models: the models are scaled by increasing the depth or width of the network, or by using higher-resolution input images. EfficientNet results from a novel scaling method that uses a compound coefficient to uniformly scale the network across all dimensions [18].
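For concreteness, EfficientNet's compound scaling picks base coefficients α (depth), β (width), and γ (resolution) under the constraint α·β²·γ² ≈ 2, so that total FLOPs grow roughly by 2^φ for a compound coefficient φ. A small sketch of this rule, using the base coefficients reported in [18] (the function name and rounding are ours, for illustration):

```python
# Compound scaling as described in the EfficientNet paper [18]:
# depth ~ alpha^phi, width ~ beta^phi, resolution ~ gamma^phi,
# with alpha * beta^2 * gamma^2 ~= 2 so FLOPs grow roughly as 2^phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched base coefficients from [18]

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale a baseline network's depth, width, and input resolution by phi."""
    return (
        round(base_depth * ALPHA ** phi),
        round(base_width * BETA ** phi),
        round(base_resolution * GAMMA ** phi),
    )
```

With φ = 0 the baseline (e.g., EfficientNet-B0, as used in this work) is unchanged; increasing φ grows all three dimensions together rather than one at a time.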
3.2. Decoder module

The decoder module is responsible for generating a semantic segmentation mask using the aggregated high-dimensional features extracted by the encoder module. We make use of the popular U-Net model, specially designed for medical imaging, as the decoding module [15].
3.3. Histogram specification

Histogram specification, or histogram matching, is a traditional image processing technique [19] that matches the input image's histogram to a reference histogram. It involves computing the cumulative distribution functions (CDFs) of the histograms of both the target and the reference, after which a transformation function is obtained by mapping each gray level in [0, L−1] (for L gray levels) from the target's CDF (input) to the corresponding gray level in the reference CDF. In this work, we construct the reference histogram by averaging over the histograms of every image in the training set. Using this technique as a pre-processing step for the test data serves an important purpose, as the distribution of the test data is converted to a form similar to that seen by the network during training.
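The matching step above can be sketched in a few lines of NumPy: build both CDFs, then map each input gray level to the reference level with the nearest CDF value. This is an illustrative sketch under our own function names, not the exact implementation used in this work:

```python
import numpy as np

def average_reference_histogram(images, levels=256):
    """Average the histograms of all training images into one reference histogram."""
    hists = [np.histogram(img, bins=levels, range=(0, levels))[0] for img in images]
    return np.mean(hists, axis=0)

def histogram_specification(target, reference_hist, levels=256):
    """Remap the gray levels of `target` so its histogram matches `reference_hist`."""
    # CDF of the input (target) image
    target_hist, _ = np.histogram(target, bins=levels, range=(0, levels))
    target_cdf = np.cumsum(target_hist).astype(np.float64)
    target_cdf /= target_cdf[-1]

    # CDF of the (averaged) reference histogram
    ref_cdf = np.cumsum(reference_hist).astype(np.float64)
    ref_cdf /= ref_cdf[-1]

    # For each gray level in [0, levels-1], find the reference gray level
    # whose cumulative probability is closest to the target's
    mapping = np.searchsorted(ref_cdf, target_cdf)
    mapping = np.clip(mapping, 0, levels - 1).astype(target.dtype)
    return mapping[target]
```

At test time, each test image would be passed through `histogram_specification` with the averaged training-set histogram as the reference before being fed to the network.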
4. EXPERIMENTAL VALIDATION

4.1. Datasets
The input data preparation and label annotation were done using tools from the ImageJ software. A summary of the TCIA (cancer imaging archive) [3] and VHBD (Visible Human Project) [4] datasets, covering image resolution, the number of images used in this study, and the respective train-validation-test splits, is shown in Table 1.

Table 1: An overview of the datasets used in this work.
Dataset      Resolution   Total   Train-set    Val-set    Test-set
TCIA [3]     512 x 512    582     407 (70%)    58 (10%)   117 (20%)
VHBD [4]     512 x 512    167     –            –          167 (100%)
VHBD-2 [4]   512 x 512    167     116 (70%)    17 (10%)   34 (20%)
To quantify the quality of segmentation, we compute standard performance measures for segmentation tasks commonly used in the literature, specifically the mean Dice coefficient (mDice) and mean Intersection over Union (mIoU) [20, 21]. For a given segmentation output (A) and ground truth (B), the Dice coefficient, Dice = 2|A ∩ B| / (|A| + |B|), can be interpreted as a weighted average of precision and recall, and IoU = |A ∩ B| / |A ∪ B| (also known as the Jaccard index) is commonly used for comparing the similarity between sets (A) and (B) while penalizing their diversity.

The implementations used were based on the documentation from [22]. The models used [16–18] were pre-trained on the Imagenet [23] dataset to improve the generalization capability on unseen data and achieve faster convergence. For the base-model, we use ResNet-34 [16] as the encoder and a U-Net decoder. We initialize the base-model with random weights (rnwt) and train without any data-augmentation (noaug) on images from [3], using an Nvidia RTX 2070 GPU and an ADAM optimizer with a learning rate of 0.001, momentum of 0.9, and a weight decay of 0.0001, for 40 epochs. We chose a 70% : 10% : 20% split of the data (shown in the first row of Table 1), where 70% was utilized for training and 10% for validation; the remaining 20% for testing was completely unseen during training. About 50 passes of random image batches of size eight from the training set were used in each epoch. The model was then validated on the 10% split to evaluate the performance based on the binary cross-entropy loss (bce) and record the corresponding weights.
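The two overlap measures defined above can be computed directly on binary masks; a minimal NumPy sketch (our own illustration, not the evaluation code from [22]):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A (prediction) and B (ground truth)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou_score(pred, gt, eps=1e-7):
    """IoU (Jaccard index) = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)
```

The small `eps` term avoids division by zero when both masks are empty; mDice and mIoU are these scores averaged over the test set.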
After training, the weights that gave the best performance on the validation set were selected for the base-model, which was then evaluated on the unseen test-sets, i.e., 20% of [3] and 100% of [4], respectively; its performance is showcased in the first row of Table 2. Extending beyond the base-model, data augmentation (aug) was performed using horizontal and vertical flips, affine transforms, image intensity modulation, and blurring, to increase the training data size and help reduce over-fitting. In addition, we try to find the best overall segmentation performance and generalization capability to completely unseen data through further extension of the base-model with different configurations, using the following:
• encoder modules using ResNet-34 [16], Inception V3 [17] and EfficientNet-B0 [18], initialized with Imagenet weights (imwt) for transfer learning
• re-configuration of input data, or not, to the pre-trained model's format and its pre-processing functions (ppr), for extraction of better features
Fig. 3: Pelvic bone segmentation on TCIA data using: (a) base U-Net with random weight initialization for the ResNet-34 encoder, with no data-augmentation, optimized using BCE loss (least performing); and (b) fine-tuned U-Net with Imagenet weight initialization for the EfficientNet-B0 encoder, with data-augmentation and input re-configuration, optimized using the combined BCE-IoU loss (best performing), overlaid onto the binary ground-truth; yellow - TP; black - TN; green - FP; red - FN.
Fig. 4: Performance in segmentation with histogram specification: (a1-c1) show the respective histograms of the input images; (a2-c2) show the pelvic bone segmentations overlaid on the ground-truth; (b2) and (c2) decisively show the improvement in segmentation from matching the target's histogram to the reference; yellow - TP; black - TN; green - FP; red - FN.
• loss functions like Dice loss (dice), IoU loss (iou) and the combined bce-iou loss, in place of bce loss, for propagating strong gradients for better optimization and learning
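A hedged sketch of such a combined BCE-IoU objective, written per-image in NumPy for clarity (the relative weighting of the two terms is our assumption, since it is not specified here; a training implementation would express the same thing in the framework's tensor ops):

```python
import numpy as np

def bce_iou_loss(pred_probs, gt, eps=1e-7):
    """Combined BCE + soft-IoU loss on predicted probabilities vs. a binary ground truth.

    Illustrative sketch: an unweighted sum of the two terms is assumed.
    """
    p = np.clip(pred_probs, eps, 1.0 - eps)
    # Binary cross-entropy, averaged over pixels
    bce = -np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p))
    # Soft IoU: intersection and union computed on probabilities,
    # so the term stays differentiable for gradient-based training
    inter = np.sum(p * gt)
    union = np.sum(p) + np.sum(gt) - inter
    iou = (inter + eps) / (union + eps)
    return bce + (1.0 - iou)
```

The BCE term supplies dense per-pixel gradients while the 1 − IoU term directly optimizes the overlap measure used for evaluation.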
The detailed comparisons of the different U-Net configurations' segmentation performance on the test-sets, with 95% confidence intervals, are shown in Table 2. The segmentation outputs from the least-performing (base-model) and best-performing (fine-tuned U-Net with Imagenet weight initialization for the EfficientNet-B0 encoder [18], with data-augmentation and input re-configuration, optimized using the combined BCE-IoU loss) DL models are showcased in Fig. 3 (a) & (b). The predicted outputs are overlaid onto the ground-truth and color-coded (yellow - TP; black - TN; green - FP; red - FN) for visualizing the quality of segmentation. The results shown in Fig. 4 (b2) & (c2) illustrate the desired effect on segmentation due to histogram specification. The reduction in the number of pixels labeled as FPs & FNs, and the improvement in the number of TPs in the overlays, decisively show the significance of pre-processing the test-data, which clearly boosts the model's segmentation performance. Furthermore, the comparative results tabulated in the last two columns of Table 2 give strong evidence for the success of the proposed methodology on all the specified model configurations.

On analyzing the data shown in Table 3, the proposed methodology's overall performance on the test-sets surpassed several state-of-the-art techniques that were trained on similarly sized datasets, with the exception of Liu et al. [14], who performed training on approximately 270,000 images. Since data drives any model, the proposed methodology (trained only on 407 images) shows room for further improvement in segmentation given the availability of larger datasets.

Table 2: Performance comparison of different U-Net configurations for pelvic bone segmentation on unseen data from TCIA, VHBD, and H-VHBD, i.e., VHBD after histogram specification.

U-Net Configuration                    TCIA mIoU   TCIA mDice   VHBD mIoU   VHBD mDice   H-VHBD mIoU   H-VHBD mDice
Res34-rnwt-noaug-bce                   … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-bce                     … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-dice                    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-bce-iou                 … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
IncepV3-imwt-aug-bce                   … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
IncepV3-ppr-imwt-aug-bce               … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-imwt-aug-bce                    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-ppr-imwt-aug-bce-iou            0.924 ± …   … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-ppr-imwt-aug-bce-iou (joint)    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …

* Encoder module - Res34, IncepV3, EffiB0 are ResNet-34, Inception Net-V3, EfficientNet-B0, respectively.
* Encoder weights - rnwt and imwt are random weights and Imagenet weights, respectively.
* Augmentation - aug and noaug mean training with and without data-augmentation, respectively.
* Loss - bce, dice, iou are the binary cross-entropy loss, Dice loss, and IoU loss, respectively.
* ppr - configure input to the pre-trained backbone's format.
* Grey background - indicates improvement due to histogram-specification-based pre-processing.
Images from [3, 4], with the data splits shown in rows 1 and 3 of Table 1, are used for training. The best model was trained on this joint data, and its test-data performance is shown in the last row of Table 2. The results showed that training the model on joint data degrades the performance on both datasets. The data imbalance and the varying image tonal distributions play a significant role in influencing segmentation performance. By using the proposed methodology, the model overcomes the data imbalance and generalizes well to unseen datasets, which boosts its overall segmentation performance.
Table 3: Overall performance comparison for pelvic bone segmentation with state-of-the-art techniques.

Methodology       Dataset                 mIoU    mDice
Liu et al. [14]   DS‡ (∼ …)               …       …
Proposed (ours)   TCIA, VHBD (284)        0.919   0.957

‡ DS: KITS19, CERVIX, ABDOMEN, MSD T10, COLONOG, CLINIC; Train:Test ≈ …
5. CONCLUSION
To sum up, in this work we presented a novel methodology for the automated segmentation of pelvic bones from axial CT images. We addressed the unmet need for a superior pelvic bone segmentation methodology for images with varying properties by using histogram specification. This simple yet powerful approach of pre-processing the test-data improved segmentation performance by a significant margin, with the quantitative results confirming its validity. Through our approach, the encoder-decoder configuration overcame a significant hurdle of varying intensity distributions in CT images, which led to superior segmentation quality. Moreover, after validating the results on the publicly available TCIA and VHBD datasets, the proposed methodology has been shown to be highly competitive with respect to existing state-of-the-art techniques.

Through this study, we saw that, although deep learning has pushed the limits for image processing applications, traditional image processing techniques are not necessarily obsolete, and that combining the two approaches can lead to superior performance in segmentation.

6. REFERENCES

[1] Rebecca B. Naumann, Ann M. Dellinger, Eduard Zaloshnja, Bruce A. Lawrence, and Ted R. Miller, "Incidence and total lifetime costs of motor vehicle-related fatal and nonfatal injury by road user type, United States, 2005," Traffic Injury Prevention, vol. 11, no. 4, pp. 353–360, 2010.

[2] Hui Yu, Haijun Wang, Yao Shi, Ke Xu, Xuyao Yu, and Yuzhen Cao, "The segmentation of bones in pelvic CT images based on extraction of key frames," BMC Medical Imaging, vol. 18, no. 1, p. 18, 2018.

[3] Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, et al., "The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository," Journal of Digital Imaging, vol. 26, no. 6, pp. 1045–1057, 2013.

[4] M. J. Ackerman, "The Visible Human Project," Proceedings of the IEEE, vol. 86, no. 3, pp. 504–511, 1998.

[5] Phan T. H. Truc, Sungyoung Lee, and Tae-Seong Kim, "A density distance augmented Chan-Vese active contour for CT bone segmentation," IEEE, 2008, pp. 482–485.

[6] Dagmar Kainmueller, Hans Lamecker, Stefan Zachow, and Hans-Christian Hege, "Coupling deformable models for multi-object segmentation," in International Symposium on Biomedical Simulation. Springer, 2008, pp. 69–78.

[7] Hongzhi Wang, Jung W. Suh, Sandhitsu R. Das, John B. Pluta, Caryne Craige, and Paul A. Yushkevich, "Multi-atlas segmentation with joint label fusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 611–623, 2012.

[8] Hongzhi Wang, Mehdi Moradi, Yaniv Gur, Prasanth Prasanna, and Tanveer Syeda-Mahmood, "A multi-atlas approach to region of interest detection for medical image classification," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 168–176.

[9] Futoshi Yokota, Toshiyuki Okada, Masaki Takao, Nobuhiko Sugano, Yukio Tada, Noriyuki Tomiyama, and Yoshinobu Sato, "Automated CT segmentation of diseased hip using hierarchical and conditional statistical shape models," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2013, pp. 190–197.

[10] Chengwen Chu, Junjie Bai, Xiaodong Wu, and Guoyan Zheng, "MASCG: Multi-atlas segmentation constrained graph method for accurate segmentation of hip CT images," Medical Image Analysis, vol. 26, no. 1, pp. 173–184, 2015.

[11] Guodong Zeng, Xin Yang, Jing Li, Lequan Yu, Pheng-Ann Heng, and Guoyan Zheng, "3D U-Net with multi-level deep supervision: fully automatic segmentation of proximal femur in 3D MR images," in International Workshop on Machine Learning in Medical Imaging. Springer, 2017, pp. 274–282.

[12] Fang Chen, Jia Liu, Zhe Zhao, Mingyu Zhu, and Hongen Liao, "Three-dimensional feature-enhanced network for automatic femur segmentation," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 243–252, 2017.

[13] Yong Chang, Yongfeng Yuan, Changyong Guo, Yadong Wang, Yuanzhi Cheng, and Shinichi Tamura, "Accurate pelvis and femur segmentation in hip CT with a novel patch-based refinement," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 3, pp. 1192–1204, 2018.

[14] Pengbo Liu, Hu Han, Yuanqi Du, Heqin Zhu, Yinhao Li, Feng Gu, Honghu Xiao, Jun Li, Chunpeng Zhao, Li Xiao, et al., "Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models," arXiv preprint arXiv:2012.08721, 2020.

[15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[17] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

[18] Mingxing Tan and Quoc V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," arXiv preprint arXiv:1905.11946, 2019.

[19] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer Science & Business Media, 2010.

[20] William R. Crum, Oscar Camara, and Derek L. G. Hill, "Generalized overlap measures for evaluation and validation in medical image analysis," IEEE Trans. Med. Imag., vol. 25, no. 11, pp. 1451–1461, 2006.

[21] Herng-Hua Chang, Audrey H. Zhuang, Daniel J. Valentino, and Woei-Chyn Chu, "Performance measure characterization for evaluating neuroimage segmentation algorithms," NeuroImage, vol. 47, no. 1, pp. 122–135, 2009.

[22] Pavel Yakubovskiy, "Segmentation models," https://segmentation-models.readthedocs.io/en/latest/, 2019.

[23] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition.