Boosting Segmentation Performance across datasets using histogram specification with application to pelvic bone segmentation
Prabhakara Subramanya Jois, Aniketh Manjunath, Thomas Fevens
Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
Department of Computer Science, University of Southern California, Los Angeles, USA
Email: {sp.subramanya, v.m.aniketh}@gmail.com, [email protected]

ABSTRACT
Accurate segmentation of pelvic CTs is crucial for the clinical diagnosis of pelvic bone diseases and for planning patient-specific hip surgeries. With the emergence and advancement of deep learning for digital healthcare, several methodologies have been proposed for such segmentation tasks. But in a low-data scenario, the lack of abundant data needed to train a deep neural network is a significant bottleneck. In this work, we propose a methodology based on the modulation of image tonal distributions and deep learning to boost the performance of networks trained on limited data. The strategy involves pre-processing of test data through histogram specification. This simple yet effective approach can be viewed as a style transfer methodology. The segmentation task uses a U-Net configuration with an EfficientNet-B0 backbone, optimized using an augmented BCE-IoU loss function. This configuration is validated on a total of 284 images taken from two publicly available CT datasets, TCIA (a cancer imaging archive) and the Visible Human Project. The average performance measures for the Dice coefficient and Intersection over Union, 95.7% and 91.9% respectively, give strong evidence for the effectiveness of the approach, which is highly competitive with state-of-the-art methodologies.
Index Terms — Pelvic bone segmentation, data pre-processing, histogram specification, U-Net, fine-tuning.
1. INTRODUCTION
In recent years, due to the increase in the incidence of pelvic injuries from traffic-related accidents [1], pelvic bone diseases within the aging population, and sufficient access to computed tomography (CT) imaging, automated pelvic bone segmentation in CT has gained considerable prominence. The segmentation results assist physicians in the early detection of pelvic injury, help expedite surgical planning, and reduce the complications caused by pelvic fractures [2]. In CT data, structures like the bone marrow and bone surface appear as dark and bright regions due to their low and high densities compared to the surrounding tissues. However, given the variations in image quality between different CT datasets, distinguishing bone structures from the image background becomes cumbersome and leads to erroneous segmentation outputs. These issues indicate the need for a novel solution: a simple yet effective methodology for the accurate segmentation of pelvic bones from varying CT data.
Contribution of this paper:
The key novelties of this work are as follows:

† This work was supported by Mitacs Accelerate Project IT20604 and NSERC Grants RGPIN 04929 and RGPIN 06785, Canada.
Fig. 1: (a1) and (b1) illustrate the segmentation outputs for input images from TCIA [3] and VHBD [4], respectively.

1. introduction of an encoder-decoder network, trained on limited data, for high-accuracy segmentation of pelvic bones
2. boosting model performance on unseen data by employing histogram specification

The exact details of the approach are deferred until Sec. 3.3. Fig. 1 illustrates the results of the proposed method.
2. PRIOR ART
Recent literature has seen many applications for the segmentation of the pelvis from CT imaging data. Traditional methods such as thresholding and region growing [5], deformable surface models [6], and others have been commonly used to perform bone segmentation. However, these approaches often suffer from low accuracy due to varying image properties such as intensity and contrast, and the inherent variations between the texture of the bone structures (bone marrow and surface boundary) and the surrounding tissues. To overcome these challenges, supervised methods such as statistical shape models (SSMs) and atlas-based deep learning (DL) methods have made significant contributions to segmentation tasks. Wang et al. [7, 8]
Fig. 2: Workflow of the U-Net architecture with a pre-trained backbone, detailing pelvic bone segmentation.

suggested using a multi-atlas segmentation with joint label fusion for detecting regions of interest from CT images. Yokota et al. [9] showcased a combination of hierarchical and conditional SSMs for the automated segmentation of diseased hips from CT data. Chu et al. [10] presented a multi-atlas-based method for accurately segmenting the femur and pelvis. Zeng et al. [11] proposed a supervised 3D U-Net with multi-level supervision for segmenting the femur in 3D MRI. Chen et al. [12] showcased a 3D feature-enhanced network for quickly segmenting femurs from CT data. Chang et al. [13] proposed patch-based refinement on top of a conditional random field model for fine segmentation of healthy and diseased hips. Liu et al. [14] used 3D U-Nets in two stages (trained on approximately 270K images) with a signed distance function for producing bone fragments from image stacks. In the following section, we discuss a new technique addressing accurate segmentation of the pelvis from CT images of varying quality.
3. PROPOSED METHODOLOGY
The efficacy of using encoder-decoder architectures for designing high-accuracy segmentation models for biomedical applications has been showcased in recent literature [11, 14, 15]. We employ a similar architecture, with various encoder modules for feature extraction and a decoder module for semantic segmentation. The details of the encoder and decoder modules are explained in the following.
3.1. Encoder module

In simple terms, an encoder takes the input image and generates a high-dimensional feature vector aggregated over multiple levels. We deploy a choice of the following well-known architectures as the encoder module:
Residual networks (ResNet) introduced residual mappings to solve the vanishing gradient problem in deep neural networks [16]. ResNets are easy to optimize and gain accuracy even with deeper models.
Inception networks are computationally efficient architectures, both in terms of the model parameters and their memory usage. Adapting the Inception network for different applications while ensuring that changes do not impede its computational efficiency is difficult. Inception V3 introduced various strategies for optimizing the network with ease of model adaptation [17].
Conventional methods make use of scaling to increase the accuracy of the models: the models are scaled by increasing the depth or width of the network, or by using higher-resolution input images. EfficientNet results from a novel scaling method that uses a compound coefficient to uniformly scale the network across all dimensions [18].
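For concreteness, EfficientNet's compound scaling picks base coefficients α (depth), β (width), and γ (resolution) under the constraint α·β²·γ² ≈ 2, so that total FLOPs grow roughly by 2^φ for a compound coefficient φ. A small sketch of this rule, using the base coefficients reported in [18] (the function name and rounding are ours, for illustration):

```python
# Compound scaling as described in the EfficientNet paper [18]:
# depth ~ alpha^phi, width ~ beta^phi, resolution ~ gamma^phi,
# with alpha * beta^2 * gamma^2 ~= 2 so FLOPs grow roughly as 2^phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched base coefficients from [18]

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale a baseline network's depth, width, and input resolution by phi."""
    return (
        round(base_depth * ALPHA ** phi),
        round(base_width * BETA ** phi),
        round(base_resolution * GAMMA ** phi),
    )
```

With φ = 0 the baseline (e.g., EfficientNet-B0, as used in this work) is unchanged; increasing φ grows all three dimensions together rather than one at a time.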
3.2. Decoder module

The decoder module is responsible for generating a semantic segmentation mask using the aggregated high-dimensional features extracted by the encoder module. We make use of the popular U-Net model, specially designed for medical imaging, as the decoding module [15].
3.3. Histogram specification

Histogram specification, or histogram matching, is a traditional image processing technique [19] that matches the input image's histogram to a reference histogram. It involves computing the cumulative distribution functions (CDFs) of the histograms of both the target and the reference, after which a transformation function is obtained by mapping each gray level in [0, L−1] (for L gray levels) from the target's CDF (input) to the corresponding gray level in the reference CDF. In this work, we construct the reference histogram by averaging over the histograms of every image in the training set. Using this technique as a pre-processing step for the test data serves an important purpose, as the distribution of the test data is converted to a form similar to that seen by the network during training.
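The matching step above can be sketched in a few lines of NumPy: build both CDFs, then map each input gray level to the reference level with the nearest CDF value. This is an illustrative sketch under our own function names, not the exact implementation used in this work:

```python
import numpy as np

def average_reference_histogram(images, levels=256):
    """Average the histograms of all training images into one reference histogram."""
    hists = [np.histogram(img, bins=levels, range=(0, levels))[0] for img in images]
    return np.mean(hists, axis=0)

def histogram_specification(target, reference_hist, levels=256):
    """Remap the gray levels of `target` so its histogram matches `reference_hist`."""
    # CDF of the input (target) image
    target_hist, _ = np.histogram(target, bins=levels, range=(0, levels))
    target_cdf = np.cumsum(target_hist).astype(np.float64)
    target_cdf /= target_cdf[-1]

    # CDF of the (averaged) reference histogram
    ref_cdf = np.cumsum(reference_hist).astype(np.float64)
    ref_cdf /= ref_cdf[-1]

    # For each gray level in [0, levels-1], find the reference gray level
    # whose cumulative probability is closest to the target's
    mapping = np.searchsorted(ref_cdf, target_cdf)
    mapping = np.clip(mapping, 0, levels - 1).astype(target.dtype)
    return mapping[target]
```

At test time, each test image would be passed through `histogram_specification` with the averaged training-set histogram as the reference before being fed to the network.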
4. EXPERIMENTAL VALIDATION

4.1. Datasets
The input data preparation and label annotation were done using tools from the ImageJ software. A summary of the TCIA (cancer imaging archive) [3] and VHBD (Visible Human Project) [4] datasets, covering image resolution, the number of images used in this study, and the respective train-validation-test splits, is shown in Table 1.

Table 1: An overview of the datasets used in this work.
Dataset      Resolution   Total   Train-set    Val-set    Test-set
TCIA [3]     512 x 512    582     407 (70%)    58 (10%)   117 (20%)
VHBD [4]     512 x 512    167     –            –          167 (100%)
VHBD-2 [4]   512 x 512    167     116 (70%)    17 (10%)   34 (20%)
To quantify the quality of segmentation, we compute standard performance measures for segmentation tasks commonly used in the literature, specifically the mean Dice coefficient (mDice) and mean Intersection over Union (mIoU) [20, 21]. For a given segmentation output (A) and ground truth (B), the Dice coefficient, Dice = 2|A ∩ B| / (|A| + |B|), can be interpreted as a weighted average of precision and recall, and IoU = |A ∩ B| / |A ∪ B| (also known as the Jaccard index) is commonly used for comparing the similarity between sets (A) and (B) while penalizing their diversity.

The implementations used were based on the documentation from [22]. The models used [16–18] were pre-trained on the Imagenet [23] dataset to improve the generalization capability on unseen data and achieve faster convergence. For the base-model, we use ResNet-34 [16] as the encoder and a U-Net decoder. We initialize the base-model with random weights (rnwt) and train without any data-augmentation (noaug) on images from [3], using an Nvidia RTX 2070 GPU and an ADAM optimizer with a learning rate of 0.001, momentum of 0.9, and a weight decay of 0.0001, for 40 epochs. We chose a 70% : 10% : 20% split of the data (shown in the first row of Table 1), where 70% was utilized for training and 10% for validation; the remaining 20% for testing was completely unseen during training. About 50 passes of random image batches of size eight from the training set were used in each epoch. The model was then validated on the 10% split to evaluate the performance based on the binary cross-entropy loss (bce) and record the corresponding weights.
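The two overlap measures defined above can be computed directly on binary masks; a minimal NumPy sketch (our own illustration, not the evaluation code from [22]):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A (prediction) and B (ground truth)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou_score(pred, gt, eps=1e-7):
    """IoU (Jaccard index) = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)
```

The small `eps` term avoids division by zero when both masks are empty; mDice and mIoU are these scores averaged over the test set.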
After training, the weights that gave the best performance on the validation set were selected for the base-model, which was then evaluated on the unseen test-sets, i.e., 20% of [3] and 100% of [4], respectively; its performance is showcased in the first row of Table 2. Extending beyond the base-model, data augmentation (aug) was performed using horizontal and vertical flips, affine transforms, image intensity modulation, and blurring, to increase the training data size and help reduce over-fitting. In addition, we try to find the best overall segmentation performance and generalization capability to completely unseen data through further extension of the base-model with different configurations, using the following:
• encoder modules using ResNet-34 [16], Inception V3 [17] and EfficientNet-B0 [18], initialized with Imagenet weights (imwt) for transfer learning
• re-configuration of input data, or not, to the pre-trained model's format and its pre-processing functions (ppr), for extraction of better features
Fig. 3: Pelvic bone segmentation on TCIA data using: (a) base U-Net with random weight initialization for the ResNet-34 encoder, with no data-augmentation, optimized using BCE loss (least performing); and (b) fine-tuned U-Net with Imagenet weight initialization for the EfficientNet-B0 encoder, with data-augmentation and input re-configuration, optimized using the combined BCE-IoU loss (best performing), overlaid onto the binary ground-truth; yellow - TP; black - TN; green - FP; red - FN.
Fig. 4: Performance in segmentation with histogram specification: (a1-c1) show the respective histograms of the input images; (a2-c2) show the pelvic bone segmentations overlaid on the ground-truth; (b2) and (c2) decisively show the improvement in segmentation from matching the target's histogram to the reference; yellow - TP; black - TN; green - FP; red - FN.
• loss functions like Dice loss (dice), IoU loss (iou) and the combined bce-iou loss, in place of bce loss, for propagating strong gradients for better optimization and learning
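A hedged sketch of such a combined BCE-IoU objective, written per-image in NumPy for clarity (the relative weighting of the two terms is our assumption, since it is not specified here; a training implementation would express the same thing in the framework's tensor ops):

```python
import numpy as np

def bce_iou_loss(pred_probs, gt, eps=1e-7):
    """Combined BCE + soft-IoU loss on predicted probabilities vs. a binary ground truth.

    Illustrative sketch: an unweighted sum of the two terms is assumed.
    """
    p = np.clip(pred_probs, eps, 1.0 - eps)
    # Binary cross-entropy, averaged over pixels
    bce = -np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p))
    # Soft IoU: intersection and union computed on probabilities,
    # so the term stays differentiable for gradient-based training
    inter = np.sum(p * gt)
    union = np.sum(p) + np.sum(gt) - inter
    iou = (inter + eps) / (union + eps)
    return bce + (1.0 - iou)
```

The BCE term supplies dense per-pixel gradients while the 1 − IoU term directly optimizes the overlap measure used for evaluation.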
The detailed comparisons of the different U-Net configurations' segmentation performance on the test-sets, with 95% confidence intervals, are shown in Table 2. The segmentation outputs from the least-performing (base-model) and best-performing (fine-tuned U-Net with Imagenet weight initialization for the EfficientNet-B0 encoder [18], with data-augmentation and input re-configuration, optimized using the combined BCE-IoU loss) DL models are showcased in Fig. 3 (a) & (b). The predicted outputs are overlaid onto the ground-truth and color-coded (yellow - TP; black - TN; green - FP; red - FN) for visualizing the quality of segmentation. The results shown in Fig. 4 (b2) & (c2) illustrate the desired effect on segmentation due to histogram specification. The reduction in the number of pixels labeled as FPs & FNs, and the improvement in the number of TPs in the overlays, decisively show the significance of pre-processing the test-data, which clearly boosts the model's segmentation performance. Furthermore, the comparative results tabulated in the last two columns of Table 2 give strong evidence for the success of the proposed methodology on all the specified model configurations.

On analyzing the data shown in Table 3, the proposed methodology's overall performance on the test-sets surpassed several state-of-the-art techniques that were trained on similarly sized datasets, with the exception of Liu et al. [14], who performed training on approximately 270,000 images. Since data drives any model, the proposed methodology (trained only on 407 images) shows room for further improvement in segmentation given the availability of larger datasets.

Table 2: Performance comparison of different U-Net configurations for pelvic bone segmentation on unseen data from TCIA, VHBD, and H-VHBD, i.e., VHBD after histogram specification.

U-Net Configuration                    TCIA mIoU   TCIA mDice   VHBD mIoU   VHBD mDice   H-VHBD mIoU   H-VHBD mDice
Res34-rnwt-noaug-bce                   … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-bce                     … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-dice                    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
Res34-imwt-aug-bce-iou                 … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
IncepV3-imwt-aug-bce                   … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
IncepV3-ppr-imwt-aug-bce               … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-imwt-aug-bce                    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-ppr-imwt-aug-bce-iou            0.924 ± …   … ± …        … ± …       … ± …        … ± …         … ± …
EffiB0-ppr-imwt-aug-bce-iou (joint)    … ± …       … ± …        … ± …       … ± …        … ± …         … ± …

* Encoder module - Res34, IncepV3, EffiB0 are ResNet-34, Inception Net-V3, EfficientNet-B0, respectively.
* Encoder weights - rnwt and imwt are random weights and Imagenet weights, respectively.
* Augmentation - aug and noaug mean training with and without data-augmentation, respectively.
* Loss - bce, dice, iou are the binary cross-entropy loss, Dice loss, and IoU loss, respectively.
* ppr - configure input to the pre-trained backbone's format.
* Grey background - indicates improvement due to histogram-specification-based pre-processing.
Images from [3, 4], with the data splits shown in rows 1 and 3 of Table 1, are used for training. The best model was trained on this joint data, and its test-data performance is shown in the last row of Table 2. The results showed that training the model on joint data degrades the performance on both datasets. The data imbalance and the varying image tonal distributions play a significant role in influencing segmentation performance. By using the proposed methodology, the model overcomes the data imbalance and generalizes well to unseen datasets, which boosts its overall segmentation performance.
Table 3: Overall performance comparison for pelvic bone segmentation with state-of-the-art techniques.

Methodology       Dataset                 mIoU    mDice
Liu et al. [14]   DS‡ (∼ …)               …       …
Proposed (ours)   TCIA, VHBD (284)        0.919   0.957

‡ DS: KITS19, CERVIX, ABDOMEN, MSD T10, COLONOG, CLINIC; Train:Test ≈ …
5. CONCLUSION
To sum up, in this work we presented a novel methodology for the automated segmentation of pelvic bones from axial CT images. We addressed the unmet need for a superior pelvic bone segmentation methodology for images with varying properties by using histogram specification. This simple yet powerful approach of pre-processing the test-data improved segmentation performance by a significant margin, with the quantitative results confirming its validity. Through our approach, the encoder-decoder configuration overcame a significant hurdle of varying intensity distributions in CT images, which led to superior segmentation quality. Moreover, after validating the results on the publicly available TCIA and VHBD datasets, the proposed methodology has been shown to be highly competitive with respect to existing state-of-the-art techniques.

Through this study, we saw that, although deep learning has pushed the limits for image processing applications, traditional image processing techniques are not necessarily obsolete, and that combining the two approaches can lead to superior performance in segmentation.

6. REFERENCES

[1] Rebecca B. Naumann, Ann M. Dellinger, Eduard Zaloshnja, Bruce A. Lawrence, and Ted R. Miller, "Incidence and total lifetime costs of motor vehicle-related fatal and nonfatal injury by road user type, United States, 2005," Traffic Injury Prevention, vol. 11, no. 4, pp. 353–360, 2010.

[2] Hui Yu, Haijun Wang, Yao Shi, Ke Xu, Xuyao Yu, and Yuzhen Cao, "The segmentation of bones in pelvic CT images based on extraction of key frames," BMC Medical Imaging, vol. 18, no. 1, p. 18, 2018.

[3] Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, et al., "The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository," Journal of Digital Imaging, vol. 26, no. 6, pp. 1045–1057, 2013.

[4] M. J. Ackerman, "The Visible Human Project," Proceedings of the IEEE, vol. 86, no. 3, pp. 504–511, 1998.

[5] Phan T. H. Truc, Sungyoung Lee, and Tae-Seong Kim, "A density distance augmented Chan-Vese active contour for CT bone segmentation," IEEE, 2008, pp. 482–485.

[6] Dagmar Kainmueller, Hans Lamecker, Stefan Zachow, and Hans-Christian Hege, "Coupling deformable models for multi-object segmentation," in International Symposium on Biomedical Simulation. Springer, 2008, pp. 69–78.

[7] Hongzhi Wang, Jung W. Suh, Sandhitsu R. Das, John B. Pluta, Caryne Craige, and Paul A. Yushkevich, "Multi-atlas segmentation with joint label fusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 611–623, 2012.

[8] Hongzhi Wang, Mehdi Moradi, Yaniv Gur, Prasanth Prasanna, and Tanveer Syeda-Mahmood, "A multi-atlas approach to region of interest detection for medical image classification," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 168–176.

[9] Futoshi Yokota, Toshiyuki Okada, Masaki Takao, Nobuhiko Sugano, Yukio Tada, Noriyuki Tomiyama, and Yoshinobu Sato, "Automated CT segmentation of diseased hip using hierarchical and conditional statistical shape models," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2013, pp. 190–197.

[10] Chengwen Chu, Junjie Bai, Xiaodong Wu, and Guoyan Zheng, "MASCG: Multi-atlas segmentation constrained graph method for accurate segmentation of hip CT images," Medical Image Analysis, vol. 26, no. 1, pp. 173–184, 2015.

[11] Guodong Zeng, Xin Yang, Jing Li, Lequan Yu, Pheng-Ann Heng, and Guoyan Zheng, "3D U-Net with multi-level deep supervision: fully automatic segmentation of proximal femur in 3D MR images," in International Workshop on Machine Learning in Medical Imaging. Springer, 2017, pp. 274–282.

[12] Fang Chen, Jia Liu, Zhe Zhao, Mingyu Zhu, and Hongen Liao, "Three-dimensional feature-enhanced network for automatic femur segmentation," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 243–252, 2017.

[13] Yong Chang, Yongfeng Yuan, Changyong Guo, Yadong Wang, Yuanzhi Cheng, and Shinichi Tamura, "Accurate pelvis and femur segmentation in hip CT with a novel patch-based refinement," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 3, pp. 1192–1204, 2018.

[14] Pengbo Liu, Hu Han, Yuanqi Du, Heqin Zhu, Yinhao Li, Feng Gu, Honghu Xiao, Jun Li, Chunpeng Zhao, Li Xiao, et al., "Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models," arXiv preprint arXiv:2012.08721, 2020.

[15] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[17] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.

[18] Mingxing Tan and Quoc V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," arXiv preprint arXiv:1905.11946, 2019.

[19] Richard Szeliski, Computer Vision: Algorithms and Applications, Springer Science & Business Media, 2010.

[20] William R. Crum, Oscar Camara, and Derek L. G. Hill, "Generalized overlap measures for evaluation and validation in medical image analysis," IEEE Trans. Med. Imag., vol. 25, no. 11, pp. 1451–1461, 2006.

[21] Herng-Hua Chang, Audrey H. Zhuang, Daniel J. Valentino, and Woei-Chyn Chu, "Performance measure characterization for evaluating neuroimage segmentation algorithms," NeuroImage, vol. 47, no. 1, pp. 122–135, 2009.

[22] Pavel Yakubovskiy, "Segmentation models," https://segmentation-models.readthedocs.io/en/latest/, 2019.

[23] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition.