bioRxiv | 2021

Deep learning-based segmentation of high-resolution computed tomography image data outperforms commonly used automatic bone segmentation methods


Abstract


Segmenting bone from background is required to quantify bone architecture in computed tomography (CT) image data. A deep learning approach using convolutional neural networks (CNNs) is a promising alternative method for automatic segmentation. The study objectives were to evaluate the performance of CNNs in automatic segmentation of human vertebral body (micro-CT) and femoral neck (nano-CT) data and to investigate how well CNNs segment data across scanners. Scans of human L1 vertebral bodies (micro-CT [North Star Imaging], n=28, 53 μm³) and femoral necks (nano-CT [GE], n=28, 27 μm³) were used for evaluation. Six slices were selected from each scan and manually segmented to create ground truth masks (Dragonfly 4.0, ORS). Two-dimensional U-Net CNNs were trained in Dragonfly 4.0 with images of [FN] femoral necks only, [VB] vertebral bodies only, and [FN + VB] the combined CT data. Global (i.e., Otsu and Yen) and local (i.e., Otsu [r=100]) thresholding methods were applied to each dataset. Segmentation performance was evaluated using the Dice coefficient, a similarity metric of overlap. Kruskal-Wallis and Tukey-Kramer post-hoc tests were used to test for significant differences in the accuracy of the segmentation methods. On femoral neck image data, the FN U-Net had significantly higher Dice coefficients (i.e., better performance) than the global (Otsu: p=0.001; Yen: p=0.001) and local (Otsu [r=100]: p=0.001) thresholding methods and the VB U-Net (p=0.001), but its performance did not differ significantly from that of the FN + VB U-Net (p=0.783). On vertebral body image data, the VB U-Net had significantly higher Dice coefficients than the global and local Otsu methods (p=0.001 for both) and the FN U-Net (p=0.001), but not than the Yen threshold (p=0.462) or the FN + VB U-Net (p=0.783). The results demonstrate that the U-Net architecture outperforms common thresholding methods.
Further, a network trained with bone data from a different system (i.e., different image acquisition parameters and voxel size) and a different anatomical site can perform well on unseen data. Finally, a network trained with combined datasets performed well on both datasets, indicating that a network can feasibly be trained with multiple datasets and perform well on varied image data.
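
To make the evaluation metric concrete, the following is a minimal sketch of the Dice coefficient together with a plain global Otsu threshold, written in pure Python. It is not the authors' implementation (the study used Dragonfly 4.0). The function names `dice_coefficient` and `otsu_threshold`, the 8-bit grey-level range, the rule that pixels above the threshold count as bone, and the toy image are all assumptions made for illustration. The Yen and local Otsu (r=100) methods are not shown.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    overlap = pred_sum = truth_sum = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            overlap += p and t
            pred_sum += p
            truth_sum += t
    if pred_sum + truth_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / (pred_sum + truth_sum)


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Global Otsu: grey level that maximises the between-class variance."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of grey levels of those pixels
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Hypothetical 2x4 greyscale slice and its manual ground truth mask.
image = [[10, 20, 200, 210],
         [15, 190, 205, 220]]
truth = [[0, 0, 1, 1],
         [0, 1, 1, 1]]

t = otsu_threshold([v for row in image for v in row])
pred = [[int(v > t) for v in row] for row in image]  # assumed: bone is brighter
score = dice_coefficient(pred, truth)
```

A Dice coefficient of 1 means the predicted and ground truth masks overlap exactly, and 0 means they share no bone pixels. This is why higher values in the comparisons above indicate better segmentation.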

DOI 10.1101/2021.07.27.453890
Language English
Journal bioRxiv

Full Text