Automated femur segmentation from computed tomography images using a deep neural network
P.A. Bjornsson, B. Helgason, H. Palsson, S. Sigurdsson, V. Gudnason, L.M. Ellingsen
P.A. Bjornsson∗a, B. Helgason b, H. Palsson c, S. Sigurdsson d, V. Gudnason d,e, and L.M. Ellingsen a,f
a Dept. of Electrical and Computer Engineering, The University of Iceland, Reykjavik, Iceland
b Institute for Biomechanics, ETH Zürich, Zürich, Switzerland
c Dept. of Industrial Engineering, Mechanical Engineering, and Computer Science, The University of Iceland, Reykjavik, Iceland
d The Icelandic Heart Association, Kopavogur, Iceland
e Dept. of Medicine, The University of Iceland, Reykjavik, Iceland
f Dept. of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
ABSTRACT
Osteoporosis is a common bone disease that occurs when the creation of new bone does not keep up with the loss of old bone, resulting in increased fracture risk. Adults over the age of 50 are especially at risk and see their quality of life diminished because of limited mobility, which can lead to isolation and depression. To address this clinical challenge, we are developing a robust screening method capable of identifying individuals predisposed to hip fracture. The method uses finite element analysis and relies on segmented computed tomography (CT) images of the hip. Presently, the segmentation of the proximal femur requires manual input, which is a tedious task, prone to human error, and severely limits the practicality of the method in a clinical context. Here we present a novel approach for segmenting the proximal femur that uses a deep convolutional neural network to produce accurate, automated, robust, and fast segmentations of the femur from CT scans. The network architecture is based on the renowned u-net, which consists of a downsampling path to extract increasingly complex features of the input patch and an upsampling path to convert the acquired low-resolution image into a high-resolution one. Skip connections allow us to recover critical spatial information lost during downsampling. The model was trained on 30 manually segmented CT images and was evaluated on 200 ground truth manual segmentations. Our method delivers a mean Dice similarity coefficient (DSC) and 95th percentile Hausdorff distance (HD95) of 0.990 and 0.981 mm, respectively.
Keywords: Computed tomography, Femur, Segmentation, Convolutional neural networks, Osteoporosis.
1. INTRODUCTION
A consequence of ageing societies is a higher prevalence of chronic diseases amongst older populations. In the coming years, this will lead to shortages of qualified health care practitioners and mounting health care costs as a result of the added strain placed by these longevous populations. The current status quo of health care systems appears unsustainable and must be restructured from face-to-face care to a more decentralized, home-based care that places emphasis on prevention rather than treatment. One disease that disproportionately affects those over the age of 50 is osteoporosis, a bone disease that occurs when the body loses too much bone, makes too little bone, or both, resulting in reduced bone mass, increased bone fragility, and heightened fracture risk with age. Hip fractures are associated with some of the most dire socioeconomic consequences: those who incur a fracture typically experience a steep decline in physical, mental, and emotional function; in 50-55% of cases, individuals are left with residual walking disability, and in 15-30% of cases, these individuals must be remanded to institutional care. What is most startling is that 11-23% of individuals will be deceased six months after incurring the fracture, increasing to 22-29% one year after the incident.

Subject-specific, image-based finite element (FE) analysis of bone is a popular approach in biomechanics that has gained considerable traction in recent years. By exploiting the gray-scale features of the CT images in concert with the respective segmentation mask, a screening tool for hip fracture risk prediction can be brought to fruition. Such a clinical screening tool could prevent potentially devastating fractures for patients and dramatically relieve the economic burden of hip fractures on our healthcare systems. However, these methods have yet to be implemented for clinical use, since the segmentation of the proximal femur currently requires manual input, which severely limits their use in a clinical context. A number of methods have been implemented to surmount the problem of bone segmentation, ranging from thresholding techniques to graph-cut methods.

Further author information (send correspondence to P.A.B.): E-mail: [email protected]; [email protected]
Current segmentation methods often require a “user-in-the-loop” paradigm to manually correct segmentations before they produce acceptable masks for FE modeling, and/or their processing time is too long for clinical use. This lack of robustness is costly in terms of time and the need for a highly trained specialist to manually correct the segmentations. Consequently, these methods cannot process larger cohorts to the same degree as fully automated ones. In recent years, the application of deep neural networks (DNNs) to image segmentation has gained considerable attention. The use of deep neural networks as a viable option for biomedical image segmentation is a direct consequence of the u-net architecture proposed by Ronneberger et al. In this impactful paper, they demonstrated how the u-net architecture can produce fast and precise segmentations without relying on a large training set. However, ground truth segmentation masks are typically authored by manual delineation, a taxing and time-consuming process that is ideally only carried out for a small training data set. Despite the need for ground truth labels, deep neural networks have become the state-of-the-art approach in medical imaging.13, 14 A segmentation prediction on novel data can be generated autonomously (i.e., without human intervention) in a matter of seconds, which is often an order of magnitude faster than preceding methods that do not make use of DNNs. Zhao et al. proposed an automated, patch-based, three-dimensional v-net architecture; a similar study was conducted by Chen et al. Here, we use data from the AGES-Reykjavik study (AGES-RS), a unique longitudinal study of the Icelandic elderly, to demonstrate the accuracy and robustness of our proposed method.
2. METHODS

2.1 Preprocessing and data augmentation
The Icelandic Heart Association (IHA) provided us with CT images from the AGES-RS: a cohort of both men and women born between 1907 and 1935, monitored in Iceland by the IHA since 1967. This unique database of high-quality CT images contains roughly 4800 density-calibrated CT scans of the proximal femur at baseline and 3300 scans of the same individuals acquired at a five-year follow-up. The in-plane resolution of each scan is 512 × 512 voxels with 88-178 slices. The preprocessing of the data consisted of three steps. First, each of the 30 CT images used for training was split in half and the left sides were mirrored to the right, resulting in 60 images of the right-side hip/upper leg. Second, min-max normalization was used to linearly map the intensity values from Hounsfield units to the range [0, 1]. Third, Otsu thresholding was implemented to identify the background voxels and allow for automatic cropping of the 3D images to decrease their size and thereby speed up the execution of the neural network; since each extra voxel is computationally costly, the images were all cropped to eliminate unnecessary background voxels.

Figure 1. A flowchart showing the workflow of our proposed method.

The proposed fully automated proximal femur segmentation pipeline is illustrated in Figure 1. During training, a training/validation set of 30 CT images (i.e., 60 proximal femurs) from the AGES cohort was fed into a neural network that exploited on-the-fly data augmentation on randomly cropped patches from a randomly selected batch of two images. On-the-fly data augmentation eliminates the need for excessive storage of augmented images by performing the augmentation prior to each optimization iteration. Data augmentation is imperative for maximizing the efficiency of a training set, since obtaining manual (i.e., ground truth) segmentations is an arduous, time-consuming process that does not cater to large cohorts. Hence, data augmentation is applied to teach the model invariance and robustness properties when only a limited data set is available. In addition, artificially expanding the training set with data augmentation has a regularizing effect, which avoids overfitting the model to a small subset of data. Deformations that capture variation within the data set can be simulated efficiently and assist the model in learning invariance between images.
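The normalization and cropping steps can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' pipeline: the function names and the toy volume are our own, and we assume an Otsu threshold (the histogram method of ref. [19], cited in this paper) is what identifies the background voxels.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Histogram-based Otsu threshold: maximize between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                    # voxels at or below each bin
    w1 = w0[-1] - w0                        # voxels above each bin
    sum0 = np.cumsum(hist * centers)
    sum1 = sum0[-1] - sum0
    valid = (w0 > 0) & (w1 > 0)
    mu0 = np.where(valid, sum0 / np.where(w0 > 0, w0, 1), 0.0)
    mu1 = np.where(valid, sum1 / np.where(w1 > 0, w1, 1), 0.0)
    var_between = np.where(valid, w0 * w1 * (mu0 - mu1) ** 2, 0.0)
    return centers[np.argmax(var_between)]

def preprocess(volume_hu):
    """Min-max normalize a CT volume to [0, 1], then crop away background."""
    v = volume_hu.astype(np.float32)
    v = (v - v.min()) / (v.max() - v.min())          # min-max normalization
    fg = v > otsu_threshold(v)                       # background/foreground split
    coords = np.argwhere(fg)                         # bounding box of foreground
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    return v[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]  # cropped, normalized volume
```

The left/right mirroring step would simply apply `np.flip` along the left-right axis of each left-side half before this function is called.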
Here we applied elastic deformations as well as linear spatial and intensity transformations (i.e., scaling, rotation, and brightness) to simulate the variability between patients' scans. Data augmentation, with random transformation parameters drawn from pre-defined ranges, was implemented on-the-fly for each image before it was forwarded into the neural network. Each transformation had a 35% likelihood of being applied to the image at hand, allowing the model to encounter a diverse set of images, thereby decreasing redundancy. The scaling, rotation, and brightness transforms were each defined over a pre-defined range, and the elastic deformation was governed by a scaling factor α, which controls the deformation intensity, and a smoothing factor σ, which controls the smoothing of the displacement field. On-the-fly data augmentation, in concert with parameter sharing and batch normalization, rendered the use of explicit regularization techniques unnecessary and even counterproductive.

2.2 Network architecture and training

As previously stated, the u-net architecture addresses two main issues: namely, the ability to train a model from a very small data set and the ability to produce precise segmentations despite the former. A schematic of the proposed architecture is shown in Figure 2. The two paths that comprise the u-net structure are the contracting (downsampling) path and the expanding (upsampling) path. The former is the encoder and captures context with the use of stacked convolutional units and max pooling layers, while the latter is the decoder and allows for precise localization with the use of transposed convolutions. In the final layer of the network, a 1 × 1 × 1 convolution maps each voxel's feature vector to a class prediction.

Figure 2. A schematic of our proposed 3D u-net model. The bold numbers at the corners represent the number of feature maps (channels) per layer.
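The on-the-fly augmentation just described can be sketched as below. This is our own NumPy/SciPy illustration, not the authors' code: only the 35% application probability and the roles of α and σ come from the text, while the brightness and rotation ranges and the upper bounds on α and σ are placeholder assumptions (the scaling transform is omitted for brevity).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def augment(patch, rng, p=0.35):
    """Apply each transform independently with probability p (35% in the text).

    Parameter ranges below are illustrative placeholders, not the
    paper's published values.
    """
    out = patch.astype(np.float32)
    if rng.random() < p:  # brightness: global intensity scaling
        out = out * rng.uniform(0.9, 1.1)
    if rng.random() < p:  # rotation within the axial plane
        out = rotate(out, rng.uniform(-10.0, 10.0), axes=(0, 1),
                     reshape=False, order=1, mode="nearest")
    if rng.random() < p:  # elastic deformation (alpha: intensity, sigma: smoothing)
        alpha, sigma = rng.uniform(0.0, 10.0), rng.uniform(9.0, 13.0)
        # Smooth a random displacement field, then resample the patch along it.
        disp = [gaussian_filter(rng.standard_normal(out.shape), sigma) * alpha
                for _ in range(out.ndim)]
        grid = np.meshgrid(*[np.arange(s) for s in out.shape], indexing="ij")
        out = map_coordinates(out, [g + d for g, d in zip(grid, disp)],
                              order=1, mode="nearest")
    return out
```

Because the transforms are drawn fresh for every patch, no augmented copies ever need to be stored on disk.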
For the sake of simplicity, feature maps are depicted in two dimensions.

A patch-based model was adopted in consideration of memory constraints; each volume patch was 128 × 128 × 128 voxels with an overlap of 64 × 64 × 64 voxels. This patch size captures the entire femoral head, which is the most dynamic section of the proximal femur, and downsamples nicely, meaning that after each max pooling operation we are left with integer values for the patch dimensions. During training, volume patches were randomly selected from the image volume and subsequently fed into the neural network. As the image patch is downsampled in the contracting path, the number of channels increases, which allows for the extraction of more complex features. Skip connections are used to pass features from the contracting path to the expanding path in order to recover spatial information lost during downsampling. At each step in the decoder, a skip connection allows for the concatenation of the output of the transposed convolutional layer with the corresponding feature map from the encoder. A convolutional layer is subsequently able to produce a more precise output based on this information. The Dice loss function, defined as 1 − DSC, quantified the performance of the model at each iteration of each epoch so that the gradient descent algorithm could adjust the parameters during backpropagation. The validation set allowed us to gain insight into the performance of the model on unseen data so that hyperparameters could be adjusted accordingly.

Our model was trained on images from the AGES data set, comprising 260 proximal femurs with corresponding manually delineated segmentations, generated with a semi-automated delineation protocol, that served as ground truth annotations. A total of 54 images were used for training, 6 for validation, and the remaining 200 for evaluation. The Adam optimizer was used to adjust the model parameters. Our model was trained using dual Nvidia GeForce GTX 1080 Ti GPUs for 300 epochs, which took roughly 11.5 hours.
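The Dice loss used during training, defined as 1 − DSC, is typically written as a soft (differentiable) function of the network's voxelwise probabilities. A minimal sketch follows; the function name and the smoothing term `eps` are our own conventions, not taken from the paper.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss = 1 - DSC.

    `pred` holds predicted foreground probabilities, `target` the binary
    ground truth mask; `eps` guards against division by zero on empty masks.
    """
    pred = pred.astype(np.float64).ravel()
    target = target.astype(np.float64).ravel()
    intersection = (pred * target).sum()
    dsc = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return 1.0 - dsc
```

A perfect prediction gives a loss near 0, while a completely disjoint prediction gives a loss near 1, so gradient descent drives the network toward higher overlap with the manual mask.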
3. EXPERIMENTS AND RESULTS
We evaluated the performance of our model on the aforementioned AGES data set by processing 200 proximal femurs that had been manually segmented (the current gold standard). We used the DSC and HD95 to evaluate the accuracy and robustness of our method. The former metric measures the overlap between the automatically generated segmentation mask and the ground truth manual mask. The HD95 metric, on the other hand, quantifies the largest segmentation error between two images and provides valuable insight into the performance of the model. The mean DSC on this evaluation set was 0.990 ± 0.002 and the mean HD95 was 0.981 ± 0.040 mm (see Fig. 3), demonstrating both high accuracy and robustness of the proposed method. The time for each segmentation prediction was 12-15 seconds, making the method viable for use in both large studies and clinical settings. Figure 4 displays a visual comparison between the output of our proposed method and the ground truth segmentation. Despite the thin and diffuse bone structure boundaries between the femoral head and the acetabulum (i.e., the bottom two rows of Figure 4), our method accurately labels the voxels that belong to the foreground class.

Figure 3. The box plots show the mean DSC (left) and the mean HD95 (right) for 200 proximal femurs.
Figure 4. A visual comparison of our segmentation results and the manual segmentation: (a) the original CT scans, (b) the segmentation predictions using the proposed method, (c) the ground truth segmentations.
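The two evaluation metrics can be computed from a pair of binary masks as sketched below. This is our own NumPy/SciPy illustration of the standard definitions, not the authors' evaluation code; here HD95 is taken as the 95th percentile of the symmetric surface-to-surface distances, with physical voxel spacing passed in millimeters.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance between mask surfaces."""
    a, b = a.astype(bool), b.astype(bool)
    # Surface voxels: each mask minus its erosion.
    surf_a = a & ~binary_erosion(a)
    surf_b = b & ~binary_erosion(b)
    # Euclidean distance from every voxel to the nearest surface of the other mask.
    dist_to_b = distance_transform_edt(~surf_b, sampling=spacing)
    dist_to_a = distance_transform_edt(~surf_a, sampling=spacing)
    d_ab = dist_to_b[surf_a]   # A-surface -> B-surface distances
    d_ba = dist_to_a[surf_b]   # B-surface -> A-surface distances
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)
```

Using the 95th percentile rather than the maximum makes the metric robust to a few outlier surface voxels, which is why HD95 is preferred over the plain Hausdorff distance for segmentation evaluation.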
4. DISCUSSION AND CONCLUSION
This paper introduces a fully automated, accurate, fast, and robust method for proximal femur segmentation that produces segmentation results with sub-millimeter accuracy. Our model addresses the biggest hurdles that have impeded prior methods. First, it is fully automated and does not require a trained operator to make ad hoc corrections to unacceptable segmentation predictions. Second, the time our model takes to output a segmentation prediction (around 12-15 seconds per prediction) is well within reasonable bounds for clinical viability, and the model can additionally be used to process large cohorts. The mean DSC was 0.990 ± 0.002 and the mean HD95 was 0.981 ± 0.040 mm when evaluated on 200 manually segmented femurs. The proposed method is superior to preceding methods in terms of previously reported DSC and HD95 metrics and does not require any manual interaction. In the near future, we will conduct a more extensive evaluation on a larger cohort and, in turn, integrate the method into our existing FE pipeline, bringing it one step closer to becoming a clinically viable option for screening at-risk patients for hip fracture susceptibility.
5. ACKNOWLEDGEMENTS
This work was supported by the RANNIS Icelandic Student Innovation Fund.
REFERENCES
[1] Ekström, W., Miedel, R., Ponzer, S., Hedström, M., Samnegård, E., and Tidermark, J., "Quality of life after a stable trochanteric fracture–a prospective cohort study on 148 patients," J Orthop Trauma, 39–44 (2009).
[2] Carpenter, R. and Carter, D., "The mechanobiological effects of periosteal surface loads," Biomech Model Mechanobiol., 227–242 (2008).
[3] Magaziner, J., Fredman, L., Hawkes, W., Hebel, J., Zimmerman, S., Orwig, D., and Wehren, L., "Changes in functional status attributable to hip fracture: A comparison of hip fracture patients to community-dwelling aged," Am. J. Epidemiol., 1023–1031 (2003).
[4] Nevalainen, T., Hiltunen, L., and Jalovaara, P., "Functional ability after hip fracture among patients home-dwelling at the time of fracture," Cent Eur J Public Health, 211–216 (2004).
[5] Osnes, E., Lofthus, C., Meyer, H., Falch, J., Nordsletten, L., Cappelen, I., and Kristiansen, I., "Consequences of hip fracture on activities of daily life and residential needs," Osteoporos Int., 567–574 (2004).
[6] Haleem, S., Lutchman, L., Mayahi, R., Grice, J., and Parker, M., "Mortality following hip fracture: Trends and geographical variations over the last 40 years," Injury, 1157–1163 (2008).
[7] Fleps, I., Guy, P., Ferguson, S., Cripton, P., and Helgason, B., "Explicit finite element models accurately predict subject-specific and velocity-dependent kinetics of sideways fall impact," J Bone Miner Res., 1837–1850 (2019).
[8] Kim, J., Nam, J., and Jang, I., "Fully automated segmentation of a hip joint using the patient-specific optimal thresholding and watershed algorithm," Comput Methods Programs Biomed., 161–171 (2018).
[9] Chang, Y., Yuan, Y., Guo, C., Wang, Y., Cheng, Y., and Tamura, S., "Accurate pelvis and femur segmentation in hip CT with a novel patch-based refinement," IEEE J Biomed Health Inform., 1192–1204 (2020).
[10] Younes, L., Nakajima, Y., and Saito, T., "Fully automatic segmentation of the femur from 3D-CT images using primitive shape recognition and statistical shape models," Int J Comput Assist Radiol Surg., 189–196 (2019).
[11] Pauchard, Y., Fitze, T., Browarnik, D., Eskandari, A., Pauchard, I., Enns-Bray, W., Pálsson, H., Sigurdsson, S., Ferguson, S. J., Harris, T. B., Gudnason, V., and Helgason, B., "Interactive graph-cut segmentation for fast creation of finite element models from clinical CT data for hip fracture prediction," Comput Methods Biomech Biomed Engin., 342 (2016).
[12] Ronneberger, O., Fischer, P., and Brox, T., "U-net: Convolutional networks for biomedical image segmentation," in [Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015], Navab, N., Hornegger, J., Wells, W. M., and Frangi, A. F., eds., 234–241, Springer International Publishing, Cham (2015).
[13] Shao, M., Han, S., Carass, A., Li, X., Blitz, A., Shin, J., Prince, J., and Ellingsen, L. M., "Brain ventricle parcellation using a deep neural network: Application to patients with ventriculomegaly," NeuroImage: Clinical, 101871 (2019).
[14] Huo, Y., Terry, J., Wang, J., Nair, S., Lasko, T., Freedman, B., Carr, J., and Landman, B., "Fully automatic liver attenuation estimation combining CNN segmentation and morphological operations," Med. Phys., 3508–3519 (2019).
[15] Zhao, C., Keyak, J., Tang, J., Kaneko, T. S., Khosla, S., Amin, S., Atkinson, E., Zhao, L., Serou, M., Zhang, C., Shen, H., Deng, H., and Zhou, W., "A deep learning-based method for automatic segmentation of proximal femur from quantitative computed tomography images," ArXiv.
[17] Chen, F., Liu, J., Zhao, Z., Zhu, M., and Liao, H., "Three-dimensional feature-enhanced network for automatic femur segmentation," IEEE J Biomed Health Inform., 243–252 (2019).
[18] Harris, T., Launer, L., Eiriksdottir, G., Kjartansson, O., Jonsson, P., Sigurdsson, G., Thorgeirsson, G., Aspelund, T., Garcia, M., Cotch, M., Hoffman, H., and Gudnason, V., "Age, gene/environment susceptibility – Reykjavik study: Multidisciplinary applied phenomics," Am J Epidemiol., 1076–1087 (2007).
[19] Otsu, N., "A threshold selection method from gray-level histograms," IEEE Trans. Syst., 62–66 (1979).
[20] LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., and Jackel, L., "Handwritten digit recognition with a back-propagation network," in [Advances in Neural Information Processing Systems], Touretzky, D., ed., 396–404, Morgan-Kaufmann (1990).
[21] Ioffe, S. and Szegedy, C., "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR abs/1502.03167 (2015).
[22] Müller, D. and Kramer, F., "MIScnn: A framework for medical image segmentation with convolutional neural networks and deep learning," ArXiv abs/1910.09308.