Grading Loss: A Fracture Grade-based Metric Loss for Vertebral Fracture Detection
Malek Husseini*, Anjany Sekuboyina*, Maximilian Loeffler, Fernando Navarro, Bjoern H. Menze, and Jan S. Kirschke
Department of Computer Science, Technical University of Munich, Germany
Klinikum rechts der Isar, Technical University of Munich, Germany
[email protected]
Abstract.
Osteoporotic vertebral fractures have a severe impact on patients’ overall well-being but are severely under-diagnosed. These fractures present themselves at various levels of severity, measured using Genant’s grading scale. Insufficient annotated datasets, severe data imbalance, and subtle differences in appearance between fractured and healthy vertebrae cause naive classification approaches to have poor discriminatory performance. Addressing this, we propose a representation learning-inspired approach for automated vertebral fracture detection, aimed at learning latent representations that are efficient for fracture detection. Building on state-of-the-art metric losses, we present a novel Grading Loss for learning representations that respect Genant’s fracture grading scheme. On a publicly available spine dataset, the proposed loss function achieves a fracture detection F1 score of […].
Keywords: Fracture Detection, Metric Loss, Representation Learning
Vertebral fractures are severely under-diagnosed. According to a 2013 study, 84% of incidental vertebral fractures were not reported in CT [2]. This is due either to the fractures being asymptomatic or to the symptoms being wrongly attributed to other factors. Osteoporotic vertebral fractures have critical consequences such as disability and increased mortality: they cause pain and kyphosis in the short term and are associated with an 8-fold higher mortality in the long term [3]. Accentuating this is their high prevalence in the older adult population (40% by the age of 80 years), which makes a missed diagnosis critical. Therefore, there is a need for automated and reproducible detection of vertebral fractures.
* Shared first authors
Fig. 1: Illustrating fracture grades: (a) t-SNE visualisation of latent representations learnt by formulating fracture detection as a simple classification problem, resulting in poor separability. (b) An example selection of the three classes of vertebrae studied in this work: healthy, grade-2 fracture, and grade-3 fracture.
Vertebral Fracture Detection
Automatic detection of vertebral fractures is relatively unexplored. Valentinitsch et al. [7] propose extracting texture-based features, such as histograms of gradients or local binary patterns, from the trabecular bone of a segmented vertebra and classifying them using a random forest. From a deep learning perspective, Bar et al. [8] employ a convolutional neural network (CNN) to classify sagittal patches from the vertebral column and aggregate the classification across patches using a recurrent neural network. Along similar lines, Tomita et al. [9] process thoraco-lumbar slices with a CNN and aggregate across slices using a long short-term memory (LSTM) network. However, unlike [8], the latter does not need any anatomy to be segmented to start the processing. Note that these approaches are ad hoc implementations of CNNs working on large data samples and provide minimal insight into the workings of the network. Recently, Nicolaes et al. [10] proposed a fully 3D approach for detecting vertebral fractures based on a voxel-level prediction regime, which also provides a weak localization of the fracture. However, its patch-based, per-voxel prediction limits its real-time applicability.

We argue that formulating vertebral fracture detection as a naive classification problem is sub-optimal, more so in limited and imbalanced data regimes. Fig. 1a illustrates the t-SNE representations of the latent features of one of the baselines in this work, viz. a convolutional neural network trained to detect vertebral fractures with a simple cross-entropy loss. Observe the resulting poor class separation between healthy and fractured vertebrae. We attribute this to the wide variation in vertebral shapes: a healthy lumbar vertebra differs more from a healthy upper-thoracic vertebra than it does from a fractured lumbar vertebra. Moreover, there exists a ‘gradation’ among vertebral fractures, further obfuscating a clear shape-based separation (cf. Fig. 1b).
Genant’s Vertebral Fracture Grading
The current gold standard in grading vertebral fractures is a semi-quantitative method developed by Genant et al. [1], according to which fractures are categorized into three grades (Grades 1, 2, and 3; cf. Fig. 1b) based on the height loss a vertebra undergoes compared to its healthy counterpart. A healthy vertebra is considered to be Grade 0. Grades 2 and 3 have proven clinical consequences, whereas this is unknown for Grade 1; moreover, the small height reduction of Grade 1 results in high inter-rater uncertainty. Grade 1 fractures are therefore excluded from this study.
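To make the grading scale concrete, the sketch below maps a fractional vertebral height loss to a Genant grade. It assumes the thresholds commonly cited for Genant's method (roughly 20 to 25% height loss for Grade 1, 25 to 40% for Grade 2, and more than 40% for Grade 3); these numbers are not stated in this paper, and the function names `height_loss` and `genant_grade` are hypothetical.

```python
def height_loss(measured_height: float, expected_height: float) -> float:
    """Fractional height reduction of a vertebra relative to its expected healthy height."""
    if expected_height <= 0:
        raise ValueError("expected_height must be positive")
    return max(0.0, 1.0 - measured_height / expected_height)


def genant_grade(loss: float) -> int:
    """Map a fractional height loss to a Genant grade (0 = healthy).

    Assumed thresholds (commonly cited, not taken from this paper):
    < 20% -> 0, 20-25% -> 1, 25-40% -> 2, > 40% -> 3.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be a fraction in [0, 1]")
    if loss < 0.20:
        return 0
    if loss < 0.25:
        return 1
    if loss <= 0.40:
        return 2
    return 3


# Grade-1 cases are excluded in this study, so only grades {0, 2, 3} are used.
STUDIED_GRADES = {0, 2, 3}

if __name__ == "__main__":
    loss = height_loss(measured_height=18.0, expected_height=25.0)
    grade = genant_grade(loss)
    print(f"loss={loss:.2f} grade={grade} kept={grade in STUDIED_GRADES}")
```

A 28% height loss, for instance, falls into Grade 2 under these assumed thresholds, and any vertebra mapped to Grade 1 would be dropped before training.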
Representation Learning
In this work, we aim to incorporate the gradual shape variations arising from the fracture grades into the training process of a classifier by explicitly adjusting the latent space. Deep learning models are believed to generate useful representations as a byproduct of the task they are trying to solve. However, as Fig. 1a shows, this is not the case in low-data regimes.
Representation learning, or metric learning, can be used to learn efficient latent representations in such scenarios. Siamese networks [15] using a contrastive loss and FaceNet [14] with its triplet loss are examples of standard metric learning frameworks, wherein representations of similar entities are clustered together while those of dissimilar ones are pushed apart.
Contributions
In this work, we attempt to solve vertebral fracture detection as a two-class (healthy vs. fractured) classification problem.
– Towards accurate classification, we pre-train the neural network using fracture-grade-based representation learning. For this, we propose a novel loss function termed grading loss, which encourages the learnt representations to respect the gradation in the appearance of fractures.
– Accounting for the dependence of vertebral shapes on vertebral labels, we also propose a spine-region-based pre-conditioning module.
– We validate the proposed fracture detection regime on the publicly available VerSe dataset, obtaining a classification F1 score of […].

Given a collection of 2D vertebral patches, the objective of our work is to classify them into two classes, fractured and healthy. As the shape of a vertebra depends on its label, and the amount of shape variation caused by a fracture depends on the fracture grade, we hypothesize that preceding the classification stage with pre-training that depends on the fracture grade and the vertebral label improves class separation and, consequently, classification performance. We model these pre-training stages with inspiration from the field of representation learning.
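For reference, the two standard metric losses mentioned above, the contrastive loss used with Siamese networks [15] and the triplet loss of FaceNet [14], are the baselines the grading loss is later compared against. The sketch below computes them on precomputed embeddings in plain Python. It is a minimal illustration assuming Euclidean distances and a single margin; the function names and the 1/2 scaling of the contrastive loss are choices made here, and FaceNet itself uses squared distances in its triplet loss.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.dist(a, b)


def contrastive_loss(a: Vector, b: Vector, same_class: bool, margin: float = 1.0) -> float:
    """Pairwise contrastive loss in the style of Hadsell et al.

    Same-class pairs are penalised by their squared distance (pulled together);
    different-class pairs are penalised only while closer than `margin` (pushed apart).
    """
    d = euclidean(a, b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Triplet hinge loss on plain (non-squared) distances: the positive must be
    closer to the anchor than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    healthy_a, healthy_b, fractured = [0.0, 0.0], [0.3, 0.4], [2.0, 0.0]
    print(contrastive_loss(healthy_a, healthy_b, same_class=True))   # pulls the pair together
    print(contrastive_loss(healthy_a, fractured, same_class=False))  # pair already beyond margin
    print(triplet_loss(healthy_a, healthy_b, fractured))
```

Neither loss encodes that a grade-2 fracture should sit between healthy and grade-3 vertebrae; both only see ‘same’ versus ‘different’, which is the gap the grading loss targets.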
Metric learning aims to learn better data representations by working on a notion of ‘distance’ in the latent space. It aims to cluster similar objects closer (by
Fig. 2: Left: the arrangement of the three grades (g0, g2, g3) and the positive anchor g_n during initialization. The positive anchor g_n works on clustering similar classes together in the latent space.

Although grade-2 and grade-3 vertebrae are both fractured, the former is more similar to a healthy vertebra than the latter. Incorporating such a ‘ranking’ criterion into the metric learning framework, we propose the grading loss.

Assume a 2D vertebral patch $x \in X$ is mapped to a representation $f(x)$ by a neural network $f$. The Euclidean distance between the representations of two examples $x_i$ and $x_j$ is denoted by $d(x_i, x_j) = \lVert f(x_i) - f(x_j) \rVert$. We design the grading loss as a quadruplet loss [11] working with quadruplets $\{x^i_{g_0}, x^j_{g_2}, x^k_{g_3}, x^l_{g_n}\}$, where $x^i_{g_0}$ denotes a healthy vertebra sample, $x^j_{g_2}$ and $x^k_{g_3}$ denote samples of grade 2 and grade 3, respectively, and $n \in \{0, 2, 3\}$ is chosen at random, so that $x^l_{g_n}$ can be either a healthy or a fractured example. Observe that $x^i$, $x^j$, and $x^k$ form a static triplet, i.e. they are always sampled from fixed sub-classes of fracture grades. We incorporate a grading in the embedding space as follows: a grade-3 fracture is farther away from a healthy (grade-0) vertebra than a grade-2 fracture is, and a grade-2 fracture is closer to grade 3 than it is to healthy. We can formulate these requirements as:

$$d(x^i_{g_0}, x^j_{g_2}) + \alpha < d(x^i_{g_0}, x^k_{g_3}) \qquad (1)$$
$$d(x^j_{g_2}, x^k_{g_3}) + \beta < d(x^i_{g_0}, x^j_{g_2}) \qquad (2)$$

where $\alpha$ and $\beta$ are distance thresholds. Note that Eq. 1 uses $g_0$ as its anchor, whereas Eq. 2 uses $g_2$. Owing to the triangle inequality of distances, the restrictions on the distances from […]

$$L_1 = \max\big(0,\ d(x^i_{g_0}, x^j_{g_2}) - d(x^i_{g_0}, x^k_{g_3}) + \alpha\big) \qquad (3)$$
$$L_2 = \max\big(0,\ d(x^j_{g_2}, x^k_{g_3}) - d(x^i_{g_0}, x^j_{g_2}) + \beta\big) \qquad (4)$$

Eqs. 3 and 4 are structurally triplet losses. However, they do not act on similarities; instead, they form separating objectives between the fracture grades. Finally, a third, clustering objective is incorporated by virtue of $x^l$ in the quadruplet. Recall that $x^l_{g_n}$ can belong to any of the three sub-classes. Based on the value of $n$ in the sampled quadruplet, the clustering objective pulls $x^l_{g_n}$ closer to its match in the static triplet. Fig. 2 illustrates the grading loss for a particular grade of $x^l_{g_n}$, which we refer to as the positive anchor. The clustering objective can be represented as:

$$L_3 = \max\big(0,\ d(x^{\{i,j,k\}}_{g_n}, x^l_{g_n}) - \gamma\big) \qquad (5)$$

where $n \in \{0, 2, 3\}$ and $x^{\{i,j,k\}}_{g_n}$ is the static-triplet sample of grade $g_n$, so that the two arguments form a pair of samples from the same class. Assembling the loss terms results in the proposed grading loss, $L_G = L_1 + L_2 + L_3$. In this work, the distance thresholds are chosen such that $\alpha > \beta > \gamma$. This ensures that $d(x_{g_0}, x_{g_3})$ is as large as possible while maintaining $d(x_{g_0}, x_{g_2}) > d(x_{g_2}, x_{g_3})$; that is, a larger separation is desired between the fractured and healthy classes than between the two fracture grades.

Considering the wide variation in shape from cervical to lumbar vertebrae, we posit that learning label-specific representations as a pre-training stage also improves fracture detection. Assuming vertebral labels are available during training, we construct five categories of vertebrae based on their shape similarity: three groups of contiguous thoracic vertebrae, L1∼L4, and L5. Treating this as a five-class problem, we can employ any standard metric loss for learning label-specific representations. Note that our grading loss could also be extended to this case, as there exists a ‘ranking’ among the classes. Another application could be brain tumors, where grades are also present (high-grade and low-grade gliomas). We leave the application of our loss to such scenarios for future work.
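A minimal plain-Python sketch of the grading loss (Eqs. 3 to 5) on precomputed embeddings follows, together with a sampler for quadruplets of a static triplet (grades 0, 2, 3) plus a positive anchor of random grade n. It assumes the clustering term is a hinge that pulls the positive anchor to within γ of its same-grade match, as the prose describes. The names `grading_loss` and `sample_quadruplet` are hypothetical, the margins in the demo are hypothetical values chosen only to satisfy α > β > γ, and in practice the distances would be computed on network outputs inside an autodiff framework such as PyTorch rather than on Python lists.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """d(x_i, x_j) = ||f(x_i) - f(x_j)||, computed on already-embedded vectors."""
    return math.dist(a, b)


def grading_loss(f0: Vector, f2: Vector, f3: Vector, f_anchor: Vector, n: int,
                 alpha: float, beta: float, gamma: float) -> float:
    """L_G = L_1 + L_2 + L_3 for one quadruplet of embeddings.

    f0, f2, f3 : embeddings of the static triplet (grades 0, 2, 3)
    f_anchor   : embedding of the positive anchor x^l_{g_n}
    n          : grade of the positive anchor, one of {0, 2, 3}
    """
    if n not in (0, 2, 3):
        raise ValueError("n must be 0, 2 or 3")
    # Eq. 3: grade 3 should lie farther from healthy than grade 2, by margin alpha.
    l1 = max(0.0, dist(f0, f2) - dist(f0, f3) + alpha)
    # Eq. 4: grade 2 should lie closer to grade 3 than to healthy, by margin beta.
    l2 = max(0.0, dist(f2, f3) - dist(f0, f2) + beta)
    # Eq. 5 (clustering): pull the anchor to within gamma of its same-grade match.
    match = {0: f0, 2: f2, 3: f3}[n]
    l3 = max(0.0, dist(match, f_anchor) - gamma)
    return l1 + l2 + l3


def sample_quadruplet(ids_by_grade: Dict[int, List[int]],
                      rng: random.Random) -> Tuple[int, int, int, int, int]:
    """Return sample indices (i, j, k, l_idx, n): a static triplet plus a positive anchor."""
    i = rng.choice(ids_by_grade[0])
    j = rng.choice(ids_by_grade[2])
    k = rng.choice(ids_by_grade[3])
    n = rng.choice((0, 2, 3))
    static = {0: i, 2: j, 3: k}[n]
    # Prefer a different sample of the same grade so the positive pair is not identical.
    candidates = [idx for idx in ids_by_grade[n] if idx != static] or [static]
    l_idx = rng.choice(candidates)
    return i, j, k, l_idx, n


if __name__ == "__main__":
    a, b, g = 1.0, 0.5, 0.2  # hypothetical margins with alpha > beta > gamma
    well_graded = grading_loss([0, 0], [2, 0], [3, 0], [3, 0.1], n=3, alpha=a, beta=b, gamma=g)
    misordered = grading_loss([0, 0], [1, 0], [3, 0], [3, 0.1], n=3, alpha=a, beta=b, gamma=g)
    print(well_graded, misordered)
    print(sample_quadruplet({0: [0, 1, 2], 2: [3, 4], 3: [5, 6]}, random.Random(0)))
```

In the demo, the first configuration already respects the grading (grade 2 lies between healthy and grade 3, and the anchor sits next to its match), so its loss is zero; in the second, grade 2 is closer to healthy than to grade 3, so only the Eq. 4 term is active.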
We perform fracture detection with the following network architecture [13], containing 5 × […] → maxpool → (conv64-bn-relu) → maxpool → (conv128-bn-relu) → maxpool → (conv256-bn-relu) → maxpool → (linear256-bn-lrelu) → (linear128-bn-lrelu) → (linear64-bn-lrelu) → linear8, where ‘bn’ and ‘lrelu’ represent batch normalization and leaky ReLU layers, respectively. Recall that our training consists of a pre-training stage (for representation learning) and a training stage (for fracture detection). The pre-training stage itself consists of two sub-stages: vertebral-index-based representation learning and fracture-grade-based representation learning. Once pre-trained, the network is trained by optimizing a binary cross-entropy loss over the fractured and healthy classes. For this, the last linear layer (linear8) is replaced with a two-node linear layer (linear2), one node per class. The network is implemented using the PyTorch library on an Nvidia GTX 1080 GPU. All losses are optimized using the Adam optimizer with a learning rate of 0.[…], and the thresholds α, β and γ are set to 1.[…].

In this section, we evaluate the contribution of the two main components proposed as part of our classification routine: first, the ability of the proposed grading loss to learn efficient representations, and second, our complete fracture detection routine.
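The evaluations compare setups by sensitivity (SN), specificity (SP), and F1 score on the two-class fractured-vs-healthy problem. With 1133 of the 1283 vertebrae healthy, these are more informative than plain accuracy. The minimal sketch below shows how the three numbers follow from a confusion matrix; it assumes fractured is the positive class (label 1), and the function names are illustrative.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), with label 1 = fractured (positive), 0 = healthy."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sn_sp_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity TP/(TP+FN), specificity TN/(TN+FP), F1 2TP/(2TP+FP+FN)."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return sn, sp, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    print(sn_sp_f1(y_true, y_pred))
    # A classifier that calls every vertebra healthy: perfect SP, zero SN and F1.
    print(sn_sp_f1([1, 0, 0, 0], [0, 0, 0, 0]))
```

The second call shows why specificity alone is misleading under this kind of imbalance: predicting ‘healthy’ everywhere maximizes SP while detecting no fractures.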
Dataset
Recall that the proposed approach works at the vertebra level and utilizes vertebral labels. We utilize the publicly available VerSe dataset [4,5,6] and its centroid annotations. Following Genant’s scheme [1], its vertebrae are annotated for fractures of three grades. We work with healthy, grade-2, and grade-3 vertebrae. We exclude the cervical vertebrae (C1∼C7), as vertebral fractures are extremely rare in this region. The dataset consists of 1283 vertebrae extracted from 157 scans, among which 1133 are healthy and 104 are g[…]
Data Preparation
Typically, a vertebra’s mid-sagittal slice is a good indicator of a fracture. However, when the vertebra presents itself in an atypical orientation, the mid-sagittal slice is ineffective. Therefore, we utilize the vertebral centroids to extract 2D reformations of the vertebra along the mid-vertebral plane perpendicular to the vertebra’s sagittal axis. Specifically, we construct a spline passing through the centroids and reformat the sagittal plane along which this spline passes. From this reformation, vertebral patches of size 112 × 112 pixels at 1 × 1 mm resolution are extracted, so that additional context is provided by the vertebrae above and below the vertebra of interest (VOI). Our network consumes these patches. Additionally, a Gaussian centred on the centroid is passed as an additional channel to indicate the VOI.

Table 1: Evaluating learnt representations (representation learn → fracture train): Performance comparison of various losses for learning fracture-specific representations, reported as SN, SP, and F1. * indicates statistical insignificance (p-value = 0.44).

We validate the proposed grading loss in two stages: first, it is deployed as a stand-alone representation learning loss, where the separability of the learnt representations is tested (without any fracture-oriented training); second, it is combined with a fracture classification module as a pre-training stage, along with the proposed spine-region-based representation learning component. The classification performance of the various setups is compared using sensitivity (SN), specificity (SP), and F1 score […].
Grading loss results in better representations
In this experiment, we validate the effectiveness of the proposed grading loss at learning efficient representations for fracture detection. We compare our loss with two standard metric-learning losses: the contrastive and triplet losses. Specifically, the neural network is optimized on the training set using each of the metric losses. Once trained, it is used to obtain the latent representations of the training samples, on which a support vector machine (SVM) with a linear kernel is learnt. The more linearly separable the representations are, the better the learnt SVM performs on the latent representations of the test set. Table 1 reports the classification performance of this SVM on the test-set representations. Observe that the grading loss readily offers better ‘linear’ separability (∼8% increase in F1 score […]).
Proposed fracture detection regime
Our complete fracture detection pipeline consists of three stages: two pre-training stages followed by the main classification stage. The first pre-training stage optimizes a contrastive loss over the five regions of the spine described in Sec. 2.2. Experiments justifying the choice of contrastive loss for this stage are presented in the supplement. Following this, the network goes through the second pre-training stage, where our grading loss is minimized. Finally, the network is optimized for fracture detection using a cross-entropy loss. We denote the proposed pipeline as label pre-train → representation learn → fracture train. Table 2 reports an ablation of the proposed routine, evaluating the contribution of label-based pre-training and that of the proposed grading-loss-based representation learning. Compared with a baseline network trained end-to-end for fracture detection, pre-training with vertebral labels offers a 5% improvement in F1 score […]. Representation learning with the grading loss also improves the classification performance, providing an F1 improvement of about 7% […].

Fig. 3: t-SNE visualisation of the representations learnt by various metric learning losses, without explicit classification-specific training: (a) contrastive loss, (b) triplet loss, (c) grading loss. The proposed grading loss obtains more separability between the healthy and fractured classes.

Label pre-train | Rep. learn. | Frac. train | SN | SP | F1
✗ | ✗ | ✓ | | |
✓ | ✗ | ✓ | | |
✗ | Contrastive | ✓ | | |
✗ | Triplet | ✓ | | |
✗ | Grading | ✓ | | |
✓ | Contrastive | ✓ | | |
✓ | Triplet | ✓ | | |
✓ | Grading | ✓ | | |
Table 2: Validating the proposed fracture detection regime (label pre-train → representation learn → fracture train): Comparison of the proposed training routine based on grading loss with naive classification as well as with other representation-learning-augmented classifications.

We conclude that in low-data regimes with severe data imbalance, augmenting classification with representation-learning-based pre-training helps. Compared to conventional metric losses, which work on the similarity or dissimilarity of examples, the proposed grading loss, which incorporates a ‘ranking’ within the classes, provides superior performance. Going a step further, incorporating vertebral label information using similar representation learning techniques further improves fracture detection. The proposed fracture detection routine achieves an F1 score of […].
Acknowledgements
This work is supported by DIFUTURE, funded by the German Federal Ministryof Education and Research under (01ZZ1603[A-D]) and (01ZZ1804[A-I]).
References
1. Genant, H. K. et al.: Vertebral fracture assessment using a semiquantitative technique. Journal of bone and mineral research, 8(9), 1137-1148 (1993)
2. Carberry, G. et al.: Unreported vertebral body compression fractures at abdominal multidetector CT. Radiology, 268(1), 120-126 (2013)
3. Cauley, J. et al.: Risk of mortality following clinical fractures. Osteoporosis international, 11(7), 556-561 (2000)
4. Loeffler, M. et al.: A Vertebral Segmentation Dataset with Fracture Grading. Radiology: Artificial Intelligence (In Press) (2020)
5. Sekuboyina, A. et al.: VerSe: A Vertebrae Labelling and Segmentation Benchmark. arXiv preprint arXiv:2001.09193 (2020)
6. Sekuboyina, A. et al.: Labelling Vertebrae with 2D Reformations of Multidetector CT Images: An Adversarial Approach for Incorporating Prior Knowledge of Spine Anatomy. Radiology: Artificial Intelligence, vol. 2, https://doi.org/10.1148/ryai.2020190074 (2020)
7. Valentinitsch, A. et al.: Opportunistic osteoporosis screening in multi-detector CT images via local classification of textures. Osteoporosis international, 30(6), 1275-1285 (2019)
8. Bar, A. et al.: Compression fractures detection on CT. In: Medical Imaging 2017: Computer-Aided Diagnosis, Vol. 10134, p. 1013440. International Society for Optics and Photonics (2017)
9. Tomita, N. et al.: Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Computers in biology and medicine, 98, 8-15 (2018)
10. Nicolaes, J. et al.: Detection of vertebral fractures in CT using 3D Convolutional Neural Networks. arXiv preprint arXiv:1911.01816 (2019)
11. Chen, W. et al.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 403-412 (2017)
12. Husseini, M. et al.: Conditioned Variational Auto-encoder for Detecting Osteoporotic Vertebral Fractures. In: International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 29-38 (2019)
13. Raghu, M. et al.: Transfusion: Understanding transfer learning for medical imaging. In: Advances in Neural Information Processing Systems, pp. 3342-3352 (2019)
14. Schroff, F. et al.: FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815-823 (2015)
15. Hadsell, R. et al.: Dimensionality reduction by learning an invariant mapping. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 1735-1742 (2006)
16. Finn, C. et al.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1126-1135 (2017)
Appendix A Supplementary Material: Label pre-train
Setup | SN | SP | F1
… | | |
… | | |
Triplet | 68.9 | |
Table 3: Contrastive loss was used in the label pre-training stage.