Grading Loss: A Fracture Grade-based Metric Loss for Vertebral Fracture Detection

Abstract

Osteoporotic vertebral fractures have a severe impact on patients' overall well-being but are severely under-diagnosed. These fractures present at various levels of severity, measured using Genant's grading scale. Insufficient annotated datasets, severe data imbalance, and minor differences in appearance between fractured and healthy vertebrae cause naive classification approaches to discriminate poorly. Addressing this, we propose a representation learning-inspired approach for automated vertebral fracture detection, aimed at learning latent representations efficient for fracture detection. Building on state-of-the-art metric losses, we present a novel Grading Loss for learning representations that respect Genant's fracture grading scheme. On a publicly available spine dataset, the proposed loss function achieves a fracture detection F1 score of 81.5%, a 10% increase over a naive classification baseline.


Grading Loss: A Fracture Grade-based Metric Loss for Vertebral Fracture Detection

Malek Husseini*, Anjany Sekuboyina*, Maximilian Loeffler, Fernando Navarro, Bjoern H. Menze, and Jan S. Kirschke

Department of Computer Science, Technical University of Munich, Germany

Klinikum rechts der Isar, Technical University of Munich, Germany

Keywords: Fracture Detection, Metric Loss, Representation Learning

Vertebral fractures are severely under-diagnosed. According to a 2013 study, 84% of incidental vertebral fractures were not reported in CT [2]. This is either due to the fractures being asymptomatic or to the symptoms wrongly being attributed to other factors. Osteoporotic vertebral fractures have critical consequences such as disability or increased mortality. They cause pain and kyphosis in the short term, but are associated with an 8-fold higher mortality in the long term [3]. Accentuating this is their high prevalence in the older adult population (40% by the age of 80 years), making a missed diagnosis critical. Therefore, there is a need for automated and reproducible detection of vertebral fractures.

* Shared first authors

Fig. 1: Illustrating fracture grades: (a) t-SNE visualisation of latent representations learnt by formulating fracture detection as a simple classification problem, resulting in poor separability. (b) An example selection of the three classes of vertebrae studied in this work: healthy, grade-2 fracture, and grade-3 fracture.

Vertebral Fracture Detection

Automatic detection of vertebral fractures is relatively unexplored. Valentinitsch et al. [7] propose extracting texture-based features, such as histograms of gradients or local binary patterns, from the trabecular bone of a segmented vertebra and classifying them using a random forest. From a deep learning perspective, Bar et al. [8] employ a convolutional neural network (CNN) for classifying sagittal patches from the vertebral column and aggregate the classification across patches using a recurrent neural network. Along similar lines, Tomita et al. [9] work on thoraco-lumbar slices processed with a CNN and aggregated across slices using a long short-term memory (LSTM) network. However, unlike [8], the latter does not need any anatomy to be segmented before processing. Note that these approaches are ad hoc implementations of CNNs working on large data samples and provide minimal insight into the workings of the network. Recently, Nicolaes et al. [10] proposed a fully 3D approach for detecting vertebral fractures based on a voxel-level prediction regime, which also provides a weak localization of the fracture. However, being patch-based and predicting per voxel limits its real-time applicability.

We argue that formulating vertebral fracture detection as a naive classification problem is sub-optimal, more so in limited and unbalanced data regimes. Fig. 1a illustrates the t-SNE representations of the latent features of one of the baselines in this work, viz. detecting vertebral fractures with a convolutional neural network trained using a simple cross-entropy loss. Observe the resulting poor class separation between healthy and fractured vertebrae. We attribute this to the wide variation in vertebral shapes: a healthy lumbar vertebra is 'more different' from a healthy upper-thoracic vertebra than from a fractured lumbar vertebra. Moreover, there exists a 'gradation' among vertebral fractures, further obfuscating a clear shape-based separation (cf. Fig. 1b).

Genant’s Vertebral Fracture Grading

The current gold standard in grading vertebral fractures is a semi-quantitative method developed by Genant et al. [1], according to which fractures are categorized into three grades (Grades 1, 2, and 3; cf. Fig. 1b). The grade is based on the height loss a vertebra undergoes compared to its healthy counterpart. A healthy vertebra is considered to be Grade 0. Grades 2 and 3 have proven clinical consequences, while this is unknown for Grade 1. As its small height reduction results in high inter-rater uncertainty, Grade 1 fractures are excluded from this study.
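To make the height-loss criterion concrete, the following minimal sketch maps a relative height loss to a Genant grade. The thresholds (roughly 20 to 25% for Grade 1, 25 to 40% for Grade 2, above 40% for Grade 3) are the commonly cited values of the semi-quantitative scheme and are not stated in this text; the function names and the handling of values that fall exactly on a boundary are assumptions made for illustration only.

```python
def height_loss_percent(reference_height_mm: float, observed_height_mm: float) -> float:
    """Relative height loss (in percent) of a vertebra w.r.t. a healthy reference height."""
    if reference_height_mm <= 0:
        raise ValueError("reference height must be positive")
    return 100.0 * (reference_height_mm - observed_height_mm) / reference_height_mm


def genant_grade(loss_percent: float) -> int:
    """Map a height loss to a grade in {0, 1, 2, 3}.

    Thresholds are the commonly cited ones (assumption, not taken from this paper).
    """
    if loss_percent > 40.0:
        return 3  # severe
    if loss_percent > 25.0:
        return 2  # moderate
    if loss_percent >= 20.0:
        return 1  # mild
    return 0      # healthy


def included_in_study(grade: int) -> bool:
    """Grade 1 is excluded in this work; healthy, grade-2 and grade-3 samples are kept."""
    return grade in (0, 2, 3)


if __name__ == "__main__":
    for loss in (10.0, 22.0, 30.0, 45.0):
        grade = genant_grade(loss)
        print(loss, grade, included_in_study(grade))
```

In practice the grade is assigned by a radiologist rather than computed from a single measurement, so this is only a reading aid for the grade definitions.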

Representation Learning

In this work, we aim to incorporate the gradual shape variations, courtesy of the fracture grades, into the training process of a classifier by explicitly adjusting the latent space. Deep learning models are believed to generate useful representations as a byproduct of the task they are trying to solve. However, this is not the case in low-data regimes, as shown in Fig. 1a. Representation learning or metric learning can be used to learn efficient latent representations in such scenarios. Siamese networks [15] using contrastive loss and FaceNet [14] with its triplet loss are examples of standard metric learning frameworks wherein representations of similar entities are clustered together while those of dissimilar ones are pushed apart.

Contributions

In this work, we attempt to solve vertebral fracture detection as a two-class, healthy vs. fractured, classification problem.

– Towards accurate classification, we pre-train the neural network using fracture-grade-based representation learning. For this, we propose a novel loss function termed grading loss, which encourages the learnt representations to respect the gradation in the appearance of fractures.
– Accounting for the dependence of vertebral shapes on vertebral labels, we also propose a spine-region-based pre-conditioning module.
– We validate the proposed fracture detection regime on the publicly available VerSe dataset, obtaining a classification F1 score of 81.5%.

Given a collection of 2D vertebral patches, the objective of our work is to classify them into two classes, fractured and healthy. As vertebral shape depends on its label and the amount of variation in this shape due to a fracture depends on the fracture grade, we hypothesize that preceding the classification stage with fracture-grade and vertebral-label dependent pre-training results in an improved class separation. Consequently, this results in improved classification performance. We model these pre-training stages with inspiration from the field of representation learning.
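As a point of reference for the standard metric losses mentioned earlier (contrastive [15] and triplet [14]), the sketch below writes both in a common textbook form on plain Python lists. It is not code from this work: the margin value of 1.0, the toy embeddings, and the use of the non-squared Euclidean distance in the triplet term are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs at least `margin` apart."""
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Require the anchor to be closer to the positive than to the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    healthy_a, healthy_b, fractured = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
    print(contrastive_loss(healthy_a, healthy_b, same_class=True))
    print(contrastive_loss(healthy_a, fractured, same_class=False))
    print(triplet_loss(healthy_a, healthy_b, fractured))
```

Both losses only express 'similar' versus 'dissimilar'; neither encodes an ordering among classes, which is the gap the grading loss is meant to address.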

Metric learning aims to learn better data representations by working on a notion of 'distance' in the latent space. It aims to cluster similar objects closer (by […]

Fig. 2: Left: the arrangement of the three grades and the positive anchor during initialization. Right: […] The positive anchor $g_n$ works on clustering the similar classes together in the latent space. $g_n$ belongs to the […]

[…] fractured, but the former is more similar to a vertebra from a healthy class than the latter. Incorporating such a 'ranking' criterion into the metric learning framework, we propose the grading loss.

Assume a 2D vertebral patch, $x \in X$, is mapped to a representation $f(x)$ by a neural network $f$. The Euclidean distance between the representations of two examples $x_i$ and $x_j$ is denoted by $d(x_i, x_j) = \lVert f(x_i) - f(x_j) \rVert_2$. We design the grading loss as a quadruplet loss [11] working with quadruplets denoted by $\{x^i_{g_0}, x^j_{g_2}, x^k_{g_3}, x^l_{g_n}\}$, where $x^i_{g_0}$ denotes a healthy vertebra sample, $x^j_{g_2}$ and $x^k_{g_3}$ denote samples of grade two and three, respectively, and $x^l_{g_n}$, with $n \in \{0, 2, 3\}$, can be randomly chosen as a healthy or a fractured example. Observe that $x^i$, $x^j$, and $x^k$ form a static triplet, i.e. they are always sampled from fixed sub-classes of fracture grades. We incorporate a grading in the embedding space as follows: a grade-3 fracture is farther away from a healthy (grade-0) vertebra than a grade-2 fracture, and grade-2 is closer to grade-3 than it is to healthy. We can formulate these requirements as:

$$d(x_{g_0}, x_{g_2}) + \alpha < d(x_{g_0}, x_{g_3}) \quad \text{and} \tag{1}$$

$$d(x_{g_2}, x_{g_3}) + \beta < d(x_{g_0}, x_{g_2}), \tag{2}$$

where $\alpha$ and $\beta$ are distance thresholds. Note that Eq. 1 uses […]. Owing to the triangle inequality of distances, the restrictions on the distances from […]

$$L_1 = \max\big(0,\; d(x_{g_0}, x_{g_2}) - d(x_{g_0}, x_{g_3}) + \alpha\big) \quad \text{and} \tag{3}$$

$$L_2 = \max\big(0,\; d(x_{g_2}, x_{g_3}) - d(x_{g_0}, x_{g_2}) + \beta\big). \tag{4}$$

Observe that Eqs. 3 and 4 structurally represent the triplet loss. However, they do not work on similarities; they form separating objectives between the various fracture grades. Finally, a third, clustering objective is incorporated by virtue of $x^l$ in the quadruplet. Recall that $x_{g_n}$ could belong to any of the three sub-classes. Based on the value of $n$ in the sampled quadruplet, the clustering objective pulls $x_{g_n}$ closer to its match in the static triplet. We demonstrate our grading loss in Fig. 2 with $x_{g_n}$ belonging to […] $x_{g_n}$ by the term positive anchor. The clustering objective can be represented as:

$$L_3 = \max\big(0,\; \gamma - d(x^{\{i,j,k\}}_{g_n}, x^l_{g_n})\big) \tag{5}$$

where $n \in \{0, 2, 3\}$ and the two arguments of $d$ are a pair of samples from the same class. Assembling the loss terms together results in the proposed grading loss objective, $L_G = L_1 + L_2 + L_3$. In this work, the distance thresholds are chosen such that $\alpha > \beta > \gamma$. This is to ensure that $d(x_{g_0}, x_{g_3})$ is as large as possible while maintaining $d(x_{g_0}, x_{g_2}) > d(x_{g_2}, x_{g_3})$. That is, a higher separation between fractured and healthy classes is desirable compared to that between the two grades.

Considering the wide variation in shape from cervical to lumbar vertebrae, we claim that learning label-specific representations as a pre-training stage also improves fracture detection. Assuming the availability of vertebral labels during the training process, we construct five categories of vertebrae based on their shape similarity: T[…]∼T[…], T[…]∼T[…], T[…]∼T[…], L[…]∼L4, and L5. Treating this as a five-class problem, we can employ any standard metric loss for learning label-specific representations. Note that our grading loss can also be extended to this case, as there exists a 'ranking' among the classes. Another application could be in brain tumors, where grades are also present (high-grade and low-grade gliomas). We leave the application of our loss to such scenarios for future work.
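The grading loss described above, i.e. the sum of two separating hinge terms and one clustering term over a quadruplet, can be sketched for a single quadruplet as follows. This is an illustrative sketch on plain Python lists rather than the implementation used in this work: the margin values 1.5, 1.0 and 0.5 are hypothetical (chosen only to satisfy α > β > γ), the embeddings are toy 2D points, and the third term transcribes Eq. 5 as printed, so it becomes zero once the pair is at least γ apart.

```python
import math


def dist(u, v):
    """Euclidean distance d(., .) between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def grading_loss(f_g0, f_g2, f_g3, f_gn, n, alpha=1.5, beta=1.0, gamma=0.5):
    """Return (L_G, (L1, L2, L3)) for one quadruplet of embeddings.

    f_g0, f_g2, f_g3: embeddings of the static triplet (healthy, grade 2, grade 3).
    f_gn: embedding of the fourth sample, whose grade n is 0, 2 or 3.
    alpha, beta, gamma: hypothetical margins with alpha > beta > gamma.
    """
    if n not in (0, 2, 3):
        raise ValueError("n must be 0, 2 or 3")
    # Eq. 3: grade 3 should lie farther from healthy than grade 2 does.
    l1 = max(0.0, dist(f_g0, f_g2) - dist(f_g0, f_g3) + alpha)
    # Eq. 4: grade 2 should lie closer to grade 3 than to healthy.
    l2 = max(0.0, dist(f_g2, f_g3) - dist(f_g0, f_g2) + beta)
    # Eq. 5 (as printed): term between the fourth sample and its same-grade match.
    match = {0: f_g0, 2: f_g2, 3: f_g3}[n]
    l3 = max(0.0, gamma - dist(match, f_gn))
    return l1 + l2 + l3, (l1, l2, l3)


if __name__ == "__main__":
    total, terms = grading_loss([0.0, 0.0], [3.0, 0.0], [4.0, 0.0], [4.0, 0.0], n=3)
    print(total, terms)
```

In a training loop the same expression would be averaged over a batch of sampled quadruplets and written with a differentiable tensor library; the per-term return value is only there to make the contribution of each objective visible.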

We perform fracture detection with the following network architecture [13], containing 5 × […] → maxpool → (conv64-bn-relu) → maxpool → (conv128-bn-relu) → maxpool → (conv256-bn-relu) → maxpool → (linear256-bn-lrelu) → (linear128-bn-lrelu) → (linear64-bn-lrelu) → linear8, where 'bn' and 'lrelu' represent batch normalization and leaky ReLU layers, respectively. Recall that our network consists of a pre-training stage (for representation learning) and a training stage (for fracture detection). Furthermore, the pre-training stage consists of two sub-stages: vertebral-index-based representation learning and fracture-grade-based representation learning. Once pre-trained, the network is trained by optimizing a binary cross-entropy loss over the fractured and healthy classes. For this, the last linear layer (linear8) is replaced with a two-node linear layer (linear2) for the two classes. The network is implemented using the PyTorch library on an Nvidia GTX 1080 GPU. All losses are optimized using the Adam optimizer with a learning rate of 0.[…] α, β and γ to 1.[…]

In this section, we evaluate the contribution of the two main components proposed as part of our classification routine: first, the proposed grading loss' ability to learn efficient representations, and second, our complete fracture detection routine.
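The evaluation compares setups by sensitivity (SN), specificity (SP), and F1 score. As a reminder of how these relate to a confusion matrix, here is a minimal sketch that treats 'fractured' as the positive class (an assumption); the function name and the example counts are hypothetical and are not results from this work.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and F1 from confusion-matrix counts.

    tp/fn: fractured vertebrae predicted fractured/healthy.
    tn/fp: healthy vertebrae predicted healthy/fractured.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {"SN": sensitivity, "SP": specificity, "F1": f1}


if __name__ == "__main__":
    # Hypothetical counts for a small test split.
    print(detection_metrics(tp=8, fp=2, tn=18, fn=2))
    # A classifier that labels every vertebra healthy on a 1133-vs-150 split.
    print(detection_metrics(tp=0, fp=0, tn=1133, fn=150))
```

The second call illustrates why F1 is informative under class imbalance: predicting every vertebra as healthy yields perfect specificity but zero sensitivity and zero F1.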

Dataset

Recall that the proposed approach works at the vertebra level and utilizes vertebral labels. We utilize the publicly available VerSe [4,5,6] dataset and its centroid annotations. As part of [1], its vertebrae are annotated for fractures of three grades. We work with healthy, grade-2, and grade-3 vertebrae. We exclude the cervical vertebrae (C1∼C7) as vertebral fractures are extremely rare in this region. The dataset consists of 1283 vertebrae extracted from 157 scans, among which 1133 are healthy, 104 are […]

Data Preparation

Typically, a vertebra's mid-sagittal slice is a good indicator of a fracture. However, in cases where the vertebra presents itself in an atypical orientation, using the mid-sagittal slice is ineffective. Therefore, we utilize the vertebral centroids to extract 2D reformations of the vertebra along the mid-vertebral plane perpendicular to the vertebra's sagittal axis. Specifically, we construct a spline passing through the centroids and reformat the sagittal plane along which this spline passes. From this reformation, vertebral patches of size 112 × 112 pixels at 1 × […] mm resolution are extracted so that additional context is provided by the vertebrae above and below the vertebra-of-interest (VOI). Our network consumes these patches. Additionally, a Gaussian around the centroid is passed as an additional channel for indicating the VOI.

Table 1: Evaluating learnt representations (representation learn → fracture train): Performance comparison of various losses for learning fracture-specific representations. * indicates statistical insignificance (p-value = 0.44). [Columns: Setup, SN, SP, F1. Only the row label Grading and a single value, 57.4, survive; the remaining cells are not recoverable.]

We validate the proposed grading loss in two stages: first, it is deployed as a stand-alone representation learning loss, where the separability of the learnt representations is tested (without any fracture-oriented training), and second, it is combined with a fracture classification module as a pre-training stage along with the proposed spine-region-based representation learning component. The classification performances of the various setups are compared using sensitivity (SN), specificity (SP), and F1 score […]

Grading loss results in better representations

In this experiment, we validate the effectiveness of the proposed grading loss at learning efficient representations for fracture detection. We compare our loss with two standard metric-learning losses: contrastive and triplet losses. Specifically, the neural network is optimized on the training set using each of the metric losses. Once trained, it is used to obtain the latent representations of the training samples, on which a support vector machine (SVM) with a linear kernel is learnt. The more linearly separable the representations are, the better the learnt SVM performs on the test set's latent representations. Table 1 reports the classification performance of this SVM on the test set representations. Observe that the grading loss readily offers better 'linear' separability (∼8% increase in F1) […] grading loss.

Proposed fracture detection regime

Our complete fracture detection pipeline consists of three stages: two pre-training stages followed by the main classification stage. The first pre-training stage includes optimizing a contrastive loss over the five regions of the spine described in Sec. 2.2. Experiments justifying the choice of contrastive loss for this stage are presented in the supplement. Following this, the network goes through the second pre-training stage, where our grading loss is minimized. Finally, the network is optimized for fracture detection using cross-entropy loss. We represent the proposed pipeline as (label pre-train → representation learn → fracture train). Table 2 reports an ablative test of the proposed routine. We test the contribution of label-based pre-training and that of the proposed grading loss-based representation learning. Compared with a baseline network trained end-to-end for fracture detection, pre-training with vertebral labels offers a 5% improvement in F1 […] grading loss also improves the classification performance, providing about 7% F1 […]

Fig. 3: t-SNE visualisation of the representations learnt by various metric learning losses, without explicit classification-specific training: (a) contrastive loss, (b) triplet loss, (c) grading loss. The proposed grading loss obtains more separability between healthy and fractured classes.

Table 2: Validating the proposed fracture detection regime (label pre-train → representation learn → fracture train): Comparison of the proposed training routine based on grading loss with naive classification as well as with other representation-learning-augmented classifications.

Label pre-train   Rep. learn.   Frac. train   SN     SP     F1
✗                 ✗             ✓             […]    […]    […]
✓                 ✗             ✓             […]    […]    […]
✗                 Contrastive   ✓             […]    […]    […]
✗                 Triplet       ✓             […]    […]    […]
✗                 Grading       ✓             […]    […]    […]
✓                 Contrastive   ✓             […]    […]    […]
✓                 Triplet       ✓             […]    […]    […]
✓                 Grading       ✓             […]    […]    […]

We conclude that in low-data regimes with severe data imbalance, augmenting classification with representation learning-based pre-training helps. Compared to conventional metric losses, which work on similarity or dissimilarity of examples, the proposed grading loss, which incorporates a 'ranking' within the classes, provides superior performance. Going a step further, incorporating the vertebral label information using similar representation learning techniques further improves fracture detection. The proposed fracture detection routine achieves an F1 score of 81.5% […]

Acknowledgements

This work is supported by DIFUTURE, funded by the German Federal Ministry of Education and Research under (01ZZ1603[A-D]) and (01ZZ1804[A-I]).

References

1. Genant, H. K. et al.: Vertebral fracture assessment using a semiquantitative technique. Journal of bone and mineral research, 8(9), 1137-1148 (1993)
2. Carberry, G. et al.: Unreported vertebral body compression fractures at abdominal multidetector CT. Radiology, 268(1), 120-126 (2013)
3. Cauley, J. et al.: Risk of mortality following clinical fractures. Osteoporosis international, 11(7), 556-561 (2000)
4. Loeffler, M. et al.: A Vertebral Segmentation Dataset with Fracture Grading. Radiology: Artificial Intelligence (In Press) (2020)
5. Sekuboyina, A. et al.: VerSe: A Vertebrae Labelling and Segmentation Benchmark. arXiv eprint: 2001.09193. URL: arXiv:2001.09193 (2020)
6. Sekuboyina, A. et al.: Labelling Vertebrae with 2D Reformations of Multidetector CT Images: An Adversarial Approach for Incorporating Prior Knowledge of Spine Anatomy. In: Radiology: Artificial Intelligence vol. 2, https://doi.org/10.1148/ryai.2020190074 (2020)
7. Valentinitsch, A. et al.: Opportunistic osteoporosis screening in multi-detector CT images via local classification of textures. Osteoporosis international, 30(6), 1275-1285 (2019)
8. Bar, A. et al.: Compression fractures detection on CT. In Medical Imaging 2017: Computer-Aided Diagnosis Vol. 10134, p. 1013440. International Society for Optics and Photonics (2017)
9. Tomita, N. et al.: Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Computers in biology and medicine, 98, 8-15 (2018)
10. Nicolaes, J. et al.: Detection of vertebral fractures in CT using 3D Convolutional Neural Networks. arXiv preprint arXiv:1911.01816 (2019)
11. Chen, W. et al.: Beyond triplet loss: a deep quadruplet network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 403-412 (2017)
12. Husseini, M. et al.: Conditioned Variational Auto-encoder for Detecting Osteoporotic Vertebral Fractures. In International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 29-38 (2019)
13. Raghu, M. et al.: Transfusion: Understanding transfer learning for medical imaging. In Advances in Neural Information Processing Systems, pp. 3342-3352, https://doi.org/10.10007/1234567890 (2019)
14. Schroff, F. et al.: Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815-823 (2015)
15. Hadsell, R. et al.: Dimensionality reduction by learning an invariant mapping. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Vol. 2, pp. 1735-1742 (2006)
16. Finn, C. et al.: Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning vol. 70, pp. 1126-1135 (2017)

Appendix A Supplementary Material: Label pre-train

Table 3: Contrastive loss was used in the label pre-train […] [Columns: Setup, SN, SP, F1. Only the row label Triplet and a single value, 68.9, survive; the remaining cells are not recoverable.]