Cranial Implant Prediction using Low-Resolution 3D Shape Completion and High-Resolution 2D Refinement
Amirhossein Bayat, Suprosanna Shit, Adrian Kilian, Jürgen T. Liechtenstein, Jan S. Kirschke, Bjoern H. Menze
Department of Informatics, Technical University of Munich, Germany
Department of Neuroradiology, Klinikum rechts der Isar, Germany
TranslaTUM Center for Translational Cancer Research, Munich, Germany
Department for Oral & Maxillofacial Surgery, University Hospital Schleswig-Holstein, Campus Kiel, Arnold-Heller-Strasse 3, 24105 Kiel, Germany
[email protected]
Abstract.
Designing a cranial implant requires a 3D understanding of the complete skull shape. A 2D approach is therefore sub-optimal, since a 2D model lacks a holistic 3D view of both the defective and the healthy skull. Further, loading whole 3D skull shapes at their original image resolution is not feasible on commonly available GPUs. To mitigate these issues, we propose a fully convolutional network composed of two subnetworks. The first subnetwork completes the shape of the downsampled defective skull. The second subnetwork upsamples the reconstructed shape slice-wise. We train the 3D and 2D networks in tandem in an end-to-end fashion with a hierarchical loss function. Our proposed solution accurately predicts high-resolution 3D implants on the challenge test cases in terms of Dice score and Hausdorff distance.
Keywords:
Cranial implant design · Shape completion · 3D reconstruction · Super-resolution

Introduction

Cranial implant design is a crucial task in the clinical planning of cranioplasty [13]. Previous works mainly rely on freely available CAD tools for cranial implant design [8,4,14,6]. The time requirements and the need for expert intervention make these approaches a major hindrance to fast and prompt deployment. The AutoImplant challenge looks for a simple and easy-to-use automatic solution that can accurately predict cranial implants. Keeping this in mind, we tailor our proposed solution to best fit the requirements of clinicians. Previous literature [1] tends to exploit geometric symmetry and predict the cranial implant based on the unaffected skull region. Nevertheless, this results in a suboptimal solution, since the human skull is not perfectly symmetric in reality. These solutions also fall short when the implant is not confined to one hemisphere. Morais et al. [15] used a deep 3D encoder-decoder [2,18,11]
Fig. 1. A few training samples: The first row depicts rendered 3D volumes of four randomly selected defective scans from the training dataset. The second row shows the corresponding ground-truth cranial implants.

network to reconstruct the incomplete skull in low-resolution space. While the low-resolution space facilitates faster processing, the reconstruction quality lacks minute local anatomical detail. In the AutoImplant baseline paper [12], a similar approach is taken: the authors first localize the defective region in the skull and then predict the implant using an encoder-decoder network. While this pipeline is suitable for a modular design of accurate defective-region detection and implant prediction, the network is not end-to-end trainable, so any error in the first stage penalizes the implant prediction. In this work, we alleviate this by relying on coarse-scale implant prediction in 3D followed by fine-scale enhancement of the predicted implant. We identify that 3D is most suitable for predicting the implant, since anatomical consistency is best captured by a 3D receptive field compared to local 2D slices. However, to reduce memory and computational requirements, we first predict the implant on a downsampled defective skull. Subsequently, we enhance the predicted implant slice-wise with a 2D decoder network. Our solution is thus end-to-end trainable while remaining efficient for the high-resolution implant prediction task.
Method

The dataset is created by artificially generating defects in the scans [12]. Thus the original skull serves as the ground truth for the implant prediction task. We leverage this availability of the target label and cast implant prediction as a supervised volumetric reconstruction task. At the core of our method lies a 3D encoder-decoder network. This network takes the low-resolution defective skull as input and predicts a low-resolution implant at the output. We argue that the implant prediction task lies on a lower-dimensional manifold, since the key properties for predicting the implant are the consistency of the inner and outer surfaces.
Fig. 2. Schematic overview of our proposed pipeline for predicting the cranial implant: The downsampled defective scan goes through an encoder-decoder based shape-completion network. During training, N random slices of the reconstructed skull go through a second decoder network for high-resolution reconstruction. For the 3D shape completion we use a volumetric ℓ1 norm, and for the 2D refinement task we use a summation of 2D ℓ1 losses.

Hence, a downsampled input space is sufficient for a coarse-scale identification of the implant region. A simple element-wise subtraction of the reconstructed skull and the input produces the desired implant. This approach is in line with the shape-completion literature [5,20,17,19,9]. Next, we need to upsample the predicted implant, which can be done in several ways. Classical approaches, such as spline-based interpolation, are a simple choice. Alternatively, decoder networks have proved superior in super-resolution tasks [10,3]. Hence, we incorporate a second module in our method, a 2D upsampler. This upsampler takes selected axial slices during training and predicts their upsampled versions. To be able to train both networks jointly and still fit the data into GPU memory, we select N random slices of the reconstructed shape and the corresponding slices of the original-scale ground truth. The error between each predicted slice and the ground-truth skull trains the 2D decoder. The high-resolution reconstruction error, along with the 3D shape-completion error, contributes to the training of the 3D encoder-decoder. In the following, we describe the architecture of the two subnetworks in our model and the loss functions used to train it.
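For binary volumes, the element-wise subtraction described above reduces to a logical set difference. A minimal NumPy sketch with made-up toy volumes (the shapes and the defect location are illustrative, not taken from the challenge data):

```python
import numpy as np

# Hypothetical binary skull volumes (True = bone).
reconstructed = np.zeros((8, 8, 8), dtype=bool)
reconstructed[2:6, 2:6, 2:6] = True        # completed skull
defective = reconstructed.copy()
defective[2:6, 2:6, 2:4] = False           # simulated cranial defect

# Element-wise set difference: voxels present in the completed skull
# but absent from the defective input form the implant.
implant = np.logical_and(reconstructed, np.logical_not(defective))
```

Here `implant` covers exactly the removed 4 x 4 x 2 region, and no implant voxel overlaps the remaining bone of the defective scan.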
3D Encoder-Decoder:
Encoder-decoder networks have previously been used in bio-physical simulation [7], image segmentation [16], etc. Our 3D network has three sequential components: an encoder, a bottleneck, and a decoder. The encoder compresses the input into a compact representation, which is processed in the bottleneck unit to extract useful features. These features go through the decoder to reconstruct the complete skull. The complete architecture is as follows:

IN → CN → CN → CN → RB → RB → RB → RB → TC → TC → C → OUT

where IN and OUT are the single-channel input and output volumes respectively, CN^s_ch is a convolution with stride s and ch output channels followed by batch norm and ReLU, TC^s_ch is a transposed convolution with stride s and ch output channels followed by batch norm and ReLU, RB_ch is a residual block consisting of two successive units of stride-1 convolution with ch output channels followed by instance norm and ReLU, and C^s_ch is a convolution with stride s and ch output channels followed by a sigmoid. Note that all convolution and norm layers described here are 3D.
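A compact PyTorch sketch of such a 3D encoder-decoder follows. The exact strides and channel counts are not specified above, so the layer widths, the stride pattern, and the class/function names are illustrative assumptions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

def cn3d(cin, cout, stride):
    # CN: 3D conv + batch norm + ReLU (strides/channels assumed).
    return nn.Sequential(nn.Conv3d(cin, cout, 3, stride, 1),
                         nn.BatchNorm3d(cout), nn.ReLU(inplace=True))

def tc3d(cin, cout):
    # TC: stride-2 transposed conv + batch norm + ReLU.
    return nn.Sequential(nn.ConvTranspose3d(cin, cout, 2, 2),
                         nn.BatchNorm3d(cout), nn.ReLU(inplace=True))

class ResBlock3d(nn.Module):
    # RB: two stride-1 convs, each with instance norm and ReLU, plus a skip.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, 1, 1), nn.InstanceNorm3d(ch), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, 3, 1, 1), nn.InstanceNorm3d(ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.body(x)

class ShapeCompletion3d(nn.Module):
    """IN -> CN x3 -> RB x4 -> TC x2 -> C (sigmoid) -> OUT."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            cn3d(1, ch, 1), cn3d(ch, 2 * ch, 2), cn3d(2 * ch, 4 * ch, 2),
            *[ResBlock3d(4 * ch) for _ in range(4)],
            tc3d(4 * ch, 2 * ch), tc3d(2 * ch, ch),
            nn.Conv3d(ch, 1, 3, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

# Toy forward pass on a made-up low-resolution defective skull.
out = ShapeCompletion3d(ch=4).eval()(torch.zeros(1, 1, 16, 16, 16))
```

The two stride-2 encoder convolutions are undone by the two stride-2 transposed convolutions, so the output occupancy map matches the (low-resolution) input size.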
2D Decoder Upsampler:
The 2D upsampler network consists of four residual blocks, each preceded by a squeeze-and-excitation layer, followed by nearest-neighbor upsampling layers and a final convolution layer. The residual blocks refine the low-resolution reconstructed slices to incorporate anatomical consistency, which aids a precise high-resolution skull at the output. We concatenate the corresponding slice of the defective scan with the reconstructed slice and pass it as input to the 2D upsampler. This helps correct any location-wise mismatch from the 3D shape-completion task. Borrowing the notation defined in the previous paragraph, the complete architecture is given below:

IN → CN → SE → RB → SE → RB → SE → RB → SE → RB → NN → NN → C → OUT

where SE_ch is a 'squeeze and excitation' layer and NN^s_ch is a nearest-neighbor (NN) upsampling layer with scale factor s and ch output channels, followed by instance norm and ReLU. Note that all convolution and norm layers described here are 2D.

Loss Function:
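A corresponding PyTorch sketch of the 2D refinement decoder; the channel width, the SE reduction ratio, and the per-stage scale factor of 2 are illustrative assumptions (the scale factors in the original text did not survive extraction cleanly), as are the class names:

```python
import torch
import torch.nn as nn

class SE2d(nn.Module):
    # 'Squeeze and excitation': global average pool, bottleneck MLP,
    # then per-channel gating of the feature map.
    def __init__(self, ch, r=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(inplace=True),
                                nn.Linear(ch // r, ch), nn.Sigmoid())
    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w[:, :, None, None]

class ResBlock2d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.body(x)

class Upsampler2d(nn.Module):
    """IN -> CN -> (SE -> RB) x4 -> NN-up x2 -> C (sigmoid) -> OUT.
    The input has 2 channels: the reconstructed slice concatenated
    with the matching defective slice."""
    def __init__(self, ch=16, scale=2):
        super().__init__()
        blocks = []
        for _ in range(4):
            blocks += [SE2d(ch), ResBlock2d(ch)]
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, 1, 1), nn.ReLU(inplace=True),
            *blocks,
            nn.Upsample(scale_factor=scale, mode='nearest'),
            nn.Upsample(scale_factor=scale, mode='nearest'),
            nn.Conv2d(ch, 1, 3, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

# Toy forward pass: one 2-channel low-resolution axial slice.
hr = Upsampler2d(ch=8).eval()(torch.zeros(1, 2, 16, 16))
```

With two scale-2 stages the spatial resolution grows by a factor of 4 in each axis.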
Let us denote the ground-truth data at the original scale as I_G, the downsampled ground truth as I_g, the defective 3D volume at the original scale as I_D, the downsampled defective 3D volume as I_d, the functional form of the 3D encoder-decoder network as S(·), and the functional form of the 2D upsampler network as U(·). The cranial implant is predicted as

Cranial Implant = U(S(I_d)) \ I_D,    (1)

where \ denotes the set difference. The total loss function of our method is

L_total = L_3D + L_2D,    (2)
L_3D = || S(I_d) − I_g ||_{ℓ1},    (3)
L_2D = Σ_{i ∈ Ω} || U(S(I_d)_i) − I_G^i ||_{ℓ1},    (4)

where Ω is the set of random slice indices.

Experiments

We realize our model in PyTorch. We train the networks with the Adam optimizer and a learning rate of 0.0001 on an Nvidia Quadro P6000 GPU. The batch size for the 3D network is 1, i.e., one volume per iteration.
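The hierarchical loss of Eqs. (2)-(4) can be sketched in NumPy, with random arrays standing in for the network outputs and ground truth (all shapes and the slice set Ω are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the quantities in Eqs. (2)-(4):
S_Id = rng.random((16, 16, 16))       # S(I_d): low-res reconstruction
I_g = rng.random((16, 16, 16))        # downsampled ground truth
omega = [3, 7, 11]                    # random slice indices (Omega)
U_slices = rng.random((3, 64, 64))    # U(S(I_d)_i): upsampled slices
I_G_slices = rng.random((3, 64, 64))  # matching full-resolution GT slices

# Eq. (3): volumetric l1 term for the 3D shape completion.
loss_3d = np.abs(S_Id - I_g).sum()

# Eq. (4): summed slice-wise l1 term for the 2D refinement.
loss_2d = sum(np.abs(U_slices[k] - I_G_slices[k]).sum()
              for k in range(len(omega)))

# Eq. (2): total hierarchical loss.
loss_total = loss_3d + loss_2d
```

Both terms backpropagate into the 3D encoder-decoder, while only the second trains the 2D upsampler.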
We downsample the original 3D volume by a factor of 0.35 in all dimensions, because that is the largest 3D volume we can fit in our GPU along with the 2D decoder module. The downsampled 3D volume is zero-padded in the z-dimension to a fixed size. After predicting the completed 3D shape in low resolution, we sample 10 slices randomly along the z-axis, concatenate them channel-wise with the corresponding downsampled slices from the defective skull, and feed them to the upsampler decoder. We cannot fit the entire volume at the original scale in memory, so we have to select 2D slices; to avoid overfitting, we select the slices randomly. It is important to note that, after downsampling the volume with a scaling factor of 0.35, every 3 slices along the z-axis at the original scale correspond to 1 slice in the downsampled volume. Thus, after reconstructing the 3D shape, we have a set of slices selected with random indices, and the three full-resolution index sets [random indices/0.35], [random indices/0.35]+1, and [random indices/0.35]+2. We select the corresponding slices from the defective scan and downsample them in 2D to be concatenated with the slices from the predicted shape. Thus, the batch size for the upsampler decoder network is 30.
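The low-to-high-resolution slice-index bookkeeping described above can be sketched as follows (the low-resolution depth is a made-up value; the factor 0.35 follows the text):

```python
import numpy as np

SCALE = 0.35                       # downsampling factor from the text
rng = np.random.default_rng(42)

low_res_depth = 64                 # hypothetical low-resolution z-extent
rand_idx = rng.choice(low_res_depth, size=10, replace=False)

# Each low-resolution slice covers roughly 1/0.35 ~ 3 full-resolution
# slices, so every sampled index maps to three consecutive
# full-resolution indices: [i/0.35], [i/0.35]+1, [i/0.35]+2.
base = (rand_idx / SCALE).astype(int)
full_res_idx = np.concatenate([base, base + 1, base + 2])
```

The 10 random low-resolution slices thus yield 30 full-resolution targets, matching the upsampler's batch size of 30.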
Inference:
For inference, a downsampled volume is similarly fed to the network and reconstructed in low resolution by the first subnetwork. After that, all slices along the z-axis are fed to the upsampler decoder one by one and stacked into a volume to reconstruct the shape in 3D. Subsequently, we subtract the defective input scan from the high-resolution reconstructed scan to estimate the cranial implant. Finally, as a post-processing step, we successively erode and dilate the segmentation with a spherical structuring element of radius 2 to remove noise. Subsequently, we select the largest component in the segmentation map using connected-component analysis.
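A sketch of this post-processing with SciPy's morphology and connected-component tools; the radius-2 sphere follows the text, while the helper name and volume sizes are ours:

```python
import numpy as np
from scipy import ndimage

def postprocess(implant, radius=2):
    """Erode then dilate with a spherical structuring element, then
    keep only the largest connected component (a sketch of the
    post-processing described above)."""
    # Build a binary sphere of the given radius.
    grid = np.mgrid[-radius:radius + 1, -radius:radius + 1, -radius:radius + 1]
    sphere = (grid ** 2).sum(axis=0) <= radius ** 2

    # Morphological opening removes small, noisy detections.
    opened = ndimage.binary_dilation(
        ndimage.binary_erosion(implant, structure=sphere), structure=sphere)

    # Connected-component analysis: keep the largest blob.
    labels, n = ndimage.label(opened)
    if n == 0:
        return opened
    sizes = ndimage.sum(opened, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)
```

The opening removes isolated voxels smaller than the sphere, and the component selection discards any remaining spurious blobs away from the implant.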
Table 1. Our scores on the validation dataset.

Method                   Dice     HD-distance
Ours (Transposed Conv)   0.8363   10.6570
Ours (NN Upsampling)
We work with 100 data samples, split 5:1 into training and validation sets. We validate our approach by comparing the reconstructed implants to the ground truth using the Dice score and the Hausdorff distance. The validation results are presented in Table 1. We experimented with two variations of our 2D decoder model. In the first case, we trained the original
Fig. 3. Qualitative results: The first row depicts four rendered 3D volumes of defective scans from the test dataset. The second row shows the skulls reconstructed by our method from the corresponding defective skulls. The third row shows the corresponding cranial implants predicted by our method. We observe that our method generalizes well and accurately reconstructs the skulls to predict the cranial implants.

decoder, and in the second case, we replaced the NN upsampling layers of the 2D decoder with transposed convolution layers. We observe that the 2D decoder with NN upsampling layers performs significantly better than the decoder with transposed convolution layers. We attribute this to over-parameterization during the upsampling step. Since the image is binary in nature, a nearest-neighbor upsampling layer is sufficient for this task.
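Nearest-neighbor upsampling is indeed parameter-free: each pixel is simply repeated along both axes. A toy NumPy illustration on a binary slice:

```python
import numpy as np

def nn_upsample2d(x, s):
    """Parameter-free nearest-neighbor upsampling of a 2D array by an
    integer factor s: each pixel is repeated s times per axis."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)

slice_lr = np.array([[0, 1],
                     [1, 0]], dtype=np.uint8)   # toy binary slice
slice_hr = nn_upsample2d(slice_lr, 2)
```

Unlike a transposed convolution, this introduces no learnable weights, which is one way to read the over-parameterization argument above.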
Table 2. Our scores on the 100 test cases.

Method                 Dice     HD-distance
Baseline [12]          0.8555   5.1825
Ours (NN Upsampling)
Finally, we tested our model on the challenge test set and report the results for cases 000∼099 in Table 2. Our model outperforms the baseline method proposed in [12] in both the Dice score and the Hausdorff distance. We did not report the results on cases 100∼

Conclusion

We provide an efficient and compact solution for the AutoImplant 2020 challenge, which is suitable for fast and easy deployment. Our key innovation is the incorporation of a two-stage reconstruction policy, where the first stage predicts a coarse-scale implant and the second stage super-resolves it to high resolution. We achieve accurate implant prediction on the validation dataset. Our model is end-to-end in the high-resolution space and can thus serve as a baseline for developing more complex models that better learn anatomically invariant implant prediction.
Acknowledgement
Amirhossein Bayat is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research & innovation programme (GA637164, iBack, ERC-2014-STG). Suprosanna Shit is supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union's Horizon 2020 research & innovation programme (grant agreement ID: 765148).
References
1. Angelo, L., Di Stefano, P., Governi, L., Marzola, A., Volpe, Y.: A robust and automatic method for the best symmetry plane detection of craniofacial skeletons. Symmetry, 245 (2019)
2. Bayat, A., Sekuboyina, A., Paetzold, J.C., Payer, C., Stern, D., Urschler, M., Kirschke, J.S., Menze, B.H.: Inferring the 3D standing spine posture from 2D radiographs. arXiv preprint arXiv:2007.06612 (2020)
3. Bhowmik, A., Shit, S., Seelamantula, C.S.: Training-free, single-image super-resolution using a dynamic convolutional network. IEEE Signal Processing Letters (1), 85–89 (2017)
4. Chen, X., Xu, L., Li, X., Egger, J.: Computer-aided implant design for the restoration of cranial defects. Scientific Reports, 1–10 (2017)
5. Dai, A., Qi, C.R., Nießner, M.: Shape completion using 3D-encoder-predictor CNNs and shape synthesis. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6545–6554 (2016)
6. Egger, J., Gall, M., Tax, A., al, M., Zefferer, U., Li, X., von Campe, G., Schäfer, U., Schmalstieg, D., Chen, X.: Interactive reconstructions of cranial 3D implants under MeVisLab as an alternative to commercial planning software. PLoS ONE, 20 (2017)
7. Ezhov, I., Mot, T., Shit, S., Lipkova, J., Paetzold, J.C., Kofler, F., Navarro, F., Metz, M., Wiestler, B., Menze, B.: Real-time Bayesian personalization via a learnable brain tumor growth model. arXiv preprint arXiv:2009.04240 (2020)
8. Gall, M., Li, X., Chen, X., Schmalstieg, D., Egger, J.: Computer-aided planning and reconstruction of cranial 3D implants. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1179–1183 (2016)
9. Han, X., Li, Z., Huang, H., Kalogerakis, E., Yu, Y.: High-resolution shape completion using deep neural networks for global structure and local geometry inference. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 85–93 (2017)
10. Hu, X., Yan, Y., Ren, W., Li, H., Zhao, Y., Bayat, A., Menze, B.: Feedback graph attention convolutional network for medical image enhancement. arXiv preprint arXiv:2006.13863 (2020)
11. Husseini, M., Sekuboyina, A., Bayat, A., Menze, B.H., Loeffler, M., Kirschke, J.S.: Conditioned variational auto-encoder for detecting osteoporotic vertebral fractures. In: International Workshop and Challenge on Computational Methods and Clinical Applications for Spine Imaging, pp. 29–38. Springer (2019)
12. Li, J., Pepe, A., Gsaxner, C., von Campe, G., Egger, J.: A baseline approach for AutoImplant: the MICCAI 2020 cranial implant design challenge. arXiv preprint arXiv:2006.12449 (2020)
13. Li, J., Pepe, A., Gsaxner, C., Egger, J.: An online platform for automatic skull defect restoration and cranial implant design. arXiv preprint arXiv:2006.00980 (2020)
14. Marzola, A., Governi, L., Genitori, L., Mussa, F., Volpe, Y., Furferi, R.: A semi-automatic hybrid approach for defective skulls reconstruction. Computer-Aided Design and Applications, 190–204 (2019)
15. Morais, A., Egger, J., Alves, V.: Automated computer-aided design of cranial implants using a deep volumetric convolutional denoising autoencoder, pp. 151–160 (2019)
16. Navarro, F., Shit, S., Ezhov, I., Paetzold, J., Gafita, A., Peeken, J.C., Combs, S.E., Menze, B.H.: Shape-aware complementary-task learning for multi-organ segmentation. In: International Workshop on Machine Learning in Medical Imaging, pp. 620–627. Springer (2019)
17. Sarmad, M., Lee, H.J., Kim, Y.M.: RL-GAN-Net: A reinforcement learning agent controlled GAN network for real-time point cloud shape completion. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5891–5900 (2019)
18. Sekuboyina, A., Bayat, A., Husseini, M.E., Löffler, M., Rempfler, M., Kukačka, J., Tetteh, G., Valentinitsch, A., Payer, C., Urschler, M., et al.: VerSe: A vertebrae labelling and segmentation benchmark. arXiv preprint arXiv:2001.09193 (2020)
19. Stutz, D., Geiger, A.: Learning 3D shape completion under weak supervision. International Journal of Computer Vision, pp. 1–20 (2018)
20. Sung, M., Kim, V.G., Angst, R., Guibas, L.J.: Data-driven structural priors for shape completion. ACM Trans. Graph. 34