Medical image analysis | 2021

Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry

 
 
 
 

Abstract


BACKGROUND AND OBJECTIVE\nSurgical tool detection, segmentation, and 3D pose estimation are crucial components in Computer-Assisted Laparoscopy (CAL). The existing frameworks have two main limitations. First, they do not integrate all three components. Integration is critical; for instance, one should not attempt computing pose if detection is negative. Second, they have highly specific requirements, such as the availability of a CAD model. We propose an integrated and generic framework whose sole requirement for the 3D pose is that the tool shaft is cylindrical. Our framework makes the most of deep learning and geometric 3D vision by combining a proposed Convolutional Neural Network (CNN) with algebraic geometry. We show two applications of our framework in CAL: tool-aware rendering in Augmented Reality (AR) and tool-based 3D measurement.\n\n\nMETHODS\nWe name our CNN as ART-Net (Augmented Reality Tool Network). It has a Single Input Multiple Output (SIMO) architecture with one encoder and multiple decoders to achieve detection, segmentation, and geometric primitive extraction. These primitives are the tool edge-lines, mid-line, and tip. They allow the tool s 3D pose to be estimated by a fast algebraic procedure. The framework only proceeds if a tool is detected. The accuracy of segmentation and geometric primitive extraction is boosted by a new Full resolution feature map Generator (FrG). We extensively evaluate the proposed framework with the EndoVis and new proposed datasets. We compare the segmentation results against several variants of the Fully Convolutional Network (FCN) and U-Net. Several ablation studies are provided for detection, segmentation, and geometric primitive extraction. The proposed datasets are surgery videos of different patients.\n\n\nRESULTS\nIn detection, ART-Net achieves 100.0% in both average precision and accuracy. In segmentation, it achieves 81.0% in mean Intersection over Union (mIoU) on the robotic EndoVis dataset (articulated tool), where it outperforms both FCN and U-Net, by 4.5pp and 2.9pp, respectively. It achieves 88.2% in mIoU on the remaining datasets (non-articulated tool). In geometric primitive extraction, ART-Net achieves 2.45∘ and 2.23∘ in mean Arc Length (mAL) error for the edge-lines and mid-line, respectively, and 9.3 pixels in mean Euclidean distance error for the tool-tip. Finally, in terms of 3D pose evaluated on animal data, our framework achieves 1.87\xa0mm, 0.70\xa0mm, and 4.80\xa0mm mean absolute errors on the X,Y, and Z coordinates, respectively, and 5.94∘ angular error on the shaft orientation. It achieves 2.59\xa0mm and 1.99\xa0mm in mean and median location error of the tool head evaluated on patient data.\n\n\nCONCLUSIONS\nThe proposed framework outperforms existing ones in detection and segmentation. Compared to separate networks, integrating the tasks in a single network preserves accuracy in detection and segmentation but substantially improves accuracy in geometric primitive extraction. Overall, our framework has similar or better accuracy in 3D pose estimation while largely improving robustness against the very challenging imaging conditions of laparoscopy. The source code of our framework and our annotated dataset will be made publicly available at https://github.com/kamruleee51/ART-Net.

Volume 70
Pages \n 101994\n
DOI 10.1016/j.media.2021.101994
Language English
Journal Medical image analysis

Full Text