"Train one, Classify one, Teach one" -- Cross-surgery transfer learning for surgical step recognition
Daniel Neimark, Omri Bar, Maya Zohar, Gregory D. Hager, Dotan Asselmann
Proceedings of Machine Learning Research – Under Review:1–12, 2021. Full Paper – MIDL 2021 submission.
Daniel Neimark [email protected]
Omri Bar [email protected]
Maya Zohar [email protected]
Gregory D. Hager [email protected]
Dotan Asselmann [email protected]
Theator Inc., Palo Alto, CA, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, USA.
Editors:
Under Review for MIDL 2021
Abstract
Prior work demonstrated the ability of machine learning to automatically recognize surgical workflow steps from videos. However, these studies focused on only a single type of procedure. In this work, we analyze, for the first time, surgical step recognition on four different laparoscopic surgeries: Cholecystectomy, Right Hemicolectomy, Sleeve Gastrectomy, and Appendectomy. Inspired by the traditional apprenticeship model, in which surgical training is based on the Halstedian method, we paraphrase the “see one, do one, teach one” approach for the surgical intelligence domain as “train one, classify one, teach one”. In machine learning, this approach is often referred to as transfer learning. To analyze the impact of transfer learning across different laparoscopic procedures, we explore various time-series architectures and examine their performance on each target domain. We propose a Time-Series Adaptation Network (TSAN), an architecture optimized for transfer learning of surgical step recognition. In addition, we show how TSAN can be pre-trained using self-supervised learning on a Sequence Sorting task. Such pre-training enables TSAN to learn workflow steps of a new laparoscopic procedure type given only a small number of samples from the target procedure dataset. Our proposed architecture leads to better performance compared to other possible architectures, reaching over 90% accuracy when transferring from laparoscopic Cholecystectomy to the other three procedure types.
Keywords:
Surgical Intelligence, Surgical Transfer Learning, Surgical Step Recognition, Phase Recognition, Domain Adaptation, Deep Learning.
1. Introduction
Minimally invasive surgery (MIS) video analysis is steadily gaining acceptance for surgical competency assessment (Ritter et al., 2019; Feldman et al., 2020). As MIS is performed under visualization of endoscopic footage, the possibilities for AI-enabled computer-assisted surgery (CAS) applications are immense (Maier-Hein et al., 2017). Regardless of the use case, such video-based applications must serve a wide variety of surgical procedures in order to be relevant and actionable, meet surgeons' clinical needs, and provide them with value.

A variety of surgery-related video-analysis tasks have been explored in recent studies. Surgical step (phase) recognition (Bar et al., 2020; Twinanda et al., 2016; Zisimopoulos et al., 2018; Hashimoto et al., 2019), surgical tool detection and segmentation (Twinanda et al., 2016; Al Hajj et al., 2019; Choi et al., 2017; Ni et al., 2020; Jin et al., 2018), and surgical gesture and skill assessment (Gao et al., 2014; Ahmidi et al., 2017) are a few examples. However, these studies were developed and evaluated on only a single type of procedure and thus provide limited evidence of their broader applicability in the surgical domain.

Figure 1: The same step, Adhesiolysis, is viewed in different procedures. (A) Cholecystectomy, (B) Appendectomy, (C) Right Hemicolectomy, and (D) Sleeve Gastrectomy.

This study aims to address three key aspects which, taken together, provide insight into our practical ability to scale video analysis of surgery to multiple procedures while minimizing the need for large labeled datasets. First, we assess the potential of using self-supervised pre-training to reduce dependence on explicitly labeled data. Second, we investigate the effectiveness of transfer learning to move pre-trained models between different surgical procedures. Finally, we explore the impact of data size on adaptation capabilities. Taken together, our results suggest a practical and effective path to generalizing video analysis of surgery while minimizing the need for laborious fine-grained labeling.

We chose to focus on the foundational task of surgical step recognition – that is, parsing a procedure video into meaningful segments that represent the surgeon's workflow.
While previous studies have explored step recognition for a single type of surgical procedure, such as laparoscopic Cholecystectomy (Bar et al., 2020; Twinanda et al., 2016), Cataract surgery (Yu et al., 2019; Zisimopoulos et al., 2018), and laparoscopic Sleeve Gastrectomy (Hashimoto et al., 2019), they did not assess whether their methods would perform well if applied to other types of surgeries and did not examine the ability to adapt to new types of procedures.

Inspired by the traditional apprenticeship model, in which surgical training is based on the Halstedian method (Cameron, 1997), we paraphrase the “see one, do one, teach one” approach for the surgical intelligence domain as “train one, classify one, teach one”. In machine learning, this approach is often referred to as transfer learning. Transfer learning attempts to exploit a model that was pre-trained on one task and apply its knowledge when training on a different task, thus improving overall generalization (Goodfellow et al., 2016). It is especially useful when the target task's dataset is relatively small, as in the surgical domain. Transfer learning has proven to be a robust method in many ML challenges. Specifically, in the computer vision domain it enables achieving state-of-the-art results in object detection (Girshick et al., 2014; Girshick, 2015), image segmentation (Long et al., 2015; He et al., 2017), face identification (Taigman et al., 2015), and video action recognition (Carreira and Zisserman, 2017). However, transfer learning tends to work better when the source task is related to the target task (Yosinski et al., 2014).

Figure 1 shows the same surgical step, Adhesiolysis, in four different procedure types. In Adhesiolysis, the goal is to remove adhesions. In these procedures, the adhesions are abdominal, and their removal can be done with different tools. While the anatomy viewed changes between procedures, the action remains the same. Thus, we argue that adapting knowledge from one procedure to another is beneficial.

The standard approach for step recognition in previous studies is to train two models for each procedure type: a deep ConvNet that extracts visual features and a time-series model that processes the features sequentially (Bar et al., 2020; Zisimopoulos et al., 2018; Hashimoto et al., 2019). The ConvNets are usually first trained on non-surgical datasets, e.g., ImageNet (Deng et al., 2009) and Kinetics-400 (Kay et al., 2017), but the obvious approach of transferring knowledge across different procedures has never been assessed before.

In this study, we suggest a new approach. We use a 3D ConvNet (Carreira and Zisserman, 2017; Wang et al., 2018), pre-trained for step recognition on Cholecystectomy (Bar et al., 2020). This model is used to extract feature representations from videos of three different laparoscopic procedures: Right Hemicolectomy, Sleeve Gastrectomy, and Appendectomy. We then explore various architectures for the time-series model and focus on finding the best one for surgical domain adaptation. We also suggest a self-supervised initialization method that improves the performance of our time-series model. Finally, we compare our findings with the traditional approach described above.
2. Methods
Our overall approach involves (1) extracting feature representations from videos using a 3D ConvNet; and (2) training a time-series model on these features to predict a step label for each second of video.
We consider several time-series model architectures below and explore two main variants to process the temporal dimension: (1) 1D convolution layers and (2) recurrent layers. As the main contribution, we found a specific combination of the two that yields optimal performance. In what follows, we describe the architectural details. In all cases, the final classification layer is a fully connected layer, followed by a softmax function that predicts, for each second, a single surgical step.
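The shared classification layer described above can be sketched as follows. This is an illustrative PyTorch snippet, not the authors' code; the hidden dimension of 128 and the seven-step label space follow values given later in the paper, and the video length is arbitrary.

```python
import torch
import torch.nn as nn

NUM_STEPS = 7     # e.g., the seven workflow steps per procedure (Appendix A)
HIDDEN_DIM = 128  # output dimension of the preceding time-series model

# Fully connected layer followed by softmax, applied independently
# to the representation of every second of the video.
head = nn.Sequential(
    nn.Linear(HIDDEN_DIM, NUM_STEPS),
    nn.Softmax(dim=-1),
)

L = 3600  # a one-hour video, one representation per second
features = torch.randn(L, HIDDEN_DIM)
probs = head(features)        # (L, NUM_STEPS) per-second step probabilities
preds = probs.argmax(dim=-1)  # one predicted step label per second
```

Because the head is applied per second, the same module works for videos of any length.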
1D Convolution Layers (Conv1D).
As short-term context is important when predicting a step for each second, we use standard 1D convolution layers and apply them along the temporal dimension. In our experiments, we explore different kernel sizes in order to observe different temporal contexts (Han et al., 2020).
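A temporal 1D convolution branch of this kind might look as follows. This is a sketch under assumed hyper-parameters: the feature dimension (2049), output dimension (128), and kernel sizes (5, 25, 39) are taken from values reported later in the paper, and the "same" padding choice mirrors the statement that output length is matched to input length.

```python
import torch
import torch.nn as nn

N_FEATURES = 2049  # feature dimension produced by the 3D ConvNet
OUT_DIM = 128      # output dimension of the time-series branches
K = 25             # temporal kernel size; K=5 and K=39 are also explored

# Treat the per-second feature dimension as channels and convolve over time.
conv = nn.Conv1d(N_FEATURES, OUT_DIM, kernel_size=K, padding=(K - 1) // 2)

L = 600  # a 10-minute video, one feature vector per second
x = torch.randn(1, N_FEATURES, L)  # (batch, channels, time)
y = conv(x)                        # output length equals L via padding
```

Odd kernel sizes make the symmetric padding `(K - 1) // 2` keep the output length exactly equal to the input length.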
Long Short-Term Memory (LSTM).
While Conv1D should be able to learn the context in a short temporal region of interest, it still lacks a larger scope of view and cannot link distant information. Hence, most recent studies use LSTM networks as their time-series model. We also explore the capabilities of LSTM to transfer knowledge and use a bidirectional LSTM in our experiments, thus not assuming any causality constraints.

Figure 2: Time-Series Adaptation Network (TSAN) architecture. Combining three Conv1D layers with two LSTMs, followed by a fully connected classification layer. φ_i indicates the feature representations extracted from the 3D ConvNet. K denotes the kernel size of the 1D convolution layers.

Time-Series Adaptation Network (TSAN).
Inspired by Ghosh and Kristensson (2017), our architecture fuses three Conv1D layers and two LSTM networks into a single architecture. The three Conv1D layers operate in parallel to a single bidirectional LSTM. The outputs of all four are concatenated and fed to an additional bidirectional LSTM, followed by a fully connected layer for classification (Figure 2).
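The parallel-branch fusion described above can be sketched in PyTorch. This is an illustrative reconstruction from the text and Figure 2, not the authors' implementation; the exact layer sizes (input 2049, hidden 128, kernels 5/25/39, seven output classes) are assumptions based on values stated elsewhere in the paper.

```python
import torch
import torch.nn as nn

class TSAN(nn.Module):
    """Sketch of TSAN (Figure 2): three Conv1D branches in parallel with a
    bidirectional LSTM; their outputs are concatenated, fed to a second
    bidirectional LSTM, then classified per second."""

    def __init__(self, in_dim=2049, hidden=128, num_steps=7,
                 kernels=(5, 25, 39)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, hidden, k, padding=(k - 1) // 2)
            for k in kernels)
        self.lstm1 = nn.LSTM(in_dim, hidden, bidirectional=True,
                             batch_first=True)
        # three conv branches (hidden each) + BiLSTM output (2 * hidden)
        self.lstm2 = nn.LSTM(3 * hidden + 2 * hidden, hidden,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_steps)

    def forward(self, x):            # x: (batch, L, in_dim)
        conv_in = x.transpose(1, 2)  # Conv1d wants (batch, channels, L)
        branches = [c(conv_in).transpose(1, 2) for c in self.convs]
        lstm_out, _ = self.lstm1(x)
        fused = torch.cat(branches + [lstm_out], dim=-1)
        out, _ = self.lstm2(fused)
        return self.fc(out)          # (batch, L, num_steps) step logits

model = TSAN()
logits = model(torch.randn(1, 120, 2049))  # a 2-minute video
```

Concatenating the branches along the feature dimension lets the second BiLSTM weigh short-range convolutional context against long-range recurrent context at every second.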
Sequence Sorting (SeSo).
Compared to the other architectures, TSAN is a deeper network with more parameters to train. The fact that the target surgical datasets are relatively small, especially compared to other video benchmarks (Kay et al., 2017), led us to explore a better initialization technique as an alternative to random initialization. We establish our initialization approach using an analogy to solving jigsaw puzzles (Noroozi and Favaro, 2016). In the temporal domain, this can be structured as correctly reassembling the shuffled segments of a video. We thus formulate a self-supervised training method as an initial task for step recognition.

More concretely, we split a video into nine segments and shuffle their order randomly. Then, we process each segment's feature vectors separately with a time-series model. We concatenate all nine segments' last-layer outputs and feed the resulting representation to a classification head that predicts the correct order. We use SeSo to pre-train both the TSAN and LSTM networks. We then remove the classification layer and fine-tune the networks on the step recognition task.
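One plausible sketch of the SeSo pretext task is shown below. Framing "predict the correct order" as classification over a fixed subset of permutations is an assumption borrowed from the jigsaw-puzzle formulation of Noroozi and Favaro (2016); the encoder, its sizes, and the number of candidate permutations are all illustrative, not the authors' exact setup.

```python
import itertools
import random
import torch
import torch.nn as nn

NUM_SEGMENTS = 9
# Fixed subset of candidate permutations (an assumption; 9! is too many).
PERMS = list(itertools.islice(
    itertools.permutations(range(NUM_SEGMENTS)), 100))

encoder = nn.LSTM(2049, 128, batch_first=True)  # shared per-segment encoder
sort_head = nn.Linear(NUM_SEGMENTS * 128, len(PERMS))

features = torch.randn(900, 2049)               # one 15-minute video
segments = torch.chunk(features, NUM_SEGMENTS)  # nine equal segments
label = random.randrange(len(PERMS))
shuffled = [segments[i] for i in PERMS[label]]

# Encode each shuffled segment separately; keep its last-layer output.
last = [encoder(s.unsqueeze(0))[0][:, -1] for s in shuffled]
logits = sort_head(torch.cat(last, dim=-1))     # (1, len(PERMS))
loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
```

After pre-training, `sort_head` would be discarded and the encoder fine-tuned on step recognition, mirroring the procedure described in the text.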
The training process of the 3D ConvNets is based on the work of Bar et al. (2020). Each video's features form a matrix of size L × N, where L is the length of the video (in seconds) and N is the feature dimension size (N = 2049 in all our experiments). For the Conv1D, we explore three temporal kernel sizes, K = 5, 25, and 39, and the output's length is matched to the input by padding, based on the kernel size. The output dimension of both the 1D convolution layers and the LSTM hidden layer is set to 128.

Table 1: Number of samples per subset for each of the target datasets.
                     Total  Training  Validation  Test
Right Hemicolectomy    205       123          31    51
Sleeve Gastrectomy     229       138          34    57
Appendectomy           852       511         128   213
Each network architecture was trained for 100 epochs. We use SGD and set the learning rate to 10− for the Conv1D networks and 10− for the LSTM and TSAN networks. The loss function is the negative log-likelihood loss. Since the features are extracted from the raw videos in advance, applying augmentations like those used on images is not feasible. Thus, to apply some form of data augmentation and avoid overfitting, we apply two types of augmentation to the input feature matrix. First, we detect out-of-body and non-relevant video segments by applying the method described by Zohar et al. (2020). We then mark each video second as either relevant or not, and randomly remove the non-relevant seconds from training with a probability of 0.5. We also use Dropout of 0.5, both on the input matrix and on the intermediate layers.
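The relevance-based augmentation described above might be sketched as follows. The relevance mask here is a hypothetical stand-in for the output of the out-of-body detector of Zohar et al. (2020), and the interpretation that only non-relevant seconds are dropped with probability 0.5 is our reading of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

L, N = 600, 2049
features = rng.normal(size=(L, N))  # one video's pre-extracted feature matrix
relevant = rng.random(L) < 0.9      # hypothetical per-second relevance flags

# Keep all relevant seconds; keep each non-relevant second with p = 0.5.
keep = relevant | (rng.random(L) < 0.5)
augmented = features[keep]          # shortened training sequence
```

Because the mask is resampled each epoch, every pass sees a slightly different temporal subsampling of the same video, which serves the same role as image-space augmentation.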
3. Results
All datasets were randomly split into three subsets: training, validation, and test, with a ratio of 25% for the test set and 20% of the remaining videos for validation (Table 1). We provide a detailed description of the datasets and the step workflow definitions in Appendix A. The annotation process is identical for all procedure types. Each video undergoes a rigorous annotation process by two different annotation specialists. The team of annotators underwent thorough training on labeling the workflow steps. The validity of the annotation process was confirmed in a previous study (Korndorffer Jr et al., 2020), in which an unbiased group of surgeons reviewed large portions of the Cholecystectomy cases and reported high agreement with our annotation method.
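The split ratios above can be checked against the counts in Table 1 with a short sketch. The helper below is illustrative (the actual random seed and shuffling procedure are unknown), but applying 25% test and then 20%-of-remainder validation with rounding reproduces the per-subset counts for all three target datasets.

```python
import random

def split_dataset(videos, test_frac=0.25, val_frac=0.20, seed=0):
    """Randomly split videos into (train, val, test): 25% of all videos go
    to test, then 20% of the remainder to validation. Hypothetical helper."""
    videos = list(videos)
    random.Random(seed).shuffle(videos)
    n_test = round(len(videos) * test_frac)
    test, rest = videos[:n_test], videos[n_test:]
    n_val = round(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test

# Right Hemicolectomy has 205 videos in total (Table 1).
train, val, test = split_dataset(range(205))
print(len(train), len(val), len(test))  # 123 31 51, matching Table 1
```

The same arithmetic on 229 and 852 videos yields 138/34/57 and 511/128/213, matching the Sleeve Gastrectomy and Appendectomy rows.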
We start by searching for an optimized architecture for surgical transfer learning. As a baseline, we use the traditional technique of training a 3D ConvNet on each target dataset, followed by training a bidirectional LSTM network on the resulting features. This is fully labeled training on each type of surgery – no surgical transfer learning is applied at this stage in the process.

We then evaluate several time-series models using Cholecystectomy features from a pre-trained 3D ConvNet. In Table 2, we evaluate the various models described in Sec. 2.1. We report the test set accuracy by measuring the number of seconds in all test videos that are labeled correctly by each model. At a high level, we see that TSAN, the combination of two LSTMs and three Conv1Ds, pre-trained with our self-supervised approach (Sec. 2.1), outperformed all other methods.

Table 2: Comparing different time-series model architectures. The last column is the result of averaging the accuracy over all three target datasets: Right Hemicolectomy (RH), Sleeve Gastrectomy (SG), and Appendectomy (APPY). The first row is the result of using the standard approach without surgical transfer learning. The other rows show the development of our suggested architecture. K denotes the kernel size of the 1D convolution layers (C1D). L denotes the number of LSTM layers.
We also observe two other interesting results from the comparisons shown in Table 2. First, our transfer learning approach produces better results than the traditional training method. Second, if one does apply transfer learning in the surgical domain, our method improves the results by about 2% compared to a single LSTM network.
To further explore the generalization of our approach and its future usability for rapidly achieving high performance on smaller datasets, we study the impact of (labeled) training set size on the final accuracy results. We chose to focus on the two variants that gave the best results for transfer learning: the LSTM network and our TSAN. Both are pre-trained using the SeSo method applied to features from Cholecystectomy.

To understand accuracy as a function of dataset size, we split the videos in the training sets of Right Hemicolectomy and Sleeve Gastrectomy into smaller subsets of 5, …, 50, and all training samples (all equals 123 and 138, respectively). For Appendectomy, as it is a larger dataset, we added two additional subsets of 150 and 350 (all equals 511). The subsets are randomly selected and constructed so that each subset extends the previous smaller one. The test set is kept the same to enable a fair comparison. In Figure 3, we show the accuracy values when training with different training set sizes. Our approach with SeSo generalized better for the majority of training set sizes. Furthermore, we see consistently high accuracy achieved between 100 and 200 videos.

Figure 3: Evaluating the impact of the training set size on model generalization. (A) Right Hemicolectomy, (B) Sleeve Gastrectomy, and (C) Appendectomy. We train LSTM-SeSo and TSAN-SeSo using smaller subsets of the original training set and measure the results on a fixed test set. (D) The validation accuracy curve when training the Sequence Sorting task vs. the number of samples used during training. The model trained using the Cholecystectomy data converges much faster compared to all other procedure types.

The SeSo initialization helps improve the results of both the single LSTM and TSAN architectures. Especially for TSAN, this type of pre-training yields the best-performing architecture compared to the other possibilities (Table 2). To better understand the impact of SeSo on surgical transfer learning, we explore the effect of pre-training the time-series model on the source (Cholecystectomy) or the target datasets. We trained four TSAN models on the sorting task using all four datasets. Then, we fine-tuned the models on the step recognition task. We measure the accuracy on each of the three target datasets, first when pre-training using the source dataset (Cholecystectomy), and second when pre-training using the target dataset. Table 3 shows only a small improvement when using the source dataset to train the SeSo task.
While this is surprising, it is likely due to the fact that the Cholecystectomy dataset is larger than the others. However, it supports the notion that self-supervised initialization can effectively exploit unlabeled data, even when it does not come from the target dataset. In Figure 3.D, we plot the validation accuracy during the SeSo task training and demonstrate that the model also converges much faster on the Cholecystectomy dataset.

Table 3: Comparing step recognition accuracy results on the three target datasets after training the Sequence Sorting initialization task on either the source dataset (Cholecystectomy) or the target dataset.

Step training dataset    SeSo training dataset    Accuracy
Right Hemicolectomy      Right Hemicolectomy      94.5
Right Hemicolectomy      Cholecystectomy
Sleeve Gastrectomy       Sleeve Gastrectomy       94.2
Sleeve Gastrectomy       Cholecystectomy
Appendectomy             Appendectomy             89.9
Appendectomy             Cholecystectomy
4. Conclusion
This work suggests a new approach to training surgical step recognition models by using surgical transfer learning. We show, for the first time, an analysis of transfer learning between different surgical procedures, and our findings demonstrate that it is possible to transfer knowledge from one procedure to another, even when using relatively small target datasets. It is also the first study to explore surgical step recognition on Right Hemicolectomy and Appendectomy. To facilitate robust domain adaptation, we explore various architectures and introduce a new time-series architecture, TSAN, optimized for model adaptation in the surgical domain. Moreover, we present a Sequence Sorting task as a pre-initialization method. The main advantage of this approach, besides improving TSAN performance when transferring knowledge from one surgery type to another, is the fact that it is trained with a self-supervised method.

Future work should explore how mutual learning of surgical step recognition, trained on several procedures simultaneously, will perform. Also, the ideas of domain adaptation presented in this study could be applied to other surgery-related tasks, such as event detection, and it would be interesting to test our findings on such tasks.

Although significant progress has been made in recent years in the field of surgical intelligence, the next leap forward must focus on the practical application of artificial intelligence in the surgical domain. Solid evidence that these technologies can be generalized to various surgical procedures is essential for surgeons to embrace them as part of their daily routine, both inside and outside the operating room. We believe that surgical transfer learning and the ability to transfer knowledge between models in the surgical domain are key facilitators and will expedite the development of computer-assisted surgery in a wide range of surgical procedures. This study is a step in that direction.
References
Narges Ahmidi, Lingling Tao, Shahin Sefati, Yixin Gao, Colin Lea, Benjamin Bejar Haro, Luca Zappella, Sanjeev Khudanpur, René Vidal, and Gregory D Hager. A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Transactions on Biomedical Engineering, 64(9):2025–2041, 2017.

Hassan Al Hajj, Mathieu Lamard, Pierre-Henri Conze, Soumali Roychowdhury, Xiaowei Hu, Gabija Maršalkaitė, Odysseas Zisimopoulos, Muneer Ahmad Dedmari, Fenqiang Zhao, Jonas Prellberg, et al. CATARACTS: Challenge on automatic tool annotation for cataract surgery. Medical Image Analysis, 52:24–41, 2019.

Omri Bar, Daniel Neimark, Maya Zohar, Gregory D Hager, Ross Girshick, Gerald M Fried, Tamir Wolf, and Dotan Asselmann. Impact of data on generalization of AI for surgical intelligence applications. Scientific Reports, 10(1):1–12, 2020.

John L Cameron. William Stewart Halsted. Our surgical heritage. Annals of Surgery, 225(5):445, 1997.

Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.

Bareum Choi, Kyungmin Jo, Songe Choi, and Jaesoon Choi. Surgical-tools detection based on convolutional neural network in laparoscopic robot-assisted surgery. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1756–1759. IEEE, 2017.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

Liane S Feldman, Aurora D Pryor, Aimee K Gardner, Brian J Dunkin, Linda Schultz, Michael M Awad, and E Matthew Ritter. SAGES video-based assessment (VBA) program: a vision for life-long learning for surgeons. Surgical Endoscopy, 34:3285–3288, 2020.

Yixin Gao, S Swaroop Vedula, Carol E Reiley, Narges Ahmidi, Balakrishnan Varadarajan, Henry C Lin, Lingling Tao, Luca Zappella, Benjamín Béjar, David D Yuh, et al. JHU-ISI gesture and skill assessment working set (JIGSAWS): A surgical activity dataset for human motion modeling. In MICCAI Workshop: M2CAI, volume 3, page 3, 2014.

Shaona Ghosh and Per Ola Kristensson. Neural networks for text correction and completion in keyboard decoding. arXiv preprint arXiv:1709.06429, 2017.

Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, and Yonghui Wu. ContextNet: Improving convolutional neural networks for automatic speech recognition with global context. arXiv preprint arXiv:2005.03191, 2020.

Daniel A Hashimoto, Guy Rosman, Elan R Witkowski, Caitlin Stafford, Allison J Navarette-Welton, David W Rattner, Keith D Lillemoe, Daniela L Rus, and Ozanan R Meireles. Computer vision analysis of intraoperative video: Automated recognition of operative steps in laparoscopic sleeve gastrectomy. Annals of Surgery, 270(3):414–421, 2019.

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.

Amy Jin, Serena Yeung, Jeffrey Jopling, Jonathan Krause, Dan Azagury, Arnold Milstein, and Li Fei-Fei. Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 691–699. IEEE, 2018.

Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, et al. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017.

James R Korndorffer Jr, Mary T Hawn, David A Spain, Lisa M Knowlton, Dan E Azagury, Aussama K Nassar, James N Lau, Katherine D Arnow, Amber W Trickey, and Carla M Pugh. Situating artificial intelligence in surgery: a focus on disease severity. Annals of Surgery, 272(3):523–528, 2020.

Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

Lena Maier-Hein, Swaroop S Vedula, Stefanie Speidel, Nassir Navab, Ron Kikinis, Adrian Park, Matthias Eisenmann, Hubertus Feussner, Germain Forestier, Stamatia Giannarou, et al. Surgical data science for next-generation interventions. Nature Biomedical Engineering, 1(9):691–696, 2017.

Zhen-Liang Ni, Gui-Bin Bian, Guan-An Wang, Xiao-Hu Zhou, Zeng-Guang Hou, Xiao-Liang Xie, Zhen Li, and Yu-Han Wang. BARNet: Bilinear attention network with adaptive receptive field for surgical instrument segmentation. arXiv preprint arXiv:2001.07093, 2020.

Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pages 69–84. Springer, 2016.

E Matthew Ritter, Aimee K Gardner, Brian J Dunkin, Linda Schultz, Aurora D Pryor, and Liane Feldman. Video-based assessment for laparoscopic fundoplication: initial development of a robust tool for operative performance assessment. Surgical Endoscopy, pages 1–8, 2019.

Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, and Lior Wolf. Web-scale training for face identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2746–2754, 2015.

Andru P Twinanda, Sherif Shehata, Didier Mutter, Jacques Marescaux, Michel De Mathelin, and Nicolas Padoy. EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Transactions on Medical Imaging, 36(1):86–97, 2016.

Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7794–7803, 2018.

Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.

Felix Yu, Gianluca Silva Croso, Tae Soo Kim, Ziang Song, Felix Parker, Gregory D Hager, Austin Reiter, S Swaroop Vedula, Haider Ali, and Shameema Sikder. Assessment of automated identification of phases in videos of cataract surgery using machine learning and deep learning techniques. JAMA Network Open, 2(4):e191860–e191860, 2019.

Odysseas Zisimopoulos, Evangello Flouty, Imanol Luengo, Petros Giataganas, Jean Nehme, Andre Chow, and Danail Stoyanov. DeepPhase: surgical phase recognition in cataracts videos. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 265–272. Springer, 2018.

Maya Zohar, Omri Bar, Daniel Neimark, Gregory D Hager, and Dotan Asselmann. Accurate detection of out of body segments in surgical video using semi-supervised learning. In Medical Imaging with Deep Learning, pages 923–936. PMLR, 2020.

Appendix A. Detailed datasets description
The datasets' characteristics are summarized in Table 4.
Right Hemicolectomy.
This dataset contains 205 videos curated from four different medical centers. Each second in the video was annotated and categorized into one of seven clinically relevant surgical steps: (1) Preparation, (2) Adhesiolysis, (3) Mobilization and Dissection, (4) Specimen Packaging, (5) Anastomosis, (6) Specimen Retrieval, and (7) Final Inspection.
Sleeve Gastrectomy.
This dataset contains 229 videos curated from two medical centers. Each second in the video was annotated and categorized into one of seven clinically relevant surgical steps: (1) Preparation, (2) Adhesiolysis, (3) Dissection of Greater Curvature, (4) Gastric Transection, (5) Reinforcement of Staple Line, (6) Specimen Extraction, and (7) Final Inspection.
Appendectomy.