
Publication


Featured research published by Stefan Petscharnig.


Multimedia Tools and Applications | 2018

Learning laparoscopic video shot classification for gynecological surgery

Stefan Petscharnig; Klaus Schöffmann

Videos of endoscopic surgery are used for the education of medical experts, analysis in medical research, and documentation in everyday clinical life. Hand-crafted image descriptors lack the capability of semantic classification of surgical actions and video shots of anatomical structures. In this work, we investigate how well single-frame convolutional neural networks (CNNs) perform semantic shot classification in gynecologic surgery. Together with medical experts, we manually annotate hours of raw endoscopic gynecologic surgery videos showing endometriosis treatment and myoma resection of over 100 patients. The cleaned ground-truth dataset comprises 9 hours of annotated video material (from 111 different recordings). We use the well-known CNN architectures AlexNet and GoogLeNet and train these architectures from scratch for both surgical actions and anatomy. Furthermore, we extract high-level features from AlexNet with weights from a pre-trained model from the Caffe model zoo and feed them to an SVM classifier. Our evaluation shows that we reach an average recall of 0.697 and 0.515 for the classification of anatomical structures and surgical actions, respectively, using off-the-shelf CNN features. Using GoogLeNet, we achieve a mean recall of 0.782 and 0.617, respectively. With AlexNet, the achieved recall is 0.615 for anatomical structures and 0.469 for surgical action classification. The main conclusion of our work is that advances in general image classification methods transfer to the domain of endoscopic surgery videos in gynecology. This is relevant because this domain differs from natural images, e.g., it is characterized by smoke, reflections, and a limited range of colors.
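The recall figures above are macro-averages over classes. A minimal sketch of that metric, with illustrative toy labels (the class encoding and data below are assumptions for demonstration, not from the paper):

```python
import numpy as np

def macro_recall(y_true, y_pred):
    """Mean per-class recall: for each class, the fraction of its
    ground-truth frames that the classifier labeled correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for cls in np.unique(y_true):
        mask = y_true == cls
        recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))

# Illustrative toy labels (e.g., 0 = one anatomical class, 1 = another).
truth = [0, 0, 0, 0, 1, 1]
pred  = [0, 0, 0, 1, 1, 0]
print(macro_recall(truth, pred))  # 0.625: (3/4 + 1/2) / 2
```

Macro-averaging weights each class equally, which matters here because surgical classes are typically imbalanced.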


Conference on Multimedia Modeling | 2017

Deep Learning for Shot Classification in Gynecologic Surgery Videos

Stefan Petscharnig; Klaus Schöffmann

In the last decade, advances in endoscopic surgery have resulted in vast amounts of video data, which are used for documentation, analysis, and education purposes. In order to find video scenes relevant for the aforementioned purposes, physicians manually search and annotate hours of endoscopic surgery videos. This process is tedious and time-consuming, thus motivating the (semi-)automatic annotation of such surgery videos. In this work, we investigate whether the single-frame model for semantic surgery shot classification is feasible and useful in practice. We approach this problem by further training AlexNet, an already pre-trained CNN architecture, and are thus able to transfer knowledge gathered from the ImageNet database to the medical use case of shot classification in endoscopic surgery videos. We annotate hours of endoscopic surgery videos for training and testing data. Our results imply that the CNN-based single-frame classification approach is able to provide useful suggestions to medical experts while they annotate video scenes, thereby improving the annotation process. Future work shall consider the evaluation of more sophisticated classification methods incorporating the temporal video dimension, which is expected to improve on the baseline evaluation done in this work.
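A single-frame model labels each frame independently; a shot-level label can then be derived by aggregating the frame predictions. The paper does not specify a particular aggregation, so the majority vote below is a hedged illustration of one plausible scheme:

```python
from collections import Counter

def shot_label(frame_predictions):
    """Aggregate per-frame class predictions into one shot label
    by majority vote (ties broken by first-seen order)."""
    counts = Counter(frame_predictions)
    return counts.most_common(1)[0][0]

# Hypothetical frame-level CNN outputs for one video shot.
frames = ["cutting", "cutting", "coagulation", "cutting"]
print(shot_label(frames))  # cutting
```

Incorporating the temporal dimension, as the abstract suggests for future work, would replace this simple vote with a sequence model.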


Conference on Multimedia Modeling | 2017

Collaborative Feature Maps for Interactive Video Search

Klaus Schoeffmann; Manfred Jürgen Primus; Bernd Muenzer; Stefan Petscharnig; Christof Karisch; Qing Xu; Wolfgang Huerst

This extended demo paper summarizes the interface we used for the Video Browser Showdown (VBS) 2017 competition, in which visual and textual known-item search (KIS) tasks, as well as ad-hoc video search (AVS) tasks, need to be solved interactively in a 600-hour video archive. To this end, we propose a very flexible distributed video search system that combines many ideas from related work in a novel and collaborative way, such that several users can work together and explore the video archive in a complementary manner. The main interface is a perspective Feature Map, which shows keyframes of shots arranged according to a selected content-similarity feature (e.g., color, motion, or semantic concepts). This Feature Map is accompanied by additional views, which allow users to search and filter according to a particular content feature. For collaboration among several users, we provide a cooperative heatmap that shows a synchronized view of the inspection actions of all users. Moreover, we use collaborative re-ranking of shots (in specific views) based on the results retrieved by other users.


IEEE Transactions on Multimedia | 2017

Statistically Indifferent Quality Variation: An Approach for Reducing Multimedia Distribution Cost for Adaptive Video Streaming Services

Benjamin Rainer; Stefan Petscharnig; Christian Timmerer; Hermann Hellwagner

Forecasts predict that Internet traffic will continue to grow in the near future, and a huge share of this traffic is caused by multimedia streaming. The quality of experience (QoE) of such streaming services is an important aspect, and in most cases the goal is to maximize the bit rate, which in some cases conflicts with the requirements of both consumers and providers. For example, in mobile environments users may prefer a lower bit rate to stay within their data plan. Likewise, providers aim at minimizing bandwidth usage in order to reduce costs by transmitting less data to users while maintaining a high QoE. Today's adaptive video streaming services try to serve users with the highest bit rates, which consequently results in high QoE. In practice, however, some of these high bit rate representations may not differ significantly in terms of perceived video quality from lower bit rate representations. In this paper, we present a novel approach to determine the statistically indifferent quality variation of adjacent video representations for adaptive video streaming services by adopting standard objective quality metrics and existing QoE models. In particular, whenever the quality variation between adjacent representations is imperceptible from a statistical point of view, the representation with the higher bit rate can be substituted with a lower bit rate representation. As expected, this approach results in savings with respect to bandwidth consumption while still providing a high QoE for users. The approach is evaluated subjectively with a crowdsourcing study. Additionally, we highlight the benefits of our approach by providing a case study that extrapolates possible savings for providers.
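The core idea, dropping a representation when its quality ratings are statistically indistinguishable from those of the next-lower bit rate, can be sketched as follows. The Welch t-statistic and the 2.0 cut-off are illustrative stand-ins for the objective metrics and QoE models the paper actually adopts:

```python
import math

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def prune_indifferent(reps, t_crit=2.0):
    """reps: list of (bitrate_kbps, per-user quality scores), sorted by
    bitrate. Drop a representation when its ratings are statistically
    indifferent from the last kept (lower-bitrate) representation."""
    kept = [reps[0]]
    for rep in reps[1:]:
        if abs(welch_t(rep[1], kept[-1][1])) >= t_crit:
            kept.append(rep)  # perceptibly different: keep it
    return [bitrate for bitrate, _ in kept]

# Hypothetical ratings: 2000 kbps is indistinguishable from 1000 kbps.
reps = [
    (1000, [3.1, 3.0, 3.2, 2.9, 3.1]),
    (2000, [3.2, 3.1, 3.0, 3.2, 3.0]),
    (4000, [4.4, 4.5, 4.3, 4.6, 4.4]),
]
print(prune_indifferent(reps))  # [1000, 4000]
```

The provider then serves only the kept ladder, saving the bandwidth of the pruned representations.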


ACM Multimedia | 2017

Real-Time Image-based Smoke Detection in Endoscopic Videos

Andreas Leibetseder; Manfred Jürgen Primus; Stefan Petscharnig; Klaus Schoeffmann

The nature of endoscopy as a type of minimally invasive surgery (MIS) requires surgeons to perform complex operations by merely inspecting a live camera feed. Inherently, a successful intervention depends upon ensuring proper working conditions, such as skillful camera handling, adequate lighting, and the removal of confounding factors such as fluids or smoke. The latter is an undesirable byproduct of cauterizing tissue that not only constitutes a health hazard for the medical staff and the treated patients, but can also considerably obstruct the operating physician's field of view. Therefore, as a standard procedure, the gaseous matter is evacuated using specialized smoke suction systems that are typically activated manually whenever considered appropriate. We argue that image-based smoke detection can be employed to make this decision, while also serving as a useful indicator of relevant scenes in post-procedure analyses. This work continues our previously conducted studies utilizing pre-trained convolutional neural networks (CNNs) and threshold-based saturation analysis. Specifically, we explore further methodologies for comparison, and we provide and evaluate a public dataset comprising over 100K smoke/non-smoke images extracted from the Cholec80 dataset, which is composed of 80 different cholecystectomy procedures. Having applied deep learning to merely 20K images of a custom dataset, we achieve Receiver Operating Characteristic (ROC) curves enclosing areas of over 0.98 for custom datasets and over 0.77 for the public dataset. Surprisingly, a fixed threshold for saturation-based histogram analysis still yields areas of over 0.78 and 0.75.
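The saturation-analysis baseline rests on a simple observation: smoke desaturates the image, so a large share of low-saturation pixels hints at smoke. A minimal sketch of that idea, with illustrative thresholds (not the values tuned in the paper):

```python
import numpy as np

def saturation(rgb):
    """Per-pixel HSV saturation for an H x W x 3 RGB array in [0, 1]."""
    mx = rgb.max(axis=2)
    mn = rgb.min(axis=2)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)

def looks_smoky(rgb, sat_thresh=0.25, pixel_share=0.6):
    """Flag a frame as smoky when more than `pixel_share` of its
    pixels fall below the saturation threshold."""
    low = saturation(rgb) < sat_thresh
    return bool(low.mean() > pixel_share)

# Synthetic frames: a grayish (desaturated) one and a reddish one.
gray = np.full((8, 8, 3), 0.7)
red = np.zeros((8, 8, 3))
red[..., 0] = 0.9
print(looks_smoky(gray), looks_smoky(red))  # True False
```

Such a detector is cheap enough to run per frame in real time, which is why it remains a useful baseline next to the CNN models.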


Quality of Multimedia Experience | 2015

Is one second enough? Evaluating QoE for inter-destination multimedia synchronization using human computation and crowdsourcing

Benjamin Rainer; Stefan Petscharnig; Christian Timmerer; Hermann Hellwagner

Modern-age technology enables us to consume multimedia for enjoyment and as a social experience. The traditional way of consuming multimedia together (e.g., with family or friends in the living room) is being superseded by a location-independent scenario in which geographically distributed users consume the same content while having a real-time communication channel among each other. Inter-Destination Multimedia Synchronization (IDMS) is the tool of choice for providing users with a high-quality shared multimedia experience. In this paper, we investigate the influence of asynchronism when multimedia content is consumed together by geographically distributed users. In particular, we adopt the concept of human computation and develop a reaction game, which we use to conduct a crowdsourced subjective quality assessment in order to evaluate a threshold for multimedia synchronization within an IDMS scenario. Our results show a significant decrease in overall Quality of Experience (QoE) at an asynchronism level of 750 ms. At the same time, we show that asynchronism at a level of 400 ms does not lead to significant differences in QoE compared to the synchronous reference case.
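An IDMS system could use such a threshold as a runtime tolerance check on the clients' playback positions. The function below is a hypothetical illustration of that use, with the 400 ms tolerance taken from the study's finding:

```python
def within_idms_tolerance(playback_positions_ms, tolerance_ms=400):
    """True when the spread between the fastest and slowest client's
    playback position stays within the given tolerance."""
    return max(playback_positions_ms) - min(playback_positions_ms) <= tolerance_ms

# Three clients' playback positions in milliseconds.
print(within_idms_tolerance([12000, 12150, 12350]))  # True (350 ms spread)
print(within_idms_tolerance([12000, 12150, 12800]))  # False (800 ms spread)
```

When the check fails, the session would trigger a resynchronization before users perceive the asynchronism.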


ACM SIGMM Conference on Multimedia Systems | 2018

LapGyn4: a dataset for 4 automatic content analysis problems in the domain of laparoscopic gynecology

Andreas Leibetseder; Stefan Petscharnig; Manfred Jürgen Primus; Sabrina Kletz; Bernd Münzer; Klaus Schoeffmann; Jörg Keckstein

Modern imaging technology enables medical practitioners to perform minimally invasive surgery (MIS), i.e., a variety of medical interventions inflicting minimal trauma upon patients and hence greatly improving their recoveries. Not only patients but also surgeons can benefit from this technology, as recorded media can be utilized to speed up tedious and time-consuming tasks such as treatment planning or case documentation. In order to improve the predominantly manual process of analyzing said media, with this work we publish four datasets extracted from gynecologic laparoscopic interventions, with the intent of encouraging research in the field of post-surgical automatic media analysis. These datasets are designed with the following use cases in mind: medical image retrieval based on a query image; detection of instrument counts, surgical actions, and anatomical structures; and distinguishing on which anatomical structure a certain action is performed. Furthermore, we provide suggestions for evaluation metrics and first baseline experiments.


ACM SIGMM Conference on Multimedia Systems | 2015

Merge and forward: self-organized inter-destination multimedia synchronization

Benjamin Rainer; Stefan Petscharnig; Christian Timmerer

Social networks have become ubiquitous, and with these new possibilities for social communication and for experiencing multimedia together, the traditional TV scenario is drifting more and more towards a distributed social experience. Asynchronism in the multimedia playback of the users may have a significant impact on the acceptability of systems providing this distributed multimedia experience. The synchronization needed in such systems is called Inter-Destination Multimedia Synchronization (IDMS). In this paper, we propose a demo that implements IDMS by means of our self-organized and distributed approach, assisted by pull-based streaming. We also provide a video of the planned demonstration and release the mobile application as open source, licensed under the GNU LGPL.


Multimedia Tools and Applications | 2018

Video retrieval in laparoscopic video recordings with dynamic content descriptors

Klaus Schoeffmann; Heinrich Husslein; Sabrina Kletz; Stefan Petscharnig; Bernd Muenzer; Christian Beecks

In the domain of gynecologic surgery, an increasing number of surgeries are performed in a minimally invasive manner. These laparoscopic surgeries require specific psychomotor skills of the operating surgeon, which are difficult to learn and teach. This is the reason why an increasing number of surgeons promote checking video recordings of laparoscopic surgeries for the occurrence of technical errors in surgical actions. This manual surgical quality assessment (SQA) process, however, is very cumbersome and time-consuming when carried out without any support from content-based video retrieval. In this work, we propose a video content descriptor called MIDD (Motion Intensity and Direction Descriptor) that can be effectively used to find similar segments in a laparoscopic video database and thereby help surgeons inspect other instances of a given error scene more quickly. We evaluate the retrieval performance of MIDD on surgical actions from gynecologic surgery in direct comparison to several other dynamic content descriptors, and show that the MIDD descriptor significantly outperforms the state of the art in terms of both retrieval performance and runtime performance. Additionally, we release the manually created video dataset of 16 classes of surgical actions from medical laparoscopy to the public for further evaluations.
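MIDD itself is defined in the paper; the sketch below only illustrates the general shape of a dynamic content descriptor built from motion vectors: a mean motion intensity concatenated with a magnitude-weighted histogram of motion directions. The bin count and weighting here are assumptions, not the published design:

```python
import numpy as np

def motion_descriptor(dx, dy, n_bins=8):
    """Build a simple dynamic descriptor from per-pixel motion vectors:
    [mean intensity] + normalized, magnitude-weighted direction histogram."""
    dx, dy = np.asarray(dx, float), np.asarray(dy, float)
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)  # direction of motion, in [-pi, pi]
    hist, _ = np.histogram(angle, bins=n_bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    total = hist.sum()
    if total > 0:
        hist = hist / total  # normalize so the histogram sums to 1
    return np.concatenate(([magnitude.mean()], hist))

# Toy flow field: uniform motion to the right.
desc = motion_descriptor(np.ones((4, 4)), np.zeros((4, 4)))
print(desc.shape)  # (9,)
```

Segments can then be compared by a distance between their descriptors, which is what makes retrieval of similar surgical actions feasible.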


CARE/CLIP@MICCAI | 2017

Image-Based Smoke Detection in Laparoscopic Videos

Andreas Leibetseder; Manfred Jürgen Primus; Stefan Petscharnig; Klaus Schoeffmann

The formation and improper removal of smoke during minimally invasive surgery (MIS) can considerably impede a patient's treatment, while additionally entailing serious deleterious health effects. Hence, state-of-the-art surgical procedures employ smoke evacuation systems, which are often still activated manually by the medical staff or, less commonly, operate automatically utilizing industrial, highly specialized, and operating room (OR) approved sensors. As an alternative approach, video analysis can be used to take on said detection process, a topic not yet much researched in the aforementioned context. In order to advance in this sector, we propose an image-based smoke classification task on a pre-trained convolutional neural network (CNN). We provide a custom dataset of over 30,000 laparoscopic smoke/non-smoke images, part of which served as training data for GoogLeNet-based CNN models. To be able to compare our research for evaluation, we separately developed a non-CNN classifier based on observing the saturation channel of a sample picture in the HSV color space. While the deep learning approaches yield excellent results, with Receiver Operating Characteristic (ROC) curves enclosing areas of over 0.98, the computationally much less costly analysis of an image's saturation histogram can, under certain circumstances, surprisingly also be a good indicator for smoke, with areas under the curves (AUCs) of around 0.92–0.97.
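The reported ROC areas can be computed directly from classifier scores. A minimal sketch using the equivalence between AUC and the Mann-Whitney rank statistic, with illustrative scores rather than real classifier output:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive scores higher than a randomly
    chosen negative (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Illustrative smoke scores: higher means more smoke-like.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(y, s))  # 8/9, about 0.889
```

Because AUC is threshold-free, it lets the CNN models and the fixed-threshold saturation baseline be compared on equal footing.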

Collaboration


An overview of Stefan Petscharnig's collaborations.

Top Co-Authors

Klaus Schöffmann (Alpen-Adria-Universität Klagenfurt)
Benjamin Rainer (Alpen-Adria-Universität Klagenfurt)
Manfred Jürgen Primus (Alpen-Adria-Universität Klagenfurt)
Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
Andreas Leibetseder (Alpen-Adria-Universität Klagenfurt)
Bernd Münzer (Alpen-Adria-Universität Klagenfurt)
Sabrina Kletz (Alpen-Adria-Universität Klagenfurt)
Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt)