Marcus Thaler | Researchain


Publication

Featured research published by Marcus Thaler.

conference on visual media production | 2011

Real-time Person Tracking in High-resolution Panoramic Video for Automated Broadcast Production

Rene Kaiser; Marcus Thaler; Andreas Kriechbaum; Hannes Fassold; Werner Bailer; Jakub Rosner

Knowledge of the location of persons in a scene is essential for enabling immersive user experiences in interactive TV services and for automating camera view selection and framing. We describe an architecture for detecting and tracking persons in high-resolution panoramic video streams obtained from the Omni Cam, a panoramic camera that stitches video streams from 6 HD-resolution tiles. We use a CUDA-accelerated feature point tracker, a blob detector and a CUDA HOG person detector for region tracking in each of the tiles before fusing the results for the entire panorama. In this paper we focus on applying the HOG person detector in real time and on speeding up the feature point tracker by porting it to NVIDIA's Fermi architecture. Evaluations indicate a significant speedup for our feature point tracker implementation, enabling the entire process to run in a real-time system.

advanced video and signal based surveillance | 2010

Automatic Inter-image Homography Estimation from Person Detections

Marcus Thaler; Roland Mörzinger

Inter-image homographies are essential for many different tasks involving projective geometry. This paper proposes an adaptive correspondence estimation approach between person detections in a planar scene that does not rely on correspondence features, as is the case in many other RANSAC-based approaches. The result is a planar inter-image homography calculated from estimated point correspondences. The approach is self-configurable, adaptive and provides robustness over time by exploiting temporal and geometric information. We demonstrate the manifold applicability of the proposed approach on a variety of datasets. Improved results compared to a common baseline approach are shown, and the influence of error sources such as missed detections, false detections and non-overlapping fields of view is investigated.
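
As an illustration of the last step only (computing a planar homography once point correspondences are available), here is a minimal sketch. It is not the paper's adaptive correspondence estimation: it assumes exactly four correspondences are already known, for example ground-plane foot points of the same persons seen in two views, and it fixes the last homography entry to 1. The function names `solve_linear`, `homography_from_points` and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography_from_points(src, dst):
    """Estimate a 3x3 homography (h33 = 1) from four point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), same pattern for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a 2D point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Toy data (made up): four foot points in view A and view B.
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 20), (30, 22), (28, 45), (8, 40)]
H = homography_from_points(src, dst)
```

With more than four noisy correspondences, a least-squares or robust estimate would be the usual choice; the exact four-point solve is used here only to keep the sketch short.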

conference on multimedia modeling | 2016

Quality Analysis on Mobile Devices for Real-Time Feedback

Stefanie Wechtitsch; Hannes Fassold; Marcus Thaler; Krzysztof Kozłowski; Werner Bailer

Media capture of live events such as concerts can be improved by including user-generated content, adding more perspectives and possibly covering scenes outside the scope of professional coverage. In this paper we propose methods for visual quality analysis on mobile devices, in order to provide direct feedback to the contributing user about the quality of the captured content. This avoids wasting bandwidth and battery on uploading or streaming low-quality content. We focus on real-time quality analysis that complements information that can be obtained from other sensors (e.g., stability). The proposed methods include real-time capable algorithms for sharpness, noise and over-/underexposure, which are integrated in a capture app for Android. Objective evaluation results show that our algorithms are competitive with state-of-the-art quality algorithms while enabling real-time quality feedback on mobile devices.
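
To make the kind of per-frame measure concrete, the following is a rough sketch of two generic quality indicators: sharpness as the variance of a Laplacian response, and exposure as the fraction of very dark or very bright pixels. These are common textbook measures and are not claimed to be the algorithms of the paper; the function names and the thresholds `low=16` and `high=239` are assumptions chosen for illustration.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; larger values suggest a sharper frame."""
    height, width = len(gray), len(gray[0])
    values = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def exposure_fractions(gray, low=16, high=239):
    """Return (underexposed, overexposed) pixel fractions for an 8-bit frame."""
    pixels = [p for row in gray for p in row]
    under = sum(p <= low for p in pixels) / len(pixels)
    over = sum(p >= high for p in pixels) / len(pixels)
    return under, over


# Toy 8x8 frames (made up): a uniform grey frame and a checkerboard.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
```

A capture app could compare such values against tuned thresholds and warn the user before upload; suitable thresholds depend on the device and content and are not given in the abstract.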

acm sigmm conference on multimedia systems | 2015

Multi-sensor concert recording dataset including professional and user-generated content

Werner Bailer; Chris Pike; Rik Bauwens; Reinhard Grandl; Mike Matton; Marcus Thaler

We present a novel dataset for multi-view video and spatial audio. An ensemble of ten musicians from the BBC Philharmonic Orchestra performed in the orchestra's rehearsal studio in Salford, UK, on 25th March 2014. This presented a controlled environment in which to capture a dataset that could be used to simulate a large event, whilst allowing control over the conditions and performance. The dataset consists of hundreds of video and audio clips captured during 18 takes of performances, using a broad range of professional- and consumer-grade equipment, up to 4K video and high-end spatial microphones. In addition to the audiovisual essence, sensor metadata has been captured, and ground truth annotations, in particular for temporal synchronization and spatial alignment, have been created. A part of the dataset has also been prepared for adaptive content streaming. The dataset is released under a Creative Commons Attribution Non-Commercial Share Alike license and hosted on a specifically adapted content management platform.

advanced video and signal based surveillance | 2011

Improved person detection in industrial environments using multiple self-calibrated cameras

Roland Mörzinger; Marcus Thaler; Severin Stalder; Helmut Grabner; Luc Van Gool

Person detection is a challenging task in industrial environments, which typically feature rapidly changing illumination, occluding objects and cluttered backgrounds. This paper proposes a series of algorithms for improving the robustness of person detection in such harsh industrial environments. Based on a state-of-the-art person detector, significant robustness and automation are achieved by introducing automatic ground plane estimation, confidence filtering, cross-camera correspondence estimation and multi-camera fusion. Detailed experiments on an industrial dataset that captures an automotive assembly process show the stepwise improvement when combining the above-mentioned techniques in a fully unsupervised manner.

content based multimedia indexing | 2015

A GPU-accelerated two stage visual matching pipeline for image and video retrieval

Hannes Fassold; Harald Stiegler; Jakub Rosner; Marcus Thaler; Werner Bailer

We propose a two-stage visual matching pipeline consisting of a first step that uses VLAD signatures to filter results, and a second step that reranks the top results using raw matching of SIFT descriptors. This enables adjusting the tradeoff between the high computational cost of matching local descriptors and the insufficient accuracy of compact signatures in many application scenarios. We describe GPU-accelerated extraction and matching algorithms for SIFT, which result in a speedup factor of at least 4. The VLAD filtering step reduces the number of images/frames for which the local descriptors need to be matched, thus speeding up retrieval by an additional factor of 9-10 without sacrificing mean average precision compared to full raw descriptor matching.
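
The filter-then-rerank structure can be sketched on the CPU as follows. This is a toy illustration under stated assumptions, not the authors' GPU implementation: a short vector per image stands in for a VLAD signature, small 2D points stand in for SIFT descriptors, stage 1 ranks by cosine similarity, and stage 2 counts matches that pass a nearest-neighbour ratio test. The names `two_stage_search`, `count_matches`, `shortlist_size` and `ratio` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two compact signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_matches(query_desc, cand_desc, ratio=0.8):
    """Count query descriptors whose nearest neighbour passes the ratio test."""
    matches = 0
    for q in query_desc:
        dists = sorted(math.dist(q, c) for c in cand_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def two_stage_search(query, database, shortlist_size=2):
    """Stage 1: filter by signature similarity. Stage 2: rerank by local matches."""
    shortlist = sorted(
        database,
        key=lambda item: cosine(query["signature"], item["signature"]),
        reverse=True,
    )[:shortlist_size]
    return sorted(
        shortlist,
        key=lambda item: count_matches(query["descriptors"], item["descriptors"]),
        reverse=True,
    )


# Toy data (made up for illustration).
query = {"signature": [1.0, 0.0], "descriptors": [[0, 0], [10, 10]]}
database = [
    {"id": "a", "signature": [1.0, 0.1],
     "descriptors": [[0, 0.1], [10, 10.1], [50, 50]]},
    {"id": "b", "signature": [1.0, 0.0],
     "descriptors": [[5, 5], [5, 5.1], [40, 40]]},
    {"id": "c", "signature": [0.0, 1.0],
     "descriptors": [[0, 0], [10, 10], [90, 90]]},
]
ranked = [item["id"] for item in two_stage_search(query, database)]
```

In this toy run, item `c` is dropped by the signature filter even though its local descriptors would match well, which illustrates the tradeoff the abstract mentions: the expensive matching runs only on the shortlist, so the shortlist size controls both cost and the risk of discarding a true match.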

Proceedings of the 7th ACM International Workshop on Mobile Video | 2015

Real-time video quality analysis on mobile devices

Hannes Fassold; Stefanie Wechtitsch; Marcus Thaler; Krzysztof Kozłowski; Werner Bailer

Media capture of live events such as concerts can be improved by including user-generated content. In this paper we propose methods for visual quality analysis on mobile devices, in order to provide direct feedback to the contributing user about the quality of the captured content and to avoid wasting bandwidth and battery on uploading or streaming low-quality content. We focus on quality analysis that complements information we can obtain from other sensors. The proposed methods include real-time algorithms for sharpness, noise and over-/underexposure.

computer vision and pattern recognition | 2013

Real-Time Person Detection and Tracking in Panoramic Video

Marcus Thaler; Werner Bailer

The format-agnostic production paradigm has been proposed to offer more engaging live broadcasts to the audience while ensuring the cost-efficiency of the production. An ultra-HD resolution panorama is captured, and streams for different devices and user profiles are semi-automatically generated. Information about person positions and trajectories in the video is an important cue for making editing decisions for sports content. In this paper we describe a real-time person detection and tracking system for panoramic video. The approach extends our earlier tracking-by-detection algorithm by addressing a number of robustness issues that are especially relevant in sports content. The design of the approach is strongly driven by the requirement to process high-resolution video in real time. We show that we can improve the robustness of the algorithm while still performing real-time processing.

acm multimedia | 2010

Tools for semi-automatic monitoring of industrial workflows

Roland Mörzinger; Manolis Sardis; Igor Rosenberg; Helmut Grabner; Galina V. Veres; Imed Bouchrika; Marcus Thaler; René Schuster; Albert Hofmann; Georg Thallinger; Vasileios Anagnostopoulos; Dimitrios I. Kosmopoulos; Athanasios Voulodimos; Constantinos Lalos; Nikolaos D. Doulamis; Theodora A. Varvarigou; Rolando Palma Zelada; Ignacio Jubert Soler; Severin Stalder; Luc Van Gool; Lee Middleton; Zoheir Sabeur; Banafshe Arbab-Zavar; John N. Carter; Mark S. Nixon

This paper describes a tool chain for monitoring complex workflows. Statistics obtained from automatic workflow monitoring in a car assembly environment assist in improving industrial safety and process quality. To this end, we propose automatic detection and tracking of humans and their activity in multiple networked cameras. The described tools offer human operators retrospective analysis of a huge amount of pre-recorded and analyzed footage from multiple cameras in order to get a comprehensive overview of the workflows. Furthermore, the tools help technical administrators in adjusting algorithms by letting the user correct detections (for relevance feedback) and ground truth for evaluation. Another important feature of the tool chain is the capability to inform the employees about potentially risky conditions using the tool for automatic detection of unusual scenes.

conference on multimedia modeling | 2017

Compressing Visual Descriptors of Image Sequences

Werner Bailer; Stefanie Wechtitsch; Marcus Thaler

In recent years, there has been significant progress in developing more compact visual descriptors, typically by aggregating local descriptors. However, all these methods are descriptors for still images, and are typically applied independently to (key) frames when used in tasks such as instance search in video. Thus, they do not make use of the temporal redundancy of the video, which has negative impacts on the descriptor size and the matching complexity. We propose a compressed descriptor for image sequences, which encodes a segment of video using a single descriptor. The proposed approach is a framework that can be used with different local descriptors, including compact descriptors. We describe the extraction and matching process for the descriptor and provide evaluation results on a large video data set.
