Publication


Featured research published by Michael Boelstoft Holte.


Computer Vision and Image Understanding | 2012

Selective spatio-temporal interest points

Bhaskar Chakraborty; Michael Boelstoft Holte; Thomas B. Moeslund; Jordi Gonzàlez

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper, we present a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints. This new method is significantly different from existing STIP detection techniques and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-video-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action class specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on popular benchmark datasets (KTH and Weizmann), more challenging datasets of complex scenes with background clutter and camera motion (CVC and CMU), movie and YouTube video clips (Hollywood 2 and YouTube), and complex scenes with multiple actors (MSR I and Multi-KTH) validates our approach and shows state-of-the-art performance. Due to the unavailability of ground-truth action annotation data for the Multi-KTH dataset, we introduce an actor-specific spatio-temporal clustering of STIPs to address the problem of automatic action annotation of multiple simultaneous actors. Additionally, we perform cross-data action recognition by training on source datasets (KTH and Weizmann) and testing on completely different and more challenging target datasets (CVC, CMU, MSR I and Multi-KTH). This documents the robustness of our proposed approach in realistic scenarios, using separate training and test datasets, which in general has been a shortcoming in the performance evaluation of human action recognition techniques.
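The surround-suppression step is what prunes background clutter from the detected interest points. As a rough 2-D illustration (not the authors' exact spatio-temporal formulation), the sketch below damps each point's interest response by the mean response in its surround; the radius and weight alpha are illustrative assumptions.

```python
import numpy as np

def surround_suppress(response, radius=5, alpha=1.0):
    """Reduce each point's interest response by the mean response in its
    surround: isolated strong responses (actors) survive, diffuse
    background responses are suppressed. 2-D sketch only; the paper's
    detector adds local and temporal constraints."""
    h, w = response.shape
    out = np.zeros_like(response)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = response[y0:y1, x0:x1]
            # mean of the surround, excluding the centre pixel itself
            surround_mean = (window.sum() - response[y, x]) / (window.size - 1)
            out[y, x] = max(0.0, response[y, x] - alpha * surround_mean)
    return out

# Toy check: a strong isolated peak stays dominant after suppression.
resp = 0.2 * np.random.rand(64, 64)
resp[32, 32] = 1.0
out = surround_suppress(resp)
print(out.argmax() == 32 * 64 + 32)
```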


IEEE Journal of Selected Topics in Signal Processing | 2012

Human Pose Estimation and Activity Recognition From Multi-View Videos: Comparative Explorations of Recent Developments

Michael Boelstoft Holte; Cuong Tran; Mohan M. Trivedi; Thomas B. Moeslund

This paper presents a review and comparative study of recent multi-view approaches for human 3D pose estimation and activity recognition. We discuss the application domain of human pose estimation and activity recognition and the associated requirements, covering: advanced human–computer interaction (HCI), assisted living, gesture-based interactive games, intelligent driver assistance systems, movies, 3D TV and animation, physical therapy, autonomous mental development, smart environments, sport motion analysis, video surveillance, and video annotation. Next, we review and categorize recent approaches which have been proposed to comply with these requirements. We report a comparison of the most promising methods for multi-view human action recognition using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) Multi-View Human Action Dataset, and the i3DPost Multi-View Human Action and Interaction Dataset. To compare the proposed methods, we give a qualitative assessment of methods which cannot be compared quantitatively, and analyze some prominent 3D pose estimation techniques for applications where not only the performed action needs to be identified but also a more detailed description of the body pose and joint configuration is required. Finally, we discuss some of the shortcomings of multi-view camera setups and outline our thoughts on future directions of 3D body pose estimation and human action recognition.


Computer Vision and Image Understanding | 2010

View-invariant gesture recognition using 3D optical flow and harmonic motion context

Michael Boelstoft Holte; Thomas B. Moeslund; Preben Fihl

This paper presents an approach for view-invariant gesture recognition. The approach is based on 3D data captured by a SwissRanger SR4000 camera. This camera produces both a depth map and an intensity image of a scene. Since the two information types are aligned, we can use the intensity image to define a region of interest for the relevant 3D data. This data fusion improves the quality of the motion detection and hence results in better recognition. The gesture recognition is based on finding motion primitives (temporal instances) in the 3D data. Motion is detected by a 3D version of optical flow and results in velocity-annotated point clouds. The 3D motion primitives are represented efficiently by introducing motion context. The motion context is transformed into a view-invariant representation using spherical harmonic basis functions, yielding a harmonic motion context representation. A probabilistic Edit Distance classifier is applied to identify which gesture best describes a string of primitives. The approach is trained on data from one viewpoint and tested on data from a very different viewpoint. The recognition rate is 94.4%, which is similar to the recognition rate when training and testing on gestures from the same viewpoint; hence, the approach is indeed view-invariant.
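The harmonic representation rests on a standard property of spherical harmonics: the per-degree energies of a function on the sphere are unchanged by rotation. A minimal sketch of that property, assuming an equiangular sampling grid and using SciPy's sph_harm; the grid, band limit, and normalisation are assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.special import sph_harm

def harmonic_energies(f, l_max=6, n_theta=64, n_phi=32):
    """Expand a function sampled on the sphere (e.g. one shell of a
    motion-context histogram) in spherical harmonics and keep only the
    per-degree energies, which are rotation-invariant."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)  # azimuth
    phi = (np.arange(n_phi) + 0.5) * np.pi / n_phi                  # polar angle
    T, P = np.meshgrid(theta, phi, indexing="ij")
    F = f(T, P)
    dA = (2.0 * np.pi / n_theta) * (np.pi / n_phi) * np.sin(P)      # area element
    energies = []
    for l in range(l_max + 1):
        e = 0.0
        for m in range(-l, l + 1):
            # scipy's sph_harm takes (order m, degree l, azimuth, polar)
            c = np.sum(F * np.conj(sph_harm(m, l, T, P)) * dA)
            e += np.abs(c) ** 2
        energies.append(np.sqrt(e))
    return np.array(energies)

# A function and a copy rotated about the view axis give the same signature.
f1 = lambda t, p: np.cos(p) ** 2 + 0.3 * np.sin(p) * np.cos(t)
f2 = lambda t, p: np.cos(p) ** 2 + 0.3 * np.sin(p) * np.cos(t + 1.2)
print(np.allclose(harmonic_energies(f1), harmonic_energies(f2), atol=1e-6))
```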


Computer Vision and Pattern Recognition | 2008

Fusion of range and intensity information for view invariant gesture recognition

Michael Boelstoft Holte; Thomas B. Moeslund; Preben Fihl

This paper presents a system for view-invariant gesture recognition. The approach is based on 3D data from a CSEM SwissRanger SR-2 camera. This camera produces both a depth map and an intensity image of a scene. Since the two information types are aligned, we can use the intensity image to define a region of interest for the relevant 3D data. This data fusion improves the quality of the range data and hence results in better recognition. The gesture recognition is based on finding motion primitives in the 3D data. The primitives are represented compactly and view-invariantly using harmonic shape context. A probabilistic Edit Distance classifier is applied to identify which gesture best describes a string of primitives. The approach is trained on data from one viewpoint and tested on data from a different viewpoint. The recognition rate is 92.9%, which is similar to the recognition rate when training and testing on gestures from the same viewpoint; hence, the approach is indeed view-invariant.
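A minimal sketch of the fusion step, assuming pixel-aligned depth and intensity arrays and a hypothetical fixed threshold (the paper does not specify this exact rule):

```python
import numpy as np

def fuse_range_and_intensity(depth, intensity, thresh=0.2):
    """Because the SwissRanger's depth map and intensity image are
    pixel-aligned, a foreground mask derived from the intensity channel
    can gate the range data, discarding unreliable depth outside the
    region of interest. The threshold is an illustrative assumption."""
    mask = intensity > thresh                   # crude ROI / foreground mask
    roi_depth = np.where(mask, depth, np.nan)   # keep depth only inside the ROI
    return roi_depth, mask
```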


International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission | 2011

3D Human Action Recognition for Multi-view Camera Systems

Michael Boelstoft Holte; Thomas B. Moeslund; Nikos Nikolaidis; Ioannis Pitas

This paper presents a novel approach for combining optical flow into enhanced 3D motion vector fields for human action recognition. Our approach detects motion of the actors by computing optical flow in video data captured by a multi-view camera setup with an arbitrary number of views. Optical flow is estimated in each view and extended to 3D using 3D reconstructions of the actors and pixel-to-vertex correspondences. The resulting 3D optical flow for each view is combined into a 3D motion vector field by taking the significance of local motion and its reliability into account. 3D Motion Context (3D-MC) and Harmonic Motion Context (HMC) are used to represent the extracted 3D motion vector fields efficiently and in a view-invariant manner, while accounting for differences in the actors' anthropometry and variations in their movement style. The resulting 3D-MC and HMC descriptors are classified into a set of human actions using normalized correlation, taking into account variations in performing speed across actors. We compare the performance of the 3D-MC and HMC descriptors, and show promising experimental results on the publicly available i3DPost Multi-View Human Action Dataset.
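The final matching step, classification by normalized correlation, can be sketched compactly. Template construction (here, one stand-in template vector per class) is an assumption for illustration:

```python
import numpy as np

def classify_normalized_correlation(descriptor, templates):
    """Match a query descriptor against one template descriptor per
    action class using zero-mean normalized correlation; the winning
    class is the one with the highest score in [-1, 1]. Normalisation
    discounts overall magnitude differences, e.g. from performing-speed
    variation."""
    d = descriptor - descriptor.mean()
    d = d / np.linalg.norm(d)
    scores = {}
    for action, t in templates.items():
        t = t - t.mean()
        scores[action] = float(d @ (t / np.linalg.norm(t)))
    return max(scores, key=scores.get), scores

# Hypothetical usage with random stand-ins for real 3D-MC/HMC descriptors.
rng = np.random.default_rng(0)
templates = {a: rng.random(128) for a in ("walk", "wave", "jump")}
query = templates["wave"] + 0.05 * rng.random(128)
print(classify_normalized_correlation(query, templates)[0])   # -> "wave"
```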


International Conference on Computer Vision | 2009

Detection and removal of chromatic moving shadows in surveillance scenarios

Ivan Huerta; Michael Boelstoft Holte; Thomas B. Moeslund; Jordi Gonzàlez

Segmentation in the surveillance domain has to deal with shadows to avoid distortions when detecting moving objects. Most segmentation approaches dealing with shadow detection are typically restricted to penumbra shadows. Therefore, such techniques cannot cope well with umbra shadows. Consequently, umbra shadows are usually detected as part of moving objects. In this paper we present a novel technique based on gradient and colour models for separating chromatic moving cast shadows from detected moving objects. Firstly, both a chromatic invariant colour cone model and an invariant gradient model are built to perform automatic segmentation while detecting potential shadows. In a second step, regions corresponding to potential shadows are grouped by considering “a bluish effect” and an edge partitioning. Lastly, (i) temporal similarities between textures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for all potential shadow regions in order to finally identify umbra shadows. Unlike other approaches, our method does not make any a priori assumptions about camera location, surface geometries, surface textures, shapes and types of shadows, objects, and background. Experimental results show the performance and accuracy of our approach on different shadowed materials and under varying illumination conditions.
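The colour-cone intuition is that a cast shadow scales a pixel's brightness down while roughly preserving its chromaticity direction relative to the background, whereas a real object changes the colour direction itself. The sketch below uses a generic brightness/chromaticity distortion test in that spirit; the thresholds and the specific distortion measure are illustrative assumptions, not the paper's calibrated colour and gradient models.

```python
import numpy as np

def shadow_candidates(frame, background, a_lo=0.4, a_hi=0.95, c_tol=0.1):
    """Flag pixels whose colour is a darkened copy of the background:
    brightness distortion alpha in (a_lo, a_hi) and small chromaticity
    distortion. Inputs are float RGB images of shape (H, W, 3) in [0, 1];
    all thresholds are illustrative assumptions."""
    f = frame.reshape(-1, 3).astype(float)
    b = background.reshape(-1, 3).astype(float)
    nb = np.linalg.norm(b, axis=1) + 1e-9
    alpha = (f * b).sum(axis=1) / nb**2                       # brightness scale
    chroma = np.linalg.norm(f - alpha[:, None] * b, axis=1) / nb
    mask = (alpha > a_lo) & (alpha < a_hi) & (chroma < c_tol)
    return mask.reshape(frame.shape[:2])
```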


IEEE Journal of Selected Topics in Signal Processing | 2012

A Local 3-D Motion Descriptor for Multi-View Human Action Recognition from 4-D Spatio-Temporal Interest Points

Michael Boelstoft Holte; Bhaskar Chakraborty; Jordi Gonzàlez; Thomas B. Moeslund

In this paper, we address the problem of human action recognition in reconstructed 3-D data acquired by multi-camera systems. We contribute to this field by introducing a novel 3-D action recognition approach based on detection of 4-D (3-D space + time) spatio-temporal interest points (STIPs) and local description of 3-D motion features. STIPs are detected in multi-view images and extended to 4-D using 3-D reconstructions of the actors and pixel-to-vertex correspondences of the multi-camera setup. Local 3-D motion descriptors, histogram of optical 3-D flow (HOF3D), are extracted from estimated 3-D optical flow in the neighborhood of each 4-D STIP and made view-invariant. The local HOF3D descriptors are divided using 3-D spatial pyramids to capture and improve the discrimination between arm- and leg-based actions. Based on these pyramids of HOF3D descriptors we build a bag-of-words (BoW) vocabulary of human actions, which is compressed and classified using agglomerative information bottleneck (AIB) and support vector machines (SVMs), respectively. Experiments on the publicly available i3DPost and IXMAS datasets show promising state-of-the-art results and validate the performance and view-invariance of the approach.
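A toy version of a HOF3D-style descriptor, binning 3-D flow vectors by direction with magnitude-weighted votes; the bin counts and normalisation are assumptions, and the paper's descriptor additionally includes the view-invariance transform and 3-D spatial pyramids.

```python
import numpy as np

def hof3d(flow_vectors, n_azimuth=8, n_polar=4):
    """Bin 3-D flow vectors from a STIP neighbourhood by direction
    (azimuth x polar angle), weight votes by flow magnitude, and
    L1-normalise the resulting histogram."""
    v = np.asarray(flow_vectors, dtype=float)          # shape (N, 3)
    mag = np.linalg.norm(v, axis=1)
    keep = mag > 1e-9                                  # drop zero-motion vectors
    v, mag = v[keep], mag[keep]
    azimuth = np.arctan2(v[:, 1], v[:, 0]) % (2 * np.pi)
    polar = np.arccos(np.clip(v[:, 2] / mag, -1.0, 1.0))
    a_bin = np.minimum((azimuth / (2 * np.pi) * n_azimuth).astype(int), n_azimuth - 1)
    p_bin = np.minimum((polar / np.pi * n_polar).astype(int), n_polar - 1)
    hist = np.zeros((n_azimuth, n_polar))
    np.add.at(hist, (a_bin, p_bin), mag)               # magnitude-weighted voting
    return (hist / (hist.sum() + 1e-9)).ravel()

rng = np.random.default_rng(1)
flows = rng.normal(size=(500, 3))
print(hof3d(flows).shape)   # (32,) for 8 x 4 direction bins
```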


Proceedings of the 2011 joint ACM workshop on Human gesture and behavior understanding | 2011

Human action recognition using multiple views: a comparative perspective on recent developments

Michael Boelstoft Holte; Cuong Tran; Mohan M. Trivedi; Thomas B. Moeslund

This paper presents a review and comparative study of recent multi-view 2D and 3D approaches for human action recognition. The approaches are reviewed and categorized according to their nature. We report a comparison of the most promising methods using two publicly available datasets: the INRIA Xmas Motion Acquisition Sequences (IXMAS) and the i3DPost Multi-View Human Action and Interaction Dataset. Additionally, we discuss some of the shortcomings of multi-view camera setups and outline our thoughts on future directions of 3D human action recognition.


International Conference on Computer Vision | 2011

A selective spatio-temporal interest point detector for human action recognition in complex scenes

Bhaskar Chakraborty; Michael Boelstoft Holte; Thomas B. Moeslund; Jordi Gonzàlez; F. Xavier Roca



International Journal of Intelligent Systems Technologies and Applications | 2008

View invariant gesture recognition using the CSEM SwissRanger SR-2 camera

Michael Boelstoft Holte; Thomas B. Moeslund; Preben Fihl


Collaboration


Dive into Michael Boelstoft Holte's collaboration.

Top Co-Authors

Jordi Gonzàlez
Autonomous University of Barcelona

Bhaskar Chakraborty
Autonomous University of Barcelona

Cuong Tran
University of California

Ivan Huerta
Università Iuav di Venezia