Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Oswald Lanz is active.

Publication


Featured research published by Oswald Lanz.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Approximate Bayesian multibody tracking

Oswald Lanz

Visual tracking of multiple targets is a challenging problem, especially when efficiency is an issue. Occlusions, if not properly handled, are a major source of failure. Solutions supporting principled occlusion reasoning have been proposed but remain impractical for online applications. This paper presents a new solution which effectively manages the trade-off between reliable modeling and computational efficiency. The hybrid joint-separable (HJS) filter is derived from a joint Bayesian formulation of the problem, and shown to be efficient while optimal in terms of compact belief representation. Computational efficiency is achieved by employing a Markov random field approximation to joint dynamics and an incremental algorithm for posterior update with an appearance likelihood that implements a physically-based model of the occlusion process. A particle filter implementation is proposed which achieves accurate tracking during partial occlusions, while in cases of complete occlusion, tracking hypotheses are bound to estimated occlusion volumes. Experiments show that the proposed algorithm is efficient, robust, and able to resolve long-term occlusions between targets with identical appearance.
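As background for the HJS filter described above, the following is a minimal sketch of a generic bootstrap (SIR) particle filter for a one-dimensional state. It is a hypothetical illustration of the sampling machinery only, not the paper's hybrid joint-separable filter or its occlusion-aware appearance likelihood; the function name and the random-walk motion model are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, motion_std=0.1):
    """One predict-update-resample cycle of a bootstrap (SIR) particle filter."""
    n = len(particles)
    # Predict: diffuse particles under a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight each hypothesis by its observation likelihood.
    weights = weights * np.array([likelihood(x) for x in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size drops below n/2.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```

Running this repeatedly with a likelihood peaked at the true state concentrates the particle set around it; the HJS filter adds to this skeleton a joint multi-target state and an occlusion-aware likelihood.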


international conference on computer vision | 2013

No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

Yan Yan; Elisa Ricci; Ramanathan Subramanian; Oswald Lanz; Nicu Sebe

We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede the performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions as well as partition-specific appearance variations for a given head pose to build region-specific classifiers. Guided by two graphs which a priori model appearance similarity among (i) grid partitions based on camera geometry and (ii) head pose classes, the learner efficiently clusters appearance-wise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target's position using a person tracker, the appropriate region-specific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with little training data.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

A Multi-Task Learning Framework for Head Pose Estimation under Target Motion

Yan Yan; Elisa Ricci; Ramanathan Subramanian; Gaowen Liu; Oswald Lanz; Nicu Sebe

Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion, as facial appearance distorts owing to camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL, a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and associated pose classifiers. Upon determining the target's position using a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.


international conference on image processing | 2013

Multi-scale f-formation discovery for group detection

Francesco Setti; Oswald Lanz; Roberta Ferrario; Vittorio Murino; Marco Cristani

We present an unsupervised approach for the automatic detection of static interactive groups. The approach builds upon a novel multi-scale Hough voting policy, which incorporates in a flexible way the sociological notion of a group as an F-formation; the goal is to model at the same time small arrangements of close friends and aggregations of many individuals spread over a large area. Our technique is based on a competition of different voting sessions, each one specialized for a particular group cardinality; all the votes are then evaluated using information-theoretic criteria, producing the final set of groups. The proposed technique has been applied to public benchmark sequences and a novel cocktail party dataset, evaluating new group detection metrics and obtaining state-of-the-art performance.
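The core geometric intuition behind F-formation voting can be sketched in a few lines: each person casts a vote for a candidate group centre (the "o-space" centre) a fixed stride along their facing direction, and votes that fall close together indicate a shared group. This is a hypothetical single-scale illustration with made-up function names and parameters, not the paper's multi-scale, cardinality-specialised voting scheme with information-theoretic selection.

```python
import numpy as np

def vote_o_space_centers(positions, orientations, stride=0.8):
    """Each person votes for a group (o-space) centre `stride` metres
    along their facing direction (orientation in radians)."""
    positions = np.asarray(positions, dtype=float)
    orientations = np.asarray(orientations, dtype=float)
    offsets = stride * np.stack([np.cos(orientations), np.sin(orientations)], axis=1)
    return positions + offsets

def group_by_votes(votes, radius=0.6):
    """Greedily merge votes within `radius` of each other; people whose
    votes merge are hypothesised to form one group."""
    groups = []
    for i, v in enumerate(votes):
        for g in groups:
            if np.linalg.norm(votes[g[0]] - v) < radius:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

For example, two people standing 1.6 m apart and facing each other vote for the same midpoint and are grouped together, while a third person facing elsewhere forms a singleton.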


international conference on multimodal interfaces | 2013

On the relationship between head pose, social attention and personality prediction for unstructured and dynamic group interactions

Ramanathan Subramanian; Yan Yan; Jacopo Staiano; Oswald Lanz; Nicu Sebe

Correlates between social attention and personality traits have been widely acknowledged in social psychology studies. Head pose has commonly been employed as a proxy for determining the social attention direction in small group interactions. However, the impact of head pose estimation errors on personality estimates has not been studied to our knowledge. In this work, we consider the unstructured and dynamic cocktail party scenario where the scene is captured by multiple, large field-of-view cameras. Head pose estimation is a challenging task under these conditions owing to the uninhibited motion of persons (due to which facial appearance varies with perspective and scale changes), and the low resolution of captured faces. Based on proxemic and social attention features computed from position and head pose annotations, we first demonstrate that social attention features are excellent predictors of the Extraversion and Neuroticism personality traits. We then repeat the classification experiments with behavioral features computed from automated estimates; the obtained experimental results show that while prediction performance for both traits is affected by head pose estimation errors, the impact is more adverse for Extraversion.


CLEaR | 2006

A generative approach to audio-visual person tracking

Roberto Brunelli; Alessio Brutti; Paul Chippendale; Oswald Lanz; Maurizio Omologo; Piergiorgio Svaizer; Francesco Tobia

This paper focuses on the integration of acoustic and visual information for people tracking. The system presented relies on a probabilistic framework within which information from multiple sources is integrated at an intermediate stage. An advantage of the proposed method is its use of a generative approach, which supports easy and robust integration of multi-source information by means of sampled projection instead of triangulation. The system described has been developed within the research activities of the EU-funded CHIL project. Experimental results from the CLEAR evaluation workshop are reported.


acm multimedia | 2015

Analyzing Free-standing Conversational Groups: A Multimodal Approach

Xavier Alameda-Pineda; Yan Yan; Elisa Ricci; Oswald Lanz; Nicu Sebe

During natural social gatherings, humans tend to organize themselves in so-called free-standing conversational groups. In this context, robust head and body pose estimates can facilitate the higher-level description of the ongoing interplay. Importantly, visual information typically obtained with a distributed camera network might not suffice to achieve the robustness sought. With this in mind, recent advances in wearable sensing technology open the door to multimodal and richer information flows. In this paper we propose to cast the head and body pose estimation problem as a matrix completion task. We introduce a framework able to fuse multimodal data emanating from a combination of distributed and wearable sensors, taking into account the temporal consistency, the head/body coupling and the noise inherent to the scenario. We report results on the novel and challenging SALSA dataset, containing visual, auditory and infrared recordings of 18 people interacting in a regular indoor environment. We demonstrate the soundness of the proposed method and its usefulness for higher-level tasks such as the detection of F-formations and the discovery of social attention attractors.


International Journal of Computer Vision | 2014

Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images

Anoop Kolar Rajagopal; Ramanathan Subramanian; Elisa Ricci; Radu L. Vieriu; Oswald Lanz; Ramakrishnan Kalpathi; Nicu Sebe

Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult as faces are captured at low resolution and have a blurred appearance. Domain adaptation approaches are useful for transferring knowledge from the training (source) to the test (target) data when they have different attributes, minimizing target data labeling efforts in the process. This paper examines the use of transfer learning for efficient multi-view head pose classification with minimal target training data under three challenging situations: (i) where the range of head poses in the source and target images is different, (ii) where source images capture a stationary person while target images capture a moving person whose facial appearance varies under motion due to changing perspective and scale, and (iii) a combination of (i) and (ii). On the whole, the presented methods represent novel transfer learning solutions employed in the context of multi-view head pose classification. We demonstrate that the proposed solutions considerably outperform the state-of-the-art through extensive experimental validation. Finally, the DPOSE dataset, compiled for benchmarking head pose classification performance with moving persons and to aid behavioral understanding applications, is presented in this work.


international conference on image analysis and processing | 2007

An information theoretic rule for sample size adaptation in particle filtering

Oswald Lanz

To become robust, a tracking algorithm must be able to cope with the uncertainty and ambiguity often inherently present in the data in the form of occlusion and clutter. This usually comes at the price of more demanding computations. Sampling methods, such as the popular particle filter, provide this capability and offer a means of controlling the computational trade-off by adapting their resolution. This paper presents a method for adapting resolution on the fly to current demands. The key idea is to select the number of samples necessary to populate the high-probability regions with a predefined density. The scheme then allocates more particles when uncertainty is high, while saving resources otherwise. The resulting tracker propagates compact yet consistent representations and enables reliable real-time operation that would otherwise be compromised.
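The idea of tying sample count to the spread of the posterior can be sketched with an entropy-based proxy: the exponential of the weight entropy (the perplexity) grows with the volume of the high-probability region, so allocating a fixed number of particles per unit of it yields more samples when the belief is diffuse and fewer when it is peaked. This is a hypothetical stand-in with assumed parameter names, not the paper's exact information-theoretic rule.

```python
import numpy as np

def adapt_sample_size(weights, n_min=100, n_max=5000, target_density=50):
    """Choose the particle count for the next filtering step from the
    current particle weights. Perplexity of the weight distribution is
    used as a proxy for the size of the high-probability region."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    entropy = -np.sum(w * np.log(w + 1e-300))   # weight entropy in nats
    effective_spread = np.exp(entropy)          # perplexity: ~1 if peaked, ~N if uniform
    n = int(target_density * effective_spread)
    return max(n_min, min(n_max, n))
```

With near-uniform weights (high uncertainty) the rule requests the maximum budget, while a sharply peaked weight distribution falls back to the minimum.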


international conference on multisensor fusion and integration for intelligent systems | 2006

Dynamic Head Location and Pose from Video

Oswald Lanz; Roberto Brunelli

In this paper we present a visual particle filter for jointly tracking the position of a person and her head orientation. The resulting information may be used to support automatic analysis of people's interactive behaviour, by supporting proxemics analysis and providing dynamic information on focus of attention. An orientation-sensitive visual likelihood is proposed which models the appearance of the target on a key-view basis, and uses body part color histograms as descriptors. The resulting system is able to provide reliable, real-time estimates of people's position and orientation, with a scalable number of sensors. It is then possible to adaptively allocate computational resources and sensors to increase people-monitoring coverage or tracking accuracy. The integration of multi-view sensing, the joint estimation of location and orientation, the use of generative image models, and of simple visual matching measures, make the system robust to low image resolution and sensor failure.

Collaboration


Dive into Oswald Lanz's collaboration.

Top Co-Authors

Yan Yan (University of Trento)
Tao Hu (University of Trento)