Publication


Featured research published by Jean-Marc Odobez.


International Conference on Computer Vision | 2005

Modeling scenes with local descriptors and latent aspects

Pedro Quelhas; Florent Monay; Jean-Marc Odobez; Daniel Gatica-Perez; Tinne Tuytelaars; Luc Van Gool

We present a new approach to model visual scenes in image collections, based on local invariant features and probabilistic latent space models. Our formulation provides answers to three open questions: (1) whether the invariant local features are suitable for scene (rather than object) classification; (2) whether unsupervised latent space models can be used for feature extraction in the classification task; and (3) whether the latent space formulation can discover visual co-occurrence patterns, motivating novel approaches for image organization and segmentation. Using a 9500-image dataset, our approach is validated on each of these issues. First, we show with extensive experiments on binary and multi-class scene classification tasks that a bag-of-visterms representation, derived from local invariant descriptors, consistently outperforms state-of-the-art approaches. Second, we show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available. Third, we have exploited the ability of PLSA to automatically extract visually meaningful aspects, to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.
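As a minimal illustration of the bag-of-visterms pipeline described above, the sketch below quantizes precomputed local descriptors (e.g., SIFT vectors) with k-means and builds a normalized visterm histogram per image. It assumes scikit-learn for clustering; the function names and parameters are illustrative, not the authors' code.

import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_visterms=1000):
    # Quantize local descriptors (a list of (n_i, d) arrays) into a visual vocabulary.
    return KMeans(n_clusters=n_visterms, n_init=4).fit(np.vstack(descriptors))

def bag_of_visterms(image_descriptors, vocabulary):
    # Normalized histogram of visterm counts for one image (the BOV vector).
    labels = vocabulary.predict(image_descriptors)
    hist = np.bincount(labels, minlength=vocabulary.n_clusters)
    return hist / max(hist.sum(), 1)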


Pattern Recognition | 2004

Text detection and recognition in images and video frames

Datong Chen; Jean-Marc Odobez

Text embedded in images and videos represents a rich source of information for content-based indexing and retrieval applications. In this paper, we present a new method for localizing and recognizing text in complex images and videos. Text localization is performed in a two-step approach that combines the speed of a focusing step with the strength of a machine-learning-based text verification step. The experiments conducted show that the support vector machine is more appropriate for the verification task than the more commonly used neural networks. To perform text recognition on the localized regions, we propose a new multi-hypothesis method. Assuming different models of the text image, several segmentation hypotheses are produced. They are processed by an optical character recognition (OCR) system, and the result is selected from the generated strings according to a confidence value computed using language modeling and OCR statistics. Experiments show that this approach leads to much better results than the conventional method that tries to improve the individual segmentation algorithm. The whole system has been tested on several hours of video and showed good performance when integrated in a sports video annotation system and a video indexing system within the framework of two European projects.
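A hypothetical sketch of the multi-hypothesis selection step described above: each segmentation hypothesis of a text region is passed to an OCR engine, and the output string with the best combined language-model and OCR-confidence score is kept. Here run_ocr, the bigram table, and the weighting scheme are stand-ins, not the paper's actual components.

import math

def score_hypothesis(text, char_bigram_logprob, ocr_confidence, alpha=0.5):
    # Combine a character-bigram language-model score with the OCR confidence.
    lm = sum(char_bigram_logprob.get((a, b), math.log(1e-6))
             for a, b in zip(text, text[1:]))
    return alpha * lm + (1 - alpha) * math.log(max(ocr_confidence, 1e-6))

def best_string(segmentation_hypotheses, run_ocr, char_bigram_logprob):
    # run_ocr(image) is assumed to return a (text, confidence) pair.
    results = [run_ocr(img) for img in segmentation_hypotheses]
    return max(results,
               key=lambda r: score_hypothesis(r[0], char_bigram_logprob, r[1]))[0]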


Computer Vision and Pattern Recognition | 2007

Multi-Layer Background Subtraction Based on Color and Texture

Jian Yao; Jean-Marc Odobez

In this paper, we propose a robust multi-layer background subtraction technique which takes advantage of local texture features represented by local binary patterns (LBP) and photometrically invariant color measurements in RGB color space. LBP works robustly with respect to light variation on richly textured regions but not as efficiently on uniform regions; in the latter case, color information should overcome this limitation of LBP. Due to the illumination invariance of both the LBP feature and the selected color feature, the method is able to handle local illumination changes such as cast shadows from moving objects. Due to the use of a simple layer-based strategy, the approach can model moving background pixels with quasi-periodic flickering as well as background scenes which may vary over time due to the addition and removal of long-term stationary objects. Finally, the use of a cross-bilateral filter makes it possible to implicitly smooth detection results over regions of similar intensity while preserving object boundaries. Numerical and qualitative experimental results on both simulated and real data demonstrate the robustness of the proposed method.
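For concreteness, the snippet below computes the basic 8-neighbour LBP code used as the texture feature; it is an illustration only and does not reproduce the paper's multi-layer model, color features, or thresholds.

def lbp8(gray, y, x):
    # Local binary pattern at an interior pixel (y, x) of a 2-D grayscale array:
    # threshold the 8 neighbours against the centre value and pack the bits.
    c = gray[y][x]
    neighbours = [gray[y - 1][x - 1], gray[y - 1][x], gray[y - 1][x + 1],
                  gray[y][x + 1], gray[y + 1][x + 1], gray[y + 1][x],
                  gray[y + 1][x - 1], gray[y][x - 1]]
    return sum(1 << i for i, n in enumerate(neighbours) if n >= c)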


Computer Vision and Pattern Recognition | 2005

Using particles to track varying numbers of interacting people

Kevin Smith; Daniel Gatica-Perez; Jean-Marc Odobez

In this paper, we present a Bayesian framework for the fully automatic tracking of a variable number of interacting targets using a fixed camera. This framework uses a joint multi-object state-space formulation and a trans-dimensional Markov Chain Monte Carlo (MCMC) particle filter to recursively estimate the multi-object configuration and efficiently search the state space. We also define a global observation model composed of color and binary measurements capable of discriminating between different numbers of objects in the scene. We present results which show that our method is capable of tracking varying numbers of people through several challenging real-world tracking situations such as full/partial occlusion and entering/leaving the scene.
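The sketch below illustrates a single trans-dimensional move of the kind such a sampler performs: propose a birth, death, or update of one object and accept with a Metropolis-Hastings test. It is deliberately simplified; the proper reversible-jump acceptance ratio includes proposal and dimension-matching terms omitted here, and likelihood() stands in for the paper's global color/binary observation model.

import random

def mcmc_move(state, likelihood, propose_position):
    # state: dict mapping object id -> position; the proposal is a perturbed copy.
    proposal = dict(state)
    move = random.choice(["birth", "death", "update"]) if state else "birth"
    if move == "birth":
        proposal[max(state, default=0) + 1] = propose_position()
    elif move == "death":
        del proposal[random.choice(list(proposal))]
    else:
        proposal[random.choice(list(proposal))] = propose_position()
    # Simplified acceptance: likelihood ratio only (symmetric-proposal assumption).
    if random.random() < min(1.0, likelihood(proposal) / max(likelihood(state), 1e-12)):
        return proposal
    return state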


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

A Thousand Words in a Scene

Pedro Quelhas; Florent Monay; Jean-Marc Odobez; Daniel Gatica-Perez; Tinne Tuytelaars

This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multiclass scene classification tasks using a 9,500-image data set, that the BOV representation consistently outperforms classical scene classification approaches. In other data sets, we show that our approach competes with or outperforms other recent, more complex methods. We also show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, is discriminative for accurate classification, and is more robust than the BOV representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections.
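The latent-aspect decomposition at the heart of PLSA can be written in its standard form (the textbook formulation, not anything specific to this paper): each visterm w in an image d is generated through K latent aspects z,

P(w \mid d) = \sum_{z=1}^{K} P(w \mid z)\, P(z \mid d),

and the per-image mixture weights P(z | d) form the compact scene representation used for classification and ranking.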


Computer Vision and Pattern Recognition | 2005

Evaluating Multi-Object Tracking

Kevin Smith; Daniel Gatica-Perez; Jean-Marc Odobez; Silèye O. Ba

Multiple object tracking (MOT) is an active and challenging research topic. Many different approaches to the MOT problem exist, yet there is little agreement amongst the community on how to evaluate or compare these methods, and the amount of literature addressing this problem is limited. The goal of this paper is to address this issue by providing a comprehensive approach to the empirical evaluation of tracking performance. To that end, we explore the tracking characteristics important to measure in a real-life application, focusing on configuration (the number and location of objects in a scene) and identification (the consistent labeling of objects over time), and define a set of measures and a protocol to objectively evaluate these characteristics.
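As a generic illustration of the configuration side of such measures (the paper's own definitions and protocol are not reproduced here), the sketch below counts per-frame misses and false positives after greedily matching estimated to ground-truth positions within a distance gate.

def frame_config_errors(gt, est, max_dist=50.0):
    # gt, est: lists of (x, y) object positions for one frame.
    unmatched_gt, unmatched_est = list(gt), list(est)
    for g in list(unmatched_gt):
        if not unmatched_est:
            break
        e = min(unmatched_est, key=lambda p: (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2)
        if (e[0] - g[0]) ** 2 + (e[1] - g[1]) ** 2 <= max_dist ** 2:
            unmatched_gt.remove(g)
            unmatched_est.remove(e)
    return {"misses": len(unmatched_gt), "false_positives": len(unmatched_est)}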


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings

Daniel Gatica-Perez; Guillaume Lathoud; Jean-Marc Odobez; Iain A. McCowan

Tracking speakers in multiparty conversations constitutes a fundamental task for automatic meeting analysis. In this paper, we present a novel probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audiovisual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, that show that our framework 1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy, 2) can deal with cases of visual clutter and occlusion, and 3) significantly outperforms a traditional sampling-based approach.
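A minimal sketch of what a mixed-state particle might look like here, assuming conditional independence of the audio and visual likelihoods given the state (an illustration; the paper's interaction model and MCMC-PF inference are omitted):

from dataclasses import dataclass

@dataclass
class SpeakerParticle:
    x: float          # head position in the image
    y: float
    speaking: bool    # discrete speaking-activity component of the mixed state

def particle_log_weight(p, audio_loglik, visual_loglik):
    # Visual evidence scores the location; audio source-localization evidence
    # additionally supports particles whose speaking flag is on.
    return visual_loglik(p.x, p.y) + (audio_loglik(p.x, p.y) if p.speaking else 0.0)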


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Tracking the Visual Focus of Attention for a Varying Number of Wandering People

Kevin Smith; Silèye O. Ba; Jean-Marc Odobez; Daniel Gatica-Perez

In this paper, we define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W): determining where a person is looking when their movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding, cognitive science, and real-world applications. One such application, presented in this paper, monitors the attention passers-by pay to an outdoor advertisement using a single video camera. In our approach to the VFOA-W problem, we propose a multiperson tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting variable-dimensional state-space, we propose a Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme and a novel global observation model, which determines the number of people in the scene and their locations. To determine whether a person is looking at the advertisement, we propose Gaussian Mixture Model (GMM)-based and Hidden Markov Model (HMM)-based VFOA-W models, which use head pose and location information. Our models are evaluated for tracking performance and ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where up to three mobile observers pass in front of an advertisement.
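As a toy illustration of a GMM-style VFOA decision (parameters invented for the example, not the paper's trained models): a narrow Gaussian over head pan/tilt models the "focused on the advertisement" state, a broad one models everything else, and the higher likelihood wins.

import math

def gauss_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def looks_at_ad(pan, tilt,
                focused=((0.0, 100.0), (0.0, 80.0)),      # (mean, var) per angle
                unfocused=((0.0, 3000.0), (0.0, 2000.0))):
    lf = gauss_logpdf(pan, *focused[0]) + gauss_logpdf(tilt, *focused[1])
    lu = gauss_logpdf(pan, *unfocused[0]) + gauss_logpdf(tilt, *unfocused[1])
    return lf > lu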


International Conference on Pattern Recognition | 2004

A probabilistic framework for joint head tracking and pose estimation

Silèye O. Ba; Jean-Marc Odobez

Head tracking and pose estimation are usually considered as two sequential and separate problems: pose is estimated on the head patch provided by a tracking module. However, precision in head pose estimation depends on tracking accuracy, which could itself benefit from knowledge of the head orientation. This work therefore considers head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values.
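In standard Bayesian-filtering notation (not a formula taken from the paper), the mixed state couples the continuous tracking variables S_t (location, scale) with a discrete pose label \theta_t, and both are filtered jointly:

X_t = (S_t, \theta_t), \qquad p(X_t \mid y_{1:t}) \propto p(y_t \mid X_t) \sum_{\theta_{t-1}} \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid y_{1:t-1})\, dS_{t-1},

so that the pose-dependent appearance model p(y_t | X_t) can sharpen the track while the track, in turn, stabilizes the pose estimate.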


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition

Di Wu; Lionel Pigou; Pieter-Jan Kindermans; Nam Le; Ling Shao; Joni Dambre; Jean-Marc Odobez

This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth, and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques to further explore multimodal time series data.
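The reported score is a Jaccard index over frame sets, i.e., the intersection-over-union of predicted and ground-truth gesture frames (the challenge averages such per-gesture indices over sequences); a minimal sketch of the metric:

def jaccard_index(pred_frames, gt_frames):
    pred, gt = set(pred_frames), set(gt_frames)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0

# e.g. prediction covers frames 10-29, ground truth 15-34 -> 15/25 = 0.6
assert abs(jaccard_index(range(10, 30), range(15, 35)) - 0.6) < 1e-9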

Collaboration


Dive into Jean-Marc Odobez's collaborations.

Top Co-Authors

Daniel Gatica-Perez (École Polytechnique Fédérale de Lausanne)
Silèye O. Ba (Idiap Research Institute)
Rémi Emonet (Idiap Research Institute)
Nam Le (Idiap Research Institute)
Gulcan Can (Idiap Research Institute)