Silèye O. Ba
Idiap Research Institute
Publications
Featured research published by Silèye O. Ba.
computer vision and pattern recognition | 2005
Kevin Smith; Daniel Gatica-Perez; Jean-Marc Odobez; Silèye O. Ba
Multiple object tracking (MOT) is an active and challenging research topic. Many different approaches to the MOT problem exist, yet there is little agreement amongst the community on how to evaluate or compare these methods, and the amount of literature addressing this problem is limited. The goal of this paper is to address this issue by providing a comprehensive approach to the empirical evaluation of tracking performance. To that end, we explore the tracking characteristics important to measure in a real-life application, focusing on configuration (the number and location of objects in a scene) and identification (the consistent labeling of objects over time), and define a set of measures and a protocol to objectively evaluate these characteristics.
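To make the configuration measures concrete, here is a minimal Python sketch of frame-level configuration errors (false positives and misses) computed by distance-gated matching. The greedy matching and the gate `tau` are illustrative assumptions, not the paper's exact protocol, which also defines identification measures for consistent labeling over time.

```python
# Illustrative sketch of frame-level configuration measures for MOT
# evaluation (false positives, misses). The greedy matching and the
# distance gate `tau` are simplifying assumptions, not the paper's
# exact protocol.
import math

def frame_config_errors(gt, est, tau=50.0):
    """gt, est: lists of (x, y) object centers for one frame."""
    unmatched_gt = list(gt)
    false_pos = 0
    for ex, ey in est:
        # Greedily match each estimate to the nearest unmatched ground truth.
        best, best_d = None, tau
        for i, (gx, gy) in enumerate(unmatched_gt):
            d = math.hypot(ex - gx, ey - gy)
            if d < best_d:
                best, best_d = i, d
        if best is None:
            false_pos += 1          # estimate with no nearby object
        else:
            unmatched_gt.pop(best)  # matched: consume the ground truth
    misses = len(unmatched_gt)      # objects no estimate accounted for
    return false_pos, misses

print(frame_config_errors(gt=[(10, 10), (200, 50)], est=[(12, 11), (400, 400)]))
# -> (1, 1): one spurious estimate, one missed object
```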
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008
Kevin Smith; Silèye O. Ba; Jean-Marc Odobez; Daniel Gatica-Perez
In this paper, we define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), determining where a person is looking when their movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding, cognitive science, and real-world applications. One such application, presented in this paper, monitors the attention passers-by pay to an outdoor advertisement using a single video camera. In our approach to the VFOA-W problem, we propose a multiperson tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting variable-dimensional state space, we propose a Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme and a novel global observation model, which determines the number of people in the scene and their locations. To determine whether a person is looking at the advertisement, we propose Gaussian Mixture Model (GMM)-based and Hidden Markov Model (HMM)-based VFOA-W models, which use head pose and location information. Our models are evaluated for tracking performance and for the ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where up to three mobile observers pass in front of an advertisement.
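As a rough illustration of the GMM-based VFOA-W decision, the sketch below compares the likelihood of a head pose under a Gaussian centred on the pose expected when looking at the advertisement against an "elsewhere" model. All means, variances, and priors are invented for illustration; the paper learns such models from data and couples them with the tracker's location output.

```python
# Minimal sketch of a GMM-style VFOA-W decision: given a head pose
# (pan, tilt) in degrees, compare the likelihood of a Gaussian centred
# on the pose expected when looking at the advertisement against an
# "elsewhere" Gaussian. All means, variances, and priors are
# illustrative assumptions, not values from the paper.
import math

def gauss2d(x, y, mx, my, sx, sy):
    """Axis-aligned 2-D Gaussian density."""
    return (math.exp(-0.5 * (((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2))
            / (2 * math.pi * sx * sy))

def looking_at_ad(pan, tilt):
    p_ad = 0.3 * gauss2d(pan, tilt, mx=-20.0, my=10.0, sx=8.0, sy=6.0)
    p_other = 0.7 * gauss2d(pan, tilt, mx=0.0, my=0.0, sx=40.0, sy=25.0)
    return p_ad > p_other

print(looking_at_ad(-18.0, 8.0))   # True: pose consistent with the ad
print(looking_at_ad(30.0, -5.0))   # False: looking elsewhere
```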
systems man and cybernetics | 2009
Silèye O. Ba; Jean-Marc Odobez
We address the problem of recognizing the visual focus of attention (VFOA) of meeting participants based on their head pose. To this end, the head pose observations are modeled using a Gaussian mixture model (GMM) or a hidden Markov model (HMM) whose hidden states correspond to the VFOA. The novelties of this paper are threefold. First, contrary to previous studies on the topic, in our setup the potential VFOA of a person is not restricted to the other participants; it also includes environmental targets (a table and a projection screen), which increases the complexity of the task, with VFOA targets spread across both the pan and the tilt dimensions of the gaze space. Second, we propose a geometric model to set the GMM or HMM parameters by exploiting results from cognitive science on saccadic eye motion, which allows the head pose to be predicted given a gaze target. Third, an unsupervised parameter adaptation step, requiring no labeled data, is proposed to account for the specific gazing behavior of each participant. Using a publicly available corpus of eight meetings featuring four persons, we analyze the above methods by evaluating, through objective performance measures, the recognition of the VFOA from head pose information obtained either with a magnetic sensor device or with a vision-based tracking system. The results clearly show that in such complex but realistic situations, VFOA recognition performance depends strongly on how well the visual targets are separated for a given meeting participant. In addition, the results show that the geometric model with unsupervised adaptation achieves better results than using training data to set the HMM parameters.
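The geometric model can be illustrated in a few lines: cognitive studies of saccadic gaze shifts suggest the head performs only a fraction of a gaze rotation, the eyes covering the rest, so the mean head pose for each VFOA target can be predicted from the target's gaze direction. The fractions `kappa_pan` and `kappa_tilt` and the reference direction below are illustrative assumptions, not the paper's fitted values.

```python
# Sketch of the kind of geometric prediction the paper exploits: the head
# rotates only part of the way towards a gaze target, starting from a
# resting reference direction. `kappa_*` and the reference are
# illustrative assumptions.
def predict_head_pose(gaze_pan, gaze_tilt, ref_pan=0.0, ref_tilt=0.0,
                      kappa_pan=0.6, kappa_tilt=0.4):
    """Predict the mean head pose (degrees) for a person gazing at a target."""
    head_pan = ref_pan + kappa_pan * (gaze_pan - ref_pan)
    head_tilt = ref_tilt + kappa_tilt * (gaze_tilt - ref_tilt)
    return head_pan, head_tilt

# A target 50 deg to the left and 20 deg down yields a smaller head rotation;
# such predicted means can initialise the Gaussian of each VFOA state.
print(predict_head_pose(-50.0, -20.0))  # -> (-30.0, -8.0)
```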
international conference on pattern recognition | 2004
Silèye O. Ba; Jean-Marc Odobez
Head tracking and pose estimation are usually treated as two sequential, separate problems: pose is estimated on the head patch provided by a tracking module. However, the precision of head pose estimation depends on tracking accuracy, which could itself benefit from knowledge of the head orientation. This work therefore treats head tracking and pose estimation as two coupled problems in a probabilistic setting. Head pose models are learned and incorporated into a mixed-state particle filter framework for joint head tracking and pose estimation. Experimental results on real sequences show the effectiveness of the method in estimating more stable and accurate pose values.
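The mixed-state idea, each particle carrying both a continuous configuration and a discrete pose label that are sampled and weighted jointly, can be sketched as follows. The 1-D state, the toy likelihood, and the transition probabilities are illustrative assumptions; the paper uses learned appearance-based pose models on image patches.

```python
# Toy sketch of a mixed-state particle filter: each particle is a
# (location, pose, weight) triple. The 1-D state and the likelihood
# below are illustrative assumptions, not the paper's models.
import random

POSES = 3  # e.g. discretised head orientations

def step(particles, observe):
    """particles: list of (x, pose, weight); observe(x, pose) -> likelihood."""
    new = []
    for x, pose, _ in particles:
        x2 = x + random.gauss(0.0, 1.0)           # continuous dynamics
        pose2 = pose if random.random() < 0.8 else random.randrange(POSES)
        new.append((x2, pose2, observe(x2, pose2)))
    total = sum(w for _, _, w in new) or 1e-12
    new = [(x, p, w / total) for x, p, w in new]
    # Multinomial resampling focuses particles on likely (location, pose) pairs.
    picked = random.choices(new, weights=[w for _, _, w in new], k=len(new))
    return [(x, p, 1.0 / len(picked)) for x, p, _ in picked]

def toy_likelihood(x, pose):
    # True target near x = 5 with pose 1 (purely illustrative).
    return 1.0 / (1.0 + (x - 5.0) ** 2 + (0.0 if pose == 1 else 2.0))

pf = [(random.uniform(0, 10), random.randrange(POSES), 1.0) for _ in range(200)]
for _ in range(20):
    pf = step(pf, toy_likelihood)
poses = [p for _, p, _ in pf]
print(max(set(poses), key=poses.count))  # most common pose after filtering: 1
```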
acm multimedia | 2007
Hayley Hung; Dinesh Babu Jayagopi; Chuohao Yeo; Gerald Friedland; Silèye O. Ba; Jean-Marc Odobez; Kannan Ramchandran; Nikki Mirghafori; Daniel Gatica-Perez
The automated extraction of semantically meaningful information from multi-modal data is becoming increasingly necessary due to the growing volume of data captured for archival. A novel area of multi-modal data labelling, which has received relatively little attention, is the automatic estimation of the most dominant person in a group meeting. In this paper, we provide a framework for detecting dominance in group meetings using different audio and video cues. We show that, by using a simple model for dominance estimation, we can obtain promising results.
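In the spirit of the simple model mentioned above, a baseline dominance estimator can be as plain as ranking participants by an accumulated nonverbal cue such as total speaking time. The binary speaking-status input below is an assumed preprocessing output, not the paper's exact feature set.

```python
# A deliberately simple dominance estimator: rank participants by total
# speaking activity and return the maximum. The per-frame 0/1 flags are
# an assumed preprocessing output (e.g. from a speaker diarisation step).
def most_dominant(speaking_status):
    """speaking_status: dict person -> list of per-frame 0/1 speaking flags."""
    totals = {p: sum(flags) for p, flags in speaking_status.items()}
    return max(totals, key=totals.get)

print(most_dominant({
    "A": [1, 1, 1, 0, 1],
    "B": [0, 0, 1, 0, 0],
    "C": [0, 1, 0, 1, 0],
}))  # -> "A"
```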
international conference on multimodal interfaces | 2008
Hayley Hung; Dinesh Babu Jayagopi; Silèye O. Ba; Jean-Marc Odobez; Daniel Gatica-Perez
We study the automation of the visual dominance ratio (VDR), a classic measure of displayed dominance in the social psychology literature that combines both gaze and speaking-activity cues. The VDR is modified to estimate dominance in multi-party group discussions where natural verbal exchanges are possible and other visual targets, such as a table and a slide screen, are present. Our findings suggest that fully automated versions of these measures can effectively estimate the most dominant person in a meeting and can match the dominance estimation performance obtained when manual labels of visual attention are used.
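The classic VDR can be written down in a few lines: the proportion of time a person looks at others while speaking, divided by the proportion of time they look at others while listening. The per-frame binary inputs below stand in for either the automatic or the manual annotations discussed in the paper.

```python
# Sketch of the classic visual dominance ratio (VDR): looking-while-speaking
# rate divided by looking-while-listening rate. Inputs are assumed to be
# per-frame binary annotations.
def vdr(speaking, looking_at_others):
    """speaking, looking_at_others: equal-length lists of 0/1 flags."""
    look_speak = sum(l for s, l in zip(speaking, looking_at_others) if s)
    look_listen = sum(l for s, l in zip(speaking, looking_at_others) if not s)
    n_speak = sum(speaking) or 1
    n_listen = (len(speaking) - sum(speaking)) or 1
    ratio_listen = (look_listen / n_listen) or 1e-6  # avoid division by zero
    return (look_speak / n_speak) / ratio_listen

# A person who gazes at others mostly while holding the floor scores high.
print(round(vdr(speaking=[1, 1, 1, 0, 0, 0],
                looking_at_others=[1, 1, 0, 0, 1, 0]), 2))  # -> 2.0
```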
IEEE Transactions on Image Processing | 2006
Jean-Marc Odobez; Daniel Gatica-Perez; Silèye O. Ba
Particle filtering (PF) is now established as one of the most popular methods for visual tracking. Within this framework, two assumptions are generally made. The first is that the data are temporally independent given the sequence of object states; the second is the use of the transition prior as the proposal distribution. In this paper, we argue that the first assumption does not strictly hold and that the second can be improved. We propose to handle both modeling issues using motion. Explicit motion measurements are used to drive the sampling process towards the new interesting regions of the image, while implicit motion measurements are introduced into the likelihood evaluation to model the data correlation term. The proposed model makes it possible to handle abrupt motion changes and to filter out visual distractors when tracking objects with generic shape-based models. Experimental comparisons against the CONDENSATION algorithm demonstrate superior tracking performance.
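The motion-driven proposal can be sketched as a mixture: some particles are drawn from the usual transition prior, the rest around the location predicted by an explicit motion measurement, which is what lets the filter follow abrupt motion changes. The 1-D state and the mixing weight `alpha` below are illustrative assumptions.

```python
# Sketch of a motion-driven mixture proposal for particle filtering.
# The 1-D state, noise levels, and `alpha` are illustrative assumptions.
import random

def propose(x_prev, motion_estimate, alpha=0.5, sigma_prior=2.0, sigma_motion=0.5):
    """Sample the next 1-D state from a mixture proposal."""
    if random.random() < alpha:
        # Transition-prior component: diffuse around the previous state.
        return random.gauss(x_prev, sigma_prior)
    # Motion component: tight around the motion-predicted location.
    # (A full filter would divide the likelihood by the proposal density
    # when computing importance weights.)
    return random.gauss(x_prev + motion_estimate, sigma_motion)

samples = [propose(x_prev=0.0, motion_estimate=8.0) for _ in range(5)]
print([round(s, 1) for s in samples])  # a mix near 0.0 and near 8.0
```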
international conference on multimodal interfaces | 2008
Dinesh Babu Jayagopi; Silèye O. Ba; Jean-Marc Odobez; Daniel Gatica-Perez
This paper addresses the automatic estimation of two aspects of social verticality (status and dominance) in small-group meetings using nonverbal cues. The correlation of nonverbal behavior with these social constructs has been extensively documented in social psychology, but its value for computational models is, in many cases, still unknown. We present a systematic study of automatically extracted cues, including vocalic, visual activity, and visual attention cues, and investigate their relative effectiveness in predicting both the most dominant person and the high-status project manager from relatively short observations. We use five hours of task-oriented meeting data with natural behavior for our experiments. Our work suggests that, although dominance and role-based status are related concepts, they are not equivalent and are thus not equally well explained by the same nonverbal cues. Furthermore, the best cues can correctly predict the person with the highest dominance or role-based status with an accuracy of approximately 70%.
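A single-cue prediction rule of the kind evaluated here is easy to state: per meeting, predict as most dominant the person with the largest value of one automatically extracted cue, and score the fraction of meetings where this matches the ground truth. The cue values below are invented purely for illustration.

```python
# Sketch of a single-cue, argmax prediction rule with meeting-level
# accuracy. All cue values and labels below are illustrative, not data
# from the paper.
def cue_accuracy(meetings):
    """meetings: list of (cue_values_by_person, true_label) pairs."""
    hits = sum(max(cues, key=cues.get) == truth for cues, truth in meetings)
    return hits / len(meetings)

meetings = [
    ({"A": 0.9, "B": 0.4, "C": 0.2}, "A"),
    ({"A": 0.3, "B": 0.8, "C": 0.5}, "B"),
    ({"A": 0.6, "B": 0.7, "C": 0.4}, "A"),  # the cue fails on this meeting
]
print(round(cue_accuracy(meetings), 2))  # -> 0.67
```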
international conference on machine learning | 2006
Silèye O. Ba; Jean-Marc Odobez
This paper presents a study of the recognition of the visual focus of attention (VFOA) of meeting participants based on their head pose. Contrary to previous studies on the topic, in our set-up the potential VFOA of people is not restricted to the other meeting participants; it also includes environmental targets (table, slide screen). This has two consequences. First, it increases the number of possible ambiguities in identifying the VFOA from the head pose. Second, due to our particular set-up, identifying the VFOA from head pose cannot rely on an incomplete representation of the pose (the pan alone) but requires the full head pointing information (pan and tilt). Using a corpus of 8 meetings of 8 minutes on average, featuring 4 persons discussing statements projected on a slide screen, we analyze the above issues by evaluating, through numerical performance measures, the recognition of the VFOA from head pose information obtained either from a magnetic sensor device (the ground truth) or from a vision-based tracking system (head pose estimates). The results clearly show that in complex but realistic situations, it is quite optimistic to believe that VFOA recognition can be based solely on the head pose, as some previous studies had suggested.
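The ambiguity highlighted above is easy to reproduce: in this set-up, the slide screen and the table can share almost the same pan and differ mainly in tilt, so a pan-only classifier cannot separate them. The target means below are illustrative assumptions, not measured values.

```python
# Sketch of the pan-only ambiguity: two targets with nearly identical pan
# but very different tilt. Target means (degrees) are illustrative.
TARGETS = {
    "participant_left": (-45.0, 0.0),
    "slide_screen": (10.0, 15.0),   # roughly the same pan as the table...
    "table": (10.0, -30.0),         # ...but a very different tilt
}

def nearest_vfoa(pan, tilt, use_tilt=True):
    def dist(mean):
        d_pan = (pan - mean[0]) ** 2
        d_tilt = (tilt - mean[1]) ** 2
        return (d_pan + d_tilt) if use_tilt else d_pan
    return min(TARGETS, key=lambda t: dist(TARGETS[t]))

print(nearest_vfoa(8.0, -25.0, use_tilt=True))   # -> "table"
print(nearest_vfoa(8.0, -25.0, use_tilt=False))  # -> "slide_screen": pan alone
                                                 #    confuses the two targets
```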
international conference on multimedia and expo | 2005
Iain A. McCowan; Maganti Hari Krishna; Daniel Gatica-Perez; Darren Moore; Silèye O. Ba
Close-talk headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide the higher signal-to-noise ratio needed for recognition tasks compared with single distant microphones. However, in multi-party conversational settings such as meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to those obtained from single close-talking and table-top microphones.
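The tracker-to-beamformer integration can be illustrated with a minimal delay-and-sum beamformer: given the tracked speaker position, each microphone channel is delayed so the speaker's wavefronts align, and the channels are averaged. The integer-sample delays and the toy geometry below are simplifying assumptions; the paper's system uses a full microphone-array beamformer.

```python
# Minimal delay-and-sum beamformer sketch. Integer-sample delays and the
# two-microphone geometry are simplifying assumptions for illustration.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, source_pos, fs):
    """signals: (n_mics, n_samples) array; positions in metres; fs in Hz."""
    dists = [np.linalg.norm(np.array(source_pos) - np.array(m))
             for m in mic_positions]
    # Delay each channel so wavefronts from the source line up.
    delays = [int(round((d - min(dists)) / SPEED_OF_SOUND * fs)) for d in dists]
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out[:signals.shape[1] - d] += sig[d:]
    return out / len(signals)

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
# Two mics; the second is 0.17 m further from the source (~8 samples late).
sigs = np.stack([clean, np.roll(clean, 8)])
enhanced = delay_and_sum(sigs, [(0, 0, 0), (0.17, 0, 0)], (-1.0, 0, 0), fs)
print(enhanced.shape)  # (16000,): a single aligned, averaged channel
```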