Isabelle Ferrané
University of Toulouse
Publications
Featured research published by Isabelle Ferrané.
Autonomous Robots | 2012
Brice Burger; Isabelle Ferrané; Frédéric Lerasle; Guillaume Infantes
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots directly interact with people, finding natural and easy-to-use user interfaces is of fundamental importance. This paper describes a flexible multimodal interface based on speech and gesture modalities to control our mobile robot named Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters devoted to the upper human body extremities to track and recognize pointing and symbolic gestures, both single- and two-handed. This framework constitutes our first contribution: it is shown to properly handle natural artifacts (self-occlusion, hands leaving the camera's field of view, hand deformation) when 3D gestures are performed with either hand or with both. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses results from the speech and gesture components. This interpreter is shown to improve the classification rates of multimodal commands compared to using either modality alone. Finally, we report on successful live experiments in human-centered settings. Results are reported in the context of an interactive manipulation task, where users specify local motion commands to Jido and perform safe object exchanges.
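As a rough illustration of the fusion idea described above, the sketch below combines per-command confidences from a speech recognizer and a gesture recognizer into a single ranking. The command labels, the log-linear weighting, and the independence assumption are illustrative only and do not reproduce the paper's multi-hypothesis interpreter.

```python
# Minimal sketch of probabilistic fusion of speech and gesture hypotheses.
# The command vocabulary, scores, and independence assumption are illustrative
# and do not reproduce the interpreter described in the paper.

def fuse_hypotheses(speech_scores, gesture_scores, w_speech=0.6, w_gesture=0.4):
    """Combine per-command confidences from both modalities into one ranking.

    speech_scores / gesture_scores: dict mapping command label -> confidence in [0, 1].
    A command missing from one modality gets a small floor score instead of zero,
    so a single silent modality cannot veto the other one.
    """
    floor = 1e-3
    commands = set(speech_scores) | set(gesture_scores)
    fused = {}
    for cmd in commands:
        s = speech_scores.get(cmd, floor)
        g = gesture_scores.get(cmd, floor)
        # Log-linear combination (weighted product) of the two modalities.
        fused[cmd] = (s ** w_speech) * (g ** w_gesture)
    # Normalize so the fused scores sum to one.
    total = sum(fused.values())
    return {cmd: score / total for cmd, score in fused.items()}


if __name__ == "__main__":
    speech = {"give me the bottle": 0.7, "take the bottle": 0.3}
    gesture = {"give me the bottle": 0.5, "point left": 0.5}
    for cmd, p in sorted(fuse_hypotheses(speech, gesture).items(),
                         key=lambda kv: -kv[1]):
        print(f"{p:.3f}  {cmd}")
```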
robot and human interactive communication | 2006
Aurélie Clodic; Sara Fleury; Rachid Alami; Raja Chatila; Gérard Bailly; Ludovic Brèthes; Maxime Cottret; Patrick Danès; Xavier Dollat; Frédéric Elisei; Isabelle Ferrané; Matthieu Herrb; Guillaume Infantes; Christian Lemaire; Frédéric Lerasle; Jérôme Manhes; Patrick Marcoul; Paulo Menezes; Vincent Montreuil
Rackham is an interactive robot guide that has been used in several places and exhibitions. This paper presents its design and reports on results obtained after its deployment in a permanent exhibition. The project is conducted so as to incrementally enhance the robot's functional and decisional capabilities, based on observation of the interaction between the public and the robot. Besides robustness and efficiency of the robot's navigation abilities in a dynamic environment, our focus was to develop and test a methodology to integrate human-robot interaction abilities in a systematic way. We first present the robot and some of its key design issues. Then, we discuss a number of lessons that we have drawn from its use in interaction with the public and how they will serve to refine our design choices and to enhance robot efficiency and acceptability.
international conference on computer vision systems | 2008
Brice Burger; Isabelle Ferrané; Frédéric Lerasle
Among the cognitive abilities a robot companion must be endowed with, human perception and speech understanding are both fundamental in the context of multimodal human-robot interaction. In order to provide a mobile robot with the visual perception of its user and with means to handle verbal and multimodal communication, we have developed and integrated two components. In this paper we focus on an interactively distributed multiple-object tracker dedicated to two-handed gestures and head location in 3D. Its relevance is highlighted by in- and off-line evaluations on data acquired by the robot. Implementation and preliminary experiments on a household robot companion, including speech recognition and understanding as well as basic fusion with gesture, are then demonstrated. The latter illustrate how vision can assist speech by specifying location references and object/person IDs in verbal statements, in order to interpret natural deictic commands given by humans. Extensions of our work are finally discussed.
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech | 2010
Benjamin Bigot; Isabelle Ferrané; Julien Pinquier; Régine André-Obrecht
In the audio indexing context, we present our recent contributions to the field of speaker role recognition, especially applied to conversational speech. We assume that there exist clues about roles like Anchor, Journalist or Other in temporal, acoustic and prosodic features extracted from the results of speaker segmentation and from the audio files. In this paper, investigations are carried out on the EPAC corpus, which mainly contains conversational documents. First, an automatic clustering approach is used to validate the proposed features and the role definitions. In a second study, we propose a hierarchical supervised classification system; the use of dimensionality reduction methods as well as feature selection is investigated. This system correctly classifies 92% of speaker roles.
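To make the feature extraction step more concrete, here is a minimal sketch of temporal descriptors that can be derived from a speaker segmentation (number of turns, speaking time, turn-length statistics). The feature names are illustrative; the paper's actual feature set and hierarchical classifier are richer.

```python
# A small sketch of the kind of temporal features that can be derived from a
# speaker segmentation for role recognition. Feature names are illustrative;
# the paper's feature set and classification hierarchy differ.

from collections import defaultdict
from statistics import mean

def temporal_features(segments):
    """segments: list of (speaker_id, start_s, end_s) from diarization output.
    Returns one small feature dictionary per speaker."""
    turns = defaultdict(list)
    for spk, start, end in segments:
        turns[spk].append(end - start)
    doc_duration = max(end for _, _, end in segments)
    feats = {}
    for spk, durs in turns.items():
        feats[spk] = {
            "n_turns": len(durs),
            "total_time": sum(durs),
            "mean_turn": mean(durs),
            "time_ratio": sum(durs) / doc_duration,  # share of the document
        }
    return feats

if __name__ == "__main__":
    seg = [("spk1", 0, 30), ("spk2", 30, 45), ("spk1", 45, 60), ("spk3", 60, 70)]
    for spk, f in temporal_features(seg).items():
        print(spk, f)
```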
EURASIP Journal on Image and Video Processing | 2011
Zein Al Abidin Ibrahim; Isabelle Ferrané; Philippe Joly
We propose a novel approach for video classification based on the analysis of the temporal relationships between the basic events in audiovisual documents. Starting from basic segmentation results, we define a new representation method called the Temporal Relation Matrix (TRM). Each document is then described by a set of TRMs, the analysis of which brings out higher-level events. This representation was first designed to analyze any audiovisual document in order to find events that characterize its content and structure. The aim of this work is to use this representation to compute a similarity measure between two documents. Approaches for audiovisual document classification are presented and discussed. Experiments are conducted on a set of 242 video documents, and the results show the effectiveness of our proposals.
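The sketch below illustrates the general idea of a matrix of temporal relations between the events of two segmentations. The relation set (a coarse subset of Allen's interval relations) and the normalization are assumptions for illustration, not the exact TRM construction used in the paper.

```python
# Hedged sketch: counting pairwise temporal relations between events of two
# segmentations into a small normalized histogram. The relation set and the
# normalization are illustrative, not the paper's exact TRM definition.

RELATIONS = ["before", "meets", "overlaps", "contains", "equals", "after"]

def relation(a, b):
    """Coarse temporal relation between intervals a=(s1, e1) and b=(s2, e2)."""
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 <= s2 and e1 >= e2:
        return "contains"
    if s1 > e2:
        return "after"
    return "overlaps"

def temporal_relation_matrix(events_a, events_b):
    """Histogram of relations between every pair of events (one from each list)."""
    counts = {r: 0 for r in RELATIONS}
    for a in events_a:
        for b in events_b:
            counts[relation(a, b)] += 1
    total = sum(counts.values()) or 1
    return {r: c / total for r, c in counts.items()}

if __name__ == "__main__":
    speech = [(0, 5), (10, 20)]   # e.g. speech segments (seconds)
    faces = [(3, 8), (12, 18)]    # e.g. face-detection segments
    print(temporal_relation_matrix(speech, faces))
```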
Computer Vision and Image Understanding | 2016
Christophe Mollaret; Alhayat Ali Mekonnen; Frédéric Lerasle; Isabelle Ferrané; Julien Pinquier; Blandine Boudet; Pierre Rumeau
Highlights: We present a complete multi-modal perception-driven, non-intrusive domestic robotic system for the elderly. We present a novel multi-modal user intention-for-interaction detection modality. A fusion method is presented to improve speech recognition given the user's position, the available sensors, and the recognition tools. We present details of the complete implemented system along with relevant evaluations that demonstrate the soundness of the framework via an exemplar application whereby the robot helps the user find hidden or misplaced objects in his/her living place. The proposed framework is further investigated through user studies involving 17 elderly participants.

In this paper, we present a multi-modal perception-based framework to realize a non-intrusive domestic assistive robotic system. It is non-intrusive in that it only starts interaction with a user when it detects the user's intention to do so. All the robot's actions are based on multi-modal perception, which includes user detection based on RGB-D data, user intention-for-interaction detection with RGB-D and audio data, and communication via user-distance-mediated speech recognition. The use of multi-modal cues in different parts of the robotic activity paves the way to successful robotic runs (94% success rate). Each perceptual component is systematically evaluated using an appropriate dataset and evaluation metrics. Finally, the complete system is fully integrated on the PR2 robotic platform and validated through system sanity-check runs and user studies with the help of 17 volunteer elderly participants.
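As a toy illustration of distance-mediated speech recognition, the sketch below weights competing recognizer outputs by the user's estimated distance to each microphone. The reliability model, decay constant, and example hypotheses are assumptions; the fusion method in the paper is more elaborate.

```python
# Illustrative sketch only: weighting several recognizer outputs by the user's
# distance to each microphone. The exponential-decay reliability model is an
# assumption, not the fusion scheme described in the paper.

import math

def distance_weight(distance_m, half_distance_m=2.0):
    """Reliability decays with distance; it halves every `half_distance_m` metres."""
    return math.pow(0.5, distance_m / half_distance_m)

def fuse_transcripts(hypotheses):
    """hypotheses: list of (transcript, asr_confidence, user_distance_m).
    Returns the transcript with the highest distance-weighted confidence."""
    scored = [(conf * distance_weight(dist), text)
              for text, conf, dist in hypotheses]
    return max(scored)[1]

if __name__ == "__main__":
    hyps = [("where are my kiwis", 0.8, 3.0),  # far-field microphone
            ("where are my keys", 0.6, 0.5)]   # close-talking microphone
    print(fuse_transcripts(hyps))              # -> "where are my keys"
```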
Multimedia Tools and Applications | 2012
Benjamin Bigot; Isabelle Ferrané; Julien Pinquier; Régine André-Obrecht
In the field of automatic audiovisual content-based indexing and structuring, finding events like interviews, debates, reports, or live commentaries requires bridging the gap between low-level feature extraction and such high-level event detection. In our work, we consider that detecting speaker roles like Anchor, Journalist and Other is a first step towards enriching interaction sequences between speakers. Our work relies on the assumption that clues about speaker roles exist in temporal, prosodic and basic signal features extracted from audio files and from speaker segmentations. Each speaker is therefore represented by a 36-feature vector. Contrary to most state-of-the-art approaches, we do not use the structure of the document to recognize the roles of the speakers. We investigate the influence of two dimensionality reduction techniques (Principal Component Analysis and Linear Discriminant Analysis) and different classification methods (Gaussian Mixture Models, k-nearest neighbours and Support Vector Machines). Experiments are conducted on the 13-hour corpus of the ESTER2 evaluation campaign. The best result reaches about 82% of correctly recognized roles, which corresponds to more than 89% of speech duration correctly labelled.
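A minimal sketch of the kind of pipeline compared in the paper (dimensionality reduction followed by a classifier), here written with scikit-learn on synthetic data standing in for the 36-dimensional speaker descriptors; the actual features, corpus, and tuning differ.

```python
# Sketch of a dimensionality-reduction + classification pipeline, using
# scikit-learn and synthetic features as stand-ins for the paper's
# 36-dimensional speaker descriptors and the ESTER2 corpus.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 36))        # 300 speakers x 36 features (synthetic)
y = rng.integers(0, 3, size=300)      # roles: 0=Anchor, 1=Journalist, 2=Other

pipeline = make_pipeline(
    StandardScaler(),                 # features live on very different scales
    PCA(n_components=10),             # reduce 36 dimensions to 10
    SVC(kernel="rbf", C=1.0),         # one of the classifiers compared
)

scores = cross_val_score(pipeline, X, y, cv=5)
print(f"mean accuracy on synthetic data: {scores.mean():.2f}")
```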
intelligent robots and systems | 2008
Brice Burger; Frédéric Lerasle; Isabelle Ferrané; Aurélie Clodic
Among the cognitive abilities a robot companion must be endowed with, human perception and speech understanding are both fundamental in the context of multimodal human-robot interaction. First, we propose a multiple-object visual tracker, interactively distributed and dedicated to two-handed gestures and head location in 3D. An on-board speech understanding system is also developed in order to process deictic and anaphoric utterances. Characteristics and performances of each of the two components are presented. Finally, integration and experiments on a robot companion highlight the relevance and complementarity of our multimodal interface. An outlook on future work is finally discussed.
content based multimedia indexing | 2008
Benjamin Bigot; Isabelle Ferrané; Zein Al Abidin Ibrahim
Giving access to the semantically rich content of large amounts of digital audiovisual data using an automatic and generic method is still an important challenge. The aim of our work is to address this issue while focusing on temporal aspects. Our approach is based on a method previously developed for analyzing temporal relations from a data-mining point of view. This method is used to detect zones of a document in which two characteristics are active. These characteristics can result from low-level segmentations of the audio or video components, or from more semantic processing. Once “activity zones” have been detected, we propose to compute a set of additional descriptors in order to better characterize them. The method is applied in the scope of the EPAC project, which focuses on the detection and characterization of conversational speech.
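A small sketch of the zone-detection step: given two segmentations as interval lists, the zones where both characteristics are simultaneously active are their intersections, and simple descriptors can then be computed on them. The descriptors shown are illustrative examples, not the project's actual descriptor set.

```python
# Minimal sketch: finding zones where two characteristics are simultaneously
# active, given their segmentations as interval lists (in seconds).
# The descriptors computed afterwards are illustrative examples only.

def activity_zones(track_a, track_b):
    """Intersect two interval lists and return the zones where both are active."""
    zones = []
    for s1, e1 in track_a:
        for s2, e2 in track_b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                zones.append((start, end))
    return zones

def describe(zones):
    """A few simple descriptors for the detected zones."""
    durations = [e - s for s, e in zones]
    return {"n_zones": len(zones),
            "total_duration": sum(durations),
            "longest": max(durations, default=0.0)}

if __name__ == "__main__":
    speech = [(0, 12), (20, 35)]   # e.g. speech activity
    music = [(8, 25)]              # e.g. music activity
    zones = activity_zones(speech, music)
    print(zones, describe(zones))
```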
international conference on image processing | 2014
Christophe Mollaret; Frédéric Lerasle; Isabelle Ferrané; Julien Pinquier
Visual tracking is a dynamic optimization problem in which time and object state simultaneously influence the solution. In this paper, we build a tracker from an evolutionary optimization approach, the PSO (Particle Swarm Optimization) algorithm. We demonstrate that an extension of the original algorithm, in which system dynamics are explicitly taken into account, can perform efficient tracking. This tracker is also shown to outperform the SIR (Sampling Importance Resampling) algorithm with random-walk and constant-velocity models, as well as a previous PSO-inspired tracker, SPSO (Sequential Particle Swarm Optimization). Experiments were performed both on simulated data and on real visual RGB-D data. Our PSO-inspired tracker can be a very effective and robust alternative for visual tracking.
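To give an idea of how a PSO-based tracker update works, here is a toy sketch for a single frame: the swarm is re-seeded around the previous state (a crude stand-in for the dynamics handling) and iteratively pulled toward the best-scoring positions. The fitness function is a placeholder for a real appearance likelihood and does not reflect the paper's observation model.

```python
# Toy sketch of a PSO-style tracker update for one frame. The fitness function
# (distance to a hidden ground-truth position) stands in for a real appearance
# likelihood, and the dynamics handling is simplified compared to the paper.

import numpy as np

def pso_track(fitness, prev_state, n_particles=30, n_iters=20,
              w=0.7, c1=1.5, c2=1.5, spread=10.0, seed=0):
    rng = np.random.default_rng(seed)
    # Re-seed the swarm around the previous state (crude dynamic prior).
    pos = prev_state + rng.normal(scale=spread, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, 1))
        # Standard PSO velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

if __name__ == "__main__":
    true_pos = np.array([120.0, 80.0])            # hidden target position
    fitness = lambda s: -np.linalg.norm(s - true_pos)
    estimate = pso_track(fitness, prev_state=np.array([100.0, 100.0]))
    print("estimated position:", estimate.round(1))
```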