Julien Pinquier
University of Toulouse
Publications
Featured research published by Julien Pinquier.
Multimedia Tools and Applications | 2014
Svebor Karaman; Jenny Benois-Pineau; Vladislavs Dovgalecs; Rémi Mégret; Julien Pinquier; Régine André-Obrecht; Yann Gaëstel; Jean-François Dartigues
This paper presents a method for indexing activities of daily living in videos acquired from wearable cameras. It addresses the problem of analyzing the complex multimedia data acquired from wearable devices, a growing concern given the increasing amount of such data. In the context of dementia diagnosis, patient activities are recorded at home using a lightweight wearable device and later visualized by medical practitioners. This recording mode poses great challenges since the video data consists of a single sequence shot in which strong motion and sharp lighting changes often appear. Because of the length of the recordings, tools for efficient navigation in terms of activities of interest are crucial. Our work introduces a video structuring approach that combines automatic motion-based segmentation of the video with activity recognition by a hierarchical two-level Hidden Markov Model. We define a multimodal description space over visual and audio features, including mid-level features such as motion, location, speech and noise detections. We show their complementarity globally as well as for specific activities. Experiments on real data obtained from recordings of several patients at home show the difficulty of the task and the promising results of the proposed approach.
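To make the segment-classification step concrete, here is a minimal sketch of per-activity HMM scoring over multimodal segment features, assuming the hmmlearn library. Feature extraction (motion, location, speech and noise descriptors) is stubbed out, and the paper's hierarchical two-level structure is reduced to one flat HMM per activity.

```python
import numpy as np
from hmmlearn import hmm

def train_activity_models(features_by_activity, n_states=3):
    """Fit one Gaussian HMM per activity of daily living.
    features_by_activity maps an activity label to a list of
    (frames x features) sequences of multimodal descriptors."""
    models = {}
    for activity, sequences in features_by_activity.items():
        X = np.vstack(sequences)               # stack all training sequences
        lengths = [len(s) for s in sequences]  # sequence boundaries for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=25)
        m.fit(X, lengths)
        models[activity] = m
    return models

def classify_segment(models, segment_features):
    """Label a video segment with the activity whose HMM scores it highest."""
    return max(models, key=lambda a: models[a].score(segment_features))
```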
ACM Multimedia | 2010
Rémi Mégret; Vladislavs Dovgalecs; Hazem Wannous; Svebor Karaman; Jenny Benois-Pineau; Elie Khoury; Julien Pinquier; Philippe Joly; Régine André-Obrecht; Yann Gaëstel; Jean-François Dartigues
In this paper, we describe a new application of multimedia indexing: a system that monitors the instrumental activities of daily living to assess the cognitive decline caused by dementia. The system is composed of a wearable camera device designed to capture audio and video data of the instrumental activities of a patient, combined with multimedia indexing techniques that allow medical specialists to analyze several-hour-long observation shots efficiently.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Elie El-Khoury; Christine Senac; Julien Pinquier
In this paper, we investigate new approaches to improve speech activity detection, speaker segmentation and speaker clustering. The main idea behind them is to deal with the problem of speaker diarization for meetings, where error rates are relatively high. In contrast to existing methods, a new iterative scheme is proposed that considers these three tasks as a single problem. A new bidirectional source segmentation method based on the GLR/BIC approach is proposed. The well-known BIC clustering is also revisited, and a new unsupervised post-processing step is added to increase cluster purity. Applied to meeting data, these proposals show a relative improvement of about 40% compared to a standard speaker diarization system.
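The GLR/BIC segmentation idea can be illustrated with the textbook ΔBIC criterion for a single candidate change point; this is a generic sketch, not the paper's bidirectional scheme, and the penalty weight `lam` is an assumption.

```python
import numpy as np

def delta_bic(X, i, lam=1.0):
    """Delta-BIC for a candidate speaker change at frame i of feature matrix
    X (N x d). Positive values favor placing a change point (two Gaussians
    fit the two halves better than one Gaussian fits the whole window)."""
    N, d = X.shape
    def logdet(segment):
        cov = np.cov(segment, rowvar=False)
        _, ld = np.linalg.slogdet(cov + 1e-6 * np.eye(d))  # regularize short segments
        return ld
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)    # model-complexity term
    return (0.5 * (N * logdet(X) - i * logdet(X[:i]) - (N - i) * logdet(X[i:]))
            - lam * penalty)
```

Scanning `i` over a sliding window and keeping local maxima of `delta_bic` above zero yields candidate speaker change points.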
International Conference on Acoustics, Speech, and Signal Processing | 2002
Julien Pinquier; Christine Sénac
To index the soundtrack of multimedia documents efficiently, it is necessary to extract elementary and homogeneous acoustic segments. In this paper, we explore such a prior partitioning, which consists in detecting the two basic components: speech and music. The originality of this work is that music and speech are not treated as two classes of a single system; instead, two classification systems are defined independently, a speech/non-speech one and a music/non-music one. This approach permits better characterization and discrimination of each component: in particular, two different feature spaces are necessary, as are two pairs of Gaussian mixture models. Moreover, the acoustic signal is divided into four types of segments: speech, music, speech-music and other. The experiments are performed on the soundtracks of audiovisual documents (films, TV sports broadcasts). The performance demonstrates the interest of this approach, called the Differentiated Modeling Approach.
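A minimal sketch of the two independent GMM pairs and the four-way segment labeling, assuming scikit-learn; the two feature spaces are passed in separately, as in the paper, while the component counts are placeholders.

```python
from sklearn.mixture import GaussianMixture

def fit_pair(pos_feats, neg_feats, n_comp=32):
    """One binary GMM pair (class vs. non-class), trained independently."""
    pos = GaussianMixture(n_components=n_comp, covariance_type="diag").fit(pos_feats)
    neg = GaussianMixture(n_components=n_comp, covariance_type="diag").fit(neg_feats)
    return pos, neg

def label_segment(speech_pair, music_pair, speech_feats, music_feats):
    """Combine the two independent decisions into the four segment types.
    Note that each decision uses its own feature space."""
    is_speech = speech_pair[0].score(speech_feats) > speech_pair[1].score(speech_feats)
    is_music = music_pair[0].score(music_feats) > music_pair[1].score(music_feats)
    if is_speech and is_music:
        return "speech-music"
    return "speech" if is_speech else ("music" if is_music else "other")
```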
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech | 2010
Benjamin Bigot; Isabelle Ferrané; Julien Pinquier; Régine André-Obrecht
In the audio indexing context, we present our recent contributions to the field of speaker role recognition, applied in particular to conversational speech. We assume that clues about roles such as Anchor, Journalist or Other exist in temporal, acoustic and prosodic features extracted from the results of speaker segmentation and from the audio files. In this paper, investigations are carried out on the EPAC corpus, which mainly contains conversational documents. First, an automatic clustering approach is used to validate the proposed features and the role definitions. In a second study, we propose a hierarchical supervised classification system. The use of dimensionality reduction methods as well as feature selection is investigated. This system correctly classifies 92% of speaker roles.
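One way to read the hierarchical system, sketched here with scikit-learn: each stage scales, selects features, then classifies. The choice of SVM and of `k` are assumptions, since the abstract does not name the classifier.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def role_stage(k_features=20):
    """One stage of a hierarchical role classifier: scale, select, classify."""
    return make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=k_features),
                         SVC(kernel="rbf"))

# A plausible hierarchy: first separate Anchor from the rest, then split
# the remaining speakers into Journalist vs. Other.
anchor_vs_rest = role_stage()
journalist_vs_other = role_stage()
```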
International Conference on Acoustics, Speech, and Signal Processing | 2013
Patrice Guyot; Julien Pinquier; Régine André-Obrecht
This article describes an audio signal processing algorithm to detect water sounds, built in the context of a larger system aiming to monitor the daily activities of elderly people. While previous proposals for water sound recognition relied on classical machine learning and generic audio features to characterize water sounds as a flow texture, we describe here a recognition system based on a physical model of air bubble acoustics. This system is able to recognize a wide variety of water sounds and does not require training. It is validated on a home environmental sound corpus with a classification task, in which all water sounds are correctly detected. In a free detection task on a real-life recording, it outperformed the classical systems, obtaining an F-measure of 70%.
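The physical model alluded to is, at its core, the Minnaert resonance of an air bubble. A minimal sketch follows (standard physics, with an assumed radius range; it is not the paper's full detection system):

```python
import numpy as np

def minnaert_frequency(radius_m, gamma=1.4, p0=101_325.0, rho=998.0):
    """Resonance frequency (Hz) of an air bubble in water (Minnaert model):
    f0 = sqrt(3*gamma*p0/rho) / (2*pi*r). A 1 mm bubble rings near 3.3 kHz."""
    return np.sqrt(3.0 * gamma * p0 / rho) / (2.0 * np.pi * radius_m)

def plausible_bubble(freq_hz, r_min=0.2e-3, r_max=5e-3):
    """True if a detected spectral peak falls in the band spanned by the
    assumed bubble radii (0.2-5 mm here; the paper's range may differ)."""
    return minnaert_frequency(r_max) <= freq_hz <= minnaert_frequency(r_min)
```

Detecting short, exponentially decaying spectral peaks inside this band, with no training, is the spirit of the approach.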
International Conference on Multimedia and Expo | 2013
Patrice Guyot; Julien Pinquier; Xavier Valero; Francesc Alías
A significant aging of the world population is foreseen for the coming decades. Developing technologies to foster the independence of the elderly and to assist them is therefore of great interest. In this framework, the IMMED project investigates tele-monitoring technologies to support doctors in the diagnosis and follow-up of dementia illnesses such as Alzheimer's disease. Specifically, water sounds are very useful for tracking and identifying abnormal behaviors in everyday activities (e.g. hygiene, household, cooking). In this work, we propose a two-stage system to detect this type of sound event. In the first stage, the audio stream is segmented with a simple but effective algorithm based on the Spectral Cover feature. The second stage improves the system's precision by classifying the segmented streams into water/non-water sound events using Gammatone Cepstral Coefficients and Support Vector Machines. Experimental results reveal the potential of the combined system, yielding an F-measure higher than 80%.
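A skeleton of the second stage, assuming scikit-learn and an external GTCC extractor; the Spectral Cover segmentation of the first stage and the GTCC computation are not re-implemented here.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def water_classifier():
    """Water / non-water classifier over Gammatone cepstral coefficients.
    Must be fitted on labeled GTCC vectors before use."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf"))

def confirm_segments(candidate_segments, gtcc, clf):
    """Keep only the candidates from the first (Spectral Cover) stage that
    the trained SVM labels as water. gtcc(seg) is assumed to return a
    (frames x coefficients) matrix; each segment is summarized by its mean."""
    keep = []
    for seg in candidate_segments:
        feats = gtcc(seg).mean(axis=0, keepdims=True)
        if clf.predict(feats)[0] == 1:  # label 1 = water (assumed convention)
            keep.append(seg)
    return keep
```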
Content-Based Multimedia Indexing | 2011
Svebor Karaman; Jenny Benois-Pineau; Rémi Mégret; Julien Pinquier; Yann Gaëstel; Jean-François Dartigues
This paper presents a method for indexing human activities in videos captured from a wearable camera worn by patients, for studies of the progression of dementia. Our method produces indexes that facilitate navigation throughout the individual video recordings, which could help doctors search for early signs of the disease in the activities of daily living. The recorded videos exhibit strong motion and sharp lighting changes, which introduce noise into the analysis. The proposed approach is based on a two-step analysis. First, we propose a new approach to segmenting this type of video, based on apparent motion. Each segment is characterized by two original motion descriptors, as well as color and audio descriptors. Second, a Hidden Markov Model formulation is used to merge the multimodal audio and video features and classify the test segments. Experiments show the good properties of the approach on real data.
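A crude stand-in for the motion-based segmentation step, under the assumption that a per-frame apparent-motion energy has already been computed (the paper's two original motion descriptors are not reproduced):

```python
import numpy as np

def motion_segments(motion_energy, threshold=None, min_len=25):
    """Cut the video wherever apparent motion energy exceeds a threshold,
    enforcing a minimum segment length in frames. The threshold heuristic
    (median + one standard deviation) is an assumption."""
    motion_energy = np.asarray(motion_energy)
    if threshold is None:
        threshold = np.median(motion_energy) + np.std(motion_energy)
    cuts, last = [0], 0
    for i, e in enumerate(motion_energy):
        if e > threshold and i - last >= min_len:
            cuts.append(i)
            last = i
    cuts.append(len(motion_energy))
    return list(zip(cuts[:-1], cuts[1:]))  # (start, end) frame index pairs
```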
Computer Vision and Image Understanding | 2016
Christophe Mollaret; Alhayat Ali Mekonnen; Frédéric Lerasle; Isabelle Ferrané; Julien Pinquier; Blandine Boudet; Pierre Rumeau
Highlights:
- A complete multi-modal perception driven, non-intrusive domestic robotic system for the elderly.
- A novel multi-modal user intention-for-interaction detection modality.
- A fusion method to improve speech recognition given the user's position, the available sensors, and the recognition tools.
- Details of the complete implemented system, with evaluations demonstrating the soundness of the framework via an exemplar application in which the robot helps the user find hidden or misplaced objects in his/her living place.
- User studies involving 17 elderly participants.

In this paper, we present a multi-modal perception based framework to realize a non-intrusive domestic assistive robotic system. It is non-intrusive in that it only starts interaction with a user when it detects the user's intention to do so. All the robot's actions are based on multi-modal perceptions, which include user detection based on RGB-D data, user intention-for-interaction detection with RGB-D and audio data, and communication via user-distance-mediated speech recognition. The use of multi-modal cues in different parts of the robotic activity paves the way to successful robotic runs (94% success rate). Each perceptual component is systematically evaluated using an appropriate dataset and evaluation metrics. Finally, the complete system is fully integrated on the PR2 robotic platform and validated through system sanity-check runs and user studies with the help of 17 volunteer elderly participants.
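As a toy illustration of the distance-mediated idea (not the paper's actual fusion method, whose details are in the article): choose which recognizer to trust from the user's estimated distance.

```python
def fuse_asr(hypotheses, user_distance_m, near_threshold=1.5):
    """Trust the close-range recognizer when the user is near, fall back to
    the far-field one otherwise. `hypotheses` maps "close" and "far" to lists
    of {"text", "confidence"} dicts; all names here are assumptions."""
    source = "close" if user_distance_m <= near_threshold else "far"
    return max(hypotheses[source], key=lambda h: h["confidence"])
```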
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Hélène Lachambre; Régine André-Obrecht; Julien Pinquier
In the context of music indexing, it would be useful to have precise information about the number of sources performing; a source is a solo voice or an isolated instrument that produces a single note at any time. This correspondence discusses the automatic distinction between monophonic music excerpts, where only one source is present, and polyphonic ones. Our method is based on the analysis of a "confidence indicator," which gives the confidence (in fact, its inverse) in the current estimated fundamental frequency (pitch). In a monophony, the confidence indicator is low. In a polyphony, it is higher and varies more. This leads us to compute the short-term mean and variance of this indicator, take this 2-D vector as the observation vector, and model its conditional distribution with bivariate Weibull models. This probability density function is characterized by five parameters, and a method for their estimation is developed (in theory and in practice). The decision is made by maximum likelihood, computed over one second. The best configuration gives a global error rate of 6.3%, evaluated on a balanced corpus (18 minutes in total).
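The decision chain can be sketched as follows; the bivariate Weibull densities are assumed already fitted (the paper develops their five-parameter estimation), and the window sizes are placeholders.

```python
import numpy as np

def short_term_stats(confidence, win=20, hop=10):
    """Short-term mean and variance of the pitch-confidence indicator:
    the 2-D observation vector used for the mono/poly decision."""
    obs = [(np.mean(confidence[s:s + win]), np.var(confidence[s:s + win]))
           for s in range(0, len(confidence) - win + 1, hop)]
    return np.asarray(obs)

def decide(obs_1s, mono_logpdf, poly_logpdf):
    """Maximum-likelihood decision over ~1 s of observations; each *_logpdf
    maps an (n x 2) array of observations to per-observation log-densities."""
    return ("monophonic"
            if mono_logpdf(obs_1s).sum() > poly_logpdf(obs_1s).sum()
            else "polyphonic")
```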