Julien Pinquier
University of Toulouse
Publications
Featured research published by Julien Pinquier.
Multimedia Tools and Applications | 2014
Svebor Karaman; Jenny Benois-Pineau; Vladislavs Dovgalecs; Rémi Mégret; Julien Pinquier; Régine André-Obrecht; Yann Gaëstel; Jean-François Dartigues
This paper presents a method for indexing activities of daily living in videos acquired from wearable cameras. It addresses the problem of analyzing the complex multimedia data acquired from wearable devices, a growing concern given the increasing amount of such data. In the context of dementia diagnosis, patient activities are recorded at home using a lightweight wearable device and later visualized by medical practitioners. This recording mode poses great challenges since the video data consists of a single sequence shot in which strong motion and sharp lighting changes often appear. Because of the length of the recordings, tools for efficient navigation in terms of activities of interest are crucial. Our work introduces a video structuring approach that combines automatic motion-based segmentation of the video with activity recognition by a hierarchical two-level Hidden Markov Model. We define a multimodal description space over visual and audio features, including mid-level features such as motion, location, speech and noise detections. We show their complementarity globally as well as for specific activities. Experiments on real data obtained from recordings of several patients at home show the difficulty of the task and the promising results of the proposed approach.
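To make the segment-classification step concrete, here is a minimal sketch of per-activity HMM scoring over multimodal segment features, assuming the hmmlearn library. Feature extraction (motion, location, speech and noise descriptors) is stubbed out, and the paper's hierarchical two-level structure is reduced to one flat HMM per activity.

```python
import numpy as np
from hmmlearn import hmm

def train_activity_models(features_by_activity, n_states=3):
    """Fit one Gaussian HMM per activity of daily living.
    features_by_activity maps an activity label to a list of
    (frames x features) sequences of multimodal descriptors."""
    models = {}
    for activity, sequences in features_by_activity.items():
        X = np.vstack(sequences)               # stack all training sequences
        lengths = [len(s) for s in sequences]  # sequence boundaries for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=25)
        m.fit(X, lengths)
        models[activity] = m
    return models

def classify_segment(models, segment_features):
    """Label a video segment with the activity whose HMM scores it highest."""
    return max(models, key=lambda a: models[a].score(segment_features))
```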
ACM Multimedia | 2010
Rémi Mégret; Vladislavs Dovgalecs; Hazem Wannous; Svebor Karaman; Jenny Benois-Pineau; Elie Khoury; Julien Pinquier; Philippe Joly; Régine André-Obrecht; Yann Gaëstel; Jean-François Dartigues
In this paper, we describe a new application of multimedia indexing: a system that monitors the instrumental activities of daily living to assess the cognitive decline caused by dementia. The system is composed of a wearable camera device designed to capture audio and video data of the instrumental activities of a patient, combined with multimedia indexing techniques that allow medical specialists to analyze several-hour-long observation shots efficiently.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Elie El-Khoury; Christine Senac; Julien Pinquier
In this paper, we investigate new approaches to improve speech activity detection, speaker segmentation and speaker clustering. The main idea behind them is to deal with the problem of speaker diarization for meetings, where error rates are relatively high. In contrast to existing methods, a new iterative scheme is proposed that considers these three tasks as a single problem. A new bidirectional source segmentation method based on the GLR/BIC approach is proposed. The well-known BIC clustering is also revisited, and a new unsupervised post-processing step is added to increase cluster purity. Applied to meeting data, these proposals show a relative improvement of about 40% compared to a standard speaker diarization system.
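The GLR/BIC segmentation idea can be illustrated with the textbook ΔBIC criterion for a single candidate change point; this is a generic sketch, not the paper's bidirectional scheme, and the penalty weight `lam` is an assumption.

```python
import numpy as np

def delta_bic(X, i, lam=1.0):
    """Delta-BIC for a candidate speaker change at frame i of feature matrix
    X (N x d). Positive values favor placing a change point (two Gaussians
    fit the two halves better than one Gaussian fits the whole window)."""
    N, d = X.shape
    def logdet(segment):
        cov = np.cov(segment, rowvar=False)
        _, ld = np.linalg.slogdet(cov + 1e-6 * np.eye(d))  # regularize short segments
        return ld
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)    # model-complexity term
    return (0.5 * (N * logdet(X) - i * logdet(X[:i]) - (N - i) * logdet(X[i:]))
            - lam * penalty)
```

Scanning `i` over a sliding window and keeping local maxima of `delta_bic` above zero yields candidate speaker change points.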
International Conference on Acoustics, Speech, and Signal Processing | 2002
Julien Pinquier; Christine Sénac
To index the soundtrack of multimedia documents efficiently, it is necessary to extract elementary and homogeneous acoustic segments. In this paper, we explore such a prior partitioning, which consists in detecting the two basic components: speech and music. The originality of this work is that music and speech are not treated as two classes of a single system; instead, two classification systems are defined independently, a speech/non-speech one and a music/non-music one. This approach permits better characterization and discrimination of each component: in particular, two different feature spaces are necessary, as are two pairs of Gaussian mixture models. Moreover, the acoustic signal is divided into four types of segments: speech, music, speech-music and other. The experiments are performed on the soundtracks of audiovisual documents (films, TV sports broadcasts). The performance demonstrates the interest of this approach, called the Differentiated Modeling Approach.
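A minimal sketch of the two independent GMM pairs and the four-way segment labeling, assuming scikit-learn; the two feature spaces are passed in separately, as in the paper, while the component counts are placeholders.

```python
from sklearn.mixture import GaussianMixture

def fit_pair(pos_feats, neg_feats, n_comp=32):
    """One binary GMM pair (class vs. non-class), trained independently."""
    pos = GaussianMixture(n_components=n_comp, covariance_type="diag").fit(pos_feats)
    neg = GaussianMixture(n_components=n_comp, covariance_type="diag").fit(neg_feats)
    return pos, neg

def label_segment(speech_pair, music_pair, speech_feats, music_feats):
    """Combine the two independent decisions into the four segment types.
    Note that each decision uses its own feature space."""
    is_speech = speech_pair[0].score(speech_feats) > speech_pair[1].score(speech_feats)
    is_music = music_pair[0].score(music_feats) > music_pair[1].score(music_feats)
    if is_speech and is_music:
        return "speech-music"
    return "speech" if is_speech else ("music" if is_music else "other")
```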
Proceedings of the 2010 International Workshop on Searching Spontaneous Conversational Speech | 2010
Benjamin Bigot; Isabelle Ferrané; Julien Pinquier; Régine André-Obrecht
In the audio indexing context, we present our recent contributions to the field of speaker role recognition, applied in particular to conversational speech. We assume that clues about roles such as Anchor, Journalist or Other exist in temporal, acoustic and prosodic features extracted from the results of speaker segmentation and from the audio files. In this paper, investigations are carried out on the EPAC corpus, which mainly contains conversational documents. First, an automatic clustering approach is used to validate the proposed features and the role definitions. In a second study, we propose a hierarchical supervised classification system. The use of dimensionality reduction methods as well as feature selection is investigated. This system correctly classifies 92% of speaker roles.
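One way to read the hierarchical system, sketched here with scikit-learn: each stage scales, selects features, then classifies. The choice of SVM and of `k` are assumptions, since the abstract does not name the classifier.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def role_stage(k_features=20):
    """One stage of a hierarchical role classifier: scale, select, classify."""
    return make_pipeline(StandardScaler(),
                         SelectKBest(f_classif, k=k_features),
                         SVC(kernel="rbf"))

# A plausible hierarchy: first separate Anchor from the rest, then split
# the remaining speakers into Journalist vs. Other.
anchor_vs_rest = role_stage()
journalist_vs_other = role_stage()
```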
International Conference on Acoustics, Speech, and Signal Processing | 2013
Patrice Guyot; Julien Pinquier; Régine André-Obrecht
This article describes an audio signal processing algorithm to detect water sounds, built in the context of a larger system aiming to monitor the daily activities of elderly people. While previous proposals for water sound recognition relied on classical machine learning and generic audio features to characterize water sounds as a flow texture, we describe here a recognition system based on a physical model of air bubble acoustics. This system is able to recognize a wide variety of water sounds and does not require training. It is validated on a home environmental sound corpus with a classification task, in which all water sounds are correctly detected. In a free detection task on a real-life recording, it outperformed the classical systems, obtaining an F-measure of 70%.
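The physical model alluded to is, at its core, the Minnaert resonance of an air bubble. A minimal sketch follows (standard physics, with an assumed radius range; it is not the paper's full detection system):

```python
import numpy as np

def minnaert_frequency(radius_m, gamma=1.4, p0=101_325.0, rho=998.0):
    """Resonance frequency (Hz) of an air bubble in water (Minnaert model):
    f0 = sqrt(3*gamma*p0/rho) / (2*pi*r). A 1 mm bubble rings near 3.3 kHz."""
    return np.sqrt(3.0 * gamma * p0 / rho) / (2.0 * np.pi * radius_m)

def plausible_bubble(freq_hz, r_min=0.2e-3, r_max=5e-3):
    """True if a detected spectral peak falls in the band spanned by the
    assumed bubble radii (0.2-5 mm here; the paper's range may differ)."""
    return minnaert_frequency(r_max) <= freq_hz <= minnaert_frequency(r_min)
```

Detecting short, exponentially decaying spectral peaks inside this band, with no training, is the spirit of the approach.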
International Conference on Multimedia and Expo | 2013
Patrice Guyot; Julien Pinquier; Xavier Valero; Francesc Alías
A significant aging of the world population is foreseen for the coming decades. Developing technologies to foster the independence of the elderly and to assist them is therefore of great interest. In this framework, the IMMED project investigates tele-monitoring technologies to support doctors in the diagnosis and follow-up of dementia illnesses such as Alzheimer's disease. Specifically, water sounds are very useful for tracking and identifying abnormal behaviors in everyday activities (e.g. hygiene, household, cooking). In this work, we propose a two-stage system to detect this type of sound event. In the first stage, the audio stream is segmented with a simple but effective algorithm based on the Spectral Cover feature. The second stage improves the system's precision by classifying the segmented streams into water/non-water sound events using Gammatone Cepstral Coefficients and Support Vector Machines. Experimental results reveal the potential of the combined system, yielding an F-measure higher than 80%.
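A skeleton of the second stage, assuming scikit-learn and an external GTCC extractor; the Spectral Cover segmentation of the first stage and the GTCC computation are not re-implemented here.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def water_classifier():
    """Water / non-water classifier over Gammatone cepstral coefficients.
    Must be fitted on labeled GTCC vectors before use."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf"))

def confirm_segments(candidate_segments, gtcc, clf):
    """Keep only the candidates from the first (Spectral Cover) stage that
    the trained SVM labels as water. gtcc(seg) is assumed to return a
    (frames x coefficients) matrix; each segment is summarized by its mean."""
    keep = []
    for seg in candidate_segments:
        feats = gtcc(seg).mean(axis=0, keepdims=True)
        if clf.predict(feats)[0] == 1:  # label 1 = water (assumed convention)
            keep.append(seg)
    return keep
```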
Content-Based Multimedia Indexing | 2011
Svebor Karaman; Jenny Benois-Pineau; Rémi Mégret; Julien Pinquier; Yann Gaëstel; Jean-François Dartigues
This paper presents a method for indexing human activities in videos captured from a wearable camera worn by patients, for studies of the progression of dementia. Our method produces indexes that facilitate navigation throughout the individual video recordings, which could help doctors search for early signs of the disease in the activities of daily living. The recorded videos exhibit strong motion and sharp lighting changes, which introduce noise into the analysis. The proposed approach is based on a two-step analysis. First, we propose a new approach to segmenting this type of video, based on apparent motion. Each segment is characterized by two original motion descriptors, as well as color and audio descriptors. Second, a Hidden Markov Model formulation is used to merge the multimodal audio and video features and classify the test segments. Experiments show the good properties of the approach on real data.
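A crude stand-in for the motion-based segmentation step, under the assumption that a per-frame apparent-motion energy has already been computed (the paper's two original motion descriptors are not reproduced):

```python
import numpy as np

def motion_segments(motion_energy, threshold=None, min_len=25):
    """Cut the video wherever apparent motion energy exceeds a threshold,
    enforcing a minimum segment length in frames. The threshold heuristic
    (median + one standard deviation) is an assumption."""
    motion_energy = np.asarray(motion_energy)
    if threshold is None:
        threshold = np.median(motion_energy) + np.std(motion_energy)
    cuts, last = [0], 0
    for i, e in enumerate(motion_energy):
        if e > threshold and i - last >= min_len:
            cuts.append(i)
            last = i
    cuts.append(len(motion_energy))
    return list(zip(cuts[:-1], cuts[1:]))  # (start, end) frame index pairs
```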
Computer Vision and Image Understanding | 2016
Christophe Mollaret; Alhayat Ali Mekonnen; Frédéric Lerasle; Isabelle Ferrané; Julien Pinquier; Blandine Boudet; Pierre Rumeau
Highlights:
- A complete multi-modal perception driven, non-intrusive domestic robotic system for the elderly.
- A novel multi-modal user intention-for-interaction detection modality.
- A fusion method to improve speech recognition given the user's position, the available sensors, and the recognition tools.
- Details of the complete implemented system, with evaluations demonstrating the soundness of the framework via an exemplar application in which the robot helps the user find hidden or misplaced objects in his/her living place.
- User studies involving 17 elderly participants.

In this paper, we present a multi-modal perception based framework to realize a non-intrusive domestic assistive robotic system. It is non-intrusive in that it only starts interaction with a user when it detects the user's intention to do so. All the robot's actions are based on multi-modal perceptions, which include user detection based on RGB-D data, user intention-for-interaction detection with RGB-D and audio data, and communication via user-distance-mediated speech recognition. The use of multi-modal cues in different parts of the robotic activity paves the way to successful robotic runs (94% success rate). Each perceptual component is systematically evaluated using an appropriate dataset and evaluation metrics. Finally, the complete system is fully integrated on the PR2 robotic platform and validated through system sanity-check runs and user studies with the help of 17 volunteer elderly participants.
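As a toy illustration of the distance-mediated idea (not the paper's actual fusion method, whose details are in the article): choose which recognizer to trust from the user's estimated distance.

```python
def fuse_asr(hypotheses, user_distance_m, near_threshold=1.5):
    """Trust the close-range recognizer when the user is near, fall back to
    the far-field one otherwise. `hypotheses` maps "close" and "far" to lists
    of {"text", "confidence"} dicts; all names here are assumptions."""
    source = "close" if user_distance_m <= near_threshold else "far"
    return max(hypotheses[source], key=lambda h: h["confidence"])
```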
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Hélène Lachambre; Régine André-Obrecht; Julien Pinquier
In the context of music indexing, it would be useful to have precise information about the number of sources performing; a source is a solo voice or an isolated instrument that produces a single note at any time. This correspondence discusses the automatic distinction between monophonic music excerpts, where only one source is present, and polyphonic ones. Our method is based on the analysis of a "confidence indicator," which gives the confidence (in fact, its inverse) in the current estimated fundamental frequency (pitch). In a monophony, the confidence indicator is low. In a polyphony, it is higher and varies more. This leads us to compute the short-term mean and variance of this indicator, take this 2-D vector as the observation vector, and model its conditional distribution with bivariate Weibull models. This probability density function is characterized by five parameters, and a method for their estimation is developed (in theory and in practice). The decision is made by maximum likelihood, computed over one second. The best configuration gives a global error rate of 6.3%, evaluated on a balanced corpus (18 minutes in total).
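The decision chain can be sketched as follows; the bivariate Weibull densities are assumed already fitted (the paper develops their five-parameter estimation), and the window sizes are placeholders.

```python
import numpy as np

def short_term_stats(confidence, win=20, hop=10):
    """Short-term mean and variance of the pitch-confidence indicator:
    the 2-D observation vector used for the mono/poly decision."""
    obs = [(np.mean(confidence[s:s + win]), np.var(confidence[s:s + win]))
           for s in range(0, len(confidence) - win + 1, hop)]
    return np.asarray(obs)

def decide(obs_1s, mono_logpdf, poly_logpdf):
    """Maximum-likelihood decision over ~1 s of observations; each *_logpdf
    maps an (n x 2) array of observations to per-observation log-densities."""
    return ("monophonic"
            if mono_logpdf(obs_1s).sum() > poly_logpdf(obs_1s).sum()
            else "polyphonic")
```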