Publication


Featured research published by Mihai Gurban.


IEEE Transactions on Signal Processing | 2009

Information Theoretic Feature Extraction for Audio-Visual Speech Recognition

Mihai Gurban; Jean-Philippe Thiran

The problem of feature selection has been thoroughly analyzed in the context of pattern classification, with the purpose of avoiding the curse of dimensionality. However, it has received much less attention in the context of multimodal signal processing. Our approach to feature extraction is based on information theory, with an application to multimodal classification, in particular audio-visual speech recognition. Contrary to previous work on information-theoretic feature selection applied to multimodal signals, our proposed methods penalize features for their redundancy, achieving more compact feature sets and better performance. We propose two greedy algorithms for selecting visual features for audio-visual speech recognition: one penalizes a proportion of feature redundancy, while the other uses conditional mutual information as its evaluation measure. Our features perform better than linear discriminant analysis, the most common dimensionality reduction transform in the field, across a wide range of dimensionality values and when combined with audio at different quality levels.
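The greedy, redundancy-penalized selection described above can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: features and class labels are assumed to be discrete (e.g., quantized visual features), mutual information is estimated from joint histograms, and `beta` is a hypothetical weight on the redundancy penalty.

```python
import numpy as np

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete 1-D integer arrays."""
    joint = np.histogram2d(x, y, bins=(np.unique(x).size, np.unique(y).size))[0]
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def greedy_select(features, labels, k, beta=0.5):
    """Greedily pick k columns of `features`, rewarding relevance I(f; labels)
    and penalizing the (beta-weighted) average redundancy with already-selected features."""
    selected, remaining = [], list(range(features.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            relevance = mutual_information(features[:, j], labels)
            redundancy = (np.mean([mutual_information(features[:, j], features[:, s])
                                   for s in selected]) if selected else 0.0)
            return relevance - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The conditional-mutual-information variant mentioned in the abstract would replace this relevance-minus-redundancy score with an estimate of the information a candidate feature adds about the label given the features already selected.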


IEEE Transactions on Audio, Speech, and Language Processing | 2012

On Dynamic Stream Weighting for Audio-Visual Speech Recognition

Virginia Estellers; Mihai Gurban; Jean-Philippe Thiran

The integration of audio and visual information improves speech recognition performance, especially in the presence of noise. In these circumstances it is necessary to introduce audio and visual weights to control the contribution of each modality to the recognition task. We present a method to set the weight associated with each stream according to its reliability for speech recognition, allowing the weights to change over time and adapt to different noise and working conditions. Our dynamic weights are derived from several measures of stream reliability, some specific to speech processing and others inherent to any classification task, and take into account the special role of silence detection in the definition of the audio and visual weights. In this paper, we propose a new confidence measure, compare it to existing ones, and point out the importance of correctly detecting silence utterances in the definition of the weighting system. Experimental results support our main contribution: including a voice activity detector in the weighting scheme improves speech recognition across different system architectures and confidence measures, with a performance gain larger than any difference between the proposed confidence measures.
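As a rough illustration of the weighting scheme described above (a sketch under assumptions, not the paper's implementation), per-frame stream weights can scale the audio and visual log-likelihoods inside a multi-stream HMM, with a voice-activity flag pushing weight away from the audio stream during silence; `silence_weight` and the input shapes are illustrative choices.

```python
import numpy as np

def combined_log_likelihood(audio_ll, visual_ll, audio_reliability, vad_flags,
                            silence_weight=0.1):
    """audio_ll, visual_ll: (T, S) per-frame, per-state log-likelihoods.
    audio_reliability: (T,) values in [0, 1] from any confidence measure.
    vad_flags: (T,) booleans, True where speech was detected."""
    # During detected silence the audio stream gets a small fixed weight;
    # otherwise its weight follows the reliability measure.
    w_audio = np.where(vad_flags, audio_reliability, silence_weight)
    w_visual = 1.0 - w_audio  # weights constrained to sum to one per frame
    return w_audio[:, None] * audio_ll + w_visual[:, None] * visual_ll
```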


International Conference on Multimodal Interfaces | 2008

Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition

Mihai Gurban; Jean-Philippe Thiran; Thomas Drugman; Thierry Dutoit

Merging decisions from different modalities is a crucial problem in audio-visual speech recognition. To address it, state-synchronous multi-stream HMMs have been proposed, with the important advantage of incorporating stream reliability in their fusion scheme. This paper focuses on stream weight adaptation based on modality confidence estimators. We assume noise that differs in type and varies over time, as encountered in realistic applications, for which adaptive methods are best suited. Stream reliability is assessed directly from classifier outputs, since these are not specific to a particular noise type or level. The influence of constraining the weights to sum to one is also discussed.
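One common classifier-output confidence of the kind discussed above is the entropy of the per-frame class posteriors; the sketch below (an assumed formulation, not necessarily the one used in the paper) turns low entropy into high confidence and renormalizes the two stream confidences so the weights sum to one.

```python
import numpy as np

def entropy_confidence(posteriors):
    """posteriors: (T, C) per-frame class posteriors from one stream.
    Returns a per-frame confidence in [0, 1]: 1 - normalized entropy."""
    p = np.clip(posteriors, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1) / np.log(p.shape[1])
    return 1.0 - entropy

def stream_weights(audio_posteriors, visual_posteriors):
    """Renormalize the two per-frame confidences so the weights sum to one."""
    c_a = entropy_confidence(audio_posteriors)
    c_v = entropy_confidence(visual_posteriors)
    total = c_a + c_v + 1e-12
    return c_a / total, c_v / total
```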


Multimedia Signal Processing | 2007

Relevant Feature Selection for Audio-Visual Speech Recognition

Thomas Drugman; Mihai Gurban; Jean-Philippe Thiran

We present a feature selection method based on information-theoretic measures, targeted at multimodal signal processing, and show how to quantitatively assess the relevance of features from different modalities. We are able to find the features carrying the most information relevant to the recognition task while, at the same time, having minimal redundancy. Our application is audio-visual speech recognition, in particular the selection of relevant visual features. Experimental results show that our method outperforms other feature selection algorithms from the literature, improving recognition accuracy even with a significantly reduced number of features.
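Since information-theoretic relevance and redundancy are easiest to estimate for discrete variables, one practical preprocessing step (an assumption about the pipeline, not taken from the paper) is to quantize the continuous visual features before computing mutual information, for example with equal-frequency binning:

```python
import numpy as np

def quantize(feature_column, n_bins=8):
    """Map a continuous feature to integer bin indices using
    equal-frequency (quantile) bin edges."""
    edges = np.quantile(feature_column, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(feature_column, edges)

# e.g. quantized = np.column_stack([quantize(col) for col in features.T])
```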


Multimodal Signal Processing: Theory and Applications for Human–Computer Interaction | 2010

Basic Concepts of Multimodal Analysis

Mihai Gurban; Jean-Philippe Thiran

Keywords: Multimodal; Signal Processing; Human–Computer Interaction; LTS5. EPFL record EPFL-CHAPTER-144046.


International Conference on Image Processing | 2009

Selecting relevant visual features for speechreading

Virginia Estellers; Mihai Gurban; Jean-Philippe Thiran

A quantitative measure of relevance is proposed for the task of constructing visual feature sets that are both relevant and compact. A feature's relevance is given by the amount of information it contains about the problem, while compactness is achieved by preventing the replication of information between features. To achieve these goals, we use mutual information both for assessing relevance and for measuring the redundancy between features. Our application is speechreading, that is, speech recognition performed on video of the speaker. This is justified by the fact that the performance of audio speech recognition can be improved by augmenting the audio features with visual ones, especially when there is noise in the audio channel. We report significant improvements compared to the most common dimensionality reduction method for speechreading, Linear Discriminant Analysis (LDA).
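The LDA baseline referred to above can be reproduced with a few lines; the snippet below is a generic sketch using scikit-learn (an assumed tooling choice, not the authors'), projecting visual features onto at most (number of classes - 1) discriminative dimensions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_reduce(train_features, train_labels, test_features, n_components=20):
    """Fit LDA on the training set and project both sets to at most
    (n_classes - 1) discriminative dimensions."""
    n_classes = np.unique(train_labels).size
    lda = LinearDiscriminantAnalysis(n_components=min(n_components, n_classes - 1))
    reduced_train = lda.fit_transform(train_features, train_labels)
    reduced_test = lda.transform(test_features)
    return reduced_train, reduced_test
```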


Archive | 2008

Face and Speech Interaction

Mihai Gurban; Verónica Vilaplana; Jean-Philippe Thiran; Ferran Marqués

Two of the communication channels conveying the most information in human-to-human interaction are the face and speech. A robust interpretation of the information people express can be obtained by jointly analyzing both sources: short-term facial feature evolution (face) and speech information. Combined face and speech analysis is thus the basis of a large number of human-computer interfaces and services. Regardless of the final application of such interfaces, two aspects are commonly required: detection of human faces and combination of the two sources of information. In the first section of the chapter, we review the state of the art in face and facial feature detection. The various methods are analyzed from the perspective of the different models they use to represent images and patterns: pixel-based, block-based, transform-coefficient-based and region-based techniques. In the second section of the chapter, we present two examples of multimodal signal processing applications. The first localizes the speaker's mouth in a video sequence, using both the audio signal and the motion extracted from the video. The second recognizes the spoken words in a video sequence using both the audio and the images of the moving lips.
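The first application, audio-visual mouth localization, can be sketched in a simplified form: correlate the per-frame audio energy with the per-frame motion energy of each image block and keep the block whose motion best follows the audio. The block size and the normalized-correlation score below are illustrative assumptions, not the chapter's actual method.

```python
import numpy as np

def locate_speaking_region(frames, audio_energy, block=16):
    """frames: (T, H, W) grayscale video; audio_energy: (T,) one value per frame.
    Returns the top-left corner of the block most correlated with the audio."""
    motion = np.abs(np.diff(frames.astype(float), axis=0))   # (T-1, H, W) frame differences
    audio = audio_energy[1:] - audio_energy[1:].mean()       # align with motion frames
    t, h, w = motion.shape
    best_corr, best_pos = -np.inf, (0, 0)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            block_motion = motion[:, y:y + block, x:x + block].mean(axis=(1, 2))
            block_motion -= block_motion.mean()
            denom = np.linalg.norm(audio) * np.linalg.norm(block_motion)
            corr = float(audio @ block_motion / denom) if denom > 0 else -np.inf
            if corr > best_corr:
                best_corr, best_pos = corr, (y, x)
    return best_pos, best_corr
```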


Multimodal Signal Processing: Theory and Applications for Human–Computer Interaction | 2010

Modality Integration Methods

Mihai Gurban; Jean-Philippe Thiran

Keywords: Multimodal; LTS5. EPFL record EPFL-CHAPTER-144074.


European Signal Processing Conference | 2006

Multimodal speaker localization in a probabilistic framework

Mihai Gurban; Jean-Philippe Thiran


European Signal Processing Conference | 2005

Audio-visual speech recognition with a hybrid SVM-HMM system

Mihai Gurban; Jean-Philippe Thiran

Collaboration


Dive into Mihai Gurban's collaborations.

Top Co-Authors

Jean-Philippe Thiran
École Polytechnique Fédérale de Lausanne

Virginia Estellers
École Polytechnique Fédérale de Lausanne

Andrés Vallés
École Polytechnique Fédérale de Lausanne

Pascal Reuse
École Polytechnique Fédérale de Lausanne

Ferran Marqués
Polytechnic University of Catalonia

Verónica Vilaplana
Polytechnic University of Catalonia