Publication


Featured research published by Fabio Valente.


IEEE Transactions on Audio, Speech, and Language Processing | 2009

An Information Theoretic Approach to Speaker Diarization of Meeting Data

Deepu Vijayasenan; Fabio Valente

A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. Because this approach is mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM-based system, resulting in faster-than-real-time diarization.
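
The Jensen-Shannon divergence named in the abstract above can be illustrated with a minimal sketch. The function names, the weights, and the two toy distributions are assumptions made for illustration only; they stand in for the distributions of relevance variables attached to two speech segments and are not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between two discrete distributions.

    w_p and w_q are mixing weights that must be positive and sum to 1.
    """
    mixture = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl_divergence(p, mixture) + w_q * kl_divergence(q, mixture)

# Hypothetical distributions for two segments (illustrative values only).
seg_a = [0.7, 0.2, 0.1]
seg_b = [0.1, 0.3, 0.6]

print(js_divergence(seg_a, seg_a))  # identical segments: no divergence
print(js_divergence(seg_a, seg_b))  # dissimilar segments: larger divergence
```

With equal weights the divergence is symmetric and bounded by log 2, which makes it a convenient dissimilarity between segments.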


International Conference on Acoustics, Speech, and Signal Processing | 2007

Combination of Acoustic Classifiers Based on Dempster-Shafer Theory of Evidence

Fabio Valente; Hynek Hermansky

In this paper we investigate the combination of neural-net-based classifiers using the Dempster-Shafer theory of evidence. Under some assumptions, the combination rule resembles the product-of-errors rule observed in human speech perception. Different combinations are tested in ASR experiments in both matched and mismatched conditions and compared with more conventional probability combination rules. The proposed techniques are particularly effective in mismatched conditions.
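
As a rough illustration of the Dempster-Shafer theory mentioned above, the sketch below implements the generic Dempster rule of combination for two mass functions. How the paper derives mass functions from neural network outputs is not stated in the abstract, so the toy masses and the two-class frame `{"a", "b"}` are purely hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass assigned to incompatible pairs (empty intersection) is the
    conflict, which is removed by renormalization.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical evidence from two classifiers over the frame {"a", "b"}.
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, AB: 0.4}   # favors "a", keeps 0.4 as ignorance
m2 = {B: 0.3, AB: 0.7}   # weakly favors "b"

fused = dempster_combine(m1, m2)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

The mass left on the full set `{"a", "b"}` represents ignorance, which is what distinguishes this rule from a plain product of probabilities.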


International Conference on Acoustics, Speech, and Signal Processing | 2008

Hierarchical and parallel processing of modulation spectrum for ASR applications

Fabio Valente; Hynek Hermansky

The modulation spectrum is an efficient representation for describing dynamic information in signals. In this work we investigate how to exploit different elements of the modulation spectrum for extraction of information in automatic speech recognition (ASR). Parallel and hierarchical (sequential) approaches are investigated. Parallel processing combines outputs of independent classifiers applied to different modulation frequency channels. Hierarchical processing uses different modulation frequency channels sequentially. Experiments are run on an LVCSR task for meeting transcription and results are reported on the RT05 evaluation data. Processing modulation frequency channels with different classifiers provides a consistent reduction in WER (2% absolute w.r.t. the PLP baseline). Hierarchical processing outperforms parallel processing. The largest WER reduction is obtained through sequential processing moving from high to low modulation frequencies. This model is consistent with several perceptual and physiological studies on auditory processing.


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Agglomerative information bottleneck for speaker diarization of meetings data

Deepu Vijayasenan; Fabio Valente

In this paper, we investigate the use of agglomerative information bottleneck (aIB) clustering for the speaker diarization task of meetings data. In contrast to state-of-the-art diarization systems that model individual speakers with Gaussian mixture models, the proposed algorithm is completely non-parametric. Both clustering and model selection issues of non-parametric models are addressed in this work. The proposed algorithm is evaluated on meeting data from the RT06 evaluation data set. The system is able to achieve diarization error rates comparable to state-of-the-art systems at a much lower computational complexity.
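
A greedy agglomerative information bottleneck pass can be sketched as follows. This follows the textbook aIB merge cost (the summed cluster weight times the weighted Jensen-Shannon divergence of the two clusters) and is not necessarily the exact procedure of the paper. The helper names, the segment weights, and the toy posteriors are assumptions for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) in nats, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(w_i, p_i, w_j, p_j):
    """Loss of relevant information if clusters i and j are merged.

    Returns the cost and the relevance distribution of the merged cluster.
    """
    total = w_i + w_j
    a, b = w_i / total, w_j / total
    mix = [a * x + b * y for x, y in zip(p_i, p_j)]
    return total * (a * kl(p_i, mix) + b * kl(p_j, mix)), mix

def agglomerate(weights, posteriors, n_clusters):
    """Greedily merge the cheapest pair until n_clusters remain."""
    clusters = [([i], w, list(p))
                for i, (w, p) in enumerate(zip(weights, posteriors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, mix = merge_cost(clusters[i][1], clusters[i][2],
                                       clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j, mix)
        _, i, j, mix = best
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1], mix)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _, _ in clusters]

# Four hypothetical segments with equal weight and toy relevance posteriors.
weights = [0.25, 0.25, 0.25, 0.25]
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
result = agglomerate(weights, posteriors, n_clusters=2)
print(sorted(result))
```

No per-speaker model is trained at any point: the only quantities involved are the segment weights and their relevance distributions, which is the sense in which the method is non-parametric.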


IEEE Transactions on Audio, Speech, and Language Processing | 2011

An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

Deepu Vijayasenan; Fabio Valente

This correspondence describes a novel system for speaker diarization of meeting recordings based on the combination of acoustic features (MFCC) and time delays of arrival (TDOAs). The first part of the paper analyzes differences between MFCC and TDOA features, which possess completely different statistical properties. When Gaussian mixture models are used, experiments reveal that the diarization system is sensitive to the different recording scenarios (i.e., meeting rooms with varying numbers of microphones). In the second part, a new multistream diarization system is proposed, extending previous work on information theoretic diarization. Both speaker clustering and speaker realignment steps are discussed; in contrast to current systems, the proposed method avoids performing the feature combination by averaging log-likelihood scores. Experiments on meetings data reveal that the proposed approach outperforms the GMM-based system when the recording is done with varying numbers of microphones.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Combination of agglomerative and sequential clustering for speaker diarization

Deepu Vijayasenan; Fabio Valente

This paper aims at investigating the use of sequential clustering for speaker diarization. Conventional diarization systems are based on parametric models and agglomerative clustering. In our previous work we proposed a non-parametric method based on the agglomerative information bottleneck for very fast diarization. Here we consider the combination of sequential and agglomerative clustering for avoiding local maxima of the objective function and for purification. Experiments are run on the RT06 evaluation data. Sequential clustering with oracle model selection can reduce the speaker error by 10% w.r.t. agglomerative clustering. When the model selection is based on the Normalized Mutual Information criterion, a relative improvement of 5% is obtained using a combination of agglomerative and sequential clustering.
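
The abstract does not define its Normalized Mutual Information criterion. The sketch below assumes one common form in information bottleneck work: the mutual information between clusters and relevance variables, divided by the mutual information between the original segments and the relevance variables. The joint table and the cluster assignments are toy values chosen for illustration.

```python
import math

def mutual_information(joint):
    """I(A;B) in nats from a joint probability table joint[a][b]."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return sum(p * math.log(p / (row[a] * col[b]))
               for a, r in enumerate(joint)
               for b, p in enumerate(r) if p > 0)

def merge_rows(joint, assignment, n_clusters):
    """Joint table of (cluster, relevance) obtained by summing segment rows."""
    merged = [[0.0] * len(joint[0]) for _ in range(n_clusters)]
    for row, cluster in zip(joint, assignment):
        for b, p in enumerate(row):
            merged[cluster][b] += p
    return merged

def normalized_mi(joint_xy, assignment, n_clusters):
    """Assumed criterion: I(C;Y) / I(X;Y), a value between 0 and 1."""
    clustered = merge_rows(joint_xy, assignment, n_clusters)
    return mutual_information(clustered) / mutual_information(joint_xy)

# Hypothetical joint distribution of 4 segments (rows) and 2 relevance values.
joint_xy = [[0.225, 0.025], [0.200, 0.050], [0.025, 0.225], [0.050, 0.200]]

nmi_good = normalized_mi(joint_xy, [0, 0, 1, 1], 2)  # groups similar segments
nmi_bad = normalized_mi(joint_xy, [0, 1, 0, 1], 2)   # mixes dissimilar ones
print(nmi_good, nmi_bad)
```

Under this assumed definition, a model selection rule could stop merging once the ratio falls below a chosen threshold; the threshold itself would have to be tuned and is not given in the abstract.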


Speech Communication | 2010

Multi-Stream Speech Recognition based on Dempster-Shafer Combination Rule

Fabio Valente

This paper aims at investigating the use of the Dempster–Shafer (DS) combination rule for multi-stream automatic speech recognition. The DS combination is based on a generalization of the conventional Bayesian framework. The main motivation for this work is the similarity between the DS combination and findings of Fletcher on human speech recognition. Experiments are based on the combination of several Multi Layer Perceptron (MLP) classifiers trained on different representations of the speech signal. The TANDEM framework is adopted in order to use the MLP outputs in conventional speech recognition systems. We exhaustively investigate several methods for applying the DS combination to multi-stream ASR. Experiments are run on small and large vocabulary speech recognition tasks and aim at comparing the proposed technique with other frame-based combination rules (e.g. inverse entropy). Results reveal that the proposed method outperforms conventional combination rules in both tasks. Furthermore we verify that the performance of the combined feature stream is never inferior to the performance of the best individual feature stream. We conclude the paper by discussing other applications of the DS combination and possible extensions.
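
For context, the inverse entropy rule that the abstract cites as a conventional frame-based baseline can be sketched as below. This is the baseline, not the DS method, and the exact variant used in the paper is not given. The function names, the `eps` guard, and the toy posteriors are assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def inverse_entropy_combine(streams, eps=1e-12):
    """Average per-frame posteriors, weighting each stream by 1 / entropy.

    Streams that are more confident (lower entropy) get a larger weight.
    eps guards against division by zero for one-hot posteriors.
    """
    inverse = [1.0 / (entropy(p) + eps) for p in streams]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    n_classes = len(streams[0])
    combined = [sum(w * p[c] for w, p in zip(weights, streams))
                for c in range(n_classes)]
    return combined, weights

# Hypothetical posteriors from two MLP streams for a single frame.
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.30, 0.30]

combined, weights = inverse_entropy_combine([confident, uncertain])
print(weights)
print(combined)
```

The combined vector is still a valid posterior because the weights are non-negative and sum to one.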


IEEE Transactions on Affective Computing | 2014

Predicting Continuous Conflict Perception with Bayesian Gaussian Processes

Samuel Kim; Fabio Valente; Maurizio Filippone; Alessandro Vinciarelli

Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach that detects common conversational social signals (loudness, overlapping speech, etc.) and predicts the conflict level perceived by human observers in continuous, non-categorical terms. The proposed regression approach is fully Bayesian and adopts automatic relevance determination to identify the social signals that most influence the outcome of the prediction. The experiments are performed over the SSPNet Conflict Corpus, a publicly available collection of 1,430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception.
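
Automatic relevance determination is commonly realized with a squared-exponential kernel that has one length-scale per input feature. The sketch below shows only that kernel with fixed, hand-picked length-scales. It does not reproduce the fully Bayesian inference of the paper, and the function name, feature values, and length-scales are hypothetical.

```python
import math

def ard_kernel(x, y, length_scales, variance=1.0):
    """Squared-exponential kernel with one length-scale per feature.

    A large length-scale makes the kernel nearly insensitive to that
    feature, which is how ARD expresses that a feature is irrelevant.
    """
    scaled_sq_dist = sum(((a - b) / l) ** 2
                         for a, b, l in zip(x, y, length_scales))
    return variance * math.exp(-0.5 * scaled_sq_dist)

# Two hypothetical features: the first is relevant (short length-scale),
# the second is effectively switched off (very long length-scale).
length_scales = [0.5, 100.0]
base = [0.0, 0.0]

k_relevant = ard_kernel(base, [1.0, 0.0], length_scales)
k_irrelevant = ard_kernel(base, [0.0, 1.0], length_scales)
print(k_relevant)    # similarity drops sharply when feature 1 changes
print(k_irrelevant)  # similarity stays close to 1 when feature 2 changes
```

In an ARD setting, the length-scales are inferred from data, and features that end up with long length-scales are read as having little influence on the prediction.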


International Conference on Acoustics, Speech, and Signal Processing | 2012

Automatic detection of conflicts in spoken conversations: Ratings and analysis of broadcast political debates

Samuel Kim; Fabio Valente; Alessandro Vinciarelli

Automatic analysis of spoken conversations has recently focused on phenomena like agreement/disagreement in collaborative and non-conflictual discussions (e.g., meetings). This work adds a novel dimension by investigating conflicts in spontaneous conversations. The study makes use of broadcast political debates where conflicts naturally arise between participants. In the first part, an annotation scheme to rate the degree of conflict in conversations is described and applied to 12 hours of recordings. In the second part, the correlation between various prosodic/conversational features and the degree of conflict is investigated. In the third part, we perform automatic detection of the level of conflict based on those features, showing an F-measure of 71.6% in three-level classification tasks.
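
The abstract does not say how the F-measure is averaged over the three conflict levels. The sketch below assumes macro-averaging (an unweighted mean of per-class F-measures) and uses made-up labels and predictions.

```python
def per_class_f1(y_true, y_pred, label):
    """F-measure of one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F-measures (assumed averaging)."""
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)

# Hypothetical three-level conflict labels for six clips.
labels = ["low", "mid", "high"]
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]

score = macro_f1(y_true, y_pred, labels)
print(round(score, 4))
```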


ACM Multimedia | 2012

Predicting the conflict level in television political debates: an approach based on crowdsourcing, nonverbal communication and Gaussian processes

Samuel Kim; Maurizio Filippone; Fabio Valente; Alessandro Vinciarelli

One of the most recent trends in multimedia indexing is to represent data in terms of the social and psychological phenomena that users perceive. In such a perspective this article proposes an approach for the automatic detection of conflict level in television political debates. The proposed approach includes the use of crowdsourcing techniques for modeling the perception of data consumers, the extraction of (language independent) nonverbal behavioral cues and the application of regression techniques based on Gaussian Processes. The experiments have been performed over 1430 clips of 30 seconds extracted from 45 political debates (roughly 12 hours of material). The results show that a correlation up to 0.8 can be achieved between the actual and predicted conflict level.
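
The abstract reports a correlation between actual and predicted conflict level without naming the coefficient. The sketch below assumes the Pearson correlation and uses invented scores for five clips.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical annotated and predicted conflict scores for five clips.
actual = [-2.0, -0.5, 0.0, 1.5, 3.0]
predicted = [-1.5, -1.0, 0.5, 1.0, 2.5]

r = pearson(actual, predicted)
print(round(r, 3))
```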

Collaboration


Dive into Fabio Valente's collaborations.

Top Co-Authors

Petr Motlicek

Idiap Research Institute

Samuel Kim

Idiap Research Institute
