Claudio Vair
Polytechnic University of Turin
Publication
Featured research published by Claudio Vair.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Fabio Castaldo; Daniele Colibro; Emanuele Dalmasso; Pietro Laface; Claudio Vair
The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006
Claudio Vair; Daniele Colibro; Fabio Castaldo; Emanuele Dalmasso; Pietro Laface
The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is typically performed. The aim of this work is to explore techniques that allow more accurate channel compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity, and also for different tasks. In this paper we evaluate the effects of the compensation of the channel variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain with our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data. Moreover, the quality of the transformed features is also assessed in the support vector machines framework for speaker recognition on the same data, and in preliminary experiments on language identification.
International Conference on Acoustics, Speech, and Signal Processing | 1995
Pietro Laface; Claudio Vair; Luciano Fissore
The paper presents a fast segmental Viterbi algorithm, a new search strategy that is particularly effective for very large vocabulary word recognition. It performs a tree-based, time-synchronous, left-to-right beam search that develops time-dependent acoustic and phonetic hypotheses. At any given time, it activates a sub-word unit associated with an arc of a lexical tree only if that time is likely to be the boundary between the current and the next unit. This new technique, tested with a vocabulary of 188892 directory entries, achieves the same results obtained with the Viterbi algorithm, with a 35% speedup. Results are also presented for a 718-word, speaker-independent continuous speech recognition task.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Fabio Castaldo; Daniele Colibro; Emanuele Dalmasso; Pietro Laface; Claudio Vair
This paper presents a stream-based approach for unsupervised multi-speaker conversational speech segmentation. The main idea of this work is to exploit prior knowledge about the speaker space to find a low-dimensional vector of speaker factors that summarizes the salient speaker characteristics. This new approach produces segmentation error rates that are better than the state-of-the-art ones reported in our previous work on the segmentation task in the NIST 2000 Speaker Recognition Evaluation (SRE). We also show how the performance of a speaker recognition system in the core test of the 2006 NIST SRE is affected, comparing the results obtained using single-speaker and automatically segmented test data.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Fabio Castaldo; Emanuele Dalmasso; Pietro Laface; Daniele Colibro; Claudio Vair
This work presents two contributions to language identification. The first contribution is the definition of a set of properly selected time-frequency features that are a valid alternative to the commonly used shifted delta cepstral features. As a second contribution, we show that significant performance improvement in language recognition can be obtained by estimating a subspace that represents the distortions due to inter-speaker variability within the same language, and compensating these distortions in the feature domain. Experiments on the NIST 1996 and 2003 Language Recognition Evaluation data validate the effectiveness of the proposed techniques.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Emanuele Dalmasso; Fabio Castaldo; Pietro Laface; Daniele Colibro; Claudio Vair
This paper describes the improvements introduced in the Loquendo-Politecnico di Torino (LPT) speaker recognition system submitted to the NIST SRE08 evaluation campaign. This system, which was among the best participants in this evaluation, combines the results of three core acoustic systems, two based on Gaussian Mixture Models (GMMs), and one on Phonetic GMMs. We discuss the results of the experiments performed for the 10sec-10sec condition and for the core condition, including the challenging tasks involving a target speaker and an interviewer. The error rate reduction of our SRE08 system compared to the SRE06 system ranges from 25% for the telephone-interview condition to 57% for the interview-interview condition. On the test with telephone and microphone conversations, the improvements range from 9% to 32%.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Fabio Castaldo; Daniele Colibro; Sandro Cumani; Emanuele Dalmasso; Pietro Laface; Claudio Vair
This paper describes the system submitted by Loquendo and Politecnico di Torino (LPT) for the 2009 NIST Language Recognition Evaluation. The system is a combination of classifiers based on two core acoustic models and on two core phone tokenizers. It exploits several state-of-the-art techniques that have been successfully applied in recent years both in speaker and in language recognition.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Cosmin Popovici; M. Andorno; Pietro Laface; L. Fissore; Mario Nigra; Claudio Vair
Since the beginning of 2001, Telecom Italia has deployed a nationwide automatic Directory Assistance (DA) system that routinely serves customers asking for residential and business listings.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Daniele Colibro; Luciano Fissore; Cosmin Popovici; Claudio Vair; Pietro Laface
Most voice-driven applications are based on recognition grammars. In complex applications it is difficult to predict exactly how the users will formulate their requests, even if a careful study of the users' behavior has been performed. Moreover, it is possible that a speaker's word pronunciation does not match the phonetic transcription of the system, mainly in the case of foreign words. Loquendo has developed a tool that collects field data, detects the most significant weaknesses of the application due to pronunciation or formulation mismatches, and filters the collected field corpora. This permits the application designers to perform their analysis only on a reasonable amount of preprocessed and automatically labeled data. This paper presents the approaches that have been devised to detect pronunciation variants of vocabulary words and linguistic formulations not covered by the recognition grammar. Results showing the improvements that have been obtained by including automatically detected formulations in three grammars for two languages are also detailed.
Archive | 2005
Claudio Vair; Daniele Colibro; Luciano Fissore