

Publication


Featured research published by Pietro Laface.


Speech Communication | 2007

Linear hidden transformations for adaptation of hybrid ANN/HMM models

Roberto Gemello; Franco Mana; Stefano Scanzio; Pietro Laface; Renato De Mori

This paper focuses on the adaptation of Automatic Speech Recognition systems using hybrid models that combine Artificial Neural Networks (ANN) with Hidden Markov Models (HMM). Most adaptation techniques for ANNs reported in the literature consist of adding a linear transformation network connected to the input of the ANN. This paper describes the application of linear transformations not only to the input features, but also to the outputs of the internal layers. The motivation is that the outputs of an internal layer represent discriminative features of the input pattern suitable for the classification performed at the output of the ANN. To reduce the effect of the lack of adaptation samples for some phonetic units, we propose a new solution called Conservative Training. Supervised adaptation experiments with different corpora and for different types of adaptation are described. The results show that the proposed approach always outperforms the use of transformations in the feature space, and yields even better results when combined with linear input transformations.
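The core idea above can be sketched in a few lines: insert a trainable linear transformation after a hidden layer of a frozen network, initialized to the identity so the unadapted network is recovered exactly. This is a minimal illustration, not the authors' code; all sizes and weights are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny frozen 2-layer network (dimensions are illustrative)
W1 = rng.normal(size=(4, 8))   # input -> hidden weights (frozen)
W2 = rng.normal(size=(8, 3))   # hidden -> output weights (frozen)

def forward(x, A):
    h = np.tanh(x @ W1)        # hidden-layer discriminative features
    h_adapted = h @ A          # linear transformation of the hidden outputs
    return h_adapted @ W2      # output scores

x = rng.normal(size=(1, 4))
# Initialized to the identity, the adaptation transform leaves the
# original network's outputs unchanged; adaptation then trains A only.
assert np.allclose(forward(x, np.eye(8)), np.tanh(x @ W1) @ W2)
```

During adaptation only `A` would be updated, which keeps the number of speaker-specific parameters small compared with retraining the whole network.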


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Compensation of Nuisance Factors for Speaker and Language Recognition

Fabio Castaldo; Daniele Colibro; Emanuele Dalmasso; Pietro Laface; Claudio Vair

The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while blind channel compensation is usually performed in the feature domain. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space, that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
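The feature-domain compensation described above can be illustrated as follows: a low-rank subspace captures intersession (channel) variability, and the estimated session offset is subtracted from every feature frame. This is a hedged sketch with invented dimensions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

feat_dim, rank = 6, 2
U = rng.normal(size=(feat_dim, rank))     # channel subspace (assumed pretrained)
x_c = rng.normal(size=rank)               # channel factors estimated for this session

frames = rng.normal(size=(10, feat_dim))  # acoustic feature frames of one utterance
offset = U @ x_c                          # session-dependent distortion
compensated = frames - offset             # same offset removed frame by frame

# The compensation removes exactly the estimated channel component,
# leaving features usable by any downstream model (GMM, SVM, ...).
assert np.allclose(frames - compensated, np.tile(offset, (10, 1)))
```

Because the correction is applied to the features, the same compensated stream can feed models of different nature, which is the advantage the abstract emphasizes.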


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Channel Factors Compensation in Model and Feature Domain for Speaker Recognition

Claudio Vair; Daniele Colibro; Fabio Castaldo; Emanuele Dalmasso; Pietro Laface

The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while blind channel compensation is typically performed in the feature domain. The aim of this work is to explore techniques that allow more accurate channel compensation in the domain of the features. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity, and also for different tasks. In this paper we evaluate the effects of the compensation of the channel variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 speaker recognition evaluation data. Moreover, the quality of the transformed features is also assessed in the support vector machines framework for speaker recognition on the same data, and in preliminary experiments on language identification.


international conference on acoustics, speech, and signal processing | 1995

A fast segmental Viterbi algorithm for large vocabulary recognition

Pietro Laface; Claudio Vair; Luciano Fissore

The paper presents a fast segmental Viterbi algorithm, a new search strategy particularly effective for very large vocabulary word recognition. It performs a tree-based, time-synchronous, left-to-right beam search that develops time-dependent acoustic and phonetic hypotheses. At any given time, it activates a sub-word unit associated with an arc of the lexical tree only if that time is likely to be the boundary between the current and the next unit. This new technique, tested with a vocabulary of 188,892 directory entries, achieves the same results as the Viterbi algorithm with a 35% speedup. Results are also presented for a 718-word, speaker-independent continuous speech recognition task.
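The time-synchronous beam search underlying this family of algorithms can be sketched as below: states whose score falls too far below the frame's best score are deactivated, so later frames expand fewer hypotheses. This toy version omits the paper's segmental boundary test and uses an invented two-state model.

```python
import numpy as np

def beam_viterbi(log_obs, log_trans, beam=5.0):
    """Toy time-synchronous Viterbi with beam pruning (illustrative only).

    log_obs:   (frames, states) log observation scores
    log_trans: (states, states) log transition scores
    """
    n_frames, n_states = log_obs.shape
    scores = np.full(n_states, -np.inf)
    scores[0] = log_obs[0, 0]                   # start in state 0
    for t in range(1, n_frames):
        new = np.full(n_states, -np.inf)
        for s in range(n_states):
            if scores[s] == -np.inf:
                continue                        # pruned / inactive state
            for s2 in range(n_states):
                cand = scores[s] + log_trans[s, s2] + log_obs[t, s2]
                new[s2] = max(new[s2], cand)
        new[new < new.max() - beam] = -np.inf   # beam pruning step
        scores = new
    return scores.max()

# Tiny hand-made example: with a wide beam nothing is pruned and the
# result equals the exact Viterbi score.
log_obs = np.array([[0.0, -1.0], [-1.0, 0.0], [0.0, -1.0]])
log_trans = np.log(np.full((2, 2), 0.5))
best = beam_viterbi(log_obs, log_trans, beam=10.0)
```

A narrow beam trades exactness for speed; the paper's contribution is a boundary-likelihood test that prunes unit activations in addition to state scores.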


international conference on acoustics, speech, and signal processing | 2008

Stream-based speaker segmentation using speaker factors and eigenvoices

Fabio Castaldo; Daniele Colibro; Emanuele Dalmasso; Pietro Laface; Claudio Vair

This paper presents a stream-based approach for unsupervised multi-speaker conversational speech segmentation. The main idea of this work is to exploit prior knowledge about the speaker space to find a low-dimensional vector of speaker factors that summarizes the salient speaker characteristics. This new approach produces segmentation error rates that are better than the state-of-the-art ones reported in our previous work on the segmentation task in the NIST 2000 Speaker Recognition Evaluation (SRE). We also show how the performance of a speaker recognition system in the core test of the 2006 NIST SRE is affected, comparing the results obtained using single-speaker and automatically segmented test data.
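The speaker-factor idea above can be illustrated with a least-squares projection: an eigenvoice matrix maps a short speech window onto a low-dimensional speaker-factor vector, and windows are then clustered in that small space to segment the speakers. Dimensions and the projection method here are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
supervector_dim, n_factors = 20, 3
V = rng.normal(size=(supervector_dim, n_factors))   # eigenvoices (assumed pretrained)

def speaker_factors(window_supervector):
    # Least-squares projection onto the eigenvoice subspace: the
    # low-dimensional coordinates summarize the speaker characteristics.
    y, *_ = np.linalg.lstsq(V, window_supervector, rcond=None)
    return y

# A supervector lying exactly in the subspace is recovered exactly.
s = V @ np.array([1.0, -2.0, 0.5])
assert np.allclose(speaker_factors(s), [1.0, -2.0, 0.5])
```

Clustering a stream of such 3-dimensional vectors is far cheaper than clustering full model supervectors, which is what makes the stream-based segmentation practical.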


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1985

Parallel Algorithms for Syllable Recognition in Continuous Speech

Renato De Mori; Pietro Laface; Yu F. Mong

A distributed rule-based system for automatic speech recognition is described. Acoustic property extraction and feature hypothesization are performed by the application of sequences of operators. These sequences, called plans, are executed by cooperative expert programs. Experimental results on the automatic segmentation and recognition of phrases, made of connected letters and digits, are described and discussed.


international conference on acoustics, speech, and signal processing | 2011

Fast discriminative speaker verification in the i-vector space

Sandro Cumani; Niko Brümmer; Lukas Burget; Pietro Laface

This work presents a new approach to discriminative speaker verification. Rather than estimating speaker models, or a model that discriminates between a speaker class and the class of all the other speakers, we directly solve the problem of classifying pairs of utterances as belonging to the same speaker or not.
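The pairwise formulation can be sketched as follows: each (i-vector, i-vector) pair is mapped to a feature expansion that is symmetric in the two utterances, so a single linear classifier can label the pair same-speaker or different-speaker. The exact expansion below is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def pair_features(x1, x2):
    """Symmetric feature expansion of an i-vector pair (illustrative)."""
    # Swapping the two utterances must not change the features, since
    # "same speaker" is a symmetric relation.
    cross = np.outer(x1, x2) + np.outer(x2, x1)
    return np.concatenate([
        cross.ravel(),
        np.outer(x1, x1).ravel() + np.outer(x2, x2).ravel(),
        x1 + x2,
        [1.0],                      # bias term
    ])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
# Order of the two utterances does not matter:
assert np.allclose(pair_features(a, b), pair_features(b, a))
```

Training a linear classifier on such pair features directly optimizes the verification decision, instead of first building per-speaker models and then comparing them.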


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

Lexical access to large vocabularies for speech recognition

Luciano Fissore; Pietro Laface; Giorgio Micca; Roberto Pieraccini

A large-vocabulary isolated-word recognition system based on the hypothesize-and-test paradigm is described. Word preselection is achieved by segmenting and classifying the input signal in terms of broad phonetic classes. A lattice of phonetic segments is generated and organized as a graph. Word hypothesization is obtained by matching this graph against the models of all vocabulary words, where a word model is itself a phonetic representation made in terms of a graph. A modified dynamic programming matching procedure gives an efficient solution to this graph-to-graph matching problem. Hidden Markov models (HMMs) of subword units are used as a more detailed knowledge source in the verification step. The word candidates generated by the previous step are represented as sequences of diphone-like subword units, and the Viterbi algorithm is used for evaluating their likelihood. Lexical knowledge is organized in a tree structure where the initial common subsequences of word descriptions are shared, and a beam-search strategy pursues only the most promising paths. The results show that a complexity reduction of about 73% can be achieved by using the two-pass approach with respect to the direct approach, while the recognition accuracy remains comparable.
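The lexical-tree organization mentioned above can be shown with a tiny prefix tree: words sharing an initial subsequence of subword units share the corresponding tree arcs, so the search expands each common prefix only once. The unit sequences below are invented for illustration.

```python
def build_tree(lexicon):
    """Build a prefix tree over subword-unit sequences (illustrative sketch)."""
    root = {}
    for word, units in lexicon.items():
        node = root
        for u in units:
            node = node.setdefault(u, {})   # shared arc for a shared prefix
        node["#word"] = word                # leaf marks a complete word
    return root

# Hypothetical pronunciations: "cat" and "cap" share the k-ae prefix.
lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}
tree = build_tree(lexicon)

# The shared prefix k-ae is represented once, then branches into t / p:
assert set(tree["k"]["ae"].keys()) == {"t", "p"}
```

Prefix sharing is what makes the beam search affordable: the score of a common prefix is computed once for all words that start with it.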


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1980

Use of Fuzzy Algorithms for Phonetic and Phonemic Labeling of Continuous Speech

Renato De Mori; Pietro Laface

A model for assigning phonetic and phonemic labels to speech segments is presented. The system executes fuzzy algorithms that assign degrees of worthiness to structured interpretations of syllabic segments extracted from the signal of a spoken sentence. The knowledge source is a series of syntactic rules whose syntactic categories are phonetic and phonemic features detected by a precategorical and a categorical classification of speech sounds. Rules inferred from experiments and results for male and female voices are presented.
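The "degrees of worthiness" in the abstract are fuzzy membership values; a minimal sketch of the mechanism is shown below with a standard triangular membership function. The particular function and the duration example are invented for illustration, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Classic triangular fuzzy membership function on [a, c], peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical example: the degree to which a 250 ms segment "is" a
# stressed syllable nucleus, given an assumed plausible duration range.
degree = triangular(250, 100, 250, 400)
assert degree == 1.0

# Competing interpretations of a segment would each receive such a
# degree, and the system ranks them by worthiness instead of making a
# hard phonetic decision early.
```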


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

Automatic detection and description of syllabic features in continuous speech

R. De Mori; Pietro Laface; Elio Piccolo

The details of the implementation of a syntax-controlled acoustic encoder of a speech understanding system (SUS) are presented. Finite-state automata operating on artificial descriptions of suprasegmentals and global spectral features isolate syllables in continuous speech. Then a combinational algorithm tracks the formants for the voiced intervals of each syllable, and other algorithms provide a complete structural description of spectral and prosodic features for a spoken sentence. Such a description consists of a string of symbols and numerical attributes and is a representation of speech in terms of perceptually significant primitive forms. It contains all the information required to reconstruct the analyzed sentence with a formant synthesizer; it can be used directly both for emitting or verifying hypotheses at the lexical level of an SUS and for automatically learning phonetic features by grammatical inference.

Collaboration


Dive into Pietro Laface's collaborations.

Top Co-Authors


Giorgio Micca

Polytechnic University of Turin
