Publication


Featured research published by Xavier Menendez-Pidal.


Journal of the Acoustical Society of America | 2005

System and method for speech verification using out-of-vocabulary models

Duanpei Wu; Lex Olorenshaw; Xavier Menendez-Pidal; Ruxin Chen

A system and method for speech verification using out-of-vocabulary models includes a speech recognizer that has a model bank with system vocabulary word models, a garbage model, and one or more noise models. The model bank may reject an utterance or other sound as an invalid vocabulary word when the model bank identifies the utterance or other sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a pre-determined number of final noise model clusters to effectively reduce the number of noise models that are utilized by the model bank of the speech recognizer to verify system vocabulary words.
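
The rejection rule described in this abstract can be illustrated with a minimal sketch. It assumes each model in the bank returns a log-likelihood score and that the highest score wins; the abstract does not specify the scoring. The function name `verify_utterance` and the `garbage`/`noise` name prefixes are hypothetical and chosen only for illustration.

```python
def verify_utterance(scores):
    """Return the best-matching vocabulary word, or None if rejected.

    `scores` maps model names to log-likelihood scores (higher is better).
    Assumption: out-of-vocabulary models are named with a "garbage" or
    "noise" prefix; this convention is not taken from the source.
    """
    best = max(scores, key=scores.get)
    # Reject when the garbage model or any noise model scores highest.
    if best.startswith(("garbage", "noise")):
        return None
    return best


accepted = verify_utterance({"call": -10.0, "garbage": -12.0, "noise_1": -30.0})
rejected = verify_utterance({"call": -15.0, "garbage": -12.0, "noise_1": -30.0})
```

Here `accepted` is the word `"call"`, while `rejected` is `None` because the garbage model outscores every vocabulary word. Clustering the initial noise models into fewer final models, as the abstract describes, would simply shrink the number of `noise_*` entries that have to be scored.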


International Conference on Acoustics, Speech, and Signal Processing | 1999

A robust speech detection algorithm for speech activated hands-free applications

Duanpei Wu; Miyuki Tanaka; Ruxin Chen; Lex Olorenshaw; Mariscela Amador; Xavier Menendez-Pidal

This paper describes a novel noise-robust speech detection algorithm that can operate reliably in severely noisy car conditions. High performance has been obtained with the following techniques: (1) noise suppression based on principal component analysis for pre-processing, (2) robust endpoint detection using dynamic parameters, and (3) speech verification using the periodicity of voiced signals with harmonic enhancement. Compared with nonlinear spectrum subtraction, noise suppression improves the SNR by about 20 dB. This makes the endpoint detection operate reliably in SNRs down to -10 dB. In car environments, road bump noises are problematic for speech detectors, causing misdetection errors. Speech verification helps to remove these errors. This technology is being used in Sony car navigation products.
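
To make the idea of verifying speech by periodicity concrete, the sketch below scores a frame by its peak normalized autocorrelation. This is a generic stand-in, not the paper's method: the abstract does not describe its periodicity measure or the harmonic enhancement step, and the function name `periodicity`, the 8 kHz sampling rate, and the lag range are all assumptions.

```python
import math


def periodicity(frame, min_lag, max_lag):
    """Peak autocorrelation over [min_lag, max_lag], normalized by frame energy.

    Values near 1 suggest a strongly periodic (voiced-like) frame; values
    near 0 suggest no repetition at the tested lags.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


# Assumed setup: 8 kHz sampling, 50 ms frame, pitch lags for 50-200 Hz.
voiced = [math.sin(2 * math.pi * i / 80) for i in range(400)]  # 100 Hz tone
bump = [0.0] * 400
bump[100:105] = [1.0] * 5  # short transient, loosely like a road bump

voiced_score = periodicity(voiced, 40, 160)
bump_score = periodicity(bump, 40, 160)
```

In this toy case the tone scores far higher than the transient, which does not repeat at any tested lag. A detector could accept a candidate segment only when the score exceeds a threshold, though the threshold and any enhancement applied beforehand would need to come from the paper itself.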


International Conference on Acoustics, Speech, and Signal Processing | 2003

Automatic set-up for speech recognition engines based on merit optimization

G. Hernandez-Abrego; Xavier Menendez-Pidal; T. Kemp; Katsuki Minamino; H. Lucke

We propose an automatic method to set up the several parameters that define the behavior and performance of a typical speech recognition engine. Such parameters include weights and beam widths, among others. Our method is based on the definition of a merit function. Here, merit is understood as an intuitive notion of recognition performance based on both recognition accuracy and computation time. A convenient definition of merit allows an optimization procedure to be applied to define a convenient set-up for the recognizer with little human intervention. The method is applied to adjust the recognition parameters of two different LVCSR (large vocabulary continuous speech recognition) applications, one in American English and another in Japanese.
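
A merit function of this kind can be sketched as follows. The linear form (accuracy minus a weighted time cost), the exhaustive grid search, and the names `merit`, `best_setup`, `beam_width`, and `lm_weight` are illustrative assumptions; the abstract specifies neither the exact merit definition nor the optimization procedure.

```python
from itertools import product


def merit(accuracy, time_factor, time_weight=0.1):
    """Assumed merit: reward accuracy, penalize computation time."""
    return accuracy - time_weight * time_factor


def best_setup(evaluate, grid, time_weight=0.1):
    """Try every parameter combination in `grid` and keep the best merit.

    `evaluate(params)` must return (accuracy, time_factor) for one set-up.
    Grid search is a simple stand-in for the paper's optimizer.
    """
    names = list(grid)
    best_params, best_merit = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        accuracy, time_factor = evaluate(params)
        score = merit(accuracy, time_factor, time_weight)
        if score > best_merit:
            best_params, best_merit = params, score
    return best_params, best_merit


def toy_evaluate(params):
    """Made-up recognizer: accuracy saturates while time keeps growing."""
    beam = params["beam_width"]
    return min(0.9, 0.5 + 0.002 * beam), beam / 100.0


setup, score = best_setup(
    toy_evaluate, {"beam_width": [100, 200, 300], "lm_weight": [10]}
)
```

With this toy recognizer the search settles on the middle beam width: the widest beam no longer improves accuracy but still costs time, which is the trade-off the merit function is meant to capture.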


Journal of the Acoustical Society of America | 1997

Automatic phoneme labeler in the TIMIT database

Xavier Menendez-Pidal; H. Tim Bunnell

A phonetic HMM labeler was developed for use in an automatic diphone extraction system. Using the TIMIT database for training, the best HMM labeler performance was obtained when the number of states for a given phoneme model was proportional to the average duration of that phoneme over the TIMIT database, and the HMM topology allowed both repetition of states and skipping of adjacent states. The acoustic feature set consisted of 30 features per frame (8 Mel cepstral coefficients, the log RMS amplitude, and the zero-crossing rate of a frame, as well as their first and second time derivatives). Labeling accuracy was tested using all speech files in the TIMIT test set, and accuracy was assessed in terms of the degree of separation between labeler-assigned phoneme boundaries and the nominal phoneme boundaries provided in the database. Of the boundaries assigned by the labeler, 97.4% were within 30 ms of the nominal phoneme boundaries, and 86.8% were within 10 ms (i.e., one analysis frame) of the nominal bou...
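
The feature count and the boundary-accuracy metric in this abstract can be checked with a small sketch. The function names are hypothetical, the boundary times are made-up sample values rather than TIMIT data, and treating "within" as less than or equal to the tolerance is an assumption.

```python
def feature_dimension(n_cepstra=8, n_extra=2, n_derivative_orders=2):
    """Static features plus one copy per derivative order.

    Defaults follow the abstract: 8 Mel cepstral coefficients plus log RMS
    amplitude and zero-crossing rate, with first and second derivatives.
    """
    static = n_cepstra + n_extra
    return static * (1 + n_derivative_orders)


def fraction_within(predicted_ms, nominal_ms, tolerance_ms):
    """Fraction of predicted boundaries within tolerance of the nominal ones."""
    if not predicted_ms or len(predicted_ms) != len(nominal_ms):
        raise ValueError("boundary lists must be non-empty and equal in length")
    hits = sum(
        abs(p - n) <= tolerance_ms for p, n in zip(predicted_ms, nominal_ms)
    )
    return hits / len(predicted_ms)


dims = feature_dimension()  # (8 + 2) * 3

# Made-up boundary times in milliseconds, for illustration only.
predicted = [10, 55, 120, 200]
nominal = [12, 40, 118, 240]
within_10 = fraction_within(predicted, nominal, 10)
within_30 = fraction_within(predicted, nominal, 30)
```

On the sample values, half of the boundaries fall within 10 ms and three quarters within 30 ms, mirroring how the abstract reports accuracy at two tolerances.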


Archive | 1999

Apparatus and method for noise attenuation in a speech recognition system

Xavier Menendez-Pidal; Miyuki Tanaka; Ruxin Chen


Archive | 2001

Supervised automatic text generation based on word classes for language modeling

Gustavo Hernandez Abrego; Xavier Menendez-Pidal


Journal of the Acoustical Society of America | 2007

System and method for speech verification using a confidence measure

Duanpei Wu; Xavier Menendez-Pidal; Lex Olorenshaw; Ruxin Chen


Journal of the Acoustical Society of America | 1998

Method for reducing noise distortions in a speech recognition system

Xavier Menendez-Pidal; Miyuki Tanaka; Ruxin Chen; Duanpei Wu


Archive | 2004

System and method for effectively implementing an optimized language model for speech recognition

Lei Duan; Gustavo Hernandez Abrego; Xavier Menendez-Pidal; Lex Olorenshaw


Archive | 2004

System and method for utilizing distance measures to perform text classification

Xavier Menendez-Pidal; Lei Duan; Michael Emonts

Collaboration


Dive into Xavier Menendez-Pidal's collaborations.

Top Co-Authors

Masanori Omote

Sony Computer Entertainment
