
Publication


Featured research published by Alejandro Acero.


international conference on acoustics, speech, and signal processing | 1990

Environmental robustness in automatic speech recognition

Alejandro Acero; Richard M. Stern

Initial efforts to make Sphinx, a continuous-speech speaker-independent recognition system, robust to changes in the environment are reported. To deal with differences in noise level and spectral tilt between close-talking and desk-top microphones, two novel methods based on additive corrections in the cepstral domain are proposed. In the first algorithm, the additive correction depends on the instantaneous SNR of the signal. In the second technique, expectation-maximization techniques are used to best match the cepstral vectors of the input utterances to the ensemble of codebook entries representing a standard acoustical ambience. Use of the algorithms dramatically improves recognition accuracy when the system is tested on a microphone other than the one on which it was trained.
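The first (SNR-dependent) correction described above amounts to a table lookup: each frame receives an additive cepstral correction chosen by its instantaneous SNR. A minimal sketch, assuming NumPy arrays and a hypothetical precomputed correction table (the function name, bin edges, and array shapes are illustrative, not from the paper):

```python
import numpy as np

def sdcn_correct(cepstra, snr_db, corrections, snr_edges):
    """SNR-dependent additive cepstral correction: each frame gets the
    correction vector of the SNR bin its instantaneous SNR falls into.

    cepstra:     (T, D) cepstral vectors, one row per frame
    snr_db:      (T,)   instantaneous SNR estimate per frame
    corrections: (B, D) one additive correction vector per SNR bin
    snr_edges:   (B-1,) bin edges in dB, ascending
    """
    idx = np.digitize(snr_db, snr_edges)  # one bin index per frame
    return cepstra + corrections[idx]
```

In the actual algorithm the correction table would be learned from stereo (close-talking vs. desktop) recordings; here it is simply passed in.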


international conference on acoustics, speech, and signal processing | 1991

Robust speech recognition by normalization of the acoustic space

Alejandro Acero; Richard M. Stern

Several algorithms are presented that increase the robustness of SPHINX, the CMU (Carnegie Mellon University) continuous-speech speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion. The authors propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition attempts environment normalization. The algorithms for environment normalization are efficient and improve the recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping of the frequency axis to different speakers and achieves a 10% decrease in error rate.
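The environment-normalization idea above, a vector addition chosen to minimize overall VQ distortion against a reference codebook, can be sketched as a simple alternating estimation: assign each shifted frame to its nearest codeword, then re-estimate the bias. This is a minimal illustration under assumed NumPy conventions; the function name and fixed iteration count are hypothetical:

```python
import numpy as np

def estimate_environment_bias(cepstra, codebook, n_iter=5):
    """Estimate an additive cepstral bias (environment normalization)
    that reduces VQ distortion against a reference codebook.

    cepstra:  (T, D) observed cepstral frames
    codebook: (K, D) reference-environment codewords
    """
    bias = np.zeros(cepstra.shape[1])
    for _ in range(n_iter):
        shifted = cepstra - bias
        # squared distance from every frame to every codeword
        dists = ((shifted[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = codebook[dists.argmin(axis=1)]
        # the bias is the mean offset of frames from their codewords
        bias = (cepstra - nearest).mean(axis=0)
    return bias
```

Subtracting the estimated bias maps the test-environment cepstra back toward the training-environment acoustic space.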


international conference on acoustics, speech, and signal processing | 1992

Efficient joint compensation of speech for the effects of additive noise and linear filtering

Fu-Hua Liu; Alejandro Acero; Richard M. Stern

The authors describe two algorithms that provide robustness for automatic speech recognition systems in a fashion that is suitable for real-time environmental normalization for workstations of moderate size. The first algorithm is a modification of the SNR-dependent cepstral normalization (SDCN) and the fixed code-word dependent cepstral normalization (FCDCN) algorithms given by Acero and Stern (1990), except that unlike these algorithms it provides computationally-efficient environment normalization without prior knowledge of the acoustical characteristics of the environment in which the system will be operated. The second algorithm is a modification of the more complex CDCN algorithm that enables it to perform environmental compensation in better than real time. The authors compare the recognition accuracy, computational complexity, and amount of training data needed to adapt to new acoustical environments using these algorithms with several different types of headset-mounted and desktop microphones.


human language technology | 1993

Efficient cepstral normalization for robust speech recognition

Fu-Hua Liu; Richard M. Stern; Xuedong Huang; Alejandro Acero

In this paper we describe and compare the performance of a series of cepstrum-based procedures that enable the CMU SPHINX-II speech recognition system to maintain a high level of recognition accuracy over a wide variety of acoustical environments. We describe the MFCDCN algorithm, an environment-independent extension of the efficient SDCN and FCDCN algorithms developed previously. We compare the performance of these algorithms with the very simple RASTA and cepstral mean normalization procedures, describing the performance of these algorithms in the context of the 1992 DARPA CSR evaluation using secondary microphones, and in the DARPA stress-test evaluation.
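Cepstral mean normalization, the simple baseline the paper compares against, subtracts the utterance-level mean from every cepstral frame, removing any fixed linear-channel tilt. A minimal sketch (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def cmn(cepstra):
    """Cepstral mean normalization: subtract the per-utterance mean
    from every frame, so a constant channel filter cancels out.

    cepstra: (T, D) cepstral vectors for one utterance
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Because convolution with a fixed channel becomes an additive constant in the cepstral domain, this one-line normalization already removes much of the microphone mismatch that the more elaborate MFCDCN-family algorithms target.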


human language technology | 1992

Multiple approaches to robust speech recognition

Richard M. Stern; Fu-Hua Liu; Yoshiaki Ohshima; Thomas M. Sullivan; Alejandro Acero

This paper compares several different approaches to robust speech recognition. We review CMU's ongoing research in the use of acoustical pre-processing to achieve robust speech recognition, and we present the results of the first evaluation of pre-processing in the context of the DARPA standard ATIS domain for spoken language systems. We also describe and compare the effectiveness of three complementary methods of signal processing for robust speech recognition: acoustical pre-processing, microphone array processing, and the use of physiologically-motivated models of peripheral signal processing. Recognition error rates are presented using these three approaches in isolation and in combination with each other for the speaker-independent continuous alphanumeric census speech recognition task.


Journal of the Acoustical Society of America | 2011

Combined speech and alternate input modality to a mobile device

Milind Mahajan; Alejandro Acero; Bo-June Hsu

International patent application PCT/US2006/040537, filed 16 October 2006; filing and publication language: English. International Patent Classification: G06F 3/16 (2006.01), G10L 15/22 (2006.01). Designated states (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LV, LY, MA, MD, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.


international conference on acoustics, speech, and signal processing | 2004

Noise robust speech recognition with a switching linear dynamic model

James G. Droppo; Alejandro Acero

Model based feature enhancement techniques are constructed from acoustic models for speech and noise, together with a model of how the speech and noise produce the noisy observations. Most techniques incorporate either Gaussian mixture models (GMM) or hidden Markov models (HMM). This paper explores using a switching linear dynamic model (LDM) for the clean speech. The linear dynamics of the model capture the smooth time evolution of speech. The switching states of the model capture the piecewise stationary characteristics of speech. However, incorporating a switching LDM causes the enhancement problem to become intractable. With a GMM or an HMM, the enhancement running time is proportional to the length of the utterance. The switching LDM causes the running time to become exponential in the length of the utterance. To overcome this drawback, the standard generalized pseudo-Bayesian technique is used to provide an approximate solution of the enhancement problem. We present preliminary results demonstrating that, even with relatively small model sizes, substantial word error rate improvement can be achieved.
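Exact inference in the switching LDM is exponential because every switching-state history must be tracked; the generalized pseudo-Bayesian approximation stays tractable by collapsing the per-history Gaussians into a single moment-matched Gaussian after each frame. A minimal sketch of that collapse step for diagonal covariances (the function name and array shapes are assumptions, not from the paper):

```python
import numpy as np

def collapse_gaussians(weights, means, variances):
    """Moment-match a Gaussian mixture to one Gaussian (a GPB-style
    collapse), preserving the overall mean and variance.

    weights:   (K,)   mixture weights (need not be normalized)
    means:     (K, D) per-component means
    variances: (K, D) per-component diagonal variances
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = (w[:, None] * means).sum(axis=0)
    # total variance = expected within-component variance
    #                + variance of the component means
    var = (w[:, None] * (variances + (means - mu) ** 2)).sum(axis=0)
    return mu, var
```

Collapsing after every frame keeps the number of tracked hypotheses constant, so the enhancement running time stays linear in the utterance length rather than exponential.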


Journal of the Acoustical Society of America | 2006

Including the category of environmental noise when processing speech signals

James G. Droppo; Alejandro Acero; Li Deng

A method and apparatus are provided for identifying a noise environment for a frame of an input signal based on at least one feature for that frame. Under one embodiment, the noise environment is identified by determining the probability of each of a set of possible noise environments. For some embodiments, the probabilities of the noise environments for past frames are included in the identification of an environment for a current frame. In one particular embodiment, a count is generated for each environment that indicates the number of past frames for which the environment was the most probable environment. The environment with the highest count is then selected as the environment for the current frame.
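The counting scheme in the embodiment above, picking the environment that was most probable in the largest number of past frames, can be sketched directly. The data layout, a list of per-frame probability dictionaries, is an assumption for illustration:

```python
from collections import Counter

def select_environment(frame_probs):
    """Pick the noise environment for the current frame: the one that
    was the most probable environment in the most past frames.

    frame_probs: list of dicts mapping environment name -> probability,
                 one dict per past frame
    """
    most_probable = (max(p, key=p.get) for p in frame_probs)
    counts = Counter(most_probable)
    return counts.most_common(1)[0][0]
```

Counting winners over a window of past frames smooths over single-frame misclassifications of the acoustic environment.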


human language technology | 1994

Signal processing for robust speech recognition

Fu-Hua Liu; Pedro J. Moreno; Richard M. Stern; Alejandro Acero

This paper describes a series of cepstral-based compensation procedures that render the SPHINX-II system more robust with respect to acoustical environment. The first algorithm, phone-dependent cepstral compensation, is similar in concept to the previously-described MFCDCN method, except that cepstral compensation vectors are selected according to the current phonetic hypothesis, rather than on the basis of SNR or VQ codeword identity. We also describe two procedures to accomplish adaptation of the VQ codebook for new environments, as well as the use of reduced-bandwidth frequency analysis to process telephone-bandwidth speech. Use of the various compensation algorithms in consort produces a reduction of error rates for SPHINX-II by as much as 40 percent relative to the rate achieved with cepstral mean normalization alone, in both development test sets and in the context of the 1993 ARPA CSR evaluations.


international conference on acoustics, speech, and signal processing | 1994

Environment normalization for robust speech recognition using direct cepstral comparison

Fu-Hua Liu; Richard M. Stern; Alejandro Acero; Pedro J. Moreno

In this paper we describe and evaluate a series of new algorithms that compensate for the effects of unknown acoustical environments or changes in environment. The algorithms use compensation vectors that are added to the cepstral representations of speech that is input to a speech recognition system. While these vectors are computed from direct frame-by-frame comparisons of cepstra of speech simultaneously recorded in the training environment and various prototype testing environments, the compensation algorithms do not assume that the acoustical characteristics of the actual testing environment are known. The specific compensation vector applied in a given frame depends on either physical attributes such as SNR or presumed phonetic identity. The compensation algorithms are evaluated using the 1992 ARPA 5000 word WSJ/CSR corpus. The best system combines phoneme-based and SNR-based cepstral compensation with cepstral mean normalization, and provides a 66.8% reduction in error rate over baseline processing when tested using a standard suite of unknown microphones.
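Training compensation vectors by direct frame-by-frame comparison of simultaneously recorded (stereo) cepstra, binned here by SNR, can be sketched as follows. The function name, binning scheme, and array shapes are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def train_compensation_vectors(clean, noisy, snr_db, snr_edges):
    """Per-SNR-bin compensation vectors: the mean difference between
    clean and noisy cepstra over the frames falling in each bin.

    clean, noisy: (T, D) cepstra recorded simultaneously (stereo)
    snr_db:       (T,)   instantaneous SNR per frame
    snr_edges:    (B-1,) ascending bin edges, giving B bins
    """
    idx = np.digitize(snr_db, snr_edges)  # bin index per frame
    n_bins = len(snr_edges) + 1
    vectors = np.zeros((n_bins, clean.shape[1]))
    for b in range(n_bins):
        mask = idx == b
        if np.any(mask):
            vectors[b] = (clean[mask] - noisy[mask]).mean(axis=0)
    return vectors
```

At test time, adding the vector for a frame's SNR bin approximates the clean-environment cepstrum without needing to know the test environment in advance.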
