
Publication


Featured research published by Mazin G. Rahim.


Journal of the Acoustical Society of America | 2000

System and method of recognizing an acoustic environment to adapt a set of base recognition models to the current acoustic environment for subsequent speech recognition

Mazin G. Rahim

A speech recognition system which effectively recognizes unknown speech from multiple acoustic environments includes a set of secondary models, each associated with one or more particular acoustic environments, integrated with a base set of recognition models. The system is trained in two stages: a set of secondary models is created in the first stage, and that set is integrated with the base set of recognition models in the second.
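As a rough, hypothetical sketch of the two-stage idea (not the patented implementation), the Python snippet below classifies an utterance's acoustic environment by scoring its feature frames against one Gaussian per environment, the kind of decision that would select the matching secondary model; all names, dimensions, and parameter values are assumptions.

    import numpy as np

    def log_gauss(x, mean, var):
        """Log-likelihood of a frame under a diagonal-covariance Gaussian."""
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def select_environment(frames, env_models):
        """Pick the environment whose model best explains the utterance."""
        totals = {env: sum(log_gauss(f, m, v) for f in frames)
                  for env, (m, v) in env_models.items()}
        return max(totals, key=totals.get)

    # Toy usage: two environments, each modeled by a single Gaussian.
    env_models = {
        "quiet": (np.zeros(13), np.ones(13)),
        "car_noise": (np.full(13, 0.5), np.full(13, 2.0)),
    }
    frames = [np.random.randn(13) for _ in range(50)]
    env = select_environment(frames, env_models)
    print("detected environment:", env)  # then load that environment's secondary model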


international conference on acoustics, speech, and signal processing | 2005

The AT&T WATSON speech recognizer

Vincent Goffin; Cyril Allauzen; Enrico Bocchieri; Dilek Hakkani-Tür; Andrej Ljolje; Sarangarajan Parthasarathy; Mazin G. Rahim; Giuseppe Riccardi; Murat Saraclar

This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, real-time, and low-latency recognition. We present results for small and large vocabulary tasks taken from the AT&T VoiceTone® service, showing a word accuracy improvement of about 5% absolute and a real-time processing speed-up by a factor of 2 to 3.
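For readers unfamiliar with HMM decoding, the sketch below shows a minimal Viterbi search, the dynamic program at the core of recognizers of this kind. Sizes and parameters are toy assumptions; the real system layers finite-state language models and heavy search optimizations on top.

    import numpy as np

    def viterbi(log_trans, log_emit):
        """log_trans: (S, S) log transition matrix; log_emit: (T, S)
        per-frame log emission scores. Returns the best state sequence,
        assuming a uniform initial state distribution."""
        T, S = log_emit.shape
        delta = log_emit[0].copy()           # best score ending in each state
        back = np.zeros((T, S), dtype=int)   # backpointers
        for t in range(1, T):
            scores = delta[:, None] + log_trans        # (prev, cur) pairs
            back[t] = np.argmax(scores, axis=0)
            delta = scores[back[t], np.arange(S)] + log_emit[t]
        path = [int(np.argmax(delta))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    # Toy usage: 3 states, 10 frames, uniform transitions.
    rng = np.random.default_rng(0)
    best = viterbi(np.log(np.full((3, 3), 1 / 3)), rng.standard_normal((10, 3)))
    print(best)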


IEEE Transactions on Speech and Audio Processing | 2000

Maximum likelihood and minimum classification error factor analysis for automatic speech recognition

Lawrence K. Saul; Mazin G. Rahim

Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is nonstationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters can be chosen in two ways: (1) to maximize the likelihood of observed speech signals, or (2) to minimize the number of classification errors. We derive an expectation-maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one small-sized vocabulary (connected alpha-digits) and one medium-sized vocabulary (New Jersey town names). We find that modeling feature correlations by factor analysis leads to significantly increased likelihoods and word accuracies. Moreover, the rate of improvement with model size often exceeds that observed in conventional HMMs.
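To make the parameter economy concrete, here is a small Python sketch (all numbers assumed, not from the paper) of a Gaussian whose covariance has the factor-analysis form Lambda Lambda^T + Psi:

    import numpy as np

    # D-dimensional Gaussian with covariance Lambda Lambda^T + Psi, where
    # Lambda is a D x F loading matrix (F << D) and Psi is diagonal.
    D, F = 39, 4
    rng = np.random.default_rng(1)
    Lambda = 0.1 * rng.standard_normal((D, F))   # factor loadings
    Psi = np.ones(D)                             # diagonal noise variances
    cov = Lambda @ Lambda.T + np.diag(Psi)

    # A full covariance would need D*(D+1)/2 = 780 parameters;
    # factor analysis uses only D*F + D = 195.
    x, mean = rng.standard_normal(D), np.zeros(D)
    _, logdet = np.linalg.slogdet(cov)
    ll = -0.5 * (D * np.log(2 * np.pi) + logdet
                 + (x - mean) @ np.linalg.solve(cov, x - mean))
    print("log-likelihood under the FA Gaussian:", ll)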


international conference on acoustics, speech, and signal processing | 2004

Unsupervised and active learning in automatic speech recognition for call classification

Dilek Hakkani-Tür; Gokhan Tur; Mazin G. Rahim; Giuseppe Riccardi

A key challenge in rapidly building spoken natural language dialog applications is minimizing the manual effort required in transcribing and labeling speech data. This task is not only expensive but also time consuming. We present a novel approach that aims at reducing the amount of manually transcribed in-domain data required for building automatic speech recognition (ASR) models in spoken language dialog systems. Our method is based on mining relevant text from various conversational systems and Web sites. An iterative process is employed in which the performance of the models can be improved through both unsupervised and active learning of the ASR models. We have evaluated the robustness of our approach on a call classification task selected from AT&T VoiceTone℠ customer care. Our results indicate that with unsupervised learning it is possible to achieve a call classification performance only 1.5% lower than the upper bound obtained using all available in-domain transcribed data.
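A schematic of one loop iteration may help. The confidences below are simulated stand-ins for recognizer scores, and the budget and threshold are assumptions:

    import random

    random.seed(0)
    unlabeled = [f"utt{i:03d}" for i in range(100)]
    conf = {u: random.random() for u in unlabeled}   # simulated ASR confidences

    BUDGET, THRESH = 10, 0.9
    ranked = sorted(unlabeled, key=conf.get)         # least confident first

    # Active learning: the least-confident utterances go to human labelers.
    to_transcribe = ranked[:BUDGET]

    # Unsupervised learning: confident recognizer output is reused as-is.
    auto_labeled = [u for u in ranked[BUDGET:] if conf[u] >= THRESH]

    print(f"{len(to_transcribe)} utterances sent for manual transcription, "
          f"{len(auto_labeled)} reused with machine transcripts")
    # A real loop would retrain the ASR models on both sets and iterate.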


Journal of the Acoustical Society of America | 1993

On the use of neural networks in articulatory speech synthesis

Mazin G. Rahim; Colin C. Goodyear; W. Bastiaan Kleijn; Juergen Schroeter; Man Mohan Sondhi

A long‐standing problem in the analysis and synthesis of speech by articulatory description is the estimation of the vocal tract shape parameters from natural input speech. Methods to relate spectral parameters to articulatory positions are feasible if a sufficiently large amount of data is available. This, however, results in a high computational load and large memory requirements. Further, one needs to accommodate ambiguities in this mapping due to the nonuniqueness problem (i.e., several vocal tract shapes can result in identical spectral envelopes). This paper describes the use of artificial neural networks for acoustic to articulatory parameter mapping. Experimental results show that a single feed‐forward neural net is unable to perform this mapping sufficiently well when trained on a large data set. An alternative procedure is proposed, based on an assembly of neural networks. Each network is dedicated to a specific region in the articulatory space, and performs a mapping from cepstral values into ...
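The sketch below illustrates the "assembly of networks" structure with random placeholder weights (assumptions, not trained models): a nearest-centroid gate routes each cepstral frame to the expert network for its region, each expert being a small net with a single hidden layer (also an assumption here).

    import numpy as np

    rng = np.random.default_rng(2)
    N_EXPERTS, CEP_DIM, ART_DIM, HID = 4, 12, 10, 16

    experts = [
        (rng.standard_normal((CEP_DIM, HID)) * 0.1,
         rng.standard_normal((HID, ART_DIM)) * 0.1)
        for _ in range(N_EXPERTS)
    ]
    centroids = rng.standard_normal((N_EXPERTS, CEP_DIM))  # region prototypes

    def map_frame(cepstra):
        """Route a cepstral frame to its region's expert network."""
        region = int(np.argmin(np.linalg.norm(centroids - cepstra, axis=1)))
        w1, w2 = experts[region]
        hidden = np.tanh(cepstra @ w1)   # one hidden layer per expert
        return hidden @ w2               # articulatory parameter estimates

    print(map_frame(rng.standard_normal(CEP_DIM)))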


ieee automatic speech recognition and understanding workshop | 2003

WebTalk: mining Websites for automatically building dialog systems

Junlan Feng; Srinivas Bangalore; Mazin G. Rahim

The task of creating customized spoken dialog applications has traditionally been known to be expensive, requiring significant resources and a certain level of expertise. This is clearly an obstacle in porting and scaling dialog systems, especially those required for customer care and help desk applications. This paper describes WebTalk - a technique for automatically creating spoken and text-based customer-care dialog applications based solely on a company's Website. Our goal is to create task-oriented dialog services by automatically learning the task knowledge and mining information present on corporate Websites. In this paper, we discuss the motivation and the feasibility of creating such a technique and present an overview of the main components of WebTalk. We address some of the challenges and present methods for evaluating such a system.


international conference on acoustics, speech, and signal processing | 2002

Combining prior knowledge and boosting for call classification in spoken language dialogue

Marie Rochery; Robert E. Schapire; Mazin G. Rahim; Narendra K. Gupta; Giuseppe Riccardi; Srinivas Bangalore; Hiyan Alshawi; Shona Douglas

Data collection and annotation are major bottlenecks in rapid development of accurate syntactic and semantic models for natural-language dialogue systems. In this paper we show how human knowledge can be used when designing a language understanding system in a manner that would alleviate the dependence on large sets of data. In particular, we extend BoosTexter, a member of the boosting family of algorithms, to combine and balance hand-crafted rules with the statistics of available data. Experiments on two voice-enabled applications for customer care and help desk are presented.
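One simple way to realize this combination, sketched below, is to seed a boosting ensemble with the hand-crafted rule as its first weak hypothesis, so that later data-driven rounds concentrate on the rule's mistakes. This is an illustration of the general idea, not the paper's exact extension of BoosTexter, and the data is synthetic.

    import numpy as np

    rng = np.random.default_rng(3)
    # Fake call-classification data: feature = 1 if the utterance contains
    # the word "bill" (an assumption), label +1 = billing call.
    x = rng.integers(0, 2, size=200)
    y = np.where(x == 1, 1, -1)
    y[rng.random(200) < 0.1] *= -1           # about 10% label noise

    def rule(xi):                            # hand-crafted rule
        return 1 if xi == 1 else -1

    # Round 0: the rule acts as the first weak hypothesis.
    h0 = np.array([rule(xi) for xi in x])
    err = np.mean(h0 != y)
    alpha0 = 0.5 * np.log((1 - err) / err)   # confidence weight for the rule

    # Boosting reweights examples the rule misclassified, so later
    # data-driven hypotheses focus on its mistakes.
    w = np.exp(-alpha0 * y * h0)
    w /= w.sum()
    print(f"rule error {err:.2f}, weight {alpha0:.2f},"
          f" weight mass on the rule's mistakes {w[h0 != y].sum():.2f}")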


Journal of the Acoustical Society of America | 1995

Artificial Neural Networks for Speech Analysis/synthesis

Mazin G. Rahim; John C. Burgess

Speech production and synthesis models. Artificial neural networks (ANNs). ANNs for speech analysis. An articulatory speech synthesizer. ANNs in articulatory speech synthesis: part one. ANNs in articulatory speech synthesis: part two. Training of ANNs on real speech.


Computer Speech & Language | 1999

Integrated bias removal techniques for robust speech recognition

Craig T. Lawrence; Mazin G. Rahim

In this paper, we present a family of maximum likelihood (ML) techniques that aim at reducing an acoustic mismatch between the training and testing conditions of hidden Markov model (HMM)-based automatic speech recognition (ASR) systems. Our study is conducted in two phases. In the first phase, we evaluate two classes of robustness techniques: those that represent the acoustic mismatch for the entire utterance as a single additive bias and those that represent the mismatch as a non-stationary bias. In the second phase, we propose a codebook-based stochastic matching (CBSM) approach for bias removal both at the feature level and at the model level. CBSM associates each bias with an ensemble of HMM mixture components that share similar acoustic characteristics. It is integrated with hierarchical signal bias removal and further extended to account for n-best candidates. Experimental results on connected digits, recorded over a cellular network, show that incorporating bias removal reduces the word and string error rates by about 12% and 16%, respectively, when using a global bias, and by 36% and 31%, respectively, when using a non-stationary bias.
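As a minimal sketch of the global-bias case (synthetic data, and a zero-mean reference model assumed), the maximum likelihood estimate of a single additive cepstral bias reduces to a mean offset, much like cepstral mean normalization:

    import numpy as np

    rng = np.random.default_rng(4)
    clean = rng.standard_normal((200, 13))   # "training-condition" cepstra
    true_bias = np.full(13, 0.7)             # channel mismatch
    observed = clean + true_bias

    # ML bias estimate w.r.t. a zero-mean reference model: the sample mean.
    bias_hat = observed.mean(axis=0)
    compensated = observed - bias_hat

    print("estimated bias (first 3 dims):", np.round(bias_hat[:3], 2))
    # CBSM generalizes this by tying separate biases to codebook classes
    # of HMM mixture components instead of one global offset.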


Computer Speech & Language | 1997

String-based minimum verification error (SB-MVE) training for speech recognition

Mazin G. Rahim; Chin-Hui Lee

In recent years, we have experienced an increasing demand for speech recognition technology to be utilized in various real-world applications, such as name dialling, message retrieval, etc. During this process, we have learned that the performance of speech recognition systems under laboratory conditions cannot be duplicated in the actual service. Two major causes have been identified for this problem. The first is the lack of robustness when the acoustic conditions in testing are different from those in training. The second is the lack of flexibility when handling spontaneous speech input, which often contains extraneous speech in addition to the desired speech segments of key phrases. This paper focuses on one aspect of achieving flexible speech recognition, namely, improving the ability to cope with naturally spoken utterances through discriminative utterance verification. We propose an algorithm for training utterance verification systems based on the minimum verification error (MVE) training framework. Experimental results on speaker-independent telephone-based connected digits show a significant improvement in verification accuracy when the discriminant function used in MVE training is made consistent with the confidence measure used in utterance verification. At a 10% rejection rate, for example, the new proposed method reduces the string error rate by a further 22.7% over our previously reported results in which MVE-based discriminative training was not incorporated.
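The core of minimum-error training of this kind is a smoothed, differentiable error count. The sketch below (the toy misverification scores and gamma are assumptions) shows a sigmoid loss and its gradient, the quantities a generalized probabilistic descent update would use:

    import numpy as np

    def sigmoid(d, gamma=1.0):
        return 1.0 / (1.0 + np.exp(-gamma * d))

    # d > 0 means the confidence measure favors the wrong decision;
    # the sigmoid of d approximates a (differentiable) error count.
    d = np.array([-2.0, -0.3, 0.4, 1.5])   # toy misverification measures
    loss = sigmoid(d).sum()

    # Gradient w.r.t. d, the signal used in descent-style updates.
    grad = sigmoid(d) * (1 - sigmoid(d))
    print(f"smoothed error {loss:.2f}, per-token gradients {np.round(grad, 3)}")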
