Ahmad Emami | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ahmad Emami is active.

Explore More

Publication

Featured researches published by Ahmad Emami.

ieee automatic speech recognition and understanding workshop | 2007

Empirical study of neural network language models for Arabic speech recognition

Ahmad Emami; Lidia Mangu

In this paper we investigate the use of neural network language models for Arabic speech recognition. By using a distributed representation of words, the neural network model allows for more robust generalization and is better able to fight the data sparseness problem. We investigate different configurations of the neural probabilistic model, experimenting with such parameters as N-gram order, output vocabulary, normalization method, and model size and parameters. Experiments were carried out on Arabic broadcast news and broadcast conversations data and the optimized neural network language models showed significant improvements over the baseline N-gram model.

international conference on acoustics, speech, and signal processing | 2007

Large-Scale Distributed Language Modeling

Ahmad Emami; Kishore Papineni; Jeffrey S. Sorensen

A novel distributed language model that has no constraints on the n-gram order and no practical constraints on vocabulary size is presented. This model is scalable and allows for an arbitrarily large corpus to be queried for statistical estimates. Our distributed model is capable of producing n-gram counts on demand. By using a novel heuristic estimate for the interpolation weights of a linearly interpolated model, it is possible to dynamically compute the language model probabilities. The distributed architecture follows the client-server paradigm and allows for each client to request an arbitrary weighted mixture of the corpus. This allows easy adaptation of the language model to particular test conditions. Experiments using the distributed LM for re-ranking N-best lists of a speech recognition system resulted in considerable improvements in word error rate (WER), while integration with a machine translation decoder resulted in significant improvements in translation quality as measured by the BLEU score.

IEEE Transactions on Audio, Speech, and Language Processing | 2009

Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program

Hagen Soltau; George Saon; Brian Kingsbury; Hong-Kwang Jeff Kuo; Lidia Mangu; Daniel Povey; Ahmad Emami

This paper describes the Arabic broadcast transcription system fielded by IBM in the GALE Phase 2.5 machine translation evaluation. Key advances include the use of additional training data from the Linguistic Data Consortium (LDC), use of a very large vocabulary comprising 737 K words and 2.5 M pronunciation variants, automatic vowelization using flat-start training, cross-adaptation between unvowelized and vowelized acoustic models, and rescoring with a neural-network language model. The resulting system achieves word error rates below 10% on Arabic broadcasts. Very large scale experiments with unsupervised training demonstrate that the utility of unsupervised data depends on the amount of supervised data available. While unsupervised training improves system performance when a limited amount (135 h) of supervised data is available, these gains disappear when a greater amount (848 h) of supervised data is used, even with a very large (7069 h) corpus of unsupervised data. We also describe a method for modeling Arabic dialects that avoids the problem of data sparseness entailed by dialect-specific acoustic models via the use of non-phonetic, dialect questions in the decision trees. We show how this method can be used with a statically compiled decoding graph by partitioning the decision trees into a static component and a dynamic component, with the dynamic component being replaced by a mapping that is evaluated at run-time.

ieee automatic speech recognition and understanding workshop | 2009

Syntactic features for Arabic speech recognition

Hong-Kwang Jeff Kuo; Lidia Mangu; Ahmad Emami; Imed Zitouni; Young-Suk Lee

We report word error rate improvements with syntactic features using a neural probabilistic language model through N-best re-scoring. The syntactic features we use include exposed head words and their non-terminal labels both before and after the predicted word. Neural network LMs generalize better to unseen events by modeling words and other context features in continuous space. They are suitable for incorporating many different types of features, including syntactic features, where there is no pre-defined back-off order. We choose an N-best re-scoring framework to be able to take full advantage of the complete parse tree of the entire sentence. Using syntactic features, along with morphological features, improves the word error rate (WER) by up to 5.5% relative, from 9.4% to 8.6%, on the latest GALE evaluation test set.

international conference on acoustics, speech, and signal processing | 2011

Multi-class Model M

Ahmad Emami; Stanley F. Chen

Model M, a novel class-based exponential language model, has been shown to significantly outperform word n-gram models in state-of-the-art machine translation and speech recognition systems. The model was motivated by the observation that shrinking the sum of the parameter magnitudes in an exponential language model leads to better performance on unseen data. Being a class-based language model, Model M makes use of word classes that are found automatically from training data. In this paper, we extend Model M to allow for different clusterings to be used at different word positions. This is motivated by the fact that words play different roles depending on their position in an n-gram. Experiments on standard NIST and GALE Arabic-to-English development and test sets show improvements in machine translation quality as measured by automatic evaluation metrics.

empirical methods in natural language processing | 2008

When Harry Met Harri: Cross-lingual Name Spelling Normalization

Fei Huang; Ahmad Emami; Imed Zitouni

Foreign name translations typically include multiple spelling variants. These variants cause data sparseness problems, increase Out-of-Vocabulary (OOV) rate, and present challenges for machine translation, information extraction and other NLP tasks. This paper aims to identify name spelling variants in the target language using the source name as an anchor. Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, target name translations with similar spellings are clustered. With this approach tens of thousands of high precision name translation spelling variants are extracted from sentence-aligned bilingual corpora. When these name spelling variants are applied to Machine Translation and Information Extraction tasks, improvements over strong baseline systems are observed in both cases.

international conference on acoustics, speech, and signal processing | 2010

Continuous space language modeling techniques

Ruhi Sarikaya; Ahmad Emami; Mohamed Afify; Bhuvana Ramabhadran

This paper focuses on comparison of two continuous space language modeling techniques, namely Tied-Mixture Language modeling (TMLM) and Neural Network Based Language Modeling (NNLM). Additionally, we report on using alternative feature representations for words and histories used in TMLM. Besides bigram co-occurrence based features we consider using NNLM based input features for training TMLMs. We also describe how we improve certain steps in building TMLMs. We demonstrate that TMLMs provide significant improvements of over 16% relative and 10% relative in Character Error Rate (CER) for Mandarin speech recognition, over the trigram and NNLM models, respectively in a speech to speech translation task.

conference of the international speech communication association | 2014