Amr El-Desoky Mousa
RWTH Aachen University
Publications
Featured research published by Amr El-Desoky Mousa.
international conference on acoustics, speech, and signal processing | 2013
Amr El-Desoky Mousa; Hong-Kwang Jeff Kuo; Lidia Mangu; Hagen Soltau
Egyptian Arabic (EA) is a colloquial variety of Arabic. It is a low-resource, morphologically rich language that poses problems for Large Vocabulary Continuous Speech Recognition (LVCSR). Building LMs at the morpheme level is considered a better choice for achieving higher lexical coverage and better LM probabilities. Another approach is to utilize information from additional features such as morphological tags. Meanwhile, LMs based on Neural Networks (NNs) with a single hidden layer have shown superiority over conventional n-gram LMs. Recently, Deep Neural Networks (DNNs) with multiple hidden layers have achieved better performance in various tasks. In this paper, we explore the use of feature-rich DNN-LMs, where the inputs to the network are a mixture of words and morphemes along with their features. Significant Word Error Rate (WER) reductions are achieved compared to traditional word-based LMs.
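The feature-rich input described above can be pictured as concatenating an embedding of each unit (word or morpheme) with an embedding of its morphological tag before feeding the network. A minimal sketch in plain Python follows; the vocabularies, units, tags, and dimensions are all invented for illustration and are not from the paper:

```python
import random

random.seed(0)

# Toy vocabularies (invented for illustration; not from the paper).
unit_vocab = {"al+": 0, "kitab": 1, "+ha": 2}   # words/morphemes
tag_vocab = {"DET": 0, "NOUN": 1, "PRON": 2}    # morphological tags

def embed_table(n_rows, dim):
    """Random embedding table stored as a list of vectors."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(n_rows)]

E_unit = embed_table(len(unit_vocab), 8)  # unit embeddings
E_tag = embed_table(len(tag_vocab), 4)    # tag embeddings

def input_vector(unit, tag):
    """Concatenate unit and tag embeddings into one network input."""
    return E_unit[unit_vocab[unit]] + E_tag[tag_vocab[tag]]

x = input_vector("kitab", "NOUN")
print(len(x))  # combined dimensionality: 8 + 4 = 12
```

In a real DNN-LM these tables would be trained jointly with the hidden layers; the sketch only shows how heterogeneous inputs share one input vector.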
spoken language technology workshop | 2010
Amr El-Desoky Mousa; M. Ali Basha Shaik; Ralf Schlüter; Hermann Ney
One of the major difficulties in German LVCSR is the morphologically rich nature of German, which leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Compound words normally make up an essential fraction of the German vocabulary, and most compound OOVs are composed of frequent in-vocabulary words. Here, we investigate the use of sub-lexical LMs based on different approaches to word decomposition, namely supervised and unsupervised decomposition, as well as decomposition derived from grapheme-to-phoneme (G2P) conversion. In the latter approach, we augment a normal word model with a set of grapheme-phoneme pairs called graphones, used to model the OOV words. A novel approach is proposed to select representative graphone sequences for OOVs based on unsupervised decomposition and word-pronunciation alignment. We obtain relative reductions in word error rate (WER) from 4.2% to 6.5% with respect to a comparable full-words system.
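As a rough illustration of splitting a compound OOV into frequent in-vocabulary words, here is a greedy recursive splitter. The vocabulary and the compound are invented examples, and the supervised/unsupervised decomposition methods in the paper are more sophisticated than this sketch:

```python
def decompose(word, vocab, min_len=3):
    """Split `word` into in-vocabulary parts, trying longer heads first.
    Returns a list of parts, or None if no full decomposition exists."""
    if word in vocab:
        return [word]
    # Try head lengths from longest to shortest, keeping both halves
    # at least `min_len` characters long.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = decompose(tail, vocab, min_len)
            if rest is not None:
                return [head] + rest
    return None

# Invented toy vocabulary of frequent German words.
vocab = {"donau", "dampf", "schiff", "fahrt"}
print(decompose("donaudampfschifffahrt", vocab))
```

The recovered parts would then replace the compound in the LM training text, so the sub-lexical LM sees only in-vocabulary units.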
international conference on acoustics, speech, and signal processing | 2011
M. Ali Basha Shaik; Amr El-Desoky Mousa; Ralf Schlüter; Hermann Ney
Polish is a synthetic language with a high morpheme-per-word ratio. Its high degree of inflection leads to high out-of-vocabulary (OOV) rates and high Language Model (LM) perplexities, which poses a challenge for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme- and syllable-based units is investigated for building sub-lexical LMs. A new type of sub-lexical unit is proposed based on combining morphemic or syllabic units with their corresponding pronunciations. Thereby, a set of grapheme-phoneme pairs called graphones is used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full-words.
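A graphone is a joint grapheme-phoneme unit. The sketch below turns (grapheme chunk, pronunciation) pairs into single tokens of the kind that could populate a sub-lexical LM vocabulary; the segmentation of the Polish word and the phoneme symbols are illustrative assumptions, not taken from the paper:

```python
# Invented graphone segmentation of the Polish word "szkoła" ("school");
# grapheme chunks and phoneme symbols are illustrative only.
graphones = [("szko", "S k O"), ("ła", "w a")]

# Encode each grapheme-phoneme pair as one LM vocabulary token.
unit_tokens = ["{}:{}".format(g, p.replace(" ", "_"))
               for g, p in graphones]
print(unit_tokens)
```

Because each token carries its own pronunciation, the recognizer can score and decode these units like ordinary words while still covering OOV forms.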
document analysis systems | 2014
Mahdi Hamdani; Patrick Doetsch; Michal Kozielski; Amr El-Desoky Mousa; Hermann Ney
This paper describes the RWTH system for large vocabulary Arabic handwriting recognition. The recognizer is based on Hidden Markov Models (HMMs) with state-of-the-art methods for visual/language modeling and decoding. The feature extraction is based on Recurrent Neural Networks (RNNs), which estimate the posterior distribution over the character labels for each observation. Discriminative training using the Minimum Phone Error (MPE) criterion is used to train the HMMs. The recognition is done with the help of n-gram Language Models (LMs) trained on in-domain text data. Unsupervised writer adaptation is also performed using Constrained Maximum Likelihood Linear Regression (CMLLR) feature adaptation. The RWTH Arabic handwriting recognition system gave competitive results in previous handwriting recognition competitions, and the techniques used here improve the performance of the system participating in the OpenHaRT 2013 evaluation.
international conference on document analysis and recognition | 2013
Mahdi Hamdani; Amr El-Desoky Mousa; Hermann Ney
Language Models (LMs) are a very important component of large and open vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approach makes use of Arabic word decomposition based on morphological analysis. The vocabulary is a combination of words and sub-words obtained by the decomposition process, so that Out-Of-Vocabulary (OOV) words can be recognized by combining different elements from the lexicon. The recognition system is based on Hidden Markov Models (HMMs) with position- and context-dependent character models. An n-gram LM trained on the decomposed text is used along with the HMMs during the search. The approach is evaluated on two Arabic handwriting datasets, using two different types of experiments for two Arabic handwriting recognition tasks. The open-vocabulary approach leads to a significant improvement in system performance: an absolute improvement of up to 1% in Word Error Rate (WER) on the constrained task, while maintaining the performance of the baseline system on the unconstrained one.
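Recovering an OOV word by combining lexicon elements amounts to rejoining marked sub-word units in the recognizer output. In the sketch below, a trailing "+" marks a unit that attaches to the next one; this glue-marker convention and the morpheme strings are assumptions for illustration, not necessarily the marking used in the paper:

```python
def rejoin(units):
    """Merge a sequence of sub-word units back into full words.
    A trailing '+' means the unit glues onto the following unit."""
    words, buf = [], ""
    for u in units:
        if u.endswith("+"):      # non-final morpheme: keep gluing
            buf += u[:-1]
        else:                    # final unit: emit the full word
            words.append(buf + u)
            buf = ""
    if buf:                      # dangling fragment: emit as-is
        words.append(buf)
    return words

# Invented recognizer output over decomposed Arabic-like units.
print(rejoin(["al+", "kitab", "wa+", "al+", "qalam"]))
```

Words absent from the full-word lexicon can thus still be produced, as long as their pieces exist in the sub-word vocabulary.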
international conference on acoustics, speech, and signal processing | 2012
Amr El-Desoky Mousa; Ralf Schlüter; Hermann Ney
A major challenge for Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) is the rich morphology of Arabic, which leads to high Out-Of-Vocabulary (OOV) rates and poor Language Model (LM) probabilities. In such cases, the use of morphemes rather than full words is considered a better choice for LMs, as it achieves higher lexical coverage and lower LM perplexities. On the other hand, an effective way to increase the robustness of LMs is to incorporate features of words into LMs. In this paper, we investigate the use of features derived for morphemes rather than words, thus combining the benefits of both morpheme-level and feature-rich modeling. We compare the performance of stream-based, class-based and Factored LMs (FLMs) estimated over sequences of morphemes and their features for Arabic LVCSR. A relative reduction of 3.9% in Word Error Rate (WER) is achieved compared to a word-based system.
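As one hedged illustration of feature-based modeling, a class-based LM over morphemes can factor the bigram probability into a class-transition term times a within-class emission term, p(m_i | m_{i-1}) ≈ p(c(m_i) | c(m_{i-1})) · p(m_i | c(m_i)). The classes below stand in for morphological features, and the morphemes and counts are toy data, not the paper's streams or FLM structure:

```python
from collections import Counter

# Toy morpheme classes and a tiny training sequence (invented).
cls = {"al+": "DET", "kitab": "NOUN", "qalam": "NOUN"}
tokens = ["al+", "kitab", "al+", "qalam"]

uni = Counter(tokens)                                  # morpheme counts
cls_uni = Counter(cls[t] for t in tokens)              # class counts
cls_bi = Counter((cls[a], cls[b])                      # class bigrams
                 for a, b in zip(tokens, tokens[1:]))

def prob(prev, word):
    """Class-based bigram: p(class|prev class) * p(word|class)."""
    c_prev, c = cls[prev], cls[word]
    denom = sum(v for (a, _), v in cls_bi.items() if a == c_prev)
    p_class = cls_bi[(c_prev, c)] / denom
    p_word = uni[word] / cls_uni[c]
    return p_class * p_word

print(prob("al+", "kitab"))  # 1.0 * (1/2) = 0.5
```

Sharing counts at the class level is what makes such models more robust than raw morpheme n-grams when data is sparse.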
conference of the international speech communication association | 2016
Amr El-Desoky Mousa; Björn W. Schuller
Efficient grapheme-to-phoneme (G2P) conversion models are considered indispensable components for achieving state-of-the-art performance in modern automatic speech recognition (ASR) and text-to-speech (TTS) systems. The role of these models is to provide such systems with a means to generate accurate pronunciations for unseen words. Recent work in this domain is based on recurrent neural networks (RNNs) that are capable of translating grapheme sequences into phoneme sequences while taking into account the full context of the graphemes. To achieve high performance with these models, utilizing explicit alignment information is found to be essential, and the quality of the G2P model depends heavily on the imposed alignment constraints. In this paper, a novel approach is proposed that uses complex many-to-many G2P alignments to improve the performance of G2P models based on deep bidirectional long short-term memory (BLSTM) RNNs. Extensive experiments cover models with different numbers of hidden layers, a projection layer, input splicing windows, and varying alignment schemes. We observe that complex alignments significantly improve performance on the publicly available CMUDict US English dataset, and we compare our results with previously published results.
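To show the shape of a many-to-many G2P alignment (not the paper's alignment algorithm), the sketch below runs a small dynamic program over grapheme/phoneme chunks of size 1-2, where chunk pairs listed in a toy table cost 0 and all others cost 1:

```python
def align(graphemes, phonemes, good_pairs, max_chunk=2):
    """Align the two sequences with chunk pairs of size 1..max_chunk;
    pairs in `good_pairs` cost 0, all other pairs cost 1."""
    INF = float("inf")
    n, m = len(graphemes), len(phonemes)
    # best[i][j] = (cost, backpointer) for the prefixes of length i, j.
    best = [[(INF, None)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, None)
    for i in range(n + 1):
        for j in range(m + 1):
            cost, _ = best[i][j]
            if cost == INF:
                continue
            for dg in range(1, max_chunk + 1):
                for dp in range(1, max_chunk + 1):
                    if i + dg > n or j + dp > m:
                        continue
                    pair = (graphemes[i:i + dg], phonemes[j:j + dp])
                    c = cost + (0 if pair in good_pairs else 1)
                    if c < best[i + dg][j + dp][0]:
                        best[i + dg][j + dp] = (c, (i, j, pair))
    # Backtrack the chosen chunk pairs.
    chunks, i, j = [], n, m
    while (i, j) != (0, 0):
        _, (pi, pj, pair) = best[i][j]
        chunks.append(pair)
        i, j = pi, pj
    return list(reversed(chunks))

# Toy alignment of "ship" -> /SH IH P/; the pair table is invented.
pairs = {("sh", ("SH",)), ("i", ("IH",)), ("p", ("P",))}
print(align("ship", ("SH", "IH", "P"), pairs))
```

The resulting chunk pairs, such as the two-letter grapheme "sh" mapping to the single phoneme SH, are exactly the kind of alignment units a neural G2P model can be trained to emit.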
international conference on acoustics, speech, and signal processing | 2015
M. Ali Basha Shaik; Amr El-Desoky Mousa; Stefan Hahn; Ralf Schlüter; Hermann Ney
In this work, multiple hierarchical language modeling strategies for a zero-OOV-rate large vocabulary continuous speech recognition system are investigated. In our previously proposed hierarchical approach, a full-word language model and a context-independent character-level LM (CLM) are used directly during search. The novelty of this work is to jointly model the character-level prior and the pronunciation probabilities, to introduce across-word context into the character-level LM, and to properly normalize the character-level LM using prefix-tree based normalization for the hierarchical approach. Significant reductions in terms of word error rate (WER) on the best full-word Quaero Polish LVCSR system are reported.
conference of the international speech communication association | 2011
M. Ali Basha Shaik; Amr El-Desoky Mousa; Ralf Schlüter; Hermann Ney
conference of the international speech communication association | 2011
Amr El-Desoky Mousa; M. Ali Basha Shaik; Ralf Schlüter; Hermann Ney