Ossama Emam
IBM
Publication
Featured research published by Ossama Emam.
Meeting of the Association for Computational Linguistics | 2003
Young-Suk Lee; Kishore Papineni; Salim Roukos; Ossama Emam; Hany Hassan
We approximate Arabic's rich morphology with a model in which a word consists of a sequence of morphemes in the pattern prefix*-stem-suffix* (* denotes zero or more occurrences of a morpheme). Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm that builds the Arabic word segmenter from a large unsegmented Arabic corpus. The algorithm uses a trigram language model to determine the most probable morpheme sequence for a given input. The language model is initially estimated from a small manually segmented corpus of about 110,000 words. To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. We believe this is state-of-the-art performance and that the algorithm can be used for many highly inflected languages, provided that one can create a small manually segmented corpus of the language of interest.
Meeting of the Association for Computational Linguistics | 2007
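The segmentation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transliterated affix lists, the toy corpus, the `#` and `+` affix markers, the word-internal scoring with boundary symbols, and the add-k smoothing are all assumptions made for illustration. The sketch enumerates every prefix*-stem-suffix* split of a word and keeps the one that a small trigram morpheme model scores highest.

```python
import math
from collections import Counter

# Hypothetical transliterated affix inventories (not taken from the paper).
PREFIXES = {"w", "al", "b"}
SUFFIXES = {"h", "hm", "at"}

def suffix_splits(rest, suffixes):
    """All stem-suffix* splits of `rest`; the stem is never empty."""
    splits = [[rest]]
    for s in suffixes:
        if rest.endswith(s) and len(rest) > len(s):
            for head in suffix_splits(rest[:-len(s)], suffixes):
                splits.append(head + ["+" + s])
    return splits

def candidate_segmentations(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """All prefix*-stem-suffix* splits of `word`."""
    candidates = suffix_splits(word, suffixes)
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            for tail in candidate_segmentations(word[len(p):], prefixes, suffixes):
                candidates.append([p + "#"] + tail)
    return candidates

class TrigramLM:
    """Trigram morpheme model with add-k smoothing (a crude assumption)."""

    def __init__(self, segmented_words, k=0.1):
        self.k = k
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = {"</w>"}
        for morphemes in segmented_words:
            seq = ["<w>", "<w>"] + list(morphemes) + ["</w>"]
            self.vocab.update(morphemes)
            for i in range(2, len(seq)):
                self.tri[tuple(seq[i - 2:i + 1])] += 1
                self.bi[tuple(seq[i - 2:i])] += 1

    def logprob(self, morphemes):
        seq = ["<w>", "<w>"] + list(morphemes) + ["</w>"]
        v = len(self.vocab) + 1  # one extra slot for unseen morphemes
        total = 0.0
        for i in range(2, len(seq)):
            num = self.tri[tuple(seq[i - 2:i + 1])] + self.k
            den = self.bi[tuple(seq[i - 2:i])] + self.k * v
            total += math.log(num / den)
        return total

def segment(word, lm):
    """Pick the candidate segmentation with the highest trigram score."""
    return max(candidate_segmentations(word), key=lm.logprob)

# Toy manually segmented seed corpus (hypothetical data).
TOY_CORPUS = [
    ["w#", "ktab"],
    ["al#", "ktab"],
    ["ktab", "+h"],
    ["w#", "al#", "ktab", "+hm"],
    ["walad"],
]

if __name__ == "__main__":
    lm = TrigramLM(TOY_CORPUS)
    print(segment("wktabh", lm))
```

In this sketch the stem vocabulary is fixed by the seed corpus; the stem acquisition and re-estimation loop that the abstract describes is not modeled.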
Walid Magdy; Kareem Darwish; Ossama Emam; Hany Hassan
This paper presents a machine learning approach based on an SVM classifier coupled with preprocessing rules for cross-document named entity normalization. The classifier uses lexical, orthographic, phonetic, and morphological features. The process involves disambiguating different entities with shared name mentions and normalizing identical entities with different name mentions. In evaluating the quality of the clusters, the reported approach achieves a cluster F-measure of 0.93. The approach is significantly better than the two baseline approaches in which none of the entities are normalized or entities with exact name mentions are normalized. The two baseline approaches achieve cluster F-measures of 0.62 and 0.74 respectively. The classifier properly normalizes the vast majority of entities that are misnormalized by the baseline system.
Meeting of the Association for Computational Linguistics | 2005
Kareem Darwish; Hany Hassan; Ossama Emam
This paper explores the effect of improved morphological analysis, particularly context sensitive morphology, on monolingual Arabic Information Retrieval (IR). It also compares the effect of context sensitive morphology to non-context sensitive morphology. The results show that better coverage and improved correctness have a dramatic effect on IR effectiveness and that context sensitive morphology further improves retrieval effectiveness, but the improvement is not statistically significant. Furthermore, the improvement obtained by the use of context sensitive morphology over the use of light stemming was not statistically significant.
Meeting of the Association for Computational Linguistics | 2007
Amgad Madkour; Kareem Darwish; Hany Hassan; Ahmed Hassan; Ossama Emam
The vast number of published medical documents is considered a vital source for relationship discovery. This paper presents a statistical unsupervised system, called BioNoculars, for extracting protein-protein interactions from biomedical text. BioNoculars uses graph-based mutual reinforcement to make use of redundancy in data to construct extraction patterns in a domain independent fashion. The system was tested using MEDLINE abstracts for which the protein-protein interactions that they contain are listed in the database of interacting proteins and protein-protein interactions (DIPPPI). The system reports an F-measure of 0.55 on test MEDLINE abstracts.
International Conference on Acoustics, Speech, and Signal Processing | 2004
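The abstract does not spell out how graph-based mutual reinforcement is computed, so the following is only a sketch under stated assumptions: a HITS-style iteration on a bipartite graph in which extraction patterns and candidate protein pairs reinforce each other's scores. The function name, the edge representation, and the example patterns and pairs are hypothetical and are not taken from BioNoculars.

```python
import math

def mutual_reinforcement(edges, iterations=20):
    """HITS-style scoring on a bipartite pattern/pair graph (assumed scheme).

    `edges` is a list of (pattern, pair) links, meaning the pattern matched
    text that yielded the pair. A pattern scores high when it extracts
    high-scoring pairs, and a pair scores high when it is extracted by
    high-scoring patterns.
    """
    patterns = {p for p, _ in edges}
    pairs = {t for _, t in edges}
    p_score = dict.fromkeys(patterns, 1.0)
    t_score = dict.fromkeys(pairs, 1.0)
    for _ in range(iterations):
        # Pairs collect weight from the patterns that extracted them.
        new_t = dict.fromkeys(pairs, 0.0)
        for p, t in edges:
            new_t[t] += p_score[p]
        # Patterns collect weight from the pairs they extracted.
        new_p = dict.fromkeys(patterns, 0.0)
        for p, t in edges:
            new_p[p] += new_t[t]
        # Normalize to unit length so the scores stay bounded.
        t_norm = math.sqrt(sum(v * v for v in new_t.values()))
        p_norm = math.sqrt(sum(v * v for v in new_p.values()))
        t_score = {t: v / t_norm for t, v in new_t.items()}
        p_score = {p: v / p_norm for p, v in new_p.items()}
    return p_score, t_score

# Hypothetical pattern/pair links for illustration only.
EXAMPLE_EDGES = [
    ("X interacts with Y", ("A", "B")),
    ("X interacts with Y", ("C", "D")),
    ("X interacts with Y", ("E", "F")),
    ("X binds Y", ("A", "B")),
    ("X binds Y", ("C", "D")),
    ("X and Y", ("G", "H")),
]

if __name__ == "__main__":
    pattern_scores, pair_scores = mutual_reinforcement(EXAMPLE_EDGES)
    for pattern, score in sorted(pattern_scores.items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {pattern}")
```

In this toy graph, the redundancy the abstract mentions shows up as pairs extracted by more than one pattern: patterns that share pairs reinforce one another, while a pattern whose only pair is seen nowhere else drifts toward a score of zero.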
Siegfried Kunzmann; Volker Fischer; Jorge Gonzalez; Ossama Emam; Carsten Günther; Eric Janke
In this paper, we review the design of a common phone alphabet for up to fifteen languages and describe its application in two important components of a seamless multilingual conversational system, namely speech recognition and synthesis. We report on experiments that demonstrate the advantages of multilingual acoustic models both for the recognition of foreign names and non-native speech, and describe the usefulness of a common phone alphabet for the construction of unit selection based mono- and bilingual speech synthesis systems.
Archive | 2004
Ossama Emam; Volker Fischer
Archive | 2007
Ossama Emam; Dimitri Kanevsky; Alexander Zlatsin
Empirical Methods in Natural Language Processing | 2006
Hany Hassan; Ahmed Hassan; Ossama Emam
Archive | 2006
Ossama Emam; Hany Hassan; Amr F. Yassin
Journal of the Acoustical Society of America | 2002
Ossama Emam; Siegfried Kunzmann