Ossama Emam | Researchain

Publication

Featured research published by Ossama Emam.

meeting of the association for computational linguistics | 2003

Language Model Based Arabic Word Segmentation

Young-Suk Lee; Kishore Papineni; Salim Roukos; Ossama Emam; Hany Hassan

We approximate Arabic's rich morphology by a model in which a word consists of a sequence of morphemes in the pattern prefix*-stem-suffix* (* denotes zero or more occurrences of a morpheme). Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. The algorithm uses a trigram language model to determine the most probable morpheme sequence for a given input. The language model is initially estimated from a small manually segmented corpus of about 110,000 words. To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. The resulting Arabic word segmentation system achieves around 97% exact match accuracy on a test corpus containing 28,449 word tokens. We believe this is state-of-the-art performance and that the algorithm can be used for many highly inflected languages, provided that one can create a small manually segmented corpus of the language of interest.
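As a rough illustration of the decoding idea in this abstract, the sketch below enumerates every prefix*-stem-suffix* split of a word and keeps the one that a small interpolated trigram model scores highest. Everything specific here is an assumption made for the example, not taken from the paper: the toy affix lists, the four-word seed corpus, the `#` and `+` affix markers, the interpolation weights, the floor probability for unseen morphemes, and all function and class names. It also scores each word in isolation, which may differ from how the published system handles context and smoothing.

```python
from collections import Counter
from math import log

# Hypothetical toy affix inventories (Latin transliteration), for illustration only.
PREFIXES = {"w", "Al", "b", "l"}
SUFFIXES = {"At", "h", "hA", "wn"}

# Hypothetical manually segmented seed corpus: one morpheme tuple per word.
SEED_CORPUS = [
    ("w#", "ktb"),
    ("Al#", "ktAb"),
    ("ktAb", "+h"),
    ("w#", "Al#", "ktAb"),
]

def segmentations(word, max_prefixes=2, max_suffixes=2):
    """Enumerate all splits of `word` matching prefix*-stem-suffix* (non-empty stem)."""
    def strip_prefixes(rest, taken):
        yield taken, rest
        if len(taken) < max_prefixes:
            for p in PREFIXES:
                if rest.startswith(p) and len(rest) > len(p):
                    yield from strip_prefixes(rest[len(p):], taken + [p + "#"])

    def strip_suffixes(rest, taken):
        yield rest, taken
        if len(taken) < max_suffixes:
            for s in SUFFIXES:
                if rest.endswith(s) and len(rest) > len(s):
                    yield from strip_suffixes(rest[:-len(s)], ["+" + s] + taken)

    results = []
    for prefixes, rest in strip_prefixes(word, []):
        for stem, suffixes in strip_suffixes(rest, []):
            results.append(tuple(prefixes + [stem] + suffixes))
    return results

class TrigramLM:
    """Interpolated trigram/bigram/unigram model over morphemes (toy smoothing)."""

    def __init__(self, segmented_words, weights=(0.6, 0.3, 0.1), floor=1e-6):
        self.weights, self.floor = weights, floor
        self.tri, self.ctx2 = Counter(), Counter()
        self.bi, self.ctx1 = Counter(), Counter()
        self.uni = Counter()
        for morphs in segmented_words:
            seq = ["<w>", "<w>"] + list(morphs) + ["</w>"]
            for i in range(2, len(seq)):
                a, b, m = seq[i - 2], seq[i - 1], seq[i]
                self.tri[(a, b, m)] += 1
                self.ctx2[(a, b)] += 1
                self.bi[(b, m)] += 1
                self.ctx1[b] += 1
                self.uni[m] += 1
        self.total = sum(self.uni.values())

    def prob(self, a, b, m):
        """P(m | a, b) as a weighted mix of trigram, bigram, and unigram estimates."""
        p3 = self.tri[(a, b, m)] / self.ctx2[(a, b)] if self.ctx2[(a, b)] else 0.0
        p2 = self.bi[(b, m)] / self.ctx1[b] if self.ctx1[b] else 0.0
        p1 = self.uni[m] / self.total if self.uni[m] else self.floor
        w3, w2, w1 = self.weights
        return w3 * p3 + w2 * p2 + w1 * p1

    def logprob(self, morphs):
        seq = ["<w>", "<w>"] + list(morphs) + ["</w>"]
        return sum(log(self.prob(seq[i - 2], seq[i - 1], seq[i]))
                   for i in range(2, len(seq)))

def segment(word, lm):
    """Return the candidate segmentation with the highest model score."""
    return max(segmentations(word), key=lm.logprob)
```

With this seed corpus, `segment("wktAbh", TrigramLM(SEED_CORPUS))` prefers the split `("w#", "ktAb", "+h")` because the unsplit alternatives contain morphemes the model has never seen and so receive only the floor probability. The stem-acquisition and re-estimation loop described in the abstract is not shown.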

meeting of the association for computational linguistics | 2007

Arabic Cross-Document Person Name Normalization

Walid Magdy; Kareem Darwish; Ossama Emam; Hany Hassan

This paper presents a machine learning approach based on an SVM classifier coupled with preprocessing rules for cross-document named entity normalization. The classifier uses lexical, orthographic, phonetic, and morphological features. The process involves disambiguating different entities with shared name mentions and normalizing identical entities with different name mentions. In evaluating the quality of the clusters, the reported approach achieves a cluster F-measure of 0.93. The approach is significantly better than the two baseline approaches in which none of the entities are normalized or entities with exact name mentions are normalized. The two baseline approaches achieve cluster F-measures of 0.62 and 0.74 respectively. The classifier properly normalizes the vast majority of entities that are misnormalized by the baseline system.

meeting of the association for computational linguistics | 2005

Examining the Effect of Improved Context Sensitive Morphology on Arabic Information Retrieval

Kareem Darwish; Hany Hassan; Ossama Emam

This paper explores the effect of improved morphological analysis, particularly context sensitive morphology, on monolingual Arabic Information Retrieval (IR). It also compares the effect of context sensitive morphology to non-context sensitive morphology. The results show that better coverage and improved correctness have a dramatic effect on IR effectiveness and that context sensitive morphology further improves retrieval effectiveness, but the improvement is not statistically significant. Furthermore, the improvement obtained by the use of context sensitive morphology over the use of light stemming was not statistically significant.

meeting of the association for computational linguistics | 2007

BioNoculars: Extracting Protein-Protein Interactions from Biomedical Text

Amgad Madkour; Kareem Darwish; Hany Hassan; Ahmed Hassan; Ossama Emam

The vast number of published medical documents is considered a vital source for relationship discovery. This paper presents a statistical unsupervised system, called BioNoculars, for extracting protein-protein interactions from biomedical text. BioNoculars uses graph-based mutual reinforcement to make use of redundancy in data to construct extraction patterns in a domain independent fashion. The system was tested using MEDLINE abstracts for which the protein-protein interactions that they contain are listed in the database of interacting proteins and protein-protein interactions (DIPPPI). The system reports an F-Measure of 0.55 on test MEDLINE abstracts.
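The abstract does not spell out how the graph-based mutual reinforcement is computed. One common way to realize the idea is a HITS-style iteration over a bipartite graph of extraction patterns and candidate protein pairs, where a pattern scores highly if it matches highly scored pairs and vice versa. The sketch below shows that general scheme only; the function name, the edge format, the sum-to-one normalization, and the toy patterns are all assumptions for illustration and should not be read as the BioNoculars implementation.

```python
def mutual_reinforcement(edges, iterations=20):
    """HITS-style scoring on a bipartite pattern/pair graph.

    `edges` is a list of (pattern, pair, weight) triples, where weight is how
    often the pattern matched that pair. Returns (pattern_scores, pair_scores),
    each normalized to sum to 1.
    """
    patterns = {p for p, _, _ in edges}
    pairs = {t for _, t, _ in edges}
    p_score = {p: 1.0 for p in patterns}
    t_score = {t: 1.0 for t in pairs}

    for _ in range(iterations):
        # A pattern is reinforced by the pairs it extracts.
        new_p = {p: 0.0 for p in patterns}
        for p, t, w in edges:
            new_p[p] += w * t_score[t]
        norm = sum(new_p.values()) or 1.0
        p_score = {p: s / norm for p, s in new_p.items()}

        # A pair is reinforced by the patterns that extract it.
        new_t = {t: 0.0 for t in pairs}
        for p, t, w in edges:
            new_t[t] += w * p_score[p]
        norm = sum(new_t.values()) or 1.0
        t_score = {t: s / norm for t, s in new_t.items()}

    return p_score, t_score


# Hypothetical toy data: two patterns agree on the same pairs, one is isolated.
TOY_EDGES = [
    ("P1 interacts with P2", ("A", "B"), 1),
    ("P1 interacts with P2", ("C", "D"), 1),
    ("P1 binds P2", ("A", "B"), 1),
    ("P1 binds P2", ("C", "D"), 1),
    ("P1 was unlike P2", ("E", "F"), 1),
]
```

On the toy data, the two patterns that redundantly match the same pairs reinforce each other across iterations, while the isolated pattern's score shrinks toward zero. This is the sense in which redundancy in the data helps rank patterns.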

international conference on acoustics, speech, and signal processing | 2004

Multilingual acoustic models for speech recognition and synthesis

Siegfried Kunzmann; Volker Fischer; Jorge Gonzalez; Ossama Emam; Carsten Günther; Eric Janke

In this paper, we review the design of a common phone alphabet for up to fifteen languages and describe its application in two important components of a seamless multilingual conversational system, namely speech recognition and synthesis. We report on experiments that demonstrate the advantages of multilingual acoustic models both for the recognition of foreign names and non-native speech, and describe the usefulness of a common phone alphabet for the construction of unit selection based mono- and bilingual speech synthesis systems.

Archive | 2004