Publication


Featured research published by Mohsen A. Rashwan.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

A Stochastic Arabic Diacritizer Based on a Hybrid of Factorized and Unfactorized Textual Features

Mohsen A. Rashwan; Mohamed Al-Badrashiny; Mohamed Attia; Sherif M. Abdou; Ahmed Rafea

This paper introduces a large-scale, dual-mode stochastic system that automatically diacritizes raw Arabic text. The first mode determines the most likely diacritics by choosing the sequence of full-form Arabic word diacritizations with maximum marginal probability, using A* lattice search and long-horizon n-gram probability estimation. When full-form words are out of vocabulary (OOV), the system switches to the second mode. This mode factorizes each Arabic word into all of its possible morphological constituents and then applies the same techniques as the first mode to find the most likely sequence of morphemes, and hence the most likely diacritization. The second mode achieves far better coverage of the highly derivative and inflective Arabic language. The first mode, however, is faster to learn, i.e., it yields better disambiguation results for the same size of training corpus, especially when inferring syntactic (case-ending) diacritics. The presented hybrid system, which benefits from the advantages of both modes, was experimentally found to be superior to the best-performing reported systems of Habash and Rambow and of Zitouni, using the same training and test corpus for a fair comparison. The word error rates (morphological diacritization; overall diacritization including case endings) of the three systems are, respectively, (3.1%, 12.5%), (5.5%, 14.9%), and (7.9%, 18%). The hybrid architecture of factorizing and unfactorizing language components may inspire solutions to other NLP/HLT problems in analogous situations.
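The following Python sketch roughly illustrates the dual-mode idea. It decodes a short sentence with a bigram model over full-form diacritizations and falls back to a factorized candidate generator for OOV words. It is a toy built on assumptions, not the authors' system. It replaces the paper's A* search and long-horizon n-grams with Viterbi search over a bigram lattice. The lexicon, the probabilities, and the `factorized_candidates` helper are all hypothetical, and words are written in Buckwalter transliteration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (Buckwalter transliteration):
# undiacritized full form -> candidate full-form diacritizations (mode 1).
FULL_FORM: Dict[str, List[str]] = {
    "ktb": ["kataba", "kutiba", "kutubu"],
    "Alwld": ["Alwaladu", "Alwalada"],
}

# Hypothetical bigram log-probabilities over diacritized forms; "<s>" = sentence start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "kataba"): math.log(0.5),
    ("<s>", "kutiba"): math.log(0.3),
    ("<s>", "kutubu"): math.log(0.2),
    ("kataba", "Alwaladu"): math.log(0.7),
    ("kataba", "Alwalada"): math.log(0.1),
    ("kutiba", "Alwaladu"): math.log(0.6),
}
UNSEEN = math.log(1e-4)  # crude floor for unseen bigrams instead of real smoothing


def factorized_candidates(word: str) -> List[str]:
    """Hypothetical stand-in for mode 2 (morphological factorization of OOV words)."""
    return [word + "a", word + "u"]  # placeholder guesses only


def candidates(word: str) -> List[str]:
    # Use full-form candidates when the word is known, otherwise fall back to mode 2.
    return FULL_FORM.get(word) or factorized_candidates(word)


def diacritize(words: List[str]) -> List[str]:
    """Viterbi search for the highest-scoring sequence of diacritized forms."""
    # best maps the last chosen form to (log-score, path so far).
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for word in words:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for form in candidates(word):
            # Extend every surviving path by `form` and keep the best-scoring extension.
            new_best[form] = max(
                ((score + BIGRAM.get((prev, form), UNSEEN), path + [form])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(diacritize(["ktb", "Alwld"]))  # ['kataba', 'Alwaladu']
```

Routing unknown words to `factorized_candidates` mirrors the abstract's switch from full-form decoding to morpheme-level decoding. In a real system, the second mode would search over morpheme sequences rather than emit fixed guesses.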


Empirical Methods in Natural Language Processing | 2014

Semantic Query Expansion for Arabic Information Retrieval

Ashraf Y. Mahgoub; Mohsen A. Rashwan; Hazem M. Raafat; Mohamed A. Zahran; Magda B. Fayek

Traditional keyword-based search has limitations, such as word-sense ambiguity and query-intent ambiguity, which can hurt precision. Semantic search uses the contextual meaning of terms, together with semantic matching techniques, to overcome these limitations. This paper introduces a query expansion approach that uses an ontology built from Wikipedia pages, along with other thesauri, to improve search accuracy for Arabic. Our approach outperformed the traditional keyword-based approach in terms of both F-score and NDCG.
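Below is a minimal sketch of ontology-driven query expansion. A toy thesaurus stands in for the Wikipedia-derived ontology described above. Each original query term keeps weight 1.0, and each related term is added at a lower weight set by a hypothetical `expansion_weight`. English placeholder terms are used for readability. The weighting scheme is an assumption, not the paper's method.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: term -> related terms (stand-in for the ontology).
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}


def expand_query(terms: List[str], expansion_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Return weighted query terms: originals at 1.0, related terms at expansion_weight."""
    weighted: Dict[str, float] = {t: 1.0 for t in terms}
    for t in terms:
        for related in THESAURUS.get(t, []):
            # setdefault keeps an original term at full weight if it is also a synonym.
            weighted.setdefault(related, expansion_weight)
    # Highest weight first, then alphabetical for a stable order.
    return sorted(weighted.items(), key=lambda kv: (-kv[1], kv[0]))


print(expand_query(["car", "price"]))
# [('car', 1.0), ('price', 1.0), ('automobile', 0.5), ('cost', 0.5), ('vehicle', 0.5)]
```

Down-weighting expansion terms is one common way to gain recall from synonyms without letting them outrank the user's own words.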


International Conference on Natural Language Processing | 2008

A Compact Arabic Lexical Semantics Language Resource Based on the Theory of Semantic Fields

Mohamed Attia; Mohsen A. Rashwan; Ahmed Ragheb; Mohamed Al-Badrashiny; Husein Al-Basoumy; Sherif M. Abdou

Applications of statistical Arabic NLP in general, and text mining in particular, along with their underlying tools, perform much better when the statistical processing operates on deeper language factorizations rather than on raw text. Lexical semantic factorization is very important in this regard because of its feasibility, its high level of abstraction, and the language independence of its output. At the core of such a factorization lies an Arabic lexical semantic database. While building this language resource (LR), we had to go beyond the conventional approach of collecting words exclusively from dictionaries and thesauri. That approach alone cannot produce satisfactory coverage of this highly inflective and derivative language. This paper is therefore devoted to the design and implementation of an Arabic lexical semantics LR that enables retrieval of the possible senses of any given Arabic word at high coverage. Instead of tying full Arabic words to their possible senses, our LR flexibly relates morphologically constrained and PoS-tag-constrained Arabic lexical compounds to a predefined, limited set of semantic fields. The standard semantic relations are defined across these fields. With the aid of the same large-scale Arabic morphological analyzer and PoS tagger at runtime, the possible senses of virtually any given Arabic word are retrievable.
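The lookup path described above runs from surface word to morphological analysis and PoS tag, then to semantic fields. The Python sketch below illustrates it, and everything in it is hypothetical. `ANALYSES` stands in for the morphological analyzer and PoS tagger, `SEMANTIC_FIELDS` stands in for the LR, and the field names are invented. The point is that an inflected form such as wbAlktAb ("and with the book") reaches a sense without being listed as a full word.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical analyzer output: surface word -> (lexical compound, PoS tag) analyses.
ANALYSES: Dict[str, List[Tuple[str, str]]] = {
    "ktb": [("kataba", "VERB"), ("kitAb", "NOUN")],  # "he wrote" / "books"
    "wbAlktAb": [("kitAb", "NOUN")],                 # "and with the book"
}

# Hypothetical LR: PoS-constrained lexical compound -> semantic fields.
SEMANTIC_FIELDS: Dict[Tuple[str, str], Set[str]] = {
    ("kataba", "VERB"): {"Writing"},
    ("kitAb", "NOUN"): {"Documents"},
}


def senses(word: str) -> Set[str]:
    """Collect the semantic fields of every analysis of `word`."""
    fields: Set[str] = set()
    for compound in ANALYSES.get(word, []):
        fields |= SEMANTIC_FIELDS.get(compound, set())
    return fields


print(sorted(senses("ktb")))       # ['Documents', 'Writing']
print(sorted(senses("wbAlktAb")))  # ['Documents']
```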


Conference on Intelligent Text Processing and Computational Linguistics | 2015

Word Representations in Vector Space and their Applications for Arabic

Mohamed A. Zahran; Ahmed Magooda; Ashraf Y. Mahgoub; Hazem M. Raafat; Mohsen A. Rashwan; Amir Atyia

Much work has been done to give the individual words of a language adequate representations in vector space, so that these representations capture the semantic and syntactic properties of the language. In this paper, we compare different techniques for building vector space representations for Arabic and test these models via intrinsic and extrinsic evaluations. Intrinsic evaluation assesses the quality of the models using benchmark semantic and syntactic datasets. Extrinsic evaluation assesses the models by their impact on two natural language processing applications: information retrieval and short answer grading. Finally, we map the Arabic vector space to its English counterpart using a cosine error regression neural network. We show that it outperforms standard mean squared error regression neural networks on this task.
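The cosine-error objective can be sketched with NumPy. This is a simplification that rests on stated assumptions. A single linear map `W` (no hidden layers) is trained by plain gradient descent to minimize the mean of 1 - cos(Wx, y) over pairs of Arabic vectors `x` and English vectors `y`. The gradient is derived by hand. Function names, the learning rate, and the step count are illustrative choices.

```python
from typing import List, Tuple

import numpy as np


def cosine_loss_and_grad(W: np.ndarray, X: np.ndarray, Y: np.ndarray) -> Tuple[float, np.ndarray]:
    """Mean (1 - cosine similarity) between rows of X @ W.T and Y, and its gradient w.r.t. W."""
    P = X @ W.T                                        # predictions, shape (n, d_out)
    p_norm = np.linalg.norm(P, axis=1, keepdims=True)  # (n, 1)
    y_norm = np.linalg.norm(Y, axis=1, keepdims=True)  # (n, 1)
    dots = np.sum(P * Y, axis=1, keepdims=True)        # (n, 1)
    cos = dots / (p_norm * y_norm)
    loss = float(np.mean(1.0 - cos))
    # Per-row derivative: d cos / d p = y/(|p||y|) - (p.y) p/(|p|^3 |y|)
    dcos_dP = Y / (p_norm * y_norm) - dots * P / (p_norm ** 3 * y_norm)
    grad_W = -(dcos_dP.T @ X) / X.shape[0]             # chain rule through P = X @ W.T
    return loss, grad_W


def fit_cosine_map(X: np.ndarray, Y: np.ndarray, lr: float = 0.5,
                   steps: int = 500, seed: int = 0) -> Tuple[np.ndarray, List[float]]:
    """Plain gradient descent on the cosine loss; returns W and the loss history."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(Y.shape[1], X.shape[1]))
    history: List[float] = []
    for _ in range(steps):
        loss, grad = cosine_loss_and_grad(W, X, Y)
        history.append(loss)
        W -= lr * grad
    return W, history


# Toy usage: synthetic "Arabic" vectors X and "English" vectors Y related by an unknown linear map.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 5)).T
W, history = fit_cosine_map(X, Y)
print(history[0], history[-1])
```

Replacing the loss with `np.mean((P - Y) ** 2)` gives the mean squared error baseline. Unlike MSE, the cosine loss ignores vector length and penalizes only direction.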


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Deep learning framework with confused sub-set resolution architecture for automatic Arabic diacritization

Mohsen A. Rashwan; Ahmad A. Al Sallab; Hazem M. Raafat; Ahmed Rafea

The Arabic language belongs to a group of languages that require diacritics on their characters. Modern Standard Arabic (MSA) transcripts omit the diacritics, which are essential for many machine learning tasks such as Text-To-Speech (TTS) systems. In this work, Arabic diacritics restoration is tackled under a deep learning framework. The framework includes the Confused Sub-set Resolution (CSR) method to improve classification accuracy, in addition to an Arabic Part-of-Speech (PoS) tagging framework using deep neural nets. Special focus is given to syntactic diacritization, which still suffers from low accuracy, as indicated in prior work. Evaluation is done against state-of-the-art systems reported in the literature, using quite challenging datasets collected from different domains. Standard datasets such as the LDC Arabic Treebank are used, together with custom ones that we have made available online so that other researchers can replicate these results. Results show significant improvement of the proposed techniques over other approaches. The syntactic classification error drops to 9.9% and the morphological classification error to 3%, compared to 12.7% and 3.8% for the best reported results in the literature. This is a 22% reduction in error relative to the best reported systems.


International Conference on Document Analysis and Recognition | 1993

A tree structured neural network

Hazem M. Raafat; Mohsen A. Rashwan

A tree-structured system for pattern classification is proposed. It uses a feedforward neural network trained with back-propagation (FN) as its building block. A single FN is first used to classify all of the given patterns. Its confusion matrix is then carefully studied and used to divide the patterns into groups. This process is repeated by training new FNs on these groups and dividing them into subgroups, and so on, until no further grouping can be obtained. It is shown that with this approach the available feature set can be used more effectively. The testing environment for this work is the isolated handwritten Arabic character set, a problem of reasonable complexity. However, the method can be applied to other pattern classification problems. The approach successfully achieves its goal of dividing a large problem into smaller, easier ones.
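One step of the tree-building procedure turns a confusion matrix into groups of mutually confused classes, and it can be approximated as below. The paper studies the matrix by hand. This Python sketch replaces that study with an assumed rule: merge classes i and j whenever the row-normalized confusion rate from i to j reaches a hypothetical `threshold`. The merging uses union-find. Each resulting group with more than one class would then get its own child network.

```python
from typing import Dict, List

import numpy as np


def group_confused_classes(confusion: np.ndarray, threshold: float) -> List[List[int]]:
    """Merge classes whose row-normalized confusion rate reaches `threshold`.

    confusion[i, j] counts samples of true class i predicted as class j.
    Assumes every row has at least one sample (non-zero row sum).
    """
    n = confusion.shape[0]
    rates = confusion / confusion.sum(axis=1, keepdims=True)  # P(predicted j | true i)
    parent = list(range(n))  # union-find forest over class indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(n):
            if i != j and rates[i, j] >= threshold:
                parent[find(i)] = find(j)  # union the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Classes 0 and 1 are often confused; 2 and 3 are mostly separable.
C = np.array([[40, 10, 0, 0],
              [8, 42, 0, 0],
              [0, 0, 50, 0],
              [0, 1, 0, 49]])
print(group_confused_classes(C, threshold=0.1))  # [[0, 1], [2], [3]]
```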


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Fassieh®, a Semi-Automatic Visual Interactive Tool for Morphological, PoS-Tags, Phonetic, and Semantic Annotation of Arabic Text Corpora

Mohamed Attia; Mohsen A. Rashwan; Mohamed Al-Badrashiny

This paper introduces an Arabic text annotation tool called Fassieh®. Through a sophisticated interactive GUI application, Fassieh® makes it easy to build large, structured corpora of standard written Arabic. It then allows the production of fundamental linguistic analyses (i.e., language factorizations) over such corpora at high coverage and accuracy rates. Arabic morphological analysis, part-of-speech (PoS) tagging, full phonetic transcription (diacritization), and lexical semantics analysis are the most significant Arabic language factorizations currently supported by Fassieh®. The high inherent ambiguity of these analyses is resolved statistically in Fassieh®. The tool also offers a multitude of auxiliary features that enable guided, normalized, and efficient proofreading of any part of the factorized corpus. The paper first reviews the highly inflective and derivative nature of the Arabic language, our Arabic language factorization models, and the associated statistical disambiguation methodology. We then present Fassieh®, which is not only a text annotation tool but also a means of evaluating, demonstrating, and teaching Arabic natural language processing (NLP).


Empirical Methods in Natural Language Processing | 2014

Automatic Arabic diacritics restoration based on deep nets

Ahmad A. Al Sallab; Mohsen A. Rashwan; Hazem M. Raafat; Ahmed Rafea

In this paper, the Arabic diacritics restoration problem is tackled under a deep learning framework. We present the Confused Subset Resolution (CSR) method to improve classification accuracy, in addition to an Arabic Part-of-Speech (PoS) tagging framework using deep neural nets. Special focus is given to syntactic diacritization, which still suffers from low accuracy, as indicated by related work. Evaluation is done against state-of-the-art systems reported in the literature, using quite challenging datasets collected from different domains. Standard datasets such as the LDC Arabic Treebank are used, together with custom ones available online for replicating the results. Results show significant improvement of the proposed techniques over other approaches. The syntactic classification error drops to 9.9% and the morphological classification error to 3%, compared to 12.7% and 3.8% for the best reported results in the literature. This is a 22% reduction in error relative to the best reported systems.


International Conference on Cloud Computing | 2016

A multi-layered approach for Arabic text diacritization

Aya S. Metwally; Mohsen A. Rashwan; Amir F. Atiya

Text diacritization is a critical task that plays an important role in improving the performance of many NLP tasks for languages whose orthographies include diacritics. In this paper, we handle the problem of Arabic text diacritization: our system diacritizes an input sequence of Arabic words both morphologically and syntactically. The system operates in three layers. The first layer uses an HMM for the morphological diacritization of previously seen words. The second layer uses an external morphological analyzer for the morphological diacritization of out-of-vocabulary (OOV) words. The third layer uses a CRF for the syntactic diacritization of all words. To evaluate the system, we used the benchmark LDC Arabic Treebank Part 3 datasets used by state-of-the-art systems. The proposed system achieved a morphological WER of 4.3% and a syntactic WER of 9.4%.
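The routing between the three layers can be shown with a small Python sketch, in which the layers themselves are stubbed. A dictionary lookup (`seen`) stands in for the HMM, a caller-supplied `analyzer` stands in for the external morphological analyzer, and `case_tagger` stands in for the CRF. These names, and the simple concatenation of stem and case ending, are assumptions for illustration.

```python
from typing import Callable, Dict, List


def diacritize_layered(
    words: List[str],
    seen: Dict[str, str],
    analyzer: Callable[[str], str],
    case_tagger: Callable[[List[str]], List[str]],
) -> List[str]:
    """Route each word through the layers and append sentence-level case endings."""
    # Layers 1 and 2: morphological diacritization (seen words vs. OOV words).
    stems = [seen[w] if w in seen else analyzer(w) for w in words]
    # Layer 3: syntactic diacritization of all words, predicted over the whole sequence.
    endings = case_tagger(stems)
    return [stem + ending for stem, ending in zip(stems, endings)]


# Hypothetical toy stand-ins; "Alwld" is treated as OOV and handled by the analyzer.
def toy_analyzer(word: str) -> str:
    return {"Alwld": "Alwalad"}.get(word, word)


def toy_case_tagger(stems: List[str]) -> List[str]:
    return ["a"] + ["u"] * (len(stems) - 1)


print(diacritize_layered(["ktb", "Alwld"], {"ktb": "katab"}, toy_analyzer, toy_case_tagger))
# ['kataba', 'Alwaladu']
```

Keeping case endings in a separate, sequence-level layer reflects the abstract's split between morphological and syntactic diacritization.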


Conference on Intelligent Text Processing and Computational Linguistics | 2015

High Quality Arabic Lexical Ontology Based on MUHIT, WordNet, SUMO and DBpedia

Eslam Kamal; Mohsen A. Rashwan; Sameh Alansary

In this paper, we aim to move ontology-based Arabic NLP forward by experimenting with the generation of a comprehensive Arabic lexical ontology from multiple language resources. We recommend a combination of MUHIT, WordNet, and SUMO and use a simple method to link them, which yields an Arabic-lexicalized version of the SUMO ontology. We then evaluate the generated ontology. We also propose a method for increasing its named entity coverage using DBpedia, English-to-Arabic transliteration, and named entity recognition. The result is an Arabic lexical ontology with 228K Arabic synsets, linked to 7.8K concepts and 143K instances. This ontology achieves a precision of 96.9% and a recall of 75.5% for natural language understanding (NLU) scenarios.

Collaboration


Mohsen A. Rashwan's collaborations.

Top Co-Authors

Ahmed Rafea

American University in Cairo
