Tarek Elghazaly
Cairo University
Publications
Featured research published by Tarek Elghazaly.
Applications of Natural Language to Data Bases | 2013
Ahmed S. Ibrahim; Tarek Elghazaly
This paper examines the benefits of both the Rhetorical Representation and the Vector Representation for Arabic text summarization. The Rhetorical Representation uses Rhetorical Structure Theory (RST) to build the Rhetorical Structure Tree (RS-Tree) and extracts the most significant paragraphs as a summary. The Vector Representation, on the other hand, uses a cosine similarity measure to rank and extract the most significant paragraphs as a summary. The framework evaluates both summaries using precision. Statistical results show that the Rhetorical Representation is superior to the Vector Representation. Moreover, the rhetorical summary keeps the text in context and preserves cohesion: anaphoric references are not broken, which improves the ability to extract the semantics behind the text.
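As a minimal sketch of the vector-representation side described above, paragraphs can be scored by cosine similarity against a term-count vector of the whole document. The whitespace tokenizer, the raw term counts, and the function names `cosine` and `rank_paragraphs` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paragraphs(paragraphs: list[str], top_k: int) -> list[int]:
    """Return indices of the top_k paragraphs most similar to the whole text."""
    # Assumption: whitespace tokenization; real Arabic text would need
    # normalization and stemming first.
    vectors = [Counter(p.lower().split()) for p in paragraphs]
    document = Counter()
    for v in vectors:
        document.update(v)
    scores = [cosine(v, document) for v in vectors]
    ranked = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
    # Restore original paragraph order so the summary reads in sequence.
    return sorted(ranked[:top_k])
```

Calling `rank_paragraphs(paragraphs, top_k=3)` would return the positions of three paragraphs to keep as the extractive summary.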
The Internet of Things | 2016
Tarek Elghazaly; A. Mahmoud; Hesham A. Hefny
There is remarkable growth in the usage of social networks such as Facebook and Twitter. Users from different cultures and backgrounds post large volumes of textual comments reflecting their opinions on different aspects of life and make them available to everyone. In particular, we study the case of Twitter and focus on the 2012 presidential elections in Egypt. This paper compares two techniques for Arabic text classification using the WEKA application: Support Vector Machine (SVM) and Naïve Bayesian (NB). We investigate the use of TF-IDF to obtain the document vector. The main objective of this paper is to measure the accuracy and the time needed to get the result for each classifier, and to determine which classifier is more accurate for Arabic text classification. The comparison reported in this paper shows that the Naïve Bayesian method achieves the highest accuracy and the lowest error rate.
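The paper obtains document vectors through WEKA; the following is only a rough pure-Python sketch of one common TF-IDF variant (relative term frequency times `log(N / df)`). The exact weighting used in the paper is not stated, so the formula and the function name `tfidf_vectors` are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights (tf * log(N / df)) for tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document receive a weight of zero under this variant, which is why such a weighting tends to emphasize the words that distinguish one tweet from another.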
Mexican International Conference on Artificial Intelligence | 2013
Ahmed S. Ibrahim; Tarek Elghazaly
This paper uses a semantic technique, adopting Rhetorical Structure Theory (RST) for summarization, to discover the most significant paragraphs based on functional and semantic criteria. However, the quality of RST summarization suffers when dealing with large documents. This paper proposes a new hybrid summarization model for Arabic text that combines two sub-models. The first sub-model produces a primary summary by using Rhetorical Structure Theory to identify a range of the most significant parts of the text (the nucleus). The second sub-model then ranks the significant parts in the primary rhetorical summary based on the cosine similarity feature. To evaluate the proposed model, a prototype was developed and run on a range of articles classified into three groups of different sizes. The final output summary was evaluated against its manual counterpart. In terms of improving the precision of the rhetorical summary, the experiment shows that the proposed hybrid summarization model (HSM) achieves an average precision of 71.6%, compared with 56.3% for the primary rhetorical summary.
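The abstract reports precision against a manual summary without giving the formula. A minimal sketch, assuming precision is measured at the paragraph level as the share of system-selected paragraphs that also appear in the manual summary (the function name and the use of paragraph indices are illustrative):

```python
def precision(selected: set[int], reference: set[int]) -> float:
    """Fraction of selected paragraph indices that are in the reference summary."""
    if not selected:
        return 0.0
    return len(selected & reference) / len(selected)
```

For instance, a system summary that picks paragraphs {0, 2, 5} against a manual summary of {0, 5, 7} has two of its three choices confirmed, giving a precision of 2/3.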
Mexican International Conference on Artificial Intelligence | 2013
Mostafa Ezzat; Tarek Elghazaly; Mervat Gheith
This paper provides a new model that enhances the retrieval effectiveness for Arabic OCR-degraded text. The proposed model is based on simulating Arabic OCR recognition mistakes using a word-based approach. The model then expands the user's search query with the expected OCR errors. The resulting expanded query gives higher precision and recall than the original query when searching Arabic OCR-degraded text. The proposed model showed a significant increase in degraded-text retrieval effectiveness over previous models. The retrieval effectiveness of the new model is 97%, while the best published effectiveness for a word-based approach was 84% and the best for a character-based approach was 56%. In addition, the new model overcomes several limitations of the two existing models.
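The query-expansion step can be sketched as a lookup in a table that maps each correct word to the degraded forms an OCR engine is expected to produce. The table contents below are invented for illustration, and `expand_query` is a hypothetical helper, not the paper's implementation.

```python
def expand_query(query: str, ocr_errors: dict[str, list[str]]) -> list[str]:
    """Return the query terms plus the expected OCR-degraded forms of each term."""
    expanded: list[str] = []
    for word in query.split():
        # Keep the original word first, then its expected degraded forms.
        for form in [word, *ocr_errors.get(word, [])]:
            if form not in expanded:
                expanded.append(form)
    return expanded


# Hypothetical error table: one correct word and two made-up degraded forms.
EXAMPLE_ERRORS = {"كتاب": ["كناب", "كتات"]}
```

A retrieval system would then search for any of the returned forms, so that documents containing a misrecognized spelling are still matched.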
International Conference on Computational Linguistics | 2009
Tarek Elghazaly; Aly A. Fahmy
This paper provides a novel model for English/Arabic query translation to search Arabic text, and then expands the Arabic query to handle Arabic OCR-degraded text. This includes detecting and translating word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model, which is proposed in the paper. The query translation and expansion model is supported by different libraries proposed in the paper, such as a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic, resolving translation and transliteration ambiguities, and, with orthographic query expansion, achieves a high degree of accuracy in handling OCR errors.
International Conference on Swarm Intelligence | 2015
Tarek Elghazaly
This paper introduces an Enhanced Orthographic Query Expansion Model for improving retrieval of Arabic text produced by the Optical Character Recognition (OCR) process. The proposed model first checks the query word against two word-based error-synthesizing sub-models and then against a character N-gram simulation sub-model. The model is flexible: it can either return the degraded forms as soon as an early stage finds them (when the highest performance is needed) or check all possibilities from all sub-models (when the highest expansion is needed). The first word-based sub-model, which relies on manual word alignment (degraded and original pairs), has high precision and recall on its own, but with some limitations that may affect recall (for example, when the OCR output connects multiple words). The second word-based sub-model provides high precision (lower than the first) but higher recall. The last sub-model, a character N-gram one, provides low precision but high recall. The output of the proposed orthographic query expansion model is the original query extended with the expected degraded words taken from the OCR-error simulation model. The proposed model achieved a higher precision (97.5%) than all previous models while maintaining the highest previously reported recall.
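The staged behaviour described above (stop at the first sub-model that yields results, or run all of them) can be sketched as follows. The two word tables, the character confusion table, and the function names are hypothetical placeholders; the paper's actual sub-models are more elaborate.

```python
def char_variants(word: str, confusions: dict[str, list[str]]) -> list[str]:
    """Replace one occurrence of a character n-gram with each confusable n-gram."""
    variants: list[str] = []
    for source, targets in confusions.items():
        start = word.find(source)
        while start != -1:
            for target in targets:
                variant = word[:start] + target + word[start + len(source):]
                if variant != word and variant not in variants:
                    variants.append(variant)
            start = word.find(source, start + 1)
    return variants


def expand_word(word, word_table_1, word_table_2, confusions, exhaustive=False):
    """Expand a query word stage by stage; stop early unless exhaustive is set."""
    stages = [
        lambda w: word_table_1.get(w, []),        # first word-based sub-model
        lambda w: word_table_2.get(w, []),        # second word-based sub-model
        lambda w: char_variants(w, confusions),   # character n-gram sub-model
    ]
    found: list[str] = []
    for stage in stages:
        for form in stage(word):
            if form != word and form not in found:
                found.append(form)
        if found and not exhaustive:
            break  # performance mode: the first productive stage is enough
    return [word] + found
```

With `exhaustive=False` the function favours speed and the higher-precision early stages; with `exhaustive=True` it trades precision for the widest expansion.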
International Conference on Swarm Intelligence | 2015
Abdelmawgoud Mohamed Maabid; Tarek Elghazaly; Mervat Ghaith
Although morphological analysis is a vital part of natural language processing applications, there are no definitive standards for evaluating and benchmarking Arabic morphological systems. This paper proposes assessment criteria for evaluating Arabic morphological systems by scrutinizing their input, output, and architectural design, enabling researchers to evaluate and fairly compare such systems. When some state-of-the-art Arabic morphological analyzers were scored against the proposed criteria, the accuracy scores showed that even the best algorithm failed to achieve a reliable rate. Hence, this paper introduces an enhanced algorithm that resolves the inflected Arabic word, identifies its root, and finds its pattern and POS tag, which considerably reduces search time and addresses the deficiencies identified by the assessment criteria. The proposed model applies semantic rules of the Arabic language on top of a hybrid sub-model based on two existing algorithms (Al-Khalil and the rules of An Improved Arabic Morphology Analyzer, IAMA).
Archive | 2018
A. Mahmoud; Tarek Elghazaly
Twitter is one of the most popular social network applications. It allows users to communicate with each other and share their opinions and feelings on all types of topics (economics, business, science, society, religion, and politics) in very short messages called tweets. Tweets are usually written in colloquial Arabic and include a lot of slang. In this paper, we study sentiment analysis of Arabic text retrieved from Twitter, focusing on the 2012 presidential elections in Egypt. We use Naive Bayes (NB), a machine learning algorithm, once with N-gram features (unigram and bigram) and once with feature selection. The main objective of this paper is to measure the accuracy of each method and determine which method is more accurate for Arabic text classification. The results show that unigrams combined with information gain attribute selection achieve the highest accuracy and the lowest error rate.
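To make the N-gram setup concrete, here is a small sketch of unigram and bigram feature extraction feeding a multinomial Naive Bayes classifier with add-one smoothing. The smoothing choice, the class and function names, and the toy English training data are assumptions for illustration; the paper's tooling and settings may differ.

```python
import math
from collections import Counter, defaultdict


def ngram_features(tokens: list[str], use_bigrams: bool = True) -> list[str]:
    """Return unigram features, optionally followed by bigram features."""
    feats = list(tokens)
    if use_bigrams:
        feats += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, feats: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Passing `use_bigrams=False` to `ngram_features` gives the unigram-only configuration, so the two feature sets can be compared with the same classifier.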
International Computer Engineering Conference | 2016
Mai Mohamed Mahmoud Farag; Tarek Elghazaly; Hesham A. Hefny
In this paper we apply particle swarm optimization (PSO) feature selection to enhance Hidden Markov Model (HMM) states and parameters for face recognition systems. Optimal feature selection for face images is based on the collaborative behavior of bird flocking, and it reduces the feature size and hence the recognition time complexity. The framework was evaluated on 400 face images from the Olivetti Research Laboratory face database. The experiments demonstrated a recognition rate of 98.5%, using half of the images for training.
International Conference on Advanced Intelligent Systems and Informatics | 2016
Mariam Muhammad; Tarek Elghazaly; Mostafa Ezzat; Mervat Gheith
This paper presents a new correction model for Arabic OCR errors. The proposed model is mainly based on character segmentation and character alignment on a single character or on multiple characters. Results show that the multi-character model outperforms the single-character model: trained on 502,167 words, it finds the correct word within the top 10 proposed corrections for 94% of the words. The paper also considers the effect of increasing the training set size, which leads to better results: the correction rate approaches 53% with 6,000 words, 80% with 64,225 words, and 94% with 502,167 words.
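A much simplified sketch of the single-character alignment idea: given pairs of OCR output and ground-truth words, count which characters are substituted for which. It assumes pairs of equal length so that characters align position by position; the paper's multi-character alignment and segmentation are not reproduced here, and `learn_confusions` is a hypothetical name.

```python
from collections import Counter


def learn_confusions(pairs: list[tuple[str, str]]) -> Counter:
    """Count (ocr_char, true_char) substitutions from aligned word pairs."""
    counts: Counter = Counter()
    for degraded, original in pairs:
        if len(degraded) != len(original):
            # Assumption: skip pairs that would need multi-character alignment.
            continue
        for d, o in zip(degraded, original):
            if d != o:
                counts[(d, o)] += 1
    return counts
```

The resulting counts could be used to rank candidate corrections, with frequent substitutions proposed first; a larger set of training pairs yields more reliable counts.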