Magdy Nagi
Alexandria University
Publication
Featured research published by Magdy Nagi.
International Conference on Neural Information Processing | 2004
Mahmoud F. Hussin; Mohamed S. Kamel; Magdy Nagi
Document clustering is one of the popular techniques that can unveil inherent structure in the underlying data. Two successful models of unsupervised neural networks, Self-Organizing Map (SOM) and Adaptive Resonance Theory (ART), have shown promising results in this task. The high dimensionality of the data has always been a challenging problem in document clustering. It is common to overcome this problem using dimension reduction methods. In this paper, we propose a new two-level neural network based document clustering architecture that can be used for high-dimensional data. Our solution is to use SOM in the first level as a dimension reduction method to produce multiple output clusters, then use ART in the second level to produce the final clusters using the reduced vector space. The experimental results of clustering documents from the Reuters corpus using our proposed architecture show an improvement in clustering performance, evaluated using entropy and the F-measure.
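The abstract names entropy and the F-measure as evaluation measures but does not give their formulas. The following is a minimal sketch, assuming the definitions commonly used in the document clustering literature (size-weighted cluster entropy, and a class-weighted best-match F-measure). The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def cluster_entropy(labels):
    """Entropy (base 2) of the true class labels inside one cluster."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def total_entropy(clusters):
    """Size-weighted average of per-cluster entropies; lower is better."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)

def f_measure(clusters):
    """For each class, take the best F score over all clusters, then
    average those scores weighted by class size; higher is better."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for c in clusters:
            n_ij = c.count(cls)
            if n_ij == 0:
                continue
            precision = n_ij / len(c)
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total

# Hypothetical toy clustering: each inner list holds the true class
# labels of the documents assigned to one cluster.
clusters = [
    ["sports", "sports", "sports", "sports"],
    ["politics", "politics", "sports", "sports"],
]
entropy_score = total_entropy(clusters)
f_score = f_measure(clusters)
```

A pure cluster contributes zero entropy, so a better clustering drives the total entropy toward 0 and the F-measure toward 1.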
Empirical Methods in Natural Language Processing | 2014
Sameh Alansary; Magdy Nagi
This paper focuses on a project for building the first International Corpus of Arabic (ICA). It is planned to contain 100 million analyzed tokens with an interface that allows users to interact with the corpus data in a number of ways [ICA website]. ICA is a representative corpus of Arabic that was initiated in 2006; it is intended to cover Modern Standard Arabic (MSA) as used all over the Arab world. ICA has been analyzed by the Bibliotheca Alexandrina Morphological Analysis Enhancer (BAMAE), which is based on the Buckwalter Arabic Morphological Analyzer (BAMA). Precision and recall are the measures used to evaluate the BAMAE system. At this point, precision ranges from 92% to 95% and recall from 89% to 92%, depending on the number of qualifiers retrieved for every word. The percentages are expected to rise as the improvements are implemented and larger amounts of data are processed.
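For readers unfamiliar with the two measures, here is a minimal sketch of how precision and recall are typically computed by comparing a system's output against a gold standard. It does not reproduce the BAMAE evaluation procedure; the function name and the toy analyses are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision: share of predicted items that are correct.
    Recall: share of gold-standard items that were predicted."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical (word, analysis) pairs; not data from the ICA corpus.
system_output = {("w1", "noun"), ("w2", "verb"), ("w3", "noun"), ("w4", "adj")}
gold_standard = {("w1", "noun"), ("w2", "verb"), ("w3", "noun"),
                 ("w4", "noun"), ("w5", "verb")}

precision, recall = precision_recall(system_output, gold_standard)
```

In this toy case three of the four system analyses are correct, and three of the five gold analyses are recovered.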
International Symposium on Computers and Communications | 2011
Sahar M. Ghanem; Mona A. Mohamed; Magdy Nagi
Association rule discovery algorithms generate all rules satisfying minimum support and confidence thresholds. These techniques yield too many rules and are infeasible when the minimum support is low. Recently, Li [1] proposed the Optimal Rule Discovery (ORD) algorithm that discovers a family of rule sets that maximizes a range of interestingness metrics other than the commonly used confidence metric. In addition, the discovered optimal class association rule set is the minimum subset of rules with the same predictive power as the complete class association rule set. Moreover, ORD is significantly more efficient than association rule discovery, independent of the data structure and the implementation. Due to the existence of huge amounts of data, it is important to investigate efficient methods for distributed/parallel mining of rules. In this paper, we propose EDP-ORD, an efficient distributed/parallel extension of the ORD algorithm. We theoretically disclose a relationship between locally large and globally large rules and use it to reduce the number of generated rules and the exchanged messages at each site/partition. Moreover, we empirically compare EDP-ORD with a naïve distributed/parallel ORD version on five benchmark datasets. The experimental results show that the reduction in the number of generated rules at each site can reach 44%, while the reduction in the total size of exchanged messages can reach 58%.
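The support and confidence thresholds mentioned in the abstract can be illustrated with a short sketch. This shows only the standard definitions of the two metrics, not the ORD or EDP-ORD algorithms; the helper names and the toy transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base

# Hypothetical toy database of four transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

A rule is kept by a classical miner only if both values meet the chosen minimums, which is why lowering the minimum support quickly inflates the number of rules.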
International Conference on Computer Engineering and Systems | 2016
Mona A. Mohamed; Magdy Nagi; Sahar M. Ghanem
Privacy-preserving data mining has been studied widely on static data. Static algorithms are not suitable for streaming data. This calls for new privacy-preserving algorithms that cope with the characteristics of data streams. Recently, effective anonymization algorithms have been studied on centralized data streams. In this paper, we propose an approach for anonymizing distributed data streams based on clustering. First, anonymization is performed locally at each site by clustering a single stream; then local clusters are exchanged between sites through a global server to construct global clusters. The algorithm is shown to be effective when compared to a centralized algorithm and to the case where no communication is exchanged between sites. In addition, empirical results on real and synthetic data sets show that the proposed algorithm gives better information loss than the no-communication case and results close to the centralized case. Moreover, the algorithm is shown to be efficient in terms of communication and scalable with an increasing number of sites.
International Conference on Big Data | 2016
Mariam Malak Fahmy; Iman Elghandour; Magdy Nagi
Given the recent advancement in ubiquitous positioning technologies, it is now common to query terabytes of spatial data. These massive data are usually geo-distributed across multiple data centers to ensure their availability. Yet, at least one replica of the data is stored close to where the data are generated. Spatial queries are complex and computationally intensive, and therefore distributed computation platforms, such as Hadoop, are now used to improve their execution time. However, Hadoop is agnostic to the spatial data characteristics, and it randomly partitions and locates the data stored in its distributed file system, which degrades the performance of spatial queries. In this paper, we propose CoS-HDFS, an extension to the Hadoop Distributed File System (HDFS) that takes into account the spatial characteristics of the data and accordingly co-locates them on the HDFS nodes that span multiple data centers. We integrate CoS-HDFS with SpatialHadoop, a MapReduce framework that natively supports spatial data, to make use of its implementation of spatial indexes, operations, and query interfaces. We experimentally demonstrate a significant reduction in network usage and total execution time for spatial join queries on the TIGER dataset.
Advanced Information Networking and Applications | 2006
Shaimaa Y. Lazem; Noha Adly; Magdy Nagi
Association rule discovery is an important data mining technique that usually produces a large number of rules. Subset and superset queries are common queries for association rules. We introduce a new index structure (SSST) for querying association rules, based on a unique set representation using a hierarchical structure. It supports both subset and superset queries. Further, it is scalable and adapts to different types of data. The performance of SSST is evaluated using real as well as synthetic datasets, spanning dense and sparse data. The experiments showed that the proposed structure outperforms other set indexing techniques significantly, especially for dense datasets. Also, it scales well with both the number of association rules and the query size.
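To make the two query types concrete, the sketch below answers subset and superset queries with a plain linear scan over stored itemsets. This is only a naive baseline of the kind an index would aim to beat; it is not the SSST structure, whose layout the abstract does not describe. All names and data are hypothetical.

```python
def subset_query(stored_itemsets, query):
    """Return every stored itemset that is a subset of the query set."""
    q = frozenset(query)
    return [s for s in stored_itemsets if s <= q]

def superset_query(stored_itemsets, query):
    """Return every stored itemset that is a superset of the query set."""
    q = frozenset(query)
    return [s for s in stored_itemsets if s >= q]

# Hypothetical itemsets, e.g. the item sets of mined association rules.
stored = [
    frozenset({"a"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"c", "d"}),
]

subsets = subset_query(stored, {"a", "b"})
supersets = superset_query(stored, {"a", "b"})
```

The scan touches every stored itemset on each query, so its cost grows linearly with the number of rules.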
Second International Workshop on Services in Distributed and Networked Environments | 1995
Noha Adly; Jean Bacon; Magdy Nagi
This paper evaluates the performance of HARP, a hierarchical replication protocol based on nodes organized into a logical hierarchy. The scheme is based on communication with nearby replicas and scales well for thousands of replicas. It proposes a new service interface that provides different levels of asynchrony, allowing strong consistency and weak consistency to be integrated into the same framework. Further, it provides the ability to offer different levels of staleness by querying from different levels of the hierarchy. We present results from a detailed simulation analysis evaluating the benefits and losses in performance resulting from using synchronous versus asynchronous operations within HARP, as well as comparing it with a traditional replication protocol.
International Conference on Advanced Intelligent Systems and Informatics | 2017
Sarah Habashi; Mohamed A. Ismail; Magdy Nagi
A novel active Affinity Propagation algorithm for pairwise constrained image clustering is proposed. It selects the most informative image pairs and then queries a human expert for pairwise must-link and cannot-link constraints between these pairs. The constraints are then used as partial background information to supervise the Affinity Propagation based image clustering, resulting in a significant performance improvement. Experimental results on different image datasets show that the proposed approach outperforms baseline and state-of-the-art active clustering approaches.
IEEE Region 10 Conference | 2015
Sameh Alansary; Magdy Nagi
With the exponential growth of information available on internet pages, the human need to extract specific information has also grown steadily. This paper presents KEYS (Knowledge Extraction sYStem). It searches for information inside documents represented in Universal Networking Language (UNL), i.e., in semantic hyper-graphs. This allows for retrieval and extraction practices that are language-independent and semantically oriented. It is expected to provide high-quality knowledge extraction through a shallow analysis of the source text into UNL using specific ontological relations, then generate the resulting UNL document in several different target languages in a fully automatic manner. This is expected to present a novel approach to identifying named entities: extracting names of all types from natural language texts. The precision of the system is 0.86 and its recall is 0.82.
International Journal of Data Mining, Modelling and Management | 2014
Sahar M. Ghanem; Mona A. Mohamed; Magdy Nagi
A classification rule set is usually generated from historical data to make predictions on future data that are usually not as complete as the training data. In this work, we provide a review of the robust rule-based optimal associate classifier (OAC) and its main building blocks. OAC is robust in the sense that it is able to make an accurate prediction when the future record is incomplete. OAC robustness is achieved by finding a larger classification rule set. We propose to initially transform the database into an item set tree (IST) data structure for efficient support counting. Then, optimal rule discovery (ORD) is adopted to mine the rules from which OAC selects the classification rules. Several experiments have been conducted to compare OAC classification accuracy and number of rules for a wide range of settings, and a classifier measure is introduced.