Iman Elghandour
Alexandria University
Publications
Featured research published by Iman Elghandour.
Very Large Data Bases | 2012
Iman Elghandour; Ashraf Aboulnaga
Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or Jaql to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. One way to improve the performance of workflows of MapReduce jobs is to keep these intermediate results and reuse them for future workflows submitted to the system. In this paper, we present ReStore, a system that manages the storage and reuse of such intermediate results. ReStore can reuse the output of whole MapReduce jobs that are part of a workflow, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job. We have implemented ReStore as an extension to the Pig dataflow system on top of Hadoop, and we experimentally demonstrate significant speedups on queries from the PigMix benchmark.
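The core mechanism the abstract describes — matching an incoming workflow against previously executed jobs and reusing their stored outputs — can be sketched as a catalog keyed on a fingerprint of a job's inputs and operator sequence. This is a minimal illustrative sketch, not ReStore's actual implementation; all class and method names are hypothetical.

```python
import hashlib

class ReuseCatalog:
    """Toy catalog mapping a job's logical-plan fingerprint to a stored
    output path, in the spirit of ReStore. Names are illustrative."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def fingerprint(input_files, operators):
        # A job is identified by its input files plus the sequence of
        # operators it applies to them.
        key = "|".join(sorted(input_files)) + "::" + ">".join(operators)
        return hashlib.sha256(key.encode()).hexdigest()

    def register(self, input_files, operators, output_path):
        self._store[self.fingerprint(input_files, operators)] = output_path

    def lookup(self, input_files, operators):
        # Try the full job first, then progressively shorter operator
        # prefixes (ReStore's sub-job reuse). On a hit, return the stored
        # output and the suffix of operators that still must run.
        for k in range(len(operators), 0, -1):
            hit = self._store.get(self.fingerprint(input_files, operators[:k]))
            if hit is not None:
                return hit, operators[k:]
        return None, operators
```

A later workflow that shares a prefix with a registered job can then start from the stored output and execute only the remaining operators.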
International Conference on Management of Data | 2012
Iman Elghandour; Ashraf Aboulnaga
Analyzing large scale data has become an important activity for many organizations, and is now facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Query languages such as Pig Latin, Hive, and Jaql make it simpler for users to express complex analysis tasks, and the compilers of these languages translate these complex tasks into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system (e.g., HDFS in the case of Hadoop) and produces output that is stored in this distributed file system. This output is then read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. It would be more useful if these intermediate results could be stored and reused in future workflows. We demonstrate ReStore, an extension to Pig that enables it to manage storage and reuse of intermediate results of the MapReduce workflows executed in the Pig data analysis system. ReStore matches input workflows of MapReduce jobs with previously executed jobs and rewrites these workflows to reuse the stored results of the matched jobs. ReStore also creates additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job. In this demonstration we showcase the MapReduce jobs and sub-jobs recommended by ReStore for a given Pig query, the rewriting of input queries to reuse stored intermediate results, and a what-if analysis of the effectiveness of reusing stored outputs of previously executed jobs.
International Conference on Management of Data | 2008
Iman Elghandour; Ashraf Aboulnaga; Daniel C. Zilio; Fei Chiang; Andrey Balmin; Kevin S. Beyer; Calisto Zuzarte
XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. We have developed an XML Index Advisor that solves this XML index recommendation problem and is tightly coupled with the query optimizer of the database system. We have implemented our XML Index Advisor for DB2. In this demonstration we showcase the new query optimizer modes that we added to DB2, the index recommendation process, and the effectiveness of the recommended indexes.
Very Large Data Bases | 2013
Iman Elghandour; Ashraf Aboulnaga; Daniel C. Zilio; Calisto Zuzarte
Database systems employ physical structures such as indexes and materialized views to improve query performance, potentially by orders of magnitude. It is therefore important for a database administrator to choose the appropriate configuration of these physical structures for a given database. XML database systems are increasingly being used to manage semi-structured data, and XML support has been added to commercial database systems. In this paper, we address the problem of automatic physical design for XML databases, which is the process of automatically selecting the best set of physical structures for a database and a query workload. We focus on recommending two types of physical structures: XML indexes and relational materialized views of XML data. We present a design advisor for recommending XML indexes, one for recommending materialized views, and an integrated design advisor that recommends both indexes and materialized views. A key characteristic of our advisors is that they are tightly coupled with the query optimizer of the database system, and they rely on the optimizer for enumerating and evaluating physical designs. We have implemented our advisors in a prototype version of IBM DB2 V9, and we experimentally demonstrate the effectiveness of their recommendations using this implementation.
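The optimizer-coupled selection the abstract describes can be illustrated with a standard greedy loop: repeatedly add the candidate structure whose what-if cost reduction per unit of storage is largest, until the budget is exhausted. This is a generic sketch of the idea, not the advisors' actual algorithm; `cost_fn` stands in for the optimizer's what-if cost estimate and all names are hypothetical.

```python
def greedy_select(candidates, cost_fn, budget):
    """Greedy physical-design selection under a storage budget.

    candidates: {structure_name: size_in_pages}
    cost_fn(config): optimizer-style what-if estimate of total workload
    cost for a set of structures (a hypothetical stand-in for the real
    optimizer interface).
    """
    config, used = set(), 0
    current = cost_fn(config)
    while True:
        best, best_gain = None, 0.0
        for name, size in candidates.items():
            if name in config or used + size > budget:
                continue
            # Benefit per unit of storage consumed.
            gain = (current - cost_fn(config | {name})) / size
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            return config
        config.add(best)
        used += candidates[best]
        current = cost_fn(config)
```

With a toy cost model where a small index saves a lot and a large materialized view saves slightly more but overshoots the budget, the loop correctly keeps only the index.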
Lecture Notes in Computer Science | 2003
Khaled Nagi; Iman Elghandour; Birgitta König-Ries
The wide availability of mobile devices equipped with wireless communication capabilities results in highly dynamic communities of mobile users. An interesting application in such an environment is decentralized peer-to-peer file sharing. Locating files in a highly dynamic network while minimizing the consumption of scarce resources is challenging. Since the availability of files changes significantly over time, an asynchronous approach to searching is promising. In this paper, we show why existing file sharing systems cannot be used here and introduce our approach based on mobile agents.
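The agent-based search the abstract sketches can be illustrated as an agent that migrates from peer to peer, matching a query against each peer's local file list and carrying the accumulated results with it. This is a toy, synchronous simulation of the idea under assumed names; the actual system is asynchronous and runs on real mobile agents.

```python
def dispatch_agent(query, start_peer, peers, max_hops=5):
    """Toy mobile-agent file search. `peers` maps a peer id to a pair
    (local_files, neighbour_ids); the topology and names are illustrative.
    The hop budget models the scarce resources the agent must conserve."""
    results, visited = [], set()
    frontier = [start_peer]
    hops = 0
    while frontier and hops < max_hops:
        peer = frontier.pop(0)
        if peer in visited:
            continue
        visited.add(peer)
        hops += 1
        files, neighbours = peers[peer]
        # The agent carries matches with it as it migrates.
        results.extend(f for f in files if query in f)
        frontier.extend(n for n in neighbours if n not in visited)
    return results
```

Bounding the hop count is one simple way to trade result completeness for bandwidth and battery, which is the central tension the paper identifies.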
International XML Database Symposium | 2009
Iman Elghandour; Ashraf Aboulnaga; Daniel C. Zilio; Calisto Zuzarte
Physical structures, for example indexes and materialized views, can improve query execution performance by orders of magnitude. Hence, it is important to choose the right configuration of these physical structures for a given database. In this paper, we discuss the types of materialized views that are suitable for an XML database. We then focus on XMLTable materialized views and present a procedure to recommend them given an XML database and a workload of XQuery queries. We have implemented our XMLTable View Advisor in a prototype version based on IBM® DB2® V9.7, which supports both relational and XML data, and we experimentally demonstrate the effectiveness of our advisor's recommendations.
International Conference on Data Engineering | 2008
Iman Elghandour; Ashraf Aboulnaga; Daniel C. Zilio; Fei Chiang; Andrey Balmin; Kevin S. Beyer; Calisto Zuzarte
XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. In this paper, we present an XML Index Advisor that solves this XML index recommendation problem and has the key characteristic of being tightly coupled with the query optimizer. We rely on the optimizer to enumerate index candidates and to estimate the benefit gained from potential index configurations. We expand the set of candidate indexes obtained from the query optimizer to include more general indexes that can be useful for queries other than those in the training workload. To recommend an index configuration, we introduce two new search algorithms. The first algorithm finds the best set of indexes for the specific training workload, and the second algorithm finds a general set of indexes that can benefit the training workload as well as other similar workloads. We have implemented our XML Index Advisor in a prototype version of IBM® DB2® 9, which supports both relational and XML data, and we experimentally demonstrate the effectiveness of our advisor using this implementation.
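The candidate-expansion step the abstract mentions — turning workload-specific index candidates into more general ones — can be illustrated by relaxing a rooted path pattern into descendant-axis variants that match more queries. This is an illustrative sketch under an assumed pattern syntax, not the advisor's actual expansion rules.

```python
def generalize(path):
    """Expand a specific XML path-index candidate into progressively
    more general patterns, so a recommended index can also serve queries
    outside the training workload. Pattern syntax is illustrative."""
    steps = [s for s in path.split("/") if s]
    variants = [path]
    # Drop leading steps one at a time, anchoring the rest with the
    # descendant axis ("//"), from most to least specific.
    for i in range(1, len(steps)):
        variants.append("//" + "/".join(steps[i:]))
    return variants
```

The search algorithms would then weigh these broader candidates (which benefit more queries but may cost more to maintain) against the exact ones.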
International Conference on Big Data | 2016
Mariam Malak Fahmy; Iman Elghandour; Magdy Nagi
Given the recent advancement in ubiquitous positioning technologies, it is now common to query terabytes of spatial data. These massive data are usually geo-distributed across multiple data centers to ensure their availability. Yet, at least one replica of the data is stored close to where the data are generated. Spatial queries are complex and computationally intensive, and therefore, distributed computation platforms such as Hadoop are now used to improve their execution time. However, Hadoop is agnostic to the spatial characteristics of the data, and it randomly partitions and locates the data stored in its distributed file system, which degrades the performance of spatial queries. In this paper, we propose CoS-HDFS, an extension to the Hadoop Distributed File System (HDFS) that takes into account the spatial characteristics of the data and accordingly co-locates them on the HDFS nodes that span multiple data centers. We integrate CoS-HDFS with SpatialHadoop, a MapReduce framework that natively supports spatial data, to make use of its implementation of spatial indexes, operations, and query interfaces. We experimentally demonstrate significant reduction in the network usage and total execution time in the case of spatial join queries on the TIGER dataset.
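The co-location idea can be illustrated with a simple grid-based placement rule: map each block's bounding box to the grid cell containing its centre, then hash the cell to a node, so spatially nearby blocks land on the same machine and spatial joins avoid cross-node (or cross-data-center) shuffles. This is a toy stand-in for CoS-HDFS's placement policy, with all names assumed.

```python
def placement_node(mbr, cell_size, nodes):
    """Assign a block to a node by the grid cell of its bounding box's
    centre. `mbr` is (xmin, ymin, xmax, ymax); `nodes` is an ordered list
    of node ids. Illustrative only, not the CoS-HDFS algorithm."""
    xmin, ymin, xmax, ymax = mbr
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    cell = (int(cx // cell_size), int(cy // cell_size))
    # All blocks whose centres fall in the same cell co-locate.
    return nodes[hash(cell) % len(nodes)]
```

A real policy must also respect replication factors and rack awareness; the point here is only that placement is driven by spatial proximity rather than chosen randomly.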
International Conference on Data Engineering | 2016
Ahmed E. Khalifa; Iman Elghandour; Nagwa M. El-Makky
Many applications in various industrial and research areas analyze large, continuously evolving data. Big data analytics platforms such as MapReduce focus on distributed batch processing, and therefore, a query needs to be re-executed every time its input data evolve. In this paper, we present IncReStore, a system that incrementally computes queries on fast growing datasets by materializing query outputs and maintaining them. IncReStore runs in two modes: (1) Opportunistic IncReStore generates compensating queries on the fly during their execution to use previously materialized query outputs, taking into account that data might have evolved; and (2) Active IncReStore automatically generates MapReduce jobs to update the materialized query outputs whenever the datasets that they depend on evolve. We have implemented IncReStore as an extension to Pig and Hadoop. Our experimental evaluation of IncReStore using the TPC-H benchmark shows significant speedups.
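The incremental-maintenance idea behind both modes can be illustrated at its simplest for a distributive aggregate: instead of recomputing a group-by from scratch, fold only the newly arrived records into the stored result. This is a generic sketch of incremental view maintenance, not IncReStore's API; the function names are assumptions.

```python
def refresh(materialized, delta_records, key_fn, agg_fn):
    """Incrementally maintain a materialized group-by aggregate: apply
    only the delta records to the stored result rather than rescanning
    the full dataset. Works for distributive aggregates (count, sum)."""
    for rec in delta_records:
        k = key_fn(rec)
        materialized[k] = agg_fn(materialized.get(k, 0), rec)
    return materialized
```

Opportunistic mode corresponds to doing such a merge lazily at query time (a compensating query over the delta); active mode corresponds to running it eagerly whenever the base data grow.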
International Conference on Big Data | 2016
Mohamed Hassaan; Iman Elghandour
It is important to analyze and predict meteorological phenomena in real-time. Parallel programming that exploits thousands of threads in GPUs can efficiently speed up the execution of many applications. However, GPUs have limitations when used for processing big data, which can be better analyzed using distributed computing platforms such as Hadoop and Spark. In this paper, we propose DAMB, a system that processes streamed data on a heterogeneous cluster of CPUs and GPUs in real-time. The core of DAMB is SparkGPU, a platform that extends Apache Spark to allow it to manage a heterogeneous cluster that has both CPUs and GPUs and to execute tasks on GPUs. DAMB also provides data visualization tools that present the analyzed data in an interactive way in real-time. As a case study, we focus on a meteorological application that analyzes lightning discharges. We show that DAMB can successfully process and analyze the meteorological data streamed to it and visualize the results in real-time on a 12-node cluster, each node equipped with one or more GPU cards. This is a speedup of two orders of magnitude compared to a sequential implementation of the same application.
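The scheduling problem SparkGPU addresses — deciding which tasks run on GPU slots and which fall back to CPUs — can be sketched as a simple dispatcher. This is only an illustration of the idea under assumed names, not SparkGPU's actual scheduler, which must integrate with Spark's task-scheduling machinery.

```python
def schedule(tasks, gpu_slots, cpu_slots):
    """Toy dispatcher for a mixed CPU/GPU cluster: GPU-friendly tasks
    take a GPU slot while one is free; everything else, including the
    overflow, runs on a CPU slot or waits. Illustrative only."""
    plan = []
    for task, gpu_friendly in tasks:
        if gpu_friendly and gpu_slots > 0:
            plan.append((task, "gpu"))
            gpu_slots -= 1
        elif cpu_slots > 0:
            plan.append((task, "cpu"))
            cpu_slots -= 1
        else:
            plan.append((task, "queued"))
    return plan
```

In a streaming setting like DAMB's, such a policy runs continuously per micro-batch, which is why slot accounting rather than one-off assignment matters.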