Ahmed M. Aly
Purdue University
Publications
Featured research published by Ahmed M. Aly.
International Conference on Data Engineering | 2012
Ahmed M. Aly; Asmaa Sallam; Bala M. Gnanasekaran; Long Van Nguyen-Dinh; Walid G. Aref; Mourad Ouzzani; Arif Ghafoor
The continuous growth of social web applications, along with the development of sensor capabilities in electronic devices, is creating countless opportunities to analyze the enormous amounts of data that are continuously streaming from these applications and devices. To process large-scale data on large-scale computing clusters, MapReduce has been introduced as a framework for parallel computing. However, most current implementations of the MapReduce framework support only the execution of fixed-input jobs. Such a restriction makes these implementations inapplicable to most streaming applications, in which queries are continuous in nature and input data streams are continuously received at high arrival rates. In this demonstration, we showcase M3, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered. M3 extends Hadoop, the open-source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing. Moreover, M3 supports continuous execution of the Map and Reduce phases, where individual Mappers and Reducers never terminate.
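To make "Mappers and Reducers never terminate" concrete, here is a minimal single-process Python sketch built on generators: the mapper and reducer both consume unbounded iterators and keep their state in main memory, so neither finishes on its own. It is not M3's Hadoop-based implementation; the word-count job and the names `continuous_mapper`, `continuous_reducer`, and `tweet_stream` are hypothetical.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple


def continuous_mapper(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word of every incoming record; ends only if the input ends."""
    for record in records:
        for word in record.split():
            yield word, 1


def continuous_reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Keep running per-key totals in main memory and emit every updated total."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
        yield key, totals[key]


def tweet_stream() -> Iterator[str]:
    """Hypothetical unbounded source that replays a few records forever."""
    samples = ["storm spark", "hadoop storm", "storm"]
    while True:
        yield from samples


# Observe only the first five updates of the never-ending pipeline.
updates = continuous_reducer(continuous_mapper(tweet_stream()))
print(list(islice(updates, 5)))
# [('storm', 1), ('spark', 1), ('hadoop', 1), ('storm', 2), ('storm', 3)]
```

Because every stage is lazy, results are emitted incrementally as each record arrives rather than after a fixed input has been fully read.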
Very Large Data Bases | 2015
Ahmed R. Mahmood; Ahmed M. Aly; Thamir Qadah; El Kindi Rezig; Anas Daghistani; Amgad Madkour; Ahmed S. Abdelhamid; Mohamed S. Hassan; Walid G. Aref; Saleh M. Basalamah
The widespread use of location-aware devices, together with the increased popularity of micro-blogging applications (e.g., Twitter), has led to the creation of large streams of spatio-textual data. To serve real-time applications, the processing of these large-scale spatio-textual streams needs to be distributed. However, existing distributed stream processing systems (e.g., Spark and Storm) are not optimized for spatial/textual content. In this demonstration, we introduce Tornado, a distributed in-memory spatio-textual stream processing server that extends Storm. To efficiently process spatio-textual streams, Tornado adds a spatio-textual indexing layer to the architecture of Storm. The indexing layer is adaptive, i.e., it dynamically redistributes the processing across the system according to changes in the data distribution and/or query workload. In addition to keywords, higher-level textual concepts are identified and semantically matched against spatio-textual queries. Tornado provides data deduplication and fusion to eliminate redundant textual data. We demonstrate a prototype of Tornado running against real Twitter streams, where users can register continuous or snapshot spatio-textual queries using a map-assisted query interface.
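The abstract does not say how Tornado's indexing layer is organized internally. As a minimal sketch, assuming a uniform grid (the `CELL` size, `Query`, and `GridIndex` are hypothetical), continuous spatio-keyword queries can be registered in every cell their rectangle overlaps, so an incoming tweet is checked only against the queries of its own cell. Conjunctive keyword matching is also an assumption, and neither the adaptive redistribution nor the semantic concept matching described above is modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

CELL = 10.0  # hypothetical grid cell side length


@dataclass(frozen=True)
class Query:
    qid: str
    rect: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    keywords: FrozenSet[str]  # all of these must appear in the tweet


def cell_of(x: float, y: float) -> Tuple[int, int]:
    return int(x // CELL), int(y // CELL)


class GridIndex:
    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], List[Query]] = defaultdict(list)

    def register(self, q: Query) -> None:
        """Store a continuous query in every cell its rectangle overlaps."""
        xmin, ymin, xmax, ymax = q.rect
        (cx0, cy0), (cx1, cy1) = cell_of(xmin, ymin), cell_of(xmax, ymax)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append(q)

    def match(self, x: float, y: float, text: str) -> List[str]:
        """Ids of the registered queries that an incoming tweet satisfies."""
        words = set(text.lower().split())
        hits = []
        for q in self.cells.get(cell_of(x, y), []):
            xmin, ymin, xmax, ymax = q.rect
            inside = xmin <= x <= xmax and ymin <= y <= ymax
            if inside and q.keywords <= words:
                hits.append(q.qid)
        return hits


index = GridIndex()
index.register(Query("q1", (0, 0, 25, 25), frozenset({"flood"})))
index.register(Query("q2", (30, 30, 40, 40), frozenset({"traffic"})))
print(index.match(12.0, 3.0, "Flood warning downtown"))  # ['q1']
print(index.match(35.0, 35.0, "flood"))  # []
```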
Very Large Data Bases | 2012
Ahmed M. Aly; Walid G. Aref; Mourad Ouzzani
The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Among these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used. Unlike related research, this paper goes beyond the optimization of queries with a single kNN predicate and shows how queries with two kNN predicates can be optimized. In particular, the paper addresses the optimization of queries with: (i) two kNN-select predicates, (ii) two kNN-join predicates, and (iii) one kNN-join predicate and one kNN-select predicate. For each type of query, conceptually correct query evaluation plans (QEPs) and new algorithms that optimize the query execution time are presented. Experimental results demonstrate that the proposed algorithms outperform the conceptually correct QEPs by orders of magnitude.
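To see why queries with two kNN predicates need careful plans, consider two kNN-select predicates over the same point set. The Python sketch below assumes the query asks for points that are among the k1 nearest neighbors of focal point f1 and also among the k2 nearest neighbors of f2. One conceptually correct evaluation runs each kNN-select over the full input and intersects the results, whereas cascading one kNN-select into the other changes the answer. This brute-force version only illustrates the semantics; it is not the paper's optimized algorithms, and the function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def knn_select(points: List[Point], focal: Point, k: int) -> Set[Point]:
    """The k input points closest to `focal` (ties resolved by input order)."""
    return set(sorted(points, key=lambda p: math.dist(p, focal))[:k])


def two_knn_selects(points: List[Point], f1: Point, k1: int, f2: Point, k2: int) -> Set[Point]:
    """Evaluate each kNN-select on the full input independently, then intersect."""
    return knn_select(points, f1, k1) & knn_select(points, f2, k2)


def cascaded(points: List[Point], f1: Point, k1: int, f2: Point, k2: int) -> Set[Point]:
    """Naive plan: feed the output of the first kNN-select into the second."""
    return knn_select(list(knn_select(points, f1, k1)), f2, k2)


pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(two_knn_selects(pts, (0, 0), 3, (10, 0), 2))
print(cascaded(pts, (0, 0), 3, (10, 0), 2))
```

Here the intersection plan returns only (2, 0), while the cascaded plan returns (1, 0) and (2, 0), because its second kNN-select only sees the three points that survived the first.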
International Conference on Data Engineering | 2014
Amr Magdy; Ahmed M. Aly; Mohamed F. Mokbel; Sameh Elnikety; Yuxiong He; Suman Nath
The Mars demonstration exploits the location information of microblogs to support a wide variety of important spatio-temporal queries on microblogs. Supported queries include range, nearest-neighbor, and aggregate queries. Mars works in a challenging environment where streams of microblogs arrive at high rates. Mars distinguishes itself with three novel contributions: (1) Efficient in-memory digestion/expiration techniques that can handle microblogs arriving at rates of up to 64,000 microblogs/sec. This also includes highly accurate and efficient hopping-window-based aggregation of incoming microblog keywords. (2) Smart memory optimization and load-shedding techniques that adjust in-memory contents based on the expected query load, trading significant storage savings for a slight and bounded accuracy loss. (3) Scalable real-time query processing that exploits the Zipf distribution of microblog data for efficient top-k aggregate query processing. In addition, Mars includes a scalable real-time nearest-neighbor and range query processing module that uses various pruning techniques to serve heavy query workloads in real time. Mars is demonstrated using a stream of real tweets obtained from the Twitter firehose, with a production query workload obtained from Bing web search. We show that Mars serves incoming queries with an average latency of less than 4 msec and with 99% answer accuracy while saving up to 70% of storage overhead for different query loads.
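The hopping-window keyword aggregation in contribution (1) can be illustrated with a generic sketch. This is not Mars's implementation: keyword counts are kept per hop-sized bucket, whole buckets expire as the window hops forward, and top-k is computed on demand with `Counter.most_common`. The class name, hop length, and window size are illustrative assumptions.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class HoppingWindowTopK:
    """Keyword counts over a window of `num_hops` hops, each `hop` time units long."""

    def __init__(self, hop: int, num_hops: int) -> None:
        self.hop = hop
        self.num_hops = num_hops
        self.buckets: Deque[Tuple[int, Counter]] = deque()  # (hop id, keyword counts)

    def add(self, timestamp: int, keywords: List[str]) -> None:
        """Digest one microblog; timestamps are assumed to be non-decreasing."""
        hop_id = timestamp // self.hop
        if not self.buckets or self.buckets[-1][0] != hop_id:
            self.buckets.append((hop_id, Counter()))
        self.buckets[-1][1].update(keywords)
        # Expire whole buckets that have hopped out of the window.
        while self.buckets[0][0] <= hop_id - self.num_hops:
            self.buckets.popleft()

    def top_k(self, k: int) -> List[Tuple[str, int]]:
        total: Counter = Counter()
        for _, counts in self.buckets:
            total.update(counts)
        return total.most_common(k)


w = HoppingWindowTopK(hop=60, num_hops=3)  # 3-minute window that hops every minute
w.add(0, ["storm", "rain"])
w.add(30, ["storm"])
w.add(70, ["rain"])
w.add(200, ["sun"])  # hop 3 arrives, so hop 0 (both "storm" posts) expires
w.add(210, ["sun"])
print(w.top_k(2))  # [('sun', 2), ('rain', 1)]
```

Expiring whole buckets instead of individual posts keeps deletion cheap at high arrival rates, at the cost of a window boundary that moves in hop-sized steps.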
Extending Database Technology | 2015
Ahmed M. Aly; Walid G. Aref; Mourad Ouzzani
Advances in geo-sensing technology have led to an unprecedented spread of location-aware devices. In turn, this has resulted in a plethora of location-based services in which huge amounts of spatial data need to be efficiently consumed by spatial query processors. For a spatial query processor to properly choose among the various query processing strategies, the cost of the spatial operators has to be estimated. In this paper, we study the problem of estimating the cost of the spatial k-nearest-neighbor (k-NN, for short) operators, namely, k-NN-Select and k-NN-Join. Given a query that has a k-NN operator, the objective is to estimate the number of blocks that are going to be scanned during the processing of this operator. Estimating the cost of a k-NN operator is challenging for several reasons. For instance, the cost of a k-NN-Select operator is directly affected by the value of k, the location of the query focal point, and the distribution of the data. Hence, a cost model that captures these factors is relatively hard to realize. This paper introduces cost estimation techniques that maintain a compact set of catalog information that can be kept in main memory to enable fast estimation via lookups. A detailed study of the performance and accuracy trade-off of each proposed technique is presented. Experimental results using real spatial datasets from OpenStreetMap demonstrate the robustness of the proposed estimation techniques.
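The three factors named above (the value of k, the focal point, and the data distribution) all show up even in a deliberately simple, hypothetical estimator. The sketch below is not one of the paper's techniques. It assumes a per-grid-cell catalog of (point count, block count) and adds square rings of cells around the focal cell until they hold at least k points, then reports the blocks in those cells. A real estimator would also account for neighbors just beyond the ring that contains the k-th point.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def estimate_knn_select_blocks(
    catalog: Dict[Cell, Tuple[int, int]], focal_cell: Cell, k: int, max_ring: int = 64
) -> int:
    """Rough estimate of the blocks a kNN-select scans, using only catalog lookups.

    `catalog` maps a grid cell to (number of points, number of blocks). Square rings
    of cells around the focal cell are added until they hold at least k points.
    """
    fx, fy = focal_cell
    points_seen = blocks = 0
    for r in range(max_ring + 1):
        for cx in range(fx - r, fx + r + 1):
            for cy in range(fy - r, fy + r + 1):
                if max(abs(cx - fx), abs(cy - fy)) != r:
                    continue  # inner rings were already counted
                pts, blks = catalog.get((cx, cy), (0, 0))
                points_seen += pts
                blocks += blks
        if points_seen >= k:
            break
    return blocks


catalog = {(0, 0): (50, 1), (1, 0): (200, 3), (0, 1): (100, 2), (2, 2): (400, 5)}
print(estimate_knn_select_blocks(catalog, (0, 0), k=40))   # 1
print(estimate_knn_select_blocks(catalog, (0, 0), k=300))  # 6
print(estimate_knn_select_blocks(catalog, (0, 0), k=500))  # 11
```

Larger k, a focal point in a sparse area, or skewed data all push the search outward and raise the estimate, which is exactly why a fixed-formula cost model struggles here.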
Advances in Geographic Information Systems | 2016
Amr Magdy; Ahmed M. Aly; Mohamed F. Mokbel; Sameh Elnikety; Yuxiong He; Suman Nath; Walid G. Aref
This paper presents GeoTrend, a system for scalable support of spatial trend discovery on recent microblogs, e.g., tweets and online reviews, that arrive in real time. GeoTrend is distinguished from existing techniques in three aspects: (1) It discovers trends in arbitrary spatial regions, e.g., city blocks. (2) It supports trending measures that effectively capture trending items under a variety of definitions that suit different applications. (3) It promotes recent microblogs as first-class citizens and optimizes its system components to digest a continuous flow of fast data in main memory while removing old data efficiently. GeoTrend queries are top-k queries that discover the k most trending keywords posted within an arbitrary spatial region during the last T time units. To support its queries efficiently, GeoTrend employs an in-memory spatial index that is able to efficiently digest incoming data and expire data that is beyond the last T time units. The index also materializes top-k keywords in different spatial regions so that incoming queries can be processed with low latency. During peak times, a main-memory optimization technique is employed to shed less important data, so that the system still sustains high query accuracy with limited memory resources. Experimental results based on a real Twitter feed and Bing Mobile spatial search queries show the scalability of GeoTrend to support arrival rates of up to 50,000 microblogs/second, an average query latency of 3 milliseconds, and at least 90% query accuracy even under limited memory resources.
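A GeoTrend-style query (the top-k keywords posted in a region during the last T time units) can be sketched with a plain in-memory grid that expires old entries lazily. This is an assumption-laden illustration: `TrendIndex`, the cell size, and the use of raw keyword frequency as the trending measure are hypothetical, and it neither materializes per-region top-k lists nor sheds data as GeoTrend does.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, List, Tuple

Entry = Tuple[float, float, float, str]  # (timestamp, x, y, keyword)
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class TrendIndex:
    """Hypothetical grid index holding only the keywords of the last `horizon` time units."""

    def __init__(self, cell: float, horizon: float) -> None:
        self.cell = cell
        self.horizon = horizon
        self.cells: Dict[Tuple[int, int], Deque[Entry]] = defaultdict(deque)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return int(x // self.cell), int(y // self.cell)

    def insert(self, t: float, x: float, y: float, keywords: List[str]) -> None:
        """Digest one microblog; arrivals are assumed to come in timestamp order."""
        bucket = self.cells[self._cell(x, y)]
        for kw in keywords:
            bucket.append((t, x, y, kw))

    def top_k(self, now: float, region: Rect, k: int) -> List[Tuple[str, int]]:
        """The k most frequent keywords posted inside `region` during (now - horizon, now]."""
        xmin, ymin, xmax, ymax = region
        (cx0, cy0), (cx1, cy1) = self._cell(xmin, ymin), self._cell(xmax, ymax)
        counts: Counter = Counter()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                bucket = self.cells.get((cx, cy))
                # Lazily expire the oldest entries of every cell the query touches.
                while bucket and bucket[0][0] <= now - self.horizon:
                    bucket.popleft()
                for _, x, y, kw in bucket or ():
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        counts[kw] += 1
        return counts.most_common(k)


idx = TrendIndex(cell=10.0, horizon=60.0)
idx.insert(0, 5, 5, ["fire"])
idx.insert(50, 6, 6, ["fire", "smoke"])
idx.insert(55, 7, 7, ["fire"])
idx.insert(58, 95, 95, ["concert"])
print(idx.top_k(now=70, region=(0, 0, 20, 20), k=2))  # [('fire', 2), ('smoke', 1)]
```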
Very Large Data Bases | 2015
Ahmed M. Aly; Ahmed S. Abdelhamid; Ahmed R. Mahmood; Walid G. Aref; Mohamed S. Hassan; Hazem Elmeleegy; Mourad Ouzzani
The ubiquity of location-aware devices, e.g., smartphones and GPS devices, has led to a plethora of location-based services in which huge amounts of geotagged information need to be efficiently processed by large-scale computing clusters. This demo presents AQWA, an adaptive and query-workload-aware data partitioning mechanism for processing large-scale spatial data. Unlike existing cluster-based systems, e.g., SpatialHadoop, which apply static partitioning of spatial data, AQWA can react to changes in the query workload and data distribution. A key feature of AQWA is that it does not assume prior knowledge of the query workload or data distribution. Instead, AQWA reacts to changes in both the data and the query workload by incrementally updating the partitioning of the data. We demonstrate two prototypes of AQWA deployed over Hadoop and Spark. In both prototypes, we process spatial range and k-nearest-neighbor (kNN, for short) queries over large-scale spatial datasets, and we explore the performance of AQWA under different query workloads.
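One incremental, workload-aware repartitioning step can be sketched as follows. The cost model assumed here (points in a partition times the number of observed queries overlapping it) and the median split on whichever axis lowers that cost more are illustrative assumptions, not a description of AQWA's exact algorithm; the function names are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]
Partition = Tuple[Rect, List[Point]]


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cost(bounds: Rect, points: List[Point], queries: List[Rect]) -> int:
    """Workload-aware cost: points in the partition times the queries that touch it."""
    return len(points) * sum(overlaps(bounds, q) for q in queries)


def split_hottest(partitions: List[Partition], queries: List[Rect]) -> List[Partition]:
    """One incremental step: split the costliest partition at a median, on the better axis."""
    i = max(range(len(partitions)), key=lambda j: cost(*partitions[j], queries))
    (xmin, ymin, xmax, ymax), pts = partitions[i]
    options = []
    for axis in (0, 1):
        m = median(p[axis] for p in pts)
        if axis == 0:
            lo_b, hi_b = (xmin, ymin, m, ymax), (m, ymin, xmax, ymax)
        else:
            lo_b, hi_b = (xmin, ymin, xmax, m), (xmin, m, xmax, ymax)
        lo = [p for p in pts if p[axis] < m]
        hi = [p for p in pts if p[axis] >= m]
        total = cost(lo_b, lo, queries) + cost(hi_b, hi, queries)
        options.append((total, [(lo_b, lo), (hi_b, hi)]))
    _, children = min(options, key=lambda o: o[0])
    return partitions[:i] + children + partitions[i + 1:]


points = [(5, 50), (15, 50), (60, 50), (90, 50)]
workload = [(0, 0, 20, 100)] * 3  # three queries hitting the left edge
layout = split_hottest([((0, 0, 100, 100), points)], workload)
print([bounds for bounds, _ in layout])  # [(0, 0, 37.5, 100), (37.5, 0, 100, 100)]
```

Splitting on x isolates the two points the workload actually touches, dropping the cost from 12 to 6; a split on y would leave all four points under the queries. No prior knowledge of the workload is needed, since each step only uses the queries observed so far.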
Web Search and Data Mining | 2016
Ahmed M. Aly; Hazem Elmeleegy; Yan Qi; Walid G. Aref
Despite the importance and widespread use of range data, e.g., time intervals and spatial ranges, little attention has been devoted to the processing and querying of range data in the context of big data. The main challenge lies in the nature of traditional index structures, e.g., B-Tree and R-Tree, which are centralized by nature and hence are almost crippled when deployed in a distributed environment. To address this challenge, this paper presents Kangaroo, a system built on top of Hadoop to optimize the execution of range queries over range data. The main idea behind Kangaroo is to split the data into non-overlapping partitions in a way that minimizes the query execution time. Kangaroo is query-workload-aware, i.e., it produces partitioning layouts that minimize the query processing time of given query patterns. In this paper, we study the design challenges Kangaroo addresses in order to be deployed on top of a distributed file system, i.e., HDFS. We also study four different partitioning schemes that Kangaroo can support. With extensive experiments using real range data of more than one billion records and a real query workload of more than 30,000 queries, we show that the partitioning schemes of Kangaroo can significantly reduce the I/O of range queries on range data.
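The I/O benefit of non-overlapping partitions for range data can be shown with a small sketch. It assumes one common way to keep partition ranges disjoint, replicating each interval into every partition it overlaps, and uses hand-picked boundaries; Kangaroo's four partitioning schemes and its workload-driven choice of boundaries are not reproduced, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed range [start, end]


def partition_ids(boundaries: List[int], r: Interval) -> range:
    """Ids of the partitions r overlaps; partition i covers [boundaries[i-1], boundaries[i])."""
    return range(bisect_right(boundaries, r[0]), bisect_right(boundaries, r[1]) + 1)


def build(intervals: List[Interval], boundaries: List[int]) -> Dict[int, List[Interval]]:
    """Replicate each interval into every partition it overlaps, so partitions never overlap."""
    parts: Dict[int, List[Interval]] = {}
    for r in intervals:
        for pid in partition_ids(boundaries, r):
            parts.setdefault(pid, []).append(r)
    return parts


def range_query(
    parts: Dict[int, List[Interval]], boundaries: List[int], q: Interval
) -> Tuple[List[Interval], int]:
    """Return the intervals overlapping q and the number of records scanned to find them."""
    hits, scanned = set(), 0
    for pid in partition_ids(boundaries, q):
        for s, e in parts.get(pid, []):
            scanned += 1
            if s <= q[1] and q[0] <= e:
                hits.add((s, e))  # the set drops copies of replicated intervals
    return sorted(hits), scanned


data = [(1, 3), (4, 12), (15, 18), (21, 25), (22, 30)]
bounds = [10, 20]
parts = build(data, bounds)
print(range_query(parts, bounds, (16, 19)))  # ([(15, 18)], 2) instead of scanning all 5 records
```

The trade-off is visible in the wide query (5, 25): it scans 6 records because (4, 12) is stored twice, which is the kind of cost a workload-aware choice of boundaries aims to balance.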
International Conference on Management of Data | 2016
Mohamed S. Hassan; Walid G. Aref; Ahmed M. Aly
A variety of applications spanning various domains, e.g., social networks, transportation, and bioinformatics, have graphs as first-class citizens. These applications share a vital operation, namely, finding the shortest path between two nodes. In many scenarios, users are interested in filtering the graph before finding the shortest path. For example, in social networks, one may need to compute the shortest path between two persons on a sub-graph containing only family relationships. This paper focuses on dynamic graphs with labeled edges, where the goal is to find a shortest path after filtering edges based on user-specified query labels. This problem is termed the Edge-Constrained Shortest Path query (or ECSP, for short). This paper introduces Edge-Disjoint Partitioning (EDP, for short), a new technique for efficiently answering ECSP queries over dynamic graphs. EDP has two main components: a dynamic index that is based on graph partitioning, and a traversal algorithm that exploits the regular patterns of the answers of ECSP queries. The main idea of EDP is to partition the graph based on the labels of the edges. On demand, EDP computes specific sub-paths within each partition and updates its index. The computed sub-paths act as pre-computations that can be leveraged by future queries. To answer an ECSP query, EDP connects sub-paths from different partitions using its efficient traversal algorithm. EDP can dynamically handle various types of graph updates, e.g., label, edge, and node updates. The index entries that are potentially affected by graph updates are invalidated and re-computed on demand. EDP is evaluated using real graph datasets from various domains. Experimental results demonstrate that EDP can achieve query performance gains of up to four orders of magnitude in comparison to state-of-the-art techniques.
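For reference, the straightforward way to answer an ECSP query without any index is Dijkstra's algorithm restricted to edges whose label belongs to the query's label set. The sketch below shows only that baseline semantics; directed edges, the adjacency-list format, and the function name `ecsp` are assumptions, and EDP's partition-based index and sub-path reuse are not modeled.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float, str]]]  # node -> [(neighbor, weight, label)]


def ecsp(graph: Graph, source: str, target: str, allowed: Set[str]) -> Optional[float]:
    """Shortest-path cost from source to target using only edges whose label is allowed."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry superseded by a shorter distance
        for v, w, label in graph.get(u, []):
            if label not in allowed:
                continue  # the edge-constraint filter
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None  # target unreachable under the label constraint


social: Graph = {
    "ann": [("bob", 1, "friend"), ("cat", 5, "family")],
    "bob": [("dan", 1, "friend")],
    "cat": [("dan", 2, "family")],
}
print(ecsp(social, "ann", "dan", {"family"}))            # 7.0
print(ecsp(social, "ann", "dan", {"family", "friend"}))  # 2.0
print(ecsp(social, "ann", "dan", {"colleague"}))         # None
```

With only `"family"` edges allowed, the path through `cat` is the only option; also allowing `"friend"` yields the shorter path through `bob`. Running a fresh search per query like this is what makes precomputed, reusable sub-paths attractive.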
Advances in Geographic Information Systems | 2016
Ahmed R. Mahmood; Walid G. Aref; Ahmed M. Aly; Mingjie Tang
The popularity of GPS-enabled cellular devices has given rise to numerous applications, e.g., social networks, micro-blogs, and crowd-powered reviews. These applications produce large amounts of geo-tagged textual data that need to be processed and queried. Many complex spatio-textual operators, together with matching complex indexing structures, have been proposed in the literature to process this spatio-textual data. For example, there exist several complex variations of spatio-textual group queries that retrieve groups of objects that collectively satisfy certain spatial and textual criteria. However, having complex operators is against the spirit of SQL and relational algebra. In contrast to these complex spatio-textual operators, relational algebra offers simple operators, e.g., relational selects, projects, order by, and group by, that can be composed to form more complex queries. In this paper, we introduce Atlas, an SQL extension to express complex spatial-keyword group queries. Atlas follows the philosophy of SQL and relational algebra in that it uses simple declarative spatial and textual building-block operators and predicates to extend SQL. Not only can Atlas represent spatio-textual group queries from the literature, but it can also compose other important queries, e.g., retrieving spatio-textual groups from subsets of object datasets where the selected subset satisfies user-defined relational predicates and the groups of close-by objects contain misspelled keywords. We demonstrate that Atlas is able to represent a wide range of spatial-keyword queries that existing indexes and algorithms would not be able to address. The building-block paradigm adopted by Atlas creates room for query optimization, where multiple query execution plans can be formed.
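The abstract does not show Atlas's SQL syntax, so rather than invent it, the Python sketch below illustrates the building-block idea itself: a relational filter, a spatial grouping step, and a typo-tolerant keyword check, composed to answer the example query above (close-by groups from a user-selected subset, with misspelled keywords). Restricting groups to pairs, the 2.0 radius, the one-edit tolerance, and every function name are hypothetical choices for illustration.

```python
import math
from itertools import combinations
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Obj = Dict[str, Any]  # hypothetical object: id, x, y, keyword list, plus relational attributes


def select(objs: Iterable[Obj], pred: Callable[[Obj], bool]) -> List[Obj]:
    """Relational building block: keep objects satisfying a user-defined predicate."""
    return [o for o in objs if pred(o)]


def close_pairs(objs: List[Obj], radius: float) -> List[Tuple[Obj, Obj]]:
    """Spatial building block: pairs of objects at most `radius` apart."""
    return [(a, b) for a, b in combinations(objs, 2)
            if math.dist((a["x"], a["y"]), (b["x"], b["y"])) <= radius]


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def covers(group: Sequence[Obj], keywords: List[str], max_edits: int = 1) -> bool:
    """Textual building block: the group jointly contains every keyword, allowing typos."""
    words = [w for o in group for w in o["kw"]]
    return all(any(edit_distance(k, w) <= max_edits for w in words) for k in keywords)


shops = [
    {"id": 1, "x": 0, "y": 0, "kw": ["pizza"], "rating": 4.5},
    {"id": 2, "x": 1, "y": 0, "kw": ["cofee"], "rating": 4.0},  # misspelled keyword
    {"id": 3, "x": 9, "y": 9, "kw": ["coffee"], "rating": 4.8},
    {"id": 4, "x": 0, "y": 1, "kw": ["coffee"], "rating": 2.0},
]
# Compose: relational filter, then spatial grouping, then fuzzy keyword coverage.
groups = [(a["id"], b["id"])
          for a, b in close_pairs(select(shops, lambda o: o["rating"] >= 4.0), 2.0)
          if covers((a, b), ["pizza", "coffee"])]
print(groups)  # [(1, 2)]
```

Shop 4 is close by and spells "coffee" correctly, but the relational filter removes it first; swapping the order of the building blocks would give the same answer at a different cost, which is the kind of plan choice the building-block design leaves open to an optimizer.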