Hazem Elmeleegy
Purdue University
Publication
Featured research published by Hazem Elmeleegy.
international conference on web services | 2008
Hazem Elmeleegy; Anca A. Ivan; Rama Akkiraju; Richard Goodwin
Mashup editors, like Yahoo Pipes and IBM Lotus Mashup Maker, allow non-programmer end-users to “mash-up” information sources and services to meet their information needs. However, with the increasing number of services, information sources and complex operations like filtering and joining, even an easy-to-use editor is not sufficient. MashupAdvisor aims to assist mashup creators to build higher-quality mashups in less time. Based on the current state of a mashup, MashupAdvisor quietly suggests outputs (goals) that the user might want to include in the final mashup. MashupAdvisor exploits a repository of mashups to estimate the popularity of specific outputs, and makes suggestions using the conditional probability that an output will be included, given the current state of the mashup. When a suggestion is accepted, MashupAdvisor uses a semantic matching algorithm and a metric planner to modify the mashup to produce the suggested output. Our prototype was implemented on top of IBM Lotus Mashup Maker and our initial results show that it is effective.
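For illustration, here is a minimal sketch (hypothetical names and data shapes, not the authors' code) of the kind of conditional-probability ranking described above: candidate outputs are scored by how often they appear in repository mashups whose components include the current mashup's components.

```python
from collections import Counter

def suggest_outputs(repository, current_components, top_k=3):
    """Rank candidate outputs by P(output | current components),
    estimated from a repository of past mashups (illustrative sketch)."""
    current = set(current_components)
    # Past mashups whose components cover the current mashup's state.
    matching = [m for m in repository if current <= set(m["components"])]
    if not matching:
        return []
    counts = Counter(out for m in matching for out in m["outputs"])
    total = len(matching)
    ranked = [(out, c / total) for out, c in counts.items()]
    ranked.sort(key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

# Toy repository: each past mashup lists the services it composed and its outputs.
repo = [
    {"components": {"rss_feed", "filter"}, "outputs": {"news_digest"}},
    {"components": {"rss_feed", "filter", "geocoder"}, "outputs": {"news_map"}},
    {"components": {"rss_feed"}, "outputs": {"news_digest"}},
]
print(suggest_outputs(repo, {"rss_feed"}))
```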
international conference on data engineering | 2010
Nilothpal Talukder; Mourad Ouzzani; Ahmed K. Elmagarmid; Hazem Elmeleegy; Mohamed Yakout
The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The disclosed sensitive information of friends, group affiliations and even participation in activities, such as tagging and commenting, are considered background knowledge in this process. In this paper, we present a privacy protection tool, called Privometer, that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model where a potentially malicious application installed in the user's friends' profiles can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect the user's privacy. We present an implementation of Privometer in Facebook.
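As a rough illustration of the leakage measure and the self-sanitization suggestions, the following sketch uses a naive majority-vote inference over friends' disclosed values as a stand-in for the probabilistic classifier described in the paper; all names and data are hypothetical.

```python
from collections import Counter

def leakage(friend_values):
    """Confidence an attacker gets for the most likely sensitive value,
    using a naive majority-vote inference over friends' disclosed values
    (an illustrative stand-in for a probabilistic classifier)."""
    if not friend_values:
        return 0.0
    counts = Counter(friend_values)
    return counts.most_common(1)[0][1] / len(friend_values)

def suggest_sanitization(friends):
    """Return the friend link whose hiding reduces leakage the most."""
    base = leakage([v for _, v in friends])
    best = None
    for i, (name, _) in enumerate(friends):
        remaining = [v for j, (_, v) in enumerate(friends) if j != i]
        drop = base - leakage(remaining)
        if best is None or drop > best[1]:
            best = (name, drop)
    return base, best

# Toy profile: friends and their publicly visible political affiliation.
friends = [("alice", "party_A"), ("bob", "party_A"), ("carol", "party_B")]
print(suggest_sanitization(friends))
```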
international conference on management of data | 2013
Meihui Zhang; Hazem Elmeleegy; Cecilia M. Procopiuc; Divesh Srivastava
We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates Out from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work. In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.
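A toy illustration of the verification step such an approach relies on: checking whether a candidate join over the base tables covers the given output table. The join and the data below are hypothetical and much simpler than the paper's lattice-based search.

```python
import itertools

def join(rows_a, rows_b, key_a, key_b):
    """Naive nested-loop equi-join (illustrative, not the paper's algorithm)."""
    return [{**a, **b} for a, b in itertools.product(rows_a, rows_b)
            if a[key_a] == b[key_b]]

def covers(result_rows, out_rows, out_cols):
    """Check that projecting the join result on out_cols yields a superset of Out."""
    projected = {tuple(r[c] for c in out_cols) for r in result_rows}
    return all(tuple(r[c] for c in out_cols) in projected for r in out_rows)

# Toy database: a star with 'orders' at the center joined to 'customers'.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bo"}]
orders = [{"oid": 10, "cid": 1, "total": 99}, {"oid": 11, "cid": 2, "total": 10}]
out = [{"name": "Ann", "total": 99}]

candidate = join(orders, customers, "cid", "cid")
print(covers(candidate, out, ["name", "total"]))  # True -> feasible candidate
```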
international conference on data engineering | 2008
Hazem Elmeleegy; Mourad Ouzzani; Ahmed K. Elmagarmid
Existing techniques for schema matching are classified as either schema-based, instance-based, or a combination of both. In this paper, we define a new class of techniques, called usage-based schema matching. The idea is to exploit information extracted from the query logs to find correspondences between attributes in the schemas to be matched. We propose methods to identify co-occurrence patterns between attributes in addition to other features such as their use in joins and with aggregate functions. Several scoring functions are considered to measure the similarity of the extracted features, and a genetic algorithm is employed to find the highest-score mappings between the two schemas. Our technique is suitable for matching schemas even when their attribute names are opaque. It can further be combined with existing techniques to obtain more accurate results. Our experimental study demonstrates the effectiveness of the proposed approach and the benefit of combining it with other existing approaches.
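A simplified sketch of the usage-based idea, assuming a query log represented as sets of attributes referenced per query; co-occurrence counts are compared under a candidate mapping. The paper uses richer features and a genetic search rather than this toy scoring; names below are hypothetical.

```python
from itertools import combinations
from collections import Counter

def cooccurrence(query_log):
    """Count how often pairs of attributes appear together in the same query."""
    counts = Counter()
    for attrs in query_log:
        for a, b in combinations(sorted(attrs), 2):
            counts[(a, b)] += 1
    return counts

def mapping_score(map_ab, co_a, co_b):
    """Score a candidate schema mapping by agreement of co-occurrence counts
    (illustrative scoring only)."""
    score = 0
    for (x, y), c in co_a.items():
        if x in map_ab and y in map_ab:
            pair = tuple(sorted((map_ab[x], map_ab[y])))
            score += min(c, co_b.get(pair, 0))
    return score

# Toy query logs: each entry is the set of attributes referenced by one query.
log_a = [{"cust_name", "cust_city"}, {"cust_name", "order_total"}]
log_b = [{"name", "city"}, {"name", "total"}]
candidate = {"cust_name": "name", "cust_city": "city", "order_total": "total"}
print(mapping_score(candidate, cooccurrence(log_a), cooccurrence(log_b)))
```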
very large data bases | 2013
Hazem Elmeleegy; Yinan Li; Yan Qi; Peter Wilmot; Mingxi Wu; Santanu Kolay; Ali Dasdan; Songting Chen
This paper gives an overview of Turn Data Management Platform (DMP). We explain the purpose of this type of platform, and show how it is positioned in the current digital advertising ecosystem. We also provide a detailed description of the key components in Turn DMP. These components cover the functions of (1) data ingestion and integration, (2) data warehousing and analytics, and (3) real-time data activation. For all components, we discuss the main technical and research challenges, as well as the alternative design choices. One of the main goals of this paper is to highlight the central role that data management is playing in shaping this fast-growing, multi-billion-dollar industry.
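A hypothetical skeleton, purely for orientation, mirroring the three component functions named above; the names and data shapes are illustrative and are not Turn's actual interfaces.

```python
def ingest(raw_events):
    """(1) Data ingestion and integration: collect segments per user."""
    profiles = {}
    for e in raw_events:
        profiles.setdefault(e["user"], []).append(e["segment"])
    return profiles

def analyze(profiles):
    """(2) Data warehousing and analytics: derive audience segment sizes."""
    sizes = {}
    for segments in profiles.values():
        for s in set(segments):
            sizes[s] = sizes.get(s, 0) + 1
    return sizes

def activate(profiles, user, campaign_segments):
    """(3) Real-time data activation: decide whether to target this user."""
    return bool(set(profiles.get(user, [])) & campaign_segments)

events = [{"user": "u1", "segment": "sports"}, {"user": "u2", "segment": "travel"}]
p = ingest(events)
print(analyze(p), activate(p, "u1", {"sports"}))
```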
international conference on management of data | 2010
Hazem Elmeleegy; Mourad Ouzzani; Ahmed K. Elmagarmid; Ahmad M. Abusalah
Peer-to-peer data integration - a.k.a. Peer Data Management Systems (PDMSs) - promises to extend the classical data integration approach to the Internet scale. Unfortunately, some challenges remain before realizing this promise. One of the biggest challenges is preserving the privacy of the exchanged data while it passes through several intermediate peers. Another challenge is protecting the mappings used for data translation. Protecting privacy without being unfair to any of the peers is a third challenge. This paper presents a novel query answering protocol in PDMSs to address these challenges. The protocol employs a technique based on noise selection and insertion to protect the query results, and a commutative encryption-based technique to protect the mappings and ensure fairness among peers. An extensive security analysis of the protocol shows that it is resilient to several possible types of attacks. We implemented the protocol within an established PDMS: the Hyperion system. We conducted an experimental study using real data from the healthcare domain. The results show that our protocol manages to achieve its privacy and fairness goals, while maintaining query processing time at the interactive level.
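The commutativity property such a protocol can build on, together with noise insertion, can be illustrated with a toy modular-exponentiation cipher. This is hypothetical code with insecure toy parameters, not the paper's protocol.

```python
import random

# Toy commutative cipher: modular exponentiation with a shared prime modulus.
# E_a(E_b(x)) == E_b(E_a(x)) is the property the protocol relies on.
P = 2**61 - 1  # a Mersenne prime; illustrative only, not a secure instantiation

def enc(x, key):
    return pow(x, key, P)

def mix_with_noise(real_values, n_noise=3):
    """Hide real query results among randomly generated noise values."""
    noise = [random.randrange(2, P) for _ in range(n_noise)]
    mixed = real_values + noise
    random.shuffle(mixed)
    return mixed

key_a, key_b = 65537, 257  # peers' private exponents (toy values)
x = 424242
print(enc(enc(x, key_a), key_b) == enc(enc(x, key_b), key_a))  # True
print(mix_with_noise([x]))
```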
very large data bases | 2015
Ahmed M. Aly; Ahmed S. Abdelhamid; Ahmed R. Mahmood; Walid G. Aref; Mohamed S. Hassan; Hazem Elmeleegy; Mourad Ouzzani
The ubiquity of location-aware devices, e.g., smartphones and GPS devices, has led to a plethora of location-based services in which huge amounts of geotagged information need to be efficiently processed by large-scale computing clusters. This demo presents AQWA, an adaptive and query-workload-aware data partitioning mechanism for processing large-scale spatial data. Unlike existing cluster-based systems, e.g., SpatialHadoop, that apply static partitioning of spatial data, AQWA has the ability to react to changes in the query-workload and data distribution. A key feature of AQWA is that it does not assume prior knowledge of the query-workload or data distribution. Instead, AQWA reacts to changes in both the data and the query-workload by incrementally updating the partitioning of the data. We demonstrate two prototypes of AQWA deployed over Hadoop and Spark. In both prototypes, we process spatial range and k-nearest-neighbor (kNN, for short) queries over large-scale spatial datasets, and we study the performance of AQWA under different query-workloads.
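A one-dimensional toy sketch of workload-aware repartitioning in the spirit described above: a partition is split when the split reduces an estimated cost of points-per-partition times overlapping queries. This is hypothetical code, far simpler than AQWA's spatial mechanism.

```python
def cost(points, queries, lo, hi):
    """Cost estimate for one partition: points inside it times the number
    of workload queries that overlap it (simplified to 1-D)."""
    n = sum(lo <= p < hi for p in points)
    q = sum(not (q_hi <= lo or q_lo >= hi) for q_lo, q_hi in queries)
    return n * q

def best_split(points, queries, lo, hi):
    """Pick the split point that minimizes the summed cost of the two halves."""
    base = cost(points, queries, lo, hi)
    best = (None, base)
    for s in sorted(set(points)):
        if lo < s < hi:
            c = cost(points, queries, lo, s) + cost(points, queries, s, hi)
            if c < best[1]:
                best = (s, c)
    return best

pts = [1, 2, 3, 50, 51, 52, 53]
qs = [(49, 55), (48, 54)]          # workload is skewed toward the right
print(best_split(pts, qs, 0, 100))  # best split separates the cold points from the hot region
```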
very large data bases | 2017
Wei Chit Tan; Meihui Zhang; Hazem Elmeleegy; Divesh Srivastava
Query reverse engineering seeks to re-generate the SQL query that produced a given query output table from a given database. In this paper, we solve this problem for OLAP queries with group-by and aggregation. We develop a novel three-phase algorithm named REGAL for this problem. First, based on a lattice graph structure, we identify a set of group-by candidates for the desired query. Second, we apply a set of aggregation constraints that are derived from the properties of aggregate operators at both the table-level and the group-level to discover candidate combinations of group-by columns and aggregations that are consistent with the given query output table. Finally, we find a multi-dimensional filter, i.e., a conjunction of selection predicates over the base table attributes, that is needed to generate the exact query output table. We conduct an extensive experimental study over the TPC-H dataset to demonstrate the effectiveness and efficiency of our proposal.
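A minimal sketch of the consistency test underlying the second phase, assuming a single aggregate over one column; the names are hypothetical and the lattice-based pruning is omitted.

```python
from collections import defaultdict

AGGREGATES = {"sum": sum, "min": min, "max": max,
              "count": len, "avg": lambda v: sum(v) / len(v)}

def consistent(base_rows, group_cols, agg_name, agg_col, out_rows):
    """Check whether GROUP BY group_cols with agg_name(agg_col) over the base
    table reproduces the given output rows (a simplified consistency test)."""
    groups = defaultdict(list)
    for r in base_rows:
        groups[tuple(r[c] for c in group_cols)].append(r[agg_col])
    computed = {k: AGGREGATES[agg_name](v) for k, v in groups.items()}
    expected = {tuple(r[c] for c in group_cols): r["agg"] for r in out_rows}
    return computed == expected

base = [{"region": "east", "price": 5}, {"region": "east", "price": 7},
        {"region": "west", "price": 3}]
out = [{"region": "east", "agg": 12}, {"region": "west", "agg": 3}]
print(consistent(base, ["region"], "sum", "price", out))  # True
```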
web search and data mining | 2016
Ahmed M. Aly; Hazem Elmeleegy; Yan Qi; Walid G. Aref
Despite the importance and widespread use of range data, e.g., time intervals, spatial ranges, etc., little attention has been devoted to studying the processing and querying of range data in the context of big data. The main challenge lies in the nature of traditional index structures, e.g., the B-Tree and R-Tree, which are centralized by design and hence perform poorly when deployed in a distributed environment. To address this challenge, this paper presents Kangaroo, a system built on top of Hadoop to optimize the execution of range queries over range data. The main idea behind Kangaroo is to split the data into non-overlapping partitions in a way that minimizes the query execution time. Kangaroo is query-workload-aware, i.e., it produces partitioning layouts that minimize the query processing time for given query patterns. In this paper, we study the design challenges Kangaroo addresses in order to be deployed on top of a distributed file system, i.e., HDFS. We also study four different partitioning schemes that Kangaroo can support. With extensive experiments using real range data of more than one billion records and a real query workload of more than 30,000 queries, we show that the partitioning schemes of Kangaroo can significantly reduce the I/O of range queries on range data.
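A toy illustration of why workload-aware partition boundaries reduce I/O, assuming a query reads every partition it overlaps in full; this is hypothetical, one-dimensional code, not Kangaroo's actual partitioning schemes.

```python
def query_io(records, bounds, query):
    """Records scanned by a range query: the query reads every partition it
    overlaps in full (illustrative I/O model over record start points)."""
    q_lo, q_hi = query
    io = 0
    for lo, hi in zip(bounds, bounds[1:]):
        if not (hi <= q_lo or lo >= q_hi):            # partition overlaps query
            io += sum(lo <= r < hi for r in records)  # whole partition is read
    return io

def workload_io(records, bounds, queries):
    return sum(query_io(records, bounds, q) for q in queries)

records = list(range(0, 100, 2))        # record start points, uniform here
queries = [(10, 20), (12, 18), (14, 22)]
uniform = [0, 50, 100]                  # workload-oblivious, equal-width cuts
aware = [0, 10, 22, 100]                # boundaries hugging the hot range
print(workload_io(records, uniform, queries),
      workload_io(records, aware, queries))  # aware layout scans far fewer records
```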
very large data bases | 2018
Wei Chit Tan; Meihui Zhang; Hazem Elmeleegy; Divesh Srivastava
The goal of query reverse engineering is to re-generate the SQL query that produced a given result from some known database. The problem has many real-world applications where users need to better understand the lineage and trustworthiness of various data reports, even when the authors of those reports are no longer reachable or able to provide the required explanations. It becomes more challenging as the complexity of both the query and the database schema increases. Prior work has addressed the reverse engineering of constrained types of SQL queries, sometimes on constrained schemas, such as single-table schemas. In this demonstration, we present a framework called REGAL, which builds upon and extends prior work to enable the discovery of Select-Project-Join-Aggregation (SPJA) queries over arbitrary schemas. Without any prior schema knowledge or SQL expertise, the user only needs to upload a data report (e.g., as a spreadsheet), and the system will automatically compute and display the queries capable of generating that report from the database.