Amr El-Helw
University of Waterloo
Publications
Featured research published by Amr El-Helw.
International Conference on Management of Data | 2014
Mohamed A. Soliman; Lyublena Antova; Venkatesh Raghavan; Amr El-Helw; Zhongxian Gu; Entong Shen; George Constantin Caragea; Carlos Garcia-Alvarado; Foyzur Rahman; Michalis Petropoulos; Florian Waas; Sivaramakrishnan Narayanan; Konstantinos Krikellas; Rhonda Baldwin
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer. In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with our own original research, resulting in a modular and portable optimizer architecture. In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
International Conference on Data Engineering | 2007
Amr El-Helw; Ihab F. Ilyas; Wing Lau; Volker Markl; Calisto Zuzarte
Traditional DBMSs decouple statistics collection and query optimization both in space and time. Decoupling in time may lead to outdated statistics. Decoupling in space may cause statistics not to be available at the desired granularity needed to optimize a particular query, or some important statistics may not be available at all. Overall, this decoupling often leads to large cardinality estimation errors and, in consequence, to the selection of suboptimal plans for query execution. In this paper, we present JITS, a system for proactively collecting query-specific statistics during query compilation. The system employs a lightweight sensitivity analysis to choose which statistics to collect by making use of previously collected statistics and database activity patterns. The collected statistics are materialized and incrementally updated for later reuse. We present the basic concepts, architecture, and key features of JITS. We demonstrate its benefits through an extensive experimental study on a prototype inside the IBM DB2 engine.
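The selection step described in the abstract — choosing which query-specific statistics to collect at compilation time — can be illustrated with a small sketch. This is a hypothetical simplification, not the JITS implementation: the staleness threshold, the fixed budget, and the catalog shape are all invented for the example.

```python
# Hypothetical sketch of just-in-time statistics selection (not the actual
# JITS algorithm): among the columns referenced by the query's predicates,
# prefer columns with no statistics at all, then the stalest statistics,
# within a fixed compilation-time budget.
from dataclasses import dataclass

@dataclass
class ColumnStats:
    column: str
    age: int       # table updates since these statistics were collected
    exists: bool   # whether statistics were ever collected

def choose_stats_to_collect(predicate_columns, catalog, budget=2, max_age=1000):
    """Return up to `budget` columns worth collecting fresh statistics for."""
    candidates = []
    for col in predicate_columns:
        stats = catalog.get(col)
        if stats is None or not stats.exists:
            candidates.append((float("inf"), col))   # missing stats first
        elif stats.age > max_age:
            candidates.append((stats.age, col))      # then the stalest stats
    candidates.sort(reverse=True)
    return [col for _, col in candidates[:budget]]

catalog = {
    "price": ColumnStats("price", age=5000, exists=True),
    "region": ColumnStats("region", age=10, exists=True),
}
# "category" has no stats at all, "price" is stale, "region" is fresh.
picked = choose_stats_to_collect(["price", "region", "category"], catalog)
```

The actual system additionally reuses previously collected statistics and incrementally maintains them, which this sketch omits.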
International Conference on Management of Data | 2014
Lyublena Antova; Amr El-Helw; Mohamed A. Soliman; Zhongxian Gu; Michalis Petropoulos; Florian Waas
Partitioning of tables based on value ranges provides a powerful mechanism to organize tables in database systems. In the context of data warehousing and large-scale data analysis, partitioned tables are of particular interest as the nature of queries favors scanning large swaths of data. In this scenario, eliminating partitions from a query plan that contain data not relevant to answering a given query can represent substantial performance improvements. Dealing with partitioned tables in query optimization has attracted significant attention recently, yet a number of challenges unique to Massively Parallel Processing (MPP) databases and their distributed nature remain unresolved. In this paper, we present optimization techniques for queries over partitioned tables as implemented in Pivotal Greenplum Database. We present a concise and unified representation for partitioned tables and devise optimization techniques to generate query plans that can defer decisions on accessing certain partitions to query run-time. We demonstrate that the resulting query plans distinctly outperform conventional query plans in a variety of scenarios.
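The core idea — eliminating partitions at plan time when the predicate is a literal, and deferring the decision to run-time when it depends on a parameter — can be sketched as follows. This is a hypothetical illustration, not Greenplum's internal representation; the partition layout and function names are invented for the example.

```python
# Hypothetical sketch of range-partition elimination: a partition survives
# only if its value range overlaps the query's range predicate.
partitions = [("p2020", 2020, 2021), ("p2021", 2021, 2022), ("p2022", 2022, 2023)]

def prune(partitions, lo, hi):
    """Keep partitions whose half-open range [p_lo, p_hi) overlaps [lo, hi)."""
    return [name for name, p_lo, p_hi in partitions if p_lo < hi and lo < p_hi]

# Plan time: the literal predicate `year >= 2021` prunes p2020 immediately.
static_plan = prune(partitions, 2021, float("inf"))

# Run time: for a parameterized predicate `year >= $1`, the pruning decision
# is deferred until the parameter is bound during execution.
def deferred_scan(partitions, param):
    return prune(partitions, param, float("inf"))

runtime_plan = deferred_scan(partitions, 2022)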
Very Large Data Bases | 2009
Amr El-Helw; Ihab F. Ilyas; Calisto Zuzarte
Database statistics are crucial to cost-based optimizers for estimating the execution cost of a query plan. Using traditional basic statistics on base tables requires adopting unrealistic assumptions to estimate the cardinalities of intermediate results, which usually causes estimation errors that can reach several orders of magnitude. Modern commercial database systems support statistical or sample views, which give more accurate statistics on intermediate results and query sub-expressions. While previous research focused on creating and maintaining these advanced statistics, little effort has been directed towards automatically recommending the most beneficial statistical views to construct. In this paper, we present StatAdvisor, a system for recommending statistical views for a given SQL workload. StatAdvisor addresses the special characteristics of statistical views with respect to view matching and benefit estimation, and introduces a novel plan-based candidate enumeration method and a benefit-based analysis to determine the most useful statistical views. We present the basic concepts, architecture, and key features of StatAdvisor, and demonstrate its validity and benefits through an extensive experimental study using a prototype that we built in the IBM® DB2® database system as part of the DB2 Design Advisor tools.
Data Warehousing and OLAP | 2011
Amr El-Helw; Kenneth A. Ross; Bishwaranjan Bhattacharjee; Christian A. Lang; George A. Mihaila
Column-oriented DBMSs have gained increasing interest due to their superior performance for analytical workloads. Prior efforts tried to determine the possibility of simulating the query processing techniques of column-oriented systems in row-oriented databases, in the hope of improving their performance, especially for OLAP and data warehousing applications. In this paper, we show that column-oriented query processing can significantly improve the performance of row-oriented DBMSs. We introduce new operators that take into account the unique characteristics of data obtained from indexes, and exploit new technologies such as flash SSDs and multi-core processors to boost performance. We demonstrate our approach with an experimental study using a prototype built on a commercial row-oriented DBMS.
Very Large Data Bases | 2015
Amr El-Helw; Venkatesh Raghavan; Mohamed A. Soliman; George Constantin Caragea; Zhongxian Gu; Michalis Petropoulos
Big Data analytics often include complex queries with similar or identical expressions, usually referred to as Common Table Expressions (CTEs). CTEs may be explicitly defined by users to simplify query formulations, or implicitly included in queries generated by business intelligence tools, financial applications and decision support systems. In Massively Parallel Processing (MPP) database systems, CTEs pose new challenges due to the distributed nature of query processing, the overwhelming volume of underlying data and the scalability criteria that systems are required to meet. In these settings, the effective optimization and efficient execution of CTEs are crucial for the timely processing of analytical queries over Big Data. In this paper, we present a comprehensive framework for the representation, optimization and execution of CTEs in the context of Orca, Pivotal's query optimizer for Big Data. We demonstrate experimentally the benefits of our techniques using an industry-standard decision support benchmark.
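A central optimization question for CTEs is whether to inline the common expression into each consumer or materialize it once and share the result. The trade-off can be sketched with a toy cost comparison; this is a hypothetical illustration with invented cost parameters, not Orca's actual cost model.

```python
# Hypothetical cost sketch of the inline-vs-materialize decision for a CTE:
# inlining re-executes the expression once per consumer, while materializing
# pays the expression once plus a write and a per-consumer read of the
# intermediate result.
def cte_plan(expr_cost, num_consumers, write_cost, read_cost):
    inline = expr_cost * num_consumers
    materialize = expr_cost + write_cost + read_cost * num_consumers
    return "inline" if inline <= materialize else "materialize"

# A cheap CTE referenced twice is better inlined...
plan_a = cte_plan(expr_cost=10, num_consumers=2, write_cost=50, read_cost=5)
# ...while an expensive CTE with many consumers is better materialized once.
plan_b = cte_plan(expr_cost=100, num_consumers=4, write_cost=50, read_cost=5)
```

In an MPP setting the real decision also accounts for data distribution and motion costs, which this single-node sketch leaves out.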
Very Large Data Bases | 2018
Bart Samwel; Himani Apte; Felix Weigel; David Wilhite; Jiacheng Yang; Jun Xu; Jiexing Li; Zhan Yuan; Craig Chasseur; Qiang Zeng; Ian Rae; John Cieslewicz; Anurag Biyani; Andrew Harn; Yang Xia; Andrey Gubichev; Amr El-Helw; Orri Erling; Zhepeng Yan; Mohan Yang; Yiqun Wei; Thanh Do; Ben Handy; Colin Zheng; Goetz Graefe; Somayeh Sardashti; Ahmed M. Aly; Divy Agrawal; Ashish Gupta; Shiv Venkataraman
F1 Query is a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats as well as different storage systems at Google (e.g., Bigtable, Spanner, Google Spreadsheets). F1 Query eliminates the need to maintain the traditional distinction between different types of data processing workloads by simultaneously supporting: (i) OLTP-style point queries that affect only a few records; (ii) low-latency OLAP querying of large amounts of data; and (iii) large ETL pipelines. F1 Query has also significantly reduced the need for developing hard-coded data processing pipelines by enabling declarative queries integrated with custom business logic. F1 Query satisfies key requirements that are highly desirable within Google: (i) it provides a unified view over data that is fragmented and distributed over multiple data sources; (ii) it leverages datacenter resources for performant query processing with high throughput and low latency; (iii) it provides high scalability for large data sizes by increasing computational parallelism; and (iv) it is extensible and uses innovative approaches to integrate complex business logic in declarative query processing. This paper presents the end-to-end design of F1 Query. Evolved out of F1, the distributed database originally built to manage Google's advertising data, F1 Query has been in production for multiple years at Google and serves the querying needs of a large number of users and systems.
International Conference on Management of Data | 2012
Amr El-Helw; Mina H. Farid; Ihab F. Ilyas
Archive | 2014
Lyublena Antova; Amr El-Helw; Mohamed A. Soliman; Zhongxian Gu; Michalis Petropoulos; Florian Waas
Archive | 2014
Amr El-Helw; Venkatesh Raghavan; Mohamed A. Soliman; George Constantin Caragea; Michalis Petropoulos