Mohammed Al-Kateb | Researchain

Publication

Featured researches published by Mohammed Al-Kateb.

extending database technology | 2013

Temporal query processing in Teradata

Mohammed Al-Kateb; Ahmad Ghazal; Alain Crolotte; Ramesh Bhashyam; Jaiprakash Chimanchode; Sai Pavan Pakala

The importance of temporal data management is evident from the temporal features recently released in major commercial database systems. In Teradata, the temporal feature is based on the TSQL2 specification. In this paper, we present Teradata's implementation approach for temporal query processing. There are two common approaches to supporting temporal query processing in a database engine. One is through functional query rewrites that convert a temporal query to a semantically equivalent non-temporal counterpart, mostly by adding time-based constraints. The other is native support that implements temporal database operations such as scans and joins directly in the DBMS internals. These approaches have competing pros and cons. The rewrite approach is generally simpler to implement, but it adds structural complexity to the original query, which can pose a challenge to the query optimizer and cause it to generate sub-optimal plans. Native support is expected to perform better, but it usually involves a higher cost of implementation, maintenance, and extension. We discuss why and describe how Teradata adopted the rewrite approach. In addition, we present an evaluation of our approach through a performance study conducted on a variation of the TPC-H benchmark with temporal tables and queries.
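The rewrite idea described in this abstract can be illustrated with a minimal sketch. It is not Teradata's rewrite logic: the table `part_history`, the columns `valid_start` and `valid_end`, and the helper `rewrite_as_of` are all hypothetical, and SQLite stands in for the database engine. The sketch only shows how an "as of" question becomes an ordinary query with added time-based constraints.

```python
import sqlite3


def rewrite_as_of(select_list, table, where=None):
    """Build a non-temporal query by adding a validity-period predicate.

    Assumes (hypothetically) that each row carries a half-open validity
    period [valid_start, valid_end) stored as ISO date strings.
    """
    time_pred = "valid_start <= :as_of AND :as_of < valid_end"
    cond = f"({where}) AND {time_pred}" if where else time_pred
    return f"SELECT {select_list} FROM {table} WHERE {cond}"


def run_as_of(as_of):
    """Load a tiny illustrative history table and query it as of a date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE part_history "
        "(partkey INTEGER, price REAL, valid_start TEXT, valid_end TEXT)"
    )
    conn.executemany(
        "INSERT INTO part_history VALUES (?, ?, ?, ?)",
        [
            (1, 10.0, "2012-01-01", "2012-07-01"),  # superseded version
            (1, 12.0, "2012-07-01", "9999-12-31"),  # current version
            (2, 5.0, "2012-03-01", "9999-12-31"),
        ],
    )
    sql = rewrite_as_of("partkey, price", "part_history", "price > 4")
    rows = conn.execute(sql, {"as_of": as_of}).fetchall()
    conn.close()
    return sorted(rows)
```

The added predicate is exactly the kind of structural complexity the abstract mentions: every temporal table referenced in a query contributes extra conditions that the optimizer then has to plan around.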

international conference on management of data | 2012

Adaptive optimizations of recursive queries in Teradata

Ahmad Ghazal; Dawit Seid; Alain Crolotte; Mohammed Al-Kateb

Recursive queries were introduced as part of ANSI SQL 99 to support processing of hierarchical data typical of air flight schedules, bill-of-materials, data cube dimension hierarchies, and ancestor-descendant information (e.g., XML data stored in relations). Recently, recursive queries have also found extensive use in web data analysis such as social network and click stream data. Teradata implemented recursive queries in V2R6 using static plans whereby a query is executed in multiple iterations, each iteration corresponding to one level of the recursion. Such a static planning strategy may not be optimal since the demographics of intermediate results from recursive iterations often vary to a great extent. Gathering feedback at each iteration could address this problem by providing size estimates to the optimizer which, in turn, can produce an execution plan for the next iteration. However, such a full feedback scheme suffers from a lack of pipelining and the inability to exploit global optimizations across the different recursion iterations. In this paper, we propose adaptive optimization techniques that avoid the issues with both static and full feedback optimization approaches. Our approach employs a mix of multi-iteration pre-planning and dynamic feedback techniques which are generally applicable to any recursive query implementation in an RDBMS. We also validated the effectiveness of our proposed techniques by conducting experiments on a prototype implementation using real-life social network data from the FriendFeed online blogging service.
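To make the iteration-per-level execution model concrete, here is a small sketch of a reachability query evaluated one recursion level at a time. It is an illustration only, not the planning technique from the paper: the function name `reachable_by_level` is hypothetical, and already-visited nodes are dropped at each level, which is a simplification. The recorded per-iteration sizes are the kind of intermediate-result demographics that a feedback scheme would hand to an optimizer.

```python
def reachable_by_level(edges, seeds):
    """Evaluate reachability iteratively, one recursion level per iteration.

    Returns the set of reached nodes and the size of the intermediate
    result (the newly discovered nodes) at each iteration.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)

    reached = set(seeds)
    delta = set(seeds)  # rows produced by the previous iteration
    sizes = []
    while delta:
        sizes.append(len(delta))
        # Join the previous level with the edge relation, then drop
        # nodes that were already reached (simplifying assumption).
        next_delta = {
            dst for src in delta for dst in adjacency.get(src, ())
        } - reached
        reached |= next_delta
        delta = next_delta
    return reached, sizes
```

Because the sizes can differ widely from one level to the next, a plan chosen once for all iterations may fit some levels poorly, which is the weakness of static planning that the abstract points out.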

Technology Conference on Performance Evaluation and Benchmarking | 2012

Adding a Temporal Dimension to the TPC-H Benchmark

Mohammed Al-Kateb; Alain Crolotte; Ahmad Ghazal; Linda Rose

The importance of time in decision support is widely recognized and has been addressed through temporal applications or through native temporal features by major DBMS vendors. In this paper we propose a framework for adding a new temporal component to the TPC-H benchmark. Our proposal includes temporal DDL, procedures to populate the temporal tables via insert-select thereby providing history, and temporal queries based on a workload that covers the temporal dimension broken down as current, history, and both. The queries we define as part of this benchmark include the typical SQL operators involved in scans, joins and aggregations. The paper concludes with experimental results. While in this paper we consider adding temporal history to a subset of the TPC-H benchmark tables, namely Part/Supplier/Partsupp, our proposed framework addresses a need and uses, as a starting point, a benchmark that is widely successful and well understood.

international conference on data engineering | 2017

BigBench V2: The New and Improved BigBench

Ahmad Ghazal; Todor Ivanov; Pekka Kostamaa; Alain Crolotte; Ryan Voong; Mohammed Al-Kateb; Waleed Ghazal; Roberto V. Zicari

Benchmarking Big Data solutions has been gaining a lot of attention from research and industry. BigBench is one of the most popular benchmarks in this area and was adopted by the TPC as TPCx-BB. BigBench, however, has key shortcomings. The structured component of the data model is the same as the TPC-DS data model, which is a complex snowflake-like schema. This is contrary to the simple star schema Big Data models in real life. BigBench also treats the semi-structured web-logs more or less as a structured table. In real life, web-logs are modeled as key-value pairs with unknown schema. Specific keys are captured at query time, a process referred to as late binding. In addition, eleven (out of thirty) of the BigBench queries are TPC-DS queries. These queries are complex SQL applied on the structured part of the data model, which again is not typical of Big Data workloads. In this paper, we present BigBench V2 to address the aforementioned limitations of the original BigBench. BigBench V2 is completely independent of TPC-DS with a new data model and an overhauled workload. The new data model has a simple structured data model. Web-logs are modeled as key-value pairs with a substantial and variable number of keys. BigBench V2 mandates late binding by requiring query processing to be done directly on key-value web-logs rather than a pre-parsed form of them. A new scale factor-based data generator is implemented to produce structured tables, key-value semi-structured web-logs, and unstructured data. We implemented and executed BigBench V2 on Hive. Our proof of concept shows the feasibility of BigBench V2 and outlines different ways of implementing late binding.
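Late binding, as described in this abstract, means the keys of interest are extracted from raw key-value logs when the query runs rather than during loading. The sketch below illustrates that idea under stated assumptions: the one-JSON-object-per-line log format, the key names, and the function `count_by_key` are all hypothetical and are not taken from the BigBench V2 specification.

```python
import json


def count_by_key(raw_lines, key):
    """Count occurrences of each value of `key`, parsing logs at query time.

    Each raw line is assumed to be a JSON object whose set of keys can
    vary from line to line. Lines without the requested key are skipped.
    """
    counts = {}
    for line in raw_lines:
        record = json.loads(line)  # the schema is discovered here, per line
        if key in record:
            value = record[key]
            counts[value] = counts.get(value, 0) + 1
    return counts


# Illustrative web-log lines with a variable number of keys.
SAMPLE_LOGS = [
    '{"user": "u1", "page": "home"}',
    '{"user": "u2", "page": "cart", "item": "42"}',
    '{"user": "u1", "page": "cart", "coupon": "X"}',
]
```

The raw lines are never converted into a fixed-column table, so a key that appears in only a few records (such as `item` above) can still be queried without any schema change.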

database and expert systems applications | 2012

An Efficient SQL Rewrite Approach for Temporal Coalescing in the Teradata RDBMS

Mohammed Al-Kateb; Ahmad Ghazal; Alain Crolotte

The importance of temporal data management is manifested by considerable attention from the database research community. This importance is becoming even more evident with the recent increase in support for temporal features in major commercial database systems. Among these systems, Teradata offers native support for a wide range of temporal analytics. In this paper, we address the problem of temporal coalescing in the Teradata RDBMS. Temporal coalescing is a key temporal query processing operation, which merges adjacent or overlapping timestamps of value-equivalent rows. Among existing approaches to implementing temporal coalescing, an SQL-based approach is perhaps the most feasible and the easiest to apply. Along this direction, we propose an efficient SQL rewrite approach to implement temporal coalescing in the Teradata RDBMS by leveraging runtime conditional partitioning (a Teradata enhancement to ANSI ordered analytic functions), which makes it possible to express the coalescing semantics in an optimized, join-free, single-scan SQL query. We evaluated our proposed approach on a system running Teradata 14.0 with a performance study that demonstrates its efficiency.
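The coalescing operation itself can be shown with a short sketch. This is a plain sort-and-scan in Python, not the SQL rewrite or the runtime conditional partitioning feature from the paper. It assumes rows of the form (value, start, end) with half-open periods [start, end), and the function name `coalesce_periods` is hypothetical.

```python
def coalesce_periods(rows):
    """Merge adjacent or overlapping periods of value-equivalent rows.

    `rows` is an iterable of (value, start, end) tuples with half-open
    periods [start, end). After sorting, a single pass either extends
    the last output period or starts a new one.
    """
    result = []
    for value, start, end in sorted(rows):
        if result and result[-1][0] == value and start <= result[-1][2]:
            # Same value and the period touches or overlaps the previous one.
            _, prev_start, prev_end = result[-1]
            result[-1] = (value, prev_start, max(prev_end, end))
        else:
            result.append((value, start, end))
    return result
```

For example, the periods [1, 3), [3, 5) and [4, 6) for the same value collapse into [1, 6), while a later period [7, 9) stays separate because of the gap between 6 and 7.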

international conference on data engineering | 2017

Dynamic Statistics Collection in the Teradata Unified Data Architecture

Sung Jin Kim; Mohammed Al-Kateb; Paul Sinclair; Alain Crolotte; Chengyang Zhang; Linda Rose

The Unified Data Architecture (UDA) of Teradata is an inclusive multisystem data analytics solution. A key challenge for query optimization under the UDA is to find optimal plans for queries that access data on heterogeneous remote data stores. The challenge comes primarily from the lack of statistics for data stored on remote systems. In this paper, we present techniques implemented in Teradata Database for dynamically collecting statistics on data fetched from a remote system and feeding these statistics back to the query optimizer during query execution. We demonstrate the performance impact of dynamic statistics collection and feedback with experiments conducted on a system that consists of Teradata Database and a remote Hadoop server.

international conference on data engineering | 2017

Optimizing UNION ALL Join Queries in Teradata

Mohammed Al-Kateb; Paul Sinclair; Alain Crolotte; Lu Ma; Grace Au; Sanjay Nair

The UNION ALL set operator is useful for combining data from multiple sources. With the emergence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important. In this paper, we present optimization techniques implemented in Teradata Database for join queries with UNION ALL. Instead of spooling all branches of UNION ALL before performing join operations, we propose cost-based pushing of joins into branches. Join pushing not only addresses the prohibitive cost of spooling all branches, but also helps in exposing more efficient join methods (e.g., direct hash-based joins) which, otherwise, would not be considered by the query optimizer. The geography of relations being pushed to UNION ALL branches is also adjusted to avoid unnecessary redistributions and duplications of data. We conclude the paper with a performance study that demonstrates the impact of the proposed optimization techniques on query performance.
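The equivalence that join pushing relies on can be checked with a small sketch: joining the concatenation of all branches gives the same multiset of rows as joining each branch separately and concatenating the results. The code below is an illustration under assumptions, not Teradata's optimizer logic. Rows are plain tuples joined on their first field, the function names are hypothetical, and the cost-based decision and geography adjustment described in the abstract are not modeled.

```python
from collections import Counter


def hash_join(left, right):
    """Equi-join two lists of tuples on their first field."""
    index = {}
    for row in right:
        index.setdefault(row[0], []).append(row)
    return [l + r for l in left for r in index.get(l[0], [])]


def join_after_union(branches, other):
    """Baseline: materialize (spool) all branches, then join once."""
    spool = [row for branch in branches for row in branch]
    return hash_join(spool, other)


def join_pushed_into_branches(branches, other):
    """Alternative: join each branch separately, then concatenate."""
    return [row for branch in branches for row in hash_join(branch, other)]


def same_multiset(a, b):
    """Compare two result lists while ignoring row order."""
    return Counter(a) == Counter(b)
```

In the pushed form no combined spool of all branches is built, and each branch can in principle use whichever join method suits its own size and layout.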

very large data bases | 2016

Hybrid row-column partitioning in Teradata®

Mohammed Al-Kateb; Paul Sinclair; Grace Au; Carrie Ballinger

Data partitioning is an indispensable ingredient of database systems due to the performance improvement it can bring to any given mixed workload. Data can be partitioned horizontally or vertically. While some commercial proprietary and open source database systems have one flavor or mixed flavors of these partitioning forms, Teradata Database offers a unique hybrid row-column store solution that seamlessly combines both of these partitioning schemes. The key feature of this hybrid solution is that row, column, and combined partitions are all stored and handled in the same way internally by the underlying file system storage layer. In this paper, we present the main characteristics and explain the implementation approach of Teradata's row-column store. We also discuss query optimization techniques applicable specifically to partitioned tables. Furthermore, we present a performance study that demonstrates how different partitioning options impact the performance of various queries.

international conference on management of data | 2018

Joins over UNION ALL Queries in Teradata®: Demonstration of Optimized Execution

Mohammed Al-Kateb; Paul Sinclair; Grace Au; Sanjay Nair; Mark Sirek; Lu Ma; Mohamed Y. Eltabakh

The UNION ALL set operator is useful for combining data from multiple sources. With the emergence and prevalence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important in many analytical queries. In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables. Instead of the naive and traditional way of spooling each UNION ALL branch to a common spool prior to performing join operations, which can be prohibitively expensive, we demonstrate new techniques developed in Teradata Database including: 1) cost-based pushing of joins into UNION ALL branches, 2) a branch grouping strategy prior to join pushing, 3) geography adjustment of the pushed relations to avoid unnecessary redistribution or duplication, 4) iterative join decomposition of a pushed join into multiple joins, and 5) combining multiple join steps into a single multisource join step. In the demonstration, we use the Teradata Visual Explain tool, which offers a rich set of visual rendering capabilities for query plans, the display of various metadata for each plan step, and several interactive GUI options for end users.
