Publication


Featured research published by Alain Crolotte.


International Conference on Extending Database Technology | 2013

Temporal query processing in Teradata

Mohammed Al-Kateb; Ahmad Ghazal; Alain Crolotte; Ramesh Bhashyam; Jaiprakash Chimanchode; Sai Pavan Pakala

The importance of temporal data management is evident from the temporal features recently released in major commercial database systems. In Teradata, the temporal feature is based on the TSQL2 specification. In this paper, we present Teradata's implementation approach for temporal query processing. There are two common approaches to supporting temporal query processing in a database engine. One is functional query rewriting, which converts a temporal query into a semantically equivalent non-temporal counterpart, mostly by adding time-based constraints. The other is native support, which implements temporal database operations such as scans and joins directly in the DBMS internals. These approaches have competing pros and cons. The rewrite approach is generally simpler to implement, but it adds structural complexity to the original query, which can challenge the query optimizer and cause it to generate sub-optimal plans. Native support is expected to perform better, but it usually involves a higher cost of implementation, maintenance, and extension. We discuss why and describe how Teradata adopted the rewrite approach. In addition, we present an evaluation of our approach through a performance study conducted on a variation of the TPC-H benchmark with temporal tables and queries.
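
To illustrate the rewrite approach, here is a minimal sketch of how a current-time temporal query might be converted into a non-temporal equivalent. The table, column names, and exact rewrite are illustrative assumptions, not the paper's actual transformation rules:

-- Temporal query using Teradata's TSQL2-style qualifier:
CURRENT VALIDTIME
SELECT policy_id, premium
FROM policy;

-- Hypothetical non-temporal rewrite: the VALIDTIME qualifier becomes
-- an explicit time-based constraint on the valid-time period column.
SELECT policy_id, premium
FROM policy
WHERE BEGIN(policy_duration) <= CURRENT_DATE
  AND END(policy_duration) > CURRENT_DATE;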


International Conference on Management of Data | 2009

Dynamic plan generation for parameterized queries

Ahmad Ghazal; Dawit Seid; Bhashyam Ramesh; Alain Crolotte; Manjula Koppuravuri; Vinod G

Query processing in a DBMS typically involves two distinct phases: compilation, which generates the best plan and its corresponding execution steps, and execution, which evaluates these steps against database objects. For some queries, considerable resource savings can be achieved by skipping the compilation phase when the same query was previously submitted and its plan was already cached. In a number of important applications the same query, called a Parameterized Query (PQ), is repeatedly submitted in the same basic form but with different parameter values. PQs are extensively used in both data update (e.g. batch update programs) and data access queries. There are tradeoffs associated with caching and re-using query plans, such as space utilization and maintenance cost. Moreover, pre-compiled plans may be suboptimal for a particular execution for various reasons, including data skew and the inability to exploit value-based query transformations such as materialized view rewrites and unsatisfiable predicate elimination. We address these tradeoffs by distinguishing two types of plans for PQs: generic and specific. Generic plans are pre-compiled plans that are independent of the actual parameter values; prior to execution, parameter values are plugged into the generic plan. For specific plans, parameter values are plugged in prior to the compilation phase. This paper provides a practical framework for dynamically deciding between specific and generic plans for PQs based on a mix of rule- and cost-based heuristics, implemented in the Teradata 12.0 DBMS.
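
A minimal sketch of the generic/specific distinction on an illustrative TPC-H-style query; the parameter-marker syntax and the concrete values are assumptions for illustration, not taken from the paper:

-- Parameterized query submitted repeatedly with different values:
SELECT o_orderkey, o_totalprice
FROM orders
WHERE o_custkey = ?            -- parameter marker
  AND o_orderdate >= ?;

-- Generic plan: compiled once without looking at the values; each
-- execution plugs the values into the cached plan, skipping compilation.
-- Specific plan: the values are substituted first, then the optimizer
-- compiles the resulting query, so it can exploit value-specific
-- statistics and rewrites:
SELECT o_orderkey, o_totalprice
FROM orders
WHERE o_custkey = 4215
  AND o_orderdate >= DATE '2009-01-01';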


International Conference on Management of Data | 2012

Adaptive optimizations of recursive queries in Teradata

Ahmad Ghazal; Dawit Seid; Alain Crolotte; Mohammed Al-Kateb

Recursive queries were introduced in ANSI SQL:1999 to support processing of hierarchical data typical of air flight schedules, bills of materials, data cube dimension hierarchies, and ancestor-descendant information (e.g. XML data stored in relations). Recently, recursive queries have also found extensive use in web data analysis, such as social network and click-stream data. Teradata implemented recursive queries in V2R6 using static plans, whereby a query is executed in multiple iterations, each corresponding to one level of the recursion. Such a static planning strategy may not be optimal, since the demographics of the intermediate results produced by recursive iterations often vary to a great extent. Gathering feedback at each iteration could address this problem by providing size estimates to the optimizer, which, in turn, can produce an execution plan for the next iteration. However, such a full-feedback scheme suffers from a lack of pipelining and an inability to exploit global optimizations across the recursion iterations. In this paper, we propose adaptive optimization techniques that avoid the issues of both the static and the full-feedback approaches. Our approach employs a mix of multi-iteration pre-planning and dynamic feedback techniques that are generally applicable to any recursive query implementation in an RDBMS. We also validated the effectiveness of the proposed techniques by conducting experiments on a prototype implementation using real-life social network data from the FriendFeed online blogging service.
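
For reference, a minimal ANSI-style recursive query of the kind discussed; the follows table and the depth cap are illustrative, not taken from the paper:

-- All users reachable from user 1 through "follows" edges. Each
-- iteration of the recursion corresponds to one level, and the size of
-- its intermediate result can differ greatly from the previous level's,
-- which is what motivates adaptive planning.
WITH RECURSIVE reachable (user_id, depth) AS (
  SELECT followee_id, 1
  FROM follows
  WHERE follower_id = 1
  UNION ALL
  SELECT f.followee_id, r.depth + 1
  FROM reachable r
  JOIN follows f ON f.follower_id = r.user_id
  WHERE r.depth < 5
)
SELECT user_id, depth
FROM reachable;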


Technology Conference on Performance Evaluation and Benchmarking | 2012

Adding a Temporal Dimension to the TPC-H Benchmark

Mohammed Al-Kateb; Alain Crolotte; Ahmad Ghazal; Linda Rose

The importance of time in decision support is widely recognized and has been addressed through temporal applications or through native temporal features from major DBMS vendors. In this paper we propose a framework for adding a new temporal component to the TPC-H benchmark. Our proposal includes temporal DDL, procedures to populate the temporal tables via insert-select, thereby providing history, and temporal queries based on a workload that covers the temporal dimension broken down as current, history, and both. The queries we define as part of this benchmark include the typical SQL operators involved in scans, joins, and aggregations. The paper concludes with experimental results. While in this paper we add temporal history to only a subset of the TPC-H tables, namely Part, Supplier, and Partsupp, our proposed framework addresses a real need and uses, as a starting point, a benchmark that is widely successful and well understood.
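
A minimal sketch of what the temporal DDL and the history-populating insert-select might look like, assuming Teradata-style valid-time syntax; the column set, period bounds, and table name are illustrative assumptions, not the benchmark's actual definitions:

-- Temporal variant of the TPC-H PART table with a valid-time column.
CREATE TABLE part_hist (
  p_partkey     INTEGER NOT NULL,
  p_retailprice DECIMAL(15,2),
  p_validtime   PERIOD(DATE) NOT NULL AS VALIDTIME
);

-- Populate history via insert-select from the original table,
-- assigning an initial validity period to every row.
INSERT INTO part_hist
SELECT p_partkey, p_retailprice,
       PERIOD(DATE '1992-01-01', DATE '9999-12-31')
FROM part;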


TPC Technology Conference | 2011

Introducing skew into the TPC-H benchmark

Alain Crolotte; Ahmad Ghazal

While uniform data distributions were a design choice for the TPC-D benchmark and its successor TPC-H, it is universally recognized that data skew is prevalent in data warehousing. A modern benchmark should therefore provide a test bed for evaluating the ability of database engines to handle skew. This paper introduces a concrete and practical way to introduce skew into the TPC-H data model by modifying the customer and supplier tables to reflect non-uniform customer and supplier populations. The first proposal defines per-nation customer and supplier populations roughly proportional to the actual populations of those nations. In the second proposal, nations are divided into two groups, one with large, equal populations and the other with small, equal populations. We then experiment with the proposed skew models to show how the optimizer of a parallel system can recognize skew and potentially produce different plans depending on its presence. A comparison is made between query performance under the proposed method and under the original uniform TPC-H distributions. Finally, an approach is presented for introducing skew into TPC-H with the current query set; it is compatible with the current benchmark specification rules and could be implemented today.
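
A minimal sketch of the two-group idea, assuming Teradata's RANDOM(lower, upper) function and an illustrative 80/20 split over TPC-H's 25 nations; the actual proportions and mechanism in the paper may differ:

-- Reassign customers so roughly 80% fall into 5 "large" nations and
-- the remaining 20% spread over the other 20 "small" nations; within
-- each group the nation is derived from the key to spread rows evenly.
UPDATE customer
SET c_nationkey = CASE
      WHEN RANDOM(0, 99) < 80 THEN c_custkey MOD 5      -- 5 large nations
      ELSE 5 + (c_custkey MOD 20)                       -- 20 small nations
    END;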


Revised Selected Papers of the First Workshop on Specifying Big Data Benchmarks - Volume 8163 | 2012

BigBench Specification V0.1

Tilmann Rabl; Ahmad Ghazal; Minqing Hu; Alain Crolotte; Francois Raab; Meikel Poess; Hans-Arno Jacobsen

In this article, we present the specification of BigBench, an end-to-end big data benchmark proposal. BigBench models a retail product supplier. The proposal covers a data model and a set of big-data-specific queries. BigBench's synthetic data generator addresses the variety, velocity, and volume aspects of big data workloads. The structured part of the BigBench data model is adopted from the TPC-DS benchmark. In addition, the structured schema is enriched with semi-structured and unstructured data components that are common in a retail product supplier environment. This specification contains the full query set as well as the data model.


International Conference on Algorithms and Architectures for Parallel Processing | 2011

Verification of partitioning and allocation techniques on Teradata DBMS

Ladjel Bellatreche; Soumia Benkrid; Ahmad Ghazal; Alain Crolotte; Alfredo Cuzzocrea

Data fragmentation and allocation in distributed and parallel database management systems (DBMS) have been extensively studied in the past. Previous work tackled these two problems separately even though they depend on each other. We recently developed a combined algorithm, based on a novel genetic solution, that handles the dependency between fragmentation and allocation. The main issue with this solution, as with previous ones, is the lack of real-life validation. This paper addresses that gap by verifying the effectiveness of our genetic solution on the Teradata DBMS. Teradata is a shared-nothing DBMS with proven scalability and robustness in real-life user environments as large as tens of petabytes of relational data. Experiments were conducted for the genetic solution and for previous work using the SSB benchmark (TPC-H-like) on a Teradata appliance running TD 13.10. Results show that the genetic solution is faster than previous work by 38%.
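
For context, fragmentation and allocation both surface directly in Teradata DDL: the primary index hash-distributes rows across AMPs (allocation), while a partitioned primary index splits them by value (fragmentation). A minimal sketch on an SSB-style table; the column set and the monthly granularity are illustrative assumptions:

CREATE TABLE lineorder (
  lo_orderkey  INTEGER NOT NULL,
  lo_custkey   INTEGER,
  lo_orderdate DATE
)
PRIMARY INDEX (lo_orderkey)        -- allocation: hash distribution across AMPs
PARTITION BY RANGE_N (             -- fragmentation: value-based partitioning
  lo_orderdate BETWEEN DATE '1992-01-01' AND DATE '1998-12-31'
  EACH INTERVAL '1' MONTH);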


Complex, Intelligent and Software Intensive Systems | 2012

The F&A Methodology and Its Experimental Validation on a Real-Life Parallel Processing Database System

Ladjel Bellatreche; Soumia Benkrid; Alain Crolotte; Alfredo Cuzzocrea; Ahmad Ghazal

This paper complements our previous results on effectively and efficiently designing Parallel Relational Data Warehouses (PRDW) over heterogeneous database clusters, namely our proposed methodology called Fragmentation & Allocation (F&A). The main merit of F&A is that it performs the fragmentation and allocation phases simultaneously, whereas traditional approaches perform them separately. In this paper, we demonstrate the practical impact and reliability of F&A on a real-life parallel processing database system.


International Conference on Data Engineering | 2017

BigBench V2: The New and Improved BigBench

Ahmad Ghazal; Todor Ivanov; Pekka Kostamaa; Alain Crolotte; Ryan Voong; Mohammed Al-Kateb; Waleed Ghazal; Roberto V. Zicari

Benchmarking big data solutions has been gaining a lot of attention from research and industry. BigBench is one of the most popular benchmarks in this area and was adopted by the TPC as TPCx-BB. BigBench, however, has key shortcomings. The structured component of its data model is the same as the TPC-DS data model, a complex snowflake-like schema, which is contrary to the simple star-schema big data models found in real life. BigBench also treats the semi-structured web-logs more or less as a structured table; in real life, web-logs are modeled as key-value pairs with an unknown schema, and specific keys are captured at query time, a process referred to as late binding. In addition, eleven of the thirty BigBench queries are TPC-DS queries; these are complex SQL queries applied to the structured part of the data model, which again is not typical of big data workloads. In this paper, we present BigBench V2 to address these limitations of the original BigBench. BigBench V2 is completely independent of TPC-DS, with a new data model and an overhauled workload. The new data model has a simple structured part, and web-logs are modeled as key-value pairs with a substantial and variable number of keys. BigBench V2 mandates late binding by requiring query processing to be done directly on the key-value web-logs rather than on a pre-parsed form of them. A new scale-factor-based data generator produces the structured tables, the key-value semi-structured web-logs, and the unstructured data. We implemented and executed BigBench V2 on Hive. Our proof of concept shows the feasibility of BigBench V2 and outlines different ways of implementing late binding.
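
A minimal sketch of late binding in HiveQL, the engine used for the proof of concept; the table layout, key names, and the use of get_json_object are illustrative assumptions, not the benchmark's prescribed implementation:

-- Web-logs kept as raw key-value text, with no schema imposed at load time.
CREATE TABLE web_logs (line STRING);

-- Keys are bound only at query time ("late binding"): each query
-- extracts just the keys it needs from the raw log line.
SELECT get_json_object(line, '$.wl_item_id') AS item_id,
       COUNT(*)                              AS views
FROM web_logs
WHERE get_json_object(line, '$.wl_user_id') IS NOT NULL
GROUP BY get_json_object(line, '$.wl_item_id');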


Database and Expert Systems Applications | 2012

An Efficient SQL Rewrite Approach for Temporal Coalescing in the Teradata RDBMS

Mohammed Al-Kateb; Ahmad Ghazal; Alain Crolotte

The importance of temporal data management is manifested by the considerable attention it has received from the database research community, and it is becoming even more evident with the recent growth of temporal features in major commercial database systems. Among these systems, Teradata offers native support for a wide range of temporal analytics. In this paper, we address the problem of temporal coalescing in the Teradata RDBMS. Temporal coalescing is a key temporal query processing operation that merges adjacent or overlapping timestamps of value-equivalent rows. Among the existing approaches to implementing temporal coalescing, an SQL-based approach is perhaps the most feasible and the easiest to apply. Along this direction, we propose an efficient SQL rewrite approach that implements temporal coalescing in the Teradata RDBMS by leveraging runtime conditional partitioning, a Teradata enhancement to ANSI ordered analytic functions, which makes it possible to express the coalescing semantics in an optimized, join-free, single-scan SQL query. We evaluated the proposed approach on a system running Teradata 14.0 with a performance study that demonstrates its efficiency.
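
To make the operation concrete, here is a join-free coalescing sketch in standard window-function SQL; the emp_salary_hist table and its columns are illustrative, and the paper's actual rewrite uses Teradata's runtime conditional partitioning rather than this formulation:

-- Merge adjacent or overlapping [start_dt, end_dt) periods of
-- value-equivalent rows (same emp_id and salary) in one ordered scan.
SELECT emp_id, salary,
       MIN(start_dt) AS start_dt,
       MAX(end_dt)   AS end_dt
FROM (
  SELECT emp_id, salary, start_dt, end_dt,
         -- running count of "gap" rows assigns a group id to each
         -- maximal run of touching/overlapping periods
         SUM(new_grp) OVER (PARTITION BY emp_id, salary
                            ORDER BY start_dt
                            ROWS UNBOUNDED PRECEDING) AS grp
  FROM (
    SELECT emp_id, salary, start_dt, end_dt,
           -- 1 when this row starts after every earlier period ended,
           -- i.e. it opens a new coalesced group
           CASE WHEN start_dt <= MAX(end_dt)
                  OVER (PARTITION BY emp_id, salary ORDER BY start_dt
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_grp
    FROM emp_salary_hist
  ) flagged
) grouped
GROUP BY emp_id, salary, grp;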
