Paul Sinclair | Researchain

Publication

Featured research published by Paul Sinclair.

international conference on data engineering | 2017

Dynamic Statistics Collection in the Teradata Unified Data Architecture

Sung Jin Kim; Mohammed Al-Kateb; Paul Sinclair; Alain Crolotte; Chengyang Zhang; Linda Rose

The Unified Data Architecture (UDA) of Teradata is an inclusive multisystem data analytics solution. A key challenge for query optimization under the UDA is to find optimal plans for queries that access data on heterogeneous remote data stores. The challenge comes primarily from the lack of statistics for data stored on remote systems. In this paper, we present techniques implemented in Teradata Database for dynamically collecting statistics on data fetched from a remote system and feeding these statistics back to the query optimizer during query execution. We demonstrate the performance impact of dynamic statistics collection and feedback with experiments conducted on a system that consists of Teradata Database and a remote Hadoop server.
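
As a rough illustration of the idea in this abstract, the sketch below gathers statistics on rows while they are being fetched from a remote source and then hands those statistics to a planning decision. It is not Teradata's implementation: `ColumnStats`, `fetch_with_stats`, `choose_join_strategy`, and the `small_table_limit` threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class ColumnStats:
    """Running statistics for one column of a remotely fetched table."""
    row_count: int = 0
    distinct_values: set = field(default_factory=set)

    @property
    def distinct_count(self) -> int:
        return len(self.distinct_values)


def fetch_with_stats(rows: Iterable[dict], column: str,
                     stats: ColumnStats) -> Iterator[dict]:
    """Pass rows through unchanged while updating stats as a side effect."""
    for row in rows:
        stats.row_count += 1
        stats.distinct_values.add(row[column])
        yield row


def choose_join_strategy(stats: ColumnStats, small_table_limit: int = 1000) -> str:
    """Hypothetical planner hook: duplicate small inputs, redistribute large ones."""
    return "duplicate" if stats.row_count <= small_table_limit else "redistribute"


# Stand-in for rows arriving from a remote system (assumed data).
remote_rows = [{"cust_id": i % 50, "amount": i} for i in range(200)]

stats = ColumnStats()
local_spool = list(fetch_with_stats(remote_rows, "cust_id", stats))

# Only after the fetch are real statistics available for the remaining plan.
strategy = choose_join_strategy(stats)
print(stats.row_count, stats.distinct_count, strategy)
```

The point of the sketch is the ordering: the statistics do not exist before the remote fetch, so the decision that depends on them is deferred until the fetch has run.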

international conference on data engineering | 2017

Optimizing UNION ALL Join Queries in Teradata

Mohammed Al-Kateb; Paul Sinclair; Alain Crolotte; Lu Ma; Grace Au; Sanjay Nair

The UNION ALL set operator is useful for combining data from multiple sources. With the emergence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important. In this paper, we present optimization techniques implemented in Teradata Database for join queries with UNION ALL. Instead of spooling all branches of UNION ALL before performing join operations, we propose cost-based pushing of joins into branches. Join pushing not only addresses the prohibitive cost of spooling all branches, but also helps in exposing more efficient join methods (e.g., direct hash-based joins) which, otherwise, would not be considered by the query optimizer. The geography of relations being pushed to UNION ALL branches is also adjusted to avoid unnecessary redistributions and duplications of data. We conclude the paper with a performance study that demonstrates the impact of the proposed optimization techniques on query performance.
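
The rewrite described in this abstract relies on the fact that an inner join distributes over UNION ALL under bag semantics. The sketch below checks that equivalence on a toy example using Python's built-in `sqlite3` module; the table names and data are invented for illustration, and SQLite merely stands in for a SQL engine here. It does not model Teradata's cost-based decision or its geography adjustment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_2016 (cust_id INTEGER, amount INTEGER);
    CREATE TABLE sales_2017 (cust_id INTEGER, amount INTEGER);
    CREATE TABLE customers  (cust_id INTEGER, name TEXT);
    INSERT INTO sales_2016 VALUES (1, 10), (2, 20), (2, 20);
    INSERT INTO sales_2017 VALUES (1, 30), (3, 40);
    INSERT INTO customers  VALUES (1, 'Ann'), (2, 'Bob');
""")

# Baseline form: join written against the combined UNION ALL result.
join_after_union = """
    SELECT c.name, s.amount
    FROM (SELECT cust_id, amount FROM sales_2016
          UNION ALL
          SELECT cust_id, amount FROM sales_2017) AS s
    JOIN customers AS c ON c.cust_id = s.cust_id
"""

# Rewritten form: the join is pushed into each branch, then results are combined.
join_pushed = """
    SELECT c.name, s.amount
    FROM sales_2016 AS s JOIN customers AS c ON c.cust_id = s.cust_id
    UNION ALL
    SELECT c.name, s.amount
    FROM sales_2017 AS s JOIN customers AS c ON c.cust_id = s.cust_id
"""

# Sort both results so the comparison ignores row order but keeps duplicates.
baseline = sorted(conn.execute(join_after_union).fetchall())
pushed = sorted(conn.execute(join_pushed).fetchall())
print(baseline == pushed)
```

Both forms return the same multiset of rows, including the duplicated `(2, 20)` sale, which is why an optimizer is free to choose between them on cost.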

very large data bases | 2016

Hybrid Row-Column Partitioning in Teradata®

Mohammed Al-Kateb; Paul Sinclair; Grace Au; Carrie Ballinger

Data partitioning is an indispensable ingredient of database systems due to the performance improvement it can bring to any given mixed workload. Data can be partitioned horizontally or vertically. While some commercial proprietary and open source database systems have one flavor or mixed flavors of these partitioning forms, Teradata Database offers a unique hybrid row-column store solution that seamlessly combines both of these partitioning schemes. The key feature of this hybrid solution is that row, column, and combined partitions are all stored and handled in the same way internally by the underlying file system storage layer. In this paper, we present the main characteristics and explain the implementation approach of Teradata's row-column store. We also discuss query optimization techniques applicable specifically to partitioned tables. Furthermore, we present a performance study that demonstrates how different partitioning options impact the performance of various queries.
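
To make the terms in this abstract concrete, the sketch below shows horizontal (row) partitioning, vertical (column) partitioning, and a combination of the two on a small in-memory table. The function names, the sample rows, and the choice of partitioning key and column groups are assumptions made for this example; the sketch says nothing about how Teradata lays out partitions in its file system.

```python
from collections import defaultdict

# Assumed sample data for illustration only.
rows = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 20},
    {"order_id": 3, "region": "east", "amount": 30},
]


def partition_rows(rows, key):
    """Horizontal partitioning: group whole rows by the value of one column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def partition_columns(rows, column_groups):
    """Vertical partitioning: split every row into one projection per column group."""
    return [[{c: row[c] for c in group} for row in rows] for group in column_groups]


def partition_hybrid(rows, key, column_groups):
    """Combined partitioning: row partitions, each split into column groups."""
    return {
        value: partition_columns(part, column_groups)
        for value, part in partition_rows(rows, key).items()
    }


hybrid = partition_hybrid(rows, "region", [("order_id", "amount"), ("region",)])
print(hybrid["east"][0])
```

A query that filters on `region` and reads only `amount` would need to touch just one row partition and one column group in this layout, which is the kind of benefit a performance study of partitioning options would measure.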

international conference on management of data | 2018

Joins over UNION ALL Queries in Teradata®: Demonstration of Optimized Execution

Mohammed Al-Kateb; Paul Sinclair; Grace Au; Sanjay Nair; Mark Sirek; Lu Ma; Mohamed Y. Eltabakh

The UNION ALL set operator is useful for combining data from multiple sources. With the emergence and prevalence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important in many analytical queries. In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables. Instead of the naive and traditional way of spooling each UNION ALL branch to a common spool prior to performing join operations, which can be prohibitively expensive, we demonstrate new techniques developed in Teradata Database, including: 1) cost-based pushing of joins into UNION ALL branches, 2) a branch grouping strategy prior to join pushing, 3) geography adjustment of the pushed relations to avoid unnecessary redistribution or duplication, 4) iterative join decomposition of a pushed join into multiple joins, and 5) combining multiple join steps into a single multisource join step. In the demonstration, we use the Teradata Visual Explain tool, which offers a rich set of visual rendering capabilities for query plans, the display of various metadata for each plan step, and several interactive GUI options for end users.

Archive | 2004