Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Garret Swart is active.

Publication


Featured research published by Garret Swart.


International Conference on Data Engineering | 2008

Constant-Time Query Processing

Vijayshankar Raman; Garret Swart; Lin Qiao; Frederick R. Reiss; Vijay Dialani; Donald Kossmann; Inderpal Narang; Richard S. Sidle

Query performance in current systems depends significantly on tuning: how well the query matches the available indexes, materialized views, etc. Even in a well-tuned system, there are always some queries that take much longer than others. This frustrates users who increasingly want consistent response times to ad hoc queries. We argue that query processors should instead aim for constant response times for all queries, with no assumption about tuning. We present Blink, our first attempt at this goal, which runs every query as a table scan over a fully denormalized database, with hash group-by done along the way. To make this scan efficient, Blink uses a novel compression scheme that horizontally partitions tuples by frequency, thereby compressing skewed data almost down to entropy, even while producing long runs of fixed-length, easily parseable values. We also present a scheme for evaluating a conjunction of range and equality predicates in SIMD fashion over compressed tuples, and different schemes for efficient hash-based aggregation within the L2 cache. An experimental study with a suite of arbitrary single-block SQL queries over a TPC-H-like schema suggests that constant-time queries can be efficient.
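The frequency-partitioning idea can be pictured with a toy sketch. The Java below (class and parameter names are invented for illustration, not Blink's actual code) splits one column's dictionary into a "hot" partition whose frequent values get short fixed-length codes and a "cold" partition for the long tail; tuples are stored in the partition matching their value, so every partition holds fixed-width, easily parsed codes.

```java
import java.util.*;
import java.util.stream.*;

/** Illustrative sketch of frequency-based dictionary partitioning for one column.
 *  Values covering roughly `hotFraction` of all occurrences get short fixed-length
 *  codes; the remaining values get longer codes in a second partition. */
public class FrequencyPartitioner {
    public static List<Map<String, Integer>> build(List<String> column, double hotFraction) {
        Map<String, Long> freq = column.stream()
            .collect(Collectors.groupingBy(v -> v, Collectors.counting()));

        Map<String, Integer> hot = new LinkedHashMap<>();
        Map<String, Integer> cold = new LinkedHashMap<>();
        long covered = 0, target = (long) (hotFraction * column.size());
        for (Map.Entry<String, Long> e : freq.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toList())) {
            if (covered < target) { hot.put(e.getKey(), hot.size()); covered += e.getValue(); }
            else                  { cold.put(e.getKey(), cold.size()); }
        }
        return Arrays.asList(hot, cold);
    }

    /** Fixed code width (in bits) needed for a dictionary of the given size. */
    public static int bitsFor(int dictSize) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(Math.max(1, dictSize - 1)));
    }
}
```

On a skewed column the hot dictionary stays tiny, so bitsFor(hot.size()) is far smaller than a single unified dictionary would allow, which is how skewed data compresses toward entropy while codes remain fixed length within each partition.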


International Conference on Management of Data | 2007

How to barter bits for chronons: compression and bandwidth trade offs for database scans

Allison L. Holloway; Vijayshankar Raman; Garret Swart; David J. DeWitt

Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query processing workload, and second, large memories and compression are making table scans CPU, rather than disk bandwidth, bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing and achieve good performance by using massively parallel processing to perform scans over compressed vertical partitions of a denormalized schema. In this paper we present a study of how to make such scans faster through the use of a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated, and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations, tuple length quantization and a field length lookup table, for efficiently processing variable-length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use this to explore the trade-off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing for greater exploitation of multi-issue processors.
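The field length lookup table can be made concrete with a small sketch. Assume, purely for illustration (the paper's actual formats differ), that each tuple begins with a one-byte descriptor packing four 2-bit field-length codes; then every field offset can be precomputed per descriptor value, turning variable-length parsing into a single indexed load instead of a chain of dependent additions.

```java
/** Sketch: precomputed field offsets for tuples with four variable-length fields
 *  whose 2-bit length codes are packed into one descriptor byte (assumed layout). */
public class FieldOffsetTable {
    private static final int[] LEN = {1, 2, 4, 8};   // assumed meaning of each 2-bit code
    static final int[][] OFFSETS = new int[256][5];  // OFFSETS[descriptor][field]

    static {
        for (int d = 0; d < 256; d++) {
            int off = 0;
            for (int f = 0; f < 4; f++) {
                OFFSETS[d][f] = off;                 // byte offset of field f
                off += LEN[(d >>> (2 * f)) & 3];
            }
            OFFSETS[d][4] = off;                     // total body length: next tuple's start
        }
    }

    /** One load of the descriptor plus one table lookup locates any field. */
    static int fieldOffset(byte descriptor, int field) {
        return OFFSETS[descriptor & 0xFF][field];
    }
}
```

The point of the design is to break the serial dependence of each field's offset on the lengths of all preceding fields, which is what stalls a naive variable-length scan on multi-issue processors.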


Very Large Data Bases | 2008

Row-wise parallel predicate evaluation

Ryan Johnson; Vijayshankar Raman; Richard S. Sidle; Garret Swart

Table scans have become more interesting recently due to greater use of ad-hoc queries and greater availability of multi-core, vector-enabled hardware. Table scan performance is limited by value representation, table layout, and processing techniques. In this paper we propose a new layout and processing technique for efficient one-pass predicate evaluation. Starting with a set of rows with a fixed number of bits per column, we append columns to form a set of banks and then pad each bank to a supported machine word length, typically 16, 32, or 64 bits. We then evaluate partial predicates on the columns of each bank, using a novel evaluation strategy that evaluates column level equality, range tests, IN-list predicates, and conjuncts of these predicates, simultaneously on multiple columns within a bank, and on multiple rows within a machine register. This approach outperforms pure column stores, which must evaluate the partial predicates one column at a time. We evaluate and compare the performance and representation overhead of this new approach and several proposed alternatives.
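A minimal sketch of the word-parallel evaluation, under a simplifying assumption of ours: each column in a bank occupies w bits plus one spare delimiter bit that is zero in the stored data. Equality against per-column constants then costs one XOR and one subtraction per 64-bit bank, and masks from several such tests can be ANDed to evaluate a conjunction over multiple columns and rows at once.

```java
/** Sketch: test several packed columns of one row (or one column of several rows)
 *  against constants in a single 64-bit operation. Fields are w bits wide plus a
 *  delimiter bit (always 0 in stored data) that stops borrows crossing fields. */
public final class BankScan {
    /** Replicate a (w+1)-bit pattern across a 64-bit word. */
    static long repeat(long pattern, int w) {
        long word = 0;
        for (int shift = 0; shift + w + 1 <= 64; shift += w + 1) word |= pattern << shift;
        return word;
    }

    /** Delimiter bit is set in the result for every field of `bank` equal to the
     *  corresponding field of `probe` (probe delimiter bits must be 0). */
    static long equalsMask(long bank, long probe, int w) {
        long delim = repeat(1L << w, w);  // just the delimiter bit of each field
        long diff = bank ^ probe;         // equal fields become all-zero
        // Per field: 2^w - diff sets the delimiter bit iff diff == 0; since
        // diff < 2^w in every field, no borrow crosses field boundaries.
        return (delim - diff) & delim;
    }

    public static void main(String[] args) {
        int w = 7;                        // eight 7-bit columns per 64-bit bank
        long bank = repeat(42, w);        // every column holds the value 42
        long probe = repeat(42, w);
        System.out.println(Long.bitCount(equalsMask(bank, probe, w))); // prints 8
    }
}
```

Range tests and IN-lists follow the same pattern with different arithmetic per bank, which is why the approach evaluates partial predicates on all columns of a bank simultaneously rather than one column at a time.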


Symposium on Cloud Computing | 2012

Balancing reducer skew in MapReduce workloads using progressive sampling

Smriti R. Ramakrishnan; Garret Swart; Aleksey M. Urmanov

The elapsed time of a parallel job depends on the completion time of its longest-running constituent. We present a static load balancing algorithm that distributes work evenly across the reducers in a MapReduce job, resulting in significant reductions in elapsed time. Taking a user-specified model of reducer performance, our load balancer uses a progressive, objective-based cluster sampler to estimate the load associated with each reduce-key. It balances the workload using Key Chopping, to split keys with large loads into sub-keys that can be assigned to different distributive reducers, and Key Packing, to assign keys with medium loads to reducers so as to minimize the maximum reducer load. Keys with small loads are hashed, as they have little effect on the balance. This repeats until the user-specified balancing objective and confidence level are achieved. The sampler and load balancer have been implemented in the Oracle Loader for Hadoop (OLH), a commercial MapReduce application that employs Apache Hadoop to perform parallel data formatting and data movement into partitioned relational tables. We present the performance improvements achieved in both OLH and a MapReduce program for inverted index creation. The balancer works for arbitrary IID key distributions, the time spent sampling is small, and our solution is very effective at reducing the elapsed time of the MapReduce jobs we explored.
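Key Packing behaves like the classic greedy longest-processing-time heuristic for minimizing makespan. The hedged sketch below (class and method names are ours, not OLH's) assigns estimated key loads heaviest-first to the currently least-loaded reducer; Key Chopping, omitted here, would first split any key whose estimated load exceeds the target per-reducer load into sub-keys handled by distributive reducers.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch of Key Packing as greedy LPT: heaviest keys first, each to the
 *  least-loaded reducer so far. Key Chopping of oversized keys is omitted. */
public class KeyPacker {
    public static Map<String, Integer> pack(Map<String, Double> keyLoad, int reducers) {
        PriorityQueue<double[]> heap =                 // entries are (reducer id, current load)
            new PriorityQueue<>(Comparator.comparingDouble((double[] a) -> a[1]));
        for (int r = 0; r < reducers; r++) heap.add(new double[] { r, 0.0 });

        Map<String, Integer> assignment = new HashMap<>();
        keyLoad.entrySet().stream()
            .sorted((a, b) -> Double.compare(b.getValue(), a.getValue())) // heaviest first
            .forEach(e -> {
                double[] least = heap.poll();          // currently least-loaded reducer
                assignment.put(e.getKey(), (int) least[0]);
                least[1] += e.getValue();
                heap.add(least);
            });
        return assignment;
    }
}
```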


Very Large Data Bases | 2009

Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options

Norman W. Paton; Jorge Buenabad-Chávez; Mengsong Chen; Vijayshankar Raman; Garret Swart; Inderpal Narang; Daniel M. Yellin; Alvaro A. A. Fernandes

Writing parallel programs that can take advantage of non-dedicated processors is much more difficult than writing such programs for networks of dedicated processors. In a non-dedicated environment such programs must use autonomic techniques to respond to the unpredictable load fluctuations that prevail in the computational environment. In adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor load assignments throughout a computation to take account of varying resource capabilities, but we know of no previous study that compares their performance. This paper presents a simulation-based evaluation of these autonomic parallelization techniques in a uniform environment and compares how well they improve the performance of the computation. Four published strategies are compared with a new algorithm that seeks to overcome some weaknesses identified in the existing approaches. In addition, we explore the use of techniques from online algorithms to provide a firm foundation for determining when to adapt in two of the existing algorithms. The evaluations identify situations in which each strategy may be used effectively and in which it should be avoided.


Quality of Protection | 2006

Multilevel Security and Quality of Protection

Simon N. Foley; Stefano Bistarelli; Barry O’Sullivan; John Herbert; Garret Swart

Constraining how information may flow within a system is at the heart of many protection mechanisms and many security policies have direct interpretations in terms of information flow and multilevel security style controls. However, while conceptually simple, multilevel security controls have been difficult to achieve in practice. In this paper we explore how the traditional assurance measures that are used in the network multilevel security model can be re-interpreted and generalised to provide the basis of a framework for reasoning about the quality of protection provided by a secure system configuration.
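As a reference point for the controls being generalized (not the paper's framework itself), the lattice check that multilevel security enforces can be stated in a few lines, Bell-LaPadula style; the levels and dominance relation below are illustrative only.

```java
/** Illustrative "no read up, no write down" checks over a totally ordered lattice. */
enum Level { UNCLASSIFIED, CONFIDENTIAL, SECRET, TOP_SECRET }

final class MlsPolicy {
    static boolean dominates(Level a, Level b) { return a.ordinal() >= b.ordinal(); }

    /** Simple security property: a subject may read only objects it dominates. */
    static boolean mayRead(Level subject, Level object) { return dominates(subject, object); }

    /** *-property: a subject may write only objects that dominate it. */
    static boolean mayWrite(Level subject, Level object) { return dominates(object, subject); }
}
```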


International Conference on Management of Data | 2012

Oracle in-database Hadoop: when MapReduce meets RDBMS

Xueyuan Su; Garret Swart

Big data is the tar sands of the data world: vast reserves of raw, gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parallel programming paradigm well suited to the programmatic extraction and analysis of information from these unstructured Big Data reserves. The Apache Hadoop implementation of MapReduce has become an important player in this market due to its ability to exploit large networks of inexpensive servers. The increasing importance of unstructured data has led data processing vendors to support this programming style. Oracle RDBMS has supported the MapReduce paradigm for many years through the mechanism of user-defined pipelined table functions and aggregation objects. However, such support has not been Hadoop source compatible: native Hadoop programs needed to be rewritten before becoming usable in this framework. The ability to run Hadoop programs inside the Oracle database provides a versatile solution to database users, allowing them to use programming skills they may already possess and to exploit the growing Hadoop ecosystem. In this paper, we describe a prototype of Oracle In-Database Hadoop that supports the running of native Hadoop applications written in Java. This implementation executes Hadoop applications using the efficient parallel capabilities of the Oracle database and a subset of the Apache Hadoop infrastructure. The system's target audience includes both SQL and Hadoop users. We discuss the architecture and design and, in particular, demonstrate how MapReduce functionality is seamlessly integrated within SQL queries. We also share our experience in building such a system within the Oracle database and follow-on topics that we think are promising areas for exploration.
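The target workload is ordinary Hadoop code such as the word-count job below, written against the standard org.apache.hadoop.mapreduce API; the prototype's claim is that classes like these run unmodified inside the database (the SQL-side invocation syntax is not reproduced here).

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** A standard Hadoop word-count job: the kind of unmodified native Hadoop code
 *  that an in-database Hadoop runtime aims to execute. */
public class WordCount {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) { word.set(it.nextToken()); ctx.write(word, ONE); }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }
}
```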


International Symposium on Parallel and Distributed Computing | 2004

Spreading the load using consistent hashing: a preliminary report

Garret Swart

Consistent hashing can be used to assign objects to nodes in a distributed system. It has been used by several distributed systems, including Chord, Pastry, and Tornado, because of its efficient handling of node failure and repair. In this paper we analyze how well consistent hashing does at evenly distributing objects among the nodes in the system. We also extend current consistent hashing algorithms to allow for dynamic load balancing while retaining the good properties of consistent hashing. Finally, we analyze our extensions using both probabilistic analysis and simulations. The algorithms derived appear to achieve much better load balancing.
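A minimal ring with virtual nodes, one standard way to smooth the object distribution, is sketched below; the hash function and per-node virtual-node count are arbitrary choices of ours, and the paper's dynamic load-balancing extensions are not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of consistent hashing: each node is hashed onto the ring several times
 *  (virtual nodes); an object belongs to the first node clockwise from its hash. */
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodes;

    public ConsistentHashRing(int vnodesPerNode) { this.vnodes = vnodesPerNode; }

    public void addNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {            // only ~1/n of objects move
        for (int i = 0; i < vnodes; i++) ring.remove(hash(node + "#" + i));
    }

    public String nodeFor(String objectKey) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(objectKey));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (Exception ex) { throw new IllegalStateException(ex); }
    }
}
```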


Very Large Data Bases | 2014

Changing engines in midstream: a Java stream computational model for big data processing

Xueyuan Su; Garret Swart; Brian Goetz; Brian Oliver; Paul Sandoz

With the addition of lambda expressions and the Stream API in Java 8, Java has gained a powerful and expressive query language that operates over in-memory collections of Java objects, making the transformation and analysis of data more convenient, scalable and efficient. In this paper, we build on Java 8 Stream and add a DistributableStream abstraction that supports federated query execution over an extensible set of distributed compute engines. Each query eventually results in the creation of a materialized result that is returned either as a local object or as an engine-defined distributed Java Collection that can be saved and/or used as a source for future queries. Distinctively, DistributableStream supports the changing of compute engines both between and within a query, allowing different parts of a computation to be executed on different platforms. At execution time, the query is organized as a sequence of pipelined stages, each stage potentially running on a different engine. Each node that is part of a stage executes its portion of the computation on the data available locally or produced by the previous stage of the computation. This approach allows for computations to be assigned to engines based on pricing, data locality, and resource availability. Coupled with the inherent laziness of stream operations, this brings great flexibility to query planning and separates the semantics of the query from the details of the engine used to execute it. We currently support three engines, Local, Apache Hadoop MapReduce, and Oracle Coherence, and we illustrate how new engines and data sources can be added.
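For readers new to the base API, a plain java.util.stream word count is shown below; a DistributableStream pipeline mirrors this shape, with intermediate operations staying lazy until a terminal operation materializes the result, except that successive stages may run on different engines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Word count with the standard Java 8 Stream API; nothing executes until the
 *  terminal collect(...) call, which is the laziness DistributableStream builds on. */
public class StreamWordCount {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        Map<String, Long> counts = lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))              // lazy intermediate op
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));  // terminal op runs the pipeline
        System.out.println(counts);   // e.g. {not=1, be=2, to=2, ...}
    }
}
```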


International Conference on Autonomic Computing | 2006

Autonomic Query Parallelization using Non-dedicated Computers: An Evaluation of Adaptivity Options

Norman W. Paton; Vijayshankar Raman; Garret Swart; Inderpal Narang

Writing parallel programs that can take advantage of non-dedicated processors is much more difficult than writing such programs for networks of dedicated processors. In a non-dedicated environment such programs must use autonomic techniques to respond to the unpredictable load fluctuations that characterize the computational environment. In the area of adaptive query processing (AQP), several techniques have been proposed for dynamically redistributing processor assignments throughout a computation to take account of varying resource capabilities, but we know of no previous study that compares their performance. This paper presents a simulation-based evaluation of these autonomic parallelization techniques in a uniform environment and compares how well they improve the scalability of the computation. The existing strategies are compared with a new approach inspired by work on distributed hash tables. The evaluation identifies situations in which each strategy may be used effectively and in which it should be avoided.

Collaboration


Dive into Garret Swart's collaborations.

Top Co-Authors


John Herbert

University College Cork


Benjamin Aziz

University of Portsmouth
