Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Oliver Kennedy is active.

Publications


Featured research published by Oliver Kennedy.


Very Large Data Bases | 2012

DBToaster: higher-order delta processing for dynamic, frequently fresh views

Yanif Ahmad; Oliver Kennedy; Christoph Koch; Milos Nikolic

Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases that change at very high rates. Such views have to be kept fresh at low maintenance cost and latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data. In this paper, we present viewlet transforms, a recursive finite differencing technique applied to queries. The viewlet transform materializes a query and a set of its higher-order deltas as views. These views support each other's incremental maintenance, leading to a reduced overall view maintenance cost. The viewlet transform of a query admits efficient evaluation, the elimination of certain expensive query operations, and aggressive parallelization. We develop viewlet transforms into a workable query execution technique, present a heuristic and cost-based optimization framework, and report on experiments with a prototype dynamic data management system that combines viewlet transforms with an optimizing compilation technique. The system supports tens of thousands of complete view refreshes a second for a wide range of queries.
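The core idea of delta processing can be sketched in a few lines. The following is an illustrative toy (not the DBToaster implementation): the view Q = COUNT(R JOIN S ON R.k = S.k) is kept fresh by materializing its first-order deltas (per-key counts of R and S) as auxiliary views, so each insert costs O(1) work instead of a full recomputation.

```python
from collections import defaultdict

# Illustrative sketch of incremental view maintenance via materialized
# deltas. Q counts matching pairs in R JOIN S; the per-key counts act as
# the deltas of Q with respect to inserts into R and S.
class IncrementalJoinCount:
    def __init__(self):
        self.q = 0                      # materialized view: |R JOIN S|
        self.r_by_k = defaultdict(int)  # delta view: R tuples per key
        self.s_by_k = defaultdict(int)  # delta view: S tuples per key

    def insert_r(self, k):
        self.q += self.s_by_k[k]        # apply the delta for an R-insert
        self.r_by_k[k] += 1             # keep the delta view itself fresh

    def insert_s(self, k):
        self.q += self.r_by_k[k]
        self.s_by_k[k] += 1

v = IncrementalJoinCount()
for k in [1, 1, 2]:
    v.insert_r(k)
for k in [1, 2, 2]:
    v.insert_s(k)
print(v.q)  # 4 matching pairs: key 1 gives 2*1, key 2 gives 1*2
```

Higher-order deltas generalize this: the delta views themselves are maintained by their own (simpler) deltas, recursively.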


International Conference on Data Engineering | 2010

PIP: A database system for great and small expectations

Oliver Kennedy; Christoph Koch

Estimation via sampling out of highly selective join queries is well known to be problematic, most notably in online aggregation. Without goal-directed sampling strategies, samples falling outside of the selection constraints lower estimation efficiency at best, and cause inaccurate estimates at worst. This problem appears in general probabilistic database systems, where query processing is tightly coupled with sampling. By committing to a set of samples before evaluating the query, the engine wastes effort on samples that will be discarded, query processing that may need to be repeated, or unnecessarily large numbers of samples.
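The inefficiency the abstract describes is easy to demonstrate (names here are hypothetical, not PIP's API): estimating a quantity conditioned on a highly selective predicate wastes almost all naive samples, while a goal-directed sampler draws directly from the conditional distribution.

```python
import random

random.seed(0)
# Toy illustration: estimate E[x | x > 0.99] for x ~ Uniform(0, 1).
N = 10_000

# Naive: commit to samples first, then apply the selection -- ~99% of
# the sampling effort is discarded.
naive = [x for x in (random.random() for _ in range(N)) if x > 0.99]

# Goal-directed: sample Uniform(0.99, 1) directly; nothing is wasted.
directed = [random.uniform(0.99, 1.0) for _ in range(N)]

print(f"naive kept {len(naive)}/{N} samples")       # roughly 1%
print(f"directed kept {len(directed)}/{N} samples")  # all of them
```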


International Conference on Data Engineering | 2009

Dynamic Approaches to In-network Aggregation

Oliver Kennedy; Christoph Koch; Alan J. Demers

Collaboration between small-scale wireless devices depends on their ability to infer aggregate properties of all nearby nodes. The highly dynamic environment created by mobile devices introduces a silent failure mode that is disruptive to this kind of inference. We address this problem by presenting techniques for extending existing unstructured aggregation protocols to cope with failure modes introduced by mobile environments. The modified protocols allow devices with limited connectivity to maintain estimates of aggregates, despite unexpected peer departures and arrivals.
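A minimal push-sum gossip sketch shows the kind of unstructured aggregation protocol being extended (the paper's failure-handling mechanisms are not shown). Each node holds a (value, weight) pair; in each round a random node pushes half of both to a random peer, and every node's ratio value/weight converges to the global average.

```python
import random

random.seed(1)
# Push-sum gossip averaging over four nodes with values 10, 20, 30, 40.
values = [10.0, 20.0, 30.0, 40.0]
weights = [1.0] * len(values)

for _ in range(200):  # gossip rounds
    i = random.randrange(len(values))
    j = random.randrange(len(values))
    if i == j:
        continue
    # Node i pushes half of its mass to node j; total mass is conserved.
    values[j] += values[i] / 2; weights[j] += weights[i] / 2
    values[i] /= 2; weights[i] /= 2

estimates = [v / w for v, w in zip(values, weights)]
print(estimates)  # each node's estimate approaches the true average, 25.0
```

Mass conservation is what makes the estimate exact in the limit; a silently departing peer carries mass away with it, which is precisely the failure mode the paper addresses.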


International World Wide Web Conference | 2016

Ettu: Analyzing Query Intents in Corporate Databases

Gökhan Kul; Duc Thanh Anh Luong; Ting Xie; Patrick Coonan; Varun Chandola; Oliver Kennedy; Shambhu J. Upadhyaya

Insider threats to databases in the financial sector have become a very serious and pervasive security problem. This paper proposes a framework to analyze access patterns to databases by clustering SQL queries issued to the database. Our system Ettu works by grouping queries with other similarly structured queries. The small number of intent groups that result can then be efficiently labeled by human operators. We show how our system is designed and how the components of the system work. Our preliminary results show that our system accurately models user intent.
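The first step of grouping similarly structured queries can be sketched as follows (illustrative only: a real system like Ettu parses the SQL, whereas this regex-based normalization is a simplification). Literals are abstracted away so that structurally identical queries collapse into one skeleton.

```python
import re
from collections import defaultdict

def skeleton(sql: str) -> str:
    """Abstract literals out of a SQL string to get its structure."""
    s = sql.lower().strip()
    s = re.sub(r"'[^']*'", "?", s)  # string literals -> ?
    s = re.sub(r"\b\d+\b", "?", s)  # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)      # normalize whitespace
    return s

queries = [
    "SELECT name FROM accounts WHERE id = 42",
    "SELECT name FROM accounts WHERE id = 7",
    "SELECT * FROM audit_log WHERE user = 'alice'",
]

groups = defaultdict(list)
for q in queries:
    groups[skeleton(q)].append(q)

print(len(groups))  # 2 intent groups: the first two queries share a skeleton
```

The small number of resulting groups is what makes human labeling of intent feasible.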


Scopus | 2015

Pocket Data: The Need for TPC-MOBILE

Oliver Kennedy; Jerry Ajay; Geoffrey Challen; Lukasz Ziarek

Embedded database engines such as SQLite provide a convenient data persistence layer and have spread along with the applications using them to many types of systems, including interactive devices such as smartphones. Android, the most widely-distributed smartphone platform, both uses SQLite internally and provides interfaces encouraging apps to use SQLite to store their own private structured data. As similar functionality appears in all major mobile operating systems, embedded database performance affects the response times and resource consumption of billions of smartphones and the millions of apps that run on them—making it more important than ever to characterize smartphone embedded database workloads. To do so, we present results from an experiment which recorded SQLite activity on 11 Android smartphones during one month of typical usage. Our analysis shows that Android SQLite usage produces queries and access patterns quite different from canonical server workloads. We argue that evaluating smartphone embedded databases will require a new benchmarking suite and we use our results to outline some of its characteristics.


Scopus | 2015

Detecting the Temporal Context of Queries

Oliver Kennedy; Ying Yang; Jan Chomicki; Ronny Fehling; Zhen Hua Liu; Dieter Gawlick

Business intelligence and reporting tools rely on a database that accurately mirrors the state of the world. Yet, even if the schema and queries are constructed in exacting detail, assumptions about the data made during extraction, transformation, and schema and query creation of the reporting database may be (accidentally) ignored by end users, or may change as the database evolves over time. As these assumptions are typically implicit (e.g., assuming that a sales record relation is append-only), it can be hard to even detect that a mistaken assumption has been made. In this paper, we argue that such errors are consequences of unintended contextual dependence, i.e., query outputs dependent on a variable characteristic of the database. We characterize contextual dependence, and explore several strategies for efficiently detecting and quantifying the effects of contextual dependence on query outputs. We present and evaluate our findings in the context of a concrete case study: Detecting temporal dependence using a database management system with versioning capabilities.


International Conference on Big Data | 2014

PigOut: Making multiple Hadoop clusters work together

Kyungho Jeon; Sharath Chandrashekhara; Feng Shen; Shikhar Mehra; Oliver Kennedy; Steven Y. Ko

This paper presents PigOut, a system that enables federated data processing over multiple Hadoop clusters. Using PigOut, a user (such as a data analyst) can write a single script in a high-level language to efficiently use multiple Hadoop clusters. There is no need to manually write multiple scripts and coordinate the execution for different clusters. PigOut accomplishes this by automatically partitioning a single, user-supplied script into multiple scripts that run on different clusters. Additionally, PigOut generates workflow descriptions to coordinate execution across clusters. In doing so, PigOut leverages existing tools built around Hadoop, avoiding extra effort required from users or administrators. For example, PigOut uses Pig Latin, a popular query language for Hadoop MapReduce, in a (virtually) unmodified form. Through our evaluation with PigMix, the standard benchmark for Pig, we demonstrate that PigOut's automatically-generated scripts and workflow definitions have comparable performance to manual, hand-tuned ones. We also report our experience with manually writing multiple scripts for a set of federated clusters, and compare the process with PigOut's automated approach.


International Workshop on Mobile Computing Systems and Applications | 2015

maybe We Should Enable More Uncertain Mobile App Programming

Geoffrey Challen; Jerry Ajay; Nick DiRienzo; Oliver Kennedy; Anudipa Maiti; Anandatirtha Nandugudi; Sriram Shantharam; Jinghao Shi; Guru Prasad Srinivasa; Lukasz Ziarek

One of the reasons programming mobile systems is so hard is the wide variety of environments a typical app encounters at runtime. As a result, in many cases only post-deployment user testing can determine the right algorithm to use, the rate at which something should happen, or when an app should attempt to conserve energy. Programmers should not be forced to make these choices at development time. Unfortunately, languages leave no way for programmers to express and structure uncertainty about runtime conditions, forcing them to adopt ineffective or fragile ad-hoc solutions. We introduce a new approach based on structured uncertainty through a new language construct: the maybe statement. maybe statements allow programmers to defer choices about app behavior that cannot be made at development time, while providing enough structure to allow a system to later adaptively choose from multiple alternatives. Eliminating the uncertainty introduced by maybe statements can be done in a large variety of ways: through simulation, split testing, user configuration, temporal adaptation, or machine learning techniques, depending on the type of adaptation appropriate for each situation. Our paper motivates the maybe statement, presents its syntax, and describes a complete system for testing and choosing from maybe alternatives.
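The construct can be emulated in an ordinary language to make the idea concrete (a hypothetical sketch; the names and policy mechanism are illustrative, not the paper's system). The programmer states the alternatives; a pluggable policy resolves the choice at runtime.

```python
import random

# Hypothetical emulation of a `maybe` construct: defer a choice among
# alternatives to a runtime policy (here, uniform split testing).
def maybe(*alternatives, policy=random.choice):
    return policy(alternatives)()

def refresh_every_30s():
    return "refresh: 30s"

def refresh_on_wifi_only():
    return "refresh: wifi-only"

# The programmer expresses the uncertainty once; the system could later
# swap in user configuration, temporal adaptation, or a learned policy
# without any change to the call site.
print(maybe(refresh_every_30s, refresh_on_wifi_only))
```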


International Conference on Management of Data | 2018

SchemaDrill: Interactive Semi-Structured Schema Design

William Spoth; Ting Xie; Oliver Kennedy; Ying Yang; Beda Christoph Hammerschmidt; Zhen Hua Liu; Dieter Gawlick

Ad-hoc data models like JSON make it easy to evolve schemas and to multiplex different data-types into a single stream. This flexibility makes JSON great for generating data, but also makes it much harder to query, ingest into a database, and index. In this paper, we explore the first step of JSON data loading: schema design. Specifically, we consider the challenge of designing schemas for existing JSON datasets as an interactive problem. We present SchemaDrill, a roll-up/drill-down style interface for exploring collections of JSON records. SchemaDrill helps users to visualize the collection, identify relevant fragments, and map them down into one or more flat, relational schemas. We describe and evaluate two key components of SchemaDrill: (1) A summary schema representation that significantly reduces the complexity of JSON schemas without a meaningful reduction in information content, and (2) A collection of schema visualizations that help users to qualitatively survey variability amongst different schemas in the collection.
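The starting point of such a summary can be sketched simply (an illustrative simplification, not SchemaDrill's actual representation): summarize a collection of heterogeneous JSON records as the multiset of key paths the records exhibit.

```python
import json
from collections import Counter

def paths(obj, prefix=""):
    """Yield the dotted key paths of the leaves of a JSON value."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from paths(v, f"{prefix}.{k}" if prefix else k)
    else:
        yield prefix

records = [
    json.loads('{"user": {"id": 1, "name": "a"}, "ts": 10}'),
    json.loads('{"user": {"id": 2}, "ts": 11, "geo": {"lat": 1.0}}'),
]

# A crude "summary schema": which paths occur, and in how many records.
summary = Counter(p for r in records for p in paths(r))
print(summary)
# Counter({'user.id': 2, 'ts': 2, 'user.name': 1, 'geo.lat': 1})
```

Paths that appear in every record suggest stable relational columns; rare paths flag the variability the interface helps users survey.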


International Conference on Management of Data | 2017

Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning

Niccolo' Meneghetti; Oliver Kennedy; Wolfgang Gatterbauer

Tuple-independent probabilistic databases (TI-PDBs) handle uncertainty by annotating each tuple with a probability parameter; when the user submits a query, the database derives the marginal probabilities of each output-tuple, assuming input-tuples are statistically independent. While query processing in TI-PDBs has been studied extensively, limited research has been dedicated to the problems of updating or deriving the parameters from observations of query results. Addressing this problem is the main focus of this paper. We introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. The key idea of B-PDBs is to treat each parameter as a latent, Beta-distributed random variable. We show how this simple expedient enables both belief updating and parameter learning in a principled way, without imposing any burden on regular query processing. We use this model to provide the following key contributions: (i) we show how to scalably compute the posterior densities of the parameters given new evidence; (ii) we study the complexity of performing Bayesian belief updates, devising efficient algorithms for tractable classes of queries; (iii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iv) we show how to embed the proposed algorithms into a standard relational engine; (v) we support our conclusions with extensive experimental results.
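The key expedient can be illustrated with standard Beta-Bernoulli conjugate updating (this sketch shows only the textbook update for direct observations of a tuple, not the paper's algorithms for query-level evidence): a tuple's probability is a Beta(a, b) random variable whose posterior after evidence is again a Beta.

```python
# Sketch of a B-PDB-style latent parameter: a tuple's probability is
# Beta(a, b)-distributed and updated by conjugacy as evidence arrives.
class BetaParam:
    def __init__(self, a=1.0, b=1.0):   # Beta(1, 1) = uniform prior
        self.a, self.b = a, b

    def observe(self, present: bool):
        if present:
            self.a += 1                 # tuple observed present
        else:
            self.b += 1                 # tuple observed absent

    def mean(self):
        """Posterior mean: the probability to use in query processing."""
        return self.a / (self.a + self.b)

p = BetaParam()
for obs in [True, True, True, False]:
    p.observe(obs)
print(p.mean())  # posterior mean = 4/6 ≈ 0.667
```

Because the posterior stays in the Beta family, the update is a pair of counter increments, which is what keeps belief updating scalable and free of any burden on regular query processing.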

Collaboration


Dive into Oliver Kennedy's collaborations.

Top Co-Authors

Ying Yang
University at Buffalo

Ting Xie
University at Buffalo

Yanif Ahmad
Johns Hopkins University

Boris Glavic
Illinois Institute of Technology