Venkatesh Raghavan | Researchain

Publication

Featured research published by Venkatesh Raghavan.

international conference on management of data | 2014

Orca: a modular query optimizer architecture for big data

Mohamed A. Soliman; Lyublena Antova; Venkatesh Raghavan; Amr El-Helw; Zhongxian Gu; Entong Shen; George Constantin Caragea; Carlos Garcia-Alvarado; Foyzur Rahman; Michalis Petropoulos; Florian Waas; Sivaramakrishnan Narayanan; Konstantinos Krikellas; Rhonda Baldwin

The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer. In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with original research of its own, resulting in a modular and portable optimizer architecture. In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.

international conference on data engineering | 2010

Progressive result generation for multi-criteria decision support queries

Venkatesh Raghavan; Elke A. Rundensteiner

Multi-criteria decision support (MCDS) is crucial in many business and web applications such as web searches, B2B portals and online commerce. Such MCDS applications need to report results early, as soon as they are generated, so that they can react and formulate competitive decisions in near real time. The ease of expressing user preferences in web-based applications has made Pareto-optimal (skyline) queries a popular class of MCDS queries. However, state-of-the-art techniques either focus on handling skylines on single input sets (i.e., no joins) or do not tackle the challenge of producing progressive early output results. In this work, we propose a progressive query evaluation framework, ProgXe, that transforms the execution of queries involving skyline over joins to be non-blocking, i.e., to generate results progressively, early and often. In ProgXe the query processing (join, mapping and skyline) is conducted at multiple levels of abstraction, thereby exploiting the knowledge gained from both the input and the mapped output spaces. This knowledge enables us to identify and reason about abstract-level relationships to guarantee the correctness of early output. It also provides optimization opportunities missed by current techniques. To further optimize ProgXe, we incorporate an ordering technique that optimizes the rate at which results are reported by translating the optimization of tuple-level processing into a job-sequencing problem. Our experimental study over a wide variety of data sets demonstrates the superiority of our approach over state-of-the-art techniques.
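
A skyline keeps every tuple that no other tuple dominates, where a tuple dominates another if it is at least as good in every dimension and strictly better in at least one. The minimal sketch below shows the blocking join-then-skyline baseline that ProgXe is designed to avoid: no result can be reported until the entire join has been materialized and scanned. It is not ProgXe itself; the `(join_key, cost, delay)` schema, the smaller-is-better convention, and the sum-per-dimension mapping function are illustrative assumptions.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # hypothetical schema: (join_key, cost, delay)
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def join_then_skyline(r: List[Row], s: List[Row]) -> List[Point]:
    # Step 1 (blocking): materialize the whole equi-join and apply the mapping.
    # The mapping "sum each dimension" is an assumption for illustration.
    mapped = [
        (r_cost + s_cost, r_delay + s_delay)
        for (r_key, r_cost, r_delay) in r
        for (s_key, s_cost, s_delay) in s
        if r_key == s_key
    ]
    # Step 2 (blocking): block-nested-loop skyline over the mapped tuples.
    skyline: List[Point] = []
    for p in mapped:
        if any(dominates(q, p) for q in skyline):
            continue  # p is dominated, discard it
        skyline = [q for q in skyline if not dominates(p, q)]  # drop points p dominates
        skyline.append(p)
    return skyline

r = [("a", 1, 5), ("b", 3, 1), ("a", 4, 4)]
s = [("a", 1, 1), ("b", 1, 1)]
print(join_then_skyline(r, s))  # [(2, 6), (4, 2)]
```

Here the joined point (5, 5) is discarded because (4, 2) dominates it; a progressive framework would aim to emit (2, 6) and (4, 2) before the full join finishes.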

international conference on management of data | 2010

QRelX: generating meaningful queries that provide cardinality assurance

Manasi Vartak; Venkatesh Raghavan; Elke A. Rundensteiner

In many business and consumer applications, queries have cardinality constraints. However, current database systems provide minimal support for cardinality assurance. Consequently, users must adopt a cumbersome trial-and-error approach to find queries that are close to the original query but also attain the desired cardinality. In this demonstration, we present QRelX, a novel framework that automatically generates alternate queries meeting the cardinality and closeness criteria. QRelX employs an innovative query space transformation strategy, proximity-based search and incremental cardinality estimation to efficiently find alternate queries. Our demonstration is an interactive game that allows the audience to compete with QRelX via manual query refinement. We illustrate the importance of cardinality assurance through real-time comparisons between manual refinement and QRelX. We also highlight the novelty of our solution by visualizing the core algorithms of QRelX.
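
As a much simplified illustration of cardinality assurance, consider a query with a single predicate `value <= bound` that returns too few rows. The closest relaxation that reaches a target cardinality can then be read directly off the sorted data. This one-dimensional sketch is hypothetical and does not reproduce QRelX's query space transformation, proximity-based search, or incremental cardinality estimation, which address multi-predicate queries.

```python
import bisect
from typing import List, Optional

def relax_upper_bound(values: List[float], bound: float, target: int) -> Optional[float]:
    """Smallest upper bound >= `bound` such that `value <= new_bound` returns
    at least `target` rows; None if the table has fewer than `target` rows."""
    data = sorted(values)
    if target > len(data):
        return None
    if bisect.bisect_right(data, bound) >= target:
        return bound  # the original query already meets the cardinality constraint
    # Fewer than `target` values are <= bound, so the target-th smallest value
    # lies above `bound` and is the tightest bound that reaches the target.
    return data[target - 1]

prices = [5, 12, 7, 30, 18]
# Original query: price <= 10 returns 2 rows, but the user wants at least 4.
print(relax_upper_bound(prices, 10, 4))  # 18
```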

international conference on data engineering | 2007

FireStream: Sensor Stream Processing for Monitoring Fire Spread

Venkatesh Raghavan; Elke A. Rundensteiner; John P. Woycheese; Abhishek Mukherji

This demonstration presents FireStream, a sensor stream processing system which provides services for run-time detection, monitoring and visualization of fire spread in intelligent buildings that can be of great benefit to first responders. Our system can effectively handle large heterogeneous sensor streams using shared window execution and dynamic participant handling to yield a high-ary MJoin solution.
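
A basic MJoin operator keeps one sliding window per input stream and, when a tuple arrives on one stream, probes the windows of all other streams directly instead of routing it through a tree of binary joins. The sketch below is a minimal, hypothetical illustration of that idea only; it omits FireStream's shared window execution and dynamic participant handling. The stream names, the `(timestamp, zone_id, value)` tuple layout, the time-based window, and the assumption that tuples arrive in timestamp order are all illustrative choices.

```python
from collections import deque
from itertools import product
from typing import Deque, Dict, List, Tuple

Reading = Tuple[float, str, float]  # hypothetical layout: (timestamp, zone_id, value)

class MJoin:
    """Multi-way sliding-window equi-join on zone_id with one window per stream."""

    def __init__(self, streams: List[str], window: float) -> None:
        self.window = window
        self.windows: Dict[str, Deque[Reading]] = {name: deque() for name in streams}

    def _evict(self, now: float) -> None:
        # Drop tuples that have fallen out of the time window (now - window, now].
        for w in self.windows.values():
            while w and w[0][0] <= now - self.window:
                w.popleft()

    def insert(self, stream: str, tup: Reading) -> List[Dict[str, Reading]]:
        """Insert a tuple and return the joined results it produces (assumes timestamp order)."""
        ts, zone, _ = tup
        self._evict(ts)
        others = [name for name in self.windows if name != stream]
        # Probe every other stream's window directly; no intermediate join results are stored.
        matches = [[t for t in self.windows[name] if t[1] == zone] for name in others]
        results = []
        for combo in product(*matches):
            row = dict(zip(others, combo))
            row[stream] = tup
            results.append(row)
        self.windows[stream].append(tup)
        return results

mj = MJoin(["temp", "smoke", "co"], window=10.0)
mj.insert("temp", (1.0, "z1", 60.0))
mj.insert("smoke", (2.0, "z1", 0.3))
print(mj.insert("co", (3.0, "z1", 40.0)))  # one result combining all three streams
```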

data and knowledge engineering | 2010

A new look at generating multi-join continuous query plans: A qualified plan generation problem

Yali Zhu; Venkatesh Raghavan; Elke A. Rundensteiner

State-of-the-art relational and continuous query optimization algorithms alike have focused on producing optimal or near-optimal query plans by minimizing a single cost function. However, ensuring accurate yet real-time responses for stream processing applications requires that the system identify qualified rather than optimal query plans, the former guaranteeing that their utilization of both CPU and memory resources stays within the respective system capacities. In such scenarios, a plan that is optimal in one resource while exceeding the bound on the other is not viable. Our experimental study illustrates that, to be effective, a qualified plan optimizer must explore an extended plan search space, called the JTree space, composed not only of the standard MJoin and binary join plans but also of general join trees with mixed operator types. While our proposed dynamic-programming-based JTree-Finder algorithm is guaranteed to generate a qualified query plan if such a plan exists in the search space, its exponential time complexity makes it impractical for continuous stream environments. To facilitate run-time optimization, we therefore propose an efficient yet effective two-layer plan generation framework. The first layer exploits the positive correlation between CPU and memory usage to obtain plans that are minimal in at least one of the two resources. In the second layer, we propose two alternative polynomial-time algorithms that explore the negative correlation between the resource usages to generate query plans that satisfy both the CPU and memory constraints. The effectiveness and efficiency of the proposed algorithms are evaluated experimentally by comparing them with each other and with state-of-the-art techniques.
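
The difference between a qualified and an optimal plan can be shown with a tiny, hypothetical example: given CPU and memory estimates for a few candidate plans, a plan is qualified only if it stays within both capacities. This is plain filtering over a fixed candidate list, not the JTree-Finder algorithm or the two-layer framework; the plan names, numbers, and capacities are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Plan:
    name: str
    cpu: float     # estimated CPU usage (hypothetical units)
    memory: float  # estimated memory usage (hypothetical units)

def is_qualified(plan: Plan, cpu_cap: float, mem_cap: float) -> bool:
    # Qualified = within BOTH capacities; being minimal in one resource is not enough.
    return plan.cpu <= cpu_cap and plan.memory <= mem_cap

def pick_qualified(plans: List[Plan], cpu_cap: float, mem_cap: float) -> Optional[Plan]:
    candidates = [p for p in plans if is_qualified(p, cpu_cap, mem_cap)]
    # Any qualified plan is acceptable; lowest CPU is used only as a tie-breaker here.
    return min(candidates, key=lambda p: p.cpu, default=None)

plans = [
    Plan("mjoin", cpu=40, memory=90),        # CPU-optimal, but over the memory cap
    Plan("binary-tree", cpu=90, memory=30),  # memory-optimal, but over the CPU cap
    Plan("mixed-jtree", cpu=60, memory=50),  # neither optimal, yet within both caps
]
print(pick_qualified(plans, cpu_cap=70, mem_cap=60).name)  # mixed-jtree
```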

very large data bases | 2015

Optimization of common table expressions in MPP database systems

Amr El-Helw; Venkatesh Raghavan; Mohamed A. Soliman; George Constantin Caragea; Zhongxian Gu; Michalis Petropoulos

Big Data analytics often include complex queries with similar or identical expressions, usually referred to as Common Table Expressions (CTEs). CTEs may be explicitly defined by users to simplify query formulations, or implicitly included in queries generated by business intelligence tools, financial applications and decision support systems. In Massively Parallel Processing (MPP) database systems, CTEs pose new challenges due to the distributed nature of query processing, the overwhelming volume of underlying data and the scalability criteria that systems are required to meet. In these settings, the effective optimization and efficient execution of CTEs are crucial for the timely processing of analytical queries over Big Data. In this paper, we present a comprehensive framework for the representation, optimization and execution of CTEs in the context of Orca, Pivotal's query optimizer for Big Data. We demonstrate the benefits of our techniques experimentally using an industry-standard decision support benchmark.
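
For readers unfamiliar with CTEs, the following sketch uses Python's built-in `sqlite3` module to run a query whose CTE is defined once and referenced twice. SQLite is a single-node engine, so this only illustrates the query shape that raises the question of whether to inline a CTE at each reference or compute it once and share the result; it says nothing about how Orca plans CTEs in an MPP setting. The table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 100), ('east', 50), ('west', 30), ('north', 80);
""")

# The CTE `region_totals` is defined once and referenced twice: in the outer
# FROM clause and in the scalar subquery. An optimizer may either inline it at
# each reference or compute it once and reuse the result.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM region_totals
WHERE total > (SELECT AVG(total) FROM region_totals)
ORDER BY region;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 150.0)]
```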

international conference on data engineering | 2012

Automatic Data Placement in MPP Databases

Carlos Garcia-Alvarado; Venkatesh Raghavan; Sivaramakrishnan Narayanan; Florian Waas

Physical design for shared-nothing databases includes decisions regarding the placement of data across a cluster of database servers. In particular, for each table in the database a distribution policy must be specified. In general, the choice of distribution policy affects the performance of query workloads significantly as individual queries may have to redistribute data on-the-fly as part of the execution. As is the case with a number of other physical design decisions, the problem is hard and poses substantial difficulties for database administrators. In this paper, we present FINDER, a design tool that optimizes data placement decisions for a database schema with respect to any given query workload. We designed FINDER with portability in mind: The tool is fully external to the target database system, i.e., does not require any code-level integration with the system, and avoids reverse engineering of query optimization techniques. Our experiments show FINDER converges quickly and delivers superior results compared to state-of-the-art solutions.
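
To make "distribution policy" concrete, here is a toy, brute-force sketch that is not FINDER's method: each table is hash-distributed on one column, a join is treated as co-located only if both inputs are distributed on their join columns, and every non-co-located join is charged its frequency as a stand-in for redistribution cost. The tables, columns, and frequencies are hypothetical, and the cost model ignores cases where redistributing only one side of a join suffices.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical workload entry: (left_table, left_col, right_table, right_col, frequency)
Join = Tuple[str, str, str, str, int]

def redistribution_cost(keys: Dict[str, str], workload: List[Join]) -> int:
    """Total frequency of joins that are NOT co-located, i.e. whose inputs are not
    both hash-distributed on their join columns (and so need data movement)."""
    cost = 0
    for left, left_col, right, right_col, freq in workload:
        if not (keys[left] == left_col and keys[right] == right_col):
            cost += freq
    return cost

def best_placement(candidates: Dict[str, List[str]],
                   workload: List[Join]) -> Tuple[Optional[Dict[str, str]], Optional[int]]:
    tables = list(candidates)
    best: Optional[Dict[str, str]] = None
    best_cost: Optional[int] = None
    # Exhaustively try every combination of distribution keys (exponential in #tables).
    for choice in product(*(candidates[t] for t in tables)):
        keys = dict(zip(tables, choice))
        cost = redistribution_cost(keys, workload)
        if best_cost is None or cost < best_cost:
            best, best_cost = keys, cost
    return best, best_cost

candidates = {"orders": ["o_id", "cust_id"], "lineitem": ["o_id", "part_id"], "customer": ["cust_id"]}
workload = [
    ("orders", "o_id", "lineitem", "o_id", 10),
    ("orders", "cust_id", "customer", "cust_id", 3),
]
print(best_placement(candidates, workload))
```

The exhaustive loop is only viable for a handful of tables, which is one reason the placement problem is hard in practice.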

british national conference on databases | 2009

Multi-Join Continuous Query Optimization: Covering the Spectrum of Linear, Acyclic, and Cyclic Queries

Venkatesh Raghavan; Yali Zhu; Elke A. Rundensteiner; Daniel J. Dougherty

Traditional optimization algorithms that guarantee optimal plans have exponential time complexity and are thus not viable in streaming contexts. Continuous query optimizers commonly adopt heuristic techniques such as Adaptive Greedy to attain polynomial-time execution. However, these techniques are known to produce optimal plans only for linear and star-shaped join queries. Motivated by the prevalence of acyclic, cyclic and even complete query shapes in stream applications, we conduct an extensive experimental study of the behavior of the state-of-the-art algorithms. This study revealed that heuristic-based techniques tend to generate sub-standard plans even for simple acyclic join queries. For general acyclic join queries we extend the classical IK approach to the streaming context to define an algorithm, TreeOpt, that is guaranteed to find an optimal plan in polynomial time. For cyclic queries, for which finding optimal plans is known to be NP-complete, we present an algorithm, FAB, which improves on other heuristic-based techniques by (i) increasing the likelihood of finding an optimal plan and (ii) improving the effectiveness of finding a near-optimal plan when an optimal plan cannot be found in polynomial time. To handle the entire spectrum of query shapes from acyclic to cyclic, we propose a Q-Aware approach that selects the optimization algorithm used to generate the join order based on the shape of the query.
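
The shape test that a Q-Aware approach depends on can be sketched with union-find: a join graph is acyclic exactly when no join edge connects two relations that are already connected. In this minimal sketch the returned strings `"TreeOpt"` and `"FAB"` are only labels for the two algorithm families named above; the dispatch function and relation names are hypothetical, not the paper's implementation. Repeated join edges between the same pair of relations are counted as a cycle here.

```python
from typing import Dict, List, Tuple

def is_acyclic(relations: List[str], join_edges: List[Tuple[str, str]]) -> bool:
    """True if the join graph is a forest (no cycles), checked with union-find."""
    parent: Dict[str, str] = {r: r for r in relations}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in join_edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # a and b are already connected, so this edge closes a cycle
        parent[root_a] = root_b
    return True

def choose_optimizer(relations: List[str], join_edges: List[Tuple[str, str]]) -> str:
    # Hypothetical dispatch: tree-shaped queries vs. cyclic queries.
    return "TreeOpt" if is_acyclic(relations, join_edges) else "FAB"

print(choose_optimizer(["R", "S", "T", "U"], [("R", "S"), ("R", "T"), ("R", "U")]))  # TreeOpt
print(choose_optimizer(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))       # FAB
```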

conference on information and knowledge management | 2008

SNIF TOOL: sniffing for patterns in continuous streams

Abhishek Mukherji; Elke A. Rundensteiner; David C. Brown; Venkatesh Raghavan

Continuous time-series sequence matching, specifically, matching a numeric live stream against a set of predefined pattern sequences, is critical for domains ranging from fire spread tracking to network traffic monitoring. While several algorithms exist for similarity matching of static time-series data, matching continuous data poses new, largely unsolved challenges, including online real-time processing requirements and system resource limitations for handling infinite streams. In this work, we propose a novel live stream matching framework, called the n-Snippet Indices Framework (SNIF for short), to tackle these challenges. SNIF employs snippets as the basic unit for matching streaming time-series. The insight is to perform the matching at two levels of granularity: bag matching of subsets of snippets of the live stream against prefixes of the patterns, and order checking for maintaining successive candidate snippet bag matches. We design a two-level index structure, called the SNIF index, which supports these two modes of matching. We propose a family of online two-level prefix matching algorithms that trade off result accuracy against response time. The effectiveness of SNIF in detecting patterns has been thoroughly tested through experiments using real datasets from the domains of fire monitoring and sensor motes. In this paper, we also present a study of SNIF's performance, accuracy and tolerance to noise compared against those of the state-of-the-art Continuous Query with Prediction (CQP) approach.

international conference on data engineering | 2005

VAMANA - A Scalable Cost-Driven XPath Engine

Venkatesh Raghavan; Kurt W. Deschler; Elke A. Rundensteiner

Several systems have recently been proposed for the evaluation of XPath expressions. However, none of these systems has demonstrated both scalability with large document sizes and robust support for the XPath language. Many of the scalability problems can be attributed to inadequate use of indexing during query evaluation, while poor support for the XPath language is often a consequence of an architecture overly optimized for certain queries. Finally, the proposed systems fail to adequately address costing with respect to query optimizations. We present VAMANA as a solution for cost-driven and scalable evaluation of ad-hoc XPath expressions. VAMANA's index-oriented query plans allow queries to be evaluated while reading only a fraction of the data. VAMANA's pipelined query framework minimizes the cost of intermediate query processing while providing cost-based transformations to further improve performance. Our experimental study confirms that VAMANA's cost-driven optimization approach achieves a substantial performance improvement with negligible optimization overhead compared to non-optimized queries. Our study comparing VAMANA against several leading XML query engines demonstrates that VAMANA's query engine is significantly faster than these existing solutions in all considered cases.