Karen Works
Worcester Polytechnic Institute
Publication
Featured research published by Karen Works.
international conference on data engineering | 2011
Luping Ding; Karen Works; Elke A. Rundensteiner
Data stream management systems (DSMS) processing long-running queries over large volumes of stream data must typically deliver time-critical responses. We propose the first semantic query optimization (SQO) approach that utilizes dynamic substream metadata at runtime to find a more efficient query plan than the one selected at compilation time. We identify four SQO techniques guaranteed to result in performance gains. Based on classic satisfiability theory, we then design a lightweight query optimization algorithm that efficiently detects SQO opportunities at runtime. At the logical level, our algorithm instantiates multiple concurrent SQO plans, each processing different, partially overlapping substreams. Our novel execution paradigm employs multi-modal operators to support the execution of these concurrent SQO logical plans in a single physical plan. This highly agile execution strategy reduces resource utilization while supporting lightweight adaptivity. Our extensive experimental study in the CAPE stream processing system using both synthetic and real data confirms that our optimization techniques significantly reduce query execution times, by up to 60%, compared to the traditional approach.
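To make the core idea concrete, below is a minimal sketch (not the CAPE implementation) of one way substream metadata can be exploited at runtime: when metadata guarantees that every tuple in a substream already satisfies a query predicate, that check can be dropped from the plan applied to that substream. All names (Substream, route_tuple, the example predicates) are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch: skip predicate checks that substream metadata already guarantees.
from dataclasses import dataclass, field

@dataclass
class Substream:
    name: str
    # Constraints the source guarantees for every tuple it emits,
    # standing in for dynamic substream metadata (e.g. punctuations).
    guarantees: dict = field(default_factory=dict)

QUERY_PREDICATES = {"region": "EU", "priority": "high"}

def residual_predicates(sub: Substream) -> dict:
    """Drop predicates already guaranteed by the substream's metadata."""
    return {k: v for k, v in QUERY_PREDICATES.items()
            if sub.guarantees.get(k) != v}

def route_tuple(sub: Substream, tup: dict) -> bool:
    """Evaluate only the residual predicates in this substream's plan."""
    return all(tup.get(k) == v for k, v in residual_predicates(sub).items())

if __name__ == "__main__":
    eu = Substream("eu-feed", guarantees={"region": "EU"})
    mixed = Substream("mixed-feed")
    t = {"region": "EU", "priority": "high"}
    print(route_tuple(eu, t), len(residual_predicates(eu)))        # True 1
    print(route_tuple(mixed, t), len(residual_predicates(mixed)))  # True 2
```

The saving comes from the shorter residual predicate list on the guaranteed substream; the paper's contribution is detecting such opportunities efficiently and running the resulting concurrent logical plans in one physical plan.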
Journal of Computer and System Sciences | 2013
Karen Works; Elke A. Rundensteiner; Emmanuel Agu
Adaptive multi-route query processing (AMR) is an emerging paradigm for processing stream queries in highly fluctuating environments. The content of stream data can be unpredictable. Thus, instead of selecting a fixed plan, AMR dynamically routes batches of tuples to operators in the query network based on up-to-date system statistics. The workload of query access patterns in AMR systems is ever changing. Selecting a single best index may not efficiently support all query access patterns at all times, while maintaining multiple indices to match a variety of query access patterns increases overhead and decreases throughput. Index design, while paramount for efficient query execution, is particularly challenging in AMR systems because the indices must serve continuously evolving query access patterns. Our proposed Adaptive Multi-Route Index (AMRI) employs a bitmap time-partitioned design that serves a diverse, ever-changing workload of query access patterns while remaining lightweight in terms of maintenance and storage requirements. We propose a high-quality yet efficient assessment method, modeled after hierarchical heavy hitters, that exploits route relationships by modeling the frequency of the search access patterns used as nodes in a lattice. We also design assessment scheduling methods for AMRI based on detecting changes in the search access patterns used. Our AMRI incorporates migration strategies that seek to meet the needs of both old, partially serviced search requests and new incoming ones. Our experimental study using both synthetic and real data streams demonstrates that AMRI strikes a balance between effectively supporting dynamic stream environments and keeping the index overhead to a minimum. Using an environmental data set collected in the Intel Berkeley Research lab, our AMRI produced on average 68% more cumulative throughput than the state-of-the-art approach.
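As a rough sketch of the flavor of a "bitmap time-partitioned design", the toy index below partitions its postings by arrival time so that expired content can be dropped a whole partition at a time, keeping maintenance cheap. It only illustrates the general idea named in the abstract; the class, its methods, and the example data are assumptions, not AMRI itself.

```python
# Illustrative time-partitioned index: whole partitions expire at once.
from collections import defaultdict

class TimePartitionedIndex:
    def __init__(self, partition_span: int):
        self.partition_span = partition_span   # time units per partition
        # partition id -> attribute -> value -> set of tuple positions
        self.partitions = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.tuples = {}                       # position -> tuple
        self.next_pos = 0

    def insert(self, timestamp: int, tup: dict):
        pid = timestamp // self.partition_span
        pos, self.next_pos = self.next_pos, self.next_pos + 1
        self.tuples[pos] = tup
        for attr, val in tup.items():
            self.partitions[pid][attr][val].add(pos)

    def expire_before(self, timestamp: int):
        """Drop partitions older than the window in one shot (cheap maintenance)."""
        cutoff = timestamp // self.partition_span
        for pid in [p for p in self.partitions if p < cutoff]:
            for attr_map in self.partitions[pid].values():
                for positions in attr_map.values():
                    for pos in positions:
                        self.tuples.pop(pos, None)
            del self.partitions[pid]

    def probe(self, attr: str, val) -> list:
        """Answer one access pattern: all live tuples with attr == val."""
        hits = set()
        for part in self.partitions.values():
            hits |= part[attr][val]
        return [self.tuples[p] for p in sorted(hits) if p in self.tuples]

if __name__ == "__main__":
    idx = TimePartitionedIndex(partition_span=10)
    idx.insert(1, {"sensor": "a", "room": 1})
    idx.insert(12, {"sensor": "a", "room": 2})
    idx.expire_before(10)
    print(idx.probe("sensor", "a"))   # only the tuple from timestamp 12 remains
```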
international conference on data engineering | 2011
Karen Works; Elke A. Rundensteiner
Given the nature of high-volume streaming environments, not all tuples can be processed within the required response time. In such instances, it is crucial to dedicate resources to producing the most important results. We will demonstrate the Proactive Promotion Engine (PP), which employs a new preferential resource allocation methodology for priority processing of stream tuples. Our key contributions include: 1) our promotion continuous query language allows the specification of priorities within a query, 2) our promotion query algebra supports proactive promotion query processing, 3) our promotion query optimization locates an optimized PP query plan, and 4) our adaptive promotion control adapts online which subset of tuples is given priority within a single physical query plan. Our “Portland Home Arrest” demonstration facilitates the capture of in-flight criminals using data generated by the Virginia Tech Network Dynamics and Simulation Science Laboratory via simulation-based modeling techniques.
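A minimal, hypothetical sketch of the "pull the important tuples forward" idea follows: tuples matching a promotion predicate are dequeued ahead of ordinary tuples within the same physical plan. The predicate and tuple fields are made up for illustration and are not the Proactive Promotion Engine's API.

```python
# Illustrative two-tier queue: promoted tuples are processed before ordinary ones.
import heapq
import itertools

class PromotionQueue:
    def __init__(self, is_promoted):
        self.is_promoted = is_promoted   # callable: tuple -> bool
        self.heap = []
        self.seq = itertools.count()     # preserves arrival order within a tier

    def push(self, tup):
        tier = 0 if self.is_promoted(tup) else 1   # tier 0 is served first
        heapq.heappush(self.heap, (tier, next(self.seq), tup))

    def pop(self):
        return heapq.heappop(self.heap)[2]

if __name__ == "__main__":
    q = PromotionQueue(lambda t: t.get("suspect_nearby", False))
    q.push({"id": 1, "suspect_nearby": False})
    q.push({"id": 2, "suspect_nearby": True})
    print(q.pop()["id"], q.pop()["id"])   # 2 1: the promoted tuple jumps ahead
```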
International Journal of Cooperative Information Systems | 2014
Karen Works; Elke A. Rundensteiner
Overloaded data stream management systems (DSMS) cannot process all tuples within their response time. For some DSMS it is crucial to allocate the precious resources to processing the most significant tuples. Prior work has applied shedding and spilling to permanently drop insignificant tuples or temporarily place them on disk. However, neither approach considers that tuple significance can be multi-tiered, nor that significance determination can be costly; both treat all tuples that are not dropped as equally significant. Unlike these prior works, we take a fresh stance by pulling the most significant tuples forward throughout the query pipeline. Proactive Promotion (PP), a new DSMS methodology for preferential CPU resource allocation, selectively pulls the most significant tuples ahead of less significant tuples. Our optimizer produces an optimal PP plan that minimizes the processing latency of tuples in the most significant tiers of this multi-tiered precedence scheme by strategically placing significance determination operators throughout the query pipeline at compile-time and by agilely activating them at run-time. Our results substantiate that PP lowers the latency and increases the throughput for significant results when compared to the state-of-the-art shedding and traditional DSMS approaches (between 2- and 18-fold for a rich diversity of datasets) with negligible overhead.
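The placement question the abstract raises (significance determination is itself costly, so where in the pipeline should it run?) can be illustrated with a back-of-the-envelope cost model. The model below is entirely hypothetical and captures only the CPU side of the trade-off: checking later classifies fewer tuples, while the paper's optimizer also weighs the latency benefit of identifying significant tuples earlier.

```python
# Hypothetical per-tuple CPU cost of placing a significance check after operator `place_after`.
def expected_cost(check_cost, op_costs, place_after, selectivities):
    cost, fraction = 0.0, 1.0
    for i, op_cost in enumerate(op_costs):
        cost += fraction * op_cost          # surviving fraction pays this operator
        fraction *= selectivities[i]        # operator filters some tuples out
        if i == place_after:
            cost += fraction * check_cost   # only survivors are classified here
    return cost

if __name__ == "__main__":
    op_costs      = [1.0, 4.0, 2.0]   # per-tuple cost of each pipeline operator
    selectivities = [0.5, 0.2, 1.0]   # fraction of tuples each operator passes on
    for pos in range(len(op_costs)):
        print(pos, round(expected_cost(3.0, op_costs, pos, selectivities), 2))
```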
ieee international symposium on parallel distributed processing workshops and phd forum | 2010
Karen Works; Elke A. Rundensteiner; Emmanuel Agu
Adaptive multi-route query processing (AMR) is an emerging paradigm for processing stream queries in highly fluctuating environments. AMR dynamically routes batches of tuples to operators in the query network based on routing criteria and up-to-date system statistics. In the context of AMR systems, indexing, a core technology for efficient stream processing, has received little attention. Indexing in AMR systems is demanding, as indices must adapt to serve continuously evolving query paths while maintaining index content under high volumes of data. Our Adaptive Multi-Route Index (AMRI) employs a bitmap design that is both versatile, serving a diverse, ever-changing workload of multiple query access patterns, and lightweight in terms of maintenance and storage requirements. In addition, our AMRI index tuner exploits the hierarchical interrelationships between query access patterns to compress the statistics collected for assessment. Our experimental study using synthetic data streams has demonstrated that AMRI strikes a balance between supporting effective query processing in dynamic stream environments and keeping the overhead to a minimum.
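The statistics-compression idea rests on access patterns forming a hierarchy (a more general pattern covers its more specific refinements). A small sketch of that flavor follows: counts from specific probe patterns also credit their generalizations, so the tuner can reason at coarser lattice nodes. The functions and example log are assumptions for illustration, not the AMRI tuner.

```python
# Illustrative roll-up of access-pattern frequencies along a subset lattice.
from itertools import combinations
from collections import Counter

def ancestors(pattern):
    """All non-empty generalizations (attribute subsets) of an access pattern."""
    attrs = sorted(pattern)
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            yield frozenset(combo)

def summarize(probe_log):
    counts = Counter()
    for pattern in probe_log:
        for node in ancestors(pattern):
            counts[node] += 1
    return counts

if __name__ == "__main__":
    log = [frozenset({"sensor", "room"}), frozenset({"sensor"}),
           frozenset({"sensor", "room"})]
    for node, c in summarize(log).most_common():
        print(sorted(node), c)   # {'sensor'}: 3, {'room'}: 2, {'room','sensor'}: 2
```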
Big Data Research | 2015
Karen Works; Elke A. Rundensteiner
During periods of high volume, big data stream applications may not have enough resources to process all incoming tuples. To maximize the production of the most critical results under such resource shortages, a recent solution, PR (short for Preferential Result), utilizes both static criteria (defined at compile-time) and dynamic criteria (identified online at run-time) to prioritize the processing of tuples throughout the query pipeline. Unfortunately, locating the optimal criteria placement (i.e., where in the query pipeline to evaluate each prioritization criterion) is extremely compute-intensive and runs in exponential time. This makes PR impractical for complex big data stream systems. Our proposed criteria selection and placement approach, PR-Prune (short for Preferential Result-Pruning), is practical. PR-Prune prunes ineffective dynamic criteria and combines multiple criteria along the same pipeline. To achieve this, PR-Prune seeks to expand the portion of the query pipeline over which tuples identified as critical are pulled forward. Our experiments use a real data stream from the S&P 500 stocks, synthetic data streams, and a diverse set of queries. The results substantiate that PR-Prune increases the production of the most critical results compared to the state-of-the-art approaches. In addition, PR-Prune significantly lowers the optimization search time compared to PR.
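To see why pruning helps, note that exhaustively assigning each criterion to one of the pipeline positions grows exponentially in the number of criteria. The contrast below is illustrative only (the benefit estimates, threshold, and enumeration are assumptions, not PR-Prune's actual algorithm): discarding low-benefit criteria before enumeration shrinks the search space dramatically.

```python
# Illustrative contrast: exhaustive placement search vs. pruning low-benefit criteria first.
from itertools import product

def exhaustive_placements(criteria, positions):
    """Every assignment of each criterion to one position: |positions| ** |criteria|."""
    return list(product(positions, repeat=len(criteria)))

def prune_criteria(criteria, benefit, threshold=0.1):
    """Drop criteria whose estimated benefit is too small to matter."""
    return [c for c in criteria if benefit[c] >= threshold]

if __name__ == "__main__":
    criteria  = ["c1", "c2", "c3", "c4"]
    positions = list(range(6))
    benefit   = {"c1": 0.4, "c2": 0.02, "c3": 0.3, "c4": 0.05}
    print(len(exhaustive_placements(criteria, positions)))          # 1296 candidate plans
    kept = prune_criteria(criteria, benefit)
    print(kept, len(exhaustive_placements(kept, positions)))        # ['c1', 'c3'] 36
```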
Trans. Large-Scale Data- and Knowledge-Centered Systems | 2014
Karen Works; Elke A. Rundensteiner
Under limited resources, targeted prioritized data stream systems (TP) adjust the processing order of tuples to produce the most significant results first. In TP, an aggregation operator may not receive all tuples within an aggregation group. Typically, the aggregation operator is unaware of how many and which tuples are missing. As a consequence, computed averages over these streams could be skewed, invalid, or, worse yet, totally misleading. Such inaccurate results are unacceptable for many applications. TP-Ag is a novel aggregate operator for TP that produces reliable average calculations for normally distributed data under adverse conditions. It determines at run-time which results to produce and which subgroups in the aggregate population are used to generate each result. A carefully designed application of Cochran’s sample size methodology is used to measure the reliability of results. Each result is annotated with the subgroups used in its production. Our experimental findings substantiate that TP-Ag increases the reliability of average calculations compared to the state-of-the-art approaches for TP systems (up to 91% more accurate results).
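For intuition, Cochran's sample-size formula can be used as a gate: only report a subgroup's average if enough tuples actually arrived to support the chosen confidence level and margin of error. The sketch below is a hedged illustration of that gating idea, with made-up constants and a standard finite-population correction; it is not TP-Ag's actual reliability test.

```python
# Illustrative reliability gate based on Cochran's sample-size formula for a mean.
import math

def cochran_n(z: float, sigma: float, margin: float, population: int) -> float:
    """Required sample size for estimating a mean, with finite-population correction."""
    n0 = (z * sigma / margin) ** 2
    return n0 / (1 + (n0 - 1) / population)

def reliable_average(values, population, z=1.96, margin=1.0):
    mean = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    needed = cochran_n(z, sigma, margin, population)
    if len(values) >= needed:
        return mean     # enough tuples arrived: report the average
    return None         # too few tuples: suppress (or annotate) the result

if __name__ == "__main__":
    received = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]
    print(reliable_average(received, population=100, margin=0.5))
```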
Journal of Computer and System Sciences | 2013
Rimma V. Nehme; Karen Works; Chuan Lei; Elke A. Rundensteiner; Elisa Bertino
very large data bases | 2009
Rimma V. Nehme; Karen Works; Elke A. Rundensteiner; Elisa Bertino
Archive | 2015
Karen Works; Elke A. Rundensteiner