Abhay Mehta
Hewlett-Packard
Publication
Featured research published by Abhay Mehta.
international conference on autonomic computing | 2008
Chetan Gupta; Abhay Mehta; Umeshwar Dayal
Modern enterprise data warehouses have complex workloads that are notoriously difficult to manage. A key input to workload management is an estimate of how long a query will take to execute. An accurate estimate of this query execution time is critical to self-managing enterprise-class data warehouses. In this paper, we study the problem of predicting the execution time of a query on a loaded data warehouse with a dynamically changing workload. We use a machine learning approach that takes the query plan, combines it with the observed load vector of the system, and uses the resulting vector to predict the execution time of the query. The predictions are made as time ranges. We validate our solution using real databases and real workloads. We show experimentally that our machine learning approach works well. This technology is slated for incorporation into a commercial, enterprise-class DBMS.
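As a rough illustration of the idea in this abstract, the sketch below concatenates a query-plan feature vector with an observed load vector and predicts a time range rather than a point estimate. The abstract names neither the learner nor the features, so everything here is an assumption: the k-nearest-neighbour vote is a stand-in for the paper's model, and the bucket boundaries, function names, and feature layout are hypothetical.

```python
from collections import Counter
import math

# Hypothetical time-range buckets in seconds (not taken from the paper).
TIME_RANGES = [(0, 10), (10, 60), (60, 600), (600, math.inf)]


def to_range(seconds):
    """Map an observed execution time to the index of its time-range bucket."""
    for i, (lo, hi) in enumerate(TIME_RANGES):
        if lo <= seconds < hi:
            return i
    raise ValueError("execution time must be non-negative")


def combine(plan_features, load_vector):
    """Concatenate query-plan features with the observed system load vector."""
    return list(plan_features) + list(load_vector)


def predict_range(history, plan_features, load_vector, k=3):
    """Predict a time range by majority vote among the k nearest past queries.

    history: list of (combined_vector, observed_seconds) pairs.
    Assumption: k-NN replaces whatever learner the paper actually uses.
    """
    x = combine(plan_features, load_vector)
    nearest = sorted(history, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(to_range(seconds) for _, seconds in nearest)
    return TIME_RANGES[votes.most_common(1)[0][0]]
```

In practice the features would need scaling before a distance-based vote is meaningful. The point of the sketch is only the shape of the pipeline: plan vector plus load vector in, time range out.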
international conference on management of data | 2011
Mo Liu; Elke A. Rundensteiner; Kara Greenfield; Chetan Gupta; Song Wang; Ismail Ari; Abhay Mehta
Many modern applications, including online financial feeds, tag-based mass transit systems and RFID-based supply chain management systems, transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. Existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for sequence detection do not support OLAP operations. We propose a novel E-Cube model which combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Our analysis of the interrelationships in both concept abstraction and pattern refinement among queries facilitates the composition of these queries into an integrated E-Cube hierarchy. Based on this E-Cube hierarchy, strategies of drill-down (refinement from abstract to more specific patterns) and of roll-up (generalization from specific to more abstract patterns) are developed for efficient workload evaluation. Our proposed execution strategies reuse intermediate results along both the concept and the pattern refinement relationships between queries. Based on this foundation, we design a cost-driven adaptive optimizer called Chase that exploits the above reuse strategies for optimal E-Cube hierarchy execution. Our experimental studies comparing alternative strategies on a real-world financial data stream under different workload conditions demonstrate the superiority of the Chase method. In particular, our Chase execution in many cases performs tenfold faster than the state-of-the-art strategy for real stock market query workloads.
international conference on data engineering | 2011
Mo Liu; Elke A. Rundensteiner; Daniel J. Dougherty; Chetan Gupta; Song Wang; Ismail Ari; Abhay Mehta
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while state-of-the-art CEP systems mostly support the execution of flat sequence queries only. To assure real-time responsiveness and scalability for pattern detection even on high-volume, high-speed streams, efficient processing techniques must be designed. In this paper, we first analyze the prevailing nested pattern query processing strategy and identify several serious shortcomings. Not only are substantial subsequences first constructed just to be subsequently discarded, but opportunities for shared execution of nested subexpressions are also overlooked. As a foundation, we introduce NEEL, a CEP query language for expressing nested CEP pattern queries composed of sequence, negation, AND and OR operators. To overcome these deficiencies, we design rewriting rules for pushing negation into inner subexpressions. Next, we devise a normalization procedure that employs these rules for flattening a nested complex event expression. To reduce CPU and memory consumption, we propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions. These strategies include prefix caching, suffix clustering and customized “bit-marking” execution strategies. We design an optimizer to partition the set of all CEP subexpressions in a NEEL normal form into groups, each of which can then be mapped to one of our shared execution operators. Lastly, we evaluate our technologies by conducting a performance study to assess the CPU processing time using real-world stock trade data. Our results confirm that our NEEL execution in many cases performs 100-fold faster than the traditional iterative nested execution strategy for real stock market query workloads.
congress on evolutionary computation | 2009
Chetan Gupta; Song Wang; Ismail Ari; Ming C. Hao; Umeshwar Dayal; Abhay Mehta; Manish Marwah; Ratnesh Sharma
In this paper, we describe the design of our architecture for Continuous, Heterogeneous Analysis Over Streams (CHAOS), which combines stream processing, approximation techniques, mining, complex event processing and visualization. CHAOS, with the novel concept of a Computational Stream Analysis Cube, provides an effective, scalable platform for near-real-time processing of business and enterprise streams. We describe our approach with a real data center temperature analysis application.
international conference on data engineering | 2010
Mo Liu; Elke A. Rundensteiner; Kara Greenfield; Chetan Gupta; Song Wang; Ismail Ari; Abhay Mehta
Many modern applications, including tag-based mass transit systems, RFID-based supply chain management systems and online financial feeds, require special-purpose event stream processing technology to analyze vast amounts of sequential multi-dimensional data available in real-time data feeds. Traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while Complex Event Processing (CEP) systems are designed for sequence detection and do not support OLAP operations. We will demonstrate a novel E-Cube model that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels. A London transit scenario will be given to demonstrate the utility and performance of this proposed technology.
data management for sensor networks | 2010
Mo Liu; Medhabi Ray; Elke A. Rundensteiner; Daniel J. Dougherty; Chetan Gupta; Song Wang; Ismail Ari; Abhay Mehta
Complex event processing (CEP) has become increasingly important for tracking and monitoring applications ranging from health care and supply chain management to surveillance. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence queries arises, while state-of-the-art CEP systems mostly focus on the execution of flat sequence queries only. In this paper, we introduce an iterative execution strategy for nested CEP queries composed of sequence, negation, AND and OR operators. Lastly, we introduce the promising direction of applying selective caching of intermediate results to optimize the execution. Our experimental study using real-world stock trades evaluates the performance of our proposed iterative execution strategy for different query types.
international conference on data engineering | 2010
Chetan Gupta; Choudur Lakshminarayan; Song Wang; Abhay Mehta
In streaming and sensor data applications, the problems of synopsis construction and outlier detection are important. Due to their low complexity, desirable properties and relative ease of understanding, wavelet-based techniques are often used for both synopsis construction and anomaly detection. In the streaming data literature, Mallat's algorithm [1] is often used to achieve a Haar wavelet decomposition in O(n) time. However, this popular technique has one limitation: it leads to a dyadic decomposition of the data. We demonstrate that the property of non-dyadicity is of considerable use in synopsis construction and anomaly detection. In this regard, we present several application results, including a synopsis data structure for streaming data that is an order of magnitude superior to the popular Haar-based wavelet technique and a method for finding anomalies in sensor data over non-dyadic hierarchies. In our work, we enable non-dyadicity by proposing a Mallat-like construction for a wavelet system that admits a non-dyadic basis. Our algorithm builds a non-dyadic hierarchical structure and is more efficient than the state-of-the-art construction. We prove the correctness of our construction by showing that our basis functions demonstrate the properties of a wavelet system.
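For readers unfamiliar with the baseline this abstract refers to, here is a minimal sketch of the standard dyadic Haar pyramid. It is not the paper's non-dyadic construction. It uses unnormalized pairwise averages and half-differences (an orthonormal Haar transform would scale by 1/sqrt(2) instead), and the function names are illustrative. The power-of-two check makes the dyadic limitation concrete.

```python
def haar_decompose(signal):
    """Haar pyramid decomposition using pairwise averages and half-differences.

    Returns (overall_average, details), where details[0] is the coarsest level.
    Each pass halves the data, so total work is n/2 + n/4 + ... < n, i.e. O(n).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        # The dyadic limitation: pairing only works all the way down for n = 2^k.
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx[0], details


def haar_reconstruct(average, details):
    """Invert haar_decompose, going from the coarsest level to the finest."""
    approx = [average]
    for level in details:
        finer = []
        for a, d in zip(approx, level):
            finer.extend([a + d, a - d])
        approx = finer
    return approx
```

A synopsis is typically obtained by keeping only the largest detail coefficients and zeroing the rest before reconstruction. A signal whose natural hierarchy is not a power of two (for example, 24 hourly readings per day) has to be padded or truncated under this scheme, which is the restriction the abstract sets out to remove.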
business intelligence for the real-time enterprises | 2010
Mo Liu; Elke A. Rundensteiner; Daniel J. Dougherty; Chetan Gupta; Song Wang; Ismail Ari; Abhay Mehta
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while state-of-the-art CEP systems mostly support the execution of only flat sequence queries. In this paper, we introduce our nested CEP query language NEEL for expressing nested queries composed of sequence, negation, AND and OR operators. We also define its formal semantics. Subtle issues with negation and predicates within the nested sequence context are discussed. An E-Analytics system for processing nested CEP queries expressed in the NEEL language has been developed. Lastly, we demonstrate the utility of this technology by describing a case study of applying it to a real-world application in health care.
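To make the notion of a sequence pattern with negation concrete, the sketch below evaluates a flat pattern of the form SEQ(first, !negated, last) over a time-ordered event list. It is not NEEL syntax and does not reproduce NEEL's formal semantics. It assumes three distinct event types, strictly increasing timestamps, no time window, and that every qualifying pair is reported. The function and parameter names are hypothetical.

```python
def seq_with_negation(events, first, negated, last):
    """Return (t_first, t_last) timestamp pairs matching SEQ(first, !negated, last).

    events: list of (timestamp, event_type) tuples sorted by timestamp.
    A pair matches when `first` occurs before `last` and no `negated`
    event occurs strictly between them.
    """
    matches = []
    open_starts = []  # timestamps of `first` events not yet invalidated
    for ts, kind in events:
        if kind == negated:
            # A negated event cancels every pending start seen so far.
            open_starts.clear()
        elif kind == first:
            open_starts.append(ts)
        elif kind == last:
            matches.extend((start, ts) for start in open_starts)
    return matches
```

For example, with events A@1, B@2, A@3, C@4, the pair (1, 4) is rejected because B occurs in between, while (3, 4) is reported. Nesting such patterns inside one another is where the subtle negation issues mentioned in the abstract arise.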
visualization and data analysis | 2009
Ming C. Hao; Umeshwar Dayal; Daniel A. Keim; Ratnesh Sharma; Abhay Mehta
Most data streams are multi-dimensional and high-speed, and contain massive volumes of continuous information. They are seen in daily applications, such as telephone calls, retail sales, data center performance, and oil production operations. Many analysts want insight into the behavior of this data. They want to catch the exceptions in flight to reveal the causes of the anomalies and to take immediate action. To guide the user in finding the anomalies in a large data stream quickly, we derive a new automated neighborhood threshold marking technique, called AnomalyMarker. This technique is built on cell-based data streams and user-defined thresholds. We extend the scope of the data points around the threshold to include the surrounding areas. The idea is to define a focus area (marked area) which enables users to (1) visually group the interesting data points related to the anomalies (i.e., problems that occur persistently or occasionally) for observing their behavior; (2) discover the factors related to the anomaly by visualizing the correlations between the problem attribute and the attributes of the nearby data items from the entire multi-dimensional data stream. Mining results are quickly presented in graphical representations (e.g., tooltips) for the user to zoom into the problem regions. Different algorithms are introduced which try to optimize the size and extent of the anomaly markers. We have successfully applied this technique to detect data stream anomalies in large real-world enterprise server performance and data center energy management data.
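The marking step described above, a user-defined threshold plus the surrounding area, can be pictured with a small grid example. This is only a sketch under simplifying assumptions and not the AnomalyMarker algorithm itself: it uses a fixed square neighbourhood of hypothetical size `radius`, whereas the abstract mentions algorithms that optimize the size and extent of the markers.

```python
def mark_neighborhood(grid, threshold, radius=1):
    """Return a same-shaped boolean grid marking threshold violations plus neighbours.

    A cell is marked when any cell within `radius` rows and columns of it
    (Chebyshev distance) has a value above `threshold`.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    marked = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold:
                # Extend the marker from the violating cell to its surroundings.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        marked[rr][cc] = True
    return marked
```

The marked region then serves as the focus area: the values of other attributes inside it can be inspected for correlation with the attribute that crossed the threshold.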
databases in networked information systems | 2011
Chetan Gupta; Umeshwar Dayal; Song Wang; Abhay Mehta
The increasing instrumentation of real-world physical systems provides an opportunity for real-time operations management aimed at running large, complex systems efficiently. Real-time operations management solutions for large, complex systems such as transportation networks and massive data centers share many common characteristics and requirements. In this paper, we identify these common challenges in terms of data characteristics and system requirements. We then point out the insufficiencies of current solutions in addressing these requirements and present some results that help meet the challenges.