Jayant R. Haritsa | Researchain

Publication

Featured research published by Jayant R. Haritsa.

very large data bases | 2002

Maintaining data privacy in association rule mining

Shariq J. Rizvi; Jayant R. Haritsa

Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. We investigate here, with respect to mining association rules, whether users can be encouraged to provide correct information by ensuring that the mining process cannot, with any reasonable degree of certainty, violate their privacy. We present a scheme, based on probabilistic distortion of user data, that can simultaneously provide a high degree of privacy to the user and retain a high level of accuracy in the mining results. The performance of the scheme is validated against representative real and synthetic datasets.
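The idea of probabilistic distortion can be illustrated with a minimal sketch. It assumes boolean transaction rows and a single retention probability p, and it reconstructs only single-item support; the names distort and estimate_support are hypothetical, and the paper's actual scheme may differ in its details.

```python
import random

def distort(transactions, p, rng):
    """Keep each 0/1 entry with probability p and flip it otherwise (assumed scheme)."""
    return [[bit if rng.random() < p else 1 - bit for bit in row]
            for row in transactions]

def estimate_support(distorted, item, p):
    """Estimate the true support of one item from distorted rows.

    If s is the true support, the expected observed support is
    s * p + (1 - s) * (1 - p); solving for s gives the formula below.
    Requires p != 0.5, otherwise the distorted data carries no signal.
    """
    observed = sum(row[item] for row in distorted) / len(distorted)
    return (observed - (1 - p)) / (2 * p - 1)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: item 0 appears in 30% of 20,000 transactions.
    data = [[1 if i % 10 < 3 else 0] for i in range(20000)]
    noisy = distort(data, 0.9, rng)
    print(estimate_support(noisy, 0, 0.9))
```

No individual row can be trusted after distortion, yet the aggregate estimate stays close to the true support, which is the privacy and accuracy trade-off the abstract describes.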

international conference on management of data | 2000

Turbo-charging vertical mining of large databases

Pradeep Shenoy; Jayant R. Haritsa; S. Sudarshan; Gaurav Bhalotia; Mayank Bawa; Devavrat Shah

In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
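A minimal sketch of the vertical representation follows. It uses plain, uncompressed Python integers as bit-vectors, so it illustrates only the column-intersection idea and not VIPER's compressed "snakes" or its optimizations; the function names are hypothetical.

```python
def to_vertical(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns

def support(columns, itemset):
    """Count transactions containing every item by AND-ing the item columns."""
    vectors = [columns.get(item, 0) for item in itemset]
    if not vectors:
        raise ValueError("itemset must be non-empty")
    common = vectors[0]
    for vector in vectors[1:]:
        common &= vector
    return bin(common).count("1")

if __name__ == "__main__":
    baskets = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    cols = to_vertical(baskets)
    print(support(cols, ["a", "b"]))  # transactions 0 and 2 contain both items
```

Counting the support of an itemset reduces to intersecting columns and counting set bits, with no rescan of the horizontal transaction list.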

real-time systems symposium | 1991

Earliest deadline scheduling for real-time database systems

Jayant R. Haritsa; Miron Livny; Michael J. Carey

A new priority assignment algorithm called adaptive earliest deadline (AED) is given that stabilizes the overload performance of the earliest deadline policy in a real-time database system (RTDBS) environment. The AED algorithm uses a feedback control mechanism to achieve this objective and does not require knowledge of transaction characteristics. Using a detailed simulation model, the authors compare the performance of AED with respect to earliest deadline and other fixed priority schemes. They also present and evaluate an extension of the AED algorithm called hierarchical earliest deadline (HED), which is designed to handle applications that assign different values to transactions and where the goal is to maximize the total value of the in-time transactions.
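For orientation, the baseline earliest deadline policy can be sketched as below. This is not AED or HED, whose feedback mechanism the abstract does not detail. The sketch assumes a single non-preemptive resource, all transactions ready at time 0, known service times, and that a transaction unable to finish by its deadline is discarded; all names are hypothetical.

```python
import heapq

def run_earliest_deadline(transactions):
    """Run (name, deadline, service_time) tuples in earliest-deadline order.

    Returns (committed, missed) name lists under the assumptions above.
    """
    heap = [(deadline, name, service) for name, deadline, service in transactions]
    heapq.heapify(heap)
    clock = 0
    committed, missed = [], []
    while heap:
        deadline, name, service = heapq.heappop(heap)
        if clock + service <= deadline:
            clock += service          # transaction runs to completion in time
            committed.append(name)
        else:
            missed.append(name)       # discarded without using the resource
    return committed, missed

if __name__ == "__main__":
    load = [("t1", 4, 2), ("t2", 3, 2), ("t3", 5, 2)]
    print(run_earliest_deadline(load))
```

With the sample load, the total demand of 6 time units exceeds what the deadlines allow, so one transaction is lost; overload situations of this kind are the setting the abstract addresses.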

symposium on principles of database systems | 1990

On being optimistic about real-time constraints

Jayant R. Haritsa; Michael J. Carey; Miron Livny

Performance studies of concurrency control algorithms for conventional database systems have shown that, under most operating circumstances, locking protocols outperform optimistic techniques. Real-time database systems have special characteristics - timing constraints are associated with transactions, performance criteria are based on satisfaction of these timing constraints, and scheduling algorithms are priority driven. In light of these special characteristics, results regarding the performance of concurrency control algorithms need to be re-evaluated. We show in this paper that the following parameters of the real-time database system - its policy for dealing with transactions whose constraints are not met, its knowledge of transaction resource requirements, and the availability of resources - have a significant impact on the relative performance of the concurrency control algorithms. In particular, we demonstrate that under a policy that discards transactions whose constraints are not met, optimistic concurrency control outperforms locking over a wide range of system utilization. We also outline why, for a variety of reasons, optimistic algorithms appear well-suited to real-time database systems.

real-time systems symposium | 1990

Dynamic real-time optimistic concurrency control

Jayant R. Haritsa; Michael J. Carey; Miron Livny

The authors (1990) have shown that in real-time database systems that discard late transactions, optimistic concurrency control outperforms locking. Although the optimistic algorithm used in that study, OPT-BC, did not factor in transaction deadlines in making data conflict resolution decisions, it still outperformed a deadline-cognizant locking algorithm. A discussion is presented of why adding deadline information to optimistic algorithms is a nontrivial problem, and some alternative methods of doing so are described. A new real-time optimistic concurrency control algorithm, WAIT-50, is presented that monitors transaction conflict states and gives precedence to urgent transactions in a controlled manner. WAIT-50 is shown to provide significant performance gains over OPT-BC under a variety of operating conditions and workloads.

Real-time Systems | 1992

Data access scheduling in firm real-time database systems

Jayant R. Haritsa; Michael J. Carey; Miron Livny

A major challenge addressed by conventional database systems has been to efficiently implement the transaction model, which provides the properties of atomicity, serializability, and permanence. Real-time applications have added a complex new dimension to this challenge by placing deadlines on the response time of the database system. In this paper, we examine the problem of real-time data access scheduling, that is, the problem of scheduling the data accesses of real-time transactions in order to meet their deadlines. In particular, we focus on firm deadline real-time database applications, where transactions that miss their deadlines are discarded and the objective of the real-time database system is to minimize the number of missed deadlines. Within this framework, we use a detailed simulation model to compare the performance of several real-time locking protocols and optimistic concurrency control algorithms under a variety of real-time transaction workloads. The results of our study show that in moving from the conventional database system domain to the real-time domain, there are new performance-related forces that come into effect. Our experiments demonstrate that these factors can cause performance recommendations that were valid in a conventional database setting to be significantly altered in the corresponding real-time setting.

international conference on data engineering | 2005

A framework for high-accuracy privacy-preserving mining

Shipra Agrawal; Jayant R. Haritsa

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric perturbation matrix with minimal condition number can be identified, maximizing the accuracy even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal cost in accuracy. The quantitative utility of FRAPP, which applies to random-perturbation-based privacy-preserving mining in general, is evaluated specifically with regard to frequent-itemset mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, substantially lower errors are incurred, with respect to both itemset identity and itemset support, as compared to the prior techniques.
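The matrix view can be made concrete with a small sketch. The abstract does not give the matrix entries, so the form used here is an assumption for illustration: a symmetric n-by-n matrix whose diagonal entries are gamma times its off-diagonal entries, with every column summing to 1. Entry (i, j) is read as the probability that a record with true value j is reported as value i, and all function names are hypothetical.

```python
def perturbation_matrix(n, gamma):
    """Assumed form: diagonal gamma*x, off-diagonal x, with x = 1/(gamma + n - 1)."""
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if i == j else x for j in range(n)] for i in range(n)]

def expected_perturbed_counts(matrix, true_counts):
    """Expected histogram after perturbation: Y = A X."""
    return [sum(a * c for a, c in zip(row, true_counts)) for row in matrix]

def reconstruct_counts(perturbed_counts, gamma):
    """Invert Y = A X in closed form for the assumed matrix.

    Each Y_i equals (gamma - 1) * x * X_i + x * N, where N is the total count,
    and N is preserved because the columns of A sum to 1.
    """
    n = len(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    total = sum(perturbed_counts)
    return [(y - x * total) / ((gamma - 1) * x) for y in perturbed_counts]

def condition_number(n, gamma):
    """Largest over smallest eigenvalue: 1 versus (gamma - 1) * x, for gamma > 1."""
    return (gamma + n - 1) / (gamma - 1)

if __name__ == "__main__":
    true_counts = [50, 30, 20]
    a = perturbation_matrix(3, 5.0)
    y = expected_perturbed_counts(a, true_counts)
    print(reconstruct_counts(y, 5.0), condition_number(3, 5.0))
```

A larger gamma lowers the condition number, so sampling noise in the perturbed counts is amplified less during reconstruction, but each record is also reported truthfully more often. That tension is the accuracy and privacy trade-off the framework is designed to manage.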

international conference on management of data | 2002

StatiX: making XML count

Juliana Freire; Jayant R. Haritsa; Maya Ramanath; Prasan Roy; Jérôme Siméon

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries, to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived by regular expressions (which define elements in an XML Schema) to pinpoint places in the schema that are likely sources of structural skew. As we discuss below, this information can be used to build concise, yet accurate, statistical summaries for XML data. StatiX leverages standard XML technology for gathering statistics, notably XML Schema validators, and it uses histograms to summarize both the structure and values in an XML document. In this paper we describe the StatiX system. We develop algorithms that decompose schemas to obtain statistics at different granularities and discuss how statistics can be gathered as documents are validated. We also present an experimental evaluation which demonstrates the accuracy and scalability of our approach and show an application of these statistics to cost-based XML storage design.

very large data bases | 1993

Value-based scheduling in real-time database systems

Jayant R. Haritsa; Michael J. Carey; Miron Livny

In a real-time database system, an application may assign a value to a transaction to reflect the return it expects to receive if the transaction commits before its deadline. Most research on real-time database systems has focused on systems where all transactions are assigned the same value, the performance goal being to minimize the number of missed deadlines. When transactions are assigned different values, the goal of the system shifts to maximizing the sum of the values of those transactions that commit by their deadlines. Minimizing the number of missed deadlines becomes a secondary concern. In this article, we address the problem of establishing a priority ordering among transactions characterized by both values and deadlines that results in maximizing the realized value. Of particular interest is the tradeoff established between these values and deadlines in constructing the priority ordering. Using a detailed simulation model, we evaluate the performance of several priority mappings that make this tradeoff in different, but fixed, ways. In addition, a “bucket” priority mechanism that allows the relative importance of values and deadlines to be controlled is introduced and studied. The notion of associating a penalty with transactions whose deadlines are not met is also briefly considered.

Data Mining and Knowledge Discovery | 2009

FRAPP: a framework for high-accuracy privacy-preserving mining

Shipra Agrawal; Jayant R. Haritsa; B. Aditya Prakash

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified, substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those of direct mining on the true database.
