Ramakrishnan Srikant | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ramakrishnan Srikant is active.

Explore More

Publication

Featured researches published by Ramakrishnan Srikant.

international conference on data engineering | 1995

Mining sequential patterns

Rakesh Agrawal; Ramakrishnan Srikant

We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empirically evaluate their performance using synthetic data. Two of the proposed algorithms, AprioriSome and AprioriAll, have comparable performance, albeit AprioriSome performs a little better when the minimum number of customers that must support a sequential pattern is low. Scale-up experiments show that both AprioriSome and AprioriAll scale linearly with the number of customer transactions. They also have excellent scale-up properties with respect to the number of transactions per customer and the number of items in a transaction.<<ETX>>

international conference on management of data | 2000

Privacy-preserving data mining

Rakesh Agrawal; Ramakrishnan Srikant

A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.

extending database technology | 1996

Mining Sequential Patterns: Generalizations and Performance Improvements

Ramakrishnan Srikant; Rakesh Agrawal

The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequential pattern is“5% of customers bought ‘Foundation’ and ‘Ringworld’ in one transaction, followed by ‘Second Foundation’ in a later transaction”. We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we relax the restriction that the items in an element of a sequential pattern must come from the same transaction, instead allowing the items to be present in a set of transactions whose transaction-times are within a user-specified time window. Third, given a user-defined taxonomy (is-a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy.

international conference on management of data | 2004

Order preserving encryption for numeric data

Rakesh Agrawal; Jerry Kiernan; Ramakrishnan Srikant; Yirong Xu

Encryption is a well established technology for protecting sensitive data. However, once encrypted, data can no longer be easily queried aside from exact matches. We present an order-preserving encryption scheme for numeric data that allows any comparison operation to be directly applied on encrypted data. Query results produced are sound (no false hits) and complete (no false drops). Our scheme handles updates gracefully and new values can be added without requiring changes in the encryption of other values. It allows standard databse indexes to be built over encrypted tables and can easily be integrated with existing database systems. The proposed scheme has been designed to be deployed in application environments in which the intruder can get access to the encrypted database, but does not have prior domain information such as the distribution of values and annot encrypt or decrypt arbitrary values of his choice. The encryption is robust against estimation of the true value in such environments.

international conference on management of data | 2003

Information sharing across private databases

Rakesh Agrawal; Alexandre V. Evfimievski; Ramakrishnan Srikant

Literature on information integration across databases tacitly assumes that the data in each database can be revealed to the other databases. However, there is an increasing need for sharing information across autonomous entities in such a way that no information apart from the answer to the query is revealed. We formalize the notion of minimal information sharing across private databases, and develop protocols for intersection, equijoin, intersection size, and equijoin size. We also show how new applications can be built using the proposed protocols.

Future Generation Computer Systems | 1997

Mining generalized association rules

Ramakrishnan Srikant; Rakesh Agrawal

Abstract We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that “people who buy outerwear tend to buy shoes”. This rule may hold even if rules that “people who buy jackets tend to buy shoes”, and “people who buy clothes tend to buy shoes” do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these “extended transactions”. However, this “Basic” algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). Finally, we present a new interest-measure for rules which uses the information in the taxonomy. Given a user-specified “minimum-interest-level”, this measure prunes a large number of redundant rules; 40–60% of all the rules were pruned on two real-life datasets.

international conference on management of data | 1997

Range queries in OLAP data cubes

Ching-Tien Ho; Rakesh Agrawal; Nimrod Megiddo; Ramakrishnan Srikant

A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL. For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information which is 1/bd of the size of the d-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset. Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up the finding of max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.

very large data bases | 2002

Chapter 14 – Hippocratic Databases

Rakesh Agrawal; Jerry Kiernan; Ramakrishnan Srikant; Yirong Xu

Publisher Summary The Hippocratic Oath has guided the conduct of physicians for centuries. Inspired by its tenet of preserving privacy, it has been argued that future database systems must include responsibility for the privacy of data that they manage as a founding tenet. The explosive progress in networking, storage, and processor technologies is resulting in an unprecedented amount of digitization of information. It is estimated that the amount of information in the world is doubling every 20 months, and the size and number of databases are increasing even faster. In concert with this dramatic and escalating increase in digital data, concerns about the privacy of personal information have emerged globally. Privacy issues have been further exacerbated, now that the Internet makes it easy for new data to be automatically collected and added to databases. Privacy is the fight of individuals to determine for themselves when, how, and to what extent information about them is communicated to others. Privacy concerns are being fueled by an ever-increasing list of privacy violations, ranging from privacy accidents to illegal actions. Lax security for sensitive data is of equal concern.

international conference on management of data | 2005

Privacy preserving OLAP

Rakesh Agrawal; Ramakrishnan Srikant; Dilys Thomas

We present techniques for privacy-preserving computation of multidimensional aggregates on data partitioned across multiple clients. Data from different clients is perturbed (randomized) in order to preserve privacy before it is integrated at the server. We develop formal notions of privacy obtained from data perturbation and show that our perturbation provides guarantees against privacy breaches. We develop and analyze algorithms for reconstructing counts of subcubes over perturbed data. We also evaluate the tradeoff between privacy guarantees and reconstruction accuracy and show the practicality of our approach.

international world wide web conferences | 2001

Mining web logs to improve website organization

Ramakrishnan Srikant; Yinghui Yang

Many websites have a hierarchical organization of content. This organization may be quite different from the organization expected by visitors to the website. In particular, it is often unclear where a specific document is located. In this paper, we propose an algorithm to automatically find pages in a website whose location is different from where visitors expect to find them. The key insight is that visitors will backtrack if they do not find the information where they expect it: the point from where they backtrack is the expected location for the page. We present an algorithm for discovering such expected locations that can handle page caching by the browser. Expected locations with a significant number of hits are then presented to the website administrator. We also present algorithms for selecting expected locations (for adding navigation links) to optimize the benefit to the website or the visitor. We ran our algorithm on the Wharton business school website and found that even on this small website, there were many pages with expected locations different from their actual location.

Explore More