Srivatsan Laxman
Microsoft
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Srivatsan Laxman.
knowledge discovery and data mining | 2007
Srivatsan Laxman; P. S. Sastry; K. P. Unnikrishnan
Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types and the frequency of an episode is some suitable measure of how often the episode occurs in the data sequence. Recently,we proposed a new frequency measure for episodes based on the notion of non-overlapped occurrences of episodes in the event sequence, and showed that, such a definition, in addition to yielding computationally efficient algorithms, has some important theoretical properties in connecting frequent episode discovery with HMM learning. This paper presents some new algorithms for frequent episode discovery under this non-overlapped occurrences-based frequency definition. The algorithms presented here are better (by a factor of N, where N denotes the size of episodes being discovered) in terms of both time and space complexities when compared to existing methods for frequent episode discovery. We show through some simulation experiments, that our algorithms are very efficient. The new algorithms presented here have arguably the least possible orders of spaceand time complexities for the task of frequent episode discovery.
knowledge discovery and data mining | 2008
Srivatsan Laxman; Vikram Tankasali; Ryen W. White
This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wish to predict. The algorithm examines windows of events that precede occurrences of the target event types in historical data. The set of significant frequent episodes associated with each target event type is obtained based on formal connections between frequent episodes and Hidden Markov Models (HMMs). Each significant episode is associated with a specialized HMM, and a mixture of such HMMs is estimated for every target event type. The likelihoods of the current window of events, under these mixture models, are used to predict future occurrences of target events in the data. The only user-defined model parameter in the algorithm is the length of the windows of events used during model estimation. We first evaluate the algorithm on synthetic data that was generated by embedding (in varying levels of noise) patterns which are preselected to characterize occurrences of target events. We then present an application of the algorithm for predicting targeted user-behaviors from large volumes of anonymous search session interaction logs from a commercially-deployed web browser tool-bar.
IEEE Transactions on Knowledge and Data Engineering | 2007
Srivatsan Laxman; P. S. Sastry; K. P. Unnikrishnan
This paper is concerned with the framework of frequent episode discovery in event sequences. A new temporal pattern, called the generalized episode, is defined, which extends this framework by incorporating event duration constraints explicitly into the patterns definition. This new formalism facilitates extension of the technique of episodes discovery to applications where data appears as a sequence of events that persist for different durations (rather than being instantaneous). We present efficient algorithms for episode discovery in this new framework. Through extensive simulations, we show the expressive power of the new formalism. We also show how the duration constraint possibilities can be used as a design choice to properly focus the episode discovery process. Finally, we briefly discuss some interesting results obtained on data from manufacturing plants of General Motors.
international world wide web conferences | 2011
Nikita Mishra; Rishiraj Saha Roy; Niloy Ganguly; Srivatsan Laxman; Monojit Choudhury
We introduce an unsupervised query segmentation scheme that uses query logs as the only resource and can effectively capture the structural units in queries. We believe that Web search queries have a unique syntactic structure which is distinct from that of English or a bag-of-words model. The segments discovered by our scheme help understand this underlying grammatical structure. We apply a statistical model based on Hoeffdings Inequality to mine significant word n-grams from queries and subsequently use them for segmenting the queries. Evaluation against manually segmented queries shows that this technique can detect rare units that are missed by our Pointwise Mutual Information (PMI) baseline.
Knowledge and Information Systems | 2012
Avinash Achar; Srivatsan Laxman; P. S. Sastry
Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.
international conference on the theory and application of cryptology and information security | 2011
Raghav Bhaskar; Abhishek Bhowmick; Vipul Goyal; Srivatsan Laxman; Abhradeep Thakurta
Differential Privacy (DP) has emerged as a formal, flexible framework for privacy protection, with a guarantee that is agnostic to auxiliary information and that admits simple rules for composition. Benefits notwithstanding, a major drawback of DP is that it provides noisy responses to queries, making it unsuitable for many applications. We propose a new notion called Noiseless Privacy that provides exact answers to queries, without adding any noise whatsoever. While the form of our guarantee is similar to DP, where the privacy comes from is very different, based on statistical assumptions on the data and on restrictions to the auxiliary information available to the adversary. We present a first set of results for Noiseless Privacy of arbitrary Boolean-function queries and of linear Real-function queries, when data are drawn independently, from nearly-uniform and Gaussian distributions respectively. We also derive simple rules for composition under models of dynamically changing data.
Data Mining and Knowledge Discovery | 2012
Avinash Achar; Srivatsan Laxman; Raajay Viswanathan; P. S. Sastry
Frequent episode discovery is a popular framework for temporal pattern discovery in event streams. An episode is a partially ordered set of nodes with each node associated with an event type. Currently algorithms exist for episode discovery only when the associated partial order is total order (serial episode) or trivial (parallel episode). In this paper, we propose efficient algorithms for discovering frequent episodes with unrestricted partial orders when the associated event-types are unique. These algorithms can be easily specialized to discover only serial or parallel episodes. Also, the algorithms are flexible enough to be specialized for mining in the space of certain interesting subclasses of partial orders. We point out that frequency alone is not a sufficient measure of interestingness in the context of partial order mining. We propose a new interestingness measure for episodes with unrestricted partial orders which, when used along with frequency, results in an efficient scheme of data mining. Simulations are presented to demonstrate the effectiveness of our algorithms.
international acm sigir conference on research and development in information retrieval | 2012
Rishiraj Saha Roy; Niloy Ganguly; Monojit Choudhury; Srivatsan Laxman
This paper presents the first evaluation framework for Web search query segmentation based directly on IR performance. In the past, segmentation strategies were mainly validated against manual annotations. Our work shows that the goodness of a segmentation algorithm as judged through evaluation against a handful of human annotated segmentations hardly reflects its effectiveness in an IR-based setup. In fact, state-of the-art algorithms are shown to perform as good as, and sometimes even better than human annotations a fact masked by previous validations. The proposed framework also provides us an objective understanding of the gap between the present best and the best possible segmentation algorithm. We draw these conclusions based on an extensive evaluation of six segmentation strategies, including three most recent algorithms, vis-a-vis segmentations from three human annotators. The evaluation framework also gives insights about which segments should be necessarily detected by an algorithm for achieving the best retrieval results. The meticulously constructed dataset used in our experiments has been made public for use by the research community.
Knowledge and Information Systems | 2011
Debprakash Patnaik; Srivatsan Laxman; Naren Ramakrishnan
Mining temporal network models from discrete event streams is an important problem with applications in computational neuroscience, physical plant diagnostics, and human–computer interaction modeling. In this paper, we introduce the notion of excitatory networks which are essentially temporal models where all connections are stimulative, rather than inhibitive. The emphasis on excitatory connections facilitates learning of network models by creating bridges to frequent episode mining. Specifically, we show that frequent episodes help identify nodes with high mutual information relationships and that such relationships can be summarized into a dynamic Bayesian network (DBN). This leads to an algorithm that is significantly faster than state-of-the-art methods for inferring DBNs, while simultaneously providing theoretical guarantees on network optimality. We demonstrate the advantages of our approach through an application in neuroscience, where we show how strong excitatory networks can be efficiently inferred from both mathematical models of spiking neurons and several real neuroscience datasets.
international conference on data mining | 2009
Debprakash Patnaik; Srivatsan Laxman; Naren Ramakrishnan
Mining temporal network models from discrete event streams is an important problem with applications in computational neuroscience, physical plant diagnostics, and human-computer interaction modeling. We focus in this paper on temporal models representable as excitatory networks where all connections are stimulative, rather than inhibitory. Through this emphasis on excitatory networks, we show how they can be learned by creating bridges to frequent episode mining. Specifically, we show that frequent episodes help identify nodes with high mutual information relationships and which can be summarized into a dynamic Bayesian network (DBN). To demonstrate the practical feasibility of our approach, we show how excitatory networks can be inferred from both mathematical models of spiking neurons as well as real neuroscience datasets.