Ashwin Lall | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ashwin Lall is active.

Explore More

Publication

Featured researches published by Ashwin Lall.

measurement and modeling of computer systems | 2006

Data streaming algorithms for estimating entropy of network traffic

Ashwin Lall; Vyas Sekar; Mitsunori Ogihara; Jun Xu; Hui Zhang

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomly approximating the entropy in a time and space efficient manner, applicable for use on very high speed (greater than OC-48) links. The first algorithm for entropy estimation is inspired by the structural similarity with the seminal work of Alon et al. for estimating frequency moments, and we provide strong theoretical guarantees on the error and resource usage. Our second algorithm utilizes the observation that the performance of the streaming algorithm can be enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). We evaluate our algorithms on traffic traces from different deployment scenarios.

internet measurement conference | 2007

A data streaming algorithm for estimating entropies of od flows

Haiquan Zhao; Ashwin Lall; Mitsunori Ogihara; Oliver Spatscheck; Jia Wang; Jun Xu

Entropy has recently gained considerable significance as an important metric for network measurement. Previous research has shown its utility in clustering traffic and detecting traffic anomalies. While measuring the entropy of the traffic observed at a single point has already been studied, an interesting open problem is to measure the entropy of the traffic between every origin-destination pair. In this paper, we propose the first solution to this challenging problem. Our sketch builds upon and extends the Lp sketch of Indyk with significant additional innovations. We present calculations showing that our data streaming algorithm is feasible for high link speeds using commodity CPU/memory at a reasonable cost. Our algorithm is shown to be very accurate in practice via simulations, using traffic traces collected at a tier-1 ISP backbone link.

international conference on data engineering | 2011

Representative skylines using threshold-based preference distributions

Atish Das Sarma; Ashwin Lall; Danupon Nanongkai; Richard J. Lipton; Jim Xu

The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.

very large data bases | 2009

Randomized multi-pass streaming skyline algorithms

Atish Das Sarma; Ashwin Lall; Danupon Nanongkai; Jun Xu

We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using the tools of randomness, proved to be useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature. RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to the existing algorithms in average case and additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variation.

international conference on data engineering | 2010

Global iceberg detection over distributed data streams

Haiquan Zhao; Ashwin Lall; Mitsunori Ogihara; Jun Xu

In todays Internet applications or sensor networks we often encounter large amounts of data spread over many physically distributed nodes. The sheer volume of the data and bandwidth constraints make it impractical to send all the data to one central node for query processing. Finding distributed icebergs—elements that may have low frequency at individual nodes but high aggregate frequency—is a problem that arises commonly in practice. In this paper we present a novel algorithm with two notable properties. First, its accuracy guarantee and communication cost are independent of the way in which element counts (for both icebergs and non-icebergs) are split amongst the nodes. Second, it works even when each distributed data set is a stream (i.e., one pass data access only). Our algorithm builds upon sketches constructed for the estimation of the second frequency moment (F2) of data streams. The intuition of our idea is that when there are global icebergs in the union of these data streams the F2 of the union becomes very large. This quantity can be estimated due to the summable nature of F2 sketches. Our key innovation here is to establish tight theoretical guarantees of our algorithm, under certain reasonable assumptions, using an interesting combination of convex ordering theory and large deviation techniques.

international conference on management of data | 2012

Interactive regret minimization

Danupon Nanongkai; Ashwin Lall; Atish Das Sarma; Kazuhisa Makino

We study the notion of regret ratio proposed in [19] Nanongkai et al. [VLDB10] to deal with multi-criteria decision making in database systems. The regret minimization query proposed in [19] Nanongkai et al. was shown to have features of both skyline and top-k: it does not need information from the user but still controls the output size. While this approach is suitable for obtaining a reasonably small regret ratio, it is still open whether one can make the regret ratio arbitrarily small. Moreover, it remains open whether reasonable questions can be asked to the users in order to improve efficiency of the process. In this paper, we study the problem of minimizing regret ratio when the system is enhanced with interaction. We assume that when presented with a set of tuples the user can tell which tuple is most preferred. Under this assumption, we develop the problem of interactive regret minimization where we fix the number of questions and tuples per question that we can display, and aim at minimizing the regret ratio. We try to answer two questions in this paper: (1) How much does interaction help? That is, how much can we improve the regret ratio when there are interactions? (2) How efficient can interaction be? In particular, we measure how many questions we have to ask the user in order to make her regret ratio small enough. We answer both questions from both theoretical and practical standpoints. For the first question, we show that interaction can reduce the regret ratio almost exponentially. To do this, we prove a lower bound for the previous approach (thereby resolving an open problem from [19] Nanongkai et al.), and develop an almost-optimal upper bound that makes the regret ratio exponentially smaller. Our experiments also confirm that, in practice, interactions help in improving the regret ratio by many orders of magnitude. For the second question, we prove that when our algorithm shows a reasonable number of points per question, it only needs a few questions to make the regret ratio small. Thus, interactive regret minimization seems to be a necessary and sufficient way to deal with multi-criteria decision making in database systems.

international conference on computer communications | 2012

A simpler and better design of error estimating coding

Nan Hua; Ashwin Lall; Baochun Li; Jun Xu

We study error estimating codes with the goal of establishing better bounds for the theoretical and empirical overhead of such schemes. We explore the idea of using sketch data structures for this problem, and show that the tug-of-war sketch gives an asymptotically optimal solution. The optimality of our algorithms are proved using communication complexity lower bound techniques. We then propose a novel enhancement of the tug-of-war sketch that greatly reduces the communication overhead for realistic error rates. Our theoretical analysis and assertions are supported by extensive experimental evaluation.

meeting of the association for computational linguistics | 2014

Exponential Reservoir Sampling for Streaming Language Models

Miles Osborne; Ashwin Lall; Benjamin Van Durme

We show how rapidly changing textual streams such as Twitter can be modelled in fixed space. Our approach is based upon a randomised algorithm called Exponential Reservoir Sampling, unexplored by this community until now. Using language models over Twitter and Newswire as a testbed, our experimental results based on perplexity support the intuition that recently observed data generally outweighs that seen in the past, but that at times, the past can have valuable signals enabling better modelling of the present.

measurement and modeling of computer systems | 2012

Towards optimal error-estimating codes through the lens of Fisher information analysis

Nan Hua; Ashwin Lall; Baochun Li; Jun Xu

Error estimating coding (EEC) has recently been established as an important tool to estimate bit error rates in the transmission of packets over wireless links, with a number of potential applications in wireless networks. In this paper, we present an in-depth study of error estimating codes through the lens of Fisher information analysis and find that the original EEC estimator fails to exploit the information contained in its code to the fullest extent. Motivated by this discovery, we design a new estimator for the original EEC algorithm, which significantly improves the estimation accuracy, and is empirically very close to the Cramer-Rao bound. Following this path, we generalize the EEC algorithm to a new family of algorithms called gEEC generalized EEC. These algorithms can be tuned to hold 25-35% more information with the same overhead, and hence deliver even better estimation accuracy---close to optimal, as evidenced by the Cramer-Rao bound. Our theoretical analysis and assertions are supported by extensive experimental evaluation.

international symposium on distributed computing | 2012

Dense subgraphs on dynamic networks

Atish Das Sarma; Ashwin Lall; Danupon Nanongkai; Amitabh Trehan

In distributed networks, it is often useful for the nodes to be aware of dense subgraphs, e.g., such a dense subgraph could reveal dense substructures in otherwise sparse graphs (e.g. the World Wide Web or social networks); these might reveal community clusters or dense regions for possibly maintaining good communication infrastructure. In this work, we address the problem of self-awareness of nodes in a dynamic network with regards to graph density, i.e., we give distributed algorithms for maintaining dense subgraphs that the member nodes are aware of. The only knowledge that the nodes need is that of the dynamic diameterD, i.e., the maximum number of rounds it takes for a message to traverse the dynamic network. For our work, we consider a model where the number of nodes are fixed, but a powerful adversary can add or remove a limited number of edges from the network at each time step. The communication is by broadcast only and follows the CONGEST model. Our algorithms are continuously executed on the network, and at any time (after some initialization) each node will be aware if it is part (or not) of a particular dense subgraph. We give algorithms that (2+e)-approximate the densest subgraph and (3+e)-approximate the at-least-k-densest subgraph (for a given parameter k). Our algorithms work for a wide range of parameter values and run in O(Dlog1+en) time. Further, a special case of our results also gives the first fully decentralized approximation algorithms for densest and at-least-k-densest subgraph problems for static distributed graphs.

Explore More