Nisheeth Shrivastava
Bell Labs
Publication
Featured research published by Nisheeth Shrivastava.
international conference on embedded networked sensor systems | 2004
Nisheeth Shrivastava; Chiranjeeb Buragohain; Divyakant Agrawal; Subhash Suri
Wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively. Sensors, however, have significant power constraints (battery life), making communication very expensive. Another important issue in the context of sensor-based information systems is that individual sensor readings are inherently unreliable. In order to address these two aspects, sensor database systems like TinyDB and Cougar enable in-network data aggregation to reduce the communication cost and improve reliability. The existing data aggregation techniques, however, are limited to relatively simple types of queries such as SUM, COUNT, AVG, and MIN/MAX. In this paper we propose a data aggregation scheme that significantly extends the class of queries that can be answered using sensor networks. These queries include (approximate) quantiles, such as the median, the most frequent data values, such as the consensus value, a histogram of the data distribution, as well as range queries. In our scheme, each sensor aggregates the data it has received from other sensors into a fixed (user-specified) size message. We provide strict theoretical guarantees on the approximation quality of the queries in terms of the message size. We evaluate the performance of our aggregation scheme by simulation and demonstrate its accuracy, scalability, and low resource utilization for highly variable input data sets.
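To make the fixed-size in-network aggregation idea concrete, here is a minimal sketch, not the paper's summary structure and without its approximation guarantees: each sensor forwards at most k representative values, and a parent merges its children's summaries back down to k values, so the message size stays fixed regardless of how many readings have been aggregated.

```python
# Illustrative fixed-size quantile aggregation (simplified, no error guarantees).

def compress(sorted_values, k):
    """Keep k evenly spaced order statistics of an already sorted list."""
    n = len(sorted_values)
    if n <= k:
        return list(sorted_values)
    return [sorted_values[round(i * (n - 1) / (k - 1))] for i in range(k)]

def merge(summaries, k):
    """Merge child summaries (each a sorted list) into one fixed-size summary."""
    merged = sorted(v for s in summaries for v in s)
    return compress(merged, k)

def approx_quantile(summary, q):
    """Read an approximate q-quantile (0 <= q <= 1) from a summary."""
    idx = min(len(summary) - 1, int(q * len(summary)))
    return summary[idx]

# Example: three sensors, message size k = 8.
readings = [sorted(i * 3 % 50 for i in range(40)),
            sorted(i * 7 % 90 for i in range(60)),
            sorted(i * 11 % 70 for i in range(30))]
root = merge([compress(r, 8) for r in readings], 8)
print("approximate median:", approx_quantile(root, 0.5))
```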
international conference on management of data | 2008
Saket Navlakha; Rajeev Rastogi; Nisheeth Shrivastava
We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. On the other hand, the corrections portion specifies the list of edge-corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
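The two-part representation is easy to see in reverse: expand every superedge into all cross pairs between its two supernodes, then apply the edge corrections. The sketch below illustrates that reconstruction step only; the data-structure names are assumptions, not the paper's notation.

```python
# Reconstruct a graph from a summary (supernodes + superedges) and corrections.
from itertools import combinations

def reconstruct(supernodes, superedges, corrections):
    """supernodes: {supernode: set of original nodes}
       superedges: set of (supernode, supernode) pairs; a self-pair means all
                   internal pairs of that supernode are connected
       corrections: list of ('+'/'-', u, v) edge fixes on original nodes."""
    edges = set()
    for a, b in superedges:
        if a == b:
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            pairs = ((u, v) for u in supernodes[a] for v in supernodes[b])
        for u, v in pairs:
            edges.add(frozenset((u, v)))
    for sign, u, v in corrections:
        (edges.add if sign == '+' else edges.discard)(frozenset((u, v)))
    return edges

# Example: two supernodes, one superedge, one missing edge stored as a correction.
S = {'A': {1, 2}, 'B': {3, 4}}
E = {('A', 'B')}
C = [('-', 2, 4)]
print(sorted(tuple(sorted(e)) for e in reconstruct(S, E, C)))
```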
ACM Transactions on Sensor Networks | 2009
Nisheeth Shrivastava; Raghuraman Mudumbai; Upamanyu Madhow; Subhash Suri
We explore fundamental performance limits of tracking a target in a two-dimensional field of binary proximity sensors, and design algorithms that attain those limits while providing minimal descriptions of the estimated target trajectory. Using geometric and probabilistic analysis of an idealized model, we prove that the achievable spatial resolution in localizing a target's trajectory is of the order of 1/(ρR), where R is the sensing radius and ρ is the sensor density per unit area. We provide a geometric algorithm for computing an economical (in descriptive complexity) piecewise linear path that approximates the trajectory within this fundamental limit of accuracy. We employ analogies between binary sensing and sampling theory to contend that only a "lowpass" approximation of the trajectory is attainable, and explore the implications of this observation for estimating the target's velocity. We also consider nonideal sensing, employing particle filters to average over noisy sensor observations, and geometric postprocessing of the particle filter output to provide an economical piecewise linear description of the trajectory. In addition to simulation results validating our approaches for both idealized and nonideal sensing, we report on lab-scale experiments using motes with acoustic sensors.
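As a toy illustration of localization from purely binary readings, the sketch below estimates the target as the centroid of the sensors that currently fire (those within radius R). It shows why resolution scales with density and radius; it is not the paper's geometric algorithm or its particle-filter approach.

```python
# Centroid localization from binary proximity readings (illustrative only).
import math
import random

random.seed(1)
R, N = 0.15, 400                                   # sensing radius, sensor count
sensors = [(random.random(), random.random()) for _ in range(N)]

def estimate(target):
    """Return the centroid of all sensors whose disk of radius R covers the target."""
    hits = [s for s in sensors if math.dist(s, target) <= R]
    if not hits:
        return None                                # target not covered by any sensor
    return (sum(x for x, _ in hits) / len(hits),
            sum(y for _, y in hits) / len(hits))

# Track a straight-line trajectory and report the localization error per step.
for t in range(5):
    true = (0.2 + 0.15 * t, 0.5)
    est = estimate(true)
    if est:
        print(f"t={t} error={math.dist(true, est):.3f}")
```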
international conference on data engineering | 2008
Nisheeth Shrivastava; Anirban Majumder; Rajeev Rastogi
Modern communication networks are vulnerable to attackers who send unsolicited messages to innocent users, wasting network resources and user time. Some examples of such attacks are spam emails, annoying tele-marketing phone calls, viral marketing in social networks, etc. Existing techniques to identify these attacks are tailored to certain specific domains (like email spam filtering), but are not applicable to a majority of other networks. We provide a generic abstraction of such attacks, called the Random Link Attack (RLA), that can be used to describe a large class of attacks in communication networks. In an RLA, the malicious user creates a set of false identities and uses them to communicate with a large, random set of innocent users. We mine the social networking graph extracted from user interactions in the communication network to find RLAs. To the best of our knowledge, this is the first attempt to conceptualize the attack definition, applicable to a variety of communication networks. In this paper, we formally define RLA and show that the problem of finding an RLA is NP-complete. We also provide two efficient heuristics to mine subgraphs satisfying the RLA property; the first (GREEDY) is based on greedy set-expansion, and the second (TRWALK) on randomized graph traversal. Our experiments with a real-life data set demonstrate the effectiveness of these algorithms.
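A crude way to see why random link attacks are detectable from graph structure alone: an attacker identity contacts many users chosen at random, so its neighbors are almost never connected to each other, whereas a genuine user's contacts usually are. The filter below scores nodes by local clustering; it is only a simplified illustration in the spirit of set expansion, not the paper's GREEDY or TRWALK algorithms, and the thresholds are assumptions.

```python
# Flag high-degree nodes whose neighborhoods have near-zero clustering.
from itertools import combinations

def clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 1.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def rla_candidates(adj, min_degree=3, max_cc=0.05):
    return [v for v in adj if len(adj[v]) >= min_degree
            and clustering(adj, v) <= max_cc]

# Toy graph: a, b, c form a clique; x contacts four users who do not know each other.
adj = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'},
       'x': {'d', 'e', 'f', 'g'},
       'd': {'x'}, 'e': {'x'}, 'f': {'x'}, 'g': {'x'}}
print(rla_candidates(adj))        # only 'x' is flagged
```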
Algorithmica | 2006
John Hershberger; Nisheeth Shrivastava; Subhash Suri; Csaba D. Tóth
We propose a space-efficient scheme for summarizing multidimensional data streams. Our sketch can be used to solve spatial versions of several classical data stream queries efficiently. For instance, we can track ε-hot spots, which are congruent boxes containing at least an ε fraction of the stream, and maintain hierarchical heavy hitters in d dimensions. Our sketch can also be viewed as a multidimensional generalization of the ε-approximate quantile summary. The space complexity of our scheme is O((1/ε) log R) if the points lie in the domain [0, R]^d, where d is assumed to be a constant. The scheme extends to the sliding window model with a log(εn) factor increase in space, where n is the size of the sliding window. Our sketch can also be used to answer ε-approximate rectangular range queries over a stream of d-dimensional points.
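The query semantics are easy to state offline: an ε-hot spot is a dyadic box, at any level of a hierarchical grid, that contains at least an ε fraction of the points. The exact version below only illustrates what the sketch answers approximately; the paper's contribution is maintaining this with small space over a stream, which this snippet does not attempt.

```python
# Exact, offline eps-hot-spot reporting on a dyadic grid over [0, R)^2.
from collections import Counter

def hot_spots(points, R, eps, levels=4):
    n = len(points)
    report = []
    for lvl in range(1, levels + 1):
        cell = R / (2 ** lvl)                       # side length of level-lvl cells
        counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
        for (i, j), c in counts.items():
            if c >= eps * n:
                report.append((lvl, (i * cell, j * cell),
                               ((i + 1) * cell, (j + 1) * cell), c))
    return report

pts = [(10 + k % 5, 20 + k % 7) for k in range(80)] + [(200, 300), (400, 150)]
for lvl, lo, hi, c in hot_spots(pts, R=512, eps=0.5):
    print(f"level {lvl}: box {lo}-{hi} holds {c} points")
```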
international conference on data engineering | 2007
Chiranjeeb Buragohain; Nisheeth Shrivastava; Subhash Suri
We propose new algorithms for constructing maximum-error (L∞) histograms in the data stream model. Our first algorithm (Min-Merge) achieves the following performance guarantee: using O(B) memory, it constructs a 2B-bucket histogram whose approximation error is at most the error of the optimal B-bucket histogram. Our second algorithm (Min-Increment) achieves a (1 + ε)-approximation of a B-bucket histogram using O((1/ε) B log U) space, where U is the size of the domain for data values. The memory requirements of these algorithms are a significant improvement over the previous best schemes for constructing near-optimal histograms in the data stream model, making them ideal for data summary applications where memory is at a premium, such as wireless sensor networks. Our Min-Increment algorithm also extends to the sliding window model without any asymptotic increase in space. Finally, using synthetic and real-world data, we show that our algorithms are indeed as space-efficient in practice as their theoretical analysis predicts - compared to previous best algorithms, they require two or more orders of magnitude less memory for the same approximation error.
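The Min-Merge greedy rule can be sketched in a few lines: keep at most 2B buckets, open a new bucket for every arriving value, and when the budget is exceeded collapse the adjacent pair whose merged value range (and hence L∞ error) is smallest. This is an illustrative version of the greedy step only and carries none of the paper's analysis.

```python
# Greedy 2B-bucket L-infinity histogram over a stream of values.

def linf_histogram(stream, B):
    buckets = []                          # each bucket: [lo_value, hi_value, count]
    for v in stream:
        buckets.append([v, v, 1])
        if len(buckets) > 2 * B:
            # merge the adjacent pair whose combined value range is smallest
            j = min(range(len(buckets) - 1),
                    key=lambda i: max(buckets[i][1], buckets[i + 1][1])
                               - min(buckets[i][0], buckets[i + 1][0]))
            a, b = buckets[j], buckets[j + 1]
            buckets[j:j + 2] = [[min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2]]]
    # the best constant for a bucket is the midpoint of its range;
    # its L-infinity error is half the range
    return [((lo + hi) / 2, cnt, (hi - lo) / 2) for lo, hi, cnt in buckets]

data = [1, 2, 1, 3, 50, 52, 51, 100, 98, 99, 97, 2, 1]
for value, count, err in linf_histogram(data, B=3):
    print(f"bucket value={value} covers {count} items, max error={err}")
```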
symposium on principles of database systems | 2005
John Hershberger; Nisheeth Shrivastava; Subhash Suri; Csaba D. Tóth
Heavy hitters, which are items occurring with frequency above a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a φ-HHH if its frequency, after discounting the frequencies of all its descendant hierarchical heavy hitters, exceeds φn, where φ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing φ-HHHs using space roughly O((1/φ) log(φn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, and not the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy hitters with better accuracy. We show that a single-pass deterministic scheme that computes φ-HHHs in a d-dimensional hierarchy with any approximation guarantee must use Ω(1/φ^(d+1)) space. This bound is tight: in fact, we present a data stream algorithm that can report the φ-HHHs without false positives in O(1/φ^(d+1)) space.
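The discounted-frequency definition is concrete enough to compute exactly offline, which is useful for seeing where false positives come from: a prefix counts as an HHH only on the mass not already attributed to HHH descendants. The example below does this bottom-up on a small one-dimensional prefix hierarchy; the paper's streaming algorithms approximate this within bounded space.

```python
# Exact offline phi-HHHs on a prefix hierarchy (items are tuples of labels).
from collections import Counter

def hhh(items, phi, depth):
    n = len(items)
    counts = Counter()
    for item in items:
        for d in range(depth + 1):
            counts[item[:d]] += 1                    # count every ancestor prefix
    result, discounted_away = {}, Counter()
    for prefix in sorted(counts, key=len, reverse=True):     # deepest prefixes first
        disc = counts[prefix] - discounted_away[prefix]
        if disc > phi * n:
            result[prefix] = disc
            for d in range(len(prefix)):             # charge mass to proper ancestors
                discounted_away[prefix[:d]] += disc
    return result

ips = ['10.0', '10.0', '10.0', '10.1', '10.1', '20.3', '20.4', '20.5']
print(hhh([tuple(ip.split('.')) for ip in ips], phi=0.25, depth=2))
```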
international conference on data mining | 2010
Samik Datta; Anirban Majumder; Nisheeth Shrivastava
Viral Marketing, the idea of exploiting social interactions of users to propagate awareness for products, has gained considerable attention in recent years. One of the key issues in this area is to select the best seeds that maximize the influence propagated in the social network. In this paper, we define the seed selection problem (called t-Influence Maximization, or t-IM) for multiple products. Specifically, given the social network and t products along with their seed requirements, we want to select seeds for each product that maximize the overall influence. As the seeds are typically sent promotional messages, to avoid spamming users, we put a hard constraint on the number of products for which any single user can be selected as a seed. In this paper, we design two efficient techniques for the t-IM problem, called Greedy and FairGreedy. The Greedy algorithm uses simple greedy hill climbing, but still results in a 1/3-approximation to the optimum. Our second technique, FairGreedy, allocates seeds with not only high overall influence (close to Greedy in practice), but also ensures fairness across the influence of different products. We also design efficient heuristics for estimating the influence of the selected seeds, which are crucial for running seed selection on large social network graphs. Finally, using extensive simulations on real-life social graphs, we show the effectiveness and scalability of our techniques compared to existing and naive strategies.
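The hill-climbing structure is the part that is easy to sketch: repeatedly pick the (product, user) pair with the largest marginal gain, subject to each product's seed requirement and the per-user cap. The snippet below uses a deliberately crude influence proxy (number of not-yet-covered neighbors) rather than the paper's influence estimators, so it only illustrates the greedy loop, not its 1/3-approximation guarantee.

```python
# Greedy multi-product seed selection with a per-user cap (toy influence proxy).

def greedy_seeds(adj, requirements, cap):
    """adj: {user: set of neighbors}; requirements: {product: #seeds needed};
       cap: max number of products any single user may seed."""
    seeds = {p: set() for p in requirements}
    covered = {p: set() for p in requirements}
    used = {u: 0 for u in adj}
    while any(len(seeds[p]) < requirements[p] for p in requirements):
        best = None
        for p in requirements:
            if len(seeds[p]) >= requirements[p]:
                continue
            for u in adj:
                if used[u] >= cap or u in seeds[p]:
                    continue
                gain = len(adj[u] - covered[p])        # marginal coverage for product p
                if best is None or gain > best[0]:
                    best = (gain, p, u)
        if best is None:                               # no feasible seed left
            break
        _, p, u = best
        seeds[p].add(u)
        used[u] += 1
        covered[p] |= adj[u]
    return seeds

adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 5}, 4: {1}, 5: {3}}
print(greedy_seeds(adj, requirements={'A': 1, 'B': 2}, cap=1))
```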
symposium on usable privacy and security | 2013
Lalit Agarwal; Nisheeth Shrivastava; Sharad Jaiswal; Saurabh Panjwani
Recent studies have highlighted user concerns with respect to third-party tracking and online behavioral advertising (OBA) and the need for better consumer choice mechanisms to address these phenomena. We re-investigate the question of perceptions of third-party tracking while situating it in the larger context of how online ads, in general, are perceived by users. Via in-depth interviews with 53 Web users in India, we find that although concerns for third-party tracking and OBA remain noticeable amongst this population, other aspects of online advertising, such as the possibility of being shown ads with embarrassing or suggestive content, are voiced as greater concerns than the concern of being tracked. Current-day blocking tools are insufficient to redress the situation: users demand selective filtering of ad content (as opposed to blocking out all ads) and are not satisfied with mechanisms that only control tracking and OBA. We conclude with design recommendations for end-user tools to control online ad consumption, keeping in mind the concerns brought forth by our study.
ACM Transactions on Sensor Networks | 2008
Nisheeth Shrivastava; Subhash Suri; Csaba D. Tóth
We propose a low overhead scheme for detecting a network partition or cut in a sensor network. Consider a network S of n sensors, modeled as points in a two-dimensional plane. An ε-cut, for any 0 < ε < 1, is a linear separation of εn nodes in S from a distinguished node, the base station. We show that the base station can detect whenever an ε-cut occurs by monitoring the status of just O(1/ε) nodes in the network. Our scheme is deterministic and it is free of false positives: no reported cut has size smaller than εn/2. Besides this combinatorial result, we also propose efficient algorithms for finding the O(1/ε) nodes that should act as sentinels, and report on our simulation results, comparing the sentinel algorithm with two natural schemes based on sampling.
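The definition itself is simple to check exactly: a half-plane is an ε-cut if it separates at least εn sensors from the base station. The sketch below tests a given cut line directly; the paper's contribution is detecting such cuts by monitoring only O(1/ε) sentinel nodes, which this snippet does not attempt.

```python
# Check the eps-cut definition for a given half-plane a*x + b*y > c.
import random

random.seed(7)
n, eps = 200, 0.1
sensors = [(random.random(), random.random()) for _ in range(n)]
base = (0.5, 0.05)                                     # base station near the bottom edge

def is_eps_cut(a, b, c):
    """Does the half-plane a*x + b*y > c separate at least eps*n sensors from the base?"""
    base_side = (a * base[0] + b * base[1] > c)
    separated = sum(1 for x, y in sensors if (a * x + b * y > c) != base_side)
    return separated >= eps * n, separated

# Horizontal line y = 0.9 cuts off the top strip of the field.
ok, k = is_eps_cut(0.0, 1.0, 0.9)
print(f"cut separates {k} of {n} sensors; eps-cut: {ok}")
```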