Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Rina Panigrahy is active.

Publication


Featured research published by Rina Panigrahy.


Symposium on the Theory of Computing | 1997

Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web

David R. Karger; Eric Lehman; Tom Leighton; Rina Panigrahy; Matthew S. Levine; Daniel M. Lewin

We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.
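The core "minimal change" property is easy to see in code. Below is a minimal Python sketch of a consistent-hash ring (an illustration, not the paper's implementation; the cache names and replica count are hypothetical): each cache owns the arcs of the ring ending at its points, so removing a cache only remaps the keys that cache was serving.

```python
import hashlib
from bisect import bisect_right

def _hash(key):
    # Map a string to a point on the ring [0, 2^32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (1 << 32)

class ConsistentHashRing:
    """Sketch of consistent hashing: keys and caches hash onto one ring,
    and a key is served by the first cache point clockwise from it."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas        # virtual points per cache, for balance
        self._ring = []                 # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get(self, key):
        # First cache point clockwise from the key's position.
        points = [p for p, _ in self._ring]
        i = bisect_right(points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

By construction, removing one cache leaves every other key's successor point unchanged, which is exactly the "changes minimally" behavior the abstract describes.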


ACM Special Interest Group on Data Communication | 2008

Spamming botnets: signatures and characteristics

Yinglian Xie; Fang Yu; Kannan Achan; Rina Panigrahy; Geoff Hulten; Ivan Osipkov

In this paper, we focus on characterizing spamming botnets by leveraging both spam payload and spam server traffic properties. Towards this goal, we developed a spam signature generation framework called AutoRE to detect botnet-based spam emails and botnet membership. AutoRE does not require pre-classified training data or white lists. Moreover, it outputs high quality regular expression signatures that can detect botnet spam with a low false positive rate. Using a three-month sample of emails from Hotmail, AutoRE successfully identified 7,721 botnet-based spam campaigns together with 340,050 unique botnet host IP addresses. Our in-depth analysis of the identified botnets revealed several interesting findings regarding the degree of email obfuscation, properties of botnet IP addresses, sending patterns, and their correlation with network scanning traffic. We believe these observations are useful information in the design of botnet detection schemes.
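As a toy illustration of campaign-level signature extraction (a drastic simplification: AutoRE generates much richer regular-expression signatures and also exploits server-traffic properties, and the function and URLs below are invented for the example), one can extract the longest substring shared by every URL in a suspected campaign:

```python
import re
from difflib import SequenceMatcher

def campaign_signature(urls):
    """Toy sketch: reduce a suspected spam campaign's URLs to the longest
    substring they all share, escaped into a keyword-style regex."""
    shared = urls[0]
    for u in urls[1:]:
        m = SequenceMatcher(None, shared, u).find_longest_match(
            0, len(shared), 0, len(u))
        shared = shared[m.a:m.a + m.size]
    return re.escape(shared)
```

A real system must additionally guard against overly generic signatures (e.g., a shared "http://"), which is part of what makes low-false-positive regex generation hard.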


International Conference on Database Theory | 2005

Anonymizing tables

Gagan Aggarwal; Tomás Feder; Krishnaram Kenthapadi; Rajeev Motwani; Rina Panigrahy; Dilys Thomas; An Zhu

We consider the problem of releasing tables from a relational database containing personal records, while ensuring individual privacy and maintaining data integrity to the extent possible. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information for each person contained in the release cannot be distinguished from at least k–1 other persons whose information also appears in the release. In the k-Anonymity problem the objective is to minimally suppress cells in the table so as to ensure that the released version is k-anonymous. We show that the k-Anonymity problem is NP-hard even when the attribute values are ternary. On the positive side, we provide an O(k)-approximation algorithm for the problem. This improves upon the previous best-known O(k log k)-approximation. We also give improved positive results for the interesting cases with specific values of k — in particular, we give a 1.5-approximation algorithm for the special case of 2-Anonymity, and a 2-approximation algorithm for 3-Anonymity.
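The condition being optimized is simple to state in code. Here is a minimal checker for k-anonymity of a released table (this only verifies the property; the paper's contribution is the approximation algorithm that chooses which cells to suppress, which is not shown):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Check k-anonymity: every combination of quasi-identifier values
    (suppressed cells written as '*') must be shared by at least k rows."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers)
                     for row in rows)
    return all(count >= k for count in groups.values())
```

For example, a release in which one (zip, age) combination appears only once is not 2-anonymous, because that person's record is distinguishable from all others.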


European Symposium on Algorithms | 2006

An improved construction for counting Bloom filters

Flavio Bonomi; Michael Mitzenmacher; Rina Panigrahy; Sushil Singh; George Varghese

A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
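For contrast, here is the baseline structure the dlCBF improves on: a plain counting Bloom filter, where each of k hash positions holds a counter instead of a bit so that deletions become possible (a sketch with illustrative parameters; the paper's d-left construction instead stores fingerprints in d subtables, which is where the factor-of-two space saving comes from):

```python
import hashlib

class CountingBloomFilter:
    """Plain counting Bloom filter: supports insert, delete, and
    membership queries with false positives but no false negatives."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item):
        # Derive k positions from one cryptographic hash of the item.
        h = hashlib.sha256(str(item).encode()).digest()
        return [int.from_bytes(h[4*i:4*i + 4], "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.counts[p] += 1

    def remove(self, item):
        for p in self._positions(item):
            self.counts[p] -= 1

    def __contains__(self, item):
        # False positives possible when all k counters happen to be set.
        return all(self.counts[p] > 0 for p in self._positions(item))
```

The space cost of this baseline is roughly four bits of counter per position where a Bloom filter needs one bit, which is the overhead the dlCBF attacks.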


IEEE Transactions on Information Theory | 2005

The smallest grammar problem

Moses Charikar; Eric Lehman; Ding Liu; Rina Panigrahy; Manoj Prabhakaran; Amit Sahai; Abhi Shelat

This paper addresses the smallest grammar problem: What is the smallest context-free grammar that generates exactly one given string σ? This is a natural question about a fundamental object connected to many fields such as data compression, Kolmogorov complexity, pattern identification, and addition chains. Due to the problem's inherent complexity, our objective is to find an approximation algorithm which finds a small grammar for the input string. We focus attention on the approximation ratio of the algorithm (and implicitly, the worst case behavior) to establish provable performance guarantees and to address shortcomings in the classical measure of redundancy in the literature. Our first results concern the hardness of approximating the smallest grammar problem. Most notably, we show that every efficient algorithm for the smallest grammar problem has approximation ratio at least 8569/8568 unless P=NP. We then bound approximation ratios for several of the best known grammar-based compression algorithms, including LZ78, BISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, and RE-PAIR. Among these, the best upper bound we show is O(n^{1/2}). We finish by presenting two novel algorithms with exponentially better ratios of O(log³ n) and O(log(n/m*)), where m* is the size of the smallest grammar for that input. The latter algorithm highlights a connection between grammar-based compression and LZ77.
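A concrete instance makes the object clear. In this problem a grammar has exactly one production per nonterminal, so it generates a single string, and its size is conventionally the total length of all right-hand sides (a worked example, not the paper's algorithms):

```python
def expand(grammar, symbol="S"):
    """Expand a grammar with one production per nonterminal into the
    unique string it generates."""
    return "".join(expand(grammar, ch) if ch in grammar else ch
                   for ch in grammar[symbol])

def grammar_size(grammar):
    # Standard size measure: total length of all right-hand sides.
    return sum(len(rhs) for rhs in grammar.values())

# A size-7 grammar for the 12-character string "abcabcabcabc":
# S -> TT, T -> UU, U -> abc.
g = {"S": "TT", "T": "UU", "U": "abc"}
```

The smallest grammar problem asks for the minimum such size over all grammars generating the given string, and the paper shows that even approximating this minimum within 8569/8568 is NP-hard.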


Symposium on the Theory of Computing | 2003

Better streaming algorithms for clustering problems

Moses Charikar; Liadan O'Callaghan; Rina Panigrahy

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k-Median problem which produces a constant factor approximation in one pass using storage space O(k poly log n). This is a significant improvement of the previous best algorithm which yielded a 2^{O(1/ε)} approximation using O(n^ε) space. Next we give a streaming algorithm for the k-Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.
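To give a flavor of one-pass clustering in O(k)-size memory, here is the classic doubling algorithm for the related k-Center objective (a sketch of the general streaming-clustering style only; the paper's k-Median algorithm is different and more involved): maintain at most k centers that are pairwise far apart, and when a new point forces a (k+1)-th center, double the working radius and thin the center set.

```python
def streaming_k_center(points, k, dist):
    """One-pass doubling algorithm for k-center, using O(k) stored points.
    Returns the centers and the final working radius r; every input point
    ends up within O(r) of some center."""
    centers, r = [], 0.0
    for p in points:
        if all(dist(p, c) > 2 * r for c in centers):
            centers.append(p)
            while len(centers) > k:
                # Too many centers: grow the radius, then keep a maximal
                # subset of centers that are pairwise more than 2r apart.
                if r == 0.0:
                    r = min(dist(a, b)
                            for i, a in enumerate(centers)
                            for b in centers[i + 1:]) / 2
                else:
                    r = 2 * r
                kept = []
                for c in centers:
                    if all(dist(c, q) > 2 * r for q in kept):
                        kept.append(c)
                centers = kept
    return centers, r
```

Each dropped center lies within 2r of a kept one, so by a chaining argument every point seen so far stays within a constant multiple of r of some surviving center.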


Symposium on Discrete Algorithms | 2006

Entropy based nearest neighbor search in high dimensions

Rina Panigrahy

In this paper we study the problem of finding the approximate nearest neighbor of a query point in high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash tables and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points in the neighborhood of the query point q required depends on the entropy of the hash value h(p) of a random point p at the same distance from q as its nearest neighbor, given q and the locality preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(n^ρ) time and near linear Õ(n) space, where ρ = M/log(1/g). Alternatively we can build a data structure of size Õ(n^{1/(1-ρ)}) to answer queries in Õ(d) time. By applying this analysis to the locality preserving hash functions in [17, 21, 6] and adjusting the parameters, we show that the c-approximate nearest neighbor can be computed in time Õ(n^ρ) and near linear space, where ρ ≈ 2.06/c as c becomes large.
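The query strategy can be sketched concretely (an illustration under simplifying assumptions: random-hyperplane LSH for angular distance stands in for the Euclidean hash families the paper analyzes, and the function names are invented): rather than maintaining many tables, probe one table with the hashes of random perturbations of the query point.

```python
import random

def hyperplane_hash(planes, v):
    # Sign pattern of v against random hyperplanes: a locality-preserving
    # hash family (nearby vectors tend to share sign patterns).
    return tuple(1 if sum(a * b for a, b in zip(p, v)) >= 0 else 0
                 for p in planes)

def query_with_perturbations(table, planes, q, radius, trials, rng):
    """Probe a single hash table with hashes of random points near q;
    with enough trials, one perturbed point should land in the bucket
    holding q's near neighbor."""
    candidates = []
    for _ in range(trials):
        jittered = [x + rng.gauss(0, radius) for x in q]
        candidates.extend(table.get(hyperplane_hash(planes, jittered), []))
    return candidates
```

The paper's entropy bound quantifies exactly this: the number of trials needed is governed by how uncertain the hash value of a random point at the neighbor's distance is, given the query and the hash function.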


High Performance Interconnects | 2002

Reducing TCAM power consumption and increasing throughput

Rina Panigrahy; Samar Sharma

TCAMs have been an emerging technology for packet forwarding in the networking industry. They are fast and easy to use. However, due to their inherent parallel structure they consume high power - much higher than SRAMs or DRAMs. A system using four TCAMs could consume up to 60 watts. The power issue is one of the chief disadvantages of TCAMs over RAM based methods for forwarding. For a system using multiple TCAMs we present methods to significantly reduce TCAM power consumption for forwarding, making it comparable to RAM based forwarding solutions. Using our techniques one can use a TCAM for forwarding at 3 to 4 watts worst case. Our techniques also have interesting implications for TCAM forwarding rates. For a static distribution of requests we present methods that make the forwarding rate of a system proportional to the number of TCAMs. So if a system has four TCAMs, one could achieve a four fold performance of that of a single TCAM for a static distribution of requests.
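The general idea behind such power reduction can be sketched as follows (an illustrative toy in software, not the paper's exact scheme): partition the forwarding table by the first few address bits, so a lookup searches (and therefore powers) only one block instead of the whole TCAM. Prefixes shorter than the index must be replicated into every block they cover.

```python
def build_partitions(prefixes, index_bits):
    """Group (bit-string prefix, next-hop) pairs into blocks keyed by the
    first index_bits bits, replicating prefixes shorter than the index."""
    blocks = {}
    for prefix, nexthop in prefixes:
        if len(prefix) < index_bits:
            pad = index_bits - len(prefix)
            for i in range(2 ** pad):
                key = prefix + format(i, f"0{pad}b")
                blocks.setdefault(key, []).append((prefix, nexthop))
        else:
            blocks.setdefault(prefix[:index_bits], []).append((prefix, nexthop))
    return blocks

def lookup(blocks, addr, index_bits):
    # Longest-prefix match, searching only the one selected block.
    best = None
    for prefix, nexthop in blocks.get(addr[:index_bits], []):
        if addr.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, nexthop)
    return best[1] if best else None
```

In hardware the analogous effect is that only the selected TCAM block is activated per lookup, and with a static request distribution, independent blocks in different TCAMs can serve lookups in parallel.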


Symposium on Principles of Database Systems | 2008

Estimating PageRank on graph streams

Atish Das Sarma; Sreenivas Gollapudi; Rina Panigrahy

This study focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sub-linear in the number of nodes n) and a few passes. In the streaming model, we show how to perform several graph computations, including estimating the probability distribution after a random walk of length l, the mixing time, and the conductance. We estimate the mixing time M of a random walk in Õ(nα + Mα√n + √(Mn/α)) space and Õ(√(Mα)) passes. Furthermore, the relation between mixing time and conductance gives us an estimate for the conductance of the graph. By applying our algorithm for computing the probability distribution on the web-graph, we can estimate the PageRank p of any node up to an additive error of √ε·p in Õ(√(M/α)) passes and Õ(min(nα + (1/ε)√(M/α) + (1/ε)Mα, αn√(Mα) + (1/ε)√(M/α))) space, for any α ∈ (0, 1]. In particular, for ε = M/n, by setting α = M^{-1/2}, we can compute the approximate PageRank values in Õ(nM^{-1/4}) space and Õ(M^{3/4}) passes. In comparison, a standard implementation of the PageRank algorithm will take O(n) space and O(M) passes.
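To make the pass/space trade-off tangible, here is a deliberately naive sketch (not the paper's algorithm, which stitches together sampled shorter walk segments to use far fewer passes): estimate the distribution after a length-l walk by keeping only the walkers' current positions in memory and spending one pass over the edge stream per walk step, reservoir-sampling one outgoing edge per walker.

```python
from collections import Counter
import random

def walk_distribution(stream_pass, starts, length, rng):
    """Advance all walkers one step per pass over the edge stream.
    stream_pass() yields the (u, v) edges of one full pass; memory is
    proportional to the number of walkers, not the graph size."""
    positions = list(starts)
    for _ in range(length):
        choice, seen = {}, Counter()
        for u, v in stream_pass():
            for i, pos in enumerate(positions):
                if u == pos:
                    # Reservoir sampling: pick a uniformly random
                    # out-edge of the walker's current node.
                    seen[i] += 1
                    if rng.random() < 1.0 / seen[i]:
                        choice[i] = v
        positions = [choice.get(i, positions[i]) for i in range(len(positions))]
    return Counter(positions)
```

This costs l passes for a length-l walk; the paper's contribution is reducing the pass count well below that while keeping memory sub-linear in n.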


SIAM Journal on Discrete Mathematics | 2008

Lower Bounds on Locality Sensitive Hashing

Rajeev Motwani; Assaf Naor; Rina Panigrahy

Given a metric space

Collaboration


Dive into Rina Panigrahy's collaboration.

Top Co-Authors

An Zhu

Stanford University
