Mayank Bawa | Researchain

Publication

Featured research published by Mayank Bawa.

international conference on management of data | 2000

Turbo-charging vertical mining of large databases

Pradeep Shenoy; Jayant R. Haritsa; S. Sudarshan; Gaurav Bhalotia; Mayank Bawa; Devavrat Shah

In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called “snakes” and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
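As a rough illustration of the vertical representation described above, the following Python sketch keeps one bit-vector per item and counts the support of an itemset by intersecting those vectors. It is not VIPER: the bit-vectors here are plain, uncompressed Python integers rather than compressed snakes, none of the paper's optimizations are modeled, and the function names (`build_vertical`, `support`, `frequent_pairs`) are hypothetical.

```python
from itertools import combinations


def build_vertical(transactions):
    """Map each item to an int used as a bit-vector: bit t is set
    when transaction number t contains the item."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def support(columns, itemset):
    """Count the transactions that contain every item in `itemset`
    by AND-ing the items' bit-vectors and counting the set bits."""
    items = list(itemset)
    if not items:
        return 0
    vec = columns.get(items[0], 0)
    for item in items[1:]:
        vec &= columns.get(item, 0)
    return bin(vec).count("1")


def frequent_pairs(columns, min_support):
    """Return every item pair whose support reaches `min_support`."""
    result = {}
    for pair in combinations(sorted(columns), 2):
        count = support(columns, pair)
        if count >= min_support:
            result[pair] = count
    return result


# Illustrative market-basket data (made up for this sketch).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
columns = build_vertical(transactions)
print(support(columns, ["bread", "milk"]))  # 2
print(frequent_pairs(columns, 2))
```

The horizontal alternative would rescan every transaction for each candidate itemset; in the vertical layout, counting reduces to a bitwise AND over the relevant columns.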

very large data bases | 2004

Online balancing of range-partitioned data with applications to peer-to-peer systems

Prasanna Ganesan; Mayank Bawa; Hector Garcia-Molina

We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in large-scale parallel databases, as well as in peer-to-peer (P2P) systems. As tuples are inserted and deleted, the partitions may need to be adjusted, and data moved, in order to achieve storage balance across the participant disks/nodes. We propose efficient, asymptotically optimal algorithms that ensure storage balance at all times, even against an adversarial insertion and deletion of tuples. We combine the above algorithms with distributed routing structures to architect a P2P system that supports efficient range queries, while simultaneously guaranteeing storage balance.
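To make the setting concrete, here is a minimal Python sketch of range partitioning with a simple boundary shift between adjacent partitions. It is an assumption-laden toy, not the algorithms from the paper: it only moves keys to the lighter neighbor of the partition that just received an insert, it offers no balance guarantee, it ignores deletions, and it assumes distinct keys. The class name `RangePartitions` and the `ratio` threshold are hypothetical.

```python
import bisect


class RangePartitions:
    def __init__(self, split_points, ratio=2):
        # Partition i holds keys in [bounds[i], bounds[i + 1]);
        # the first partition is unbounded below.
        self.bounds = [float("-inf")] + list(split_points)
        self.parts = [[] for _ in self.bounds]  # each list stays sorted
        self.ratio = ratio

    def locate(self, key):
        """Index of the partition whose range contains `key`."""
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        i = self.locate(key)
        bisect.insort(self.parts[i], key)
        self._adjust(i)

    def _adjust(self, i):
        """If partition i is much heavier than its lighter neighbor,
        shift the shared boundary so the two end up roughly even."""
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(self.parts)]
        if not neighbors:
            return
        j = min(neighbors, key=lambda n: len(self.parts[n]))
        heavy, light = self.parts[i], self.parts[j]
        if len(heavy) <= self.ratio * len(light) + 1:
            return
        move = (len(heavy) - len(light)) // 2
        if j > i:
            # Hand the largest keys to the right neighbor.
            light[:0] = heavy[-move:]
            del heavy[-move:]
            self.bounds[j] = light[0]
        else:
            # Hand the smallest keys to the left neighbor.
            light.extend(heavy[:move])
            del heavy[:move]
            self.bounds[i] = heavy[0]

    def range_query(self, lo, hi):
        """Keys in [lo, hi], visiting only the overlapping partitions."""
        out = []
        for i in range(self.locate(lo), self.locate(hi) + 1):
            out.extend(k for k in self.parts[i] if lo <= k <= hi)
        return out

    def loads(self):
        return [len(p) for p in self.parts]


store = RangePartitions([100, 200])
for key in range(30):  # skewed inserts: all keys start in the first range
    store.insert(key)
print(store.loads(), store.bounds)
```

Because contiguous ranges are preserved, a range query touches only the partitions whose ranges overlap the query. A purely local shift like this one can still leave distant partitions unbalanced under skewed inserts, which is the kind of case the stronger guarantees claimed in the abstract are meant to cover.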

international world wide web conferences | 2005

LSH forest: self-tuning indexes for similarity search

Mayank Bawa; Tyson Condie; Prasanna Ganesan

We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of settings: Web search engines desire fast, parallel, main-memory-based indexes for similarity search on text data; database systems desire disk-based similarity indexes for high-dimensional data, including text and images; peer-to-peer systems desire distributed similarity indexes with low communication cost. We propose an indexing scheme called LSH Forest which is applicable in all the above contexts. Our index uses the well-known technique of locality-sensitive hashing (LSH), but improves upon previous designs by (a) eliminating the different data-dependent parameters for which LSH must be constantly hand-tuned, and (b) improving on LSH's performance guarantees for skewed data distributions while retaining the same storage and query overhead. We show how to construct this index in main memory, on disk, in parallel systems, and in peer-to-peer systems. We evaluate the design with experiments on multiple text corpora and demonstrate both the self-tuning nature and the superior performance of LSH Forest.
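The idea of replacing a hand-tuned, fixed hash length with variable-length label prefixes can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the paper's implementation: it uses random-hyperplane hashing for cosine similarity, scans a dictionary of labels instead of walking a prefix tree, and the names `LSHTree`, `LSHForestSketch`, and `min_candidates` are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class LSHTree:
    def __init__(self, dim, max_bits, rng):
        # One random hyperplane per label bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(max_bits)]
        self.labels = {}  # point id -> bit-string label

    def label(self, vec):
        return "".join(
            "1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0"
            for plane in self.planes
        )

    def insert(self, pid, vec):
        self.labels[pid] = self.label(vec)

    def matches(self, qlabel, prefix_len):
        # Linear scan for brevity; a prefix tree would avoid it.
        prefix = qlabel[:prefix_len]
        return {pid for pid, lab in self.labels.items()
                if lab.startswith(prefix)}


class LSHForestSketch:
    def __init__(self, dim, num_trees=4, max_bits=16, seed=0):
        rng = random.Random(seed)
        self.max_bits = max_bits
        self.trees = [LSHTree(dim, max_bits, rng) for _ in range(num_trees)]
        self.points = {}

    def insert(self, pid, vec):
        self.points[pid] = vec
        for tree in self.trees:
            tree.insert(pid, vec)

    def query(self, vec, k=1, min_candidates=3):
        qlabels = [tree.label(vec) for tree in self.trees]
        candidates = set()
        # Start from the longest prefix and shorten it in all trees
        # together until enough candidates have been collected.
        for plen in range(self.max_bits, -1, -1):
            for tree, qlabel in zip(self.trees, qlabels):
                candidates |= tree.matches(qlabel, plen)
            if len(candidates) >= min_candidates:
                break
        ranked = sorted(candidates,
                        key=lambda pid: cosine_distance(self.points[pid], vec))
        return ranked[:k]


forest = LSHForestSketch(dim=3)
forest.insert("a", [1.0, 0.0, 0.0])
forest.insert("b", [0.0, 1.0, 0.0])
forest.insert("c", [0.0, 0.0, 1.0])
print(forest.query([1.0, 0.0, 0.0]))  # ['a']
```

The prefix length used to answer a query adapts to how densely the data is packed around the query point, so no per-dataset hash length has to be chosen in advance.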

international acm sigir conference on research and development in information retrieval | 2003

SETS: search enhanced by topic segmentation

Mayank Bawa; Gurmeet Singh Manku; Prabhakar Raghavan

We present SETS, an architecture for efficient search in peer-to-peer networks, building upon ideas drawn from machine learning and social network theory. The key idea is to arrange participating sites in a topic-segmented overlay topology in which most connections are short-distance, connecting pairs of sites with similar content. Topically focused sets of sites are then joined together into a single network by long-distance links. Queries are matched and routed to only the topically closest regions. We discuss a variety of design issues and tradeoffs that an implementor of SETS would face. We show that SETS is efficient in network traffic and query processing load.
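The routing step, sending a query only to the topically closest regions, might look like the following Python sketch. The abstract does not say how topics are represented, so this assumes each topic segment is summarized by a bag-of-words centroid and compared to the query with cosine similarity; the function names and the `fanout` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_centroids(segments):
    """segments maps a segment name to the token lists of its sites;
    each centroid is the combined term count of that segment."""
    return {
        name: Counter(token for site in sites for token in site)
        for name, sites in segments.items()
    }


def route(query_tokens, centroids, fanout=1):
    """Return the `fanout` segments most similar to the query."""
    query = Counter(query_tokens)
    ranked = sorted(centroids,
                    key=lambda name: cosine(query, centroids[name]),
                    reverse=True)
    return ranked[:fanout]


# Made-up segments for illustration.
segments = {
    "databases": [["index", "query", "join"], ["query", "transaction"]],
    "networking": [["router", "packet"], ["packet", "latency", "multicast"]],
}
centroids = segment_centroids(segments)
print(route(["query", "index"], centroids))  # ['databases']
```

Only the sites inside the selected segments would then process the query, which is where the savings in network traffic and query load would come from.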

international conference on management of data | 2004

The price of validity in dynamic networks

Mayank Bawa; Aristides Gionis; Hector Garcia-Molina; Rajeev Motwani

Massive-scale self-administered networks like Peer-to-Peer and Sensor Networks have data distributed across thousands of participant hosts. These networks are highly dynamic with short-lived hosts being the norm rather than an exception. In recent years, researchers have investigated best-effort algorithms to efficiently process aggregate queries (e.g., sum, count, average, minimum and maximum) [6, 13, 21, 34, 35, 37] on these networks. Unfortunately, query semantics for best-effort algorithms are ill-defined, making it hard to reason about guarantees associated with the result returned. In this paper, we specify a correctness condition, single-site validity, with respect to which the above algorithms are best-effort. We present a class of algorithms that guarantee validity in dynamic networks. Experiments on real-life and synthetic network topologies validate performance of our algorithms, revealing the hitherto unknown price of validity.

very large data bases | 2003

Privacy-preserving indexing of documents on the network

Mayank Bawa; Roberto J. Bayardo; Rakesh Agrawal

With the ubiquitous collection of data and creation of large distributed repositories, enabling search over this data while respecting access control is critical. A related problem is that of ensuring privacy of the content owners while still maintaining an efficient index of distributed content. We address the problem of providing privacy-preserving search over distributed access-controlled content. Indexed documents can be easily reconstructed from conventional (inverted) indexes used in search. Currently, the need to avoid breaches of access control through the index requires the index hosting site to be fully secured and trusted by all participating content providers. This level of trust is impractical in the increasingly common case where multiple competing organizations or individuals wish to selectively share content. We propose a solution that eliminates the need for such a trusted authority. The solution builds a centralized privacy-preserving index in conjunction with a distributed access-control enforcing search protocol. Two alternative methods to build the centralized index are proposed, allowing trade-offs between efficiency and security. The new index provides strong and quantifiable privacy guarantees that hold even if the entire index is made public. Experiments on a real-life dataset validate the performance of the scheme. The appeal of our solution is twofold: (a) content providers maintain complete control in defining access groups and ensuring compliance, and (b) system implementors retain tunable knobs to balance privacy and efficiency concerns for their particular domains.

international world wide web conferences | 2003

Make it fresh, make it quick: searching a network of personal webservers

Mayank Bawa; Roberto J. Bayardo; Sridhar Rajagopalan; Eugene J. Shekita

Personal webservers have proven to be a popular means of sharing files and peer collaboration. Unfortunately, the transient availability and rapidly evolving content on such hosts render centralized, crawl-based search indices stale and incomplete. To address this problem, we propose YouSearch, a distributed search application for personal webservers operating within a shared context (e.g., a corporate intranet). With YouSearch, search results are always fast, fresh and complete, properties we show arise from an architecture that exploits both the extensive distributed resources available at the peer webservers and a centralized repository of summarized network state. YouSearch extends the concept of a shared context within web communities by enabling peers to aggregate into groups and users to search over specific groups. In this paper, we describe the challenges, design, implementation and experiences with a successful intranet deployment of YouSearch.

very large data bases | 2004

Vision paper: enabling privacy for the paranoids

Gagan Aggarwal; Mayank Bawa; Prasanna Ganesan; Hector Garcia-Molina; Krishnaram Kenthapadi; Nina Mishra; Rajeev Motwani; Utkarsh Srivastava; Dilys Thomas; Jennifer Widom; Ying Xu

P3P [23, 24] is a set of standards that allow corporations to declare their privacy policies. Hippocratic Databases [6] have been proposed to implement such policies within a corporation's datastore. From an end-user individual's point of view, both of these rest on an uncomfortable philosophy of trusting corporations to protect his/her privacy. Recent history chronicles several episodes when such trust has been willingly or accidentally violated by corporations facing bankruptcy courts, civil subpoenas or lucrative mergers. We contend that data management solutions for information privacy must restore controls in the individuals' hands. We suggest that enabling such control will require a radical re-think on modeling, release, and management of personal data.

acm special interest group on data communication | 2003

Transience of peers & streaming media

Mayank Bawa; Hrishikesh Deshpande; Hector Garcia-Molina

Application-level multicast schemes have traditionally been evaluated with respect to the efficiency penalties incurred in migrating the multicast functionality from the network layer to the application layer. We argue that the current performance measures, and therefore design strategies, are incomplete as they do not consider transience of peers. The routers in application-level multicast systems are participant clients, and not infrastructure units. As such, the assumptions on the behavior of these application routers are significantly different from the infrastructure routing units that traditional research has dealt with, especially in a peer-to-peer setting where peers are multi-use and the management is decentralized. We argue that the transience in peer behavior has implications for the end performance achieved. We outline a design philosophy that seeks to separate policy decisions in handling peer behavior from the end-application at a basic infrastructural peering layer. As a proof of concept, we have implemented a peering layer prototype, which is available for download.

international conference on management of data | 2003

Peer-to-peer research at Stanford

Mayank Bawa; Brian F. Cooper; Arturo Crespo; Neil Daswani; Prasanna Ganesan; Hector Garcia-Molina; Sepandar D. Kamvar; Sergio Marti; Mario T. Schlosser; Qi Sun; Patrick Vinograd; Beverly Yang

In this paper, we present recent and ongoing research projects of the Peers research group at Stanford University.
