Parag Agrawal
Stanford University
Publication
Featured research published by Parag Agrawal.
Operating Systems Review | 2010
John K. Ousterhout; Parag Agrawal; David Erickson; Christos Kozyrakis; Jacob Leverich; David Mazières; Subhasish Mitra; Aravind Narayanan; Guru M. Parulkar; Mendel Rosenblum; Stephen M. Rumble; Eric Stratmann; Ryan Stutsman
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
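The core idea above can be illustrated with a minimal sketch: a client-side view of a RAMCloud-style store that hashes each key to one of many servers' DRAM-resident tables, so the aggregate memory behaves as a single key-value space. The class, server model, and key names below are illustrative assumptions, not the paper's implementation.

```python
import hashlib

class RamCloudSketch:
    """Routes each key to one server's in-memory table by hashing."""

    def __init__(self, num_servers):
        # Each "server" is modeled as an in-process dict standing in
        # for that machine's DRAM-resident hash table.
        self.servers = [{} for _ in range(num_servers)]

    def _locate(self, key):
        # Deterministically map the key to one server's memory.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def write(self, key, value):
        self._locate(key)[key] = value

    def read(self, key):
        return self._locate(key).get(key)

# With 1000 servers, capacity scales with the cluster while every
# access stays a DRAM lookup on exactly one machine.
cloud = RamCloudSketch(num_servers=1000)
cloud.write("user:42", {"name": "Ada"})
print(cloud.read("user:42"))
```

A real system adds durability (replication or logging to secondary storage) and fast recovery, which the paper treats as central challenges; the sketch covers only the data-placement idea.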
Communications of the ACM | 2011
John K. Ousterhout; Parag Agrawal; David Erickson; Christos Kozyrakis; Jacob Leverich; David Mazières; Subhasish Mitra; Aravind Narayanan; Diego Ongaro; Guru M. Parulkar; Mendel Rosenblum; Stephen M. Rumble; Eric Stratmann; Ryan Stutsman
With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.
Very Large Data Bases | 2008
Parag Agrawal; Daniel Kifer; Christopher Olston
We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files, by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch data processing environments such as Map-Reduce systems, some of which handle tens of thousands of processing requests daily, over a shared set of files. As we demonstrate, conventional scheduling techniques such as shortest-job-first do not perform well in the presence of cross-job sharing opportunities. We derive a new family of scheduling policies specifically targeted to sharable workloads. Our scheduling policies revolve around the notion that, all else being equal, it is good to schedule nonsharable scans ahead of ones that can share IO work with future jobs, if the arrival rate of sharable future jobs is expected to be high. We evaluate our policies via simulation over varied synthetic and real workloads, and demonstrate significant performance gains compared with conventional scheduling approaches.
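The scheduling intuition in the abstract can be sketched in a few lines: when sharable jobs are expected to arrive frequently, run nonsharable scans first and let sharable requests queue up behind one future shared scan. This is a simplified illustration of the idea, not the paper's actual policy family; the job fields and rate threshold are assumptions for the example.

```python
def next_scan(queue, arrival_rate, threshold=1.0):
    """Pick the next file scan to run.

    queue: list of dicts {"file": str, "sharable": bool}, in arrival order.
    arrival_rate: expected arrivals per unit time of future sharable jobs.
    """
    if arrival_rate > threshold:
        # High sharing opportunity: defer sharable scans so a single
        # later scan can serve many accumulated jobs at once.
        for job in queue:
            if not job["sharable"]:
                return job
    # Low sharing opportunity (or only sharable jobs left): FIFO.
    return queue[0] if queue else None

queue = [{"file": "logs.dat", "sharable": True},
         {"file": "tmp.dat", "sharable": False}]
print(next_scan(queue, arrival_rate=2.0)["file"])  # picks the nonsharable scan
```

The contrast with shortest-job-first is the point: a policy blind to sharing would happily scan `logs.dat` now and again for each later request, paying the full I/O cost every time.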
International Conference on Management of Data | 2009
Parag Agrawal; Adam Silberstein; Brian F. Cooper; Utkarsh Srivastava; Raghu Ramakrishnan
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.) are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide a good tradeoff between system throughput and view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly reduce the cost of evaluating complex queries.
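Deferred view maintenance, the central design choice above, can be sketched minimally: the base-table write returns immediately, and index maintenance is drained later from a queue, so the view may be briefly stale. The queue-based implementation and table layouts below are assumptions for illustration; the paper's RVT/LVT mechanisms are more elaborate.

```python
from collections import deque

base = {}            # primary table: key -> record
view = {}            # secondary index: city -> set of keys
pending = deque()    # deferred view-maintenance work

def put(key, record):
    """Apply the base update now; defer index work so the
    client-visible write latency excludes view maintenance."""
    old = base.get(key)
    base[key] = record
    pending.append((key, old, record))

def maintain():
    """Later (e.g. in a background thread), drain the queue to bring
    the view up to date; until then, reads of the view may be stale."""
    while pending:
        key, old, new = pending.popleft()
        if old is not None:
            view.get(old["city"], set()).discard(key)
        view.setdefault(new["city"], set()).add(key)

put("u1", {"city": "Paris"})
put("u2", {"city": "Paris"})
maintain()
print(sorted(view["Paris"]))  # ['u1', 'u2']
```

The consistency model the paper introduces addresses exactly the window between `put` and `maintain`: what an application may observe from the view while the queue is nonempty.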
Very Large Data Bases | 2010
Parag Agrawal; Anish Das Sarma; Jeffrey D. Ullman; Jennifer Widom
There has been considerable past work studying data integration and uncertain data in isolation. We develop the foundations for local-as-view (LAV) data integration when the sources being integrated are uncertain. We motivate two distinct settings for uncertain-data integration. We then define containment of uncertain databases in these settings, which allows us to express uncertain sources as views over a virtual mediated uncertain database. Next, we define consistency of a set of uncertain sources and show intractability of consistency-checking. We identify an interesting special case for which consistency-checking is polynomial. Finally, the notion of certain answers from traditional LAV data integration does not generalize to the uncertain setting, so we define a corresponding notion of correct answers.
Very Large Data Bases | 2006
Parag Agrawal; Omar Benjelloun; Anish Das Sarma; Chris Hayworth; Shubha U. Nabar; Tomoe Sugihara; Jennifer Widom
Conference on Innovative Data Systems Research | 2007
Michi Mutsuzaki; Martin Theobald; Ander de Keijzer; Jennifer Widom; Parag Agrawal; Omar Benjelloun; Anish Das Sarma; Raghotham Murthy; Tomoe Sugihara
International Conference on Management of Data | 2010
Parag Agrawal; Arvind Arasu; Raghav Kaushik
International Conference on Data Engineering | 2009
Parag Agrawal; Jennifer Widom
MUD | 2009
Parag Agrawal; Jennifer Widom