Rajiv Wickremesinghe
Duke University
Publications
Featured research published by Rajiv Wickremesinghe.
Data Compression Conference | 2002
Keith H. Randall; Raymie Stata; Rajiv Wickremesinghe; Janet L. Wiener
The Connectivity Server is a special-purpose database whose schema models the Web as a graph: a set of nodes (URLs) connected by directed edges (hyperlinks). The Link Database provides fast access to the hyperlinks. To support easy implementation of a wide range of graph algorithms, we have found it important to fit the Link Database into RAM. In the first version of the Link Database, we achieved this fit by using machines with lots of memory (8 GB) and storing each hyperlink in 32 bits. However, this approach was limited to roughly 100 million Web pages. This paper presents techniques to compress the links to accommodate larger graphs. Our techniques combine well-known compression methods with methods that depend on the properties of the Web graph. The first compression technique takes advantage of the fact that most hyperlinks on most Web pages point to other pages on the same host as the page itself. The second technique takes advantage of the fact that many pages on the same host share hyperlinks, that is, they tend to point to a common set of pages. Together, these techniques reduce space requirements to under 6 bits per link. While (de)compression adds latency to the hyperlink access time, we can still compute the strongly connected components of a 6 billion-edge graph in 22 minutes and run applications such as Kleinberg's HITS in real time. This paper describes our techniques for compressing the Link Database, and provides performance numbers for compression ratios and decompression speed.
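A minimal sketch of the gap-encoding idea behind the first technique: if URLs are numbered in sorted order, links to pages on the same host get nearby IDs, so storing each sorted destination list as differences yields small integers that compress well. The node IDs below are hypothetical, and the real Link Database combines this with shared-list encodings and variable-length codes that this sketch omits.

```python
def encode_gaps(link_ids):
    # Delta-encode a sorted list of destination node IDs. When URLs
    # are numbered in lexicographic order, same-host destinations get
    # nearby IDs, so most gaps stay small and compress well.
    gaps, prev = [], 0
    for dest in sorted(link_ids):
        gaps.append(dest - prev)
        prev = dest
    return gaps

def decode_gaps(gaps):
    # Invert encode_gaps by prefix-summing the gaps back into IDs.
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

links = [100203, 100207, 100208, 100301]   # hypothetical node IDs
assert encode_gaps(links) == [100203, 4, 1, 93]
assert decode_gaps(encode_gaps(links)) == links
```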
GeoInformatica | 2003
Lars Arge; Jeffrey S. Chase; Patrick N. Halpin; Laura Toma; Jeffrey Scott Vitter; Dean L. Urban; Rajiv Wickremesinghe
As detailed terrain data becomes available, GIS terrain applications target larger geographic areas at finer resolutions. Processing the massive datasets involved in such applications presents significant challenges to GIS systems and demands algorithms that are optimized for both data movement and computation. In this paper we present efficient algorithms for flow routing on massive grid terrain datasets, extending our previous work on flow accumulation. Our algorithms are developed in the framework of external memory algorithms and use I/O-efficient techniques to achieve efficiency. We have implemented the algorithms in the Terraflow system, which is the first comprehensive terrain flow software system designed and optimized for massive data. We compare the performance of Terraflow with that of state-of-the-art commercial and open-source GIS systems. On large terrains, Terraflow outperforms existing systems by a factor of 2 to 1,000, and is capable of solving problems no system was previously able to solve.
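For orientation, here is an in-memory sketch of the classic D8 flow-routing rule that such systems implement: each cell drains toward its steepest downslope neighbor. This is only the problem statement in code; Terraflow's contribution is computing it, together with flat and depression handling, I/O-efficiently on grids far larger than RAM, which this sketch does not attempt.

```python
def d8_flow_directions(elev):
    # Classic D8 rule: each cell drains to the neighbor with the
    # steepest downslope gradient. Plain in-memory version.
    rows, cols = len(elev), len(elev[0])
    dirs = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, steepest = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                        # Gradient = elevation drop over grid distance.
                        drop = (elev[r][c] - elev[nr][nc]) / (dr * dr + dc * dc) ** 0.5
                        if drop > steepest:
                            best, steepest = (dr, dc), drop
            dirs[r][c] = best   # None marks a pit or a flat area
    return dirs
```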
High Performance Distributed Computing | 2002
Rajiv Wickremesinghe; Jeffrey S. Chase; Jeffrey Scott Vitter
One approach to high-performance processing of massive data sets is to incorporate computation into storage systems. Previous work has shown that this active storage model is effective for a variety of problems. This paper explores opportunities to use active storage as a basis for exploiting asymmetric parallelism in applications using a streaming computation model on collections of fixed-size records. This model is the basis for much of the research in I/O-efficient algorithms, which deals with an important class of massive data problems not studied in previous work on active storage. We present an extension of a streaming computation model for an external memory toolkit to support a flexible mapping of computations to storage-based processors. Our approach enables load-managed active storage: it exposes parallelism, ordering constraints, and primitive computation units to the system, which can configure the application to balance load and make the best use of available processing power. Emulation results from a sorting application demonstrate the potential of dynamic adaptation in load-managed active storage.
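A toy illustration of the streaming model over fixed-size records, with the pipeline's stages exposed as separate units: that decomposition is what would let a load manager place individual stages on host or storage-side processors. The two stages below are hypothetical, and nothing here reproduces the paper's toolkit API.

```python
def stream_pipeline(records, stages):
    # Apply a pipeline of per-record stages to a stream. Exposing the
    # stages (and their ordering constraints) to the system is what
    # allows some of them to be mapped onto storage-based processors;
    # this sketch simply runs everything on the host.
    for rec in records:
        for stage in stages:
            rec = stage(rec)
            if rec is None:        # a stage may drop a record
                break
        else:
            yield rec

# Hypothetical two-stage pipeline over integer records.
keep_nonnegative = lambda x: x if x >= 0 else None
scale = lambda x: 10 * x
print(list(stream_pipeline([3, -1, 7], [keep_nonnegative, scale])))  # [30, 70]
```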
ACM Journal of Experimental Algorithmics | 2002
Rajiv Wickremesinghe; Lars Arge; Jeffrey S. Chase; Jeffrey Scott Vitter
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-MERGE, which achieves better performance in practice over algorithms that are superior in the theoretical models. R-MERGE is designed to minimize memory stall cycles rather than cache misses by considering features common to many system designs.
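The broad structure of a cache-conscious mergesort fits in a few lines: sort runs small enough to stay in cache, then merge all runs in a single multiway pass so each element moves through memory only twice. The sketch below shows that structure only; R-MERGE's gains come from lower-level tuning (register use, stall-avoiding access patterns) that Python cannot express, and the run size is an assumption.

```python
import heapq

RUN_SIZE = 1 << 15    # assumed to fit comfortably in cache

def run_merge_sort(data):
    # Phase 1: sort cache-sized runs (cheap, cache-resident work).
    runs = [sorted(data[i:i + RUN_SIZE])
            for i in range(0, len(data), RUN_SIZE)]
    # Phase 2: one multiway merge, so every element makes only one
    # more pass through memory.
    return list(heapq.merge(*runs))
```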
Advances in Geographic Information Systems | 2001
Laura Toma; Rajiv Wickremesinghe; Lars Arge; Jeffrey S. Chase; Jeffrey Scott Vitter; Patrick N. Halpin; Dean L. Urban
As detailed terrain data becomes available, GIS applications target larger geographic areas at finer resolutions. Processing the massive data presents significant challenges to GIS systems and demands algorithms that are optimized for both data movement and computation. In this paper we develop efficient algorithms for flow routing on massive terrains, extending our previous work on flow accumulation. Our implementations of these algorithms constitute the first comprehensive terrain flow software system designed and optimized for massive data. We compare the performance of our system, called TERRAFLOW, with that of state-of-the-art commercial and open-source GIS systems. On large terrains, TERRAFLOW outperforms existing systems by a factor of 2 to 1,000, and is capable of solving problems of a scope and scale that are impractical with previous algorithms.
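Flow accumulation, the earlier step this work extends, can be sketched by visiting cells from highest to lowest elevation and pushing each cell's accumulated flow downslope (reusing the d8_flow_directions sketch above). Again this is in-memory pseudocode for the problem; TERRAFLOW's point is obtaining the same result with I/O-efficient sorting and scanning.

```python
def flow_accumulation(elev, dirs):
    # Visit cells from highest to lowest, sending each cell's
    # accumulated flow (plus one unit of local rainfall) to its D8
    # downslope neighbor. dirs comes from d8_flow_directions above;
    # since directions point strictly downslope, this order is a
    # valid topological order.
    rows, cols = len(elev), len(elev[0])
    acc = [[1.0] * cols for _ in range(rows)]
    order = sorted(((elev[r][c], r, c)
                    for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in order:
        d = dirs[r][c]
        if d is not None:
            acc[r + d[0]][c + d[1]] += acc[r][c]
    return acc
```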
ACM Transactions on Storage | 2009
Stergios V. Anastasiadis; Rajiv Wickremesinghe; Jeffrey S. Chase
Whole-file transfer is a basic primitive for Internet content dissemination. Content servers are increasingly limited by disk arm movement, given the rapid growth in disk density, disk transfer rates, server network bandwidth, and content size. Individual file transfers are sequential, but the block access sequence on a content server is effectively random when many slow clients access large files concurrently. Although larger blocks can help improve disk throughput, buffering requirements increase linearly with block size. This article explores a novel block reordering technique that can reduce server disk traffic significantly when large content files are shared. The idea is to transfer blocks to each client in any order that is convenient for the server. The server sends blocks to each client opportunistically in order to maximize the advantage from the disk reads it issues to serve other clients accessing the same file. We first illustrate the motivation and potential impact of aggressive block reordering using simple analytical models. Then we describe a file transfer system using a simple block reordering algorithm, called Circus. Experimental results with the Circus prototype show that it can improve server throughput by a factor of two or more in workloads with strong file access locality.
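The core idea admits a compact sketch: track which blocks each client still needs, and whenever a block is read from disk on behalf of one client, deliver it to every client that still needs it, so concurrent transfers of a popular file share disk reads. This toy scheduler ignores the pacing, buffering, and flow control a real server like Circus must handle.

```python
def serve_shared_file(num_blocks, clients, read_block, send):
    # Opportunistic block reordering: one disk read can satisfy many
    # clients, because blocks may be sent to each client in any order.
    needed = {c: set(range(num_blocks)) for c in clients}
    while any(needed.values()):
        client = max(needed, key=lambda c: len(needed[c]))   # neediest
        blk = min(needed[client])
        data = read_block(blk)                # one disk read ...
        for c, want in needed.items():
            if blk in want:
                send(c, blk, data)            # ... many deliveries
                want.discard(blk)

log = []
serve_shared_file(3, ["a", "b"],
                  read_block=lambda b: b"x",
                  send=lambda c, b, d: log.append((c, b)))
# Three reads from "disk", six deliveries: each block was read once
# but sent to both clients.
assert len(log) == 6
```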
High Performance Distributed Computing | 2005
Stergios V. Anastasiadis; Rajiv Wickremesinghe; Jeffrey S. Chase
In the present paper, we examine the problem of supporting application-specific computation within a network file server. Our objectives are (i) to introduce an easy-to-use yet powerful architecture for executing both custom-developed and legacy applications close to the stored data, (ii) to investigate the performance improvement that we get from data proximity in I/O-intensive processing, and (iii) to exploit the I/O-traffic information available within the file server for more effective resource management. One main difference from previous active storage research is our emphasis on the expressive power and usability of the network server interface. We describe an extensible active storage framework that we built in order to demonstrate the feasibility of the proposed system design. We show that accessing large datasets over a wide-area network through a regular file system can penalize system performance unless application computation is moved close to the stored data. Our conclusions are substantiated through experimentation with a popular multilayer map warehouse application.
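The performance argument can be illustrated with a pair of hypothetical interfaces (not the paper's actual framework API): filtering a large remote file client-side moves every byte across the WAN, while shipping the same filter to run at the file server moves only the results.

```python
# Hypothetical interfaces for illustration; not the framework's API.

def grep_client_side(open_remote, path, pattern):
    # Conventional access: pull the whole file over the wide-area
    # network, then filter locally.
    with open_remote(path) as f:
        return [line for line in f if pattern in line]

def grep_active_storage(run_at_server, path, pattern):
    # Active-storage style: ship the filter to the file server, so
    # only matching lines cross the network.
    def task(open_local):
        with open_local(path) as f:
            return [line for line in f if pattern in line]
    return run_at_server(task)
```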
ACM Journal of Experimental Algorithmics | 2008
Thomas Hazel; Laura Toma; Jan Vahrenhold; Rajiv Wickremesinghe
This paper addresses the problem of computing least-cost-path surfaces for massive grid terrains. Consider a grid terrain T and let C be a cost grid for T such that every point in C stores a value that represents the cost of traversing the corresponding point in T. Given C and a set of sources S ⊆ T, a least-cost-path grid Δ for T is a grid such that every point in Δ represents the distance to the source in S that can be reached with minimal cost. We present a scalable approach to computing least-cost-path grids. Our algorithm, terracost, is derived from our previous work on I/O-efficient shortest paths on grids and uses O(sort(n)) I/Os, where sort(n) is the complexity of sorting n items of data in the I/O-model of Aggarwal and Vitter. We present the design, the analysis, and an experimental study of terracost. An added benefit of the algorithm underlying terracost is that it naturally lends itself to parallelization. We have implemented terracost in a distributed environment using our cluster management tool and report on experiments showing that it obtains near-linear speedup in the size of the cluster. To the best of our knowledge, this is the first experimental evaluation of a multiple-source least-cost-path algorithm in the external-memory setting.
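In RAM, the problem terracost solves is plain multi-source Dijkstra over the cost grid, sketched below; the edge weights here (averaging the two cells' costs, one common convention) are an assumption of this sketch. The paper's contribution is achieving the same result in O(sort(n)) I/Os when the grid dwarfs memory.

```python
import heapq

def least_cost_paths(cost, sources):
    # Multi-source Dijkstra on a 4-connected grid: dist[r][c] ends up
    # holding the cheapest cost of reaching (r, c) from any source.
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                          # stale heap entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist
```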
ACM Symposium on Applied Computing | 2006
Thomas Hazel; Laura Toma; Jan Vahrenhold; Rajiv Wickremesinghe
This paper addresses the problem of computing least-cost-path surfaces for massive grid-based terrains. Our approach follows a modular design, enabling the algorithm to make efficient use of memory, disk, and grid computing environments. We have implemented the algorithm in the context of the GRASS open-source GIS system and, using our cluster management tool, in a distributed environment. We report experimental results demonstrating that the algorithm is not only of theoretical and conceptual interest but also performs well in practice. Our implementation outperforms standard solutions as dataset size increases relative to available memory, and our distributed solver obtains near-linear speedup when preprocessing large terrains for multiple queries.
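The distribution-friendly part of the modular design comes from partitioning the grid into tiles that workers can preprocess independently, roughly as below; the boundary stitching between tiles that the real algorithm must perform is omitted from this sketch.

```python
def grid_tiles(rows, cols, tile):
    # Partition a rows x cols grid into square tiles that separate
    # cluster nodes can preprocess independently. Stitching results
    # across tile boundaries is the part this sketch leaves out.
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            yield (r0, min(r0 + tile, rows), c0, min(c0 + tile, cols))

jobs = list(grid_tiles(10000, 10000, tile=2500))   # 16 independent jobs
```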
Lecture Notes in Computer Science | 2000
Lars Arge; Jeffrey S. Chase; Jeffrey Scott Vitter; Rajiv Wickremesinghe
Modern computer systems have increasingly complex memory systems. Common machine models for algorithm analysis do not reflect many of the features of these systems, e.g., large register sets, lockup-free caches, cache hierarchies, associativity, cache line fetching, and streaming behavior. Inadequate models lead to poor algorithmic choices and an incomplete understanding of algorithm behavior on real machines. A key step toward developing better models is to quantify the performance effects of features not reflected in the models. This paper explores the effect of memory system features on sorting performance. We introduce a new cache-conscious sorting algorithm, R-merge, which achieves better performance in practice over algorithms that are theoretically superior under the models. R-merge is designed to minimize memory stall cycles rather than cache misses, considering features common to many system designs.