Publication


Featured research published by Dinesh Agarwal.


international parallel and distributed processing symposium | 2012

A System for GIS Polygonal Overlay Computation on Linux Cluster - An Experience and Performance Report

Dinesh Agarwal; Satish Puri; Xi He; Sushil K. Prasad

GIS polygon-based (also known as vector-based) spatial data overlay computation is much more complex than raster data computation. Processing of polygonal spatial data files has been a long-standing research question in the GIS community due to the irregular and data-intensive nature of the underlying computation. The state-of-the-art software for overlay computation in the GIS community is still desktop-based. We present a cluster-based distributed solution for end-to-end polygon overlay processing, modeled after our Windows Azure cloud-based Crayons system [1]. We present the details of porting the Crayons system to an MPI-based Linux cluster and show the improvements made by employing efficient data structures such as R-trees. We present a performance report and show the scalability of our system, along with the remaining bottlenecks. Our experimental results show an absolute speedup of 15x for end-to-end overlay computation employing up to 80 cores.
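The filter step that R-trees accelerate in overlay processing can be illustrated as follows. Before computing exact polygon intersections, candidate pairs are pruned by testing minimum bounding rectangles (MBRs). This is a minimal pure-Python sketch of that idea, not the Crayons/MPI code; all names are illustrative.

```python
def mbr(polygon):
    """Axis-aligned minimum bounding rectangle of a polygon given as (x, y) points."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_overlap(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def candidate_pairs(base_layer, overlay_layer):
    """Return index pairs whose MBRs overlap; only these need the
    expensive exact polygon-intersection test. An R-tree replaces the
    quadratic scan below with a logarithmic spatial query."""
    base_boxes = [mbr(p) for p in base_layer]
    over_boxes = [mbr(p) for p in overlay_layer]
    return [(i, j)
            for i, bb in enumerate(base_boxes)
            for j, ob in enumerate(over_boxes)
            if mbrs_overlap(bb, ob)]

base = [[(0, 0), (2, 0), (2, 2), (0, 2)]]        # square at the origin
over = [[(1, 1), (3, 1), (3, 3)],                # overlaps base[0]
        [(10, 10), (12, 10), (12, 12)]]          # far away
print(candidate_pairs(base, over))               # [(0, 0)]
```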


international parallel and distributed processing symposium | 2012

AzureBench: Benchmarking the Storage Services of the Azure Cloud Platform

Dinesh Agarwal; Sushil K. Prasad

Cloud computing has become mainstream for High Performance Computing (HPC) application development over the last few years. However, even though many vendors have rolled out their commercial cloud infrastructures, the service offerings are usually only best-effort based, without any performance guarantees. Cloud computing effectively saves the eScience developer the hassles of resource provisioning, but the utilization of these resources is questionable if they cannot meet the performance expectations of deployed applications. Furthermore, in order to make application design choices for a particular cloud offering, an eScience developer needs to understand the performance capabilities of the underlying cloud platform. Among all clouds, the emerging Azure cloud from Microsoft remains a challenge for HPC program development, both due to its lack of support for traditional parallel programming paradigms such as MPI and MapReduce and due to its evolving APIs. To aid HPC developers, we present an open-source benchmark suite, AzureBench, for the Windows Azure cloud platform. We report a comprehensive performance analysis of the Azure cloud platform's storage services, which are its primary artifacts for inter-processor coordination and communication. We also report on how much scalability the Azure platform affords using up to 100 processors and point out various bottlenecks in parallel access of storage services. The paper also has pointers to overcome the steep learning curve for HPC application development over Azure. We also provide an open-source generic application framework that can be a starting point for the development of bag-of-tasks applications over Azure.
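The shape of an AzureBench-style storage benchmark can be sketched as a simple timing harness: repeat a storage operation many times, record per-operation latency, and summarize. The in-memory `FakeBlobStore` below is a stand-in for a real Azure storage client, whose API is not reproduced here; all names are illustrative.

```python
import time

class FakeBlobStore:
    """In-memory stand-in for a cloud blob service."""
    def __init__(self):
        self._blobs = {}
    def put(self, name, data):
        self._blobs[name] = data
    def get(self, name):
        return self._blobs[name]

def benchmark(op, n_ops):
    """Run op(i) n_ops times; return (total_seconds, mean_seconds)."""
    latencies = []
    for i in range(n_ops):
        start = time.perf_counter()
        op(i)
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return total, total / n_ops

store = FakeBlobStore()
total, mean = benchmark(lambda i: store.put(f"blob-{i}", b"x" * 1024), 100)
print(f"100 puts: mean latency {mean * 1e6:.1f} us")
```

In a real benchmark the same harness would be run concurrently from many worker roles to expose the parallel-access bottlenecks the paper reports.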


international conference on cloud computing | 2012

Lessons Learnt from the Development of GIS Application on Azure Cloud Platform

Dinesh Agarwal; Sushil K. Prasad

Spatial overlay processing is a widely used compute-intensive GIS application that involves aggregation of two or more layers of maps to facilitate intelligent querying on the collocated output data. When large GIS data sets are represented in polygonal (vector) form, spatial analysis runs for extended periods of time, which is undesirable for time-sensitive applications such as emergency response. We have, for the first time, created an open-architecture-based system named Crayons for the Azure cloud platform using state-of-the-art techniques. During the course of development of the Crayons system, we faced numerous challenges and gained invaluable insights into the Azure cloud platform, which are presented in detail in this paper. The challenges range from limitations of cloud storage and computational services to the choices of tools and technologies used for high performance computing (HPC) application design. We report our findings to provide concrete guidelines to an eScience developer for 1) choice of persistent data storage mechanism, 2) data structure representation, 3) communication and synchronization among nodes, 4) building robust failsafe applications, and 5) optimal cost-effective utilization of resources. Our insights into each challenge faced, the solution to overcome it, and the discussion of the lessons learnt from each challenge can help eScience developers starting application development on Azure and possibly other cloud platforms.


ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

MapReduce Algorithms for GIS Polygonal Overlay Processing

Satish Puri; Dinesh Agarwal; Xi He; Sushil K. Prasad

Polygon overlay is one of the complex operations in computational geometry. It is applied in many fields such as Geographic Information Systems (GIS), computer graphics, and VLSI CAD. Sequential algorithms for this problem abound in the literature, but there is a lack of distributed algorithms, especially for the MapReduce platform. In GIS, spatial data files tend to be large (in GBs), and the underlying overlay computation is highly irregular and compute-intensive. The MapReduce paradigm is now standard in industry and academia for processing large-scale data. Motivated by the MapReduce programming model, we revisit the distributed polygon overlay problem and its implementation on the MapReduce platform. Our algorithms are geared towards maximizing local processing and minimizing the communication overhead inherent in the shuffle and sort phases of MapReduce. We have experimented with two data sets and achieved up to 22x speedup on dataset 1 using 64 CPU cores.
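One common way to localize overlay work in MapReduce, sketched here as a local simulation (not Hadoop code, and not necessarily the paper's exact partitioning), is to map each polygon's bounding box to the grid cells it touches, so that each reducer only pairs base and overlay geometries within its own cell. The cell size and all names below are illustrative.

```python
from collections import defaultdict

CELL = 5.0  # grid cell size; a tuning knob in a real system

def map_phase(layer_id, boxes):
    """Emit (cell, (layer_id, box)) for every grid cell a box overlaps."""
    for box in boxes:
        xmin, ymin, xmax, ymax = box
        for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
            for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
                yield (cx, cy), (layer_id, box)

def reduce_phase(cell_values):
    """Within one cell, pair each base box with overlapping overlay boxes."""
    base = [b for lid, b in cell_values if lid == "base"]
    over = [b for lid, b in cell_values if lid == "over"]
    return [(b, o) for b in base for o in over
            if b[0] <= o[2] and o[0] <= b[2] and b[1] <= o[3] and o[1] <= b[3]]

# The shuffle/sort phase is simulated with a dict keyed by cell.
groups = defaultdict(list)
for key, val in map_phase("base", [(0, 0, 2, 2)]):
    groups[key].append(val)
for key, val in map_phase("over", [(1, 1, 3, 3), (20, 20, 22, 22)]):
    groups[key].append(val)

pairs = [p for vals in groups.values() for p in reduce_phase(vals)]
print(pairs)  # [((0, 0, 2, 2), (1, 1, 3, 3))]
```

A real implementation must also deduplicate pairs that share more than one cell; the sketch omits that step.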


ieee international symposium on parallel & distributed processing, workshops and phd forum | 2013

AzureBOT: A Framework for Bag-of-Tasks Applications on the Azure Cloud Platform

Dinesh Agarwal; Sushil K. Prasad

Windows Azure is an emerging cloud platform that provides application developers with APIs to write scientific and commercial applications. However, the steep learning curve to understand the unique architecture of cloud platforms in general, and the continuously changing Azure APIs specifically, makes it difficult for application developers to write cloud-based applications. During our extensive experience with the Azure cloud platform over the past few years, we have identified the need for a framework to abstract away the complexities of working with the Azure cloud platform. Such a framework is essential for the adoption of cloud technologies. Therefore, we have created AzureBOT, a framework for the Azure cloud platform for writing bag-of-tasks distributed applications. AzureBOT provides a straightforward and general interface that permits developers to concentrate on their application logic rather than cloud interaction. While we have implemented AzureBOT on the Azure cloud platform, our framework design is generic to most cloud platforms. In this paper, we present the detailed design of our framework's internal architecture, the APIs in brief, and the usability of our framework. We also discuss the implementation of two different applications and their scalability results over 100 Azure worker processors.
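The bag-of-tasks pattern that AzureBOT targets can be sketched locally: a shared bag of independent tasks drained by workers, with local threads and queues standing in for Azure worker roles and the cloud queue service. This is not the AzureBOT API; all names are illustrative.

```python
import queue
import threading

def worker(tasks, results):
    """Pull independent tasks until the bag is empty. Tasks never
    communicate with each other, which is what makes the pattern
    map cleanly onto loosely coupled cloud workers."""
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((n, n * n))   # stand-in for real per-task work
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
for n in range(10):
    tasks.put(n)

threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.queue))  # [(0, 0), (1, 1), (2, 4), ..., (9, 81)]
```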


ieee international conference on high performance computing, data, and analytics | 2012

Design and implementation of a parallel priority queue on many-core architectures

Xi He; Dinesh Agarwal; Sushil K. Prasad

An efficient parallel priority queue is at the core of the effort in parallelizing important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide a powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures and thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest-priority items and insertions of new items, and by taking full advantage of CUDA's data-parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to an optimized sequential priority queue implementation on fast multicores. By comparison, our optimized multicore parallelization of the parallel heap yields only 2–3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.
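The "wide node" idea can be illustrated sequentially: each operation moves a batch of r items at once, so r deletions or insertions proceed per step (on a GPU, the threads of a block cooperate within the batch). This stand-in uses Python's heapq and is not the CUDA implementation; the node width and class names are illustrative.

```python
import heapq

R = 4  # node width: items deleted or inserted per batch

class BatchHeap:
    """Sequential sketch of batched priority-queue operations."""
    def __init__(self):
        self._heap = []

    def insert_batch(self, items):
        # In the parallel heap, a whole batch is merged into a wide
        # node in one cooperative step; here we push item by item.
        for it in items:
            heapq.heappush(self._heap, it)

    def delete_min_batch(self):
        """Remove and return up to R highest-priority (smallest) items,
        the batched analogue of delete-min."""
        k = min(R, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(k)]

h = BatchHeap()
h.insert_batch([7, 2, 9, 1, 5, 3, 8, 4])
print(h.delete_min_batch())  # [1, 2, 3, 4]
print(h.delete_min_batch())  # [5, 7, 8, 9]
```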


ieee international conference on high performance computing data and analytics | 2012

Poster: Crayons: An Azure Cloud Based Parallel System for GIS Overlay Operations

Dinesh Agarwal

Processing of extremely large polygonal (vector-based) spatial datasets has been a long-standing research challenge for scientists in the Geographic Information Systems and Science (GIS) community. Surprisingly, it is not for lack of individual parallel algorithms; we discovered that the irregular and data-intensive nature of the underlying processing is the main reason for the meager amount of work on system design and implementation. Furthermore, of all the systems reported in the literature, very few deal with the complexities of vector-based datasets and none, including commercial systems, runs on the cloud platform. We have designed and implemented an open-architecture-based system named Crayons for the Windows Azure cloud platform using state-of-the-art techniques. We have implemented three different architectures of Crayons with different load balancing schemes. Crayons scales well for sufficiently large data sets, achieving end-to-end absolute speedup of over 28-fold employing 100 Azure processors. For smaller and more irregular workloads, it still yields over 10-fold speedup.


international conference on parallel processing | 2012

Acceleration of Bilateral Filtering Algorithm for Manycore and Multicore Architectures

Dinesh Agarwal; Sami Wilf; Abinashi Dhungel; Sushil K. Prasad

Bilateral filtering is a ubiquitous tool for several kinds of image processing applications. This work explores multicore and manycore accelerations for the embarrassingly parallel yet compute-intensive bilateral filtering kernel. For manycore architectures, we have created a novel pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such as exploiting the unique capabilities of special registers available in modern multicore architectures and rearranging data access patterns to match the computation so as to exploit special-purpose instructions. We also propose optimizations pertinent to Nvidia's Compute Unified Device Architecture (CUDA), including utilization of CUDA's implicit synchronization capability and the maximization of single-instruction-multiple-thread efficiency. We present empirical data on the performance gains achieved over a variety of hardware architectures, including Nvidia GTX 280, AMD Barcelona, AMD Shanghai, Intel Harpertown, AMD Phenom, Intel Core i7 quad-core, and Intel Nehalem 32-core machines. The best performance achieved was (i) 169-fold speedup by the CUDA-based implementation of our pair-symmetric algorithm running on Nvidia's GTX 280 GPU compared to the compiler-optimized sequential code on an Intel Core i7, and (ii) 38-fold speedup using 16 cores of AMD Barcelona, each equipped with a 4-stage vector pipeline, compared to the compiler-optimized sequential code running on the same machine.
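The kernel being accelerated has a simple structure, written out here for a 1-D signal in plain Python: each output sample is a normalized sum of neighbors weighted by both spatial distance and intensity difference, which is what makes the filter edge-preserving. This is only a reference sketch of the standard bilateral filter, not the paper's pair-symmetric or SIMD code; parameter names are illustrative.

```python
import math

def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Edge-preserving smoothing of a 1-D signal."""
    out = []
    for i, center in enumerate(signal):
        acc, norm = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range weight: similar intensities count more, so the
            # filter smooths noise without blurring across edges.
            w_r = math.exp(-((center - signal[j]) ** 2) / (2 * sigma_r ** 2))
            acc += w_s * w_r * signal[j]
            norm += w_s * w_r
        out.append(acc / norm)
    return out

step = [0.0] * 5 + [10.0] * 5            # a sharp step edge
smoothed = bilateral_1d(step, radius=2, sigma_s=1.0, sigma_r=0.5)
print(smoothed)                          # the step survives smoothing
```

Every output pixel is independent, which is why the kernel is embarrassingly parallel across cores, SIMD lanes, and GPU threads.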


cluster computing and the grid | 2014

Towards an MPI-Like Framework for the Azure Cloud Platform

Dinesh Agarwal; Sara Karamati; Satish Puri; Sushil K. Prasad

Message Passing Interface (MPI) has been the predominant standardized system for writing parallel and distributed applications. However, while MPI has been the software system of choice for traditional parallel and distributed computing platforms such as large compute clusters and the Grid, it is not the system of choice for cloud platforms. The primary reasons for this are the cloud platforms' lack of low-latency, high-bandwidth network capabilities and their inherent architectural differences from traditional compute clusters. Prior studies suggest that the message latency of cloud platforms can be as much as 35x higher than that of an InfiniBand-connected cluster [1] for popular MPI implementations. An MPI-like environment on cloud platforms is desirable for a large class of applications that run for long time spans with varying computing needs, such as modeling and analysis to predict the swath of a hurricane. Such applications could benefit from the cloud's resiliency and on-demand access for a robust and green solution. Interestingly, most cloud vendors provide APIs to access cloud resources efficiently, but in a manner different from how an MPI implementation would avail itself of those resources. We have done extensive research to identify the pain points in designing and implementing an MPI-like framework for cloud platforms. Our research has provided us with vital guidelines that we share in this paper. We present the details of the key components required for such a framework, along with our experience implementing a preliminary MPI-like framework over Azure, dubbed cloudMPI, and evaluate its pros and cons. A large GIS application has been ported over cloudMPI to study its effectiveness and limitations.
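The core primitive such a framework must supply is point-to-point send/recv built on a cloud queue service rather than sockets. A minimal local sketch, with in-process queues standing in for the cloud queue service (this is not cloudMPI's actual API; all names are illustrative):

```python
import queue

class CloudComm:
    """One mailbox queue per rank; send() targets the receiver's queue,
    mirroring how a queue-backed MPI-like layer can route messages."""
    def __init__(self, n_ranks):
        self._mailboxes = [queue.Queue() for _ in range(n_ranks)]

    def send(self, dest, tag, payload):
        # A real implementation would serialize the payload and enqueue
        # it into the destination rank's named cloud queue.
        self._mailboxes[dest].put((tag, payload))

    def recv(self, rank):
        # Blocking receive: dequeue the next (tag, payload) message.
        return self._mailboxes[rank].get()

comm = CloudComm(n_ranks=2)
comm.send(dest=1, tag="halo", payload=[1.0, 2.0, 3.0])
tag, data = comm.recv(rank=1)
print(tag, data)  # halo [1.0, 2.0, 3.0]
```

Per-message queue round-trips are exactly where the 35x latency gap shows up, which is why message batching and coarse-grained communication matter more on clouds than on clusters.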


ieee international symposium on parallel & distributed processing, workshops and phd forum | 2011

Memory Hierarchy Aware Parallel Priority Based Data Structures

Dinesh Agarwal

With the proliferation of multicore and manycore architectures, the memory hierarchy plays an important role in realizing the expected performance of memory-intensive scientific applications. A vast majority of scientific applications require a priority-based data structure to discriminate among available data elements; for instance, a priority-based data structure is imperative for extracting the earliest events in a discrete event simulation, identifying urgent tasks in a parallel scheduler, or exploring the most promising sub-problems in a state-space search. Traditional priority-based data structures are tree-based, which makes them cache-unfriendly due to the exponentially increasing distance between parent and child nodes. Fine-grained large-scale applications further cause excessive contention among competing processors due to frequent updates. In this dissertation we propose a priority-based data structure that adapts to the memory hierarchy of the underlying computer system. The top-priority items are aggregated and kept in a working subset of data items that is readily accessible for processing.
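The central idea can be sketched as a two-tier queue: the highest-priority items live in a small working set (sized to fit fast memory) while the rest stay in a bulk structure, and the working set is refilled only when it drains. This is an illustrative stand-in, not the dissertation's data structure; names and sizes are assumptions.

```python
import heapq

class TieredQueue:
    WORKING_SET = 4  # sized to fit cache/fast memory in a real system

    def __init__(self, items):
        self._bulk = list(items)     # cold tier: full heap
        heapq.heapify(self._bulk)
        self._working = []           # hot tier: top items, kept sorted
        self._refill()

    def _refill(self):
        """Move the top WORKING_SET items from bulk into the working set;
        this is the only point that touches the cache-unfriendly tree."""
        while self._bulk and len(self._working) < self.WORKING_SET:
            self._working.append(heapq.heappop(self._bulk))

    def pop_min(self):
        if not self._working:
            self._refill()
        return self._working.pop(0)  # working set is already in order

q = TieredQueue([9, 3, 7, 1, 5, 2, 8])
print([q.pop_min() for _ in range(7)])  # [1, 2, 3, 5, 7, 8, 9]
```

Most pops hit only the small hot tier, so the exponentially growing parent-child distances of the underlying heap are paid for once per refill rather than once per operation.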

Collaboration

Top co-authors of Dinesh Agarwal:

Satish Puri, Georgia State University
Xi He, Georgia State University
Sara Karamati, Georgia State University
Sami Wilf, Georgia State University