
Publication


Featured research published by Vikas Aggarwal.


IEEE Aerospace Conference | 2006

High Performance Dependable Multiprocessor II

Jeremy Ramos; John Samson; David Lupia; Ian A. Troxel; R. Subramaniyan; Adam Jacobs; James Greco; G. Cieslewski; J. Curreri; M. Fischer; E. Grobelny; Alan D. George; Vikas Aggarwal; M. Patel; Raphael R. Some

With the ever-increasing demand for higher bandwidth and processing capacity of today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing has become a critical need. In response to this need, NASA's New Millennium Program (NMP) office commissioned the development of dependable multiprocessor (DM) technology for use in science and autonomy missions, but the technology is also applicable to a wide variety of DoD missions. The goal of the DM project is to provide spacecraft/payload processing capability 10x-100x what is available today, enabling heretofore unrealizable levels of science and autonomy. DM technology is being developed as part of the NMP ST8 (Space Technology 8) project. The objective of this NMP ST8 effort is to combine high-performance, fault-tolerant, COTS-based cluster processing and fault-tolerant middleware in an architecture and software framework capable of supporting a wide variety of mission applications. Dependable multiprocessor development is continuing as one of the four selected ST8 flight experiments planned to be flown in 2009.


IEEE Aerospace Conference | 2006

Technology Validation: NMP ST8 Dependable Multiprocessor Project II

John Samson; Gary R. Gardner; David Lupia; Minesh Patel; Paul Davis; Vikas Aggarwal; Alan D. George; Zbigniew Kalbarcyzk; Rafi Some

With the ever-increasing demand for higher bandwidth and processing capacity of today's space exploration, space science, and defense missions, the ability to efficiently apply Commercial-Off-The-Shelf (COTS) processors for on-board computing has become a critical need. In response to this need, NASA's New Millennium Program (NMP) commissioned the development of Dependable Multiprocessor (DM) technology for use in science and autonomy missions, but the technology is also applicable to a wide variety of DoD missions. The goal of the DM project is to provide spacecraft/payload processing capability 10x-100x what is available today, enabling heretofore unrealizable levels of science and autonomy. DM technology is being developed as part of the NMP ST8 (Space Technology 8) project. The objective of this NMP ST8 effort is to combine high-performance, fault-tolerant, COTS-based cluster processing and fault-tolerant middleware in an architecture and software framework capable of supporting a wide variety of mission applications. Dependable Multiprocessor development is continuing as one of the four selected ST8 flight experiments planned to be flown in 2009.


International Workshop on High-Performance Reconfigurable Computing Technology and Applications | 2009

SCF: a device- and language-independent task coordination framework for reconfigurable, heterogeneous systems

Vikas Aggarwal; Rafael Garcia; Greg Stitt; Alan D. George; Herman Lam

Heterogeneous computing systems comprised of accelerators such as FPGAs, GPUs, and Cell processors coupled with standard microprocessors are becoming an increasingly popular solution for building future computing systems. Although programming languages and tools have evolved to simplify device-level design, programming such systems is still difficult and time-consuming due to system-level challenges involving synchronization and communication between heterogeneous devices, which currently require ad hoc solutions. To solve this problem, this paper presents the System-Level Coordination Framework (SCF), which enables transparent communication and synchronization between tasks running on heterogeneous processing devices in the system. By hiding low-level architectural details from the application designer, SCF can improve application development productivity, provide higher levels of application portability, and offer rapid design-space exploration of different task/device mappings. In addition, SCF enables custom communication synthesis, which can provide performance improvements over generic solutions employed previously.


Field Programmable Gate Arrays | 2006

Reconfigurable computing with multiscale data fusion for remote sensing

Vikas Aggarwal; Alan D. George; K.C. Slatton

Recent advances in sensor technologies have resulted in tremendous increases in the amount of data collected for imaging applications such as airborne and space-based remote sensing of the Earth. Data acquisition and dissemination systems need to perform more processing than ever before to support real-time applications and reduce bandwidth demands on the downlink. FPGA-based reconfigurable computing systems are emerging as cost-effective solutions that offer enormous computation potential in the embedded systems arena. Research in this paper explores the potential capability offered by deploying reconfigurable computing systems in a remote sensing system by means of a commonly employed application. Multiple designs for a multiscale data-fusion algorithm were developed for an FPGA-based platform. These designs are used to demonstrate speedup over processor-based solutions and study the demands posed by such applications upon the system. Due to the vast number of sensor inputs, such applications place high demands on memory capacity and bandwidth, which become critical factors in determining overall system performance. Results of our experiments show that over an order of magnitude improvement can be obtained with efficient designs and appropriate hardware resources. Projections of enhanced performance with emerging system architectures are also presented.
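The multiscale data-fusion algorithm itself is not specified in the abstract; as a hedged illustration of the general idea of combining estimates from different resolutions, a common building block is inverse-variance weighting of a coarse and a fine estimate. The function below is a toy stand-in, not the algorithm implemented in the paper:

```python
# Toy sketch: fuse a coarse-scale and a fine-scale estimate of the same
# quantity by inverse-variance weighting (a common multiscale-fusion
# building block). Illustrative only; not the paper's algorithm.
def fuse(x_coarse, var_coarse, x_fine, var_fine):
    w_c = 1.0 / var_coarse          # weight = confidence of each estimate
    w_f = 1.0 / var_fine
    x = (w_c * x_coarse + w_f * x_fine) / (w_c + w_f)
    var = 1.0 / (w_c + w_f)         # fused estimate is more certain than either
    return x, var

# A noisy coarse reading (variance 4.0) and a sharper fine reading (variance 1.0):
x, var = fuse(10.0, 4.0, 12.0, 1.0)
print(x, var)  # 11.6 0.8
```

The fused value is pulled toward the lower-variance (fine-scale) input, and the fused variance is smaller than either input's, which is why fusion across scales pays off.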


International Workshop on High-Performance Reconfigurable Computing Technology and Applications | 2009

Bridging parallel and reconfigurable computing with multilevel PGAS and SHMEM

Vikas Aggarwal; Alan D. George; Kishore Yalamanchili; Changil Yoon; Herman Lam; Greg Stitt

Reconfigurable computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building the parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike traditional HPC systems, the lack of integrated, system-wide, parallel-programming models and languages presents a significant design challenge for creating applications targeting scalable, reconfigurable HPC systems. In this paper, we introduce and investigate a novel programming model based on the Partitioned Global Address Space (PGAS), which simplifies development of parallel applications for such systems. The new multilevel PGAS programming model captures the unique characteristics of these systems, such as the existence of multiple levels of memory hierarchy and heterogeneous computation resources. To evaluate this multilevel PGAS model, we extend and adapt the SHMEM programming library to become what we call SHMEM+, the first known SHMEM library enabling coordination between FPGAs and CPUs in a reconfigurable, heterogeneous HPC system. Our design of SHMEM+ is highly portable and provides peak communication bandwidth comparable to vendor-proprietary versions of SHMEM. In addition, applications designed with SHMEM+ yield improved developer productivity compared to current methods of multi-device RC design and achieve a high degree of portability.
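SHMEM-style programming centers on one-sided put/get operations into memory partitioned across processing elements; the multilevel PGAS idea adds a second dimension, so each partition also has levels (e.g., CPU memory vs. FPGA memory). The sketch below is a toy model of that address space, with invented names, and is not the SHMEM+ API:

```python
# Toy sketch of a multilevel PGAS: each processing element (PE) owns one
# partition per memory level (here "cpu" and "fpga"), and any PE can
# address any partition with one-sided put/get. Names and structure are
# illustrative assumptions, not the SHMEM+ API.
class MultilevelPGAS:
    def __init__(self, num_pes, levels=("cpu", "fpga"), size=8):
        # One dict of level-partitions per PE; each partition is a flat array.
        self.mem = [{lvl: [0] * size for lvl in levels} for _ in range(num_pes)]

    def put(self, dest_pe, level, offset, values):
        # One-sided write into a (possibly remote) PE's partition at a level.
        part = self.mem[dest_pe][level]
        part[offset:offset + len(values)] = values

    def get(self, src_pe, level, offset, count):
        # One-sided read from a PE's partition; no action needed by src_pe.
        return self.mem[src_pe][level][offset:offset + count]

pgas = MultilevelPGAS(num_pes=2)
pgas.put(dest_pe=1, level="fpga", offset=0, values=[3, 1, 4])
print(pgas.get(src_pe=1, level="fpga", offset=0, count=3))  # [3, 1, 4]
```

The point of the extra `level` coordinate is that a transfer targeting FPGA memory on a remote node looks the same to the programmer as one targeting CPU memory, hiding the multi-hop data movement underneath.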


IEEE Design & Test of Computers | 2011

An End-to-End Tool Flow for FPGA-Accelerated Scientific Computing

Greg Stitt; Alan D. George; Herman Lam; Casey Reardon; Melissa C. Smith; Brian Holland; Vikas Aggarwal; Gongyu Wang; James Coole; Seth Koehler

As part of their ongoing work with the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC), the authors are developing a complete tool chain for FPGA-based acceleration of scientific computing, from early-stage assessment of applications down to rapid routing. This article provides an overview of this tool chain.


ACM Transactions on Reconfigurable Technology and Systems | 2011

SHMEM+: A multilevel-PGAS programming model for reconfigurable supercomputing

Vikas Aggarwal; Alan D. George; Changil Yoon; Kishore Yalamanchili; Herman Lam

Reconfigurable Computing (RC) systems based on FPGAs are becoming an increasingly attractive solution for building the parallel systems of the future. Applications targeting such systems have demonstrated superior performance and reduced energy consumption versus their traditional counterparts based on microprocessors. However, most such work has been limited to small system sizes. Unlike traditional HPC systems, the lack of integrated, system-wide, parallel-programming models and languages presents a significant design challenge for creating applications targeting scalable, reconfigurable HPC systems. In this article, we extend the traditional Partitioned Global Address Space (PGAS) model to provide a multilevel integration of memory, which simplifies development of parallel applications for such systems and improves developer productivity. The new multilevel-PGAS programming model captures the unique characteristics of reconfigurable HPC systems, such as the existence of multiple levels of memory hierarchy and heterogeneous computation resources. Based on this model, we extend and adapt the SHMEM communication library to become what we call SHMEM+, the first known SHMEM library enabling coordination between FPGAs and CPUs in a reconfigurable, heterogeneous HPC system. Applications designed with SHMEM+ yield improved developer productivity compared to current methods of multidevice RC design and exhibit a high degree of portability. In addition, our design of the SHMEM+ library itself is portable and provides peak communication bandwidth comparable to vendor-proprietary versions of SHMEM. Application case studies are presented to illustrate the advantages of SHMEM+.


ACM Transactions on Reconfigurable Technology and Systems | 2012

SCF: A Framework for Task-Level Coordination in Reconfigurable, Heterogeneous Systems

Vikas Aggarwal; Greg Stitt; Alan D. George; Changil Yoon

Heterogeneous computing systems comprised of accelerators such as FPGAs, GPUs, and manycore processors coupled with standard microprocessors are becoming an increasingly popular solution for future computing systems due to their higher performance and energy efficiency. Although programming languages and tools are evolving to simplify device-level design, programming such systems is still difficult and time-consuming, largely due to system-wide challenges involving communication between heterogeneous devices, which currently require ad hoc solutions. Most communication frameworks and APIs that have dominated parallel application development for decades were developed for homogeneous systems and hence cannot be employed directly for hybrid systems. To solve this problem, this article presents the System Coordination Framework (SCF), which employs message passing to transparently enable communication between tasks described using different programming tools (and languages), and running on heterogeneous processing devices of systems from domains ranging from embedded systems to High-Performance Computing (HPC) systems. By hiding low-level architectural details of the underlying communication from an application designer, SCF can improve application development productivity, provide higher levels of application portability, and offer rapid design-space exploration of different task/device mappings. In addition, SCF enables custom communication synthesis that exploits mechanisms specific to different devices and platforms, which can provide performance improvements over generic solutions employed previously. Our results indicate performance improvements of 28× and 682× by employing FPGA devices for two applications presented in this article, while simultaneously improving developer productivity by approximately 2.5 to 5 times by using SCF.
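The key idea the abstract describes, message passing between tasks that are named independently of the device they run on, can be sketched in a few lines. Everything here (class and method names, the mapping table) is a hypothetical illustration, not the actual SCF API:

```python
# Toy sketch of task-level coordination by message passing: tasks send to
# each other by task name, while the task -> device mapping lives in a
# separate table. Remapping a task to a different device therefore needs
# no change to task code, which is the portability/design-space-exploration
# property described above. Illustrative only; not the SCF API.
import queue

class Coordinator:
    def __init__(self, mapping):
        self.mapping = mapping                       # task name -> device label
        self.inbox = {task: queue.Queue() for task in mapping}

    def send(self, dest_task, payload):
        # Sender names only the destination task; the framework resolves
        # which device hosts it and how to move the data.
        self.inbox[dest_task].put(payload)

    def recv(self, task):
        return self.inbox[task].get()

# Moving "filter" from a CPU to an FPGA touches only this mapping table.
coord = Coordinator({"producer": "cpu0", "filter": "fpga0"})
coord.send("filter", [2, 7, 1])
print(coord.recv("filter"))  # [2, 7, 1]
```

In a real system the queues would be replaced by device-specific transports (DMA, PCIe, network), which is where the custom communication synthesis mentioned in the abstract earns its speedup.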


Concurrency and Computation: Practice and Experience | 2015

Low-level PGAS computing on many-core processors with TSHMEM

Bryant C. Lam; Alan D. George; Herman Lam; Vikas Aggarwal

Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have progressed at a remarkable rate, concerns arise regarding the performance and productivity of numerous parallel-programming tools for application development. Development of parallel applications on many-core processors often requires developers to familiarize themselves with unique characteristics of a target platform while attempting to maximize performance and maintain correctness of their applications. The family of partitioned global address space (PGAS) programming models comprises the current state of the art in balancing performance and programmability. One such PGAS approach is SHMEM, a lightweight, shared-memory programming library that has demonstrated high performance and productivity potential for parallel-computing systems with distributed-memory architectures. In this paper, we present research, design, and analysis of a new SHMEM infrastructure specifically crafted for low-level PGAS on modern and emerging many-core processors featuring dozens of cores and more. Our approach (with a new library known as TSHMEM) is investigated and evaluated atop two generations of Tilera architectures, which are among the most sophisticated and scalable many-core processors to date, and is intended to enable similar libraries atop other architectures now emerging. In developing TSHMEM, we explore design decisions and their impact on parallel performance for the Tilera TILE-Gx and TILEPro many-core architectures, and then evaluate the designs and algorithms within TSHMEM through microbenchmarking and application studies with other communication libraries.
Our results with barrier primitives provided by the Tilera libraries show dissimilar performance between the TILE-Gx and TILEPro; therefore, TSHMEM's barrier design takes an alternative approach and leverages the on-chip mesh network to provide consistent low-latency performance. In addition, our experiments with TSHMEM show that naive collective algorithms consistently outperformed linear distributed collective algorithms when executed in an SMP-centric environment. In leveraging these insights for the design of TSHMEM, our approach outperforms the OpenSHMEM reference implementation, achieves similar-to-better performance than OpenMP and OSHMPI atop MPICH, and supports similar libraries in delivering high-performance parallel computing to emerging many-core systems.


Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model | 2010

Performance modeling for multilevel communication in SHMEM

Vikas Aggarwal; Changil Yoon; Alan D. George; Herman Lam; Greg Stitt

The field of high-performance computing (HPC) is currently undergoing a major transformation brought upon by a variety of new processor device technologies. Accelerator devices (e.g. FPGA, GPU) are becoming increasingly popular as coprocessors in HPC, embedded, and other systems, improving application performance while in some cases also reducing energy consumption. The presence of such devices introduces additional levels of communication and memory hierarchy in the system, which warrants an expansion of conventional parallel-programming practices to address these differences. Programming models and libraries for heterogeneous, parallel, and reconfigurable computing such as SHMEM+ have been developed to support communication and coordination involving a diverse mix of processor devices. However, to evaluate the impact of communication on application performance and obtain optimal performance, a concrete understanding of the underlying communication infrastructure is often imperative. In this paper, we introduce a new multilevel communication model for representing various data transfers encountered in these systems and for predicting performance. Three use cases are presented and evaluated. First, the model enables application developers to perform early design-space exploration of communication patterns in their applications before undertaking the laborious and expensive process of implementation, yielding improved performance and productivity. Second, the model enables system developers to quickly optimize performance of data-transfer routines within tools such as SHMEM+ when being ported to a new platform. Third, the model augments tools such as SHMEM+ to automatically improve performance of data transfers by self-tuning internal parameters to match platform capabilities. Results from experiments with these use cases suggest marked improvement in performance, productivity, and portability.
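The abstract does not give the model's equations, but multilevel communication cost models of this kind are typically built from per-link latency-bandwidth (alpha-beta) terms that are summed along the path a transfer takes. The sketch below illustrates that structure with invented parameter values; it is not the paper's actual model or calibration:

```python
# Toy latency-bandwidth cost model for multilevel transfers. A transfer
# from a CPU to a remote node's FPGA crosses two links (network, then
# host-to-device), so its cost is modeled as the sum of per-link costs.
# All parameter values are invented for illustration.
def transfer_time(nbytes, alpha_s, beta_bytes_per_s):
    # Classic alpha-beta model: startup latency + size / bandwidth.
    return alpha_s + nbytes / beta_bytes_per_s

def two_level_time(nbytes, host_link, device_link):
    # Sum the stages a multilevel transfer traverses.
    return transfer_time(nbytes, *host_link) + transfer_time(nbytes, *device_link)

network = (5e-6, 1e9)    # 5 us latency, 1 GB/s  (assumed values)
pcie    = (10e-6, 2e9)   # 10 us latency, 2 GB/s (assumed values)
t = two_level_time(1_000_000, network, pcie)
print(round(t * 1e3, 3), "ms")  # 1.515 ms
```

A model of this shape supports the three use cases listed above: developers can compare communication patterns on paper, and a library can pick the cheapest of several candidate transfer paths by evaluating each one's predicted cost before moving any data.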
