Akila Gothandaraman | Researchain

Publication

Featured research published by Akila Gothandaraman.

IEEE Transactions on Parallel and Distributed Systems | 2011

Comparing Hardware Accelerators in Scientific Applications: A Case Study

Rick Weber; Akila Gothandaraman; Robert J. Hinde; Gregory D. Peterson

Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.

parallel computing | 2008

FPGA acceleration of a quantum Monte Carlo application

Akila Gothandaraman; Gregory D. Peterson; G. L. Warren; Robert J. Hinde; Robert J. Harrison

Quantum Monte Carlo methods enable us to determine the ground-state properties of atomic or molecular clusters. Here, we present a reconfigurable computing architecture using Field Programmable Gate Arrays (FPGAs) to accelerate two computationally intensive kernels of a Quantum Monte Carlo (QMC) application applied to N-body systems. We focus on two key kernels of the QMC application: acceleration of potential energy and wave function calculations. We compare the performance of our application on two reconfigurable platforms. Firstly, we use a dual-processor 2.4GHz Intel Xeon augmented with two reconfigurable development boards consisting of Xilinx Virtex-II Pro FPGAs. Using this platform, we achieve a speedup of 3x over a software-only implementation. Following this, the chemistry application is ported to the Cray XD1 supercomputer equipped with Xilinx Virtex-II Pro and Virtex-4 FPGAs. The hardware-accelerated application on one node of the high performance system equipped with a single Virtex-4 FPGA yields a speedup of approximately 25x over the serial reference code running on one node of the dual-processor dual-core 2.2GHz AMD Opteron. This speedup is mainly attributed to the use of pipelining, the use of fixed-point arithmetic for all calculations and the fine-grained parallelism using FPGAs. We can further enhance the performance by operating multiple instances of our design in parallel.

conference on high performance computing (supercomputing) | 2006

Reconfigurable accelerator for quantum Monte Carlo simulations in N-body systems

Akila Gothandaraman; G. Lee Warren; Gregory D. Peterson; Robert J. Harrison

Recent advances in FPGA technology make them an attractive platform for accelerating scientific computing applications. We present a novel hardware accelerator for Quantum Monte Carlo simulations in N-body systems. The design is deeply pipelined and exploits the inherent fine-grained parallelism available using an FPGA for all calculations. The design is implemented on a Xilinx Virtex II Pro XC2VP30 device and preliminary results indicate a maximum operating frequency of 100MHz. A single instance of our design offers an estimated speedup of 20x and accuracy comparable to the serial code running on a 2.8GHz Intel Pentium 4 processor. This architecture performs all computations with fixed-point representation and delivers accuracy on the order of or better than double-precision floating point. After deploying a single instance on the present FPGA platform, targeting our design on the Cray XD1 platform with a high gate-density FPGA will allow us to operate multiple cores in parallel.

Computer Physics Communications | 2009

A Hardware-Accelerated Quantum Monte Carlo framework (HAQMC) for N-body systems

Akila Gothandaraman; Gregory D. Peterson; G. Lee Warren; Robert J. Hinde; Robert J. Harrison

Interest in the study of structural and energetic properties of highly quantum clusters, such as inert gas clusters, has motivated the development of a hardware-accelerated framework for Quantum Monte Carlo simulations. In the Quantum Monte Carlo method, the properties of a system of atoms, such as the ground-state energies, are averaged over a number of iterations. Our framework is aimed at accelerating the computations in each iteration of the QMC application by offloading the calculation of properties, namely energy and trial wave function, onto reconfigurable hardware. This gives a user the capability to run simulations for a large number of iterations, thereby reducing the statistical uncertainty in the properties, and for larger clusters. This framework is designed to run on the Cray XD1 high performance reconfigurable computing platform, which exploits the coarse-grained parallelism of the processor along with the fine-grained parallelism of the reconfigurable computing devices available in the form of field-programmable gate arrays. In this paper, we illustrate the functioning of the framework, which can be used to calculate the energies for a model cluster of helium atoms. In addition, we present the capabilities of the framework that allow the user to vary the chemical identities of the simulated atoms.

radio frequency integrated circuits symposium | 2003

An all-digital frequency locked loop (ADFLL) with a pulse output direct digital frequency synthesizer (DDFS) and an adaptive phase estimator

Akila Gothandaraman; Syed K. Islam

An algorithm for frequency synthesizing all-digital frequency locked loops (ADFLLs) with fast frequency acquisition is presented in this paper. A Pulse Output Direct Digital Frequency Synthesizer (DDFS) with reduced hardware cost and architecture, full digitization, easy design and implementation is proposed. An adaptive phase estimator is also proposed. The DDFS has 16-bit binary weighted control and the simulations show that the ADFLL can operate in the frequency range between 50 MHz and 500 MHz.

midwest symposium on circuits and systems | 2010

A pipelined and parallel architecture for quantum Monte Carlo simulations on FPGAs

Akila Gothandaraman; Gregory D. Peterson; G. Lee Warren; Robert J. Hinde; Robert J. Harrison

Recent advances in Field-Programmable Gate Array (FPGA) technology make reconfigurable computing using FPGAs an attractive platform for accelerating scientific applications. We develop a deeply pipelined and parallel architecture for Quantum Monte Carlo simulations using FPGAs. Quantum Monte Carlo simulations enable us to obtain the structural and energetic properties of atomic clusters. We experiment with different pipeline structures for each component of the design and develop a deeply pipelined architecture that provides the best performance in terms of achievable clock rate, while at the same time has a modest use of the FPGA resources. We discuss the details of the pipelined and generic architecture that is used to obtain the potential energy and wave function of a cluster of atoms.

ieee international conference on high performance computing data and analytics | 2011

Application of Graphics Processing Units (GPUs) to the Study of Non-linear Dynamics of the Exciton Bose-Einstein Condensate in a Semiconductor Quantum Well

Akila Gothandaraman; Seyedhamidreza Sadatian; Michal Faryniarz; Oleg L. Berman; G. V. Kolmakov

In this paper, we explore the use of Graphics Processing Units (GPUs) to solve numerically the nonlinear Gross-Pitaevskii equation with an external potential. Our implementation uses NVIDIA's Compute Unified Device Architecture (CUDA) programming paradigm and demonstrates a speedup of 190x on an NVIDIA Tesla C2050 (Fermi) GPU compared to an optimized software implementation on a single core of an Intel Xeon 5500-series processor. We apply the developed technique to the study of Bose-Einstein condensation (BEC) of excitons in semiconductor nanostructures. The technique is also applicable to the studies of atomic condensates, quantized vortices in quantum fluids, propagation of light pulses in optical wave guides, and ocean wave dynamics.

midwest symposium on circuits and systems | 2008

Design decisions in the pipelined architecture for Quantum Monte Carlo simulations

Akila Gothandaraman; Gregory D. Peterson; Robert J. Hinde; Robert J. Harrison

The ground-state properties of atomic and molecular clusters can be obtained using Quantum Monte Carlo (QMC) simulations. We propose a reconfigurable hardware architecture using Field-Programmable Gate Arrays (FPGAs) to implement the kernels of the QMC application. To achieve higher clock rates, we experiment with different pipeline stages for each component of the design and develop a deeply pipelined architecture that provides the best performance in terms of clock rate, while at the same time has a modest use of embedded memory and multiplier resources so we can fit additional functions in a future implementation. Here, we discuss the details of the pipelined architecture and our design decisions while developing a general framework that can be used to obtain the potential energy of atomic or molecular clusters and extended to compute other useful properties.

ieee international conference on high performance computing data and analytics | 2012

Poster: A Disc-Based Decomposition Algorithm with Optimal Load Balancing for N-Body Simulations

Akila Gothandaraman; Thomas Nason; G. Lee Warren

We propose a novel disc-based data decomposition algorithm for N-body simulations and compare its performance against a cyclic decomposition algorithm. We implement the data decomposition algorithms towards the calculation of three-body interactions in the Stillinger-Weber potential for a system of water molecules. The performance is studied in terms of load balance behavior and speedup from the MPI implementations of the two algorithms. We are also currently working on a performance study of the disc-based decomposition algorithm on graphics processing units (GPUs).

Archive | 2011