Tania Banerjee | Researchain

Publication

Featured research published by Tania Banerjee.

IEEE Transactions on Computers | 2015

PC-TRIO: A Power Efficient TCAM Architecture for Packet Classifiers

Tania Banerjee; Sartaj Sahni; Gunasekaran Seetharaman

PC-TRIO is an indexed TCAM architecture for packet classification. In addition to index TCAMs, PC-TRIO uses wide SRAM words. On our packet classifier data sets, PC-TRIO reduced TCAM power by 96 percent and lookup time by 98 percent on average, compared to PC-DUOS+ [28], which does not use indexing or wide SRAMs. PC-DUOS+ was shown to be better than STCAM, the single-TCAM architecture conventionally used for packet classification [28]. In this paper, we also extend PC-DUOS+ by augmenting it with wide SRAMs and index TCAMs, using the same methodology as in PC-TRIO, to obtain PC-DUOS+W. On ACL data sets, PC-DUOS+W reduced TCAM power by 86 percent and lookup time by 98 percent compared to PC-DUOS+, which demonstrates the effectiveness of indexing and wide SRAMs in reducing power and lookup time for packet classifiers.

International Green and Sustainable Computing Conference | 2015

A genetic algorithm based autotuning approach for performance and energy optimization

Tania Banerjee; Sanjay Ranka

Autotuning is an empirical optimization approach in which the configuration space of an algorithmic code is explored in a systematic manner for a variety of software and hardware parameters. The objective of such autotuning is to reduce the computational time and/or energy requirements of the generated code. We develop a genetic algorithm based autotuning strategy that can be used for optimizing performance or energy or a combination thereof. The main advantage of our approach is that the number of compilations and executions explored in the configuration space is substantially smaller than in an exhaustive search. We demonstrate the usefulness of our approach on the underlying small matrix multiplication routines in spectral element solvers. The latter are an important class of higher order methods that are expected to be a computationally intensive portion of the next generation of large scale CFD simulations. Our experiments were conducted on a variety of platforms. On AMD Fusion, for example, the genetic algorithm is able to obtain a 34% improvement in performance and a 37% reduction in energy consumption over existing versions of the code. Further, only a very small fraction of the entire configuration space needs to be explored.
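
The search strategy described in this abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the configuration space (unroll, tile, vectorize), the synthetic cost function standing in for a measured runtime or energy value, and all function names are assumptions made for illustration. It keeps the fitter half of each generation and counts how many distinct configurations were actually evaluated.

```python
import random

# Hypothetical configuration space; every name and value is an assumption.
SPACE = {
    "unroll": [1, 2, 4, 8, 16],
    "tile": [4, 8, 16, 32, 64],
    "vectorize": [0, 1],
}

def synthetic_cost(cfg):
    """Stand-in for compiling and measuring a kernel; lower is better."""
    return abs(cfg["unroll"] - 4) + abs(cfg["tile"] - 16) / 4 + (1 - cfg["vectorize"]) * 3

def random_config(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Each parameter is inherited from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rng, rate=0.2):
    # Each parameter is re-drawn from its range with probability `rate`.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in cfg.items()}

def autotune(generations=15, pop_size=8, seed=0):
    rng = random.Random(seed)
    evaluated = {}  # cache so each distinct configuration is "run" only once

    def cost(cfg):
        key = tuple(sorted(cfg.items()))
        if key not in evaluated:
            evaluated[key] = synthetic_cost(cfg)
        return evaluated[key]

    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = min(pop, key=cost)
    return best, cost(best), len(evaluated)

if __name__ == "__main__":
    best, best_cost, n_evaluated = autotune()
    print(best, best_cost, n_evaluated)
```

In a real autotuner the cost function would compile and run the kernel, so the cache of evaluated configurations is what keeps the number of compilations and executions below that of an exhaustive sweep.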

IEEE Transactions on Computers | 2014

PC-DUOS+: A TCAM Architecture for Packet Classifiers

Tania Banerjee; Sartaj Sahni; Gunasekaran Seetharaman

We propose algorithms for distributing the classifier rules to two ternary content addressable memories (TCAMs) and for incrementally updating the TCAMs. The performance of our scheme is compared against the prevalent scheme of storing classifier rules in a single TCAM in priority order. Our scheme results in an improvement in average lookup speed by up to 49% and an improvement in update performance by up to 3.84 times in terms of the number of TCAM writes.
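
For readers unfamiliar with the baseline this abstract compares against, the following is a minimal software sketch of a single TCAM holding rules in priority order. It is an illustration only: the 8-bit keys, the example rules, and the function names are assumptions, and it does not implement the two-TCAM distribution or update algorithms of the paper. A hardware TCAM compares all entries in parallel and a priority encoder selects the first match; the loop below emulates that behavior sequentially.

```python
def matches(key, value, mask):
    """Ternary match: only bit positions set in `mask` are compared."""
    return (key & mask) == (value & mask)

def lookup(rules, key):
    """Return the action of the first (highest-priority) matching rule."""
    for value, mask, action in rules:
        if matches(key, value, mask):
            return action
    return None

# Hypothetical rules on 8-bit keys, stored in decreasing priority.
RULES = [
    (0b10100000, 0b11110000, "deny"),     # matches 1010xxxx
    (0b10000000, 0b10000000, "allow"),    # matches 1xxxxxxx
    (0b00000000, 0b00000000, "default"),  # matches anything
]

if __name__ == "__main__":
    for key in (0b10101111, 0b10010000, 0b00000001):
        print(format(key, "08b"), lookup(RULES, key))
```

Because the first match wins, inserting a rule into such a table can force lower-priority entries to move, which is why the number of TCAM writes is a natural measure of update cost.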

International Conference on Cluster Computing | 2015

CMT-bone: A Mini-App for Compressible Multiphase Turbulence Simulation Software

Nalini Kumar; Mrugesh Shringarpure; Tania Banerjee; Jason Hackl; S. Balachandar; Herman Lam; Alan D. George; Sanjay Ranka

Designed with the goal of mimicking key features of real HPC workloads, mini-apps have become an important tool for co-design. An investigation of mini-app behavior can provide system designers with insight into the impact of architectures, programming models, and tools on application performance. Mini-apps can also serve as a platform for fast algorithm design space exploration, allowing application developers to evaluate their design choices before significantly redesigning the application codes. Consequently, it is prudent to develop a mini-app alongside the full-blown application it is intended to represent. In this paper, we present CMT-bone, a mini-app for the compressible multiphase turbulence (CMT) application, CMT-nek, being developed to extend the physics of the CESAR Nek5000 application code. CMT-bone consists of the most computationally intensive kernels of CMT-nek and the communication operations involved in nearest-neighbor updates and vector reductions. The mini-app represents CMT-nek in its most mature state, and going forward it will be developed in parallel with the CMT-nek application to keep pace with key new performance-impacting changes. We describe these kernels and discuss the role that CMT-bone has played in enabling interdisciplinary collaboration by allowing application developers to work with computer scientists on performance optimization on current architectures and performance analysis on notional future systems.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

CMT-Bone — A Proxy Application for Compressible Multiphase Turbulent Flows

Tania Banerjee; Jason Hackl; Mrugesh Shringarpure; Tanzima Islam; S. Balachandar; Thomas J. Jackson; Sanjay Ranka

CMT-bone is a proxy app of CMT-nek, which is a solver of the compressible Navier-Stokes equations for multiphase flows being developed at the University of Florida. While the objective of CMT-nek is to perform high fidelity, predictive simulations of particle laden explosively dispersed turbulent flows, the goal of CMT-bone is to mimic the computational behavior of CMT-nek in terms of operation counts, memory access patterns for data, and performance characteristics of hardware devices (memory, cache, floating point unit, etc.). CMT-bone, as a proxy app, has tremendous potential to be an important benchmark for realizing tradeoffs in HPC software, hardware, and algorithm design as part of the co-design process.

International Conference on Supercomputing | 2018

Dynamic Load Balancing for Compressible Multiphase Turbulence

Keke Zhai; Tania Banerjee; David Zwick; Jason Hackl; Sanjay Ranka

CMT-nek is a new scientific application for performing high fidelity predictive simulations of particle laden explosively dispersed turbulent flows. CMT-nek involves detailed simulations, is compute intensive, and is targeted for deployment on exascale platforms. The moving particles are the main source of load imbalance when the application is executed on parallel processors. In a demonstration problem, all the particles are initially in a closed container until a detonation occurs and the particles move apart. If all processors get an equal share of the fluid domain, then only some of the processors get sections of the domain that are initially laden with particles, leading to disparate load on the processors. In order to eliminate load imbalance across processors and to reduce the makespan, we present different load balancing algorithms for CMT-nek on large scale multicore platforms consisting of hundreds of thousands of cores. The detailed process of each load balancing algorithm is presented. The performance of the different load balancing algorithms is compared and the associated overheads are analyzed. Evaluations of the application with and without load balancing show that, with load balancing, simulation time improves by a factor of up to 9.97.
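
The imbalance described in this abstract can be illustrated with a toy one-dimensional domain. The following is a minimal sketch under stated assumptions, not one of the paper's algorithms: the per-element weights (a fluid cost of 1 plus a particle cost of 5 in the first four elements), the greedy contiguous partitioner, and all function names are hypothetical. It compares an equal split of elements with a split that balances total weight.

```python
def equal_split(weights, nproc):
    """Give each processor the same number of contiguous elements."""
    n = len(weights)
    bounds = [i * n // nproc for i in range(nproc + 1)]
    return [weights[bounds[i]:bounds[i + 1]] for i in range(nproc)]

def weighted_split(weights, nproc):
    """Greedy contiguous split: close a chunk once it reaches the average load.

    Assumes len(weights) >= nproc so that every processor gets an element.
    """
    target = sum(weights) / nproc
    chunks, current, acc = [], [], 0.0
    for i, w in enumerate(weights):
        current.append(w)
        acc += w
        remaining_elems = len(weights) - i - 1
        remaining_chunks = nproc - len(chunks) - 1
        # Also close when the leftover elements are only just enough
        # to give each remaining processor one element.
        if remaining_chunks > 0 and (acc >= target or remaining_elems == remaining_chunks):
            chunks.append(current)
            current, acc = [], 0.0
    chunks.append(current)
    return chunks

def makespan(chunks):
    """Time of the slowest processor, assuming time is proportional to load."""
    return max(sum(c) for c in chunks)

# Hypothetical load: particles start in the first 4 of 16 elements.
WEIGHTS = [6, 6, 6, 6] + [1] * 12

if __name__ == "__main__":
    print(makespan(equal_split(WEIGHTS, 4)))     # 24
    print(makespan(weighted_split(WEIGHTS, 4)))  # 12
```

In this toy case the equal split leaves one processor with all the particle work, while the weighted split halves the makespan. As the particles move apart, the weights change, which is why the partition has to be recomputed dynamically.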

International Symposium on Signal Processing and Information Technology | 2016

Dynamic data driven image reconstruction using multiple GPUs

Adeesha Wijayasiri; Tania Banerjee; Sanjay Ranka; Sartaj Sahni; Mark S. Schmalz

The reconstruction of n×n-pixel Synthetic Aperture Radar (SAR) imagery using the backprojection algorithm incurs O(n²·m) cost, where m is the number of pulses. This paper presents dynamic data driven multiresolution algorithms to speed up SAR backprojection on multiple GPUs. A critical part of this spatially variant reconstruction process is load balancing, which avoids asymmetric work assignment. Our algorithms achieve 15 TFLOPS using 128 GPUs.
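
The O(n²·m) cost follows from the loop structure of naive backprojection: every one of the n×n pixels accumulates one contribution from each of the m pulses. The following is a simplified single-threaded sketch, not the paper's multiresolution GPU algorithm. The function signature, the nearest-neighbor range-bin selection, the use of absolute range, and the sign convention of the phase term are all assumptions made for illustration.

```python
import cmath
import math

def backproject(pulses, positions, n, pixel_spacing, bin_spacing, wavelength):
    """Naive backprojection over an n x n image.

    pulses:    list of m range profiles (lists of complex samples)
    positions: list of m antenna positions (x, y, z), one per pulse
    Returns the image and the number of pixel updates attempted.
    """
    image = [[0j] * n for _ in range(n)]
    updates = 0
    for samples, (px, py, pz) in zip(pulses, positions):  # m pulses
        for iy in range(n):                               # n rows
            for ix in range(n):                           # n columns
                x, y = ix * pixel_spacing, iy * pixel_spacing
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                b = int(round(r / bin_spacing))           # nearest range bin
                if 0 <= b < len(samples):
                    # Phase compensation for the two-way path length.
                    image[iy][ix] += samples[b] * cmath.exp(4j * math.pi * r / wavelength)
                updates += 1
    return image, updates

if __name__ == "__main__":
    n, m = 4, 3
    pulses = [[1 + 0j] * 64 for _ in range(m)]
    positions = [(-10.0 + p, 0.0, 5.0) for p in range(m)]
    _, updates = backproject(pulses, positions, n, 1.0, 0.5, 0.03)
    print(updates)  # n * n * m = 48
```

Each pixel update is independent of the others, which is what makes the computation a natural fit for GPUs.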

International Green and Sustainable Computing Conference | 2016

Multi-objective optimization of CMT-bone on hybrid processors

Mohamed Gadou; Tania Banerjee; Sanjay Ranka

Hybrid multicore processors (HMPs) consist of general purpose cores along with specialized cores and are expected to provide benefits to a wide spectrum of applications at significantly lower energy requirements per FLOP (floating-point operation). In this paper, we present the challenges involved in developing performance and energy efficient software for CMT-bone, a proxy application of CMT-nek, on a hybrid multicore processor. We have also implemented CMT-bone for traditional CPU cores as well as for a GPU. We provide load balancing strategies for a combination of CPU and GPU cores. We also report performance, power, and energy tradeoffs on an Intel multicore processor and an NVIDIA GPU co-processor.

Sustainable Computing: Informatics and Systems | 2016