
Gagan Gupta

University of Wisconsin-Madison

Publication

Featured research published by Gagan Gupta.

IEEE Transactions on Circuits and Systems for Video Technology | 1995

Architectures for hierarchical and other block matching algorithms

Gagan Gupta; Chaitali Chakrabarti

Hierarchical block matching is an efficient motion estimation technique that adapts the block size and the search area to the properties of the image. In this paper, we propose two novel special-purpose architectures to implement hierarchical block matching for real-time applications. The first architecture is memory-efficient, but requires a large external memory bandwidth and a large number of processors. The second architecture requires significantly fewer processors, but additional on-chip memory. We describe in detail the processor architecture, the memory organization and the scheduling for both architectures. We also show how the second architecture can be modified to handle full-search and 3-step hierarchical search block matching algorithms, with a significant reduction in hardware complexity compared to existing architectures.
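For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of textbook full-search block matching using the sum of absolute differences (SAD) as the matching cost. It is not the paper's architecture; the function names, the block size `n` and the search range `p` are illustrative assumptions. A hierarchical scheme would run the same kind of search on downsampled frames first and then refine the vector at full resolution with a much smaller range.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside the reference frame; return (dx, dy, cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if bx + dx < 0 or bx + dx + n > w or by + dy < 0 or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The full search evaluates (2p + 1)^2 candidates per block, which is the computational load that motivates both hierarchical search and dedicated hardware.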

International Symposium on Microarchitecture | 2011

Dataflow execution of sequential imperative programs on multicore architectures

Gagan Gupta; Gurindar S. Sohi

As multicore processors become the default, researchers are aggressively looking for program execution models that make it easier to use the available resources. Multithreaded programming models that rely on statically-parallel programs have gained prevalence. Most of the existing research is directed at adapting and enhancing such models, alleviating their drawbacks, and simplifying their usage. This paper takes a different approach and proposes a novel execution model to achieve parallel execution of statically-sequential programs. It dynamically parallelizes the execution of suitably-written sequential programs, in a dataflow fashion, on multiple processing cores. Significantly, the execution is race-free and determinate. Thus the model eases program development and yet exploits available parallelism. This paper describes the implementation of a software runtime library that implements the proposed execution model on existing commercial multicore machines. We present results from experiments running benchmark programs, using both the proposed technique as well as traditional parallel programming, on three different systems. We find that in addition to easing the development of the benchmarks, the approach is resource-efficient and achieves performance similar to the traditional approach, using stock compilers, operating systems and hardware, despite the overheads of an all-software implementation of the model.
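The core idea of running a sequential program in a dataflow fashion while keeping it race-free and determinate can be illustrated with a toy sketch: each operation, listed in program order, is tagged with the data it touches; operations on the same data run in program order, and operations on different data may run concurrently. This is a static, simplified illustration under the assumption that every operation touches only the state named by its key; it is not the paper's runtime library, and `run_with_sequential_semantics` and the key-per-operation convention are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_with_sequential_semantics(ops, max_workers=4):
    """ops: list of (key, fn) pairs in program order; fn takes no arguments.
    Operations sharing a key execute in program order on a single worker;
    operations with different keys may execute in parallel.
    Returns the results in program order."""
    groups = defaultdict(list)  # key -> [(index, fn), ...] in program order
    for i, (key, fn) in enumerate(ops):
        groups[key].append((i, fn))

    results = [None] * len(ops)

    def drain(chain):
        # Run one key's operations strictly in the order they were issued.
        for i, fn in chain:
            results[i] = fn()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(drain, chain) for chain in groups.values()]
        for f in futures:
            f.result()  # propagate any exception raised by a worker
    return results
```

Because operations that share a key never overlap, the outcome matches sequential execution regardless of thread timing, which is the determinacy property the abstract emphasizes.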

Programming Language Design and Implementation | 2014

Adaptive, efficient, parallel execution of parallel programs

Srinath Sridharan; Gagan Gupta; Gurindar S. Sohi

Future multicore processors will be heterogeneous and increasingly unreliable, and will operate under dynamically changing conditions. Such environments will result in a constantly varying pool of hardware resources, which can greatly complicate the task of efficiently mapping a program's parallelism onto these resources. Coupled with this uncertainty is the diverse set of efficiency metrics that users may desire. This paper proposes Varuna, a system that dynamically, continuously, rapidly and transparently adapts a program's parallelism to best match the instantaneous capabilities of the hardware resources while satisfying different efficiency metrics. Varuna is applicable to both multithreaded and task-based programs and can be seamlessly inserted between the program and the operating system without changing the source code of either. We demonstrate Varuna's effectiveness in diverse execution environments using unaltered C/C++ parallel programs from various benchmark suites. Regardless of the execution environment, Varuna always outperformed the state-of-the-art approaches for the efficiency metrics considered.

International Conference on Supercomputing | 2013

Holistic run-time parallelism management for time and energy efficiency

Srinath Sridharan; Gagan Gupta; Gurindar S. Sohi

The ubiquity of parallel machines will necessitate time- and energy-efficient parallel execution of a program in a wide range of hardware and software environments. Prevalent parallel execution models can fail to be efficient. Unable to account for dynamic changes in operating conditions, they may create suboptimal parallelism, leading to underutilization of or contention for resources. We propose ParallelismDial (PD), a model to dynamically, continuously and judiciously adapt a program's degree of parallelism to a given dynamic operating environment. PD uses a holistic metric to measure system efficiency, and the metric is used to systematically optimize the program's execution. We apply PD to two diverse parallel programming models: Intel TBB, an industry standard, and Prometheus, a recent research effort. Two prototypes of PD have been implemented and evaluated on two stock multicore workstations, in both dedicated and multiprogrammed environments. Experimental results show that, in the dedicated environment, the prototypes outperform the state-of-the-art approaches on average by 15% in time efficiency and 31% in energy efficiency. In the multiprogrammed environment, the savings are 19% in time and 21% in energy.
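Adapting a program's degree of parallelism to maximize an efficiency metric can be pictured as a search over thread counts. The sketch below uses plain greedy hill climbing; it is an assumed stand-in, not PD's published algorithm, and `tune_parallelism` and the `measure` callback (which would run the program briefly with `n` threads and return a score, higher being better) are hypothetical.

```python
def tune_parallelism(measure, max_threads, start=1):
    """Greedy hill climbing over the thread count.
    measure(n) returns an efficiency score for running with n threads
    (higher is better). Returns (n, score) at a local maximum."""
    n = start
    best = measure(n)
    while True:
        # Only consider neighbouring thread counts within [1, max_threads].
        candidates = [m for m in (n - 1, n + 1) if 1 <= m <= max_threads]
        if not candidates:
            return n, best
        score, m = max((measure(m), m) for m in candidates)
        if score <= best:
            return n, best  # no neighbour improves the metric
        n, best = m, score
```

A real controller would face noisy measurements and would re-run the search continuously as operating conditions change, which is the dynamic, continuous adaptation the abstract describes; this sketch performs a single static search.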

Design Automation Conference | 2006

Rapid estimation of control delay from high-level specifications

Gagan Gupta; Madhur Gupta; Preeti Ranjan Panda

We address the problem of estimating controller delay from high-level specifications during behavioral synthesis. Typically, the critical path of a synthesized behavioral design goes through both the data path and the control logic; yet most scheduling algorithms account only for the data path and ignore control delay, leading to timing uncertainties in the resulting designs. We present an estimation technique for computing a fast, robust, scalable, and reasonably accurate approximation of the control delay from behavioral specifications. The delay estimate is formulated in terms of the properties of the input specification and other inputs to the synthesis process, such as resource constraints.

International Symposium on Circuits and Systems | 1994

VLSI architectures for hierarchical block matching

Gagan Gupta; Chaitali Chakrabarti

Hierarchical block matching is an efficient motion estimation technique that adapts the block size and the search area to the properties of the image. In this work, we propose two novel special-purpose architectures for implementing hierarchical block matching. The first architecture is memory-efficient, but requires a large external memory bandwidth and a large number of processors. The second architecture requires significantly fewer processors, but additional on-chip memory. We describe the processor architecture, the memory organization and the scheduling details for both architectures.

Global Communications Conference | 2007

A Distributed Algorithm for Level Set Estimation Using Uncoordinated Mobile Sensors

Gagan Gupta; Parmesh Ramanathan

We develop a level set estimation algorithm for a novel low-cost sensor network architecture, where sensors are mounted on agents moving without an explicit objective of sensing. A level set in a planar scalar field is the set of points with field values greater than or equal to a specified threshold. The distributed algorithm uses opportunistic information exchange to estimate level set boundaries locally at nodes selected using leader election, and these estimates are aggregated at the base station. The effectiveness of the proposed scheme is evaluated using simulations with data from both synthetic and measured fields. The random waypoint mobility model is used for node motion, and the accuracy of the scheme and the trade-off between coverage and communication cost are studied.
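The level set definition above can be made concrete on a sampled grid: keep every point whose value meets the threshold, and call a point a boundary point if it is in the set but has a neighbouring grid point outside it. This centralized sketch only illustrates the definition; it does not model the mobile sensors, opportunistic exchange or leader election, and the grid representation and 4-neighbour boundary rule are assumptions.

```python
def level_set(field, threshold):
    """Return the (row, col) grid points whose field value is >= threshold."""
    return {(r, c) for r, row in enumerate(field)
            for c, v in enumerate(row) if v >= threshold}


def boundary(field, threshold):
    """Level-set points with at least one on-grid 4-neighbour outside the set."""
    inside = level_set(field, threshold)
    rows, cols = len(field), len(field[0])
    edge = set()
    for r, c in inside:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in inside:
                edge.add((r, c))
                break
    return edge
```

For a small peaked field such as `[[0, 1, 0], [1, 5, 1], [0, 1, 0]]` with threshold 1, the level set is the centre and its four neighbours, and only the centre is interior.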

Programming Language Design and Implementation | 2014

Globally precise-restartable execution of parallel programs

Gagan Gupta; Srinath Sridharan; Gurindar S. Sohi

Emerging trends in computer design and use are likely to make exceptions, once rare, the norm, especially as system size grows. Because of exceptions arising from hardware faults, approximate computing, dynamic resource management and the like, successful and error-free execution of programs may no longer be assured. Yet designers will want to tolerate such exceptions so that programs execute completely, efficiently and without external intervention. Modern computers easily handle exceptions in sequential programs using precise interrupts, but they are ill-equipped to handle exceptions in parallel programs, which are growing in prevalence. In this work we introduce the notion of globally precise-restartable execution of parallel programs, analogous to precise-interruptible execution of sequential programs. We present a software runtime recovery system based on this approach to handle exceptions in suitably-written parallel programs. Qualitative and quantitative analyses show that, unlike the conventional checkpoint-and-recovery method, the proposed system scales with system size, especially when exceptions are frequent.
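The analogy with precise interrupts can be illustrated with a toy in-order commit scheme, similar in spirit to a processor's reorder buffer: tasks listed in program order may execute concurrently, but their results are committed strictly in order, so when a task raises an exception everything before it is committed, nothing from it onward is, and execution restarts at that task. This is a sketch under a strong assumption (tasks are independent and free of side effects beyond their return value); it is not the paper's recovery system, and `run_precise_restartable` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_precise_restartable(tasks, max_restarts=3, max_workers=4):
    """tasks: zero-argument callables in program order. Results are committed
    in program order; on an exception in task i, results of tasks 0..i-1 stay
    committed and execution restarts from task i. Returns (results, restarts)."""
    committed = []  # results of tasks 0 .. len(committed)-1
    restarts = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(committed) < len(tasks):
            start = len(committed)
            # Execute all uncommitted tasks concurrently.
            futures = [pool.submit(t) for t in tasks[start:]]
            for f in futures:
                try:
                    value = f.result()
                except Exception:
                    # Precise restart point: discard this and later results.
                    restarts += 1
                    if restarts > max_restarts:
                        raise
                    break
                committed.append(value)
    return committed, restarts
```

Unlike checkpoint-and-recovery, nothing before the faulting task is re-executed; only the uncommitted suffix is redone.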

Ad Hoc Mobile and Wireless Networks | 2007

Level set estimation using uncoordinated mobile sensors

Gagan Gupta; Parmesh Ramanathan

We develop level set estimation algorithms for a novel low-cost sensor network architecture, where sensors are mounted on agents moving without an explicit objective of sensing. A level set in a planar scalar field is the set of points with field values greater than or equal to a specified value. We model the problem as a classification problem and evaluate a heuristic to reduce the amount of communication, assuming that the base station uses a Support Vector Machine classifier. We then develop a fully distributed, low-complexity solution that uses opportunistic information exchange to estimate level set boundaries locally at nodes selected using leader election. We observe that the learning rate of the boundary in a locality is proportional to the complexity. The effectiveness of the proposed scheme is evaluated using simulations with data from both synthetic and measured fields. The random waypoint mobility model is used for node motion, and the trade-offs of accuracy and coverage against communication cost are studied.

IEEE Hot Chips Symposium | 2014

Have your cake in parallel and eat it sequentially too! Semantically sequential, parallel execution of multiprocessor programs

Gagan Gupta

Multiprocessors are ubiquitous, but programming them continues to be challenging. Our goal is to simplify multiprocessor programming without compromising performance.
