Sudarshan Banerjee | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sudarshan Banerjee is active.

Explore More

Publication

Featured researches published by Sudarshan Banerjee.

design automation conference | 2005

Physically-aware HW-SW partitioning for reconfigurable architectures with partial dynamic reconfiguration

Sudarshan Banerjee; Elaheh Bozorgzadeh; Nikil D. Dutt

Many reconfigurable architectures offer partial dynamic configurability, but current system-level tools cannot guarantee feasible implementations when exploiting this feature. We present a physically aware hardware-software (HW-SW) scheme for minimizing application execution time under HW resource constraints, where the HW is a reconfigurable architecture with partial dynamic reconfiguration capability. Such architectures impose strict placement constraints that lead to implementation infeasibility of even optimal scheduling formulations that ignore the nature of these constraints. We propose an exact and a heuristic formulation that simultaneously partition, schedule, and do linear placement of tasks on such architectures. With our exact formulation, we prove the critical nature of placement constraints. We demonstrate that our heuristic generates high-quality schedules by comparing the results with the exact formulation for small tests and a popular, but placement-unaware scheduling heuristic for larger tests. With a case study, we demonstrate extension of our approach to handle heterogenous architectures with specialized resources distributed between general purpose programmable logic columns. The execution time of our heuristic is very reasonable - task graphs with hundreds of nodes are processed in a couple of minutes.

IEEE Transactions on Very Large Scale Integration Systems | 2006

Integrating Physical Constraints in HW-SW Partitioning for Architectures With Partial Dynamic Reconfiguration

Sudarshan Banerjee; Elaheh Bozorgzadeh; Nikil D. Dutt

Partial dynamic reconfiguration is a key feature of modern reconfigurable architectures such as the Xilinx Virtex series of devices. However, this capability imposes strict placement constraints such that even exact system-level partitioning (and scheduling) formulations are not guaranteed to be physically realizable due to placement infeasibility. We first present an exact approach for hardware-software (HW-SW) partitioning that guarantees correctness of implementation by considering placement implications as an integral aspect of HW-SW partitioning. Our exact approach is based on integer linear programming (ILP) and considers key issues such as configuration prefetch for minimizing schedule length on the target single-context device. Next, we present a physically aware HW-SW partitioning heuristic that simultaneously partitions, schedules, and does linear placement of tasks on such devices. With the exact formulation, we confirm the necessity of physically-aware HW-SW partitioning for the target architecture. We demonstrate that our heuristic generates high-quality schedules by comparing the results with the exact formulation for small tests and with a popular, but placement-uanaware scheduling heuristic for a large set of over a hundred tests. Our final set of experiments is a case study of JPEG encoding-we demonstrate that our focus on physical considerations along with our consideration of multiple task implementation points enables our approach to be easily extended to handle heterogenous architectures (with specialized resources distributed between general purpose programmable logic columns). The execution time of our heuristic is very reasonable-task graphs with hundreds of nodes are processed (partitioned, scheduled, and placed) in a couple of minutes

design, automation, and test in europe | 2005

ISEGEN: Generation of High-Quality Instruction Set Extensions by Iterative Improvement

Partha Biswas; Sudarshan Banerjee; Nikil D. Dutt; Laura Pozzi; Paolo Ienne

Customization of processor architectures through instruction set extensions (ISEs) is an effective way to meet the growing performance demands of embedded applications. A high-quality ISE generation approach needs to obtain results close to those achieved by experienced designers, particularly for complex applications that exhibit regularity; expert designers are able to exploit manually such regularity in the data flow graphs to generate high-quality ISEs. We present ISEGEN, an approach that identifies high-quality ISEs by iterative improvement following the basic principles of the well-known Kernighan-Lin (K-L) min-cut heuristic. Experimental results on a number of MediaBench, EEMBC and cryptographic applications show that our approach matches the quality of the optimal solution obtained by exhaustive search. We also show that our ISEGEN technique is on average 20 times faster than a genetic formulation that generates equivalent solutions. Furthermore, the ISEs identified by our technique exhibit 35% more speedup than the genetic solution on a large cryptographic application (AES) by effectively exploiting its regular structure.

international conference on hardware/software codesign and system synthesis | 2004

Efficient search space exploration for HW-SW partitioning

Sudarshan Banerjee; Nikil D. Dutt

Hardware/software (HW-SW) partitioning is a key problem in the codesign of embedded systems, studied extensively in the past. One major open challenge for traditional partitioning approaches ¿ as we move to more complex and heterogeneous SOCs ¿ is the lack of efficient exploration of the large space of possible HW/SW configurations,coupled with the inability to efficiently scale up with larger problem sizes. In this paper, we make two contributions for HW-SW partitioning of applications represented as procedural callgraphs: 1) we prove that during partitioning, the execution time metric for moving a vertex needs to be updated only for the immediate neighbours of the vertex, rather than for all ancestors along paths to the root vertex; consequently, we observe faster run-times for move-based partitioning algorithms such as Simulated Annealing (SA), allowing call graphs with thousands of vertices to be processed in less than a second, and 2) we devise a new cost function for SA that allows frequent discovery of better partitioning solutions by searching spaces overlooked by traditional SA cost functions. We present experimental results on a very large design space, where several thousand configurations are explored in minutes as compared to several hours or days using a traditional SA formulation. Furthermore, our approach is frequently able to locate better design points with over 10 % improvement in application execution time compared to the solutions generated by a Kernighan-Lin partitioning algorithm starting with an all-SW partitioning.

international conference on hardware/software codesign and system synthesis | 2006

ISEGEN: an iterative improvement-based ISE generation technique for fast customization of processors

Partha Biswas; Sudarshan Banerjee; Nikil D. Dutt; Laura Pozzi; Paolo Ienne

Customization of processor architectures through instruction set extensions (ISEs) is an effective way to meet the growing performance demands of embedded applications. A high-quality ISE generation approach needs to obtain results close to those achieved by experienced designers, particularly for complex applications that exhibit regularity: expert designers are able to exploit manually such regularity in the data flow graphs to generate high-quality ISEs. In this paper, we present ISEGEN, an approach that identifies high-quality ISEs by iterative improvement following the basic principles of the well-known Kernighan-Lin min-cut heuristic. Experimental results on a number of MediaBench, EEMBC, and cryptographic applications show that our approach matches the quality of the optimal solution obtained by exhaustive search. We also show that our ISEGEN technique is on average 20times faster than a genetic formulation that generates equivalent solutions. Furthermore, the ISEs identified by our technique exhibit 35% more speedup than the genetic solution on a large cryptographic application by effectively exploiting its regular structure

international conference on hardware/software codesign and system synthesis | 2006

Design space exploration of real-time multi-media MPSoCs with heterogeneous scheduling policies

Minyoung Kim; Sudarshan Banerjee; Nikil D. Dutt; Nalini Venkatasubramanian

Real-time multi-media applications are increasingly being mapped onto MPSoC (multi-processor system-on-chip) platforms containing hardware-software IPs (intellectual property) along with a library of common scheduling policies such as EDF, RM. The choice of a scheduling policy for each IP is a key decision that greatly affects the designs ability to meet real-time constraints, and also directly affects the energy consumed by the design. We present a cosynthesis framework for design space exploration that considers heterogenous scheduling while mapping multimedia applications onto such MPSoCs. In our approach, we select a suitable scheduling policy for each IP such that system energy is minimized - our framework also includes energy reduction techniques utilizing dynamic power management. Experimental results on a realistic multi-mode multi-media terminal application demonstrate that our approach enables us to select design points with up to 60.5% reduced energy for a given area constraint, while meeting all real-time requirements. More importantly, our approach generates a tradeoff space between energy and cost allowing designers to comparatively evaluate multiple system level mappings.

ACM Transactions in Embedded Computing Systems | 2008

Energy-aware cosynthesis of real-time multimedia applications on MPSoCs using heterogeneous scheduling policies

Minyoung Kim; Sudarshan Banerjee; Nikil D. Dutt; Nalini Venkatasubramanian

Real-time multimedia applications are increasingly being mapped onto MPSoC (multiprocessor system-on-chip) platforms containing hardware--software IPs (intellectual property), along with a library of common scheduling policies such as EDF, RM. The choice of a scheduling policy for each IP is a key decision that greatly affects the designs ability to meet real-time constraints, and also directly affects the energy consumed by the design. We present a cosynthesis framework for design space exploration that considers heterogeneous scheduling while mapping multimedia applications onto such MPSoCs. In our approach, we select a suitable scheduling policy for each IP such that system energy is minimized—our framework also includes energy-reduction techniques utilizing dynamic power management. Experimental results on a realistic multimode multimedia terminal application demonstrate that our approach enables us to select design points with up to 60.5&percent; reduced energy for a given area constraint, while meeting all real-time requirements. More importantly, our approach generates a tradeoff space between energy and cost allowing designers to comparatively evaluate multiple system level mappings.

Theoretical Computer Science | 2009

Strip packing with precedence constraints and strip packing with release times

John Augustine; Sudarshan Banerjee; Sandy Irani

The strip packing problem seeks to tightly pack a set of n rectangles into a strip of fixed width and arbitrary height. The rectangles model tasks and the height models time. This paper examines two variants of strip packing: when the rectangles to be placed have precedence constraints and when the rectangles have release times. Strip packing is used to model scheduling problems in which tasks require a contiguous subset of identical resources that are arranged in a linear topology. The variants studied here are motivated by scheduling tasks for dynamically reconfigurable Field-Programmable Gate Arrays (FPGAs) comprised of a linear arrangement of K homogeneous computing resources, where K is a fixed positive integer, and each task occupies a contiguous subset of these resources. For the case in which tasks have precedence constraints, we give an O(logn) approximation algorithm. We then consider the special case in which all the rectangles have uniform height, and reduce it to the resource constrained scheduling studied by Garey, Graham, Johnson and Yao, thereby extending their asymptotic results to our special case problem. We also give an absolute 3-approximation for this special case problem. For strip packing with release times, we provide an asymptotic polynomial time approximation scheme. We make the standard assumption that the rectangles have height at most 1. In addition, we also require widths to be in [1K,1]. For the FPGA application, this would imply that the rectangles are at least as wide as a column. Our running time is polynomial in n and 1/@e, but exponential in K.

asia and south pacific design automation conference | 2006

PARLGRAN: parallelism granularity selection for scheduling task chains on dynamically reconfigurable architectures

Sudarshan Banerjee; Elaheh Bozorgzadeh; Nikil D. Dutt

Partial dynamic reconfiguration, often called RTR (run-time reconfiguration) is a key feature in modern reconfigurable platforms. While partial RTR enables additional application performance, it imposes physical constraints necessitating simultaneous scheduling and placement while mapping application task graphs onto such architectures. In this paper, we present PARLGRAN, an approach that maximizes performance of application task chains by selecting a suitable granularity of data-parallelism for individual data parallel tasks. Our approach focuses on reconfiguration delay overhead and placement-related issues (such as fragmentation) while selecting individual data-parallelism granularity as an integral part of simultaneous scheduling and placement. We demonstrate that our heuristic generates high-quality schedules on an extensive set of over a 1000 synthetic experiments by comparing the results with an approach that tries to statically maximize data-parallelism, i.e., does not consider the overheads and constraints associated with partial RTR. A detailed case-study on JPEG encoding additionally confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism

IEEE Transactions on Very Large Scale Integration Systems | 2009

Exploiting Application Data-Parallelism on Dynamically Reconfigurable Architectures: Placement and Architectural Considerations

Sudarshan Banerjee; Elaheh Bozorgzadeh; Nikil D. Dutt

Partial dynamic reconfiguration, often called run-time reconfiguration (RTR), is a key feature in modern reconfigurable platforms. In this paper, we present parallelism granularity selection (PARLGRAN), an application mapping approach that maximizes performance of application task chains on architectures with such capability. PARLGRAN essentially selects a suitable granularity of data-parallelism for individual data parallel tasks while considering key issues such as significant reconfiguration overhead and placement constraints. It integrates granularity selection very effectively in a joint scheduling and placement formulation, necessary due to constraints imposed by partial RTR. As a key step to validating PARLGRAN, we additionally present an exact strategy (integer linear programming formulation). We demonstrate that PARLGRAN generates high-quality schedules with: (1) a set of small test cases where we compare our results with the exact strategy; (2) a very large set of synthetic experiments with over a thousand data-points where we compare it with a simpler strategy that tries to statically maximize data-parallelism, i.e., only considers resource availability; and (3) a detailed application case study of JPEG encoding. The application case-study confirms that blindly maximizing data-parallelism can result in schedules even worse than that generated by a simple (but RTR-aware) approach oblivious to data-parallelism. Last, but very important, we demonstrate that our approach is well-suited for true on-demand computing with detailed execution time estimates on a typical embedded processor. Heuristic execution time is comparable to task execution time, i.e., it is feasible to integrate PARLGRAN in a run-time scheduler for dynamically reconfigurable architectures.

Explore More