Partha Biswas | Researchain

Publication

Featured research published by Partha Biswas.

design automation conference | 2004

Introduction of local memory elements in instruction set extensions

Partha Biswas; Vinay Choudhary; Kubilay Atasu; Laura Pozzi; Paolo Ienne; Nikil D. Dutt

Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor, is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units, which results in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration, with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.

design, automation, and test in europe | 2005

ISEGEN: Generation of High-Quality Instruction Set Extensions by Iterative Improvement

Partha Biswas; Sudarshan Banerjee; Nikil D. Dutt; Laura Pozzi; Paolo Ienne

Customization of processor architectures through instruction set extensions (ISEs) is an effective way to meet the growing performance demands of embedded applications. A high-quality ISE generation approach needs to obtain results close to those achieved by experienced designers, particularly for complex applications that exhibit regularity; expert designers are able to exploit manually such regularity in the data flow graphs to generate high-quality ISEs. We present ISEGEN, an approach that identifies high-quality ISEs by iterative improvement following the basic principles of the well-known Kernighan-Lin (K-L) min-cut heuristic. Experimental results on a number of MediaBench, EEMBC and cryptographic applications show that our approach matches the quality of the optimal solution obtained by exhaustive search. We also show that our ISEGEN technique is on average 20 times faster than a genetic formulation that generates equivalent solutions. Furthermore, the ISEs identified by our technique exhibit 35% more speedup than the genetic solution on a large cryptographic application (AES) by effectively exploiting its regular structure.
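The iterative-improvement idea in this abstract can be illustrated with a small sketch. This is not the ISEGEN implementation: it is a generic Kernighan-Lin-style pass over a toy data flow graph, and the merit function, the one-cycle ISE latency, the I/O limits, and all names (`count_io`, `make_merit`, `kl_pass`, `iterative_improvement`) are assumptions made for illustration. Convexity checking, which a real ISE identifier needs, is deliberately left out.

```python
from collections import defaultdict

def count_io(preds, cut):
    """Return (#inputs, #outputs) of `cut`, a set of nodes in a data flow graph."""
    succs = defaultdict(list)
    for node, ps in preds.items():
        for p in ps:
            succs[p].append(node)
    inputs = {p for n in cut for p in preds[n] if p not in cut}
    # A node is an output if its value is used outside the cut (or is live-out).
    outputs = {n for n in cut
               if not succs[n] or any(s not in cut for s in succs[n])}
    return len(inputs), len(outputs)

def make_merit(preds, sw_latency, max_in, max_out):
    """Hypothetical merit: cycles saved if the cut is legal, a penalty otherwise."""
    def merit(cut):
        if not cut:
            return 0
        n_in, n_out = count_io(preds, cut)
        excess = max(0, n_in - max_in) + max(0, n_out - max_out)
        if excess:
            return -excess
        # Assumption: the whole cut executes in a single cycle in hardware.
        return sum(sw_latency[n] for n in cut) - 1
    return merit

def kl_pass(nodes, cut, merit):
    """One K-L-style pass: toggle each node exactly once, keep the best cut seen."""
    current = set(cut)
    best, best_merit = set(cut), merit(current)
    unlocked = set(nodes)
    while unlocked:
        # Take the best toggle even if it lowers the merit (lets us escape local optima).
        n = max(sorted(unlocked), key=lambda x: merit(current ^ {x}))
        current ^= {n}
        unlocked.remove(n)
        m = merit(current)
        if m > best_merit:
            best, best_merit = set(current), m
    return best, best_merit

def iterative_improvement(nodes, merit):
    """Repeat passes until a pass no longer improves the cut."""
    cut, score = set(), merit(set())
    while True:
        new_cut, new_score = kl_pass(nodes, cut, merit)
        if new_score <= score:
            return cut, score
        cut, score = new_cut, new_score

# Toy graph for ((a*b) + (c*d)) >> e; values a..e are external inputs.
preds = {"m1": ["a", "b"], "m2": ["c", "d"], "add": ["m1", "m2"], "shr": ["add", "e"]}
sw_latency = {"m1": 2, "m2": 2, "add": 1, "shr": 1}
merit = make_merit(preds, sw_latency, max_in=4, max_out=1)
cut, saved = iterative_improvement(list(preds), merit)
```

With a 4-input, 1-output limit the full graph is illegal (it needs five inputs), so the sketch settles on the two multiplies plus the add. Accepting temporarily worse toggles within a pass is what distinguishes this style of search from plain greedy growth.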

design, automation, and test in europe | 2006

Automatic Identification of Application-Specific Functional Units with Architecturally Visible Storage

Partha Biswas; Nikil D. Dutt; Paolo Ienne; Laura Pozzi

Instruction set extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities, but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding application-specific functional units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques, and achieve an average speedup of 2.8× over pure software execution. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007

Introduction of Architecturally Visible Storage in Instruction Set Extensions

Partha Biswas; Nikil D. Dutt; Laura Pozzi; Paolo Ienne

Instruction set extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding application-specific functional units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and the main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques and achieve an average speedup of 2.8× over pure software execution with little area overhead. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.

international conference on hardware/software codesign and system synthesis | 2006

ISEGEN: an iterative improvement-based ISE generation technique for fast customization of processors

Partha Biswas; Sudarshan Banerjee; Nikil D. Dutt; Laura Pozzi; Paolo Ienne

Customization of processor architectures through instruction set extensions (ISEs) is an effective way to meet the growing performance demands of embedded applications. A high-quality ISE generation approach needs to obtain results close to those achieved by experienced designers, particularly for complex applications that exhibit regularity: expert designers are able to manually exploit such regularity in the data flow graphs to generate high-quality ISEs. In this paper, we present ISEGEN, an approach that identifies high-quality ISEs by iterative improvement following the basic principles of the well-known Kernighan-Lin min-cut heuristic. Experimental results on a number of MediaBench, EEMBC, and cryptographic applications show that our approach matches the quality of the optimal solution obtained by exhaustive search. We also show that our ISEGEN technique is on average 20× faster than a genetic formulation that generates equivalent solutions. Furthermore, the ISEs identified by our technique exhibit 35% more speedup than the genetic solution on a large cryptographic application by effectively exploiting its regular structure.

ACM Transactions on Design Automation of Electronic Systems | 2006

Compilation framework for code size reduction using reduced bit-width ISAs (rISAs)

Aviral Shrivastava; Partha Biswas; Ashok Halambi; Nikil D. Dutt; Alexandru Nicolau

For many embedded applications, program code size is a critical design factor. One promising approach for reducing code size is to employ a “dual instruction set”, where processor architectures support a normal (usually 32-bit) Instruction Set, and a narrow, space-efficient (usually 16-bit) Instruction Set with a limited set of opcodes and access to a limited set of registers. This feature, however, requires compilers that can reduce code size by compiling for both Instruction Sets. Existing compiler techniques operate at the routine-level granularity and are unable to make the trade-off between increased register pressure (resulting in more spills) and decreased code size. We present a compilation framework for such dual instruction sets, which uses a profitability-based compiler heuristic that operates at the instruction-level granularity and is able to effectively take advantage of both Instruction Sets. We demonstrate consistent and improved code size reduction (on average 22%) for the MIPS 32/16 bit ISA. We also show that the code compression obtained by this “dual instruction set” technique is heavily dependent on the application characteristics and the narrow Instruction Set itself.

compilers, architecture, and synthesis for embedded systems | 2003

Reducing code size for heterogeneous-connectivity-based VLIW DSPs through synthesis of instruction set extensions

Partha Biswas; Nikil D. Dutt

VLIW DSP architectures exhibit heterogeneous connections between functional units and register files for speeding up special tasks. Such architectural characteristics can be effectively exploited through the use of complex instruction set extensions (ISEs). Although VLIWs are increasingly being used for DSP applications to achieve very high performance, such architectures are known to suffer from increased code size. This paper addresses how to generate ISEs that can result in significant code size reduction in VLIW DSPs without degrading performance. Unfortunately, contemporary techniques for instruction set synthesis fail to extract legal ISEs for heterogeneous-connectivity-based architectures. We propose a heuristic-based algorithm to synthesize ISEs for a generalized heterogeneous-connectivity-based VLIW DSP architecture. We achieve an average code size reduction of 25% on the MiBench suite with no penalty in performance by applying our ISE generation algorithm on the TI TMS320C6xx, a representative VLIW DSP.

IEEE Transactions on Computers | 2005

Code size reduction in heterogeneous-connectivity-based DSPs using instruction set extensions

Partha Biswas; Nikil D. Dutt

The current trend in processors shows progress toward customizable and reconfigurable architectures. In this paper, we study the benefit of combining the architectural design of a VLIW DSP and the concepts of modern customizable processors like ASIPs (application-specific instruction set processors) for code size reduction. VLIW DSP architectures exhibit heterogeneous connections between functional units and register files for speeding up special tasks. Such architectural characteristics can be effectively exploited through the use of complex instruction set extensions (ISEs). Although VLIWs are increasingly being used for DSP applications to achieve very high performance, such architectures are known to suffer from increased code size. This paper also addresses how to generate and use ISEs that can result in significant code size reduction in VLIW DSPs without degrading performance. Unfortunately, contemporary techniques for generation of ISEs, when applied before resource binding, fail to generate legal ISEs for VLIW architectures with heterogeneous connectivity between the functional units and register files. We propose a heuristic-based approach to generate ISEs for a generalized heterogeneous-connectivity-based VLIW DSP architecture. We achieve an average code size reduction of 25 percent on the MiBench suite with no penalty in performance by applying our ISE generation algorithms on the TI TMS320C6xx, a representative VLIW DSP. We also show that the overhead of the required architectural assists for our approach is minimal: the TMS320C6xx pipeline meets the required timing with only a limited overhead in area.

compilers, architecture, and synthesis for embedded systems | 2008

Comprehensive isomorphic subtree enumeration

Partha Biswas; Girish Venkataramani

A fundamental problem in program analysis and optimization concerns the discovery of structural similarities between different sections of a given program and/or across different programs. Specifically, there is a need to find topologically identical segments within compiler intermediate representations (IRs). Such topological isomorphism has many applications. For example, finding isomorphic sub-trees within different expression trees points to common computational resources that can be shared when targeting application-specific hardware. Isomorphism in the control-flow graph can be used to discover custom instructions for customizable processors. Discovering isomorphism in context call trees during program execution is invaluable to several JIT compiler optimizations. Thus, all these different applications rely on the fundamental ability to find topologically identical segments within a given tree or graph representation. In this paper, we present a generic formulation of the subtree isomorphism problem that is more powerful than previous proposals. We prove that an optimal quadratic time solution exists for this problem. We employ a dynamic programming based algorithm to efficiently enumerate all isomorphic sub-trees within given reference trees and also demonstrate its efficacy in a production compiler.
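To make the notion of isomorphic sub-trees in expression trees concrete, here is a minimal sketch. It is not the paper's dynamic programming algorithm and covers a narrower problem: it uses a standard bottom-up canonical-labeling technique to group complete subtrees (a node together with all of its descendants) that have the same shape and labels. The tuple-based tree encoding, the choice to treat children as unordered, and the function name `group_isomorphic_subtrees` are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed encoding: a node is a tuple (label, child, child, ...); a leaf is (label,).

def group_isomorphic_subtrees(roots):
    """Group paths of internal nodes whose complete subtrees are isomorphic.

    A path is a tuple: (tree index, child index, child index, ...).
    Children are treated as unordered, so a*b matches b*a.
    """
    ids = {}                    # canonical signature -> small integer id
    groups = defaultdict(list)  # id -> paths of subtrees with that signature

    def visit(node, path):
        label, *children = node
        child_ids = sorted(visit(c, path + (i,)) for i, c in enumerate(children))
        signature = (label, tuple(child_ids))
        cid = ids.setdefault(signature, len(ids))
        if children:            # skip leaves, which match trivially
            groups[cid].append(path)
        return cid

    for index, root in enumerate(roots):
        visit(root, (index,))
    return [paths for paths in groups.values() if len(paths) > 1]

# (a*b) + c   and   (b*a) - d   share the multiply subtree.
t1 = ("+", ("*", ("a",), ("b",)), ("c",))
t2 = ("-", ("*", ("b",), ("a",)), ("d",))
shared = group_isomorphic_subtrees([t1, t2])
```

Here `shared` holds one group containing the paths of the two multiply nodes, which is the kind of match that would point to a hardware resource the two expressions could share.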

design, automation, and test in europe | 2002