Anupam Chattopadhyay | Researchain

Publication

Featured research published by Anupam Chattopadhyay.

IEEE Transactions on Computers | 2013

High-Performance Hardware Implementation for RC4 Stream Cipher

Sourav Sen Gupta; Anupam Chattopadhyay; Koushik Sinha; Subhamoy Maitra; Bhabani P. Sinha

RC4 is the most popular stream cipher in the domain of cryptology. In this paper, we present a systematic study of the hardware implementation of RC4, and propose the fastest known architecture for the cipher. We combine the ideas of hardware pipeline and loop unrolling to design an architecture that produces 2 RC4 keystream bytes per clock cycle. We have optimized and implemented our proposed design using VHDL description, synthesized with 130, 90, and 65 nm fabrication technologies at clock frequencies 625 MHz, 1.37 GHz, and 1.92 GHz, respectively, to obtain a final RC4 keystream throughput of 10, 21.92, and 30.72 Gbps in the respective technologies.
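The throughput figures quoted in this abstract follow from the stated rate of 2 keystream bytes per clock cycle. The short sketch below reproduces that arithmetic; the function and variable names are illustrative assumptions and do not come from the paper.

```python
# Sketch: keystream throughput = bytes per cycle * 8 bits * clock frequency.
# The clock frequencies and the 2 bytes/cycle rate are taken from the abstract;
# the names used here are hypothetical.

BYTES_PER_CYCLE = 2


def keystream_throughput_gbps(clock_hz, bytes_per_cycle=BYTES_PER_CYCLE):
    """Return keystream throughput in Gbps for a given clock frequency in Hz."""
    return clock_hz * bytes_per_cycle * 8 / 1e9


# Technology node (nm) -> clock frequency (Hz), as stated in the abstract.
CLOCKS_HZ = {130: 625e6, 90: 1.37e9, 65: 1.92e9}

if __name__ == "__main__":
    for node_nm, clock_hz in CLOCKS_HZ.items():
        gbps = keystream_throughput_gbps(clock_hz)
        print(f"{node_nm} nm: {gbps:.2f} Gbps")
```

Each clock frequency multiplied by 16 bits per cycle gives the 10, 21.92, and 30.72 Gbps figures reported for the three technologies.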

Design, Automation, and Test in Europe | 2004

RTL processor synthesis for architecture exploration and implementation

Oliver Schliebusch; Anupam Chattopadhyay; Rainer Leupers; Gerd Ascheid; Heinrich Meyr; Mario Steinert; Gunnar Braun; Achim Nohl

Architecture description languages are widely used to perform architecture exploration for application-driven designs, whereas the RT-level is the commonly accepted level for hardware implementation. For this reason, design parameters such as timing, area, or power consumption cannot be taken into consideration accurately during design space exploration. Design automation tools currently used to bridge this gap are either limited in the flexibility provided or only generate fragments of the architecture. This paper presents a synthesis tool that preserves the full flexibility of the architecture description language LISA while generating the complete architecture at the RT level using SystemC. This paper also presents two real-world architecture case studies to demonstrate the feasibility of our approach.

IEEE Transactions on Very Large Scale Integration Systems | 2008

A Design Flow for Architecture Exploration and Implementation of Partially Reconfigurable Processors

Kingshuk Karuri; Anupam Chattopadhyay; Xiaolin Chen; David Kammler; Ling Hao; Rainer Leupers; Heinrich Meyr; Gerd Ascheid

In recent years, growing application complexity and rising design and mask costs have compelled embedded system designers to increasingly consider partially reconfigurable application-specific instruction set processors (rASIPs), which combine a programmable base processor with a reconfigurable fabric. Although such processors promise to deliver an excellent balance between performance and flexibility, their design remains a challenging task. The key to the successful design of a rASIP is combined architecture exploration of all three major components: the programmable core, the reconfigurable fabric, and the interfaces between the two. This work presents a design flow that supports fast architecture exploration for rASIPs. The design flow is centered around a unified description of an entire rASIP in an architecture description language (ADL). This ADL description facilitates consistent modeling and exploration of all three components of a rASIP through automatic generation of the software tools (compiler tool chain and instruction set simulator) and the RTL hardware model. The generated software tools and the RTL model can be used either for final implementation of the rASIP or as a preoptimized starting point for an implementation that is hand-optimized afterward. The design flow is further enhanced by a number of automatic application analysis tools, including a fine-grained application profiler, an instruction set extension (ISE) generator, and a data path mapper for coarse-grained reconfigurable architectures (CGRAs). We present case studies on embedded benchmarks to show how the design space exploration process helps to efficiently design an application-domain-specific rASIP.

VLSI Design | 2013

Ingredients of adaptability: a survey of reconfigurable processors

Anupam Chattopadhyay

For a design to survive unforeseen physical effects such as aging and temperature variation, or the emergence of new application standards, adaptability needs to be supported. Adaptability, in its full strength, is present in reconfigurable processors, which makes them an important IP in modern Systems-on-Chip (SoCs). Reconfigurable processors have risen to prominence as a dominant computing platform across embedded, general-purpose, and high-performance application domains during the last decade. Significant advances have been made in many areas, such as identifying the advantages of reconfigurable platforms, their modeling, their implementation flow, and finally early commercial acceptance. This paper reviews this progress from various perspectives, with particular emphasis on fundamental challenges and their solutions. Building on this analysis of past work, a roadmap for future research is proposed.

International Conference on Computer Aided Design | 2007

Increasing data-bandwidth to instruction-set extensions through register clustering

Kingshuk Karuri; Anupam Chattopadhyay; Manuel Hohenauer; Rainer Leupers; Gerd Ascheid; Heinrich Meyr

The conflicting requirements of performance and flexibility in today's embedded system market are forcing system designers to use more and more of the so-called configurable or customizable processor cores. Such processors tend to meet the demanding performance constraints by accommodating application-specific instruction set extensions (ISEs), which have, naturally, become a vital component of current processor customization flows. One major bottleneck in maximizing ISE performance is the limitation on the data bandwidth between the general-purpose register (GPR) file and the ISEs. For improved performance, it is desirable to have a large data bandwidth from the GPRs to ISEs. However, the tight area constraints of modern embedded processors often restrict the GPR I/O of ISEs to save port area of the register files. This paper presents a novel approach to increase the GPR I/O of ISEs without significantly increasing the size of the GPR files. This is achieved by applying the concept of register clustering, common in many VLIW architectures, to single-issue processors with high-performance ISEs. Such clustering often causes extra register moves in compiled code. This work also presents an algorithm to minimize such register moves. The benchmark results presented in this paper show that our solution can significantly reduce the area overhead of many-port GPR files without sacrificing the performance improvements through ISEs.

Rapid System Prototyping | 2005

Optimization techniques for ADL-driven RTL processor synthesis

Oliver Schliebusch; Anupam Chattopadhyay; Ernst Martin Witte; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr

Architecture description languages (ADLs) are becoming popular for speeding up the development of complex SoC designs by performing design space exploration at a higher level of abstraction. This increase in the abstraction level traditionally comes at the cost of low performance of the final application-specific instruction-set processor (ASIP) implementation, which is generated automatically from the ADL. There is a pressing need for novel optimization techniques for high-level synthesis from ADLs to compensate for this loss of performance. Two important aspects of these optimizations are the efficient usage of available structural information in the high-level architecture descriptions and prudent pruning of the overhead introduced by mapping from ADL to register transfer level (RTL). In this paper, we present two high-level optimization techniques, path sharing and decision minimization. These optimization techniques are shown to be of lower complexity, by at least two orders of magnitude, compared to similar optimization during gate-level synthesis. The optimizations are tested for a RISC architecture, a VLIW architecture, and two industrial embedded processors, Motorola M68HC11 and Infineon ICORE. The results indicate a significant improvement in overall performance.

Design, Automation, and Test in Europe | 2007

Design space exploration of partially re-configurable embedded processors

Anupam Chattopadhyay; W. Ahmed; K. Karuri; David Kammler; Rainer Leupers; Gerd Ascheid; Heinrich Meyr

In today's embedded processors, performance and flexibility have become the two key attributes. These attributes are often conflicting. The best performance is obtained from custom-designed integrated circuits. In contrast, the maximum flexibility is delivered by a general-purpose processor. Among the architecture types that have emerged over the past years to strike an optimum balance between these two attributes, two are prominent. The first are field programmable gate array (FPGA)-based architectures and the second are application-specific instruction-set processors (ASIPs). Depending on the type of application (i.e., stream-like or control-dominated), either one of the above-mentioned architecture types is able to deliver high performance or flexibility or both. Consequently, a new design approach with partial re-configurability on the application-specific processor is attracting strong research interest. We call this architecture re-configurable ASIP (rASIP). Currently, the lack of a high-level abstraction of the rASIP keeps the designer from trying out various design alternatives because of long and tedious exploration cycles. To address this issue, this paper proposes a high-level specification for re-configurable processors. Furthermore, a seamless design space exploration methodology using this specification is proposed.

Asia and South Pacific Design Automation Conference | 2005

A framework for automated and optimized ASIP implementation supporting multiple hardware description languages

Oliver Schliebusch; Anupam Chattopadhyay; David Kammler; Gerd Ascheid; Rainer Leupers; Heinrich Meyr; Tim Kogel

Architecture description languages (ADLs) are widely used to perform design space exploration for application-specific instruction set processors (ASIPs). While design space exploration is well supported by numerous tools providing high flexibility and quality, the methodology of automated implementation is limited to simple transformations. Assuming fixed architectural templates, information given in the ADL is directly mapped to a hardware description on register transfer level (RTL). Gate-level synthesis tools are not able to perform potential optimizations, as the computational complexity grows exponentially with the size of the architecture. Information such as exclusiveness, parallelism, or Boolean relations is spread over multiple modules and is therefore hard to determine. In this paper, we present an ASIP synthesis approach from architecture description languages, based on an intermediate representation (IR). The IR is the key technology to provide new language-independent high-level optimizations and to realize different hardware description language backends. The feasibility of our approach is demonstrated in a case study.

Scientific Reports | 2016

Multistate Memristive Tantalum Oxide Devices for Ternary Arithmetic

Wonjoo Kim; Anupam Chattopadhyay; Anne Siemon; Eike Linn; Rainer Waser; Vikas Rana

Redox-based resistive switching random access memory (ReRAM) offers excellent properties to implement future non-volatile memory arrays. Recently, the capability of two-state ReRAMs to implement Boolean logic functionality has gained wide interest. Here, we report on seven-state tantalum oxide devices, which enable the realization of intrinsic modular arithmetic using a ternary number system. Modular arithmetic, a fundamental system for operating on numbers within the limit of a modulus, has been known to mathematicians since the days of Euclid and finds applications in diverse areas ranging from e-commerce to musical notation. We demonstrate that multistate devices not only reduce the storage area consumption drastically, but also enable novel in-memory operations, such as computing using high-radix number systems, which could not be implemented using two-state devices. The use of a high-radix number system reduces the computational complexity by reducing the number of digits needed. Thus, the number of calculation operations in an addition and the number of logic devices can be reduced.
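The claim that a higher radix needs fewer digits, and therefore fewer digit-level steps per addition, can be illustrated in software. The sketch below is a plain Python illustration under that reading of the abstract; it is not the device-level scheme of the paper, and all function names are hypothetical.

```python
# Sketch: compare digit counts and digit-wise addition steps in radix 2 and radix 3.
# Illustrative only; this does not model the memristive devices themselves.


def to_digits(n, radix):
    """Return the little-endian digits of a non-negative integer n in the given radix."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, radix)
        digits.append(d)
    return digits


def from_digits(digits, radix):
    """Inverse of to_digits: rebuild the integer from little-endian digits."""
    return sum(d * radix**i for i, d in enumerate(digits))


def add_digits(a, b, radix):
    """Ripple-carry addition; returns (sum digits, number of digit-level steps)."""
    result, carry, steps = [], 0, 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        carry, digit = divmod(da + db + carry, radix)  # each digit sum is taken modulo the radix
        result.append(digit)
        steps += 1
    if carry:
        result.append(carry)
    return result, steps


if __name__ == "__main__":
    x, y = 1000, 999
    for radix in (2, 3):
        total, steps = add_digits(to_digits(x, radix), to_digits(y, radix), radix)
        print(f"radix {radix}: {len(to_digits(x, radix))} digits for {x}, "
              f"{steps} digit steps, sum = {from_digits(total, radix)}")
```

In this example the same operands need fewer digit positions, and hence fewer digit-level addition steps, in radix 3 than in radix 2.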

Design, Automation, and Test in Europe | 2015

Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment

Jeremy Constantin; Lai Wang; Georgios Karakonstantis; Anupam Chattopadhyay; Andreas Burg

Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited, and therefore dynamic timing margins exist that can theoretically be exploited for better speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements for individual instructions during the dynamically varying program execution flow without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow to extract the dynamic timing information for the design using post-layout dynamic timing analysis and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general-purpose processor core in a 28 nm CMOS technology. We show that employing instruction-dependent dynamic clock adjustment leads on average to an increase in operating speed by 38% or to a reduction in power consumption by 24%, compared to traditional synchronous clocking, which at all times has to respect the worst-case timing identified through static timing analysis.
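The idea of instruction-based clock adjustment can be sketched with a toy model: each cycle's clock period is set by the slowest instruction currently in the pipeline rather than by the global worst case. The delay values, the per-instruction (rather than per-stage) annotation, the 3-stage depth, and all names below are simplifying assumptions made for illustration; none of them are taken from the paper or its simulator.

```python
# Toy model of instruction-dependent clock adjustment.
# All delays are hypothetical placeholder values, not measured data.

DELAY_NS = {"add": 1.0, "load": 1.4, "mul": 2.0, "branch": 1.2}


def in_flight(trace, cycle, depth):
    """Instructions occupying the pipeline in a given cycle (simple in-order model)."""
    return trace[max(0, cycle - depth + 1): cycle + 1]


def total_cycles(trace, depth):
    """Cycles needed to issue every instruction and drain the pipeline."""
    return len(trace) + depth - 1


def static_time_ns(trace, depth):
    """Every cycle uses the global worst-case period (conventional clocking)."""
    return total_cycles(trace, depth) * max(DELAY_NS.values())


def dynamic_time_ns(trace, depth):
    """Each cycle uses the slowest delay among the instructions in flight."""
    return sum(
        max(DELAY_NS[op] for op in in_flight(trace, c, depth))
        for c in range(total_cycles(trace, depth))
    )


if __name__ == "__main__":
    trace = ["add", "add", "load", "add", "mul", "add", "branch", "add"]
    s = static_time_ns(trace, depth=3)
    d = dynamic_time_ns(trace, depth=3)
    print(f"static: {s:.1f} ns, dynamic: {d:.1f} ns, speedup: {s / d:.2f}x")
```

In this model the dynamic schedule is never slower than the static one, and the gain depends on how rarely the slowest instruction classes appear in the trace.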
