Is this you? Create Your Porfile

Hritam Dutta

University of Erlangen-Nuremberg

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Hritam Dutta is active.

Explore More

Publication

Featured researches published by Hritam Dutta.

applied reconfigurable computing | 2008

PARO: Synthesis of Hardware Accelerators for Multi-dimensional Dataflow-Intensive Applications

Frank Hannig; Holger Ruckdeschel; Hritam Dutta; Jürgen Teich

In this paper, we present the PARO design tool for the automated hardware synthesis of massively parallel embedded architectures for given dataflow dominant applications. Key features of PARO are: (1) The design entry in form of a compact and intuitive functional programming language which allows highly parallel implementations. (2) Advanced partitioning techniques are applied in order to balance the trade-offs in cost and performance along with requisite throughputs. This is obtained by distributing computations onto an array of tightly coupled processor elements. (3) We demonstrate the performance of the FPGA synthesized hardware with several selected algorithms from different benchmarks.

international conference on acoustics, speech, and signal processing | 2004

Regular mapping for coarse-grained reconfigurable architectures

Frank Hannig; Hritam Dutta; Jürgen Teich

Similar to programmable devices such as processors or micro controllers, reconfigurable logic devices can also be built as software, by programming the configuration of the device. In this paper, we present an overview of constraints which have to be considered when mapping applications to coarse-grained reconfigurable architectures. The application areas of most of these architectures address computational-intensive algorithms like video and audio processing or wireless communication. Therefore, reconfigurable arrays are in direct competition with DSP processors which are traditionally used for digital signal processing. Hence, existing mapping methodologies are closely related to approaches from the DSP world. They try to employ pipelining and temporal partitioning but they do not exploit the full parallelism of a given algorithm and the computational potential of typically 2D arrays. We present a first case study for mapping regular algorithms onto reconfigurable arrays by using our design methodology which is characterized by loop parallelization in the polytope model. The case study shows that our regular mapping methodology may lead to highly efficient implementations taking the constraints of the architecture into account.

parallel computing in electrical engineering | 2006

Hierarchical Partitioning for Piecewise Linear Algorithms

Hritam Dutta; Frank Hannig; Jürgen Teich

Processor arrays are used as accelerators for plenty of data flow-dominant applications. The explosive growth in research and development of massively parallel processor array architectures has lead to demand for mapping tools to realize the full potential of these architectures. Such architectures are characterized by hierarchies of parallelism and memory structures, i.e. processor array apart from different levels of cache arrays have a number of processing elements (PE) where each PE can further contain sub-word parallelism. In order to handle large scale problems, balance local memory requirements with I/O-bandwidth, and use different hierarchies of parallelism and memory, one needs a sophisticated transformation called hierarchical partitioning. In this paper, we introduce for the first time a detailed methodology encompassing hierarchical partitioning

international parallel and distributed processing symposium | 2004

Mapping of regular nested loop programs to coarse-grained reconfigurable arrays - constraints and methodology

Frank Hannig; Hritam Dutta; Jürgen Teich

Summary form only given. Apart from academic, recently more and more commercial coarse-grained reconfigurable arrays have been developed. Computational intensive applications from the area of video and wireless communication seek to exploit the computational power of such massively parallel SoCs. Conventionally, DSP processors are used in the digital signal processing domain. Thus, the existing compilation techniques are closely related to approaches from the DSP world. These approaches employ several loop transformations, like pipelining or temporal partitioning, but they are not able to exploit the full parallelism of a given algorithm and the computational potential of a typical 2-dimensional array. In this paper, (i) we present an overview of constraints which have to be considered when mapping applications to coarse-grained reconfigurable arrays, (ii) we present our design methodology for mapping regular algorithms onto massively parallel arrays which is characterized by loop parallelization in the polytope model, and (Hi), in a first case study, we adapt our design methodology for targeting reconfigurable arrays. The case study shows that the presented regular mapping methodology may lead to highly efficient implementations taking into account the constraints of the architecture.

application-specific systems, architectures, and processors | 2006

A Design Methodology for Hardware Acceleration of Adaptive Filter Algorithms in Image Processing

Hritam Dutta; Frank Hannig; Jürgen Teich; Benno Heigl; Heinz Hornegger

Massively parallel processor array architectures can be used as hardware accelerators for a plenty of dataflow dominant applications. Bilateral filtering is an example of a state-of-the-art algorithm in medical imaging, which falls in the class of 2D adaptive filter algorithms. In this paper, we propose a semi-automatic mapping methodology for the generation of hardware accelerators for such a generic class of adaptive filtering applications in image processing. The final architecture deliver similar synthesis results as a hand-tuned design.

International Journal of Embedded Systems | 2006

Mapping a class of dependence algorithms to coarse-grained reconfigurable arrays : architectural parameters and methodology

Frank Hannig; Hritam Dutta; Jürgen Teich

Existing compilation techniques for coarse-grained reconfigurable arrays are closely related to approaches from the DSP world. These approaches employ several loop transformations, like pipelining or temporal partitioning, but they are not able to exploit the full parallelism of a given algorithm and the computational potential of a typical 2-dimensional array. In this paper: we present an overview of constraints which have to be considered when mapping applications to coarse-grained reconfigurable arrays; we present our design methodology for mapping regular algorithms onto massively parallel arrays which is characterised by loop parallelisation in the polytope model; and, in a first case study, we adapt our design methodology for targeting reconfigurable arrays. The case study shows that the presented regular mapping methodology may lead to highly efficient implementations taking into account the constraints of the architecture.

automation, robotics and control systems | 2006

Controller synthesis for mapping partitioned programs on array architectures

Hritam Dutta; Frank Hannig; Jürgen Teich

Processor arrays can be used as accelerators for a plenty of dataflow-dominant applications. Innately these applications have almost no control flow, but the application of sophisticated partitioning and scheduling techniques in order to handle large scale problems and to balance local memory requirements with I/O-bandwidth has the disadvantage of a more complex control flow. Thus, efficient control path synthesis is one of the greatest challenges when compiling algorithms onto processor arrays. This paper presents an efficient methodology for the automated control path synthesis for the mapping of partitioned algorithms onto processor arrays. The major advantages observed in the presented methodology are seen in, (a) control generation for different partitioning techniques and arbitrary parallelepiped tiles, (b) combined use of a global and a local control strategy in order to reduce the control overhead, (c) up to 90 percent reduction in control path area and resources compared to existing approaches.

design, automation, and test in europe | 2009

Model-based synthesis and optimization of static multi-rate image processing algorithms

Joachim Keinert; Hritam Dutta; Frank Hannig; Christian Haubelt; Jürgen Teich

High computational effort in modern image processing applications like medical imaging or high-resolution video processing often demands for massively parallel special purpose architectures in form of FPGAs or ASICs. However, their efficient implementation is still a challenge, as the design complexity causes exploding development times and costs. This paper presents a new design flow which permits to specify, analyze, and synthesize complex image processing algorithms. A novel buffer requirement analysis allows exploiting possible tradeoffs between required communication memory and computational logic for multi-rate applications. The derived schedule and buffer results are taken into account for resource optimized synthesis of the required hardware accelerators. Application to a multi-resolution filter shows that buffer analysis is possible in less than one second and that scheduling alternatives influence the required communication memory by up to 24% and the computational resources by up to 16%.

automation, robotics and control systems | 2009

Parallelization Approaches for Hardware Accelerators --- Loop Unrolling Versus Loop Partitioning

Frank Hannig; Hritam Dutta; Jürgen Teich

State-of-the-art behavioral synthesis tools barely have high-level transformations in order to achieve highly parallelized implementations. If any, they apply loop unrolling to obtain a higher throughput. In this paper, we employ the PARO behavioral synthesis tool which has the unique ability to perform both loop unrolling or loop partitioning. Loop unrolling replicates the loop kernel and exposes the parallelism for hardware implementation, whereas partitioning tiles the loop program onto a regular array consisting of tightly coupled processing elements. The usage of the same design tool for both the variants enables for the first time, a quantitative evaluation of the two approaches for reconfigurable architectures with help of computationally intensive algorithms selected from different benchmarks. Superlinear speedups in terms of throughput are accomplished for the processor array approach. In addition, area and power cost are reduced.

Microprocessors and Microsystems | 2009

A holistic approach for tightly coupled reconfigurable parallel processors

Hritam Dutta; Dmitrij Kissler; Frank Hannig; Alexey Kupriyanov; Jürgen Teich; Bernard Pottier

New standards in signal, multimedia, and network processing for embedded electronics are characterized by computationally intensive algorithms, high flexibility due to the swift change in specifications. In order to meet demanding challenges of increasing computational requirements and stringent constraints on area and power consumption in fields of embedded engineering, there is a gradual trend towards coarse-grained parallel embedded processors. Furthermore, such processors are enabled with dynamic reconfiguration features for supporting time- and space-multiplexed execution of the algorithms. However, the formidable problem in efficient mapping of applications (mostly loop algorithms) onto such architectures has been a hindrance in their mass acceptance. In this paper we present (a) a highly parameterizable, tightly coupled, and reconfigurable parallel processor architecture together with the corresponding power breakdown and reconfiguration time analysis of a case study application, (b) a retargetable methodology for mapping of loop algorithms, (c) a co-design framework for modeling, simulation, and programming of such architectures, and (d) loosely coupled communication with host processor.

Explore More