Mladen Berekovic
Braunschweig University of Technology
Publication
Featured research published by Mladen Berekovic.
applied reconfigurable computing | 2007
Frank Bouwens; Mladen Berekovic; Andreas Kanstein; Georgi Gaydadjiev
Reconfigurable computational architectures are envisioned to deliver power efficient, high performance, flexible platforms for embedded systems design. The coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) and its compiler offer a tool flow to design sparsely interconnected 2D array processors with an arbitrary number of functional units, register files and interconnection topologies. This article presents an architectural exploration methodology and its results for the first implementation of the ADRES architecture on a 90nm standard-cell technology. We analyze performance, energy and power trade-offs for two typical kernels from the multimedia and wireless domains: IDCT and FFT. Architecture instances of different sizes and interconnect structures are evaluated with respect to their power versus performance trade-offs. An optimized architecture is derived. A detailed power breakdown for the individual components of the selected architecture is presented.
signal processing systems | 1999
Mladen Berekovic; Hans-Joachim Stolberg; Mark Bernd Kulaczewski; Peter Pirsch; Henning Möller; Holger Runge; Johannes Kneip; Benno Stabernack
This paper describes instruction set extensions for the acceleration of MPEG-4 algorithms on programmable (RISC-) CPUs. MPEG-4 standardizes audio and video compression schemes for a variety of bit rates and scenarios. As MPEG-4 targets a much broader range of different applications than previously defined hybrid video coding standards like H.263 or MPEG-2, it employs a much higher number of different algorithms and coding modes. Therefore, MPEG-4 implementations will require a more software-oriented approach to be efficient. However, the total computational load for an optimized implementation of an MPEG-4 video codec is expected to exceed the performance levels of today's multimedia signal processors, making further hardware acceleration a necessity. For that purpose, we propose a number of instruction set extensions that add function-specific blocks to the data path of a CPU. These dedicated modules are highly adapted to the most computation-intensive processing schemes of MPEG-4, such as DCT, motion compensation, padding, shape coding, or bitstream parsing. The increased functionality of basic instructions results in a significant speed-up over standard RISC instruction sets, thus making MPEG-4 implementations feasible on programmable processor platforms. Possible target architectures include VLIW multimedia processors, MIMD-style multiprocessors, or coprocessor architectures.
high performance embedded architectures and compilers | 2008
Frank Bouwens; Mladen Berekovic; Bjorn De Sutter; Georgi Gaydadjiev
Reconfigurable architectures provide power efficiency, flexibility and high performance for next generation embedded multimedia devices. ADRES, the IMEC Coarse-Grained Reconfigurable Array architecture and its compiler DRESC enable the design of reconfigurable 2D array processors with arbitrary functional units, register file organizations and interconnection topologies. This creates an enormous design space making it difficult to find optimized architectures. Therefore, architectural explorations aiming at energy and performance trade-offs become a major effort. In this paper we investigate the influence of register file partitions, register file sizes and the interconnection topology of ADRES. We analyze power, performance and energy-delay trade-offs using IDCT and FFT as benchmarks while targeting 90nm technology. We also explore quantitatively the influences of several hierarchical optimizations for power by applying specific hardware techniques, i.e. clock gating and operand isolation. As a result, we propose an enhanced architecture instantiation that improves performance by 60-70% and reduces energy by 50%.
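The energy-delay trade-off mentioned in this abstract can be illustrated with a small worked calculation. The sketch below is only an illustration, not the authors' evaluation method: it assumes that "improves performance by 60-70%" means a speed-up factor of 1.6 to 1.7 over the baseline and that "reduces energy by 50%" means an energy ratio of 0.5. The function name `edp_ratio` is hypothetical.

```python
def edp_ratio(speedup: float, energy_ratio: float) -> float:
    """Relative energy-delay product (EDP) of a new design versus a baseline.

    speedup      -- baseline_time / new_time (ASSUMED reading of "X% faster")
    energy_ratio -- new_energy / baseline_energy
    """
    if speedup <= 0 or energy_ratio < 0:
        raise ValueError("speedup must be positive, energy_ratio non-negative")
    delay_ratio = 1.0 / speedup        # new_time / baseline_time
    return energy_ratio * delay_ratio  # EDP = energy * delay


if __name__ == "__main__":
    # Assumed interpretation of the abstract's figures: 1.6x-1.7x speed-up, half the energy.
    for s in (1.6, 1.7):
        print(f"speed-up {s:.1f}x, energy x0.5 -> relative EDP {edp_ratio(s, 0.5):.3f}")
```

Under these assumptions the relative energy-delay product falls to roughly 0.29-0.31 of the baseline; a different reading of the percentages would give different numbers.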
IEEE Transactions on Circuits and Systems for Video Technology | 2002
Mladen Berekovic; Hans-Joachim Stolberg; Peter Pirsch
The newly defined MPEG-4 Advanced Simple (AS) profile delivers single-layered streaming video in digital television (DTV) quality in the promising 1-2 Mbit/s range. However, the coding tools involved add significantly to the complexity of the decoding process, raising the need for further hardware acceleration. A programmable multicore system-on-chip (SOC) architecture is presented which targets MPEG-4 AS profile decoding of ITU-R 601 resolution streaming video. Based on a detailed analysis of corresponding bitstream statistics, the implementation of an optimized software video decoder for the proposed architecture is described. Results show that overall performance is sufficient for real-time AS profile decoding of ITU-R 601 resolution video.
signal processing systems | 1997
Mladen Berekovic; H. Kloos; Peter Pirsch
This paper describes a new architecture for JAVA-based, interactive multimedia applications. A hardware implementation of a Java Virtual Machine (JVM) is proposed, which allows the direct execution of Java bytecode. In a single clock cycle, up to 3 bytecode instructions can be decoded and executed in parallel using a RISC pipeline. A splittable 64-bit ALU implementation addresses demanding processing requirements of typical multimedia signal processing schemes. The on-chip caches are adapted to the specific data structures of the JVM. The proposed architecture supports execution of multiple Java threads in parallel. An implementation of basic building blocks of the processor with a standard-cell library provides an estimate of 150 MHz clock-speed for a 0.35 μm 3-metal-layer CMOS process. With a size of less than 10 mm² needed for the core logic, it is possible to integrate multiple JVMs together with larger cache memories on a single chip. Based on these results, we discuss various performance aspects of JAVA for use in future multimedia terminals.
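The idea of a splittable ALU can be sketched in software as a packed addition in which a 64-bit word is treated as several independent lanes and carries do not cross lane boundaries. This is a generic illustration of the technique, not the paper's design: the abstract does not state which lane widths the ALU supports, so the 16-bit lanes used below and the function name `split_add` are assumptions.

```python
def split_add(a: int, b: int, lane_bits: int, word_bits: int = 64) -> int:
    """Add two words lane by lane, discarding the carry out of each lane.

    lane_bits is an ASSUMED parameter; with lane_bits == word_bits the
    function behaves like an ordinary wrap-around addition of the full word.
    """
    if lane_bits <= 0 or word_bits % lane_bits != 0:
        raise ValueError("lane_bits must evenly divide word_bits")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, word_bits, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # The mask drops the carry so it cannot ripple into the next lane.
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result


if __name__ == "__main__":
    x = 0x0001_FFFF_0002_0003
    y = 0x0001_0001_0001_0001
    print(hex(split_add(x, y, 16)))  # four independent 16-bit additions
    print(hex(split_add(x, y, 64)))  # one 64-bit addition, carry ripples through
```

Comparing the two calls shows the difference: in the 16-bit case the lane holding 0xFFFF wraps to zero without disturbing its neighbour, while in the 64-bit case the carry propagates into the upper lane.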
signal processing systems | 1995
Johannes Kneip; Jens Wittenburg; Mladen Berekovic; K. Ronner; Peter Pirsch
Recent sub-μ semiconductor technology supports the monolithic integration of multiprocessor systems. High wiring density and short on-chip memory access cycles motivate novel architecture concepts, outperforming conventional parallel systems. An efficient controlling strategy is a key to gain high performance from limited silicon resources. In this paper, a controlling concept for a monolithic Autonomous Single-Instruction/Multiple Data (ASIMD) processor is presented, which combines the high parallelism of an SIMD approach with the flexibility of standard DSP architectures. To demonstrate the performance gains of the concept, a digital video signal processor, the HiPAR-DSP, has been implemented. It consists of an array of 4 or 16 datapaths, local memories for each datapath, a shared memory with concurrent data access in the shape of a matrix and a central RISC controller. A three-stage execution autonomy has been implemented, consisting of conditional instructions, conditional skip of instructions by the data paths and global evaluation of local conditions by the central controller. This allows efficient execution of data dependent medium- and high-level algorithms with very low controlling overhead. A performance of up to two arithmetic gigaoperations per second is achieved for algorithms with irregular data flow or control flow for the 100 MHz clocked processor with 16 data paths.
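Two of the ingredients named in this abstract, per-datapath conditional execution and global evaluation of local conditions by a central controller, can be mimicked loosely in software. The sketch below is only an analogy under stated assumptions and does not model the HiPAR-DSP's actual control mechanism: the function name `run_until_all_done`, the four-lane example and the halving operation are all hypothetical.

```python
from typing import Callable, List, Tuple


def run_until_all_done(
    lanes: List[int],
    active: Callable[[int], bool],
    op: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Broadcast one operation to all lanes until no lane's condition holds.

    Each lane applies `op` only while its own local condition is true
    (conditional execution); the loop test plays the role of a controller
    that combines all local conditions into one global decision.
    """
    steps = 0
    while any(active(x) for x in lanes):                    # global evaluation
        lanes = [op(x) if active(x) else x for x in lanes]  # per-lane predication
        steps += 1
    return lanes, steps


if __name__ == "__main__":
    # Hypothetical data-dependent kernel: halve each value until it is below 10.
    final, steps = run_until_all_done([40, 3, 17, 8], lambda x: x >= 10, lambda x: x // 2)
    print(final, steps)
```

The point of the analogy is that every lane receives the same instruction stream, yet lanes that finish early simply idle, so data-dependent loops need no per-lane program counter.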
signal processing systems | 1999
Sven Bauer; Johannes Kneip; T. Mlasko; Bernd Schmale; J. Vollmer; A. Hutter; Mladen Berekovic
The upcoming MPEG-4 standard provides new possibilities for the compression and presentation of multimedia contents. The main characteristics of MPEG-4 are the object-based coding and representation of an audio-visual scene and the ability to code objects of natural or synthetic origin. These features will enhance existing applications with new functionalities and enable standardised solutions for new applications. This paper provides an overview of the three major parts Systems, Visual and Audio of the new MPEG-4 standard, highlights implementation aspects for some envisaged types of MPEG-4 terminals and describes possible future multimedia application scenarios using MPEG-4 functionalities.
design, automation, and test in europe | 2003
Hans-Joachim Stolberg; Mladen Berekovic; L. Friebe; Sören Moch; S. Flugel; Xun Mao; Mark Bernd Kulaczewski; H. Klussmann; Peter Pirsch
The HiBRID-SoC multi-core system-on-chip targets a wide range of application fields with particularly high processing demands, including general signal processing applications, video and audio de-/encoding, and a combination of these tasks. For this purpose, the HiBRID-SoC integrates three fully programmable processor cores and various interfaces onto a single chip, all tied to a 64 bit AMBA AHB bus. The processor cores are individually optimized to the particular computational characteristics of different application fields, complementing each other to deliver high performance levels with high flexibility at reduced system cost. The HiBRID-SoC is fabricated in a 0.18 μm 6LM standard-cell CMOS technology, occupies about 82 mm², and operates at 145 MHz.
international symposium on circuits and systems | 2000
Hans-Joachim Stolberg; Mladen Berekovic; Peter Pirsch; Holger Runge; Henning Möller; Johannes Kneip
M-PIRE is a programmable MPEG-4 multimedia codec VLSI for mobile and stationary applications. It integrates a RISC core, two separate DSPs, a 64-bit dual-issue VLIW macroblock engine, and an autonomous I/O processor on a single chip to cope with the high flexibility and processing demands of the MPEG-4 standard. The first M-PIRE implementation will consume 90 mm² in 0.25 μ CMOS technology. It will support real-time video and audio processing of MPEG-4 simple profile or ITU H.26x standards; future designs of M-PIRE will add support for higher MPEG-4 profiles. This paper focuses on the architecture, instruction set, and performance of M-PIRE's macroblock engine, which carries most of the workload in MPEG-4 video processing.
design, automation, and test in europe | 2007
C. Arbelo; Andreas Kanstein; Sebastián López; José Francisco López; Mladen Berekovic; Roberto Sarmiento; Jean-Yves Mignolet
Deblocking filtering represents one of the most compute-intensive tasks in an H.264/AVC standard video decoder due to its demanding memory accesses and irregular data flow. For these reasons, an efficient implementation poses big challenges, especially for programmable platforms. In this sense, the mapping of this decoder's functionality onto a C-programmable coarse-grained reconfigurable architecture named ADRES (architecture for dynamically reconfigurable embedded systems) is presented in this paper, including results from the evaluation of different topologies. The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the filtering as well as an increase in the degree of instruction-level parallelism (ILP) when compared with an implementation on a very long instruction word (VLIW) dedicated processor. This demonstrates that high ILP is achievable on the ADRES even for irregular, data-dependent kernels.