J.T.J. van Eijndhoven

Publication

Featured research published by J.T.J. van Eijndhoven.

IEEE Design & Test of Computers | 2002

A heterogeneous multiprocessor architecture for flexible media processing

Martijn J. Rutten; J.T.J. van Eijndhoven; E.G.T. Jaspers; P. van der Wolf; Om Prakash Gangwal; A. Timmer; Evert-Jan D. Pol

Eclipse is a scalable architecture template for designing data-dependent stream-processing subsystems of media-processing SoCs. It combines application configuration flexibility with the efficiency of function-specific coprocessors that concurrently execute the tasks of one or more applications.

High-Performance Computer Architecture | 2003

Inter-cluster communication models for clustered VLIW processors

Andrei Terechko; E. Le Thenaff; M. Garg; J.T.J. van Eijndhoven; H. Corporaal

Clustering is a well-known technique to improve the implementation of single-register-file VLIW processors. Many previous studies of clustering assume inter-cluster communication in the form of copy operations. This paper, however, identifies and evaluates five different inter-cluster communication models: copy operations, dedicated issue slots, extended operands, extended results, and broadcasting. Our study reveals that these models have a major impact on the performance and implementation of the clustered VLIW. We found that copy operations executed in regular VLIW issue slots significantly constrain the scheduling freedom of regular operations. For example, in the dense code for our four-cluster machine, the total cycle-count overhead reached 46.8% with respect to the unicluster architecture, 56% of which was caused by the copy-operation constraint. We therefore propose other models (e.g., extended results or broadcasting), which deliver higher performance than the copy-operation model at the same hardware cost.
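Reading the two percentages together shows that the copy-operation constraint alone accounts for roughly a quarter of the unicluster cycle count in that experiment. The arithmetic below assumes, as the sentence states, that the 56% figure is a share of the 46.8% overhead rather than of the total cycle count:

```latex
\underbrace{0.56}_{\text{copy-constraint share}} \times \underbrace{46.8\%}_{\text{total overhead}} \approx 26.2\%\ \text{of the unicluster cycle count}
```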

International Parallel and Distributed Processing Symposium | 2002

Eclipse: heterogeneous multiprocessor architecture for flexible media processing

Martijn J. Rutten; J.T.J. van Eijndhoven; E.D. Pol; Egbert G.T. Jaspers; P. van der Wolf; Om Prakash Gangwal; A. Timmer

Eclipse is a heterogeneous multiprocessor architecture for high-performance media processing, including high-definition MPEG encoding/decoding. The scalable architecture framework concurrently executes media processing kernels in function-specific multi-tasking coprocessors and a media processor, communicating via on-chip memory. Eclipse instances combine application configuration flexibility with the efficiency of function-specific hardware.

Field-Programmable Custom Computing Machines | 2001

An 8x8 IDCT Implementation on an FPGA-Augmented TriMedia

Mihai Sima; Sorin Cotofana; J.T.J. van Eijndhoven; Stamatis Vassiliadis; Kornelis Antonius Vissers

This paper presents an experiment that aims to assess the potential performance impact of augmenting a TriMedia/CPU64 processor with a reconfigurable core. We first propose the skeleton of an extension of the TriMedia/CPU64 architecture, which consists of a Reconfigurable Functional Unit (RFU) and the associated instructions. We then address the computation of the 8×8 IDCT on such an extended TriMedia and propose a scheme to implement the 1-D IDCT operation on the RFU. When implemented on an Altera ACEX EP1K100 FPGA, the proposed 1-D IDCT exhibits a latency of 16 and a recovery of 2 TriMedia (200 MHz) cycles, and occupies 42% of the device. By configuring the 1-D IDCT computing facility on the RFU at application load time, a 2-D IDCT including all overheads can be computed with a throughput of 1/32 IDCT/cycle. This is an improvement of more than 40% over the standard TriMedia/CPU64.
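The 2-D IDCT is separable, so an 8×8 block can be computed as eight 1-D IDCTs over the rows followed by eight over the columns. The Python sketch below illustrates that row-column decomposition in floating point and checks it against the direct double-sum definition. It is not the paper's RFU datapath; the orthonormal scaling and the names `idct_1d`, `idct_2d` and `CYCLES_PER_BLOCK` are assumptions for illustration. The cycle constant shows one hypothetical reading of the abstract's figures: 16 one-dimensional transforms at a recovery of 2 cycles each gives 32 cycles per block, which is consistent with the quoted 1/32 IDCT/cycle.

```python
import math

N = 8  # block size

def coef(k):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def idct_1d(X):
    """Orthonormal 1-D IDCT (DCT-III) of a length-N coefficient list."""
    return [
        sum(coef(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]

def idct_2d(block):
    """N x N IDCT by row-column decomposition: N row passes, then N column passes."""
    rows = [idct_1d(r) for r in block]                        # 1-D IDCT of each row
    cols = [idct_1d([rows[i][j] for i in range(N)])           # 1-D IDCT of each column
            for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]  # back to row-major

def idct_2d_direct(block):
    """Reference: the 2-D IDCT evaluated directly from its double-sum definition."""
    return [
        [sum(coef(u) * coef(v) * block[u][v]
             * math.cos(math.pi * (2 * m + 1) * u / (2 * N))
             * math.cos(math.pi * (2 * n + 1) * v / (2 * N))
             for u in range(N) for v in range(N))
         for n in range(N)]
        for m in range(N)
    ]

# Hypothetical cycle budget: assumes one 1-D IDCT can be issued every
# RECOVERY_CYCLES cycles and nothing else limits throughput.
RECOVERY_CYCLES = 2
ONE_D_IDCTS_PER_BLOCK = 2 * N                                # 8 row + 8 column transforms
CYCLES_PER_BLOCK = ONE_D_IDCTS_PER_BLOCK * RECOVERY_CYCLES   # 32
```

Computing the block as two passes of 1-D transforms is what lets a single reusable 1-D unit serve the whole 2-D transform; a DC-only block with coefficient 8 decodes to a flat block of ones under this scaling.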

International Conference on VLSI Design | 2003

Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks

R. Krishnan; Om Prakash Gangwal; J.T.J. van Eijndhoven; A. Kumar

We present an innovative design of an accurate 2D DCT/IDCT processor that handles scaled and sub-sampled input blocks efficiently. In IDCT mode, the latency of the processor scales with the size of the input block, from 7 cycles for a 1×1 block to 38 cycles for an 8×8 block. This scalability is possible because the processor has input-data-dependent control, by which it exploits the reduced computational needs of sub-sampled and smaller blocks to finish in fewer cycles. This is a very useful feature for MPEG and HDTV decoders and has hitherto not been exploited. Clocked at 150 MHz, the processor satisfies the high sample-rate requirement of decoding two MPEG HD streams with a picture size of 1920×1080 at 30 frames per second. Fixed-word-length and accuracy simulations of our design show that it conforms to the accuracy specifications of the CCITT standard within a 16-bit data path. A methodology based on architecture-level synthesis is used to design the VLIW processor core, which efficiently exploits the instruction-level parallelism present in the DCT/IDCT application. The processor core occupies an area of 0.834 mm² and runs at 150 MHz in 0.18 µm CMOS technology.
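The real-time claim can be sanity-checked with simple arithmetic. This sketch makes two assumptions that the abstract does not state: 4:2:0 chroma subsampling (1.5 samples per pixel), and that every block costs the worst-case 38 cycles of a full 8×8 IDCT with no overlap between consecutive blocks. Under those conservative assumptions the core still has headroom.

```python
CLOCK_HZ = 150e6              # from the abstract
CYCLES_PER_FULL_BLOCK = 38    # worst case: full 8x8 block, from the abstract
SAMPLES_PER_BLOCK = 64

WIDTH, HEIGHT, FPS, STREAMS = 1920, 1080, 30, 2
CHROMA_FACTOR = 1.5           # assumption: 4:2:0 (luma + two quarter-size chroma planes)

# Samples per second that must pass through the IDCT for two HD streams.
required = WIDTH * HEIGHT * FPS * STREAMS * CHROMA_FACTOR

# Samples per second the core can produce if blocks are processed back to back.
capacity = CLOCK_HZ / CYCLES_PER_FULL_BLOCK * SAMPLES_PER_BLOCK

print(f"required: {required / 1e6:.1f} Msamples/s")   # required: 186.6 Msamples/s
print(f"capacity: {capacity / 1e6:.1f} Msamples/s")   # capacity: 252.6 Msamples/s
```

Because sub-sampled and smaller blocks finish in fewer than 38 cycles, the real margin would be larger than this worst-case estimate.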

Field-Programmable Custom Computing Machines | 2002

MPEG-compliant entropy decoding on FPGA-augmented TriMedia/CPU64

M. Sima; Sorin Cotofana; Stamatis Vassiliadis; J.T.J. van Eijndhoven; K. Vissers

The paper presents a Design Space Exploration (DSE) experiment carried out to determine the optimum FPGA-based Variable-Length Decoder (VLD) computing resource and its associated instructions for an entropy decoding task to be executed on the FPGA-augmented TriMedia/CPU64 processor. We first outline the extension of the TriMedia/CPU64 architecture, which consists of an FPGA-based Reconfigurable Functional Unit (RFU) and the associated generic instructions. We then address entropy decoding and propose a strategy to partially break the data dependency associated with variable-length decoding. Three VLD instructions (VLD-1, VLD-2, VLD-3), which return 1, 2, or 3 symbols respectively, are then analyzed. The DSE shows that the VLD-2 instruction leads to the most efficient entropy decoding in terms of instruction cycles and FPGA area. The FPGA-based implementation of the computing resource associated with the VLD-2 instruction is then presented. When mapped on an Altera ACEX EP1K100 FPGA, VLD-2 exhibits a latency of 8 TriMedia cycles and uses all the Embedded Array Blocks and 51% of the logic cells of the device. Simulation results indicate that the VLD-2-based entropy decoder is 43% faster than its pure software counterpart.
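To illustrate what an instruction that returns up to several symbols per call does, here is a table-driven variable-length decoder in Python. The prefix code, the bit-string input, and the names `vld_n` and `decode_stream` are hypothetical and chosen for readability; they are not the MPEG code tables or the RFU's interface. Each call decodes up to `n` consecutive codewords and reports how many bits it consumed, so the serial dependency (where the next codeword starts) is resolved inside one call instead of across `n` calls.

```python
# Hypothetical prefix code (not an MPEG table): codeword -> symbol
CODE = {"1": "A", "01": "B", "001": "C", "0001": "D"}
MAX_LEN = max(len(c) for c in CODE)

def decode_one(bits, pos):
    """Decode one codeword starting at bit offset pos; return (symbol, length)."""
    for length in range(1, MAX_LEN + 1):
        word = bits[pos:pos + length]
        if word in CODE:
            return CODE[word], length
    raise ValueError(f"invalid codeword at bit {pos}")

def vld_n(bits, pos, n):
    """Model of a VLD-n operation: decode up to n symbols starting at pos.

    Returns (symbols, bits_consumed); the caller advances its bit pointer once.
    """
    symbols, consumed = [], 0
    while len(symbols) < n and pos + consumed < len(bits):
        sym, length = decode_one(bits, pos + consumed)
        symbols.append(sym)
        consumed += length
    return symbols, consumed

def decode_stream(bits, n):
    """Decode a whole bitstream with repeated VLD-n calls; count the calls."""
    pos, out, calls = 0, [], 0
    while pos < len(bits):
        syms, used = vld_n(bits, pos, n)
        out.extend(syms)
        pos += used
        calls += 1
    return out, calls

print(decode_stream("10100100011", 2))  # (['A', 'B', 'C', 'D', 'A'], 3)
```

For the same five-symbol stream, `n = 1, 2, 3` needs 5, 3 and 2 calls. In hardware, each extra symbol per call costs a wider lookup and more FPGA area, which is the trade-off the DSE weighed before settling on VLD-2.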

International Conference on Computer Design | 2001

MPEG macroblock parsing and pel reconstruction on an FPGA-augmented TriMedia processor

Mihai Sima; Sorin Cotofana; S. Vassiliadis; J.T.J. van Eijndhoven; Kornelis Antonius Vissers

This paper describes an experiment that aims to reveal the potential performance impact of augmenting a TriMedia/CPU64 processor with a multiple-context FPGA core. We first propose an extension of the TriMedia/CPU64 architecture, which consists of a reconfigurable functional unit and its associated instructions. We then address the decoding of variable-length codes on such an extended TriMedia and describe the architecture and FPGA implementation of a variable-length decoder (VLD) computing facility. When mapped on an ACEX EP1K100 FPGA, the proposed VLD exhibits a latency of 7 cycles. Preliminary results indicate that by configuring the VLD and the 1-D IDCT (described elsewhere) facilities on different FPGA contexts, and activating the contexts as needed, the augmented TriMedia can perform macroblock parsing followed by pel reconstruction with an improvement of 20-25% over the standard TriMedia.

Design, Automation, and Test in Europe | 2005

Compositional Memory Systems for Multimedia Communicating Tasks

Anca Mariana Molnos; M.J.M. Heijligers; Sorin Cotofana; J.T.J. van Eijndhoven

Conventional cache models are not suited to real-time parallel processing because tasks may flush each other's data out of the cache in an unpredictable manner. Such a system is not compositional, so its overall performance is difficult to predict and integrating new tasks is expensive. This paper proposes a new method that imposes compositionality on the system's performance and enables different memory hierarchy optimizations for multimedia communicating tasks running on embedded multiprocessor architectures. The method is based on a cache allocation strategy that assigns sets of the unified cache exclusively to tasks and to communication buffers. We also formulate the problem analytically and describe a method to compute the cache partitioning ratio that optimizes throughput and power consumption. When applied to a multiprocessor with a memory hierarchy, our technique also delivers a performance gain. Compared to the shared-cache case, an application consisting of two JPEG decoders and one edge-detection algorithm experiences 5 times fewer misses, and an MPEG-2 decoder 6.5 times fewer.
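The Python sketch below illustrates why exclusive set allocation makes per-task cache behavior compositional. It models a direct-mapped cache with one line per set (an assumption; the paper's associativity is not given here) in which each task can optionally be confined to its own range of sets. The function name `simulate`, the task names, the address traces and the partition sizes are all hypothetical. Tasks are interleaved round-robin and are assumed to have private address spaces.

```python
def simulate(traces, num_sets, partitions=None):
    """Count misses per task in a direct-mapped cache (one line per set).

    traces: dict task -> list of block addresses; tasks run interleaved round-robin.
    partitions: optional dict task -> (first_set, n_sets) giving each task an
    exclusive range of sets; if None, all tasks share all sets.
    """
    tags = [None] * num_sets                  # current contents of each set
    misses = {t: 0 for t in traces}
    longest = max(len(tr) for tr in traces.values())
    for i in range(longest):
        for task, trace in traces.items():
            if i >= len(trace):
                continue
            addr = trace[i]
            if partitions is None:
                base, n = 0, num_sets         # shared: index into the whole cache
            else:
                base, n = partitions[task]    # partitioned: index only into own sets
            s = base + addr % n
            tag = (task, addr // n)           # task id models private address spaces
            if tags[s] != tag:
                misses[task] += 1
                tags[s] = tag                 # fill the line, evicting whatever was there
    return misses

loop_task = [0, 1, 2, 3] * 4                  # hypothetical task A: small reused working set
stream_task = list(range(16))                 # hypothetical task B: streams through memory
parts = {"A": (0, 4), "B": (4, 4)}

print(simulate({"A": loop_task}, 8)["A"])                          # 4
print(simulate({"A": loop_task, "B": stream_task}, 8)["A"])        # 12
print(simulate({"A": loop_task, "B": stream_task}, 8, parts)["A"]) # 4
```

In the shared cache, task A's miss count depends on what else runs (4 alone, 12 next to B); with exclusive sets it stays at 4 either way, so each task's behavior can be analyzed in isolation. Choosing how many sets each task gets is the partitioning-ratio problem the abstract refers to; this sketch does not attempt it.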

Design, Automation, and Test in Europe | 2001

PRMDL: a machine description language for clustered VLIW architectures

Andrei Terechko; Evert-Jan D. Pol; J.T.J. van Eijndhoven

Summary form only given. PRMDL is the format of the central machine description file, which holds the parameters of the whole retargetable compiler-simulator framework. The format provides separate software and hardware views of the processor and defines the broad scope of the framework's retargetability, enabling platform-based processor design and extensive design space exploration for clustered VLIW architectures.

Embedded Systems for Real-Time Multimedia | 2004

Application design trajectory towards reusable coprocessors - MPEG case study

Martijn J. Rutten; Om Prakash Gangwal; J.T.J. van Eijndhoven; E.G.T. Jaspers; E.J. Pol

This work presents a structured application design trajectory that transforms media-processing applications, modeled as Kahn process networks, into a set of function-specific hardware units called coprocessors. The proposed design trajectory focuses on identifying hardware-implementable computation kernels that are common to a predetermined set of applications. The design trajectory is exercised in a case study that maps MPEG video decoding and encoding applications onto a set of coprocessors in a heterogeneous multiprocessor architecture. The resulting set of coprocessors can simultaneously perform both encoding and decoding for multiple MPEG-2 streams in an estimated 4 mm² (excluding memory) in 0.18 µm technology.
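A Kahn process network is a set of sequential processes that communicate only through unbounded FIFO channels, with blocking reads and non-blocking writes; under these rules the network's output does not depend on how the processes are scheduled. The Python sketch below models a hypothetical three-stage pipeline with threads and `queue.Queue` channels. The stage names `source`, `scale` and `sink` are illustrative only and are not the MPEG kernel decomposition from the paper.

```python
import queue
import threading

END = object()  # sentinel token marking the end of a stream

def source(out_ch, n):
    """Produce the tokens 0..n-1, then the end marker."""
    for i in range(n):
        out_ch.put(i)           # never blocks: the channel is unbounded
    out_ch.put(END)

def scale(in_ch, out_ch, factor):
    """Multiply each token by factor; forwards the end marker and stops."""
    while True:
        tok = in_ch.get()       # blocking read, as KPN semantics require
        if tok is END:
            out_ch.put(END)
            return
        out_ch.put(tok * factor)

def sink(in_ch, result):
    """Collect tokens until the end marker arrives."""
    while True:
        tok = in_ch.get()
        if tok is END:
            return
        result.append(tok)

def run_network(n, factor):
    a, b = queue.Queue(), queue.Queue()   # unbounded FIFO channels
    result = []
    procs = [
        threading.Thread(target=source, args=(a, n)),
        threading.Thread(target=scale, args=(a, b, factor)),
        threading.Thread(target=sink, args=(b, result)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result

print(run_network(5, 3))  # [0, 3, 6, 9, 12]
```

Because each process only reads its input channels in a fixed order and blocks when data is missing, the result is the same however the threads interleave. That determinism is what allows individual processes to be moved into dedicated hardware without changing the application's behavior.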