P. van der Wolf | Researchain

Publication

Featured research published by P. van der Wolf.

signal processing systems | 1999

A methodology for architecture exploration of heterogeneous signal processing systems

Paul Lieverse; P. van der Wolf; Ed F. Deprettere; Kees A. Vissers

We present a methodology for the exploration of signal processing architectures at the system level. The methodology, named SPADE, provides a means to quickly build models of architectures at an abstract level, to easily map applications, modeled as Kahn Process Networks, onto these architecture models, and to analyze the performance of the resulting system by simulation. The methodology distinguishes between applications and architectures, and uses a trace-driven simulation technique for co-simulation of application models and architecture models. As a consequence, architecture models need not be functionally complete to be used for performance analysis, while data-dependent behavior is still handled correctly. We have used the methodology for the exploration of architectures and mappings of an MPEG-2 video decoder application.

IEEE Computer | 2001

Exploring embedded-systems architectures with Artemis

Andy D. Pimentel; L.O. Hertzberger; Paul Lieverse; P. van der Wolf; E.E. Deprettere

Because embedded systems mostly target mass production and often run on batteries, they should be cheap to realize and power efficient. In addition, they require a high degree of programmability to provide real-time performance for multiple applications and standards. However, performance requirements as well as cost and power-consumption constraints demand that substantial parts of these systems be implemented in dedicated hardware blocks. As a result, their heterogeneous system architecture consists of components ranging from fully programmable processor cores to fully dedicated hardware components for time-critical application tasks. Increasingly, these designs yield heterogeneous embedded multiprocessor systems that reside together on a single chip. The heterogeneity of these highly programmable systems and the varying demands of their target applications greatly complicate system design. The increasing complexity of embedded-system architectures makes predicting performance behavior more difficult. Therefore, having the appropriate tools to explore different choices at an early design stage is increasingly important. The Artemis modeling and simulation environment aims to efficiently explore the design space of heterogeneous embedded-systems architectures at multiple abstraction levels and for a wide range of applications targeting these architectures. The authors describe their application of this methodology in two studies that showed promising results, providing useful feedback on a wide range of design decisions involving the architectures for the two applications.

IEEE Design & Test of Computers | 2002

A heterogeneous multiprocessor architecture for flexible media processing

Martijn J. Rutten; J.T.J. van Eijndhoven; E.G.T. Jaspers; P. van der Wolf; Om Prakash Gangwal; A. Timmer; Evert-Jan D. Pol

Eclipse is a scalable architecture template for designing data-dependent stream-processing subsystems of media-processing SoCs. It combines application configuration flexibility with the efficiency of function-specific coprocessors that concurrently execute the tasks of one or more applications.

international parallel and distributed processing symposium | 2002

Eclipse: heterogeneous multiprocessor architecture for flexible media processing

Martijn J. Rutten; J.T.J. van Eijndhoven; E.D. Pol; Egbert G.T. Jaspers; P. van der Wolf; Om Prakash Gangwal; A. Timmer

Eclipse is a heterogeneous multiprocessor architecture for high-performance media processing, including high-definition MPEG encoding/decoding. The scalable architecture framework concurrently executes media processing kernels in function-specific multi-tasking coprocessors and a media processor, communicating via on-chip memory. Eclipse instances combine application configuration flexibility with the efficiency of function-specific hardware.

international conference on computer design | 1999

TriMedia CPU64 application development environment

Evert-Jan D. Pol; B.J.M. Aarts; J.T.J. van Eijndhoven; P. Struik; F.W. Sijstermans; M.J.A. Tromp; J.-W. van de Waerdt; P. van der Wolf

The architecture of the TriMedia CPU64 is based on the TM1000 DSPCPU. The original VLIW architecture has been extended with the concepts of vector processing and superoperations. The new vector operations and superoperations need to be supported by the compiler and simulator to make them accessible to application programmers. It was our intention to support these new features while remaining compliant with the ANSI C standard. This paper describes the mechanisms which were implemented to achieve this goal. Furthermore, the optimization of applications needs to address the vectorization of the functions to be implemented. Some general guidelines for producing efficient vectorized code are given.

Proceedings of SPIE | 1998

A combined hardware/software solution for stream prefetching in multimedia applications

P. Struik; P. van der Wolf; Andy D. Pimentel

Prefetch techniques may, in general, be applied to reduce the miss rate of a processor's data cache and thereby improve the overall performance of the processor. More specifically, stream prefetch techniques can be applied to prefetch data streams that are often encountered in multimedia applications. Stream prefetch techniques exploit the fact that data from such streams are often accessed in a regular fashion. Implementing a stream prefetch technique involves two issues, viz. stream combined hardware/software stream prefetch technique. A special stream-prefetch instruction is introduced to alert the hardware that load instructions access a data stream. Subsequently, prefetching is handled by the hardware automatically in such a way that the rate at which the data is prefetched is synchronized with the rate at which the prefetched data is processed by the application. Stream prefetch techniques of this kind have been proposed earlier, but they use instruction addresses for synchronization. The technique introduced in this paper uses a different synchronization mechanism that does not suffer from the drawbacks of instruction-address synchronization.

embedded systems for real-time multimedia | 2005

An interface for the design and implementation of dynamic applications on multi-processor architectures

Jeffrey Kang; Tomas Henriksson; P. van der Wolf

Embedded multimedia systems are becoming more complex and versatile, and need to support dynamic applications with multiple use cases. Switching from one use case to another during run time involves changing the application task graph configuration. This paper presents concepts and an interface for modeling and implementing dynamic applications. We show that this interface can be used to model several application change scenarios, and that it can be implemented on different multiprocessor architectures.

embedded systems for real-time multimedia | 2006

TTL Hardware Interface: A High-Level Interface for Streaming Multiprocessor Architectures

Tomas Henriksson; P. van der Wolf

Digital chips for multimedia applications use function-specific hardware co-processors to achieve high performance at low power consumption. These co-processors are typically equipped with traditional address-based interfaces. Networks-on-chips (NoCs) are emerging as a scalable interconnect for advanced digital chips. Integration of co-processors with NoCs requires load/store packetizing wrappers on the network interfaces. This leads to unnecessary address generation and address transportation over the NoC for streaming data. By using high-level message passing interfaces for the streaming data, the co-processors can be made simpler and more reusable, and the NoCs are used more efficiently. We present the task transaction level (TTL) high-level hardware interface for streaming co-processors as a concrete proposal for such an interface. We present three case study implementations and conclude that a TTL high-level message passing interface is beneficial compared to an address-based interface because it offers a better match with NoCs and allows for better reuse and simpler design of co-processors.

international performance computing and communications conference | 2000

Hardware versus hybrid data prefetching in multimedia processors: a case study

Andy D. Pimentel; Louis O. Hertzberger; P. Struik; P. van der Wolf

Data prefetching is a promising technique for hiding the penalties due to compulsory cache misses. In this paper we present a case study on two types of data prefetching in the context of multimedia processing: a purely hardware-based technique and a lower-cost hybrid hardware/software technique. We also propose a technique for increasing the so-called prefetch distance in hardware prefetching and a scheme to reduce thrashing in the data cache. Our results demonstrate that the low-cost hybrid prefetching scheme slightly outperforms hardware-based prefetching for the code segments for which both solutions have been applied, while hardware prefetching potentially allows more code to benefit from the prefetching.

signal processing systems | 2006

Transparent Embedded Compression in Systems-on-Chip

A. K. Riemens; R.J. van der Vleuten; P. van der Wolf; G. Jacob; J.-W. van de Waerdt; J. G. Janssen

Bandwidth to off-chip memory is a scarce resource in complex systems-on-chip for embedded media processing. We apply embedded compression for bandwidth-hungry image processing functions in order to alleviate this bandwidth bottleneck. In our solution, embedded compression is implemented as part of the system-on-chip infrastructure, fully transparent to the hardware and software image processing components. Hence it can be applied without requiring changes to these components. We present the compression algorithm and demonstrate that we achieve significant bandwidth reductions (20%-40%) for image data at acceptable cost (approximately 1 mm² in 90 nm CMOS) while preserving high image quality.
