Paul E. R. Lippens | Researchain

Publication

Featured research published by Paul E. R. Lippens.

european design automation conference | 1991

PHIDEO: a silicon compiler for high speed algorithms

Paul E. R. Lippens; J. van Meerbergen; A. van der Werf; Wim F. J. Verhaegh; B.T. McSweeney; J. O. Huisken; Owen Paul Mcardle

PHIDEO is a silicon compiler targeted at the design of high-performance real-time systems with high sampling frequencies, such as HDTV. It supports the complete design trajectory, starting from a high-level specification all the way down to layout. New techniques are used to perform global optimisations across loop boundaries in hierarchical flow graphs. The compiler is based on a new target architectural model. Apart from the datapaths, special attention is paid to memory optimisation. The new techniques are demonstrated using a progressive scan conversion algorithm.

Design Automation for Embedded Systems | 2002

C-HEAP: A Heterogeneous Multi-Processor Architecture Template and Scalable and Flexible Protocol for the Design of Embedded Signal Processing Systems

Andre K. Nieuwland; Jeffrey Kang; Om Prakash Gangwal; Ramanathan Sethuraman; Natalino G. Busá; Kees Goossens; Rafael Peset Llopis; Paul E. R. Lippens

The key issue in the design of Systems-on-a-Chip (SoC) is to trade off efficiency against flexibility, and time to market against cost. Current deep submicron processing technologies enable integration of multiple software programmable processors (e.g., CPUs, DSPs) and dedicated hardware components into a single cost-efficient IC. Our top-down design methodology with various abstraction levels helps in designing these ICs in a reasonable amount of time. This methodology starts with a high-level executable specification, and converges towards a silicon implementation. A major task in the design process is to ensure that all components (hardware and software) communicate with each other correctly. In this article, we tackle this problem in the context of the signal processing domain in two ways: we propose a modular, flexible, and scalable heterogeneous multi-processor architecture template based on distributed shared memory, and we present an efficient and transparent protocol for communication and (re)configuration. The protocol implementations have been incorporated in libraries, which allows quick traversal of the various abstraction levels, so enabling incremental design. The design decisions to be taken at each abstraction level are evaluated by means of (co-)simulation. Prototyping is also used to verify the system's functional correctness. The effectiveness of our approach is illustrated by a design case of a multi-standard video and image codec.

international conference on computer aided design | 1993

Allocation of multiport memories for hierarchical data streams

Paul E. R. Lippens; J. van Meerbergen; Wim F. J. Verhaegh; A. van der Werf

A multiport memory allocation problem for hierarchical, i.e. multi-dimensional, data streams is described. Memory allocation techniques are used in high level synthesis for foreground and background memory allocation, the design of data format converters, and the design of synchronous inter-processor communication hardware. The techniques presented in this paper differ from other approaches in the sense that data streams are considered to be design entities and are not expanded to individual samples. A formal model for hierarchical data streams is given and a memory allocation algorithm is presented. The algorithm comprises two steps: data routing and assignment of signal delays to memories. A number of sub-problems are formulated as ILP programs. In the presented form, the allocation algorithm only considers interconnect costs, but memory size and other cost factors can be taken into account. The presented work is implemented in the memory allocation tool MEDEA which is part of the PHIDEO synthesis system.

international symposium on systems synthesis | 2001

A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems

Om Prakash Gangwal; Andre K. Nieuwland; Paul E. R. Lippens

This paper describes the implementation of a data-synchronization scheme that can be used in the functional description and hardware realization of algorithms for heterogeneous multiprocessor architectures. In this scheme, synchronization primitives are chosen such that they can be implemented efficiently in both hardware and software on distributed shared memory architectures, without the need for atomic semaphore instructions. The proposed solution is flexible as the configuration of the data synchronization is programmable even after a hardware realization. It is also scalable since it can be implemented without the need for central resources. We show with experiments that distributed implementations are needed for scalable and high performance systems-on-a-chip.
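As an illustration only, one well-known way to synchronize a single producer and a single consumer without atomic read-modify-write instructions is to give each side its own counter that only it writes. The sketch below assumes exactly one producer and one consumer; the class and method names are hypothetical and are not taken from the paper, and real shared-memory hardware would additionally need correct memory ordering between the data write and the counter update.

```python
class SpscFifo:
    """Single-producer, single-consumer FIFO (illustrative sketch).

    Each counter has exactly one writer, so no atomic semaphore
    instruction is needed to keep the two sides consistent.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.written = 0  # written only by the producer
        self.read = 0     # written only by the consumer

    def try_put(self, item):
        """Producer side: returns False if the FIFO is full."""
        if self.written - self.read == self.capacity:
            return False
        self.buf[self.written % self.capacity] = item
        self.written += 1  # publish only after the data is stored
        return True

    def try_get(self):
        """Consumer side: returns (True, item), or (False, None) if empty."""
        if self.written == self.read:
            return (False, None)
        item = self.buf[self.read % self.capacity]
        self.read += 1  # release the slot only after the data is read
        return (True, item)


fifo = SpscFifo(2)
fifo.try_put("a")
fifo.try_put("b")
full = fifo.try_put("c")   # rejected: both slots are in use
first = fifo.try_get()
```

Because the counters live with their owners rather than in a central resource, this style of scheme can be replicated per channel, which is in the spirit of the scalability argument made in the abstract.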

signal processing systems | 1995

PHIDEO: High-level synthesis for high throughput applications

Jef L. van Meerbergen; Paul E. R. Lippens; Wim F. J. Verhaegh; Albert Van Der Werf

This paper describes a new approach to high-level synthesis for high throughput applications. Such applications are typically found in real-time video systems such as HDTV. The method is capable of dealing with hierarchical flow graphs containing loops with manifest boundaries and linear index expressions. The algorithm is based on the model of periodic operations which allows optimizations across loop boundaries. Processing units and storage units are minimized simultaneously. The algorithm is implemented in the PHIDEO system. The major parts of this system are the processing unit synthesis, the scheduler and the memory synthesis including address generation.

international conference on computer aided design | 1992

Area optimization of multi-functional processing units

A. van der Werf; M. J. H. Peek; Emile H. L. Aarts; J. van Meerbergen; Paul E. R. Lippens; Wim F. J. Verhaegh

Functions executed by a multifunctional processing unit (PU) correspond to clusters of operations in the specification, which are represented as signal flow graphs (SFGs). Because of high-throughput demands, the operations of each SFG are executed in parallel. Since operations for only one of the SFGs are executed at a given time, operations belonging to different SFGs can be executed on the same operator. Here, the most important part of the mapping of several SFGs onto one PU is considered: the assignment of the SFGs' operations to the PU's operators, given a number of allocated operators. The problem is to find an operator assignment that minimizes the silicon area occupied by the PU's interconnection, which consists of multiplexers and wires. An approach based on local search algorithms such as iterative improvement and simulated annealing is presented. Although these algorithms are known to be generally applicable, it is shown that detailed knowledge of the operator assignment problem is required to obtain good results within acceptable CPU time limits for large problem instances.
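A toy sketch of iterative improvement on an operator assignment is given below. It rests on several assumptions that are not from the paper: all operators are interchangeable, the cost is simply the number of distinct operator-to-operator wires (multiplexers are ignored), and the only move is swapping the operators of two operations within one SFG. The function names and data layout are hypothetical.

```python
from itertools import combinations


def wire_count(sfgs, assign):
    """Count distinct operator-to-operator wires needed by all SFGs."""
    wires = set()
    for name, edges in sfgs.items():
        for src, dst in edges:
            wires.add((assign[name][src], assign[name][dst]))
    return len(wires)


def iterative_improvement(sfgs, assign):
    """Keep swapping two operations' operators while the cost drops."""
    best = wire_count(sfgs, assign)
    improved = True
    while improved:
        improved = False
        for name in sfgs:
            mapping = assign[name]
            for a, b in combinations(sorted(mapping), 2):
                mapping[a], mapping[b] = mapping[b], mapping[a]
                cost = wire_count(sfgs, assign)
                if cost < best:
                    best, improved = cost, True
                else:
                    # Undo a swap that does not reduce the cost.
                    mapping[a], mapping[b] = mapping[b], mapping[a]
    return best


# Two SFGs, each with one edge, sharing operators 0 and 1.
sfgs = {"f": [("x", "y")], "g": [("u", "v")]}
assign = {"f": {"x": 0, "y": 1}, "g": {"u": 1, "v": 0}}

before = wire_count(sfgs, assign)           # wires 0->1 and 1->0
after = iterative_improvement(sfgs, assign)  # both SFGs reuse one wire
```

Simulated annealing would differ from this sketch by occasionally accepting cost-increasing swaps, with a probability that decreases over time, in order to escape local minima.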

international conference on computer aided design | 1992

Efficiency improvements for force-directed scheduling

Wim F. J. Verhaegh; Paul E. R. Lippens; Emile H. L. Aarts; Jan H. M. Korst; A. van der Werf; J. van Meerbergen

Force-directed scheduling is a technique which schedules operations under time constraints in order to achieve schedules with a minimum number of resources. The worst case time complexity of the algorithm is cubic in the number of operations. This is due to the computation of the changes in the distribution functions needed for the force calculations. An incremental way to compute the changes in the distribution functions, based on gradual time-frame reduction, is presented. This reduces the time complexity of the algorithm to quadratic in the number of operations, without any loss in effectiveness or generality of the algorithm. Implementations show a substantial CPU-time reduction of force-directed scheduling, which is illustrated by means of some industrially relevant examples.
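To make the notion of a distribution function concrete, the sketch below uses the usual textbook definition: each operation is assumed equally likely to start at any step of its time frame, and the distribution at a step is the sum of those probabilities. It then shows that shrinking one operation's frame can be handled by touching only the steps of that frame instead of recomputing everything. This only illustrates the incremental idea; it is not the algorithm from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def distribution(frames, n_steps):
    """Expected number of operations per time step.

    frames maps an operation name to an inclusive (lo, hi) time frame.
    """
    dist = [Fraction(0)] * n_steps
    for lo, hi in frames.values():
        p = Fraction(1, hi - lo + 1)
        for t in range(lo, hi + 1):
            dist[t] += p
    return dist


def shrink_frame(frames, dist, op, new_lo, new_hi):
    """Update dist in place when op's frame shrinks.

    Only the steps of the old frame are visited.
    """
    old_lo, old_hi = frames[op]
    old_p = Fraction(1, old_hi - old_lo + 1)
    new_p = Fraction(1, new_hi - new_lo + 1)
    for t in range(old_lo, old_hi + 1):
        dist[t] -= old_p
        if new_lo <= t <= new_hi:
            dist[t] += new_p
    frames[op] = (new_lo, new_hi)


frames = {"a": (0, 2), "b": (1, 3), "c": (2, 2)}
dist = distribution(frames, 4)

# Reduce the time frame of "a" by one step and update incrementally.
shrink_frame(frames, dist, "a", 1, 2)
recomputed = distribution(frames, 4)
```

Exact fractions are used so that the incremental result can be compared with a full recomputation without floating-point noise.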

custom integrated circuits conference | 1991

Memory synthesis for high speed DSP applications

Paul E. R. Lippens; J. van Meerbergen; A. van der Werf; Wim F. J. Verhaegh; B.T. McSweeney

Describes techniques for performing automatic memory allocation and address allocation for high-speed applications. Memory access conflicts are solved and a global strategy to merge memory units is presented. Efficient reuse of memory locations is obtained by the proposed address allocation techniques. The techniques are based on a stream model for describing data transport. As a specific application, the memory management of the PHIDEO silicon compiler is discussed.

european design automation conference | 1993

Relative location assignment for repetitive schedules

J. van Meerbergen; Paul E. R. Lippens; Wim F. J. Verhaegh; A. van der Werf

The authors point out that storage synthesis is becoming an important part of high-level synthesis. Emphasis is put on location assignment, i.e., the assignment of locations to variables in a storage unit. The technique of relative location assignment is discussed. This technique combines a fast algorithm (O(n log n)) with an efficient solution because at most one location more than the strict minimum is needed. As a consequence, the technique is particularly suited for large applications. This is illustrated using some real-life examples. The technique is implemented in a tool, called Matchbox, which is part of the Phideo synthesis system.
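The "strict minimum" mentioned above can be read as the largest number of variables that are alive at the same time. The sketch below computes that lower bound with a sort-based sweep, which also runs in O(n log n). It assumes half-open lifetimes [start, end) and is not the relative location assignment algorithm itself; the function name is hypothetical.

```python
def min_locations(lifetimes):
    """Peak number of simultaneously live variables.

    lifetimes is a list of (start, end) pairs, with end exclusive.
    """
    events = []
    for start, end in lifetimes:
        events.append((start, 1))   # variable becomes live
        events.append((end, -1))    # variable dies
    # At equal times, (t, -1) sorts before (t, 1), so a location
    # freed at time t can be reused by a variable born at time t.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


overlapping = min_locations([(0, 3), (1, 4), (3, 5)])
back_to_back = min_locations([(0, 2), (2, 4)])
```

Any location assignment needs at least this many locations, so it serves as the reference point for the "at most one location more" guarantee quoted in the abstract.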

european design and test conference | 1994

Optimization of address generator hardware

D.M. Grant; J. van Meerbergen; Paul E. R. Lippens

This paper describes an optimization process specific to address generation hardware. By examining a set of pre-defined address sequences at both the word- and bit-levels, a pool of possible hardware solutions may be created from which a global, optimal, bit-level implementation must be found which covers all address sequences. Optimization is completed following a generally iterative method and the resulting architecture may be further improved using generic logic synthesis. The whole process has been implemented in the tool ZIPPO and results for industrially relevant examples are presented.
