Johan Janssen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Johan Janssen is active.

Explore More

Publication

Featured researches published by Johan Janssen.

ACM Transactions on Programming Languages and Systems | 1997

Making graphs reducible with controlled node splitting

Johan Janssen; Henk Corporaal

Several compiler optimizations, such as data flow analysis, the exploitation of instruction-level parallelism (ILP), loop transformations, and memory disambiguation, require programs with reducible control flow graphs. However, not all programs satisfy this property. A new method for transforming irreducible control flow graphs to reducible control flow graphs, called Controlled Node Splitting (CNS), is presented. CNS duplicates nodes of the control flow graph to obtain reducible control flow graphs. CNS results in a minimum number of splits and a minimum number of duplicates. Since the computation time to find the optimal split sequence is large, a heuristic has been developed. The results of this heuristic are close to the optimum. Straightforward application of node splitting resulted in an average code size increase of 235% per procedure of our benchmark programs. CNS with the heuristic limits this increase to only 3%. The impact on the total code size of the complete programs is 13.6% for a straightforward application of node splitting. However, when CNS is used, with the heuristic the average growth in code size of a complete program dramatically reduces to 0.2%

compiler construction | 1996

Controlled Node Splitting

Johan Janssen; Henk Corporaal

To exploit instruction level parallelism in programs over multiple basic blocks, programs should have reducible control flow graphs. However not all programs satisfy this property. A new method, called Controlled Node Splitting (CNS), for transforming irreducible control flow graphs to reducible control flow graphs is presented. CNS duplicates nodes of the control flow graph to obtain reducible control flow graphs. CNS results in a minimum number of splits and a minimum number of duplicates. Since the computation time to find the optimal split sequence is large, a heuristic has been developed. The results of this heuristic are close to the optimum. Straightforward application of node splitting may result in an average code size increase of 235%. CNS with the heuristic limits the increase to only 3%.

digital systems design | 2003

Understanding video pixel processing applications for flexible implementations

Om Prakash Gangwal; Johan Janssen; Selliah Rathnam; Erwin B. Bellers; Marc Duranton

Media processing system-on-chips (SoCs) mainly consist of audio encoding/decoding (e.g. AC-3, MP3), video encoding/decoding (e.g. H263, MPEG-2) and video pixel processing functions (e.g. de-interlacing, noise reduction). Video pixel processing functions have very high computational demands, as they require a large amount of computations on large amount of data (note that the data are pixels of completely decoded pictures). In this paper, we focus on video pixel processing functions. Usually, these functions are implemented in dedicated hardware. However, flexibility (by means of programmability or reconfigurability) is needed to introduce the latest innovative algorithms, to allow differentiation of products, and to allow bug fixing after fabricating chips. It is impossible to fulfill the computational requirements of these functions by current programmable media processors. To achieve efficient implementations for flexible solutions, we will study, in this paper, the application characteristics of some representative video pixel processing functions. The characteristics considered are granularity of operations, amount and kind of data accesses and degree of parallelism present in these functions. We observe that from computational granularity point of view many functions can be expressed in terms of kernels e.g. Median3 (i.e. median of three values), finite impulse response (FIR) filters, table lookups (LUT) etc. that are coarser grain than ALU, Mult, MAC, etc. Regarding the kind of data accesses, we categorize these functions as regular, regular with some data rearrangement and irregular data access patterns. Furthermore, the degree of parallelism present in these functions is expressed in terms of data level parallelism (DLP) and instruction/operation level parallelism (ILP). We show with an example that these properties can be exploited to make specialized programmable processors.

International Journal of Parallel Programming | 2000

Computation in the Context of Transport Triggered Architectures

Henk Corporaal; Johan Janssen; Marnix Arnold

Processors used in embedded systems have specific requirements which are not always met by off-the-shelf processors. A templated processor architecture, which can easily be tuned towards a certain application (domain) offers a solution. The transport triggered architecture (TTA) template presented in this paper has a number of properties that make it very suitable for embedded system design. Key to its success is to give the compiler more control; it has to schedule all data transports within the processor. This paper highlights two important TTA-related issues. First a new code generation method for TTAs is discussed; it integrates scheduling and register allocation, thereby avoiding the notorious phase ordering problem between these two steps. Secondly, we discuss how to tune the instruction repertoire for an embedded processor. A tool is described which automatically detects frequent patterns of operations. These patterns can then be implemented on special function units.

international conference on consumer electronics | 2006

Motion estimation and temporal up-conversion on the TM3270 media-processor

J.-W. van de Waerdt; Stamatis Vassiliadis; E. Bellers; Johan Janssen

We present a qualitative performance evaluation of several components of a video format conversion algorithm (referred to as natural motion (NM)). The implementation platform is a new programmable media-processor, the TM3270, combined with dedicated hardware support. The performance of two compute-intense NM components, motion estimation (ME) and temporal up-conversion (TU), is evaluated. The impact of new TM3270 features, such as new video-processing operations and data prefetching, is quantified. We show that a real-time implementation of the ME and TU algorithms is achievable in a fraction of the available compute performance, when operating on standard definition video

Archive | 2001