M Mark Wijtvliet

Eindhoven University of Technology

Publication

Featured research published by M Mark Wijtvliet.

international conference on supercomputing | 2016

SFU-Driven Transparent Approximation Acceleration on GPUs

Ang Li; Shuaiwen Leon Song; M Mark Wijtvliet; Akash Kumar; Henk Corporaal

Approximate computing, the technique that sacrifices a certain amount of accuracy in exchange for a substantial performance boost or power reduction, is one of the most promising solutions to enable power control and performance scaling towards exascale. Although most existing approximation designs target the emerging data-intensive applications that are comparatively more error-tolerant, there is still high demand for the acceleration of traditional scientific applications (e.g., weather and nuclear simulation), which often comprise intensive transcendental function calls and are very sensitive to accuracy loss. To address this challenge, we focus on a very important but long-ignored approximation unit on today's commercial GPUs, the special-function unit (SFU), and clarify its unique role in performance acceleration of accuracy-sensitive applications in the context of approximate computing. To better understand its features, we conduct a thorough empirical analysis on three generations of NVIDIA GPU architectures to evaluate all the single-precision and double-precision numeric transcendental functions that can be accelerated by SFUs, in terms of their performance, accuracy and power consumption. Based on the insights from the evaluation, we propose a transparent, tractable and portable design framework for SFU-driven approximate acceleration on GPUs. Our design is software-based and requires no hardware or application modifications. Experimental results on three NVIDIA GPU platforms demonstrate that our proposed framework can provide fine-grained tuning for performance and accuracy trade-offs, thus helping applications achieve the maximum performance under certain accuracy constraints.

international conference on embedded computer systems architectures modeling and simulation | 2016

Coarse grained reconfigurable architectures in the past 25 years: Overview and classification

M Mark Wijtvliet; Ljw Luc Waeijen; Henk Corporaal

Reconfigurable architectures are becoming more popular now that general-purpose compute performance no longer increases as rapidly as before. Field programmable gate arrays are slowly moving in the direction of Coarse Grain Reconfigurable Architectures (CGRA) by adding DSP and other coarse-grained IP blocks, while general-purpose processors are becoming more heterogeneous and include sub-word parallelism and even some reconfigurable logic. In the past 25 years, several CGRAs have been published. In this paper, an overview and classification of these architectures is presented. This work also provides a clear definition of CGRAs and identifies topics for future research that are key to unlocking the full potential of CGRAs.

digital systems design | 2016

Code Generation for Reconfigurable Explicit Datapath Architectures with LLVM

M Michaël Adriaansen; M Mark Wijtvliet; R Roel Jordans; Ljw Luc Waeijen; Henk Corporaal

Good tool support is essential for computing platforms because it increases the programmability of the platform. This is especially the case for reconfigurable architectures because an application needs to be mapped on the architecture for each configuration individually. This paper investigates how the LLVM framework can be used to generate code for a Coarse Grained Reconfigurable Array (CGRA). A CGRA compiler must be retargetable to support all possible architecture configurations, and it should utilize the explicit bypassing capabilities of the hardware. Utilizing these hardware features requires the compiler to support software pipelining, multiple register files and operation-based scheduling. This paper evaluates the potential of the LLVM framework and identifies missing features for the support of reconfigurable explicit datapath architectures.

field-programmable logic and applications | 2013

MAMPSX: A demonstration of rapid, predictable HMPSOC synthesis

Shakith Fernando; M Mark Wijtvliet; Fm Firew Siyoum; Y Yifan He; Sander Stuijk; Akash Kumar; Henk Corporaal

Heterogeneous multiprocessor systems-on-chip (HMPSoCs) are becoming popular as a means of meeting the energy efficiency requirements of modern embedded systems. However, as these HMPSoCs run multimedia applications as well, they also need to meet real-time requirements. Designing HMPSoCs with predictable timing behavior is a key challenge, as the current design methods for these platforms are semi-automated, non-predictable, or support limited heterogeneity. In this demonstration, we present a design framework to rapidly generate and implement predictable HMPSoC designs. It takes the application specifications and the architecture model as input and generates the entire HMPSoC, for FPGA prototyping, that meets the throughput constraints of the application. We also present results of a case study that computes the performance-power trade-offs of an industrial vision application. A tool-chain targeting the Xilinx Zynq FPGA is also presented.

digital systems design | 2017

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures

Kanishkan Vadivel; M Mark Wijtvliet; R Roel Jordans; Henk Corporaal

Due to their flexibility and high performance, Coarse Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common for more traditional processors but can also lead to large efficiency gains for reconfigurable architectures. This paper investigates three hardware-based loop optimization techniques that can significantly improve the energy efficiency of CGRAs. The three techniques are evaluated on processing kernels from the image processing domain as well as an industrial computer vision application. Energy consumption and area estimates are obtained using a CGRA synthesized with a commercial 40nm library. For the three applied techniques (zero-overhead loop accelerator (ZOLA), single-cycle loop support, and loop buffers), the simulation results show overall energy gains of 6.8% for zero-overhead loop support, 13.2% for ZOLA combined with single-cycle loop support, and 18.3% for a combination of all optimizations.

digital systems design | 2016

Multi-granular Arithmetic in a Coarse-Grain Reconfigurable Architecture

Spa Stefan Louwers; Ljw Luc Waeijen; M Mark Wijtvliet; Rpj Ruud Koolen; Henk Corporaal

Mismatch between operand width and hardware operation width is a source of energy inefficiency. This work proposes multi-granular arithmetic, which can adapt the hardware operation width to the application, preventing energy from being wasted. In particular, multi-granular arithmetic in the context of coarse-grain reconfigurable architectures is considered for the operations of addition, accumulation, multiplication, and multiply-accumulation. Using a silicon synthesis tool flow, it is shown that the multi-granular designs can perform narrow-width operations, e.g. an 8-by-8 multiplication, much more efficiently than standard full-width circuits. For multiplication, the required energy is reduced by up to 15 times under realistic conditions when compared to a full-width 32x32 multiplier.
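
As a rough illustration of the idea behind this abstract (not the circuit described in the paper), a full-width 32x32 multiplication can be assembled from 8-by-8 partial products, so a narrow operation only needs a fraction of the narrow multipliers. The sketch below assumes unsigned operands; the names `mul8`, `split_lanes` and `mul32_from_8bit_lanes`, and the use of an active-lane count as a crude proxy for switching activity, are hypothetical choices made for this example.

```python
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def mul8(a, b):
    """One narrow 8-by-8 unsigned multiplication (a single 'lane')."""
    assert 0 <= a <= LANE_MASK and 0 <= b <= LANE_MASK
    return a * b


def split_lanes(x, lanes=4):
    """Split an unsigned value into 8-bit lanes, least significant first."""
    return [(x >> (LANE_BITS * i)) & LANE_MASK for i in range(lanes)]


def mul32_from_8bit_lanes(a, b):
    """Multiply two unsigned 32-bit values using only 8-by-8 multiplications.

    Returns (product, lanes_used), where lanes_used counts the partial
    products whose two inputs are both non-zero.
    """
    assert 0 <= a < 2**32 and 0 <= b < 2**32
    product = 0
    lanes_used = 0
    for i, a_lane in enumerate(split_lanes(a)):
        for j, b_lane in enumerate(split_lanes(b)):
            if a_lane == 0 or b_lane == 0:
                continue  # this partial product contributes nothing
            # Shift each partial product to its position in the full result.
            product += mul8(a_lane, b_lane) << (LANE_BITS * (i + j))
            lanes_used += 1
    return product, lanes_used
```

With two 8-bit operands only one partial product is active, whereas operands that fill all 32 bits can activate all sixteen. The energy figures quoted in the abstract come from the authors' synthesized designs and cannot be derived from this model.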

field programmable logic and applications | 2015

SPINE: From C loop-nests to highly efficient accelerators using Algorithmic Species

M Mark Wijtvliet; Shakith Fernando; Henk Corporaal

In modern embedded systems, heterogeneous architectures are crucial in achieving desired performance requirements under area and energy constraints. Many of these systems combine a multi-processor system-on-chip and a Field Programmable Gate Array to enable hardware acceleration. Although the introduction of High-Level Synthesis significantly reduced the complexity of utilizing these systems, a programmer is still required to have expert knowledge of both the High-Level Synthesis tool and the target hardware and to perform time-consuming manual iterations to achieve efficient implementations. In this paper we present SPINE, a design flow for automatic generation of efficient hardware accelerators based on Algorithmic Species. SPINE allows the designer to focus on the algorithm by automatically applying hardware-specific optimizations and parallelization techniques to the design. As a case study, we present a design space exploration of nine different loop-nests used in image processing kernels and show how SPINE rapidly generates multiple area-performance trade-offs. Furthermore, we compare our results with the state of the art and show that SPINE is a promising direction for accelerator generation, as the average performance and area improvements with SPINE are 107% and 75%, respectively, over the state of the art.

Unmanned/Unattended Sensors and Sensor Networks X | 2014

Aerial networking communication solutions using Micro Air Vehicle (MAV)

Shyam Balasubramanian; Maurits de Graaf; Gerard Hoekstra; Henk Corporaal; M Mark Wijtvliet; Javier Cuadros Linde

The application of a Micro Air Vehicle (MAV) for wireless networking is slowly gaining significance in the field of network robotics. Aerial transport of data requires efficient network protocols along with accurate positional adjustment of the MAV to minimize transaction times. In our proof of concept, we develop an aerial networking protocol for data transfer using the technology of Disruption Tolerant Networks (DTN), a store-and-forward approach for environments with disrupted connectivity. Our results show that close interaction between networking and flight behavior helps in efficient data exchange. Potential applications are in areas where network infrastructure is minimal or unavailable and distances may be large. Examples include forwarding video recordings during search and rescue, agriculture, and swarm communication, among several others. A practical implementation and validation, as described in this paper, presents the complex dynamics of wireless environments and poses new challenges that are not addressed in earlier work on this topic. Several tests are evaluated in a practical setup to display the networking MAV behavior during such an operation.
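
The store-and-forward approach mentioned in this abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the protocol from the paper: the `CourierNode` class, the bundle dictionaries with `dst` and `data` keys, and the `max_bundles` limit that models a short contact window are all hypothetical.

```python
from collections import deque


class CourierNode:
    """Hypothetical store-and-forward node, e.g. a MAV carrying data."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of buffered bundles
        self.buffer = deque()     # bundles waiting for a contact

    def receive(self, bundle):
        """Store a bundle; return False when the buffer is full."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(bundle)
        return True

    def contact(self, destination, deliver, max_bundles):
        """Forward buffered bundles for `destination` during a contact.

        `deliver` is a callable standing in for the radio link. At most
        `max_bundles` are sent; everything else stays buffered.
        Returns the number of bundles delivered.
        """
        sent = 0
        kept = deque()
        while self.buffer:
            bundle = self.buffer.popleft()
            if bundle["dst"] == destination and sent < max_bundles:
                deliver(bundle)
                sent += 1
            else:
                kept.append(bundle)  # keep for a later contact
        self.buffer = kept
        return sent
```

A node would call `receive` while out of range of the destination and `contact` once a link is available. How long that link lasts depends on the flight behavior, which is the interaction between networking and flight that the abstract highlights.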

design, automation, and test in europe | 2015