Is this you? Create Your Porfile

Erkan Diken

Eindhoven University of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Erkan Diken is active.

Explore More

Publication

Featured researches published by Erkan Diken.

Microprocessors and Microsystems | 2013

ASAM: Automatic architecture synthesis and application mapping

Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo; Giuseppe Notarangelo

This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to address the challenges and solve the problems. Finally, it discusses the ASAM design-flow, its main stages and tools and their application to a real-life case study.

digital systems design | 2012

ASAM: Automatic Architecture Synthesis and Application Mapping

Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo

This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an over-view of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the ASAM design-flow and its main stages.

International Journal of Reconfigurable Computing | 2011

A middleware approach to achieving fault tolerance of Kahn process networks on networks on chips

Onur Derin; Erkan Diken; Leandro Fiorin

Kahn process networks (KPNs) is a distributed model of computation used for describing systems where streams of data are transformed by processes executing in sequence or parallel. Autonomous processes communicate through unbounded FIFO channels in absence of a global scheduler. In this work, we propose a task-aware middleware concept that allows adaptivity in KPN implemented over a Network on Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault tolerance strategies for KPNs applications running on NoCs. In doing that, we extend our SACRE (Self-Adaptive Component Run Time Environment) framework by integrating it with an open source NoC simulator, Noxim. We evaluate the overhead that the middleware brings to the the total execution time and to the total amount of data transferred in the NoC. With this work, we also provide a methodology that can help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms.

Microprocessors and Microsystems | 2014

Construction and exploitation of VLIW ASIPs with heterogeneous vector-widths

Erkan Diken; R Roel Jordans; Rosilde Corvino; Lech Jóźwiak; Henk Corporaal; Felipe Augusto Chies

Numerous applications in important domains, such as communication and multimedia, show a significant data-level parallelism (DLP). A large part of the DLP is usually exploited through application vectorization and implementation of vector operations in processors executing the applications. While the amount of DLP varies between applications of the same domain or even within a single application, processor architectures usually support a single vector width. This may not be optimal and may cause a substantial energy inefficiency. Therefore, an adequate more sophisticated exploitation of DLP is highly relevant. This paper proposes the use of heterogeneous vector widths and a method to explore the heterogeneous vector widths for VLIW ASIPs. In our context, heterogeneity corresponds to the usage of two or more different vector widths in a single ASIP. After a brief explanation of the target ASIP architecture model, the paper describes the vector-width exploration method and explains the associated design automation tools. Subsequently, experimental results are discussed.

digital systems design | 2012

Transformation-Based Exploration of Data Parallel Architecture for Customizable Hardware: A JPEG Encoder Case Study

Rosilde Corvino; Erkan Diken; Abdoulaye Gamatié; Lech Józwiak

In this paper, we present a method for the design of MPSoCs for complex data-intensive applications. This method aims at a blend exploration of the communication, the memory system architecture and the computation resource parallelism. The proposed method is exemplified on a JPEG Encoder case study by describing all the design steps. Our method allows for a JPEG encoder implementation having a throughput increase of 84% and an increase of the achievable FPGA maximum frequency fmax of 64% with an area overhead of 6 with respect to a reference solution. Our method is also assessed with additional explorations of applications from different domains.

design and diagnostics of electronic circuits and systems | 2014

BuildMaster: Efficient ASIP architecture exploration through compilation and simulation result caching

R Roel Jordans; Erkan Diken; Lech Józwiak; Henk Corporaal

In this paper we introduce and discuss the Build-Master framework. This framework supports the design space exploration of application specific VLIW processors and offers automated caching of intermediate compilation and simulation results. Both the compilation and the simulation cache can greatly help to shorten the exploration time and make it possible to use more realistic data for the evaluation of selected designs. In each of the experiments we performed, we were able to reduce the number of required simulations with over 90% and save up to 50% on the required compilation time.

mediterranean conference on embedded computing | 2013

Rapid and accurate energy estimation of vector processing in VLIW ASIPs

Erkan Diken; Rosilde Corvino; Lech Józwiak

Many modern applications in important application domains, as communication, image and video processing, multimedia, etc. involve much data-level parallelism (DLP). Therefore, adequate exploitation of DLP is highly relevant. This paper focuses on effective and efficient exploitation of DLP for the synthesis of vector VLIW ASIP processors. We propose analytical energy models in order to rapidly estimate the energy consumption of a nested loop executed on a VLIW ASIP with respect to different vector widths. The models perform a rapid and relatively accurate energy consumption estimation through combining the relevant information on the application and implementation technology. The analytical energy models are experimentally validated and the validation results are discussed.

application-specific systems, architectures, and processors | 2015

Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths

Erkan Diken; Martin J. O'Riordan; R Roel Jordans; Lech Józwiak; Henk Corporaal; David Moloney

The degree of DLP parallelism in applications is not fixed and varies due to different computational characteristics of applications. On the contrary, most of the processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal to serve applications with varying DLP and they may cause performance and energy inefficiency. We propose the usage of VLIW processors with multiple native vector-widths to better serve applications with changing DLP. SHAVE is an example of such VLIW processor and provides hardware support for the native 32-bit and 128-bit wide vector operations. This paper researches and implements the mixed-length SIMD code generation support for SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of SHAVE processor. In this way, we improved the performance of compiler generated SIMD code by reducing the number of overhead operations and by increasing the SIMD hardware utilization. Experimental results demonstrated that our methodology implemented in the compiler improves the performance of synthetic benchmarks up to 47%.

mediterranean conference on embedded computing | 2014

Construction and exploitation of VLIW asips with multiple vector-widths

Erkan Diken; R Roel Jordans; Lech Józwiak; Henk Corporaal

Many applications in important domains, such as communication, multimedia, etc. show a significant data-level parallelism (DLP). A large part of the DLP is usually exploited through application vectorization and implementation of vector operations in processors executing the applications. While the amount of DLP varies between applications of the same domain or even within a single application, processor architectures usually support a single vector width. This may not be optimal and may cause a substantial energy and performance inefficiency. Therefore, an adequate more sophisticated exploitation of DLP is highly relevant. This paper studies the construction and exploitation of VLIW ASIPs with multiple vector widths.

international conference on embedded computer systems architectures modeling and simulation | 2016

A configurable SIMD architecture with explicit datapath for intelligent learning

Y Yifan He; Mcj Maurice Peemen; Ljw Luc Waeijen; Erkan Diken; M Mattia Fiumara; Gerard K. Rauwerda; Henk Corporaal; T Tong Geng

The use of a wide Single-Instruction-Multiple-Data (SIMD) architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, based on our design framework for low-power SIMD processors, we propose a multiply-accumulate (MAC) unit with variable number of accumulator registers. The proposed MAC unit exploits both the merits of merged operation and register tiling. A Convolutional Neural Network (CNN) is a popular learning based algorithm due to its flexibility and high accuracy. However, a CNN-based application is often computationally intensive as it applies convolution operations extensively on a large data set. In this work, a CNN-based intelligent learning application is analyzed and mapped in the context of SIMD architectures. Experimental results show that the proposed architecture is efficient. In a 64-PE instance, the proposed SIMD processor with MAC4reg achieves an effective performance of 63.2 GOPS. Compared to the two baseline SIMD processors without MAC4reg, the proposed design brings 54.0% and 32.1% reduction in execution time, and 20.5% and 35.1% reduction in energy consumption respectively.

Explore More