Erkan Diken
Eindhoven University of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Erkan Diken.
Microprocessors and Microsystems | 2013
Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo; Giuseppe Notarangelo
This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to address the challenges and solve the problems. Finally, it discusses the ASAM design-flow, its main stages and tools and their application to a real-life case study.
digital systems design | 2012
Lech Józwiak; Menno Lindwer; Rosilde Corvino; Paolo Meloni; Laura Micconi; Jan Madsen; Erkan Diken; Deepak Gangadharan; R Roel Jordans; Sebastiano Pomata; Paul Pop; Giuseppe Tuveri; Luigi Raffo
This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an over-view of the research being currently performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the ASAM design-flow and its main stages.
International Journal of Reconfigurable Computing | 2011
Onur Derin; Erkan Diken; Leandro Fiorin
Kahn process networks (KPNs) is a distributed model of computation used for describing systems where streams of data are transformed by processes executing in sequence or parallel. Autonomous processes communicate through unbounded FIFO channels in absence of a global scheduler. In this work, we propose a task-aware middleware concept that allows adaptivity in KPN implemented over a Network on Chip (NoC). We also list our ideas on the development of a simulation platform as an initial step towards creating fault tolerance strategies for KPNs applications running on NoCs. In doing that, we extend our SACRE (Self-Adaptive Component Run Time Environment) framework by integrating it with an open source NoC simulator, Noxim. We evaluate the overhead that the middleware brings to the the total execution time and to the total amount of data transferred in the NoC. With this work, we also provide a methodology that can help in identifying the requirements and implementing fault tolerance and adaptivity support on real platforms.
Microprocessors and Microsystems | 2014
Erkan Diken; R Roel Jordans; Rosilde Corvino; Lech Jóźwiak; Henk Corporaal; Felipe Augusto Chies
Numerous applications in important domains, such as communication and multimedia, show a significant data-level parallelism (DLP). A large part of the DLP is usually exploited through application vectorization and implementation of vector operations in processors executing the applications. While the amount of DLP varies between applications of the same domain or even within a single application, processor architectures usually support a single vector width. This may not be optimal and may cause a substantial energy inefficiency. Therefore, an adequate more sophisticated exploitation of DLP is highly relevant. This paper proposes the use of heterogeneous vector widths and a method to explore the heterogeneous vector widths for VLIW ASIPs. In our context, heterogeneity corresponds to the usage of two or more different vector widths in a single ASIP. After a brief explanation of the target ASIP architecture model, the paper describes the vector-width exploration method and explains the associated design automation tools. Subsequently, experimental results are discussed.
digital systems design | 2012
Rosilde Corvino; Erkan Diken; Abdoulaye Gamatié; Lech Józwiak
In this paper, we present a method for the design of MPSoCs for complex data-intensive applications. This method aims at a blend exploration of the communication, the memory system architecture and the computation resource parallelism. The proposed method is exemplified on a JPEG Encoder case study by describing all the design steps. Our method allows for a JPEG encoder implementation having a throughput increase of 84% and an increase of the achievable FPGA maximum frequency fmax of 64% with an area overhead of 6 with respect to a reference solution. Our method is also assessed with additional explorations of applications from different domains.
design and diagnostics of electronic circuits and systems | 2014
R Roel Jordans; Erkan Diken; Lech Józwiak; Henk Corporaal
In this paper we introduce and discuss the Build-Master framework. This framework supports the design space exploration of application specific VLIW processors and offers automated caching of intermediate compilation and simulation results. Both the compilation and the simulation cache can greatly help to shorten the exploration time and make it possible to use more realistic data for the evaluation of selected designs. In each of the experiments we performed, we were able to reduce the number of required simulations with over 90% and save up to 50% on the required compilation time.
mediterranean conference on embedded computing | 2013
Erkan Diken; Rosilde Corvino; Lech Józwiak
Many modern applications in important application domains, as communication, image and video processing, multimedia, etc. involve much data-level parallelism (DLP). Therefore, adequate exploitation of DLP is highly relevant. This paper focuses on effective and efficient exploitation of DLP for the synthesis of vector VLIW ASIP processors. We propose analytical energy models in order to rapidly estimate the energy consumption of a nested loop executed on a VLIW ASIP with respect to different vector widths. The models perform a rapid and relatively accurate energy consumption estimation through combining the relevant information on the application and implementation technology. The analytical energy models are experimentally validated and the validation results are discussed.
application-specific systems, architectures, and processors | 2015
Erkan Diken; Martin J. O'Riordan; R Roel Jordans; Lech Józwiak; Henk Corporaal; David Moloney
The degree of DLP parallelism in applications is not fixed and varies due to different computational characteristics of applications. On the contrary, most of the processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal to serve applications with varying DLP and they may cause performance and energy inefficiency. We propose the usage of VLIW processors with multiple native vector-widths to better serve applications with changing DLP. SHAVE is an example of such VLIW processor and provides hardware support for the native 32-bit and 128-bit wide vector operations. This paper researches and implements the mixed-length SIMD code generation support for SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of SHAVE processor. In this way, we improved the performance of compiler generated SIMD code by reducing the number of overhead operations and by increasing the SIMD hardware utilization. Experimental results demonstrated that our methodology implemented in the compiler improves the performance of synthetic benchmarks up to 47%.
mediterranean conference on embedded computing | 2014
Erkan Diken; R Roel Jordans; Lech Józwiak; Henk Corporaal
Many applications in important domains, such as communication, multimedia, etc. show a significant data-level parallelism (DLP). A large part of the DLP is usually exploited through application vectorization and implementation of vector operations in processors executing the applications. While the amount of DLP varies between applications of the same domain or even within a single application, processor architectures usually support a single vector width. This may not be optimal and may cause a substantial energy and performance inefficiency. Therefore, an adequate more sophisticated exploitation of DLP is highly relevant. This paper studies the construction and exploitation of VLIW ASIPs with multiple vector widths.
international conference on embedded computer systems architectures modeling and simulation | 2016
Y Yifan He; Mcj Maurice Peemen; Ljw Luc Waeijen; Erkan Diken; M Mattia Fiumara; Gerard K. Rauwerda; Henk Corporaal; T Tong Geng
The use of a wide Single-Instruction-Multiple-Data (SIMD) architecture is a promising approach to build energy-efficient high performance embedded processors. In this paper, based on our design framework for low-power SIMD processors, we propose a multiply-accumulate (MAC) unit with variable number of accumulator registers. The proposed MAC unit exploits both the merits of merged operation and register tiling. A Convolutional Neural Network (CNN) is a popular learning based algorithm due to its flexibility and high accuracy. However, a CNN-based application is often computationally intensive as it applies convolution operations extensively on a large data set. In this work, a CNN-based intelligent learning application is analyzed and mapped in the context of SIMD architectures. Experimental results show that the proposed architecture is efficient. In a 64-PE instance, the proposed SIMD processor with MAC4reg achieves an effective performance of 63.2 GOPS. Compared to the two baseline SIMD processors without MAC4reg, the proposed design brings 54.0% and 32.1% reduction in execution time, and 20.5% and 35.1% reduction in energy consumption respectively.