Mehmet Ali Arslan | Researchain

Publication

Featured research published by Mehmet Ali Arslan.

digital systems design | 2013

Instruction Selection and Scheduling for DSP Kernels on Custom Architectures

Mehmet Ali Arslan; Krzysztof Kuchcinski

As custom architectures become more and more common for DSP applications, instruction selection and scheduling for such applications and architectures become important topics. In this paper, we explore the effects of defining the problem of finding an optimal instruction selection and scheduling as a constraint satisfaction problem (CSP). We incorporate methods based on sub-graph isomorphism and global constraints designed for scheduling. We experiment using several media applications on a custom architecture, a generic VLIW architecture and a RISC architecture, all three with several cores. Our results show that defining the problem with constraints gives flexibility in modeling, while state-of-the-art constraint solvers enable optimal solutions for large problems, hinting at a new method for code generation.

asilomar conference on signals, systems and computers | 2012

Partitioning and mapping dynamic dataflow programs

Mehmet Ali Arslan; Jorn W. Janneck; Krzysztof Kuchcinski

Partitioning and mapping are important design decisions in exploiting the parallelism of programs that are to be run on systems with multiple processing elements. In this paper we introduce a fast, incremental approach for mapping dynamic dataflow programs to multiprocessor systems. We use causation traces and architecture descriptions as input for the mapping process that devises several heuristics for reaching a short makespan for the given trace. We evaluate our approach by comparing our results to two different lower bounds and another algorithm used often in solving mapping problems: simulated annealing.
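
As a rough illustration of what mapping for a short makespan means, the sketch below uses a generic longest-task-first greedy heuristic and a simple lower bound. It is not the heuristic or either lower bound from the paper: it assumes independent tasks with known integer durations and identical processors, whereas causation traces also carry dependencies. The names `greedy_map` and `lower_bound` are hypothetical.

```python
import heapq


def greedy_map(durations, num_procs):
    """Assign each task (longest first) to the currently least-loaded processor.

    Assumption: tasks are independent and processors are identical.
    Returns (mapping task_index -> processor_index, makespan).
    """
    loads = [(0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(loads)
    mapping = {}
    for task in sorted(range(len(durations)), key=lambda t: -durations[t]):
        load, proc = heapq.heappop(loads)
        mapping[task] = proc
        heapq.heappush(loads, (load + durations[task], proc))
    makespan = max(load for load, _ in loads)
    return mapping, makespan


def lower_bound(durations, num_procs):
    """No schedule can beat the average load (rounded up) or the longest task."""
    average_load = -(-sum(durations) // num_procs)  # ceiling division
    return max(average_load, max(durations))


if __name__ == "__main__":
    tasks = [5, 4, 3, 3, 2, 1]
    mapping, makespan = greedy_map(tasks, 2)
    print(mapping, makespan, lower_bound(tasks, 2))
```

Comparing the heuristic's makespan against such a bound gives a quick sense of how far a mapping can be from optimal, which is the role the lower bounds play in the evaluation described above.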

programming models and applications for multicores and manycores | 2015

Programming support for reconfigurable custom vector architectures

Mehmet Ali Arslan; Krzysztof Kuchcinski; Flavius Gruian; Yangxurui Liu

High performance requirements have increased the popularity of unconventional architectures. While providing better performance, such architectures are generally harder to program and generate code for. In this paper, we present our approach to ease programmability and code generation for such architectures. We present a domain specific language (DSL) for the programming part, and a constraint programming approach to scheduling with memory allocation. Our experiments on implementing a kernel extracted from a DSP application on an example reconfigurable custom architecture show that it is possible to achieve performance close to hand-written machine code that is scheduled without memory allocation.

java technologies for real-time and embedded systems | 2012

Java bytecode to hardware made easy with bluespec system verilog

Flavius Gruian; Mehmet Ali Arslan

This paper presents a method for translation of Java bytecode sequences into synthesizable hardware, using the Bluespec System Verilog (BSV) environment. At the core of our approach lies a BSV description of a subset of Java bytecodes that can be used to directly translate bytecode sequences into a BSV specification. The result is intended as an accelerator for existing Java processors (JOP, BlueJEP) or even standalone hardware. Preliminary evaluation shows that our solution produces hardware on par with established methods (area/performance), while supporting rare features (e.g. ease of automation, method calls and recursion).

asilomar conference on signals, systems and computers | 2014

Mapping and scheduling of dataflow graphs — A systematic map

Usman Mazhar Mirza; Mehmet Ali Arslan; Gustav Cedersjö; Sardar Muhammad Sulaman; Jorn W. Janneck

Dataflow is a natural way of modelling streaming applications, such as multimedia, networking and other signal processing applications. In order to cope with the computational and parallelism demands of such streaming applications, multiprocessor systems are replacing uniprocessor systems. Mapping and scheduling these applications on multiprocessor systems are crucial elements for efficient implementation in terms of latency, throughput, power and energy consumption. The performance of streaming applications running on multiprocessor systems may vary widely with the mapping and scheduling strategy. This paper performs a systematic literature review of available research carried out in the area of mapping and scheduling of dataflow graphs.

Microprocessors and Microsystems | 2014

Instruction selection and scheduling for DSP kernels

Mehmet Ali Arslan; Krzysztof Kuchcinski

As custom multicore architectures become more and more common for DSP applications, instruction selection and scheduling for such applications and architectures become important topics. In this paper, we explore the effects of defining the problem of finding an optimal instruction selection and scheduling as a constraint satisfaction problem (CSP). We incorporate methods based on sub-graph isomorphism and global constraints designed for scheduling. We experiment using several media applications on a custom architecture, a generic VLIW architecture and a RISC architecture, all three with several cores. Our results show that defining the problem with constraints gives flexibility in modeling, while state-of-the-art constraint solvers enable optimal solutions for large problems, hinting at a new method for code generation.

conference on design and architectures for signal and image processing | 2016

Code generation for a SIMD architecture with custom memory organisation

Mehmet Ali Arslan; Flavius Gruian; Krzysztof Kuchcinski; Andreas Karlsson

Today's multimedia and DSP applications impose requirements on performance and power consumption that only custom processor architectures with SIMD capabilities can satisfy. However, the specific features of such architectures, including vector operations and high-bandwidth complex memory organization, make them notoriously complicated and time-consuming to program. In this paper we present an automated code generation approach that dramatically reduces the effort of programming such architectures, by carrying out instruction scheduling and memory allocation based on a constraint programming formulation. Furthermore, the quality of the generated code is close to that of hand-written code by an experienced programmer with knowledge of the architecture. We demonstrate the viability of our approach on an existing custom heterogeneous DSP architecture, by compiling and running a number of typical DSP kernels, and comparing the results to hand-optimized code.

acm sigplan symposium on principles and practice of parallel programming | 2016

Support for data parallelism in the CAL actor language

Essayas Gebrewahid; Mehmet Ali Arslan; Andreas Karlsson; Zain Ul-Abdin

With the arrival of heterogeneous manycores comprising various features to support task, data and instruction-level parallelism, developing applications that take full advantage of the hardware parallel features has become a major challenge. In this paper, we present an extension to our CAL compilation framework (CAL2Many) that supports data parallelism in the CAL Actor Language. Our compilation framework makes it possible to program architectures with SIMD support using a high-level language and provides efficient code generation. We support general SIMD instructions but the code generation backend is currently implemented for two custom architectures, namely ePUMA and EIT. Our experiments were carried out for two custom SIMD processor architectures using two applications. The experiments show the possibility of achieving performance comparable to hand-written machine code with much less programming effort.

application-specific systems, architectures, and processors | 2015

Application-set driven exploration for custom processor architectures

Mehmet Ali Arslan; Flavius Gruian; Krzysztof Kuchcinski

Custom architectures are often adopted as more efficient alternatives to general purpose processors in terms of performance and power. However, the design of such architectures requires experts both in hardware and the application domain. In this paper we propose a method for speeding up the design space exploration. Our method, based on Pareto points, identifies sets of solutions in terms of scalar units and vector units of certain length, fulfilling the throughput constraints for each application in a given set. Architectures can then be selected by combining these solutions, as starting points for a more thorough, model-based evaluation.
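
To make the notion of Pareto points concrete, here is a minimal sketch of filtering non-dominated design points. It is not the paper's exploration method: the function name `pareto_front` and the encoding of each candidate as a `(scalar_units, vector_length)` tuple, where smaller is cheaper in both dimensions and every candidate is assumed to already meet the throughput constraint, are assumptions made for illustration.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p when q is no larger than p in every
    dimension and differs from p in at least one (minimisation).
    """
    front = []
    for p in points:
        dominated = any(
            q != p and all(q_i <= p_i for q_i, p_i in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    # Hypothetical candidates: (scalar units, vector length).
    candidates = [(1, 8), (2, 4), (4, 2), (3, 4), (4, 8)]
    print(pareto_front(candidates))
```

In this toy data, `(3, 4)` is dominated by `(2, 4)` and `(4, 8)` by `(1, 8)`, so only the three remaining trade-offs survive as starting points for a more detailed evaluation.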

[Host publication title missing]; pp 1452-1456 (2012) | 2012