Miguel Angel Aguilar
RWTH Aachen University
Publications
Featured research published by Miguel Angel Aguilar.
International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2015
Miguel Angel Aguilar; Juan Fernando Eusse; Projjol Ray; Rainer Leupers; Gerd Ascheid; Weihua Sheng; Prashant Sharma
In recent years, the presence of embedded devices in everyday life has grown exponentially. The market for these devices imposes conflicting requirements such as cost, performance, and energy. The use of Multiprocessor Systems on Chip (MPSoCs) is a widely accepted solution to provide a trade-off between these demands. However, programming MPSoCs is still a cumbersome task. Several research efforts have addressed this challenge in two complementary directions: paradigms for parallel programming and tools for parallelism extraction. However, most of these efforts focus on the high-performance domain and do not consider the characteristics of the underlying platform. In this paper, we present an approach to extract multiple forms of parallelism from sequential C code, which is applied to widespread Android mobile devices. We show the effectiveness of our work by parallelizing relevant embedded benchmarks on a quad-core Nexus 7 tablet.
Design Automation Conference | 2016
Miguel Angel Aguilar; Rainer Leupers; Gerd Ascheid; Luis Gabriel Murillo
MPSoCs have evolved into heterogeneous architectures, where general purpose processors are combined with accelerators. Directive-based programming models such as the OpenMP 4.0 accelerator model have emerged as an approach to parallelize and offload code regions to accelerators. However, existing compiler technologies have focused mainly on parallelization, leaving the challenging task of offloading code regions to the developers. In this paper, we propose a novel approach that addresses parallelization and offloading jointly. Results show that our approach is able to speed up sequential embedded applications significantly on a commercial heterogeneous MPSoC, which incorporates a quad-core ARM cluster and an octa-core DSP cluster.
2012 5th European DSP Education and Research Conference (EDERC) | 2012
Maximilian Odendahl; Weihua Sheng; Miguel Angel Aguilar; Rainer Leupers; Gerd Ascheid
With the increasing complexity of modern, state-of-the-art Multiprocessor Systems on Chip (MPSoCs), recent trends in embedded software design show a rising interest in using dataflow models of computation for parallel programming. These models of computation not only match the requirements of streaming applications found in the telecommunication, wireless, and multimedia domains, but also provide an easier entry path into dealing with parallelism by reusing large parts of the sequential programming paradigm. As a consequence, an automated approach for bridging the gap between an application's specification in a dataflow model of computation and the actual binary executed on the hardware is highly desirable. In this paper, we present a toolflow that completely automates the process of code generation for Texas Instruments C6000 high-performance multicore Digital Signal Processor (DSP) platforms. The mapping of individual processes to specific cores is achieved by a user-defined mapping file. This automated code generation for streaming applications opens up a wide range of possibilities as a research and educational platform.
Archive | 2019
Rainer Leupers; Miguel Angel Aguilar; Jeronimo Castrillon; Weihua Sheng
The increasing demands on modern embedded systems, such as high performance and energy efficiency, have motivated the use of heterogeneous multi-core platforms enabled by Multiprocessor System-on-Chips (MPSoCs). To fully exploit the power of these platforms, new tools are needed to address the increasing software complexity and achieve high productivity. An MPSoC compiler is a tool-chain that tackles the problems of application modeling, platform description, software parallelization, software distribution, and code generation for an efficient usage of the target platform. This chapter discusses various aspects of compilers for heterogeneous embedded multi-core systems, using the well-established single-core C compiler technology as a baseline for comparison. After a brief introduction to MPSoC compiler technology, the important ingredients of the compilation process are explained in detail. Finally, a number of case studies from academia and industry are presented to illustrate the concepts discussed in this chapter.
Software and Compilers for Embedded Systems | 2015
Miguel Angel Aguilar; Rainer Leupers; Gerd Ascheid; Nikolaos Kavvadias
Multicore Digital Signal Processors (DSPs) have gained relevance in recent years due to the emergence of data-intensive applications, such as wireless communications and multimedia processing on mobile devices, which demand increased computational performance at low cost and power consumption. Programming these platforms is still a major challenge, posing a multitude of software design issues. In this paper, we present a toolflow to guide developers in the process of programming multicore DSPs. We evaluate the applicability of our approach by parallelizing a set of realistic embedded benchmarks on a commercial multicore DSP platform from Texas Instruments.
International Conference on Parallel Architectures and Compilation Techniques | 2015
Miguel Angel Aguilar; Rainer Leupers
The use of Multiprocessor Systems on Chip (MPSoCs) is common practice in the design of state-of-the-art embedded devices, as MPSoCs provide a good trade-off between performance, energy, and cost. However, programming MPSoCs is a challenging task, which currently involves multiple manual steps. Although several research efforts have addressed this challenge, there is not yet a widely accepted solution. In this work, we describe an approach to automatically extract multiple forms of parallelism from sequential embedded applications in a unified manner. We evaluate the applicability of our work by parallelizing multiple embedded applications on two commercial platforms.
2014 6th European Embedded Design in Education and Research Conference (EDERC) | 2014
Miguel Angel Aguilar; Ronny Jimenez; Rainer Leupers; Gerd Ascheid
The complexity of modern applications, their performance requirements, and power constraints are the major driving forces that motivate the use of Multiprocessor Systems on Chip (MPSoCs). Programming these platforms is still a big challenge, posing a multitude of software design questions: What is the right MPSoC programming model to capture parallelism? How can legacy C code be parallelized? How can optimal utilization of processing elements be achieved? How can communication overhead be minimized? How can the vast software mapping design space be explored? Traditional compiler technology does not solve these challenges, as it does not consider the architectural characteristics introduced by MPSoCs. Several research efforts have been directed at addressing these challenges. One example is the MAPS framework (MPSoC Application Programming Studio), which offers facilities for programming heterogeneous and homogeneous MPSoCs. In this paper, we focus on the applicability of this tool to software development on the TI Keystone multicore DSP platforms. The analysis considers both the performance and productivity improvements achieved by MAPS.
Compilers, Architecture, and Synthesis for Embedded Systems | 2017
Miguel Angel Aguilar; Abhishek Aggarwal; Awaid Shaheen; Rainer Leupers; Gerd Ascheid; Jeronimo Castrillon; Liam Fitzpatrick
Parallelizing compilers are a promising solution to tackle key challenges of MPSoC programming. One fundamental aspect of a profitable parallelization is estimating the performance of the applications on the target platforms. There is a wide range of state-of-the-art performance estimation techniques, such as simulation-based and measurement-based approaches. They typically provide performance estimates only at function or basic block granularity. However, MPSoC compilers require performance information at other granularities, such as statement, loop, or even arbitrary code blocks. In this paper, we propose a framework to adapt performance information sources to any granularity required by an MPSoC compiler.
High Performance Computing and Communications | 2015
Miguel Angel Aguilar; Juan Fernando Eusse; Rainer Leupers; Gerd Ascheid; Maximilian Odendahl
Many embedded applications, such as multimedia, signal processing, and wireless communications, exhibit a streaming processing behavior. In order to take full advantage of modern multi- and many-core embedded platforms, these applications have to be parallelized by describing them in a given parallel Model of Computation (MoC). One of the most prominent MoCs is the Kahn Process Network (KPN), as it allows multiple forms of parallelism to be expressed and is suitable for efficient mapping and scheduling onto parallel embedded platforms. However, describing streaming applications manually as a KPN is a challenging task, especially since they spend most of their execution time in loops with an unbounded number of iterations. These loops are in several cases implemented as while loops, which are difficult to analyze. In this paper, we present an approach to guide the derivation of KPNs from embedded streaming applications dominated by multiple types of while loops. We evaluate the applicability of our approach on a commercial embedded platform with eight DSP cores using realistic benchmarks. Results measured on the platform show that we are able to speed up sequential benchmarks on average by a factor of up to 4.3x and in the best case by up to 7.7x. Additionally, to evaluate the effectiveness of our approach, we compared it against a state-of-the-art parallelization framework.
Design, Automation, and Test in Europe | 2017
Miguel Angel Aguilar; Rainer Leupers; Gerd Ascheid; Nikolaos Kavvadias; Liam Fitzpatrick
MPSoC programming is still a challenging task, where several aspects have to be taken into account to achieve a profitable parallel execution. Selecting a proper scheduling policy is an aspect with a major impact on performance. OpenMP is an example of a programming paradigm that allows the scheduling policy to be specified on a per-loop basis. However, choosing the best scheduling policy and the corresponding parameters is not a trivial task. In fact, there is already a large amount of software parallelized with OpenMP where the scheduling policy is not explicitly specified. The scheduling decision is then left to the default runtime, which in most cases does not yield the best performance. In this paper, we present a schedule-aware optimization approach enabled by exploiting the parallel slack in loops parallelized with OpenMP. Results on an embedded multicore device show that OpenMP loops optimized with our approach outperform the original OpenMP loops, where the scheduling policy is not specified, by up to 93%.