Marcelo Mandelli
Pontifícia Universidade Católica do Rio Grande do Sul
Publication
Featured research published by Marcelo Mandelli.
symposium on integrated circuits and systems design | 2011
Marcelo Mandelli; Alexandre M. Amory; Luciano Ost; Fernando Gehm Moraes
Task mapping defines the best placement of a given task in the MPSoC according to some criterion, such as energy or Manhattan distance minimization. The ITRS roadmap forecasts MPSoCs with hundreds of processing elements (PEs) in the near future. Therefore, dynamic mapping heuristics are required. An important gap is observed in the mapping literature: the lack of proposals targeting multi-task dynamic mapping. In this context, the present work proposes an energy-aware dynamic task mapping heuristic that allows multiple tasks to be allocated per PE. Experiments are executed on an actual MPSoC running distributed applications. Comparing single-task to multi-task mapping, the energy spent in the NoC is reduced on average by 51% (best case: 72%), with an average execution time overhead of 18%. Besides the communication energy reduction, multi-task mapping enables a greater number of applications to execute simultaneously, or the use of smaller MPSoCs, which reduces the system cost.
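The idea of placing a task by Manhattan distance, with more than one task allowed per PE, can be illustrated with a minimal Python sketch. It is not the heuristic from the paper: the greedy rule, the volume-weighted hop cost, and every name here (`map_task`, `slots_per_pe`, `volumes`) are assumptions made for illustration, and a 2D-mesh NoC is assumed.

```python
from typing import Dict, Optional, Tuple

PE = Tuple[int, int]  # (x, y) position in an assumed 2D-mesh NoC


def manhattan(a: PE, b: PE) -> int:
    """Hop count between two PEs on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_task(
    task: str,
    volumes: Dict[str, int],   # peer task -> data volume exchanged (hypothetical input)
    placement: Dict[str, PE],  # tasks already mapped -> their PE
    mesh: Tuple[int, int],     # (width, height)
    slots_per_pe: int,         # 1 = single-task mapping, >1 = multi-task mapping
) -> PE:
    """Greedily place `task` on the PE with a free slot that minimizes
    the volume-weighted hop count to its already-mapped peers."""
    # Count how many tasks each PE currently hosts.
    load: Dict[PE, int] = {}
    for pe in placement.values():
        load[pe] = load.get(pe, 0) + 1

    best_pe: Optional[PE] = None
    best_cost: Optional[int] = None
    for x in range(mesh[0]):
        for y in range(mesh[1]):
            pe = (x, y)
            if load.get(pe, 0) >= slots_per_pe:
                continue  # PE has no free slot
            cost = sum(
                vol * manhattan(pe, placement[peer])
                for peer, vol in volumes.items()
                if peer in placement
            )
            if best_cost is None or cost < best_cost:
                best_pe, best_cost = pe, cost

    if best_pe is None:
        raise RuntimeError("no free PE slot")
    placement[task] = best_pe
    return best_pe
```

With `slots_per_pe=1` a task communicating with a peer at `(0, 0)` lands one hop away; with `slots_per_pe=2` it can share the peer's PE, so the traffic never enters the NoC. That is the intuition behind the communication energy reduction reported for multi-task mapping.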
international symposium on circuits and systems | 2011
Marcelo Mandelli; Luciano Ost; Everton Alceu Carara; Guilherme Montez Guindani; Thiago Gouvea; G. Medeiros; Fernando Gehm Moraes
To cope with the dynamic workload of actual NoC-based MPSoCs, dynamic mechanisms are required to guarantee the application requirements. Application mapping may drastically influence system performance and energy consumption, which can be crucial to the success (or failure) of a product, even more so for battery-powered embedded systems. In this context, the current work presents an energy-aware dynamic task mapping heuristic, which was evaluated on a real NoC-based MPSoC platform. Results show that the proposed heuristic may reduce communication energy consumption by up to 22.8% compared to other dynamic mapping heuristics.
ieee computer society annual symposium on vlsi | 2013
Guilherme M. Castilhos; Marcelo Mandelli; Guilherme A. Madalozzo; Fernando Gehm Moraes
Scalability is an important issue in large MPSoCs. MPSoCs may execute several applications in parallel, with dynamic workload and tight QoS constraints. Thus, the MPSoC management must be distributed to cope with such constraints. This paper presents distributed resource management for NoC-based MPSoCs, using a clustering method that enables the modification of the cluster size at runtime. This work addresses the following distributed techniques: task mapping, monitoring, and task migration. Results show an important reduction in the total execution time of applications, a reduced number of hops between tasks (smaller communication energy), and a reclustering method through monitoring and task migration.
ACM Transactions on Embedded Computing Systems | 2013
Luciano Ost; Marcelo Mandelli; Gabriel Marchesan Almeida; Leandro Möller; Leandro Soares Indrusiak; Gilles Sassatelli; Pascal Benoit; Manfred Glesner; Michel Robert; Fernando Gehm Moraes
The mapping of tasks to processing elements of an MPSoC has a critical impact on system performance and energy consumption. To cope with the complex dynamic behavior of applications, it is common to perform task mapping at runtime so that the utilization of processors and interconnect can be taken into account when deciding the allocation of each task. This paper has two major contributions, one of them targeting the general problem of evaluating dynamic mapping heuristics in NoC-based MPSoCs, and another focusing on the specific problem of finding a task mapping that optimizes energy consumption in those architectures.
Journal of Systems Architecture | 2016
Guilherme M. Castilhos; Marcelo Mandelli; Luciano Ost; Fernando Gehm Moraes
Executed at runtime. The proposed approach can better manage time-varying workloads and system changes.
Hierarchical mapping approach. The proposed approach is implemented in a many-core system managed in a hierarchical way. Such hierarchical system management improves system scalability by dividing the system into regions, each one with a manager responsible for actions inside it. Further, it reduces the computational effort of the mapping decision without compromising system performance.
Leads to better system reliability. The proposed approach aims to improve energy balancing, which is directly related to better system reliability.
Hierarchical energy monitoring. The proposed approach does not employ physical sensors, which would increase area and energy costs, in the mapping decision. The energy data is obtained at runtime using a hierarchical monitoring approach.
Clock-cycle model for validation. The proposed mapping approach is validated in a large many-core system (up to 256 processing elements), modeled in SystemC.
This work addresses a research subject with a rich literature: task mapping in NoC-based systems. Task mapping is the process of selecting a processing element to execute a given task. The number of cores in many-core systems increases the complexity of the task mapping. The main concerns in task mapping in large systems include (i) scalability; (ii) dynamic workload; and (iii) reliability. It is necessary to distribute the mapping decision across the system to ensure scalability. The workload of emerging many-core systems may be dynamic, i.e., new applications may start at any moment, leading to different mapping scenarios. Therefore, it is necessary to execute the mapping process at runtime to support a dynamic workload assignment. The workload assignment plays an important role in many-core system reliability. Load imbalance may generate hotspot zones and, consequently, thermal implications. More recently, task mapping techniques aiming at improving system reliability have been proposed in the literature. However, such approaches rely on centralized mapping decisions, which are not scalable. To address these challenges, the main goal of this work is to propose a hierarchical runtime mapping heuristic, which provides scalability and a fair workload distribution. Distributing the workload inside the system increases system reliability in the long term, due to the reduction of hotspot regions. The proposed mapping heuristic considers the application workload as a function of the energy consumed in the processors and NoC routers. The proposal adopts a hierarchical energy monitoring scheme, able to estimate at runtime the consumption at each processing element. The mapping uses the energy estimated by the monitoring scheme to guide the mapping decision. Results compare the proposal against a mapping heuristic whose main cost function minimizes the communication energy. Results obtained in large systems, up to 256 cores, show improvements in the workload distribution (average value 59.2%) and a reduction in the maximum energy values spent by the processors (average value 32.2%). Such results demonstrate the effectiveness of the proposal.
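A two-level, energy-guided mapping decision of the kind described above might be sketched as follows. This is only an illustration under stated assumptions, not the published heuristic: the selection rule (least total energy per region, then least energy per PE), the data layout, and the name `pick_pe` are all hypothetical, and the energy values stand in for whatever the monitoring scheme would report.

```python
from typing import Dict, Tuple

PE = Tuple[int, int]


def pick_pe(
    energy: Dict[str, Dict[PE, float]],  # region name -> estimated energy per PE (assumed input)
    free: Dict[PE, bool],                # PE -> whether it can host another task
) -> Tuple[str, PE]:
    """Two-level choice: first the region with the lowest total estimated
    energy among those with a free PE, then the lowest-energy free PE in it."""
    # Keep only regions that still have at least one free PE.
    candidates = {
        name: [pe for pe in pes if free.get(pe, False)]
        for name, pes in energy.items()
    }
    candidates = {name: pes for name, pes in candidates.items() if pes}
    if not candidates:
        raise RuntimeError("no free PE in any region")

    # Global level: compare regions by their summed energy estimate.
    region = min(candidates, key=lambda name: sum(energy[name].values()))
    # Local level: the region picks its least-loaded free PE.
    pe = min(candidates[region], key=lambda p: energy[region][p])
    return region, pe
```

Splitting the decision this way means the global step only compares one number per region, while the per-PE comparison stays local, which is one plausible reading of why a hierarchical organization reduces the computational effort of the mapping decision.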
symposium on integrated circuits and systems design | 2015
Marcelo Mandelli; Guilherme M. Castilhos; Gilles Sassatelli; Luciano Ost; Fernando Gehm Moraes
Investigating novel techniques to improve the lifetime, reliability, and thermal management of many-core embedded systems is a fundamental challenge for the semiconductor industry. Imbalanced mapping of applications may considerably affect system performance and lifetime due to thermal issues in an integrated circuit (e.g., hotspot zones). Traditional mapping techniques focus on local optimizations, e.g., minimizing the number of hops between communicating tasks, which may lead to hotspot zones and underutilization of some processing resources. This paper proposes a runtime mapping heuristic whose cost function targets temporal workload and energy consumption balance in large-scale systems. The proposed heuristic minimizes the occurrence of hotspots by distributing application workload onto the processing elements in a uniform way, which contributes to a balanced thermal distribution across the system. These features improve system reliability and postpone aging effects. Results with several benchmarks executing in a cycle-accurate platform model show a uniform system utilization when comparing the proposed heuristic to conventional mapping approaches.
ACM Transactions on Reconfigurable Technology and Systems | 2012
Luciano Ost; Sameer Varyani; Leandro Soares Indrusiak; Marcelo Mandelli; Gabriel Marchesan Almeida; Eduardo Wächter; Fernando Gehm Moraes; Gilles Sassatelli
This article explores the use of virtualization to enable mechanisms like task migration and dynamic mapping in heterogeneous MPSoCs, thereby targeting the design of systems capable of adapting their behavior to time-changing workloads. Because tasks may have to be mapped to target processors with different instruction set architectures, we propose the use of Low Level Virtual Machine (LLVM) to postcompile the tasks at runtime depending on their target processor. A novel dynamic mapping heuristic is also proposed, aiming to exploit the advantages of specialized processors while taking into account the overheads imposed by virtualization. Extensive experimental work at different levels of abstraction (FPGA prototype, RTL, and system-level simulation) is presented to evaluate the proposed techniques.
international conference on electronics, circuits, and systems | 2012
Marcelo Mandelli; Guilherme M. Castilhos; Fernando Gehm Moraes
The constant growth in the number of cores in SoCs raises an important issue: scalability. NoC-based MPSoCs offer scalability at the hardware level. However, the management of the MPSoC resources also requires scalable methods to effectively extract the computational power offered by dozens of processors. State-of-the-art proposals adopt different approaches to tackle this problem, with MPSoC clustering being the most common. The present work proposes a distributed mapping approach using a clustering method, with the main goal of evaluating its pros and cons. Evaluation is carried out using cycle-accurate simulation in large MPSoCs (up to 144 processors). Results show an important reduction in the total execution time of the applications running in the MPSoC, even if some processors are reserved for resource management.
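To make the cost of reserving processors for management concrete, here is a small Python sketch of a fixed-size clustering of a 2D mesh. The layout is an assumption for illustration only: rectangular clusters that tile the mesh exactly, one manager PE per cluster placed at the cluster's origin corner, and the names `cluster_of`, `manager_of`, and `worker_count` are hypothetical.

```python
from typing import Tuple

PE = Tuple[int, int]  # (x, y) position in an assumed 2D mesh


def cluster_of(pe: PE, cluster_w: int, cluster_h: int) -> Tuple[int, int]:
    """Index of the cluster that contains `pe`."""
    return (pe[0] // cluster_w, pe[1] // cluster_h)


def manager_of(pe: PE, cluster_w: int, cluster_h: int) -> PE:
    """Manager PE of the cluster containing `pe` (assumed to sit at the
    cluster's origin corner)."""
    cx, cy = cluster_of(pe, cluster_w, cluster_h)
    return (cx * cluster_w, cy * cluster_h)


def worker_count(mesh_w: int, mesh_h: int, cluster_w: int, cluster_h: int) -> int:
    """PEs left for application tasks when each cluster reserves one manager.
    Assumes the mesh dimensions are multiples of the cluster dimensions."""
    clusters = (mesh_w // cluster_w) * (mesh_h // cluster_h)
    return mesh_w * mesh_h - clusters
```

Under these assumptions, a 12x12 mesh (144 PEs) split into 4x4 clusters has 9 managers and 135 PEs left for tasks; the abstract's claim is that execution time still improves despite such a reservation.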
symposium on integrated circuits and systems design | 2011
Luciano Ost; Marcelo Mandelli; Gabriel Marchesan Almeida; Leandro Soares Indrusiak; Leandro Möller; Manfred Glesner; Gilles Sassatelli; Michel Robert; Fernando Gehm Moraes
The power evaluation of NoC-based MPSoCs is an important and time-consuming process. Mapping tasks onto processing elements (PEs) has a critical impact on system performance, as well as power dissipation. To cope with the complex dynamic behavior of applications, it is common to perform task mapping at runtime so that the utilization of processors and interconnect can be taken into account when deciding the most appropriate PE to host tasks. On the other hand, accurately comparing different mapping heuristics can be very costly, since each adopted solution has to be evaluated using simulation that can take hours or even days in the case of large MPSoCs. In this context, this paper has two major contributions: (i) evaluation of dynamic mapping by employing a model-based framework that unifies abstract models of applications, NoC-based platforms, and mapping heuristics, and (ii) power consumption evaluation of such heuristics by using a rate-based power model.
reconfigurable communication centric systems on chip | 2011
Luciano Ost; Gabriel Marchesan Almeida; Marcelo Mandelli; Eduardo Wächter; Sameer Varyani; Gilles Sassatelli; Leandro Soares Indrusiak; Michel Robert; Fernando Gehm Moraes
This paper proposes a novel strategy for enabling dynamic task mapping on heterogeneous NoC-based MPSoC architectures. The solution considers three different platforms with different area constraints and applications with distinct efficiency characteristics. We propose a solution that uses a unified model-based framework, which is calibrated according to area information obtained from FPGA synthesis. In addition, we present the performance of various applications running on different processors on FPGAs, aiming to obtain application efficiency characteristics for calibrating the proposed high-level model. The paper also presents three different scenarios and discusses the reduction in energy consumption as well as the end-to-end communication cost for different applications such as MPEG and ADPCM, among other multimedia benchmarks.