
Monica Magalhães Pereira

Federal University of Rio Grande do Norte

Publications

Featured research published by Monica Magalhães Pereira.

Compilers, Architecture, and Synthesis for Embedded Systems | 2011

An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture

Ricardo S. Ferreira; Julio C. Goldner Vendramini; Lucas Mucida; Monica Magalhães Pereira; Luigi Carro

Coarse-grained reconfigurable architecture (CGRA) has emerged as a promising model for embedded systems, as it reduces the complexity of FPGA synthesis and mapping steps and consequently reduces reconfiguration time. Despite these advantages, CGRA usage has been limited by the lack of commercial CGRA circuits. This work proposes a virtual, dynamic CGRA implemented on top of an FPGA. This approach allows the use of commercial off-the-shelf FPGA devices combined with the advantages of CGRAs. The proposed architecture consists of a set of heterogeneous functional units (FUs) and a global interconnection network. The global network allows any FU to be used at each cycle, which significantly reduces placement complexity. In addition, we introduce a polynomial-time mapping algorithm that includes scheduling, placement, and routing steps (SPR). Moreover, the proposed approach performs placement and routing very quickly compared with similar CGRA approaches: the three SPR steps are computed in a few milliseconds. The feasibility of this approach is demonstrated on a suite of digital signal processing benchmarks.

Symposium on Integrated Circuits and Systems Design | 2007

RoSA: a reconfigurable stream-based architecture

Monica Magalhães Pereira; Bruno Cruz de Oliveira; Ivan Saraiva Silva

The increasing complexity of stream-based applications has demanded more flexible hardware able to reach higher performance. Reconfigurable architectures have shown significant progress in exploiting the parallelism of these applications. This paper presents RoSA, a coarse-grained reconfigurable architecture that combines compilation techniques and hardware reuse to accelerate the execution of stream-based applications. The results showed that RoSA achieved performance gains of more than 74% over the code that can be executed concurrently and of 55% with respect to the total cost of the applications.

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2013

A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures

Ricardo S. Ferreira; Vinicius Duarte; Waldir Meireles; Monica Magalhães Pereira; Luigi Carro; Stephan Wong

In the past decade, most solutions for mapping compute-intensive loop kernels to accelerators have used heuristics and compiler-based strategies. This requires most decisions to be taken at design time, precluding efficient solutions that can take run-time information into account. Any success in accelerating such applications depends greatly on two steps: extracting the loops and mapping them onto the architecture. The latter is a challenge in itself, since it is an NP-complete problem. In this paper, we propose a run-time solution that can provide speedups of 3 to 6 orders of magnitude for the mapping step, compared with the state of the art, at minimal performance degradation, through the combined use of three distinct mechanisms: 1) a simple and efficient modulo scheduling heuristic, 2) a crossbar network, which simplifies placement and routing, and 3) a virtual coarse-grained reconfigurable architecture (CGRA). Additionally, since the CGRA is a virtual layer on top of an FPGA, any off-the-shelf FPGA can be used without special tools or IP solutions. Although mapping is NP-complete even for crossbar-based CGRAs, experimental results demonstrate a huge reduction in compilation time: whereas previous solutions require seconds to map the applications, our solution requires only microseconds to find near-optimal schedules. Besides the speedup, the proposed solution enables just-in-time compilation, so it is intrinsically adaptive to a changing scenario.
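
As a rough illustration of why a crossbar simplifies mapping, the Python sketch below schedules a loop body onto identical functional units using a modulo reservation table. It is not the paper's heuristic: it assumes unit-latency operations, an acyclic data flow graph (no loop-carried dependences), and that any operation can run on any FU, so with a full crossbar "placement" reduces to counting busy FUs per slot. The function name `modulo_schedule` is hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter
from math import ceil


def modulo_schedule(deps: dict[str, list[str]], num_fus: int) -> tuple[int, dict[str, int]]:
    """Greedy modulo scheduling sketch for a crossbar-connected CGRA (hypothetical).

    deps maps every operation to its predecessors (acyclic, unit latency).
    Returns the initiation interval (II) and the start cycle of each operation.
    """
    # Resource-constrained lower bound (ResMII): num_fus ops fit in each cycle.
    ii = ceil(len(deps) / num_fus)
    used = Counter()  # modulo reservation table: busy FUs per slot (cycle mod II)
    start: dict[str, int] = {}
    for op in TopologicalSorter(deps).static_order():
        earliest = max((start[p] + 1 for p in deps[op]), default=0)
        # II consecutive cycles cover every slot once; with a crossbar any FU
        # can read any result, so a free FU in the slot is sufficient.
        for t in range(earliest, earliest + ii):
            if used[t % ii] < num_fus:
                used[t % ii] += 1
                start[op] = t
                break
        else:
            raise RuntimeError(f"no free FU slot for {op}")
    return ii, start
```

For example, `modulo_schedule({"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}, num_fus=2)` yields an II of 3 (five operations on two FUs). Because this sketch ignores recurrences, it always succeeds at the resource bound; a real modulo scheduler must also respect the recurrence bound and retry with a larger II when a slot cannot be found.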

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation | 2014

A run-time modulo scheduling by using a binary translation mechanism

Ricardo S. Ferreira; Waldir Denver; Monica Magalhães Pereira; Jorge Quadros; Luigi Carro; Stephan Wong

It is well known that innermost loop optimizations have a large effect on total execution time. Although CGRAs are widely used for this type of optimization, their use at run time has been limited by the overheads introduced by application analysis, code transformation, and reconfiguration, steps that are normally performed at compile time. In this work, we present the first dynamic translation technique for modulo scheduling that can convert binary code on the fly to run on a CGRA. The proposed mechanism ensures software compatibility, as it supports different source ISAs. As a proof of concept of scaling, a change in memory bandwidth was evaluated (from one memory access per cycle to two memory accesses per cycle). Moreover, a comparison with state-of-the-art static compiler-based approaches for inner-loop accelerators was performed using CGRA and VLIW target architectures. Additionally, to measure area and performance, the proposed CGRA was prototyped on an FPGA. The area comparisons show that the crossbar CGRA (with 16 processing elements) is 1.9x larger than a 4-issue VLIW and 1.3x smaller than an 8-issue VLIW softcore processor. In addition, it reaches overall speedup factors of 2.17x and 2.0x compared with the 4-issue and 8-issue processors, respectively. Our results also demonstrate that the run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for an n-issue VLIW processor.

International Journal of Reconfigurable Computing | 2011

Dynamic Reconfigurable Computing: The Alternative to Homogeneous Multicores under Massive Defect Rates

Monica Magalhães Pereira; Luigi Carro

The aggressive scaling of CMOS technology has increased density and allowed the integration of multiple processors into a single chip. Although solutions based on MPSoC architectures can increase application speed through TLP exploitation, this speedup is still limited by the amount of parallelism available in the application, as demonstrated by Amdahl's Law. Moreover, with the continuous shrinking of device features, very aggressive defect rates are expected for new technologies. Under high defect rates, a large number of the MPSoC's processors will be susceptible to defects and will consequently fail, not only reducing yield but also severely affecting the expected performance. This paper presents a run-time adaptive architecture that allows software execution even under aggressive defect rates. The proposed architecture can accelerate not only highly parallel applications but also sequential ones, and it is a heterogeneous solution to overcome the performance penalty imposed on homogeneous MPSoCs under massive defect rates.
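
For reference, Amdahl's Law bounds the speedup S of a program in which a fraction p of the execution time can be parallelized over N processors. The numbers in the second line are purely illustrative (p = 0.9, and a hypothetical loss of half of 16 cores to defects), not results from the paper:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

p = 0.9:\quad S(16) = \frac{1}{0.1 + 0.05625} = 6.4, \qquad S(8) = \frac{1}{0.1 + 0.1125} \approx 4.71, \qquad \lim_{N \to \infty} S(N) = 10
```

Even with unlimited cores the speedup saturates at 1/(1 - p), and every core lost to defects lowers S(N) further, which is the combined penalty the abstract attributes to homogeneous MPSoCs.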

Adaptive Hardware and Systems | 2009

Dynamically Adapted Low-Energy Fault Tolerant Processors

Monica Magalhães Pereira; Luigi Carro

Constant advances in scaling have introduced several issues in the design of processing structures in new technologies. The closer one gets to nanoscale devices, the more necessary it becomes to develop circuits that can tolerate high defect densities. At the same time, beyond area costs, there is pressure to keep energy and power dissipation at acceptable levels, which practically forbids classical redundancy. This paper presents a dynamic solution to provide reliability and reduce the energy of a microprocessor using a dynamically adaptive reconfigurable fabric. The approach combines a binary translation mechanism with the sleep transistor technique to ensure graceful degradation for software applications, while reducing energy by shutting off the power supply of the unused and defective resources of the reconfigurable fabric.

IEEE Transactions on Very Large Scale Integration Systems | 2009

A dynamic reconfiguration approach for accelerating highly defective processors

Monica Magalhães Pereira; Luigi Carro

Advances in the scaling process have brought several challenges concerning the fault tolerance of new technologies. At the nanoscale, the defect rate of contacts and wires is predicted to be around 1% to 15%. At this point, it will be inevitable for designs in future technologies to embed some defect tolerance scheme. The desired solution at the processor level should allow the computer architecture to continue executing software even with the high level of defects that new technologies are expected to introduce. This paper presents an adaptive approach that is capable of guaranteeing not only software execution but also acceleration, even under aggressive defect densities. We propose the use of an on-line binary translation mechanism implemented in a dynamically reconfigurable fabric, exploiting the regularity of the reconfigurable fabric as intrinsic spare parts and trading a small acceleration penalty for quality assurance.
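
To see why a 1% to 15% contact and wire defect rate makes defect tolerance unavoidable, assume, purely for illustration (the abstract states no such model), that each of n contacts or wires in a unit fails independently with probability p. The probability that the unit is defect-free is then:

```latex
Y = (1 - p)^{n}

n = 100:\quad p = 0.01 \;\Rightarrow\; Y = 0.99^{100} \approx 0.366, \qquad p = 0.15 \;\Rightarrow\; Y = 0.85^{100} \approx 8.7 \times 10^{-8}
```

Under this simple model, even a modest unit with 100 contacts is more likely than not to contain a defect at the low end of the predicted range, and is almost certainly defective at the high end.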

Adaptive Hardware and Systems | 2011

Run-time resource instantiation for fault tolerance in FPGAs

Monica Magalhães Pereira; Lars Braun; Michael Hübner; Jürgen Becker; Luigi Carro

The scaling of IC feature sizes has increased integration capability and allowed the design of large systems on a single chip. This improvement has also increased the density of reconfigurable circuits such as the Xilinx Virtex-4 FPGA, which exceeds ten million gates. However, circuit miniaturization will also increase defect and fault rates to such a magnitude that a fault tolerance approach will be mandatory for the proper functioning of any circuit in future technologies. To cope with manufacturing defects and permanent faults in FPGAs, this paper presents an approach that dynamically instantiates resources in non-faulty regions of the FPGA. The run-time control system works around the faulty FPGA region and instantiates the logic and routing units in non-faulty regions. This sustains performance by preserving the amount of available resources. Moreover, the proposed fault tolerance approach is transparent to the user. For each configuration, the control system automatically instantiates the units according to the application's data flow graph and the defect and fault map.
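
The abstract does not detail the control system's algorithm. As a hypothetical illustration of instantiating units according to a data flow graph and a defect and fault map, the Python sketch below greedily places each graph node on a fault-free cell of a grid, preferring cells close to the node's already-placed predecessors. The grid model, the Manhattan-distance cost, and all names (`place_around_faults`, `wire_cost`) are assumptions; routing is not modeled.

```python
from graphlib import TopologicalSorter

Cell = tuple[int, int]


def wire_cost(cell: Cell, preds: list[str], placement: dict[str, Cell]) -> int:
    """Total Manhattan distance from a candidate cell to already-placed predecessors."""
    x, y = cell
    return sum(abs(x - placement[p][0]) + abs(y - placement[p][1]) for p in preds)


def place_around_faults(deps: dict[str, list[str]], width: int, height: int,
                        faulty: set[Cell]) -> dict[str, Cell]:
    """Map each data-flow-graph node to a fault-free cell of a width x height grid.

    deps maps every node to its predecessors; faulty is the defect/fault map.
    """
    # Only cells absent from the defect/fault map are candidates.
    free = [(x, y) for y in range(height) for x in range(width) if (x, y) not in faulty]
    if len(free) < len(deps):
        raise ValueError("not enough fault-free resources for this graph")
    placement: dict[str, Cell] = {}
    # Place producers before consumers so predecessor positions are known.
    for node in TopologicalSorter(deps).static_order():
        best = min(free, key=lambda c: wire_cost(c, deps[node], placement))
        free.remove(best)
        placement[node] = best
    return placement
```

A call such as `place_around_faults({"ld": [], "mul": ["ld"], "add": ["mul", "ld"], "st": ["add"]}, 3, 3, {(1, 0), (1, 1)})` never returns a cell listed in the fault map. A greedy pass like this is fast enough for run-time use but gives no optimality guarantee.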

ACM Symposium on Applied Computing | 2008

Using traditional loop unrolling to fit application on a new hybrid reconfigurable architecture

Monica Magalhães Pereira; Sílvio R. F. de Araújo; Bruno C. de Oliveira; Ivan Saraiva Silva

This paper presents a strategy to modify a sequential implementation of H.264/AVC motion estimation to run on a new reconfigurable architecture called RoSA. The modifications aim to provide more parallelism for the architecture to exploit. The strategy presented in this paper uses traditional loop unrolling and profile information to modify the application and to generate a best-fit solution for the RoSA architecture.
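
Loop unrolling replicates a loop body so that several independent iterations are visible at once. As an illustration only (the paper's motion-estimation code is not reproduced here), the Python sketch below unrolls a sum-of-absolute-differences (SAD) kernel, a common building block of block-matching motion estimation, by a factor of four. Separate accumulators keep the four subtractions independent of one another, and an epilogue handles lengths that are not multiples of four. The function names are hypothetical.

```python
def sad_rolled(cur: list[int], ref: list[int]) -> int:
    """Sum of absolute differences between two pixel rows, one element per iteration."""
    total = 0
    for i in range(len(cur)):
        total += abs(cur[i] - ref[i])
    return total


def sad_unrolled4(cur: list[int], ref: list[int]) -> int:
    """Same result with the loop body replicated four times.

    The four partial sums do not depend on each other, exposing operations
    that a reconfigurable array could execute in parallel.
    """
    n = len(cur)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        acc0 += abs(cur[i] - ref[i])
        acc1 += abs(cur[i + 1] - ref[i + 1])
        acc2 += abs(cur[i + 2] - ref[i + 2])
        acc3 += abs(cur[i + 3] - ref[i + 3])
        i += 4
    total = acc0 + acc1 + acc2 + acc3
    # Epilogue: the n % 4 leftover elements.
    for j in range(i, n):
        total += abs(cur[j] - ref[j])
    return total
```

Both functions compute the same value; for the rows `[1, 2, 3, 4, 5]` and `[5, 4, 3, 2, 1]` each returns 12. The unroll factor trades code size (or, on a reconfigurable fabric, configuration size) for exposed parallelism, which is why profile information helps pick where to apply it.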

Archive | 2013

Adaptability: The Key for Future Embedded Systems

Antonio Carlos Schneider Beck; Carlos Arthur Lang Lisbôa; Luigi Carro; Gabriel L. Nazar; Monica Magalhães Pereira; Ronaldo Rodrigues Ferreira

Conflicting trends can be observed in the hardware industry for embedded systems, which are presently required to run several different applications with distinctive behaviors, thus becoming more heterogeneous. At the same time, users also demand that these systems operate for an extended period of time, creating extra pressure for energy efficiency. As transistor size shrinks, processors are becoming more sensitive to fabrication defects, aging, and soft faults, which increases the costs associated with their production. To make this situation even worse, designers are most of the time stuck with the need to sustain binary compatibility in order to support the huge amount of embedded software already deployed.

In this challenging context, adaptability at multiple levels is the key to sustaining the aforementioned requirements. Embedded systems must adapt themselves to execute their applications better with the lowest possible power dissipation, while respecting their original functional behavior and their set of non-functional constraints (such as maximum execution time or power budget). They must also adapt when scheduling these different applications to be executed on their distinct hardware components, depending on availability, performance requirements, and energy budget, or adapt themselves to keep working when a defect comes from the fabrication process or when a fault appears at run time. High resilience allows increased yield and reduced costs, even with aggressive scaling, the use of unreliable technologies, or operation in harsh environments.

This chapter overviews the toughest challenges that embedded software and hardware engineers face when designing new devices and systems, and how these systems are expected to grow in complexity in the forthcoming years. By the end of this chapter, it will become clear how only aggressive adaptability can tackle these conflicting design constraints in a sustainable fashion while still allowing huge fabrication volumes. Each challenge is developed in detail throughout the following chapters, providing an extensive literature review as well as setting out a promising research agenda for adaptability.
