Ricardo S. Ferreira | Researchain

Publication

Featured research published by Ricardo S. Ferreira.

compilers, architecture, and synthesis for embedded systems | 2011

An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture

Ricardo S. Ferreira; Julio C. Goldner Vendramini; Lucas Mucida; Monica Magalhães Pereira; Luigi Carro

Coarse-grained reconfigurable architectures (CGRAs) have emerged as a promising model for embedded systems because they reduce the complexity of the FPGA synthesis and mapping steps and, consequently, the reconfiguration time. Despite these advantages, CGRA usage has been limited by the lack of commercial CGRA circuits. This work proposes a virtual and dynamic CGRA implemented on top of an FPGA. This approach allows the use of commercial off-the-shelf FPGA devices combined with the advantages of CGRAs. The proposed architecture consists of a set of heterogeneous functional units (FUs) and a global interconnection network. The global network allows any FU to be used at each cycle, which significantly reduces the placement complexity. In addition, we introduce a polynomial mapping algorithm that includes scheduling, placement and routing (SPR) steps. Moreover, the proposed approach performs placement and routing very quickly in comparison to similar CGRA approaches: the three SPR steps are computed in a few milliseconds. The feasibility of this approach is demonstrated for a suite of digital signal processing benchmarks.

international parallel and distributed processing symposium | 2009

A low cost and adaptable routing network for reconfigurable systems

Ricardo S. Ferreira; Marcone Laure; Antonio Carlos Schneider Beck; Thiago Berticelli Lo; Mateus B. Rutzig; Luigi Carro

Scalability, parallelism and fault tolerance are key features for taking advantage of the latest advances in silicon technology, which is why reconfigurable architectures are in the spotlight. However, one of the major problems in designing reconfigurable and parallel processing elements is the design of a cost-effective interconnection network. Since Multistage Interconnection Networks (MINs) have been used successfully at several computer system levels and in several applications, in this work we propose the use of a MIN, at the word level, in a coarse-grained reconfigurable architecture. More precisely, this work presents a novel parallel self-placement and routing mechanism for MINs in circuit-switching mode. We take into account one-to-one as well as multicast (one-to-many) permutations. Our approach is scalable and is targeted at run-time environments where dynamic routing among functional units is required. In addition, our algorithm is embedded in the switch structure and is independent of the interstage interconnection pattern. Our approach can handle blocking and non-blocking networks, and symmetrical or asymmetrical topologies. As a case study, we use the proposed technique in a dynamic reconfigurable system, showing a major area reduction of 30% without performance overhead.
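To make the idea of routing inside the switches of a MIN concrete, here is a minimal Python sketch of classic destination-tag routing in an omega network, one well-known blocking MIN. It is not the self-placement and routing mechanism of the paper; the omega topology, the function names `route_omega` and `route_all`, and the first-come conflict policy are all assumptions chosen for illustration. A multicast is modelled as several (source, destination) pairs that may share links driven by the same source.

```python
def route_omega(src, dst, n_bits):
    """Return the line occupied after each stage when routing src -> dst.

    Assumes an omega network with 2**n_bits lines and n_bits stages of
    2x2 switches, each stage preceded by a perfect shuffle.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line index left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The switch selects its upper or lower output from one
        # destination bit, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def route_all(connections, n_bits):
    """Route (src, dst) pairs in order; return (routed, blocked) lists."""
    owner = {}  # (stage, line) -> source currently driving that line
    routed, blocked = [], []
    for src, dst in connections:
        links = list(enumerate(route_omega(src, dst, n_bits)))
        # A link is usable if it is free or already driven by the same
        # source (which is what allows one-to-many connections).
        if any(owner.get(link, src) != src for link in links):
            blocked.append((src, dst))
            continue
        for link in links:
            owner[link] = src
        routed.append((src, dst))
    return routed, blocked
```

For example, `route_all([(0, 0), (4, 1), (0, 1)], 3)` routes both connections from source 0 and reports `(4, 1)` as blocked, because it needs the same first-stage output line as `(0, 0)`.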

ieee computer society annual symposium on vlsi | 2007

A Polynomial Placement Algorithm for Data Driven Coarse-Grained Reconfigurable Architectures

Ricardo S. Ferreira; Alisson Garcia; Tiago Teixeira; João M. P. Cardoso

Coarse-grained reconfigurable computing architectures vary widely in the number and characteristics of the processing elements (cells) and routing topologies used. In order to exploit several different topologies, a place and route framework able to deal with such a vast design exploration space is of paramount importance. Bearing this in mind, this paper proposes a placement scheme able to target different topologies when considering data-driven reconfigurable architectures. Our approach uses graph models for the target architecture and for the dataflow representation of the application being mapped. Our placement algorithm is guided by a depth-first traversal in both the architecture and the application graphs. Two versions of the placement algorithm, with O(e) and O(e + n^3) computational complexities respectively, are presented, where e is the number of edges in the dataflow representation of the application and n is the number of cells in the graph model of the architecture. The experimental results show that our approach can be useful for exploiting different interconnect topologies in coarse-grained reconfigurable computing architectures.
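The following Python sketch illustrates what a placement guided by a depth-first traversal of two graphs can look like. It is a simplified illustration under stated assumptions, not either of the two algorithms from the paper, and it makes no claim to match their complexities. The names `dfs_place` and `unrouted_edges`, the dictionary-based graph encoding, and the fallback to any free cell are hypothetical choices; only operations reachable from the chosen root are placed.

```python
def dfs_place(app, arch, root_op, root_cell):
    """Place operations of `app` on distinct cells of `arch`.

    app  : dict op -> list of successor ops (dataflow graph)
    arch : dict cell -> list of neighbouring cells (architecture graph)
    Returns a dict op -> cell for every op reachable from root_op.
    """
    placement = {root_op: root_cell}
    used = {root_cell}

    def visit(op):
        for succ in app.get(op, []):
            if succ in placement:
                continue
            # Prefer a free neighbour of the producer's cell, so the
            # edge op -> succ maps onto a direct architecture link.
            free = [c for c in arch[placement[op]] if c not in used]
            # Assumed fallback: take any free cell and leave the edge
            # to a later routing step.
            candidates = free or [c for c in arch if c not in used]
            if not candidates:
                raise ValueError("fewer cells than operations")
            placement[succ] = candidates[0]
            used.add(candidates[0])
            visit(succ)  # go deeper before handling the next successor

    visit(root_op)
    return placement


def unrouted_edges(app, arch, placement):
    """List dataflow edges whose endpoints are not on neighbouring cells."""
    return [(u, v) for u, succs in app.items() for v in succs
            if placement[v] not in arch[placement[u]]]
```

On a 2x2 mesh `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}` and the diamond-shaped graph `{"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}`, starting from `"a"` on cell 0, every dataflow edge lands on a direct link, so `unrouted_edges` returns an empty list.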

reconfigurable computing and fpgas | 2006

Mesh Mapping Exploration for Coarse-Grained Reconfigurable Array Architectures

Marcos Vinícius G. Barbosa da Silva; Ricardo S. Ferreira; Alisson Garcia; João M. P. Cardoso

Coarse-grained reconfigurable array architectures are currently the focus of intensive research. They have already demonstrated performance improvements and energy savings over traditional architectures. However, coarse-grained arrays vary widely in the number and characteristics of the processing elements and routing topologies used. This work presents a flexible mapping environment for design space exploration of coarse-grained, data-driven, reconfigurable array architectures. The mapping environment takes advantage of Java and XML technologies to enable an efficient architectural trade-off analysis. This approach focuses on neither a specific mapping algorithm nor a specific architecture, but on an open environment where users can add their own mapping algorithms and architecture patterns. A genetic algorithm for placement is presented. A number of DSP benchmarks are used to explore a range of mesh architectures and to validate the approach. The experiments show a fast, scalable and flexible mapping environment for exploring new mesh array patterns and homogeneous and heterogeneous architectures.

international conference on embedded computer systems architectures modeling and simulation | 2013

A just-in-time modulo scheduling for virtual coarse-grained reconfigurable architectures

Ricardo S. Ferreira; Vinicius Duarte; Waldir Meireles; Monica Magalhães Pereira; Luigi Carro; Stephan Wong

In the past decade, most solutions for mapping compute-intensive loop kernels to accelerators have used heuristics and compiler-based strategies. These approaches require most decisions to be taken at design time, thus precluding efficient solutions that can take run-time information into account. Any success in accelerating such applications greatly depends on two steps: extracting the loops and mapping them onto the architecture. This last step is a challenge in itself, since it is an NP-complete problem. In this paper, we propose a run-time solution that can provide speedups of 3 to 6 orders of magnitude for the mapping step when compared to the state of the art, at minimal performance degradation, through the combined use of three distinct mechanisms: 1) a simple and efficient modulo scheduling heuristic, 2) a crossbar network, which simplifies the placement and routing, and 3) a virtual coarse-grained reconfigurable architecture (CGRA). Additionally, since the CGRA is a virtual layer on top of an FPGA, it is possible to use any off-the-shelf FPGA without the need for special tools or IP solutions. Although the mapping is NP-complete even for crossbar-based CGRAs, experimental results demonstrate a huge reduction in compilation time: whereas previous solutions require seconds to map the applications, our solution requires only microseconds to find near-optimal schedules. Besides the speedup, the proposed solution enables the use of just-in-time compilation, and hence is intrinsically adaptive to a changing scenario.
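As a rough illustration of why a crossbar simplifies the mapping step, the Python sketch below implements a greedy modulo scheduler with a modulo reservation table. It is not the heuristic from the paper. It assumes identical functional units, unit-latency operations, no loop-carried dependences, and a full crossbar, so that scheduling an operation only requires a free unit in the chosen slot and placement and routing become trivial. The function name `modulo_schedule` and its argument layout are hypothetical.

```python
from math import ceil


def modulo_schedule(ops, deps, n_fus):
    """Greedy modulo scheduling on `n_fus` identical functional units.

    ops  : operation names in topological order
    deps : dict op -> list of predecessor ops (same loop iteration only)
    Returns (ii, start) with the initiation interval and start cycles.
    """
    # Resource-constrained lower bound on the initiation interval.
    ii = max(1, ceil(len(ops) / n_fus))
    # Modulo reservation table: number of busy units in each slot.
    mrt = [0] * ii
    start = {}
    for op in ops:
        # Earliest cycle allowed by the predecessors (unit latency).
        t = max((start[p] + 1 for p in deps.get(op, [])), default=0)
        # Slide forward until the slot t mod ii has a free unit. Total
        # capacity ii * n_fus covers all ops, so this always terminates.
        while mrt[t % ii] >= n_fus:
            t += 1
        mrt[t % ii] += 1
        start[op] = t
    return ii, start
```

For the chain `ld -> mul -> add -> st` on two units, the sketch returns an initiation interval of 2 with start cycles 0, 1, 2 and 3, so a new loop iteration can begin every two cycles. Under these assumptions the resource bound is always achievable; a realistic scheduler must also account for recurrences, operation latencies and heterogeneous units, which can force a larger interval.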

international conference on embedded computer systems architectures modeling and simulation | 2005

Data-driven regular reconfigurable arrays: design space exploration and mapping

Ricardo S. Ferreira; João M. P. Cardoso; Andre Toledo; Horácio C. Neto

This work presents further enhancements to an environment for exploring coarse-grained reconfigurable data-driven array architectures suitable for implementing data-stream applications. The environment takes advantage of Java and XML technologies to enable architectural trade-off analysis. The flexibility of the approach in accommodating different topologies and interconnection patterns is shown by a first mapping scheme. Three benchmarks from the DSP domain, mapped onto hexagonal and grid architectures, are used to validate our approach and to establish comparison results.

field-programmable logic and applications | 2004

An Environment for Exploring Data-Driven Architectures

Ricardo S. Ferreira; João M. P. Cardoso; Horácio C. Neto

A wide range of reconfigurable coarse-grain architectures has been proposed in recent years for an extensive set of applications. These architectures vary widely in the interconnectivity, number, granularity and complexity of the processing elements (PEs). The performance of a specific application usually depends heavily on how well the PEs suit the particular tasks involved, but tools for experimenting efficiently with architectural features are lacking. This work proposes an environment for the exploration and simulation of coarse-grain reconfigurable data-driven architectures. The proposed environment takes advantage of Java and XML technologies to enable a very efficient backend for experiments with different architectural trade-offs, from the array connectivity and topology to the granularity and complexity of each PE. As a proof of concept, we show results from implementing different versions of a FIR filter on a hexagonal data-driven array.

international conference on embedded computer systems architectures modeling and simulation | 2014

A run-time modulo scheduling by using a binary translation mechanism

Ricardo S. Ferreira; Waldir Denver; Monica Magalhães Pereira; Jorge Quadros; Luigi Carro; Stephan Wong

It is well known that innermost loop optimizations have a large effect on total execution time. Although CGRAs are widely used for this type of optimization, their usage at run-time has been limited by the overheads introduced by application analysis, code transformation, and reconfiguration. These steps are normally performed at compile time. In this work, we present the first dynamic translation technique for the modulo scheduling approach that can convert binary code on-the-fly to run on a CGRA. The proposed mechanism ensures software compatibility, as it supports different source ISAs. As a proof of concept of scaling, a change in the memory bandwidth has been evaluated (from one memory access per cycle to two memory accesses per cycle). Moreover, a comparison with state-of-the-art static compiler-based approaches for inner loop accelerators has been carried out using CGRA and VLIW as target architectures. Additionally, to measure area and performance, the proposed CGRA was prototyped on an FPGA. The area comparisons show that the crossbar CGRA (with 16 processing elements) is 1.9x larger than a 4-issue VLIW softcore processor and 1.3x smaller than an 8-issue one. In addition, it reaches an overall speedup factor of 2.17x and 2.0x in comparison to the 4-issue and 8-issue processors, respectively. Our results also demonstrate that the run-time algorithm can reach a near-optimal ILP rate, better than an off-line compiler approach for an n-issue VLIW processor.

Revista Brasileira de Educação Médica | 2014

As redes neurais artificiais e o ensino da medicina (Artificial neural networks and medical education)

Rodrigo Siqueira-Batista; Rodrigo Roger Vitorino; Andréia Patrícia Gomes; Alcione de Paiva Oliveira; Ricardo S. Ferreira; Vanderson Esperidião-Antonio; Luiz Alberto Santana; Fabio Ribeiro Cerqueira

The transformations that medical practice has undergone in recent years - especially with the incorporation of new information technologies - point to the need to broaden discussions on the teaching-learning process in medical education. The use of new computer technologies in medical education has shown many advantages in the acquisition of problem-solving skills, encouraging creativity, critical thinking, curiosity and scientific spirit. In this context, it is important to highlight artificial neural networks (ANNs) - computer systems with a mathematical structure inspired by the human brain - which have proved useful in the evaluation process and in the acquisition of knowledge among medical students. The purpose of this communication is to review aspects of the application of ANNs in medical education.

international symposium on quality electronic design | 2000

Probabilistic bottom-up RTL power estimation

Ricardo S. Ferreira; Anne-Marie Trullemans; José Carlos Costa; José C. Monteiro

We address the problem of power estimation at the register-transfer level (RTL). At this level, the circuit is described in terms of a set of interconnected memory elements and combinational modules of different degrees of complexity. We propose a bottom-up approach to create a simplified high-level model of the block behavior for power estimation, which is described by a symbolic local polynomial. We use an efficient gate-level modeling based on the polynomial simulation method and ZBDDs. We present a set of experimental results that show a large improvement in performance and robustness when compared to previous approaches.
