João Canas Ferreira
University of Porto
Publication
Featured research published by João Canas Ferreira.
international test conference | 1993
José Silva Matos; Ana C. Leão; João Canas Ferreira
Boundary-scan test (BST) is a well-established standard and testability framework for digital ICs and boards. The paper presents a test support IC controlled by an IEEE 1149.1 interface, capable of providing access to analog nodes in mixed-signal boards. The proposed architecture (ABSINT - Analog to Boundary Scan Interface) is described and relevant implementation issues are discussed. A demonstrator IC implementing the ABSINT architecture is presented, and it is shown how it can be used to provide analog test channels under control of IEEE 1149.1.
Journal of Systems Architecture | 2006
Miguel L. Silva; João Canas Ferreira
Run-time partial reconfiguration of programmable hardware devices can be applied to enhance many applications in high-end embedded systems, particularly those that employ recent platform FPGAs. The effective use of this approach is often hampered by the complexity added to the system development process and by limited tool support. The paper is concerned with several aspects related to the effective exploitation of run-time partial reconfiguration, with particular emphasis on the generation of partial configurations and the run-time utilisation of the reconfigurable resources. The paper presents an approach inspired by traditional software development: partial configurations are produced by assembling components from a previously created library, thus enabling the embedded application developer to produce the configuration data required for run-time modifications with less effort than is needed with the conventional design flow. A tool that supports this approach is also described. A second set of issues is addressed by a run-time support library that provides facilities for managing the hardware reconfiguration process and the communication with the reconfigured circuits. The use of run-time partial reconfiguration requires a high level of system support. The paper describes one possible approach, presenting a demonstration system developed to support the present work and characterising its performance. To clarify the advantages of the approach to run-time reconfiguration discussed in the paper, two small case studies are described, the first on the use of dedicated datapaths for subword operations and the second on two-dimensional pattern-matching for bilevel images. Timing measurements for both cases are included.
field-programmable logic and applications | 2008
Miguel L. Silva; João Canas Ferreira
The paper presents a method for generating partial bitstreams on-line for use in systems with run-time reconfigurable FPGAs. Bitstream creation is performed at run-time by merging partial bitstreams from individual component modules. The process includes the capability to create connections between the modules by selection from a set of routes found during an off-line pre-processing step. Placement and interconnection of modules must follow a precise set of rules. While restricting the number of possible module arrangements, this approach allows bitstream creation to be performed with relatively few computational resources. Using a demonstration system with a Virtex-II Pro FPGA with a PowerPC 405 CPU, the process of creating at run-time a partial bitstream for 22% of the device area takes 24 ms.
International Journal of Reconfigurable Computing | 2013
João Bispo; Nuno Miguel Cardanha Paulino; João M. P. Cardoso; João Canas Ferreira
The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 to 3.69 were achieved for the best alternative using a MicroBlaze processor with local memory.
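As a rough illustration of what a trace-based loop looks like, the following minimal Python sketch searches an instruction-address trace for a sequence that repeats back to back. It is not the authors' Megablock extraction algorithm; the function name `find_repeating_pattern`, its parameters, and the example addresses are all hypothetical.

```python
def find_repeating_pattern(trace, max_len=8, min_reps=3):
    """Return (start, pattern, reps) for the first address sequence that
    repeats consecutively at least min_reps times, or None if none exists.

    Illustrative only: a simple brute-force scan, not the published method.
    """
    n = len(trace)
    for start in range(n):
        for length in range(1, max_len + 1):
            pattern = trace[start:start + length]
            if len(pattern) < length:
                break  # not enough trace left for a pattern of this length
            reps = 1
            pos = start + length
            # Count how many times the pattern repeats immediately after itself.
            while trace[pos:pos + length] == pattern:
                reps += 1
                pos += length
            if reps >= min_reps:
                return start, pattern, reps
    return None


# Hypothetical trace: two set-up instructions, a 3-instruction loop body
# executed three times, then an exit instruction.
example_trace = [0x00, 0x04,
                 0x10, 0x14, 0x18,
                 0x10, 0x14, 0x18,
                 0x10, 0x14, 0x18,
                 0x20]
result = find_repeating_pattern(example_trace)
```

In this example the scan reports the three-address body starting at index 2. A repeating sequence of this kind is the sort of candidate that could be handed to a coprocessor instead of being executed instruction by instruction.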
reconfigurable computing and fpgas | 2012
Pedro Vieira dos Santos; José Carlos Alves; João Canas Ferreira
Cellular Genetic Algorithms (cGAs) exhibit a natural parallelism that makes them interesting candidates for hardware implementation, as several processing elements can operate simultaneously on subpopulations shared among them. This paper presents a scalable architecture for a cGA, suitable for FPGA implementation. A regular array of custom designed processing elements (PEs) works on a population of solutions that is spread into dual-port memory blocks locally shared by adjacent PEs. A travelling salesman problem with 150 cities was used to verify the implementation of the proposed cGA on a Virtex-6 FPGA, using a population of 128 solutions with different levels of parallelism (1, 4, 16 and 64 PEs). The results show that increasing the number of PEs does not degrade the quality of the convergence of the iterative process, and that the throughput increases almost linearly with the number of PEs. Compared with a software implementation running on a PC, the cGA with 64 PEs achieved a 45x speedup.
Reconfigurable Computing-From FPGAs to Hardware/Software Codesign. Ed.: J. M. P. Cardoso | 2011
João M. P. Cardoso; Pedro C. Diniz; Zlatko Petrov; Koen Bertels; Michael Hübner; Hans van Someren; Fernando M. Gonçalves; José Gabriel F. Coutinho; George A. Constantinides; Bryan Olivier; Wayne Luk; Juergen Becker; Georgi Kuzmanov; Florian Thoma; Lars Braun; Matthias Kühnle; Razvan Nane; Vlad Mihai Sima; Kamil Krátký; José Carlos Alves; João Canas Ferreira
The relentless increase in capacity of Field-Programmable Gate-Arrays (FPGAs) has made them vehicles of choice for both prototypes and final products requiring on-chip multi-core, heterogeneous and reconfigurable systems. Multiple cores can be embedded as hard- or soft-macros, have customizable instruction sets, multiple distributed RAMs and/or configurable interconnections. Their flexibility allows them to achieve orders of magnitude better performance than conventional computing systems via customization. Programming these systems, however, is extremely cumbersome and error-prone, and as a result their true potential is very often achieved only at unreasonably high design effort. This project covers developing, implementing and evaluating a novel compilation and synthesis system approach for FPGA-based platforms. We rely on Aspect-Oriented Specifications to convey critical domain knowledge to a mapping engine while preserving the advantages of a high-level imperative programming paradigm in early software development as well as program and application portability. We leverage Aspect-Oriented specifications and a set of transformations to generate an intermediate representation suitable for hardware mapping. A programming language, LARA, will allow the exploration of alternative architectures and design patterns enabling the generation of flexible hardware cores that can be easily incorporated into larger multi-core designs. We will evaluate the effectiveness of the proposed approach using partner-provided codes from the domain of audio processing and real-time avionics. We expect the technology developed in REFLECT to be integrated by our industrial partners, in particular by ACE, a leading compilation tool supplier for embedded systems, and by Honeywell, a worldwide solution supplier of embedded high-performance systems.
IET Computers and Digital Techniques | 2007
Miguel L. Silva; João Canas Ferreira
A tool called BITLINKER, which creates partially reconfigurable modules from the bit-streams of individual components, is described. It is also capable of performing restricted component placement and interconnect routing between the assembled components. The resulting modules are used in applications that exploit partial dynamic reconfiguration. The tool is integrated in a design flow particularly aimed at dynamically reconfigurable platform field-programmable gate arrays (FPGAs). The associated development design flow and a run-time support system that can be used to manage module activation and data communication are described. Evaluation results obtained with a Virtex-II Pro system are also reported.
reconfigurable computing and fpgas | 2011
João Bispo; Nuno Miguel Cardanha Paulino; João M. P. Cardoso; João Canas Ferreira
This paper presents an offline tool-chain which automatically extracts loops (Megablocks) from MicroBlaze instruction traces and creates a tailored Reconfigurable Processing Unit (RPU) for those loops. The system moves loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries. The system was implemented in an FPGA and, for the tested kernels, measured speedups ranged between 3.9x and 18.2x for a MicroBlaze CPU without cache. We estimate speedups from 1.03x to 2.01x when comparing to the best estimated performance achieved with a single MicroBlaze.
field-programmable logic and applications | 2013
Pedro Vieira dos Santos; José Carlos Alves; João Canas Ferreira
The genetic algorithm (GA) is an optimization metaheuristic that relies on the evolution of a set of solutions (population) according to genetically inspired transformations. In the variant of this technique called cellular GA, the evolution is done separately for subgroups of solutions. This paper describes a hardware framework capable of efficiently supporting custom accelerators for this metaheuristic. This approach builds a regular array of problem-specific processing elements (PEs), which perform the genetic evolution, connected to shared memories holding the local subpopulations. To assist the design of the custom PEs, a methodology based on high-level synthesis from C++ descriptions is used. The proposed architecture was applied to a spectrum allocation problem in cognitive radio networks. For an array of 5×5 PEs in a Virtex-6 FPGA, the results show a minimum speedup of 22× compared to a software version running on a PC and a speedup near 2000× over a MicroBlaze soft processor.
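The idea of evolving solutions locally can be illustrated with a minimal software sketch of a textbook cellular GA. This is not the hardware architecture described in the paper: it assumes one individual per grid cell, a toroidal von Neumann neighbourhood, and a toy bit-counting fitness function; all names and parameter values (`onemax`, `cga_step`, `run_cga`, `p_mut`) are hypothetical.

```python
import random


def onemax(bits):
    """Toy fitness: number of 1 bits (a stand-in for a real problem)."""
    return sum(bits)


def neighbours(r, c, rows, cols):
    """Von Neumann neighbourhood (up, down, left, right) on a toroidal grid."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]


def cga_step(grid, fitness, rng, p_mut=0.05):
    """One synchronous generation: each cell mates only with its neighbours."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            me = grid[r][c]
            # Local selection: the fittest neighbour becomes the mate.
            mate = max((grid[nr][nc] for nr, nc in neighbours(r, c, rows, cols)),
                       key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, len(me))
            child = me[:cut] + mate[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            # Replace the cell only if the child is at least as fit.
            if fitness(child) >= fitness(me):
                new_grid[r][c] = child
    return new_grid


def run_cga(rows=5, cols=5, length=32, generations=50, seed=0):
    """Evolve a random grid; return the final grid and the best fitness per generation."""
    rng = random.Random(seed)
    grid = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(cols)]
            for _ in range(rows)]
    history = [max(onemax(ind) for row in grid for ind in row)]
    for _ in range(generations):
        grid = cga_step(grid, onemax, rng)
        history.append(max(onemax(ind) for row in grid for ind in row))
    return grid, history


final_grid, best_per_generation = run_cga()
```

Because each cell reads only its four neighbours, the inner loop body is independent across cells within a generation, which is the property that makes this family of algorithms a natural fit for an array of parallel processing elements.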
international parallel and distributed processing symposium | 2005
João Canas Ferreira; Miguel Mira da Silva
We report on work in progress that aims to provide a run-time management kernel for applications running on FPGAs with embedded CPUs. We describe the global concept, the organization of the hardware environment for the reconfigurable modules and the reconfiguration strategy supported by the run-time management kernel. Practical issues concerning the implementation of the system on a Virtex-II Pro-based board are also addressed.