Victor Wanderley Costa de Medeiros

Publication

Featured publications by Victor Wanderley Costa de Medeiros.

International Journal of High Performance Systems Architecture | 2012

FPGA-based architecture to speed-up scientific computation in seismic applications

Victor Wanderley Costa de Medeiros; Rodrigo Camarotti Ferreira da Rocha; Antonyus Pyetro do Amaral Ferreira; João Paulo Fernandes Barbosa; Abel G. Silva-Filho; Manoel Eusebio de Lima; Thomas Grösser; Wolfgang Rosenstiel

Hardware accelerators such as GPGPUs and FPGAs have been used as an alternative to conventional CPU architectures in scientific computing applications and have shown considerable speed-ups in them. In this context, this work presents an FPGA-based solution that efficiently exploits data reuse and spatial and temporal parallelism for the first computational stage of the reverse time migration (RTM) algorithm, the seismic modelling. We also implemented the same algorithm on several CPU and GPGPU architectures, and our results show that an FPGA-based approach can be a feasible way to improve performance. Experimental results showed performance similar to the GPGPU and a speed-up of up to 28.91 times compared with the CPUs. In terms of energy efficiency, the FPGA is almost 23 times and 1.75 times more efficient than the CPU and GPGPU, respectively. We also discuss other features and possible optimisations that could be added to the proposed architecture to improve its performance further.
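The seismic modelling stage described above amounts to repeatedly time-stepping a finite-difference approximation of the acoustic wave equation. The following is a minimal pure-Python sketch of such a kernel, assuming a constant-density 2D medium, a second-order 5-point spatial stencil, zero-valued boundaries and an impulsive point source. The stencil order, boundary handling, grid spacing, time step and the function names (`laplacian_5pt`, `step`, `model`) are illustrative assumptions, not the parameters or structure of the paper's FPGA design.

```python
"""Minimal 2D acoustic wave modelling (second order in time and space).

Hypothetical sketch: grid size, velocity, dt and h are illustrative values,
not the parameters used in the paper.
"""
from typing import List

Grid = List[List[float]]


def laplacian_5pt(p: Grid, i: int, j: int, h: float) -> float:
    # 5-point stencil approximating d2p/dz2 + d2p/dx2 at interior point (i, j)
    return (p[i + 1][j] + p[i - 1][j] + p[i][j + 1] + p[i][j - 1] - 4.0 * p[i][j]) / (h * h)


def step(p_prev: Grid, p_curr: Grid, vel: Grid, dt: float, h: float) -> Grid:
    """Advance the pressure field one time step; boundary cells stay at zero."""
    nz, nx = len(p_curr), len(p_curr[0])
    p_next = [[0.0] * nx for _ in range(nz)]
    for i in range(1, nz - 1):
        for j in range(1, nx - 1):
            lap = laplacian_5pt(p_curr, i, j, h)
            # Explicit leapfrog update: p(t+dt) = 2p(t) - p(t-dt) + (v*dt)^2 * lap
            p_next[i][j] = 2.0 * p_curr[i][j] - p_prev[i][j] + (vel[i][j] * dt) ** 2 * lap
    return p_next


def model(nz: int, nx: int, steps: int, v: float = 2000.0,
          dt: float = 1e-3, h: float = 10.0) -> Grid:
    # With these defaults v*dt/h = 0.2, below the 2D stability limit 1/sqrt(2)
    vel = [[v] * nx for _ in range(nz)]
    p_prev = [[0.0] * nx for _ in range(nz)]
    p_curr = [[0.0] * nx for _ in range(nz)]
    p_curr[nz // 2][nx // 2] = 1.0  # impulsive point source at the grid centre
    for _ in range(steps):
        p_prev, p_curr = p_curr, step(p_prev, p_curr, vel, dt, h)
    return p_curr
```

Each interior update reads the same five neighbours that the adjacent updates also read, which is the data reuse an accelerator can exploit by keeping recent rows on chip.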

Symposium on Integrated Circuits and Systems Design | 2009

Architecture for dense matrix multiplication on a high-performance reconfigurable system

Viviane Lucy Santos de Souza; Victor Wanderley Costa de Medeiros; Manoel Eusebio de Lima

The recent evolution of programmable logic devices such as FPGAs (Field Programmable Gate Arrays), together with the growing demand for performance in scientific computing applications, has attracted the attention of supercomputer vendors. They have been developing hybrid platforms that link general-purpose processors with FPGA-based co-processors for computing acceleration. In this work we present the analysis and development of an important scientific computing operation, matrix multiplication, targeting the commercial hybrid platform RASC (Reconfigurable Application-Specific Computing) developed by Silicon Graphics. The proposed architecture aims to achieve better performance than conventional architectures while dissipating less power. To achieve this goal, we investigated the parallelism and data reuse intrinsic to the algorithm. Based on this investigation, we propose a case study that uses the resources available on the target platform to exploit these features.
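The parallelism and data reuse intrinsic to matrix multiplication are commonly exposed by tiling the computation so that small blocks of the operands stay in fast local memory (on an FPGA, the internal BRAMs). A minimal sketch of blocked matrix multiplication in Python follows; the function name `matmul_blocked` and the default block size are hypothetical, and this is a generic tiling scheme rather than the architecture proposed for RASC.

```python
"""Blocked (tiled) dense matrix multiplication, C = A x B.

Illustrative sketch only: block size and loop order are assumptions,
not the architecture proposed in the paper.
"""
from typing import List

Matrix = List[List[float]]


def matmul_blocked(a: Matrix, b: Matrix, block: int = 2) -> Matrix:
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            for kk in range(0, k, block):
                # Within one tile triple, each a[i][p] is reused for every j
                # in the tile and each b[p][j] for every i in the tile.
                for i in range(ii, min(ii + block, n)):
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        for j in range(jj, min(jj + block, m)):
                            c[i][j] += a_ip * b[p][j]  # multiply-accumulate
    return c
```

In hardware, the innermost multiply-accumulate would be replicated across parallel MAC units, each fed from the tiles held on chip.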

Symposium on Integrated Circuits and Systems Design | 2008

Implementation of a double-precision multiplier accumulator with exception treatment to a dense matrix multiplier module in FPGA

A. H. C. Barros; Victor Wanderley Costa de Medeiros; Viviane Lucy Santos de Souza; Paulo Sérgio B. Nascimento; Ângelo Mazer; João Paulo Fernandes Barbosa; Bruno Neves; Ismael H. F. dos Santos; Manoel Eusebio de Lima

Recently, supercomputer manufacturers have begun using FPGAs to accelerate scientific applications [16][17]. Traditionally, FPGAs were used only in non-scientific applications, mainly because floating-point computation is complex, FPGA logic cells were insufficient to implement scientific cores, and the complexity of those cores prevented them from operating at high frequencies. Nowadays, the growing availability of specialized blocks for complex operations, such as adder and multiplier blocks implemented directly in the FPGA, together with larger amounts of internal RAM blocks (BRAMs), has made possible high-performance systems that use FPGAs as processing elements for scientific computation [2]. These devices are used as co-processors that execute computation-intensive tasks. The emphasis of these architectures is on exploiting the parallelism and data reuse present in scientific computation operations. Most of these applications rely on operations over large, dense floating-point matrices, which are normally computed with multiply-accumulate (MAC) units. In this work, we describe the architecture of a double-precision floating-point multiplier-accumulator (MAC) compliant with the IEEE-754 standard, and we propose a matrix multiplier architecture that uses instances of the developed MAC and exploits data reuse through the BRAMs (RAM blocks internal to the FPGA) of a Xilinx Virtex-4 LX200 FPGA. The synthesis results showed that the implemented MAC could reach a performance of 4 GFLOPS.
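As a reference model for the exception treatment mentioned in the title, the sketch below accumulates products in IEEE-754 double precision (Python floats) and records `invalid` and `overflow` conditions. It is a simplified software model under stated assumptions: only these two exceptions are tracked (no underflow or inexact detection), NaN inputs propagate without raising a new flag, and the names `mac` and `MacResult` are hypothetical rather than taken from the paper.

```python
"""Software model of a double-precision multiply-accumulate (MAC)
with simplified IEEE-754 exception flags (hypothetical sketch)."""
import math
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class MacResult:
    value: float
    flags: Set[str] = field(default_factory=set)


def mac(pairs: Iterable[Tuple[float, float]], acc: float = 0.0) -> MacResult:
    flags: Set[str] = set()
    for a, b in pairs:
        prod = a * b
        if math.isnan(prod) and not (math.isnan(a) or math.isnan(b)):
            flags.add("invalid")   # e.g. 0 * inf produces NaN
        elif math.isinf(prod) and math.isfinite(a) and math.isfinite(b):
            flags.add("overflow")  # finite operands, infinite product
        new_acc = acc + prod
        if math.isnan(new_acc) and not (math.isnan(acc) or math.isnan(prod)):
            flags.add("invalid")   # e.g. +inf + (-inf) produces NaN
        elif math.isinf(new_acc) and math.isfinite(acc) and math.isfinite(prod):
            flags.add("overflow")  # finite addends, infinite sum
        acc = new_acc
    return MacResult(acc, flags)
```

A model like this can serve as a golden reference when checking a hardware MAC's outputs and flags on the same operand streams.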

Archive | 2013

High Performance Implementation of RTM Seismic Modeling on FPGAs: Architecture, Arithmetic and Power Issues

Victor Wanderley Costa de Medeiros; A. H. C. Barros; Abel G. Silva-Filho; Manoel Eusebio de Lima

This work presents a case study in the oil and gas industry: the FPGA implementation of the 2D reverse time migration (RTM) seismic modeling algorithm. FPGAs have been widely used as accelerators in scientific computing applications that require massive data processing, large parallel machines, huge memory bandwidth and power. The RTM algorithm directly solves the acoustic and elastic wave problems with precision in complex geological structures, which demands high computational power. To face these challenges, strategies such as reduced arithmetic precision based on fixed-point numbers and a highly parallel architecture are suggested. The effects of reduced precision for data storage and processing are analyzed in this chapter through the signal-to-noise ratio (SNR) and universal image quality index (UIQI) metrics. The results show that an SNR higher than 50 dB can be considered acceptable for a migrated image with a 15-bit word size. A stream-processing architecture designed to achieve the best possible data reuse for the algorithm is also presented; it was implemented with a FIFO-based cache in the internal memory of the FPGA. A temporal pipeline structure was also developed, allowing multiple time steps to be processed at the same time. The main advantage of this approach is that it keeps the memory bandwidth requirement equal to that of processing a single time step. The number of time steps processed simultaneously is limited by the amount of FPGA internal memory and logic blocks. The algorithm was implemented on an Altera Stratix 260E with 16 processing elements (PEs). The FPGA was 29 times faster than the CPU and only 13% slower than the GPGPU. In terms of power consumption, the CPU+FPGA system was 1.7 times more efficient than the GPGPU system.
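The sketch below illustrates how the word size of a fixed-point representation affects SNR, computed as 10·log10(Σ r² / Σ (r − q)²) between a reference signal r and its quantized version q. It assumes values normalized to [-1, 1), a signed format with one sign bit, round-to-nearest with saturation, and a synthetic sine signal. The paper evaluates SNR (and UIQI) on the migrated image, so the numbers printed here are not comparable to its 50 dB threshold; the functions `quantize` and `snr_db` are hypothetical.

```python
"""Effect of fixed-point word size on SNR (illustrative sketch)."""
import math
from typing import List


def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1) to a signed fixed-point value with `bits` total bits."""
    scale = 2 ** (bits - 1)             # one sign bit, bits-1 fractional bits
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate instead of wrapping around
    return q / scale


def snr_db(reference: List[float], approx: List[float]) -> float:
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    ref = [0.9 * math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
    for bits in (8, 12, 15, 16):
        q = [quantize(v, bits) for v in ref]
        print(f"{bits:2d} bits: SNR = {snr_db(ref, q):.1f} dB")
```

Each extra bit roughly halves the quantization step, so SNR grows by about 6 dB per bit; this is the trade-off behind choosing the smallest word size whose image quality is still acceptable.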

2011 Simpósio em Sistemas Computacionais | 2011

FPGA-based Accelerator to Speed-up Seismic Applications

Victor Wanderley Costa de Medeiros; R.C.F. Rocha; A.P.A. Ferreira; J.C.B.L. Correia; J.P.F. Barbosa; Abel G. Silva-Filho; Manoel Eusebio de Lima; Rodrigo Gandra; Ricardo Braganca

Hardware accelerators such as GPGPUs and FPGAs have been used as an alternative to conventional CPUs in scientific computing applications and have shown significant performance improvements. In this context, this work presents an FPGA-based solution that efficiently exploits data reuse and parallelization in both the space and time domains for the first computational stage of the RTM (Reverse Time Migration) algorithm, the seismic modeling. We also implemented the same algorithm for CPU and GPGPU architectures, and our results demonstrate that the FPGA-based approach can be a viable way to improve performance. Experimental results show a speedup of 1.668 times compared with the GPGPU and 25.79 times compared with the CPU. Results were evaluated with the Marmousi velocity model, using the same parameters in all approaches.
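One common way to obtain this kind of spatial data reuse in a streaming stencil, in the spirit of the FIFO-based cache mentioned for the 2013 chapter listed above, is a line buffer: grid points arrive once in row-major order, and a FIFO holding two rows plus one sample provides all five neighbours of a 5-point stencil. This Python sketch assumes a 5-point stencil and row-major streaming; `stream_laplacian` is a hypothetical name, and time-domain parallelism (chaining several such stages into a pipeline) is not shown.

```python
"""Streaming 5-point stencil with a FIFO line buffer (illustrative sketch)."""
from collections import deque
from typing import Iterable, Iterator, Tuple


def stream_laplacian(stream: Iterable[float], nx: int) -> Iterator[Tuple[int, int, float]]:
    """Consume a grid of width nx in row-major order and yield
    (i, j, north + south + west + east - 4*centre) for interior points.
    Each input value is read from the stream exactly once."""
    fifo: deque = deque(maxlen=2 * nx + 1)  # two full rows plus one sample
    for pos, value in enumerate(stream):
        fifo.append(value)                  # oldest sample drops out automatically
        if len(fifo) < fifo.maxlen:
            continue                        # still filling the line buffer
        i, j = divmod(pos - nx, nx)         # centre is nx samples behind the newest
        if 1 <= j <= nx - 2:                # skip left/right boundary columns
            north, south = fifo[0], fifo[2 * nx]
            west, centre, east = fifo[nx - 1], fifo[nx], fifo[nx + 1]
            yield i, j, north + south + west + east - 4.0 * centre
```

For example, `list(stream_laplacian(flat_grid, nx))` returns one result per interior point while touching external memory only once per input sample; replicating the arithmetic for several adjacent centres would add spatial parallelism.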

International Conference on Parallel Processing | 2012

Energy Estimation Tool FPGA-based Approach for Petroleum Industry

Gilliano Ginno Silva de Menezes; Abel G. Silva-Filho; Viviane Lucy Santos de Souza; Victor Wanderley Costa de Medeiros; Manoel Eusebio de Lima; Rodrigo Gandra; Ricardo Braganca

Energy consumption is one of the greatest challenges in high-performance processing on large clusters that continuously run certain applications. Seismic migration applications are typical targets of this type of processing, since they require complex models that run continuously to evaluate the drilling of petroleum wells. This work describes a tool for analyzing the energy consumption of a seismic application running on an FPGA architecture, applied in a real Brazilian industrial setting. A comparative study with traditional multi-core and GPGPU architectures was performed, and the results indicate an efficiency per joule about 23 and 1.5 times higher, respectively. Experiments performed with the Marmousi model reveal an error of about 3.7% compared with measured values.
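The efficiency-per-joule comparison and the estimation error reduce to simple arithmetic: energy is average power times runtime, efficiency is work divided by energy, and the error is the relative difference between estimated and measured values. A minimal sketch follows; the function names are hypothetical and the numbers in the usage note are placeholders, not data from the paper.

```python
"""Work-per-joule comparison and estimate-vs-measurement error (sketch)."""


def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    # Energy consumed at a constant average power over the run
    return avg_power_w * runtime_s


def efficiency_per_joule(work_units: float, avg_power_w: float, runtime_s: float) -> float:
    """Work completed (e.g. grid-point updates) per joule consumed."""
    return work_units / energy_joules(avg_power_w, runtime_s)


def relative_error(estimated: float, measured: float) -> float:
    # Fractional deviation of an energy estimate from a measured value
    return abs(estimated - measured) / abs(measured)
```

For example, a device that completes the same work at one quarter of the power and in half the time uses one eighth of the energy, so it is 8 times more efficient per joule.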

IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Poster: high performance FPGA-based implementation of the seismic modeling of the RTM algorithm

Victor Wanderley Costa de Medeiros; Rodrigo Camarotti Ferreira da Rocha; Antonyus Pyetro do Amaral Ferreira; Bruno Holanda Tavares Charamba Dutra; A. H. C. Barros; João Cleber Bezerra Liborio Correia; João Paulo Fernandes Barbosa; Severino José de Barros-Junior; Gilliano Ginno Silva de Menezes; Abel G. Silva-Filho; Manoel Eusebio de Lima

Hardware accelerators such as GPGPUs and FPGAs have been used as an alternative to conventional computing architectures (CPUs) in scientific computing applications and have shown considerable speed-ups. In this context, this poster presents a solution that takes advantage of FPGA flexibility to efficiently exploit data reuse and parallelization in both the time and space domains for the first processing stage of the RTM (Reverse Time Migration) algorithm, the seismic modeling. To obtain a benchmark for our FPGA implementation, we also implemented the same algorithms for CPU and GPGPU architectures. Our results showed that FPGAs are a feasible platform for this set of applications. The experimental results showed a 1.67x speed-up compared with a Tesla C1060 GPGPU and a 25.79x speed-up compared with an AMD Athlon 64 X2 CPU.

International Journal of Modeling and Simulation for the Petroleum Industry | 2011

An Experimental Cluster Based on FPGA Accelerators Nodes for Floating-Point Arithmetic Applications

Bruno Holanda; Rodrigo Pimentel; João Paulo Fernandes Barbosa; A. H. C. Barros; Bruno Pessoa; Ismael H. F. dos Santos; Victor Wanderley Costa de Medeiros; Viviane Lucy Santos de Souza; Abel G. Silva-Filho; Manoel Eusebio de Lima

Archive | 2008

TPTCGen: A Tool for Temporal Partitioning and Tasks Design Exploration for Massive Data Applications in High Performance Reconfigurable Computers

Paulo S. Brandão Nascimento; Manoel Eusebio de Lima; Victor Wanderley Costa de Medeiros; Ismael H. F. dos Santos

International Journal of Modeling and Simulation for the Petroleum Industry | 2007

Reconfigurable Platforms for High Performance Processing

Paulo Sérgio B. Nascimento; Jordana Seixas; Edson Barbosa; Stelita Silva; Abner Correa; Viviane Lucy; Victor Wanderley Costa de Medeiros; Arthur Rolim; Dercy Lima; Manoel Eusebio de Lima