Pedro Tomás | Researchain

Publication

Featured research published by Pedro Tomás.

Catena | 1995

Characterization of raindrop size distributions at the Vale Formoso Experimental Erosion Center

Miguel Azevedo Coutinho; Pedro Tomás

Previous studies carried out by the authors, together with other references collected from the literature, have shown the inadequacy of most rainfall erosivity indexes, and especially the Wischmeier rainfall erosivity index, for properly estimating erosivity in southern Portugal and in Mediterranean climates. In order to measure raindrop size distributions of natural rainfall automatically and continuously, a device was installed at the Vale Formoso Experimental Erosion Center. The main objective was to correlate field measurements with model estimates. The relatively small amounts of rain which fell at the station last year meant that it was only possible to check the calibration of the device; preliminary results are presented here. Classical raindrop distributions are compared with the data, and relationships between rainfall intensity and kinetic energy of rainfall are obtained and compared with that proposed by RUSLE.

IEEE Transactions on Circuits and Systems | 2005

Visual neuroprosthesis: a non invasive system for stimulating the cortex

Moisés Piedade; José A. B. Gerald; Leonel Sousa; Gonçalo Nuno Gomes Tavares; Pedro Tomás

This paper describes a complete wireless visual neuroprosthesis system designed to restore useful visual sense to profoundly blind people. This visual neuroprosthesis performs intracortical microstimulation through one or more arrays of microelectrodes implanted into the primary visual cortex. The whole system is composed of a primary unit located outside the body and a secondary unit implanted inside the body. The primary unit comprises a neuromorphic encoder, a forward transmitter, and a backward receiver. The developed neuromorphic encoder generates the spikes to stimulate the cortex by approximating the spatio-temporal receptive fields characteristic response of ganglion cells. Power and stimuli information are carried into the cranium by means of a low-coupling transformer, which establishes a wireless inductive link between the two units. The secondary unit comprises a forward receiver, microelectrode stimulation circuitry and a backward transmitter that is used to monitor the implant. Address event representation is used for communicating spike events. Data is modulated with binary frequency-shift keying in the forward direction and differential binary phase-shift keying in the backward direction. A prototype of the proposed system was developed and tested. Experimental results show that the spikes to stimulate the visual cortex are accurately generated and that the efficiency of the inductive link is relatively high, about 28% on average for a 1 cm intercoil distance, providing a power of about 50 milliwatts to the secondary implanted unit. Application-specific integrated circuits were designed for this secondary unit, showing that, with current technology, it is possible to implement such a unit while respecting the power constraints.

European Conference on Parallel Processing | 2014

SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores

Luís Taniça; Aleksandar Ilic; Pedro Tomás; Leonel Sousa

Accurate characterization of modern systems and applications requires run-time and simultaneous assessment of several execution-related parameters. Although hardware monitoring facilities in modern multi-cores allow low-level profiling, it is not always easy to convert the acquired data into insightful information. For this, a low-overhead monitoring tool (SchedMon) is proposed herein, which relies on hardware facilities and interacts with the operating system scheduler to capture the run-time behavior of single and multi-threaded applications, even in the presence of nested parallelism. By tracking the attainable performance, power and energy consumption of monitored applications, SchedMon also allows their insightful characterization with the Cache-aware Roofline model. In addition, the proposed tool monitors applications either in their entirety or at the level of function calls, without requiring any changes to the original source code. Experimental results show that SchedMon introduces negligible execution overheads, while capturing the interference of several co-scheduled SPEC2006 applications.

ACM Transactions on Architecture and Code Optimization | 2016

A Framework for Application-Guided Task Management on Heterogeneous Embedded Systems

Francisco Gaspar; Luis Taniça; Pedro Tomás; Aleksandar Ilic; Leonel Sousa

In this article, we propose a general framework for fine-grain application-aware task management in heterogeneous embedded platforms, which allows the integration of different mechanisms for efficient resource utilization, frequency scaling, and task migration. The proposed framework incorporates several components for accurate runtime monitoring by relying on the OS facilities and performance self-reporting for parallel and iterative applications. The framework efficiency is experimentally evaluated on a real hardware platform, where significant power and energy savings are attained for SPEC CPU2006 and PARSEC benchmarks, by guiding frequency scaling and intercluster migrations according to the runtime application behavior and predefined performance targets.

Application-Specific Systems, Architectures and Processors | 2013

BioBlaze: Multi-core SIMD ASIP for DNA sequence alignment

Nuno Neves; Nuno Sebastião; Andre Patricio; David Martins de Matos; Pedro Tomás; Paulo F. Flores; Nuno Roma

A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequence alignment is proposed in this manuscript. This architecture achieves high processing throughputs by exploiting both fine and coarse-grained parallelism. The former is achieved by extending the Instruction Set Architecture (ISA) of a synthesizable processor to include multiple specialized SIMD instructions that implement vector-vector and vector-scalar arithmetic, logic, load/store and control operations. Coarse-grained parallelism is achieved by using multiple cores to cooperatively align multiple sequences in a shared memory architecture, comprising proper hardware-specific synchronization mechanisms. To ease the programming, a compilation framework based on an adaptation of the GCC back-end was also implemented. The proposed system was prototyped and evaluated on a Xilinx Virtex-7 FPGA, achieving a 200 MHz working frequency. A sequential and a state-of-the-art SIMD implementation of the Smith-Waterman algorithm were programmed on both the proposed ASIP and an Intel Core i7 processor. When comparing the achieved speedups, it was observed that the proposed ISA achieves a 40x speedup, which contrasts with the 11x speedup provided by SSE2 on the Intel Core i7 processor. The scalability of the multi-core system was also evaluated and proved to scale almost linearly with the number of cores.
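For readers unfamiliar with the Smith-Waterman algorithm mentioned in this abstract, the following is a minimal scalar sketch of its local-alignment score recurrence. It is not the ASIP or SSE2 implementation from the paper; the function name, the linear gap penalty, and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Illustrative scalar version with a linear gap penalty; the scoring
    values are assumed defaults, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal are independent of one another. That independence is the kind of fine-grained parallelism that SIMD implementations typically exploit.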

IEEE Transactions on Very Large Scale Integration Systems | 2015

Multicore SIMD ASIP for Next-Generation Sequencing and Alignment Biochip Platforms

Nuno Neves; Nuno Sebastião; David Martins de Matos; Pedro Tomás; Paulo F. Flores; Nuno Roma

Targeting the development of new biochip platforms capable of autonomously sequencing and aligning biological sequences, a new multicore processing structure is proposed in this manuscript. This multicore structure makes use of a shared memory model and multiple instantiations of a novel application-specific instruction-set processor (ASIP) to simultaneously exploit both fine and coarse-grained parallelism and to achieve high performance levels at low-power consumption. The proposed ASIP is built by extending the instruction set architecture of a synthesizable processor, including both general and special-purpose single-instruction multiple-data instructions. This allows an efficient exploitation of fine-grained parallelism on the alignment of biological sequences, achieving over 30× speedup when compared with sequential algorithmic implementations. The complete system was prototyped on different field-programmable gate array platforms and synthesized with a 90-nm CMOS process technology. Experimental results demonstrate that the multicore structure scales almost linearly with the number of instantiated cores, achieving performances similar to a quad-core Intel Core i7 3820 processor, while using 25× less energy.

International Conference on Parallel Processing | 2013

Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems

João Colaço; Adrian Matoga; Aleksandar Ilic; Nuno Roma; Pedro Tomás; Ricardo Chaves

Transparent application acceleration in heterogeneous systems can be performed by automatically intercepting shared library calls and by efficiently orchestrating the execution across all processing devices. To fully exploit the available computing power, the intercepted calls must be replaced with faster accelerator-based implementations and intelligent scheduling algorithms must be incorporated. In contrast to previous approaches, the framework proposed herein not only transparently intercepts and redirects the library calls, but also incorporates state-of-the-art scheduling algorithms, for both divisible and indivisible applications. When compared with highly optimized implementations for multi-core CPUs (e.g., MKL and FFTW), the obtained experimental results demonstrate that, by applying appropriate light-weight scheduling and load-balancing mechanisms, performance speedups as high as 7.86 (matrix multiplication) and 4.6 (FFT) can be achieved.
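To make the idea of scheduling a divisible application concrete, here is a minimal sketch of one common load-balancing heuristic: splitting a workload across devices in proportion to their measured throughput. This is not the scheduling algorithm used in the paper; the function name, the device labels, and the throughput figures are hypothetical.

```python
def split_divisible_work(total_items, throughputs):
    """Split total_items across devices proportionally to throughput.

    throughputs maps a device label to a measured rate (items/second).
    Illustrative heuristic only; names and values are assumptions.
    """
    total_rate = sum(throughputs.values())
    if total_rate <= 0:
        raise ValueError("at least one device must have positive throughput")
    # Ideal (fractional) share for each device
    exact = {dev: total_items * rate / total_rate
             for dev, rate in throughputs.items()}
    # Round down, then hand the leftover items to the devices
    # with the largest fractional remainders
    shares = {dev: int(x) for dev, x in exact.items()}
    leftover = total_items - sum(shares.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares
```

For example, `split_divisible_work(100, {"cpu": 1.0, "gpu": 3.0})` assigns 25 items to the CPU and 75 to the GPU, so both devices would finish at roughly the same time if the measured rates hold.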

Computer and Information Technology | 2010

Efficient Independent Component Analysis on a GPU

Rui Ramalho; Pedro Tomás; Leonel Sousa

Several problems in the signal processing field require generating suitable representations of data. One possible form of representation is given by independent component analysis (ICA). The computation of these representations can be quite expensive, especially for large data sizes. Over the last few years, graphics processing units (GPUs) have emerged as inexpensive general-purpose computation accelerators. This paper presents an implementation of FastICA, an ICA algorithm, on a multicore GPU. The resulting implementation achieved an overall speedup of 55 for estimating 256 independent components, each with 1000 samples, relative to an implementation on a general-purpose processor running at 2 GHz.

Computers & Geosciences | 2015

Acceleration of stochastic seismic inversion in OpenCL-based heterogeneous platforms

Tomás Ferreirinha; Ruben Nunes; Leonardo Azevedo; Amílcar Soares; Frederico Pratas; Pedro Tomás; Nuno Roma

Seismic inversion is an established approach to model the geophysical characteristics of oil and gas reservoirs, being one of the bases of the decision-making process in the oil and gas exploration industry. However, the required accuracy levels can only be attained by processing significant amounts of data, often leading to long execution times. To overcome this issue and to allow the development of larger and higher resolution elastic models of the subsurface, a novel parallelization approach is herein proposed, targeting the exploitation of GPU-based heterogeneous systems through a unified OpenCL programming framework, to accelerate a state-of-the-art Stochastic Seismic Amplitude versus Offset Inversion algorithm. To increase the parallelization opportunities while ensuring model fidelity, the proposed approach is based on a careful and selective relaxation of some spatial dependencies. Furthermore, to take into consideration the heterogeneity of modern computing systems, usually composed of several different accelerating devices, multi-device parallelization strategies are also proposed. When executed on a dual-GPU system, the proposed approach reduces the execution time by up to 30 times, without compromising the quality of the obtained models.

Highlights:

Novel approach to accelerate a Stochastic Seismic AVO Inversion algorithm.

Exploitation of GPU-based heterogeneous systems based on a unified OpenCL framework.

Multi-device parallelization strategies to tackle system heterogeneity.

The adopted parallelization strategy ensures the quality of the inversion results.

Performance speedup as high as 30× is obtained with a dual-GPU system.

International Conference on Parallel Processing | 2013

Monitoring Performance and Power for Application Characterization with the Cache-Aware Roofline Model

Diogo Antão; Luís Taniça; Aleksandar Ilic; Frederico Pratas; Pedro Tomás; Leonel Sousa

Accurate on-the-fly characterization of application behaviour requires assessing a set of execution-related parameters at runtime, including performance, power and energy consumption. These parameters can be obtained by relying on hardware measurement facilities built into modern multi-core architectures, such as performance and energy counters. However, current operating systems (OSs) do not provide the means to directly obtain these characterization data. Thus, the user needs to rely on complex custom-built libraries with limited capabilities, which might introduce significant execution and measurement overheads. In this work, we propose two different techniques for efficient performance, power and energy monitoring on systems with modern multi-core CPUs. These take the form of two monitoring tools that capture the run-time behaviour of a wide range of applications at different system levels: (i) at the user-space level, and (ii) at the kernel level, by using the OS scheduler to directly capture this information. Although the importance of the proposed monitoring facilities is patent for many purposes, we focus herein on their use for application characterization with the recently proposed Cache-aware Roofline model.
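As a rough illustration of how monitored counters feed a Roofline-style characterization, the sketch below computes the attainable performance bound as the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity, with one bandwidth ceiling per memory level. The function names and all numeric values are hypothetical placeholders, not measurements or code from the paper.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity is in flops/byte, bandwidth in GB/s, result in GFLOP/s.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def roofline_ceilings(intensity, peak_gflops, level_bandwidths):
    """Evaluate the bound for each memory level (e.g. L1, L2, DRAM).

    level_bandwidths maps a level name to its bandwidth in GB/s;
    the level names and values passed in are assumptions.
    """
    return {level: attainable_gflops(intensity, peak_gflops, bw)
            for level, bw in level_bandwidths.items()}
```

An application whose measured performance sits well below the ceiling for the memory level it mostly hits has headroom for optimization, while one close to the ceiling is limited by that resource.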
