
Publication


Featured research published by Samuel Xavier-de-Souza.


IEEE Embedded Systems Letters | 2018

The IoT Energy Challenge: A Software Perspective

Kyriakos Georgiou; Samuel Xavier-de-Souza; Kerstin Eder

The Internet of Things (IoT) sparks a whole new world of embedded applications. Most of these applications are based on deeply embedded systems that have to operate on limited or unreliable sources of energy, such as batteries or energy harvesters. Meeting the energy requirements for such applications is a hard challenge, which threatens the future growth of the IoT. Software has the ultimate control over hardware. Therefore, its role is significant in optimizing the energy consumption of a system. Currently, programmers have no feedback on how their software affects the energy consumption of a system. Such feedback can be enabled by energy transparency, a concept that makes a program’s energy consumption visible, from hardware to software. This letter discusses the need for energy transparency in software development and emphasizes how such transparency can be realized to help tackle the IoT energy challenge.


Microprocessors and Microsystems | 2015

Optimal processor dynamic-energy reduction for parallel workloads on heterogeneous multi-core architectures

Carlos Avelino de Barros; Luiz F. Q. Silveira; Carlos Valderrama; Samuel Xavier-de-Souza

With the increase in the number of cores in processor chips observed in recent years, design choices, such as the number of cores per chip, the amount of resources per core, and whether to design homogeneous or heterogeneous chips, need to be given proper support. Several studies on heterogeneous multi-core processors are concerned with performance improvements. In this work, we propose mathematical models to analyze some of these design issues with a focus on the reduction of processor dynamic energy. In particular, these models allow the comparison of the dynamic-energy consumption of multi-core architectures when they execute a workload in the same amount of time while allowing different core operating frequencies between the compared architectures. The results of the analysis allow chip designers to choose the right conditions for optimal energy savings in heterogeneous multi-core chips based on the parallel fraction of the workloads and on the distribution of resources among the cores in the chip. Under a simplified context, the devised models agree with the consolidated knowledge that heterogeneous multi-core chips have a considerable advantage over homogeneous multi-core and single-core architectures in terms of energy efficiency.
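The kind of comparison described above can be illustrated with a toy dynamic-energy model (a minimal sketch under simplifying assumptions, not the paper's actual equations): by Amdahl's law a multi-core chip can meet a single-core deadline at a reduced frequency, and if voltage scales linearly with frequency, per-cycle dynamic energy scales as f², giving a closed-form energy ratio.

```python
# Toy model: same workload W, same deadline W / f_single; the
# multi-core chip runs slower and therefore spends less dynamic
# energy per cycle (~ V^2 ~ f^2, assuming V scales with f).

def required_frequency(f_single, serial_frac, cores):
    """Frequency needed on `cores` cores to match the single-core
    deadline, by Amdahl's law."""
    return f_single * (serial_frac + (1.0 - serial_frac) / cores)

def dynamic_energy_ratio(serial_frac, cores):
    """E_multi / E_single for the same work and the same deadline."""
    scale = required_frequency(1.0, serial_frac, cores)
    return scale ** 2  # per-cycle energy ~ f^2, total cycles unchanged

# A workload that is 5% serial on 8 cores needs ~17% of the
# single-core frequency and a small fraction of its dynamic energy.
freq = required_frequency(1.0, 0.05, 8)
ratio = dynamic_energy_ratio(0.05, 8)
```

A fully serial workload (serial_frac = 1.0) gains nothing, matching the intuition that the advantage hinges on the parallel fraction.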


Software and Compilers for Embedded Systems | 2018

Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

Kyriakos Georgiou; Craig Blackmore; Samuel Xavier-de-Souza; Kerstin Eder

This paper presents the interesting observation that by performing fewer of the optimizations available in a standard compiler optimization level such as -O2, while preserving their original ordering, significant savings can be achieved in both execution time and energy consumption. This observation has been validated on two embedded processors, namely the ARM Cortex-M0 and the ARM Cortex-M3, using two different versions of the LLVM compilation framework; v3.8 and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks for both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. The savings that can be achieved are in the same range as what can be achieved by the state-of-the-art compilation approaches that use iterative compilation or machine learning to select flags or to determine phase orderings that result in more efficient code. In contrast to these time-consuming and expensive-to-apply techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, our approach can support multi-criteria optimization as it targets execution time, energy consumption and code size at the same time.
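The idea lends itself to a very small search loop. The sketch below is hypothetical: the pass names are an illustrative subset of an -O2 pipeline, and `toy_measure` stands in for actually compiling with a truncated pass list (e.g. via LLVM's opt tool) and measuring time or energy on the target board.

```python
# Keep the -O2 pass ordering but stop the pipeline early at each
# candidate cut point, measuring the benchmark each time.

O2_PASSES = ["inline", "sroa", "gvn", "licm", "loop-unroll", "slp-vectorizer"]  # illustrative subset

def best_truncation(passes, measure):
    """Try each prefix of the ordered pass list; return the shortest
    prefix with the best measured cost (time or energy)."""
    best = None
    for cut in range(len(passes) + 1):
        cost = measure(passes[:cut])
        if best is None or cost < best[1]:
            best = (passes[:cut], cost)
    return best

# Toy cost model: pretend the last two passes hurt this benchmark,
# while every earlier pass helps a little.
def toy_measure(prefix):
    return len([p for p in prefix if p in ("loop-unroll", "slp-vectorizer")]) - 0.1 * len(prefix)

config, cost = best_truncation(O2_PASSES, toy_measure)
```

With n passes this tests only n + 1 configurations, which is why the approach stays cheap compared to iterative compilation over arbitrary flag subsets.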


International Conference on Localization and GNSS | 2016

Time-Effective GPS Time-Domain Signal Acquisition Algorithm

Glauberto Leilson Alves De Albuquerque; Carlos Valderrama; Fabrício Costa Silva; Samuel Xavier-de-Souza

This paper presents a new time-effective GPS acquisition algorithm that improves the Time to First Fix (TTFF) of hardware GPS receivers under power-consumption or hardware constraints. Based on a modified Serial Search Acquisition Algorithm (SA), this enhanced alternative, called Reduced Two Steps Acquisition (RTSA), provides an average gain of 64.18% in the number of SA calculations. In terms of speed-up, the RTSA was on average 3x faster than the SA without requiring additional hardware resources.
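For context, the baseline SA that the RTSA improves on can be sketched as a brute-force search over the code-phase/Doppler grid. Everything below is illustrative (a random stand-in code rather than a real C/A PRN, toy rates, no noise); it is not the paper's RTSA.

```python
import numpy as np

# Toy serial-search acquisition: correlate the received signal with
# circularly shifted replicas of the spreading code over a grid of
# code phases and Doppler bins, and pick the strongest cell.

rng = np.random.default_rng(0)
code = rng.choice([-1.0, 1.0], size=64)        # stand-in PRN code
true_phase, true_dopp = 17, 2
n = code.size
t = np.arange(n)
received = np.roll(code, true_phase) * np.exp(2j * np.pi * true_dopp * t / n)

def serial_search(signal, code, doppler_bins):
    best, best_cell = -1.0, None
    for d in doppler_bins:                      # frequency dimension
        wiped = signal * np.exp(-2j * np.pi * d * t / n)
        for phase in range(n):                  # code-phase dimension
            corr = abs(np.dot(wiped, np.roll(code, phase)))
            if corr > best:
                best, best_cell = corr, (phase, d)
    return best_cell

cell = serial_search(received, code, doppler_bins=range(-4, 5))
```

The cost is one correlation per grid cell, which is exactly what the paper's 64.18% reduction in SA calculations targets.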


International Conference on Performance Engineering | 2018

Application Speedup Characterization: Modeling Parallelization Overhead and Variations of Problem Size and Number of Cores.

Victor H. F. Oliveira; Alex F. A. Furtunato; Luiz F. Q. Silveira; Kyriakos Georgiou; Kerstin Eder; Samuel Xavier-de-Souza

To make efficient use of multi-core processors, it is important to understand the performance behavior of parallel applications. Modeling this behavior can enable online approaches to optimize throughput or energy, or even to guarantee a minimum QoS. Accurate models would avoid probing different runtime configurations, which causes overhead. Throughout the years, many speedup models were proposed, most of them based on Amdahl's or Gustafson's laws. However, many of those rely on assumptions such as a fixed parallel fraction, a parallel fraction that varies linearly with problem size, or nonexistent parallelization overhead. Although such models aid theoretical understanding, these assumptions do not hold in real environments, which makes them unsuitable for accurate characterization of parallel applications. The proposed model estimates the speedup taking into account the variation of the parallel fraction with problem size, the number of cores used, and the parallelization overhead. Using four applications from the PARSEC benchmark suite, the proposed model estimated speedups more accurately than other models in the recent literature.
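A minimal sketch of such a model, with the normalization T(1) = 1 (the functional forms chosen for the parallel fraction and the overhead below are illustrative assumptions, not the authors' fitted expressions):

```python
# Amdahl-style speedup extended with a size-dependent parallel
# fraction f(n) and a parallelization overhead term charged per run.

def speedup(p, n, f_of_n, overhead):
    """Speedup on p cores for problem size n, with T(1) = 1."""
    f = f_of_n(n)
    return 1.0 / ((1.0 - f) + f / p + overhead(p, n))

# Illustrative choices: the parallel fraction grows with problem
# size; the overhead grows with the number of cores.
f_of_n = lambda n: n / (n + 1000.0)
overhead = lambda p, n: 0.001 * (p - 1)

s_small = speedup(16, 1_000, f_of_n, overhead)
s_large = speedup(16, 100_000, f_of_n, overhead)
```

The two evaluations show the qualitative behavior the paper models: the same application on the same core count speeds up far better at a larger problem size, because the parallel fraction is not fixed.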


The Journal of Supercomputing | 2018

Parallel synchronous and asynchronous coupled simulated annealing

Kayo Gonçalves-e-Silva; Daniel Aloise; Samuel Xavier-de-Souza

We propose parallel synchronous and asynchronous implementations of the coupled simulated annealing (CSA) algorithm for a shared-memory architecture. The original CSA was implemented synchronously in a distributed-memory architecture. It synchronizes at each temperature update, which leads to idling and loss of efficiency as the number of processors increases. The proposed synchronous CSA (SCSA) is implemented like the original, but in a shared-memory architecture. The proposed asynchronous CSA (ACSA) does not synchronize, allowing greater parallel efficiency for larger numbers of processors. Results from extensive experiments show that the proposed ACSA presents much better solution quality than the serial version and the SCSA. The experiments also show that the ACSA outperforms the SCSA for less computationally intensive problems or when a larger number of processing cores is available. Moreover, the parallel efficiency of the ACSA improves as the size of the problem increases. With the advent of the multi-core era, the use of the proposed algorithm becomes more attractive than the original synchronous CSA.
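The coupling at the heart of CSA is its acceptance rule: the probability of accepting an uphill move for one optimizer depends on the current energies of all optimizers. A minimal synchronous sketch (the acceptance form follows common CSA formulations; the temperatures, neighbor move, and toy problem are illustrative):

```python
import math, random

def coupled_acceptance(energies, i, t_acc):
    """Acceptance probability for optimizer i given the whole
    ensemble; normalizing by the max energy keeps the exponentials
    numerically stable."""
    e_max = max(energies)
    gamma = sum(math.exp((e - e_max) / t_acc) for e in energies)
    return math.exp((energies[i] - e_max) / t_acc) / gamma

def csa_step(solutions, energies, cost, neighbor, t_gen, t_acc, rng):
    """One synchronous sweep over all coupled optimizers; downhill
    moves are always taken, uphill moves via the coupled rule."""
    for i, x in enumerate(solutions):
        cand = neighbor(x, t_gen, rng)
        e_cand = cost(cand)
        if e_cand < energies[i] or rng.random() < coupled_acceptance(energies, i, t_acc):
            solutions[i], energies[i] = cand, e_cand

# Toy usage: minimize x^2 with 4 coupled optimizers.
rng = random.Random(1)
xs = [rng.uniform(-5, 5) for _ in range(4)]
es = [x * x for x in xs]
for _ in range(200):
    csa_step(xs, es, lambda x: x * x, lambda x, t, r: x + r.gauss(0, t), 0.5, 1.0, rng)
```

Note that the acceptance probabilities of the ensemble sum to one, so the currently best optimizer rarely accepts uphill moves while the worst ones explore; the asynchronous variant in the paper removes the per-sweep synchronization of this loop.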


Computers & Electrical Engineering | 2018

Spectrum sensing with a parallel algorithm for cyclostationary feature extraction

Arthur Diego de Lira Lima; Luiz F. Q. Silveira; Samuel Xavier-de-Souza

The current static management policy for spectrum allocation has been shown to be inefficient when dealing with the increasing demand for wireless communication systems. More recently, opportunistic spectrum access has emerged as a promising alternative that allows non-licensed users to utilize the spectrum if no primary users are detected. Spectrum sensing based on cyclostationary feature detection can be employed to reliably identify the presence of primary users even at low SNR levels. However, the detection of modulated signals at lower SNR levels demands a higher number of analyzed samples. In this paper, we propose an architecture for spectrum sensing that reduces the computational time needed to obtain the cyclostationary features of a signal when using multi-core processors. Simulation results show that the proposed architecture can achieve over 92.8% parallel efficiency, which reduces the spectrum sensing time by a factor of 29.7.
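The feature being extracted can be illustrated with the cyclic autocorrelation at a candidate cycle frequency: a BPSK-like signal shows a strong feature at its symbol rate, while stationary noise does not. The sketch below is illustrative (toy rates, rectangular pulses, a single lag); it is not the paper's parallel algorithm.

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, fs, lag):
    """Estimate R_x^alpha(lag) = <x(t) x*(t - lag) e^{-j 2 pi alpha t}>
    over the whole record."""
    t = np.arange(x.size) / fs
    return np.mean(x * np.conj(np.roll(x, lag)) * np.exp(-2j * np.pi * alpha * t))

fs, sym_len = 1000.0, 20                      # 1 kHz sampling, 50 baud
rng = np.random.default_rng(3)
signal = np.repeat(rng.choice([-1.0, 1.0], size=200), sym_len)  # rectangular BPSK
noise = rng.normal(size=signal.size)

alpha = fs / sym_len                          # cycle frequency = symbol rate
feat_sig = abs(cyclic_autocorrelation(signal, alpha, fs, lag=5))
feat_noise = abs(cyclic_autocorrelation(noise, alpha, fs, lag=5))
```

Because each (alpha, lag) cell is an independent reduction over the samples, a grid of such estimates parallelizes naturally across cores, which is the opportunity the paper exploits.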


Applied Mathematics and Computation | 2018

Memory-usage advantageous block recursive matrix inverse

Iria C. S. Cosme; Isaac F. Fernandes; João L. de Carvalho; Samuel Xavier-de-Souza

The inversion of extremely high-order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth considering trading more processing time for less memory usage in order to enable the computation of the inverse, which would otherwise be prohibitive. We propose a new algorithm to compute the inverse of block-partitioned matrices with a reduced memory footprint. The algorithm works recursively to invert one block of a k × k block matrix M, with k ≥ 2, based on the successive splitting of M. It computes one block of the inverse at a time, in order to limit memory usage during the entire processing. Experimental results show that, despite the increased computational complexity, matrices that would otherwise exceed the memory-usage limit can be inverted using this technique.
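The building block of such an algorithm is blockwise inversion via the Schur complement; the sketch below shows only the 2 × 2 block case (the paper's method applies the idea recursively to k × k blocks and manages memory explicitly, which is not captured here):

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert M = [[A, B], [C, D]] one block at a time via the Schur
    complement S = D - C A^{-1} B; no intermediate larger than a
    single block is formed."""
    Ai = np.linalg.inv(A)
    S = D - C @ Ai @ B                    # Schur complement of A in M
    Si = np.linalg.inv(S)
    top_left = Ai + Ai @ B @ Si @ C @ Ai
    top_right = -Ai @ B @ Si
    bottom_left = -Si @ C @ Ai
    bottom_right = Si
    return top_left, top_right, bottom_left, bottom_right

# Sanity check against the full inverse on a random 6x6 matrix.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
C, D = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
M = np.block([[A, B], [C, D]])
blocks = block_inverse(A, B, C, D)
Minv = np.block([[blocks[0], blocks[1]], [blocks[2], blocks[3]]])
```

Each returned block can be written out (or discarded) before the next is computed, which is what bounds the memory footprint.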


Fourth Berkeley Symposium on Energy Efficient Electronic Systems (E3S) | 2015

Not faster nor slower tasks, but less energy hungry and parallel: Simulation results

Samuel Xavier-de-Souza; Carlos Avelino de Barros; Márcio O. Jales; Luiz F. Q. Silveira

Before the current computational era, when the most common processors had a single processing core, the speed of computation was mainly defined by the speed of that core. Faster cores usually translated into faster algorithms and applications. In the current era, the speed of computation is no longer primarily boosted by faster cores. Due to the thermal effect known as the power wall, the increment in speed that can be reached from one processor generation to another is very limited. The power wall is not the only limiting factor, though: instruction-level parallelism has also run deep into the law of diminishing returns. Today's era is governed by multi-core processors. The power wall was circumvented with task-level parallelism. The downside is that many applications may not effortlessly become faster with new generations of processors. In the multi-core era, faster algorithms are obtained with a combination of more processing cores and good exploitation of task-level parallelism, meaning that algorithm designers now have an active role in sustaining the performance of their applications through generations of processors.


Journal of the Brazilian Computer Society | 2014

On the parallel efficiency and scalability of the correntropy coefficient for image analysis

Aluisio I. R. Fontes; Samuel Xavier-de-Souza; Adrião Duarte Dória Neto; Luiz F. Q. Silveira

Background: Similarity measures have application in many scenarios of digital image processing. The correntropy is a robust and relatively new similarity measure that has recently been employed in various engineering applications. Despite other competitive characteristics, its computational cost is relatively high and may impose hard-to-cope time restrictions for high-dimensional applications, including image analysis and computer vision.
Methods: We propose a parallelization strategy for calculating the correntropy on multi-core architectures that may make the use of this metric viable in such applications. We provide an analysis of its parallel efficiency and scalability.
Results: The simulation results were obtained on a shared-memory system with 24 processing cores for input images of different dimensions. We simulated various scenarios with images of different sizes, aiming to analyze the parallel and serial fractions of the computation of the correntropy coefficient and the influence of these fractions on its speedup and efficiency.
Conclusions: The results indicate that correntropy has large potential as a metric for image analysis in the multi-core era due to its high parallel efficiency and scalability.
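For reference, the (uncentered) correntropy similarity with a Gaussian kernel reduces to a mean of elementwise kernel evaluations, and that reduction splits naturally into independent blocks, which is what makes the multi-core parallelization effective. The kernel width, block count, and helper names below are illustrative, not the paper's implementation (the paper's correntropy coefficient additionally involves centering and normalization, omitted here):

```python
import numpy as np

def correntropy(x, y, sigma=0.5):
    """Mean Gaussian-kernel similarity E[k_sigma(x - y)]."""
    d = x - y
    return float(np.mean(np.exp(-d * d / (2.0 * sigma * sigma))))

def correntropy_blocked(a, b, n_blocks=4, sigma=0.5):
    """Same value, reduced block-by-block: each block's partial sum
    is independent, so one block can go to each core."""
    rows = np.array_split(np.arange(a.shape[0]), n_blocks)
    total = sum(np.exp(-((a[r] - b[r]) ** 2) / (2.0 * sigma * sigma)).sum() for r in rows)
    return float(total / a.size)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
noisy = img + rng.normal(scale=0.1, size=img.shape)
```

An image compared with itself yields exactly 1, and the similarity decays as the images diverge; the blocked reduction returns the same value as the direct mean.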

Collaboration


Dive into Samuel Xavier-de-Souza's collaborations.

Top Co-Authors

Luiz F. Q. Silveira

Federal University of Rio Grande do Norte


Carlos Avelino de Barros

Federal University of Rio Grande do Norte


Kayo Gonçalves-e-Silva

Federal University of Rio Grande do Norte


Adrião Duarte Dória Neto

Federal University of Rio Grande do Norte


Arthur Diego de Lira Lima

Federal University of Rio Grande do Norte


Daniel Aloise

Federal University of Rio Grande do Norte


Fabrício Costa Silva

Federal University of Rio Grande do Norte
