Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Anderson Luiz Sartor is active.

Publication


Featured research published by Anderson Luiz Sartor.


ACM Journal on Emerging Technologies in Computing Systems | 2017

Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

Anderson Luiz Sartor; Arthur Francisco Lorenzon; Luigi Carro; Fernanda Lima Kastensmidt; Stephan Wong; Antonio Carlos Schneider Beck

Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low-overhead fault tolerance approaches based on instruction duplication with zero-latency detection, which use a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions, considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run time. However, some applications present high instruction-level parallelism (ILP), so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach addresses this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.
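
To make the idle-slot duplication idea concrete, here is a minimal Python sketch, not the paper's hardware implementation: a VLIW bundle is modeled as a list of operations with None marking idle issue slots, and copies of the real operations are placed into those idle slots so that their results can later be compared for error detection. The bundle format and issue width are assumptions for illustration.

    # Minimal sketch: fill idle VLIW issue slots with duplicate instructions
    # so that results can later be compared for soft-error detection.
    ISSUE_WIDTH = 4  # hypothetical number of issue slots per bundle

    def duplicate_into_idle_slots(bundle):
        """Return a bundle where idle slots (None) hold copies of real instructions."""
        duplicated = list(bundle)
        originals = [op for op in bundle if op is not None]
        idle = [i for i, op in enumerate(bundle) if op is None]
        for slot, op in zip(idle, originals):
            duplicated[slot] = op  # duplicate executes in an otherwise idle lane
        return duplicated

    # Example: a low-ILP bundle with two real operations and two idle slots.
    bundle = ["add r1,r2,r3", None, "mul r4,r5,r6", None]
    print(duplicate_into_idle_slots(bundle))
    # ['add r1,r2,r3', 'add r1,r2,r3', 'mul r4,r5,r6', 'mul r4,r5,r6']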


Application-Specific Systems, Architectures, and Processors | 2016

Adaptive ILP control to increase fault tolerance for VLIW processors

Anderson Luiz Sartor; Stephan Wong; Antonio Carlos Schneider Beck

Because of technology scaling, the soft error rate has been increasing in modern processors, affecting system reliability. To mitigate such effects, we propose an adaptive fault tolerance approach that exploits, at run time, idle functional units to execute duplicated instructions in a configurable VLIW processor. In applications with high instruction-level parallelism (ILP) and few functional units available for duplication, it adaptively reschedules instructions according to a configurable threshold, providing a tradeoff between performance and fault tolerance. On average, using a fault-tolerance-oriented threshold, the failure rate is reduced by 89.53% and performance by 5.86%, while energy consumption increases by 72% and area by 22.2%.
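
As an illustration of the threshold-driven rescheduling, the following is a software sketch under assumed data structures (a bundle is a list of operations, None marks an idle slot), not the processor's actual scheduler: when too few slots are idle to duplicate the desired fraction of instructions, the bundle is split across two cycles, trading performance for coverage.

    def reschedule(bundle, coverage_threshold=1.0):
        ops = [op for op in bundle if op is not None]
        idle = len(bundle) - len(ops)
        wanted_duplicates = int(round(coverage_threshold * len(ops)))
        if idle >= wanted_duplicates:
            return [bundle]                # enough idle units: keep a single cycle
        half = (len(ops) + 1) // 2         # otherwise split, lowering ILP
        width = len(bundle)
        first = ops[:half] + [None] * (width - half)
        second = ops[half:] + [None] * (width - len(ops[half:]))
        return [first, second]

    print(reschedule(["a", "b", "c", "d"], coverage_threshold=1.0))   # split into 2 cycles
    print(reschedule(["a", "b", None, None], coverage_threshold=1.0)) # fits in 1 cycle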


IEEE Computer Society Annual Symposium on VLSI | 2015

A Novel Phase-Based Low Overhead Fault Tolerance Approach for VLIW Processors

Anderson Luiz Sartor; Arthur Francisco Lorenzon; Luigi Carro; Fernanda Lima Kastensmidt; Stephan Wong; Antonio Carlos Schneider Beck

Because of technology scaling, the soft error rate has been increasing in digital circuits, which in turn affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computation. In this scenario, our work proposes two new low-overhead fault tolerance approaches, with zero-latency detection, that correct soft errors in the pipelines of a configurable VLIW processor. Each approach has a distinct way to detect errors, but both use the same rollback mechanism. The first relies on redundant hardware in the form of specialized duplicated pipelines. The second uses idle issue slots to execute duplicated instructions, first identifying phases within the application. Our implementation does not require changes to the binary code and has negligible performance losses. It incurs 50% area overhead with 35% extra power dissipation for the full pipeline duplication, and only 7% extra area when using idle resources. We compare our approaches to related work and demonstrate that they are more efficient when area, performance, power dissipation, and error coverage are considered together.
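
The phase identification that drives the idle-slot approach can be sketched roughly as follows; this is an assumed software analogue (window size and cutoff are arbitrary), not the hardware phase detector: each window of bundles is classified by its average issue-slot utilization, which indicates how much room is left for duplication in that phase.

    def classify_phases(bundles, issue_width=4, window=4, low_ilp_cutoff=0.5):
        phases = []
        for start in range(0, len(bundles), window):
            window_bundles = bundles[start:start + window]
            used = sum(sum(op is not None for op in b) for b in window_bundles)
            utilization = used / (issue_width * len(window_bundles))
            phases.append("low-ILP" if utilization < low_ilp_cutoff else "high-ILP")
        return phases

    trace = [["a", None, None, None]] * 4 + [["a", "b", "c", "d"]] * 4
    print(classify_phases(trace))   # ['low-ILP', 'high-ILP']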


Design, Automation, and Test in Europe | 2016

Run-time phase prediction for a reconfigurable VLIW processor

Qi Guo; Anderson Luiz Sartor; Anthony Brandon; Antonio Carlos Schneider Beck; Xuehai Zhou; Stephan Wong

It is well-known that different applications exhibit varying amounts of ILP. Execution of these applications on the same fixed-width VLIW processor will result (1) in wasted energy due to underutilized resources if the issue-width of the processor is larger than the inherent ILP; or alternatively, (2) in lower performance if the issue-width is smaller than the inherent ILP. Moreover, even within a single application distinct phases can be observed with varying ILP and therefore changing resource requirements. With this in mind, we designed the ρ-VEX processor, which is a VLIW processor that can change its issue-width at run-time. In this paper, we propose a novel scheme to dynamically (i.e., at run-time) optimize the resource utilization by predicting and matching the number of active data-paths for each application phase. The purpose is to achieve low energy consumption for applications with low ILP, and high performance for applications with high ILP, on a single VLIW processor design. We prototyped the ρ-VEX processor on an FPGA and obtained the dynamic traces of applications running on top of a Linux port. Our results show that it is possible in some cases to achieve the performance of an 8-issue core with 10% lower energy consumption, while in others we achieve the energy consumption of a 2-issue core with close to 20% lower execution time.
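
Below is a minimal sketch of the prediction-to-configuration step, with placeholder thresholds rather than the ones used for the ρ-VEX prototype: a predicted ILP value for the upcoming phase is mapped to a 2-, 4-, or 8-issue configuration.

    def select_issue_width(predicted_ilp):
        """Map a predicted ILP for the upcoming phase to a 2-, 4-, or 8-issue core."""
        if predicted_ilp <= 2.0:
            return 2      # low ILP: a narrow core saves energy
        if predicted_ilp <= 4.0:
            return 4
        return 8          # high ILP: a wide core recovers performance

    for ilp in (1.3, 3.1, 6.4):
        print(ilp, "->", select_issue_width(ilp), "issue slots")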


IEEE Computer Society Annual Symposium on VLSI | 2015

Optimized Use of Parallel Programming Interfaces in Multithreaded Embedded Architectures

Arthur Francisco Lorenzon; Anderson Luiz Sartor; Márcia Cristina Cera; Antonio Carlos Schneider Beck

Thread-level parallelism (TLP) exploitation for embedded systems has been a challenge for software developers: while it is necessary to take advantage of the availability of multiple cores, it is also mandatory to consume less energy. To speed up the development process and make it as transparent as possible, software designers use parallel programming interfaces (PPIs). However, as will be shown in this paper, each PPI implements different ways to exchange data, influencing performance, energy consumption, and energy-delay product (EDP), and this influence varies across different embedded processors. By evaluating four PPIs and three multicore processors, we demonstrate that it is possible to save up to 62% in energy consumption and achieve up to 88% EDP improvement by just switching the PPI, and that efficiency (i.e., the best possible use of the available resources) decreases as the number of threads increases in almost all cases, but at distinct rates.
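
For reference, the energy-delay product metric used above is simply energy multiplied by execution time, so lower is better; the numbers below are invented solely to make the comparison concrete, not data from the paper.

    def edp(energy_joules, time_seconds):
        return energy_joules * time_seconds

    # Hypothetical runs of the same kernel under two parallel programming interfaces.
    ppi_a = edp(energy_joules=10.0, time_seconds=2.0)   # 20.0 J*s
    ppi_b = edp(energy_joules=6.0, time_seconds=2.5)    # 15.0 J*s
    print(f"EDP improvement by switching PPI: {(1 - ppi_b / ppi_a) * 100:.0f}%")  # 25%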


VII Brazilian Symposium on Computing Systems Engineering (SBESC) | 2017

Simbah-FI: Simulation-Based Hybrid Fault Injector

Anderson Luiz Sartor; Pedro Henrique Exenberger Becker; Antonio Carlos Schneider Beck

Reliability testing has become extremely important in modern electronics, as the soft error rate has been increasing due to technology scaling. The testing must be controllable, generic, performed before deployment, cheap, and fast. Even though fault injection is often the most appropriate solution considering these requirements, it is very time-consuming. This work proposes a hybrid fault injection framework that automatically switches between RTL and gate-level simulation modes to speed up fault injection over conventional simulators by more than 10 times, while maintaining gate-level fault injection accuracy and controllability. The proposed framework is generic, so faults can be injected into any arbitrary circuit, and it supports concurrent execution of several simulations. As a case study, the reliability of a complex 8-issue VLIW processor is assessed.
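
The mode-switching strategy can be sketched as follows; run_rtl and run_gate_level are hypothetical placeholders standing in for the two simulators, and the window size is an assumption: the workload is simulated quickly at RTL until just before the injection cycle, switched to gate level around the injection point, and then switched back.

    def hybrid_fault_injection(total_cycles, inject_cycle, gate_window=10,
                               run_rtl=None, run_gate_level=None):
        # Fast-forward with RTL simulation up to the gate-level window.
        start_gate = max(0, inject_cycle - gate_window)
        run_rtl(0, start_gate)
        # Accurate gate-level simulation around the injection point.
        end_gate = min(total_cycles, inject_cycle + gate_window)
        run_gate_level(start_gate, end_gate, inject_at=inject_cycle)
        # Resume fast RTL simulation until the end of the workload.
        run_rtl(end_gate, total_cycles)

    # Stub callbacks just to make the control flow visible.
    hybrid_fault_injection(
        total_cycles=1000, inject_cycle=500,
        run_rtl=lambda a, b: print(f"RTL  sim: cycles {a}-{b}"),
        run_gate_level=lambda a, b, inject_at: print(f"GATE sim: cycles {a}-{b}, fault at {inject_at}"),
    )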


Reconfigurable Computing and FPGAs | 2015

A sparse VLIW instruction encoding scheme compatible with generic binaries

Anthony Brandon; Joost Hoozemans; Jeroen van Straten; Arthur Francisco Lorenzon; Anderson Luiz Sartor; Antonio Carlos Schneider Beck; Stephan Wong

Very Long Instruction Word (VLIW) processors are commonplace in embedded systems due to their inherently low power consumption, as the instruction scheduling is performed by the compiler instead of by the sophisticated and power-hungry hardware instruction schedulers used in their RISC counterparts. This is achieved by maximizing resource utilization while targeting only a certain application domain. However, when the inherent application ILP (instruction-level parallelism) is low, resources are under-utilized or wasted, and the encoding of NOPs results in large code sizes and, consequently, additional pressure on the memory subsystem to store these NOPs. To address the resource-utilization issue, we previously proposed a dynamic VLIW processor design that can merge unused resources to form additional cores to execute more threads. Therefore, the formation of cores can result in issue widths of 2, 4, and 8. Without sacrificing the possibility of code interruption and resumption, we proposed a generic binary scheme that allows a single binary to be executed on these different issue-width cores. However, the code size issue remains, as the generic binary scheme slightly increases the number of NOPs even further. Therefore, in this paper, we propose to apply a well-known stop-bit code compression technique to the generic binaries that, most importantly, maintains the code compatibility characteristic, allowing them to be executed on different cores. In addition, we present the hardware designs to support this technique in our dynamic core. For prototyping purposes, we implemented our design on a Xilinx Virtex-6 FPGA device and executed 14 embedded benchmarks. For comparison, we selected a non-dynamic/static VLIW core that incorporates a similar stop-bit technique for its code compression. We demonstrate, while maintaining code compatibility on top of a flexible dynamic VLIW processor, that the code size can be significantly reduced (by up to 80%), resulting in energy savings, and that the performance can be increased (by up to a factor of three). Finally, our experimental results show that we can use smaller caches (2 to 4 times smaller), which further helps in decreasing energy consumption.
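
The following is a toy software sketch of stop-bit compression; the encoding here is an assumption and not the ρ-VEX binary format. NOPs are removed from each bundle and the last remaining operation carries a stop bit, so the decoder knows where the bundle ends and can re-expand the NOPs.

    def compress(bundles):
        stream = []
        for bundle in bundles:
            ops = [op for op in bundle if op != "nop"]
            if not ops:
                ops = ["nop"]                 # keep one op so the cycle still exists
            for i, op in enumerate(ops):
                stream.append({"op": op, "stop": i == len(ops) - 1})
        return stream

    def decompress(stream, issue_width=4):
        bundles, current = [], []
        for entry in stream:
            current.append(entry["op"])
            if entry["stop"]:
                current += ["nop"] * (issue_width - len(current))  # re-expand NOPs
                bundles.append(current)
                current = []
        return bundles

    code = [["add", "nop", "nop", "nop"], ["mul", "sub", "nop", "nop"]]
    packed = compress(code)
    assert decompress(packed) == code
    print(f"{len(packed)} ops stored instead of {sum(len(b) for b in code)}")  # 3 instead of 8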


Computer Software and Applications Conference | 2015

The Impact of Virtual Machines on Embedded Systems

Anderson Luiz Sartor; Arthur Francisco Lorenzon; Antonio Carlos Schneider Beck

Embedded systems are becoming increasingly complex and, due to their tight energy requirements, all the available resources must be used in the best possible way. However, Android, the most used software platform for embedded systems, features a virtual machine to run applications. Even though it ensures flexibility, so the application can execute on different underlying architectures without the need for recompilation, it burdens the system with an extra software layer. Considering this scenario, through the development of an extension of the Android QEMU emulator and a specific benchmark set, this work evaluates the significance of the virtual machine by comparing applications written in Java and in native language. We show that, given a fixed energy budget, a different number of applications can be executed depending on the way they were implemented. We also demonstrate that this difference varies according to the processor, by executing the applications on all officially supported Android architectures (Intel x86, ARM, and MIPS). Therefore, even though the virtual machine provides total transparency to software developers, they must be aware of it and of the underlying target microarchitecture at early design stages in order to build low-energy applications.
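
The fixed-energy-budget argument boils down to simple arithmetic; the per-run energies below are invented for illustration, not measurements from the paper.

    BUDGET_J = 100.0            # hypothetical battery budget in joules
    ENERGY_PER_RUN = {"java": 2.5, "native": 1.6}   # hypothetical joules per run

    for variant, joules in ENERGY_PER_RUN.items():
        print(f"{variant:>6}: {int(BUDGET_J // joules)} runs within the budget")
    # java: 40 runs, native: 62 runs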


III Brazilian Symposium on Computing Systems Engineering (SBESC) | 2013

AndroProf: A Profiling Tool for the Android Platform

Anderson Luiz Sartor; Ulisses Brisolara Correa; Antonio Carlos Schneider Beck Filho

Current tools for mobile development are very limited in the kind and amount of information they can trace or profile. They are also scarce compared to general-purpose development tools. This makes the development of embedded applications, with their hard constraints such as limited performance and power budgets, a difficult task. Therefore, a tool that provides information such as energy consumption, execution time, and other statistics is mandatory when it comes to developing embedded applications. This paper presents a tool that provides the aforementioned information per application and that is able to trace both Dalvik Virtual Machine and native code. To accomplish this, we extended the Android SDK's QEMU emulator and developed graphical user interfaces to process the traced data.


Applied Reconfigurable Computing | 2018

A Low-Cost BRAM-Based Function Reuse for Configurable Soft-Core Processors in FPGAs

Pedro Henrique Exenberger Becker; Anderson Luiz Sartor; Marcelo Brandalero; Tiago T. Jost; Stephan Wong; Luigi Carro; Antonio Carlos Schneider Beck

Many modern FPGA-based soft-processor designs must include dedicated hardware modules to satisfy the requirements of a wide range of applications. Quite often, they do not all fit in the target FPGA, so their functionalities must be mapped into the much slower software domain. However, many complex soft-core processors usually underuse the available Block RAMs (BRAMs) compared to LUTs and registers. By taking advantage of this fact, we propose a generic low-cost BRAM-based function reuse mechanism (BRAM-FR) that can be easily configured for precise or approximate modes to accelerate execution. The BRAM-FR was implemented in HDL and coupled to a configurable 4-issue VLIW processor. It was used to optimize different applications that use a soft-float library to emulate a Floating-Point Unit (FPU), as well as an image processing filter that tolerates a certain level of error. We show that our technique can accelerate the former by 1.23x and the latter by 1.52x, with a reuse table that fits in the BRAMs (which would otherwise be idle) of five tested FPGA targets, with only a marginal increase in the number of slice registers and LUTs.
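
A minimal software analogue of the function reuse mechanism is sketched below, assuming a simple hash-based reuse table; the real table lives in BRAM next to the VLIW datapath, and the bit-masking quantization used here for approximate mode is an assumption for illustration only.

    class ReuseTable:
        def __init__(self, approximate=False, precision_bits=8):
            self.table = {}
            self.approximate = approximate
            self.precision_bits = precision_bits

        def _key(self, args):
            if not self.approximate:
                return args
            # Approximate mode: drop low-order bits so nearby inputs share one entry.
            mask = ~((1 << self.precision_bits) - 1)
            return tuple(a & mask for a in args)

        def lookup_or_compute(self, fn, *args):
            key = self._key(args)
            if key not in self.table:      # miss: compute and remember the result
                self.table[key] = fn(*args)
            return self.table[key]         # hit: skip the (slow) emulated function

    def slow_soft_float_mul(a, b):         # stand-in for an emulated FP routine
        return a * b

    table = ReuseTable(approximate=True, precision_bits=4)
    print(table.lookup_or_compute(slow_soft_float_mul, 1000, 2000))  # computed
    print(table.lookup_or_compute(slow_soft_float_mul, 1007, 2005))  # reused (approx.)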

Collaboration


Dive into Anderson Luiz Sartor's collaborations.

Top Co-Authors

Antonio Carlos Schneider Beck (Universidade Federal do Rio Grande do Sul)
Stephan Wong (Delft University of Technology)
Arthur Francisco Lorenzon (Universidade Federal do Rio Grande do Sul)
Luigi Carro (Universidade Federal do Rio Grande do Sul)
Pedro Henrique Exenberger Becker (Universidade Federal do Rio Grande do Sul)
Jeckson Dellagostin Souza (Universidade Federal do Rio Grande do Sul)
Marcelo Brandalero (Universidade Federal do Rio Grande do Sul)
Márcia Cristina Cera (Universidade Federal do Pampa)
Anthony Brandon (Delft University of Technology)
Joost Hoozemans (Delft University of Technology)