Rafael Trapani Possignolo
University of California, Santa Cruz
Publications
Featured research published by Rafael Trapani Possignolo.
ACM Transactions on Architecture and Code Optimization | 2016
Ehsan K. Ardestani; Rafael Trapani Possignolo; José Luis Briz; Jose Renau
Between 5% and 25% of power can be wasted before it is delivered to the computational resources on a die, due to inefficiencies of voltage regulators and resistive loss. Power delivery would benefit if, at the same power, the delivered voltage increased and the current decreased. This article presents CoreUnfolding, a technique that leverages voltage stacking to improve power delivery efficiency. Our experiments show that about 10% of system-wide power can be saved, voltage regulator area can be reduced by 30%, di/dt improves by 49%, and the power pin count is reduced by 40% (≈ 20% reduction in packaging costs), with negligible performance degradation.
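The intuition behind the savings is simple conduction arithmetic: delivering the same power at a higher voltage lowers the current, and resistive loss in the delivery network scales with the square of that current. A minimal back-of-the-envelope sketch follows; the power, resistance, and efficiency values are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope model of power-delivery loss under voltage stacking.
# All numbers are illustrative assumptions, not figures from the paper.

def delivery_loss(power_w, vdd, r_pdn_ohm, regulator_eff):
    """Estimate watts lost in the power-delivery network and regulator."""
    current = power_w / vdd                    # I = P / V
    resistive_loss = current ** 2 * r_pdn_ohm  # I^2 * R loss in the PDN
    regulator_loss = power_w * (1 - regulator_eff)
    return resistive_loss + regulator_loss

P = 50.0     # watts delivered to the cores (assumed)
R = 0.002    # effective PDN resistance in ohms (assumed)
EFF = 0.90   # regulator efficiency (assumed)

baseline = delivery_loss(P, vdd=1.0, r_pdn_ohm=R, regulator_eff=EFF)
stacked  = delivery_loss(P, vdd=2.0, r_pdn_ohm=R, regulator_eff=EFF)  # 2-high stack: half the current

print(f"baseline loss: {baseline:.2f} W, stacked loss: {stacked:.2f} W")
# Halving the current cuts the I^2*R term by 4x; regulator losses can shrink
# further in practice, which is the efficiency gain the paper targets.
```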
Field-Programmable Logic and Applications | 2016
Ilya K. Ganusov; Henri Fraisse; Aaron N. Ng; Rafael Trapani Possignolo; Sabya Das
This paper describes the methodology and algorithms behind the extra-pipeline analysis tools released in Xilinx Vivado Design Suite version 2015.3. Extra pipelining is one of the most effective ways to improve the performance of FPGA applications. Manual pipelining, however, often requires significant effort from FPGA designers, who need to explore various changes in the RTL and re-run the flow iteratively. The automatic pipelining approach described in this paper, in contrast, allows FPGA users to explore latency vs. performance trade-offs of their designs before investing time and effort into modifying the RTL. We describe the algorithms behind these tools, which use simple cut heuristics to maximize performance improvement while minimizing additional latency and register overhead. To demonstrate the effectiveness of the proposed approach, we analyse a set of 93 commercial FPGA applications and IP blocks mapped to the Xilinx UltraScale+ and UltraScale generations of FPGAs. The results show that extra pipelining can provide from 18% to 29% potential Fmax improvement on average. The distribution of improvements is bimodal, however, with almost half of the designs in the benchmark suite showing no improvement due to the presence of large loops. Finally, we demonstrate that highly pipelined designs map well to the UltraScale+ and UltraScale FPGA architectures: our approach demonstrates 19% and 20% Fmax improvement potential for the UltraScale+ and UltraScale architectures, respectively, with the majority of applications reaching their loop limit through pipelining.
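The latency-vs-performance exploration can be pictured with a small model: a feed-forward path's achievable clock period is bounded below by its largest indivisible node delay, while a loop's period cannot drop below its total delay divided by the registers already inside it, which is the loop limit the abstract mentions. The sketch below uses a hypothetical graph representation and delays; it is not the Vivado algorithm.

```python
# Toy model of the latency-vs-Fmax trade-off explored by extra-pipeline analysis.
# Node delays and register counts are hypothetical.

def min_period(chain_delays_ns, extra_regs):
    """Best-case clock period for a feed-forward chain after inserting
    extra_regs pipeline registers with perfectly balanced cuts."""
    total = sum(chain_delays_ns)
    return max(max(chain_delays_ns), total / (extra_regs + 1))

def loop_limit(loop_delay_ns, regs_in_loop):
    """A loop's period cannot drop below its total delay divided by the
    registers it already contains; adding pipeline registers to a loop would
    change its behavior, which is why loops cap the improvement."""
    return loop_delay_ns / regs_in_loop

# Hypothetical 12 ns critical chain of node delays and a 9 ns loop with 3 registers.
chain = [3.0, 2.5, 4.0, 2.5]
print(min_period(chain, extra_regs=0))   # 12.0 ns (~83 MHz)
print(min_period(chain, extra_regs=2))   # 4.0 ns (250 MHz, limited by the largest node)
print(loop_limit(9.0, regs_in_loop=3))   # 3.0 ns floor imposed by the loop
```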
Trust, Security and Privacy in Computing and Communications | 2012
Rafael Trapani Possignolo; Cintia B. Margi
Since the discovery of Shor's algorithm, concern about quantum computation has increased. A large amount of research has been conducted to discover new algorithms and to build a quantum computer, yet a general-purpose quantum computer still seems far from being achieved. Meanwhile, cryptographers around the world have started to look for security algorithms that resist quantum attacks, but these still need improvement to achieve practical execution times. This work proposes a quantum-classical hybrid architecture, focusing on photonic quantum computers. A small quantum coprocessor implementing the Grover search algorithm performs the search for roots of polynomials in F_{p^q}, and is used to accelerate the decoding process of the McEliece cryptosystem.
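The subproblem handed to the coprocessor, root finding over a finite field, is an unstructured search: classically it takes on the order of N polynomial evaluations, while Grover's algorithm needs roughly √N oracle queries. A classical sketch of that search is shown below; it uses a small prime field and a hypothetical polynomial purely for illustration, whereas the actual decoder works over an extension field.

```python
# Classical sketch of the search the quantum coprocessor accelerates: finding
# roots of a polynomial over a finite field by exhaustive evaluation. Grover's
# algorithm performs the same search with roughly sqrt(N) oracle queries
# instead of N. Field and polynomial are illustrative simplifications.
import math

def poly_eval(coeffs, x, p):
    """Horner evaluation of sum(coeffs[i] * x^i) mod p (coeffs low-order first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def find_roots(coeffs, p):
    return [x for x in range(p) if poly_eval(coeffs, x, p) == 0]

p = 251                      # small prime field for illustration
poly = [21, 241, 1]          # (x - 3)(x - 7) mod 251; hypothetical error-locator-like polynomial
print("roots:", find_roots(poly, p))                      # [3, 7]
print("classical queries:", p, "| Grover-style queries: ~", math.isqrt(p))
```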
International Conference on Electronics, Circuits, and Systems | 2009
Rafael Trapani Possignolo; Omar Hammami
Neural networks are widely used in a vast number of applications, including time series prediction, function approximation, and pattern classification. Recently, Nonlinear Auto-Regressive with eXogenous input (NARX) recurrent neural networks have been used to predict noisy and large time series (also referred to as chaotic time series). This paper presents a multiobjective-optimized implementation of the NARX neural network, specially designed to run on embedded systems.
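A NARX predictor regresses the next output on tapped delays of both the past outputs and the exogenous input. The sketch below keeps that structure but swaps the neural network for a linear least-squares readout so it stays short; the series, tap counts, and coefficients are assumptions for illustration only.

```python
# Minimal NARX-style predictor sketch (not the paper's embedded implementation):
# the next value y[t] is predicted from tapped delays of past outputs y and an
# exogenous input u. A linear readout trained by least squares stands in for
# the multiobjective-optimized neural network used in the paper.
import numpy as np

def narx_dataset(y, u, dy=3, du=2):
    """Build regressor rows [y[t-1..t-dy], u[t-1..t-du]] and targets y[t]."""
    start = max(dy, du)
    X, T = [], []
    for t in range(start, len(y)):
        X.append(np.concatenate([y[t - dy:t][::-1], u[t - du:t][::-1]]))
        T.append(y[t])
    return np.array(X), np.array(T)

rng = np.random.default_rng(0)
u = rng.normal(size=500)                       # exogenous input (assumed)
y = np.zeros(500)
for t in range(2, 500):                        # a simple nonlinear series to fit
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + 0.5 * np.tanh(u[t - 1]) + 0.01 * rng.normal()

X, T = narx_dataset(y, u)
w, *_ = np.linalg.lstsq(X, T, rcond=None)      # linear readout in place of the NN
pred = X @ w
print("one-step RMSE:", float(np.sqrt(np.mean((pred - T) ** 2))))
```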
International Symposium on Circuits and Systems | 2016
Elnaz Ebrahimi; Rafael Trapani Possignolo; Jose Renau
Current delivery is a major challenge in chip design, and the reduction of the nominal voltage due to technology scaling has worsened the problem. Voltage stacking has been proposed as a way to alleviate it by delivering power serially rather than in the conventional parallel way. Several studies have proposed techniques to stack logic designs. This paper applies voltage stacking to SRAMs. By dividing the SRAM into two logic domains, we are able to double the supply voltage VDD while significantly reducing the current draw. Since SRAMs have a predictable activity pattern, each stack consumes the same amount of power; the stack voltage 2VDD therefore distributes evenly between the stacks, and the current demand decreases by up to 44%. The combined effects of increasing VDD and decreasing current allow voltage regulators to be designed 10%–15% more efficiently.
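The current argument can be made concrete with a toy model: with two domains in series at 2·VDD, the supply current tracks the busier half of the stack, so balanced activity is what lets the draw approach half of the unstacked case. The numbers and the max-of-the-two-halves simplification below are assumptions, not the paper's measurements.

```python
# Illustrative arithmetic for two-level voltage stacking of an SRAM.
# Assumed numbers and a simplified current model, not the paper's results.

def supply_current(p_top, p_bottom, vdd):
    unstacked = (p_top + p_bottom) / vdd   # both domains in parallel at VDD
    stacked = max(p_top, p_bottom) / vdd   # series stack at 2*VDD: current follows the busier half
    return unstacked, stacked

VDD = 0.9
balanced   = supply_current(p_top=0.50, p_bottom=0.50, vdd=VDD)
imbalanced = supply_current(p_top=0.70, p_bottom=0.30, vdd=VDD)

for label, (base, stk) in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{label}: {base:.2f} A -> {stk:.2f} A ({100 * (1 - stk / base):.0f}% reduction)")
# Balanced stacks approach the ideal ~50% current reduction; imbalance erodes it,
# which is why the SRAM's predictable access pattern matters.
```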
International Conference on Computer Design | 2016
Rafael Trapani Possignolo; Elnaz Ebrahimi; Haven Skinner; Jose Renau
Pipeline depth and cycle time are fixed early in the chip design process, but their impact can only be assessed when the implementation is mostly done and changing them is impractical. Elastic Systems are latency-insensitive systems that allow changes in the pipeline depth late in the design process with little design effort. Nevertheless, they incur a significant throughput penalty when new stages are added in the presence of pipeline loops. We propose Fluid Pipelines, an evolution that allows pipeline transformations without a throughput penalty. Formally, we introduce "or-causality" in addition to the already existing "and-causality" in Elastic Systems. This gives more flexibility than previously possible, at the cost of requiring the designer to specify the intended behavior of the circuit. In an out-of-order core benchmark, Fluid Pipelines improve the optimal energy-delay point by shifting both performance (by 17%) and energy (by 13%). We envision a scenario where tools would be able to generate different pipeline configurations from the same RTL, e.g., low power or high performance.
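The throughput penalty the abstract refers to can be seen with a toy loop model: a feedback path holding a fixed number of in-flight tokens can start at most tokens/latency operations per cycle, so lengthening the loop in a plain elastic pipeline costs throughput. The sketch below illustrates that bound; it is not the paper's evaluation.

```python
# Toy throughput model for an elastic pipeline loop (illustrative only).
# A loop holding `tokens_in_loop` in-flight results over a `loop_latency`-stage
# feedback path can start at most tokens/latency operations per cycle, so adding
# a stage to the loop without restructuring it reduces throughput. Fluid
# Pipelines aim to avoid paying this penalty for transformations that do not
# lengthen the true dependence.

def loop_throughput(tokens_in_loop, loop_latency):
    return min(1.0, tokens_in_loop / loop_latency)

base   = loop_throughput(tokens_in_loop=2, loop_latency=2)  # 1.00 op/cycle
deeper = loop_throughput(tokens_in_loop=2, loop_latency=3)  # 0.67 op/cycle after adding a stage
print(f"elastic baseline: {base:.2f}, elastic +1 stage: {deeper:.2f}")
```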
Archive | 2018
Rafael Trapani Possignolo; Elnaz Ebrahimi; Haven Skinner; Jose Renau
In this chapter, we propose the concept of Fluid Pipelines, an evolution in the chip design process that allows for efficient late pipeline transformations. In a conventional chip design process, pipeline depth and cycle time are fixed early in the design flow, yet their impact can only be assessed when the implementation is mostly done and any change to the pipeline design is impractical. Although Elastic Systems are latency insensitive and allow changes in the pipeline depth late in the design process with little design effort, they incur a significant throughput penalty when new stages are added in the presence of pipeline loops. Fluid Pipelines allow pipeline transformations without a throughput penalty. Formally, we introduce "or-causality" in addition to the already existing "and-causality" in Elastic Systems. This gives more flexibility than previously possible, at the cost of requiring the designer to specify the intended behavior of the circuit. In an out-of-order core benchmark, Fluid Pipelines improve the optimal energy-delay point by shifting both performance (by 17%) and energy (by 13%). We envision a scenario where tools would be able to generate different pipeline configurations from the same RTL, e.g., low power or high performance.
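To make the causality distinction concrete, the sketch below reduces the two join disciplines to valid bits: an and-causal join waits for every producer, while an or-causal join may fire once any input the designer has marked as sufficient arrives. This is a loose illustration of the idea, not the chapter's formal semantics.

```python
# Hedged sketch of and-causality vs. or-causality at a pipeline join, reduced
# to valid bits. Illustration of the concept only, not the formalism.

def and_join(valids):
    """Classic elastic join: fire only when all producers have delivered a token."""
    return all(valids)

def or_join(valids, needed):
    """Or-causal join: fire once any input marked as sufficient is valid.
    The `needed` mask stands in for the designer-specified intent."""
    return any(v for v, n in zip(valids, needed) if n)

inputs = [True, False, True]                         # which producers have a token ready
print(and_join(inputs))                              # False: the slow producer stalls the stage
print(or_join(inputs, needed=[True, False, True]))   # True: a sufficient input has arrived
```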
International Symposium on Circuits and Systems | 2017
Elnaz Ebrahimi; Rafael Trapani Possignolo; Jose Renau
As chips increase in complexity with ever-increasing power consumption, the pressure for efficient power-delivery mechanisms such as multi-VDD, voltage stacking, and DVS continues to rise. The main objective is to reduce the overall current delivered to the chip. For instance, in voltage stacking, if the circuit is stacked in two levels and the supply voltage is doubled, the current drawn is reduced by half: the same amount of power is delivered, but with half the current. With the prevalence of systems using these techniques, level shifters have to be designed to be fast while consuming little power, and as their number grows, area becomes another design factor. This study explores different types of existing level shifters for the voltage stacking application, their optimal sizing, and their energy, delay, and area trade-offs. It also includes the effect of PVT variation as a design factor and its impact on delay and energy consumption. We further propose modifications to the best energy-delay level shifter to reduce its area overhead.
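The kind of trade-off ranking described here can be sketched as a small Pareto analysis over candidate designs: compute the energy-delay product and discard candidates dominated in energy, delay, and area. The candidate names and numbers below are hypothetical placeholders, not measurements from the study.

```python
# Sketch of an energy/delay/area trade-off analysis for level-shifter candidates.
# Names and numbers are hypothetical placeholders, not results from the study.

candidates = {
    # name: (energy_fJ, delay_ps, area_um2)
    "cross_coupled":  (12.0,  85.0, 4.1),
    "current_mirror": (18.0,  55.0, 5.6),
    "reduced_swing":  ( 9.5, 120.0, 3.2),
}

def dominated(a, b):
    """True if design b is at least as good as a in every metric and strictly better in one."""
    return all(y <= x for x, y in zip(a, b)) and any(y < x for x, y in zip(a, b))

pareto = {n for n, m in candidates.items()
          if not any(dominated(m, other) for o, other in candidates.items() if o != n)}

for name, (e, d, a) in sorted(candidates.items(), key=lambda kv: kv[1][0] * kv[1][1]):
    tag = "pareto" if name in pareto else "dominated"
    print(f"{name:15s} EDP={e * d:7.1f} fJ*ps  area={a:.1f} um^2  [{tag}]")
```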
Formal Methods | 2017
Haven Skinner; Rafael Trapani Possignolo; Jose Renau
The Liam paradigm is a collection of constructs aimed at managing latency-insensitive behavior. These constructs are language-agnostic, although implementing them requires integration into the compiler. For this paper, we use a custom, Python-like HDL called Pyrope, both to provide concise code samples and because the Pyrope compiler was designed with Liam integration in mind.
Design Automation Conference | 2017
Rafael Trapani Possignolo; Jose Renau
➔ The incremental step of LiveSynth reduces synthesis time by about 95% for incremental changes.
➔ LiveSynth shifts the paradigm to small, incremental changes and more iterations per day.
➔ We advocate for an interactive synthesis flow as a way to boost design productivity (a sketch of the incremental idea follows).
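The incremental idea behind these points can be illustrated by caching per-module synthesis results keyed by a hash of each module's RTL and re-running synthesis only for modules whose source changed. This is a conceptual sketch, not the LiveSynth implementation.

```python
# Conceptual sketch of incremental synthesis: reuse cached per-module results
# when a module's RTL is unchanged. Not the LiveSynth algorithm itself.
import hashlib

def synthesize(name, rtl):
    print(f"  (re)synthesizing {name}")     # stand-in for the expensive synthesis step
    return f"netlist({name})"

def incremental_synthesis(design, cache):
    """design: {module_name: rtl_text}; cache: {module_name: (hash, netlist)}."""
    netlists = {}
    for name, rtl in design.items():
        digest = hashlib.sha256(rtl.encode()).hexdigest()
        if name in cache and cache[name][0] == digest:
            netlists[name] = cache[name][1]  # unchanged module: reuse cached netlist
        else:
            netlists[name] = synthesize(name, rtl)
            cache[name] = (digest, netlists[name])
    return netlists

cache = {}
v1 = {"alu": "module alu ...", "decode": "module decode ...", "lsu": "module lsu ..."}
incremental_synthesis(v1, cache)                 # first run synthesizes everything
v2 = dict(v1, decode="module decode ... // tweak")
incremental_synthesis(v2, cache)                 # only 'decode' is re-synthesized
```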