Alessandro Cevrero | Researchain

Publication

Featured research published by Alessandro Cevrero.

Cryptographic Hardware and Embedded Systems | 2009

A Design Flow and Evaluation Framework for DPA-Resistant Instruction Set Extensions

Francesco Regazzoni; Alessandro Cevrero; François-Xavier Standaert; Stéphane Badel; Ties Kluter; Philip Brisk; Yusuf Leblebici; Paolo Ienne

Power-based side-channel attacks are a significant security risk, especially for embedded applications. To improve the security of such devices, protected logic styles have been proposed as an alternative to CMOS. However, they should be used sparingly, since their area and power consumption are both significantly larger than for CMOS. We propose to augment a processor, realized in CMOS, with custom instruction set extensions, designed with security and performance as the primary objectives, that are realized in a protected logic style. We have developed a design flow based on standard CAD tools that can automatically synthesize and place-and-route such hybrid designs. The flow is integrated into a simulation and evaluation environment to quantify the security achieved on a sound basis. Using MCML logic as a case study, we have explored different partitions of the PRESENT block cipher between protected and unprotected logic. This experiment illustrates the tradeoff between the type and amount of application-level functionality implemented in protected logic and the level of security achieved by the design. Our design approach and evaluation tools are generic and could be used to partition any algorithm using any protected logic style.
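To make the threat concrete, the following is a minimal sketch of a correlation-based power analysis (a common variant of DPA, swapped in here for illustration) against a single 4-bit S-box of PRESENT. It is not the evaluation framework of the paper. The noise-free Hamming-weight leakage model and the function names `simulate_traces` and `recover_key_nibble` are assumptions made for this example; only the S-box table comes from the PRESENT specification.

```python
# PRESENT 4-bit S-box (from the cipher specification).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    """Number of set bits in x."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(key_nibble, plaintexts):
    """ASSUMED leakage model: one noise-free sample per plaintext, equal to
    the Hamming weight of the S-box output (a toy model of unprotected CMOS)."""
    return [float(hamming_weight(PRESENT_SBOX[p ^ key_nibble]))
            for p in plaintexts]


def recover_key_nibble(plaintexts, traces):
    """Return the key guess whose predicted leakage correlates best with the traces."""
    scores = []
    for guess in range(16):
        model = [hamming_weight(PRESENT_SBOX[p ^ guess]) for p in plaintexts]
        scores.append(pearson(model, traces))
    return max(range(16), key=scores.__getitem__)


if __name__ == "__main__":
    plaintexts = list(range(16))
    traces = simulate_traces(0xA, plaintexts)
    print(hex(recover_key_nibble(plaintexts, traces)))
```

In this toy model the correct key guess predicts the leakage perfectly, so it is recovered from only 16 traces. A protected logic style such as MCML aims to make power consumption largely independent of the processed data, which would weaken exactly this correlation.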

Design Automation Conference | 2011

Power-gated MOS current mode logic (PG-MCML): a power aware DPA-resistant standard cell library

Alessandro Cevrero; Francesco Regazzoni; Michael Schwander; Stéphane Badel; Paolo Ienne; Yusuf Leblebici

MOS Current Mode Logic (MCML) is one of the most promising logic styles to counteract power analysis attacks. Unfortunately, the static power consumption of MCML standard cells is significantly higher than that of equivalent functions implemented in static CMOS logic. As a result, the use of such a logic style is very limited in portable devices. Paradoxically, these devices are the most sensitive to physical attacks, and thus the ones that would benefit most from the adoption of MCML.

IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2012

Design and Testing Strategies for Modular 3-D-Multiprocessor Systems Using Die-Level Through Silicon Via Technology

Giulia Beanato; Paolo Giovannini; Alessandro Cevrero; Panagiotis Athanasopoulos; Michael Zervas; Yuksel Temiz; Yusuf Leblebici

An innovative modular 3-D stacked multi-processor architecture is presented. The platform is composed of completely identical stacked dies connected together by through-silicon vias (TSVs). Each die features four 32-bit embedded processors and associated memory modules, interconnected by a 3-D network-on-chip (NoC), which can route packets in the vertical direction. Superimposing identical planar dies minimizes design effort and manufacturing costs, while ensuring high flexibility and reconfigurability. A single die can be used either as a fully testable standalone chip multi-processor (CMP), or integrated in a 3-D stack, increasing the overall core count and consequently the system performance. To demonstrate the feasibility of this architecture, fully functional samples have been fabricated using a conventional UMC 90 nm complementary metal-oxide-semiconductor process and stacked using an in-house, via-last Cu-TSV process. Initial results show that the proposed 3-D-CMP is capable of operating at a target frequency of 400 MHz, supporting a vertical data bandwidth of 3.2 Gb/s.

Field-Programmable Custom Computing Machines | 2009

FPGA Implementation of a Single-Precision Floating-Point Multiply-Accumulator with Single-Cycle Accumulation

Arun Paidimarri; Alessandro Cevrero; Philip Brisk; Paolo Ienne

This paper describes an FPGA implementation of a single-precision floating-point multiply-accumulator (FPMAC) that supports single-cycle accumulation while maintaining high clock frequencies. A non-traditional internal representation reduces the cost of mantissa alignment within the accumulator. The FPMAC is evaluated on an Altera Stratix III FPGA.

International Symposium on Circuits and Systems | 2011

Area, throughput, and energy-efficiency trade-offs in the VLSI implementation of LDPC decoders

Christoph Roth; Alessandro Cevrero; Christoph Studer; Yusuf Leblebici; Andreas Burg

Low-density parity-check (LDPC) codes are key ingredients for improving the reliability of modern communication systems and storage devices. On the implementation side, however, the design of energy-efficient and high-speed LDPC decoders with a sufficient degree of reconfigurability to meet the flexibility demands of recent standards remains challenging. This survey paper provides an overview of the state of the art in the design of LDPC decoders using digital integrated circuits. To this end, we summarize available algorithms and characterize the design space. We analyze the different architectures and their connection to different codes and requirements. The advantages and disadvantages of the various choices are illustrated by comparing state-of-the-art LDPC decoder designs.

Asian Solid-State Circuits Conference | 2010

A 5.35 mm² 10GBASE-T Ethernet LDPC decoder chip in 90 nm CMOS

Alessandro Cevrero; Yusuf Leblebici; Paolo Ienne; Andreas Burg

A partially parallel low-density parity-check (LDPC) decoder compliant with the IEEE 802.3an standard for 10GBASE-T Ethernet is presented. The design is optimized for minimum silicon area and is based on the layered offset-min-sum algorithm, which speeds up the convergence of the message-passing decoding algorithm. To avoid routing congestion, the decoder architecture employs a novel communication scheme that reduces the critical number of global wires by 50%. The prototype LDPC decoder ASIC, fabricated in 90 nm CMOS, occupies only 5.35 mm² and achieves a decoding throughput of 11.69 Gb/s at 1.2 V with an energy efficiency of 133 pJ/bit.
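As an illustration of the offset-min-sum rule mentioned above, the sketch below computes the check-to-variable messages for a single check node in floating point. It shows only the check-node update, not the layered schedule or the chip's fixed-point datapath. The function name `offset_min_sum_check_node` and the offset value `beta=0.5` are assumptions chosen for the example, not parameters taken from the paper.

```python
def offset_min_sum_check_node(incoming, beta=0.5):
    """Offset-min-sum update for one check node.

    incoming: variable-to-check LLR messages on the node's edges.
    beta:     ASSUMED offset subtracted from the magnitude (illustrative value).
    Returns one check-to-variable message per edge.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two edges")

    mags = [abs(m) for m in incoming]

    # Only the two smallest magnitudes are needed: every edge sees the
    # overall minimum, except the edge that supplied it, which sees the
    # second minimum.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)

    # Product of all signs (a zero message is treated as positive).
    total_sign = 1
    for m in incoming:
        if m < 0:
            total_sign = -total_sign

    outgoing = []
    for i, m in enumerate(incoming):
        mag = min2 if i == min1_idx else min1
        mag = max(mag - beta, 0.0)            # apply offset, clamp at zero
        own_sign = -1 if m < 0 else 1
        # Multiplying by the edge's own sign removes it from the product,
        # leaving the product of the signs of all *other* edges.
        outgoing.append(total_sign * own_sign * mag)
    return outgoing


if __name__ == "__main__":
    print(offset_min_sum_check_node([2.0, -1.0, 3.0]))
```

Tracking only the two smallest magnitudes, rather than recomputing a minimum per edge, is a common way to keep the check-node hardware small.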

System-on-Chip Conference | 2010

Design and feasibility of multi-Gb/s quasi-serial vertical interconnects based on TSVs for 3D ICs

Fengda Sun; Alessandro Cevrero; Panagiotis Athanasopoulos; Yusuf Leblebici

This paper proposes a novel technique to exploit the high bandwidth offered by through-silicon vias (TSVs). In the proposed approach, synchronous parallel 3D links are replaced by serialized links to save silicon area and increase yield. Detailed analysis conducted in 90 nm CMOS technology shows that the proposed 2-Gb/s/pin quasi-serial link requires approximately five times less area than its parallel bus equivalent at the same data rate for a TSV diameter of 20 µm.

Field Programmable Gate Arrays | 2008

Architectural improvements for field programmable counter arrays: enabling efficient synthesis of fast compressor trees on FPGAs

Alessandro Cevrero; Panagiotis Athanasopoulos; Hadi Parandeh-Afshar; Ajay K. Verma; Philip Brisk; Frank K. Gürkaynak; Yusuf Leblebici; Paolo Ienne

The Field Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which are synthesized as compressor trees on the FPCA; the remaining portion of the circuit is synthesized on the FPGA. This paper presents a series of architectural improvements to the FPCA that reduce routing delay, increase flexibility and component utilization, and simplify the integration process. Using an FPGA containing six FPCAs, we observed average and maximum speedups of 1.60× and 2.40× on a set of arithmetic benchmarks.

International New Circuits and Systems Conference | 2013

A parallelized layered QC-LDPC decoder for IEEE 802.11ad

Alexios Balatsoukas-Stimming; Nicholas Preyss; Alessandro Cevrero; Andreas Burg; Christoph Roth

We present a doubly parallelized layered quasi-cyclic low-density parity-check decoder for the emerging IEEE 802.11ad multigigabit wireless standard. The decoding algorithm is equivalent to a non-parallelized layered decoder and thus retains its favorable convergence characteristics, which are known to be superior to those of decoders based on a flooding schedule. The proposed architecture was synthesized using a TSMC 40 nm CMOS technology, resulting in a cell area of 0.18 mm² and a clock frequency of 850 MHz. At this clock frequency, the decoder achieves a coded throughput of 3.12 Gbps, thus meeting the throughput requirements when using both the mandatory BPSK modulation and the optional QPSK modulation.

ACM Transactions on Reconfigurable Technology and Systems | 2009

Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs

Alessandro Cevrero; Panagiotis Athanasopoulos; Hadi Parandeh-Afshar; Ajay K. Verma; Hosein Seyed Attarzadeh Niaki; Chrysostomos Nicopoulos; Frank K. Gürkaynak; Philip Brisk; Yusuf Leblebici; Paolo Ienne

Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed-bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To improve FPGA performance for these applications, this article introduces the Field Programmable Compressor Tree (FPCT) as an alternative to the DSP blocks. By providing just a compressor tree, the FPCT can perform multi-input addition along with parallel multiplication and MAC in conjunction with a small amount of FPGA general logic. Furthermore, the user can configure the FPCT to precisely match the bitwidths of the operands being summed. Although an FPCT cannot beat the performance of a well-designed ASIC compressor tree of fixed bitwidth, for example, 9×9 and 18×18-bit multipliers/MACs in DSP blocks, its configurable bitwidth and ability to perform multi-input addition are ideal for reconfigurable devices that are used across a variety of applications.
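The idea of a compressor tree can be sketched in software. The example below is a behavioral model only, assuming non-negative integer operands and plain 3:2 compressors (carry-save adders); it does not model the FPCT architecture, and the function names `compress_3_2` and `compressor_tree_sum` are hypothetical. Each 3:2 stage replaces three operands with two without propagating carries, so a single carry-propagate addition is needed only at the end.

```python
def compress_3_2(a, b, c):
    """3:2 compressor applied bitwise to three non-negative integers.

    Returns (s, carry) such that a + b + c == s + carry, with no carry
    propagating between bit positions inside the compressor.
    """
    s = a ^ b ^ c                                  # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted
    return s, carry


def compressor_tree_sum(operands):
    """Sum non-negative integers with layers of 3:2 compressors followed
    by one final carry-propagate addition."""
    ops = list(operands)
    if any(x < 0 for x in ops):
        raise ValueError("this sketch handles non-negative operands only")

    while len(ops) > 2:
        next_layer = []
        # Compress each full group of three operands into two.
        for i in range(0, len(ops) - 2, 3):
            next_layer.extend(compress_3_2(ops[i], ops[i + 1], ops[i + 2]))
        # Pass any leftover operands (one or two) to the next layer unchanged.
        leftover = len(ops) % 3
        if leftover:
            next_layer.extend(ops[-leftover:])
        ops = next_layer

    # At most two operands remain: the single carry-propagate addition.
    return sum(ops)


if __name__ == "__main__":
    print(compressor_tree_sum([3, 5, 7, 9, 11]))
```

In hardware, the benefit comes from the fact that each compressor layer has constant delay regardless of operand width, whereas a chain of ordinary adders would propagate carries at every step.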
