Ricardo Chaves | Researchain

Publications

Featured publications by Ricardo Chaves.

IEEE Transactions on Circuits and Systems | 2005

A universal architecture for designing efficient modulo 2^n+1 multipliers

Leonel Sousa; Ricardo Chaves

This paper proposes a simple and universal architecture for designing efficient modified Booth multipliers modulo 2^n+1. The proposed architecture is comprehensive, providing modulo 2^n+1 multipliers with similar performance and cost for both the ordinary and the diminished-1 number representations. The performance and efficiency of the proposed multipliers are evaluated and compared with the fastest earlier modulo 2^n+1 multipliers, using a simple gate-count and gate-delay model and experimental results from CMOS implementations. These results show that the proposed approach yields multipliers that are, on average, approximately 10% faster than the fastest known structures for the diminished-1 representation based on modified Booth recoding. Moreover, they show that the proposed architecture is the only one that exploits this recoding to obtain faster multipliers with a significant reduction in hardware. With the same figures of merit, the proposed diminished-1 multipliers are on average 10% and 25% more efficient than the most efficient known modulo 2^n+1 multipliers with and without Booth recoding, respectively.
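
The arithmetic behind this entry can be illustrated with a minimal Python sketch, assuming unsigned operands: radix-4 modified Booth recoding of the multiplier, reduction modulo 2^n+1 using 2^n ≡ -1, and the diminished-1 representation, in which a nonzero value A is stored as A-1 and zero is flagged separately. The function names and the (value, zero_flag) encoding are illustrative assumptions; this is a behavioral model, not the partial-product and correction-term architecture the paper proposes.

```python
def booth_radix4_digits(y: int) -> list[int]:
    """Recode a non-negative integer into radix-4 modified Booth digits in {-2..2}.

    Digit i is y[2i] + y[2i-1] - 2*y[2i+1] (with y[-1] = 0), so that
    sum(d * 4**i) == y. The loop covers one bit past the MSB, so the
    recoded string always ends with a zero sign bit.
    """
    digits, prev = [], 0
    for i in range(0, y.bit_length() + 1, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def mul_mod_2n_plus_1(x: int, y: int, n: int) -> int:
    """x * y mod (2^n + 1), accumulating one Booth partial product per digit."""
    m = (1 << n) + 1
    acc = 0
    for i, d in enumerate(booth_radix4_digits(y)):
        # Weight 4^i reduced mod m; since 2^n = -1 (mod m), hardware can fold
        # bits shifted past position n back in with inverted sign.
        acc = (acc + d * x * pow(4, i, m)) % m
    return acc


def to_dim1(a: int) -> tuple[int, bool]:
    """Diminished-1 encoding of a value a in [0, 2^n]: (a - 1, False), or (0, True) for zero."""
    return (0, True) if a == 0 else (a - 1, False)


def from_dim1(d: tuple[int, bool]) -> int:
    value, is_zero = d
    return 0 if is_zero else value + 1


def dim1_mul(a: tuple[int, bool], b: tuple[int, bool], n: int) -> tuple[int, bool]:
    """Diminished-1 product: A*B - 1 = a*b + a + b (mod 2^n + 1), with a = A - 1, b = B - 1."""
    (ad, az), (bd, bz) = a, b
    if az or bz:
        return (0, True)
    m = (1 << n) + 1
    r = (mul_mod_2n_plus_1(ad, bd, n) + ad + bd) % m
    # r == 2^n means A*B = 0 (mod m), which can happen when 2^n + 1 is not prime.
    return (0, True) if r == m - 1 else (r, False)
```

For example, with n = 4 (modulus 17), `from_dim1(dim1_mul(to_dim1(5), to_dim1(7), 4))` returns 1, since 35 mod 17 = 1. Keeping the operands on n bits is the point of the diminished-1 form: the value 2^n, which would otherwise need an extra bit, is stored as 2^n - 1.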

Cryptographic Hardware and Embedded Systems (CHES) | 2006

Improving SHA-2 hardware implementations

Ricardo Chaves; Georgi Kuzmanov; Leonel Sousa; Stamatis Vassiliadis

This paper proposes a set of new techniques to improve the implementation of the SHA-2 hashing algorithm. These techniques consist mostly of operation rescheduling and hardware reutilization, allowing a significant reduction of the critical path while also decreasing the required area. Both the SHA-256 and SHA-512 hash functions have been implemented and tested on the Virtex-II Pro prototyping technology. Experimental results suggest improvements above 50% over related commercial SHA-256 cores and 100% over academic SHA-256 designs, and above 70% for the SHA-512 hash function. The resulting cores achieve the same throughput as the fastest unrolled architectures with 25% less area than the smallest previously proposed architectures. On an XC2VP30-7 FPGA, the proposed cores achieve throughputs of 1.4 Gbit/s and 1.8 Gbit/s while occupying 755 and 1667 slices for SHA-256 and SHA-512, respectively.
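
As a software reference for what these cores compute, the Python sketch below implements the SHA-256 compression function and can be checked against `hashlib`. It illustrates one rescheduling idea in this spirit: because h in round t+1 is g from round t, the sum h + K[t+1] + W[t+1] can be formed one round early, which in hardware removes an addition from the round's critical path. Whether this matches the paper's exact datapath is an assumption; the constants are derived from prime roots as specified in FIPS 180-4 rather than typed in.

```python
import math
import struct

MASK = 0xFFFFFFFF


def _first_primes(count: int) -> list[int]:
    primes, c = [], 2
    while len(primes) < count:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def _icbrt(x: int) -> int:
    """Floor of the cube root of x, by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _first_primes(64)
# FIPS 180-4: first 32 fractional bits of the cube roots (K) and square roots (H0) of primes.
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]


def _rotr(x: int, r: int) -> int:
    return ((x >> r) | (x << (32 - r))) & MASK


def compress(state: list[int], block: bytes) -> list[int]:
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    delta = (h + K[0] + w[0]) & MASK  # h + K[t] + W[t] for round 0
    for t in range(64):
        ch = (e & f) ^ (~e & g)
        t1 = (delta + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)) + ch) & MASK
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)) + maj) & MASK
        if t < 63:
            # Rescheduled: next round's h is the current g, so its sum is ready now.
            delta = (g + K[t + 1] + w[t + 1]) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    bit_len = len(msg) * 8
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    state = H0[:]
    for i in range(0, len(msg), 64):
        state = compress(state, msg[i:i + 64])
    return struct.pack(">8I", *state)
```

`sha256(b"abc")` should equal `hashlib.sha256(b"abc").digest()`. The same precomputation applies to SHA-512, with 64-bit words, 80 rounds and different rotation amounts.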

Digital Systems Design | 2003

RDSP: a RISC DSP based on residue number system

Ricardo Chaves; Leonel Sousa

This paper focuses on the design of low-power, fast programmable digital signal processors (DSPs) based on a configurable 5-stage RISC core architecture and on residue number systems (RNS). Several innovative aspects are introduced at the control and datapath architecture levels, which support both the binary system and RNS. A new moduli set {2^n-1, 2^{2n}, 2^n+1} is also proposed to balance the processing time across the RNS channels. Experimental results, obtained through RDSP implementations on FPGA and ASIC, show that RNS can achieve not only a significant reduction in circuit area and power consumption but also a speedup when compared with a binary DSP.
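
A minimal Python sketch of the RNS data flow, assuming the moduli set {2^n-1, 2^{2n}, 2^n+1} from this entry: forward conversion, carry-free channel arithmetic, and two standard reverse conversions, the Chinese Remainder Theorem (CRT) and mixed-radix conversion (MRC). Function names are illustrative, and Python big integers with `pow(x, -1, m)` (Python 3.8+) stand in for hardware converters. The 2^{2n} channel is the cheapest to operate on, since reduction modulo a power of two just keeps the low 2n bits, which is one way to see why a wider binary channel can help balance the channels.

```python
from math import prod


def rdsp_moduli(n: int) -> tuple[int, int, int]:
    """{2^n - 1, 2^(2n), 2^n + 1}: pairwise coprime, dynamic range M = 2^(2n) * (2^(2n) - 1)."""
    return ((1 << n) - 1, 1 << (2 * n), (1 << n) + 1)


def to_rns(x: int, mods) -> tuple[int, ...]:
    return tuple(x % m for m in mods)


def rns_add(a, b, mods):
    # Channels are independent: no carry propagates between them.
    return tuple((x + y) % m for x, y, m in zip(a, b, mods))


def rns_mul(a, b, mods):
    return tuple((x * y) % m for x, y, m in zip(a, b, mods))


def from_rns_crt(res, mods) -> int:
    """CRT: X = sum(r_i * M_i * |M_i^-1| mod m_i) mod M, with M_i = M / m_i."""
    M = prod(mods)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(res, mods)) % M


def from_rns_mrc(res, mods) -> int:
    """MRC: X = v0 + v1*m0 + v2*m0*m1, with digits v_i computed modulo m_i."""
    digits = []
    for i, (r, m) in enumerate(zip(res, mods)):
        v = r
        for j in range(i):
            v = (v - digits[j]) * pow(mods[j], -1, m) % m
        digits.append(v)
    x, weight = 0, 1
    for v, m in zip(digits, mods):
        x += v * weight
        weight *= m
    return x
```

For instance, with n = 4 the moduli are (15, 256, 17) and M = 65280; multiplying the residues of 1234 and 37 channel by channel and converting back with `from_rns_crt` gives 45658 = 1234 * 37. CRT needs one large final reduction modulo M, whereas MRC works with channel-sized operations in a sequential chain, which is the usual trade-off between the two.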

International Parallel and Distributed Processing Symposium | 2006

Reconfigurable memory based AES co-processor

Ricardo Chaves; Georgi Kuzmanov; Stamatis Vassiliadis; Leonel Sousa

We consider the AES encryption/decryption algorithm and propose a memory-based hardware design to support it. The proposed implementation is mapped onto the Xilinx Virtex-II Pro technology. Both the byte substitution and the polynomial multiplication of the AES algorithm are implemented in a single dual-port on-chip memory block (BRAM). Two AES encryption/decryption cores have been designed and implemented on a prototyping XC2VP20-7 FPGA: a completely unrolled loop structure achieving a throughput above 34 Gbit/s at a cost of 3513 slices and 80 BRAMs, and a fully folded structure requiring only 515 slices and 12 BRAMs with a throughput above 2 Gbit/s. To evaluate the proposed AES design, it has been embedded in a polymorphic processor organization as a reconfigurable co-processor. Comparisons with state-of-the-art AES cores indicate that the proposed unfolded core outperforms the most recent works by 34% in throughput and requires 68% less reconfigurable area. Experimental results for both the folded and unfolded AES cores suggest an improvement of over 560% in the throughput/slice metric compared with recent related AES designs.
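
A behavioral Python sketch of why byte substitution and the MixColumns polynomial multiplication can share one memory: for each input byte x, a 256-entry table can hold a 32-bit word containing S(x), {02}·S(x) and {03}·S(x), so one lookup covers both steps for that byte, and a full column is four lookups combined with byte rotations and XORs. This word layout is the widely used "T-table" organization and is an assumption, not a description of the paper's BRAM contents. Decryption would need analogous tables for the inverse S-box and the {09}, {0B}, {0D}, {0E} multiples, omitted here.

```python
def gmul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def ginv(a: int) -> int:
    """Multiplicative inverse as a^254 (the nonzero elements form a group of order 255); 0 maps to 0."""
    r = 1
    for _ in range(254):
        r = gmul(r, a)
    return r


def _rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def _affine(b: int) -> int:
    return b ^ _rotl8(b, 1) ^ _rotl8(b, 2) ^ _rotl8(b, 3) ^ _rotl8(b, 4) ^ 0x63


SBOX = [_affine(ginv(x)) for x in range(256)]

# One 32-bit word per address, most significant byte first: ({02}S(x), S(x), S(x), {03}S(x)).
TE = [(gmul(s, 2) << 24) | (s << 16) | (s << 8) | gmul(s, 3) for s in SBOX]


def _rotr32(w: int, r: int) -> int:
    return ((w >> r) | (w << (32 - r))) & 0xFFFFFFFF


def sub_mix_column(col: list[int]) -> list[int]:
    """SubBytes followed by MixColumns on one state column, using only lookups, rotations and XORs."""
    w = TE[col[0]] ^ _rotr32(TE[col[1]], 8) ^ _rotr32(TE[col[2]], 16) ^ _rotr32(TE[col[3]], 24)
    return [(w >> s) & 0xFF for s in (24, 16, 8, 0)]


def mix_column(col: list[int]) -> list[int]:
    """Reference MixColumns on one column (FIPS-197 matrix)."""
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
        a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
        a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
        gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
    ]
```

`sub_mix_column(col)` should match `mix_column([SBOX[x] for x in col])` for any column; the round key addition would be one more XOR per word.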

IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2008

Cost-Efficient SHA Hardware Accelerators

Ricardo Chaves; Georgi Kuzmanov; Leonel Sousa; Stamatis Vassiliadis

This paper presents a new set of techniques for hardware implementations of Secure Hash Algorithm (SHA) hash functions. These techniques consist mostly of operation rescheduling and hardware reutilization, thereby significantly decreasing both the critical path and the required area. Throughputs from 1.3 Gbit/s to 1.8 Gbit/s were obtained for the SHA implementations on a Xilinx Virtex-II Pro. Compared with commercial cores and previously published research, these figures correspond to an improvement in throughput/slice of 29% to 59% for SHA-1 and 54% to 100% for SHA-2. Experimental results on hybrid hardware/software implementations of the SHA cores have shown speedups of up to 150 times over pure software implementations.
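
The throughput/slice figure of merit used in this entry is simple arithmetic; the Python sketch below computes it for the SHA-256 and SHA-512 figures reported in the 2006 Cryptographic Hardware and Embedded Systems entry, alongside the usual throughput formula for an iterative core. Clock frequency and cycles per block are left as parameters because they are not given here; for reference, SHA-1 and SHA-256 process 512-bit blocks (80 and 64 rounds) and SHA-512 processes 1024-bit blocks (80 rounds). The function names are illustrative.

```python
def throughput_gbit_s(block_bits: int, f_clk_mhz: float, cycles_per_block: int) -> float:
    """Iterative hash core: one message block is absorbed every cycles_per_block clock cycles."""
    return block_bits * f_clk_mhz * 1e6 / cycles_per_block / 1e9


def mbit_s_per_slice(throughput_gbit: float, slices: int) -> float:
    """Throughput/slice efficiency metric, in Mbit/s per occupied slice."""
    return throughput_gbit * 1e3 / slices


def improvement_pct(new: float, baseline: float) -> float:
    """Relative improvement of new over baseline, in percent."""
    return (new / baseline - 1.0) * 100.0


# Figures quoted for the XC2VP30-7 cores.
sha256_eff = mbit_s_per_slice(1.4, 755)   # about 1.85 Mbit/s per slice
sha512_eff = mbit_s_per_slice(1.8, 1667)  # about 1.08 Mbit/s per slice
```

`improvement_pct(new, baseline)` expresses the ratio of two such efficiencies as a percentage, presumably the sense in which ranges such as "29% to 59%" are stated; a 100% improvement means twice the throughput per slice.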

IEEE Transactions on Circuits and Systems | 2013

RNS Reverse Converters for Moduli Sets With Dynamic Ranges up to (8n+1)-bit

Hector Pettenghi; Ricardo Chaves; Leonel Sousa

In recent years, research on residue number systems (RNS) has targeted parallelism and larger dynamic ranges. In this paper, we start from the moduli set {2^n, 2^n-1, 2^n+1, 2^n-2^{(n+1)/2}+1, 2^n+2^{(n+1)/2}+1}, with an equivalent 5n-bit dynamic range, and propose horizontal and vertical extensions to improve the parallelism and increase the dynamic range. The vertical extensions increase the value of the power-of-2 modulus in the five-moduli set. The horizontal extensions introduce the 2^{n+1}+1 or 2^{n-1}+1 moduli, yielding new six-channel sets. This paper proposes methods to design memoryless reverse converters for the proposed moduli sets with large dynamic ranges, up to (8n+1) bits. Due to the complexity of the reverse conversion, both the Chinese Remainder Theorem and Mixed Radix Conversion are applied in the proposed methods to derive efficient reverse converters. Experimental results suggest that the proposed vertical extensions reduce the area-delay product by up to 1.34 times in comparison with the related state of the art. The horizontal extensions allow larger and more balanced moduli sets, improving RNS arithmetic computation at the cost of lower reverse conversion performance.

Digital Systems Design | 2004

{2^n+1, 2^{n+k}, 2^n-1}: a new RNS moduli set extension

Ricardo Chaves; Leonel Sousa

The increasing use of residue number systems (RNS) in signal processing applications demands new, more adaptable RNS moduli sets and arithmetic units. This paper presents a new adaptable extension of the traditional moduli set {2^n+1, 2^n, 2^n-1}. As will be shown, this extension ({2^n+1, 2^{n+k}, 2^n-1}) allows the binary channel (2^{n+k}) to be balanced against the other two channels. Moreover, it does not require new addition and multiplication units, since the already developed and well-studied units for these moduli can be reused.

IEEE Transactions on Circuits and Systems II: Express Briefs | 2013

Method to Design General RNS Reverse Converters for Extended Moduli Sets

Hector Pettenghi; Ricardo Chaves; Leonel Sousa

In recent years, research on residue number systems (RNS) has targeted larger dynamic ranges (DRs) to further exploit their inherent parallelism. In this brief, we start from the traditional three-moduli set {2^n, 2^n-1, 2^n+1}, with an equivalent 3n-bit DR, propose horizontal and vertical extensions to scale the DR, and improve the parallelism according to the requirements. This brief also introduces a method to design general reverse converters for extended moduli sets with the desired DRs, whereas the existing state of the art achieves at most (8n+1) bits. The experimental results suggest that the proposed moduli set extensions allow larger and more balanced moduli sets than the state of the art, improving overall RNS performance at the cost of a slower reverse conversion.

IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2013

Leonel Sousa; Samuel Antao; Ricardo Chaves

In this brief, we propose a method to design efficient adder-based converters for the four-moduli set {2^n+1, 2^n-1, 2^n, 2^{n+1}+1} with n odd, which provides a dynamic range of 4n+1 bits for the residue number system (RNS). The method applies the mixed-radix approach hierarchically, in two levels, to balanced pairs of residues. Only simple binary and modulo 2^k-1 additions are required, fully avoiding modulo 2^k+1 arithmetic, which is a significant advantage over the currently available RNS reverse converters for this type of moduli set. Experimental results show that the delay of the proposed converters is significantly reduced compared with the related state of the art: for example, in a 65-nm CMOS ASIC technology with a dynamic range of 21 bits, the conversion time and circuit area are reduced by about 44% and 30%, respectively, while for a dynamic range of 37 bits the conversion time is reduced by 34% with only a 25% increase in circuit area. Moreover, the proposed reverse converters outperform the related state of the art for any value of n by up to 70% in energy per conversion.

Digital Systems Design | 2010

Pedro Miguens Matutino; Ricardo Chaves; Leonel Sousa

A new moduli set {2^n-1, 2^n+3, 2^n+1, 2^n-3} has recently been proposed to represent numbers in residue number systems (RNS), increasing the number of channels. This reduces the processing time by simultaneously exploiting the carry-free nature of modular arithmetic and improving parallelism. In this paper, hardware structures for RNS addition and multiplication modulo 2^n-3 and 2^n+3 are proposed and analyzed. To evaluate their performance, the proposed units were implemented in an ASIC technology. The experimental results suggest that the performance of the modulo 2^n±3 units is acceptable, but they demand more area and impose a larger delay than the commonly used modulo 2^n±1 arithmetic units. The addition units require at least 42% more area for performance identical to the modulo 2^n+1 adder. The multiplication units require up to 37% more area and impose a 25% higher delay. This paper also suggests that more balanced moduli sets should be developed to achieve more efficient RNS.
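
To see where the extra cost of the 2^n±3 channels can come from, the Python sketch below models modular addition and multiplication for these moduli. The reductions rely on 2^n ≡ 3 (mod 2^n-3) and 2^n ≡ -3 (mod 2^n+3): folding the high part of a product costs a multiplication by 3 (a shift plus an addition in hardware), whereas the 2^n±1 case needs only an end-around addition or subtraction; residues modulo 2^n+3 also need n+1 bits. These are textbook-style reductions chosen for illustration, not the specific adder and multiplier structures proposed in the paper, and the function names are hypothetical.

```python
def add_mod_2n_minus_3(a: int, b: int, n: int) -> int:
    """(a + b) mod (2^n - 3) for residues a, b in [0, 2^n - 4]."""
    s = a + b
    s3 = s + 3
    # A carry out of bit n-1 in a + b + 3 means s >= 2^n - 3; then s - m = (s + 3) - 2^n.
    return s3 & ((1 << n) - 1) if s3 >> n else s


def add_mod_2n_plus_3(a: int, b: int, n: int) -> int:
    """(a + b) mod (2^n + 3) for (n+1)-bit residues a, b in [0, 2^n + 2]."""
    m = (1 << n) + 3
    s = a + b
    t = s - m
    # Both candidates are computed; the sign of s - m selects the result.
    return t if t >= 0 else s


def reduce_mod_2n_minus_3(p: int, n: int) -> int:
    """Reduce a non-negative p modulo 2^n - 3 (n >= 3), using 2^n = 3 (mod 2^n - 3)."""
    m = (1 << n) - 3
    while p >> n:
        p = 3 * (p >> n) + (p & ((1 << n) - 1))  # hi*2^n + lo -> 3*hi + lo
    return p - m if p >= m else p


def reduce_mod_2n_plus_3(p: int, n: int) -> int:
    """Reduce a non-negative p modulo 2^n + 3, using 2^n = -3 (mod 2^n + 3)."""
    m = (1 << n) + 3
    r = (p & ((1 << n) - 1)) - 3 * (p >> n)  # hi*2^n + lo -> lo - 3*hi
    while r < 0:
        r += m
    return r


def mul_mod(a: int, b: int, n: int, plus: bool) -> int:
    """a * b modulo 2^n + 3 (plus=True) or 2^n - 3 (plus=False)."""
    reduce = reduce_mod_2n_plus_3 if plus else reduce_mod_2n_minus_3
    return reduce(a * b, n)
```

The extra multiply-by-3 in each fold and the wider (n+1)-bit residues of the 2^n+3 channel are consistent with, though not a measurement of, the area and delay overheads reported above.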
