Marcelo E. Kaihara | Researchain

Publication

Featured research published by Marcelo E. Kaihara.

Symposium on Computer Arithmetic | 2009

Selected RNS Bases for Modular Multiplication

Jean-Claude Bajard; Marcelo E. Kaihara; Thomas Plantard

The selection of the elements of the bases in an RNS modular multiplication method is crucial and has a great impact on overall performance. This work proposes specific sets of optimal RNS moduli with elements of Hamming weight three whose inverses used in the MRS reconstruction have very small Hamming weight. This property is exploited in RNS base conversions to completely remove the products and replace them with a few additions/subtractions and shifts, reducing the time complexity of modular multiplication. These bases are specially crafted for computation with operands of sizes 256 or more and are suitable for cryptographic applications such as the ECC protocols.

IEEE Transactions on Computers | 2005

A hardware algorithm for modular multiplication/division

Marcelo E. Kaihara; Naofumi Takagi

IEEE Transactions on Computers | 2008

Bipartite Modular Multiplication Method

Marcelo E. Kaihara; Naofumi Takagi

International Journal of Applied Cryptography | 2012

Solving a 112-bit prime elliptic curve discrete logarithm problem on game consoles using sloppy reduction

Joppe W. Bos; Marcelo E. Kaihara; Thorsten Kleinjung; Arjen K. Lenstra; Peter L. Montgomery

A mixed radix-4/2 algorithm for modular multiplication/division suitable for VLSI implementation is proposed. The algorithm is based on the Montgomery method for modular multiplication and on the extended binary GCD algorithm for modular division. Both algorithms are modified and combined into the proposed algorithm so that almost all the hardware components are shared. The new algorithm carries out both calculations using simple operations such as shifts, additions, and subtractions. The radix-2 signed-digit representation is used to avoid carry propagation in all additions and subtractions. A modular multiplier/divider based on the algorithm performs an n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. The modular multiplier/divider has a linear array structure with a bit-slice feature and can be implemented with much less hardware than is needed to implement a multiplier and a divider separately.
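As a rough illustration of the division half of such an algorithm, the following sketch computes x/y mod M with a plain extended binary GCD that uses only shifts, additions, and subtractions. It is an assumption-laden software sketch, not the published hardware algorithm: it does not reproduce the mixed radix-4/2 processing or the signed-digit representation, and the function name `binary_gcd_moddiv` is hypothetical.

```python
def binary_gcd_moddiv(x, y, M):
    """Return x / y mod M for odd M and gcd(y, M) == 1 (illustrative sketch).

    Invariants maintained throughout:
        x1 * y == u * x (mod M)   and   x2 * y == v * x (mod M)
    """
    if M % 2 == 0 or not (0 < y < M):
        raise ValueError("M must be odd and 0 < y < M")

    def halve(t):
        # Divide t by 2 modulo M: add M first if t is odd (M is odd).
        return t >> 1 if t % 2 == 0 else (t + M) >> 1

    u, v = y, M
    x1, x2 = x % M, 0
    while u != 1 and v != 1:
        while u % 2 == 0:
            u >>= 1
            x1 = halve(x1)
        while v % 2 == 0:
            v >>= 1
            x2 = halve(x2)
        if u >= v:
            u -= v
            x1 = (x1 - x2) % M
        else:
            v -= u
            x2 = (x2 - x1) % M
    return x1 if u == 1 else x2
```

For example, `binary_gcd_moddiv(37, 55, 101)` returns a value q such that `q * 55 % 101 == 37`.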

Cryptographic Hardware and Embedded Systems | 2005

Bipartite modular multiplication

Marcelo E. Kaihara; Naofumi Takagi

This paper proposes a new modular multiplication method that uses Montgomery residues defined by a modulus M and a Montgomery radix R whose value is less than the modulus M. This condition enables the operand multiplier to be split into two parts that can be processed separately in parallel, increasing the calculation speed. The upper part of the split multiplier can be processed by calculating a product modulo M of the multiplicand and this part of the split multiplier. The lower part of the split multiplier can be processed by calculating a product modulo M of the multiplicand, this part of the split multiplier, and the inverse of a constant R. Two different implementations based on this method are proposed: one uses a classical modular multiplier and a Montgomery multiplier, and the other generates partial products for each part of the split multiplier separately, which are added and accumulated in a single pipelined unit. A radix-4 version of a multiplier based on a radix-4 classical modular multiplier and a radix-4 Montgomery multiplier has been designed and simulated. The proposed method is also suitable for software implementation in a multiprocessor environment.
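The splitting idea can be checked numerically with a short sketch. It assumes R = 2^k with k smaller than the bit length of M, and it computes the two partial products sequentially with ordinary big-integer arithmetic rather than with a classical multiplier and a Montgomery multiplier running in parallel. The function name `bipartite_modmul` and the parameter `k` are hypothetical.

```python
def bipartite_modmul(x, y, M, k):
    """Return x * y * R^-1 mod M with R = 2**k (illustrative sketch).

    The multiplier y is split as y = y_hi * R + y_lo, so that
        x * y * R^-1 = x * y_hi + x * y_lo * R^-1   (mod M).
    Requires odd M so that R is invertible modulo M.
    """
    R = 1 << k
    y_hi, y_lo = y >> k, y & (R - 1)
    R_inv = pow(R, -1, M)              # modular inverse, Python 3.8+
    upper = (x * y_hi) % M             # part suited to a classical multiplier
    lower = (x * y_lo * R_inv) % M     # part suited to a Montgomery multiplier
    return (upper + lower) % M
```

The two partial products are independent of each other, which is what allows them to be evaluated in parallel in a hardware or multiprocessor setting.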

Parallel Processing and Applied Mathematics | 2009

Montgomery multiplication on the Cell

Joppe W. Bos; Marcelo E. Kaihara

We describe a Cell processor implementation of Pollard's rho method to solve discrete logarithms in groups of elliptic curves over prime fields. The implementation was used on a cluster of PlayStation 3 game consoles to set a new record. We present in detail the underlying single-instruction, multiple-data (SIMD) modular arithmetic.
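To make the method concrete, here is a minimal sketch of Pollard's rho for discrete logarithms. As a deliberate simplification it works in a prime-order subgroup of the multiplicative group modulo a prime p instead of an elliptic curve group, and it uses Floyd cycle detection on a single walk rather than any parallel or SIMD arithmetic. The function name `pollard_rho_dlog` and the three-way partition by `x % 3` are assumptions made for illustration.

```python
def pollard_rho_dlog(g, h, p, n):
    """Find x with g**x == h (mod p), where g has prime order n (sketch)."""

    def step(x, a, b):
        # Pseudo-random walk that keeps x == g**a * h**b (mod p).
        s = x % 3
        if s == 0:
            return (x * x) % p, (2 * a) % n, (2 * b) % n
        if s == 1:
            return (x * g) % p, (a + 1) % n, b
        return (x * h) % p, a, (b + 1) % n

    for start in range(1, n):
        tortoise = (pow(g, start, p), start, 0)
        hare = tortoise
        while True:
            tortoise = step(*tortoise)
            hare = step(*step(*hare))
            if tortoise[0] == hare[0]:
                break
        # g**a1 * h**b1 == g**a2 * h**b2  =>  x = (a1 - a2) / (b2 - b1) mod n
        db = (hare[2] - tortoise[2]) % n
        if db == 0:
            continue  # useless collision, retry from another starting point
        da = (tortoise[1] - hare[1]) % n
        return (da * pow(db, -1, n)) % n  # modular inverse, Python 3.8+
    return None
```

With p = 1019, the element g = 4 has prime order n = 509, so `pollard_rho_dlog(4, pow(4, 123, 1019), 1019, 509)` recovers an exponent x satisfying `pow(4, x, 1019) == pow(4, 123, 1019)`.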

Symposium on Computer Arithmetic | 2003

A VLSI algorithm for modular multiplication/division

Marcelo E. Kaihara; Naofumi Takagi

This paper proposes a new fast method for calculating modular multiplication. The calculation is performed using a new representation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling the calculation speed. The upper part and the lower part of the multiplier are processed using the interleaved modular multiplication algorithm and the Montgomery algorithm, respectively. Conversions back and forth between the original integer set and the new residue system can be performed at speeds up to twice that of the Montgomery method without the need for precomputed constants. This new method is suitable for both hardware implementation and software implementation in a multiprocessor environment. Although this paper focuses on the application of the new method in the integer field, the technique used to speed up the calculation can also easily be adapted for operation in the binary extended field GF(2^m).

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2005

A Hardware Algorithm for Modular Multiplication/Division Based on the Extended Euclidean Algorithm

Marcelo E. Kaihara; Naofumi Takagi

A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed. The technique consists of splitting a number into four consecutive parts. These parts are placed one by one in each of the four element positions of a vector, representing columns in a 4-SIMD organization. This representation enables arithmetic to be performed in a 4-SIMD fashion. An implementation of Montgomery multiplication using this technique is up to 2.47 times faster than an unrolled implementation of Montgomery multiplication, which is part of the IBM multi-precision math library, for odd moduli of length 160 to 2048 bits. The presented technique can also be applied to speed up Montgomery multiplication on other SIMD architectures.
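For readers unfamiliar with the underlying operation, the following is a minimal scalar sketch of Montgomery multiplication using Python's arbitrary-precision integers. It does not reproduce the four-way SIMD operand layout described above; the function names `montgomery_setup` and `montgomery_multiply` and the parameter `n_bits` are hypothetical.

```python
def montgomery_setup(M, n_bits):
    """Return (R, M_prime) with R = 2**n_bits and M_prime = -M^-1 mod R.

    Requires odd M with M < R.
    """
    R = 1 << n_bits
    M_prime = (-pow(M, -1, R)) % R     # modular inverse, Python 3.8+
    return R, M_prime


def montgomery_multiply(a_bar, b_bar, M, n_bits, M_prime):
    """Return a_bar * b_bar * R^-1 mod M for inputs in [0, M)."""
    mask = (1 << n_bits) - 1
    t = a_bar * b_bar
    m = ((t & mask) * M_prime) & mask  # chosen so that t + m*M is divisible by R
    u = (t + m * M) >> n_bits          # exact division by R, a plain shift
    return u - M if u >= M else u      # single conditional subtraction


# Usage: multiply 123456 by 654321 modulo an odd modulus.
M, n_bits = 1000003, 20
R, M_prime = montgomery_setup(M, n_bits)
a_bar = (123456 * R) % M               # convert operands to Montgomery form
b_bar = (654321 * R) % M
prod_bar = montgomery_multiply(a_bar, b_bar, M, n_bits, M_prime)
product = montgomery_multiply(prod_bar, 1, M, n_bits, M_prime)  # convert back
```

The reduction replaces division by M with masking and shifting by a power of two, which is why the operation maps well onto word-level and vector hardware.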

IACR Cryptology ePrint Archive | 2009

On the security of 1024-bit RSA and 160-bit elliptic curve cryptography

Joppe W. Bos; Marcelo E. Kaihara; Thorsten Kleinjung; Arjen K. Lenstra; Peter L. Montgomery

We propose an algorithm for modular multiplication/division suitable for VLSI implementation. The algorithm is based on Montgomery's method for modular multiplication and on the extended binary GCD algorithm for modular division. It can perform either of these operations with a reduced amount of hardware. Both calculations are carried out through iterations of simple operations such as shifts and additions/subtractions. The radix-2 signed-digit representation is employed so that all additions and subtractions are performed without carry propagation. A modular multiplier/divider based on this algorithm has a linear array structure with a bit-slice feature and carries out an n-bit modular multiplication in at most ⌊2(n+2)/3⌋+3 clock cycles and an n-bit modular division in at most 2n+5 clock cycles, where the length of the clock cycle is constant and independent of n.

Flash informatique | 2009

Massively parallel number crunching at EPFL

Joppe W. Bos; Marcelo E. Kaihara; Thorsten Kleinjung; Arjen K. Lenstra; A. D. Osvik

A hardware algorithm for modular multiplication/division that performs modular division, Montgomery multiplication, and ordinary modular multiplication is proposed. The modular division in our algorithm is based on the extended Euclidean algorithm. To calculate Montgomery multiplication, we employ our newly proposed computation method, which processes the multiplier from the most significant digit first. Finally, the ordinary modular multiplication is based on shift-and-add multiplication. Each of these three operations is carried out through the iteration of simple operations such as shifts and additions/subtractions. To avoid carry propagation in all additions and subtractions, the radix-2 signed-digit representation is employed. A modular multiplier/divider based on the algorithm has a linear array structure with a bit-slice feature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implemented with only slightly more hardware than the modular divider alone.