Gavin Xiaoxu Yao | Researchain

Publication

Featured researches published by Gavin Xiaoxu Yao.

cryptographic hardware and embedded systems | 2011

FPGA implementation of pairings using residue number system and lazy reduction

Ray C. C. Cheung; Sylvain Duquesne; Junfeng Fan; Nicolas Guillermin; Ingrid Verbauwhede; Gavin Xiaoxu Yao

Recently, a lot of progress has been made in the implementation of pairings in both hardware and software. In this paper, we present two FPGA-based high-speed pairing designs using the Residue Number System (RNS) and lazy reduction. We show that by combining RNS, which is naturally suitable for parallel architectures, and lazy reduction, which performs one reduction for multiple multiplications, the speed of pairing computation in hardware can be greatly increased. The results show that both designs achieve higher speed than previous designs. The fastest version computes an optimal ate pairing at 126-bit security level in 0.573 ms, which is 2 times faster than all previous hardware implementations at the same security level.
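The two ideas named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' design: the base `BASE`, the prime `p` used in the usage note, and all function names are assumptions chosen for readability, and the final reduction converts out of RNS through the Chinese Remainder Theorem, whereas a hardware design would typically stay in RNS and use a Montgomery-style reduction.

```python
from math import prod

# Hypothetical small RNS base: pairwise coprime moduli (illustrative only).
BASE = (251, 253, 255, 256)
M = prod(BASE)  # dynamic range: values in [0, M) are represented uniquely


def to_rns(x):
    """Represent x by its residues modulo each base element."""
    return tuple(x % m for m in BASE)


def rns_add(a, b):
    """Channel-wise addition; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, BASE))


def rns_mul(a, b):
    """Channel-wise multiplication; each channel is independent of the others."""
    return tuple((x * y) % m for x, y, m in zip(a, b, BASE))


def from_rns(r):
    """Reconstruct the integer in [0, M) with the Chinese Remainder Theorem."""
    total = 0
    for ri, m in zip(r, BASE):
        Mi = M // m
        total += ri * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M


def lazy_sum_of_products(pairs, p):
    """Accumulate several products in RNS, then reduce modulo p only once.

    Assumes the exact sum of products stays below M, so no information is lost.
    """
    acc = to_rns(0)
    for a, b in pairs:
        acc = rns_add(acc, rns_mul(to_rns(a), to_rns(b)))
    return from_rns(acc) % p  # the single, deferred reduction
```

For example, `lazy_sum_of_products([(123, 456), (789, 1000)], 1009)` gives the same value as `(123 * 456 + 789 * 1000) % 1009`, but performs one reduction modulo 1009 instead of two.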

international conference on pairing based cryptography | 2012

Faster pairing coprocessor architecture

Gavin Xiaoxu Yao; Junfeng Fan; Ray C. C. Cheung; Ingrid Verbauwhede

In this paper, we present a high-speed pairing coprocessor using the Residue Number System (RNS), which is intrinsically suitable for parallel computation. This work improves the design of Cheung et al. [11] using a carefully selected RNS base and an optimized pipeline design of the modular multiplier. As a result, the cycle count for a modular reduction has been halved. When combined with lazy reduction, Karatsuba-like formulas and optimal pipeline scheduling, a 128-bit optimal ate pairing computation can be completed in less than 100,000 cycles. We prototype the design on a Xilinx Virtex-6 FPGA using 5237 slices and 64 DSPs; a 128-bit pairing is computed in 0.358 ms running at 230 MHz. To the best of our knowledge, this implementation outperforms all reported hardware and software designs.

IEEE Transactions on Computers | 2014

Novel RNS Parameter Selection for Fast Modular Multiplication

Gavin Xiaoxu Yao; Junfeng Fan; Ray C. C. Cheung; Ingrid Verbauwhede

The parameter selection of a Residue Number System (RNS) has a great impact on its computational efficiency. This paper shows that a base extension, the most costly operation in RNS Montgomery multiplication, can be more efficient when the intervals between the RNS moduli are small. We propose a systematic RNS parameter selection procedure and two methods to select RNS moduli that lead to a reduced complexity. Our experimental results confirm the advantages of the selected moduli.
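To make "small intervals between the moduli" concrete, the sketch below greedily collects pairwise coprime moduli just below a power of two. This greedy search is an assumption made for illustration only; it is not the selection procedure proposed in the paper, and the function name `select_moduli` is hypothetical.

```python
from math import gcd


def select_moduli(w, n):
    """Greedily pick n pairwise coprime moduli, starting at 2**w and counting down.

    Illustrative heuristic only: it keeps the moduli close together, but it is
    not the systematic procedure described in the abstract above.
    """
    chosen = []
    candidate = 2 ** w
    while len(chosen) < n:
        if candidate < 2:
            raise ValueError("not enough coprime moduli below 2**w")
        # Accept the candidate only if it shares no factor with earlier picks.
        if all(gcd(candidate, m) == 1 for m in chosen):
            chosen.append(candidate)
        candidate -= 1
    return chosen
```

With `w = 8` and `n = 4` the search returns `[256, 255, 253, 251]`, so the largest gap between any two moduli is 5.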

IEEE Transactions on Computers | 2016

Parameter Space for the Architecture of FFT-Based Montgomery Modular Multiplication

Donald Donglong Chen; Gavin Xiaoxu Yao; Ray C. C. Cheung; Derek Chi-Wai Pao; Çetin Kaya Koç

Modular multiplication is the core operation in public-key cryptographic algorithms such as RSA and the Diffie-Hellman algorithm. The efficiency of the modular multiplier plays a crucial role in the performance of these cryptographic methods. In this paper, improvements to FFT-based Montgomery Modular Multiplication (FFTM3) using carry-save arithmetic and pre-computation techniques are presented. Moreover, the pseudo-Fermat number transform is used to enrich the supported operand sizes for the FFTM3. The asymptotic complexity of our method is O(l log l log log l), which is the same as that of the Schönhage-Strassen multiplication algorithm (SSA). A systematic procedure to select a suitable parameter set for the FFTM3 is provided. Prototypes of the improved FFTM3 multiplier with appropriate parameter sets are implemented on a Xilinx Virtex-6 FPGA. Our method can perform 3,100-bit and 4,124-bit modular multiplications in 6.74 and 7.78 μs, respectively. It offers better computation latency and area-latency product compared to the state-of-the-art methods for operand sizes of 3,072 bits and above.
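For readers unfamiliar with Montgomery modular multiplication, the following is a minimal sketch of the textbook reduction step (REDC) with R = 2^k. It is not the FFTM3 architecture: the large products that the paper computes with FFT-based techniques are computed here with Python's built-in integer multiplication, and the function names are assumptions.

```python
def montgomery_setup(n, k):
    """Return R = 2**k and n' = -n^{-1} mod R for an odd modulus n < R."""
    r = 1 << k
    if n % 2 == 0 or n >= r:
        raise ValueError("n must be odd and smaller than 2**k")
    n_prime = (-pow(n, -1, r)) % r  # modular inverse, Python 3.8+
    return r, n_prime


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R by construction
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a, b, n, k):
    """Compute a * b mod n through the Montgomery domain."""
    _, n_prime = montgomery_setup(n, k)
    a_bar = (a << k) % n  # a * R mod n
    b_bar = (b << k) % n  # b * R mod n
    c_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * R mod n
    return redc(c_bar, n, k, n_prime)           # leave the Montgomery domain
```

For instance, `mont_mul(1234, 5678, 10007, 14)` agrees with `(1234 * 5678) % 10007`. The point of the technique is that `redc` replaces division by `n` with masking and shifting by `k` bits, which is what makes it attractive in hardware.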

field-programmable technology | 2010

Reconfigurable Number Theoretic Transform architectures for cryptographic applications

Gavin Xiaoxu Yao; Ray C. C. Cheung; Çetin Kaya Koç; Kim-Fung Man

Efficient FPGA architectures for Number Theoretic Transforms (NTTs), an important component of Spectral Modular Arithmetic (SMA) cryptographic co-processors, are discussed in this paper. We analyze the characteristics of NTTs for cryptographic applications, compare different arithmetic approaches, introduce an optimized solution for FPGA implementation, and develop several different architectures. Qualitative and quantitative analyses are provided to show the effectiveness of our proposed architectures.
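As a rough illustration of what an NTT computes, the sketch below uses a toy transform of length 8 over the integers modulo 17, with 2 as a primitive 8th root of unity. The parameters `P`, `N` and `OMEGA` and the naive O(N^2) evaluation are assumptions made for clarity; they do not reflect the architectures or operand sizes in the paper.

```python
# Toy parameters (assumptions): 17 = 2**4 + 1 is a Fermat prime, and
# 2 has multiplicative order 8 modulo 17, so it is a primitive 8th root of unity.
P = 17
N = 8
OMEGA = 2


def ntt(a, omega=OMEGA):
    """Naive length-N number theoretic transform modulo P."""
    return [sum(a[j] * pow(omega, i * j, P) for j in range(N)) % P
            for i in range(N)]


def intt(spectrum):
    """Inverse transform: use omega^{-1} and scale by N^{-1} modulo P."""
    inv_n = pow(N, -1, P)
    inv_omega = pow(OMEGA, -1, P)
    return [(x * inv_n) % P for x in ntt(spectrum, inv_omega)]


def cyclic_convolution(a, b):
    """Length-N cyclic convolution modulo P via pointwise spectral products."""
    pointwise = [(x * y) % P for x, y in zip(ntt(a), ntt(b))]
    return intt(pointwise)
```

Calling `cyclic_convolution` on two length-8 coefficient lists gives the same result as multiplying the polynomials modulo x^8 - 1 with coefficients reduced modulo 17. Because the root of unity is a power of two, the multiplications by its powers reduce to shifts, which is one reason Fermat-style moduli are often considered for hardware.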

field-programmable technology | 2012

Low complexity and hardware-friendly spectral modular multiplication

Donald Donglong Chen; Gavin Xiaoxu Yao; Çetin Kaya Koç; Ray C. C. Cheung

The Schönhage-Strassen Algorithm (SSA) is an asymptotically fast multiplication algorithm with a complexity of O(l log l log log l), where l is the operand size. It outperforms other multiplication algorithms when l is large enough. One possible use of such long integer multiplication is in cryptography. Inspired by SSA, the Interleaved Spectral Montgomery Modular Multiplication (ISM3) algorithm is proposed to accelerate modular multiplication. The ISM3 algorithm primarily interleaves the Montgomery modular multiplication algorithm between the time and spectral (frequency) domains. We show that the tasks in each step of the proposed algorithm have little data dependency and hence are extremely suitable for hardware implementation. We present the parallel ISM3 architecture and implement it on Xilinx Virtex-II and Virtex-6 FPGAs. Experimental results show that our 3838-bit ISM3 is faster than the previous Montgomery multiplier. Moreover, our design can complete a 7678-bit modular multiplication in 3398 cycles, or 17.98 μs, on a Virtex-6 device.

2014 International Symposium on Integrated Circuits (ISIC) | 2014

Zero collision attack and its countermeasures on Residue Number System multipliers

Marc Stöttinger; Gavin Xiaoxu Yao; Ray C. C. Cheung

The Residue Number System (RNS) has been introduced to accelerate the modular multiplications in public-key cryptography. In this contribution, we investigate the side-channel leakage of RNS multipliers used in an elliptic curve cryptosystem. In addition to a threat analysis of the zero collision attack, we investigate different countermeasures to cope with such a physical attack. The resistance against side-channel attacks is improved without great area overhead or loss of speed performance.

Archive | 2017

Side Channel Attacks and Their Low Overhead Countermeasures on Residue Number System Multipliers

Gavin Xiaoxu Yao; Marc Stöttinger; Ray C. C. Cheung; Sorin A. Huss

Due to its natural parallelism and speed enhancement, the Residue Number System (RNS) has been introduced to perform the modular multiplications in public-key cryptography. In this work, we examine the security of RNS under side channel attacks, expose the vulnerabilities, and propose countermeasures accordingly. The proposed methods improve the resistance against side channel attacks without great area overhead or loss of speed performance, and are compatible with other countermeasures on both the logic level and the algorithm level. We prototype the proposed design on FPGA, and the implementation results confirm the efficiency of the proposed countermeasures.

rapid system prototyping | 2010

Counter Embedded Memory architecture for trusted computing platform

Gavin Xiaoxu Yao; Ray C. C. Cheung; Kim-Fung Man

Due to various hacker attacks, trusted computing platforms have received a lot of attention recently. Encryption is introduced to maintain the confidentiality of data stored on such platforms, while Message Authentication Codes (MACs) and authentication trees are employed to verify the integrity of data memory. These encryption and authentication architectures suffer from several potential vulnerabilities which have been overlooked by previous work. In this paper, we first address our concern about a type of cryptanalysis: a ciphertext stored in memory can be decrypted and attacked by an adversary, and the MACs and the authentication trees would become victims of cryptanalytic attacks. In addition, we show that such an attack can be extended to multi-core systems by simply corrupting other unprotected cores and performing malicious behaviors. To handle these scenarios, we propose a Counter Embedded Memory (CEM) design, which employs embedded counters to record every data fetch and trace malicious operations. The proposed platform with CEM allows the system to trace unexpected memory accesses and can thus indicate a potential attack in progress. We present both qualitative discussion and quantitative analysis to show the effectiveness of the proposed architecture. Our FPGA rapid prototype shows that the additional memory overhead is only 0.10% and the latency is negligible.

Lecture Notes in Computer Science | 2012