Publication


Featured research published by Ray C. C. Cheung.


IEEE Transactions on Computers | 2017

Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication

Wangchen Dai; Donald Donglong Chen; Ray C. C. Cheung; Çetin Kaya Koç

The modular multiplication operation is the most time-consuming operation for number-theoretic cryptographic algorithms involving large integers, such as RSA and Diffie-Hellman. Implementations reveal that more than 75 percent of the time is spent in the modular multiplication function within RSA for moduli larger than 1,024 bits. There are fast multiplier architectures that minimize the delay and increase the throughput using parallelism and pipelining. However, such designs are large in terms of area and low in efficiency. In this paper, we integrate the fast Fourier transform (FFT) method into McLaughlin's framework, and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm achieving high area-time efficiency. Compared to the previous FFT-based designs, we avoid the zero-padding operation by computing the modular multiplication steps directly using cyclic and nega-cyclic convolutions. Thus, we reduce the convolution length by half. Furthermore, supported by the number-theoretic weighted transform, the FFT algorithm is used to provide fast convolution computation. We also introduce a general method for efficient parameter selection for the proposed algorithm. Architectures with single and double butterfly structures are designed to obtain low area-latency solutions, which we implemented on Xilinx Virtex-6 FPGAs. The results show that our work offers better area-latency efficiency than the state-of-the-art FFT-based MMM architectures for operand sizes of 1,024 bits and above. We have obtained area-latency efficiency improvements of up to 50.9 percent for 1,024-bit, 41.9 percent for 2,048-bit, 37.8 percent for 4,096-bit and 103.2 percent for 7,680-bit operands. Furthermore, the design also achieves lower operating latency at a high clock frequency for transforms of length 64 and above.
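For readers unfamiliar with the operation the abstract refers to, the following is a minimal sketch of textbook integer Montgomery multiplication. It is not the FFT-based architecture of the paper: it only illustrates how the reduction modulo n is replaced by masks and shifts with R = 2^k. The function names (montgomery_setup, mont_mul, modmul) are hypothetical, and the three-argument pow with a negative exponent assumes Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n and R = 2**k > n."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than R = 2**k")
    r = 1 << k
    # n_prime satisfies n * n_prime ≡ -1 (mod R)
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Return a_bar * b_bar * R^(-1) mod n without dividing by n."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    # m is chosen so that t + m * n is divisible by R
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k
    # u lies in [0, 2n), so one conditional subtraction suffices
    return u - n if u >= n else u


def modmul(a, b, n, k):
    """Compute a * b mod n by way of the Montgomery domain."""
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n          # map operands into the Montgomery domain
    b_bar = (b * r) % n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return mont_mul(c_bar, 1, n, k, n_prime)  # map the result back


if __name__ == "__main__":
    n = (1 << 61) - 1
    print(modmul(123456789, 987654321, n, 64) == (123456789 * 987654321) % n)
```

In a real exponentiation (such as RSA) the operands stay in the Montgomery domain across many multiplications, so the conversion cost in modmul is paid only once at the start and end.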


IEEE Transactions on Computers | 2017

Area-Time Efficient Computation of Niederreiter Encryption on QC-MDPC Codes for Embedded Hardware

Jingwei Hu; Ray C. C. Cheung

In this paper, we present a fast implementation of QC-MDPC Niederreiter encryption. Existing high-speed implementations are resource-intensive; the solution we propose here mitigates this while maintaining high throughput. In particular, new arithmetic for lightweight Hamming weight computation and a fast sorting network for MDPC decoding are proposed. A novel constant weight coding unit is proposed to enable standard asymmetric encryption. At the time of writing, the design presented in this work is the fastest QC-MDPC code-based encryption implementation in the public domain. The area-time product of this work drops by at least 53 percent compared to previous high-speed designs of QC-MDPC-based encryption. It is shown, for instance, that our encrypting engine can perform one encryption in 3.86 μs on a Xilinx Virtex-6 FPGA with 3371 slices. Our iterative decrypting engine can decrypt one ciphertext in 114.64 μs with 5271 slices, and our faster non-iterative decrypting engine can decrypt in 65.76 μs with 8781 slices.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2017

Compact Constant Weight Coding Engines for the Code-Based Cryptography

Jingwei Hu; Ray C. C. Cheung; Tim Güneysu

We present here a more memory-efficient method for encoding binary information into words of prescribed length and weight. Existing solutions require either complicated floating-point arithmetic or additional memory overhead, making them a challenge for resource-constrained computing environments. The solution we propose here solves these problems and obtains better coding efficiency through a memory-efficient approximation of the critical intermediate value in constant weight coding. At the time of writing, the design presented in this brief is the most compact one for any code-based encryption scheme.


Signal Processing: Image Communication | 2018

A fast inter CU decision algorithm for HEVC

Zhe Xu; Biao Min; Ray C. C. Cheung

A fast CU decision algorithm is very desirable for High Efficiency Video Coding (HEVC) due to its high encoding complexity. In this paper, a fast inter CU decision algorithm is proposed, and the motion correlations between neighboring CUs are discussed. The splitting decision of the collocated CU has a strong impact on the current CU; high-motion CUs are therefore split early by calculating the motion diversity of collocated CUs. On the other hand, SKIP mode indicates a motion-sharing relation among neighboring CUs and can be used to determine CU termination early. A discriminant function minimizing expected risk is defined for both early SKIP mode detection and SKIP-mode-based termination decision. Experimental results show that the proposed algorithm can reduce computational complexity by 48.2% with only a 0.46% BDBR increase for the random access configuration. For the low-delay B configuration, it can reduce complexity by 45.9% with a 0.55% BDBR increase. The results show that our algorithm achieves a smaller BDBR increase than other state-of-the-art works.


Journal of Cryptographic Engineering | 2018

Spectral arithmetic in Montgomery modular multiplication

Wangchen Dai; Ray C. C. Cheung


IEEE Transactions on Circuits and Systems II: Express Briefs | 2018

High-Speed Discrete Gaussian Sampler With Heterodyne Chaotic Laser Inputs

Yao Liu; Xiao-Zhou Li; Ray C. C. Cheung; Sze-Chun Chan; Hei Wong


International Conference on Systems, Signals and Image Processing | 2017

Fast HEVC intra coding decision based on statistical cost and corner detection

Biao Min; Zhe Xu; Ray C. C. Cheung


Science in China Series F: Information Sciences | 2017

A low power V-band LC VCO with high Q varactor technique in 40 nm CMOS process

Qian Zhou; Yan Han; Shifeng Zhang; Xiaoxia Han; Lu Jie; Ray C. C. Cheung; Guangtao Feng


Archive | 2017

Side Channel Attacks and Their Low Overhead Countermeasures on Residue Number System Multipliers

Gavin Xiaoxu Yao; Marc Stöttinger; Ray C. C. Cheung; Sorin A. Huss


Journal of Electromagnetic Waves and Applications | 2017

A V-band VCO with on-chip body bias voltage control technique using 40-nm CMOS process

Qian Zhou; Yan Han; Shifeng Zhang; Xiaoxia Han; Ray C. C. Cheung; Guangtao Feng

Collaboration


Dive into Ray C. C. Cheung's collaborations.

Top Co-Authors

Jingwei Hu

City University of Hong Kong


Wangchen Dai

City University of Hong Kong


Biao Min

City University of Hong Kong


Zhe Xu

City University of Hong Kong


Donald Donglong Chen

City University of Hong Kong


Guangtao Feng

Semiconductor Manufacturing International Corporation