Wai-Kong Lee
Universiti Tunku Abdul Rahman
Publication
Featured research published by Wai-Kong Lee.
Cluster Computing | 2018
Wai-Kong Lee; Raphael C.-W. Phan; Geong-Sen Poh; Bok-Min Goi
The emergence of cloud computing is revolutionizing the way we store, query, analyze and consume data, and it has brought forward other developments that have fundamentally changed our lifestyle. For example, Industry 4.0 and the Internet of Things (IoT) can improve the quality of manufacturing and many aspects of daily life; both rely heavily on the cloud computing platform. Central to this paradigm shift is the need to keep common data, often held at remote outsourced locations and accessed by different authorized parties, secure from being leaked to unauthorized entities. When using cloud services, a consumer may want to encrypt sensitive data before uploading it to the cloud, but this also eliminates the possibility of searching the data efficiently in cloud storage. A more practical solution is to employ a searchable encryption scheme in the cloud storage, so that users can query the encrypted data efficiently without revealing the sensitive data to the service provider. Besides security and search features, the performance of searchable encryption schemes is also very important in practical applications. In this paper, we propose several techniques to accelerate the search performance of encrypted data stored in the cloud. Notably, our techniques include massively parallel file encryption, a multi-array keyword red-black tree (KRBT) implementation, batched keyword search and enhanced parallel search in the KRBT. To the best of our knowledge, SearchaStore is the first work that attempts to accelerate searchable encryption using GPU technology.
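The idea of querying encrypted data without revealing keywords to the provider can be illustrated with a minimal Python sketch. It is not the SearchaStore construction: it assumes keyed HMAC-SHA256 tokens as the keyword index, and a sorted array with binary search stands in for the KRBT. The names `keyword_token`, `build_index` and `search_batch` are hypothetical, and the sketch ignores leakage concerns that a real scheme must address.

```python
import hashlib
import hmac
from bisect import bisect_left


def keyword_token(key, keyword):
    """Client side: derive a deterministic search token from a keyword."""
    normalized = keyword.lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).digest()


def build_index(key, documents):
    """Client side: map each keyword token to the ids of documents containing it.

    `documents` maps a document id to an iterable of plaintext keywords.
    The result is a list of (token, doc_ids) pairs sorted by token, so the
    server never sees a plaintext keyword.
    """
    postings = {}
    for doc_id, keywords in documents.items():
        for token in {keyword_token(key, kw) for kw in keywords}:
            postings.setdefault(token, []).append(doc_id)
    return sorted((token, sorted(ids)) for token, ids in postings.items())


def search_batch(index, tokens):
    """Server side: answer a batch of token queries by binary search.

    A sorted array is used here as a simple stand-in for a balanced
    search tree; each query in the batch is independent of the others.
    """
    keys = [token for token, _ in index]
    results = []
    for token in tokens:
        pos = bisect_left(keys, token)
        if pos < len(keys) and keys[pos] == token:
            results.append(index[pos][1])
        else:
            results.append([])
    return results
```

Because each lookup in the batch is independent, the loop in `search_batch` is the kind of work that could be spread across parallel threads.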
international symposium on intelligent signal processing and communication systems | 2014
Wai-Kong Lee; Bok-Min Goi; Raphael C.-W. Phan; Geong Sen Poh
Recently, the GPU has been widely accepted in the research community as an effective accelerator for many existing algorithms. In this paper, we contribute to the cryptography research community by presenting high-speed implementations of symmetric block ciphers on the GPU platform. We implemented Camellia, CAST5 and SEED on an NVIDIA GTX 680 and present the details of the implementation techniques together with benchmarking results against existing solutions. According to the evaluation results, we achieve throughputs of 61.1 Gbps, 45.5 Gbps and 47.4 Gbps for Camellia, CAST5 and SEED respectively, without considering the data transfer between CPU and GPU. When the data transfer is considered, the throughput for Camellia, CAST5 and SEED drops to 44.9 Gbps, 40.5 Gbps and 38.6 Gbps respectively.
international carnahan conference on security technology | 2017
Wai-Kong Lee; Xian-Fu Wong; Bok-Min Goi; Raphael C.-W. Phan
With the advent of cloud computing and IoT, secure communication has become an important means of protecting users and service providers from malicious attacks. However, the adoption of SSL/TLS is still not widespread, due to the heavy computational requirements of implementing it on the server side. Current solutions often rely on installing costly hardware accelerators to compute the cryptographic algorithms in order to offer a responsive experience to users (e.g. online payment and cloud storage). In this paper, we propose to utilize the GPU as an accelerator to compute the cryptographic algorithms, which is more cost-effective than a dedicated hardware accelerator. Firstly, we present several techniques that utilize the massively parallel architecture of the GPU to compute block ciphers (AES, Camellia, CAST5 and SEED) and public key cryptography (RSA). Secondly, we present a novel idea that utilizes the warp shuffle instruction to speed up the implementation of SHA-3. Thirdly, we evaluate the performance of our implementation on a state-of-the-art GPU (Pascal architecture). Through extensive experiments, we show that CUDA-SSL is capable of achieving high-speed cryptographic computation comparable to hardware accelerators, at only a fraction of their cost.
International Conference on Mobile and Wireless Technology | 2017
Xian-Fu Wong; Bok-Min Goi; Wai-Kong Lee; Raphael C.-W. Phan
RSA is an algorithm widely used to protect the key exchange between two parties for secure mobile and wireless communication. Modular exponentiation is the main operation involved in RSA, and it is very time-consuming when the bit size is large, usually in the range of 1024 to 4096 bits. The speed of RSA becomes a concern when thousands or millions of authentication requests from a massive number of connected mobile and wireless devices need to be handled by the server at a time. The performance of RSA can be improved by utilizing a parallel computing architecture or by enhancing the existing modular exponentiation algorithm. In this paper, we exploit the massively parallel architecture of the GPU to perform RSA computations. Various optimization techniques are proposed to achieve higher throughput in RSA computation on two GPU platforms. Moreover, we also incorporate signed-digit recoding to further improve the performance. To allow a fair comparison with existing implementation techniques, we propose to evaluate the speed performance in the best case (least ‘0’ in exponent bits), average case (random exponent bits) and worst case (all ‘1’ in exponent bits). The overall throughput achieved by our implementation is about 12% higher for random exponent bits and 50% higher for all-1 exponent bits compared to the implementation without the signed-digit recoding technique. Our implementation is able to achieve 17713 and 89043 1024-bit modular exponentiations per second on random exponent bits on the GTX 960M and GTX 1080, which represent two state-of-the-art GPU architectures.
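The effect of signed-digit recoding on modular exponentiation can be sketched in Python as follows. This is an illustration under assumptions, not the authors' GPU code: it uses the non-adjacent form (NAF) as the recoding, which the abstract does not specify, and it obtains the inverse of the base with Python's built-in `pow(base, -1, modulus)` (Python 3.8+), which requires the base to be coprime to the modulus. The function names `naf` and `modexp_naf` are hypothetical.

```python
def naf(exponent):
    """Return the non-adjacent form of a non-negative integer.

    Digits are in {-1, 0, 1}, least significant first, and no two
    adjacent digits are both non-zero.
    """
    digits = []
    while exponent > 0:
        if exponent & 1:
            # Choose +1 when exponent = 1 (mod 4) and -1 when exponent = 3 (mod 4).
            digit = 2 - (exponent % 4)
            exponent -= digit
        else:
            digit = 0
        digits.append(digit)
        exponent >>= 1
    return digits


def modexp_naf(base, exponent, modulus):
    """Left-to-right square-and-multiply over the NAF digits of the exponent.

    A digit of -1 multiplies by the modular inverse of the base, so the
    base must be invertible modulo `modulus` (an assumption of this sketch).
    """
    if modulus == 1:
        return 0
    base %= modulus
    base_inv = pow(base, -1, modulus)
    result = 1
    for digit in reversed(naf(exponent)):
        result = result * result % modulus
        if digit == 1:
            result = result * base % modulus
        elif digit == -1:
            result = result * base_inv % modulus
    return result
```

For an exponent whose bits are all 1, plain binary exponentiation performs a multiplication at every bit, whereas the NAF of such an exponent has only two non-zero digits. That is consistent with the all-1 case showing the largest gain from recoding, although the extra cost of computing the inverse is not modelled here.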
high performance computing and communications | 2016
Boon-Chiao Chang; Bok-Min Goi; Raphael C.-W. Phan; Wai-Kong Lee
Multiple precision multiplication is widely used in scientific computing and cryptography. When the size of an integer grows beyond the computer's native precision (32-bit or 64-bit), the computational cost of multiplication becomes significant. In this paper, we propose a novel solution to implement multiple precision multiplication on a massively parallel GPU with the Kepler architecture. Our implementation is designed based on the Chinese Remainder Theorem and the Number Theoretic Transform (NTT) with a 64-bit prime. We implemented three versions of multiple precision multiplication, which utilize global memory, shared memory and registers to store the precomputed twiddle factors. The register version uses the warp shuffle instruction (available in GPUs with the Kepler architecture) to exchange data among threads within the same warp. This technique avoids the bank conflict issue in shared memory and allows faster computation on the GPU. To the best of our knowledge, this is the first implementation reported in the literature that utilizes the warp shuffle instruction to accelerate NTT computation. Our best implementation is able to perform 1024-bit, 2048-bit, 4096-bit and 8192-bit multiplication in 0.095 ms, 0.169 ms, 0.444 ms and 1.113 ms respectively.
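A minimal Python sketch of NTT-based multiple precision multiplication is given below. It makes several assumptions that the abstract does not confirm: the 64-bit prime is taken to be 2^64 - 2^32 + 1 with generator 7, operands are split into 16-bit limbs, twiddle factors are computed on the fly rather than precomputed, and a single prime is used without the Chinese Remainder Theorem step. It runs sequentially and does not model global memory, shared memory or warp shuffle.

```python
P = 2**64 - 2**32 + 1   # assumed 64-bit NTT-friendly prime
G = 7                   # a generator of the multiplicative group mod P
LIMB_BITS = 16          # assumed limb size; keeps convolution sums below P


def ntt(values, invert=False):
    """Iterative radix-2 NTT over Z_P; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)   # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def to_limbs(x):
    """Split a non-negative integer into LIMB_BITS-wide limbs, low limb first."""
    limbs = []
    while x:
        limbs.append(x & ((1 << LIMB_BITS) - 1))
        x >>= LIMB_BITS
    return limbs or [0]


def multiply(x, y):
    """Multiply two non-negative integers via pointwise products in the NTT domain."""
    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    conv = ntt([u * v % P for u, v in zip(fa, fb)], invert=True)
    # Recombine the convolution coefficients, which also propagates carries.
    return sum(c << (LIMB_BITS * i) for i, c in enumerate(conv))
```

With 16-bit limbs each convolution coefficient is below n * 2^32, which stays under the prime for any practical transform length, so this sketch needs no second modulus.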
international conference on it convergence and security, icitcs | 2015
Hon-Sang Cheong; Wai-Kong Lee
The GPU is widely used in various applications that require huge computational power. In this paper, we contribute to the cryptography research community by presenting techniques to accelerate symmetric block ciphers (IDEA, Blowfish and Threefish) on an NVIDIA GTX 690 with the Kepler architecture. The results are benchmarked against an implementation in OpenMP and existing GPU implementations in the literature. We are able to achieve encryption throughputs of 90.3 Gbps, 50.82 Gbps and 83.71 Gbps for IDEA, Blowfish and Threefish respectively. A block cipher can be used as a pseudorandom number generator (PRNG) when it operates in counter mode (CTR), but it is usually slower than other PRNGs that use lighter operations. Hence, we attempt to modify IDEA and Blowfish in order to achieve faster pseudorandom number generation. The modified IDEA and Blowfish pass all of the NIST Statistical Test Suite and TestU01 SmallCrush tests, but not the more stringent tests in TestU01 (Crush and BigCrush).
Cluster Computing | 2016
Wai-Kong Lee; Hon-Sang Cheong; Raphael C.-W. Phan; Bok-Min Goi
Nonlinear Dynamics | 2018
Wai-Kong Lee; Raphael C.-W. Phan; Wun-She Yap; Bok-Min Goi
international conference on information networking | 2018
Hoon-Keng Poon; Wun-She Yap; Yee-Kai Tee; Bok-Min Goi; Wai-Kong Lee
IEEE Access | 2018
Wai-Kong Lee; Raphael C.-W. Phan; Bok-Min Goi; Lanxiang Chen; Xiujun Zhang; Naixue Xiong