Trong-Thuc Hoang
University of Electro-Communications
Publication
Featured research published by Trong-Thuc Hoang.
International Symposium on Circuits and Systems | 2016
Xuan-Thuan Nguyen; Hong-Thu Nguyen; Trong-Thuc Hoang; Katsumi Inoue; Osamu Shimojo; Toshio Murayama; Kenji Tominaga; Cong-Kha Pham
Recent years have witnessed massive growth in global data, driven by ubiquitous Internet-of-Things products, social networking services, and mobile devices. Fast database analytics has therefore attracted growing research interest. In this paper, a low-latency FPGA-based Database Processor (DBP) using a bitmap index is proposed. By exploiting the available embedded memory blocks and logic elements, a 50-MHz DBP is capable of performing 1,024 queries over all 32,768 4-KB records within about 3.31 ms. In other words, the DBP can scan nearly 37.76 GB of data per second.
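To illustrate the idea behind a bitmap index, here is a minimal software sketch. It is not the DBP hardware design: the toy table, the column names, and the helper functions `build_bitmap_index` and `matching_rows` are all hypothetical. Each distinct column value gets one bitmap, and a query reduces to bitwise AND/OR over those bitmaps. The last two lines re-derive the quoted throughput from the abstract's own figures, assuming "KB" and "GB" mean 2^10 and 2^30 bytes.

```python
from collections import defaultdict

def build_bitmap_index(records, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when record i holds that value."""
    index = defaultdict(int)
    for i, record in enumerate(records):
        index[record[column]] |= 1 << i
    return dict(index)

def matching_rows(bitmap):
    """Return the record numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical toy table; the column names are illustrative only.
records = [
    {"city": "Tokyo", "plan": "free"},
    {"city": "Hanoi", "plan": "paid"},
    {"city": "Tokyo", "plan": "paid"},
    {"city": "Osaka", "plan": "free"},
]
city = build_bitmap_index(records, "city")
plan = build_bitmap_index(records, "plan")

# "city = Tokyo AND plan = paid" becomes one bitwise AND.
tokyo_paid = matching_rows(city["Tokyo"] & plan["paid"])
# "city = Tokyo OR city = Osaka" becomes one bitwise OR.
tokyo_or_osaka = matching_rows(city["Tokyo"] | city["Osaka"])

# Throughput check using the figures quoted in the abstract.
scanned_bytes = 32_768 * 4 * 1024            # 32,768 records of 4 KB each
throughput_gib_s = scanned_bytes / 3.31e-3 / 2**30
```

Because each query touches only a few bitmaps and combines them with word-wide logic operations, this kind of index maps naturally onto embedded memory blocks and logic elements.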
IEICE Electronics Express | 2015
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
Although it was proposed more than 50 years ago, the COordinate Rotation DIgital Computer (CORDIC) remains one of the most effective algorithms for computing elementary functions. The original CORDIC, however, suffers from high latency because it always performs a fixed number of rotations. A low-latency hybrid adaptive (HA) CORDIC is therefore proposed in this paper. First, adaptive angle selection reduces the total number of iterations by up to 50% while delivering higher accuracy. Second, a hybrid architecture with fixed-point input and floating-point output reduces the total hardware utilization and extends the dynamic range of the final results. Lastly, parallel and pipelined processing, together with a resource-sharing technique, allows the design to operate at 175.7 MHz while consuming only 1,139 LUTs and 489 registers.
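For context, the conventional rotation-mode CORDIC that this work improves upon can be sketched as follows. This is a floating-point illustration of the textbook algorithm, not the HA-CORDIC design; the function name `cordic_sin_cos` and the default of 24 iterations are assumptions made for the example. Note that every call runs the same number of micro-rotations regardless of the input angle, which is the fixed latency the abstract refers to.

```python
import math

def cordic_sin_cos(theta, iterations=24):
    """Conventional rotation-mode CORDIC for |theta| <= pi/2.

    Returns (sin(theta), cos(theta)). In hardware, the multiplications by
    2**-i are plain right shifts and the arctangent values come from a
    small lookup table.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-scale to compensate.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0        # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

sin_val, cos_val = cordic_sin_cos(math.pi / 3)
```

An adaptive variant would skip micro-rotations that are not needed for a given angle, which is where the iteration savings come from.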
2016 International Conference on Electronics, Information, and Communications (ICEIC) | 2016
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Trong-Thuc Hoang; Duc-Hung Le
Coordinate Rotation Digital Computer (CORDIC) is an efficient algorithm for computing elementary operations such as multiplication, division, and root extraction. However, the conventional CORDIC algorithm incurs high latency to obtain its results. This paper proposes a low-latency parallel pipeline CORDIC (PP-CORDIC) to calculate trigonometric functions. The results show that PP-CORDIC can operate at 83.64 MHz with a latency of 10, 15, and 17 clock cycles in the best, average, and worst cases, respectively. The hardware architecture occupies 7,035 LUTs and 3,409 registers on a Stratix IV FPGA.
IEICE Electronics Express | 2018
Hong-Thu Nguyen; Xuan-Thuan Nguyen; Trong-Thuc Hoang; Cong-Kha Pham
This article proposes a CORDIC-based QR Decomposition (CQRD) for the MIMO signal detector module that is both low-resource and low-latency. The design contains four stages with six CORDIC modules, and its hardware architecture employs both the vectoring-mode and rotation-mode equations. The evaluation results show that the proposed hardware design achieves high performance with low resource usage and low latency. These advantages make CQRD suitable for signal detectors in MIMO systems.
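The interplay of vectoring and rotation modes in a QR decomposition can be sketched on a real-valued 2x2 matrix. This is only an illustration under simplifying assumptions, not the four-stage CQRD architecture: the function names, the 24-iteration setting, the restriction to real entries, and the requirement that the top-left entry be non-negative are all choices made for the example. Vectoring mode zeroes the sub-diagonal entry and records the rotation directions; rotation mode replays those directions on the remaining column.

```python
import math

ITERATIONS = 24
# Accumulated stretch of the micro-rotations, removed at the end.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))

def vectoring(x, y):
    """Drive y to zero (requires x >= 0); return the norm and the directions used."""
    directions = []
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0.0 else 1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        directions.append(d)
    return x / GAIN, directions

def rotation(x, y, directions):
    """Apply the same sequence of micro-rotations to another (x, y) pair."""
    for i, d in enumerate(directions):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN

def qr_upper_2x2(a):
    """Return the upper-triangular factor R of a real 2x2 matrix with a[0][0] >= 0."""
    (a11, a12), (a21, a22) = a
    r11, directions = vectoring(a11, a21)          # zero out a21
    r12, r22 = rotation(a12, a22, directions)      # rotate the second column
    return [[r11, r12], [0.0, r22]]

R = qr_upper_2x2([[3.0, 1.0], [4.0, 2.0]])
```

For this input the exact factor is [[5, 2.2], [0, 0.4]], which the sketch approximates to within the precision set by `ITERATIONS`.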
IEICE Electronics Express | 2018
Trong-Thuc Hoang; Duc-Hung Le; Cong-Kha Pham
In this paper, a minimum adder-delay Discrete Cosine Transform (DCT) architecture is proposed using the Adaptive CORDIC (ACor) algorithm with fixed-rotation implementations. The proposed method has six versions that differ in the number of DCT points, i.e., 8-point (8p), 16-point (16p), and 32-point (32p), and in the number of ACor stages, i.e., 2-Stage (2S) and 3-Stage (3S). Altera Stratix IV and Stratix II FPGAs were used to build and verify the implementations. The 2S designs of the 8p, 16p, and 32p DCT achieved timing performance of four, five, and six adder delays, respectively. The proposed method was shown to have the best timing performance, good accuracy, and adequate resource cost in comparison with other recent works.
IEICE Electronics Express | 2018
Katsumi Inoue; Trong-Thuc Hoang; Cong-Kha Pham
In this paper, a hardware design of a frequent items counter is proposed. The key idea is to create a matrix of binary values by using an array of binary decoders to decode all of the input items in parallel. An array of population-count modules is then applied to the rows of the matrix to generate the counting results. The architecture was implemented with five options of bits per item, from 6 to 10 bits/item, and seven options of count-register bit-width, from 8-bit to 32-bit counters. Therefore, 35 different versions of the implementation are presented in this paper. Those implementations were built on the Field-Programmable Gate Array (FPGA) board of the Altera Arria V SoC development kit. They were also synthesized as chips in a 65-nm Silicon On Thin Buried Oxide (SOTB) process. In the experiments, the proposed architecture achieved outstanding timing performance compared to other attempts to date.
International Midwest Symposium on Circuits and Systems | 2017
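The decode-then-popcount idea can be sketched in software as follows. This is a behavioral illustration only: the function names and the sequential Python loops are assumptions for the example, whereas the hardware described above decodes all items in parallel and accumulates into count registers.

```python
def one_hot_decode(item, bits):
    """Binary decoder: a list of 2**bits values with a single 1 at index `item`."""
    return [1 if value == item else 0 for value in range(1 << bits)]

def count_items(items, bits):
    """Count how often each possible `bits`-wide value occurs in `items`."""
    # One decoded column per input item.
    columns = [one_hot_decode(item, bits) for item in items]
    # Row v of the resulting matrix gathers bit v of every column; the
    # population count of that row is the number of occurrences of v.
    return [sum(column[value] for column in columns)
            for value in range(1 << bits)]

counts = count_items([3, 1, 3, 0, 3], bits=2)
```

With 2-bit items the matrix has four rows, so `counts` holds one total for each of the values 0 through 3.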
Trong-Thuc Hoang; Xuan-Thuan Nguyen; Hong-Thu Nguyen; Nhu-Quynh Truong; Duc-Hung Le; Katsumi Inoue; Cong-Kha Pham
In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys an equality comparator matrix that compares the input items with one another to count them within a single clock cycle. The proposed architecture is applied to 8-bit items, i.e., 256 distinct item values in total. The system is built and verified on the Altera Arria V SoC Development Kit. The experimental results show that the implementation runs at a maximum clock frequency of 40.85 MHz and requires 51,094 ALUTs and 8,417 registers, which is about 29% of the FPGA's resources. The average throughput reaches 1,280 million items per second, which is about 50 times faster than a software-based application under the same settings.
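A behavioral sketch of counting with an equality comparator matrix is given below. The function names and the batch of five items are hypothetical, and the nested Python loops stand in for comparators that would all evaluate in the same clock cycle in hardware.

```python
def comparator_matrix(items):
    """Entry [i][j] is 1 when items[i] == items[j], as an equality comparator would output."""
    return [[1 if a == b else 0 for b in items] for a in items]

def count_by_comparison(items):
    """Count occurrences of each item in a batch from the comparator outputs."""
    matrix = comparator_matrix(items)
    counts = {}
    for item, row in zip(items, matrix):
        # The row sum is how many items in the batch equal this one
        # (including itself); duplicates simply rewrite the same total.
        counts[item] = sum(row)
    return counts

batch_counts = count_by_comparison([5, 7, 5, 5, 9])
```

The matrix grows quadratically with the batch size, which helps explain why such a design trades logic resources for single-cycle counting.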
2017 International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom) | 2017
Phuong-Thao Vo-Thi; Trong-Thuc Hoang; Cong-Kha Pham; Duc-Hung Le
In this paper, a single-precision floating-point FFT Twiddle Factor (TF) implementation is proposed. The architecture is based on the Adaptive Angle Recoding CORDIC (AARC) algorithm. The TF design is built and verified on an Altera Stratix IV FPGA chip and in 65-nm SOTB synthesis. The FPGA implementation has a maximum frequency of 103.9 MHz, a throughput of 16.966 mega-samples per second (MSps), and a resource utilization of 7,747 ALUTs and 625 registers. The SOTB synthesis, on the other hand, has 16,858 standard cells on an area of 86,718 µm², a maximum frequency of 166 MHz, and a speed of 27.107 MSps. The accuracy results are a Mean-Square-Error (MSE) of 1.133E-10 and a maximum error ratio of about 26 parts-per-million (ppm).
International Symposium on Circuits and Systems | 2016
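As a reminder of what a twiddle-factor generator computes, the sketch below evaluates the standard definition W_N^k = exp(-j2πk/N) with library sine and cosine; a CORDIC-based design produces the same pair of values by rotation instead. The function names are assumptions for the example. The second helper simply divides the quoted clock frequency by the quoted throughput, which suggests roughly 6.12 clock cycles per sample on both the FPGA and the SOTB synthesis; that ratio is derived here from the abstract's figures and is not stated in the source.

```python
import math

def twiddle(k, n):
    """FFT twiddle factor W_n^k = exp(-2*pi*j*k/n) = cos(a) - j*sin(a)."""
    angle = 2.0 * math.pi * k / n
    return complex(math.cos(angle), -math.sin(angle))

def cycles_per_sample(clock_mhz, throughput_msps):
    """Average clock cycles spent per output sample."""
    return clock_mhz / throughput_msps

w = [twiddle(k, 8) for k in range(8)]            # the eight 8-point twiddle factors
fpga_ratio = cycles_per_sample(103.9, 16.966)    # figures quoted for the FPGA
sotb_ratio = cycles_per_sample(166.0, 27.107)    # figures quoted for SOTB synthesis
```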
Trong-Thuc Hoang; Duc-Hung Le; Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham
In this paper, a hybrid adaptive Coordinate Rotation Digital Computer (HA-CORDIC) is implemented in 65-nm Silicon On Thin Buried Oxide (SOTB) CMOS technology. In the HA-CORDIC implementation, the adaptive algorithm is used to reduce the number of CORDIC iterations. Compared with other floating-point CORDIC designs, the proposed scheme has lower latency: it takes only 12, 20, and 26 clock cycles in the best, average, and worst cases, respectively. The HA-CORDIC exploits design techniques such as resource sharing, pipelining, and parallel processing to achieve low resource usage and low latency. In 65-nm SOTB CMOS technology, this design is able to operate at 50 MHz with a 0.5 V supply voltage, 0.36 mA current, and 0.058 mm² area. The power consumption of HA-CORDIC is 0.251 mW, about three times lower than in conventional CMOS technology. Its leakage current is about 0.492 μA when the supply voltage VDD is 0.4 V and the bias voltage VBB is -1.5 V. This leakage current is about four times lower than that of HA-CORDIC implemented in conventional CMOS.
International Conference on Communications | 2016
Trong-Thuc Hoang; Hong-Thu Nguyen; Xuan-Thuan Nguyen; Cong-Kha Pham; Duc-Hung Le
In this paper, the authors propose high-performance DCT architectures based on the Coordinate Rotation Digital Computer (CORDIC). The implementations deploy the Adaptive angle recoding CORDIC (ACor) method and the Scale-Free Factor (SFF) technique. Two models are presented in the paper: ACor-based Chen-DCT (ACor-DCT-C) and ACor-based Loeffler-DCT (ACor-DCT-L). The critical path in both models is six adder delays. The experimental results give coding gains of 8.8238 dB and 8.8229 dB for ACor-DCT-C and ACor-DCT-L, respectively. The mean-square-error (MSE) results are 6.27e-6 and 4.42e-4 for ACor-DCT-C and ACor-DCT-L, respectively. Each design requires 36 adders and 16 shifters in its implementation.
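An MSE figure of this kind is measured against some reference transform. The sketch below shows one plausible way to do that, assuming an orthonormal DCT-II computed directly from its definition as the reference. The function names are hypothetical, and this is neither the Chen nor the Loeffler factorization used in the paper.

```python
import math

def dct_ii(x):
    """Orthonormal N-point DCT-II evaluated directly from its definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def mse(reference, approximation):
    """Mean-square error between a reference output and an approximate one."""
    return sum((p - q) ** 2
               for p, q in zip(reference, approximation)) / len(reference)

samples = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]   # hypothetical 8-point input
reference = dct_ii(samples)
```

Calling `mse(reference, hardware_output)` with the output of a fixed-point design (here a hypothetical `hardware_output` list) would yield a figure comparable in kind to those quoted above.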