Hojin Kee
National Instruments
Publication
Featured research published by Hojin Kee.
Vehicular Technology Conference | 2015
Swapnil Mhaske; Hojin Kee; Tai Ly; Ahsan Aziz; Predrag Spasojevic
We propose, without loss of generality, strategies to achieve a high-throughput FPGA-based architecture for a binary Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) code based on a circulant-1 identity matrix construction. We present a novel representation of the parity-check matrix (PCM) that provides a multi-fold throughput gain. Splitting the node processing algorithm enables pipelining of blocks and hence of layers. By partitioning the PCM not only into layers but also into superlayers, we derive an upper bound on the two-layer pipelining depth for the compact representation. To validate the architecture, a decoder for the IEEE 802.11n (2012) QC-LDPC code is implemented on the Xilinx Kintex-7 FPGA with the help of the FPGA IP compiler available in the NI LabVIEW Communication System Design Suite (CSDS). The compiler offers an automated and systematic compilation flow that generated an optimized hardware implementation from the LDPC algorithm, achieving an overall throughput of 608 Mb/s (at 260 MHz). To our knowledge, this is the fastest implementation of the IEEE 802.11n QC-LDPC decoder using an algorithmic compiler.
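As an illustration of the circulant identity construction mentioned in the abstract above, the following minimal sketch expands a small base matrix of shift values into a binary PCM. It is not the authors' compact representation: the function names, the use of -1 to mark an all-zero block, the right-shift convention, and the toy base matrix are all assumptions made for this example.

```python
def circulant_identity(z, shift):
    """Return the z-by-z identity matrix with its ones cyclically shifted
    `shift` columns to the right (shift convention assumed for this sketch)."""
    return [[1 if c == (r + shift) % z else 0 for c in range(z)]
            for r in range(z)]


def expand_base_matrix(base, z):
    """Expand a base matrix of shift values into a binary parity-check matrix.

    Assumed encoding: an entry of -1 stands for the z-by-z all-zero block,
    and an entry s >= 0 stands for the identity matrix shifted by s.
    """
    rows = []
    for base_row in base:
        # Build the z-by-z block for every entry in this block row.
        blocks = [
            [[0] * z for _ in range(z)] if s < 0 else circulant_identity(z, s)
            for s in base_row
        ]
        # Concatenate row r of every block to form one full PCM row.
        for r in range(z):
            full_row = []
            for block in blocks:
                full_row.extend(block[r])
            rows.append(full_row)
    return rows


# Toy base matrix (hypothetical, not taken from IEEE 802.11n).
base = [[0, 1, -1],
        [2, -1, 0]]
H = expand_base_matrix(base, 3)
for row in H:
    print("".join(str(bit) for bit in row))
```

Each block row of the base matrix becomes one layer of z PCM rows, and because every non-zero block is a permuted identity, the rows within a layer touch disjoint columns inside each block.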
IEEE Global Conference on Signal and Information Processing | 2014
Hojin Kee; Swapnil Mhaske; David C. Uliana; Adam T. Arnesen; Newton G. Petersen; Taylor L. Riché; Dustyn K. Blasig; Tai Ly
Domain experts in many fields use LabVIEW as a graphical system design tool to implement DSP algorithms on myriad target architectures. In this paper, we introduce the latest LabVIEW FPGA compiler, which enables domain experts with minimal hardware knowledge to quickly implement, deploy, and verify their domain-specific applications on FPGA hardware. We present two compiler techniques that we use to 1) extract extra parallelism from a user's application to take advantage of the parallel hardware resources of the FPGA and 2) minimize memory-access traffic, which is often a bottleneck that restricts overall FPGA performance. Finally, our approach provides the user with a simple constraint-driven experience to maximize development efficiency. We use two case studies in two different domains, a 3GPP Turbo decoder and a Smith-Waterman algorithm, to show the benefits our tool provides to users.
IEEE Sarnoff Symposium | 2015
Swapnil Mhaske; David C. Uliana; Hojin Kee; Tai Ly; Ahsan Aziz; Predrag Spasojevic
The increasing data rates, expected to be on the order of Gb/s for future wireless systems, directly impact the throughput requirements of the modulation and coding systems of the physical layer. In an effort to design a suitable channel coding solution for 5G wireless systems, in this brief we present two approaches to improve the throughput of a Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) decoder architecture. The first approach provides an algorithmic method to enhance parallel processing within the decoder; the second applies that decoder architecture to obtain another highly parallel architecture. We have successfully validated the second approach, obtaining a 2.48 Gb/s QC-LDPC decoder implementation operating at 200 MHz on the Xilinx Kintex-7 FPGA in the NI USRP-2953R. To rapidly prototype our research findings, the high-level description of the entire decoder was translated to a Hardware Description Language (HDL), namely VHDL, using the algorithmic compiler in the National Instruments LabVIEW™ Communication System Design Suite (CSDS™). To our knowledge, at the time of writing this paper, this is the fastest FPGA-based implementation of a standard-compliant QC-LDPC decoder on a USRP using an algorithmic compiler.
International Journal of Reconfigurable Computing | 2017
Swapnil Mhaske; Hojin Kee; Tai Ly; Ahsan Aziz; Predrag Spasojevic
We propose strategies to achieve a high-throughput FPGA architecture for quasi-cyclic low-density parity-check codes based on circulant-1 identity matrix construction. By splitting the node processing operation in the min-sum approximation algorithm, we achieve pipelining in the layered decoding schedule without utilizing additional hardware resources. High-level synthesis compilation is used to design and develop the architecture on the FPGA hardware platform. To validate this architecture, an IEEE 802.11n-compliant 608 Mb/s decoder is implemented on the Xilinx Kintex-7 FPGA using the LabVIEW FPGA Compiler in the LabVIEW Communication System Design Suite. Architecture scalability was leveraged to accomplish a 2.48 Gb/s decoder on a single Xilinx Kintex-7 FPGA. Further, we present rapidly prototyped experimentation of an IEEE 802.16-compliant hybrid automatic repeat request system based on the efficient decoder architecture developed. In spite of the mixed nature of the data processing (digital signal processing and finite-state machines), the LabVIEW FPGA Compiler significantly reduced the time needed to explore the system parameter space and to optimize in terms of error performance and resource utilization. A 4× improvement in system throughput relative to a CPU-based implementation was achieved, which allowed the error-rate performance of the system to be measured over large, realistic data sets using accelerated, in-hardware simulation.
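For readers unfamiliar with the min-sum approximation named in the abstract above, here is a minimal sketch of a single check node update written as two passes: a global pass that finds the sign product and the two smallest magnitudes, and a local pass that forms each outgoing message. This is the textbook min-sum update, not necessarily the split used by the authors; the function name and the sample input values are assumptions for illustration.

```python
def min_sum_check_node(llrs):
    """Min-sum check node update.

    For each incoming log-likelihood ratio (LLR), the outgoing message has
    the product of the signs of all OTHER inputs and the minimum magnitude
    of all OTHER inputs. A zero input is treated as positive here.
    """
    # Pass 1 (global): sign product and the two smallest magnitudes.
    sign_product = 1
    min1 = min2 = float("inf")
    min1_index = -1
    for i, value in enumerate(llrs):
        if value < 0:
            sign_product = -sign_product
        magnitude = abs(value)
        if magnitude < min1:
            min2 = min1
            min1, min1_index = magnitude, i
        elif magnitude < min2:
            min2 = magnitude

    # Pass 2 (local): exclude each edge's own contribution.
    messages = []
    for i, value in enumerate(llrs):
        # The edge holding the smallest magnitude sees the second smallest.
        magnitude = min2 if i == min1_index else min1
        # Dividing out this edge's own sign from the global sign product.
        sign = sign_product if value >= 0 else -sign_product
        messages.append(sign * magnitude)
    return messages


# Hypothetical input LLRs for a degree-4 check node.
print(min_sum_check_node([2.0, -0.5, 3.0, -1.5]))
```

Because the second pass needs only three global values (sign product, smallest and second-smallest magnitude, plus the index of the smallest), the two passes can in principle run as separate pipeline stages, which is the general idea behind splitting node processing.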
IEEE Sarnoff Symposium | 2016
Swapnil Mhaske; Hojin Kee; Tai Ly; Predrag Spasojevic
Physical layer processing for 5G wireless is expected to operate at very high throughput with very low latency. Developing a channel coding system based on Hybrid Automatic Repeat reQuest (HARQ) for evolving requirements necessitates extensive experimentation involving undesirably long development cycles. We demonstrate the use of a High-Level Synthesis (HLS) compiler in LabVIEW Communications to prototype a real-world HARQ system using Low-Density Parity-Check (LDPC) codes without requiring the expertise of a Hardware Description Language (HDL) designer. This implementation consumed 54% of the resources on our FPGA and allowed us to measure the error-rate performance of the system over large, realistic data sets using accelerated, in-hardware simulation with a system throughput that is 4× greater than that of the CPU-based implementation. Furthermore, use of the HLS methodology significantly reduced the time needed to explore the HARQ system parameter space and to optimize in terms of error-rate performance and resource utilization.
arXiv: Hardware Architecture | 2015
Swapnil Mhaske; Hojin Kee; Tai Ly; Ahsan Aziz; Predrag Spasojevic
arXiv: Hardware Architecture | 2015
Swapnil Mhaske; David C. Uliana; Hojin Kee; Tai Ly; Ahsan Aziz; Predrag Spasojevic
Archive | 2014
Taylor L. Riché; Newton G. Petersen; Hojin Kee; Adam T. Arnesen; Haoran Yi; Dustyn K. Blasig; Tai A. Ly
Archive | 2016
Tai A. Ly; Swapnil D. Mhaske; Hojin Kee; Adam T. Arnesen; David C. Uliana; Newton G. Petersen
Archive | 2015
Hojin Kee; Tai A. Ly; Newton G. Petersen; Jeffrey D. Washington; Haoran Yi; Dustyn K. Blasig