Younhee Choi
University of Saskatchewan
Publication
Featured research published by Younhee Choi.
Microprocessors and Microsystems | 2010
Yu Zhang; Dongdong Chen; Younhee Choi; Li Chen; Seok-Bum Ko
In this work we propose a high-performance elliptic curve cryptographic processor over GF(2^163) for performance-critical applications. It has three finite field (FF) RISC cores and a main controller to achieve instruction-level parallelism (ILP) for elliptic curve point multiplication. Customized instructions are proposed to decrease clock cycles. The interconnection among the three FF cores and the main controller is obtained based on the analysis of both data dependency and critical path. The proposed design can reach 185 MHz with 20,807 slices when implemented on a Xilinx XC4VLX80 FPGA device and 263 MHz with 217,904 gates when synthesized with TSMC 0.18-µm CMOS technology.
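As an illustration of the finite field operations such a processor executes, the following is a minimal software sketch of polynomial-basis arithmetic in GF(2^163). It is not the authors' hardware design. The reduction polynomial is an assumption (the NIST pentanomial x^163 + x^7 + x^6 + x^3 + 1, which the abstract does not name), and the function names are hypothetical.

```python
# Software sketch of polynomial-basis arithmetic in GF(2^163).
# Assumption: reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (the NIST
# pentanomial for degree 163); the abstract does not specify one.
M = 163
REDUCTION_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a: int, b: int) -> int:
    """Field addition: bitwise XOR of the coefficient vectors."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, then reduction modulo REDUCTION_POLY."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    # Clear every coefficient of degree >= M, starting from the highest.
    for bit in range(product.bit_length() - 1, M - 1, -1):
        if (product >> bit) & 1:
            product ^= REDUCTION_POLY << (bit - M)
    return product


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element via Fermat: a^(2^M - 2)."""
    result, base, exponent = 1, a, (1 << M) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result
```

Elements are held as Python integers whose bit i is the coefficient of x^i. A hardware core would use dedicated multiplier and squarer datapaths rather than this bit-serial loop.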
Computers & Electrical Engineering | 2013
Younhee Choi; Qiao Zhang; Seok-Bum Ko
It is widely accepted that pulse transit time (PTT), from the R wave peak of the electrocardiogram (ECG) to a characteristic point of the photoplethysmogram (PPG), is related to arterial stiffness and can be used to estimate blood pressure. A promising signal processing technology, the Hilbert-Huang transform (HHT), is introduced to analyze both ECG and PPG data, which are inherently nonlinear and non-stationary. The relationship between blood pressure and PTT is illustrated, and the problems of calibration and re-calibration are also discussed in this paper. Moreover, a multi-innovation recursive least squares algorithm is employed to update the unknown parameter vector of the model and improve the results. Our algorithm is tested on continuous data from the MIMIC database, and the accuracy is calculated to validate the proposed method.
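The PTT definition above can be sketched as a simple pairing of event times. This is only an illustration: it assumes the R-peak times and PPG characteristic-point times have already been extracted (the HHT-based processing is not shown), and the function name and sample timestamps are hypothetical.

```python
def pulse_transit_times(r_peak_times, ppg_point_times):
    """Return one PTT (in seconds) per heartbeat.

    Each R peak is paired with the first PPG characteristic point that
    follows it and precedes the next R peak. Beats with no such PPG point
    are skipped. Both inputs must be sorted in ascending order.
    """
    ptts = []
    j = 0
    for i, r_time in enumerate(r_peak_times):
        if i + 1 < len(r_peak_times):
            next_r_time = r_peak_times[i + 1]
        else:
            next_r_time = float("inf")
        # Advance to the first PPG point strictly after this R peak.
        while j < len(ppg_point_times) and ppg_point_times[j] <= r_time:
            j += 1
        if j < len(ppg_point_times) and ppg_point_times[j] < next_r_time:
            ptts.append(ppg_point_times[j] - r_time)
    return ptts


# Hypothetical timestamps in seconds, chosen only for illustration.
r_peaks = [0.0, 1.0, 2.0]
ppg_points = [0.25, 1.25, 2.5]
ptt_values = pulse_transit_times(r_peaks, ppg_points)
```

A blood pressure estimate would then be obtained by feeding these PTT values into a calibrated model, which is outside the scope of this sketch.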
Canadian Journal of Electrical and Computer Engineering-revue Canadienne De Genie Electrique Et Informatique | 2008
Ali Malik; Dongdong Chen; Younhee Choi; Moon Ho Lee; Seok-Bum Ko
With gate counts of ten million, field-programmable gate arrays (FPGAs) are becoming suitable for floating-point computations. Addition is the most complex operation in a floating-point unit and can cause major delay while requiring a significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed primarily at reducing the overall latency. An efficient design of the floating-point adder offers major area and performance improvements for FPGAs. Given recent advances in FPGA architecture and area density, latency has become the main focus in attempts to improve performance. This paper studies the implementation of standard; leading-one predictor (LOP); and far and close datapath (2-path) floating-point addition algorithms in FPGAs. Each algorithm has complex sub-operations which contribute significantly to the overall latency of the design. Each of the sub-operations is researched for different implementations and is then synthesized onto a Xilinx Virtex-II Pro FPGA device. Standard and LOP algorithms are also pipelined into five stages and compared with the Xilinx IP. According to the results, the standard algorithm is the best implementation with respect to area, but has a large overall latency of 27.059 ns while occupying 541 slices. The LOP algorithm reduces latency by 6.5% at the cost of a 38% increase in area compared to the standard algorithm. The 2-path implementation shows a 19% reduction in latency with an added expense of 88% in area compared to the standard algorithm. The five-stage standard pipeline implementation shows a 6.4% improvement in clock speed compared to the Xilinx IP with a 23% smaller area requirement. The five-stage pipelined LOP implementation shows a 22% improvement in clock speed compared to the Xilinx IP at a cost of 15% more area.
Symposium on Computer Arithmetic | 2009
Dongdong Chen; Yu Zhang; Younhee Choi; Moon Ho Lee; Seok-Bum Ko
This paper presents a new design and implementation of a 32-bit decimal floating-point (DFP) logarithmic converter based on the digit-recurrence algorithm. The converter can calculate accurate logarithms of 32-bit DFP numbers as defined in the IEEE 754-2008 standard. The redundant digit e1 is obtained from a look-up table in the first iteration, and the remaining redundant digits ej are selected by rounding the scaled remainder in the succeeding iterations. The sequential architecture of the proposed 32-bit DFP logarithmic converter is implemented on a Xilinx Virtex-II Pro P30 FPGA device and then synthesized with the TSMC 0.18-µm standard cell library. The implementation results indicate that the maximum frequency of the proposed architecture is 47.7 MHz in FPGA and 107.9 MHz in TSMC 0.18-µm technology. Faithful 32-bit DFP logarithm results can be obtained in 18 cycles.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2009
Qiao Zhang; Yang Shi; Daniel Teng; Anh Dinh; Seok-Bum Ko; Li Chen; Jenny Basran; Vanina Dal Bello-Haas; Younhee Choi
The pulse transit time (PTT) based method has been suggested as a continuous, cuffless and non-invasive approach to estimate blood pressure. It is of paramount importance to accurately determine the pulse transit time from the measured electrocardiogram (ECG) and photoplethysmogram (PPG) signals. We apply the celebrated Hilbert-Huang Transform (HHT) to process both the ECG and PPG signals, and improve the accuracy of the PTT estimation. Further, the blood pressure variation is obtained by using a well-established formula reflecting the relationship between the blood pressure and the estimated PTT. Simulation results are provided to illustrate the effectiveness of the proposed method.
IEEE Transactions on Computers | 2012
Dongdong Chen; Liu Han; Younhee Choi; Seok-Bum Ko
This paper presents the algorithm and architecture of the decimal floating-point (DFP) logarithmic converter, based on the digit-recurrence algorithm with selection by rounding. The proposed approach can compute faithful DFP logarithm results for any one of the three DFP formats specified in the IEEE 754-2008 standard. In order to optimize the latency of the proposed design, we mainly integrate the following novel features: 1) using the redundant carry-save representation of the data path; 2) reducing the number of iterations by determining the number of initial iteration; and 3) retiming and balancing the delay of the proposed architecture. The proposed architecture is synthesized with the STM 90-nm standard cell library, and the results show that the critical path delay and the number of clock cycles of the proposed Decimal64 logarithmic converter are 1.55 ns (34.4 FO4) and 19, respectively, and the total hardware complexity is 43,572 NAND2 gates. The delay estimation results of the proposed architecture show that its latency is close to that of the binary radix-16 logarithmic converter, and that it has significantly lower latency than a recently published high-performance CORDIC implementation.
EURASIP Journal on Embedded Systems | 2009
Yongsoon Lee; Younhee Choi; Seok-Bum Ko; Moon Ho Lee
This paper implements a field-programmable gate array (FPGA) based face detector using a neural network (NN) and a bit-width reduced floating-point arithmetic unit (FPU). An analytical error model, using the maximum relative representation error (MRRE) and the average relative representation error (ARRE), is developed to obtain the maximum and average output errors for the bit-width reduced FPUs. After the development of the analytical error model, the bit-width reduced FPUs and an NN are designed using MATLAB and VHDL. Finally, the analytical (MATLAB) results are compared with the experimental (VHDL) results. The analytical results and the experimental results show conformity of shape. We demonstrate that incremental reductions in the number of bits used can produce significant cost reductions in area, speed, and power.
International Symposium on Circuits and Systems | 2008
Dongdong Chen; Younhee Choi; Li Chen; Daniel Teng; Khan A. Wahid; Seok-Bum Ko
This paper presents a novel design and implementation of a 7-digit fixed-point decimal-to-decimal logarithmic converter. Two approaches, a binary-based decimal approximation algorithm (algorithm 1) and a decimal linear approximation algorithm (algorithm 2), are proposed and investigated. The results show that the decimal linear approximation algorithm (algorithm 2) is error-free in conversion between decimal and binary formats and also reduces the maximum absolute error from algorithm 1's 0.00399 (integer cases) and 0.0483 (fraction cases) to 0.000994 (both cases). Algorithm 2 is modeled in VHDL and implemented using combinational logic only in a Xilinx Virtex-II Pro P30 FPGA device. The logarithm results can be obtained in a single clock cycle, running at 50.9 MHz.
Canadian Conference on Electrical and Computer Engineering | 2011
Anh Dinh; Younhee Choi; Seok-Bum Ko
Heart rate is essential in monitoring the health of an individual. The traditional use of ECG to measure the heartbeat poses complications in many applications. There is a need for a less intrusive, simple, easy-to-use, and versatile heart rate sensor. This paper describes the design and testing of a heart rate sensor using a low-cost accelerometer. The design is based on the seismocardiography signals generated by the activity of the heart and blood flow acting on the chest wall. A custom circuit is built to process the signal to suit a wide range of applications in health care. The design was built and successfully tested.
International Symposium on Circuits and Systems | 2010
Yu Zhang; Dongdong Chen; Younhee Choi; Li Chen; Seok-Bum Ko
In this paper, we propose a high-performance processor for elliptic curve cryptography (ECC) over GF(2^163) using polynomial representation. It has three finite field (FF) RISC cores and a main controller to achieve instruction-level parallelism (ILP) with pipelining, so that the largely parallelized algorithm for elliptic curve point multiplication is well suited to this platform. Instructions for combined FF operations are proposed in the instruction set to decrease clock cycles. The interconnection among the three FF cores and the main controller is obtained by analyzing the data dependency in the parallelized algorithm. The whole design is implemented on a Xilinx XC4VLX80 FPGA device, and it can reach 185 MHz with 20,807 slices. The total time required for one ECC point scalar operation is 7.7 µs in 1428 cycles.
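The reported execution time follows directly from the cycle count and the clock frequency. The short check below uses only the figures quoted in the abstract; the function name is hypothetical.

```python
def execution_time_us(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at the given clock rate.

    One cycle at f MHz lasts 1/f microseconds, so time = cycles / f.
    """
    return cycles / clock_mhz


# Figures quoted in the abstract: 1428 cycles at 185 MHz.
point_mult_time_us = execution_time_us(1428, 185.0)
print(f"{point_mult_time_us:.2f} us")
```

The result is about 7.72 µs, consistent with the 7.7 µs stated above.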