Is this you? Create Your Porfile

Dong-U Lee

University of California, Los Angeles

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dong-U Lee is active.

Explore More

Publication

Featured researches published by Dong-U Lee.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

Accuracy-Guaranteed Bit-Width Optimization

Dong-U Lee; Altaf Abdul Gaffar; Ray C. C. Cheung; Oskar Mencer; Wayne Luk; George A. Constantinides

An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented. Methods to minimize both the integer and fraction parts of fixed-point signals with the aim of minimizing the circuit area are described. For range analysis, the technique in this paper identifies the number of integer bits necessary to meet range requirements. For precision analysis, a semianalytical approach with analytical error models in conjunction with adaptive simulated annealing is employed to optimize the number of fraction bits. The analytical models make it possible to guarantee overflow/underflow protection and numerical accuracy for all inputs over the user-specified input intervals. Using a stream compiler for field-programmable gate arrays (FPGAs), the approach in this paper is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA. Improvements for a given design reduce the area and the latency by up to 26% and 12%, respectively, over a design using optimum uniform fraction bit widths. Studies show that MiniBit-optimized designs are within 1% of the area produced from the integer linear programming approach

IEEE Transactions on Computers | 2006

A hardware Gaussian noise generator using the Box-Muller method and its error analysis

Dong-U Lee; John D. Villasenor; Wayne Luk; Philip Heng Wai Leong

We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as 10-12 to 10-13. The main novelties of this work are accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second and can run over 200 times faster than the output produced by software running on an Intel Pentium-4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA to evaluate the performance of low-density parity-check codes for deep-space communications

IEEE Transactions on Computers | 2004

A Gaussian noise generator for hardware-based simulations

Dong-U Lee; Wayne Luk; John D. Villasenor; Peter Y. K. Cheung

Hardware simulation offers the potential of improving code evaluation speed by orders of magnitude over workstation or PC-based simulation. We describe a hardware-based Gaussian noise generator used as a key component in a hardware simulation system, for exploring channel code behavior at very low bit error rates (BERs) in the range of 10/sup -9/ to 10/sup -10/. The main novelty is the design and use of nonuniform piecewise linear approximations in computing trigonometric and logarithmic functions. The parameters of the approximation are chosen carefully to enable rapid computation of coefficients from the inputs while still retaining high fidelity to the modeled functions. The output of the noise generator accurately models a true Gaussian Probability Density Function (PDF) even at very high /spl sigma/ values. Its properties are explored using: 1) several different statistical tests, including the chi-square test and the Anderson-Darling test, and 2) an application for decoding of low-density parity-check (LDPC) codes. An implementation at 133 MHz on a Xilinx Virtex-II XC2V4000-6 FPGA produces 133 million samples per second, which is seven times faster than a 2.6 GHz Pentium-IV PC; another implementation on a Xilinx Spartan-IIE XC2S300E-7 FPGA at 62 MHz is capable of a three times speedup. The performance can be improved by exploiting parallelism: an XC2V4000-6 FPGA with nine parallel instances of the noise generator at 105 MHz can run 50 times faster than a 2.6 GHz Pentium-IV PC. We illustrate the deterioration of clock speed with the increase in the number of instances.

field-programmable technology | 2005

Reconfigurable acceleration for Monte Carlo based financial simulation

Guanglie Zhang; Philip Heng Wai Leong; Chun Hok Ho; Kuen Hung Tsoi; Chris C. C. Cheung; Dong-U Lee; Ray C. C. Cheung; Wayne Luk

This paper describes a novel hardware accelerator for Monte Carlo (MC) simulation, and illustrates its implementation in field programmable gate array (FPGA) technology for speeding up financial applications. Our accelerator is based on a generic architecture, which combines speed and flexibility by integrating a pipelined MC core with an on-chip instruction processor. We develop a generic number system representation for determining the choice of number representation that meets numerical precision requirements. Our approach is then used in a complex financial engineering application, involving the Brace, Gatarek and Musiela (BGM) interest rate model for pricing derivatives. We address, in our BGM model, several challenges including the generation of Gaussian distributed random numbers and pipelining of the MC simulation. Our BGM application, based on an off-the-shelf system with a Xilinx XC2VP30 device at 50 MHz, is over 25 times faster than software running on a 1.5 GHz, Intel Pentium machine

IEEE Transactions on Very Large Scale Integration Systems | 2005

A hardware Gaussian noise generator using the Wallace method

Dong-U Lee; Wayne Luk; John D. Villasenor; Guanglie Zhang; Philip Heng Wai Leong

We describe a hardware Gaussian noise generator based on the Wallace method used for a hardware simulation system. Our noise generator accurately models a true Gaussian probability density function even at high /spl sigma/ values. We evaluate its properties using: 1) several different statistical tests, including the chi-square test and the Anderson-Darling test and 2) an application for decoding of low-density parity-check (LDPC) codes. Our design is implemented on a Xilinx Virtex-II XC2V4000-6 field-programmable gate array (FPGA) at 155 MHz; it takes up 3% of the device and produces 155 million samples per second, which is three times faster than a 2.6-GHz Pentium-IV PC. Another implementation on a Xilinx Spartan-III XC3S200E-5 FPGA at 106 MHz is two times faster than the software version. Further improvement in performance can be obtained by concurrent execution: 20 parallel instances of the noise generator on an XC2V4000-6 FPGA at 115 MHz can run 51 times faster than software on a 2.6-GHz Pentium-IV PC.

IEEE Transactions on Image Processing | 2009

Energy-Efficient Image Compression for Resource-Constrained Platforms

Dong-U Lee; Hyungjin Kim; Mohammad H. Rahimi; Deborah Estrin; John D. Villasenor

One of the most important goals of current and future sensor networks is energy-efficient communication of images. This paper presents a quantitative comparison between the energy costs associated with 1) direct transmission of uncompressed images and 2) sensor platform-based JPEG compression followed by transmission of the compressed image data. JPEG compression computations are mapped onto various resource-constrained platforms using a design environment that allows computation using the minimum integer and fractional bit-widths needed in view of other approximations inherent in the compression process and choice of image quality parameters. Advanced applications of JPEG, such as region of interest coding and successive/progressive transmission, are also examined. Detailed experimental results examining the tradeoffs in processor resources, processing/transmission time, bandwidth utilization, image quality, and overall energy consumption are presented.

field-programmable logic and applications | 2005

Ziggurat-based hardware Gaussian random number generator

Guanglie Zhang; Philip Heng Wai Leong; Dong-U Lee; John D. Villasenor; Ray C. C. Cheung; Wayne Luk

An architecture and implementation of a high performance Gaussian random number generator (GRNG) is described. The GRNG uses the Ziggurat algorithm which divides the area under the probability density function into three regions (rectangular, wedge and tail). The rejection method is then used and this amounts to determining whether a random point falls into one of the three regions. The vast majority of points lie in the rectangular region and are accepted to directly produce a random variate. For the nonrectangular regions, which occur 1.5% of the time, the exponential or logarithm functions must be computed and an iterative fixed point operation unit is used. Computation of the rectangular region is heavily pipelined and a buffering scheme is used to allow the processing of rectangular regions to continue to operate in parallel with evaluation of the wedge and tail computation. The resulting system can generate 169 million normally distributed random numbers per second on a Xilinx XC2VP3O-6 device.

field-programmable technology | 2003

Hierarchical segmentation schemes for function evaluation

Dong-U Lee; Wayne Luk; John D. Villasenor; Peter Y. K. Cheung

This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The use of a novel hierarchy scheme of uniform segments and segments with size varying by powers of two enables us to approximate non-linear regions of a function particularly well. This partitioning is automated: efficient look-up tables and their coefficients are generated for a given function, input range, order of the polynomials, desired accuracy and finite precision constraints. We describe an algorithm to find the optimum number of segments and the placement of their boundaries, which is used to analyze the properties of a function and to benchmark out approach. Our method is illustrated using three non-linear compound functions, /spl radic/-log(x), x log(x) and a high order rational function. We present results for various operand sizes between 8 and 24 bits for first and second order polynomial approximations.

IEEE Transactions on Computers | 2005

Optimizing hardware function evaluation

Dong-U Lee; Altaf Abdul Gaffar; Oskar Mencer; Wayne Luk

We present a methodology and an automated system for function evaluation unit generation. Our system selects the best function evaluation hardware for a given function, accuracy requirements, technology mapping, and optimization metrics, such as area, throughput, and latency. Function evaluation f(x) typically consists of range reduction and the actual evaluation on a small convenient interval such as [0, /spl pi//2) for sin(x). We investigate the impact of hardware function evaluation with range reduction for a given range and precision of x and f(x) on area and speed. An automated bit-width optimization technique for minimizing the sizes of the operators in the data paths is also proposed. We explore a vast design space for fixed-point sin(x), log(x), and /spl radic/x accurate to one unit in the last place using MATLAB and ASC, a stream compiler for field-programmable gate arrays (FPGAs). In this study, we implement over 2,000 placed-and-routed FPGA designs, resulting in over 100 million application-specific integrated circuit (ASIC) equivalent gates. We provide optimal function evaluation results for range and precision combinations between 8 and 48 bits.

IEEE Transactions on Very Large Scale Integration Systems | 2007

Hardware Generation of Arbitrary Random Number Distributions From Uniform Distributions Via the Inversion Method

Ray C. C. Cheung; Dong-U Lee; Wayne Luk; John D. Villasenor

We present an automated methodology for producing hardware-based random number generator (RNG) designs for arbitrary distributions using the inverse cumulative distribution function (ICDF). The ICDF is evaluated via piecewise polynomial approximation with a hierarchical segmentation scheme that involves uniform segments and segments with size varying by powers of two which can adapt to local function nonlinearities. Analytical error analysis is used to guarantee accuracy to one unit in the last place (ulp). Compact and efficient RNGs that can reach arbitrary multiples of the standard deviation sigma can be generated. For instance, a Gaussian RNG based on our approach for a Xilinx Virtex-4 XC4VLX100-12 field-programmable gate array produces 16-bit random samples up to 8.2 sigma. It occupies 487 slices, 2 block-RAMs, and 2 DSP-blocks. The design is capable of running at 371 MHz and generates one sample every clock cycle.

Explore More