
Publications


Featured research published by Seonil Choi.


International Parallel and Distributed Processing Symposium | 2004

Analysis of high-performance floating-point arithmetic on FPGAs

Gokul Govindu; Ling Zhuo; Seonil Choi; Viktor K. Prasanna

Summary form only given. FPGAs are increasingly being used in the high-performance and scientific computing community to implement floating-point based hardware accelerators. We analyze the floating-point multiplier and adder/subtractor units by considering the number of pipeline stages of the units as a parameter and use throughput/area as the metric. We achieve throughput rates of more than 240 MHz (200 MHz) for single (double) precision operations by deeply pipelining the units. To illustrate the impact of the floating-point units on a kernel, we implement a matrix multiplication kernel based on our floating-point units and show that a state-of-the-art FPGA device is capable of achieving about 15 GFLOPS (8 GFLOPS) for single (double) precision floating-point based matrix multiplication. We also show that FPGAs are capable of achieving up to a 6× improvement (for single precision) in terms of the GFLOPS/W (performance per unit power) metric over general-purpose processors. We then discuss the impact of floating-point units on the design of an energy-efficient architecture for the matrix multiply kernel.
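The GFLOPS figures quoted above follow directly from how many floating-point units fit on the device and the clock rate they sustain. A back-of-the-envelope sketch; the unit count used in the example is an illustrative assumption, not a figure from the paper:

```python
# Peak-throughput estimate for an FPGA matrix-multiply kernel built from
# pipelined floating-point units. Each MAC (multiplier + adder pair)
# retires 2 floating-point operations per cycle.

def peak_gflops(num_macs: int, clock_mhz: float) -> float:
    """Peak GFLOPS = 2 FLOPs/cycle per MAC * MACs * clock (GHz)."""
    return 2 * num_macs * clock_mhz / 1e3

# e.g. ~31 single-precision MACs at 240 MHz would sustain ~15 GFLOPS,
# roughly matching the single-precision figure in the abstract.
print(peak_gflops(31, 240.0))  # -> 14.88
```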


Field Programmable Gate Arrays | 2003

Energy-efficient signal processing using FPGAs

Seonil Choi; Ronald Scrofano; Viktor K. Prasanna; Ju-wook Jang

In this paper, we present techniques for energy-efficient design at the algorithm level using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernel applications: fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance, in terms of both latency and energy efficiency, of FPGAs in performing these tasks. Using a Xilinx Virtex-II as the target FPGA, we compare the performance of our designs to those from the Xilinx library as well as to conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and the Texas Instruments TMS320C6415. Our evaluations are done both through estimation based on energy and latency equations and through low-level simulation. For FFT, our designs dissipated an average of 60% less energy than the design from the Xilinx library and 56% less than the DSP. Our designs showed a factor of 10 improvement over the embedded processor. These results provide concrete evidence to substantiate the idea that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while still dissipating less energy than the other two types of devices.


Field-Programmable Technology | 2002

Area and time efficient implementations of matrix multiplication on FPGAs

Ju-wook Jang; Seonil Choi; Viktor K. Prasanna

We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce the latency as well as the area. Our designs improve on the previous designs in terms of the area/speed metric, where the speed denotes the maximum achievable running frequency. The area/speed metrics for the two previous designs and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of one of the previous designs is 0.57 μs, while our design takes 0.15 μs using 18% less area. The area of our designs is smaller by 11% to 46% compared with the best known systolic designs with the same latency for matrices of sizes 3 × 3 to 12 × 12. The performance improvements tend to grow with the problem size.


International Parallel and Distributed Processing Symposium | 2004

A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs

Gokul Govindu; Seonil Choi; Viktor K. Prasanna; Vikash Daga; Sridhar Gangadharpalli; V. Sridhar

Summary form only given. We first develop a novel architecture for fixed-point LU decomposition of streaming input matrices on FPGAs. Our architecture, based on a circular linear array, achieves minimal latency and is resource-efficient. We then extend it, using a stacked-matrices approach, to a floating-point based architecture, which achieves minimal effective latency. Our design objective was to develop high-throughput and energy-efficient architectures for applications that require LU decomposition. We analyze (1) the impact of high-throughput, pipelined floating-point units (with different pipeline depths and different performance) on the architecture's performance, and (2) the impact of algorithm-level design on the system-wide energy dissipation. We analyze the energy dissipation by capturing algorithm and architectural details of the target FPGA device. We analyze and compare our architecture with a state-of-the-art architecture implemented on FPGAs with respect to latency, area, and energy. Our designs achieve a 10%-60% reduction in energy over the state-of-the-art architecture.
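For reference, the computation being mapped to the circular linear array is ordinary LU decomposition. A minimal software sketch in Doolittle form, without pivoting (streaming FPGA designs typically assume matrices that do not require row exchanges); this is purely illustrative, not the paper's hardware algorithm:

```python
# Doolittle LU decomposition: A = L * U, with L unit lower triangular.
# No pivoting -- assumes all leading principal minors are nonzero.

def lu_decompose(a):
    """Return (L, U) for a square matrix given as a list of row lists."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]  # work on a copy
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]       # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]  # update trailing row
    return L, U

L, U = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
# L = [[1.0, 0.0], [1.5, 1.0]], U = [[4.0, 3.0], [0.0, -1.5]]
```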


Field Programmable Logic and Applications | 2002

Energy-Efficient Matrix Multiplication on FPGAs

Ju-wook Jang; Seonil Choi; Viktor K. Prasanna

We develop new algorithms and architectures for matrix multiplication on configurable devices. These designs significantly reduce the energy dissipation and latency compared with the state-of-the-art FPGA-based designs. We derive functions to represent the impact of algorithmic level design choices on the system-wide energy dissipation, latency, and area by capturing algorithm and architecture details including features of the target FPGA. The functions are used to optimize energy performance under latency and area constraints for a family of candidate algorithms and architectures. As a result, our designs improve the energy performance of the optimized design from the recent Xilinx library by 32% to 88% without any increase in area-latency product. In terms of comprehensive metrics such as EAT (Energy-Area-Time) and E/AT (Energy/Area-Time), our designs offer superior performance compared with the Xilinx design by 50%-79% and 13%-44%, respectively. We also address how to exploit further increases in density of future FPGA devices for asymptotic improvement in latency and energy dissipation for multiplication of larger size matrices.
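The composite metrics mentioned above, EAT (energy-area-time) and E/AT (energy/area-time), are straightforward products and ratios of measured quantities. A hedged sketch of how such a comparison might be scored; the numbers below are invented, not measurements from the paper:

```python
# Composite figure-of-merit comparison between two designs. Lower is
# better for EAT; the percentage reduction gives the improvement.

def eat(energy_j: float, area_slices: float, time_s: float) -> float:
    """EAT metric: energy * area * time."""
    return energy_j * area_slices * time_s

def reduction_pct(baseline: float, ours: float) -> float:
    """Percentage reduction of `ours` relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline

baseline = eat(1.0e-3, 5000, 2.0e-6)   # hypothetical reference design
candidate = eat(0.5e-3, 5000, 2.0e-6)  # hypothetical optimized design
print(reduction_pct(baseline, candidate))  # -> 50.0
```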


Field-Programmable Technology | 2002

Energy efficiency of FPGAs and programmable processors for matrix multiplication

Ronald Scrofano; Seonil Choi; Viktor K. Prasanna

Advances in technology have positioned FPGAs and embedded processors to compete with digital signal processors (DSPs). In this paper, we evaluate the performance, in terms of both latency and energy efficiency, of FPGAs, embedded processors, and DSPs in multiplying two n × n matrices. As specific examples, we have chosen a representative of each type of device. Our results show that FPGAs can multiply two n × n matrices with both lower latency and lower energy consumption than the other two types of devices. This makes FPGAs the ideal choice for matrix multiplication in signal processing applications.


Field-Programmable Logic and Applications | 2003

Time and Energy Efficient Matrix Factorization Using FPGAs

Seonil Choi; Viktor K. Prasanna

In this paper, new algorithms and architectures for matrix factorization are presented. Two fully-parallel and block-based designs for LU decomposition on configurable devices are proposed. A linear array architecture is employed to minimize the usage of long interconnects, leading to lower energy dissipation. The designs are made scalable by using a fixed I/O bandwidth independent of the problem size. High level models for energy profiling are built and the energy performance of many possible designs is predicted. Through the analysis of design tradeoffs, the block size that minimizes the total energy dissipation is identified. A set of candidate designs was implemented on the Xilinx Virtex-II to verify the estimates. Also, the performance of our designs is compared with that of state-of-the-art DSP based designs and with the performance of designs obtained using a state-of-the-art commercial compilation tool such as Celoxica DK1. Our designs on the FPGAs are significantly more time and energy efficient in both cases.


The Journal of Supercomputing | 2003

Domain-Specific Modeling for Rapid Energy Estimation of Reconfigurable Architectures

Seonil Choi; Ju-wook Jang; Sumit Mohanty; Viktor K. Prasanna

Reconfigurable architectures such as FPGAs are flexible alternatives to DSPs or ASICs used in mobile devices for which energy is a key performance metric. Reconfigurable architectures offer several design parameters such as operating frequency, precision, amount of memory, degree of parallelism, etc. These parameters define a large design space that must be explored to find energy-efficient solutions. It is also challenging to predict the energy variation at the early design phases when a design is modified at the algorithm level. Efficient traversal of such a large design space requires high-level modeling to facilitate rapid estimation of system-wide energy. However, FPGAs do not exhibit a high-level structure like, for example, a RISC processor, for which high-level as well as low-level energy models are available. To address this scenario, we propose a domain-specific modeling technique for energy-efficient kernel design that exploits the knowledge of the algorithm and the target architecture family for a given kernel to develop a high-level model. This model captures architecture and algorithm features, parameters affecting energy performance, and power estimation functions based on these parameters. A system-wide energy function is derived based on the power functions and the cycle-specific power state of each building block of the architecture. This model is used to understand the impact of various parameters on system-wide energy and can be a basis for the design of energy-efficient algorithms. Our high-level model is used to quickly obtain fairly accurate estimates of the system-wide energy dissipation of data paths configured using FPGAs. We demonstrate our modeling methodology by applying it to four domains.
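The system-wide energy function described here can be sketched as a sum over the architecture's building blocks and their cycle-specific power states. The block names, power figures, and cycle counts below are invented for illustration; only the structure of the model follows the abstract:

```python
# Domain-specific energy model sketch:
#   E = sum over blocks b and power states s of P(b, s) * cycles(b, s) / f

CLOCK_MHZ = 100.0  # assumed operating frequency (hypothetical)

# High-level power functions per building block, in mW per power state.
power_mw = {
    "multiplier": {"active": 12.0, "idle": 1.0},
    "ram_block":  {"active":  5.0, "idle": 0.5},
}

# Cycle counts per block and state, from the algorithm-level model.
cycles = {
    "multiplier": {"active": 8000, "idle": 2000},
    "ram_block":  {"active": 6000, "idle": 4000},
}

def system_energy_uj() -> float:
    """System-wide energy in microjoules (mW * us = nJ, then nJ -> uJ)."""
    cycle_time_us = 1.0 / CLOCK_MHZ
    return sum(power_mw[b][s] * cycles[b][s] * cycle_time_us / 1000.0
               for b in power_mw for s in power_mw[b])

print(system_energy_uj())  # -> 1.3 (uJ) for these made-up numbers
```

Trading parallelism against frequency, or swapping storage types, changes the `power_mw` and `cycles` tables, which is what makes this kind of model useful for rapid design-space exploration.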


International Workshop on Computer Architecture for Machine Perception | 1997

Parallel object recognition on an FPGA-based configurable computing platform

Yongwha Chung; Seonil Choi; Viktor K. Prasanna

Object recognition involves identifying known objects in a given scene. It plays a key role in image understanding. Geometric hashing has been proposed as a technique for model-based object recognition in occluded scenes. However, parallel techniques are needed to realize real-time vision systems employing geometric hashing. In this paper, we develop a design technique for parallelizing geometric hashing on an FPGA-based platform. We first transform the hash table which contains symbolic data into a bit-level representation. By regularizing the data flow and exploiting bit-level parallelism in hardware, our design achieves high performance. Using our approach, given a scene consisting of 256 feature points, a probe can be performed in 1.65 milliseconds on an FPGA-based platform having 32 Xilinx 4062s. In earlier implementations, the same probe operation was performed in 240 milliseconds on a 32K-node CM2 and in 382 milliseconds on a 32-node CM5. Also, the same operation takes 40 milliseconds on a 32-node IBM SP-2. By parameterizing the application and the device characteristics, we derive an area-time efficient design based on these parameters. Furthermore, our approach can be applied to many geometric hashing methods and is portable to other FPGA devices.
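A toy software sketch of a single geometric-hashing probe, the operation the paper parallelizes in hardware: scene points are expressed in a basis-invariant coordinate frame, hashed, and matching model entries accumulate votes. The quantization step and the one-model hash table below are invented for illustration; real systems iterate over many basis pairs:

```python
from collections import Counter

def quantize(x, y, step=0.25):
    """Map invariant coordinates to a discrete hash-table key."""
    return (round(x / step), round(y / step))

def probe(scene_points, basis, hash_table):
    """Vote for model entries consistent with the chosen scene basis."""
    (bx0, by0), (bx1, by1) = basis
    ex, ey = bx1 - bx0, by1 - by0   # basis vector
    norm = ex * ex + ey * ey
    votes = Counter()
    for (px, py) in scene_points:
        dx, dy = px - bx0, py - by0
        # Coordinates of the point in the basis frame: invariant to
        # translation, rotation, and scale of the scene.
        u = (dx * ex + dy * ey) / norm
        v = (-dx * ey + dy * ex) / norm
        for entry in hash_table.get(quantize(u, v), []):
            votes[entry] += 1
    return votes

# Hypothetical table for one model; keys are quantized invariant coords.
table = {(4, 0): ["modelA"], (0, 4): ["modelA"]}
scene = [(0, 0), (1, 0), (1, 0.01), (0, 1)]
print(probe(scene, ((0, 0), (1, 0)), table).most_common(1))
```

The bit-level transformation described in the abstract replaces the symbolic table entries above with a packed representation so the voting can be done with regular, bit-parallel hardware.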


International Conference on Acoustics, Speech, and Signal Processing | 2003

Energy-efficient and parameterized designs for fast Fourier transform on FPGAs

Seonil Choi; Gokul Govindu; Ju-wook Jang; Viktor K. Prasanna

We develop energy-efficient designs for the fast Fourier transform (FFT) on FPGAs. Architectures for FFT on FPGAs are designed by investigating and applying techniques for minimizing the energy dissipation. Architectural parameters such as the degrees of vertical and horizontal parallelism are identified, and a design domain is created through a combination of design choices. We determine design trade-offs using high-level performance estimation to obtain energy-efficient designs. We implemented a set of parameterized designs, having parallelism, radix, and choice of storage types as parameters, on a Xilinx Virtex-II FPGA to verify the estimates. Our designs dissipate 57% to 78% less energy than the optimized designs from the Xilinx library. In terms of a comprehensive metric such as EAT (energy-area-time), our designs offer performance improvements of 3× to 13× over the Xilinx designs.

Collaboration


Dive into Seonil Choi's collaboration.

Top Co-Authors

Viktor K. Prasanna
University of Southern California

Gokul Govindu
University of Southern California

Sumit Mohanty
University of Southern California

Jingzhao Ou
University of Southern California

Ling Zhuo
University of Southern California

Ronald Scrofano
University of Southern California

Yongwha Chung
University of Southern California

Jongwoo Bae
University of Southern California

V. Sridhar
Indian Institute of Science