Publication


Featured research published by Ju-wook Jang.


Field Programmable Gate Arrays | 2003

Energy-efficient signal processing using FPGAs

Seonil Choi; Ronald Scrofano; Viktor K. Prasanna; Ju-wook Jang

In this paper, we present techniques for energy-efficient design at the algorithm level using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernel applications: fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance, in terms of both latency and energy efficiency, of FPGAs in performing these tasks. Using a Xilinx Virtex-II as the target FPGA, we compare the performance of our designs to those from the Xilinx library as well as to conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and the Texas Instruments TMS320C6415. Our evaluations are done both through estimation based on energy and latency equations and through low-level simulation. For FFT, our designs dissipated an average of 60% less energy than the design from the Xilinx library and 56% less than the DSP. Our designs showed a factor of 10 improvement over the embedded processor. These results provide concrete evidence to substantiate the idea that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while still dissipating less energy than the other two types of devices.


Field-Programmable Technology | 2002

Area and time efficient implementations of matrix multiplication on FPGAs

Ju-wook Jang; Seonil Choi; Viktor K. Prasanna

We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce both the latency and the area. Our designs improve on previous designs in terms of the area/speed metric, where speed denotes the maximum achievable running frequency. For 4×4 matrix multiplication, the area/speed metrics for the two previous designs and our design are 14.45, 4.93, and 2.35, respectively. The latency of one of the previous designs is 0.57 μs, while our design takes 0.15 μs using 18% less area. The area of our designs is 11%-46% smaller than the best known systolic designs with the same latency for matrices of sizes 3×3 to 12×12. The performance improvements tend to grow with the problem size.
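For readers unfamiliar with the area/speed figure of merit quoted above, the sketch below shows how such a metric might be tabulated for competing designs. The slice counts and clock frequencies are hypothetical placeholders, not numbers from the paper.

```python
# Hypothetical illustration of an area/speed comparison for 4x4 matrix
# multiplication designs; slice counts and clock frequencies are made up.

designs = {
    "prior_design_A": {"area_slices": 1300, "max_freq_mhz": 90},
    "prior_design_B": {"area_slices": 740,  "max_freq_mhz": 150},
    "proposed":       {"area_slices": 400,  "max_freq_mhz": 170},
}

for name, d in designs.items():
    # Lower is better: less area per unit of achievable clock frequency.
    ratio = d["area_slices"] / d["max_freq_mhz"]
    print(f"{name:>15}: area/speed = {ratio:.2f} slices per MHz")
```

A lower ratio means the design delivers its achievable clock rate with less logic.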


IEEE Transactions on Parallel and Distributed Systems | 1997

Constant time algorithms for computational geometry on the reconfigurable mesh

Ju-wook Jang; Madhusudan Nigam; Viktor K. Prasanna; Sartaj Sahni

The reconfigurable mesh consists of an array of processors interconnected by a reconfigurable bus system. The bus system can be used to dynamically obtain various interconnection patterns among the processors. Recently, this model has attracted a lot of attention. The authors show O(1) time solutions to the following computational geometry problems on the reconfigurable mesh: all-pairs nearest neighbors, convex hull, triangulation, two-dimensional maxima, two-set dominance counting, and smallest enclosing box. All these solutions accept N planar points as input and employ an N×N reconfigurable mesh. The basic scheme employed in the implementations is to recursively find an O(1) time solution. The number of recursion levels and the size of the subproblems at each level of recursion are optimized such that the problem decomposition and the solution to the problem can be obtained in constant time. As a result, they have developed efficient merge techniques to combine the solutions for subproblems on the reconfigurable mesh. These techniques exploit reconfigurability in nontrivial ways, leading to constant time solutions on a mesh of optimal size.


Field Programmable Logic and Applications | 2002

Energy-Efficient Matrix Multiplication on FPGAs

Ju-wook Jang; Seonil Choi; Viktor K. Prasanna

We develop new algorithms and architectures for matrix multiplication on configurable devices. These designs significantly reduce the energy dissipation and latency compared with the state-of-the-art FPGA-based designs. We derive functions to represent the impact of algorithmic level design choices on the system-wide energy dissipation, latency, and area by capturing algorithm and architecture details including features of the target FPGA. The functions are used to optimize energy performance under latency and area constraints for a family of candidate algorithms and architectures. As a result, our designs improve the energy performance of the optimized design from the recent Xilinx library by 32% to 88% without any increase in area-latency product. In terms of comprehensive metrics such as EAT (Energy-Area-Time) and E/AT (Energy/Area-Time), our designs offer superior performance compared with the Xilinx design by 50%-79% and 13%-44%, respectively. We also address how to exploit further increases in density of future FPGA devices for asymptotic improvement in latency and energy dissipation for multiplication of larger size matrices.
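The following is a minimal sketch of the constrained selection described above: given high-level estimates of energy, latency, and area for a family of candidate designs, keep the designs that meet latency and area budgets and pick the one with the lowest energy, reporting its EAT (energy × area × time) product. The candidate list, units, and budgets are hypothetical.

```python
# Hypothetical candidates with high-level (energy, latency, area) estimates.
# Units are arbitrary; the point is the selection logic, not the values.
candidates = [
    {"name": "d1", "energy_uj": 12.0, "latency_us": 0.30, "area_slices": 900},
    {"name": "d2", "energy_uj": 8.5,  "latency_us": 0.45, "area_slices": 650},
    {"name": "d3", "energy_uj": 9.7,  "latency_us": 0.25, "area_slices": 1200},
]

LATENCY_BUDGET_US = 0.40
AREA_BUDGET_SLICES = 1000

# Keep only designs that satisfy both constraints, then minimize energy.
feasible = [c for c in candidates
            if c["latency_us"] <= LATENCY_BUDGET_US
            and c["area_slices"] <= AREA_BUDGET_SLICES]

best = min(feasible, key=lambda c: c["energy_uj"])
eat = best["energy_uj"] * best["area_slices"] * best["latency_us"]
print(f"selected {best['name']}: EAT = {eat:.1f} (uJ x slices x us)")
```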


The Journal of Supercomputing | 2003

Domain-Specific Modeling for Rapid Energy Estimation of Reconfigurable Architectures

Seonil Choi; Ju-wook Jang; Sumit Mohanty; Viktor K. Prasanna

Reconfigurable architectures such as FPGAs are flexible alternatives to DSPs or ASICs used in mobile devices for which energy is a key performance metric. Reconfigurable architectures offer several design parameters such as operating frequency, precision, amount of memory, degree of parallelism, etc. These parameters define a large design space that must be explored to find energy-efficient solutions. It is also challenging to predict the energy variation at the early design phases when a design is modified at the algorithm level. Efficient traversal of such a large design space requires high-level modeling to facilitate rapid estimation of system-wide energy. However, FPGAs do not exhibit a high-level structure like, for example, a RISC processor, for which high-level as well as low-level energy models are available. To address this scenario, we propose a domain-specific modeling technique for energy-efficient kernel design that exploits knowledge of the algorithm and the target architecture family for a given kernel to develop a high-level model. This model captures architecture and algorithm features, parameters affecting energy performance, and power estimation functions based on these parameters. A system-wide energy function is derived based on the power functions and the cycle-specific power state of each building block of the architecture. This model is used to understand the impact of various parameters on system-wide energy and can be a basis for the design of energy-efficient algorithms. Our high-level model is used to quickly obtain fairly accurate estimates of the system-wide energy dissipation of data paths configured using FPGAs. We demonstrate our modeling methodology by applying it to four domains.
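A minimal sketch of the kind of system-wide energy function described above, which sums the power of each building block in its cycle-specific power state over all cycles of a schedule. The component names, power-state values, and clock period below are assumptions for illustration only.

```python
# Hypothetical per-state power (mW) for each building block of a data path.
power_mw = {
    "multiplier": {"active": 15.0, "idle": 1.0},
    "adder":      {"active": 4.0,  "idle": 0.3},
    "bram":       {"active": 8.0,  "idle": 0.5},
}

CLOCK_PERIOD_NS = 10.0  # assumed 100 MHz clock

def system_energy_nj(schedule):
    """schedule: list of dicts mapping component -> power state, one per cycle."""
    total_mw_cycles = sum(power_mw[comp][state]
                          for cycle in schedule
                          for comp, state in cycle.items())
    # mW * ns = picojoules; convert to nanojoules.
    return total_mw_cycles * CLOCK_PERIOD_NS / 1000.0

# Two cycles of an assumed schedule: a multiply-add, then a memory access.
schedule = [
    {"multiplier": "active", "adder": "active", "bram": "idle"},
    {"multiplier": "idle",   "adder": "idle",   "bram": "active"},
]
print(f"estimated energy: {system_energy_nj(schedule):.2f} nJ")
```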


International Parallel Processing Symposium | 1999

An efficient dynamic load balancing using the dimension exchange method for balancing of quantized loads on hypercube multiprocessors

Hwakyung Rim; Ju-wook Jang; Sung-Chun Kim

Dynamic load balancing on hypercube multiprocessors is considered with emphasis on quantized loads, which are divisible only in fixed-size units. First, we show that a direct application of the well-known Dimension Exchange Method (DEM) to quantized loads may leave a difference in assigned loads between processors as large as log N units after balancing on a hypercube of size N. We then propose a new method that reduces the maximum difference by half, to ⌈(log N)/2⌉. The claim is proved both by analyzing the cases that can increase the difference in each phase of balancing and by exhaustively enumerating load combinations for hypercubes of limited sizes by computer. To estimate the accumulated effect of repeated balancing in a realistic parallel processing environment, a simulation of hypercube multiprocessors using the SLAM II tool is performed. The results show about a 30% improvement in speedup, which follows from reduced processing time due to the reduced nonuniformity of loads.
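A minimal sketch of the dimension exchange idea for quantized (unit-sized) loads on a hypercube: in each of the log N phases, every processor pairs with its neighbor across one dimension and the pair splits its combined load as evenly as integer units allow. The rounding convention shown (the lower-indexed node takes the ceiling) is one possible choice, not necessarily the refinement analyzed in the paper.

```python
# Dimension exchange balancing of integer loads on a hypercube of 2**d nodes.
import math

def dem_quantized(loads):
    """loads: list of non-negative ints; length must be a power of two."""
    n = len(loads)
    d = n.bit_length() - 1
    assert 1 << d == n, "hypercube size must be a power of two"
    loads = list(loads)
    for dim in range(d):                      # one phase per dimension
        for i in range(n):
            j = i ^ (1 << dim)                # neighbor across this dimension
            if i < j:                         # handle each pair once
                total = loads[i] + loads[j]
                loads[i] = math.ceil(total / 2)   # indivisible units: round up here
                loads[j] = total - loads[i]
    return loads

initial = [13, 0, 7, 2, 9, 1, 0, 4]          # hypothetical loads on 8 nodes
balanced = dem_quantized(initial)
print(balanced, "max difference:", max(balanced) - min(balanced))
```

With the hypothetical loads above the residual imbalance happens to be a single unit; the paper's contribution is bounding, and then halving, the worst-case imbalance that such rounding can accumulate over the log N phases.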


IEEE Transactions on Parallel and Distributed Systems | 1997

An optimal multiplication algorithm on reconfigurable mesh

Ju-wook Jang; Heonchul Park; Viktor K. Prasanna

An O(1) time algorithm to multiply two K-bit binary numbers on an N×N bit-model reconfigurable mesh is shown. It uses an optimal mesh size and improves previously known results for multiplication on the reconfigurable mesh. The result is obtained using novel techniques for data representation and data movement, together with the multidimensional Rader transform. The algorithm is extended to achieve AT² optimality over 1 ≤ T ≤ √N in a variant of the bit-model of VLSI.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Energy-efficient and parameterized designs for fast Fourier transform on FPGAs

Seonil Choi; Gokul Govindu; Ju-wook Jang; Viktor K. Prasanna

We develop energy-efficient designs for the fast Fourier transform (FFT) on FPGAs. Architectures for FFT on FPGAs are designed by investigating and applying techniques for minimizing energy dissipation. Architectural parameters such as the degrees of vertical and horizontal parallelism are identified, and a design domain is created through combinations of design choices. We determine design trade-offs using high-level performance estimation to obtain energy-efficient designs. To verify the estimates, we implemented a set of parameterized designs, with parallelism, radix, and choice of storage type as parameters, on a Xilinx Virtex-II FPGA. Our designs dissipate 57% to 78% less energy than the optimized designs from the Xilinx library. In terms of a comprehensive metric such as EAT (energy-area-time), our designs offer performance improvements of 3× to 13× over the Xilinx designs.
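As a rough, assumption-based illustration of how the radix and parallelism parameters shape an FFT design, the sketch below uses textbook counts: a radix-r FFT of N points has log_r N stages of N/r butterflies each, and the number of butterfly units working in parallel trades area for cycles. These formulas are generic and are not the paper's estimation model.

```python
# Back-of-envelope butterfly and cycle counts for a radix-r FFT of N points,
# with `parallel` butterfly units operating concurrently (textbook estimates).
import math

def fft_counts(n_points, radix, parallel):
    stages = round(math.log(n_points, radix))
    butterflies_per_stage = n_points // radix
    cycles_per_stage = math.ceil(butterflies_per_stage / parallel)
    return {
        "stages": stages,
        "total_butterflies": stages * butterflies_per_stage,
        "approx_cycles": stages * cycles_per_stage,
    }

# Compare two hypothetical design points for a 1024-point FFT.
print("radix-4, 4-parallel :", fft_counts(1024, 4, 4))
print("radix-2, 1-parallel :", fft_counts(1024, 2, 1))
```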


Application-Specific Systems, Architectures, and Processors | 2002

A model-based methodology for application specific energy efficient data path design using FPGAs

Sumit Mohanty; Seonil Choi; Ju-wook Jang; Viktor K. Prasanna

We present a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain-specific modeling, coarse-grained performance evaluation, design space exploration, and low-level simulation to understand the trade-offs between energy, latency, and area. The domain-specific modeling technique defines a high-level model by identifying various components and parameters specific to a domain that affect the system-wide energy dissipation. A domain is a family of architectures and corresponding algorithms for a given application kernel. The high-level model also consists of functions for estimating energy, latency, and area that facilitate trade-off analysis. Design space exploration (DSE) analyzes the design space defined by the domain and selects a set of designs. Low-level simulations are used for accurate performance estimation of the designs selected by the DSE and for final design selection. We illustrate our methodology using a family of architectures and algorithms for matrix multiplication. The designs identified by our methodology demonstrate trade-offs among energy, latency, and area.
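A minimal sketch of the flow described above, with hypothetical parameter names and toy estimator formulas: enumerate the design points of a domain, score each with coarse-grained energy/latency/area functions, prune by an area budget, and shortlist a few low-energy candidates that would then go to low-level simulation.

```python
# Hypothetical coarse-grained design space exploration for a kernel domain.
# The parameters, estimator formulas, and constants are illustrative only.
from itertools import product

def estimate(num_pes, storage_words):
    """Toy estimators; real models would come from the domain's power functions."""
    latency = 1000 / num_pes                      # cycles: more PEs, fewer cycles
    area = 200 * num_pes + 0.5 * storage_words    # slices, roughly additive
    energy = 2000 + 0.001 * area * latency        # fixed work + a leakage-like term
    return {"num_pes": num_pes, "storage": storage_words,
            "energy": energy, "latency": latency, "area": area}

design_space = [estimate(p, s)
                for p, s in product([1, 2, 4, 8], [256, 1024, 4096])]

# Keep designs meeting an area budget, then shortlist the three lowest-energy
# ones for low-level (e.g. post-place-and-route) simulation.
feasible = [d for d in design_space if d["area"] <= 2000]
shortlist = sorted(feasible, key=lambda d: d["energy"])[:3]
for d in shortlist:
    print(d)
```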


Reconfigurable Technology: FPGAs and Reconfigurable Processors for Computing and Communications | 2002

Minimizing energy dissipation of matrix multiplication kernel on Virtex-II

Seonil Choi; Viktor K. Prasanna; Ju-wook Jang

In this paper, we develop energy-efficient designs for matrix multiplication on FPGAs. To analyze the energy dissipation, we develop a high-level model using domain-specific modeling techniques. In this model, we identify architecture parameters that significantly affect the total energy (system-wide energy) dissipation. Then, we explore design trade-offs by varying these parameters to minimize the system-wide energy. For matrix multiplication, we consider a uniprocessor architecture and a linear array architecture to develop energy-efficient designs. For the uniprocessor architecture, the cache size is a parameter that affects the I/O complexity and the system-wide energy. For the linear array architecture, the amount of storage per processing element is a parameter affecting the system-wide energy. By using the maximum amount of storage per processing element and the minimum number of multipliers, we obtain a design that minimizes the system-wide energy. We develop several energy-efficient designs for matrix multiplication. For example, for 6×6 matrix multiplication, energy savings of up to 52% for the uniprocessor architecture and 36% for the linear array architecture are achieved over an optimized library for the Virtex-II FPGA from Xilinx.
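The following is a behavioral sketch, written from the description above rather than from the paper's actual dataflow, of a linear-array organization in which each processing element keeps one column of B and one column of partial results in its local storage while the rows of A stream past the array.

```python
# Behavioral sketch of an n-PE linear array for n x n matrix multiplication:
# PE j keeps column j of B in local storage and accumulates column j of C
# while rows of A are streamed past every PE. Illustrative only.

def linear_array_matmul(A, B):
    n = len(A)
    # Local storage per PE: one column of B and one column of partial C sums.
    pe_b_col = [[B[k][j] for k in range(n)] for j in range(n)]
    pe_c_col = [[0] * n for _ in range(n)]

    for i in range(n):            # stream row i of A through the array
        row = A[i]
        for j in range(n):        # each PE j consumes the row independently
            pe_c_col[j][i] = sum(row[k] * pe_b_col[j][k] for k in range(n))

    # Gather: C[i][j] is held by PE j.
    return [[pe_c_col[j][i] for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(linear_array_matmul(A, B))   # expected [[19, 22], [43, 50]]
```

Giving each PE more local storage lets it hold more of B and of the partial results on chip, reducing off-chip traffic, which is consistent with the abstract's observation that maximizing per-PE storage minimizes the system-wide energy.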

Collaboration


Dive into Ju-wook Jang's collaborations.

Top Co-Authors

Viktor K. Prasanna
University of Southern California

Seonil Choi
University of Southern California