Hoseok Chang | Researchain

Publication

Featured research published by Hoseok Chang.

International Symposium on Circuits and Systems | 2006

An FPGA based SIMD processor with a vector memory unit

Junho Cho; Hoseok Chang; Wonyong Sung

A SIMD processor with a 16-way partitioned data path is designed for efficient multimedia data processing. To align the data needed for SIMD processing automatically, the architecture adopts a vector memory unit built from 17 memory banks. The vector memory unit also contains address generation and rearrangement units that eliminate bank conflicts. A MicroBlaze FPGA-based RISC processor handles program control and scalar data processing. The architecture has been implemented on a Xilinx FPGA, and its performance on several multimedia kernels is measured.
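
As an illustration of how 17 banks can serve a 16-way data path, the sketch below assumes simple modulo interleaving (word address `a` lives in bank `a % 17`, row `a // 17`); the abstract does not state the actual mapping, and all function names are hypothetical. Because 16 lanes are fewer than 17 banks, any unit-stride access, aligned or not, touches 16 distinct banks, so address generation followed by a rearrangement step (routing each bank's output to the lane that asked for it) delivers the whole vector in one pass.

```python
# Assumption: modulo interleaving over 17 banks; the paper's exact mapping is not stated.
NUM_LANES = 16               # 16-way partitioned data path
NUM_BANKS = NUM_LANES + 1    # 17 banks


def build_banked_memory(data, banks=NUM_BANKS):
    """Place word address a at memory[a % banks][a // banks]."""
    rows = (len(data) + banks - 1) // banks
    memory = [[0] * rows for _ in range(banks)]
    for addr, value in enumerate(data):
        memory[addr % banks][addr // banks] = value
    return memory


def generate_addresses(base, stride=1, lanes=NUM_LANES, banks=NUM_BANKS):
    """Address generation: the (bank, row) pair requested by each lane."""
    return [((base + i * stride) % banks, (base + i * stride) // banks)
            for i in range(lanes)]


def vector_load(memory, base, stride=1, lanes=NUM_LANES, banks=NUM_BANKS):
    """Read each bank at most once, then rearrange bank outputs into lane order."""
    requests = generate_addresses(base, stride, lanes, banks)
    if len({bank for bank, _ in requests}) != lanes:
        raise ValueError("bank conflict: this access needs more than one cycle")
    bank_outputs = {bank: memory[bank][row] for bank, row in requests}
    # Rearrangement: lane i receives the word from the bank it addressed.
    return [bank_outputs[bank] for bank, _ in requests]


data = list(range(1000))
memory = build_banked_memory(data)
print(vector_load(memory, 5))            # unaligned unit-stride access
print(vector_load(memory, 3, stride=2))  # stride-2 access
```

Under this mapping only strides that are multiples of 17 send every lane to the same bank; those are rejected here rather than serialized.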

Compilers, Architecture, and Synthesis for Embedded Systems | 2008

Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware

Hoseok Chang; Wonyong Sung

Automatic vectorization of programs for partitioned-ALU SIMD (Single Instruction Multiple Data) processors has been difficult not only because of data dependencies but also because of non-aligned and irregular data access. A non-aligned or irregular data access incurs many overhead cycles for data alignment, which also complicates efficient code generation and hinders automatic vectorization. In this paper, we employ two special memory access hardware units to improve the performance of SIMD processors: the split line buffer and the packing buffer. The former solves the non-aligned memory access problem, while the latter simplifies irregular and stride data access. Adding these units requires only very small changes to the instruction set architecture, yet it significantly improves performance by vectorizing more loops and reducing overhead cycles. We have also developed an auto-vectorization compiler that utilizes these hardware units. Experiments comparing the proposed method with the conventional one show a 50% increase in the number of vectorized loops and a 77% increase in the total performance of an MPEG2 encoder program.
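
The sketch below models only the observable behavior of the two buffers, under assumed names and an assumed 8-element vector width: an unaligned vector assembled from the two adjacent aligned lines it spans, and a strided gather packed into one vector. The internal design of the split line buffer and packing buffer is not described in the abstract, so this should not be read as their implementation.

```python
# Behavioral models only; names and the 8-element width are assumptions.
WIDTH = 8


def load_aligned_line(memory, line_index, width=WIDTH):
    """One aligned memory line: the only access an aligned-only memory supports."""
    start = line_index * width
    return memory[start:start + width]


def split_line_load(memory, addr, width=WIDTH):
    """Unaligned vector starting at addr, built from the two aligned lines it spans."""
    line, offset = divmod(addr, width)
    first = load_aligned_line(memory, line, width)
    if offset == 0:
        return first
    second = load_aligned_line(memory, line + 1, width)
    return (first + second)[offset:offset + width]


def packing_buffer_load(memory, addr, stride, width=WIDTH):
    """Gather `width` elements spaced `stride` apart into one packed vector."""
    return [memory[addr + i * stride] for i in range(width)]


memory = list(range(100))
print(split_line_load(memory, 13))        # elements 13..20
print(packing_buffer_load(memory, 2, 3))  # elements 2, 5, ..., 23
```

Without such hardware, the same two results would have to be produced in software with extra loads plus shuffle or pack instructions, which is the overhead the abstract describes.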

Signal Processing Systems | 2006

Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit

Hoseok Chang; Junho Cho; Wonyong Sung

The SIMD architecture is very efficient for multimedia data processing because a single instruction can operate on multiple data. To perform an SIMD operation, however, the data must first be aligned in the vector register, which requires shuffle, pack, or unpack instructions, and these instructions can limit the performance gain. The alignment restriction also hinders efficient automatic vectorization in SIMD compilers. In this paper, an SIMD processor with a multi-bank vector memory unit is designed. The SIMD processor consists of a 2-way VLIW processor, an n-way SIMD coprocessor, and an (n+1)-bank vector memory unit. The vector memory unit also includes address generation logic. An SIMD compiler that exploits the vector memory unit is developed. Because the vector memory permits unaligned and stride accesses without overhead instructions, the developed compiler achieves good performance. The performance of an MPEG2 encoder optimized by the developed SIMD compiler is analyzed.
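
A compiler targeting such a memory must decide, for each array reference, which kind of access it will generate. A minimal sketch, assuming references of the affine form `A[stride*i + offset]`, an array base aligned to an n-word boundary, and modulo-(n+1) bank interleaving (the class, function, and category names are hypothetical):

```python
from dataclasses import dataclass
from math import gcd


@dataclass
class AffineRef:
    """Hypothetical IR node for an array reference A[stride * i + offset] in a loop over i."""
    array: str
    stride: int
    offset: int


def classify_access(ref, n):
    """Classify a reference for an n-way SIMD unit backed by n+1 modulo-interleaved banks.

    Assumes the array base address is aligned to an n-word boundary."""
    if ref.stride == 0:
        return "broadcast"          # every lane reads the same word
    if ref.stride == 1:
        return "aligned" if ref.offset % n == 0 else "unaligned"
    if gcd(ref.stride, n + 1) == 1:
        return "stride"             # the n lanes land in n distinct banks
    return "bank-conflict"          # lanes collide; needs extra cycles or a fallback


for ref in [AffineRef("x", 1, 0), AffineRef("x", 1, 3),
            AffineRef("y", 2, 0), AffineRef("z", 3, 0)]:
    print(ref.array, ref.stride, ref.offset, classify_access(ref, n=8))
```

The stride test follows from the bank mapping: n lanes with stride s touch n distinct banks out of n+1 exactly when gcd(s, n+1) = 1. With 17 banks (n = 16, a prime bank count) only multiples of 17 conflict, whereas with 9 banks (n = 8) strides of 3 and 6 also collide.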

Signal Processing Systems | 2009

Compiler-Based Performance Evaluation of an SIMD Processor with a Multi-Bank Memory Unit

Hoseok Chang; Junho Cho; Wonyong Sung

The single instruction multiple data (SIMD) architecture is very efficient for executing arithmetic-intensive programs, but it frequently suffers from data-alignment problems. These problems not only add execution-time overhead but also hinder automatic vectorization by the SIMD compiler. In this paper, we compare three on-chip memory systems for the SIMD architecture (single-bank, multi-bank, and multi-port) as ways to resolve the data-alignment problems. The single-bank memory is the simplest but supports only aligned accesses. The multi-bank memory is slightly more complex, but enables unaligned accesses and stride accesses, subject to a bank-conflict limitation. The multi-port memory supports both unaligned and stride accesses without any restriction, but needs much more expensive hardware. We also developed a vectorizing compiler that performs dynamic memory allocation and SIMD code generation. The performance of the three memory systems with our SIMD compiler is evaluated using several digital signal processing kernels and the MPEG2 encoder. The experimental results show that, in a multimedia system with a 2-issue host processor and an 8-way SIMD coprocessor, the multi-bank memory carries out MPEG2 encoding 5.8 times faster, whereas the single-bank memory achieves only a 2.9 times speed-up. The multi-port memory naturally shows the best performance, but once hardware cost is considered, its improvement over the multi-bank memory is not worthwhile.
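
To make the trade-off concrete, the toy model below counts memory cycles for one n-element vector access under each organization. The cost rules are simplifying assumptions, not the paper's simulator: the single-bank memory reads one aligned n-word line per cycle (the shuffle or pack instructions needed afterwards are ignored), the multi-bank memory uses n+1 modulo-interleaved banks that each serve one word per cycle, and the multi-port memory serves any n addresses in one cycle.

```python
from collections import Counter


def lane_addresses(base, stride, n):
    return [base + i * stride for i in range(n)]


def cycles_single_bank(base, stride, n):
    """Assumed: one aligned n-word line per cycle; shuffle/pack instructions are not counted."""
    return len({addr // n for addr in lane_addresses(base, stride, n)})


def cycles_multi_bank(base, stride, n):
    """Assumed: n+1 banks, word a in bank a % (n+1), one word per bank per cycle."""
    per_bank = Counter(addr % (n + 1) for addr in lane_addresses(base, stride, n))
    return max(per_bank.values())


def cycles_multi_port(base, stride, n):
    """Assumed: n independent ports serve any n addresses in one cycle."""
    return 1


N = 8  # 8-way SIMD coprocessor, matching the experimental setup in the abstract
for label, base, stride in [("aligned", 0, 1), ("unaligned", 3, 1),
                            ("stride 2", 0, 2), ("stride 3", 0, 3)]:
    print(f"{label:10s} single={cycles_single_bank(base, stride, N)} "
          f"multi-bank={cycles_multi_bank(base, stride, N)} "
          f"multi-port={cycles_multi_port(base, stride, N)}")
```

In this model, with n = 8 an unaligned unit-stride access costs 2 line reads on the single-bank memory but 1 cycle on the multi-bank memory, while a stride of 3 costs 3 cycles on both, because 9 banks and a stride of 3 share a factor of 3. That case is the bank-conflict limitation mentioned above.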

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009

Access-Pattern-Aware On-Chip Memory Allocation for SIMD Processors

Hoseok Chang; Wonyong Sung

The number of cycles for each external memory access in Single Instruction Multiple Data (SIMD) processors depends heavily on the access pattern: aligned, unaligned, or stride. We developed a high-performance dynamic on-chip memory allocation method for SIMD processors that considers the memory access pattern as well as the access frequency. The access pattern and access count for each array in a loop are determined by both code analysis and profiling, which are performed in a compiler framework we developed. This framework not only conducts dynamic on-chip memory allocation but also generates optimized code for a target processor. The proposed allocation method has been tested with several multimedia benchmarks, including motion estimation, 2-D discrete cosine transform, and MPEG2 encoder programs.
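
A minimal sketch of pattern-aware allocation, assuming a single on-chip region, a greedy ranking by cycles saved per byte, and made-up per-access costs (`EXTERNAL_COST`, `ON_CHIP_COST`). The paper's method is dynamic (allocation can change between loops), and its actual cost figures and selection algorithm are not given in the abstract; the array names below are also hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical per-access cycle costs; not taken from the paper.
EXTERNAL_COST = {"aligned": 4, "unaligned": 8, "stride": 16}
ON_CHIP_COST = {"aligned": 1, "unaligned": 1, "stride": 1}


@dataclass
class ArrayProfile:
    name: str
    size: int                                   # bytes
    counts: dict = field(default_factory=dict)  # access pattern -> access count

    def benefit(self):
        """Cycles saved if this array is placed on chip instead of in external memory."""
        return sum(count * (EXTERNAL_COST[p] - ON_CHIP_COST[p])
                   for p, count in self.counts.items())


def allocate(arrays, capacity):
    """Greedily fill on-chip memory in order of cycles saved per byte."""
    chosen, used = [], 0
    for arr in sorted(arrays, key=lambda a: a.benefit() / a.size, reverse=True):
        if used + arr.size <= capacity:
            chosen.append(arr.name)
            used += arr.size
    return chosen


arrays = [
    ArrayProfile("cur_block", 256, {"aligned": 1000}),
    ArrayProfile("ref_window", 1024, {"unaligned": 5000}),
    ArrayProfile("dct_coef", 512, {"stride": 800}),
]
print(allocate(arrays, capacity=1536))
```

With these made-up numbers and a 1536-byte capacity, `allocate` picks `ref_window` and `dct_coef`. Ranking by raw access count per byte instead would have chosen `cur_block` over `dct_coef`, which is the kind of difference that weighting by access pattern is meant to capture.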

Signal Processing Systems | 2009

SIMD processor based implementation of recursive filtering equations

Jaewoo Ahn; Hoseok Chang; Junho Cho; Wonyong Sung

Implementing recursive equations on parallel computer architectures has long been of interest because the data dependency makes significant speed-up difficult to achieve. In this paper, efficient implementation of recursive filtering equations on partitioned data-path SIMD (Single Instruction Multiple Data) processors is studied. In particular, three parallel computation techniques (block filtering, recursive doubling, and multi-block filtering) are implemented, and their performance is compared on a Pentium CPU-based system. Performance evaluation results for the multi-block processing method on a scalable SIMD processor are also presented.
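
Recursive doubling, one of the three techniques named above, can be sketched for the first-order recursion y[n] = a·y[n-1] + x[n]. Each sample is treated as an affine map y ↦ A·y + B, and composing maps at distances 1, 2, 4, ... produces every output in ⌈log2 N⌉ steps whose per-element updates are independent, which is what makes them SIMD-friendly. This is a generic textbook formulation with hypothetical function names, not the paper's code; note that it performs O(N log N) work versus O(N) for the sequential loop, one reason block-based methods are also worth comparing.

```python
def iir_sequential(x, a):
    """Reference loop: y[n] = a * y[n-1] + x[n], with y[-1] = 0."""
    y, prev = [], 0.0
    for value in x:
        prev = a * prev + value
        y.append(prev)
    return y


def iir_recursive_doubling(x, a):
    """Same recursion via recursive doubling (a log-step parallel prefix).

    Before the step with distance d, element n holds (A[n], B[n]) such that
    y[n] = A[n] * y[n-d] + B[n], where y[k] = 0 for k < 0."""
    size = len(x)
    A, B = [a] * size, list(x)
    d = 1
    while d < size:
        # Every update in one step reads only the previous step's values,
        # so the updates can run in parallel SIMD lanes.
        A, B = ([A[n] * A[n - d] if n >= d else A[n] for n in range(size)],
                [A[n] * B[n - d] + B[n] if n >= d else B[n] for n in range(size)])
        d *= 2
    return B  # once d >= size, each map reaches back before y[0], so y[n] = B[n]


x = [1.0, -2.0, 3.0, 0.5, 4.0]
print(iir_sequential(x, 0.5))
print(iir_recursive_doubling(x, 0.5))
```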

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2005

Compressed swapping for NAND flash memory based embedded systems

Sangduck Park; Hyunjin Lim; Hoseok Chang; Wonyong Sung

A swapping algorithm for NAND flash memory based embedded systems is developed by combining data compression with an improved page update method. The method allows a memory-demanding application, or multiple applications, to run efficiently without requiring a large main memory. It also helps improve the stability of a NAND flash file system by reducing the number of writes. The update algorithm is based on the CFLRU (Clean First LRU) method and adds features such as selective compression and delayed swapping. The WKdm compression algorithm is used for software-based compression, while LZO is used for the hardware-based implementation. The proposed method is implemented on an ARM9 CPU-based Linux system, and its performance when running the MPEG2 decoder, the MPEG2 encoder, and gcc is measured and interpreted.
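
CFLRU keeps an LRU list whose least-recently-used end forms a clean-first window. On eviction, it drops the least recently used clean page in that window; only when the window holds no clean page does it evict the LRU page, which, being dirty, must be written out. The sketch below models that replacement rule alone: the class name and window size are hypothetical, and the paper's compression, selective compression, and delayed swapping are not modeled.

```python
from collections import OrderedDict


class CFLRUCache:
    """Clean-First LRU page replacement (replacement rule only).

    `window` is the size of the clean-first region at the LRU end; its value is a
    hypothetical tuning parameter. Compression, selective compression, and delayed
    swapping from the paper are not modeled."""

    def __init__(self, capacity, window):
        self.capacity = capacity
        self.window = window
        self.pages = OrderedDict()  # page -> dirty flag, least recently used first
        self.swap_writes = 0        # dirty evictions that would be written to swap

    def access(self, page, write=False):
        if page in self.pages:
            dirty = self.pages.pop(page) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page] = dirty    # (re)insert at the most recently used end

    def _evict(self):
        lru_order = list(self.pages)
        for page in lru_order[:self.window]:  # scan the clean-first region from the LRU end
            if not self.pages[page]:
                del self.pages[page]          # clean page: dropped without any write
                return
        victim = lru_order[0]                 # no clean page in the window: plain LRU
        if self.pages.pop(victim):
            self.swap_writes += 1


cache = CFLRUCache(capacity=3, window=2)
cache.access(1, write=True)
cache.access(2)
cache.access(3, write=True)
cache.access(4)  # evicts clean page 2 rather than dirty page 1
print(list(cache.pages), cache.swap_writes)
```

Plain LRU would have evicted dirty page 1 here and paid a write; deferring dirty evictions this way is how CFLRU reduces the number of flash writes.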

International Symposium on Circuits and Systems | 2003

Optimization of power consumption for an ARM7-based multimedia handheld device

Hoseok Chang; Wonchul Lee; Wonyong Sung

We have developed a multimedia handheld educational device and optimized its current consumption, not only by employing several software optimization techniques but also by using a dynamic clock frequency scaling (DFS) scheme. Although the ARM7 CPU employed does not support operating voltage scaling, controlling the operating frequency helps reduce current consumption during idle time and yields up to a 25% power reduction at the system level. The CPU operating frequency is determined by profiling the multimedia program components, which include LZW (Lempel-Ziv-Welch) image decompression, MP3 audio decoding, CELP-based speech decoding, speech recognition, and ADPCM. In particular, it is shown that the LZW decompression time is proportional to the image size rather than to the size of the compressed file. After DFS is applied, the CPU load becomes almost full, between 80% and 95%.
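
Choosing a clock from profiling data can be sketched as picking the lowest available frequency at which the profiled work for one period still fits within that period. The function name, clock list, period, and cycle count below are hypothetical, not the device's actual figures; the design point is that running slower fills otherwise idle time, which is consistent with the high CPU load reported after DFS.

```python
def select_frequency(cycles_per_period, period_s, available_hz):
    """Lowest clock whose capacity in one period covers the profiled cycle count.

    Returns (frequency_hz, cpu_load), where cpu_load = busy time / period."""
    for freq in sorted(available_hz):
        load = cycles_per_period / (freq * period_s)
        if load <= 1.0:
            return freq, load
    raise ValueError("profiled workload exceeds the fastest available clock")


# Hypothetical numbers for illustration only.
CLOCKS_HZ = [15_000_000, 30_000_000, 45_000_000, 60_000_000]
freq, load = select_frequency(cycles_per_period=1_000_000, period_s=0.025,
                              available_hz=CLOCKS_HZ)
print(f"{freq / 1e6:.0f} MHz, CPU load {load:.0%}")
```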

Signal Processing Systems | 2002

Speaking partner: an ARM7-based multimedia handheld device

Wonyong Sung; Hoseok Chang; Wonchul Lee; Suhong Ryu

We have implemented MP3 audio decoding, speech recognition, and image compression programs for a handheld foreign language learning device, Speaking Partner. The hardware is based on a relatively low-performance RISC CPU, the ARM7TDMI. Several previously known software optimization techniques for RISC processors, as well as a few algorithm-specific optimization methods, are employed. Using block data transfer instructions reduces the number of clock cycles more significantly than reducing the accuracy of multiplication does. The implementation results show that the ARM7-based CPU can run multimedia applications in a multitasking mode. The power consumption of each activity and each hardware component is also analyzed.

International Conference on Consumer Electronics | 2005

A handheld English pronunciation evaluation device

Kisun You; Hoyoun Kim; Hoseok Chang; Juyup Lee; Wonyong Sung

A handheld device for practicing English pronunciation is developed by applying speech recognition technology. The device helps correct intonation by showing the pitch contour and assesses pronunciation accuracy by computing the log-posterior probability of the segmented phonemes. Not only phoneme replacement errors but also deletion and insertion errors are checked. Because the algorithm employs both English and Korean phoneme sets, it can also detect Korean phoneme replacement errors. The device is implemented with an ARM7 CPU and is equipped with a graphic LCD.
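
Detecting replacement, deletion, and insertion errors requires aligning the expected phoneme sequence with the recognized one. A common way to do this, assumed here rather than taken from the paper, is a Levenshtein (edit-distance) alignment with a backtrace that labels each step; the function name and the phoneme symbols in the example are illustrative.

```python
def align_phonemes(canonical, recognized):
    """Levenshtein alignment; returns a list of (op, expected, heard) tuples."""
    n, m = len(canonical), len(recognized)
    # dist[i][j] = edit distance between canonical[:i] and recognized[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or replacement
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right cell, labeling each step.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1]):
            op = "match" if canonical[i - 1] == recognized[j - 1] else "replace"
            ops.append((op, canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", canonical[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, recognized[j - 1]))
            j -= 1
    return ops[::-1]


print(align_phonemes(["r", "ay", "s"], ["l", "ay", "s"]))    # replacement
print(align_phonemes(["s", "t", "aa", "p"], ["s", "aa", "p"]))  # deletion
print(align_phonemes(["s", "t"], ["s", "ih", "t"]))          # insertion
```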
