Publication


Featured research published by Byeong-Gyu Nam.


International Solid-State Circuits Conference | 2003

A 210-mW graphics LSI implementing full 3-D pipeline with 264 Mtexels/s texturing for mobile multimedia applications

Ramchan Woo; Sungdae Choi; Ju-Ho Sohn; Seong-Jun Song; Young-Don Bae; Chi-Weon Yoon; Byeong-Gyu Nam; Jeong-Ho Woo; Sung-Eun Kim; In-Cheol Park; Sungwon Shin; Kyung-Dong Yoo; Jin-Yong Chung; Hoi-Jun Yoo

A 121 mm² graphics LSI is designed for portable 2D/3D graphics and MPEG-4 applications. The LSI contains a RISC processor with MAC, a 3D rendering engine, and 29 Mb of DRAM, and is built in a 0.16 µm pure DRAM technology. Programmable clocking allows the LSI to operate in several power modes for various applications. In the lower-cost mode, power consumption is under 210 mW while delivering 264M texture-mapped pixels per second.


Asian Solid-State Circuits Conference | 2005

A 231-MHz, 2.18-mW 32-bit Logarithmic Arithmetic Unit for Fixed-Point 3-D Graphics System

Hyejung Kim; Byeong-Gyu Nam; Ju-Ho Sohn; Jeong-Ho Woo; Hoi-Jun Yoo

A 32-bit fixed-point logarithmic arithmetic unit is proposed for possible application in mobile three-dimensional (3-D) graphics systems. The proposed unit performs division, reciprocal, square-root, reciprocal-square-root, and square operations in two clock cycles and the powering operation in four clock cycles. Its number range is programmable, giving the 3-D graphics pipeline flexibility for accurate computation, and it uses an eight-region piecewise linear approximation model for logarithmic and antilogarithmic conversion to keep the operation error under 0.2%. The test chip is implemented in 1-poly 6-metal 0.18-µm CMOS technology with 9k gates. It operates at a maximum frequency of 231 MHz and consumes 2.18 mW at a 1.8-V supply.
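
Once operands are in the logarithmic domain, the listed operations reduce to cheap steps: division becomes subtraction, reciprocal becomes negation, square root becomes halving, and squaring becomes doubling. The Python sketch below illustrates that mapping for positive operands. It uses exact `math.log2` instead of the chip's eight-region piecewise-linear converters, and all function names are hypothetical.

```python
import math

# Minimal sketch of logarithmic-number-system (LNS) arithmetic for positive
# operands. Exact math.log2 and 2.0 ** x stand in for the chip's
# piecewise-linear converters; all function names here are hypothetical.


def to_log(x: float) -> float:
    """Convert a positive value to its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive operands only")
    return math.log2(x)


def from_log(lx: float) -> float:
    """Convert a base-2 logarithm back to the linear domain."""
    return 2.0 ** lx


def lns_div(a: float, b: float) -> float:
    return from_log(to_log(a) - to_log(b))   # division -> subtraction


def lns_recip(a: float) -> float:
    return from_log(-to_log(a))              # reciprocal -> negation


def lns_sqrt(a: float) -> float:
    # square root -> halving (an arithmetic right shift in fixed-point hardware)
    return from_log(to_log(a) / 2)


def lns_rsqrt(a: float) -> float:
    return from_log(-to_log(a) / 2)          # 1/sqrt -> negate and halve


def lns_square(a: float) -> float:
    # square -> doubling (a left shift in fixed-point hardware)
    return from_log(2 * to_log(a))


def lns_pow(a: float, y: float) -> float:
    return from_log(y * to_log(a))           # a**y -> multiply log by y
```

Only `lns_pow` needs a true multiplier in the log domain; the other five need at most an adder, a negation, or a shift.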


International Solid-State Circuits Conference | 2003

An 800MHz star-connected on-chip network for application to systems on a chip

Se-Joong Lee; Seong-Jun Song; Kangmin Lee; Jeong-Ho Woo; Sung-Eun Kim; Byeong-Gyu Nam; Hoi-Jun Yoo

A 10.8 × 6.0 mm² prototype chip is implemented with a star-connected on-chip network. The chip consists of a PLL, 1 KB SRAM, two 2 × 2 crossbar switches, up/down-samplers, two off-chip gateways, and synchronizers. The on-chip network contains 81k transistors, dissipates 264 mW at 2.3 V and 800 MHz, and provides 1.6 GB/s per port and 12.8 GB/s aggregate bandwidth, supporting plesiochronous communication without global synchronization.


IEEE Transactions on Computers | 2008

Power and Area-Efficient Unified Computation of Vector and Elementary Functions for Handheld 3D Graphics Systems

Byeong-Gyu Nam; Hyejung Kim; Hoi-Jun Yoo

A unified computation method of vector and elementary functions is proposed for handheld 3D graphics systems. It unifies vector operations like vector multiply, multiply-and-add, divide, divide-by-square-root, and dot product and elementary functions like trigonometric, inverse trigonometric, hyperbolic, inverse hyperbolic, power (x^y with two variables), and logarithm to an arbitrary base into a single four-way arithmetic platform. A number system called the fixed-point hybrid number system (FXP-HNS), which combines the fixed-point number system (FXP) and the logarithmic number system (LNS), is proposed for the power and area-efficient unification. Power and area-efficient logarithmic and antilogarithmic conversion schemes are also proposed for the data conversions between fixed-point and logarithmic numbers in the FXP-HNS and achieve 0.41 percent and 0.08 percent maximum conversion errors, respectively. The unified arithmetic unit based on the proposed schemes is presented with less than 6.3 percent operation error. Its fully pipelined architecture achieves single-cycle throughput with maximum four-cycle latency for all of the supported operations. Comparison results show that the proposed arithmetic unit achieves 30 percent power and 10.9 percent area reductions and runs two times faster than the previous approach.
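
The key idea of a hybrid number system is that each operation runs in whichever representation makes it cheap: additions and accumulations stay in fixed point, while multiplications, powers, and logarithms become additions and scalings of base-2 logarithms. The Python sketch below illustrates that split for a dot product, a two-variable power x^y, and a logarithm to an arbitrary base. It is not the paper's FXP-HNS encoding: the Q16.16 format, the single lane (instead of four), the exact float conversions, and all function names are assumptions.

```python
import math

# Minimal sketch of a hybrid fixed-point/logarithmic (FXP/LNS) datapath.
# Assumptions: Q16.16 fixed-point storage, one lane instead of four, and exact
# math.log2 in place of the paper's conversion circuits. Names are hypothetical.

FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fxp(x: float) -> int:
    """Quantize a real number to Q16.16 fixed point."""
    return round(x * ONE)


def from_fxp(q: int) -> float:
    return q / ONE


def fxp_to_log(q: int) -> tuple[int, float]:
    """Split a nonzero fixed-point value into (sign, log2 of magnitude)."""
    return (1 if q > 0 else -1), math.log2(abs(q) / ONE)


def log_to_fxp(sign: int, lx: float) -> int:
    """Antilogarithmic conversion back to Q16.16."""
    return sign * round((2.0 ** lx) * ONE)


def hns_mul(a: int, b: int) -> int:
    """Multiply in the log domain: add logarithms, multiply signs."""
    if a == 0 or b == 0:
        return 0                      # zero has no logarithm
    sa, la = fxp_to_log(a)
    sb, lb = fxp_to_log(b)
    return log_to_fxp(sa * sb, la + lb)


def hns_dot(u: list[int], v: list[int]) -> int:
    """Dot product: products in the log domain, accumulation in fixed point."""
    return sum(hns_mul(a, b) for a, b in zip(u, v))


def hns_pow(x: int, y: float) -> int:
    """x**y for x > 0, computed as antilog2(y * log2 x)."""
    _, lx = fxp_to_log(x)
    return log_to_fxp(1, y * lx)


def hns_log_base(x: int, base: float) -> float:
    """log_base(x) for x > 0, computed as log2(x) / log2(base)."""
    _, lx = fxp_to_log(x)
    return lx / math.log2(base)
```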


International Solid-State Circuits Conference | 2007

A 52.4mW 3D Graphics Processor with 141Mvertices/s Vertex Shader and 3 Power Domains of Dynamic Voltage and Frequency Scaling

Byeong-Gyu Nam; Jeabin Lee; Kwanho Kim; Seungjin Lee; Hoi-Jun Yoo

A 3D graphics processor fabricated in 0.18 µm 6-metal CMOS contains 1.57M transistors and 29 kB SRAM in a core size of 17.2 mm². The vertex shader uses a logarithmic number system to reach 141 Mvertices/s, and the 3 power domains are controlled separately by dynamic voltage and frequency scaling, for 52.4 mW at 60 fps.


IEEE Journal of Solid-State Circuits | 2009

An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Power 3-D Graphics SoC

Byeong-Gyu Nam; Hoi-Jun Yoo

A low-power, high-performance 4-way 32-bit stream processor core is developed for handheld low-power 3-D graphics systems. It contains a floating-point unified matrix, vector, and elementary function unit. By exploiting logarithmic arithmetic and the proposed adaptive number conversion scheme, the 4-way arithmetic unit achieves single-cycle throughput for all these operations except matrix-vector multiplication, which takes 2 cycles per result instead of the 4 cycles required conventionally. The processor, featuring this functional unit and several proposed architectural schemes including embedded register index calculation, functional unit reconfiguration, and operand forwarding in the logarithmic domain, achieves a 19.1% cycle-count reduction for the OpenGL transformation and lighting (TnL) operation compared with the latest work. The proposed stream processor core is integrated into a 3-D graphics SoC as a vertex shader to show its effectiveness. The entire SoC is fabricated as a test chip in 1-poly 6-metal 0.18 µm CMOS technology. The 17.2 mm² chip contains 1.57M transistors and 29 kB SRAM. The stream processor core occupies 9.7 mm² and dissipates 86.8 mW at a 200 MHz operating frequency. It shows a peak performance of 141 Mvertices/s for geometry transformation (TFM) and, for the TFM, achieves a 17.5% performance improvement and 44.7% power and 39.4% area reductions compared with the latest work. For power management, the chip is divided into three power domains separately controlled by dynamic voltage and frequency scaling (DVFS). With this scheme, it consumes 52.4 mW at 60 fps, a 50.5% power reduction from the latest work.
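
Geometry transformation (TFM), the workload behind the 141 Mvertices/s figure, multiplies each homogeneous vertex by a 4×4 matrix, which amounts to four 4-element dot products per vertex. The plain-float Python sketch below shows that computation. It does not model the core's logarithmic datapath, register-index calculation, or operand forwarding; the perspective divide is included only as a typical follow-on step, and the paper's TFM benchmark may not contain it. Function names are hypothetical.

```python
# Minimal sketch of a geometry transformation (TFM): a 4x4 matrix applied to a
# homogeneous vertex as four 4-element dot products, one per output component.
# Plain floats are used; the paper's core evaluates these in a hybrid
# fixed-point/logarithmic format, which this sketch does not model.

Mat4 = list[list[float]]
Vec4 = list[float]


def dot4(a: Vec4, b: Vec4) -> float:
    return sum(x * y for x, y in zip(a, b))


def transform(m: Mat4, v: Vec4) -> Vec4:
    """Row-major matrix times column vector: out[i] = dot(row i, v)."""
    return [dot4(row, v) for row in m]


def perspective_divide(clip: Vec4) -> Vec4:
    """Map clip coordinates to normalized device coordinates (x/w, y/w, z/w)."""
    x, y, z, w = clip
    inv_w = 1.0 / w   # a reciprocal, i.e. a negation in the log domain
    return [x * inv_w, y * inv_w, z * inv_w]


def translation(tx: float, ty: float, tz: float) -> Mat4:
    """Build a translation matrix, used here only as a test input."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]
```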


Asian Solid-State Circuits Conference | 2007

Dynamic Voltage and Frequency Scaling (DVFS) scheme for multi-domains power management

Jeabin Lee; Byeong-Gyu Nam; Hoi-Jun Yoo

The power of 3 different power domains is managed by continuous co-locking of voltage and clock, dynamically varying the clock frequency from 90 MHz to 200 MHz and the supply voltage from 1.0 V to 1.8 V. A test 3D-graphics SoC is divided into 3 power domains whose power is managed separately. The workload of each domain is the control parameter for its power management unit (PMU). It occupies 0.45 mm² in a 0.18 µm CMOS process and consumes 5 mW. The total SoC occupies 17.2 mm² and consumes 52.4 mW at full operation with triple-domain power management.
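
A per-domain DVFS controller maps each domain's workload to a clock frequency and then to the lowest supply voltage that supports it. The Python sketch below assumes a workload expressed as a fraction of peak demand, a linear workload-to-frequency mapping, and a linear frequency-to-voltage relation across the stated 90 to 200 MHz and 1.0 to 1.8 V ranges; it does not model the co-locking loop itself. Relative dynamic power is estimated with the standard CMOS relation P ∝ C·V²·f. All names and domain loads are hypothetical.

```python
from dataclasses import dataclass

# Minimal sketch of a per-domain DVFS policy. The 90-200 MHz and 1.0-1.8 V
# ranges come from the text; the linear workload-to-frequency and
# frequency-to-voltage mappings are assumptions for illustration only.

F_MIN, F_MAX = 90e6, 200e6   # Hz
V_MIN, V_MAX = 1.0, 1.8      # V


@dataclass
class OperatingPoint:
    freq_hz: float
    volt: float


def select_operating_point(workload: float) -> OperatingPoint:
    """Map a workload in [0, 1] (fraction of peak demand) to (f, V)."""
    w = min(max(workload, 0.0), 1.0)
    freq = max(F_MIN, w * F_MAX)             # never drop below the minimum clock
    frac = (freq - F_MIN) / (F_MAX - F_MIN)  # position within the frequency range
    volt = V_MIN + frac * (V_MAX - V_MIN)    # assumed linear V-f relation
    return OperatingPoint(freq, volt)


def relative_dynamic_power(op: OperatingPoint) -> float:
    """Dynamic power ~ C * V^2 * f, normalized to the maximum operating point."""
    return (op.volt ** 2 * op.freq_hz) / (V_MAX ** 2 * F_MAX)


# Each domain has its own PMU, so each gets its own operating point.
domains = {"domain_a": 1.0, "domain_b": 0.6, "domain_c": 0.2}  # hypothetical loads
for name, load in domains.items():
    op = select_operating_point(load)
    print(f"{name}: {op.freq_hz / 1e6:.0f} MHz, {op.volt:.2f} V, "
          f"relative power {relative_dynamic_power(op):.3f}")
```

Because voltage falls together with frequency, power drops faster than linearly as the workload decreases, which is what makes separate per-domain control worthwhile.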


Journal of Semiconductor Technology and Science | 2015

A Reconfigurable Lighting Engine for Mobile GPU Shaders

Jonghun Ahn; Seongrim Choi; Byeong-Gyu Nam

A reconfigurable lighting engine for widely used lighting models is proposed for low-power GPU shaders. Conventionally, lighting operations involving many complex arithmetic operations were calculated by shader programs on the GPU, which led to significant energy overhead. In this letter, we propose a lighting engine that improves energy efficiency by supporting widely used advanced lighting models in hardware. It supports the Blinn-Phong, Oren-Nayar, and Cook-Torrance models by exploiting logarithmic arithmetic and optimizing trigonometric function evaluation for energy efficiency. Experimental results demonstrate 12.7%, 42.5%, and 35.5% reductions in power-delay product compared with shader program implementations of the respective lighting models. Moreover, our work shows 10.1% higher energy efficiency for the Blinn-Phong model than the prior art.
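
In the Blinn-Phong model, the costly term is the specular power (N·H)^s, where H is the normalized half vector between the light and view directions and s is the shininess exponent. In the logarithmic domain it reduces to one logarithm, one multiplication, and one antilogarithm, as the Python sketch below shows for a single scalar intensity. The function names, the scalar kd/ks weights, and the use of exact float logarithms are assumptions; the sketch covers neither the Oren-Nayar and Cook-Torrance models nor the paper's trigonometric optimizations.

```python
import math

# Minimal sketch of the Blinn-Phong model with the specular power evaluated
# through logarithms. Vectors are plain 3-tuples; this is not the paper's
# hardware datapath, and all names are hypothetical.

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(a: Vec3) -> Vec3:
    """Scale a nonzero vector to unit length via a reciprocal square root."""
    inv_len = 1.0 / math.sqrt(dot(a, a))
    return (a[0] * inv_len, a[1] * inv_len, a[2] * inv_len)


def log_pow(x: float, s: float) -> float:
    """x**s for x > 0 as 2**(s * log2 x): one log, one multiply, one antilog."""
    return 2.0 ** (s * math.log2(x))


def blinn_phong(n: Vec3, l: Vec3, v: Vec3, shininess: float,
                kd: float = 1.0, ks: float = 1.0) -> float:
    """Scalar intensity = kd * max(N.L, 0) + ks * max(N.H, 0)**shininess.

    Assumes l and v are not exactly opposite, so the half vector is nonzero.
    """
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))  # half vector
    n_dot_l = max(dot(n, l), 0.0)
    n_dot_h = max(dot(n, h), 0.0)
    specular = log_pow(n_dot_h, shininess) if n_dot_h > 0.0 else 0.0
    return kd * n_dot_l + ks * specular
```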


IEEE Transactions on Consumer Electronics | 2005

Development of a 3-D graphics rendering engine with lighting acceleration for handheld multimedia systems

Byeong-Gyu Nam; Min-Wuk Lee; Hoi-Jun Yoo

A low-power three-dimensional (3-D) graphics rendering engine with lighting acceleration is designed and implemented for handheld multimedia terminals. The lighting unit is implemented in hardware and integrated into the chip for low-power acceleration of 3-D graphics applications. We adopt the following three steps to handle the memory bandwidth problem for rendering operations. I) Based on our energy-efficiency metric, we find that bilinear MIPMAP is the best texture filtering algorithm for handheld systems, so we adopt bilinear MIPMAP for our texture filtering unit; it requires only 50% of the texture memory bandwidth of trilinear MIPMAP filtering. II) We move the depth test to an earlier stage of the graphics pipeline, which eliminates texture memory accesses for invisible pixels. III) We develop a power-efficient small cache system as the interface to the rendering memory. The accelerator takes 181K gates and its performance reaches 20 Mpixels/s. A test chip is implemented in 1-poly 6-metal 0.18 µm CMOS technology. It operates at 20 MHz with 14.7 mW power consumption.
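
Steps I and II both reduce texture traffic, and their effect can be counted directly. The Python sketch below assumes that a bilinear MIPMAP sample reads 4 texels from one MIP level and a trilinear sample reads 4 texels from each of two adjacent levels (8 total), ignores caching (step III), and compares texturing before the depth test with an early depth test. The `Fragment` type and function names are hypothetical.

```python
from dataclasses import dataclass

# Minimal sketch of texture-fetch accounting, assuming 4 texel reads per
# bilinear sample (one MIP level) and 8 per trilinear sample (two levels),
# and no texture cache. It contrasts late and early depth testing.

TEXELS_PER_SAMPLE = {"bilinear": 4, "trilinear": 8}


@dataclass
class Fragment:
    x: int
    y: int
    depth: float   # smaller is closer to the viewer


def texel_fetches(fragments: list[Fragment], filtering: str, early_z: bool) -> int:
    """Count texel reads for a fragment stream under the chosen pipeline order."""
    depth_buffer: dict[tuple[int, int], float] = {}
    fetches = 0
    for f in fragments:
        visible = f.depth < depth_buffer.get((f.x, f.y), float("inf"))
        if early_z and not visible:
            continue  # early depth test: a hidden fragment is never textured
        fetches += TEXELS_PER_SAMPLE[filtering]  # texture is sampled
        if visible:
            depth_buffer[(f.x, f.y)] = f.depth   # late or early, update on pass
    return fetches
```

Early depth testing only saves fetches for fragments that arrive behind an already-drawn surface, so the savings depend on draw order; with strictly back-to-front submission the early and late counts coincide.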


Custom Integrated Circuits Conference | 2006

A Low-Power Unified Arithmetic Unit for Programmable Handheld 3-D Graphics Systems

Byeong-Gyu Nam; Hyejung Kim; Hoi-Jun Yoo

A low-power, area-efficient four-way 32-bit multifunction arithmetic unit has been developed for programmable shaders for handheld 3D graphics systems. It adopts the logarithmic number system (LNS) at the arithmetic core for the single-cycle throughput and the small-size low-power unification of various complicated arithmetic operations such as power, logarithm, trigonometric functions, vector-SIMD multiplication, division, square root and vector dot product. 24-region and 16-region piecewise linear logarithmic and antilogarithmic converters are proposed with 0.8% and 0.02% maximum conversion error, respectively. All the supported operations are implemented with less than 6.3% operation error and unified into a single arithmetic platform with maximum four-cycle latency and single-cycle throughput. A 93K-gate test chip is fabricated using one-poly five-metal 0.18-µm CMOS technology. It operates at 210 MHz with maximum power consumption of 15.3 mW at 1.8 V.
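
A piecewise-linear converter splits the operand into an exponent and a mantissa, then approximates log2(1 + m) for the fractional mantissa m with a small table of line segments; more regions mean a larger table but a smaller error. The Python sketch below uses uniformly spaced regions with segments through the exact endpoints. That is an assumption: the paper's 24-region and 16-region converters may place and fit their segments differently, and this sketch does not reproduce its 0.8% and 0.02% figures. Function names are hypothetical.

```python
import math
from collections.abc import Callable

# Minimal sketch of a piecewise-linear log2 converter. It assumes uniform
# regions over the mantissa and chord (endpoint) interpolation; the paper's
# converters may place and fit their segments differently.


def make_pwl_log2(regions: int) -> Callable[[float], float]:
    """Return an approximation of log2(1 + m) for m in [0, 1)."""
    # Precompute (slope, intercept) per region from the segment endpoints.
    table = []
    for i in range(regions):
        m0, m1 = i / regions, (i + 1) / regions
        y0, y1 = math.log2(1 + m0), math.log2(1 + m1)
        slope = (y1 - y0) / (m1 - m0)
        table.append((slope, y0 - slope * m0))

    def approx(m: float) -> float:
        i = min(int(m * regions), regions - 1)  # region index (top mantissa bits)
        slope, intercept = table[i]
        return slope * m + intercept

    return approx


def pwl_log2(x: float, approx: Callable[[float], float]) -> float:
    """log2(x) = exponent + log2(1 + fractional mantissa), for x > 0."""
    mant, exp = math.frexp(x)          # x = mant * 2**exp, mant in [0.5, 1)
    return (exp - 1) + approx(2 * mant - 1)


def max_abs_error(regions: int, samples: int = 10_000) -> float:
    """Largest absolute error of the mantissa approximation over a sample grid."""
    approx = make_pwl_log2(regions)
    return max(abs(approx(k / samples) - math.log2(1 + k / samples))
               for k in range(samples))
```

With endpoint-fitted segments, the maximum error shrinks roughly with the square of the region width, so going from 16 to 24 uniform regions cuts it by a factor of about (24/16)², that is, 2.25.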

Collaboration


Top Co-Authors

Seongrim Choi

Chungnam National University

Hyejung Kim

Katholieke Universiteit Leuven
