Chih-Chieh Hsiao | Researchain

Publication

Featured research published by Chih-Chieh Hsiao.

High Performance Computing and Communications | 2010

OpenCL: Make Ubiquitous Supercomputing Possible

Slo-Li Chu; Chih-Chieh Hsiao

Driven by the demands of 3D games and applications, graphics processing units (GPUs), including general-purpose GPUs (GPGPUs), have become required components of modern computer systems. While these devices enable high parallelism through a large number of processing elements, their use in general scientific applications remains low because of their difficult programming paradigms. An open standard, OpenCL, has therefore been proposed to provide universal APIs and programming paradigms for various GPUs and accelerators. This study adopts several benchmarks with various computation characteristics to demonstrate the capabilities of OpenCL on several platforms. The programs are parallelized with OpenMP and OpenCL and then targeted at several GPUs and conventional servers. The paper also provides an example that illustrates the migration of a given program from OpenMP to OpenCL. The experimental results show that these inexpensive GPUs can deliver better performance than servers when the OpenCL paradigm is adopted. This is a preliminary milestone toward cheap supercomputing through the acceleration of GPUs, which can be obtained ubiquitously.
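
The migration example from the paper is not reproduced on this page. As a rough illustration of what such a migration involves, the sketch below contrasts a loop (the shape an OpenMP `parallel for` would annotate) with a kernel that computes one element per work-item. SAXPY is chosen here only as an assumption, and `saxpy_loop`, `saxpy_kernel`, and `enqueue_nd_range` are hypothetical names. Plain Python emulates the index space serially, so no real OpenCL call is involved.

```python
def saxpy_loop(a, x, y):
    """Loop form: the iteration space is explicit in the for statement."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_kernel(gid, a, x, y, out):
    """Kernel form: one work-item computes the element selected by its global ID."""
    out[gid] = a * x[gid] + y[gid]


def enqueue_nd_range(kernel, global_size, *args):
    """Hypothetical stand-in for a 1-D kernel launch; runs work-items one by one."""
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
enqueue_nd_range(saxpy_kernel, len(x), 2.0, x, y, out)
```

The loop body moves into the kernel unchanged. What changes is that the loop index becomes a global ID supplied by the runtime, and the output buffer is allocated by the host and passed in.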

Embedded and Ubiquitous Computing | 2011

An Energy-Efficient Unified Register File for Mobile GPUs

Slo-Li Chu; Chih-Chieh Hsiao; Chiu-Cheng Hsieh

The programmability of mobile GPUs has risen in recent years, with the shaders inside them driven by shading programs for realistic 3D effects. The register file of a conventional high-throughput multithreaded shader consumes 10% to 20% of the shader's energy. However, the register usage of shading programs is quite low. This paper proposes a unified register file design that reduces both the dynamic and the leakage energy of the register file in a multithreaded mobile shader. The results show that the proposed design reduces the dynamic energy of a multithreaded register file by 85%. Furthermore, it reduces leakage energy by 59% and area by 25% with negligible performance degradation. The energy savings of the proposed design are also at least 75% greater than those of related work.

IEEE Transactions on Multimedia | 2014

An Adaptive Thread Scheduling Mechanism With Low-Power Register File for Mobile GPUs

Chih-Chieh Hsiao; Slo-Li Chu; Chiu-Cheng Hsieh

In response to the remarkable increase in 3D applications in consumer electronics devices in recent years, graphics processing units (GPUs) have become widely available on mobile devices. These GPUs typically use hardware multithreaded shaders to improve their throughput for real-time rendering, but they depend on duplicate register files to maintain the context of each hardware thread, which increases power consumption. However, the register usage of shading programs is often relatively low, which leaves many registers unused and wastes power. Long-latency memory operations can also waste power by keeping registers activated while they wait. This study proposes a low-power register file with multiple power modes to reduce the power consumption of the register file. It also presents an adaptive thread scheduling mechanism that trades off the power consumption of the register file against frames per second (FPS). Results show that the average performance degradation from the proposed low-power register file is only 0.62%. The proposed adaptive thread scheduling has an average underprediction ratio of 3.32%. The leakage reduction of the proposed low-power register file is 74.80%. This reduction can be improved to 81.49%, 82.22%, and 84.28% with adaptive thread scheduling at frame rates of 30, 25, and 20 FPS, respectively.

Computers & Graphics | 2013

Energy-aware hybrid precision selection framework for mobile GPUs

Chih-Chieh Hsiao; Slo-Li Chu; Chen-Yu Chen

As 3D applications on mobile devices have become increasingly popular, mobile GPUs have become one of their most essential components. Because the operating time of these devices is generally battery-limited, the tradeoff between energy consumption and user experience has become an important issue. Conventional mechanisms for reducing the energy consumption of the shader in a mobile GPU include using fixed-point arithmetic and reducing the precision of floating-point arithmetic. Fixed-point has a narrower numerical range than floating-point, but is faster and more energy-efficient. Reduced-precision floating-point, by contrast, has a wider numerical range but consumes more energy. In this work, an Energy-aware Hybrid Precision Selection (EHPS) framework is proposed to integrate the above mechanisms with a profile-based precision selection mechanism to maximize energy savings. In addition, a built-in energy model is used to evaluate whether fixed-point or reduced floating-point is more energy-efficient for the current application, and the more energy-efficient option is then used to render it. The results reveal that the proposed EHPS framework reduces the energy consumed by the shader by an average of 33.66% and 31.63% in the low-quality and high-quality modes, respectively. The average PSNRs of the resulting images are 26.89 dB and 45.94 dB in these two rendering modes, respectively. The proposed EHPS framework yields better image quality and uses less energy than related works.

Highlights: An energy-aware hybrid rendering management scheme with fixed-point and reduced floating-point systems. An automatic precision selection mechanism for both vertex and fragment shading at run time. A runtime energy and precision evaluation system for determining a feasible number system to render the current application.
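
The abstract reports image quality as PSNR in dB. For readers unfamiliar with the metric, the sketch below computes the standard definition, 10·log10(peak² / MSE), over two flat lists of pixel values. The 8-bit peak of 255 and the flat-list input are assumptions made for illustration; the page does not say how the paper's measurements were set up.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

A higher PSNR means the test image is closer to the reference, which is why the high-quality mode in the abstract reports the larger value.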

Embedded and Ubiquitous Computing | 2011

A Dual-Mode Unified Shader with Frame-Based Dynamic Precision Adjustment for Mobile GPUs

Slo-Li Chu; Chih-Chieh Hsiao; Chen-Yu Chen

To extend the battery life of mobile devices while maintaining image quality, this paper presents a dual-mode unified shader for mobile GPUs, which consists of a floating-point and a fixed-point SIMD shader for high-quality or energy-saving rendering. Furthermore, to increase the image quality of fixed-point rendering, this paper proposes a frame-based dynamic precision adjustment scheme that selects an appropriate precision for different 3D scenes. The proposed design has the following characteristics: (I) floating-point rendering for high quality and fixed-point rendering for energy saving, (II) a frame-based dynamic precision adjustment scheme that selects an appropriate precision for a given scene, and (III) a workload-based scene change detection mechanism that re-selects the precision in time. This paper also presents a side-by-side comparison of performance, power, and image quality between floating-point and fixed-point rendering in real-world 3D games. In these games, a shader in energy-saving mode achieves, on average, a 48.6% reduction in dynamic power and 33% faster thread execution. Furthermore, the quality differences in images rendered under the proposed dynamic precision are imperceptible to the human eye, and the PSNR outperforms related work by 2.37% on average. This reveals a way to use conventional fixed-point arithmetic with dynamic precision to implement a low-power unified shader with quality rendering for such power-limited devices.
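
To make the idea of selecting a fixed-point precision concrete, here is a minimal sketch that picks the number of fractional bits from the value range observed in a frame and then quantizes to that format. The 16-bit signed word, the range-based heuristic, and the function names are all assumptions for illustration. The abstract does not describe the paper's actual adjustment scheme, so this should not be read as that scheme.

```python
def choose_frac_bits(values, word_bits=16):
    """Most fractional bits such that the largest magnitude fits a signed word."""
    max_mag = max((abs(v) for v in values), default=0.0)
    int_bits = 0
    while (1 << int_bits) <= max_mag:  # need 2**int_bits > max_mag
        int_bits += 1
    return max(0, word_bits - 1 - int_bits)  # one bit reserved for the sign


def to_fixed(value, frac_bits, word_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(fixed, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return fixed / (1 << frac_bits)
```

With this heuristic, a frame whose values stay within a small range gets more fractional bits and therefore finer resolution, while a frame with large values gives up fractional bits to avoid overflow.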

IEICE Electronics Express | 2012

Demand-driven register file for multithreaded mobile GPUs

Slo-Li Chu; Chih-Chieh Hsiao; Chiu-Cheng Hsieh

Mobile GPUs are used in modern portable devices to satisfy the growing requirements of 3D applications. These GPUs generally integrate hardware multithreaded shaders to improve the throughput for real-time rendering, but they depend on duplicate register files to maintain the context of each hardware thread. This work develops a demand-driven register file (DDRF) to reduce the power consumption of register files. The proposed DDRF is shared on demand among concurrent threads and turns off almost all unused registers. Experimental results reveal that the DDRF uses 85.8% less power than a conventional multithreaded GPU. The chip area and circuit latency of the DDRF are also discussed.
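
The intuition behind sharing registers on demand can be shown with a simple count. The sketch below compares how many registers stay powered when every hardware thread owns a full duplicate register file with how many stay powered when only the registers each thread actually demands are kept on. It counts registers only and does not model power, circuits, or the paper's allocation logic. The function names and the example numbers are hypothetical.

```python
def powered_registers(demands, regs_per_thread):
    """Return (conventional, demand_driven) counts of powered registers.

    demands: registers actually used by each hardware thread.
    regs_per_thread: size of one duplicate register file.
    """
    if not demands:
        raise ValueError("at least one thread is required")
    for d in demands:
        if not 0 <= d <= regs_per_thread:
            raise ValueError("demand must lie within the register file size")
    conventional = regs_per_thread * len(demands)  # every copy fully powered
    demand_driven = sum(demands)                   # only demanded registers on
    return conventional, demand_driven


def gated_fraction(demands, regs_per_thread):
    """Fraction of registers that could be switched off under on-demand sharing."""
    conventional, demand_driven = powered_registers(demands, regs_per_thread)
    return 1.0 - demand_driven / conventional


# Hypothetical example: four threads, 32 registers each, low actual usage.
fraction = gated_fraction([4, 6, 2, 4], 32)
```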

Journal of Circuits, Systems, and Computers | 2014

An Energy-Efficient Demand-Driven Register File for Mobile GPUs

Chih-Chieh Hsiao; Chiu-Cheng Hsieh; Slo-Li Chu

In response to the remarkable increase in 3D applications on mobile devices in recent years, mobile GPUs have become widely available. Although the computation requirements of 3D applications are tremendous, these applications consist of highly data-parallel operations. Therefore, mobile GPUs are usually hardware multithreaded to increase their throughput and achieve real-time rendering. This design increases energy consumption because of the duplicate register files in shaders. However, the register usage of shading programs is often relatively low, which causes many duplicated registers to go unused and thus wastes energy. In addition, long-latency memory operations can consume unnecessary energy to activate registers. This study proposes a compiler-assisted energy-efficient demand-driven register file (EDRF) to reduce the energy consumption of registers that are unused or waiting for long-latency memory operations. The proposed EDRF is shared on demand between concurrent threads with multiple power gating modes. The management ...

Computers & Electrical Engineering | 2013

Program-based dynamic precision selection framework with a dual-mode unified shader for mobile GPUs

Slo-Li Chu; Chih-Chieh Hsiao; Chen-Yu Chen

To extend the life of battery-driven mobile devices while maintaining image quality, this work proposes a Program-based Dynamic Precision Selection (PDPS) framework with a dual-mode unified shader. Since fixed-point arithmetic can be performed faster and more energy-efficiently than floating-point arithmetic on power-limited devices, the use of fixed-point rather than floating-point rendering is a critical concern. The proposed PDPS framework includes a runtime profile-based mechanism that automatically determines the precision of each shading program in fixed-point arithmetic. Additionally, a scene change detection mechanism is developed to recalculate the rendering precision whenever a 3D scene changes. The results reveal an average 18% reduction in energy and 35% faster performance under fixed-point rendering. The degradation in rendered image quality under the proposed PDPS cannot be detected by the naked eye, and the PSNR is an average of 15% better than that achieved using a related approach.
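
The abstract does not say how scene changes are detected. One simple possibility, sketched below under that caveat, is to flag a frame when its workload differs from the previous frame's by more than a relative threshold. The workload measure (for example, a per-frame triangle count), the 30% threshold, and the function name are assumptions rather than details from the paper.

```python
def scene_changes(workloads, threshold=0.3):
    """Indices of frames whose workload jumps relative to the previous frame.

    workloads: per-frame workload measurements (assumed non-negative).
    threshold: relative change above which a scene change is flagged.
    """
    changes = []
    for i in range(1, len(workloads)):
        prev, cur = workloads[i - 1], workloads[i]
        base = max(prev, 1)  # guard against division by zero
        if abs(cur - prev) / base > threshold:
            changes.append(i)
    return changes


# Hypothetical trace: the workload jumps at frame 3 and drops at frame 5.
flagged = scene_changes([1000, 1020, 990, 2500, 2480, 900])
```

In a framework like the one described, each flagged frame would be the point at which the rendering precision is recalculated.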

International Symposium on Parallel Architectures, Algorithms and Programming | 2012

A Hierarchical Triangle-Level Culling Technique for Tile-Based Rendering

Chih-Chieh Hsiao; Slo-Li Chu

Current 3D graphics rendering relies on tens of thousands of triangles to generate realistic images on screen. As the number of triangles increases, and because no specific order exists for their input sequence, many triangles that do not contribute to the final image must still be rasterized and shaded. This study proposes a triangle-level hierarchical culling technique for tile-based rendering. With a novel hardware-efficient technique that focuses on the depth and coverage relationships among triangles, invisible triangles can be culled right after the geometry stage. The intended advantages include culling invisible triangles earlier, reducing storage pressure, and reducing triangle and list data accesses (from external memory) during rendering. The results show that the proposed mechanism culls 32.99% of triangles before rasterization. In addition, storage requirements and external memory transfers are each reduced by about 15%.
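
A depth-and-coverage test of the kind mentioned above can be sketched generically as follows. If some opaque triangle fully covers a tile, anything in that tile lying entirely behind it cannot be visible. This is a conservative two-pass occlusion test written for illustration, not the paper's hierarchical hardware design. The `Fragment` record, the pre-binned input, and the convention that smaller depth means closer are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One triangle's footprint in one tile (hypothetical pre-binned input)."""
    tile: int
    full_cover: bool  # True if the triangle covers the whole tile
    z_min: float      # nearest depth of the triangle inside the tile
    z_max: float      # farthest depth of the triangle inside the tile


def cull(triangles):
    """Return names of triangles hidden in every tile they touch.

    triangles: dict mapping a triangle name to its list of Fragment records.
    """
    # Pass 1: per tile, the nearest "farthest depth" among full-cover occluders.
    bound = {}
    for frags in triangles.values():
        for f in frags:
            if f.full_cover:
                bound[f.tile] = min(bound.get(f.tile, float("inf")), f.z_max)
    # Pass 2: a triangle is culled only if it lies behind the bound in all tiles.
    culled = set()
    for name, frags in triangles.items():
        if frags and all(f.z_min > bound.get(f.tile, float("inf")) for f in frags):
            culled.add(name)
    return culled
```

Using two passes makes the result independent of input order, which matters because the triangles arrive in no particular sequence.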

IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Optimizing Techniques for OpenCL Programs on Heterogeneous Platforms

Slo-Li Chu; Chih-Chieh Hsiao

Heterogeneous platforms that consist of CPUs and add-on streaming processors are widely used in modern computer systems. These add-on processors provide substantially more computation capability and memory bandwidth than conventional multi-core platforms, and general-purpose computations can be offloaded onto them. However, programming these streaming processors to exploit their potential performance is challenging because of their diverse underlying architectural characteristics. Several optimization techniques are applied on OpenCL-compatible heterogeneous platforms to achieve thread-level, data-level, and instruction-level parallelism. The architectural implications of these techniques and the underlying optimization principles are discussed. Finally, a case study of the MRI-Q benchmark illustrates the capabilities of these optimization techniques. The experimental results reveal that the speedup from the non-optimized to the optimized kernel ranges from 8 to 63 on different target platforms.
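
For context on the case study, the sketch below shows the kind of computation an MRI-Q kernel performs, assuming the commonly used formulation in which each voxel accumulates a sum over k-space samples of a magnitude times cos and sin of 2π(k·x). That formulation and the names `compute_q` and `phi_mag` are assumptions here. The page does not describe the paper's kernel or its optimizations, and this plain Python version is a non-optimized reference only.

```python
import math


def compute_q(kx, ky, kz, phi_mag, x, y, z):
    """Accumulate the real and imaginary parts of Q for every voxel.

    kx, ky, kz, phi_mag: per-sample k-space coordinates and magnitudes.
    x, y, z: per-voxel spatial coordinates.
    """
    q_real, q_imag = [], []
    # The outer loop is the data-parallel dimension: one voxel per work-item.
    for xi, yi, zi in zip(x, y, z):
        re = im = 0.0
        for kxj, kyj, kzj, mag in zip(kx, ky, kz, phi_mag):
            arg = 2.0 * math.pi * (kxj * xi + kyj * yi + kzj * zi)
            re += mag * math.cos(arg)
            im += mag * math.sin(arg)
        q_real.append(re)
        q_imag.append(im)
    return q_real, q_imag
```

Each voxel is independent of the others, so the outer loop maps naturally onto OpenCL work-items, while the inner loop over samples is where data-level and instruction-level optimizations would apply.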
