Publication


Featured research published by Chi-Bang Kuan.


international conference on parallel processing | 2012

Enabling an OpenCL Compiler for Embedded Multicore DSP Systems

Jia-Jhe Li; Chi-Bang Kuan; Tung-Yu Wu; Jenq Kuen Lee

OpenCL is an industry attempt to unify heterogeneous multicore programming. With its programming model defining SPMD kernels, vector types, and address space qualifiers, OpenCL allows programmers to exploit data parallelism with multicore processors and SIMD instructions as well as data locality with the memory hierarchy. Recently, OpenCL has gained success on many architectures, including multicore CPUs, GPUs, vector processors, embedded systems with application-specific processors, and even FPGAs. However, how to support OpenCL for embedded multicore DSP systems remains unaddressed. In this paper, we illustrate our OpenCL support for embedded multicore DSP systems. Our target platform consists of one MPU and a DSP subsystem with multiple DSPs. The DSPs we address are VLIW processors with clustered functional units and distributed register files. To generate efficient code for such DSPs, compilers are required to consider irregular register file access in many optimization phases. To utilize the DSPs with distributed register files, we propose a cluster-aware work-item dispatching scheme to vectorize OpenCL kernels and assign independent workload to clusters of a DSP. In addition, we incorporate several optimizations to enable efficient DSP code generation. In our experiments, we employ a set of OpenCL benchmark programs to evaluate the effectiveness of our OpenCL support. The experiments are conducted on a DSP cycle-accurate simulator and a multicore evaluation board. We report an average 29% performance improvement with our vectorization scheme and a near 2-fold speedup with two DSPs compared with a single-MPU setup.


international conference on hardware/software codesign and system synthesis | 2010

Power aware SID-based simulator for embedded multicore DSP subsystems

Cheng-Yen Lin; Po-Yu Chen; C. L. Tseng; Chung-Wen Huang; Chia-Chieh Weng; Chi-Bang Kuan; Shih-Han Lin; Shi-Yu Huang; Jenq Kuen Lee

Embedded multicore DSP systems are playing an increasingly important role in consumer electronics design. On mobile devices, such systems aim to optimize both performance and power. Embedded application developers therefore devise designs that optimize embedded applications not only for performance but also for power. However, there is currently no power metrics support for popular application design platforms such as QEMU and SID, where application developers develop their applications. This hinders application developers from tuning optimizations for power. In this paper, we propose a power-aware simulation framework for embedded multicore DSP subsystems on the SID framework. To the best of our knowledge, this is the first work to attempt to build a power-aware simulator based on the SID simulation framework. The power estimation flow includes two phases: IP-level power modeling and system-level power profiling. In the IP-level power modeling, PowerMixerIP is employed to build the power model for the PAC DSP and major IPs. In the system-level power profiling, we provide a power profiling hierarchy that meets the demands of embedded software developers. The granularity of power profiling can be configured to the whole simulation stage or to any specific time slot in the simulation, such as a dedicated function loop. In our experiments, DSP programs with SIMD intrinsics from the DSPStone benchmark are examined with our proposed power-aware simulator. In addition, a face detection application is deployed as a running example on multi-core DSP systems to show how our power simulator can help developers in the optimization process by illustrating views of the power dissipation of applications.


embedded systems for real time multimedia | 2011

Support of software framework for embedded multi-core systems with Android environments

Yu-Hao Chang; Chi-Bang Kuan; Cheng-Yen Lin; Te-Feng Su; Chun-Ta Chen; Jyh-Shing Roger Jang; Shang-Hong Lai; Jenq Kuen Lee

Applications on mobile devices are becoming more complicated with each new wave of mobile applications. The computing power of embedded devices is increasing with this trend, and embedded multi-core platforms are in a position to help boost system performance. Software frameworks integrated with the multi-core platforms are often needed to help boost system performance and reduce programming complexity. In this paper, we present a software framework based on Android and multi-core embedded systems. In the framework, we integrate the compiler toolchain for a multi-core programming environment, which includes DSP C/C++ compilers, a streaming RPC programming model, a debugger, an ESL simulator, and power management models. We also develop software frameworks for face detection, voice recognition, and mobile streaming management. These frameworks are designed as multi-core programs and are used to illustrate the design flow for applications on embedded multi-core environments equipped with Android systems. We demonstrate our proposed mechanisms by implementing two applications, Face RMS and voice recognition. The proposed framework gives a case study that illustrates the software framework and design flow for emerging RMS-based and voice recognition applications on embedded multi-core systems equipped with Android systems.


ACM Transactions on Design Automation of Electronic Systems | 2015

The Design and Experiments of A SID-Based Power-Aware Simulator for Embedded Multicore Systems

Cheng-Yen Lin; Chung-Wen Huang; Chi-Bang Kuan; Shi-Yu Huang; Jenq Kuen Lee

Embedded multicore systems are playing increasingly important roles in the design of consumer electronics. The objective of such systems is to optimize both performance and power characteristics of mobile devices. However, currently there are no power metrics supporting popular application design platforms (such as SID) that application developers use to develop their applications. This hinders the ability of application developers to optimize power consumption. In this article we present the design and experiments of a SID-based power-aware simulation framework for embedded multicore systems. The proposed power estimation flow includes two phases: IP-level power modeling and power-aware system simulation. The first phase employs PowerMixerIP to construct the power model for the processor IP and other major IPs, while the second phase involves a power abstract interpretation method for summarizing the simulation trace, then, with a CPE module, estimating the power consumption based on the summarized trace information and the input of IP power models. In addition, a Manager component is devised to map each digital signal processor (DSP) component to a host thread and maintain the access to shared resources. The aim is to maintain the simulation performance as the number of simulated DSP components increases. A power-profiling API is also supported that developers of embedded software can use to tune the granularity of power-profiling for a specific code section of the target application. We demonstrate via case studies and experiments how application developers can use our SID-based power simulator for optimizing the power consumption of their applications. We characterize the power consumption of DSP applications with the DSPstone benchmark and discuss how compiler optimization levels with SIMD intrinsics influence the performance and power consumption. 
A histogram application and an augmented-reality application based on human-face RMS (recognition, mining, and synthesis) are deployed as running examples on multicore systems to demonstrate how developers can use our power simulator in the optimization process to obtain different views of the power dissipation of their applications.


international conference on parallel processing | 2013

Compilers for Low Power with Design Patterns on Embedded Multicore Systems

Cheng-Yen Lin; Chi-Bang Kuan; Jenq Kuen Lee

Minimization of power dissipation can be considered at the algorithmic, compiler, architectural, logic, and circuit levels. Recent research on multicore programming models suggests that parallel design patterns can be a solution for developing multicore applications. Because parallel design patterns are regular, we view them as a great opportunity to exploit power optimizations in the software layer. In this paper, we present case studies that investigate compilers for low power with parallel design patterns on embedded multicore systems. We evaluate two major parallel design patterns, Pipe and Filter and MapReduce with Iterator. Our work attempts to devise power optimization schemes in compilers by exploiting the recurring patterns of embedded multicore programs. In both of the patterns investigated, the common recurring patterns of programs are exploited to find opportunities for compiler optimizations for low power. The proposed optimization schemes are rate-based optimization for the Pipe and Filter pattern and early-exit power optimization for the MapReduce with Iterator pattern. Our experiments are based on a power simulator that simulates a heterogeneous multicore system under the SID simulation framework. In our experiments, a finite impulse response (FIR) program using the Pipe and Filter pattern and an image recognition application using the MapReduce with Iterator pattern are evaluated with our proposed power optimization scheme for each pattern. Significant power reductions are observed in both cases. With the case study, we present a direction for power optimization in which additional key design patterns for embedded multicore systems can be identified to explore power optimization opportunities via compilers.


international conference on parallel processing | 2012

Parallelized Background Substitution System on a Multi-core Embedded Platform

Yutzu Lee; Chen-Kuo Chiang; Te-Feng Su; Yu-Wei Sun; Chi-Bang Kuan; Shang-Hong Lai

We present an automatic human background substitution system based on a Random Walk (RW) algorithm on a multi-core processing architecture. First, a fast algorithm is proposed to solve the large linear system in RW by adapting the Gauss-Seidel method. Two tables, TYPE and INDEX, are introduced to quickly locate the data required for the closed-form solution. Then, face detection together with a human shape prior model is used to determine the approximate human body and background areas. Pixels inside these areas are used as seed points in the RW algorithm for automatic segmentation. The proposed method is designed to be highly parallelizable and suitable for running on a multi-core architecture. We demonstrate the parallelization strategies for the proposed fast RW algorithm and face detection on a heterogeneous multi-core embedded platform to make the most of the system architecture. Compared to the single-processor implementation, the experimental results show a significant speedup for the parallelized human background substitution system on a multi-core embedded platform, which consists of an ARM processor and two DSP cores.
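
As a rough illustration of the Gauss-Seidel method mentioned in the abstract above, the following is a minimal textbook sketch in Python. It is not the paper's adapted algorithm: the TYPE and INDEX lookup tables and the parallelization are not reproduced, and the function name `gauss_seidel`, its parameters, and the small example system are assumptions made only for illustration.

```python
def gauss_seidel(a, b, iterations=100, tol=1e-10):
    """Textbook Gauss-Seidel iteration for a x = b (hypothetical helper).

    Assumes `a` is a square list-of-lists with nonzero diagonal entries and
    is diagonally dominant, so the iteration converges.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(iterations):
        max_change = 0.0
        for i in range(n):
            # Use the newest values of x for every other unknown.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_value = (b[i] - s) / a[i][i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            break  # successive iterates no longer change noticeably
    return x


# Illustrative diagonally dominant system whose exact solution is [1, 2, 3].
example_a = [[4.0, -1.0, 0.0],
             [-1.0, 4.0, -1.0],
             [0.0, -1.0, 4.0]]
example_b = [2.0, 4.0, 10.0]
print(gauss_seidel(example_a, example_b))
```

Each unknown is updated in place using the most recent values of the others, which is what distinguishes Gauss-Seidel from the Jacobi iteration.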


embedded systems for real time multimedia | 2011

Parallelization of a Bokeh application on embedded multicore DSP systems

Chi-Bang Kuan; Shao-Chung Wang; Wen-Li Shih; Kun-Hsien Tsai; Shang-Hong Lai; Jenq Kuen Lee

A Bokeh application renders the blur, or the aesthetic quality of the blur, in out-of-focus areas of an image. The out-of-focus effect of Bokeh results depends on the accuracy of depth information and on the blurring effects produced by image postprocessing. Current stereo vision techniques, however, consume a huge amount of processing time to obtain accurate depth information. In this paper, we present a case study on parallelizing a Bokeh application on an embedded multicore platform, which features one MPU and one DSP sub-system consisting of two VLIW DSP processors. The Bokeh application employs a Belief Propagation method to obtain depth information for input images and uses that information to generate output images with an out-of-focus effect. This study also illustrates how to deliver performance for applications on embedded multicore systems. To sustain the heavy computation requirements of the stereo vision techniques, DSPs with their SIMD instructions are leveraged to exploit data parallelism in critical kernels. In addition, DMAs on the multicore system are incorporated to facilitate data transmission between processors. Access to SIMD and DMAs is provided by two essential programming models we developed for embedded multicore systems. Our work also gives firsthand experience of how C++ classes and abstractions can be used to help parallelize applications on embedded multicore DSP systems. Finally, in our experiments, we utilize DSPs, SIMD, and DMAs to obtain speedups of 1.67 and 2.75 for two key components of the Bokeh application, respectively.


signal processing systems | 2014

C++ Support and Applications for Embedded Multicore DSP Systems

Chi-Bang Kuan; Jia-Jhe Li; Chung-Kai Chen; Jenq Kuen Lee

In recent years, embedded systems have entered the multicore era. As the number of cores keeps growing in embedded systems, it becomes more important to provide programming support that considers embedded system constraints and at the same time helps utilize multicore systems. Although C still dominates embedded programming, C++ is gaining importance in parallel programming, and supporting C++ for embedded multicore systems is promising. However, embedded systems usually have tight resource budgets, and C++ is commonly considered to have a huge code size that embedded systems cannot afford. Therefore, in this paper we investigate the code size requirements of a C++ library and propose a layered design to provide code-size-aware library support. In addition, to utilize embedded multicore systems, we employ C++ language features to facilitate embedded multicore programming. With C++, we incorporate high-level abstractions and design patterns into the programming support to enhance low-level programming APIs that can be used to exploit DSPs, SIMD instructions, and DMAs on embedded multicore systems. Finally, we evaluate our C++ support with a Blur and a JPEG program. Our results on a dual-DSP platform show that we can obtain speedups of 3.32 and 3.09 for the Blur and JPEG programs, respectively.


international conference on parallel processing | 2011

C++ Compiler Supports for Embedded Multicore DSP Systems

Chi-Bang Kuan; Jia-Jhe Li; Chung-Kai Chen; Jenq Kuen Lee

The development of embedded systems has moved toward multicore in recent years. As the number of processors continues to grow in embedded multicore systems, how to provide efficient programming models and tailored compiler support becomes a critical issue in developing embedded multicore applications. Though C still dominates embedded computing so far, C++ is gaining importance and popularity in DSP systems for its power and flexibility. In addition, current C++ compilers are able to produce code as efficient and compact as that of C compilers. This increases the practical use of C++ technologies in embedded systems. In this paper, we address issues in supporting C++ compilers and present methods to leverage C++ in embedded multicore computing. Since embedded systems are usually limited by tight resources, code size issues are addressed when supporting C++ libraries. The code size of the standard C++ library is analyzed, and a library layering technique is provided to guide reasonable library use in embedded applications. Our methods to leverage C++ include enhancing programming models with high-level abstraction and combining the programming models with parallel patterns to simplify program parallelization. In our experiments, PAC multi-DSP systems, composed of one MPU and two VLIW DSPs, are used to evaluate the proposed methods. Parallelization results on stereo-vision and image-blurring applications are presented, with key components of the systems, including SIMD and DMAs, incorporated to pursue maximal performance. The results show that our approaches with C++ compilers can deliver performance improvements of 61% and 174% for the stereo-vision and image-blurring applications, respectively.


Archive | 2010

Power Aware SID-based Simulator for Embedded Multicore DSP Subsystems

Cheng-Yen Lin; Po-Yu Chen; C. L. Tseng; Chung-Wen Huang; Chia-Chieh Weng; Chi-Bang Kuan; Shih-Han Lin; Shi-Yu Huang; Jenq-Kuen Lee

Collaboration


Dive into Chi-Bang Kuan's collaborations.

Top Co-Authors

Jenq Kuen Lee, National Tsing Hua University

Cheng-Yen Lin, National Tsing Hua University

Chung-Wen Huang, National Tsing Hua University

Jia-Jhe Li, National Tsing Hua University

Shang-Hong Lai, National Tsing Hua University

Shi-Yu Huang, National Tsing Hua University

C. L. Tseng, National Tsing Hua University

Chia-Chieh Weng, National Tsing Hua University

Po-Yu Chen, National Tsing Hua University